CN115132370A - Flow adjustment auxiliary method and device based on machine vision and deep learning - Google Patents

Flow adjustment auxiliary method and device based on machine vision and deep learning

Info

Publication number
CN115132370A
CN115132370A (application CN202210802386.2A)
Authority
CN
China
Prior art keywords
target
tracked
video
frame
rectangular frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210802386.2A
Other languages
Chinese (zh)
Inventor
涂文靖
宁高宁
邬佳浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU
Priority to CN202210802386.2A
Publication of CN115132370A
Legal status: Pending


Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H 50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H 50/80 ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics, e.g. flu
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G06V 20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/50 Context or environment of the image
    • G06V 20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/168 Feature extraction; Face representation
    • G06V 40/171 Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/172 Classification, e.g. identification
    • G06V 40/173 Classification, e.g. identification; face re-identification, e.g. recognising unknown faces across different face tracks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Evolutionary Computation (AREA)
  • Public Health (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Human Computer Interaction (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Pathology (AREA)
  • Epidemiology (AREA)
  • Primary Health Care (AREA)
  • Image Analysis (AREA)
  • Closed-Circuit Television Systems (AREA)

Abstract

The invention discloses a flow adjustment (epidemiological contact-tracing) assistance method and device based on machine vision and deep learning. The system requires no additional hardware: it can be deployed directly on an existing surveillance system, is maintained remotely, and is convenient to update. Overall, the software outperforms existing flow adjustment approaches and has the advantages of strong generality, a wide range of application, and simple, convenient use.

Description

Method and device for flow adjustment assistance based on machine vision and deep learning
Technical Field
The invention belongs to the technical field of computer vision and deep learning, and particularly relates to a method and a device for assisting flow adjustment based on machine vision and deep learning.
Background
Several solutions are currently used for flow adjustment (contact tracing), for example:
First, the travel trajectory of an infected person is reconstructed through manual interviews, information such as public-transport records is used to find potential close contacts in the areas near that trajectory, and these contacts are then confirmed through further manual inquiry. Second, venue codes are deployed on a large scale and pedestrians are required to scan them on entering a venue; once an infected person is identified, big-data analysis directly flags everyone who scanned the codes of the venues the infected person visited as potential close contacts.
These two mainstream solutions have the following problems:
(1) The first scheme is subject to interference from factors such as the infected person's imperfect memory, so the obtained trajectory is not accurate enough; it is also time-consuming, labor-intensive, and inefficient.
(2) The second scheme suffers from pedestrians failing to cooperate and not scanning the venue codes, so the final result is difficult to make accurate.
Both schemes can recover an infected person's travel trajectory to some extent, but they suffer from low accuracy, low efficiency, and similar problems.
Disclosure of Invention
In view of the defects in the prior art, an object of the embodiments of the present application is to provide a method and an apparatus for flow adjustment assistance based on machine vision and deep learning.
According to a first aspect of embodiments of the present application, there is provided a method for flow adjustment assistance based on machine vision and deep learning, including:
(1) obtaining a target to be tracked: acquiring a video and a video frame of which a target in the video is not blocked, carrying out pedestrian detection on the video frame to obtain a plurality of selection frames, selecting the corresponding selection frame as the selection frame of the target to be tracked according to the received user instruction, and storing the image of the target to be tracked in the corresponding selection frame and the corresponding time;
(2) single-lens target tracking: tracking the target to be tracked forward and backward respectively according to the image of the target to be tracked and the corresponding time saved in the step (1) to obtain all video frames of the target to be tracked appearing in the video, recording the track information of the target to be tracked, and saving each video frame and the corresponding image, rectangular frame coordinates and time in the rectangular frame of the target to be tracked;
(3) mask detection and classified preservation of risk personnel: all video frames of the target to be tracked appearing in the video are subjected to pedestrian detection and mask detection, and then classified storage is carried out through a re-recognition algorithm;
(4) cross-shot object re-recognition: after the target to be tracked disappears in the video, searching whether the target to be tracked exists in the video of the corresponding camera in the map adjacency matrix according to the map adjacency matrix representing the geographic position of the camera, so as to obtain the complete track information of the target to be tracked;
(5) drawing a target track: and drawing the track of the target to be tracked on a map according to the complete track information of the target to be tracked.
Further, the step (1) includes the following sub-steps:
(1.1) after a video frame in which a target to be tracked is not shielded in a video is obtained, carrying out pedestrian identification on the video frame by using a first target detection algorithm, marking the identification result in the video frame to generate a plurality of selection frames for a user to select the target to be tracked;
and (1.2) selecting the corresponding selection frame as the selection frame of the target to be tracked according to the received user instruction, and storing the image of the target to be tracked in the corresponding selection frame and the corresponding time.
Further, the step (2) includes the following sub-steps:
(2.1) obtaining the image at time T_s in the video from the database and performing pedestrian recognition on it to obtain k rectangular frames R_Ts1, R_Ts2, …, R_Tsk (k > 1); matching each rectangular frame on the image against the selection frame of the target to be tracked by coincidence degree to obtain the rectangular frame R_w1 corresponding to the target to be tracked, and recording the relevant coordinate information;
(2.2) loading video frames forward starting from T_s and performing pedestrian detection on the image of each frame with the first target detection algorithm, obtaining for frame T_n the i rectangular frames R_Tn1, R_Tn2, …, R_Tni (i > 1); meanwhile, tracking the target to be tracked with a multi-target tracking algorithm until the rectangular frame of the target to be tracked disappears from the video, recording the forward track information and the disappearance time T_f of the target to be tracked in the video, and saving each video frame together with the corresponding image inside the target's rectangular frame, the rectangular-frame coordinates, and the time;
(2.3) loading video frames backward starting from T_s and performing pedestrian detection on the image of each frame with the first target detection algorithm, obtaining for frame T_m the j rectangular frames R_Tm1, R_Tm2, …, R_Tmj (j > 1); meanwhile, tracking the target to be tracked with a multi-target tracking algorithm until the rectangular frame of the target to be tracked disappears from the video, recording the backward track information and the disappearance time T_l of the target to be tracked in the video, and saving each video frame together with the corresponding image inside the target's rectangular frame, the rectangular-frame coordinates, and the time.
Further, the step (2) further comprises the following sub-steps:
(2.4) for the images inside the rectangular frame of the target to be tracked saved in steps (2.2) and (2.3), deleting those whose time is less than T_f + 1 or greater than T_l − 1, to remove images that may be incomplete when the target to be tracked appears and disappears.
Further, the step (3) includes the following sub-steps:
(3.1) detecting whether the corresponding pedestrian wears a mask or not by utilizing a second target detection algorithm to the pictures in the rectangular frame except the rectangular frame of the target to be tracked in all the video frames of the target to be tracked appearing in the video, and further carrying out risk classification;
(3.2) respectively inputting the pictures in the rectangular frame except the target rectangular frame to be tracked into a re-identification network, judging whether the corresponding pedestrians are identified, if so, storing the pictures in the rectangular frame and the corresponding pedestrian numbers, time and rectangular frame coordinates into a corresponding data set, otherwise, creating a new data set and storing the pictures in the rectangular frame and the corresponding pedestrian numbers, time and rectangular frame coordinates into the new data set.
Further, the step (4) comprises the following sub-steps:
(4.1) according to the map adjacency matrix, directly acquiring the x cameras (x > 1) geographically adjacent to the camera corresponding to the video; loading in sequence, for these x cameras C_1, C_2, …, C_x, the video segments from (T_f + T_a) to (T_f + T_a + ΔT), where T_a is a simplified search duration obtained from the map adjacency matrix and ΔT is the search duration, and saving each video segment and its corresponding start-stop time;
(4.2) carrying out pedestrian detection on the video frames in each video clip by using a first target detection algorithm, adjusting the size of the picture in the obtained rectangular frame to be consistent with the input size of the re-identification network in the step (3), adjusting the average brightness of the picture to be at the same average brightness level as the video frames in the step (1), and storing the picture and the corresponding time and coordinates;
(4.3) inputting the picture stored in the step (4.2) and the image in the rectangular frame of the target to be tracked and stored in the step (2) into the re-identification network, and judging whether the target to be tracked exists in the picture;
(4.4) if yes, returning to step (2); if not, expanding the search time range and returning to step (4.1) to search again; if the search keeps failing, the search time range keeps being expanded until the search succeeds or all video segments have been searched, yielding the complete track information of the target to be tracked, where ΔT_1 is the search-extension duration.
Further, the step (5) comprises the following sub-steps:
(5.1) sequentially reading the complete track information of the target to be tracked to obtain two successively visited cameras C_1, C_2;
(5.2) using the annotated map, acquiring the coordinate positions P_1, P_2 of the two cameras on the map, and binarizing the map to obtain road information;
(5.3) using a growth algorithm, obtaining the most reasonable route between P_1 and P_2 in the binarized map, recording the coordinates of each point on the route, and then drawing the route on the original map;
and (5.4) repeating the operations from the step (5.1) to the step (5.3) until the recorded track information of the target to be tracked is read.
According to a second aspect of embodiments of the present application, there is provided a flow adjustment assisting device based on machine vision and deep learning, including:
a target to be tracked acquisition module: acquiring a video and a video frame of which a target in the video is not blocked, carrying out pedestrian detection on the video frame to obtain a plurality of selection frames, selecting the corresponding selection frame as the selection frame of the target to be tracked according to the received user instruction, and storing the image of the target to be tracked in the corresponding selection frame and the corresponding time;
single-lens target tracking module: tracking the target to be tracked forward and backward respectively according to the image of the target to be tracked and the corresponding time saved in the module of the target to be tracked, obtaining all video frames of the target to be tracked appearing in the video, recording the track information of the target to be tracked, and saving the image in the rectangular frame, the coordinate of the rectangular frame and the time of the target to be tracked corresponding to each video frame;
mask detection and risk personnel classification save module: all video frames of the target to be tracked appearing in the video are subjected to pedestrian detection and mask detection, and then classified and stored through a re-identification algorithm;
a cross-shot object re-identification module: after the target to be tracked disappears in the video, searching whether the target to be tracked exists in the video of the corresponding camera in the map adjacency matrix according to the map adjacency matrix representing the geographic position of the camera, so as to obtain the complete track information of the target to be tracked;
a target track drawing module: and drawing the track of the target to be tracked on a map according to the complete track information of the target to be tracked.
According to a third aspect of embodiments of the present application, there is provided an electronic apparatus, including:
one or more processors;
a memory for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to the first aspect.
According to a fourth aspect of embodiments herein, there is provided a computer readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the steps of the method according to the first aspect.
The technical scheme provided by the embodiment of the application can have the following beneficial effects:
According to the above embodiments, the pedestrian re-identification technique is innovatively applied to flow adjustment (contact tracing) work: the trajectory to be traced and the risk-classified close contacts are generated automatically, which greatly reduces manual work, improves flow adjustment accuracy, and shortens flow adjustment time. In addition, the system requires no additional hardware, can be deployed directly on an existing surveillance system, is maintained remotely, and is convenient to update. Overall, the software outperforms existing flow adjustment approaches and has the advantages of strong generality, a wide range of application, and simple, convenient use.
It should be noted that all identification and tracking referred to in this application are carried out with the knowledge and consent of the parties involved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
Fig. 1 is a flow chart illustrating a method for machine vision and deep learning based flow adjustment assistance, according to an exemplary embodiment.
FIG. 2 is a screenshot of a video clip of a target to be tracked, shown in accordance with an exemplary embodiment.
FIG. 3 illustrates a travel path of an object to be tracked, according to an exemplary embodiment.
Fig. 4 is a block diagram illustrating a machine vision and deep learning based flow adjustment assistance device, according to an exemplary embodiment.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. The following description refers to the accompanying drawings in which the same numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the present application.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present application. The word "if" as used herein may be interpreted as "at … …" or "when … …" or "in response to a determination", depending on the context.
Fig. 1 is a flowchart illustrating a method for machine vision and deep learning based flow adjustment assistance according to an exemplary embodiment; as shown in fig. 1, the method may include the following steps:
(1) obtaining a target to be tracked: acquiring a video and a video frame of which a target in the video is not blocked, carrying out pedestrian detection on the video frame to obtain a plurality of selection frames, selecting the corresponding selection frame as the selection frame of the target to be tracked according to the received user instruction, and storing the image of the target to be tracked in the corresponding selection frame and the corresponding time;
(2) single-lens target tracking: tracking the target to be tracked forwards and backwards respectively according to the image of the target to be tracked and the corresponding time saved in the step (1) to obtain all video frames of the target to be tracked appearing in the video, recording track information of the target to be tracked, and saving the image in the rectangular frame, the coordinate of the rectangular frame and the time of the target to be tracked corresponding to each video frame;
(3) mask detection and classified preservation of risk personnel: all video frames of the target to be tracked appearing in the video are subjected to pedestrian detection and mask detection, and then classified storage is carried out through a re-recognition algorithm;
(4) cross-shot object re-recognition: after the target to be tracked disappears in the video, searching whether the target to be tracked exists in the video of the corresponding camera in the map adjacency matrix according to the map adjacency matrix representing the geographic position of the camera, so as to obtain the complete track information of the target to be tracked;
(5) drawing a target track: and drawing the track of the target to be tracked on a map according to the complete track information of the target to be tracked.
According to this embodiment, the pedestrian re-identification technique is innovatively applied to flow adjustment tracing work: the trajectory and the close contacts are generated automatically, which greatly reduces manual work, improves flow adjustment accuracy, and shortens flow adjustment time. In addition, the system requires no additional hardware, can be deployed directly on an existing surveillance system, is maintained remotely, and is convenient to update. Overall, the software outperforms existing flow adjustment approaches and has the advantages of strong generality, a wide range of application, and simple, convenient use.
In the specific implementation of step (1), the following sub-steps can be included:
(1.1) after a video frame in which a target to be tracked is not blocked in a video is obtained, pedestrian recognition is carried out on the video frame by using a first target detection algorithm, and frame marking is carried out on a recognition result in the video frame to generate a plurality of selection frames for a user to select the target to be tracked;
Specifically, the user retrieves all surveillance-camera videos of the time period and road sections the target to be tracked may have passed through; after suitable ordering, the videos are stored in a designated folder. The user selects and opens a video in the system, the system plays it automatically, and the user watches it; when the target to be tracked is observed appearing clearly and unoccluded in the video picture, the user takes a screenshot. The system grabs the image of the current video frame and stores it in the corresponding folder, with the camera number and the time T_s of the image as the file name. Pedestrian recognition is then performed on the grabbed video frame with a trained Yolov5 network, obtaining k rectangular frames R_Ts1, R_Ts2, …, R_Tsk (k > 1); the rectangular frames are drawn on the original image, and an image with several clickable selection frames is generated and displayed on the system interface through the Qt library.
It should be noted that the first target detection algorithm in this embodiment is Yolov5, and in a specific implementation, the method that can be adopted includes R-CNN, Fast R-CNN, SSD, YOLO, etc., which are conventional settings in the art and will not be described herein.
(1.2) selecting a corresponding selection frame as a selection frame of the target to be tracked according to the received user instruction, and storing the image of the target to be tracked in the corresponding selection frame and corresponding time;
Specifically, the user clicks with the mouse to select the rectangular frame R_w corresponding to the target to be tracked; the image inside R_w is saved into the multi-query data set with the naming rule "video name _ current time". A record database is created at the same time; its format is one line of text per record, of the form "video name, current time, rectangular-frame upper-left abscissa x1, upper-left ordinate y1, lower-right abscissa x2, lower-right ordinate y2", with fields separated by spaces. Neither the naming rule of the R_w image nor the text form of the record database is unique; any form that captures the source video and the corresponding time may be used.
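As a minimal Python sketch of this record format, a record line can be written and read back as below; the space-separated field layout follows the description above, while the function names and file handling are illustrative assumptions, not part of the patent:

```python
# Sketch of the record database described above. Assumed line layout:
# "video_name time x1 y1 x2 y2" (space-separated); names are illustrative.

def write_record(path, video_name, time_s, x1, y1, x2, y2):
    """Append one trajectory record as a single line of text."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(f"{video_name} {time_s} {x1} {y1} {x2} {y2}\n")

def read_records(path):
    """Parse the record database back into (video, time, box) tuples."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            video_name, t, x1, y1, x2, y2 = line.split()
            records.append((video_name, float(t),
                            int(x1), int(y1), int(x2), int(y2)))
    return records
```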
In the specific implementation of step (2), the following sub-steps can be included:
(2.1) obtaining the image at time T_s in the video from the database and performing pedestrian recognition on it to obtain k rectangular frames R_Ts1, R_Ts2, …, R_Tsk (k > 1); matching each rectangular frame on the image against the selection frame of the target to be tracked by coincidence degree to obtain the rectangular frame R_w1 corresponding to the target to be tracked, and recording the relevant coordinate information;
Specifically, the same Yolov5 network is used to perform pedestrian recognition on the video frame at time T_s, obtaining k rectangular frames R_Ts1, R_Ts2, …, R_Tsk (k > 1); the rectangular frame R_w saved in step (1) is then matched by coincidence degree against these k rectangular frames, yielding the specific rectangular frame R_w1 corresponding to the target person to be tracked, whose coordinate information is recorded. The coincidence degree is computed as:
coincidence(R_w, R_Tsn) = S(R_w ∩ R_Tsn) / min(S(R_w), S(R_Tsn))
where n ∈ [1, k], R_w ∩ R_Tsn is the intersection of the two rectangular frames R_w and R_Tsn, S(R_w) is the area of rectangular frame R_w, and min(S(R_w), S(R_Tsn)) is the smaller of the two rectangles' areas. The rectangular frame among the R_Tsn with the highest coincidence degree is taken as R_w1.
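A minimal Python sketch of this coincidence-degree matching, assuming boxes in (x1, y1, x2, y2) form as in the record database; the function names are illustrative:

```python
def coincidence(box_a, box_b):
    """Coincidence degree from the formula above: intersection area
    divided by the smaller of the two box areas (boxes are assumed
    valid, i.e., with positive area)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    iw = max(0, min(ax2, bx2) - max(ax1, bx1))  # intersection width
    ih = max(0, min(ay2, by2) - max(ay1, by1))  # intersection height
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    return (iw * ih) / min(area_a, area_b)

def match_target(selection_box, candidate_boxes):
    """Return the candidate frame R_w1 with the highest coincidence
    degree against the user's selection frame R_w."""
    return max(candidate_boxes, key=lambda b: coincidence(selection_box, b))
```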
(2.2) loading video frames forward starting from T_s and performing pedestrian detection on the image of each frame with the first target detection algorithm, obtaining for frame T_n the i rectangular frames R_Tn1, R_Tn2, …, R_Tni (i > 1); meanwhile, tracking the target to be tracked with a multi-target tracking algorithm until the rectangular frame of the target to be tracked disappears from the video, recording the forward track information and the disappearance time T_f of the target to be tracked in the video, and saving each video frame together with the corresponding image inside the target's rectangular frame, the rectangular-frame coordinates, and the time;
Specifically, video frames are loaded forward starting from T_s and pedestrian detection is performed with the Yolov5 network, yielding for frame T_n the k rectangular frames R_Tn1, R_Tn2, …, R_Tnk (k > 1); the DeepSort algorithm then uses motion information to associate the pedestrian detection frames across frames with the same pedestrian (i.e., tracking). The principle of the algorithm is as follows:
An object model describes the representation used to propagate a target's identity to the next frame. The inter-frame displacement is approximated with a linear constant-velocity model that is independent of other objects and of camera motion. The state model for each target is:
x = (u, v, r, h, u̇, v̇, ṙ, ḣ)^T
where u and v are the horizontal and vertical coordinates of the target's center, and r and h are the aspect ratio and height of the target's bounding box (the aspect ratio is treated as approximately constant); the last four quantities are the corresponding velocities used to predict the next frame. This 8-dimensional state is used with a standard Kalman filter with constant-velocity motion and a linear observation model, taking (u, v, r, h) as the direct observation of the object state. Each track keeps a counter a that records the time since its last successful match; when a exceeds a preset threshold A_max, the track is considered terminated, i.e., tracks unmatched for too long are ended. Conversely, a detection that cannot be matched to any track is considered to possibly start a new track.
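For concreteness, a sketch of the constant-velocity Kalman predict/update cycle over this 8-dimensional state; the transition and observation matrices follow the standard constant-velocity construction implied by the text, while the noise covariances Q and R are left to the caller as assumptions:

```python
import numpy as np

# State x = (u, v, r, h, du, dv, dr, dh): box center, aspect ratio,
# height, and their velocities, as described above.
dt = 1.0
F = np.eye(8)                                  # constant-velocity transition
F[:4, 4:] = dt * np.eye(4)                     # position += velocity * dt
H = np.hstack([np.eye(4), np.zeros((4, 4))])   # observe (u, v, r, h) only

def kalman_predict(x, P, Q):
    """Propagate the state mean and covariance one frame ahead."""
    return F @ x, F @ P @ F.T + Q

def kalman_update(x, P, z, R):
    """Correct the prediction with a detection z = (u, v, r, h).
    Also returns the innovation covariance S, since the Mahalanobis
    gate below reuses it."""
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)             # Kalman gain
    x = x + K @ (z - H @ x)
    P = (np.eye(8) - K @ H) @ P
    return x, P, S
```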
In DeepSort, the agreement between the predicted Kalman state and a newly arrived detection is evaluated with the Mahalanobis distance:
d^(1)(i, j) = (d_j − y_i)^T S_i^(−1) (d_j − y_i)
which measures the motion match between the j-th detection and the i-th track, where S_i is the covariance matrix of the track's observation space at the current time as predicted by the Kalman filter, y_i is the track's predicted observation at the current time, and d_j is the state (u, v, r, h) of the j-th detection. Exploiting motion continuity, detections can be gated by this Mahalanobis distance; with a threshold t^(1), we can define the indicator function
b^(1)_(i,j) = 1[d^(1)(i, j) ≤ t^(1)]
When the uncertainty of the target motion is low, the Mahalanobis distance is a good association metric; in practice, however, when the camera moves, many correct associations fail under this metric and it becomes invalid, so a second metric is integrated: for each detection box d_j an appearance descriptor r_j with ‖r_j‖ = 1 is computed, and a gallery is kept that stores the latest L_k descriptors of each track, i.e.
R_i = {r_k^(i)}, k = 1, …, L_k
Then the minimum cosine distance between the i-th track and the j-th detection is used as the second metric:
d^(2)(i, j) = min{1 − r_j^T r_k^(i) : r_k^(i) ∈ R_i}
which can likewise be expressed through an indicator function with threshold t^(2):
b^(2)_(i,j) = 1[d^(2)(i, j) ≤ t^(2)]
The two metrics are then fused as:
c_(i,j) = λ d^(1)(i, j) + (1 − λ) d^(2)(i, j)
b_(i,j) = b^(1)_(i,j) · b^(2)_(i,j)
so that an association is admissible only when it lies within both gating regions. In summary, the motion (distance) metric works well for short-term prediction and matching, while the appearance information is more effective for matching tracks that have been missing for a long time.
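A sketch of the two metrics and their fusion as defined above; the gating thresholds are assumptions (t^(1) is commonly taken as the 0.95 quantile of the chi-square distribution with 4 degrees of freedom):

```python
import numpy as np

def motion_distance(d_j, y_i, S_i):
    """d^(1)(i,j): squared Mahalanobis distance between detection state
    d_j = (u, v, r, h) and track i's predicted observation y_i, using
    the innovation covariance S_i from the Kalman filter."""
    e = d_j - y_i
    return float(e @ np.linalg.inv(S_i) @ e)

def appearance_distance(r_j, gallery_i):
    """d^(2)(i,j): minimum cosine distance between the detection's unit
    descriptor r_j and track i's gallery of stored descriptors."""
    return float(min(1.0 - r_j @ r_k for r_k in gallery_i))

def fused_cost(d1, d2, lam=0.5, t1=9.4877, t2=0.2):
    """c_{i,j} = lam*d1 + (1-lam)*d2, admissible only when both gates
    pass (b_{i,j} = b1 * b2). Threshold values are assumptions."""
    admissible = (d1 <= t1) and (d2 <= t2)
    return lam * d1 + (1.0 - lam) * d2, admissible
```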
Tracking continues until the rectangular frame of the target to be tracked disappears from this camera's video, and the disappearance time T_f is recorded, i.e., the time at which the target to be tracked last appears in the current video segment.
And creating an increment directory, and storing pictures of the target to be tracked in the rectangular frame of each video clip. The naming convention is "video name _ current time _ upper left abscissa x1_ upper left ordinate y1_ lower right abscissa x2_ lower right ordinate y 2". It should be noted that the naming rule is not unique, and may be only the coordinates of the rectangular frame and the current time.
It should be noted that the multi-target tracking algorithm in this embodiment adopts deepSORT, but in specific implementation, SORT, JDE algorithm, FAIRMOT algorithm, and the like may also be adopted, and the setting is a conventional setting in the art and is not described herein again.
(2.3) loading video frames backward starting from T_s and performing pedestrian detection on the image of each frame with the first target detection algorithm, obtaining for frame T_m the j rectangular frames R_Tm1, R_Tm2, …, R_Tmj (j > 1); meanwhile, tracking the target to be tracked with a multi-target tracking algorithm until the rectangular frame of the target to be tracked disappears from the video, recording the backward track information and the disappearance time T_l of the target to be tracked in the video, and saving each video frame together with the corresponding image inside the target's rectangular frame, the rectangular-frame coordinates, and the time;
specifically, step (2.3) and step (2.2) are the same, and are not described herein again.
Preferably, step (2) may further comprise the sub-step (2.4):
(2.4) for the images inside the rectangular frame of the target to be tracked saved in steps (2.2) and (2.3), deleting those whose time is less than T_f + 1 or greater than T_l − 1, to remove images that may be incomplete when the target to be tracked appears and disappears;
Specifically, the image file names are read to obtain the time at which each image was captured, and images whose time is less than T_f + 1 or greater than T_l − 1 are deleted, removing images that may be incomplete as pedestrians appear and disappear.
In the specific implementation of step (3), the following sub-steps can be included:
(3.1) detecting whether the corresponding pedestrian wears a mask or not by utilizing a second target detection algorithm to the pictures in the rectangular frame except the rectangular frame of the target to be tracked in all the video frames of the target to be tracked appearing in the video, and further carrying out risk classification;
Specifically, the pictures inside the rectangular frames other than the target's rectangular frame from step (2) are input into the SSD network. Each picture is first resized to the SSD input size of 360 × 360 pixels; the network identifies several candidate rectangular frames of masked or unmasked faces, and invalid face frames are then removed with a non-maximum suppression algorithm (NMS) based on the frames' overlap ratio (computed with the coincidence-degree formula above) and the SSD classification scores, yielding whether the pedestrian in the input picture wears a mask. Rectangular frames with different labels are then drawn on the video frame accordingly, i.e., risk classification: the frame labeled "0" marks the target to be tracked, and pedestrians not wearing a mask and pedestrians wearing a mask (labeled "normal") are marked with distinct labels.
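A minimal sketch of the greedy non-maximum suppression step described here, reusing the coincidence() overlap ratio sketched in step (2.1); the overlap threshold is an assumption:

```python
def nms(boxes, scores, overlap_thresh=0.5):
    """Greedy NMS: keep the highest-scoring face box, drop the boxes
    that overlap it beyond the threshold, and repeat on the rest."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order
                 if coincidence(boxes[best], boxes[i]) <= overlap_thresh]
    return [boxes[i] for i in keep]
```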
It should be noted that, in this embodiment, the SSD algorithm is used as the second target detection algorithm, and in specific implementation, R-CNN, Fast R-CNN, Faster R-CNN, SSD, YOLO, etc. may also be used, which are set as conventional settings in the art and are not described herein again.
(3.2) respectively inputting the pictures in the rectangular frame except the target rectangular frame to be tracked into a re-identification network, judging whether the corresponding pedestrians are identified, if so, storing the pictures in the rectangular frame and the corresponding pedestrian numbers, time and rectangular frame coordinates into a corresponding data set, otherwise, creating a new data set and storing the pictures in the rectangular frame and the corresponding pedestrian numbers, time and rectangular frame coordinates into the new data set;
specifically, the pedestrian closely contacted with the target to be tracked is classified and saved by using the re-identification network: and creating an increment catalogue, and storing pedestrians in close contact with the target to be tracked. If the step is executed for the first time, the pictures in the rectangular frame except the target to be tracked are sequentially stored in the contact data set to create a new sub data set. If the step is not executed for the first time, the pictures in the rectangular frame except the target to be tracked and all the pictures in the contact data set are input into a re-identification network based on ResNet50, and the network principle is as follows:
A 64 × 128 RGB picture is input; an average pooling layer is added after the last layer of the ResNet50 network, pooling the 2048-channel feature-map output into a 2048-dimensional vector, which is then fed into a fully connected layer to become the 512-dimensional vector used as the network output. For two output pictures, whether they show the same person is judged by whether the Euclidean distance between the corresponding 512-dimensional vectors is smaller than a threshold. The Euclidean distance is computed as:
d(a, b) = sqrt( Σ_(i=1..512) (a_i − b_i)² )
If the picture in a rectangular frame belongs to the same pedestrian as a picture in the contact data set (say, a picture in the n-th sub data set), the picture is stored into that n-th sub data set; otherwise, a new sub data set is created in the contact data set and the picture is stored there. When saving pictures in this step, the naming rule is "pedestrian number _ video name _ current time _ upper-left abscissa x1 _ upper-left ordinate y1 _ lower-right abscissa x2 _ lower-right ordinate y2". Note that the naming rule is not unique; it may contain only the pedestrian number, the video name, the rectangular-frame coordinates, and the current time.
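A PyTorch-style sketch of the described re-identification backbone: ResNet50 features, average pooling of the 2048-channel output, and a fully connected layer down to 512 dimensions, with a Euclidean-distance identity decision. The layer sizes follow the text; the distance threshold and the untrained weights are assumptions:

```python
import torch
import torch.nn as nn
import torchvision

class ReIDNet(nn.Module):
    """ResNet50 backbone -> global average pooling -> 512-d embedding,
    for 64x128 RGB pedestrian crops as described in the text."""
    def __init__(self, embed_dim=512):
        super().__init__()
        backbone = torchvision.models.resnet50(weights=None)
        self.features = nn.Sequential(*list(backbone.children())[:-2])
        self.pool = nn.AdaptiveAvgPool2d(1)   # 2048xHxW -> 2048x1x1
        self.fc = nn.Linear(2048, embed_dim)  # 2048 -> 512

    def forward(self, x):                     # x: (N, 3, 128, 64)
        f = self.pool(self.features(x)).flatten(1)
        return self.fc(f)

def same_person(net, img_a, img_b, threshold=1.0):
    """Same-identity test via the Euclidean distance between embeddings;
    the threshold value is an illustrative assumption."""
    with torch.no_grad():
        return torch.dist(net(img_a), net(img_b)).item() < threshold
```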
In the specific implementation of step (4), the following sub-steps can be included:
(4.1) according to the map adjacency matrix, directly acquiring the x cameras (x > 1) geographically adjacent to the camera corresponding to the video; loading in sequence, for these x cameras C_1, C_2, …, C_x, the video segments from (T_f + T_a) to (T_f + T_a + ΔT), where T_a is a simplified search duration obtained from the map adjacency matrix and ΔT is the search duration, and saving each video segment and its corresponding start-stop time;
Specifically, the k cameras (k > 1) geographically adjacent to the camera of step (2) are obtained from the map adjacency matrix built from the cameras' geographic position information; for these k cameras C_1, C_2, …, C_k, the video segments from (T_f + T_a) to (T_f + T_a + ΔT) are loaded in sequence (ΔT is the search duration), and each video segment is saved under the name "video name_(T_f + T_a)_(T_f + T_a + ΔT)". T_a is the simplified search duration obtained from the map adjacency matrix: with A_ij denoting the entry in row i, column j of the adjacency matrix A, A_ij = 0 means cameras C_i and C_j are not geographically adjacent, while A_ij > 0 means they are adjacent and A_ij is a conservative estimate of the minimum time a pedestrian needs to walk from camera C_i to camera C_j (reflecting the geographic distance between the cameras).
The map adjacency matrix may be created as follows: according to the camera positions and the road information in a map, first judge, for any two cameras, whether they are connected in space by a road that passes through no other camera; if so, the two cameras are adjacent, otherwise they are not. Second, obtain the actual distance between each pair of adjacent cameras from the map. The map adjacency matrix is then created from these two pieces of information: when two cameras are adjacent, the travel-time estimate derived from their distance is recorded; when they are not adjacent, 0 is recorded.
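A sketch of building and querying such a map adjacency matrix; turning road distances into conservative minimum travel times with a fixed walking speed is an illustrative assumption:

```python
import numpy as np

def build_adjacency(distances_m, walk_speed_mps=2.0):
    """distances_m[i][j]: road distance in meters between adjacent
    cameras, 0 where not adjacent. A_ij becomes a conservative minimum
    walking time in seconds (the speed constant is an assumption)."""
    d = np.asarray(distances_m, dtype=float)
    return np.where(d > 0, d / walk_speed_mps, 0.0)

def adjacent_cameras(A, cam):
    """Cameras geographically adjacent to `cam`, with their T_a values."""
    return [(j, A[cam, j]) for j in range(A.shape[0]) if A[cam, j] > 0]

def search_windows(A, cam, T_f, delta_T):
    """The (T_f + T_a, T_f + T_a + delta_T) window of step (4.1) for
    each adjacent camera."""
    return {j: (T_f + T_a, T_f + T_a + delta_T)
            for j, T_a in adjacent_cameras(A, cam)}
```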
(4.2) carrying out pedestrian detection on the video frame in each video clip by using a first target detection algorithm, adjusting the size of the picture in the obtained rectangular frame to be consistent with the input size of the re-identification network in the step (3), adjusting the average brightness of the picture to be at the same average brightness level as the video frame in the step (1), and storing the picture, the corresponding time and the corresponding coordinates;
Specifically, pedestrian detection is performed with Yolov5 once every 10 frames; the picture inside each obtained rectangular frame is resized to 64 × 128, its average brightness is adjusted to the same average brightness level as the pictures of the target to be tracked in the multi-query data set created in step (1), and a gallery data set is created to store the pictures. The naming rule for pictures saved in this step is "video name _ current time _ upper-left abscissa x1 _ upper-left ordinate y1 _ lower-right abscissa x2 _ lower-right ordinate y2 of the rectangular frame". Note that the naming rule is not unique; it need only capture the video name, the rectangular-frame coordinates, and the current time. Note also that in this step, "performing pedestrian detection with the first target detection algorithm on the video frames in each video segment" means that, within each video segment, pedestrian detection is run once every predetermined number of frames; in this embodiment the interval is set to 10 frames and can be adjusted to the actual situation.
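A sketch of this gallery-image preprocessing with OpenCV: resize the crop to the 64 × 128 re-identification input size and shift its average brightness toward the reference level. Measuring brightness as the mean HSV V channel is an assumption; the text does not fix the measure:

```python
import cv2
import numpy as np

def preprocess_crop(crop_bgr, ref_mean_brightness):
    """Resize a pedestrian crop to 64x128 (width x height) and match its
    average brightness to that of the reference video frames."""
    img = cv2.resize(crop_bgr, (64, 128))
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV).astype(np.float32)
    shift = ref_mean_brightness - hsv[:, :, 2].mean()
    hsv[:, :, 2] = np.clip(hsv[:, :, 2] + shift, 0, 255)
    return cv2.cvtColor(hsv.astype(np.uint8), cv2.COLOR_HSV2BGR)
```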
(4.3) inputting the picture stored in the step (4.2) and the image in the rectangular frame of the target to be tracked and stored in the step (2) into the re-identification network, and judging whether the target to be tracked exists in the picture;
Specifically, the comparison here is between a series of pictures of the target to be tracked and a single picture input to the network; in this embodiment, the per-dimension average of the 512-dimensional vectors obtained from the series of pictures is computed to obtain a single 512-dimensional vector, and the Euclidean distance between this vector and the other picture's vector is then calculated.
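A short sketch of this multi-query comparison, reusing the ReIDNet sketch above; tensor shapes and helper names are assumptions:

```python
import torch

def multi_query_distance(net, query_imgs, gallery_img):
    """Average the 512-d embeddings of all query pictures of the target
    dimension-wise, then return the Euclidean distance between that mean
    vector and the gallery picture's embedding."""
    with torch.no_grad():
        q = torch.stack([net(img).squeeze(0) for img in query_imgs]).mean(0)
        g = net(gallery_img).squeeze(0)
    return torch.dist(q, g).item()
```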
(4.4) if yes, returning to the step (2); if not, expanding the search time range, returning to the step (4.1) to search again, and if the search is continuously failed, continuously expanding the search time range until the search is successful or all video segments are searched, so as to obtain the complete track information of the target to be tracked;
Specifically, if yes, the target to be tracked has been successfully re-identified and the flow returns to step (2); if not, the search time is extended, adjusting the search interval to (T_f + T_a + ΔT) to (T_f + T_a + ΔT + ΔT_1), and the flow returns to step (4.1) to search again. If the search keeps failing, the search time keeps being extended until the search succeeds or all video segments have been searched, yielding the complete track information of the target to be tracked, where ΔT_1 is the search-extension duration.
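A sketch of the expanding-window search loop of step (4.4), reusing adjacent_cameras() from above; load_segment and contains_target are hypothetical placeholders standing in for steps (4.1)-(4.3), and the round limit is an assumption:

```python
def cross_camera_search(A, cam, T_f, delta_T, delta_T1, max_rounds=10):
    """First round searches (T_f+T_a, T_f+T_a+delta_T) on each adjacent
    camera; every failed round appends a further delta_T1 of video,
    until the target is found or the rounds are exhausted."""
    offset, length = 0.0, delta_T
    for _ in range(max_rounds):
        for j, T_a in adjacent_cameras(A, cam):
            t0 = T_f + T_a + offset
            segment = load_segment(j, t0, t0 + length)  # step (4.1), placeholder
            if contains_target(segment):                # steps (4.2)-(4.3), placeholder
                return j, t0                            # resume at step (2)
        offset, length = offset + length, delta_T1      # extend the window
    return None                                         # all segments searched
```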
In the specific implementation of step (5), the following sub-steps can be included:
(5.1) sequentially reading the complete track information of the target to be tracked to obtain two successively visited cameras C_1, C_2;
Specifically, two adjacent rows of the record database are read in sequence, where the first space-separated field of each row is the camera name.
(5.2) using the annotated map, acquiring the coordinate positions P_1, P_2 of the two cameras on the map, and binarizing the map to obtain road information;
Specifically, using the annotated map, the system reads the coordinate positions P_1, P_2 of the two cameras on the map; P_1 and P_2 both lie on their corresponding roads. The map is then binarized so that road pixels take value 1 and non-road pixels take value 0, yielding the road information.
(5.3) using a growth algorithm, obtaining the most reasonable route between P_1 and P_2 in the binarized map, recording the coordinates of each point on the route, and then drawing the route on the original map;
Specifically, a region-growing algorithm is used to obtain the most reasonable route between P_1 and P_2 in the binarized map; the coordinates of each point on the route are recorded, and the route is drawn on the original map. The formula of the region-growing algorithm is:
Grow(P_(x,y)) = InRoad(P_(x,y+1), P_(x,y−1), P_(x+1,y), P_(x−1,y))
where x and y are the coordinates of point P; the InRoad function uses the map to judge, for the points above, below, left, and right of P_(x,y), whether each lies on the road, keeping those that do and adding them to the growth set. The Grow function is applied iteratively to expand the growth set until it contains P_2.
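A sketch of this region growing as a 4-neighbour breadth-first search over the binarized road map; keeping parent links to recover the route is an added assumption, since the text only specifies growing the set until it contains P_2:

```python
from collections import deque

def grow_route(road, p1, p2):
    """Grow a set of road pixels outward from p1 via the 4-neighbour
    InRoad test (road[y][x] == 1) until p2 is absorbed; parent links
    then recover one route between the two camera positions."""
    parent = {p1: None}
    frontier = deque([p1])
    while frontier:
        x, y = frontier.popleft()
        if (x, y) == p2:                    # growth set now contains P2
            route, node = [], (x, y)
            while node is not None:         # walk parents back to P1
                route.append(node)
                node = parent[node]
            return route[::-1]
        for nx, ny in ((x, y + 1), (x, y - 1), (x + 1, y), (x - 1, y)):
            if (0 <= ny < len(road) and 0 <= nx < len(road[0])
                    and (nx, ny) not in parent and road[ny][nx] == 1):
                parent[(nx, ny)] = (x, y)   # InRoad: keep road points
                frontier.append((nx, ny))
    return None                             # p2 unreachable on the road grid
```

Breadth-first growth yields a shortest 4-connected road path, which is one reasonable reading of "the most reasonable route" in the text.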
And (5.4) repeating the operations from the step (5.1) to the step (5.3) until the recorded track information of the target to be tracked is read.
Examples of the embodiments
The experimental results shown in the drawings were obtained by implementing an embodiment of the present invention on a machine equipped with an Intel Core i7-9750H CPU, an NVidia GTX1650 graphics processor, and 16 GB of memory. The system automatically draws the map trajectory of the person to be tracked, extracts the video clips in which the person appears in each camera's video, performs mask-detection risk classification of the other pedestrians, and saves the pictures of pedestrians in close contact with the tracked target.
As shown in fig. 2, the system detects the mask by using the SSD algorithm, classifies the risk of the pedestrian in close contact with the target to be tracked, and stores the classified contacts in the contact database.
As shown in fig. 3, the system automatically generates the travel track of the target to be tracked, which greatly reduces manual work and improves flow adjustment accuracy with high efficiency: processing video totalling 156 minutes takes about 60 minutes, shortening the flow adjustment time.
Corresponding to the foregoing embodiments of the flow adjustment assistance method based on machine vision and deep learning, the present application also provides embodiments of a flow adjustment assistance device based on machine vision and deep learning.
Fig. 4 is a block diagram illustrating a machine vision and deep learning based flow adjustment assistance device, according to an exemplary embodiment. Referring to fig. 4, the apparatus may include:
target to be tracked acquisition module 21: acquiring a video and a video frame of which a target in the video is not blocked, carrying out pedestrian detection on the video frame to obtain a plurality of selection frames, selecting the corresponding selection frame as the selection frame of the target to be tracked according to the received user instruction, and storing the image of the target to be tracked in the corresponding selection frame and the corresponding time;
single-lens target tracking module 22: tracking the target to be tracked forward and backward respectively according to the image of the target to be tracked and the corresponding time saved in the module of the target to be tracked, obtaining all video frames of the target to be tracked appearing in the video, recording the track information of the target to be tracked, and saving the image in the rectangular frame, the coordinate of the rectangular frame and the time of the target to be tracked corresponding to each video frame;
mask detection and risk personnel classification save module 23: all video frames of the target to be tracked appearing in the video are subjected to pedestrian detection and mask detection, and then classified storage is carried out through a re-recognition algorithm;
the cross-shot object re-recognition module 24: after the target to be tracked disappears in the video, searching whether the target to be tracked exists in the video of the corresponding camera in the map adjacency matrix according to the map adjacency matrix representing the geographic position of the camera, so as to obtain the complete track information of the target to be tracked;
the target trajectory drawing module 25: and drawing the track of the target to be tracked on a map according to the complete track information of the target to be tracked.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
For the device embodiment, since it basically corresponds to the method embodiment, reference may be made to the partial description of the method embodiment for relevant points. The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the scheme of the application. One of ordinary skill in the art can understand and implement it without inventive effort.
Correspondingly, the present application also provides an electronic device, comprising: one or more processors; a memory for storing one or more programs; when executed by the one or more processors, cause the one or more processors to implement a machine vision and deep learning based flow modulation assistance method as described above.
Accordingly, the present application also provides a computer readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the machine vision and deep learning based cutback assistance method as described above.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof.

Claims (10)

1. A method for assisting flow adjustment based on machine vision and deep learning is characterized by comprising the following steps:
(1) obtaining a target to be tracked: acquiring a video and a video frame of which a target in the video is not blocked, carrying out pedestrian detection on the video frame to obtain a plurality of selection frames, selecting the corresponding selection frame as the selection frame of the target to be tracked according to the received user instruction, and storing the image of the target to be tracked in the corresponding selection frame and the corresponding time;
(2) single-lens target tracking: tracking the target to be tracked forward and backward respectively according to the image of the target to be tracked and the corresponding time saved in the step (1) to obtain all video frames of the target to be tracked appearing in the video, recording the track information of the target to be tracked, and saving each video frame and the corresponding image, rectangular frame coordinates and time in the rectangular frame of the target to be tracked;
(3) mask detection and classified preservation of risk personnel: all video frames of the target to be tracked appearing in the video are subjected to pedestrian detection and mask detection, and then classified storage is carried out through a re-recognition algorithm;
(4) cross-shot object re-recognition: after the target to be tracked disappears in the video, searching whether the target to be tracked exists in the video of the corresponding camera in the map adjacent matrix or not according to the map adjacent matrix representing the geographic position of the camera, so as to obtain complete track information of the target to be tracked;
(5) drawing a target track: and drawing the track of the target to be tracked on a map according to the complete track information of the target to be tracked.
2. The method according to claim 1, wherein the step (1) comprises the sub-steps of:
(1.1) after a video frame in which a target to be tracked is not blocked in a video is obtained, pedestrian recognition is carried out on the video frame by using a first target detection algorithm, and frame marking is carried out on a recognition result in the video frame to generate a plurality of selection frames for a user to select the target to be tracked;
and (1.2) selecting the corresponding selection frame as the selection frame of the target to be tracked according to the received user instruction, and storing the image of the target to be tracked in the corresponding selection frame and the corresponding time.
3. The method according to claim 1, wherein the step (2) comprises the sub-steps of:
(2.1) obtaining from the database the image at time T_s in the video, performing pedestrian recognition on it to obtain k rectangular frames R_Ts1, R_Ts2, …, R_Tsk (k > 1), matching each rectangular frame on the image against the selection frame of the target to be tracked by coincidence degree to obtain the rectangular frame R_w1 corresponding to the target to be tracked, and recording its coordinate information;
(2.2) starting from T_s, loading video frames forward and performing pedestrian detection on the image of each frame T_n with the first target detection algorithm to obtain i rectangular frames R_Tn1, R_Tn2, …, R_Tni (i > 1); meanwhile, tracking the target to be tracked with a multi-target tracking algorithm until the rectangular frame of the target to be tracked disappears from the video, recording the forward track information and the disappearance time T_f of the target to be tracked in the video, and saving each video frame together with the corresponding image within the target's rectangular frame, the rectangular frame coordinates, and the time;
(2.3) starting from T_s, loading video frames backward and performing pedestrian detection on the image of each frame T_m with the first target detection algorithm to obtain j rectangular frames R_Tm1, R_Tm2, …, R_Tmj (j > 1); meanwhile, tracking the target to be tracked with a multi-target tracking algorithm until the rectangular frame of the target to be tracked disappears from the video, recording the backward track information and the disappearance time T_l of the target to be tracked in the video, and saving each video frame together with the corresponding image within the target's rectangular frame, the rectangular frame coordinates, and the time.
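
Sub-step (2.1) matches the saved selection frame against the k detected rectangles by "coincidence degree". In practice this is commonly intersection-over-union (IoU); treating the two as the same metric is an assumption here, as is the 0.5 threshold.

    from typing import List, Optional, Tuple

    Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2)

    def iou(a: Box, b: Box) -> float:
        """Intersection-over-union of two axis-aligned rectangles."""
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        union = ((a[2] - a[0]) * (a[3] - a[1])
                 + (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / union if union > 0 else 0.0

    def match_target(selection: Box, detections: List[Box],
                     thresh: float = 0.5) -> Optional[Box]:
        """Pick the rectangle R_w1 that best coincides with the selection frame."""
        best = max(detections, key=lambda d: iou(selection, d), default=None)
        if best is not None and iou(selection, best) >= thresh:
            return best
        return None
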
4. The method according to claim 3, wherein the step (2) further comprises the sub-steps of:
(2.4) among the images within the target's rectangular frame saved in step (2.2) and step (2.3), deleting those whose time is less than T_f + 1 or greater than T_l − 1, so as to remove pictures that may be incomplete at the moments when the target to be tracked appears and disappears.
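
Sub-step (2.4) is a plain time-window filter over the saved crops. A one-function sketch, assuming records carry a `time` attribute (e.g. the TrackRecord sketch after claim 1) and that T_f is the earlier boundary and T_l the later one:

    def trim_edge_frames(records, t_f: float, t_l: float):
        """Keep only records timestamped within [t_f + 1, t_l - 1], discarding
        frames near the target's appearance and disappearance, which may show
        the target only partially."""
        return [r for r in records if t_f + 1 <= r.time <= t_l - 1]
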
5. The method according to claim 1, characterized in that said step (3) comprises the sub-steps of:
(3.1) for the pictures in the rectangular frames other than the rectangular frame of the target to be tracked, in all video frames in which the target to be tracked appears, detecting with a second target detection algorithm whether the corresponding pedestrian wears a mask, and performing risk classification accordingly;
(3.2) inputting each picture in the rectangular frames other than the target's rectangular frame into a re-identification network and judging whether the corresponding pedestrian has already been identified; if so, saving the picture in the rectangular frame and the corresponding pedestrian number, time, and rectangular frame coordinates into the corresponding data set; otherwise, creating a new data set and saving the picture in the rectangular frame and the corresponding pedestrian number, time, and rectangular frame coordinates into the new data set.
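
The gallery logic of sub-step (3.2) — match a crop against existing pedestrian data sets, otherwise open a new one — can be sketched with cosine similarity over feature vectors. The re-identification network that produces the embeddings is out of scope here, and the 0.7 threshold is an assumption:

    from typing import Dict, List

    import numpy as np

    def classify_crop(embedding: np.ndarray,
                      galleries: Dict[int, List[np.ndarray]],
                      thresh: float = 0.7) -> int:
        """Assign a crop to an existing pedestrian number or create a new one.

        `embedding` is the re-identification network's feature vector for the
        crop; `galleries` maps pedestrian number -> stored embeddings.
        """
        e = embedding / np.linalg.norm(embedding)
        best_id, best_sim = None, -1.0
        for pid, vectors in galleries.items():
            for v in vectors:
                sim = float(np.dot(e, v / np.linalg.norm(v)))
                if sim > best_sim:
                    best_id, best_sim = pid, sim
        if best_id is not None and best_sim >= thresh:
            galleries[best_id].append(embedding)   # known pedestrian
            return best_id
        new_id = max(galleries, default=0) + 1     # open a new data set
        galleries[new_id] = [embedding]
        return new_id
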
6. The method according to claim 1, characterized in that said step (4) comprises the sub-steps of:
(4.1) according to the map adjacency matrix, directly acquiring the x cameras (x > 1) geographically adjacent to the camera corresponding to the video, loading in sequence, for each of the x cameras C_1, C_2, …, C_x, the video segment from (T_f + T_a) to (T_f + T_a + ΔT), where T_a is a duration obtained from the map adjacency matrix to simplify the search and ΔT is the search duration, and saving each video segment together with the corresponding start and stop times;
(4.2) performing pedestrian detection with the first target detection algorithm on the video frames in each video segment, resizing the pictures in the obtained rectangular frames to match the input size of the re-identification network in step (3), adjusting the average brightness of the pictures to the same average brightness level as the video frames in step (1), and saving the pictures together with the corresponding times and coordinates;
(4.3) inputting the pictures saved in step (4.2) and the images within the target's rectangular frame saved in step (2) into the re-identification network, and judging whether the target to be tracked exists in the pictures;
(4.4) if yes, returning to step (2); if not, expanding the search time range and returning to step (4.1) to search again; if the search keeps failing, continuing to expand the search time range until the search succeeds or all video segments have been searched, so as to obtain the complete track information of the target to be tracked, wherein ΔT_1 denotes the expanded search duration.
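
The search loop of sub-steps (4.1)-(4.4) can be sketched as follows: consult the adjacency structure for neighboring cameras, scan each one's window (T_f + T_a, T_f + T_a + ΔT), and widen ΔT when every clip misses. The dict-of-neighbors encoding of the map adjacency matrix, the doubling schedule, and the `found_in` stand-in for the detection-plus-re-identification stage are all assumptions:

    from typing import Callable, Dict, List, Optional, Tuple

    def cross_camera_search(
        camera: str,
        t_f: float,                                     # disappearance time T_f
        neighbors: Dict[str, List[Tuple[str, float]]],  # cam -> [(adjacent cam, T_a)]
        found_in: Callable[[str, float, float], bool],  # detection + re-id over a clip
        dt: float = 30.0,                               # initial search duration ΔT
        max_dt: float = 600.0,
    ) -> Optional[Tuple[str, float, float]]:
        """Find which adjacent camera the target reappears on.

        Scans each neighbor's clip (T_f + T_a, T_f + T_a + ΔT); if every clip
        misses, ΔT is enlarged and the scan repeats, until success or the cap.
        """
        while dt <= max_dt:
            for cam, t_a in neighbors.get(camera, []):
                start, end = t_f + t_a, t_f + t_a + dt
                if found_in(cam, start, end):
                    return cam, start, end
            dt *= 2  # expand the search time range and retry (sub-step 4.4)
        return None
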
7. The method according to claim 1, characterized in that said step (5) comprises the sub-steps of:
(5.1) sequentially reading the complete track information of the target to be tracked to obtain two successively passed cameras C_1 and C_2;
(5.2) acquiring the coordinate positions P_1 and P_2 of the two cameras on the map by using the annotated map, and binarizing the map to obtain road information;
(5.3) acquiring, with a growth algorithm, the most reasonable route between P_1 and P_2 in the binarized map, recording the coordinates of each point along the route, and then drawing the route on the original map;
and (5.4) repeating the operations from the step (5.1) to the step (5.3) until the recorded track information of the target to be tracked is read.
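
Sub-step (5.3)'s growth algorithm over the binarized map behaves like a breadth-first flood expanding from P_1 until it reaches P_2, which yields a shortest on-road route whose point coordinates can then be recorded. The BFS below is a stand-in sketch, not necessarily the patent's exact growth procedure; road pixels are 1 and obstacles 0:

    from collections import deque
    from typing import Dict, List, Optional, Tuple

    Point = Tuple[int, int]  # (row, col) in the binarized map

    def grow_route(road: List[List[int]], p1: Point, p2: Point) -> Optional[List[Point]]:
        """Grow outward from p1 over road pixels until p2 is reached, then
        backtrack through parents to recover the route's point coordinates."""
        h, w = len(road), len(road[0])
        parent: Dict[Point, Optional[Point]] = {p1: None}
        queue = deque([p1])
        while queue:
            cur = queue.popleft()
            if cur == p2:
                route = []
                node: Optional[Point] = cur
                while node is not None:      # backtrack to p1
                    route.append(node)
                    node = parent[node]
                return route[::-1]
            r, c = cur
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if (0 <= nr < h and 0 <= nc < w and road[nr][nc] == 1
                        and (nr, nc) not in parent):
                    parent[(nr, nc)] = cur
                    queue.append((nr, nc))
        return None  # p1 and p2 are not connected on the road mask
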
8. A flow adjustment assisting device based on machine vision and deep learning is characterized by comprising:
a target-to-be-tracked acquisition module: acquiring a video and a video frame in which the target in the video is not blocked, performing pedestrian detection on the video frame to obtain a plurality of selection frames, selecting the corresponding selection frame as the selection frame of the target to be tracked according to a received user instruction, and saving the image of the target to be tracked in the corresponding selection frame together with the corresponding time;
a single-lens target tracking module: tracking the target to be tracked forward and backward respectively according to the image of the target to be tracked and the corresponding time saved by the target-to-be-tracked acquisition module, obtaining all video frames in which the target to be tracked appears in the video, recording the track information of the target to be tracked, and saving each video frame together with the corresponding image within the target's rectangular frame, the rectangular frame coordinates, and the time;
a mask detection and risk personnel classified-saving module: performing pedestrian detection and mask detection on all video frames in which the target to be tracked appears, and then saving the results by category through a re-identification algorithm;
a cross-shot object re-identification module: after the target to be tracked disappears in the video, searching whether the target to be tracked exists in the video of the corresponding camera in the map adjacency matrix according to the map adjacency matrix representing the geographic position of the camera, so as to obtain the complete track information of the target to be tracked;
a target track drawing module: and drawing the track of the target to be tracked on a map according to the complete track information of the target to be tracked.
9. An electronic device, comprising:
one or more processors;
a memory for storing one or more programs;
wherein, when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method of any one of claims 1-7.
10. A computer-readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the steps of the method according to any one of claims 1-7.
CN202210802386.2A 2022-07-07 2022-07-07 Flow adjustment auxiliary method and device based on machine vision and deep learning Pending CN115132370A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210802386.2A CN115132370A (en) 2022-07-07 2022-07-07 Flow adjustment auxiliary method and device based on machine vision and deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210802386.2A CN115132370A (en) 2022-07-07 2022-07-07 Flow adjustment auxiliary method and device based on machine vision and deep learning

Publications (1)

Publication Number Publication Date
CN115132370A true CN115132370A (en) 2022-09-30

Family

ID=83381385

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210802386.2A Pending CN115132370A (en) 2022-07-07 2022-07-07 Flow adjustment auxiliary method and device based on machine vision and deep learning

Country Status (1)

Country Link
CN (1) CN115132370A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116912925A (en) * 2023-09-14 2023-10-20 齐鲁空天信息研究院 Face recognition method, device, electronic equipment and medium


Similar Documents

Publication Publication Date Title
US6826292B1 (en) Method and apparatus for tracking moving objects in a sequence of two-dimensional images using a dynamic layered representation
EP0684732B1 (en) Video processing system
JP4380838B2 (en) Video image automatic road sign recognition method, road sign automatic recognition device, and road sign automatic recognition program
CN112132897A (en) Visual SLAM method based on deep learning semantic segmentation
Zhou et al. Road tracking in aerial images based on human–computer interaction and Bayesian filtering
CN104239867B (en) License plate locating method and system
US20150178293A1 (en) Change invariant scene recognition by an agent
Frintrop et al. Attentional landmark selection for visual slam
US11990041B2 (en) Moving body tracking system, moving body tracking method, and program
KR101645959B1 (en) The Apparatus and Method for Tracking Objects Based on Multiple Overhead Cameras and a Site Map
CN113516664A (en) Visual SLAM method based on semantic segmentation dynamic points
CN111523545A (en) Article searching method combined with depth information
CN112541403B (en) Indoor personnel falling detection method by utilizing infrared camera
JP3577875B2 (en) Moving object extraction device
CN115132370A (en) Flow adjustment auxiliary method and device based on machine vision and deep learning
US20200191577A1 (en) Method and system for road image reconstruction and vehicle positioning
JP2009217832A (en) Method and device for automatically recognizing road sign in video image, and storage medium which stores program of road sign automatic recognition
CN108549877B (en) Tracking robot track identification method based on neural network
CN113298871A (en) Map generation method, positioning method, system thereof, and computer-readable storage medium
JP6916975B2 (en) Sign positioning system and program
CN116563341A (en) Visual positioning and mapping method for processing dynamic object in complex environment
CN106023252A (en) Multi-camera human body tracking method based on OAB algorithm
CN114359493B (en) Method and system for generating three-dimensional semantic map for unmanned ship
CN113627497B (en) Space-time constraint-based cross-camera pedestrian track matching method
JP2017016356A (en) Image processing apparatus, image processing method, and program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination