CN111126112A - Candidate region determination method and device - Google Patents
- Publication number: CN111126112A (application CN201811292150.9A)
- Authority: CN (China)
- Prior art keywords: video frame, monitoring video, frame, detection model, candidate area
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/49—Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2413—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
Abstract
The application discloses a candidate region determination method and device. The method comprises the following steps: acquiring a monitoring video frame; inputting the monitoring video frame into a pre-established target detection model, and outputting a category identifier and coordinate values of a candidate region, wherein the target detection model is trained on a RefineDet network structure, the category identifier indicates whether the candidate region contains a target object, and the target object comprises a sorting behavior subject in contact with a sorted object; and if the category identifier indicates that the candidate region contains the target object, determining the monitoring video frame as a start frame for analyzing violent sorting behavior. According to the technical scheme of the embodiments of the application, the RefineDet network structure effectively improves the quality of the candidate regions, thereby improving the accuracy of the violent-sorting-behavior recognition stage.
Description
Technical Field
The present application relates generally to the field of computer vision, and more particularly, to a candidate region determination method and apparatus.
Background
Sorting is the operation of stacking articles of different categories in order of category and of warehouse entry and exit. The timeliness of sorting operations directly affects a service provider's development prospects.
Currently, in sorting scenarios, the occurrence of violent sorting behavior severely degrades a service provider's quality of service. To improve service quality, video-based target detection and analysis methods have been proposed for violent sorting behavior. Violent-sorting detection can be divided into a candidate region generation stage and a violent behavior recognition stage, and the quality of the generated candidate regions directly affects the recognition result. The spatial position of a candidate region is generally determined in a single video frame, and sliding-window processing is then performed over the time sequence to obtain its temporal position.
In violent-sorting detection, because video scenes are complex, the number of candidate regions grows in proportion to the number of pedestrians, which raises the false detection rate of the recognition stage and makes recognition too slow.
Disclosure of Invention
In view of the above drawbacks and deficiencies of the prior art, it is desirable to provide a method, an apparatus, and a storage medium for determining candidate regions of violent sorting behavior, so as to reduce the time consumed by violent-sorting recognition and improve its accuracy.
In a first aspect, an embodiment of the present application provides a method for determining a candidate area of a violent sorting behavior, where the method includes:
acquiring a monitoring video frame;
inputting the monitoring video frame into a pre-established target detection model, and outputting a category identifier and coordinate values of a candidate region, wherein the target detection model is trained on a RefineDet network structure, the category identifier indicates whether the candidate region contains a target object, and the target object comprises a sorting behavior subject in contact with a sorted object;
and if the category identifier indicates that the candidate region contains the target object, determining the monitoring video frame as a start frame for analyzing violent sorting behavior.
In a second aspect, an embodiment of the present application provides a violent sorting behavior candidate region determination apparatus, characterized in that the apparatus includes:
the video frame acquisition module is used for acquiring monitoring video frames;
the target detection module is used for inputting the monitoring video frame into a pre-established target detection model and outputting a category identifier and coordinate values of a candidate region, wherein the target detection model is trained on a RefineDet network structure, the category identifier indicates whether the candidate region contains a target object, and the target object comprises a sorting behavior subject in contact with a sorted object;
and the determining module is used for determining the monitoring video frame as a starting frame for analyzing violent sorting behaviors if the category identification indicates that the candidate area contains the target object.
According to the above technical scheme for determining candidate regions of violent sorting behavior, monitoring video frames received in real time are identified by the pre-established target detection model to obtain effective candidate regions usable for violent-sorting recognition, improving the quality of the candidate regions. When training the target detection model, video frames containing target objects are labeled and only candidate regions containing target objects are retained, which effectively reduces the number of candidate regions and speeds up violent-sorting recognition.
Further, by determining an end frame for analyzing violent sorting behavior, or by continuing to acquire new monitoring video frames when no target object is identified, video image processing efficiency can be improved.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
fig. 1 is a schematic flow chart illustrating a method for determining a candidate area of violent sorting behavior according to an embodiment of the present application;
FIG. 2 is a schematic flow chart diagram illustrating a method for building an object detection model according to another embodiment of the present application;
fig. 3 is a block diagram illustrating an exemplary structure of a violent sorting behavior candidate region determination apparatus 300 according to an embodiment of the present application;
FIG. 4 is a block diagram illustrating an exemplary architecture of an apparatus 400 for modeling object detection provided in accordance with another embodiment of the present application;
FIG. 5 illustrates a schematic diagram of a computer system suitable for use in implementing a server according to embodiments of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the present invention are shown in the drawings.
It should be noted that the embodiments and the features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments and the attached drawings.
Referring to fig. 1, fig. 1 is a schematic flowchart illustrating a method for determining a candidate area of a violent sorting behavior according to an embodiment of the present application.
As shown in fig. 1, the method includes:
In the embodiment of the application, after monitoring video data is acquired from a video storage server (or an equivalent device) or from a video acquisition device, the video data can be converted into a video sequence, i.e., a sequence of frame-by-frame images called monitoring video frames.
The monitoring video data is collected in real time by the video acquisition device and relates to the sorting scene. The monitoring video frames may contain objects, pedestrians, vehicles, and the like.
And 120, inputting the monitoring video frame into a pre-established target detection model, and outputting the category identification and the coordinate value of the candidate area.
In the embodiment of the application, pedestrian detection is performed on the monitoring video frames frame by frame to identify candidate regions. Violent sorting behavior is identified by analyzing the candidate regions; it is behavior in which the time from contact to separation between the sorting behavior subject and the sorted object is extremely short, an interval of only a few seconds or even one second. To capture such a short-lived motion change in the video image data, the starting point of contact between the sorting subject and the sorted object must be recognized accurately. By detecting and analyzing the video image data, the embodiment of the application accurately identifies the initial position of violent sorting behavior, so that the behavior can be recognized more accurately and the recognition stage becomes more efficient.
In the embodiment of the application, the acquired monitoring video frame is input into the pre-established target detection model for target detection, yielding the category identifier and coordinate values of the candidate region. The category identifier indicates whether the candidate region contains a target object, and the target object comprises a sorting behavior subject in contact with the sorted object.
In a sorting activity, the subject is typically a sorting operator and the recipient is typically an object to be sorted, such as an item, a package, or an express parcel. The embodiment of the application analyzes video frames with the target detection model and excludes frames that contain only sorting personnel or only sorted objects, reducing the number of candidate regions and speeding up the violent-sorting recognition stage.
The target detection model is trained on the RefineDet network structure. RefineDet is a single-shot detector that achieves higher accuracy than two-stage approaches while maintaining the efficiency of one-stage approaches. The structure comprises a positioning (anchor) refinement module, a target detection module, and transfer connection blocks. The refinement module filters out unsuitable anchor boxes to reduce the classification search space; the transfer connection blocks transform the feature maps output by the refinement module and feed them into the target detection module, which fuses features from different layers and then performs multi-layer classification and regression. The RefineDet network structure effectively improves candidate region quality, and thus the accuracy of the violent-sorting recognition stage.
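The two-step cascade described above (refinement module filtering anchors, transfer blocks handing refined anchors to the detection module) can be illustrated with a framework-free sketch. Plain Python lists stand in for real tensors here; the names, the 0.99 background threshold, and the toy classifier are illustrative assumptions, not the patent's implementation.

```python
NEG_THRESHOLD = 0.99  # assumed: refinement stage discards near-certain background anchors

def anchor_refinement(anchors):
    """Step 1: coarsely score each anchor, drop easy negatives, apply coarse offsets."""
    kept = []
    for box, neg_conf, coarse_offset in anchors:
        if neg_conf < NEG_THRESHOLD:  # filter out well-classified background
            refined = [c + d for c, d in zip(box, coarse_offset)]
            kept.append(refined)
    return kept

def object_detection(refined_anchors, classify):
    """Step 2: classify and regress only the surviving, refined anchors."""
    return [(box, classify(box)) for box in refined_anchors]

# Toy input: (box, background confidence, coarse offset)
anchors = [
    ([10, 10, 50, 50], 0.999, [0, 0, 0, 0]),  # discarded: confident background
    ([20, 30, 80, 90], 0.30, [1, -1, 2, 0]),  # kept and refined
]
detections = object_detection(anchor_refinement(anchors),
                              classify=lambda box: ("yes", 0.85))
print(detections)  # [([21, 29, 82, 90], ('yes', 0.85))]
```

The point of the design is visible even in this sketch: the second stage only ever sees the small set of anchors the first stage could not dismiss, which is how the cascade shrinks the classification search space.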
The category identifier is a mark indicating the attribute category of a candidate region. It may be, for example, a number, a letter, or a combination of both, and primarily indicates the nature of the candidate region, e.g., whether the sorting behavior subject is in contact with the sorted object. For example, the identifier yes on a candidate region indicates that the sorting behavior subject in the region is in contact with the sorted object, and no indicates that it is not.
Further, the category identifier may also include a confidence for the candidate region, expressed as a number that represents the model's discriminative power on that region. For example, yes: 0.85 on a candidate region represents the confidence that the region belongs to the case where the sorting subject contacts the sorted object, and no: 0.95 represents the confidence that the region belongs to the case of no contact.
And step 130, if the category identifier indicates that the candidate region contains the target object, determining the monitoring video frame as the start frame for analyzing violent sorting behavior.
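The label-plus-confidence output described above can be consumed with a few lines of code. This is a minimal sketch, not the patent's implementation: the "yes"/"no" labels follow the examples in the text, while the 0.5 confidence threshold is an assumption.

```python
def is_start_frame(detections, conf_threshold=0.5):
    """detections: list of (label, confidence, box) tuples for one frame.
    Returns True if any region shows the sorting subject contacting an object."""
    return any(label == "yes" and conf >= conf_threshold
               for label, conf, box in detections)

frame_dets = [("no", 0.95, (5, 5, 40, 80)), ("yes", 0.85, (60, 10, 120, 90))]
print(is_start_frame(frame_dets))  # True: a contact region was detected
```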
In the embodiment of the present application, if the category identifier indicates that the candidate region contains the target object, i.e., the sorting behavior subject is in contact with the sorted object (or recipient), the monitoring video frame is determined as the start frame for analyzing violent sorting behavior.
Further, after determining that the surveillance video frame is a starting frame for analyzing violent sorting behavior, the method further comprises:
an end frame for analyzing violent sorting behavior is obtained. The end frame may be, for example, a video frame of a predetermined time interval sequentially extracted through a time sliding window as the end frame. For example, a fixed number of frames can be extracted from the beginning frame and back, i.e., consecutive frames are obtained from the surveillance video as objects for violent sorting activities. The fixed number of frames may be, for example, 15 frames. Or extracting the video frame in a fixed time range, and taking the last frame as an end frame or a tail frame.
The extracted candidate regions are used for identifying violent sorting behaviors, namely, the candidate regions required by the violent sorting behaviors are determined.
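The fixed-length clip extraction above can be sketched as follows. The 15-frame length comes from the example in the text; the frame objects are stand-ins, and the function name is illustrative.

```python
CLIP_LEN = 15  # fixed frame count from the example in the text

def clip_from_start(frames, start_idx, clip_len=CLIP_LEN):
    """Return the clip beginning at start_idx and the index of its end frame,
    truncating at the end of the sequence if necessary."""
    end_idx = min(start_idx + clip_len, len(frames)) - 1
    return frames[start_idx:end_idx + 1], end_idx

frames = list(range(100))  # pretend each int is a decoded video frame
clip, end = clip_from_start(frames, start_idx=40)
print(len(clip), end)  # 15 54  (frames 40..54 inclusive)
```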
Further, the embodiment of the present application may further include: and if the category identification indicates that the candidate area does not contain the target object, continuously acquiring a new monitoring video frame.
If the category identifier indicates that the candidate region does not contain the target object, target detection is performed on the next video frame in the monitoring video frame sequence, until some monitoring video frame is determined to be a start frame; a temporal sliding window is then started to sequentially acquire a fixed number of video frames as the recognition object for the violent-sorting recognition stage.
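The scan-then-window procedure just described can be sketched end to end. The detector here is a stub standing in for the RefineDet model; the function and variable names are assumptions for illustration.

```python
def find_recognition_window(frames, detect, clip_len=15):
    """Scan frames in order; on the first positive detection (the start frame),
    return the fixed-length window handed to the recognition stage."""
    for i, frame in enumerate(frames):
        if detect(frame):                  # category identifier indicates contact
            return frames[i:i + clip_len]  # input to violent-sorting recognition
    return None                            # no target object in this sequence

frames = ["idle"] * 20 + ["contact"] + ["moving"] * 30
window = find_recognition_window(frames, detect=lambda f: f == "contact")
print(len(window), window[0])  # 15 contact
```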
According to the embodiment of the application, the monitoring video frames are detected frame by frame with the RefineDet network structure, so the abnormal initial state of a sorting behavior can be effectively identified; the candidate region of the violent sorting behavior is then determined by extracting the video frames within the preset range, and recognizing that candidate region identifies violent sorting behavior accurately and efficiently, improving both the accuracy and the speed of violent-sorting detection.
The embodiment of the present application further provides a method for training the RefineDet network structure as the target detection model; please refer to fig. 2, which shows a flowchart of a method for establishing a target detection model according to another embodiment of the present application.
As shown in fig. 2, the method includes:
And 240, training the RefineDet network structure with the preprocessed historical monitoring video frame sequence according to a gradient descent algorithm to obtain the target detection model.
In the embodiment of the application, abnormal behaviors of sorting subjects are detected in an express-sorting scene, so as to improve the service provider's quality of service and user satisfaction. In an express-sorting scene there are many sorters; at any given moment some may not be handling sorted objects while others are in contact with them. If human body detection were performed on every sorting subject, a large number of candidate boxes or candidate regions to be recognized would be extracted; with so many candidate region positions, classifying them takes longer and detection accuracy drops.
The embodiment of the application therefore divides pedestrians into two categories: sorting personnel in contact with sorted objects, and sorting personnel not in contact with them. Candidate regions for violent-sorting detection come only from the former case.
To improve the quality of the candidate regions of violent sorting behavior, the RefineDet network structure is selected and trained as the target detection model to detect the monitoring video frames collected in real time, thereby obtaining the candidate regions.
When training the target detection model, a large number of historical monitoring video frames (image frames) are acquired and labeled; for example, the coordinates of every sorting person in contact with a sorted object, and of every sorting person not in contact with one, can be labeled in each monitoring video frame.
After the historical monitoring video frames are labeled, preprocessing operations such as normalization and data augmentation are performed on the labeled image data.
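The preprocessing step can be sketched as below: per-pixel normalization plus one simple augmentation (horizontal flip). The mean/scale values and the choice of flip are illustrative assumptions, not the patent's actual pipeline, and in practice the flip would also have to mirror the labeled box coordinates.

```python
def normalize(frame, mean=128.0, scale=128.0):
    """Map 8-bit pixel values to roughly [-1, 1]; mean/scale are assumptions."""
    return [[(p - mean) / scale for p in row] for row in frame]

def hflip(frame):
    """Horizontally flip a frame, a common data-augmentation step."""
    return [row[::-1] for row in frame]

frame = [[0, 64], [128, 255]]
print(normalize(frame))  # [[-1.0, -0.5], [0.0, 0.9921875]]
print(hflip(frame))      # [[64, 0], [255, 128]]
```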
After preprocessing, the RefineDet network structure is trained with the preprocessed monitoring video frame sequence as the training set to establish the model architecture, and the RefineDet network structure is updated with a stochastic gradient descent algorithm to obtain the target detection model.
The RefineDet network structure comprises the positioning refinement module, the target detection module, and the transfer connection blocks. Updating the RefineDet network structure with a stochastic gradient descent algorithm to obtain the target detection model may, for example, include:
determining a training set from the preprocessed historical monitoring video frame sequence;
inputting the training set into the positioning refinement module to screen prediction boxes, obtaining a first type of prediction box; and transmitting the feature maps of the first type of prediction box to the target detection module through the transfer connection blocks, where the target detection module regresses the first type of prediction box to obtain an initial detection model;
and iteratively updating the initial detection model by minimizing the loss function to obtain the target detection model.
In the embodiment of the present application, the gradient descent algorithm may be, for example, batch gradient descent (BGD), stochastic gradient descent (SGD), or mini-batch stochastic gradient descent (MSGD). Preferably, model training is performed with the mini-batch gradient descent algorithm.
The mini-batch gradient descent algorithm is a compromise between the two: a small batch of samples from the training set is selected to compute the gradient of the loss function, which keeps the training process more stable and reduces the severe oscillation of parameter updates seen in SGD.
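The mini-batch update rule can be shown on a toy one-dimensional regression rather than the full detection loss. This is a sketch of the optimization scheme only: the learning rate, batch size, and the squared-error loss are illustrative assumptions.

```python
import random

def mini_batch_sgd(data, w0=0.0, lr=0.1, batch_size=4, epochs=50):
    """Fit w minimizing mean (w*x - y)^2, updating once per mini-batch."""
    w = w0
    for _ in range(epochs):
        random.shuffle(data)                       # stochastic sampling of batches
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # gradient of the batch-mean squared error with respect to w
            grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
            w -= lr * grad
    return w

random.seed(0)
data = [(x, 3.0 * x) for x in [0.5, 1.0, 1.5, 2.0, -1.0, -0.5, 0.25, 2.5]]
w = mini_batch_sgd(data)
print(round(w, 3))  # converges near the true slope 3.0
```

Averaging the gradient over a batch, rather than using a single sample as in plain SGD, is exactly what damps the oscillation mentioned above while staying far cheaper per step than full-batch descent.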
In the embodiment of the application, the candidate region is extracted by using the RefineDet network structure, so that the quality of the candidate region can be improved, the number of the candidate regions is reduced, and the speed and the accuracy of the violent sorting behavior identification are improved.
It should be noted that while the operations of the method of the present invention are depicted in the drawings in a particular order, this does not require or imply that the operations must be performed in this particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Rather, the steps depicted in the flowcharts may change the order of execution. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions.
With further reference to fig. 3, fig. 3 shows an exemplary structural block diagram of a violent sorting behavior candidate region determination apparatus 300 according to an embodiment of the present application.
As shown in fig. 3, the apparatus 300 includes:
the video frame acquiring module 310 is configured to acquire a surveillance video frame.
In the embodiment of the application, after monitoring video data is acquired from a video storage server (or an equivalent device) or from a video acquisition device, the video data can be converted into a video sequence, i.e., a sequence of frame-by-frame images called monitoring video frames.
The monitoring video data is collected in real time by the video acquisition device and relates to the sorting scene. The monitoring video frames may contain objects, pedestrians, vehicles, and the like.
And the target detection module 320 is configured to input the surveillance video frame into a pre-established target detection model, and output the category identifier and the coordinate value of the candidate region.
In the embodiment of the application, pedestrian detection is performed on the monitoring video frames frame by frame to identify candidate regions. Violent sorting behavior is identified by analyzing the candidate regions; it is behavior in which the time from contact to separation between the sorting behavior subject and the sorted object is extremely short, an interval of only a few seconds or even one second. To capture such a short-lived motion change in the video image data, the starting point of contact between the sorting subject and the sorted object must be recognized accurately. By detecting and analyzing the video image data, the embodiment of the application accurately identifies the initial position of violent sorting behavior, so that the behavior can be recognized more accurately and the recognition stage becomes more efficient.
In the embodiment of the application, the acquired monitoring video frame is input into the pre-established target detection model for target detection, yielding the category identifier and coordinate values of the candidate region. The category identifier indicates whether the candidate region contains a target object, and the target object comprises a sorting behavior subject in contact with the sorted object.
In a sorting activity, the subject is typically a sorting operator and the recipient is typically an object to be sorted, such as an item, a package, or an express parcel. The embodiment of the application excludes video frames that contain only sorting personnel or only sorted objects based on the target detection model, reducing the number of candidate regions and speeding up the violent-sorting recognition stage.
The target detection model is trained on the RefineDet network structure. RefineDet is a single-shot detector that achieves higher accuracy than two-stage approaches while maintaining the efficiency of one-stage approaches. The structure comprises a positioning (anchor) refinement module, a target detection module, and transfer connection blocks. The refinement module filters out unsuitable anchor boxes to reduce the classification search space; the transfer connection blocks transform the feature maps output by the refinement module and feed them into the target detection module, which fuses features from different layers and then performs multi-layer classification and regression. The RefineDet network structure effectively improves candidate region quality, and thus the accuracy of the violent-sorting recognition stage.
The category identifier is a mark indicating the attribute category of a candidate region. It may be, for example, a number, a letter, or a combination of both, and primarily indicates the nature of the candidate region, e.g., whether the sorting behavior subject is in contact with the sorted object. For example, the identifier yes on a candidate region indicates that the sorting behavior subject in the region is in contact with the sorted object, and no indicates that it is not.
Further, the category identifier may also include a confidence for the candidate region, expressed as a number that represents the model's discriminative power on that region. For example, yes: 0.85 on a candidate region represents the confidence that the region belongs to the case where the sorting subject contacts the sorted object, and no: 0.95 represents the confidence that the region belongs to the case of no contact.
A determining module 330, configured to determine the surveillance video frame as a starting frame for analyzing the violent sorting behavior if the category identifier indicates that the candidate area contains the target object.
In the embodiment of the present application, if the category identifier indicates that the candidate region contains the target object, i.e., the sorting behavior subject is in contact with the sorted object (or recipient), the monitoring video frame is determined as the start frame for analyzing violent sorting behavior.
Further, after the monitoring video frame is determined to be the start frame for analyzing violent sorting behavior, the apparatus further comprises:
and the end frame acquisition module is used for acquiring an end frame for analyzing the violent sorting behavior. The end frame may be obtained, for example, by sequentially extracting video frames over a predetermined time interval with a temporal sliding window. For instance, a fixed number of frames, say 15, can be extracted starting from the start frame, i.e., consecutive frames are taken from the monitoring video as the object of violent-sorting analysis. Alternatively, video frames within a fixed time range can be extracted, with the last frame serving as the end frame.
The extracted candidate regions are used for identifying violent sorting behavior; that is, the candidate regions required for violent sorting behavior identification are determined.
Further, another embodiment of the present application includes:
and the new video acquisition module is used for continuously acquiring a new monitoring video frame if the category identification indicates that the candidate area does not contain the target object.
If the category identifier indicates that the candidate region does not include the target object, target detection is performed on the next video frame in the surveillance video frame sequence. Once a surveillance video frame is determined to be the start frame, a time sliding window is started to sequentially acquire a fixed number of video frames as the identification objects for the violent sorting behavior identification stage.
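The frame-by-frame flow described above can be sketched as follows. The detector here is a toy stand-in for the trained model, and all names are illustrative rather than taken from the patent:

```python
def find_start_and_window(frames, detect, window_size=15):
    """Scan frames until the detector reports contact ("yes"), then collect
    a fixed-size window of consecutive frames for behavior recognition."""
    for i, frame in enumerate(frames):
        label, confidence, box = detect(frame)
        if label == "yes":                       # start frame found
            return i, frames[i:i + window_size]  # window for recognition stage
    return None, []                              # no start frame in this sequence

# Toy detector: reports contact from frame 10 onward.
detect = lambda f: ("yes" if f >= 10 else "no", 0.9, (0, 0, 1, 1))
start, window = find_start_and_window(list(range(50)), detect)
```

Frames before the first "yes" are discarded, matching the behavior of continuing to acquire new surveillance frames until a start frame appears.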
According to the embodiment of the application, the surveillance video frames are detected frame by frame through the RefineDet network structure, so that the abnormal initial state of the sorting behavior can be effectively identified. By extracting video frames within a preset range, the candidate regions of the violent sorting behavior are determined; identifying these candidate regions allows violent sorting behavior to be recognized accurately and efficiently, improving both the accuracy and the speed of violent sorting behavior detection.
In the embodiment of the present application, a method for training a RefineDet network structure as a target detection model is further provided. Please refer to fig. 4, which shows an exemplary structural block diagram of an apparatus 400 for establishing a target detection model according to yet another embodiment of the present application.
As shown in fig. 4, the apparatus 400 includes:
a historical video frame obtaining sub-module 410, configured to obtain a historical monitoring video frame sequence;
the labeling submodule 420 is configured to label each historical surveillance video frame in the historical surveillance video frame sequence according to whether the historical surveillance video frame includes a target object;
the preprocessing submodule 430 is configured to preprocess the marked historical monitoring video frame sequence;
and the training submodule 440 is configured to train a RefineDet network structure according to a gradient descent algorithm by using the preprocessed historical monitoring video frame sequence, so as to obtain a target detection model.
In the embodiment of the application, abnormal behaviors of sorting behavior subjects are detected in an express sorting scene, thereby improving the service quality of the service provider and the satisfaction of users. In an express sorting scene there are many sorters, and at any given time some sorters may not be operating on the sorted objects while others may be in contact with them. If human body detection were carried out on all sorting behavior subjects, a large number of candidate frames or candidate regions to be identified would be extracted; because these candidate regions occupy many spatial positions, classifying them takes longer and the detection accuracy is reduced.
The embodiment of the application divides detected persons into two categories: sorting personnel in contact with the sorted object, and sorting personnel not in contact with the sorted object. The candidate regions for detecting violent sorting behavior come only from the case where the sorting person is in contact with the sorted object.
In order to improve the quality of the candidate regions of the violent sorting behavior, a target detection model trained on the RefineDet network structure is selected to detect the surveillance video frames collected in real time, so as to obtain the candidate regions.
When the target detection model is trained, a large number of historical surveillance video frames or image frames are acquired and labeled. For example, the coordinates of every region in a surveillance video frame in which the sorting subject is in contact with the sorted object, and of every region in which it is not, may be labeled.
After the historical surveillance video frames are labeled, preprocessing operations such as normalization and data augmentation are performed on the labeled image data.
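A minimal preprocessing sketch. The patent names normalization and data enhancement without detail, so the specific operations and value ranges below are assumptions:

```python
def normalize_frame(pixels, max_value=255.0):
    """Scale 8-bit pixel intensities into [0, 1] -- a common normalization."""
    return [p / max_value for p in pixels]

def flip_horizontal(rows):
    """Mirror each pixel row -- one simple form of data augmentation."""
    return [list(reversed(r)) for r in rows]

frame = [0, 51, 102, 255]
print(normalize_frame(frame))
```

In practice these operations would run over full image arrays (e.g. with NumPy or OpenCV); scalars are used here only to keep the sketch self-contained.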
After preprocessing, the RefineDet network structure is trained, using the preprocessed surveillance video frame sequence as a training set, to establish the model architecture; the parameters of the RefineDet network structure are then updated using a stochastic gradient descent algorithm, so as to obtain the target detection model.
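The stochastic gradient descent update underlying the training step can be sketched in isolation on a toy one-parameter least-squares problem; this is not the actual RefineDet loss, only the update rule:

```python
def sgd_step(w, grad, lr=0.1):
    """One stochastic gradient descent update: w <- w - lr * grad."""
    return w - lr * grad

# Toy example: fit w to minimize (w * x - y)^2 for a single sample.
w, x, y = 0.0, 2.0, 6.0            # the minimizer is w = 3.0
for _ in range(200):
    grad = 2 * x * (w * x - y)     # d/dw of the squared error
    w = sgd_step(w, grad, lr=0.05)
```

With this learning rate each step contracts the error by a constant factor, so `w` converges to 3.0; in real training the same update is applied per mini-batch to the network weights.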
The RefineDet network structure includes a positioning refinement module, a target detection module and a transfer connection block, and the training submodule 440 includes:
a training set determining unit, configured to determine a training set from the preprocessed historical surveillance video frame sequence;
the model establishing unit is used for inputting the training set into the positioning refining module to carry out prediction frame screening to obtain a first type of prediction frame; and transmitting the feature graph of the first type of prediction frame to a target detection module through a transfer connection block, wherein the target detection module regresses the first type of prediction frame to obtain an initial detection model.
and a parameter updating unit, configured to iteratively update the initial detection model by minimizing a loss function, so as to obtain the target detection model.
In the embodiment of the application, the candidate region is extracted by using the RefineDet network structure, so that the quality of the candidate region can be improved, the number of the candidate regions is reduced, and the speed and the accuracy of the violent sorting behavior identification are improved.
It should be understood that the units or modules described in the apparatuses 300 and 400 correspond to the various steps in the method described with reference to fig. 1-2. Thus, the operations and features described above for the method are equally applicable to the apparatuses 300 and 400 and the units included therein, and are not described in detail here. The apparatuses 300 and 400 may be implemented in a browser or other security application of the electronic device in advance, or may be loaded into the browser or other security application of the electronic device by downloading or the like. The corresponding units in the apparatuses 300 and 400 can cooperate with units in the electronic device to implement the solution of the embodiment of the present application.
Referring now to FIG. 5, a block diagram of a computer system 500 suitable for use in implementing a server according to embodiments of the present application is shown.
As shown in fig. 5, the computer system 500 includes a Central Processing Unit (CPU) 501 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 502 or a program loaded from a storage section 508 into a Random Access Memory (RAM) 503. Various programs and data necessary for the operation of the system 500 are also stored in the RAM 503. The CPU 501, ROM 502, and RAM 503 are connected to each other via a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
The following components are connected to the I/O interface 505: an input portion 506 including a keyboard, a mouse, and the like; an output portion 507 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage portion 508 including a hard disk and the like; and a communication section 509 including a network interface card such as a LAN card, a modem, or the like. The communication section 509 performs communication processing via a network such as the Internet. A drive 510 is also connected to the I/O interface 505 as necessary. A removable medium 511, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 510 as necessary, so that a computer program read therefrom can be installed into the storage section 508 as needed.
In particular, the processes described above with reference to fig. 1 or 2 may be implemented as computer software programs, according to embodiments of the present disclosure. For example, embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program comprising program code for performing the method of fig. 1 or 2. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 509, and/or installed from the removable medium 511.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units or modules described in the embodiments of the present application may be implemented by software or hardware. The described units or modules may also be provided in a processor, and may be described as: a processor includes a video frame acquisition module, a target detection module, and a determination module. The names of these units or modules do not in some cases constitute a limitation on the units or modules themselves, and for example, the video frame acquisition module may also be described as a "module for acquiring surveillance video frames".
As another aspect, the present application also provides a computer-readable storage medium, which may be the computer-readable storage medium included in the device of the foregoing embodiment, or a separate computer-readable storage medium not incorporated into the device. The computer-readable storage medium stores one or more programs for use by one or more processors in performing the violent sorting behavior candidate region determination method described herein.
The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention herein disclosed is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the invention as defined above. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.
Claims (10)
1. A method for determining candidate regions for violent sorting activities, the method comprising:
acquiring a monitoring video frame;
inputting the monitoring video frame into a pre-established target detection model, and outputting a category identifier and a coordinate value of a candidate area, wherein the target detection model is formed by training based on a RefineDet network structure, the category identifier is used for indicating whether the candidate area contains a target object, and the target object comprises a sorting behavior main body in contact with a sorted object;
and if the category identification indicates that the candidate area contains the target object, determining the monitoring video frame as a starting frame for analyzing violent sorting behaviors.
2. The method of claim 1, wherein the step of pre-establishing an object detection model comprises:
acquiring a historical monitoring video frame sequence;
labeling each historical monitoring video frame in the historical monitoring video frame sequence according to whether the target object is contained or not;
preprocessing the marked historical monitoring video frame sequence;
and training the RefineDet network structure by utilizing the preprocessed historical monitoring video frame sequence according to a gradient descent algorithm to obtain a target detection model.
3. The method of claim 1 or 2, wherein the class identification further comprises a class confidence.
4. The method of claim 3, wherein after determining that the surveillance video frame is a starting frame for analysis of violent sorting behavior, the method further comprises: an end frame for analyzing violent sorting behavior is obtained.
5. The method of claim 1, further comprising:
and if the category identification indicates that the candidate area does not contain the target object, continuously acquiring a new monitoring video frame.
6. An apparatus for determining a candidate area for violent sorting behavior, comprising:
the video frame acquisition module is used for acquiring monitoring video frames;
the target detection module is used for inputting the monitoring video frame into a pre-established target detection model and outputting a category identifier and a coordinate value of a candidate area, the target detection model is formed by training based on a RefineDet network structure, the category identifier is used for indicating whether the candidate area contains a target object, and the target object comprises a sorting behavior main body which is in contact with a sorted object;
a determining module, configured to determine that the surveillance video frame is a starting frame for analyzing violent sorting behavior if the category identifier indicates that the candidate region contains the target object.
7. The apparatus of claim 6, further comprising means for pre-building the object detection model, the means comprising:
the historical video frame acquisition submodule is used for acquiring a historical monitoring video frame sequence;
the marking submodule is used for marking each historical monitoring video frame in the historical monitoring video frame sequence according to whether the target object is contained or not;
the preprocessing submodule is used for preprocessing the marked historical monitoring video frame sequence;
and the training submodule is used for training the RefineDet network structure by utilizing the preprocessed historical monitoring video frame sequence according to a gradient descent algorithm to obtain a target detection model.
8. The apparatus of claim 7, wherein the training submodule comprises:
a training set determining unit, configured to determine a training set from the preprocessed historical surveillance video frame sequence;
the model establishing unit is used for inputting the training set into the positioning refining module to carry out prediction frame screening to obtain a first type of prediction frame; transmitting the feature map of the first type of prediction frame to a target detection module through a transfer connection block, wherein the target detection module performs regression on the first type of prediction frame to obtain an initial detection model;
and the parameter updating unit is used for iteratively updating the initial detection model by utilizing a minimum loss function to obtain the target detection model.
9. The apparatus of any of claims 6-8, wherein after determining that the surveillance video frame is a starting frame for analyzing violent sorting behavior, the apparatus further comprises: and the end frame acquisition module is used for acquiring an end frame for analyzing the violent sorting behavior.
10. The apparatus of claim 6, further comprising: and the new video acquisition module is used for continuously acquiring a new monitoring video frame if the category identification indicates that the candidate area does not contain the target object.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811292150.9A CN111126112B (en) | 2018-10-31 | 2018-10-31 | Candidate region determination method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811292150.9A CN111126112B (en) | 2018-10-31 | 2018-10-31 | Candidate region determination method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111126112A true CN111126112A (en) | 2020-05-08 |
CN111126112B CN111126112B (en) | 2024-04-16 |
Family
ID=70494471
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811292150.9A Active CN111126112B (en) | 2018-10-31 | 2018-10-31 | Candidate region determination method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111126112B (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111797777A (en) * | 2020-07-07 | 2020-10-20 | 南京大学 | Sign language recognition system and method based on spatiotemporal semantic features |
CN112287800A (en) * | 2020-10-23 | 2021-01-29 | 北京中科模识科技有限公司 | Advertisement video identification method and system under no-sample condition |
CN113761993A (en) * | 2020-06-24 | 2021-12-07 | 北京沃东天骏信息技术有限公司 | Method and apparatus for outputting information |
CN118781522A (en) * | 2024-07-20 | 2024-10-15 | 广东顺融检测科技股份有限公司 | A method, system, device and medium for on-site target detection and labeling |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102457705A (en) * | 2010-10-19 | 2012-05-16 | 由田新技股份有限公司 | Fighting behavior detection and monitoring method and system |
CN104680557A (en) * | 2015-03-10 | 2015-06-03 | 重庆邮电大学 | Intelligent detection method for abnormal behavior in video sequence image |
CN105094657A (en) * | 2014-05-16 | 2015-11-25 | 上海京知信息科技有限公司 | Multitouch-based interactive video object segmentation fusion method |
CN108197575A (en) * | 2018-01-05 | 2018-06-22 | 中国电子科技集团公司电子科学研究院 | A kind of abnormal behaviour recognition methods detected based on target detection and bone point and device |
CN108446669A (en) * | 2018-04-10 | 2018-08-24 | 腾讯科技(深圳)有限公司 | motion recognition method, device and storage medium |
CN108537172A (en) * | 2018-04-09 | 2018-09-14 | 北京邦天信息技术有限公司 | A kind of method and apparatus of the behavior based on Machine Vision Recognition people |
- 2018-10-31: CN CN201811292150.9A patent CN111126112B granted (status: Active)
Non-Patent Citations (1)
Title |
---|
ZHANG, S. et al., "Single-shot refinement neural network for object detection" |
Also Published As
Publication number | Publication date |
---|---|
CN111126112B (en) | 2024-04-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2018121690A1 (en) | Object attribute detection method and device, neural network training method and device, and regional detection method and device | |
CN111126112B (en) | Candidate region determination method and device | |
CN108304816B (en) | Identity recognition method and device, storage medium and electronic equipment | |
US20160098636A1 (en) | Data processing apparatus, data processing method, and recording medium that stores computer program | |
CN109740573B (en) | Video analysis method, device, equipment and server | |
CN110781711A (en) | Target object identification method and device, electronic equipment and storage medium | |
CN111274926B (en) | Image data screening method, device, computer equipment and storage medium | |
CN108960124B (en) | Image processing method and device for pedestrian re-identification | |
KR102002024B1 (en) | Method for processing labeling of object and object management server | |
US11164028B2 (en) | License plate detection system | |
US10489637B2 (en) | Method and device for obtaining similar face images and face image information | |
CN111783665A (en) | Action recognition method and device, storage medium and electronic equipment | |
CN113361603A (en) | Training method, class recognition device, electronic device and storage medium | |
CN113344121B (en) | Method for training a sign classification model and sign classification | |
CN111199238A (en) | Behavior identification method and equipment based on double-current convolutional neural network | |
WO2014193220A2 (en) | System and method for multiple license plates identification | |
CN110659588A (en) | Passenger flow volume statistical method and device and computer readable storage medium | |
CN111563398A (en) | Method and device for determining information of target object | |
CN111079621A (en) | Method and device for detecting object, electronic equipment and storage medium | |
JP2010231254A (en) | Image analyzing device, method of analyzing image, and program | |
CN111476059A (en) | Target detection method and device, computer equipment and storage medium | |
CN117292338A (en) | Vehicle accident identification and analysis method based on video stream analysis | |
CN117351462A (en) | Construction operation detection model training method, device, equipment and storage medium | |
CN111985269B (en) | Detection model construction method, detection method, device, server and medium | |
CN112052730A (en) | 3D dynamic portrait recognition monitoring device and method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |