CN113469142B - Classification method, device and terminal for monitoring video time-space information fusion - Google Patents
- Publication number
- CN113469142B (application CN202110932947.6A)
- Authority
- CN
- China
- Prior art keywords
- video
- behavior
- category
- time
- prediction
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/75—Clustering; Classification
- G06F16/78—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/783—Retrieval characterised by using metadata automatically derived from the content
- G06F16/7847—Retrieval characterised by using metadata automatically derived from the content, using low-level visual features of the video content
- G06F16/7867—Retrieval characterised by using metadata, using information manually generated, e.g. tags, keywords, comments, title and artist information, manually generated time, location and usage information, user ratings
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/18—Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
Abstract
The embodiment of the invention discloses a classification method, a device and a terminal for monitoring-video space-time information fusion, wherein the method comprises the following steps: acquiring a sample video set; randomly and uniformly selecting videos of various behavior categories from the sample video set; inputting the selected videos into a preset classifier for deep network weight training to obtain a trained classifier; importing the video to be identified into the trained classifier for prediction to obtain a plurality of preliminary prediction results at corresponding time points; performing a chessboard-distance calculation on the preliminary prediction results of each time point, using the behavior quantization value as the distance value, to obtain the category label of that time point; and performing a time-dimension behavior category fusion operation on the category labels of all time points to obtain the final behavior category prediction result of the video to be identified. In this scheme, behavior category quantization unifies behavior category fusion across the space-time dimensions, and the prediction is tailored to the characteristics of the service-window video stream.
Description
Technical Field
The invention relates to the technical field of video analysis and deep learning, in particular to a classification method, a device and a terminal for monitoring video temporal-spatial information fusion.
Background
An automatic behavior analysis system for service-window surveillance video can effectively improve the working efficiency of the service window, protect the legal rights and interests of staff and the public, and is of obvious significance for improving the service level and image of government departments and service enterprises.
At present, automatic behavior detection and classification technology based on surveillance video is used to analyze and process station information in the monitoring video, helping to release people from complex and time-consuming work such as manual behavior retrieval and behavior analysis. Existing automatic behavior detection and classification technologies based on surveillance video include the following:
One is the optical-flow-based behavior detection method, which relies on the optical-flow differences generated by changes in human behavior. Specifically, according to the trend or trajectory of the optical-flow change, feature extraction or coding is performed on the video images to obtain classification data for different types of behavior, which is then used to train a classification algorithm such as an SVM (support vector machine) to realize behavior classification. The problem with this method is that feature extraction and selection are based on general image techniques such as histogram analysis, gradient analysis and optical-flow tracking rather than on modeling of behavioral differences, so the analysis is easily disturbed by other factors such as object motion, and the detection accuracy is insufficient.
Another is the skeleton-based method, which uses human skeleton feature information to make the classification and identification of human behavior more targeted. By manually defining and automatically detecting key parts of the human body such as the head, shoulders and hands, a specific key-point structure is formed; under conditions of no occlusion and a normal viewing angle, human detection and identification tasks can be completed well, providing a strong guarantee for behavior classification. However, the posture data of personnel captured by service-window surveillance video cannot always meet these conditions: service-window data generally suffers from occluded key points and difficult human detection, leading to serious missed detections of personnel, so the automatic posture recognition and classification task cannot be satisfied.
The third is a dual-video-stream deep-neural-network behavior classification method that imitates the processing and abstraction of visual signals by human visual cells and relies on the learning and training of a multi-layer complex network. However, accurate behavior labeling and classification remain difficult because of characteristics of staff under video monitoring, such as long periods in a fixed posture and a large range of working activity, and problems such as multi-person behavior category information and spatial behavior category conflicts within a video segment remain prominent. Meanwhile, a service window differs from single-individual behavior classification in that interaction often exists among staff and between staff and the public, making the spatial range, category labeling and classification of behaviors more complicated.
Thus, there is a need for a better solution to the problems of the prior art.
Disclosure of Invention
In view of this, the invention provides a classification method, a device and a terminal for monitoring video temporal-spatial information fusion, which are used for solving the problems in the prior art.
Specifically, the present invention proposes the following specific examples:
the embodiment of the invention provides a classification method for monitoring video space-time information fusion, which comprises the following steps:
acquiring a sample video set; each video in the sample video set is marked with a behavior category and an influence value influencing the work efficiency level;
randomly and uniformly selecting videos with various behavior categories in the sample video set;
inputting the selected video into a preset classifier to carry out deep network weight training to obtain a trained classifier;
importing the video to be identified into a trained classifier for prediction to obtain a plurality of preliminary prediction results corresponding to time points;
performing chessboard distance calculation on the preliminary prediction result of each time point according to a behavior quantization value as a distance value to obtain a category label of the time point;
and performing time-dimension behavior category fusion operation on the category labels at all time points to obtain a final behavior category prediction result of the video to be identified.
In a specific embodiment, the behavior categories include: one or more of working on duty, not working on duty, playing mobile phone on duty, sleeping on duty and off duty;
the influence values corresponding to different behavior categories are related to the application scene.
In a specific embodiment, the acquiring a sample video set includes:
performing time sampling and space sampling for a certain number of times on the video clips with the segmented duration by using a behavior classification method of the double video streams; each video clip is marked with a behavior category and an influence value;
and taking the video clips subjected to time sampling and space sampling as a sample video set.
In a specific embodiment, the video input to the preset classifier is a plurality of video segments corresponding to different spatial distributions of the same window.
In a specific embodiment, the step of introducing a video to be identified into a trained classifier for prediction to obtain a plurality of preliminary prediction results corresponding to time points includes:
importing the video to be identified into a trained classifier;
and performing multiple spatial samplings and behavior classification on the video clips of the same window in the video to be identified through the trained classifier to obtain a plurality of preliminary prediction results for the same time point.
In a specific embodiment, the category label is predicted based on the following formula:

L_{t_u} = max_{1 ≤ i ≤ c_s} ‖l_i(t_u)‖_{2n}

where L_{t_u} is the category label; l_i(t_u) is the i-th preliminary prediction result; when the influence value is a scalar, l_i(t_u) is a scalar value; when the influence value is a vector, l_i(t_u) is a vector value and is measured quantitatively by the vector 2n-norm (n > 0); t_u is the time point; and c_s is the number of spatial samples.
In a specific embodiment, the final behavior category prediction result is obtained by performing the time-dimension behavior category fusion operation based on the following formula:

L_{s,t} = max_{1 ≤ u ≤ c_t} L_{t_u}

where L_{s,t} is the final behavior category prediction result and c_t is the number of time samples.
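As an illustrative sketch of the method steps above (the stub classifier, the numeric quantized labels, and the max-style chessboard fusion are all assumptions for illustration, not the invention's actual network or formulas):

```python
def classify_video(segments, classifier, c_t=3, c_s=2):
    """Sketch of the pipeline: for each of c_t time points, draw c_s
    spatial samples, obtain preliminary quantized predictions from the
    classifier, fuse them over space, then fuse over time.
    A max-style (chessboard / L-infinity) fusion is assumed throughout."""
    per_time = []
    for u in range(c_t):
        preds = [classifier(segments[u], i) for i in range(c_s)]  # preliminary results
        per_time.append(max(preds))  # spatial fusion -> label L_{t_u}
    return max(per_time)             # time-dimension fusion -> L_{s,t}

# Stub classifier: a lookup of quantized behavior values per
# (time segment, spatial sample); purely hypothetical data.
table = {(0, 0): 1.0, (0, 1): 1.0,
         (1, 0): 4.0, (1, 1): 1.0,
         (2, 0): 3.0, (2, 1): 3.0}
result = classify_video([0, 1, 2], lambda seg, i: table[(seg, i)])
print(result)  # 4.0
```

With these stub values, one spatial sample at the second time point carries the largest quantized behavior value, so it dominates both fusion steps.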
The embodiment of the invention also provides a classification device for monitoring video temporal-spatial information fusion, which comprises:
the acquisition module is used for acquiring a sample video set; each video in the sample video set is marked with a behavior category and an influence value influencing the work efficiency level;
the selection module is used for randomly and uniformly selecting videos with various behavior categories in the sample video set;
the training module is used for inputting the selected video into a preset classifier to carry out deep network weight training to obtain a trained classifier;
the preliminary prediction module is used for importing the video to be recognized into the trained classifier for prediction to obtain a plurality of preliminary prediction results corresponding to time points;
the label module is used for calculating the chessboard distance of the preliminary prediction result of each time point according to the behavior quantization value as a distance value to obtain a category label of the time point;
and the fusion module is used for performing time-dimension behavior category fusion operation on the category labels at all time points to obtain a final behavior category prediction result of the video to be identified.
The embodiment of the invention also provides a terminal, which comprises a memory and a processor, wherein the memory stores a computer program, and the processor runs the computer program to enable the processor to execute the classification method for the monitoring video spatiotemporal information fusion.
The embodiment of the invention also provides a storage medium, wherein a computer program is stored on the storage medium, and when being executed by a processor, the computer program realizes the classification method for the monitoring video time-space information fusion.
Therefore, the embodiment of the invention provides a classification method, a device and a terminal for monitoring-video space-time information fusion, wherein the method comprises the following steps: acquiring a sample video set, each video of which is marked with a behavior category and an influence value reflecting its influence on the working-efficiency level; randomly and uniformly selecting videos of various behavior categories from the sample video set; inputting the selected videos into a preset classifier for deep network weight training to obtain a trained classifier; importing the video to be identified into the trained classifier for prediction to obtain a plurality of preliminary prediction results at corresponding time points; performing a chessboard-distance calculation on the preliminary prediction results of each time point, using the behavior quantization value as the distance value, to obtain the category label of that time point; and performing a time-dimension behavior category fusion operation on the category labels of all time points to obtain the final behavior category prediction result of the video to be identified. In this scheme, because the behavior categories are quantized, decision fusion of behavior categories can be obtained by distance calculation; the quantization operation adopts the vector-space concept, making category expansion and category aggregation possible; behavior category quantization unifies behavior category fusion across the space-time dimensions; and the behavior category prediction is tailored to the characteristics of the service-window video stream.
Drawings
In order to more clearly illustrate the technical solution of the present invention, the drawings required to be used in the embodiments will be briefly described below, and it should be understood that the following drawings only illustrate some embodiments of the present invention, and therefore should not be considered as limiting the scope of the present invention. Like components are numbered similarly in the various figures.
FIG. 1 is a flow chart illustrating a classification method for temporal-spatial information fusion of surveillance videos according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a classification apparatus for monitoring video temporal-spatial information fusion according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a terminal according to an embodiment of the present invention;
fig. 4 shows a schematic structural diagram of a storage medium according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments.
The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
Hereinafter, the terms "including", "having", and their derivatives, which may be used in various embodiments of the present invention, are only intended to indicate specific features, numbers, steps, operations, elements, components, or combinations of the foregoing, and should not be construed as first excluding the existence of, or adding to, one or more other features, numbers, steps, operations, elements, components, or combinations of the foregoing.
Furthermore, the terms "first," "second," "third," and the like are used solely to distinguish one from another and are not to be construed as indicating or implying relative importance.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which various embodiments of the present invention belong. The terms (such as those defined in commonly used dictionaries) should be interpreted as having a meaning that is consistent with their contextual meaning in the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein in various embodiments of the present invention.
Example 1
The embodiment 1 of the invention discloses a classification method for monitoring video spatio-temporal information fusion, which comprises the following steps as shown in figure 1:
s101, acquiring a sample video set; each video in the sample video set is marked with a behavior category and an influence value influencing the work efficiency level;
specifically, the behavior categories include: one or more of working on duty, not working on duty, playing mobile phone on duty, sleeping on duty and off duty;
the influence values corresponding to different behavior categories are related to the application scene.
Thus, the "acquiring a sample video set" in step S101 includes:
performing time sampling and space sampling for a certain number of times on the video clips with the segmented duration by using a behavior classification method of the double video streams; each video clip is marked with a behavior category and an influence value;
and taking the video clips subjected to time sampling and space sampling as a sample video set.
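The time/space sampling of segmented clips might be sketched as follows; the window length, crop size, and frame representation are illustrative assumptions, not parameters from the invention:

```python
import random

def sample_clip(frame_sizes, c_t=4, c_s=3, window=16, crop=112, seed=0):
    """Hypothetical sketch: draw c_t temporal window starts and, for each,
    c_s spatial crop offsets from a clip. Frames are modeled only by
    their (height, width) so the sketch stays self-contained."""
    rng = random.Random(seed)
    n = len(frame_sizes)
    samples = []
    for _ in range(c_t):
        start = rng.randrange(0, n - window)  # temporal sample
        h, w = frame_sizes[start]
        for _ in range(c_s):                  # spatial samples per time point
            y = rng.randrange(0, h - crop)
            x = rng.randrange(0, w - crop)
            samples.append({"t_start": start, "crop": (y, x, crop)})
    return samples

samples = sample_clip([(240, 320)] * 64)
print(len(samples))  # c_t * c_s = 12
```

Each labeled clip thus yields c_t × c_s sampled views, all inheriting the clip's behavior category and influence value.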
Specifically, for example, the behavior or posture of the service window staff mainly includes working on duty, not working on duty (drinking, chatting, inattention, etc.), playing a mobile phone, sleeping, lacking duty, etc., and a certain amount of operation definition classification labels can be performed on the behavior of the window staff according to the level of the influence on the working efficiency, such as:
Off duty = {x_6 : x_6 ∈ R_6};
Sleeping = {x_5 : x_5 ∈ R_5};
Playing mobile phone = {x_4 : x_4 ∈ R_4};
On-shift not working = {x_3 : x_3 ∈ R_3};
On-shift work conversation = {x_2 : x_2 ∈ R_2};
On-shift working = {x_1 : x_1 ∈ R_1};
where R_1 ∪ R_2 ∪ R_3 ∪ R_4 ∪ R_5 ∪ R_6 = R^k and k is a positive integer. When k = 1, x_i is a scalar value and x_1 < x_2 < x_3 < x_4 < x_5 < x_6; when k > 1, x_i is a vector value and ‖x_1‖ < ‖x_2‖ < ‖x_3‖ < ‖x_4‖ < ‖x_5‖ < ‖x_6‖. When two adjacent behaviors influence the working-efficiency level equally, one may set x_i = x_{i+1}. Each behavior category can also be further decomposed into a plurality of subclasses using the same quantization labeling process, and behavior influence levels may be ranked differently in different applications.
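In the scalar case (k = 1), the quantized labels above can be sketched as a simple mapping; the numeric values below are assumptions, only the ordering x_1 < x_2 < ... < x_6 comes from the text:

```python
# Illustrative scalar quantization (k = 1); the values are assumed,
# only their ordering follows the description.
BEHAVIOR_QUANT = {
    "on-shift working":      1.0,  # x_1
    "on-shift work talk":    2.0,  # x_2
    "on-shift not working":  3.0,  # x_3
    "playing mobile phone":  4.0,  # x_4
    "sleeping":              5.0,  # x_5
    "off duty":              6.0,  # x_6
}

vals = list(BEHAVIOR_QUANT.values())
assert all(a < b for a, b in zip(vals, vals[1:]))  # x_1 < ... < x_6 holds
```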
S102, randomly and uniformly selecting videos with various behavior categories in the sample video set;
s103, inputting the selected video into a preset classifier to carry out deep network weight training to obtain a trained classifier;
specifically, the video input into the preset classifier is a plurality of video segments which correspond to the same window and are distributed in different spaces.
Specifically, C is carried out on the video clip with the well-segmented duration T by utilizing a behavior classification method of the double video streamstSub time sum CsSub-space sampling; combining the quantization behavior labels corresponding to the video clips to obtain a plurality of fast and slow video samples of the same window video clip in different spatial distributions; and then, inputting the video sample into a classifier to carry out deep network weight training to obtain the classifier.
Step S104, importing the video to be identified into a trained classifier for prediction to obtain a plurality of preliminary prediction results corresponding to time points;
specifically, the step of importing a video to be identified into a trained classifier for prediction to obtain a plurality of preliminary prediction results corresponding to time points includes:
importing the video to be identified into a trained classifier;
and performing multiple spatial samplings and behavior classification on the video clips of the same window in the video to be identified through the trained classifier to obtain a plurality of preliminary prediction results for the same time point.
Specifically, after obtaining the classifier, C for a window video stream segmentsThe C can be obtained after the sub-space sampling is carried out and the behavior classification is carried outsAnd the classification category label result related to the behavior space at the same time sampling moment is the preliminary prediction result.
Step S105, carrying out chessboard distance calculation on the preliminary prediction result of each time point according to a behavior quantization value as a distance value to obtain a category label of the time point;
In addition, specifically, after the preliminary prediction results are obtained, the chessboard-distance calculation is performed on the spatial-dimension behavior prediction results (that is, the preliminary prediction results) of the video-stream segment, using the behavior quantization value as the distance value, to obtain the final category label of time sampling point t_u.

The category label is predicted based on the following formula:

L_{t_u} = max_{1 ≤ i ≤ c_s} ‖l_i(t_u)‖_{2n}

where L_{t_u} is the category label; l_i(t_u) is the i-th preliminary prediction result; when the influence value is a scalar, l_i(t_u) is a scalar value; when the influence value is a vector, l_i(t_u) is a vector value and is measured quantitatively by the vector 2n-norm (n > 0); t_u is the time point; and c_s is the number of spatial samples.
And S106, performing time-dimension behavior category fusion operation on the category labels at all time points to obtain a final behavior category prediction result of the video to be recognized.
Specifically, all time sampling points t are determined by considering the condition that the time variation amplitude of the station personnel behavior category is not significantuAfter the behavior category prediction result is obtained, a behavior category fusion operation similar to the time dimension of the space dimension mode can be performed to obtain a final behavior category prediction result of the video segment.
And performing behavior category fusion operation of a time dimension based on the following formula to obtain the final behavior category prediction result:
wherein L iss,tPredicting results for the final behavior categories; c. CtIs the number of time samples taken.
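The time-dimension fusion can be sketched the same way; a max-style aggregation mirroring the spatial step is an assumption, with scalar labels for brevity:

```python
def time_fuse(per_time_labels):
    """Fuse the c_t category labels L_{t_u} (already fused over space)
    into the final prediction L_{s,t}; max-style aggregation assumed."""
    return max(per_time_labels, key=abs)

print(time_fuse([1.0, 3.0, 3.0, 1.0]))  # 3.0
```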
Aiming at the problem of human behavior classification in current service-window surveillance video, the invention provides a behavior classification framework with spatio-temporal information fusion that realizes: classification of staff behavior over multi-scene, multi-service-window areas; quantization of behavior category information according to a preset level, such as the influence on working efficiency; and, by means of the dual-video-stream deep-learning behavior classification method together with the chessboard-distance and norm concepts, space-time-domain decision fusion of the quantized behavior category prediction results.
Example 2
For further explanation of the present invention, embodiment 2 of the present invention further discloses a classification apparatus for monitoring video temporal-spatial information fusion, which includes:
an obtaining module 201, configured to obtain a sample video set; each video in the sample video set is marked with a behavior category and an influence value influencing the work efficiency level;
a selecting module 202, configured to randomly and uniformly select videos of different behavior categories in the sample video set;
the training module 203 is used for inputting the selected video into a preset classifier to perform deep network weight training to obtain a trained classifier;
the preliminary prediction module 204 is configured to import a video to be identified into a trained classifier for prediction, so as to obtain multiple preliminary prediction results at corresponding time points;
a label module 205, configured to perform chessboard distance calculation on the preliminary prediction result at each time point according to a behavior quantization value as a distance value to obtain a category label of the time point;
and the fusion module 206 is configured to perform a time-dimension behavior category fusion operation on the category labels at all time points to obtain a final behavior category prediction result of the video to be identified.
In a specific embodiment, the behavior classification includes: one or more of working on duty, not working on duty, playing mobile phone on duty, sleeping on duty and off duty;
the influence values corresponding to different behavior classifications are related to the application scene.
In a specific embodiment, the obtaining module 201 is configured to:
performing time sampling and space sampling for a certain number of times on the video clips with the segmented duration by using a behavior classification method of the double video streams; each video clip is marked with a behavior category and an influence value;
and taking the video clips subjected to time sampling and space sampling as a sample video set.
In a specific embodiment, the video input to the preset classifier is a plurality of video segments corresponding to different spatial distributions of the same window.
In a specific embodiment, the preliminary prediction module 204 is configured to:

import the video to be identified into the trained classifier;

and perform multiple spatial samplings and behavior classification on the video clips of the same window in the video to be identified through the trained classifier to obtain a plurality of preliminary prediction results for the same time point.
In a specific embodiment, the category label is obtained from the preliminary prediction results of the trained classifier based on the following formula:

L_{t_u} = max_{1 ≤ i ≤ c_s} ‖l_i(t_u)‖_{2n}

where l_i(t_u) is a preliminary prediction result; when the influence value is a scalar, l_i(t_u) is a scalar value; when the influence value is a vector, l_i(t_u) is a vector value and is measured quantitatively by the vector 2n-norm (n > 0); t_u is the time point; and c_s is the number of preliminary prediction results at the same time point.
In a specific embodiment, the final behavior category prediction result is obtained by performing the time-dimension behavior category fusion operation based on the following formula:

L_{s,t} = max_{1 ≤ u ≤ c_t} L_{t_u}

where L_{s,t} is the final behavior category prediction result and c_t is the number of time samples.
Example 3
Embodiment 3 of the present invention further discloses a terminal, as shown in fig. 3, which includes a memory and a processor, where the memory stores a computer program, and the processor runs the computer program to enable the processor to execute the classification method for monitoring video spatiotemporal information fusion according to embodiment 1.
Example 4
Embodiment 4 of the present invention further discloses a storage medium, as shown in fig. 4, where the storage medium stores a computer program, and the computer program is executed by a processor to implement the classification method for monitoring video temporal-spatial information fusion described in embodiment 1.
In summary, embodiments of the present invention provide a classification method, device, and terminal for surveillance-video spatiotemporal information fusion, the method comprising: acquiring a sample video set, each video of which is labeled with a behavior category and an influence value affecting the work-efficiency level; randomly and uniformly selecting videos of each behavior category from the sample video set; inputting the selected videos into a preset classifier for deep-network weight training to obtain a trained classifier; importing the video to be identified into the trained classifier for prediction to obtain a plurality of preliminary prediction results corresponding to time points; performing a chessboard (Chebyshev) distance calculation on the preliminary prediction results of each time point, using the behavior quantization values as distance values, to obtain a category label for that time point; and performing a time-dimension behavior category fusion operation on the category labels of all time points to obtain the final behavior category prediction result for the video to be identified. In this scheme, the behavior categories are quantized, so that decision fusion over behavior categories can be obtained by distance calculation; the quantization operation adopts a vector-space concept, making category expansion and category aggregation possible; behavior category quantization unifies behavior category fusion across the space-time dimensions; and the behavior category prediction is adapted to the characteristics of service-window video streams, giving it data pertinence.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method can be implemented in other ways. The apparatus embodiments described above are merely illustrative and, for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, each functional module or unit in each embodiment of the present invention may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention or a part of the technical solution that contributes to the prior art in essence can be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a smart phone, a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention.
Claims (8)
1. A classification method for monitoring video space-time information fusion is characterized by comprising the following steps:
acquiring a sample video set; each video in the sample video set is marked with a behavior category and an influence value influencing the work efficiency level;
randomly and uniformly selecting videos with various behavior categories in the sample video set;
inputting the selected video into a preset classifier to carry out deep network weight training to obtain a trained classifier;
importing the video to be identified into a trained classifier for prediction to obtain a plurality of preliminary prediction results corresponding to time points; wherein this step comprises: importing the video to be identified into the trained classifier; and performing multiple spatial samplings and behavior classification on video clips of the same window in the video to be identified through the trained classifier, to obtain a plurality of preliminary prediction results for the same time point;
calculating a chessboard distance, using the behavior quantization values corresponding to the preliminary prediction results of each time point as distance values, to obtain a category label for that time point;
performing time-dimension behavior category fusion operation on the category labels at all time points to obtain a final behavior category prediction result of the video to be identified;
the class label is predicted based on the following formula:
where the first quantity is the category label and the second is a preliminary prediction result; when the influence value is a scalar, the prediction is a scalar value; when the influence value is a vector, the prediction is a vector value, quantitatively measured by its vector 2n-norm (n > 0); t_u is a time point; and c_s is the number of spatial samples.
2. The method of claim 1, wherein the behavior categories include one or more of: working on duty, not working on duty, playing with a mobile phone on duty, sleeping on duty, and off duty;
the influence values corresponding to the different behavior categories depend on the application scenario.
3. The method of claim 1, wherein said obtaining a sample video set comprises:
performing a certain number of temporal sampling and spatial sampling passes on duration-segmented video clips by using a dual-video-stream behavior classification method, each video clip being labeled with a behavior category and an influence value;
and taking the temporally and spatially sampled video clips as the sample video set.
4. The method of claim 1 or 3, wherein the video input to the preset classifier is a plurality of video segments corresponding to different spatial distributions of the same window.
6. A classification device for monitoring video temporal-spatial information fusion is characterized by comprising:
the acquisition module is used for acquiring a sample video set; each video in the sample video set is marked with a behavior category and an influence value influencing the work efficiency level;
the selection module is used for randomly and uniformly selecting videos with various behavior categories in the sample video set;
the training module is used for inputting the selected video into a preset classifier to carry out deep network weight training to obtain a trained classifier;
the preliminary prediction module is used for importing the video to be identified into the trained classifier for prediction to obtain a plurality of preliminary prediction results corresponding to time points; the preliminary prediction module imports the video to be identified into the trained classifier, and performs multiple spatial samplings and behavior classification on video clips of the same window in the video to be identified through the trained classifier, to obtain a plurality of preliminary prediction results for the same time point;
the label module is used for calculating a chessboard distance, using the behavior quantization values corresponding to the preliminary prediction results of each time point as distance values, to obtain a category label for that time point;
the fusion module is used for performing time-dimension behavior category fusion operation on the category labels at all time points to obtain a final behavior category prediction result of the video to be identified;
the class label is predicted based on the following formula:
where the first quantity is the category label and the second is a preliminary prediction result; when the influence value is a scalar, the prediction is a scalar value; when the influence value is a vector, the prediction is a vector value, quantitatively measured by its vector 2n-norm (n > 0); t_u is a time point; and c_s is the number of spatial samples.
7. A terminal comprising a memory storing a computer program and a processor executing the computer program to cause the processor to perform the classification method for surveillance video spatiotemporal information fusion according to any one of claims 1-5.
8. A storage medium having stored thereon a computer program which, when executed by a processor, implements a surveillance video spatiotemporal information fusion classification method according to any one of claims 1-5.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110268368 | 2021-03-12 | ||
CN2021102683686 | 2021-03-12 |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113469142A CN113469142A (en) | 2021-10-01 |
CN113469142B true CN113469142B (en) | 2022-01-14 |
Family
ID=77867979
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110932947.6A Active CN113469142B (en) | 2021-03-12 | 2021-09-01 | Classification method, device and terminal for monitoring video time-space information fusion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113469142B (en) |
Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108174165A (en) * | 2018-01-17 | 2018-06-15 | 重庆览辉信息技术有限公司 | Electric power safety operation and O&M intelligent monitoring system and method |
CN108664922A (en) * | 2018-05-10 | 2018-10-16 | 东华大学 | A kind of infrared video Human bodys' response method based on personal safety |
CN109583335A (en) * | 2018-11-16 | 2019-04-05 | 中山大学 | A kind of video human Activity recognition method based on Spatial-temporal Information Fusion |
CN109886165A (en) * | 2019-01-23 | 2019-06-14 | 中国科学院重庆绿色智能技术研究院 | A kind of action video extraction and classification method based on moving object detection |
CN109961037A (en) * | 2019-03-20 | 2019-07-02 | 中共中央办公厅电子科技学院(北京电子科技学院) | A kind of examination hall video monitoring abnormal behavior recognition methods |
CN110032926A (en) * | 2019-02-22 | 2019-07-19 | 哈尔滨工业大学(深圳) | A kind of video classification methods and equipment based on deep learning |
CN110532857A (en) * | 2019-07-16 | 2019-12-03 | 杭州电子科技大学 | Based on the Activity recognition image analysis system under multi-cam |
CN110852295A (en) * | 2019-10-15 | 2020-02-28 | 深圳龙岗智能视听研究院 | Video behavior identification method based on multitask supervised learning |
CN111062356A (en) * | 2019-12-26 | 2020-04-24 | 沈阳理工大学 | Method for automatically identifying human body action abnormity from monitoring video |
CN111259782A (en) * | 2020-01-14 | 2020-06-09 | 北京大学 | Video behavior identification method based on mixed multi-scale time sequence separable convolution operation |
CN111626265A (en) * | 2020-06-12 | 2020-09-04 | 上海依图网络科技有限公司 | Multi-camera downlink identification method and device and computer readable storage medium |
CN111626245A (en) * | 2020-06-01 | 2020-09-04 | 安徽大学 | Human behavior identification method based on video key frame |
CN111798018A (en) * | 2019-04-09 | 2020-10-20 | Oppo广东移动通信有限公司 | Behavior prediction method, behavior prediction device, storage medium and electronic equipment |
CN112396093A (en) * | 2020-10-29 | 2021-02-23 | 中国汽车技术研究中心有限公司 | Driving scene classification method, device and equipment and readable storage medium |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107480642A (en) * | 2017-08-18 | 2017-12-15 | 深圳市唯特视科技有限公司 | A kind of video actions recognition methods based on Time Domain Piecewise network |
CN110188668B (en) * | 2019-05-28 | 2020-09-25 | 复旦大学 | Small sample video action classification method |
CN110647903A (en) * | 2019-06-20 | 2020-01-03 | 杭州趣维科技有限公司 | Short video frequency classification method |
CN111178319A (en) * | 2020-01-06 | 2020-05-19 | 山西大学 | Video behavior identification method based on compression reward and punishment mechanism |
- 2021-09-01 CN CN202110932947.6A patent/CN113469142B/en active Active
Non-Patent Citations (2)
Title |
---|
Research on a Video Behavior Recognition Method Based on a Compression Reward-Punishment Mechanism; Zhang Lihong et al.; Journal of Test and Measurement Technology; 2020-10-31; Vol. 34, No. 05; pp. 418-424 *
Temporal Grouping Deep Network Behavior Recognition Algorithm Based on an Attention Mechanism; Hu Zhengping et al.; Pattern Recognition and Artificial Intelligence; 2019-10-31; Vol. 32, No. 10; pp. 892-900 *
Also Published As
Publication number | Publication date |
---|---|
CN113469142A (en) | 2021-10-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Goyette et al. | A novel video dataset for change detection benchmarking | |
Boom et al. | A research tool for long-term and continuous analysis of fish assemblage in coral-reefs using underwater camera footage | |
CA3066029A1 (en) | Image feature acquisition | |
You et al. | Traffic accident benchmark for causality recognition | |
CN104254873A (en) | Alert volume normalization in a video surveillance system | |
Yang et al. | Moving object detection for dynamic background scenes based on spatiotemporal model | |
US11275970B2 (en) | Systems and methods for distributed data analytics | |
Yang et al. | Anomaly detection in moving crowds through spatiotemporal autoencoding and additional attention | |
Ratre et al. | Tucker visual search-based hybrid tracking model and Fractional Kohonen Self-Organizing Map for anomaly localization and detection in surveillance videos | |
CN112507860A (en) | Video annotation method, device, equipment and storage medium | |
Tralic et al. | Video frame copy-move forgery detection based on cellular automata and local binary patterns | |
Gupta et al. | Accident detection using time-distributed model in videos | |
Vijayan et al. | A fully residual convolutional neural network for background subtraction | |
Selvaraj et al. | L1 norm based pedestrian detection using video analytics technique | |
Alashban et al. | Single convolutional neural network with three layers model for crowd density estimation | |
CN115187884A (en) | High-altitude parabolic identification method and device, electronic equipment and storage medium | |
CN113469142B (en) | Classification method, device and terminal for monitoring video time-space information fusion | |
Patel et al. | Vehicle tracking and monitoring in surveillance video | |
CN109614893B (en) | Intelligent abnormal behavior track identification method and device based on situation reasoning | |
Zhao | Deep Learning in Video Anomaly Detection and Its Applications | |
Behera et al. | Characterization of dense crowd using gibbs entropy | |
Kwak et al. | Human action classification and unusual action recognition algorithm for intelligent surveillance system | |
David et al. | Crime Forecasting using Interpretable Regression Techniques | |
Tani et al. | Frame-wise action recognition training framework for skeleton-based anomaly behavior detection | |
Srivastava | Machine Learning Based Crowd Behaviour Analysis and Prediction |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CP03 | Change of name, title or address | ||
Address after: 030013 Room 707, Block A, Gaoxin Guozhi Building, No. 3, Dong'e'er Lane, Taiyuan Xuefu Park, Shanxi Comprehensive Reform Demonstration Zone, Taiyuan City, Shanxi Province
Patentee after: Changhe Information Co.,Ltd.
Address before: 030000 Room 707, Block A, Gaoxin Guozhi Building, No. 3, Dongyi Second Lane, Taiyuan Xuefu Park, Shanxi Comprehensive Reform Demonstration Zone, Taiyuan City, Shanxi Province
Patentee before: Shanxi Changhe Technology Co.,Ltd.