CN113469142B - Classification method, device and terminal for monitoring video time-space information fusion - Google Patents

Classification method, device and terminal for monitoring video time-space information fusion

Info

Publication number
CN113469142B
CN113469142B (application CN202110932947.6A)
Authority
CN
China
Prior art keywords
video
behavior
category
time
prediction
Prior art date
Legal status
Active
Application number
CN202110932947.6A
Other languages
Chinese (zh)
Other versions
CN113469142A (en)
Inventor
张煇
剌昊跃
柳世豪
陈宏涛
Current Assignee
Changhe Information Co ltd
Original Assignee
Shanxi Changhe Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Shanxi Changhe Technology Co ltd
Publication of CN113469142A
Application granted
Publication of CN113469142B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval of video data
    • G06F16/75 Clustering; Classification
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 Retrieval using metadata automatically derived from the content
    • G06F16/7847 Retrieval using metadata automatically derived from the content, using low-level visual features of the video content
    • G06F16/7867 Retrieval using metadata, using information manually generated, e.g. tags, keywords, comments, title and artist information, manually generated time, location and usage information, user ratings
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/18 Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Library & Information Science (AREA)
  • Software Systems (AREA)
  • Signal Processing (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the invention disclose a classification method, device and terminal for surveillance-video spatio-temporal information fusion, wherein the method comprises the following steps: acquiring a sample video set; randomly and uniformly selecting videos of each behavior category from the sample video set; inputting the selected videos into a preset classifier for deep network weight training to obtain a trained classifier; importing the video to be identified into the trained classifier for prediction to obtain a plurality of preliminary prediction results at corresponding time points; performing a chessboard distance calculation on the preliminary prediction results of each time point, with the behavior quantization values taken as the distance values, to obtain a category label for that time point; and performing a time-dimension behavior category fusion operation on the category labels of all time points to obtain the final behavior category prediction result of the video to be identified. In this scheme, behavior category quantization is used to unify behavior category fusion across the spatial and temporal dimensions, and the prediction is targeted at the characteristics of the service-window video stream.

Description

Classification method, device and terminal for monitoring video time-space information fusion
Technical Field
The invention relates to the technical field of video analysis and deep learning, in particular to a classification method, a device and a terminal for monitoring video temporal-spatial information fusion.
Background
Automatic analysis of service-window surveillance video can effectively improve the working efficiency of the service window, protect the legitimate rights and interests of both staff and the public, and is of obvious significance for improving the service level and public image of government departments and service enterprises.
At present, surveillance-video-based automatic behavior detection and classification technology is used to analyze and process the station information in surveillance video, which can free people from complex and time-consuming work such as manually reviewing video for behavior retrieval and behavior analysis. The existing surveillance-video-based automatic behavior detection and classification technologies include the following:
One is optical-flow-based behavior detection, which relies on the optical-flow differences produced by changes in human behavior. Specifically, according to the trend or trajectory of the optical-flow change, features are extracted from or encoded over the video images to obtain classification data for different behavior types, and this data is used to train a classification algorithm such as an SVM (support vector machine) to realize behavior classification. The problem with this method is that feature extraction and selection rely on general image-processing techniques such as histogram analysis, gradient analysis and optical-flow tracking rather than on analyzing and modeling behavioral differences, so the analysis is easily disturbed by other factors such as object motion and the detection accuracy is insufficient.
Another is a skeleton-based method, which uses human skeleton feature information to make the classification and recognition of human behavior more targeted. Key body parts such as the head, shoulders and hands are manually defined and automatically detected to form a specific keypoint structure, which completes human detection and recognition tasks well under unoccluded, normal-viewing-angle conditions and provides a strong basis for behavior classification. However, the personnel posture data captured by service-window surveillance video cannot always satisfy these conditions: in service-window data the keypoints are commonly occluded and human detection is difficult, leading to serious missed detections of personnel, so the automatic posture recognition and classification task cannot be fulfilled.
A third is the dual-video-stream deep-neural-network behavior classification method, which imitates the way human visual cells process and abstract visual signals and relies on learning and training a multi-layer complex network. However, because staff under video surveillance hold fixed postures for long periods and move over a large working area, accurate behavior labeling and classification remain difficult, and problems such as a single video segment containing the behavior categories of multiple persons and conflicting spatial behavior categories are still prominent. Moreover, unlike the classification of a single individual's behavior, a service window often involves interaction among staff and between staff and the public, which makes the spatial range, category labeling and classification of behaviors even more complicated.
Thus, there is a need for a better solution to the problems of the prior art.
Disclosure of Invention
In view of this, the invention provides a classification method, a device and a terminal for monitoring video temporal-spatial information fusion, which are used for solving the problems in the prior art.
Specifically, the present invention proposes the following specific examples:
the embodiment of the invention provides a classification method for monitoring video space-time information fusion, which comprises the following steps:
acquiring a sample video set; each video in the sample video set is marked with a behavior category and an influence value influencing the work efficiency level;
randomly and uniformly selecting videos with various behavior categories in the sample video set;
inputting the selected video into a preset classifier to carry out deep network weight training to obtain a trained classifier;
importing the video to be identified into a trained classifier for prediction to obtain a plurality of preliminary prediction results corresponding to time points;
performing a chessboard distance calculation on the preliminary prediction results of each time point, with the behavior quantization values taken as the distance values, to obtain a category label for that time point;
and performing time-dimension behavior category fusion operation on the category labels at all time points to obtain a final behavior category prediction result of the video to be identified.
In a specific embodiment, the behavior categories include: one or more of working on duty, not working on duty, playing mobile phone on duty, sleeping on duty and off duty;
the influence values corresponding to different behavior categories are related to the application scene.
In a specific embodiment, the acquiring a sample video set includes:
performing a certain number of temporal samplings and spatial samplings on video clips of pre-segmented duration by using the dual-video-stream behavior classification method; each video clip is marked with a behavior category and an influence value;
and taking the video clips subjected to time sampling and space sampling as a sample video set.
In a specific embodiment, the video input to the preset classifier is a plurality of video segments corresponding to different spatial distributions of the same window.
In a specific embodiment, the step of introducing a video to be identified into a trained classifier for prediction to obtain a plurality of preliminary prediction results corresponding to time points includes:
importing the video to be identified into a trained classifier;
and performing multiple spatial samplings and behavior classification on the video clips of the same window in the video to be identified through the trained classifier, so as to obtain a plurality of preliminary prediction results for the same time point.
In a specific embodiment, the category label is predicted based on the following formula:

L_{t_u} = max_{1 ≤ i ≤ c_s} ‖x̂_{t_u,i}‖

wherein L_{t_u} is the category label of time point t_u; x̂_{t_u,i} is the i-th preliminary prediction result at that time point; when the influence value is a scalar, x̂_{t_u,i} is a scalar value; when the influence value is a vector, x̂_{t_u,i} is a vector value and ‖x̂_{t_u,i}‖ is quantitatively measured by a vector 2n-norm (n > 0); t_u is the time point; and c_s is the number of spatial samples.
In a specific embodiment, the final behavior category prediction result is obtained by performing a behavior category fusion operation in the time dimension based on the following formula:

L_{s,t} = max_{1 ≤ u ≤ c_t} L_{t_u}

wherein L_{s,t} is the final behavior category prediction result, and c_t is the number of temporal samples.
The embodiment of the invention also provides a classification device for monitoring video temporal-spatial information fusion, which comprises:
the acquisition module is used for acquiring a sample video set; each video in the sample video set is marked with a behavior category and an influence value influencing the work efficiency level;
the selection module is used for randomly and uniformly selecting videos with various behavior categories in the sample video set;
the training module is used for inputting the selected video into a preset classifier to carry out deep network weight training to obtain a trained classifier;
the preliminary prediction module is used for importing the video to be recognized into the trained classifier for prediction to obtain a plurality of preliminary prediction results corresponding to time points;
the label module is used for performing a chessboard distance calculation on the preliminary prediction results of each time point, with the behavior quantization values taken as the distance values, to obtain a category label for that time point;
and the fusion module is used for performing time-dimension behavior category fusion operation on the category labels at all time points to obtain a final behavior category prediction result of the video to be identified.
The embodiment of the invention also provides a terminal, which comprises a memory and a processor, wherein the memory stores a computer program, and the processor runs the computer program to enable the processor to execute the classification method for the monitoring video spatiotemporal information fusion.
The embodiment of the invention also provides a storage medium, wherein a computer program is stored on the storage medium, and when being executed by a processor, the computer program realizes the classification method for the monitoring video time-space information fusion.
Therefore, the embodiments of the invention provide a classification method, device and terminal for surveillance-video spatio-temporal information fusion, wherein the method comprises: acquiring a sample video set, each video in the sample video set being marked with a behavior category and an influence value reflecting its degree of impact on work efficiency; randomly and uniformly selecting videos of each behavior category from the sample video set; inputting the selected videos into a preset classifier for deep network weight training to obtain a trained classifier; importing the video to be identified into the trained classifier for prediction to obtain a plurality of preliminary prediction results at corresponding time points; performing a chessboard distance calculation on the preliminary prediction results of each time point, with the behavior quantization values taken as the distance values, to obtain a category label for that time point; and performing a time-dimension behavior category fusion operation on the category labels of all time points to obtain the final behavior category prediction result of the video to be identified. In this scheme, the behavior categories are quantized, so that decision-level fusion of behavior categories can be obtained through distance calculation; the vector-space concept adopted in the quantization operation makes category expansion and category aggregation possible; behavior category quantization unifies behavior category fusion across the spatial and temporal dimensions; and the behavior category prediction is adapted to the characteristics of the service-window video stream, giving it data pertinence.
Drawings
In order to more clearly illustrate the technical solution of the present invention, the drawings required to be used in the embodiments will be briefly described below, and it should be understood that the following drawings only illustrate some embodiments of the present invention, and therefore should not be considered as limiting the scope of the present invention. Like components are numbered similarly in the various figures.
FIG. 1 is a flow chart illustrating a classification method for temporal-spatial information fusion of surveillance videos according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a classification apparatus for monitoring video temporal-spatial information fusion according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a terminal according to an embodiment of the present invention;
fig. 4 shows a schematic structural diagram of a storage medium according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments.
The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
Hereinafter, the terms "including", "having" and their derivatives, which may be used in various embodiments of the present invention, are only intended to indicate specific features, numbers, steps, operations, elements, components, or combinations of the foregoing, and should not be construed as excluding in advance the existence or possible addition of one or more other features, numbers, steps, operations, elements, components, or combinations of the foregoing.
Furthermore, the terms "first," "second," "third," and the like are used solely to distinguish one from another and are not to be construed as indicating or implying relative importance.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which various embodiments of the present invention belong. The terms (such as those defined in commonly used dictionaries) should be interpreted as having a meaning that is consistent with their contextual meaning in the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein in various embodiments of the present invention.
Example 1
The embodiment 1 of the invention discloses a classification method for monitoring video spatio-temporal information fusion, which comprises the following steps as shown in figure 1:
s101, acquiring a sample video set; each video in the sample video set is marked with a behavior category and an influence value influencing the work efficiency level;
specifically, the behavior categories include: one or more of working on duty, not working on duty, playing mobile phone on duty, sleeping on duty and off duty;
the influence values corresponding to different behavior categories are related to the application scene.
Thus, the "acquiring a sample video set" in step S101 includes:
performing a certain number of temporal samplings and spatial samplings on video clips of pre-segmented duration by using the dual-video-stream behavior classification method; each video clip is marked with a behavior category and an influence value;
and taking the video clips subjected to time sampling and space sampling as a sample video set.
Specifically, for example, the behavior or posture of service-window staff mainly includes working on duty, being on duty but not working (drinking water, chatting, being inattentive, etc.), playing a mobile phone, sleeping, being off duty, and so on. Quantized classification labels can be defined for the behaviors of window staff according to their degree of influence on work efficiency, for example:

Off duty = {x_6 : x_6 ∈ R_6};
Sleeping = {x_5 : x_5 ∈ R_5};
Playing mobile phone = {x_4 : x_4 ∈ R_4};
On duty but not working = {x_3 : x_3 ∈ R_3};
On duty, working and conversing = {x_2 : x_2 ∈ R_2};
On duty and working = {x_1 : x_1 ∈ R_1};

wherein R_1 ∪ R_2 ∪ R_3 ∪ R_4 ∪ R_5 ∪ R_6 = R^k, and k is a positive integer. When k = 1, x_i is a scalar value and x_1 < x_2 < x_3 < x_4 < x_5 < x_6; when k > 1, x_i is a vector value and the same ordering holds for the vector norms, ‖x_1‖ < ‖x_2‖ < … < ‖x_6‖, measured by a vector 2n-norm (n > 0). When two adjacent behaviors have the same degree of influence on work efficiency, one may set x_i = x_{i+1}; each behavior category can also be further decomposed into several subclasses using the same quantized labeling process; and the levels of behavioral influence on work efficiency may be ranked differently in different applications.
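By way of illustration only, the following Python sketch shows one way such a quantized labeling could be realized for the scalar case (k = 1); the dictionary keys, the concrete numeric values and the function name are assumptions chosen solely to respect the ordering x_1 < x_2 < … < x_6 and are not taken from the patent itself.

    # Illustrative scalar quantization (k = 1) of the behavior categories above.
    # The numeric values are assumptions; only their ordering matters here.
    BEHAVIOR_QUANTIZATION = {
        "on_duty_working": 1.0,              # x_1
        "on_duty_working_conversing": 2.0,   # x_2
        "on_duty_not_working": 3.0,          # x_3
        "playing_mobile_phone": 4.0,         # x_4
        "sleeping": 5.0,                     # x_5
        "off_duty": 6.0,                     # x_6
    }

    def quantize(behavior_label: str) -> float:
        """Map a behavior category label to its quantization value x_i."""
        return BEHAVIOR_QUANTIZATION[behavior_label]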
S102, randomly and uniformly selecting videos with various behavior categories in the sample video set;
s103, inputting the selected video into a preset classifier to carry out deep network weight training to obtain a trained classifier;
specifically, the video input into the preset classifier is a plurality of video segments which correspond to the same window and are distributed in different spaces.
Specifically, C is carried out on the video clip with the well-segmented duration T by utilizing a behavior classification method of the double video streamstSub time sum CsSub-space sampling; combining the quantization behavior labels corresponding to the video clips to obtain a plurality of fast and slow video samples of the same window video clip in different spatial distributions; and then, inputting the video sample into a classifier to carry out deep network weight training to obtain the classifier.
Step S104, importing the video to be identified into a trained classifier for prediction to obtain a plurality of preliminary prediction results corresponding to time points;
specifically, the step of importing a video to be identified into a trained classifier for prediction to obtain a plurality of preliminary prediction results corresponding to time points includes:
importing the video to be identified into a trained classifier;
Multiple spatial samplings and behavior classification are performed on the video clips of the same window in the video to be identified through the trained classifier, so as to obtain a plurality of preliminary prediction results for the same time point.
Specifically, after obtaining the classifier, C for a window video stream segmentsThe C can be obtained after the sub-space sampling is carried out and the behavior classification is carried outsAnd the classification category label result related to the behavior space at the same time sampling moment is the preliminary prediction result.
Step S105, performing a chessboard distance calculation on the preliminary prediction results of each time point, with the behavior quantization values taken as the distance values, to obtain a category label for that time point;
Specifically, after the preliminary prediction results are obtained, the chessboard distance calculation is performed on the spatial-dimension behavior prediction results (that is, the preliminary prediction results) of the video-stream segment, with the behavior quantization values taken as the distance values, to obtain the final category label of the time sampling point t_u:
The category label is predicted based on the following formula:

L_{t_u} = max_{1 ≤ i ≤ c_s} ‖x̂_{t_u,i}‖

wherein L_{t_u} is the category label of time point t_u; x̂_{t_u,i} is the i-th preliminary prediction result at that time point; when the influence value is a scalar, x̂_{t_u,i} is a scalar value; when the influence value is a vector, x̂_{t_u,i} is a vector value and ‖x̂_{t_u,i}‖ is quantitatively measured by a vector 2n-norm (n > 0); t_u is the time point; and c_s is the number of spatial samples.
And S106, performing time-dimension behavior category fusion operation on the category labels at all time points to obtain a final behavior category prediction result of the video to be recognized.
Specifically, considering that the behavior category of station personnel does not vary sharply over time, after the behavior category prediction results of all time sampling points t_u are obtained, a behavior category fusion operation in the time dimension, analogous to the spatial-dimension fusion, can be performed to obtain the final behavior category prediction result of the video segment.
The final behavior category prediction result is obtained by performing the time-dimension behavior category fusion operation based on the following formula:

L_{s,t} = max_{1 ≤ u ≤ c_t} L_{t_u}

wherein L_{s,t} is the final behavior category prediction result, and c_t is the number of temporal samples.
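Read in this way, both fusion steps reduce to maximum operations over the quantized predictions, since the chessboard (Chebyshev) distance with the quantization values as distance values selects the largest of them. The short Python sketch below illustrates this reading; the function names, the scalar example values and the interpretation of the chessboard-distance fusion as a maximum are assumptions made for illustration.

    def fuse_spatial(preds_at_t, norm=abs):
        """Spatial fusion at one time point t_u: chessboard-distance fusion over
        the c_s quantized preliminary predictions, taken here as the prediction
        with the largest quantization value. For vector-valued quantizations a
        2n-norm would be passed as norm."""
        return max(preds_at_t, key=norm)

    def fuse_temporal(labels_per_t):
        """Time-dimension fusion over the c_t per-time-point category labels,
        performed in the same manner as the spatial fusion."""
        return max(labels_per_t)

    # Illustrative usage with the scalar quantization sketched earlier:
    # predictions[u] holds the c_s quantized predictions at time point t_u.
    predictions = [[1.0, 1.0, 3.0], [1.0, 2.0, 1.0], [1.0, 1.0, 1.0]]
    per_time_labels = [fuse_spatial(p) for p in predictions]   # [3.0, 2.0, 1.0]
    final_label = fuse_temporal(per_time_labels)               # 3.0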
Aiming at the problem of classifying human behavior in current service-window surveillance video, the invention provides a spatio-temporal-information-fusion behavior classification framework that realizes: classification of staff behavior in multi-scene, multi-service-window areas; quantization of behavior category information according to a preset level, such as the degree of influence on work efficiency; and spatio-temporal decision fusion of the quantized behavior category prediction results by means of a dual-video-stream deep-learning behavior classification method together with the chessboard distance and norm concepts.
Example 2
For further explanation of the present invention, embodiment 2 of the present invention further discloses a classification apparatus for monitoring video temporal-spatial information fusion, which includes:
an obtaining module 201, configured to obtain a sample video set; each video in the sample video set is marked with a behavior category and an influence value influencing the work efficiency level;
a selecting module 202, configured to randomly and uniformly select videos of different behavior categories in the sample video set;
the training module 203 is used for inputting the selected video into a preset classifier to perform deep network weight training to obtain a trained classifier;
the preliminary prediction module 204 is configured to import a video to be identified into a trained classifier for prediction, so as to obtain multiple preliminary prediction results at corresponding time points;
a label module 205, configured to perform a chessboard distance calculation on the preliminary prediction results of each time point, with the behavior quantization values taken as the distance values, to obtain a category label for that time point;
and the fusion module 206 is configured to perform a time-dimension behavior category fusion operation on the category labels at all time points to obtain a final behavior category prediction result of the video to be identified.
In a specific embodiment, the behavior classification includes: one or more of working on duty, not working on duty, playing mobile phone on duty, sleeping on duty and off duty;
the influence values corresponding to different behavior classifications are related to the application scene.
In a specific embodiment, the obtaining module 201 is configured to:
performing time sampling and space sampling for a certain number of times on the video clips with the segmented duration by using a behavior classification method of the double video streams; each video clip is marked with a behavior category and an influence value;
and taking the video clips subjected to time sampling and space sampling as a sample video set.
In a specific embodiment, the video input to the preset classifier is a plurality of video segments corresponding to different spatial distributions of the same window.
In a specific embodiment, the preliminary prediction module 204 is configured to:
import the video to be identified into the trained classifier;
and perform multiple spatial samplings and behavior classification on the video clips of the same window in the video to be identified through the trained classifier, so as to obtain a plurality of preliminary prediction results for the same time point.
In a specific embodiment, the category label of each time point is obtained from the preliminary prediction results based on the following formula:

L_{t_u} = max_{1 ≤ i ≤ c_s} ‖x̂_{t_u,i}‖

wherein L_{t_u} is the category label of time point t_u; x̂_{t_u,i} is a preliminary prediction result; when the influence value is a scalar, x̂_{t_u,i} is a scalar value; when the influence value is a vector, x̂_{t_u,i} is a vector value and ‖x̂_{t_u,i}‖ is quantitatively measured by a vector 2n-norm (n > 0); t_u is the time point; and c_s is the number of preliminary prediction results for the same time point (i.e., the number of spatial samples).
In a specific embodiment, the final behavior category prediction result is obtained by performing a behavior category fusion operation in the time dimension based on the following formula:

L_{s,t} = max_{1 ≤ u ≤ c_t} L_{t_u}

wherein L_{s,t} is the final behavior category prediction result, and c_t is the number of temporal samples.
Example 3
Embodiment 3 of the present invention further discloses a terminal, as shown in fig. 3, which includes a memory and a processor, where the memory stores a computer program, and the processor runs the computer program to enable the processor to execute the classification method for monitoring video spatiotemporal information fusion according to embodiment 1.
Example 4
Embodiment 4 of the present invention further discloses a storage medium, as shown in fig. 4, where the storage medium stores a computer program, and the computer program is executed by a processor to implement the classification method for monitoring video temporal-spatial information fusion described in embodiment 1.
Therefore, the embodiments of the invention provide a classification method, device and terminal for surveillance-video spatio-temporal information fusion, wherein the method comprises: acquiring a sample video set, each video in the sample video set being marked with a behavior category and an influence value reflecting its degree of impact on work efficiency; randomly and uniformly selecting videos of each behavior category from the sample video set; inputting the selected videos into a preset classifier for deep network weight training to obtain a trained classifier; importing the video to be identified into the trained classifier for prediction to obtain a plurality of preliminary prediction results at corresponding time points; performing a chessboard distance calculation on the preliminary prediction results of each time point, with the behavior quantization values taken as the distance values, to obtain a category label for that time point; and performing a time-dimension behavior category fusion operation on the category labels of all time points to obtain the final behavior category prediction result of the video to be identified. In this scheme, the behavior categories are quantized, so that decision-level fusion of behavior categories can be obtained through distance calculation; the vector-space concept adopted in the quantization operation makes category expansion and category aggregation possible; behavior category quantization unifies behavior category fusion across the spatial and temporal dimensions; and the behavior category prediction is adapted to the characteristics of the service-window video stream, giving it data pertinence.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method can be implemented in other ways. The apparatus embodiments described above are merely illustrative and, for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, each functional module or unit in each embodiment of the present invention may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention or a part of the technical solution that contributes to the prior art in essence can be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a smart phone, a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention.

Claims (8)

1. A classification method for monitoring video space-time information fusion is characterized by comprising the following steps:
acquiring a sample video set; each video in the sample video set is marked with a behavior category and an influence value influencing the work efficiency level;
randomly and uniformly selecting videos with various behavior categories in the sample video set;
inputting the selected video into a preset classifier to carry out deep network weight training to obtain a trained classifier;
importing the video to be identified into a trained classifier for prediction to obtain a plurality of preliminary prediction results corresponding to time points; the step of importing the video to be identified into the trained classifier for prediction to obtain a plurality of preliminary prediction results corresponding to the time points includes: importing the video to be identified into a trained classifier; carrying out multiple times of spatial sampling and behavior classification on video segments of the same window in the video to be recognized through a trained classifier to obtain multiple initial prediction results of the same time point;
taking the behavior quantization value corresponding to the preliminary prediction result of each time point as a distance value to calculate chessboard distance to obtain a category label of the time point;
performing time-dimension behavior category fusion operation on the category labels at all time points to obtain a final behavior category prediction result of the video to be identified;
the category label is predicted based on the following formula:

L_{t_u} = max_{1 ≤ i ≤ c_s} ‖x̂_{t_u,i}‖

wherein L_{t_u} is the category label; x̂_{t_u,i} is a preliminary prediction result; when the influence value is a scalar, x̂_{t_u,i} is a scalar value; when the influence value is a vector, x̂_{t_u,i} is a vector value and ‖x̂_{t_u,i}‖ is quantitatively measured by a vector 2n-norm (n > 0); t_u is a time point; and c_s is the number of spatial samples.
2. The method of claim 1, wherein the behavior categories include: one or more of working on duty, not working on duty, playing mobile phone on duty, sleeping on duty and off duty;
the influence values corresponding to different behavior categories are related to the application scene.
3. The method of claim 1, wherein said obtaining a sample video set comprises:
performing time sampling and space sampling for a certain number of times on the video clips with the segmented duration by using a behavior classification method of the double video streams; each video clip is marked with a behavior category and an influence value;
and taking the video clips subjected to time sampling and space sampling as a sample video set.
4. A method as claimed in claim 1 or 3, wherein the video input to the preset classifier is a plurality of video segments corresponding to different spatial distributions of the same window.
5. The method of claim 1, wherein the final behavior category prediction result is obtained by performing a behavior category fusion operation in the time dimension based on the following formula:

L_{s,t} = max_{1 ≤ u ≤ c_t} L_{t_u}

wherein L_{s,t} is the final behavior category prediction result, and c_t is the number of temporal samples.
6. A classification device for monitoring video temporal-spatial information fusion is characterized by comprising:
the acquisition module is used for acquiring a sample video set; each video in the sample video set is marked with a behavior category and an influence value influencing the work efficiency level;
the selection module is used for randomly and uniformly selecting videos with various behavior categories in the sample video set;
the training module is used for inputting the selected video into a preset classifier to carry out deep network weight training to obtain a trained classifier;
the preliminary prediction module is used for importing the video to be recognized into the trained classifier for prediction to obtain a plurality of preliminary prediction results corresponding to time points; the preliminary prediction module is used for importing the video to be identified into a trained classifier; carrying out multiple times of spatial sampling and behavior classification on video segments of the same window in the video to be recognized through a trained classifier to obtain multiple initial prediction results of the same time point;
the label module is used for calculating the chessboard distance by taking the behavior quantization value corresponding to the preliminary prediction result of each time point as a distance value to obtain a category label of the time point;
the fusion module is used for performing time-dimension behavior category fusion operation on the category labels at all time points to obtain a final behavior category prediction result of the video to be identified;
the category label is predicted based on the following formula:

L_{t_u} = max_{1 ≤ i ≤ c_s} ‖x̂_{t_u,i}‖

wherein L_{t_u} is the category label; x̂_{t_u,i} is a preliminary prediction result; when the influence value is a scalar, x̂_{t_u,i} is a scalar value; when the influence value is a vector, x̂_{t_u,i} is a vector value and ‖x̂_{t_u,i}‖ is quantitatively measured by a vector 2n-norm (n > 0); t_u is a time point; and c_s is the number of spatial samples.
7. A terminal comprising a memory storing a computer program and a processor executing the computer program to cause the processor to perform the classification method for surveillance video spatiotemporal information fusion according to any one of claims 1-5.
8. A storage medium having stored thereon a computer program which, when executed by a processor, implements a surveillance video spatiotemporal information fusion classification method according to any one of claims 1-5.
CN202110932947.6A 2021-03-12 2021-09-01 Classification method, device and terminal for monitoring video time-space information fusion Active CN113469142B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110268368 2021-03-12
CN2021102683686 2021-03-12

Publications (2)

Publication Number Publication Date
CN113469142A CN113469142A (en) 2021-10-01
CN113469142B true CN113469142B (en) 2022-01-14

Family

ID=77867979

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110932947.6A Active CN113469142B (en) 2021-03-12 2021-09-01 Classification method, device and terminal for monitoring video time-space information fusion

Country Status (1)

Country Link
CN (1) CN113469142B (en)


Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107480642A (en) * 2017-08-18 2017-12-15 深圳市唯特视科技有限公司 A kind of video actions recognition methods based on Time Domain Piecewise network
CN110188668B (en) * 2019-05-28 2020-09-25 复旦大学 Small sample video action classification method
CN110647903A (en) * 2019-06-20 2020-01-03 杭州趣维科技有限公司 Short video frequency classification method
CN111178319A (en) * 2020-01-06 2020-05-19 山西大学 Video behavior identification method based on compression reward and punishment mechanism

Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108174165A (en) * 2018-01-17 2018-06-15 重庆览辉信息技术有限公司 Electric power safety operation and O&M intelligent monitoring system and method
CN108664922A (en) * 2018-05-10 2018-10-16 东华大学 A kind of infrared video Human bodys' response method based on personal safety
CN109583335A (en) * 2018-11-16 2019-04-05 中山大学 A kind of video human Activity recognition method based on Spatial-temporal Information Fusion
CN109886165A (en) * 2019-01-23 2019-06-14 中国科学院重庆绿色智能技术研究院 A kind of action video extraction and classification method based on moving object detection
CN110032926A (en) * 2019-02-22 2019-07-19 哈尔滨工业大学(深圳) A kind of video classification methods and equipment based on deep learning
CN109961037A (en) * 2019-03-20 2019-07-02 中共中央办公厅电子科技学院(北京电子科技学院) A kind of examination hall video monitoring abnormal behavior recognition methods
CN111798018A (en) * 2019-04-09 2020-10-20 Oppo广东移动通信有限公司 Behavior prediction method, behavior prediction device, storage medium and electronic equipment
CN110532857A (en) * 2019-07-16 2019-12-03 杭州电子科技大学 Based on the Activity recognition image analysis system under multi-cam
CN110852295A (en) * 2019-10-15 2020-02-28 深圳龙岗智能视听研究院 Video behavior identification method based on multitask supervised learning
CN111062356A (en) * 2019-12-26 2020-04-24 沈阳理工大学 Method for automatically identifying human body action abnormity from monitoring video
CN111259782A (en) * 2020-01-14 2020-06-09 北京大学 Video behavior identification method based on mixed multi-scale time sequence separable convolution operation
CN111626245A (en) * 2020-06-01 2020-09-04 安徽大学 Human behavior identification method based on video key frame
CN111626265A (en) * 2020-06-12 2020-09-04 上海依图网络科技有限公司 Multi-camera downlink identification method and device and computer readable storage medium
CN112396093A (en) * 2020-10-29 2021-02-23 中国汽车技术研究中心有限公司 Driving scene classification method, device and equipment and readable storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Research on a video behavior recognition method based on a compression reward-punishment mechanism; 张丽红 et al.; 《测试技术学报》; 31 Oct. 2020; Vol. 34, No. 5; pp. 418-424 *
Temporal grouping deep network behavior recognition algorithm based on an attention mechanism; 胡正平 et al.; 《模式识别与人工智能》; 31 Oct. 2019; Vol. 32, No. 10; pp. 892-900 *

Also Published As

Publication number Publication date
CN113469142A (en) 2021-10-01

Similar Documents

Publication Publication Date Title
Goyette et al. A novel video dataset for change detection benchmarking
Boom et al. A research tool for long-term and continuous analysis of fish assemblage in coral-reefs using underwater camera footage
CA3066029A1 (en) Image feature acquisition
You et al. Traffic accident benchmark for causality recognition
CN104254873A (en) Alert volume normalization in a video surveillance system
Yang et al. Moving object detection for dynamic background scenes based on spatiotemporal model
US11275970B2 (en) Systems and methods for distributed data analytics
Yang et al. Anomaly detection in moving crowds through spatiotemporal autoencoding and additional attention
Ratre et al. Tucker visual search-based hybrid tracking model and Fractional Kohonen Self-Organizing Map for anomaly localization and detection in surveillance videos
CN112507860A (en) Video annotation method, device, equipment and storage medium
Tralic et al. Video frame copy-move forgery detection based on cellular automata and local binary patterns
Gupta et al. Accident detection using time-distributed model in videos
Vijayan et al. A fully residual convolutional neural network for background subtraction
Selvaraj et al. L1 norm based pedestrian detection using video analytics technique
Alashban et al. Single convolutional neural network with three layers model for crowd density estimation
CN115187884A (en) High-altitude parabolic identification method and device, electronic equipment and storage medium
CN113469142B (en) Classification method, device and terminal for monitoring video time-space information fusion
Patel et al. Vehicle tracking and monitoring in surveillance video
CN109614893B (en) Intelligent abnormal behavior track identification method and device based on situation reasoning
Zhao Deep Learning in Video Anomaly Detection and Its Applications
Behera et al. Characterization of dense crowd using gibbs entropy
Kwak et al. Human action classification and unusual action recognition algorithm for intelligent surveillance system
David et al. Crime Forecasting using Interpretable Regression Techniques
Tani et al. Frame-wise action recognition training framework for skeleton-based anomaly behavior detection
Srivastava Machine Learning Based Crowd Behaviour Analysis and Prediction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: 030013 Room 707, Block A, Gaoxin Guozhi Building, No. 3, Dong'e'er Lane, Taiyuan Xuefu Park, Shanxi Comprehensive Reform Demonstration Zone, Taiyuan City, Shanxi Province

Patentee after: Changhe Information Co.,Ltd.

Address before: 030000 room 707, block a, Gaoxin Guozhi building, No. 3, Dongyi second lane, Taiyuan Xuefu Park, Shanxi comprehensive reform demonstration zone, Taiyuan City, Shanxi Province

Patentee before: Shanxi Changhe Technology Co.,Ltd.