CN116912596A - Multi-source data normalization processing and fusion method and system


Info

Publication number: CN116912596A (granted as CN116912596B)
Application number: CN202310975393.7A
Authority: CN (China)
Prior art keywords: video, classified, actual, duration, standard
Other languages: Chinese (zh)
Inventors: 王可庆, 李鹏, 席万强, 韩基泰
Current Assignee: Wuxi University (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original Assignee: Wuxi University
Application filed by Wuxi University
Priority and filing date: 2023-08-04 (the priority date is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed)
Publication of CN116912596A: 2023-10-20
Application granted; publication of CN116912596B: 2024-03-22
Legal status: Granted, Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)


Classifications

    • G06V 10/764 — Arrangements for image or video recognition or understanding using pattern recognition or machine learning, using classification, e.g. of video objects
    • G06N 3/0464 — Computing arrangements based on biological models; neural network architectures: convolutional networks [CNN, ConvNet]
    • G06N 3/08 — Computing arrangements based on biological models; neural networks: learning methods
    • G06V 10/80 — Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 10/82 — Arrangements for image or video recognition or understanding using neural networks
    • G06V 20/40 — Scenes; scene-specific elements in video content
    • G06V 40/20 — Recognition of biometric, human-related or animal-related patterns in image or video data: movements or behaviour, e.g. gesture recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the field of electronic digital data processing, and in particular to a method and a system for normalizing and fusing multi-source data. The method comprises the following steps: acquiring a plurality of pieces of video code stream data to be classified; decoding each piece of video code stream data according to its actual category confidence value to obtain a video to be classified; dividing the video to be classified into a plurality of video segments and dividing the video frames in each segment into equal-area local areas; identifying the human body contour within each local area, judging whether to count the actual number of remaining video frames of the segment by comparing the actual human-contour area ratio with a standard area ratio, and adjusting a preset recognition duration according to the difference between the actual and standard counts; and calculating the actual duration of dangerous actions from the total number of dangerous-action frames, marking the video to be classified according to that duration to obtain marked videos, and storing the marked videos according to time. The method and system improve the accuracy of video classification.

Description

Multi-source data normalization processing and fusion method and system
Technical Field
The invention relates to the field of electronic digital data processing, in particular to a method and a system for normalizing and fusing multi-source data.
Background
Video material classification is the process of assigning video data to predefined categories, thereby establishing a normalized, standardized classification hierarchy that facilitates the scientific management of video material. Such management helps develop video data as an information resource, strengthens data governance, and is essential for improving management efficiency.
Patent document CN102768669A discloses a method for classifying video material. The method comprises: receiving video material; analysing its content to determine a main classification type; determining the general multi-classification and imitation-classification types from the content and the main classification type; marking the material with the main classification, general multi-classification and imitation-classification types to obtain marked video material; judging whether folders corresponding to these classifications already exist on the disk; if so, storing the marked material directly in those folders; otherwise, creating the corresponding folders and then storing the marked material.
However, classifying and storing videos of the same type by the main-classification technique alone is limited, so the accuracy of the resulting video classification is insufficient.
Disclosure of Invention
Therefore, the invention provides a method and a system for normalizing and fusing multi-source data that address the technical problem of insufficient video classification accuracy.
In order to achieve the above object, the present invention provides a method for normalizing and fusing multi-source data, the method comprising:
acquiring a plurality of video code stream data to be classified;
for any video code stream data to be classified, obtaining an actual category confidence value of the video code stream data to be classified, and decoding the video code stream data to be classified according to the actual category confidence value to obtain videos to be classified, wherein the videos to be classified are in one-to-one correspondence with the video code stream data to be classified;
dividing the video to be classified into a plurality of video segments, and dividing each video frame in each video segment into video frame local areas with the same area, wherein the number of video frames in each video segment is the same;
identifying the actual area ratio of the human body contour in each video frame local area; judging, according to the actual area ratio and the standard area ratio of the human body contour, whether to count the actual number of remaining video frames of the video segment; once counting is decided, calculating the actual number difference of the remaining video frames of the video segment from their actual number; adjusting the preset recognition duration according to the adjustment coefficient determined by that difference to obtain a target recognition duration; and, within the target recognition duration, identifying dangerous actions in the remaining video frames of the video segment and counting the total number of frames containing the dangerous actions;
and calculating the actual duration of the dangerous action according to the total frame number of the dangerous action, marking the video to be classified according to the duration of the dangerous action, and storing the marked video to be classified.
Further, judging whether to count the actual number of remaining video frames of the video segment according to the actual area ratio of the human body contour and the standard area ratio of the human body contour includes:
setting a standard human-body-contour area ratio S0;
if the actual human-body-contour area ratio S is greater than or equal to the standard area ratio S0, counting the actual number of remaining video frames of the video segment;
if the actual area ratio S is smaller than the standard area ratio S0 and the human body contour lies in the second or third video frame local area, counting the actual number of remaining video frames of the video segment;
if the actual area ratio S is smaller than the standard area ratio S0 and the human body contour lies in the first video frame local area, not counting the actual number of remaining video frames of the video segment.
Further, calculating an actual number difference of the remaining video frames of the video segment according to their actual number, and adjusting the preset recognition duration according to the adjustment coefficient determined by that difference, includes:
setting a standard number L of remaining video frames of the video segment;
if the actual number of remaining video frames of the video segment is greater than the standard number, calculating the actual number difference of the remaining video frames;
and adjusting the standard recognition duration according to a first recognition duration adjustment coefficient α1, a second recognition duration adjustment coefficient α2, or a third recognition duration adjustment coefficient α3, determined by the relation between the actual number difference and a preset first standard number difference ΔL1 and second standard number difference ΔL2.
Further, adjusting the standard recognition duration according to the first recognition duration adjustment coefficient α1, the second recognition duration adjustment coefficient α2, or the third recognition duration adjustment coefficient α3 includes:
setting a standard number difference ΔL0 for the remaining video frames of the video segment, where the standard number L0 ∈ (Lmin, Lmax);
setting a first standard number difference ΔL1 and a second standard number difference ΔL2;
calculating the actual number difference ΔL of the remaining video frames of the video segment, where ΔL = L − Lmax;
when ΔL < ΔL1, selecting the first recognition duration adjustment coefficient α1 and adjusting the preset recognition duration to T1 = T0 × (1 + α1), where α1 = (ΔL1 − ΔL)/ΔL1;
when ΔL1 ≤ ΔL < ΔL2, selecting the second recognition duration adjustment coefficient α2 and adjusting the preset recognition duration to T1 = T0 × (1 + α2), where α2 = [(ΔL − ΔL1) × (ΔL2 − ΔL)]/(ΔL × ΔL2);
when ΔL ≥ ΔL2, selecting the third recognition duration adjustment coefficient α3 and adjusting the preset recognition duration to T1 = T0 × (1 + α3), where α3 = (ΔL − ΔL2)/ΔL;
where ΔL0 = Lmax − Lmin, ΔL1 = (1/3)(Lmax − Lmin), and ΔL2 = (2/3)(Lmax − Lmin).
Further, marking the video to be classified according to the dangerous-action duration includes:
setting a first standard dangerous-action duration T1 and a second standard dangerous-action duration T2, where T1 < T2;
when the actual dangerous-action duration T is less than the first standard dangerous-action duration T1, marking the video to be classified as a first marked video;
when the actual dangerous-action duration T is greater than the first standard dangerous-action duration T1 and less than the second standard dangerous-action duration T2, marking the video to be classified as a second marked video;
and when the actual dangerous-action duration T is greater than the second standard dangerous-action duration T2, marking the video to be classified as a third marked video.
Further, obtaining an actual category confidence value for each piece of video code stream data to be classified, and decoding the video code stream data according to that value to obtain the video to be classified, comprises:
obtaining an actual category confidence value of each video code stream data to be classified based on the enhanced convolutional neural network of the motion vector;
and comparing the actual category confidence value with a standard category confidence value to obtain a comparison result, and judging whether to decode the video code stream data to be classified according to the comparison result to obtain the video to be classified.
Further, determining whether to decode the video code stream data to be classified according to the comparison result to obtain the video to be classified includes:
setting a standard category confidence value F0;
if the actual category confidence value F is greater than or equal to the standard category confidence value F0, decoding the video code stream data to be classified;
if the actual category confidence value F is smaller than the standard category confidence value F0, performing a secondary detection of the category confidence value of the video code stream data to be classified, averaging the detected confidence values to obtain an average category confidence value, and decoding the video code stream data to be classified according to that average.
Further, dividing each video frame in each video segment into video frame partial areas having the same area includes:
establishing a two-dimensional rectangular coordinate system with the intersection of the horizontal and vertical edges at the lower-left corner of the video frame as the origin, obtaining the video frame coordinate range (0, x, y), where x denotes the length of the video frame and y denotes its width;
and transversely dividing the video frame area along the direction parallel to the x-axis according to the video frame coordinate range to obtain three video frame local areas, wherein the areas of the video frame local areas are the same, and the three video frame local areas are sequentially from top to bottom: a first video frame local area, a second video frame local area, and a third video frame local area.
Further, calculating the actual duration of the dangerous action based on the total number of dangerous-action frames includes: calculating the actual dangerous-action duration T according to equation (1),
T = (1/12) second × total number of dangerous-action frames (1).
On the other hand, the embodiment of the invention also provides a system of a multisource data normalization processing and fusion method, which comprises the following steps:
the storage module is used for storing a plurality of video code stream data to be classified;
the acquisition module is connected with the storage module and used for acquiring a plurality of video code stream data to be classified;
the decoding module is connected with the acquisition module and used for obtaining the actual category confidence value of any video code stream data to be classified, decoding the video code stream data to be classified according to the actual category confidence value to obtain videos to be classified, wherein the videos to be classified correspond to the video code stream data to be classified one by one;
the dividing module is connected with the decoding module and used for dividing the video to be classified into a plurality of video segments and dividing each video frame in each video segment into video frame local areas with the same area, wherein the number of the video frames in each video segment is the same;
the analysis module is connected with the dividing module and comprises an identification unit, a judging unit and a first calculation unit; the identification unit identifies the actual area ratio of the human body contour in each video frame local area; the judging unit judges, according to the actual area ratio and the standard area ratio of the human body contour, whether to count the actual number of remaining video frames of the video segment; the first calculation unit calculates the actual number difference of the remaining video frames from their actual number, adjusts the preset recognition duration by the adjustment coefficient determined from that difference to obtain a target recognition duration, and, within the target recognition duration, identifies dangerous actions in the remaining video frames and counts the total number of frames containing the dangerous actions;
the marking module is connected with the analysis module and comprises a second calculation unit, a marking unit and a storage unit; the second calculation unit calculates the actual duration of the dangerous action from the total number of dangerous-action frames, the marking unit marks the video to be classified according to the dangerous-action duration, and the storage unit stores the marked video to be classified.
Compared with the prior art, the invention further subdivides video code stream data that is stored in compressed form under the same class according to the characteristics of that data. Obtaining the actual category confidence value of each piece of video code stream data to be classified enables its decoding into a video to be classified; the video is divided into a plurality of video segments, and the video frames are divided into local areas, so that the actual area ratio of the human body contour in each local area can be identified. The actual number difference of the remaining video frames of a segment is calculated from their actual number, and the preset recognition duration is adjusted accordingly, enabling the identification of dangerous actions. The actual duration of the dangerous actions is then calculated from the total number of dangerous-action frames, the video to be classified is re-marked according to that duration, and the re-marked videos are stored separately. Video classification is thereby refined and its accuracy improved.
Particularly, the enhanced convolutional neural network based on motion vectors performs a primary screening of the video code stream data to be classified, and only the screened data undergoes subsequent identification and verification, which reduces the system's computational load and the amount of analysis, improving data-processing efficiency.
In particular, comparing the actual area ratio of the human body contour with the standard area ratio determines whether the actual number of remaining video frames of the video segment is counted. Using the size of the human contour area within the three video frame local areas to make this judgment screens the video frames in the segment, reduces the count after screening, and improves statistical efficiency.
In particular, setting the standard number difference of the remaining video frames of the video segment enables dynamic adjustment of the dangerous-action recognition duration; this avoids wasting computation when the actual number of remaining frames is too small, avoids insufficient recognition when it is too large, saves running load, and improves recognition efficiency.
In particular, marking the videos to be classified by the duration of dangerous actions refines the classification of the videos and improves the accuracy of video classification.
In particular, comparing the actual category confidence value against the standard category confidence value decides whether a code stream is decoded into a video to be classified; code streams whose actual confidence value is below the standard value undergo a secondary confidence detection, providing a second judgment that avoids missed detections and reduces detection error.
In particular, establishing a two-dimensional rectangular coordinate system achieves the division of the video frames in each video segment into identical local areas.
Drawings
FIG. 1 is a flow chart of a method for normalizing and fusing multi-source data according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a system for normalizing and fusing multi-source data according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an analysis module according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of a marking module according to an embodiment of the present invention.
Detailed Description
In order that the objects and advantages of the invention will become more apparent, the invention will be further described with reference to the following examples; it should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Preferred embodiments of the present invention are described below with reference to the accompanying drawings. It should be understood by those skilled in the art that these embodiments are merely for explaining the technical principles of the present invention, and are not intended to limit the scope of the present invention.
It is noted that the terms "comprises" and "comprising," and any variations thereof, in the description and claims of the present invention and in the foregoing figures, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus.
Referring to fig. 1, a flowchart of a method for normalizing and fusing multi-source data according to an embodiment of the present invention is shown, where the method includes:
step S100: acquiring a plurality of video code stream data to be classified;
step S200: for any video code stream data to be classified, obtaining an actual category confidence value of the video code stream data to be classified, and decoding the video code stream data to be classified according to the actual category confidence value to obtain videos to be classified, wherein the videos to be classified are in one-to-one correspondence with the video code stream data to be classified;
step S300: dividing the video to be classified into a plurality of video segments, and dividing each video frame in each video segment into video frame local areas with the same area, wherein the number of video frames in each video segment is the same;
step S400: identifying the actual area ratio of the human body contour in each video frame local area; judging, according to the actual area ratio and the standard area ratio of the human body contour, whether to count the actual number of remaining video frames of the video segment; once counting is decided, calculating the actual number difference of the remaining video frames of the video segment from their actual number; adjusting the preset recognition duration according to the adjustment coefficient determined by that difference to obtain a target recognition duration; and, within the target recognition duration, identifying dangerous actions in the remaining video frames of the video segment and counting the total number of frames containing the dangerous actions;
step S500: and calculating the actual duration of the dangerous action according to the total frame number of the dangerous action, marking the video to be classified according to the duration of the dangerous action, and storing the marked video to be classified.
Specifically, in the embodiment of the invention, video code stream data of the same class is further subdivided according to its characteristics. Obtaining the actual category confidence value of each piece of video code stream data to be classified enables its decoding into a video to be classified; the video is divided into a plurality of video segments, and the video frames are divided into local areas, so that the actual area ratio of the human body contour in each local area can be identified. The actual number difference of the remaining video frames of a segment is calculated from their actual number, and the preset recognition duration is adjusted accordingly, enabling the identification of dangerous actions. The actual duration of the dangerous actions is calculated from the total number of dangerous-action frames, the video to be classified is re-marked according to that duration, and the re-marked videos are stored separately; video classification is thereby refined and its accuracy improved.
Specifically, obtaining an actual category confidence value for each piece of video code stream data to be classified, and decoding the video code stream data according to that value to obtain the video to be classified, comprises:
obtaining an actual category confidence value of each video code stream data to be classified based on the enhanced convolutional neural network of the motion vector;
and comparing the actual category confidence value with a standard category confidence value to obtain a comparison result, and judging whether to decode the video code stream data to be classified according to the comparison result to obtain the video to be classified.
Specifically, the enhanced convolutional neural network based on motion vectors is a known technique: a convolutional neural network on RGB images is trained from the category calibration information of the videos to be classified and the original RGB images extracted from the code streams of training videos; the enhanced network based on motion vectors is then trained from the category calibration information, the motion-vector images, and an already-trained optical-flow convolutional neural network; and the category confidence value is obtained from the enhanced motion-vector network. This is prior art and is not described in detail.
Specifically, in the embodiment of the invention, the enhanced convolutional neural network based on motion vectors performs a primary screening of the video code stream data to be classified, and only the screened data undergoes subsequent identification and verification, which reduces the system's computational load and the amount of analysis, improving data-processing efficiency.
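As an illustration only: however the confidence values are produced, the final step of such a classifier is typically a softmax over per-category scores. The sketch below (Python; the `logits` array standing in for the network's output is a hypothetical, not something the patent specifies) shows how an actual category confidence value F could be read off such scores.

```python
import numpy as np

def category_confidence(logits: np.ndarray) -> tuple[int, float]:
    """Turn raw classifier outputs into (predicted category, confidence F).

    `logits` is a hypothetical 1-D array of per-category scores produced by
    the motion-vector enhanced CNN; softmax converts it to probabilities.
    """
    exp = np.exp(logits - logits.max())   # subtract max for numerical stability
    probs = exp / exp.sum()
    k = int(probs.argmax())
    return k, float(probs[k])             # F = highest category probability

# Example with three candidate categories
print(category_confidence(np.array([1.2, 3.4, 0.3])))  # -> (1, ~0.87)
```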
Specifically, judging whether to count the actual number of remaining video frames of the video segment according to the actual area ratio of the human body contour and the standard area ratio of the human body contour includes:
setting a standard human-body-contour area ratio S0;
if the actual human-body-contour area ratio S is greater than or equal to the standard area ratio S0, counting the actual number of remaining video frames of the video segment;
if the actual area ratio S is smaller than the standard area ratio S0 and the human body contour lies in the second or third video frame local area, counting the actual number of remaining video frames of the video segment;
if the actual area ratio S is smaller than the standard area ratio S0 and the human body contour lies in the first video frame local area, not counting the actual number of remaining video frames of the video segment.
Specifically, in the invention, the human body contour can be identified and its area calculated with an edge detection algorithm such as Canny, for example as implemented in OpenCV; this is prior art and is not described in detail.
Specifically, a person skilled in the art can set the standard human-body-contour area ratio S0 within the range [0.2, 0.3].
Specifically, in the embodiment of the invention, the actual number of remaining video frames of a segment is counted after comparing the actual area ratio of the human body contour with the standard area ratio. A human body contour smaller than the standard preset area does not always represent a dangerous-action scene, whereas dangerous-action scenes are generally presented as key scenes that highlight the theme of the video. The invention therefore uses the size of the human contour area within the three video frame local areas to judge whether to count the remaining frames of the segment, screening the video frames; the count after screening is smaller, which improves statistical efficiency.
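A minimal sketch of this step, assuming OpenCV's Canny edge detector: the thresholds (100, 200), the choice S0 = 0.25 from the suggested [0.2, 0.3] range, and the use of the largest contour's centroid to locate its local area are all illustrative assumptions, not details fixed by the patent.

```python
import cv2
import numpy as np

S0 = 0.25  # assumed standard area ratio, taken from the suggested [0.2, 0.3] range

def contour_area_ratio(frame: np.ndarray) -> tuple[float, int]:
    """Return (S, region): the largest contour's area over the frame area,
    and which horizontal strip (1 = top ... 3 = bottom) holds its centroid."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 100, 200)                  # illustrative thresholds
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return 0.0, 1
    c = max(contours, key=cv2.contourArea)
    h, w = frame.shape[:2]
    s = cv2.contourArea(c) / (h * w)
    m = cv2.moments(c)
    cy = m["m01"] / m["m00"] if m["m00"] else 0.0      # centroid row
    return s, min(1 + int(3 * cy / h), 3)

def should_count_frame(s: float, region: int) -> bool:
    """Decision rule from the text: only a small contour confined to the
    first (top) local area excludes the frame from the count."""
    return s >= S0 or region in (2, 3)
```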
Specifically, calculating an actual number difference of the remaining video frames of the video segment according to their actual number, and adjusting the preset recognition duration according to the adjustment coefficient determined by that difference, includes:
setting a standard number L of remaining video frames of the video segment;
if the actual number of remaining video frames of the video segment is greater than the standard number, calculating the actual number difference of the remaining video frames;
and adjusting the standard recognition duration according to a first recognition duration adjustment coefficient α1, a second recognition duration adjustment coefficient α2, or a third recognition duration adjustment coefficient α3, determined by the relation between the actual number difference and a preset first standard number difference ΔL1 and second standard number difference ΔL2.
Specifically, video typically runs at 24 frames per second; a person skilled in the art can set the duration of each video segment within the range [1, 3] minutes, and the standard number of remaining video frames of a video segment within the range [1000, 2000].
Specifically, adjusting the standard recognition duration according to the first recognition duration adjustment coefficient α1, the second recognition duration adjustment coefficient α2, or the third recognition duration adjustment coefficient α3 includes:
setting a standard number difference ΔL0 for the remaining video frames of the video segment, where the standard number L0 ∈ (Lmin, Lmax);
setting a first standard number difference ΔL1 and a second standard number difference ΔL2;
calculating the actual number difference ΔL of the remaining video frames of the video segment, where ΔL = L − Lmax;
when ΔL < ΔL1, selecting the first recognition duration adjustment coefficient α1 and adjusting the preset recognition duration to T1 = T0 × (1 + α1), where α1 = (ΔL1 − ΔL)/ΔL1;
when ΔL1 ≤ ΔL < ΔL2, selecting the second recognition duration adjustment coefficient α2 and adjusting the preset recognition duration to T1 = T0 × (1 + α2), where α2 = [(ΔL − ΔL1) × (ΔL2 − ΔL)]/(ΔL × ΔL2);
when ΔL ≥ ΔL2, selecting the third recognition duration adjustment coefficient α3 and adjusting the preset recognition duration to T1 = T0 × (1 + α3), where α3 = (ΔL − ΔL2)/ΔL;
where ΔL0 = Lmax − Lmin, ΔL1 = (1/3)(Lmax − Lmin), and ΔL2 = (2/3)(Lmax − Lmin).
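Read literally, the piecewise rule can be implemented as below. Lmin = 1000 and Lmax = 2000 are taken from the suggested standard-count range, and the α2 denominator follows one plausible reading (ΔL × ΔL2) of the somewhat garbled original formula; both are assumptions.

```python
L_MIN, L_MAX = 1000, 2000   # standard-count bounds from the suggested range

def target_recognition_duration(actual_count: int, t0: float) -> float:
    """Adjust the preset recognition duration t0 using alpha1/alpha2/alpha3.

    Intended to be called only when actual_count exceeds the standard
    count, so dL is positive and no division by zero can occur."""
    dL = actual_count - L_MAX               # ΔL = L − Lmax
    dL1 = (L_MAX - L_MIN) / 3               # ΔL1 = 1/3 (Lmax − Lmin)
    dL2 = 2 * (L_MAX - L_MIN) / 3           # ΔL2 = 2/3 (Lmax − Lmin)
    if dL < dL1:
        alpha = (dL1 - dL) / dL1                        # α1
    elif dL < dL2:
        alpha = (dL - dL1) * (dL2 - dL) / (dL * dL2)    # α2 (assumed reading)
    else:
        alpha = (dL - dL2) / dL                         # α3
    return t0 * (1 + alpha)

# Example: 2500 remaining frames against Lmax = 2000, 10 s preset duration
print(target_recognition_duration(2500, 10.0))  # dL = 500, α2 branch, ~10.8 s
```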
Specifically, in the embodiment of the invention, setting the standard number difference of the remaining video frames of the video segment enables dynamic adjustment of the dangerous-action recognition duration. This avoids wasting computation when the actual number of remaining frames is too small and avoids insufficient recognition when it is too large; the dynamic adjustment saves running load and improves recognition efficiency.
Specifically, marking the video to be classified according to the dangerous-action duration includes:
setting a first standard dangerous-action duration T1 and a second standard dangerous-action duration T2, where T1 < T2;
when the actual dangerous-action duration T is less than the first standard dangerous-action duration T1, marking the video to be classified as a first marked video;
when the actual dangerous-action duration T is greater than the first standard dangerous-action duration T1 and less than the second standard dangerous-action duration T2, marking the video to be classified as a second marked video;
and when the actual dangerous-action duration T is greater than the second standard dangerous-action duration T2, marking the video to be classified as a third marked video.
Specifically, a person skilled in the art may set the first standard dangerous-action duration T1 within the range (0, 4) seconds and the second standard dangerous-action duration T2 within the range [4, 7] seconds.
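A direct transcription of the marking rule, with illustrative thresholds chosen from those ranges (T1 = 3 s, T2 = 5 s; the names below are not from the patent):

```python
T1_STD, T2_STD = 3.0, 5.0   # assumed values from (0, 4) s and [4, 7] s

def mark_video(danger_duration_s: float) -> str:
    """Assign the mark class from the actual dangerous-action duration T."""
    if danger_duration_s < T1_STD:
        return "first marked video"
    if danger_duration_s < T2_STD:
        return "second marked video"
    return "third marked video"
```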
Specifically, dangerous-action video segments contain pictures with especially strong visual impact.
Specifically, dangerous-action recognition trains and evaluates a deep learning model on a dangerous-action video data set, which improves recognition accuracy; this is prior art and is not repeated here.
Specifically, in the embodiment of the invention, the marking of the video to be classified is realized by the duration of the dangerous action, the classification of the video is refined, and the accuracy of the video classification is improved.
Specifically, obtaining an actual category confidence value for each piece of video code stream data to be classified, and decoding the video code stream data according to that value to obtain the video to be classified, comprises:
obtaining an actual category confidence value of each video code stream data to be classified based on the enhanced convolutional neural network of the motion vector;
and comparing the actual category confidence value with a standard category confidence value to obtain a comparison result, and judging whether to decode the video code stream data to be classified according to the comparison result to obtain the video to be classified.
Specifically, determining whether to decode the video code stream data to be classified according to the comparison result includes:
setting a standard category confidence value F0;
if the actual category confidence value F is greater than or equal to the standard category confidence value F0, decoding the video code stream data to be classified;
if the actual category confidence value F is smaller than the standard category confidence value F0, performing a secondary detection of the category confidence value of the video code stream data to be classified, averaging the detected confidence values to obtain an average category confidence value, and decoding the video code stream data to be classified according to that average.
Specifically, the standard category confidence value is calculated from the video code stream data to be classified in the storage module: the category confidence values of all stored code streams are obtained, their average is computed, and that average is taken as the standard category confidence value F0.
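A minimal sketch of this decision logic; it assumes the averaged value is compared against the same F0 (the text does not spell that comparison out), and `redetect` is a hypothetical callable standing in for the secondary detection:

```python
from statistics import mean
from typing import Callable

def standard_confidence(stored_confidences: list[float]) -> float:
    """F0: the mean category confidence over all stored code streams."""
    return mean(stored_confidences)

def should_decode(f: float, f0: float, redetect: Callable[[], float]) -> bool:
    """Decode when F >= F0; otherwise re-detect once and decide on the average."""
    if f >= f0:
        return True
    f_avg = (f + redetect()) / 2    # secondary detection, then average
    return f_avg >= f0
```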
Specifically, in the embodiment of the invention, comparing the actual category confidence value with the standard category confidence value decides whether a code stream is decoded into a video to be classified. Code streams whose actual confidence value is below the standard value undergo a secondary detection of the category confidence value, providing a second judgment that avoids missed detections and reduces detection error.
Specifically, dividing each video frame in each video segment into video frame local regions having the same area includes:
establishing a two-dimensional rectangular coordinate system with the intersection of the horizontal and vertical edges at the lower-left corner of the video frame as the origin, obtaining the video frame coordinate range (0, x, y), where x denotes the length of the video frame and y denotes its width;
and transversely dividing the video frame area along the direction parallel to the x-axis according to the video frame coordinate range to obtain three video frame local areas, wherein the areas of the video frame local areas are the same, and the three video frame local areas are sequentially from top to bottom: a first video frame local area, a second video frame local area, and a third video frame local area.
Specifically, in the embodiment of the invention, the aim of dividing the video frames in each video segment into the same video frame local areas is achieved by establishing a two-dimensional rectangular coordinate system.
Specifically, in this embodiment, the video frame area may be divided into 2, 3, or N pieces of equal area; the specific division can be determined by a person skilled in the art according to the actual situation.
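The division itself is straightforward; a sketch for n equal-area horizontal strips (n = 3 in the embodiment) could look like this:

```python
import numpy as np

def split_horizontal(frame: np.ndarray, n: int = 3) -> list[np.ndarray]:
    """Split an H x W (x C) frame into n equal-area horizontal strips,
    returned top to bottom as the first, second, ... local areas."""
    h = frame.shape[0]
    edges = [round(i * h / n) for i in range(n + 1)]
    return [frame[edges[i]:edges[i + 1]] for i in range(n)]

# Example: a 1080 x 1920 frame yields three 360-row strips
strips = split_horizontal(np.zeros((1080, 1920, 3), dtype=np.uint8))
print([s.shape[0] for s in strips])  # [360, 360, 360]
```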
Specifically, calculating the actual duration of the dangerous action from the total number of dangerous-action frames includes: calculating the actual dangerous-action duration T according to equation (1),
T = (1/12) second × total number of dangerous-action frames (1).
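Equation (1) applies a fixed weight of 1/12 second per dangerous-action frame; a direct transcription with a worked example:

```python
def danger_duration_seconds(total_danger_frames: int) -> float:
    """Equation (1): T = (1/12) second x total dangerous-action frames."""
    return total_danger_frames / 12.0

print(danger_duration_seconds(96))  # 96 flagged frames -> 8.0 seconds
```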
Referring to fig. 2 to fig. 4, which show the structure of a system for normalizing and fusing multi-source data according to an embodiment of the present invention, the system includes:
the storage module 10 is used for storing a plurality of video code stream data to be classified;
the acquisition module 20 is connected with the storage module 10 and is used for acquiring a plurality of video code stream data to be classified;
the decoding module 30 is connected with the obtaining module 20, and is configured to obtain an actual category confidence value of the video code stream data to be classified for any video code stream data to be classified, and decode the video code stream data to be classified according to the actual category confidence value to obtain a video to be classified, where the video to be classified corresponds to the video code stream data to be classified one by one;
the dividing module 40 is connected with the decoding module 30, and is configured to divide the video to be classified into a plurality of video segments, and divide each video frame in each video segment into video frame local areas with the same area, where the number of video frames in each video segment is the same;
the analysis module 50 is connected with the dividing module 40 and comprises an identification unit 51, a judging unit 52 and a first calculating unit 53; the identification unit 51 identifies the actual area ratio of the human body contour in each video frame local area; the judging unit 52 judges, according to the actual area ratio and the standard area ratio of the human body contour, whether to count the actual number of remaining video frames of the video segment; the first calculating unit 53 calculates the actual number difference of the remaining video frames from their actual number, adjusts the preset recognition duration by the adjustment coefficient determined from that difference to obtain a target recognition duration, and, within the target recognition duration, identifies dangerous actions in the remaining video frames and counts the total number of frames containing the dangerous actions;
the marking module 60 is connected with the analysis module 50 and comprises a second calculating unit 61, a marking unit 62 and a storage unit 63; the second calculating unit 61 calculates the actual duration of the dangerous action from the total number of dangerous-action frames, the marking unit 62 marks the video to be classified according to the dangerous-action duration, and the storage unit 63 stores the marked video to be classified.
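As a rough, non-authoritative sketch of how these five modules could be wired together in software — every name below is illustrative, not taken from the patent:

```python
from typing import Any, Callable, Iterable

class MultiSourceVideoSystem:
    """Skeleton pipeline: storage -> acquisition -> decoding -> dividing ->
    analysis -> marking, mirroring modules 10-60 of the embodiment."""

    def __init__(self,
                 acquire: Callable[[], Iterable[Any]],           # modules 10 + 20
                 decode: Callable[[Any], Any],                   # module 30
                 divide: Callable[[Any], list],                  # module 40
                 analyse: Callable[[Any], int],                  # module 50
                 mark_and_store: Callable[[Any, float], None]):  # module 60
        self.acquire, self.decode = acquire, decode
        self.divide, self.analyse = divide, analyse
        self.mark_and_store = mark_and_store

    def run(self) -> None:
        for stream in self.acquire():
            video = self.decode(stream)      # None if confidence too low
            if video is None:
                continue
            segments = self.divide(video)
            frames = sum(self.analyse(seg) for seg in segments)
            self.mark_and_store(video, frames / 12.0)   # equation (1)
```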
Thus far, the technical solution of the present invention has been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of protection of the present invention is not limited to these specific embodiments. Equivalent modifications and substitutions for related technical features may be made by those skilled in the art without departing from the principles of the present invention, and such modifications and substitutions will be within the scope of the present invention.
The foregoing description is only of the preferred embodiments of the invention and is not intended to limit the invention; various modifications and variations of the present invention will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A method for normalizing and fusing multi-source data, comprising:
acquiring a plurality of video code stream data to be classified;
for any video code stream data to be classified, obtaining an actual category confidence value of the video code stream data to be classified, and decoding the video code stream data to be classified according to the actual category confidence value to obtain videos to be classified, wherein the videos to be classified are in one-to-one correspondence with the video code stream data to be classified;
dividing the video to be classified into a plurality of video segments, and dividing each video frame in each video segment into video frame local areas with the same area, wherein the number of video frames in each video segment is the same;
identifying the actual area ratio of the human body contour in each video frame local area; judging, according to the actual area ratio and the standard area ratio of the human body contour, whether to count the actual number of remaining video frames of the video segment; once counting is decided, calculating the actual number difference of the remaining video frames of the video segment from their actual number; adjusting the preset recognition duration according to the adjustment coefficient determined by that difference to obtain a target recognition duration; and, within the target recognition duration, identifying dangerous actions in the remaining video frames of the video segment and counting the total number of frames containing the dangerous actions;
and calculating the actual duration of the dangerous action according to the total frame number of the dangerous action, marking the video to be classified according to the duration of the dangerous action, and storing the marked video to be classified.
2. The method of normalizing and fusing multi-source data according to claim 1, wherein judging whether to count the actual number of remaining video frames of the video segment based on the actual area ratio of the human body contour and the standard area ratio of the human body contour comprises:
setting a standard human-body-contour area ratio S0;
if the actual human-body-contour area ratio S is greater than or equal to the standard area ratio S0, counting the actual number of remaining video frames of the video segment;
if the actual area ratio S is smaller than the standard area ratio S0 and the human body contour lies in the second or third video frame local area, counting the actual number of remaining video frames of the video segment;
if the actual area ratio S is smaller than the standard area ratio S0 and the human body contour lies in the first video frame local area, not counting the actual number of remaining video frames of the video segment.
3. The method of normalizing and fusing multi-source data according to claim 2, wherein calculating an actual number difference of the remaining video frames of the video segment according to their actual number, and adjusting a preset recognition duration according to the adjustment coefficient determined by that difference, comprises:
setting a standard number L of remaining video frames of the video segment;
if the actual number of remaining video frames of the video segment is greater than the standard number, calculating the actual number difference of the remaining video frames;
and adjusting the standard recognition duration according to a first recognition duration adjustment coefficient α1, a second recognition duration adjustment coefficient α2, or a third recognition duration adjustment coefficient α3, determined by the relation between the actual number difference and a preset first standard number difference ΔL1 and second standard number difference ΔL2.
4. The method of normalizing and fusing multi-source data according to claim 3, wherein adjusting the standard recognition duration according to the first recognition duration adjustment coefficient α1, the second recognition duration adjustment coefficient α2, or the third recognition duration adjustment coefficient α3 comprises:
setting a standard number difference ΔL0 for the remaining video frames of the video segment, where the standard number L0 ∈ (Lmin, Lmax);
setting a first standard number difference ΔL1 and a second standard number difference ΔL2;
calculating the actual number difference ΔL of the remaining video frames of the video segment, where ΔL = L − Lmax;
when ΔL < ΔL1, selecting the first recognition duration adjustment coefficient α1 and adjusting the preset recognition duration to T1 = T0 × (1 + α1), where α1 = (ΔL1 − ΔL)/ΔL1;
when ΔL1 ≤ ΔL < ΔL2, selecting the second recognition duration adjustment coefficient α2 and adjusting the preset recognition duration to T1 = T0 × (1 + α2), where α2 = [(ΔL − ΔL1) × (ΔL2 − ΔL)]/(ΔL × ΔL2);
when ΔL ≥ ΔL2, selecting the third recognition duration adjustment coefficient α3 and adjusting the preset recognition duration to T1 = T0 × (1 + α3), where α3 = (ΔL − ΔL2)/ΔL;
where ΔL0 = Lmax − Lmin, ΔL1 = (1/3)(Lmax − Lmin), and ΔL2 = (2/3)(Lmax − Lmin).
5. The method of multi-source data normalization and fusion according to claim 4, wherein marking the video to be classified according to the dangerous-action duration comprises:
setting a first standard dangerous-action duration T1 and a second standard dangerous-action duration T2, where T1 < T2;
when the actual dangerous-action duration T is less than the first standard dangerous-action duration T1, marking the video to be classified as a first marked video;
when the actual dangerous-action duration T is greater than the first standard dangerous-action duration T1 and less than the second standard dangerous-action duration T2, marking the video to be classified as a second marked video;
and when the actual dangerous-action duration T is greater than the second standard dangerous-action duration T2, marking the video to be classified as a third marked video.
6. The method for normalizing and fusing multi-source data according to claim 5, wherein obtaining an actual category confidence value for each piece of video code stream data to be classified, and decoding the video code stream data according to that value to obtain the video to be classified, comprises:
obtaining an actual category confidence value of each video code stream data to be classified based on the enhanced convolutional neural network of the motion vector;
and comparing the actual category confidence value with a standard category confidence value to obtain a comparison result, and judging whether to decode the video code stream data to be classified according to the comparison result to obtain the video to be classified.
7. The method of normalizing and fusing multi-source data according to claim 6, wherein determining whether to decode the video code stream data to be classified to obtain the video to be classified according to the comparison result comprises:
setting a standard category confidence value F0;
if the actual category confidence value F is greater than or equal to the standard category confidence value F0, decoding the video code stream data to be classified;
if the actual category confidence value F is smaller than the standard category confidence value F0, performing a secondary detection of the category confidence value of the video code stream data to be classified, averaging the detected confidence values to obtain an average category confidence value, and decoding the video code stream data to be classified according to that average.
8. The method of multi-source data normalization processing and fusion according to claim 7, wherein dividing each video frame in each video segment into local regions of video frames of equal area comprises:
establishing a two-dimensional rectangular coordinate system with the intersection of the horizontal and vertical edges at the lower-left corner of the video frame as the origin, obtaining the video frame coordinate range (0, x, y), where x denotes the length of the video frame and y denotes its width;
and transversely dividing the video frame area along the direction parallel to the x-axis according to the video frame coordinate range to obtain three video frame local areas, wherein the areas of the video frame local areas are the same, and the three video frame local areas are sequentially from top to bottom: a first video frame local area, a second video frame local area, and a third video frame local area.
9. The multi-source data normalization processing and fusion method according to claim 8, wherein calculating the actual duration of the dangerous action based on the total number of dangerous action frames comprises: the actual dangerous action duration T is calculated according to equation (1):
T = (1/12 second) × (total number of dangerous action frames)   (1)
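Equation (1) implies that dangerous action frames are counted at 12 frames per second, so each counted frame contributes 1/12 s; a one-line sketch with a worked example:

    FRAME_PERIOD_S = 1.0 / 12.0  # equation (1): each counted frame spans 1/12 s

    def dangerous_action_duration(total_frames: int) -> float:
        return FRAME_PERIOD_S * total_frames

    # e.g. dangerous_action_duration(60) -> 5.0 seconds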
10. A system for use in the multi-source data normalization processing and fusion method of any one of claims 1 to 9, comprising:
the storage module is used for storing a plurality of video code stream data to be classified;
the acquisition module is connected with the storage module and used for acquiring a plurality of video code stream data to be classified;
the decoding module is connected with the acquisition module and used for obtaining the actual category confidence value of each piece of video code stream data to be classified, and for decoding the video code stream data to be classified according to the actual category confidence value to obtain videos to be classified, wherein the videos to be classified correspond one-to-one to the video code stream data to be classified;
the dividing module is connected with the decoding module and used for dividing the video to be classified into a plurality of video segments and dividing each video frame in each video segment into video frame local areas with the same area, wherein the number of the video frames in each video segment is the same;
the analysis module is connected with the dividing module and comprises an identification unit, a judgment unit and a first calculation unit, wherein the identification unit is used for identifying the actual area ratio occupied by the human body outline in each video frame local area; the judgment unit is used for judging, according to the actual human body outline area ratio and the standard human body outline area ratio, whether to count the actual number of remaining video frames in the video segment; and the first calculation unit is used for calculating the difference between the actual numbers of remaining video frames in the video segments, adjusting the preset identification duration by an adjustment coefficient determined from that difference to obtain a target identification duration, and, within the target identification duration, identifying dangerous actions in the remaining video frames and counting the total number of frames containing dangerous actions;
the marking module is connected with the analysis module and comprises a second calculation unit, a marking unit and a storage unit, wherein the second calculation unit is used for calculating the actual dangerous action duration according to the total number of dangerous action frames, the marking unit is used for marking the videos to be classified according to the dangerous action duration, and the storage unit is used for storing the marked videos to be classified.
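A structural sketch of how the claim-10 modules chain together (class and method names are illustrative, not from the patent; each injected module would implement the corresponding claim step):

    class MultiSourceFusionSystem:
        def __init__(self, storage, acquisition, decoder, divider, analyzer, marker):
            # Modules mirror claim 10: storage -> acquisition -> decoding ->
            # dividing -> analysis -> marking.
            self.storage = storage
            self.acquisition = acquisition
            self.decoder = decoder
            self.divider = divider
            self.analyzer = analyzer
            self.marker = marker

        def run(self):
            for stream in self.acquisition.fetch(self.storage):
                video = self.decoder.decode_if_confident(stream)
                if video is None:  # the confidence gate rejected this stream
                    continue
                segments = self.divider.split(video)
                total = self.analyzer.count_dangerous_frames(segments)
                self.marker.mark_and_store(video, total)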
CN202310975393.7A 2023-08-04 2023-08-04 Multi-source data normalization processing and fusion method and system Active CN116912596B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310975393.7A CN116912596B (en) 2023-08-04 2023-08-04 Multi-source data normalization processing and fusion method and system


Publications (2)

Publication Number Publication Date
CN116912596A 2023-10-20
CN116912596B 2024-03-22

Family

ID=88362912


Country Status (1)

Country Link
CN (1) CN116912596B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101867729A (en) * 2010-06-08 2010-10-20 Shanghai Jiao Tong University Method for detecting news video formal soliloquy scene based on features of characters
CN101872418A (en) * 2010-05-28 2010-10-27 University of Electronic Science and Technology of China Detection method based on group environment abnormal behavior
CN110674790A (en) * 2019-10-15 2020-01-10 Shandong Jianzhu University Abnormal scene processing method and system in video monitoring
CN111860311A (en) * 2020-07-20 2020-10-30 Nanjing Zhijin Technology Innovation Service Center Method and system for prompting abnormal posture of human body
CN111931856A (en) * 2020-08-14 2020-11-13 Shenzhen Inveno Technology Co., Ltd. Video classification method and device, electronic equipment and storage medium
CN113542804A (en) * 2021-07-09 2021-10-22 Hangzhou Danghong Technology Co., Ltd. Method for detecting static frame sequence based on code stream statistical characteristics
CN114648712A (en) * 2020-12-18 2022-06-21 Beijing ByteDance Network Technology Co., Ltd. Video classification method and device, electronic equipment and computer-readable storage medium
CN114821421A (en) * 2022-04-28 2022-07-29 Nanjing University of Science and Technology Traffic abnormal behavior detection method and system
CN115240090A (en) * 2022-07-23 2022-10-25 Nanjing University of Information Science and Technology Unmanned aerial vehicle target detection system based on edge computing
CN116229341A (en) * 2022-11-21 2023-06-06 China Datang Group Science and Technology Research Institute Co., Ltd. Method and system for analyzing and alarming suspicious behaviors in video monitoring
CN116389831A (en) * 2023-06-06 2023-07-04 Hunan Malanshan Video Advanced Technology Research Institute Co., Ltd. Cloud-native-based offline rendering system and method




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant