CN116912596A - Multi-source data normalization processing and fusion method and system


Info

Publication number: CN116912596A (granted as CN116912596B)
Application number: CN202310975393.7A
Authority: CN (China)
Prior art keywords: video, classified, actual, duration, standard
Other languages: Chinese (zh)
Inventors: 王可庆, 李鹏, 席万强, 韩基泰
Current Assignee: Wuxi University (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original Assignee: Wuxi University
Application filed by Wuxi University
Priority and filing date: 2023-08-04 (the priority date is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed)
Publication of CN116912596A: 2023-10-20
Application granted; publication of CN116912596B: 2024-03-22
Legal status: Granted, Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)


Classifications

    • G06V 10/764 — Arrangements for image or video recognition or understanding using pattern recognition or machine learning, using classification, e.g. of video objects
    • G06N 3/0464 — Computing arrangements based on biological models; neural network architectures: convolutional networks [CNN, ConvNet]
    • G06N 3/08 — Computing arrangements based on biological models; neural networks: learning methods
    • G06V 10/80 — Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 10/82 — Arrangements for image or video recognition or understanding using neural networks
    • G06V 20/40 — Scenes; scene-specific elements in video content
    • G06V 40/20 — Recognition of biometric, human-related or animal-related patterns in image or video data: movements or behaviour, e.g. gesture recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the field of electronic digital data processing, and in particular to a method and a system for normalizing and fusing multi-source data. The method comprises the following steps: acquiring a plurality of pieces of video code stream data to be classified; decoding each piece of video code stream data according to its actual category confidence value to obtain a video to be classified; dividing the video to be classified into a plurality of video segments and dividing the video frames in each segment into equal-area local areas; identifying the human body contour within each local area, judging whether to count the actual number of remaining video frames of the segment by comparing the actual human-contour area ratio with a standard area ratio, and adjusting a preset recognition duration according to the difference between the actual and standard counts; and calculating the actual duration of dangerous actions from the total number of dangerous-action frames, marking the video to be classified according to that duration to obtain marked videos, and storing the marked videos according to time. The method and system improve the accuracy of video classification.

Description

Multi-source data normalization processing and fusion method and system
Technical Field
The invention relates to the field of electronic digital data processing, in particular to a method and a system for normalizing and fusing multi-source data.
Background
Video material classification is the process of assigning video data to predefined categories, thereby establishing a normalized, standardized classification hierarchy that facilitates the scientific management of video material. Such management helps develop video data as an information resource, strengthens data governance, and is essential for improving management efficiency.
Patent document CN102768669A discloses a method for classifying video material. The method comprises: receiving video material; analysing its content to determine a main classification type; determining the general multi-classification and imitation-classification types from the content and the main classification type; marking the material with the main classification, general multi-classification and imitation-classification types to obtain marked video material; judging whether folders corresponding to these classifications already exist on the disk; if so, storing the marked material directly in those folders; otherwise, creating the corresponding folders and then storing the marked material.
However, classifying and storing videos of the same type by the main-classification technique alone is limited, so the accuracy of the resulting video classification is insufficient.
Disclosure of Invention
Therefore, the invention provides a method and a system for normalizing and fusing multi-source data that address the technical problem of insufficient video classification accuracy.
In order to achieve the above object, the present invention provides a method for normalizing and fusing multi-source data, the method comprising:
acquiring a plurality of video code stream data to be classified;
for any video code stream data to be classified, obtaining an actual category confidence value of the video code stream data to be classified, and decoding the video code stream data to be classified according to the actual category confidence value to obtain videos to be classified, wherein the videos to be classified are in one-to-one correspondence with the video code stream data to be classified;
dividing the video to be classified into a plurality of video segments, and dividing each video frame in each video segment into video frame local areas with the same area, wherein the number of video frames in each video segment is the same;
identifying the actual area ratio of the human body contour in each video frame local area; judging, according to the actual area ratio and the standard area ratio of the human body contour, whether to count the actual number of remaining video frames of the video segment; once counting is decided, calculating the actual number difference of the remaining video frames of the video segment from their actual number; adjusting the preset recognition duration according to the adjustment coefficient determined by that difference to obtain a target recognition duration; and, within the target recognition duration, identifying dangerous actions in the remaining video frames of the video segment and counting the total number of frames containing the dangerous actions;
and calculating the actual duration of the dangerous action according to the total frame number of the dangerous action, marking the video to be classified according to the duration of the dangerous action, and storing the marked video to be classified.
Further, judging whether to count the actual number of remaining video frames of the video segment according to the actual area ratio of the human body contour and the standard area ratio of the human body contour includes:
setting a standard human-body-contour area ratio S0;
if the actual human-body-contour area ratio S is greater than or equal to the standard area ratio S0, counting the actual number of remaining video frames of the video segment;
if the actual area ratio S is smaller than the standard area ratio S0 and the human body contour lies in the second or third video frame local area, counting the actual number of remaining video frames of the video segment;
if the actual area ratio S is smaller than the standard area ratio S0 and the human body contour lies in the first video frame local area, not counting the actual number of remaining video frames of the video segment.
Further, calculating an actual number difference of the remaining video frames of the video segment according to their actual number, and adjusting the preset recognition duration according to the adjustment coefficient determined by that difference, includes:
setting a standard number L of remaining video frames of the video segment;
if the actual number of remaining video frames of the video segment is greater than the standard number, calculating the actual number difference of the remaining video frames;
and adjusting the standard recognition duration according to a first recognition duration adjustment coefficient α1, a second recognition duration adjustment coefficient α2, or a third recognition duration adjustment coefficient α3, determined by the relation between the actual number difference and a preset first standard number difference ΔL1 and second standard number difference ΔL2.
Further, adjusting the standard recognition duration according to the first recognition duration adjustment coefficient α1, the second recognition duration adjustment coefficient α2, or the third recognition duration adjustment coefficient α3 includes:
setting a standard number difference ΔL0 for the remaining video frames of the video segment, where the standard number L0 ∈ (Lmin, Lmax);
setting a first standard number difference ΔL1 and a second standard number difference ΔL2;
calculating the actual number difference ΔL of the remaining video frames of the video segment, where ΔL = L − Lmax;
when ΔL < ΔL1, selecting the first recognition duration adjustment coefficient α1 and adjusting the preset recognition duration to T1 = T0 × (1 + α1), where α1 = (ΔL1 − ΔL)/ΔL1;
when ΔL1 ≤ ΔL < ΔL2, selecting the second recognition duration adjustment coefficient α2 and adjusting the preset recognition duration to T1 = T0 × (1 + α2), where α2 = [(ΔL − ΔL1) × (ΔL2 − ΔL)]/(ΔL × ΔL2);
when ΔL ≥ ΔL2, selecting the third recognition duration adjustment coefficient α3 and adjusting the preset recognition duration to T1 = T0 × (1 + α3), where α3 = (ΔL − ΔL2)/ΔL;
where ΔL0 = Lmax − Lmin, ΔL1 = (1/3)(Lmax − Lmin), and ΔL2 = (2/3)(Lmax − Lmin).
Further, marking the video to be classified according to the dangerous-action duration includes:
setting a first standard dangerous-action duration T1 and a second standard dangerous-action duration T2, where T1 < T2;
when the actual dangerous-action duration T is less than the first standard dangerous-action duration T1, marking the video to be classified as a first marked video;
when the actual dangerous-action duration T is greater than the first standard dangerous-action duration T1 and less than the second standard dangerous-action duration T2, marking the video to be classified as a second marked video;
and when the actual dangerous-action duration T is greater than the second standard dangerous-action duration T2, marking the video to be classified as a third marked video.
Further, obtaining an actual category confidence value for each piece of video code stream data to be classified, and decoding the video code stream data according to that value to obtain the video to be classified, comprises:
obtaining an actual category confidence value of each video code stream data to be classified based on the enhanced convolutional neural network of the motion vector;
and comparing the actual category confidence value with a standard category confidence value to obtain a comparison result, and judging whether to decode the video code stream data to be classified according to the comparison result to obtain the video to be classified.
Further, determining whether to decode the video code stream data to be classified according to the comparison result to obtain the video to be classified includes:
setting a standard category confidence value F0;
if the actual category confidence value F is greater than or equal to the standard category confidence value F0, decoding the video code stream data to be classified;
if the actual category confidence value F is smaller than the standard category confidence value F0, performing a secondary detection of the category confidence value of the video code stream data to be classified, averaging the detected confidence values to obtain an average category confidence value, and decoding the video code stream data to be classified according to that average.
Further, dividing each video frame in each video segment into video frame partial areas having the same area includes:
establishing a two-dimensional rectangular coordinate system with the intersection of the horizontal and vertical edges at the lower-left corner of the video frame as the origin, obtaining the video frame coordinate range (0, x, y), where x denotes the length of the video frame and y denotes its width;
and transversely dividing the video frame area along the direction parallel to the x-axis according to the video frame coordinate range to obtain three video frame local areas, wherein the areas of the video frame local areas are the same, and the three video frame local areas are sequentially from top to bottom: a first video frame local area, a second video frame local area, and a third video frame local area.
Further, calculating the actual duration of the dangerous action based on the total number of dangerous-action frames includes: calculating the actual dangerous-action duration T according to equation (1),
T = (1/12) second × total number of dangerous-action frames (1).
On the other hand, the embodiment of the invention also provides a system of a multisource data normalization processing and fusion method, which comprises the following steps:
the storage module is used for storing a plurality of video code stream data to be classified;
the acquisition module is connected with the storage module and used for acquiring a plurality of video code stream data to be classified;
the decoding module is connected with the acquisition module and used for obtaining the actual category confidence value of any video code stream data to be classified, decoding the video code stream data to be classified according to the actual category confidence value to obtain videos to be classified, wherein the videos to be classified correspond to the video code stream data to be classified one by one;
the dividing module is connected with the decoding module and used for dividing the video to be classified into a plurality of video segments and dividing each video frame in each video segment into video frame local areas with the same area, wherein the number of the video frames in each video segment is the same;
the analysis module is connected with the dividing module and comprises an identification unit, a judging unit and a first calculation unit; the identification unit identifies the actual area ratio of the human body contour in each video frame local area; the judging unit judges, according to the actual area ratio and the standard area ratio of the human body contour, whether to count the actual number of remaining video frames of the video segment; the first calculation unit calculates the actual number difference of the remaining video frames from their actual number, adjusts the preset recognition duration by the adjustment coefficient determined from that difference to obtain a target recognition duration, and, within the target recognition duration, identifies dangerous actions in the remaining video frames and counts the total number of frames containing the dangerous actions;
the marking module is connected with the analysis module and comprises a second calculation unit, a marking unit and a storage unit; the second calculation unit calculates the actual duration of the dangerous action from the total number of dangerous-action frames, the marking unit marks the video to be classified according to the dangerous-action duration, and the storage unit stores the marked video to be classified.
Compared with the prior art, the invention further subdivides video code stream data that is stored in compressed form under the same class according to the characteristics of that data. Obtaining the actual category confidence value of each piece of video code stream data to be classified enables its decoding into a video to be classified; the video is divided into a plurality of video segments, and the video frames are divided into local areas, so that the actual area ratio of the human body contour in each local area can be identified. The actual number difference of the remaining video frames of a segment is calculated from their actual number, and the preset recognition duration is adjusted accordingly, enabling the identification of dangerous actions. The actual duration of the dangerous actions is then calculated from the total number of dangerous-action frames, the video to be classified is re-marked according to that duration, and the re-marked videos are stored separately. Video classification is thereby refined and its accuracy improved.
Particularly, the enhanced convolutional neural network based on motion vectors performs a primary screening of the video code stream data to be classified, and only the screened data undergoes subsequent identification and verification, which reduces the system's computational load and the amount of analysis, improving data-processing efficiency.
In particular, comparing the actual area ratio of the human body contour with the standard area ratio determines whether the actual number of remaining video frames of the video segment is counted. Using the size of the human contour area within the three video frame local areas to make this judgment screens the video frames in the segment, reduces the count after screening, and improves statistical efficiency.
In particular, setting the standard number difference of the remaining video frames of the video segment enables dynamic adjustment of the dangerous-action recognition duration; this avoids wasting computation when the actual number of remaining frames is too small, avoids insufficient recognition when it is too large, saves running load, and improves recognition efficiency.
In particular, marking the videos to be classified by the duration of dangerous actions refines the classification of the videos and improves the accuracy of video classification.
In particular, comparing the actual category confidence value against the standard category confidence value decides whether a code stream is decoded into a video to be classified; code streams whose actual confidence value is below the standard value undergo a secondary confidence detection, providing a second judgment that avoids missed detections and reduces detection error.
In particular, establishing a two-dimensional rectangular coordinate system achieves the division of the video frames in each video segment into identical local areas.
Drawings
FIG. 1 is a flow chart of a method for normalizing and fusing multi-source data according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a system for normalizing and fusing multi-source data according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an analysis module according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of a marking module according to an embodiment of the present invention.
Detailed Description
In order that the objects and advantages of the invention will become more apparent, the invention will be further described with reference to the following examples; it should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Preferred embodiments of the present invention are described below with reference to the accompanying drawings. It should be understood by those skilled in the art that these embodiments are merely for explaining the technical principles of the present invention, and are not intended to limit the scope of the present invention.
It is noted that the terms "comprises" and "comprising," and any variations thereof, in the description and claims of the present invention and in the foregoing figures, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus.
Referring to fig. 1, a flowchart of a method for normalizing and fusing multi-source data according to an embodiment of the present invention is shown, where the method includes:
step S100: acquiring a plurality of video code stream data to be classified;
step S200: for any video code stream data to be classified, obtaining an actual category confidence value of the video code stream data to be classified, and decoding the video code stream data to be classified according to the actual category confidence value to obtain videos to be classified, wherein the videos to be classified are in one-to-one correspondence with the video code stream data to be classified;
step S300: dividing the video to be classified into a plurality of video segments, and dividing each video frame in each video segment into video frame local areas with the same area, wherein the number of video frames in each video segment is the same;
step S400: identifying the actual area ratio of the human body contour in each video frame local area; judging, according to the actual area ratio and the standard area ratio of the human body contour, whether to count the actual number of remaining video frames of the video segment; once counting is decided, calculating the actual number difference of the remaining video frames of the video segment from their actual number; adjusting the preset recognition duration according to the adjustment coefficient determined by that difference to obtain a target recognition duration; and, within the target recognition duration, identifying dangerous actions in the remaining video frames of the video segment and counting the total number of frames containing the dangerous actions;
step S500: and calculating the actual duration of the dangerous action according to the total frame number of the dangerous action, marking the video to be classified according to the duration of the dangerous action, and storing the marked video to be classified.
Specifically, in the embodiment of the invention, video code stream data of the same class is further subdivided according to its characteristics. Obtaining the actual category confidence value of each piece of video code stream data to be classified enables its decoding into a video to be classified; the video is divided into a plurality of video segments, and the video frames are divided into local areas, so that the actual area ratio of the human body contour in each local area can be identified. The actual number difference of the remaining video frames of a segment is calculated from their actual number, and the preset recognition duration is adjusted accordingly, enabling the identification of dangerous actions. The actual duration of the dangerous actions is calculated from the total number of dangerous-action frames, the video to be classified is re-marked according to that duration, and the re-marked videos are stored separately; video classification is thereby refined and its accuracy improved.
Specifically, obtaining an actual category confidence value for each piece of video code stream data to be classified, and decoding the video code stream data according to that value to obtain the video to be classified, comprises:
obtaining an actual category confidence value of each video code stream data to be classified based on the enhanced convolutional neural network of the motion vector;
and comparing the actual category confidence value with a standard category confidence value to obtain a comparison result, and judging whether to decode the video code stream data to be classified according to the comparison result to obtain the video to be classified.
Specifically, the enhanced convolutional neural network based on motion vectors is a known technique: a convolutional neural network on RGB images is trained from the category calibration information of the videos to be classified and the original RGB images extracted from the code streams of training videos; the enhanced network based on motion vectors is then trained from the category calibration information, the motion-vector images, and an already-trained optical-flow convolutional neural network; and the category confidence value is obtained from the enhanced motion-vector network. This is prior art and is not described in detail.
Specifically, in the embodiment of the invention, the enhanced convolutional neural network based on motion vectors performs a primary screening of the video code stream data to be classified, and only the screened data undergoes subsequent identification and verification, which reduces the system's computational load and the amount of analysis, improving data-processing efficiency.
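As an illustration only: however the confidence values are produced, the final step of such a classifier is typically a softmax over per-category scores. The sketch below (Python; the `logits` array standing in for the network's output is a hypothetical, not something the patent specifies) shows how an actual category confidence value F could be read off such scores.

```python
import numpy as np

def category_confidence(logits: np.ndarray) -> tuple[int, float]:
    """Turn raw classifier outputs into (predicted category, confidence F).

    `logits` is a hypothetical 1-D array of per-category scores produced by
    the motion-vector enhanced CNN; softmax converts it to probabilities.
    """
    exp = np.exp(logits - logits.max())   # subtract max for numerical stability
    probs = exp / exp.sum()
    k = int(probs.argmax())
    return k, float(probs[k])             # F = highest category probability

# Example with three candidate categories
print(category_confidence(np.array([1.2, 3.4, 0.3])))  # -> (1, ~0.87)
```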
Specifically, judging whether to count the actual number of remaining video frames of the video segment according to the actual area ratio of the human body contour and the standard area ratio of the human body contour includes:
setting a standard human-body-contour area ratio S0;
if the actual human-body-contour area ratio S is greater than or equal to the standard area ratio S0, counting the actual number of remaining video frames of the video segment;
if the actual area ratio S is smaller than the standard area ratio S0 and the human body contour lies in the second or third video frame local area, counting the actual number of remaining video frames of the video segment;
if the actual area ratio S is smaller than the standard area ratio S0 and the human body contour lies in the first video frame local area, not counting the actual number of remaining video frames of the video segment.
Specifically, in the invention, the human body contour can be identified and its area calculated with an edge detection algorithm such as Canny, for example as implemented in OpenCV; this is prior art and is not described in detail.
Specifically, a person skilled in the art can set the standard human-body-contour area ratio S0 within the range [0.2, 0.3].
Specifically, in the embodiment of the invention, the actual number of remaining video frames of a segment is counted after comparing the actual area ratio of the human body contour with the standard area ratio. A human body contour smaller than the standard preset area does not always represent a dangerous-action scene, whereas dangerous-action scenes are generally presented as key scenes that highlight the theme of the video. The invention therefore uses the size of the human contour area within the three video frame local areas to judge whether to count the remaining frames of the segment, screening the video frames; the count after screening is smaller, which improves statistical efficiency.
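A minimal sketch of this step, assuming OpenCV's Canny edge detector: the thresholds (100, 200), the choice S0 = 0.25 from the suggested [0.2, 0.3] range, and the use of the largest contour's centroid to locate its local area are all illustrative assumptions, not details fixed by the patent.

```python
import cv2
import numpy as np

S0 = 0.25  # assumed standard area ratio, taken from the suggested [0.2, 0.3] range

def contour_area_ratio(frame: np.ndarray) -> tuple[float, int]:
    """Return (S, region): the largest contour's area over the frame area,
    and which horizontal strip (1 = top ... 3 = bottom) holds its centroid."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 100, 200)                  # illustrative thresholds
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return 0.0, 1
    c = max(contours, key=cv2.contourArea)
    h, w = frame.shape[:2]
    s = cv2.contourArea(c) / (h * w)
    m = cv2.moments(c)
    cy = m["m01"] / m["m00"] if m["m00"] else 0.0      # centroid row
    return s, min(1 + int(3 * cy / h), 3)

def should_count_frame(s: float, region: int) -> bool:
    """Decision rule from the text: only a small contour confined to the
    first (top) local area excludes the frame from the count."""
    return s >= S0 or region in (2, 3)
```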
Specifically, calculating an actual number difference of the remaining video frames of the video segment according to their actual number, and adjusting the preset recognition duration according to the adjustment coefficient determined by that difference, includes:
setting a standard number L of remaining video frames of the video segment;
if the actual number of remaining video frames of the video segment is greater than the standard number, calculating the actual number difference of the remaining video frames;
and adjusting the standard recognition duration according to a first recognition duration adjustment coefficient α1, a second recognition duration adjustment coefficient α2, or a third recognition duration adjustment coefficient α3, determined by the relation between the actual number difference and a preset first standard number difference ΔL1 and second standard number difference ΔL2.
Specifically, video typically runs at 24 frames per second; a person skilled in the art can set the duration of each video segment within the range [1, 3] minutes, and the standard number of remaining video frames of a video segment within the range [1000, 2000].
Specifically, adjusting the standard recognition duration according to the first recognition duration adjustment coefficient α1, the second recognition duration adjustment coefficient α2, or the third recognition duration adjustment coefficient α3 includes:
setting a standard number difference ΔL0 for the remaining video frames of the video segment, where the standard number L0 ∈ (Lmin, Lmax);
setting a first standard number difference ΔL1 and a second standard number difference ΔL2;
calculating the actual number difference ΔL of the remaining video frames of the video segment, where ΔL = L − Lmax;
when ΔL < ΔL1, selecting the first recognition duration adjustment coefficient α1 and adjusting the preset recognition duration to T1 = T0 × (1 + α1), where α1 = (ΔL1 − ΔL)/ΔL1;
when ΔL1 ≤ ΔL < ΔL2, selecting the second recognition duration adjustment coefficient α2 and adjusting the preset recognition duration to T1 = T0 × (1 + α2), where α2 = [(ΔL − ΔL1) × (ΔL2 − ΔL)]/(ΔL × ΔL2);
when ΔL ≥ ΔL2, selecting the third recognition duration adjustment coefficient α3 and adjusting the preset recognition duration to T1 = T0 × (1 + α3), where α3 = (ΔL − ΔL2)/ΔL;
where ΔL0 = Lmax − Lmin, ΔL1 = (1/3)(Lmax − Lmin), and ΔL2 = (2/3)(Lmax − Lmin).
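Read literally, the piecewise rule can be implemented as below. Lmin = 1000 and Lmax = 2000 are taken from the suggested standard-count range, and the α2 denominator follows one plausible reading (ΔL × ΔL2) of the somewhat garbled original formula; both are assumptions.

```python
L_MIN, L_MAX = 1000, 2000   # standard-count bounds from the suggested range

def target_recognition_duration(actual_count: int, t0: float) -> float:
    """Adjust the preset recognition duration t0 using alpha1/alpha2/alpha3.

    Intended to be called only when actual_count exceeds the standard
    count, so dL is positive and no division by zero can occur."""
    dL = actual_count - L_MAX               # ΔL = L − Lmax
    dL1 = (L_MAX - L_MIN) / 3               # ΔL1 = 1/3 (Lmax − Lmin)
    dL2 = 2 * (L_MAX - L_MIN) / 3           # ΔL2 = 2/3 (Lmax − Lmin)
    if dL < dL1:
        alpha = (dL1 - dL) / dL1                        # α1
    elif dL < dL2:
        alpha = (dL - dL1) * (dL2 - dL) / (dL * dL2)    # α2 (assumed reading)
    else:
        alpha = (dL - dL2) / dL                         # α3
    return t0 * (1 + alpha)

# Example: 2500 remaining frames against Lmax = 2000, 10 s preset duration
print(target_recognition_duration(2500, 10.0))  # dL = 500, α2 branch, ~10.8 s
```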
Specifically, in the embodiment of the invention, setting the standard number difference of the remaining video frames of the video segment enables dynamic adjustment of the dangerous-action recognition duration. This avoids wasting computation when the actual number of remaining frames is too small and avoids insufficient recognition when it is too large; the dynamic adjustment saves running load and improves recognition efficiency.
Specifically, marking the video to be classified according to the dangerous-action duration includes:
setting a first standard dangerous-action duration T1 and a second standard dangerous-action duration T2, where T1 < T2;
when the actual dangerous-action duration T is less than the first standard dangerous-action duration T1, marking the video to be classified as a first marked video;
when the actual dangerous-action duration T is greater than the first standard dangerous-action duration T1 and less than the second standard dangerous-action duration T2, marking the video to be classified as a second marked video;
and when the actual dangerous-action duration T is greater than the second standard dangerous-action duration T2, marking the video to be classified as a third marked video.
Specifically, a person skilled in the art may set the first standard dangerous-action duration T1 within the range (0, 4) seconds and the second standard dangerous-action duration T2 within the range [4, 7] seconds.
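A direct transcription of the marking rule, with illustrative thresholds chosen from those ranges (T1 = 3 s, T2 = 5 s; the names below are not from the patent):

```python
T1_STD, T2_STD = 3.0, 5.0   # assumed values from (0, 4) s and [4, 7] s

def mark_video(danger_duration_s: float) -> str:
    """Assign the mark class from the actual dangerous-action duration T."""
    if danger_duration_s < T1_STD:
        return "first marked video"
    if danger_duration_s < T2_STD:
        return "second marked video"
    return "third marked video"
```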
Specifically, dangerous-action video segments contain pictures with especially strong visual impact.
Specifically, dangerous-action recognition trains and evaluates a deep learning model on a dangerous-action video data set, which improves recognition accuracy; this is prior art and is not repeated here.
Specifically, in the embodiment of the invention, the marking of the video to be classified is realized by the duration of the dangerous action, the classification of the video is refined, and the accuracy of the video classification is improved.
Specifically, obtaining an actual category confidence value for each piece of video code stream data to be classified, and decoding the video code stream data according to that value to obtain the video to be classified, comprises:
obtaining an actual category confidence value of each video code stream data to be classified based on the enhanced convolutional neural network of the motion vector;
and comparing the actual category confidence value with a standard category confidence value to obtain a comparison result, and judging whether to decode the video code stream data to be classified according to the comparison result to obtain the video to be classified.
Specifically, determining whether to decode the video code stream data to be classified according to the comparison result includes:
setting a standard category confidence value F0;
if the actual category confidence value F is greater than or equal to the standard category confidence value F0, decoding the video code stream data to be classified;
if the actual category confidence value F is smaller than the standard category confidence value F0, performing a secondary detection of the category confidence value of the video code stream data to be classified, averaging the detected confidence values to obtain an average category confidence value, and decoding the video code stream data to be classified according to that average.
Specifically, the standard category confidence value is calculated from the video code stream data to be classified in the storage module: the category confidence values of all stored code streams are obtained, their average is computed, and that average is taken as the standard category confidence value F0.
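A minimal sketch of this decision logic; it assumes the averaged value is compared against the same F0 (the text does not spell that comparison out), and `redetect` is a hypothetical callable standing in for the secondary detection:

```python
from statistics import mean
from typing import Callable

def standard_confidence(stored_confidences: list[float]) -> float:
    """F0: the mean category confidence over all stored code streams."""
    return mean(stored_confidences)

def should_decode(f: float, f0: float, redetect: Callable[[], float]) -> bool:
    """Decode when F >= F0; otherwise re-detect once and decide on the average."""
    if f >= f0:
        return True
    f_avg = (f + redetect()) / 2    # secondary detection, then average
    return f_avg >= f0
```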
Specifically, in the embodiment of the invention, comparing the actual category confidence value with the standard category confidence value decides whether a code stream is decoded into a video to be classified. Code streams whose actual confidence value is below the standard value undergo a secondary detection of the category confidence value, providing a second judgment that avoids missed detections and reduces detection error.
Specifically, dividing each video frame in each video segment into video frame local regions having the same area includes:
establishing a two-dimensional rectangular coordinate system with the intersection of the horizontal and vertical edges at the lower-left corner of the video frame as the origin, obtaining the video frame coordinate range (0, x, y), where x denotes the length of the video frame and y denotes its width;
and transversely dividing the video frame area along the direction parallel to the x-axis according to the video frame coordinate range to obtain three video frame local areas, wherein the areas of the video frame local areas are the same, and the three video frame local areas are sequentially from top to bottom: a first video frame local area, a second video frame local area, and a third video frame local area.
Specifically, in the embodiment of the invention, the aim of dividing the video frames in each video segment into the same video frame local areas is achieved by establishing a two-dimensional rectangular coordinate system.
Specifically, in this embodiment, the video frame area may be divided into 2, 3, or N pieces of equal area; the specific division can be determined by a person skilled in the art according to the actual situation.
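The division itself is straightforward; a sketch for n equal-area horizontal strips (n = 3 in the embodiment) could look like this:

```python
import numpy as np

def split_horizontal(frame: np.ndarray, n: int = 3) -> list[np.ndarray]:
    """Split an H x W (x C) frame into n equal-area horizontal strips,
    returned top to bottom as the first, second, ... local areas."""
    h = frame.shape[0]
    edges = [round(i * h / n) for i in range(n + 1)]
    return [frame[edges[i]:edges[i + 1]] for i in range(n)]

# Example: a 1080 x 1920 frame yields three 360-row strips
strips = split_horizontal(np.zeros((1080, 1920, 3), dtype=np.uint8))
print([s.shape[0] for s in strips])  # [360, 360, 360]
```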
Specifically, calculating the actual duration of the dangerous action from the total number of dangerous-action frames includes: calculating the actual dangerous-action duration T according to equation (1),
T = (1/12) second × total number of dangerous-action frames (1).
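Equation (1) applies a fixed weight of 1/12 second per dangerous-action frame; a direct transcription with a worked example:

```python
def danger_duration_seconds(total_danger_frames: int) -> float:
    """Equation (1): T = (1/12) second x total dangerous-action frames."""
    return total_danger_frames / 12.0

print(danger_duration_seconds(96))  # 96 flagged frames -> 8.0 seconds
```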
Referring to fig. 2 to fig. 4, which show the structure of a system for normalizing and fusing multi-source data according to an embodiment of the present invention, the system includes:
the storage module 10 is used for storing a plurality of video code stream data to be classified;
the acquisition module 20 is connected with the storage module 10 and is used for acquiring a plurality of video code stream data to be classified;
the decoding module 30 is connected with the obtaining module 20, and is configured to obtain an actual category confidence value of the video code stream data to be classified for any video code stream data to be classified, and decode the video code stream data to be classified according to the actual category confidence value to obtain a video to be classified, where the video to be classified corresponds to the video code stream data to be classified one by one;
the dividing module 40 is connected with the decoding module 30, and is configured to divide the video to be classified into a plurality of video segments, and divide each video frame in each video segment into video frame local areas with the same area, where the number of video frames in each video segment is the same;
the analysis module 50 is connected with the dividing module 40 and comprises an identification unit 51, a judging unit 52 and a first calculating unit 53; the identification unit 51 identifies the actual area ratio of the human body contour in each video frame local area; the judging unit 52 judges, according to the actual area ratio and the standard area ratio of the human body contour, whether to count the actual number of remaining video frames of the video segment; the first calculating unit 53 calculates the actual number difference of the remaining video frames from their actual number, adjusts the preset recognition duration by the adjustment coefficient determined from that difference to obtain a target recognition duration, and, within the target recognition duration, identifies dangerous actions in the remaining video frames and counts the total number of frames containing the dangerous actions;
the marking module 60 is connected with the analysis module 50 and comprises a second calculating unit 61, a marking unit 62 and a storage unit 63; the second calculating unit 61 calculates the actual duration of the dangerous action from the total number of dangerous-action frames, the marking unit 62 marks the video to be classified according to the dangerous-action duration, and the storage unit 63 stores the marked video to be classified.
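As a rough, non-authoritative sketch of how these five modules could be wired together in software — every name below is illustrative, not taken from the patent:

```python
from typing import Any, Callable, Iterable

class MultiSourceVideoSystem:
    """Skeleton pipeline: storage -> acquisition -> decoding -> dividing ->
    analysis -> marking, mirroring modules 10-60 of the embodiment."""

    def __init__(self,
                 acquire: Callable[[], Iterable[Any]],           # modules 10 + 20
                 decode: Callable[[Any], Any],                   # module 30
                 divide: Callable[[Any], list],                  # module 40
                 analyse: Callable[[Any], int],                  # module 50
                 mark_and_store: Callable[[Any, float], None]):  # module 60
        self.acquire, self.decode = acquire, decode
        self.divide, self.analyse = divide, analyse
        self.mark_and_store = mark_and_store

    def run(self) -> None:
        for stream in self.acquire():
            video = self.decode(stream)      # None if confidence too low
            if video is None:
                continue
            segments = self.divide(video)
            frames = sum(self.analyse(seg) for seg in segments)
            self.mark_and_store(video, frames / 12.0)   # equation (1)
```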
Thus far, the technical solution of the present invention has been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of protection of the present invention is not limited to these specific embodiments. Equivalent modifications and substitutions for related technical features may be made by those skilled in the art without departing from the principles of the present invention, and such modifications and substitutions will be within the scope of the present invention.
The foregoing description is only of the preferred embodiments of the invention and is not intended to limit the invention; various modifications and variations of the present invention will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A method for normalizing and fusing multi-source data, comprising:
acquiring a plurality of video code stream data to be classified;
for any video code stream data to be classified, obtaining an actual category confidence value of the video code stream data to be classified, and decoding the video code stream data to be classified according to the actual category confidence value to obtain videos to be classified, wherein the videos to be classified are in one-to-one correspondence with the video code stream data to be classified;
dividing the video to be classified into a plurality of video segments, and dividing each video frame in each video segment into video frame local areas with the same area, wherein the number of video frames in each video segment is the same;
identifying the actual area ratio of the human body contour in each video frame local area; judging, according to the actual area ratio and the standard area ratio of the human body contour, whether to count the actual number of remaining video frames of the video segment; once counting is decided, calculating the actual number difference of the remaining video frames of the video segment from their actual number; adjusting the preset recognition duration according to the adjustment coefficient determined by that difference to obtain a target recognition duration; and, within the target recognition duration, identifying dangerous actions in the remaining video frames of the video segment and counting the total number of frames containing the dangerous actions;
and calculating the actual duration of the dangerous action according to the total frame number of the dangerous action, marking the video to be classified according to the duration of the dangerous action, and storing the marked video to be classified.
2. The method of normalizing and fusing multi-source data according to claim 1, wherein judging whether to count the actual number of remaining video frames of the video segment based on the actual area ratio of the human body contour and the standard area ratio of the human body contour comprises:
setting a standard human-body-contour area ratio S0;
if the actual human-body-contour area ratio S is greater than or equal to the standard area ratio S0, counting the actual number of remaining video frames of the video segment;
if the actual area ratio S is smaller than the standard area ratio S0 and the human body contour lies in the second or third video frame local area, counting the actual number of remaining video frames of the video segment;
if the actual area ratio S is smaller than the standard area ratio S0 and the human body contour lies in the first video frame local area, not counting the actual number of remaining video frames of the video segment.
3. The method of normalizing and fusing multi-source data according to claim 2, wherein calculating an actual number difference of the remaining video frames of the video segment according to their actual number, and adjusting a preset recognition duration according to the adjustment coefficient determined by that difference, comprises:
setting a standard number L of remaining video frames of the video segment;
if the actual number of remaining video frames of the video segment is greater than the standard number, calculating the actual number difference of the remaining video frames;
and adjusting the standard recognition duration according to a first recognition duration adjustment coefficient α1, a second recognition duration adjustment coefficient α2, or a third recognition duration adjustment coefficient α3, determined by the relation between the actual number difference and a preset first standard number difference ΔL1 and second standard number difference ΔL2.
4. The method of normalizing and fusing multi-source data according to claim 3, wherein adjusting the standard recognition duration according to the first recognition duration adjustment coefficient α1, the second recognition duration adjustment coefficient α2, or the third recognition duration adjustment coefficient α3 comprises:
setting a standard number difference ΔL0 for the remaining video frames of the video segment, where the standard number L0 ∈ (Lmin, Lmax);
setting a first standard number difference ΔL1 and a second standard number difference ΔL2;
calculating the actual number difference ΔL of the remaining video frames of the video segment, where ΔL = L − Lmax;
when ΔL < ΔL1, selecting the first recognition duration adjustment coefficient α1 and adjusting the preset recognition duration to T1 = T0 × (1 + α1), where α1 = (ΔL1 − ΔL)/ΔL1;
when ΔL1 ≤ ΔL < ΔL2, selecting the second recognition duration adjustment coefficient α2 and adjusting the preset recognition duration to T1 = T0 × (1 + α2), where α2 = [(ΔL − ΔL1) × (ΔL2 − ΔL)]/(ΔL × ΔL2);
when ΔL ≥ ΔL2, selecting the third recognition duration adjustment coefficient α3 and adjusting the preset recognition duration to T1 = T0 × (1 + α3), where α3 = (ΔL − ΔL2)/ΔL;
where ΔL0 = Lmax − Lmin, ΔL1 = (1/3)(Lmax − Lmin), and ΔL2 = (2/3)(Lmax − Lmin).
5. The method of multi-source data normalization and fusion according to claim 4, wherein marking the video to be classified according to the dangerous-action duration comprises:
setting a first standard dangerous-action duration T1 and a second standard dangerous-action duration T2, where T1 < T2;
when the actual dangerous-action duration T is less than the first standard dangerous-action duration T1, marking the video to be classified as a first marked video;
when the actual dangerous-action duration T is greater than the first standard dangerous-action duration T1 and less than the second standard dangerous-action duration T2, marking the video to be classified as a second marked video;
and when the actual dangerous-action duration T is greater than the second standard dangerous-action duration T2, marking the video to be classified as a third marked video.
6. The method for normalizing and fusing multi-source data according to claim 5, wherein obtaining an actual category confidence value for each piece of video code stream data to be classified, and decoding the video code stream data according to that value to obtain the video to be classified, comprises:
obtaining an actual category confidence value of each video code stream data to be classified based on the enhanced convolutional neural network of the motion vector;
and comparing the actual category confidence value with a standard category confidence value to obtain a comparison result, and judging whether to decode the video code stream data to be classified according to the comparison result to obtain the video to be classified.
7. The method of normalizing and fusing multi-source data according to claim 6, wherein determining whether to decode the video code stream data to be classified to obtain the video to be classified according to the comparison result comprises:
setting a standard category confidence value F0;
if the actual category confidence value F is greater than or equal to the standard category confidence value F0, decoding the video code stream data to be classified;
if the actual category confidence value F is smaller than the standard category confidence value F0, performing a secondary detection of the category confidence value of the video code stream data to be classified, averaging the detected confidence values to obtain an average category confidence value, and decoding the video code stream data to be classified according to that average.
8. The method of multi-source data normalization processing and fusion according to claim 7, wherein dividing each video frame in each video segment into local regions of video frames of equal area comprises:
establishing a two-dimensional rectangular coordinate system with the intersection of the horizontal and vertical edges at the lower-left corner of the video frame as the origin, obtaining the video frame coordinate range (0, x, y), where x denotes the length of the video frame and y denotes its width;
and transversely dividing the video frame area along the direction parallel to the x-axis according to the video frame coordinate range to obtain three video frame local areas, wherein the areas of the video frame local areas are the same, and the three video frame local areas are sequentially from top to bottom: a first video frame local area, a second video frame local area, and a third video frame local area.
9. The multi-source data normalization processing and fusion method according to claim 8, wherein calculating the actual duration of the dangerous action based on the total number of dangerous action frames comprises: the actual dangerous action duration T is calculated according to equation (1):
T = (1/12 second) × (total number of dangerous action frames)   (1)
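Equation (1) implies that dangerous action frames are counted at 12 frames per second, so each counted frame contributes 1/12 s; a one-line sketch with a worked example:

    FRAME_PERIOD_S = 1.0 / 12.0  # equation (1): each counted frame spans 1/12 s

    def dangerous_action_duration(total_frames: int) -> float:
        return FRAME_PERIOD_S * total_frames

    # e.g. dangerous_action_duration(60) -> 5.0 seconds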
10. A system for use in the multi-source data normalization processing and fusion method of any one of claims 1 to 9, comprising:
the storage module is used for storing a plurality of video code stream data to be classified;
the acquisition module is connected with the storage module and used for acquiring a plurality of video code stream data to be classified;
the decoding module is connected with the acquisition module and used for obtaining the actual category confidence value of each piece of video code stream data to be classified, and for decoding the video code stream data to be classified according to the actual category confidence value to obtain videos to be classified, wherein the videos to be classified correspond one-to-one to the video code stream data to be classified;
the dividing module is connected with the decoding module and used for dividing the video to be classified into a plurality of video segments and dividing each video frame in each video segment into video frame local areas with the same area, wherein the number of the video frames in each video segment is the same;
the analysis module is connected with the dividing module and comprises an identification unit, a judgment unit and a first calculation unit, wherein the identification unit is used for identifying the actual area ratio occupied by the human body outline in each video frame local area; the judgment unit is used for judging, according to the actual human body outline area ratio and the standard human body outline area ratio, whether to count the actual number of remaining video frames in the video segment; and the first calculation unit is used for calculating the difference between the actual numbers of remaining video frames in the video segments, adjusting the preset identification duration by an adjustment coefficient determined from that difference to obtain a target identification duration, and, within the target identification duration, identifying dangerous actions in the remaining video frames and counting the total number of frames containing dangerous actions;
the marking module is connected with the analysis module and comprises a second calculation unit, a marking unit and a storage unit, wherein the second calculation unit is used for calculating the actual dangerous action duration according to the total number of dangerous action frames, the marking unit is used for marking the videos to be classified according to the dangerous action duration, and the storage unit is used for storing the marked videos to be classified.
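A structural sketch of how the claim-10 modules chain together (class and method names are illustrative, not from the patent; each injected module would implement the corresponding claim step):

    class MultiSourceFusionSystem:
        def __init__(self, storage, acquisition, decoder, divider, analyzer, marker):
            # Modules mirror claim 10: storage -> acquisition -> decoding ->
            # dividing -> analysis -> marking.
            self.storage = storage
            self.acquisition = acquisition
            self.decoder = decoder
            self.divider = divider
            self.analyzer = analyzer
            self.marker = marker

        def run(self):
            for stream in self.acquisition.fetch(self.storage):
                video = self.decoder.decode_if_confident(stream)
                if video is None:  # the confidence gate rejected this stream
                    continue
                segments = self.divider.split(video)
                total = self.analyzer.count_dangerous_frames(segments)
                self.marker.mark_and_store(video, total)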
CN202310975393.7A 2023-08-04 2023-08-04 Multi-source data normalization processing and fusion method and system Active CN116912596B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310975393.7A CN116912596B (en) 2023-08-04 2023-08-04 Multi-source data normalization processing and fusion method and system


Publications (2)

Publication Number Publication Date
CN116912596A 2023-10-20
CN116912596B 2024-03-22

Family

ID=88362912


Country Status (1)

Country Link
CN (1) CN116912596B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101867729A (en) * 2010-06-08 2010-10-20 Shanghai Jiao Tong University Method for detecting news video formal soliloquy scene based on features of characters
CN101872418A (en) * 2010-05-28 2010-10-27 University of Electronic Science and Technology of China Detection method based on group environment abnormal behavior
CN110674790A (en) * 2019-10-15 2020-01-10 Shandong Jianzhu University Abnormal scene processing method and system in video monitoring
CN111860311A (en) * 2020-07-20 2020-10-30 Nanjing Zhijin Technology Innovation Service Center Method and system for prompting abnormal posture of human body
CN111931856A (en) * 2020-08-14 2020-11-13 Shenzhen Inveno Technology Co., Ltd. Video classification method and device, electronic equipment and storage medium
CN113542804A (en) * 2021-07-09 2021-10-22 Hangzhou Danghong Technology Co., Ltd. Method for detecting static frame sequence based on code stream statistical characteristics
CN114648712A (en) * 2020-12-18 2022-06-21 Beijing ByteDance Network Technology Co., Ltd. Video classification method and device, electronic equipment and computer-readable storage medium
CN114821421A (en) * 2022-04-28 2022-07-29 Nanjing University of Science and Technology Traffic abnormal behavior detection method and system
CN115240090A (en) * 2022-07-23 2022-10-25 Nanjing University of Information Science and Technology Unmanned aerial vehicle target detection system based on edge computing
CN116229341A (en) * 2022-11-21 2023-06-06 China Datang Group Science and Technology Research Institute Co., Ltd. Method and system for analyzing and alarming suspicious behaviors in video monitoring
CN116389831A (en) * 2023-06-06 2023-07-04 Hunan Malanshan Video Advanced Technology Research Institute Co., Ltd. Cloud-native-based offline rendering system and method




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant