CN111325153B - Student behavior feature intelligent analysis method based on multidimensional data - Google Patents

Student behavior feature intelligent analysis method based on multidimensional data Download PDF

Info

Publication number
CN111325153B
CN111325153B CN202010106436.4A CN202010106436A CN111325153B CN 111325153 B CN111325153 B CN 111325153B CN 202010106436 A CN202010106436 A CN 202010106436A CN 111325153 B CN111325153 B CN 111325153B
Authority
CN
China
Prior art keywords
target
student
time
behavior
residual
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010106436.4A
Other languages
Chinese (zh)
Other versions
CN111325153A (en
Inventor
纪刚
周亚敏
周萌萌
商胜楠
周粉粉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingdao Lianhe Chuangzhi Technology Co ltd
Original Assignee
Qingdao Lianhe Chuangzhi Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qingdao Lianhe Chuangzhi Technology Co ltd filed Critical Qingdao Lianhe Chuangzhi Technology Co ltd
Priority to CN202010106436.4A priority Critical patent/CN111325153B/en
Publication of CN111325153A publication Critical patent/CN111325153A/en
Application granted granted Critical
Publication of CN111325153B publication Critical patent/CN111325153B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/41Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/71Indexing; Data structures therefor; Storage structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/46Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the technical field of multidimensional data intelligent processing, and relates to an intelligent analysis method for student behavior characteristics based on multidimensional data; s1, constructing a high-low frequency 3D residual neural network model, and performing video structuring processing to obtain the spatial position and behavior category of a target; s2, storing the information of the acquired students into a database to generate a track sequence of the students; s3, constructing a descriptive information sequence of the target, screening a track sequence of a student with highest matching degree with the target, and constructing a track-behavior sequence pair of the student; s4, generating a daily activity rule chart of the student by utilizing the student track-behavior sequence pair, and carrying out long-time rule statistics to judge whether abnormal behaviors exist; the method can combine the data of multiple dimensions such as student behavior data, student positioning data and the like to perform intelligent comprehensive analysis, help schools discover the behavior characteristics of each student, discover abnormal behaviors of the students in time, and realize rapid early warning.

Description

Student behavior feature intelligent analysis method based on multidimensional data
Technical field:
the invention belongs to the technical field of multidimensional data intelligent processing, relates to a multidimensional data intelligent analysis method, and in particular relates to a multidimensional data-based student behavior characteristic intelligent analysis method.
The background technology is as follows:
schools not only teach students cultural knowledge, but also are burdened with teaching students to develop good behavioral habits. The method can accurately analyze the behavior characteristics of the students, and is helpful for improving the learning ability of the students, improving the teaching means of teachers, and improving the good interaction relationship between the students and the teachers. However, students are huge in number, activities during a school cannot be fully supervised, and how to supervise behaviors of the students by using effective means is a problem to be solved.
The application of the current high and new technologies such as big data, the Internet of things, artificial intelligence and the like in the education field is gradually expanded, and the student behavior analysis is also related to a certain degree. In the prior art, chinese patent with publication number of CN108268854A discloses a teaching auxiliary big data intelligent analysis method based on feature recognition, which comprises the following steps: s1, acquiring a student static image; s2, extracting the directional gradient characteristics of the image; s3, machine learning; s4, sliding a window; s5, selecting the action with the highest probability under action classification, and identifying the action; s6, carrying out face recognition on the action sender in the S5. The Chinese patent with publication number of CN108242035A discloses a campus security monitoring method based on big data, which is characterized by comprising the following steps: acquiring campus data; analyzing and processing campus data to obtain a student behavior model; and comparing and analyzing the student behavior model with the reference model and outputting an evaluation result. The invention realizes the safety monitoring of students in school, and the monitoring content comprises: the abnormal conditions are timely found through data analysis, so that the campus safety is improved. The Chinese patent with publication number of CN109145818A discloses a flow statistics method, device and system, relates to the technical field of big data, is applied to a server side, and comprises the steps of receiving target feature information of an interested feature to be subjected to flow statistics; searching a feature map with feature description information matched with target feature information in a preset storage area, wherein the feature description information in the preset storage area is obtained by carrying out structural processing on the feature map in an optimal image sent by a terminal; and counting the number of feature graphs corresponding to the searched feature description information to obtain a flow statistical result of the interested feature, so as to realize accurate statistics of the feature graphs with the same target feature information and further realize flow statistics of the interested feature. And the related statistical and analysis methods are lacking for the comprehensive behavior feature analysis of the student during the course and the course behavior recognition and the activity track.
The invention comprises the following steps:
the invention aims to overcome the defects of the existing foreground image processing, and designs and provides an intelligent analysis method for the behavior characteristics of students based on multidimensional data, aiming at the defects that the students are difficult to monitor the behavior of the students at the present time due to the large number of students at the present time, the behavior characteristics of the students are difficult to accurately analyze, the abnormal behaviors of the students cannot be found in time and the like.
In order to achieve the above purpose, the invention relates to a student behavior characteristic intelligent analysis method based on multidimensional data, which comprises the following specific process steps:
s1, video structuring process
Taking continuous 16 frames as a processing unit, defaulting a video frame in each processing unit to be 3 channels, constructing a high-low frequency 3D residual neutral network model for video structuring, storing text information and video snapshot data into corresponding structuring databases, wherein the high-low frequency 3D residual neutral network model comprises a low-frequency 3D residual neutral network and a high-frequency 3D residual neutral network, the low-frequency 3D residual neutral network performs personnel structuring processing to extract target features, and the high-frequency 3D residual neutral network performs behavior structuring processing to extract behavior features; connecting the target features with the behavior features, and processing to obtain the spatial position and behavior category of the target T;
S2, collecting real-time positioning data of students;
the student wears positioning equipment with the functions of GPS, beidou, wiFi and base station positioning, positions the student at a fixed frequency, stores the positioning result, positioning time and student ID information into a database, and generates a track sequence of the student;
s3, space-time studying and judging analysis
Extracting the spatial characteristics of the target T by using the spatial position of the target T, searching all similar targets of the target T in the L time period according to similarity matching, arranging the spatial characteristics, the behavior types and the installation positions of the monitoring cameras of each similar target according to time sequence, and constructing a descriptive information sequence of the target T; intercepting track sequences of a plurality of students in the L time period, combining the descriptive information sequences of the target T, and screening out the track sequence of the student S with the highest matching degree with the target T from the track sequences of the plurality of students by utilizing track matching, namely, identifying the target T as the student S, thereby constructing a track-behavior sequence pair of the student S;
s4, student behavior feature analysis
Drawing and generating a daily activity rule chart of the student by utilizing the student track-behavior sequence pair obtained in the step S3 and combining with the school time schedule, school work and rest schedule, basic student information and the like of the student in the L time period, and carrying out data mining to find out activity hobbies of the student in the school period, so as to help a teacher to effectively improve a teaching plan according to the characteristics of different students; by carrying out long-time rule statistics on the track-behavior sequence pairs of the students and combining with human supervision, a data prediction alarm function is constructed, whether the students have abnormal behaviors, such as long-time aggregation, loitering behaviors, track deviation and the like, real-time early warning is achieved, and accidents are prevented.
The specific process of the video structuring processing in the step S1 is as follows:
s1.1 target feature extraction
Performing target feature extraction by using a low-frequency 3D residual error network structure, wherein the sampling interval of the low-frequency 3D residual error network structure on a video frame is set to be inv_l=16, and the low-frequency 3D residual error network structure is used for extracting space and semantic information of a target;
s1.2 behavioral characteristics extraction
Performing behavior feature extraction by using a high-frequency 3D residual error network structure, and calculating the extracted behavior feature size to be {8,256,7,7}; setting the sampling interval of a high-frequency 3D residual network structure to be inv_h=inv_l/alpha of a video frame, wherein alpha=8, and the number of convolution kernels is beta times that of the convolution kernels in a low-frequency 3D residual network, wherein beta=1/8;
s1.3 video classification
Firstly, performing size conversion on behavior features through matrix operation, secondly, processing target features and behavior features after size conversion through global mean value pooling operation, thirdly, transversely connecting the processed target features and behavior features, and finally, inputting the target features and behavior features into a full-connection layer to finally obtain the spatial position and behavior category of the target; the specific process is as follows:
(1) Behavioral feature size conversion
S1.2, the behavior feature size obtained through calculation is {8,256,7,7}, the behavior feature size is converted into {1,8× 256,7,7} = {1,2048,7,7}, namely, alpha feature graphs are converted into the number of convolution kernels of a single feature graph, and the feature size conversion is completed;
(2) Feature global averaging pooling
Processing the target features and the behavior features after the size conversion by adopting global averaging pooling operation, wherein the pooled kernel size is {1, 7}, and the pooled feature sizes are {1,1,1,2048};
(3) Feature connection
Transversely connecting the target characteristics and the behavior characteristics processed in the two steps to obtain a connected characteristic length of 4096;
(4) Full connection operation
And inputting the characteristics obtained in the previous step into a full-connection layer to finally obtain the spatial position and behavior category of the target.
Model parameters pre-trained on a kinetic-400 data set are used in the operation process of the steps S1.1, S1.2 and S1.3;
s1.4 video structuring
The spatial position of the target T, the behavior category and the spatial characteristics of the target T obtained in the S1.3 are obtained to jointly construct descriptive information { location_T, spatial_feature_T, action_ID }' of the target T, wherein the behavior category comprises a plurality of categories of writing, drawing, walking, running, stretching limbs, playing basketball, playing football, dancing, swimming, riding a bicycle, handshaking, hugging, drinking water, eating things and mutual pushing; the video structuring comprises target matching and generating a descriptive information sequence of the target, and the specific process is as follows:
(1) Target matching
The method comprises the steps of realizing target tracking between two adjacent processing units by calculating cosine distances between all target space features generated by the two adjacent processing units, namely, target tracking between the last 16 frames and the current 16 frames, wherein a cosine distance calculation formula between all target space features generated by the two adjacent processing units is as follows:
Figure BDA0002388621840000041
wherein x is i The ith spatial eigenvalue, y, of the x target generated for the last processing unit i An ith space feature value of the y target generated for the current processing unit, wherein θ is an included angle between two space feature vectors;
when theta is smaller than threshold_theta, the targets are considered to be matched, and the y targets are marked with the number num_x of the x targets;
when theta is more than or equal to threshold_theta, the targets are considered to be unmatched, and the y target is marked as a new index num_y;
wherein threshold_θ=0.33 is the angle Threshold;
(2) Generating a sequence of descriptive information of a target
Storing the processing results into a corresponding structured database according to a time sequence, wherein the processing results comprise time, monitoring camera numbers, target numbers, behavior type marks and spatial features, namely a descriptive information sequence of a target T, and the descriptive information sequence comprises the following specific forms:
info_Seq_T={time_1,camera_ID,num_T,action_ID,spatial_Feature_T;…;time_i,camera_ID,num_T,action_ID,spatial_Feature_T;}
where i e n+.
The specific form of the track sequence of the student S in the step S2 is as follows:
traj_S={location_Time_1,lat_1,lon_1,ID_S;…;location_Time_i,lat_i,lon_i,ID_S;}
wherein i εN+;
the specific process of the space-time analysis in the step S3 is as follows:
s3.1 Global target retrieval
According to the spatial characteristics of the target T, performing global search on a plurality of cameras in a calibration area to generate a target by utilizing cosine distances, finding out all the describable information sequences of the target T in the L time period, and arranging according to a time sequence;
s3.2 space-time matching
(1) The time distribution of the monitoring cameras where all similar targets of the target T are located in the L time period and the installation positions of the monitoring cameras are utilized to construct a space-time matching sequence, and the space-time matching sequence is specifically formed in the following steps:
camera_Time_Seq_T_L={time_1,camera_m_lat,camera_m_lon;…;time_i,camera_n_lat,camera_n_lon;}
where i e (1, …, L), time_i is the i-th time point, camera_m_lat represents the mounting position-latitude of the monitoring camera m, and camera_m_lon represents the mounting position-longitude of the monitoring camera m;
(2) The track sequence of the student S in the L time period is cut out, and the specific form of the cut-out student track sequence is as follows:
traj_S_L={location_Time_1,lat_1,lon_1,ID_S;…;location_Time_i,lat_i,lon_i,ID_S;}
where i e (1, …, L), action_time_i represents the point in Time nearest to time_i, lat_i represents: the latitude of the student at the Time of action_time_i, lon_i represents: the calculation formula of the Euclidean distance between the longitude of the student at the Time of action_Time_i, camera_Time_Seq_T_L and traj_S_L is as follows:
Figure BDA0002388621840000051
If dist (S, T) is more than or equal to threshold_dist, the target T is not student S;
if dist (S, T) < threshold_dist, then it indicates that the target T is likely student S;
wherein the method comprises the steps ofThreshold_dist=0.8 is the euclidean distance Threshold, s i Lat is the latitude of student S at the moment of action_Time_i, S i LOn is the longitude of student S at the Time of action_Time_i, t i Lat is the latitude of the target T at time_i, s i _lon is the longitude of the target T at time_i;
(3) Sequentially comparing a plurality of intercepted student track sequences, screening out student F with the minimum dist (F, T) value, and identifying the student F as a target T to generate a student track-behavior sequence pair, wherein the student track-behavior sequence pair has the specific form:
Figure BDA0002388621840000052
where i.epsilon.1, …, L.
The low-frequency 3D residual error network structure comprises the following components: an Input layer Input, a convolutional layer Conv1, a pooling layer Pool1, and four residual blocks Res2, res3, res4, res5; the Input layer Input samples the accessed video stream, the sampling interval is 16, namely, one frame is selected as Input every 16 frames, the size of the Input video frame is 224 multiplied by 224, and the Input video frame is defaulted to be three channels; the output end of the Input layer Input is connected with a convolution layer Conv1, and the convolution layer Conv1 is used for extracting high-resolution features; the output end of the convolution layer Conv1 is connected with a pooling layer Pool1; the pooling layer Pool1 adopts a maximum pooling method to downsample the feature map, so that the feature quantity is reduced, and the excessive calculated amount is prevented; the output end of the pooling layer Pool1 is connected with residual blocks, and four residual blocks Res2, res3, res4 and Res5 are sequentially connected; and finally outputting to obtain the target feature through Res2 to Res5 residual blocks.
Each residual block of four residual blocks Res2-Res5 of the low-frequency 3D residual network structure of the present invention is composed of a main path and a shortcut: the main path is composed of N multiplied by 3 convolution layers, wherein N represents the repetition times of 3 convolution layers in residual blocks, and the number and the size of convolution kernels of each convolution layer in different residual blocks are also different; the shortcuts are one convolution layer, the shortcuts directly transmit information to the deep layer of the network, the sizes of convolution kernels in different residual blocks are all 1 multiplied by 1, and the convolution kernels are different in size; through Res2 to Res5, a very deep network structure is built together by the main paths, and shortcuts in each residual block skip one layer or multiple layers and enter the deep network layer, so that the problems of gradient disappearance and gradient explosion are solved, and good operation performance of the network is ensured while the deeper network is utilized to extract the characteristics.
The specific parameters of each layer in the low-frequency 3D residual error network are as follows: wherein the Input layer Input: the output size is {1,3,224,224}, the sampling interval is 16, and the video frame size is {3,224,224}; convolutional layer Conv1: the output size is {1,64,112,112}, the number of convolution kernels is 64, the step size is {1, 2}, and the size is {1,3,7,7}; pool layer Pool1: the output size is {1,64,56,56}, the pooling kernel step size is {1, 2}, the size is {1, 3}, and the mode is the maximum value;
Residual block Res2: the output size is 1,256,56,56,
Figure BDA0002388621840000061
residual block Res3: the output size is 1,512,28,28,
Figure BDA0002388621840000062
residual block Res4: the output size is 1,1024,14,14,
Figure BDA0002388621840000071
residual block Res5: the output size is 1,2048,7,7,
Figure BDA0002388621840000072
wherein the above numbers or symbols have the meaning: the video frame size "{3,224,224}" corresponds to { video frame channel number, video frame width, video frame height }; the convolution kernel size "{3,1,7,7}" corresponds to { [ number of video frame channels, ] video frame depth, video frame width, video frame height }, the value in brackets may not exist; the convolution kernel step size "{1, 2}" corresponds to { time step size, horizontal space step size, vertical space step size }; the pooling kernel size "{1, 3}" corresponds to { video frame depth, video frame width, video frame height }; the pooling kernel step size "{1, 2}" corresponds to { time step size, horizontal space step size, vertical space step size }; the output size (e.g., "{1,3,224,224 }") corresponds to { feature map depth, number of convolution kernels, feature map width, feature map height }, and the factor (e.g., "×6") after multiplication in the residual block parameters represents the number of repetitions of the convolution operation;
the high-frequency 3D residual error network comprises an Input layer Input, a convolution layer Conv1, a pooling layer Pool1 and four residual error blocks Res2, res3, res4 and Res5; the Input layer Input samples the accessed video stream, the sampling interval is 2, namely, one frame is selected as Input every 2 frames, the size of the Input video frame is 224 multiplied by 224, and the Input video frame is defaulted to be three channels; the output end of the Input layer Input is connected with a convolution layer Conv1, and the convolution layer Conv1 is used for extracting high-resolution features; the output end of the convolution layer Conv1 is connected with a pooling layer Pool1; the pooling layer Pool1 adopts a maximum pooling method to downsample the feature map, so that the feature quantity is reduced, and the excessive calculated amount is prevented; the output end of the pooling layer Pool1 is connected with residual blocks, and four residual blocks Res2, res3, res4 and Res5 are sequentially connected; and finally outputting to obtain the behavior characteristics through Res2 to Res5 residual blocks.
Each residual block of four residual blocks Res2-Res5 of the high-frequency 3D residual network of the present invention is composed of a main path and a shortcut: the main path is composed of N multiplied by 3 convolution layers, wherein N represents the repetition times of 3 convolution layers in residual blocks, and the number and the size of convolution kernels of each convolution layer in different residual blocks are also different; the shortcuts are one convolution layer, the shortcuts directly transmit information to the deep layer of the network, the sizes of convolution kernels in different residual blocks are all 1 multiplied by 1, and the convolution kernels are different in size; through Res2 to Res5, a very deep network structure is built together by the main paths, and shortcuts in each residual block skip one layer or multiple layers and enter the deep network layer, so that the problems of gradient disappearance and gradient explosion are solved, and good operation performance of the network is ensured while the deeper network is utilized to extract the characteristics.
The specific parameters of each layer in the high-frequency 3D residual error network are as follows: wherein the Input layer Input: the output size is {8,3,224,224}, the sampling interval is 2, and the image size is {3,224,224}; convolutional layer Conv1: the output size is {8,8,112,112}, the number of convolution kernels is 8, the step size is {1, 2}, and the size is {5,3,7,7}; pool layer Pool1: the output size is {8,8,56,56}, the pooling kernel step size is {1, 2}, the size is {1, 3}, and the mode is the maximum value {1,64,56,56};
Residual block Res2: the output size is 8,32,56,56,
Figure BDA0002388621840000081
residual block Res3: the output size is 8,64,28,28,
Figure BDA0002388621840000082
residual block Res4: the output size is 8,128,14,14,
Figure BDA0002388621840000083
residual block Res5: the output size is 8,256,7,7,
Figure BDA0002388621840000084
wherein the meaning of the number or symbol in each layer of specific parameters in the high-frequency 3D residual error network is the same as the meaning of the number or symbol in each layer of specific parameters in the low-frequency 3D residual error network;
compared with the prior art, the intelligent analysis method for the student behavior characteristics based on the multidimensional data is designed, and can be used for carrying out intelligent comprehensive analysis by combining data of multiple dimensions such as student behavior data generated based on video structuring, student positioning data calibrated based on wearable positioning equipment, detailed and multi-aspect text information and the like, helping schools to discover the behavior characteristics of each student, helping schools to discover abnormal behaviors of the students in time through long-time tracking and learning, and achieving rapid early warning.
Description of the drawings:
fig. 1 is a schematic diagram of a low frequency 3D residual network structure according to the present invention.
Fig. 2 is a schematic diagram of a high frequency 3D residual network structure according to the present invention.
Fig. 3 is a schematic block diagram of a process flow of the intelligent analysis method of student behavior characteristics based on multidimensional data.
The specific embodiment is as follows:
the invention is further illustrated by the following examples in conjunction with the accompanying drawings.
Example 1:
the embodiment relates to an intelligent analysis method for student behavior characteristics based on multidimensional data, which comprises the following specific process steps:
s1, video structuring process
The video structuring refers to a technology of performing deep learning processing means such as target segmentation, time sequence analysis, target recognition and the like on video contents according to semantic relations, analyzing and recognizing target information, and then organizing the target information into text information which can be understood by a computer and a person. Through video structuring processing, text information and video snapshot data are stored in corresponding structured databases, so that video searching speed can be greatly improved, video storage capacity can be reduced, application value of video data can be improved, and subsequent video data analysis and prediction can be facilitated;
the video structuring process takes continuous 16 frames as a processing unit, and the video frame in each processing unit defaults to 3 channels; the video structuring is processed by constructing a high-low frequency 3D residual neural network model, the high-low frequency 3D residual neural network model comprises a low-frequency 3D residual neural network and a high-frequency 3D residual neural network, the low-frequency 3D residual neural network performs personnel structuring processing, and the high-frequency 3D residual neural network performs behavior structuring processing; the specific process of the video structuring process is as follows:
S1.1 target feature extraction
Because the category of the detected target in the video is basically unchanged, the extraction of the target features only needs to depend on a single or a few video frames, in this step, the target feature extraction is performed by using a low-frequency 3D residual network structure, the sampling interval of the low-frequency 3D residual network structure on the video frames is set to be inv_l=16, the low-frequency 3D residual network structure is used for extracting the spatial and semantic information of the target, and the low-frequency 3D residual network structure is schematically shown in the following diagram:
the low-frequency 3D residual network structure of this embodiment includes: an Input layer Input, a convolutional layer Conv1, a pooling layer Pool1, and four residual blocks Res2, res3, res4, res5; the Input layer Input samples the accessed video stream, the sampling interval is 16, namely, one frame is selected as Input every 16 frames, the size of the Input video frame is 224 multiplied by 224, and the Input video frame is defaulted to be three channels; the output end of the Input layer Input is connected with a convolution layer Conv1, and the convolution layer Conv1 is used for extracting high-resolution features; the output end of the convolution layer Conv1 is connected with a pooling layer Pool1; the pooling layer Pool1 adopts a maximum pooling method to downsample the feature map, so that the feature quantity is reduced, and the excessive calculated amount is prevented; the output end of the pooling layer Pool1 is connected with residual blocks, and four residual blocks Res2, res3, res4 and Res5 are sequentially connected; the target characteristics are finally obtained through the Res2 to Res5 residual blocks;
Each of the four residual blocks Res2-Res5 of the low-frequency 3D residual network structure in this embodiment is composed of a main path and a shortcut: the main path is composed of N multiplied by 3 convolution layers, wherein N represents the repetition times of 3 convolution layers in residual blocks, and the number and the size of convolution kernels of each convolution layer in different residual blocks are also different; the shortcuts are one convolution layer, the shortcuts directly transmit information to the deep layer of the network, the sizes of convolution kernels in different residual blocks are all 1 multiplied by 1, and the convolution kernels are different in size; through Res2 to Res5, a very deep network structure is built together by the main paths, and shortcuts in each residual block skip one layer or multiple layers and enter the deep network layer, so that the problems of gradient disappearance and gradient explosion are solved, and good operation performance of the network is ensured while the deeper network extraction characteristics are utilized; the specific parameters of the low-frequency 3D residual network structure are shown in the following table:
Figure BDA0002388621840000101
wherein the numerical or symbolic meaning in the above table is: the video frame size "{3,224,224}" corresponds to { video frame channel number, video frame width, video frame height }; the convolution kernel size "{1,3,7,7}" corresponds to { [ number of video frame channels, ] video frame depth, video frame width, video frame height }, the value in brackets may not exist; the convolution kernel step size "{1, 2}" corresponds to { time step size, horizontal space step size, vertical space step size }; the pooling kernel size "{1, 3}" corresponds to { video frame depth, video frame width, video frame height }; the pooling kernel step size "{1, 2}" corresponds to { time step size, horizontal space step size, vertical space step size }; the output size (e.g., "{1,3,224,224 }") corresponds to { feature map depth, number of convolution kernels, feature map width, feature map height }, and the factor (e.g., "×6") after multiplication in the residual block parameters represents the number of repetitions of the convolution operation;
S1.2 behavioral characteristics extraction
Since the behavior of the target is sometimes changed in a very short time, the behavior feature extraction is performed by using the high-frequency 3D residual error network structure in the step, and the calculated and extracted behavior feature size is {8,256,7,7}; setting the sampling interval of a high-frequency 3D residual network structure to be inv_h=inv_l/alpha of a video frame, wherein alpha=8, and the number of convolution kernels is beta times that of the convolution kernels in a low-frequency 3D residual network, wherein beta=1/8; the high-frequency 3D residual network structure has higher time resolution and fewer convolution kernels, the structure is beneficial to the development of useful time sequence information of a target at a higher speed, the high-frequency 3D residual network structure is schematically shown in figure 2,
the high-frequency 3D residual network described in this embodiment includes an Input layer Input, a convolutional layer Conv1, a pooling layer Pool1, and four residual blocks Res2, res3, res4, res5; the Input layer Input samples the accessed video stream, the sampling interval is 2, namely, one frame is selected as Input every 2 frames, the size of the Input video frame is 224 multiplied by 224, and the Input video frame is defaulted to be three channels; the output end of the Input layer Input is connected with a convolution layer Conv1, and the convolution layer Conv1 is used for extracting high-resolution features; the output end of the convolution layer Conv1 is connected with a pooling layer Pool1; the pooling layer Pool1 adopts a maximum pooling method to downsample the feature map, so that the feature quantity is reduced, and the excessive calculated amount is prevented; the output end of the pooling layer Pool1 is connected with residual blocks, and four residual blocks Res2, res3, res4 and Res5 are sequentially connected; and finally outputting to obtain low-resolution behavior characteristics through Res2 to Res5 residual blocks.
Each residual block in the four residual blocks Res2-Res5 of the high-frequency 3D residual network in this embodiment is composed of a main path and a shortcut: the main path is composed of N multiplied by 3 convolution layers, wherein N represents the repetition times of 3 convolution layers in residual blocks, and the number and the size of convolution kernels of each convolution layer in different residual blocks are also different; the shortcuts are one convolution layer, the shortcuts directly transmit information to the deep layer of the network, the sizes of convolution kernels in different residual blocks are all 1 multiplied by 1, and the convolution kernels are different in size; through Res2 to Res5, a very deep network structure is built together by the main paths, and shortcuts in each residual block skip one layer or multiple layers and enter the deep network layer, so that the problems of gradient disappearance and gradient explosion are solved, and good operation performance of the network is ensured while the deeper network extraction characteristics are utilized; the high frequency 3D residual network parameters are shown in the following table:
Figure BDA0002388621840000121
wherein the meaning of the numbers or symbols in the table is the same as the meaning of the numbers or symbols in the table S1.1;
s1.3 video classification
Firstly, performing size conversion on behavior features through matrix operation, secondly, processing the target features and the behavior features after size conversion by global mean value pooling operation, performing transverse connection on the processed target features and behavior features again, and finally inputting the target features and behavior features into a full-connection layer to finally obtain the spatial position and behavior category of the target; the specific process is as follows:
(1) Behavioral feature size conversion
The behavioral characteristic size obtained by S1.2 calculation is {8,256,7,7}, in order to realize characteristic connection with the target characteristic obtained by S1.1 calculation, the size conversion is needed, the behavioral characteristic size is converted into {1,8× 256,7,7} = {1,2048,7,7}, namely, the number of convolution kernels of alpha characteristic graphs is converted into a single characteristic graph through matrix operation, and the characteristic size conversion is completed;
(2) Feature global averaging pooling
Processing the target features and the behavior features after the size conversion by adopting global averaging operation, wherein the sizes of the pooled cores {1, 7}, and the characteristic sizes after pooling are {1,1,1,2048};
(3) Feature connection
Transversely connecting the target characteristics and the behavior characteristics processed in the two steps to obtain a connected characteristic length of 4096;
(4) Full connection operation
And inputting the characteristics obtained in the previous step into a full-connection layer to finally obtain the spatial position and behavior category of the target.
Model parameters pre-trained on a kinetic-400 data set are used in the operation process of the steps S1.1, S1.2 and S1.3;
s1.4 video structuring
The spatial position of the target T, the behavior category and the spatial characteristics of the target T obtained in the S1.3 are obtained to jointly construct descriptive information { location_T, spatial_feature_T, action_ID }. For video structuring, wherein the behavior category comprises a plurality of categories such as writing, drawing, walking, running, stretching limbs, basketball playing, football playing, dancing, swimming, bicycle riding, handshake, hugging, drinking, eating, mutual pushing and the like; the video structuring comprises target matching and generating a descriptive information sequence of the target, and the specific process is as follows:
(1) Target matching
The method comprises the steps of realizing target tracking between two adjacent processing units by calculating cosine distances between all target space features generated by the two adjacent processing units, namely, target tracking between the last 16 frames and the current 16 frames, wherein a cosine distance calculation formula between all target space features generated by the two adjacent processing units is as follows:
Figure BDA0002388621840000131
wherein x is i The ith spatial eigenvalue, y, of the x target generated for the last processing unit i An ith space feature value of the y target generated for the current processing unit, wherein θ is an included angle between two space feature vectors;
when theta is smaller than threshold_theta, the targets are considered to be matched, and the y targets are marked with the number num_x of the x targets;
when theta is more than or equal to threshold_theta, the targets are considered to be unmatched, and the y target is marked as a new index num_y;
where threshold_θ=0.33 is the angle Threshold.
(2) Generating a sequence of descriptive information of a target
Storing the processing results into a corresponding structured database according to a time sequence, wherein the processing results comprise time, monitoring camera numbers, target numbers, behavior type marks and spatial features, namely a descriptive information sequence of a target T, and the descriptive information sequence comprises the following specific forms:
info_Seq_T={time_1,camera_ID,num_T,action_ID,spatial_Feature_T;…;time_i,camera_ID,num_T,action_ID,spatial_Feature_T;}
wherein i εN+;
S2, collecting real-time positioning data of students;
the student wears a positioning device (positioning bracelet/positioner) with the positioning functions of GPS, beidou, wiFi and base station, positions the student at a fixed frequency, stores positioning results, positioning time and student ID information into a database, and generates a track sequence of the student S; the specific form of the trajectory sequence of student S is:
traj_S={location_Time_1,lat_1,lon_1,ID_S;…;location_Time_i,lat_i,lon_i,ID_S;}
wherein i εN+;
s3, space-time studying and judging analysis
According to the spatial characteristics of the target T, searching all the descriptive information sequences of the target T in the L time period, and arranging the descriptive information sequences according to the time sequence; constructing a space-time matching sequence by using the time distribution of the monitoring cameras where all similar targets of the target T are located in the L time period and the installation positions of the monitoring cameras, intercepting the track sequences of a plurality of students similar to each other in the L time period, and finally screening out the track-behavior sequence pairs of the students from the track sequences of the students, wherein the specific process is as follows:
s3.1 Global target retrieval
According to the spatial characteristics of the target T, performing global search on a plurality of cameras in a calibration area to generate a target by utilizing cosine distances, finding out all the describable information sequences of the target T in the L time period, and arranging according to a time sequence;
S3.2 space-time matching
(1) The time distribution of the monitoring cameras where all similar targets of the target T are located in the L time period and the installation positions of the monitoring cameras are utilized to construct a space-time matching sequence, and the space-time matching sequence is specifically formed in the following steps:
camera_Time_Seq_T_L={time_1,camera_m_lat,camera_m_lon;…;time_i,camera_n_lat,camera_n_lon;}
where i e (1, …, L), time_i is the i-th time point, camera_m_lat represents the mounting position-latitude of the monitoring camera m, and camera_m_lon represents the mounting position-longitude of the monitoring camera m;
(2) The track sequence of the student S in the L time period is cut out, and the specific form of the cut-out student track sequence is as follows:
traj_S_L={location_Time_1,lat_1,lon_1,ID_S;…;location_Time_i,lat_i,lon_i,ID_S;}
where i e (1, …, L), action_time_i represents the point in Time nearest to time_i, lat_i represents: the latitude of the student at the Time of action_time_i, lon_i represents: the calculation formula of the Euclidean distance between the longitude of the student at the Time of action_Time_i, camera_Time_Seq_T_L and traj_S_L is as follows:
Figure BDA0002388621840000151
if dist (S, T) is more than or equal to threshold_dist, the target T is not student S;
if dist (S, T) < threshold_dist, then it indicates that the target T is likely student S;
where threshold_dist=0.8 is the euclidean distance Threshold, s i Lat is the latitude of student S at the moment of action_Time_i, S i LOn is the longitude of student S at the Time of action_Time_i, t i Lat is the latitude of the target T at time_i, s i _lon is the longitude of the target T at time_i;
(3) Sequentially comparing a plurality of intercepted student track sequences, screening out student F with the minimum dist (F, T) value, and identifying the student F as a target T to generate a student track-behavior sequence pair, wherein the student track-behavior sequence pair has the specific form:
Figure BDA0002388621840000152
where i.epsilon.1, …, L.
S4, student behavior feature analysis
Drawing and generating a daily activity rule chart of the student by utilizing the student track-behavior sequence pair obtained in the S3 and combining with the time schedule of the student in the T time period, the school work and rest time schedule, the basic information of the student and the like, and carrying out data mining to find the activity hobbies of the student in the school period, so that a teacher is helped to effectively improve the teaching plan according to the characteristics of different students; by carrying out long-time rule statistics on the track-behavior sequence pairs of the students and combining with human supervision, a data prediction alarm function is constructed, whether abnormal behaviors exist or not is judged, such as long-time aggregation, loitering behaviors, track deviation and the like, real-time early warning is achieved, and accidents are prevented.
According to the student behavior characteristic intelligent analysis method based on the multidimensional data, student behavior and track original data are automatically collected through terminal equipment, multiple deep learning methods and video structuring technology are utilized to construct student behavior-track sequence pairs, the student behavior data network is constructed in combination with other text information, the data mining technology is utilized to conduct deep analysis and mining on the student behavior characteristics, school is helped to discover interests of students and activity rules during school, meanwhile, data prediction functions are utilized to achieve real-time early warning of abnormal behavior tracks, and early intervention is achieved. The invention has environment-friendly application and great market prospect.
The student behavior characteristic intelligent analysis method based on the multidimensional data can be suitable for performing behavior characteristic analysis on mental patients in mental hospitals, criminals in prisons and the like; the condition, the health recovery degree and the like of the mental patient are intelligently judged through daily performance of the mental patient, and further a further treatment scheme is made according to the analysis result of the mental patient; or by collecting and analyzing the daily behaviors of the criminals in the prison, the improvement degree of the criminals is determined, so that whether the criminals are suitable for criminal reduction and other measures is judged.

Claims (10)

1. A student behavior characteristic intelligent analysis method based on multidimensional data is characterized in that: the specific process steps are as follows:
s1, video structuring process
Taking continuous 16 frames as a processing unit, defaulting a video frame in each processing unit to be 3 channels, constructing a high-low frequency 3D residual neutral network model for video structuring, storing text information and video snapshot data into corresponding structuring databases, wherein the high-low frequency 3D residual neutral network model comprises a low frequency 3D residual network structure and a high frequency 3D residual network structure, the low frequency 3D residual network structure performs personnel structuring processing to extract target features, and the high frequency 3D residual network structure performs behavior structuring processing to extract behavior features; connecting the target features with the behavior features, and processing to obtain the spatial position and behavior category of the target T;
The sampling interval of the low-frequency 3D residual error network structure on the video frame is set to be inv_l=16, and the low-frequency 3D residual error network structure is used for extracting the space and semantic information of a target; setting the sampling interval of a high-frequency 3D residual network structure to be inv_h=inv_l/alpha of a video frame, wherein alpha=8, and the number of convolution kernels is beta times that of the convolution kernels in a low-frequency 3D residual network, wherein beta=1/8;
s2, collecting real-time positioning data of students;
the student wears positioning equipment with the functions of GPS, beidou, wiFi and base station positioning, positions the student at a fixed frequency, stores the positioning result, positioning time and student ID information into a database, and generates a track sequence of the student;
s3, space-time studying and judging analysis
Extracting the spatial characteristics of the target T by using the spatial position of the target T, searching all similar targets of the target T in the L time period according to similarity matching, arranging the spatial characteristics, the behavior types and the installation positions of the monitoring cameras of each similar target according to time sequence, and constructing a descriptive information sequence of the target T; intercepting track sequences of a plurality of students in an L time period, combining a descriptive information sequence of a target T, and screening out a track sequence of the student S with the highest matching degree with the target T from the track sequences of the plurality of students by utilizing track matching, namely, recognizing the target T as the student S, thereby constructing a track-behavior sequence pair of the student S;
S4, student behavior feature analysis
Drawing and generating a daily activity rule chart of the student by utilizing the student track-behavior sequence pair obtained in the step S3 and combining with the school time schedule, school work and rest schedule, basic student information and the like of the student in the L time period, and carrying out data mining to find out activity hobbies of the student in the school period, so as to help a teacher to effectively improve a teaching plan according to the characteristics of different students; by carrying out long-time rule statistics on the track-behavior sequence pairs of the students and combining with human supervision, a data prediction alarm function is constructed, whether the students have abnormal behaviors or not is judged, the abnormal behaviors comprise long-time aggregation, loitering behaviors and track deviation, real-time early warning is achieved, and accidents are prevented.
2. The multi-dimensional data based student behavior feature intelligent analysis method of claim 1, wherein: the specific process of the video structuring processing in the step S1 is as follows:
s1.1 target feature extraction
Extracting target features by using a low-frequency 3D residual error network structure;
s1.2 behavioral characteristics extraction
Performing behavior feature extraction by using a high-frequency 3D residual error network structure, and calculating the extracted behavior feature size to be {8,256,7,7};
s1.3 video classification
Firstly, performing size conversion on behavior features through matrix operation, secondly, processing target features and behavior features after size conversion through global mean value pooling operation, thirdly, transversely connecting the processed target features and behavior features, and finally, inputting the target features and behavior features into a full-connection layer to finally obtain the spatial position and behavior category of the target;
s1.4 video structuring
The spatial position of the target T, the behavior category and the spatial characteristics of the target T obtained in the S1.3 are obtained to jointly construct descriptive information { location_T, spatial_feature_T, action_ID }' of the target T, wherein the behavior category comprises a plurality of categories of writing, drawing, walking, running, stretching limbs, playing basketball, playing football, dancing, swimming, riding a bicycle, handshaking, hugging, drinking water, eating things and mutual pushing; the video structuring includes target matching and generating a sequence of descriptive information of the target.
3. The multi-dimensional data based student behavior feature intelligent analysis method of claim 2, wherein: the specific process of the S1.3 step video classification is as follows:
(1) Behavioral feature size conversion
S1.2, the behavior feature size obtained through calculation is {8,256,7,7}, the behavior feature size is converted into {1,8× 256,7,7} = {1,2048,7,7}, namely, alpha feature graphs are converted into the number of convolution kernels of a single feature graph, and the feature size conversion is completed;
(2) Feature global averaging pooling
Processing the target features and the behavior features after the size conversion by adopting global averaging pooling operation, wherein the pooled kernel size is {1, 7}, and the pooled feature sizes are {1,1,1,2048};
(3) Feature connection
Transversely connecting the target characteristics and the behavior characteristics processed in the two steps to obtain a connected characteristic length of 4096;
(4) Full connection operation
Inputting the features obtained in the previous step into a full-connection layer to finally obtain the spatial position and behavior category of the target;
and pre-trained model parameters on a kinetic-400 data set are used in the operation process of the steps S1.1, S1.2 and S1.3.
4. A multi-dimensional data based student behavioral characteristics intelligent analysis method according to claim 2 or claim 3, wherein: the specific process of the S1.4 step video structuring is as follows:
(1) Target matching
The method comprises the steps of realizing target tracking between two adjacent processing units by calculating cosine distances between all target space features generated by the two adjacent processing units, namely, target tracking between the last 16 frames and the current 16 frames, wherein a cosine distance calculation formula between all target space features generated by the two adjacent processing units is as follows:
Figure FDA0004155115290000031
Wherein x is i The ith spatial eigenvalue, y, of the x target generated for the last processing unit i The ith spatial feature value of the y-object generated for the current processing unit, θ is twoIncluded angles between the individual spatial feature vectors;
when theta is smaller than threshold_theta, the targets are considered to be matched, and the y targets are marked with the number num_x of the x targets;
when theta is more than or equal to threshold_theta, the targets are considered to be unmatched, and the y target is marked as a new index num_y;
wherein threshold_θ=0.33 is the angle Threshold;
(2) Generating a sequence of descriptive information of a target
Storing the processing results into a corresponding structured database according to a time sequence, wherein the processing results comprise time, monitoring camera numbers, target numbers, behavior type marks and spatial features, namely a descriptive information sequence of a target T, and the descriptive information sequence comprises the following specific forms:
info_Seq_T={time_1,camera_ID,num_T,action_ID,spatial_Feature_T;…;
time_i,camera_ID,num_T,action_ID,spatial_Feature_T;}
where i e n+.
5. The intelligent analysis method for student behavior characteristics based on multidimensional data according to claim 4, wherein: the specific form of the track sequence of the student S in the step S2 is as follows:
traj_s= { location_time_1, lat_1, lon_1, id_s; …; location_time_i, lat_i, lon_i, id_s; where i e n+.
6. The intelligent analysis method for student behavior characteristics based on multidimensional data according to claim 5, wherein: the specific process of the space-time analysis in the step S3 is as follows:
S3.1 Global target retrieval
According to the spatial characteristics of the target T, performing global search on a plurality of cameras in a calibration area to generate a target by utilizing cosine distances, finding out all the describable information sequences of the target T in the L time period, and arranging according to a time sequence;
s3.2 space-time matching
(1) The time distribution of the monitoring cameras where all similar targets of the target T are located in the L time period and the installation positions of the monitoring cameras are utilized to construct a space-time matching sequence, and the space-time matching sequence is specifically formed in the following steps:
camera_Time_Seq_T_L={time_1,camera_m_lat,camera_m_lon;…;
time_i,camera_n_lat,camera_n_lon;}
where i e (1, …, L), time_i is the i-th time point, camera_m_lat represents the mounting position-latitude of the monitoring camera m, and camera_m_lon represents the mounting position-longitude of the monitoring camera m;
(2) The track sequence of the student S in the L time period is cut out, and the specific form of the cut-out student track sequence is as follows:
traj_s_l= { location_time_1, lat_1, lon_1, id_s; …; location_time_i, lat_i, lon_i, id_s; where i e (1, …, L), action_time_i represents the point in Time nearest to time_i, lat_i represents: the latitude of the student at the Time of action_time_i, lon_i represents: the calculation formula of the Euclidean distance between the longitude of the student at the Time of action_Time_i, camera_Time_Seq_T_L and traj_S_L is as follows:
Figure FDA0004155115290000041
If dist (S, T) is more than or equal to threshold_dist, the target T is not student S;
if dist (S, T) < threshold_dist, then it indicates that the target T is likely student S;
where threshold_dist=0.8 is the euclidean distance Threshold, s i Lat is the latitude of student S at the moment of action_Time_i, S i LOn is the longitude of student S at the Time of action_Time_i, t i Lat is the latitude of the target T at time_i, s i _lon is the longitude of the target T at time_i;
(3) Sequentially comparing a plurality of intercepted student track sequences, screening out student F with the minimum dist (F, T) value, and identifying the student F as a target T to generate a student track-behavior sequence pair, wherein the student track-behavior sequence pair has the specific form:
Figure FDA0004155115290000051
where i.epsilon.1, …, L.
7. The intelligent analysis method for student behavior characteristics based on multidimensional data according to claim 6, wherein: the low frequency 3D residual network structure comprises: an Input layer Input, a convolutional layer Conv1, a pooling layer Pool1, and four residual blocks Res2, res3, res4, res5; the Input layer Input samples the accessed video stream, the sampling interval is 16, namely, one frame is selected as Input every 16 frames, the size of the Input video frame is 224 multiplied by 224, and the Input video frame is defaulted to be three channels; the output end of the Input layer Input is connected with a convolution layer Conv1, and the convolution layer Conv1 is used for extracting high-resolution features; the output end of the convolution layer Conv1 is connected with a pooling layer Pool1; the pooling layer Pool1 adopts a maximum pooling method to downsample the feature map, so that the feature quantity is reduced, and the excessive calculated amount is prevented; the output end of the pooling layer Pool1 is connected with residual blocks, and four residual blocks Res2, res3, res4 and Res5 are sequentially connected; the target characteristics are finally obtained through the Res2 to Res5 residual blocks;
Each of the four residual blocks Res2-Res5 of the low frequency 3D residual network structure consists of a main path and a shortcut: the main path is composed of N multiplied by 3 convolution layers, wherein N represents the repetition times of 3 convolution layers in residual blocks, and the number and the size of convolution kernels of each convolution layer in different residual blocks are also different; the shortcuts are one convolution layer, the shortcuts directly transmit information to the deep layer of the network, the sizes of convolution kernels in different residual blocks are all 1 multiplied by 1, and the convolution kernels are different in size; the main paths of the two paths together construct a very deep network structure through Res2 to Res5, and the shortcuts in each residual block skip one layer or multiple layers and enter the deep network layer.
8. The multi-dimensional data based student behavior feature intelligent analysis method of claim 7, wherein: specific parameters of each layer in the low-frequency 3D residual error network are as follows: wherein the Input layer Input: the output size is {1,3,224,224}, the sampling interval is 16, and the video frame size is {3,224,224}; convolutional layer Conv1: the output size is {1,64,112,112}, the number of convolution kernels is 64, the step size is {1, 2}, and the size is {1,3,7,7}; pool layer Pool1: the output size is {1,64,56,56}, the pooling kernel step size is {1, 2}, the size is {1, 3}, and the mode is the maximum value;
Residual block Res2: the output size is 1,256,56,56,
Figure FDA0004155115290000052
residual block Res3: the output size is 1,512,28,28,
Figure FDA0004155115290000061
residual block Res4: the output size is 1,1024,14,14,
Figure FDA0004155115290000062
residual block Res5: the output size is 1,2048,7,7,
Figure FDA0004155115290000063
/>
wherein the factor after multiplication in the residual block parameter represents the number of repetitions of the convolution operation.
9. The multi-dimensional data based student behavior feature intelligent analysis method of claim 8, wherein: the high-frequency 3D residual error network comprises an Input layer Input, a convolution layer Conv1, a pooling layer Pool1 and four residual error blocks Res2, res3, res4 and Res5; the Input layer Input samples the accessed video stream, the sampling interval is 2, namely, one frame is selected as Input every 2 frames, the size of the Input video frame is 224 multiplied by 224, and the Input video frame is defaulted to be three channels; the output end of the Input layer Input is connected with a convolution layer Conv1, and the convolution layer Conv1 is used for extracting high-resolution features; the output end of the convolution layer Conv1 is connected with a pooling layer Pool1; the pooling layer Pool1 adopts a maximum pooling method to downsample the feature map, so that the feature quantity is reduced, and the excessive calculated amount is prevented; the output end of the pooling layer Pool1 is connected with residual blocks, and four residual blocks Res2, res3, res4 and Res5 are sequentially connected; finally outputting to obtain behavior characteristics through Res2 to Res5 residual blocks;
Each of the four residual blocks Res2-Res5 of the high frequency 3D residual network consists of a main path and a shortcut: the main path is composed of N multiplied by 3 convolution layers, wherein N represents the repetition times of 3 convolution layers in residual blocks, and the number and the size of convolution kernels of each convolution layer in different residual blocks are also different; the shortcuts are one convolution layer, the shortcuts directly transmit information to the deep layer of the network, the sizes of convolution kernels in different residual blocks are all 1 multiplied by 1, and the convolution kernels are different in size; the main paths of the two paths together construct a very deep network structure through Res2 to Res5, and the shortcuts in each residual block skip one layer or multiple layers and enter the deep network layer.
10. The multi-dimensional data based student behavior feature intelligent analysis method of claim 9, wherein: specific parameters of each layer in the high-frequency 3D residual error network are as follows: wherein the Input layer Input: the output size is {8,3,224,224}, the sampling interval is 2, and the image size is {3,224,224}; convolutional layer Conv1: the output size is {8,8,112,112}, the number of convolution kernels is 8, the step size is {1, 2}, and the size is {5,3,7,7}; pool layer Pool1: the output size is {8,8,56,56}, the pooling kernel step size is {1, 2}, the size is {1, 3}, and the mode is the maximum value {1,64,56,56};
Residual block Res2: the output size is 8,32,56,56,
Figure FDA0004155115290000071
residual block Res3: the output size is 8,64,28,28,
Figure FDA0004155115290000072
residual block Res4: the output size is 8,128,14,14,
Figure FDA0004155115290000073
residual block Res5: the output size is 8,256,7,7,
Figure FDA0004155115290000074
wherein the factor after multiplication in the residual block parameter represents the number of repetitions of the convolution operation.
CN202010106436.4A 2020-02-21 2020-02-21 Student behavior feature intelligent analysis method based on multidimensional data Active CN111325153B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010106436.4A CN111325153B (en) 2020-02-21 2020-02-21 Student behavior feature intelligent analysis method based on multidimensional data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010106436.4A CN111325153B (en) 2020-02-21 2020-02-21 Student behavior feature intelligent analysis method based on multidimensional data

Publications (2)

Publication Number Publication Date
CN111325153A CN111325153A (en) 2020-06-23
CN111325153B true CN111325153B (en) 2023-05-12

Family

ID=71173053

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010106436.4A Active CN111325153B (en) 2020-02-21 2020-02-21 Student behavior feature intelligent analysis method based on multidimensional data

Country Status (1)

Country Link
CN (1) CN111325153B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112328077B (en) * 2020-11-05 2021-08-24 重庆第二师范学院 College student behavior analysis system, method, device and medium
CN114818991B (en) * 2022-06-22 2022-09-27 西南石油大学 Running behavior identification method based on convolutional neural network and acceleration sensor
CN116611022B (en) * 2023-04-21 2024-04-26 深圳乐行智慧产业有限公司 Intelligent campus education big data fusion method and platform
CN116602664B (en) * 2023-07-17 2023-09-22 青岛市胶州中心医院 Comprehensive diagnosis and treatment nursing system for neurosurgery patients

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108062349A (en) * 2017-10-31 2018-05-22 深圳大学 Video frequency monitoring method and system based on video structural data and deep learning
CN108171630A (en) * 2017-12-29 2018-06-15 三盟科技股份有限公司 Discovery method and system based on campus big data environment Students ' action trail
CN108898560A (en) * 2018-06-21 2018-11-27 四川大学 Rock core CT image super-resolution rebuilding method based on Three dimensional convolution neural network
CN109636062A (en) * 2018-12-25 2019-04-16 湖北工业大学 A kind of students ' behavior analysis method and system based on big data analysis
CN109636688A (en) * 2018-12-11 2019-04-16 武汉文都创新教育研究院(有限合伙) A kind of students ' behavior analysis system based on big data
CN109684514A (en) * 2018-12-11 2019-04-26 武汉文都创新教育研究院(有限合伙) Students ' behavior positioning system and method based on track data

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108062349A (en) * 2017-10-31 2018-05-22 深圳大学 Video frequency monitoring method and system based on video structural data and deep learning
CN108171630A (en) * 2017-12-29 2018-06-15 三盟科技股份有限公司 Discovery method and system based on campus big data environment Students ' action trail
CN108898560A (en) * 2018-06-21 2018-11-27 四川大学 Rock core CT image super-resolution rebuilding method based on Three dimensional convolution neural network
CN109636688A (en) * 2018-12-11 2019-04-16 武汉文都创新教育研究院(有限合伙) A kind of students ' behavior analysis system based on big data
CN109684514A (en) * 2018-12-11 2019-04-26 武汉文都创新教育研究院(有限合伙) Students ' behavior positioning system and method based on track data
CN109636062A (en) * 2018-12-25 2019-04-16 湖北工业大学 A kind of students ' behavior analysis method and system based on big data analysis

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
WEI SONG等.A Novel Violent Video Detection Scheme Based on Modified 3D Convolutional Neural Networks.《IEEE Access》.2019,39172-39179. *
高轶 ; 王鹏 ; .一种基于数据挖掘的目标行为规律分析算法.无线电工程.2018,(第12期),全文. *

Also Published As

Publication number Publication date
CN111325153A (en) 2020-06-23

Similar Documents

Publication Publication Date Title
CN111325153B (en) Student behavior feature intelligent analysis method based on multidimensional data
CN107133569B (en) Monitoring video multi-granularity labeling method based on generalized multi-label learning
CN108764085B (en) Crowd counting method based on generation of confrontation network
CN112418117A (en) Small target detection method based on unmanned aerial vehicle image
CN106408030A (en) SAR image classification method based on middle lamella semantic attribute and convolution neural network
CN104268140B (en) Image search method based on weight self study hypergraph and multivariate information fusion
CN112862849B (en) Image segmentation and full convolution neural network-based field rice ear counting method
Li et al. Depthwise nonlocal module for fast salient object detection using a single thread
CN111666823B (en) Pedestrian re-identification method based on individual walking motion space-time law collaborative identification
CN106649663A (en) Video copy detection method based on compact video representation
Ratre et al. Tucker visual search-based hybrid tracking model and Fractional Kohonen Self-Organizing Map for anomaly localization and detection in surveillance videos
Wang et al. Learning discriminative features for fast frame-based action recognition
Wu et al. A novel detection framework for detecting abnormal human behavior
CN109117774A (en) A kind of multi-angle video method for detecting abnormality based on sparse coding
CN104537392B (en) A kind of method for checking object based on the semantic part study of identification
Mou et al. PAEDID: P atch A utoencoder-based D eep I mage D ecomposition for pixel-level defective region segmentation
Li et al. Video is graph: Structured graph module for video action recognition
CN110738167B (en) Pedestrian identification method based on multi-domain spatial attribute correlation analysis
CN117292324A (en) Crowd density estimation method and system
Qiao et al. Rapid trajectory clustering based on neighbor spatial analysis
Yu et al. Drill-Rep: Repetition counting for automatic shot hole depth recognition based on combined deep learning-based model
Hu et al. Multi-level trajectory learning for traffic behavior detection and analysis
CN112015937B (en) Picture geographic positioning method and system
Arshad et al. Anomalous situations recognition in surveillance images using deep learning
CN112699954A (en) Closed-loop detection method based on deep learning and bag-of-words model

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant