CN111325153B

CN111325153B - Student behavior feature intelligent analysis method based on multidimensional data

Info

Publication number: CN111325153B
Application number: CN202010106436.4A
Authority: CN
Inventors: 纪刚; 周亚敏; 周萌萌; 商胜楠; 周粉粉
Original assignee: Qingdao Lianhe Chuangzhi Technology Co ltd
Current assignee: Qingdao Lianhe Chuangzhi Technology Co ltd
Priority date: 2020-02-21
Filing date: 2020-02-21
Publication date: 2023-05-12
Anticipated expiration: 2040-02-21
Also published as: CN111325153A

Abstract

The invention belongs to the technical field of multidimensional data intelligent processing, and relates to an intelligent analysis method for student behavior characteristics based on multidimensional data; s1, constructing a high-low frequency 3D residual neural network model, and performing video structuring processing to obtain the spatial position and behavior category of a target; s2, storing the information of the acquired students into a database to generate a track sequence of the students; s3, constructing a descriptive information sequence of the target, screening a track sequence of a student with highest matching degree with the target, and constructing a track-behavior sequence pair of the student; s4, generating a daily activity rule chart of the student by utilizing the student track-behavior sequence pair, and carrying out long-time rule statistics to judge whether abnormal behaviors exist; the method can combine the data of multiple dimensions such as student behavior data, student positioning data and the like to perform intelligent comprehensive analysis, help schools discover the behavior characteristics of each student, discover abnormal behaviors of the students in time, and realize rapid early warning.

Description

Student behavior feature intelligent analysis method based on multidimensional data

Technical field:

the invention belongs to the technical field of multidimensional data intelligent processing, relates to a multidimensional data intelligent analysis method, and in particular relates to a multidimensional data-based student behavior characteristic intelligent analysis method.

The background technology is as follows:

schools not only teach students cultural knowledge, but also are burdened with teaching students to develop good behavioral habits. The method can accurately analyze the behavior characteristics of the students, and is helpful for improving the learning ability of the students, improving the teaching means of teachers, and improving the good interaction relationship between the students and the teachers. However, students are huge in number, activities during a school cannot be fully supervised, and how to supervise behaviors of the students by using effective means is a problem to be solved.

The application of the current high and new technologies such as big data, the Internet of things, artificial intelligence and the like in the education field is gradually expanded, and the student behavior analysis is also related to a certain degree. In the prior art, chinese patent with publication number of CN108268854A discloses a teaching auxiliary big data intelligent analysis method based on feature recognition, which comprises the following steps: s1, acquiring a student static image; s2, extracting the directional gradient characteristics of the image; s3, machine learning; s4, sliding a window; s5, selecting the action with the highest probability under action classification, and identifying the action; s6, carrying out face recognition on the action sender in the S5. The Chinese patent with publication number of CN108242035A discloses a campus security monitoring method based on big data, which is characterized by comprising the following steps: acquiring campus data; analyzing and processing campus data to obtain a student behavior model; and comparing and analyzing the student behavior model with the reference model and outputting an evaluation result. The invention realizes the safety monitoring of students in school, and the monitoring content comprises: the abnormal conditions are timely found through data analysis, so that the campus safety is improved. The Chinese patent with publication number of CN109145818A discloses a flow statistics method, device and system, relates to the technical field of big data, is applied to a server side, and comprises the steps of receiving target feature information of an interested feature to be subjected to flow statistics; searching a feature map with feature description information matched with target feature information in a preset storage area, wherein the feature description information in the preset storage area is obtained by carrying out structural processing on the feature map in an optimal image sent by a terminal; and counting the number of feature graphs corresponding to the searched feature description information to obtain a flow statistical result of the interested feature, so as to realize accurate statistics of the feature graphs with the same target feature information and further realize flow statistics of the interested feature. And the related statistical and analysis methods are lacking for the comprehensive behavior feature analysis of the student during the course and the course behavior recognition and the activity track.

The invention comprises the following steps:

the invention aims to overcome the defects of the existing foreground image processing, and designs and provides an intelligent analysis method for the behavior characteristics of students based on multidimensional data, aiming at the defects that the students are difficult to monitor the behavior of the students at the present time due to the large number of students at the present time, the behavior characteristics of the students are difficult to accurately analyze, the abnormal behaviors of the students cannot be found in time and the like.

In order to achieve the above purpose, the invention relates to a student behavior characteristic intelligent analysis method based on multidimensional data, which comprises the following specific process steps:

s1, video structuring process

Taking continuous 16 frames as a processing unit, defaulting a video frame in each processing unit to be 3 channels, constructing a high-low frequency 3D residual neutral network model for video structuring, storing text information and video snapshot data into corresponding structuring databases, wherein the high-low frequency 3D residual neutral network model comprises a low-frequency 3D residual neutral network and a high-frequency 3D residual neutral network, the low-frequency 3D residual neutral network performs personnel structuring processing to extract target features, and the high-frequency 3D residual neutral network performs behavior structuring processing to extract behavior features; connecting the target features with the behavior features, and processing to obtain the spatial position and behavior category of the target T;

S2, collecting real-time positioning data of students;

the student wears positioning equipment with the functions of GPS, beidou, wiFi and base station positioning, positions the student at a fixed frequency, stores the positioning result, positioning time and student ID information into a database, and generates a track sequence of the student;

s3, space-time studying and judging analysis

Extracting the spatial characteristics of the target T by using the spatial position of the target T, searching all similar targets of the target T in the L time period according to similarity matching, arranging the spatial characteristics, the behavior types and the installation positions of the monitoring cameras of each similar target according to time sequence, and constructing a descriptive information sequence of the target T; intercepting track sequences of a plurality of students in the L time period, combining the descriptive information sequences of the target T, and screening out the track sequence of the student S with the highest matching degree with the target T from the track sequences of the plurality of students by utilizing track matching, namely, identifying the target T as the student S, thereby constructing a track-behavior sequence pair of the student S;

s4, student behavior feature analysis

Drawing and generating a daily activity rule chart of the student by utilizing the student track-behavior sequence pair obtained in the step S3 and combining with the school time schedule, school work and rest schedule, basic student information and the like of the student in the L time period, and carrying out data mining to find out activity hobbies of the student in the school period, so as to help a teacher to effectively improve a teaching plan according to the characteristics of different students; by carrying out long-time rule statistics on the track-behavior sequence pairs of the students and combining with human supervision, a data prediction alarm function is constructed, whether the students have abnormal behaviors, such as long-time aggregation, loitering behaviors, track deviation and the like, real-time early warning is achieved, and accidents are prevented.

The specific process of the video structuring processing in the step S1 is as follows:

s1.1 target feature extraction

Performing target feature extraction by using a low-frequency 3D residual error network structure, wherein the sampling interval of the low-frequency 3D residual error network structure on a video frame is set to be inv_l=16, and the low-frequency 3D residual error network structure is used for extracting space and semantic information of a target;

s1.2 behavioral characteristics extraction

Performing behavior feature extraction by using a high-frequency 3D residual error network structure, and calculating the extracted behavior feature size to be {8,256,7,7}; setting the sampling interval of a high-frequency 3D residual network structure to be inv_h=inv_l/alpha of a video frame, wherein alpha=8, and the number of convolution kernels is beta times that of the convolution kernels in a low-frequency 3D residual network, wherein beta=1/8;

s1.3 video classification

Firstly, performing size conversion on behavior features through matrix operation, secondly, processing target features and behavior features after size conversion through global mean value pooling operation, thirdly, transversely connecting the processed target features and behavior features, and finally, inputting the target features and behavior features into a full-connection layer to finally obtain the spatial position and behavior category of the target; the specific process is as follows:

(1) Behavioral feature size conversion

S1.2, the behavior feature size obtained through calculation is {8,256,7,7}, the behavior feature size is converted into {1,8× 256,7,7} = {1,2048,7,7}, namely, alpha feature graphs are converted into the number of convolution kernels of a single feature graph, and the feature size conversion is completed;

(2) Feature global averaging pooling

Processing the target features and the behavior features after the size conversion by adopting global averaging pooling operation, wherein the pooled kernel size is {1, 7}, and the pooled feature sizes are {1,1,1,2048};

(3) Feature connection

Transversely connecting the target characteristics and the behavior characteristics processed in the two steps to obtain a connected characteristic length of 4096;

(4) Full connection operation

And inputting the characteristics obtained in the previous step into a full-connection layer to finally obtain the spatial position and behavior category of the target.

Model parameters pre-trained on a kinetic-400 data set are used in the operation process of the steps S1.1, S1.2 and S1.3;

s1.4 video structuring

The spatial position of the target T, the behavior category and the spatial characteristics of the target T obtained in the S1.3 are obtained to jointly construct descriptive information { location_T, spatial_feature_T, action_ID }' of the target T, wherein the behavior category comprises a plurality of categories of writing, drawing, walking, running, stretching limbs, playing basketball, playing football, dancing, swimming, riding a bicycle, handshaking, hugging, drinking water, eating things and mutual pushing; the video structuring comprises target matching and generating a descriptive information sequence of the target, and the specific process is as follows:

(1) Target matching

The method comprises the steps of realizing target tracking between two adjacent processing units by calculating cosine distances between all target space features generated by the two adjacent processing units, namely, target tracking between the last 16 frames and the current 16 frames, wherein a cosine distance calculation formula between all target space features generated by the two adjacent processing units is as follows:

wherein x is _i The ith spatial eigenvalue, y, of the x target generated for the last processing unit _i An ith space feature value of the y target generated for the current processing unit, wherein θ is an included angle between two space feature vectors;

when theta is smaller than threshold_theta, the targets are considered to be matched, and the y targets are marked with the number num_x of the x targets;

when theta is more than or equal to threshold_theta, the targets are considered to be unmatched, and the y target is marked as a new index num_y;

wherein threshold_θ=0.33 is the angle Threshold;

(2) Generating a sequence of descriptive information of a target

Storing the processing results into a corresponding structured database according to a time sequence, wherein the processing results comprise time, monitoring camera numbers, target numbers, behavior type marks and spatial features, namely a descriptive information sequence of a target T, and the descriptive information sequence comprises the following specific forms:

info_Seq_T＝{time_1,camera_ID,num_T,action_ID,spatial_Feature_T；…；time_i,camera_ID,num_T,action_ID,spatial_Feature_T；}

where i e n+.

The specific form of the track sequence of the student S in the step S2 is as follows:

traj_S＝{location_Time_1,lat_1,lon_1,ID_S；…；location_Time_i,lat_i,lon_i,ID_S；}

wherein i εN+;

the specific process of the space-time analysis in the step S3 is as follows:

s3.1 Global target retrieval

According to the spatial characteristics of the target T, performing global search on a plurality of cameras in a calibration area to generate a target by utilizing cosine distances, finding out all the describable information sequences of the target T in the L time period, and arranging according to a time sequence;

s3.2 space-time matching

(1) The time distribution of the monitoring cameras where all similar targets of the target T are located in the L time period and the installation positions of the monitoring cameras are utilized to construct a space-time matching sequence, and the space-time matching sequence is specifically formed in the following steps:

camera_Time_Seq_T_L＝{time_1,camera_m_lat,camera_m_lon；…；time_i,camera_n_lat,camera_n_lon；}

where i e (1, …, L), time_i is the i-th time point, camera_m_lat represents the mounting position-latitude of the monitoring camera m, and camera_m_lon represents the mounting position-longitude of the monitoring camera m;

(2) The track sequence of the student S in the L time period is cut out, and the specific form of the cut-out student track sequence is as follows:

traj_S_L＝{location_Time_1,lat_1,lon_1,ID_S；…；location_Time_i,lat_i,lon_i,ID_S；}

where i e (1, …, L), action_time_i represents the point in Time nearest to time_i, lat_i represents: the latitude of the student at the Time of action_time_i, lon_i represents: the calculation formula of the Euclidean distance between the longitude of the student at the Time of action_Time_i, camera_Time_Seq_T_L and traj_S_L is as follows:

If dist (S, T) is more than or equal to threshold_dist, the target T is not student S;

if dist (S, T) < threshold_dist, then it indicates that the target T is likely student S;

wherein the method comprises the steps ofThreshold_dist=0.8 is the euclidean distance Threshold, s _i Lat is the latitude of student S at the moment of action_Time_i, S _i LOn is the longitude of student S at the Time of action_Time_i, t _i Lat is the latitude of the target T at time_i, s _i _lon is the longitude of the target T at time_i;

(3) Sequentially comparing a plurality of intercepted student track sequences, screening out student F with the minimum dist (F, T) value, and identifying the student F as a target T to generate a student track-behavior sequence pair, wherein the student track-behavior sequence pair has the specific form:

where i.epsilon.1, …, L.

The low-frequency 3D residual error network structure comprises the following components: an Input layer Input, a convolutional layer Conv1, a pooling layer Pool1, and four residual blocks Res2, res3, res4, res5; the Input layer Input samples the accessed video stream, the sampling interval is 16, namely, one frame is selected as Input every 16 frames, the size of the Input video frame is 224 multiplied by 224, and the Input video frame is defaulted to be three channels; the output end of the Input layer Input is connected with a convolution layer Conv1, and the convolution layer Conv1 is used for extracting high-resolution features; the output end of the convolution layer Conv1 is connected with a pooling layer Pool1; the pooling layer Pool1 adopts a maximum pooling method to downsample the feature map, so that the feature quantity is reduced, and the excessive calculated amount is prevented; the output end of the pooling layer Pool1 is connected with residual blocks, and four residual blocks Res2, res3, res4 and Res5 are sequentially connected; and finally outputting to obtain the target feature through Res2 to Res5 residual blocks.

Each residual block of four residual blocks Res2-Res5 of the low-frequency 3D residual network structure of the present invention is composed of a main path and a shortcut: the main path is composed of N multiplied by 3 convolution layers, wherein N represents the repetition times of 3 convolution layers in residual blocks, and the number and the size of convolution kernels of each convolution layer in different residual blocks are also different; the shortcuts are one convolution layer, the shortcuts directly transmit information to the deep layer of the network, the sizes of convolution kernels in different residual blocks are all 1 multiplied by 1, and the convolution kernels are different in size; through Res2 to Res5, a very deep network structure is built together by the main paths, and shortcuts in each residual block skip one layer or multiple layers and enter the deep network layer, so that the problems of gradient disappearance and gradient explosion are solved, and good operation performance of the network is ensured while the deeper network is utilized to extract the characteristics.

The specific parameters of each layer in the low-frequency 3D residual error network are as follows: wherein the Input layer Input: the output size is {1,3,224,224}, the sampling interval is 16, and the video frame size is {3,224,224}; convolutional layer Conv1: the output size is {1,64,112,112}, the number of convolution kernels is 64, the step size is {1, 2}, and the size is {1,3,7,7}; pool layer Pool1: the output size is {1,64,56,56}, the pooling kernel step size is {1, 2}, the size is {1, 3}, and the mode is the maximum value;

Residual block Res2: the output size is 1,256,56,56,

residual block Res3: the output size is 1,512,28,28,

residual block Res4: the output size is 1,1024,14,14,

residual block Res5: the output size is 1,2048,7,7,

wherein the above numbers or symbols have the meaning: the video frame size "{3,224,224}" corresponds to { video frame channel number, video frame width, video frame height }; the convolution kernel size "{3,1,7,7}" corresponds to { [ number of video frame channels, ] video frame depth, video frame width, video frame height }, the value in brackets may not exist; the convolution kernel step size "{1, 2}" corresponds to { time step size, horizontal space step size, vertical space step size }; the pooling kernel size "{1, 3}" corresponds to { video frame depth, video frame width, video frame height }; the pooling kernel step size "{1, 2}" corresponds to { time step size, horizontal space step size, vertical space step size }; the output size (e.g., "{1,3,224,224 }") corresponds to { feature map depth, number of convolution kernels, feature map width, feature map height }, and the factor (e.g., "×6") after multiplication in the residual block parameters represents the number of repetitions of the convolution operation;

the high-frequency 3D residual error network comprises an Input layer Input, a convolution layer Conv1, a pooling layer Pool1 and four residual error blocks Res2, res3, res4 and Res5; the Input layer Input samples the accessed video stream, the sampling interval is 2, namely, one frame is selected as Input every 2 frames, the size of the Input video frame is 224 multiplied by 224, and the Input video frame is defaulted to be three channels; the output end of the Input layer Input is connected with a convolution layer Conv1, and the convolution layer Conv1 is used for extracting high-resolution features; the output end of the convolution layer Conv1 is connected with a pooling layer Pool1; the pooling layer Pool1 adopts a maximum pooling method to downsample the feature map, so that the feature quantity is reduced, and the excessive calculated amount is prevented; the output end of the pooling layer Pool1 is connected with residual blocks, and four residual blocks Res2, res3, res4 and Res5 are sequentially connected; and finally outputting to obtain the behavior characteristics through Res2 to Res5 residual blocks.

Each residual block of four residual blocks Res2-Res5 of the high-frequency 3D residual network of the present invention is composed of a main path and a shortcut: the main path is composed of N multiplied by 3 convolution layers, wherein N represents the repetition times of 3 convolution layers in residual blocks, and the number and the size of convolution kernels of each convolution layer in different residual blocks are also different; the shortcuts are one convolution layer, the shortcuts directly transmit information to the deep layer of the network, the sizes of convolution kernels in different residual blocks are all 1 multiplied by 1, and the convolution kernels are different in size; through Res2 to Res5, a very deep network structure is built together by the main paths, and shortcuts in each residual block skip one layer or multiple layers and enter the deep network layer, so that the problems of gradient disappearance and gradient explosion are solved, and good operation performance of the network is ensured while the deeper network is utilized to extract the characteristics.

The specific parameters of each layer in the high-frequency 3D residual error network are as follows: wherein the Input layer Input: the output size is {8,3,224,224}, the sampling interval is 2, and the image size is {3,224,224}; convolutional layer Conv1: the output size is {8,8,112,112}, the number of convolution kernels is 8, the step size is {1, 2}, and the size is {5,3,7,7}; pool layer Pool1: the output size is {8,8,56,56}, the pooling kernel step size is {1, 2}, the size is {1, 3}, and the mode is the maximum value {1,64,56,56};

Residual block Res2: the output size is 8,32,56,56,

residual block Res3: the output size is 8,64,28,28,

residual block Res4: the output size is 8,128,14,14,

residual block Res5: the output size is 8,256,7,7,

wherein the meaning of the number or symbol in each layer of specific parameters in the high-frequency 3D residual error network is the same as the meaning of the number or symbol in each layer of specific parameters in the low-frequency 3D residual error network;

compared with the prior art, the intelligent analysis method for the student behavior characteristics based on the multidimensional data is designed, and can be used for carrying out intelligent comprehensive analysis by combining data of multiple dimensions such as student behavior data generated based on video structuring, student positioning data calibrated based on wearable positioning equipment, detailed and multi-aspect text information and the like, helping schools to discover the behavior characteristics of each student, helping schools to discover abnormal behaviors of the students in time through long-time tracking and learning, and achieving rapid early warning.

Description of the drawings:

fig. 1 is a schematic diagram of a low frequency 3D residual network structure according to the present invention.

Fig. 2 is a schematic diagram of a high frequency 3D residual network structure according to the present invention.

Fig. 3 is a schematic block diagram of a process flow of the intelligent analysis method of student behavior characteristics based on multidimensional data.

The specific embodiment is as follows:

the invention is further illustrated by the following examples in conjunction with the accompanying drawings.

Example 1:

the embodiment relates to an intelligent analysis method for student behavior characteristics based on multidimensional data, which comprises the following specific process steps:

s1, video structuring process

The video structuring refers to a technology of performing deep learning processing means such as target segmentation, time sequence analysis, target recognition and the like on video contents according to semantic relations, analyzing and recognizing target information, and then organizing the target information into text information which can be understood by a computer and a person. Through video structuring processing, text information and video snapshot data are stored in corresponding structured databases, so that video searching speed can be greatly improved, video storage capacity can be reduced, application value of video data can be improved, and subsequent video data analysis and prediction can be facilitated;

the video structuring process takes continuous 16 frames as a processing unit, and the video frame in each processing unit defaults to 3 channels; the video structuring is processed by constructing a high-low frequency 3D residual neural network model, the high-low frequency 3D residual neural network model comprises a low-frequency 3D residual neural network and a high-frequency 3D residual neural network, the low-frequency 3D residual neural network performs personnel structuring processing, and the high-frequency 3D residual neural network performs behavior structuring processing; the specific process of the video structuring process is as follows:

S1.1 target feature extraction

Because the category of the detected target in the video is basically unchanged, the extraction of the target features only needs to depend on a single or a few video frames, in this step, the target feature extraction is performed by using a low-frequency 3D residual network structure, the sampling interval of the low-frequency 3D residual network structure on the video frames is set to be inv_l=16, the low-frequency 3D residual network structure is used for extracting the spatial and semantic information of the target, and the low-frequency 3D residual network structure is schematically shown in the following diagram:

the low-frequency 3D residual network structure of this embodiment includes: an Input layer Input, a convolutional layer Conv1, a pooling layer Pool1, and four residual blocks Res2, res3, res4, res5; the Input layer Input samples the accessed video stream, the sampling interval is 16, namely, one frame is selected as Input every 16 frames, the size of the Input video frame is 224 multiplied by 224, and the Input video frame is defaulted to be three channels; the output end of the Input layer Input is connected with a convolution layer Conv1, and the convolution layer Conv1 is used for extracting high-resolution features; the output end of the convolution layer Conv1 is connected with a pooling layer Pool1; the pooling layer Pool1 adopts a maximum pooling method to downsample the feature map, so that the feature quantity is reduced, and the excessive calculated amount is prevented; the output end of the pooling layer Pool1 is connected with residual blocks, and four residual blocks Res2, res3, res4 and Res5 are sequentially connected; the target characteristics are finally obtained through the Res2 to Res5 residual blocks;

Each of the four residual blocks Res2-Res5 of the low-frequency 3D residual network structure in this embodiment is composed of a main path and a shortcut: the main path is composed of N multiplied by 3 convolution layers, wherein N represents the repetition times of 3 convolution layers in residual blocks, and the number and the size of convolution kernels of each convolution layer in different residual blocks are also different; the shortcuts are one convolution layer, the shortcuts directly transmit information to the deep layer of the network, the sizes of convolution kernels in different residual blocks are all 1 multiplied by 1, and the convolution kernels are different in size; through Res2 to Res5, a very deep network structure is built together by the main paths, and shortcuts in each residual block skip one layer or multiple layers and enter the deep network layer, so that the problems of gradient disappearance and gradient explosion are solved, and good operation performance of the network is ensured while the deeper network extraction characteristics are utilized; the specific parameters of the low-frequency 3D residual network structure are shown in the following table:

wherein the numerical or symbolic meaning in the above table is: the video frame size "{3,224,224}" corresponds to { video frame channel number, video frame width, video frame height }; the convolution kernel size "{1,3,7,7}" corresponds to { [ number of video frame channels, ] video frame depth, video frame width, video frame height }, the value in brackets may not exist; the convolution kernel step size "{1, 2}" corresponds to { time step size, horizontal space step size, vertical space step size }; the pooling kernel size "{1, 3}" corresponds to { video frame depth, video frame width, video frame height }; the pooling kernel step size "{1, 2}" corresponds to { time step size, horizontal space step size, vertical space step size }; the output size (e.g., "{1,3,224,224 }") corresponds to { feature map depth, number of convolution kernels, feature map width, feature map height }, and the factor (e.g., "×6") after multiplication in the residual block parameters represents the number of repetitions of the convolution operation;

S1.2 behavioral characteristics extraction

Since the behavior of the target is sometimes changed in a very short time, the behavior feature extraction is performed by using the high-frequency 3D residual error network structure in the step, and the calculated and extracted behavior feature size is {8,256,7,7}; setting the sampling interval of a high-frequency 3D residual network structure to be inv_h=inv_l/alpha of a video frame, wherein alpha=8, and the number of convolution kernels is beta times that of the convolution kernels in a low-frequency 3D residual network, wherein beta=1/8; the high-frequency 3D residual network structure has higher time resolution and fewer convolution kernels, the structure is beneficial to the development of useful time sequence information of a target at a higher speed, the high-frequency 3D residual network structure is schematically shown in figure 2,

the high-frequency 3D residual network described in this embodiment includes an Input layer Input, a convolutional layer Conv1, a pooling layer Pool1, and four residual blocks Res2, res3, res4, res5; the Input layer Input samples the accessed video stream, the sampling interval is 2, namely, one frame is selected as Input every 2 frames, the size of the Input video frame is 224 multiplied by 224, and the Input video frame is defaulted to be three channels; the output end of the Input layer Input is connected with a convolution layer Conv1, and the convolution layer Conv1 is used for extracting high-resolution features; the output end of the convolution layer Conv1 is connected with a pooling layer Pool1; the pooling layer Pool1 adopts a maximum pooling method to downsample the feature map, so that the feature quantity is reduced, and the excessive calculated amount is prevented; the output end of the pooling layer Pool1 is connected with residual blocks, and four residual blocks Res2, res3, res4 and Res5 are sequentially connected; and finally outputting to obtain low-resolution behavior characteristics through Res2 to Res5 residual blocks.

Each residual block in the four residual blocks Res2-Res5 of the high-frequency 3D residual network in this embodiment is composed of a main path and a shortcut: the main path is composed of N multiplied by 3 convolution layers, wherein N represents the repetition times of 3 convolution layers in residual blocks, and the number and the size of convolution kernels of each convolution layer in different residual blocks are also different; the shortcuts are one convolution layer, the shortcuts directly transmit information to the deep layer of the network, the sizes of convolution kernels in different residual blocks are all 1 multiplied by 1, and the convolution kernels are different in size; through Res2 to Res5, a very deep network structure is built together by the main paths, and shortcuts in each residual block skip one layer or multiple layers and enter the deep network layer, so that the problems of gradient disappearance and gradient explosion are solved, and good operation performance of the network is ensured while the deeper network extraction characteristics are utilized; the high frequency 3D residual network parameters are shown in the following table:

wherein the meaning of the numbers or symbols in the table is the same as the meaning of the numbers or symbols in the table S1.1;

s1.3 video classification

Firstly, performing size conversion on behavior features through matrix operation, secondly, processing the target features and the behavior features after size conversion by global mean value pooling operation, performing transverse connection on the processed target features and behavior features again, and finally inputting the target features and behavior features into a full-connection layer to finally obtain the spatial position and behavior category of the target; the specific process is as follows:

(1) Behavioral feature size conversion

The behavioral characteristic size obtained by S1.2 calculation is {8,256,7,7}, in order to realize characteristic connection with the target characteristic obtained by S1.1 calculation, the size conversion is needed, the behavioral characteristic size is converted into {1,8× 256,7,7} = {1,2048,7,7}, namely, the number of convolution kernels of alpha characteristic graphs is converted into a single characteristic graph through matrix operation, and the characteristic size conversion is completed;

(2) Feature global averaging pooling

Processing the target features and the behavior features after the size conversion by adopting global averaging operation, wherein the sizes of the pooled cores {1, 7}, and the characteristic sizes after pooling are {1,1,1,2048};

(3) Feature connection

(4) Full connection operation

s1.4 video structuring

The spatial position of the target T, the behavior category and the spatial characteristics of the target T obtained in the S1.3 are obtained to jointly construct descriptive information { location_T, spatial_feature_T, action_ID }. For video structuring, wherein the behavior category comprises a plurality of categories such as writing, drawing, walking, running, stretching limbs, basketball playing, football playing, dancing, swimming, bicycle riding, handshake, hugging, drinking, eating, mutual pushing and the like; the video structuring comprises target matching and generating a descriptive information sequence of the target, and the specific process is as follows:

(1) Target matching

where threshold_θ=0.33 is the angle Threshold.

(2) Generating a sequence of descriptive information of a target

wherein i εN+;

S2, collecting real-time positioning data of students;

the student wears a positioning device (positioning bracelet/positioner) with the positioning functions of GPS, beidou, wiFi and base station, positions the student at a fixed frequency, stores positioning results, positioning time and student ID information into a database, and generates a track sequence of the student S; the specific form of the trajectory sequence of student S is:

wherein i εN+;

s3, space-time studying and judging analysis

According to the spatial characteristics of the target T, searching all the descriptive information sequences of the target T in the L time period, and arranging the descriptive information sequences according to the time sequence; constructing a space-time matching sequence by using the time distribution of the monitoring cameras where all similar targets of the target T are located in the L time period and the installation positions of the monitoring cameras, intercepting the track sequences of a plurality of students similar to each other in the L time period, and finally screening out the track-behavior sequence pairs of the students from the track sequences of the students, wherein the specific process is as follows:

s3.1 Global target retrieval

S3.2 space-time matching

where threshold_dist=0.8 is the euclidean distance Threshold, s _i Lat is the latitude of student S at the moment of action_Time_i, S _i LOn is the longitude of student S at the Time of action_Time_i, t _i Lat is the latitude of the target T at time_i, s _i _lon is the longitude of the target T at time_i;

where i.epsilon.1, …, L.

S4, student behavior feature analysis

Drawing and generating a daily activity rule chart of the student by utilizing the student track-behavior sequence pair obtained in the S3 and combining with the time schedule of the student in the T time period, the school work and rest time schedule, the basic information of the student and the like, and carrying out data mining to find the activity hobbies of the student in the school period, so that a teacher is helped to effectively improve the teaching plan according to the characteristics of different students; by carrying out long-time rule statistics on the track-behavior sequence pairs of the students and combining with human supervision, a data prediction alarm function is constructed, whether abnormal behaviors exist or not is judged, such as long-time aggregation, loitering behaviors, track deviation and the like, real-time early warning is achieved, and accidents are prevented.

According to the student behavior characteristic intelligent analysis method based on the multidimensional data, student behavior and track original data are automatically collected through terminal equipment, multiple deep learning methods and video structuring technology are utilized to construct student behavior-track sequence pairs, the student behavior data network is constructed in combination with other text information, the data mining technology is utilized to conduct deep analysis and mining on the student behavior characteristics, school is helped to discover interests of students and activity rules during school, meanwhile, data prediction functions are utilized to achieve real-time early warning of abnormal behavior tracks, and early intervention is achieved. The invention has environment-friendly application and great market prospect.

The student behavior characteristic intelligent analysis method based on the multidimensional data can be suitable for performing behavior characteristic analysis on mental patients in mental hospitals, criminals in prisons and the like; the condition, the health recovery degree and the like of the mental patient are intelligently judged through daily performance of the mental patient, and further a further treatment scheme is made according to the analysis result of the mental patient; or by collecting and analyzing the daily behaviors of the criminals in the prison, the improvement degree of the criminals is determined, so that whether the criminals are suitable for criminal reduction and other measures is judged.

Claims

1. A student behavior characteristic intelligent analysis method based on multidimensional data is characterized in that: the specific process steps are as follows:

s1, video structuring process

Taking continuous 16 frames as a processing unit, defaulting a video frame in each processing unit to be 3 channels, constructing a high-low frequency 3D residual neutral network model for video structuring, storing text information and video snapshot data into corresponding structuring databases, wherein the high-low frequency 3D residual neutral network model comprises a low frequency 3D residual network structure and a high frequency 3D residual network structure, the low frequency 3D residual network structure performs personnel structuring processing to extract target features, and the high frequency 3D residual network structure performs behavior structuring processing to extract behavior features; connecting the target features with the behavior features, and processing to obtain the spatial position and behavior category of the target T;

The sampling interval of the low-frequency 3D residual error network structure on the video frame is set to be inv_l=16, and the low-frequency 3D residual error network structure is used for extracting the space and semantic information of a target; setting the sampling interval of a high-frequency 3D residual network structure to be inv_h=inv_l/alpha of a video frame, wherein alpha=8, and the number of convolution kernels is beta times that of the convolution kernels in a low-frequency 3D residual network, wherein beta=1/8;

s2, collecting real-time positioning data of students;

s3, space-time studying and judging analysis

Extracting the spatial characteristics of the target T by using the spatial position of the target T, searching all similar targets of the target T in the L time period according to similarity matching, arranging the spatial characteristics, the behavior types and the installation positions of the monitoring cameras of each similar target according to time sequence, and constructing a descriptive information sequence of the target T; intercepting track sequences of a plurality of students in an L time period, combining a descriptive information sequence of a target T, and screening out a track sequence of the student S with the highest matching degree with the target T from the track sequences of the plurality of students by utilizing track matching, namely, recognizing the target T as the student S, thereby constructing a track-behavior sequence pair of the student S;

S4, student behavior feature analysis

Drawing and generating a daily activity rule chart of the student by utilizing the student track-behavior sequence pair obtained in the step S3 and combining with the school time schedule, school work and rest schedule, basic student information and the like of the student in the L time period, and carrying out data mining to find out activity hobbies of the student in the school period, so as to help a teacher to effectively improve a teaching plan according to the characteristics of different students; by carrying out long-time rule statistics on the track-behavior sequence pairs of the students and combining with human supervision, a data prediction alarm function is constructed, whether the students have abnormal behaviors or not is judged, the abnormal behaviors comprise long-time aggregation, loitering behaviors and track deviation, real-time early warning is achieved, and accidents are prevented.

2. The multi-dimensional data based student behavior feature intelligent analysis method of claim 1, wherein: the specific process of the video structuring processing in the step S1 is as follows:

s1.1 target feature extraction

Extracting target features by using a low-frequency 3D residual error network structure;

s1.2 behavioral characteristics extraction

Performing behavior feature extraction by using a high-frequency 3D residual error network structure, and calculating the extracted behavior feature size to be {8,256,7,7};

s1.3 video classification

Firstly, performing size conversion on behavior features through matrix operation, secondly, processing target features and behavior features after size conversion through global mean value pooling operation, thirdly, transversely connecting the processed target features and behavior features, and finally, inputting the target features and behavior features into a full-connection layer to finally obtain the spatial position and behavior category of the target;

s1.4 video structuring

The spatial position of the target T, the behavior category and the spatial characteristics of the target T obtained in the S1.3 are obtained to jointly construct descriptive information { location_T, spatial_feature_T, action_ID }' of the target T, wherein the behavior category comprises a plurality of categories of writing, drawing, walking, running, stretching limbs, playing basketball, playing football, dancing, swimming, riding a bicycle, handshaking, hugging, drinking water, eating things and mutual pushing; the video structuring includes target matching and generating a sequence of descriptive information of the target.

3. The multi-dimensional data based student behavior feature intelligent analysis method of claim 2, wherein: the specific process of the S1.3 step video classification is as follows:

(1) Behavioral feature size conversion

(2) Feature global averaging pooling

(3) Feature connection

(4) Full connection operation

Inputting the features obtained in the previous step into a full-connection layer to finally obtain the spatial position and behavior category of the target;

and pre-trained model parameters on a kinetic-400 data set are used in the operation process of the steps S1.1, S1.2 and S1.3.

4. A multi-dimensional data based student behavioral characteristics intelligent analysis method according to claim 2 or claim 3, wherein: the specific process of the S1.4 step video structuring is as follows:

(1) Target matching

Wherein x is _i The ith spatial eigenvalue, y, of the x target generated for the last processing unit _i The ith spatial feature value of the y-object generated for the current processing unit, θ is twoIncluded angles between the individual spatial feature vectors;

wherein threshold_θ=0.33 is the angle Threshold;

(2) Generating a sequence of descriptive information of a target

info_Seq_T＝{time_1,camera_ID,num_T,action_ID,spatial_Feature_T；…；

time_i,camera_ID,num_T,action_ID,spatial_Feature_T；}

where i e n+.

5. The intelligent analysis method for student behavior characteristics based on multidimensional data according to claim 4, wherein: the specific form of the track sequence of the student S in the step S2 is as follows:

traj_s= { location_time_1, lat_1, lon_1, id_s; …; location_time_i, lat_i, lon_i, id_s; where i e n+.

6. The intelligent analysis method for student behavior characteristics based on multidimensional data according to claim 5, wherein: the specific process of the space-time analysis in the step S3 is as follows:

S3.1 Global target retrieval

s3.2 space-time matching

camera_Time_Seq_T_L＝{time_1,camera_m_lat,camera_m_lon；…；

time_i,camera_n_lat,camera_n_lon；}

traj_s_l= { location_time_1, lat_1, lon_1, id_s; …; location_time_i, lat_i, lon_i, id_s; where i e (1, …, L), action_time_i represents the point in Time nearest to time_i, lat_i represents: the latitude of the student at the Time of action_time_i, lon_i represents: the calculation formula of the Euclidean distance between the longitude of the student at the Time of action_Time_i, camera_Time_Seq_T_L and traj_S_L is as follows:

where i.epsilon.1, …, L.

7. The intelligent analysis method for student behavior characteristics based on multidimensional data according to claim 6, wherein: the low frequency 3D residual network structure comprises: an Input layer Input, a convolutional layer Conv1, a pooling layer Pool1, and four residual blocks Res2, res3, res4, res5; the Input layer Input samples the accessed video stream, the sampling interval is 16, namely, one frame is selected as Input every 16 frames, the size of the Input video frame is 224 multiplied by 224, and the Input video frame is defaulted to be three channels; the output end of the Input layer Input is connected with a convolution layer Conv1, and the convolution layer Conv1 is used for extracting high-resolution features; the output end of the convolution layer Conv1 is connected with a pooling layer Pool1; the pooling layer Pool1 adopts a maximum pooling method to downsample the feature map, so that the feature quantity is reduced, and the excessive calculated amount is prevented; the output end of the pooling layer Pool1 is connected with residual blocks, and four residual blocks Res2, res3, res4 and Res5 are sequentially connected; the target characteristics are finally obtained through the Res2 to Res5 residual blocks;

Each of the four residual blocks Res2-Res5 of the low frequency 3D residual network structure consists of a main path and a shortcut: the main path is composed of N multiplied by 3 convolution layers, wherein N represents the repetition times of 3 convolution layers in residual blocks, and the number and the size of convolution kernels of each convolution layer in different residual blocks are also different; the shortcuts are one convolution layer, the shortcuts directly transmit information to the deep layer of the network, the sizes of convolution kernels in different residual blocks are all 1 multiplied by 1, and the convolution kernels are different in size; the main paths of the two paths together construct a very deep network structure through Res2 to Res5, and the shortcuts in each residual block skip one layer or multiple layers and enter the deep network layer.

8. The multi-dimensional data based student behavior feature intelligent analysis method of claim 7, wherein: specific parameters of each layer in the low-frequency 3D residual error network are as follows: wherein the Input layer Input: the output size is {1,3,224,224}, the sampling interval is 16, and the video frame size is {3,224,224}; convolutional layer Conv1: the output size is {1,64,112,112}, the number of convolution kernels is 64, the step size is {1, 2}, and the size is {1,3,7,7}; pool layer Pool1: the output size is {1,64,56,56}, the pooling kernel step size is {1, 2}, the size is {1, 3}, and the mode is the maximum value;

Residual block Res2: the output size is 1,256,56,56,

residual block Res3: the output size is 1,512,28,28,

residual block Res4: the output size is 1,1024,14,14,

residual block Res5: the output size is 1,2048,7,7,

/>

wherein the factor after multiplication in the residual block parameter represents the number of repetitions of the convolution operation.

9. The multi-dimensional data based student behavior feature intelligent analysis method of claim 8, wherein: the high-frequency 3D residual error network comprises an Input layer Input, a convolution layer Conv1, a pooling layer Pool1 and four residual error blocks Res2, res3, res4 and Res5; the Input layer Input samples the accessed video stream, the sampling interval is 2, namely, one frame is selected as Input every 2 frames, the size of the Input video frame is 224 multiplied by 224, and the Input video frame is defaulted to be three channels; the output end of the Input layer Input is connected with a convolution layer Conv1, and the convolution layer Conv1 is used for extracting high-resolution features; the output end of the convolution layer Conv1 is connected with a pooling layer Pool1; the pooling layer Pool1 adopts a maximum pooling method to downsample the feature map, so that the feature quantity is reduced, and the excessive calculated amount is prevented; the output end of the pooling layer Pool1 is connected with residual blocks, and four residual blocks Res2, res3, res4 and Res5 are sequentially connected; finally outputting to obtain behavior characteristics through Res2 to Res5 residual blocks;

Each of the four residual blocks Res2-Res5 of the high frequency 3D residual network consists of a main path and a shortcut: the main path is composed of N multiplied by 3 convolution layers, wherein N represents the repetition times of 3 convolution layers in residual blocks, and the number and the size of convolution kernels of each convolution layer in different residual blocks are also different; the shortcuts are one convolution layer, the shortcuts directly transmit information to the deep layer of the network, the sizes of convolution kernels in different residual blocks are all 1 multiplied by 1, and the convolution kernels are different in size; the main paths of the two paths together construct a very deep network structure through Res2 to Res5, and the shortcuts in each residual block skip one layer or multiple layers and enter the deep network layer.

10. The multi-dimensional data based student behavior feature intelligent analysis method of claim 9, wherein: specific parameters of each layer in the high-frequency 3D residual error network are as follows: wherein the Input layer Input: the output size is {8,3,224,224}, the sampling interval is 2, and the image size is {3,224,224}; convolutional layer Conv1: the output size is {8,8,112,112}, the number of convolution kernels is 8, the step size is {1, 2}, and the size is {5,3,7,7}; pool layer Pool1: the output size is {8,8,56,56}, the pooling kernel step size is {1, 2}, the size is {1, 3}, and the mode is the maximum value {1,64,56,56};

Residual block Res2: the output size is 8,32,56,56,

residual block Res3: the output size is 8,64,28,28,

residual block Res4: the output size is 8,128,14,14,

residual block Res5: the output size is 8,256,7,7,