CN116758631A - Big data driven behavior intelligent analysis method and system - Google Patents

Big data driven behavior intelligent analysis method and system

Info

Publication number
CN116758631A
Authority
CN
China
Prior art keywords
feature map
module
inputting
convolution
behavior
Prior art date
Legal status
Granted
Application number
CN202310692726.5A
Other languages
Chinese (zh)
Other versions
CN116758631B (en)
Inventor
孟冠宇
孙铭阳
Current Assignee
Hangzhou Chasing Video Technology Co., Ltd.
Original Assignee
Individual
Priority date
Filing date
Publication date
Application filed by: Individual
Priority: CN202310692726.5A
Publication of CN116758631A
Application granted
Publication of CN116758631B
Status: Active


Classifications

    • G06V 40/20 — Recognition of movements or behaviour in image or video data, e.g. gesture recognition
    • G06N 3/0464 — Convolutional networks [CNN, ConvNet]
    • G06N 3/08 — Learning methods for neural networks
    • G06Q 50/14 — ICT specially adapted for travel agencies (tourism services)
    • G06V 10/764 — Image or video recognition using classification, e.g. of video objects
    • G06V 10/82 — Image or video recognition using neural networks
    • Y02D 10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The invention provides a big data driven intelligent behavior analysis method and system, and belongs to the technical field of behavior recognition. First, a data set of non-standard tourist behavior in a tourist attraction is established; second, the training data set is input into a multi-scale attention mechanism network for training to obtain an initial non-standard behavior classification model; then the test data set is input into the initial classification model for testing, and the final non-standard behavior classification model is output once the set conditions are met; finally, collected videos or images of tourist behavior are input into the final classification model for prediction, generating a behavior recognition result. By fusing multi-scale features, the invention improves the robustness of the algorithm to interference such as scale change, rotation and occlusion, enriches the captured tourist behavior information, and makes the algorithm easier to converge; in addition, an attention mechanism is used to distinguish important behavior features from irrelevant ones and to focus on the target area, improving the accuracy of the recognition result.

Description

Big data driven behavior intelligent analysis method and system
Technical Field
The invention belongs to the technical field of behavior recognition, and particularly relates to a big data driven intelligent behavior analysis method and system.
Background
Behavior recognition technology infers the intention, state or identity of a person or object by analyzing and interpreting its behavior patterns, actions and activities. It mainly depends on data acquisition and processing and on machine learning and pattern recognition algorithms, with applications such as human motion recognition, gesture analysis, emotion recognition, driving behavior recognition and intelligent monitoring. In recent years, with the improvement of living standards, the number of people visiting scenic spots has gradually increased, and non-standard behavior within them has grown accordingly. Improving the visitor experience while protecting the scenic environment and the integrity of cultural heritage has therefore made the recognition of tourist behavior a topic of wide concern.
At present, supervision of tourist behavior relies mainly on manual patrols and surveillance cameras. First, manual patrols have a limited range, cannot cover an entire scenic spot and, with limited human resources, cannot achieve real-time supervision; second, conventional surveillance cameras often only provide video recordings that require manual analysis and judgment, which is time-consuming and error-prone. In addition, existing behavior recognition methods still face challenges in accuracy and efficiency and cannot meet the requirement of timely recognition of non-standard behavior.
Disclosure of Invention
Based on the above technical problems, the invention provides a big data driven intelligent behavior analysis method and system, which improve the accuracy of video and/or image recognition, reduce manual workload and improve efficiency.
The invention provides a big data driven behavior intelligent analysis method, which comprises the following steps:
step S1: constructing a data set based on the non-canonical behavior data of tourists in the tourist attraction; the data set comprises photos and/or videos marked with nonstandard behavior categories; dividing the data set into a training data set and a test data set according to a certain proportion;
step S2: inputting the training data set into a multi-scale attention mechanism network for training to obtain an initial non-canonical behavior classification model; the multi-scale attention mechanism network comprises two standard convolution modules, a depth separable convolution module, three self-attention mechanism modules, a space attention mechanism module, a global average pooling module and a Softmax classifier;
the two standard convolution modules are a first standard convolution module and a second standard convolution module respectively;
the step S2 specifically comprises the following steps:
step S21: carrying out data enhancement on the images in the training data set, and inputting the enhanced images into the first standard convolution module for training to obtain a standard convolution module feature map;
step S22: respectively inputting the standard convolution module feature map into the three self-attention modules to perform feature fusion to obtain a multi-scale module feature map;
step S23: sequentially inputting the multi-scale module feature map to the second standard convolution module and the spatial attention mechanism module to extract deep features so as to obtain a spatial attention feature map;
step S24: sequentially inputting the spatial attention feature map to the depth separable convolution module, the global average pooling module and the Softmax classifier for classification to obtain the non-canonical behavior initial classification model;
step S3: inputting the test data set into the initial non-standard behavior classification model for testing, and outputting a final non-standard behavior classification model until the set condition is met;
step S4: and inputting the acquired video and/or image of the guest behavior into the non-canonical behavior final classification model to predict, and generating a behavior recognition result.
In some preferred embodiments of the foregoing big data driven behavior intelligent analysis method, performing data enhancement on the images in the training data set and inputting the enhanced images into the first standard convolution module for training to obtain a standard convolution module feature map specifically includes:
The first standard convolution module comprises a first standard convolution layer, a first normalized activation layer, a second standard convolution layer and a second normalized activation layer;
step S211: at least one strategy of random rotation, random translation in a horizontal direction, random translation in a vertical direction, random scaling and random horizontal overturning is used for carrying out data enhancement on the images in the training data set;
step S212: inputting the enhanced image into the first standard convolution layer to carry out convolution operation to obtain a feature map F1;
step S213: inputting the feature map F1 into the first normalized activation layer to perform normalization and activation operations to obtain a feature map F2;
step S214: inputting the characteristic diagram F2 into the second standard convolution layer to carry out convolution operation to obtain a characteristic diagram F3;
step S215: and inputting the feature map F3 into the second normalized activation layer to perform normalization and activation operations to obtain a standard convolution module feature map.
In some preferred embodiments of the foregoing big data driven behavior intelligent analysis method, respectively inputting the standard convolution module feature map into the three self-attention modules to perform feature fusion to obtain a multi-scale module feature map specifically includes:
Step S221: inputting the standard convolution module feature map into a self-attention module Block2 for feature extraction to obtain a first scale feature map S17;
step S222: inputting the standard convolution module feature map into a self-attention module Block3 for feature extraction to obtain a second scale feature map T17;
step S223: inputting the standard convolution module feature map into a self-attention module Block4 for feature extraction to obtain a third scale feature map M17;
step S224: the feature map S17, the feature map T17 and the feature map M17 are respectively input into a fourth element-by-element convolution layer, a fifth element-by-element convolution layer and a sixth element-by-element convolution layer to carry out convolution operation, so that feature maps S18, T18 and M18 are obtained;
step S225: and adding the feature map S18, the feature map T18 and the feature map M18 element by element to obtain the multi-scale module feature map.
In some preferred embodiments of the foregoing big data driven behavior intelligent analysis method, sequentially inputting the multi-scale module feature map into the second standard convolution module and the spatial attention mechanism module for deep feature extraction to obtain a spatial attention feature map specifically includes:
The spatial attention mechanism module comprises a seventh element-by-element convolution layer and a seventh activation function layer;
step S231: inputting the multi-scale module feature map into the second standard convolution module to perform feature extraction to obtain a feature map P4;
step S232: inputting the characteristic map P4 into the seventh element-by-element convolution layer to carry out convolution operation to obtain a characteristic map P5;
step S233: inputting the feature map P5 to the seventh activation function layer to obtain a feature map P6;
step S234: and multiplying the feature map P4 and the feature map P6 element by element to obtain the spatial attention feature map.
In some preferred embodiments of the big data driven behavior intelligent analysis method, inputting the spatial attention feature map into the depth separable convolution module, the global average pooling module and the Softmax classifier in sequence for classification to obtain the initial non-standard behavior classification model specifically includes:
step S241: inputting the spatial attention feature map to the depth separable convolution module for feature extraction to obtain a feature map N4;
step S242: inputting the feature map N4 into the global average pooling module for pooling operation to obtain a feature vector V;
Step S243: inputting the feature vector V into a Softmax classifier to classify, so as to obtain probability distribution of each class;
step S244: and selecting the category with the highest probability as a final classification result to obtain an initial classification model of the nonstandard behavior.
In some preferred embodiments of the big data driven behavior intelligent analysis method, inputting the test data set into the initial non-standard behavior classification model for testing and outputting the final non-standard behavior classification model once the set conditions are met specifically includes:
step S31: inputting the test data set into the initial non-canonical behavior classification model for testing, and calculating classification loss and classification accuracy;
step S32: judging whether the iteration times are smaller than or equal to a set value; if the iteration times are greater than the set value, returning to the step S2; if the iteration times are smaller than or equal to the set value, continuing to judge whether the classification loss is smaller than or equal to a first threshold value; returning to step S2 if the classification loss is greater than the first threshold; if the classification loss is less than or equal to the first threshold, continuing to determine whether the classification accuracy is greater than or equal to a second threshold; if the classification accuracy is less than the second threshold, returning to step S2; and if the classification accuracy is greater than or equal to the second threshold value and all the set conditions are met, outputting the non-canonical behavior final classification model.
The invention also provides a big data driven intelligent behavioral analysis system, which comprises:
the data set construction module is used for constructing a data set based on the non-standard behavior data of tourists in the tourist attraction; the data set comprises photos and/or videos marked with nonstandard behavior categories; dividing the data set into a training data set and a test data set according to a certain proportion;
the network training module is used for inputting the training data set into a multi-scale attention mechanism network for training to obtain an initial non-standard behavior classification model; the multi-scale attention mechanism network comprises two standard convolution modules, a depth separable convolution module, three self-attention mechanism modules, a space attention mechanism module, a global average pooling module and a Softmax classifier; the two standard convolution modules are a first standard convolution module and a second standard convolution module respectively;
the network training module specifically comprises:
the data enhancer module is used for enhancing the data of the images in the training data set, inputting the enhanced images into the first standard convolution module for training, and obtaining a standard convolution module feature diagram;
The multi-scale feature fusion sub-module is used for respectively inputting the feature graphs of the standard convolution modules into the three self-attention modules to perform feature fusion to obtain a multi-scale module feature graph;
the spatial attention sub-module is used for sequentially inputting the multi-scale module feature map to the second standard convolution module and the spatial attention mechanism module to extract deep features so as to obtain a spatial attention feature map;
the classification sub-module is used for inputting the spatial attention feature map into the depth separable convolution module, the global average pooling module and the Softmax classifier in sequence for classification to obtain the non-canonical behavior initial classification model;
the test module is used for inputting the test data set into the non-standard behavior initial classification model for testing, and outputting a non-standard behavior final classification model when the set condition is met;
the behavior recognition module is used for inputting the collected video and/or image of the guest behavior into the non-standard behavior final classification model to predict, and generating a behavior recognition result.
In some preferred embodiments, in the big data driven behavioral intelligent analysis system, the data enhancer module specifically includes:
The data enhancement operation unit, which applies at least one strategy of random rotation, random translation in the horizontal direction, random translation in the vertical direction, random scaling and random horizontal flipping, is used for performing data enhancement on the images in the training data set;
the first standard convolution layer operation unit is used for inputting the enhanced image into the first standard convolution layer to carry out convolution operation to obtain a feature map F1;
the first normalized activation layer operation unit is used for inputting the feature map F1 into the first normalized activation layer to perform normalization and activation operations to obtain a feature map F2;
the second standard convolution layer operation unit is used for inputting the characteristic diagram F2 into the second standard convolution layer to carry out convolution operation to obtain a characteristic diagram F3;
and the second normalized activation layer operation unit is used for inputting the feature map F3 into the second normalized activation layer to perform normalization and activation operations, so as to obtain a standard convolution module feature map.
In some preferred embodiments, in the foregoing big data driven behavioral intelligent analysis system, the multi-scale feature fusion submodule specifically includes:
the first scale feature extraction unit is used for inputting the standard convolution module feature map into the self-attention module Block2 for feature extraction to obtain a first scale feature map S17;
The second scale feature extraction unit is used for inputting the standard convolution module feature diagram into the self-attention module Block3 for feature extraction to obtain a second scale feature diagram T17;
the third scale feature extraction unit is used for inputting the standard convolution module feature map into the self-attention module Block4 for feature extraction to obtain a third scale feature map M17;
the element-by-element convolution operation unit is used for respectively inputting the feature map S17, the feature map T17 and the feature map M17 into a fourth element-by-element convolution layer, a fifth element-by-element convolution layer and a sixth element-by-element convolution layer to carry out convolution operation to obtain feature maps S18, T18 and M18;
and the element-by-element adding unit is used for adding the feature map S18, the feature map T18 and the feature map M18 element by element to obtain the multi-scale module feature map.
In some preferred embodiments, in the big data driven behavioral intelligent analysis system described above, the spatial attention sub-module specifically includes:
the second standard convolution module operation unit is used for inputting the multi-scale module feature map into the second standard convolution module to perform feature extraction to obtain a feature map P4;
A seventh element-by-element convolution layer operation unit, configured to input the feature map P4 to the seventh element-by-element convolution layer to perform convolution operation, so as to obtain a feature map P5;
a seventh activation function layer operation unit, configured to input the feature map P5 to the seventh activation function layer, to obtain a feature map P6;
and the element-by-element multiplication unit is used for multiplying the feature map P4 and the feature map P6 element by element to obtain the spatial attention feature map.
Compared with the prior art, the invention has the following beneficial effects:
the invention uses a multi-scale feature fusion mode, the multi-scale features can capture features of different scales by using different convolution kernel sizes, the smaller convolution kernel can better capture local detail information, and the larger convolution kernel can capture wider context environment and background information; feature extraction of different convolution kernel sizes may capture different levels of semantic information, from low-level local features to high-level global features. The complexity and diversity of the input data can be better described by comprehensively utilizing the multi-scale features.
The invention uses a self-attention mechanism and a spatial attention mechanism mode, wherein the self-attention mechanism models according to the association between different positions in the image, and extracts key information in the image; the spatial attention mechanism can highlight important areas or features in the image or the feature map, and neglect secondary or irrelevant areas, so that compared with the traditional recognition method, the accuracy of behavior recognition is improved, the artificial work consumption is reduced, and the efficiency is improved.
Drawings
FIG. 1 is a flow chart of a big data driven intelligent behavioral analysis method of the present invention;
FIG. 2 is a network structure diagram of the big data driven behavior intelligent analysis method of the present invention;
FIG. 3 is a Block2 network structure diagram of a self-attention module of the big data driven behavior intelligent analysis method of the present invention;
FIG. 4 is a Block3 network structure diagram of a self-attention module of the big data driven behavior intelligent analysis method of the invention;
FIG. 5 is a Block4 network architecture diagram of a self-attention module of the big data driven behavior intelligent analysis method of the present invention;
FIG. 6 is a block diagram of a big data driven behavioral intelligent analysis system of the present invention.
Detailed Description
The invention is further described below in connection with specific embodiments and the accompanying drawings, but the invention is not limited to these embodiments.
Example 1
As shown in fig. 1, the present invention provides a big data driven behavior intelligent analysis method, comprising:
step S1: constructing a data set based on the non-canonical behavior data of tourists in the tourist attraction; the data set comprises photos and/or videos marked with nonstandard behavior categories; the data set is divided into a training data set and a test data set according to a certain proportion.
Step S2: inputting the training data set into a multi-scale attention mechanism network for training to obtain an initial non-canonical behavior classification model; the multi-scale attention mechanism network comprises two standard convolution modules, a depth separable convolution module, three self-attention mechanism modules, a space attention mechanism module, a global average pooling module and a Softmax classifier, wherein the two standard convolution modules are a first standard convolution module and a second standard convolution module respectively.
The step S2 specifically comprises the following steps:
Step S21: carrying out data enhancement on the images in the training data set, and inputting the enhanced images into the first standard convolution module for training to obtain a standard convolution module feature map.
Step S22: respectively inputting the standard convolution module feature map into the three self-attention modules to perform feature fusion to obtain a multi-scale module feature map.
Step S23: sequentially inputting the multi-scale module feature map into the second standard convolution module and the spatial attention mechanism module for deep feature extraction to obtain a spatial attention feature map.
Step S24: sequentially inputting the spatial attention feature map into the depth separable convolution module, the global average pooling module and the Softmax classifier for classification to obtain the initial non-standard behavior classification model.
Step S3: inputting the test data set into the initial non-standard behavior classification model for testing, and outputting the final non-standard behavior classification model once the set conditions are met.
Step S4: inputting the acquired video and/or images of tourist behavior into the final non-standard behavior classification model for prediction, and generating a behavior recognition result.
The steps are discussed in detail below:
Step S1: constructing a data set based on the non-standard behavior data of tourists in the tourist attraction. The non-standard behavior data are collected by video acquisition equipment, and the videos are converted into an image data set by sampling one frame at a certain interval, for example every 4 frames or every 8 frames. If the number of image frames obtained during conversion is smaller than a set frame value, the video is cut again with a progressively smaller interval until the number of cut frames exceeds the set value. All obtained image frames are then resized to a specified size, cut according to the input required by the network model, and the adjusted image frame data set is divided into a training data set and a test data set in a ratio of 8:2. The input size of the network model is 256×256×16, and the non-standard behavior categories are six: graffiti, smoking in non-smoking areas, littering, picking flowers and fruits, climbing trees or ancient cultural relics, and damaging service facilities. This is only one specific example; the categories include but are not limited to the above.
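The frame-sampling procedure described above can be sketched with OpenCV; the following is a minimal illustration under the stated rules (function and parameter names are hypothetical, and the fallback of shrinking the sampling interval mirrors the text):

```python
# Hedged sketch of the video-to-image-set conversion in step S1 (OpenCV assumed).
import cv2
import numpy as np

def video_to_clips(path, interval=4, clip_len=16, size=(256, 256)):
    """Sample one frame every `interval` frames and group them into clips of
    `clip_len` frames; if too few frames result, retry with a smaller interval."""
    while interval >= 1:
        cap = cv2.VideoCapture(path)
        frames, idx = [], 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            if idx % interval == 0:
                frames.append(cv2.resize(frame, size))
            idx += 1
        cap.release()
        if len(frames) >= clip_len:
            break
        interval //= 2          # keep reducing the interval, as the text describes
    # stack along the temporal axis: each clip is 256x256x16x3
    n = len(frames) // clip_len
    return [np.stack(frames[i * clip_len:(i + 1) * clip_len], axis=2)
            for i in range(n)]
```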
The method is executed on a cloud computing platform or server, which provides strong computing and storage capacity and can perform complex data analysis, deep learning algorithms and large model training, thereby providing accurate prediction, insight and decision support.
Step S21: carrying out data enhancement on images in the training data set, and inputting the enhanced images into a first standard convolution module for training to obtain a standard convolution module feature map; the first standard convolution module includes a first standard convolution layer, a first normalized activation layer, a second standard convolution layer, and a second normalized activation layer.
In Figs. 2-5, Conv3D represents a standard convolution layer, with selectable kernel sizes of 3×3×3, 5×5×5 and 7×7×7; a Conv3D with kernel size 1×1×1 represents an element-by-element convolution layer; strides represents the step size, taking the value 1 or 2. A normalized activation layer comprises a batch normalization layer (BatchNormalization) and an activation function layer (Activation); the normalized activation layers use the ReLU activation function, while a standalone activation function layer takes the value ReLU or Sigmoid. Block denotes a module of the network; Conv3DTranspose represents a transposed convolution layer, with selectable kernel sizes of 3×3×3, 5×5×5 and 7×7×7; GA-Pooling3D represents a global average pooling layer; Reshape represents a flattening layer; Dense represents a fully connected layer; SepConv3D represents a depth separable convolution layer, with selectable kernel sizes of 3×3×3, 5×5×5 and 7×7×7; GlobalAveragePooling3D represents a global average pooling module; MaxPooling3D represents a max pooling layer. Fa represents the feature maps of the first standard convolution module, where a is an integer in [1,3]; Sb represents the feature maps obtained in self-attention module Block2, where b is an integer in [1,17]; Tc represents the feature maps obtained in self-attention module Block3, where c is an integer in [1,17]; Md represents the feature maps obtained in self-attention module Block4, where d is an integer in [1,17]; Pe represents the feature maps obtained in the second standard convolution module Block5, where e is an integer in [1,6]; Nf represents the feature maps obtained in the depth separable convolution module Block6, where f is an integer in [1,4].
As shown in fig. 2, step S21 specifically includes:
Step S211: at least one strategy among random rotation, random translation in the horizontal direction, random translation in the vertical direction, random scaling and random horizontal flipping is used to perform data enhancement on the images in the training data set; in the invention the random rotation angle is 30 degrees, the random translation proportion in the horizontal and vertical directions is 0.1, the random scaling proportion is 0.2, and the random horizontal flip parameter is set to True (not shown in fig. 2).
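These parameters map directly onto standard augmentation utilities. A minimal sketch follows, assuming TensorFlow/Keras and its ImageDataGenerator, whose parameters happen to match the strategies listed (an illustrative mapping, not the patent's own code):

```python
# Hedged sketch: Keras ImageDataGenerator configured with the values of step S211.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(
    rotation_range=30,       # random rotation up to 30 degrees
    width_shift_range=0.1,   # random horizontal translation (fraction of width)
    height_shift_range=0.1,  # random vertical translation (fraction of height)
    zoom_range=0.2,          # random scaling
    horizontal_flip=True,    # random horizontal flipping
)
```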
Step S212: inputting the enhanced image (256×256×16×3) into the first standard convolution layer for convolution to obtain a feature map F1; the first standard convolution layer has 16 convolution kernels of size 3×3×3 with stride 2; the feature map F1 is 128×128×8 with 16 channels.
Step S213: inputting the feature map F1 into the first normalized activation layer for normalization and activation to obtain a feature map F2; the feature map F2 is 128×128×8 with 16 channels.
Step S214: inputting the feature map F2 into the second standard convolution layer for convolution to obtain a feature map F3; the second standard convolution layer has 32 convolution kernels of size 3×3×3 with stride 2; the feature map F3 is 64×64×4 with 32 channels.
Step S215: inputting the feature map F3 into the second normalized activation layer for normalization and activation to obtain the standard convolution module feature map, which is 64×64×4 with 32 channels.
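Steps S212-S215 amount to two Conv3D + BatchNormalization + ReLU stages. A minimal Keras-style sketch is given below, assuming TensorFlow; the function name and the L2 weight are illustrative (the text later specifies "same" padding and L2 regularization for all convolution layers):

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

def first_standard_conv_module(x):
    # Step S212: 16 kernels, 3x3x3, stride 2 -> F1: 128x128x8, 16 channels
    x = layers.Conv3D(16, 3, strides=2, padding="same",
                      kernel_regularizer=regularizers.l2(1e-4))(x)
    # Step S213: batch normalization + ReLU -> F2
    x = layers.Activation("relu")(layers.BatchNormalization()(x))
    # Step S214: 32 kernels, 3x3x3, stride 2 -> F3: 64x64x4, 32 channels
    x = layers.Conv3D(32, 3, strides=2, padding="same",
                      kernel_regularizer=regularizers.l2(1e-4))(x)
    # Step S215: batch normalization + ReLU -> standard convolution module feature map
    return layers.Activation("relu")(layers.BatchNormalization()(x))

inputs = tf.keras.Input(shape=(256, 256, 16, 3))   # enhanced input clips
features = first_standard_conv_module(inputs)      # 64x64x4, 32 channels
```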
Step S22: the standard convolution module feature images are respectively input into three self-attention modules to perform feature fusion, and a multi-scale module feature image is obtained, which specifically comprises the following steps:
step S221: inputting the standard convolution module feature map into a self-attention module Block2 for feature extraction to obtain a first scale feature map S17, which specifically comprises the following steps:
inputting the standard convolution module feature map into the first element-by-element convolution layer for convolution to obtain a feature map S1; the first element-by-element convolution layer has 32 convolution kernels of size 1×1×1 with stride 2; the feature map S1 is 32×32×2 with 32 channels; inputting S1 into the third normalized activation layer for normalization and activation to obtain a feature map S2, which is 32×32×2 with 32 channels; inputting S2 into the first transposed convolution layer for transposed convolution to obtain a feature map S3; the first transposed convolution layer has 16 convolution kernels of size 3×3×3 with stride 2; the feature map S3 is 64×64×4 with 16 channels; inputting S3 into the fourth normalized activation layer for normalization and activation to obtain a feature map S4, which is 64×64×4 with 16 channels; inputting S4 into the third standard convolution layer for convolution to obtain a feature map S5; the third standard convolution layer has 32 convolution kernels of size 3×3×3 with stride 2; the feature map S5 is 32×32×2 with 32 channels; inputting S5 into the fifth normalized activation layer for normalization and activation to obtain a feature map S6, which is 32×32×2 with 32 channels; inputting the standard convolution module feature map into the first depth separable convolution layer for convolution to obtain a feature map S7; the first depth separable convolution layer has 32 convolution kernels of size 3×3×3 with stride 1; the feature map S7 is 64×64×4 with 32 channels; inputting S7 into the sixth normalized activation layer for normalization and activation to obtain a feature map S8, which is 64×64×4 with 32 channels; inputting S8 into the second depth separable convolution layer for convolution to obtain a feature map S9; the second depth separable convolution layer has 32 convolution kernels of size 3×3×3 with stride 1; the feature map S9 is 64×64×4 with 32 channels; inputting S9 into the seventh normalized activation layer for normalization and activation to obtain a feature map S10, which is 64×64×4 with 32 channels; inputting S10 into the first max pooling layer (stride 2) to obtain a feature map S11, which is 32×32×2 with 32 channels; inputting S6 into the first global average pooling layer to obtain a feature map S12; inputting S12 into the first flattening layer to obtain a feature map S13, which is 1×1×1 with 32 channels; sequentially inputting S13 into the first fully connected layer and the first activation function layer (ReLU) to obtain a feature map S14, which is 1×1×1 with 16 channels; sequentially inputting S14 into the second fully connected layer and the second activation function layer (Sigmoid) to obtain a feature map S15, which is 1×1×1 with 32 channels; multiplying S6 and S15 element by element to obtain a feature map S16, which is 32×32×2 with 32 channels; adding S16 and S11 element by element to obtain the feature map S17, which is 32×32×2 with 32 channels and serves as the first scale feature map.
In this embodiment, the self-attention module Block2 includes: the first element-by-element convolution layer, the third normalized activation layer, the first transpose convolution layer, the fourth normalized activation layer, the third standard convolution layer, the fifth normalized activation layer, the first depth separable convolution layer, the sixth normalized activation layer, the second depth separable convolution layer, the seventh normalized activation layer, the first maximum pooling layer, the first global average pooling layer, the first flattening layer, the first full-connection layer, the first activation function layer, the second full-connection layer, the second activation function layer, as shown in fig. 3.
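Structurally, Block2 is a residual module with a squeeze-and-excitation style channel attention branch. A hedged Keras-style sketch follows, assuming TensorFlow; because Keras offers no 3-D separable convolution layer, plain Conv3D stands in for SepConv3D here, so this is an approximation of Fig. 3 rather than the exact network:

```python
from tensorflow.keras import layers

def self_attention_block2(x):                # x: 64x64x4, 32 channels
    # branch 1 (S1-S6): bottleneck conv, transposed conv, standard conv
    b = layers.Conv3D(32, 1, strides=2, padding="same")(x)              # S1
    b = layers.Activation("relu")(layers.BatchNormalization()(b))       # S2
    b = layers.Conv3DTranspose(16, 3, strides=2, padding="same")(b)     # S3
    b = layers.Activation("relu")(layers.BatchNormalization()(b))       # S4
    b = layers.Conv3D(32, 3, strides=2, padding="same")(b)              # S5
    s6 = layers.Activation("relu")(layers.BatchNormalization()(b))      # S6
    # branch 2 (S7-S11): two stride-1 convs (stand-ins for SepConv3D), max pool
    c = layers.Conv3D(32, 3, strides=1, padding="same")(x)              # S7
    c = layers.Activation("relu")(layers.BatchNormalization()(c))       # S8
    c = layers.Conv3D(32, 3, strides=1, padding="same")(c)              # S9
    c = layers.Activation("relu")(layers.BatchNormalization()(c))       # S10
    s11 = layers.MaxPooling3D(pool_size=2, strides=2, padding="same")(c)  # S11
    # channel attention (S12-S15): squeeze, two dense layers, sigmoid gate
    w = layers.GlobalAveragePooling3D()(s6)                             # S12/S13
    w = layers.Dense(16, activation="relu")(w)                          # S14
    w = layers.Dense(32, activation="sigmoid")(w)                       # S15
    w = layers.Reshape((1, 1, 1, 32))(w)
    s16 = layers.Multiply()([s6, w])                                    # S16
    return layers.Add()([s16, s11])                                     # S17
```

Block3 and Block4 (Figs. 4 and 5) repeat this pattern at 64 and 128 channels with 5×5×5 and 7×7×7 kernels, respectively.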
Step S222: inputting the standard convolution module feature map into a self-attention module Block3 for feature extraction to obtain a second scale feature map T17, which specifically comprises the following steps:
inputting the standard convolution module feature map into the second element-by-element convolution layer for convolution to obtain a feature map T1; the second element-by-element convolution layer has 64 convolution kernels of size 1×1×1 with stride 2; the feature map T1 is 32×32×2 with 64 channels; inputting T1 into the eighth normalized activation layer for normalization and activation to obtain a feature map T2, which is 32×32×2 with 64 channels; inputting T2 into the second transposed convolution layer for transposed convolution to obtain a feature map T3; the second transposed convolution layer has 32 convolution kernels of size 5×5×5 with stride 2; the feature map T3 is 64×64×4 with 32 channels; inputting T3 into the ninth normalized activation layer for normalization and activation to obtain a feature map T4, which is 64×64×4 with 32 channels; inputting T4 into the fourth standard convolution layer for convolution to obtain a feature map T5; the fourth standard convolution layer has 64 convolution kernels of size 5×5×5 with stride 2; the feature map T5 is 32×32×2 with 64 channels; inputting T5 into the tenth normalized activation layer for normalization and activation to obtain a feature map T6, which is 32×32×2 with 64 channels; inputting the standard convolution module feature map into the third depth separable convolution layer for convolution to obtain a feature map T7; the third depth separable convolution layer has 64 convolution kernels of size 5×5×5 with stride 1; the feature map T7 is 64×64×4 with 64 channels; inputting T7 into the eleventh normalized activation layer for normalization and activation to obtain a feature map T8, which is 64×64×4 with 64 channels; inputting T8 into the fourth depth separable convolution layer for convolution to obtain a feature map T9; the fourth depth separable convolution layer has 64 convolution kernels of size 5×5×5 with stride 1; the feature map T9 is 64×64×4 with 64 channels; inputting T9 into the twelfth normalized activation layer for normalization and activation to obtain a feature map T10, which is 64×64×4 with 64 channels; inputting T10 into the second max pooling layer (stride 2) to obtain a feature map T11, which is 32×32×2 with 64 channels; inputting T6 into the second global average pooling layer to obtain a feature map T12; inputting T12 into the second flattening layer to obtain a feature map T13, which is 1×1×1 with 64 channels; sequentially inputting T13 into the third fully connected layer and the third activation function layer (ReLU) to obtain a feature map T14, which is 1×1×1 with 32 channels; sequentially inputting T14 into the fourth fully connected layer and the fourth activation function layer (Sigmoid) to obtain a feature map T15, which is 1×1×1 with 64 channels; multiplying T6 and T15 element by element to obtain a feature map T16, which is 32×32×2 with 64 channels; adding T16 and T11 element by element to obtain the feature map T17, which is 32×32×2 with 64 channels and serves as the second scale feature map.
In this embodiment, the self-attention module Block3 includes: the second element-by-element convolution layer, the eighth normalized activation layer, the second transpose convolution layer, the ninth normalized activation layer, the fourth standard convolution layer, the tenth normalized activation layer, the third depth separable convolution layer, the eleventh normalized activation layer, the fourth depth separable convolution layer, the twelfth normalized activation layer, the second maximum pooling layer, the second global average pooling layer, the second flattening layer, the third fully-connected layer, the third activation function layer, the fourth fully-connected layer, the fourth activation function layer, as shown in fig. 4.
Step S223: inputting the standard convolution module feature map into a self-attention module Block4 for feature extraction to obtain a third scale feature map M17, which specifically comprises the following steps:
inputting the standard convolution module feature map into the third element-by-element convolution layer for convolution to obtain a feature map M1; the third element-by-element convolution layer has 128 convolution kernels of size 1×1×1 with stride 2; the feature map M1 is 32×32×2 with 128 channels; inputting M1 into the thirteenth normalized activation layer for normalization and activation to obtain a feature map M2, which is 32×32×2 with 128 channels; inputting M2 into the third transposed convolution layer for transposed convolution to obtain a feature map M3; the third transposed convolution layer has 64 convolution kernels of size 7×7×7 with stride 2; the feature map M3 is 64×64×4 with 64 channels; inputting M3 into the fourteenth normalized activation layer for normalization and activation to obtain a feature map M4, which is 64×64×4 with 64 channels; inputting M4 into the fifth standard convolution layer for convolution to obtain a feature map M5; the fifth standard convolution layer has 128 convolution kernels of size 7×7×7 with stride 2; the feature map M5 is 32×32×2 with 128 channels; inputting M5 into the fifteenth normalized activation layer for normalization and activation to obtain a feature map M6, which is 32×32×2 with 128 channels; inputting the standard convolution module feature map into the fifth depth separable convolution layer for convolution to obtain a feature map M7; the fifth depth separable convolution layer has 128 convolution kernels of size 7×7×7 with stride 1; the feature map M7 is 64×64×4 with 128 channels; inputting M7 into the sixteenth normalized activation layer for normalization and activation to obtain a feature map M8, which is 64×64×4 with 128 channels; inputting M8 into the sixth depth separable convolution layer for convolution to obtain a feature map M9; the sixth depth separable convolution layer has 128 convolution kernels of size 7×7×7 with stride 1; the feature map M9 is 64×64×4 with 128 channels; inputting M9 into the seventeenth normalized activation layer for normalization and activation to obtain a feature map M10, which is 64×64×4 with 128 channels; inputting M10 into the third max pooling layer (stride 2) to obtain a feature map M11, which is 32×32×2 with 128 channels; inputting M6 into the third global average pooling layer to obtain a feature map M12; inputting M12 into the third flattening layer to obtain a feature map M13, which is 1×1×1 with 128 channels; sequentially inputting M13 into the fifth fully connected layer and the fifth activation function layer (ReLU) to obtain a feature map M14, which is 1×1×1 with 64 channels; sequentially inputting M14 into the sixth fully connected layer and the sixth activation function layer (Sigmoid) to obtain a feature map M15, which is 1×1×1 with 128 channels; multiplying M6 and M15 element by element to obtain a feature map M16, which is 32×32×2 with 128 channels; adding M16 and M11 element by element to obtain the feature map M17, which is 32×32×2 with 128 channels and serves as the third scale feature map.
In this embodiment: the self-attention module Block4 includes: a third element-by-element convolution layer, a thirteenth normalized activation layer, a third transpose convolution layer, a fourteenth normalized activation layer, a fifth standard convolution layer, a fifteenth normalized activation layer, a fifth depth separable convolution layer, a sixteenth normalized activation layer, a sixth depth separable convolution layer, a seventeenth normalized activation layer, a third maximum pooling layer, a third global average pooling layer, a third flattening layer, a fifth fully-connected layer, a fifth activation function layer, a sixth fully-connected layer, and a sixth activation function layer, as shown in fig. 5.
Step S224: respectively inputting the feature map S17, the feature map T17 and the feature map M17 into the fourth element-by-element convolution layer, the fifth element-by-element convolution layer and the sixth element-by-element convolution layer for convolution, so as to obtain feature maps S18, T18 and M18. The fourth, fifth and sixth element-by-element convolution layers each have 64 convolution kernels of size 1×1×1 with stride 2.
Step S225: adding the feature map S18, the feature map T18 and the feature map M18 element by element to obtain the multi-scale module feature map. The feature maps S18, T18 and M18 are each 16×16×1 with 64 channels, and the multi-scale module feature map is 16×16×1 with 64 channels.
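In Keras terms, assuming TensorFlow (an illustrative sketch, with s17/t17/m17 standing for the three scale feature maps above):

```python
from tensorflow.keras import layers

def multiscale_fusion(s17, t17, m17):
    # steps S224-S225: project each scale to 64 channels (1x1x1 conv, stride 2),
    # then fuse by element-wise addition -> 16x16x1, 64 channels
    s18 = layers.Conv3D(64, 1, strides=2, padding="same")(s17)
    t18 = layers.Conv3D(64, 1, strides=2, padding="same")(t17)
    m18 = layers.Conv3D(64, 1, strides=2, padding="same")(m17)
    return layers.Add()([s18, t18, m18])
```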
Step S231: inputting the multi-scale module feature map into a second standard convolution module for feature extraction to obtain a feature map P4, wherein the feature map P4 specifically comprises:
inputting the multi-scale module feature map into the sixth standard convolution layer for convolution to obtain a feature map P1; the sixth standard convolution layer has 128 convolution kernels of size 3×3×3 with stride 2; the feature map P1 is 8×8×1 with 128 channels; inputting P1 into the eighteenth normalized activation layer for normalization and activation to obtain a feature map P2, which is 8×8×1 with 128 channels; inputting P2 into the seventh standard convolution layer for convolution to obtain a feature map P3; the seventh standard convolution layer has 256 convolution kernels of size 3×3×3 with stride 2; the feature map P3 is 4×4×1 with 256 channels; inputting P3 into the nineteenth normalized activation layer for normalization and activation to obtain a feature map P4, which is 4×4×1 with 256 channels.
In this embodiment, the second standard convolution module includes: a sixth standard convolutional layer, an eighteenth normalized active layer, a seventh standard convolutional layer, and a nineteenth normalized active layer, as shown in Block5 of fig. 2.
Step S232: inputting the feature map P4 into the seventh element-by-element convolution layer for convolution to obtain a feature map P5; the seventh element-by-element convolution layer has 1 convolution kernel of size 1×1×1 with stride 1; the feature map P5 is 4×4×1 with 1 channel.
Step S233: inputting the feature map P5 into a seventh activation function layer to obtain a feature map P6; the feature map P6 is 4×4×1 of 1 channel.
Step S234: multiplying the feature map P4 and the feature map P6 element by element to obtain the spatial attention feature map, which is 4×4×1 with 256 channels.
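Steps S232-S234 amount to a single-channel sigmoid gate over P4. A minimal sketch, again assuming TensorFlow/Keras:

```python
from tensorflow.keras import layers

def spatial_attention(p4):                   # p4: 4x4x1, 256 channels
    p5 = layers.Conv3D(1, 1, strides=1, padding="same")(p4)  # S232: one-channel map
    p6 = layers.Activation("sigmoid")(p5)                    # S233: weights in (0, 1)
    # S234: broadcast the spatial weight map over all 256 channels of P4
    return layers.Multiply()([p4, p6])
```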
Step S241: inputting the spatial attention feature map to a depth separable convolution module for feature extraction to obtain a feature map N4, wherein the method specifically comprises the following steps of:
inputting the spatial attention feature map into the seventh depth separable convolution layer for convolution to obtain a feature map N1; the seventh depth separable convolution layer has 512 convolution kernels of size 3×3×3 with stride 1; the feature map N1 is 4×4×1 with 512 channels; inputting N1 into the twentieth normalized activation layer for normalization and activation to obtain a feature map N2, which is 4×4×1 with 512 channels; inputting N2 into the eighth depth separable convolution layer for convolution to obtain a feature map N3; the eighth depth separable convolution layer has 1024 convolution kernels of size 3×3×3 with stride 1; the feature map N3 is 4×4×1 with 1024 channels; inputting N3 into the twenty-first normalized activation layer for normalization and activation to obtain a feature map N4, which is 4×4×1 with 1024 channels.
In this embodiment, the depth separable convolution module includes: a seventh depth separable convolution layer, a twentieth normalized activation layer, an eighth depth separable convolution layer, and a twenty-first normalized activation layer, as shown by Block6 in fig. 2.
Step S242: inputting the feature map N4 into the global average pooling module for pooling, which strengthens the correspondence between feature maps and categories, to obtain a feature vector V; the feature vector V is 1024-dimensional.
Step S243: inputting the obtained feature vector into a fully connected layer to obtain a score vector over the 6 categories; the Softmax classifier then normalizes this vector into a probability distribution whose entries lie in [0,1] and sum to 1, giving the classification result for each class.
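A minimal sketch of this classification head, assuming TensorFlow/Keras and the six categories of the embodiment:

```python
from tensorflow.keras import layers

def classification_head(n4, num_classes=6):    # n4: 4x4x1, 1024 channels
    v = layers.GlobalAveragePooling3D()(n4)    # step S242: 1024-d feature vector V
    # step S243: fully connected layer + softmax -> class probability distribution
    return layers.Dense(num_classes, activation="softmax")(v)
```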
The padding mode of all convolution layers is "same"; to prevent overfitting and improve the generalization capability of the model, L2 regularization is applied to all convolution layers.
In addition to standard convolutions, the network architecture uses depth separable convolutions, transposed convolutions, 1×1×1 element-by-element convolutions, and spatial and self-attention mechanisms. Depth separable convolution reduces the parameter count of the model while retaining relatively good feature extraction capability, so it can be widely used on mobile devices to improve model efficiency and speed. Transposed convolution converts a low-resolution feature map into a high-resolution one by applying padding and strides to the input feature map to expand its size. The 1×1×1 element-by-element convolution reduces the channel dimension of the feature map, adds nonlinear expression capability, and reduces the number of model parameters. The spatial attention mechanism dynamically allocates attention weights according to the feature map, enhancing important areas and suppressing the representation of unimportant ones. The self-attention mechanism captures global context dependencies across consecutive frames or feature maps, so the model can better learn the interrelationships among features, improving its modeling capacity and performance.
Step S3: inputting the test data set into the initial non-standard behavior classification model for testing, and outputting the final non-standard behavior classification model once the set conditions are met, wherein this step specifically comprises the following steps:
step S31: inputting the test data set into an initial non-canonical behavior classification model for testing, and calculating classification loss and classification accuracy, wherein the method specifically comprises the following steps of:
step S311: according toCalculating a classification loss; wherein Q is i Representing the true distribution, log, of the ith tag e Representing natural logarithm, P i Representing the prediction probability of the i-th class.
Step S312: calculating the classification accuracy according to the correct sample number N, which specifically comprises the following steps:
step S3121: and predicting each sample of the test data set to obtain a predicted class label PL.
Step S3122: the predicted class label PL is compared with the true class label TL and the correct number of samples N is statistically predicted.
Step S3123: calculating classification accuracy according to the correct sample number N, wherein the specific formula is as follows:
wherein N represents the number of samples predicted correctly and S represents the number of samples of the test dataset;
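The accuracy computation of steps S3121 to S3123 amounts to the following sketch (the label arrays are illustrative):

```python
import numpy as np

def classification_accuracy(PL, TL):
    """Accuracy = N / S for predicted labels PL against true labels TL."""
    N = int(np.sum(np.asarray(PL) == np.asarray(TL)))  # correctly predicted
    S = len(TL)                                        # test data set size
    return N / S

print(classification_accuracy([2, 0, 1, 1], [2, 0, 1, 3]))  # 0.75
```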
step S32: judging whether the iteration number is less than or equal to the set value of 800; if the iteration number is greater than the set value, returning to step S2; if the iteration number is less than or equal to the set value, continuing to judge whether the classification loss is less than or equal to the first threshold of 0.1; if the classification loss is greater than the first threshold, returning to step S2; if the classification loss is less than or equal to the first threshold, continuing to judge whether the classification accuracy is greater than or equal to the second threshold of 98.5%; if the classification accuracy is less than the second threshold, returning to step S2; if the classification accuracy is greater than or equal to the second threshold, all the set conditions are met, and the final non-standard behavior classification model is output.
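The three-stage check of step S32 can be summarized as the control-flow sketch below; the threshold values come from the text, while the function name is an assumption. Note that, following the text literally, an iteration count above the set value also returns to step S2:

```python
def all_conditions_met(iteration, loss, accuracy,
                       max_iter=800, loss_thr=0.1, acc_thr=0.985):
    """True when the final non-standard behavior classification model can be
    output; False means training returns to step S2."""
    if iteration > max_iter:
        return False   # iteration number exceeds the set value: back to S2
    if loss > loss_thr:
        return False   # classification loss above the first threshold
    if accuracy < acc_thr:
        return False   # classification accuracy below the second threshold
    return True        # all set conditions are met
```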
Step S4: inputting the collected video and/or images of tourist behavior into the final non-standard behavior classification model for prediction, generating a behavior recognition result, and judging which type of non-standard behavior the tourist has committed, so that corresponding reminding measures can subsequently be taken according to the recognition result; personalized reminders can be provided for each tourist, and different reminding modes and intervention measures can be adopted for non-standard behaviors of different levels or types. For slight irregular behaviors, such as throwing garbage, picking flowers and fruits, or scrawling graffiti, a personalized notification can be sent through a mobile phone application to remind the tourist to behave properly. For serious irregular behaviors, such as smoking in non-smoking areas, climbing trees or historical relics, or damaging service facilities and equipment, stricter measures can be taken, such as broadcasting a notice through the scenic area, to draw the tourists' attention and alert them.
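A tiered reminder policy of this kind could be configured as a simple lookup table; the six class names and the notification channels below are hypothetical placeholders, not labels fixed by the patent:

```python
# Hypothetical class names and channels; the patent fixes six categories but
# not their labels or the exact notification mechanism.
INTERVENTIONS = {
    "littering":           ("slight",  "mobile app notification"),
    "picking_flowers":     ("slight",  "mobile app notification"),
    "graffiti":            ("slight",  "mobile app notification"),
    "smoking":             ("serious", "scenic-area broadcast"),
    "climbing_relics":     ("serious", "scenic-area broadcast"),
    "damaging_facilities": ("serious", "scenic-area broadcast"),
}

def remind(predicted_class: str) -> str:
    level, channel = INTERVENTIONS[predicted_class]
    return f"{level} violation '{predicted_class}': remind via {channel}"

print(remind("smoking"))  # serious violation 'smoking': remind via scenic-area broadcast
```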
The invention can also establish a real-time monitoring system to continuously identify and analyze tourist behaviors; once a serious irregular behavior is detected, the system immediately triggers an alarm mechanism to notify relevant staff to intervene. This helps to stop irregular behavior in time and ensures scenic-area order and tourist safety.
The behavior recognition results can also be counted and analyzed to collect information such as the frequency, location and time of non-standard behaviors. Based on these data, more effective management strategies and measures can be formulated, such as optimizing the scenic-area layout, adding warning signs, and strengthening patrol supervision, so as to reduce irregular behaviors.
For user feedback and educational guidance, the data collected by the behavior recognition system can be used to provide personalized feedback and guidance to guests. Through a mobile phone application or other channels, the recognition results and related suggestions on standard behavior are shown to tourists, raising their awareness and civic literacy.
Example 2
As shown in fig. 6, the present invention further provides a big data driven behavior intelligent analysis system, comprising:
a data set construction module 10 for constructing a data set based on the non-normative behavior data of tourists in the tourist attraction; the data set comprises photos and/or videos marked with nonstandard behavior categories; the data set is divided into a training data set and a test data set according to a certain proportion.
The network training module 20 is configured to input the training data set into a multi-scale attention mechanism network for training, so as to obtain an initial non-canonical behavior classification model; the multi-scale attention mechanism network comprises two standard convolution modules, a depth separable convolution module, three self-attention mechanism modules, a space attention mechanism module, a global average pooling module and a Softmax classifier; the two standard convolution modules are a first standard convolution module and a second standard convolution module respectively.
The network training module specifically comprises:
the data enhancer module 201 is configured to enhance data of an image in the training data set, and input the enhanced image to the first standard convolution module for training, so as to obtain a feature map of the standard convolution module.
The multi-scale feature fusion sub-module 202 is configured to input the standard convolution module feature map to the three self-attention modules respectively for feature fusion, so as to obtain a multi-scale module feature map.
The spatial attention sub-module 203 is configured to sequentially input the multi-scale module feature map to the second standard convolution module and the spatial attention mechanism module for deep feature extraction, so as to obtain a spatial attention feature map.
The classification sub-module 204 is configured to sequentially input the spatial attention feature map to the depth separable convolution module, the global average pooling module, and the Softmax classifier for classification, so as to obtain the initial non-standard behavior classification model.
The test module 30 is configured to input the test data set into the initial non-standard behavior classification model for testing, and to output the final non-standard behavior classification model once the set conditions are met.
The behavior recognition module 40 is configured to input the collected video and/or image of the guest behavior into an irregular behavior final classification model for prediction, and generate a behavior recognition result.
As an embodiment, the data enhancer module 201 of the present invention specifically includes:
The data enhancement strategy operation unit is used for carrying out data enhancement on the images in the training data set using at least one of the following strategies: random rotation, random horizontal translation, random vertical translation, random scaling, and random horizontal flipping.
The first standard convolution layer operation unit is used for inputting the enhanced image into the first standard convolution layer to carry out convolution operation to obtain a feature map F1.
The first normalized activation layer operation unit is used for inputting the feature map F1 into the first normalized activation layer to perform normalization and activation operations to obtain a feature map F2.
The second standard convolution layer operation unit is used for inputting the feature map F2 into the second standard convolution layer to carry out convolution operation to obtain a feature map F3.
The second normalized activation layer operation unit is used for inputting the feature map F3 into the second normalized activation layer to perform normalization and activation operations to obtain the standard convolution module feature map.
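Taken together, the data enhancer module 201 could be sketched as follows; the augmentation parameter ranges and the filter count are assumptions, not values fixed by the patent:

```python
import tensorflow as tf
from tensorflow.keras import layers

# Data enhancement strategies of unit 201; the parameter ranges are assumptions
augment = tf.keras.Sequential([
    layers.RandomRotation(0.05),         # random rotation (about +/-18 degrees)
    layers.RandomTranslation(0.1, 0.1),  # random vertical/horizontal translation
    layers.RandomZoom(0.1),              # random scaling
    layers.RandomFlip("horizontal"),     # random horizontal flipping
])

def first_standard_conv_module(x, filters=64):
    """Sketch of the first standard convolution module; the filter count is
    an assumption."""
    # First standard convolution layer -> F1
    x = layers.Conv2D(filters, 3, padding='same')(x)
    # First normalized activation layer -> F2
    x = layers.ReLU()(layers.BatchNormalization()(x))
    # Second standard convolution layer -> F3
    x = layers.Conv2D(filters, 3, padding='same')(x)
    # Second normalized activation layer -> standard convolution module feature map
    return layers.ReLU()(layers.BatchNormalization()(x))
```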
As an embodiment, the multi-scale feature fusion submodule 202 according to the present invention specifically includes:
the first scale feature extraction unit is configured to input the standard convolution module feature map to the self-attention module Block2 for feature extraction, and obtain a first scale feature map S17.
The second scale feature extraction unit is used for inputting the standard convolution module feature map into the self-attention module Block3 for feature extraction to obtain a second scale feature map T17.
The third scale feature extraction unit is used for inputting the standard convolution module feature map into the self-attention module Block4 for feature extraction to obtain a third scale feature map M17.
The element-by-element convolution operation unit is configured to input the feature map S17, the feature map T17, and the feature map M17 to the fourth element-by-element convolution layer, the fifth element-by-element convolution layer, and the sixth element-by-element convolution layer, respectively, to perform convolution operations, so as to obtain feature maps S18, T18, and M18.
The element-by-element adding unit is used for adding the feature map S18, the feature map T18 and the feature map M18 element by element to obtain the multi-scale module feature map.
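A sketch of this multi-scale fusion (three element-by-element 1×1 convolutions followed by element-wise addition); the common channel count is an assumption, chosen so the three maps can be added:

```python
from tensorflow.keras import layers

def multi_scale_fusion(s17, t17, m17, channels=256):
    """Sketch of sub-module 202; assumes the three scale feature maps share
    spatial dimensions, and the common channel count is an assumption."""
    # Fourth/fifth/sixth element-by-element (1x1) convolution layers
    s18 = layers.Conv2D(channels, 1, padding='same')(s17)
    t18 = layers.Conv2D(channels, 1, padding='same')(t17)
    m18 = layers.Conv2D(channels, 1, padding='same')(m17)
    # Element-by-element addition yields the multi-scale module feature map
    return layers.Add()([s18, t18, m18])
```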
As an embodiment, the spatial attention sub-module 203 of the present invention specifically includes:
the second standard convolution module operation unit is used for inputting the multi-scale module feature map into the second standard convolution module to perform feature extraction, and obtaining a feature map P4.
The seventh element-by-element convolution layer operation unit is used for inputting the feature map P4 into the seventh element-by-element convolution layer to perform convolution operation to obtain a feature map P5.
The seventh activation function layer operation unit is used for inputting the feature map P5 into the seventh activation function layer to obtain a feature map P6.
The element-by-element multiplication unit is used for multiplying the feature map P4 and the feature map P6 element by element to obtain the spatial attention feature map.
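A sketch of the spatial attention sub-module 203; the use of a sigmoid as the seventh activation function layer is an assumption (any activation producing weights in [0, 1] would fit the element-by-element multiplication):

```python
from tensorflow.keras import layers

def spatial_attention(p4):
    """Sketch of sub-module 203."""
    # Seventh element-by-element convolution layer: 1x1 conv to one channel -> P5
    p5 = layers.Conv2D(1, 1, padding='same')(p4)
    # Seventh activation function layer (sigmoid assumed) -> attention map P6
    p6 = layers.Activation('sigmoid')(p5)
    # Element-by-element multiplication re-weights P4 by spatial importance
    return layers.Multiply()([p4, p6])
```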
As an embodiment, the classifying sub-module 204 of the present invention specifically includes:
and the depth separable convolution module feature extraction unit inputs the spatial attention feature map to the depth separable convolution module for feature extraction to obtain a feature map N4.
And the global average pooling module pooling operation unit inputs the feature map N4 to the global average pooling module for pooling operation to obtain a feature vector V.
And the Softmax classifying unit inputs the feature vector V into a Softmax classifier to classify, so as to obtain probability distribution of each category.
And the initial classification model output unit is used for selecting the class with the highest probability as a final classification result to obtain the non-canonical behavior initial classification model.
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. The big data driven behavior intelligent analysis method is characterized by comprising the following steps:
step S1: constructing a data set based on the non-canonical behavior data of tourists in the tourist attraction; the data set comprises photos and/or videos marked with nonstandard behavior categories; dividing the data set into a training data set and a test data set according to a certain proportion;
step S2: inputting the training data set into a multi-scale attention mechanism network for training to obtain an initial non-canonical behavior classification model; the multi-scale attention mechanism network comprises two standard convolution modules, a depth separable convolution module, three self-attention mechanism modules, a space attention mechanism module, a global average pooling module and a Softmax classifier;
the two standard convolution modules are a first standard convolution module and a second standard convolution module respectively;
the step S2 specifically comprises the following steps:
step S21: carrying out data enhancement on the images in the training data set, and inputting the enhanced images into the first standard convolution module for training to obtain a standard convolution module feature map;
step S22: respectively inputting the standard convolution module feature images into three self-attention modules to perform feature fusion to obtain a multi-scale module feature image;
Step S23: sequentially inputting the multi-scale module feature map to the second standard convolution module and the spatial attention mechanism module to extract deep features so as to obtain a spatial attention feature map;
step S24: sequentially inputting the spatial attention feature map to the depth separable convolution module, the global average pooling module and the Softmax classifier for classification to obtain the non-canonical behavior initial classification model;
step S3: inputting the test data set into the initial non-standard behavior classification model for testing, and outputting a final non-standard behavior classification model until the set condition is met;
step S4: and inputting the acquired video and/or image of the guest behavior into the non-canonical behavior final classification model to predict, and generating a behavior recognition result.
2. The big data driven behavior intelligent analysis method according to claim 1, wherein the step of performing data enhancement on the images in the training data set, inputting the enhanced images to the first standard convolution module for training, and obtaining a standard convolution module feature map specifically includes:
the first standard convolution module comprises a first standard convolution layer, a first normalized activation layer, a second standard convolution layer and a second normalized activation layer;
Step S211: performing data enhancement on the images in the training data set using at least one of the following strategies: random rotation, random horizontal translation, random vertical translation, random scaling, and random horizontal flipping;
step S212: inputting the enhanced image into the first standard convolution layer to carry out convolution operation to obtain a feature map F1;
step S213: inputting the feature map F1 into the first normalized activation layer to perform normalization and activation operations to obtain a feature map F2;
step S214: inputting the characteristic diagram F2 into the second standard convolution layer to carry out convolution operation to obtain a characteristic diagram F3;
step S215: and inputting the feature map F3 into the second normalized activation layer to perform normalization and activation operations to obtain a standard convolution module feature map.
3. The big data driven behavior intelligent analysis method according to claim 1, wherein the step of respectively inputting the standard convolution module feature map to three self-attention modules for feature fusion to obtain a multi-scale module feature map specifically comprises the following steps:
step S221: inputting the standard convolution module feature map into a self-attention module Block2 for feature extraction to obtain a first scale feature map S17;
Step S222: inputting the standard convolution module feature map into a self-attention module Block3 for feature extraction to obtain a second scale feature map T17;
step S223: inputting the standard convolution module feature map into a self-attention module Block4 for feature extraction to obtain a third scale feature map M17;
step S224: the feature map S17, the feature map T17 and the feature map M17 are respectively input into a fourth element-by-element convolution layer, a fifth element-by-element convolution layer and a sixth element-by-element convolution layer to carry out convolution operation, so that feature maps S18, T18 and M18 are obtained;
step S225: and adding the feature map S18, the feature map T18 and the feature map M18 element by element to obtain the multi-scale module feature map.
4. The big data driven behavior intelligent analysis method according to claim 1, wherein the step of sequentially inputting the multi-scale module feature map to the second standard convolution module and the spatial attention mechanism module for deep feature extraction to obtain a spatial attention feature map specifically includes:
the spatial attention mechanism module comprises a seventh element-by-element convolution layer and a seventh activation function layer;
step S231: inputting the multi-scale module feature map into the second standard convolution module to perform feature extraction to obtain a feature map P4;
Step S232: inputting the characteristic map P4 into the seventh element-by-element convolution layer to carry out convolution operation to obtain a characteristic map P5;
step S233: inputting the feature map P5 to the seventh activation function layer to obtain a feature map P6;
step S234: and multiplying the feature map P4 and the feature map P6 element by element to obtain the spatial attention feature map.
5. The big data driven behavior intelligent analysis method according to claim 1, wherein the step of inputting the spatial attention feature map to the depth separable convolution module, the global average pooling module and the Softmax classifier in sequence for classification to obtain the non-canonical behavior initial classification model specifically comprises the steps of:
step S241: inputting the spatial attention feature map to the depth separable convolution module for feature extraction to obtain a feature map N4;
step S242: inputting the feature map N4 into the global average pooling module for pooling operation to obtain a feature vector V;
step S243: inputting the feature vector V into a Softmax classifier to classify, so as to obtain probability distribution of each class;
step S244: and selecting the category with the highest probability as a final classification result to obtain an initial classification model of the nonstandard behavior.
6. The big data driven behavior intelligent analysis method according to claim 1, wherein the step of inputting the test data set into the initial non-standard behavior classification model for testing until the set conditions are satisfied, and outputting the final non-standard behavior classification model, specifically comprises:
step S31: inputting the test data set into the initial non-canonical behavior classification model for testing, and calculating classification loss and classification accuracy;
step S32: judging whether the iteration times are smaller than or equal to a set value; if the iteration times are greater than the set value, returning to the step S2; if the iteration times are smaller than or equal to the set value, continuing to judge whether the classification loss is smaller than or equal to a first threshold value; returning to step S2 if the classification loss is greater than the first threshold; if the classification loss is less than or equal to the first threshold, continuing to determine whether the classification accuracy is greater than or equal to a second threshold; if the classification accuracy is less than the second threshold, returning to step S2; and if the classification accuracy is greater than or equal to the second threshold value and all the set conditions are met, outputting the non-canonical behavior final classification model.
7. A big data driven behavioral intelligent analysis system, the system comprising:
the data set construction module is used for constructing a data set based on the non-standard behavior data of tourists in the tourist attraction; the data set comprises photos and/or videos marked with nonstandard behavior categories; dividing the data set into a training data set and a test data set according to a certain proportion;
the network training module is used for inputting the training data set into a multi-scale attention mechanism network for training to obtain an initial non-standard behavior classification model; the multi-scale attention mechanism network comprises two standard convolution modules, a depth separable convolution module, three self-attention mechanism modules, a space attention mechanism module, a global average pooling module and a Softmax classifier; the two standard convolution modules are a first standard convolution module and a second standard convolution module respectively;
the network training module specifically comprises:
the data enhancer module is used for enhancing the data of the images in the training data set, inputting the enhanced images into the first standard convolution module for training, and obtaining a standard convolution module feature diagram;
the multi-scale feature fusion sub-module is used for respectively inputting the standard convolution module feature map into the three self-attention modules to perform feature fusion to obtain a multi-scale module feature map;
the spatial attention sub-module is used for sequentially inputting the multi-scale module feature map to the second standard convolution module and the spatial attention mechanism module to extract deep features so as to obtain a spatial attention feature map;
the classification sub-module is used for inputting the spatial attention feature map into the depth separable convolution module, the global average pooling module and the Softmax classifier in sequence for classification to obtain the non-canonical behavior initial classification model;
the test module is used for inputting the test data set into the non-standard behavior initial classification model for testing, and outputting a non-standard behavior final classification model when the set condition is met;
the behavior recognition module is used for inputting the collected video and/or image of the guest behavior into the non-standard behavior final classification model to predict, and generating a behavior recognition result.
8. The big data driven behavioral intelligent analysis system of claim 7, wherein the data enhancer module comprises:
The data enhancement strategy operation unit is used for carrying out data enhancement on the images in the training data set using at least one of the following strategies: random rotation, random horizontal translation, random vertical translation, random scaling, and random horizontal flipping;
the first standard convolution layer operation unit is used for inputting the enhanced image into the first standard convolution layer to carry out convolution operation to obtain a feature map F1;
the first normalized activation layer operation unit is used for inputting the feature map F1 into the first normalized activation layer to perform normalization and activation operations to obtain a feature map F2;
the second standard convolution layer operation unit is used for inputting the characteristic diagram F2 into the second standard convolution layer to carry out convolution operation to obtain a characteristic diagram F3;
and the second normalized activation layer operation unit is used for inputting the feature map F3 into the second normalized activation layer to perform normalization and activation operations, so as to obtain a standard convolution module feature map.
9. The big data driven behavioral intelligent analysis system of claim 7, wherein the multi-scale feature fusion sub-module specifically comprises:
the first scale feature extraction unit is used for inputting the standard convolution module feature map into the self-attention module Block2 for feature extraction to obtain a first scale feature map S17;
The second scale feature extraction unit is used for inputting the standard convolution module feature diagram into the self-attention module Block3 for feature extraction to obtain a second scale feature diagram T17;
the third scale feature extraction unit is used for inputting the standard convolution module feature map into the self-attention module Block4 for feature extraction to obtain a third scale feature map M17;
the element-by-element convolution operation unit is used for respectively inputting the feature map S17, the feature map T17 and the feature map M17 into a fourth element-by-element convolution layer, a fifth element-by-element convolution layer and a sixth element-by-element convolution layer to carry out convolution operation to obtain feature maps S18, T18 and M18;
and the element-by-element adding unit is used for adding the feature map S18, the feature map T18 and the feature map M18 element by element to obtain the multi-scale module feature map.
10. The big data driven behavioral intelligent analysis system of claim 7, wherein the spatial attention sub-module specifically comprises:
the second standard convolution module operation unit is used for inputting the multi-scale module feature map into the second standard convolution module to perform feature extraction to obtain a feature map P4;
A seventh element-by-element convolution layer operation unit, configured to input the feature map P4 to the seventh element-by-element convolution layer to perform convolution operation, so as to obtain a feature map P5;
a seventh activation function layer operation unit, configured to input the feature map P5 to the seventh activation function layer, to obtain a feature map P6;
and the element-by-element multiplication unit is used for multiplying the feature map P4 and the feature map P6 element by element to obtain the spatial attention feature map.
CN202310692726.5A 2023-06-13 2023-06-13 Big data driven behavior intelligent analysis method and system Active CN116758631B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310692726.5A CN116758631B (en) 2023-06-13 2023-06-13 Big data driven behavior intelligent analysis method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310692726.5A CN116758631B (en) 2023-06-13 2023-06-13 Big data driven behavior intelligent analysis method and system

Publications (2)

Publication Number Publication Date
CN116758631A true CN116758631A (en) 2023-09-15
CN116758631B CN116758631B (en) 2023-12-22

Family

ID=87950785

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310692726.5A Active CN116758631B (en) 2023-06-13 2023-06-13 Big data driven behavior intelligent analysis method and system

Country Status (1)

Country Link
CN (1) CN116758631B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108985317A (en) * 2018-05-25 2018-12-11 西安电子科技大学 A kind of image classification method based on separable convolution sum attention mechanism
CN110059582A (en) * 2019-03-28 2019-07-26 东南大学 Driving behavior recognition methods based on multiple dimensioned attention convolutional neural networks
CN110533084A (en) * 2019-08-12 2019-12-03 长安大学 A kind of multiscale target detection method based on from attention mechanism
JP6830707B1 (en) * 2020-01-23 2021-02-17 同▲済▼大学 Person re-identification method that combines random batch mask and multi-scale expression learning
AU2020103901A4 (en) * 2020-12-04 2021-02-11 Chongqing Normal University Image Semantic Segmentation Method Based on Deep Full Convolutional Network and Conditional Random Field
CN114895275A (en) * 2022-05-20 2022-08-12 中国人民解放军国防科技大学 Radar micro-motion gesture recognition method based on efficient multi-dimensional attention neural network
CN115908354A (en) * 2022-12-05 2023-04-04 上海派影医疗科技有限公司 Photovoltaic panel defect detection method based on double-scale strategy and improved YOLOV5 network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Gaojie Lin, Sanyuan Zhao, Jianbing Shen: "Video person re-identification with global statistic pooling and self-attention distillation", Neurocomputing, p. 777 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117252488A (en) * 2023-11-16 2023-12-19 国网吉林省电力有限公司经济技术研究院 Industrial cluster energy efficiency optimization method and system based on big data
CN117252488B (en) * 2023-11-16 2024-02-09 国网吉林省电力有限公司经济技术研究院 Industrial cluster energy efficiency optimization method and system based on big data

Also Published As

Publication number Publication date
CN116758631B (en) 2023-12-22

Similar Documents

Publication Publication Date Title
CN108564097B (en) Multi-scale target detection method based on deep convolutional neural network
Adarsh et al. YOLO v3-Tiny: Object Detection and Recognition using one stage improved model
CN106803071B (en) Method and device for detecting object in image
CN111126258B (en) Image recognition method and related device
CN109508360B (en) Geographical multivariate stream data space-time autocorrelation analysis method based on cellular automaton
CN112131978B (en) Video classification method and device, electronic equipment and storage medium
CN110929577A (en) Improved target identification method based on YOLOv3 lightweight framework
CN111242208A (en) Point cloud classification method, point cloud segmentation method and related equipment
CN104616316B (en) Personage's Activity recognition method based on threshold matrix and Fusion Features vision word
CN104572804A (en) Video object retrieval system and method
CN111291819A (en) Image recognition method and device, electronic equipment and storage medium
CN116758631B (en) Big data driven behavior intelligent analysis method and system
CN109919223B (en) Target detection method and device based on deep neural network
CN110728295A (en) Semi-supervised landform classification model training and landform graph construction method
CN113807399A (en) Neural network training method, neural network detection method and neural network detection device
CN111738044A (en) Campus violence assessment method based on deep learning behavior recognition
CN113537180B (en) Tree obstacle identification method and device, computer equipment and storage medium
Wan et al. Mixed local channel attention for object detection
CN116052082A (en) Power distribution station room anomaly detection method and device based on deep learning algorithm
Wang et al. Video-based air quality measurement with dual-channel 3-D convolutional network
Wang et al. Based on the improved YOLOV3 small target detection algorithm
CN113065379B (en) Image detection method and device integrating image quality and electronic equipment
Srijakkot et al. The comparison of Faster R-CNN and Atrous Faster R-CNN in different distance and light condition
CN110929726B (en) Railway contact network support number plate identification method and system
Lestari et al. Comparison of two deep learning methods for detecting fire

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
Effective date of registration: 20231201
Address after: Room 02, 7th Floor, Building C # (3), Xiaoshan Science and Technology Innovation Center, No. 618 Boxue Road, Beigan Street, Xiaoshan District, Hangzhou City, Zhejiang Province, 311202
Applicant after: Hangzhou Chasing Video Technology Co.,Ltd.
Address before: 530000 Guangxi Zhuang Autonomous Region Nanning Zhongguancun Electronic Information Industrial Park at the intersection of Lucun Ling Road and Jingsan Road in Xixiangtang District, Nanning City
Applicant before: Meng Guanyu
GR01 Patent grant
GR01 Patent grant