CN114550109B - Pedestrian flow detection method and system - Google Patents

Pedestrian flow detection method and system

Info

Publication number
CN114550109B
CN114550109B
Authority
CN
China
Prior art keywords
image set
training
image
detection model
amplified
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210454852.2A
Other languages
Chinese (zh)
Other versions
CN114550109A (en)
Inventor
Li Jinze
Zhao Zhengjie
Zhang Shu
Zhang Ning
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Microelectronics of CAS
Original Assignee
Institute of Microelectronics of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Microelectronics of CAS filed Critical Institute of Microelectronics of CAS
Priority to CN202210454852.2A priority Critical patent/CN114550109B/en
Publication of CN114550109A publication Critical patent/CN114550109A/en
Application granted granted Critical
Publication of CN114550109B publication Critical patent/CN114550109B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent

Abstract

The invention relates to a pedestrian flow detection method and system, belongs to the technical field of image preprocessing and recognition, and solves the problems that the training time of existing self-attention models exceeds the expected time and that a fully connected layer performs poorly on two-class classification. The method comprises the following steps: acquiring a pre-training image set, a training image set and a verification image set, and amplifying the pre-training image set and the training image set by data augmentation; constructing a detection model comprising a self-attention module and a support vector machine that replaces the fully connected layer; pre-training the detection model with the pre-training image set; formally training the pre-trained detection model with the training image set and then verifying with the verification image set to generate a trained detection model; and acquiring an image to be detected and sending it to the trained detection model to obtain a recognition result. Pre-training on the pre-training set significantly reduces training time, and the support vector machine improves two-class classification performance.

Description

Pedestrian flow detection method and system
Technical Field
The invention relates to the technical field of image preprocessing and recognition, in particular to a pedestrian flow detection method and system.
Background
In recent years, with the rapid development of and breakthroughs in deep learning, deep neural networks have achieved good results in the vision field: recognition accuracy keeps rising and new recognition methods keep emerging, most of them built on CNN structures. Although CNNs offer good recognition accuracy, their network structures have become increasingly complex, the number of layers has grown from tens to hundreds, and training difficulty and computation have risen accordingly. Parameter counts in the hundreds of billions severely limit the deployment of such models on embedded systems or mobile terminals.
The self-attention mechanism is another method for extracting image features. Unlike a CNN, which obtains the global receptive field of an image through sliding convolutions and stacked layers, the self-attention mechanism simplifies the model using two encoders and directly extracts globally correlated features based on query, key, and value. However, self-attention networks have two problems. First, training takes too long: according to Google's published figures, training a self-attention model takes about 3 days (using 24 TPUs), which clearly far exceeds the expected time. Second, self-attention models usually target multi-class problems; for a two-class problem, simply forcing the output of the last fully connected layer down to 2 outputs does not achieve the desired effect.
Disclosure of Invention
In view of the foregoing analysis, embodiments of the present invention provide a pedestrian traffic detection method and system to solve the problems that the training time of existing self-attention models exceeds the expected time and that two-class classification with a fully connected layer performs poorly.
In one aspect, an embodiment of the present invention provides a pedestrian traffic detection method, including: acquiring a pre-training image set, a training image set and a verification image set, wherein the pre-training image set and the training image set are amplified in a data amplification mode; constructing a detection model, wherein the detection model comprises a self-attention module and a support vector machine replacing a full connection layer; pre-training the detection model by using the amplified pre-training image set; performing formal training on the pre-trained detection model by using the amplified training image set, and then performing verification by using the verification image set to generate a trained detection model; and acquiring an image to be detected, and sending the image to be detected to the trained detection model to acquire a recognition result.
The beneficial effects of the above technical scheme are as follows: pre-training on the pre-training data set before training on the training data set can significantly reduce the training time required by the self-attention model without reducing its performance, a great improvement over conventional training methods. In addition, replacing the last fully connected layer with a support vector machine greatly improves two-class classification performance, further reduces training time and model scale, and preserves accuracy.
In a further improvement of the method, the pre-training image set consists of single-pedestrian images and the training image set consists of crowd images, where the resolution of the training image set is higher than that of the pre-training image set.
In a further improvement of the method, augmenting the pre-training image set and the training image set using data augmentation comprises: amplifying both sets with a mirror augmentation matrix so that the same target can be recognized at different angles and in different directions in the images; and amplifying both sets with a scaling augmentation matrix so that persons at different distances and scenes of different sizes can be recognized in the images.
In a further improvement of the method, the detection model further comprises a preprocessing module for preprocessing the images in the amplified image set to strengthen the edge features of targets in the images; the self-attention module performs feature extraction on the preprocessed images; and the support vector machine performs two-class classification on the image features extracted by the self-attention module, where the amplified image set is the amplified pre-training image set during pre-training, or the amplified training image set during formal training.
In a further improvement of the method, preprocessing the images in the amplified image set using the preprocessing module comprises: extracting features of the longitudinal texture of the images with the following first feature matrix: K1 = [1, 2, 1; 0, 0, 0; -1, -2, -1]; and extracting features of the transverse texture of the images with the following second feature matrix: K2 = [1, 0, -1; 2, 0, -2; 1, 0, -1].
In a further improvement of the method, performing feature extraction on the preprocessed image using the self-attention module comprises: segmenting each image in the amplified image set into a plurality of patches using the preprocessing module and flattening each patch to generate the normalized picture feature z_0:

z_0 = [x_class; x_p^1·E; x_p^2·E; …; x_p^N·E] + E_pos

where x_class is a binary judgment token added before the first dimension of the picture to judge whether a pedestrian is present; x_p^1, x_p^2, …, x_p^N are the patches into which each picture is divided, each of size 16×16; E is the embedding matrix of shape (P^2·C)×D; and E_pos is the position coding vector, which position-codes the divided picture. Multi-head self-attention calculation with a residual network is then performed on the normalized picture features using a multi-head self-attention module to obtain the feature calculation result; and the feature calculation result is classified with a multilayer perceptron, from which the coordinates of the target to be detected are extracted.
In a further improvement of the method, performing two-class classification on the image features extracted by the self-attention module using the support vector machine comprises: mapping the processed picture features into a high-dimensional space with an RBF kernel:

K(z_m, z_n) = exp(-||z_m - z_n||^2 / (2σ^2))

and detecting the samples in the high-dimensional space to obtain a linear interface by solving:

min_{ω,b} (1/2)||ω||^2 + F·Σ_i ζ_i
s.t. y_i·(ω^T·x_i + b) ≥ 1 - ζ_i, ζ_i ≥ 0

where z_m, z_n are image features extracted by the self-attention module, σ is the standard deviation of the extracted image features z_m, z_n, ω and b are the coefficients of the linear interface, ω is a high-dimensional parameter matrix that depends on the image features, ω^T is its transpose, ζ_i is the slack variable that realizes the soft margin, x_i and y_i are the embedded samples in the high-dimensional space and their labels, and F is a hyper-parameter of the support vector machine.
In a further improvement of the method, pre-training the detection model using the amplified pre-training image set comprises: performing forward propagation training of the preset model parameters on the detection model using the amplified pre-training image set and calculating the prediction result q, then calculating the cross entropy loss:

L(p, q) = -[p·log(q) + (1 - p)·log(1 - q)];

and performing back propagation training of the preset model parameters on the detection model using the cross entropy loss as the loss function.
In a further improvement of the method, acquiring the image to be detected and sending it to the trained detection model to obtain the recognition result comprises: acquiring the image to be detected by photographing, by taking video frames, and/or from the internet; and sending the image to be detected to the trained detection model so that the detection model recognizes it and obtains the recognition result.
In another aspect, an embodiment of the present invention provides a pedestrian traffic detection system, including: the image set generation module is used for acquiring a pre-training image set, a training image set and a verification image set, wherein the pre-training image set and the training image set are amplified in a data amplification mode; the detection model generation module comprises a self-attention module and a support vector machine replacing a full connection layer; the pre-training module is used for pre-training the detection model by using the amplified pre-training image set; the training module is used for carrying out formal training on the pre-trained detection model by using the amplified training image set and then carrying out verification by using the verification image set so as to generate a trained detection model; and the target identification module is used for acquiring an image to be detected and sending the image to be detected to the trained detection model to acquire an identification result.
Compared with the prior art, the invention can realize at least one of the following beneficial effects:
1. pre-training on the pre-training data set and then training on the training data set can significantly reduce the training time required by the self-attention model without degrading its performance, a great improvement over conventional training methods;
2. using the first feature matrix and the second feature matrix to extract the longitudinal and transverse textures of the images in the amplified image set benefits the subsequent classification judgment;
3. replacing the last fully connected layer with a support vector machine greatly improves two-class classification performance over the fully connected layer, further reduces training time and model scale, and preserves accuracy.
In the invention, the technical schemes can be combined with each other to realize more preferable combination schemes. Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and drawings.
Drawings
The drawings are only for purposes of illustrating particular embodiments and are not to be construed as limiting the invention, wherein like reference numerals are used to designate like parts throughout the drawings;
FIG. 1 is a flow chart of a pedestrian traffic detection method according to an embodiment of the present invention;
FIG. 2 is a block diagram of a detection model according to an embodiment of the invention;
FIG. 3 is a block diagram of a self-attention module according to an embodiment of the present invention;
FIG. 4 shows training results obtained by pre-training the detection model using the pre-training image set, according to an embodiment of the present invention;
FIG. 5 is a diagram of an image in an augmented training image set and a pre-processed image obtained after pre-processing the image, according to an embodiment of the invention;
FIG. 6 is a diagram of actual test results according to an embodiment of the present invention;
FIG. 7 is a block diagram of a pedestrian traffic detection system according to an embodiment of the present invention.
Detailed Description
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate preferred embodiments of the invention and together with the description, serve to explain the principles of the invention and not to limit the scope of the invention.
A specific embodiment of the present invention discloses a pedestrian traffic detection method, as shown in fig. 1, the pedestrian traffic detection method includes: in step S102, a pre-training image set, a training image set, and a verification image set are obtained, wherein the pre-training image set and the training image set are amplified in a data amplification manner; in step S104, constructing a detection model, wherein the detection model comprises a self-attention module and a support vector machine replacing a full connection layer; in step S106, pre-training the detection model using the amplified pre-training image set; in step S108, performing formal training on the pre-trained detection model by using the amplified training image set, and then performing verification by using the verification image set to generate a trained detection model; in step S110, an image to be detected is obtained, and the image to be detected is sent to the trained detection model to obtain a recognition result.
Hereinafter, the respective steps of the pedestrian traffic detection method according to the embodiment of the present invention will be described in detail with reference to fig. 1 to 3.
In step S102, a pre-training image set, a training image set, and a verification image set are obtained, where the pre-training image set and the training image set are amplified using data augmentation. A pre-training dataset P and a dataset C are acquired, and dataset C is divided into a training set T and a validation set V with a ratio split_ratio = 0.7. The pre-training image set consists of single-pedestrian images; the training image set consists of crowd images, and the resolution of the training image set is higher than that of the pre-training image set. Specifically, amplifying the pre-training image set and the training image set using data augmentation comprises: amplifying both sets with a mirror augmentation matrix so that the same target can be recognized at different angles and in different directions in the images; and amplifying both sets with a scaling augmentation matrix so that persons at different distances and scenes of different sizes can be recognized in the images.
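As a minimal PyTorch sketch of this step (assuming torchvision and scikit-learn are available; the flip probability, the 0.8 to 1.2 scale range, and the placeholder dataset indices are illustrative choices, not values from the patent):

```python
from torchvision import transforms
from sklearn.model_selection import train_test_split

# Mirror augmentation (the mirror matrix H1) and random scaling (the
# scaling matrix H2); RandomAffine draws its scale factor randomly,
# matching the randomly generated H2 parameters described below.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),                # mirror, H1
    transforms.RandomAffine(degrees=0, scale=(0.8, 1.2)),  # scaling, H2
    transforms.ToTensor(),
])

# Divide dataset C into training set T and validation set V, split_ratio = 0.7.
sample_ids = list(range(1000))  # placeholder indices standing in for dataset C
train_ids, val_ids = train_test_split(sample_ids, train_size=0.7)
```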
In step S104, a detection model is constructed, wherein the detection model includes a preprocessing module, a self-attention module, and a support vector machine instead of a fully connected layer. The preprocessing module is used for preprocessing the image so as to strengthen the edge characteristics of the target in the image. The self-attention module is used for extracting features of the preprocessed image. The support vector machine is used for carrying out secondary classification on the image features extracted from the attention module.
In step S106, the detection model is pre-trained using the amplified pre-training image set. Pre-training comprises: preprocessing the images in the amplified image set using the preprocessing module to strengthen the edge features of targets in the images; performing feature extraction on the preprocessed images using the self-attention module; and performing two-class classification on the extracted image features using the support vector machine (FIG. 4 shows the pre-training results), where the amplified image set here is the amplified pre-training image set.
Specifically, pre-training the detection model using the amplified pre-training image set comprises: performing forward propagation training of the preset model parameters on the detection model using the amplified pre-training image set and calculating the prediction result q, then calculating the cross entropy loss:

L(p, q) = -[p·log(q) + (1 - p)·log(1 - q)];

and performing back propagation training of the preset model parameters on the detection model using the cross entropy loss as the loss function.
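A sketch of one pre-training iteration under the scheme above; the stand-in model, learning rate, and dummy batch are assumptions for illustration, while the BCE loss matches the cross entropy L(p, q) given above:

```python
import torch
import torch.nn as nn

# Stand-in for the detection model (64x128 single-pedestrian inputs).
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 128, 1), nn.Sigmoid())
criterion = nn.BCELoss()  # L(p, q) = -[p*log(q) + (1-p)*log(1-q)]
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

images = torch.rand(8, 3, 64, 128)          # dummy batch from the amplified set
labels = torch.randint(0, 2, (8,)).float()  # ground truth p

q = model(images).squeeze(1)                # forward propagation: prediction q
loss = criterion(q, labels)                 # cross entropy loss
optimizer.zero_grad()
loss.backward()                             # back propagation of preset parameters
optimizer.step()
```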
In step S108, the pre-trained detection model is formally trained using the amplified training image set and then verified using the verification image set to generate the trained detection model; during formal training the amplified image set is the amplified training image set. Referring to FIG. 2, formal training likewise comprises preprocessing the images in the amplified image set using the preprocessing module to strengthen the edge features of targets in the images, performing feature extraction on the preprocessed images using the self-attention module, and performing two-class classification on the extracted image features using the support vector machine.
FIG. 5 shows an image before and after preprocessing. Specifically, preprocessing the images in the amplified image set using the preprocessing module comprises: extracting features of the longitudinal texture of the images with the following first feature matrix:

K1 = [1, 2, 1; 0, 0, 0; -1, -2, -1];

and extracting features of the transverse texture of the images with the following second feature matrix:

K2 = [1, 0, -1; 2, 0, -2; 1, 0, -1].
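K1 and K2 are the vertical and horizontal Sobel operators. A sketch of the preprocessing step with PyTorch, assuming a grayscale input; combining the two responses into a single edge-strength map is an assumption, since the patent specifies only the two kernels:

```python
import torch
import torch.nn.functional as F

# First feature matrix K1: longitudinal (vertical) texture.
K1 = torch.tensor([[ 1.,  2.,  1.],
                   [ 0.,  0.,  0.],
                   [-1., -2., -1.]]).view(1, 1, 3, 3)
# Second feature matrix K2: transverse (horizontal) texture.
K2 = torch.tensor([[1., 0., -1.],
                   [2., 0., -2.],
                   [1., 0., -1.]]).view(1, 1, 3, 3)

def strengthen_edges(gray):                # gray: (N, 1, H, W) tensor
    gy = F.conv2d(gray, K1, padding=1)     # longitudinal edge response
    gx = F.conv2d(gray, K2, padding=1)     # transverse edge response
    return torch.sqrt(gx ** 2 + gy ** 2)   # combined edge-strength map
```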
Referring to FIG. 3, performing feature extraction on the preprocessed image using the self-attention module comprises: segmenting each image in the amplified image set into a plurality of patches using the preprocessing module and flattening each patch to generate the normalized picture feature z_0:

z_0 = [x_class; x_p^1·E; x_p^2·E; …; x_p^N·E] + E_pos

where x_class is a binary judgment token (class token) added before the first dimension of the picture to judge whether a pedestrian is present; x_p^1, x_p^2, …, x_p^N are the patches into which each picture is divided, each of size 16×16; E is the embedding matrix of shape (P^2·C)×D; and E_pos is the position coding vector, which position-codes the divided picture. A multi-head self-attention module then performs multi-head self-attention calculation on the normalized picture features to obtain the feature calculation result; and a multilayer perceptron classifies the feature calculation result, from which the coordinates of the target to be detected are extracted.
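A condensed sketch of the patch flattening and embedding that produces z_0; the 16×16 patch size and the class-token layout follow the text, while the 224×224 input size and zero-initialized parameters are assumptions:

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    def __init__(self, img_size=224, patch=16, channels=3, dim=512):
        super().__init__()
        n = (img_size // patch) ** 2                          # N = HW / P^2 patches
        self.proj = nn.Linear(patch * patch * channels, dim)  # embedding matrix E
        self.cls = nn.Parameter(torch.zeros(1, 1, dim))       # binary judgment token
        self.pos = nn.Parameter(torch.zeros(1, n + 1, dim))   # position coding E_pos
        self.patch = patch

    def forward(self, x):                             # x: (B, C, H, W)
        b, c, _, _ = x.shape
        p = self.patch
        # Split the image into P x P patches and flatten each one.
        x = x.unfold(2, p, p).unfold(3, p, p)         # (B, C, H/P, W/P, P, P)
        x = x.permute(0, 2, 3, 1, 4, 5).reshape(b, -1, c * p * p)
        z = self.proj(x)                              # x_p^j * E for each patch
        cls = self.cls.expand(b, -1, -1)
        return torch.cat([cls, z], dim=1) + self.pos  # z_0
```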
Performing two-class classification on the image features extracted by the self-attention module using the support vector machine comprises: mapping the processed picture features into a high-dimensional space with an RBF kernel:

K(z_m, z_n) = exp(-||z_m - z_n||^2 / (2σ^2))

and detecting the samples in the high-dimensional space to obtain a linear interface by solving:

min_{ω,b} (1/2)||ω||^2 + F·Σ_i ζ_i
s.t. y_i·(ω^T·x_i + b) ≥ 1 - ζ_i, ζ_i ≥ 0

where z_m, z_n are image features extracted by the self-attention module, σ is the standard deviation of the extracted image features z_m, z_n, ω and b are the coefficients of the linear interface, ω is a high-dimensional parameter matrix that depends on the image features, ω^T is its transpose, ζ_i is the slack variable that realizes the soft margin, x_i and y_i are the embedded samples in the high-dimensional space and their labels, and F is a hyper-parameter of the support vector machine.
In step S110, the image to be detected is acquired and sent to the trained detection model to obtain the recognition result (see FIG. 6). Specifically, this comprises: acquiring the image to be detected by photographing, by taking video frames, and/or from the internet; and sending it to the trained detection model so that the model recognizes the image and obtains the recognition result.
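A hypothetical end-to-end inference sketch composing the pieces sketched in this description: the `augment` pipeline from the augmentation sketch, a feature extractor `vit` standing in for the self-attention network, and a trained `svm` classifier (sketched later in the detailed embodiment); none of these names come from the patent itself:

```python
import torch
from PIL import Image

img = Image.open("crowd.jpg")          # photo, video frame, or internet image
x = augment(img).unsqueeze(0)          # reuse the preprocessing pipeline, (1, C, H, W)
with torch.no_grad():
    feats = vit(x)                     # 768-dim features from the self-attention model
count = int(svm.predict(feats.numpy()).sum())  # pedestrians recognized
print(f"pedestrian flow in the current picture: {count}")
```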
In another embodiment of the present invention, a pedestrian traffic detection system is disclosed, and referring to fig. 7, the pedestrian traffic detection system includes: an image set generating module 702, configured to obtain a pre-training image set, a training image set, and a verification image set, where the pre-training image set and the training image set are amplified in a data augmentation manner; a detection model generation module 704, including a self-attention module and a support vector machine instead of a fully connected layer; a pre-training module 706, configured to pre-train the detection model using the amplified pre-training image set; a training module 708, configured to perform formal training on the pre-trained detection model using the amplified training image set, and then perform verification using the verification image set to generate a trained detection model; and the target recognition module 710 is configured to obtain an image to be detected, and send the image to be detected to the trained detection model to obtain a recognition result.
Hereinafter, a pedestrian flow rate detection method according to an embodiment of the present invention is described in detail by way of specific examples.
In a first aspect of the embodiments of the present invention, an embodiment of a pedestrian traffic detection method is provided. The method comprises the following steps:
s1, acquiring a pedestrian image set M and a crowd image set C, using the data set M as a pre-training image set, and dividing the training image set T and a verification image set V by split _ ratio =0.7 in the data set C;
s2, building a self-attention model by using a PyTorch depth learning framework;
s3, pre-training the self-attention model by using the pedestrian image set M database to preset model parameters; performing large-scale formal training on the self-attention model by using the crowd image set C database;
s4, obtaining a to-be-detected pedestrian flow picture, sending the to-be-detected pedestrian flow picture to the trained self-attention model, and obtaining a recognition result;
and S5, obtaining the people flow data of the current picture according to the recognition result.
The pedestrian flow detection system provided by the invention broadens the application scenarios of pedestrian flow detection, simplifies the model implementation, and reduces training difficulty.
In some embodiments, step S1, acquiring the pedestrian image set M and the crowd image set C, and generating a training image set according to the pedestrian image set M and the crowd image set C, further includes:
the Pedestrian image set M uses MIT-CBCL Pedestrian Database, is a single Pedestrian image set M, and is in a ppm format and 64 x 128 in resolution; the crowd image set C uses Caltech Peerstrong Detection Benchmark, is a Pedestrian database with larger scale at present, adopts a vehicle-mounted camera to shoot, and has the resolution of 640 multiplied by 480; the training image set is augmented by data augmentation, and since the identification target is a pedestrian image, a mirror image augmentation matrix is selected for the training image set MH 1Image scaling and amplification matrixH 2The amplification is carried out, and the amplification is carried out,H 2and parameters are randomly generated to ensure the robustness of the model when the model identifies pedestrians with different distances and different angles in the image.H 1H 2Are all matrices, which are for imagesAnd transforming the matrix.
The training image set and test set are divided using the train_test_split() function of the scikit-learn library: 70% of the data set goes to the training image set and the remaining 30% to the test set.
Specifically, the MIT-CBCL Pedestrian Database contains 924 pictures in total, with a shoulder-to-foot distance of about 80 pixels; it contains only front and back views and no negative samples. The Caltech Pedestrian Detection Benchmark labels approximately 250,000 frames, 350,000 bounding boxes, and 2,300 pedestrians.
In some embodiments, step S2, building the self-attention model using the PyTorch deep learning framework, further comprises:
A Transformer network is built with PyTorch deep learning. The Transformer network divides the picture x ∈ R^(H×W×C) into patches, each of size 16×16, and expands it to

x_p ∈ R^(N×(P^2·C)), N = HW / P^2

where P is the patch size, C is the number of picture channels, H is the picture height, W is the picture width, and N is computed from H, W, and P. A binary judgment token x_class is added before the first dimension of the picture to judge whether a pedestrian is present, and the picture features are flattened:

z_0 = [x_class; x_p^1·E; x_p^2·E; …; x_p^N·E] + E_pos

where x_p^1, x_p^2, …, x_p^N are the patches into which each picture is divided, each of size 16×16; E is the embedding matrix; and E_pos is the position coding vector, whose role is to encode the split picture into the required shape. Specifically, dividing a picture merely cuts it into blocks of size (P^2×C); a coding matrix is also needed to process the divided picture and convert it into the input dimension required by the network, i.e., to encode the pixels. Multiplying each picture block by the coding matrix E yields a (1×D) vector, the encoding of that picture block, which can then be input to the network for processing and analysis. E is an embedding matrix of shape (P^2·C)×D, where D is the embedding dimension, usually 512, meaning each picture embedding is a 512-dimensional vector. The normalized (LN) picture feature z_0 then undergoes multi-head self-attention (MSA) computation with a residual network:

z'_l = MSA(LN(z_{l-1})) + z_{l-1}
the normalized image features are processed using multi-headed self-attention computation, and the output result is attention-weighted image features. The features described above are input into three fully-connected layers, and the three results are outputQKVFor every two featuresQAndKinner products are made, and the obtained result is used asVThe processing of the image features is corresponding to each image featureVAnd performing weighting processing. It should be noted that the generationQKVThe parameters of the full connection layer can be trained, and the three parameters are trained in the process of training the network, so that the three parameters can well reflect the image characteristics. Then calculating the characteristic to obtain z'1Classification was performed using a MultiLayer Perceptron (MLP):
Figure 357296DEST_PATH_IMAGE018
after 6-layer MSA and 6-layer MLP are used, MLP is used to extract the coordinates of the object to be detectedR=(x,y,w,h)=MLP(Z 1)。RIs a detection box, which consists of 4 parameters:xandyto detect the coordinates of the upper left corner of the box,wandhthe width and height of the detection frame, respectively. The multi-layer perceptron is used for classification, and particularly, the full connection layer is used in the network to realize the function of the multi-layer perceptron. The input of the multilayer perceptron is the aforementioned result of multi-head self-attention calculation, and the method comprises the steps of firstly carrying out Normalization (Layer Normalization) operation on input features, then inputting the operation result into the multilayer perceptron (namely, a fully-connected network Layer in a network), and finally adding the output of the network and the input of the multilayer perceptron at the beginning to obtain a final image feature extraction result.
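One encoder block of this network can be sketched as follows; the 512-dim embedding and the 6-layer depth follow the text, while the head count of 8 and the MLP hidden width are assumptions:

```python
import torch.nn as nn

class EncoderBlock(nn.Module):
    def __init__(self, dim=512, heads=8, mlp_dim=2048):
        super().__init__()
        self.ln1 = nn.LayerNorm(dim)
        self.msa = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ln2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, mlp_dim), nn.GELU(),
                                 nn.Linear(mlp_dim, dim))

    def forward(self, z):
        y = self.ln1(z)
        z = z + self.msa(y, y, y)[0]    # z'_l = MSA(LN(z_{l-1})) + z_{l-1}
        z = z + self.mlp(self.ln2(z))   # z_l  = MLP(LN(z'_l)) + z'_l
        return z

encoder = nn.Sequential(*[EncoderBlock() for _ in range(6)])  # 6 MSA + 6 MLP layers
```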
Taking the Vision Transformer network as the feature extractor, the extracted feature result z_l, whose dimension is 768, serves as the input for building the support vector machine model. The hybrid support vector machine model uses an RBF kernel to map the feature vectors into a high-dimensional space:

K(z_m, z_n) = exp(-||z_m - z_n||^2 / (2σ^2))
The samples are detected in the high-dimensional space, where the interface is linear; ω and b are the coefficients of the interface obtained when the support vector machine performs class judgment, so that the interface of the high-dimensional space classifies the input image features. The interface coefficients are not one-dimensional: because the processed picture features are mapped into a high-dimensional space, the corresponding coefficients are also multidimensional, with the specific dimension depending on the image features, here 768. ω is therefore a 768-dimensional parameter matrix that needs to be optimized:

min_{ω,b} (1/2)||ω||^2 + F·Σ_i ζ_i
s.t. y_i·(ω^T·x_i + b) ≥ 1 - ζ_i, ζ_i ≥ 0

where ζ_i is the slack variable that realizes the soft-margin SVM. This is the simplified form of the support vector machine objective to be optimized.
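The classifier described here corresponds to a soft-margin SVM with an RBF kernel; a sketch using the scikit-learn library the text itself cites, where the regularization parameter C plays the role of the hyper-parameter F and the random features are placeholders:

```python
import numpy as np
from sklearn.svm import SVC

# z_train: (n_samples, 768) feature vectors z_l from the self-attention
# network; y_train: pedestrian / non-pedestrian labels.
z_train = np.random.randn(100, 768)     # placeholder features
y_train = np.random.randint(0, 2, 100)  # placeholder labels

# RBF kernel K(z_m, z_n) = exp(-||z_m - z_n||^2 / (2*sigma^2)),
# i.e. gamma = 1 / (2*sigma^2); C corresponds to the hyper-parameter F.
svm = SVC(kernel="rbf", gamma="scale", C=1.0)
svm.fit(z_train, y_train)
predictions = svm.predict(z_train)
```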
Cross entropy is used as the loss function for network training. After the image features pass through the hybrid self-attention network and support vector machine model, the probability q that the detection result is a pedestrian is output; with the actual result p, the cross entropy is:

L(p, q) = -[p·log(q) + (1 - p)·log(1 - q)]

and the model is back-propagated using this cross entropy as the loss function to train the model parameters.
PyTorch can be regarded as NumPy with GPU support, or as a powerful deep neural network framework with automatic differentiation. Besides Facebook, it has been adopted by Twitter, CMU, Salesforce, and other institutions.
Unlike the "sliding weighting" idea of a convolutional network, the attention mechanism computes the query, key, and value of each part of a picture and obtains the weight of that part's value from the match between query and key, passing the weight to the next layer. In this way the model can understand the image globally.
The support vector machine method is built on VC dimension theory and the structural risk minimization principle. On classification problems the SVM is robust, generalizes well to unknown data, and performs especially well compared with other traditional machine learning algorithms when data are scarce. The model is built with the sklearn.svm.SVC class of the scikit-learn library, which is implemented on top of libsvm.
Finally, it should be noted that, as will be understood by those skilled in the art, all or part of the processes in the methods of the above embodiments may be implemented by a computer program that is stored in a computer-readable storage medium and that, when executed, may include the processes of the embodiments of the methods described above. The storage medium of the program may be a magnetic disk, an optical disk, a read-only memory (ROM), or a Random Access Memory (RAM). The embodiments of the computer program may achieve the same or similar effects as any of the above-described method embodiments.
The foregoing is an exemplary embodiment of the present disclosure, but it should be noted that various changes and modifications could be made herein without departing from the scope of the present disclosure as defined by the appended claims. The functions, steps and/or actions of the method claims in accordance with the disclosed embodiments described herein need not be performed in any particular order. Furthermore, although elements of the disclosed embodiments of the invention may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated. It should be understood that, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly supports the exception. It should also be understood that "and/or" as used herein is meant to include any and all possible combinations of one or more of the associated listed items.
The numbers of the embodiments disclosed in the embodiments of the present invention are merely for description, and do not represent the merits of the embodiments.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, and the program may be stored in a computer-readable storage medium, where the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk.
Those of ordinary skill in the art will understand that: the discussion of any embodiment above is meant to be exemplary only, and is not intended to intimate that the scope of the disclosure, including the claims, of embodiments of the invention is limited to these examples; within the idea of an embodiment of the invention, also technical features in the above embodiment or in different embodiments may be combined and there are many other variations of the different aspects of the embodiments of the invention as described above, which are not provided in detail for the sake of brevity. Therefore, any omissions, modifications, substitutions, improvements, and the like that may be made without departing from the spirit and principles of the embodiments of the present invention are intended to be included within the scope of the embodiments of the present invention.
Those skilled in the art will appreciate that all or part of the flow of the method implementing the above embodiments may be implemented by a computer program, which is stored in a computer readable storage medium, to instruct related hardware. The computer readable storage medium is a magnetic disk, an optical disk, a read-only memory or a random access memory.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention.

Claims (9)

1. A pedestrian flow detection method is characterized by comprising the following steps:
acquiring a pre-training image set, a training image set and a verification image set, wherein the pre-training image set and the training image set are amplified in a data amplification mode, the pre-training image set is a single pedestrian image, the training image set is a crowd image, and the resolution of the training image set is higher than that of the pre-training image set;
constructing a detection model, wherein the detection model comprises a self-attention module and a support vector machine replacing the last fully connected layer, the support vector machine being used to perform two-class classification on the image features extracted by the self-attention module;
pre-training the detection model parameters by using the amplified pre-training image set;
performing formal training on the pre-trained detection model by using the amplified training image set, and then performing verification by using the verification image set to generate a trained detection model;
and acquiring an image to be detected, and sending the image to be detected to the detection model after training so as to acquire a recognition result.
2. The pedestrian traffic detection method of claim 1, wherein augmenting the pre-training image set and the training image set using data augmentation comprises:
amplifying the pre-training image set and the training image set by using a mirror image amplification matrix to identify the same target at different angles and different directions in the images; and
and amplifying the pre-training image set and the training image set using a scaling augmentation matrix to identify persons at different distances and scenes of different sizes in the images.
3. The pedestrian traffic detection method according to claim 2, wherein the detection model further includes a preprocessing module, wherein,
preprocessing the images in the amplified image set by using the preprocessing module so as to strengthen the edge characteristics of the target in the images;
and performing feature extraction on the preprocessed image by using the self-attention module, wherein the amplified image set is an amplified pre-training image set in the pre-training process, or the amplified image set is an amplified training image set in the formal training process.
4. The pedestrian traffic detection method according to claim 3, wherein preprocessing the images in the amplified image set using the preprocessing module includes:
extracting features of the longitudinal texture of the images in the amplified image set with the following first feature matrix:
K1 = [1, 2, 1; 0, 0, 0; -1, -2, -1]; and
extracting features of the transverse texture of the images in the amplified image set with the following second feature matrix:
K2 = [1, 0, -1; 2, 0, -2; 1, 0, -1].
5. The pedestrian flow detection method according to claim 4, wherein performing feature extraction on the preprocessed image using the self-attention module comprises:
segmenting each image in the amplified image set into a plurality of patches using the preprocessing module and flattening each patch to generate the normalized picture feature z_0:

z_0 = [x_class; x_p^1·E; x_p^2·E; …; x_p^N·E] + E_pos

wherein x_class is a binary judgment token added before the first dimension of the picture to judge whether a pedestrian is present; x_p^1, x_p^2, …, x_p^N are the patches into which each picture is divided, each of size 16×16; E is the embedding matrix of shape (P^2·C)×D; and E_pos is the position coding vector, representing position coding of the divided picture;
performing multi-head self-attention calculation on the normalized picture features using a multi-head self-attention module to obtain feature calculation results; and
classifying the feature calculation results using a multilayer perceptron and extracting the coordinates of the target to be detected from the classified feature calculation results.
6. The pedestrian flow detection method according to claim 4, wherein performing two-class classification on the image features extracted by the self-attention module using the support vector machine comprises:
mapping the processed picture features into a high-dimensional space using an RBF kernel:

K(z_m, z_n) = exp(-||z_m - z_n||^2 / (2σ^2))

detecting samples in the high-dimensional space to obtain a linear interface by:

min_{ω,b} (1/2)||ω||^2 + F·Σ_i ζ_i
s.t. y_i·(ω^T·x_i + b) ≥ 1 - ζ_i, ζ_i ≥ 0

wherein z_m, z_n are the image features extracted by the self-attention module, σ is the standard deviation of the extracted image features z_m, z_n, ω and b are the coefficients of the linear interface, ω is a high-dimensional parameter matrix that depends on the image features, ω^T is the transposed matrix, ζ_i is the slack variable, x_i and y_i are the embedded samples in the high-dimensional space and their labels, and F is a hyper-parameter of the support vector machine.
7. The pedestrian traffic detection method according to claim 2, wherein pre-training the detection model using the amplified pre-training image set comprises:
performing forward propagation training of preset model parameters on the detection model using the amplified pre-training image set and calculating the prediction result q, then calculating the cross entropy loss:

L(p, q) = -[p·log(q) + (1 - p)·log(1 - q)];

and performing back propagation training of the preset model parameters on the detection model using the cross entropy loss as a loss function.
8. The pedestrian flow detection method according to claim 1, wherein obtaining an image to be detected and sending the image to be detected to the trained detection model to obtain a recognition result comprises:
acquiring the image to be detected by using a photographing mode, a video frame taking mode and/or an internet mode;
and sending the image to be detected to the trained detection model so that the detection model identifies the image to be detected to obtain an identification result.
9. A pedestrian flow detection system, comprising:
the image set generation module is used for acquiring a pre-training image set, a training image set and a verification image set, wherein the pre-training image set and the training image set are amplified in a data augmentation mode, the pre-training image set is a single pedestrian image, the training image set is a crowd image, and the resolution of the training image set is higher than that of the pre-training image set;
the detection model generation module comprises a self-attention module and a support vector machine replacing the last fully connected layer, the support vector machine being used to perform two-class classification on the image features extracted by the self-attention module;
the pre-training module is used for pre-training the detection model parameters by using the amplified pre-training image set;
the training module is used for carrying out formal training on the pre-trained detection model by using the amplified training image set and then carrying out verification by using the verification image set so as to generate a trained detection model;
and the target identification module is used for acquiring an image to be detected and sending the image to be detected to the trained detection model to acquire an identification result.
CN202210454852.2A 2022-04-28 2022-04-28 Pedestrian flow detection method and system Active CN114550109B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210454852.2A CN114550109B (en) 2022-04-28 2022-04-28 Pedestrian flow detection method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210454852.2A CN114550109B (en) 2022-04-28 2022-04-28 Pedestrian flow detection method and system

Publications (2)

Publication Number Publication Date
CN114550109A CN114550109A (en) 2022-05-27
CN114550109B (en) 2022-07-19

Family

ID=81666610

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210454852.2A Active CN114550109B (en) 2022-04-28 2022-04-28 Pedestrian flow detection method and system

Country Status (1)

Country Link
CN (1) CN114550109B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110490099A (en) * 2019-07-31 2019-11-22 武汉大学 A kind of subway common location stream of people's analysis method based on machine vision
US20200218937A1 (en) * 2019-01-03 2020-07-09 International Business Machines Corporation Generative adversarial network employed for decentralized and confidential ai training
US20200285896A1 (en) * 2019-03-09 2020-09-10 Tongji University Method for person re-identification based on deep model with multi-loss fusion training strategy
CN113936302A (en) * 2021-11-03 2022-01-14 厦门市美亚柏科信息股份有限公司 Training method and device for pedestrian re-recognition model, computing equipment and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110210419A (en) * 2019-06-05 2019-09-06 中国科学院长春光学精密机械与物理研究所 The scene Recognition system and model generating method of high-resolution remote sensing image
CN111639692B (en) * 2020-05-25 2022-07-22 南京邮电大学 Shadow detection method based on attention mechanism

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200218937A1 (en) * 2019-01-03 2020-07-09 International Business Machines Corporation Generative adversarial network employed for decentralized and confidential ai training
US20200285896A1 (en) * 2019-03-09 2020-09-10 Tongji University Method for person re-identification based on deep model with multi-loss fusion training strategy
CN110490099A (en) * 2019-07-31 2019-11-22 武汉大学 A kind of subway common location stream of people's analysis method based on machine vision
CN113936302A (en) * 2021-11-03 2022-01-14 厦门市美亚柏科信息股份有限公司 Training method and device for pedestrian re-recognition model, computing equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Efficient-network vehicle model recognition with fused attention mechanism; Liu Changyuan et al.; Journal of Zhejiang University (Engineering Science); 2022-04-01; Vol. 56, No. 4; pp. 775-781 *

Also Published As

Publication number Publication date
CN114550109A (en) 2022-05-27


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant