CN113033657A - Multi-user behavior identification method based on Transformer network - Google Patents
- Publication number
- Publication number: CN113033657A (application CN202110312085.7A)
- Authority
- CN
- China
- Prior art keywords
- vector
- matrix
- data
- network
- time sequence
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G—PHYSICS
- G08—SIGNALLING
- G08B—SIGNALLING OR CALLING SYSTEMS; ORDER TELEGRAPHS; ALARM SYSTEMS
- G08B21/00—Alarms responsive to a single specified undesired or abnormal condition and not otherwise provided for
- G08B21/02—Alarms for ensuring the safety of persons
- G08B21/04—Alarms for ensuring the safety of persons responsive to non-activity, e.g. of elderly persons
- G08B21/0438—Sensor means for detecting
Abstract
The invention discloses a multi-user behavior identification method based on a Transformer network, comprising the following steps: collecting an environmental sensor data set, feeding the time-series sensor data into the model as input, and sampling it through a fixed-size sliding window; embedding the sampled events into an initial vector, adding position codes to represent the order of the events in the sequence, and passing the vectors into the Encoder of the Transformer network; applying a top fully connected layer to classify the labels of users and activities. The invention uses an end-to-end method, avoiding the hand-crafted features and separate training/test sets required by traditional machine learning methods. The invention uses a temporal attention mechanism so that the network pays more attention to the key frames that contribute most to behavior recognition, effectively addressing the problem that a deep neural network assigns equal importance to all time-series data when extracting features automatically.
Description
Technical Field
The invention belongs to the field of human behavior recognition, and particularly relates to a multi-person behavior recognition method based on the Encoder of a Transformer network, mainly used for recognizing human behaviors from environmental sensor data.
Background
In recent years, human behavior recognition has received much attention. Accurate and efficient human behavior recognition plays an important role in human-computer interaction, home safety monitoring and the like. Human behavior recognition may contribute to detecting behavioral activities of the elderly, identifying potential safety hazards and physical degradation, etc. As the basis of the smart home, human behavior recognition needs to be performed on data obtained by sensors. Compared with video sensors and wearable sensors, environmental sensors are installed on floors, doors, windows or electrical equipment, reduce the inconvenience that data acquisition may cause to resident activities, and are more widely applicable. Current research on human behavior recognition based on environmental sensor data faces the following problems:
1. Multi-person behavior recognition is difficult: much research currently focuses on identifying the behavior of a single resident; however, a room usually contains multiple residents with different behavior habits, and parallel or cooperative activities exist, which poses complex challenges for activity recognition.
2. Traditional machine learning methods have low recognition efficiency: they require hand-crafted statistical and frequency features to represent segments of the raw sensor stream and to train a model to classify residents and activities. The effectiveness of this approach depends largely on the quality of the manual features.
3. Neural networks are not well suited to binary data: with advances in deep learning, CNNs have gradually been applied to human behavior recognition, but they are mainly used to process continuous signal data and lack adaptability to binary environmental sensor data.
The invention content is as follows:
in order to overcome the defects of the background art, the invention provides a multi-user behavior identification method based on a Transformer network, which simultaneously identifies multiple users and their corresponding activities from data collected by environmental sensors.
In order to solve the technical problems, the invention adopts the technical scheme that:
a multi-person behavior identification method based on a Transformer network comprises the following steps:
step 1, collecting an environment sensor data set, taking sensor data based on a time sequence as input, entering a model, and sampling through a sliding window with a fixed size;
step 2, embedding the sampled events into an initial vector, then adding position codes to represent the order of the events in the sequence, and then passing the vectors into the Encoder of the Transformer network;
step 3, classifying the labels of the users and the activities by applying a top fully connected layer.
Preferably, the specific method of step 1 comprises:
step 1.1, arranging an environmental sensor in a measured space region, and collecting user behavior data;
step 1.2, the collected environmental sensor data is represented by ON or OFF, wherein ON represents that the sensor is triggered, and OFF represents that the sensor is not triggered;
step 1.3, screening original data, removing data with the attribute of OFF, reserving data with the attribute of ON, taking each ON data as an event, and arranging the screened ON data according to a time sequence to form time sequence data;
and step 1.4, segmenting the time sequence data obtained in the step 1.3 to obtain a data slice sample.
Preferably, the specific method of step 1.4 comprises: arranging the screened data with ON attributes according to a time sequence to form a group of time sequence data; and acquiring original information on the time sequence data by using a sliding window with a preset fixed size, wherein the acquired result of the sliding window is used as a data slice sample.
Preferably, the predetermined fixed-size sliding window size k is an empirical parameter.
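The event filtering and sliding-window sampling of steps 1.2-1.4 can be sketched as follows; the event-tuple layout, the `make_windows` helper name, and the window stride of 1 are illustrative assumptions not fixed by the patent.

```python
from typing import List, Tuple

# Hypothetical event record: (timestamp, sensor_id, state)
Event = Tuple[float, str, str]

def make_windows(events: List[Event], k: int, stride: int = 1) -> List[List[Event]]:
    """Keep only ON events, order them by time, and cut fixed-size slices.

    k is the empirical sliding-window size; the stride is an assumption,
    since the patent does not state the window step.
    """
    on_events = sorted((e for e in events if e[2] == "ON"), key=lambda e: e[0])
    return [on_events[i:i + k] for i in range(0, len(on_events) - k + 1, stride)]

events = [
    (0.0, "door", "ON"), (0.5, "door", "OFF"),
    (1.0, "motion1", "ON"), (2.0, "motion2", "ON"), (3.0, "door", "ON"),
]
windows = make_windows(events, k=2)
print(len(windows))  # 3 slices over the 4 retained ON events
```

Each slice then becomes one data slice sample fed to the embedding step.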
Preferably, the specific method of step 2 comprises:
step 2.1, mapping the discrete data variable corresponding to each slice data sample to a continuous characterization vector through an Embedding algorithm; the Embedding algorithm one-hot encodes each sample datum and converts it into a vector;
step 2.2, the result of the embedding is the embedding matrix R^(T×C), where T represents the time-series dimension and C represents the channel dimension; in this process, the time-series dimension equals the length k of the sliding window used in data slicing, and each channel represents a corresponding sensor, of which there are N;
step 2.3, adding position codes: construct a matrix PE with the same dimensions as the embedding matrix, where the rows of PE represent time-sequence samples and the columns represent sensors, and each value in PE is obtained by the following formulas;
PE(pos, 2i) = sin(pos / 10000^(2i/d_model))
PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))
where PE is the position coding matrix, pos is the serial number corresponding to the sensor, i is the position of the row vector in the matrix, and d_model is the dimension of the row vector;
adding the PE matrix and the embedded matrix to obtain a new eigenvector matrix introduced with position coding;
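A minimal NumPy sketch of the position coding above; the window length 12 and sensor count 37 are taken from the embodiment later in the description, and the zero embedding matrix is only a stand-in for the result of step 2.2.

```python
import numpy as np

def positional_encoding(T: int, d_model: int) -> np.ndarray:
    """PE(pos, 2i) = sin(pos/10000^(2i/d_model)); PE(pos, 2i+1) = cos(same angle)."""
    pe = np.zeros((T, d_model))
    pos = np.arange(T)[:, None]                # one row per position in the window
    i = np.arange(0, d_model, 2)[None, :]      # even feature indices
    angle = pos / np.power(10000.0, i / d_model)
    pe[:, 0::2] = np.sin(angle)
    pe[:, 1::2] = np.cos(angle)[:, : d_model // 2]  # slice handles odd d_model
    return pe

k, n_sensors = 12, 37                  # window length and sensor count from the embodiment
pe = positional_encoding(k, n_sensors)
embedded = np.zeros((k, n_sensors))    # stand-in for the embedding matrix of step 2.2
x = embedded + pe                      # feature matrix with position codes added
```

The element-wise sum `embedded + pe` is exactly the "adding the PE matrix and the embedding matrix" operation described above.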
step 2.4, inputting m row vectors of the new feature vector matrix into the Encoder, where m equals the Batch size set for the Transformer network;
step 2.5, the vectors entering the encoder are first passed to a multi-head attention layer to obtain new characterization vectors; a multi-head attention mechanism computes attention values under the different attention heads, so that the network pays more attention to the key frames that contribute most to behavior recognition, using the following formula:
MultiHead(Q, K, V) = Concat(head_1, ..., head_h)
where Q, K and V respectively denote the Query, Key and Value vectors of the attention mechanism: the Query vector Q represents the sample attributes to be matched, the Key vector K represents the attributes of a sample, and the Value vector V represents the information contained in a sample;
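The per-head attention and concatenation of step 2.5 can be sketched with NumPy as below. The random projection matrices stand in for learned weights, and the final output projection is taken from the standard Transformer formulation, which the patent's formula leaves implicit — an assumption on our part.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, h, rng):
    """Toy multi-head self-attention: per head, softmax(Q K^T / sqrt(d_k)) V;
    head outputs are concatenated and linearly projected back to d_model."""
    T, d_model = X.shape
    d_k = d_model // h
    heads = []
    for _ in range(h):
        Wq, Wk, Wv = (rng.standard_normal((d_model, d_k)) for _ in range(3))
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        A = softmax(Q @ K.T / np.sqrt(d_k))   # attention weights over the window
        heads.append(A @ V)
    Wo = rng.standard_normal((h * d_k, d_model))
    return np.concatenate(heads, axis=1) @ Wo  # Concat(head_1, ..., head_h) projected

rng = np.random.default_rng(0)
X = rng.standard_normal((12, 64))   # a window of 12 embedded events, d_model = 64
out = multi_head_attention(X, h=8, rng=rng)
print(out.shape)  # (12, 64)
```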
step 2.6, normalizing the new characterization vectors generated by the attention layer in step 2.5 through Layer Normalization: the input matrix of step 2.4 is summed with the matrix obtained in step 2.5 (a residual connection) and normalized to obtain a new matrix;
step 2.7, transferring the matrix obtained in the step 2.6 to a Feed-Forward neural network Feed Forward for processing to obtain a reinforced characterization vector matrix;
step 2.8, the reinforced characterization vector matrix obtained in the step 2.7 is accessed into a normalization layer, and elements in the matrix are unitized according to rows to obtain a normalization matrix;
step 2.9, the output normalized matrix is sent on to the next encoder, yielding the final feature matrix.
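Steps 2.5-2.9 together describe one encoder layer — attention, add & normalize, feed forward, add & normalize — repeated across the stacked encoders. A minimal sketch under stated assumptions: the attention sublayer is replaced by an identity function, and the feed-forward weights are random stand-ins for learned parameters.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Unitize each row: subtract the row mean and divide by the row std."""
    mu = x.mean(axis=-1, keepdims=True)
    sigma = x.std(axis=-1, keepdims=True)
    return (x - mu) / (sigma + eps)

def encoder_block(x, attn, ffn):
    """One encoder layer per steps 2.5-2.8: sublayer output is summed with its
    input (residual connection) and then normalized."""
    x = layer_norm(x + attn(x))   # multi-head attention, add & normalize
    x = layer_norm(x + ffn(x))    # feed-forward network, add & normalize
    return x

def stack(x, n_layers, attn, ffn):
    for _ in range(n_layers):     # the patent stacks six encoders
        x = encoder_block(x, attn, ffn)
    return x

rng = np.random.default_rng(1)
W1, W2 = rng.standard_normal((37, 64)), rng.standard_normal((64, 37))
ffn = lambda x: np.maximum(x @ W1, 0.0) @ W2   # ReLU feed-forward sublayer
attn = lambda x: x                             # identity stand-in for attention
y = stack(rng.standard_normal((12, 37)), n_layers=6, attn=attn, ffn=ffn)
print(y.shape)  # (12, 37)
```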
Preferably, the number of row vectors input to the Encoder in step 2.4 is m, the value of m being the Batch size set by the transform network.
Preferably, six sequentially arranged Encoders are included.
Preferably, the specific method of step 3 comprises:
step 3.1, inputting the two-dimensional matrix of the T multiplied by C eigenvector obtained in the step 2.9 into a full connection layer, and automatically tiling to generate a one-dimensional vector with the length of T multiplied by C;
step 3.2, mapping the T multiplied by C one-dimensional feature vector to a sample marking space through a full connection layer to obtain a classification result vector, wherein elements in the vector are numerical values of each category obtained by weighting and summing the features;
step 3.3, a Softmax function is adopted as the classifier in the fully connected layer, mapping the inputs of the fully connected layer's neurons to the output end and converting each output value in the classification result vector into a probability, giving the final classification vector Yt; the Cross Entropy between the expected output and the actual output is used as the loss function to measure their difference;
step 3.4, the whole network model, improved from the Transformer network, finally outputs the classification vector Yt, which contains user identification information and activity identification information: the first a elements of the vector represent the corresponding residents, the following b elements represent the corresponding behavior activities, and the value of each element represents the probability of identifying the corresponding resident or activity.
Preferably, the dimension of the vector Yt in step 3.4 is the sum of the number of residents a and the number of activities b.
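A sketch of the classification head of steps 3.1-3.4, simplified to a single linear layer with one joint softmax over the a + b outputs (the embodiment later in the description actually uses two fully connected layers with 256 hidden neurons); the weight shapes and the sizes a = 2, b = 15 are illustrative.

```python
import numpy as np

def classify(features, W, b_vec, n_residents, n_activities):
    """Flatten the T x C feature matrix, apply a linear layer, and split the
    softmax output into resident and activity probabilities (length a + b)."""
    x = features.reshape(-1)        # tile the T x C matrix into a 1-D vector
    logits = x @ W + b_vec
    e = np.exp(logits - logits.max())
    probs = e / e.sum()             # softmax over all a + b outputs
    return probs[:n_residents], probs[n_residents:]

def cross_entropy(probs, target):
    """Cross-entropy between the predicted distribution and a one-hot target."""
    return -np.log(probs[target] + 1e-12)

T, C, a, b = 12, 37, 2, 15          # feature size and a/b counts are illustrative
rng = np.random.default_rng(2)
W = rng.standard_normal((T * C, a + b))
bias = np.zeros(a + b)
res_p, act_p = classify(rng.standard_normal((T, C)), W, bias, a, b)
loss = cross_entropy(np.concatenate([res_p, act_p]), target=0)
```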
The invention has the following beneficial effects. The invention uses an end-to-end method, avoiding the hand-crafted features and separate training/test sets required by traditional machine learning methods. The invention uses a temporal attention mechanism so that the network pays more attention to the key frames that contribute most to behavior recognition, effectively addressing the problem that a deep neural network assigns equal importance to all time-series data when extracting features automatically. The invention uses an improved Transformer structure: since the task only requires classification, the decoder of the original model is removed, and the more streamlined framework improves the accuracy of user and activity identification. The invention can identify multiple users and simultaneously output the corresponding activity of each user.
Drawings
FIG. 1 is a schematic overall flow diagram of an embodiment of the present invention;
FIG. 2 is a schematic diagram of sliding window sampling according to an embodiment of the present invention;
FIG. 3 is a schematic view of a model of an attention mechanism according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of the Encoder in the embodiment of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings and examples.
A multi-person behavior identification method based on a Transformer network comprises the following steps:
step 1, collecting an environment sensor data set, taking sensor data based on a time sequence as input, entering a model, and sampling through a sliding window with a fixed size; the specific method of step 1 comprises:
step 1.1, arranging an environmental sensor in a measured space region, and collecting user behavior data;
step 1.2, the collected environmental sensor data is represented by ON or OFF, wherein ON represents that the sensor is triggered, and OFF represents that the sensor is not triggered;
step 1.3, screening original data, removing data with the attribute of OFF, reserving data with the attribute of ON, taking each ON data as an event, and arranging the screened ON data according to a time sequence to form time sequence data;
step 1.4, segmenting the time sequence data obtained in the step 1.3 to obtain a data slice sample; the specific method comprises the following steps: arranging the screened data with ON attributes according to a time sequence to form a group of time sequence data; and acquiring original information on the time sequence data by using a sliding window with a preset fixed size, wherein the acquired result of the sliding window is used as a data slice sample. The predetermined fixed size sliding window size k is an empirical parameter.
Step 2, embedding the sampled events into an initial vector, then adding position codes to represent the order of the events in the sequence, and then passing the vectors into the Encoder of the Transformer network; the specific method of step 2 comprises the following steps:
step 2.1, mapping the discrete data variable corresponding to each slice data sample to a continuous characterization vector through an Embedding algorithm; the Embedding algorithm one-hot encodes each sample datum and converts it into a vector;
step 2.2, the result of the embedding is the embedding matrix R^(T×C), where T represents the time-series dimension and C represents the channel dimension; in this process, the time-series dimension equals the length k of the sliding window used in data slicing, and each channel represents a corresponding sensor, of which there are N;
step 2.3, adding position codes: construct a matrix PE with the same dimensions as the embedding matrix, where the rows of PE represent time-sequence samples and the columns represent sensors, and each value in PE is obtained by the following formulas;
PE(pos, 2i) = sin(pos / 10000^(2i/d_model))
PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))
where PE is the position coding matrix, pos is the serial number corresponding to the sensor, i is the position of the row vector in the matrix, and d_model is the dimension of the row vector;
adding the PE matrix and the embedded matrix to obtain a new eigenvector matrix introduced with position coding;
step 2.4, inputting m row vectors of the new feature vector matrix into the Encoder, where m equals the Batch size set for the Transformer network;
step 2.5, the vectors entering the Encoder are first passed to a multi-head attention layer (referring to fig. 4, the multi-head attention layer is one of the internal structures of the Encoder) to obtain new characterization vectors; a multi-head attention mechanism computes attention values under the different attention heads, so that the network pays more attention to the key frames that contribute most to behavior recognition, using the following formula:
MultiHead(Q, K, V) = Concat(head_1, ..., head_h)
where Q, K and V respectively denote the Query, Key and Value vectors of the attention mechanism: the Query vector Q represents the sample attributes to be matched, the Key vector K represents the attributes of a sample, and the Value vector V represents the information contained in a sample;
step 2.6, the new characterization vectors generated by the attention layer in step 2.5 are normalized through Layer Normalization, which facilitates the subsequent nonlinear processing of the data by the ReLU activation function in the Feed Forward neural network. The input matrix of step 2.4 is summed with the matrix obtained in step 2.5 (a residual connection) and normalized to obtain a new matrix;
step 2.7, transferring the matrix obtained in the step 2.6 to a Feed-Forward neural network Feed Forward for processing to obtain a reinforced characterization vector matrix; the expressive power of the characterization vector is enhanced by activating the function.
Step 2.8, in order to avoid gradient disappearance and accelerate the convergence process of full-connection layer training, the strengthened characterization vector matrix obtained in the step 2.7 is connected into a normalization layer, and elements in the matrix are unitized according to rows to obtain a normalization matrix;
step 2.9, the output normalized matrix is sent on to the next encoder, yielding the final feature matrix.
Preferably, in step 2.4, the number of row vectors input to the Encoder is m, where m equals the Batch size set for the Transformer network; the optimal parameters are obtained from repeated experimental results. The Batch size is likewise a parameter tuned for the network, and the value obtained by experiment is 64. The present embodiment includes six sequentially arranged Encoders.
Step 3, classifying the labels of the users and the activities by applying a top fully connected layer. The specific method of step 3 comprises the following steps:
step 3.1, inputting the two-dimensional matrix of the T multiplied by C eigenvector obtained in the step 2.9 into a full connection layer, and automatically tiling to generate a one-dimensional vector with the length of T multiplied by C;
step 3.2, mapping the T multiplied by C one-dimensional feature vector to a sample marking space through a full connection layer to obtain a classification result vector, wherein elements in the vector are numerical values of each category obtained by weighting and summing the features;
step 3.3, a Softmax function is adopted as the classifier in the fully connected layer, mapping the inputs of the fully connected layer's neurons to the output end and converting each output value in the classification result vector into a probability, giving the final classification vector Yt; the Cross Entropy between the expected output and the actual output is used as the loss function to measure their difference;
step 3.4, the whole network model, improved from the Transformer network, finally outputs the classification vector Yt, which contains user identification information and activity identification information: the first a elements of the vector represent the corresponding residents, the following b elements represent the corresponding behavior activities, and the value of each element represents the probability of identifying the corresponding resident or activity. The dimension of the vector Yt is the sum of the number of residents a and the number of activities b.
The vector Yt is the classification result generated after the feature vector passes through the fully connected layer. The Transformer network was originally designed for natural language processing and is here applied to the field of human behavior recognition for the first time. The invention improves the Transformer network according to the requirements of the recognition task, removing its decoder and adding a fully connected layer.
In summary, the multi-user behavior recognition method based on the Transformer network provided by the invention collects and samples environmental sensor data, preprocesses the data to obtain sampling segments, adds position codes, assigns the data different importance with an attention mechanism, and finally recognizes and classifies users and activities through a fully connected layer.
The following examples are given to illustrate embodiments of the present invention. Referring to fig. 1, the present embodiment provides a method for identifying multi-user behaviors based on a Transformer network, comprising the following steps:
(1) Data is collected using environmental sensors. 37 binary sensors are installed in the workplace, and several volunteer participants are recruited to perform a series of activities in the smart home; 15 daily living activities are collected, including opening doors, climbing stairs, opening windows, drying clothes, moving furniture, cleaning floors, watering flowers and the like.
(2) Screening original data, screening data with sensor readings of ON from the collected data, removing data with attributes of OFF, and identifying each ON data as an event.
(3) The data segment is intercepted using a sliding window method, with the sliding window size set to 12.
(4) Each sample is converted into a vector by an embedding algorithm.
(5) The result of the embedding is an embedding matrix of the form R^(T×C), where T and C are respectively the time-series dimension and the channel dimension. In this process, the time-series dimension equals the sliding-window length 12, and each channel represents a corresponding sensor, of which there are 37.
(6) Position coding is added: a position vector is added to each input embedding. By adding different values to these embedding vectors, meaningful distances between them can be provided.
(7) A certain number (the Batch size) of the vectors obtained in (6) enter the encoder as input.
(8) These vectors are passed to a multi-head attention layer. A multi-head attention mechanism computes attention values under the different attention heads, so that the network pays more attention to the key frames that contribute most to behavior identification, using the following formula:
MultiHead(Q, K, V) = Concat(head_1, ..., head_h)
wherein Q, K, V represent the query vector, key vector and value vector in the attention mechanism, respectively.
(9) The new characterization vectors generated by the attention layer are normalized through Layer Normalization, which facilitates the subsequent nonlinear processing of the data by the ReLU activation function in the Feed Forward layer. The input matrix of step (7) is summed with the matrix obtained in step (8) and normalized.
(10) And then the vector is transferred to a Feed-Forward neural network for processing, and the expression capability of the characterization vector is enhanced through an activation function.
(11) And then entering a normalization layer to carry out a summation normalization step.
(12) The output is sent to the next encoder, and steps (8)-(11) are repeated; the architecture proposed by the invention comprises 6 encoders.
(13) When the two-dimensional T×C feature matrix enters the fully connected layer, it is tiled into a one-dimensional vector of length T×C.
(14) The vector from (13) enters the fully connected layers for processing. The number of fully connected layers is set to 2, with 256 hidden neurons.
(15) And a Softmax function is adopted in the full connection layer as a classifier, and the cross entropy is adopted as a loss function. The Softmax function maps the inputs of the fully-connected layer neurons to the outputs, transforming each output value into a probability corresponding to each class.
(16) Outputting a predefined resident and activity vector containing user identification information and activity identification information, the Boolean value of each element in the vector reflecting a determination of whether the corresponding resident performs the corresponding activity.
(17) Finally, Accuracy is taken as the index for evaluating the correctness of user identification and activity identification.
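The Accuracy metric of step (17) reduces to the fraction of windows whose predicted label matches the ground truth, computed separately for users and activities; the prediction and label lists below are hypothetical.

```python
def accuracy(predictions, labels):
    """Fraction of samples whose predicted label equals the ground truth."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

# Hypothetical predictions for five test windows
user_acc = accuracy([0, 1, 1, 0, 1], [0, 1, 0, 0, 1])  # resident labels
act_acc = accuracy([3, 3, 7, 2, 2], [3, 3, 7, 2, 5])   # activity labels
print(user_acc, act_acc)  # 0.8 0.8
```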
In summary, the multi-user behavior identification method based on the Transformer network can effectively identify multiple users and their corresponding activities. The method first collects and samples environmental sensor data, then preprocesses the data to obtain sampling segments, then adds position codes and lets an attention mechanism assign the data different importance, and finally identifies and classifies users and activities through a fully connected layer. The method is end-to-end and simple to operate; the temporal attention mechanism makes the network pay more attention to the key frames that contribute most to behavior identification; the model is light, simple and effective; and multiple users can be identified at the same time.
It will be understood that modifications and variations can be made by persons skilled in the art in light of the above teachings and all such modifications and variations are intended to be included within the scope of the invention as defined in the appended claims.
Claims (9)
1. A multi-user behavior identification method based on a Transformer network is characterized by comprising the following steps:
step 1, collecting an environment sensor data set, taking sensor data based on a time sequence as input, entering a model, and sampling through a sliding window with a fixed size;
step 2, embedding the sampled events into an initial vector, adding position codes to represent the order of the events in the sequence, and then feeding the vector into the Encoder of the Transformer network;
step 3, classifying the labels of the users and the activities by applying a top full connection layer.
2. The method for recognizing multi-user behaviors based on the Transformer network as claimed in claim 1, wherein the specific method in the step 1 comprises:
step 1.1, arranging an environmental sensor in a measured space region, and collecting user behavior data;
step 1.2, the collected environmental sensor data is represented by ON or OFF, wherein ON represents that the sensor is triggered, and OFF represents that the sensor is not triggered;
step 1.3, screening original data, removing data with the attribute of OFF, reserving data with the attribute of ON, taking each ON data as an event, and arranging the screened ON data according to a time sequence to form time sequence data;
step 1.4, segmenting the time sequence data obtained in step 1.3 to obtain data slice samples.
3. The method for recognizing multi-person behaviors based on the Transformer network as claimed in claim 1, wherein the specific method in step 1.4 comprises: arranging the screened data with ON attributes according to a time sequence to form a group of time sequence data; and acquiring original information on the time sequence data by using a preset sliding window with a fixed size, wherein the acquired result of the sliding window is used as a data slice sample.
4. The method for multi-user behavior recognition based on the Transformer network as claimed in claim 1, wherein: the preset fixed-size sliding window size k is an empirical parameter.
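A minimal sketch of the ON-event screening and fixed-size sliding-window slicing described in claims 2-4; the log format and sensor IDs are hypothetical:

```python
def slice_events(events, k):
    """Keep only ON events, order them by time, and cut sliding-window
    samples of fixed size k (stride 1)."""
    on_events = sorted(
        (e for e in events if e["state"] == "ON"),
        key=lambda e: e["time"],
    )
    return [on_events[i:i + k] for i in range(len(on_events) - k + 1)]

# Hypothetical environmental-sensor log.
log = [
    {"time": 0, "sensor": "M01", "state": "ON"},
    {"time": 1, "sensor": "M01", "state": "OFF"},
    {"time": 2, "sensor": "M02", "state": "ON"},
    {"time": 3, "sensor": "M03", "state": "ON"},
    {"time": 4, "sensor": "M02", "state": "OFF"},
    {"time": 5, "sensor": "M04", "state": "ON"},
]
samples = slice_events(log, k=3)
```

Four ON events sliced with k = 3 yield two overlapping data slice samples.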
5. The method for recognizing multi-user behaviors based on the Transformer network as claimed in claim 1, wherein the specific method in the step 2 comprises:
step 2.1, mapping the discrete data variable corresponding to each slice data sample to a continuous characterization vector through an Embedding algorithm; the Embedding algorithm one-hot encodes each sample datum and converts it into a vector;
step 2.2, the embedded result set is the embedding matrix R^(T×C), wherein T represents the time sequence dimension and C represents the channel dimension; in this process, the time sequence dimension equals the length k of the sliding window used in data slicing, and each channel represents a corresponding sensor, of which there are N;
step 2.3, adding position codes: constructing a matrix PE with the same dimensions as the embedding matrix, wherein the rows of the matrix PE represent time sequence samples and the columns represent sensors; each value in the matrix PE is obtained by the following formulas:
PE(pos, 2i) = sin(pos / 10000^(2i/d_model))
PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))
wherein PE represents the position coding matrix, pos represents the serial number corresponding to the sensor, i represents the position of the row vector in the matrix, and d_model represents the dimension of the row vector;
adding the PE matrix to the embedding matrix yields a new feature vector matrix with position coding introduced;
step 2.4, inputting m row vectors of the new feature vector matrix into the Encoder, wherein the value of m is the Batch size set for the Transformer network;
step 2.5, the vectors entering the Encoder are first passed to a multi-head attention layer to obtain a new characterization vector; the multi-head attention mechanism computes attention values under the different attention heads respectively, so that the network pays more attention to the key frames that contribute most to behavior identification; the calculation is:
MultiHead(Q, K, V) = Concat(head_1, …, head_h)
wherein Q, K, and V respectively represent the Query, Key, and Value vectors in the attention mechanism: the Query vector represents the sample attributes to be matched, the Key vector represents the attributes of the sample, and the Value vector represents the information contained in the sample;
step 2.6, normalizing the new characterization vector generated by the attention layer in step 2.5 through Layer Normalization: the input matrix of step 2.4 and the matrix obtained in step 2.5 are summed and then normalized to obtain a new matrix;
step 2.7, transferring the matrix obtained in the step 2.6 to a Feed-Forward neural network Feed Forward for processing to obtain a reinforced characterization vector matrix;
step 2.8, the reinforced characterization vector matrix obtained in the step 2.7 is accessed into a normalization layer, and elements in the matrix are unitized according to rows to obtain a normalization matrix;
step 2.9, sending the output normalized matrix to the next Encoder, and so on, to obtain the final feature matrix.
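Each head of the multi-head attention layer in claim 5 computes scaled dot-product attention; the sketch below shows a single head with toy Q, K, V matrices and omits the learned projection and concatenation steps:

```python
import math

def matmul(A, B):
    # Plain list-of-lists matrix product A . B.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def softmax_rows(M):
    # Row-wise softmax with max-subtraction for numerical stability.
    out = []
    for row in M:
        m = max(row)
        exps = [math.exp(x - m) for x in row]
        s = sum(exps)
        out.append([e / s for e in exps])
    return out

def attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = len(K[0])
    K_t = [list(col) for col in zip(*K)]  # transpose of K
    scores = matmul(Q, K_t)
    scaled = [[s / math.sqrt(d_k) for s in row] for row in scores]
    weights = softmax_rows(scaled)
    return matmul(weights, V), weights

# Toy 2-event sequence with 2-dimensional embeddings.
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
out, weights = attention(Q, K, V)
```

Each row of `weights` sums to 1 and concentrates on the key that best matches the query, which is how the network assigns greater importance to the most informative events.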
6. The method for multi-person behavior recognition based on Transformer network as claimed in claim 1, wherein the number of row vectors input into the Encoder in step 2.4 is m, and the value of m is Batch size set by the Transformer network.
7. The method for multi-user behavior recognition based on the Transformer network as claimed in claim 1, wherein the Transformer network comprises 6 sequentially arranged Encoders.
8. The method for recognizing multi-user behaviors based on the Transformer network as claimed in claim 5, wherein the specific method in the step 3 comprises:
step 3.1, inputting the T×C two-dimensional feature vector matrix obtained in step 2.9 into the full connection layer, where it is automatically flattened into a one-dimensional vector of length T×C;
step 3.2, mapping the T×C one-dimensional feature vector to the sample label space through the full connection layer to obtain a classification result vector, wherein each element of the vector is the value of a category obtained by weighted summation of the features;
step 3.3, a Softmax function is adopted as the classifier in the full connection layer, mapping the inputs of the fully-connected-layer neurons to the outputs and converting each value of the classification result vector into a probability, yielding the final classification vector Yt; the Cross Entropy between the expected output and the actual output is taken as the loss function to measure the difference between them;
step 3.4, the whole improved Transformer-based network model finally outputs a final classification vector Yt comprising user identification information and activity identification information: the first a elements of the vector represent the corresponding residents, the following b elements represent the corresponding behavior activities, and the value of each element represents the probability of identifying the corresponding resident or activity.
9. The method for multi-person behavior recognition based on the Transformer network as claimed in claim 1, wherein the dimension of the vector Yt in step 3.4 is the sum of the number of residents a and the number of activities b.
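A sketch of reading out the classification vector Yt described in claims 8 and 9, assuming the first a elements correspond to residents and the next b to activities; the probabilities and the 0.5 threshold are illustrative:

```python
def decode_output(yt, a, b, threshold=0.5):
    """Split Yt into resident and activity probabilities and turn each
    element into a Boolean decision by thresholding."""
    assert len(yt) == a + b  # the dimension of Yt is a + b (claim 9)
    residents = [p >= threshold for p in yt[:a]]
    activities = [p >= threshold for p in yt[a:]]
    return residents, activities

# Hypothetical output for a = 2 residents and b = 3 activities.
yt = [0.91, 0.12, 0.05, 0.88, 0.33]
residents, activities = decode_output(yt, a=2, b=3)
```

Each Boolean reflects whether the corresponding resident is identified or the corresponding activity is detected, matching the output vector described in step (16).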
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110312085.7A CN113033657A (en) | 2021-03-24 | 2021-03-24 | Multi-user behavior identification method based on Transformer network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113033657A true CN113033657A (en) | 2021-06-25 |
Family
ID=76473181
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110312085.7A Pending CN113033657A (en) | 2021-03-24 | 2021-03-24 | Multi-user behavior identification method based on Transformer network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113033657A (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110020623A (en) * | 2019-04-04 | 2019-07-16 | 中山大学 | Physical activity identifying system and method based on condition variation self-encoding encoder |
CN112464861A (en) * | 2020-12-10 | 2021-03-09 | 中山大学 | Behavior early recognition method, system and storage medium for intelligent human-computer interaction |
Non-Patent Citations (1)
Title |
---|
CHEN WEI; HE JIAHUAN; PEI XIPING: "Power quality disturbance classification based on phase space reconstruction and convolutional neural network", Power System Protection and Control, no. 14, 16 July 2018 (2018-07-16), pages 99 - 93 *
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113255597A (en) * | 2021-06-29 | 2021-08-13 | 南京视察者智能科技有限公司 | Transformer-based behavior analysis method and device and terminal equipment thereof |
CN113256637A (en) * | 2021-07-15 | 2021-08-13 | 北京小蝇科技有限责任公司 | Urine visible component detection method based on deep learning and context correlation |
CN113256637B (en) * | 2021-07-15 | 2021-11-05 | 北京小蝇科技有限责任公司 | Urine visible component detection method based on deep learning and context correlation |
CN113627266A (en) * | 2021-07-15 | 2021-11-09 | 武汉大学 | Video pedestrian re-identification method based on Transformer space-time modeling |
CN113627266B (en) * | 2021-07-15 | 2023-08-18 | 武汉大学 | Video pedestrian re-recognition method based on transform space-time modeling |
CN113397572A (en) * | 2021-07-23 | 2021-09-17 | 中国科学技术大学 | Surface electromyographic signal classification method and system based on Transformer model |
CN113688871A (en) * | 2021-07-26 | 2021-11-23 | 南京信息工程大学 | Transformer-based video multi-label action identification method |
CN113688871B (en) * | 2021-07-26 | 2022-07-01 | 南京信息工程大学 | Transformer-based video multi-label action identification method |
CN113936339B (en) * | 2021-12-16 | 2022-04-22 | 之江实验室 | Fighting identification method and device based on double-channel cross attention mechanism |
CN113936243A (en) * | 2021-12-16 | 2022-01-14 | 之江实验室 | Discrete representation video behavior identification system and method |
CN113936339A (en) * | 2021-12-16 | 2022-01-14 | 之江实验室 | Fighting identification method and device based on double-channel cross attention mechanism |
CN115937990A (en) * | 2023-02-27 | 2023-04-07 | 珠海金智维信息科技有限公司 | Multi-person interactive action detection system and method |
CN115937990B (en) * | 2023-02-27 | 2023-06-23 | 珠海金智维信息科技有限公司 | Multi-person interaction detection system and method |
CN116127364A (en) * | 2023-04-12 | 2023-05-16 | 上海术理智能科技有限公司 | Integrated transducer-based motor imagery decoding method and system |
CN116502069A (en) * | 2023-06-25 | 2023-07-28 | 四川大学 | Haptic time sequence signal identification method based on deep learning |
CN116502069B (en) * | 2023-06-25 | 2023-09-12 | 四川大学 | Haptic time sequence signal identification method based on deep learning |
CN117576150A (en) * | 2023-11-03 | 2024-02-20 | 扬州万方科技股份有限公司 | Multi-mode multi-target 3D tracking method and device considering far-frame dependency relationship |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113033657A (en) | Multi-user behavior identification method based on Transformer network | |
CN111898736B (en) | Efficient pedestrian re-identification method based on attribute perception | |
CN111860600B (en) | User electricity utilization characteristic selection method based on maximum correlation minimum redundancy criterion | |
CN105654037A (en) | Myoelectric signal gesture recognition method based on depth learning and feature images | |
CN111859010B (en) | Semi-supervised audio event identification method based on depth mutual information maximization | |
CN110188653A (en) | Activity recognition method based on local feature polymerization coding and shot and long term memory network | |
CN110108914A (en) | One kind is opposed electricity-stealing intelligent decision making method, system, equipment and medium | |
CN110287863A (en) | A kind of gesture identification method based on WiFi signal | |
CN108392213B (en) | Psychological analysis method and device based on painting psychology | |
CN112464730B (en) | Pedestrian re-identification method based on domain-independent foreground feature learning | |
CN110119545A (en) | A kind of non-intrusive electrical load recognition methods based on stack self-encoding encoder | |
Yang et al. | Auroral sequence representation and classification using hidden Markov models | |
CN110245707B (en) | Human body walking posture vibration information identification method and system based on scorpion positioning | |
CN114176607B (en) | Electroencephalogram signal classification method based on vision transducer | |
Li et al. | Dating ancient paintings of Mogao Grottoes using deeply learnt visual codes | |
CN113069117A (en) | Electroencephalogram emotion recognition method and system based on time convolution neural network | |
CN109241870B (en) | Coal mine underground personnel identity identification method based on gait identification | |
CN115457403A (en) | Intelligent crop identification method based on multi-type remote sensing images | |
Hagiwara et al. | BEANS: The benchmark of animal sounds | |
CN117540908A (en) | Agricultural resource integration method and system based on big data | |
CN107045624B (en) | Electroencephalogram signal preprocessing and classifying method based on maximum weighted cluster | |
Tang et al. | Transound: Hyper-head attention transformer for birds sound recognition | |
CN110889335A (en) | Human skeleton double-person interaction behavior recognition method based on multi-channel space-time fusion network | |
CN113705695A (en) | Power distribution network fault data identification method based on convolutional neural network | |
CN117272230A (en) | Non-invasive load monitoring method and system based on multi-task learning model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||