CN113033657A - Multi-user behavior identification method based on Transformer network - Google Patents


Info

Publication number
CN113033657A
CN113033657A (application CN202110312085.7A)
Authority
CN
China
Prior art keywords
vector
matrix
data
network
time sequence
Prior art date
Legal status
Pending
Application number
CN202110312085.7A
Other languages
Chinese (zh)
Inventor
曹菁菁
储洁
郭富康
Current Assignee
Wuhan University of Technology WUT
Original Assignee
Wuhan University of Technology WUT
Priority date
Filing date
Publication date
Application filed by Wuhan University of Technology WUT filed Critical Wuhan University of Technology WUT
Priority to CN202110312085.7A
Publication of CN113033657A
Legal status: Pending


Classifications

    • G06F 18/2415 — Pattern recognition; classification techniques relating to the classification model, based on parametric or probabilistic models, e.g. likelihood ratio or false acceptance rate versus false rejection rate
    • G06F 18/214 — Pattern recognition; design or setup of recognition systems; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06N 3/045 — Computing arrangements based on biological models; neural networks; architectures; combinations of networks
    • G08B 21/0438 — Alarms for ensuring the safety of persons responsive to non-activity, e.g. of elderly persons; sensor means for detecting


Abstract

The invention discloses a multi-user behavior identification method based on a Transformer network, comprising the following steps: collecting an environmental sensor data set, feeding the time-series sensor data into the model as input, and sampling it with a fixed-size sliding window; embedding the sampled events into initial vectors, adding positional encodings to represent the order of events within the sequence, and passing the vectors into the Encoder of the Transformer network; and applying a top fully connected layer to classify user and activity labels. The invention uses an end-to-end method, avoiding the manual feature crafting and training/test-set partitioning required by traditional machine learning methods. A temporal attention mechanism makes the network attend more to the key frames that contribute most to behavior recognition, effectively addressing the problem that deep neural networks assign equal importance to all time-series data when extracting features automatically.

Description

Multi-user behavior identification method based on Transformer network
Technical Field
The invention belongs to the field of human behavior recognition, and particularly relates to a multi-person behavior recognition method based on the Encoder of a Transformer network, mainly used for recognizing human behaviors from environmental sensor data.
Background
In recent years, human behavior recognition has received much attention. Accurate and efficient human behavior recognition plays an important role in human-computer interaction, home safety monitoring and the like. It can contribute to detecting the behavioral activities of the elderly and identifying potential safety hazards and physical decline. As the basis of the smart home, human behavior recognition must be performed on data obtained from sensors. Compared with video sensors and wearable sensors, environmental sensors are installed on floors, doors, windows or electrical appliances, reduce the inconvenience that data collection may cause to residents' activities, and are therefore more widely applicable. Current research on human behavior recognition based on environmental sensor data faces the following problems:
1. Multi-person behavior recognition is difficult: much current research focuses on identifying the behavior of a single resident; however, a home usually contains multiple residents with different behavioral habits, engaged in parallel or cooperative activities, which poses complex challenges to activity recognition.
2. Traditional machine learning methods have low recognition efficiency: such methods require hand-crafted statistical and frequency features to represent segments of the raw sensor stream and train a machine learning model to classify residents and activities. The effectiveness of this approach depends largely on the quality of the manual features.
3. Neural networks are ill-suited to binary data: with advances in deep learning, CNNs have gradually been applied to human behavior recognition, but they are mainly used to process continuous signal data and adapt poorly to binary environmental sensor data.
Summary of the Invention
In order to overcome the defects of the background art, the invention provides a multi-user behavior identification method based on a Transformer network, which simultaneously identifies multiple users and their corresponding activities from data collected by environmental sensors.
In order to solve the technical problems, the invention adopts the technical scheme that:
a multi-person behavior identification method based on a Transformer network comprises the following steps:
step 1, collecting an environmental sensor data set, feeding the time-series sensor data into the model as input, and sampling it with a fixed-size sliding window;
step 2, embedding the sampled events into initial vectors, adding positional encodings to represent the order of events within the sequence, and passing the vectors into the Encoder of the Transformer network;
and step 3, classifying user and activity labels by applying a top fully connected layer.
Preferably, the specific method of step 1 comprises:
step 1.1, arranging an environmental sensor in a measured space region, and collecting user behavior data;
step 1.2, the collected environmental sensor data is represented by ON or OFF, wherein ON represents that the sensor is triggered, and OFF represents that the sensor is not triggered;
step 1.3, screening original data, removing data with the attribute of OFF, reserving data with the attribute of ON, taking each ON data as an event, and arranging the screened ON data according to a time sequence to form time sequence data;
and step 1.4, segmenting the time sequence data obtained in the step 1.3 to obtain a data slice sample.
Preferably, the specific method of step 1.4 comprises: arranging the screened data with ON attributes according to a time sequence to form a group of time sequence data; and acquiring original information on the time sequence data by using a sliding window with a preset fixed size, wherein the acquired result of the sliding window is used as a data slice sample.
Preferably, the predetermined fixed-size sliding window size k is an empirical parameter.
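As a minimal sketch of steps 1.3 and 1.4 above, the ON-event filtering and sliding-window slicing might look like this (the stride of one event per window and the sensor names are illustrative assumptions; the patent specifies only the window length k):

```python
def slide_windows(events, k):
    """Slice a chronological list of ON events into fixed-size samples.

    events: time-ordered sensor-ON events (step 1.3's filtered sequence).
    k: the empirical window length of step 1.4.
    A stride of one event per window is assumed here for illustration.
    """
    return [events[i:i + k] for i in range(len(events) - k + 1)]

# Example: raw readings are (sensor, state) pairs; only ON events are kept.
raw = [("M01", "ON"), ("M02", "OFF"), ("M03", "ON"), ("M01", "ON"),
       ("M04", "ON"), ("M02", "ON"), ("M05", "OFF"), ("M03", "ON")]
on_events = [sensor for sensor, state in raw if state == "ON"]
samples = slide_windows(on_events, k=3)
```

Each element of `samples` is one data-slice sample of k consecutive ON events, ready for embedding in step 2.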
Preferably, the specific method of step 2 comprises:
step 2.1, mapping the discrete data variable corresponding to each sliced data sample to a continuous characterization vector through an Embedding algorithm, which one-hot encodes each sample and converts it into a vector;
step 2.2, the result of the embedding is an embedding matrix R^(T×C), where T represents the time-series dimension and C represents the channel dimension; in this process, the time-series dimension equals the sliding-window length k used in data slicing, and each channel represents a corresponding sensor, of which there are N;
step 2.3, adding positional encodings: constructing a matrix PE with the same dimensions as the embedding matrix, where the rows of PE represent time-sequence samples and the columns represent sensors; each value in PE is obtained by the following formulas:
PE(pos, 2i) = sin(pos / 10000^(2i/d_model))
PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))
where PE denotes the positional encoding matrix, pos denotes the serial number corresponding to the sensor, i denotes the position of the row vector in the matrix, and d_model denotes the dimension of a row vector;
adding the PE matrix to the embedding matrix yields a new feature-vector matrix with positional encoding introduced;
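The positional-encoding formulas above can be sketched as follows (a minimal illustration assuming an even row-vector dimension d_model; the matrix sizes are examples, not the patent's parameters):

```python
import numpy as np

def positional_encoding(n_pos, d_model):
    """Build the PE matrix of step 2.3:
    PE[pos, 2i]   = sin(pos / 10000**(2i / d_model))
    PE[pos, 2i+1] = cos(pos / 10000**(2i / d_model))
    d_model is assumed even for simplicity.
    """
    pe = np.zeros((n_pos, d_model))
    pos = np.arange(n_pos)[:, None]              # position index
    two_i = np.arange(0, d_model, 2)[None, :]    # the even column indices 2i
    angle = pos / np.power(10000.0, two_i / d_model)
    pe[:, 0::2] = np.sin(angle)
    pe[:, 1::2] = np.cos(angle)
    return pe

# Adding PE to an embedding matrix of the same shape, as the step describes:
emb = np.random.default_rng(0).standard_normal((12, 8))  # illustrative T=12, C=8
x = emb + positional_encoding(12, 8)
```

Because the encodings are deterministic and distinct per position, adding them gives the otherwise order-blind attention layers a notion of sequence order.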
step 2.4, inputting m row vectors from the new feature-vector matrix into the Encoder, where m equals the batch size set for the Transformer network;
step 2.5, the vectors entering the Encoder are first passed to a multi-head attention layer to obtain new characterization vectors; a multi-head attention mechanism computes the attention values under the different attention heads separately, so that the network attends more to the key frames that contribute most to behavior recognition. The computation is:
Attention(Q, K, V) = softmax(QK^T / √d_k) V
head_i = Attention(Q W_i^Q, K W_i^K, V W_i^V)
MultiHead(Q, K, V) = Concat(head_1, ..., head_h) W^O
where the Query vector Q represents the sample attributes to be matched against other samples, the Key vector K represents a sample's own attributes, and the Value vector V represents the information contained in the sample;
step 2.6, normalizing the new characterization vectors produced by the attention layer in step 2.5 through Layer Normalization: summing the input matrix of step 2.4 with the matrix obtained in step 2.5 and normalizing to obtain a new matrix;
step 2.7, passing the matrix obtained in step 2.6 to a feed-forward neural network (Feed Forward) for processing, obtaining a strengthened characterization-vector matrix;
step 2.8, feeding the strengthened characterization-vector matrix obtained in step 2.7 into a normalization layer, where the elements of the matrix are normalized row by row to obtain a normalized matrix;
and step 2.9, sending the output normalized matrix on to the next Encoder, eventually obtaining the final feature matrix.
Preferably, the number of row vectors input to the Encoder in step 2.4 is m, where m is the batch size set for the Transformer network.
Preferably, six sequentially arranged Encoders are included.
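Steps 2.5 through 2.8 can be sketched as one encoder block in NumPy (random weights stand in for trained parameters; the head count and dimensions are illustrative assumptions, not the patent's values):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d_k)) @ V

def layer_norm(x, eps=1e-6):
    # row-wise normalization used in the add & norm steps
    return (x - x.mean(-1, keepdims=True)) / (x.std(-1, keepdims=True) + eps)

def encoder_layer(x, h, rng):
    """One encoder block: multi-head self-attention, add & norm (step 2.6),
    feed-forward with ReLU (step 2.7), add & norm (step 2.8).
    Random weights stand in for trained ones; h must divide the dimension."""
    T, d = x.shape
    d_h = d // h
    heads = []
    for _ in range(h):
        Wq, Wk, Wv = (rng.standard_normal((d, d_h)) * 0.1 for _ in range(3))
        heads.append(attention(x @ Wq, x @ Wk, x @ Wv))
    Wo = rng.standard_normal((h * d_h, d)) * 0.1
    x = layer_norm(x + np.concatenate(heads, axis=-1) @ Wo)   # add & norm
    W1 = rng.standard_normal((d, 4 * d)) * 0.1
    W2 = rng.standard_normal((4 * d, d)) * 0.1
    ffn = np.maximum(0.0, x @ W1) @ W2                        # ReLU feed-forward
    return layer_norm(x + ffn)                                # add & norm
```

Stacking several such blocks, as step 2.9 and the preferred six-Encoder arrangement describe, amounts to calling `encoder_layer` repeatedly on the previous block's output.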
Preferably, the specific method of step 3 comprises:
step 3.1, inputting the two-dimensional T×C feature-vector matrix obtained in step 2.9 into the fully connected layer, where it is automatically flattened into a one-dimensional vector of length T×C;
step 3.2, mapping the length-T×C one-dimensional feature vector to the sample label space through the fully connected layer to obtain a classification result vector, whose elements are the per-class values obtained by a weighted sum of the features;
step 3.3, adopting a Softmax function as the classifier in the fully connected layer, mapping the inputs of the fully connected neurons to the output end and converting each output value in the classification result vector into a probability, yielding the final classification vector Yt; the difference between the expected output and the actual output is computed using their Cross Entropy as the loss function;
and step 3.4, the whole network model improved from the Transformer network finally outputs the classification vector Yt, which contains user identification information and activity identification information: the first a elements of the vector represent the corresponding residents, the next b elements represent the corresponding behavioral activities, and the value of each element represents the probability of identifying the corresponding resident or activity.
Preferably, the dimension of the vector Yt in step 3.4 is the sum of the number of residents a and the number of activities b.
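The classification head of steps 3.1 through 3.4 can be sketched as follows. One assumption is labeled explicitly: Softmax is applied separately to the resident block and the activity block so that each sums to one, since the patent states only that Softmax converts the outputs to probabilities; the sizes and weights are illustrative:

```python
import numpy as np

def softmax_vec(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def classify(feat, W, bias, n_res, n_act):
    """Flatten the T x C feature matrix (step 3.1), apply one dense layer
    (step 3.2), and convert the logits into the classification vector Yt
    (steps 3.3-3.4). Assumption: per-block Softmax over the first n_res
    (resident) and last n_act (activity) elements."""
    x = feat.reshape(-1)                 # tile to a length T*C vector
    logits = x @ W + bias                # weighted sum per class
    return np.concatenate([softmax_vec(logits[:n_res]),
                           softmax_vec(logits[n_res:])])

rng = np.random.default_rng(1)
feat = rng.standard_normal((12, 8))                  # illustrative T x C
W, bias = rng.standard_normal((96, 5)), np.zeros(5)  # a=2 residents, b=3 activities
yt = classify(feat, W, bias, n_res=2, n_act=3)
```

The output `yt` has dimension a + b, matching the preferred Yt layout: resident probabilities first, activity probabilities second.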
The invention has the following beneficial effects. It uses an end-to-end method, avoiding the manual feature crafting and training/test-set partitioning required by traditional machine learning methods. It uses a temporal attention mechanism so that the network attends more to the key frames contributing most to behavior recognition, effectively addressing the problem that deep neural networks assign equal importance to all time-series data when extracting features automatically. It uses an improved Transformer structure: since the task only requires classification, the decoder of the original model is removed, and the more streamlined framework improves the accuracy of user and activity identification. The invention can identify multiple users and simultaneously output each user's corresponding activity.
Drawings
FIG. 1 is a schematic overall flow diagram of an embodiment of the present invention;
FIG. 2 is a schematic diagram of sliding window sampling according to an embodiment of the present invention;
FIG. 3 is a schematic view of a model of an attention mechanism according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of an Encoder in the embodiment of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings and examples.
A multi-person behavior identification method based on a Transformer network comprises the following steps:
Step 1, collecting an environmental sensor data set, feeding the time-series sensor data into the model as input, and sampling it with a fixed-size sliding window. The specific method of step 1 comprises:
step 1.1, arranging an environmental sensor in a measured space region, and collecting user behavior data;
step 1.2, the collected environmental sensor data is represented by ON or OFF, wherein ON represents that the sensor is triggered, and OFF represents that the sensor is not triggered;
step 1.3, screening original data, removing data with the attribute of OFF, reserving data with the attribute of ON, taking each ON data as an event, and arranging the screened ON data according to a time sequence to form time sequence data;
step 1.4, segmenting the time sequence data obtained in the step 1.3 to obtain a data slice sample; the specific method comprises the following steps: arranging the screened data with ON attributes according to a time sequence to form a group of time sequence data; and acquiring original information on the time sequence data by using a sliding window with a preset fixed size, wherein the acquired result of the sliding window is used as a data slice sample. The predetermined fixed size sliding window size k is an empirical parameter.
Step 2, embedding the sampled events into initial vectors, adding positional encodings to represent the order of events within the sequence, and passing the vectors into the Encoder of the Transformer network. The specific method of step 2 comprises:
step 2.1, mapping the discrete data variable corresponding to each sliced data sample to a continuous characterization vector through an Embedding algorithm, which one-hot encodes each sample and converts it into a vector;
step 2.2, the result of the embedding is an embedding matrix R^(T×C), where T represents the time-series dimension and C represents the channel dimension; in this process, the time-series dimension equals the sliding-window length k used in data slicing, and each channel represents a corresponding sensor, of which there are N;
step 2.3, adding positional encodings: constructing a matrix PE with the same dimensions as the embedding matrix, where the rows of PE represent time-sequence samples and the columns represent sensors; each value in PE is obtained by the following formulas:
PE(pos, 2i) = sin(pos / 10000^(2i/d_model))
PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))
where PE denotes the positional encoding matrix, pos denotes the serial number corresponding to the sensor, i denotes the position of the row vector in the matrix, and d_model denotes the dimension of a row vector;
adding the PE matrix to the embedding matrix yields a new feature-vector matrix with positional encoding introduced;
step 2.4, inputting m row vectors from the new feature-vector matrix into the Encoder, where m equals the batch size set for the Transformer network;
step 2.5, the vectors entering the Encoder are first passed to a multi-head attention layer (referring to fig. 4, the multi-head attention layer is one of the internal structures of the Encoder) to obtain new characterization vectors; a multi-head attention mechanism computes the attention values under the different attention heads separately, so that the network attends more to the key frames that contribute most to behavior recognition. The computation is:
Attention(Q, K, V) = softmax(QK^T / √d_k) V
head_i = Attention(Q W_i^Q, K W_i^K, V W_i^V)
MultiHead(Q, K, V) = Concat(head_1, ..., head_h) W^O
where the Query vector Q represents the sample attributes to be matched against other samples, the Key vector K represents a sample's own attributes, and the Value vector V represents the information contained in the sample;
step 2.6, normalizing the new characterization vectors produced by the attention layer in step 2.5 through Layer Normalization, which facilitates the subsequent nonlinear processing of the data by the ReLU activation function in the feed-forward network: the input matrix of step 2.4 and the matrix obtained in step 2.5 are summed and normalized to obtain a new matrix;
step 2.7, passing the matrix obtained in step 2.6 to a feed-forward neural network (Feed Forward) for processing, obtaining a strengthened characterization-vector matrix whose expressive power is enhanced by the activation function;
step 2.8, to avoid vanishing gradients and accelerate convergence of the fully connected layers' training, feeding the strengthened characterization-vector matrix obtained in step 2.7 into a normalization layer, where the elements of the matrix are normalized row by row to obtain a normalized matrix;
and step 2.9, sending the output normalized matrix on to the next Encoder, eventually obtaining the final feature matrix.
Preferably, in step 2.4 the number of row vectors input to the Encoder is m, where m is the batch size set for the Transformer network and is tuned to an optimal value over repeated experiments; the experimentally obtained value is 64. The present embodiment includes six sequentially arranged Encoders.
Step 3, classifying user and activity labels by applying a top fully connected layer. The specific method of step 3 comprises:
step 3.1, inputting the two-dimensional T×C feature-vector matrix obtained in step 2.9 into the fully connected layer, where it is automatically flattened into a one-dimensional vector of length T×C;
step 3.2, mapping the length-T×C one-dimensional feature vector to the sample label space through the fully connected layer to obtain a classification result vector, whose elements are the per-class values obtained by a weighted sum of the features;
step 3.3, adopting a Softmax function as the classifier in the fully connected layer, mapping the inputs of the fully connected neurons to the output end and converting each output value in the classification result vector into a probability, yielding the final classification vector Yt; the difference between the expected output and the actual output is computed using their Cross Entropy as the loss function;
and step 3.4, the whole network model improved from the Transformer network finally outputs the classification vector Yt, which contains user identification information and activity identification information: the first a elements of the vector represent the corresponding residents, the next b elements represent the corresponding behavioral activities, and the value of each element represents the probability of identifying the corresponding resident or activity. The dimension of Yt is the sum of the number of residents a and the number of activities b.
The vector Yt is the classification result generated after the feature vectors pass through the fully connected layer. The Transformer network was originally designed for natural language processing and is applied here to the field of human behavior recognition for the first time. The invention adapts the Transformer network to the requirements of the identification task by removing its decoder and adding a fully connected layer.
In summary, the multi-user behavior recognition method based on the Transformer network provided by the invention comprises collecting and sampling environmental sensor data, preprocessing the data to obtain sampled segments, adding positional encodings, assigning different importance to the data through an attention mechanism, and identifying and classifying users and activities through a fully connected layer.
The following example illustrates an embodiment of the present invention. Referring to fig. 1, the present embodiment provides a multi-user behavior identification method based on a Transformer network, comprising the following steps:
(1) Data is collected using environmental sensors: 37 binary sensors are installed in the workplace, several volunteer participants are recruited to perform a series of activities in the smart home, and 15 daily living activities are collected, including door opening, stair climbing, window opening, clothes drying, furniture moving, floor cleaning and flower watering.
(2) Screening original data, screening data with sensor readings of ON from the collected data, removing data with attributes of OFF, and identifying each ON data as an event.
(3) The data segment is intercepted using a sliding window method, with the sliding window size set to 12.
(4) Each sample is converted into a vector by an embedding algorithm.
(5) The result of the embedding is an embedding matrix of the form R^(T×C), where T and C are respectively the time-series dimension and the channel dimension. In this process, the time-series dimension is the sliding-window length 12, and each channel represents a corresponding sensor, of which there are 37.
(6) Positional encoding is added, contributing a vector to each input embedding. Adding these distinct values to the embedding vectors provides meaningful distances between them.
(7) A certain number (Batch size) of vectors obtained by (6) enter the encoder as Input (Input).
(8) These vectors are passed to a multi-head attention layer. The multi-head attention mechanism computes the attention values under the different attention heads separately, so that the network attends more to the key frames that contribute most to behavior recognition:
Attention(Q, K, V) = softmax(QK^T / √d_k) V
head_i = Attention(Q W_i^Q, K W_i^K, V W_i^V)
MultiHead(Q, K, V) = Concat(head_1, ..., head_h) W^O
where Q, K and V represent the query vector, key vector and value vector of the attention mechanism, respectively.
(9) The new characterization vectors generated by the attention layer are normalized through Layer Normalization, which facilitates the subsequent nonlinear processing of the data by the ReLU activation function in the Feed Forward network. The input matrix of (7) and the matrix obtained in (8) are summed and normalized.
(10) And then the vector is transferred to a Feed-Forward neural network for processing, and the expression capability of the characterization vector is enhanced through an activation function.
(11) And then entering a normalization layer to carry out a summation normalization step.
(12) The output is sent to the next encoder, and steps (8)-(11) are repeated; the architecture proposed by the present invention comprises 6 encoders.
(13) The two-dimensional matrix of T×C feature vectors entering the fully connected layer is flattened into a one-dimensional vector of length T×C.
(14) The vector from (13) enters the fully connected layers for processing. The number of fully connected layers is set to 2, with 256 hidden neurons.
(15) A Softmax function is adopted in the fully connected layer as the classifier, with cross entropy as the loss function. The Softmax function maps the inputs of the fully connected neurons to the outputs, transforming each output value into a probability corresponding to each class.
(16) A predefined resident-and-activity vector is output, containing user identification information and activity identification information; the Boolean value of each element in the vector reflects the determination of whether the corresponding resident performs the corresponding activity.
(17) Finally, Accuracy is taken as the index for evaluating the correctness of user identification and activity identification.
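The loss of step (15) and the evaluation metric of step (17) can be sketched as follows (a minimal illustration assuming one-hot expected outputs and hard label predictions):

```python
import numpy as np

def cross_entropy(y_true, y_prob, eps=1e-12):
    """Cross entropy between the expected one-hot output and the actual
    Softmax output, as used for the loss in step (15)."""
    return float(-np.sum(y_true * np.log(y_prob + eps)))

def accuracy(pred_labels, true_labels):
    """Step (17): fraction of samples whose predicted class matches the truth."""
    pred = np.asarray(pred_labels)
    true = np.asarray(true_labels)
    return float((pred == true).mean())
```

In training, the cross entropy would be averaged over a batch and minimized; accuracy is computed separately over the resident labels and the activity labels to evaluate both identification tasks.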
In summary, the multi-user behavior identification method based on the Transformer network can effectively identify multiple users and their corresponding activities. The method first collects and samples environmental sensor data, then preprocesses the data into sampled segments, adds positional encodings and assigns different importance to the data through an attention mechanism, and finally identifies and classifies users and activities through a fully connected layer. The method is end-to-end and simple to operate; the temporal attention mechanism makes the network attend more to the key frames contributing most to behavior recognition; the model is lightweight, simple and effective; and multiple users can be identified simultaneously.
It will be understood that modifications and variations can be made by persons skilled in the art in light of the above teachings and all such modifications and variations are intended to be included within the scope of the invention as defined in the appended claims.

Claims (9)

1. A multi-user behavior identification method based on a Transformer network, characterized by comprising the following steps:
step 1, collecting an environmental sensor data set, feeding the time-series sensor data into the model as input, and sampling it with a fixed-size sliding window;
step 2, embedding the sampled events into initial vectors, adding positional encodings to represent the order of events within the sequence, and passing the vectors into the Encoder of the Transformer network;
and step 3, classifying user and activity labels by applying a top fully connected layer.
2. The method for recognizing multi-user behaviors based on the Transformer network as claimed in claim 1, wherein the specific method in the step 1 comprises:
step 1.1, arranging an environmental sensor in a measured space region, and collecting user behavior data;
step 1.2, the collected environmental sensor data is represented by ON or OFF, wherein ON represents that the sensor is triggered, and OFF represents that the sensor is not triggered;
step 1.3, screening original data, removing data with the attribute of OFF, reserving data with the attribute of ON, taking each ON data as an event, and arranging the screened ON data according to a time sequence to form time sequence data;
and step 1.4, segmenting the time sequence data obtained in the step 1.3 to obtain a data slice sample.
3. The method for recognizing multi-person behaviors based on the Transformer network as claimed in claim 1, wherein the specific method in step 1.4 comprises: arranging the screened data with ON attributes according to a time sequence to form a group of time sequence data; and acquiring original information on the time sequence data by using a preset sliding window with a fixed size, wherein the acquired result of the sliding window is used as a data slice sample.
4. The method for multi-user behavior recognition based on the Transformer network as claimed in claim 1, wherein: the preset fixed-size sliding window size k is an empirical parameter.
5. The method for recognizing multi-user behaviors based on the Transformer network as claimed in claim 1, wherein the specific method in the step 2 comprises:
step 2.1, mapping the discrete data variable corresponding to each slice data sample to a continuous characterization vector through an Embedding algorithm; the Embedding algorithm one-hot encodes each sample datum and converts it into a vector;
step 2.2, the embedding result set is the embedding matrix R^(T×C), where T represents the time-series dimension and C represents the channel dimension; in this process the time-series dimension equals the sliding-window length k used in data slicing, and each channel corresponds to one sensor, of which there are N;
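Steps 2.1 and 2.2 can be sketched as below (Python/NumPy; the one-hot step follows the claim, while the random matrix standing in for the learned embedding table, and all sizes, are assumptions):

```python
import numpy as np

def one_hot_embed(slice_ids, n_sensors):
    """One-hot encode each event in a slice of length k, giving a k x N
    matrix (T = k rows, C = N sensor channels), as in steps 2.1-2.2."""
    emb = np.zeros((len(slice_ids), n_sensors))
    emb[np.arange(len(slice_ids)), slice_ids] = 1.0
    return emb

rng = np.random.default_rng(3)
W = rng.standard_normal((5, 8))          # stand-in for a learned embedding table
E = one_hot_embed([2, 0, 3, 2], n_sensors=5)   # 4 x 5 one-hot matrix
X = E @ W                                # 4 x 8 continuous characterization vectors
```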
step 2.3, adding position codes: constructing a matrix PE with the same dimensions as the embedding matrix, where the rows of PE represent time-sequence samples and the columns represent sensors; each value in PE is obtained by the following formulas:

PE(pos, 2i) = sin(pos / 10000^(2i/d_model))

PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))

where PE denotes the position-coding matrix, pos denotes the serial number corresponding to the sensor, i denotes the position within the row vector, and d_model denotes the dimension of the row vector;
adding the PE matrix and the embedded matrix to obtain a new eigenvector matrix introduced with position coding;
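The sinusoidal position coding of step 2.3 can be computed as in the following sketch (NumPy; an even d_model is assumed so the sin/cos columns pair up, and the zero embedding matrix is only a stand-in):

```python
import numpy as np

def positional_encoding(k, d_model):
    """k x d_model PE matrix: sin on even columns 2i, cos on odd columns 2i+1."""
    pe = np.zeros((k, d_model))
    pos = np.arange(k)[:, None]                 # sequence position
    two_i = np.arange(0, d_model, 2)[None, :]   # even dimension index 2i
    angle = pos / np.power(10000.0, two_i / d_model)
    pe[:, 0::2] = np.sin(angle)
    pe[:, 1::2] = np.cos(angle)
    return pe

pe = positional_encoding(k=8, d_model=16)
embedded = np.zeros((8, 16))        # stand-in for the embedding matrix
features = embedded + pe            # PE added to the embedding element-wise
```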
step 2.4, inputting m row vectors of the new feature vector matrix into the Encoder, where m equals the batch size set for the Transformer network;
step 2.5, the vectors entering the Encoder are first passed to a multi-head attention layer to obtain new characterization vectors; attention values under the different attention heads are computed separately by the multi-head attention mechanism, so that the network focuses on the key frames that contribute most to behavior identification; the computation is:

Attention(Q, K, V) = softmax(QK^T / √d_k) V

head_i = Attention(QW_i^Q, KW_i^K, VW_i^V)

MultiHead(Q, K, V) = Concat(head_1, …, head_h)

where Q, K, and V denote the Query, Key, and Value vectors of the attention mechanism: the Query vector represents the sample attributes to be matched, the Key vector represents the attributes of a sample, and the Value vector represents the information contained in a sample;
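The multi-head attention of step 2.5 can be sketched as follows (NumPy; this omits the usual final output projection and slices one shared projection per matrix into heads, both of which are simplifying assumptions, as are all sizes):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d_k)) @ V

def multi_head(X, Wq, Wk, Wv, h):
    """Project X to Q, K, V, split into h heads, attend per head, concat."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_h = Q.shape[-1] // h
    heads = [attention(Q[:, j*d_h:(j+1)*d_h],
                       K[:, j*d_h:(j+1)*d_h],
                       V[:, j*d_h:(j+1)*d_h]) for j in range(h)]
    return np.concatenate(heads, axis=-1)

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 16))                 # 8 events, 16-dim vectors
Wq, Wk, Wv = (rng.standard_normal((16, 16)) for _ in range(3))
out = multi_head(X, Wq, Wk, Wv, h=4)             # shape (8, 16)
```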
step 2.6, normalizing the new characterization vectors generated by the attention layer in step 2.5 through Layer Normalization: summing the input matrix of step 2.4 with the matrix obtained in step 2.5 and normalizing the result to obtain a new matrix;
step 2.7, passing the matrix obtained in step 2.6 to the feed-forward neural network (Feed Forward) for processing, obtaining a reinforced characterization vector matrix;
step 2.8, connecting the reinforced characterization vector matrix obtained in step 2.7 to a normalization layer and unitizing the matrix elements row by row to obtain a normalized matrix;
step 2.9, passing the output normalized matrix on to the next Encoder; the last Encoder yields the final feature matrix.
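Steps 2.6 through 2.8 (residual sum plus normalization around the attention output, then the feed-forward network, then normalization again) can be sketched as below (NumPy; the ReLU MLP for the feed-forward network, the absence of learned scale/shift in the normalization, and all sizes are assumptions):

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    """Row-wise normalization used in steps 2.6 and 2.8."""
    return (x - x.mean(axis=-1, keepdims=True)) / (x.std(axis=-1, keepdims=True) + eps)

def feed_forward(x, W1, b1, W2, b2):
    """Step 2.7: position-wise feed-forward network (ReLU MLP assumed)."""
    return np.maximum(0.0, x @ W1 + b1) @ W2 + b2

def encoder_tail(x, attn_out, W1, b1, W2, b2):
    """Steps 2.6-2.8: residual sum + norm, feed-forward, residual sum + norm."""
    h = layer_norm(x + attn_out)
    return layer_norm(h + feed_forward(h, W1, b1, W2, b2))

rng = np.random.default_rng(1)
x = rng.standard_normal((8, 16))          # step 2.4 input matrix
attn = rng.standard_normal((8, 16))       # stand-in for the step 2.5 output
y = encoder_tail(x, attn,
                 rng.standard_normal((16, 32)), np.zeros(32),
                 rng.standard_normal((32, 16)), np.zeros(16))
```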
6. The method for recognizing multi-user behaviors based on the Transformer network as claimed in claim 1, wherein the number of row vectors input into the Encoder in step 2.4 is m, and the value of m is the batch size set for the Transformer network.
7. The method for recognizing multi-user behaviors based on the Transformer network as claimed in claim 1, wherein the network comprises 6 sequentially arranged Encoders.
8. The method for recognizing multi-user behaviors based on the Transformer network as claimed in claim 5, wherein step 3 specifically comprises:
step 3.1, inputting the T×C feature matrix obtained in step 2.9 into the fully connected layer, where it is automatically flattened into a one-dimensional vector of length T×C;
step 3.2, mapping the T×C one-dimensional feature vector to the sample label space through the fully connected layer to obtain a classification result vector, whose elements are the per-class values obtained by weighted summation of the features;
step 3.3, adopting a Softmax function as the classifier of the fully connected layer, mapping the inputs of the fully connected neurons to the output, and converting each output value of the classification result vector into a probability to obtain the final classification vector Yt; the difference between the expected output and the actual output is measured by taking their Cross Entropy as the loss function;
step 3.4, the whole improved Transformer-based network model finally outputs the classification vector Yt, which contains both user identification information and activity identification information: the first a elements represent the corresponding residents, the following b elements represent the corresponding behavior activities, and the value of each element represents the probability of identifying the corresponding resident or activity.
9. The method for recognizing multi-user behaviors based on the Transformer network as claimed in claim 1, wherein the dimension of the vector Yt in step 3.4 is the sum of the number of residents a and the number of activities b.
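The classification head of steps 3.1-3.4 can be sketched as follows (NumPy; applying Softmax separately to the first a resident logits and the last b activity logits is one plausible reading of the claims, and all sizes and weights here are hypothetical):

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def classify(features, W, bias, a, b):
    """Flatten the T x C features (step 3.1), apply the fully connected
    layer (step 3.2), and convert logits to probabilities (step 3.3) so Yt
    holds a resident probabilities followed by b activity probabilities."""
    z = features.reshape(-1) @ W + bias
    return np.concatenate([softmax(z[:a]), softmax(z[a:])])

rng = np.random.default_rng(2)
T, C, a, b = 8, 16, 3, 5                       # hypothetical sizes
feats = rng.standard_normal((T, C))            # final feature matrix
W = rng.standard_normal((T * C, a + b))        # fully connected weights
Yt = classify(feats, W, np.zeros(a + b), a, b) # dimension a + b = 8
```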
CN202110312085.7A 2021-03-24 2021-03-24 Multi-user behavior identification method based on Transformer network Pending CN113033657A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110312085.7A CN113033657A (en) 2021-03-24 2021-03-24 Multi-user behavior identification method based on Transformer network


Publications (1)

Publication Number Publication Date
CN113033657A true CN113033657A (en) 2021-06-25

Family

ID=76473181


Country Status (1)

Country Link
CN (1) CN113033657A (en)


Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110020623A (en) * 2019-04-04 2019-07-16 中山大学 Physical activity identifying system and method based on condition variation self-encoding encoder
CN112464861A (en) * 2020-12-10 2021-03-09 中山大学 Behavior early recognition method, system and storage medium for intelligent human-computer interaction


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
CHEN WEI; HE JIAHUAN; PEI XIPING: "Power Quality Disturbance Classification Based on Phase Space Reconstruction and Convolutional Neural Network", Power System Protection and Control, no. 14, 16 July 2018 (2018-07-16), pages 99 - 93 *

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113255597A (en) * 2021-06-29 2021-08-13 南京视察者智能科技有限公司 Transformer-based behavior analysis method and device and terminal equipment thereof
CN113256637A (en) * 2021-07-15 2021-08-13 北京小蝇科技有限责任公司 Urine visible component detection method based on deep learning and context correlation
CN113256637B (en) * 2021-07-15 2021-11-05 北京小蝇科技有限责任公司 Urine visible component detection method based on deep learning and context correlation
CN113627266A (en) * 2021-07-15 2021-11-09 武汉大学 Video pedestrian re-identification method based on Transformer space-time modeling
CN113627266B (en) * 2021-07-15 2023-08-18 武汉大学 Video pedestrian re-recognition method based on transform space-time modeling
CN113397572A (en) * 2021-07-23 2021-09-17 中国科学技术大学 Surface electromyographic signal classification method and system based on Transformer model
CN113688871A (en) * 2021-07-26 2021-11-23 南京信息工程大学 Transformer-based video multi-label action identification method
CN113688871B (en) * 2021-07-26 2022-07-01 南京信息工程大学 Transformer-based video multi-label action identification method
CN113936339B (en) * 2021-12-16 2022-04-22 之江实验室 Fighting identification method and device based on double-channel cross attention mechanism
CN113936243A (en) * 2021-12-16 2022-01-14 之江实验室 Discrete representation video behavior identification system and method
CN113936339A (en) * 2021-12-16 2022-01-14 之江实验室 Fighting identification method and device based on double-channel cross attention mechanism
CN115937990A (en) * 2023-02-27 2023-04-07 珠海金智维信息科技有限公司 Multi-person interactive action detection system and method
CN115937990B (en) * 2023-02-27 2023-06-23 珠海金智维信息科技有限公司 Multi-person interaction detection system and method
CN116127364A (en) * 2023-04-12 2023-05-16 上海术理智能科技有限公司 Integrated transducer-based motor imagery decoding method and system
CN116502069A (en) * 2023-06-25 2023-07-28 四川大学 Haptic time sequence signal identification method based on deep learning
CN116502069B (en) * 2023-06-25 2023-09-12 四川大学 Haptic time sequence signal identification method based on deep learning
CN117576150A (en) * 2023-11-03 2024-02-20 扬州万方科技股份有限公司 Multi-mode multi-target 3D tracking method and device considering far-frame dependency relationship

Similar Documents

Publication Publication Date Title
CN113033657A (en) Multi-user behavior identification method based on Transformer network
CN111898736B (en) Efficient pedestrian re-identification method based on attribute perception
CN111860600B (en) User electricity utilization characteristic selection method based on maximum correlation minimum redundancy criterion
CN105654037A (en) Myoelectric signal gesture recognition method based on depth learning and feature images
CN111859010B (en) Semi-supervised audio event identification method based on depth mutual information maximization
CN110188653A (en) Activity recognition method based on local feature polymerization coding and shot and long term memory network
CN110108914A (en) One kind is opposed electricity-stealing intelligent decision making method, system, equipment and medium
CN110287863A (en) A kind of gesture identification method based on WiFi signal
CN108392213B (en) Psychological analysis method and device based on painting psychology
CN112464730B (en) Pedestrian re-identification method based on domain-independent foreground feature learning
CN110119545A (en) A kind of non-intrusive electrical load recognition methods based on stack self-encoding encoder
Yang et al. Auroral sequence representation and classification using hidden Markov models
CN110245707B (en) Human body walking posture vibration information identification method and system based on scorpion positioning
CN114176607B (en) Electroencephalogram signal classification method based on vision transducer
Li et al. Dating ancient paintings of Mogao Grottoes using deeply learnt visual codes
CN113069117A (en) Electroencephalogram emotion recognition method and system based on time convolution neural network
CN109241870B (en) Coal mine underground personnel identity identification method based on gait identification
CN115457403A (en) Intelligent crop identification method based on multi-type remote sensing images
Hagiwara et al. BEANS: The benchmark of animal sounds
CN117540908A (en) Agricultural resource integration method and system based on big data
CN107045624B (en) Electroencephalogram signal preprocessing and classifying method based on maximum weighted cluster
Tang et al. Transound: Hyper-head attention transformer for birds sound recognition
CN110889335A (en) Human skeleton double-person interaction behavior recognition method based on multi-channel space-time fusion network
CN113705695A (en) Power distribution network fault data identification method based on convolutional neural network
CN117272230A (en) Non-invasive load monitoring method and system based on multi-task learning model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination