CN113587935A - Indoor scene understanding method based on radio frequency signal multitask learning network - Google Patents


Info

Publication number: CN113587935A (granted as CN113587935B)
Application number: CN202110891904.8A
Authority: CN (China)
Original language: Chinese (zh)
Inventors: 王林, 王新雨, 高畅, 石中玉, 张德安, 厉斌斌
Assignee (original and current): Yanshan University
Application filed by Yanshan University; priority to CN202110891904.8A
Legal status: Active (granted)

Classifications

    • G: PHYSICS
    • G01: MEASURING; TESTING
    • G01C: MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C 21/00: Navigation; Navigational instruments not provided for in groups G01C 1/00 - G01C 19/00
    • G01C 21/20: Instruments for performing navigational calculations
    • G01C 21/206: Instruments for performing navigational calculations specially adapted for indoor navigation
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/08: Learning methods
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00: Network arrangements or protocols for supporting network services or applications
    • H04L 67/01: Protocols
    • H04L 67/12: Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks


Abstract

The invention relates to the technical field of behavior perception, and in particular to an indoor scene understanding method based on a radio frequency signal multitask learning network, comprising the following steps. Data acquisition: collecting channel state information with a wireless network card carrying an Atheros chipset. Data preprocessing: filtering the noise contained in the original signal, synthesizing the multi-link data after denoising, standardizing the data format, and constructing the input data set of the neural network. Multitask identification network: achieving indoor scene understanding with the multitask learning network wisnet, which comprises a shared representation layer and, sharing gradient information between tasks through that layer, a domain identification network Dom_Net, a position identification network Loc_Net and a behavior identification network Act_Net. The method uses multi-task learning to simultaneously identify the scene of the user, including the domain, position and action, and perceives the user from multiple angles so as to understand the meaning of the user's behavior.

Description

Indoor scene understanding method based on radio frequency signal multitask learning network
Technical Field
The invention relates to the technical field of behavior perception, in particular to an indoor scene understanding method based on a radio frequency signal multitask learning network.
Background
When behavior perception is achieved with commercial WiFi, action semantics are often closely tied to the scene in which the action occurs, and single action recognition cannot meet the need to understand the semantics of actions in specific scenes. This patent designs and realizes a scene understanding multi-task learning method based on channel state information. The method uses an attention mechanism to assign different weights to signals from different sources and a multi-task learning network to mine hidden information, and has strong cross-domain capability and extensibility.
There have been many mature efforts on WiFi-based behavior perception and indoor positioning. However, in an indoor home environment, the user's actions cannot be separated from the environment and location in which they occur: the same or similar actions may carry distinct semantics in different environments. For example, when lying down, the user is most likely sleeping if in a bed in a bedroom, while lying on the floor of a living room may mean that the user has fallen, fainted or, more seriously, died. To avoid such misunderstandings in a home environment, it is important to distinguish the semantics of the same or similar actions. Especially when monitoring elderly people living alone, knowing their position first makes it possible to judge their actions and better understand their behavior, avoiding unnecessary misunderstandings. In an AR game, the same action performed at different positions may represent different operations of a game character; if the action can be recognized together with the position and the current area of the user, the user's behavior can be given a definite semantic meaning, and the scenes AR can support become much richer. The environment and location of the user constrain the actions the user can perform; in other words, the actions the user performs are a reflection of the environment and location of the user and should not be split apart.
Existing contact-based sensing, such as wearable devices, is limited by battery capacity: the devices stop working once the batteries are exhausted, and frequent charging undoubtedly burdens users. Contactless sensing devices such as RFID, millimeter-wave and infrared sensors are expensive to manufacture and are better suited to places with heavy foot traffic, such as shopping malls, airports and stations. The ubiquity of WiFi in the home environment lets it avoid these application-scenario limitations, and WiFi is low cost and can be deployed at scale. Many key technologies for indoor behavior semantic understanding based on WiFi signals still need breakthroughs: accurate action semantic understanding requires not only identifying user behavior, but also the support of information about the domain and position of the user. There is currently no related work that fuses these three dimensions of information together.
Disclosure of Invention
In order to solve the problems, the invention provides an indoor scene understanding method based on a radio frequency signal multitask learning network.
In order to achieve the purpose, the invention adopts the technical scheme that:
an indoor scene understanding method based on a radio frequency signal multitask learning network comprises the following steps,
step 1, data acquisition: collecting channel state information with a wireless network card carrying an Atheros chipset;
step 2, data preprocessing: filtering noise contained in an original signal, synthesizing multilink data after denoising is finished, standardizing a data format, and constructing an input data set of a neural network;
step 3, multitask identification network: indoor scene understanding is achieved with the multitask learning network wisnet, which comprises a shared representation layer and, sharing gradient information between tasks through that layer, a domain identification network Dom_Net, a position identification network Loc_Net and a behavior identification network Act_Net.
Preferably:
in step 1, the data acquisition equipment comprises two computers, two routers carrying Atheros wireless network cards and a network cable, wherein the computers are connected with the routers through the network cable, the router system can be accessed through a notebook computer, the setting of parameters such as mode, center frequency, packet sending rate and the like is completed, and a signal sending instruction and a signal receiving instruction are transmitted to the routers; the two routers control the sending and receiving of CSI signals according to commands sent by terminals, the commands comprise destination addresses and the number of sending packets, each router is provided with two pairs of receiving and sending end antennas, the sending rate of the sending end is 500 packets/second, the bandwidth is 20MHZ, and the center frequency adopts 2.4 GHZ.
Preferably, in step 2, the denoising method is wavelet decomposition and reconstruction in wavelet transform, single-scale wavelet transform analysis is performed on the amplitude of CSI by using db3 wavelet, and one subcarrier data in an original signal is randomly selected to perform db3 wavelet coefficient decomposition and reconstruction, so as to complete noise filtering.
Preferably, in step 2, all link data of the two pairs of transceivers are synthesized into the data format (2000, 56, 4), and the synthesized data, together with its three corresponding labels (domain, position and action), generates the data set.
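The link-synthesis step can be sketched as follows with stand-in data; the shapes come from the (2000, 56, 4) format described above (2000 packets, 56 subcarriers, 4 transceiver links), while the function name is illustrative:

```python
import numpy as np

def synthesize_links(links):
    """Stack per-link CSI amplitude matrices into one
    (packets, subcarriers, links) array.

    `links` is a list of 4 arrays, each shaped (2000, 56): one matrix of
    CSI amplitudes per transmit/receive antenna pair.
    """
    assert all(l.shape == links[0].shape for l in links)
    return np.stack(links, axis=-1)  # -> (2000, 56, 4)

# Example with random stand-in data in place of real CSI amplitudes:
links = [np.random.rand(2000, 56) for _ in range(4)]
sample = synthesize_links(links)
print(sample.shape)  # (2000, 56, 4)
```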
Preferably, in step 3, the domain identification network Dom_Net uses a convolutional attention mechanism based on minimum pooling, which gives more weight to information with smaller amplitude values, to distinguish different domains; the behavior recognition network Act_Net uses a convolutional attention mechanism based on maximum pooling, which gives more weight to information with larger amplitude values, to distinguish different actions.
Preferably, the input data set obtained in step 2 is input into a convolution attention mechanism AM, which comprises a channel attention module and a spatial attention module.
Preferably, the input data set obtained in step 2 is put through a normal convolution operation while an attention mechanism is added, and the channel attention module is expressed as follows:
Mc(F)=σ(MLP(AvgPool(X))+MLP(MinPool(X))),
wherein X is the input data of the neural network, AvgPool and MinPool are the average pooling layer and the minimum pooling layer respectively, MLP is a shared layer that realizes data dimension reduction and feature extraction mainly through convolution operations, and σ is the corresponding Sigmoid activation function. The channel attention module compresses the feature map in the spatial dimension, considering only the features inside each channel: while the convolution operation is carried out, the input feature map passes through the module's global average pooling layer and global minimum pooling layer respectively. The average pooling layer has feedback for every feature point and keeps the background information in the feature map, while in the gradient back-propagation through the minimum pooling layer only the feature points with the smallest response on the feature map receive gradient feedback. The two feature maps from the average pooling layer and the minimum pooling layer are input into the shared layer MLP to realize dimension reduction and feature extraction, compressing the spatial dimension of the feature maps; the outputs of the MLP are added and activated by the sigmoid function to obtain the channel attention matrix, and an element-wise product of this result with the convolved feature matrix gives the adjusted feature F′;
the spatial attention module compresses the channels and is expressed as:
Ms(F′)=σ(f^{n×n}([AvgPool(F′); MinPool(F′)])),
wherein F′ is the feature after the channel attention mechanism, f^{n×n} denotes a two-dimensional convolution with a kernel of size n×n, AvgPool extracts the average value across the channels, and MinPool extracts the minimum value across the channels; the feature matrices extracted by the average pooling layer and the minimum pooling layer are concatenated and, after the convolution layer, activated by sigmoid to obtain the spatial attention matrix (Spatial Attention); performing an element-wise product of the spatial attention matrix with the adjusted feature F′ gives:
CA=Mc(F)·Ms(F′),
where CA is the result of adding the attention mechanism on the basis of the CNN; in the specific domain identification application, CA includes the background information in the collected data and characterizes the domain where the current user is located; when the network includes multiple layers, CA is iterated as input into the calculation of the next layer.
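A minimal NumPy sketch of the two attention modules follows; it assumes a (channels, height, width) feature layout, reduces the n×n convolution of the spatial module to a 1×1 weighted sum for brevity, and all weight shapes and names are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(F, W1, W2):
    """Mc(F) = sigmoid(MLP(AvgPool(F)) + MLP(MinPool(F))).

    F: feature map of shape (C, H, W). W1 (C x C//r) and W2 (C//r x C)
    form the shared two-layer MLP. Global average and global *minimum*
    pooling are applied per channel, as described for Dom_Net.
    """
    avg = F.mean(axis=(1, 2))                   # (C,) global average pool
    mn = F.min(axis=(1, 2))                     # (C,) global minimum pool
    mlp = lambda v: np.maximum(v @ W1, 0) @ W2  # shared MLP (ReLU hidden)
    return sigmoid(mlp(avg) + mlp(mn))          # (C,) channel weights

def spatial_attention(Fp, w_avg=0.5, w_min=0.5, b=0.0):
    """Ms(F') = sigmoid(conv([AvgPool(F'); MinPool(F')])).

    Fp: (C, H, W). The channel-wise average and minimum maps are combined;
    for brevity the n x n convolution is reduced to a 1 x 1 weighted sum.
    """
    avg_map = Fp.mean(axis=0)                   # (H, W)
    min_map = Fp.min(axis=0)                    # (H, W)
    return sigmoid(w_avg * avg_map + w_min * min_map + b)

# Toy usage on a random feature map (C=4, H=5, W=6):
rng = np.random.default_rng(0)
F = rng.normal(size=(4, 5, 6))
W1 = rng.normal(size=(4, 2))
W2 = rng.normal(size=(2, 4))
Mc = channel_attention(F, W1, W2)
Fp = F * Mc[:, None, None]                     # adjusted feature F'
CA = Fp * spatial_attention(Fp)[None, :, :]    # attention-weighted output
print(CA.shape)  # (4, 5, 6)
```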
Preferably, the shared representation layer comprises two layers of convolution, and after each convolution operation the shared representation layer applies a batch normalization layer and a leaky rectified linear unit (Leaky ReLU) to avoid gradient vanishing and gradient explosion.
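The batch-normalization-plus-Leaky-ReLU structure applied after each shared convolution can be sketched as below; the shapes are illustrative stand-ins for a convolution output:

```python
import numpy as np

def batch_norm(x, eps=1e-5, gamma=1.0, beta=0.0):
    """Per-feature batch normalization over the batch axis."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    return gamma * (x - mu) / np.sqrt(var + eps) + beta

def leaky_relu(x, alpha=0.01):
    """Leaky ReLU keeps a small slope for negative inputs, which helps
    avoid vanishing gradients in the shared representation layer."""
    return np.where(x > 0, x, alpha * x)

# After each convolution, the shared layer applies BN then Leaky ReLU:
rng = np.random.default_rng(0)
batch = rng.normal(size=(8, 16))   # (batch, features), stand-in conv output
out = leaky_relu(batch_norm(batch))
print(out.shape)  # (8, 16)
```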
Preferably, in step 3, using the wisnet network structure, the calculation process from data input to output is as follows:
the original dataset is D = {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)}, where each label y_i comprises the domain, position and action labels of the sample x_i;
x_i passes through the two hard-shared layers to obtain the shared-layer output S_i:
S_i = LeakyReLU(f(Σ_{i∈D} x_i * k_s^i + b_s^i)),
wherein k is the corresponding convolution kernel parameter and b is the offset; the convolved x_i is activated by LeakyReLU, and k and b are shared among the three tasks; during gradient updating, the gradient information of the shared parameters is returned together with the task-specific gradient information;
to judge the domain where the user is located, the network structure of Dom_Net is used: the shared-layer output S_i is first convolved; because a changing data distribution in the intermediate layers can cause gradients to vanish or explode during training, and to speed up training, the convolved feature is passed through a batch normalization layer BN; after LeakyReLU activation, maximum pooling yields the one-dimensional convolution result F_dom;
the minimum value on each channel is then extracted, i.e. the channel attention mechanism CA is added; next the channel information is compressed, i.e. the spatial attention mechanism SA is added; after these two steps, the convolved data is output through a linear fully connected layer to obtain the prediction ŷ_dom = softmax(W_dom F + b_dom), wherein W_dom and b_dom are the weight matrix and bias matrix updated at each iteration of the fully connected layer; the index of the maximum value in each row of ŷ_dom is the output predicted by the network, and the corresponding loss function L_dom is the cross-entropy between ŷ_dom and the domain label;
similarly, S_i passes through the three convolutional layers of Act_Net to obtain the output F_act; since signals with larger change amplitude contain more user behavior information, an attention mechanism consisting of an average pooling layer and a maximum pooling layer is added; after the attention mechanism, a linear fully connected layer yields the prediction ŷ_act, and the corresponding loss function L_act is likewise a cross-entropy;
relatively speaking, the network structure of Loc_Net is simple: because the CNN convolutional neural network is sensitive to spatial information, positions can be identified well without adding an attention mechanism; the output of the Loc_Net convolutional layers passes through a batch normalization layer and an activation function layer to obtain F_loc; S_i thus passes through two convolutional layers and finally a fully connected layer to obtain the prediction ŷ_loc; likewise, the loss function of Loc_Net is the cross-entropy L_loc;
since the shared layer is embedded in each sub-network, the loss returned in each sub-network contains both task-specific gradient information and gradient information from the shared layer, i.e. the parameter set θ contains the two parts θ_sh and θ_i; the optimization objective of wisnet is
min_θ Σ_i L_i, where L_i ∈ {L_dom, L_act, L_loc},
and the parameters are updated to minimize this objective;
the final output of wisnet is the output of the three networks, ŷ_dom, ŷ_loc and ŷ_act, corresponding respectively to the domain where the user is located, the position within that domain, and the executed action; from the domain and position information, the specific meaning of the action can be deduced.
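The summed multi-task objective can be sketched as follows, under the assumption (not spelled out in the formulas above) that each task head is a softmax classifier with a cross-entropy loss; the class counts in the example are illustrative:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def cross_entropy(probs, label):
    """Negative log-likelihood of the true class."""
    return -np.log(probs[label] + 1e-12)

def wisnet_objective(logits_dom, logits_loc, logits_act, y_dom, y_loc, y_act):
    """Sum of the three task losses; minimizing it updates both the
    task-specific parameters and the shared-layer parameters (hard sharing)."""
    return (cross_entropy(softmax(logits_dom), y_dom)
            + cross_entropy(softmax(logits_loc), y_loc)
            + cross_entropy(softmax(logits_act), y_act))

# Toy example: 2 domains, 9 positions, 4 actions (counts are illustrative)
rng = np.random.default_rng(7)
loss = wisnet_objective(np.array([2.0, 0.1]), rng.normal(size=9),
                        rng.normal(size=4), 0, 3, 1)
print(loss > 0)  # True
```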
The beneficial effects of the invention are as follows:
the method of minimum pooling in the field of image recognition is less used, mainly because in the representation RGB of the image, 000 represents black, and the smaller the value the closer to black. The picture information extracted by the minimum pooling is background information with few characteristics, and the characteristics have no significance. But in the field of signal processing 0 is of practical significance. The signal reflected from different domains will have different amplitudes due to different room locations and furnishings. Starting from this, different domains are distinguished based on the amplitude level when the space is relatively stationary. Using a convolutional attention mechanism based on minimal pooling may give more weight to information with smaller amplitude values. Thereby neglecting the influence of information with large amplitude fluctuation. In the behavior perception technology based on the CSI, a method of multitask concurrency never exists. Aiming at the problem of multi-task scene understanding, the multi-task learning network structure wisnet based on the hard sharing mechanism provided by the patent utilizes the sharing mechanism of the convolutional layer to extract hidden information among subtasks, and provides possibility for cross-scene action identification and indoor positioning.
From the above, the advantages of the present invention are:
(1) The system distinguishes the different meanings of the same action in different scenes and positions, solving the problem that traditional methods cannot realize behavior semantic understanding.
(2) The system takes the carrier signals of all transceiver links as network input, defines an effective data splicing format, and uses indoor multipath information more effectively.
(3) The system provides a multi-task scene understanding network wisnet based on CSI, and behavior recognition and indoor positioning can be carried out under multiple scenes without retraining a model.
Drawings
Fig. 1 is a flowchart of an indoor scene understanding method based on a radio frequency signal multitask learning network according to the present invention.
Fig. 2 is a diagram of actions performed by volunteers in an indoor scene understanding method based on a radio frequency signal multitask learning network according to the present invention.
Fig. 3 is a schematic view of a hall scene in the indoor scene understanding method based on the radio frequency signal multitask learning network.
Fig. 4 is a schematic view of an office scene of the indoor scene understanding method based on the radio frequency signal multitask learning network.
Fig. 5 is a schematic diagram of wavelet reconstruction signals of different scales of the indoor scene understanding method based on the radio frequency signal multitask learning network.
Fig. 6 is a schematic diagram of data set construction of an indoor scene understanding method based on a radio frequency signal multitask learning network.
Fig. 7 is a diagram of a wisnet network structure for understanding an indoor scene based on a radio frequency signal multitask learning network according to the present invention.
Fig. 8 is a structural diagram of the Dom_Net attention mechanism of the indoor scene understanding method based on the radio frequency signal multitask learning network.
FIG. 9 is a schematic diagram of sub-network accuracy and loss value in the training process of the indoor scene understanding method based on the radio frequency signal multitask learning network.
Fig. 10 is a schematic diagram of a wisnet confusion matrix of an indoor scene understanding method based on a radio frequency signal multitask learning network.
Fig. 11 is a schematic diagram of wisnet performance evaluation of an indoor scene understanding method based on a radio frequency signal multitask learning network.
Fig. 12 is a comparison graph of the training accuracy of different Act_Net structures in the indoor scene understanding method based on the radio frequency signal multitask learning network.
Fig. 13 is a comparison graph of the Act_Net indexes under different networks in the indoor scene understanding method based on the radio frequency signal multitask learning network.
Detailed Description
In order to make the purpose, technical solution and advantages of the present technical solution more clear, the present technical solution is further described in detail below with reference to specific embodiments. It should be understood that the description is intended to be exemplary only, and is not intended to limit the scope of the present teachings.
As shown in figs. 1 to 8, the present embodiment provides an indoor scene understanding method based on a radio frequency signal multitask learning network. It focuses on semantic understanding of cross-domain actions, a key technology in the intelligent perception field, and provides the indoor-wireless-signal scene understanding system architecture Wi-sys shown in fig. 1. Wi-sys comprises three parts: data acquisition, data preprocessing and the multitask identification network. First, a wireless network card with an Atheros chipset collects Channel State Information (CSI). Then the noise contained in the original signal is filtered, the multi-link data are synthesized after denoising, the data format is standardized, and the input data set of the neural network is constructed. Finally, indoor scene understanding is achieved by the multitask learning network wisnet, which comprises a shared representation layer, the domain identification network Dom_Net, the position identification network Loc_Net and the behavior identification network Act_Net.
Data acquisition
The equipment for acquiring the experimental data comprises two notebook computers, two routers carrying Atheros wireless network cards, and two 5-meter network cables. Each computer is connected to a router through a network cable; the router system can be accessed from the notebook computer to set parameters such as mode, center frequency and packet sending rate, and to transmit signal sending and receiving instructions to the router. The two routers control the transmission and reception of the CSI signals according to the commands sent by the terminals; each command includes a destination address and the number of packets to be transmitted. Each router has two pairs of transceiver antennas, the packet transmission rate at the transmitting end is 500 packets/second, the bandwidth is 20 MHz, and the center frequency is 2.4 GHz.
In the experimental setup, the volunteers performed the actions shown in fig. 2, including squatting, stooping, walking, raising hands and other actions common in daily life. Each volunteer performed each action 10 times at each location in the field, with a sample time of approximately 4.5 seconds per action; each collected sample consists of 2300 CSI packets.
The scene shown in fig. 3 is a relatively open hall in a teaching building; a few tables are distributed around the hall and there are many windows around it. The routers are 85 cm above the ground, each position fingerprint block measures 1.2 × 1.2 m, and each domain comprises 9 positions, numbered 1-9. The domain size is about 13 square meters. While the CSI was being collected, pedestrians passed through, bringing some interference to the effective signal. Fig. 4 shows a conference room, in which the desks and chairs are arranged closely and the wall area is large. Its space is larger and its surroundings more complex than the hall scene shown in fig. 3. After a signal is sent out, it is reflected more times by static objects in the environment such as tables, chairs and walls, so the collected CSI signal contains more uncertain factors. The volunteers performed the actions shown in fig. 2, including actions common in daily life such as squatting, bending, walking and raising hands. Each volunteer performed each action 10 times at each location in the field, with a sample time of approximately 4.5 seconds per action; each collected sample consists of 2300 CSI packets.
Data pre-processing
In the acquisition and denoising part, the CSI is collected with an Atheros network card. On the way from the transmitting end to the receiving end, the signal is reflected, diffracted and scattered by furniture, other static objects and human bodies. The devices themselves may also vibrate, and other devices transmitting wireless signals in the home environment can interfere with CSI propagation, leading to packet loss, delay and noise during double-ended transmission that can easily submerge the effective signal. The data therefore need to be denoised before effective features are extracted from the CSI signal. The denoising method used here is wavelet decomposition and reconstruction: a db3 wavelet performs a single-scale wavelet transform analysis of the CSI amplitudes. One subcarrier of the original signal is selected at random for db3 wavelet coefficient decomposition and reconstruction, with the result shown in fig. 5. As the reconstruction scale increases, the signal becomes smoother. With scale-6 reconstruction, too much of the relatively high-frequency signal is lost and part of the signal no longer matches the original, so the a5-scale reconstruction is selected.
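The wavelet decomposition-and-reconstruction denoising can be sketched as follows; to keep the example self-contained a Haar wavelet stands in for the db3 wavelet used in the patent (in practice one would use, e.g., PyWavelets' `wavedec`/`waverec` with `'db3'`), and the signal is a synthetic stand-in for a CSI subcarrier amplitude:

```python
import numpy as np

def haar_denoise(signal, levels=5):
    """Keep only the level-`levels` approximation of a 1-D signal.

    Decomposes with a Haar wavelet, zeroes all detail coefficients, and
    reconstructs: the analogue of keeping the a5 approximation above.
    """
    x = np.asarray(signal, dtype=float)
    n_steps = 0
    for _ in range(levels):
        if len(x) % 2:                        # pad to even length
            x = np.append(x, x[-1])
        approx = (x[0::2] + x[1::2]) / np.sqrt(2)
        x = approx
        n_steps += 1
    # Reconstruct with the detail coefficients zeroed (pure approximation).
    for _ in range(n_steps):
        up = np.empty(2 * len(x))
        up[0::2] = x / np.sqrt(2)
        up[1::2] = x / np.sqrt(2)
        x = up
    return x

# A noisy sine (stand-in for one subcarrier) smooths out after denoising:
rng = np.random.default_rng(1)
t = np.linspace(0, 1, 512)
noisy = np.sin(2 * np.pi * 3 * t) + 0.3 * rng.normal(size=512)
smooth = haar_denoise(noisy, levels=5)
print(smooth.shape)  # (512,)
```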
During the experiments, the data set construction part observed that even when the same volunteer performs the same action at the same position in the same domain, the signals collected by different transceiver devices differ, as shown in fig. 5: even for the same receiving end, the amplitude intervals and data change patterns from different transmitting ends differ. Different transceiver links form different perspectives on the human body's variation in space, and common sense tells us that the richer the viewing angles, the more comprehensive and true the changes we see. To better use the data redundancy brought by multipath while matching the input of the neural network, the data collected by the two pairs of transceiver devices are connected and spliced longitudinally into the data format (2000, 56, 4). The spliced data, together with the three corresponding labels (domain, location, action), generates the data set, whose format is shown in fig. 6.
Multitask identification network
Unlike a single-task learning network, the dataset of a multi-task learning network contains information in three dimensions: domain, position and action. Multi-task learning reads and processes the three kinds of information simultaneously and can fully mine the hidden information among the tasks. The process is completed mainly by a parameter sharing mechanism: the shared layer synthesizes the gradient information of the several tasks and updates them synchronously. The architecture of the scene-aware multitask learning neural network used here is shown in fig. 7.
In the attention mechanism, Dom_Net distinguishes different domains based on the amplitude level when the space is relatively static: a convolutional attention mechanism based on minimum pooling gives more weight to information with smaller amplitude values, thereby ignoring the influence of information with large amplitude fluctuations. Act_Net instead adds an attention mechanism based on maximum pooling, so that information with larger amplitude dominates. Different networks add different attention mechanisms to focus on different signals. The attention module employed by Dom_Net is shown in fig. 8.
The convolutional attention mechanism AM mainly comprises two parts: a channel attention module and a spatial attention module. Each channel of the feature map acts as a specialized detector; the channel attention module compresses the feature matrix along the spatial dimension to extract the feature information that deserves attention in each channel. The spatial attention module compresses the channels, integrating the features extracted by every channel over the feature dimensions of the whole data.
The input data is normally convolved while an attention mechanism is added. The channel attention module is expressed as follows:
M_c(F) = σ(MLP(AvgPool(X)) + MLP(MinPool(X))),
where X is the input data of the neural network, AvgPool and MinPool are an average pooling layer and a minimum pooling layer respectively, MLP is the shared layer, in which data dimension reduction and feature extraction are realized mainly through convolution operations, and σ is the corresponding activation function, here Sigmoid.
The channel attention module compresses the feature map along the spatial dimension, considering only the features inside each channel. While the convolution operation is performed, the input feature map passes separately through the module's global average pooling layer and global minimum pooling layer. The average pooling layer has feedback on every feature point and preserves the background information in the feature map; during gradient back-propagation, only the feature points with the smallest responses on the feature map receive gradient feedback through the minimum pooling layer, so minimum pooling selects the features on the map whose changes are least pronounced. The two pooled feature maps are fed into the shared layer MLP to realize dimension reduction and feature extraction, compressing the spatial dimension of the feature maps. The sum of the MLP outputs is activated by a sigmoid function to obtain the channel attention matrix (Channel Attention), and the result is multiplied element-wise with the convolved feature matrix to obtain the adjusted feature F′.
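As a rough illustration of the channel attention formula above, the following NumPy sketch uses a hypothetical two-layer shared MLP and toy shapes; the patent's actual layer sizes are not specified here:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x, w1, w2):
    """M_c(X) = sigmoid(MLP(AvgPool(X)) + MLP(MinPool(X))).

    x: feature map of shape (channels, height, width).
    w1, w2: weights of a shared two-layer MLP (reduce then expand).
    Returns one attention weight per channel, in (0, 1).
    """
    c = x.shape[0]
    avg = x.reshape(c, -1).mean(axis=1)            # global average pooling
    mn = x.reshape(c, -1).min(axis=1)              # global minimum pooling
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0.0)   # shared MLP, ReLU hidden
    return sigmoid(mlp(avg) + mlp(mn))             # shape (channels,)

rng = np.random.default_rng(1)
x = rng.normal(size=(8, 20, 14))       # toy feature map, 8 channels
w1 = rng.normal(size=(2, 8)) * 0.1     # reduce 8 -> 2
w2 = rng.normal(size=(8, 2)) * 0.1     # expand 2 -> 8
mc = channel_attention(x, w1, w2)
f_adj = x * mc[:, None, None]          # element-wise reweighting -> F'
print(mc.shape, f_adj.shape)
```

The minimum-pooling branch is what lets low-amplitude (background) responses keep influencing the channel weights, as the text describes for Dom_Net.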
The spatial attention module compresses the channels and comprehensively considers the relationships among them. It is expressed as:
M_s(F) = σ(f^{n×n}([AvgPool(F′); MinPool(F′)])),
where F′ is the feature after the channel attention mechanism, f^{n×n} denotes a two-dimensional convolution whose kernel has dimension n, AvgPool extracts the average value over the channels, and MinPool extracts the minimum over the channels. The feature matrices extracted by the average and minimum pooling layers are concatenated, passed through the convolution layer and activated by sigmoid to obtain the spatial attention matrix (Spatial Attention), which is multiplied element-wise with the adjusted feature F′, giving:
C_A = M_c(F)·M_s(F),

where C_A is the result of adding the attention mechanism on top of the CNN. In a specific domain identification application, C_A contains the background information in the collected data and characterizes the domain where the current user is located. When the network comprises multiple layers, C_A is iterated as input into the calculation of the next layer.
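Continuing the sketch, the spatial attention step and the combination C_A can be written as below; the n×n convolution of the spatial module is simplified here to a hypothetical 1×1 mix of the two pooled maps, so the weights and shapes illustrate only the pooling-and-gating structure, not the patent's exact kernels:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def spatial_attention(f_adj, w_avg, w_min, bias):
    """M_s(F') = sigmoid(conv([AvgPool(F'); MinPool(F')])).

    f_adj: channel-reweighted features F', shape (channels, h, w).
    The n x n convolution is reduced to a 1x1 weighting of the
    two pooled maps (w_avg, w_min) for illustration only.
    """
    avg = f_adj.mean(axis=0)   # compress channels -> (h, w)
    mn = f_adj.min(axis=0)     # minimum over channels -> (h, w)
    return sigmoid(w_avg * avg + w_min * mn + bias)

rng = np.random.default_rng(2)
f_adj = rng.normal(size=(8, 20, 14))       # toy F' from channel attention
ms = spatial_attention(f_adj, 0.7, 0.3, 0.0)
c_a = f_adj * ms[None, :, :]               # C_A: attention-refined features
print(ms.shape, c_a.shape)
```

For Act_Net, the same structure applies with `min` replaced by `max`, matching the substitution the text describes.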
The attention mechanism in Act_Net is similar to that of fig. 8, except that during use the minimum pooling is replaced by maximum pooling.
The wisnet comprises a shared representation layer, a domain identification network Dom_Net, a location identification network Loc_Net and an action identification network Act_Net. The shared representation layer comprises two convolution layers; each convolution operation is followed by a batch normalization layer and a leaky rectified linear unit (LeakyReLU) to avoid the vanishing-gradient and exploding-gradient phenomena. The network structure of the three subtasks is shown in fig. 7, and the data input-output calculation proceeds as follows:
the original dataset is D = {(x_1, y_1), (x_2, y_2), …, (x_n, y_n)}, where x_i ∈ ℝ^(2000×56×4).
Each x_i passes through the two hard-shared layers to obtain the shared-layer output S_i:

S_i = LeakyReLU(f(Σ_{i∈D} x_i * k_s^i + b_s^i)),

where k is the corresponding convolution kernel parameter and b is the bias. After convolution, x_i is activated by LeakyReLU; k and b are shared among the three tasks. During gradient updating, the task-specific gradient information is returned and the gradient information of the shared parameters is returned at the same time.
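The hard-shared trunk can be sketched as a single valid 1-D convolution followed by LeakyReLU; the kernel size, the single layer and the toy input length are simplifications, not the patent's actual configuration:

```python
import numpy as np

def leaky_relu(z, alpha=0.01):
    # LeakyReLU: identity for positives, small slope for negatives.
    return np.where(z > 0, z, alpha * z)

def shared_layer(x, k, b):
    """One hard-shared layer: valid 1-D convolution along the packet
    axis followed by LeakyReLU. x: (length, features); k: (kernel,
    features); b: scalar bias. The same k and b would be reused by
    all three task heads."""
    klen = k.shape[0]
    out = np.empty(x.shape[0] - klen + 1)
    for i in range(out.shape[0]):
        out[i] = np.sum(x[i:i + klen] * k) + b
    return leaky_relu(out)

rng = np.random.default_rng(3)
x = rng.normal(size=(100, 56))     # toy slice of one CSI sample
k = rng.normal(size=(5, 56)) * 0.05
s = shared_layer(x, k, 0.0)
print(s.shape)  # (96,)
```

Because `k` and `b` are the only parameters, any task loss that back-propagates through this layer contributes gradient to the same shared weights, which is the hard-sharing mechanism the text describes.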
To determine the domain in which the user is located, the Dom_Net structure shown in fig. 7 is used. After the shared layer, S_i is first convolved. During training, a change in the data distribution of the intermediate layers can cause vanishing or exploding gradients; to solve this problem, and at the same time to speed up training, a batch normalization layer BN is required. The output of the BN layer is activated by LeakyReLU, max-pooled, and passed through a one-dimensional convolution to obtain the result F_dom.
First, the minimum value on each channel is extracted, i.e. the channel attention mechanism CA is added; then the channel information is compressed, i.e. the spatial attention mechanism SA is added.
After the two attention steps, the convolved data x_i is output through a linear fully connected layer, where W_dom and b_dom are respectively the weight matrix and the bias matrix updated iteratively in the fully connected layer.
The index corresponding to the maximum value of each row is the output predicted by the network, and the corresponding loss function is L_dom.
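The fully connected prediction and loss step can be sketched as follows; the source gives L_dom only as an image, so a standard softmax cross-entropy is assumed here, and all sizes (16 features, 3 domains, 6 samples) are toy values:

```python
import numpy as np

def softmax(z):
    # Numerically stable row-wise softmax.
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(4)
features = rng.normal(size=(6, 16))     # flattened Dom_Net features
w_dom = rng.normal(size=(16, 3)) * 0.1  # 3 hypothetical domains
b_dom = np.zeros(3)

logits = features @ w_dom + b_dom       # linear fully connected layer
probs = softmax(logits)
pred = probs.argmax(axis=1)             # max index per row = prediction

# Assumed cross-entropy loss L_dom against integer domain labels.
domain_labels = np.array([0, 1, 2, 0, 1, 2])
l_dom = -np.log(probs[np.arange(len(domain_labels)), domain_labels]).mean()
print(pred.shape, float(l_dom) > 0)
```

The argmax over rows is exactly the "index of the maximum value of each row" the text names as the network prediction.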
In the same way, the output obtained after S_i passes through the three convolution layers of Act_Net is F_act. Since signals with a larger variation amplitude contain more user-behavior information, an attention mechanism consisting of average pooling and maximum pooling layers is added here. After the attention mechanism, a linear fully connected layer produces the prediction, and the corresponding loss function is L_act.
the network structure of Loc _ Net is relatively simple, because the CNN convolutional neural network itself is sensitive to spatial information, and thus can identify the position well without adding a attention mechanism. The output after the convolutional layer after the Loc _ Net is
Figure BDA0003196237140000173
Figure BDA0003196237140000174
F is obtained after batch normalization layer and activation function layerloc:
Figure BDA0003196237140000175
SiObtained by two-layer convolution and finally by full connection layer
Figure BDA0003196237140000176
Figure BDA0003196237140000177
Likewise, the penalty function for the final Loc _ Net is:
Figure BDA0003196237140000178
Since the shared layer is embedded in each sub-network, the loss returned in each sub-network contains both the gradient information of the specific task and gradient information from the shared layer; that is, θ contains the two parts θ_sh and θ_i. The optimization objective of wisnet is to update the parameters so as to minimize the combined objective over the task losses L_i = {L_dom, L_act, L_loc}.
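The joint objective can be sketched as a weighted sum of the three task losses; equal weights are an assumption, since the patent's weighting is not reproduced here:

```python
# Toy task losses as computed by the three heads after one pass.
losses = {"dom": 0.12, "act": 0.45, "loc": 0.08}
weights = {"dom": 1.0, "act": 1.0, "loc": 1.0}  # assumed equal weighting

# Minimizing this sum sends gradient into both the task-specific
# parameters (theta_i) and the shared trunk parameters (theta_sh).
total = sum(weights[t] * losses[t] for t in losses)
print(round(total, 2))  # 0.65
```

In practice the weighted sum is what an optimizer would differentiate, so one backward pass updates all three heads and the shared layer together.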
The final output of wisnet is the outputs of the three networks, corresponding respectively to the domain the user is in, the position within the current domain, and the action performed. From the information of the domain and the position, the specific meaning contained in the action can be deduced.
Example 1
The present embodiment verifies the accuracy and system robustness of the above method.
Accuracy of identification
Training is performed using the datasets under both domains. The actions contained in each domain's dataset are not exactly the same; actions not present in a domain are grouped into a single class. The accuracy and loss curves during training are shown in fig. 9.

After the shared layer is added, the accuracy rises and the loss falls as the number of training rounds increases. After 200 rounds, the accuracy of all three tasks exceeds 95%, and the average loss drops below 0.1.
Wisnet was trained using the datasets under both domains. The confusion matrices of wisnet on the test set are shown in fig. 10.
As can be seen from fig. 10a) and 10b), the accuracy of every category of Act_Net exceeds 80%, and the accuracy of Loc_Net exceeds 95%.
Other evaluation indicators on the test set are recall (Recall), precision (Precision) and macro-F1, as shown in fig. 11.
As can be seen from fig. 11, Dom_Net and Loc_Net perform best, with every indicator at 95% or above. Act_Net is the hardest to classify, because the action features vary considerably across domains and positions; even so, its precision, recall and macro-F1 all reach 83%. In summary, when action and location recognition are performed across multiple domains, adding a hard sharing mechanism significantly improves model performance.
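The macro-F1 quoted above is the unweighted mean of per-class F1 scores; a minimal sketch on toy labels (not the patent's data):

```python
import numpy as np

def macro_f1(y_true, y_pred, n_classes):
    """Unweighted mean over classes of F1 = 2PR / (P + R)."""
    f1s = []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * p * r / (p + r) if p + r else 0.0)
    return float(np.mean(f1s))

y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])
print(macro_f1(y_true, y_pred, 3))
```

Because every class contributes equally regardless of its sample count, macro-F1 penalizes a network that does well only on frequent actions, which is why it is reported alongside precision and recall here.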
The correct classification of each wisnet subtask is a necessary condition for scene understanding: in the scene understanding task, action semantics can be parsed correctly only when (domain, position, action) are all classified correctly. To evaluate the classification performance of wisnet, tests were performed on the test set. The test indexes are detailed in table 1 below.
TABLE 1 Wisenet test results
In the table, √ denotes a correct classification and × a wrong classification.
Of the 1888 test samples, 1553 (82.3%) are TTT. Among the remaining 335 misclassified samples, 291 are TTF; that is, given that Loc_Net and Dom_Net classify correctly, the probability that an Act_Net error causes the overall misclassification is 87%. The sum of TTF, TFF, FFF and FTF is 300, of which TTF accounts for 291, so when Act_Net is wrong, Loc_Net and Dom_Net are still both correct 97% of the time. Moreover, TTF and TTT together total 1844 samples (97.7%): Loc_Net and Dom_Net classify the great majority of the data correctly and have little influence on the overall result. This analysis shows that wisnet exhibits a "short-board effect": its overall classification performance is determined by the subtask network Act_Net. When improving wisnet with different structures and parameters, attention should therefore focus on the classification performance of Act_Net.
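The percentages in this analysis can be reproduced directly from the reported counts:

```python
total = 1888
ttt = 1553                   # all three subtasks correct
misclassified = total - ttt  # 335 samples with at least one error
ttf = 291                    # Loc_Net and Dom_Net correct, Act_Net wrong
act_wrong = 300              # sum of TTF, TFF, FFF and FTF

print(round(100 * ttt / total, 1))          # 82.3
print(round(100 * ttf / misclassified))     # 87
print(round(100 * ttf / act_wrong))         # 97
print(round(100 * (ttf + ttt) / total, 1))  # 97.7
```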
System robustness
To observe the effect of the attention mechanism, comparative experiments were run on different network structures. According to whether the attention mechanism is added (w) or omitted (o) in Act_Net and Dom_Net, the four structures are named Act_o_Dom_o, Act_o_Dom_w, Act_w_Dom_o and Act_w_Dom_w. Fig. 12 shows the Act_Net accuracy over 100 training rounds for the four structures on the same dataset. The network without any attention mechanism clearly performs worst, reaching only about 80% accuracy, while the networks with attention perform better; Act_w_Dom_w, i.e. wisnet, with both attention mechanisms added, performs best.
Fig. 13 shows the accuracy of wisnet semantic recognition under the four network structures. It can be seen that after attention is added to Act_Net and Dom_Net simultaneously, the recognition accuracy of action semantics improves markedly.
The foregoing is only a preferred embodiment of the present invention, and many variations in the specific embodiments and applications of the invention may be made by those skilled in the art without departing from the spirit of the invention, which falls within the scope of the claims of this patent.

Claims (9)

1. An indoor scene understanding method based on a radio frequency signal multitask learning network, characterized by comprising the following steps:
step 1, data acquisition: collecting channel state information by using a wireless network card carrying Atheros;
step 2, data preprocessing: filtering noise contained in an original signal, synthesizing multilink data after denoising is finished, standardizing a data format, and constructing an input data set of a neural network;
step 3, multi-task identification network: indoor scene understanding is achieved using the multi-task learning network wisnet, wherein the wisnet comprises a shared representation layer, and a domain identification network Dom_Net, a position identification network Loc_Net and a behavior identification network Act_Net which share gradient information between the tasks through the shared representation layer.
2. The indoor scene understanding method based on the radio frequency signal multitask learning network according to claim 1, characterized in that:
in step 1, the data acquisition equipment comprises two computers, two routers carrying Atheros wireless network cards, and network cables; the computers are connected with the routers through the network cables, the router system is accessed through a notebook computer to complete the setting of parameters such as mode, center frequency and packet sending rate, and signal sending and receiving instructions are transmitted to the routers; the two routers control the sending and receiving of CSI signals according to the commands sent by the terminals, the commands comprising the destination address and the number of packets to send; each router is provided with two pairs of transceiver antennas, the sending rate of the transmitting end is 500 packets/second, the bandwidth is 20 MHz, and the center frequency is 2.4 GHz.
3. The indoor scene understanding method based on the radio frequency signal multitask learning network according to claim 1, characterized in that:
in step 2, the denoising method is wavelet decomposition and reconstruction in wavelet transformation, single-scale wavelet transformation analysis is performed on the amplitude of the CSI by using a db3 wavelet, and one subcarrier data in an original signal is randomly selected to perform db3 wavelet coefficient decomposition and reconstruction, so that noise filtering is completed.
4. The indoor scene understanding method based on the radio frequency signal multitask learning network according to claim 2, characterized in that: in step 2, all link data of the two pairs of transceiving ends are synthesized into a data format of (2000, 56, 4), and the synthesized data, together with its three corresponding labels, namely domain, position and action, generates the data set.
5. The indoor scene understanding method based on the radio frequency signal multitask learning network according to claim 1, characterized in that: in step 3, the domain identification network Dom_Net uses a convolutional attention mechanism based on minimum pooling to give more weight to information with smaller amplitude values, so as to distinguish different domains; the behavior recognition network Act_Net uses a convolutional attention mechanism based on maximum pooling to give more weight to information with larger amplitude values, so as to distinguish different actions.
6. The indoor scene understanding method based on the radio frequency signal multitask learning network as claimed in claim 5, wherein: the input data set obtained in step 2 is input into a convolution attention mechanism AM, wherein the convolution attention mechanism AM comprises a channel attention module and a spatial attention module.
7. The indoor scene understanding method based on the radio frequency signal multitask learning network as claimed in claim 6, wherein: the input data set obtained in step 2 undergoes a normal convolution operation while an attention mechanism is added, wherein the channel attention module is expressed as the following formula:
M_c(F) = σ(MLP(AvgPool(X)) + MLP(MinPool(X))),
wherein X is the input data of the neural network, AvgPool and MinPool are respectively an average pooling layer and a minimum pooling layer, MLP is the shared layer, in which data dimension reduction and feature extraction are realized mainly through convolution operations, and σ is the corresponding Sigmoid activation function; the channel attention module compresses the feature map in the spatial dimension, considering only the features inside each channel; while the convolution operation is performed, the input feature map passes respectively through the module's global average pooling layer and global minimum pooling layer; the average pooling layer has feedback on every feature point and keeps the background information in the feature map, while during gradient back-propagation only the feature points with the smallest responses on the feature map receive gradient feedback through the minimum pooling layer; the two pooled feature maps are input into the shared layer MLP to realize dimension reduction and feature extraction, compressing the spatial dimension of the feature maps; the outputs of the MLP are added and activated by a sigmoid function to obtain the channel attention matrix, and the result is multiplied element-wise with the convolved feature matrix to obtain the adjusted feature F′;
the spatial attention module compresses the channel, which is expressed as:
M_s(F) = σ(f^{n×n}([AvgPool(F′); MinPool(F′)])),
wherein F′ is the feature after the channel attention mechanism, f^{n×n} corresponds to a two-dimensional convolution operation, n is the dimension of the convolution kernel, AvgPool is used for extracting the average value on the channels, and MinPool is used for extracting the minimum value on the channels; the feature matrices extracted by the average pooling layer and the minimum pooling layer are connected, passed through the convolution layer and activated by sigmoid to obtain a spatial attention matrix (Spatial Attention), and the spatial attention matrix is multiplied element-wise with the adjusted feature F′ to obtain the following formula:
C_A = M_c(F)·M_s(F),

wherein C_A is the result of adding the attention mechanism on the basis of the CNN; in a specific domain identification application, C_A contains the background information in the collected data and is used to characterize the domain where the current user is located; when the network comprises multiple layers, C_A is iterated as input into the calculation of the next layer.
8. The indoor scene understanding method based on the radio frequency signal multitask learning network as claimed in claim 7, wherein: the shared representation layer comprises two convolution layers, and each convolution operation is followed by a batch normalization layer and a leaky rectified linear unit, so that the vanishing-gradient and exploding-gradient phenomena are avoided.
9. The indoor scene understanding method based on the radio frequency signal multitask learning network according to claim 8, characterized in that: in step 3, using the wisnet network structure, the data input-output calculation process is as follows:
the original dataset is D = {(x_1, y_1), (x_2, y_2), …, (x_n, y_n)}, wherein x_i ∈ ℝ^(2000×56×4);
each x_i passes through the two hard-shared layers to obtain the shared-layer output S_i:

S_i = LeakyReLU(f(Σ_{i∈D} x_i * k_s^i + b_s^i)),

wherein k is the corresponding convolution kernel parameter and b is the bias; after convolution, x_i is activated by LeakyReLU, and k and b are shared among the three tasks; during gradient updating, the task-specific gradient information is returned and the gradient information of the shared parameters is returned at the same time;
in order to judge the domain where the user is located, the network structure shown by Dom_Net is used; after the shared layer, S_i is first convolved; during training, the problem of vanishing or exploding gradients can occur when the distribution of the intermediate-layer data changes; to solve this problem and at the same time to speed up training, a batch normalization layer BN is required; the output of the BN layer is activated by LeakyReLU, max-pooled, and passed through a one-dimensional convolution to obtain the result F_dom;
first, the minimum value on each channel is extracted, i.e. the channel attention mechanism CA is added; then the channel information is compressed, i.e. the spatial attention mechanism SA is added;
after the two attention steps, the convolved data x_i is output through a linear fully connected layer, wherein W_dom and b_dom are respectively the weight matrix and the bias matrix updated iteratively by the fully connected layer;
the index value corresponding to the maximum value of each row is the output predicted by the network, and the corresponding loss function is L_dom;
in the same way, the output obtained after S_i passes through the three convolution layers of Act_Net is F_act; since signals with a larger variation amplitude contain more user-behavior information, an attention mechanism consisting of an average pooling layer and a maximum pooling layer is added; after the attention mechanism, a linear fully connected layer produces the prediction, and the corresponding loss function is L_act;
relatively speaking, the network structure of Loc_Net is simple, because the CNN convolutional neural network is sensitive to spatial information, so the position can be well identified without adding an attention mechanism; after the Loc_Net convolution layers, a batch normalization layer and an activation function layer give F_loc; S_i thus passes through two convolution layers and finally a fully connected layer to obtain the prediction; likewise, the loss function of the final Loc_Net is L_loc;
since the shared layer is embedded in each sub-network, the loss returned from each sub-network contains both the gradient information of the specific task and the gradient information from the shared layer, i.e. θ contains the two parts θ_sh and θ_i; the optimization objective of wisnet is to update the parameters so as to minimize the combined objective over the task losses L_i = {L_dom, L_act, L_loc};
the final output of wisnet is the outputs of the three networks, respectively corresponding to the domain where the user is located, the position within the current domain, and the executed action; from the information of the domain and the position, the specific meaning contained in the action can be deduced.
CN202110891904.8A 2021-08-04 2021-08-04 Indoor scene understanding method based on radio frequency signal multi-task learning network Active CN113587935B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110891904.8A CN113587935B (en) 2021-08-04 2021-08-04 Indoor scene understanding method based on radio frequency signal multi-task learning network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110891904.8A CN113587935B (en) 2021-08-04 2021-08-04 Indoor scene understanding method based on radio frequency signal multi-task learning network

Publications (2)

Publication Number Publication Date
CN113587935A true CN113587935A (en) 2021-11-02
CN113587935B CN113587935B (en) 2023-12-01

Family

ID=78254994

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110891904.8A Active CN113587935B (en) 2021-08-04 2021-08-04 Indoor scene understanding method based on radio frequency signal multi-task learning network

Country Status (1)

Country Link
CN (1) CN113587935B (en)


Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106709481A (en) * 2017-03-03 2017-05-24 深圳市唯特视科技有限公司 Indoor scene understanding method based on 2D-3D semantic data set
CN107451620A (en) * 2017-08-11 2017-12-08 深圳市唯特视科技有限公司 A kind of scene understanding method based on multi-task learning
US20200193296A1 (en) * 2018-12-18 2020-06-18 Microsoft Technology Licensing, Llc Neural network architecture for attention based efficient model adaptation
US20200302214A1 (en) * 2019-03-20 2020-09-24 NavInfo Europe B.V. Real-Time Scene Understanding System
CN111723635A (en) * 2019-03-20 2020-09-29 北京四维图新科技股份有限公司 Real-time scene understanding system
CN112183395A (en) * 2020-09-30 2021-01-05 深兰人工智能(深圳)有限公司 Road scene recognition method and system based on multitask learning neural network
CN112347933A (en) * 2020-11-06 2021-02-09 浙江大华技术股份有限公司 Traffic scene understanding method and device based on video stream
CN112507835A (en) * 2020-12-01 2021-03-16 燕山大学 Method and system for analyzing multi-target object behaviors based on deep learning technology

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
ZHENG YU ET AL.: "Indoor scene recognition via multi-task metric multi-kernel learning from RGB-D images", 《MULTIMEDIA TOOLS AND APPLICATIONS》, vol. 76, no. 3, pages 4427 - 4443, XP036185148, DOI: 10.1007/s11042-016-3423-1 *
姜啸远: "基于深度学习的场景识别研究", 《中国优秀硕士学位论文全文数据库 (信息科技辑)》, pages 138 - 1901 *
杨鹏;蔡青青;孙昊;孙丽红;: "基于卷积神经网络的室内场景识别", 郑州大学学报(理学版), no. 03, pages 76 - 80 *

Also Published As

Publication number Publication date
CN113587935B (en) 2023-12-01

Similar Documents

Publication Publication Date Title
US11763599B2 (en) Model training method and apparatus, face recognition method and apparatus, device, and storage medium
CN109983348A (en) Realize the technology of portable frequency spectrum analyzer
CN108629380B (en) Cross-scene wireless signal sensing method based on transfer learning
CN112036433B (en) CNN-based Wi-Move behavior sensing method
CN114359738B (en) Cross-scene robust indoor people number wireless detection method and system
AU2016200905A1 (en) A system and method for identifying and analyzing personal context of a user
Hao et al. CSI‐HC: A WiFi‐Based Indoor Complex Human Motion Recognition Method
CN111901028B (en) Human body behavior identification method based on CSI (channel State information) on multiple antennas
CN114423034A (en) Indoor personnel action identification method, system, medium, equipment and terminal
CN114781463A (en) Cross-scene robust indoor tumble wireless detection method and related equipment
CN112052816A (en) Human behavior prediction method and system based on adaptive graph convolution countermeasure network
Wu et al. Topological machine learning for multivariate time series
Gu et al. Device‐Free Human Activity Recognition Based on Dual‐Channel Transformer Using WiFi Signals
CN113587935A (en) Indoor scene understanding method based on radio frequency signal multitask learning network
CN117221816A (en) Multi-building floor positioning method based on Wavelet-CNN
CN112380903A (en) Human activity identification method based on WiFi-CSI signal enhancement
CN113642457B (en) Cross-scene human body action recognition method based on antagonistic meta-learning
CN114676727B (en) CSI-based human body activity recognition method irrelevant to position
Gao et al. A Multitask Sign Language Recognition System Using Commodity Wi‐Fi
CN116959059A (en) Living body detection method, living body detection device and storage medium
CN113202461B (en) Neural network-based lithology identification method and device
CN115002703A (en) Passive indoor people number detection method based on Wi-Fi channel state information
CN114358162A (en) Falling detection method and device based on continuous wavelet transform and electronic equipment
CN113378718A (en) Action identification method based on generation of countermeasure network in WiFi environment
US12026977B2 (en) Model training method and apparatus, face recognition method and apparatus, device, and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant