CN113587935A - Indoor scene understanding method based on radio frequency signal multitask learning network - Google Patents
Indoor scene understanding method based on radio frequency signal multitask learning network
- Publication number
- CN113587935A CN113587935A CN202110891904.8A CN202110891904A CN113587935A CN 113587935 A CN113587935 A CN 113587935A CN 202110891904 A CN202110891904 A CN 202110891904A CN 113587935 A CN113587935 A CN 113587935A
- Authority
- CN
- China
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01C—MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
- G01C21/00—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
- G01C21/20—Instruments for performing navigational calculations
- G01C21/206—Instruments for performing navigational calculations specially adapted for indoor navigation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/12—Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks
Abstract
The invention relates to the technical field of behavior perception, in particular to an indoor scene understanding method based on a radio frequency signal multitask learning network, which comprises the following steps. Data acquisition: collecting channel state information with a wireless network card using an Atheros chipset. Data preprocessing: filtering the noise contained in the original signal, synthesizing the multi-link data after denoising, standardizing the data format, and constructing the input data set of the neural network. Multitask identification network: achieving indoor scene understanding with the multitask learning network wisnet, which comprises a shared representation layer and three sub-networks that use the gradient information shared between tasks through that layer: the domain identification network Dom_Net, the position identification network Loc_Net, and the behavior identification network Act_Net. The method uses multitask learning to simultaneously identify the scene of the user, including the domain, position, and action, sensing the user from multiple angles so as to understand the meaning of the user's behavior.
Description
Technical Field
The invention relates to the technical field of behavior perception, in particular to an indoor scene understanding method based on a radio frequency signal multitask learning network.
Background
When behavior perception is realized using commercial WiFi, action semantics are often closely related to the scene in which the action occurs, and single-action recognition cannot meet the requirement of understanding the semantics of actions in specific scenes. This patent designs and realizes a multitask learning method for scene understanding based on channel state information. The method uses an attention mechanism to assign different weights to signals from different sources and a multitask learning network to mine hidden information, and has strong cross-domain capability and extensibility.
There has been much mature work on WiFi-based behavior awareness and indoor positioning. However, in an indoor home environment, the user's actions cannot be separated from the environment and location in which they occur. The same or similar actions may carry distinct semantics in different environments. For example, lying down in a bed in a bedroom most likely means the user is sleeping, while lying down on the floor of a living room may mean the user has fallen, fainted, or worse. To avoid such misunderstandings in a home environment, it is important to distinguish the semantics of the same or similar actions. Especially when monitoring elderly people living alone, knowing their position makes it possible to judge their actions and better understand their behavior, avoiding unnecessary misunderstandings. In an AR game, the same action performed at different positions may represent different operations of a game character; if the action can be recognized together with the position and current area of the user, the user's behavior can be given a definite semantic meaning, and the scenes AR can support become much richer. The environment and location of the user constrain the actions the user can perform; in other words, the actions the user performs reflect the environment and location of the user, and the two should not be split apart.
Existing contact sensing, such as wearable devices, is limited by battery capacity: the device stops working once the battery is exhausted, and frequent charging undoubtedly burdens the user. Non-contact sensing devices such as RFID, millimeter-wave and infrared sensors are expensive to manufacture and are better suited to places with large flows of people, such as shopping malls, airports and stations. The ubiquity of WiFi in the home environment frees it from these application-scenario limitations, and WiFi is low-cost and can be deployed at scale. Many key technologies for indoor behavior semantic understanding based on WiFi signals still need breakthroughs: accurate action semantic understanding requires not only identifying user behavior but also information about the domain and position of the user. There is currently no related work fusing these three dimensions of information together.
Disclosure of Invention
In order to solve the problems, the invention provides an indoor scene understanding method based on a radio frequency signal multitask learning network.
In order to achieve the purpose, the invention adopts the technical scheme that:
an indoor scene understanding method based on a radio frequency signal multitask learning network comprises the following steps,
Preferably, the method is characterized in that:
in step 1, the data acquisition equipment comprises two computers, two routers carrying Atheros wireless network cards, and network cables; the computers are connected to the routers through the network cables, the router system can be accessed from a notebook computer to complete the setting of parameters such as mode, center frequency and packet-sending rate, and signal sending and receiving instructions are transmitted to the routers; the two routers control the sending and receiving of CSI signals according to the commands sent by the terminals, each command comprising the destination address and the number of packets to send; each router has two pairs of transmitting and receiving antennas, the sending rate of the transmitting end is 500 packets/second, the bandwidth is 20 MHz, and the center frequency is 2.4 GHz.
Preferably, in step 2, the denoising method is wavelet decomposition and reconstruction in wavelet transform, single-scale wavelet transform analysis is performed on the amplitude of CSI by using db3 wavelet, and one subcarrier data in an original signal is randomly selected to perform db3 wavelet coefficient decomposition and reconstruction, so as to complete noise filtering.
Preferably, in step 2, all the link data of the two pairs of transmitting and receiving terminals are synthesized into the data format (2000, 56, 4); the synthesized data, together with its three corresponding tags (domain, position, and action), generates the data set.
Preferably, in step 3, the domain identification network Dom_Net uses a convolutional attention mechanism based on minimum pooling, which gives more weight to information with smaller amplitude values, to distinguish different domains; the behavior identification network Act_Net uses a convolutional attention mechanism based on maximum pooling, which gives more weight to information with larger amplitude values, to distinguish different actions.
Preferably, the input data set obtained in step 2 is input into a convolutional attention mechanism AM, which comprises a channel attention module and a spatial attention module.
Preferably, the input data set obtained in step 2 undergoes a normal convolution operation while an attention mechanism is added; the channel attention module is expressed as follows:

M_c(F) = σ(MLP(AvgPool(X)) + MLP(MinPool(X))),

where X is the input data of the neural network, AvgPool and MinPool are the average pooling layer and the minimum pooling layer respectively, MLP is a shared layer that realizes data dimension reduction and feature extraction mainly through convolution operations, and σ is the corresponding Sigmoid activation function. The channel attention module compresses the feature map in the spatial dimension, considering only the features inside each channel. While the convolution operation is performed, the input feature map passes through the global average pooling layer and the global minimum pooling layer of the channel attention module respectively; the average pooling layer has feedback on every feature point and is used to retain the background information in the feature map, while during gradient back-propagation only the feature points with the smallest response on the feature map receive gradient feedback from the minimum pooling layer. The two feature maps from the average pooling layer and the minimum pooling layer are input into the shared layer MLP to realize dimension reduction and feature extraction, compressing the spatial dimension of the feature maps; the outputs of the MLP are added and activated by the Sigmoid function to obtain the channel attention matrix, and an element-wise product of this result with the convolved feature matrix gives the adjusted feature F';
the spatial attention module compresses the channel dimension and is expressed as:

M_s(F) = σ(f^(n×n)([AvgPool(F'); MinPool(F')])),

where F' is the feature after the channel attention mechanism, f corresponds to a two-dimensional convolution operation, n is the dimension of the convolution kernel, AvgPool extracts the average value over the channels, and MinPool extracts the minimum value over the channels. The feature matrices extracted by the average pooling layer and the minimum pooling layer are concatenated and, after the convolution layer, activated by Sigmoid to obtain the spatial attention matrix (Spatial Attention); an element-wise product of the spatial attention matrix with the adjusted feature F' gives the following formula:
C_A = M_c(F) · M_s(F),

where C_A is the result of adding the attention mechanism on the basis of the CNN. In the specific domain identification application, C_A includes the background information in the collected data and is used to characterize the domain where the current user is located; when the network includes multiple layers, C_A is iterated as the input to the calculation of the next layer.
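As a concrete illustration of the two modules above, the following NumPy sketch combines channel and spatial attention into C_A by element-wise products. It is a simplification rather than the patented implementation: the convolutions are omitted, the n×n convolution in the spatial module is replaced by a simple sum of the two pooled maps, and the MLP weights `w1`/`w2` and the (C, H, W) feature layout are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x, w1, w2):
    """M_c: compress spatial dims with average and minimum pooling,
    pass both through a shared MLP, add, and squash with Sigmoid."""
    avg = x.mean(axis=(1, 2))                    # (C,) global average pooling
    mn = x.min(axis=(1, 2))                      # (C,) global minimum pooling
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0)   # shared two-layer MLP
    return sigmoid(mlp(avg) + mlp(mn))           # (C,) per-channel weights

def spatial_attention(x):
    """M_s: compress the channel dim with average and minimum pooling
    (the n-by-n convolution is omitted; pooled maps are simply summed)."""
    avg = x.mean(axis=0)                         # (H, W)
    mn = x.min(axis=0)                           # (H, W)
    return sigmoid(avg + mn)                     # (H, W) per-location weights

def attention_module(x, w1, w2):
    mc = channel_attention(x, w1, w2)            # channel weights
    f_prime = x * mc[:, None, None]              # element-wise product -> F'
    ms = spatial_attention(f_prime)              # spatial weights
    return f_prime * ms[None, :, :]              # C_A = M_c(F) . M_s(F)
```

The attended output has the same shape as the input feature map, so it can be fed directly into the next convolution layer, matching the iterative use of C_A described above.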
Preferably, the shared representation layer comprises two convolution layers; after each convolution operation there is a batch normalization layer and a leaky rectified linear unit (Leaky ReLU) to avoid the phenomena of gradient vanishing and gradient explosion.
Preferably, in step 3, using the wisnet network structure, the calculation process from data input to output is as follows:
original dataset D { (x)1,y1),(x2,y2)...(xn,yn) Therein ofxiObtaining shared layer output S through two hard shared layersi:
Si=LeaklyRelu(f(∑i∈Dxi*ks i+bs i)),
Wherein k is a corresponding convolution kernel parameter, and b is an offset; after convolution xiAfter activation by LeaklyRelu, k and b are shared among the three tasks; in the gradient updating process, returning the task specific gradient information and simultaneously returning the gradient information of the shared parameter;
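The hard-sharing idea above can be sketched in NumPy as a single 1-D convolution whose kernel and bias are reused by every task. This is an illustrative simplification (f is taken as the identity, and the slope 0.01 for Leaky ReLU is an assumed default, not stated in the patent):

```python
import numpy as np

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)

def shared_layer(x, k, b):
    """One hard-shared 1-D convolution: the kernel k and bias b are the
    same for all three tasks (Dom_Net, Loc_Net, Act_Net), so gradients
    from every task would update the same k and b."""
    n, m = len(x), len(k)
    out = np.array([x[i:i + m] @ k + b for i in range(n - m + 1)])
    return leaky_relu(out)   # S_i = LeakyReLU(x_i * k_s + b_s)
```

The same output S_i then feeds each task-specific sub-network, which is what lets the three losses jointly shape the shared parameters.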
in order to judge the domain where the user is located, the network structure of Dom_Net is used: the shared-layer output S_i is first convolved;
during training, changes in the data distribution of the intermediate layers can cause gradient vanishing or explosion; to solve this problem and at the same time speed up training, a batch normalization layer (BN) is required; after the BN layer and LeakyReLU activation, maximum pooling and a one-dimensional convolution yield the result F_dom:
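The batch normalization step mentioned here can be sketched directly: each feature is standardized over the batch so the intermediate-layer distribution stays stable. A minimal NumPy version (training-mode statistics only; the learnable gamma/beta and running averages of a full BN layer are reduced to scalars here):

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each feature column over the batch dimension so shifts
    in the intermediate-layer distribution do not cause vanishing or
    exploding gradients; gamma/beta rescale the normalized output."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta
```

After this step each feature has approximately zero mean and unit variance across the batch, regardless of the scale of the incoming convolution outputs.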
first, the minimum values on each channel are extracted, i.e., the channel attention mechanism C_A is added;
after these two steps, the convolved data x_i passes through a linear fully-connected layer to obtain the output,
where W_dom and b_dom are the weight matrix and the bias matrix iteratively updated in the fully-connected layer;
the index corresponding to the maximum value of each row is the output predicted by the network, and the corresponding loss function is L_dom:
similarly, S_i passes through the three convolutional layers of Act_Net to obtain the output F_act; since signals with larger change amplitudes contain more of the user's behavior information, an attention mechanism consisting of an average pooling layer and a maximum pooling layer is added, and applying this attention mechanism gives the attended feature;
The corresponding loss functions are respectively:
the network structure of Loc_Net is comparatively simple, because the CNN convolutional neural network is sensitive to spatial information and can identify the position well without adding an attention mechanism; the output of the Loc_Net convolutional layers, after the batch normalization layer and the activation function layer, gives F_loc;
likewise, the loss function of the final Loc_Net is:
since the sharing layer is embedded in each sub-network, the loss returned in each sub-network contains both the gradient information of the specific task and the gradient information from the sharing layer, i.e. θ contains the two parts θ_sh and θ_i; the optimization objective function of wisnet is:

where L_i ∈ {L_dom, L_act, L_loc}, and the parameters are updated to minimize the objective function;
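A minimal sketch of this joint objective, assuming (as is common, though not stated explicitly in the patent) that each sub-network ends in a softmax with cross-entropy loss and that the three losses are summed with equal weights:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))   # numerically stable
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(logits, labels):
    p = softmax(logits)
    return -np.log(p[np.arange(len(labels)), labels]).mean()

def wisnet_loss(dom_logits, loc_logits, act_logits, y_dom, y_loc, y_act):
    """Overall objective: the sum of the three sub-network losses.
    In the real network, the gradient of each term would flow back
    through both the task-specific parameters theta_i and the
    shared-layer parameters theta_sh."""
    return (cross_entropy(dom_logits, y_dom)
            + cross_entropy(loc_logits, y_loc)
            + cross_entropy(act_logits, y_act))
```

Minimizing this sum updates the shared parameters with gradient information from all three tasks at once, which is the mechanism the patent relies on to mine hidden cross-task information.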
the final output of wisnet is the output of three networks,andrespectively corresponding to the domain where the user is located, the position of the current domain where the user is located and the executed action; from the information of the domain and the location, the specific meaning contained by the action can be deduced.
The beneficial effects of the invention are as follows:
the method of minimum pooling in the field of image recognition is less used, mainly because in the representation RGB of the image, 000 represents black, and the smaller the value the closer to black. The picture information extracted by the minimum pooling is background information with few characteristics, and the characteristics have no significance. But in the field of signal processing 0 is of practical significance. The signal reflected from different domains will have different amplitudes due to different room locations and furnishings. Starting from this, different domains are distinguished based on the amplitude level when the space is relatively stationary. Using a convolutional attention mechanism based on minimal pooling may give more weight to information with smaller amplitude values. Thereby neglecting the influence of information with large amplitude fluctuation. In the behavior perception technology based on the CSI, a method of multitask concurrency never exists. Aiming at the problem of multi-task scene understanding, the multi-task learning network structure wisnet based on the hard sharing mechanism provided by the patent utilizes the sharing mechanism of the convolutional layer to extract hidden information among subtasks, and provides possibility for cross-scene action identification and indoor positioning.
From the above, the advantages of the present invention are:
(1) the system distinguishes different meanings of the same action under different scenes and positions, and solves the problem that the traditional method cannot realize behavior semantic understanding.
(2) The system takes all the receiving and transmitting terminal carrier signals as network input, defines an effective data splicing format and more effectively utilizes indoor multipath information.
(3) The system provides a multi-task scene understanding network wisnet based on CSI, and behavior recognition and indoor positioning can be carried out under multiple scenes without retraining a model.
Drawings
Fig. 1 is a flowchart of an indoor scene understanding method based on a radio frequency signal multitask learning network according to the present invention.
Fig. 2 is a diagram of actions performed by volunteers in an indoor scene understanding method based on a radio frequency signal multitask learning network according to the present invention.
Fig. 3 is a schematic view of a hall scene in the indoor scene understanding method based on the radio frequency signal multitask learning network.
Fig. 4 is a schematic view of an office scene of the indoor scene understanding method based on the radio frequency signal multitask learning network.
Fig. 5 is a schematic diagram of wavelet reconstruction signals of different scales of the indoor scene understanding method based on the radio frequency signal multitask learning network.
Fig. 6 is a schematic diagram of data set construction of an indoor scene understanding method based on a radio frequency signal multitask learning network.
Fig. 7 is a diagram of a wisnet network structure for understanding an indoor scene based on a radio frequency signal multitask learning network according to the present invention.
Fig. 8 is a structural diagram of the Dom_Net attention mechanism in the indoor scene understanding method based on a radio frequency signal multitask learning network.
FIG. 9 is a schematic diagram of sub-network accuracy and loss value in the training process of the indoor scene understanding method based on the radio frequency signal multitask learning network.
Fig. 10 is a schematic diagram of a wisnet confusion matrix of an indoor scene understanding method based on a radio frequency signal multitask learning network.
Fig. 11 is a schematic diagram of wisnet performance evaluation of an indoor scene understanding method based on a radio frequency signal multitask learning network.
Fig. 12 is a comparison graph of training accuracy rates of different structures of Act _ Net in the indoor scene understanding method based on the radio frequency signal multitask learning network.
Fig. 13 is a comparison graph of indexes of the indoor scene understanding method Act _ Net based on the radio frequency signal multitask learning network under different networks.
Detailed Description
In order to make the purpose, technical solution and advantages of the present technical solution clearer, the present technical solution is further described in detail below with reference to specific embodiments. It should be understood that the description is intended to be exemplary only and is not intended to limit the scope of the present technical solution.
As shown in figs. 1 to 8, the present embodiment provides an indoor scene understanding method based on a radio frequency signal multitask learning network, focusing on semantic understanding of cross-domain actions, a key technology in the intelligent perception field, and provides the indoor-wireless-signal scene understanding system architecture Wi-Sys shown in fig. 1. Wi-Sys comprises three parts: data acquisition, data preprocessing, and the multitask identification network. First, the Atheros wireless network card is used to collect Channel State Information (CSI). Then, the noise contained in the original signal is filtered, the multi-link data is synthesized after denoising, the data format is standardized, and the input data set of the neural network is constructed. Finally, indoor scene understanding is achieved through the multitask learning network wisnet, which comprises a shared representation layer, the domain identification network Dom_Net, the position identification network Loc_Net, and the behavior identification network Act_Net.
Data acquisition
The equipment for acquiring the experimental data comprises two notebook computers, two routers carrying Atheros wireless network cards, and two 5-meter network cables. Each computer is connected to a router through a network cable; the router system can be accessed from the notebook computer to set parameters such as mode, center frequency and packet-sending rate, and to send the signal transmission and reception instructions to the router. The two routers control the transmission and reception of the CSI signals according to the commands sent by the terminals; each command includes the destination address and the number of packets to transmit. Each router has two pairs of transmit-receive antennas; the packet transmission rate of the transmitting end is 500 packets/second, the bandwidth is 20 MHz, and the center frequency is 2.4 GHz.
In the experimental setup, the volunteers performed the actions shown in fig. 2, including squatting, stooping, walking, raising hands and other actions common in daily life. Each volunteer performed each action 10 times at each position in the field, with a sample time of approximately 4.5 seconds per action; each collected sample consists of 2300 CSI packets.
The scene shown in fig. 3 is a relatively open hall of a teaching building, with a few tables distributed around it and many surrounding windows. The routers are 85 cm above the ground, each position fingerprint block is 1.2 × 1.2 m, and each domain contains 9 positions, numbered 1-9. The domain covers about 13 square meters. While the CSI was collected, pedestrians passed through, bringing some interference to the effective signal. Fig. 4 shows a conference room in which the desks and chairs are arranged closely and the wall area is large. This scene is larger and its surroundings more complex than the hall scene shown in fig. 3. After a signal is sent out, it is reflected more times by static objects in the environment such as tables, chairs and walls, so the collected CSI signal contains more uncertain factors. The volunteers performed the actions shown in fig. 2, including actions common in daily life such as squatting, bending, walking and raising hands. Each volunteer performed each action 10 times at each position in the field, with a sample time of approximately 4.5 seconds per action; each collected sample consists of 2300 CSI packets.
Data pre-processing
In the acquisition and denoising part, the Atheros network card is used to collect the CSI. On its way from the transmitting end to the receiving end, the signal is reflected, diffracted and scattered by furniture, other static objects and human bodies. In this process the device itself may also vibrate, and other devices transmitting wireless signals in the home environment may interfere with CSI propagation, leading to packet loss, delay and noise during the double-ended transmission, which easily submerges the effective signal. The data therefore needs to be denoised before effective features are extracted from the CSI signal. The denoising method used here is wavelet decomposition and reconstruction in the wavelet transform: the db3 wavelet is used to perform a single-scale wavelet transform analysis of the CSI amplitudes. One subcarrier's data is randomly selected from the original signal for db3 wavelet coefficient decomposition and reconstruction, with the result shown in fig. 5. As the reconstruction scale increases, the signal becomes smoother. With scale-6 reconstruction, too much of the relatively high-frequency signal is lost and part of the signal no longer matches the original, so the a5-scale reconstruction is selected.
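The decompose-threshold-reconstruct idea behind this denoising step can be sketched in a self-contained way. The patent uses the db3 wavelet via a standard wavelet toolbox; the sketch below substitutes a single-level Haar wavelet (an assumption made for brevity, since the Haar filters are only two taps long) to show the same mechanism: split the signal into approximation and detail coefficients, zero the small detail coefficients carrying noise, and invert the transform.

```python
import numpy as np

def haar_denoise(signal, threshold):
    """Single-level Haar wavelet decomposition/reconstruction:
    keep the approximation (low-pass) coefficients, zero detail
    (high-pass) coefficients whose magnitude is below the threshold."""
    x = np.asarray(signal, dtype=float)
    x = x[:len(x) // 2 * 2].reshape(-1, 2)          # pair up samples
    approx = (x[:, 0] + x[:, 1]) / np.sqrt(2)       # low-pass coefficients
    detail = (x[:, 0] - x[:, 1]) / np.sqrt(2)       # high-pass coefficients
    detail = np.where(np.abs(detail) < threshold, 0.0, detail)
    rec = np.empty_like(x)                          # inverse transform
    rec[:, 0] = (approx + detail) / np.sqrt(2)
    rec[:, 1] = (approx - detail) / np.sqrt(2)
    return rec.ravel()
```

With threshold 0 the reconstruction is exact; with a large threshold only the smoothed approximation survives, which mirrors how higher reconstruction scales in fig. 5 progressively smooth the CSI amplitude.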
In the dataset construction part, it was observed during the experiments that the collected signals differ across transceiver devices even when the same volunteer executes the same action at the same position in the same domain, as shown in fig. 5. Even for the same receiving end, the amplitude intervals and data change patterns from different transmitting ends differ. Different transceiver links form different perspectives on the human body's variation in space, and common sense tells us that the richer the viewing angles, the more comprehensive and true the changes we see. To better utilize the data redundancy brought by multipath while satisfying the neural network's input requirements, all the link data of the two pairs of transceivers are connected and longitudinally spliced into the data format (2000, 56, 4). The spliced data, together with its three corresponding labels (domain, location, action), generates the data set; the data format is shown in fig. 6.
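The splicing step can be sketched as a simple array stack. The shapes are taken from the text (2000 packets × 56 subcarriers per link, 4 links); the random amplitudes stand in for real CSI data and are purely illustrative:

```python
import numpy as np

# Each of the 4 transmit-receive links yields a (2000 packets,
# 56 subcarriers) amplitude matrix after denoising; stacking the links
# along a new last axis gives the (2000, 56, 4) network input.
links = [np.random.rand(2000, 56) for _ in range(4)]
sample = np.stack(links, axis=-1)

# Each sample carries three labels: (domain, location, action).
labels = (0, 3, 1)   # illustrative label values, not from the patent
print(sample.shape)  # (2000, 56, 4)
```

Keeping the links as separate channels (rather than averaging them) preserves the per-link "viewing angles" the text describes, letting the convolution layers weigh them independently.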
Multitask identification network
Unlike a single-task learning network, the data set of a multitask learning network contains information in three dimensions: domain, position and action. Multitask learning reads and processes the three kinds of information simultaneously and can fully mine the hidden information among the tasks. This is mainly accomplished by the parameter sharing mechanism: the sharing layer synthesizes the gradient information among the multiple tasks and updates them synchronously. The architecture of the scene-perception multitask learning neural network used here is shown in fig. 7.
In the attention mechanisms, Dom_Net distinguishes different domains by the amplitude level when the space is relatively static: a convolutional attention mechanism based on minimum pooling gives more weight to information with smaller amplitude values, thereby ignoring the influence of information with large amplitude fluctuations. Act_Net instead adds an attention mechanism based on maximum pooling, letting information with larger amplitudes dominate. Different networks add different attention mechanisms to focus on different signals. The attention module employed by Dom_Net is shown in fig. 8.
The convolutional attention mechanism AM consists of two parts: a channel attention module and a spatial attention module. Each channel of the feature map acts as a feature detector; the channel attention module compresses the feature matrix along the spatial dimension to extract the feature information each channel should attend to. The spatial attention module compresses the channels, integrating the features extracted by each channel across the feature dimension of the whole data.
The input data is normally convolved while an attention mechanism is added. The channel attention module is expressed as follows:
M_c(F) = σ(MLP(AvgPool(X)) + MLP(MinPool(X))),
where X is the input data of the neural network, AvgPool and MinPool are the average pooling layer and the minimum pooling layer respectively, MLP is the shared layer (which realizes data dimensionality reduction and feature extraction mainly through convolution operations), and σ is the corresponding activation function, here the Sigmoid function.
The channel attention module compresses the feature map along the spatial dimension, considering only the features inside each channel. While the convolution operation is performed, the input feature map passes through the module's global average pooling layer and global minimum pooling layer respectively. The average pooling layer gives feedback on every feature point and preserves the background information in the feature map; during gradient back-propagation through the minimum pooling layer, only the feature points with the smallest responses on the feature map receive gradient feedback, so minimum pooling selects the features whose variation is least pronounced. The two pooled feature maps are input into the shared layer MLP for dimensionality reduction and feature extraction, compressing the spatial dimension of the feature maps. The sum of the MLP outputs is activated by the Sigmoid function to obtain the channel attention matrix (Channel Attention), and the element-wise (Hadamard) product of this result with the convolved feature matrix gives the adjusted feature F'.
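A minimal numpy sketch of the channel attention step with minimum pooling, as used by Dom_Net; the feature-map size and the shared-MLP weight shapes are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention_min(x, w1, w2):
    """Channel attention with *minimum* pooling (Dom_Net sketch).
    x: feature map (C, H, W); w1 (C, C//r) and w2 (C//r, C) are the
    shared two-layer MLP's weights (shapes are illustrative)."""
    avg = x.mean(axis=(1, 2))                     # global average pooling -> (C,)
    mn = x.min(axis=(1, 2))                       # global minimum pooling -> (C,)
    mlp = lambda v: np.maximum(v @ w1, 0.0) @ w2  # shared MLP, applied to both
    m_c = sigmoid(mlp(avg) + mlp(mn))             # channel attention matrix
    return x * m_c[:, None, None]                 # element-wise (Hadamard) product

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 20, 14))
w1 = rng.standard_normal((8, 4))
w2 = rng.standard_normal((4, 8))
f_adjusted = channel_attention_min(x, w1, w2)     # adjusted feature F'
```

Because the Sigmoid output lies in (0, 1), the adjusted feature F' is a per-channel damping of the input, with the smallest-amplitude channels least suppressed.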
The spatial attention module compresses the channels and comprehensively considers the relationship among the channels. The spatial attention module is expressed as:
M_s(F) = σ(f^{n×n}([AvgPool(F′); MinPool(F′)])),
where F′ is the feature after the channel attention mechanism, f^{n×n} denotes a two-dimensional convolution operation, and n is the size of the convolution kernel.
AvgPool extracts the mean over the channels and MinPool extracts the minimum over the channels. The feature matrices produced by the average pooling layer and the minimum pooling layer are concatenated, passed through the convolution layer, and activated by Sigmoid to obtain the spatial attention matrix (Spatial Attention), whose element-wise product with the adjusted feature F′ gives the following formula:
C_A = M_c(F) · M_s(F),
where C_A is the result of adding the attention mechanism on top of the CNN. In the specific domain-identification application, C_A encodes the background information in the collected data and characterizes the domain where the current user is located. When the network contains multiple layers, C_A is fed as input into the calculation of the next layer.
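A matching sketch of the spatial attention step; for brevity the n×n convolution over the two pooled maps is simplified here to a 1×1 weighted combination, which is an assumption rather than the patent's exact operator.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def spatial_attention_min(f_prime, w_avg=0.5, w_min=0.5):
    """Spatial attention over a feature map f_prime of shape (C, H, W):
    mean and minimum are taken over the channel axis and combined.
    The n x n convolution is reduced to a 1x1 weighted sum (assumption)."""
    avg = f_prime.mean(axis=0)                # AvgPool over channels -> (H, W)
    mn = f_prime.min(axis=0)                  # MinPool over channels -> (H, W)
    m_s = sigmoid(w_avg * avg + w_min * mn)   # spatial attention matrix
    return f_prime * m_s[None, :, :]          # element-wise product

rng = np.random.default_rng(1)
f_prime = rng.standard_normal((8, 20, 14))    # feature after channel attention
c_a = spatial_attention_min(f_prime)          # result after both modules
```

Chaining the channel module and then this spatial module yields the attention-adjusted feature C_A that is passed on to the next layer.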
The attention mechanism in Act_Net is similar to that of fig. 8, except that the minimum pooling is replaced by maximum pooling.
The wisnet includes a shared representation layer, a domain-identification network Dom_Net, a location-identification network Loc_Net, and an action-identification network Act_Net. The shared representation layer comprises two convolution layers; each convolution operation is followed by a batch normalization layer and a leaky rectified linear unit (LeakyReLU), which avoids the gradient vanishing and gradient explosion phenomena. The network structure of the three subtasks is shown in fig. 7, and the data input-output calculation proceeds as follows:
Given the original data set D = {(x_1, y_1), (x_2, y_2), …, (x_n, y_n)}, each sample x_i passes through two hard-shared layers to obtain the shared-layer output S_i:

S_i = LeakyReLU(f(Σ_{i∈D} x_i * k_i^s + b_i^s)),
where k is the corresponding convolution kernel parameter and b is the bias. The convolved x_i is activated by LeakyReLU; k and b are shared among the three tasks. During gradient updating, the task-specific gradient information is back-propagated together with the gradient information of the shared parameters.
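A minimal single-channel sketch of the hard-shared representation layer (batch normalization omitted; the input and kernel sizes are assumptions):

```python
import numpy as np

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)

def conv2d_valid(x, k, b):
    """Naive single-channel 'valid' 2D convolution (illustrative only)."""
    H, W = x.shape
    kh, kw = k.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k) + b
    return out

def shared_layer(x, k1, b1, k2, b2):
    """Two hard-shared convolutions with LeakyReLU: the parameters
    k1, b1, k2, b2 are reused by all three task heads."""
    s = leaky_relu(conv2d_valid(x, k1, b1))
    return leaky_relu(conv2d_valid(s, k2, b2))

rng = np.random.default_rng(2)
x_i = rng.standard_normal((12, 10))           # toy stand-in for one sample
s_i = shared_layer(x_i, rng.standard_normal((3, 3)), 0.1,
                   rng.standard_normal((3, 3)), -0.2)
```

Since every task head consumes the same `s_i`, the gradients of all three task losses accumulate on `k1, b1, k2, b2`, which is exactly the hard-sharing behavior described above.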
To determine the domain in which the user is located, the Dom_Net structure shown in fig. 7 is used. The shared-layer output S_i is first convolved. During training, changes in the distribution of intermediate-layer data can cause gradients to vanish or explode; to address this problem, and to speed up training, a batch normalization layer BN is applied after the convolution. The BN output is activated by LeakyReLU and max-pooled after a one-dimensional convolution, yielding the result F_dom. The channel attention mechanism C_A then extracts the minimum value on each channel, giving the adjusted feature F′_dom. After these two steps, the convolved data x_i is passed through a linear fully connected layer to obtain the output

ŷ_dom = W_dom F′_dom + b_dom,

where W_dom and b_dom are respectively the weight matrix and the bias matrix iteratively updated in the fully connected layer. The index corresponding to the maximum value of each row of ŷ_dom is the output of the network prediction, and the corresponding loss function is L_dom.
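The fully connected output stage and the per-row argmax prediction can be sketched as follows; the flattened-feature and weight shapes are illustrative assumptions.

```python
import numpy as np

def dom_net_output(f_dom_adj, w_dom, b_dom):
    """Linear fully connected output of Dom_Net followed by a per-row
    argmax. f_dom_adj: (batch, D) flattened adjusted features;
    w_dom: (D, n_domains); b_dom: (n_domains,)."""
    y_dom = f_dom_adj @ w_dom + b_dom       # fully connected layer
    return y_dom.argmax(axis=1)             # index of each row's maximum value

# Toy check: identity weights copy the two features straight through,
# so the prediction is simply the larger feature's index.
f = np.array([[1.0, 3.0], [5.0, 2.0]])
w = np.eye(2)
b = np.zeros(2)
pred = dom_net_output(f, w, b)              # -> array([1, 0])
```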
By the same principle, the output of S_i after Act_Net's three convolution layers is F_act. Since the signals with larger variation amplitude contain more user-behavior information, an attention mechanism composed of average pooling and maximum pooling layers is added; applying it yields the adjusted feature F′_act. The corresponding loss function is L_act.
The network structure of Loc_Net is comparatively simple: the CNN convolutional neural network is inherently sensitive to spatial information, so the position can be identified well without adding an attention mechanism. The output of Loc_Net's convolution layers, after the batch normalization layer and the activation function layer, is F_loc. Likewise, the loss function of the final Loc_Net is L_loc.
Since the shared layer is embedded in every sub-network, the loss back-propagated in each sub-network contains both task-specific gradient information and gradient information from the shared layer; that is, the parameter set θ consists of two parts, θ_sh and θ_i. The optimization objective of wisnet is to minimize the sum of the task losses

min_θ Σ_i L_i, where L_i ∈ {L_dom, L_act, L_loc},

and the parameters are updated to minimize this objective function.
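The joint objective can be sketched as the sum of three per-task cross-entropy losses; equal loss weighting is an assumption here, since the patent does not specify a weighting scheme.

```python
import numpy as np

def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample (numerically stable sketch)."""
    z = logits - logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[label]

def wisnet_objective(dom_logits, act_logits, loc_logits, y_dom, y_act, y_loc):
    """Joint objective: the three task losses are summed, so the shared
    layer receives gradient information from every task."""
    return (cross_entropy(dom_logits, y_dom)
            + cross_entropy(act_logits, y_act)
            + cross_entropy(loc_logits, y_loc))

loss = wisnet_objective(np.array([2.0, 0.1]), np.array([0.3, 1.2, 0.5]),
                        np.array([1.0, 1.0]), 0, 1, 1)
```

Minimizing this summed loss drives the updates of both the shared parameters θ_sh and the task-specific parameters θ_i simultaneously.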
The final output of wisnet is the outputs of the three networks, ŷ_dom, ŷ_loc and ŷ_act, corresponding respectively to the domain where the user is located, the user's position within that domain, and the action performed. From the domain and position information, the specific meaning of the action can be inferred.
Example 1
The present embodiment verifies the accuracy and system robustness of the above method.
Accuracy of identification
Training is performed using the data sets under both domains. The actions contained in each domain's data set are not exactly the same; actions not present in a domain are grouped together into a single class. The accuracy and loss variation during training are shown in fig. 9.
After the shared layer is added, the accuracy gradually increases and the loss gradually decreases as the number of training rounds grows. After 200 rounds of training, the accuracy of all three tasks exceeds 95%, and the loss falls below 0.1 on average.
wisnet was trained using the data sets under both domains. The confusion matrices for wisnet on the test set are shown in fig. 10.
As can be seen from figs. 10(a) and 10(b), the per-class accuracy of Act_Net exceeds 80%, and that of Loc_Net exceeds 95%.
The other evaluation indicators on the test set are recall (Recall), precision (Precision), and macro-F1, as shown in fig. 11.
As can be seen from fig. 11, Dom_Net and Loc_Net perform best, with every index at 95% or above. Act_Net is harder to identify because its statistical features vary greatly across different domains and positions; even so, its precision, recall, and macro-F1 all reach 83%. In summary, when performing action and location recognition across multiple domains, adding a hard-sharing mechanism significantly improves model performance.
The correct classification of each wisnet subtask is a necessary condition for scene understanding: in the scene-understanding task, the action semantics can be correctly parsed only when the (domain, position, action) triple is correctly classified. To evaluate the classification performance of wisnet, tests were performed on the test set. The test indexes are detailed in Table 1 below.
TABLE 1 wisnet test results
Wherein √ is a correct classification, and x is a wrong classification.
Of the 1888 test samples, 1553 are TTT, accounting for 82.3%. Of the remaining 335 misclassified samples, 291 are TTF. This indicates that, given that Loc_Net and Dom_Net classify correctly, the probability that an Act_Net misclassification causes the overall misclassification is 87%. The sum of TTF, TFF, FFF and FTF is 300, of which TTF accounts for 291; that is, when Act_Net misclassifies, Loc_Net and Dom_Net are still both correct 97% of the time. The sum of TTF and TTT is 1844, accounting for 97.6%: Loc_Net and Dom_Net classify the great majority of the data correctly and have little influence on the overall classification. This analysis shows that wisnet exhibits a "short-board effect": its overall classification performance is determined by the subtask network Act_Net. Therefore, when improving wisnet with different structures and parameters, the classification performance of Act_Net should be the focus.
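The reported proportions can be reproduced directly from the outcome counts (T/F marking correct/incorrect classification by Dom_Net, Loc_Net and Act_Net in that order):

```python
total = 1888
ttt = 1553                 # all three subtasks correct
ttf = 291                  # only Act_Net wrong
act_wrong = 300            # TTF + TFF + FFF + FTF (sum given in the text)

share_ttt = round(ttt / total * 100, 1)        # 82.3% fully correct
act_causes = round(ttf / (total - ttt) * 100)  # 87% of errors are Act_Net-only
both_ok = round(ttf / act_wrong * 100)         # 97% of Act errors keep Dom/Loc right
dom_loc_ok = ttt + ttf                         # 1844 samples, about 97.6%
```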
System robustness
To observe the effect of the attention mechanism, the following comparative experiments were performed on different network structures. According to whether the attention mechanism is added to Act_Net and Dom_Net, the variants are named Act_o_Dom_o, Act_o_Dom_w, Act_w_Dom_o and Act_w_Dom_w. Fig. 12 shows the Act_Net accuracy over 100 training rounds for the four network structures on the same data set. It can be clearly observed that the network without any attention mechanism performs worst, reaching only about 80% accuracy, while the networks with attention perform relatively well; Act_w_Dom_w, i.e. wisnet, with both attention mechanisms added simultaneously, performs best.
Fig. 13 shows the accuracy of wisnet's action-semantic recognition under the four different network structures. It can be seen that after attention is added to both Act_Net and Dom_Net simultaneously, the recognition accuracy of the action semantics improves markedly.
The foregoing is only a preferred embodiment of the present invention, and many variations in the specific embodiments and applications of the invention may be made by those skilled in the art without departing from the spirit of the invention, which falls within the scope of the claims of this patent.
Claims (9)
1. An indoor scene understanding method based on a radio frequency signal multitask learning network is characterized by comprising the following steps: comprises the following steps of (a) carrying out,
step 1, data acquisition: collecting channel state information by using a wireless network card carrying Atheros;
step 2, data preprocessing: filtering noise contained in an original signal, synthesizing multilink data after denoising is finished, standardizing a data format, and constructing an input data set of a neural network;
step 3, multi-task identification network: indoor scene understanding is achieved using a multi-task learning network wisnet, which comprises a shared representation layer together with a domain-identification network Dom_Net, a position-identification network Loc_Net and a behavior-identification network Act_Net that share gradient information among the tasks through the shared representation layer.
2. The indoor scene understanding method based on the radio frequency signal multitask learning network according to claim 1, characterized in that:
in step 1, the data acquisition equipment comprises two computers, two routers carrying Atheros wireless network cards, and network cables; each computer is connected to a router through a network cable, the router system is accessed from the computer to set parameters such as mode, center frequency and packet-sending rate, and signal sending and receiving instructions are transmitted to the routers; the two routers control the sending and receiving of CSI signals according to the commands sent by the terminals, the commands including destination addresses and the number of packets to send; each router is provided with two pairs of transceiver antennas, the sending rate of the sending end is 500 packets/second, the bandwidth is 20 MHz, and the center frequency is 2.4 GHz.
3. The indoor scene understanding method based on the radio frequency signal multitask learning network according to claim 1, characterized in that:
in step 2, the denoising method is wavelet decomposition and reconstruction in wavelet transformation, single-scale wavelet transformation analysis is performed on the amplitude of the CSI by using a db3 wavelet, and one subcarrier data in an original signal is randomly selected to perform db3 wavelet coefficient decomposition and reconstruction, so that noise filtering is completed.
4. The indoor scene understanding method based on the radio frequency signal multitask learning network according to claim 2, characterized in that: in step 2, all link data of the two transceiver pairs are synthesized into a data format of (2000, 56, 4), and the synthesized data, together with its three corresponding labels, namely domain, position and action, generates a data set.
5. The indoor scene understanding method based on the radio frequency signal multitask learning network according to claim 1, characterized in that: in step 3, the domain identification network Dom _ Net can give more weight to information with smaller amplitude values using a convolutional attention mechanism based on minimum pooling to distinguish different domains; the behavior recognition network Act _ Net distinguishes different actions based on the fact that information with larger amplitude values can be given more weight using a convolutional attention mechanism based on maximum pooling.
6. The indoor scene understanding method based on the radio frequency signal multitask learning network as claimed in claim 5, wherein: and (3) inputting the input data set obtained in the step (2) into a convolution attention mechanism AM, wherein the convolution attention mechanism AM comprises a channel attention module and a space attention module.
7. The indoor scene understanding method based on the radio frequency signal multitask learning network as claimed in claim 6, wherein: inputting the input data set obtained in the step 2 into a normal convolution operation, and adding an attention mechanism at the same time, wherein a channel attention module is expressed as the following formula:
M_c(F) = σ(MLP(AvgPool(X)) + MLP(MinPool(X))),
wherein X is the input data of the neural network, AvgPool and MinPool are respectively the average pooling layer and the minimum pooling layer, MLP is the shared layer, in which data dimensionality reduction and feature extraction are realized mainly through convolution operations, and σ is the corresponding Sigmoid activation function; the channel attention module compresses the feature map in the spatial dimension, considering only the features inside each channel; while the convolution operation is performed, the input feature map passes through the module's global average pooling layer and global minimum pooling layer respectively; the average pooling layer gives feedback on every feature point and preserves the background information in the feature map, and during gradient back-propagation through the minimum pooling layer, only the feature points with the smallest responses on the feature map receive gradient feedback; the two feature maps from the average pooling layer and the minimum pooling layer are input into the shared layer MLP to realize dimensionality reduction and feature extraction, compressing the spatial dimension of the feature maps; the outputs of the MLP are added and activated by the Sigmoid function to obtain the channel attention matrix, and the element-wise product of this result with the convolved feature matrix gives the adjusted feature F';
the spatial attention module compresses the channel, which is expressed as:
M_s(F) = σ(f^{n×n}([AvgPool(F′); MinPool(F′)])),
wherein F′ is the feature after the channel attention mechanism, f^{n×n} denotes a two-dimensional convolution operation, n is the size of the convolution kernel, AvgPool extracts the average value over the channels, and MinPool extracts the minimum value over the channels; the feature matrices extracted by the average pooling layer and the minimum pooling layer are connected, passed through the convolution layer and activated by Sigmoid to obtain the spatial attention matrix (Spatial Attention), whose element-wise product with the adjusted feature F′ gives the following formula:
C_A = M_c(F) · M_s(F),
wherein C_A is the result of adding the attention mechanism on the basis of the CNN; in the specific domain-identification application, C_A includes background information in the collected data and is used to characterize the domain where the current user is located; when the network comprises multiple layers, C_A is fed as input into the calculation of the next layer.
8. The indoor scene understanding method based on the radio frequency signal multitask learning network as claimed in claim 7, wherein: the shared representation layer comprises two layers of convolution, and after each layer of convolution operation, the shared representation layer has a batch normalization layer and a structure with a leaked correction linear unit, so that the phenomena of gradient disappearance and gradient explosion are avoided.
9. The indoor scene understanding method based on the radio frequency signal multitask learning network according to claim 8, characterized in that: in step 3, using the wisnet network structure, the data input-output calculation process is as follows:
the original data set is D = {(x_1, y_1), (x_2, y_2), …, (x_n, y_n)}, wherein each x_i passes through two hard-shared layers to obtain the shared-layer output S_i:

S_i = LeakyReLU(f(Σ_{i∈D} x_i * k_i^s + b_i^s)),
wherein k is the corresponding convolution kernel parameter and b is the bias; the convolved x_i is activated by LeakyReLU, and k and b are shared among the three tasks; during gradient updating, the task-specific gradient information is back-propagated together with the gradient information of the shared parameters;
in order to judge the domain in which the user is located, the network structure of Dom_Net is used; the shared-layer output S_i is first convolved; during training, changes in the distribution of intermediate-layer data can cause the gradients to vanish or explode, so to address this problem and speed up training, a batch normalization layer BN is applied; the BN output is activated by LeakyReLU and max-pooled after a one-dimensional convolution to obtain the result F_dom; the channel attention mechanism C_A then extracts the minimum value on each channel to obtain the adjusted feature F′_dom; after these two steps, the convolved data x_i is passed through a linear fully connected layer to obtain the output ŷ_dom = W_dom F′_dom + b_dom, wherein W_dom and b_dom are respectively the weight matrix and the bias matrix iteratively updated in the fully connected layer; the index corresponding to the maximum value of each row is the output of the network prediction, and the corresponding loss function is L_dom;
by the same principle, the output of S_i after Act_Net's three convolution layers is F_act; since the signals with larger variation amplitude contain more user-behavior information, an attention mechanism composed of average pooling and maximum pooling layers is added, and applying it yields the adjusted feature F′_act; the corresponding loss function is L_act;
comparatively, the network structure of Loc_Net is simple, because the CNN convolutional neural network is inherently sensitive to spatial information and the position can therefore be identified well without adding an attention mechanism; the output of Loc_Net's convolution layers, after the batch normalization layer and the activation function layer, is F_loc; likewise, the loss function of the final Loc_Net is L_loc;
since the shared layer is embedded in each sub-network, the loss back-propagated from each sub-network contains both the task-specific gradient information and the gradient information from the shared layer, namely θ comprises two parts, θ_sh and θ_i; the optimization objective of wisnet is to minimize the sum of the task losses min_θ Σ_i L_i, wherein L_i ∈ {L_dom, L_act, L_loc}, and the parameters are updated to minimize the objective function;
the final output of wisnet is the outputs of the three networks, ŷ_dom, ŷ_loc and ŷ_act, respectively corresponding to the domain where the user is located, the position within the current domain, and the executed action; from the information of the domain and the position, the specific meaning of the action can be inferred.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110891904.8A CN113587935B (en) | 2021-08-04 | 2021-08-04 | Indoor scene understanding method based on radio frequency signal multi-task learning network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110891904.8A CN113587935B (en) | 2021-08-04 | 2021-08-04 | Indoor scene understanding method based on radio frequency signal multi-task learning network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113587935A true CN113587935A (en) | 2021-11-02 |
CN113587935B CN113587935B (en) | 2023-12-01 |
Family
ID=78254994
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110891904.8A Active CN113587935B (en) | 2021-08-04 | 2021-08-04 | Indoor scene understanding method based on radio frequency signal multi-task learning network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113587935B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106709481A (en) * | 2017-03-03 | 2017-05-24 | 深圳市唯特视科技有限公司 | Indoor scene understanding method based on 2D-3D semantic data set |
CN107451620A (en) * | 2017-08-11 | 2017-12-08 | 深圳市唯特视科技有限公司 | A kind of scene understanding method based on multi-task learning |
US20200193296A1 (en) * | 2018-12-18 | 2020-06-18 | Microsoft Technology Licensing, Llc | Neural network architecture for attention based efficient model adaptation |
US20200302214A1 (en) * | 2019-03-20 | 2020-09-24 | NavInfo Europe B.V. | Real-Time Scene Understanding System |
CN112183395A (en) * | 2020-09-30 | 2021-01-05 | 深兰人工智能(深圳)有限公司 | Road scene recognition method and system based on multitask learning neural network |
CN112347933A (en) * | 2020-11-06 | 2021-02-09 | 浙江大华技术股份有限公司 | Traffic scene understanding method and device based on video stream |
CN112507835A (en) * | 2020-12-01 | 2021-03-16 | 燕山大学 | Method and system for analyzing multi-target object behaviors based on deep learning technology |
-
2021
- 2021-08-04 CN CN202110891904.8A patent/CN113587935B/en active Active
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106709481A (en) * | 2017-03-03 | 2017-05-24 | 深圳市唯特视科技有限公司 | Indoor scene understanding method based on 2D-3D semantic data set |
CN107451620A (en) * | 2017-08-11 | 2017-12-08 | 深圳市唯特视科技有限公司 | A kind of scene understanding method based on multi-task learning |
US20200193296A1 (en) * | 2018-12-18 | 2020-06-18 | Microsoft Technology Licensing, Llc | Neural network architecture for attention based efficient model adaptation |
US20200302214A1 (en) * | 2019-03-20 | 2020-09-24 | NavInfo Europe B.V. | Real-Time Scene Understanding System |
CN111723635A (en) * | 2019-03-20 | 2020-09-29 | 北京四维图新科技股份有限公司 | Real-time scene understanding system |
CN112183395A (en) * | 2020-09-30 | 2021-01-05 | 深兰人工智能(深圳)有限公司 | Road scene recognition method and system based on multitask learning neural network |
CN112347933A (en) * | 2020-11-06 | 2021-02-09 | 浙江大华技术股份有限公司 | Traffic scene understanding method and device based on video stream |
CN112507835A (en) * | 2020-12-01 | 2021-03-16 | 燕山大学 | Method and system for analyzing multi-target object behaviors based on deep learning technology |
Non-Patent Citations (3)
Title |
---|
ZHENG YU ET AL.: "Indoor scene recognition via multi-task metric multi-kernel learning from RGB-D images", 《MULTIMEDIA TOOLS AND APPLICATIONS》, vol. 76, no. 3, pages 4427 - 4443, XP036185148, DOI: 10.1007/s11042-016-3423-1 * |
JIANG Xiaoyuan: "Research on Scene Recognition Based on Deep Learning", China Master's Theses Full-text Database (Information Science and Technology), pages 138 - 1901 *
YANG Peng; CAI Qingqing; SUN Hao; SUN Lihong: "Indoor Scene Recognition Based on Convolutional Neural Networks", Journal of Zhengzhou University (Natural Science Edition), no. 03, pages 76 - 80 *
Also Published As
Publication number | Publication date |
---|---|
CN113587935B (en) | 2023-12-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11763599B2 (en) | Model training method and apparatus, face recognition method and apparatus, device, and storage medium | |
CN109983348A (en) | Realize the technology of portable frequency spectrum analyzer | |
CN108629380B (en) | Cross-scene wireless signal sensing method based on transfer learning | |
CN112036433B (en) | CNN-based Wi-Move behavior sensing method | |
CN114359738B (en) | Cross-scene robust indoor people number wireless detection method and system | |
AU2016200905A1 (en) | A system and method for identifying and analyzing personal context of a user | |
Hao et al. | CSI‐HC: A WiFi‐Based Indoor Complex Human Motion Recognition Method | |
CN111901028B (en) | Human body behavior identification method based on CSI (channel State information) on multiple antennas | |
CN114423034A (en) | Indoor personnel action identification method, system, medium, equipment and terminal | |
CN114781463A (en) | Cross-scene robust indoor tumble wireless detection method and related equipment | |
CN112052816A (en) | Human behavior prediction method and system based on adaptive graph convolution countermeasure network | |
Wu et al. | Topological machine learning for multivariate time series | |
Gu et al. | Device‐Free Human Activity Recognition Based on Dual‐Channel Transformer Using WiFi Signals | |
CN113587935A (en) | Indoor scene understanding method based on radio frequency signal multitask learning network | |
CN117221816A (en) | Multi-building floor positioning method based on Wavelet-CNN | |
CN112380903A (en) | Human activity identification method based on WiFi-CSI signal enhancement | |
CN113642457B (en) | Cross-scene human body action recognition method based on antagonistic meta-learning | |
CN114676727B (en) | CSI-based human body activity recognition method irrelevant to position | |
Gao et al. | A Multitask Sign Language Recognition System Using Commodity Wi‐Fi | |
CN116959059A (en) | Living body detection method, living body detection device and storage medium | |
CN113202461B (en) | Neural network-based lithology identification method and device | |
CN115002703A (en) | Passive indoor people number detection method based on Wi-Fi channel state information | |
CN114358162A (en) | Falling detection method and device based on continuous wavelet transform and electronic equipment | |
CN113378718A (en) | Action identification method based on generation of countermeasure network in WiFi environment | |
US12026977B2 (en) | Model training method and apparatus, face recognition method and apparatus, device, and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |