CN114359738B - Cross-scene robust indoor people number wireless detection method and system - Google Patents

Cross-scene robust indoor people number wireless detection method and system

Info

Publication number: CN114359738B
Authority: CN (China)
Prior art keywords: scene, detected, training, people, classifier
Legal status: Active
Application number: CN202210257695.6A
Other languages: Chinese (zh)
Other versions: CN114359738A
Inventors: 毕宿志 (Bi Suzhi), 侯华炜 (Hou Huawei), 全智 (Quan Zhi), 郑莉莉 (Zheng Lili)
Current Assignee: Shenzhen University
Original Assignee: Shenzhen University

Events: application filed by Shenzhen University; priority to CN202210257695.6A; publication of CN114359738A; application granted; publication of CN114359738B; legal status: Active.

Abstract

The invention discloses a cross-scene robust indoor people number wireless detection method and system. The method comprises the following steps: acquiring a feature extractor obtained by training a convolutional neural network model on a training data set; acquiring a classifier training data set for the scene to be detected, extracting features from it with the feature extractor, and training a machine learning model on those features to obtain a classifier for the scene to be detected, where the classifier training data set is built from channel state information collected in the scene to be detected in advance, and the classifier detects the number of people from input features and outputs a detection result; and acquiring data to be detected in the scene to be detected, performing people number detection through the feature extractor and the classifier, and outputting the detection result, where the data to be detected are obtained from the channel state information to be detected in the scene to be detected. Compared with the prior art, the method and system improve the efficiency and accuracy of people number detection and facilitate people number detection across scenes.

Description

Cross-scene robust indoor people number wireless detection method and system
Technical Field
The invention relates to the technical field of people number detection, in particular to a cross-scene robust indoor people number wireless detection method and system.
Background
With the development of science and technology, people number detection is increasingly widely applied in fields such as security monitoring, indoor energy saving, and personnel management. People number detection is the process of detecting the number of people within a particular scene, area or environment (e.g., a room). In the prior art, a camera is usually arranged in the scene to be detected to acquire images, which are then fed to a trained deep learning model to detect the number of people.
The problem with the prior art is that when people number detection is performed through a camera, the accuracy is easily degraded by line-of-sight occlusion. Meanwhile, the deep learning model in the prior art is trained on a large amount of labeled sample data from one specific scene; if it is applied to another scene, the efficiency and accuracy of people number detection drop greatly, which hinders people number detection across scenes.
Thus, there is still a need for improvement and development of the prior art.
Disclosure of Invention
The invention mainly aims to provide a cross-scene robust indoor people number wireless detection method and system, so as to solve two problems of the prior art: that the accuracy of camera-based people number detection is easily degraded by line-of-sight occlusion, and that a deep learning model trained in one scene does not lend itself to people number detection across other scenes.
In order to achieve the above object, a first aspect of the present invention provides a cross-scene robust wireless detection method for the number of people in a room, wherein the cross-scene robust wireless detection method for the number of people in the room comprises:
acquiring a feature extractor, wherein the feature extractor is obtained by training a preset convolutional neural network model in advance according to a training data set, the training data set is obtained according to a set of channel state information acquired in a training scene in advance, the number of the channel state information in the training data set is a first preset value, and the feature extractor is used for extracting features of the channel state information;
acquiring a classifier training data set corresponding to a scene to be detected, extracting features in the classifier training data set through the feature extractor, using the features as classifier training features, training a preset machine learning model according to the classifier training features and acquiring a classifier corresponding to the scene to be detected, wherein the classifier training data set is acquired according to a set of channel state information acquired in the scene to be detected in advance, the number of the channel state information in the classifier training data set is a second preset value, the second preset value is smaller than the first preset value, and the classifier is used for performing people number detection according to input features and outputting a detection result;
And acquiring data to be detected in the scene to be detected, detecting the number of people of the data to be detected through the feature extractor and the classifier, and outputting a detection result, wherein the data to be detected is acquired according to the state information of a channel to be detected in the scene to be detected.
Optionally, before the obtaining the feature extractor, the method further includes:
acquiring channel state information corresponding to each people number type in the training scene in advance through a receiving antenna in the training scene, processing the channel state information in the training scene according to a preset preprocessing process, and marking people number type labels to obtain a training data set;
acquiring channel state information corresponding to each people number category in the scene to be detected in advance through a receiving antenna in the scene to be detected, processing the channel state information in the scene to be detected according to a preset preprocessing process, and marking people number category labels to obtain a classifier training data set;
the people number category is obtained according to the people number division in the corresponding scene, and the preprocessing process comprises the following steps: and respectively calculating an amplitude matrix and a phase matrix corresponding to each piece of channel state information, performing layer-based standardization on the amplitude matrix and obtaining an amplitude information matrix, and calculating the phase difference between the receiving antennas according to the phase matrix and obtaining a phase information matrix.
Optionally, the obtaining the feature extractor includes:
dividing the training data set into a training subset and a testing subset;
and training the preset convolutional neural network model according to the training subset and the test subset to obtain the feature extractor, wherein the convolutional neural network model comprises a plurality of convolutional blocks and a full-connected layer, and each convolutional block comprises a plurality of convolutional kernels, a normalization layer and a ReLU activation function.
Optionally, the convolutional neural network model is trained according to a preset number of training generations and a preset knowledge self-distillation process, where the preset knowledge self-distillation process includes:
performing generation-1 training on the convolutional neural network model based on the training data set and the people number category labels corresponding to the training data set, to obtain the generation-1 model;
training the $k$-th model to be trained based on the training data set, the people number category labels corresponding to the training data set, and the output of the generation-$(k-1)$ model, to obtain the generation-$k$ model, where $k$ is greater than 1 and not greater than the preset number of training generations, and the structure of the $k$-th model to be trained is the same as that of the generation-$(k-1)$ model.
Optionally, in the process of acquiring in advance, through the plurality of receiving antennas in the scene to be detected, the channel state information corresponding to each people number category, at least 5 pieces of corresponding channel state information are acquired for each people number category.
Optionally, the acquiring data to be detected in the scene to be detected, performing people number detection on the data to be detected through the feature extractor and the classifier, and outputting a detection result includes:
acquiring the state information of the channel to be detected in the scene to be detected through a receiving antenna in the scene to be detected;
processing the channel state information to be detected according to the preprocessing process and obtaining the data to be detected;
inputting the data to be detected into the feature extractor, and acquiring the features output by the feature extractor as the features to be detected;
and inputting the features to be detected into a classifier corresponding to the scene to be detected, detecting the number of people through the classifier and outputting the detection result.
Optionally, the preset machine learning model is a logistic regression model or an SVM model.
The second aspect of the present invention provides a cross-scene robust wireless detection system for the number of people in a room, wherein the cross-scene robust wireless detection system for the number of people in the room comprises:
the device comprises a feature extractor obtaining module, a feature extractor obtaining module and a feature extractor, wherein the feature extractor is obtained by training a preset convolutional neural network model in advance according to a training data set, the training data set is obtained according to a set of channel state information collected in a training scene in advance, the number of the channel state information in the training data set is a first preset value, and the feature extractor is used for extracting features of the channel state information;
the classifier training data set is obtained according to a set of channel state information collected in the scene to be detected in advance, the number of the channel state information in the classifier training data set is a second preset value, the second preset value is smaller than the first preset value, and the classifier is used for detecting the number of people according to the input features and outputting a detection result;
And the people number detection module is used for acquiring data to be detected in the scene to be detected, detecting the number of people for the data to be detected through the feature extractor and the classifier, and outputting a detection result, wherein the data to be detected is obtained according to the state information of a channel to be detected in the scene to be detected.
The invention further provides an intelligent terminal, comprising a memory, a processor, and a cross-scene robust indoor people number wireless detection program stored in the memory and executable on the processor, wherein the program, when executed by the processor, implements the steps of any one of the cross-scene robust indoor people number wireless detection methods above.
The invention further provides a computer-readable storage medium storing a cross-scene robust indoor people number wireless detection program which, when executed by a processor, implements the steps of any one of the cross-scene robust indoor people number wireless detection methods above.
As can be seen from the above, in the scheme of the present invention, a feature extractor is obtained, where the feature extractor is obtained by training a preset convolutional neural network model in advance according to a training data set, the training data set is obtained according to a set of channel state information collected in a training scene in advance, the number of the channel state information in the training data set is a first preset value, and the feature extractor is configured to extract features of the channel state information; acquiring a classifier training data set corresponding to a scene to be detected, extracting features in the classifier training data set through the feature extractor, using the features as classifier training features, training a preset machine learning model according to the classifier training features, and acquiring a classifier corresponding to the scene to be detected, wherein the classifier training data set is acquired according to a set of channel state information acquired in the scene to be detected in advance, the number of the channel state information in the classifier training data set is a second preset value, the second preset value is smaller than the first preset value, and the classifier is used for performing people number detection according to input features and outputting a detection result; and acquiring data to be detected in the scene to be detected, performing people number detection on the data to be detected through the feature extractor and the classifier, and outputting a detection result, wherein the data to be detected is acquired according to the state information of a channel to be detected in the scene to be detected. Compared with the scheme of detecting the number of people according to the camera and the trained deep learning model in the prior art, the method and the device for detecting the number of people perform the number of people according to the channel state information, are not influenced by sight shielding, and are beneficial to improving the accuracy of the number of people detection. Meanwhile, the convolutional neural network model is trained according to the channel state information sample data (training data set) in the training scene to obtain the feature extractor, and then when the number of people needs to be detected in other scenes (namely, when the scene needs to be crossed), the classifier of the scene to be detected is obtained by training the machine learning model according to the trained feature extractor and a small amount of channel state information sample data in the scene to be detected.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings required to be used in the embodiments or the prior art description will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without inventive labor.
Fig. 1 is a schematic flowchart of a cross-scene robust wireless detection method for the number of people in a room according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of acquiring a training data set and acquiring a classifier training data set according to an embodiment of the present invention;
fig. 3 is a schematic diagram of a training scenario and a scenario to be detected according to an embodiment of the present invention;
FIG. 4 is a flowchart illustrating the step S100 in FIG. 1 according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a convolutional neural network model according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of trainable parameters of various layers of a convolutional neural network model provided by an embodiment of the present invention;
FIG. 7 is a schematic flow chart of the knowledge self-distillation process provided by an embodiment of the present invention;
FIG. 8 is a schematic diagram illustrating a variation of a training sample in a convolutional neural network model according to an embodiment of the present invention;
FIG. 9 is a flowchart illustrating the step S300 in FIG. 1 according to an embodiment of the present invention;
FIG. 10 is a schematic flow chart of model training provided by an embodiment of the present invention;
fig. 11 is a detection result of a classifier in 3 scenes to be detected, obtained based on training of a logistic regression model according to an embodiment of the present invention;
fig. 12 is a schematic diagram of a confusion matrix output by a classifier when a detected person is in a walking state in a scene to be detected according to an embodiment of the present invention;
fig. 13 is a schematic diagram of a confusion matrix output by a classifier when a detected person is in a free-movement state in a scene to be detected according to an embodiment of the present invention;
fig. 14 is a schematic structural diagram of a cross-scene robust wireless indoor population detection system according to an embodiment of the present invention;
fig. 15 is a schematic block diagram of an internal structure of an intelligent terminal according to an embodiment of the present invention.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items and includes such combinations.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when …" or "upon" or "in response to a determination" or "in response to a detection". Similarly, the phrase "if it is determined" or "if a [ described condition or event ] is detected" may be interpreted contextually to mean "upon determining" or "in response to determining" or "upon detecting [ described condition or event ]" or "in response to detecting [ described condition or event ]".
The technical solutions in the embodiments of the present invention are clearly and completely described below with reference to the drawings of the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced in other ways than those specifically described and will be readily apparent to those of ordinary skill in the art without departing from the spirit of the present invention, and therefore the present invention is not limited to the specific embodiments disclosed below.
With the development of science and technology, people number detection is increasingly widely applied in fields such as security monitoring, indoor energy saving, and personnel management. People number detection is the process of detecting the number of people within a particular scene, area or environment (e.g., a room). In the prior art, a camera is usually arranged in the scene to be detected to acquire images, which are then fed to a trained deep learning model to detect the number of people.
The problem with the prior art is that when people number detection is performed through a camera, the accuracy is easily degraded by line-of-sight occlusion. Meanwhile, the deep learning model in the prior art is trained on a large amount of labeled sample data from one specific scene; if it is applied to another scene, the efficiency and accuracy of people number detection drop greatly, which hinders people number detection across scenes.
According to the invention, the channel state information in the scene needing to be detected can be acquired through the wireless signals, and the people number is detected according to the channel state information. Therefore, the wireless detection of the number of indoor people is realized based on the wireless signals, the privacy of users is guaranteed, and the influence of illumination conditions is avoided. Meanwhile, with the development of wireless local area network technology in recent years, WiFi signals almost completely cover indoor areas, so compared with a method for detecting the number of people by using radar-based equipment, the method can use WiFi equipment, thereby reducing hardware cost and facilitating deployment.
It should be noted that Channel State Information (CSI) is an attribute of a communication link that reflects how the wireless channel changes with the environment, e.g., through scattering and fading. The number of people in the environment, human motion, and the like all affect the wireless channel and thus cause changes in the channel state information, so the specific number of people in a room can be detected by monitoring these changes.
In one application scenario, feature model-based machine learning and featureless machine learning can be performed based on channel state information to perform people number detection. In the machine learning process based on the feature model, the statistical features of the CSI matrix, such as the statistical information of the mean value, the standard deviation, the variance, the number of peaks and troughs and the like, are mainly extracted, and the machine learning model (such as logistic regression, support vector machine and the like) is used for learning the feature difference between the CSI matrix under the condition of different people numbers, so that the number of people in a room is judged.
However, the performance of the feature-based machine learning method depends on the accuracy of the input features. The method requires feature computation and dimensionality-reduction processing of the CSI matrix, and selecting the specific features requires extensive experimental verification and prior knowledge, because some CSI features are unaffected by human activity; if features irrelevant to the target task are fed into the model, its performance degrades greatly. The method therefore carries a large computational burden and does not lend itself to improving the accuracy of people number detection.
In another application scenario, a deep learning method is used: after certain preprocessing, only the amplitude information and phase information of the CSI matrix are input into a neural network model (such as a deep neural network, convolutional neural network, or recurrent neural network) for supervised training, and CSI features are extracted through the neural network model for the multi-classification task.
The deep learning method relies on a large amount of training data to perform model iteration, a large amount of sample data needs to be acquired and labeled for each different scene, the model training process is complex, and a large amount of computing resources need to be consumed.
Meanwhile, the CSI matrix is sensitive to environmental changes, and a model trained in a single scene is prone to overfitting; the robustness of models trained by the above machine learning and deep learning methods is low. That is, after a model is fully trained in one scene, once the detected scene changes (such as moving to another room or moving the transceiver devices), the performance of the model drops sharply. For example, a deep learning model for a specific scene to be detected must be obtained through complex training on a large number of labeled samples from that scene, so every time the scene to be detected is switched, a large number of labeled samples must be collected in the new scene and the whole deep learning model retrained, making the training cost high. The model therefore requires a large number of data samples for every environment to be detected, the collection complexity and cost are high, the barrier to model training is high, and robustness under cross-scene detection is poor.
In order to solve at least one of the problems, in the scheme of the invention, the number of people is detected according to the channel state information without being influenced by sight shielding, thereby being beneficial to improving the accuracy of the number of people detection. Meanwhile, according to the invention, a convolutional neural network model is trained according to channel state information sample data (training data set) in a training scene to obtain a feature extractor, and then when the number of people is detected in other scenes (namely, when the scenes need to be crossed), a classifier of the scene to be detected is obtained by training a machine learning model according to the trained feature extractor and a small amount of channel state information sample data in the scene to be detected.
Exemplary method
As shown in fig. 1, an embodiment of the present invention provides a cross-scene robust wireless detection method for the number of people in a room, and specifically, the method includes the following steps:
Step S100, a feature extractor is obtained, wherein the feature extractor is obtained by training a preset convolutional neural network model according to a training data set in advance, the training data set is obtained according to a set of channel state information collected in a training scene in advance, the number of the channel state information in the training data set is a first preset value, and the feature extractor is used for extracting features of the channel state information.
In this embodiment, a large amount of sample data (i.e., labeled channel state information) needs to be acquired only in the training scene, while only a small amount of sample data needs to be acquired in the other scenes to be detected. A room for which a large amount of labeled sample data already exists can therefore be used as the training scene, so that large-scale sample acquisition and labeling need not be repeated, which helps reduce the difficulty and cost of sample collection and labeling.
Step S200, a classifier training data set corresponding to a scene to be detected is obtained, features in the classifier training data set are extracted through the feature extractor and are used as classifier training features, a preset machine learning model is trained according to the classifier training features, a classifier corresponding to the scene to be detected is obtained, wherein the classifier training data set is obtained according to a set of channel state information collected in the scene to be detected in advance, the number of the channel state information in the classifier training data set is a second preset value, the second preset value is smaller than the first preset value, and the classifier is used for carrying out people number detection according to input features and outputting detection results.
In this embodiment, the scene to be detected is any scene in which the number of people needs to be detected, the scene to be detected is different from the training scene, and when the feature extractor obtained by training according to the sample data in the training scene is applied to the scene to be detected, the change of the scene is involved. Therefore, in order to improve robustness, the trained feature extractor is not directly applied to a new scene to be detected for people number detection, but a classifier suitable for the scene to be detected is obtained by combining a small amount of sample data in the scene to be detected and the feature extractor which are collected in advance. And the small amount of sample data in the scene to be detected comprises channel state information with the number being a second preset value, and the second preset value is far smaller than the first preset value. In other words, a large amount of sample data does not need to be acquired in the scene to be detected, and the difficulty and the cost of data acquisition and labeling are favorably reduced.
In this embodiment, as shown in fig. 2, before the step S100, the method further includes the following steps:
step A100, acquiring channel state information corresponding to each people number type in the training scene in advance through a receiving antenna in the training scene, processing the channel state information in the training scene according to a preset preprocessing process, and marking people number type labels to obtain the training data set.
Step A200, acquiring channel state information corresponding to each people number category in the scene to be detected in advance through a receiving antenna in the scene to be detected, processing the channel state information in the scene to be detected according to a preset preprocessing process, marking the people number category labels, and acquiring the classifier training data set.
The people number category is obtained according to the people number division in the corresponding scene, and the preprocessing process comprises the following steps: and respectively calculating an amplitude matrix and a phase matrix corresponding to each piece of channel state information, carrying out layer-based standardization on the amplitude matrix and obtaining an amplitude information matrix, and calculating the phase difference between the receiving antennas according to the phase matrix and obtaining a phase information matrix.
In this embodiment, a wireless signal transmitting device and a receiving device are arranged in each scene in which channel state information needs to be acquired (for example, the training scene and the scene to be detected): a wireless signal is transmitted through an antenna of the transmitting device, received through an antenna of the receiving device, and the corresponding channel state information is obtained from the received signal. In this embodiment, channel state information is acquired based on WiFi technology; specifically, the WiFi technology is based on the IEEE 802.11a/g/n protocols, supports multiple-input multiple-output (MIMO) and orthogonal frequency-division multiplexing (OFDM), and uses multiple antennas and multiple measurement subcarriers. In this embodiment, the CSI is obtained from the wireless network card by modifying the network card driver. It should be noted that, in the WiFi wireless local area network protocol, the collected CSI is a set of 4-dimensional channel state matrices whose dimensions are transmitting antenna, receiving antenna, measurement subcarrier, and measurement time; the matrix is complex-valued and contains the amplitude and phase information of the complex entries.
Fig. 3 is a schematic diagram of a training scene and a scene to be detected according to an embodiment of the present invention, and fig. 3 shows one training scene and three different scenes to be detected, where the training scene may be used as a source domain, and the different scenes to be detected may be used as different target domains. In this embodiment, the training scenario is a common office scenario in which a table and a chair are placed and a signal transmitting device and a signal receiving device are provided. The placement positions of the signal transmitting equipment and the signal receiving equipment of the scene 1 to be detected and the training scene are different; the scene 2 to be detected and the training scene are in different rooms, and the table and chair placing modes are different; the scene 3 to be detected and the training scene are in different rooms, the placing modes of the table and the chair are different, and the placing positions of the signal transmitting equipment and the receiving equipment are different. In this embodiment, a preset convolutional neural network model is trained through a large amount of sample data in a training scene to obtain a feature extractor, and then a preset machine learning model is trained according to the feature extractor and a small amount of sample data in a corresponding scene to be detected (for example, the scene 1 to be detected) to obtain a classifier corresponding to the scene to be detected, so that the number of people in the scene 1 to be detected can be detected according to the feature extractor and the classifier, and the detection accuracy is improved. It should be noted that people in each scene mainly move in the dotted line region, and can perform activities such as sitting, walking, eating, typing, and the like.
In this embodiment, a large amount of channel state information in the training scene and a small amount of channel state information in the scene to be detected are collected in advance, so as to complete the training of the feature extractor and of the classifier. Specifically, the people number categories are set in advance, for example, the A+1 categories 0 to A, where A is a preset upper limit of the people number: category a corresponds to the scene containing a persons, a being any value from 0 to A. The channel state information for training is collected in advance in both the training scene and the scene to be detected, covering every people number category, with more channel state information collected per category in the training scene and less per category in the scene to be detected.
For example, in an application scenario, A = 8, that is, channel state information needs to be acquired for 9 people number categories. Specifically, in the training scene, B groups of channel state information under different conditions are acquired for each people number category, and in the scene 1 to be detected, C groups are acquired for each category, where C is much smaller than B; for example, B may be 1000 and C may be 5. The number of groups B multiplied by the number of people number categories (i.e., 9) equals the first preset value (9000 in this example), and C multiplied by the number of categories equals the second preset value (45 in this example).
Therefore, the training of the feature extractor can be completed based on one training scene, the feature extractor can be flexibly applied to other scenes to be detected, and when the feature extractor is applied to a new scene to be detected, the feature extractor model and the classifier model can adapt to the new scene to be detected only by acquiring a small amount of channel state information in the new scene to be detected.
In this embodiment, the people number category label marks the actual number of people corresponding to the respective channel state information (i.e., to each training datum in the training data set and each classifier training datum in the classifier training data set), so that the model can be trained and adjusted according to the labeled actual number of people.
In this embodiment, in each scene shown in fig. 3, the CSI corresponding to different numbers of people is collected. The collected original CSI stream is a 4-dimensional complex matrix, which can be represented as

$$H \in \mathbb{C}^{T \times N_r \times N_t \times K}$$

where $H$ denotes the channel state information matrix, $\mathbb{C}$ indicates that it is a multi-dimensional complex matrix, $T$ is the time dimension, $N_r$ is the number of receiving antennas, $N_t$ is the number of transmitting antennas, and $K$ is the number of subcarriers. The acquired original CSI matrix is complex-valued, and it can be processed through a preset preprocessing process so that the data better meet the requirements of model training. In the preprocessing process, the amplitude and the phase of the complex entries are calculated and then processed separately. It should be noted that, in order to reduce the amount of computation per step and better organize the data, in an application scenario the collected CSI data may also be segmented according to a preset fixed time window of size $W$, giving segmented data $\{H_1, H_2, \ldots\}$, where each $H_j \in \mathbb{C}^{W \times N_r \times N_t \times K}$ is the channel state information in one time window; the complex amplitude matrix and phase matrix of each window are then calculated respectively. In an application scenario, the amplitude matrix and the phase matrix corresponding to each segmentation window can be processed separately, which simplifies each processing step. In this embodiment, the amplitude matrices corresponding to all segmentation windows are combined into the complex amplitude matrix $A$ corresponding to the original CSI matrix, and likewise the phase matrices into the phase matrix $\Phi$, which reduces the number of repeated processing passes.

In an application scenario, the collected CSI is a complex matrix that is first segmented according to a preset length, each segment of the same time length serving as one sample. In practical use the CSI is collected continuously; for example, 10 minutes of collection at a certain transmission rate yields a matrix of length 10000 along the time dimension, which can be cut into data segments of length 100. The amplitude of the complex matrix is then calculated to obtain the amplitude matrix $A$, and the phase to obtain the phase matrix $\Phi$; both have the same shape as $H$. In this embodiment, the collected $H$ has shape 200 × 3 × 2 × 114.
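To make the segmentation step concrete, the following minimal sketch (Python/NumPy; the 10000-sample capture and length-100 windows follow the example above, while all function and variable names are illustrative assumptions) cuts a continuously collected complex CSI stream into fixed-length window samples and computes their amplitude and phase matrices:

```python
import numpy as np

# Stand-in for a continuous capture: 10000 time samples, 3 receiving
# antennas, 2 transmitting antennas, 114 subcarriers (complex CSI).
rng = np.random.default_rng(0)
csi_stream = rng.standard_normal((10000, 3, 2, 114)) \
           + 1j * rng.standard_normal((10000, 3, 2, 114))

def segment_csi(csi: np.ndarray, window: int) -> np.ndarray:
    """Cut the CSI stream along the time axis into fixed-length windows."""
    n_windows = csi.shape[0] // window
    usable = csi[: n_windows * window]
    # -> (n_windows, window, n_rx, n_tx, n_subcarriers)
    return usable.reshape(n_windows, window, *csi.shape[1:])

samples = segment_csi(csi_stream, window=100)  # 100 samples of length 100
amplitude = np.abs(samples)                    # amplitude matrices A
phase = np.angle(samples)                      # phase matrices Phi
print(samples.shape)                           # (100, 100, 3, 2, 114)
```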
For the amplitude information, a certain standardization is needed so that it suits the input of the model and improves the iterative updating of the learner's parameters; that is, the amplitude data are normalized. In this embodiment, Layer Normalization is adopted: the receiving-antenna and transmitting-antenna counts, originally two separate dimensions of the matrix, are merged and fixed as a single antenna dimension while the time dimension is retained, so that each sample becomes a matrix of size $(N_r N_t) \times T \times K$. For each data sample, the mean $\mu_i$ and the standard deviation $\sigma_i$ are calculated over the matrix formed by the last 2 dimensions (time and subcarrier), and the standardization is carried out according to the following formula (1) to obtain the preprocessed amplitude information matrix:

$$\tilde{A}_i = \frac{A_i - \mu_i}{\sigma_i + \epsilon} \qquad (1)$$

where $i$ denotes the sample index (i.e., the index of the corresponding CSI data), $\mu_i$ is the mean of the $i$-th sample, $\sigma_i$ is the standard deviation of the $i$-th sample, $A_i$ is the amplitude matrix of the $i$-th sample, $\tilde{A}_i$ is the amplitude information matrix of the $i$-th sample, and $\epsilon$ is a preset additive term for preventing the denominator from being 0, which may be preset as $\epsilon = e^{-5}$, where $e$ is the natural constant.
For the phase information, the original CSI phase contains two main noise sources, namely Carrier Frequency Offset (CFO) and Sampling Time Offset (STO). In this embodiment, the phase differences between pairs of receiving antennas are calculated to eliminate the corresponding noise components, yielding the phase information matrix. Computing the phase difference between two receiving antennas is also a matrix-level operation. In an application scenario, a phase matrix has size 200 × 3 × 2 × 114, where the second number, 3, represents the receiving antennas; viewed per receiving antenna, each antenna's matrix is 200 × 2 × 114. The subtraction is element-wise between corresponding positions: antenna 1 minus antenna 2, antenna 2 minus antenna 3, and antenna 3 minus antenna 1. The dimensions of the 3 resulting matrices are unchanged, and they are recombined into a 200 × 3 × 2 × 114 matrix.
The amplitude information matrix and the phase information matrix obtained through the preprocessing process are both 3-dimensional matrices. In an application scenario, the sizes of both are 6 × 200 × 114, where the first number, 6, is the merged antenna dimension, 200 is the time dimension, and 114 is the number of subcarriers; the signal transmitting device has 2 transmitting antennas, the signal receiving device has 3 receiving antennas, and each antenna corresponds to 114 subcarriers.
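A minimal sketch of the two preprocessing steps just described, assuming one sample of shape 200 × 3 × 2 × 114 as in the embodiment; the per-antenna-row normalization axes and the reading $\epsilon = e^{-5}$ are interpretations of formula (1), and all names are illustrative:

```python
import numpy as np

EPS = np.exp(-5)  # assumed reading of the preset additive term in formula (1)

def normalize_amplitude(amp: np.ndarray) -> np.ndarray:
    """Layer normalization of one amplitude sample per formula (1):
    merge Rx/Tx into one antenna dimension, then standardize over the
    last two dimensions (time and subcarrier)."""
    t, n_rx, n_tx, k = amp.shape
    merged = amp.transpose(1, 2, 0, 3).reshape(n_rx * n_tx, t, k)
    mu = merged.mean(axis=(1, 2), keepdims=True)
    sigma = merged.std(axis=(1, 2), keepdims=True)
    return (merged - mu) / (sigma + EPS)             # shape (6, 200, 114)

def phase_differences(phase: np.ndarray) -> np.ndarray:
    """Cyclic Rx-antenna phase differences 1-2, 2-3, 3-1 (CFO/STO removal)."""
    diff = phase - np.roll(phase, shift=-1, axis=1)  # still (200, 3, 2, 114)
    t, n_rx, n_tx, k = diff.shape
    return diff.transpose(1, 2, 0, 3).reshape(n_rx * n_tx, t, k)

sample = np.random.randn(200, 3, 2, 114) + 1j * np.random.randn(200, 3, 2, 114)
amp_info = normalize_amplitude(np.abs(sample))       # (6, 200, 114)
phase_info = phase_differences(np.angle(sample))     # (6, 200, 114)
```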
Further, the collected data are labeled after the preprocessing process to obtain a data set (which may be the training data set or the classifier training data set, depending on the source of the corresponding channel state information). One entry of the data set, corresponding to one collected sample, is $(x_i, y_i)$ with $x_i = (\tilde{A}_i, \Delta\Phi_i)$, where $\tilde{A}_i$ is the amplitude information matrix of the $i$-th sample, $\Delta\Phi_i$ is the phase information matrix (i.e., the phase difference matrix) of the $i$-th sample, and $y_i$ is the corresponding people number category label.
In this embodiment, as shown in fig. 4, the step S100 specifically includes the following steps:
step S101, dividing the training data set into a training subset and a testing subset.
Step S102, training the preset convolutional neural network model according to the training subset and the test subset to obtain the feature extractor, wherein the convolutional neural network model comprises a plurality of convolutional blocks and a full-connected layer, and each convolutional block comprises a plurality of convolutional kernels, a normalization layer and a ReLU activation function.
And the data in the training subset is used for training the convolutional neural network model, and the data in the testing subset is used for testing and checking the training accuracy of the convolutional neural network model in real time. Specifically, in this embodiment, a knowledge migration method of meta-learning is used, an accurate prior model (i.e., a trained feature extractor) is trained by using a rich sample of a source domain, and effective feature extraction is performed on target domain data by using the model, so that the requirement on the amount of target domain training data is reduced, and excellent detection performance under a small sample of a target domain is realized.
In one application scenario, let the training data set in the source domain (i.e., the training scene) be $D_s = \{D_s^{train}, D_s^{test}\}$, where the lower-case subscript $s$ indexes the source-domain scene, $D_s^{train}$ is the training subset and $D_s^{test}$ is the test subset; their sizes indicate how many groups of data are assigned to each subset and can be preset or adjusted according to the actual situation. It should be noted that, in another application scenario, the classifier training data set in the target domain (i.e., the scene to be detected) may likewise be divided into corresponding classifier training and test subsets; for example, the classifier training data set of one target domain is $D_t = \{D_t^{train}, D_t^{test}\}$, where the subscript $t$ indexes the target-domain scene, $D_t^{train}$ is the classifier training subset and $D_t^{test}$ is the classifier test subset, whose sizes can likewise be preset or adjusted according to the actual situation and together equal the number of samples in that scene. It should be noted that, in the target domain, the training samples are 5 per category and are used to train a scene-specific logistic regression model; the test samples are used to test the model, their labels being used only for comparison with the model's output to evaluate its performance.
In one application scenario, a large number of samples are also collected and labeled in the scene to be detected (target domain), but only a small number, i.e., 5 samples per people number category, are used for training, and the remaining samples are used for testing. In this embodiment, in the process of acquiring in advance, through the plurality of receiving antennas in the scene to be detected, the channel state information corresponding to each people number category, at least 5 pieces of corresponding channel state information are acquired for each people number category. Preferably, the proportions of the training subset and the test subset of the source domain are 90% and 10%, respectively, but this is not specifically limited.
In this embodiment, the feature extractor is obtained by training in the source domain, using $D_s^{train}$ for training and $D_s^{test}$ for testing the training of the feature extractor. After the training of the feature extractor is completed, its parameters are fixed. In a target-domain detection scene, the samples $x_i$ of the classifier training subset $D_t^{train}$ of the target domain (only 5 samples per people number category) are first input into the feature extractor to obtain the features $f_i$; these features are then used to train a classifier $C_t$ specific to that scene. The classifier test subset $D_t^{test}$ of the target domain is used to evaluate the performance of the classifier $C_t$.
Fig. 5 is a schematic structural diagram of a convolutional neural network model according to an embodiment of the present invention, and as shown in fig. 5, the convolutional neural network model in this embodiment includes 6 convolutional blocks and 1 fully-connected layer, where each convolutional block includes 64 convolutional kernels of 3 × 3 size, one batch normalization layer and one ReLU activation function, and one pooling layer. The overall parameter number is 189513 (the parameter quantity is determined according to the actual structure of the CNN, and is related to the number of convolution layers and convolution kernels), and the trainable parameter number of each layer is shown in fig. 6. In fig. 6, the output shape represents the shape (size) of the feature map output by the corresponding layer.
In this embodiment, only the feature extractor is trained in the source domain, after the training is completed, the parameters of the feature extractor are frozen, and the parameters to be specifically trained in the feature extractor are determined according to the specific structure of the preset convolutional neural network model, which is not specifically limited herein.
It should be noted that the preset convolutional neural network model is not unique in structure, and may be adjusted according to actual requirements, for example, the number of convolutional layers may be adjusted according to actual requirements, and some or all of the convolutional layers may be replaced with Long Short-Term Memory (LSTM) layers.
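The following PyTorch sketch mirrors the fig. 5 description (6 convolutional blocks of 64 3 × 3 kernels with batch normalization, ReLU and a pooling layer, plus one fully-connected layer). The pooling type, padding, class count of 9 and the wiring details are assumptions, so the parameter count will not exactly reproduce the 189513 quoted above:

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """One block of fig. 5: 64 3x3 kernels, batch norm, ReLU, pooling."""
    def __init__(self, in_ch: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),  # pooling type is an assumption
        )

    def forward(self, x):
        return self.body(x)

class CsiCnn(nn.Module):
    """6 conv blocks + 1 fully-connected layer over 9 people-number classes."""
    def __init__(self, n_classes: int = 9):
        super().__init__()
        blocks = [ConvBlock(6)] + [ConvBlock(64) for _ in range(5)]
        self.features = nn.Sequential(*blocks)
        self.fc = nn.LazyLinear(n_classes)  # infers the flattened feature size

    def forward(self, x):
        f = self.features(x)                # feature maps used by the extractor
        return self.fc(torch.flatten(f, 1))

model = CsiCnn()
logits = model(torch.randn(4, 6, 200, 114))  # a batch of preprocessed CSI
print(logits.shape)                          # torch.Size([4, 9])
```

The same class can be instantiated twice, once for the amplitude branch and once for the phase-difference branch of the feature extractor described below.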
The purpose of training in the source domain (training scene) is to obtain a feature extractor $F_{\theta}$, where the training of the parameters in the feature extractor is as shown in equation (2):

$$\theta \leftarrow \theta - \eta \, \nabla_{\theta} \, \mathcal{L}_{CE}\big(F_{\theta};\, D_s^{train}\big) \qquad (2)$$

where $\mathcal{L}_{CE}$ is a preset cross-entropy loss function, $\theta$ denotes the parameters of the feature extractor (model) that need to be trained, $\eta$ is the learning rate, the left side of the equation is the updated parameters and the right side uses the parameters before updating, and $D_s^{train}$ is the training subset in the source domain. Preferably, in this embodiment, the feature extractor includes two submodels, obtained by training two preset convolutional neural network models with identical structures (such as the model shown in fig. 5). In other words, in this embodiment, the amplitude information and the phase information of the CSI matrix are trained on separately: two independent multilayer CNN models are trained and together form the feature extractor, so that the amplitude and phase features of the CSI matrix can each be extracted. Compared with an algorithm based on amplitude or phase alone, the feature extractor thus contains richer information, which improves the detection accuracy of the subsequent model. In this embodiment, the feature extractor $F$ comprises two submodels $F_A$ and $F_{\Delta\Phi}$: $F_A$ is used for extracting the amplitude information, and $F_{\Delta\Phi}$ is used for extracting the phase difference information. The architectures of the two submodels are the same, as shown in fig. 5, and their training is independent.
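A minimal sketch of the source-domain update of equation (2), shown for the amplitude submodel $F_A$ (the phase-difference submodel is trained identically and independently); the Adam optimizer and learning rate are assumptions, and `CsiCnn` is the sketch class above:

```python
import torch
import torch.nn as nn

# f_amp: an instance of CsiCnn from the sketch above, trained on amplitude
# inputs; train_loader yields (amplitude batch, people-number labels).
f_amp = CsiCnn()
criterion = nn.CrossEntropyLoss()                          # the L_CE of eq. (2)
optimizer = torch.optim.Adam(f_amp.parameters(), lr=1e-3)  # assumed optimizer

def train_epoch(train_loader):
    f_amp.train()
    for amp_batch, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(f_amp(amp_batch), labels)
        loss.backward()   # gradient of L_CE with respect to theta
        optimizer.step()  # theta updated as in equation (2)
```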
In this embodiment, a knowledge self-distillation technique from deep learning is introduced into the training process of the model to further improve the generalization capability of the network. Specifically, the convolutional neural network model is trained according to a preset number of training generations and a preset knowledge self-distillation process, where the preset knowledge self-distillation process includes: performing generation-1 training on the convolutional neural network model based on the training data set and the people number category labels corresponding to the training data set, to obtain the generation-1 model; and training the $k$-th model to be trained based on the training data set, the people number category labels corresponding to the training data set, and the output of the generation-$(k-1)$ model, to obtain the generation-$k$ model, where $k$ is greater than 1 and not greater than the preset number of training generations, and the structure of the $k$-th model to be trained is the same as that of the generation-$(k-1)$ model.
FIG. 7 is a schematic flow chart of knowledge self-distillation provided by an embodiment of the present invention. As shown in fig. 7, the training samples $x$ in the training scene (source domain) and the corresponding labels $y$ are first used to train the preset convolutional neural network model (i.e., the original model), and the model parameters are then fixed to obtain the 1st generation model $f^{(1)}$. Next, a model to be trained $f^{(2)}$ with the same structure as the 1st generation model is initialized. When training $f^{(2)}$, the input samples are still $x$, but the labels are, respectively, the output $\hat{y}$ of the trained 1st generation model $f^{(1)}$ (the soft labels) and the original labels $y$ (the hard labels). During training, the losses corresponding to the two kinds of labels are calculated with different algorithms, the two losses are weighted and summed to obtain the total loss, and the model is converged through a gradient descent algorithm. Similarly, the labels for training the model to be trained in each subsequent generation are the output of the trained previous generation model together with the original labels $y$; that is, the labels for training the current generation model to be trained are always the combination of the output of the trained previous generation model and the original labels, namely:
$$\mathcal{L}_{total}^{(k)} = \lambda_{1}\,\mathcal{L}_{CE}\big(f^{(k)}_{\theta}(x),\ y\big) + \lambda_{2}\,\mathrm{KL}\big(f^{(k)}_{\theta}(x)\ \big\|\ f^{(k-1)}(x)\big)$$

wherein $f^{(k)}$ is the $k$-th generation model, $\mathcal{L}_{CE}$ is the preset cross-entropy loss function, $\theta$ are the parameters in the model that need to be trained, and $\lambda_{1}$ and $\lambda_{2}$ are loss ratio coefficients (which can be preset or adjusted according to actual requirements); specifically, KL represents the relative entropy between the output of the current model and the output of the trained previous generation model.
It should be noted that the training is stopped, and the trained feature extractor is obtained, either when the preset number of training generations is reached (i.e., the number of times self-distillation and model training are performed) or when the model converges (e.g., when the difference between the total loss of a given generation and that of the previous generation is smaller than a preset loss-difference threshold).
In one application scenario, the amplitude model and the phase difference model in the feature extractor each undergo a certain number of self-distillation rounds; the distilled model of the 3rd or 4th generation improves performance by more than 8% compared with the original model.
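The following is a minimal PyTorch sketch of one self-distillation generation under the loss above. The optimizer choice, learning rate, loss weights, and epoch count are assumptions, not values specified by this embodiment.

```python
import torch
import torch.nn.functional as F

def train_generation(student, teacher, loader, lam1=0.5, lam2=0.5, epochs=30, lr=1e-3):
    """Train the k-th generation model against the hard labels and the frozen
    (k-1)-th generation's soft labels, then return it as the next teacher."""
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    teacher.eval()                                     # previous generation is fixed
    for _ in range(epochs):
        for x, y in loader:                            # x: CSI tensors, y: people-count labels
            with torch.no_grad():
                soft = F.softmax(teacher(x), dim=1)    # soft labels
            logits = student(x)
            loss = (lam1 * F.cross_entropy(logits, y)  # hard-label loss
                    + lam2 * F.kl_div(F.log_softmax(logits, dim=1),
                                      soft, reduction="batchmean"))  # relative entropy
            opt.zero_grad(); loss.backward(); opt.step()
    return student
```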
In actual application, if the scene changes (i.e., the model is applied to a new detection scene; for example, the WiFi transceiver device is replaced or the detected room changes), then even if the recognition performance of the trained feature extractor model in the training scene is excellent (for example, above 97%), directly using it as the classifier in a new scene to be detected (i.e., the target domain) may cause the recognition performance to drop sharply (by more than 60%) because the data characteristics in the new scene are different. Therefore, this embodiment introduces a meta-learning-based knowledge migration concept to further improve the robustness of the model. The pre-trained CNN model is used as a feature extractor for the data of the new scene: in the new scene, only 5 samples need to be collected for each people number category; these samples are input into the feature-extractor CNN to obtain the corresponding feature maps (Feature Maps), and a classifier suited to the new scene is then trained on these feature maps.
Fig. 8 is a schematic diagram of how a training sample changes inside the convolutional neural network model according to an embodiment of the present invention. The CSI matrix input into the convolutional neural network model (i.e., the feature extractor) contains amplitude and phase information, and its array dimension is 6 × 200 × 114. As the sample passes through the convolutional neural network, the features are progressively refined and the array size changes; depending on which neural network layer is selected, the constructed feature maps have different sizes. In this embodiment, the feature map obtained from the penultimate convolutional block (a feature map of another layer may be used in actual practice) has size 64 × 3 × 3 and is flattened into a 1 × 576 feature matrix (576 = 64 × 3 × 3). The amplitude information and the phase information of the CSI thus yield two 1 × 576 feature matrices, which are concatenated to obtain a 1 × 1152 joint feature. This joint feature is a condensed refinement of the data of the new scene (target domain).
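A sketch of this feature construction, reusing the CSISubModel sketched above; the `features` attribute and the shapes follow the assumptions made in that sketch.

```python
import torch

def joint_feature(amp, pha, f_amp, f_pha):
    """Concatenate the flattened 64x3x3 feature maps of the amplitude and
    phase-difference branches into a 1x1152 joint feature per sample."""
    with torch.no_grad():                     # extractor stays frozen in the target domain
        fa = f_amp.features(amp).flatten(1)   # (batch, 576)
        fp = f_pha.features(pha).flatten(1)   # (batch, 576)
    return torch.cat([fa, fp], dim=1)         # (batch, 1152)
```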
In this embodiment, a traditional machine learning algorithm is used in the target domain (the scene to be detected): a lightweight classifier is trained on, and predicts from, the obtained joint features. The lightweight classifier network is parameterized by $\{W, b\}$, where $W$ is the weight and $b$ is the bias term. The network is trained on the small-sample classifier training dataset $\mathcal{D}_{t}$ in the scene to be detected, with the following objective:

$$\min_{W,\,b}\ \sum_{(x_i,\,y_i)\in\mathcal{D}_{t}} \mathcal{L}_{CE}\big(g_{W,b}\big(f_{\theta}(x_i)\big),\ y_i\big)$$

wherein $g_{W,b}$ is the classifier model and $f_{\theta}(x_i)$ is the output obtained after the sample $x_i$ is input into the feature extractor.
Optionally, the preset machine learning model is a logistic regression model or an SVM model. Preferably, this embodiment adopts a logistic regression (LR) model, which has few parameters to train and is therefore well suited to generating a classifier from a small number of samples. That is, the amplitude and phase features of the wireless data in the scene to be detected are extracted with the CNN model (feature extractor) trained on the data of the training scene, and a simple logistic regression model is trained on the small sample set of the scene to be detected to generate the classifier corresponding to that scene.
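A minimal sketch of this step with scikit-learn; the 5-shot support set below is dummy random data standing in for the extracted joint features.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_classes = 9                                        # people number categories 0-8
X_support = rng.normal(size=(n_classes * 5, 1152))   # 5 joint features per category (dummy)
y_support = np.repeat(np.arange(n_classes), 5)

clf = LogisticRegression(max_iter=1000)              # lightweight target-scene classifier
clf.fit(X_support, y_support)

X_query = rng.normal(size=(3, 1152))                 # unlabeled target-domain features (dummy)
print(clf.predict(X_query))                          # label with the maximum probability
print(clf.predict_proba(X_query))                    # probability for each category
```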
In one application scenario, experimental verification shows that an LR model trained on the features extracted by the feature extractor achieves good performance with only 5 samples per category in the scene to be detected, with a test accuracy of up to 98%. Therefore, based on the small-sample learning method of this embodiment, only 5 samples per people number category in a new scene are needed to quickly train a lightweight classifier model that performs excellently in that scene. This addresses the robustness problem of performance degradation when machine learning and deep learning models are migrated to a new scene, and, through small-sample learning, it also resolves the difficulties of data collection and training that such models face when deployed in a new scene.
In another application scenario, the parameters of the feature extractor can be further fine-tuned by using a classifier training data set corresponding to a small number of samples in the scene to be detected, so that the feature extractor can better adapt to data of a target domain.
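Where this optional fine-tuning is used, it might look like the following sketch; the optimizer, learning rate, and number of steps are assumptions.

```python
import torch
import torch.nn.functional as F

def finetune_extractor(extractor, target_loader, steps=50, lr=1e-5):
    """Take a few low-learning-rate steps on the small labeled target-domain
    set so the extractor adapts slightly to the new scene's data."""
    opt = torch.optim.Adam(extractor.parameters(), lr=lr)
    it = iter(target_loader)
    for _ in range(steps):
        try:
            x, y = next(it)
        except StopIteration:               # cycle through the small dataset
            it = iter(target_loader)
            x, y = next(it)
        loss = F.cross_entropy(extractor(x), y)
        opt.zero_grad(); loss.backward(); opt.step()
    return extractor
```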
Step S300, acquiring data to be detected in the scene to be detected, detecting the number of people in the data to be detected through the feature extractor and the classifier, and outputting a detection result, wherein the data to be detected is acquired according to the state information of a channel to be detected in the scene to be detected.
In this embodiment, the data to be detected is obtained by subjecting the channel state information to be detected to the same preprocessing process as the channel state information used in training, so that the data to be detected can serve as input to the feature extractor model, which improves the detection precision.
In this embodiment, as shown in fig. 9, the step S300 specifically includes the following steps:
Step S301, acquiring the channel state information to be detected in the scene to be detected through the receiving antenna in the scene to be detected.
Step S302, processing the channel state information to be detected according to the preprocessing process and obtaining the data to be detected.
Step S303, inputting the data to be detected into the feature extractor, and acquiring the features output by the feature extractor as the features to be detected.
Step S304, inputting the features to be detected into the classifier corresponding to the scene to be detected, detecting the number of people through the classifier and outputting the detection result.
The feature output by the feature extractor may be the output feature map of any layer of the feature extractor (in this embodiment, the penultimate convolutional block). In one application scenario, the detection result is the probability corresponding to each people number category label, that is, the probability that there are 0 people, 1 person, and so on in the current scene to be detected. In this embodiment, the detection result is the specific number of people in the current scene to be detected, that is, the people number category label with the maximum probability. In another application scenario, it may further be judged whether the probability corresponding to the most probable people number category label exceeds a preset probability threshold; if not, the detection is regarded as erroneous, and that is output as the detection result.
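Putting steps S301 to S304 together, an inference pass might look like the sketch below; `preprocess` is a hypothetical stand-in for the amplitude/phase-difference preprocessing described above, and `joint_feature` is the helper sketched earlier.

```python
def detect_people(csi_raw, preprocess, f_amp, f_pha, clf, p_threshold=0.5):
    """Steps S301-S304: preprocess the CSI to be detected, extract the joint
    feature, classify, and optionally reject low-confidence results."""
    amp, pha = preprocess(csi_raw)                 # same preprocessing as in training
    feat = joint_feature(amp, pha, f_amp, f_pha).numpy()
    prob = clf.predict_proba(feat)[0]              # probability per people number category
    k = prob.argmax()
    if prob[k] < p_threshold:                      # optional threshold check
        return None                                # detection regarded as erroneous
    return int(clf.classes_[k])                    # most probable people count
```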
In this embodiment, the cross-scene robust indoor people number wireless detection method is further described based on a specific application scene. FIG. 10 is a schematic flow diagram of model training provided by an embodiment of the present invention. As shown in fig. 10, data acquisition is performed first (including the CSI matrices used for training models in the training scene and in the scene to be detected); the acquired data are then preprocessed and divided into source-domain data (training-scene data) and target-domain data (scene-to-be-detected data); the feature extractor is obtained by training on the source-domain data; and the target-domain data are then input into the feature extractor, and the lightweight classifier is trained on the feature maps it outputs. It should be noted that only a small number of samples (5 samples for each people number category) are used in training the classifier, and the classifier can be tested and adjusted on part of the samples to improve the detection accuracy.
In this embodiment, 9 categories of data corresponding to 0-8 people in a room are collected, and for each scene, data are collected for different activity types of the detected persons, such as stillness, walking, and free movement; the experimental results are shown in fig. 11. Specifically, fig. 11 shows the detection results, in 3 scenes to be detected, of the classifier obtained by training the logistic regression model according to the embodiment of the present invention. As can be seen from fig. 11, the detection method of this embodiment achieves high accuracy for the three different types of activities.
Fig. 12 is a schematic diagram of the confusion matrix output by the classifier when the detected persons are in a walking state in the scene to be detected according to an embodiment of the present invention, and fig. 13 is a schematic diagram of the confusion matrix output by the classifier when the detected persons are in a free-movement state in the scene to be detected according to an embodiment of the present invention. The scene to be detected in fig. 12 and fig. 13 is the same; compared with the training scene, the room is different and the placement of the objects in the room has changed. It should be noted that the confusion matrix shows the distribution of the model's output results for the samples of each class, indicating which classes the outputs fall into (for example, for 1-person samples, 99% of the outputs are 1 person and 1% are 2 persons). As can be seen from fig. 12 and fig. 13, the accuracy of the method of this embodiment is high, in most cases close to 100%; in the few error cases, the error is substantially within 1 person, which is within an acceptable range.
In this embodiment, the scene-robust wireless indoor people number detection method achieves robust detection accuracy across application scenes with large differences. Specifically, the method first uses a convolutional neural network to extract the amplitude and phase features of the CSI matrix: the feature extractors for the amplitude and the phase of the CSI matrix are trained with samples acquired in the training scene (source domain). In a new scene (target domain) to be detected, as few as 5 labeled samples per category are input into the trained feature extractor to produce target-domain features, which are used to train a small classifier model for the new scene and to classify the unlabeled samples to be detected in the target domain. Experiments show that when the scene changes, only 5 samples of each people number category in the new scene need to be collected to quickly train a small, scene-specific network model with excellent performance: the detection accuracy is above 96% for people who are indoors in a static or walking state, and above 93% for people moving freely indoors. This addresses the difficulties of data collection, the excessive training cost, and the model-robustness problems faced when migrating and deploying deep learning models.
As can be seen from the above, in the solution of the present invention, a feature extractor is obtained, where the feature extractor is obtained by training a preset convolutional neural network model in advance according to a training data set, the training data set is obtained according to a set of channel state information collected in a training scene in advance, the number of pieces of channel state information in the training data set is a first preset value, and the feature extractor is configured to extract features of the channel state information; a classifier training data set corresponding to a scene to be detected is acquired, features in the classifier training data set are extracted through the feature extractor and used as classifier training features, and a preset machine learning model is trained according to the classifier training features to obtain a classifier corresponding to the scene to be detected, wherein the classifier training data set is obtained according to a set of channel state information collected in the scene to be detected in advance, the number of pieces of channel state information in the classifier training data set is a second preset value smaller than the first preset value, and the classifier is configured to perform people number detection according to input features and output a detection result; and data to be detected in the scene to be detected is acquired, people number detection is performed on the data to be detected through the feature extractor and the classifier, and a detection result is output, wherein the data to be detected is obtained according to the channel state information to be detected in the scene to be detected. Compared with the prior-art scheme of detecting the number of people with a camera and a trained deep learning model, the present invention performs people number detection according to channel state information, is not affected by line-of-sight occlusion, and improves the accuracy of people number detection. Meanwhile, the convolutional neural network model is trained according to channel state information sample data (the training data set) in the training scene to obtain the feature extractor, and then, when the number of people needs to be detected in another scene (i.e., across scenes), the classifier of the scene to be detected is obtained by training the machine learning model with the trained feature extractor and only a small amount of channel state information sample data from the scene to be detected, which reduces the data collection burden in new scenes and improves the efficiency, accuracy and robustness of cross-scene people number detection.
Exemplary device
As shown in fig. 14, corresponding to the cross-scene robust wireless detection method for the number of people in the room, an embodiment of the present invention further provides a cross-scene robust wireless detection system for the number of people in the room, where the cross-scene robust wireless detection system for the number of people in the room includes:
a feature extractor obtaining module 410, configured to obtain a feature extractor, where the feature extractor is obtained by training a preset convolutional neural network model in advance according to a training data set, the training data set is obtained according to a set of channel state information collected in a training scene in advance, the number of the channel state information in the training data set is a first preset value, and the feature extractor is configured to extract features of the channel state information.
In this embodiment, only a large amount of sample data (i.e., the labeled channel state information) needs to be collected in the training scene, and only a small amount of sample data (i.e., the second preset value) needs to be collected in other scenes to be detected. Therefore, a room with a large amount of labeled sample data can be used as a training scene, so that the acquisition and labeling of the sample data are not required to be repeated, and the difficulty and the labeling cost of sample acquisition are reduced.
The classifier obtaining module 420 is configured to obtain a classifier training data set corresponding to a scene to be detected, extract features in the classifier training data set through the feature extractor and use them as classifier training features, and train a preset machine learning model according to the classifier training features to obtain a classifier corresponding to the scene to be detected, where the classifier training data set is obtained according to a set of channel state information collected in the scene to be detected in advance, the number of pieces of channel state information in the classifier training data set is a second preset value, the second preset value is smaller than the first preset value, and the classifier is configured to perform people number detection according to input features and output a detection result.
In this embodiment, the scene to be detected is any scene in which the number of people needs to be detected, the scene to be detected is different from the training scene, and when the feature extractor obtained by training according to the sample data in the training scene is applied to the scene to be detected, the change of the scene is involved. Therefore, in order to improve robustness, the trained feature extractor is not directly applied to a new scene to be detected for people number detection, and a classifier suitable for the scene to be detected is obtained by combining a small amount of sample data in the scene to be detected and the feature extractor which are collected in advance. And the small amount of sample data in the scene to be detected comprises channel state information with the number being a second preset value, and the second preset value is far smaller than the first preset value. In other words, a large amount of sample data does not need to be acquired in the scene to be detected, and the difficulty and the cost of data acquisition and labeling are favorably reduced.
The people number detection module 430 is configured to acquire data to be detected in the scene to be detected, perform people number detection on the data to be detected through the feature extractor and the classifier, and output a detection result, where the data to be detected is obtained according to the channel state information to be detected in the scene to be detected.
In the embodiment, the data to be detected is obtained after the channel state information to be detected is subjected to the same preprocessing process as the channel state information used in training, so that the data to be detected can be used as the input of the feature extractor model, and the detection precision is improved.
Therefore, based on the system, the classifiers in different scenes to be detected can be flexibly obtained, so that the feature extractor and the corresponding classifier are combined to detect the number of people in the data to be detected in the scenes to be detected, the efficiency and the accuracy of number detection are favorably improved, the number detection of people crossing the scenes is favorably carried out, and the robustness of the model is improved.
It should be noted that specific functions and implementation flows of the system and each module thereof may refer to specific descriptions in the method embodiment, and are not described herein again.
Based on the embodiment, the invention also provides an intelligent terminal, and a schematic block diagram of the intelligent terminal can be shown in fig. 15. The intelligent terminal comprises a processor, a memory, a network interface and a display screen which are connected through a system bus. Wherein, the processor of the intelligent terminal is used for providing calculation and control capability. The memory of the intelligent terminal comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a cross-scene robust wireless detection program for the number of people in the room. The internal memory provides an environment for the operating system in the non-volatile storage medium and the running of the cross-scene robust wireless detection program for the number of indoor people. The network interface of the intelligent terminal is used for being connected and communicated with an external terminal through a network. When being executed by a processor, the cross-scene robust wireless indoor people number detection program realizes the steps of any cross-scene robust wireless indoor people number detection method. The display screen of the intelligent terminal can be a liquid crystal display screen or an electronic ink display screen.
It will be understood by those skilled in the art that the block diagram of fig. 15 is only a block diagram of a part of the structure related to the solution of the present invention, and does not constitute a limitation to the intelligent terminal to which the solution of the present invention is applied, and a specific intelligent terminal may include more or less components than those shown in the figure, or combine some components, or have different arrangements of components.
In one embodiment, a smart terminal is provided, the smart terminal comprising a memory, a processor, and a cross-scene robust wireless detection program of the number of indoor persons stored on the memory and executable on the processor, the cross-scene robust wireless detection program of the number of indoor persons being executed by the processor to perform the following operation instructions:
acquiring a feature extractor, wherein the feature extractor is obtained by training a preset convolutional neural network model in advance according to a training data set, the training data set is obtained according to a set of channel state information acquired in a training scene in advance, the number of the channel state information in the training data set is a first preset value, and the feature extractor is used for extracting features of the channel state information;
acquiring a classifier training data set corresponding to a scene to be detected, extracting features in the classifier training data set through the feature extractor, using the features as classifier training features, training a preset machine learning model according to the classifier training features, and acquiring a classifier corresponding to the scene to be detected, wherein the classifier training data set is acquired according to a set of channel state information acquired in the scene to be detected in advance, the number of the channel state information in the classifier training data set is a second preset value, the second preset value is smaller than the first preset value, and the classifier is used for performing people number detection according to input features and outputting a detection result;
And acquiring data to be detected in the scene to be detected, detecting the number of people of the data to be detected through the feature extractor and the classifier, and outputting a detection result, wherein the data to be detected is acquired according to the state information of a channel to be detected in the scene to be detected.
The embodiment of the invention also provides a computer readable storage medium, wherein a cross-scene robust wireless indoor people number detection program is stored on the computer readable storage medium, and when being executed by a processor, the cross-scene robust wireless indoor people number detection program realizes the steps of any cross-scene robust wireless indoor people number detection method provided by the embodiment of the invention.
It should be understood that, the sequence numbers of the steps in the embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and internal logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
It should be clear to those skilled in the art that, for convenience and simplicity of description, the division of the functional units and modules is only used for illustration, and in practical applications, the functions may be distributed by different functional units and modules as needed, that is, the internal structure of the system may be divided into different functional units or modules to complete all or part of the functions described above. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only used for distinguishing one functional unit from another, and are not used for limiting the protection scope of the present invention. For the specific working processes of the units and modules in the system, reference may be made to the corresponding processes in the foregoing method embodiments, which are not described herein again.
In the embodiments, the description of each embodiment has its own emphasis, and reference may be made to the related description of other embodiments for parts that are not described or recited in any embodiment.
Those of ordinary skill in the art will appreciate that the elements and algorithm steps of the examples described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the technical solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided in the present invention, it should be understood that the disclosed system/terminal device and method may be implemented in other ways. For example, the above-described system/terminal device embodiments are merely illustrative, and for example, the division of the modules or units is only one type of logical function division, and may be implemented by another division manner in practice, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed.
The integrated module/unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, all or part of the processes in the method according to the embodiments of the present invention can also be implemented by a computer program, which can be stored in a computer readable storage medium and can implement the steps of the embodiments of the method when executed by a processor. Wherein the computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier wave signal, a telecommunications signal, a software distribution medium, etc. It should be noted that the contents of the computer-readable storage medium can be increased or decreased as required by the legislation and patent practice in the jurisdiction.
The above-mentioned embodiments are only used to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the embodiments of the present invention, and they should be construed as being included therein.

Claims (8)

1. A cross-scene robust wireless detection method for the number of people in a room is characterized by comprising the following steps:
acquiring a feature extractor, wherein the feature extractor is obtained by training a preset convolutional neural network model in advance according to a training data set, the training data set is obtained according to a set of channel state information acquired in a training scene in advance, the number of the channel state information in the training data set is a first preset value, and the feature extractor is used for extracting features of the channel state information;
acquiring a classifier training data set corresponding to a scene to be detected, extracting features in the classifier training data set through the feature extractor, using the features as classifier training features, training a preset machine learning model according to the classifier training features and acquiring a classifier corresponding to the scene to be detected, wherein the classifier training data set is acquired according to a set of channel state information acquired in the scene to be detected in advance, the number of the channel state information in the classifier training data set is a second preset value, the second preset value is smaller than the first preset value, and the classifier is used for performing people number detection according to input features and outputting a detection result;
Acquiring data to be detected in the scene to be detected, detecting the number of people for the data to be detected through the feature extractor and the classifier, and outputting a detection result, wherein the data to be detected is acquired according to the state information of a channel to be detected in the scene to be detected;
prior to the obtaining a feature extractor, the method further comprises:
acquiring channel state information corresponding to each people number type in the training scene in advance through a receiving antenna in the training scene, processing the channel state information in the training scene according to a preset preprocessing process, and marking people number type labels to obtain a training data set;
acquiring channel state information corresponding to each people number category in the scene to be detected in advance through a receiving antenna in the scene to be detected, processing the channel state information in the scene to be detected according to a preset preprocessing process, and marking people number category labels to obtain a classifier training data set;
the people number category is obtained according to the people number division in the corresponding scene, and the preprocessing process comprises the following steps: respectively calculating an amplitude matrix and a phase matrix corresponding to each piece of channel state information, carrying out layer-based standardization on the amplitude matrix and obtaining an amplitude information matrix, and calculating phase differences among the receiving antennas according to the phase matrix and obtaining a phase information matrix;
The method for acquiring the data to be detected in the scene to be detected, detecting the number of people of the data to be detected through the feature extractor and the classifier, and outputting a detection result comprises the following steps:
acquiring and acquiring state information of a channel to be detected in the scene to be detected through a receiving antenna in the scene to be detected;
processing the channel state information to be detected according to the preprocessing process and obtaining the data to be detected;
inputting the data to be detected into the feature extractor, and acquiring the features output by the feature extractor as the features to be detected;
inputting the characteristics to be detected into a classifier corresponding to the scene to be detected, detecting the number of people through the classifier and outputting the detection result.
2. The cross-scene robust wireless detection method for the number of indoor people according to claim 1, wherein the obtaining a feature extractor comprises:
dividing the training data set into a training subset and a testing subset;
and training the preset convolutional neural network model according to the training subset and the test subset to obtain the feature extractor, wherein the convolutional neural network model comprises a plurality of convolutional blocks and a full connection layer, and each convolutional block comprises a plurality of convolutional kernels, a normalization layer and a ReLU activation function.
3. The cross-scene robust wireless detection method for the number of indoor people according to claim 2, wherein the convolutional neural network model is trained according to a preset number of training generations and a preset knowledge self-distillation process, wherein the preset knowledge self-distillation process comprises the following steps:
performing 1st generation training on the convolutional neural network model based on the training data set and the people number category labels corresponding to the training data set to obtain a 1st generation model;
based on the training data set, the people number category labels corresponding to the training data set and the output of the (k-1)-th generation model, training the k-th model to be trained to obtain the k-th generation model, wherein k is greater than 1 and not greater than the preset number of training generations, and the structure of the k-th model to be trained is the same as that of the (k-1)-th generation model.
4. The cross-scene robust wireless detection method for the number of indoor people according to claim 1, wherein, in the process of acquiring, in advance, the channel state information corresponding to each people number category in the scene to be detected through the plurality of receiving antennas in the scene to be detected, at least 5 pieces of corresponding channel state information are acquired for each people number category.
5. The cross-scene robust wireless detection method for the number of indoor people according to claim 1, wherein the preset machine learning model is a logistic regression model or an SVM model.
6. A cross-scene robust wireless detection system for the number of people in a room, the system comprising:
the device comprises a feature extractor obtaining module, a feature extractor obtaining module and a feature extractor, wherein the feature extractor is obtained by training a preset convolutional neural network model according to a training data set in advance, the training data set is obtained according to a set of channel state information collected in a training scene in advance, the number of the channel state information in the training data set is a first preset value, and the feature extractor is used for extracting features of the channel state information;
the system comprises a classifier acquisition module, a classifier acquisition module and a classifier processing module, wherein the classifier acquisition module is used for acquiring a classifier training data set corresponding to a scene to be detected, extracting features in the classifier training data set through a feature extractor and using the features as classifier training features, training a preset machine learning model according to the classifier training features and acquiring a classifier corresponding to the scene to be detected, the classifier training data set is acquired according to a set of channel state information acquired in the scene to be detected in advance, the number of the channel state information in the classifier training data set is a second preset value, the second preset value is smaller than the first preset value, and the classifier is used for detecting the number of people according to input features and outputting a detection result;
The system comprises a people number detection module, a channel state information acquisition module and a channel state information acquisition module, wherein the people number detection module is used for acquiring data to be detected in a scene to be detected, detecting the people number of the data to be detected through the feature extractor and the classifier and outputting a detection result, and the data to be detected is acquired according to the channel state information to be detected in the scene to be detected;
the cross-scene robust wireless indoor people number detection system is also used for:
acquiring channel state information corresponding to each people number type in the training scene in advance through a receiving antenna in the training scene, processing the channel state information in the training scene according to a preset preprocessing process, and marking people number type labels to obtain a training data set;
acquiring channel state information corresponding to each people number category in the scene to be detected in advance through a receiving antenna in the scene to be detected, processing the channel state information in the scene to be detected according to a preset preprocessing process, and marking people number category labels to obtain a classifier training data set;
the people number category is obtained according to the people number division in the corresponding scene, and the preprocessing process comprises the following steps: respectively calculating an amplitude matrix and a phase matrix corresponding to each piece of channel state information, carrying out layer-based standardization on the amplitude matrix and obtaining an amplitude information matrix, and calculating phase differences among the receiving antennas according to the phase matrix and obtaining a phase information matrix;
The method for acquiring the data to be detected in the scene to be detected, detecting the number of people of the data to be detected through the feature extractor and the classifier, and outputting a detection result comprises the following steps:
acquiring and acquiring state information of a channel to be detected in the scene to be detected through a receiving antenna in the scene to be detected;
processing the channel state information to be detected according to the preprocessing process and obtaining the data to be detected;
inputting the data to be detected into the feature extractor, and acquiring the features output by the feature extractor as the features to be detected;
inputting the characteristics to be detected into a classifier corresponding to the scene to be detected, detecting the number of people through the classifier and outputting the detection result.
7. An intelligent terminal, characterized in that the intelligent terminal comprises a memory, a processor and a cross-scene robust wireless detection program of the number of indoor people stored on the memory and capable of running on the processor, wherein the cross-scene robust wireless detection program of the number of indoor people is executed by the processor to realize the steps of the cross-scene robust wireless detection method of the number of indoor people according to any one of claims 1 to 5.
8. A computer-readable storage medium, wherein a cross-scene robust wireless detection program of the number of indoor people is stored on the computer-readable storage medium, and when being executed by a processor, the cross-scene robust wireless detection program of the number of indoor people realizes the steps of the cross-scene robust wireless detection method of the number of indoor people according to any one of claims 1 to 5.