CN117355763A

CN117355763A - Self-supervising passive positioning using wireless data

Info

Publication number: CN117355763A
Application number: CN202280036462.1A
Authority: CN
Inventors: I·卡尔马诺夫; D·H·F·德克曼; S·莫林
Original assignee: Qualcomm Inc
Current assignee: Qualcomm Inc
Priority date: 2021-05-27
Filing date: 2022-05-24
Publication date: 2024-01-05
Also published as: EP4348286A1; BR112023024220A2; KR20240009954A

Abstract

Systems, methods, and non-transitory media for performing passive Radio Frequency (RF) position detection operations are disclosed. In some aspects, RF data, such as RF signals including Channel State Information (CSI), may be received from a wireless device. The RF data may be provided to a self-supervised machine learning architecture configured to perform three-dimensional (3D) object position estimation.

Description

Self-supervising passive positioning using wireless data

FIELD OF THE DISCLOSURE

Aspects of the present disclosure relate generally to wireless positioning and the like. In some implementations, examples are described for providing passive positioning based on wireless data, such as Radio Frequency (RF) data.

BACKGROUND OF THE DISCLOSURE

The wireless sensing device is capable of providing radio frequency characteristics that can be used to detect objects in a given environment. For example, the radio frequency sensing device may include software and hardware components that may be distributed throughout the environment and may be configured to track users moving throughout the environment. To implement various telecommunications functions, wireless sensing devices may include hardware and software components configured to transmit and receive Radio Frequency (RF) signals. For example, the wireless device may be configured to communicate via Wi-Fi, 5G/New Radio (NR), bluetooth ^TM And/or Ultra Wideband (UWB), etc.

SUMMARY

The following presents a simplified summary in connection with one or more aspects disclosed herein. Thus, the following summary should not be considered an extensive overview of all contemplated aspects, nor should the following summary be considered to identify key or critical elements of all contemplated aspects or to delineate the scope associated with any particular aspect. Accordingly, the sole purpose of the summary below is to present some concepts related to one or more aspects related to the mechanisms disclosed herein in a simplified form prior to the detailed description that is presented below.

Systems, apparatuses (devices), methods, and computer-readable media for making a position prediction based on Radio Frequency (RF) data are disclosed. According to at least one example, an apparatus for performing position prediction is provided. The apparatus may include at least one network interface, at least one memory, and at least one processor (e.g., configured in circuitry) coupled to the at least one memory. The at least one processor is configured to: obtaining Radio Frequency (RF) data via the at least one network interface; determining a plurality of feature vectors based on the RF data; generating a plurality of first clusters based on the plurality of feature vectors, wherein the first clusters correspond to a plurality of first pseudo tags; determining a plurality of projection features based on the plurality of feature vectors; training a first ML model using the plurality of first pseudo tags and the plurality of projection features; and predicting a location of the user based on the plurality of projection features and the floor loss.

In another example, a method for performing position prediction is provided. According to at least one example, there is provided a method for training one or more position prediction models, the method comprising: obtaining Radio Frequency (RF) data; determining a plurality of feature vectors based on the RF data; generating a plurality of first clusters based on the plurality of feature vectors, wherein the first clusters correspond to a plurality of first pseudo tags; determining a plurality of projection features based on the plurality of feature vectors; training a first ML model using the plurality of first pseudo tags and the plurality of projection features; and predicting a location of the user based on the plurality of projection features and the floor loss.

In another example, a non-transitory computer-readable storage medium is provided that includes at least one instruction to cause a computer or processor to: obtaining Radio Frequency (RF) data; determining a plurality of feature vectors based on the RF data; generating a plurality of first clusters based on the plurality of feature vectors, wherein the first clusters correspond to a plurality of first pseudo tags; determining a plurality of projection features based on the plurality of feature vectors; training a first ML model using the plurality of first pseudo tags and the plurality of projection features; and predicting a location of the user based on the plurality of projection features and the floor loss.

In another example, an apparatus for performing position prediction is provided. The apparatus includes: means for obtaining Radio Frequency (RF) data; means for determining a plurality of feature vectors based on the RF data; means for generating a plurality of first clusters based on the plurality of feature vectors, wherein the first clusters correspond to a plurality of first pseudo tags; means for determining a plurality of projection features based on the plurality of feature vectors; means for training a first ML model using the plurality of first pseudo tags and the plurality of projection features; and means for predicting a location of the user based on the plurality of projection features and the floor loss.

This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter. The subject matter should be understood with reference to appropriate portions of the entire specification of this patent, any or all of the accompanying drawings, and each claim.

Other objects and advantages associated with the aspects disclosed herein will be apparent to those skilled in the art based on the drawings and the detailed description.

Brief Description of Drawings

The accompanying drawings are presented to aid in the description of aspects of the disclosure and are provided solely for illustration of the aspects and not limitation thereof.

Fig. 1 is a block diagram illustrating an example of a computing system of a user device according to some examples;

fig. 2 is a diagram illustrating an example of a wireless device utilizing Radio Frequency (RF) sensing technology to detect user presence according to some examples;

fig. 3 is a diagram illustrating an example of an environment including a wireless device for facilitating detection of a user location, according to some examples;

FIG. 4 is a diagram illustrating a single-layer environment in which a position estimation process of the disclosed technology may be implemented;

5A-5C are diagrams illustrating examples of object detection with a distributed sensing system, according to some examples;

fig. 6 is a diagram illustrating an example graph of signal strength versus signal space positioning according to some examples;

fig. 7 is a diagram illustrating an example block diagram for radar cross-section measurement according to some examples;

fig. 8 is a diagram illustrating an example architecture of a self-supervised position estimation system, according to some examples;

FIG. 9 is a diagram illustrating an example of a cluster of pseudo tags that may be used to generate a self-supervised position estimation system, according to some examples;

FIG. 10A is a diagram illustrating an example of a comparison between two-dimensional implicit space, a Cartesian diagram, and corresponding truth data of a geographic environment, according to some examples;

fig. 10B is a diagram illustrating an example of a comparison between an example multi-layer environment and layers (floors) of a respective region in a cartesian plane;

fig. 11 is a block diagram illustrating an example of a deep learning neural network according to some examples;

fig. 12 is a block diagram illustrating an example of a Convolutional Neural Network (CNN) according to some examples;

FIG. 13 illustrates an example flowchart of a process for training one or more sensing models, according to some examples;

FIG. 14 illustrates an example flowchart of a process for initiating a training procedure and a position estimation process, according to some examples; and

fig. 15 illustrates an example computing system according to some examples.

Detailed Description

For illustrative purposes, certain aspects and embodiments of the disclosure are provided below. Alternate aspects may be devised without departing from the scope of the disclosure. Additionally, well-known elements in this disclosure will not be described in detail or will be omitted so as not to obscure the relevant details of this disclosure. It will be apparent to those skilled in the art that some of the aspects and embodiments described herein may be applied independently and that some of them may be applied in combination. In the following description, for purposes of explanation, specific details are set forth in order to provide a thorough understanding of the embodiments of the present application. It may be evident, however, that the embodiments may be practiced without these specific details. The drawings and descriptions are not intended to be limiting.

The following description merely provides example embodiments and is not intended to limit the scope, applicability, or configuration of the disclosure. Rather, the following description of the exemplary embodiments will provide those skilled in the art with an enabling description for implementing the exemplary embodiments. It being understood that various changes may be made in the function and arrangement of elements without departing from the spirit and scope of the application as set forth in the appended claims.

The terms "exemplary" and/or "example" are used herein to mean "serving as an example, instance, or illustration. Any aspect described herein as "exemplary" and/or "example" is not necessarily to be construed as preferred or advantageous over other aspects. Likewise, the term "aspects of the disclosure" does not require that all aspects of the disclosure include the discussed feature, advantage or mode of operation.

Many sensing devices (e.g., portable electronic devices, smart phones, tablet devices, laptop devices, and WiFi mesh access points) are capable of performing radio frequency sensing (also referred to as RF sensing). For example, the sensing device may utilize RF sensing technology to perform object detection (e.g., determine that an intruder has entered the venue). RF sensing has many applications such as tracking object movement, providing home and enterprise security, and the like.

In some examples, radio frequency sensing may utilize signals (e.g., wiFi signals, 3GPP signals, bluetooth ^TM Signals, etc.) to detect and characterize environmental changes, such as passive localization or movement of a person and activity characterization. For example, a radio frequency sensing system as described herein may analyze communications associated with a wireless device (e.g., a WiFi access point or other device), which may be referred to as a sensing device, to provide accurate detection and location estimation of one or more moving objects (e.g., people or other objects) in an environment. Examples of sensing detection operations include detecting motion (e.g., presence or absence of motion or no motion), a motion pattern (e.g., walking, falling, gesture, or other motion), a motion location (e.g., positioning), motion tracking (e.g., movement of an object (such as a person) over time), vital signs of a person or animal (e.g., respiration, heart rate, etc.), any combination thereof, and/or other information, etc. In one illustrative example, the location of a mobile object may be determined in a multi-room environment (such as a multi-room indoor environment).

In some cases, the machine learning based system may perform various sensing detection operations, such as location detection, motion tracking, and the like. Using the labels and test or training data, the machine learning system may be trained to perform sensing operations. However, it may be difficult to obtain enough tagged data to effectively train some such machine learning systems. For example, in some cases, a sufficient number of location tags may not be available due to difficulty in performing true value data collection (e.g., due to cost, difficulty, and/or privacy concerns).

The disclosed technology generally provides solutions for improving machine-learned position estimation, particularly for deployment in a sparse tag/training data scenario. In some aspects, the present disclosure provides systems, apparatuses, processes (also referred to as methods) and computer-readable media (collectively, "systems and techniques") for performing self-supervision to generate models that predict accurate positioning of objects (e.g., people) in a three-dimensional (3D) environment. In some aspects, the process of the disclosed technology uses RF data, such as Channel State Information (CSI), to generate topologically accurate implicit space that can be mapped onto real world cartesian space using a sparse amount of a priori information. The implicit space may be two-dimensional (2D) or three-dimensional (3D), depending on the desired implementation. Depending on the implementation, the a priori information may include, but is not limited to, location information of various wireless devices (e.g., access points), topological features (e.g., information about indoor floor plans), room-level tags and/or floor tags, and so forth. Using only sparse tag information, accurate positioning of various objects (e.g., people/users) can be performed for different topologies (such as multi-floor and multi-room environments) without requiring the use of accurate positioning tags during training.

As discussed in further detail below, aspects of the disclosed technology utilize triplet losses to train a machine learning model (e.g., a neural network) in an unsupervised manner. The triplet loss may be based on the temporal characteristics of the packet or sample (e.g., CSI sample), as described below. In some aspects, the ML architecture may be configured to: while learning spatial similarity metrics between input RF data (e.g., CSI samples) and performing dimension reduction to generate a 3D (or 2D) implicit space that is topologically similar to the target environment. In some aspects, the generated 2D implicit space can be accurately mapped into the target environment by combining the triplet loss with neural clustering (which involves clustering representations encoded by the neural network) and then training the network to predict previously assigned clusters (pseudo tags). In some approaches, the triplet loss and the neural clustering may play a complementary role. For example, the triplet loss may encourage the model to learn a representation that combines certain samples (e.g., samples or points that are close in time and in some cases spatially close together). Further, the neural clustering may extend the solution to spatially adjacent but temporally distant points.

In some aspects, the implicit space may be mapped to a real world (cartesian) representation using user-provided prior information, which may include, for example, various types of information including, but not limited to, location of access points, planogram information, and/or regional tags. The region tag may be an "attribute" associated with a particular collected CSI and may indicate that a certain CSI is collected when the user is in a particular region X. For example, during training, a user with a device (e.g., a mobile device or smart phone) may walk in different areas of a venue and may indicate in which area the user is located (e.g., by entering an interhome or area indicator in the interface of the device). The system may append the information to the corresponding CSI received at the point in time. Other advanced zone tags, such as room tags and/or floor tags, etc. In some examples, the real world a priori information may be provided by a user via, for example, an application executing on a device (e.g., a mobile device or smart phone) associated with the user. As an example, an application (or app) may facilitate a user in providing a sketch of an environment in which location estimation is to be performed, providing location information regarding one or more wireless devices (e.g., access points or base stations) associated with the environment, and/or providing a room or area indicator (e.g., which may be used for the area tag described above) indicating a room or area in which the user is located. For example, in some implementations, an app may be used to receive input indicators of characteristics about an environment, such as an inter-room or area indicator, or other descriptors.

Various aspects of the systems and techniques described herein are discussed below with reference to the figures. Fig. 1 illustrates an example of a computing system 170 of a user device 107. User device 107 is an example of a device that may be used by an end user. For example, the user device 107 may include a mobile phone, router, tablet computer, laptop computer, tracking device, wearable device (e.g., smart watch, glasses, XR device, etc.), internet of things (IoT) device, vehicle (or computing device of the vehicle), and/or other device used by a user to communicate over a wireless communication network. In some cases, a device may be referred to as a Station (STA) (such as when referring to a device configured to communicate using the Wi-Fi standard). In some cases, a device may be referred to as a User Equipment (UE) (such as when referring to a device configured to communicate using 5G/New Radio (NR), long Term Evolution (LTE), or other telecommunications standards).

Computing system 170 includes software and hardware components that may be electrically or communicatively coupled (or may be otherwise in communication as appropriate) via a bus 189. For example, the computing system 170 includes one or more processors 184. The one or more processors 184 may include one or more CPU, ASIC, FPGA, AP, GPU, VPU, NSP, microcontrollers, dedicated hardware, any combination thereof, and/or other processing devices and/or systems. Bus 189 may be used by one or more processors 184 to communicate between cores and/or with one or more memory devices 186.

The computing system 170 may also include one or more memory devices 186, one or more Digital Signal Processors (DSPs) 182, one or more Subscriber Identity Modules (SIMs) 174, one or more modems 176, one or more wireless transceivers 178, one or more antennas 187, one or more input devices 172 (e.g., camera, mouse, keyboard, touch-sensitive screen, touchpad, keypad, microphone, etc.), and one or more output devices 180 (e.g., display, speaker, printer, etc.).

The one or more wireless transceivers 178 may receive wireless signals (e.g., signals 188) via an antenna 187 from one or more other devices, such as other user devices, network devices (e.g., base stations (such as enbs and/or gnbs), wiFi Access Points (APs) (such as routers, range expanders, etc.), cloud networks, and so forth. In some examples, computing system 170 may include multiple ones that may facilitate simultaneous transmission and reception functionalityAn antenna or an antenna array. The antenna 187 may be an omni-directional antenna so that RF signals may be received and transmitted in all directions. The wireless signal 188 may be transmitted via a wireless network. The wireless network may be any wireless network, such as a cellular or telecommunications network (e.g., 3G, 4G, 5G, etc.), a wireless local area network (e.g., wiFi network), bluetooth ^TM A network and/or other networks. In some examples, the one or more wireless transceivers 178 may include an RF front-end that includes one or more components, such as an amplifier, a mixer for signal down-conversion (also referred to as a signal multiplier), a frequency synthesizer (also referred to as an oscillator) that provides signals to the mixer, a baseband filter, an analog-to-digital converter (ADC), one or more power amplifiers, and other components. The RF front-end may generally handle the selection and conversion to baseband or intermediate frequency of the wireless signal 188 and may convert the RF signal to the digital domain.

In some cases, the computing system 170 may include an encoding-decoding device (or CODEC) configured to encode and/or decode data transmitted and/or received using one or more wireless transceivers 178. In some cases, computing system 170 may include an encryption-decryption device or component configured to encrypt and/or decrypt data transmitted and/or received by one or more wireless transceivers 178 (e.g., in accordance with the Advanced Encryption Standard (AES) and/or the Data Encryption Standard (DES) standard).

The one or more SIMs 174 may each securely store an International Mobile Subscriber Identity (IMSI) number and associated keys assigned to a user of the user device 107. The IMSI and key may be used to identify and authenticate a subscriber when accessing a network provided by a network service provider or operator associated with one or more SIMs 174. One or more modems 176 may modulate one or more signals to encode information for transmission using one or more wireless transceivers 178. The one or more modems 176 may also demodulate signals received by the one or more wireless transceivers 178 to decode the transmitted information. In some examples, the one or more modems 176 may include a WiFi modem, a 4G (or LTE) modem, a 5G (or NR) modem, and/or other types of modems. One or more modems 176 and one or more wireless transceivers 178 may be used to communicate data for one or more SIMs 174.

The computing system 170 may also include (and/or be in communication with) one or more non-transitory machine-readable storage media or storage devices (e.g., one or more memory devices 186), which may include, but are not limited to, local and/or network-accessible storage, disk drives, drive arrays, optical storage devices, solid-state storage devices (such as RAM and/or ROM), which may be programmable, flash-updateable, and the like. Such storage devices may be configured to enable any suitable data storage, including but not limited to various file systems, database structures, and the like.

In various embodiments, the functions may be stored as one or more computer program products (e.g., instructions or code) in the memory device(s) 186 and executed by the processor(s) 184 and/or the DSP(s) 182. The computing system 170 may also include software elements (e.g., located within one or more memory devices 186), including, for example, an operating system, device drivers, executable libraries, and/or other code, such as one or more application programs, which may include computer programs that implement the functions provided by the various embodiments, and/or which may be designed to implement the methods and/or configure the system, as described herein.

Fig. 2 is a diagram illustrating an example of a wireless device 200 that utilizes RF sensing technology to perform one or more functions, such as detecting the presence of a user 202, detecting orientation characteristics of a user, performing motion detection, any combination thereof, and/or performing other functions. In some examples, wireless device 200 may be user device 107, such as a mobile phone, tablet computer, wearable device, or other device that includes at least one RF interface. In some examples, wireless device 200 may be a device that provides connectivity for a user device (e.g., for user device 107), such as a wireless Access Point (AP), a base station (e.g., a gNB, eNB, etc.), or other device that includes at least one RF interface.

In some aspects, wireless device 200 may include one or more components for transmitting RF signals. The wireless device 200 may include a digital-to-analog converter (DAC) 204 that is capable of receiving a digital signal or waveform (e.g., from a microprocessor, not illustrated) and converting the signal or waveform to an analog waveform. The analog signal, which is the output of DAC 204, may be provided to RF transmitter 206. The RF transmitter 206 may be a Wi-Fi transmitter, a 5G/NR transmitter, bluetooth ^TM A transmitter or any other transmitter capable of transmitting RF signals.

The RF transmitter 206 may be coupled to one or more transmit antennas, such as TX antenna 212. In some examples, TX antenna 212 may be an omni-directional antenna capable of transmitting RF signals in all directions. For example, TX antenna 212 may be an omni-directional Wi-Fi antenna capable of radiating Wi-Fi signals (e.g., 2.4GHz, 5GHz, 6GHz, etc.) in a 360 degree radiation pattern. In another example, TX antenna 212 may be a directional antenna that transmits RF signals in a particular direction.

In some examples, wireless device 200 may also include one or more components for receiving RF signals. For example, a receiver lineup in wireless device 200 may include one or more receive antennas, such as RX antenna 214. In some examples, RX antenna 214 may be an omni-directional antenna capable of receiving RF signals from multiple directions. In other examples, RX antenna 214 may be a directional antenna configured to receive signals from a particular direction. In a further example, both TX antenna 212 and RX antenna 214 may include multiple antennas (e.g., elements) configured as an antenna array.

Wireless device 200 may also include an RF receiver 210 coupled to an RX antenna 214. The RF receiver 210 may include a receiver for receiving RF waveforms (such as Wi-Fi signals, bluetooth ^TM Signals, 5G/NR signals, or any other radio frequency signals). The output of the RF receiver 210 may be coupled to an analog-to-digital converter (ADC) 208. The ADC 208 may be configured to convert the received analog RF waveform into a digital waveform that may be provided to a processor, such as a digital signal processor (not illustrated).

In one example, wireless device 200 may implement an RF sensing technique by causing TX waveform 216 to be transmitted from TX antenna 212. Although TX waveform 216 is illustrated as a single line, in some cases TX waveform 216 may be transmitted in all directions through omni-directional TX antenna 212. In one example, TX waveform 216 may be a Wi-Fi waveform transmitted by a Wi-Fi transmitter in wireless device 200. In some cases, TX waveform 216 may correspond to a Wi-Fi waveform that is transmitted simultaneously or nearly simultaneously with a Wi-Fi data communication signal or Wi-Fi control function signal (e.g., beacon transmission). In some examples, TX waveform 216 may be transmitted using the same or similar frequency resources as Wi-Fi data communication signals or Wi-Fi control function signals (e.g., beacon transmissions). In some aspects, TX waveform 216 may correspond to a Wi-Fi waveform that is transmitted separately from Wi-Fi data communication signals and/or Wi-Fi control signals (e.g., TX waveform 216 may be transmitted at different times and/or using different frequency resources).

In some examples, TX waveform 216 may correspond to a 5G NR waveform transmitted simultaneously or nearly simultaneously with a 5G NR data communication signal or a 5G NR control function signal. In some examples, TX waveform 216 may be transmitted using the same or similar frequency resources as the 5G NR data communication signals or the 5G NR control function signals. In some aspects, TX waveform 216 may correspond to a 5G NR waveform that is transmitted separately from a 5G NR data communication signal and/or a 5G NR control signal (e.g., TX waveform 216 may be transmitted at different times and/or using different frequency resources).

In some aspects, one or more parameters associated with TX waveform 216 may be modified, which may be used to increase or decrease the RF sensing resolution. These parameters may include frequency, bandwidth, number of spatial streams, number of antennas configured to transmit TX waveform 216, number of antennas configured to receive the reflected RF signal corresponding to TX waveform 216, number of spatial links (e.g., number of spatial streams multiplied by number of antennas configured to receive the RF signal), sampling rate, or any combination thereof.

In a further example, TX waveform 216 may be implemented with a sequence having perfect or nearly perfect auto-correlation properties. For example, TX waveform 216 may include a single carrier Zadoff sequence or may include symbols similar to Orthogonal Frequency Division Multiplexing (OFDM) Long Training Field (LTF) symbols. In some cases, TX waveform 216 may include a chirp signal used, for example, in a frequency modulated continuous wave (FM-CW) radar system. In some configurations, the chirp signal may include a signal in which the signal frequency increases and/or decreases periodically in a linear and/or exponential manner.

In some aspects, the wireless device 200 may further implement RF sensing techniques by performing concurrent transmit and receive functions. For example, wireless device 200 may enable its RF receiver 210 to receive at or near the same time that it causes RF transmitter 206 to transmit TX waveform 216. In some examples, the transmission of the sequence or pattern included in TX waveform 216 may be repeated continuously such that the sequence is transmitted a particular number of times or for a particular time duration. In some examples, if RF receiver 210 is enabled after RF transmitter 206, repeating the pattern in the transmission of TX waveform 216 may be used to avoid missing any receipt of the reflected signal. In one example implementation, TX waveform 216 may include a sequence having a sequence length of L that is transmitted two or more times, which may allow RF receiver 210 to be enabled at times less than or equal to L in order to receive reflections corresponding to the entire sequence without missing any information.

By implementing simultaneous transmit and receive functionality, wireless device 200 may receive any signal corresponding to TX waveform 216. For example, wireless device 200 may receive a signal reflected from an object or person within range of TX waveform 216, such as RX waveform 218 reflected from user 202. Wireless device 200 may also receive a leakage signal (e.g., TX leakage signal 220) that is directly coupled from TX antenna 212 to RX antenna 214 without being reflected from any object. For example, the leakage signal may include a signal that passes from a transmitter antenna (e.g., TX antenna 212) on the wireless device to a receiving antenna (e.g., RX antenna 214) on the wireless device without being reflected from any object. In some cases, RX waveform 218 may include multiple sequences corresponding to multiple copies of the sequences included in TX waveform 216. In some examples, wireless device 200 may combine multiple sequences received by RF receiver 210 to improve the signal-to-noise ratio (SNR).

Wireless device 200 may further implement RF sensing techniques by obtaining RF data associated with each received signal corresponding to TX waveform 216. In some examples, the RF data may include Channel State Information (CSI) data related to a direct path of TX waveform 216 (e.g., leakage signal 220), along with data related to a reflected path corresponding to TX waveform 216 (e.g., RX waveform 218).

In some aspects, the RF data (e.g., CSI data) may include information that may be used to determine the manner in which the RF signal (e.g., TX waveform 216) propagates from RF transmitter 206 to RF receiver 210. The RF data may include data corresponding to effects on the transmitted RF signal due to scattering, fading, and/or power attenuation over distance, or any combination thereof. In some examples, the RF data may include imaginary and real data (e.g., I/Q components) corresponding to each tone in the frequency domain over a particular bandwidth.

In some examples, the RF data may be used to calculate a distance and angle of arrival corresponding to a reflected waveform (such as RX waveform 218). In further examples, the RF data may also be used to detect motion, determine position, detect position changes or motion patterns, obtain channel estimates, or any combination thereof. In some cases, the distance and angle of arrival of the reflected signal may be used to identify the size, position, movement, or orientation of a user in the surrounding environment (e.g., user 202) in order to detect object presence/proximity, detect object attention, and/or perform motion detection.

Wireless device 200 may calculate the distance and angle of arrival corresponding to the reflected waveform (e.g., the distance and angle of arrival corresponding to RX waveform 218) by utilizing signal processing, machine learning algorithms, using any other suitable technique, or any combination thereof. In other examples, wireless device 200 may transmit the RF data to another computing device (such as a server) that may perform the calculations to obtain a distance and angle of arrival corresponding to RX waveform 218 or other reflected waveforms.

In one example, the distance of the RX waveform 218 can be calculated by measuring the time difference from receiving the leakage signal to receiving the reflected signal. For example, wireless device 200 may determine a baseline distance of zero based on a difference (e.g., propagation delay) from a time wireless device 200 transmits TX waveform 216 to a time it receives leakage signal 220. Wireless device 200 may then determine a distance associated with RX waveform 218 based on a difference (e.g., time of flight) from a time wireless device 200 transmitted TX waveform 216 to a time it received RX waveform 218, which may then be adjusted according to a propagation delay associated with leakage signal 220. By doing so, wireless device 200 may determine the distance traveled by RX waveform 218, which may be used to determine the presence and movement of the user (e.g., user 202) that caused the reflection.

In a further example, the angle of arrival of the RX waveform 218 may be calculated by measuring the time difference of arrival of the RX waveform 218 between individual elements of a receiving antenna array (such as antenna 214). In some examples, the time difference of arrival may be calculated by measuring a receive phase difference at each element in the receive antenna array.

In some cases, the distance and angle of arrival of the RX waveform 218 may be used to determine the distance between the wireless device 200 and the user 202 and the location of the user 202 relative to the wireless device 200. The distance and angle of arrival of the RX waveform 218 may also be used to determine the presence, movement, proximity, attention, identity, or any combination thereof of the user 202. For example, wireless device 200 may utilize the calculated distance and angle of arrival corresponding to RX waveform 218 to determine that user 202 is heading towards wireless device 200. Based on the proximity of the user 202 to the wireless device 200, the wireless device 200 may activate facial authentication to unlock the device. In some aspects, facial authentication may be activated based on user 202 being within a threshold distance of wireless device 200. Examples of threshold distances may include 2 feet, 1 foot, 6 inches, 3 inches, or any other distance.

As mentioned above, the wireless device 200 may include a mobile device (e.g., a smart phone, a laptop, a tablet device, an access point, etc.) or other type of device. In some examples, wireless device 200 may be configured to obtain device location data and device orientation data, as well as RF data. In some examples, the device location data and the device orientation data may be used to determine or adjust the distance and angle of arrival of the reflected signal (such as RX waveform 218). For example, the wireless device 200 may be disposed on a ceiling-facing table as the user 202 walks to the table during the RF sensing process. In this example, wireless device 200 may use its location data and orientation data, as well as RF data, to determine the direction in which user 202 walks.

In some examples, wireless device 200 may collect device location data using techniques including Round Trip Time (RTT) measurements, passive positioning, angle of arrival, received Signal Strength Indicators (RSSI), CSI data, using any other suitable technique, or any combination thereof. In further examples, the device orientation data may be obtained from electronic sensors on the wireless device 200, such as gyroscopes, accelerometers, compasses, magnetometers, barometers, any other suitable sensor, or any combination thereof.

Fig. 3 is a diagram illustrating an environment 300 including a wireless device 302, an Access Point (AP) 304, and a user 308. The wireless device 302 may include a user device (e.g., the user device 107 of fig. 1, such as a mobile device or any other type of device). In some examples, the AP 304 may also be referred to as a sensing device, a radio frequency sensing device, or a wireless device. As shown, user 308 (e.g., with wireless device 302) may move to different locations, including first user location 309a, second user location 309b, and third user location 309c. In some aspects, wireless device 302 and AP 304 may each be configured to perform RF sensing in order to detect the presence of user 308, detect the movement of user 308, any combination thereof, and/or perform other functions with respect to user 308.

In some aspects, the AP 304 may be a Wi-Fi access point that includes hardware and software components that may be configured to simultaneously transmit and receive RF signals, such as the components described herein with respect to the wireless device 200 of fig. 2. For example, AP 304 may include one or more antennas that may be configured to transmit RF signals and one or more antennas (e.g., antenna 306) that may be configured to receive RF signals. As mentioned with respect to wireless device 200 of fig. 2, AP 304 may include an omni-directional antenna or antenna array configured to transmit and receive signals from any direction.

In some aspects, the AP 304 and the wireless device 302 may be configured to implement a dual base configuration in which transmit and receive functions are performed by different devices. For example, AP 304 may transmit an omni-directional RF signal that may include signal 310a and signal 310 b. As illustrated, signal 310a may travel directly (e.g., without reflection) from AP 304 to wireless device 302, and signal 310b may reflect from user 308 at location 309a and cause a corresponding reflected signal 312 to be received by wireless device 302.

In some examples, wireless device 302 may utilize RF data associated with signals 310a and 310b to determine the presence, location, orientation, and/or movement of user 308 at position 309 a. For example, wireless device 302 can obtain, retrieve, and/or estimate location data associated with AP 304. In some aspects, wireless device 302 may use location data associated with AP 304, as well as RF data (e.g., CSI data), to determine time of flight, distance, and/or angle of arrival of associated signals (e.g., direct path signals (such as signal 310 a) and reflected path signals (such as signal 312)) transmitted by AP 304. In some cases, AP 304 and wireless device 302 may further transmit and/or receive communications that may include data (e.g., time of transmission, sequence/pattern, time of arrival, angle of arrival, etc.) associated with RF signal 310a and/or reflected signal 312.

In some examples, wireless device 302 may be configured to: RF sensing is performed using a single base configuration, in which case wireless device 302 performs both transmit and receive functions (e.g., simultaneous TX/RX discussed in connection with wireless device 200). For example, wireless device 302 may detect the presence or movement of user 308 at location 309b by transmitting RF signal 314, which may cause reflected signal 316 from user 308 at location 309b to be received by wireless device 302.

In some aspects, wireless device 302 may obtain RF data associated with reflected signal 316. For example, the RF data may include CSI data corresponding to the reflected signal 316. In a further aspect, the wireless device 302 may use the RF data to calculate a distance and an angle of arrival corresponding to the reflected signal 316. For example, wireless device 302 may determine the distance by calculating a time of flight of reflected signal 316 based on a difference between a leakage signal (not illustrated) and reflected signal 316. In a further example, the wireless device 302 may determine the angle of arrival by receiving the reflected signal with an antenna array and measuring a received phase difference at each element of the antenna array.

In some examples, wireless device 302 may obtain RF data in the form of CSI data, which may be used to formulate a matrix based on a number of frequencies (denoted "K") (e.g., tones) and a number of antenna array elements (denoted "N"). In one technique, the CSI matrix may be formulated according to the relationship given by equation (1 a):

CSI matrix, h= [ H ] _ik ],i＝1,…,N,k＝1,…,K (1a)

In some cases, the CSI matrix is a complex number representing propagation properties (e.g., attenuation and phase) between the transmitter antenna and the receiver antenna at a particular frequency (tone k), as estimated by using reference signals transmitted by the transmitter antenna and received by the receiver antenna. In some examples, CSI matrix h _ik May be denoted as h _ijk Where i is the receive antenna index and j is the transmit antenna index. In some examples, a transmitter for determining a reference signal for CSI may include M>1 antenna, and CSI may be estimated per transmit antenna. For example, in such an example, the CSI matrix may be formulated according to the relationship given by equation (1 b):

CSI matrix, h= [ H ] _ijk ],i＝1,…,N,j＝1…M,k＝1,…,K (1b)

Upon compiling the CSI matrix, wireless device 302 may calculate the angle of arrival and time of flight of the direct signal path (e.g., leakage signal) and the reflected signal path (e.g., reflected signal 316) by utilizing a two-dimensional fourier transform. In one example, the fourier transform may be defined by a relationship given by the following equation (2), where K corresponds to the number of tones in the frequency domain; n corresponds to the number of receiving antennas;h _ik corresponding to CSI data (e.g., complex numbers with real and imaginary parts) captured on the i-th antenna and the k-th tone; f (f) ₀ Corresponding to a carrier frequency; l corresponds to the antenna spacing; c corresponds to the speed of light; and Δf corresponds to a frequency interval between two adjacent tones. The relationship of formula (2) is provided as follows:

in some cases, a relationship similar to that in expression (2) may be formed to estimate the departure angle.

In some aspects, the leakage signal (e.g., leakage signal 220 and/or other leakage signals) may be eliminated by using an iterative elimination method.

In some cases, wireless device 302 may utilize the distance and angle of arrival corresponding to reflected signal 316 to detect the presence or movement of user 308 at location 309 b. In other examples, wireless device 302 may detect further movement of user 308 to third location 309 c. Wireless device 302 may transmit RF signal 318, which results in reflected signal 320 from user 308 at location 309 c. Based on the RF data associated with reflected signal 320, wireless device 302 may determine the location of user 308 at location 309c, detect the presence and/or orientation of the user's head, and/or perform facial recognition and facial authentication.

In some implementations, the wireless device 302 may utilize artificial intelligence or machine learning algorithms to perform motion detection, object classification, and/or detect head orientation in relation to the user 308. In some examples, the machine learning techniques may include supervised machine learning techniques, such as those utilizing neural networks, linear and logistic regression, classification trees, support vector machines, any other suitable supervised machine learning techniques, or any combination thereof. For example, a dataset of sample RF data may be selected to train a machine learning algorithm.

In some aspects, the wireless device 302 and the AP 304 may perform RF sensing techniques regardless of their association with each other or with a Wi-Fi network. For example, when the wireless device 302 is not associated with any access points or Wi-Fi networks, the wireless device 302 may utilize its Wi-Fi transmitter and Wi-Fi receiver to perform RF sensing as discussed herein. In a further example, the AP 304 may perform RF sensing techniques regardless of whether it has any wireless devices associated with it.

In some aspects, wireless device 302 and AP 304 may facilitate RF sensing using one or more machine learning models. For example, the wireless device 302 and/or the AP 304 may be configured to: RF data regarding the environment associated with the various locations of user 308 is collected and provided to a machine learning architecture that is configured, for example, to make location estimate predictions regarding user 308.

In some aspects, the tag regarding the environment may be received from a user, for example, via wireless device 302. In some examples, the tag may include information regarding the location of one or more wireless devices (e.g., AP 304 and/or wireless device 302), as well as information regarding the environment, such as information indicating the plan and/or location of the respective room, and the like. An example of an indoor environment is provided in connection with fig. 4.

Fig. 4 is a diagram illustrating an example environment 400 in which a location estimation process of the disclosed technology may be implemented. As illustrated, the environment 400 includes several different wireless (sensing) devices, such as access points 410, 412, 414, 416. However, it is to be appreciated that other wireless devices, such as a wireless device associated with a user (e.g., wireless device 302), can exist in environment 400 without departing from the scope of the disclosed technology.

In the example of fig. 4, the sensing devices are access points 410, 412, 414, 416 (e.g., a transmitting device 410, and receiving devices 412, 414, 416); however, arrangements are contemplated that include a greater (or fewer) number of wireless devices. As an example, the other wireless devices may include user devices (e.g., user device 107 of fig. 1, such as a mobile device or any other type of device), internet of things (IoT) devices, expanders, replicators, any combination thereof, and/or any other wireless device.

The access points 410, 412, 414, 416 may operate as radio frequency sensing devices, wi-Fi sensing enabled access points, and wireless devices utilizing at least one transceiver (or separate transmitters and receivers), as described herein. The access points 410, 412, 414, 416 and any other wireless devices (not illustrated) may be distributed throughout the environment to provide distributed sensing coverage for the environment 400. For example, as shown in fig. 4, access points 410, 412, 414, 416 are placed in various rooms or areas of an indoor environment. In the illustrated example, region 1 402 includes access point 412, while region 4 404 corresponds to access point 416, region 5 406 corresponds to access point 410, and region 6 408 corresponds to access point 414. Additionally, region 2 404 and region 3 405 contain no devices. The placement and positioning of the access point 410 and the wireless device 412 may be used to determine coverage of a distributed sensing system that may be repositioned to provide optimal sensing coverage, as described herein.

In some aspects, the RF data collected from environment 400 may be used to perform various radio frequency sensing-based detections, such as to perform position estimation and/or motion profiling. For example, the RF data received by the one or more sensing devices may include signals received directly from one or more other sensing devices (e.g., access points 410, 412, 414, 416) and/or may include signals reflected from one or more objects (e.g., people, animals, furniture) and/or structures (e.g., walls, ceilings, posts, etc.) in the environment.

In general, the radio frequency signals are reflected by objects (e.g., walls, posts, furniture, animals, etc.) and/or people located in the home 402. The data related to the radio frequency reflection includes: the amplitude and phase of the radio frequency signal changes as the object/person moves in a given space. By receiving RF data collected from environment 400, a position estimation system (not illustrated) may be used to generate accurate position estimates for various users (e.g., user 416) and/or other objects in environment 400. Depending on the implementation, the position estimation system may also be configured to: a motion profile and/or pattern is identified (e.g., by detecting the presence or absence of motion or no motion), a motion pattern (e.g., walking, falling, gesture, or other motion), a motion location (e.g., positioning), a motion tracking (e.g., movement of a subject or person over time), vital signs of a person or animal (e.g., respiration, heart rate, etc.), any combination thereof, and/or other information.

In some implementations, the RF signals may be used to determine characteristics (e.g., positioning and movement) of objects detected within the environment 400. For example, the RF signal may first be transmitted by the sensing device (e.g., one of the access points 410, 412, 414, 416) or one of the transmitting antennas of the sensing device. Depending on the configuration of the devices within environment 400, the RF signals may then be received at another sensing device (e.g., another of access points 410, 412, 414, 416) or a receiving antenna of the other sensing device. RF data based on the received RF signal may then be transmitted to a position estimation system (not illustrated).

In some aspects, the location estimation may be performed by one or more machine learning models, for example, configured to receive RF and make inferences about the location of objects (e.g., one or more people) within the environment 400.

Fig. 5A-5C are diagrams 510, 520, 530 illustrating examples of object detection with a distributed sensing system. Fig. 5A-5C may further illustrate motion detection and positioning across building 502. For example, in diagram 510 of fig. 5A, an object 512 (e.g., a person) is detected by a distributed sensing system (e.g., the distributed sensing system used with reference to fig. 4), as described herein. The object 512 is detected in a corridor 514 in the west portion of the building 502. As shown in diagram 520 of fig. 5B, object 512 enters room 524 of building 502 as object 512 moves in the eastern direction. By utilizing sensing devices distributed throughout building 502, the distributed sensing system can determine where object 512 is located. Thereafter, as shown in diagram 530 of fig. 5C, object 512 moves from room 524 to corridor 514, and to another room 534. In room 534, the distributed sensing system may detect the location of object 512 in room 534. For example, object 512 in diagram 530 of FIG. 5C, object 512 is detected to be located at the southeast corner of room 534.

Sensing devices of the distributed sensing system, such as, for example, access points and/or wireless devices, may also be used to receive and collect channel estimation information and data from the sensing devices. In some aspects, one or more devices (such as user devices or other devices associated with users within building 502) may be used to collect certain tag data. As an example, the user may provide floor plan information, or tags indicating the relative locations of various rooms and/or wireless devices, such as access points (e.g., access points 410, 412, 414, 416 discussed above).

In some implementations, to detect events in an environment by a distributed sensing system, it may be desirable to have a signal that is strong enough to enable the reflected signal to reach the receiver of the sensing device of the distributed sensing system. As illustrated in fig. 6, the strength of the signal may depend at least on: the transmit power of the sensing device, the antenna gain of the sensing device, and the distance between the transmitter, the target, and the receiver. For example, the greater the transmit power, the more likely the reflected signal will reach the receiver of the corresponding sensing device. If the transmit power is too low, the reflected RF signal may be too low to be detected by the receiver of the sensing device. Similarly, if the antenna gain is too low, the receiver may not receive the reflected RF signal sufficiently. The distance also affects the quality of the transmitted and reflected signals. For example, depending on the configuration of the distributed sensing system, the greater the distance (e.g., path loss) between two sensing devices or the distance between a transmitter and receiver of the same sensing device, the lower the signal strength of the RF signal and the reflected RF signal. Path loss (e.g., spatial losses 614, 618 of fig. 6) or path attenuation is the decrease in power density of an electromagnetic wave as the signal propagates through space. The strength of the signal may also depend on the type of target. For example, if the target size is small (e.g., 1 inch diameter, 3 inches diameter, 6 inches diameter, etc.), the surface area of the target may be small and thus only a small amount of RF signals may be reflected from the target. If the target size is large, the target will have a large surface area that reflects a large amount of RF signals. The reflectivity of the target may be referred to as the radar cross section. The distributed sensing system may measure the intensity of signals reflected from different objects. Based on the signals, the reflected signals, and the intensities of these signals, the distributed sensing system may predict aspects of the target, such as the position and movement of the target. However, if the target is remote from the sensing device, the signal received by the distributed sensing system may be too weak to detect the location of the target or other aspects of the target. If the target is in close proximity to the sensing device, the signal reflected by the target may have sufficient signal strength to enable accurate detection by the distributed sensing system.

Fig. 6 is a diagram illustrating an example graph 600 of signal strength 602 versus signal space location 604 for a detected object. In some implementations, the distributed sensing system may detect events in the environment, which may be represented as a function of signal strength (e.g., of radio frequency signals) received by the sensing device. The radio frequency signal may be generated by the object sensing device as a reflected radio frequency signal. In some implementations, the signal strength of the radio frequency signal may be based on: a transmit power; an antenna gain; path loss between the transmitter and the reflector as a function of sensing device and target location; path loss between the reflector and the receiver as a function of sensing device and target location; reflectivity of the target (e.g., radar Cross Section (RCS)); receiver specification; any combination thereof; and/or other factors. In some cases, the RCS may be determined as a function of the target size and/or shape. In some cases, the antenna gain may be approximated by a distributed sensing system. The distributed sensing system may predict a received sensed signal caused by a target at a given location, such as based on a Received Signal Strength Indicator (RSSI), a path loss measurement, and/or other factors.

Referring to fig. 6, a graph 600 illustrates a transmitter power (P _T ) 610, transmit antenna gain (G _t ) 612, space loss (α) 614 approximating the target, target gain factor (G) _σ ) 616, space loss (a) 618 returned from the target, receiver antenna gain (G _r ) 620 and receiver power (P _r ) 622. The distributed sensing system may further determine an Effective Radiated Power (ERP). For example, if the target is located in region A (e.g., approaching the targetSpace loss) or region B (e.g., space loss 618 from the target), the power may be measured in power density (mW/cm) ² ) Or field strength (V/m).

The signal strength versus signal space location of fig. 6 may be defined by: 10log P _t +10log G _t –α+G _σ –α+10log G _r ＝10log P _r 。

Fig. 7 is a diagram illustrating an example block diagram for radar cross-section measurement 700. For example, the radar cross-section measurement may include a transmitter power (P _T ) 710, transmit antenna gain (G) _t ) 712, free space loss 714, 716Receiver antenna gain (G) _r ) 718, and receiver power (P _r ) 720 (e.g.)> Radar cross-section measurement 700 may further utilize the following: p (P) _t G _t λ ² /(4πR) ² ，(4·π·σ)/λ ² ，P _t G _t λ ² /4πR ² And P _t G _t σλ ² /(4π) ³ (R ₁ ² R ₂ ² ). Lambda refers to the wavelength of the radio frequency signal. R is R _i Refers to the distance from the transmitter or receiver to the target. For example, R ₁ Refers to the distance between the transmitter and the target, and R ₂ Refers to the distance between the target and the receiver. Sigma refers to Radar Cross Section (RCS). The distributed sensing system may also adjust the power and wavelength of the radio frequency signal to optimize the quality and range of the radio frequency signal.

Fig. 8 is a diagram illustrating an example architecture of a self-supervised position estimation system 800, in accordance with some aspects of the disclosed technology. The location estimation system 800 may be configured to generate an object location estimate for a given input RF data (CSI), for example, by creating a reduced dimension implicit space that may be used to perform region-level classification using representation learning and dimension reduction techniques. Depending on the desired implementation, the implicit space may include three-dimensional (3D) or two-dimensional (2D) projections. As discussed in further detail below, dimension reduction techniques utilizing cross-dimension and multi-scale clustering may be implemented to preserve local and global structures within the data and facilitate mapping of 3D (or 2D) implicit spatial representations into accurate position estimates.

In operation, the position estimation system 800 is configured to receive RF data associated with, for example, at least one wireless device in a given environment or location, such as in an indoor environment (block 802). As an example, the RF data may be associated with an object (e.g., a person or user) in the environment of the wireless device to which the location estimation is to be performed. In some examples, the RF data is (or includes) Channel State Information (CSI) measured by one or more wireless devices, such as access points 410-416 discussed above with respect to fig. 4. In some aspects, the CSI provided as input to the position estimation system 800 is pre-processed, for example, by applying a high pass filter to isolate time-varying portions of the CSI signal. Depending on the desired implementation, various types (dimensions) of information from the CSI may be used to perform positioning estimation. As an example, CSI may include, but is not limited to: transmit antenna information, receive antenna information, subcarrier information, speed information, coverage area information, transmitter processing information, receiver processing information, or a combination thereof.

The CSI information is then provided to a feature extractor (block 804), for example, to extract one or more feature vectors. In some approaches, feature extraction may be accomplished using a neural network (e.g., a first machine learning model), such as a Convolutional Neural Network (CNN) configured to generate feature vectors representing salient characteristics of the input RF data. The dimensions of the extracted feature vectors may vary depending on the dimensions of the RF data input and the configuration of the feature extractor (CNN). As an example, the extracted feature vectors may include one or more 128-dimensional vectors (arrays). These high-dimensional feature vectors are then processed to generate one or more clusters (block 806). Although different clustering techniques may be used depending on the desired implementation, in some aspects, a k-means clustering approach is used to create pseudo tags (block 808). In some aspects, dimension reduction is performed to generate clusters (block 806). In such approaches, the number of clusters obtained (or generated) by the feature extractor (block 804) may depend on parameters, such as cluster count parameters indicating the dimensions of the obtained clusters and pseudo tags. In some aspects, this dimension may depend on the size of the space in which the position estimation is performed; for example, there may be a greater number of clusters for a larger space, or a greater number of regions for a larger space may be considered. The pseudo tags (e.g., gao Weiwei tags) may then be used to train the first machine learning model and use multi-layer perceptions (MLPs) (block 814) to counter-propagate cross entropy loss (CE loss) (block 810) for cluster prediction by the first ML model (block 814).

In some aspects, the extracted features (block 804) are also processed to produce a lower dimensional projection (such as a three-dimensional (3D) projection or a two-dimensional (2D) projection) (block 816), which represents, for example, a 3D/2D implicit space corresponding to the environment from which the RF data was collected (block 802). In some examples, the projections (block 816) may also be clustered (block 818) to generate pseudo tags (e.g., 3D or 2D pseudo tags) (block 820). In some aspects, the pseudo tag (block 820) may be based on a priori information (e.g., tag information) provided by, for example, a user associated with the received RF data (block 802). The a priori information may include information about the topology of the environment in which the position estimation is to be performed. As an example, the a priori information may include information about the location of one or more wireless devices (e.g., access points) in the environment, floor plan information, and/or tags, such as tags indicating areas, floors, and/or rooms in the sensed environment.

Using the cross entropy loss function (e.g., cluster loss) (block 822), the 3D/2D pseudo tag (block 820) may be used to train a second machine learning model (block 826), for example, configured to make cluster predictions based on the extracted feature vectors (block 804) (block 824). In some aspects, projections of the implicit space generated from the above-described cross-trained ML architecture (block 816) may be used to make accurate position estimates for one or more objects associated with RF data (CSI) received by the estimation system 800 (block 802).

In some aspects, the cluster loss (L _C ) The relationship of equation (3) can be used to calculate:

is restricted by->

The LHS of equation (3) gives an example of an objective function that a clustering algorithm, such as K-means clustering, can use to create a set of pseudo tags, i.e., to create a set (K) of centroids (C), where the (euclidean) distances (y_n) between points assigned to centroids are minimized so that all (N) points are assigned to centroids. The RHS of equation (3) represents a standard cross entropy loss that is used to provide a gradient to train the neural network using a gradient descent method to predict the pseudo-labels obtained from the clustering algorithm.

In some aspects, the position estimate determined from the projection (block 816) is further based on additional penalty functions including, but not limited to, triple penalty (block 828), access point penalty (which may also be referred to as base station penalty) (block 830), area penalty (832), floor penalty (block 834), or a combination thereof. As discussed further below, both the area loss (832) and the floor loss (block 834) may be based on a priori information/tags, e.g., indicated by the plan information (836), and area tags (838) that may be used to correlate individual areas with floor information.

In some aspects, the triple loss may be based on a time characteristic or dimension corresponding to a wireless device associated with the RF data, such as one or more of the access points 410-414 discussed above with respect to fig. 4. For example, when two packets that are close in time (and similar or dissimilar) are close in implicit space, their triplet loss is low. In one illustrative example, the triplet loss may use only time information. For example, a positive anchor may be selected as a CSI sample for a first period of time (e.g., 1 second, 2 seconds, 3 seconds, or other period of time) of the anchor, and a negative anchor may be selected as a CSI sample for a second period of time (e.g., 2-4 seconds when the first period of time is 2 seconds, or other period of time) of the anchor. In such an illustrative example, the triplet loss may result in encoding temporally closer packets as implicit representations that are closer in implicit space.

In some aspects, the triplet loss (L _T ) The relationship of equation (4) can be used to calculate:

where xi is the anchor, x _j Represents positive samples and xk represents negative samples. In some examples, the euclidean distance between representations may be given by the relationship of equation (5):

d(x,x′)=l||f _θ (x)-f _θ (x′)|| (5)

wherein the hyper-parameter M from formula (4) _t A margin, e.g., a minimum gap between distances that reduces losses to zero, may be represented. For implementations where the triplet loss is specifically adapted for 3D projection, a weight greater than "1" may be introduced in the z-axis when calculating the distance. Weighting in this way may help ensure that the model is severely penalized in case the predicted CSI samples are located at the wrong floor.

In some aspects, the region loss may be based at least in part on the region tag. The region loss may measure the accuracy of correspondence between the region classification predicted based on the received RF data samples (or packets) and the corresponding real region indicated by the region indicator provided on the cartesian diagram representing the associated environment. For example, a region loss may be assigned or determined a high loss value (and thus punishment) for a region class indicated by one or more inter-room or region indicators as belonging to a different region, and a low loss value may be assigned or determined for a region class that is correct based on one or more inter-room or region indicators. In some approaches, one or more a priori information (such as information about an associated plan (for indoor environments)) and implicit space may be based The representation is used to determine a region prediction for each received RF sample. As an example, the region prediction may generate a prediction region corresponding to a given bounding box B based on a K Nearest Neighbor (KNN) lookup. In some aspects, if a point is predicted to lie outside the bounding box, then the region loss is equal to the Manhattan distance d between box B and the track (point) predicted for the RF sample (packet) _m Otherwise, zero. In some implementations, the region loss may be calculated using the relationship of equation (6):

wherein the region coordinates B _ZONE Is identified using the relationship of (7):

[B _zone ]＝([x ₀ ,y ₀ ],[x ₁ ,y ₁ ]) And d _m (x,x′) (7)

In some aspects, access point loss may be based on at least one of: signal strength, location, or a combination thereof of a wireless device associated with the RF data. Further to the example provided above with respect to fig. 4, the access point loss may be based on one or more a priori information provided by the user 416 regarding placement (location) of one or more access points or base stations in the environment 400. In some aspects, access point losses may operate in a manner similar to triple losses. For example, for each packet in a batch, a negative anchor packet is sampled that is far enough apart to have a power difference, but close enough so that it is in the same region. For each access point a from a set of access points a, a power difference may be calculated. In some approaches, if the calculated power difference is greater than a threshold, then there is a higher power (x _i ) The packets should be lower power (x _k ) Packets closer to the corresponding host than the argument M _a . In some approaches, margin M for both triplet loss and access point loss _t And M _a May correspond to a desired euclidean distance difference on a cartesian graph and may be tuned to reflect the speed and time of the tracked object (e.g., person)Timestamp differences. In some implementations, the access point loss may be given by the relationship of equation (8):

in some approaches, access point loss may help bring the position prediction in implicit space closer to real space, and may also help the model know the correct orientation of some rooms.

Additionally, floor losses may be applied to use a priori information about what areas/rooms correspond to what floors to help identify floors (e.g., for multi-floor spaces). In some aspects, the area/floor information may be obtained from a plan, for example, obtained from a user (e.g., via an app), as discussed in further detail below with reference to fig. 14.

In some aspects, the predicted floor locations may be assumed to be continuous, e.g., the user may be assumed not to be able to jump quickly between areas (or floors). To smooth the prediction, a low pass filter may be applied, for example, to eliminate prediction that alternates or flashes between regions. An estimate of the floor may be made using the zone information. For example, a lookup table may be used to determine floors from known zone (room) information.

In some approaches, floor loss (L _F ) Can be given by the relation of formula (9):

where F may define a set of z-axis limits, e.g., indicating an expected number of floors. As an example, if two floors represent a set of expected floor locations, set F may include integer values 1 and 2. In some implementations, m may be used to specify a mask index value to represent a transition between floors, such as a location on a stairwell or in an elevator.

Fig. 9 is a diagram illustrating an example of a cluster of pseudo tags that may be used to generate a self-supervised position estimation system, such as the position estimation system 800 discussed above. As illustrated, the diagram of fig. 9 shows an example of cluster assignments that may be made for high-dimensional feature vectors (high D) and 3D and/or 2D projections (as discussed above with respect to fig. 8).

In some aspects, as the number of clusters increases, the size of the neighborhood may also decrease (e.g., as shown in fig. 9, the number of letters in each bubble is smaller (corresponding to a small neighborhood) for "high K" and greater (corresponding to a large neighborhood) for "low K" than for "high K"). For example, if it is assumed that a very large number of clusters are generated such that on average two points form clusters in a high-dimensional representation, the 3D/2D implicit space may preserve the (local) nearest neighbors. On the other hand, if several clusters are assumed to be present, the 3D (or 2D) implicit space may preserve more global structures, such as from which rooms the samples originate. In some examples, to implement a structure on multiple scales, rather than working with only one set of clusters, the hierarchy of cluster assignments may be extracted and predicted, as discussed above.

Fig. 10A is a diagram 1000 illustrating an example of a comparison between a two-dimensional implicit space 1002, a cartesian diagram 1004, and corresponding truth data 1006 of a geographic environment. In particular, the 2D implicit space 1002 illustrates an example using only triplets and cluster losses, while the illustrated cartesian diagrams incorporate some a priori information (e.g., region level labels and planogram information).

Fig. 10B is a diagram 1050 illustrating an example of a comparison between an example multi-layer environment and layers (floors) of a respective region in a cartesian plane. In the example of fig. 10B, the area of the plan view in diagram 1050 is shown mapped to cartesian coordinate plane 1065, and the area of plan view 1070 is shown mapped to cartesian coordinate plane 1075. In the example of fig. 10B, plan views 1060 and 1070 may represent first and second layers, respectively, of a multi-layer environment. As discussed above, by predicting the area information, a lookup table may be used to infer floors, for example.

Fig. 11 is an illustrative example of a deep learning neural network 1100 that may be used to implement the distributed sensing system described above. The input layer 1120 includes input data. In one illustrative example, input layer 1120 may include data representing pixels of an input video frame. The neural network 1100 includes a plurality of hidden layers 1122a, 1122b through 1122n. The hidden layers 1122a, 1222b through 1222n include "n" hidden layers, where "n" is an integer greater than or equal to 1. The number of hidden layers may be made to include as many layers as are required for a given application. The neural network 1100 further includes an output layer 1121 that provides outputs resulting from the processing performed by the hidden layers 1122a, 1222b, and 1222 n. In one illustrative example, the output layer 1121 may provide classification for objects in an input video frame. The classification may include a category that identifies a type of activity (e.g., play football, play piano, listen to piano, play guitar, etc.).

The neural network 1100 is a multi-layer neural network of interconnected nodes. Each node may represent a piece of information. Information associated with these nodes is shared between different layers, and each layer retains information as it is processed. In some cases, the neural network 1100 may include a feed-forward network in which no output of the network is fed back to its own feedback connection. In some cases, the neural network 1100 may include a recurrent neural network that may have loops that allow information to be carried across nodes when reading an input.

Information may be exchanged between nodes through node-to-node interconnections between the various layers. The nodes of the input layer 1120 may activate a set of nodes in the first hidden layer 1122 a. For example, as shown, each input node of the input layer 1120 is connected to each node of the first hidden layer 1122 a. The nodes of the first hidden layer 1122a may convert the information of each input node by applying an activation function to the information of that input node. The information derived from the conversion may then be passed to and may activate the nodes of the next hidden layer 1122b, which may perform their own specified functions. Example functions include convolution, upsampling, data transformation, and/or any other suitable function. The output of hidden layer 1122b may then activate the node of the next hidden layer, and so on. The output of the last hidden layer 1122n may activate one or more nodes of the output layer 1121 where the output is provided. In some cases, while a node (e.g., node 1126) in the neural network 1100 is shown as having multiple output lines, the node has a single output and all lines are shown as being output from nodes representing the same output value.

In some cases, each node or the interconnections between nodes may have a weight, which is a set of parameters derived from training of the neural network 1100. Once the neural network 1100 is trained, it may be referred to as a trained neural network, which may be used to classify one or more activities. For example, the interconnections between nodes may represent learned pieces of information about the interconnected nodes. The interconnect may have an adjustable numerical weight that may be tuned (e.g., based on a training data set) to allow the neural network 1100 to adapt to inputs and to be able to learn more and more data as it is processed.

The neural network 1100 is pre-trained to process features of data from the input layer 1120 using the different hidden layers 1122a, 1122b, through 1122n to provide an output through the output layer 1121. In examples where the neural network 1100 is used to identify an activity performed by a driver in a frame, the neural network 1100 may be trained using training data that includes both the frame and a tag, as described above. For example, training frames may be input into the network, where each training frame has a label indicating a feature in the frame (for a feature extraction machine learning system) or a label indicating a category of activity in each frame. In one example using object classification for illustration purposes, the training frame may include a digital 2 image, in which case the label of the image may be [ 00 100 00 00 0].

In some cases, the neural network 1100 may use a training process called back propagation to adjust the weights of the nodes. As mentioned above, the back propagation process may include forward transfer, loss function, back transfer, and weight update. Forward pass, loss function, backward pass, and parameter update are performed for one training iteration. This process may be repeated for a certain number of iterations for each set of training images until the neural network 1100 is trained good enough so that the weights of the layers are accurately tuned.

For an example of identifying objects in a frame, forward passing may include passing a training frame through the neural network 1100. The weights are initially randomized prior to training the neural network 1100. As an illustrative example, a frame may include a digital array of pixels representing an image. Each number in the array may include a value from 0 to 255 that describes the pixel intensity for that location in the array. In one example, the array may include a 28 x 3 digital array having 28 rows and 28 columns of pixels and 3 color components (e.g., red, green, and blue, or luminance and two chrominance components, etc.).

As mentioned above, for the first training iteration of the neural network 1100, the output will likely include values that do not give any particular class preference due to randomly selecting weights at initialization. For example, if the output is a vector with a probability that the object includes different categories, the probability values for each different category may be equal or at least very similar (e.g., each category may have a probability value of 0.1 for ten possible categories). With the initial weights, the neural network 1100 cannot determine low-level features, and thus cannot accurately determine what the classification of the object may be. The loss function may be used to analyze the error in the output. Any suitable loss function definition may be used, such as cross entropy loss. Another example of a loss function includes a Mean Square Error (MSE), defined as The penalty may be set equal to E_total (E\u _{Together, a total of} ) Is a value of (2).

For the first training sample, the loss (or error) will be high, because the actual value will be quite different from the predicted output. The goal of training is to minimize the amount of loss so that the predicted output is the same as the training label. The neural network 1100 may perform backward pass by determining which inputs (weights) contribute most to the loss of the network, and the weights may be adjusted to reduce and ultimately minimize the loss. The derivative of the loss with respect to the weight (denoted dL/dW, where W is the weight of a particular layer) may be calculated to determine the weight that contributes the most to the network loss. After the derivative is calculated, the weight update may be performed by updating the weights of all filters. For example, the weights may be updated such that they vary in opposite directions of the gradient. The weight update may be expressed as w=w_i- ηdl/dW, where w represents the weight, wi represents the initial weight, and η represents the learning rate. The learning rate may be set to any suitable value, where a high learning rate includes a larger weight update and a lower value indicates a smaller weight update.

The neural network 1100 may comprise any suitable deep network. One example includes a Convolutional Neural Network (CNN) that includes an input layer and an output layer, with a plurality of hidden layers between the input layer and the output layer. The hidden layers of CNNs include a series of convolutional layers, nonlinear layers, pooling layers (for downsampling), and fully-connected layers. The neural network 1100 may include any other depth network other than a CNN, such as an auto encoder, a Deep Belief Network (DBN), a Recurrent Neural Network (RNN), and the like.

Fig. 12 is an illustrative example of a Convolutional Neural Network (CNN) 1200. The input layer 1220 of the CNN 1200 includes data representing an image or frame. For example, the data may include an array of numbers representing pixels of the image, wherein each number in the array includes a value from 0 to 255 that describes the pixel intensity at that location in the array. Using the previous examples from above, the array may include a 28 x 3 digital array having 28 rows and 28 columns of pixels and 3 color components (e.g., red, green, and blue, or luminance and two chrominance components, etc.). The image may pass through convolution hidden layer 1222a, optional non-linear activation layer, pooling hidden layer 1222b, and full communication hidden layer 1222c to obtain an output at output layer 1224. Although only one of each hidden layer is shown in fig. 12, one of ordinary skill will appreciate that multiple convolutional hidden layers, non-linear layers, pooled hidden layers, and/or fully connected layers may be included in CNN 1200. As previously described, the output may indicate a single class of objects or may include probabilities that best describe the class of objects in the image.

The first layer of CNN 1200 is convolution hidden layer 1222a. Convolution hidden layer 1222a analyzes image data of input layer 1220. Each node of the convolution hidden layer 1222a is connected to a node (pixel) area (called receptive field) of the input image. The convolution hidden layer 1222a may be considered as one or more filters (each corresponding to a different activation or feature map), where each convolution iteration of the filter is a node or neuron of the convolution hidden layer 1222a. For example, the input image area that the filter covers at each convolution iteration will be the receptive field of the filter. In one illustrative example, if the input image includes a 28 x 28 array and each filter (and corresponding receptive field) is a 5 x 5 array, there will be 24 x 24 nodes in the convolutional hidden layer 1222a. Each connection between a node and the receptive field of that node learns a weight and, in some cases, a global bias such that each node learns to analyze its particular local receptive field in the input image. Each node of hidden layer 1222a will have the same weight and bias (referred to as shared weight and shared bias). For example, the filter has an array of weights (numbers) and the same depth as the input. For the video frame example, the filter will have a depth of 3 (from the three color components of the input image). An illustrative example size of a filter array is 5×5×3, which corresponds to the size of the receptive field of a node.

The convolution nature of the convolution hidden layer 1222a is due to each node of the convolution layer being applied to its corresponding receptive field. For example, the filter of convolution hidden layer 1222a may start from the top left corner of the input image array and may convolve around the input image. As mentioned above, each convolution iteration of the filter may be considered a node or neuron of the convolution hidden layer 1222 a. In each convolution iteration, the value of the filter is multiplied by the corresponding number of original pixel values of the image (e.g., a 5 x 5 array of filters is multiplied by a 5 x 5 array of input pixel values in the upper left corner of the input image array). The multiplications from each convolution iteration may be added together to obtain a sum for that iteration or node. Next, the process continues at the next position in the input image, based on the receptive field of the next node in the convolved hidden layer 1222 a. For example, the filter may be moved one step (referred to as a stride) to the next receptive field. The stride may be set to 1 or another suitable amount. For example, if the step size is set to 1, the filter will be shifted 1 pixel to the right at each convolution iteration. Processing the filter at each unique location of the input quantity produces a number representing the filter result for that location, resulting in a sum value being determined for each node of the convolution hidden layer 1222 a.

The mapping from the input layer to the convolution hidden layer 1222a is referred to as an activation map (or feature map). The activation graph includes a value for each node representing the filtering result at each location of the input quantity. The activation map may include an array that includes various sum values produced by each iteration of the filter over the input quantities. For example, if a 5 x 5 filter is applied to each pixel (stride of 1) of a 28x28 input image, the activation map will include a 24 x 24 array. The convolution hidden layer 1222a may include several activation maps to identify a plurality of features in an image. The example shown in fig. 12 includes three activation graphs. By using three activation graphs, the convolution hidden layer 1222a can detect three different types of features, each of which is detectable across the entire image.

In some examples, a non-linear hidden layer may be applied after convolving hidden layer 1222 a. The non-linear layer may be used to introduce non-linearities into a system that is always calculating a linear operation. One illustrative example of a non-linear layer is a modified linear unit (ReLU) layer. The ReLU layer may apply a function f (x) =max (0, x) to all values in the input quantity, which changes all negative activations to 0. Thus, the ReLU may increase the nonlinear properties of CNN 1200 without affecting the receptive field of convolution hidden layer 1222 a.

The pooled hidden layer 1222b may be applied after the convolved hidden layer 1222a (and after the non-linear hidden layer (when in use)). The pooled hidden layer 1222b is used to simplify the information in the output from the convolved hidden layer 1222 a. For example, the pooling hidden layer 1222b may obtain each activation graph output from the convolutional hidden layer 1222a and generate a compressed activation graph (or feature graph) using a pooling function. Max pooling is one example of a function performed by a pooling hidden layer. Other forms of pooling functions (such as average pooling, L2 norm pooling, or other suitable pooling functions) may be used by the pooling hidden layer 1222 a. A pooling function (e.g., a max-pooling filter, an L2-norm filter, or other suitable pooling filter) is applied to each activation graph included in the convolution hidden layer 1222 a. In the example shown in fig. 12, three pooling filters are used to convolve three activation graphs in hidden layer 1222 a.

In some examples, the maximum pooling may be used by applying a maximum pooling filter (e.g., having a size of 2x 2) with a stride (e.g., equal to the size of the filter, such as stride 2) to the activation map output from the convolutional hidden layer 1222 a. The output from the maximum pool filter includes the maximum number in each subregion of the filter convolution. Using a 2x2 filter as an example, each cell in the pooling layer may summarize the area of 2x2 nodes in the previous layer (where each node is a value in the activation graph). For example, four values (nodes) in the activation graph will be analyzed by a 2x2 max pooling filter in each iteration of the filter, with the maximum of the four values being output as the "max" value. If such a max-pooling filter is applied to an active filter from convolutional hidden layer 1222a having a 24 x 24 node size, the output from pooled hidden layer 1222b would be an array of 12 x 12 nodes.

In some examples, an L2 norm pooling filter may also be used. The L2 norm pooling filter includes: calculate the square root of the sum of squares of the values in the 2 x 2 region (or other suitable region) of the activation map (rather than calculating the maximum as in maximum pooling)), and use the calculated values as output.

Intuitively, a pooling function (e.g., max-pooling, L2-norm pooling, or other pooling function) determines whether a given feature is found anywhere in the region of the image, and discards exact location information. This can be done without affecting the feature detection results, since once a feature is found, its exact location is less important than its approximate location with respect to other features. The benefit of maximum pooling (and other pooling methods) is that the pooling features are much less, thereby reducing the number of parameters required in subsequent layers of the CNN 1200.

The last layer of connections in the network is the full connectivity layer, which connects each node from the pooled hidden layer 1222b to each output node in the output layer 1224. Using the example above, the input layer includes 28×28 nodes that encode pixel intensities of the input image; the convolution hidden layer 1222a includes 3×24×24 hidden feature nodes based on applying a 5×5 local receptive field (for the filter) to the three activation maps; and the pooling hidden layer 1222b includes a 3 x 12 hidden feature node layer based on applying a maximum pooling filter to a 2 x 2 region across each of the three feature maps. Expanding this example, output layer 1224 may include ten output nodes. In this example, each node of the 3×12×12 pooled hidden layer 1222b is connected to each node of the output layer 1224.

The full connectivity layer 1222c can take the output of the previous pooled hidden layer 1222b (which should represent an activation graph of advanced features) and determine the features most relevant to a particular class. For example, the full connectivity layer 1222c layer may determine the high-level features most strongly related to a particular category and may include weights (nodes) of the high-level features. The product between the weights of the full connectivity layer 1222c and the pooled hidden layer 1222b may be calculated to obtain probabilities of different categories. For example, if CNN 1200 is used to predict that an object in a video frame is a person, then higher values representing advanced features of the person (e.g., two legs present, face present at the top of the object, two eyes present at the top left and right of the face, nose present in the middle of the face, mouth present at the bottom of the face, and/or other features common to the person) will appear in the activation graph.

In some examples, the output from the output layer 1224 may include an M-dimensional vector (m=10 in the existing example). M indicates the number of categories that CNN 1200 must select when classifying objects in an image. Other example outputs may also be provided. Each number in the M-dimensional vector may represent a probability that the object belongs to a particular class. In one illustrative example, if the 10-dimensional output vector representing ten different classes of objects is [ 00 0.05 0.8 0 0.15 00 00 ], the vector indicates a 5% probability of being present with respect to an object (e.g., dog) of the third class of images, a 80% probability of being present with respect to an object (e.g., person) of the fourth class of images, and a 15% probability of being present with respect to an object (e.g., kangaroo) of the sixth class of images. The probability for a class may be considered the confidence level that the object is part of the class.

Fig. 13 illustrates an example flowchart of a process 1300 for performing position prediction according to some examples of this disclosure. At operation 1302, the process 1300 may include: radio Frequency (RF) data is obtained via at least one network interface. As discussed above, the RF data may include (or may be) Channel State Information (CSI). In some aspects, CSI may comprise at least one of: transmit antenna information, receive antenna information, subcarrier information, speed information, coverage area information, transmitter processing information, receiver processing information, or a combination thereof.

At operation 1304, the process 1300 may include: a plurality of feature vectors is determined based on the RF data. In some implementations, the feature vector may be extracted by a feature extractor, such as the Convolutional Neural Network (CNN) discussed above with respect to fig. 8. The feature vector may provide a high-dimensional (e.g., 128D) representation of the received CSI information.

At operation 1306, the process 1300 may include: a plurality of first clusters are generated based on the plurality of feature vectors, wherein the first clusters correspond to a plurality of first pseudo tags. In some aspects, cluster generation may be based on one or more configuration parameters, such as cluster count parameters indicating a number of dimension reductions. As an example, the cluster count parameter may be a preconfigured parameter based on the location or type of the environment in which the location estimation is performed.

At operation 1308, the process 1300 may include: a plurality of projection features is determined based on the plurality of feature vectors. The projection features may be three-dimensional (3D) or two-dimensional (2D) depending on the desired implementation. At operation 1310, the process 1300 may include: the first ML model is trained using the plurality of first pseudo tags and projection features. In some aspects, the projection features may be processed to generate a plurality of second clusters, and the second clusters may correspond to a plurality of second pseudo tags. In some examples, the second pseudo tag may be used to train a second ML model.

At operation 1312, the process 1300 may include: the location of the user is predicted based on the 3D (or 2D) projection features and floor losses. For example, a user location or a location of one or more other objects may be predicted with respect to various areas or rooms of the indoor space. In some examples, the second pseudo tag may be based on one or more user-provided prior information, such as room indicators, area indicators, and/or floor plan information, or a combination thereof. As an example, area tags and/or a priori planogram information may be used to facilitate determining an elevation location (such as a floor) of a user. In some aspects, the second pseudo tag may be used to train the first ML model.

In some examples, the processes described herein (e.g., processes 1300, 1400, and/or other processes described herein) may be performed by a computing device or apparatus. In one example, processes 1300 and/or 1400 can be performed by a computing device or computing system 1500 shown in fig. 15.

The computing device may include any suitable UE or device, such as a mobile device (e.g., a mobile phone), a desktop computing device, a tablet computing device, a wearable device (e.g., a VR headset, AR glasses, a networked watch or smart watch, or other wearable device), a server computer, an autonomous vehicle or a computing device of an autonomous vehicle, a robotic device, a television, and/or any other computing device having resource capabilities to perform the processes described herein (including process 1300). In some cases, the computing device or apparatus may include various components, such as one or more input devices, one or more output devices, one or more processors, one or more microprocessors, one or more microcomputers, one or more cameras, one or more sensors, and/or other component(s) configured to perform the steps of the processes described herein. In some examples, a computing device may include a display, a network interface configured to communicate and/or receive data, any combination thereof, and/or other component(s). The network interface may be configured to communicate and/or receive Internet Protocol (IP) based data or other types of data.

The components of the computing device may be implemented in circuitry. For example, the components may include and/or be implemented using electronic circuitry or other electronic hardware, which may include one or more programmable electronic circuits (e.g., microprocessors, graphics Processing Units (GPUs), digital Signal Processors (DSPs), central Processing Units (CPUs), visual Processing Units (VPUs), network Signal Processors (NSPs), microcontrollers (MCUs), and/or other suitable electronic circuits), and/or may include and/or be implemented using computer software, firmware, or any combination thereof, to perform the various operations described herein.

Process 1300 is illustrated as a logic flow diagram whose operations represent a sequence of operations that can be implemented in hardware, computer instructions, or a combination thereof. In the context of computer instructions, the operations represent computer-executable instructions stored on one or more computer-readable storage media that, when executed by one or more processors, perform the recited operations. Generally, computer-executable instructions include routines, programs, objects, components, data structures, etc. that perform particular functions or implement particular data types. The order in which the operations are described is not intended to be construed as a limitation, and any number of the described operations may be combined and/or performed in parallel in any order to implement the processes.

Additionally, process 1300 and/or other processes described herein may be performed under control of one or more computer systems configured with executable instructions, and may be implemented as code (e.g., executable instructions, one or more computer programs, or one or more applications) executing concurrently on one or more processors, by hardware, or a combination thereof. As mentioned above, the code may be stored on a computer-readable or machine-readable storage medium, for example, in the form of a computer program comprising a plurality of instructions executable by one or more processors. The computer-readable or machine-readable storage medium may be non-transitory.

Fig. 14 illustrates an example flow chart of a process 1400 for initiating a training procedure and a position estimation process in accordance with aspects of the disclosed technology. At operation 1402, the process 1400 includes: one or more a priori information associated with the environment is received. In some aspects, the a priori information may include various types of information related to the environment, such as location or placement information of one or more wireless devices or access points within a plan of the indoor environment. In some implementations, the a priori information may include room or area indicators, such as tags or other indicators that specify the type of room (e.g., "kitchen" or "garage"). As mentioned above, the room or region indicator may be associated with CSI data obtained when the user is located in a particular room or region of the venue. In some aspects, the prior information may include labels or other information, such as a plan sketch indicating the relative placement of rooms or areas of the interior space with respect to each other. Depending on the desired implementation, the a priori information may be received via a device associated with the user (e.g., at a server, access point/base station), such as a smart phone or other mobile device.

At operation 1404, process 1400 includes: RF data is obtained. In some examples, the RF data may be associated with one or more wireless devices (e.g., access points or base stations) located in, around, or near the environment in which the position estimation is to be performed. In some aspects, the RF data may include or may represent Channel State Information (CSI) of RF signals transmitted between two or more devices (e.g., a transmitter and a receiver). As such, the RF data may include data regarding signal disturbances associated with, for example, placement and/or movement of objects in the environment. As an example, the RF data may include CSI corresponding to RF signal disturbances caused by movement of a person (e.g., a user) through one or more rooms of an indoor environment.

At operation 1406, the process 1400 includes: one or more location estimation models are generated that are, for example, configured to facilitate location determination of one or more objects in an environment. As discussed above, the one or more location estimation models may include machine learning models configured to: RF data associated with the environment is received or retrieved as input and provided with the necessary processing (e.g., clustering and classification) to make object position estimates for the various objects in the environment. As an example, the one or more location estimation models may be configured to generate a 2D implicit space, which may represent a topology of the associated indoor environment and may be used to generate object location estimates or predictions. In some aspects, the position estimation model may be configured to detect or identify a motion profile of an object, such as identifying events or actions performed in the environment based on the motion of various animate or inanimate objects.

At operation 1408, the process 1400 includes: an alert is generated that includes an object location estimate. In some examples, the alert may include information describing the presence and/or location of one or more objects in the environment. As an example, the alarm may be provided as an intrusion alarm, for example, to alert homeowners and/or security personnel to the presence of people (or other objects) in the vicinity of the home or business environment. In some aspects, the alert may be transmitted to a device (e.g., a UE or a smart phone) corresponding to the intended recipient. Depending on the desired implementation, the alert may be configured to provide audible, visual, and/or tactile notification to a user associated with the recipient device (e.g., a smart phone).

Fig. 15 is a diagram illustrating an example of a system for implementing certain aspects of the technology herein. In particular, fig. 15 illustrates an example of a computing system 1500, which computing system 1500 can be, for example, any computing device that constitutes an internal computing system, a remote computing system, a camera, or any component thereof, with the components of the system in communication with one another using a connection 1505. Connection 1505 may be a physical connection using a bus or a direct connection to processor 1510 (such as in a chipset architecture). Connection 1505 may also be a virtual connection, a networking connection, or a logical connection.

In some embodiments, computing system 1500 is a distributed system, wherein the functionality described in this disclosure may be distributed within a data center, multiple data centers, a peer-to-peer network, and so forth. In some embodiments, one or more of the described system components represent many such components, each of which performs some or all of the functions described for that component. In some embodiments, the components may be physical or virtual devices.

The example system 1500 includes at least one processing unit (CPU or processor) 1510 and a connection 1505 that couples various system components including the system memory 1515, such as the Read Only Memory (ROM) 1520 and the Random Access Memory (RAM) 1525, to the processor 1510. The computing system 1500 may include a cache 1512 that is directly connected to the processor 1510, immediately adjacent to the processor 1510, or integrated as part of the processor 1510.

Processor 1510 may include any general purpose processor and hardware services or software services, such as services 1532, 1534, and 1536 stored in storage 1530 configured to control processor 1510, and special purpose processors, with software instructions incorporated into the actual processor design. Processor 1510 may be essentially a fully self-contained computing system, containing multiple cores or processors, buses, memory controllers, caches, etc. The multi-core processor may be symmetrical or asymmetrical.

To enable user interaction, computing system 1500 includes an input device 1545 that can represent any number of input mechanisms, such as a microphone for voice, a touch-sensitive screen for gesture or graphical input, a keyboard, a mouse, motion input, voice, and so forth. The computing system 1500 may also include an output device 1535, which output device 1535 may be one or more of several output mechanisms. In some examples, the multimodal system may enable a user to provide multiple types of input/output to communicate with the computing system 1500. The computing system 1500 may include a communication interface 1540 that can generally manage and manage user inputs and system outputs.

The communication interface may perform or facilitate the use of a wired and/or wireless transceiver to receive and/or transmit wired or wireless communications, including utilizing an audio jack/plug, a microphone jack/plug, a Universal Serial Bus (USB) port/plug,Port/plug, ethernet port/plug, fiber optic port/plug, dedicated wired port/plug,Radio signal transmission, < >>Low Energy (BLE) radio signaling, < > and->Wireless signaling, radio Frequency Identification (RFID) wireless signaling, near Field Communication (NFC) wireless signaling, dedicated Short Range Communication (DSRC) wireless signaling, 802.11Wi-Fi wireless signaling, wireless Local Area Network (WLAN) signaling, visible Light Communication (VLC), worldwide Interoperability for Microwave Access (WiMAX), infrared (IR) communication wireless signaling, public Switched Telephone Network (PSTN) signaling, integrated Services Digital Network (ISDN) signaling, 3G/4G/5G/LTE cellular data network wireless signaling, ad hoc network signaling, radio wave signaling, microwave signaling, infrared signaling, visible light signaling, ultraviolet light signaling, wireless signaling along the electromagnetic spectrum, or some combination thereof.

Communication interface 1540 may also include one or more Global Navigation Satellite System (GNSS) receivers or transceivers that are used to determine the location of computing system 1500 based on receiving one or more signals from one or more satellites associated with one or more GNSS systems. GNSS systems include, but are not limited to, the united states based Global Positioning System (GPS), the russian based global navigation satellite system (GLONASS), the chinese based beidou navigation satellite system (BDS), and the european based galileo GNSS. There are no limitations to operating on any particular hardware arrangement, and thus the underlying features herein may be readily replaced to obtain an improved hardware or firmware arrangement as they are developed.

The storage device 1530 may be non-volatile and/or non-transitory and/or computingA machine-readable Memory device, and may be a hard disk or other type of computer-readable medium capable of storing data accessible by a computer, such as a magnetic tape cartridge, a flash Memory card, a solid state Memory device, a digital versatile disk, a cassette, a floppy disk, a hard disk, a magnetic tape, a magnetic stripe/strip, any other magnetic storage medium, a flash Memory, a memristor Memory, any other solid state Memory, a compact disk read-only Memory (CD-ROM) disc, a compact disk rewriteable (CD) disc, a Digital Video Disk (DVD) disc, a blu-ray disc (BDD) disc, a holographic disc, another optical medium, a Secure Digital (SD) card, a micro-secure digital (microSD) card, a Memory Stick Cards, smart card chips, EMV chips, subscriber Identity Module (SIM) cards, mini/micro/nano/pico SIM cards, another Integrated Circuit (IC) chip/card, random Access Memory (RAM), static RAM (SRAM), dynamic RAM (DRAM), read Only Memory (ROM), programmable Read Only Memory (PROM), erasable Programmable Read Only Memory (EPROM), electrically Erasable Programmable Read Only Memory (EEPROM), flash memory EPROM (FLASHEPROM), cache memory (L1/L2/L3/L4/L5/l#), resistive random access memory (RRAM/ReRAM), phase Change Memory (PCM), spin transfer torque RAM (STT-RAM), another memory chip or cartridge, and/or combinations thereof.

Storage 1530 may include software services, servers, services, etc., which when executed by processor 1510 cause the system to perform functions. In some embodiments, the hardware services performing particular functions may include software components stored in a computer-readable medium that interfaces with the necessary hardware components (such as processor 1510, connection 1505, output device 1535, etc.) to perform the functions. The term "computer-readable medium" includes, but is not limited to, portable or non-portable storage devices, optical storage devices, and various other media capable of storing, containing, or carrying instruction(s) and/or data. Computer-readable media may include non-transitory media in which data may be stored and which do not include carrier waves and/or transitory electronic signals propagating wirelessly or through a wired connection.

Examples of non-transitory media may include, but are not limited to, magnetic disks or tapes, optical storage media such as Compact Discs (CDs) or Digital Versatile Discs (DVDs), flash memory, or memory devices. The computer-readable medium may have code and/or machine-executable instructions stored thereon, which may represent procedures, functions, subroutines, programs, routines, subroutines, modules, software packages, classes, or any combination of instructions, data structures, or program statements. A code segment may be coupled to another code segment or a hardware circuit by passing and/or receiving information, data, arguments, parameters, or memory contents. Information, arguments, parameters, data, etc. may be passed, forwarded, or transmitted via any suitable means including memory sharing, message passing, token passing, network transmission, etc.

Specific details are provided in the above description to provide a thorough understanding of the embodiments and examples provided herein, but one skilled in the art will recognize that the application is not so limited. Thus, although illustrative embodiments of the present application have been described in detail herein, it is to be understood that the various inventive concepts may be otherwise variously embodied and employed, and that the appended claims are not intended to be construed to include such variations unless limited by the prior art. The various features and aspects of the above-mentioned applications may be used singly or in combination. Furthermore, embodiments may be utilized in any number of environments and applications beyond those described herein without departing from the broader spirit and scope of the specification. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense. For purposes of illustration, the methods are described in a particular order. It should be appreciated that in alternative embodiments, the methods may be performed in a different order than described.

For clarity of illustration, in some examples, the inventive techniques may be presented as including individual functional blocks that include devices, device components, steps or routines in a method implemented in software or a combination of hardware and software. Additional components other than those shown in the figures and/or described herein may be used. For example, circuits, systems, networks, processes, and other components may be shown in block diagram form in order to avoid obscuring the embodiments in unnecessary detail. In other instances, well-known circuits, processes, algorithms, structures, and techniques may be shown without unnecessary detail in order to avoid obscuring the embodiments.

Furthermore, those of skill in the art will appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the aspects disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.

Various embodiments may be described above as a process or method, which is depicted as a flowchart, a flow diagram, a data flow diagram, a structure diagram, or a block diagram. Although a flowchart may describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations may be rearranged. The process is terminated when its operations are completed, but may have additional steps not included in the figures. The process may correspond to a method, a function, a procedure, a subroutine, etc. When a process corresponds to a function, its termination corresponds to the function returning to the calling function or the main function.

The processes and methods according to the examples above may be implemented using stored computer-executable instructions or computer-executable instructions otherwise available from a computer-readable medium. Such instructions may include, for example, instructions and data which cause or otherwise configure a general purpose computer, special purpose computer, or processing device to perform a certain function or group of functions. Portions of the computer resources used are accessible over a network. The computer-executable instructions may be, for example, binary files, intermediate format instructions (such as assembly language), firmware, source code. Examples of computer readable media that may be used to store instructions, information used during a method according to the described examples, and/or created information include magnetic or optical disks, flash memory, USB devices provided with non-volatile memory, networked storage devices, and the like.

In some embodiments, the computer readable storage devices, media, and memory may comprise a cable or wireless signal comprising a bit stream or the like. However, when referred to, non-transitory computer-readable storage media expressly exclude media such as energy, carrier signals, electromagnetic waves, and signals themselves.

Those of skill in the art would understand that information and signals may be represented using any of a variety of different technologies and techniques. For example, data, instructions, commands, information, signals, bits, symbols, and chips that may be referenced throughout the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or particles, optical fields or particles, or any combination thereof, depending in part on the particular application, on the desired design, on the corresponding technology, and the like.

The various illustrative logical blocks, modules, and circuits described in connection with the aspects disclosed herein may be implemented or performed with hardware, software, firmware, middleware, microcode, hardware description languages, or any combination thereof, and may take on any of a variety of form factors. When implemented in software, firmware, middleware or microcode, the program code or code segments (e.g., a computer program product) to perform the necessary tasks may be stored in a computer-readable or machine-readable medium. The processor may perform the necessary tasks. Examples of form factors include: laptop devices, smart phones, mobile phones, tablet devices, or other small form factor personal computers, personal digital assistants, rack-mounted devices, free-standing devices, and the like. The functionality described herein may also be implemented with a peripheral device or a plug-in card. As a further example, such functionality may also be implemented on different chips or circuit boards among different processes executing on a single device.

The instructions, the media used to convey these instructions, the computing resources used to execute them, and other structures used to support such computing resources are example means for providing the functionality described in this disclosure.

The techniques described herein may also be implemented in electronic hardware, computer software, firmware, or any combination thereof. The techniques may be implemented in any of a variety of devices such as a general purpose computer, a wireless communication device handset, or an integrated circuit device having multiple uses including applications in wireless communication device handsets and other devices. Any features described as modules or components may be implemented together in an integrated logic device or separately as discrete but interoperable logic devices. If implemented in software, the techniques may be realized at least in part by a computer-readable data storage medium comprising program code that includes instructions that, when executed, perform one or more of the methods, algorithms, and/or operations described above. The computer readable data storage medium may form part of a computer program product, which may include packaging material. The computer-readable medium may include memory or data storage media such as Random Access Memory (RAM), such as Synchronous Dynamic Random Access Memory (SDRAM), read-only memory (ROM), non-volatile random access memory (NVRAM), electrically erasable programmable read-only memory (EEPROM), flash memory, magnetic or optical data storage media, and the like. The techniques may additionally or alternatively be implemented at least in part by a computer-readable communication medium that carries or communicates program code in the form of instructions or data structures that may be accessed, read, and/or executed by a computer, such as propagated signals or waves.

The program code may be executed by a processor, which may include one or more processors, such as one or more Digital Signal Processors (DSPs), general purpose microprocessors, application Specific Integrated Circuits (ASICs), field programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Such processors may be configured to perform any of the techniques described in this disclosure. A general purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration. Accordingly, the term "processor" as used herein may refer to any of the foregoing structure, any combination of the foregoing structure, or any other structure or apparatus suitable for implementation of the techniques described herein.

Those of ordinary skill in the art will appreciate that less ("<") and greater than (">) symbols or terms used herein may be substituted with less than equal (" +") and greater than equal (" +") symbols, respectively, without departing from the scope of the present description.

Where components are described as "configured to" perform certain operations, such configuration may be achieved, for example, by designing electronic circuitry or other hardware to perform the operations, by programming programmable electronic circuitry (e.g., a microprocessor, or other suitable electronic circuitry), or any combination thereof.

The phrase "coupled to" refers to any component being physically connected directly or indirectly to another component, and/or any component being in communication directly or indirectly with another component (e.g., being connected to the other component through a wired or wireless connection and/or other suitable communication interface).

Claim language or other language reciting "at least one" of a collection and/or "one or more" of a collection indicates that a member of the collection or members of the collection (in any combination) satisfies the claim. For example, claim language reciting "at least one of a and B" or "at least one of a or B" means A, B, or a and B. In another example, claim language reciting "at least one of A, B and C" or "at least one of A, B or C" means A, B, C, or a and B, or a and C, or B and C, or a and B and C. The language of "at least one of the sets" and/or "one or more of the sets" does not limit the set to the items listed in the set. For example, claim language reciting "at least one of a and B" or "at least one of a or B" may mean A, B or a and B, and may additionally include items not recited in the set of a and B.

Illustrative examples of the present disclosure include: the scope of the aspects is changed to ensure that the aspects do not result in mutually exclusive embodiments overlapping.

Aspect 1: an apparatus for performing position prediction, the apparatus comprising: at least one network interface; at least one memory; and at least one processor coupled to the at least one memory, the at least one processor configured to: obtaining Radio Frequency (RF) data via the at least one network interface; determining a plurality of feature vectors based on the RF data; generating a plurality of first clusters based on the plurality of feature vectors, wherein the plurality of first clusters correspond to a plurality of first pseudo tags; determining a plurality of projection features based on the plurality of feature vectors; training a first Machine Learning (ML) model using the plurality of first pseudo tags and the plurality of projection features; and predicting a location of the user based on the plurality of projection features and the floor loss.

Aspect 2: the apparatus of aspect 1, wherein the at least one processor is further configured to: the plurality of projection features are processed to generate a plurality of second clusters, wherein the plurality of second clusters corresponds to a plurality of second pseudo tags.

Aspect 3: the apparatus of aspect 2, wherein the plurality of second pseudo tags are used to train a second ML model.

Aspect 4: the apparatus of any of aspects 1-3, wherein the plurality of projection features are three-dimensional (3D) projection features.

Aspect 5: the apparatus of any one of aspects 1 to 4, wherein, to generate the plurality of first clusters, the at least one processor is configured to: a cluster count parameter is received.

Aspect 6: the apparatus of any of aspects 1-5, wherein the floor penalty is based on one or more tag a priori information of an environment associated with the RF data.

Aspect 7: the apparatus of any of aspects 1-6, wherein predicting the location of the user is further based on at least one of: triple loss, access point loss, area loss, or a combination thereof.

Aspect 8: the apparatus of aspect 7, wherein the triplet loss is based on a similarity of packets corresponding to a wireless device associated with the RF data.

Aspect 9: the apparatus of any of aspects 7 or 8, wherein the regional loss is based on one or more a priori information of an environment associated with the RF data.

Aspect 10: the apparatus of any of aspects 7-9, wherein the access point loss is based on at least one of: the signal strength, location, or a combination thereof of the wireless device associated with the RF data.

Aspect 11: the apparatus of any one of aspects 1 to 10, wherein the RF data comprises Channel State Information (CSI).

Aspect 12: the apparatus of any one of aspects 1 to 11, wherein the CSI comprises at least one of: transmit antenna information, receive antenna information, subcarrier information, speed information, coverage area information, transmitter processing information, receiver processing information, or a combination thereof.

Aspect 13: a method for performing position prediction, the method comprising: obtaining Radio Frequency (RF) data; determining a plurality of feature vectors based on the RF data; generating a plurality of first clusters based on the plurality of feature vectors, wherein the plurality of first clusters correspond to a plurality of first pseudo tags; determining a plurality of projection features based on the plurality of feature vectors; training a first Machine Learning (ML) model using the plurality of first pseudo tags and the plurality of projection features; and predicting a location of the user based on the plurality of projection features and the floor loss.

Aspect 14: the method of aspect 13, further comprising: the plurality of projection features are processed to generate a plurality of second clusters, wherein the plurality of second clusters corresponds to a plurality of second pseudo tags.

Aspect 15: the method of aspect 14, wherein the plurality of second pseudo tags are used to train a second ML model.

Aspect 16: the method of any of aspects 13-15, wherein the plurality of projection features are three-dimensional (3D) projection features.

Aspect 17: the method of any of aspects 13-16, wherein generating the plurality of first clusters further comprises: a cluster count parameter is received.

Aspect 18: the method of any of aspects 13-17, wherein the floor penalty is based on one or more tag a priori information of an environment associated with the RF data.

Aspect 19: the method of any of aspects 13-18, wherein predicting the location of the user is further based on at least one of: triple loss, access point loss, area loss, or a combination thereof.

Aspect 20: the method of aspect 19, wherein the triplet loss is based on a similarity of packets corresponding to a wireless device associated with the RF data.

Aspect 21: the method of any of aspects 19 or 20, wherein the regional loss is based on one or more a priori information of an environment associated with the RF data.

Aspect 22: the method of any of aspects 19-21, wherein the access point loss is based on at least one of: the signal strength, location, or a combination thereof of the wireless device associated with the RF data.

Aspect 23: the method of any of aspects 13-22, wherein the RF data comprises Channel State Information (CSI).

Aspect 24: the method of any of aspects 13 to 23, wherein the CSI comprises at least one of: transmit antenna information, receive antenna information, subcarrier information, speed information, coverage area information, transmitter processing information, receiver processing information, or a combination thereof.

Aspect 25: a non-transitory computer-readable storage medium comprising at least one instruction for causing a computer or processor to: obtaining Radio Frequency (RF) data; determining a plurality of feature vectors based on the RF data; generating a plurality of first clusters based on the plurality of feature vectors, wherein the plurality of first clusters correspond to a plurality of first pseudo tags; determining a plurality of projection features based on the plurality of feature vectors; training a first ML model using the plurality of first pseudo tags and the plurality of projection features; and predicting a location of the user based on the plurality of projection features and the floor loss.

Aspect 26: the non-transitory computer-readable storage medium of aspect 25, comprising at least one instruction for causing a computer or processor to perform operations according to any one of aspects 1 to 24.

Aspect 27: an apparatus, comprising: means for obtaining Radio Frequency (RF) data; means for determining a plurality of feature vectors based on the RF data; means for generating a plurality of first clusters based on the plurality of feature vectors, wherein the plurality of first clusters corresponds to a plurality of first pseudo tags; means for determining a plurality of projection features based on the plurality of feature vectors; means for training a first ML model using the plurality of first pseudo tags and the plurality of projection features; and means for predicting a location of the user based on the plurality of projection features and the floor loss.

Aspect 28: the apparatus of aspect 27 comprising means for performing the operations according to any one of aspects 1 to 24.

Claims

1. An apparatus for performing position prediction, the apparatus comprising:

at least one network interface;

at least one memory; and

at least one processor coupled to the at least one memory, the at least one processor configured to:

obtaining Radio Frequency (RF) data via the at least one network interface;

determining a plurality of feature vectors based on the RF data;

generating a plurality of first clusters based on the plurality of feature vectors, wherein the plurality of first clusters correspond to a plurality of first pseudo tags;

Determining a plurality of projection features based on the plurality of feature vectors;

training a first Machine Learning (ML) model using the plurality of first pseudo tags and the plurality of projection features; and

the location of the user is predicted based on the plurality of projection features and the floor loss.

2. The apparatus of claim 1, wherein the at least one processor is further configured to:

the plurality of projection features are processed to generate a plurality of second clusters, wherein the plurality of second clusters corresponds to a plurality of second pseudo tags.

3. The apparatus of claim 2, wherein the plurality of second pseudo tags are used to train a second ML model.

4. The apparatus of claim 1, wherein the plurality of projection features are three-dimensional (3D) projection features.

5. The apparatus of claim 1, wherein to generate the plurality of first clusters, the at least one processor is configured to:

a cluster count parameter is received.

6. The apparatus of claim 1, wherein the floor penalty is based on one or more tag priors corresponding to an environment associated with the RF data.

7. The apparatus of claim 1, wherein predicting the location of the user is further based on at least one of: triple loss, access point loss, area loss, or a combination thereof.

8. The apparatus of claim 7, wherein the triplet loss is based on a similarity of packets corresponding to wireless devices associated with the RF data.

9. The apparatus of claim 7, wherein the regional loss is based on one or more a priori information of an environment associated with the RF data.

10. The apparatus of claim 7, wherein the access point loss is based on at least one of: signal strength, location, or a combination thereof, of a wireless device associated with the RF data.

11. The apparatus of claim 1, wherein the RF data comprises Channel State Information (CSI).

12. The apparatus of claim 11, wherein the CSI comprises at least one of: transmit antenna information, receive antenna information, subcarrier information, speed information, coverage area information, transmitter processing information, receiver processing information, or a combination thereof.

13. A method for performing position prediction, the method comprising:

obtaining Radio Frequency (RF) data;

determining a plurality of feature vectors based on the RF data;

14. The method of claim 13, further comprising:

15. The method of claim 14, wherein the plurality of second pseudo tags are used to train a second ML model.

16. The method of claim 13, wherein the plurality of projection features are three-dimensional (3D) projection features.

17. The method of claim 13, wherein generating the plurality of first clusters further comprises:

a cluster count parameter is received.

18. The method of claim 13, wherein the floor penalty is based on one or more tag priors corresponding to an environment associated with the RF data.

19. The method of claim 13, wherein predicting the location of the user is further based on at least one of: triple loss, access point loss, area loss, or a combination thereof.

20. The method of claim 19, wherein the triplet loss is based on a similarity of packets corresponding to wireless devices associated with the RF data.

21. The method of claim 19, wherein the regional loss is based on one or more a priori information of an environment associated with the RF data.

22. The method of claim 19, wherein the access point loss is based on at least one of: signal strength, location, or a combination thereof, of a wireless device associated with the RF data.

23. The method of claim 13, wherein the RF data comprises Channel State Information (CSI).

24. The method of claim 23, wherein the CSI comprises at least one of: transmit antenna information, receive antenna information, subcarrier information, speed information, coverage area information, transmitter processing information, receiver processing information, or a combination thereof.

25. A non-transitory computer-readable storage medium comprising at least one instruction for causing a computer or processor to:

obtaining Radio Frequency (RF) data;

determining a plurality of feature vectors based on the RF data;

training a first ML model using the plurality of first pseudo tags and the plurality of projection features; and

26. The non-transitory computer-readable storage medium of claim 25, further comprising at least one instruction for causing the computer or processor to:

the plurality of projection features are processed to generate a plurality of second clusters, wherein the second clusters correspond to a plurality of second pseudo tags.

27. The non-transitory computer-readable storage medium of claim 26, wherein the second pseudo tag is used to train a second ML model.

28. The non-transitory computer-readable storage medium of claim 25, wherein the plurality of projection features are three-dimensional (3D) projection features.

29. The non-transitory computer-readable storage medium of claim 25, wherein generating the plurality of first clusters further comprises:

a cluster count parameter is received.

30. An apparatus, comprising:

means for obtaining Radio Frequency (RF) data;

means for determining a plurality of feature vectors based on the RF data;

Means for generating a plurality of first clusters based on the plurality of feature vectors, wherein the plurality of first clusters corresponds to a plurality of first pseudo tags;

means for determining a plurality of projection features based on the plurality of feature vectors;

means for training a first ML model using the plurality of first pseudo tags and the plurality of projection features; and

means for predicting a location of the user based on the plurality of projection features and the floor loss.