CN107203259B - Method and apparatus for determining probabilistic context awareness for mobile device users using single and/or multi-sensor data fusion


Info

Publication number
CN107203259B
Authority
CN
China
Prior art keywords
vector
electronic device
level context
sensor
posterior probability
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610466435.4A
Other languages
Chinese (zh)
Other versions
CN107203259A (en)
Inventor
M·乔达里
A·库马
G·辛
K·R·J·米尔
I·N·卡
R·巴尔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
STMICROELECTRONICS INTERNATIONAL NV
STMicroelectronics Inc USA
Original Assignee
STMICROELECTRONICS INTERNATIONAL NV
STMicroelectronics Inc USA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from U.S. Application No. 15/074,188 (granted as US9870535B2)
Application filed by STMICROELECTRONICS INTERNATIONAL NV and STMicroelectronics Inc USA
Publication of CN107203259A
Application granted
Publication of CN107203259B

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 - Digital computers in general; Data processing equipment in general
    • G06F15/76 - Architectures of general purpose stored program computers
    • G06F15/78 - Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7807 - System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package

Abstract

The present disclosure relates to methods and apparatus for determining probabilistic context awareness for a mobile device user using single-sensor and/or multi-sensor data fusion. An electronic device described herein includes a sensing unit having at least one sensor for acquiring sensed data. An associated computing device extracts a plurality of sensor-specific features from the sensed data and generates a motion activity vector, a voice activity vector, and a spatial environment vector from the sensor-specific features. The motion activity vector, the voice activity vector, and the spatial environment vector are processed to determine a basic level context of the electronic device relative to its surroundings, wherein the basic level context has a plurality of aspects, each aspect based on the motion activity vector, the voice activity vector, and the spatial environment vector. A meta-level context of the electronic device relative to its surroundings is determined from the basic-level context, wherein the meta-level context is at least one inference made from at least two of the plurality of aspects of the basic-level context.

Description

Method and apparatus for determining probabilistic context awareness for mobile device users using single and/or multi-sensor data fusion
RELATED APPLICATIONS
This application claims the benefit of and priority to U.S. Application No. 62/121,104, filed on February 26, 2015, and is also a continuation-in-part of U.S. Application No. 14/749,118, filed on June 24, 2015, the contents of both applications being incorporated by reference to the maximum extent allowable under law.
Technical Field
The present disclosure relates to the field of electronic devices, and more particularly to a framework for determining a context of a user of a mobile device based on the user's athletic activity, voice activity, and spatial environment using single-sensor data and/or multi-sensor data fusion.
Background
Mobile devices and wearable devices, such as smartphones, tablet computers, smartwatches, and activity trackers, increasingly carry one or more sensors, such as accelerometers, gyroscopes, magnetometers, barometers, microphones, and GPS receivers, that can be used, alone or in combination, to detect a user's context, such as the user's motion activity, the user's voice or speech activity, and the user's spatial environment. Previous research efforts on motion activity have considered the classification of users' basic motion activities, such as walking, jogging, and cycling. Speech detection uses microphone recordings to detect human utterances, as distinguished from silence in the presence of background noise, and is used in a variety of applications such as audio conferencing, variable-rate speech codecs, speech recognition, and echo cancellation. Research studies have also been conducted to detect the spatial environment of a mobile device user from audio recordings in order to determine the user's environment classification, such as in the office, on the street, at the stadium, at the beach, etc.
In most context detection tasks, data from one sensor is used. Accelerometers are typically used for motion activity detection, while microphones are used for voice activity detection and spatial environment detection.
These prior art detection methods provide a deterministic output in the form of a class detected from a set of specific classes of athletic activity or acoustic environment as described above. However, determining the context of the user using such prior art techniques may not be as accurate as desired and, moreover, does not allow for more complex determinations regarding the context of the user. Therefore, further developments in this area are needed.
Disclosure of Invention
This summary is provided to introduce a selection of concepts that are further described below in the detailed description. This summary is not intended to identify key or essential features of the claimed subject matter, nor is it intended to be used as an aid in limiting the scope of the claimed subject matter.
An electronic device described herein includes a sensing unit having at least one sensor for acquiring sensed data. An associated computing device extracts a plurality of sensor-specific features from the sensed data and generates a motion activity vector, a voice activity vector, and a spatial environment vector from the sensor-specific features. Processing the motion activity vector, the voice activity vector, and the spatial environment vector to determine a basic level context of the electronic device relative to its surroundings, wherein the basic level context has a plurality of aspects, each aspect based on the motion activity vector, the voice activity vector, and the spatial environment vector. Determining a meta-level context of the electronic device relative to its surroundings from the basic-level context, wherein the meta-level context is at least one inference made from at least two of the plurality of aspects of the basic-level context.
Another aspect relates to an electronic device that includes a Printed Circuit Board (PCB) having at least one conductive trace thereon and a system on a chip (SoC) mounted on the PCB and electrically coupled to the at least one conductive trace. A sensor chip is mounted on the PCB in spaced relation to the SoC and electrically coupled to the at least one conductive trace such that the sensor chip and the SoC are electrically coupled. The sensor chip is configured to collect sensed data.
The sensor chip may include a micro-electro-mechanical system (MEMS) sensing unit and an embedded processing node. The embedded processing node may be configured to pre-process the sensed data, extract sensor-specific features from the sensed data, and generate an athletic activity posterior probability, a voice activity posterior probability, and a spatial environment posterior probability from the sensor-specific features. The embedded processing node may further process the athletic activity posterior probability, the voice activity posterior probability, and the spatial environment posterior probability to determine a base level context of the electronic device relative to its surroundings, the base level context having a plurality of aspects, each aspect based on the athletic activity posterior probability, the voice activity posterior probability, and the spatial environment posterior probability. The processing node may also determine a meta-level context of the electronic device relative to its surroundings from the basic-level context, wherein the meta-level context is at least one inference made from at least two of the plurality of aspects of the basic-level context.
One method aspect includes acquiring sensing data from a sensing unit, extracting, using a computing device, a plurality of sensor-specific features from the sensing data, and generating, using the computing device, a motion activity vector, a speech activity vector, and a spatial environment vector from the sensor-specific features. The method continues with processing, using the computing device, the motion activity vector, the voice activity vector, and the spatial environment vector to determine a base level context of the electronic device relative to its surroundings, the base level context having a plurality of aspects, each aspect based on the motion activity vector, the voice activity vector, and the spatial environment vector. A meta-level context of the electronic device relative to its surroundings can be determined from the basic-level context, wherein the meta-level context is at least one inference made from at least two of the plurality of aspects of the basic-level context.
Drawings
FIG. 1 is a block diagram of an electronic device configured to determine a contextual awareness of a user of the electronic device in accordance with the present disclosure.
FIG. 2 is a flow chart of a method for obtaining an a posteriori estimate of the probability of a basic level representation of context awareness of a user of the electronic device of FIG. 1.
Fig. 3 shows a basic level representation of the context awareness (in terms of information about activity, speech, and environment classes grouped into three independent vectors) of a mobile device user as determined by the electronic device of Fig. 1, and a meta level context awareness inferred from this information.
Fig. 4 depicts the motion activity posterior probability generated from the motion activity vector of fig. 3.
FIG. 5 depicts a voice activity posterior probability generated from the voice activity vector of FIG. 3.
Fig. 6 is a graph of the time evolution of the motion activity posterior probability generated using accelerometer data for an activity classified as walking.
FIG. 7 is a graph of the time evolution of the motion activity posterior probability generated using accelerometer data for an activity classified as ascending stairs.
FIG. 8 illustrates two methods of data fusion from multiple sensors for determining probabilistic context awareness.
FIG. 9 is a graph of a time evolution of the posterior probability of athletic activity generated using fusion of accelerometer and pressure sensor data for activities classified as walking.
Fig. 10 is a graph of the time evolution of the posterior probability of athletic activity generated using the fusion of accelerometer and pressure sensor data for activities classified as ascending stairs.
Fig. 11 shows a confusion matrix obtained for the motion activity classes using the probabilistic motion activity posterior probability output generated using features obtained from accelerometer and barometer data.
FIG. 12 is a block diagram of a method of meta-level context-aware embedded application development using motion activity posterior probability, voice activity posterior probability, and spatial environment posterior probability.
Fig. 13 shows two screen shots of a smartphone application that calculates the posterior probability of athletic activity and displays its temporal evolution.
Detailed Description
In the following description, numerous details are set forth in order to provide an understanding of the present disclosure. However, it will be understood by those skilled in the art that the embodiments of the present disclosure may be practiced without these details and that numerous variations or modifications from the described embodiments may be possible.
As will be described in detail herein, the present disclosure relates to an algorithmic framework for determining mobile device user context in the form of motion activity, voice activity, and spatial environment using single sensor data and multi-sensor data fusion. In particular, the algorithmic framework provides probabilistic information about motion activity, voice activity, and spatial environment through heterogeneous sensor measurements, which may include data from accelerometers, barometers, gyroscopes, and microphones (but not limited to these sensors) embedded on mobile devices. The computing architecture allows for combining probabilistic outputs in a number of ways in order to infer meta-level context-aware information about a mobile device user.
Referring first to FIG. 1, an electronic device 100 is now described. The electronic device 100 may be a smartphone, tablet computer, smart watch, activity tracker, or other wearable device. The electronic device 100 includes a Printed Circuit Board (PCB) 99 on which various components are mounted. Conductive traces 97 printed on the PCB 99 are used to electrically couple these various components in a desired manner.
Mounted on the PCB 99 is a system-on-a-chip (SoC) 150 that includes a Central Processing Unit (CPU) 152 coupled to a Graphics Processing Unit (GPU) 154. A memory block 140, an optional transceiver 160, and a touch-sensitive display 130 are coupled to the SoC 150; via the transceiver, the SoC 150 may wirelessly communicate with a remote server over the internet, and via the display, the SoC 150 may display output and receive input. Coupled to the SoC 150 is a sensor unit 110 that includes a three-axis accelerometer 111 for determining accelerations experienced by the electronic device 100, a microphone 112 for detecting audible noise in the environment, a barometer 113 for determining atmospheric pressure in the environment (and thus indicating the altitude of the electronic device 100), a gyroscope 114 for determining the angular rate, and thus the orientation (roll, pitch, or yaw), of the electronic device 100 relative to the environment, a magnetometer 118 for determining the strength of magnetic fields in the environment and thereby the orientation of the electronic device 100, a proximity sensor 119 for determining the proximity of a user with respect to the electronic device 100, a light sensor for determining the ambient light level in the environment in which the electronic device 100 is located, a GPS receiver via which the SoC 150 may determine the geospatial location of the electronic device 100, and a WiFi transceiver via which the SoC 150 may communicate with a remote server over the internet.
The sensor unit 110 is configurable and is mounted on the PCB 99 spaced apart from the SoC 150, with its various sensors coupled to the SoC by the conductive traces 97. Some of the sensors of the sensor unit 110 may form a MEMS sensing unit 105, which may include any sensor that can be implemented in MEMS, such as the accelerometer 111 and the gyroscope 114.
The sensor unit 110 may be formed of discrete components and/or integrated components and/or a combination of discrete and integrated components, and may be formed as a package. It should be understood that the sensors shown as part of the sensor unit 110 are each optional, and that some of the sensors shown may be used, and some of the sensors shown may be omitted.
It should be understood that the configurable sensor unit 110 or the MEMS sensing unit 105 is not part of the SoC 150, but is a separate and distinct component from the SoC 150. In practice, the sensor unit 110 or MEMS sensing unit 105 and the SoC 150 may be separate, distinct, mutually exclusive structures or packages mounted on the PCB 99 at different locations and coupled together via the conductive traces 97 as shown. In other applications, the sensor unit 110 or the MEMS sensing unit 105 and the SoC 150 may be contained in a single package, or may have any other suitable relationship to each other. Further, in some applications, the sensor unit 110 or the MEMS sensing unit 105 together with the processing node 120 may be collectively considered the sensor chip 95.
Each sensor of the sensor unit 110 collects signals, performs signal conditioning, and presents a digitized output at its own sampling rate. A single one of these sensors may be used, or multiple ones of these sensors may be used. The multi-channel digital sensor data from the sensors of the sensor unit 110 is passed to the processing node 120, which performs various signal processing tasks. First, the pre-processing steps of filtering and down-sampling the multi-channel sensor data are completed (block 121); then, when sensor data from multiple sensors is used, time synchronization between the different data channels is performed (block 122). The sensor data obtained from the single sensor or multiple sensors is then buffered into frames using overlapping/sliding time domain windows (block 123). Sensor-specific features are extracted from each data frame and given as input to a probabilistic classifier routine (block 124).
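The buffering step of block 123 can be sketched as follows. This is a minimal illustration, assuming the 50 Hz sampling rate and the 5-second window with 2-second shift used in the example of Fig. 6; the function and variable names are illustrative, not from the patent.

```python
import numpy as np

def frame_signal(x, fs=50, win_s=5.0, shift_s=2.0):
    """Buffer a 1-D sensor stream into overlapping frames.

    Frame and shift lengths follow the example in this document
    (5 s windows shifted by 2 s at a 50 Hz sampling rate).
    """
    win = int(win_s * fs)      # samples per frame, e.g. 250
    shift = int(shift_s * fs)  # hop size between frames, e.g. 100
    n_frames = 1 + max(0, (len(x) - win) // shift)
    return np.stack([x[i * shift : i * shift + win] for i in range(n_frames)])

# 150 s of 50 Hz data, as in the "walking" example of Fig. 6
frames = frame_signal(np.zeros(50 * 150))
# each row of `frames` is one 5 s analysis frame
```

Each row is then handed to the feature-extraction step (block 124).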
In the probabilistic classifier routine, a Motion Activity Vector (MAV), a Voice Activity Vector (VAV), and a Spatial Environment Vector (SEV) are generated from these sensor-specific features. These vectors are then processed to form a posterior probability from each vector (block 125). Pattern libraries for the probabilistic classifiers, stored in the memory block 140 or in the cloud 170 accessed over the internet, are used to obtain the three posterior probabilities from the vectors. Using these pattern libraries, a basic level context awareness posterior probability is obtained for each data frame, which can be used to make inferences about the basic level or meta level context of the electronic device 100 (block 126). The display 130 may be used to present such inferences and intermediate results, as desired.
The motion activity posterior probability is thus generated from the motion activity vector and represents the time-varying probability of each element of that vector. Likewise, the voice activity posterior probability is generated from the voice activity vector, and the spatial environment posterior probability is generated from the spatial environment vector, each representing the time-varying probability of the elements of the respective vector. At any given time, the probabilities making up the motion activity posterior probability sum to one (i.e., 100%); similarly, the probabilities making up the voice activity posterior probability sum to one at any given time, as do those making up the spatial environment posterior probability.
The basic level context has a plurality of aspects, each aspect based on a motion activity vector, a speech activity vector, and a spatial environment vector. Each aspect of the basic level context based on the motion activity vector is mutually exclusive to each other, each aspect of the basic level context based on the speech activity vector is mutually exclusive to each other, and each aspect of the basic level context based on the spatial environment vector is mutually exclusive to each other.
One of these aspects of the basic level scenario is the movement pattern of the user carrying the electronic device. Further, one of these aspects of the basic level context is the nature of the biologically generated sound within audible distance of the user. Furthermore, one of these aspects of the basic level scenario is the nature of the physical space around the user.
Examples of the various classes of motion patterns, properties of biologically generated sounds, and properties of physical spaces will now be given, although it is understood that the present disclosure contemplates and is intended to encompass any such classes.
Different categories of motion patterns may include user standing still, walking, going up stairs, going down stairs, jogging, cycling, climbing, using a wheelchair, and riding a vehicle. The different classes of determined properties of the biologically generated sound may include that the user is engaged in a telephone conversation, that the user is engaged in a multi-party conversation, that the user is speaking, that another party is speaking, that a background conversation occurs around the user, and that an animal utters a sound. The different categories of properties of the physical space around the user may include an office environment, a home environment, a mall environment, a street environment, a stadium environment, a restaurant environment, a bar environment, a beach environment, a natural environment, a temperature of the physical space, an air pressure of the physical space, and a humidity of the physical space.
Each vector has a "none of these" class, which represents the classes not explicitly incorporated as elements of that vector. This allows the probabilities of the elements of the vector to sum to one, i.e., to remain mathematically consistent. It also makes the vector representation flexible: new classes can be explicitly incorporated into the corresponding vector as needed, which simply changes the composition of the "none of these" class for that vector.
A meta-level context represents an inference made from a combination of the probabilities of two or more classes of the posterior probabilities. For example, a meta-level context may be that the user of the electronic device 100 is walking in a mall, or is engaged in a telephone conversation in an office.
The processing node 120 may communicate the determined base-level context and the meta-level context to the SoC150, which may perform at least one contextual function of the electronic device 100 according to the base-level context or the meta-level context of the electronic device.
Fig. 3 shows the derivation of basic level context awareness from time-dependent information about the activity/environment classes in each of the three vectors. Meta-level context awareness is derived from the time-stamped information available from one or more of these basic level vectors together with information stored in the mobile device memory 140 or the cloud 170 (e.g., pattern libraries and databases). A desirable form of representing this information, useful in application development related to basic-level and meta-level context awareness, is introduced below.
The information is represented in the form of time-varying probabilities of the classes of the vectors (motion activity, voice activity, and spatial environment), given the observations from one or more sensors. This general information representation can be used to solve several application problems, such as detecting possible events from each vector in a time frame. The probabilities are estimated as the posterior probabilities of each element of the MAV, VAV, and SEV at a given time, conditioned on "observations", which are features derived from the sensor data records. The respective vectors of probability values are the corresponding "posterior probabilities": the motion activity posterior probability (MAP), the voice activity posterior probability (VAP), and the spatial environment posterior probability (SEP), which are the processed outputs of the basic level context awareness information.
Fig. 4 shows that the MAP comprises the probabilities of the elements of the MAV as a function of time, estimated from features derived from time-window observation data. The probability of each motion activity class is estimated from time-window data obtained from one or more of the various sensors. Some of the models that may be used are: i) Hidden Markov Models (HMMs); ii) Gaussian Mixture Models (GMMs); iii) Artificial Neural Networks (ANNs) that produce a probabilistic output for each class; and iv) multi-class probabilistic Support Vector Machines (SVMs) incorporating Directed Acyclic Graphs (DAGs) or Max-Wins Voting (MWV). For each motion activity class, the model parameters are trained using supervised learning from a training database that includes annotated data from all sensors to be used.
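As a minimal sketch of this probabilistic-classifier idea, the following trains one class-conditional density model per motion class on toy data and converts likelihoods into posterior probabilities with Bayes' rule. For brevity it uses a single Gaussian per class, whereas the options named above are HMMs, GMMs, ANNs, and probabilistic SVMs; the class names and data are invented, not from the patent's training database.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-D feature data for two hypothetical motion classes; a real
# system would train on the annotated multi-sensor database described
# in the text, typically with richer models than one Gaussian per class.
train = {
    "still":   rng.normal(0.0, 0.3, size=(200, 2)),
    "walking": rng.normal(3.0, 0.5, size=(200, 2)),
}
params = {c: (X.mean(axis=0), np.cov(X.T)) for c, X in train.items()}
prior = {c: 0.5 for c in train}  # equal class priors assumed

def log_gauss(z, mean, cov):
    """Log density of a multivariate Gaussian evaluated at z."""
    d = z - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d @ np.linalg.solve(cov, d) + logdet + len(z) * np.log(2 * np.pi))

def posterior(z):
    """P(class | z) via Bayes' rule over the class-conditional models."""
    logp = {c: log_gauss(z, *params[c]) + np.log(prior[c]) for c in params}
    mx = max(logp.values())                      # subtract max for stability
    w = {c: np.exp(v - mx) for c, v in logp.items()}
    s = sum(w.values())
    return {c: v / s for c, v in w.items()}     # normalized, sums to one

p = posterior(np.array([2.9, 3.1]))  # clearly "walking"-like features
```

Evaluating `posterior` on each analysis frame yields the time evolution of the MAP as plotted in Fig. 6.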
The number of sensors used to obtain the MAP depends on a number of factors, such as the number of sensors available on the electronic device 100, energy consumption constraints for the task, the desired accuracy of the estimation, and so forth. When more than one sensor is used, different methods may be used to estimate the MAP. One particularly useful method for fusing data from up to K different sensors to estimate the MAP is shown in FIG. 4. In this method, sensor-specific features are extracted from the time-window data of each sensor, and these features are used together to obtain the MAP.
Fig. 5 shows that the VAP and SEP comprise the probabilities of the elements of the VAV and SEV, respectively, as functions of time, estimated from features derived from time-window observations received from the microphone 112, which may be the beamformed output of a microphone array. As with the MAP, the probabilities are obtained from a model (e.g., an HMM, GMM, ANN, or multi-class probabilistic SVM incorporating a DAG or MWV that produces a probabilistic output for each class). For each class, the model parameters are trained using supervised learning from a training database that includes annotated data from all sensors to be used.
The MAP based on tri-axial accelerometer data for a "walking" motion activity of 150 seconds duration is shown in Fig. 6. The tri-axial accelerometer data is sampled at 50 Hz, and five-second time-window data frames are extracted. Successive frames are obtained by shifting the time window by two seconds. The magnitude of the three-channel data is used to extract 17-dimensional features per frame. These features comprise the maximum value, minimum value, mean, root mean square, three cumulative features, and 10th-order linear prediction coefficients. The probability of each activity is estimated with a multi-class probabilistic SVM framework incorporating a DAG. For the motion activities in the MAV, the multi-class probabilistic SVM-DAG model behind the MAP graph in Fig. 6 is trained on the tri-axial accelerometer data using supervised learning from a training database that includes time-synchronized multi-sensor data from the tri-axial accelerometer 111, barometer 113, tri-axial gyroscope 114, microphone 112, and tri-axial magnetometer 118.
The temporal evolution of the posterior probability information, as shown for the MAP in Fig. 6, is a general representation of context awareness information at the basic level. It provides the probability of each class in an activity/context vector at a given time and shows its evolution over time. The following salient features of this representation format are relevant:
at any given time, the sum of the probabilities for all classes equals one; and is
At any given time, the activity/context classification can be performed from the corresponding posterior probabilities by selecting the most probable class, thus providing a hard decision.
A "confidence" in the classification result may be obtained from different measures, such as the difference between the maximum probability value and the second-highest probability value. The greater the difference between these two probability values, the greater the confidence in the accuracy of the decoded class.
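This confidence measure can be sketched in a few lines; the posterior is assumed to be given as a plain probability vector, and the names are illustrative.

```python
import numpy as np

def decode_with_confidence(posterior):
    """Hard decision plus a confidence measure: the gap between the
    largest and second-largest class probabilities (one of the
    possible measures mentioned in the text)."""
    p = np.asarray(posterior)
    order = np.argsort(p)[::-1]          # class indices, most probable first
    return order[0], p[order[0]] - p[order[1]]

cls, conf = decode_with_confidence([0.05, 0.70, 0.15, 0.10])
# cls is the decoded class index; conf is the max-minus-second-max gap
```

A small `conf`, as in the "stair up" example of Fig. 7, flags time instants where the hard decision is unreliable.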
It is observed from Fig. 6 that the probability of walking is highest compared to the probabilities of all other motion activities, which results in correct classification at almost all times in the graph. The classification result is erroneous only in two small time intervals, where the correct activity is misclassified as "stair up".
Another time evolution of the MAP, based on tri-axial accelerometer data for a 30-second "stair up" motion activity, is shown in Fig. 7. It can be seen that the maximum-probability class at each time instant varies between "stair up", "walking", and some other motion activities. The decoded motion activity will therefore be erroneous at those times where the "stair up" class does not have the maximum probability. Also, the maximum probability at each time instant is lower than for the "walking" activity shown in the MAP of Fig. 6, and closer to the next-highest probability. From this it can be deduced that the "confidence" in the accuracy of the decoded class is lower than in the "walking" case of Fig. 6.
FIG. 8 shows two methods of fusing data from multiple sensors. The first method involves concatenating the features obtained from each sensor to form a composite feature vector, which is then given as input to the probabilistic classifier. The second method is based on Bayesian theory. Suppose the observation is Z^K = {Z_1, ..., Z_K}, where Z_i is the feature vector for sensor number i. The feature vectors are taken to be conditionally independent given the class: given a particular class, the information collected from sensor S_i in its feature vector Z_i is independent of the information collected from sensor S_j in its feature vector Z_j. That is, P(Z_i, Z_j | class_l) = P(Z_i | class_l) · P(Z_j | class_l), which gives the joint probability of the feature vectors from multiple sensors given the class. Bayes' theorem is then used to fuse the data from the multiple sensors to obtain the posterior probability.
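The conditional-independence fusion rule can be sketched numerically as follows; the per-sensor likelihood values are made up for illustration and do not come from the patent.

```python
import numpy as np

# Per-sensor class-conditional likelihoods P(Z_i | class_l) for L = 3
# hypothetical classes and K = 2 sensors (say, accelerometer and
# barometer). The numbers are invented for this sketch.
lik_accel = np.array([0.6, 0.3, 0.1])   # P(Z_1 | class_l)
lik_baro  = np.array([0.2, 0.7, 0.1])   # P(Z_2 | class_l)
prior     = np.array([1/3, 1/3, 1/3])   # P(class_l), uniform here

# Conditional independence given the class:
#   P(Z_1, Z_2 | class_l) = P(Z_1 | class_l) * P(Z_2 | class_l),
# so Bayes' theorem reduces the fused posterior to a normalized product.
unnorm = lik_accel * lik_baro * prior
fused_posterior = unnorm / unnorm.sum()  # P(class_l | Z_1, Z_2)
```

Here the barometer evidence overturns the accelerometer's preferred class, mirroring how barometer fusion corrects the accelerometer-only decisions in Figs. 9 and 10.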
FIG. 2 depicts a flow diagram of a method for determining probabilistic context awareness for a mobile device user using single-sensor and multi-sensor data fusion. Let S_i denote the i-th sensor, where i = 1, 2, ..., K and K is the total number of sensors used (block 202). Sensor S_i provides input data s_i(m), where i is the sensor number from 1 to K and m is the discrete time index. The preprocessed, time-aligned data s_i(m) is segmented into a plurality of fixed-duration frames x_i(n) (block 204).
Thereafter, sensor-specific features are extracted and grouped into a plurality of vectors (block 206). Let z_f^i be feature f extracted from the data x_i(n) of the i-th sensor. The feature vector of the i-th sensor is given by Z_i = [z_1^i, z_2^i, ..., z_{F_i}^i]', where F_i is the number of features for that sensor. The composite feature vector for the K sensors is denoted Z^K = [Z_1', Z_2', ..., Z_K']'. For basic level context detection, the following features are extracted.
i. MAV:
a. Accelerometer: maximum, minimum, mean, root mean square, 3 cumulative features, and 10th-order linear prediction coefficients.
The three cumulative features are as follows:
1. Average minimum: defined as the average of x_i(n) over its lowest 15% of values.
2. Average median: defined as the average of x_i(n) between 30% and 40%.
3. Average maximum: defined as the average of x_i(n) between 95% and 100%.
b. Pressure sensor: maximum, minimum, mean, slope, and 6th-order linear prediction coefficients.
c. Gyroscope: maximum, minimum, mean, root mean square, 3 cumulative features, and 10th-order linear prediction coefficients.
d. Microphone: concatenated 10th-order linear prediction coefficients, zero-crossing rate, and short-time energy.
ii. VAV and SEV:
a. a microphone: 13 mel-frequency cepstral coefficients (MFCCs), 13 differential MFCCs, and 13 double differential MFCCs.
b. Microphone array: 13 MFCCs, 13 differential MFCCs, and 13 double differential MFCCs.
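As one illustration of the cumulative features listed for the accelerometer and gyroscope, the three values can be computed as trimmed means over percentile bands of the sorted frame samples (the sorting step is an assumption consistent with the "average minimum/median/maximum" naming):

```python
import numpy as np

def cumulative_features(x):
    """The three cumulative features of a frame x_i(n): means of the sorted
    samples over the 0-15%, 30-40%, and 95-100% ranges."""
    xs = np.sort(np.asarray(x, dtype=float))
    n = len(xs)
    def band(lo, hi):                     # mean over a fractional band of the sorted frame
        a = int(np.floor(lo * n))
        b = max(int(np.ceil(hi * n)), a + 1)
        return xs[a:b].mean()
    return band(0.0, 0.15), band(0.30, 0.40), band(0.95, 1.00)

avg_min, avg_med, avg_max = cumulative_features(np.arange(100))
```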
The feature vectors are given as inputs to a probabilistic classifier, such as a multi-class probabilistic SVM-DAG (block 208). The outputs obtained are the corresponding a posteriori probabilities, viz. MAP, VAP, and SEP, of the corresponding basic-level context-awareness vectors MAV, VAV, and SEV (block 212). Each posterior probability is of the form [P(class_1 | Z^K), P(class_2 | Z^K), …, P(class_L | Z^K)]', where L is the number of classes in the MAV/VAV/SEV.
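As a stand-in for block 208, scikit-learn's `SVC` with `probability=True` (Platt scaling with pairwise coupling, rather than the DAG arrangement named in the text) likewise maps a composite feature vector to a per-class posterior vector that sums to one; the synthetic 18-dimensional data below is purely illustrative:

```python
import numpy as np
from sklearn.svm import SVC

# Three well-separated synthetic "activity" clusters in an 18-dimensional
# feature space, standing in for the composite feature vectors Z^K.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.3, size=(40, 18)) for c in (0.0, 1.0, 2.0)])
y = np.repeat([0, 1, 2], 40)             # three activity classes

clf = SVC(kernel="rbf", probability=True, random_state=0).fit(X, y)
posterior = clf.predict_proba(X[:1])[0]   # [P(class_1|Z^K), ..., P(class_L|Z^K)]
```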
Figs. 9 and 10 show MAPs using data from two sensors, a tri-axial accelerometer and a barometer. The 17 features from the tri-axial accelerometer listed above are used together with one barometer feature (the time slope of the pressure over a 5-second frame, estimated using the least-squares method) in the multi-class probabilistic SVM-DAG model with an 18-dimensional input to obtain the probability of each activity class. Comparing fig. 6 with fig. 9, it can be seen that when the accelerometer data is fused with the barometer data, the false-decision intervals that occur when only accelerometer data is used are corrected. The additional input from the pressure sensor correctly disambiguates the "stair climbing" activity from "walking" and other activities.
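The single barometer feature, the least-squares time slope of pressure over a 5-second frame, might be computed as follows (the sample rate and pressure values are illustrative):

```python
import numpy as np

def pressure_slope(p, fs):
    """Least-squares time slope of a pressure frame, the one barometer feature
    fused with the 17 accelerometer features. fs is the sample rate in Hz."""
    t = np.arange(len(p)) / fs                       # time axis in seconds
    slope, _intercept = np.polyfit(t, np.asarray(p, dtype=float), 1)
    return slope

# A 5-second frame at 10 Hz with pressure falling 0.2 hPa/s,
# roughly what ascending stairs might produce
fs = 10
p = 1000.0 - 0.2 * (np.arange(5 * fs) / fs)
slope = pressure_slope(p, fs)
```

A clearly negative slope separates "stair climbing" from level-ground activities such as "walking", which is the disambiguation described above.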
The performance of the 9-class motion activity classifier using the probabilistic MAP outputs is shown in FIG. 11 in the form of a confusion matrix. The classification is based on the fusion of the 18 features obtained from accelerometer and barometer data collected from a smartphone. The MAP is obtained using a multi-class probabilistic SVM-DAG model previously trained on user data. Performance results were obtained using leave-one-out cross-validation on data from 10 subjects. The rows of the confusion matrix give the true motion activity class and the columns give the decoded activity class. Thus, the diagonal values represent the percentage of correct decisions for the corresponding class, while the off-diagonal values represent incorrect decisions. The overall percentage of correct decisions across the 9 activity classes was 95.16%.
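A confusion matrix of the kind shown in fig. 11 can be built from true and decoded class labels; the diagonal then holds the per-class correct-decision percentages (the tiny label lists below are illustrative):

```python
import numpy as np

def confusion_matrix_pct(true_labels, decoded_labels, n_classes):
    """Row = true activity class, column = decoded class, entries in percent,
    so diagonal entries are per-class correct-decision percentages."""
    cm = np.zeros((n_classes, n_classes))
    for t, d in zip(true_labels, decoded_labels):
        cm[t, d] += 1
    return 100.0 * cm / cm.sum(axis=1, keepdims=True)

cm = confusion_matrix_pct([0, 0, 0, 0, 1, 1], [0, 0, 0, 1, 1, 1], n_classes=2)
```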
The single-sensor data and/or multi-sensor fused data are used to derive probabilistic outputs of basic-level context-awareness information. This general algorithmic framework for basic-level context awareness is extensible, such that it may also include additional motion and voice activity classes and spatial environment contexts in the probabilistic output format as needed. The corresponding a posteriori probability outputs may be integrated over time to provide a more accurate, though delayed, decision regarding the activity or environment class. The framework also allows integrating additional a posteriori probabilities for other detection tasks derived from the same sensors or from additional sensors.
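The time integration described above can be sketched by accumulating log-probabilities across frames and renormalizing, trading latency for a more stable decision (the frame values below are illustrative):

```python
import numpy as np

def integrate_posteriors(frame_posteriors):
    """Combine per-frame posterior vectors over time by summing log-probabilities
    (equivalent to multiplying the frames' posteriors and renormalizing),
    yielding a delayed but more stable class decision."""
    logp = np.log(np.asarray(frame_posteriors, dtype=float) + 1e-12).sum(axis=0)
    p = np.exp(logp - logp.max())        # subtract max for numerical stability
    return p / p.sum()

# Three noisy frames whose per-frame maxima flip between classes 0 and 1
frames = [[0.5, 0.4, 0.1], [0.35, 0.45, 0.2], [0.6, 0.3, 0.1]]
integrated = integrate_posteriors(frames)
```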
The posterior probability outputs for the motion activity, voice activity, and spatial environment classes can be used to perform meta-level probabilistic analysis and to develop embedded context-aware applications, as shown in fig. 12. For example, an inference of the "walking" activity class from the MAP and of the "mall" class from the SEP may together yield the meta-level inference that the user is walking in a mall. The probabilistic information in the three a posteriori probability vectors can be used as input to a meta-level context-aware classifier, on which more advanced applications can be built.
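A minimal sketch of such a meta-level inference, combining the maximum-probability classes from the MAP and SEP vectors (the class label lists and output phrasing are illustrative assumptions, not taken from the patent):

```python
import numpy as np

MOTION_CLASSES = ["stationary", "walking", "jogging"]    # illustrative MAV labels
ENVIRONMENT_CLASSES = ["office", "mall", "street"]       # illustrative SEV labels

def meta_level_inference(map_probs, sep_probs):
    """Combine the most probable motion class (from the MAP) and spatial
    environment class (from the SEP) into a meta-level inference."""
    motion = MOTION_CLASSES[int(np.argmax(map_probs))]
    env = ENVIRONMENT_CLASSES[int(np.argmax(sep_probs))]
    return f"user is {motion} in a {env}"

inference = meta_level_inference([0.1, 0.8, 0.1], [0.2, 0.7, 0.1])
```

A richer meta-level classifier would take the full probability vectors as input rather than only their argmax classes, as the text suggests.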
Fig. 13 shows snapshots of an application developed in Java for an Android OS based smartphone. As shown in the snapshot on the left, the user interface includes start, stop, and pause buttons for computing the posterior probabilities in real time, logging their time evolution, and displaying them graphically in real time for up to the 40 most recent frames. The snapshot on the right shows the MAPs of the 9 motion activity classes as a function of time. It also displays the decoded class of the current frame, taken from the maximum probability value, and the total time the user has spent in each motion activity class since the application was started. The application determines the motion activity posterior probabilities using a fusion of accelerometer, barometer, and gyroscope data; the number of features varies with the number of sensors used. The posterior probabilities are evaluated using one of three methods: i) multi-class probabilistic SVMs combined with a DAG, ii) multi-class probabilistic SVMs combined with MWVs, and iii) multi-class SVMs that produce hard-decision outputs. The real-time graphical display of the probability values of all classes also gives a quick visual indication of the "confidence" in the most probable class, by comparison with the second-highest probability class.
Although the foregoing description has been described herein with reference to particular means, materials and embodiments, it is not intended to be limited to the particulars disclosed herein; but rather extends to all functionally equivalent structures, methods and uses, such as are within the scope of the appended claims.

Claims (20)

1. An electronic device, comprising:
a sensing unit comprising at least one sensor and configured to acquire sensing data;
a computing device configured to:
extracting a plurality of sensor-specific features from the sensed data;
generating a motion activity vector, a voice activity vector, and a spatial environment vector from the plurality of sensor-specific features;
processing the motion activity vector, the voice activity vector, and the spatial environment vector to determine a basic level context of the electronic device relative to its surroundings, the basic level context having a plurality of aspects, each aspect being based on the motion activity vector, the voice activity vector, and the spatial environment vector, and
determining a meta-level context of the electronic device relative to its surroundings from the basic-level context, wherein the meta-level context comprises at least one inference made from at least two of the plurality of aspects of the basic-level context.
2. The electronic device of claim 1, wherein each aspect of the basic level context based on the motion activity vector is mutually exclusive from each other; wherein each aspect of the base level context based on the voice activity vector is mutually exclusive from each other; and wherein each aspect of the base level context based on the spatial environment vector is mutually exclusive from each other.
3. The electronic device of claim 1, wherein the aspects of the basic level context consist of one aspect based on the motion activity vector, one aspect based on the voice activity vector, and one aspect based on the spatial environment vector.
4. The electronic device of claim 1, wherein the computing device is further configured to cause performance of at least one contextual function of the electronic device in accordance with the meta-level context of the electronic device.
5. The electronic device of claim 1, wherein, according to the motion activity vector, one of the aspects of the basic level context is determined to be a motion pattern of a user carrying the electronic device; wherein, according to the voice activity vector, one of the aspects of the basic level context is determined to be a property of a biologically generated sound within an audible distance of the user; and wherein one of the aspects of the base level context is determined to be a property of a physical space surrounding the user according to the spatial environment vector.
6. The electronic device of claim 5, wherein the determined movement pattern of the user comprises one of: the user is stationary, walking, ascending stairs, descending stairs, jogging, cycling, climbing, using a wheelchair, and riding a vehicle; wherein the determined property of the biologically generated sound comprises one of: said user is engaged in a telephone conversation, said user is engaged in a multi-party conversation, said user is speaking, another party is speaking, a background conversation occurs around said user, and an animal makes a sound; and wherein the determined property of the physical space around the user comprises an office environment, a home environment, a mall environment, a street environment, a stadium environment, a restaurant environment, a bar environment, a beach environment, a physical environment, a temperature of the physical space, an air pressure of the physical space, and a humidity of the physical space.
7. The electronic device of claim 1, wherein the computing device is configured to process the motion activity vector, the voice activity vector, and the spatial environment vector by:
generating a motion activity posterior probability from the motion activity vector, wherein the motion activity posterior probability represents the probability of each element of the motion activity vector varying over time;
generating a voice activity posterior probability from the voice activity vector, wherein the voice activity posterior probability represents the probability of each element of the voice activity vector varying over time; and
generating a spatial environment posterior probability from the spatial environment vector, wherein the spatial environment posterior probability represents the probability of each element of the spatial environment vector varying over time.
8. The electronic device of claim 7, wherein the sum of the probabilities of the motion activity posterior probability at any given time is equal to one; wherein the sum of the probabilities of the voice activity posterior probability at any given time is equal to one; and wherein the sum of the probabilities of the spatial environment posterior probability at any given time is equal to one.
9. The electronic device of claim 7, wherein the sensing unit consists essentially of one sensor.
10. The electronic device of claim 7, wherein the sensing unit comprises a plurality of sensors; and wherein the motion activity vector, the voice activity vector, and the spatial environment vector are generated from a fusion of the plurality of sensor-specific features.
11. The electronic device of claim 10, wherein the plurality of sensors includes at least two of an accelerometer, a pressure sensor, a microphone, a gyroscope, a magnetometer, a GPS unit, and a barometer.
12. The electronic device of claim 1, further comprising a Printed Circuit Board (PCB) having at least one conductive trace thereon; further comprising a system on a chip (SoC) mounted on the PCB and electrically coupled to the at least one conductive trace; and wherein the computing device comprises a sensor chip mounted on the PCB in spaced relation to the SoC and electrically coupled to the at least one conductive trace such that the sensor chip and the SoC are electrically coupled; and wherein the sensor chip comprises a micro-electro-mechanical system (MEMS) sensing unit and control circuitry configured to perform the extracting, the generating, the processing, and the determining.
13. An electronic device, comprising:
a Printed Circuit Board (PCB) having at least one conductive trace thereon;
a system on a chip (SoC) mounted on the PCB and electrically coupled to the at least one conductive trace;
a sensor chip mounted on the PCB in spaced relation to the SoC and electrically coupled to the at least one conductive trace such that the sensor chip and the SoC are electrically coupled and configured to acquire sensed data;
wherein the sensor chip comprises:
a micro-electro-mechanical system (MEMS) sensing unit;
an embedded processing node configured to:
preprocessing the sensed data,
extracting a plurality of sensor-specific features from the sensed data,
generating a motion activity posterior probability, a voice activity posterior probability, and a spatial environment posterior probability based on the plurality of sensor-specific features,
processing the motion activity posterior probability, the voice activity posterior probability, and the spatial environment posterior probability to determine a basic level context of the electronic device relative to its surroundings, the basic level context having a plurality of aspects, each aspect based on the motion activity posterior probability, the voice activity posterior probability, and the spatial environment posterior probability, and
determining a meta-level context of the electronic device relative to its surroundings from the basic-level context and a schema library stored in a cloud or local memory, wherein the meta-level context comprises at least one inference made from at least two of the plurality of aspects of the basic-level context.
14. The electronic device of claim 13, further comprising at least one additional sensor; wherein the SoC is configured to acquire additional data from the at least one additional sensor; wherein the embedded processing node is further configured to receive the additional data from the SoC and also extract the plurality of sensor-specific features from the additional data.
15. The electronic device of claim 13, wherein the embedded processing node is configured to generate the motion activity a posteriori probability, the voice activity a posteriori probability, and the spatial environment a posteriori probability to represent the probability of each element of a motion activity vector, a voice activity vector, and a spatial environment vector, respectively, varying over time.
16. The electronic device of claim 13, wherein the sum of the probabilities of the motion activity posterior probability at any given time is equal to one; wherein the sum of the probabilities of the voice activity posterior probability at any given time is equal to one; and wherein the sum of the probabilities of the spatial environment posterior probability at any given time is equal to one.
17. The electronic device of claim 13, wherein the sensor chip consists essentially of one MEMS sensing unit.
18. The electronic device of claim 13, wherein the sensor chip comprises a plurality of MEMS sensing units; and wherein the motion activity posterior probability, the voice activity posterior probability, and the spatial environment posterior probability are generated from a fusion of the plurality of sensor-specific features.
19. A method, comprising:
collecting sensing data from a sensing unit;
extracting, using a computing device, a plurality of sensor-specific features from the sensed data;
generating, using the computing device, a motion activity vector, a voice activity vector, and a spatial environment vector from the plurality of sensor-specific features;
processing, using the computing device, the motion activity vector, the voice activity vector, and the spatial environment vector to determine a base-level context of the electronic device relative to its surroundings, the base-level context having a plurality of aspects, each aspect based on the motion activity vector, the voice activity vector, and the spatial environment vector; and is
Determining, using the computing device, a meta-level context of the electronic device relative to its surroundings from the basic-level context, wherein the meta-level context comprises at least one inference made from at least two of the plurality of aspects of the basic-level context.
20. The method of claim 19, wherein the motion activity vector, the speech activity vector, and the spatial environment vector are processed by:
generating a motion activity posterior probability from the motion activity vector, wherein the motion activity posterior probability represents the probability of each element of the motion activity vector varying over time;
generating a voice activity posterior probability from the voice activity vector, wherein the voice activity posterior probability represents the probability of each element of the voice activity vector varying over time; and
generating a spatial environment posterior probability from the spatial environment vector, wherein the spatial environment posterior probability represents the probability of each element of the spatial environment vector varying over time.

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US15/074,188 US9870535B2 (en) 2015-02-26 2016-03-18 Method and apparatus for determining probabilistic context awareness of a mobile device user using a single sensor and/or multi-sensor data fusion
US15/074,188 2016-03-18

Publications (2)

Publication Number Publication Date
CN107203259A CN107203259A (en) 2017-09-26
CN107203259B true CN107203259B (en) 2020-04-24

Family

ID=59904565

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610466435.4A Active CN107203259B (en) 2016-03-18 2016-06-23 Method and apparatus for determining probabilistic content awareness for mobile device users using single and/or multi-sensor data fusion

Country Status (1)

Country Link
CN (1) CN107203259B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109375777A (en) * 2018-10-30 2019-02-22 张家口浩扬科技有限公司 A kind of based reminding method and system

Families Citing this family (1)

Publication number Priority date Publication date Assignee Title
CN107843287B (en) * 2017-10-26 2019-08-13 苏州数言信息技术有限公司 Integrated sensor device

Citations (2)

Publication number Priority date Publication date Assignee Title
CN101984454A (en) * 2010-11-19 2011-03-09 杭州电子科技大学 Multi-source multi-characteristic information fusion method based on data drive
CN102147468A (en) * 2011-01-07 2011-08-10 西安电子科技大学 Bayesian theory-based multi-sensor detecting and tracking combined processing method

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
US9173574B2 (en) * 2009-04-22 2015-11-03 Rodrigo E. Teixeira Mechanical health monitor apparatus and method of operation therefor
US9916538B2 (en) * 2012-09-15 2018-03-13 Z Advanced Computing, Inc. Method and system for feature detection
US20140275886A1 (en) * 2013-03-14 2014-09-18 Streamline Automation, Llc Sensor fusion and probabilistic parameter estimation method and apparatus


Cited By (2)

Publication number Priority date Publication date Assignee Title
CN109375777A (en) * 2018-10-30 2019-02-22 张家口浩扬科技有限公司 A kind of based reminding method and system
CN109375777B (en) * 2018-10-30 2021-12-14 青岛民航凯亚系统集成有限公司 Reminding method and system

Also Published As

Publication number Publication date
CN107203259A (en) 2017-09-26


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant