CN107203259B - Method and apparatus for determining probabilistic context awareness for mobile device users using single and/or multi-sensor data fusion


Info

Publication number
CN107203259B
Authority
CN
China
Prior art keywords
vector
electronic device
level context
sensor
posterior probability
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610466435.4A
Other languages
Chinese (zh)
Other versions
CN107203259A (en)
Inventor
M·乔达里
A·库马
G·辛
K·R·J·米尔
I·N·卡
R·巴尔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
STMICROELECTRONICS INTERNATIONAL NV
STMicroelectronics Inc USA
Original Assignee
STMICROELECTRONICS INTERNATIONAL NV
STMicroelectronics Inc USA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from U.S. Application No. 15/074,188 (granted as US9870535B2)
Application filed by STMICROELECTRONICS INTERNATIONAL NV and STMicroelectronics Inc USA
Publication of CN107203259A
Application granted
Publication of CN107203259B

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 - Digital computers in general; Data processing equipment in general
    • G06F15/76 - Architectures of general purpose stored program computers
    • G06F15/78 - Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7807 - System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package

Abstract

The present disclosure relates to methods and apparatus for determining probabilistic context awareness for a mobile device user using single-sensor and/or multi-sensor data fusion. An electronic device described herein includes a sensing unit having at least one sensor for acquiring sensed data. An associated computing device extracts a plurality of sensor-specific features from the sensed data and generates a motion activity vector, a voice activity vector, and a spatial environment vector from the sensor-specific features. The motion activity vector, the voice activity vector, and the spatial environment vector are processed to determine a basic level context of the electronic device relative to its surroundings, wherein the basic level context has a plurality of aspects, each aspect based on the motion activity vector, the voice activity vector, and the spatial environment vector. A meta-level context of the electronic device relative to its surroundings is determined from the basic-level context, wherein the meta-level context is at least one inference made from at least two of the plurality of aspects of the basic-level context.

Description

Method and apparatus for determining probabilistic context awareness for mobile device users using single and/or multi-sensor data fusion
RELATED APPLICATIONS
This application claims the benefit of and priority to U.S. Application No. 62/121,104, filed on February 26, 2015, and is also a continuation-in-part of U.S. Application No. 14/749,118, filed on June 24, 2015, the contents of both applications being incorporated by reference to the maximum extent allowable under law.
Technical Field
The present disclosure relates to the field of electronic devices, and more particularly to a framework for determining a context of a user of a mobile device based on the user's athletic activity, voice activity, and spatial environment using single-sensor data and/or multi-sensor data fusion.
Background
Mobile devices and wearable devices, such as smartphones, tablet computers, smartwatches, and activity trackers, increasingly carry one or more sensors, such as accelerometers, gyroscopes, magnetometers, barometers, microphones, and GPS receivers, that can be used, alone or in combination, to detect a user's context, such as the user's motion activity, the user's voice or speech activity, and the user's spatial environment. Previous research efforts on motion activity have considered the classification of users' basic motion activities, such as walking, jogging, and cycling. Speech detection uses microphone recordings to detect human utterances, as distinguished from silence in the presence of background noise, and is used in a variety of applications such as audio conferencing, variable-rate speech codecs, speech recognition, and echo cancellation. Research studies have also been conducted to detect the spatial environment of a mobile device user from audio recordings in order to determine the user's environment classification, such as in the office, on the street, at the stadium, at the beach, etc.
In most context detection tasks, data from one sensor is used. Accelerometers are typically used for motion activity detection, while microphones are used for voice activity detection and spatial environment detection.
These prior art detection methods provide a deterministic output in the form of a class detected from a set of specific classes of athletic activity or acoustic environment as described above. However, determining the context of the user using such prior art techniques may not be as accurate as desired and, moreover, does not allow for more complex determinations regarding the context of the user. Therefore, further developments in this area are needed.
Disclosure of Invention
This summary is provided to introduce a selection of concepts that are further described below in the detailed description. This summary is not intended to identify key or essential features of the claimed subject matter, nor is it intended to be used as an aid in limiting the scope of the claimed subject matter.
An electronic device described herein includes a sensing unit having at least one sensor for acquiring sensed data. An associated computing device extracts a plurality of sensor-specific features from the sensed data and generates a motion activity vector, a voice activity vector, and a spatial environment vector from the sensor-specific features. Processing the motion activity vector, the voice activity vector, and the spatial environment vector to determine a basic level context of the electronic device relative to its surroundings, wherein the basic level context has a plurality of aspects, each aspect based on the motion activity vector, the voice activity vector, and the spatial environment vector. Determining a meta-level context of the electronic device relative to its surroundings from the basic-level context, wherein the meta-level context is at least one inference made from at least two of the plurality of aspects of the basic-level context.
Another aspect relates to an electronic device that includes a Printed Circuit Board (PCB) having at least one conductive trace thereon and a system on a chip (SoC) mounted on the PCB and electrically coupled to the at least one conductive trace. A sensor chip is mounted on the PCB in spaced relation to the SoC and electrically coupled to the at least one conductive trace such that the sensor chip and the SoC are electrically coupled. The sensor chip is configured to collect sensed data.
The sensor chip may include a micro-electro-mechanical system (MEMS) sensing unit and an embedded processing node. The embedded processing node may be configured to pre-process the sensed data, extract sensor-specific features from the sensed data, and generate an athletic activity posterior probability, a voice activity posterior probability, and a spatial environment posterior probability from the sensor-specific features. The embedded processing node may further process the athletic activity posterior probability, the voice activity posterior probability, and the spatial environment posterior probability to determine a base level context of the electronic device relative to its surroundings, the base level context having a plurality of aspects, each aspect based on the athletic activity posterior probability, the voice activity posterior probability, and the spatial environment posterior probability. The processing node may also determine a meta-level context of the electronic device relative to its surroundings from the basic-level context, wherein the meta-level context is at least one inference made from at least two of the plurality of aspects of the basic-level context.
One method aspect includes acquiring sensing data from a sensing unit, extracting, using a computing device, a plurality of sensor-specific features from the sensing data, and generating, using the computing device, a motion activity vector, a speech activity vector, and a spatial environment vector from the sensor-specific features. The method continues with processing, using the computing device, the motion activity vector, the voice activity vector, and the spatial environment vector to determine a base level context of the electronic device relative to its surroundings, the base level context having a plurality of aspects, each aspect based on the motion activity vector, the voice activity vector, and the spatial environment vector. A meta-level context of the electronic device relative to its surroundings can be determined from the basic-level context, wherein the meta-level context is at least one inference made from at least two of the plurality of aspects of the basic-level context.
Drawings
FIG. 1 is a block diagram of an electronic device configured to determine a contextual awareness of a user of the electronic device in accordance with the present disclosure.
FIG. 2 is a flow chart of a method for obtaining an a posteriori estimate of the probability of a basic level representation of context awareness of a user of the electronic device of FIG. 1.
Fig. 3 shows a basic level representation of the context awareness (in terms of information about activity, speech, and environment classes grouped into three independent vectors) of a mobile device user as determined by the electronic device of Fig. 1, and a meta level context awareness inferred from this information.
Fig. 4 depicts the motion activity posterior probability generated from the motion activity vector of fig. 3.
FIG. 5 depicts a voice activity posterior probability generated from the voice activity vector of FIG. 3.
Fig. 6 is a graph of the time evolution of the motion activity posterior probability generated using accelerometer data for an activity classified as walking.
FIG. 7 is a graph of the time evolution of the motion activity posterior probability generated using accelerometer data for an activity classified as ascending stairs.
FIG. 8 illustrates two methods of data fusion from multiple sensors for determining probabilistic context awareness.
FIG. 9 is a graph of a time evolution of the posterior probability of athletic activity generated using fusion of accelerometer and pressure sensor data for activities classified as walking.
Fig. 10 is a graph of the time evolution of the posterior probability of athletic activity generated using the fusion of accelerometer and pressure sensor data for activities classified as ascending stairs.
Fig. 11 shows a confusion matrix obtained for the motion activity classes using the probabilistic motion activity posterior probability output generated using features obtained from accelerometer and barometer data.
FIG. 12 is a block diagram of a method of meta-level context-aware embedded application development using motion activity posterior probability, voice activity posterior probability, and spatial environment posterior probability.
Fig. 13 shows two screen shots of a smartphone application that calculates the posterior probability of athletic activity and displays its temporal evolution.
Detailed Description
In the following description, numerous details are set forth in order to provide an understanding of the present disclosure. However, it will be understood by those skilled in the art that the embodiments of the present disclosure may be practiced without these details and that numerous variations or modifications from the described embodiments may be possible.
As will be described in detail herein, the present disclosure relates to an algorithmic framework for determining mobile device user context in the form of motion activity, voice activity, and spatial environment using single sensor data and multi-sensor data fusion. In particular, the algorithmic framework provides probabilistic information about motion activity, voice activity, and spatial environment through heterogeneous sensor measurements, which may include data from accelerometers, barometers, gyroscopes, and microphones (but not limited to these sensors) embedded on mobile devices. The computing architecture allows for combining probabilistic outputs in a number of ways in order to infer meta-level context-aware information about a mobile device user.
Referring first to FIG. 1, an electronic device 100 is now described. The electronic device 100 may be a smartphone, tablet computer, smart watch, activity tracker, or other wearable device. The electronic device 100 includes a Printed Circuit Board (PCB) 99 on which various components are mounted. Conductive traces 97 printed on the PCB 99 are used to electrically couple these various components in a desired manner.
Mounted on the PCB 99 is a system-on-a-chip (SoC) 150 that includes a Central Processing Unit (CPU) 152 coupled to a Graphics Processing Unit (GPU) 154. A memory block 140, an optional transceiver 160, and a touch-sensitive display 130 are coupled to the SoC 150; via the transceiver, the SoC 150 may wirelessly communicate with a remote server over the internet, and via the display, the SoC 150 may display output and receive input. Coupled to the SoC 150 is a sensor unit 110 that includes a three-axis accelerometer 111 for determining accelerations experienced by the electronic device 100, a microphone 112 for detecting audible noise in the environment, a barometer 113 for determining atmospheric pressure in the environment (and thus indicating the altitude of the electronic device 100), a gyroscope 114 for determining the angular rate, and thus the orientation (roll, pitch, or yaw), of the electronic device 100 relative to the environment, a magnetometer 118 for determining the strength of magnetic fields in the environment and thereby the orientation of the electronic device 100, a proximity sensor 119 for determining the proximity of a user with respect to the electronic device 100, a light sensor for determining the ambient light level in the environment in which the electronic device 100 is located, a GPS receiver via which the SoC 150 may determine the geospatial location of the electronic device 100, and a WiFi transceiver via which the SoC 150 may communicate with a remote server over the internet.
The sensor unit 110 is configurable and is mounted on the PCB 99 spaced apart from the SoC 150, with its various sensors coupled to the SoC by the conductive traces 97. Some of the sensors of the sensor unit 110 may form a MEMS sensing unit 105, which may include any sensor that can be implemented in MEMS, such as the accelerometer 111 and the gyroscope 114.
The sensor unit 110 may be formed of discrete components and/or integrated components and/or a combination of discrete and integrated components, and may be formed as a package. It should be understood that the sensors shown as part of the sensor unit 110 are each optional, and that some of the sensors shown may be used, and some of the sensors shown may be omitted.
It should be understood that the configurable sensor unit 110 or the MEMS sensing unit 105 is not part of the SoC 150, but is a separate and distinct component from the SoC 150. In practice, the sensor unit 110 or MEMS sensing unit 105 and the SoC 150 may be separate, distinct, mutually exclusive structures or packages mounted on the PCB 99 at different locations and coupled together via the conductive traces 97 as shown. In other applications, the sensor unit 110 or the MEMS sensing unit 105 and the SoC 150 may be contained in a single package, or may have any other suitable relationship to each other. Further, in some applications, the sensor unit 110 or the MEMS sensing unit 105 together with the processing node 120 may be collectively considered the sensor chip 95.
Each sensor of the sensor unit 110 collects signals, performs signal conditioning, and presents a digitized output at its own sampling rate. A single one of these sensors may be used, or multiple ones of these sensors may be used. The multi-channel digital sensor data from the sensors of the sensor unit 110 is passed to the processing node 120, which performs various signal processing tasks. First, the pre-processing steps of filtering and down-sampling the multi-channel sensor data are completed (block 121); then, when sensor data from multiple sensors is used, time synchronization between the different data channels is performed (block 122). The sensor data obtained from the single sensor or multiple sensors is then buffered into frames using overlapping/sliding time domain windows (block 123). Sensor-specific features are extracted from each data frame and given as input to a probabilistic classifier routine (block 124).
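The buffering step of block 123 can be sketched as follows. This is a minimal illustration, assuming the 50 Hz sampling rate and the 5-second window with 2-second shift used in the example of Fig. 6; the function and variable names are illustrative, not from the patent.

```python
import numpy as np

def frame_signal(x, fs=50, win_s=5.0, shift_s=2.0):
    """Buffer a 1-D sensor stream into overlapping frames.

    Frame and shift lengths follow the example in this document
    (5 s windows shifted by 2 s at a 50 Hz sampling rate).
    """
    win = int(win_s * fs)      # samples per frame, e.g. 250
    shift = int(shift_s * fs)  # hop size between frames, e.g. 100
    n_frames = 1 + max(0, (len(x) - win) // shift)
    return np.stack([x[i * shift : i * shift + win] for i in range(n_frames)])

# 150 s of 50 Hz data, as in the "walking" example of Fig. 6
frames = frame_signal(np.zeros(50 * 150))
# each row of `frames` is one 5 s analysis frame
```

Each row is then handed to the feature-extraction step (block 124).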
In the probabilistic classifier routine, a Motion Activity Vector (MAV), a Voice Activity Vector (VAV), and a Spatial Environment Vector (SEV) are generated from these sensor-specific features. These vectors are then processed to form a posterior probability from each vector (block 125). Pattern libraries for the probabilistic classifiers, stored in the memory block 140 or in the cloud 170 accessed over the internet, are used to obtain the three posterior probabilities from the vectors. Using these pattern libraries, a basic level context awareness posterior probability is obtained for each data frame, which can be used to make inferences about the basic level or meta level context of the electronic device 100 (block 126). The display 130 may be used to present such inferences and intermediate results, as desired.
The motion activity posterior probability is thus generated from the motion activity vector and represents the time-varying probability of each element of that vector. Likewise, the voice activity posterior probability is generated from the voice activity vector, and the spatial environment posterior probability is generated from the spatial environment vector, each representing the time-varying probability of the elements of the respective vector. At any given time, the probabilities making up the motion activity posterior probability sum to one (i.e., 100%); similarly, the probabilities making up the voice activity posterior probability sum to one at any given time, as do those making up the spatial environment posterior probability.
The basic level context has a plurality of aspects, each aspect based on a motion activity vector, a speech activity vector, and a spatial environment vector. Each aspect of the basic level context based on the motion activity vector is mutually exclusive to each other, each aspect of the basic level context based on the speech activity vector is mutually exclusive to each other, and each aspect of the basic level context based on the spatial environment vector is mutually exclusive to each other.
One of these aspects of the basic level scenario is the movement pattern of the user carrying the electronic device. Further, one of these aspects of the basic level context is the nature of the biologically generated sound within audible distance of the user. Furthermore, one of these aspects of the basic level scenario is the nature of the physical space around the user.
Examples of the various classes of motion patterns, properties of biologically generated sounds, and properties of physical spaces will now be given, although it is understood that the present disclosure contemplates and is intended to encompass any such classes.
Different categories of motion patterns may include user standing still, walking, going up stairs, going down stairs, jogging, cycling, climbing, using a wheelchair, and riding a vehicle. The different classes of determined properties of the biologically generated sound may include that the user is engaged in a telephone conversation, that the user is engaged in a multi-party conversation, that the user is speaking, that another party is speaking, that a background conversation occurs around the user, and that an animal utters a sound. The different categories of properties of the physical space around the user may include an office environment, a home environment, a mall environment, a street environment, a stadium environment, a restaurant environment, a bar environment, a beach environment, a natural environment, a temperature of the physical space, an air pressure of the physical space, and a humidity of the physical space.
Each vector has a "none of these" class, which represents the classes not explicitly incorporated as elements of that vector. This allows the probabilities of the elements of the vector to sum to one, i.e., to remain mathematically consistent. It also makes the vector representation flexible: new classes can be explicitly incorporated into the corresponding vector as needed, which simply changes the composition of the "none of these" class for that vector.
A meta-level context represents an inference made from a combination of the probabilities of two or more classes of the posterior probabilities. For example, a meta-level context may be that the user of the electronic device 100 is walking in a mall, or is engaged in a telephone conversation in an office.
The processing node 120 may communicate the determined base-level context and the meta-level context to the SoC150, which may perform at least one contextual function of the electronic device 100 according to the base-level context or the meta-level context of the electronic device.
Fig. 3 shows the derivation of basic level context awareness from time-dependent information about the activity/environment classes in each of the three vectors. Meta-level context awareness is derived from the time-stamped information available from one or more of these basic level vectors together with information stored in the mobile device memory 140 or the cloud 170 (e.g., pattern libraries and databases). A desirable form of representing this information, useful in application development related to basic-level and meta-level context awareness, is introduced below.
The information is represented in the form of time-varying probabilities of the classes of the vectors (motion activity, voice activity, and spatial environment), given the observations from one or more sensors. This general information representation can be used to solve several application problems, such as detecting possible events from each vector in a time frame. The probabilities are estimated as the posterior probabilities of each element of the MAV, VAV, and SEV at a given time, conditioned on "observations", which are features derived from the sensor data records. The respective vectors of probability values are the corresponding "posterior probabilities": the motion activity posterior probability (MAP), the voice activity posterior probability (VAP), and the spatial environment posterior probability (SEP), which are the processed outputs of the basic level context awareness information.
Fig. 4 shows that the MAP comprises the probabilities of the elements of the MAV as a function of time, estimated from features derived from time-window observation data. The probability of each motion activity class is estimated from time-window data obtained from one or more of the various sensors. Some of the models that may be used are: i) Hidden Markov Models (HMMs); ii) Gaussian Mixture Models (GMMs); iii) Artificial Neural Networks (ANNs) that produce a probabilistic output for each class; and iv) multi-class probabilistic Support Vector Machines (SVMs) incorporating Directed Acyclic Graphs (DAGs) or Max-Wins Voting (MWV). For each motion activity class, the model parameters are trained using supervised learning from a training database that includes annotated data from all sensors to be used.
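As a minimal sketch of this probabilistic-classifier idea, the following trains one class-conditional density model per motion class on toy data and converts likelihoods into posterior probabilities with Bayes' rule. For brevity it uses a single Gaussian per class, whereas the options named above are HMMs, GMMs, ANNs, and probabilistic SVMs; the class names and data are invented, not from the patent's training database.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-D feature data for two hypothetical motion classes; a real
# system would train on the annotated multi-sensor database described
# in the text, typically with richer models than one Gaussian per class.
train = {
    "still":   rng.normal(0.0, 0.3, size=(200, 2)),
    "walking": rng.normal(3.0, 0.5, size=(200, 2)),
}
params = {c: (X.mean(axis=0), np.cov(X.T)) for c, X in train.items()}
prior = {c: 0.5 for c in train}  # equal class priors assumed

def log_gauss(z, mean, cov):
    """Log density of a multivariate Gaussian evaluated at z."""
    d = z - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d @ np.linalg.solve(cov, d) + logdet + len(z) * np.log(2 * np.pi))

def posterior(z):
    """P(class | z) via Bayes' rule over the class-conditional models."""
    logp = {c: log_gauss(z, *params[c]) + np.log(prior[c]) for c in params}
    mx = max(logp.values())                      # subtract max for stability
    w = {c: np.exp(v - mx) for c, v in logp.items()}
    s = sum(w.values())
    return {c: v / s for c, v in w.items()}     # normalized, sums to one

p = posterior(np.array([2.9, 3.1]))  # clearly "walking"-like features
```

Evaluating `posterior` on each analysis frame yields the time evolution of the MAP as plotted in Fig. 6.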
The number of sensors used to obtain the MAP depends on a number of factors, such as the number of sensors available on the electronic device 100, energy consumption constraints for the task, the desired accuracy of the estimation, and so forth. When more than one sensor is used, different methods may be used to estimate the MAP. One particularly useful method for fusing data from up to K different sensors to estimate the MAP is shown in FIG. 4. In this method, sensor-specific features are extracted from the time-window data of each sensor, and these features are used together to obtain the MAP.
Fig. 5 shows that the VAP and SEP comprise the probabilities of the elements of the VAV and SEV, respectively, as functions of time, estimated from features derived from time-window observations received from the microphone 112, which may be the beamformed output of a microphone array. As with the MAP, the probabilities are obtained from a model (e.g., an HMM, GMM, ANN, or multi-class probabilistic SVM incorporating a DAG or MWV that produces a probabilistic output for each class). For each class, the model parameters are trained using supervised learning from a training database that includes annotated data from all sensors to be used.
The MAP based on tri-axial accelerometer data for a "walking" motion activity of 150 seconds duration is shown in Fig. 6. The tri-axial accelerometer data is sampled at 50 Hz, and five-second time-window data frames are extracted. Successive frames are obtained by shifting the time window by two seconds. The magnitude of the three-channel data is used to extract 17-dimensional features per frame. These features comprise the maximum value, minimum value, mean, root mean square, three cumulative features, and 10th-order linear prediction coefficients. The probability of each activity is estimated with a multi-class probabilistic SVM framework incorporating a DAG. For the motion activities in the MAV, the multi-class probabilistic SVM-DAG model behind the MAP graph in Fig. 6 is trained on the tri-axial accelerometer data using supervised learning from a training database that includes time-synchronized multi-sensor data from the tri-axial accelerometer 111, barometer 113, tri-axial gyroscope 114, microphone 112, and tri-axial magnetometer 118.
The temporal evolution of the posterior probability information, as shown for the MAP in Fig. 6, is a general representation of context awareness information at the basic level. It provides the probability of each class in an activity/context vector at a given time and shows its evolution over time. The following salient features of this representation format are relevant:
at any given time, the sum of the probabilities for all classes equals one; and is
At any given time, the activity/context classification can be performed from the corresponding posterior probabilities by selecting the most probable class, thus providing a hard decision.
A "confidence" in the classification result may be obtained from different measures, such as the difference between the maximum probability value and the second-highest probability value. The greater the difference between these two probability values, the greater the confidence in the accuracy of the decoded class.
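This confidence measure can be sketched in a few lines; the posterior is assumed to be given as a plain probability vector, and the names are illustrative.

```python
import numpy as np

def decode_with_confidence(posterior):
    """Hard decision plus a confidence measure: the gap between the
    largest and second-largest class probabilities (one of the
    possible measures mentioned in the text)."""
    p = np.asarray(posterior)
    order = np.argsort(p)[::-1]          # class indices, most probable first
    return order[0], p[order[0]] - p[order[1]]

cls, conf = decode_with_confidence([0.05, 0.70, 0.15, 0.10])
# cls is the decoded class index; conf is the max-minus-second-max gap
```

A small `conf`, as in the "stair up" example of Fig. 7, flags time instants where the hard decision is unreliable.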
It is observed from Fig. 6 that the probability of walking is highest compared to the probabilities of all other motion activities, which results in correct classification at almost all times in the graph. The classification result is erroneous only in two small time intervals, where the correct activity is misclassified as "stair up".
Another time evolution of the MAP, based on tri-axial accelerometer data for a 30-second "stair up" motion activity, is shown in Fig. 7. It can be seen that the maximum-probability class at each time instant varies between "stair up", "walking", and some other motion activities. The decoded motion activity will therefore be erroneous at those times where the "stair up" class does not have the maximum probability. Also, the maximum probability at each time instant is lower than for the "walking" activity shown in the MAP of Fig. 6, and closer to the next-highest probability. From this it can be deduced that the "confidence" in the accuracy of the decoded class is lower than in the "walking" case of Fig. 6.
FIG. 8 shows two methods of fusing data from multiple sensors. The first method involves concatenating the features obtained from each sensor to form a composite feature vector, which is then given as input to the probabilistic classifier. The second method is based on Bayesian theory. Suppose the observation is Z^K = {Z_1, ..., Z_K}, where Z_i is the feature vector for sensor number i. The feature vectors are taken to be conditionally independent given the class: given a particular class, the information collected from sensor S_i in its feature vector Z_i is independent of the information collected from sensor S_j in its feature vector Z_j. That is, P(Z_i, Z_j | class_l) = P(Z_i | class_l) · P(Z_j | class_l), which gives the joint probability of the feature vectors from multiple sensors given the class. Bayes' theorem is then used to fuse the data from the multiple sensors to obtain the posterior probability.
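The conditional-independence fusion rule can be sketched numerically as follows; the per-sensor likelihood values are made up for illustration and do not come from the patent.

```python
import numpy as np

# Per-sensor class-conditional likelihoods P(Z_i | class_l) for L = 3
# hypothetical classes and K = 2 sensors (say, accelerometer and
# barometer). The numbers are invented for this sketch.
lik_accel = np.array([0.6, 0.3, 0.1])   # P(Z_1 | class_l)
lik_baro  = np.array([0.2, 0.7, 0.1])   # P(Z_2 | class_l)
prior     = np.array([1/3, 1/3, 1/3])   # P(class_l), uniform here

# Conditional independence given the class:
#   P(Z_1, Z_2 | class_l) = P(Z_1 | class_l) * P(Z_2 | class_l),
# so Bayes' theorem reduces the fused posterior to a normalized product.
unnorm = lik_accel * lik_baro * prior
fused_posterior = unnorm / unnorm.sum()  # P(class_l | Z_1, Z_2)
```

Here the barometer evidence overturns the accelerometer's preferred class, mirroring how barometer fusion corrects the accelerometer-only decisions in Figs. 9 and 10.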
FIG. 2 depicts a flow diagram of a method for determining probabilistic context awareness for a mobile device user using single-sensor and multi-sensor data fusion. Let S_i denote the i-th sensor, where i = 1, 2, ..., K and K is the total number of sensors used (block 202). Sensor S_i provides input data s_i(m), where i is the sensor number from 1 to K and m is the discrete time index. The preprocessed, time-aligned data s_i(m) is segmented into a plurality of fixed-duration frames x_i(n) (block 204).
Thereafter, sensor-specific features are extracted and grouped into a plurality of vectors (block 206). Let z_f^i be feature f extracted from the data x_i(n) of the i-th sensor. The feature vector of the i-th sensor is given by Z_i = [z_1^i, z_2^i, ..., z_{F_i}^i]', where F_i is the number of features for that sensor. The composite feature vector for the K sensors is denoted Z^K = [Z_1', Z_2', ..., Z_K']'. For basic level context detection, the following features are extracted.
i. MAV:
a. Accelerometer: maximum, minimum, mean, root mean square, 3 cumulative features, and 10th-order linear prediction coefficients.
The three cumulative features are as follows:
1. Average minimum: defined as the average of x_i(n) over its lowest 15% of values.
2. Average median: defined as the average of x_i(n) between 30% and 40%.
3. Average maximum: defined as the average of x_i(n) between 95% and 100%.
b. Pressure sensor: maximum, minimum, mean, slope, and 6th-order linear prediction coefficients.
c. Gyroscope: maximum, minimum, mean, root mean square, 3 cumulative features, and 10th-order linear prediction coefficients.
d. Microphone: concatenated 10th-order linear prediction coefficients, zero-crossing rate, and short-time energy.
ii. VAV and SEV:
a. a microphone: 13 mel-frequency cepstral coefficients (MFCCs), 13 differential MFCCs, and 13 double differential MFCCs.
b. Microphone array: 13 MFCCs, 13 differential MFCCs, and 13 double differential MFCCs.
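As one illustration of the cumulative features listed for the accelerometer and gyroscope, the three values can be computed as trimmed means over percentile bands of the sorted frame samples (the sorting step is an assumption consistent with the "average minimum/median/maximum" naming):

```python
import numpy as np

def cumulative_features(x):
    """The three cumulative features of a frame x_i(n): means of the sorted
    samples over the 0-15%, 30-40%, and 95-100% ranges."""
    xs = np.sort(np.asarray(x, dtype=float))
    n = len(xs)
    def band(lo, hi):                     # mean over a fractional band of the sorted frame
        a = int(np.floor(lo * n))
        b = max(int(np.ceil(hi * n)), a + 1)
        return xs[a:b].mean()
    return band(0.0, 0.15), band(0.30, 0.40), band(0.95, 1.00)

avg_min, avg_med, avg_max = cumulative_features(np.arange(100))
```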
The feature vectors are given as inputs to a probabilistic classifier, such as a multi-class probabilistic SVM-DAG (block 208). The outputs obtained are the corresponding a posteriori probabilities, viz. MAP, VAP, and SEP, of the corresponding basic-level context-awareness vectors MAV, VAV, and SEV (block 212). Each posterior probability is of the form [P(class_1 | Z^K), P(class_2 | Z^K), …, P(class_L | Z^K)]', where L is the number of classes in the MAV/VAV/SEV.
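As a stand-in for block 208, scikit-learn's `SVC` with `probability=True` (Platt scaling with pairwise coupling, rather than the DAG arrangement named in the text) likewise maps a composite feature vector to a per-class posterior vector that sums to one; the synthetic 18-dimensional data below is purely illustrative:

```python
import numpy as np
from sklearn.svm import SVC

# Three well-separated synthetic "activity" clusters in an 18-dimensional
# feature space, standing in for the composite feature vectors Z^K.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.3, size=(40, 18)) for c in (0.0, 1.0, 2.0)])
y = np.repeat([0, 1, 2], 40)             # three activity classes

clf = SVC(kernel="rbf", probability=True, random_state=0).fit(X, y)
posterior = clf.predict_proba(X[:1])[0]   # [P(class_1|Z^K), ..., P(class_L|Z^K)]
```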
Figs. 9 and 10 show MAPs using data from two sensors, a tri-axial accelerometer and a barometer. The 17 features from the tri-axial accelerometer listed above are used together with one barometer feature (the time slope of the pressure over a 5-second frame, estimated using the least-squares method) in the multi-class probabilistic SVM-DAG model with an 18-dimensional input to obtain the probability of each activity class. Comparing fig. 6 with fig. 9, it can be seen that when the accelerometer data is fused with the barometer data, the false-decision intervals that occur when only accelerometer data is used are corrected. The additional input from the pressure sensor correctly disambiguates the "stair climbing" activity from "walking" and other activities.
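The single barometer feature, the least-squares time slope of pressure over a 5-second frame, might be computed as follows (the sample rate and pressure values are illustrative):

```python
import numpy as np

def pressure_slope(p, fs):
    """Least-squares time slope of a pressure frame, the one barometer feature
    fused with the 17 accelerometer features. fs is the sample rate in Hz."""
    t = np.arange(len(p)) / fs                       # time axis in seconds
    slope, _intercept = np.polyfit(t, np.asarray(p, dtype=float), 1)
    return slope

# A 5-second frame at 10 Hz with pressure falling 0.2 hPa/s,
# roughly what ascending stairs might produce
fs = 10
p = 1000.0 - 0.2 * (np.arange(5 * fs) / fs)
slope = pressure_slope(p, fs)
```

A clearly negative slope separates "stair climbing" from level-ground activities such as "walking", which is the disambiguation described above.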
The performance of the 9-class motion activity classifier using the probabilistic MAP outputs is shown in FIG. 11 in the form of a confusion matrix. The classification is based on the fusion of the 18 features obtained from accelerometer and barometer data collected from a smartphone. The MAP is obtained using a multi-class probabilistic SVM-DAG model previously trained on user data. Performance results were obtained using leave-one-out cross-validation on data from 10 subjects. The rows of the confusion matrix give the true motion activity class and the columns give the decoded activity class. Thus, the diagonal values represent the percentage of correct decisions for the corresponding class, while the off-diagonal values represent incorrect decisions. The overall percentage of correct decisions across the 9 activity classes was 95.16%.
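A confusion matrix of the kind shown in fig. 11 can be built from true and decoded class labels; the diagonal then holds the per-class correct-decision percentages (the tiny label lists below are illustrative):

```python
import numpy as np

def confusion_matrix_pct(true_labels, decoded_labels, n_classes):
    """Row = true activity class, column = decoded class, entries in percent,
    so diagonal entries are per-class correct-decision percentages."""
    cm = np.zeros((n_classes, n_classes))
    for t, d in zip(true_labels, decoded_labels):
        cm[t, d] += 1
    return 100.0 * cm / cm.sum(axis=1, keepdims=True)

cm = confusion_matrix_pct([0, 0, 0, 0, 1, 1], [0, 0, 0, 1, 1, 1], n_classes=2)
```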
The single-sensor data and/or multi-sensor fused data are used to derive probabilistic outputs of basic-level context-awareness information. This general algorithmic framework for basic-level context awareness is extensible, such that it may also include additional motion and voice activity classes and spatial environment contexts in the probabilistic output format as needed. The corresponding a posteriori probability outputs may be integrated over time to provide a more accurate, though delayed, decision regarding the activity or environment class. The framework also allows integrating additional a posteriori probabilities for other detection tasks derived from the same sensors or from additional sensors.
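The time integration described above can be sketched by accumulating log-probabilities across frames and renormalizing, trading latency for a more stable decision (the frame values below are illustrative):

```python
import numpy as np

def integrate_posteriors(frame_posteriors):
    """Combine per-frame posterior vectors over time by summing log-probabilities
    (equivalent to multiplying the frames' posteriors and renormalizing),
    yielding a delayed but more stable class decision."""
    logp = np.log(np.asarray(frame_posteriors, dtype=float) + 1e-12).sum(axis=0)
    p = np.exp(logp - logp.max())        # subtract max for numerical stability
    return p / p.sum()

# Three noisy frames whose per-frame maxima flip between classes 0 and 1
frames = [[0.5, 0.4, 0.1], [0.35, 0.45, 0.2], [0.6, 0.3, 0.1]]
integrated = integrate_posteriors(frames)
```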
The posterior probability outputs for the motion activity, voice activity, and spatial environment classes can be used to perform meta-level probabilistic analysis and to develop embedded context-aware applications, as shown in fig. 12. For example, an inference of the "walking" activity class from the MAP and of the "mall" class from the SEP may together yield the meta-level inference that the user is walking in a mall. The probabilistic information in the three a posteriori probability vectors can be used as input to a meta-level context-aware classifier, on which more advanced applications can be built.
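A minimal sketch of such a meta-level inference, combining the maximum-probability classes from the MAP and SEP vectors (the class label lists and output phrasing are illustrative assumptions, not taken from the patent):

```python
import numpy as np

MOTION_CLASSES = ["stationary", "walking", "jogging"]    # illustrative MAV labels
ENVIRONMENT_CLASSES = ["office", "mall", "street"]       # illustrative SEV labels

def meta_level_inference(map_probs, sep_probs):
    """Combine the most probable motion class (from the MAP) and spatial
    environment class (from the SEP) into a meta-level inference."""
    motion = MOTION_CLASSES[int(np.argmax(map_probs))]
    env = ENVIRONMENT_CLASSES[int(np.argmax(sep_probs))]
    return f"user is {motion} in a {env}"

inference = meta_level_inference([0.1, 0.8, 0.1], [0.2, 0.7, 0.1])
```

A richer meta-level classifier would take the full probability vectors as input rather than only their argmax classes, as the text suggests.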
Fig. 13 shows snapshots of an application developed in Java for an Android OS based smartphone. As shown in the snapshot on the left, the user interface includes start, stop, and pause buttons for computing the posterior probabilities in real time, logging their time evolution, and displaying them graphically in real time for up to the 40 most recent frames. The snapshot on the right shows the MAPs of the 9 motion activity classes as a function of time. It also displays the decoded class of the current frame, taken from the maximum probability value, and the total time the user has spent in each motion activity class since the application was started. The application determines the motion activity posterior probabilities using a fusion of accelerometer, barometer, and gyroscope data; the number of features varies with the number of sensors used. The posterior probabilities are evaluated using one of three methods: i) multi-class probabilistic SVMs combined with a DAG, ii) multi-class probabilistic SVMs combined with MWVs, and iii) multi-class SVMs that produce hard-decision outputs. The real-time graphical display of the probability values of all classes also gives a quick visual indication of the "confidence" in the most probable class, by comparison with the second-highest probability class.
Although the foregoing description has been described herein with reference to particular means, materials and embodiments, it is not intended to be limited to the particulars disclosed herein; but rather extends to all functionally equivalent structures, methods and uses, such as are within the scope of the appended claims.

Claims (20)

1. An electronic device, comprising:
a sensing unit comprising at least one sensor and configured to acquire sensing data;
a computing device configured to:
extracting a plurality of sensor-specific features from the sensed data;
generating a motion activity vector, a voice activity vector, and a spatial environment vector from the plurality of sensor-specific features;
processing the motion activity vector, the voice activity vector, and the spatial environment vector to determine a basic level context of the electronic device relative to its surroundings, the basic level context having a plurality of aspects, each aspect being based on the motion activity vector, the voice activity vector, and the spatial environment vector, and
determining a meta-level context of the electronic device relative to its surroundings from the basic-level context, wherein the meta-level context comprises at least one inference made from at least two of the plurality of aspects of the basic-level context.
2. The electronic device of claim 1, wherein each aspect of the basic level context based on the motion activity vector is mutually exclusive from each other; wherein each aspect of the base level context based on the voice activity vector is mutually exclusive from each other; and wherein each aspect of the base level context based on the spatial environment vector is mutually exclusive from each other.
3. The electronic device of claim 1, wherein the aspects of the basic level context consist of one aspect based on the motion activity vector, one aspect based on the voice activity vector, and one aspect based on the spatial environment vector.
4. The electronic device of claim 1, wherein the computing device is further configured to cause performance of at least one contextual function of the electronic device in accordance with the meta-level context of the electronic device.
5. The electronic device of claim 1, wherein, according to the motion activity vector, one of the aspects of the basic level context is determined to be a motion pattern of a user carrying the electronic device; wherein, according to the voice activity vector, one of the aspects of the basic level context is determined to be a property of a biologically generated sound within an audible distance of the user; and wherein one of the aspects of the base level context is determined to be a property of a physical space surrounding the user according to the spatial environment vector.
6. The electronic device of claim 5, wherein the determined movement pattern of the user comprises one of: the user is stationary, walking, ascending stairs, descending stairs, jogging, cycling, climbing, using a wheelchair, and riding a vehicle; wherein the determined property of the biologically generated sound comprises one of: said user is engaged in a telephone conversation, said user is engaged in a multi-party conversation, said user is speaking, another party is speaking, a background conversation occurs around said user, and an animal makes a sound; and wherein the determined property of the physical space around the user comprises an office environment, a home environment, a mall environment, a street environment, a stadium environment, a restaurant environment, a bar environment, a beach environment, a physical environment, a temperature of the physical space, an air pressure of the physical space, and a humidity of the physical space.
7. The electronic device of claim 1, wherein the computing device is configured to process the motion activity vector, the voice activity vector, and the spatial environment vector by:
generating a motion activity posterior probability from the motion activity vector, wherein the motion activity posterior probability represents the probability of each element of the motion activity vector varying over time;
generating a voice activity posterior probability from the voice activity vector, wherein the voice activity posterior probability represents the probability of each element of the voice activity vector varying over time; and
generating a spatial environment posterior probability from the spatial environment vector, wherein the spatial environment posterior probability represents the probability of each element of the spatial environment vector varying over time.
8. The electronic device of claim 7, wherein the sum of the probabilities of the motion activity posterior probability at any given time is equal to one; wherein the sum of the probabilities of the voice activity posterior probability at any given time is equal to one; and wherein the sum of the probabilities of the spatial environment posterior probability at any given time is equal to one.
9. The electronic device of claim 7, wherein the sensing unit consists essentially of one sensor.
10. The electronic device of claim 7, wherein the sensing unit comprises a plurality of sensors; and wherein the motion activity vector, the voice activity vector, and the spatial environment vector are generated from a fusion of the plurality of sensor-specific features.
11. The electronic device of claim 10, wherein the plurality of sensors includes at least two of an accelerometer, a pressure sensor, a microphone, a gyroscope, a magnetometer, a GPS unit, and a barometer.
12. The electronic device of claim 1, further comprising a Printed Circuit Board (PCB) having at least one conductive trace thereon; further comprising a system on a chip (SoC) mounted on the PCB and electrically coupled to the at least one conductive trace; and wherein the computing device comprises a sensor chip mounted on the PCB in spaced relation to the SoC and electrically coupled to the at least one conductive trace such that the sensor chip and the SoC are electrically coupled; and wherein the sensor chip comprises a micro-electro-mechanical system (MEMS) sensing unit and control circuitry configured to perform the extracting, the generating, the processing, and the determining.
13. An electronic device, comprising:
a Printed Circuit Board (PCB) having at least one conductive trace thereon;
a system on a chip (SoC) mounted on the PCB and electrically coupled to the at least one conductive trace;
a sensor chip mounted on the PCB in spaced relation to the SoC and electrically coupled to the at least one conductive trace such that the sensor chip and the SoC are electrically coupled and configured to acquire sensed data;
wherein the sensor chip comprises:
a micro-electro-mechanical system (MEMS) sensing unit;
an embedded processing node configured to:
preprocessing the sensed data,
extracting a plurality of sensor-specific features from the sensed data,
generating a motion activity posterior probability, a voice activity posterior probability, and a spatial environment posterior probability based on the plurality of sensor-specific features,
processing the motion activity posterior probability, the voice activity posterior probability, and the spatial environment posterior probability to determine a basic level context of the electronic device relative to its surroundings, the basic level context having a plurality of aspects, each aspect based on the motion activity posterior probability, the voice activity posterior probability, and the spatial environment posterior probability, and
determining a meta-level context of the electronic device relative to its surroundings from the basic-level context and a schema library stored in a cloud or local memory, wherein the meta-level context comprises at least one inference made from at least two of the plurality of aspects of the basic-level context.
14. The electronic device of claim 13, further comprising at least one additional sensor; wherein the SoC is configured to acquire additional data from the at least one additional sensor; wherein the embedded processing node is further configured to receive the additional data from the SoC and also extract the plurality of sensor-specific features from the additional data.
15. The electronic device of claim 13, wherein the embedded processing node is configured to generate the motion activity a posteriori probability, the voice activity a posteriori probability, and the spatial environment a posteriori probability to represent the probability of each element of a motion activity vector, a voice activity vector, and a spatial environment vector, respectively, varying over time.
16. The electronic device of claim 13, wherein the sum of the probabilities of the motion activity posterior probability at any given time is equal to one; wherein the sum of the probabilities of the voice activity posterior probability at any given time is equal to one; and wherein the sum of the probabilities of the spatial environment posterior probability at any given time is equal to one.
17. The electronic device of claim 13, wherein the sensor chip consists essentially of one MEMS sensing unit.
18. The electronic device of claim 13, wherein the sensor chip comprises a plurality of MEMS sensing units; and wherein the motion activity posterior probability, the voice activity posterior probability, and the spatial environment posterior probability are generated from a fusion of the plurality of sensor-specific features.
19. A method, comprising:
collecting sensing data from a sensing unit;
extracting, using a computing device, a plurality of sensor-specific features from the sensed data;
generating, using the computing device, a motion activity vector, a voice activity vector, and a spatial environment vector from the plurality of sensor-specific features;
processing, using the computing device, the motion activity vector, the voice activity vector, and the spatial environment vector to determine a base-level context of the electronic device relative to its surroundings, the base-level context having a plurality of aspects, each aspect based on the motion activity vector, the voice activity vector, and the spatial environment vector; and is
Determining, using the computing device, a meta-level context of the electronic device relative to its surroundings from the basic-level context, wherein the meta-level context comprises at least one inference made from at least two of the plurality of aspects of the basic-level context.
20. The method of claim 19, wherein the motion activity vector, the speech activity vector, and the spatial environment vector are processed by:
generating a motion activity posterior probability from the motion activity vector, wherein the motion activity posterior probability represents the probability of each element of the motion activity vector varying over time;
generating a voice activity posterior probability from the voice activity vector, wherein the voice activity posterior probability represents the probability of each element of the voice activity vector varying over time; and
generating a spatial environment posterior probability from the spatial environment vector, wherein the spatial environment posterior probability represents the probability of each element of the spatial environment vector varying over time.

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US15/074,188 US9870535B2 (en) 2015-02-26 2016-03-18 Method and apparatus for determining probabilistic context awareness of a mobile device user using a single sensor and/or multi-sensor data fusion
US15/074,188 2016-03-18

Publications (2)

Publication Number Publication Date
CN107203259A CN107203259A (en) 2017-09-26
CN107203259B true CN107203259B (en) 2020-04-24

Family

ID=59904565

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610466435.4A Active CN107203259B (en) 2016-03-18 2016-06-23 Method and apparatus for determining probabilistic content awareness for mobile device users using single and/or multi-sensor data fusion

Country Status (1)

Country Link
CN (1) CN107203259B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109375777A (en) * 2018-10-30 2019-02-22 张家口浩扬科技有限公司 A kind of based reminding method and system

Families Citing this family (1)

Publication number Priority date Publication date Assignee Title
CN107843287B (en) * 2017-10-26 2019-08-13 苏州数言信息技术有限公司 Integrated sensor device

Citations (2)

Publication number Priority date Publication date Assignee Title
CN101984454A (en) * 2010-11-19 2011-03-09 杭州电子科技大学 Multi-source multi-characteristic information fusion method based on data drive
CN102147468A (en) * 2011-01-07 2011-08-10 西安电子科技大学 Bayesian theory-based multi-sensor detecting and tracking combined processing method

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
US9173574B2 (en) * 2009-04-22 2015-11-03 Rodrigo E. Teixeira Mechanical health monitor apparatus and method of operation therefor
US9916538B2 (en) * 2012-09-15 2018-03-13 Z Advanced Computing, Inc. Method and system for feature detection
US20140275886A1 (en) * 2013-03-14 2014-09-18 Streamline Automation, Llc Sensor fusion and probabilistic parameter estimation method and apparatus


Cited By (2)

Publication number Priority date Publication date Assignee Title
CN109375777A (en) * 2018-10-30 2019-02-22 张家口浩扬科技有限公司 A kind of based reminding method and system
CN109375777B (en) * 2018-10-30 2021-12-14 青岛民航凯亚系统集成有限公司 Reminding method and system

Also Published As

Publication number Publication date
CN107203259A (en) 2017-09-26


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant