WO2021170014A1

WO2021170014A1 - User behavior identification method and apparatus, computer device, and medium

Info

Publication number: WO2021170014A1
Application number: PCT/CN2021/077744
Authority: WO
Inventors: 张夏天
Original assignee: 北京腾云天下科技有限公司
Priority date: 2020-02-28
Filing date: 2021-02-24
Publication date: 2021-09-02
Also published as: CN111427754A

Abstract

The present disclosure provides a user behavior identification method and apparatus, a computer device, and a medium. The user behavior identification method in the present disclosure comprises: obtaining a plurality of groups of motion state data, each group of motion state data comprising a plurality of pieces of state information, and the state information being used for representing a motion state of a mobile device carried by a user; performing de-correlation processing on the plurality of groups of motion state data to eliminate a correlation between the state information in each group of motion state data so as to obtain target data; and determining a user behavior type according to the target data.

Description

User behavior identification method and device, computer equipment and medium

Cross-references to related applications

This application claims the priority of Chinese patent application 202010127977.5 filed on February 28, 2020, the entire content of which is incorporated into this application by reference in its entirety.

Technical field

The present disclosure relates to the field of mobile data processing technology, and in particular to a user behavior recognition method, device, computer equipment, non-transitory computer-readable storage medium, and computer program product.

Background technique

Mobile devices, such as mobile phones, tablet computers, smart wearable devices, etc., have brought great convenience to people's lives. Users can watch audio and video, shop, play games, record exercise status, etc. through their portable mobile devices. In some cases, it is advantageous to recognize the user's behavior type in real time. For example, it is possible to customize and intelligently recommend content, provide health advice, and develop exercise plans for users based on the identified user behavior types.

The methods described in this section are not necessarily those that have been previously conceived or adopted. Unless otherwise indicated, it should not be assumed that any of the methods described in this section are considered prior art simply because they are included in this section. Similarly, unless otherwise specified, the problems mentioned in this section should not be considered recognized in any prior art.

Summary of the invention

It would be advantageous to provide a mechanism to alleviate, alleviate or even eliminate one or more of the above-mentioned problems.

According to an aspect of the present disclosure, there is provided a user behavior recognition method, including: acquiring multiple sets of motion state data, each set of motion state data includes multiple pieces of state information, and the state information is used to indicate the motion state of a mobile device carried by a user; Perform decorrelation processing on the multiple sets of exercise state data to eliminate the correlation of various state information in each set of exercise state data to obtain target data; and determine the user behavior type according to the target data.

According to another aspect of the present disclosure, there is provided a user behavior recognition device, including: a data acquisition module configured to acquire multiple sets of exercise state data, each set of exercise state data includes multiple pieces of state information, and the state information is used to indicate a user The movement state of the carried mobile device; a decorrelation module configured to decorrelate the above multiple sets of movement state data to eliminate the correlation of various state information in each group of movement state data to obtain target data; and The classification module is configured to determine the user behavior type according to the target data.

According to another aspect of the present disclosure, there is provided a computer device including: a memory, a processor, and a computer program stored on the memory. The processor is configured to execute the above-mentioned computer program to implement the steps of the above-mentioned user behavior identification method.

According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium on which a computer program is stored. When the computer program is executed by the processor, the steps of the above-mentioned user behavior identification method are realized.

According to another aspect of the present disclosure, there is provided a computer program product, including a computer program, which, when executed by a processor, implements the above-mentioned user behavior recognition method.

According to the embodiments of the present disclosure, by acquiring and processing multiple sets of motion state data of the mobile device carried by the user, the current behavior type of the user can be determined. The processing of multiple sets of motion state data includes decorrelation processing. By decorrelating multiple sets of motion state data, the correlation of each state information in each set of motion state data is eliminated, and the redundant information in the multiple sets of acquired motion state data is greatly reduced. The user behavior type is determined according to the target data obtained by the decorrelation processing, which improves the efficiency and accuracy of user behavior recognition.

These and other aspects of the present disclosure will be clear from the embodiments described below, and will be clarified with reference to the embodiments described below.

Description of the drawings

The accompanying drawings exemplarily show the embodiments and constitute a part of the specification, and together with the text description of the specification are used to explain exemplary implementations of the embodiments. The illustrated embodiments are for illustrative purposes only and do not limit the scope of the claims. In all the drawings, the same reference signs refer to similar but not necessarily the same elements.

FIG. 1 is a schematic diagram illustrating an example system in which various methods described herein may be implemented according to an example embodiment;

Fig. 2 is a flowchart illustrating a user behavior recognition method according to an exemplary embodiment;

Fig. 3 is a schematic diagram illustrating a fuselage coordinate system according to an exemplary embodiment;

4 is a schematic diagram illustrating the correlation of various pieces of state information in exercise state data according to an exemplary embodiment;

FIG. 5 is a schematic diagram illustrating the correlation of various state information after decorrelation processing is performed on the motion state data involved in FIG. 4;

Fig. 6 is a schematic diagram illustrating a user behavior recognition apparatus according to an exemplary embodiment;

Fig. 7 is a structural block diagram illustrating a user behavior recognition model according to an exemplary embodiment;

Fig. 8 is a structural block diagram illustrating another user behavior recognition model according to an exemplary embodiment;

Fig. 9 is a schematic diagram illustrating an exemplary computer device that can be applied to the exemplary embodiment.

Detailed ways

In the present disclosure, unless otherwise specified, the use of the terms "first", "second", etc. to describe various elements is not intended to limit the positional relationship, timing relationship, or importance relationship of these elements. Such terms are only used for Distinguish one element from another. In some examples, the first element and the second element may refer to the same instance of the element, and in some cases, based on the description of the context, they may also refer to different instances.

The terms used in the description of the various examples in this disclosure are only for the purpose of describing specific examples, and are not intended to be limiting. Unless the context clearly indicates otherwise, if the number of elements is not specifically limited, there may be one or more elements. As used herein, the term "plurality" means two or more, and the term "based on" should be interpreted as "based at least in part." In addition, the terms "and/or" and "at least one of" cover any one of the listed items and all possible combinations.

The exemplary embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings.

FIG. 1 is a schematic diagram illustrating an example system 100 in which various methods described herein may be implemented according to an example embodiment.

Referring to FIG. 1, the system 100 includes a mobile device 110, a server 120, and a network 130 that communicatively couples the mobile device 110 and the server 120.

The mobile device 110 includes a display 114 and a client application (APP) 112 that can be displayed via the display 114. The client application 112 may be an application that needs to be downloaded and installed before running, or a small program (liteapp) that is a lightweight application. When the client application 112 is an application that needs to be downloaded and installed before running, the client application 112 may be pre-installed on the mobile device 110 and activated. In the case where the client application 112 is an applet, the user 102 can search for the client application 112 in the host application (for example, by the name of the client application 112, etc.) or scan the graphic code of the client application 112 (for example, a barcode). , QR code, etc.), directly run the client application 112 on the mobile device 110 without installing the client application 112. The mobile device 110 may be any type of mobile device, including a mobile computer, a mobile phone, a wearable computer device (such as a smart watch, a head-mounted device, including smart glasses, etc.) or other types of mobile devices.

The server 120 is typically a server deployed by an Internet Service Provider (ISP) or Internet Content Provider (ICP). The server 120 may represent a single server, a cluster of multiple servers, a distributed system, or a cloud server that provides basic cloud services (such as cloud databases, cloud computing, cloud storage, and cloud communications). It will be understood that although the server 120 is shown in FIG. 1 to communicate with only one mobile device 110, the server 120 may provide background services for multiple mobile devices at the same time.

Examples of the network 130 include a combination of a local area network (LAN), a wide area network (WAN), a personal area network (PAN), and/or a communication network such as the Internet. The network 130 may be a wired or wireless network. In some embodiments, technologies and/or formats including Hypertext Markup Language (HTML), Extensible Markup Language (XML), etc. are used to process data exchanged through the network 130. In addition, encryption technologies such as Secure Socket Layer (SSL), Transport Layer Security (TLS), Virtual Private Network (VPN), Internet Protocol Security (IPsec), etc. can also be used to encrypt all or some links. In some embodiments, customized and/or dedicated data communication technologies can also be used to replace or supplement the aforementioned data communication technologies.

For the purpose of the embodiments of the present disclosure, in the example of FIG. 1, a sensor for collecting motion state data of the mobile device 110 and a client application 112 are deployed in the mobile device 110. The client application 112 may be a user behavior recognition application, and the application may determine the behavior type of the user carrying the mobile device 110 according to the motion state data of the mobile device 110 collected by the sensor. Correspondingly, the server 120 may be a server used with the user behavior recognition application. A user behavior recognition model is deployed in the server 120 (the details of the user behavior recognition model will be detailed in the following embodiments). The server 120 can provide a user behavior recognition service to the client application 112 running in the mobile device 110 based on the user behavior recognition model. For example, the server 120 receives the exercise status data sent by the mobile device 110 through the client application 112, and then transfers the exercise status data to the client application 112. Enter the user behavior recognition model to determine the type of user behavior. Alternatively, the server 120 may also provide the user behavior recognition model to the mobile device 110, and the client application 112 running in the mobile device 110 may implement the user behavior recognition function by calling the user behavior recognition model.

The user behavior recognition model may be a pre-trained neural network model, and the details of the model will be described in detail in the following embodiments. The server 120 may, for example, use various training or learning techniques such as back propagation of errors to train the user behavior recognition model. In some embodiments, performing backpropagation of errors may include performing truncated backpropagation through time. The server 120 may employ a variety of generalization techniques (for example, weight attenuation, loss, etc.) to improve the generalization ability of the model being trained. Specifically, the server 120 may train the user behavior recognition model based on the set of training samples. In an embodiment of the present disclosure, the training sample may be, for example, exercise state data marked with the type of user behavior. After the user behavior recognition model training is completed, the model can be converted into a TFLite format model file (TensorFlow Lite file, in which the model parameters are quantized into an 8-bit integer type, that is, int8) in the server 120. The file is deployed in the mobile device 110, so that the client application 112 running in the mobile device 110 can implement the user behavior recognition function by calling the user behavior recognition model.

FIG. 2 is a flowchart illustrating a user behavior recognition method 200 according to an exemplary embodiment. The method 200 may be executed at a mobile device (for example, the mobile device 110 shown in FIG. 1 ), that is, the execution subject of each step of the method 200 may be the mobile device 110 shown in FIG. 1. In some embodiments, the method 200 may be executed at a server (for example, the server 120 shown in FIG. 1). In other embodiments, the method 200 may be executed by a combination of a mobile device (e.g., mobile device 110) and a server (e.g., server 120). Hereinafter, the method 200 is described by taking the mobile device 110 as the execution subject as an example.

As shown in FIG. 2, the method 200 includes steps 210 to 230. In step 210, multiple sets of exercise state data are obtained, and each set of exercise state data includes multiple pieces of state information, and the state information is used to indicate the exercise state of the mobile device carried by the user. In step 220, decorrelation processing is performed on the multiple sets of motion state data to eliminate the correlation of various state information in each set of motion state data to obtain target data. In step 230, the user behavior type is determined according to the target data.

Hereinafter, each step of the method 200 will be described in detail.

Referring to FIG. 2, in step 210, multiple sets of exercise state data are obtained, and each set of exercise state data includes multiple pieces of state information, and the state information is used to indicate the exercise state of the mobile device carried by the user.

According to some embodiments, the multiple sets of motion state data described in step 210 are collected by sensors deployed in the mobile device. Sensors include, but are not limited to, acceleration sensors, gyroscopes (ie, angular velocity sensors), magnetic field sensors, and the like.

According to some embodiments, the sensors deployed in the mobile device include an acceleration sensor and a gyroscope. Accordingly, the motion state data of the mobile device collected by the sensor includes: acceleration information of the mobile device in multiple directions; Angular velocity information in each direction.

Generally, the acceleration sensor can collect the acceleration information of the mobile device on the three coordinate axes of x, y and z; the gyroscope can collect the angular velocity information of the mobile device on the three coordinate axes of x, y, and z , Y, z three coordinate axis rotation angular velocity information). Fig. 3 shows a schematic diagram of a fuselage coordinate system according to an exemplary embodiment. As shown in Figure 3, the origin of the body coordinate system is the center point of the screen of the mobile device, the x-axis direction is horizontal to the right in the screen plane; the y-axis direction is vertical upwards in the screen plane; the z-axis direction is perpendicular to the screen plane And point to the top of the screen.

According to some embodiments, step 210 includes: acquiring multiple sets of motion state data collected by sensors deployed in the mobile device within a predetermined period of time at a predetermined frequency.

It should be noted that the values of the predetermined frequency and the predetermined duration can be set by those skilled in the art according to actual conditions. Preferably, the predetermined frequency is set to a smaller value, thereby reducing the demand for sensor data subscription and reducing power consumption; at the same time, the predetermined duration is also set to a smaller appropriate value, thereby ensuring the recognition of user behavior The accuracy is reduced while the delay is increased, the sensitivity of the response is increased, and the real-time performance of user behavior recognition is improved, that is, the current behavior type of the user is recognized in real time. For example, in one embodiment, the predetermined frequency may be set to 20 Hz (that is, the interval time for obtaining motion state data from the sensor is 50 ms), and the predetermined duration is set to 6400 ms, then step 210 is equivalent to obtaining 128 sets of motion at a time interval of 50 ms. Status data.

Each group of motion state data includes the acceleration information of the mobile device in the directions of the three coordinate axes x, y, and z collected by the acceleration sensor and the angular velocity of the mobile device rotating around the three coordinate axes of x, y, and z collected by the gyroscope. Take the information as an example. When the predetermined frequency is set to 20Hz and the predetermined duration is set to 6400ms, the obtained 128 sets of motion state data d ₁ to d ₁₂₈ are shown in Table 1 below:

Table 1. Example of status data

In Table 1, each row corresponds to a set of motion state data, and each set of motion state data includes x-axis acceleration information, y-axis acceleration information, z-axis acceleration information, x-axis angular velocity information, y-axis angular velocity information, and z-axis angular velocity Information six status information.

The motion state data of the mobile device acquired in step 210 has a strong correlation with the behavior of the user currently carrying the mobile device, and the user behavior type can be identified accordingly. User behavior types include but are not limited to being stationary, walking, running, cycling, taking a bus, taking a subway, taking a small car (such as a taxi, private car, etc.), etc.

Continuing to refer to FIG. 2, after obtaining multiple sets of motion state data in step 210, step 220 is executed.

In step 220, decorrelation processing is performed on the multiple sets of motion state data to eliminate the correlation of various state information in each set of motion state data to obtain target data.

Since multiple data collected by the same sensor often have a large linear correlation, for example, there is a large correlation between the acceleration information of the x, y, and z axes collected by the acceleration sensor. There is a large correlation between the collected angular velocity information of the x, y, and z axes. Figure 4 shows the correlation of various state information in multiple sets of motion state data in the form of a matrix, where numbers 0 to 5 respectively represent x-axis acceleration information, y-axis acceleration information, z-axis acceleration information, and x-axis angular velocity. Information, y-axis angular velocity information, z-axis angular velocity information, the element in the i-th row and j-th column of the matrix represents the correlation (covariance) between the information corresponding to the number i and the information corresponding to the number j. The absolute value of the correlation is closer to 1 , Indicates that the stronger the correlation between the two state information, the closer the absolute value of the correlation is to 0, the weaker the correlation between the two state information. The correlation is a positive number, which means that the two state information is positively correlated; the correlation is a negative number, which means that the two state information is negatively correlated.

The right side of Figure 4 shows an example of the corresponding relationship between correlation and gray value to show the correlation of various state information more intuitively. The lighter the gray, the stronger the correlation between the two state information; the more gray Deep, it means that the correlation between the two status information is weaker. In the correlation matrix shown on the left side of Figure 4, the more correlated parts (that is, the light-colored parts) are located in the upper left and lower right corners of the matrix, that is, the acceleration information of the x, y, and z axes collected by the acceleration sensor There is a large correlation between the three, and there is a large correlation between the angular velocity information of the x, y, and z axes collected by the gyroscope.

The relevance of status information means that there is a large amount of information redundancy in the original motion status data collected by the sensor, which will reduce the efficiency and accuracy of user behavior recognition. Therefore, in step 220, decorrelation processing is performed on multiple sets of motion state data to improve the efficiency and accuracy of user behavior recognition.

In some embodiments, step 220 further includes the following steps 222 to 224:

In step 222, centralization processing is performed on the multiple sets of motion state data, so that the average value of each state information after processing is 0, and centralization data is obtained.

Specifically, multiple sets of motion state data may form an original data matrix, each row in the original data matrix represents a set of motion state data, and each column identifies one item of state information. For example, the 128 groups of motion state data in Table 1 can form a 128*6 original data matrix D, namely:

Each row of the original data matrix D represents a group of motion state data, and each column corresponds to a piece of state information. The state information corresponding to each column is x-axis acceleration information, y-axis acceleration information, z-axis acceleration information, and x-axis angular velocity information. , Y-axis angular velocity information, z-axis angular velocity information.

For the original data matrix D, find the mean value of each column, and get the mean vector

Centralized data C is the difference between the original data matrix D and the mean vector E(D), namely:

In step 224, a transformation matrix of the centralization data is determined, and the centralization data is multiplied by the transformation matrix to obtain target data, wherein the transformation matrix makes the covariance matrix of the target data Identity matrix, thereby eliminating the relevance of various status information.

According to some embodiments, the transformation matrix of the centralized data is determined using the following steps 2242 to 2246:

In step 2242, the covariance matrix Σ of the centralized data is calculated. Have:

Among them, m is the number of groups of exercise state data. Taking the exercise state data shown in Table 1 as an example, m=128.

Subsequently, in step 2244, the covariance matrix Σ is subjected to singular value decomposition, and the covariance matrix Σ is transformed into the product of the first matrix U, the second matrix S, and the transposed matrix U ^T of the first matrix U. One matrix U is a matrix composed of eigenvectors of the covariance matrix Σ, and the second matrix S is a diagonal matrix composed of eigenvalues of the covariance matrix Σ. That is, the singular value decomposition result of the covariance matrix Σ is:

∑=USU ^T

Subsequently, in step 2246, the product of the matrices U, S ^-1/2 and U ^T is used as the transformation matrix, where ^{the value of each element in S -1/2} is the square root of the element at the corresponding position in the second matrix S reciprocal. That is, the transformation matrix W is:

Returning to step 224, after the transformation matrix W of the centralized data is determined, the centralized data C is multiplied by the transformation matrix W to obtain the target data Z, namely:

Z=CW

It is easy to prove that the covariance matrix of the target data Z

which is:

It can be seen from the above that the covariance matrix of the target data Z is an identity matrix I with diagonal elements of 1, and other elements of 0. That is to say, in the target data Z, the covariance between different state information is 0, thereby eliminating the correlation between various state information.

FIG. 5 shows the correlation (covariance) of various state information after decorrelating the multiple sets of motion state data involved in FIG. 4. In the correlation matrix shown in Figure 5, except for the diagonal element value 1, the element values at other positions are all very small, reaching the order of 10^(-6), 10^(-7), almost 0 , Indicating that the correlation between various status information is eliminated.

Continuing to refer to FIG. 2, in step 220, the multiple sets of motion state data are decorrelated, and after the target data is obtained, step 230 is executed.

In step 230, the user behavior type is determined according to the target data.

The target data obtained in step 220 has a strong correlation with the type of user behavior. In step 230, the user behavior type corresponding to the target data can be determined according to the correlation between the target data and the user behavior type. The user behavior types include, for example, being stationary, walking, running, cycling, taking a bus, taking a subway, taking a small car (such as a taxi, a private car, etc.), but not limited thereto.

According to some embodiments, step 220 and step 230 may be implemented by a trained user behavior recognition model. That is, step 220 includes: inputting the multiple sets of motion state data into a preset user behavior recognition model, so that the user behavior recognition model performs decorrelation processing on the multiple sets of motion state data to obtain target data. Step 230 includes: processing the target data by the user behavior recognition model to determine the user behavior type.

According to some embodiments, the user behavior recognition model is trained by using the motion state data marked with the user behavior type as the training sample. The trained user behavior recognition model takes exercise state data as input, and outputs the user behavior type corresponding to the exercise state data. Specifically, the user behavior recognition model includes a decorrelation module and a classification module. The decorrelation module is configured to perform step 220, that is, decorrelate multiple sets of exercise state data to eliminate items in each group of exercise state data. Relevance of status information to get target data. The classification module is configured to perform step 230, which is to determine the user behavior type according to the target data.

FIG. 6 is a schematic diagram illustrating a user behavior recognition apparatus 600 according to an exemplary embodiment.装置600。 Device 600. As shown in FIG. 6, the device 600 includes a data acquisition module 610, a decorrelation module 620 and a classification module 630.

The data acquisition module 610 is configured to acquire multiple sets of motion state data, and each set of motion state data includes multiple pieces of state information, and the state information is used to indicate the motion state of the mobile device carried by the user.

The decorrelation module 620 is configured to decorrelate the multiple sets of motion state data, so as to eliminate the correlation of various state information in each set of motion state data to obtain target data.

The classification module 630 is configured to determine the user behavior type according to the target data.

The various modules of the device 600 are described in detail below.

According to some embodiments, multiple sets of motion state data are collected by sensors deployed in the mobile device. Correspondingly, the data acquisition module 610 is further configured to acquire multiple sets of status data collected by the sensors deployed in the mobile terminal within a predetermined period of time according to a predetermined frequency.

According to some embodiments, the motion state data includes: acceleration information of the mobile device in multiple directions; and angular velocity information of the mobile device in multiple directions.

According to some embodiments, the decorrelation module 620 is further configured to: perform centralized processing on the multiple sets of motion state data so that the average value of each item of state information after processing is 0 to obtain centralized data; determine the centralized data The transformation matrix of the data is multiplied by the centralization data and the transformation matrix to obtain the target data, wherein the transformation matrix makes the covariance matrix of the target data an identity matrix, thereby eliminating various status information Correlation.

According to some embodiments, the decorrelation module 620 is further configured to: calculate the covariance matrix Σ of the centralized data; perform singular value decomposition on the covariance matrix Σ, and convert the covariance matrix Σ into a first matrix U, The product of the second matrix S and the transposed matrix U ^T of the first matrix U, where the first matrix U is a matrix composed of the eigenvectors of the covariance matrix Σ, and the second matrix S is the eigenvalues of the covariance matrix Σ. Diagonal matrix composed of the matrix; take the product of the matrices U, S ^-1/2 and U ^T as the transformation matrix, where ^{the value of each element in S -1/2} is the reciprocal of the square root of the element at the corresponding position in the second matrix S .

The classification module 630 is configured to determine the user behavior type according to the target data. The user behavior types include, for example, stationary, walking, running, cycling, taking a bus, taking a subway, taking a small car, and so on.

It should be understood that each module of the apparatus 600 shown in FIG. 6 may correspond to each step in the method 200 described with reference to FIG. 2. Therefore, the operations, features, and advantages described above for the method 200 are also applicable to the device 600 and the modules included in it. For the sake of brevity, some operations, features and advantages will not be repeated here.

According to some embodiments, the decorrelation module 620 and the classification module 630 form a user behavior recognition model, and the user behavior recognition model is trained using exercise state data marked with user behavior types as training samples.

FIG. 7 shows a structural block diagram of a user behavior recognition model 700 according to an exemplary embodiment. As shown in FIG. 7, the user behavior recognition model 700 includes a decorrelation module 7100 and a classification module 7200.

The decorrelation module 7100 is configured to decorrelate multiple sets of motion state data to eliminate the correlation of various state information in each set of motion state data to obtain target data.

For details of the features and advantages related to the decorrelation processing of the decorrelation module 7100, please refer to the related description of the aforementioned step 220 and the decorrelation module 620, which will not be repeated here.

According to some embodiments, the decorrelation module 7100 further includes a batch normalization layer (Batch Normalization), which is configured to adjust the mean value and variance of various status information in the target data Z so that the target data Z is most suitable The state of model training improves the efficiency of model training and subsequent classification effects. Mark the adjusted data as Z’, and the processing performed by the batch standardization layer is as follows:

Z′=γZ+β

Among them, γ and β are trainable parameters, and their values are determined by training.

After the decorrelation module 7100 obtains the target data from which the relevance of the state information is eliminated, the target data is input to the classification module 7200 for processing.

The classification module 7200 is configured to process the target data to determine the user behavior type and output it.

According to some embodiments, as shown in FIG. 7, the classification module 7200 includes a plurality of convolution processing units 7220-1 to 7220-N. Each convolution processing unit includes two depthwise separable convolution layers (depthwise separable convolution) 7221 and 7222. Wherein, the convolution stride of the separable convolutional layer at the first depth is 1, and the convolution stride of the separable convolutional layer at the second depth is greater than 1. For example, the convolution step length of the second depth separable convolution layer may be 2 or other values.

Compared with the traditional convolutional layer, the depth separable convolutional layer has greatly reduced the number of parameters, which reduces the complexity of the model, thereby increasing the running speed of the model. The convolution step size setting greater than 1 replaces the pooling layer in the traditional neural network model. Compared with setting the pooling layer, the convolution step size greater than 1 can hardly affect the accuracy of the model and avoid A large amount of information loss caused by pooling reduces the complexity of the model and the risk of overfitting, and improves the running speed of the model.

FIG. 8 shows a structural block diagram of another user behavior recognition model 800 according to an exemplary embodiment. As shown in FIG. 8, the user behavior recognition type 800 includes a decorrelation module 8100 and a classification module 8200.

The decorrelation module 8100 is configured to decorrelate the input multiple sets of motion state data to eliminate the correlation of various state information in each set of motion state data to obtain target data.

For the detailed features and advantages of the decorrelation module 8100, please refer to the related descriptions of the aforementioned step 220, the decorrelation module 620, and the decorrelation module 7100, which will not be repeated here.

For example, the input multiple sets of motion state data form a 128*6 original data matrix (a total of 128 sets of motion state data, each group of motion state data includes 6-dimensional state information), after processing by the decorrelation module 8100, 128*6 In the target data, the correlation between various status information is eliminated in the target data.

The target data output by the decorrelation module 8100 is input to the classification module 8200.

The classification module 8200 is configured to perform a series of processing on the target data, and finally output the user behavior type.

For example, as shown in FIG. 8, the classification module 8200 includes three convolution processing units, and each convolution processing unit includes a first-depth separable convolutional layer and a second-depth separable convolutional layer. Among them, the convolution stride of the separable convolutional layer at the first depth is 1, and the convolution stride of the separable convolutional layer at the second depth is 2. In addition, the kernel size of all depth separable convolutional layers is 5, that is, 5*1, and the number of output channels (filters) is 16. In addition, the depth separable convolutional layers all use the ReLu activation function (the activation layer is not shown in FIG. 8), and the convolution kernel has only weights and no biases.

It should be noted that the various parameters shown above (including the convolution step size of the depth separable convolution layer, the size of the convolution kernel, the number of output channels, the activation function, and whether there is a bias) are only examples. The technical personnel can adjust the parameters of each processing layer of the user behavior recognition model according to the actual situation, and this publication does not limit this.

In the user behavior recognition model shown in FIG. 8, the first-depth separable convolutional layer 1 performs convolution processing on 128*6 target data, and outputs a 128*16 feature map;

The second-depth separable convolutional layer 1 performs convolution processing on the 128*16 feature map output by the first-depth separable convolutional layer 1, and outputs a 64*16 feature map;

The first-depth separable convolutional layer 2 performs convolution processing on the 64*16 feature map output by the second-depth separable convolutional layer 1, and outputs a 64*16 feature map;

The second-depth separable convolutional layer 2 performs convolution processing on the 64*16 feature map output by the first-depth separable convolutional layer 2, and outputs a 32*16 feature map;

The first-depth separable convolutional layer 3 performs convolution processing on the 32*16 feature map output by the second-depth separable convolutional layer 2, and outputs a 32*16 feature map;

The second-depth separable convolutional layer 3 performs convolution processing on the 32*16 feature map output by the first-depth separable convolutional layer 3, and outputs a 16*16 feature map;

The flattening layer flattens the 16*16 feature map output by the second depth separable convolutional layer 3, and outputs a 256-dimensional feature vector;

The fully connected softmax layer fully connects the 256-dimensional feature vector output by the flattening layer, determines the probability that the feature vector belongs to each user behavior type, and outputs the user behavior type with the highest probability as the classification result.

In the model structure shown in FIG. 8, the convolution step size of the second-depth separable convolution layer is 2, which realizes down-sampling of feature data, which replaces the pooling layer in the traditional neural network model. Compared with setting the pooling layer, setting the convolution step size to a value greater than 1 can avoid a large amount of information loss caused by pooling, and reduce model complexity and overtime without affecting the accuracy of the model. The risk of fitting improves the running speed of the model.

In addition, compared with the traditional convolutional layer, the depth separable convolutional layer greatly reduces the number of parameters that need to be trained, which reduces the complexity of the model and improves the running speed of the model.

The following takes the first depth separable convolution layer 1 in FIG. 8 as an example to illustrate the technical effect of the depth separable convolution layer compared with the traditional convolution layer.

The target data Z is a 128*6 data matrix, as shown in Table 2 below:

Table 2

z _1,1 z _1,1	z _1,2 z _1,2	z _1,3 z _1,3	z _1,4 z _1,4	z _1,5 z _1,5	z _1,6 z _1,6
z _2,1 z _2,1	z _2,2 z _2,2	z _2,3 z _2,3	z _2,4 z _2,4	z _2,5 z _2,5	z _2,6 z _2,6
z _3,1 z _3,1	z _3,2 z _3,2	z _3,3 z _3,3	z _3,4 z _3,4	z _3,5 z _3,5	z _3,6 z _3,6
z _4,1 z _4,1	z _4,2 z _4,2	z _4,3 z _4,3	z _4,4 z _4,4	z _4,5 z _4,5	z _4,6 z _4,6
z _5,1 z _5,1	z _5,2 z _5,2	z _5,3 z _5,3	z _5,4 z _5,4	z _5,5 z _5,5	z _5,6 z _5,6
z _6,1 z _6,1	z _6,2 z _6,2	z _6,3 z _6,3	z _6,4 z _6,4	z _6,5 z _6,5	z _6,6 z _6,6
……	……	……	……	……	……
z _128,1 z _128,1	z _128,2 z _128,2	z _128,3 z _128,3	z _128,4 z _128,4	z _128,5 z _128,5	z _128,6 z _128,6

As shown in FIG. 8, the parameters of the first depth separable convolution layer 1 are: the convolution step stride is 1, the kernel size of the convolution kernel is 5, that is, 5*1, and the number of output channels filters is 16.

The processing process of the above-mentioned target data Z in the separable convolutional layer 1 at the first depth is:

Taking each column of status information in the target data Z as an input channel, the number of input channels of the target data Z is 6.

First, set the same number of convolution kernels with a size of 5*1 as the number of input channels, and perform convolution on each input channel separately. That is, six 5*1 convolution kernels are set, and each 5*1 convolution kernel is used to convolve a column of data in the target data Z. After each 5*1 convolution kernel performs convolution processing on a column of data in the target data Z (padding is set), a 128*1 vector will be obtained, that is, a column of data will be obtained. After the 6 convolution kernels complete the convolution processing, 6 columns of 128*1 data will be obtained, that is, a feature map F with a size of 128*6 is obtained, as shown in Table 3 below. In this step, the parameters to be trained for the 6 5*1 convolution kernels are the weights of each position in the convolution kernel, and there are 6*5*1=30 in total (the convolution kernel is not biased in this example).

table 3

f _1,1 f _1,1	f _1,2 f _1,2	f _1,3 f _1,3	f _1,4 f _1,4	f _1,5 f _1,5	f _1,6 f _1,6
f _2,1 f _2,1	f _2,2 f _2,2	f _2,3 f _2,3	f _2,4 f _2,4	f _2,5 f _2,5	f _2,6 f _2,6
f _3,1 f _3,1	f _3,2 f _3,2	f _3,3 f _3,3	f _3,4 f _3,4	f _3,5 f _3,5	f _3,6 f _3,6
f _4,1 f _4,1	f _4,2 f _4,2	f _4,3 f _4,3	f _4,4 f _4,4	f _4,5 f _4,5	f _4,6 f _4,6
f _5,1 f _5,1	f _5,2 f _5,2	f _5,3 f _5,3	f _5,4 f _5,4	f _5,5 f _5,5	f _5,6 f _5,6
f _6,1 f _6,1	f _6,2 f _{6, 2}	f _6,3 f _6,3	f _6,4 f _6,4	f _6,5 f _6,5	f _6,6 f _6,6
……	……	……	……	……	……
f _128,1 f _128,1	f _128,2 f _128,2	f _128,3 f _128,3	f _128,4 f _128,4	f _128,5 f _128,5	f _128,6 f _128,6

Subsequently, a convolution kernel with the same size as the number of output channels of 1*the number of input channels is set, and each row of data of the feature map obtained in the previous step is convolved. That is, 16 convolution kernels with a size of 1*6 are set, and each convolution kernel convolves the feature map F separately. Take a 1*6 convolution kernel [w ₁ , w ₂ , w ₃ , w ₄ , w ₅ , w ₆ ] as an example. The convolution kernel performs convolution processing on each row of the feature map F in turn, After one line is processed, a value p ₁ is obtained, that is, f ₁₁ w ₁ +f ₁₂ w ₂ +f ₁₃ w ₃ +f ₁₄ w ₄ +f ₁₅ w ₅ +f ₁₆ w ₆ =p ₁ . Similarly, the 128 rows of data of the feature map F are convolved to obtain 128 values p ₁ to p ₁₂₈ , and these 128 values form a 128*1 column vector.

Each 1*6 convolution kernel performs convolution processing on the feature map F, and a 128*1 column vector can be obtained. Correspondingly, 16 1*6 convolution kernels perform convolution processing on the feature map F, and 16 128*1 column vectors will be obtained, that is, a 128*16 feature map will be obtained. In this step, the parameters to be trained in the 16 1*6 convolution kernels are the weights of each position in the convolution kernel, and there are 16*1*6=96 in total (the convolution kernel is not biased in this example).

Therefore, in the first depth separable convolutional layer 1, there are a total of 30+96=126 parameters that need to be trained.

If a traditional convolutional layer is used to convolve the 128*6 target data Z, according to the convolution kernel size of 5*1 and the number of output channels set in Figure 8, the processing process is as follows:

Set 16 5*1 convolution kernels, the number of input channels is 6, then each convolution kernel also needs to have 6 channels, that is, the depth of the convolution kernel is 6, which is equivalent to setting 16 5*1*6 The convolution kernel. Each 5*1*6 convolution kernel convolves the 6 input channels of the target data Z to obtain a 128*1 feature vector. Correspondingly, 16 5*1*6 convolution kernels will get 16 A 128*1 feature vector, that is, a 128*16 feature map is obtained. In this process, the parameter to be trained is the weight of each position in the convolution kernel (the convolution kernel is not biased in this example), and there are 16*5*1*6=480 in total.

It can be seen from the above that using a depth separable convolutional layer to convolve the target data Z requires 126 parameters to be trained; while using traditional convolutional layer processing, 480 parameters need to be trained. Compared with the traditional convolutional layer, the depth separable convolutional layer greatly reduces the number of parameters that need to be trained, reduces the complexity of the model, and improves the running speed of the model.

Although specific functions have been discussed above with reference to specific modules, it should be noted that the functions of each module discussed herein may be divided into multiple modules, and/or at least some functions of multiple modules may be combined into a single module. The specific module execution action discussed herein includes the specific module itself performing the action, or alternatively the specific module calls or otherwise accesses another component or module that performs the action (or performs the action in conjunction with the specific module). Therefore, the specific module that performs an action may include the specific module itself that performs the action and/or another module that is called or accessed by the specific module to perform the action. For example, the decorrelation module 620 and the classification module 630 described above may be combined into a single module in some embodiments.

It should also be understood that various techniques may be described herein in the general context of software hardware elements or program modules. The various modules described above with respect to FIG. 6 may be implemented in hardware or in hardware combined with software and/or firmware. For example, these modules may be implemented as computer program codes/instructions configured to be executed in one or more processors and stored in a computer-readable storage medium. Alternatively, these modules can be implemented as hardware logic/circuitry. For example, in some embodiments, one or more of the data acquisition module 610, the decorrelation module 620, and the classification module 630 may be implemented together in a System on Chip (SoC). The SoC may include an integrated circuit chip (which includes a processor (for example, a central processing unit (CPU), a microcontroller, a microprocessor, a digital signal processor (Digital Signal Processor, DSP), etc.), a memory, a Or multiple communication interfaces, and/or one or more components in other circuits), and may optionally execute the received program code and/or include embedded firmware to perform functions.

According to an aspect of the present disclosure, there is provided a computer device including a memory, a processor, and a computer program stored on the memory. The processor is configured to execute a computer program to implement the steps of any method embodiment described above.

According to an aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium having a computer program stored thereon, and the computer program, when executed by a processor, implements the steps of any method embodiment described above.

According to an aspect of the present disclosure, there is provided a computer program product, which includes a computer program that, when executed by a processor, implements the steps of any method embodiment described above.

In the following, illustrative examples of such computer equipment, non-transitory computer-readable storage media, and computer program products are described in conjunction with FIG. 9.

Figure 9 shows an example configuration of a computer device 900 that can be used to implement the methods described herein. For example, the mobile device 110 and/or the server 120 shown in FIG. 1 may include an architecture similar to the computer device 900. The user behavior recognition apparatus 600 described above may also be fully or at least partially implemented by the computer device 900 or similar devices or systems.

The computer device 900 may be a variety of different types of devices, such as a server of a service provider, a device associated with a mobile device, a system on a chip, and/or any other suitable computer device or computing system. Examples of computer equipment 900 include, but are not limited to: desktop computers, server computers, laptops or netbook computers, mobile devices (for example, tablet computers, cellular or other wireless phones (for example, smart phones), notebook computers, mobile stations), Wearable devices (eg, glasses, watches), entertainment devices (eg, entertainment appliances, set-top boxes communicatively coupled to display devices, game consoles), televisions or other display devices, car computers, and so on. Therefore, the computer device 900 can range from full-resource devices with large amounts of memory and processor resources (e.g., personal computers, game consoles) to low-resource devices with limited memory and/or processing resources (e.g., traditional set-top boxes). , Handheld game console).

The computer device 900 may include at least one processor 902, a memory 904, a communication interface(s) 906, a display device 908, and other input/output (I/O) devices capable of communicating with each other, such as through the system bus 914 or other appropriate connections. 910 and one or more mass storage devices 912. When the computer device 900 is implemented as the mobile device 110 in FIG. 1, the computing device 900 further includes a sensor 924 for collecting data on its motion state. The sensor 924 includes, but is not limited to, an acceleration sensor, a gyroscope, and the like.

The processor 902 may be a single processing unit or multiple processing units, and all processing units may include a single or multiple computing units or multiple cores. The processor 902 may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuits, and/or any devices that manipulate signals based on operating instructions. In addition to other capabilities, the processor 902 may be configured to obtain and execute computer-readable instructions stored in the memory 904, the mass storage device 912, or other computer-readable media, such as the program code of the operating system 916, and the application program 918 The program code of 920, the program code of other programs 920, etc.

The memory 904 and the mass storage device 912 are examples of computer-readable storage media for storing instructions, which are executed by the processor 902 to implement the various functions described above. For example, the memory 904 may generally include both volatile memory and non-volatile memory (e.g., RAM, ROM, etc.). In addition, the mass storage device 912 may generally include hard disk drives, solid state drives, removable media, including external and removable drives, memory cards, flash memory, floppy disks, optical disks (such as CDs, DVDs), storage arrays, and network-attached storage. , Storage Area Network, etc. Both the memory 904 and the mass storage device 912 may be collectively referred to herein as a memory or a computer-readable storage medium, and may be non-transitory media capable of storing computer-readable and processor-executable program instructions as computer program codes. The computer program code may be executed by the processor 902 as a specific machine configured to implement the operations and functions described in the examples herein.

Multiple program modules may be stored on the mass storage device 912. These programs include an operating system 916, one or more application programs 918, other programs 920, and program data 922, and they can be loaded into the memory 904 for execution. Examples of such application programs or program modules may include, for example, computer program logic (e.g., computer program code or instructions) for implementing the following components/functions: client application 112 (including data receiving module 610, decorrelation module 620, and classification Module 630), method 200 (including any suitable steps of method 200), and/or additional embodiments described herein.

Although illustrated in FIG. 9 as being stored in the memory 904 of the computer device 900, the

modules

916, 918, 920, and 922 or parts thereof may be implemented using any form of computer readable media that can be accessed by the computer device 900. As used herein, "computer-readable media" includes at least two types of computer-readable media, namely computer storage media and communication media.

Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented by any method or technology for storing information, such as computer-readable instructions, data structures, program modules, or Other data. Computer storage media include but are not limited to RAM, ROM, EEPROM, flash memory or other storage technologies, CD-ROM, digital versatile disk (DVD), or other optical storage devices, magnetic cassettes, tapes, disk storage devices or other magnetic storage devices, Or any other non-transmission medium that can be used to store information for computer equipment to access.

In contrast, a communication medium can specifically implement computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transmission mechanism. Computer storage media as defined herein does not include communication media.

The computer device 900 may also include one or more communication interfaces 906 for exchanging data with other devices, such as through a network, direct connection, etc., as discussed above. Such a communication interface may be one or more of the following: any type of network interface (for example, network interface card (NIC)), wired or wireless (such as IEEE 802.11 wireless LAN (WLAN)) wireless interface, global microwave Access interoperability (Wi-MAX) interface, Ethernet interface, universal serial bus (USB) interface, cellular network interface, Bluetooth ^TM interface, near field communication (NFC) interface, etc. The communication interface 906 can facilitate communication within a variety of network and protocol types, including wired networks (such as LAN, cable, etc.) and wireless networks (such as WLAN, cellular, satellite, etc.), the Internet, and so on. The communication interface 906 may also provide communication with external storage devices (not shown) such as in storage arrays, network-attached storage, storage area networks, and the like.

In some examples, a display device 908 such as a monitor may be included for displaying information and images to the user. The other I/O device 910 may be a device that receives various inputs from the user and provides various outputs to the user, and may include a touch input device, a gesture input device, a camera, a keyboard, a remote control, a mouse, a printer, and an audio input/ Output devices and so on.

Although the present disclosure has been illustrated and described in detail in the accompanying drawings and the foregoing description, such illustration and description should be regarded as illustrative and illustrative, and not restrictive; the present disclosure is not limited to what is disclosed Examples. By studying the drawings, the disclosure, and the appended claims, those skilled in the art can understand and implement modifications to the disclosed embodiments when practicing the claimed subject matter. In the claims, the word "comprising" does not exclude other elements or steps not listed, and the word "a" or "an" does not exclude a plurality. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to benefit.

Although the embodiments or examples of the present disclosure have been described with reference to the accompanying drawings, it should be understood that the above-mentioned methods, systems, and devices are merely exemplary embodiments or examples, and the scope of the present invention is not limited by these embodiments or examples, but It is only limited by the authorized claims and their equivalent scope. Various elements in the embodiments or examples may be omitted or may be replaced by equivalent elements. In addition, the steps may be performed in an order different from that described in the present disclosure. Further, various elements in the embodiments or examples can be combined in various ways. What is important is that as technology evolves, many elements described herein can be replaced by equivalent elements that appear after this disclosure.

Claims

A method for identifying user behaviors, including:

Acquiring multiple sets of exercise state data, each set of exercise state data includes multiple pieces of state information, and the state information is used to indicate the exercise state of the mobile device carried by the user;

Perform decorrelation processing on the multiple sets of exercise state data to eliminate the correlation of various state information in each group of exercise state data to obtain target data; and

The user behavior type is determined according to the target data.
The method according to claim 1, wherein said acquiring multiple sets of motion state data comprises:

Acquire multiple sets of motion state data collected by sensors deployed in the mobile device within a predetermined period of time according to a predetermined frequency.
The method according to claim 1, wherein the exercise status data includes:

Acceleration information of the mobile device in multiple directions; and

Angular velocity information of the mobile device in multiple directions.
The method according to any one of claims 1 to 3, wherein the decorrelation processing on the multiple sets of motion state data comprises:

Perform centralized processing on the multiple sets of exercise state data, so that the average value of each item of state information after processing is 0, to obtain centralized data; and

Determine the transformation matrix of the centralized data, and multiply the centralized data by the transformation matrix to obtain target data,

Wherein, the transformation matrix makes the covariance matrix of the target data an identity matrix, thereby eliminating the correlation of various status information.
The method according to claim 4, wherein the determining the transformation matrix of the centralized data comprises:

Calculate the covariance matrix Σ of the centralized data;

Perform singular value decomposition on the covariance matrix Σ, and transform the covariance matrix Σ into the product of the first matrix U, the second matrix S, and the transposed matrix U T of the first matrix U, where the first matrix U is the covariance matrix U A matrix composed of the eigenvectors of the variance matrix Σ, and the second matrix S is a diagonal matrix composed of the eigenvalues of the covariance matrix Σ; and

The product of the matrices U, S -1/2 and U T is used as the transformation matrix, where the value of each element in S -1/2 is the reciprocal of the square root of the element at the corresponding position in the second matrix S.
The method according to claim 1, wherein the user behavior types include: standing still, walking, running, cycling, taking a bus, taking a subway, and taking a small car.
The method of claim 1, wherein:

The decorrelation processing on the multiple sets of motion state data includes: inputting the multiple sets of motion state data into a preset user behavior recognition model, so that the user behavior recognition model performs de-correlation processing on the multiple sets of motion state data. Related processing,

The determining the user behavior type according to the target data includes: processing the target data by the user behavior recognition model to determine the user behavior type.
8. The method according to claim 7, wherein the user behavior recognition model is trained by using the exercise state data marked with the user behavior type as the training sample.
A user behavior recognition device, including:

A data acquisition module, configured to acquire multiple sets of exercise state data, each set of exercise state data includes multiple pieces of state information, and the state information is used to indicate the exercise state of the mobile device carried by the user;

The decorrelation module is configured to decorrelate the multiple sets of motion state data to eliminate the correlation of various state information in each set of motion state data to obtain target data; and

The classification module is configured to determine the user behavior type according to the target data.
The device according to claim 9, wherein:

The data acquisition module is further configured to acquire multiple sets of status data collected by sensors deployed in the mobile device within a predetermined period of time according to a predetermined frequency.
The device according to claim 9, wherein the exercise state data comprises:

Acceleration information of the mobile device in multiple directions; and

Angular velocity information of the mobile device in multiple directions.
The apparatus according to any one of claims 9-11, wherein the decorrelation module is further configured to:

Perform centralized processing on the multiple sets of exercise state data, so that the average value of each item of state information after processing is 0, to obtain centralized data; and

Determine the transformation matrix of the centralized data, and multiply the centralized data by the transformation matrix to obtain target data,

Wherein, the transformation matrix makes the covariance matrix of the target data an identity matrix, thereby eliminating the correlation of various status information.
The apparatus according to claim 12, wherein the decorrelation module is further configured to:

Calculate the covariance matrix Σ of the centralized data;

Perform singular value decomposition on the covariance matrix Σ, and transform the covariance matrix Σ into the product of the first matrix U, the second matrix S, and the transposed matrix U T of the first matrix U, where the first matrix U is the covariance matrix U A matrix composed of the eigenvectors of the variance matrix Σ, and the second matrix S is a diagonal matrix composed of the eigenvalues of the covariance matrix Σ; and

The product of the matrices U, S -1/2 and U T is used as the transformation matrix, where the value of each element in S -1/2 is the reciprocal of the square root of the element at the corresponding position in the second matrix S.
9. The device according to claim 9, wherein the decorrelation module and the classification module form a user behavior recognition model, and the user behavior recognition model is trained using exercise state data marked with user behavior types as training samples.
The apparatus according to claim 14, wherein the decorrelation module includes a batch normalization layer configured to adjust the mean value and variance of each status data in the target data.
The apparatus according to claim 14, wherein the classification module includes a plurality of convolution processing units, each convolution processing unit includes two depth separable convolution layers, wherein,

The convolution step length of the first depth separable convolution layer is 1;

The input of the second-depth separable convolutional layer is the output of the first-depth separable convolutional layer, and the step size of the second-depth separable convolutional layer is greater than one.
A computer equipment including:

A memory, a processor, and a computer program stored on the memory,

Wherein, the processor is configured to execute the computer program to implement the steps of the method in any one of claims 1 to 8.
A non-transitory computer-readable storage medium having a computer program stored thereon, wherein the computer program implements the steps of any one of claims 1 to 8 when the computer program is executed by a processor.
A computer program product comprising a computer program, wherein when the computer program is executed by a processor, the steps of the method described in any one of claims 1 to 8 are realized.