CN111796926A - Instruction execution method and device, storage medium and electronic equipment


Info

Publication number
CN111796926A
Authority
CN
China
Prior art keywords
user
instruction
data
feature vector
feature
Prior art date
Legal status
Pending
Application number
CN201910282156.6A
Other languages
Chinese (zh)
Inventor
陈仲铭
何明
Current Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN201910282156.6A
Publication of CN111796926A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F 9/5011 Allocation of resources, e.g. of the central processing unit [CPU] to service a request, the resources being hardware resources other than CPUs, servers and terminals
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G06F 18/253 Fusion techniques of extracted features

Abstract

The embodiment of the application discloses an instruction execution method and device, a storage medium, and an electronic device. When a user instruction is received, a first feature vector is generated according to the user instruction; current panoramic data is acquired and a second feature vector is generated according to the panoramic data; the first feature vector and the second feature vector are fused to generate a user feature matrix; an intention label matching the user instruction is obtained according to the user feature matrix and a pre-trained intention prediction model; and the user instruction is executed according to the intention label. By collecting panoramic data, the scheme supplements the context and scene information of the user instruction, so that the user's implicit intention can be understood more accurately and the user instruction can be executed better.

Description

Instruction execution method and device, storage medium and electronic equipment
Technical Field
The application relates to the technical field of terminals, in particular to an instruction execution method, an instruction execution device, a storage medium and electronic equipment.
Background
A user generally controls a terminal to perform corresponding operations through control instructions, such as voice instructions and touch operation instructions. With the development of terminal technology, a terminal is expected to understand the implicit intention behind a control instruction issued by the user, so that the user instruction can be executed better; however, recognizing the user's intention from the voice instruction alone makes it difficult to accurately understand the user's actual needs.
Disclosure of Invention
The embodiment of the application provides an instruction execution method, an instruction execution device, a storage medium and an electronic device, which can improve the accuracy with which a terminal recognizes a user's implicit intention, so that user instructions can be executed more accurately.
In a first aspect, an embodiment of the present application provides an instruction execution method, including:
when a user instruction is received, generating a first feature vector according to the user instruction;
acquiring current panoramic data and generating a second feature vector according to the panoramic data;
fusing the first feature vector and the second feature vector to generate a user feature matrix;
acquiring an intention label matched with the user instruction according to the user feature matrix and a pre-trained intention prediction model;
executing the user instruction according to the intention tag.
In a second aspect, an embodiment of the present application provides an instruction execution apparatus, including:
the signal feature extraction module is used for generating a first feature vector according to a user instruction when the user instruction is received;
the scene feature extraction module is used for acquiring current panoramic data and generating a second feature vector according to the panoramic data;
the feature fusion module is used for carrying out fusion processing on the first feature vector and the second feature vector to generate a user feature matrix;
the intention prediction module is used for acquiring an intention label matched with the user instruction according to the user feature matrix and a pre-trained intention prediction model;
and the instruction execution module is used for executing the user instruction according to the intention label.
In a third aspect, an embodiment of the present application provides a storage medium having a computer program stored thereon; when the computer program runs on a computer, the computer is caused to execute the instruction execution method provided in any embodiment of the present application.
In a fourth aspect, an embodiment of the present application provides an electronic device, including a processor and a memory, where the memory has a computer program, and the processor is configured to execute the instruction execution method provided in any embodiment of the present application by calling the computer program.
According to the technical scheme provided by the embodiment of the application, when a user instruction is received, a first feature vector is generated according to the user instruction; current panoramic data is acquired and a second feature vector is generated according to the panoramic data; the first feature vector and the second feature vector are fused to generate a user feature matrix; the user feature matrix is used as input data of a pre-trained intention prediction model to obtain an intention label matched with the user instruction; and the user instruction is executed according to the intention label.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application, and those skilled in the art can obtain other drawings based on these drawings without creative effort.
Fig. 1 is a schematic view of a panoramic sensing architecture of an instruction execution method according to an embodiment of the present application.
Fig. 2 is a schematic application scenario diagram of an instruction execution method according to an embodiment of the present application.
Fig. 3 is a first flowchart illustrating an instruction execution method according to an embodiment of the present disclosure.
Fig. 4 is a second flowchart illustrating an instruction execution method according to an embodiment of the present application.
Fig. 5 is a third flowchart illustrating an instruction execution method according to an embodiment of the present application.
Fig. 6 is a schematic structural diagram of an instruction execution apparatus according to an embodiment of the present application.
Fig. 7 is a schematic structural diagram of a first electronic device according to an embodiment of the present application.
Fig. 8 is a schematic structural diagram of a second electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. It is to be understood that the embodiments described are only a few embodiments of the present application and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without inventive step, are within the scope of the present application.
The terms "first", "second", and "third", etc. in this application are used to distinguish between different objects and not to describe a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or modules is not limited to only those steps or modules listed, but rather, some embodiments may include other steps or modules not listed or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
Referring to fig. 1, fig. 1 is a schematic view of a panoramic sensing architecture of an instruction execution method according to an embodiment of the present disclosure. The instruction execution method is applied to the electronic equipment. A panoramic perception framework is arranged in the electronic equipment. The panoramic sensing architecture is an integration of hardware and software for implementing the instruction execution method in an electronic device.
The panoramic perception architecture comprises an information perception layer, a data processing layer, a feature extraction layer, a scene modeling layer and an intelligent service layer.
The information perception layer is used for acquiring information of the electronic equipment or information in an external environment. The information-perceiving layer may include a plurality of sensors. For example, the information sensing layer includes a plurality of sensors such as a distance sensor, a magnetic field sensor, a light sensor, an acceleration sensor, a fingerprint sensor, a hall sensor, a position sensor, a gyroscope, an inertial sensor, an attitude sensor, a barometer, and a heart rate sensor.
Among other things, a distance sensor may be used to detect a distance between the electronic device and an external object. The magnetic field sensor may be used to detect magnetic field information of the environment in which the electronic device is located. The light sensor can be used for detecting light information of the environment where the electronic equipment is located. The acceleration sensor may be used to detect acceleration data of the electronic device. The fingerprint sensor may be used to collect fingerprint information of a user. The Hall sensor is a magnetic field sensor manufactured according to the Hall effect, and can be used for realizing automatic control of electronic equipment. The location sensor may be used to detect the geographic location where the electronic device is currently located. Gyroscopes may be used to detect angular velocity of an electronic device in various directions. Inertial sensors may be used to detect motion data of an electronic device. The gesture sensor may be used to sense gesture information of the electronic device. A barometer may be used to detect the barometric pressure of the environment in which the electronic device is located. The heart rate sensor may be used to detect heart rate information of the user.
And the data processing layer is used for processing the data acquired by the information perception layer. For example, the data processing layer may perform data cleaning, data integration, data transformation, data reduction, and the like on the data acquired by the information sensing layer.
The data cleaning refers to cleaning a large amount of data acquired by the information sensing layer to remove invalid data and repeated data. The data integration refers to integrating a plurality of single-dimensional data acquired by the information perception layer into a higher or more abstract dimension so as to comprehensively process the data of the plurality of single dimensions. The data transformation refers to performing data type conversion or format conversion on the data acquired by the information sensing layer so that the transformed data can meet the processing requirement. The data reduction means that the data volume is reduced to the maximum extent on the premise of keeping the original appearance of the data as much as possible.
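As an illustration of these processing steps, the following is a minimal pandas sketch over a table of raw perception-layer records; the column names and the tabular layout are assumptions made for the example, not details from the present disclosure.

    import pandas as pd

    def clean_and_integrate(records: pd.DataFrame) -> pd.DataFrame:
        """Toy cleaning/integration/transformation/reduction pass over raw sensor records."""
        df = records.dropna().drop_duplicates().copy()       # data cleaning: invalid and repeated rows
        # data integration: combine single-dimension columns into a higher-level feature
        df["motion_magnitude"] = (df["acc_x"]**2 + df["acc_y"]**2 + df["acc_z"]**2) ** 0.5
        df["timestamp"] = pd.to_datetime(df["timestamp"])    # data transformation: unify the format
        return df.iloc[::10]                                 # data reduction: keep a downsampled subset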
The characteristic extraction layer is used for extracting characteristics of the data processed by the data processing layer so as to extract the characteristics included in the data. The extracted features may reflect the state of the electronic device itself or the state of the user or the environmental state of the environment in which the electronic device is located, etc.
The feature extraction layer may extract features or process the extracted features by methods such as a filtering method, a wrapping method, or an integration method.
The filtering method filters the extracted features to remove redundant feature data. The wrapping method screens the extracted features. The integration method combines a plurality of feature extraction methods to construct a more efficient and more accurate feature extraction method for extracting features.
The scene modeling layer is used for building a model according to the features extracted by the feature extraction layer, and the obtained model can be used for representing the state of the electronic equipment, the state of a user, the environment state and the like. For example, the scenario modeling layer may construct a key value model, a pattern identification model, a graph model, an entity relation model, an object-oriented model, and the like according to the features extracted by the feature extraction layer.
The intelligent service layer is used for providing intelligent services for the user according to the model constructed by the scene modeling layer. For example, the intelligent service layer can provide basic application services for users, perform system intelligent optimization for electronic equipment, and provide personalized intelligent services for users.
In addition, the panoramic perception architecture may further include a plurality of algorithms, each of which can be used to analyze and process data, and the algorithms may form an algorithm library. For example, the algorithm library may include algorithms such as Markov algorithms, latent Dirichlet allocation, Bayesian classification, support vector machines, K-means clustering, the K-nearest neighbor algorithm, conditional random fields, residual networks, long short-term memory networks, convolutional neural networks, and recurrent neural networks.
Based on the panoramic sensing architecture, the electronic device collects the user's panoramic data through the information perception layer and/or other means, and the data processing layer processes the panoramic data, for example by performing data cleaning and data integration on the acquired panoramic data. Then, the intelligent service layer responds to the user instruction according to the instruction execution method provided in the present application: when a user instruction is received, a first feature vector is generated according to the user instruction; current panoramic data is acquired and a second feature vector is generated according to the panoramic data; the first feature vector and the second feature vector are fused to generate a user feature matrix; the user feature matrix is used as input data of a pre-trained intention prediction model to obtain an intention label matched with the user instruction; and the user instruction is executed according to the intention label.
The execution subject of the instruction execution method may be the instruction execution device provided in the embodiment of the present application, or an electronic device integrated with the instruction execution device; the instruction execution device may be implemented in hardware or software. The electronic device may be a smart phone, a tablet computer, a palmtop computer, a notebook computer, or a desktop computer.
Referring to fig. 2, fig. 2 is a schematic view of an application scenario of the instruction execution method according to the embodiment of the present application, taking as an example that the instruction execution device is integrated in an electronic device. The electronic device may receive a user instruction, such as a voice instruction, a touch instruction, or a holding instruction, and when the user instruction is received, generate a first feature vector according to the user instruction. Then the current panoramic data of the electronic device, such as user information, sensor state data, and usage information of application programs in the electronic device, is collected, and a second feature vector is generated according to the collected panoramic data; the first feature vector and the second feature vector are fused to generate a user feature matrix; an intention label corresponding to the user feature matrix is generated according to the user feature matrix and a pre-trained intention prediction model; and the user instruction is executed according to the intention label.
Referring to fig. 3, fig. 3 is a first flowchart illustrating an instruction execution method according to an embodiment of the present disclosure. The specific flow of the instruction execution method provided by the embodiment of the application may be as follows:
step 101, when a user instruction is received, generating a first feature vector according to the user instruction.
In the present embodiment, the user instruction may be of various types, such as a voice instruction, a touch instruction, or a holding instruction. When the electronic device receives a user instruction, it obtains the signal data corresponding to the user instruction, normalizes the signal data, and generates a corresponding first feature vector that represents the information contained in the user instruction. The first feature vector may be represented as follows:
s1={yi1,yi2,…,yin}
for example, in one embodiment, a voice component, such as a microphone, is disposed in the electronic device, and the electronic device may continuously collect voice data of the user through the microphone. After the electronic device starts the voice recognition function, the user can control the electronic device through a voice instruction, such as instructions of "playing music", "making a call", and the like. For example, when a user instruction is received, the step of generating a first feature vector according to the user instruction includes: when a voice instruction is received, voice data acquired by a voice component is acquired; and generating a semantic feature vector of the voice data according to a pre-trained self-coding recurrent neural network, and taking the semantic feature vector as the first feature vector.
The self-coding (autoencoder) neural network model consists of an encoder and a decoder; the output of the network is trained to equal the input, and the intermediate hidden layer can extract semantic feature vectors from the voice data. In this scheme, a self-coding recurrent neural network is used to extract semantic feature vectors from the voice data, and its input data and output data are both the voice data. When the network is trained, the voice data does not need to be labelled: a large amount of voice data is collected in advance as both the input and the output of the network, and the network determines its parameters through self-learning.
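As an illustration only, the following is a minimal sketch of such a self-coding recurrent network, assuming a PyTorch GRU-based implementation; the cell type and layer sizes are assumptions, and the encoder's final hidden state plays the role of the semantic feature vector.

    import torch
    import torch.nn as nn

    class RecurrentAutoencoder(nn.Module):
        """Sequence autoencoder: reconstructs its input; the hidden state is the semantic feature."""
        def __init__(self, feat_dim=40, hidden_dim=128):
            super().__init__()
            self.encoder = nn.GRU(feat_dim, hidden_dim, batch_first=True)
            self.decoder = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
            self.out = nn.Linear(hidden_dim, feat_dim)

        def forward(self, x):                      # x: (batch, time, feat_dim) speech features
            _, h = self.encoder(x)                 # h: (1, batch, hidden_dim)
            semantic = h.squeeze(0)                # semantic feature vector s1
            dec_in = semantic.unsqueeze(1).repeat(1, x.size(1), 1)
            recon, _ = self.decoder(dec_in)
            return self.out(recon), semantic

    # Self-supervised training: the speech features serve as both input and target, e.g.
    # recon, s1 = model(batch); loss = nn.functional.mse_loss(recon, batch)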
Alternatively, in another embodiment, other feature extraction methods may also be used to obtain the speech feature vector. Step 101, when a user instruction is received, the step of generating a first feature vector according to the user instruction may include:
when a voice instruction is received, voice data acquired by a voice component is acquired;
converting the voice data into a spectrogram according to an audio feature extraction algorithm;
and generating a semantic feature vector of the voice data according to a pre-trained self-coding convolutional neural network and the spectrogram, and taking the semantic feature vector as the first feature vector.
The audio feature extraction algorithm may be an MFCC (Mel-frequency cepstrum coefficient) algorithm or an FFT (fast Fourier transform) algorithm; the voice data is converted into a spectrogram by the audio feature extraction algorithm, and the spectrogram is used as both the input data and the output data of a self-coding convolutional neural network, from which the semantic feature vector is extracted. Similar to the self-coding recurrent neural network, the self-coding convolutional neural network is also an autoencoder; it uses convolutional layers to construct the autoencoder, and its output is trained to be consistent with its input so as to obtain the valuable information in the intermediate hidden layer.
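A minimal sketch of this variant, assuming an FFT-based spectrogram computed with scipy and a small convolutional encoder in PyTorch; the frame sizes and layer shapes are illustrative, and the decoder half of the autoencoder (transposed convolutions reconstructing the spectrogram) is omitted for brevity.

    import numpy as np
    from scipy import signal
    import torch.nn as nn

    def to_spectrogram(waveform: np.ndarray, sample_rate: int = 16000) -> np.ndarray:
        """FFT-based spectrogram of raw audio (an MFCC front end could be used instead)."""
        _, _, spec = signal.spectrogram(waveform, fs=sample_rate, nperseg=400, noverlap=240)
        return np.log1p(spec)                          # shape: (freq_bins, time_frames)

    # Encoder half of the convolutional autoencoder applied to the spectrogram.
    encoder = nn.Sequential(
        nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(32, 64),                             # 64-dimensional semantic feature vector
    )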
The electronic device obtains the semantic feature vector through the above scheme and uses it as the first feature vector.
For another example, in an embodiment, the user command is a touch command; when a user instruction is received, the step of generating a first feature vector according to the user instruction may include: when a touch instruction is received, acquiring touch data acquired by a touch sensor; converting the touch data into the first feature vector.
The electronic device is provided with a touch sensor for detecting touch operations, for example, the touch sensor is arranged below a touch screen, and can detect various touch operations and touch gestures input by a user. When a touch instruction is received, touch data acquired by a touch sensor is acquired, wherein the touch data comprises a touch position and a touch track.
For another example, in an embodiment, the user command is a holding command; when a user instruction is received, the step of generating a first feature vector according to the user instruction may include: when a holding instruction is received, holding data acquired by a holding sensor is acquired; converting the grip data into the first feature vector.
The holding sensors are arranged at positions such as the frame and the rear cover of the electronic device and can detect a holding instruction triggered by the user's holding operation. The holding data may be position information, such as position coordinates. In order to facilitate the subsequent fusion of several feature vectors into a feature matrix, a preset feature-vector length needs to be defined in advance. The electronic device converts the acquired position coordinates into a vector of the preset length; for example, if the preset vector length is 10, position coordinates of length 2 are converted into a first feature vector of length 10 by repeated superposition.
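A minimal numpy sketch of the repeated-superposition conversion described above, using the preset length of 10 from the example; the coordinate values are illustrative.

    import numpy as np

    def to_fixed_length(values, preset_len=10):
        """Repeat a short signal (e.g. a 2-D grip coordinate) until it fills the preset length."""
        values = np.asarray(values, dtype=float)
        repeats = int(np.ceil(preset_len / len(values)))
        return np.tile(values, repeats)[:preset_len]

    s1 = to_fixed_length([120.0, 355.0])   # grip position (x, y) becomes a length-10 first feature vector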
Step 102, acquiring current panoramic data and generating a second feature vector according to the panoramic data.
In the embodiment of the present application, the panoramic data of the electronic device includes, but is not limited to, the following data: terminal status data, user status data, and sensor status data.
The user state data comprises a face image of the user indirectly captured by a camera of the electronic equipment, and user information such as the age, the sex and the like of the user acquired from a user database.
The terminal state data includes the operation mode of the user terminal in each time interval, such as a game mode, an entertainment mode or a video mode. The operation mode of the terminal can be determined according to the category of the currently running application program, and that category can be obtained directly from the classification information of the application program's installation package. The terminal state data may further include the remaining power of the terminal, the display mode, the network state, the screen-off/lock state, and the like.
The sensor status data includes signals collected by various sensors on the electronic device, for example, the following sensors are included on the electronic device: a plurality of sensors such as distance sensor, magnetic field sensor, light sensor, acceleration sensor, fingerprint sensor, hall sensor, position sensor, gyroscope, inertial sensor, gesture sensor, barometer, heart rate sensor. The method comprises the steps of obtaining sensor state data of the electronic equipment when a user instruction is received, or obtaining sensor state data of the electronic equipment for a period of time before the user instruction is received. In some embodiments, the status data of some sensors may be acquired in a targeted manner.
Referring to fig. 4, fig. 4 is a second flowchart illustrating an instruction execution method according to an embodiment of the present disclosure. Step 102, obtaining current panoramic data and generating a second feature vector according to the panoramic data, may include:
Step 1021, acquiring current terminal state data, user state data and sensor state data;
Step 1022, generating a terminal state feature according to the terminal state data, generating a user state feature according to the user state data, and generating a terminal scene feature according to the sensor state data;
Step 1023, fusing the terminal state feature, the user state feature and the terminal scene feature to generate the second feature vector.
In some embodiments, the step of acquiring the user status data may include: calling a camera assembly to capture a face image of a user; identifying the user face image according to a preset convolutional neural network model to generate a user emotion label; and acquiring user information, and taking the user emotion label and the user information as the user state data. The user information may be obtained from a user database, and the user information may include user gender, user age, user preferences, and the like.
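For illustration, a sketch of this step assuming an externally supplied, pre-trained emotion classification network; the label set, input size and preprocessing are assumptions, not details from the present disclosure.

    import torch
    from torchvision import transforms
    from PIL import Image

    EMOTION_LABELS = ["calm", "happy", "sad", "angry"]           # hypothetical label set

    preprocess = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])

    def emotion_label(face_image: Image.Image, emotion_cnn: torch.nn.Module) -> str:
        """Classify a captured face image with a pre-trained CNN supplied by the caller."""
        x = preprocess(face_image).unsqueeze(0)                  # (1, 3, 224, 224)
        with torch.no_grad():
            logits = emotion_cnn(x)
        return EMOTION_LABELS[int(logits.argmax(dim=1))]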
In some embodiments, the step of acquiring current terminal state data includes: determining the program category of the currently running application program; determining the current operation mode of the terminal according to the program type, wherein the operation mode comprises a game mode, an entertainment mode and a video mode; and taking the operation mode as terminal state data.
In summary, the terminal state feature ys1 is generated from the terminal state data. From the sensor state data, for example, the four-dimensional terminal attitude features ys2 to ys5 are obtained from the magnetometer, accelerometer and gyroscope by means of a Kalman filtering algorithm; the air pressure feature ys6 is obtained from the data collected by the barometer; the WIFI connection state ys7 is determined through the network module; positioning is performed using the data collected by the position sensor to obtain the user's current location attribute (such as a shopping mall, home, a company or a park) and generate the feature ys8. Furthermore, the 10-axis information of the magnetometer, acceleration sensor, gyroscope and barometer can be combined, and a filtering algorithm or a principal component analysis algorithm can be used to obtain new multi-dimensional data and generate the corresponding feature ys9. The feature ys10 is generated from the user's emotion label, and the features ys11 to ys13 are generated from the user's gender, age and hobbies. For non-numeric features, index numbers can be established to convert them into numeric representations; for example, for the current terminal operation mode, an index number represents the current mode, such as 1 for the game mode, 2 for the entertainment mode and 3 for the video mode. If the current operation mode is the game mode, the current system state ys1 is set to 1. After all the features are represented numerically, the feature data are fused into one long vector, and the long vector is normalized to obtain the second feature vector s2:
s2={ys1,ys2,…,ysm}
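A sketch of how these numeric and index-encoded features might be concatenated and normalized into the second feature vector; the argument names, the index table and the min-max normalization are assumptions made for the example.

    import numpy as np

    MODE_INDEX = {"game": 1, "entertainment": 2, "video": 3}    # index numbers for a non-numeric feature

    def build_scene_vector(mode, attitude, pressure, wifi_connected, place_idx,
                           pca_feat, emotion_idx, gender_idx, age, hobby_idx):
        """Fuse terminal-state, sensor and user-state features into one long vector s2."""
        raw = np.concatenate([
            [MODE_INDEX[mode]],                                  # ys1: terminal operation mode
            attitude,                                            # ys2..ys5: Kalman-filtered attitude
            [pressure, float(wifi_connected), place_idx],        # ys6..ys8
            [pca_feat, emotion_idx, gender_idx, age, hobby_idx], # ys9..ys13
        ]).astype(float)
        span = raw.max() - raw.min()                             # simple min-max normalization
        return (raw - raw.min()) / span if span > 0 else raw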
Step 103, fusing the first feature vector and the second feature vector to generate a user feature matrix.
The first feature vector s1 and the second feature vector s2 may be superposed as the rows of a matrix to generate the following user feature matrix:

    [ yi1  yi2  …  yin ]
    [ ys1  ys2  …  ysm ]
If the lengths of the first feature vector and the second feature vector are not equal, the shorter vector can be extended by zero padding: if n < m, the first feature vector s1 is zero-padded to length m; if n > m, the second feature vector s2 is zero-padded to length n.
Alternatively, in an alternative embodiment, after the first feature vector and the second feature vector are adjusted to the same length, the first feature vector s1 and the second feature vector s2 are superposed into a matrix; then, in order to generate richer features for subsequent operations, the superposed matrix is flipped to generate the following matrix:

    [ yin  …  yi2  yi1 ]
    [ ysm  …  ys2  ys1 ]
and combining the matrixes before and after the overturning operation to obtain the following matrix as the user characteristic matrix.
Figure BDA0002022020350000102
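A numpy sketch of the fusion described above, assuming the flipping operation reverses the column order of the superposed matrix:

    import numpy as np

    def user_feature_matrix(s1: np.ndarray, s2: np.ndarray) -> np.ndarray:
        """Fuse the instruction vector s1 and the scene vector s2 into the user feature matrix."""
        length = max(len(s1), len(s2))
        s1 = np.pad(s1, (0, length - len(s1)))      # zero-pad the shorter vector
        s2 = np.pad(s2, (0, length - len(s2)))
        stacked = np.stack([s1, s2])                # 2 x length matrix
        flipped = np.flip(stacked, axis=1)          # "flipped" copy with reversed column order
        return np.concatenate([stacked, flipped])   # 4 x length user feature matrix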
Step 104, acquiring an intention label matched with the user instruction according to the user feature matrix and a pre-trained intention prediction model.
In the embodiment of the application, the intention prediction model is a classification model that represents the relationship between the user feature matrix and the intention label. For example, the intention prediction model may be obtained by training a classification algorithm such as a convolutional neural network, a BP (back propagation) neural network or an SVM (support vector machine). Taking a convolutional neural network as an example, sample data of a large number of test users is collected, features are extracted according to steps 101 to 103, and the extracted features are labelled. For example, a part of all users is selected as test users, their user instructions are recorded, and a first feature vector is generated; the panoramic data at the time the user instruction is received is collected, and a second feature vector is generated according to the panoramic data. The response of the electronic device to the user instruction and the operation performed by the user based on that response are then recorded. In an alternative embodiment, the intention label data can be generated automatically from the response of the electronic device to the user instruction and the operation performed by the user based on that response; in other embodiments, labels can be added to the user feature matrix by manual labelling.
Inputting the user characteristic matrix with the intention label into a convolutional neural network for training, wherein the structure and the hyper-parameters of the convolutional neural network can be preset by a user according to needs, and the weight parameters of the network are determined through training to generate an intention prediction model.
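A minimal training sketch with randomly generated placeholder data; a linear SVM (one of the classifiers the text lists) is used here for brevity, although the example in the text trains a convolutional neural network, and the label ids are hypothetical.

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X = rng.random((200, 4 * 16))        # 200 flattened 4x16 user feature matrices (placeholder data)
    y = rng.integers(0, 5, size=200)     # 5 hypothetical intention-label ids

    intent_model = SVC(kernel="linear", probability=True).fit(X, y)

    def predict_intent(feature_matrix: np.ndarray) -> int:
        """Flatten a user feature matrix and return the matched intention-label id."""
        return int(intent_model.predict(feature_matrix.reshape(1, -1))[0])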
In addition, in order to fully represent the user intention, the intention label corresponding to one user feature matrix is an intention label set, the intention label set comprises a plurality of labels capable of representing information such as user instructions, places, time, environments, states, habits and the like, and the labels depict the current panoramic image of the user.
The user feature matrix obtained in step 103 is input into the intention prediction model, and an intention label matched with the user instruction can be generated.
Step 105, executing the user instruction according to the intention label.
The intention label output by the intention prediction model is a set of labels that together form a panoramic picture corresponding to the user instruction. In response to the user instruction, the electronic device starts the corresponding target application and at the same time pushes the label set to the target application.
For example, referring to fig. 5, fig. 5 is a third flowchart illustrating an instruction execution method according to an embodiment of the present application. Step 105, executing the user instruction according to the intention tag comprises:
step 1051, determining a target application corresponding to the user instruction according to the intention tag;
step 1052, launching the target application and sending the intention label to the target application, wherein the target application performs a corresponding operation based on the intention label.
For example, after the user triggers the voice command "xiaohuo, order me" and the voice recognition module of the system detects the command, more data reflecting the user's intention, for example terminal state data, user state data and sensor state data, is obtained through the panoramic sensing technology; these data supplement the context information of the command to obtain more comprehensive intention information, such as the current time, the current location, the user's scene, the user's eating habits, and how hungry the user is. When the electronic device starts the third-party ordering application according to the instruction, this information is pushed to the third-party ordering application, which can then recommend a specific ordering place and ordering content to the user.
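A hypothetical dispatch sketch of this step; the label-to-application mapping and the launch/push callables stand in for platform-specific mechanisms and are not part of any real framework API.

    TARGET_APP = {"order_food": "com.example.takeout"}   # assumed mapping from intention label to app

    def execute_instruction(intent_labels: dict, launch_app, push_to_app):
        """Start the target application and forward the whole label set to it."""
        app_id = TARGET_APP[intent_labels["action"]]
        launch_app(app_id)                               # respond to the user instruction
        push_to_app(app_id, intent_labels)               # e.g. time, place, eating habits, hunger

    # Example:
    # execute_instruction({"action": "order_food", "place": "office", "time": "12:05"},
    #                     launch_app=print, push_to_app=lambda a, labels: print(a, labels))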
As can be seen from the above, according to the instruction execution method provided in the embodiment of the application, when a user instruction is received, a first feature vector is generated according to the user instruction; current panoramic data is acquired and a second feature vector is generated according to the panoramic data; the first feature vector and the second feature vector are fused to generate a user feature matrix; the user feature matrix is then used as input data of a pre-trained intention prediction model to obtain an intention label matched with the user instruction; and the user instruction is executed according to the intention label. By collecting panoramic data, the scheme supplements the context and scene information of the user instruction, so that the user's implicit intention can be understood more accurately and the user instruction can be executed better.
An instruction execution apparatus is also provided in an embodiment. Referring to fig. 6, fig. 6 is a schematic structural diagram of an instruction execution apparatus 400 according to an embodiment of the present disclosure. The instruction execution apparatus 400 is applied to an electronic device, and the instruction execution apparatus 400 includes a first feature extraction module 401, a second feature extraction module 402, a feature fusion module 403, an intention prediction module 404, and an instruction execution module 405, as follows:
the first feature extraction module 401 is configured to, when a user instruction is received, generate a first feature vector according to the user instruction.
In the present embodiment, the user command may be of various types, such as a voice command, a touch command, and a holding command. When a user instruction is received, the first feature extraction module 401 obtains signal data corresponding to the user instruction, performs normalization processing on the signal data, generates a corresponding first feature vector, and uses the first feature vector to represent information included in the user instruction. Wherein the first feature vector may be represented as follows:
s1={yi1,yi2,…,yin}
for example, in one embodiment, a voice component, such as a microphone, is disposed in the electronic device, and the electronic device may continuously collect voice data of the user through the microphone. After the electronic device starts the voice recognition function, the user can control the electronic device through a voice instruction, such as instructions of "playing music", "making a call", and the like. For example, the first feature extraction module 401 is further configured to: when a voice instruction is received, voice data acquired by a voice component is acquired; and generating a semantic feature vector of the voice data according to a pre-trained self-coding recurrent neural network, and taking the semantic feature vector as the first feature vector.
The self-coding (autoencoder) neural network model consists of an encoder and a decoder; the output of the network is trained to equal the input, and the intermediate hidden layer can extract semantic feature vectors from the voice data. In this scheme, a self-coding recurrent neural network is used to extract semantic feature vectors from the voice data, and its input data and output data are both the voice data. When the network is trained, the voice data does not need to be labelled: a large amount of voice data is collected in advance as both the input and the output of the network, and the network determines its parameters through self-learning.
Alternatively, in another embodiment, other feature extraction methods may also be used to obtain the speech feature vector. The first feature extraction module 401 is further configured to: when a voice instruction is received, voice data acquired by a voice component is acquired; converting the voice data into a spectrogram according to an audio feature extraction algorithm; and generating a semantic feature vector of the voice data according to a pre-trained self-coding convolutional neural network and the spectrogram, and taking the semantic feature vector as the first feature vector.
The audio feature extraction algorithm may be an MFCC (Mel-frequency cepstrum coefficient) algorithm or an FFT (fast Fourier transform) algorithm; the voice data is converted into a spectrogram by the audio feature extraction algorithm, and the spectrogram is used as both the input data and the output data of a self-coding convolutional neural network, from which the semantic feature vector is extracted. Similar to the self-coding recurrent neural network, the self-coding convolutional neural network is also an autoencoder; it uses convolutional layers to construct the autoencoder, and its output is trained to be consistent with its input so as to obtain the valuable information in the intermediate hidden layer.
The first feature extraction module 401 obtains the semantic feature vector through the above scheme and uses it as the first feature vector.
For another example, in an embodiment, the user command is a touch command; the first feature extraction module 401 is further configured to: when a touch instruction is received, acquiring touch data acquired by a touch sensor; converting the touch data into the first feature vector.
The electronic device is provided with a touch sensor for detecting touch operations, for example, the touch sensor is arranged below a touch screen, and can detect various touch operations and touch gestures input by a user. When a touch instruction is received, touch data acquired by a touch sensor is acquired, wherein the touch data comprises a touch position and a touch track.
For another example, in an embodiment, the user command is a holding command; the first feature extraction module 401 is further configured to: when a holding instruction is received, holding data acquired by a holding sensor is acquired; converting the grip data into the first feature vector.
The holding sensors are arranged at positions such as the frame and the rear cover of the electronic device and can detect a holding instruction triggered by the user's holding operation. The holding data may be position information, such as position coordinates. In order to facilitate the subsequent fusion of several feature vectors into a feature matrix, a preset feature-vector length needs to be defined in advance. The electronic device converts the acquired position coordinates into a vector of the preset length; for example, if the preset vector length is 10, position coordinates of length 2 are converted into a first feature vector of length 10 by repeated superposition.
A second feature extraction module 402, configured to obtain current panoramic data, and generate a second feature vector according to the panoramic data.
In the embodiment of the present application, the panoramic data of the electronic device includes, but is not limited to, the following data: terminal status data, user status data, and sensor status data. The user state data comprises a face image of the user indirectly captured by a camera of the electronic equipment, and user information such as the age, the sex and the like of the user acquired from a user database.
The terminal state data includes the operation mode of the user terminal in each time interval, such as a game mode, an entertainment mode or a video mode. The operation mode of the terminal can be determined according to the category of the currently running application program, and that category can be obtained directly from the classification information of the application program's installation package. The terminal state data may further include the remaining power of the terminal, the display mode, the network state, the screen-off/lock state, and the like.
The sensor status data includes signals collected by various sensors on the electronic device, for example, the following sensors are included on the electronic device: a plurality of sensors such as distance sensor, magnetic field sensor, light sensor, acceleration sensor, fingerprint sensor, hall sensor, position sensor, gyroscope, inertial sensor, gesture sensor, barometer, heart rate sensor. The method comprises the steps of obtaining sensor state data of the electronic equipment when a user instruction is received, or obtaining sensor state data of the electronic equipment for a period of time before the user instruction is received. In some embodiments, the status data of some sensors may be acquired in a targeted manner.
Optionally, in an embodiment, the second feature extraction module 402 is further configured to: acquiring current terminal state data, user state data and sensor state data; generating terminal state characteristics according to the terminal state data, generating user state characteristics according to the user state data, and generating terminal scene characteristics according to the sensor state data; and fusing the terminal state feature, the user state feature and the terminal scene feature to generate the second feature vector.
Wherein, in some embodiments, the second feature extraction module 402 is further configured to: calling a camera assembly to capture a face image of a user; identifying the user face image according to a preset convolutional neural network model to generate a user emotion label; and acquiring user information, and taking the user emotion label and the user information as the user state data. The user information may be obtained from a user database, and the user information may include user gender, user age, user preferences, and the like.
Wherein, in some embodiments, the second feature extraction module 402 is further configured to: determining the program category of the currently running application program; determining the current operation mode of the terminal according to the program type, wherein the operation mode comprises a game mode, an entertainment mode and a video mode; and taking the operation mode as terminal state data.
In summary, the terminal state feature ys1 is generated from the terminal state data. From the sensor state data, for example, the four-dimensional terminal attitude features ys2 to ys5 are obtained from the magnetometer, accelerometer and gyroscope by means of a Kalman filtering algorithm; the air pressure feature ys6 is obtained from the data collected by the barometer; the WIFI connection state ys7 is determined through the network module; positioning is performed using the data collected by the position sensor to obtain the user's current location attribute (such as a shopping mall, home, a company or a park) and generate the feature ys8. Furthermore, the 10-axis information of the magnetometer, acceleration sensor, gyroscope and barometer can be combined, and a filtering algorithm or a principal component analysis algorithm can be used to obtain new multi-dimensional data and generate the corresponding feature ys9. The feature ys10 is generated from the user's emotion label, and the features ys11 to ys13 are generated from the user's gender, age and hobbies. For non-numeric features, index numbers can be established to convert them into numeric representations; for example, for the current terminal operation mode, an index number represents the current mode, such as 1 for the game mode, 2 for the entertainment mode and 3 for the video mode. If the current operation mode is the game mode, the current system state ys1 is set to 1. After all the features are represented numerically, the feature data are fused into one long vector, and the long vector is normalized to obtain the second feature vector s2:
s2={ys1,ys2,…,ysm}
A feature fusion module 403, configured to perform fusion processing on the first feature vector and the second feature vector to generate a user feature matrix.
The feature fusion module 403 may superpose the first feature vector s1 and the second feature vector s2 as the rows of a matrix to generate the following user feature matrix:

    [ yi1  yi2  …  yin ]
    [ ys1  ys2  …  ysm ]
If the lengths of the first feature vector and the second feature vector are not equal, the shorter vector can be extended by zero padding: if n < m, the first feature vector s1 is zero-padded to length m; if n > m, the second feature vector s2 is zero-padded to length n.
Alternatively, in an alternative embodiment, after the first feature vector and the second feature vector are adjusted to the same length, the feature fusion module 403 superposes the first feature vector s1 and the second feature vector s2 into a matrix; then, in order to generate richer features for subsequent operations, the superposed matrix is flipped to generate the following matrix:

    [ yin  …  yi2  yi1 ]
    [ ysm  …  ys2  ys1 ]
and combining the matrixes before and after the overturning operation to obtain the following matrix as the user characteristic matrix.
Figure BDA0002022020350000161
And the intention predicting module 404 is configured to obtain an intention label matched with the user instruction according to the user feature matrix and a pre-trained intention predicting model.
In the embodiment of the application, the intention prediction model is a classification model that represents the relationship between the user feature matrix and the intention label. For example, the intention prediction model may be obtained by training a classification algorithm such as a convolutional neural network, a BP (back propagation) neural network or an SVM (support vector machine). Taking a convolutional neural network as an example, sample data of a large number of test users is collected, features are extracted according to steps 101 to 103, and the extracted features are labelled. For example, a part of all users is selected as test users, their user instructions are recorded, and a first feature vector is generated; the panoramic data at the time the user instruction is received is collected, and a second feature vector is generated according to the panoramic data. The response of the electronic device to the user instruction and the operation performed by the user based on that response are then recorded. In an alternative embodiment, the intention label data can be generated automatically from the response of the electronic device to the user instruction and the operation performed by the user based on that response; in other embodiments, labels can be added to the user feature matrix by manual labelling.
Inputting the user characteristic matrix with the intention label into a convolutional neural network for training, wherein the structure and the hyper-parameters of the convolutional neural network can be preset by a user according to needs, and the weight parameters of the network are determined through training to generate an intention prediction model.
In addition, in order to fully represent the user intention, the intention label corresponding to one user feature matrix is an intention label set, the intention label set comprises a plurality of labels capable of representing information such as user instructions, places, time, environments, states, habits and the like, and the labels depict the current panoramic image of the user.
The user feature matrix obtained by the feature fusion module 403 is input into the intention prediction model, and an intention label matched with the user instruction can be generated.
An instruction execution module 405, configured to execute the user instruction according to the intention tag.
The intention label acquired by the intention prediction module 404 is a set of labels that together form a panoramic picture corresponding to the user instruction. While responding to the user instruction and starting the corresponding target application, the instruction execution module 405 pushes the label set to the target application, so that after the target application is started, the electronic device does not merely perform the simple operation of starting the software but can also execute more specific operations according to the received label set.
For example, the instruction execution module 405 is further configured to: determining a target application corresponding to the user instruction according to the intention label; and starting the target application and sending the intention label to the target application, wherein the target application executes corresponding operation based on the intention label.
For example, after the user triggers the voice command "xiaohuo, order me" and the voice recognition module of the system detects the command, more data reflecting the user's intention, for example terminal state data, user state data and sensor state data, is obtained through the panoramic sensing technology; these data supplement the context information of the command to obtain more comprehensive intention information, such as the current time, the current location, the user's scene, the user's eating habits, and how hungry the user is. When the electronic device starts the third-party ordering application according to the instruction, this information is pushed to the third-party ordering application, which can then recommend a specific ordering place and ordering content to the user.
As can be seen from the above, in the instruction execution apparatus provided in the embodiment of the application, when a user instruction is received, the first feature extraction module 401 generates a first feature vector according to the user instruction; the second feature extraction module 402 acquires current panoramic data and generates a second feature vector according to the panoramic data; the feature fusion module 403 fuses the first feature vector and the second feature vector to generate a user feature matrix; the intention prediction module 404 uses the user feature matrix as input data of a pre-trained intention prediction model to obtain an intention label matched with the user instruction; and the instruction execution module 405 executes the user instruction according to the intention label. By collecting panoramic data, the scheme supplements the context and scene information of the user instruction, so that the user's implicit intention can be understood more accurately and the user instruction can be executed better.
The embodiment of the application also provides the electronic equipment. The electronic device can be a smart phone, a tablet computer and the like. As shown in fig. 7, fig. 7 is a schematic view of a first structure of an electronic device according to an embodiment of the present application. The electronic device 300 comprises a processor 301 and a memory 302. The processor 301 is electrically connected to the memory 302.
The processor 301 is a control center of the electronic device 300, connects various parts of the entire electronic device using various interfaces and lines, and performs various functions of the electronic device and processes data by running or calling a computer program stored in the memory 302 and calling data stored in the memory 302, thereby performing overall monitoring of the electronic device.
In this embodiment, the processor 301 in the electronic device 300 loads instructions corresponding to one or more processes of the computer program into the memory 302 according to the following steps, and the processor 301 runs the computer program stored in the memory 302, so as to implement various functions:
when a user instruction is received, generating a first feature vector according to the user instruction;
acquiring current panoramic data and generating a second feature vector according to the panoramic data;
fusing the first feature vector and the second feature vector to generate a user feature matrix;
acquiring an intention label matched with the user instruction according to the user feature matrix and a pre-trained intention prediction model;
executing the user instruction according to the intention tag.
In some embodiments, the user instruction is a voice instruction; when a user instruction is received and a first feature vector is generated according to the user instruction, the processor 301 executes the following steps:
when a voice instruction is received, acquiring voice data collected by a voice component;
and generating a semantic feature vector of the voice data according to a pre-trained self-coding recurrent neural network, and taking the semantic feature vector as the first feature vector.
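A minimal sketch of such an encoder is given below, assuming the voice data has already been turned into a sequence of acoustic feature frames and that only the encoder half of an autoencoder-style ("self-coding") recurrent network is kept for feature extraction. The frame dimension, hidden size, and the use of PyTorch are illustrative assumptions, not requirements of the present application.

```python
import torch
import torch.nn as nn


class RecurrentSemanticEncoder(nn.Module):
    """Encoder half of a sequence autoencoder; the decoder (not shown) would
    be trained to reconstruct the input frames."""

    def __init__(self, frame_dim: int = 40, hidden_dim: int = 128):
        super().__init__()
        self.encoder = nn.GRU(frame_dim, hidden_dim, batch_first=True)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, time, frame_dim)
        _, last_hidden = self.encoder(frames)
        return last_hidden.squeeze(0)  # (batch, hidden_dim) semantic vector


# Usage: one utterance of 200 frames with 40-dim features
# yields a 128-dimensional first feature vector.
vector = RecurrentSemanticEncoder()(torch.randn(1, 200, 40))
print(vector.shape)  # torch.Size([1, 128])
```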
In some embodiments, the user instruction is a voice instruction; when a user instruction is received and a first feature vector is generated according to the user instruction, the processor 301 executes the following steps (see the sketch after this list):
when a voice instruction is received, acquiring voice data collected by a voice component;
converting the voice data into a spectrogram according to an audio feature extraction algorithm;
and generating a semantic feature vector of the voice data according to a pre-trained self-coding convolutional neural network and the spectrogram, and taking the semantic feature vector as the first feature vector.
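The spectrogram variant could look like the following sketch, which uses a log-mel spectrogram as the audio feature extraction step and the encoder half of a convolutional autoencoder. The mel parameters, the network layout, and the use of librosa and PyTorch are illustrative assumptions.

```python
import librosa
import numpy as np
import torch
import torch.nn as nn


def voice_to_spectrogram(waveform: np.ndarray, sample_rate: int = 16000) -> np.ndarray:
    """Audio feature extraction step: raw audio -> log-mel spectrogram."""
    mel = librosa.feature.melspectrogram(y=waveform, sr=sample_rate, n_mels=64)
    return librosa.power_to_db(mel)


class ConvSemanticEncoder(nn.Module):
    """Encoder half of a convolutional autoencoder over spectrograms."""

    def __init__(self, out_dim: int = 128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(32, out_dim)

    def forward(self, spec: torch.Tensor) -> torch.Tensor:
        # spec: (batch, 1, n_mels, time)
        return self.fc(self.conv(spec).flatten(1))


waveform = np.random.randn(16000).astype(np.float32)   # 1 s of stand-in audio
spec = torch.tensor(voice_to_spectrogram(waveform), dtype=torch.float32)
first_feature = ConvSemanticEncoder()(spec[None, None])
print(first_feature.shape)  # torch.Size([1, 128])
```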
In some embodiments, the user instruction is a touch instruction; when a user instruction is received and a first feature vector is generated according to the user instruction, the processor 301 executes the following steps (see the sketch after this list):
when a touch instruction is received, acquiring touch data acquired by a touch sensor;
converting the touch data into the first feature vector.
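A minimal sketch of the touch conversion is shown below. The (x, y, pressure, timestamp) layout and the summary statistics chosen here are assumptions made for illustration; the present application only states that the touch data is converted into a feature vector.

```python
import numpy as np

# Each touch sample: (x, y, pressure, timestamp_ms) -- hypothetical layout.
touch_events = np.array([
    [120.0, 540.0, 0.42, 0.0],
    [122.5, 536.0, 0.45, 8.0],
    [125.0, 530.0, 0.47, 16.0],
])


def touch_to_feature_vector(events: np.ndarray) -> np.ndarray:
    deltas = np.diff(events[:, :2], axis=0)                       # movement between samples
    speed = np.linalg.norm(deltas, axis=1) / np.diff(events[:, 3])
    return np.concatenate([
        events[:, :3].mean(axis=0),                               # mean x, y, pressure
        events[:, :3].std(axis=0),                                # spread of x, y, pressure
        [speed.mean(), events[-1, 3] - events[0, 3]],             # average speed, duration
    ])


print(touch_to_feature_vector(touch_events))  # 8-dimensional first feature vector
```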
In some embodiments, the panoramic data includes terminal state data, user state data, and sensor state data; when acquiring current panoramic data and generating a second feature vector according to the panoramic data, the processor 301 performs the following steps (see the sketch after this list):
acquiring current terminal state data, user state data and sensor state data;
generating terminal state features according to the terminal state data, generating user state features according to the user state data, and generating terminal scene features according to the sensor state data;
and fusing the terminal state feature, the user state feature and the terminal scene feature to generate the second feature vector.
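For illustration, if each sub-feature is represented as a small numeric vector, the fusion could be as simple as concatenation. The concrete fields below and the choice of concatenation as the fusion operation are assumptions for this example.

```python
import numpy as np

terminal_state = np.array([0.73, 1.0, 0.0])       # e.g. battery level, screen on, charging
user_state = np.array([1.0, 0.0, 0.0, 28.0])      # e.g. one-hot emotion label + user info
terminal_scene = np.array([310.0, 0.02, 9.8])     # e.g. ambient light, acceleration, gravity

# Fuse the three sub-features into the second feature vector.
second_feature_vector = np.concatenate([terminal_state, user_state, terminal_scene])
print(second_feature_vector.shape)  # (10,)
```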
In some embodiments, when obtaining the current user state data, the processor 301 performs the following steps (see the sketch after this list):
calling a camera assembly to capture a face image of a user;
identifying the user face image according to a preset convolutional neural network model to generate a user emotion label;
and acquiring user information, and taking the user emotion label and the user information as the user state data.
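A minimal sketch of the emotion-labelling step is given below: a small convolutional classifier maps a face crop to one of a few emotion labels. The label set and the network layout are assumptions; the present application only specifies that a preset convolutional neural network model generates a user emotion label from the face image.

```python
import torch
import torch.nn as nn

EMOTIONS = ["neutral", "happy", "sad", "angry"]  # assumed label set


class EmotionCNN(nn.Module):
    """Illustrative preset convolutional model for emotion labelling."""

    def __init__(self, num_classes: int = len(EMOTIONS)):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, face: torch.Tensor) -> str:
        logits = self.classifier(self.features(face).flatten(1))
        return EMOTIONS[int(logits.argmax(dim=1))]


face_crop = torch.rand(1, 3, 64, 64)  # stand-in for the camera capture
print(EmotionCNN()(face_crop))        # e.g. "neutral"
```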
In some embodiments, when the first feature vector and the second feature vector are fused to generate the user feature matrix, the processor 301 performs the following step (a minimal sketch follows):
performing matrix superposition processing on the first feature vector and the second feature vector to generate the user feature matrix.
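The "matrix superposition" can be read as stacking the two vectors row-wise into a matrix. In the sketch below, zero-padding is used to align the two vectors to the same length; that alignment strategy is an assumption for illustration.

```python
import numpy as np


def superpose(v1: np.ndarray, v2: np.ndarray) -> np.ndarray:
    """Pad both vectors to a common width, then stack them as matrix rows."""
    width = max(v1.size, v2.size)
    pad = lambda v: np.pad(v, (0, width - v.size))
    return np.vstack([pad(v1), pad(v2)])  # shape (2, width)


first_feature_vector = np.random.randn(128)   # from the user instruction
second_feature_vector = np.random.randn(10)   # from the panoramic data
user_feature_matrix = superpose(first_feature_vector, second_feature_vector)
print(user_feature_matrix.shape)              # (2, 128)
```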
In some embodiments, when the user instruction is executed according to the intention label, the processor 301 performs the following steps (see the sketch after this list):
determining a target application corresponding to the user instruction according to the intention label;
and starting the target application and sending the intention label to the target application, wherein the target application executes corresponding operation based on the intention label.
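A minimal sketch of this execution step follows. The intention labels, package names, and the start_app helper are hypothetical; the mapping from label to application would in practice be learned or configured on the device.

```python
# Hypothetical mapping from intention label to target application package.
INTENT_TO_APP = {
    "order_food": "com.example.food",
    "navigate": "com.example.maps",
    "play_music": "com.example.music",
}


def start_app(package_name: str, intent_tag: str) -> None:
    """Placeholder for the platform call that launches the target application
    and passes the intention label along."""
    print(f"starting {package_name} with intention label {intent_tag!r}")


def execute_instruction(intent_tag: str) -> None:
    target = INTENT_TO_APP.get(intent_tag)
    if target is None:
        raise ValueError(f"no application registered for label {intent_tag!r}")
    start_app(target, intent_tag)


execute_instruction("order_food")
```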
The memory 302 may be used to store computer programs and data. The computer programs stored in the memory 302 contain instructions executable by the processor and may constitute various functional modules. The processor 301 executes various functional applications and performs data processing by calling the computer programs stored in the memory 302.
In some embodiments, as shown in fig. 8, which is a second schematic structural diagram of an electronic device provided in the embodiments of the present application, the electronic device 300 further includes: a radio frequency circuit 303, a display screen 304, a control circuit 305, an input unit 306, an audio circuit 307, a sensor 308, and a power supply 309. The processor 301 is electrically connected to the radio frequency circuit 303, the display screen 304, the control circuit 305, the input unit 306, the audio circuit 307, the sensor 308, and the power supply 309, respectively.
The radio frequency circuit 303 is used for transceiving radio frequency signals to communicate with a network device or other electronic devices through wireless communication.
The display screen 304 may be used to display information entered by or provided to the user as well as various graphical user interfaces of the electronic device, which may be comprised of images, text, icons, video, and any combination thereof.
The control circuit 305 is electrically connected to the display screen 304, and is used for controlling the display screen 304 to display information.
The input unit 306 may be used to receive input numbers, character information, or user characteristic information (e.g., fingerprint), and to generate keyboard, mouse, joystick, optical, or trackball signal inputs related to user settings and function control. The input unit 306 may include a fingerprint recognition module.
The audio circuit 307 may provide an audio interface between the user and the electronic device through a speaker and a microphone. The audio circuit 307 includes a microphone, which is electrically connected to the processor 301 and is used for receiving voice information input by the user.
The sensor 308 is used to collect external environmental information. The sensor 308 may include one or more of an ambient light sensor, an acceleration sensor, a gyroscope, and the like.
The power supply 309 is used to power the various components of the electronic device 300. In some embodiments, the power supply 309 may be logically coupled to the processor 301 through a power management system, so that charging, discharging, and power consumption management are handled through the power management system.
Although not shown in fig. 8, the electronic device 300 may further include a camera, a bluetooth module, and the like, which are not described in detail herein.
As can be seen from the above, the electronic device provided in the embodiments of the present application may receive a user instruction, such as a voice instruction, a touch instruction, or a holding instruction, and generate a first feature vector according to the user instruction when the instruction is received. The electronic device then collects its current panoramic data, such as user information, sensor state data, and usage information of applications in the electronic device, and generates a second feature vector according to the collected panoramic data. The first feature vector and the second feature vector are then fused to generate a user feature matrix, an intention label corresponding to the user feature matrix is obtained according to the user feature matrix and a pre-trained intention prediction model, and the user instruction is executed according to the intention label.
An embodiment of the present application further provides a storage medium in which a computer program is stored, and when the computer program runs on a computer, the computer is caused to execute the instruction execution method according to any of the above embodiments.
It should be noted that all or part of the steps in the methods of the above embodiments may be implemented by instructing the relevant hardware through a computer program, which may be stored in a computer-readable storage medium. The storage medium may include, but is not limited to: a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and the like.
The instruction execution method, the instruction execution device, the storage medium, and the electronic device provided in the embodiments of the present application are described in detail above. The principle and the implementation of the present application are explained herein by applying specific examples, and the above description of the embodiments is only intended to help understand the method and the core idea of the present application. Meanwhile, for those skilled in the art, there may be variations in the specific embodiments and the application scope according to the idea of the present application. In summary, the content of this specification should not be construed as limiting the present application.

Claims (12)

1. An instruction execution method, comprising:
when a user instruction is received, generating a first feature vector according to the user instruction;
acquiring current panoramic data and generating a second feature vector according to the panoramic data;
fusing the first feature vector and the second feature vector to generate a user feature matrix;
acquiring an intention label matched with the user instruction according to the user feature matrix and a pre-trained intention prediction model;
executing the user instruction according to the intention label.
2. The instruction execution method of claim 1, wherein the user instruction is a voice instruction, and the step of generating a first feature vector according to the user instruction when the user instruction is received comprises:
when a voice instruction is received, acquiring voice data collected by a voice component;
and generating a semantic feature vector of the voice data according to a pre-trained self-coding recurrent neural network, and taking the semantic feature vector as the first feature vector.
3. The instruction execution method of claim 1, wherein the user instruction is a voice instruction, and the step of generating a first feature vector according to the user instruction when the user instruction is received comprises:
when a voice instruction is received, acquiring voice data collected by a voice component;
converting the voice data into a spectrogram according to an audio feature extraction algorithm;
and generating a semantic feature vector of the voice data according to a pre-trained self-coding convolutional neural network and the spectrogram, and taking the semantic feature vector as the first feature vector.
4. The instruction execution method of claim 1, wherein the user instruction is a touch instruction, and the step of generating a first feature vector according to the user instruction when the user instruction is received comprises:
when a touch instruction is received, acquiring touch data acquired by a touch sensor;
converting the touch data into the first feature vector.
5. The instruction execution method of any one of claims 1 to 4, wherein the panoramic data comprises terminal state data, user state data, and sensor state data, and the step of acquiring current panoramic data and generating a second feature vector according to the panoramic data comprises:
acquiring current terminal state data, user state data and sensor state data;
generating terminal state features according to the terminal state data, generating user state features according to the user state data, and generating terminal scene features according to the sensor state data;
and fusing the terminal state feature, the user state feature and the terminal scene feature to generate the second feature vector.
6. The instruction execution method of claim 5 wherein the step of obtaining current user state data comprises:
calling a camera assembly to capture a face image of a user;
identifying the user face image according to a preset convolutional neural network model to generate a user emotion label;
and acquiring user information, and taking the user emotion label and the user information as the user state data.
7. The instruction execution method of any one of claims 1 to 4, wherein the step of performing fusion processing on the first feature vector and the second feature vector to generate a user feature matrix comprises:
and performing matrix superposition processing on the first feature vector and the second feature vector to generate a user feature matrix.
8. The instruction execution method of any one of claims 1 to 4, wherein the step of executing the user instruction according to the intention label comprises:
determining a target application corresponding to the user instruction according to the intention label;
and starting the target application and sending the intention label to the target application, wherein the target application executes corresponding operation based on the intention label.
9. The instruction execution method of any one of claims 1 to 4, wherein the step of acquiring an intention label matched with the user instruction according to the user feature matrix and a pre-trained intention prediction model comprises:
and acquiring an intention label matched with the user instruction according to the user feature matrix and an intention prediction model obtained based on neural network training.
10. An instruction execution apparatus, comprising:
the first feature extraction module is used for generating a first feature vector according to a user instruction when the user instruction is received;
the second feature extraction module is used for acquiring current panoramic data and generating a second feature vector according to the panoramic data;
the feature fusion module is used for carrying out fusion processing on the first feature vector and the second feature vector to generate a user feature matrix;
the intention prediction module is used for acquiring an intention label matched with the user instruction according to the user characteristic matrix and a pre-trained intention prediction model;
and the instruction execution module is used for executing the user instruction according to the intention label.
11. A storage medium having stored thereon a computer program, characterized in that, when the computer program is run on a computer, it causes the computer to execute the instruction execution method according to any one of claims 1 to 9.
12. An electronic device comprising a processor and a memory, the memory storing a computer program, wherein the processor is configured to perform the instruction execution method of any of claims 1 to 9 by invoking the computer program.
CN201910282156.6A 2019-04-09 2019-04-09 Instruction execution method and device, storage medium and electronic equipment Pending CN111796926A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910282156.6A CN111796926A (en) 2019-04-09 2019-04-09 Instruction execution method and device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910282156.6A CN111796926A (en) 2019-04-09 2019-04-09 Instruction execution method and device, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN111796926A true CN111796926A (en) 2020-10-20

Family

ID=72805726

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910282156.6A Pending CN111796926A (en) 2019-04-09 2019-04-09 Instruction execution method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN111796926A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113298321A (en) * 2021-06-22 2021-08-24 深圳市查策网络信息技术有限公司 User intention prediction method based on multi-data fusion
CN113298321B (en) * 2021-06-22 2022-03-11 深圳市查策网络信息技术有限公司 User intention prediction method based on multi-data fusion
WO2023103699A1 (en) * 2021-12-10 2023-06-15 杭州逗酷软件科技有限公司 Interaction method and apparatus, and electronic device and storage medium
CN114301723A (en) * 2021-12-21 2022-04-08 珠海格力电器股份有限公司 Intelligent household control system and method

Similar Documents

Publication Publication Date Title
CN108874967B (en) Dialogue state determining method and device, dialogue system, terminal and storage medium
CN112567457B (en) Voice detection method, prediction model training method, device, equipment and medium
US20180189615A1 (en) Electronic apparatus and method of operating the same
CN108494947B (en) Image sharing method and mobile terminal
CN110209784B (en) Message interaction method, computer device and storage medium
CN111737573A (en) Resource recommendation method, device, equipment and storage medium
CN111930964B (en) Content processing method, device, equipment and storage medium
CN111796926A (en) Instruction execution method and device, storage medium and electronic equipment
CN111656438A (en) Electronic device and control method thereof
CN113515942A (en) Text processing method and device, computer equipment and storage medium
CN111581958A (en) Conversation state determining method and device, computer equipment and storage medium
CN111797854A (en) Scene model establishing method and device, storage medium and electronic equipment
CN111798259A (en) Application recommendation method and device, storage medium and electronic equipment
CN111797851A (en) Feature extraction method and device, storage medium and electronic equipment
CN111800445B (en) Message pushing method and device, storage medium and electronic equipment
CN111798367A (en) Image processing method, image processing device, storage medium and electronic equipment
CN111796925A (en) Method and device for screening algorithm model, storage medium and electronic equipment
CN111797148A (en) Data processing method, data processing device, storage medium and electronic equipment
CN111798019B (en) Intention prediction method, intention prediction device, storage medium and electronic equipment
CN112287070A (en) Method and device for determining upper and lower position relation of words, computer equipment and medium
CN111797849A (en) User activity identification method and device, storage medium and electronic equipment
CN111797873A (en) Scene recognition method and device, storage medium and electronic equipment
CN111816211B (en) Emotion recognition method and device, storage medium and electronic equipment
WO2020207297A1 (en) Information processing method, storage medium, and electronic device
CN114970562A (en) Semantic understanding method, device, medium and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination