Disclosure of Invention
Based on the above problems, the optimization problems of low gesture recognition precision and low recognition speed can be solved by exploiting the feature-extraction capability of deep neural networks, combining a multi-granularity information expression with the idea of three-way decisions, and selecting an appropriate granularity.
The invention provides a method for optimizing the human-machine interaction interface of an intelligent cockpit based on three-way decisions, which comprises the following steps:
S1, acquiring a gesture video in the cockpit, and preprocessing the gesture video to obtain static gesture images;
S2, performing segmentation of the gestures and the background in the gesture images to obtain gesture area images;
S3, performing multi-granularity expression on the gesture area image from coarse granularity to fine granularity, and extracting multi-granularity features of the gesture area image by using a convolutional neural network;
S4, calculating, from coarse granularity to fine granularity, the conditional probability of classifying the gesture area image at each granularity into each category, and sequentially completing gesture recognition by means of three-way decisions;
S5, performing semantic conversion on the recognized gesture, and performing the corresponding operation on the human-computer interaction interface according to the gesture recognition result after semantic conversion;
S6, obtaining the optimal granularity by weighted summation, and repeatedly executing steps S3-S5 with the optimal granularity as the finest granularity.
Further, the gesture area image is expressed in a multi-granularity manner from coarse granularity to fine granularity; for the same gesture area image, the multi-granularity information expression is:
A1 ⊆ A2 ⊆ … ⊆ An
wherein Ai denotes the information of the gesture area image at the i-th granularity, A1 denotes the information of the gesture area image at the coarsest granularity, and An denotes the information of the gesture area image at the finest granularity, i.e., the fine granularity includes the coarse granularity; i = 1, 2, …, n, where n denotes the number of granularity levels.
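As an illustrative sketch (not part of the patent's implementation), the coarse-to-fine expression A1 ⊆ … ⊆ An can be mimicked by an average-pooling pyramid, where every coarser level is derived from, and hence contained in, the finer ones:

```python
import numpy as np

def multi_granularity(image: np.ndarray, n: int):
    """Return [A1, ..., An]: A1 is the coarsest level, An the original image.
    Coarser levels are produced by average pooling (hypothetical sketch)."""
    levels = []
    for i in range(n):
        factor = 2 ** (n - 1 - i)          # A1 is pooled the most, An not at all
        h, w = image.shape[0] // factor, image.shape[1] // factor
        pooled = (image[:h * factor, :w * factor]
                  .reshape(h, factor, w, factor)
                  .mean(axis=(1, 3)))      # block-wise average
        levels.append(pooled)
    return levels
```

Because each level is an average of the next finer one, no level contains information absent from the finest level, matching the containment relation above.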
Further, extracting the multi-granularity features of the gesture area image by using the convolutional neural network comprises extracting the multi-granularity image features of the gesture image with convolution kernels of different sizes in the convolutional neural network.
Further, step S4 comprises extracting coarse-grained features from the gesture area image and making a three-way decision; if the classification category of the gesture can be determined, no finer-grained feature extraction or further three-way decision is performed; otherwise, finer-grained features are extracted and a further three-way decision is made, until the classification category of the gesture area image is determined.
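The sequential accept/reject/defer loop of step S4 can be sketched as follows (the helper names and the per-granularity probabilities and thresholds are assumptions for illustration):

```python
def three_way_recognize(granularity_probs, alphas, betas):
    """granularity_probs: per-granularity (class, probability) pairs, coarse to fine.
    Accept (positive region) if prob >= alphas[i]; reject (negative region) if
    prob <= betas[i]; otherwise defer to the next, finer granularity."""
    for i, (cls, p) in enumerate(granularity_probs):
        if p >= alphas[i]:
            return cls, i        # accepted at granularity i
        if p <= betas[i]:
            return None, i       # rejected at granularity i
        # boundary region: defer and extract finer-grained features
    return None, len(granularity_probs) - 1   # no decision reached
```

At the finest granularity the accept and reject thresholds coincide, so the loop always terminates with a definite decision there.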
Further, the step S6 includes obtaining a final human-computer interaction interface optimization result for each granularity by a weighted summation method, so as to determine a granularity at which the gesture has the best human-computer interaction interface optimization effect;
Result=w×Acc+(1-w)×Time
Time=T1+T2
wherein Result is the human-computer interaction interface optimization result of the gesture area image at the given granularity, Acc denotes the gesture recognition accuracy, Time denotes the time spent on the gesture recognition process, w denotes the weight, T1 denotes the time for extracting the multi-granularity features of the gesture area image, and T2 denotes the time for recognizing the gesture.
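The weighted score can be transcribed literally as below (hedged: the source does not specify how Acc and Time are normalized to comparable scales, so both are assumed pre-scaled here):

```python
def optimization_result(acc, t_feature, t_recognize, w=0.5):
    """Result = w * Acc + (1 - w) * Time, with Time = T1 + T2, as stated in
    the text. acc, t_feature (T1) and t_recognize (T2) are assumed to be
    pre-normalized to comparable scales."""
    return w * acc + (1 - w) * (t_feature + t_recognize)
```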
The invention further provides a system for optimizing the human-computer interaction interface of an intelligent cockpit based on three-way decisions, which comprises a camera, a cockpit gesture acquisition module, a gesture image segmentation module, a gesture multi-granularity feature extraction module, a three-way-decision gesture recognition module, a gesture semantic conversion module and an optimal granularity acquisition module, wherein:
the cockpit gesture acquisition module acquires a gesture video in the cockpit through a camera and converts a video frame into a series of static gesture images;
the gesture image segmentation module is used for segmenting the gesture and the background of the gesture image to obtain a gesture area image;
the gesture multi-granularity feature extraction module is used for extracting multi-granularity features of the gesture area image from coarse granularity to fine granularity;
the three-way-decision gesture recognition module is used for making three-way decisions on the gesture area image at each granularity according to the extracted multi-granularity features so as to classify the gestures;
the gesture semantic conversion module is used for performing semantic conversion on the classified gestures;
the optimal granularity acquisition module is used for acquiring optimal granularity and sending the optimal granularity to the multi-granularity feature extraction module.
Further, the gesture multi-granularity feature extraction module comprises a convolutional neural network unit, and extracts the multi-granularity image features of the gesture area image by using convolution kernels of different sizes in the convolutional neural network unit; the multi-granularity information expression is specifically:
A1 ⊆ A2 ⊆ … ⊆ An
wherein Ai denotes the information of the gesture area image at the i-th granularity, A1 denotes the information of the gesture area image at the coarsest granularity, and An denotes the information of the gesture area image at the finest granularity, i.e., the fine granularity includes the coarse granularity; i = 1, 2, …, n, where n denotes the number of granularity levels.
Further, the three-way-decision gesture recognition module makes a three-way decision on the coarse-grained features of the gesture area image; if the classification category of the gesture can be determined, no finer-grained feature extraction or further three-way decision is performed; otherwise, finer-grained features are extracted and a further three-way decision is made, until the classification category of the gesture area image is determined.
Further, the optimal granularity acquisition module acquires a final human-computer interaction interface optimization result of each granularity by adopting a weighted summation mode so as to determine the optimal granularity of the gesture area image;
Result=w×Acc+(1-w)×Time
Time=T1+T2
wherein Result is the human-computer interaction interface optimization result of the gesture area image at the given granularity, Acc denotes the gesture recognition accuracy, Time denotes the time spent on the gesture recognition process, w denotes the weight, T1 denotes the time for extracting the multi-granularity features of the gesture area image, and T2 denotes the time for recognizing the gesture.
The invention has the beneficial effects that:
The invention uses the idea of "progressive computation" in granular computing to construct a multi-granularity information expression for the gesture image, extracts features of the multi-granularity gesture images with a convolutional neural network, recognizes the gesture at each granularity from coarse to fine with the three-way decision method, performs the corresponding semantic conversion on the recognized gesture, and applies the gesture recognition result to HMI interface optimization in the cockpit.
By combining the features of gestures acquired at different granularities with the idea of three-way decisions, the method and the device recognize gestures more accurately and execute the corresponding semantic operations more quickly, thereby reducing the interaction time of the cockpit HMI interface and providing users with a more comfortable interaction experience.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention are described below with reference to the accompanying drawings; obviously, the described embodiments are only some, not all, of the embodiments of the present invention.
To better illustrate the specific implementation steps of the method, the following is illustrated by way of example in conjunction with fig. 1:
Example 1
The invention comprises the following steps:
S1, acquiring a gesture video in the cockpit, and preprocessing the gesture video to obtain static gesture images;
S2, performing segmentation of the gestures and the background in the gesture images to obtain gesture area images;
S3, performing multi-granularity expression on the gesture area image from coarse granularity to fine granularity, and extracting multi-granularity features of the gesture area image by using a convolutional neural network;
S4, calculating, from coarse granularity to fine granularity, the conditional probability of classifying the gesture area image at each granularity into each category, and sequentially completing gesture recognition by means of three-way decisions;
S5, performing semantic conversion on the recognized gesture area image, and operating the human-computer interaction interface according to the gesture recognition result after semantic conversion;
The gesture area image is expressed in a multi-granularity manner from coarse granularity to fine granularity; for the same gesture area image, the multi-granularity information expression is:
A1 ⊆ A2 ⊆ … ⊆ An
wherein Ai denotes the information of the gesture area image at the i-th granularity, A1 denotes the information of the gesture area image at the coarsest granularity, and An denotes the information of the gesture area image at the finest granularity, i.e., the fine granularity includes the coarse granularity; i = 1, 2, …, n, where n denotes the number of granularity levels.
Extracting the multi-granularity features of the gesture area image by using the convolutional neural network comprises extracting the multi-granularity image features of the gesture image with convolution kernels of different sizes in the convolutional neural network; as shown in fig. 2, features at n granularities (from coarse to fine) of the gesture area image are extracted by the convolutional neural network (CNN).
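As a toy illustration of "different convolution kernels yield different granularities" (not the patent's CNN; a real implementation would use a trained deep network), larger averaging kernels produce coarser feature maps and smaller kernels finer ones:

```python
import numpy as np

def conv2d_valid(img, kernel):
    """Naive 'valid' 2D convolution (illustration only, not optimized)."""
    kh, kw = kernel.shape
    h = img.shape[0] - kh + 1
    w = img.shape[1] - kw + 1
    out = np.empty((h, w))
    for r in range(h):
        for c in range(w):
            out[r, c] = (img[r:r + kh, c:c + kw] * kernel).sum()
    return out

def multi_kernel_features(img, kernel_sizes=(7, 5, 3)):
    """Averaging kernels of decreasing size: a toy stand-in for the CNN's
    coarse-to-fine multi-granularity feature maps."""
    return [conv2d_valid(img, np.ones((k, k)) / (k * k)) for k in kernel_sizes]
```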
Further, step S4 comprises making a three-way decision on the coarse-grained features of the gesture area image; if the classification category of the gesture can be determined, no finer-grained feature extraction or further three-way decision is performed; otherwise, finer-grained features are extracted and a further three-way decision is made, until the classification category of the gesture area image is determined.
The flow chart of the three-way decision is shown in fig. 3: from the input data set, the multi-granularity features of the gesture area image are extracted, the conditional probabilities are calculated, and the three-way decisions are made.
The softmax function is selected to calculate the conditional probability; the conditional probability of classifying the gesture x into category j is:
P(j|x) = exp(θj·x) / Σl=1..k exp(θl·x)
wherein l = 1, 2, …, k, k denotes the total number of categories of the gesture area images, and θ is the parameter vector.
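A minimal, numerically stable sketch of this softmax computation (θ is assumed here to be a k×d parameter matrix with one row per category):

```python
import numpy as np

def softmax_probs(theta: np.ndarray, x: np.ndarray) -> np.ndarray:
    """P(j|x) = exp(theta_j . x) / sum over l=1..k of exp(theta_l . x).
    theta: (k, d) parameter matrix; x: (d,) feature vector."""
    scores = theta @ x
    scores = scores - scores.max()   # subtract max for numerical stability
    e = np.exp(scores)
    return e / e.sum()
```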
The three-way decision model uses a group of decision thresholds α, β and γ to partition the gesture objects into a positive region (POS), a boundary region (BND) and a negative region (NEG); the acceptance and rejection rules are applied to the positive and negative regions respectively to directly obtain the gesture recognition result, while a deferred decision is adopted for the boundary region, and the three-way decision is applied again when more information is obtained at a finer granularity.
The expressions for the positive, boundary and negative regions are as follows:
POS(α,β)={x∈U|p(X|[x])≥α}
BND(α,β)={x∈U|β<p(X|[x])<α}
NEG(α,β)={x∈U|p(X|[x])≤β}
where p(X|[x]) is the conditional probability of classification, and [x] is the equivalence class containing x.
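The three regions above translate directly into a small helper (a sketch; α and β are the per-granularity thresholds):

```python
def decision_region(p: float, alpha: float, beta: float) -> str:
    """Assign the conditional probability p(X|[x]) to POS (accept),
    NEG (reject) or BND (defer), per the region definitions."""
    assert 0 <= beta < alpha <= 1
    if p >= alpha:
        return "POS"
    if p <= beta:
        return "NEG"
    return "BND"
```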
The thresholds αi, βi and γi of the three-way decision at the i-th granularity are calculated as follows:
αi = (λPN(i) − λBN(i)) / ((λPN(i) − λBN(i)) + (λBP(i) − λPP(i)))
βi = (λBN(i) − λNN(i)) / ((λBN(i) − λNN(i)) + (λNP(i) − λBP(i)))
γi = (λPN(i) − λNN(i)) / ((λPN(i) − λNN(i)) + (λNP(i) − λPP(i)))
wherein λPP(i), λBP(i) and λNP(i) respectively denote the loss functions of taking the acceptance, deferment and rejection decisions when the gesture x at the i-th granularity belongs to category X, and λPN(i), λBN(i) and λNN(i) respectively denote the loss functions of taking the acceptance, deferment and rejection decisions when the gesture x at the i-th granularity does not belong to category X; the loss functions at each granularity are given by experts according to experience.
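Under the standard loss-minimization derivation of three-way decisions, the thresholds follow directly from the six losses; this is a hedged sketch (the patent only states that the losses are expert-given per granularity, so the reasonable-loss ordering assumed here is not guaranteed by the source):

```python
def three_way_thresholds(l_pp, l_bp, l_np, l_pn, l_bn, l_nn):
    """Compute (alpha, beta, gamma) from the six loss values at one granularity.
    Assumes the usual ordering l_pp <= l_bp < l_np and l_nn <= l_bn < l_pn,
    which guarantees 0 <= beta < gamma < alpha <= 1."""
    alpha = (l_pn - l_bn) / ((l_pn - l_bn) + (l_bp - l_pp))
    beta = (l_bn - l_nn) / ((l_bn - l_nn) + (l_np - l_bp))
    gamma = (l_pn - l_nn) / ((l_pn - l_nn) + (l_np - l_pp))
    return alpha, beta, gamma
```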
The multi-granularity three-way decision thresholds are set so that finer-grained decisions are made only when necessary or beneficial. This provides the basis for setting the three-way decision thresholds at different granularities: the coarser the granularity, the larger the acceptance threshold and the smaller the rejection threshold. With i = 1, 2, …, n−1 denoting the sequence from coarse granularity to fine granularity, the thresholds at different granularities satisfy:
0 ≤ βi < αi ≤ 1, 1 ≤ i < n,
β1 ≤ β2 ≤ … ≤ βi < αi ≤ … ≤ α2 ≤ α1
When i = n (the finest granularity), the three-way decision degenerates into a two-way decision, and the decision threshold is calculated as:
αn = βn = γn = (λPN(n) − λNN(n)) / ((λPN(n) − λNN(n)) + (λNP(n) − λPP(n)))
the three-branch decision is a decision mode conforming to human thinking, and compared with the traditional two-branch decision, a choice of no commitment is added, namely, a third delay decision is adopted when the information is not enough to be accepted or rejected. The two-branch decision making process is quick and simple, but the three-branch decision making is more suitable when the obtained information is insufficient or the obtained information needs a certain cost. The purpose of selecting three decisions for gesture recognition is that time spent for acquiring gesture features of different granularities is different, and for HMI interface operation with high real-time requirement, it is very necessary to consider time cost. In three-branch decision gesture recognition, the key steps are extracting multi-granularity features, and calculating threshold value pairs and conditional probabilities of three-branch decisions.
Example 2
On the basis of steps S1-S5, this embodiment further adds step S6: the optimal granularity is obtained by weighted summation, and steps S3-S5 are repeatedly executed with the optimal granularity as the finest granularity.
The HMI interface optimization design method is shown in fig. 4: a final human-computer interaction interface optimization result is obtained for each granularity by weighted summation so as to determine the optimal granularity of the gesture area image; with the optimal granularity as the finest granularity, the convolutional neural network is used to extract the multi-granularity features of a new gesture and the three-way decisions are made in sequence;
Result=w×Acc+(1-w)×Time
Time=T1+T2
wherein Result is the human-computer interaction interface optimization result of the gesture area image at the given granularity, Acc denotes the gesture recognition accuracy, Time denotes the time spent on the gesture recognition process, w denotes the weight, T1 denotes the time for extracting the multi-granularity features of the gesture area image, and T2 denotes the time for recognizing the gesture.
Compared with embodiment 1, this embodiment saves more time and has lower computational complexity. For example, suppose that in embodiment 1, without using the optimal granularity, extracting features at 5 granularities takes 100 units of time, while it is known that using only 3 granularities gives a recognition effect slightly worse than that of 5 granularities but takes only 40 units of time; considered comprehensively, using granularities 1 to 3 is actually more suitable for practical application than using granularities 1 to 5.
The optimal granularity, once calculated, is used as the finest granularity for subsequent gesture image processing. Features extracted at different granularities carry different amounts of information and therefore yield different recognition results, and fine-grained feature extraction takes more time than coarse-grained extraction; by weighting gesture recognition accuracy against recognition time, the most appropriate granularity can be selected for gesture feature extraction so as to meet the gesture-based HMI interface optimization design target in the cockpit.
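Step S6 as a whole can be sketched as below. One assumption is made explicit: since a larger Result must indicate a better granularity, the raw recognition time is converted here to a speed score in [0, 1] (fastest level scores 1); the patent states only Result = w×Acc + (1−w)×Time and leaves this normalization unspecified.

```python
def best_granularity(stats, w=0.5):
    """stats: list of (accuracy, time) per candidate granularity, coarse to fine.
    Time is mapped to a speed score t_min / t in [0, 1] so that a larger
    weighted Result is better (an assumption, not stated in the source).
    Returns the index of the granularity with the highest Result."""
    t_min = min(t for _, t in stats)
    results = [w * acc + (1 - w) * (t_min / t) for acc, t in stats]
    return max(range(len(stats)), key=results.__getitem__)
```

With the numbers from the example above (3 granularities: accuracy 0.7 in 40 time units; 5 granularities: slightly higher accuracy in 100 units), the faster configuration wins under an even weight.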
The invention further provides a system for optimizing the human-computer interaction interface of an intelligent cockpit based on three-way decisions, which comprises a camera, a cockpit gesture acquisition module, a gesture image segmentation module, a gesture multi-granularity feature extraction module, a three-way-decision gesture recognition module, a gesture semantic conversion module and an optimal granularity acquisition module, wherein:
the cockpit gesture acquisition module acquires a gesture video in the cockpit through a camera and converts a video frame into a series of static gesture images;
the gesture image segmentation module is used for segmenting the gesture and the background of the gesture image to obtain a gesture area image;
the gesture multi-granularity feature extraction module is used for extracting multi-granularity features of the gesture area image from coarse granularity to fine granularity;
the three-way-decision gesture recognition module is used for making three-way decisions on the gesture area image at each granularity according to the extracted multi-granularity features so as to classify the gestures;
the gesture semantic conversion module is used for performing semantic conversion on the classified gestures;
the optimal granularity acquisition module is used for acquiring optimal granularity and sending the optimal granularity to the multi-granularity feature extraction module.
Further, the gesture multi-granularity feature extraction module comprises a convolutional neural network unit, and extracts the multi-granularity image features of the gesture area image by using convolution kernels of different sizes in the convolutional neural network unit; the multi-granularity information expression is specifically:
A1 ⊆ A2 ⊆ … ⊆ An
wherein Ai denotes the information of the gesture area image at the i-th granularity, A1 denotes the information of the gesture area image at the coarsest granularity, and An denotes the information of the gesture area image at the finest granularity, i.e., the fine granularity includes the coarse granularity; i = 1, 2, …, n, where n denotes the number of granularity levels.
Further, the three-way-decision gesture recognition module makes a three-way decision on the coarse-grained features of the gesture area image; if the classification category of the gesture can be determined, no finer-grained feature extraction or further three-way decision is performed; otherwise, finer-grained features are extracted and a further three-way decision is made, until the classification category of the gesture area image is determined.
Further, the optimal granularity acquisition module acquires a final human-computer interaction interface optimization result of each granularity by adopting a weighted summation mode so as to determine the optimal granularity of the gesture area image;
Result=w×Acc+(1-w)×Time
Time=T1+T2
wherein Result is the human-computer interaction interface optimization result of the gesture area image at the given granularity, Acc denotes the gesture recognition accuracy, Time denotes the time spent on the gesture recognition process, w denotes the weight, T1 denotes the time for extracting the multi-granularity features of the gesture area image, and T2 denotes the time for recognizing the gesture.
Those skilled in the art will appreciate that all or part of the steps in the various methods of the above embodiments may be performed by hardware associated with program instructions, and that the program may be stored in a computer-readable storage medium, which may include: ROM, RAM, magnetic or optical disks, and the like.
The above embodiments further illustrate the objects, technical solutions and advantages of the present invention in detail. It should be understood that the above embodiments are only preferred embodiments of the present invention and are not intended to limit it; any modifications, equivalent substitutions, improvements and the like made within the spirit and principles of the present invention shall be included in the protection scope of the present invention.