CN108256307B - Hybrid enhanced intelligent cognitive method of intelligent business travel motor home - Google Patents
- Publication number: CN108256307B (application CN201810030098.3A)
- Authority
- CN
- China
- Prior art keywords
- driver
- passenger
- gesture
- intelligent
- sojourn
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06F21/32 — User authentication using biometric data, e.g. fingerprints, iris scans or voiceprints
- G06F3/017 — Gesture based interaction, e.g. based on a set of recognized hand gestures
- G06V20/597 — Recognising the driver's state or behaviour, e.g. attention or drowsiness
- G06V40/171 — Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
- G06V40/19 — Eye characteristics, e.g. of the iris; Sensors therefor
- G10L17/00 — Speaker identification or verification techniques
- G10L17/02 — Preprocessing operations, e.g. segment selection; Pattern representation or modelling; Feature selection or extraction
- G10L17/04 — Training, enrolment or model building
Abstract
The invention relates to a hybrid enhanced intelligent cognition method for an intelligent business sojourn motor home, which specifically comprises the following steps: S1: the driver and passengers communicate with the vehicle-mounted electronic equipment, and the dialogue state between the user and the equipment is tracked; S2: the identities of the driver and passengers are authenticated from the voiceprint information collected during tracking; S3: the behavioral intentions of the driver and passengers are analyzed; S4: face recognition is performed on the driver and passengers for identity authentication and fatigue monitoring; S5: gesture recognition is performed on the driver and passengers; S6: an overall analysis and recognition result is obtained. The invention introduces human action and a human cognitive model into the business sojourn motor home to form a stronger form of intelligence, improves the machine's ability to understand and adapt to the internal and external environments of the motor home, completes complex spatio-temporally correlated tasks, and enhances the functional and spatial experience of the motor home.
Description
Technical Field
The invention belongs to the field of hybrid enhanced intelligent cognition within artificial intelligence, and relates to a hybrid enhanced intelligent cognition method for an intelligent business sojourn motor home.
Background
At present, China has entered an era of nationwide self-driving travel. Business sojourn motor-home tourism is increasingly popular with consumers, and the government is vigorously developing the motor-home industry to drive coordinated development of the tourism and automobile industries. Meanwhile, the development of China's automobile industry is also moving into the era of intelligent connected vehicles. The sojourn car is a product combining the intelligent connected automobile with the smart home; it embodies the entry of artificial intelligence into human social life and realizes the deep integration of technology with daily life. In the intelligentization of a business sojourn motor home, the recognition, reasoning and cognition of the information carried by each modality of the on-board system is one of the core problems to be solved.
Disclosure of Invention
In view of the above, the present invention provides a hybrid enhanced intelligent cognition method for an intelligent business sojourn motor home. By constructing a cross-media unified semantic expression of the motor home's multi-dimensional intelligent space, it introduces human actions and a human cognitive model into the motor home to form a stronger form of intelligence, improving the machine's understanding of and adaptation to the motor home's internal and external environments, completing complex spatio-temporally correlated tasks, and enhancing the functional and spatial experience of the motor home.
In order to achieve the purpose, the invention provides the following technical scheme:
A hybrid enhanced intelligent cognition method for an intelligent business sojourn motor home specifically comprises the following steps:
S1: the driver and passengers communicate with the vehicle-mounted electronic equipment, and the dialogue state between the user and the equipment is tracked;
S2: the identities of the driver and passengers are authenticated from the voiceprint information collected during tracking;
S3: the behavioral intentions of the driver and passengers are analyzed;
S4: face recognition is performed on the driver and passengers for identity authentication and fatigue monitoring;
S5: gesture recognition is performed on the driver and passengers;
S6: an overall analysis and recognition result is obtained.
Further, step S1 specifically includes:
s11: converting the voice of the driver and the crew into text data through a voice recognition engine, and performing spelling error correction on the text data;
S12: performing word segmentation on the corrected text data to obtain a word sequence, and obtaining word vectors through Word2Vec;
s13: processing the word vectors through a cascade convolution neural network to obtain a conversation scene type;
s14: constructing a deep reinforcement learning network, and iteratively finishing reinforcement learning of a dialogue state behavior strategy through two independent deep reinforcement learning networks;
s15: constructing a semantic knowledge graph through the triples, and calculating the upper and lower bounds of the association score and the embedding cost of the knowledge atoms in the semantic knowledge graph in real time to obtain a knowledge query result;
s16: and generating corresponding spectral parameters and fundamental frequency by adopting a multi-space probability distribution HMM parameter generation algorithm to generate a smooth acoustic feature sequence, and submitting the acoustic feature sequence to a synthesizer to generate final voice.
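A minimal sketch of steps S12-S13 (word vectors feeding a dialogue-scene classifier): the vocabulary, two-dimensional vectors and scene centroids below are illustrative stand-ins for Word2Vec output and the cascade convolutional network, not values from the patent.

```python
import numpy as np

# Toy stand-ins for Word2Vec vectors (S12) and learned scene prototypes (S13).
WORD_VECS = {
    "navigate": np.array([1.0, 0.0]), "route": np.array([0.9, 0.1]),
    "music":    np.array([0.0, 1.0]), "song":  np.array([0.1, 0.9]),
}
SCENE_CENTROIDS = {"navigation": np.array([0.95, 0.05]),
                   "entertainment": np.array([0.05, 0.95])}

def sentence_vector(words):
    """Average the vectors of known words (a crude S12 encoding)."""
    vecs = [WORD_VECS[w] for w in words if w in WORD_VECS]
    return np.mean(vecs, axis=0) if vecs else np.zeros(2)

def scene_type(words):
    """Nearest-centroid stand-in for the cascade CNN of S13."""
    v = sentence_vector(words)
    return min(SCENE_CENTROIDS,
               key=lambda s: np.linalg.norm(v - SCENE_CENTROIDS[s]))

print(scene_type(["navigate", "route"]))  # navigation
```

In the patent's actual pipeline the word vectors would be fusion-coded and classified by the cascade convolutional neural network rather than by centroid distance.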
Further, step S2 specifically includes:
s21: carrying out pre-emphasis, framing and windowing and end point detection processing on the voice information of the drivers and passengers which is tracked and collected;
s22: fourier transform is carried out on the processed voice information to obtain frequency spectrum energy distribution, a triangular filter bank is adopted to carry out critical band division, and amplitude weighting calculation and discrete cosine transform are carried out to obtain cepstrum coefficients;
S23: inputting the cepstrum coefficients into a universal background model (UBM) for speaker identification to obtain the voiceprint features;
s24: and matching the voiceprint templates and judging whether the voiceprint information corresponds to the voiceprint template.
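Steps S21-S22 describe a standard cepstral-coefficient (MFCC-style) front end. The sketch below implements it with common default constants (25 ms frames, 10 ms hop, 26 triangular mel filters, 13 coefficients) that are assumptions rather than values from the patent, and omits endpoint detection.

```python
import numpy as np

def mfcc_like(signal, sr=16000, n_fft=512, n_filt=26, n_ceps=13):
    """Sketch of S21-S22: pre-emphasis, framing/windowing, FFT power
    spectrum, triangular (mel-spaced) filter bank, DCT -> cepstra."""
    # S21: pre-emphasis
    emph = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # S21: 25 ms frames with 10 ms hop, Hamming window
    flen, hop = int(0.025 * sr), int(0.010 * sr)
    n_frames = 1 + max(0, (len(emph) - flen) // hop)
    frames = np.stack([emph[i * hop:i * hop + flen] for i in range(n_frames)])
    frames = frames * np.hamming(flen)
    # S22: Fourier transform -> spectral energy distribution
    pspec = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # S22: triangular filter bank dividing critical (mel) bands
    mel = np.linspace(0, 2595 * np.log10(1 + (sr / 2) / 700), n_filt + 2)
    hz = 700 * (10 ** (mel / 2595) - 1)
    bins = np.floor((n_fft + 1) * hz / sr).astype(int)
    fbank = np.zeros((n_filt, n_fft // 2 + 1))
    for m in range(1, n_filt + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    # S22: amplitude weighting -> log filter-bank energies
    feat = np.log(pspec @ fbank.T + 1e-10)
    # S22: orthonormal DCT-II -> cepstral coefficients
    basis = np.cos(np.pi / n_filt * np.outer(np.arange(n_filt),
                                             np.arange(n_filt) + 0.5))
    ceps = feat @ basis.T * np.sqrt(2.0 / n_filt)
    ceps[:, 0] /= np.sqrt(2.0)
    return ceps[:, :n_ceps]
```

A one-second 16 kHz signal yields 98 frames of 13 coefficients with these settings.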
Further, step S24 specifically includes:
S241: defining the energy function
E(v, h | θ) = −Σ_{i=1..n} a_i v_i − Σ_{j=1..m} b_j h_j − Σ_{i=1..n} Σ_{j=1..m} v_i W_ij h_j,
where h and v are vectors representing the states of the hidden and visible layers, a and b represent the biases of the visible and hidden layers, v_i and h_j represent the states of the i-th visible-layer node and the j-th hidden-layer node, n and m are the numbers of visible- and hidden-layer nodes, W_ij is the connection weight between the visible and hidden layers, a_i is the i-th visible-layer bias, and b_j is the j-th hidden-layer bias;
S242: given the model θ = {W_ij, a_i, b_j}, obtaining the joint probability distribution of the states (v, h):
P(v, h | θ) = e^{−E(v, h | θ)} / Z(θ), where Z(θ) = Σ_{v,h} e^{−E(v, h | θ)} is the normalization factor;
s243: obtaining two types of high-dimensional Gaussian supervectors of RBM-i-vector by the energy evolution calculation of RBM, and performing channel compensation by adopting linear discriminant analysis;
s244: and performing cosine similarity calculation on the two types of compensated high-dimensional Gaussian supervectors, comparing the two types of compensated high-dimensional Gaussian supervectors with a preset threshold value, and judging whether the voiceprint information corresponds to the voiceprint information or not.
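The RBM energy, joint distribution and cosine-similarity decision of S241-S244 can be sketched directly. The brute-force partition function is only feasible for toy layer sizes, and the 0.8 decision threshold is an illustrative assumption, not a value from the patent.

```python
import numpy as np
from itertools import product

def rbm_energy(v, h, W, a, b):
    """S241: E(v,h|theta) = -a.v - b.h - v.W.h."""
    return -(a @ v) - (b @ h) - (v @ W @ h)

def joint_prob(v, h, W, a, b):
    """S242: P(v,h|theta) = exp(-E)/Z, with Z enumerated over all
    binary states (toy sizes only)."""
    Z = sum(np.exp(-rbm_energy(np.array(vv, float), np.array(hh, float),
                               W, a, b))
            for vv in product([0, 1], repeat=len(v))
            for hh in product([0, 1], repeat=len(h)))
    return np.exp(-rbm_energy(v, h, W, a, b)) / Z

def same_speaker(sv1, sv2, threshold=0.8):
    """S244: cosine similarity of two supervectors vs. a preset
    threshold (the 0.8 default is an assumption)."""
    cos = sv1 @ sv2 / (np.linalg.norm(sv1) * np.linalg.norm(sv2))
    return cos >= threshold
```

With all weights and biases zero, every (v, h) configuration is equally likely, which gives a quick sanity check on the normalization.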
Further, step S3 specifically includes:
s31: constructing a depth cascade network with head and shoulder recognition, and intercepting a series of candidate image blocks according to a preset step length by adopting a multi-scale sliding window for each frame of image to form a sample to be recognized;
s32: inputting a sample to be recognized into a trained head-shoulder/non-head-shoulder recognition model for recognition and classification;
s33: introducing a nonlinear model and an appearance model for correlation analysis;
s34: and detecting the attitude coincidence degree and comparing thresholds.
Further, step S34 specifically includes:
s341: fusing continuously detected passenger posture frames into a complete action;
S342: designing a two-pose fusion rule
f(i, j) = 1 if S_IoU > T_IoU and S_his > T_his and |t_1 − t_2| < ΔT, and f(i, j) = 0 otherwise,
where f(i, j) is the fusion function (1 means the two poses can be fused, 0 means they cannot), S_IoU is the overlap ratio of the two pose detection frames, T_IoU is the overlap-ratio threshold, S_his is the histogram matching score within the two detection boxes, T_his is the histogram matching threshold, t_1 and t_2 are the times of the two poses, and ΔT is the two-pose time-difference threshold.
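The fusion rule of S342 reduces to a few threshold comparisons. The sketch below uses the example threshold values given later in the embodiment (0.5, 35, 25); the IoU helper assumes (x1, y1, x2, y2) boxes, which is a representational assumption.

```python
def iou(box1, box2):
    """Overlap ratio S_IoU of two detection boxes (x1, y1, x2, y2)."""
    x1, y1 = max(box1[0], box2[0]), max(box1[1], box2[1])
    x2, y2 = min(box1[2], box2[2]), min(box1[3], box2[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    a1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    a2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    return inter / (a1 + a2 - inter)

def can_fuse(s_iou, s_his, t1, t2, t_iou=0.5, t_his=35, dt=25):
    """Two-pose fusion rule f(i, j) of S342: fuse (return 1) only when
    overlap, histogram score and time gap all pass their thresholds.
    Defaults follow the embodiment's example values."""
    return 1 if (s_iou > t_iou and s_his > t_his and abs(t1 - t2) < dt) else 0
```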
Further, the identification of the identity of the driver and the passenger in step S4 is specifically as follows:
s401: roughly positioning the human face through radial symmetric transformation;
S402: obtaining the optimal iterative vector from the current point to the target point by supervised descent method (SDM) learning, and establishing a linear regression model between the shape offset Δx = x* − x and the features φ(x) of the current shape x: Δx = R·φ(x) + b;
S403: iterating with the current shape x and the offset Δx to obtain the expected position vector:
x := x + Δx,
where x represents the expected position vector;
S404: constructing the SDM learning target from the true deviation of the i-th point from the actual boundary point:
min Σ_i || Δx_i^k − R_k φ(x_i^k) − b_k ||²,
where k is the number of iterations, x^k is the shape vector at the k-th iteration, x_i^k is the coordinate of the i-th point in the shape vector, Δx_i^k is the coordinate offset of the i-th point, and b_k is the bias at the k-th iteration;
S405: accurately locating the face organs by the integro-differential operator
max_{(a,b,r)} | G_σ(r) * ∂/∂r ∮_{(a,b,r)} I(x, y) / (2πr) ds |,
where G_σ(r) is a smoothing function, I(x, y) is the image gray-level matrix, (a, b) is the circle center, and r is the radius;
s406: fitting the human face organs in the target area with the characteristic points to obtain characteristic point mark positions;
S407: intercepting a sub-image in the neighborhood of each feature point to obtain the neighborhood features of the face organs, and concatenating the neighborhood features of all feature points to form the extreme-learning feature of the feature points;
S408: collecting the extreme-learning features of each image partition as the training set of a generalized single-hidden-layer feedforward neural network, training an extreme learning machine, retrieving the identity label of the specific person matching the fused features, and completing identity recognition;
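The cascaded-regression update of S402-S403 can be sketched generically. Here the feature function φ and the per-stage regressor/bias pairs (R_k, b_k) are supplied by the caller as placeholders; the patent learns them by linear regression on training data.

```python
import numpy as np

def sdm_align(x0, features, regressors):
    """Sketch of S402-S403: cascaded regression x := x + R_k*phi(x) + b_k.
    `features` maps a shape vector to its feature vector phi(x);
    `regressors` is a list of learned (R_k, b_k) pairs."""
    x = np.asarray(x0, dtype=float)
    for R, b in regressors:
        dx = R @ features(x) + b   # predicted shape offset delta-x (S402)
        x = x + dx                 # S403: step toward the target shape
    return x
```

For example, with φ(x) = x* − x and R_k = 0.5·I, each stage halves the remaining error, so the iterate converges geometrically to the target shape x*.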
the fatigue monitoring of the driver and the passengers in the step S4 is specifically as follows:
S411: building a 3D face model of the driver using a 3D face modeling method, and tracking the driver's head pose in real time according to the face recognition method of S406-S408;
s412: solving the positions of the eyes in the 2D face image by using the positions of the eyes and the head posture in the 3D face model;
S413: locating feature points in the eye region with a constrained local model (CLM) point-detection algorithm, and verifying the located feature points using face-image texture normalization;
s414: positioning the center of the iris according to the physiological structure characteristics of the iris;
s415: positioning the upper eyelid and the lower eyelid according to the parameterized template, and extracting the eye movement of the driver and the crew;
s416: respectively extracting fatigue characteristics related to the opening and closing degree of the eyelids, the opening and closing speed of the eyes and the motion characteristics of the iris according to the eye movement, and comparing the fatigue characteristics with the characteristics in the waking state to obtain variation characteristics;
s417: and analyzing the relevance among all fatigue characteristics by adopting a Bayesian network classifier to complete the fatigue monitoring of the drivers and passengers.
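A minimal sketch of the variation features of S416: eyelid openness is normalized by the waking-state baseline, and a PERCLOS-style closure fraction is computed. The 20% openness cut-off and the two summary features are assumptions for illustration, not values specified by the patent.

```python
import numpy as np

def fatigue_variation(eyelid_openness, waking_baseline):
    """Compare per-frame eyelid openness (0 = closed, 1 = fully open)
    against the waking-state baseline to obtain variation features."""
    openness = np.asarray(eyelid_openness, dtype=float)
    rel = openness / waking_baseline              # normalize by waking state
    closed = rel < 0.2                            # "mostly closed" frames
    return {
        "mean_openness_ratio": float(rel.mean()), # deviation from baseline
        "closure_fraction": float(closed.mean()), # PERCLOS-style fraction
    }
```

In the patent, such variation features would then feed the Bayesian network classifier of S417 together with eye-opening speed and iris-motion features.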
Further, step S5 specifically includes:
S51: collecting gesture images of the driver and passengers and converting them into an image sequence I_rgb;
S52: converting the image sequence I_rgb into a grayscale image sequence I_gray and a binary skin-tone image sequence I_skin;
S53: calculating motion parameters from the grayscale image sequence I_gray and the binary skin-tone image sequence I_skin as the inter-frame motion features;
S54: normalizing the duration of the gesture motion and constructing the probability function
p(x_{i,j} | λ) = 1 / (√(2π) σ_{i,j}) · exp( −(x_{i,j} − μ_{i,j})² / (2 σ_{i,j}²) ),
where i denotes the i-th state, j denotes the j-th characteristic parameter, x_{i,j} is the time-normalized motion and shape feature of the gesture sequence, λ denotes the gesture class, μ is the mathematical-expectation matrix of the characteristic parameters and σ the standard deviation, μ_{i,j} being the expectation and σ_{i,j} the standard deviation of the j-th characteristic parameter of the i-th state;
S55: constructing the probability function of the complete gesture-sequence observation
P(X | λ) = Π_{i=1..m} Π_{j=1..n} p(x_{i,j} | λ),
where X is the observation of the complete gesture sequence, m is the total number of gesture states, and n is the total number of gesture features;
S56: for all gesture classes, calculating −ln P(X | λ); the class attaining the minimum value is the recognized gesture category.
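The classification rule of S54-S56 (independent per-state, per-feature Gaussians, minimum negative log-likelihood) can be sketched directly. The `models` mapping from class name to (μ, σ) arrays is an assumed interface for illustration.

```python
import numpy as np

def gesture_class(X, models):
    """Sketch of S54-S56: X is an (m states, n features) observation;
    `models` maps gesture class -> (mu, sigma) arrays of the same shape.
    The class minimizing -ln P(X|lambda) (i.e. maximizing the product of
    per-cell Gaussian densities) is returned."""
    X = np.asarray(X, dtype=float)
    def neg_log_lik(mu, sigma):
        # -ln of the product over i, j of N(x_ij; mu_ij, sigma_ij)
        return float(np.sum(np.log(np.sqrt(2 * np.pi) * sigma)
                            + (X - mu) ** 2 / (2 * sigma ** 2)))
    return min(models, key=lambda g: neg_log_lik(*models[g]))
```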
The invention has the following beneficial effects. The invention is a hybrid enhanced intelligent cognition technology for the business sojourn motor home based on deep learning.
First, to meet the intelligent voice-interaction needs of the driver and passengers with respect to in-vehicle electronics, infotainment and similar facilities, this patent designs a human-vehicle multi-party dialogue model that realizes intelligent voice communication between occupants and mobile devices. The module specifically comprises a voice data acquisition layer, a preprocessing layer, a semantic analysis layer, a dialogue management layer, a knowledge reasoning layer and a voice output layer. The voice data are analyzed and processed layer by layer to achieve voice communication between occupants and the in-vehicle system.
Second, identity recognition and fatigue detection for drivers. Driver identity recognition comprises voiceprint recognition and face recognition, performed respectively through an algorithm-driven model and a feedforward neural network. Recognizing the occupants safeguards the motor home and the property inside it.
Third, driver fatigue detection to ensure driving safety is essential among the motor home's in-vehicle functions. Fatigue detection extracts head-pose and eye-movement information, computes feature values, compares them with those of the waking state to obtain variation features, and judges fatigue from the magnitude of those variations.
Fourth, this patent also contributes innovative ideas for behavior analysis and gesture recognition of drivers and passengers. The behavioral intention of the occupants is analyzed, to assist driving, by training and designing a deep cascade network with a nonlinear motion model and an appearance model. Gesture recognition forms part of in-vehicle human-vehicle interaction; its purpose is to simplify the driver's operating burden by extracting and analyzing hand motion and color information through a multi-pose model to interact with the driver and meet the driver's needs. The methods proposed in this patent enrich the actual experience of drivers and passengers while ensuring property and driving safety.
Drawings
In order to make the object, technical scheme and beneficial effect of the invention more clear, the invention provides the following drawings for explanation:
FIG. 1 is a multi-turn dialogue model based on a POMDP strategy and a ternary knowledge graph according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of modeling an overall factor based on a constrained Boltzmann machine according to an embodiment of the present invention;
FIG. 3 is a diagram of a three-level deep cascade network according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating multi-target tracking for online learning of non-linear motion patterns and robust appearance models in accordance with an embodiment of the present invention;
FIG. 5 is a diagram of a face feature extraction area and a positioning effect in an embodiment of the present invention;
FIG. 6 is a monitoring diagram of fatigue status of drivers and passengers according to the embodiment of the invention;
FIG. 7 is a diagram of gesture feature spatiotemporal representations according to an embodiment of the present invention.
Detailed Description
Preferred embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
The invention comprises the following steps:
1. Human-vehicle intelligent voice interaction technology. Around the requirement for intelligent human-vehicle voice interaction between the driver and passengers and devices such as the vehicle-mounted electronics, a human-vehicle multi-turn dialogue model based on a POMDP strategy and a ternary knowledge graph is designed using dialogue-state tracking and management techniques, realizing smooth communication between occupants and the vehicle-mounted devices. As shown in fig. 1, the specific steps are as follows:
a) a data acquisition layer: and converting the user voice into text data through a voice recognition engine, and completing the spelling error correction of the characters.
b) A pretreatment layer: and performing Word segmentation processing on the corrected text data to obtain a Word sequence, completing part-of-speech tagging, entity naming, common-finger disambiguation and relationship dependence in vocabulary and semantics, and acquiring Word vectors by means of Word2 Vec.
c) A semantic analysis layer: and submitting the word vectors subjected to fusion coding to a cascade convolution neural network to complete primary semantic analysis and obtain the conversation scene type.
d) Dialogue management layer: a dialogue-problem guidance strategy is designed in the POMDP model to realize dialogue state tracking, and reinforcement learning of the dialogue state-behavior strategy is completed by constructing a deep Q-network (DQN) and iterating two independent Q-networks.
e) Knowledge reasoning layer: the semantic knowledge graph is constructed by constructing triples, the upper and lower bounds of knowledge atom association scores and embedding costs in the knowledge graph are calculated in real time under the condition that indexes are not adopted, the knowledge query result of Top-k is deduced, corresponding score functions are respectively designed on the basis of determining a single scene knowledge atom combination set and a cross-scene knowledge atom combination set, training of the cross-boundary combination set is driven by combining a multi-column convolution network, and cross-boundary knowledge fusion scores are calculated.
f) A voice output layer: the method comprises the steps of performing text analysis on an input text, generating corresponding spectral parameters and fundamental frequency by adopting a multi-space probability distribution HMM parameter generation algorithm to generate a smooth acoustic feature sequence, and submitting the acoustic feature sequence to a synthesizer to generate final voice.
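The reinforcement learning of the dialogue management layer (d) can be illustrated, in grossly simplified tabular form, by a standard Q-update. The states, actions and rewards below are invented placeholders; the patent's design uses two deep Q-networks over POMDP dialogue states rather than a table.

```python
import random

# Tabular Q-learning stand-in for the DQN dialogue policy of layer (d).
STATES = ["need_slot", "slot_filled"]
ACTIONS = ["ask", "answer"]

def train_policy(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    random.seed(seed)
    Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for _ in range(episodes):
        s = "need_slot"
        while s != "done":
            a = random.choice(ACTIONS)  # exploratory behavior policy
            if s == "need_slot":        # asking fills the missing slot
                r, s2 = (1.0, "slot_filled") if a == "ask" else (-1.0, "need_slot")
            else:                       # answering ends the dialogue turn
                r, s2 = (1.0, "done") if a == "answer" else (-1.0, "slot_filled")
            nxt = 0.0 if s2 == "done" else max(Q[(s2, b)] for b in ACTIONS)
            Q[(s, a)] += alpha * (r + gamma * nxt - Q[(s, a)])  # Q-update
            s = s2
    return Q

Q = train_policy()  # greedy policy: ask first, then answer
```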
2. Identifying the driver by voiceprint. As shown in fig. 2, around the identity-authentication requirement for occupants in the intelligent security field of business sojourn motor homes, i-vector feature extraction is replaced by restricted-Boltzmann-machine feature extraction under the total-variability factor, a UBM model driven by the EM algorithm is designed, and voiceprint recognition under a high-dimensional Gaussian-component representation is realized.
a) And collecting and processing the voice fragments. The collected voice segments are processed by pre-emphasis, framing and windowing and end point detection. Fourier transform is carried out on the signals to obtain frequency spectrum energy distribution, a triangular filter bank is adopted to carry out critical band division, and amplitude weighting calculation and discrete cosine transform are carried out to obtain a cepstrum coefficient (MFCC).
b) Voiceprint features are obtained. And submitting the cepstrum coefficient to a UBM model trained by an EM algorithm to obtain probability scores of the voiceprint characteristics, and performing template matching with corresponding Gaussian components.
c) Matching the voiceprint templates. A restricted Boltzmann machine (RBM) consisting of a visible layer of n nodes and a hidden layer of m nodes is designed, and its energy function is defined as
E(v, h | θ) = −Σ_i a_i v_i − Σ_j b_j h_j − Σ_i Σ_j v_i W_ij h_j,
where the vectors h and v represent the states of the hidden and visible layers respectively, a and b represent the biases of the visible and hidden layers, and v_i and h_j represent the states of the i-th visible-layer node and the j-th hidden-layer node. Given the model θ = {W_ij, a_i, b_j}, the joint probability distribution of the states (v, h) is P(v, h | θ) = e^{−E(v, h | θ)} / Z(θ), where Z(θ) = Σ_{v,h} e^{−E(v, h | θ)} is the normalization factor.
d) And judging the speaker. And obtaining two types of high-dimensional Gaussian supervectors of the RBM-i-vector by the energy evolution calculation of the RBM, and performing channel compensation by adopting Linear Discriminant Analysis (LDA). And performing cosine similarity calculation on the two compensated RBM-i-vectors, and comparing the two compensated RBM-i-vectors with a preset threshold value, thereby judging the attribution of the voiceprint to a specific speaker.
3. And analyzing the behavior of the driver. Around the analysis requirement of the driver and passenger intentions in the intelligent behavior interaction field of the commercial motor caravan, a deep cascade network with a head and shoulder recognition function is designed by adopting nonlinear motion mode learning and appearance model multi-instance learning, and the analysis of the driver and passenger behavior intentions driven by a hierarchical association multi-target tracking learning strategy is realized.
a) And constructing a deep cascade network screening sample. Constructing a depth cascade network (HsNet) with head and shoulder identification, and intercepting a series of candidate blocks (Patch) according to a preset step length by adopting a multi-scale sliding window for each frame of image to form a sample to be identified; the samples are sent to a pre-trained head-shoulder/non-head-shoulder recognition model HsNet and a three-level CNN cascade network, and classified as shown in figure 3. In the specific classification process, the Patch judged as a negative sample is directly abandoned, and the rest samples continue to enter the next stage of the network for more strict identification and classification, so that three stages of CNN network classification and identification are sequentially carried out; the output result of the third level of the network is used for judging whether the image Patch belongs to a head and shoulder area or not, and the height of the head and shoulder frame is expanded to be 3 times of that of the original corresponding sliding window to obtain a whole body frame detected by the passenger; and for the same passenger, a plurality of detection frames are formed, and finally, redundant detection frames are removed by using a non-maximum suppression strategy, and only one most possible detection frame-passenger detection identification result is reserved at each position.
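The final non-maximum-suppression step of (a), which keeps only the most probable passenger detection frame at each position, can be sketched as follows; boxes are assumed to be (x1, y1, x2, y2) tuples, and the 0.5 IoU threshold is a common default assumed here.

```python
def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression: keep indices of highest-scoring
    boxes, discarding any box overlapping a kept box too strongly."""
    def iou(b1, b2):
        x1, y1 = max(b1[0], b2[0]), max(b1[1], b2[1])
        x2, y2 = min(b1[2], b2[2]), min(b1[3], b2[3])
        inter = max(0, x2 - x1) * max(0, y2 - y1)
        a1 = (b1[2] - b1[0]) * (b1[3] - b1[1])
        a2 = (b2[2] - b2[0]) * (b2[3] - b2[1])
        return inter / (a1 + a2 - inter)
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in keep):
            keep.append(i)
    return keep
```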
b) Introducing a nonlinear model and an appearance model for association analysis. Nonlinear motion-pattern learning and appearance-model multi-instance learning are introduced into the hierarchical-association multi-target tracking learning strategy: detected objects are first reliably associated at the low level to form track segments, and the track segments are then connected using online nonlinear motion-pattern learning and appearance-model multi-instance learning to obtain reliable object tracks. Parameters such as speed, direction and distance extracted from an object's motion track serve as features, and several features are combined into higher-level semantics describing the object's behavior, from which the driver's or passenger's behavior intention is judged.
c) Pose overlap detection and threshold comparison. As shown in fig. 4, to improve the robustness of behavior detection, continuously detected passenger pose frames are fused into a complete action. The two-pose fusion rule is designed as

f(i, j) = 1 if S_IoU > T_IoU and S_his > T_his and Δt < T_Δ, otherwise f(i, j) = 0,

where f(i, j) is the fusion function (1 indicates the two poses can be fused, 0 that they cannot), S_IoU is the overlap ratio of the two pose detection frames, T_IoU = 0.5 is the overlap-ratio threshold, S_his is the histogram matching score within the two detection frames, T_his = 35 is the histogram-matching threshold, Δt is the time difference of the two poses, and T_Δ = 25 is the two-pose time-difference threshold.
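A minimal sketch of the fusion rule using the thresholds quoted above. Reading the rule as a conjunction of the three tests is our interpretation, since the original formula image is missing from this text:

```python
def can_fuse(s_iou, s_his, dt, t_iou=0.5, t_his=35, t_delta=25):
    """Two-pose fusion rule f(i, j).

    s_iou: overlap ratio of the two pose detection frames
    s_his: histogram matching score of the two detection frames
    dt:    time difference between the two poses
    Returns 1 if the two poses can be fused into one action, else 0.
    """
    return 1 if (s_iou > t_iou and s_his > t_his and dt < t_delta) else 0
```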
4. Driver and passenger face recognition and fatigue monitoring. To meet the needs of driver identity authentication and fatigue-state monitoring in intelligent safety for commercial motor homes, a supervised gradient descent algorithm and a CLM positioning algorithm are respectively adopted: a generalized single-hidden-layer feedforward neural network based on an extreme learning machine authenticates the identity of a specific driver, and a matching template based on face 3D modeling monitors the driver's fatigue state.
a) Identity authentication based on face features. The face is coarsely located by radial symmetry transform, and supervised descent method (SDM) learning obtains the optimal iterative vector from the current point to the target point: a linear regression model is established between the shape offset Δx = x* − x and the features φ(x) of the current shape x, and the desired position vector is obtained by iterating x := x + Δx. The SDM learning objective is constructed as

argmin_{R_k, b_k} Σ_i || Δx_i^{k*} − R_k φ_i^k − b_k ||²,

where k is the number of iterations, x_k is the shape vector at the k-th iteration, x_i^k is the coordinate of the i-th point in the shape vector, Δx_i^{k*} = x_i^* − x_i^k is its offset to the target, φ_i^k are the local features at that point, and R_k, b_k are the regression matrix and bias learned at step k. Repeated iterative learning yields the true deviation of the i-th point from the actual boundary point. An integro-differential operator (Daugman's form)

max_{(r, a, b)} | G_σ(r) * ∂/∂r ∮_{(r, a, b)} I(x, y) / (2πr) ds |

is then adopted to accurately locate the eyes, nose and mouth in the face, where G_σ(r) is a smoothing function, I(x, y) is the image gray matrix, (a, b) is the circle center, and r is the radius. The face localization result is shown in fig. 5.
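The SDM inference loop x := x + Δx described above can be sketched as follows. The feature extractor and the learned regressors (R_k, b_k) are placeholders; their training on annotated faces is not shown:

```python
import numpy as np

def sdm_align(x0, features, regressors):
    """Supervised Descent Method inference.

    x0:         initial shape vector (flattened landmark coordinates)
    features:   callable returning the feature vector phi(x) at a shape
                (in practice e.g. local appearance descriptors)
    regressors: list of learned (R_k, b_k) pairs, applied in order
    """
    x = np.asarray(x0, dtype=float)
    for R, b in regressors:
        phi = features(x)      # features of the current shape
        x = x + R @ phi + b    # x := x + Δx, with Δx = R_k φ + b_k
    return x
```

With a toy regressor that exactly predicts the offset to a known target, one iteration recovers the target shape.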
A generalized single-hidden-layer feedforward neural network is designed for face recognition; the network is invariant to monotone gray-level change and angular rotation, and insensitive to image change caused by uneven illumination. In the recognition process, face organs in the target region are fitted with feature points to obtain the feature-point mark positions. A sub-image is intercepted in the neighborhood of each feature point to acquire the adjacent features of the face organs, and the adjacent features of all feature points are finally concatenated to form the feature-point extreme learning features. The extreme learning features of each image partition are separately counted as the training set of the generalized single-hidden-layer feedforward neural network, several extreme learning machines are trained, their outputs are combined, and, driven by the output decision of the optimal integrated classifier, the identity label of the specific person matching the fused features is retrieved to complete identity authentication. The face extreme-learning-feature extraction result is shown in fig. 5.
b) Fatigue monitoring based on face features. 3D face modeling of the driver is realized with a 3D face-modeling method, and the driver's head pose is tracked in real time in combination with the face recognition method above. The eye positions in the 2D face image are indirectly solved from the eye positions and head pose in the 3D face model; feature points in the eye regions are located with a CLM algorithm, and the localization is verified by texture normalization of the face image. The iris center is located through the physiological structure characteristics of the iris, overcoming the imaging differences of the iris under different illumination conditions. The upper and lower eyelids are located with a parameterized template to extract the driver's eye movements. From these eye-movement characteristics, fatigue features related to eyelid opening degree, eye opening/closing speed and iris movement are extracted and compared with the corresponding values in the waking state to obtain variation features. A Bayesian network classifier is constructed to analyze the correlation among the fatigue features and complete the judgment of the driver's fatigue state, as shown in FIG. 6.
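As an illustration of eyelid-based fatigue cues compared against a waking baseline, the following toy sketch computes a PERCLOS-style closed-eye fraction and a variation feature. The 0.3 closed-eye threshold and the feature names are assumptions, and the Bayesian-network classifier that consumes such cues is not reproduced:

```python
def fatigue_features(openness_seq, awake_mean, closed_thresh=0.3):
    """Toy fatigue cues from a per-frame eyelid-openness sequence.

    openness_seq: per-frame eyelid openness, 0 = closed .. 1 = fully open
    awake_mean:   mean openness measured in the waking state (baseline)
    """
    n = len(openness_seq)
    # Fraction of frames with the eye essentially closed (PERCLOS-like)
    perclos = sum(o < closed_thresh for o in openness_seq) / n
    mean_open = sum(openness_seq) / n
    # Variation feature: drop in openness relative to the waking baseline
    delta_vs_awake = awake_mean - mean_open
    return {"perclos": perclos, "delta_open": delta_vs_awake}
```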
5. Driver and passenger gesture recognition. To meet the need for typical human-vehicle gesture interaction between the driver or passengers and facilities such as vehicle-mounted electronics and entertainment, a multi-state Gaussian probability model under a complex background is designed; hand segmentation and gesture recognition combine the hand's motion information with its color information, as shown in FIG. 7.
a) Image conversion and processing. The captured color image sequence I_rgb is, on the one hand, converted into a 256-level gray image sequence I_gray for motion-parameter analysis; on the other hand, according to the distribution of RGB colors in HSI space, it is converted into a binary skin-color signal image sequence I_skin, divided into skin-color and non-skin-color regions.
b) Feature extraction and image fusion. The gray image sequence I_gray is processed to obtain a coarse binary motion image sequence I_mov. An AND operation between corresponding images of I_mov and I_skin then yields a binary skin-motion region image sequence I_mov-skin, whose foreground is the moving skin region. Because I_mov-skin does not necessarily contain the complete hand region, a seed algorithm is designed to find it. First, assuming the hand's motion region lies in I_mov-skin, the largest connected domain B is found in I_mov-skin by the seed algorithm according to region connectivity and taken as part of the hand. Then, connected domain B is mapped to the same position in I_skin and the seed algorithm is applied there, growing within I_skin to obtain the complete hand-region image sequence I_hand. Shape features of the hand region are extracted from I_hand and, combining I_gray and I_hand, the motion parameters between the hand regions of two adjacent frames are calculated as the inter-frame motion features.
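The seed-algorithm step — finding the largest connected domain B in I_mov-skin — can be sketched with a flood fill. 4-connectivity is assumed; the subsequent growth inside I_skin works the same way, seeded from B's pixels:

```python
from collections import deque

def largest_component(mask):
    """Largest 4-connected region of a binary mask (list of 0/1 rows).

    A minimal stand-in for the patent's seed algorithm: returns the
    pixel set of the biggest connected domain.
    """
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    best = set()
    for sy in range(h):
        for sx in range(w):
            if mask[sy][sx] and not seen[sy][sx]:
                comp, q = set(), deque([(sy, sx)])
                seen[sy][sx] = True
                while q:  # breadth-first flood fill from the seed pixel
                    y, x = q.popleft()
                    comp.add((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            q.append((ny, nx))
                if len(comp) > len(best):
                    best = comp
    return best
```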
c) Gesture motion and shape feature extraction. Let L denote the time length of the gesture, s[t] the shape feature of the t-th frame, and m[t] the motion feature between frames t and t+1. An 8-dimensional feature vector f[t] = [m[t], s[t]^T] uniformly describes the apparent features of the gesture, forming a time-scale-invariant feature sequence and realizing time-scale-invariant feature extraction and matching. The spatiotemporal apparent feature of a gesture is constructed as A = [f[0], f[1], …, f[L−2]]^T, describing the change of f[t] over time. After normalizing the gesture time length L, the probability of observing the j-th characteristic parameter x_{i,j} in the i-th state is constructed as

b_{i,j}(x_{i,j}) = (1 / (√(2π) σ_{i,j})) · exp(−(x_{i,j} − μ_{i,j})² / (2σ_{i,j}²)),

where x_{i,j} represents the motion and shape features of the time-normalized gesture sequence, λ denotes any gesture-type model, μ is the mathematical-expectation matrix of the characteristic parameters, and σ is the standard deviation. For the gesture model λ(μ, σ), the probability of the complete gesture sequence observation X can then be constructed as

P(X | λ) = Π_{i=1}^{m} Π_{j=1}^{n} b_{i,j}(x_{i,j}).

For each gesture to be recognized, −ln P(X | λ) is calculated for every gesture model; the gesture category with the smallest value is the attributed category.
Finally, it is noted that the above-mentioned preferred embodiments illustrate rather than limit the invention, and that, although the invention has been described in detail with reference to the above-mentioned preferred embodiments, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the scope of the invention as defined by the appended claims.
Claims (7)
1. A hybrid enhanced intelligent cognitive method of an intelligent business sojourn motor home is characterized by comprising the following steps: the method specifically comprises the following steps:
s1: the driver and passengers converse with the vehicle-mounted electronic equipment, and the dialogue state between the user and the equipment is tracked;
s2: carrying out identity authentication on the driver and the passenger according to the voiceprint information of the driver and the passenger which is tracked and collected;
s3: analyzing the behavior intention of the driver and the passenger;
s4: carrying out face recognition on a driver and a passenger to carry out identity authentication and fatigue monitoring on the driver and the passenger;
s5: performing gesture recognition on the driver and the passengers;
s6: comprehensively obtaining an analysis and identification result;
step S1 specifically includes:
s11: converting the voice of the driver and the crew into text data through a voice recognition engine, and performing spelling error correction on the text data;
s12: performing word segmentation on the corrected text data to obtain a word sequence, and obtaining word vectors through Word2Vec;
s13: processing the word vectors through a cascade convolution neural network to obtain a conversation scene type;
s14: constructing a deep reinforcement learning network, and iteratively finishing reinforcement learning of a dialogue state behavior strategy through two independent deep reinforcement learning networks;
s15: constructing a semantic knowledge graph through the triples, and calculating the upper and lower bounds of the association score and the embedding cost of the knowledge atoms in the semantic knowledge graph in real time to obtain a knowledge query result;
s16: and generating corresponding spectral parameters and fundamental frequency by adopting a multi-space probability distribution HMM parameter generation algorithm to generate a smooth acoustic feature sequence, and submitting the acoustic feature sequence to a synthesizer to generate final voice.
2. The hybrid enhanced intelligent cognitive method of the intelligent business sojourn caravan according to claim 1, wherein: step S2 specifically includes:
s21: carrying out pre-emphasis, framing and windowing and end point detection processing on the voice information of the drivers and passengers which is tracked and collected;
s22: fourier transform is carried out on the processed voice information to obtain frequency spectrum energy distribution, a triangular filter bank is adopted to carry out critical band division, and amplitude weighting calculation and discrete cosine transform are carried out to obtain cepstrum coefficients;
s23: inputting the cepstrum coefficient into a speaker identification UBM model to obtain voiceprint characteristics;
s24: and matching the voiceprint templates and judging whether the voiceprint information corresponds to the voiceprint template.
3. The hybrid enhanced intelligent cognitive method of the intelligent business sojourn caravan according to claim 2, wherein: step S24 specifically includes:
s241: defining the energy function

E(v, h | θ) = −Σ_i a_i v_i − Σ_j b_j h_j − Σ_i Σ_j v_i W_ij h_j,

wherein v and h are vectors representing the states of the visible layer and the hidden layer respectively, a and b are the biases of the visible layer and the hidden layer, v_i and h_j are the states of the i-th visible-layer node and the j-th hidden-layer node, m is the number of hidden-layer nodes, W_ij is the connection weight between the visible and hidden layers, i and j are node indices, a_i is the i-th visible-layer bias, and b_j is the j-th hidden-layer bias;
s242: given the model parameters θ = {W_ij, a_i, b_j}, obtaining the joint probability distribution of the state (v, h),

P(v, h | θ) = e^{−E(v, h | θ)} / Z(θ), with Z(θ) = Σ_{v,h} e^{−E(v, h | θ)};
s243: obtaining two types of high-dimensional Gaussian supervectors of RBM-i-vector by the energy evolution calculation of RBM, and performing channel compensation by adopting linear discriminant analysis;
s244: and performing cosine similarity calculation on the two types of compensated high-dimensional Gaussian supervectors, comparing the two types of compensated high-dimensional Gaussian supervectors with a preset threshold value, and judging whether the voiceprint information corresponds to the voiceprint information or not.
4. The hybrid enhanced intelligent cognitive method of the intelligent business sojourn caravan according to claim 2, wherein: step S3 specifically includes:
s31: constructing a depth cascade network with head and shoulder recognition, and intercepting a series of candidate image blocks according to a preset step length by adopting a multi-scale sliding window for each frame of image to form a sample to be recognized;
s32: inputting a sample to be recognized into a trained head-shoulder/non-head-shoulder recognition model for recognition and classification;
s33: introducing a nonlinear model and an appearance model for correlation analysis;
s34: and detecting the attitude coincidence degree and comparing thresholds.
5. The hybrid enhanced intelligent cognitive method of the intelligent business sojourn caravan according to claim 4, wherein: step S34 specifically includes:
s341: fusing continuously detected passenger posture frames into a complete action;
s342: designing the two-pose fusion rule,

f(i, j) = 1 if S_IoU > T_IoU and S_his > T_his and Δt < T_Δ, otherwise f(i, j) = 0,

wherein f(i, j) is the fusion function, 1 indicates the two poses can be fused and 0 that they cannot, S_IoU is the detection-frame overlap ratio of the two poses with threshold T_IoU, S_his is their histogram matching score with threshold T_his, and Δt is their time difference with threshold T_Δ.
6. The hybrid enhanced intelligent cognitive method of the intelligent business sojourn caravan according to claim 4, wherein: the identity authentication of the driver and the passenger in the step S4 specifically comprises the following steps:
s401: roughly positioning the human face through radial symmetric transformation;
s402: obtaining the optimal iterative vector from the current point to the target point by supervised descent method (SDM) learning, and establishing a linear regression model between the shape offset Δx = x* − x and the features φ(x) of the current shape x;
s403: iterating with the current shape x and the deformation vector Δx to obtain the desired position vector:
x := x + Δx,
where x denotes the desired position vector after the update;
s404: constructing the SDM learning objective

argmin_{R_k, b_k} Σ_i || Δx_i^{k*} − R_k φ_i^k − b_k ||²,

and obtaining the true deviation of the i-th point from the actual boundary point, where k is the number of iterations, x_k represents the shape vector at the k-th iteration, x_i^k represents the coordinates of the i-th point in the shape vector, Δx_i^{k*} is the coordinate deformation amount of the i-th point, φ_i^k are its local features, R_k is the regression matrix and b_k is the bias at the k-th iteration;
s405: by adopting a differential operator, the method adopts the following steps,
accurately positioning a human face organ, wherein Gσ(r) is a smoothing function, I (x, y) is an image gray matrix, (a, b) is a circle center, and r is a radius;
s406: fitting the human face organs in the target area with the characteristic points to obtain characteristic point mark positions;
s407: intercepting a sub-image in the neighborhood of each feature point, obtaining the adjacent features of the face organs, and connecting the adjacent features of all feature points in series to form the feature-point extreme learning features;
s408: counting the extreme learning characteristics of each image partition as a training set of a generalized single hidden layer feedforward neural network, training an extreme learning machine, searching an identity label of a specific person with the characteristics fused, and completing identity identification;
the fatigue monitoring of the driver and the passengers in the step S4 is specifically as follows:
s411: 3D face modeling of a driver is realized by adopting a 3D face modeling method, and the head posture of the driver is tracked in real time according to the face recognition method of S406-S408;
s412: solving the positions of the eyes in the 2D face image by using the positions of the eyes and the head posture in the 3D face model;
s413: locating the feature points in the eye region with a constrained local model (CLM) algorithm, and verifying the located feature points using face-image texture normalization;
s414: positioning the center of the iris according to the physiological structure characteristics of the iris;
s415: positioning the upper eyelid and the lower eyelid according to the parameterized template, and extracting the eye movement of the driver and the crew;
s416: respectively extracting fatigue characteristics related to the opening and closing degree of the eyelids, the opening and closing speed of the eyes and the motion characteristics of the iris according to the eye movement, and comparing the fatigue characteristics with the characteristics in the waking state to obtain variation characteristics;
s417: and analyzing the relevance among all fatigue characteristics by adopting a Bayesian network classifier to complete the fatigue monitoring of the drivers and passengers.
7. The hybrid enhanced intelligent cognitive method of the intelligent business sojourn caravan according to claim 6, wherein: step S5 specifically includes:
s51: collecting gesture images of the driver and passengers and converting them into a color image sequence I_rgb;
s52: converting the image sequence I_rgb into a gray image sequence I_gray and into a binary skin-color image sequence I_skin;
s53: calculating motion parameters from the gray image sequence I_gray and the binary skin-color signal image sequence I_skin as the inter-frame motion features;
s54: normalizing the time length of the gesture motion and constructing the probability function

b_{i,j}(x_{i,j}) = (1 / (√(2π) σ_{i,j})) · exp(−(x_{i,j} − μ_{i,j})² / (2σ_{i,j}²)),

where i denotes the i-th state, j denotes the j-th characteristic parameter, x_{i,j} is the time-normalized motion and shape features of the gesture sequence, λ represents the gesture class, μ represents the mathematical-expectation matrix of the characteristic parameters, σ is the standard deviation, μ_{i,j} is the mathematical expectation of the j-th characteristic parameter of the i-th state, and σ_{i,j} is its standard deviation;
s55: constructing the probability function of the complete gesture sequence observation,

P(X | λ) = Π_{i=1}^{m} Π_{j=1}^{n} b_{i,j}(x_{i,j}),

wherein X is the complete gesture sequence observation, m is the total number of gesture states, and n is the total number of gesture features;
s56: for all gesture classes, calculating −ln P(X | λ); the gesture class with the minimum value is the attributed gesture category.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810030098.3A CN108256307B (en) | 2018-01-12 | 2018-01-12 | Hybrid enhanced intelligent cognitive method of intelligent business travel motor home |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108256307A CN108256307A (en) | 2018-07-06 |
CN108256307B true CN108256307B (en) | 2021-04-02 |
Family
ID=62727133
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810030098.3A Active CN108256307B (en) | 2018-01-12 | 2018-01-12 | Hybrid enhanced intelligent cognitive method of intelligent business travel motor home |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108256307B (en) |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109034020A (en) * | 2018-07-12 | 2018-12-18 | 重庆邮电大学 | A kind of community's Risk Monitoring and prevention method based on Internet of Things and deep learning |
CN109079813A (en) * | 2018-08-14 | 2018-12-25 | 重庆四通都成科技发展有限公司 | Automobile Marketing service robot system and its application method |
CN109143870B (en) * | 2018-10-23 | 2021-08-06 | 宁波溪棠信息科技有限公司 | Multi-target task control method |
CN110070884B (en) * | 2019-02-28 | 2022-03-15 | 北京字节跳动网络技术有限公司 | Audio starting point detection method and device |
CN109918513B (en) * | 2019-03-12 | 2023-04-28 | 北京百度网讯科技有限公司 | Image processing method, device, server and storage medium |
CN110111795B (en) * | 2019-04-23 | 2021-08-27 | 维沃移动通信有限公司 | Voice processing method and terminal equipment |
CN112308116B (en) * | 2020-09-28 | 2023-04-07 | 济南大学 | Self-optimization multi-channel fusion method and system for old-person-assistant accompanying robot |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120317621A1 (en) * | 2011-06-09 | 2012-12-13 | Canon Kabushiki Kaisha | Cloud system, license management method for cloud service |
US9286029B2 (en) * | 2013-06-06 | 2016-03-15 | Honda Motor Co., Ltd. | System and method for multimodal human-vehicle interaction and belief tracking |
CN105654753A (en) * | 2016-01-08 | 2016-06-08 | 北京乐驾科技有限公司 | Intelligent vehicle-mounted safe driving assistance method and system |
CN105812129A (en) * | 2016-05-10 | 2016-07-27 | 成都景博信息技术有限公司 | Method for monitoring vehicle running state |
CN104183091B (en) * | 2014-08-14 | 2017-02-08 | 苏州清研微视电子科技有限公司 | System for adjusting sensitivity of fatigue driving early warning system in self-adaptive mode |
CN106682603A (en) * | 2016-12-19 | 2017-05-17 | 陕西科技大学 | Real time driver fatigue warning system based on multi-source information fusion |
Non-Patent Citations (5)
Title |
---|
Hybrid-augmented intelligence:collaboration and cognition;Nan-ning ZHENG,et al;《Frontiers of Information Technology & Electronic Engineering》;20170215;第18卷(第2期);第153-179页 * |
Toward Intelligent Driver-Assistance and Safety Warning Systems;Nan-Ning Zheng,et al;《IEEE Intelligent System》;20040430;第8-11页 * |
Visually Guided Landing of an Unmanned Aerial Vehicle;Srikanth Saripalli,et al;《IEEE TRANSACTIONS ON ROBOTICS AND AUTOMATION》;20030625;第19卷(第3期);第371-380页 * |
基于数据资源的认知图挖掘系统研究;李嫄源,等;《重庆邮电大学学报 (自然科学版 )》;20110630;第23卷(第3期);第374-378页 * |
车用自组网媒体访问控制机制改进;李嫄源,等;《微电子学》;20110630;第41卷(第3期);第372-376,380页 * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||