CN108256307B - Hybrid enhanced intelligent cognitive method of intelligent business travel motor home - Google Patents


Info

Publication number
CN108256307B
CN108256307B (application CN201810030098.3A)
Authority
CN
China
Prior art keywords
driver
passenger
gesture
intelligent
sojourn
Prior art date
Legal status
Active
Application number
CN201810030098.3A
Other languages
Chinese (zh)
Other versions
CN108256307A (en)
Inventor
朱智勤 (Zhu Zhiqin)
王冠 (Wang Guan)
李鹏华 (Li Penghua)
李嫄源 (Li Yuanyuan)
秦石磊 (Qin Shilei)
Current Assignee
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications
Priority to CN201810030098.3A
Publication of CN108256307A
Application granted
Publication of CN108256307B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
      • G06: COMPUTING; CALCULATING OR COUNTING
        • G06F: ELECTRIC DIGITAL DATA PROCESSING
          • G06F 3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
            • G06F 3/01: Input arrangements or combined input and output arrangements for interaction between user and computer
              • G06F 3/017: Gesture based interaction, e.g. based on a set of recognized hand gestures
          • G06F 21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
            • G06F 21/30: Authentication, i.e. establishing the identity or authorisation of security principals
              • G06F 21/31: User authentication
                • G06F 21/32: User authentication using biometric data, e.g. fingerprints, iris scans or voiceprints
        • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
          • G06V 20/00: Scenes; Scene-specific elements
            • G06V 20/50: Context or environment of the image
              • G06V 20/59: Context or environment of the image inside of a vehicle, e.g. relating to seat occupancy, driver state or inner lighting conditions
                • G06V 20/597: Recognising the driver's state or behaviour, e.g. attention or drowsiness
          • G06V 40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
            • G06V 40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
              • G06V 40/16: Human faces, e.g. facial parts, sketches or expressions
                • G06V 40/168: Feature extraction; Face representation
                  • G06V 40/171: Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
              • G06V 40/18: Eye characteristics, e.g. of the iris
                • G06V 40/19: Sensors therefor
      • G10: MUSICAL INSTRUMENTS; ACOUSTICS
        • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
          • G10L 17/00: Speaker identification or verification techniques
            • G10L 17/02: Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
            • G10L 17/04: Training, enrolment or model building

Abstract

The invention relates to a hybrid enhanced intelligent cognition method for an intelligent business travel motor home, which specifically comprises the following steps: S1: the driver and passengers communicate with the vehicle-mounted electronic equipment, and the dialogue state between the user and the equipment is tracked; S2: the identity of the driver and passengers is authenticated from the voiceprint information collected during tracking; S3: the behavior intention of the driver and passengers is analyzed; S4: face recognition is performed on the driver and passengers for identity authentication and fatigue monitoring; S5: gesture recognition is performed on the driver and passengers; S6: the analysis and recognition results are synthesized. The invention introduces human action and a human cognitive model into the business travel motor home to form a stronger form of intelligence, improves the machine's ability to understand and adapt to the environments inside and outside the vehicle, completes complex spatio-temporally correlated tasks, and enhances the functional and spatial experience of the motor home.

Description

Hybrid enhanced intelligent cognitive method of intelligent business travel motor home
Technical Field
The invention belongs to the field of hybrid enhanced intelligent cognition in artificial intelligence, and relates to a hybrid enhanced intelligent cognition method for an intelligent business travel motor home.
Background
At present, China has entered an era of nationwide self-driving travel. Business travel motor home tourism is increasingly popular with consumers, and the government is vigorously developing the motor home industry to drive the coordinated development of the tourism and automobile industries. Meanwhile, the Chinese automobile industry is moving toward an era of intelligent, networked vehicles. The business travel motor home is a product combining the intelligent networked automobile with the smart home; it embodies the entry of artificial intelligence into social life and realizes the deep integration of technology with daily living. In making the business travel motor home intelligent, the recognition, reasoning and cognition of the information carried by each modality of data in the on-board system is one of the core problems to be solved.
Disclosure of Invention
In view of the above, the present invention provides a hybrid enhanced intelligent cognition method for an intelligent business travel motor home. By constructing a unified cross-media semantic expression of the multi-dimensional intelligent space of the vehicle, human action and a human cognitive model are introduced into the motor home to form a stronger form of intelligence, improving the machine's ability to understand and adapt to the environments inside and outside the vehicle, completing complex spatio-temporally correlated tasks, and enhancing the functional and spatial experience of the motor home.
To achieve the above purpose, the invention provides the following technical scheme:
A hybrid enhanced intelligent cognition method for an intelligent business travel motor home, specifically comprising the following steps:
S1: the driver and passengers communicate with the vehicle-mounted electronic equipment, and the dialogue state between the user and the equipment is tracked;
S2: the identity of the driver and passengers is authenticated from the voiceprint information collected during tracking;
S3: the behavior intention of the driver and passengers is analyzed;
S4: face recognition is performed on the driver and passengers for identity authentication and fatigue monitoring;
S5: gesture recognition is performed on the driver and passengers;
S6: the analysis and recognition results are synthesized.
Further, step S1 specifically comprises:
S11: converting the speech of the driver and passengers into text data through a speech recognition engine, and performing spelling correction on the text data;
S12: performing word segmentation on the corrected text data to obtain a word sequence, and obtaining word vectors through Word2Vec;
S13: processing the word vectors through a cascade convolutional neural network to obtain the dialogue scene type;
S14: constructing a deep reinforcement learning network, and completing reinforcement learning of the dialogue state behavior strategy by iterating two independent deep reinforcement learning networks;
S15: constructing a semantic knowledge graph from triples, and computing in real time the upper and lower bounds of the association scores and embedding costs of the knowledge atoms in the graph to obtain the knowledge query result;
S16: generating the corresponding spectral parameters and fundamental frequency with a multi-space probability distribution HMM parameter generation algorithm to produce a smooth acoustic feature sequence, which is submitted to a synthesizer to generate the final speech.
Further, step S2 specifically comprises:
S21: performing pre-emphasis, framing, windowing and endpoint detection on the voice information of the driver and passengers collected during tracking;
S22: applying a Fourier transform to the processed voice information to obtain the spectral energy distribution, performing critical-band division with a triangular filter bank, and carrying out amplitude-weighted computation and a discrete cosine transform to obtain the cepstral coefficients;
S23: inputting the cepstral coefficients into a universal background model (UBM) for speaker identification to obtain the voiceprint features;
S24: matching the voiceprint templates and judging whether the voiceprint information corresponds.
Further, step S24 specifically comprises:
S241: defining the energy function
E(v,h\mid\theta) = -\sum_{i=1}^{n} a_i v_i - \sum_{j=1}^{m} b_j h_j - \sum_{i=1}^{n}\sum_{j=1}^{m} v_i W_{ij} h_j
where the vectors h and v represent the states of the hidden layer and the visible layer respectively, a and b represent the biases of the visible layer and the hidden layer, v_i and h_j are the states of the i-th visible node and the j-th hidden node, n and m are the numbers of visible and hidden nodes, W_{ij} is the connection weight between the visible and hidden layers, a_i is the i-th visible-layer bias, and b_j is the j-th hidden-layer bias;
S242: given the model parameters \theta = \{W_{ij}, a_i, b_j\}, obtaining the joint probability distribution of the state (v,h),
P(v,h\mid\theta) = \frac{1}{Z(\theta)}\, e^{-E(v,h\mid\theta)}
where
Z(\theta) = \sum_{v,h} e^{-E(v,h\mid\theta)}
is a normalization factor;
S243: obtaining two classes of high-dimensional Gaussian supervectors, the RBM-i-vectors, from the energy evolution computation of the RBM, and performing channel compensation by linear discriminant analysis;
S244: computing the cosine similarity of the two classes of compensated high-dimensional Gaussian supervectors, comparing it with a preset threshold, and judging whether the voiceprint information corresponds.
Further, step S3 specifically comprises:
S31: constructing a deep cascade network with head-shoulder recognition, and, for each image frame, intercepting a series of candidate image patches at a preset step length with a multi-scale sliding window to form the samples to be recognized;
S32: inputting the samples to be recognized into a trained head-shoulder/non-head-shoulder recognition model for recognition and classification;
S33: introducing a nonlinear model and an appearance model for association analysis;
S34: detecting the pose coincidence degree and comparing it with thresholds.
Further, step S34 specifically comprises:
S341: fusing continuously detected passenger pose frames into a complete action;
S342: designing a two-pose fusion rule,
f(i,j) = \begin{cases} 1, & S_{IoU} > T_{IoU} \text{ and } S_{his} > T_{his} \text{ and } |t_1 - t_2| < \Delta T \\ 0, & \text{otherwise} \end{cases}
where f(i,j) is the fusion function, 1 indicating the two poses can be fused and 0 that they cannot, S_{IoU} is the overlap ratio of the two pose detection boxes, T_{IoU} is the coincidence threshold, S_{his} is the histogram matching score of the two detection boxes, T_{his} is the histogram matching threshold, t_1 and t_2 are the times of the two poses, and \Delta T is the two-pose time difference threshold.
Further, the identity authentication of the driver and passengers in step S4 is specifically as follows:
S401: coarsely locating the face by radial symmetry transform;
S402: obtaining the optimal iteration vector from the current point to the target point by supervised descent method (SDM) learning, and establishing a linear regression model between the shape offset \Delta x = x_* - x and the feature \phi(x) of the current shape x,
\Delta x = R\,\phi(x) + b
where x_* is the target shape vector, b is the offset, and R is the regression model;
S403: iterating with the current shape x and the deformation vector \Delta x to obtain the desired position vector,
x := x + \Delta x
where x represents the desired position vector;
S404: constructing the learning target of the SDM to obtain the true deviation of the i-th point from the actual boundary point,
\min_{R_k, b_k} \sum_{i} \left\| \Delta x_*^{k,i} - R_k\,\phi(x_k^{i}) - b_k \right\|^2
where k is the iteration number, x_k is the shape vector at the k-th iteration, x_k^{i} is the coordinate of the i-th point in the shape vector, \Delta x_*^{k,i} is the coordinate deformation of the i-th point in the shape vector, and b_k is the bias at the k-th iteration;
S405: adopting the integro-differential operator
\max_{(r,a,b)} \left| G_\sigma(r) * \frac{\partial}{\partial r} \oint_{(r,a,b)} \frac{I(x,y)}{2\pi r}\, ds \right|
to accurately locate the facial organs, where G_\sigma(r) is a smoothing function, I(x,y) is the image gray-level matrix, (a,b) is the circle center, and r is the radius;
S406: fitting the facial organs in the target region with the feature points to obtain the feature point mark positions;
S407: intercepting a sub-image in the neighborhood of each feature point, obtaining the adjacent features of the facial organs, and concatenating the adjacent features of all feature points to form the feature-point extreme learning features;
S408: counting the extreme learning features of each image partition as the training set of a generalized single-hidden-layer feedforward neural network, training an extreme learning machine, and retrieving the identity label of the specific person matching the fused features to complete identity recognition.
The fatigue monitoring of the driver and passengers in step S4 is specifically as follows:
S411: realizing 3D face modeling of the driver with a 3D face modeling method, and tracking the driver's head pose in real time according to the face recognition method of S406-S408;
S412: solving the eye positions in the 2D face image from the eye positions and head pose in the 3D face model;
S413: locating the feature points in the eye region with a constrained local model (CLM) facial point detection algorithm, and verifying the located feature points using face image texture normalization;
S414: locating the iris center according to the physiological structure of the iris;
S415: locating the upper and lower eyelids according to a parameterized template, and extracting the eye movements of the driver and passengers;
S416: extracting, from the eye movements, fatigue features related to the eyelid opening degree, the eye opening and closing speed, and the iris motion characteristics, and comparing them with the features in the waking state to obtain variation features;
S417: analyzing the correlation among the fatigue features with a Bayesian network classifier to complete fatigue monitoring of the driver and passengers.
Further, step S5 specifically comprises:
S51: collecting gesture images of the driver and passengers and converting them into an image sequence I_rgb;
S52: converting the image sequence I_rgb into a grayscale image sequence I_gray, and converting I_rgb into a binary skin-color image sequence I_skin;
S53: calculating motion parameters from the grayscale image sequence I_gray and the binary skin-color image sequence I_skin as the inter-frame motion features;
S54: normalizing the time length of the gesture motion and constructing the probability function,
P(x_{i,j}\mid\lambda) = \frac{1}{\sqrt{2\pi}\,\sigma_{i,j}} \exp\left(-\frac{(x_{i,j}-\mu_{i,j})^2}{2\sigma_{i,j}^2}\right)
where i denotes the i-th state, j denotes the j-th characteristic parameter, x_{i,j} is the time-normalized motion and shape feature of the gesture sequence, \lambda denotes the gesture class, \mu denotes the mathematical expectation matrix of each characteristic parameter, \sigma is the standard deviation, \mu_{i,j} is the mathematical expectation of the j-th characteristic parameter in the i-th state, and \sigma_{i,j} is the standard deviation of the j-th characteristic parameter in the i-th state;
S55: constructing the probability function of observing the complete gesture sequence,
P(X\mid\lambda) = \prod_{i=1}^{m}\prod_{j=1}^{n} P(x_{i,j}\mid\lambda)
where X is the complete gesture sequence observation, m is the total number of gesture states, and n is the total number of gesture features;
S56: for all gesture classes, calculating
-\ln P(X\mid\lambda)
and taking the class giving the minimum value as the gesture category.
The invention has the following beneficial effects: it provides a deep-learning-based hybrid enhanced intelligent cognition technology oriented to the business travel motor home.
First, to meet the demand for intelligent human-vehicle voice interaction between the occupants and facilities such as the vehicle-mounted electronics and in-car entertainment, this patent designs a human-vehicle multi-party dialogue model, realizing intelligent voice communication between the driver and passengers and the on-board devices. The module comprises a voice data acquisition layer, a preprocessing layer, a semantic analysis layer, a dialogue management layer, a knowledge reasoning layer and a voice output layer. The voice data are analyzed and processed layer by layer to achieve voice communication between the occupants and the vehicle-mounted system.
Second, identity recognition for the driver and passengers. Occupant identification comprises voiceprint recognition and face recognition, performed respectively through an EM-algorithm-driven UBM model and a feedforward neural network model. Identifying the occupants safeguards the motor home and the property inside it.
Third, driver fatigue detection to ensure driving safety, which is essential for the on-board equipment of the motor home. Fatigue detection extracts head-pose and eye-movement information, computes feature values, compares them with the values in the waking state to obtain variation features, and judges whether fatigue is present from the value of the variation features.
Fourth, this patent also contributes innovative ideas for occupant behavior analysis and gesture recognition. The behavior intention of the driver and passengers is analyzed, to assist driving, by training a designed deep cascade network together with a nonlinear motion model and an appearance model. Gesture recognition is part of in-vehicle human-vehicle interaction; its purpose is to simplify the operating burden on the driver by extracting and analyzing hand motion and color information through a multi-pose model, interacting with the occupants and meeting their needs. The method proposed in this patent enriches the actual experience of the driver and passengers while ensuring property safety and driving safety.
Drawings
To make the object, technical scheme and beneficial effects of the invention clearer, the following drawings are provided for explanation:
FIG. 1 is a multi-turn dialogue model based on a POMDP strategy and a triple knowledge graph according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of total variability factor modeling based on a restricted Boltzmann machine according to an embodiment of the present invention;
FIG. 3 is a diagram of a three-level deep cascade network according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating multi-target tracking for online learning of non-linear motion patterns and robust appearance models in accordance with an embodiment of the present invention;
FIG. 5 is a diagram of a face feature extraction area and a positioning effect in an embodiment of the present invention;
FIG. 6 is a monitoring diagram of fatigue status of drivers and passengers according to the embodiment of the invention;
FIG. 7 is a diagram of gesture feature spatiotemporal representations according to an embodiment of the present invention.
Detailed Description
Preferred embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
The invention comprises the following parts:
1. Human-vehicle intelligent voice interaction technology. To meet the demand for intelligent human-vehicle voice interaction between the driver and passengers and devices such as the vehicle-mounted electronics, a human-vehicle multi-turn dialogue model based on a POMDP strategy and a triple knowledge graph is designed using dialogue state tracking and management techniques, realizing smooth communication between the occupants and the on-board equipment. As shown in fig. 1, the specific steps are as follows:
a) Data acquisition layer: the user's speech is converted into text data by a speech recognition engine, and spelling correction of the text is completed.
b) Preprocessing layer: word segmentation is performed on the corrected text data to obtain a word sequence; part-of-speech tagging, named entity recognition, coreference resolution and dependency relations in vocabulary and semantics are completed; and word vectors are obtained by means of Word2Vec.
c) Semantic analysis layer: the fusion-encoded word vectors are submitted to a cascade convolutional neural network to complete primary semantic analysis and obtain the dialogue scene type.
d) Dialogue management layer: a dialogue question guidance strategy is designed in the POMDP model to realize dialogue state tracking, and reinforcement learning of the dialogue state behavior strategy is completed by constructing a deep reinforcement learning network (DQN) and iterating two independent Q networks.
e) Knowledge reasoning layer: the semantic knowledge graph is built from triples; without using indexes, the upper and lower bounds of the association scores and embedding costs of knowledge atoms in the graph are computed in real time to derive the Top-k knowledge query result; on the basis of determining the single-scene and cross-scene knowledge atom combination sets, corresponding score functions are designed, training of the cross-scene combination set is driven with a multi-column convolutional network, and cross-scene knowledge fusion scores are computed.
f) Voice output layer: text analysis is performed on the input text; a multi-space probability distribution HMM parameter generation algorithm produces the corresponding spectral parameters and fundamental frequency to generate a smooth acoustic feature sequence, which is submitted to a synthesizer to produce the final speech.
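By way of illustration only, the following Python fragment sketches the preprocessing layer (b) above: word segmentation followed by Word2Vec training. The jieba segmenter, the gensim library, the two sample utterances and all sizes are assumptions for the sketch and are not part of the patent.

    # Minimal sketch of the preprocessing layer: word segmentation + Word2Vec.
    # Assumes the jieba and gensim packages; corpus and sizes are illustrative.
    import jieba
    from gensim.models import Word2Vec

    corpus = [
        "请打开空调",            # "please turn on the air conditioner"
        "帮我导航到最近的营地",  # "navigate to the nearest campsite"
    ]

    # Word segmentation: each utterance becomes a list of tokens.
    tokenized = [jieba.lcut(sentence) for sentence in corpus]

    # Train word vectors; vector_size/window/min_count are illustrative choices.
    model = Word2Vec(sentences=tokenized, vector_size=100, window=5, min_count=1)

    # Look up the vector of one token, to be fed to the cascade CNN.
    vec = model.wv[tokenized[0][0]]
    print(vec.shape)  # (100,)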
2. Identifying the driver and passengers by voiceprint. As shown in fig. 2, to meet the identity authentication requirement in the intelligent security field of the business travel motor home, i-vector feature extraction is replaced by restricted-Boltzmann-machine feature extraction under the total variability factor, and a UBM model driven by the EM algorithm is designed, realizing voiceprint recognition under high-dimensional Gaussian component representation.
a) Collecting and processing the voice segments. The collected voice segments are subjected to pre-emphasis, framing, windowing and endpoint detection. A Fourier transform yields the spectral energy distribution, a triangular filter bank performs critical-band division, and amplitude-weighted computation and a discrete cosine transform produce the cepstral coefficients (MFCC).
b) Obtaining the voiceprint features. The cepstral coefficients are submitted to a UBM model trained by the EM algorithm to obtain probability scores of the voiceprint features, and template matching is performed with the corresponding Gaussian components.
c) Matching the voiceprint templates. A restricted Boltzmann machine (RBM) consisting of a visible layer of n nodes and a hidden layer of m nodes is designed, with the energy function defined as
E(v,h\mid\theta) = -\sum_{i=1}^{n} a_i v_i - \sum_{j=1}^{m} b_j h_j - \sum_{i=1}^{n}\sum_{j=1}^{m} v_i W_{ij} h_j
where the vectors h and v represent the states of the hidden and visible layers respectively, a and b represent the biases of the visible and hidden layers, and v_i and h_j are the states of the i-th visible node and the j-th hidden node. Given the model parameters \theta = \{W_{ij}, a_i, b_j\}, the joint probability distribution of the state (v,h) is
P(v,h\mid\theta) = \frac{1}{Z(\theta)}\, e^{-E(v,h\mid\theta)}
where
Z(\theta) = \sum_{v,h} e^{-E(v,h\mid\theta)}
is a normalization factor.
d) Speaker decision. Two classes of high-dimensional Gaussian supervectors, the RBM-i-vectors, are obtained from the energy evolution computation of the RBM, and channel compensation is performed by linear discriminant analysis (LDA). The cosine similarity between the two compensated RBM-i-vectors is computed and compared with a preset threshold, thereby deciding whether the voiceprint belongs to the specific speaker.
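As a minimal sketch of steps (a) and (d), the fragment below extracts MFCCs (with pre-emphasis) and makes the cosine-similarity speaker decision between two channel-compensated supervectors. librosa is assumed for the cepstral computation, the RBM-i-vector extraction itself is omitted, and the 0.25 threshold is an illustrative assumption, not a value from the patent.

    # Sketch of MFCC extraction and the cosine-similarity speaker decision.
    import numpy as np
    import librosa

    def extract_mfcc(path, n_mfcc=13):
        y, sr = librosa.load(path, sr=16000)
        y = np.append(y[0], y[1:] - 0.97 * y[:-1])   # pre-emphasis
        # Framing, windowing, FFT, triangular (Mel) filter bank and DCT are
        # all performed inside librosa.feature.mfcc.
        return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)

    def same_speaker(supervec_a, supervec_b, threshold=0.25):
        """Cosine similarity between two channel-compensated supervectors."""
        score = np.dot(supervec_a, supervec_b) / (
            np.linalg.norm(supervec_a) * np.linalg.norm(supervec_b))
        return score > threshold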
3. Analyzing occupant behavior. To meet the intention analysis requirement in the intelligent behavior interaction field of the business travel motor home, a deep cascade network with head-shoulder recognition is designed using nonlinear motion pattern learning and appearance-model multi-instance learning, realizing occupant behavior intention analysis driven by a hierarchical-association multi-target tracking strategy.
a) Constructing the deep cascade network and screening samples. A deep cascade network with head-shoulder recognition (HsNet) is constructed. For each image frame, a multi-scale sliding window intercepts a series of candidate patches at a preset step length to form the samples to be recognized. The samples are sent to the pre-trained head-shoulder/non-head-shoulder recognition model HsNet, a three-level cascaded CNN, and classified as shown in fig. 3. During classification, a patch judged to be a negative sample is discarded immediately, while the remaining samples enter the next stage of the network for stricter recognition and classification, passing through the three CNN stages in turn. The output of the third stage decides whether a patch belongs to a head-shoulder region; the head-shoulder box is then expanded in height to 3 times the original sliding window to obtain the whole-body detection box of the passenger. Since several detection boxes arise for the same passenger, redundant boxes are finally removed with a non-maximum suppression strategy, keeping only the single most probable detection box at each position as the passenger detection result.
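A minimal sketch of the multi-scale sliding-window sampling described in (a); the window sizes and stride are illustrative assumptions, and the cascaded CNN itself is only referenced in the closing comment.

    # Multi-scale sliding-window candidate patch extraction (illustrative sizes).
    import numpy as np

    def candidate_patches(frame, scales=(64, 96, 128), stride=16):
        """Yield (x, y, size, patch) candidates for head-shoulder classification."""
        h, w = frame.shape[:2]
        for size in scales:
            for y in range(0, h - size + 1, stride):
                for x in range(0, w - size + 1, stride):
                    yield x, y, size, frame[y:y + size, x:x + size]

    # Each patch would be sent through the three cascaded CNN stages; patches
    # rejected at any stage are discarded, survivors are expanded to 3x height
    # to obtain the whole-body box and filtered by non-maximum suppression.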
b) Introducing the nonlinear model and appearance model for association analysis. Nonlinear motion pattern learning and appearance-model multi-instance learning are introduced into the hierarchical-association multi-target tracking strategy: detections are first reliably associated at the bottom level to form track segments, and the track segments are then effectively connected using online nonlinear motion pattern learning and appearance-model multi-instance learning to obtain reliable object trajectories. Parameters such as speed, direction and distance extracted from an object's motion trajectory serve as features, and several features are combined into higher-level semantics describing the object's behavior, from which the occupant's behavior intention is judged. A sketch of such trajectory features follows.
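The fragment below sketches how speed, direction and distance features could be computed from a tracked trajectory as described in (b); the frame rate and the particular summary statistics are illustrative assumptions.

    # Speed/direction/distance features from an object trajectory (sketch).
    import numpy as np

    def trajectory_features(track, fps=25.0):
        """track: (N, 2) array of (x, y) centers, one row per frame."""
        track = np.asarray(track, dtype=float)
        steps = np.diff(track, axis=0)                    # per-frame displacement
        dists = np.linalg.norm(steps, axis=1)
        speed = dists * fps                               # pixels per second
        direction = np.arctan2(steps[:, 1], steps[:, 0])  # motion direction (rad)
        return {
            "mean_speed": speed.mean(),
            "total_distance": dists.sum(),
            "mean_direction": direction.mean(),
        }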
c) Detecting pose coincidence and comparing thresholds. As shown in fig. 4, to improve the robustness of behavior detection, continuously detected passenger pose boxes are fused into a complete action. The two-pose fusion rule is designed as
f(i,j) = \begin{cases} 1, & S_{IoU} > T_{IoU} \text{ and } S_{his} > T_{his} \text{ and } |t_1 - t_2| < T_{\Delta} \\ 0, & \text{otherwise} \end{cases}
where f(i,j) is the fusion function, 1 indicating the two poses can be fused and 0 that they cannot, S_{IoU} is the overlap ratio of the two pose detection boxes, T_{IoU} = 0.5 is the overlap-ratio threshold, S_{his} is the histogram matching score of the two detection boxes, T_{his} = 35 is the histogram matching threshold, and T_{\Delta} = 25 is the two-pose time difference threshold.
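The fusion rule transcribes directly into code with the thresholds given above (T_IoU = 0.5, T_his = 35, T_Δ = 25); the IoU and histogram scores are assumed to be supplied by the upstream detector.

    # Two-pose fusion rule f(i, j) with the thresholds given in the text.
    T_IOU, T_HIS, T_DELTA = 0.5, 35, 25

    def can_fuse(s_iou, s_his, t1, t2):
        """Return 1 if two pose detections can be fused into one action, else 0."""
        return int(s_iou > T_IOU and s_his > T_HIS and abs(t1 - t2) < T_DELTA)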
4. Face recognition and fatigue monitoring of the driver and passengers. To meet the identity authentication and fatigue-state monitoring requirements in the intelligent safety field of the business travel motor home, a supervised gradient descent algorithm and a CLM localization algorithm are adopted respectively: a generalized single-hidden-layer feedforward neural network based on the extreme learning machine completes identity authentication of specific occupants, and a matching template based on 3D face modeling monitors the driver's fatigue state.
a) Identity authentication based on facial features: the face is coarsely located by radial symmetry transform. Supervised descent method (SDM) learning obtains the optimal iteration vector from the current point to the target point, establishing a linear regression model between the shape offset \Delta x = x_* - x and the feature \phi(x) of the current shape x:
\Delta x = R\,\phi(x) + b
where x_* is the target shape vector, R is the regression model and b is the offset. Iteration with the current shape x and the deformation vector \Delta x then yields the desired position vector x := x + \Delta x. The learning target of the SDM is constructed as
\min_{R_k, b_k} \sum_{i} \left\| \Delta x_*^{k,i} - R_k\,\phi(x_k^{i}) - b_k \right\|^2
where k is the iteration number, x_k is the shape vector at the k-th iteration, and x_k^{i} is the coordinate of the i-th point in the shape vector. Repeated iterative learning yields the true deviation of the i-th point from the actual boundary point. The integro-differential operator
\max_{(r,a,b)} \left| G_\sigma(r) * \frac{\partial}{\partial r} \oint_{(r,a,b)} \frac{I(x,y)}{2\pi r}\, ds \right|
is then adopted to accurately locate the eyes, nose, mouth and other organs in the face, where G_\sigma(r) is a smoothing function, I(x,y) is the image gray-level matrix, (a,b) is the circle center, and r is the radius. The face localization result is shown in fig. 5.
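A sketch of the SDM inference loop implied by the equations above: starting from an initial shape, each stage applies its learned regressor R_k and bias b_k. The feature extractor and the learned stage parameters are placeholders, not values from the patent.

    # Supervised descent method (SDM) inference loop (sketch).
    import numpy as np

    def sdm_align(x0, stages, features):
        """x0: initial shape vector; stages: list of (R_k, b_k); features: phi(x)."""
        x = x0.copy()
        for R_k, b_k in stages:
            dx = R_k @ features(x) + b_k   # predicted shape increment
            x = x + dx                     # x := x + dx
        return x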
A generalized single-hidden-layer feedforward neural network is designed for face recognition; the network is invariant to monotone gray-level change and angular rotation, and insensitive to image changes caused by uneven illumination. During recognition, the facial organs in the target region are fitted with feature points to obtain the feature point mark positions. A sub-image is intercepted in the neighborhood of each feature point to obtain the adjacent features of the facial organs, and the adjacent features of all feature points are finally concatenated to form the feature-point extreme learning features. The extreme learning features of each image partition are counted separately as the training set of the generalized single-hidden-layer feedforward neural network; several extreme learning machines are trained; the outputs of the feedforward networks are combined; and, driven by the output decision of the optimal ensemble classifier, the identity label of the specific person matching the fused features is retrieved to complete identity authentication. The result of face extreme-learning feature extraction is shown in fig. 5.
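The following is a minimal extreme learning machine of the kind used as the building block of the generalized single-hidden-layer feedforward network: random, untrained input weights, a sigmoid hidden layer, and output weights solved in closed form by the pseudoinverse. All sizes are illustrative assumptions.

    # Minimal extreme learning machine (ELM) for classification (sketch).
    import numpy as np

    class ELM:
        def __init__(self, n_in, n_hidden, n_out, seed=0):
            rng = np.random.default_rng(seed)
            self.W = rng.standard_normal((n_in, n_hidden))   # random, never trained
            self.b = rng.standard_normal(n_hidden)
            self.beta = np.zeros((n_hidden, n_out))

        def _hidden(self, X):
            return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))  # sigmoid layer

        def fit(self, X, T):
            """T: one-hot targets. Output weights solved via the pseudoinverse."""
            self.beta = np.linalg.pinv(self._hidden(X)) @ T
            return self

        def predict(self, X):
            return (self._hidden(X) @ self.beta).argmax(axis=1)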
b) Fatigue monitoring based on facial features: 3D face modeling of the driver is realized with a 3D face modeling method, and the driver's head pose is tracked in real time in combination with the above face recognition method. The eye positions in the 2D face image are solved indirectly from the eye positions and head pose in the 3D face model; the feature points in the eye region are located with the CLM algorithm, and the feature point localization is verified using face image texture normalization. The iris center is located through the physiological structure of the iris, overcoming the imaging differences of the iris under different illumination conditions. The upper and lower eyelids are located with a parameterized template to extract the driver's eye movements. From the eye movements, fatigue features related to the eyelid opening degree, the eye opening and closing speed and the iris motion characteristics are extracted and compared with the feature values in the waking state to obtain variation features. A Bayesian network classifier is constructed to analyze the correlation among the fatigue features and complete the driver fatigue-state judgment, as shown in fig. 6.
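As an illustrative stand-in for the Bayesian network classifier (which, unlike the sketch below, models dependencies among the fatigue features), a naive Bayes classifier over the three variation features named above might look as follows; the training data are fabricated for illustration only.

    # Fatigue-state decision over eye-movement variation features (sketch).
    # GaussianNB is a simplifying stand-in for the Bayesian network classifier.
    import numpy as np
    from sklearn.naive_bayes import GaussianNB

    # Columns: eyelid opening degree, eye open/close speed, iris motion feature,
    # each expressed as deviation from the driver's waking-state baseline.
    X_train = np.array([[0.05, 0.02, 0.1],    # alert
                        [0.40, 0.35, 0.6],    # fatigued
                        [0.08, 0.05, 0.2],
                        [0.55, 0.40, 0.7]])
    y_train = np.array([0, 1, 0, 1])          # 0 = alert, 1 = fatigued

    clf = GaussianNB().fit(X_train, y_train)
    print(clf.predict([[0.5, 0.3, 0.65]]))    # -> [1] (fatigued)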
5. Gesture recognition of the driver and passengers. To meet the demand for typical human-vehicle gesture interaction between the occupants and facilities such as the vehicle-mounted electronics and in-car entertainment, a multi-state Gaussian probability model under complex backgrounds is designed, combining hand motion information and color information for human-hand segmentation and occupant gesture recognition, as shown in fig. 7.
a) Image conversion and processing. The captured color image sequence I_rgb is, on the one hand, converted into a 256-level grayscale image sequence I_gray for motion-parameter analysis; on the other hand, according to the distribution of RGB colors in HSI space, it is converted into a binary skin-color image sequence I_skin, divided into skin-color and non-skin-color regions.
b) Extracting feature information and fusing images. The grayscale image sequence I_gray is processed to obtain a coarse binary motion image sequence I_mov. An AND operation between corresponding images of I_mov and I_skin then yields the binary skin-motion-region image sequence I_mov-skin, whose regions are the moving skin areas. Since I_mov-skin does not necessarily contain a complete hand region, a seed algorithm is designed to find one. First, the motion region of the hand is assumed to lie in I_mov-skin; according to regional connectivity, the largest connected component B is found in I_mov-skin with the seed algorithm and taken as part of the hand. Then, the connected component B is mapped to the same position in I_skin and seeded there; it is expanded in I_skin to obtain the complete hand-region image sequence I_hand. Shape features of the hand region are extracted from I_hand, and motion parameters between the hand regions of two adjacent frames are computed from I_gray and I_hand as the inter-frame motion features.
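A sketch of the seed step in (b): the AND of the motion and skin masks followed by selection of the largest connected component B with OpenCV. In a full implementation, B would then be grown inside I_skin to recover the complete hand region I_hand.

    # Largest connected component of the skin-motion mask (sketch, OpenCV).
    import cv2
    import numpy as np

    def largest_component(mov_mask, skin_mask):
        """mov_mask, skin_mask: uint8 binary images (0/255) of the same size."""
        mov_skin = cv2.bitwise_and(mov_mask, skin_mask)       # I_mov-skin
        n, labels, stats, _ = cv2.connectedComponentsWithStats(mov_skin)
        if n <= 1:
            return None                                       # no foreground
        # stats[0] is the background; pick the largest remaining component B.
        idx = 1 + np.argmax(stats[1:, cv2.CC_STAT_AREA])
        return (labels == idx).astype(np.uint8) * 255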
c) Extracting gesture motion and shape features. Let L denote the time length of the gesture, s[t] the shape feature of the t-th frame, and m[t] the motion feature between frames t and t+1. An 8-dimensional feature vector f[t] = [m[t], s[t]]^T is defined to uniformly describe the apparent features of the gesture, forming a time-scale-invariant feature sequence and realizing time-scale-invariant feature extraction and matching. The spatio-temporal apparent feature of a gesture is constructed as A = [f[0], f[1], ..., f[L-2]]^T, describing the change of the feature vector f[t] over time. The time length L of the gesture motion is normalized, and the probability function of observing the j-th characteristic parameter x_{i,j} in the i-th state is constructed:
P(x_{i,j}\mid\lambda) = \frac{1}{\sqrt{2\pi}\,\sigma_{i,j}} \exp\left(-\frac{(x_{i,j}-\mu_{i,j})^2}{2\sigma_{i,j}^2}\right)
where x_{i,j} denotes the time-normalized motion and shape features of the gesture sequence, \lambda denotes a gesture class model, \mu denotes the mathematical expectation matrix of each characteristic parameter, and \sigma the standard deviation. For the gesture model \lambda(\mu, \sigma), the probability function of observing the complete gesture sequence X can then be constructed:
P(X\mid\lambda) = \prod_{i=1}^{m}\prod_{j=1}^{n} P(x_{i,j}\mid\lambda)
where m is the number of gesture states and n the number of gesture features. For each gesture class,
-\ln P(X\mid\lambda)
is calculated; the gesture class giving the minimum value is the recognized category.
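The recognition rule above translates directly into a negative log-likelihood comparison. The sketch below scores a time-normalized feature matrix against per-class Gaussian parameters (assumed to be learned beforehand) and returns the class with the minimum value.

    # Gesture classification by minimum negative log-likelihood (sketch).
    import numpy as np

    def neg_log_likelihood(X, mu, sigma):
        """X, mu, sigma: (m states x n features) matrices for one gesture class."""
        return np.sum(np.log(np.sqrt(2 * np.pi) * sigma)
                      + (X - mu) ** 2 / (2 * sigma ** 2))

    def classify(X, models):
        """models: {class_name: (mu, sigma)}; returns the best-matching class."""
        return min(models, key=lambda c: neg_log_likelihood(X, *models[c]))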
Finally, it is noted that the above preferred embodiments illustrate rather than limit the invention. Although the invention has been described in detail with reference to the preferred embodiments, those skilled in the art will understand that various changes in form and detail may be made without departing from the scope of the invention as defined by the appended claims.

Claims (7)

1. A hybrid enhanced intelligent cognition method for an intelligent business travel motor home, characterized in that the method specifically comprises the following steps:
S1: the driver and passengers communicate with the vehicle-mounted electronic equipment, and the dialogue state between the user and the equipment is tracked;
S2: the identity of the driver and passengers is authenticated from the voiceprint information collected during tracking;
S3: the behavior intention of the driver and passengers is analyzed;
S4: face recognition is performed on the driver and passengers for identity authentication and fatigue monitoring;
S5: gesture recognition is performed on the driver and passengers;
S6: the analysis and recognition results are synthesized;
step S1 specifically comprises:
S11: converting the speech of the driver and passengers into text data through a speech recognition engine, and performing spelling correction on the text data;
S12: performing word segmentation on the corrected text data to obtain a word sequence, and obtaining word vectors through Word2Vec;
S13: processing the word vectors through a cascade convolutional neural network to obtain the dialogue scene type;
S14: constructing a deep reinforcement learning network, and completing reinforcement learning of the dialogue state behavior strategy by iterating two independent deep reinforcement learning networks;
S15: constructing a semantic knowledge graph from triples, and computing in real time the upper and lower bounds of the association scores and embedding costs of the knowledge atoms in the graph to obtain the knowledge query result;
S16: generating the corresponding spectral parameters and fundamental frequency with a multi-space probability distribution HMM parameter generation algorithm to produce a smooth acoustic feature sequence, which is submitted to a synthesizer to generate the final speech.
2. The hybrid enhanced intelligent cognition method of the intelligent business travel motor home according to claim 1, characterized in that step S2 specifically comprises:
S21: performing pre-emphasis, framing, windowing and endpoint detection on the voice information of the driver and passengers collected during tracking;
S22: applying a Fourier transform to the processed voice information to obtain the spectral energy distribution, performing critical-band division with a triangular filter bank, and carrying out amplitude-weighted computation and a discrete cosine transform to obtain the cepstral coefficients;
S23: inputting the cepstral coefficients into a universal background model (UBM) for speaker identification to obtain the voiceprint features;
S24: matching the voiceprint templates and judging whether the voiceprint information corresponds.
3. The hybrid enhanced intelligent cognition method of the intelligent business travel motor home according to claim 2, characterized in that step S24 specifically comprises:
S241: defining the energy function
E(v,h\mid\theta) = -\sum_{i=1}^{n} a_i v_i - \sum_{j=1}^{m} b_j h_j - \sum_{i=1}^{n}\sum_{j=1}^{m} v_i W_{ij} h_j
where the vectors h and v represent the states of the hidden layer and the visible layer respectively, a and b represent the biases of the visible layer and the hidden layer, v_i and h_j are the states of the i-th visible node and the j-th hidden node, n and m are the numbers of visible and hidden nodes, W_{ij} is the connection weight between the visible and hidden layers, a_i is the i-th visible-layer bias, and b_j is the j-th hidden-layer bias;
S242: given the model parameters \theta = \{W_{ij}, a_i, b_j\}, obtaining the joint probability distribution of the state (v,h),
P(v,h\mid\theta) = \frac{1}{Z(\theta)}\, e^{-E(v,h\mid\theta)}
where
Z(\theta) = \sum_{v,h} e^{-E(v,h\mid\theta)}
is a normalization factor;
S243: obtaining two classes of high-dimensional Gaussian supervectors, the RBM-i-vectors, from the energy evolution computation of the RBM, and performing channel compensation by linear discriminant analysis;
S244: computing the cosine similarity of the two classes of compensated high-dimensional Gaussian supervectors, comparing it with a preset threshold, and judging whether the voiceprint information corresponds.
4. The hybrid enhanced intelligent cognition method of the intelligent business travel motor home according to claim 2, characterized in that step S3 specifically comprises:
S31: constructing a deep cascade network with head-shoulder recognition, and, for each image frame, intercepting a series of candidate image patches at a preset step length with a multi-scale sliding window to form the samples to be recognized;
S32: inputting the samples to be recognized into a trained head-shoulder/non-head-shoulder recognition model for recognition and classification;
S33: introducing a nonlinear model and an appearance model for association analysis;
S34: detecting the pose coincidence degree and comparing it with thresholds.
5. The hybrid enhanced intelligent cognition method of the intelligent business travel motor home according to claim 4, characterized in that step S34 specifically comprises:
S341: fusing continuously detected passenger pose frames into a complete action;
S342: designing a two-pose fusion rule,
f(i,j) = \begin{cases} 1, & S_{IoU} > T_{IoU} \text{ and } S_{his} > T_{his} \text{ and } |t_1 - t_2| < \Delta T \\ 0, & \text{otherwise} \end{cases}
where f(i,j) is the fusion function, 1 indicating the two poses can be fused and 0 that they cannot, S_{IoU} is the overlap ratio of the two pose detection boxes, T_{IoU} is the coincidence threshold, S_{his} is the histogram matching score of the two detection boxes, T_{his} is the histogram matching threshold, t_1 and t_2 are the times of the two poses, and \Delta T is the two-pose time difference threshold.
6. The hybrid enhanced intelligent cognition method of the intelligent business travel motor home according to claim 4, characterized in that the identity authentication of the driver and passengers in step S4 specifically comprises the following steps:
S401: coarsely locating the face by radial symmetry transform;
S402: obtaining the optimal iteration vector from the current point to the target point by supervised descent method (SDM) learning, and establishing a linear regression model between the shape offset \Delta x = x_* - x and the feature \phi(x) of the current shape x,
\Delta x = R\,\phi(x) + b
where x_* is the target shape vector, b is the offset, and R is the regression model;
S403: iterating with the current shape x and the deformation vector \Delta x to obtain the desired position vector,
x := x + \Delta x
where x represents the desired position vector;
S404: constructing the learning target of the SDM to obtain the true deviation of the i-th point from the actual boundary point,
\min_{R_k, b_k} \sum_{i} \left\| \Delta x_*^{k,i} - R_k\,\phi(x_k^{i}) - b_k \right\|^2
where k is the iteration number, x_k is the shape vector at the k-th iteration, x_k^{i} is the coordinate of the i-th point in the shape vector, \Delta x_*^{k,i} is the coordinate deformation of the i-th point in the shape vector, and b_k is the bias at the k-th iteration;
S405: adopting the integro-differential operator
\max_{(r,a,b)} \left| G_\sigma(r) * \frac{\partial}{\partial r} \oint_{(r,a,b)} \frac{I(x,y)}{2\pi r}\, ds \right|
to accurately locate the facial organs, where G_\sigma(r) is a smoothing function, I(x,y) is the image gray-level matrix, (a,b) is the circle center, and r is the radius;
S406: fitting the facial organs in the target region with the feature points to obtain the feature point mark positions;
S407: intercepting a sub-image in the neighborhood of each feature point, obtaining the adjacent features of the facial organs, and concatenating the adjacent features of all feature points to form the feature-point extreme learning features;
S408: counting the extreme learning features of each image partition as the training set of a generalized single-hidden-layer feedforward neural network, training an extreme learning machine, and retrieving the identity label of the specific person matching the fused features to complete identity recognition;
the fatigue monitoring of the driver and passengers in step S4 is specifically as follows:
S411: realizing 3D face modeling of the driver with a 3D face modeling method, and tracking the driver's head pose in real time according to the face recognition method of S406-S408;
S412: solving the eye positions in the 2D face image from the eye positions and head pose in the 3D face model;
S413: locating the feature points in the eye region with a constrained local model (CLM) facial point detection algorithm, and verifying the located feature points using face image texture normalization;
S414: locating the iris center according to the physiological structure of the iris;
S415: locating the upper and lower eyelids according to a parameterized template, and extracting the eye movements of the driver and passengers;
S416: extracting, from the eye movements, fatigue features related to the eyelid opening degree, the eye opening and closing speed, and the iris motion characteristics, and comparing them with the features in the waking state to obtain variation features;
S417: analyzing the correlation among the fatigue features with a Bayesian network classifier to complete fatigue monitoring of the driver and passengers.
7. The hybrid enhanced intelligent cognition method of the intelligent business travel motor home according to claim 6, characterized in that step S5 specifically comprises:
S51: collecting gesture images of the driver and passengers and converting them into an image sequence I_rgb;
S52: converting the image sequence I_rgb into a grayscale image sequence I_gray, and converting I_rgb into a binary skin-color image sequence I_skin;
S53: calculating motion parameters from the grayscale image sequence I_gray and the binary skin-color image sequence I_skin as the inter-frame motion features;
S54: normalizing the time length of the gesture motion and constructing the probability function,
P(x_{i,j}\mid\lambda) = \frac{1}{\sqrt{2\pi}\,\sigma_{i,j}} \exp\left(-\frac{(x_{i,j}-\mu_{i,j})^2}{2\sigma_{i,j}^2}\right)
where i denotes the i-th state, j denotes the j-th characteristic parameter, x_{i,j} is the time-normalized motion and shape feature of the gesture sequence, \lambda denotes the gesture class, \mu denotes the mathematical expectation matrix of each characteristic parameter, \sigma is the standard deviation, \mu_{i,j} is the mathematical expectation of the j-th characteristic parameter in the i-th state, and \sigma_{i,j} is the standard deviation of the j-th characteristic parameter in the i-th state;
S55: constructing the probability function of observing the complete gesture sequence,
P(X\mid\lambda) = \prod_{i=1}^{m}\prod_{j=1}^{n} P(x_{i,j}\mid\lambda)
where X is the complete gesture sequence observation, m is the total number of gesture states, and n is the total number of gesture features;
S56: for all gesture classes, calculating
-\ln P(X\mid\lambda)
and taking the class giving the minimum value as the gesture category.
CN201810030098.3A 2018-01-12 2018-01-12 Hybrid enhanced intelligent cognitive method of intelligent business travel motor home Active CN108256307B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810030098.3A CN108256307B (en) 2018-01-12 2018-01-12 Hybrid enhanced intelligent cognitive method of intelligent business travel motor home

Publications (2)

Publication Number Publication Date
CN108256307A CN108256307A (en) 2018-07-06
CN108256307B true CN108256307B (en) 2021-04-02

Family

ID=62727133

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810030098.3A Active CN108256307B (en) 2018-01-12 2018-01-12 Hybrid enhanced intelligent cognitive method of intelligent business travel motor home

Country Status (1)

Country Link
CN (1) CN108256307B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109034020A (en) * 2018-07-12 2018-12-18 重庆邮电大学 A kind of community's Risk Monitoring and prevention method based on Internet of Things and deep learning
CN109079813A (en) * 2018-08-14 2018-12-25 重庆四通都成科技发展有限公司 Automobile Marketing service robot system and its application method
CN109143870B (en) * 2018-10-23 2021-08-06 宁波溪棠信息科技有限公司 Multi-target task control method
CN110070884B (en) * 2019-02-28 2022-03-15 北京字节跳动网络技术有限公司 Audio starting point detection method and device
CN109918513B (en) * 2019-03-12 2023-04-28 北京百度网讯科技有限公司 Image processing method, device, server and storage medium
CN110111795B (en) * 2019-04-23 2021-08-27 维沃移动通信有限公司 Voice processing method and terminal equipment
CN112308116B (en) * 2020-09-28 2023-04-07 济南大学 Self-optimization multi-channel fusion method and system for old-person-assistant accompanying robot

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120317621A1 (en) * 2011-06-09 2012-12-13 Canon Kabushiki Kaisha Cloud system, license management method for cloud service
US9286029B2 (en) * 2013-06-06 2016-03-15 Honda Motor Co., Ltd. System and method for multimodal human-vehicle interaction and belief tracking
CN104183091B (en) * 2014-08-14 2017-02-08 苏州清研微视电子科技有限公司 System for adjusting sensitivity of fatigue driving early warning system in self-adaptive mode
CN105654753A (en) * 2016-01-08 2016-06-08 北京乐驾科技有限公司 Intelligent vehicle-mounted safe driving assistance method and system
CN105812129A (en) * 2016-05-10 2016-07-27 成都景博信息技术有限公司 Method for monitoring vehicle running state
CN106682603A (en) * 2016-12-19 2017-05-17 陕西科技大学 Real time driver fatigue warning system based on multi-source information fusion

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Nan-ning Zheng, et al. Hybrid-augmented intelligence: collaboration and cognition. Frontiers of Information Technology & Electronic Engineering, vol. 18, no. 2, pp. 153-179, 2017-02-15. *
Nan-Ning Zheng, et al. Toward Intelligent Driver-Assistance and Safety Warning Systems. IEEE Intelligent Systems, pp. 8-11, 2004-04-30. *
Srikanth Saripalli, et al. Visually Guided Landing of an Unmanned Aerial Vehicle. IEEE Transactions on Robotics and Automation, vol. 19, no. 3, pp. 371-380, 2003-06-25. *
Li Yuanyuan, et al. Research on a cognitive map mining system based on data resources. Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition), vol. 23, no. 3, pp. 374-378, 2011-06-30. *
Li Yuanyuan, et al. Improvement of the media access control mechanism for vehicular ad hoc networks. Microelectronics, vol. 41, no. 3, pp. 372-376, 380, 2011-06-30. *

Also Published As

Publication number Publication date
CN108256307A (en) 2018-07-06


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant