CN108256307A - Hybrid-augmented intelligent cognition method for an intelligent business recreational vehicle - Google Patents

Info

Publication number
CN108256307A
Authority
CN
China
Prior art keywords
driver
passenger
characteristic
intelligent
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810030098.3A
Other languages
Chinese (zh)
Other versions
CN108256307B (en)
Inventor
朱智勤
王冠
李鹏华
李嫄源
秦石磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications
Priority to CN201810030098.3A
Publication of CN108256307A
Application granted
Publication of CN108256307B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/30 Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F21/31 User authentication
    • G06F21/32 User authentication using biometric data, e.g. fingerprints, iris scans or voiceprints
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017 Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/59 Context or environment of the image inside of a vehicle, e.g. relating to seat occupancy, driver state or inner lighting conditions
    • G06V20/597 Recognising the driver's state or behaviour, e.g. attention or drowsiness
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G06V40/171 Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/18 Eye characteristics, e.g. of the iris
    • G06V40/19 Sensors therefor
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/02 Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/04 Training, enrolment or model building

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Health & Medical Sciences (AREA)
  • Acoustics & Sound (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Security & Cryptography (AREA)
  • Ophthalmology & Optometry (AREA)
  • Computer Hardware Design (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The present invention relates to a hybrid-augmented intelligent cognition method for an intelligent business recreational vehicle (RV). The method comprises the following steps: S1: the driver and passengers communicate with the in-vehicle electronic devices, and the dialogue state between the users and the devices is tracked; S2: the driver and passengers are authenticated using the voiceprint information collected during tracking; S3: the behavioral intentions of the driver and passengers are analyzed; S4: face recognition is performed on the driver and passengers for identity authentication and fatigue monitoring; S5: gesture recognition is performed on the driver and passengers; S6: the results are combined to obtain the overall analysis and recognition result. By introducing the role of humans and human cognitive models into the business RV, the invention forms a stronger form of intelligence, improves the machine's ability to understand and adapt to the environment inside and outside the vehicle and to complete complex spatio-temporally correlated tasks, and enhances the functions and spatial experience of the business RV.

Description

Hybrid-augmented intelligent cognition method for an intelligent business recreational vehicle
Technical field
The invention belongs to the field of hybrid-augmented intelligent cognition within artificial intelligence, and relates to a hybrid-augmented intelligent cognition method for an intelligent business recreational vehicle (RV).
Background art
China has entered an era of mass self-drive tourism. Business RV tourism is increasingly popular with consumers, and the government is vigorously developing the business RV industry to drive the coordinated development of the tourism and automobile industries. At the same time, China's automobile industry is entering the era of "intelligent connected" vehicles. The RV combines the intelligent connected vehicle with the smart home; it reflects the entry of artificial intelligence into everyday life and a deep integration of technology and social life. In making the business RV intelligent, recognizing, reasoning about, and understanding the information carried by the multimodal data of the RV body system is one of the key problems to be solved.
Summary of the invention
In view of this, the object of the present invention is to provide a hybrid-augmented intelligent cognition method for an intelligent business RV. By building a unified cross-media semantic representation of the multi-dimensional intelligent space of the business RV, and by introducing the role of humans and human cognitive models into the business RV, the method forms a stronger form of intelligence, improves the machine's ability to understand and adapt to the environment inside and outside the vehicle and to complete complex spatio-temporally correlated tasks, and enhances the functions and spatial experience of the business RV.
To achieve the above object, the present invention provides the following technical solution:
A hybrid-augmented intelligent cognition method for an intelligent business RV, comprising the following steps:
S1: the driver and passengers communicate with the in-vehicle electronic devices, and the dialogue state between the users and the devices is tracked;
S2: the driver and passengers are authenticated using the voiceprint information collected during tracking;
S3: the behavioral intentions of the driver and passengers are analyzed;
S4: face recognition is performed on the driver and passengers for identity authentication and fatigue monitoring;
S5: gesture recognition is performed on the driver and passengers;
S6: the results are combined to obtain the overall analysis and recognition result.
Further, step S1 specifically comprises:
S11: converting the speech of the driver and passengers into text data with a speech recognition engine, and correcting spelling errors in the text data;
S12: segmenting the corrected text data into a word sequence, and obtaining word vectors with Word2Vec;
S13: processing the word vectors with a cascaded convolutional neural network to obtain the dialogue scene type;
S14: building a deep reinforcement learning network, in which two independent deep reinforcement learning networks are iterated to complete reinforcement learning of the dialogue-state behavior policy;
S15: building a semantic knowledge graph from triples, and computing in real time the upper and lower bounds of the knowledge-atom relation scores and embedding costs in the graph to obtain the knowledge query result set;
S16: generating the corresponding spectral parameters and fundamental frequency with a multi-space probability distribution HMM parameter generation algorithm to produce a smooth acoustic feature sequence, which is submitted to a synthesizer to generate the final speech.
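The triple-based query of S15 can be illustrated with a small sketch. It is not the patent's algorithm: the triples, the per-triple `score` values, and the name `top_k_query` are hypothetical, and the bound computation on relation scores and embedding costs is not modelled. It only shows a top-k lookup over triples without a prebuilt index.

```python
import heapq
from typing import NamedTuple

class Triple(NamedTuple):
    head: str
    relation: str
    tail: str
    score: float  # hypothetical relation score; higher means more relevant

def top_k_query(triples, head, relation, k=2):
    """Scan all triples (no index) and return the k best-scoring tails."""
    matches = (t for t in triples if t.head == head and t.relation == relation)
    best = heapq.nlargest(k, matches, key=lambda t: t.score)
    return [(t.tail, t.score) for t in best]

# Hypothetical in-vehicle knowledge atoms, for illustration only.
KG = [
    Triple("air_conditioner", "supports", "cooling", 0.9),
    Triple("air_conditioner", "supports", "heating", 0.7),
    Triple("air_conditioner", "supports", "dehumidify", 0.4),
    Triple("awning", "supports", "extend", 0.8),
]
```

Calling `top_k_query(KG, "air_conditioner", "supports", 2)` returns the two highest-scoring tails for that head and relation.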
Further, step S2 specifically comprises:
S21: applying pre-emphasis, framing and windowing, and endpoint detection to the speech collected from the driver and passengers during tracking;
S22: applying a Fourier transform to the processed speech to obtain the spectral energy distribution, dividing it into critical bands with a triangular filter bank, and applying amplitude weighting and a discrete cosine transform to obtain the cepstral coefficients;
S23: feeding the cepstral coefficients into the speaker-recognition universal background model (UBM) to obtain voiceprint features;
S24: performing voiceprint template matching to determine whether the voiceprint information corresponds.
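The front end of S21 and S22 corresponds to the usual MFCC pipeline, sketched below with NumPy. The frame length, hop, FFT size, filter count, and the 0.97 pre-emphasis coefficient are common defaults assumed for illustration, not values given in the patent, and endpoint detection is left out.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):      # rising edge of the triangle
            fb[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):     # falling edge of the triangle
            fb[m - 1, k] = (right - k) / max(right - center, 1)
    return fb

def mfcc(signal, sr=16000, frame_len=400, hop=160, n_fft=512,
         n_filters=26, n_ceps=13):
    # Pre-emphasis boosts the high-frequency part of the spectrum.
    x = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Framing followed by Hamming windowing.
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = x[idx] * np.hamming(frame_len)
    # Power spectrum via the FFT (spectral energy distribution).
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Triangular filter bank energies, then log compression.
    log_e = np.log(power @ mel_filterbank(n_filters, n_fft, sr).T + 1e-10)
    # DCT-II along the filter axis yields the cepstral coefficients.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_filters))
    return log_e @ basis.T
```

Each row of the result is the cepstral vector of one frame; these vectors are what a UBM would consume in S23.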
Further, step S24 specifically comprises:
S241: defining the energy function as
[formula not reproduced in this text]
where h and v are vectors representing the states of the hidden layer and the visible layer respectively, a and b are the biases of the visible layer and the hidden layer, v_i and h_j are the states of the i-th visible node and the j-th hidden node, m is the number of hidden nodes, w_ij is the connection weight between the visible layer and the hidden layer, i indexes the visible nodes, j indexes the hidden nodes, a_i is the bias of the i-th visible node, and b_j is the bias of the j-th hidden node;
S242: given the model θ = {w_ij, a_i, b_j}, obtaining the joint probability distribution of the state (v, h),
[formula not reproduced in this text]
where [symbol not reproduced in this text] is the normalization factor;
S243: computing the high-dimensional Gaussian supervectors of the two classes of RBM-i-vector through the energy evolution of the RBM, and applying linear discriminant analysis for channel compensation;
S244: computing the cosine similarity of the two compensated high-dimensional Gaussian supervectors and comparing it with a preset threshold to determine whether the voiceprint information corresponds.
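Because the formulas of S241 and S242 are not reproduced in this text, the sketch below assumes the standard binary RBM form, E(v, h) = -a·v - b·h - vᵀWh and P(v, h) = exp(-E(v, h)) / Z, which matches the symbol definitions given above. The function names are illustrative, and the brute-force normalization is only practical for very small n and m.

```python
import itertools
import numpy as np

def rbm_energy(v, h, W, a, b):
    """Assumed standard binary-RBM energy: E = -a.v - b.h - v^T W h."""
    return -(a @ v) - (b @ h) - (v @ W @ h)

def joint_probability(v, h, W, a, b):
    """P(v, h) = exp(-E(v, h)) / Z, with Z summed over all binary states."""
    n, m = W.shape
    z = sum(
        np.exp(-rbm_energy(np.array(vs), np.array(hs), W, a, b))
        for vs in itertools.product([0, 1], repeat=n)
        for hs in itertools.product([0, 1], repeat=m)
    )
    return np.exp(-rbm_energy(v, h, W, a, b)) / z
```

Here `W` has shape (n, m) for n visible and m hidden nodes, and `a` and `b` are the visible and hidden bias vectors.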
Further, step S3 specifically comprises:
S31: building a deep cascade network with head-shoulder recognition, and cropping a series of candidate patches from every frame with a multi-scale sliding window at a predetermined stride to form the samples to be recognized;
S32: feeding the samples into the trained head-shoulder/non-head-shoulder recognition model for classification;
S33: introducing a nonlinear motion model and an appearance model for association analysis;
S34: detecting the posture overlap and comparing it with thresholds.
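The candidate generation in S31 can be sketched as a generator over scales and positions. The base window size, scale set, and stride below are hypothetical values chosen for illustration; the patent does not specify them.

```python
def sliding_windows(img_w, img_h, base=32, scales=(1.0, 1.5, 2.0), stride=16):
    """Yield (x, y, w, h) candidate patches for every scale at a fixed stride."""
    for s in scales:
        size = int(round(base * s))
        # Only positions where the whole window fits inside the image.
        for y in range(0, img_h - size + 1, stride):
            for x in range(0, img_w - size + 1, stride):
                yield (x, y, size, size)
```

Each yielded patch would then be cropped from the frame and passed to the head-shoulder classifier of S32.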
Further, step S34 specifically comprises:
S341: fusing the consecutively detected passenger posture boxes into one complete action;
S342: designing a fusion rule for two postures,
[formula not reproduced in this text]
where f(i, j) is the fusion function, 1 means merge, and 0 means do not merge; [symbol and formula not reproduced in this text] is the overlap of the two posture detection boxes, T_IoU is the overlap threshold, S_his is the histogram matching score of the two detection boxes, T_his is the histogram matching threshold, t1 and t2 are the detection times, and ΔT is the time-difference threshold for the two postures.
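A possible reading of the fusion rule in S342 is sketched below. The comparison directions (overlap and histogram score above their thresholds, time difference below its threshold) are assumptions, since the formula itself is not reproduced here; the default thresholds 0.5, 35, and 25 are the values stated in the embodiment. The histogram matching score is taken as a precomputed input.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def should_merge(box_i, box_j, s_his, t_i, t_j,
                 t_iou=0.5, t_his=35.0, t_delta=25.0):
    """Return 1 to merge two posture boxes, 0 otherwise (assumed rule)."""
    ok = (iou(box_i, box_j) > t_iou          # boxes overlap enough
          and s_his > t_his                  # appearance histograms match
          and abs(t_i - t_j) < t_delta)      # detections are close in time
    return int(ok)
```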
Further, the identity authentication of the driver and passengers in step S4 specifically comprises:
S401: coarsely locating the face with a radial symmetry transform;
S402: learning, with the supervised descent method (SDM), the optimal iteration vector from the current point to the target point, and establishing a linear regression model between the shape offset Δx = x* - x and the feature of the current shape x,
[formula not reproduced in this text]
where x* is the target shape, b is the bias, and [symbol not reproduced in this text] is the regression model;
S403: iteratively obtaining the desired position vector from the current shape x and the deformation vector Δx:
x := x + Δx
where x denotes the desired position vector;
S404: constructing the learning objective of SDM to obtain the true deviation between the i-th point and the actual boundary point,
[formula not reproduced in this text]
where k is the iteration number, x_k is the shape vector at the k-th iteration, [symbol not reproduced in this text] is the coordinate of the i-th point in that shape vector, [symbol not reproduced in this text] is the coordinate deformation of the i-th point in that shape vector, and b_k is the bias at the k-th iteration;
S405: using the integro-differential operator
[formula not reproduced in this text]
to locate the facial organs precisely, where G_σ(r) is a smoothing function, I(x, y) is the image gray-level matrix, (a, b) is the circle center, and r is the radius;
S406: fitting the facial organs in the target region with feature points to obtain the feature-point positions;
S407: cropping a sub-image in the neighborhood of each feature point to obtain local facial features, and concatenating the neighborhood features of all feature points into a feature-point extreme learning feature;
S408: taking the extreme learning features of each image partition as the training set of a generalized single-hidden-layer feedforward neural network, training extreme learning machines, and searching for the identity label of the specific person that matches the fused features to complete identity authentication;
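The SDM steps S402 to S404 can be sketched as a least-squares fit of one regression stage followed by the update x := x + Δx. The notation Δx ≈ R·φ(x) + b, the names `fit_stage` and `sdm_refine`, and the use of a generic `feature_fn` in place of real image features are assumptions made for illustration.

```python
import numpy as np

def fit_stage(phi, delta):
    """Least-squares fit of delta ~ phi @ R.T + b for one SDM stage.

    phi:   (N, d) features extracted at the current shapes
    delta: (N, p) true offsets x* - x for the same samples
    """
    A = np.hstack([phi, np.ones((phi.shape[0], 1))])  # append bias column
    sol, *_ = np.linalg.lstsq(A, delta, rcond=None)
    return sol[:-1].T, sol[-1]                        # R (p, d), b (p,)

def sdm_refine(x0, feature_fn, regressors):
    """Apply x := x + (R_k @ phi(x) + b_k) for each learned stage k."""
    x = np.asarray(x0, dtype=float)
    for R, b in regressors:
        x = x + R @ feature_fn(x) + b
    return x
```

In practice several stages are trained in sequence, each on the shapes produced by the previous one, so that the cascade moves the landmarks progressively closer to the target shape.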
The fatigue monitoring of the driver and passengers in step S4 specifically comprises:
S411: building a 3D face model of the driver, and tracking the driver's head pose in real time with the face recognition method of S406-S408;
S412: solving the eye positions in the 2D face image from the eye positions in the 3D face model and the head pose;
S413: locating the feature points in the eye region with the facial landmark detection algorithm (CLM), and verifying the located feature points with face-image texture normalization;
S414: locating the iris center from the physiological structure of the iris;
S415: locating the upper and lower eyelids with a parameterized template, and extracting the eye movements of the driver and passengers;
S416: extracting from the eye movements the fatigue-related features of eyelid opening degree, eye opening and closing speed, and iris motion, and comparing them with the features in the awake state to obtain variation features;
S417: analyzing the correlations among the fatigue features with a Bayesian network classifier to complete the fatigue monitoring of the driver and passengers.
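Steps S416 and S417 can be illustrated with a simplified sketch. The variation features are computed as relative changes against the awake-state baseline, and a Gaussian naive Bayes classifier stands in for the Bayesian network classifier; naive Bayes is the simplest such network and ignores the dependencies between features that the method analyzes. The feature names and the `PARAMS` values are hypothetical.

```python
import math

def variation_features(current, awake_baseline):
    """Relative change of each fatigue feature against the awake state."""
    return {k: (current[k] - awake_baseline[k]) / awake_baseline[k]
            for k in awake_baseline}

def gaussian_pdf(x, mean, std):
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def fatigue_posterior(var_feats, params, prior_fatigued=0.5):
    """Naive Bayes posterior P(fatigued | variation features)."""
    like_f, like_a = prior_fatigued, 1.0 - prior_fatigued
    for name, value in var_feats.items():
        like_f *= gaussian_pdf(value, *params[name]["fatigued"])
        like_a *= gaussian_pdf(value, *params[name]["awake"])
    return like_f / (like_f + like_a)

# Hypothetical (mean, std) of each variation feature per class.
PARAMS = {
    "eyelid_opening": {"fatigued": (-0.4, 0.15), "awake": (0.0, 0.15)},
    "eye_speed": {"fatigued": (-0.3, 0.15), "awake": (0.0, 0.15)},
    "iris_motion": {"fatigued": (-0.3, 0.2), "awake": (0.0, 0.2)},
}
```

A posterior close to 1 would indicate fatigue; in a real system the class parameters would be learned from labelled driving data.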
Further, step S5 specifically comprises:
S51: capturing gesture images of the driver and passengers and converting them into an image sequence I_rgb;
S52: converting the image sequence I_rgb into a grayscale image sequence I_gray and into a binary skin-color information image sequence I_skin;
S53: computing motion parameters from I_gray and I_skin as the inter-frame motion features;
S54: normalizing the time span of the gesture motion and constructing the probability function
[formula not reproduced in this text]
where i denotes the i-th state, j denotes the j-th feature parameter, x_{i,j} is the motion and shape feature of the time-normalized gesture sequence, λ denotes the gesture class, μ is the matrix of expected values of the feature parameters, σ is the standard deviation, μ_{i,j} is the expected value of the j-th feature parameter of the i-th state, and σ_{i,j} is the standard deviation of the j-th feature parameter of the i-th state;
S55: constructing the probability function of observing the complete gesture sequence,
[formula not reproduced in this text]
where X is the observation of the complete gesture sequence, m is the total number of gesture states, and n is the total number of gesture features;
S56: for each gesture class, computing
[formula not reproduced in this text]
the class that yields the minimum value is the recognized gesture class.
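Steps S54 to S56 describe a multi-state Gaussian model in which the class with the minimum value is selected. One reading consistent with that is to minimize the negative log-likelihood of independent Gaussians over the m states and n features, sketched below. That reading, and the names `neg_log_likelihood` and `classify_gesture`, are assumptions, since the formulas are not reproduced in this text.

```python
import numpy as np

def neg_log_likelihood(X, mu, sigma):
    """-ln P(X | class) for an m-state, n-feature independent Gaussian model.

    X, mu, sigma are (m, n) arrays: observed features, per-state means,
    and per-state standard deviations.
    """
    return float(np.sum(0.5 * ((X - mu) / sigma) ** 2
                        + np.log(sigma * np.sqrt(2 * np.pi))))

def classify_gesture(X, models):
    """Pick the gesture class with the smallest negative log-likelihood."""
    return min(models, key=lambda name: neg_log_likelihood(X, *models[name]))
```

`models` maps each gesture class name to its `(mu, sigma)` pair, which would be estimated from time-normalized training sequences.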
The beneficial effects of the present invention are as follows. The invention is a deep-learning-based hybrid-augmented intelligent cognition technique for business RVs.
First, to meet the need for intelligent voice interaction between the driver and passengers and facilities such as the vehicle electronics and in-vehicle infotainment, this patent designs a multi-party human-vehicle dialogue model that lets the driver and passengers communicate with the on-board equipment by voice. The module comprises a voice data collection layer, a preprocessing layer, a semantic parsing layer, a dialogue management layer, a knowledge reasoning layer, and a voice output layer. The voice data is analyzed and processed layer by layer to realize spoken communication between people and the on-board system.
Second, the invention addresses driver identification and fatigue detection. Driver authentication comprises voiceprint recognition and face recognition, which identify the person through an algorithm-driven model and a BP network model respectively. Identifying the driver safeguards the RV and the property inside it.
Third, driver fatigue detection, which ensures driving safety, is required of RV on-board equipment. Fatigue detection extracts head pose and eye movement information, compares the computed feature values with those in the awake state to obtain variation features, and judges fatigue from the values of those variation features.
Fourth, this patent also proposes its own approach to driver behavior analysis and gesture recognition. A deep cascade network designed around the training of a nonlinear motion model and an appearance model analyzes the driver's behavioral intention to assist driving. Gesture recognition is part of in-vehicle human-vehicle interaction and is intended to simplify the driver's operations: a multi-state model extracts and analyzes the motion and color information of the hand to interact with the driver and meet the driver's needs. The proposed method enriches the experience of the driver and passengers while safeguarding property and driving safety.
Description of the drawings
To make the purpose, technical solution, and beneficial effects of the present invention clearer, the following drawings are provided:
Fig. 1 shows the multi-turn dialogue model of the embodiment, based on a POMDP policy and a triple knowledge graph;
Fig. 2 is a schematic of total variability factor modeling based on the restricted Boltzmann machine in the embodiment;
Fig. 3 shows the structure of the three-level deep cascade network of the embodiment;
Fig. 4 shows multi-target tracking with online-learned nonlinear motion patterns and a robust appearance model in the embodiment;
Fig. 5 shows the facial feature extraction regions and localization results of the embodiment;
Fig. 6 shows the fatigue state monitoring of the driver and passengers in the embodiment;
Fig. 7 shows the spatio-temporal representation of gesture features in the embodiment.
Specific embodiments
The preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings.
The present invention comprises the following parts:
1. Human-vehicle intelligent voice interaction. To meet the need for intelligent voice interaction between the driver and passengers and facilities such as the vehicle electronics and in-vehicle infotainment, dialogue state tracking and management techniques are used to design a multi-turn human-vehicle dialogue model based on a POMDP policy and a triple knowledge graph, so that the driver and passengers can communicate smoothly with the on-board equipment. As shown in Fig. 1, the model is organized as follows:
A) Data collection layer: the user's speech is converted into text data by a speech recognition engine, and word spelling errors are corrected.
B) Preprocessing layer: the corrected text data is segmented into a word sequence; part-of-speech tagging, named entity recognition, coreference resolution, and dependency parsing are completed at the lexical and semantic levels; and word vectors are obtained with Word2Vec.
C) Semantic parsing layer: the encoded and merged word vectors are submitted to a cascaded convolutional neural network, which completes preliminary semantic parsing and yields the dialogue scene type.
D) Dialogue management layer: a dialogue question guidance strategy is designed, dialogue state tracking is implemented in the POMDP model, and a deep reinforcement learning network (DQN) is built in which two independent Q networks are iterated to complete reinforcement learning of the dialogue-state behavior policy.
E) Knowledge reasoning layer: a semantic knowledge graph is built from triples. Without using an index, the upper and lower bounds of the knowledge-atom relation scores and embedding costs in the graph are computed in real time to derive the top-k knowledge query result set. After the single-scene and cross-scene knowledge-atom combination sets are determined, a scoring function is designed for each, and a multi-column convolutional network drives training on the cross-domain combination sets to compute the cross-domain knowledge fusion score.
F) Voice output layer: the input text is analyzed, the corresponding spectral parameters and fundamental frequency are generated with a multi-space probability distribution HMM parameter generation algorithm to produce a smooth acoustic feature sequence, and the sequence is submitted to a synthesizer to generate the final speech.
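The two-network iteration in the dialogue management layer can be illustrated with a tabular stand-in for the Q networks: one table is updated online while a second, periodically synchronized table supplies the bootstrap target. The integer state and action indices, the learning rate, and the discount factor are hypothetical, and a real DQN would replace the tables with neural networks.

```python
import numpy as np

def td_update(q_online, q_target, s, a, r, s_next, done, alpha=0.1, gamma=0.9):
    """One Q-learning step; the target table supplies the bootstrap value."""
    target = r if done else r + gamma * np.max(q_target[s_next])
    q_online[s, a] += alpha * (target - q_online[s, a])

def sync(q_online, q_target):
    """Copy the online table into the target table (done every few steps)."""
    q_target[...] = q_online
```

Here a state could stand for a tracked dialogue state and an action for a system response such as asking a clarifying question.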
2. Driver authentication by voiceprint. As shown in Fig. 2, to meet the driver identity authentication requirement in the intelligent safety field of the business RV, restricted Boltzmann machine feature extraction under the total variability factor replaces i-vector feature extraction, a UBM driven by the EM algorithm is designed, and voiceprint authentication is realized under a high-dimensional Gaussian component representation.
A) Acquire and process the speech segment. Pre-emphasis, framing and windowing, and endpoint detection are applied to the acquired speech segment. A Fourier transform of the signal gives the spectral energy distribution, a triangular filter bank divides it into critical bands, and amplitude weighting followed by a discrete cosine transform gives the cepstral coefficients (MFCC).
B) Obtain voiceprint features. The cepstral coefficients are submitted to the UBM trained by the EM algorithm to obtain the probability scores of the voiceprint features, which are matched against the templates of the corresponding Gaussian components.
C) Voiceprint template matching. A restricted Boltzmann machine (RBM) consisting of a visible layer of n nodes and a hidden layer of m nodes is designed, and its energy function is defined as [formula not reproduced in this text], where the vectors h and v represent the states of the hidden layer and the visible layer respectively, a and b are the biases of the visible layer and the hidden layer, and v_i and h_j are the states of the i-th visible node and the j-th hidden node. Given the model θ = {w_ij, a_i, b_j}, the joint probability distribution of the state (v, h) is [formula not reproduced in this text], where [symbol not reproduced in this text] is the normalization factor.
D) Decide the speaker. The high-dimensional Gaussian supervectors of the two classes of RBM-i-vector are computed through the energy evolution of the RBM, and linear discriminant analysis (LDA) is applied for channel compensation. The cosine similarity of the two compensated RBM-i-vectors is computed and compared with a preset threshold to decide whether the voiceprint belongs to the specific speaker.
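The final decision in step D) reduces to cosine scoring against a threshold, sketched below. The threshold value 0.7 is hypothetical, and the vectors are assumed to have already been channel-compensated (the LDA step is not shown).

```python
import numpy as np

def cosine_score(enrolled, test):
    """Cosine similarity between the enrolled and test supervectors."""
    return float(enrolled @ test / (np.linalg.norm(enrolled) * np.linalg.norm(test)))

def same_speaker(enrolled, test, threshold=0.7):
    """Accept when the cosine similarity reaches the preset threshold."""
    return cosine_score(enrolled, test) >= threshold
```

Because cosine similarity ignores vector length, the decision depends only on the direction of the two supervectors.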
3. Driver and passenger behavior analysis. To meet the need for intention analysis of the driver and passengers in the intelligent behavior interaction field of the business RV, nonlinear motion pattern learning and multiple-instance learning of the appearance model are used, and a deep cascade network with head-shoulder recognition is designed, to realize behavioral intention analysis driven by a hierarchical-association multi-target tracking learning strategy.
A) Build the deep cascade network and screen samples. A deep cascade network with head-shoulder recognition (HsNet) is built. For every frame, a multi-scale sliding window crops a series of candidate patches at a predetermined stride to form the samples to be recognized. These samples are fed into the pre-trained head-shoulder/non-head-shoulder recognition model HsNet, a three-level CNN cascade network shown in Fig. 3, for classification. During classification, a patch judged to be a negative sample is discarded immediately, and the remaining samples pass to the next level of the network for stricter classification, so that the three CNN levels are applied in turn. The output of the third level decides whether an image patch belongs to a head-shoulder region. The head-shoulder box height is then extended to 3 times that of the corresponding sliding window to obtain the full-body box of the detected passenger. Because several detection boxes can form for the same passenger, non-maximum suppression finally removes the redundant boxes so that each position keeps only the single most probable detection box, which is the passenger detection result.
B) Introduce the nonlinear model and the appearance model for association analysis. Nonlinear motion pattern learning and multiple-instance learning of the appearance model are introduced into the hierarchical-association multi-target tracking learning strategy. Reliable low-level association of the detected objects forms track fragments; online learning of nonlinear motion patterns and multiple-instance learning of the appearance model then connect the track fragments effectively to obtain reliable object trajectories. Parameters such as speed, direction, and distance extracted from the object trajectories serve as features, and several features are combined into higher-level semantics that describe the object's behavior, from which the behavioral intention of the driver and passengers is judged.
C) Detect the posture overlap and compare it with thresholds. As shown in Fig. 4, to improve the robustness of behavior detection, the consecutively detected passenger posture boxes are fused into one complete action. The fusion rule for two postures is:
[formula not reproduced in this text]
where f(i, j) is the fusion function, 1 means merge, and 0 means do not merge; [symbol and formula not reproduced in this text] is the overlap of the two posture detection boxes, T_IoU = 0.5 is the overlap threshold, S_his is the histogram matching score of the two detection boxes, T_his = 35 is the histogram matching threshold, and T_Δ = 25 is the time-difference threshold for the two postures.
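The post-processing described in step A), extending each head-shoulder box to a full-body box and removing redundant boxes by non-maximum suppression, can be sketched as follows. The greedy NMS variant, the 0.5 overlap threshold, and the downward extension of the box from its top edge are assumptions made for illustration.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def nms(detections, iou_threshold=0.5):
    """Greedy NMS over (box, score) pairs; keeps the best box per location."""
    kept = []
    for box, score in sorted(detections, key=lambda d: d[1], reverse=True):
        if all(iou(box, k[0]) <= iou_threshold for k in kept):
            kept.append((box, score))
    return kept

def head_shoulder_to_body(box):
    """Extend a head-shoulder box to a full-body box by tripling its height."""
    x1, y1, x2, y2 = box
    return (x1, y1, x2, y1 + 3 * (y2 - y1))
```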
4th, driver and passenger's recognition of face and fatigue monitoring.Driver and passenger in commercial Sojourn house car intelligent and safe field Supervision gradient descent algorithm and CLM location algorithms is respectively adopted in authentication and fatigue state monitoring requirements, and design is based on the limit The identity that the extensive Single hidden layer feedforward neural networks of learning machine complete specific driver differentiates and based on face 3D modeling Matching template realizes the fatigue state monitoring of driver.
A) identity authentication based on face characteristic:By radial symmetry transform coarse positioning face, declined using supervision gradient Method (SDM) study obtains current point to the optimal iterative vectorized of target point, establishes shaped Offset amount Δ x=x*- x and current shape The feature of xBetween linear regression model (LRM)Then current shape x and deformation vectors Δ are utilized X iteration obtains desired position vector x:=x+ Δs x.Construct the learning objective of SDM:
Wherein, k is iterations, xkShape vector when representing to iterate to kth time,Represent i-th in the shape vector The coordinate of a point.Successive ignition study is carried out, obtains the i-th point of true deviation with actual boundary point.Then calculus is used Operator:
It is accurately positioned the organs, wherein G such as eyes, nose, the mouth in faceσ(r) it is smooth function, I (x, y) is image ash Matrix is spent, (a, b) is the center of circle, and r is radius.The results are shown in Figure 5 for Face detection.
It designs extensive Single hidden layer feedforward neural networks and carries out recognition of face, which rotates dull grey scale change and angle With invariance, there is insensitivity to image change caused by uneven illumination.In identification process, by the people in target area Face is fitted with characteristic point, obtains characteristic point mark position.Subgraph is intercepted in each characteristic point contiguous range, is obtained Human face adjacent features, finally by all characteristic point adjacent features characteristic point limit learning characteristic in series.It counts respectively Training set of the limit learning characteristic of each picture portion as extensive Single hidden layer feedforward neural networks, the multiple limit study of training Machine, the output of combination feedforward neural network is as a result, and under the driving of optimal integrated classifier output decision, search for fusion feature Particular person identity label completes identity authentication.The results are shown in Figure 5 for the extraction of face limit learning characteristic.
b) Fatigue monitoring based on facial features: A 3D face model of the driver is built using 3D face modeling, and the driver's head pose is tracked in real time in combination with the face recognition method above. The eye positions in the 2D face image are solved indirectly from the eye positions in the 3D face model and the head pose. Feature points in the eye region are located with the CLM algorithm, and the located feature points are verified using normalized face-image texture. The iris center is located from the physiological structure of the iris, which overcomes differences in iris imaging under different illumination conditions. The upper and lower eyelids are located with a parameterized template, which completes the extraction of the driver's eye movements. From the eye-movement characteristics, the fatigue-related features of eyelid opening degree, eye opening and closing speed, and iris movement are extracted and compared with the feature values in the awake state to obtain variation features. A Bayesian network classifier is built to analyze the correlations among the fatigue features and determine the driver's fatigue state, as shown in Fig. 6.
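The comparison against awake-state values can be illustrated with a small sketch. The text does not say how the comparison is computed, so the relative change used here is an assumption, and the function name and feature names are hypothetical. The Bayesian network classifier that consumes these features is not shown.

```python
def variation_features(current, awake_baseline):
    """Relative change of each fatigue feature with respect to its
    awake-state value (assumed form of the 'variation feature')."""
    return {
        name: (current[name] - awake_baseline[name]) / awake_baseline[name]
        for name in awake_baseline
    }


# Hypothetical measurements: eyelid opening in pixels, closing speed in px/frame.
awake = {"eyelid_opening": 10.0, "eye_closing_speed": 4.0}
now = {"eyelid_opening": 8.0, "eye_closing_speed": 3.0}
changes = variation_features(now, awake)
```

Negative values indicate that the eyes are less open or moving more slowly than in the awake state.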
5. Driver and passenger gesture recognition. To meet the demand for typical human-vehicle gesture interaction between the driver and passengers and facilities such as the in-vehicle electronics and the in-vehicle entertainment system, a multi-state Gaussian probability model for complex backgrounds is designed, and driver and passenger gesture recognition is performed with hand segmentation that combines the motion information and color information of the hand, as shown in Fig. 7.
a) Image conversion and processing. A color image sequence $I_{rgb}$ is captured. On the one hand, it is converted into a 256-level grayscale image sequence $I_{gray}$ for the analysis of motion parameters; on the other hand, according to the distribution of the RGB colors in HSI space, it is converted into a binary skin-color information image sequence $I_{skin}$, which is divided into skin-color regions and non-skin-color regions.
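The two conversions can be sketched per pixel as follows. The grayscale weights are the common luma coefficients and the RGB-to-HSI formulas are the textbook ones; the hue and saturation ranges used to call a pixel "skin" are hypothetical placeholders, since the text gives no thresholds.

```python
import math

# Hypothetical skin thresholds in HSI space (not taken from the source).
HUE_RANGE = (0.0, 50.0)   # degrees
SAT_RANGE = (0.1, 0.6)


def to_gray(pixel):
    """Map an (R, G, B) pixel with 0..255 channels to a 0..255 gray level."""
    r, g, b = pixel
    return int(round(0.299 * r + 0.587 * g + 0.114 * b))


def to_hsi(pixel):
    """Return (hue in degrees, saturation, intensity) for an RGB pixel."""
    r, g, b = (c / 255.0 for c in pixel)
    intensity = (r + g + b) / 3.0
    saturation = 0.0 if intensity == 0 else 1.0 - min(r, g, b) / intensity
    denom = math.sqrt(max(0.0, (r - g) ** 2 + (r - b) * (g - b)))
    if denom == 0:
        hue = 0.0  # gray pixel: hue is undefined, use 0 by convention
    else:
        cos_theta = max(-1.0, min(1.0, 0.5 * ((r - g) + (r - b)) / denom))
        theta = math.degrees(math.acos(cos_theta))
        hue = theta if b <= g else 360.0 - theta
    return hue, saturation, intensity


def is_skin(pixel):
    hue, saturation, _ = to_hsi(pixel)
    return int(HUE_RANGE[0] <= hue <= HUE_RANGE[1]
               and SAT_RANGE[0] <= saturation <= SAT_RANGE[1])


def convert_frame(frame):
    """Return (gray image, binary skin mask) for one RGB frame (nested lists)."""
    gray = [[to_gray(p) for p in row] for row in frame]
    skin = [[is_skin(p) for p in row] for row in frame]
    return gray, skin


frame = [[(224, 172, 138), (0, 0, 255)],
         [(255, 255, 255), (0, 0, 0)]]
gray, skin = convert_frame(frame)
```

In practice the thresholds would be fitted to the skin-color distribution observed in the cabin camera images.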
b) Feature extraction and image fusion. The grayscale image sequence $I_{gray}$ is processed to obtain a coarse binary motion image sequence $I_{mov}$. A pixel-wise AND is then applied between corresponding images of $I_{mov}$ and $I_{skin}$ to obtain the binary moving-skin region image sequence $I_{mov-skin}$, whose regions are the moving skin regions. Because $I_{mov-skin}$ does not necessarily contain the complete hand region, a seed algorithm is designed to find it. First, assuming that the moving region of the hand is the major part of $I_{mov-skin}$, the seed algorithm is applied in $I_{mov-skin}$ according to region connectivity to find the largest connected domain B, which is taken as part of the hand. Then B is mapped to the same position in $I_{skin}$, and the seed algorithm, using this position as the seed, grows the region in $I_{skin}$ to obtain the image sequence $I_{hand}$ of the complete hand region. For the hand region image sequence $I_{hand}$, the shape features of the hand region are extracted; combining $I_{gray}$ and $I_{hand}$, the motion parameters are computed within the hand regions of two adjacent frames and used as the inter-frame motion features.
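The AND operation and the two seed-growing passes can be sketched for a single frame as follows. The function names are hypothetical, masks are nested lists of 0/1, and 4-connectivity is an assumption because the text does not state which connectivity is used.

```python
from collections import deque


def flood(mask, seeds):
    """Grow a 4-connected region inside a binary mask from the given seeds."""
    h, w = len(mask), len(mask[0])
    queue = deque(p for p in seeds if mask[p[0]][p[1]])
    region = set(queue)
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if (0 <= ny < h and 0 <= nx < w and mask[ny][nx]
                    and (ny, nx) not in region):
                region.add((ny, nx))
                queue.append((ny, nx))
    return region


def largest_component(mask):
    """Return the pixel set of the largest connected domain in the mask."""
    best, seen = set(), set()
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if value and (y, x) not in seen:
                comp = flood(mask, [(y, x)])
                seen |= comp
                if len(comp) > len(best):
                    best = comp
    return best


def hand_region(i_mov, i_skin):
    """I_mov AND I_skin -> largest domain B -> grow B inside I_skin."""
    mov_skin = [[m & s for m, s in zip(rm, rs)]
                for rm, rs in zip(i_mov, i_skin)]
    b = largest_component(mov_skin)
    return flood(i_skin, b)


i_skin = [[1, 1, 1, 0, 0],
          [1, 1, 1, 0, 1],
          [0, 0, 0, 0, 1]]
i_mov = [[0, 1, 1, 0, 0],
         [0, 0, 0, 0, 1],
         [0, 0, 0, 0, 0]]
hand = hand_region(i_mov, i_skin)
```

Here only two hand pixels are moving, yet the second pass recovers the full six-pixel skin blob they belong to, while the smaller moving skin blob on the right is discarded.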
c) Extraction of gesture motion features and shape features. Let $L$ denote the time length of the gesture, $s[t]$ the shape feature of frame $t$, and $m[t]$ the motion feature between frame $t$ and frame $t+1$. An 8-dimensional feature vector $f[t] = [m[t],\, s[t]^T]$ is defined to describe the appearance features of the gesture in a unified way, forming a time-scale-invariant feature sequence that enables time-scale-invariant feature extraction and matching. The spatio-temporal appearance feature of the gesture, $A = [f[0], f[1], \ldots, f[L-2]]^T$, is constructed to describe how the feature vector $f[t]$ changes over time. The time length $L$ of the gesture motion is normalized, and the probability function of observing $x_{i,j}$ for the j-th feature parameter of the i-th state is constructed:

$$P(x_{i,j} \mid \lambda) = \frac{1}{\sqrt{2\pi}\,\sigma_{i,j}} \exp\!\left(-\frac{(x_{i,j}-\mu_{i,j})^2}{2\sigma_{i,j}^2}\right)$$

where $x_{i,j}$ denotes the motion and shape features of the time-normalized gesture sequence, $\lambda$ denotes any gesture class model, $\mu$ is the matrix of mathematical expectations of the feature parameters, and $\sigma$ is the standard deviation. For a gesture model $\lambda(\mu, \sigma)$, the probability function of observing the complete gesture sequence $X$ can then be built:

$$P(X \mid \lambda) = \prod_{i=1}^{m} \prod_{j=1}^{n} P(x_{i,j} \mid \lambda)$$

where $m$ is the number of gesture states and $n$ is the number of gesture feature parameters.
When recognizing a gesture, the corresponding value is calculated for each gesture class, and the class that yields the minimum value is taken as the class to which the gesture belongs.
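The classification step can be sketched as follows. The text does not state which quantity is minimized; this sketch assumes it is the negative log-likelihood of the observation under each multi-state Gaussian model, which is small exactly when the probability of the observed sequence is large. The function names, gesture names, and model parameters are hypothetical.

```python
import math


def neg_log_likelihood(x, mu, sigma):
    """-log P(X | lambda) for independent Gaussian states and features.

    x, mu and sigma are m-by-n nested lists (m states, n feature parameters).
    """
    total = 0.0
    for x_i, mu_i, sigma_i in zip(x, mu, sigma):          # states
        for x_ij, mu_ij, s_ij in zip(x_i, mu_i, sigma_i):  # feature parameters
            total += 0.5 * math.log(2 * math.pi * s_ij ** 2)
            total += (x_ij - mu_ij) ** 2 / (2 * s_ij ** 2)
    return total


def classify(x, models):
    """Return the gesture class whose model gives the minimum value."""
    return min(models, key=lambda name: neg_log_likelihood(x, *models[name]))


ones = [[1.0, 1.0], [1.0, 1.0]]
models = {
    "swipe": ([[0.0, 0.0], [1.0, 1.0]], ones),   # (mu, sigma) per class
    "push": ([[5.0, 5.0], [6.0, 6.0]], ones),
}
observed = [[0.1, -0.2], [1.2, 0.9]]
label = classify(observed, models)
```

Working in the log domain also avoids numerical underflow when the product runs over many states and features.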
Finally, it should be noted that the preferred embodiments above are intended only to illustrate the technical solution of the invention and not to limit it. Although the invention has been described in detail through the preferred embodiments above, those skilled in the art should understand that various changes in form and detail may be made without departing from the scope defined by the claims of the invention.

Claims (8)

1. A hybrid enhanced intelligent cognitive method of an intelligent business travel motor home, characterized in that the method specifically comprises the following steps:
S1: the driver and passengers communicate with the in-vehicle electronic device, and the dialogue state between the user and the in-vehicle electronic device is tracked;
S2: performing identity verification on the driver and passengers according to the voiceprint information collected during tracking;
S3: analyzing the behavioral intentions of the driver and passengers;
S4: performing face recognition on the driver and passengers to carry out driver and passenger identity authentication and fatigue monitoring;
S5: performing gesture recognition on the driver and passengers;
S6: synthesizing the above to obtain the analysis and recognition results.
2. The hybrid enhanced intelligent cognitive method of an intelligent business travel motor home according to claim 1, characterized in that step S1 specifically comprises:
S11: converting the speech of the driver and passengers into text data through a speech recognition engine, and performing spelling correction on the text data;
S12: performing word segmentation on the corrected text data to obtain a word sequence, and obtaining word vectors through Word2Vec;
S13: processing the word vectors through cascaded convolutional neural networks to obtain the dialogue scene type;
S14: building deep reinforcement learning networks, and completing reinforcement learning of the dialogue-state behavior policy by iterating two independent deep reinforcement learning networks;
S15: building a semantic knowledge graph from triples, and computing in real time the knowledge-atom relation scores and the upper and lower bounds of the embedding cost in the semantic knowledge graph to obtain the knowledge query result;
S16: generating the corresponding spectral parameters and fundamental frequency with a multi-space probability distribution HMM parameter generation algorithm to produce a smooth acoustic feature sequence, and submitting it to a synthesizer to generate the final speech.
3. The hybrid enhanced intelligent cognitive method of an intelligent business travel motor home according to claim 2, characterized in that step S2 specifically comprises:
S21: performing pre-emphasis, framing and windowing, and endpoint detection on the driver and passenger voice information collected during tracking;
S22: applying a Fourier transform to the processed voice information to obtain the spectral energy distribution, dividing critical bands with a triangular filter bank, and performing amplitude weighting and a discrete cosine transform to obtain cepstral coefficients;
S23: inputting the cepstral coefficients into the speaker recognition (UBM) model to obtain voiceprint features;
S24: performing voiceprint template matching to determine whether the voiceprint information corresponds.
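The first two operations of step S21 can be sketched as follows. The pre-emphasis coefficient 0.97 and the Hamming window are common defaults assumed here, not values given in the claims, and the function names are hypothetical. Endpoint detection and the later cepstral steps are not shown.

```python
import math


def pre_emphasis(signal, alpha=0.97):
    """y[n] = x[n] - alpha * x[n-1]; boosts the high-frequency content."""
    if not signal:
        return []
    return [signal[0]] + [signal[n] - alpha * signal[n - 1]
                          for n in range(1, len(signal))]


def frame_and_window(signal, frame_len, hop):
    """Split the signal into overlapping frames and apply a Hamming window."""
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frames.append([signal[start + n] * window[n]
                       for n in range(frame_len)])
    return frames


emphasized = pre_emphasis([1.0, 1.0, 1.0], alpha=0.5)
frames = frame_and_window([1.0] * 10, frame_len=4, hop=2)
```

Each resulting frame would then be passed to the Fourier transform of step S22.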
4. The hybrid enhanced intelligent cognitive method of an intelligent business travel motor home according to claim 3, characterized in that step S24 specifically comprises:
S241: defining the energy function as

$$E(v, h \mid \theta) = -\sum_{i} a_i v_i - \sum_{j=1}^{m} b_j h_j - \sum_{i} \sum_{j=1}^{m} v_i W_{ij} h_j$$

where $h$ and $v$ are vectors representing the states of the hidden layer and the visible layer respectively, $a$ and $b$ are the biases of the visible layer and the hidden layer, $v_i$ and $h_j$ are the states of the i-th visible node and the j-th hidden node, $m$ is the number of hidden nodes, $W_{ij}$ is the connection weight between the visible layer and the hidden layer, $i$ and $j$ are the node indices, $a_i$ is the bias of the i-th visible node, and $b_j$ is the bias of the j-th hidden node;
S242: given the model $\theta = \{W_{ij}, a_i, b_j\}$, obtaining the joint probability distribution of the state $(v, h)$,

$$P(v, h \mid \theta) = \frac{e^{-E(v, h \mid \theta)}}{Z(\theta)}$$

where $Z(\theta) = \sum_{v,h} e^{-E(v, h \mid \theta)}$ is the normalization factor;
S243: computing, through the energy evolution of the RBM, the high-dimensional Gaussian supervectors of the two classes of RBM-i-vector, and performing channel compensation using linear discriminant analysis;
S244: computing the cosine similarity of the two compensated high-dimensional Gaussian supervectors and comparing it with a preset threshold to determine whether the voiceprint information corresponds.
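The scoring of step S244 can be sketched as follows. The function names and the threshold value 0.7 are hypothetical; in practice the threshold would be tuned on enrollment data, and the inputs would be the channel-compensated supervectors rather than the short toy vectors used here.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # no direction to compare against
    return dot / (norm_u * norm_v)


def same_speaker(enrolled, probe, threshold=0.7):
    """Accept the probe when its similarity to the template reaches the threshold."""
    return cosine_similarity(enrolled, probe) >= threshold


accepted = same_speaker([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
rejected = same_speaker([1.0, 0.0], [0.0, 1.0])
```

Because the cosine depends only on direction, a probe that is a scaled copy of the template is accepted regardless of its magnitude.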
5. The hybrid enhanced intelligent cognitive method of an intelligent business travel motor home according to claim 3, characterized in that step S3 specifically comprises:
S31: building a deep cascaded network with head-and-shoulder recognition, and cropping a series of candidate segments from each frame with multi-scale sliding windows at a preset step size to form the samples to be recognized;
S32: inputting the samples to be recognized into the trained head-and-shoulder/non-head-and-shoulder recognition model for classification;
S33: introducing a nonlinear model and a display model for association analysis;
S34: testing the pose overlap and comparing it with thresholds.
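The candidate generation of step S31 can be sketched as follows. Square windows, the particular sizes, and the function name are assumptions made for illustration; the claim only specifies multi-scale sliding windows with a preset step size.

```python
def sliding_windows(width, height, sizes, step):
    """Enumerate (left, top, w, h) candidate boxes for every window size."""
    boxes = []
    for size in sizes:
        for top in range(0, height - size + 1, step):
            for left in range(0, width - size + 1, step):
                boxes.append((left, top, size, size))
    return boxes


candidates = sliding_windows(width=4, height=4, sizes=[2, 4], step=2)
```

Each box would be cropped from the frame and passed to the head-and-shoulder classifier of step S32.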
6. The hybrid enhanced intelligent cognitive method of an intelligent business travel motor home according to claim 5, characterized in that step S34 specifically comprises:
S341: fusing the consecutively detected passenger pose boxes into one complete action;
S342: designing a fusion rule for two poses,
where $f(i, j)$ is the fusion function, 1 means the two poses are merged, and 0 means they are not merged,
the overlap of the two pose detection boxes is evaluated against the overlap threshold $T_{IoU}$, $S_{his}$ is the histogram matching score of the two detection boxes, $T_{his}$ is the histogram matching threshold, $t_1$ and $t_2$ are the time instants of the two poses, and $\Delta T$ is the time-difference threshold of the two poses.
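One plausible reading of the fusion rule is sketched below. The formula itself is not reproduced in the text, so the directions of the comparisons and the requirement that all three conditions hold together are assumptions, as are the function names and the default threshold values. Boxes are given as (x1, y1, x2, y2).

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def should_merge(box_i, box_j, s_his, t1, t2,
                 t_iou=0.5, t_his=0.7, delta_t=1.0):
    """f(i, j): 1 when the two pose detections are fused, otherwise 0."""
    return int(iou(box_i, box_j) > t_iou
               and s_his > t_his
               and abs(t1 - t2) < delta_t)


merged = should_merge((0, 0, 2, 2), (0, 0, 2, 2), s_his=0.9, t1=0.0, t2=0.5)
```

With this reading, two detections are fused only if they overlap strongly, look alike, and occur close together in time.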
7. The hybrid enhanced intelligent cognitive method of an intelligent business travel motor home according to claim 5, characterized in that the driver and passenger identity authentication in step S4 specifically comprises:
S401: coarsely locating the face by a radial symmetry transform;
S402: learning, with the supervised descent method (SDM), the optimal iteration vector from the current point to the target point, and establishing a linear regression model between the shape offset $\Delta x = x^* - x$ and the feature $\phi(x)$ of the current shape $x$,

$$\Delta x = R\,\phi(x) + b$$

where $x^*$ is the target shape, $b$ is the bias, and $R$ is the regression model;
S403: obtaining the desired position vector by iterating with the current shape $x$ and the deformation vector $\Delta x$,

$$x := x + \Delta x$$

where the updated $x$ is the desired position vector;
S404: constructing the learning objective of SDM to obtain the true deviation of the i-th point from the actual boundary point,

$$\arg\min_{R_k,\,b_k} \sum_{i} \left\| \Delta x_*^{ki} - R_k\,\phi_k^{i} - b_k \right\|^2$$

where $k$ is the iteration number, $x_k$ is the shape vector at the k-th iteration, $x_k^i$ is the coordinate of the i-th point in that shape vector, $\Delta x_*^{ki}$ is the coordinate deformation of the i-th point in that shape vector, $\phi_k^i$ is the feature extracted at $x_k^i$, $R_k$ is the regression model at the k-th iteration, and $b_k$ is the bias at the k-th iteration;
S405: using the integro-differential operator,

$$\max_{(r,a,b)} \left| G_\sigma(r) * \frac{\partial}{\partial r} \oint_{(r,a,b)} \frac{I(x,y)}{2\pi r}\,ds \right|$$

to precisely locate the facial organs, where $G_\sigma(r)$ is a smoothing function, $I(x, y)$ is the image grayscale matrix, $(a, b)$ is the circle center, and $r$ is the radius;
S406: fitting the face organs in the target area with feature points to obtain the labeled feature-point positions;
S407: cropping sub-images within the neighborhood of each feature point to obtain the neighborhood features of the face organs, and concatenating the neighborhood features of all feature points to form the feature-point extreme learning feature;
S408: counting the extreme learning features of each image partition as the training set of the generalized single-hidden-layer feedforward neural network, training extreme learning machines, and searching the identity label of the specific person matching the fused feature to complete identity authentication;
and the driver and passenger fatigue monitoring in step S4 specifically comprises:
S411: building a 3D face model of the driver using 3D face modeling, and tracking the driver's head pose in real time according to the face recognition method of S406-S408;
S412: solving the positions of the eyes in the 2D face image from the eye positions in the 3D face model and the head pose;
S413: locating the feature points in the eye region with the facial point detection algorithm (CLM), and verifying the located feature points using normalized face-image texture;
S414: locating the iris center according to the physiological structure of the iris;
S415: locating the upper and lower eyelids according to a parameterized template, and extracting the eye movements of the driver and passengers;
S416: extracting, from the eye movements, the fatigue-related features of eyelid opening degree, eye opening and closing speed, and iris movement, and comparing them with the features in the awake state to obtain variation features;
S417: analyzing the correlations among the fatigue features with a Bayesian network classifier to complete driver and passenger fatigue monitoring.
8. The hybrid enhanced intelligent cognitive method of an intelligent business travel motor home according to claim 7, characterized in that step S5 specifically comprises:
S51: acquiring gesture images of the driver and passengers and converting them into an image sequence $I_{rgb}$;
S52: converting the image sequence $I_{rgb}$ into a grayscale image sequence $I_{gray}$ and into a binary skin-color information image sequence $I_{skin}$;
S53: calculating motion parameters from the grayscale image sequence $I_{gray}$ and the binary skin-color information image sequence $I_{skin}$ as inter-frame motion features;
S54: normalizing the time length of the gesture motion and constructing the probability function,

$$P(x_{i,j} \mid \lambda) = \frac{1}{\sqrt{2\pi}\,\sigma_{i,j}} \exp\!\left(-\frac{(x_{i,j}-\mu_{i,j})^2}{2\sigma_{i,j}^2}\right)$$

where $i$ denotes the i-th state, $j$ denotes the j-th feature parameter, $x_{i,j}$ is the motion and shape feature of the time-normalized gesture sequence, $\lambda$ denotes the gesture class, $\mu$ is the matrix of mathematical expectations of the feature parameters, $\sigma$ is the standard deviation, $\mu_{i,j}$ is the mathematical expectation of the j-th feature parameter of the i-th state, and $\sigma_{i,j}$ is the standard deviation of the j-th feature parameter of the i-th state;
S55: constructing the probability function of observing the complete gesture sequence,

$$P(X \mid \lambda) = \prod_{i=1}^{m} \prod_{j=1}^{n} P(x_{i,j} \mid \lambda)$$

where $X$ is the observation of the complete gesture sequence, $m$ is the total number of gesture states, and $n$ is the total number of gesture features;
S56: calculating the corresponding value for each gesture class during recognition,
the class that yields the minimum value being the class to which the gesture belongs.
CN201810030098.3A 2018-01-12 2018-01-12 Hybrid enhanced intelligent cognitive method of intelligent business travel motor home Active CN108256307B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810030098.3A CN108256307B (en) 2018-01-12 2018-01-12 Hybrid enhanced intelligent cognitive method of intelligent business travel motor home

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810030098.3A CN108256307B (en) 2018-01-12 2018-01-12 Hybrid enhanced intelligent cognitive method of intelligent business travel motor home

Publications (2)

Publication Number Publication Date
CN108256307A true CN108256307A (en) 2018-07-06
CN108256307B CN108256307B (en) 2021-04-02

Family

ID=62727133

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810030098.3A Active CN108256307B (en) 2018-01-12 2018-01-12 Hybrid enhanced intelligent cognitive method of intelligent business travel motor home

Country Status (1)

Country Link
CN (1) CN108256307B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120317621A1 (en) * 2011-06-09 2012-12-13 Canon Kabushiki Kaisha Cloud system, license management method for cloud service
US9286029B2 (en) * 2013-06-06 2016-03-15 Honda Motor Co., Ltd. System and method for multimodal human-vehicle interaction and belief tracking
CN104183091B (en) * 2014-08-14 2017-02-08 苏州清研微视电子科技有限公司 System for adjusting sensitivity of fatigue driving early warning system in self-adaptive mode
CN105654753A (en) * 2016-01-08 2016-06-08 北京乐驾科技有限公司 Intelligent vehicle-mounted safe driving assistance method and system
CN105812129A (en) * 2016-05-10 2016-07-27 成都景博信息技术有限公司 Method for monitoring vehicle running state
CN106682603A (en) * 2016-12-19 2017-05-17 陕西科技大学 Real time driver fatigue warning system based on multi-source information fusion

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
NAN-NING ZHENG,ET AL: "Hybrid-augmented intelligence:collaboration and cognition", 《FRONTIERS OF INFORMATION TECHNOLOGY & ELECTRONIC ENGINEERING》 *
NAN-NING ZHENG,ET AL: "Toward Intelligent Driver-Assistance and Safety Warning Systems", 《IEEE INTELLIGENT SYSTEM》 *
SRIKANTH SARIPALLI,ET AL: "Visually Guided Landing of an Unmanned Aerial Vehicle", 《IEEE TRANSACTIONS ON ROBOTICS AND AUTOMATION》 *
李嫄源,等: "基于数据资源的认知图挖掘系统研究", 《重庆邮电大学学报 (自然科学版 )》 *
李嫄源,等: "车用自组网媒体访问控制机制改进", 《微电子学》 *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109034020A (en) * 2018-07-12 2018-12-18 重庆邮电大学 A kind of community's Risk Monitoring and prevention method based on Internet of Things and deep learning
CN109079813A (en) * 2018-08-14 2018-12-25 重庆四通都成科技发展有限公司 Automobile Marketing service robot system and its application method
CN109143870A (en) * 2018-10-23 2019-01-04 宁波溪棠信息科技有限公司 A kind of control method of multiple target task
CN109143870B (en) * 2018-10-23 2021-08-06 宁波溪棠信息科技有限公司 Multi-target task control method
CN110070884A (en) * 2019-02-28 2019-07-30 北京字节跳动网络技术有限公司 Audio originates point detecting method and device
CN110070884B (en) * 2019-02-28 2022-03-15 北京字节跳动网络技术有限公司 Audio starting point detection method and device
CN109918513A (en) * 2019-03-12 2019-06-21 北京百度网讯科技有限公司 Image processing method, device, server and storage medium
CN109918513B (en) * 2019-03-12 2023-04-28 北京百度网讯科技有限公司 Image processing method, device, server and storage medium
CN110111795A (en) * 2019-04-23 2019-08-09 维沃移动通信有限公司 A kind of method of speech processing and terminal device
CN110111795B (en) * 2019-04-23 2021-08-27 维沃移动通信有限公司 Voice processing method and terminal equipment
CN112308116A (en) * 2020-09-28 2021-02-02 济南大学 Self-optimization multi-channel fusion method and system for old-person-assistant accompanying robot
CN112308116B (en) * 2020-09-28 2023-04-07 济南大学 Self-optimization multi-channel fusion method and system for old-person-assistant accompanying robot

Also Published As

Publication number Publication date
CN108256307B (en) 2021-04-02

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant