WO2019056471A1 - Driving model training method, driver identification method, apparatus, device and medium - Google Patents

Driving model training method, driver identification method, apparatus, device and medium

Info

Publication number
WO2019056471A1
WO2019056471A1 · PCT/CN2017/107814 · CN2017107814W
Authority
WO
WIPO (PCT)
Prior art keywords
training
probability
image data
model
layer
Prior art date
Application number
PCT/CN2017/107814
Other languages
English (en)
French (fr)
Inventor
吴壮伟
金鑫
张川
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2019056471A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/59Context or environment of the image inside of a vehicle, e.g. relating to seat occupancy, driver state or inner lighting conditions
    • G06V20/597Recognising the driver's state or behaviour, e.g. attention or drowsiness
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification

Definitions

  • the present application relates to the field of identity recognition, and in particular, to a driving model training method, a driver identification method, an apparatus, a device, and a medium.
  • Currently, the gyroscope data and the mobile phone trajectory data acquired by the mobile phone are generally used to judge whether the driver himself or herself is driving the vehicle, but the accuracy of driver identification using the gyroscope data and the mobile phone trajectory data is not high.
  • The data used for driver identification based on gyroscope data and mobile phone trajectory data often cannot reflect the real state of the driver's driving, and specific data such as speed, acceleration or trajectory data on a map make it difficult to identify the driver accurately. The data collected and used are mostly physical characteristics of the car while driving; they do not use other characteristics that could effectively support driver identification and cannot reflect the state of the driver's real driving process well, resulting in a poor driver recognition effect.
  • The embodiments of the present application provide a driving model training method, a driver identification method, an apparatus, a device, and a medium, so as to solve the problem that the current driving model recognition effect is poor.
  • an embodiment of the present application provides a driving model training method, including:
  • acquiring training image data and training audio data of the same driving scene, the training image data and the training audio data being associated with a user identifier;
  • an embodiment of the present application provides a driving model training apparatus, including:
  • a training data acquisition module configured to acquire training image data and training audio data of the same driving scene, where the training image data and the training audio data are associated with a user identifier;
  • a face recognition model acquisition module configured to train the convolutional neural network model by using the training image data to obtain a face recognition model
  • An audio recognition model acquisition module configured to train the convolutional neural network model based on the training audio data to obtain an audio recognition model
  • an associated storage module configured to perform consistency verification on the face recognition model and the audio recognition model by using the training image data and the training audio data, and store the verified face recognition model and audio recognition model in association with the user identifier.
  • an embodiment of the present application provides a driver identification method, including:
  • an audio recognition model stored in association with the user identifier, where the audio recognition model is a model acquired by using the driving model training method;
  • an embodiment of the present application provides a driver identification device, including:
  • the to-be-identified data acquisition module is configured to acquire image data to be identified and audio data to be identified of the same driving scene of the user, where the to-be-identified image data and the to-be-identified audio data are associated with the user identifier;
  • a recognition model invoking module configured to query a database based on the user identifier and invoke a face recognition model and an audio recognition model corresponding to the user identifier, where the face recognition model and the audio recognition model are models obtained by using the driving model training method;
  • a first probability acquisition module configured to acquire a first probability based on the image data to be identified and the face recognition model
  • a second probability acquisition module configured to acquire a second probability based on the to-be-identified audio data and the audio recognition model
  • a final probability acquisition module configured to determine a final probability of the user driving the vehicle based on the first probability and the second probability
  • the confirmation result obtaining module is configured to determine that the user himself drives the vehicle if the final probability is greater than the second preset threshold.
  • an embodiment of the present application provides a terminal device, including a memory, a processor, and computer readable instructions stored in the memory and executable on the processor, where the following steps are implemented when the processor executes the computer readable instructions:
  • acquiring training image data and training audio data of the same driving scene, the training image data and the training audio data being associated with a user identifier;
  • the convolutional neural network model is trained by using the training audio data to obtain an audio recognition model
  • an embodiment of the present application provides a terminal device, including a memory, a processor, and computer readable instructions stored in the memory and executable on the processor, where the following steps are implemented when the processor executes the computer readable instructions:
  • the embodiment of the present application provides a computer readable medium storing computer readable instructions, where the computer readable instructions are executed by a processor to implement the following steps:
  • acquiring training image data and training audio data of the same driving scene, the training image data and the training audio data being associated with a user identifier;
  • the convolutional neural network model is trained by using the training audio data to obtain an audio recognition model
  • an embodiment of the present application provides a computer readable medium storing computer readable instructions, where the computer readable instructions are executed by a processor to implement the following steps:
  • Training image data and training audio data of the same driving scene are first acquired, so that the training image data and training audio data required for driving model training can be obtained based on the user identifier, ensuring that the driving model obtained by training can determine whether the user himself is driving through face recognition and audio recognition.
  • the training image data is used to train the convolutional neural network model to obtain the face recognition model.
  • The face recognition model obtained by training the convolutional neural network model can identify the user more accurately, which provides a guarantee for determining whether the user himself is driving.
  • the convolutional neural network model is trained by using the training audio data to obtain an audio recognition model.
  • On the basis of the face recognition model, the audio recognition model also identifies whether the user himself is driving in the audio dimension, which can further improve the recognition precision. Finally, consistency verification is performed on the face recognition model and the audio recognition model by using the training image data and the training audio data, and the verified face recognition model and audio recognition model are stored in association with the user identifier.
  • The associated storage directly associates the face recognition model and the audio recognition model of the same user through the user identifier to realize recognition of image data and audio data, so that the driving model determines whether the user himself is driving from two important dimensions and makes full use of the potential connection between the image dimension and the audio dimension, making the recognition result closer to the actual driving situation.
  • The above two recognition models are stored in a database in association with the same user identifier, so that face recognition and audio recognition are respectively performed on the image dimension and the sound dimension acquired in the same driving scene. This recognition process effectively reduces the error caused by single-dimension data and effectively ensures the accuracy of driving model recognition.
  • The first probability is obtained based on the image data to be identified and the face recognition model, and the second probability is obtained based on the audio data to be identified and the audio recognition model; the final probability of the user himself driving the vehicle is determined according to the first probability and the second probability, and whether the user himself is driving is determined by judging whether the final probability is greater than the second preset threshold, so that the driver recognition result is more accurate and reliable.
  • FIG. 1 is a flow chart of the driving model training method in Embodiment 1 of the present application.
  • FIG. 2 is a specific flow chart before step S11 in FIG. 1.
  • FIG. 3 is a specific flow chart of step S11 of FIG. 1.
  • FIG. 4 is a specific flow chart of step S12 in FIG. 1.
  • FIG. 5 is a specific flowchart of step S13 in FIG. 1.
  • FIG. 6 is a schematic block diagram of a driving model training device in Embodiment 2 of the present application.
  • FIG. 7 is a flowchart of a driver identification method in Embodiment 3 of the present application.
  • FIG. 8 is a specific flowchart of step S25 in FIG. 7.
  • FIG. 9 is a schematic block diagram of a driver identification device in Embodiment 4 of the present application.
  • FIG. 10 is a schematic diagram of a terminal device in Embodiment 6 of the present application.
  • FIG. 1 shows a flow chart of a driving model training method in this embodiment.
  • The driving model training method can be applied to a terminal device of an insurance institution or another institution for training a driving model, so that the trained driving model can be used for identification and an intelligent recognition effect can be achieved. When applied to the terminal device of an insurance institution, the method is used to train the driving model corresponding to a user, so that the trained driving model can be used to identify a user who handles automobile insurance at the insurance institution and determine whether the user himself is driving.
  • the driving model training method includes the following steps:
  • S11 acquiring training image data and training audio data of the same driving scene, and the training image data and the training audio data are associated with the user identification.
  • The same driving scene refers to the driving scene in which the user is located during the same period of time.
  • the training image data and the training audio data are data collected by the user in the same driving scene.
  • the user identifier is an identifier for uniquely identifying the user.
  • all the acquired training image data and the training audio data are associated with the user identifier.
  • All the training image data and the training audio data being associated with the user identifier means that the training image data and the training audio data generated by each user when traveling correspond uniquely to that user's identifier, and one user identifier may be associated with the training image data and training audio data of multiple driving scenes. It can be understood that both the training image data and the training audio data associated with the user identifier carry time tags, and the training image data and the training audio data of the same driving scene corresponding to the same user identifier carry the same time tag.
  • the user completes the registration on the application (ie, APP) on the mobile terminal such as the mobile phone or the tablet, so that the server corresponding to the application can obtain the corresponding user identifier.
  • The user identifier can be the user's mobile phone number or ID number, which can uniquely identify the user's identity.
  • After acquiring the image data and the audio data, the mobile terminal uploads them to the server, so that the server stores the acquired image data and audio data in a database such as MySQL or Oracle, and each piece of image data and audio data is stored in association with a user identifier.
  • When driving model training is required, the image data and the audio data associated with the user identifier can be obtained from the MySQL or Oracle database as the training image data and training audio data for training the driving model. The training image data and training audio data of a user contain a large amount of training data, which provides sufficient training image data and training audio data, gives driving model training a good data basis, and helps ensure the recognition effect of the driving model obtained by training.
  • Before step S11, in which training image data and training audio data of the same driving scene are acquired and the training image data and the training audio data are associated with the user identifier, the method further includes the following steps:
  • S1111 Acquire a current vehicle speed of the vehicle in the driving scene, and determine whether the current vehicle speed reaches a preset vehicle speed threshold.
  • A built-in sensor in the mobile terminal acquires the current vehicle speed of the vehicle in real time, and compares the obtained current vehicle speed with the preset vehicle speed threshold in real time to determine whether the current vehicle speed has reached the preset vehicle speed threshold.
  • For example, the vehicle speed of user A in a driving scene increases gradually from 0 km/h to 60 km/h, and the preset vehicle speed threshold is 15 km/h; the user's mobile terminal determines in real time whether the current vehicle speed of the vehicle has reached 15 km/h.
  • S1112 If the current vehicle speed reaches the preset vehicle speed threshold, collect current image data and current audio data in the driving scene, and associate the current image data and the current audio data with the user identifier.
  • When the user drives in a driving scene and the current vehicle speed reaches the preset vehicle speed threshold, the user's mobile terminal calls the camera and the recording device of the mobile terminal to collect the current image data and current audio data in the driving scene, and associates the current image data and current audio data with the user identifier. Specifically, the vehicle speed of user A changes from 0 km/h to 60 km/h in a driving scene, and the preset vehicle speed threshold is 15 km/h. While user A's driving speed has not reached 15 km/h, the user's mobile terminal continues to monitor the current vehicle speed; once the vehicle speed reaches 15 km/h, the mobile terminal calls its camera and recording device to collect the current image data and current audio data of user A in the driving scene, and the current image data and current audio data are associated with user A's user identifier. Further, the current image data and current audio data collected for the driving scenes of different users at the same time, such as user B and user C, are each associated with the corresponding user identifier; that is, the current image data and current audio data collected for user B are associated with user B's user identifier, and the current image data and current audio data collected for user C are associated with user C's user identifier.
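  • As an illustration of the trigger logic in steps S1111 and S1112, the following minimal Python sketch shows how a mobile client might start collection once the current vehicle speed reaches the 15 km/h threshold from the example above and tag the collected data with the user identifier; read_current_speed, capture_image and record_audio are hypothetical stand-ins for the mobile terminal's sensor, camera and recording APIs, not functions named in the application.

```python
# Minimal sketch of the speed-triggered collection described above (steps S1111-S1112).
# read_current_speed(), capture_image() and record_audio() are hypothetical stand-ins
# for the mobile terminal's sensor, camera and recording APIs.
import time

PRESET_SPEED_THRESHOLD_KMH = 15  # example threshold from the text

def collect_driving_data(user_id, read_current_speed, capture_image, record_audio):
    """Poll the current vehicle speed; once it reaches the threshold,
    collect current image/audio data and associate it with the user identifier."""
    while True:
        speed = read_current_speed()                  # current vehicle speed, km/h
        if speed >= PRESET_SPEED_THRESHOLD_KMH:
            return {
                "user_id": user_id,                   # association with the user identifier
                "image": capture_image(),             # current image data from the camera
                "audio": record_audio(),              # current audio data from the recorder
                "timestamp": time.time(),             # time tag shared by image and audio
            }
        time.sleep(1.0)                               # keep monitoring until the threshold is reached
```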
  • S1113 Store current image data and current audio data in a database.
  • The user's mobile terminal acquires the current image data and current audio data and uploads them to the server, so that the server stores the acquired current image data and current audio data in a database such as MySQL or Oracle, and each piece of current image data and current audio data is stored in association with a user identifier.
  • When the terminal device needs to perform driving model training, the current image data and the current audio data associated with the user identifier may be obtained from the MySQL or Oracle database as the training image data and training audio data for training the driving model.
  • S1114 Create a driving data information table in the database, where the driving data information table includes at least one piece of driving data information, and each piece of driving data information includes a user identifier, the storage address of the current image data in the database, and the storage address of the current audio data in the database.
  • The driving data information table is an information table that details the current image data and current audio data collected from the user's mobile terminal. The driving data information table includes at least one piece of driving data information, and each piece of driving data information corresponds to the current image data and current audio data acquired for the user in the same driving scene; therefore, the driving data information includes a user identifier, the storage address of the current image data in the database, and the storage address of the current audio data in the database.
  • The collected data is stored in the database through this data table and is associated with the user identifier, so that the storage address of the current image data in the database and the storage address of the current audio data in the database can be queried according to the user identifier, and the current image data and current audio data stored in the database can thus be quickly obtained as the training image data and training audio data required for training the driving model.
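  • To make the structure of the driving data information table concrete, the sketch below uses Python's built-in sqlite3 module as a stand-in for the MySQL or Oracle database mentioned above; the table name, column names and sample values are illustrative assumptions rather than details given in the application.

```python
# Illustrative schema for the driving data information table (step S1114),
# using sqlite3 as a stand-in for MySQL/Oracle. Names and values are assumptions.
import sqlite3

conn = sqlite3.connect("driving.db")
conn.execute("""
CREATE TABLE IF NOT EXISTS driving_data_info (
    user_id        TEXT NOT NULL,   -- user identifier (e.g. phone number or ID number)
    image_address  TEXT NOT NULL,   -- storage address of the current image data in the database
    audio_address  TEXT NOT NULL    -- storage address of the current audio data in the database
)""")

# One piece of driving data information per driving scene of a user.
conn.execute(
    "INSERT INTO driving_data_info (user_id, image_address, audio_address) VALUES (?, ?, ?)",
    ("13800000000", "blob://images/0001", "blob://audio/0001"),
)
conn.commit()
```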
  • step S11 acquiring training image data and training audio data of the same driving scene includes the following steps:
  • S1121 Acquire a model training instruction input by a user, where the model training instruction includes a user identifier.
  • The model training instruction refers to an instruction acquired by the user's mobile terminal for obtaining the training image data and training audio data required for driving model training.
  • the user inputs a model training instruction on the mobile terminal interface, and after the mobile terminal interface acquires the model training instruction, the instruction is transmitted to the background of the mobile terminal, so that the instruction is processed in the background.
  • the model training instruction includes a user identification that can be used to query a driving data information table in a database.
  • S1122 Query the driving data information table based on the user identifier, and determine whether the number of driving data information is greater than a preset number.
  • The driving data information table is queried according to the user identifier. The driving data information table includes at least one piece of driving data information, and each piece of driving data information includes a user identifier, the storage address of the current image data in the database, and the storage address of the current audio data in the database.
  • The mobile terminal queries the number of pieces of driving data information in the driving data information table according to the obtained user identifier, and determines whether the number of pieces of driving data information is greater than a preset number, where the preset number refers to a quantity threshold set in advance.
  • the preset number can be set to 10,000.
  • The amount of data should not be too small: too little data results in a poor recognition effect for the driving model obtained by training, and the driving model is prone to over-fitting. Too much data causes the model training time to be too long, which is not conducive to practical application. Therefore, a moderate amount of driving data information should be used, so that over-fitting of the driving model can be prevented, the model training can be completed within an expected time, and the recognition effect of the driving model can be ensured.
  • The number of pieces of driving data information queried in the database is compared with the preset number. If the number of pieces of driving data information is greater than the preset number, the driving data information stored in the database has reached the quantity required for driving model training, and the stored training image data and training audio data are output for driving model training.
  • The driving data includes the training image data and the training audio data, and the training image data and the training audio data are acquired in the same driving scene, so the training image data and the training audio data stored in the data table have a 1:1 relationship. Because the driving data counts both training image data and training audio data, as long as one of the two is greater than the preset number, the other is also greater than the preset number, and driving model training can then be performed.
  • The driving model training includes training a face recognition model and training an audio recognition model, and the training image data and the training audio data acquired by the user's mobile terminal are respectively used to train the face recognition model and the audio recognition model.
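  • Continuing the sqlite3 stand-in above, a minimal sketch of the check in steps S1121 to S1123 might query the driving data information table by user identifier and only start model training once the number of records exceeds the preset number (10,000 in the example above); the function name is an assumption.

```python
# Sketch of steps S1121-S1123: query the driving data information table by user
# identifier and only start model training once enough records have accumulated.
PRESET_NUMBER = 10000  # example preset number from the text

def enough_training_data(conn, user_id, preset_number=PRESET_NUMBER):
    # Because image and audio are collected 1:1 per driving scene, counting the
    # rows of driving data information counts both kinds of training data.
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM driving_data_info WHERE user_id = ?", (user_id,)
    ).fetchone()
    return count > preset_number
```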
  • S12 training the convolutional neural network model by using training image data to obtain a face recognition model.
  • the Convolutional Neural Network (CNN) model is a feedforward neural network, and its artificial neurons can respond to a part of the coverage area, which is often used for the processing of large images.
  • A convolutional neural network typically includes at least two non-linearly trainable convolutional layers, at least two non-linear pooling layers and at least one fully connected layer, that is, at least five hidden layers, in addition to an input layer and an output layer.
  • the training image data is input into the convolutional neural network, and the convolutional layer of the convolutional neural network performs convolution calculation on the training image data, and a corresponding number of feature maps are obtained according to the set number of filters.
  • the obtained feature map is subjected to downsampling calculation at the pooling layer to obtain a pooled feature map.
  • the purpose of the downsampling calculation is to remove the unimportant samples in the feature map and further reduce the number of parameters.
  • the maximum pooling is actually taking the maximum value in the n*n sample as the sampled value after sampling.
  • the convolutional layer and the pooled layer appear in pairs, that is, the convolutional calculation is performed on the convolutional layer, followed by downsampling calculation of the feature map obtained by the convolution calculation at the pooling layer.
  • the feature map after multiple rounds of convolution-pooling will pass through at least one fully connected layer and the last layer of the output layer in the network model.
  • the only difference between the output layer and the normal fully connected layer at this time is that the activation function is a softmax function, and the activation function of the fully connected layer is generally sigmoid.
  • The error calculation and gradient back-propagation update are performed on each layer of the convolutional neural network model, and the weights of the updated layers are obtained. Based on the updated weights of each layer, the face recognition model is obtained.
  • the face recognition model obtained by the convolutional neural network model training can more accurately identify the user's face and provide a guarantee for determining whether the user himself or not.
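  • For illustration only, the following PyTorch sketch shows one possible convolutional neural network of the shape described above, with paired convolutional and pooling layers followed by a fully connected layer (sigmoid activation) and a softmax output layer; the layer sizes, channel counts and 64x64 input size are assumptions rather than values given in the application, and the application itself describes training the network with the hand-derived updates below rather than with an automatic-differentiation framework.

```python
# Illustrative CNN of the shape described above: conv/pool pairs, a fully connected
# layer with sigmoid activation, and a softmax output layer. Sizes are assumed.
import torch
import torch.nn as nn

class FaceRecognitionCNN(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),   # conv + pool pair 1
            nn.Conv2d(16, 32, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),  # conv + pool pair 2
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 13 * 13, 128), nn.Sigmoid(),      # fully connected layer (sigmoid)
            nn.Linear(128, num_classes), nn.Softmax(dim=1),  # output layer (softmax)
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# Forward pass on a batch of 64x64 RGB face crops (assumed input size).
probs = FaceRecognitionCNN()(torch.randn(8, 3, 64, 64))
```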
  • step S12 the convolutional neural network model is trained by using the training image data to obtain a face recognition model, which specifically includes the following steps:
  • S121 Initialize the convolutional neural network model. Initializing the convolutional neural network primarily means initializing the convolution kernels (ie, the weights) and the offsets of the convolutional layers.
  • The weight initialization of the convolutional neural network model refers to assigning an initial value to all the weights in the convolutional neural network model. If the initial weights lie in a relatively flat region of the error surface, the convergence of the convolutional neural network model training may be abnormally slow. In general, the weights of the network are initialized uniformly over a relatively small interval with a mean of 0, such as the interval [-0.30, +0.30].
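  • A small illustration of this initialization, drawing the convolution kernels uniformly from the example interval [-0.30, +0.30] and setting the offsets to zero; the kernel shape is an assumption.

```python
# Uniform weight initialization over a small zero-mean interval (step S121).
import numpy as np

rng = np.random.default_rng(0)
# Assumed shape: 16 output maps, 3 input channels, 5x5 kernels.
conv_kernels = rng.uniform(-0.30, +0.30, size=(16, 3, 5, 5))
conv_biases = np.zeros(16)  # offsets of the convolutional layer
```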
  • S122 Input training image data in a convolutional neural network model, and calculate an output of each layer of the convolutional neural network model.
  • the training image data is input in the convolutional neural network model, and the output of each layer of the convolutional neural network model is calculated, and the output of each layer is acquired by using a forward propagation algorithm.
  • Unlike a fully connected neural network model, for the locally connected convolutional neural network model it is necessary to calculate the feature map of each output of the convolutional layer and the feature map of each output of the pooling layer in the model in order to update the weights.
  • The feature map $x_j^l$ for each output of the convolutional layer is $x_j^l = f\bigl(\sum_{i \in M_j} x_i^{l-1} * k_{ij}^l + b_j^l\bigr)$, where $l$ is the current layer, $M_j$ is the selected combination of input feature maps, $x_i^{l-1}$ is the input of the $i$-th feature map, that is, the output of layer $l-1$, $k_{ij}^l$ is the convolution kernel, and $b_j^l$ is the additive bias.
  • The feature map $x_j^l$ for each output of the pooling layer is $x_j^l = f\bigl(\beta_j^l \cdot \mathrm{down}(x_j^{l-1}) + b_j^l\bigr)$, where down denotes the downsampling calculation, $\beta_j^l$ is the multiplicative bias corresponding to the $j$-th feature map of layer $l$, and $b_j^l$ is the additive bias corresponding to the $j$-th feature map of layer $l$.
  • Only the outputs of the convolutional layer and the pooling layer, which differ from those of a generally fully connected neural network model, are given here. The outputs of the remaining layers are the same as in a generally fully connected neural network model and can be obtained by the forward propagation algorithm, so they are not described one by one here to avoid redundancy.
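  • A minimal numpy sketch of the two forward-propagation formulas above, assuming a sigmoid activation f and mean pooling for down(); it illustrates the equations rather than the application's implementation.

```python
# Sketch of the forward-propagation formulas above: the convolutional-layer output
# x_j^l = f(sum_i x_i^{l-1} * k_ij^l + b_j^l) and the pooling-layer output
# x_j^l = f(beta_j^l * down(x_j^{l-1}) + b_j^l), with f = sigmoid and mean pooling.
import numpy as np
from scipy.signal import convolve2d

def f(u):                     # activation function (sigmoid)
    return 1.0 / (1.0 + np.exp(-u))

def conv_layer_output(prev_maps, kernels, b_j):
    """prev_maps: input feature maps x_i^{l-1} in the combination M_j;
    kernels: matching convolution kernels k_ij^l; b_j: additive bias."""
    u = sum(convolve2d(x_i, k_ij, mode="valid") for x_i, k_ij in zip(prev_maps, kernels))
    return f(u + b_j)

def pool_layer_output(prev_map, beta_j, b_j, n=2):
    """down(): average over non-overlapping n x n blocks (mean pooling)."""
    h, w = prev_map.shape
    down = prev_map[: h - h % n, : w - w % n].reshape(h // n, n, w // n, n).mean(axis=(1, 3))
    return f(beta_j * down + b_j)
```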
  • S123 Perform error back propagation update on each layer of the convolutional neural network model according to the output of each layer, and obtain the weights of the updated layers.
  • After step S122, there is necessarily an error between the obtained predicted value and the real value, and the error information needs to be transmitted back to each layer, layer by layer, so that each layer updates its weights and a face recognition model with a better recognition effect is obtained.
  • the error back-propagation is updated on each layer of the convolutional neural network model according to the output of each layer, and the weights of the updated layers are obtained, which specifically includes calculating error information of each layer of the convolutional neural network model, and using The gradient descent method updates the weight of each layer.
  • The gradient descent method updates the weights mainly by using the gradient of the error cost function with respect to the parameters, so the goal of the weight update is to let each layer obtain such a gradient and then perform the update.
  • Step S123 specifically includes the following steps: according to the expression of the error cost function for the $n$-th training sample, $E^n = \frac{1}{2}\sum_{k=1}^{c}\bigl(t_k^n - y_k^n\bigr)^2$, where $n$ denotes a single training sample, the target output of the convolutional neural network model is $t^n = (t_1, t_2, t_3, \ldots, t_c)$, the actual output is denoted $y^n$, and $c$ is the dimension of the actual output.
  • The sensitivity $\delta$ is defined here as the rate of change of the error with respect to a layer's total input $u^l = W^l x^{l-1} + b^l$, that is, $\delta^l = \partial E / \partial u^l$, where $E$ is the error cost function, $l$ denotes the current layer, $W^l$ denotes the weights of the layer, $x^{l-1}$ denotes the input of the layer, and $b^l$ denotes the additive bias of the layer.
  • Back propagation can be realized by passing the error information back layer by layer through the calculated sensitivities.
  • the back propagation process refers to the process of error back propagation of each layer of the convolutional neural network model to obtain the updated weights of each layer.
  • The sensitivity of the $l$-th convolutional layer is $\delta_j^l = \beta_j^{l+1}\bigl(f'(u_j^l) \circ \mathrm{up}(\delta_j^{l+1})\bigr)$, where $\circ$ means that the elements are multiplied element-wise. Because each neuron connection has a sensitivity $\delta$, the sensitivity of each layer is a matrix, and layer $l+1$ here refers to the pooling layer.
  • If the downsampling is implemented by a convolution operation, for example a downsampling operation with a scale of 2 is equivalent to convolving the image with a 2*2 convolution kernel, then the weight W here is actually this 2*2 convolution kernel, whose values are $\beta_j$. up denotes the upsampling calculation.
  • The upsampling calculation is the inverse of the downsampling calculation: it copies each pixel n times in the vertical and horizontal directions, respectively. Since the sensitivity matrix of the pooling layer $l+1$ is 1/4 of the size of the layer-$l$ sensitivity matrix, the sensitivity matrix of layer $l+1$ must be upsampled so that the two sizes are consistent. According to the obtained sensitivity, the partial derivative of the error cost function with respect to the additive bias $b$ is $\frac{\partial E}{\partial b_j} = \sum_{u,v}\bigl(\delta_j^l\bigr)_{uv}$, that is, all elements of the sensitivity matrix of layer $l$ are summed, where $(u, v)$ denotes the position of an element in the sensitivity matrix.
  • The multiplicative bias $\beta$ is related to the feature map downsampled by the pooling operation of the current layer during forward propagation, so this map is defined first as $d_j^l = \mathrm{down}(x_j^{l-1})$. The partial derivative of the error cost function with respect to the multiplicative bias $\beta$ is then $\frac{\partial E}{\partial \beta_j} = \sum_{u,v}\bigl(\delta_j^l \circ d_j^l\bigr)_{uv}$.
  • The partial derivative of the error cost function with respect to the convolution kernel $k$ is $\frac{\partial E}{\partial k_{ij}^l} = \sum_{u,v}\bigl(\delta_j^l\bigr)_{uv}\bigl(p_i^{l-1}\bigr)_{uv}$, where $\bigl(p_i^{l-1}\bigr)_{uv}$ is the small block in the input feature map $x_i^{l-1}$ that was convolved element-wise with $k_{ij}$ to produce the value at position $(u, v)$ of the output feature map. According to the operations of the above formulas, the updated weights of the convolutional layers of the convolutional neural network model can be obtained.
  • The pooling layer should also be updated. The feature map for each output of the pooling layer is $x_j^l = f\bigl(\beta_j^l \cdot \mathrm{down}(x_j^{l-1}) + b_j^l\bigr)$, where down denotes downsampling, $\beta$ is the multiplicative bias, and $b$ is the additive bias.
  • The calculation formula of the sensitivity of the pooling layer in the convolutional neural network model is $\delta_j^l = f'(u_j^l) \circ \mathrm{conv2}\bigl(\delta_j^{l+1}, \mathrm{rot180}(k_j^{l+1}), \mathrm{'full'}\bigr)$, and according to $\delta$ the partial derivative of the error cost function with respect to the additive bias $b$ can be obtained as $\frac{\partial E}{\partial b_j} = \sum_{u,v}\bigl(\delta_j^l\bigr)_{uv}$, where conv2, rot180 and full are the functions required for the calculation, and the remaining parameters of the above formula have the same meanings as in the convolutional-layer formulas above and will not be described in detail here.
  • In this way the updated pooling layer weights can be obtained. The weights between the other layers of the convolutional neural network model (such as the fully connected layers) should also be updated; the update process is the same as the weight update method of a general fully connected neural network model, with the weights updated by the back propagation algorithm, and it is not detailed here to avoid redundancy.
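  • The back-propagation quantities above can be illustrated with the following numpy sketch for a single feature map j, assuming a sigmoid activation, a pooling scale of 2, and a forward convolution implemented as cross-correlation; it is a sketch of the formulas above, not the application's implementation.

```python
# Sketch of the back-propagation quantities described above for one feature map j,
# assuming sigmoid activation, 2x2 pooling with scale n = 2, and a forward
# convolution implemented as cross-correlation. Illustrative only.
import numpy as np
from scipy.signal import correlate2d

def up(delta, n=2):
    """Kronecker upsampling: copy each element n times vertically and horizontally."""
    return np.kron(delta, np.ones((n, n)))

def conv_layer_sensitivity(u_j, delta_j_next, beta_j_next, n=2):
    """delta_j^l = beta_j^{l+1} * ( f'(u_j^l) o up(delta_j^{l+1}) ), with f = sigmoid."""
    s = 1.0 / (1.0 + np.exp(-u_j))                              # f(u)
    return beta_j_next * (s * (1 - s)) * up(delta_j_next, n)    # f'(u) = f(u)(1 - f(u))

def additive_bias_grad(delta_j):
    """dE/db_j: sum over all positions (u, v) of the sensitivity matrix."""
    return delta_j.sum()

def kernel_grad(x_i_prev, delta_j):
    """dE/dk_ij: each kernel entry sums delta_j(u, v) times the input patch element
    multiplied by it, i.e. the valid cross-correlation of x_i^{l-1} with delta_j."""
    return correlate2d(x_i_prev, delta_j, mode="valid")
```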
  • the weights of the updated layers are obtained by performing error back propagation on each layer of the convolutional neural network model.
  • The face recognition model is obtained by applying the acquired weights of the updated layers to the convolutional neural network model. Further, the weights between the layers in the face recognition model reflect the potential relationship between each part of the image and its neighboring parts, achieving effective capture and recognition of the image information.
  • the face recognition model will eventually output a probability value indicating how close the image data to be identified is to the target driving model after being processed by the face recognition model.
  • the model can be widely applied to driver identification to achieve accurate identification of whether the target user is driving himself.
  • the training audio data is used to train the convolutional neural network model, and the audio data needs to be processed first, and the acquired abstract audio data is converted into a training sound spectrum map.
  • the training sound spectrum is input into the convolutional neural network, and the convolutional layer of the convolutional neural network performs convolution calculation on the spectrogram, and a corresponding number of feature maps are obtained according to the number of filters set.
  • the obtained feature map is subjected to downsampling calculation at the pooling layer to obtain a pooled feature map.
  • the convolutional layer and the pooled layer appear in pairs, that is, the convolutional calculation is performed on the convolutional layer, followed by downsampling calculation of the feature map obtained by the convolution calculation at the pooling layer.
  • the feature map after multiple rounds of convolution-pooling will pass through at least one fully connected layer and the last layer of the output layer in the network model.
  • The error calculation and gradient back-propagation update are performed on each layer of the convolutional neural network model by calculating the output of each layer, the updated weights of each layer are obtained, and the audio recognition model is obtained based on the weights of the updated layers.
  • the audio recognition model obtained by the convolutional neural network model training can more accurately identify the user and provide a guarantee for determining whether the user himself or not.
  • step S13 the convolutional neural network model is trained by using the training audio data to obtain an audio recognition model, which specifically includes the following steps:
  • the convolutional neural network model needs to be initialized.
  • the initialization of the convolutional neural network is primarily to initialize the convolutional kernel (ie weight) and offset of the convolutional layer.
  • Network weight initialization means assigning an initial value to all the weights in the network.
  • the initial value setting of the convolutional neural network model may be different from the training face recognition model, such as in the interval [-0.20, +0.20].
  • Step S132 specifically includes the following steps: First, the training audio data is divided into short frames, which may be hundreds of milliseconds; and in order to ensure continuity and accuracy of information, there should be overlapping portions between adjacent frames.
  • The corresponding concepts here are frame length and step size: the frame length is the duration of one frame, and the step size is the interval between the start of one frame and the start of the next frame. Since adjacent frames need a certain overlap, the step size is usually smaller than the frame length. Since the frame length is generally small, the fundamental frequency, the harmonics and their intensities can be regarded as constant values within this short time domain.
  • each frame is subjected to a short-time Fourier transform to obtain corresponding spectrum information.
  • The Short-Time Fourier Transform (STFT) is a Fourier-related mathematical transform used to determine the frequency and phase of the sine-wave components of a local region of a time-varying signal.
  • The spectrum information includes the frequencies of each frame and their intensities. The intensity is expressed in the image as color or gray scale, thereby obtaining the training sound spectrogram.
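  • A minimal sketch of steps S131 and S132: frame the training audio with overlapping frames and apply a short-time Fourier transform to obtain a training sound spectrogram. It uses scipy's stft; the 100 ms frame length and 50 ms step size are illustrative values, not figures taken from the text.

```python
# Sketch of steps S131-S132: frame the training audio with overlapping frames and
# apply a short-time Fourier transform to obtain a training sound spectrogram.
# Frame length and step size are illustrative values.
import numpy as np
from scipy.signal import stft

def training_spectrogram(audio, sample_rate, frame_len_s=0.1, step_s=0.05):
    nperseg = int(frame_len_s * sample_rate)            # frame length in samples
    noverlap = nperseg - int(step_s * sample_rate)      # overlap between adjacent frames
    freqs, times, Z = stft(audio, fs=sample_rate, nperseg=nperseg, noverlap=noverlap)
    # Intensity of each frequency in each frame, mapped to gray scale (0..1).
    intensity = np.abs(Z)
    return intensity / (intensity.max() + 1e-12)

# Example: one second of synthetic audio at 16 kHz.
spec = training_spectrogram(np.random.randn(16000), 16000)
```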
  • S133 Input a training sound spectrum map in the convolutional neural network model to calculate the output of each layer of the convolutional neural network model.
  • the training sound spectrum map is input to the convolutional neural network for training, and the output of each layer of the convolutional neural network model is calculated, and the output of each layer is acquired by using a forward propagation algorithm.
  • The feature map $x_j^l$ for each output of the convolutional layer is $x_j^l = f\bigl(\sum_{i \in M_j} x_i^{l-1} * k_{ij}^l + b_j^l\bigr)$, where $l$ is the current layer, $M_j$ is the selected combination of input feature maps, and $x_i^{l-1}$ is the input of the $i$-th feature map, that is, the output of layer $l-1$.
  • the calculation of the feature map of each output of the pooling layer is the same as that of step S122, and the description will not be repeated here.
  • Only the outputs of the convolutional layer and the pooling layer, which differ from those of a generally fully connected neural network model, are given here. The outputs of the remaining layers are the same as in a generally fully connected neural network model and can be obtained by the forward propagation algorithm, so they are not described one by one here to avoid redundancy.
  • S134 Perform error back propagation update on each layer of the convolutional neural network model according to the output of each layer, and obtain weights of the updated layers.
  • The error back-propagation update is performed on each layer of the convolutional neural network model according to the output of each layer, and the weights of the updated layers are obtained; this includes calculating the error information of each layer of the convolutional neural network model and updating the weights of each layer by using the gradient descent method.
  • The gradient descent method updates the weights mainly by using the gradient of the error cost function with respect to the parameters, so the goal of the weight update is to let each layer obtain such a gradient and then perform the update.
  • the present embodiment will not be described in detail.
  • S135 Acquire an audio recognition model based on the weights of the updated layers.
  • The updated weights of each layer obtained by training with the training sound spectrogram are applied to the convolutional neural network model to obtain the trained audio recognition model.
  • the weight between the layers in the audio recognition model reflects the potential relationship between each part of the module in the spectrogram and its neighboring modules, and also indirectly reflects the correlation between the training audio data and the user.
  • the audio recognition model will eventually output a probability value indicating how close the audio data to be identified is to the audio recognition model after being processed by the driving model.
  • the model can be widely applied to driver identification to achieve accurate identification of whether the target user is driving himself.
  • S14 Perform consistency verification on the face recognition model and the audio recognition model by using the training image data and the training audio data, and store the verified face recognition model and audio recognition model in association with the user identifier.
  • Performing consistency verification on the face recognition model and the audio recognition model by using the training image data and the training audio data refers to verifying the training image data and the training audio data of the same driving scene with the face recognition model and the audio recognition model respectively. If both recognition results point to the target user driving, or both point to someone other than the target user driving, it is determined that the face recognition model and the audio recognition model are consistent in identifying the training image data and the training audio data of the same driving scene.
  • The consistency verification is performed and the verification results are counted, that is, the number of results that conform to consistency and the number that do not are calculated; the verification probability is then calculated from these statistics, and it is determined whether the verification probability is greater than a preset probability. If the verification probability is greater than the preset probability, it is determined that the face recognition model and the audio recognition model pass the verification, and the verified face recognition model and audio recognition model are stored in association with the user identifier. Associating the face recognition model and the audio recognition model that pass the consistency verification helps ensure the recognition accuracy of the face recognition model and the audio recognition model.
  • The training image data of the same driving scene is input into the face recognition model for recognition to obtain a first recognition result; the training audio data of the same driving scene is input into the audio recognition model for recognition to obtain a second recognition result; it is then determined whether the first recognition result and the second recognition result are consistent, and if so the pair is counted as conforming to consistency. It can be understood that the first recognition result and the second recognition result can both take the form of a probability value.
  • If the probability value is greater than 50%, the recognition result is determined to be that the target user himself is driving; if the probability value is less than 50%, the recognition result is determined to be that the target user himself is not driving. Only when the first recognition result and the second recognition result are both greater than 50% or both less than 50% at the same time is consistency determined.
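  • A minimal sketch of this consistency verification, assuming the two models are callables returning probability values; the preset probability of 0.9 is an assumed value, since the excerpt does not specify it.

```python
# Sketch of the consistency verification in step S14: each model outputs a probability,
# a probability above 50% is read as "the target user is driving", two results are
# consistent when they fall on the same side of 50%, and the models pass verification
# when the fraction of consistent pairs exceeds a preset probability (value assumed).
PRESET_PROBABILITY = 0.9  # assumed preset probability

def is_consistent(first_prob, second_prob):
    return (first_prob > 0.5) == (second_prob > 0.5)

def passes_consistency_verification(pairs, preset_probability=PRESET_PROBABILITY):
    """pairs: iterable of (first_prob, second_prob) for training image/audio data
    of the same driving scenes, as produced by the two recognition models."""
    results = [is_consistent(p1, p2) for p1, p2 in pairs]
    verification_probability = sum(results) / len(results)
    return verification_probability > preset_probability
```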
  • the storage associated with the user identifier refers to storing according to the user identifier of the same user, and the storage is dependent on the user identifier, so that the face recognition model and the audio recognition model are associated by the same user identifier.
  • Specifically, the face recognition model and the audio recognition model obtained by training are stored in association with the user identifier; that is, the face recognition model and the audio recognition model corresponding to the same user identifier are stored in a database, and a model information table is created in the database.
  • the model information table includes a user identifier and a storage address of the face recognition model and the audio recognition model corresponding to the user identifier in the database.
  • The face recognition model and the audio recognition model are stored in association according to the user identifier and together form an overall driving model, so that when the driving model is used for recognition, the face recognition model and the audio recognition model in the overall driving model can be called at the same time to recognize image data and audio data. In this way the driving model determines whether the user himself is driving from two important dimensions and makes full use of the potential connection between the image dimension and the audio dimension, so that the recognition result is closer to the actual driving situation and the recognition accuracy is improved.
  • The training image data and the training audio data of the same driving scene are first acquired, and the training image data and the training audio data are associated with the user identifier, which ensures that the acquired data is generated by the driving behavior of the same user at the same time and makes it convenient to obtain, based on the user identifier, the training image data and training audio data required for driving model training, so that the driving model obtained by training can determine whether the user himself is driving through face recognition and audio recognition.
  • the training image data is used to train the convolutional neural network model to obtain the face recognition model.
  • The face recognition model obtained by training the convolutional neural network model reflects the potential relationship between each part of the image and its neighboring parts, realizes effective capture and recognition of the image information, and can identify the user more accurately, which provides a guarantee for determining whether the user himself is driving.
  • the training audio data is used to train the convolutional neural network model to obtain the audio recognition model.
  • the audio recognition model obtained by the convolutional neural network model reflects the potential relationship between each module in the spectrogram and its adjacent modules. It also indirectly reflects the relevance of the training audio data to the user, enabling accurate identification and providing assurance for determining whether the user is driving himself.
  • Finally, consistency verification is performed on the face recognition model and the audio recognition model by using the training image data and the training audio data, and the verified face recognition model and audio recognition model are stored in association with the user identifier; the associated storage stores the face recognition model and the audio recognition model of the same user through a model information table created in the database, keyed by the user identifier.
  • The above two recognition models that pass the consistency verification are stored in the database in association with the same user identifier, so that face recognition and audio recognition can be respectively performed on the image data to be identified and the audio data to be identified acquired in the same driving scene, thereby ensuring the accuracy of driving model recognition.
  • Fig. 6 is a block diagram showing the principle of the driving model training device corresponding to the driving model training method of the first embodiment.
  • the driving model training device includes a training data acquisition module 11, a face recognition model acquisition module 12, an audio recognition model acquisition module 13, and an associated storage module 14.
  • The functions implemented by the training data acquisition module 11, the face recognition model acquisition module 12, the audio recognition model acquisition module 13 and the associated storage module 14 correspond one-to-one to the steps of the driving model training method in Embodiment 1; to avoid redundancy, this embodiment does not describe them in detail.
  • the training data acquisition module 11 is configured to acquire training image data and training audio data of the same driving scene.
  • the face recognition model acquisition module 12 is configured to train the convolutional neural network model with the training image data to obtain a face recognition model.
  • the audio recognition model acquisition module 13 is configured to train the convolutional neural network model by using the training audio data to obtain an audio recognition model.
  • The association storage module 14 is configured to perform consistency verification on the face recognition model and the audio recognition model by using the training image data and the training audio data, and to store the verified face recognition model and audio recognition model in association with the user identifier.
  • the training data acquisition module 11 includes a training instruction acquisition unit 111, an information table query unit 112, and a training data acquisition unit 113.
  • the training instruction obtaining unit 111 is configured to acquire a model training instruction input by the user, and the model training instruction includes a user identifier.
  • the information table querying unit 112 is configured to query the driving data information table based on the user identifier to determine whether the number of driving data information is greater than a preset number.
  • the training data acquiring unit 113 is configured to acquire training image data and training audio data of the same driving scene if the number of driving data is greater than a preset number.
  • the face recognition model acquisition module 12 includes a first model initialization unit 121, a first model layer output unit 122, a first weight update unit 123, and a face recognition model acquisition unit 124.
  • the first model initializing unit 121 is configured to initialize a convolutional neural network model.
  • the first model layer output unit 122 is configured to input training image data in the convolutional neural network model, and calculate an output of each layer of the convolutional neural network model.
  • the first weight update unit 123 is configured to perform error back propagation update on each layer of the convolutional neural network model according to the output of each layer, and obtain weights of the updated layers.
  • the face recognition model acquisition unit 124 is configured to acquire a face recognition model based on the weights of the updated layers.
  • the audio recognition model acquisition module 13 includes a second model initialization unit 131, a training sound spectrum map acquisition unit 132, a second model layer output unit 133, a second weight update unit 134, and an audio recognition model acquisition unit 135.
  • the second model initializing unit 131 is configured to initialize the convolutional neural network model.
  • the training sound spectrum map obtaining unit 132 is configured to perform feature extraction on the training audio data to obtain a corresponding training sound spectrum map.
  • The second model layer output unit 133 is configured to input the training sound spectrogram into the convolutional neural network model and calculate the output of each layer of the convolutional neural network model.
  • the second weight update unit 134 is configured to perform error back propagation update on each layer of the convolutional neural network model according to the output of each layer, and obtain weights of the updated layers.
  • the audio recognition model acquisition unit 135 is configured to acquire an audio recognition model based on the weights of the updated layers.
  • The training data acquisition module 11 is configured to acquire the training image data and training audio data of the same driving scene; because the training image data and the training audio data are collected simultaneously in the same scene, the acquired training data has a potential correlation and effectively uses the respective characteristics of the image dimension and the sound dimension, so that the driving model obtained by training (including the face recognition model part and the audio recognition model part) is closer to actual driving scenes during recognition, which improves the recognition accuracy of the driving model.
  • The face recognition model acquisition module 12 is configured to train the convolutional neural network model by using the training image data to obtain the face recognition model. Training the convolutional neural network with the training image data makes the weights in the face recognition model obtained by training carry the features of the training image data associated with the user identifier, so that the face recognition model can identify more accurately through the updated weights, which improves the recognition effect of the face recognition model.
  • The audio recognition model acquisition module 13 is configured to train the convolutional neural network model by using the training audio data to obtain the audio recognition model. Similar to the training of the face recognition model, the training audio data is used to train the convolutional neural network and update the weights of the network, so that the audio recognition model obtained by training carries the characteristics of the training audio data associated with the user identifier, which improves the recognition effect of the audio recognition model.
  • The association storage module 14 is configured to perform consistency verification on the face recognition model and the audio recognition model by using the training image data and the training audio data, and to store the verified face recognition model and audio recognition model in association with the user identifier. There is a potential connection between the training image data and the training audio data acquired in the same scene, and the two models are stored in association according to the user identifier, which improves the effect of recognizing whether the user himself is driving.
  • Fig. 7 is a flow chart showing the driver identification method in the embodiment.
  • the driver identification method can be applied to the terminal equipment of the insurance institution or other institutions to identify the driver's driving behavior and achieve the effect of intelligent recognition. As shown in FIG. 7, the driver identification method includes the following steps:
  • S21 Acquire image data to be identified and audio data to be identified of the same driving scene of the user, and the image data to be identified and the audio data to be identified are associated with the user identifier.
  • The image data to be identified and the audio data to be identified refer to the real-time image data and audio data respectively collected through the camera and the recording device of the user's mobile terminal during actual driving; these data are used for model recognition to determine whether the user himself or herself is driving.
  • the user's mobile terminal acquires real-time image data to be recognized and audio data to be identified according to the driving situation of the user, and the image data to be identified and the audio data to be identified are associated with the user identifier.
  • the to-be-identified data is acquired by the user in the same driving scenario, that is, the user's mobile terminal acquires the image data to be recognized and the audio data to be recognized when the user is driving at the same time.
  • S22 Query the database based on the user identifier, and invoke a face recognition model and an audio recognition model corresponding to the user identifier.
  • the face recognition model and the audio recognition model are models obtained by using the driving model training method in Embodiment 1, specifically, a face recognition model and an audio recognition model that are stored in association with the user identification and pass the consistency verification.
  • The storage address, in the database, of the face recognition model corresponding to the user identifier is looked up in the model information table corresponding to the user identifier, and the face recognition model corresponding to the user identifier is called according to that storage address.
  • the image data to be recognized and the audio data to be identified respectively collected by the camera and the recording device on the user's mobile terminal carry the user identifier; and the model information table stored in the database also includes the user identifier, and the model
  • the information table includes a storage address of the face recognition model and the audio recognition model corresponding to the user identifier in the database, that is, the model information table is queried by the user identifier, and then stored in the database according to the face recognition model storage address in the table.
  • the face recognition model; and the audio recognition model stored in the database is called according to the audio recognition model storage address in the table.
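  • To make this lookup concrete, the sketch below shows one way step S22 could query a model information table keyed by the user identifier and load the two stored models. It is a minimal sketch under stated assumptions: the table name `model_info`, its column names, and the pickled-file `load_model` helper are illustrative choices, not part of the original disclosure.

```python
import pickle
import sqlite3

def load_model(path):
    # Hypothetical loader: the text does not fix a serialization format,
    # so a pickled model file is assumed purely for illustration.
    with open(path, "rb") as f:
        return pickle.load(f)

def invoke_models(db_path, user_id):
    """Step S22 sketch: look up the storage addresses of the face and audio
    recognition models associated with user_id, then load both models."""
    conn = sqlite3.connect(db_path)
    row = conn.execute(
        "SELECT face_model_path, audio_model_path "
        "FROM model_info WHERE user_id = ?",
        (user_id,),
    ).fetchone()
    conn.close()
    if row is None:
        raise KeyError(f"no verified models stored for user {user_id}")
    face_model_path, audio_model_path = row
    return load_model(face_model_path), load_model(audio_model_path)
```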
  • S23: Acquire a first probability based on the to-be-identified image data and the face recognition model.
  • in this embodiment, based on the invoked face recognition model, the acquired to-be-identified image data is processed in the face recognition model so that the face recognition model outputs a probability value; this probability value is referred to as the first probability to distinguish it from the probability value obtained by the audio recognition model.
  • S24: Acquire a second probability based on the to-be-identified audio data and the audio recognition model.
  • in this embodiment, based on the invoked audio recognition model, the acquired to-be-identified audio data is processed in the audio recognition model, which finally outputs a probability value; this probability value is referred to as the second probability to distinguish it from the first probability obtained by the face recognition model.
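  • Embodiment 1 turns audio into a spectrogram by framing the waveform and applying a short-time Fourier transform before it enters the convolutional network, and the same preprocessing would precede scoring here. The following is a minimal sketch of that step with NumPy, assuming a frame length of 400 samples, a step of 160 samples, and an `audio_model.predict` interface; all of these are illustrative assumptions rather than values fixed by the text.

```python
import numpy as np

def spectrogram(samples, frame_len=400, step=160):
    """Split the waveform into overlapping frames and take the magnitude
    of the short-time Fourier transform of each frame."""
    window = np.hanning(frame_len)
    frames = [
        samples[start:start + frame_len] * window
        for start in range(0, len(samples) - frame_len + 1, step)
    ]
    # Magnitude spectrum per frame; a log scale is a common (assumed) choice.
    spec = np.abs(np.fft.rfft(np.stack(frames), axis=1))
    return np.log1p(spec).T  # shape: (frequency_bins, time_frames)

def second_probability(audio_model, samples):
    """Step S24 sketch: score the to-be-identified audio with the audio
    recognition model and return the output probability."""
    spec = spectrogram(samples)
    # Add batch and channel axes expected by a typical CNN input.
    return float(audio_model.predict(spec[np.newaxis, ..., np.newaxis]))
```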
  • S25: Determine the final probability that the user himself/herself is driving based on the first probability and the second probability.
  • in this embodiment, the first probability obtained by processing the to-be-identified image data with the face recognition model and the second probability obtained by processing the to-be-identified audio data with the audio recognition model are combined numerically to obtain the final probability used to judge whether the user himself/herself is driving. Understandably, the final probability reflects the relationship between the image dimension and the audio dimension, which can effectively compensate for the shortcomings of recognition in a single dimension and make the recognition of the model more accurate.
  • as shown in Fig. 8, step S25, determining the final probability that the user himself/herself is driving based on the first probability and the second probability, specifically includes the following steps:
  • S251: Acquire the probability difference between the first probability and the second probability.
  • in this embodiment, the smaller of the first probability and the second probability is subtracted from the larger one to obtain the probability difference between the first probability and the second probability.
  • understandably, the probability difference is the error between the probability values obtained by the face recognition model and the audio recognition model respectively, and reflects the difference between recognition in different dimensions; by calculating the probability difference between the first probability and the second probability, this difference can be used to further reduce the error of the recognition process and make the recognition more accurate.
  • S252: Determine whether the probability difference is greater than the first preset threshold.
  • the first preset threshold is a preset threshold that is compared with the probability difference.
  • the obtained probability difference is compared with the first preset threshold, and it is determined whether the probability difference is greater than the first preset threshold.
  • S253: If the probability difference is greater than the first preset threshold, the larger of the first probability and the second probability is output as the final probability. Specifically, if the first probability is 92%, the second probability is 98%, and the first preset threshold is 5%, the calculated probability difference is 6%, which is greater than the first preset threshold, so the larger of the first probability and the second probability, namely 98%, is selected as the final probability output.
  • understandably, the first preset threshold is used to reduce errors in the identification process: when an occasional data abnormality appears during recognition, the unrealistic data can be effectively discarded. Generally speaking, this situation occurs relatively rarely.
  • the second preset threshold refers to a preset threshold used for comparison with the final probability in step S26.
  • in this embodiment, the obtained final probability is compared with the second preset threshold. If the final probability is greater than the second preset threshold, it is determined that the user himself/herself is driving; further, if the final probability is not greater than the second preset threshold, it is determined that the user is not driving. Specifically, if the obtained final probability is 98% and the second preset threshold is 95%, it can be determined that the user himself/herself is driving.
  • S254: If the probability difference is not greater than the first preset threshold, the average of the first probability and the second probability is selected as the final probability output. Specifically, if the first probability is 97%, the second probability is 99%, and the first preset threshold is 5%, the calculated probability difference is 2%, which is not greater than the first preset threshold, so the average of the first probability and the second probability, namely 98%, is selected as the final probability output. Understandably, when the probability difference is not greater than the first preset threshold, taking the mean of the probabilities obtained from the image dimension and the audio dimension makes the obtained value more accurate and the recognition result closer to the actual situation.
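  • Putting steps S251 to S254 and step S26 together, the sketch below fuses the two probabilities exactly as described: take the absolute difference, output the larger value when the difference exceeds the first preset threshold, output the mean otherwise, and then compare the final probability with the second preset threshold. The 5% and 95% threshold values come from the examples in the text; the function names themselves are illustrative.

```python
def fuse_probabilities(p_face, p_audio, first_threshold=0.05):
    """S251-S254: combine the face-model and audio-model probabilities."""
    diff = abs(p_face - p_audio)        # S251: probability difference
    if diff > first_threshold:          # S252/S253: keep the larger value
        return max(p_face, p_audio)
    return (p_face + p_audio) / 2       # S254: otherwise take the mean

def is_user_driving(p_face, p_audio, first_threshold=0.05, second_threshold=0.95):
    """S26: the user is judged to be driving himself/herself when the final
    probability exceeds the second preset threshold."""
    final_probability = fuse_probabilities(p_face, p_audio, first_threshold)
    return final_probability > second_threshold, final_probability

# Worked example from the text: 92% and 98% differ by 6%, which exceeds 5%,
# so the final probability is 98%, above the 95% threshold.
driving, final_p = is_user_driving(0.92, 0.98)   # -> (True, 0.98)
```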
  • in this embodiment, the first probability is obtained based on the to-be-identified image data and the face recognition model, the second probability is obtained based on the to-be-identified audio data and the audio recognition model, and the final probability that the user himself/herself is driving is determined from the first probability and the second probability. Comparing the probability difference with the first preset threshold reduces errors in the recognition process: when an occasional data abnormality appears, the unrealistic data can be effectively removed and the recognition result is brought closer to the actual situation, yielding a better recognition result. Finally, whether the user himself/herself is driving is determined by judging whether the final probability is greater than the second preset threshold, which makes the driver recognition result more accurate and reliable.
  • Fig. 9 is a block diagram of the driver identification device in Embodiment 4, which corresponds to the driver identification method in Embodiment 3.
  • as shown in FIG. 9, the driver identification device includes a to-be-identified data acquisition module 21, a recognition model invoking module 22, a first probability acquisition module 23, a second probability acquisition module 24, a final probability acquisition module 25, and a confirmation result acquisition module 26.
  • the functions implemented by these modules correspond one-to-one to the steps of the driver identification method in Embodiment 3; to avoid redundancy, they are not described in detail in this embodiment.
  • the to-be-identified data acquisition module 21 is configured to acquire image data to be recognized and audio data to be identified of the same driving scene of the user, and the image data to be identified and the audio data to be identified are associated with the user identifier.
  • the recognition model invoking module 22 is configured to query the database based on the user identifier and invoke the face recognition model and the audio recognition model corresponding to the user identifier, where the face recognition model and the audio recognition model are models acquired by using the driving model training method in Embodiment 1.
  • the first probability acquisition module 23 is configured to acquire the first probability based on the image data to be recognized and the face recognition model.
  • the second probability acquisition module 24 is configured to acquire a second probability based on the audio data to be identified and the audio recognition model.
  • the final probability acquisition module 25 is configured to determine a final probability of the user driving the vehicle based on the first probability and the second probability.
  • the final probability acquisition module 25 includes a probability difference acquisition unit 251, a probability difference determination unit 252, a first final probability output unit 253, and a second final probability output unit 254.
  • the probability difference obtaining unit 251 is configured to acquire probability difference values of the first probability and the second probability.
  • the probability difference determining unit 252 is configured to determine whether the probability difference is greater than a first preset threshold.
  • the first final probability output unit 253 is configured to select a larger one of the first probability and the second probability as the final probability output if the probability difference is greater than the first preset threshold.
  • the second final probability output unit 254 is configured to select an average of the first probability and the second probability as the final probability output if the probability difference is not greater than the first preset threshold.
  • the confirmation result obtaining module 26 is configured to determine that the user himself drives the vehicle if the final probability is greater than the second preset threshold.
  • in the driver identification device provided by this embodiment, modules 21 to 26 obtain the first probability and the second probability and subtract the smaller probability value from the larger one to obtain the probability difference of the two probabilities.
  • this probability difference is the error between the probability values obtained by the face recognition model and the audio recognition model respectively, and it reflects the difference between recognition in different dimensions. By calculating the probability difference between the first probability and the second probability and setting the first preset threshold and the second preset threshold, abnormal data is effectively controlled and the values are combined reasonably, so that the acquired probability value is closer to the actual situation, further reducing the error of the recognition process and making the recognition more accurate.
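  • As a summary of how modules 21 to 26 cooperate, the sketch below wires the earlier pieces into one identification pass: invoke the stored models, score each modality, fuse the probabilities, and return the decision. It reuses the hypothetical helpers `invoke_models`, `second_probability`, and `is_user_driving` shown above and assumes the face model exposes a `predict` method on a preprocessed image; none of these interfaces are fixed by the original text.

```python
def identify_driver(db_path, user_id, face_image, audio_samples):
    """End-to-end sketch of the driver identification flow of Embodiments 3/4:
    returns (user_is_driving, final_probability) for one driving scene."""
    # Module 22: query the database and invoke the verified, associated models.
    face_model, audio_model = invoke_models(db_path, user_id)

    # Module 23: first probability from the face recognition model
    # (image preprocessing and batching are omitted in this sketch).
    p_face = float(face_model.predict(face_image))

    # Module 24: second probability from the audio recognition model.
    p_audio = second_probability(audio_model, audio_samples)

    # Modules 25 and 26: fuse the probabilities and apply the second threshold.
    return is_user_driving(p_face, p_audio)
```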
  • this embodiment provides a computer readable medium on which computer readable instructions are stored.
  • when the computer readable instructions are executed by a processor, the driving model training method in Embodiment 1 is implemented; to avoid repetition, details are not described herein again.
  • alternatively, when the computer readable instructions are executed by the processor, the functions of the modules/units of the driving model training device in Embodiment 2 are implemented; to avoid repetition, details are not described herein again.
  • alternatively, when the computer readable instructions are executed by the processor, the functions of the steps in the driver identification method in Embodiment 3 are implemented; to avoid repetition, details are not described herein again.
  • alternatively, when the computer readable instructions are executed by the processor, the functions of the modules/units in the driver identification device in Embodiment 4 are implemented; to avoid repetition, details are not described herein again.
  • FIG. 10 is a schematic diagram of a terminal device according to an embodiment of the present application.
  • the terminal device 100 of this embodiment includes a processor 101, a memory 102, and computer readable instructions 103 stored in the memory 102 and executable on the processor 101.
  • when the computer readable instructions are executed by the processor 101, the driving model training method in Embodiment 1 is implemented; to avoid repetition, details are not described herein again.
  • alternatively, when the computer readable instructions are executed by the processor 101, the functions of the modules/units in the driving model training device in Embodiment 2 are implemented; to avoid repetition, details are not described herein again.
  • alternatively, when the computer readable instructions are executed by the processor 101, the functions of the steps in the driver identification method in Embodiment 3 are implemented; to avoid repetition, details are not described herein again.
  • alternatively, when the computer readable instructions are executed by the processor 101, the functions of the modules/units in the driver identification device in Embodiment 4 are implemented; to avoid repetition, details are not described herein again.
  • computer readable instructions 103 may be partitioned into one or more modules/units, one or more modules/units being stored in memory 102 and executed by processor 101 to complete the application.
  • the one or more modules/units may be a series of computer readable instruction segments capable of performing a particular function for describing the execution of computer readable instructions 103 in the terminal device 100.
  • for example, the computer readable instructions 103 may be divided into the training data acquisition module 11, the face recognition model acquisition module 12, the audio recognition model acquisition module 13, and the associated storage module 14 in Embodiment 2, or into the to-be-identified data acquisition module 21, the recognition model invoking module 22, the first probability acquisition module 23, the second probability acquisition module 24, the final probability acquisition module 25, and the confirmation result acquisition module 26 in Embodiment 4; the specific functions of each module are as described in Embodiment 2 or Embodiment 4 and are not repeated here.
  • the terminal device 100 can be a computing device such as a desktop computer, a notebook, a palmtop computer, and a cloud server.
  • the terminal device may include, but is not limited to, the processor 101 and the memory 102. It will be understood by those skilled in the art that FIG. 10 is merely an example of the terminal device 100 and does not constitute a limitation of the terminal device 100, which may include more or fewer components than those illustrated, or combine some components, or have different components.
  • the terminal device may further include an input/output device, a network access device, a bus, and the like.
  • the processor 101 may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
  • the general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
  • the memory 102 may be an internal storage unit of the terminal device 100, such as a hard disk or a memory of the terminal device 100.
  • the memory 102 may also be an external storage device of the terminal device 100, such as a plug-in hard disk equipped on the terminal device 100, a smart media card (SMC), a secure digital (SD) card, a flash card, and the like.
  • the memory 102 may also include both an internal storage unit of the terminal device 100 and an external storage device.
  • the memory 102 is used to store computer readable instructions as well as other programs and data required by the terminal device.
  • the memory 102 can also be used to temporarily store data that has been or will be output.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
  • the integrated modules/units if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer readable medium. Based on such understanding, the present application implements all or part of the processes in the foregoing embodiments, and may also be implemented by instructing related hardware by computer readable instructions, which may be stored in a computer readable medium.
  • the computer readable instructions when executed by a processor, can implement the steps of the various method embodiments described above.
  • the computer readable instructions comprise computer readable instruction code
  • the computer readable instruction code can be in the form of a source code, an object code, an executable, or some intermediate form.
  • the computer readable medium can include any entity or device capable of carrying the computer readable instruction code, a recording medium, a USB flash drive, a removable hard drive, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), electrical carrier signals, telecommunications signals, software distribution media, and so on.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)
  • Traffic Control Systems (AREA)
  • Image Processing (AREA)

Abstract

一种驾驶模型训练方法、驾驶人识别方法、装置、设备及介质。该驾驶模型训练方法包括:获取同一驾驶场景的训练图像数据和训练音频数据,所述训练图像数据和所述训练音频数据与用户标识相关联(S11);采用所述训练图像数据对卷积神经网络模型进行训练,获取人脸识别模型(S12);采用所述训练音频数据对所述卷积神经网络模型进行训练,获取音频识别模型(S13);采用所述训练图像数据和所述音频图像数据对所述人脸识别模型和所述音频识别模型进行一致性验证,将所述人脸识别模型和所述音频识别模型与所述用户标识关联存储(S14)。该驾驶模型训练方法利用图像维度和声音维度上的特征,解决当前驾驶模型识别效果差的问题,提高了识别驾驶人开车的精确度。

Description

驾驶模型训练方法、驾驶人识别方法、装置、设备及介质
本专利申请以2017年9月19日提交的申请号为201710846204.0,名称为“驾驶模型训练方法、驾驶人识别方法、装置、设备及介质”的中国发明专利申请为基础,并要求其优先权。
技术领域
本申请涉及身份识别领域,尤其涉及一种驾驶模型训练方法、驾驶人识别方法、装置、设备及介质。
背景技术
目前识别是否驾驶人本人开车一般使用手机获取的的陀螺仪数据和手机轨迹数据来判断是否本人开车,但是这种采用陀螺仪数据和手机轨迹数据进行驾驶人识别的结果精确度不高。采用陀螺仪数据和手机轨迹数据进行驾驶人识别获得的数据往往不能反映驾驶人驾驶的真实状态,采用的具体数据如汽车的速度、加速度或者在地图上的轨迹数据这些数据难以实现对驾驶人的精准识别。采集并使用的数据多为汽车驾驶时的物理特性,没有使用其他能够有效反映驾驶人识别的特性,不能较好地反映驾驶人真实行驶过程的状态,造成进行驾驶人识别的识别效果较差。
发明内容
本申请实施例提供一种驾驶模型训练方法、驾驶人识别方法、装置、设备及介质,以解决当前驾驶模型识别效果较差的问题。
第一方面,本申请实施例提供一种驾驶模型训练方法,包括:
获取同一驾驶场景的训练图像数据和训练音频数据,所述训练图像数据和所述训练音频数据与用户标识相关联;
采用所述训练图像数据对卷积神经网络模型进行训练,获取人脸识别模型;
基于所述训练音频数据对所述卷积神经网络模型进行训练,获取音频识别模型;
采用所述训练图像数据和所述音频图像数据对所述人脸识别模型和所述音频识别模型进行一致性验证,将通过验证的所述人脸识别模型和所述音频识别模型与所述用户标识关联存储。
第二方面,本申请实施例提供一种驾驶模型训练装置,包括:
训练数据获取模块,用于获取同一驾驶场景的训练图像数据和训练音频数据,所述训练图像数据和所述训练音频数据与用户标识相关联;
人脸识别模型获取模块,用于采用所述训练图像数据对卷积神经网络模型进行训练,获取人脸识别模型;
音频识别模型获取模块,用于基于所述训练音频数据对所述卷积神经网络模型进行训练,获取音频识别模型;
关联存储模块,用于采用所述训练图像数据和所述音频图像数据对所述人脸识别模型和所述音频识别模型进行一致性验证,将通过验证的所述人脸识别模型和所述音频识别模型与所述用户标识关联存储。
第三方面,本申请实施例提供一种驾驶人识别方法,包括:
获取用户同一驾驶场景的待识别图像数据和待识别音频数据,所述待识别图像数据和所述待识别音频数据与用户标识相关联;
基于所述用户标识查询数据库,调用与所述用户标识相对应的人脸识别模型和音频识别模型,所述人脸识别模型和所述音频识别模型是采用所述驾驶模型训练方法获取的模型;
基于所述待识别图像数据和所述人脸识别模型获取第一概率;
基于所述人脸识别模型调用与所述用户标识关联存储的音频识别模型,所述音频识别模型是采用所述驾驶模型训练方法获取的模型;
基于所述待识别音频数据和所述音频识别模型获取第二概率;
基于所述第一概率和所述第二概率确定所述用户本人开车的最终概率;
若所述最终概率大于第二预设阈值,则确定为所述用户本人开车。
第四方面,本申请实施例提供一种驾驶人识别装置,包括:
待识别数据获取模块,用于获取用户同一驾驶场景的待识别图像数据和待识别音频数据,所述待识别图像数据和所述待识别音频数据与用户标识相关联;
识别模型调用模块,用于基于所述用户标识查询数据库,调用与所述用户标识相对应的人脸识别模型和音频识别模型,所述人脸识别模型和所述音频识别模型是采用所述驾驶模型训练方法获取的模型;
第一概率获取模块,用于基于所述待识别图像数据和所述人脸识别模型获取第一概率;
第二概率获取模块,用于基于所述待识别音频数据和所述音频识别模型获取第二概率;
最终概率获取模块,用于基于所述第一概率和所述第二概率确定所述用户本人开车的最终概率;
确认结果获取模块,用于若所述最终概率大于第二预设阈值,则确定为所述用户本人开车。
第五方面,本申请实施例提供一种终端设备,包括存储器、处理器以及存储在所述存储器中并可在所述处理器上运行的计算机可读指令,所述处理器执行所述计算机可读指令时实现如下步骤:
获取同一驾驶场景的训练图像数据和训练音频数据,所述训练图像数据和所述训练音频数据与用户标识相关联;
采用所述训练图像数据对卷积神经网络模型进行训练,获取人脸识别模型;
采用所述训练音频数据对所述卷积神经网络模型进行训练,获取音频识别模型;
采用所述训练图像数据和所述音频图像数据对所述人脸识别模型和所述音频识别模型进行一致性验证,将通过验证的所述人脸识别模型和所述音频识别模型与所述用户标识关联存储。
第六方面,本申请实施例提供一种终端设备,包括存储器、处理器以及存储在所述存储器中并可在所述处理器上运行的计算机可读指令,所述处理器执行所述计算机可读指令时实现如下步骤:
获取用户同一驾驶场景的待识别图像数据和待识别音频数据,所述待识别图像数据和所述待识别音频数据与用户标识相关联;
基于所述用户标识查询数据库,调用与所述用户标识相对应的人脸识别模型和音频识别模型,所述人脸识别模型和所述音频识别模型是采用所述驾驶模型训练方法获取的模型;
基于所述待识别图像数据和所述人脸识别模型获取第一概率;
基于所述待识别音频数据和所述音频识别模型获取第二概率;
基于所述第一概率和所述第二概率确定所述用户本人开车的最终概率;
若所述最终概率大于第二预设阈值,则确定为所述用户本人开车。
第七方面,本申请实施例提供一种计算机可读介质,所述计算机可读介质存储有计算机可读指令,所述计算机可读指令被处理器执行时实现如下步骤:
获取同一驾驶场景的训练图像数据和训练音频数据,所述训练图像数据和所述训练音频数据与用户标识相关联;
采用所述训练图像数据对卷积神经网络模型进行训练,获取人脸识别模型;
采用所述训练音频数据对所述卷积神经网络模型进行训练,获取音频识别模型;
采用所述训练图像数据和所述音频图像数据对所述人脸识别模型和所述音频识别模型进行一致性验证,将通过验证的所述人脸识别模型和所述音频识别模型与所述用户标识关联存储。
第八方面,本申请实施例提供一种计算机可读介质,所述计算机可读介质存储有计算机可读指令,所述计算机可读指令被处理器执行时实现如下步骤:
获取用户同一驾驶场景的待识别图像数据和待识别音频数据,所述待识别图像数据和所述待识别音频数据与用户标识相关联;
基于所述用户标识查询数据库,调用与所述用户标识相对应的人脸识别模型和音频识别模型,所述人脸识别模型和所述音频识别模型是采用所述驾驶模型训练方法获取的模型;
基于所述待识别图像数据和所述人脸识别模型获取第一概率;
基于所述待识别音频数据和所述音频识别模型获取第二概率;
基于所述第一概率和所述第二概率确定所述用户本人开车的最终概率;
若所述最终概率大于第二预设阈值,则确定为所述用户本人开车。
本申请实施例所提供的驾驶模型训练方法、装置、设备及介质中,先获取同一驾驶场景的训练图像数据和训练音频数据,以便基于用户标识获取进行驾驶模型训练所需的训练图像数据和训练音频数据,以保证训练获得的驾驶模型能够通过人脸识别和音频识别确定是否用户本人驾驶。然后采用训练图像数据对卷积神经网络模型进行训练,获取人脸识别模型,通过卷积神经网络模型训练获得的人脸识别模型,可以更为准确地对用户进行识别,为确定是否用户本人驾驶提供了保证。接着采用训练音频数据对卷积神经网络模型进行训练,获取音频识别模型,该音频识别模型在人脸识别模型的基础上还进行了音频识别维度上对用户是否本人驾驶的识别,能够进一步提高识别的精度。最后,采用训练图像数据和音频图像数据对人脸识别模型和音频识别模型进行一致性验证,将通过验证的人脸识别模型和音频识别模型与用户标识关联存储,该关联存储可以通过同一用户的用户标识将人脸识别模型和音频识别模型直接关联起来,实现对图像和音频数据的识别,使得驾驶模型实现从两个重要维度识别是否用户本人开车,充分利用图像维度和音频维度间的潜在联系, 使得识别结果更加贴近实际驾驶情况。将上述两种识别模型存储在与同一用户标识关联的数据库中,以便对同一驾驶场景下获取的图像维度和声音维度分别进行人脸识别和音频识别,识别过程有效减少单一维度数据造成的误差,有效保证驾驶模型识别的准确率。
本申请实施例所提供的驾驶人识别方法、装置、设备及介质中,基于待识别图像数据和人脸识别模型获取第一概率,基于待识别音频数据和音频识别模型获取第二概率,根据第一概率和第二概率确定用户本人开车的最终概率,并判断最终概率是否大于第二预设阈值以确定是否为用户本人开车,使得驾驶人识别结果更精确可靠。
附图说明
为了更清楚地说明本申请实施例的技术方案,下面将对本申请实施例的描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动性的前提下,还可以根据这些附图获得其他的附图。
图1是本申请实施例1中驾驶模型训练方法的一流程图。
图2是图1中步骤S11之前的一具体流程图。
图3是图1中步骤S11的一具体流程图。
图4是图1中步骤S12的一具体流程图。
图5是图1中步骤S13的一具体流程图。
图6是本申请实施例2中驾驶模型训练装置的一原理框图。
图7是本申请实施例3中驾驶人识别方法的一流程图。
图8是图7中步骤S25的一具体流程图。
图9是本申请实施例4中驾驶人识别装置的一原理框图。
图10是本申请实施例6中终端设备的一示意图。
具体实施方式
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有作出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。
实施例1
图1示出本实施例中驾驶模型训练方法的一流程图。该驾驶模型训练方法可应用在保险机构或其他机构的终端设备上,用于训练驾驶模型,以便利用训练好的驾驶模型进行识 别,达到智能识别的效果。如可应用在保险机构的终端设备上,用于训练与用户相对应的驾驶模型,以便利用训练好的驾驶模型对在保险机构办理车险的用户进行识别,以确定是否为用户本人开车。如图1所示,该驾驶模型训练方法包括如下步骤:
S11:获取同一驾驶场景的训练图像数据和训练音频数据,训练图像数据和训练音频数据与用户标识相关联。
其中,同一驾驶场景是指用户在同一时刻所处的驾驶场景,训练图像数据和训练音频数据是在该用户在同一驾驶场景所采集的数据。用户标识是用于唯一识别用户的标识,为了保证训练得到的驾驶模型可用于识别是否为用户本人开车,需使获取到的所有训练图像数据和训练音频数据均与用户标识相关联。其中,所有训练图像数据和训练音频数据均与用户标识相关联,是指每一用户在出行时产生的训练图像数据和训练音频数据与用户标识唯一对应,一个用户标识可相关联多个同一驾驶场景的训练图像数据和训练音频数据。可以理解地,与用户标识相关联的训练图像数据和训练音频数据均携带有时间标签,同一用户标识对应的同一驾驶场景的训练图像数据和训练音频数据携带相同的时间标签。
本实施例中,用户预先在手机或平板等移动终端上的应用程序(即(Application,简称APP)上完成注册,以使应用程序对应的服务器可获取相应的用户标识。该用户标识可以为用户的手机号或身份证号等可唯一识别用户的标识。当用户携带移动终端出行时,移动终端启动摄像头和录音设备,可在该驾驶场景下实时采集用户驾驶过程中的图像数据和音频数据。移动终端获取到图像数据和音频数据后,将该图像数据和音频数据上传到服务器中,以使服务器将获取到的图像数据和音频数据存储在MySQL、Oracle等数据库中,并使每一图像数据和音频数据与一用户标识关联存储。在终端设备需要进行驾驶模型训练时,可从MySQL、Oracle等数据库中查询获取与用户标识相关联的图像数据和音频数据,作为训练驾驶模型的训练图像数据和训练音频数据。该用户的训练图像数据和训练音频数据包含用户的大量训练数据,能够提供足够多的训练图像数据和训练音频数据,为驾驶模型训练提供良好的数据基础,以保证训练得到的驾驶模型的识别效果。
如图2所示,步骤S11中,获取同一驾驶场景的训练图像数据和训练音频数据,训练图像数据和训练音频数据与用户标识相关联,之前还包括如下步骤:
S1111:获取驾驶场景下车辆的当前车速,判断当前车速是否达到预设车速阈值。
本实施中,在用户开始驾驶车辆并启动移动终端后,移动终端中内置的传感器将实时获取车辆的当前车速,并实时将获取到的当前车速与预设车速阈值进行大小比较,判断当前车速是否达到预设车速阈值。具体地,用户A在一驾驶场景下车速由0km/h到60km/h 递增变化,预设车速阈值为15km/h,则用户的移动终端将实时判断车辆的当前车速是否到达15km/h。
S1112:获取驾驶场景下车辆的当前车速,判断当前车速是否达到预设车速阈值,当前图像数据和当前音频数据与用户标识相关联。
本实施例中,用户在一驾驶场景驾驶过程中,当当前车速达到预设车速阈值时,用户的移动设备将会调用移动终端的摄像头和录音设备,采集该驾驶场景下的当前图像数据和当前音频数据,并且该当前图像数据和当前音频数据与用户标识相关联。具体地,用户A在一驾驶场景下车速由0km/h到60km/h递增变化,预设车速阈值为15km/h,则当用户驾驶的车速未到达15km/h时,用户的移动设备将继续获取车辆的当前车速;当用户驾驶到达车速为15km/h时,用户的移动终端将会调用移动终端的摄像头和录音设备,采集该驾驶场景下用户A的当前图像数据和当前音频数据,该当前图像数据和当前音频数据与用户A的用户标识相关联。进一步地,不同的用户如用户B和用户C在同一时刻的驾驶场景采集的当前图像数据和当前音频数据,与其用户标识相关联,即用户B采集的当前图像数据和当前音频数据与用户B的用户标识相关联,用户C采集的当前图像数据和当前音频数据与用户C的用户标识相关联。
S1113:将当前图像数据和当前音频数据存储在数据库中。
本实施例中,用户的移动设备获取当前图像数据和当前音频数据,并将该当前图像数据和当前音频数据上传到服务器中,以使服务器将获取到的当前图像数据和当前音频数据存储在MySQL、Oracle等数据库中,并使每一当前图像数据和当前音频数据与一用户标识关联存储。进一步地,在终端设备需要进行驾驶模型训练时,可从MySQL、Oracle等数据库中查询获取与用户标识相关联的当前图像数据和当前音频数据,作为训练驾驶模型的训练图像数据和训练音频数据。
S1114:在数据库中创建驾驶数据信息表,驾驶数据信息表包括至少一条驾驶数据信息;每一驾驶数据信息包括用户标识、当前图像数据在数据库中的存储地址和当前音频数据在数据库中的存储地址。
其中,驾驶数据信息表是详细记载从用户移动终端采集的当前图像数据和当前音频数据的信息表,该驾驶数据信息表包括至少一条驾驶数据信息,每一驾驶数据信息为用户在同一驾驶场景下获取的当前图像数据和当前音频数据,因此该驾驶数据信息包括用户标识、当前图像数据在数据库中的存储地址和当前音频数据在数据库中的存储地址。本实施例中,采集的数据在数据库中通过数据表存储,并与用户标识关联,可根据用户标识查询 到当前图像数据在数据库中的存储地址和当前音频数据在数据库中的存储地址,从而快捷地获得存储在数据库中的当前图像数据和当前音频数据,以使其作为训练驾驶模型所需的训练图像数据和训练音频数据。
如图3所示,步骤S11中,获取同一驾驶场景的训练图像数据和训练音频数据,包括如下步骤:
S1121:获取用户输入的模型训练指令,模型训练指令包括用户标识。
其中,模型训练指令是指用户的移动终端获取的用于驾驶模型训练所需的训练图像数据和训练音频数据指令。本实施例中,用户在其移动终端界面输入模型训练指令,移动终端界面获取模型训练指令后,将该指令传递到移动终端的后台,以待后台对指令进行处理。该模型训练指令包括用户标识,该用户标识可用于在数据库查询驾驶数据信息表。
S1122:基于用户标识查询驾驶数据信息表,判断驾驶数据信息的数量是否大于预设数量。
本实施例中,根据用户标识查询驾驶数据信息表,该驾驶数据信息表包括至少一条驾驶数据信息;每一驾驶数据信息包括用户标识、当前图像数据在数据库中的存储地址和当前音频数据在数据库中的存储地址。移动终端根据获取的用户标识,查询驾驶数据信息表中驾驶数据信息的数量,并判断查询到的驾驶数据信息数量是否大于预设数量,其中,预设数量是指提前设置好的数量阈值,该预设数量可以设为10000条。一般来说,数据不可过少,过少的数据会导致训练获取的驾驶模型识别效果差,并且驾驶模型容易过拟合;过多的数量会造成模型训练时间过长,不利于实际应用,故应取驾驶数据信息数量适中的值,即可以防止驾驶模型过拟合,又可以在预期时间内完成模型的训练,并且还能够保证驾驶模型的识别效果。
S1123:若驾驶数据的数量大于预设数量,则获取同一驾驶场景的训练图像数据和训练音频数据。
本实施例中,将在数据库中查询到的驾驶数据数量与预设数量进行比较,若驾驶数据的数量大于预设数量,表示在数据库中存储的驾驶数据信息数量已到达进行驾驶模型训练的数量,则将存储的训练图像数据和训练音频数据输出,以进行驾驶模型训练。其中,驾驶数据的数量包含有训练图像数据和训练音频数据,该训练图像数据和训练音频数据是在同一驾驶场景下获取的,故存储在数据表中的训练图像数据和训练音频数据的关系是1:1,且驾驶数据的数量包含有训练图像数据和训练音频数据,只要二者其中一数量大于预设数量,另一训练数据也会大于预设数量,可以进行驾驶模型训练,即可同时训练人脸识别模 型和音频识别模型。其中,驾驶模型训练包括人脸识别模型训练和音频识别模型训练,用户移动终端获取的训练图像数据和训练音频数据分别用于训练人脸识别模型和音频识别模型。
S12:采用训练图像数据对卷积神经网络模型进行训练,获取人脸识别模型。
其中,卷积神经网络(Convolutional Neural Network,简称CNN)模型,是一种前馈神经网络,它的人工神经元可以响应一部分覆盖范围内的周围单元,常应用于大型图像的处理。卷积神经网络通常包括至少两个非线性可训练的卷积层,至少两个非线性的池化层和至少一个全连接层,即包括至少五个隐含层,此外还包括输入层和输出层。将训练图像数据输入卷积神经网络,卷积神经网络的卷积层对训练图像数据进行卷积计算,根据设置的过滤器(Filter)数量获得对应数量的特征图(Feature Map)。将获得的特征图在池化层进行下采样计算,获得池化后的特征图。其中,下采样计算的目的是去掉特征图中不重要的样本,进一步减少参数数量。下采样计算的方法很多,其中最常用的是最大池化,最大池化实际上就是在n*n的样本中取最大值,作为采样后的样本值。除了最大池化之外,常用的还有平均池化,即取在n*n的样本中取各样本的平均值,本实施例采用最大池化的下采样计算方法。其中,卷积层和池化层是成对出现的,即在卷积层进行卷积计算后紧跟着在池化层对卷积计算获取的特征图进行下采样计算。之后经过多轮卷积-池化处理的特征图将经过至少一个全连接层和在网络模型中最后的一层输出层。此时输出层和普通的全连接层唯一的区别是,激活函数是softmax函数,而全连接层的激活函数一般为sigmoid。通过计算各层的输出对卷积神经网络模型各层进行误差计算和梯度反传更新,获取更新后的各层的权值,基于更新后的各层的权值,获取人脸识别模型。通过卷积神经网络模型训练获得的人脸识别模型,可以更为准确地对用户的人脸进行识别,为确定是否用户本人驾驶提供了保证。
如图4所示,步骤S12中,采用训练图像数据对卷积神经网络模型进行训练,获取人脸识别模型,具体包括如下步骤:
S121:初始化卷积神经网络模型。
具体地,初始化卷积神经网络主要是初始化卷积层的卷积核(即权值)和偏置。卷积神经网络模型的权值初始化就是指给卷积神经网络模型中的所有权值赋予一个初始值。如果初始权值处在误差曲面的一个相对平缓的区域时,卷积神经网络模型训练的收敛速度可能会异常缓慢。一般情况下,网络的权值被初始化在一个具有0均值的相对小的区间内均匀分布,比如[-0.30,+0.30]这样的区间内。
S122:在卷积神经网络模型中输入训练图像数据,计算卷积神经网络模型各层的输出。
本实施例中，在卷积神经网络模型中输入训练图像数据，计算卷积神经网络模型各层的输出，各层的输出采用前向传播算法可获取。其中，不同于全连接的神经网络模型，对于局部连接的卷积神经网络模型还需计算模型中卷积层的每一种输出的特征图和池化层的每一种输出的特征图，以对权值进行更新。具体地，对于卷积层的每一种输出的特征图有 $x_j^l = f\Big(\sum_{i\in M_j} x_i^{l-1} * k_{ij}^l + b_j^l\Big)$，其中，$l$ 是当前层，$M_j$ 表示选择的输入特征图组合，$x_i^{l-1}$ 是输入的第 $i$ 种特征图即 $l-1$ 层的输出，$k_{ij}^l$ 是 $l$ 层输入的第 $i$ 种特征图和输出的第 $j$ 种特征图之间连接所用的卷积核，$b_j^l$ 是第 $j$ 种特征图 $l$ 层对应的加性偏置，$f$ 是激活函数，该激活函数可以是sigmoid激活函数。此外，对于池化层的每一种输出的特征图 $x_j^l$ 有 $x_j^l = f\big(\beta_j^l\,\mathrm{down}(x_j^{l-1}) + b_j^l\big)$，其中，down表示下采样计算，这里的 $\beta_j^l$ 是第 $j$ 种特征图 $l$ 层对应的乘性偏置，$b_j^l$ 是第 $j$ 种特征图 $l$ 层对应的加性偏置。本实施例主要给出卷积神经网络模型中区别与一般全连接的神经网络模型的卷积层和池化层输出，其余各层的输出与一般全连接的神经网络模型计算相同，采用前向传播算法可获取，故不一一举例，以免累赘。
S123:根据各层的输出对卷积神经网络模型各层进行误差反传更新,获取更新后的各层的权值。
步骤S122中,获得的预测值与真实值之间必然存在误差,需要将这个误差信息逐层回传给每一层,让每一层更新它们的权值,才能获得识别效果更好的人脸识别模型。本实施例中,根据各层的输出对卷积神经网络模型各层进行误差反传更新,获取更新后的各层的权值,具体包括计算卷积神经网络模型每一层的误差信息,并用梯度下降法更新每一层的权值。其中,梯度下降法更新权值主要是利用误差代价函数对参数的梯度,所以权值更新的目标就是让每一层得到这样的梯度,然后更新。
在一具体实施方式中，步骤S123具体包括如下步骤：根据第n个误差代价函数的表达式 $E^n = \frac{1}{2}\sum_{k=1}^{c}\left(t_k^n - y_k^n\right)^2$，其中n为单个训练样本，在卷积神经网络模型中的目标输出为 $(t_1, t_2, t_3, \ldots, t_k)$，用 $\mathbf{t}^n$ 表示，$\mathbf{y}^n$ 为实际输出，c为实际输出的维度。为了求取单个样本的误差代价函数对参数的偏导，这里定义灵敏度δ为误差对输出的变化率，灵敏度的表达式为 $\delta = \frac{\partial E}{\partial u}$，其中E为误差代价函数，u为 $u^l = W^l x^{l-1} + b^l$，l表示当前第l层，$W^l$ 表示该层的权值，$x^{l-1}$ 表示该层的输入，$b^l$ 表示该层的加性偏置。通过计算灵敏度层层回传误差信息即可实现反向传播，其中反向传播的过程是指对卷积神经网络模型各层进行误差反传更新，获取更新后的各层的权值的过程。则有卷积层第l层的灵敏度为 $\delta_j^l = \beta_j^{l+1}\left(f'(u_j^l)\circ \mathrm{up}\left(\delta_j^{l+1}\right)\right)$，其中，$\circ$ 表示每个元素相乘，因为每个神经元连接都会有一个灵敏度δ，所以每一层的灵敏度是一个矩阵，l+1层是指池化层，其运算的本质相当于也是做卷积运算，例如做特征图大小为2的下采样操作，就是用2*2的每个值为1/4的卷积核卷积图像，所以这里的权值W实际上就是这个2*2的卷积核，它的值即为βj。up表示上采样计算，上采样计算是与下采样计算相对的计算，在做下采样计算时采样因子是n，则上采样计算即将每个像素分别在垂直与水平方向上复制n倍。由于l+1池化层的灵敏度矩阵是l层灵敏度矩阵的尺寸的1/4，所以需对l+1层的灵敏度矩阵做上采样计算，使它们尺寸一致。根据获得的灵敏度，计算误差代价函数对加性偏置b的偏导为 $\frac{\partial E}{\partial b_j} = \sum_{u,v}\left(\delta_j^l\right)_{uv}$，即对层l中的灵敏度中所有节点求和，其中(u,v)代表灵敏度矩阵中的元素位置。乘性偏置β与前向传播中当前层的池化层相关，因此先定义 $d_j^l = \mathrm{down}\left(x_j^{l-1}\right)$，则计算误差代价函数对乘性偏置β的偏导为 $\frac{\partial E}{\partial \beta_j} = \sum_{u,v}\left(\delta_j^l \circ d_j^l\right)_{uv}$。之后计算误差代价函数对卷积核k的偏导 $\frac{\partial E}{\partial k_{ij}^l} = \sum_{u,v}\left(\delta_j^l\right)_{uv}\left(p_i^{l-1}\right)_{uv}$，这里 $\left(p_i^{l-1}\right)_{uv}$ 是在做卷积时，与kij做卷积的每一个特征图中的小块，(u,v)是指小块中心，输出特征图中(u,v)位置的值，是由输入特征图中(u,v)位置的小块和卷积核kij卷积所得的值。根据以上公式的运算，可以获得更新后的卷积神经网络模型卷积层的权值。在卷积神经网络模型的训练过程中，还应对池化层进行更新，对于池化层的每一种输出的特征图有 $x_j^l = f\left(\beta_j^l\,\mathrm{down}\left(x_j^{l-1}\right) + b_j^l\right)$，其中，down表示下采样，这里的β是乘性偏置，b是加性偏置。卷积神经网络模型中池化层灵敏度的计算公式为 $\delta_j^l = f'(u_j^l)\circ \mathrm{conv2}\left(\delta_j^{l+1}, \mathrm{rot180}\left(k_j^{l+1}\right), \mathrm{'full'}\right)$，并且根据δ可求得误差代价函数对加性偏置b的偏导为 $\frac{\partial E}{\partial b_j} = \sum_{u,v}\left(\delta_j^l\right)_{uv}$，其中conv2、rot180和full为计算所需的函数，以上公式的其余参数与上述卷积层公式提及的参数含义相同，在此不再详述。根据上述公式，可获取更新后的池化层权值，此外还应对卷积神经网络模型的其他各层（如全连接层）间权值进行更新，该更新过程与一般的全连接神经网络模型的权值更新方法相同，采用后向传播算法更新权值，为避免累赘，在此不一一进行详述。通过对卷积神经网络模型各层进行误差反传更新，获取更新后的各层的权值。
S124:基于更新后的各层的权值,获取人脸识别模型。
本实施例中,将获取的更新后的各层的权值,应用到卷积神经网络模型中即可获取训练后的人脸识别模型。进一步地,该人脸识别模型中各层之间的权值反映了图像中各部分模块与其相邻模块间的潜在关系,实现了对图片信息的有效抓取和识别效果。在人脸识别模型最终会输出一概率值,该概率值表示待识别图像数据在通过人脸识别模型处理后与该目标驾驶模型的贴近程度。该模型可广泛应用于驾驶人识别,以达到准确识别是否目标用户本人驾驶的目的。
S13:采用训练音频数据对卷积神经网络模型进行训练,获取音频识别模型。
本实施例中,采用训练音频数据对卷积神经网络模型进行训练,需先对音频数据进行处理,把获取的抽象的音频数据转化为训练声谱图。训练声谱图输入卷积神经网络,卷积神经网络的卷积层对声谱图进行卷积计算,根据设置的过滤器(Filter)数量获得对应数量的特征图(Feature Map)。将获得的特征图在池化层进行下采样计算,获得池化后的特征图。其中,卷积层和池化层是成对出现的,即在卷积层进行卷积计算后紧跟着在池化层对卷积计算获取的特征图进行下采样计算。之后经过多轮卷积-池化处理的特征图将经过至少一个全连接层和在网络模型中最后的一层输出层。通过计算各层的输出对卷积神经网络模型各层进行误差计算及梯度反传更新,获取更新后的各层的权值,基于更新后的各层的权值,获取音频识别模型。通过卷积神经网络模型训练获得的音频识别模型,可以更为准确地对用户进行识别,为确定是否用户本人驾驶提供了保证。
如图5所示,步骤S13中,采用训练音频数据对卷积神经网络模型进行训练,获取音频识别模型,具体包括如下步骤:
S131:初始化卷积神经网络模型。
本实施例中,与训练人脸识别模型的步骤类似,需对卷积神经网络模型进行初始化操作。卷积神经网络的初始化主要是初始化卷积层的卷积核(即权值)和偏置。网络权值初始化就是将网络中的所有权值赋予一个初始值。在训练音频识别模型过程中,卷积神经网络模型的初始值设置可以与训练人脸识别模型不同,比如在[-0.20,+0.20]这样的区间内。
S132:基于训练音频数据获取对应的训练声谱图;
本实施例中,直接获取的训练音频数据不能直接输入CNN模型进行音频识别模型训练,需先基于训练音频数据获取训练声谱图。步骤S132具体包括如下步骤:首先,将训练音频数据分割成很短的帧,这些帧可以为几百毫秒;并且为了确保信息的连续性和准确性,相邻的帧间还应存在重叠部分。以上对应的概念分别是帧长和步长。帧长为一帧的时长,步长为一帧的起点与下一帧起点的间隔时长。由于相邻帧之前需具有一定的重叠,因此步长通常小于帧长。因为帧长一般值很小,因此可认为在该短的时间域内,其基频和谐波及他们的强度均为定值。然后,将每个帧做短时傅里叶变换,获取对应的频谱信息。其中,短时傅里叶变换(Short-Time Fourier Transform,简称STFT)是和傅里叶变换相关的一种数学变换,用以确定时变信号其局部区域正弦波的频率与相位。频谱信息包括每帧的频率及其强度情况,采用颜色或灰度表示其强度,即可获取训练声谱图。
S133:在卷积神经网络模型输入训练声谱图,计算卷积神经网络模型各层的输出。
本实施例中，将训练声谱图输入到卷积神经网络进行训练，并计算卷积神经网络模型各层的输出，各层的输出采用前向传播算法可获取。其中，不同于全连接的神经网络模型，对于局部连接的卷积神经网络模型还需计算模型中卷积层的每一种输出的特征图和池化层的每一种输出的特征图，以对权值进行更新。具体地，对于卷积层的每一种输出的特征图 $x_j^l$ 有 $x_j^l = f\Big(\sum_{i\in M_j} x_i^{l-1} * k_{ij}^l + b_j^l\Big)$，其中，$l$ 是当前层，$M_j$ 表示选择的输入特征图组合，$x_i^{l-1}$ 是输入的第 $i$ 种特征图即 $l-1$ 层的输出，$k_{ij}^l$ 是 $l$ 层输入的第 $i$ 种特征图和输出的第 $j$ 种特征图之间连接所用的卷积核，$b_j^l$ 是第 $j$ 种特征图 $l$ 层对应的加性偏置，$f$ 是激活函数，该激活函数可以是sigmoid激活函数。此外，池化层的每一种输出的特征图的计算与步骤S122相同，在此不再重复叙述。本实施例主要给出卷积神经网络模型中区别与一般全连接的神经网络模型的卷积层和池化层输出，其余各层的输出与一般全连接的神经网络模型计算相同，采用前向传播算法可获取，故不一一举例，以免累赘。
S134:根据各层的输出对卷积神经网络模型各层进行误差反传更新,获取更新后的各层的权值。
本实施例中,根据各层的输出对卷积神经网络模型各层进行误差反传更新,获取更新后的各层的权值包括计算卷积神经网络模型每一层的误差信息,并用梯度下降法更新每一层的权值。其中,梯度下降法更新权值主要是利用误差代价函数对参数的梯度,所以权值更新的目标就是让每一层得到这样的梯度,然后更新。具体权值及公式实现请参考步骤S123,为避免赘述,本实施例将不再展开详述。
S135:基于更新后的各层的权值,获取音频识别模型。
本实施例中,将通过训练声谱图训练获取的各层更新后权值,应用到卷积神经网络模型中即可获取训练后的音频识别模型。进一步地,该音频识别模型中各层之间的权值反映了声谱图中各部分模块与其相邻模块间的潜在关系,也间接反映了训练音频数据与用户的相关度。在音频识别模型最终会输出一概率值,该概率值表示待识别音频数据在通过驾驶模型处理后与该音频识别模型的贴近程度。该模型可广泛应用于驾驶人识别,以达到准确识别是否目标用户本人驾驶的目的。
S14:采用训练图像数据和音频图像数据对人脸识别模型和音频识别模型进行一致性验证,将通过验证的人脸识别模型和音频识别模型与用户标识关联存储。
具体地,采用训练图像数据和音频图像数据对人脸识别模型和音频识别模型进行一致性验证,是指验证采用人脸识别模型和音频识别模型对同一驾驶场景下的训练图像数据和训练音频数据进行识别,两者的识别结果同时指向是目标用户开车或者不是目标用户开车时,则认定人脸识别模型和音频识别模型对同一驾驶场景下的训练图像数据和训练音频数据识别具有一致性。基于多个驾驶场景下的训练图像数据和训练音频数据进行一致性验证,并统计验证结果,即统计符合一致性的数量和不符合一致性的数量;再根据统计的验证结果计算符合一致性的验证概率,并判断该验证概率是否大于预设概率,若验证概率大于预设概率,则认定该人脸识别模型和音频识别模型通过验证,将通过验证的人脸识别模型和音频识别模型与用户标识关联存储。将通过一致性验证的人脸识别模型和音频识别模型进行关联存储,可有利于保障人脸识别模型和音频识别模型识别的准确性。
具体地,采用人脸识别模型和音频识别模型对同一驾驶场景下的训练图像数据和训练音频数据进行一致性验证,是指将同一驾驶场景的训练图像数据输入人脸识别模型进行识别,获取第一识别结果;并将同一驾驶场景的训练音频数据输入音频识别模型进行识别,获取第二识别结果;判断第一识别结果和第二识别结果是否一致;若第一识别结果和第二识别结果一致,则认定符合一致性。可以理解地,第一识别结果和第二识别结果均可采用概率值,若该概率值大于50%,认定识别结果为是目标用户本人开车;若该概率值小于50%,认定识别结果为不是目标用户本人开车;只有在第一识别结果和第二识别结果同时大于50%或者同时小于50%,才认定符合一致性。
其中,与用户标识关联存储是指根据同一用户的用户标识进行存储,该存储依赖于用户标识,使得人脸识别模型和音频识别模型通过同一用户标识关联起来。本实施例中,将训练获得的人脸识别模型和音频识别模型与用户标识关联存储,即将带有相同用户标识的 人脸识别模型和音频识别模型存储在数据库中,并在数据库中创建模型信息表,模型信息表包括用户标识和与用户标识相对应的人脸识别模型和音频识别模型在数据库中的存储地址。将人脸识别模型和音频识别模型根据用户标识进行关联存储,两者共同形成一个整体的驾驶模型,以便于在利用驾驶模型进行识别时,同时调用整体驾驶模型中的人脸识别模型和音频识别模型,实现对图像数据和音频数据的识别,使得驾驶模型实现了从两个重要维度识别是否用户本人开车,充分利用了图像维度和音频维度间的潜在联系,使得识别结果更加贴近实际驾驶情况,提高了识别的准确度。
本实施例中,首先获取同一驾驶场景的训练图像数据和训练音频数据,训练图像数据和训练音频数据与用户标识相关联,以保证获取的数据为同一用户同一时刻的驾驶行为产生的,并且通过用户标识便于获取进行驾驶模型训练所需的训练图像数据和训练音频数据,以保证训练获得的驾驶模型能够通过人脸识别和音频识别确定是否用户本人驾驶。接着采用训练图像数据对卷积神经网络模型进行训练,获取人脸识别模型,通过卷积神经网络模型训练获得的人脸识别模型,反映了图像中各部分模块与其相邻模块间的潜在关系,实现了对图片信息的有效抓取和识别效果,能够更为准确地对用户进行识别,为确定是否用户本人驾驶提供了保证。然后采用训练音频数据对卷积神经网络模型进行训练,获取音频识别模型,通过卷积神经网络模型训练获得的音频识别模型,反映了声谱图中各部分模块与其相邻模块间的潜在关系,也间接反映了训练音频数据与用户的相关度,能够实现精准识别,为确定是否用户本人驾驶提供了保证。最后采用训练图像数据和音频图像数据对人脸识别模型和音频识别模型进行一致性验证,将通过验证的人脸识别模型和音频识别模型与用户标识关联存储,该关联存储可以将同一用户的用户标识把人脸识别模型和音频识别模型通过数据库建立模型信息表进行关联存储。将通过一致性验证的上述两种识别模型存储在与同一用户标识关联的数据库中,以便对同一驾驶场景下获取的待识别图像数据和待识别人脸数据分别进行人脸识别和音频识别,有效保证驾驶模型识别的准确率。
应理解,上述实施例中各步骤的序号的大小并不意味着执行顺序的先后,各过程的执行顺序应以其功能和内在逻辑确定,而不应对本申请实施例的实施过程构成任何限定。
实施例2
图6示出与实施例1中驾驶模型训练方法一一对应的驾驶模型训练装置的原理框图。如图6所示,该驾驶模型训练装置包括训练数据获取模块11、人脸识别模型获取模块12、音频识别模型获取模块13和关联存储模块14。其中,训练数据获取模块11、人脸识别模型获取模块12、音频识别模型获取模块13和关联存储模块14的实现功能与实施例中驾驶 模型训练方法对应的步骤一一对应,为避免赘述,本实施例不一一详述。
训练数据获取模块11,用于获取同一驾驶场景的训练图像数据和训练音频数据。
人脸识别模型获取模块12,用于采用训练图像数据对卷积神经网络模型进行训练,获取人脸识别模型。
音频识别模型获取模块13,用于采用训练音频数据对卷积神经网络模型进行训练,获取音频识别模型。
关联存储模块14,用于采用训练图像数据和音频图像数据对人脸识别模型和音频识别模型进行一致性验证,将通过验证的人脸识别模型和音频识别模型与用户标识关联存储。
优选地,训练数据获取模块11包括训练指令获取单元111、信息表查询单元112和训练数据获取单元113。
训练指令获取单元111,用于获取用户输入的模型训练指令,模型训练指令包括用户标识。
信息表查询单元112,用于基于用户标识查询驾驶数据信息表,判断驾驶数据信息的数量是否大于预设数量。
训练数据获取单元113,用于若驾驶数据的数量大于预设数量,则获取同一驾驶场景的训练图像数据和训练音频数据。
优选地,人脸识别模型获取模块12包括第一模型初始化单元121、第一模型层输出单元122、第一权值更新单元123和人脸识别模型获取单元124。
第一模型初始化单元121,用于初始化卷积神经网络模型。
第一模型层输出单元122,用于在卷积神经网络模型中输入训练图像数据,计算卷积神经网络模型各层的输出。
第一权值更新单元123,用于根据各层的输出对卷积神经网络模型各层进行误差反传更新,获取更新后的各层的权值。
人脸识别模型获取单元124,用于基于更新后的各层的权值,获取人脸识别模型。
优选地,音频识别模型获取模块13包括第二模型初始化单元131、训练声谱图获取单元132、第二模型层输出单元133、第二权值更新单元134和音频识别模型获取单元135。
第二模型初始化单元131,用于初始化卷积神经网络模型。
训练声谱图获取单元132,用于对训练音频数据进行特征提取,获取对应的训练声谱图。
第二模型层输出单元133,用于在卷积神经网络模型输入训练声谱图,计算卷积神经 网络模型各层的输出。
第二权值更新单元134,用于根据各层的输出对卷积神经网络模型各层进行误差反传更新,获取更新后的各层的权值。
音频识别模型获取单元135,用于基于更新后的各层的权值,获取音频识别模型。
本实施例所提供的驾驶模型训练装置中,训练数据获取模块11用于获取同一驾驶场景的训练图像数据和训练音频数据,将同一场景下相关联的训练图像数据和训练音频数据同时进行采集,使得采集获取的训练数据具有潜在的相关性,有效利用了图像维度和声音维度各自的特性,使得训练获得的驾驶模型(包括人脸识别模型部分和音频识别部分的驾驶模型)进行识别时更贴近实际场景,驾驶提高了驾驶模型的识别精度。人脸识别模型获取模块12用于采用训练图像数据对卷积神经网络模型进行训练,获取人脸识别模型,通过采用训练图像数据训练卷积神经网络,使得训练获取的人脸识别模型中权值具有与用户标识相关联的训练图像数据的特征,使得人脸识别模型借助更新后的权值能够进行更精准的识别,提高了人脸识别模型的识别效果。音频识别模型获取模块13用于采用训练音频数据对卷积神经网络模型进行训练,与人脸识别模型的训练相似,采用训练音频数据对卷积神经网络进行训练,更新网络的权值,使得训练获取的音频识别模型具有与用户标识相关联的训练音频数据的特征,提高了人脸识别模型的识别效果。获取音频识别模型。关联存储模块14用于采用训练图像数据和音频图像数据对人脸识别模型和音频识别模型进行一致性验证,将通过验证的人脸识别模型和音频识别模型与用户标识关联存储,在同一场景下获取的训练图像数据和训练音频数据存在潜在的联系,基于用户标识将两个模型进行关联存储,以作为一个整体的驾驶模型,提高了是否用户本人驾驶的识别效果。
实施例3
图7示出本实施例中驾驶人识别方法的一流程图。该驾驶人识别方法可应用在保险机构或者其他机构的终端设备上,以便对驾驶人驾驶行为进行识别,达到智能识别的效果。如图7所示,该驾驶人识别方法包括如下步骤:
S21:获取用户同一驾驶场景的待识别图像数据和待识别音频数据,待识别图像数据和待识别音频数据与用户标识相关联。
其中,待识别图像数据和待识别音频数据是指用户在实际驾驶过程中通过移动终端的摄像头和录音设备分别采集的实时图像数据和音频数据,该数据用于进行模型识别,以判断是否用户本人开车。本实施例中,用户的移动终端根据用户的驾驶情况获取实时的待识别图像数据和待识别音频数据,且待识别图像数据和待识别音频数据与用户标识相关联。 该待识别数据是用户在同一驾驶场景下获取的,即用户的移动终端获取同一时刻中用户在驾驶时的待识别图像数据和待识别音频数据。
S22:基于用户标识查询数据库,调用与用户标识相对应的人脸识别模型和音频识别模型。
其中,该人脸识别模型和音频识别模型是采用实施例1中的驾驶模型训练方法获取的模型,具体是与用户标识关联存储并通过一致性验证的人脸识别模型和音频识别模型。本实施例中,根据待识别图像数据所携带的用户标识,在数据库中查找该用户标识相对应的模型信息表中与用户标识相对应的人脸识别模型在数据库中的存储地址,并根据该存储地址,调用与用户标识相对应的人脸识别模型。可以理解地,即通过用户的移动终端上的摄像头和录音设备分别采集的待识别图像数据和待识别音频数据均携带有用户标识;而存储在数据库中的模型信息表也包括用户标识,且模型信息表中包括与用户标识相对应的人脸识别模型和音频识别模型在数据库中的存储地址,即通过用户标识查询模型信息表,再根据表中的人脸识别模型存储地址调用存储在数据库中的人脸识别模型;并根据表中的音频识别模型存储地址调用存储在数据库中的音频识别模型。
S23:基于待识别图像数据和人脸识别模型获取第一概率。
本实施例中,根据调用的人脸识别模型,将获取的待识别图像数据在人脸识别模型中进行运算处理,使人脸识别模型输出一概率值,该概率值称为第一概率,以区别于音频识别模型识别获取的概率值。
S24:基于待识别音频数据和音频识别模型获取第二概率。
本实施例中,根据调用的音频识别模型,将获取的待识别音频数据在音频识别模型中进行运算处理,最终在音频识别模型输出一概率值,该概率值称为第二概率,以区别于人脸识别模型识别获取的第一概率值。
S25:基于第一概率和第二概率确定用户本人开车的最终概率。
本实施例中,将待识别图像数据在人脸识别模型处理识别获取的第一概率值和待识别音频数据在音频识别模型处理识别获取的第二概率值进行数值处理,以获取用于最终判断是否用户本人开车的最终概率。可以理解地,该最终概率反映的是图像和音频维度上的关系,可以有效消除单一维度上识别的不足,使得模型的识别效果更精确。
在一具体实施方式中,如图8所示,步骤S25中,基于第一概率和第二概率确定用户本人开车的最终概率,具体包括如下步骤:
S251:获取第一概率和第二概率的概率差值。
本实施例中,将获取的第一概率和第二概率以数值大的概率值减去数值小的概率值,获取第一概率和第二概率的概率差值。可以理解地,该概率差值是采用人脸识别模型和音频识别模型各自获取的概率值的误差值,其反映了不同维度进行识别的差异,通过对第一概率和第二概率的概率差值的计算,可以利用该差值进一步减小识别过程的误差,使得识别效果更为准确。
S252:判断概率差值是否大于第一预设阈值。
其中,第一预设阈值是指预先设置好的与概率差值进行比较的阈值。本实施例中,将获取的概率差值与第一预设阈值进行比较,判断概率差值是否大于第一预设阈值。
S253:若概率差值大于第一预设阈值,则选取第一概率和第二概率中较大值作为最终概率输出。
本实施例中,在获取概率差值与第一预设阈值的差值后,若概率差值大于第一预设阈值,则将第一概率和第二概率中较大值作为最终概率输出。具体地,如第一概率值为92%,第二概率值为98%,第一预设阈值为5%,则可计算得概率差值为6%,该概率差值大于第一预设阈值,则选取第一概率和第二概率中较大值98%作为最终概率输出。可以理解地,第一预设阈值是用于减少识别过程中的误差,当偶尔出现数据异常进行识别时,可以有效去除不真实的数据,一般而言,这种情况会比较少出现。
S254:若概率差值不大于第一预设阈值,则选取第一概率和第二概率的均值作为最终概率输出。
S26:若最终概率大于第二预设阈值,则确定为用户本人开车。
其中,第二预设阈值是指用于和最终概率进行比较的预先设置好的阈值。本实施例中,将获取的最终概率与第二预设阈值进行比较,若最终概率大于第二预设阈值,则确定为用户本人开车,进一步地,若最终概率不大于第二预设阈值,则确定不是用户本人开车。具体地,获取的最终概率可以是98%,第二预设阈值是95%,则可以确定为用户本人开车。
本实施例中,在获取概率差值与第一预设阈值的差值后,若概率差值不大于第一预设阈值,则选取第一概率和第二概率的均值作为最终概率输出。具体地,如第一概率值为97%,第二概率值为99%,第一预设阈值为5%,则可计算得概率差值为2%,该概率差值不大于第一预设阈值,则选取第一概率和第二概率的均值98%作为最终概率输出。可以理解地,在进行图像维度和音频维度的识别获得的第一概率值和第二概率值,当概率差值不大于第一预设阈值时,取均值可以使得获得的数据更加准确,进行识别的结果也会更加贴近实际情况,使得模型识别的结果更为准确。
本实施例中,基于待识别图像数据和人脸识别模型获取第一概率,基于待识别音频数据和音频识别模型获取第二概率,根据第一概率和第二概率确定用户本人开车的最终概率,根据获得得最终概率与第一预设阈值进行比较,可以减少识别过程中的误差,当偶尔出现数据异常进行识别时,可以有效去除不真实的数据且使得识别的结果会更加贴近实际情况,获得更好的识别结果,最后根据判断最终概率是否大于第二预设阈值以确定是否为用户本人开车,使得驾驶人识别结果更精确可靠。
实施例4
图9示出与实施例3中驾驶人识别方法一一对应的驾驶人识别装置的原理框图。如图9所示,该驾驶人识别装置包括待识别数据获取模块21、识别模型调用模块22、第一概率获取模块23、第二概率获取模块24、最终概率获取模块25和确认结果获取模块26。其中,待识别数据获取模块21、识别模型调用模块22、第一概率获取模块23、第二概率获取模块24、最终概率获取模块25和确认结果获取模块26的实现功能与实施例中驾驶人识别方法对应的步骤一一对应,为避免赘述,本实施例不一一详述。
待识别数据获取模块21,用于获取用户同一驾驶场景的待识别图像数据和待识别音频数据,待识别图像数据和待识别音频数据与用户标识相关联。
识别模型调用模块22,用于基于用户标识查询数据库,调用与用户标识相对应的人脸识别模型和音频识别模型,人脸识别模型和音频识别模型是采用实施例1中驾驶模型训练方法获取的模型。
第一概率获取模块23,用于基于待识别图像数据和人脸识别模型获取第一概率。
第二概率获取模块24,用于基于待识别音频数据和音频识别模型获取第二概率。
最终概率获取模块25,用于基于第一概率和第二概率确定用户本人开车的最终概率。
优选地,最终概率获取模块25包括概率差值获取单元251、概率差值判断单元252、第一最终概率输出单元253和第二最终概率输出单元254。
概率差值获取单元251,用于获取第一概率和第二概率的概率差值。
概率差值判断单元252,用于判断概率差值是否大于第一预设阈值。
第一最终概率输出单元253,用于若概率差值大于第一预设阈值,则选取第一概率和第二概率中较大值作为最终概率输出。
第二最终概率输出单元254,用于若概率差值不大于第一预设阈值,则选取第一概率和第二概率的均值作为最终概率输出。
确认结果获取模块26,用于若最终概率大于第二预设阈值,则确定为用户本人开车。
本实施例所提供的驾驶人识别方法装置中,模块21-模块26实现了通过获取第一概率和第二概率并以数值大的概率值减去数值小的概率值,获取第一概率和第二概率的概率差值。该概率差值是采用人脸识别模型和音频识别模型各自获取的概率值的误差值,其反映了不同维度进行识别的差异,通过对第一概率和第二概率的概率差值的计算,通过设置第一预设阈值和第二预设阈值有效实现了对异常数据的控制和数值取值的合理性操作,使得获取的概率值更贴近实际情况,进一步减来小识别过程的误差,使得识别效果更为准确。
实施例5
本实施例提供一计算机可读介质,该计算机可读介质上存储有计算机可读指令,该计算机可读指令被处理器执行时实现实施例1中驾驶模型训练方法,为避免重复,这里不再赘述。或者,该计算机可读指令被处理器执行时实现实施例2中驾驶模型训练装置的各模块/单元的功能,为避免重复,这里不再赘述。或者,该计算机可读指令被处理器执行时实现实施例3中驾驶人识别方法中各步骤的功能,为避免重复,此处不一一赘述。或者,该计算机可读指令被处理器执行时实现实施例4中驾驶人识别装置中各模块/单元的功能,为避免重复,此处不一一赘述。
实施例6
图10是本申请一实施例提供的终端设备的一示意图。如图10所示,该实施例的终端设备100包括:处理器101、存储器102以及存储在存储器102中并可在处理器101上运行的计算机可读指令103,该计算机可读指令被处理器101执行时实现实施例1中的驾驶模型训练方法,为避免重复,此处不一一赘述。或者,该计算机可读指令被处理器101执行时实现实施例2中驾驶模型训练装置中各模型/单元的功能,为避免重复,此处不一一赘述。或者,该计算机可读指令被处理器101执行时实现实施例3中驾驶人识别方法中各步骤的功能,为避免重复,此处不一一赘述。或者,该计算机可读指令被处理器101执行时实现实施例4中驾驶人识别装置中各模块/单元的功能。为避免重复,此处不一一赘述。
示例性的,计算机可读指令103可以被分割成一个或多个模块/单元,一个或者多个模块/单元被存储在存储器102中,并由处理器101执行,以完成本申请。一个或多个模块/单元可以是能够完成特定功能的一系列计算机可读指令段,该指令段用于描述计算机可读指令103在终端设备100中的执行过程。例如,计算机可读指令100可以被分割成实施例2中的训练数据获取模块11、人脸识别模型获取模块12、音频识别模型获取模块13和关联存储模块14,或者实施例4中的待识别数据获取模块21、识别模型调用模块22、第一概率获取模块23、第二概率获取模块24、最终概率获取模块25和确认结果获取模块 26,各模块的具体功能如实施例2或实施例4所述,在此不一一赘述。
终端设备100可以是桌上型计算机、笔记本、掌上电脑及云端服务器等计算设备。终端设备可包括,但不仅限于,处理器101、存储器102。本领域技术人员可以理解,图10仅仅是终端设备100的示例,并不构成对终端设备100的限定,可以包括比图示更多或更少的部件,或者组合某些部件,或者不同的部件,例如终端设备还可以包括输入输出设备、网络接入设备、总线等。
所称处理器101可以是中央处理单元(Central Processing Unit,CPU),还可以是其他通用处理器、数字信号处理器(Digital Signal Processor,DSP)、专用集成电路(Application Specific Integrated Circuit,ASIC)、现成可编程门阵列(Field-Programmable Gate Array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。
存储器102可以是终端设备100的内部存储单元,例如终端设备100的硬盘或内存。存储器102也可以是终端设备100的外部存储设备,例如终端设备100上配备的插接式硬盘,智能存储卡(Smart Media Card,SMC),安全数字(Secure Digital,SD)卡,闪存卡(Flash Card)等。进一步地,存储器102还可以既包括终端设备100的内部存储单元也包括外部存储设备。存储器102用于存储计算机可读指令以及终端设备所需的其他程序和数据。存储器102还可以用于暂时地存储已经输出或者将要输出的数据。
所属领域的技术人员可以清楚地了解到,为了描述的方便和简洁,仅以上述各功能单元、模块的划分进行举例说明,实际应用中,可以根据需要而将上述功能分配由不同的功能单元、模块完成,即将所述装置的内部结构划分成不同的功能单元或模块,以完成以上描述的全部或者部分功能。
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能单元的形式实现。
所述集成的模块/单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取介质中。基于这样的理解,本申请实现上述实施例方法中的全部或部分流程,也可以通过计算机可读指令来指令相关的硬件来完成,所述的计算机可读指令可存储于一计算机可读介质中,该计算机可读指令在被处理器执行时,可实现上述各个方法实施例的步骤。其中,所述计算机可读指令包括计算机可读指令代码,所述 计算机可读指令代码可以为源代码形式、对象代码形式、可执行文件或某些中间形式等。所述计算机可读介质可以包括:能够携带所述计算机可读指令代码的任何实体或装置、记录介质、U盘、移动硬盘、磁碟、光盘、计算机存储器、只读存储器(ROM,Read-Only Memory)、随机存取存储器(RAM,Random Access Memory)、电载波信号、电信信号以及软件分发介质等。需要说明的是,所述计算机可读介质包含的内容可以根据司法管辖区内立法和专利实践的要求进行适当的增减,例如在某些司法管辖区,根据立法和专利实践,计算机可读介质不包括是电载波信号和电信信号。
以上所述实施例仅用以说明本申请的技术方案,而非对其限制;尽管参照前述实施例对本申请进行了详细的说明,本领域的普通技术人员应当理解:其依然可以对前述各实施例所记载的技术方案进行修改,或者对其中部分技术特征进行等同替换;而这些修改或者替换,并不使相应技术方案的本质脱离本申请各实施例技术方案的精神和范围,均应包含在本申请的保护范围之内。

Claims (20)

  1. 一种驾驶模型训练方法,其特征在于,包括:
    获取同一驾驶场景的训练图像数据和训练音频数据,所述训练图像数据和所述训练音频数据与用户标识相关联;
    采用所述训练图像数据对卷积神经网络模型进行训练,获取人脸识别模型;
    采用所述训练音频数据对所述卷积神经网络模型进行训练,获取音频识别模型;
    采用所述训练图像数据和所述音频图像数据对所述人脸识别模型和所述音频识别模型进行一致性验证,将通过验证的所述人脸识别模型和所述音频识别模型与所述用户标识关联存储。
  2. 根据权利要求1所述的驾驶模型训练方法,其特征在于,所述获取同一驾驶场景的训练图像数据和训练音频数据,之前还包括:
    获取所述驾驶场景下车辆的当前车速,判断所述当前车速是否达到预设车速阈值;
    若所述当前车速达到所述预设车速阈值,则采集同一所述驾驶场景下的当前图像数据和当前音频数据,所述当前图像数据和所述当前音频数据与所述用户标识相关联;
    将所述当前图像数据和所述当前音频数据存储在数据库中;
    在所述数据库中创建驾驶数据信息表,所述驾驶数据信息表包括至少一条驾驶数据信息;每一所述驾驶数据信息包括用户标识、所述当前图像数据在所述数据库中的存储地址和所述当前音频数据在所述数据库中的存储地址;
    所述获取同一驾驶场景的训练图像数据和训练音频数据,包括:
    获取用户输入的模型训练指令,所述模型训练指令包括用户标识;
    基于所述用户标识查询所述驾驶数据信息表,判断所述驾驶数据信息的数量是否大于预设数量;
    若所述驾驶数据的数量大于所述预设数量,则获取同一驾驶场景的所述训练图像数据和所述训练音频数据。
  3. 根据权利要求1所述的驾驶模型训练方法,其特征在于,所述采用所述训练图像数据对卷积神经网络模型进行训练,获取人脸识别模型,包括:
    初始化所述卷积神经网络模型;
    在所述卷积神经网络模型中输入所述训练图像数据,计算所述卷积神经网络模型各层的输出;其中,卷积层的每一种输出的特征图xj
    Figure PCTCN2017107814-appb-100001
    其中,l是当前 层,Mj表示选择的输入特征图组合,
    Figure PCTCN2017107814-appb-100002
    是输入的第i种特征图,
    Figure PCTCN2017107814-appb-100003
    是l层输入的第i种特征图和输出的第j种特征图之间连接所用的卷积核,
    Figure PCTCN2017107814-appb-100004
    是第j种特征图l层对应的加性偏置,f是激活函数;池化层的每一种输出的特征图xj
    Figure PCTCN2017107814-appb-100005
    其中,down表示下采样计算,这里的
    Figure PCTCN2017107814-appb-100006
    第j种特征图l层对应的乘性偏置,
    Figure PCTCN2017107814-appb-100007
    是第j种特征图l层对应的加性偏置;
    根据所述各层的输出对所述卷积神经网络模型各层进行误差反传更新,获取更新后的所述各层的;
    基于更新后的所述各层的权值,获取人脸识别模型。
  4. 根据权利要求1所述的驾驶模型训练方法,其特征在于,所述采用所述训练音频数据对卷积神经网络模型进行训练,获取音频识别模型,包括:
    初始化所述卷积神经网络模型;
    基于所述训练音频数据获取对应的训练声谱图;
    在所述卷积神经网络模型输入所述训练声谱图,计算所述卷积神经网络模型各层的输出;其中,卷积层的每一种输出的特征图xj
    Figure PCTCN2017107814-appb-100008
    其中,l是当前层,Mj表示选择的输入特征图组合,
    Figure PCTCN2017107814-appb-100009
    是输入的第i种特征图l-1层的输出,
    Figure PCTCN2017107814-appb-100010
    是l层输入的第i种特征图和输出的第j种特征图之间连接所用的卷积核,
    Figure PCTCN2017107814-appb-100011
    是第j种特征图l层对应的加性偏置,f是激活函数;池化层的每一种输出的特征图xj
    Figure PCTCN2017107814-appb-100012
    其中,down表示下采样计算,这里的
    Figure PCTCN2017107814-appb-100013
    第j种特征图l层对应的乘性偏置,
    Figure PCTCN2017107814-appb-100014
    是第j种特征图l层对应的加性偏置;
    根据所述各层的输出对所述卷积神经网络模型各层进行误差反传更新,获取更新后的所述各层的权值;
    基于更新后的所述各层的权值,获取音频识别模型。
  5. 一种驾驶人识别方法,其特征在于,包括:
    获取用户同一驾驶场景的待识别图像数据和待识别音频数据,所述待识别图像数据和所述待识别音频数据与用户标识相关联;
    基于所述用户标识查询数据库,调用与所述用户标识相对应的人脸识别模型和音频识别模型,所述人脸识别模型和所述音频识别模型是采用权利要求1-4任一项所述驾驶模型 训练方法获取的模型;
    基于所述待识别图像数据和所述人脸识别模型获取第一概率;
    基于所述待识别音频数据和所述音频识别模型获取第二概率;
    基于所述第一概率和所述第二概率确定所述用户本人开车的最终概率;
    若所述最终概率大于第二预设阈值,则确定为所述用户本人开车。
  6. 根据权利要求5所述的驾驶人识别方法,其特征在于,所述基于所述第一概率和所述第二概率确定所述驾驶人本人开车的最终概率,包括:
    获取所述第一概率和所述第二概率的概率差值;
    判断所述概率差值是否大于第一预设阈值;
    若所述概率差值大于所述第一预设阈值,则选取所述第一概率和所述第二概率中较大值作为所述最终概率输出;
    若所述概率差值不大于所述第一预设阈值,则选取所述第一概率和所述第二概率的均值作为所述最终概率输出。
  7. 一种驾驶模型训练装置,其特征在于,包括:
    训练数据获取模块,用于获取同一驾驶场景的训练图像数据和训练音频数据,所述训练图像数据和所述训练音频数据与用户标识相关联;
    人脸识别模型获取模块,用于采用所述训练图像数据对卷积神经网络模型进行训练,获取人脸识别模型;
    音频识别模型获取模块,用于基于所述训练音频数据对所述卷积神经网络模型进行训练,获取音频识别模型;
    关联存储模块,用于采用所述训练图像数据和所述音频图像数据对所述人脸识别模型和所述音频识别模型进行一致性验证,将通过验证的所述人脸识别模型和所述音频识别模型与所述用户标识关联存储。
  8. 一种驾驶人识别装置,其特征在于,包括:
    待识别数据获取模块,用于获取用户同一驾驶场景的待识别图像数据和待识别音频数据,所述待识别图像数据和所述待识别音频数据与用户标识相关联;
    识别模型调用模块,用于基于所述用户标识查询数据库,调用与所述用户标识相对应的人脸识别模型和音频识别模型,所述人脸识别模型和所述音频识别模型是采用权利要求1-4任一项所述驾驶模型训练方法获取的模型;
    第一概率获取模块,用于基于所述待识别图像数据和所述人脸识别模型获取第一概 率;
    第二概率获取模块,用于基于所述待识别音频数据和所述音频识别模型获取第二概率;
    最终概率获取模块,用于基于所述第一概率和所述第二概率确定所述用户本人开车的最终概率;
    确认结果获取模块,用于若所述最终概率大于第二预设阈值,则确定为所述用户本人开车。
  9. 一种终端设备,包括存储器、处理器以及存储在所述存储器中并可在所述处理器上运行的计算机可读指令,其特征在于,所述处理器执行所述计算机可读指令时实现如下步骤:
    获取同一驾驶场景的训练图像数据和训练音频数据,所述训练图像数据和所述训练音频数据与用户标识相关联;
    采用所述训练图像数据对卷积神经网络模型进行训练,获取人脸识别模型;
    采用所述训练音频数据对所述卷积神经网络模型进行训练,获取音频识别模型;
    采用所述训练图像数据和所述音频图像数据对所述人脸识别模型和所述音频识别模型进行一致性验证,将通过验证的所述人脸识别模型和所述音频识别模型与所述用户标识关联存储。
  10. 根据权利要求9所述的终端设备,其特征在于,所述获取同一驾驶场景的训练图像数据和训练音频数据,之前还包括:
    获取所述驾驶场景下车辆的当前车速,判断所述当前车速是否达到预设车速阈值;
    若所述当前车速达到所述预设车速阈值,则采集同一所述驾驶场景下的当前图像数据和当前音频数据,所述当前图像数据和所述当前音频数据与所述用户标识相关联;
    将所述当前图像数据和所述当前音频数据存储在数据库中;
    在所述数据库中创建驾驶数据信息表,所述驾驶数据信息表包括至少一条驾驶数据信息;每一所述驾驶数据信息包括用户标识、所述当前图像数据在所述数据库中的存储地址和所述当前音频数据在所述数据库中的存储地址;
    所述获取同一驾驶场景的训练图像数据和训练音频数据,包括:
    获取用户输入的模型训练指令,所述模型训练指令包括用户标识;
    基于所述用户标识查询所述驾驶数据信息表,判断所述驾驶数据信息的数量是否大于预设数量;
    若所述驾驶数据的数量大于所述预设数量,则获取同一驾驶场景的所述训练图像数据和所述训练音频数据。
  11. 根据权利要求9所述的终端设备,其特征在于,所述采用所述训练图像数据对卷积神经网络模型进行训练,获取人脸识别模型,包括:
    初始化所述卷积神经网络模型;
    在所述卷积神经网络模型中输入所述训练图像数据,计算所述卷积神经网络模型各层的输出;其中,卷积层的每一种输出的特征图xj
    Figure PCTCN2017107814-appb-100015
    其中,l是当前层,Mj表示选择的输入特征图组合,
    Figure PCTCN2017107814-appb-100016
    是输入的第i种特征图,
    Figure PCTCN2017107814-appb-100017
    是l层输入的第i种特征图和输出的第j种特征图之间连接所用的卷积核,
    Figure PCTCN2017107814-appb-100018
    是第j种特征图l层对应的加性偏置,f是激活函数;池化层的每一种输出的特征图xj
    Figure PCTCN2017107814-appb-100019
    其中,down表示下采样计算,这里的
    Figure PCTCN2017107814-appb-100020
    第j种特征图l层对应的乘性偏置,
    Figure PCTCN2017107814-appb-100021
    是第j种特征图l层对应的加性偏置;
    根据所述各层的输出对所述卷积神经网络模型各层进行误差反传更新,获取更新后的所述各层的;
    基于更新后的所述各层的权值,获取人脸识别模型。
  12. 根据权利要求9所述的终端设备,其特征在于,所述采用所述训练音频数据对卷积神经网络模型进行训练,获取音频识别模型,包括:
    初始化所述卷积神经网络模型;
    基于所述训练音频数据获取对应的训练声谱图;
    在所述卷积神经网络模型输入所述训练声谱图,计算所述卷积神经网络模型各层的输出;其中,卷积层的每一种输出的特征图xj
    Figure PCTCN2017107814-appb-100022
    其中,l是当前层,Mj表示选择的输入特征图组合,
    Figure PCTCN2017107814-appb-100023
    是输入的第i种特征图l-1层的输出,
    Figure PCTCN2017107814-appb-100024
    是l层输入的第i种特征图和输出的第j种特征图之间连接所用的卷积核,
    Figure PCTCN2017107814-appb-100025
    是第j种特征图l层对应的加性偏置,f是激活函数;池化层的每一种输出的特征图xj
    Figure PCTCN2017107814-appb-100026
    其中,down表示下采样计算,这里的
    Figure PCTCN2017107814-appb-100027
    第j种特征图l层对应的乘性偏置,
    Figure PCTCN2017107814-appb-100028
    是第j种特征图l层对应的加性偏置;
    根据所述各层的输出对所述卷积神经网络模型各层进行误差反传更新,获取更新后的 所述各层的权值;
    基于更新后的所述各层的权值,获取音频识别模型。
  13. 一种终端设备,包括存储器、处理器以及存储在所述存储器中并可在所述处理器上运行的计算机可读指令,其特征在于,所述处理器执行所述计算机可读指令时实现如下步骤:
    获取用户同一驾驶场景的待识别图像数据和待识别音频数据,所述待识别图像数据和所述待识别音频数据与用户标识相关联;
    基于所述用户标识查询数据库,调用与所述用户标识相对应的人脸识别模型和音频识别模型,所述人脸识别模型和所述音频识别模型是采用权利要求1-4任一项所述驾驶模型训练方法获取的模型;
    基于所述待识别图像数据和所述人脸识别模型获取第一概率;
    基于所述待识别音频数据和所述音频识别模型获取第二概率;
    基于所述第一概率和所述第二概率确定所述用户本人开车的最终概率;
    若所述最终概率大于第二预设阈值,则确定为所述用户本人开车。
  14. 根据权利要求13所述的终端设备,其特征在于,所述基于所述第一概率和所述第二概率确定所述驾驶人本人开车的最终概率,包括:
    获取所述第一概率和所述第二概率的概率差值;
    判断所述概率差值是否大于第一预设阈值;
    若所述概率差值大于所述第一预设阈值,则选取所述第一概率和所述第二概率中较大值作为所述最终概率输出;
    若所述概率差值不大于所述第一预设阈值,则选取所述第一概率和所述第二概率的均值作为所述最终概率输出。
  15. 一种计算机可读存储介质,所述计算机可读存储介质存储有计算机可读指令,其特征在于,所述计算机可读指令被处理器执行时实现如下步骤:
    获取同一驾驶场景的训练图像数据和训练音频数据,所述训练图像数据和所述训练音频数据与用户标识相关联;
    采用所述训练图像数据对卷积神经网络模型进行训练,获取人脸识别模型;
    采用所述训练音频数据对所述卷积神经网络模型进行训练,获取音频识别模型;
    采用所述训练图像数据和所述音频图像数据对所述人脸识别模型和所述音频识别模型进行一致性验证,将通过验证的所述人脸识别模型和所述音频识别模型与所述用户标识 关联存储。
  16. 根据权利要求15所述的计算机可读存储介质,其特征在于,所述获取同一驾驶场景的训练图像数据和训练音频数据,之前还包括:
    获取所述驾驶场景下车辆的当前车速,判断所述当前车速是否达到预设车速阈值;
    若所述当前车速达到所述预设车速阈值,则采集同一所述驾驶场景下的当前图像数据和当前音频数据,所述当前图像数据和所述当前音频数据与所述用户标识相关联;
    将所述当前图像数据和所述当前音频数据存储在数据库中;
    在所述数据库中创建驾驶数据信息表,所述驾驶数据信息表包括至少一条驾驶数据信息;每一所述驾驶数据信息包括用户标识、所述当前图像数据在所述数据库中的存储地址和所述当前音频数据在所述数据库中的存储地址;
    所述获取同一驾驶场景的训练图像数据和训练音频数据,包括:
    获取用户输入的模型训练指令,所述模型训练指令包括用户标识;
    基于所述用户标识查询所述驾驶数据信息表,判断所述驾驶数据信息的数量是否大于预设数量;
    若所述驾驶数据的数量大于所述预设数量,则获取同一驾驶场景的所述训练图像数据和所述训练音频数据。
  17. 根据权利要求15所述的计算机可读存储介质,其特征在于,所述采用所述训练图像数据对卷积神经网络模型进行训练,获取人脸识别模型,包括:
    初始化所述卷积神经网络模型;
    在所述卷积神经网络模型中输入所述训练图像数据,计算所述卷积神经网络模型各层的输出;其中,卷积层的每一种输出的特征图xj
    Figure PCTCN2017107814-appb-100029
    其中,l是当前层,Mj表示选择的输入特征图组合,
    Figure PCTCN2017107814-appb-100030
    是输入的第i种特征图,
    Figure PCTCN2017107814-appb-100031
    是l层输入的第i种特征图和输出的第j种特征图之间连接所用的卷积核,
    Figure PCTCN2017107814-appb-100032
    是第j种特征图l层对应的加性偏置,f是激活函数;池化层的每一种输出的特征图xj
    Figure PCTCN2017107814-appb-100033
    其中,down表示下采样计算,这里的
    Figure PCTCN2017107814-appb-100034
    第j种特征图l层对应的乘性偏置,
    Figure PCTCN2017107814-appb-100035
    是第j种特征图l层对应的加性偏置;
    根据所述各层的输出对所述卷积神经网络模型各层进行误差反传更新,获取更新后的所述各层的;
    基于更新后的所述各层的权值,获取人脸识别模型。
  18. 根据权利要求15所述的计算机可读存储介质,其特征在于,所述采用所述训练音频数据对卷积神经网络模型进行训练,获取音频识别模型,包括:
    初始化所述卷积神经网络模型;
    基于所述训练音频数据获取对应的训练声谱图;
    在所述卷积神经网络模型输入所述训练声谱图,计算所述卷积神经网络模型各层的输出;其中,卷积层的每一种输出的特征图xj
    Figure PCTCN2017107814-appb-100036
    其中,l是当前层,Mj表示选择的输入特征图组合,
    Figure PCTCN2017107814-appb-100037
    是输入的第i种特征图l-1层的输出,
    Figure PCTCN2017107814-appb-100038
    是l层输入的第i种特征图和输出的第j种特征图之间连接所用的卷积核,
    Figure PCTCN2017107814-appb-100039
    是第j种特征图l层对应的加性偏置,f是激活函数;池化层的每一种输出的特征图xj
    Figure PCTCN2017107814-appb-100040
    其中,down表示下采样计算,这里的
    Figure PCTCN2017107814-appb-100041
    第j种特征图l层对应的乘性偏置,
    Figure PCTCN2017107814-appb-100042
    是第j种特征图l层对应的加性偏置;
    根据所述各层的输出对所述卷积神经网络模型各层进行误差反传更新,获取更新后的所述各层的权值;
    基于更新后的所述各层的权值,获取音频识别模型。
  19. 一种计算机可读存储介质,所述计算机可读存储介质存储有计算机可读指令,其特征在于,所述计算机可读指令被处理器执行时实现如下步骤:
    获取用户同一驾驶场景的待识别图像数据和待识别音频数据,所述待识别图像数据和所述待识别音频数据与用户标识相关联;
    基于所述用户标识查询数据库,调用与所述用户标识相对应的人脸识别模型和音频识别模型,所述人脸识别模型和所述音频识别模型是采用权利要求1-4任一项所述驾驶模型训练方法获取的模型;
    基于所述待识别图像数据和所述人脸识别模型获取第一概率;
    基于所述待识别音频数据和所述音频识别模型获取第二概率;
    基于所述第一概率和所述第二概率确定所述用户本人开车的最终概率;
    若所述最终概率大于第二预设阈值,则确定为所述用户本人开车。
  20. 根据权利要求19所述的计算机可读存储介质,其特征在于,所述基于所述第一概率和所述第二概率确定所述驾驶人本人开车的最终概率,包括:
    获取所述第一概率和所述第二概率的概率差值;
    判断所述概率差值是否大于第一预设阈值;
    若所述概率差值大于所述第一预设阈值,则选取所述第一概率和所述第二概率中较大值作为所述最终概率输出;
    若所述概率差值不大于所述第一预设阈值,则选取所述第一概率和所述第二概率的均值作为所述最终概率输出。
PCT/CN2017/107814 2017-09-19 2017-10-26 驾驶模型训练方法、驾驶人识别方法、装置、设备及介质 WO2019056471A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710846204.0A CN107729986B (zh) 2017-09-19 2017-09-19 驾驶模型训练方法、驾驶人识别方法、装置、设备及介质
CN201710846204.0 2017-09-19

Publications (1)

Publication Number Publication Date
WO2019056471A1 true WO2019056471A1 (zh) 2019-03-28

Family

ID=61206543

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/107814 WO2019056471A1 (zh) 2017-09-19 2017-10-26 驾驶模型训练方法、驾驶人识别方法、装置、设备及介质

Country Status (2)

Country Link
CN (1) CN107729986B (zh)
WO (1) WO2019056471A1 (zh)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110163111A (zh) * 2019-04-24 2019-08-23 平安科技(深圳)有限公司 基于人脸识别的叫号方法、装置、电子设备及存储介质
CN110472813A (zh) * 2019-06-24 2019-11-19 广东浤鑫信息科技有限公司 一种校车站点自适应调整方法及系统
CN111259719A (zh) * 2019-10-28 2020-06-09 浙江零跑科技有限公司 一种基于多目红外视觉系统的驾驶室场景分析方法
CN112052551A (zh) * 2019-10-25 2020-12-08 华北电力大学(保定) 一种风机喘振运行故障识别方法及系统
CN112969053A (zh) * 2021-02-23 2021-06-15 南京领行科技股份有限公司 车内信息传输方法、装置、车载设备及存储介质
CN113034602A (zh) * 2021-04-16 2021-06-25 电子科技大学中山学院 一种朝向角度分析方法、装置、电子设备及存储介质
CN113051425A (zh) * 2021-03-19 2021-06-29 腾讯音乐娱乐科技(深圳)有限公司 音频表征提取模型的获取方法和音频推荐的方法
CN114571472A (zh) * 2020-12-01 2022-06-03 北京小米移动软件有限公司 用于足式机器人的地面属性检测方法及驱动方法及其装置
CN115391054A (zh) * 2022-10-27 2022-11-25 宁波均联智行科技股份有限公司 车机系统的资源分配方法及车机系统
CN116092059A (zh) * 2022-11-30 2023-05-09 南京通力峰达软件科技有限公司 一种基于神经网络的车联网用户驾驶行为识别方法及系统

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110383330A (zh) * 2018-05-30 2019-10-25 深圳市大疆创新科技有限公司 池化装置和池化方法
CN110766129A (zh) * 2018-07-27 2020-02-07 杭州海康威视数字技术股份有限公司 一种神经网络训练系统及显示数据的方法
US10846522B2 (en) * 2018-10-16 2020-11-24 Google Llc Speaking classification using audio-visual data
CN109543627B (zh) * 2018-11-27 2023-08-01 西安电子科技大学 一种判断驾驶行为类别的方法、装置、及计算机设备
CN109872415B (zh) * 2018-12-28 2021-02-02 北京理工大学 一种基于神经网络的车速估计方法及系统
CN109743382B (zh) * 2018-12-28 2021-04-20 北汽福田汽车股份有限公司 车辆的云服务系统及其交互方法
CN109685214B (zh) * 2018-12-29 2021-03-02 百度在线网络技术(北京)有限公司 一种驾驶模型训练方法、装置和终端设备
CN109800720B (zh) * 2019-01-23 2023-12-22 平安科技(深圳)有限公司 情绪识别模型训练方法、情绪识别方法、装置、设备及存储介质
CN110309904B (zh) * 2019-01-29 2022-08-09 广州红贝科技有限公司 一种神经网络压缩方法
CN110287346B (zh) * 2019-06-28 2021-11-30 深圳云天励飞技术有限公司 数据存储方法、装置、服务器及存储介质
CN110765868A (zh) * 2019-09-18 2020-02-07 平安科技(深圳)有限公司 唇读模型的生成方法、装置、设备及存储介质
CN111401828A (zh) * 2020-02-28 2020-07-10 上海近屿智能科技有限公司 一种强化排序的动态智能面试方法、装置、设备及计算机存储介质
CN111352086B (zh) * 2020-03-06 2022-08-02 电子科技大学 一种基于深度卷积神经网络的未知目标识别方法
CN113362070A (zh) * 2021-06-03 2021-09-07 中国工商银行股份有限公司 用于识别操作用户的方法、装置、电子设备和介质
CN113609956B (zh) * 2021-07-30 2024-05-17 北京百度网讯科技有限公司 训练方法、识别方法、装置、电子设备以及存储介质

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090191513A1 (en) * 2008-01-24 2009-07-30 Nec Laboratories America, Inc. Monitoring driving safety using semi-supervised sequential learning
CN104361276A (zh) * 2014-11-18 2015-02-18 新开普电子股份有限公司 一种多模态生物特征身份认证方法及系统
CN104463201A (zh) * 2014-11-28 2015-03-25 杭州华为数字技术有限公司 一种识别驾驶状态、驾驶人的方法及装置
CN106203626A (zh) * 2016-06-30 2016-12-07 北京奇虎科技有限公司 汽车驾驶行为检测方法及装置、汽车

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2065842B1 (en) * 2007-11-28 2012-11-14 Honda Research Institute Europe GmbH Adaptive driver assistance system with robust estimation of object properties
US9663112B2 (en) * 2014-10-09 2017-05-30 Ford Global Technologies, Llc Adaptive driver identification fusion
CN105740767A (zh) * 2016-01-22 2016-07-06 江苏大学 一种基于脸部特征的驾驶员路怒症实时识别和预警方法
CN106127123B (zh) * 2016-06-16 2019-12-31 江苏大学 一种基于rgb-i的昼夜行车驾驶人员人脸实时检测方法
CN106650633A (zh) * 2016-11-29 2017-05-10 上海智臻智能网络科技股份有限公司 一种驾驶员情绪识别方法和装置
CN106910192B (zh) * 2017-03-06 2020-09-22 长沙全度影像科技有限公司 一种基于卷积神经网络的图像融合效果评估方法
CN107126224B (zh) * 2017-06-20 2018-02-06 中南大学 一种基于Kinect的轨道列车驾驶员状态的实时监测预警方法与系统

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090191513A1 (en) * 2008-01-24 2009-07-30 Nec Laboratories America, Inc. Monitoring driving safety using semi-supervised sequential learning
CN104361276A (zh) * 2014-11-18 2015-02-18 新开普电子股份有限公司 一种多模态生物特征身份认证方法及系统
CN104463201A (zh) * 2014-11-28 2015-03-25 杭州华为数字技术有限公司 一种识别驾驶状态、驾驶人的方法及装置
CN106203626A (zh) * 2016-06-30 2016-12-07 北京奇虎科技有限公司 汽车驾驶行为检测方法及装置、汽车

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
CNN: "Deep learning paper notes 4, derivation and implementation of CNN convolutional neutral network", CSDN, 16 August 2013 (2013-08-16) *

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110163111A (zh) * 2019-04-24 2019-08-23 平安科技(深圳)有限公司 基于人脸识别的叫号方法、装置、电子设备及存储介质
CN110472813A (zh) * 2019-06-24 2019-11-19 广东浤鑫信息科技有限公司 一种校车站点自适应调整方法及系统
CN110472813B (zh) * 2019-06-24 2023-12-22 广东浤鑫信息科技有限公司 一种校车站点自适应调整方法及系统
CN112052551A (zh) * 2019-10-25 2020-12-08 华北电力大学(保定) 一种风机喘振运行故障识别方法及系统
CN112052551B (zh) * 2019-10-25 2023-05-02 华北电力大学(保定) 一种风机喘振运行故障识别方法及系统
CN111259719A (zh) * 2019-10-28 2020-06-09 浙江零跑科技有限公司 一种基于多目红外视觉系统的驾驶室场景分析方法
CN111259719B (zh) * 2019-10-28 2023-08-25 浙江零跑科技股份有限公司 一种基于多目红外视觉系统的驾驶室场景分析方法
CN114571472B (zh) * 2020-12-01 2024-01-23 北京小米机器人技术有限公司 用于足式机器人的地面属性检测方法及驱动方法及其装置
CN114571472A (zh) * 2020-12-01 2022-06-03 北京小米移动软件有限公司 用于足式机器人的地面属性检测方法及驱动方法及其装置
CN112969053A (zh) * 2021-02-23 2021-06-15 南京领行科技股份有限公司 车内信息传输方法、装置、车载设备及存储介质
CN113051425A (zh) * 2021-03-19 2021-06-29 腾讯音乐娱乐科技(深圳)有限公司 音频表征提取模型的获取方法和音频推荐的方法
CN113051425B (zh) * 2021-03-19 2024-01-05 腾讯音乐娱乐科技(深圳)有限公司 音频表征提取模型的获取方法和音频推荐的方法
CN113034602B (zh) * 2021-04-16 2023-04-07 电子科技大学中山学院 一种朝向角度分析方法、装置、电子设备及存储介质
CN113034602A (zh) * 2021-04-16 2021-06-25 电子科技大学中山学院 一种朝向角度分析方法、装置、电子设备及存储介质
CN115391054A (zh) * 2022-10-27 2022-11-25 宁波均联智行科技股份有限公司 车机系统的资源分配方法及车机系统
CN115391054B (zh) * 2022-10-27 2023-03-17 宁波均联智行科技股份有限公司 车机系统的资源分配方法及车机系统
CN116092059B (zh) * 2022-11-30 2023-10-20 南京通力峰达软件科技有限公司 一种基于神经网络的车联网用户驾驶行为识别方法及系统
CN116092059A (zh) * 2022-11-30 2023-05-09 南京通力峰达软件科技有限公司 一种基于神经网络的车联网用户驾驶行为识别方法及系统

Also Published As

Publication number Publication date
CN107729986A (zh) 2018-02-23
CN107729986B (zh) 2020-11-03

Similar Documents

Publication Publication Date Title
WO2019056471A1 (zh) 驾驶模型训练方法、驾驶人识别方法、装置、设备及介质
WO2020215672A1 (zh) 医学图像病灶检测定位方法、装置、设备及存储介质
US10402448B2 (en) Image retrieval with deep local feature descriptors and attention-based keypoint descriptors
CN108898086B (zh) 视频图像处理方法及装置、计算机可读介质和电子设备
WO2020143309A1 (zh) 分割模型训练方法、oct图像分割方法、装置、设备及介质
WO2019169688A1 (zh) 车辆定损方法、装置、电子设备及存储介质
CN111401516B (zh) 一种神经网络通道参数的搜索方法及相关设备
WO2019232851A1 (zh) 语音区分模型训练方法、装置、计算机设备及存储介质
CN112926410B (zh) 目标跟踪方法、装置、存储介质及智能视频系统
WO2019056497A1 (zh) 驾驶模型训练方法、驾驶人识别方法、装置、设备及介质
WO2019056498A1 (zh) 驾驶模型训练方法、驾驶人识别方法、装置、设备及介质
KR20170140214A (ko) 신경망을 위한 훈련 기준으로서의 필터 특이성
CN110879982B (zh) 一种人群计数系统及方法
CN107679997A (zh) 医疗理赔拒付方法、装置、终端设备及存储介质
CN109344731A (zh) 基于神经网络的轻量级的人脸识别方法
US10740901B2 (en) Encoder regularization of a segmentation model
CN110738650B (zh) 一种传染病感染识别方法、终端设备及存储介质
US20240153240A1 (en) Image processing method, apparatus, computing device, and medium
EP4350575A1 (en) Image classification method and related device thereof
CN112949519A (zh) 目标检测方法、装置、设备及存储介质
TWI803243B (zh) 圖像擴增方法、電腦設備及儲存介質
CN111281355B (zh) 一种用于确定脉搏采集位置的方法与设备
CN114820755B (zh) 一种深度图估计方法及系统
Zhang et al. PCANet: pyramid context-aware network for retinal vessel segmentation
TW202223770A (zh) 機器學習裝置以及方法

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17925760

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 24.09.2020)

122 Ep: pct application non-entry in european phase

Ref document number: 17925760

Country of ref document: EP

Kind code of ref document: A1