CN113888777A - Voiceprint unlocking method and device based on cloud machine learning - Google Patents


Info

Publication number
CN113888777A
CN113888777A (application CN202111051063.6A)
Authority
CN
China
Prior art keywords
recording information
voiceprint
background model
voice representation
voice
Prior art date
Legal status
Granted
Application number
CN202111051063.6A
Other languages
Chinese (zh)
Other versions
CN113888777B (en)
Inventor
吴伟
张嵘
陈鑫
孙嘉鹏
Current Assignee
Nanjing Institute Of Jindun Public Security Technology Co ltd
Original Assignee
Nanjing Institute Of Jindun Public Security Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Nanjing Institute Of Jindun Public Security Technology Co ltd filed Critical Nanjing Institute Of Jindun Public Security Technology Co ltd
Priority to CN202111051063.6A priority Critical patent/CN113888777B/en
Publication of CN113888777A publication Critical patent/CN113888777A/en
Application granted granted Critical
Publication of CN113888777B publication Critical patent/CN113888777B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G07 - CHECKING-DEVICES
    • G07C - TIME OR ATTENDANCE REGISTERS; REGISTERING OR INDICATING THE WORKING OF MACHINES; GENERATING RANDOM NUMBERS; VOTING OR LOTTERY APPARATUS; ARRANGEMENTS, SYSTEMS OR APPARATUS FOR CHECKING NOT PROVIDED FOR ELSEWHERE
    • G07C9/00 - Individual registration on entry or exit
    • G07C9/00174 - Electronically operated locks; Circuits therefor; Nonmechanical keys therefor, e.g. passive or active electrical keys or other data carriers without mechanical keys
    • G07C9/00563 - Electronically operated locks; Circuits therefor; Nonmechanical keys therefor, e.g. passive or active electrical keys or other data carriers without mechanical keys using personal physical data of the operator, e.g. fingerprints, retinal images, voice patterns
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 - Machine learning
    • G - PHYSICS
    • G07 - CHECKING-DEVICES
    • G07C - TIME OR ATTENDANCE REGISTERS; REGISTERING OR INDICATING THE WORKING OF MACHINES; GENERATING RANDOM NUMBERS; VOTING OR LOTTERY APPARATUS; ARRANGEMENTS, SYSTEMS OR APPARATUS FOR CHECKING NOT PROVIDED FOR ELSEWHERE
    • G07C9/00 - Individual registration on entry or exit
    • G07C9/00174 - Electronically operated locks; Circuits therefor; Nonmechanical keys therefor, e.g. passive or active electrical keys or other data carriers without mechanical keys
    • G07C9/00309 - Electronically operated locks; Circuits therefor; Nonmechanical keys therefor, e.g. passive or active electrical keys or other data carriers without mechanical keys operated with bidirectional data transmission between data carrier and locks
    • G - PHYSICS
    • G07 - CHECKING-DEVICES
    • G07C - TIME OR ATTENDANCE REGISTERS; REGISTERING OR INDICATING THE WORKING OF MACHINES; GENERATING RANDOM NUMBERS; VOTING OR LOTTERY APPARATUS; ARRANGEMENTS, SYSTEMS OR APPARATUS FOR CHECKING NOT PROVIDED FOR ELSEWHERE
    • G07C9/00 - Individual registration on entry or exit
    • G07C9/00174 - Electronically operated locks; Circuits therefor; Nonmechanical keys therefor, e.g. passive or active electrical keys or other data carriers without mechanical keys
    • G07C9/00571 - Electronically operated locks; Circuits therefor; Nonmechanical keys therefor, e.g. passive or active electrical keys or other data carriers without mechanical keys operated by interacting with a central unit
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 - Reducing energy consumption in communication networks
    • Y02D30/70 - Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Complex Calculations (AREA)
  • Lock And Its Accessories (AREA)

Abstract

The invention discloses a voiceprint unlocking method based on cloud machine learning, which belongs to the technical field of voiceprint recognition and comprises the following steps: acquiring first recording information and extracting a first voice representation through a probability density function; obtaining a general voiceprint background model as a background library by training the first voice representation with an expectation-maximization algorithm; acquiring second recording information and extracting a second voice representation through a probability density function; comparing the second voice representation with the general voiceprint background model and calculating their degree of approximation; and comparing the approximation value with a preset threshold, judging from the comparison result whether the first recording information matches the second recording information, and opening the lock if they match.

Description

Voiceprint unlocking method and device based on cloud machine learning
Technical Field
The invention relates to a voiceprint unlocking method and device based on cloud machine learning, and belongs to the technical field of voiceprint recognition.
Background
The lockset is the most important link in security protection and a necessity for every household. The traditional mechanical coded lock is a high-reliability, fully mechanical lock with a large key volume and no electronic components. It is operated in a manner similar to dialing an old rotary telephone: starting from the dial's origin, the dial is turned clockwise to a given number and back to the origin to enter one digit of the password. This is repeated until the last digit is entered, after which turning the dial anticlockwise from the origin unlocks the lock. Unlocking resets the internal mechanism, so returning the dial to the origin closes the lock again, and the password must simply be re-entered to unlock it, with no internal reset to worry about. If a digit is entered incorrectly, the dial can be reset by turning it anticlockwise (a "virtual unlock") and the password entered again. However, the traditional mechanical lock offers weak resistance to picking, its key is easily lost or even copied, and a forgotten key causes inconvenience in daily life. An unlocking method that uses voiceprint information as the unlocking key is therefore urgently needed; moreover, a traditional local voiceprint recognition training library suffers from a low recognition rate because its training data are scarce.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a voiceprint unlocking method and device based on cloud machine learning, solving the problem that a traditional local voiceprint recognition training library has a low recognition rate because its training data are scarce.
In order to achieve the purpose, the invention is realized by adopting the following technical scheme:
In a first aspect, the invention provides a voiceprint unlocking method based on cloud machine learning, which is applied to a main control board embedded in a lock and comprises the following steps:
acquiring first recording information, and extracting through a probability density function to obtain a first voice representation;
uploading the first voice representation to a cloud server, wherein the cloud server is used for obtaining a general voiceprint background model as a background library by using an expectation maximization algorithm for the first voice representation and synchronizing the general voiceprint background model to a main control board;
acquiring second recording information, and extracting through a probability density function to obtain a second voice representation;
comparing the second voice representation with the universal voiceprint background model downloaded from the cloud server, and calculating the approximation degree of the second voice representation and the universal voiceprint background model;
and comparing the approximation degree value with a preset threshold value, judging whether the first recording information is matched with the second recording information according to a comparison result, and opening the lockset if the first recording information is matched with the second recording information.
Furthermore, the first recording information and the second recording information are recorded through a microphone arranged on the lock, and the microphone is connected with the main control board.
Further, the audio content of the first recording information is a predetermined section of specific text or sound, and is stored in the main control board.
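To illustrate the enroll-and-verify flow of the first aspect, here is a minimal Python sketch of the threshold decision. The names `extract_characterization`, `approximation_degree`, and `decide_unlock` are hypothetical, and the framing step is only a stand-in for the patent's probability-density-function extraction.

```python
import numpy as np

def extract_characterization(recording: np.ndarray, n_frames: int = 4, dim: int = 3) -> np.ndarray:
    """Stand-in for the probability-density-function extraction step:
    frame the raw signal into fixed-size feature vectors."""
    return recording[: n_frames * dim].reshape(n_frames, dim)

def approximation_degree(score_enrolled: float, score_current: float) -> float:
    """The patent defines the approximation degree as the absolute value of a
    difference: the smaller the value, the closer the match."""
    return abs(score_enrolled - score_current)

def decide_unlock(degree: float, threshold: float) -> bool:
    """Open the lock only when the approximation degree is below the preset threshold."""
    return degree < threshold
```

In this sketch the main control board would compute `approximation_degree` between the enrolled and the current scores and call `decide_unlock` to drive the lock actuator.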
In a second aspect, the present invention provides a voiceprint unlocking method based on cloud machine learning, which is applied to a lock, and includes:
acquiring first recording information, and extracting through a probability density function to obtain a first voice representation;
obtaining a general voiceprint background model as a background library by training the first voice representation with an expectation-maximization algorithm;
acquiring second recording information, and extracting through a probability density function to obtain a second voice representation;
comparing the second voice representation with the general voiceprint background model, and calculating the approximation degree of the second voice representation and the general voiceprint background model;
and comparing the approximation degree value with a preset threshold value, judging whether the first recording information is matched with the second recording information according to a comparison result, and opening the lockset if the first recording information is matched with the second recording information.
Further, the second voice representation is compared with the universal voiceprint background model by using a maximum a posteriori probability algorithm.
In a third aspect, the present invention provides a voiceprint unlocking device based on cloud machine learning, including:
the first extraction unit is used for acquiring first recording information and extracting a first voice representation through a probability density function;
the background library establishing unit is used for obtaining a general voiceprint background model as a background library through a first voice representation by using an expectation maximization algorithm;
the second extraction unit is used for acquiring second recording information and extracting a second voice representation through a probability density function;
the comparison calculation unit is used for comparing the second voice representation with the general voiceprint background model and calculating the degree of approximation between them;
and the comparison unit is used for comparing the approximation degree value with a preset threshold value, judging whether the first recording information is matched with the second recording information according to a comparison result, and opening the lockset if the first recording information is matched with the second recording information.
In a fourth aspect, the present invention provides a voiceprint unlocking system based on cloud machine learning, which includes a lock, and further includes:
the microphone is arranged on the lockset and used for recording the speaking recording information and sending the recording information to the main control board;
the master control board is embedded in the lockset and used as an algorithm program carrier, is connected with an external microphone, acquires recording information, obtains voice representation through probability density function extraction, uploads the voice representation to the cloud server, and accesses the Internet to download a universal voiceprint background model from the cloud server at regular intervals; comparing the extracted voice representation with the downloaded general voiceprint background model, calculating the approximation degree of the voice representation and the downloaded general voiceprint background model, comparing the approximation degree value with a preset threshold value, judging whether the first recording information is matched with the second recording information according to the comparison result, and opening a lock if the first recording information is matched with the second recording information;
and the cloud server is used for obtaining a general voiceprint background model as a background library through the received voice representation as training data and an expectation maximization algorithm, and synchronizing the general voiceprint background model to the main control board.
Furthermore, a storage module is arranged on the main control board and used for storing the received audio information.
In a fifth aspect, the invention provides a voiceprint unlocking device based on cloud machine learning, which comprises a processor and a storage medium;
the storage medium is used for storing instructions;
the processor is configured to operate in accordance with the instructions to perform the steps of the method according to any of the above.
In a sixth aspect, the invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of any of the methods described above.
Compared with the prior art, the invention has the following beneficial effects: unlike a traditional local voiceprint recognition training library, the expectation-maximization algorithm and the universal voiceprint background model are deployed on the cloud server, trained on the data uploaded by every terminal, and asynchronously pushed down to the main control board of each networked device, so that each speaker is pre-trained with data from other speakers as well; this solves the low recognition rate caused by the scarce training data of a traditional local voiceprint recognition training library.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a functional block diagram of the method of the present invention;
FIG. 3 is a speech characterization training block diagram.
Detailed Description
The invention is further described below with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.
Example 1
This embodiment introduces a voiceprint unlocking method based on cloud machine learning, applied to a main control board embedded inside the lock, comprising:
acquiring first recording information, and extracting through a probability density function to obtain a first voice representation;
uploading the first voice representation to a cloud server, wherein the cloud server is used for obtaining a general voiceprint background model as a background library by using an expectation maximization algorithm for the first voice representation and synchronizing the general voiceprint background model to a main control board;
acquiring second recording information, and extracting through a probability density function to obtain a second voice representation;
comparing the second voice representation with the universal voiceprint background model downloaded from the cloud server, and calculating the approximation degree of the second voice representation and the universal voiceprint background model;
and comparing the approximation degree value with a preset threshold value, judging whether the first recording information is matched with the second recording information according to a comparison result, and opening the lockset if the first recording information is matched with the second recording information.
The first recording information and the second recording information are recorded and obtained through a microphone arranged on the lock.
As shown in fig. 1 to fig. 3, the application process of the voiceprint unlocking method based on cloud machine learning provided in this embodiment specifically involves the following steps:
firstly, a user inputs voice through a microphone, the voice is transmitted to a main control board, and voice representation is extracted through a probability density function;
a speaker wakes up the main control panel to record audio through a microphone by operating a touch, a switch or a button and the like, the audio content is a pre-appointed section of specific characters or sound, the main control panel stores the audio in a cache, and the audio is voiceprint information.
The probability density function is a Gaussian mixture model: a linear combination (weighted sum) of M Gaussian component densities, which can in theory fit any distribution of voice representations. The formula is:

p(x \mid \lambda) = \sum_{i=1}^{M} w_i \, g_i(x)

where x is a d-dimensional random vector, λ is the parameter set {λ_1, ..., λ_M} of the Gaussian mixture model with λ_i = (w_i, μ_i, Σ_i), i ∈ [1, M]; w_i is the mixture weight component, satisfying \sum_{i=1}^{M} w_i = 1; g_i(x) is the probability density function of the i-th d-dimensional Gaussian component, and μ_i, Σ_i are its mean and covariance. The probability density function of a Gaussian component is:

g_i(x) = \frac{1}{(2\pi)^{d/2} \lvert \Sigma_i \rvert^{1/2}} \exp\!\left( -\tfrac{1}{2} (x - \mu_i)^{\mathsf T} \Sigma_i^{-1} (x - \mu_i) \right)

The data obtained from this function describe the distribution of the data points/feature points, i.e., the voice representation.
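To make the mixture density concrete, here is a minimal sketch, assuming diagonal covariances, of evaluating the weighted sum of Gaussian components; `gmm_density` is an illustrative name, not code from the patent.

```python
import numpy as np

def gmm_density(x, weights, means, variances):
    """Evaluate p(x|λ) = Σ_i w_i g(x; μ_i, Σ_i) for a diagonal-covariance
    Gaussian mixture. x: (d,) vector; weights: (M,); means, variances: (M, d)."""
    x = np.asarray(x, dtype=float)
    d = x.shape[0]
    total = 0.0
    for w, mu, var in zip(weights, means, variances):
        # normalizing constant (2π)^{d/2} |Σ|^{1/2} for a diagonal Σ
        norm = (2.0 * np.pi) ** (d / 2.0) * np.sqrt(np.prod(var))
        # Mahalanobis term (x-μ)^T Σ^{-1} (x-μ) with diagonal Σ
        quad = np.sum((x - np.asarray(mu)) ** 2 / np.asarray(var))
        total += w * np.exp(-0.5 * quad) / norm
    return total
```

With a single standard-normal component in one dimension, the density at the origin is 1/sqrt(2π), which is a quick sanity check on the normalization.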
Secondly, the main control board, acting as a terminal device, uploads the extracted voice representation to the cloud server over mobile data or WiFi; the cloud server uses the voice representations as training data and obtains a universal voiceprint background model through the expectation-maximization algorithm, which serves as the background library.
the expectation-maximization algorithm is an algorithm that finds the parameter maximum likelihood estimate or the maximum a posteriori probability in a probabilistic model that relies on hidden variables that cannot be observed.
The expectation maximization algorithm is divided into two steps:
First, with the parameters fixed, solve for the posterior probability: fix μ and Σ and compute the posterior distribution p(z^{(n)} | x^{(n)}) by the formula:

\gamma_{nk} = p(z^{(n)} = k \mid x^{(n)}) = \frac{\pi_k \, \mathcal{N}(x^{(n)} \mid \mu_k, \Sigma_k)}{\sum_{k'=1}^{K} \pi_{k'} \, \mathcal{N}(x^{(n)} \mid \mu_{k'}, \Sigma_{k'})}

where μ is the mean of a Gaussian distribution, Σ is its variance, k indexes the k-th Gaussian distribution, and π_k is the weight coefficient of the k-th Gaussian distribution, satisfying π_k ≥ 0 and \sum_{k=1}^{K} \pi_k = 1; π_k is the prior probability that sample x is generated by the k-th Gaussian distribution. n indexes the n-th sample, γ is the posterior distribution, x^{(n)} denotes the n training samples generated by the Gaussian mixture model, and z^{(n)} indicates which Gaussian a sample came from.

γ_{nk} above expresses the posterior probability of the n-th sample with respect to the k-th Gaussian distribution; these posteriors form an N×K matrix, where N is the number of samples and K the number of Gaussian distributions.
The second step: fix the posterior probabilities and optimize the parameters of the evidence lower bound. With the posterior probabilities known, maximize the marginal likelihood p(x | μ, Σ), i.e. maximize the log-likelihood:

\log p(D \mid \mu, \Sigma) \ge \mathrm{ELBO}(q) = \sum_{n=1}^{N} \sum_{k=1}^{K} \gamma_{nk} \log \frac{\pi_k \, \mathcal{N}(x^{(n)} \mid \mu_k, \Sigma_k)}{\gamma_{nk}}

where q(z) = p(z^{(n)} = k) and the ELBO is a lower bound of the log-likelihood obtained from Jensen's inequality; D is the training set and γ the posterior distribution. Solving with the Lagrangian method yields the following update rules, with N_k = \sum_{n=1}^{N} \gamma_{nk}:

\pi_k = \frac{N_k}{N}, \qquad
\mu_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma_{nk} \, x^{(n)}, \qquad
\Sigma_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma_{nk} \, (x^{(n)} - \mu_k)(x^{(n)} - \mu_k)^{\mathsf T}
The expectation-maximization algorithm thus yields a universal voiceprint background model composed of a plurality of voice representations; the cloud asynchronously sends the model to every main control board currently in communication with the server, updating the voiceprint background model on each board so as to reduce the recognition error.
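The two EM steps above can be sketched for the one-dimensional case as follows; `em_step` is an illustrative name, and the code assumes a 1-D mixture with scalar variances rather than the full d-dimensional model.

```python
import numpy as np

def em_step(X, pi, mu, var):
    """One expectation-maximization iteration for a 1-D Gaussian mixture.
    X: (N,) samples; pi, mu, var: (K,) mixture weights, means, variances."""
    # E-step: responsibilities γ_nk ∝ π_k N(x_n | μ_k, σ_k²)
    dens = np.exp(-0.5 * (X[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    gamma = pi * dens
    gamma /= gamma.sum(axis=1, keepdims=True)
    # M-step: re-estimate π_k, μ_k, σ_k² from the responsibilities
    Nk = gamma.sum(axis=0)
    pi_new = Nk / X.shape[0]
    mu_new = (gamma * X[:, None]).sum(axis=0) / Nk
    var_new = (gamma * (X[:, None] - mu_new) ** 2).sum(axis=0) / Nk
    return pi_new, mu_new, var_new
```

Iterating `em_step` on data drawn from two well-separated clusters drives the component means toward the cluster centers, which is the convergence behavior the training step relies on.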
Step three, the main control board compares the current voice representation with the universal voiceprint background model downloaded from the cloud server by using the maximum a posteriori probability algorithm.

The adaptive process of the maximum a posteriori probability algorithm consists of three main steps:

First, sufficient statistics of the speaker's voice data are estimated for each Gaussian component of the universal voiceprint background model. For the speaker's voice representation, the sufficient statistics comprise the occupancy count n_i of the observation sequence for each component i, the first-order moment E_i(x) (the mean expectation), and the second-order moment E_i(x^2) (the variance expectation), which are used to compute the weight, mean, and variance of the Gaussian mixture model.

Second, the newly estimated sufficient statistics are combined with the sufficient statistics of the universal voiceprint background model, using the data-dependent mixing coefficients defined below, to obtain the final voice characteristic parameter estimates.
The specific process is as follows:
given a generic voiceprint background model and a speaker-specific observation sequence X ═ X1,…,xTThe division model is used as the division model;
where component i is paired with observation xtResponse speed of (2), observed data xtProbability of the ith component from the generic voiceprint background model:
Figure BDA0003252770680000091
wherein, wiIs a mixing weight component; mu.st、ΣtTheir mean and variance, respectively; g is the probability density of the gaussian component; m is the number of component densities.
Using Pr (i | x)ti) And (3) calculating sufficient statistics, wherein the sum of the probabilities of the T observation sequence vectors from the component i is as follows:
Figure BDA0003252770680000092
the mean of the T observation sequence vectors from component i is expected to be:
Figure BDA0003252770680000093
the variance of the T observation sequence vectors from component i is expected to be:
Figure BDA0003252770680000094
and thirdly, updating parameters (weight, mean value and variance) of the mixed components by using the sufficient statistics obtained in the second step:
Figure BDA0003252770680000095
Figure BDA0003252770680000096
Figure BDA0003252770680000097
for each parameter of the Gaussian mixture component, the data depends on the mixture parameter
Figure BDA0003252770680000098
Is defined as follows:
Figure BDA0003252770680000099
rρis a fixed correlation factor based on p, typically using the same a update parameter, i.e.
Figure BDA0003252770680000101
Figure BDA0003252770680000102
Experiments show that the value range of r is (8-20) effective, the self-adaptive process only updates the mean value with the best effect, and the actual system has the best effect
Figure BDA0003252770680000103
And γ is only a normalization factor to ensure that the sum of the updated weight parameters is 1, so it traverses according to the component i
Figure BDA0003252770680000104
Step four: the approximation value is compared with the preset threshold to determine whether the speaker matches, and if so, the lock is opened. The approximation degree is the absolute value of the difference between the voice representations: the smaller the difference, the more similar they are.
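The means-only MAP adaptation described in step three can be sketched as follows for a one-dimensional UBM; `map_adapt_means` is a hypothetical helper, and r = 16 is an assumed value taken from the 8 to 20 range the text reports as effective.

```python
import numpy as np

def map_adapt_means(X, weights, means, var, r=16.0):
    """Means-only MAP adaptation of a 1-D diagonal UBM, following the
    n_i / E_i(x) / α_i = n_i/(n_i + r) updates described above."""
    # Pr(i | x_t): posterior of UBM component i for each observation x_t
    dens = np.exp(-0.5 * (X[:, None] - means) ** 2 / var) / np.sqrt(2 * np.pi * var)
    post = weights * dens
    post /= post.sum(axis=1, keepdims=True)
    # sufficient statistics: occupancy n_i and first-order moment E_i(x)
    n = post.sum(axis=0)
    Ex = (post * X[:, None]).sum(axis=0) / np.maximum(n, 1e-12)
    # data-dependent mixing coefficient and the adapted means
    alpha = n / (n + r)
    return alpha * Ex + (1.0 - alpha) * means
```

Components that see many speaker frames get α close to 1 and move toward the data, while components the speaker never excites keep their UBM means, which is exactly the behavior the relevance factor r controls.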
Example 2
The embodiment provides a voiceprint unlocking method based on cloud machine learning, which is applied to a lockset and comprises the following steps:
acquiring first recording information, and extracting through a probability density function to obtain a first voice representation;
obtaining a general voiceprint background model as a background library by training the first voice representation with an expectation-maximization algorithm;
acquiring second recording information, and extracting through a probability density function to obtain a second voice representation;
comparing the second voice representation with the general voiceprint background model, and calculating the approximation degree of the second voice representation and the general voiceprint background model;
and comparing the approximation degree value with a preset threshold value, judging whether the first recording information is matched with the second recording information according to a comparison result, and opening the lockset if the first recording information is matched with the second recording information.
Further, the second voice representation is compared with the universal voiceprint background model by using a maximum a posteriori probability algorithm.
Example 3
This embodiment provides a voiceprint unlocking device based on cloud machine learning, comprising:
the first extraction unit is used for acquiring first recording information and extracting a first voice representation through a probability density function;
the background library establishing unit is used for obtaining a general voiceprint background model as a background library through a first voice representation by using an expectation maximization algorithm;
the second extraction unit is used for acquiring second recording information and extracting a second voice representation through a probability density function;
the comparison calculation unit is used for comparing the second voice representation with the general voiceprint background model and calculating the degree of approximation between them;
and the comparison unit is used for comparing the approximation degree value with a preset threshold value, judging whether the first recording information is matched with the second recording information according to a comparison result, and opening the lockset if the first recording information is matched with the second recording information.
Example 4
This embodiment provides a voiceprint unlocking system based on cloud machine learning, which includes a lock and further comprises:
the microphone is arranged on the lockset and used for recording the speaking recording information and sending the recording information to the main control board;
the master control board is embedded in the lockset and used as an algorithm program carrier, is connected with an external microphone, acquires recording information, obtains voice representation through probability density function extraction, uploads the voice representation to the cloud server, and accesses the Internet to download a universal voiceprint background model from the cloud server at regular intervals; comparing the extracted voice representation with the downloaded general voiceprint background model, calculating the approximation degree of the voice representation and the downloaded general voiceprint background model, comparing the approximation degree value with a preset threshold value, judging whether the first recording information is matched with the second recording information according to the comparison result, and opening a lock if the first recording information is matched with the second recording information;
and the cloud server is used for obtaining a general voiceprint background model as a background library through the received voice representation as training data and an expectation maximization algorithm, and synchronizing the general voiceprint background model to the main control board.
Furthermore, a storage module is arranged on the main control board and used for storing the received audio information.
Example 5
The embodiment provides a voiceprint unlocking device based on cloud machine learning, which comprises a processor and a storage medium;
the storage medium is used for storing instructions;
the processor is configured to operate in accordance with the instructions to perform the steps of the method according to any one of:
acquiring first recording information, and extracting through a probability density function to obtain a first voice representation;
obtaining a general voiceprint background model as a background library by training the first voice representation with an expectation-maximization algorithm;
acquiring second recording information, and extracting through a probability density function to obtain a second voice representation;
comparing the second voice representation with the general voiceprint background model, and calculating the approximation degree of the second voice representation and the general voiceprint background model;
and comparing the approximation degree value with a preset threshold value, judging whether the first recording information is matched with the second recording information according to a comparison result, and opening the lockset if the first recording information is matched with the second recording information.
Example 6
The present embodiments provide a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the steps of any of the methods:
acquiring first recording information, and extracting through a probability density function to obtain a first voice representation;
obtaining a general voiceprint background model as a background library by training the first voice representation with an expectation-maximization algorithm;
acquiring second recording information, and extracting through a probability density function to obtain a second voice representation;
comparing the second voice representation with the general voiceprint background model, and calculating the approximation degree of the second voice representation and the general voiceprint background model;
and comparing the approximation degree value with a preset threshold value, judging whether the first recording information is matched with the second recording information according to a comparison result, and opening the lockset if the first recording information is matched with the second recording information.
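Both embodiments leave the "extraction through a probability density function" step abstract. As a hedged sketch of how a per-frame voice representation might be obtained from raw samples (the patent does not name a feature type; log spectral band energies are one common stand-in):

```python
import numpy as np

def voice_representation(samples, frame_len=256, hop=128, n_bands=13):
    """Split the waveform into overlapping windows and summarize each
    window as a vector of log spectral band energies."""
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        windowed = samples[start:start + frame_len] * np.hanning(frame_len)
        power = np.abs(np.fft.rfft(windowed)) ** 2
        bands = np.array_split(power, n_bands)
        frames.append(np.log([b.sum() + 1e-10 for b in bands]))
    return np.array(frames)  # shape: (n_frames, n_bands)

# One second of a synthetic 440 Hz tone sampled at 8 kHz:
t = np.arange(8000) / 8000.0
rep = voice_representation(np.sin(2 * np.pi * 440 * t))
```

The resulting frame matrix is what the earlier sketches feed to the expectation-maximization training and to the scoring step.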
The above description is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make several modifications and variations without departing from the technical principle of the present invention, and such modifications and variations should also be regarded as falling within the protection scope of the present invention.

Claims (10)

1. A voiceprint unlocking method based on cloud machine learning, characterized in that it is applied to a main control board embedded inside a lock and comprises:
acquiring first recording information, and extracting a first voice representation through a probability density function;
uploading the first voice representation to a cloud server, the cloud server being used for obtaining a general voiceprint background model as a background library from the first voice representation using an expectation-maximization algorithm and for synchronizing the general voiceprint background model to the main control board;
acquiring second recording information, and extracting a second voice representation through a probability density function;
comparing the second voice representation with the universal voiceprint background model downloaded from the cloud server, and calculating the degree of approximation between the two;
and comparing the approximation degree value with a preset threshold value, judging from the comparison result whether the first recording information matches the second recording information, and opening the lock if it does.
2. The cloud machine learning-based voiceprint unlocking method according to claim 1, wherein: the first recording information and the second recording information are recorded through a microphone arranged on the lock, the microphone being connected to the main control board.
3. The cloud machine learning-based voiceprint unlocking method according to claim 1, wherein: the audio content of the first recording information is a predetermined passage of specific words or sounds and is stored on the main control board.
4. A voiceprint unlocking method based on cloud machine learning, characterized in that it is applied to a lock and comprises:
acquiring first recording information, and extracting a first voice representation through a probability density function;
obtaining a general voiceprint background model as a background library from the first voice representation using an expectation-maximization algorithm;
acquiring second recording information, and extracting a second voice representation through a probability density function;
comparing the second voice representation with the general voiceprint background model, and calculating the degree of approximation between the two;
and comparing the approximation degree value with a preset threshold value, judging from the comparison result whether the first recording information matches the second recording information, and opening the lock if it does.
5. The cloud machine learning-based voiceprint unlocking method according to claim 4, wherein: the second voice representation is compared with the general voiceprint background model using a maximum a posteriori probability algorithm.
6. A voiceprint unlocking device based on cloud machine learning, characterized by comprising:
a first extraction unit, used for acquiring first recording information and extracting a first voice representation through a probability density function;
a background library establishing unit, used for obtaining a general voiceprint background model as a background library from the first voice representation using an expectation-maximization algorithm;
a second extraction unit, used for acquiring second recording information and extracting a second voice representation through a probability density function;
a comparison calculation unit, used for comparing the second voice representation with the general voiceprint background model and calculating the degree of approximation between the two;
and a comparison unit, used for comparing the approximation degree value with a preset threshold value, judging from the comparison result whether the first recording information matches the second recording information, and opening the lock if it does.
7. A voiceprint unlocking system based on cloud machine learning, comprising a lock, characterized by further comprising:
a microphone, arranged on the lock, used for recording spoken audio and sending the recording information to the main control board;
a main control board, embedded in the lock as the carrier of the algorithm program and connected to the external microphone; the main control board acquires recording information, extracts a voice representation through a probability density function, uploads the voice representation to the cloud server, and periodically accesses the Internet to download the universal voiceprint background model from the cloud server; it compares the extracted voice representation with the downloaded universal voiceprint background model, calculates the degree of approximation between the two, compares the approximation degree value with a preset threshold value, judges from the comparison result whether the first recording information matches the second recording information, and opens the lock if it does;
and a cloud server, used for obtaining a general voiceprint background model as a background library, using the received voice representations as training data and an expectation-maximization algorithm, and for synchronizing the general voiceprint background model to the main control board.
8. The cloud machine learning-based voiceprint unlocking device according to claim 6, wherein: a storage module is arranged on the main control board for storing the received audio information.
9. A voiceprint unlocking device based on cloud machine learning, characterized by comprising a processor and a storage medium;
the storage medium is used for storing instructions;
the processor is configured to operate in accordance with the instructions to perform the steps of the method according to any one of claims 4 to 5.
10. A computer-readable storage medium on which a computer program is stored, characterized in that: the program, when executed by a processor, implements the steps of the method of any one of claims 4 to 5.
CN202111051063.6A 2021-09-08 2021-09-08 Voiceprint unlocking method and device based on cloud machine learning Active CN113888777B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111051063.6A CN113888777B (en) 2021-09-08 2021-09-08 Voiceprint unlocking method and device based on cloud machine learning


Publications (2)

Publication Number Publication Date
CN113888777A true CN113888777A (en) 2022-01-04
CN113888777B CN113888777B (en) 2023-08-18

Family

ID=79008734

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111051063.6A Active CN113888777B (en) 2021-09-08 2021-09-08 Voiceprint unlocking method and device based on cloud machine learning

Country Status (1)

Country Link
CN (1) CN113888777B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102982799A (en) * 2012-12-20 2013-03-20 中国科学院自动化研究所 Speech recognition optimization decoding method integrating guide probability
CN104143326A (en) * 2013-12-03 2014-11-12 腾讯科技(深圳)有限公司 Voice command recognition method and device
CN104992708A (en) * 2015-05-11 2015-10-21 国家计算机网络与信息安全管理中心 Short-time specific audio detection model generating method and short-time specific audio detection method
CN107680600A (en) * 2017-09-11 2018-02-09 平安科技(深圳)有限公司 Sound-groove model training method, audio recognition method, device, equipment and medium
CN108831440A (en) * 2018-04-24 2018-11-16 中国地质大学(武汉) A kind of vocal print noise-reduction method and system based on machine learning and deep learning
CN109273002A (en) * 2018-10-26 2019-01-25 蔚来汽车有限公司 Vehicle configuration method, system, vehicle device and vehicle
CN109872721A (en) * 2017-12-05 2019-06-11 富士通株式会社 Voice authentication method, information processing equipment and storage medium
CN110634507A (en) * 2018-06-06 2019-12-31 英特尔公司 Speech classification of audio for voice wakeup


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZENG Chunyan; MA Chaofeng; WANG Zhifeng; KONG Xiangbin: "Robust Speaker Recognition Method Based on Convolutional Neural Networks", Journal of Huazhong University of Science and Technology (Natural Science Edition), no. 06, pages 44 - 49 *

Also Published As

Publication number Publication date
CN113888777B (en) 2023-08-18

Similar Documents

Publication Publication Date Title
EP2763134B1 (en) Method and apparatus for voice recognition
Campisi et al. User authentication using keystroke dynamics for cellular phones
US20220238117A1 (en) Voice identity feature extractor and classifier training
CN103065631B (en) A kind of method of speech recognition, device
US20080118082A1 (en) Removal of noise, corresponding to user input devices from an audio signal
CN110211575A (en) Voice for data enhancing adds method for de-noising and system
CN109119090A (en) Method of speech processing, device, storage medium and electronic equipment
WO2020098083A1 (en) Call separation method and apparatus, computer device and storage medium
WO2009020482A2 (en) Hidden markov model ('hmm')-based user authentication using keystroke dynamics
CN110570869A (en) Voiceprint recognition method, device, equipment and storage medium
CN112328994A (en) Voiceprint data processing method and device, electronic equipment and storage medium
CN107481736A (en) A kind of vocal print identification authentication system and its certification and optimization method and system
Westover et al. Achievable rates for pattern recognition
US20120069767A1 (en) Method and an arrangement for a mobile telecommunications network
CN113888777A (en) Voiceprint unlocking method and device based on cloud machine learning
CN109040466A (en) voice-based mobile terminal unlocking method and device
CN116383719A (en) MGF radio frequency fingerprint identification method for LFM radar
CN110364169A (en) Method for recognizing sound-groove, device, equipment and computer readable storage medium
CN114155880A (en) Illegal voice recognition method and system based on GBDT algorithm model
CN112131541A (en) Identity verification method and system based on vibration signal
CN113918941A (en) Abnormal behavior detection method and device, computing equipment and storage medium
CN111951791A (en) Voiceprint recognition model training method, recognition method, electronic device and storage medium
CN114400009B (en) Voiceprint recognition method and device and electronic equipment
CN107451437B (en) Locking method and device of mobile terminal
CN111400680B (en) Mobile phone unlocking password prediction method based on sensor and related device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant