WO2018149077A1 - Voiceprint recognition method, apparatus, storage medium and background server - Google Patents
Voiceprint recognition method, apparatus, storage medium and background server
- Publication number
- WO2018149077A1 (application PCT/CN2017/090046)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- test
- voiceprint feature
- voiceprint
- target
- user
- Prior art date
Classifications
- G06F21/32—User authentication using biometric data, e.g. fingerprints, iris scans or voiceprints
- G10L15/30—Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
- G10L17/02—Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
- G10L17/04—Training, enrolment or model building
- G10L17/06—Decision making techniques; Pattern matching strategies
- G10L17/08—Use of distortion metrics or a particular distance between probe pattern and reference templates
- G10L25/39—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00, characterised by the analysis technique using genetic algorithms
Definitions
- the present invention relates to the field of biometric identification technology, and more particularly to a voiceprint recognition method, apparatus, storage medium, and backend server.
- Voiceprint recognition is an identification technique that identifies a speaker from the biometric characteristics carried in the speaker's voice. Because voiceprint recognition is safe and reliable, it can be used in almost all security-protection and personalization applications where identification is required. For example, the business volume of financial institutions such as banks, securities and insurance companies keeps expanding, producing a large number of identification needs. Compared with traditional identification technology, voiceprint recognition has the advantage that voiceprint extraction is simple and low-cost, and each person's voiceprint features differ from everyone else's, are unique, and are difficult to forge or counterfeit. Because voiceprint recognition is safe, reliable, and convenient, it is widely used in applications requiring identification. However, the existing voiceprint recognition process takes a long time; when a large number of voice recognition requests are processed, some requests are easily lost because processing takes too long, which hinders the application of voiceprint recognition technology.
- The technical problem to be solved by the present invention is to provide a voiceprint recognition method, apparatus, storage medium and background server that address the above defects of the prior art by improving the processing efficiency of large numbers of voice recognition requests and shortening the processing time.
- a voiceprint recognition method comprising:
- the client collects the test voice of the user, and sends a voice recognition request to the background server, where the voice recognition request includes the user ID and the test voice;
- the background server receives the voice recognition request, and uses a message queue and an asynchronous mechanism to determine a voice recognition request to be processed;
- the background server acquires a target voiceprint feature corresponding to the user ID of the to-be-processed voice recognition request, and acquires a test voiceprint feature corresponding to the test voice of the to-be-processed voice recognition request;
- the background server determines, according to the target voiceprint feature and the test voiceprint feature, whether they correspond to the same user, and outputs the judgment result to the client;
- the client receives and displays the judgment result.
- the invention also provides a voiceprint recognition device, comprising:
- a client configured to collect a test voice of the user, and send a voice recognition request to the background server, where the voice recognition request includes a user ID and the test voice;
- a background server configured to receive the voice recognition request, and determine a to-be-processed voice recognition request by using a message queue and an asynchronous mechanism
- a background server configured to acquire a target voiceprint feature corresponding to the user ID of the to-be-processed voice recognition request, and acquire a test voiceprint feature corresponding to the test voice of the to-be-processed voice recognition request;
- a background server configured to determine, according to the target voiceprint feature and the test voiceprint feature, whether to correspond to the same user, and output a determination result to the client;
- the client is configured to receive and display the judgment result.
- the present invention also provides a background server comprising a memory, a processor, and a computer program stored in the memory and operable on the processor, wherein the processor, when executing the computer program, implements the steps performed by the background server in the above voiceprint recognition method.
- the present invention also provides a computer readable storage medium storing a computer program that, when executed by a processor, implements the steps performed by a background server in the voiceprint recognition method described above.
- the background server acquires the corresponding target voiceprint feature based on the user ID in the to-be-processed voice recognition request, acquires the test voiceprint feature based on the test voice in that request, and compares the target voiceprint feature with the test voiceprint feature to determine whether the speakers of the two features are the same user, which achieves fast voice recognition and improves recognition efficiency.
- the background server uses the message queue and the asynchronous mechanism to determine the pending speech recognition request, so as to improve the processing efficiency of a large number of speech recognition requests, and avoid partial speech recognition requests being lost due to excessive processing time.
- FIG. 1 is a flow chart of a voiceprint recognition method in Embodiment 1 of the present invention.
- FIG. 2 is a schematic block diagram of a voiceprint recognition apparatus in Embodiment 2 of the present invention.
- FIG. 3 is a schematic diagram of a background server according to an embodiment of the present invention.
- Fig. 1 shows a flow chart of the voiceprint recognition method in this embodiment.
- the voiceprint recognition method can be applied on the client and the background server to identify the test voice collected by the client.
- the voiceprint recognition method includes the following steps:
- the client collects the test voice of the user, and sends a voice recognition request to the background server, where the voice recognition request includes the user ID and the test voice.
- the client includes a terminal connected to the background server, such as a smart phone, a notebook, a desktop computer, etc., and the client has a microphone for collecting test voice or an external microphone interface.
- the user ID is used to uniquely identify the user identity.
- the test voice is associated with the user ID, and is used to determine the user corresponding to the test voice.
- the client samples and records the user's voice, obtains the test voice in wav audio format, forms a voice recognition request from the test voice and the user ID, and sends the request to the background server.
- the test voice may be collected using a multi-threading method; when the client is a web page, the test voice is collected using the Ajax asynchronous refresh method, so that communication with the background server does not interrupt the user's operation.
- Ajax (Asynchronous JavaScript and XML) is a web application development method that uses client-side scripts to exchange data with a web server.
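- As a concrete illustration of this client-to-server exchange, the minimal sketch below posts a recorded wav file together with the user ID. The endpoint URL, field names, and JSON response shape are hypothetical assumptions, since the patent does not fix a transport format.

```python
import requests  # third-party HTTP library


def send_recognition_request(user_id: str, wav_path: str,
                             url: str = "http://backend.example/voice/recognize"):
    """Send a voice recognition request carrying the user ID and the test voice."""
    with open(wav_path, "rb") as f:
        resp = requests.post(
            url,
            data={"user_id": user_id},                           # user ID field (assumed name)
            files={"test_voice": ("test.wav", f, "audio/wav")},  # wav-format test voice
        )
    resp.raise_for_status()
    return resp.json()  # e.g. {"same_user": true}; response shape is an assumption
```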
- the background server receives the voice recognition request, and uses the message queue and the asynchronous mechanism to determine the voice recognition request to be processed.
- the background server receives at least one voice recognition request sent by the client and places it in the message queue to wait.
- the background server uses an asynchronous mechanism to schedule at least one voice recognition request in the message queue, so that when the background server processes each message in the message queue, the sender and the receiver are independent of each other without waiting for the other party to respond.
- in this way, the background server can receive a large number of voice recognition requests at the same time while avoiding the situation where one pending request takes so long to process that a large number of other requests are lost.
- the message queue and asynchronous mechanism can also be used to build a distributed system on the background server, which improves peak processing capability and flexibility for voice recognition requests, reduces coupling between processes, and ensures that every voice recognition request gets processed.
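- A minimal sketch of this queue-plus-workers pattern using only Python's standard library; the handler name and worker count are illustrative, not from the patent. The HTTP-facing code only enqueues and returns, so senders never wait on recognition.

```python
import queue
import threading

requests_q: "queue.Queue[dict]" = queue.Queue()  # message queue decoupling senders from workers


def handle_recognition(req: dict) -> None:
    """Placeholder for the actual voiceprint comparison (hypothetical)."""
    print("processing request for user", req["user_id"])


def worker() -> None:
    while True:
        req = requests_q.get()          # blocks until a request is pending
        try:
            handle_recognition(req)     # compare target vs. test voiceprint, reply to client
        finally:
            requests_q.task_done()      # mark the message as processed


# a small pool of daemon workers drains the queue asynchronously
for _ in range(4):
    threading.Thread(target=worker, daemon=True).start()


def on_voice_request(user_id: str, wav_bytes: bytes) -> None:
    """Called by the HTTP layer: enqueue and return immediately."""
    requests_q.put({"user_id": user_id, "wav": wav_bytes})
```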
- the background server acquires a target voiceprint feature corresponding to the user ID of the voice recognition request to be processed, and acquires a test voiceprint feature corresponding to the test voice of the voice recognition request to be processed.
- the target voiceprint feature is a voiceprint feature of the user corresponding to the user ID stored in advance in the background server.
- the test voiceprint feature is the voiceprint feature corresponding to the test voice in the voice request.
- a voiceprint is the sound-wave spectrum, carrying speech information, that is displayed by electro-acoustic instruments.
- voiceprint features include, but are not limited to, acoustic features related to the anatomical structure of the human vocal mechanism (such as spectrum, cepstrum, formants, pitch and reflection coefficients), nasal sounds, deep-breath sounds, hoarseness, laughter, and so on.
- the target voiceprint feature and the test voiceprint feature are preferably I-vector (identification vector) features.
- any I-vector feature can be obtained with the I-vector algorithm, a latent-variable estimation method in which a fixed-length low-dimensional vector represents a segment of speech.
- the within-class and between-class variances are not modeled separately; both are placed in a single subspace, the total variability space. This allows unsupervised training, and removing the language-independent information in the total variability space reduces dimensionality and denoises while retaining the acoustic information associated with the language.
- step S30 specifically includes the following steps:
- S31 Query the voiceprint feature library according to the user ID of the voice recognition request to be processed, to acquire a target voiceprint feature corresponding to the user ID of the voice recognition request to be processed.
- At least one set of user IDs and target voiceprint features associated with the user IDs are pre-stored in the voiceprint feature library to facilitate searching for corresponding target voiceprint features based on the user IDs in the pending voice recognition request.
- S32 Process the test voice of the to-be-processed voice recognition request with the Gaussian Mixture Model-Universal Background Model (GMM-UBM) to obtain the test voiceprint feature.
- in the GMM-UBM, pronunciation conditions that the speaker's own voice does not cover are approximated by a speaker-independent feature distribution, which gives the model a high recognition rate.
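- To make the GMM-UBM idea concrete, the sketch below trains a diagonal-covariance UBM on MFCC frames pooled over many speakers and then performs a means-only MAP adaptation toward one speaker's frames. This is a generic Reynolds-style adaptation written against scikit-learn, not the patent's own implementation; the component count and relevance factor are illustrative.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def train_ubm(pooled_mfcc: np.ndarray, n_components: int = 64, seed: int = 0) -> GaussianMixture:
    """Train the universal background model on MFCC frames pooled over many speakers."""
    ubm = GaussianMixture(n_components=n_components, covariance_type="diag",
                          max_iter=200, random_state=seed)
    ubm.fit(pooled_mfcc)
    return ubm


def map_adapt_means(ubm: GaussianMixture, speaker_mfcc: np.ndarray,
                    relevance: float = 16.0) -> np.ndarray:
    """Means-only MAP adaptation of the UBM toward one speaker's frames."""
    post = ubm.predict_proba(speaker_mfcc)           # (frames, components) responsibilities
    n_c = post.sum(axis=0)                           # soft frame counts per component
    f_c = post.T @ speaker_mfcc                      # first-order statistics, (components, dims)
    ex = f_c / np.maximum(n_c, 1e-10)[:, None]       # posterior mean per component
    alpha = (n_c / (n_c + relevance))[:, None]       # data-dependent adaptation weight
    return alpha * ex + (1.0 - alpha) * ubm.means_   # adapted means (speaker model)
```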
- the background server puts the received voice recognition request into the message queue and waits.
- the pending voice recognition request is taken from the message queue and handed to the backend Servlet container for processing; the Servlet container creates an HttpRequest object into which the incoming information is encapsulated, and also creates an HttpResponse object.
- the HttpRequest and HttpResponse objects are passed as parameters to the HttpServlet object and its service method is called; the Gaussian mixture model-universal background model is then used to process the test voice and obtain the test voiceprint feature.
- the background server determines, from the target voiceprint feature and the test voiceprint feature, whether they correspond to the same user, and outputs the judgment result to the client.
- the target voiceprint feature is a voiceprint feature previously associated with the user ID stored in the voiceprint feature library
- the test voiceprint feature is the voiceprint feature corresponding to the test voice, associated with the user ID, collected by the client; if the similarity between the two features reaches a preset threshold, the two can be considered the same user, and the judgment result (same user or not) is output to the client.
- step S40 specifically includes the following steps:
- S41 Use the PLDA algorithm to reduce the dimensions of the target voiceprint feature and the test voiceprint feature, obtaining a target dimension-reduction value and a test dimension-reduction value.
- the PLDA (Probabilistic Linear Discriminant Analysis) algorithm is a channel compensation algorithm.
- PLDA operates on I-vector features: because an I-vector contains both speaker-difference information and channel-difference information, and only the speaker information is of interest, channel compensation is required.
- the channel compensation capability of the PLDA algorithm is better than the LDA algorithm.
- the PLDA algorithm proceeds iteratively; its update formula is rendered as an image in the original publication and is not reproduced here. In that formula, μ is the mean voiceprint vector, W is the between-class distance, w is the voiceprint feature, and i is the number of iterations.
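- PLDA itself is not part of scikit-learn, so the sketch below substitutes plain LDA (which, per the text above, PLDA improves on for channel compensation) to illustrate the dimension-reduction role of step S41. The training vectors, speaker labels, and target dimension are placeholders.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis


def reduce_dims(train_vecs, train_speakers, w_train, w_test, dim=None):
    """Project the target and test voiceprint vectors into a speaker-discriminant
    subspace; a stand-in for the PLDA dimension reduction of step S41."""
    lda = LinearDiscriminantAnalysis(n_components=dim)
    lda.fit(np.asarray(train_vecs), np.asarray(train_speakers))
    target_red = lda.transform(np.asarray([w_train]))[0]  # target dimension-reduction value
    test_red = lda.transform(np.asarray([w_test]))[0]     # test dimension-reduction value
    return target_red, test_red
```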
- S42 Perform a cosine measure on the target dimension reduction value and the test dimension reduction value by using a cosine measure function to obtain a cosine measure value.
- the cosine measure function is score(w_train, w_test) = (w_train^t · w_test) / (‖w_train‖ ‖w_test‖), where w_train is the target voiceprint feature, w_test is the test voiceprint feature, and the superscript t denotes the vector transpose.
- the cosine measure function can be used to easily measure the distance between the target voiceprint feature and the test voiceprint feature.
- the cosine measure function is simple to compute, and its effect is direct and effective.
- S43 Determine whether the cosine measure value is greater than the similarity threshold; if so, the two are the same user; if not, they are not the same user.
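- Steps S42 and S43 reduce to a few lines of numpy, as in the minimal sketch below; the 0.6 threshold is illustrative, since the patent leaves the similarity threshold as a preset parameter.

```python
import numpy as np


def cosine_score(w_train, w_test) -> float:
    """Cosine measure between the target and test dimension-reduction values."""
    w_train = np.asarray(w_train, dtype=float)
    w_test = np.asarray(w_test, dtype=float)
    return float(w_train @ w_test / (np.linalg.norm(w_train) * np.linalg.norm(w_test)))


def same_user(w_train, w_test, threshold: float = 0.6) -> bool:
    """S43: greater than the similarity threshold means the same user."""
    return cosine_score(w_train, w_test) > threshold
```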
- S50 The client receives and displays the judgment result.
- the judgment result indicates whether the speaker of the test voiceprint feature corresponding to the test voice is the same user as the speaker of the target voiceprint feature saved in the voiceprint feature library.
- the background server acquires a corresponding target voiceprint feature based on the user ID in the voice recognition request to be processed, and acquires a test voiceprint feature based on the test voice in the voice recognition request to be processed, and The target voiceprint feature is compared with the test voiceprint feature to determine whether the target voiceprint feature and the speaker of the test voiceprint feature are the same user, which can achieve fast speech recognition effect and improve speech recognition efficiency.
- the background server uses the message queue and the asynchronous mechanism to determine the pending speech recognition request, so as to improve the processing efficiency of a large number of speech recognition requests, and avoid partial speech recognition requests being lost due to excessive processing time.
- the voiceprint recognition method further includes the following steps:
- S51 Perform MFCC feature extraction on the training speech to obtain MFCC acoustic features.
- MFCC: Mel Frequency Cepstral Coefficients.
- the process of MFCC feature extraction on the training speech includes: pre-emphasizing, framing and windowing the training speech; obtaining the spectrum of each short-time analysis window by FFT (Fast Fourier Transform); passing the spectrum through a Mel filter bank to obtain the Mel spectrum; and performing cepstral analysis on the Mel spectrum (taking the logarithm followed by an inverse transform, in practice implemented by a DCT discrete cosine transform, keeping the 2nd through 13th coefficients after the DCT as the MFCC coefficients). This yields the Mel-frequency cepstral coefficients, i.e. the MFCC acoustic features.
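- The pipeline above maps directly onto a few librosa calls; the sketch below is one plausible realization, with the sample rate, FFT size, and hop length chosen as typical values rather than taken from the patent.

```python
import librosa
import numpy as np


def extract_mfcc(wav_path: str) -> np.ndarray:
    """Pre-emphasis, framing/windowing, FFT, mel filter bank, log and DCT,
    keeping the 2nd through 13th cepstral coefficients."""
    y, sr = librosa.load(wav_path, sr=16000)        # 16 kHz sample rate is an assumption
    y = np.append(y[0], y[1:] - 0.97 * y[:-1])      # pre-emphasis filter
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13,
                                n_fft=512, hop_length=160)  # framing/window/FFT/mel/DCT
    return mfcc[1:13].T    # drop c0: keep coefficients 2-13, one row per frame
```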
- S52 Perform voice activity detection on the MFCC acoustic features, and estimate Gaussian mixture model parameters.
- the voice activity detection uses a Voice Activity Detection (VAD) algorithm, which exploits the differing characteristics of speech and noise to classify each segment, detecting the speech signal segments and noise signal segments in the continuously sampled digital signal.
- the parameter set of the Gaussian Mixture Model (GMM) is estimated from the MFCC acoustic features of the speech signal segments.
- the voice activity detection algorithm computes speech feature parameters such as short-time energy, short-time zero-crossing rate and short-time autocorrelation, thereby removing silent and non-speech signals and keeping the non-silent speech signal for estimating the Gaussian mixture model parameters.
- the zeroth-order, first-order and second-order statistics of the MFCC acoustic features are used to estimate the Gaussian mixture model parameters.
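- A toy energy-and-ZCR gate in the spirit of the detection described above; real VAD front-ends are more elaborate, and both thresholds here are illustrative.

```python
import numpy as np


def energy_zcr_vad(frames: np.ndarray, energy_ratio: float = 0.1,
                   zcr_max: float = 0.35) -> np.ndarray:
    """Keep frames (rows) whose short-time energy is high and whose short-time
    zero-crossing rate is low, discarding silence and noise-like frames."""
    energy = (frames ** 2).sum(axis=1)                          # short-time energy
    zcr = (np.diff(np.sign(frames), axis=1) != 0).mean(axis=1)  # short-time zero-crossing rate
    keep = (energy > energy_ratio * energy.max()) & (zcr < zcr_max)
    return frames[keep]
```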
- S53 The general background model is trained by using Gaussian mixture model parameters to obtain a Gaussian mixture model-general background model.
- the Gaussian mixture model parameters are analyzed through the universal background model to obtain the Gaussian mixture model-universal background model.
- a factor analysis algorithm is used to analyze the acoustic features represented by the Gaussian mixture model, separating the mean vector of the acoustic features from the voiceprint difference vector to obtain the I-vector feature.
- the factor analysis algorithm can separate the voiceprint difference vectors between different voices, making it easier to extract the voiceprint specificity of each voice.
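- As a rough stand-in for the factor-analysis separation described above (a full total-variability/i-vector trainer is beyond a short sketch), scikit-learn's FactorAnalysis can illustrate extracting a fixed-length low-dimensional latent vector from per-utterance supervectors; the 100-dimensional latent size is an assumption.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis


def extract_latent_vectors(supervectors, dim: int = 100, seed: int = 0) -> np.ndarray:
    """Factor-analyze per-utterance GMM mean supervectors (one per row) into
    fixed-length low-dimensional latent vectors, i-vector style."""
    fa = FactorAnalysis(n_components=dim, random_state=seed)
    return fa.fit_transform(np.asarray(supervectors))  # one latent row per utterance
```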
- S54 Receive a voiceprint registration request, the voiceprint registration request includes a user ID and target training voice.
- the client receives the voiceprint registration request input by the user, and sends the voiceprint registration request to the server, and the server receives the voiceprint registration request.
- S55 The trained Gaussian mixture model-universal background model is used to perform feature extraction on the target training speech to obtain the target voiceprint feature.
- the server uses the trained Gaussian mixture model-general background model to perform feature extraction on the target training speech to obtain the target voiceprint feature.
- specifically, MFCC feature extraction is performed on the target training speech to obtain the corresponding target MFCC acoustic features, voice activity detection is applied to those features, and the active-speech MFCC acoustic features so detected are then fed into the trained Gaussian mixture model-universal background model for feature extraction, yielding the target voiceprint feature.
- S56 Store the user ID and the target voiceprint feature in the voiceprint feature library.
- the user ID in the voiceprint registration request and the target voiceprint feature acquired from the target training voice are stored in the voiceprint feature library, so that when user identification is required, the corresponding target voiceprint feature can be retrieved by user ID.
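- A minimal voiceprint feature library keyed by user ID; SQLite and the two-column schema are assumptions for illustration, since the patent only requires that the target voiceprint feature be retrievable by user ID.

```python
import io
import sqlite3

import numpy as np


def store_voiceprint(db: sqlite3.Connection, user_id: str, feature) -> None:
    """Persist (user ID, target voiceprint feature) in the feature library."""
    db.execute("CREATE TABLE IF NOT EXISTS voiceprints (user_id TEXT PRIMARY KEY, feat BLOB)")
    buf = io.BytesIO()
    np.save(buf, np.asarray(feature))  # serialize the feature vector
    db.execute("INSERT OR REPLACE INTO voiceprints VALUES (?, ?)", (user_id, buf.getvalue()))
    db.commit()


def load_voiceprint(db: sqlite3.Connection, user_id: str):
    """Look up the target voiceprint feature for a user ID, or None if unregistered."""
    row = db.execute("SELECT feat FROM voiceprints WHERE user_id = ?", (user_id,)).fetchone()
    return None if row is None else np.load(io.BytesIO(row[0]))
```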
- the MFCC feature extraction and the voice activity detection are performed on the training speech, the Gaussian mixture model parameters are estimated, and the general background model is trained by using the Gaussian mixture model parameters to obtain the trained Gaussian mixture model-general background model.
- the Gaussian mixture model-general background model has the advantage of high recognition rate.
- upon receiving the voiceprint registration request, the target training voice in the request is passed through the trained Gaussian mixture model-universal background model to obtain the target voiceprint feature, and the target voiceprint feature and the user ID are stored in the voiceprint feature library; during voice recognition, the corresponding target voiceprint feature can then be obtained from the user ID in the pending voice recognition request and compared with the test voiceprint feature to determine whether the speakers of the two features are the same user, achieving the voice recognition effect.
- Fig. 2 shows a schematic block diagram of the voiceprint recognition apparatus in this embodiment.
- the voiceprint recognition device includes a client and a background server, and can perform identification on the test voice collected by the client.
- the voiceprint recognition apparatus includes a client 10 and a background server 20.
- the client 10 is configured to collect test voices of the user, and send a voice recognition request to the background server, where the voice recognition request includes a user ID and a test voice.
- the client 10 includes a terminal connected to the background server, such as a smart phone, a notebook, a desktop computer, etc., and the client has a microphone for collecting test voice or an external microphone interface.
- the user ID is used to uniquely identify the user identity.
- the test voice is associated with the user ID, and is used to determine the user corresponding to the test voice.
- the client samples and records the user's voice, obtains the test voice in wav audio format, forms a voice recognition request from the test voice and the user ID, and sends the request to the background server.
- the test voice may be collected using a multi-threading method; when the client is a web page, the test voice is collected using the Ajax asynchronous refresh method, so that communication with the background server does not interrupt the user's operation.
- Ajax (Asynchronous JavaScript and XML) is a web application development method that uses client-side scripts to exchange data with a web server.
- the background server 20 is configured to receive the voice recognition request and determine the to-be-processed voice recognition request using a message queue and an asynchronous mechanism.
- the background server 20 receives the voice recognition request sent by at least one client, and puts at least one voice recognition request into the message queue to wait.
- the background server uses an asynchronous mechanism to schedule at least one voice recognition request in the message queue, so that when the background server processes each message in the message queue, the sender and the receiver are independent of each other without waiting for the other party to respond.
- in this way, the background server can receive a large number of voice recognition requests at the same time while avoiding the situation where one pending request takes so long to process that a large number of other requests are lost.
- the message queue and asynchronous mechanism can also be used to build a distributed system on the background server, which improves peak processing capability and flexibility for voice recognition requests, reduces coupling between processes, and ensures that every voice recognition request gets processed.
- the background server 20 is configured to acquire a target voiceprint feature corresponding to the user ID of the voice recognition request to be processed, and acquire a test voiceprint feature corresponding to the test voice of the voice recognition request to be processed.
- the target voiceprint feature is a voiceprint feature of the user corresponding to the user ID stored in advance in the background server.
- the test voiceprint feature is the voiceprint feature corresponding to the test voice in the voice request.
- a voiceprint is the sound-wave spectrum, carrying speech information, that is displayed by electro-acoustic instruments.
- voiceprint features include, but are not limited to, acoustic features related to the anatomical structure of the human vocal mechanism (such as spectrum, cepstrum, formants, pitch and reflection coefficients), nasal sounds, deep-breath sounds, hoarseness, laughter, and so on.
- the target voiceprint feature and the test voiceprint feature are preferably I-vector (identification vector) features.
- any I-vector feature can be obtained with the I-vector algorithm, a latent-variable estimation method in which a fixed-length low-dimensional vector represents a segment of speech for I-vector feature extraction.
- the within-class and between-class variances are not modeled separately; both are considered in a single subspace, the total variability space, so the model can be trained in an unsupervised way, and removing the language-independent information in the total variability space reduces dimensionality and denoises while retaining the acoustic information associated with the language.
- the background server 20 includes a feature query unit 211 and a feature processing unit 212.
- the feature query unit 211 is configured to query the voiceprint feature library according to the user ID of the voice recognition request to be processed to acquire the target voiceprint feature corresponding to the user ID of the voice recognition request to be processed.
- At least one set of user IDs and target voiceprint features associated with the user IDs are pre-stored in the voiceprint feature library to facilitate searching for corresponding target voiceprint features based on the user IDs in the pending voice recognition request.
- the feature processing unit 212 is configured to process the test voice of the to-be-processed voice recognition request with the Gaussian Mixture Model-Universal Background Model (GMM-UBM) to obtain the test voiceprint feature corresponding to that test voice.
- the background server 20 puts the received voice recognition request into the message queue to wait.
- the pending voice recognition request is taken from the message queue and handed to the backend Servlet container for processing; the Servlet container creates an HttpRequest object into which the incoming information is encapsulated, and also creates an HttpResponse object.
- the HttpRequest and HttpResponse objects are passed as parameters to the HttpServlet object and its service method is called; the Gaussian mixture model-universal background model is then used to process the test voice and obtain the test voiceprint feature.
- the background server 20 determines, from the target voiceprint feature and the test voiceprint feature, whether they correspond to the same user, and outputs the judgment result to the client.
- the target voiceprint feature is a voiceprint feature previously associated with the user ID stored in the voiceprint feature library
- the test voiceprint feature is the voiceprint feature corresponding to the test voice, associated with the user ID, collected by the client; if the similarity between the two features reaches a preset threshold, the two can be considered the same user, and the judgment result (same user or not) is output to the client.
- the background server 20 specifically includes a feature dimension reduction unit 221, a cosine measure processing unit 222, and a user recognition determination unit 223.
- the feature dimension reduction unit 221 is configured to perform dimension reduction on the target voiceprint feature and the test voiceprint feature by using the PLDA algorithm, and obtain the target dimension reduction value and the test dimension reduction value.
- the PLDA (Probabilistic Linear Discriminant Analysis) algorithm is a channel compensation algorithm.
- PLDA operates on I-vector features: because an I-vector contains both speaker-difference information and channel-difference information, and only the speaker information is of interest, channel compensation is needed.
- the channel compensation capability of the PLDA algorithm is better than the LDA algorithm.
- the PLDA algorithm proceeds iteratively; its update formula is rendered as an image in the original publication and is not reproduced here. In that formula, μ is the mean voiceprint vector, W is the between-class distance, w is the voiceprint feature, and i is the number of iterations.
- the cosine measure processing unit 222 is configured to perform a cosine measure on the target dimension reduction value and the test dimension reduction value by using a cosine measure function to obtain a cosine measure value.
- the cosine measure function is score(w_train, w_test) = (w_train^t · w_test) / (‖w_train‖ ‖w_test‖), where w_train is the target voiceprint feature, w_test is the test voiceprint feature, and the superscript t denotes the vector transpose.
- the cosine measure function can be used to easily measure the distance between the target voiceprint feature and the test voiceprint feature.
- the cosine measure function is simple to compute, and its effect is direct and effective.
- the user identification determining unit 223 is configured to determine whether the cosine measure value is greater than a similar threshold; If yes, it is the same user; if not, it is not the same user.
- the client 10 is configured to receive and display the judgment result.
- the judgment result indicates whether the speaker of the test voiceprint feature corresponding to the test voice is the same user as the speaker of the target voiceprint feature saved in the voiceprint feature library.
- the background server acquires a corresponding target voiceprint feature based on the user ID in the voice recognition request to be processed, and acquires a test voiceprint feature based on the test voice in the voice recognition request to be processed, and The target voiceprint feature is compared with the test voiceprint feature to determine whether the target voiceprint feature and the speaker of the test voiceprint feature are the same user, which can achieve fast speech recognition effect and improve speech recognition efficiency.
- the background server uses the message queue and the asynchronous mechanism to determine the pending speech recognition request, so as to improve the processing efficiency of a large number of speech recognition requests, and avoid partial speech recognition requests being lost due to excessive processing time.
- the voiceprint recognition apparatus further includes an acoustic feature extraction unit 231, a voice activity detection unit 232, a model training unit 233, a registration voice receiving unit 234, a target voiceprint feature acquisition unit 235, and a target voiceprint feature storage unit 236.
- the acoustic feature extraction unit 231 is configured to perform MFCC feature extraction on the training speech to obtain the MFCC acoustic feature.
- MFCC: Mel Frequency Cepstral Coefficients.
- the process of MFCC feature extraction on the training speech includes: pre-emphasizing, framing and windowing the training speech; obtaining the spectrum of each short-time analysis window by FFT (Fast Fourier Transform); passing the spectrum through a Mel filter bank to obtain the Mel spectrum; and performing cepstral analysis on the Mel spectrum (taking the logarithm followed by an inverse transform, in practice implemented by a DCT discrete cosine transform, keeping the 2nd through 13th coefficients after the DCT as the MFCC coefficients). This yields the Mel-frequency cepstral coefficients, i.e. the MFCC acoustic features.
- the voice activity detecting unit 232 is configured to perform voice activity detection on the MFCC acoustic feature and estimate a Gaussian mixture model parameter.
- the voice activity detection uses a Voice Activity Detection (VAD) algorithm, which exploits the differing characteristics of speech and noise to classify each segment, detecting the speech signal segments and noise signal segments in the continuously sampled digital signal.
- the parameter set of the Gaussian Mixture Model (GMM) is estimated from the MFCC acoustic features of the speech signal segments.
- the voice activity detection algorithm computes speech feature parameters such as short-time energy, short-time zero-crossing rate and short-time autocorrelation, thereby removing silent and non-speech signals and keeping the non-silent speech signal for estimating the Gaussian mixture model parameters.
- the zeroth-order, first-order and second-order statistics of the MFCC acoustic features of the non-silent speech signal are used to estimate the Gaussian mixture model parameters.
- the model training unit 233 is configured to train the general background model by using Gaussian mixture model parameters to obtain a Gaussian mixture model-general background model.
- the Gaussian mixture model parameters are factor-analyzed through the universal background model to obtain the Gaussian mixture model-universal background model.
- a factor analysis algorithm is used to analyze the acoustic features represented by the Gaussian mixture model, separating the mean vector of the acoustic features from the voiceprint difference vector to obtain the I-vector feature.
- the factor analysis algorithm can separate the voiceprint difference vectors between different voices, making it easier to extract the voiceprint specificity of each voice.
- the registration voice receiving unit 234 is configured to receive a voiceprint registration request, where the voiceprint registration request includes a user ID and a target training voice.
- the client receives the voiceprint registration request input by the user, and sends the voiceprint registration request to the server, and the server receives the voiceprint registration request.
- the target voiceprint feature acquiring unit 235 is configured to perform feature extraction on the target training voice by using a Gaussian mixture model-general background model to obtain a target voiceprint feature.
- the server uses the trained Gaussian mixture model-general background model to perform feature extraction on the target training speech to obtain the target voiceprint feature.
- specifically, MFCC feature extraction is performed on the target training speech to obtain the corresponding target MFCC acoustic features, voice activity detection is applied to those features, and the active-speech MFCC acoustic features so detected are then fed into the trained Gaussian mixture model-universal background model for feature extraction, yielding the target voiceprint feature.
- the target voiceprint feature storage unit 236 is configured to store the user ID and the target voiceprint feature in the voiceprint feature library.
- the user ID in the voiceprint registration request and the target voiceprint feature acquired from the target training voice are stored in the voiceprint feature library, so that when user identification is required, the corresponding target voiceprint feature can be retrieved by user ID.
- in this embodiment, MFCC feature extraction and voice activity detection are performed on the training speech, the Gaussian mixture model parameters are estimated, and the universal background model is trained with those parameters to obtain the trained Gaussian mixture model-universal background model, which has the advantage of a high recognition rate.
- upon receiving the voiceprint registration request, the target training voice in the request is passed through the trained Gaussian mixture model-universal background model to obtain the target voiceprint feature, and the target voiceprint feature and the user ID are stored in the voiceprint feature library; during voice recognition, the corresponding target voiceprint feature is then obtained from the user ID in the pending voice recognition request and compared with the test voiceprint feature to determine whether the speakers of the two features are the same user, achieving the voice recognition effect.
- FIG. 3 is a schematic diagram of a background server according to an embodiment of the present invention.
- the background server 3 of this embodiment includes a processor 30, a memory 31, and a computer program 32 stored in the memory 31 and operable on the processor 30, for example a program that performs the voiceprint recognition method described above.
- when the processor 30 executes the computer program 32, the steps in the embodiments of the voiceprint recognition method described above are implemented, such as steps S10 to S50 shown in FIG. 1.
- alternatively, when the processor 30 executes the computer program 32, the functions of the modules/units in the foregoing device embodiments are implemented, such as the functions of the units of the background server 20 shown in FIG. 2.
- the computer program 32 can be partitioned into one or more modules/units that are stored in the memory 31 and executed by the processor 30 to carry out this invention.
- the one or more modules/units may be a series of computer program instruction segments capable of performing particular functions, the instruction segments describing the execution of the computer program 32 in the background server 3.
- the background server 3 may be a computing device such as a local server or a cloud server.
- the background server may include, but is not limited to, the processor 30 and the memory 31; those skilled in the art will understand that FIG. 3 is only an example of the background server 3 and does not limit it, and that it may include more or fewer components than illustrated, combine certain components, or use different components.
- the background server may further include an input/output device, a network access device, a bus, and the like.
- the processor 30 may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), Field-Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, etc.
- the general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
- the memory 31 may be an internal storage unit of the background server 3, such as a hard disk or a memory of the background server 3.
- the memory 31 may also be an external storage device of the background server 3, such as a plug-in hard disk, a smart memory card (SMC), a secure digital (SD) card, or a flash card provided on the background server 3.
- the memory 31 may also include both an internal storage unit of the background server 3 and an external storage device.
- the memory 31 is configured to store the computer program and the other programs and data required by the background server.
- the memory 31 can also be used to temporarily store data that has been output or is about to be output.
Claims (20)
- A voiceprint recognition method, comprising: a client collecting a test voice of a user and sending a voice recognition request to a background server, the voice recognition request including a user ID and the test voice; the background server receiving the voice recognition request and determining a pending voice recognition request using a message queue and an asynchronous mechanism; the background server acquiring a target voiceprint feature corresponding to the user ID of the pending voice recognition request, and acquiring a test voiceprint feature corresponding to the test voice of the pending voice recognition request; the background server judging, according to the target voiceprint feature and the test voiceprint feature, whether they correspond to the same user, and outputting the judgment result to the client; and the client receiving and displaying the judgment result.
- The voiceprint recognition method according to claim 1, wherein the background server acquiring the target voiceprint feature corresponding to the user ID of the pending voice recognition request and acquiring the test voiceprint feature corresponding to the test voice of the pending voice recognition request comprises: querying a voiceprint feature library according to the user ID of the pending voice recognition request to acquire the target voiceprint feature corresponding to that user ID; and processing the test voiceprint feature of the pending voice recognition request with a Gaussian mixture model-universal background model to acquire the test voiceprint feature corresponding to the test voice of the pending voice recognition request.
- The voiceprint recognition method according to claim 2, further comprising: performing MFCC feature extraction on training speech to acquire MFCC acoustic features; performing voice activity detection on the MFCC acoustic features and estimating Gaussian mixture model parameters; training a universal background model with the Gaussian mixture model parameters to acquire the Gaussian mixture model-universal background model; receiving a voiceprint registration request, the voiceprint registration request including a user ID and a target training voice; training the target training voice with the Gaussian mixture model-universal background model to acquire a target voiceprint feature; and storing the user ID and the target voiceprint feature in the voiceprint feature library.
- The voiceprint recognition method according to claim 1, wherein judging, according to the target voiceprint feature and the test voiceprint feature, whether they correspond to the same user comprises: reducing the dimensions of the target voiceprint feature and the test voiceprint feature respectively with the PLDA algorithm to acquire a target dimension-reduction value and a test dimension-reduction value; performing a cosine measure on the target dimension-reduction value and the test dimension-reduction value with a cosine measure function to acquire a cosine measure value; and judging whether the cosine measure value is greater than a similarity threshold; if so, they are the same user; if not, they are not the same user.
- A voiceprint recognition device, comprising: a client configured to collect a test voice of a user and send a voice recognition request to a background server, the voice recognition request including a user ID and the test voice; a background server configured to receive the voice recognition request and determine a pending voice recognition request using a message queue and an asynchronous mechanism; a background server configured to acquire a target voiceprint feature corresponding to the user ID of the pending voice recognition request and acquire a test voiceprint feature corresponding to the test voice of the pending voice recognition request; a background server configured to judge, according to the target voiceprint feature and the test voiceprint feature, whether they correspond to the same user and output the judgment result to the client; and a client configured to receive and display the judgment result.
- The voiceprint recognition device according to claim 6, wherein the background server comprises: a feature query unit configured to query a voiceprint feature library according to the user ID of the pending voice recognition request to acquire the target voiceprint feature corresponding to that user ID; and a feature processing unit configured to process the test voiceprint feature of the pending voice recognition request with a Gaussian mixture model-universal background model to acquire the test voiceprint feature corresponding to the test voice of the pending voice recognition request.
- The voiceprint recognition device according to claim 7, wherein the background server further comprises: an acoustic feature extraction unit configured to perform MFCC feature extraction on training speech to acquire MFCC acoustic features; a voice activity detection unit configured to perform voice activity detection on the MFCC acoustic features and estimate Gaussian mixture model parameters; a model training unit configured to train a universal background model with the Gaussian mixture model parameters to acquire the Gaussian mixture model-universal background model; a registration voice receiving unit configured to receive a voiceprint registration request, the voiceprint registration request including a user ID and a target training voice; a target voiceprint feature acquisition unit configured to train the target training voice with the Gaussian mixture model-universal background model to acquire a target voiceprint feature; and a target voiceprint feature storage unit configured to store the user ID and the target voiceprint feature in the voiceprint feature library.
- The voiceprint recognition device according to claim 6, wherein the background server comprises: a feature dimension-reduction unit configured to reduce the dimensions of the target voiceprint feature and the test voiceprint feature respectively with the PLDA algorithm to acquire a target dimension-reduction value and a test dimension-reduction value; a cosine measure processing unit configured to perform a cosine measure on the target dimension-reduction value and the test dimension-reduction value with a cosine measure function to acquire a cosine measure value; and a user identification judgment unit configured to judge whether the cosine measure value is greater than a similarity threshold; if so, they are the same user; if not, they are not the same user.
- A background server comprising a memory, a processor, and a computer program stored in the memory and operable on the processor, wherein the processor, when executing the computer program, implements the following steps: the client collects a test voice of a user and sends a voice recognition request to the background server, the voice recognition request including a user ID and the test voice; the background server receives the voice recognition request and determines a pending voice recognition request using a message queue and an asynchronous mechanism; the background server acquires a target voiceprint feature corresponding to the user ID of the pending voice recognition request and acquires a test voiceprint feature corresponding to the test voice of the pending voice recognition request; the background server judges, according to the target voiceprint feature and the test voiceprint feature, whether they correspond to the same user and outputs the judgment result to the client; and the client receives and displays the judgment result.
- The background server according to claim 11, wherein acquiring the target voiceprint feature corresponding to the user ID of the pending voice recognition request and acquiring the test voiceprint feature corresponding to the test voice of the pending voice recognition request comprises: querying a voiceprint feature library according to the user ID of the pending voice recognition request to acquire the target voiceprint feature corresponding to that user ID; and processing the test voiceprint feature of the pending voice recognition request with a Gaussian mixture model-universal background model to acquire the test voiceprint feature corresponding to the test voice of the pending voice recognition request.
- The background server according to claim 12, further comprising: performing MFCC feature extraction on training speech to acquire MFCC acoustic features; performing voice activity detection on the MFCC acoustic features and estimating Gaussian mixture model parameters; training a universal background model with the Gaussian mixture model parameters to acquire the Gaussian mixture model-universal background model; receiving a voiceprint registration request, the voiceprint registration request including a user ID and a target training voice; training the target training voice with the Gaussian mixture model-universal background model to acquire a target voiceprint feature; and storing the user ID and the target voiceprint feature in the voiceprint feature library.
- The background server according to claim 11, wherein judging, according to the target voiceprint feature and the test voiceprint feature, whether they correspond to the same user comprises: reducing the dimensions of the target voiceprint feature and the test voiceprint feature respectively with the PLDA algorithm to acquire a target dimension-reduction value and a test dimension-reduction value; performing a cosine measure on the target dimension-reduction value and the test dimension-reduction value with a cosine measure function to acquire a cosine measure value; and judging whether the cosine measure value is greater than a similarity threshold; if so, they are the same user; if not, they are not the same user.
- A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the following steps: the client collects a test voice of a user and sends a voice recognition request to the background server, the voice recognition request including a user ID and the test voice; the background server receives the voice recognition request and determines a pending voice recognition request using a message queue and an asynchronous mechanism; the background server acquires a target voiceprint feature corresponding to the user ID of the pending voice recognition request and acquires a test voiceprint feature corresponding to the test voice of the pending voice recognition request; the background server judges, according to the target voiceprint feature and the test voiceprint feature, whether they correspond to the same user and outputs the judgment result to the client; and the client receives and displays the judgment result.
- The computer-readable storage medium according to claim 16, wherein the background server acquiring the target voiceprint feature corresponding to the user ID of the pending voice recognition request and acquiring the test voiceprint feature corresponding to the test voice of the pending voice recognition request comprises: querying a voiceprint feature library according to the user ID of the pending voice recognition request to acquire the target voiceprint feature corresponding to that user ID; and processing the test voiceprint feature of the pending voice recognition request with a Gaussian mixture model-universal background model to acquire the test voiceprint feature corresponding to the test voice of the pending voice recognition request.
- The computer-readable storage medium according to claim 17, further comprising: performing MFCC feature extraction on training speech to acquire MFCC acoustic features; performing voice activity detection on the MFCC acoustic features and estimating Gaussian mixture model parameters; training a universal background model with the Gaussian mixture model parameters to acquire the Gaussian mixture model-universal background model; receiving a voiceprint registration request, the voiceprint registration request including a user ID and a target training voice; training the target training voice with the Gaussian mixture model-universal background model to acquire a target voiceprint feature; and storing the user ID and the target voiceprint feature in the voiceprint feature library.
- The computer-readable storage medium according to claim 16, wherein judging, according to the target voiceprint feature and the test voiceprint feature, whether they correspond to the same user comprises: reducing the dimensions of the target voiceprint feature and the test voiceprint feature respectively with the PLDA algorithm to acquire a target dimension-reduction value and a test dimension-reduction value; performing a cosine measure on the target dimension-reduction value and the test dimension-reduction value with a cosine measure function to acquire a cosine measure value; and judging whether the cosine measure value is greater than a similarity threshold; if so, they are the same user; if not, they are not the same user.
Priority Applications (6)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
SG11201803895RA SG11201803895RA (en) | 2017-02-16 | 2017-06-26 | Voiceprint recognition method, device, storage medium and background server |
JP2018514332A JP6649474B2 (ja) | 2017-02-16 | 2017-06-26 | 声紋識別方法、装置及びバックグラウンドサーバ |
AU2017341161A AU2017341161A1 (en) | 2017-02-16 | 2017-06-26 | Voiceprint recognition method, device, storage medium and background server |
EP17857669.0A EP3584790A4 (en) | 2017-02-16 | 2017-06-26 | VOICEPRINT RECOGNITION METHOD, DEVICE, STORAGE MEDIUM AND BACKGROUND SERVER |
US15/772,801 US10629209B2 (en) | 2017-02-16 | 2017-06-26 | Voiceprint recognition method, device, storage medium and background server |
KR1020187015547A KR20180104595A (ko) | 2017-02-16 | 2017-06-26 | 성문 식별 방법, 장치, 저장 매체 및 백스테이지 서버 |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710083629.0 | 2017-02-16 | ||
CN201710083629.0A CN106847292B (zh) | 2017-02-16 | 2017-02-16 | 声纹识别方法及装置 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2018149077A1 true WO2018149077A1 (zh) | 2018-08-23 |
Family
ID=59128377
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2017/090046 WO2018149077A1 (zh) | 2017-02-16 | 2017-06-26 | 声纹识别方法、装置、存储介质和后台服务器 |
Country Status (8)
Country | Link |
---|---|
US (1) | US10629209B2 (zh) |
EP (1) | EP3584790A4 (zh) |
JP (1) | JP6649474B2 (zh) |
KR (1) | KR20180104595A (zh) |
CN (1) | CN106847292B (zh) |
AU (2) | AU2017341161A1 (zh) |
SG (1) | SG11201803895RA (zh) |
WO (1) | WO2018149077A1 (zh) |
Families Citing this family (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106847292B (zh) * | 2017-02-16 | 2018-06-19 | 平安科技(深圳)有限公司 | 声纹识别方法及装置 |
US10170112B2 (en) * | 2017-05-11 | 2019-01-01 | Google Llc | Detecting and suppressing voice queries |
CN107492379B (zh) * | 2017-06-30 | 2021-09-21 | 百度在线网络技术(北京)有限公司 | 一种声纹创建与注册方法及装置 |
CN109215643B (zh) * | 2017-07-05 | 2023-10-24 | 阿里巴巴集团控股有限公司 | 一种交互方法、电子设备及服务器 |
CN107527620B (zh) * | 2017-07-25 | 2019-03-26 | 平安科技(深圳)有限公司 | 电子装置、身份验证的方法及计算机可读存储介质 |
CN107623614B (zh) * | 2017-09-19 | 2020-12-08 | 百度在线网络技术(北京)有限公司 | 用于推送信息的方法和装置 |
CN109584884B (zh) * | 2017-09-29 | 2022-09-13 | 腾讯科技(深圳)有限公司 | 一种语音身份特征提取器、分类器训练方法及相关设备 |
CN107978311B (zh) * | 2017-11-24 | 2020-08-25 | 腾讯科技(深圳)有限公司 | 一种语音数据处理方法、装置以及语音交互设备 |
CN108806696B (zh) * | 2018-05-08 | 2020-06-05 | 平安科技(深圳)有限公司 | 建立声纹模型的方法、装置、计算机设备和存储介质 |
US11893999B1 (en) * | 2018-05-13 | 2024-02-06 | Amazon Technologies, Inc. | Speech based user recognition |
CN108777146A (zh) * | 2018-05-31 | 2018-11-09 | 平安科技(深圳)有限公司 | 语音模型训练方法、说话人识别方法、装置、设备及介质 |
CN108899032A (zh) * | 2018-06-06 | 2018-11-27 | 平安科技(深圳)有限公司 | 声纹识别方法、装置、计算机设备及存储介质 |
CN108986792B (zh) * | 2018-09-11 | 2021-02-12 | 苏州思必驰信息科技有限公司 | 用于语音对话平台的语音识别模型的训练调度方法及系统 |
KR20190067135A (ko) | 2019-05-27 | 2019-06-14 | 박경훈 | 묶을 수 있는 끈이 일체형으로 직조 된 망사 자루 연속 자동화 제조방법 및 그로써 직조 된 망사 자루 |
CN110491370A (zh) * | 2019-07-15 | 2019-11-22 | 北京大米科技有限公司 | 一种语音流识别方法、装置、存储介质及服务器 |
CN110364182B (zh) * | 2019-08-01 | 2022-06-14 | 腾讯音乐娱乐科技(深圳)有限公司 | 一种声音信号处理方法及装置 |
CN110610709A (zh) * | 2019-09-26 | 2019-12-24 | 浙江百应科技有限公司 | 基于声纹识别的身份辨别方法 |
CN111081261B (zh) * | 2019-12-25 | 2023-04-21 | 华南理工大学 | 一种基于lda的文本无关声纹识别方法 |
CN111370000A (zh) * | 2020-02-10 | 2020-07-03 | 厦门快商通科技股份有限公司 | 声纹识别算法评估方法、系统、移动终端及存储介质 |
CN111554303B (zh) * | 2020-05-09 | 2023-06-02 | 福建星网视易信息系统有限公司 | 一种歌曲演唱过程中的用户身份识别方法及存储介质 |
CN112000570A (zh) * | 2020-07-29 | 2020-11-27 | 北京达佳互联信息技术有限公司 | 应用测试方法、装置、服务器及存储介质 |
CN111951791B (zh) * | 2020-08-26 | 2024-05-17 | 上海依图网络科技有限公司 | 声纹识别模型训练方法、识别方法、电子设备及存储介质 |
CN112185395B (zh) * | 2020-09-04 | 2021-04-27 | 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) | 一种基于差分隐私的联邦声纹识别方法 |
CN112185362A (zh) * | 2020-09-24 | 2021-01-05 | 苏州思必驰信息科技有限公司 | 针对用户个性化服务的语音处理方法及装置 |
US11522994B2 (en) | 2020-11-23 | 2022-12-06 | Bank Of America Corporation | Voice analysis platform for voiceprint tracking and anomaly detection |
CN112669820B (zh) * | 2020-12-16 | 2023-08-04 | 平安科技(深圳)有限公司 | 基于语音识别的考试作弊识别方法、装置及计算机设备 |
Family Cites Families (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
AU670379B2 (en) | 1993-08-10 | 1996-07-11 | International Standard Electric Corp. | System and method for passive voice verification in a telephone network |
US7047196B2 (en) | 2000-06-08 | 2006-05-16 | Agiletv Corporation | System and method of voice recognition near a wireline node of a network supporting cable television and/or video delivery |
JP2002304379A (ja) * | 2001-04-05 | 2002-10-18 | Sharp Corp | 個人認証方法および個人認証システム |
US6853716B1 (en) * | 2001-04-16 | 2005-02-08 | Cisco Technology, Inc. | System and method for identifying a participant during a conference call |
JP2003114617A (ja) * | 2001-10-03 | 2003-04-18 | Systemfrontier Co Ltd | 音声による認証システム及び音声による認証方法 |
US7240007B2 (en) * | 2001-12-13 | 2007-07-03 | Matsushita Electric Industrial Co., Ltd. | Speaker authentication by fusion of voiceprint match attempt results with additional information |
JP2005115921A (ja) * | 2003-09-17 | 2005-04-28 | Moss Institute Co Ltd | 音声情報管理方法,音声情報管理システム,音声情報管理プログラム及び音声データ管理装置 |
US20060015335A1 (en) * | 2004-07-13 | 2006-01-19 | Ravigopal Vennelakanti | Framework to enable multimodal access to applications |
CN101197131B (zh) * | 2006-12-07 | 2011-03-30 | 积体数位股份有限公司 | 随机式声纹密码验证系统、随机式声纹密码锁及其产生方法 |
JP2009230267A (ja) * | 2008-03-19 | 2009-10-08 | Future Vision:Kk | 会議室設備及び会議室設備を用いた会議記録システム |
JP2009237774A (ja) * | 2008-03-26 | 2009-10-15 | Advanced Media Inc | 認証サーバ、サービス提供サーバ、認証方法、通信端末、およびログイン方法 |
US8442824B2 (en) * | 2008-11-26 | 2013-05-14 | Nuance Communications, Inc. | Device, system, and method of liveness detection utilizing voice biometrics |
JP2010182076A (ja) * | 2009-02-05 | 2010-08-19 | Nec Corp | 認証システム、認証サーバ、証明方法およびプログラム |
CN102760434A (zh) * | 2012-07-09 | 2012-10-31 | 华为终端有限公司 | 一种声纹特征模型更新方法及终端 |
AU2013315343B2 (en) * | 2012-09-11 | 2019-05-30 | Auraya Pty Ltd | Voice authentication system and method |
CN103035245A (zh) * | 2012-12-08 | 2013-04-10 | 大连创达技术交易市场有限公司 | 以太网声纹识别系统 |
JP6276523B2 (ja) | 2013-06-28 | 2018-02-07 | 株式会社フジクラ | 酸化物超電導導体及び酸化物超電導導体の製造方法 |
WO2015011867A1 (ja) * | 2013-07-26 | 2015-01-29 | パナソニック インテレクチュアル プロパティ コーポレーション オブ アメリカ | 情報管理方法 |
JP6360484B2 (ja) * | 2013-09-03 | 2018-07-18 | パナソニック インテレクチュアル プロパティ コーポレーション オブ アメリカPanasonic Intellectual Property Corporation of America | 音声対話制御方法 |
GB2517952B (en) * | 2013-09-05 | 2017-05-31 | Barclays Bank Plc | Biometric verification using predicted signatures |
EP3373176B1 (en) * | 2014-01-17 | 2020-01-01 | Cirrus Logic International Semiconductor Limited | Tamper-resistant element for use in speaker recognition |
CN103915096A (zh) * | 2014-04-15 | 2014-07-09 | 胡上杰 | 警务声纹识别方法 |
KR102346634B1 (ko) | 2015-02-27 | 2022-01-03 | 삼성전자주식회사 | 사용자 인식을 위한 특징 벡터를 변환하는 방법 및 디바이스 |
CN105845140A (zh) * | 2016-03-23 | 2016-08-10 | 广州势必可赢网络科技有限公司 | 应用于短语音条件下的说话人确认方法和装置 |
CN107492382B (zh) * | 2016-06-13 | 2020-12-18 | 阿里巴巴集团控股有限公司 | 基于神经网络的声纹信息提取方法及装置 |
CN106297806A (zh) * | 2016-08-22 | 2017-01-04 | 安徽工程大学机电学院 | 基于声纹的智能传声系统 |
- 2017
- 2017-02-16 CN CN201710083629.0A patent/CN106847292B/zh active Active
- 2017-06-26 SG SG11201803895RA patent/SG11201803895RA/en unknown
- 2017-06-26 EP EP17857669.0A patent/EP3584790A4/en not_active Ceased
- 2017-06-26 WO PCT/CN2017/090046 patent/WO2018149077A1/zh active Application Filing
- 2017-06-26 US US15/772,801 patent/US10629209B2/en active Active
- 2017-06-26 AU AU2017341161A patent/AU2017341161A1/en active Pending
- 2017-06-26 JP JP2018514332A patent/JP6649474B2/ja active Active
- 2017-06-26 KR KR1020187015547A patent/KR20180104595A/ko not_active Application Discontinuation
- 2017-06-26 AU AU2017101877A patent/AU2017101877A4/en active Active
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1936967A (zh) * | 2005-09-20 | 2007-03-28 | 吴田平 | 声纹考勤机 |
CN101923855A (zh) * | 2009-06-17 | 2010-12-22 | 复旦大学 | 文本无关的声纹识别系统 |
CN102402985A (zh) * | 2010-09-14 | 2012-04-04 | 盛乐信息技术(上海)有限公司 | 提高声纹识别安全性的声纹认证系统及其实现方法 |
CN102324232A (zh) * | 2011-09-12 | 2012-01-18 | 辽宁工业大学 | 基于高斯混合模型的声纹识别方法及系统 |
CN102509547A (zh) * | 2011-12-29 | 2012-06-20 | 辽宁工业大学 | 基于矢量量化的声纹识别方法及系统 |
CN103730114A (zh) * | 2013-12-31 | 2014-04-16 | 上海交通大学无锡研究院 | 一种基于联合因子分析模型的移动设备声纹识别方法 |
CN104835498A (zh) * | 2015-05-25 | 2015-08-12 | 重庆大学 | 基于多类型组合特征参数的声纹识别方法 |
CN106847292A (zh) * | 2017-02-16 | 2017-06-13 | 平安科技(深圳)有限公司 | 声纹识别方法及装置 |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111048100A (zh) * | 2019-11-21 | 2020-04-21 | 深圳市东进银通电子有限公司 | 一种大数据并行化声纹辨认系统和方法 |
CN111048100B (zh) * | 2019-11-21 | 2023-09-08 | 深圳市东进银通电子有限公司 | 一种大数据并行化声纹辨认系统和方法 |
CN111312259A (zh) * | 2020-02-17 | 2020-06-19 | 厦门快商通科技股份有限公司 | 声纹识别方法、系统、移动终端及存储介质 |
CN111210829A (zh) * | 2020-02-19 | 2020-05-29 | 腾讯科技(深圳)有限公司 | 语音识别方法、装置、系统、设备和计算机可读存储介质 |
CN112214298A (zh) * | 2020-09-30 | 2021-01-12 | 国网江苏省电力有限公司信息通信分公司 | 基于声纹识别的动态优先级调度方法及系统 |
CN112214298B (zh) * | 2020-09-30 | 2023-09-22 | 国网江苏省电力有限公司信息通信分公司 | 基于声纹识别的动态优先级调度方法及系统 |
CN114780787A (zh) * | 2022-04-01 | 2022-07-22 | 杭州半云科技有限公司 | 声纹检索方法、身份验证方法、身份注册方法和装置 |
Also Published As
Publication number | Publication date |
---|---|
CN106847292B (zh) | 2018-06-19 |
EP3584790A4 (en) | 2021-01-13 |
US20190272829A1 (en) | 2019-09-05 |
CN106847292A (zh) | 2017-06-13 |
EP3584790A1 (en) | 2019-12-25 |
AU2017101877A4 (en) | 2020-04-23 |
AU2017341161A1 (en) | 2018-08-30 |
US10629209B2 (en) | 2020-04-21 |
SG11201803895RA (en) | 2018-09-27 |
KR20180104595A (ko) | 2018-09-21 |
JP2019510248A (ja) | 2019-04-11 |
JP6649474B2 (ja) | 2020-02-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2018149077A1 (zh) | 声纹识别方法、装置、存储介质和后台服务器 | |
WO2021208287A1 (zh) | 用于情绪识别的语音端点检测方法、装置、电子设备及存储介质 | |
CN106486131B (zh) | 一种语音去噪的方法及装置 | |
WO2020181824A1 (zh) | 声纹识别方法、装置、设备以及计算机可读存储介质 | |
WO2018166187A1 (zh) | 服务器、身份验证方法、系统及计算机可读存储介质 | |
WO2018223727A1 (zh) | 识别声纹的方法、装置、设备及介质 | |
WO2019019256A1 (zh) | 电子装置、身份验证的方法、系统及计算机可读存储介质 | |
US20160111112A1 (en) | Speaker change detection device and speaker change detection method | |
CN109346088A (zh) | 身份识别方法、装置、介质及电子设备 | |
WO2017031846A1 (zh) | 噪声消除、语音识别方法、装置、设备及非易失性计算机存储介质 | |
US20120143608A1 (en) | Audio signal source verification system | |
CN108694954A (zh) | 一种性别年龄识别方法、装置、设备及可读存储介质 | |
WO2021218136A1 (zh) | 基于语音的用户性别年龄识别方法、装置、计算机设备及存储介质 | |
WO2021000498A1 (zh) | 复合语音识别方法、装置、设备及计算机可读存储介质 | |
WO2019232826A1 (zh) | i-vector向量提取方法、说话人识别方法、装置、设备及介质 | |
CN110880329A (zh) | 一种音频识别方法及设备、存储介质 | |
CN109036437A (zh) | 口音识别方法、装置、计算机装置及计算机可读存储介质 | |
WO2017045429A1 (zh) | 一种音频数据的检测方法、系统及存储介质 | |
WO2018095167A1 (zh) | 声纹识别方法和声纹识别系统 | |
TW202018696A (zh) | 語音識別方法、裝置及計算設備 | |
CN113223536A (zh) | 声纹识别方法、装置及终端设备 | |
CN111161713A (zh) | 一种语音性别识别方法、装置及计算设备 | |
CN109545226A (zh) | 一种语音识别方法、设备及计算机可读存储介质 | |
Chakroun et al. | Efficient text-independent speaker recognition with short utterances in both clean and uncontrolled environments | |
WO2019218515A1 (zh) | 服务器、基于声纹的身份验证方法及存储介质 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
ENP | Entry into the national phase |
Ref document number: 2018514332 Country of ref document: JP Kind code of ref document: A |
|
WWE | Wipo information: entry into national phase |
Ref document number: 11201803895R Country of ref document: SG |
|
ENP | Entry into the national phase |
Ref document number: 20187015547 Country of ref document: KR Kind code of ref document: A |
|
ENP | Entry into the national phase |
Ref document number: 2017341161 Country of ref document: AU Date of ref document: 20170626 Kind code of ref document: A |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 17857669 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |