WO2018107810A1 - Voiceprint recognition method and apparatus, electronic device and medium - Google Patents

Voiceprint recognition method and apparatus, electronic device and medium

Info

Publication number
WO2018107810A1
Authority
WO
WIPO (PCT)
Prior art keywords
training
speech
neural network
voice
network model
Prior art date
Application number
PCT/CN2017/099707
Other languages
English (en)
Chinese (zh)
Inventor
王健宗
郭卉
肖京
Original Assignee
平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Publication of WO2018107810A1

Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 — Speaker identification or verification techniques
    • G10L17/02 — Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • G10L17/04 — Training, enrolment or model building
    • G10L17/18 — Artificial neural networks; Connectionist approaches
    • G10L17/20 — Pattern transformations or operations aimed at increasing system robustness, e.g. against channel noise or different working conditions
    • G10L25/00 — Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 — Speech or voice analysis techniques characterised by the type of extracted parameters
    • G10L25/24 — Speech or voice analysis techniques characterised by the type of extracted parameters, the extracted parameters being the cepstrum
    • G10L25/27 — Speech or voice analysis techniques characterised by the analysis technique
    • G10L25/30 — Speech or voice analysis techniques characterised by the analysis technique, using neural networks

Definitions

  • The present application belongs to the technical field of identity authentication, and in particular relates to a voiceprint recognition method, apparatus, electronic device and medium.
  • Voiceprint recognition, also known as speaker recognition, is used to determine which of several speakers uttered a given segment of speech, or to confirm whether a segment of speech was spoken by a designated person.
  • It is a technique for automatically identifying the speaker's identity from speech parameters that reflect the speaker's physiological and behavioral characteristics.
  • voiceprint recognition is widely used in the Internet, banking systems, public security and other fields.
  • A voiceprint is the sound-wave spectrum, displayed by electroacoustic instruments, that carries speech information. Each person's acoustic speech characteristics are relatively stable yet variable; they are not absolute or immutable. This variation can arise from physiology, pathology, psychology, imitation, or disguise, and is also affected by environmental disturbances.
  • The mainstream voiceprint recognition methods in the industry generally need to model the speaker's voiceprint first, usually by pre-training a universal background model.
  • In the prior art, a Gaussian mixture model is mainly used to train the universal background model. Because a Gaussian mixture background model based on unsupervised training carries no category information about the sample data, it only represents the characteristics of all speakers in the speaker space and is a single, speaker-independent background model. It is therefore difficult for it to accurately distinguish the distinctive features of speakers, which ultimately leads to low recognition accuracy when a speaker's voiceprint is recognized.
  • The embodiments of the present invention provide a voiceprint recognition method, apparatus, electronic device and medium, so as to solve the prior-art problem that the distinctive features of speakers are difficult to distinguish accurately, resulting in low voiceprint recognition accuracy.
  • a first aspect of the embodiments of the present invention provides a voiceprint recognition method, including:
  • Using the output parameters of the neural network model and the speaker features corresponding to each training speech, N feature extraction matrices of the N training speeches are respectively trained, where each feature extraction matrix corresponds to a speaker model of one training speech;
  • A speaker model matching the second feature matrix is selected, and the speaker corresponding to the selected speaker model is output as the voiceprint recognition result of the speech to be recognized;
  • K and N are integers greater than zero, and K is greater than N.
  • a second aspect of the embodiments of the present invention provides a voiceprint recognition apparatus, including:
  • a pre-processing module configured to pre-process the input K voices to obtain the effective speech in each voice, where the voices include a training voice and a voice to be recognized;
  • a first extraction module configured to extract the Mel frequency cepstral coefficient acoustic features of the effective speech in each training speech, and output a first feature matrix comprising the Mel frequency cepstral coefficient dimension and the number of frames of each training speech;
  • a building module configured to construct a long short-term memory recurrent neural network model, and input the first feature matrix into the neural network model to obtain the output parameters of the neural network model;
  • a training module configured to use the output parameters of the neural network model and the speaker features corresponding to each training speech to respectively obtain N feature extraction matrices of N training speeches, where each feature extraction matrix corresponds to a speaker model of the training speech;
  • a second extraction module configured to extract the Mel frequency cepstral coefficient acoustic features of the effective speech in the speech to be recognized, and output a second feature matrix comprising the Mel frequency cepstral coefficient dimension and the number of frames of the speech to be recognized;
  • an identification module configured to select, from the N speaker models according to a preset similarity measurement algorithm, a speaker model matching the second feature matrix, and to output the speaker corresponding to the selected speaker model as the voiceprint recognition result of the speech to be recognized;
  • K and N are integers greater than zero, and K is greater than N.
  • A third aspect of the embodiments of the present invention provides an electronic device including a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, the following steps are implemented:
  • Using the output parameters of the neural network model and the speaker features corresponding to each training speech, N feature extraction matrices of the N training speeches are respectively trained, where each feature extraction matrix corresponds to a speaker model of one training speech;
  • A speaker model matching the second feature matrix is selected, and the speaker corresponding to the selected speaker model is output as the voiceprint recognition result of the speech to be recognized;
  • K and N are integers greater than zero, and K is greater than N.
  • A fourth aspect provides a computer-readable storage medium storing a computer program, where the computer program, when executed by at least one processor, implements the following steps:
  • A speaker model matching the second feature matrix is selected, and the speaker corresponding to the selected speaker model is output as the voiceprint recognition result of the speech to be recognized;
  • K and N are integers greater than zero, and K is greater than N.
  • In the embodiments of the present invention, the voiceprint background model is trained by supervised learning. By incorporating the speaker's characteristics, a more suitable set of acoustic features can be extracted from the original training voice data, so that the distinctive features of speakers can be distinguished more accurately, and a better voiceprint recognition effect can be obtained even in scenes with overlapping voices. Since the main recognition process is based on a deep neural network model, a more robust speaker model can be learned, solving the problem of low recognition accuracy in existing voiceprint recognition methods.
  • FIG. 1 is a flowchart of an implementation of a voiceprint recognition method according to an embodiment of the present invention
  • FIG. 2 is a specific implementation flowchart of step S101 in the voiceprint recognition method according to an embodiment of the present invention;
  • FIG. 3 is a specific implementation flowchart of step S102 in the voiceprint recognition method according to an embodiment of the present invention;
  • FIG. 4 is a specific implementation flowchart of step S103 in the voiceprint recognition method according to an embodiment of the present invention;
  • FIG. 5 is a specific implementation flowchart of step S104 in the voiceprint recognition method according to the embodiment of the present invention.
  • FIG. 6 is a structural block diagram of a voiceprint recognition apparatus according to an embodiment of the present invention.
  • FIG. 7 is a schematic diagram of an electronic device according to an embodiment of the present invention.
  • The embodiments of the present invention are implemented based on a time-recursive deep neural network.
  • The training of a speaker model relies on the acoustic characteristics of the training speech to estimate and optimize the parameters of the model, and different speaker models are used to represent the personality characteristics of different speakers.
  • During recognition, the speech to be recognized is matched against the multiple speaker models in turn, and the speaker models that do not meet the matching condition are eliminated; finally, the speaker corresponding to the speaker model that meets the matching condition is accepted as the voiceprint recognition result.
  • FIG. 1 is a flowchart showing an implementation process of a voiceprint recognition method according to an embodiment of the present invention, which is described in detail as follows:
  • the input K voices are respectively preprocessed to obtain valid voices in each voice
  • the speech includes training speech and speech to be recognized.
  • Different speaker models are established by inputting a sufficient number of training speeches. The training speeches are labeled speech samples of known speaker identity, used to adjust the parameters of the speaker models so that, through supervised learning, the models can achieve the recognition performance required in practical applications.
  • The remaining input speech is the speech to be recognized.
  • The training speech and the speech to be recognized serve different purposes, but they may be different speech data or the same speech data.
  • The speech to be recognized can be used to test the performance of the finally derived speaker models, that is, to test whether they can accurately recognize the speaker identity of the speech to be recognized.
  • FIG. 2 shows a specific implementation flow of the voiceprint recognition method S101 provided by the embodiment of the present invention, which is described in detail as follows:
  • S201: Perform pre-emphasis processing on the input K voices respectively, to boost the high-frequency band of each voice.
  • To highlight the high-frequency formants, each speech signal is passed through a high-pass filter to emphasize the high-frequency portion of the speech, making the spectrum of the speech signal flatter.
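As an illustrative sketch of this step (the filter coefficient 0.97 is a common choice; the patent does not fix a value), pre-emphasis can be implemented as a first-order high-pass filter y[n] = x[n] - α·x[n-1]:

```python
import numpy as np

def pre_emphasis(signal, alpha=0.97):
    """First-order high-pass filter: y[n] = x[n] - alpha * x[n-1].

    Boosts the high-frequency band so that the high-frequency formants
    are not swamped by the stronger low-frequency energy.
    """
    signal = np.asarray(signal, dtype=float)
    # First sample has no predecessor, so it is passed through unchanged.
    return np.append(signal[0], signal[1:] - alpha * signal[:-1])
```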
  • Each pre-emphasis-processed speech is divided into frames by selecting an appropriate number of sampling points, converting each speech into a multi-frame short-time speech signal.
  • each frame signal can be regarded as a stationary process, that is, the statistical characteristics are stable.
  • Windowing means multiplying the original short-time speech signal by a particular window function.
  • a window function is a real function that takes zero values except for a given interval, including but not limited to window functions such as rectangular windows, triangular windows, Hanning windows, and Hamming windows.
  • the window function is a Hanning window.
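The framing and Hanning-window steps above can be sketched as follows (a frame length of 400 samples with a hop of 160 samples corresponds to 25 ms frames shifted by 10 ms at a 16 kHz sampling rate; these values are illustrative, not specified by the patent):

```python
import numpy as np

def frame_and_window(signal, frame_len=400, hop=160):
    """Split a signal into overlapping frames and apply a Hanning window.

    Assumes len(signal) >= frame_len. Each frame is short enough to be
    treated as a stationary process.
    """
    signal = np.asarray(signal, dtype=float)
    n_frames = (len(signal) - frame_len) // hop + 1
    window = np.hanning(frame_len)           # tapers frame edges to zero
    return np.stack([
        signal[i * hop: i * hop + frame_len] * window
        for i in range(n_frames)
    ])                                       # shape: (n_frames, frame_len)
```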
  • First, a higher short-time energy decision threshold is selected on the short-time energy envelope corresponding to the short-time speech signal, and a coarse first judgment is made:
  • the start and end points of the effective speech lie outside the time interval bounded by the intersections of this threshold with the short-time energy envelope.
  • Then a lower short-time energy decision threshold is selected, and the two points where the short-time energy envelope intersects this lower threshold are taken as the start and end points of the effective speech, which can then be extracted and output.
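A minimal sketch of this double-threshold endpoint detection; the thresholds here are set relative to the peak frame energy purely for illustration, whereas a real system would calibrate them against the noise floor:

```python
import numpy as np

def detect_endpoints(frames, high_ratio=0.5, low_ratio=0.1):
    """Two-pass (double-threshold) endpoint detection on short-time energy.

    The high threshold gives a coarse speech region; the low threshold
    then extends the start/end points outward. Returns (start, end)
    frame indices of the effective speech, or None if no frame clears
    the high threshold.
    """
    energy = np.sum(frames ** 2, axis=1)       # short-time energy per frame
    high = high_ratio * energy.max()
    low = low_ratio * energy.max()
    above_high = np.where(energy >= high)[0]
    if len(above_high) == 0:
        return None
    start, end = above_high[0], above_high[-1]
    while start > 0 and energy[start - 1] >= low:   # extend left
        start -= 1
    while end < len(energy) - 1 and energy[end + 1] >= low:  # extend right
        end += 1
    return start, end
```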
  • By extracting the effective speech from the voice signal, the noise in the short-time stationary signal is filtered out, which reduces the amount of computation in the speaker-model training process, shortens the speech processing time of the subsequent steps, eliminates the noise interference of silent segments, and improves the accuracy of speech recognition.
  • The Mel frequency, which is based on the auditory characteristics of the human ear, has a nonlinear relationship with frequency in Hz; the Mel frequency cepstral coefficients are calculated from the Hz-scale spectral features using this nonlinear relationship.
  • FIG. 3 shows a specific implementation flow of the voiceprint recognition method S102 provided by the embodiment of the present invention, as follows:
  • the effective speech in each of the training speeches is analyzed by a fast Fourier transform to obtain a power spectrum of the valid speech.
  • The spectrum of each frame of effective speech is obtained; the modulus of the spectrum is taken and then squared to obtain the power spectrum of each frame of effective speech.
  • the different energy distributions characterized by the power spectrum represent different characteristics of speech.
  • The power spectrum is filtered by a Mel-scale filter bank comprising M triangular filters, and the logarithmic energy output by each triangular filter is obtained.
  • a first feature matrix including a dimension of the Mel frequency cepstral coefficient and a number of sub-frames of each training speech is output.
  • The logarithmic energy of each frame of the effective speech signal is appended to construct the two-dimensional MFCC acoustic features.
  • A variety of acoustic features, such as pitch, zero-crossing rate, and formants, can be added in this process, so that the output first feature matrix can be expressed as "MFCC dimension × number of frames", which is the original input.
  • After the power spectrum of the effective speech is filtered by the triangular filters, the spectrum of each frame of effective speech is smoothed, the effect of harmonics is eliminated, and the formants of the original speech signal corresponding to each frame of effective speech are highlighted.
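The MFCC pipeline above (FFT power spectrum, Mel-scale triangular filter bank, logarithmic energy, discrete cosine transform) can be sketched end to end; the numbers of filters and cepstral coefficients (26 and 13) are common defaults, not values specified by the patent:

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """M triangular filters spaced evenly on the Mel scale."""
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        for k in range(l, c):                  # rising edge of the triangle
            fbank[m - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):                  # falling edge of the triangle
            fbank[m - 1, k] = (r - k) / max(r - c, 1)
    return fbank

def mfcc(frames, sr=16000, n_fft=512, n_filters=26, n_ceps=13):
    """Windowed frames -> first feature matrix of shape (n_ceps, n_frames),
    i.e. 'MFCC dimension x number of frames'."""
    spectrum = np.abs(np.fft.rfft(frames, n_fft))   # modulus of the spectrum
    power = (spectrum ** 2) / n_fft                 # power spectrum per frame
    fbank = mel_filterbank(n_filters, n_fft, sr)
    log_energy = np.log(power @ fbank.T + 1e-10)    # log Mel-filter energies
    # DCT-II decorrelates the log energies into cepstral coefficients.
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1) / (2 * n_filters)))
    return (log_energy @ dct.T).T
```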
  • A long short-term memory recurrent neural network model is constructed, and the first feature matrix is input into the neural network model to obtain the output parameters of the neural network model.
  • FIG. 4 shows a specific implementation flow of the voiceprint recognition method S103 provided by the embodiment of the present invention, which is described in detail as follows:
  • A long short-term memory recurrent neural network model is initialized; the neural network model includes an input layer, recursive layers containing long short-term memory units, and an output layer.
  • the neural network model includes multiple levels, and the roles of the different layers are different.
  • The network structure of the long short-term memory recurrent neural network is explained here by way of example; it can be understood that in an actually applied network structure, the number of layers of the neural network is not limited to five.
  • In this embodiment, the open-source deep learning toolkit CNTK is used to initialize a five-layer long short-term memory recurrent neural network model.
  • The network structure of the neural network model is: one input layer, three recursive layers containing long short-term memory (LSTM) units, and one output layer.
  • Each recursive layer contains 1024 nodes and includes a two-level hierarchy, one level of which is a mapping (projection) layer with 512 nodes.
  • The input to the LSTM recursive layer is an 83-dimensional speech feature vector per frame. Based on the information of the current frame, the previous five frames, and the following five frames of effective speech, and iterating over one frame of effective speech at a time, a 913-dimensional (11 frames × 83 dimensions) feature vector is used as the input of the LSTM. After entering the LSTM recursive layer, the 913-dimensional feature vector passes through the 1024 hidden-layer memory cells in sequence; the input and output feature vector dimensions of the LSTM recursive layer are therefore the same.
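The 11-frame context stacking described here (the current frame plus five previous and five following frames of 83-dimensional features, giving 913 dimensions) can be sketched as:

```python
import numpy as np

def splice_frames(features, context=5):
    """Stack each frame with its `context` previous and following frames.

    With 83-dimensional features and context=5 this yields an
    11 x 83 = 913-dimensional input vector per frame. Edge frames are
    padded by repeating the first/last frame.
    """
    n_frames, dim = features.shape
    padded = np.pad(features, ((context, context), (0, 0)), mode="edge")
    return np.stack([
        padded[i:i + 2 * context + 1].reshape(-1)
        for i in range(n_frames)
    ])
```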
  • the first feature matrix is input to the neural network model.
  • the frame feature vectors in the first feature matrix are classified by using a Softmax classifier, and state clustering is performed according to the classification result to obtain a plurality of types of frame feature vectors.
  • The posterior probabilities of the various classes of frame feature vectors are respectively calculated; these posterior probabilities are the output parameters of the neural network model.
  • The DNN output parameters are the posterior probabilities P(k | f_i, π), where i denotes the i-th frame of effective speech, π denotes the text information corresponding to the speech, f_i denotes the first feature matrix input to the deep neural network, and k denotes the k-th output class, corresponding to the number of Gaussian components in a traditional Gaussian mixture model.
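The softmax that turns the network's output activations into these posterior probabilities can be sketched as follows (the logits matrix is a hypothetical stand-in for the network's final-layer activations, one column per frame):

```python
import numpy as np

def softmax_posteriors(logits):
    """Softmax over the output classes.

    Each column of the result is the posterior distribution over the
    k classes for one frame; these posteriors serve as the output
    parameters of the neural network model.
    """
    z = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=0, keepdims=True)
```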
  • N feature extraction matrices of the N training speeches are respectively obtained, each feature extraction matrix corresponding to a speaker model of one training speech.
  • FIG. 5 shows a specific implementation flow of the voiceprint recognition method S104 provided by the embodiment of the present invention, which is described in detail as follows:
  • The training parameters of the neural network model are acquired, where the training parameters are the mixture weights, means, and variances derived from the output parameters.
  • a feature vector corresponding to the speaker of each training speech is calculated by using a forward-backward algorithm according to the training parameter and the speaker feature corresponding to the training speech.
  • The speaker feature corresponding to a training voice represents the speaker identity tag information of that training voice.
  • According to the mixture weights, means, and variances derived from the DNN output parameters and the tag information of the training voice, the Baum-Welch algorithm, which is based on the forward-backward principle, iteratively estimates the feature vector of the speaker corresponding to each training speech.
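A simplified, diagonal-covariance sketch of how the mixture weights, means, and variances can be accumulated from the DNN posteriors, in the spirit of the Baum-Welch/EM statistics described above (the full iterative estimation of the speaker feature vector is omitted):

```python
import numpy as np

def collect_statistics(posteriors, features):
    """Zeroth/first/second-order statistics from posteriors gamma[k, i]
    and per-frame features f_i (shape: n_frames x dim).

    Returns the per-class mixture weight, mean, and diagonal variance,
    as in a single EM/Baum-Welch update.
    """
    gamma = np.asarray(posteriors)             # (n_classes, n_frames)
    f = np.asarray(features)                   # (n_frames, dim)
    n_k = gamma.sum(axis=1)                    # zeroth-order statistic
    denom = n_k[:, None] + 1e-10               # guard against empty classes
    means = (gamma @ f) / denom                # first order / count
    second = (gamma @ (f ** 2)) / denom
    variances = second - means ** 2            # E[f^2] - E[f]^2
    weights = n_k / n_k.sum()                  # mixture weights
    return weights, means, variances
```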
  • The specific embodiment described in S102 also applies to S105; the difference is that the original voice signal processed in this step is the speech to be recognized, while the original voice signal processed in S102 is the training voice. The processing is otherwise the same and is not repeated here.
  • According to a preset similarity measurement algorithm, a speaker model matching the second feature matrix is selected, and the speaker corresponding to the selected speaker model is output as the voiceprint recognition result of the speech to be recognized.
  • The similarity measurement algorithm includes, but is not limited to, distance measures, similarity measures, and matching measures, which quantify in an objective form the degree of similarity between the second feature matrix and a speaker model.
  • a speaker model that matches the second feature matrix is obtained by a cosine measure in a similarity measure algorithm.
  • The cosine of the angle between two vectors in the vector space is used to measure the difference between the second feature matrix and each of the N speaker models.
  • The similarity of two vectors is judged by comparing the cosine distance between the two input low-dimensional i-vectors against a set threshold: the smaller the angle between the two vectors, the more similar the two features; the larger the angle, the less similar they are.
  • The speaker model with the greatest similarity is selected, and the enrolled speaker of that speaker model is taken as the speaker of the speech to be recognized, thereby obtaining the voiceprint recognition result of the speech to be recognized.
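The cosine-measure matching step can be sketched as follows; the speaker names and the 0.5 threshold are illustrative assumptions, not values from the patent:

```python
import numpy as np

def cosine_score(vec_a, vec_b):
    """Cosine of the angle between two i-vectors: 1.0 means identical
    direction, values near 0 mean dissimilar speakers."""
    a = np.asarray(vec_a, dtype=float)
    b = np.asarray(vec_b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def identify(test_ivec, speaker_ivecs, threshold=0.5):
    """Score the test i-vector against each enrolled speaker model and
    return the best-matching speaker if it clears the threshold,
    otherwise None (no model meets the matching condition)."""
    scores = {name: cosine_score(test_ivec, v)
              for name, v in speaker_ivecs.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None
```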
  • In the embodiments of the present invention, the voiceprint background model is trained by supervised learning. By incorporating the speaker's characteristics, a more suitable set of acoustic features can be extracted from the original training voice data, so that the distinctive features of speakers can be distinguished more accurately, and a better voiceprint recognition effect can be obtained even in scenes with overlapping voices. Since the main recognition process is based on a deep neural network model, a more robust speaker model can be learned, solving the problem of low recognition accuracy in existing voiceprint recognition methods.
  • FIG. 6 is a structural block diagram of a voiceprint recognition apparatus according to an embodiment of the present invention.
  • The voiceprint recognition apparatus may be a software module, a hardware module, or a module combining software and hardware. For convenience of explanation, only the parts related to this embodiment are shown.
  • the apparatus includes:
  • The pre-processing module 61 is configured to pre-process the input K voices to obtain the effective speech in each voice, where the voices include the training voice and the voice to be recognized.
  • The first extraction module 62 is configured to extract the Mel frequency cepstral coefficient acoustic features of the effective speech in each training speech, and output a first feature matrix comprising the Mel frequency cepstral coefficient dimension and the number of frames of each training speech.
  • The building module 63 is configured to construct a long short-term memory recurrent neural network model, and input the first feature matrix into the neural network model to obtain the output parameters of the neural network model.
  • The training module 64 is configured to use the output parameters of the neural network model and the speaker features corresponding to each training speech to respectively train N feature extraction matrices of the N training speeches, each feature extraction matrix corresponding to a speaker model of one training speech.
  • The second extraction module 65 is configured to extract the Mel frequency cepstral coefficient acoustic features of the effective speech in the speech to be recognized, and output a second feature matrix comprising the Mel frequency cepstral coefficient dimension and the number of frames of the speech to be recognized.
  • The identification module 66 is configured to select, from the N speaker models according to a preset similarity measurement algorithm, a speaker model matching the second feature matrix, where the speaker corresponding to the selected speaker model is output as the voiceprint recognition result of the speech to be recognized.
  • K and N are integers greater than zero, and K is greater than N.
  • the pre-processing module 61 includes:
  • a pre-emphasis sub-module configured to perform pre-emphasis processing on the input K voices respectively, to boost the high-frequency band of each voice;
  • a conversion sub-module configured to convert each pre-emphasized voice into a short-time stationary signal through framing and windowing;
  • a detection sub-module configured to distinguish the noise and the speech in the short-time stationary signal based on an endpoint detection algorithm, and output the speech in the short-time stationary signal as the effective speech of each voice.
  • the first extraction module 62 includes:
  • an acquisition sub-module configured to analyze, by fast Fourier transform, the effective speech in each training speech, to obtain the power spectrum of the effective speech;
  • a filtering sub-module configured to filter the power spectrum with a Mel-scale filter bank comprising M triangular filters, and to acquire the logarithmic energy output by each triangular filter, where M is an integer greater than zero;
  • a transform sub-module configured to perform a discrete cosine transform on the logarithmic energies, and output the Mel frequency cepstral coefficient acoustic features of the effective speech;
  • an output sub-module configured to output, according to the Mel frequency cepstral coefficient acoustic features, a first feature matrix comprising the Mel frequency cepstral coefficient dimension and the number of frames of each training speech.
  • the building module 63 includes:
  • an initialization sub-module for initializing a long short-term memory recurrent neural network model comprising an input layer, recursive layers containing long short-term memory units, and an output layer;
  • An input submodule configured to input the first feature matrix into the neural network model
  • a classification sub-module configured to classify frame feature vectors in the first feature matrix by using a Softmax classifier, and perform state clustering according to the classification result to obtain a plurality of types of frame feature vectors
  • a calculation submodule configured to respectively calculate a posterior probabilities of the various types of frame feature vectors, wherein the posterior probabilities of the various types of frame feature vectors are output parameters of the neural network model.
  • the training module 64 includes:
  • a parameter acquisition sub-module configured to acquire the training parameters of the neural network model, where the training parameters are the mixture weights, means, and variances derived from the output parameters;
  • a feature acquisition sub-module configured to calculate, according to the training parameter and the speaker feature corresponding to the training speech, a feature vector of the speaker corresponding to each training voice by using a forward-backward algorithm
  • an iteration sub-module configured to iterate over the training parameters of the neural network model and the feature vector of the speaker corresponding to each training speech, to obtain the feature extraction matrix of each training speech.
  • In the embodiments of the present invention, the voiceprint background model is trained by supervised learning. By incorporating the speaker's characteristics, a more suitable set of acoustic features can be extracted from the original training voice data, so that the distinctive features of speakers can be distinguished more accurately, and a better voiceprint recognition effect can be obtained even in scenes with overlapping voices. Since the main recognition process is based on a deep neural network model, a more robust speaker model can be learned, solving the problem of low recognition accuracy in existing voiceprint recognition methods.
  • FIG. 7 is a schematic diagram of an electronic device according to an embodiment of the present invention.
  • the electronic device 7 of this embodiment includes a processor 70, a memory 71, and a computer program 72, such as a voiceprint recognition program, stored in the memory 71 and operable on the processor 70.
  • When the processor 70 executes the computer program 72, the steps in the foregoing voiceprint recognition method embodiments are implemented, such as steps S101 to S106 shown in FIG. 1.
  • the processor 70 when executing the computer program 72, implements the functions of the modules/units in the various apparatus embodiments described above, such as the functions of the modules 61-66 shown in FIG.
  • The computer program 72 can be partitioned into one or more modules/units, which are stored in the memory 71 and executed by the processor 70 to implement the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing particular functions, the instruction segments being used to describe the execution of the computer program 72 in the electronic device 7.
  • the electronic device 7 can be a computing device such as a desktop computer, a notebook, a palmtop computer, and a cloud server.
  • The electronic device 7 may include, but is not limited to, a processor 70 and a memory 71. It will be understood by those skilled in the art that FIG. 7 is merely an example of the electronic device 7 and does not constitute a limitation on it; the device may include more or fewer components than those illustrated, combine some components, or use different components.
  • For example, the electronic device 7 may further include input and output devices, a network access device, a bus, and the like.
  • The processor 70 may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
  • the memory 71 may be an internal storage unit of the electronic device 7, such as a hard disk or memory of the electronic device 7.
  • The memory 71 may also be an external storage device of the electronic device 7, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, or the like equipped on the electronic device 7. Further, the memory 71 may include both an internal storage unit and an external storage device of the electronic device 7.
  • The memory 71 is used to store the computer program and other programs and data required by the electronic device 7.
  • The memory 71 can also be used to temporarily store data that has been output or is about to be output.
  • The division of the functional modules and units described above is merely exemplary. In practical applications, the above functions may be assigned to different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules or units to perform all or part of the functions described above.
  • Each functional module and unit in the embodiment may be integrated into one processing module, or each module may exist physically separately, or two or more modules may be integrated into one module; the integrated module may be implemented in the form of hardware or in the form of software functional modules.
  • The specific names of the respective functional modules and units are only for the purpose of distinguishing them from one another, and are not intended to limit the scope of protection of the present application.
  • The modules and algorithm steps of the various examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and the design constraints of the solution. A person skilled in the art may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of the present invention.
  • The disclosed apparatus and method may be implemented in other manners.
  • The system embodiment described above is merely illustrative.
  • The division into modules is only a logical function division; in actual implementation there may be other division manners. For example, multiple modules or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • The mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or modules, and may be in electrical, mechanical, or other form.
  • The modules described as separate components may or may not be physically separated, and the components displayed as modules may or may not be physical modules; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • Each functional module in each embodiment of the present invention may be integrated into one processing module, or each module may exist physically separately, or two or more modules may be integrated into one module.
  • The above integrated modules can be implemented in the form of hardware or in the form of software functional modules.
  • The integrated modules, if implemented in the form of software functional modules and sold or used as separate products, may be stored in a computer-readable storage medium.
  • The medium includes a number of instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform all or part of the steps of the methods described in the various embodiments of the present invention.
  • The foregoing storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or the like.
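The recognition flow described above — run each utterance's feature matrix through a long short-term memory network, derive a fixed-size speaker representation, and select the closest enrolled speaker model — can be illustrated with a minimal numpy sketch. This is not the patented implementation: the LSTM dimensions, the mean-pooling of hidden states into a speaker vector, the cosine-similarity scoring, and all names (`utterance_embedding`, `identify`) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_lstm(feat_dim, hidden):
    """Randomly initialised LSTM training parameters (hypothetical shapes)."""
    return {
        "Wx": rng.standard_normal((feat_dim, 4 * hidden)) * 0.1,  # input weights
        "Wh": rng.standard_normal((hidden, 4 * hidden)) * 0.1,    # recurrent weights
        "b":  np.zeros(4 * hidden),                               # gate biases
    }

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def utterance_embedding(X, p):
    """Run the LSTM over a feature matrix X (num_frames x feat_dim) and
    mean-pool the hidden states into one fixed-size speaker vector."""
    H = p["Wh"].shape[0]
    h, c, states = np.zeros(H), np.zeros(H), []
    for x in X:                                   # one MFCC frame per time step
        z = x @ p["Wx"] + h @ p["Wh"] + p["b"]
        i, f, o, g = np.split(z, 4)               # input/forget/output/candidate
        i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
        c = f * c + i * g
        h = o * np.tanh(c)
        states.append(h)
    return np.mean(states, axis=0)

def identify(test_X, speaker_models, p):
    """Select the enrolled speaker model closest (cosine) to the test utterance."""
    e = utterance_embedding(test_X, p)
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max(speaker_models, key=lambda s: cos(e, speaker_models[s]))

# Enrol two hypothetical speakers from fake MFCC matrices (frames x 13 coefficients).
p = init_lstm(feat_dim=13, hidden=32)
models = {
    "alice": utterance_embedding(rng.standard_normal((50, 13)), p),
    "bob":   utterance_embedding(rng.standard_normal((60, 13)), p),
}
print(identify(rng.standard_normal((55, 13)), models, p))
```

In the patented method the per-speaker feature extraction matrices are obtained by iterating the network's training parameters; here the untrained forward pass merely shows where such matrices would act in the pipeline.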

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Telephonic Communication Services (AREA)
  • Image Analysis (AREA)

Abstract

Disclosed are a voiceprint recognition method and apparatus, and an electronic device and medium, which are applicable to the technical field of identity authentication. The method comprises: pre-processing an input voice to acquire the valid speech therein; extracting MFCC acoustic features from the speech and outputting first and second feature matrices containing an MFCC dimension and a voice frame number; constructing a long short-term memory recurrent neural network model and taking the first feature matrix as its input; training feature extraction matrices by means of a training parameter of the neural network model and a speaker feature of the speech, each feature extraction matrix corresponding to a speaker model; and selecting the speaker model matching the second feature matrix, the speaker corresponding to the matched speaker model being output as the voiceprint recognition result. In this way, more suitable acoustic features can be extracted from training speech, so that the distinguishing characteristics of speakers can be recognized more accurately, a speaker model with greater robustness can be learned, and a better voiceprint recognition result can be obtained.
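As a rough illustration of the front end described in the abstract — pre-process the input voice to keep the valid speech, then produce a feature matrix whose dimensions are the number of voice frames and the MFCC dimension — the following numpy sketch frames a waveform, drops low-energy frames as a crude stand-in for the pre-processing step, and computes simplified MFCC-like coefficients. It omits the mel filterbank of a real MFCC pipeline, and the frame sizes, energy threshold, and function names are all assumptions.

```python
import numpy as np

def frame_signal(sig, frame_len=400, hop=160):
    """Split a waveform into overlapping frames (25 ms / 10 ms at 16 kHz)."""
    n = 1 + max(0, (len(sig) - frame_len) // hop)
    return np.stack([sig[i * hop : i * hop + frame_len] for i in range(n)])

def valid_frames(frames, ratio=0.1):
    """Crude energy-based voice activity detection standing in for the
    pre-processing step: keep frames above a fraction of the peak energy."""
    energy = (frames ** 2).sum(axis=1)
    return frames[energy > ratio * energy.max()]

def mfcc_like(frames, n_coef=13):
    """Simplified MFCC stand-in: log power spectrum followed by a DCT."""
    spec = np.abs(np.fft.rfft(frames * np.hamming(frames.shape[1]), axis=1))
    logp = np.log(spec ** 2 + 1e-10)
    # Type-II DCT basis, keeping the first n_coef coefficients per frame.
    k = np.arange(logp.shape[1])
    basis = np.cos(np.pi * np.outer(np.arange(n_coef), (k + 0.5) / logp.shape[1]))
    return logp @ basis.T          # shape: (num_frames, n_coef)

sig = np.sin(2 * np.pi * 220 * np.arange(16000) / 16000)   # 1 s synthetic signal
feats = mfcc_like(valid_frames(frame_signal(sig)))
print(feats.shape)                 # (number of voice frames, MFCC dimension)
```

The resulting matrix plays the role of the first and second feature matrices in the abstract: one such matrix per training utterance feeds the recurrent network, and one per test utterance is matched against the speaker models.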
PCT/CN2017/099707 2016-12-15 2017-08-30 Procédé et appareil de reconnaissance d'empreinte vocale, et dispositif électronique et support WO2018107810A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201611158891.9A CN107610707B (zh) 2016-12-15 2016-12-15 一种声纹识别方法及装置
CN201611158891.9 2016-12-15

Publications (1)

Publication Number Publication Date
WO2018107810A1 true WO2018107810A1 (fr) 2018-06-21

Family

ID=61055561

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/099707 WO2018107810A1 (fr) 2016-12-15 2017-08-30 Procédé et appareil de reconnaissance d'empreinte vocale, et dispositif électronique et support

Country Status (2)

Country Link
CN (1) CN107610707B (fr)
WO (1) WO2018107810A1 (fr)

Cited By (64)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109036437A (zh) * 2018-08-14 2018-12-18 平安科技(深圳)有限公司 口音识别方法、装置、计算机装置及计算机可读存储介质
CN109102799A (zh) * 2018-08-17 2018-12-28 信阳师范学院 一种基于频域系数对数和的语音端点检测方法
CN109285551A (zh) * 2018-09-18 2019-01-29 上海海事大学 基于wmfcc和dnn的帕金森患者声纹识别方法
CN109308903A (zh) * 2018-08-02 2019-02-05 平安科技(深圳)有限公司 语音模仿方法、终端设备及计算机可读存储介质
CN110060677A (zh) * 2019-04-04 2019-07-26 平安科技(深圳)有限公司 语音遥控器控制方法、装置及计算机可读存储介质
CN110059059A (zh) * 2019-03-15 2019-07-26 平安科技(深圳)有限公司 语音信息的批量筛选方法、装置、计算机设备及存储介质
CN110797008A (zh) * 2018-07-16 2020-02-14 阿里巴巴集团控股有限公司 一种远场语音识别方法、语音识别模型训练方法和服务器
CN110853654A (zh) * 2019-11-17 2020-02-28 西北工业大学 一种模型生成方法、声纹识别方法及对应装置
CN110895935A (zh) * 2018-09-13 2020-03-20 阿里巴巴集团控股有限公司 语音识别方法、系统、设备和介质
CN111048072A (zh) * 2019-11-21 2020-04-21 中国南方电网有限责任公司 一种应用于电力企业的声纹识别方法
CN111145736A (zh) * 2019-12-09 2020-05-12 华为技术有限公司 语音识别方法及相关设备
CN111161744A (zh) * 2019-12-06 2020-05-15 华南理工大学 同时优化深度表征学习与说话人类别估计的说话人聚类方法
CN111210840A (zh) * 2020-01-02 2020-05-29 厦门快商通科技股份有限公司 一种年龄预测方法和装置以及设备
CN111354364A (zh) * 2020-04-23 2020-06-30 上海依图网络科技有限公司 一种基于rnn聚合方式的声纹识别方法与系统
CN111354367A (zh) * 2018-12-24 2020-06-30 中国移动通信有限公司研究院 一种语音处理方法、装置及计算机存储介质
CN111414511A (zh) * 2020-03-25 2020-07-14 合肥讯飞数码科技有限公司 自动声纹建模入库方法、装置以及设备
CN111415654A (zh) * 2019-01-07 2020-07-14 北京嘀嘀无限科技发展有限公司 一种音频识别方法和装置、以及声学模型训练方法和装置
CN111462762A (zh) * 2020-03-25 2020-07-28 清华大学 一种说话人向量正则化方法、装置、电子设备和存储介质
CN111508498A (zh) * 2020-04-09 2020-08-07 携程计算机技术(上海)有限公司 对话式语音识别方法、系统、电子设备和存储介质
CN111564163A (zh) * 2020-05-08 2020-08-21 宁波大学 一种基于rnn的多种伪造操作语音检测方法
CN111613231A (zh) * 2019-02-26 2020-09-01 广州慧睿思通信息科技有限公司 语音数据处理方法、装置、计算机设备和存储介质
CN111681669A (zh) * 2020-05-14 2020-09-18 上海眼控科技股份有限公司 一种基于神经网络的语音数据的识别方法与设备
CN111768801A (zh) * 2020-06-12 2020-10-13 瑞声科技(新加坡)有限公司 气流杂音消除方法、装置、计算机设备及存储介质
CN111768761A (zh) * 2019-03-14 2020-10-13 京东数字科技控股有限公司 一种语音识别模型的训练方法和装置
CN111798840A (zh) * 2020-07-16 2020-10-20 中移在线服务有限公司 语音关键词识别方法和装置
CN111816218A (zh) * 2020-07-31 2020-10-23 平安科技(深圳)有限公司 语音端点检测方法、装置、设备及存储介质
CN111816205A (zh) * 2020-07-09 2020-10-23 中国人民解放军战略支援部队航天工程大学 一种基于飞机音频的机型智能识别方法
CN111862953A (zh) * 2019-12-05 2020-10-30 北京嘀嘀无限科技发展有限公司 语音识别模型的训练方法、语音识别方法及装置
CN111862985A (zh) * 2019-05-17 2020-10-30 北京嘀嘀无限科技发展有限公司 一种语音识别装置、方法、电子设备及存储介质
CN111883106A (zh) * 2020-07-27 2020-11-03 腾讯音乐娱乐科技(深圳)有限公司 音频处理方法及装置
CN112163457A (zh) * 2020-09-03 2021-01-01 中国联合网络通信集团有限公司 一种通信电台识别方法及装置
CN112270933A (zh) * 2020-11-12 2021-01-26 北京猿力未来科技有限公司 一种音频识别方法和装置
CN112382300A (zh) * 2020-12-14 2021-02-19 北京远鉴信息技术有限公司 声纹鉴定方法、模型训练方法、装置、设备及存储介质
CN112420070A (zh) * 2019-08-22 2021-02-26 北京峰趣互联网信息服务有限公司 自动标注方法、装置、电子设备及计算机可读存储介质
WO2021051608A1 (fr) * 2019-09-20 2021-03-25 平安科技(深圳)有限公司 Procédé et dispositif de reconnaissance d'empreinte vocale utilisant un apprentissage profond et appareil
CN112562691A (zh) * 2020-11-27 2021-03-26 平安科技(深圳)有限公司 一种声纹识别的方法、装置、计算机设备及存储介质
CN112637428A (zh) * 2020-12-29 2021-04-09 平安科技(深圳)有限公司 无效通话判断方法、装置、计算机设备及存储介质
CN112712820A (zh) * 2020-12-25 2021-04-27 广州欢城文化传媒有限公司 一种音色分类方法、装置、设备和介质
CN112750446A (zh) * 2020-12-30 2021-05-04 标贝(北京)科技有限公司 语音转换方法、装置和系统及存储介质
CN112822186A (zh) * 2020-12-31 2021-05-18 国网江苏省电力有限公司信息通信分公司 基于语音认证的电力系统ip调度台通知广播方法及系统
CN112883812A (zh) * 2021-01-22 2021-06-01 广州联智信息科技有限公司 一种基于深度学习的肺音分类方法、系统及存储介质
CN112951245A (zh) * 2021-03-09 2021-06-11 江苏开放大学(江苏城市职业学院) 一种融入静态分量的动态声纹特征提取方法
CN113178196A (zh) * 2021-04-20 2021-07-27 平安国际融资租赁有限公司 音频数据提取方法、装置、计算机设备和存储介质
CN113271430A (zh) * 2021-05-13 2021-08-17 中国联合网络通信集团有限公司 网络视频会议中防干扰方法、系统、设备及存储介质
CN113299295A (zh) * 2021-05-11 2021-08-24 支付宝(杭州)信息技术有限公司 声纹编码网络的训练方法及装置
CN113393832A (zh) * 2021-06-03 2021-09-14 清华大学深圳国际研究生院 一种基于全局情感编码的虚拟人动画合成方法及系统
CN113408539A (zh) * 2020-11-26 2021-09-17 腾讯科技(深圳)有限公司 数据识别方法、装置、电子设备及存储介质
CN113421573A (zh) * 2021-06-18 2021-09-21 马上消费金融股份有限公司 身份识别模型训练方法、身份识别方法及装置
CN113593581A (zh) * 2021-07-12 2021-11-02 西安讯飞超脑信息科技有限公司 声纹判别方法、装置、计算机设备和存储介质
CN113804767A (zh) * 2021-08-16 2021-12-17 东南大学 一种螺栓失效检测方法
CN113838469A (zh) * 2021-09-09 2021-12-24 竹间智能科技(上海)有限公司 一种身份识别方法、系统及存储介质
CN113870840A (zh) * 2021-09-27 2021-12-31 京东科技信息技术有限公司 语音识别方法、装置及相关设备
CN113948089A (zh) * 2020-06-30 2022-01-18 北京猎户星空科技有限公司 声纹模型训练和声纹识别方法、装置、设备及介质
CN114067834A (zh) * 2020-07-30 2022-02-18 中国移动通信集团有限公司 一种不良前导音识别方法、装置、存储介质和计算机设备
CN114495948A (zh) * 2022-04-18 2022-05-13 北京快联科技有限公司 一种声纹识别方法及装置
US11335352B2 (en) * 2017-09-29 2022-05-17 Tencent Technology (Shenzhen) Company Limited Voice identity feature extractor and classifier training
CN114974259A (zh) * 2021-12-23 2022-08-30 号百信息服务有限公司 一种声纹识别方法
CN115171700A (zh) * 2022-06-13 2022-10-11 武汉大学 一种基于脉冲神经网络的声纹识别语音助手方法
CN115223576A (zh) * 2022-06-23 2022-10-21 国网江苏省电力有限公司南京供电分公司 基于mfcc的变压器声纹特征可控精度提取和识别方法与系统
CN115457968A (zh) * 2022-08-26 2022-12-09 华南理工大学 基于混合分辨率深度可分卷积网络的声纹确认方法
CN115472168A (zh) * 2022-08-24 2022-12-13 武汉理工大学 耦合bgcc和pwpe特征的短时语音声纹识别方法、系统及设备
EP3460793B1 (fr) * 2017-07-25 2023-04-05 Ping An Technology (Shenzhen) Co., Ltd. Appareil électronique, procédé et système de vérification d'identité et support de stockage lisible par ordinateur
CN117475360A (zh) * 2023-12-27 2024-01-30 南京纳实医学科技有限公司 基于改进型mlstm-fcn的音视频特点的生物体征提取与分析方法
CN117577137A (zh) * 2024-01-15 2024-02-20 宁德时代新能源科技股份有限公司 切刀健康评估方法、装置、设备及存储介质

Families Citing this family (51)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108447490B (zh) * 2018-02-12 2020-08-18 阿里巴巴集团控股有限公司 基于记忆性瓶颈特征的声纹识别的方法及装置
CN108564955B (zh) * 2018-03-19 2019-09-03 平安科技(深圳)有限公司 电子装置、身份验证方法和计算机可读存储介质
CN108564954B (zh) * 2018-03-19 2020-01-10 平安科技(深圳)有限公司 深度神经网络模型、电子装置、身份验证方法和存储介质
CN110349585B (zh) * 2018-04-04 2023-05-05 富士通株式会社 语音认证方法和信息处理设备
CN108650266B (zh) * 2018-05-14 2020-02-18 平安科技(深圳)有限公司 服务器、声纹验证的方法及存储介质
CN108877814B (zh) * 2018-05-23 2020-12-29 中南林业科技大学 窨井盖盗损检测方法、智能终端及计算机可读存储介质
CN108831484A (zh) * 2018-05-29 2018-11-16 广东声将军科技有限公司 一种离线的且与语言种类无关的声纹识别方法及装置
CN108766445A (zh) * 2018-05-30 2018-11-06 苏州思必驰信息科技有限公司 声纹识别方法及系统
CN108777146A (zh) * 2018-05-31 2018-11-09 平安科技(深圳)有限公司 语音模型训练方法、说话人识别方法、装置、设备及介质
CN108899032A (zh) * 2018-06-06 2018-11-27 平安科技(深圳)有限公司 声纹识别方法、装置、计算机设备及存储介质
CN108806716A (zh) * 2018-06-15 2018-11-13 想象科技(北京)有限公司 用于基于情感框架的计算机化匹配的方法与装置
CN108776795A (zh) * 2018-06-20 2018-11-09 邯郸学院 用户身份识别方法、装置及终端设备
CN108847234B (zh) * 2018-06-28 2020-10-30 广州华多网络科技有限公司 唇语合成方法、装置、电子设备及存储介质
CN108899037B (zh) * 2018-07-05 2024-01-26 平安科技(深圳)有限公司 动物声纹特征提取方法、装置及电子设备
CN109166586B (zh) * 2018-08-02 2023-07-07 平安科技(深圳)有限公司 一种识别说话人的方法及终端
CN108847245B (zh) * 2018-08-06 2020-06-23 北京海天瑞声科技股份有限公司 语音检测方法和装置
CN109147146B (zh) * 2018-08-21 2022-04-12 平安科技(深圳)有限公司 语音取号的方法及终端设备
CN110858290B (zh) * 2018-08-24 2023-10-17 比亚迪股份有限公司 驾驶员异常行为识别方法、装置、设备及存储介质
CN108847253B (zh) * 2018-09-05 2023-06-13 平安科技(深圳)有限公司 车辆型号识别方法、装置、计算机设备及存储介质
CN109065069B (zh) 2018-10-10 2020-09-04 广州市百果园信息技术有限公司 一种音频检测方法、装置、设备及存储介质
CN109346107B (zh) * 2018-10-10 2022-09-30 中山大学 一种基于lstm的独立说话人语音发音逆求解的方法
CN109257362A (zh) * 2018-10-11 2019-01-22 平安科技(深圳)有限公司 声纹验证的方法、装置、计算机设备以及存储介质
CN109256147B (zh) * 2018-10-30 2022-06-10 腾讯音乐娱乐科技(深圳)有限公司 音频节拍检测方法、装置及存储介质
CN109545228A (zh) * 2018-12-14 2019-03-29 厦门快商通信息技术有限公司 一种端到端说话人分割方法及系统
CN110010133A (zh) * 2019-03-06 2019-07-12 平安科技(深圳)有限公司 基于短文本的声纹检测方法、装置、设备及存储介质
CN109903774A (zh) * 2019-04-12 2019-06-18 南京大学 一种基于角度间隔损失函数的声纹识别方法
CN110265035B (zh) * 2019-04-25 2021-08-06 武汉大晟极科技有限公司 一种基于深度学习的说话人识别方法
CN110276189B (zh) * 2019-06-27 2022-02-11 电子科技大学 一种基于步态信息的用户身份认证方法
CN110570870A (zh) * 2019-09-20 2019-12-13 平安科技(深圳)有限公司 一种文本无关的声纹识别方法、装置及设备
CN110875043B (zh) * 2019-11-11 2022-06-17 广州国音智能科技有限公司 声纹识别方法、装置、移动终端及计算机可读存储介质
CN110660399A (zh) * 2019-11-11 2020-01-07 广州国音智能科技有限公司 声纹识别的训练方法、装置、终端及计算机存储介质
CN111312293A (zh) * 2020-02-17 2020-06-19 杭州电子科技大学 一种基于深度学习对呼吸暂停症患者的识别方法及系统
CN111341327A (zh) * 2020-02-28 2020-06-26 广州国音智能科技有限公司 一种基于粒子群算法的说话人语音识别方法、装置和设备
CN111312208A (zh) * 2020-03-09 2020-06-19 广州深声科技有限公司 一种说话人不相干的神经网络声码器系统
CN111341307A (zh) * 2020-03-13 2020-06-26 腾讯科技(深圳)有限公司 语音识别方法、装置、电子设备及存储介质
CN111524522B (zh) * 2020-04-23 2023-04-07 上海依图网络科技有限公司 一种基于多种语音特征融合的声纹识别方法及系统
CN111583938B (zh) * 2020-05-19 2023-02-03 威盛电子股份有限公司 电子装置与语音识别方法
CN111951791B (zh) * 2020-08-26 2024-05-17 上海依图网络科技有限公司 声纹识别模型训练方法、识别方法、电子设备及存储介质
CN112259106B (zh) * 2020-10-20 2024-06-11 网易(杭州)网络有限公司 声纹识别方法、装置、存储介质及计算机设备
CN112259114A (zh) * 2020-10-20 2021-01-22 网易(杭州)网络有限公司 语音处理方法及装置、计算机存储介质、电子设备
CN112347788A (zh) * 2020-11-06 2021-02-09 平安消费金融有限公司 语料处理方法、装置及存储介质
CN112489677B (zh) * 2020-11-20 2023-09-22 平安科技(深圳)有限公司 基于神经网络的语音端点检测方法、装置、设备及介质
CN112562653B (zh) * 2020-11-26 2023-05-26 睿云联(厦门)网络通讯技术有限公司 一种基于人类行为经验的离线语音识别学习方法
CN112669820B (zh) * 2020-12-16 2023-08-04 平安科技(深圳)有限公司 基于语音识别的考试作弊识别方法、装置及计算机设备
CN112786059A (zh) * 2021-03-11 2021-05-11 合肥市清大创新研究院有限公司 一种基于人工智能的声纹特征提取方法及装置
CN113178205B (zh) * 2021-04-30 2024-07-05 平安科技(深圳)有限公司 语音分离方法、装置、计算机设备及存储介质
CN113269084B (zh) * 2021-05-19 2022-11-01 上海外国语大学 基于观众群体情感神经相似性的影视剧市场预测方法及系统
CN113611314A (zh) * 2021-08-03 2021-11-05 成都理工大学 一种说话人识别方法及系统
CN113488059A (zh) * 2021-08-13 2021-10-08 广州市迪声音响有限公司 一种声纹识别方法及系统
CN114826709B (zh) * 2022-04-15 2024-07-09 马上消费金融股份有限公司 身份认证和声学环境检测方法、系统、电子设备及介质
CN114894285A (zh) * 2022-04-29 2022-08-12 广东科达计量科技有限公司 一种具有全方位立体识别功能的无人值守地磅称重系统

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102820033A (zh) * 2012-08-17 2012-12-12 南京大学 一种声纹识别方法
CN103971690A (zh) * 2013-01-28 2014-08-06 腾讯科技(深圳)有限公司 一种声纹识别方法和装置
CN104008751A (zh) * 2014-06-18 2014-08-27 周婷婷 一种基于bp神经网络的说话人识别方法
US20150149165A1 (en) * 2013-11-27 2015-05-28 International Business Machines Corporation Speaker Adaptation of Neural Network Acoustic Models Using I-Vectors
US20150161994A1 (en) * 2013-12-05 2015-06-11 Nuance Communications, Inc. Method and Apparatus for Speech Recognition Using Neural Networks with Speaker Adaptation
CN104952448A (zh) * 2015-05-04 2015-09-30 张爱英 一种双向长短时记忆递归神经网络的特征增强方法及系统
CN105869644A (zh) * 2016-05-25 2016-08-17 百度在线网络技术(北京)有限公司 基于深度学习的声纹认证方法和装置

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8345962B2 (en) * 2007-11-29 2013-01-01 Nec Laboratories America, Inc. Transfer learning methods and systems for feed-forward visual recognition systems
US8504361B2 (en) * 2008-02-07 2013-08-06 Nec Laboratories America, Inc. Deep neural networks and methods for using same
CN102446505A (zh) * 2010-10-15 2012-05-09 盛乐信息技术(上海)有限公司 联合因子分析方法及联合因子分析声纹认证方法
CN102479511A (zh) * 2010-11-23 2012-05-30 盛乐信息技术(上海)有限公司 一种大规模声纹认证方法及其系统
CN102324232A (zh) * 2011-09-12 2012-01-18 辽宁工业大学 基于高斯混合模型的声纹识别方法及系统
CN103873254B (zh) * 2014-03-03 2017-01-25 杭州电子科技大学 一种人类声纹生物密钥生成方法
GB201416303D0 (en) * 2014-09-16 2014-10-29 Univ Hull Speech synthesis
KR102305584B1 (ko) * 2015-01-19 2021-09-27 삼성전자주식회사 언어 모델 학습 방법 및 장치, 언어 인식 방법 및 장치
JP6453681B2 (ja) * 2015-03-18 2019-01-16 株式会社東芝 演算装置、演算方法およびプログラム
WO2016172871A1 (fr) * 2015-04-29 2016-11-03 华侃如 Procédé de synthèse de parole basé sur des réseaux neuronaux récurrents
CN105139864B (zh) * 2015-08-17 2019-05-07 北京眼神智能科技有限公司 语音识别方法和装置
CN105513597B (zh) * 2015-12-30 2018-07-10 百度在线网络技术(北京)有限公司 声纹认证处理方法及装置
CN106228045A (zh) * 2016-07-06 2016-12-14 吴本刚 一种身份识别系统

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102820033A (zh) * 2012-08-17 2012-12-12 南京大学 一种声纹识别方法
CN103971690A (zh) * 2013-01-28 2014-08-06 腾讯科技(深圳)有限公司 一种声纹识别方法和装置
US20150149165A1 (en) * 2013-11-27 2015-05-28 International Business Machines Corporation Speaker Adaptation of Neural Network Acoustic Models Using I-Vectors
US20150161994A1 (en) * 2013-12-05 2015-06-11 Nuance Communications, Inc. Method and Apparatus for Speech Recognition Using Neural Networks with Speaker Adaptation
CN104008751A (zh) * 2014-06-18 2014-08-27 周婷婷 一种基于bp神经网络的说话人识别方法
CN104952448A (zh) * 2015-05-04 2015-09-30 张爱英 一种双向长短时记忆递归神经网络的特征增强方法及系统
CN105869644A (zh) * 2016-05-25 2016-08-17 百度在线网络技术(北京)有限公司 基于深度学习的声纹认证方法和装置

Cited By (98)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3460793B1 (fr) * 2017-07-25 2023-04-05 Ping An Technology (Shenzhen) Co., Ltd. Appareil électronique, procédé et système de vérification d'identité et support de stockage lisible par ordinateur
US11335352B2 (en) * 2017-09-29 2022-05-17 Tencent Technology (Shenzhen) Company Limited Voice identity feature extractor and classifier training
CN110797008A (zh) * 2018-07-16 2020-02-14 阿里巴巴集团控股有限公司 一种远场语音识别方法、语音识别模型训练方法和服务器
CN110797008B (zh) * 2018-07-16 2024-03-29 阿里巴巴集团控股有限公司 一种远场语音识别方法、语音识别模型训练方法和服务器
CN109308903A (zh) * 2018-08-02 2019-02-05 平安科技(深圳)有限公司 语音模仿方法、终端设备及计算机可读存储介质
CN109036437A (zh) * 2018-08-14 2018-12-18 平安科技(深圳)有限公司 口音识别方法、装置、计算机装置及计算机可读存储介质
CN109102799A (zh) * 2018-08-17 2018-12-28 信阳师范学院 一种基于频域系数对数和的语音端点检测方法
CN110895935A (zh) * 2018-09-13 2020-03-20 阿里巴巴集团控股有限公司 语音识别方法、系统、设备和介质
CN110895935B (zh) * 2018-09-13 2023-10-27 阿里巴巴集团控股有限公司 语音识别方法、系统、设备和介质
CN109285551A (zh) * 2018-09-18 2019-01-29 上海海事大学 基于wmfcc和dnn的帕金森患者声纹识别方法
CN111354367A (zh) * 2018-12-24 2020-06-30 中国移动通信有限公司研究院 一种语音处理方法、装置及计算机存储介质
CN111354367B (zh) * 2018-12-24 2023-06-23 中国移动通信有限公司研究院 一种语音处理方法、装置及计算机存储介质
CN111415654B (zh) * 2019-01-07 2023-12-08 北京嘀嘀无限科技发展有限公司 一种音频识别方法和装置、以及声学模型训练方法和装置
CN111415654A (zh) * 2019-01-07 2020-07-14 北京嘀嘀无限科技发展有限公司 一种音频识别方法和装置、以及声学模型训练方法和装置
CN111613231A (zh) * 2019-02-26 2020-09-01 广州慧睿思通信息科技有限公司 语音数据处理方法、装置、计算机设备和存储介质
CN111768761B (zh) * 2019-03-14 2024-03-01 京东科技控股股份有限公司 一种语音识别模型的训练方法和装置
CN111768761A (zh) * 2019-03-14 2020-10-13 京东数字科技控股有限公司 一种语音识别模型的训练方法和装置
CN110059059A (zh) * 2019-03-15 2019-07-26 平安科技(深圳)有限公司 语音信息的批量筛选方法、装置、计算机设备及存储介质
CN110059059B (zh) * 2019-03-15 2024-04-16 平安科技(深圳)有限公司 语音信息的批量筛选方法、装置、计算机设备及存储介质
CN110060677A (zh) * 2019-04-04 2019-07-26 平安科技(深圳)有限公司 语音遥控器控制方法、装置及计算机可读存储介质
CN111862985A (zh) * 2019-05-17 2020-10-30 北京嘀嘀无限科技发展有限公司 一种语音识别装置、方法、电子设备及存储介质
CN111862985B (zh) * 2019-05-17 2024-05-31 北京嘀嘀无限科技发展有限公司 一种语音识别装置、方法、电子设备及存储介质
CN112420070A (zh) * 2019-08-22 2021-02-26 北京峰趣互联网信息服务有限公司 自动标注方法、装置、电子设备及计算机可读存储介质
WO2021051608A1 (fr) * 2019-09-20 2021-03-25 平安科技(深圳)有限公司 Procédé et dispositif de reconnaissance d'empreinte vocale utilisant un apprentissage profond et appareil
CN110853654A (zh) * 2019-11-17 2020-02-28 西北工业大学 一种模型生成方法、声纹识别方法及对应装置
CN110853654B (zh) * 2019-11-17 2021-12-21 西北工业大学 一种模型生成方法、声纹识别方法及对应装置
CN111048072A (zh) * 2019-11-21 2020-04-21 中国南方电网有限责任公司 一种应用于电力企业的声纹识别方法
CN111862953B (zh) * 2019-12-05 2023-08-22 北京嘀嘀无限科技发展有限公司 语音识别模型的训练方法、语音识别方法及装置
CN111862953A (zh) * 2019-12-05 2020-10-30 北京嘀嘀无限科技发展有限公司 语音识别模型的训练方法、语音识别方法及装置
CN111161744A (zh) * 2019-12-06 2020-05-15 华南理工大学 同时优化深度表征学习与说话人类别估计的说话人聚类方法
CN111161744B (zh) * 2019-12-06 2023-04-28 华南理工大学 同时优化深度表征学习与说话人类别估计的说话人聚类方法
CN111145736A (zh) * 2019-12-09 2020-05-12 华为技术有限公司 语音识别方法及相关设备
CN111145736B (zh) * 2019-12-09 2022-10-04 华为技术有限公司 语音识别方法及相关设备
CN111210840A (zh) * 2020-01-02 2020-05-29 厦门快商通科技股份有限公司 一种年龄预测方法和装置以及设备
CN111414511A (zh) * 2020-03-25 2020-07-14 合肥讯飞数码科技有限公司 自动声纹建模入库方法、装置以及设备
CN111462762A (zh) * 2020-03-25 2020-07-28 清华大学 一种说话人向量正则化方法、装置、电子设备和存储介质
CN111414511B (zh) * 2020-03-25 2023-08-22 合肥讯飞数码科技有限公司 自动声纹建模入库方法、装置以及设备
CN111508498B (zh) * 2020-04-09 2024-01-30 携程计算机技术(上海)有限公司 对话式语音识别方法、系统、电子设备和存储介质
CN111508498A (zh) * 2020-04-09 2020-08-07 携程计算机技术(上海)有限公司 对话式语音识别方法、系统、电子设备和存储介质
CN111354364A (zh) * 2020-04-23 2020-06-30 上海依图网络科技有限公司 一种基于rnn聚合方式的声纹识别方法与系统
CN111354364B (zh) * 2020-04-23 2023-05-02 上海依图网络科技有限公司 一种基于rnn聚合方式的声纹识别方法与系统
CN111564163B (zh) * 2020-05-08 2023-12-15 宁波大学 一种基于rnn的多种伪造操作语音检测方法
CN111564163A (zh) * 2020-05-08 2020-08-21 宁波大学 一种基于rnn的多种伪造操作语音检测方法
CN111681669A (zh) * 2020-05-14 2020-09-18 上海眼控科技股份有限公司 一种基于神经网络的语音数据的识别方法与设备
CN111768801A (zh) * 2020-06-12 2020-10-13 瑞声科技(新加坡)有限公司 气流杂音消除方法、装置、计算机设备及存储介质
CN113948089A (zh) * 2020-06-30 2022-01-18 北京猎户星空科技有限公司 声纹模型训练和声纹识别方法、装置、设备及介质
CN111816205B (zh) * 2020-07-09 2023-06-20 中国人民解放军战略支援部队航天工程大学 一种基于飞机音频的机型智能识别方法
CN111816205A (zh) * 2020-07-09 2020-10-23 中国人民解放军战略支援部队航天工程大学 一种基于飞机音频的机型智能识别方法
CN111798840A (zh) * 2020-07-16 2020-10-20 中移在线服务有限公司 语音关键词识别方法和装置
CN111798840B (zh) * 2020-07-16 2023-08-08 中移在线服务有限公司 语音关键词识别方法和装置
CN111883106B (zh) * 2020-07-27 2024-04-19 腾讯音乐娱乐科技(深圳)有限公司 音频处理方法及装置
CN111883106A (zh) * 2020-07-27 2020-11-03 腾讯音乐娱乐科技(深圳)有限公司 音频处理方法及装置
CN114067834A (zh) * 2020-07-30 2022-02-18 中国移动通信集团有限公司 一种不良前导音识别方法、装置、存储介质和计算机设备
CN111816218A (zh) * 2020-07-31 2020-10-23 平安科技(深圳)有限公司 语音端点检测方法、装置、设备及存储介质
CN111816218B (zh) * 2020-07-31 2024-05-28 平安科技(深圳)有限公司 语音端点检测方法、装置、设备及存储介质
CN112163457A (zh) * 2020-09-03 2021-01-01 中国联合网络通信集团有限公司 一种通信电台识别方法及装置
CN112270933B (zh) * 2020-11-12 2024-03-12 北京猿力未来科技有限公司 一种音频识别方法和装置
CN112270933A (zh) * 2020-11-12 2021-01-26 北京猿力未来科技有限公司 一种音频识别方法和装置
CN113408539A (zh) * 2020-11-26 2021-09-17 腾讯科技(深圳)有限公司 数据识别方法、装置、电子设备及存储介质
CN112562691A (zh) * 2020-11-27 2021-03-26 平安科技(深圳)有限公司 一种声纹识别的方法、装置、计算机设备及存储介质
CN112382300A (zh) * 2020-12-14 2021-02-19 北京远鉴信息技术有限公司 声纹鉴定方法、模型训练方法、装置、设备及存储介质
CN112712820A (zh) * 2020-12-25 2021-04-27 广州欢城文化传媒有限公司 一种音色分类方法、装置、设备和介质
CN112637428A (zh) * 2020-12-29 2021-04-09 平安科技(深圳)有限公司 无效通话判断方法、装置、计算机设备及存储介质
CN112750446A (zh) * 2020-12-30 2021-05-04 标贝(北京)科技有限公司 语音转换方法、装置和系统及存储介质
CN112750446B (zh) * 2020-12-30 2024-05-24 标贝(青岛)科技有限公司 语音转换方法、装置和系统及存储介质
CN112822186A (zh) * 2020-12-31 2021-05-18 国网江苏省电力有限公司信息通信分公司 基于语音认证的电力系统ip调度台通知广播方法及系统
CN112883812B (zh) * 2021-01-22 2024-05-03 广东白云学院 一种基于深度学习的肺音分类方法、系统及存储介质
CN112883812A (zh) * 2021-01-22 2021-06-01 广州联智信息科技有限公司 一种基于深度学习的肺音分类方法、系统及存储介质
CN112951245B (zh) * 2021-03-09 2023-06-16 江苏开放大学(江苏城市职业学院) 一种融入静态分量的动态声纹特征提取方法
CN112951245A (zh) * 2021-03-09 2021-06-11 江苏开放大学(江苏城市职业学院) 一种融入静态分量的动态声纹特征提取方法
CN113178196A (zh) * 2021-04-20 2021-07-27 平安国际融资租赁有限公司 音频数据提取方法、装置、计算机设备和存储介质
CN113178196B (zh) * 2021-04-20 2023-02-07 平安国际融资租赁有限公司 音频数据提取方法、装置、计算机设备和存储介质
CN113299295A (zh) * 2021-05-11 2021-08-24 支付宝(杭州)信息技术有限公司 声纹编码网络的训练方法及装置
CN113299295B (zh) * 2021-05-11 2022-12-30 支付宝(杭州)信息技术有限公司 声纹编码网络的训练方法及装置
CN113271430A (zh) * 2021-05-13 2021-08-17 中国联合网络通信集团有限公司 网络视频会议中防干扰方法、系统、设备及存储介质
CN113271430B (zh) * 2021-05-13 2022-11-18 中国联合网络通信集团有限公司 网络视频会议中防干扰方法、系统、设备及存储介质
CN113393832A (zh) * 2021-06-03 2021-09-14 清华大学深圳国际研究生院 一种基于全局情感编码的虚拟人动画合成方法及系统
CN113393832B (zh) * 2021-06-03 2023-10-10 清华大学深圳国际研究生院 一种基于全局情感编码的虚拟人动画合成方法及系统
CN113421573B (zh) * 2021-06-18 2024-03-19 马上消费金融股份有限公司 身份识别模型训练方法、身份识别方法及装置
CN113421573A (zh) * 2021-06-18 2021-09-21 马上消费金融股份有限公司 身份识别模型训练方法、身份识别方法及装置
CN113593581A (zh) * 2021-07-12 2021-11-02 西安讯飞超脑信息科技有限公司 声纹判别方法、装置、计算机设备和存储介质
CN113593581B (zh) * 2021-07-12 2024-04-19 西安讯飞超脑信息科技有限公司 声纹判别方法、装置、计算机设备和存储介质
CN113804767B (zh) * 2021-08-16 2022-11-04 东南大学 一种螺栓失效检测方法
CN113804767A (zh) * 2021-08-16 2021-12-17 东南大学 一种螺栓失效检测方法
CN113838469A (zh) * 2021-09-09 2021-12-24 竹间智能科技(上海)有限公司 一种身份识别方法、系统及存储介质
CN113870840A (zh) * 2021-09-27 2021-12-31 京东科技信息技术有限公司 语音识别方法、装置及相关设备
CN114974259A (zh) * 2021-12-23 2022-08-30 号百信息服务有限公司 一种声纹识别方法
CN114495948A (zh) * 2022-04-18 2022-05-13 北京快联科技有限公司 一种声纹识别方法及装置
CN115171700A (zh) * 2022-06-13 2022-10-11 武汉大学 一种基于脉冲神经网络的声纹识别语音助手方法
CN115171700B (zh) * 2022-06-13 2024-04-26 武汉大学 一种基于脉冲神经网络的声纹识别语音助手方法
CN115223576A (zh) * 2022-06-23 2022-10-21 国网江苏省电力有限公司南京供电分公司 基于mfcc的变压器声纹特征可控精度提取和识别方法与系统
CN115472168A (zh) * 2022-08-24 2022-12-13 武汉理工大学 耦合bgcc和pwpe特征的短时语音声纹识别方法、系统及设备
CN115472168B (zh) * 2022-08-24 2024-04-19 武汉理工大学 耦合bgcc和pwpe特征的短时语音声纹识别方法、系统及设备
CN115457968A (zh) * 2022-08-26 2022-12-09 华南理工大学 基于混合分辨率深度可分卷积网络的声纹确认方法
CN117475360B (zh) * 2023-12-27 2024-03-26 南京纳实医学科技有限公司 基于改进型mlstm-fcn的音视频特点的生物特征提取与分析方法
CN117475360A (zh) * 2023-12-27 2024-01-30 南京纳实医学科技有限公司 基于改进型mlstm-fcn的音视频特点的生物体征提取与分析方法
CN117577137A (zh) * 2024-01-15 2024-02-20 宁德时代新能源科技股份有限公司 切刀健康评估方法、装置、设备及存储介质
CN117577137B (zh) * 2024-01-15 2024-05-28 宁德时代新能源科技股份有限公司 切刀健康评估方法、装置、设备及存储介质

Also Published As

Publication number Publication date
CN107610707B (zh) 2018-08-31
CN107610707A (zh) 2018-01-19

Similar Documents

Publication Publication Date Title
WO2018107810A1 (fr) Procédé et appareil de reconnaissance d'empreinte vocale, et dispositif électronique et support
WO2021208287A1 (fr) Procédé et appareil de détection d'activité vocale pour reconnaissance d'émotion, dispositif électronique et support de stockage
TWI641965B (zh) 基於聲紋識別的身份驗證的方法及系統
Liu et al. GMM and CNN hybrid method for short utterance speaker recognition
CN109817246B (zh) 情感识别模型的训练方法、情感识别方法、装置、设备及存储介质
US11030998B2 (en) Acoustic model training method, speech recognition method, apparatus, device and medium
US9685155B2 (en) Method for distinguishing components of signal of environment
CN106486131B (zh) 一种语音去噪的方法及装置
WO2016155047A1 (fr) Procédé de reconnaissance d'événement sonore dans une scène auditive ayant un rapport signal sur bruit faible
WO2020034628A1 (fr) Procédé et dispositif d'identification d'accents, dispositif informatique et support d'informations
Vyas A Gaussian mixture model based speech recognition system using Matlab
WO2019136912A1 (fr) Dispositif électronique, procédé et système d'authentification d'identité, et support de stockage
WO2019237518A1 (fr) Procédé d'établissement de bibliothèque de modèles, procédé et appareil de reconnaissance vocale, ainsi que dispositif et support
CN112053694A (zh) 一种基于cnn与gru网络融合的声纹识别方法
CN112735435A (zh) 具备未知类别内部划分能力的声纹开集识别方法
Al-Kaltakchi et al. Thorough evaluation of TIMIT database speaker identification performance under noise with and without the G. 712 type handset
Xue et al. Cross-modal information fusion for voice spoofing detection
CN111785262B (zh) 一种基于残差网络及融合特征的说话人年龄性别分类方法
Rahman et al. Dynamic thresholding on speech segmentation
Li et al. A Convolutional Neural Network with Non-Local Module for Speech Enhancement.
Elnaggar et al. A new unsupervised short-utterance based speaker identification approach with parametric t-SNE dimensionality reduction
CN111310836B (zh) 一种基于声谱图的声纹识别集成模型的防御方法及防御装置
Mini et al. Feature vector selection of fusion of MFCC and SMRT coefficients for SVM classifier based speech recognition system
Aggarwal et al. Grid search analysis of nu-SVC for text-dependent speaker-identification
Zhipeng et al. Voiceprint recognition based on BP Neural Network and CNN

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17881766

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17881766

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS (EPO FORM 1205A DATED 04.10.2019)

122 Ep: pct application non-entry in european phase

Ref document number: 17881766

Country of ref document: EP

Kind code of ref document: A1