CN112614493B - Voiceprint recognition method, system, storage medium and electronic device - Google Patents
- Publication number
- CN112614493B (granted publication of application CN202011409154.8A)
- Authority
- CN
- China
- Prior art keywords
- neural network
- voiceprint
- convolutional neural
- layer
- convolutional
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G: Physics
- G10: Musical instruments; acoustics
- G10L: Speech analysis techniques or speech synthesis; speech recognition; speech or voice processing techniques; speech or audio coding or decoding
- G10L17/00: Speaker identification or verification techniques
- G10L17/02: Preprocessing operations, e.g. segment selection; pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; feature selection or extraction
- G10L17/04: Training, enrolment or model building
- G10L17/06: Decision making techniques; pattern matching strategies
- G10L17/18: Artificial neural networks; connectionist approaches
- G10L25/24: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00, characterised by the extracted parameters being the cepstrum
Abstract
The application relates to the technical field of voiceprint recognition, and in particular to a voiceprint recognition method, a voiceprint recognition system, a storage medium and an electronic device, solving the problem in the related art that square convolution with a fixed receptive field leads to a poor final voiceprint recognition effect. The method comprises the following steps: extracting the voiceprint features to be verified from the voice information through a pre-trained convolutional neural network model, the model being obtained by training a convolutional neural network that comprises a deformable convolutional layer; comparing the similarity of the voiceprint features to be verified with the registered voiceprint features; and judging whether the similarity result is greater than a preset threshold value, voiceprint recognition being successful if it is. Because the voiceprint features are extracted through a convolutional neural network with an added deformable convolutional layer, the receptive field adapts to different voiceprint features, the resulting convolutional neural network model is more robust, and voiceprint recognition accuracy is improved.
Description
Technical Field
The present application relates to the field of voiceprint recognition technologies, and in particular, to a voiceprint recognition method, a voiceprint recognition system, a storage medium, and an electronic device.
Background
Voiceprint recognition is a technology for voice-based identity authentication and is one form of biometric recognition. Its fields of application are very wide and will continue to expand with the development of intelligent speech technology. In recent years, deep learning has become a hot topic in voiceprint recognition, and voiceprint recognition systems modeled with deep convolutional neural networks have shown large gains in recognition performance, benefiting from large amounts of labeled audio data.
In existing voiceprint recognition methods modeled with deep convolutional neural networks, the convolution kernel performs the convolution operation over a local region of the input features. With traditional square convolution, only voiceprint features within a fixed square region can be sampled, the receptive field cannot adapt to different voiceprint features, and the final voiceprint recognition effect is therefore poor.
Disclosure of Invention
In view of the above problems, the present application provides a voiceprint recognition method, system, storage medium, and electronic device, which solve the technical problem in the related art that a final voiceprint recognition effect is poor due to the square convolution with a fixed receptive field.
In a first aspect, the present application provides a voiceprint recognition method, including:
receiving voice information;
extracting the voiceprint features to be verified in the voice information through a pre-trained convolutional neural network model; the convolutional neural network model which is trained in advance is obtained by training a convolutional neural network comprising a deformable convolutional layer;
comparing the similarity of the voiceprint features to be verified and the registered voiceprint features which are registered in advance to obtain a similarity result;
and judging whether the similarity result is greater than a preset threshold value, and if the similarity result is greater than the preset threshold value, successfully identifying the voiceprint.
Optionally, the comparing the similarity between the voiceprint feature to be verified and the registered voiceprint feature that is registered in advance to obtain a similarity result includes:
and calculating the similarity between the voiceprint features to be verified and the registered voiceprint features which are registered in advance by a cosine calculation method to obtain a similarity result.
Optionally, the process of registering the voiceprint feature includes:
receiving registration voice information;
extracting registration voiceprint characteristics in the registration voice information through a pre-trained convolutional neural network model; the convolutional neural network model which is trained in advance is obtained by training a convolutional neural network comprising a deformable convolutional layer.
Optionally, the training process of the convolutional neural network model includes:
establishing a convolutional neural network; the convolutional neural network comprises a first convolutional layer, a first pooling layer, a deformable convolutional layer, a second pooling layer, a second convolutional layer and a full-connection layer which are sequentially arranged; the first convolutional layer comprises a first sub-convolutional layer and a second sub-convolutional layer, the deformable convolutional layer comprises a first sub-deformable convolutional layer and a second sub-deformable convolutional layer;
and training the convolutional neural network by taking the training voiceprint features which are marked in advance as input to obtain the convolutional neural network model.
Optionally, the training voiceprint feature is a mel-frequency cepstrum coefficient feature.
Optionally, the deformable convolution layer is configured to add an offset parameter to each element of the convolution kernel to obtain the adaptive receptive field.
Optionally, the first pooling layer and the second pooling layer are used to reduce feature size, enlarge the receptive field, and/or reduce the computational effort.
In a second aspect, the present application provides a voiceprint recognition system, the system comprising:
a receiving unit for receiving voice information;
the extracting unit is used for extracting the voiceprint features to be verified in the voice information through a pre-trained convolutional neural network model; the convolutional neural network model which is trained in advance is obtained by training a convolutional neural network comprising a deformable convolutional layer;
the comparison unit is used for comparing the similarity of the voiceprint features to be verified and the registered voiceprint features which are registered in advance to obtain a similarity result;
and the verification unit is used for judging whether the similarity result is greater than a preset threshold value or not, and if the similarity result is greater than the preset threshold value, the voiceprint recognition is successful.
In a third aspect, a storage medium stores a computer program executable by one or more processors; when executed, the computer program implements the voiceprint recognition method described in the first aspect above.
In a fourth aspect, an electronic device comprises a memory and a processor, the memory having a computer program stored thereon, the memory and the processor being communicatively connected to each other, the computer program, when executed by the processor, performing the voiceprint recognition method as described in the first aspect above.
The application provides a voiceprint recognition method, system, storage medium and electronic device. The method comprises: receiving voice information; extracting the voiceprint features to be verified from the voice information through a pre-trained convolutional neural network model, the pre-trained model being obtained by training a convolutional neural network that comprises a deformable convolutional layer; comparing the similarity of the voiceprint features to be verified with the pre-registered voiceprint features to obtain a similarity result; and judging whether the similarity result is greater than a preset threshold value, voiceprint recognition being successful if it is. Because the voiceprint features are extracted through a convolutional neural network with an added deformable convolutional layer, the receptive field adapts to different voiceprint features, the resulting convolutional neural network model is more robust, and voiceprint recognition accuracy is improved.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only embodiments of the present invention; those skilled in the art can derive other drawings from the provided drawings without creative effort.
Fig. 1 is a schematic flowchart of a voiceprint recognition method according to an embodiment of the present application;
fig. 2 is a schematic structural diagram of a convolutional neural network provided in an embodiment of the present application;
fig. 3 is a schematic structural diagram of a voiceprint recognition system according to an embodiment of the present application;
fig. 4 is a connection block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The following detailed description is provided with reference to the accompanying drawings and embodiments, so that how the technical means are applied to solve the technical problems and achieve the corresponding technical effects can be fully understood and implemented. Provided there is no conflict, the embodiments of the present application and the various features within them can be combined with each other, and the resulting technical solutions all fall within the scope of protection of the present application.
As noted in the background, in existing voiceprint recognition methods modeled with deep convolutional neural networks, the convolution kernel performs the convolution operation over a local region of the input features. With traditional square convolution, only voiceprint features within a fixed square region can be sampled, and the receptive field cannot adapt to different voiceprint features, so the final voiceprint recognition effect is poor.
In view of this, the present application provides a voiceprint recognition method, system, storage medium and electronic device, which solve the technical problem in the related art that the final voiceprint recognition effect is poor due to the square convolution with a fixed receptive field.
Example one
Fig. 1 is a schematic flow chart of a voiceprint recognition method provided in an embodiment of the present application, and as shown in fig. 1, the method includes:
s101, receiving voice information;
s102, extracting voiceprint features to be verified in the voice information through a pre-trained convolutional neural network model;
in step S102, the pre-trained convolutional neural network model is trained by a convolutional neural network including a deformable convolutional layer.
S103, comparing the similarity of the voiceprint features to be verified and the registered voiceprint features which are registered in advance to obtain a similarity result;
and S104, judging whether the similarity result is greater than a preset threshold value, and if the similarity result is greater than the preset threshold value, successfully identifying the voiceprint.
It should be noted that, because the set of people the voiceprint recognition system must verify is not fixed, retraining the convolutional neural network model every time a person is added is impractical. The convolutional neural network model therefore serves as a feature extractor throughout the method and is not used as a classifier or recognizer.
Optionally, the comparing the similarity between the voiceprint feature to be verified and the registered voiceprint feature that is registered in advance to obtain a similarity result includes:
and calculating the similarity between the voiceprint features to be verified and the registered voiceprint features which are registered in advance by a cosine calculation method to obtain a similarity result.
It should be noted that the present invention is not limited to calculating the similarity between the voiceprint feature to be verified and the pre-registered voiceprint feature by the cosine method; other calculation methods may be used as needed, as long as the similarity between the two features is ultimately obtained.
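A minimal sketch of the cosine similarity comparison described above (the embedding values below are made up for illustration; in practice both vectors come from the convolutional neural network model):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two voiceprint embeddings."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Illustrative embeddings (in practice these come from the trained model).
enrolled = np.array([0.2, 0.9, -0.4, 0.1])
probe = np.array([0.25, 0.85, -0.35, 0.05])

score = cosine_similarity(enrolled, probe)
print(round(score, 3))  # close to 1.0 for similar voiceprints
```

The similarity result is then compared against the preset threshold to decide whether recognition succeeds.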
Optionally, the process of registering the voiceprint feature includes:
receiving registration voice information;
extracting registration voiceprint characteristics in the registration voice information through a pre-trained convolutional neural network model; the convolutional neural network model which is trained in advance is obtained by training a convolutional neural network comprising a deformable convolutional layer.
It should be noted that, to handle growth in the number of people to be identified, a pre-registration approach may be adopted: the voiceprint features of newly added people are registered so that they can be used for similarity comparison in subsequent voiceprint identification.
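A minimal sketch of this registration-then-verification flow; `extract_voiceprint` is a hypothetical stand-in for the trained convolutional neural network model, and the threshold value is illustrative rather than taken from the patent:

```python
import numpy as np

THRESHOLD = 0.8  # illustrative preset threshold

def extract_voiceprint(audio: np.ndarray) -> np.ndarray:
    """Stand-in for the pre-trained CNN feature extractor (returns unit vector)."""
    v = audio[:4].astype(float)
    return v / (np.linalg.norm(v) + 1e-9)

registry = {}  # speaker id -> registered voiceprint feature

def enroll(speaker_id: str, audio: np.ndarray) -> None:
    """Registration: extract and store the registration voiceprint feature."""
    registry[speaker_id] = extract_voiceprint(audio)

def verify(speaker_id: str, audio: np.ndarray) -> bool:
    """Verification: compare against the registered feature, apply threshold."""
    probe = extract_voiceprint(audio)
    score = float(np.dot(probe, registry[speaker_id]))  # cosine (unit vectors)
    return score > THRESHOLD

enroll("alice", np.array([1.0, 2.0, 3.0, 4.0]))
print(verify("alice", np.array([1.1, 2.0, 2.9, 4.1])))  # True: similar audio
```

Because enrollment only stores an embedding, new users can be added without retraining the model, matching the note above.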
Optionally, the training process of the convolutional neural network model includes:
establishing a convolutional neural network; the convolutional neural network comprises a first convolutional layer, a first pooling layer, a deformable convolutional layer, a second pooling layer, a second convolutional layer and a full-connection layer which are sequentially arranged; the first convolutional layer comprises a first sub-convolutional layer and a second sub-convolutional layer, the deformable convolutional layer comprises a first sub-deformable convolutional layer and a second sub-deformable convolutional layer;
and training the convolutional neural network by taking the training voiceprint features which are marked in advance as input to obtain the convolutional neural network model.
It should be noted that the first convolution layer is used for performing preliminary extraction on the input training voiceprint features to obtain the intermediate layer features, which is convenient for subsequent continuous processing of the deformable convolution layer and the like.
Specifically, fig. 2 shows the schematic structural diagram of the convolutional neural network provided in the embodiment of the present application. Training voiceprint features are input to the first convolutional layer, where feature extraction yields intermediate-layer features; these are then processed by each subsequent layer in turn and output through the fully connected layer.
Optionally, the training voiceprint feature is a mel-frequency cepstrum coefficient feature.
It should be noted that mel-frequency cepstral coefficients (MFCCs) are cepstral parameters extracted on the mel frequency scale, which describes the nonlinear frequency response of the human ear. Because these features make no assumptions about or place no restrictions on the input signal, and draw on results from auditory modeling research, they are more robust than LPCC features based on the vocal-tract model, better match the auditory characteristics of the human ear, and retain good recognition performance when the signal-to-noise ratio decreases.
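A minimal sketch of how MFCC features of this kind can be computed (frame length, hop size, filter count, and coefficient count below are common illustrative defaults, not values from the patent):

```python
import numpy as np

def hz_to_mel(f):
    # The mel scale models the nonlinear frequency response of the human ear.
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr=16000, n_fft=512, hop=160, n_mels=26, n_ceps=13):
    """Minimal MFCC computation; all parameter defaults are illustrative."""
    # 1. Frame the signal and apply a Hamming window; take the power spectrum.
    frames = [signal[s:s + n_fft] * np.hamming(n_fft)
              for s in range(0, len(signal) - n_fft + 1, hop)]
    power = np.abs(np.fft.rfft(np.asarray(frames), n_fft)) ** 2 / n_fft

    # 2. Build a triangular mel filterbank and take log filterbank energies.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fbank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fbank[i - 1, k] = (right - k) / max(right - center, 1)
    log_mel = np.log(power @ fbank.T + 1e-10)

    # 3. DCT-II decorrelates the log energies; keep the first n_ceps coefficients.
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1) / (2.0 * n_mels)))
    return log_mel @ dct.T

sig = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # 1 s of a 440 Hz tone
feats = mfcc(sig)
print(feats.shape)  # (number of frames, 13)
```

In practice a library such as librosa would typically be used; the sketch only makes the mel-scale and cepstrum steps mentioned above concrete.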
Optionally, the deformable convolution layer is configured to add an offset parameter to each element of the convolution kernel to obtain the adaptive receptive field.
It should be noted that after the training voiceprint features are input to the first convolutional layer, feature extraction yields intermediate-layer features, and at the same time a specific offset for each element of the convolution kernel is learned. When training reaches the deformable convolutional layer, an offset parameter is added to each kernel element; this offset parameter allows the sampling network's receptive field to adapt to the shape of the target under measurement, so that the most accurate features are obtained.
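The offset mechanism can be illustrated with a minimal numerical sketch in the style of deformable convolution, where a learned (dy, dx) shift is added to each kernel element's sampling point and fractional positions are resolved by bilinear interpolation; the feature map, kernel weights, and offsets below are made up for illustration:

```python
import numpy as np

def bilinear_sample(feat: np.ndarray, y: float, x: float) -> float:
    """Sample a 2-D feature map at a fractional (y, x) location."""
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, feat.shape[0] - 1), min(x0 + 1, feat.shape[1] - 1)
    wy, wx = y - y0, x - x0
    return ((1 - wy) * (1 - wx) * feat[y0, x0] + (1 - wy) * wx * feat[y0, x1]
            + wy * (1 - wx) * feat[y1, x0] + wy * wx * feat[y1, x1])

def deformable_response(feat, weights, center, offsets):
    """One output of a 3x3 deformable convolution centered at `center`.

    `offsets` holds a learned (dy, dx) for each of the 9 kernel elements,
    shifting its sampling point away from the regular square grid.
    """
    cy, cx = center
    out = 0.0
    taps = [(ky, kx) for ky in (-1, 0, 1) for kx in (-1, 0, 1)]
    for i, (ky, kx) in enumerate(taps):
        dy, dx = offsets[i]
        out += weights[i] * bilinear_sample(feat, cy + ky + dy, cx + kx + dx)
    return out

feat = np.arange(25, dtype=float).reshape(5, 5)
weights = np.ones(9) / 9.0                # averaging kernel for clarity
zero = np.zeros((9, 2))                   # zero offsets: plain square conv
print(deformable_response(feat, weights, (2, 2), zero))     # 12.0, the 3x3 mean
shifted = np.full((9, 2), 0.5)            # every tap moved by (+0.5, +0.5)
print(deformable_response(feat, weights, (2, 2), shifted))  # 15.0, off-grid field
```

With zero offsets the result equals ordinary square convolution; nonzero offsets move the sampling points, which is how the receptive field adapts its shape.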
Optionally, the first pooling layer and the second pooling layer are used to reduce feature size, enlarge the receptive field, and/or reduce the computational effort.
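A minimal illustration of how a pooling layer reduces feature size so that each output position summarizes a larger input region, thereby enlarging the receptive field and reducing computation (the 2x2 window and example values are illustrative):

```python
import numpy as np

def max_pool2x2(x: np.ndarray) -> np.ndarray:
    """2x2 max pooling with stride 2: halves each spatial dimension."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    x = x[:h, :w].reshape(h // 2, 2, w // 2, 2)
    return x.max(axis=(1, 3))

feat = np.arange(16.0).reshape(4, 4)
pooled = max_pool2x2(feat)
print(pooled.shape)  # (2, 2): each output summarizes a 2x2 input region
print(pooled)
```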
In summary, an embodiment of the present application provides a voiceprint recognition method, comprising: receiving voice information; extracting the voiceprint features to be verified from the voice information through a pre-trained convolutional neural network model, the pre-trained model being obtained by training a convolutional neural network that comprises a deformable convolutional layer; comparing the similarity of the voiceprint features to be verified with the pre-registered voiceprint features to obtain a similarity result; and judging whether the similarity result is greater than a preset threshold value, voiceprint recognition being successful if it is. Because the voiceprint features are extracted through a convolutional neural network with an added deformable convolutional layer, the receptive field adapts to different voiceprint features, the resulting convolutional neural network model is more robust, and voiceprint recognition accuracy is improved.
Example two
Based on the voiceprint recognition method disclosed in the above embodiment of the present invention, fig. 3 specifically discloses a voiceprint recognition system applying the voiceprint recognition method.
As shown in fig. 3, the embodiment of the present invention discloses a voiceprint recognition system, which includes:
a receiving unit 301, configured to receive voice information;
an extracting unit 302, configured to extract a voiceprint feature to be verified in the voice information through a pre-trained convolutional neural network model; the convolutional neural network model which is trained in advance is obtained by training a convolutional neural network comprising a deformable convolutional layer;
a comparing unit 303, configured to compare similarity between the voiceprint feature to be verified and a registered voiceprint feature that is registered in advance, to obtain a similarity result;
the verification unit 304 is configured to determine whether the similarity result is greater than a preset threshold, and if the similarity result is greater than the preset threshold, the voiceprint recognition is successful.
For the specific working processes of the receiving unit 301, the extracting unit 302, the comparing unit 303 and the verifying unit 304 in the voiceprint recognition system disclosed in the embodiment of the present invention, reference may be made to the corresponding contents in the voiceprint recognition method disclosed in the above embodiment of the present invention, and details are not repeated here.
In summary, an embodiment of the present application provides a voiceprint recognition system that: receives voice information; extracts the voiceprint features to be verified from the voice information through a pre-trained convolutional neural network model, the pre-trained model being obtained by training a convolutional neural network that comprises a deformable convolutional layer; compares the similarity of the voiceprint features to be verified with the pre-registered voiceprint features to obtain a similarity result; and judges whether the similarity result is greater than a preset threshold value, voiceprint recognition being successful if it is. Because the voiceprint features are extracted through a convolutional neural network with an added deformable convolutional layer, the receptive field adapts to different voiceprint features, the resulting convolutional neural network model is more robust, and voiceprint recognition accuracy is improved.
EXAMPLE III
The present embodiment further provides a computer-readable storage medium, such as a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory), a random access memory (RAM), a static random access memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, a server, or an app marketplace, on which a computer program is stored. When executed by a processor, the computer program implements the method steps of the first embodiment; details are not repeated in this embodiment.
Example four
Fig. 4 is a connection block diagram of an electronic device 500 according to an embodiment of the present application, and as shown in fig. 4, the electronic device 500 may include: a processor 501, a memory 502, a multimedia component 503, an input/output (I/O) interface 504, and a communication component 505.
The processor 501 is configured to execute all or part of the steps in the voiceprint recognition method according to the first embodiment. The memory 502 is used to store various types of data, which may include, for example, instructions for any application or method in the electronic device, as well as application-related data.
The processor 501 may be implemented by an application-specific integrated circuit (ASIC), a digital signal processor (DSP), a digital signal processing device (DSPD), a programmable logic device (PLD), a field-programmable gate array (FPGA), a controller, a microcontroller, a microprocessor, or other electronic components, and is configured to perform the voiceprint recognition method of the first embodiment.
The memory 502 may be implemented by any type of volatile or non-volatile memory device, or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk.
The multimedia component 503 may include a screen, which may be a touch screen, and an audio component for outputting and/or inputting audio signals. For example, the audio component may include a microphone for receiving external audio signals. The received audio signal may further be stored in a memory or transmitted through a communication component. The audio assembly also includes at least one speaker for outputting audio signals.
The I/O interface 504 provides an interface between the processor 501 and other interface modules, such as a keyboard, mouse, buttons, etc. These buttons may be virtual buttons or physical buttons.
The communication component 505 is used for wired or wireless communication between the electronic device 500 and other devices. Wireless communication may use, for example, Wi-Fi, Bluetooth, Near Field Communication (NFC), 2G, 3G, or 4G, or a combination of one or more of them; accordingly, the communication component 505 may include a Wi-Fi module, a Bluetooth module, and an NFC module.
In summary, the present application provides a voiceprint recognition method, system, storage medium, and electronic device, where the method includes: receiving voice information; extracting the voiceprint features to be verified from the voice information through a pre-trained convolutional neural network model, the pre-trained model being obtained by training a convolutional neural network that comprises a deformable convolutional layer; comparing the similarity of the voiceprint features to be verified with the pre-registered voiceprint features to obtain a similarity result; and judging whether the similarity result is greater than a preset threshold value, voiceprint recognition being successful if it is. Because the voiceprint features are extracted through a convolutional neural network with an added deformable convolutional layer, the receptive field adapts to different voiceprint features, the resulting convolutional neural network model is more robust, and voiceprint recognition accuracy is improved.
In the embodiments provided in the present application, it should be understood that the disclosed method can be implemented in other ways. The above-described method embodiments are merely illustrative.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a … …" does not exclude the presence of another identical element in a process, method, article, or apparatus that comprises the element.
Although the embodiments disclosed in the present application are described above, the above descriptions are only for the convenience of understanding the present application, and are not intended to limit the present application. It will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the disclosure as defined by the appended claims.
Claims (9)
1. A voiceprint recognition method, the method comprising:
receiving voice information;
extracting the voiceprint features to be verified in the voice information through a pre-trained convolutional neural network model; the convolutional neural network model which is trained in advance is obtained by training a convolutional neural network comprising a deformable convolutional layer;
comparing the similarity of the voiceprint features to be verified and the registered voiceprint features which are registered in advance to obtain a similarity result;
determining whether the similarity result is greater than a preset threshold, wherein if the similarity result is greater than the preset threshold, the voiceprint recognition succeeds;
the training process of the convolutional neural network model comprises the following steps:
establishing a convolutional neural network, wherein the convolutional neural network comprises a first convolutional layer, a first pooling layer, a deformable convolutional layer, a second pooling layer, a second convolutional layer, and a fully connected layer arranged in sequence; the first convolutional layer comprises a first sub-convolutional layer and a second sub-convolutional layer; and the deformable convolutional layer comprises a first sub-deformable convolutional layer and a second sub-deformable convolutional layer; and
training the convolutional neural network with pre-labeled training voiceprint features as input to obtain the convolutional neural network model.
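The layer sequence recited in claim 1 can be traced with the standard convolution/pooling output-size formula. The kernel sizes, strides, padding, and the 64-unit input size below are illustrative assumptions — the claim does not specify these hyperparameters — so this is a sketch of how feature-map size evolves through the claimed sequence, not the patented configuration.

```python
def conv_out(size, kernel, stride=1, pad=0):
    # Standard output-size formula shared by convolution and pooling layers.
    return (size + 2 * pad - kernel) // stride + 1

# Hypothetical hyperparameters for the claimed sequence:
# conv -> pool -> deformable conv -> pool -> conv (fully connected last).
layers = [
    ("first conv", 3, 1, 1),       # (name, kernel, stride, pad)
    ("first pool", 2, 2, 0),
    ("deformable conv", 3, 1, 1),  # offsets bend the sampling grid, not the size
    ("second pool", 2, 2, 0),
    ("second conv", 3, 1, 1),
]

size = 64  # assumed input feature-map side length
for name, k, s, p in layers:
    size = conv_out(size, k, s, p)
```

Each pooling stage halves the spatial size here, which is consistent with claim 6's size-reduction role; the two convolutions with padding 1 preserve size.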
2. The method according to claim 1, wherein comparing the similarity between the voiceprint feature to be verified and the pre-registered voiceprint feature to obtain a similarity result comprises:
calculating the similarity between the voiceprint features to be verified and the pre-registered voiceprint features using a cosine similarity calculation to obtain the similarity result.
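The cosine comparison of claim 2 can be sketched in a few lines. The feature vectors and the 0.75 threshold below are illustrative stand-ins — the patent leaves the embedding dimension and the preset threshold unspecified.

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two feature vectors: 1.0 means the same
    # direction (very similar voiceprints), 0.0 means orthogonal features.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

probe = [0.2, 0.8, 0.1]     # hypothetical voiceprint feature to be verified
enrolled = [0.2, 0.8, 0.1]  # hypothetical pre-registered voiceprint feature
score = cosine_similarity(probe, enrolled)
accepted = score > 0.75     # assumed preset threshold per claim 1
```

Cosine similarity ignores vector magnitude, which is why it is a common choice for comparing speaker embeddings whose scale varies with utterance loudness and length.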
3. The method of claim 1, wherein the registration process for registering the voiceprint feature comprises:
receiving registration voice information;
extracting registration voiceprint features from the registration voice information through the pre-trained convolutional neural network model, wherein the pre-trained convolutional neural network model is obtained by training a convolutional neural network comprising a deformable convolutional layer.
4. The method of claim 1, wherein the training voiceprint features are mel-frequency cepstral coefficient features.
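Mel-frequency cepstral coefficients, as referenced in claim 4, are obtained by applying a discrete cosine transform to log mel filterbank energies. The sketch below shows only that final DCT step (the framing, FFT, and mel filterbank stages are omitted), using an unnormalized type-II DCT and hypothetical energy values.

```python
import math

def dct2(x):
    # Unnormalized type-II DCT: the step that turns log mel energies
    # into cepstral coefficients.
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * k * (i + 0.5) / n) for i in range(n))
            for k in range(n)]

# Hypothetical log mel filterbank energies for one audio frame.
log_energies = [2.1, 2.0, 1.8, 1.5, 1.2, 1.0, 0.9, 0.8]
mfcc = dct2(log_energies)  # typically only the first ~13 coefficients are kept
```

A useful sanity check: a constant energy vector puts all its energy into coefficient 0, since the higher DCT basis functions are orthogonal to a constant.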
5. The method of claim 1, wherein the deformable convolutional layer is configured to add an offset parameter to each element of its convolution kernel to obtain an adaptive receptive field.
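The per-element offset idea of claim 5 can be sketched for a single output position, assuming a 3×3 kernel and bilinear interpolation at the fractional sampling positions (the standard way deformable convolution reads off-grid values). The toy feature map, weights, and offsets are all illustrative; in a real network the offsets are predicted by an auxiliary convolution over the same input.

```python
import math

def bilinear(img, y, x):
    # Sample img at a fractional (y, x) position with bilinear interpolation,
    # clamping coordinates to the image border.
    h, w = len(img), len(img[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    return (img[y0][x0] * (1 - dy) * (1 - dx) + img[y0][x1] * (1 - dy) * dx +
            img[y1][x0] * dy * (1 - dx) + img[y1][x1] * dy * dx)

def deformable_conv_at(img, kernel, offsets, cy, cx):
    # One output position of a deformable convolution: each kernel element
    # (i, j) carries its own offset (dy, dx), so the sampling grid bends
    # instead of staying a rigid square -- the adaptive receptive field.
    k = len(kernel)
    r = k // 2
    out = 0.0
    for i in range(k):
        for j in range(k):
            dy, dx = offsets[i][j]
            out += kernel[i][j] * bilinear(img, cy + i - r + dy, cx + j - r + dx)
    return out

img = [[5 * i + j for j in range(5)] for i in range(5)]  # toy 5x5 feature map
kernel = [[1.0] * 3 for _ in range(3)]                   # illustrative weights
zero = [[(0.0, 0.0)] * 3 for _ in range(3)]              # reduces to plain conv
shift = [[(0.0, 1.0)] * 3 for _ in range(3)]             # grid bent one column right
```

With all offsets zero this reduces exactly to an ordinary 3×3 convolution, which makes the deformable layer a strict generalization of the standard convolutional layer.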
6. The method of claim 1, wherein the first pooling layer and the second pooling layer are configured to reduce feature size, enlarge the receptive field, and/or reduce computation.
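The size-reduction role described in claim 6 can be shown with a 2×2 max-pooling sketch. The window size and stride of 2 are assumed (the claim does not fix them), and the input values are illustrative.

```python
def max_pool_2x2(fm):
    # Halves each spatial dimension; each output keeps the strongest
    # activation in its 2x2 window. Because each later-layer cell now covers
    # twice the input area, the effective receptive field is enlarged and
    # subsequent layers do a quarter of the work.
    return [[max(fm[i][j], fm[i][j + 1], fm[i + 1][j], fm[i + 1][j + 1])
             for j in range(0, len(fm[0]) - 1, 2)]
            for i in range(0, len(fm) - 1, 2)]

fm = [[1, 2, 3, 4],
      [5, 6, 7, 8],
      [9, 10, 11, 12],
      [13, 14, 15, 16]]
pooled = max_pool_2x2(fm)  # 4x4 feature map -> 2x2
```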
7. A voiceprint recognition system, said system comprising:
a receiving unit configured to receive voice information;
an extracting unit configured to extract the voiceprint features to be verified from the voice information through a pre-trained convolutional neural network model, wherein the pre-trained convolutional neural network model is obtained by training a convolutional neural network comprising a deformable convolutional layer;
a comparison unit configured to compare the similarity between the voiceprint features to be verified and pre-registered voiceprint features to obtain a similarity result; and
a verification unit configured to determine whether the similarity result is greater than a preset threshold, wherein if the similarity result is greater than the preset threshold, the voiceprint recognition succeeds;
the training process of the convolutional neural network model comprises the following steps:
establishing a convolutional neural network, wherein the convolutional neural network comprises a first convolutional layer, a first pooling layer, a deformable convolutional layer, a second pooling layer, a second convolutional layer, and a fully connected layer arranged in sequence; the first convolutional layer comprises a first sub-convolutional layer and a second sub-convolutional layer; and the deformable convolutional layer comprises a first sub-deformable convolutional layer and a second sub-deformable convolutional layer; and
training the convolutional neural network with pre-labeled training voiceprint features as input to obtain the convolutional neural network model.
8. A storage medium storing a computer program executable by one or more processors to implement the voiceprint recognition method according to any one of claims 1 to 6.
9. An electronic device comprising a memory and a processor communicatively connected to each other, wherein the memory stores a computer program which, when executed by the processor, performs the voiceprint recognition method according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011409154.8A | 2020-12-04 | 2020-12-04 | Voiceprint recognition method, system, storage medium and electronic device
Publications (2)
Publication Number | Publication Date |
---|---|
CN112614493A CN112614493A (en) | 2021-04-06 |
CN112614493B true CN112614493B (en) | 2022-11-11 |
Family
ID=75228922
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011409154.8A | Voiceprint recognition method, system, storage medium and electronic device | 2020-12-04 | 2020-12-04
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112614493B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113327604A (en) * | 2021-07-02 | 2021-08-31 | 因诺微科技(天津)有限公司 | Ultrashort speech language identification method |
CN114093370B (en) * | 2022-01-19 | 2022-04-29 | 珠海市杰理科技股份有限公司 | Voiceprint recognition method and device, computer equipment and storage medium |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107610709B (en) * | 2017-08-01 | 2021-03-19 | 百度在线网络技术(北京)有限公司 | Method and system for training voiceprint recognition model |
CN108564025A (en) * | 2018-04-10 | 2018-09-21 | 广东电网有限责任公司 | A kind of infrared image object identification method based on deformable convolutional neural networks |
CN108766445A (en) * | 2018-05-30 | 2018-11-06 | 苏州思必驰信息科技有限公司 | Method for recognizing sound-groove and system |
CN110047490A (en) * | 2019-03-12 | 2019-07-23 | 平安科技(深圳)有限公司 | Method for recognizing sound-groove, device, equipment and computer readable storage medium |
CN111368684A (en) * | 2020-02-27 | 2020-07-03 | 北华航天工业学院 | Winter wheat automatic interpretation method based on deformable full-convolution neural network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||