CN115035901A - Voiceprint recognition method based on neural network and related device - Google Patents

Voiceprint recognition method based on neural network and related device

Info

Publication number
CN115035901A
Authority
CN
China
Prior art keywords
voiceprint
semi-orthogonal
network model
training
Prior art date
Legal status
Pending
Application number
CN202210635522.3A
Other languages
Chinese (zh)
Inventor
李国伟
王俊波
唐琪
张殷
黎小龙
范心明
李新
董镝
宋安琪
刘崧
梁年柏
谢志杨
李志锦
严司玮
蒋维
武利会
陈志平
王志刚
张伟忠
何胜红
刘少辉
陈贤熙
曾庆辉
刘昊
吴焯军
Current Assignee
Guangdong Power Grid Co Ltd
Foshan Power Supply Bureau of Guangdong Power Grid Corp
Original Assignee
Guangdong Power Grid Co Ltd
Foshan Power Supply Bureau of Guangdong Power Grid Corp
Priority date
Filing date
Publication date
Application filed by Guangdong Power Grid Co Ltd, Foshan Power Supply Bureau of Guangdong Power Grid Corp filed Critical Guangdong Power Grid Co Ltd
Priority to CN202210635522.3A
Publication of CN115035901A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00: Speaker identification or verification techniques
    • G10L17/02: Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • G10L17/04: Training, enrolment or model building
    • G10L17/18: Artificial neural networks; Connectionist approaches
    • G10L17/20: Pattern transformations or operations aimed at increasing system robustness, e.g. against channel noise or different working conditions

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a voiceprint recognition method based on a neural network and a related device. The method includes: constructing a semi-orthogonal decomposition neural network model from a plurality of semi-orthogonal convolution blocks, where each semi-orthogonal convolution block contains a plurality of semi-orthogonal one-dimensional convolution layers connected in series and linked by inner skip connection structures and outer skip connection structures; performing voiceprint recognition training on the semi-orthogonal decomposition neural network model with a preset MFCC training set corresponding to training voiceprint information, to obtain a target recognition network model; and recognizing a target voiceprint with the target recognition network model to obtain a voiceprint recognition result. The semi-orthogonal one-dimensional convolution layers decompose the original parameter matrices of the network, compressing the redundant parameter space and reducing the time-delay span while filtering out noise interference. The method and device address the technical problems that existing voiceprint recognition technology has poor noise robustness, limited time-delay modeling capability, and recognition results that lack accuracy and reliability.

Description

Voiceprint recognition method based on neural network and related device
Technical Field
The present application relates to the field of voiceprint recognition technologies, and in particular, to a voiceprint recognition method and a related apparatus based on a neural network.
Background
A voiceprint recognition system automatically identifies a speaker from the characteristics of the human voice. Voiceprint recognition is a biometric verification technology: the speaker's identity is verified through speech. It offers good convenience, stability, measurability and security, and is widely used in banking, social security, public security, smart home and mobile payment applications.
Existing voiceprint recognition methods are impaired by noise in the voiceprint information, so the recognition results lack accuracy and reliability; in addition, the time-delay modeling capability of neural-network-based voiceprint recognition is limited, so the practical recognition performance is poor and cannot meet demanding application requirements.
Disclosure of Invention
The application provides a voiceprint recognition method based on a neural network and a related device, to address the technical problems that existing voiceprint recognition technology has poor noise robustness, limited time-delay modeling capability, and recognition results that lack accuracy and reliability.
In view of the above, a first aspect of the present application provides a voiceprint recognition method based on a neural network, including:
constructing a semi-orthogonal decomposition neural network model based on a plurality of semi-orthogonal convolution blocks, wherein each semi-orthogonal convolution block comprises a plurality of semi-orthogonal one-dimensional convolution layers, and the semi-orthogonal one-dimensional convolution layers are connected in series and linked by inner skip connection structures and outer skip connection structures;
performing voiceprint recognition training on the semi-orthogonal decomposition neural network model according to a preset MFCC training set corresponding to training voiceprint information to obtain a target recognition network model;
and identifying the target voiceprint by adopting the target identification network model to obtain a voiceprint identification result.
Preferably, before the voiceprint recognition training is performed on the semi-orthogonal decomposition neural network model according to the preset MFCC training set corresponding to the training voiceprint information to obtain the target recognition network model, the method further includes:
preprocessing the training voiceprint information to obtain an audio frame to be processed, wherein the preprocessing operation comprises weighting, framing and windowing;
based on a Fourier transform algorithm, calculating the audio frame by adopting a Mel filter to obtain MFCC characteristics;
and constructing a preset MFCC training set according to the MFCC characteristics.
Preferably, the performing voiceprint recognition training on the semi-orthogonal decomposition neural network model according to a preset MFCC training set corresponding to training voiceprint information to obtain a target recognition network model further includes:
constructing a semi-orthogonal decomposition feature extractor based on a plurality of semi-orthogonal convolution blocks;
performing voiceprint feature extraction training on the semi-orthogonal decomposition feature extractor according to a preset MFCC training set corresponding to training voiceprint information to obtain a target voiceprint feature extractor;
and in the voiceprint information registration process, performing feature extraction on the newly added voiceprint through the target voiceprint feature extractor, and storing the extracted voiceprint features in a database.
Preferably, after the voiceprint recognition training is performed on the semi-orthogonal decomposition neural network model according to the preset MFCC training set corresponding to the training voiceprint information to obtain the target recognition network model, the method further includes:
performing voiceprint recognition test on the target recognition network model by adopting a preset MFCC test set corresponding to the test voiceprint information to obtain a test result;
screening the target recognition network model according to the test result to obtain an optimized recognition network model;
correspondingly, the identifying the target voiceprint by using the target identification network model to obtain the voiceprint identification result includes:
and identifying the target voiceprint by adopting the optimized identification network model to obtain a voiceprint identification result.
A second aspect of the present application provides a voiceprint recognition apparatus based on a neural network, including:
the model building module is used for building a semi-orthogonal decomposition neural network model based on a plurality of semi-orthogonal convolution blocks, wherein each semi-orthogonal convolution block comprises a plurality of semi-orthogonal one-dimensional convolution layers, and the semi-orthogonal one-dimensional convolution layers are connected in series and linked by inner skip connection structures and outer skip connection structures;
the model training module is used for carrying out voiceprint recognition training on the semi-orthogonal decomposition neural network model according to a preset MFCC training set corresponding to training voiceprint information to obtain a target recognition network model;
and the voiceprint recognition module is used for recognizing the target voiceprint by adopting the target recognition network model to obtain a voiceprint recognition result.
Preferably, the apparatus further comprises:
the preprocessing module is used for preprocessing the training voiceprint information to obtain an audio frame to be processed, and the preprocessing operation comprises weighting, framing and windowing;
the feature extraction module is used for calculating the audio frame by adopting a Mel filter based on a Fourier transform algorithm to obtain MFCC features;
and the training set constructing module is used for constructing a preset MFCC training set according to the MFCC characteristics.
Preferably, the apparatus further comprises:
an extractor construction module for constructing a semi-orthogonal decomposition feature extractor based on a plurality of semi-orthogonal convolution blocks;
the extractor training module is used for carrying out voiceprint feature extraction training on the semi-orthogonal decomposition feature extractor according to a preset MFCC training set corresponding to training voiceprint information to obtain a target voiceprint feature extractor;
and the extractor using module is used for extracting the characteristics of the newly added voiceprints through the target voiceprint characteristic extractor in the voiceprint information registration process and storing the extracted voiceprint characteristics in a database.
Preferably, the apparatus further comprises:
the test module is used for carrying out voiceprint recognition test on the target recognition network model by adopting a preset MFCC test set corresponding to the test voiceprint information to obtain a test result;
the optimization module is used for screening the target recognition network model according to the test result to obtain an optimized recognition network model;
correspondingly, the voiceprint recognition module is specifically configured to:
and identifying the target voiceprint by adopting the optimized identification network model to obtain a voiceprint identification result.
A third aspect of the present application provides a voiceprint recognition device based on a neural network, the device comprising a processor and a memory;
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is configured to execute the neural network based voiceprint recognition method of the first aspect according to instructions in the program code.
A fourth aspect of the present application provides a computer-readable storage medium for storing program code for executing the neural network-based voiceprint recognition method of the first aspect.
According to the technical scheme, the embodiment of the application has the following advantages:
In the present application, a voiceprint recognition method based on a neural network is provided, including: constructing a semi-orthogonal decomposition neural network model based on a plurality of semi-orthogonal convolution blocks, wherein each semi-orthogonal convolution block comprises a plurality of semi-orthogonal one-dimensional convolution layers connected in series and linked by inner skip connection structures and outer skip connection structures; performing voiceprint recognition training on the semi-orthogonal decomposition neural network model according to a preset MFCC training set corresponding to training voiceprint information, to obtain a target recognition network model; and recognizing the target voiceprint with the target recognition network model to obtain a voiceprint recognition result.
In the voiceprint recognition method based on the neural network provided by the application, the convolution layers of the semi-orthogonal decomposition neural network model are connected through skip connection structures during model construction, so that shallow-layer voiceprint feature information is passed directly to the deeper convolution layers; the deep network therefore receives richer voiceprint information and the noise robustness of the network is improved. The semi-orthogonal one-dimensional convolution layers decompose the original parameter matrices of the network, compressing the redundant parameter space and reducing the time-delay span while filtering noise interference, which enables long-time-delay learning. The method and device therefore address the technical problems that existing voiceprint recognition technology has poor noise robustness, limited time-delay modeling capability, and recognition results that lack accuracy and reliability.
Drawings
Fig. 1 is a schematic flowchart of a voiceprint recognition method based on a neural network according to an embodiment of the present application;
Fig. 2 is a schematic structural diagram of a voiceprint recognition apparatus based on a neural network according to an embodiment of the present application;
Fig. 3 is a schematic structural diagram of a semi-orthogonal decomposition neural network model according to an embodiment of the present application;
Fig. 4 is a schematic network structure diagram of a semi-orthogonal convolution block according to an embodiment of the present application.
Detailed Description
In order to make those skilled in the art better understand the technical solutions of the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
For easy understanding, please refer to Fig. 1; the present application provides an embodiment of a voiceprint recognition method based on a neural network, including:
Step 101, constructing a semi-orthogonal decomposition neural network model based on a plurality of semi-orthogonal convolution blocks, wherein each semi-orthogonal convolution block comprises a plurality of semi-orthogonal one-dimensional convolution layers, and the semi-orthogonal one-dimensional convolution layers are connected in series and linked by inner skip connection structures and outer skip connection structures.
It should be noted that the mainstream deep neural network (DNN) approach to voiceprint recognition is based on the time-delay neural network (TDNN): the output of the penultimate (second-to-last) hidden layer of the TDNN is taken as the voiceprint feature, known as the x-vector. The TDNN is built mainly from multiple layers of one-dimensional convolutional neural network (CNN) components. Although the one-dimensional convolution component can describe image or speech characteristics at multiple scales and in this respect outperforms an ordinary fully connected structure, the recognition performance of the TDNN degrades in strongly noisy environments and its noise robustness is insufficient. In addition, the time-delay modeling capability of the TDNN is limited, and it can only learn effectively within a short, quasi-stationary time range.
Therefore, in this embodiment the weight matrices of the neural network are decomposed with semi-orthogonal convolution layers, which greatly reduces the number of parameters of the original one-dimensional convolution weight layers. Under supervised learning of speaker labels, the semi-orthogonal decomposition neural network extracts the important speaker voiceprint information from each factorization with a limited number of parameters and filters out irrelevant noise information, which provides the noise robustness. In addition, the time-delay modeling span of each semi-orthogonal convolution layer is kept limited: if an excessively large span is set, sampling leakage occurs and information filtering deteriorates.
Referring to Fig. 3, the semi-orthogonal decomposition neural network model in this example includes several (not necessarily identical) semi-orthogonal convolution blocks, and adjacent semi-orthogonal convolution blocks are connected in series. In addition, starting from the second semi-orthogonal convolution block of the network model, each semi-orthogonal convolution block splices in the outputs of one or more outer skip connection structures and/or inner skip connection structures. The outer skip connection structure itself contains network structures such as a semi-orthogonal one-dimensional convolution layer and an activation function, and its output is fed to the second and later (deeper) semi-orthogonal convolution blocks of the network model. The inner skip connection structure starts from the second semi-orthogonal convolution block and passes shallow-layer feature information to the later, deeper semi-orthogonal convolution blocks. In the example of Fig. 3, the inputs of the second semi-orthogonal convolution block include a series input and an outer-skip input; the inputs of the third semi-orthogonal convolution block include a series input, an outer-skip input and an inner-skip input, and so on. The number of semi-orthogonal convolution blocks in the network model can be chosen according to actual needs and is not limited here, as long as the block-connection scheme of this embodiment is followed.
Each semi-orthogonal convolution block contains a plurality of semi-orthogonal one-dimensional convolution layers, at least two, so it is a multi-section semi-orthogonal convolution block. Referring to Fig. 4, the arc arrows indicate the data flow of the inner and outer skip connection structures. Besides the semi-orthogonal one-dimensional convolution layers, the semi-orthogonal convolution block contains an outer-skip splicing layer, an activation function, a normalization layer and an output layer; the output layer fuses the input carried by the inner skip connection structure with the output of the normalization layer. The outer-skip splicing layer can receive the outputs of outer skip connection structures coming from other semi-orthogonal convolution blocks and concatenate them with the other inputs received by the layer; it may receive the outputs of several different outer skip connection structures.
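The following is a minimal PyTorch sketch of one such block. The layer sizes, the use of ReLU as the activation and BatchNorm1d as the normalization layer, the placement of the splicing layer before the convolutions, and the exact form of the weighted fusion with the inner-skip input are illustrative assumptions, not the patent's reference implementation.

```python
import torch
import torch.nn as nn

class SemiOrthogonalConv1d(nn.Module):
    """Factorized 1-D convolution: a bottleneck conv M (kept semi-orthogonal
    during training) followed by an expanding 1x1 conv N, replacing one
    full-rank conv layer."""
    def __init__(self, channels, bottleneck, kernel_size=3, dilation=1):
        super().__init__()
        pad = dilation * (kernel_size - 1) // 2
        self.reduce = nn.Conv1d(channels, bottleneck, kernel_size, padding=pad, dilation=dilation)
        self.expand = nn.Conv1d(bottleneck, channels, 1)

    def forward(self, x):
        return self.expand(self.reduce(x))

class SemiOrthogonalBlock(nn.Module):
    """Several semi-orthogonal 1-D conv layers, an outer-skip splicing layer,
    activation + normalization, and weighted fusion with the inner-skip input."""
    def __init__(self, channels, bottleneck, n_layers=2, n_outer=0, skip_weight=0.66):
        super().__init__()
        self.convs = nn.ModuleList([SemiOrthogonalConv1d(channels, bottleneck) for _ in range(n_layers)])
        # splicing layer: concatenates the series input with n_outer outer-skip inputs
        self.splice = nn.Conv1d(channels * (1 + n_outer), channels, 1) if n_outer else None
        self.act = nn.ReLU()
        self.norm = nn.BatchNorm1d(channels)
        self.skip_weight = skip_weight  # default weight 0.66 as stated in the text

    def forward(self, x, outer_skips=(), inner_skip=None):
        if self.splice is not None:
            x = self.splice(torch.cat([x, *outer_skips], dim=1))
        for conv in self.convs:
            x = conv(x)
        q = self.norm(self.act(x))            # hidden-layer information Q
        if inner_skip is not None:            # output layer: weighted addition with the skip input
            q = self.skip_weight * q + (1.0 - self.skip_weight) * inner_skip
        return q                              # output matrix O
```

A full model in the spirit of Fig. 3 would chain several such blocks in series and feed the outputs of the outer and inner skip connection structures of earlier blocks into the `outer_skips` and `inner_skip` arguments of later blocks.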
It should be noted that the semi-orthogonal one-dimensional convolution layer decomposes the input parameter matrix A[a, a] under the constraint that the decomposed parameter matrix M[a, b] is semi-orthogonal, so that the effective voiceprint information in the output matrix B[b, a] is retained, where a and b are matrix dimensions. The semi-orthogonality constraint is

M^T M = αI

where α is a floating-point coefficient, 1 by default, and I is the identity matrix, and the decomposition is

A = MB.

When the constraint converges,

A = MB ≈ M M^T,

i.e. the output matrix B is approximately equal to the parameter matrix M^T.
The decomposed matrix B (or the spliced matrix) is then convolved with the one-dimensional convolution parameter matrix N[b, a], and several convolution kernels learn information at different scales, producing a matrix P[a, a]. The matrix P differs from the output R[a, a] of the ordinary one-dimensional convolution TDNN. Assuming the parameter matrix of the ordinary one-dimensional convolution TDNN is W[a, a], the ordinary and the semi-orthogonally decomposed one-dimensional convolutions differ as follows:

Ordinary one-dimensional convolution: A × W → R

Semi-orthogonally decomposed one-dimensional convolution: A × M × N → P

When the reduced dimension b of the parameter matrix M is at most a/4, the total number of parameters in M and N is at most half the number of parameters in W. The multiple audio recordings of each training speaker differ in their noise factors, so under supervised learning of the speaker labels the neural network can learn the speaker commonalities across multiple noise conditions. By decomposing the original matrix, the semi-orthogonal one-dimensional convolution compresses the redundant parameter space, which refines the speaker information and filters out noise interference; the modelled noise information is:

ε = W - MN.
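As a quick numerical illustration of these dimension and parameter-count relations, the following numpy snippet uses assumed sizes a = 64 and b = 16 (so b = a/4); the sizes and the QR-based construction of an exactly semi-orthogonal M are assumptions made for the example only.

```python
import numpy as np

a, b = 64, 16                       # b <= a/4
rng = np.random.default_rng(0)

W = rng.standard_normal((a, a))     # ordinary one-dimensional conv parameters
M = rng.standard_normal((a, b))     # factor constrained toward M^T M = alpha*I (alpha = 1)
N = rng.standard_normal((b, a))     # second factor of the decomposition

print(W.size)                       # 4096 parameters
print(M.size + N.size)              # 2048 parameters: half of W when b = a/4

# Make M exactly semi-orthogonal for the demonstration via the orthogonal
# factor of a (reduced) QR decomposition.
Q, _ = np.linalg.qr(M)              # Q has orthonormal columns, so Q^T Q = I
print(np.allclose(Q.T @ Q, np.eye(b)))   # True

# Residual "noise model" from the text: eps = W - M N
eps = W - M @ N
print(eps.shape)                    # (64, 64)
```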
The activation function and the normalization layer apply non-linear activation and re-normalization to the matrix P output by the semi-orthogonal one-dimensional convolution layer, yielding the hidden-layer information Q. Finally, the output layer integrates the activated and normalized matrix Q with the skip connection input matrix A by addition or splicing; in this embodiment weighted addition is used, with a default weight of 0.66, producing the output matrix O.
And 102, performing voiceprint recognition training on the semi-orthogonal decomposition neural network model according to a preset MFCC training set corresponding to the training voiceprint information to obtain a target recognition network model.
The training data of the semi-orthogonal decomposition neural network model are acoustic features, namely Mel-frequency cepstral coefficients (MFCC). For a labelled training set, the cross-entropy loss between the model outputs and the labels is computed and used to optimize the model during training, producing the target recognition network model.
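A minimal sketch of this supervised training step follows, under the assumptions that the model ends in a speaker-classification layer, that an Adam optimizer is used, and that a PyTorch data loader yields (MFCC, speaker label) batches; these choices are illustrative and not specified by the text.

```python
import torch
import torch.nn as nn

def train(model, loader, n_epochs=10, lr=1e-3, device="cpu"):
    """Cross-entropy training on labelled MFCC batches, as described above."""
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    model.to(device).train()
    for _ in range(n_epochs):
        for mfcc, speaker_id in loader:                    # mfcc: (batch, n_mfcc, frames)
            mfcc, speaker_id = mfcc.to(device), speaker_id.to(device)
            loss = criterion(model(mfcc), speaker_id)      # loss between model outputs and labels
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```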
Further, before step 102, the method further includes:
preprocessing the training voiceprint information to obtain audio frames to be processed, wherein the preprocessing operation comprises pre-emphasis (weighting), framing and windowing;
based on a Fourier transform algorithm, an audio frame is calculated by adopting a Mel filter to obtain MFCC characteristics;
and constructing a preset MFCC training set according to the MFCC characteristics.
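A minimal sketch of this feature-extraction pipeline is given below; the frame length, hop size, FFT size, filter count and number of coefficients are common but assumed values, and scipy's DCT is used for the final cepstral step.

```python
import numpy as np
from scipy.fftpack import dct

def mfcc(signal, sr=16000, frame_len=400, hop=160, n_fft=512, n_mels=26, n_mfcc=13):
    # pre-emphasis (the "weighting" step)
    signal = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # framing (assumes the signal is at least one frame long) and Hamming windowing
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # power spectrum via the Fourier transform
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # triangular Mel filter bank
    mel_pts = np.linspace(0, 2595 * np.log10(1 + (sr / 2) / 700), n_mels + 2)
    hz_pts = 700 * (10 ** (mel_pts / 2595) - 1)
    bins = np.floor((n_fft + 1) * hz_pts / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    # log Mel energies followed by a DCT give the MFCC features
    feats = np.log(power @ fbank.T + 1e-10)
    return dct(feats, type=2, axis=1, norm="ortho")[:, :n_mfcc]
```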
Further, step 102 further includes:
constructing a semi-orthogonal decomposition feature extractor based on a plurality of semi-orthogonal convolution blocks;
performing voiceprint feature extraction training on the semi-orthogonal decomposition feature extractor according to a preset MFCC training set corresponding to training voiceprint information to obtain a target voiceprint feature extractor;
and in the voiceprint information registration process, extracting the characteristics of the newly added voiceprints through a target voiceprint characteristic extractor, and storing the extracted voiceprint characteristics in a database.
The feature extractor has the same network structure as the semi-orthogonal decomposition neural network model; in essence, the voiceprint feature is taken from the first layer after the final pooling layer of the model, rather than from the recognition result of the final fully connected layer. The target voiceprint feature extractor is trained in the same way as the recognition model, and it can be used both in the initial voice information registration process and in acquiring a verification set.
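A minimal sketch of reading out the embedding at that point is shown below, assuming a trained backbone organized into `blocks`, `pooling`, `embedding` and `classifier` sub-modules; these attribute names are assumptions for illustration, not the patent's layer names.

```python
import torch
import torch.nn as nn

class VoiceprintExtractor(nn.Module):
    """Wraps a trained recognition model and returns the voiceprint feature
    taken after the pooling layer, skipping the final fully connected classifier."""
    def __init__(self, trained_model: nn.Module):
        super().__init__()
        self.backbone = trained_model

    @torch.no_grad()
    def forward(self, mfcc: torch.Tensor) -> torch.Tensor:
        h = self.backbone.blocks(mfcc)       # semi-orthogonal convolution blocks
        h = self.backbone.pooling(h)         # pooling over the time axis
        return self.backbone.embedding(h)    # first layer after pooling = voiceprint feature
```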
Further, after step 102, the method further includes:
performing voiceprint recognition test on the target recognition network model by adopting a preset MFCC test set corresponding to the test voiceprint information to obtain a test result;
screening the target recognition network model according to the test result to obtain an optimized recognition network model;
correspondingly, step 103 includes:
and identifying the target voiceprint by adopting an optimized identification network model to obtain a voiceprint identification result.
And 103, identifying the target voiceprint by adopting a target identification network model to obtain a voiceprint identification result.
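The text does not fix a particular scoring rule for step 103; a common choice is to compare the extracted voiceprint feature with the registered features in the database by cosine similarity, sketched below under that assumption (the threshold value is also an assumption).

```python
import numpy as np

def identify(query_feat, database, threshold=0.7):
    """Return (speaker_id, score) of the best cosine-similarity match in the
    registered-voiceprint database, or (None, score) if below the threshold."""
    q = query_feat / np.linalg.norm(query_feat)
    best_id, best_score = None, -1.0
    for speaker_id, feat in database.items():
        score = float(q @ (feat / np.linalg.norm(feat)))   # cosine similarity
        if score > best_score:
            best_id, best_score = speaker_id, score
    return (best_id, best_score) if best_score >= threshold else (None, best_score)
```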
In the voiceprint recognition method based on the neural network provided by this embodiment, the convolution layers of the semi-orthogonal decomposition neural network model are connected through skip connection structures during model construction, so that shallow-layer voiceprint feature information is passed directly to the deeper convolution layers; the deep network therefore receives richer voiceprint information and the noise robustness of the network is improved. The semi-orthogonal one-dimensional convolution layers decompose the original parameter matrices of the network, compressing the redundant parameter space and reducing the time-delay span while filtering noise interference, which enables long-time-delay learning. The method and device therefore address the technical problems that existing voiceprint recognition technology has poor noise robustness, limited time-delay modeling capability, and recognition results that lack accuracy and reliability.
For ease of understanding, referring to fig. 2, the present application provides an embodiment of a neural network-based voiceprint recognition apparatus, comprising:
the model building module 201 is used for building a semi-orthogonal decomposition neural network model based on a plurality of semi-orthogonal convolution blocks, wherein each semi-orthogonal convolution block comprises a plurality of semi-orthogonal one-dimensional convolution layers, and the semi-orthogonal one-dimensional convolution layers are connected in series and linked by inner skip connection structures and outer skip connection structures;
the model training module 202 is used for performing voiceprint recognition training on the semi-orthogonal decomposition neural network model according to a preset MFCC training set corresponding to training voiceprint information to obtain a target recognition network model;
and the voiceprint recognition module 203 is configured to recognize the target voiceprint by using the target recognition network model to obtain a voiceprint recognition result.
Further, the apparatus further includes:
the preprocessing module 204 is configured to perform preprocessing operations on the training voiceprint information to obtain an audio frame to be processed, where the preprocessing operations include weighting, framing, and windowing;
a feature extraction module 205, configured to calculate an audio frame by using a mel filter based on a fourier transform algorithm to obtain MFCC features;
a training set constructing module 206, configured to construct a preset MFCC training set according to the MFCC characteristics.
Further, the apparatus further includes:
an extractor construction module 207 for constructing a semi-orthogonal decomposition feature extractor based on the plurality of semi-orthogonal convolution blocks;
an extractor training module 208, configured to perform voiceprint feature extraction training on the semi-orthogonal decomposition feature extractor according to a preset MFCC training set corresponding to training voiceprint information, to obtain a target voiceprint feature extractor;
and the extractor using module 209 is used for performing feature extraction on the newly added voiceprint through the target voiceprint feature extractor in the process of registering the voiceprint information, and storing the extracted voiceprint features in the database.
Further, the apparatus further includes:
the test module 210 is configured to perform a voiceprint recognition test on the target recognition network model by using a preset MFCC test set corresponding to the test voiceprint information to obtain a test result;
the optimization module 211 is configured to screen the target recognition network model according to the test result to obtain an optimized recognition network model;
correspondingly, the voiceprint recognition module 203 is specifically configured to:
and identifying the target voiceprint by adopting an optimized identification network model to obtain a voiceprint identification result.
The application also provides a voiceprint recognition device based on the neural network, and the device comprises a processor and a memory;
the memory is used for storing the program codes and transmitting the program codes to the processor;
the processor is used for executing the neural network-based voiceprint recognition method in the above method embodiment according to the instructions in the program code.
The present application also provides a computer-readable storage medium for storing program code for executing the neural network-based voiceprint recognition method in the above-described method embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solutions of the present application, or portions or all or portions of the technical solutions that contribute to the prior art, may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for executing all or part of the steps of the methods described in the embodiments of the present application through a computer device (which may be a personal computer, a server, or a network device). And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (10)

1. A voiceprint recognition method based on a neural network is characterized by comprising the following steps:
constructing a semi-orthogonal decomposition neural network model based on a plurality of semi-orthogonal convolution blocks, wherein each semi-orthogonal convolution block comprises a plurality of semi-orthogonal one-dimensional convolution layers, and the semi-orthogonal one-dimensional convolution layers are connected in series and linked by inner skip connection structures and outer skip connection structures;
performing voiceprint recognition training on the semi-orthogonal decomposition neural network model according to a preset MFCC training set corresponding to training voiceprint information to obtain a target recognition network model;
and identifying the target voiceprint by adopting the target identification network model to obtain a voiceprint identification result.
2. The method for voiceprint recognition based on a neural network according to claim 1, wherein before the voiceprint recognition training is performed on the semi-orthogonal decomposition neural network model according to the preset MFCC training set corresponding to the training voiceprint information to obtain the target recognition network model, the method further comprises:
preprocessing the training voiceprint information to obtain an audio frame to be processed, wherein the preprocessing operation comprises weighting, framing and windowing;
based on a Fourier transform algorithm, calculating the audio frame by adopting a Mel filter to obtain MFCC characteristics;
and constructing a preset MFCC training set according to the MFCC characteristics.
3. The method according to claim 1, wherein the training of voiceprint recognition is performed on the semi-orthogonal decomposition neural network model according to a preset MFCC training set corresponding to training voiceprint information to obtain a target recognition network model, and further comprising:
constructing a semi-orthogonal decomposition feature extractor based on a plurality of semi-orthogonal convolution blocks;
performing voiceprint feature extraction training on the semi-orthogonal decomposition feature extractor according to a preset MFCC training set corresponding to training voiceprint information to obtain a target voiceprint feature extractor;
and in the voiceprint information registration process, performing feature extraction on the newly added voiceprint through the target voiceprint feature extractor, and storing the extracted voiceprint features in a database.
4. The method for voiceprint recognition based on a neural network according to claim 1, wherein after the voiceprint recognition training is performed on the semi-orthogonal decomposition neural network model according to the preset MFCC training set corresponding to the training voiceprint information to obtain the target recognition network model, the method further comprises:
performing voiceprint recognition test on the target recognition network model by adopting a preset MFCC test set corresponding to the test voiceprint information to obtain a test result;
screening the target recognition network model according to the test result to obtain an optimized recognition network model;
correspondingly, the identifying the target voiceprint by using the target identification network model to obtain the voiceprint identification result includes:
and identifying the target voiceprint by adopting the optimized identification network model to obtain a voiceprint identification result.
5. A voiceprint recognition apparatus based on a neural network, comprising:
the model building module is used for building a semi-orthogonal decomposition neural network model based on a plurality of semi-orthogonal convolution blocks, wherein each semi-orthogonal convolution block comprises a plurality of semi-orthogonal one-dimensional convolution layers, and the semi-orthogonal one-dimensional convolution layers are connected in series and linked by inner skip connection structures and outer skip connection structures;
the model training module is used for carrying out voiceprint recognition training on the semi-orthogonal decomposition neural network model according to a preset MFCC training set corresponding to training voiceprint information to obtain a target recognition network model;
and the voiceprint recognition module is used for recognizing the target voiceprint by adopting the target recognition network model to obtain a voiceprint recognition result.
6. The neural network-based voiceprint recognition apparatus according to claim 5, further comprising:
the preprocessing module is used for preprocessing the training voiceprint information to obtain an audio frame to be processed, and the preprocessing operation comprises weighting, framing and windowing;
the feature extraction module is used for calculating the audio frame by adopting a Mel filter based on a Fourier transform algorithm to obtain MFCC features;
and the training set constructing module is used for constructing a preset MFCC training set according to the MFCC characteristics.
7. The neural network-based voiceprint recognition apparatus according to claim 5, further comprising:
an extractor construction module for constructing a semi-orthogonal decomposition feature extractor based on the plurality of semi-orthogonal convolution blocks;
the extractor training module is used for carrying out voiceprint feature extraction training on the semi-orthogonal decomposition feature extractor according to a preset MFCC training set corresponding to training voiceprint information to obtain a target voiceprint feature extractor;
and the extractor using module is used for extracting the characteristics of the newly added voiceprints through the target voiceprint characteristic extractor in the voiceprint information registration process and storing the extracted voiceprint characteristics in a database.
8. The neural network-based voiceprint recognition apparatus according to claim 5, further comprising:
the test module is used for carrying out voiceprint recognition test on the target recognition network model by adopting a preset MFCC test set corresponding to the test voiceprint information to obtain a test result;
the optimization module is used for screening the target recognition network model according to the test result to obtain an optimized recognition network model;
correspondingly, the voiceprint recognition module is specifically configured to:
and identifying the target voiceprint by adopting the optimized identification network model to obtain a voiceprint identification result.
9. A neural network-based voiceprint recognition apparatus, the apparatus comprising a processor and a memory;
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is configured to execute the neural network-based voiceprint recognition method of any one of claims 1-4 according to instructions in the program code.
10. A computer-readable storage medium for storing program code for performing the neural network-based voiceprint recognition method of any one of claims 1 to 4.
CN202210635522.3A 2022-06-07 2022-06-07 Voiceprint recognition method based on neural network and related device Pending CN115035901A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210635522.3A CN115035901A (en) 2022-06-07 2022-06-07 Voiceprint recognition method based on neural network and related device

Publications (1)

Publication Number Publication Date
CN115035901A true CN115035901A (en) 2022-09-09

Family

ID=83122551

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210635522.3A Pending CN115035901A (en) 2022-06-07 2022-06-07 Voiceprint recognition method based on neural network and related device

Country Status (1)

Country Link
CN (1) CN115035901A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115831127A (en) * 2023-01-09 2023-03-21 浙江大学 Voiceprint reconstruction model construction method and device based on voice conversion and storage medium
CN115831127B (en) * 2023-01-09 2023-05-05 浙江大学 Voiceprint reconstruction model construction method and device based on voice conversion and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination