CN111341304A - Method, device and equipment for training speech characteristics of speaker based on GAN - Google Patents
- Publication number
- CN111341304A CN111341304A CN202010130403.3A CN202010130403A CN111341304A CN 111341304 A CN111341304 A CN 111341304A CN 202010130403 A CN202010130403 A CN 202010130403A CN 111341304 A CN111341304 A CN 111341304A
- Authority
- CN
- China
- Prior art keywords
- voice
- data
- speaker
- gan
- denoising
- Prior art date
- Legal status (assumption by Google Patents; not a legal conclusion)
- Pending
Images
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Computational Linguistics (AREA)
- Quality & Reliability (AREA)
- Signal Processing (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Electrically Operated Instructional Devices (AREA)
Abstract
The application discloses a GAN-based speaker voice feature training method, device and equipment. After conventional denoising is performed on speaker voice data, feature extraction is performed on the obtained first denoised voice data; the resulting first voice feature data is input into a generator of a preset GAN network; the first denoised voice data is denoised a second time using a mask value to obtain second denoised voice data; and the second denoised voice data is used for voice feature training and recognition. This effectively improves the accuracy of speaker voice recognition and solves the technical problem that existing voice recognition methods have low recognition accuracy.
Description
Technical Field
The present application relates to the field of speech processing technologies, and in particular, to a method, an apparatus, and a device for training speech characteristics of a speaker based on GAN.
Background
Voice recognition is an important means of identifying a speaker. Existing speaker voiceprint identification acquires speaker voice data, performs voice feature extraction after denoising the data, and then performs voice recognition through a preset voice recognition model. However, the recognition accuracy of this approach is not high, so further improving the accuracy of speaker voice recognition remains a technical problem to be urgently solved by those skilled in the art.
Disclosure of Invention
The application provides a method, a device and equipment for training the speech characteristics of a speaker based on GAN, which are used for solving the technical problem that the recognition accuracy of the existing speech recognition mode is not high.
In view of the above, a first aspect of the present application provides a method for training speech features of a speaker based on GAN, including:
acquiring voice data of a speaker through a recording device;
carrying out conventional denoising processing on the speaker voice data to obtain first denoised voice data;
performing feature extraction on the first de-noised voice data to obtain first voice feature data;
inputting the first voice feature data into a generator of a preset GAN network, and outputting an ideal mask value of second voice feature data corresponding to the first voice feature data, wherein the ideal mask value is the ratio of the second voice feature data to the first voice feature data;
determining second de-noised voice data of the speaker voice according to the ideal mask value;
and inputting the second denoised voice data into a preset training network for voice feature training.
Optionally, the performing conventional denoising processing on the speaker voice data to obtain first denoised voice data includes:
performing deep recurrent neural network based voice denoising on the speaker voice data to obtain the first denoised voice data.
Optionally, the performing feature extraction on the first denoising voice data to obtain first voice feature data includes:
and performing MFCC feature extraction on the first de-noised voice data to obtain first voice feature data.
Optionally, after the feature extraction is performed on the first de-noised speech data to obtain first speech feature data, before the inputting the first speech feature data into a generator of a preset GAN network and outputting an ideal mask value of second speech feature data corresponding to the first speech feature data, the method further includes:
calculating a mean-variance normalization value of the first voice feature data;
correspondingly, the inputting the first voice feature data into a generator of a preset GAN network and outputting an ideal mask value of second voice feature data corresponding to the first voice feature data includes:
inputting the mean-variance normalization value of the first voice feature data into the generator of the preset GAN network, and outputting the ideal mask value of the second voice feature data corresponding to the first voice feature data.
Optionally, the inputting the first voice feature data into a generator of a preset GAN network, and outputting an ideal mask value of second voice feature data corresponding to the first voice feature data, may further include:
and training and testing the initial GAN network until the initial GAN network converges to obtain the preset GAN network.
A second aspect of the present application provides a GAN-based device for training speech characteristics of a speaker, comprising:
the acquisition unit is used for acquiring the voice data of the speaker through the recording equipment;
the first denoising unit is used for carrying out conventional denoising processing on the speaker voice data to obtain first denoising voice data;
the feature extraction unit is used for performing feature extraction on the first denoising voice data to obtain first voice feature data;
a mask unit, configured to input the first voice feature data into a generator of a preset GAN network, and output an ideal mask value of second voice feature data corresponding to the first voice feature data, where the ideal mask value is a ratio of the second voice feature data to the first voice feature data;
the second denoising unit is used for determining second denoising voice data of the speaker voice according to the ideal mask value;
and the first training unit is used for inputting the second denoising voice data into a preset training network for voice characteristic training.
Optionally, the feature extraction unit is specifically configured to:
and performing MFCC feature extraction on the first de-noised voice data to obtain first voice feature data.
Optionally, the device further comprises:
the second training unit is used for training and testing the initial GAN network until the initial GAN network converges to obtain the preset GAN network;
the normalization unit is used for calculating a mean-variance normalization value of the first voice feature data;
correspondingly, the mask unit is specifically configured to:
and inputting the mean-variance normalization value of the first voice feature data into the generator of the preset GAN network, and outputting the ideal mask value of the second voice feature data corresponding to the first voice feature data.
In a third aspect, the present application provides a GAN-based speaker speech feature training device, the device including a processor and a memory;
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is configured to execute any of the GAN-based speaker speech feature training methods of the first aspect according to instructions in the program code.
According to the technical scheme, the embodiment of the application has the following advantages:
the application provides a speaker voice feature training method based on GAN, comprising the following steps: acquiring voice data of a speaker through a recording device; carrying out conventional denoising processing on speaker voice data to obtain first denoising voice data; performing feature extraction on the first de-noised voice data to obtain first voice feature data; inputting the first voice characteristic data into a generator of a preset GAN network, and outputting an ideal mask value of second voice characteristic data corresponding to the first voice characteristic data, wherein the ideal mask value is the ratio of the second voice characteristic data to the first voice characteristic data; determining second de-noised voice data of the speaker voice according to the ideal mask value; and inputting the second denoising voice data into a preset training network for voice characteristic training. According to the method and the device, after the conventional denoising processing is carried out on the speaker voice data, the obtained first denoising voice data Jining feature is extracted, the obtained first voice feature data is input into a generator of a preset GAN network, the first denoising voice data is denoised for the second time by utilizing a mask value to obtain second denoising voice data, the second denoising voice data is utilized to carry out voice feature training and recognition, the accuracy of speaker voice recognition is effectively improved, and the technical problem that the recognition accuracy of the existing voice recognition mode is not high is solved.
Drawings
FIG. 1 is a schematic flow chart illustrating a method for training speech features of a speaker based on GAN according to an embodiment of the present application;
fig. 2 is a schematic structural diagram of a GAN-based speaker speech feature training apparatus according to an embodiment of the present disclosure.
Detailed Description
In order to make the technical solutions of the present application better understood, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
To facilitate understanding, referring to fig. 1, the present application provides an embodiment of a GAN-based speaker speech feature training method, including:
Step 101, acquiring voice data of a speaker through a recording device.

It should be noted that, in this embodiment, speaker voice data needs to be acquired first. The speaker voice data may be collected by a recording device, or existing speaker voice data may be obtained from the network by means of a web crawler.
Step 102, carrying out conventional denoising processing on the speaker voice data to obtain first denoised voice data.
It should be noted that after the speaker voice data is obtained, it is subjected to conventional denoising. A voice denoising method based on a deep recurrent neural network is preferably selected, yielding the first denoised voice data.
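The patent does not specify the recurrent denoiser's architecture. As an illustrative sketch (all weight matrices and shapes here are hypothetical, not from the patent), a single recurrent layer can predict a per-frequency gain for each frame of a magnitude spectrogram and apply it:

```python
import numpy as np

def rnn_denoise(noisy_mag, Wx, Wh, Wo, bh, bo):
    """Toy single-layer RNN denoiser: for each time frame of a
    (frames, bins) magnitude spectrogram, predict a per-bin gain
    in (0, 1) from the recurrent state and apply it."""
    h = np.zeros(Wh.shape[0])
    denoised = []
    for x in noisy_mag:                               # iterate over time frames
        h = np.tanh(Wx @ x + Wh @ h + bh)             # update recurrent state
        gain = 1.0 / (1.0 + np.exp(-(Wo @ h + bo)))   # sigmoid gain per bin
        denoised.append(gain * x)                     # attenuate each bin
    return np.asarray(denoised)
```

In practice the weights would be learned from paired noisy/clean recordings; here they only illustrate the data flow of frame-by-frame recurrent denoising.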
Step 103, performing feature extraction on the first denoised voice data to obtain first voice feature data.
It should be noted that the feature extraction performed on the first denoised speech data may be MFCC feature extraction or PLP feature extraction.
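MFCC extraction is a standard pipeline (framing and windowing, power spectrum, mel filterbank, log, DCT). A compact from-scratch sketch follows; the parameter values (frame length, hop, number of mel bands and coefficients) are common defaults, not values specified by the patent:

```python
import numpy as np

def mfcc(signal, sr=16000, n_fft=512, hop=160, n_mels=26, n_ceps=13):
    """Minimal MFCC extraction: returns a (frames, n_ceps) matrix."""
    # frame the signal and apply a Hann window
    frames = np.array([signal[i:i + n_fft] * np.hanning(n_fft)
                       for i in range(0, len(signal) - n_fft + 1, hop)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2 / n_fft

    # triangular mel filterbank
    def hz_to_mel(f): return 2595 * np.log10(1 + f / 700)
    def mel_to_hz(m): return 700 * (10 ** (m / 2595) - 1)
    mels = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fb[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fb[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)

    logmel = np.log(power @ fb.T + 1e-10)
    # DCT-II decorrelates the log-mel energies into cepstral coefficients
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1) / (2 * n_mels)))
    return logmel @ dct.T
```

Production systems would normally use a tested library implementation rather than this sketch.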
Step 104, inputting the first voice feature data into a generator of a preset GAN network, and outputting an ideal mask value of the second voice feature data corresponding to the first voice feature data, wherein the ideal mask value is the ratio of the second voice feature data to the first voice feature data.
Step 105, determining second denoised voice data of the speaker voice according to the ideal mask value.
It should be noted that before the first voice feature data is input into the generator of the preset GAN network, an initial GAN network needs to be trained and tested to obtain the preset GAN network. For the first voice feature data, the mean and variance of each dimension can be calculated and each dimension normalized accordingly, forming a mean-variance normalization value for each dimension of the first voice feature data, so that valuable speech is effectively retained and noise is suppressed. The mean-variance normalization value of the first voice feature data is input into the generator of the preset GAN network, which denoises the first voice feature data accordingly, generates the ideal mask value of the second voice feature data corresponding to the first voice feature data, and outputs it. Because the ideal mask value is the ratio of the second voice feature data to the first voice feature data, the second voice feature data is calculated from the ideal mask value and the first voice feature data, and the inverse transform of the feature extraction is then applied to the second voice feature data to obtain the second denoised voice data.
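The two numeric operations in this step — per-dimension mean-variance normalization, and recovering the second voice feature data from the ideal ratio mask — can be sketched as follows. This is a minimal illustration; the patent defines the mask only as the ratio of second to first feature data and does not fix further formulas:

```python
import numpy as np

def mean_var_normalize(feats):
    """Normalize each dimension of a (frames, dims) feature matrix
    to zero mean and unit variance."""
    mu = feats.mean(axis=0)
    sigma = feats.std(axis=0) + 1e-8   # guard against zero variance
    return (feats - mu) / sigma

def second_features_from_mask(first_feats, ideal_mask):
    """The ideal mask is defined as second / first, so the denoised
    (second) features are recovered by elementwise multiplication."""
    return ideal_mask * first_feats
```

Inverting the feature extraction on the result would then yield the second denoised voice data.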
Step 106, inputting the second denoised voice data into a preset training network for voice feature training.
It should be noted that the second denoised voice data is input into the preset training network for voice feature training, and the trained voice features are used for speaker voice recognition, which can effectively improve the accuracy of speaker recognition.
According to the GAN-based speaker voice feature training method provided by this embodiment, after conventional denoising is performed on the speaker voice data, feature extraction is performed on the obtained first denoised voice data; the resulting first voice feature data is input into the generator of a preset GAN network; the first denoised voice data is denoised a second time using the mask value to obtain second denoised voice data; and the second denoised voice data is used for voice feature training and recognition, effectively improving the accuracy of speaker voice recognition and solving the technical problem that existing voice recognition methods have low recognition accuracy.
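The description above trains and tests an initial GAN until it converges, without specifying the generator or discriminator internals. The alternating-update-until-no-improvement control flow can be sketched generically; the callables `d_step`, `g_step`, and `val_loss` are hypothetical stand-ins for the unspecified update and evaluation routines:

```python
def train_gan_until_converged(d_step, g_step, val_loss,
                              patience=5, max_epochs=200, tol=1e-4):
    """Alternate discriminator and generator updates, stopping once the
    validation loss has not improved by at least `tol` for `patience`
    consecutive epochs (a simple convergence test)."""
    best, wait = float("inf"), 0
    epoch = 0
    for epoch in range(1, max_epochs + 1):
        d_step()            # update discriminator on real vs. generated masks
        g_step()            # update generator to fool the discriminator
        loss = val_loss()   # evaluate on held-out noisy/clean pairs
        if best - loss > tol:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                break       # converged: no recent improvement
    return epoch, best
```

The resulting generator would then serve as the "preset GAN network" into which the normalized first voice feature data is fed.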
For ease of understanding, referring to fig. 2, an embodiment of a GAN-based speaker phonetic feature training apparatus is provided, comprising:
the acquisition unit is used for acquiring the voice data of the speaker through the recording equipment;
the first denoising unit is used for carrying out conventional denoising processing on the speaker voice data to obtain first denoising voice data;
the feature extraction unit is used for performing feature extraction on the first de-noised voice data to obtain first voice feature data;
the mask unit is used for inputting the first voice characteristic data into a generator of a preset GAN network and outputting an ideal mask value of the second voice characteristic data corresponding to the first voice characteristic data, wherein the ideal mask value is the ratio of the second voice characteristic data to the first voice characteristic data;
the second denoising unit is used for determining second denoising voice data of the speaker voice according to the ideal mask value;
and the first training unit is used for inputting the second denoising voice data into a preset training network for voice characteristic training.
Further, the first denoising unit is specifically configured to:
and performing deep recurrent neural network based voice denoising on the speaker voice data to obtain the first denoised voice data.
Further, the feature extraction unit is specifically configured to:
and performing MFCC feature extraction on the first de-noised voice data to obtain first voice feature data.
Further, the device further comprises:
the second training unit is used for training and testing the initial GAN network until the initial GAN network converges to obtain a preset GAN network;
the normalization unit is used for calculating a mean-variance normalization value of the first voice feature data;
correspondingly, the mask unit is specifically configured to:
and inputting the mean-variance normalization value of the first voice feature data into the generator of the preset GAN network, and outputting the ideal mask value of the second voice feature data corresponding to the first voice feature data.
The application further provides an embodiment of a GAN-based speaker speech feature training device, wherein the device comprises a processor and a memory:
the memory is used for storing the program codes and transmitting the program codes to the processor;
the processor is used for executing the GAN-based speaker voice feature training method in the embodiment of the GAN-based speaker voice feature training method according to instructions in the program code.
In the several embodiments provided in the present application, it should be understood that the disclosed system and method may be implemented in other ways. For example, the above-described system embodiments are merely illustrative, and for example, the division of the units is only one logical functional division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application may be substantially implemented or contributed to by the prior art, or all or part of the technical solution may be embodied in a software product, which is stored in a storage medium and includes instructions for causing a computer system (which may be a personal computer, a server, or a network system) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.
Claims (10)
1. A method for training speech features of a speaker based on GAN, comprising:
acquiring voice data of a speaker through a recording device;
carrying out conventional denoising processing on the speaker voice data to obtain first denoising voice data;
performing feature extraction on the first de-noised voice data to obtain first voice feature data;
inputting the first voice feature data into a generator of a preset GAN network, and outputting an ideal mask value of second voice feature data corresponding to the first voice feature data, wherein the ideal mask value is the ratio of the second voice feature data to the first voice feature data;
determining second de-noised voice data of the speaker voice according to the ideal mask value;
and inputting the second denoising voice data into a preset training network for voice characteristic training.
2. The GAN-based speaker voice feature training method as claimed in claim 1, wherein the performing a conventional denoising process on the speaker voice data to obtain a first denoised voice data comprises:
and performing deep recurrent neural network based voice denoising on the speaker voice data to obtain the first denoised voice data.
3. The GAN-based speaker voice feature training method as claimed in claim 2, wherein the performing feature extraction on the first de-noised voice data to obtain first voice feature data comprises:
and performing MFCC feature extraction on the first de-noised voice data to obtain first voice feature data.
4. The GAN-based speaker voice feature training method as claimed in claim 3, wherein after the feature extraction is performed on the first de-noised voice data to obtain first voice feature data, before the inputting the first voice feature data into a generator of a preset GAN network and outputting an ideal mask value of second voice feature data corresponding to the first voice feature data, the method further comprises:
calculating a mean-variance normalization value of the first voice feature data;
correspondingly, the inputting the first voice feature data into a generator of a preset GAN network and outputting an ideal mask value of second voice feature data corresponding to the first voice feature data includes:
and inputting the mean-variance normalization value of the first voice feature data into the generator of the preset GAN network, and outputting the ideal mask value of the second voice feature data corresponding to the first voice feature data.
5. The GAN-based speaker voice feature training method as claimed in claim 1, wherein the inputting the first voice feature data into a generator of a preset GAN network and outputting the ideal mask value of the second voice feature data corresponding to the first voice feature data further comprises:
and training and testing the initial GAN network until the initial GAN network converges to obtain the preset GAN network.
6. A GAN-based speaker speech feature training device, comprising:
the acquisition unit is used for acquiring the voice data of the speaker through the recording equipment;
the first denoising unit is used for carrying out conventional denoising processing on the speaker voice data to obtain first denoising voice data;
the feature extraction unit is used for performing feature extraction on the first denoising voice data to obtain first voice feature data;
a mask unit, configured to input the first voice feature data into a generator of a preset GAN network, and output an ideal mask value of second voice feature data corresponding to the first voice feature data, where the ideal mask value is a ratio of the second voice feature data to the first voice feature data;
the second denoising unit is used for determining second denoising voice data of the speaker voice according to the ideal mask value;
and the first training unit is used for inputting the second denoising voice data into a preset training network for voice characteristic training.
7. The GAN-based speaker voice feature training device as claimed in claim 6, wherein the first denoising unit is specifically configured to:
and performing deep recurrent neural network based voice denoising on the speaker voice data to obtain the first denoised voice data.
8. The GAN-based speaker voice feature training device as claimed in claim 7, wherein the feature extraction unit is specifically configured to:
and performing MFCC feature extraction on the first de-noised voice data to obtain first voice feature data.
9. The GAN-based speaker voice feature training device as claimed in claim 8, further comprising:
the second training unit is used for training and testing the initial GAN network until the initial GAN network converges to obtain the preset GAN network;
the normalization unit is used for calculating a mean-variance normalization value of the first voice feature data;
correspondingly, the mask unit is specifically configured to:
and inputting the mean-variance normalization value of the first voice feature data into the generator of the preset GAN network, and outputting the ideal mask value of the second voice feature data corresponding to the first voice feature data.
10. A GAN-based speaker speech feature training device, characterized in that the device comprises a processor and a memory:
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is configured to execute the GAN-based speaker speech feature training method according to any one of claims 1-5 according to instructions in the program code.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010130403.3A CN111341304A (en) | 2020-02-28 | 2020-02-28 | Method, device and equipment for training speech characteristics of speaker based on GAN |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111341304A true CN111341304A (en) | 2020-06-26 |
Family
ID=71187170
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010130403.3A Pending CN111341304A (en) | 2020-02-28 | 2020-02-28 | Method, device and equipment for training speech characteristics of speaker based on GAN |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111341304A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112700786A (en) * | 2020-12-29 | 2021-04-23 | 西安讯飞超脑信息科技有限公司 | Voice enhancement method, device, electronic equipment and storage medium |
2020-02-28: Application filed in China (CN202010130403.3A); status: Pending
Patent Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107039036A (en) * | 2017-02-17 | 2017-08-11 | 南京邮电大学 | High-quality speaker recognition method based on an auto-encoding deep belief network |
CN107910011A (en) * | 2017-12-28 | 2018-04-13 | 科大讯飞股份有限公司 | Voice denoising method, device, server and storage medium |
CN109256139A (en) * | 2018-07-26 | 2019-01-22 | 广东工业大学 | Speaker recognition method based on triplet loss |
WO2020029906A1 (en) * | 2018-08-09 | 2020-02-13 | 腾讯科技(深圳)有限公司 | Multi-person voice separation method and apparatus |
CN108986835A (en) * | 2018-08-28 | 2018-12-11 | 百度在线网络技术(北京)有限公司 | Speech denoising method, apparatus, device and medium based on an improved GAN |
CN109147810A (en) * | 2018-09-30 | 2019-01-04 | 百度在线网络技术(北京)有限公司 | Method, apparatus, device and computer storage medium for building a speech enhancement network |
CN109410974A (en) * | 2018-10-23 | 2019-03-01 | 百度在线网络技术(北京)有限公司 | Speech enhancement method, device, equipment and storage medium |
CN109119093A (en) * | 2018-10-30 | 2019-01-01 | Oppo广东移动通信有限公司 | Voice denoising method, device, storage medium and mobile terminal |
CN109326302A (en) * | 2018-11-14 | 2019-02-12 | 桂林电子科技大学 | Speech enhancement method based on voiceprint comparison and generative adversarial networks |
CN109785852A (en) * | 2018-12-14 | 2019-05-21 | 厦门快商通信息技术有限公司 | Method and system for enhancing a speaker's voice |
CN110164425A (en) * | 2019-05-29 | 2019-08-23 | 北京声智科技有限公司 | Noise reduction method, device and equipment |
CN110223429A (en) * | 2019-06-19 | 2019-09-10 | 上海应用技术大学 | Voice access control system |
CN110503974A (en) * | 2019-08-29 | 2019-11-26 | 泰康保险集团股份有限公司 | Adversarial speech recognition method, device, equipment and computer-readable storage medium |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112700786A (en) * | 2020-12-29 | 2021-04-23 | 西安讯飞超脑信息科技有限公司 | Voice enhancement method, device, electronic equipment and storage medium |
CN112700786B (en) * | 2020-12-29 | 2024-03-12 | 西安讯飞超脑信息科技有限公司 | Speech enhancement method, device, electronic equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106683680B (en) | Speaker recognition method and device, computer equipment and computer readable medium | |
CN110457432B (en) | Interview scoring method, device, equipment and storage medium | |
WO2021128741A1 (en) | Voice emotion fluctuation analysis method and apparatus, and computer device and storage medium | |
US20160111112A1 (en) | Speaker change detection device and speaker change detection method | |
CN110544469B (en) | Training method and device of voice recognition model, storage medium and electronic device | |
CN111312286A (en) | Age identification method, device, equipment and computer-readable storage medium | |
CN110556126A (en) | Voice recognition method and device and computer equipment | |
CN112382300A (en) | Voiceprint identification method, model training method, device, equipment and storage medium | |
CN111108552A (en) | Voiceprint identity identification method and related device | |
CN106782503A (en) | Automatic speech recognition method based on physiologic information in phonation | |
CN110634490A (en) | Voiceprint identification method, device and equipment | |
CN111108554A (en) | Voiceprint recognition method based on voice noise reduction and related device | |
CN113112992B (en) | Voice recognition method and device, storage medium and server | |
CN111133508A (en) | Method and device for selecting comparison phonemes | |
CN113409771B (en) | Forged audio detection method, detection system and storage medium | |
CN111341304A (en) | Method, device and equipment for training speech characteristics of speaker based on GAN | |
CN107993666B (en) | Speech recognition method, speech recognition device, computer equipment and readable storage medium | |
CN113178204B (en) | Low-power single-channel noise reduction method, device and storage medium | |
CN112786058B (en) | Voiceprint model training method, device, equipment and storage medium | |
CN111462736B (en) | Image generation method and device based on voice and electronic equipment | |
CN111341321A (en) | Matlab-based spectrogram generating and displaying method and device | |
CN112489678A (en) | Scene recognition method and device based on channel characteristics | |
CN111149154B (en) | Voiceprint recognition method, device, equipment and storage medium | |
CN113782033B (en) | Voiceprint recognition method, device, equipment and storage medium | |
CN112634942B (en) | Method for identifying originality of mobile phone recording, storage medium and equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 2020-06-26 |