CN111341304A - Method, device and equipment for training speech characteristics of speaker based on GAN


Info

Publication number
CN111341304A
CN111341304A
Authority
CN
China
Prior art keywords
voice
data
speaker
gan
denoising
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010130403.3A
Other languages
Chinese (zh)
Inventor
陈昊亮
许敏强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Speakin Intelligent Technology Co ltd
Original Assignee
Guangzhou Speakin Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Speakin Intelligent Technology Co ltd filed Critical Guangzhou Speakin Intelligent Technology Co ltd
Priority to CN202010130403.3A priority Critical patent/CN111341304A/en
Publication of CN111341304A publication Critical patent/CN111341304A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The application discloses a GAN-based speaker voice feature training method, device and equipment. After conventional denoising processing is performed on speaker voice data, feature extraction is performed on the resulting first denoised voice data; the resulting first voice feature data is input into a generator of a preset GAN network; the first denoised voice data is denoised a second time using a mask value to obtain second denoised voice data; and the second denoised voice data is used for voice feature training and recognition. The accuracy of speaker voice recognition is thereby effectively improved, solving the technical problem of low recognition accuracy in existing voice recognition approaches.

Description

Method, device and equipment for training speech characteristics of speaker based on GAN
Technical Field
The present application relates to the field of speech processing technologies, and in particular, to a method, an apparatus, and a device for training speech characteristics of a speaker based on GAN.
Background
Voice recognition is an important means of identifying a speaker. Existing speaker voiceprint identification acquires speaker voice data, performs voice feature extraction after denoising the data, and then performs voice recognition through a preset voice recognition model. However, the recognition accuracy of this approach is not high, so further improving the accuracy of speaker voice recognition remains a technical problem to be urgently solved by those skilled in the art.
Disclosure of Invention
The application provides a method, a device and equipment for training the speech characteristics of a speaker based on GAN, which are used for solving the technical problem of low recognition accuracy in existing speech recognition approaches.
In view of the above, a first aspect of the present application provides a method for training speech features of a speaker based on GAN, including:
acquiring voice data of a speaker through a recording device;
carrying out conventional denoising processing on the speaker voice data to obtain first denoising voice data;
performing feature extraction on the first de-noised voice data to obtain first voice feature data;
inputting the first voice feature data into a generator of a preset GAN network, and outputting an ideal mask value of second voice feature data corresponding to the first voice feature data, wherein the ideal mask value is the ratio of the second voice feature data to the first voice feature data;
determining second de-noised voice data of the speaker voice according to the ideal mask value;
and inputting the second denoising voice data into a preset training network for voice characteristic training.
Optionally, the performing conventional denoising processing on the speaker voice data to obtain first denoised voice data includes:
and carrying out voice denoising processing based on a deep recurrent neural network on the speaker voice data to obtain first denoised voice data.
Optionally, the performing feature extraction on the first denoising voice data to obtain first voice feature data includes:
and performing MFCC feature extraction on the first de-noised voice data to obtain first voice feature data.
Optionally, after the feature extraction is performed on the first de-noised speech data to obtain first speech feature data, before the inputting the first speech feature data into a generator of a preset GAN network and outputting an ideal mask value of second speech feature data corresponding to the first speech feature data, the method further includes:
calculating a mean-variance normalization value of the first voice feature data;
correspondingly, the inputting the first voice feature data into a generator of a preset GAN network and outputting an ideal mask value of second voice feature data corresponding to the first voice feature data includes:
and inputting the mean-variance normalization value of the first voice feature data into a generator of a preset GAN network, and outputting an ideal mask value of second voice feature data corresponding to the first voice feature data.
Optionally, before the inputting the first voice feature data into a generator of a preset GAN network and outputting an ideal mask value of second voice feature data corresponding to the first voice feature data, the method further includes:
and training and testing the initial GAN network until the initial GAN network converges to obtain the preset GAN network.
A second aspect of the present application provides a GAN-based device for training speech characteristics of a speaker, comprising:
the acquisition unit is used for acquiring the voice data of the speaker through the recording equipment;
the first denoising unit is used for carrying out conventional denoising processing on the speaker voice data to obtain first denoising voice data;
the feature extraction unit is used for performing feature extraction on the first denoising voice data to obtain first voice feature data;
a mask unit, configured to input the first voice feature data into a generator of a preset GAN network, and output an ideal mask value of second voice feature data corresponding to the first voice feature data, where the ideal mask value is a ratio of the second voice feature data to the first voice feature data;
the second denoising unit is used for determining second denoising voice data of the speaker voice according to the ideal mask value;
and the first training unit is used for inputting the second denoising voice data into a preset training network for voice characteristic training.
Optionally, the feature extraction unit is specifically configured to:
and performing MFCC feature extraction on the first de-noised voice data to obtain first voice feature data.
Optionally, the method further comprises:
the second training unit is used for training and testing the initial GAN network until the initial GAN network converges to obtain the preset GAN network;
the normalization unit is used for calculating a mean-variance normalization value of the first voice feature data;
correspondingly, the mask unit is specifically configured to:
and inputting the mean-variance normalization value of the first voice feature data into a generator of a preset GAN network, and outputting an ideal mask value of second voice feature data corresponding to the first voice feature data.
In a third aspect, the present application provides a GAN-based speaker speech feature training device, the device including a processor and a memory;
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is configured to execute any of the GAN-based speaker speech feature training methods of the first aspect according to instructions in the program code.
According to the technical scheme, the embodiment of the application has the following advantages:
the application provides a speaker voice feature training method based on GAN, comprising the following steps: acquiring voice data of a speaker through a recording device; carrying out conventional denoising processing on speaker voice data to obtain first denoising voice data; performing feature extraction on the first de-noised voice data to obtain first voice feature data; inputting the first voice characteristic data into a generator of a preset GAN network, and outputting an ideal mask value of second voice characteristic data corresponding to the first voice characteristic data, wherein the ideal mask value is the ratio of the second voice characteristic data to the first voice characteristic data; determining second de-noised voice data of the speaker voice according to the ideal mask value; and inputting the second denoising voice data into a preset training network for voice characteristic training. According to the method and the device, after the conventional denoising processing is carried out on the speaker voice data, the obtained first denoising voice data Jining feature is extracted, the obtained first voice feature data is input into a generator of a preset GAN network, the first denoising voice data is denoised for the second time by utilizing a mask value to obtain second denoising voice data, the second denoising voice data is utilized to carry out voice feature training and recognition, the accuracy of speaker voice recognition is effectively improved, and the technical problem that the recognition accuracy of the existing voice recognition mode is not high is solved.
Drawings
FIG. 1 is a schematic flow chart illustrating a method for training speech features of a speaker based on GAN according to an embodiment of the present application;
fig. 2 is a schematic structural diagram of a GAN-based speaker speech feature training apparatus according to an embodiment of the present disclosure.
Detailed Description
In order to make the technical solutions of the present application better understood, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
To facilitate understanding, referring to fig. 1, the present application provides an embodiment of a GAN-based speaker speech feature training method, including:
step 101, obtaining voice data of a speaker through a recording device.
It should be noted that, in the embodiment of the present application, speaker voice data needs to be acquired first. The speaker voice data may be recorded by a recording device, or collected from existing speaker voice data on the network by means of a web crawler.
And 102, carrying out conventional denoising processing on the speaker voice data to obtain first denoising voice data.
It should be noted that after the speaker voice data is obtained, it is subjected to conventional denoising processing; a voice denoising method based on a deep recurrent neural network is preferably selected, yielding the first denoised voice data.
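The recurrent-network denoising named above can be sketched as follows. This is an illustrative single-layer vanilla RNN with random, untrained weights that predicts a per-bin suppression gain for each spectrogram frame; all sizes and weight values here are assumptions for illustration, and a deployed system would stack several layers and train the weights on paired noisy/clean recordings.

```python
import numpy as np

def rnn_denoise_frames(noisy_mag, Wx, Wh, b, Wo, bo):
    """Predict a per-bin gain in (0, 1) for each spectrogram frame with a
    single-layer vanilla RNN; denoised magnitude = gain * noisy magnitude."""
    h = np.zeros(Wh.shape[0])
    out = np.empty_like(noisy_mag)
    for t in range(noisy_mag.shape[0]):
        # recurrent state carries context from earlier frames
        h = np.tanh(noisy_mag[t] @ Wx + h @ Wh + b)
        # sigmoid keeps each gain in (0, 1), so energy is only suppressed
        gain = 1.0 / (1.0 + np.exp(-(h @ Wo + bo)))
        out[t] = gain * noisy_mag[t]
    return out

# random, untrained weights -- purely to exercise the forward pass
rng = np.random.default_rng(0)
T, F, H = 10, 16, 8                        # frames, frequency bins, hidden units
mag = np.abs(rng.standard_normal((T, F)))  # stand-in noisy magnitude spectrogram
Wx = 0.1 * rng.standard_normal((F, H))
Wh = 0.1 * rng.standard_normal((H, H))
b = np.zeros(H)
Wo = 0.1 * rng.standard_normal((H, F))
bo = np.zeros(F)
den = rnn_denoise_frames(mag, Wx, Wh, b, Wo, bo)
```

Because the gain is bounded by the sigmoid, the network can only attenuate each time-frequency bin, never amplify it, which matches the denoising role described here.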
And 103, performing feature extraction on the first denoising voice data to obtain first voice feature data.
It should be noted that, the feature extraction performed on the first denoised speech data may be MFCC feature extraction or PLP feature extraction.
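The MFCC extraction mentioned above can be sketched end to end in NumPy: framing with a window, power spectrum, triangular mel filterbank, log compression, and a DCT-II. The parameter values (16 kHz sample rate, 512-point FFT, 160-sample hop, 26 mel bands, 13 coefficients) are common defaults, not values the patent specifies; production code would normally use a tuned library implementation.

```python
import numpy as np

def mfcc(signal, sr=16000, n_fft=512, hop=160, n_mels=26, n_ceps=13):
    """Sketch of the MFCC pipeline: framing -> |STFT|^2 -> mel filterbank
    -> log -> DCT-II. Parameter defaults are illustrative assumptions."""
    # frame the signal with a Hann window
    n_frames = 1 + (len(signal) - n_fft) // hop
    win = np.hanning(n_fft)
    frames = np.stack([signal[i * hop:i * hop + n_fft] * win
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2        # power spectrum

    # triangular mel filterbank
    def hz2mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel2hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz2mel(0.0), hz2mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel2hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fb[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fb[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)

    logmel = np.log(power @ fb.T + 1e-10)                  # log mel energies
    # DCT-II decorrelates the bands; keep the first n_ceps coefficients
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1) / (2 * n_mels)))
    return logmel @ dct.T

sr = 16000
t = np.arange(sr) / sr
feats = mfcc(np.sin(2 * np.pi * 440.0 * t))   # one second of a 440 Hz tone
```

Each row of `feats` is the first voice feature vector for one frame; PLP extraction, the alternative the text mentions, replaces the mel/log/DCT stages with perceptual weighting and linear prediction.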
And 104, inputting the first voice characteristic data into a generator of a preset GAN network, and outputting an ideal mask value of the second voice characteristic data corresponding to the first voice characteristic data, wherein the ideal mask value is the ratio of the second voice characteristic data to the first voice characteristic data.
And 105, determining second de-noised voice data of the speaker voice according to the ideal mask value.
It should be noted that before inputting the first speech feature data into the generator of the preset GAN network, the initial GAN network needs to be trained and tested to obtain the preset GAN network. For the first language feature data, the mean and variance of each dimension element in the first voice feature data can be calculated, normalization processing is respectively carried out on the mean and variance of each dimension, and a mean variance normalization processing value of each dimension feature data of the first voice data is formed, so that valuable voice is effectively reserved, and noise is suppressed. And inputting the mean square error normalization processing value of the first voice characteristic data into a generator of a preset GAN network, denoising the first voice characteristic data by the generator of the preset GAN network according to the mean square error normalization processing value of the first voice characteristic data, generating an ideal mask value of second voice characteristic data corresponding to the first voice characteristic data, and outputting the ideal mask value. Because the ideal mask value is the ratio of the second voice characteristic data to the first voice characteristic data, the second voice characteristic data is calculated according to the ideal mask value and the first voice characteristic data, and then the second voice characteristic data is subjected to inverse transformation of feature extraction to obtain second de-noised voice data.
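The mean-variance normalization and the ideal mask defined above can be illustrated directly: given a clean feature matrix (standing in for the second voice feature data) and a noisy one (the first voice feature data), the ideal mask is their elementwise ratio, and applying it to the noisy features recovers the clean ones. The data here is synthetic and the generator itself is omitted; in the patent, the generator is trained to predict this mask from the normalized noisy features.

```python
import numpy as np

rng = np.random.default_rng(1)
# hypothetical feature matrices: "clean" stands in for the second voice
# feature data, "noisy" for the first voice feature data (50 frames x 13 dims)
clean = np.abs(rng.standard_normal((50, 13))) + 0.5
noisy = clean + 0.3 * np.abs(rng.standard_normal((50, 13)))

# per-dimension mean-variance normalization of the noisy features,
# the quantity fed to the generator in this embodiment
mvn = (noisy - noisy.mean(axis=0)) / (noisy.std(axis=0) + 1e-8)

# ideal mask as the patent defines it: ratio of second to first feature data
ideal_mask = clean / (noisy + 1e-8)

# second denoising: applying the mask to the noisy features recovers clean ones
recovered = ideal_mask * noisy
```

Since additive noise only increases the noisy features here, every mask value lies in (0, 1), which is why the mask acts as a per-element suppression factor.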
And 106, inputting the second denoising voice data into a preset training network for voice characteristic training.
It should be noted that the second denoised voice data is input into the preset training network for voice feature training, and the trained voice features are then used for speaker voice recognition, which can effectively improve the accuracy of speaker recognition.
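The patent does not specify the architecture of the preset training network. As a minimal hypothetical stand-in, the sketch below fits a logistic-regression classifier by gradient descent on synthetic "second denoised" feature vectors for two speakers, showing how the denoised features would feed a downstream speaker model; the data, dimensions, and learning rate are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
# synthetic "second denoised" 13-dim feature vectors for two speakers
spk_a = rng.standard_normal((100, 13)) + 1.0
spk_b = rng.standard_normal((100, 13)) - 1.0
X = np.vstack([spk_a, spk_b])
y = np.concatenate([np.ones(100), np.zeros(100)])   # 1 = speaker A

# minimal logistic-regression "training network" fit by gradient descent
w, bias = np.zeros(13), 0.0
for _ in range(200):
    p = 1.0 / (1.0 + np.exp(-(X @ w + bias)))   # predicted P(speaker A)
    grad = p - y                                 # cross-entropy gradient
    w -= 0.1 * (X.T @ grad) / len(y)
    bias -= 0.1 * grad.mean()

pred = (1.0 / (1.0 + np.exp(-(X @ w + bias)))) > 0.5
acc = float(np.mean(pred == y))
```

Because the second-denoised features are cleaner than the raw ones, such a downstream model sees better-separated classes, which is the mechanism behind the accuracy improvement claimed here.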
According to the GAN-based speaker speech feature training method provided by this embodiment, after conventional denoising processing is performed on the speaker voice data, feature extraction is performed on the resulting first denoised voice data, the resulting first voice feature data is input into the generator of the preset GAN network, the first denoised voice data is denoised a second time using the mask value to obtain second denoised voice data, and the second denoised voice data is used for voice feature training and recognition, which effectively improves the accuracy of speaker voice recognition and solves the technical problem of low recognition accuracy in existing voice recognition approaches.
For ease of understanding, referring to fig. 2, an embodiment of a GAN-based speaker phonetic feature training apparatus is provided, comprising:
the acquisition unit is used for acquiring the voice data of the speaker through the recording equipment;
the first denoising unit is used for carrying out conventional denoising processing on the speaker voice data to obtain first denoising voice data;
the feature extraction unit is used for performing feature extraction on the first de-noised voice data to obtain first voice feature data;
the mask unit is used for inputting the first voice characteristic data into a generator of a preset GAN network and outputting an ideal mask value of the second voice characteristic data corresponding to the first voice characteristic data, wherein the ideal mask value is the ratio of the second voice characteristic data to the first voice characteristic data;
the second denoising unit is used for determining second denoising voice data of the speaker voice according to the ideal mask value;
and the first training unit is used for inputting the second denoising voice data into a preset training network for voice characteristic training.
Further, the first denoising unit is specifically configured to:
and carrying out voice denoising processing based on a deep recurrent neural network on the speaker voice data to obtain first denoised voice data.
Further, the feature extraction unit is specifically configured to:
and performing MFCC feature extraction on the first de-noised voice data to obtain first voice feature data.
Further, still include:
the second training unit is used for training and testing the initial GAN network until the initial GAN network converges to obtain a preset GAN network;
the normalization unit is used for calculating a mean-variance normalization value of the first voice feature data;
correspondingly, the mask unit is specifically configured to:
and inputting the mean-variance normalization value of the first voice feature data into a generator of a preset GAN network, and outputting an ideal mask value of second voice feature data corresponding to the first voice feature data.
The application further provides an embodiment of a GAN-based speaker speech feature training device, wherein the device comprises a processor and a memory:
the memory is used for storing the program codes and transmitting the program codes to the processor;
the processor is used for executing the GAN-based speaker voice feature training method in the embodiment of the GAN-based speaker voice feature training method according to instructions in the program code.
In the several embodiments provided in the present application, it should be understood that the disclosed system and method may be implemented in other ways. For example, the above-described system embodiments are merely illustrative, and for example, the division of the units is only one logical functional division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application may be substantially implemented or contributed to by the prior art, or all or part of the technical solution may be embodied in a software product, which is stored in a storage medium and includes instructions for causing a computer system (which may be a personal computer, a server, or a network system) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (10)

1. A method for training speech features of a speaker based on GAN, comprising:
acquiring voice data of a speaker through a recording device;
carrying out conventional denoising processing on the speaker voice data to obtain first denoising voice data;
performing feature extraction on the first de-noised voice data to obtain first voice feature data;
inputting the first voice feature data into a generator of a preset GAN network, and outputting an ideal mask value of second voice feature data corresponding to the first voice feature data, wherein the ideal mask value is the ratio of the second voice feature data to the first voice feature data;
determining second de-noised voice data of the speaker voice according to the ideal mask value;
and inputting the second denoising voice data into a preset training network for voice characteristic training.
2. The GAN-based speaker voice feature training method as claimed in claim 1, wherein the performing a conventional denoising process on the speaker voice data to obtain a first denoised voice data comprises:
and carrying out voice denoising processing based on a deep recurrent neural network on the speaker voice data to obtain first denoised voice data.
3. The GAN-based speaker voice feature training method as claimed in claim 2, wherein the performing feature extraction on the first de-noised voice data to obtain first voice feature data comprises:
and performing MFCC feature extraction on the first de-noised voice data to obtain first voice feature data.
4. The GAN-based speaker voice feature training method as claimed in claim 3, wherein after the feature extraction is performed on the first de-noised voice data to obtain first voice feature data, before the inputting the first voice feature data into a generator of a preset GAN network and outputting an ideal mask value of second voice feature data corresponding to the first voice feature data, the method further comprises:
calculating a mean-variance normalization value of the first voice feature data;
correspondingly, the inputting the first voice feature data into a generator of a preset GAN network and outputting an ideal mask value of second voice feature data corresponding to the first voice feature data includes:
and inputting the mean-variance normalization value of the first voice feature data into a generator of a preset GAN network, and outputting an ideal mask value of second voice feature data corresponding to the first voice feature data.
5. The GAN-based speaker voice feature training method as claimed in claim 1, wherein before the inputting the first voice feature data into a generator of a preset GAN network and outputting the ideal mask value of the second voice feature data corresponding to the first voice feature data, the method further comprises:
and training and testing the initial GAN network until the initial GAN network converges to obtain the preset GAN network.
6. A GAN-based speaker speech feature training device, comprising:
the acquisition unit is used for acquiring the voice data of the speaker through the recording equipment;
the first denoising unit is used for carrying out conventional denoising processing on the speaker voice data to obtain first denoising voice data;
the feature extraction unit is used for performing feature extraction on the first denoising voice data to obtain first voice feature data;
a mask unit, configured to input the first voice feature data into a generator of a preset GAN network, and output an ideal mask value of second voice feature data corresponding to the first voice feature data, where the ideal mask value is a ratio of the second voice feature data to the first voice feature data;
the second denoising unit is used for determining second denoising voice data of the speaker voice according to the ideal mask value;
and the first training unit is used for inputting the second denoising voice data into a preset training network for voice characteristic training.
7. The GAN-based speaker voice feature training device as claimed in claim 6, wherein the first denoising unit is specifically configured to:
and carrying out voice denoising processing based on a deep recurrent neural network on the speaker voice data to obtain first denoised voice data.
8. The GAN-based speaker voice feature training device as claimed in claim 7, wherein the feature extraction unit is specifically configured to:
and performing MFCC feature extraction on the first de-noised voice data to obtain first voice feature data.
9. The GAN-based speaker voice feature training device as claimed in claim 8, further comprising:
the second training unit is used for training and testing the initial GAN network until the initial GAN network converges to obtain the preset GAN network;
the normalization unit is used for calculating a mean-variance normalization value of the first voice feature data;
correspondingly, the mask unit is specifically configured to:
and inputting the mean-variance normalization value of the first voice feature data into a generator of a preset GAN network, and outputting an ideal mask value of second voice feature data corresponding to the first voice feature data.
10. A GAN-based device for training speech features of a speaker, characterized in that the device comprises a processor and a memory:
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is configured to execute the GAN-based speaker speech feature training method according to any one of claims 1-5 according to instructions in the program code.
Application CN202010130403.3A, filed 2020-02-28 (priority 2020-02-28): Method, device and equipment for training speech characteristics of speaker based on GAN. Status: Pending. Publication: CN111341304A (en).

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010130403.3A CN111341304A (en) 2020-02-28 2020-02-28 Method, device and equipment for training speech characteristics of speaker based on GAN

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010130403.3A CN111341304A (en) 2020-02-28 2020-02-28 Method, device and equipment for training speech characteristics of speaker based on GAN

Publications (1)

Publication Number Publication Date
CN111341304A (en) 2020-06-26

Family

ID=71187170

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010130403.3A Pending CN111341304A (en) 2020-02-28 2020-02-28 Method, device and equipment for training speech characteristics of speaker based on GAN

Country Status (1)

Country Link
CN (1) CN111341304A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112700786A (en) * 2020-12-29 2021-04-23 西安讯飞超脑信息科技有限公司 Voice enhancement method, device, electronic equipment and storage medium

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107039036A (en) * 2017-02-17 2017-08-11 南京邮电大学 A kind of high-quality method for distinguishing speek person based on autocoding depth confidence network
CN107910011A (en) * 2017-12-28 2018-04-13 科大讯飞股份有限公司 A kind of voice de-noising method, device, server and storage medium
CN108986835A (en) * 2018-08-28 2018-12-11 百度在线网络技术(北京)有限公司 Based on speech de-noising method, apparatus, equipment and the medium for improving GAN network
CN109119093A (en) * 2018-10-30 2019-01-01 Oppo广东移动通信有限公司 Voice de-noising method, device, storage medium and mobile terminal
CN109147810A (en) * 2018-09-30 2019-01-04 百度在线网络技术(北京)有限公司 Establish the method, apparatus, equipment and computer storage medium of speech enhan-cement network
CN109256139A (en) * 2018-07-26 2019-01-22 广东工业大学 A kind of method for distinguishing speek person based on Triplet-Loss
CN109326302A (en) * 2018-11-14 2019-02-12 桂林电子科技大学 A kind of sound enhancement method comparing and generate confrontation network based on vocal print
CN109410974A (en) * 2018-10-23 2019-03-01 百度在线网络技术(北京)有限公司 Sound enhancement method, device, equipment and storage medium
CN109785852A (en) * 2018-12-14 2019-05-21 厦门快商通信息技术有限公司 A kind of method and system enhancing speaker's voice
CN110164425A (en) * 2019-05-29 2019-08-23 北京声智科技有限公司 A kind of noise-reduction method, device and the equipment that can realize noise reduction
CN110223429A (en) * 2019-06-19 2019-09-10 上海应用技术大学 Voice access control system
CN110503974A (en) * 2019-08-29 2019-11-26 泰康保险集团股份有限公司 Fight audio recognition method, device, equipment and computer readable storage medium
WO2020029906A1 (en) * 2018-08-09 2020-02-13 腾讯科技(深圳)有限公司 Multi-person voice separation method and apparatus

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112700786A (en) * 2020-12-29 2021-04-23 西安讯飞超脑信息科技有限公司 Speech enhancement method, apparatus, electronic device, and storage medium
CN112700786B (en) * 2020-12-29 2024-03-12 西安讯飞超脑信息科技有限公司 Speech enhancement method, apparatus, electronic device, and storage medium

Similar Documents

Publication Publication Date Title
CN106683680B (en) Speaker recognition method and device, computer equipment and computer readable medium
CN110457432B (en) Interview scoring method, apparatus, device, and storage medium
WO2021128741A1 (en) Voice emotion fluctuation analysis method and apparatus, and computer device and storage medium
US20160111112A1 (en) Speaker change detection device and speaker change detection method
CN110544469B (en) Training method and device of voice recognition model, storage medium and electronic device
CN111312286A (en) Age identification method, apparatus, device, and computer-readable storage medium
CN110556126A (en) Voice recognition method and apparatus, and computer device
CN112382300A (en) Voiceprint identification method, model training method, device, equipment and storage medium
CN111108552A (en) Voiceprint identity identification method and related device
CN106782503A (en) Automatic speech recognition method based on physiological information during phonation
CN110634490A (en) Voiceprint identification method, device and equipment
CN111108554A (en) Voiceprint recognition method based on voice noise reduction and related device
CN113112992B (en) Voice recognition method and device, storage medium and server
CN111133508A (en) Method and device for selecting comparison phonemes
CN113409771B (en) Method, system, and storage medium for detecting forged audio
CN111341304A (en) Method, device and equipment for training speech characteristics of speaker based on GAN
CN107993666B (en) Speech recognition method, speech recognition device, computer equipment and readable storage medium
CN113178204B (en) Low-power single-channel noise reduction method, device, and storage medium
CN112786058B (en) Voiceprint model training method, apparatus, device, and storage medium
CN111462736B (en) Voice-based image generation method and device, and electronic equipment
CN111341321A (en) Matlab-based method and device for generating and displaying spectrograms
CN112489678A (en) Scene recognition method and device based on channel characteristics
CN111149154B (en) Voiceprint recognition method, device, equipment and storage medium
CN113782033B (en) Voiceprint recognition method, apparatus, device, and storage medium
CN112634942B (en) Method, storage medium, and device for identifying the originality of a mobile phone recording

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20200626