CN113782043A - Voice acquisition method and device, electronic equipment and computer readable storage medium - Google Patents

Publication number
CN113782043A
Authority
CN
China
Prior art keywords
voice
sampling frequency
target
audio data
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111041110.9A
Other languages
Chinese (zh)
Inventor
蒋毅
李健
武卫东
陈明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sinovoice Technology Co Ltd
Original Assignee
Beijing Sinovoice Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sinovoice Technology Co Ltd
Priority to CN202111041110.9A
Publication of CN113782043A
Legal status: Pending

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161 Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02166 Microphone arrays; Beamforming

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The embodiment of the invention provides a voice acquisition method, a voice acquisition device, electronic equipment and a computer-readable storage medium. The method comprises the following steps: determining a target sampling frequency of the required target audio data; according to the target sampling frequency, first oversampling the voice uttered by the user and then downsampling it, so as to enhance the voice signal in the voice uttered by the user; and taking the enhanced voice data as the required target audio data. In the embodiment of the invention, the voice signal uttered by the user is enhanced, while the noise inside the equipment is not affected by the oversampling. The signal-to-noise ratio of the voice signal can therefore be improved simply and effectively by a method that increases the voice signal energy while leaving the noise energy inside the equipment unchanged.

Description

Voice acquisition method and device, electronic equipment and computer readable storage medium
Technical Field
The embodiment of the invention relates to the field of voice processing, in particular to a voice acquisition method, a voice acquisition device, electronic equipment and a computer-readable storage medium.
Background
Voice acquisition is involved in a large number of scenarios, such as television program recording, movie recording, music recording and educational video recording. However, noise interference is difficult to avoid during voice acquisition. It mainly comprises environmental noise and noise inside the recording device, so the finally acquired audio data contains noise and the audio quality suffers. It is therefore necessary either to capture as much of the target speaker's sound as possible, increasing its volume, or to reduce the influence of noise, so as to raise the signal-to-noise ratio of the collected audio data and improve the audio quality.
In the related art there are two main solutions: a microphone array solution and a digital gain solution.
The microphone array scheme uses the array gain of a microphone array to increase the volume of the voice signal. However, this scheme involves multiple microphone sensors and acquisition circuits, so the acquisition system is structurally complex and costly. In addition, because channel gain, frequency response and the consistency of the acquisition conditioning circuits differ among the microphone sensors, the enhanced voice exhibits reverberation in the time domain and distortion in the frequency domain.
The digital gain scheme is in effect a dynamic gain adjustment method that amplifies the sound signal with digital gain. While amplifying the weak acquired signal, it also raises the energy of the noise mixed into the voice signal, so the signal-to-noise ratio of the acquired voice is not effectively improved.
It is therefore desirable to provide a simple and effective solution that improves the signal-to-noise ratio of audio data and thereby the audio quality.
Disclosure of Invention
The invention provides a voice acquisition method, a voice acquisition device, electronic equipment and a computer readable storage medium.
In order to solve the above problem, in a first aspect, an embodiment of the present invention provides a voice acquisition method, the method comprising:
determining a target sampling frequency of the required target audio data;
according to the target sampling frequency, first oversampling the voice uttered by the user and then downsampling it, so as to enhance the voice signal in the voice uttered by the user;
and taking the enhanced voice data as the required target audio data.
Optionally, oversampling and then downsampling the voice uttered by the user according to the target sampling frequency includes:
collecting the voice uttered by the user at an actual sampling frequency that is N times the target sampling frequency, to obtain audio data with N sampling values per unit time;
summing every N adjacent sampling values of the audio data to obtain the single sampling value for one unit time;
and taking the audio data made up of the resulting sampling values as the enhanced voice data.
Optionally, determining a target sampling frequency of the required target audio data comprises:
determining the target sampling frequency according to the audio analysis requirement or the actual effective frequency range of the sampling object.
Optionally, collecting the voice uttered by the user at an actual sampling frequency that is N times the target sampling frequency includes:
selecting an actual high-speed audio acquisition circuit, consisting of a high-frequency-response microphone and a high-speed signal acquisition circuit, that corresponds to the actual sampling frequency;
and collecting the voice uttered by the user with the actual high-speed audio acquisition circuit.
Optionally, the method further comprises:
collecting the voice uttered by the user in an initial time period;
analyzing the collected voice to determine whether the voice uttered by the user is far-field voice;
wherein oversampling and then downsampling the collected voice according to the target sampling frequency comprises:
when the collected voice is far-field voice, first oversampling the collected voice and then downsampling it according to the target sampling frequency.
In a second aspect, an embodiment of the present invention provides a speech acquisition apparatus, where the apparatus includes:
the target sampling frequency determining module is used for determining the target sampling frequency of the required target audio data;
the enhancement processing module is used for firstly carrying out oversampling and then carrying out downsampling on the voice sent by the user according to the target sampling frequency so as to carry out enhancement processing on the voice signal in the voice sent by the user;
and the target audio data acquisition module is used for taking the voice data after the enhancement processing as the required target audio data.
Optionally, the enhancement processing module includes:
the oversampling submodule is used for collecting the voice uttered by the user at an actual sampling frequency that is N times the target sampling frequency, to obtain audio data with N sampling values per unit time;
the down-sampling submodule is used for summing every N adjacent sampling values of the audio data to obtain the single sampling value for one unit time;
and the voice data determination submodule is used for taking the obtained audio data of the plurality of sampling values as the voice data after enhancement processing.
Optionally, the target sampling frequency determination module includes:
and the target sampling frequency determining submodule is used for determining the target sampling frequency according to the audio analysis requirement or the actual effective frequency range of the sampling object.
Optionally, the oversampling submodule includes:
the selection unit is used for selecting an actual high-speed audio acquisition circuit consisting of a high-frequency response microphone and a high-speed signal acquisition circuit corresponding to the actual sampling frequency;
and the acquisition unit is used for acquiring the voice sent by the user by utilizing the actual high-speed audio acquisition circuit.
Optionally, the apparatus further comprises:
the acquisition module is used for acquiring voice sent by a user in an initial time period;
the analysis module is used for analyzing the collected voice and determining whether the voice sent by the user is far-field voice;
the enhancement processing module is further configured to perform oversampling and then down-sampling on the collected voice according to the target sampling frequency under the condition that the collected voice is far-field voice.
In a third aspect, an embodiment of the present invention further provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the voice acquisition method provided in the embodiment of the present invention.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the voice acquisition method provided in the embodiment of the present invention.
In the embodiment of the invention, the target sampling frequency of the required target audio data is determined first, and the voice uttered by the user is then oversampled and subsequently downsampled according to that target sampling frequency. In this process the voice signal uttered by the user is enhanced, while the noise inside the equipment is not affected by the oversampling. The signal-to-noise ratio of the voice signal can therefore be improved simply and effectively by increasing the voice signal energy while leaving the noise energy inside the equipment unchanged, which improves the audio quality.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed for the description of the embodiments or of the related art are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a flowchart of a voice acquisition method according to an embodiment of the present invention;
Fig. 2 is a flowchart of a voice acquisition method according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a voice acquisition device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 shows a flowchart of a voice acquisition method provided in an embodiment of the present invention. The method can be applied to voice acquisition in television program recording, film recording, music recording, teaching video recording and the like. The voice acquisition method comprises the following steps:
Step S110: determine a target sampling frequency of the required target audio data.
In this embodiment, the sound acquisition scheme may be customized for a specific application. Concretely, the target sampling frequency is determined according to the actual requirement of the voice analysis or the effective frequency range of the acquisition object, and the voice signal is then acquired with that target sampling frequency as the goal.
Step S120: according to the target sampling frequency, first oversample the voice uttered by the user and then downsample it, so as to enhance the voice signal in the voice uttered by the user.
In this embodiment, oversampling means that, once the target sampling frequency has been determined, the voice signal uttered by the sampling object is acquired at an actual sampling frequency that is N times the target sampling frequency.
In this embodiment, downsampling means that, after the audio data has been acquired at the actual sampling frequency, its sampling values are merged according to the ratio between the actual sampling frequency and the target sampling frequency: the N sampling values within each unit time (the unit time being one sampling period of the target sampling frequency) are summed to obtain the single sampling value for that unit time.
Step S130: take the enhanced voice data as the required target audio data.
In this embodiment, the voice uttered by the user is first oversampled, so that N sampling values are acquired per unit time. The acquired sampling values are then downsampled by summing the N sampling values within each unit time. The result is the enhanced voice data, which serves as the required target audio data, that is, audio data whose voice signal has been enhanced.
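The downsampling step described here, summing the N sampling values that fall within each unit time, can be sketched in a few lines of Python. This is a minimal illustration rather than the patented implementation: the function name `sum_decimate` is hypothetical, and dropping an incomplete trailing group is an assumption, since the text does not say how a partial group should be handled.

```python
from typing import List, Sequence


def sum_decimate(samples: Sequence[float], n: int) -> List[float]:
    """Sum every n adjacent samples into one output sample.

    Trailing samples that do not fill a complete group of n are dropped
    (an assumption; the source does not cover partial groups).
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    usable = len(samples) - len(samples) % n
    return [sum(samples[i:i + n]) for i in range(0, usable, n)]


# Eight samples captured at 4x the target rate become two output samples.
oversampled = [1, 2, 3, 4, 5, 6, 7, 8]
enhanced = sum_decimate(oversampled, 4)
print(enhanced)  # [10, 26]
```

Because the values are summed rather than averaged, each output sample carries the accumulated amplitude of its group, which is the enhancement effect the embodiment relies on.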
In the embodiment of the invention, the target sampling frequency of the required target audio data is determined first, and the voice uttered by the user is then oversampled and subsequently downsampled according to that target sampling frequency. In this process the voice signal uttered by the user is enhanced, while the noise inside the equipment is not affected by the oversampling and downsampling. The invention can therefore improve the signal-to-noise ratio of the voice signal simply and effectively, by increasing the voice signal energy while leaving the noise energy inside the equipment unchanged, and thereby improves the audio quality.
In this embodiment, before step S120 is executed, the voice acquisition method may further include:
Step S1: collect the voice uttered by the user in an initial time period.
In this embodiment, the voice uttered by the user may first be collected during an initial time period as a test.
Step S2: analyze the collected voice and determine whether the voice uttered by the user is far-field voice.
In this embodiment, the user voice collected in the initial time period may be analyzed to determine whether the voice uttered by the user is far-field voice.
In this embodiment, far-field voice refers to voice from a sound source whose distance from the reference point of the acquisition sensor is much larger than the signal wavelength. Whether the voice of a sound source is far-field voice can therefore be determined from the distance between the reference point of the microphone and the position of the sound source, together with the wavelength of the voice signal.
In this embodiment, when the collected voice is far-field voice, the collected voice is first oversampled and then downsampled according to the target sampling frequency.
That is, once the voice of the sound source has been determined to be far-field voice, the subsequently acquired voice signal may be enhanced by oversampling followed by downsampling.
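The far-field criterion above compares the source distance with the signal wavelength. A minimal Python sketch of that comparison follows. Several things in it are assumptions not found in the source: the function names, the speed of sound of 343 m/s, and in particular the threshold of 10 wavelengths used to stand in for "much larger". How the source distance is estimated from the collected voice is also left open by the text, so the sketch simply takes it as an input.

```python
SPEED_OF_SOUND_M_S = 343.0  # approximate value for air at room temperature (assumption)


def wavelength_m(frequency_hz: float, speed_m_s: float = SPEED_OF_SOUND_M_S) -> float:
    """Acoustic wavelength in metres for a given frequency."""
    if frequency_hz <= 0:
        raise ValueError("frequency_hz must be positive")
    return speed_m_s / frequency_hz


def is_far_field(distance_m: float, frequency_hz: float,
                 ratio_threshold: float = 10.0) -> bool:
    """True when the source distance is at least ratio_threshold wavelengths.

    ratio_threshold is a hypothetical stand-in for "much larger than the
    signal wavelength"; the source gives no numeric value.
    """
    return distance_m >= ratio_threshold * wavelength_m(frequency_hz)


# A 1 kHz component has a wavelength of about 0.343 m.
print(is_far_field(5.0, 1000.0))  # True: 5 m exceeds 10 wavelengths (3.43 m)
print(is_far_field(0.5, 1000.0))  # False: 0.5 m is under 10 wavelengths
```

In a pipeline following the embodiment, a `True` result would enable the oversample-then-sum path for subsequently acquired voice.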
The inventor found that, under far-field acquisition conditions in which the target speaker is far from the acquisition sensor, the voice is weakened by attenuation along the spatial path. The voice energy reaching the acquisition sensor is weak, each directly acquired sample carries little voice signal energy, and the voice is easily disturbed by noise.
The inventor further found that, in a voice acquisition scene, the noise contained in the acquired audio data is mainly environmental noise and noise inside the device, and that in a far-field scene the two affect the voice signal to a comparable degree, in a ratio of roughly one to one. The inventor therefore proposes oversampling the voice uttered by the user and then downsampling it, so as to enhance the voice signal in that voice. Although the oversampling process enhances both the voice signal and the environmental noise signal, the noise signal inside the device remains unchanged, so the signal-to-noise ratio of the voice signal is effectively improved.
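The signal-to-noise argument can be made concrete under a textbook model that the source does not state and that is assumed here for illustration only. Within one unit time the voice amplitude s is taken as approximately constant across the N samples, and the device's internal noise e_k is taken as zero-mean with variance σ², uncorrelated from sample to sample. Each raw sample is x_k = s + e_k, and the summed output is y:

```latex
y = \sum_{k=1}^{N} x_k = N s + \sum_{k=1}^{N} e_k,
\qquad
\mathrm{E}\!\left[\Bigl(\sum_{k=1}^{N} e_k\Bigr)^{2}\right] = N\sigma^{2},
\qquad
\mathrm{SNR}_{\text{out}} = \frac{(N s)^{2}}{N\sigma^{2}}
  = N \cdot \frac{s^{2}}{\sigma^{2}}
  = N \cdot \mathrm{SNR}_{\text{in}} .
```

Under these assumptions the gain relative to device noise is 10·log10(N) dB, roughly 3 dB for N = 2, 6 dB for N = 4 and 9 dB for N = 8. In this model the signal energy grows by N² while the device-noise energy grows only by N, which is what raises the ratio. Environmental noise that varies as slowly as the voice is summed coherently along with it, consistent with the statement above that the voice signal and the environmental noise signal are both enhanced.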
Fig. 2 shows a flowchart of a voice acquisition method provided in an embodiment of the present invention. In this embodiment, the voice acquisition method includes:
Step S210: determine a target sampling frequency of the required target audio data.
In this embodiment, step S210 specifically includes determining the target sampling frequency according to the audio analysis requirement or the actual effective frequency range of the sampling object.
In this embodiment, the target sampling frequency of the target audio data may be determined according to the actual application scenario. For example, a voice call requires a voice sampling frequency of 8 kHz, so the target sampling frequency can be set to 8 kHz.
In this embodiment, the target sampling frequency may also be determined according to the actual effective frequency range of the sampling object in the actual application scenario.
Step S220: collect the voice uttered by the user at an actual sampling frequency that is N times the target sampling frequency, to obtain audio data with N sampling values per unit time.
In this embodiment, after the target sampling frequency has been determined, an actual sampling frequency that is N times the target sampling frequency may be chosen according to actual requirements, where N may be any natural number greater than 1 and can be customized as needed.
For example, if the target sampling frequency is 8 kHz, an actual sampling frequency of 2, 4 or 8 times that value (16 kHz, 32 kHz or 64 kHz) may be selected for oversampling, giving audio data with 2, 4 or 8 sampling values per unit time respectively. In this embodiment, the unit time is one sampling period of the target sampling frequency.
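The arithmetic in this example can be checked with a short Python sketch. The helper name `oversampling_plan` is hypothetical; the 8 kHz target and the factors 2, 4 and 8 come from the text.

```python
from typing import List, Sequence, Tuple


def oversampling_plan(target_hz: int,
                      factors: Sequence[int] = (2, 4, 8)) -> List[Tuple[int, int, int]]:
    """Return (factor N, actual sampling rate in Hz, samples per unit time)."""
    return [(n, n * target_hz, n) for n in factors]


target_hz = 8_000
# The unit time is one sampling period of the target rate, here in microseconds.
unit_time_us = 1_000_000 / target_hz

for n, actual_hz, per_unit in oversampling_plan(target_hz):
    print(f"N={n}: capture at {actual_hz} Hz, "
          f"{per_unit} samples per {unit_time_us:.0f} us")
# N=2: capture at 16000 Hz, 2 samples per 125 us
# N=4: capture at 32000 Hz, 4 samples per 125 us
# N=8: capture at 64000 Hz, 8 samples per 125 us
```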
In this embodiment, N may be chosen as follows: perform test sampling at the target sampling frequency, analyze the energy of the collected voice, determine the voice signal energy actually required, and derive the required multiple from the ratio of the two.
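The text does not say how the energy ratio maps to the multiple N, so the following Python sketch is only one plausible reading. It assumes that summing N nearly equal samples scales amplitude by N and energy by N², and therefore picks the smallest allowed N that is at least the square root of the required-to-measured energy ratio. The function names, the allowed factor set (2, 4, 8) and the use of mean squared amplitude as the energy measure are all illustrative assumptions.

```python
import math
from typing import Sequence


def mean_energy(samples: Sequence[float]) -> float:
    """Mean squared amplitude of a block of test samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    return sum(x * x for x in samples) / len(samples)


def choose_factor(measured_energy: float, required_energy: float,
                  allowed: Sequence[int] = (2, 4, 8)) -> int:
    """Smallest allowed N with N >= sqrt(required / measured).

    Assumes energy grows as N**2 when N samples are summed. Falls back to
    the largest allowed factor when none is large enough.
    """
    if measured_energy <= 0 or required_energy <= 0:
        raise ValueError("energies must be positive")
    needed = math.sqrt(required_energy / measured_energy)
    for n in sorted(allowed):
        if n >= needed:
            return n
    return max(allowed)


# Test capture at the target rate, then pick N for a required energy level.
test_block = [0.01, -0.02, 0.015, -0.01]
print(choose_factor(mean_energy(test_block), required_energy=0.002))  # 4
```

If the ratio is instead meant to be applied directly, without the square root, only the line computing `needed` changes.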
In practical applications, step S220 specifically includes the following sub-steps:
Step S221: select an actual high-speed audio acquisition circuit, consisting of a high-frequency-response microphone and a high-speed signal acquisition circuit, that corresponds to the actual sampling frequency.
In this embodiment, after the required actual sampling frequency has been determined, a corresponding high-frequency-response microphone may be selected and combined with a high-speed signal acquisition circuit to form the actual high-speed audio acquisition circuit.
Step S222: collect the voice uttered by the user with the actual high-speed audio acquisition circuit.
In this embodiment, the actual high-speed audio acquisition circuit can be customized to the actual voice analysis requirement and then used as the acquisition sensor to collect the user's voice.
Step S230: sum every N adjacent sampling values of the audio data to obtain the single sampling value for one unit time.
In this embodiment, after audio data with N sampling values per unit time has been obtained, the audio data is downsampled by the oversampling factor: every N adjacent sampling values are summed to give the sampling value for one unit time.
For example, after audio data with 2, 4 or 8 sampling values per unit time has been obtained, every 2, 4 or 8 sampling values respectively may be accumulated into a single sampling value, which increases the energy of that sampling point.
Step S240: take the audio data made up of the resulting sampling values as the enhanced voice data.
In this embodiment, oversampling followed by downsampling yields audio data in which the internal noise generated by the internal circuits of the acquisition device is unchanged while the energy of the sampling point in each unit time is increased. The result is audio data with enhanced sampling values and unchanged internal noise.
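A small simulation can show the effect of summing on the ratio between the voice signal and device noise. The Python sketch below is an illustration under stated assumptions, not a model taken from the source: the voice is replaced by a 200 Hz test tone, the device noise is modelled as independent Gaussian noise added to every raw sample, and all names and parameter values are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def snr_db(clean: Sequence[float], noisy: Sequence[float]) -> float:
    """SNR in dB, treating (noisy - clean) as the noise component."""
    sig_power = sum(c * c for c in clean) / len(clean)
    noise_power = sum((y - c) ** 2 for c, y in zip(clean, noisy)) / len(clean)
    return 10.0 * math.log10(sig_power / noise_power)


def sum_groups(samples: Sequence[float], n: int) -> List[float]:
    """Sum every n adjacent samples (length is assumed divisible by n)."""
    return [sum(samples[i:i + n]) for i in range(0, len(samples), n)]


def simulate(n: int = 8, target_hz: int = 8_000, tone_hz: float = 200.0,
             seconds: float = 1.0, noise_std: float = 0.05,
             seed: int = 0) -> Tuple[float, float]:
    """Return (SNR before summing, SNR after summing), both in dB."""
    rng = random.Random(seed)
    actual_hz = n * target_hz                  # oversampled capture rate
    total = int(seconds * actual_hz)
    clean = [0.1 * math.sin(2 * math.pi * tone_hz * k / actual_hz)
             for k in range(total)]
    # Device noise modelled as independent per-sample Gaussian noise.
    noisy = [c + rng.gauss(0.0, noise_std) for c in clean]
    before = snr_db(clean, noisy)
    after = snr_db(sum_groups(clean, n), sum_groups(noisy, n))
    return before, after


before_db, after_db = simulate()
print(f"SNR before: {before_db:.1f} dB, after: {after_db:.1f} dB, "
      f"gain: {after_db - before_db:.1f} dB")
```

With N = 8 the gain should come out close to 10·log10(8), about 9 dB, because under this noise model the summed tone grows in amplitude by nearly N while the summed noise grows only by about the square root of N.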
Step S250: take the enhanced voice data as the required target audio data.
In this embodiment, once the enhanced voice data has been obtained, it may be used as the required target audio data, giving high-quality audio data with a higher signal-to-noise ratio.
In this embodiment, a corresponding voice acquisition scheme may be customized when the target sampling frequency of the required target audio data is determined, and the voice uttered by the user is first oversampled and then downsampled. During acquisition, the voice signal uttered by the user is enhanced, while the noise inside the device is not affected by the oversampling and downsampling processes.
Fig. 3 shows a structural block diagram of a voice acquisition device 300 according to the present invention. Specifically, the voice acquisition device 300 may include the following modules:
a target sampling frequency determination module 301, configured to determine a target sampling frequency of the required target audio data;
an enhancement processing module 302, configured to first oversample the voice uttered by the user according to the target sampling frequency and then downsample it, so as to enhance the voice signal in the voice uttered by the user;
and a target audio data obtaining module 303, configured to take the enhanced voice data as the required target audio data.
Optionally, the enhancement processing module 302 includes:
the oversampling submodule is used for collecting the voice uttered by the user at an actual sampling frequency that is N times the target sampling frequency, to obtain audio data with N sampling values per unit time;
the down-sampling submodule is used for summing every N adjacent sampling values of the audio data to obtain the single sampling value for one unit time;
and the voice data determination submodule is used for taking the obtained audio data of the plurality of sampling values as the voice data after enhancement processing.
Optionally, the target sampling frequency determining module 301 includes:
and the target sampling frequency determining submodule is used for determining the target sampling frequency according to the audio analysis requirement or the actual effective frequency range of the sampling object.
Optionally, the oversampling submodule includes:
the selection unit is used for selecting an actual high-speed audio acquisition circuit consisting of a high-frequency response microphone and a high-speed signal acquisition circuit corresponding to the actual sampling frequency;
and the acquisition unit is used for acquiring the voice sent by the user by utilizing the actual high-speed audio acquisition circuit.
Optionally, the apparatus further comprises:
the acquisition module is used for acquiring voice sent by a user in an initial time period;
the analysis module is used for analyzing the collected voice and determining whether the voice sent by the user is far-field voice;
the enhancement processing module 302 is further configured to, when the collected voice is far-field voice, perform oversampling on the collected voice first and then perform downsampling on the collected voice according to the target sampling frequency.
Since the device embodiment is basically similar to the method embodiment, its description is brief; for relevant details, refer to the corresponding parts of the method embodiment.
Correspondingly, the invention further provides an electronic device, which includes a memory, a processor and a computer program stored in the memory and executable on the processor. When executing the computer program, the processor implements the voice acquisition method according to the embodiment of the invention and achieves the same technical effects, which are not repeated here. The electronic device can be a PC, a mobile terminal, a personal digital assistant, a tablet computer and the like.
The present invention further provides a computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements the steps of the voice acquisition method according to the embodiments of the present invention and achieves the same technical effects, which are not repeated here. The computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.
The embodiments in this specification are described in a progressive manner: each embodiment focuses on its differences from the others, and for the parts that are the same or similar the embodiments may be referred to one another.
The voice acquisition method, device, electronic equipment and computer-readable storage medium provided by the present invention have been described in detail above. Specific examples are used herein to explain the principle and implementation of the present invention, and the description of the above embodiments is intended only to help in understanding the method and its core idea. For a person skilled in the art, the specific embodiments and the scope of application may vary in accordance with the idea of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.
From the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software together with a necessary general-purpose hardware platform, or alternatively by hardware. On this understanding, the above technical solutions, in essence or in the part that contributes over the related art, may be embodied as a software product. The software product may be stored in a computer-readable storage medium such as ROM/RAM, a magnetic disk or an optical disk, and includes instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to execute the methods of the embodiments or of parts of the embodiments.

Claims (10)

1. A method for speech acquisition, the method comprising:
determining a target sampling frequency of the required target audio data;
according to the target sampling frequency, firstly performing oversampling on the voice sent by the user, and then performing down-sampling so as to enhance the voice signal in the voice sent by the user;
and taking the voice data after the enhancement processing as the required target audio data.
2. The method of claim 1, wherein oversampling and then downsampling the voice uttered by the user according to the target sampling frequency comprises:
collecting the voice uttered by the user at an actual sampling frequency that is N times the target sampling frequency, to obtain audio data with N sampling values per unit time;
summing every N adjacent sampling values of the audio data to obtain the single sampling value for one unit time;
and taking the audio data made up of the resulting sampling values as the enhanced voice data.
3. The method of claim 1, wherein determining a desired target sampling frequency for the target audio data comprises:
and determining the target sampling frequency according to the audio analysis requirement or the actual effective frequency range of the sampling object.
4. The method of claim 2, wherein collecting the speech uttered by the user at an actual sampling frequency N times higher than the target sampling frequency comprises:
selecting an actual high-speed audio acquisition circuit, consisting of a high-frequency-response microphone and a high-speed signal acquisition circuit, that corresponds to the actual sampling frequency; and
collecting the speech uttered by the user by using the actual high-speed audio acquisition circuit.
5. The method according to any one of claims 1-4, further comprising:
collecting the speech uttered by the user in an initial time period; and
analyzing the collected speech to determine whether the speech uttered by the user is far-field speech;
wherein first oversampling and then downsampling the collected speech according to the target sampling frequency comprises:
in the case where the collected speech is far-field speech, first oversampling and then downsampling the collected speech according to the target sampling frequency.
6. A speech acquisition device, the device comprising:
a target sampling frequency determining module, configured to determine the target sampling frequency of the required target audio data;
an enhancement processing module, configured to first oversample and then downsample the speech uttered by the user according to the target sampling frequency, so as to enhance the speech signal in the speech uttered by the user; and
a target audio data acquisition module, configured to take the voice data after the enhancement processing as the required target audio data.
7. The device of claim 6, wherein the enhancement processing module comprises:
an oversampling submodule, configured to collect the speech uttered by the user at an actual sampling frequency which is N times higher than the target sampling frequency, to obtain audio data of N sampling values per unit time;
a downsampling submodule, configured to add the audio data of every N adjacent sampling values to obtain the audio data of one sampling value per unit time; and
a voice data determining submodule, configured to take the obtained audio data of the plurality of sampling values as the voice data after the enhancement processing.
8. The device of claim 6, wherein the target sampling frequency determining module comprises:
a target sampling frequency determining submodule, configured to determine the target sampling frequency according to the audio analysis requirement or the actual effective frequency range of the sampling object.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the speech acquisition method of any one of claims 1 to 5 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the speech acquisition method according to any one of claims 1 to 5.
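
As an illustration of the oversample-then-sum step recited in claims 2 and 7, the following is a minimal Python sketch and not the applicant's implementation. The function names `capture_oversampled` and `sum_adjacent`, the simulated Gaussian noise, and the choice to discard an incomplete trailing group are all assumptions made for this example; real acquisition would read from the high-speed audio acquisition circuit instead of a simulated signal.

```python
import random


def capture_oversampled(signal, noise_std, target_rate, n, duration, rng):
    """Simulate capture at n * target_rate (hypothetical stand-in for hardware).

    `signal` is a function of time in seconds; additive Gaussian noise with
    standard deviation `noise_std` models the acquisition noise.
    """
    actual_rate = n * target_rate
    count = round(duration * actual_rate)
    return [signal(i / actual_rate) + rng.gauss(0.0, noise_std)
            for i in range(count)]


def sum_adjacent(samples, n):
    """Downsample by adding every n adjacent samples into one output sample.

    An incomplete trailing group is dropped (an assumption of this sketch).
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    usable = len(samples) - len(samples) % n
    return [sum(samples[i:i + n]) for i in range(0, usable, n)]
```

For example, with a target sampling frequency of 16000 Hz and N = 4, `capture_oversampled` yields 64000 samples per second and `sum_adjacent(raw, 4)` returns 16000 samples per second. Under the assumption that the speech signal changes little within each group of N samples and that the noise is uncorrelated from sample to sample, the summed signal amplitude grows roughly N-fold while the noise standard deviation grows only by about the square root of N, which is one way to read the enhancement effect described in the claims.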
CN202111041110.9A 2021-09-06 2021-09-06 Voice acquisition method and device, electronic equipment and computer readable storage medium Pending CN113782043A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111041110.9A CN113782043A (en) 2021-09-06 2021-09-06 Voice acquisition method and device, electronic equipment and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN113782043A true CN113782043A (en) 2021-12-10

Family

ID=78841394

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111041110.9A Pending CN113782043A (en) 2021-09-06 2021-09-06 Voice acquisition method and device, electronic equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN113782043A (en)

Citations (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH09289437A (en) * 1996-04-22 1997-11-04 Sony Corp Digital limiter device
JP2002073073A (en) * 2000-08-28 2002-03-12 Sharp Corp Voice recognition device and program recording medium
JP2005110018A (en) * 2003-09-30 2005-04-21 Tadashi Aoki METHOD AND SYSTEM FOR VoIP VOICE COMMUNICATION, AND ITS TRANSMITTING TERMINAL, RECEIVING TERMINAL AND PROGRAM
JP2005159667A (en) * 2003-11-25 2005-06-16 Yamaha Corp Acoustic signal compressor
CN1689069A (en) * 2002-09-06 2005-10-26 松下电器产业株式会社 Sound encoding apparatus and sound encoding method
JP2006243042A (en) * 2005-02-28 2006-09-14 Sanyo Electric Co Ltd High-frequency interpolating device and reproducing device
CN101221767A (en) * 2008-01-23 2008-07-16 晨星半导体股份有限公司 Voice boosting device and method used on the same
JP2009094836A (en) * 2007-10-10 2009-04-30 Victor Co Of Japan Ltd Digital speech processing apparatus and digital speech processing program
US20090182555A1 (en) * 2008-01-16 2009-07-16 Mstar Semiconductor, Inc. Speech Enhancement Device and Method for the Same
CN101499282A (en) * 2008-02-03 2009-08-05 深圳艾科创新微电子有限公司 Voice A/D conversion method and device
US20120114139A1 (en) * 2010-11-05 2012-05-10 Industrial Technology Research Institute Methods and systems for suppressing noise
CN106575508A (en) * 2014-06-10 2017-04-19 瑞内特有限公司 Digital encapsulation of audio signals
CN106782592A (en) * 2016-12-27 2017-05-31 中山大学花都产业科技研究院 A kind of echo and the system and method uttered long and high-pitched sounds for eliminating network sound transmission
CN110267163A (en) * 2019-06-18 2019-09-20 重庆清文科技有限公司 A kind of virtual low frequency Enhancement Method of direct sound, system, medium and equipment
CN110534125A (en) * 2019-09-11 2019-12-03 清华大学无锡应用技术研究院 A kind of real-time voice enhancing system and method inhibiting competitive noise
CN110610717A (en) * 2019-08-30 2019-12-24 西南电子技术研究所(中国电子科技集团公司第十研究所) Separation method of mixed signals in complex frequency spectrum environment
CN111243619A (en) * 2020-01-06 2020-06-05 平安科技(深圳)有限公司 Training method and device for voice signal segmentation model and computer equipment
CN111402908A (en) * 2020-03-30 2020-07-10 Oppo广东移动通信有限公司 Voice processing method, device, electronic equipment and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117278896A (en) * 2023-11-23 2023-12-22 深圳市昂思科技有限公司 Voice enhancement method and device based on double microphones and hearing aid equipment
CN117278896B (en) * 2023-11-23 2024-03-19 深圳市昂思科技有限公司 Voice enhancement method and device based on double microphones and hearing aid equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination