CN111785292B - Speech reverberation intensity estimation method and device based on image recognition and storage medium - Google Patents
- Publication number
- CN111785292B (application CN202010426246.0A)
- Authority
- CN
- China
- Prior art keywords
- reverberation
- intensity
- image recognition
- voice
- energy
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/06—Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
- G10L21/10—Transforming into visible information
- G10L21/14—Transforming into visible information by displaying frequency domain information
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/049—Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/56—Extraction of image or video features relating to colour
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/60—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for measuring the quality of voice signals
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2218/00—Aspects of pattern recognition specially adapted for signal processing
- G06F2218/08—Feature extraction
- G06F2218/10—Feature extraction by analysing the shape of a waveform, e.g. extracting parameters relating to peaks
Abstract
The invention discloses a speech reverberation intensity estimation method, device and storage medium based on image recognition. The reverberant speech is converted into a three-dimensional spectrogram; image detection is performed on the spectrogram to obtain the trailing segments of the reverberant speech; the energy intensity of each trailing segment is calculated and taken as an initial estimate of the reverberation intensity; finally, the initial estimates of two or more trailing segments are smoothed to obtain a final estimate, which serves as the measure of the reverberation intensity of the reverberant speech, greatly improving the interference resistance and accuracy of reverberation intensity measurement.
Description
Technical Field
The invention relates to the technical field of voice processing, in particular to a voice reverberation intensity estimation method based on image recognition, a voice reverberation intensity estimation device based on image recognition and a computer readable storage medium.
Background
The reverberation effect is an important phenomenon of indoor acoustics, and is generated by multiple reflections of sound in a closed space. In applications such as hands-free telephones, video teleconferencing systems, hearing aids, man-machine dialog systems, reverberation effects are an important factor affecting the intelligibility of speech signals; meanwhile, it is also an important factor affecting the binaural effect in applications such as stereo theaters, stereo car sound systems, and the like.
However, in practice there are few ways to measure the reverberation strength; the commonly used reverberation strength estimation methods mainly include:
(1) Estimating the reverberation strength according to the reverberation time:
Reverberation time (denoted RT60) is defined as the time it takes, from the moment the acoustic excitation ceases, for the residual acoustic energy in a particular room space to decay, after multiple reflections, to 60 dB below the energy at the initial observation. Reverberation time is an important index for measuring the reverberation characteristics of a specific room space, and is closely related to the estimation of Late-Reverberation power in dereverberation algorithms.
However, blind estimation of the reverberation time is still an open research problem; in particular, with only a single channel it is difficult to obtain the reverberation time accurately in an arbitrary environment.
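The RT60 definition above can be illustrated with a short sketch: given a room impulse response, Schroeder backward integration yields the energy decay curve, and extrapolating its slope gives the time to decay by 60 dB. This is a hedged illustration of the definition only; the patent does not estimate RT60 this way, and the synthetic impulse response, sample rate, and fitting range here are assumptions.

```python
import numpy as np

def rt60_from_impulse_response(h, fs):
    """Estimate RT60 via Schroeder backward integration of a room
    impulse response, extrapolating the -5..-25 dB decay slope."""
    energy = np.asarray(h, dtype=float) ** 2
    edc = np.cumsum(energy[::-1])[::-1]          # remaining energy per sample
    edc_db = 10.0 * np.log10(edc / edc[0] + 1e-300)
    t = np.arange(len(energy)) / fs
    mask = (edc_db <= -5.0) & (edc_db >= -25.0)  # linear decay region
    slope, _ = np.polyfit(t[mask], edc_db[mask], 1)
    return -60.0 / slope                         # seconds to fall 60 dB

# Synthetic impulse response: decaying noise with a known ~0.4 s RT60
fs = 16000
rng = np.random.default_rng(0)
t = np.arange(fs) / fs
h = rng.standard_normal(fs) * np.exp(-6.9 * t / 0.4)
print(round(rt60_from_impulse_response(h, fs), 2))  # ~0.4
```

The recovered value matches the decay rate built into the synthetic response, which is the defining property RT60 captures.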
(2) Estimating the reverberation strength from the SRMR values:
The speech-to-reverberation modulation energy ratio (SRMR) estimates the reverberation strength by computing the ratio of speech-modulation energy to reverberation-modulation energy. However, SRMR is text dependent and is affected by the vowels in the speech; it may return a high reverberation strength even when no reverberation is present.
Disclosure of Invention
The invention mainly aims to provide a method, a device and a storage medium for estimating the reverberation intensity of voice based on image recognition, and aims to solve the technical problem that the reverberation intensity is difficult to measure accurately.
In order to achieve the above object, the present invention provides a method for estimating the reverberation strength of speech based on image recognition, which comprises the following steps:
step a, converting the reverberation voice into a three-dimensional spectrogram;
b, detecting the image of the three-dimensional spectrogram to obtain a tail section of the reverberation voice in the three-dimensional spectrogram;
step c, calculating the energy intensity of the trailing section, and taking the energy intensity as an initial estimation value of the reverberation intensity;
and d, smoothing the initial estimation values of more than two tail sections to obtain a final estimation value, and taking the final estimation value as the measurement of the reverberation intensity of the reverberation voice.
Preferably, in the step a, the three-dimensional spectrogram is color-labeled according to the intensity of the spectrogram energy; in the step c, the energy intensity of the trailing segment is calculated according to the color depth in the color mark.
Further, the color mark means that the stronger the speech spectrum energy is, the darker the color is, and the weaker the speech spectrum energy is, the lighter the color is.
Preferably, in the step b, identifying the hangover segment according to the energy loss law of the reverberant speech includes:
b1. searching for more than one frequency point over a preset time interval and a preset frequency band;
b2. determining, among the more than one frequency points, the frequency point with the highest amplitude;
b3. moving along the time axis, and searching on the preset frequency band for more than one frequency point whose amplitude is lower than that of the highest-amplitude frequency point, to obtain low-amplitude frequency points;
b4. judging whether the low-amplitude frequency points conform to the energy loss law; if so, the time range corresponding to the low-amplitude frequency points is judged to be a reverberation period, and the reverberation period is the hangover segment.
Preferably, in the step b, the three-dimensional spectrogram is used as an input of a neural network, and a trailing segment of the reverberation voice in the three-dimensional spectrogram is obtained through an image detection function of the neural network.
Further, the neural network adopts a TDNN neural network or a CNN neural network.
Preferably, in the step d, a log1p function is adopted for smoothing; the calculation method is as follows:
log1p=log(x+1);
wherein x is an initial estimation value of the trailing segment.
Furthermore, to achieve the above object, the present invention further provides an apparatus including a memory, a processor and an image recognition based voice reverberation strength estimation program stored on the memory and executable on the processor, wherein the image recognition based voice reverberation strength estimation program, when executed by the processor, implements the steps of the image recognition based voice reverberation strength estimation method as described above.
Furthermore, to achieve the above object, the present invention also provides a computer readable storage medium having stored thereon an image recognition-based speech reverberation intensity estimation program, which when executed by a processor implements the steps of the image recognition-based speech reverberation intensity estimation method as described above.
The invention has the beneficial effects that:
(1) The method converts the reverberant speech into a three-dimensional spectrogram; performs image detection on the spectrogram to obtain the trailing segments of the reverberant speech; calculates the energy intensity of each trailing segment as an initial estimate of the reverberation intensity; and finally smooths the initial estimates of two or more trailing segments to obtain a final estimate, taken as the measure of the reverberation intensity of the reverberant speech, which can greatly improve the interference resistance and accuracy of reverberation intensity measurement;
(2) The energy intensity is calculated by adopting the color depth in the color mark based on image recognition, so that the method is more intuitive;
(3) The trailing segment is identified, based on image recognition, by using the amplitudes of frequency points to determine the highest-amplitude frequency point and then searching for low-amplitude frequency points relative to it, so that the trailing segment can be located quickly and accurately;
(4) The smoothing algorithm of the invention can ensure the validity of data, thereby improving the accuracy of the calculated result of the reverberation intensity.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions will be clearly and completely described below with reference to specific embodiments of the present invention, and it is obvious that the described embodiments are some embodiments of the present invention, but not all embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The spectrum of reverberant speech shows obvious trailing in the gaps between speech; when rendered on a spectrogram, these trailing segments differ clearly from spectrogram patterns produced by other causes. Moreover, the stronger the reverberation, the larger the energy of the trailing segment. These trailing segments can therefore be found through image recognition, and the energy intensity of each segment calculated as the measure of the reverberation intensity of the reverberant speech, which yields the technical scheme of the present invention.
Specifically, the method for estimating the speech reverberation intensity based on image recognition comprises the following steps:
step a, converting the reverberation voice into a three-dimensional spectrogram;
b, carrying out image detection on the three-dimensional spectrogram to obtain a tail section of the reverberation voice in the three-dimensional spectrogram;
step c, calculating the energy intensity of the trailing section, and taking the energy intensity as an initial estimation value of the reverberation intensity;
and d, smoothing the initial estimation values of more than two tail sections to obtain a final estimation value, and taking the final estimation value as the measurement of the reverberation intensity of the reverberation voice.
In the step a, the three-dimensional spectrogram is a time-frequency amplitude three-dimensional map, and the frame number (time) is taken as an x-axis, the frequency is taken as a y-axis, and the amplitude is taken as a z-axis. In the embodiment, the three-dimensional spectrogram is subjected to color marking according to the intensity of the spectrogram energy; the color mark means that the stronger the speech spectrum energy is, the darker the color is, and the weaker the speech spectrum energy is, the lighter the color is. In this embodiment, the intensity of energy in the spectrogram is represented by red, and deeper red indicates greater energy.
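A minimal sketch of this conversion, assuming a Hann window and a 512-sample frame with 50% overlap (the patent does not fix these parameters): the magnitude STFT gives exactly the frame-frequency-amplitude array described above, which a plotting library would then render with a color map.

```python
import numpy as np

def three_d_spectrogram(x, frame_len=512, hop=256):
    """Magnitude STFT: rows = frequency bins (y-axis), columns = frame
    index / time (x-axis), values = amplitude (z-axis / color depth)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    spec = np.empty((frame_len // 2 + 1, n_frames))
    for i in range(n_frames):
        frame = x[i * hop:i * hop + frame_len] * window
        spec[:, i] = np.abs(np.fft.rfft(frame))
    return spec

fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440 * t)        # 1 s, 440 Hz test tone
spec = three_d_spectrogram(x)
print(spec.shape)                      # (257, 61)
```

For the test tone, every column peaks near bin 14 (440 Hz / 31.25 Hz per bin), illustrating how stationary energy appears as a horizontal stripe in the image.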
In the step b, identifying the hangover segment according to the energy loss law of the reverberant speech specifically includes:
b1. searching for more than one frequency point over a preset time interval and a preset frequency band;
b2. determining, among the more than one frequency points, the frequency point with the highest amplitude;
b3. moving along the time axis, and searching on the preset frequency band for more than one frequency point whose amplitude is lower than that of the highest-amplitude frequency point, to obtain low-amplitude frequency points;
b4. judging whether the low-amplitude frequency points conform to the energy loss law; if so, the time range corresponding to the low-amplitude frequency points is judged to be a reverberation period, and the reverberation period is the hangover segment.
Speech reverberation is the result of multiple reflections; each reflection loses energy, which appears as smearing in the three-dimensional spectrogram. After finding the frequency point with the highest amplitude over a continuous time span, points with smaller amplitude at the same frequency are located in subsequent frames; these smaller-amplitude points constitute the trailing segment of the reverberant speech.
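Steps b1-b4 can be sketched as a rule-based search over the magnitude spectrogram. This is a hedged stand-in for the neural-network detector of the embodiment; the band, decay tolerance, and energy floor are assumptions, not values from the patent.

```python
import numpy as np

def find_hangover(spec, band, start=0, max_len=20, decay_tol=0.2, floor=1e-3):
    """Rule-based sketch of steps b1-b4: locate the highest-amplitude frame
    in a frequency band (b1/b2), then walk forward in time collecting frames
    whose amplitude keeps decaying below the peak (b3/b4)."""
    frame_amp = spec[band, :].max(axis=0)
    peak_frame = int(np.argmax(frame_amp[start:])) + start
    peak_amp = frame_amp[peak_frame]
    tail, prev = [], peak_amp
    for f in range(peak_frame + 1, min(peak_frame + 1 + max_len, len(frame_amp))):
        amp = frame_amp[f]
        if amp < peak_amp * floor:               # energy has died out
            break
        if amp < prev * (1.0 + decay_tol) and amp < peak_amp:
            tail.append(f)                       # consistent with the decay law
            prev = amp
        else:
            break
    return peak_frame, tail

# Toy spectrogram: a strong frame at t=5 followed by an exponential tail
spec = np.zeros((10, 30))
spec[4, 5] = 1.0
for k in range(1, 8):
    spec[4, 5 + k] = 0.5 ** k
peak, tail = find_hangover(spec, slice(0, 10))
print(peak, len(tail))                           # 5 7
```

The decay check of b4 is what separates a genuine hangover from an abrupt offset: frames that rise again, or that fall below the energy floor, terminate the segment.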
In this embodiment, the three-dimensional spectrogram is used as an input of a neural network, and a trailing segment of the reverberation voice in the three-dimensional spectrogram is obtained through an image detection function of the neural network. Preferably, the neural network is a TDNN neural network or a CNN neural network. In the embodiment, the three-dimensional spectrogram and the color marks thereof are used as the input of a neural network, and the trailing segment of the reverberation voice is output; and meanwhile, outputting the characteristics of the frequency, the amplitude and the like corresponding to the reverberation section.
The TDNN is a Time-Delay Neural Network that extends the output of each hidden layer in the time domain: the input received by each hidden layer is not only the output of the previous layer at the current time, but also the output of the previous layer at several times before and after it. The TDNN is multi-layered; each layer has strong abstraction capacity over the features, can express the temporal relations of the speech features, and is time-invariant. By sharing weights along the time dimension, the TDNN reduces learning complexity, is suitable for processing speech and time-series signals, and is well matched to the delay characteristic of reverberant speech.
The CNN is a Convolutional Neural Network, whose design was influenced by the time-delay neural network used in speech signal processing. A convolutional neural network is a type of artificial neural network whose weight-sharing structure resembles a biological neural network, reducing the complexity of the network model and the number of weights. This advantage is more pronounced when the network input is a multi-dimensional image: the image can be fed directly into the network, avoiding the complex feature extraction and data reconstruction of traditional recognition algorithms. Convolutional networks are multi-layer perceptrons specifically designed to recognize two-dimensional shapes, and their structure is highly invariant to translation, scaling, tilting, and other forms of deformation.
In the step c, the energy intensity of the trailing segment is calculated according to the color depth in the color mark. And, the longer the tail, the lower the energy and the smaller the amplitude. The formation of longer trailing segments may be a result of 2 to 3 reflections.
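Step c can then be sketched by reading the amplitudes of the detected trailing frames directly, assuming squared STFT magnitude as a stand-in for the color depth of the rendered spectrogram (the rendering itself is not needed for the computation):

```python
import numpy as np

def hangover_energy(spec, band, frames):
    """Mean energy (squared magnitude) over the hangover patch; serves as
    the initial reverberation-intensity estimate of step c."""
    patch = spec[band, :][:, frames]
    return float(np.mean(patch ** 2))

spec = np.zeros((10, 30))
spec[4, 6:9] = [0.5, 0.25, 0.125]      # a short trailing segment
print(round(hangover_energy(spec, slice(4, 5), [6, 7, 8]), 4))  # 0.1094
```

Longer tails with lower amplitudes naturally produce smaller patch energies, consistent with the observation above that longer trailing segments carry less energy.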
In the step d, a log1p function is adopted for smoothing; the calculation method is as follows:
log1p=log(x+1);
wherein x is an initial estimation value of the trailing segment.
The log1p function can transform data with large skewness so that it better follows a Gaussian distribution. Moreover, the log1p function preserves the numerical validity of x: when x is very small (for example, the subtraction of two nearly equal values yields x = 10^-16), it is too small for log(x) to be computed reliably, whereas log1p still yields a small but non-zero result, which improves the accuracy of the computed reverberation intensity.
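A minimal sketch of step d, assuming the final measure is the mean of the log1p-smoothed initial estimates (the patent specifies only log1p = log(x + 1), not the combination rule):

```python
import numpy as np

def final_reverberation_estimate(initial_estimates):
    """Smooth each per-segment initial estimate with log1p, then average."""
    smoothed = np.log1p(np.asarray(initial_estimates, dtype=float))
    return float(np.mean(smoothed))

# log1p keeps tiny estimates valid and non-zero, unlike log:
print(np.log1p(1e-16) > 0.0)                                     # True
print(round(final_reverberation_estimate([0.2, 0.35, 0.1]), 4))  # 0.1926
```

numpy's `log1p` is computed accurately even for arguments far below machine epsilon of the `x + 1` sum, which is exactly the validity property the description relies on.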
The reverberation intensity estimation method of the invention can be applied to inferring the speaking environment. Specifically, a speaking-environment estimation method can be provided: acquire the reverberant speech of the speaking environment, compute its reverberation intensity with the image recognition-based speech reverberation intensity estimation method above, and input the intensity into a neural network model to predict the corresponding speaking environment; the reverberation intensity and other parameter features corresponding to each speaking environment are preset in the neural network model.
Alternatively, the reverberation intensity estimation method of the invention can be applied to, but is not limited to, evaluating whether places such as recording studios and concert halls meet required acoustic standards.
In addition, the present invention also provides an apparatus, such as a mobile phone, digital camera, or tablet computer, that has a photographing function, an image recognition-based speech reverberation intensity estimation function, or an image display function together with a speech processing function. The apparatus may include components such as a memory, a processor, an input unit, a display unit, and a power supply.
The memory may be used to store software programs and modules, and the processor may execute various functional applications and data processing by operating the software programs and modules stored in the memory. The memory may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program (e.g., an image playing function, etc.) required by at least one function, and the like; the storage data area may store data created according to the use of the device, and the like. Further, the memory may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device. Accordingly, the memory may further include a memory controller to provide the processor and the input unit access to the memory.
The input unit may be used to receive input numeric or character or image information, voice information, and generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control. Specifically, the input unit of the present embodiment may include a microphone and other input devices in addition to the camera.
The display unit may be used to display information input by or provided to a user and various graphical user interfaces of the apparatus, which may be made up of graphics, text, icons, video, and any combination thereof. The Display unit may include a Display panel, and optionally, the Display panel may be configured in the form of an LCD (Liquid Crystal Display), an OLED (Organic Light-Emitting Diode), or the like.
An embodiment of the present invention further provides a computer-readable storage medium, which may be the computer-readable storage medium contained in the memory in the foregoing embodiment; or it may be a separate computer readable storage medium not incorporated into the device. The computer readable storage medium has stored therein at least one instruction that is loaded and executed by a processor to implement a method for speech reverberation strength estimation based on image recognition. The computer readable storage medium may be a read-only memory, a magnetic or optical disk, or the like.
It should be noted that, in the present specification, the embodiments are all described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments may be referred to each other. For the apparatus embodiment and the storage medium embodiment, since they are substantially similar to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to part of the description of the method embodiment.
Also, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a … …" does not exclude the presence of another identical element in a process, method, article, or apparatus that comprises the element.
While the above description shows and describes the preferred embodiments of the present invention, it is to be understood that the invention is not limited to the forms disclosed herein, but is not to be construed as excluding other embodiments and is capable of use in various other combinations, modifications, and environments and is capable of changes within the scope of the inventive concept as expressed herein, commensurate with the above teachings, or the skill or knowledge of the relevant art. And that modifications and variations may be effected by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (6)
1. A speech reverberation strength estimation method based on image recognition is characterized by comprising the following steps:
step a, converting the reverberation voice into a three-dimensional spectrogram; the three-dimensional spectrogram is subjected to color marking according to the intensity of the spectrogram energy;
b, carrying out image detection on the three-dimensional spectrogram to obtain a tail section of the reverberation voice in the three-dimensional spectrogram;
step c, calculating the energy intensity of the tailing section according to the color depth in the color mark, and taking the energy intensity as an initial estimation value of the reverberation intensity;
step d, smoothing the initial estimation values of more than two tail sections to obtain a final estimation value, and taking the final estimation value as the measurement of the reverberation intensity of the reverberation voice;
in the step b, the three-dimensional spectrogram is used as the input of a neural network, and the trailing section of the reverberation voice in the three-dimensional spectrogram is obtained through the image detection function of the neural network; and, identifying the hangover segment according to the energy loss law of the reverberant speech, specifically including:
b1. searching for more than one frequency point over a preset time interval and a preset frequency band;
b2. determining, among the more than one frequency points, the frequency point with the highest amplitude;
b3. moving along the time axis, and searching on the preset frequency band for more than one frequency point whose amplitude is lower than that of the highest-amplitude frequency point, to obtain low-amplitude frequency points;
b4. judging whether the low-amplitude frequency points conform to the energy loss law; if so, the time range corresponding to the low-amplitude frequency points is judged to be a reverberation period, and the reverberation period is the hangover segment.
2. The method of claim 1, wherein the method comprises: the color mark means that the stronger the speech spectrum energy is, the darker the color is, and the weaker the speech spectrum energy is, the lighter the color is.
3. The method of claim 1, wherein the method comprises: the neural network adopts TDNN neural network or CNN neural network.
4. The method of claim 1, wherein the method comprises: in the step d, a log1p function is adopted for smoothing; the calculation method is as follows:
log1p=log(x+1);
wherein x is an initial estimation value of the trailing segment.
5. An image recognition-based speech reverberation strength estimation device, characterized in that the device comprises a memory, a processor and an image recognition-based speech reverberation strength estimation program stored on the memory and executable on the processor, wherein the image recognition-based speech reverberation strength estimation program, when executed by the processor, implements the steps of the image recognition-based speech reverberation strength estimation method according to any of the claims 1 to 4.
6. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon an image recognition-based speech reverberation strength estimation program, which when executed by a processor implements the steps of the image recognition-based speech reverberation strength estimation method according to any one of claims 1 to 4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010426246.0A CN111785292B (en) | 2020-05-19 | 2020-05-19 | Speech reverberation intensity estimation method and device based on image recognition and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010426246.0A CN111785292B (en) | 2020-05-19 | 2020-05-19 | Speech reverberation intensity estimation method and device based on image recognition and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111785292A CN111785292A (en) | 2020-10-16 |
CN111785292B true CN111785292B (en) | 2023-03-31 |
Family
ID=72754277
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010426246.0A Active CN111785292B (en) | 2020-05-19 | 2020-05-19 | Speech reverberation intensity estimation method and device based on image recognition and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111785292B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114822589B (en) * | 2022-04-02 | 2023-07-04 | 中科猷声(苏州)科技有限公司 | Indoor acoustic parameter determination method, model construction method, device and electronic equipment |
Family Cites Families (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8036767B2 (en) * | 2006-09-20 | 2011-10-11 | Harman International Industries, Incorporated | System for extracting and changing the reverberant content of an audio input signal |
JP5895529B2 (en) * | 2011-12-28 | 2016-03-30 | ヤマハ株式会社 | Reverberation analysis apparatus and reverberation analysis method |
EP2733700A1 (en) * | 2012-11-16 | 2014-05-21 | Nederlandse Organisatie voor toegepast -natuurwetenschappelijk onderzoek TNO | Method of and apparatus for evaluating intelligibility of a degraded speech signal |
JP6261043B2 (en) * | 2013-08-30 | 2018-01-17 | 本田技研工業株式会社 | Audio processing apparatus, audio processing method, and audio processing program |
CN103440869B (en) * | 2013-09-03 | 2017-01-18 | 大连理工大学 | Audio-reverberation inhibiting device and inhibiting method thereof |
CN104658543A (en) * | 2013-11-20 | 2015-05-27 | 大连佑嘉软件科技有限公司 | Method for eliminating indoor reverberation |
US10382880B2 (en) * | 2014-01-03 | 2019-08-13 | Dolby Laboratories Licensing Corporation | Methods and systems for designing and applying numerically optimized binaural room impulse responses |
US9558757B1 (en) * | 2015-02-20 | 2017-01-31 | Amazon Technologies, Inc. | Selective de-reverberation using blind estimation of reverberation level |
CN107680603B (en) * | 2016-08-02 | 2021-08-31 | 电信科学技术研究院 | Reverberation time estimation method and device |
CN109637553A (en) * | 2019-01-08 | 2019-04-16 | 电信科学技术研究院有限公司 | A kind of method and device of speech dereverbcration |
CN110827821B (en) * | 2019-12-04 | 2022-04-12 | 三星电子(中国)研发中心 | Voice interaction device and method and computer readable storage medium |
- 2020-05-19: CN application CN202010426246.0A filed; granted as patent CN111785292B (status: Active)
Also Published As
Publication number | Publication date |
---|---|
CN111785292A (en) | 2020-10-16 |
Similar Documents
Publication | Title |
---|---|
CN111161752B (en) | Echo cancellation method and device | |
CN110491403B (en) | Audio signal processing method, device, medium and audio interaction equipment | |
CN109087669B (en) | Audio similarity detection method and device, storage medium and computer equipment | |
CN107481718B (en) | Audio recognition method, device, storage medium and electronic equipment | |
CN111179961B (en) | Audio signal processing method and device, electronic equipment and storage medium | |
CN110808063A (en) | Voice processing method and device for processing voice | |
CN111312273A (en) | Reverberation elimination method, apparatus, computer device and storage medium | |
CN105308679A (en) | Method and system for identifying location associated with voice command to control home appliance | |
US10297251B2 (en) | Vehicle having dynamic acoustic model switching to improve noisy speech recognition | |
Wang et al. | Recurrent deep stacking networks for supervised speech separation | |
CN110047470A (en) | A kind of sound end detecting method | |
JP2009271359A (en) | Processing unit, speech recognition apparatus, speech recognition system, speech recognition method, and speech recognition program | |
CN108899047A (en) | The masking threshold estimation method, apparatus and storage medium of audio signal | |
CN108206027A (en) | A kind of audio quality evaluation method and system | |
CN112309417B (en) | Method, device, system and readable medium for processing audio signal with wind noise suppression | |
CN111785292B (en) | Speech reverberation intensity estimation method and device based on image recognition and storage medium | |
KR20210137146A (en) | Speech augmentation using clustering of queues | |
JPWO2011122521A1 (en) | Information display system, information display method and program | |
JP6265903B2 (en) | Signal noise attenuation | |
Chan et al. | Speech enhancement strategy for speech recognition microcontroller under noisy environments | |
CN113744730A (en) | Sound detection method and device | |
CN111755029B (en) | Voice processing method, device, storage medium and electronic equipment | |
JP2009276365A (en) | Processor, voice recognition device, voice recognition system and voice recognition method | |
CN112967731B (en) | Method, device and computer readable medium for eliminating voice echo | |
US11769486B2 (en) | System and method for data augmentation and speech processing in dynamic acoustic environments |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |