CN111785292B - Speech reverberation intensity estimation method and device based on image recognition and storage medium - Google Patents

Speech reverberation intensity estimation method and device based on image recognition and storage medium

Info

Publication number
CN111785292B
CN111785292B (application CN202010426246.0A)
Authority
CN
China
Prior art keywords
reverberation
intensity
image recognition
voice
energy
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010426246.0A
Other languages
Chinese (zh)
Other versions
CN111785292A (en)
Inventor
张广学
肖龙源
叶志坚
李稀敏
刘晓葳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen Kuaishangtong Technology Co Ltd
Original Assignee
Xiamen Kuaishangtong Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen Kuaishangtong Technology Co Ltd filed Critical Xiamen Kuaishangtong Technology Co Ltd
Priority to CN202010426246.0A priority Critical patent/CN111785292B/en
Publication of CN111785292A publication Critical patent/CN111785292A/en
Application granted granted Critical
Publication of CN111785292B publication Critical patent/CN111785292B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L 21/06 Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
    • G10L 21/10 Transforming into visible information
    • G10L 21/14 Transforming into visible information by displaying frequency domain information
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/56 Extraction of image or video features relating to colour
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L 25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L 25/60 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for measuring the quality of voice signals
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2218/00 Aspects of pattern recognition specially adapted for signal processing
    • G06F 2218/08 Feature extraction
    • G06F 2218/10 Feature extraction by analysing the shape of a waveform, e.g. extracting parameters relating to peaks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Quality & Reliability (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a speech reverberation intensity estimation method, device and storage medium based on image recognition, which convert reverberant speech into a three-dimensional spectrogram; perform image detection on the three-dimensional spectrogram to obtain the tail segments of the reverberant speech in the spectrogram; calculate the energy intensity of each tail segment and take it as an initial estimation value of the reverberation intensity; and finally smooth the initial estimation values of two or more tail segments to obtain a final estimation value, which is taken as the measure of the reverberation intensity of the reverberant speech, thereby greatly improving the anti-interference performance and accuracy of reverberation intensity measurement.

Description

Speech reverberation intensity estimation method and device based on image recognition and storage medium
Technical Field
The invention relates to the technical field of voice processing, in particular to a voice reverberation intensity estimation method based on image recognition, a voice reverberation intensity estimation device based on image recognition and a computer readable storage medium.
Background
The reverberation effect is an important phenomenon of indoor acoustics, and is generated by multiple reflections of sound in a closed space. In applications such as hands-free telephones, video teleconferencing systems, hearing aids, man-machine dialog systems, reverberation effects are an important factor affecting the intelligibility of speech signals; meanwhile, it is also an important factor affecting the binaural effect in applications such as stereo theaters, stereo car sound systems, and the like.
However, in practical life, there are very few ways to measure the reverberation strength, and the commonly used reverberation strength estimation methods mainly include:
(1) Estimating the reverberation strength according to the reverberation time:
Reverberation time (denoted RT60) is defined as the time it takes for the residual acoustic energy in a particular room space, from the moment the acoustic excitation ceases, to decay through multiple reflections to 60 dB below the level at the initial observation. Reverberation time is an important index for measuring the reverberation characteristics of a specific room space, and is closely related to the estimation of late-reverberation power in dereverberation algorithms.
However, blind estimation of the reverberation time remains an open research problem; in particular, with only a single channel, it is difficult to obtain the reverberation time accurately in an arbitrary environment.
(2) Estimating the reverberation strength from the SRMR values:
the speech-to-reverberation modulation energy ratio (SRMR) estimates the reverberation strength by computing the ratio of speech-dominated to reverberation-dominated modulation energy. However, SRMR is text-dependent and is affected by the vowels in the speech, so it may return a high reverberation strength even when no reverberation is present.
Disclosure of Invention
The invention mainly aims to provide a method, a device and a storage medium for estimating the reverberation intensity of voice based on image recognition, and aims to solve the technical problem that the reverberation intensity is difficult to measure accurately.
In order to achieve the above object, the present invention provides a method for estimating the reverberation strength of speech based on image recognition, which comprises the following steps:
step a, converting the reverberation voice into a three-dimensional spectrogram;
b, detecting the image of the three-dimensional spectrogram to obtain a tail section of the reverberation voice in the three-dimensional spectrogram;
step c, calculating the energy intensity of the trailing section, and taking the energy intensity as an initial estimation value of the reverberation intensity;
and d, smoothing the initial estimation values of more than two tail sections to obtain a final estimation value, and taking the final estimation value as the measurement of the reverberation intensity of the reverberation voice.
Preferably, in the step a, the three-dimensional spectrogram is color-labeled according to the intensity of the spectrogram energy; in the step c, the energy intensity of the trailing segment is calculated according to the color depth in the color mark.
Further, the color mark means that the stronger the speech spectrum energy is, the darker the color is, and the weaker the speech spectrum energy is, the lighter the color is.
Preferably, in the step b, identifying the tail segment according to the energy-loss law of the reverberant speech includes:
b1. searching for more than one frequency point within a preset time interval and a preset frequency band;
b2. determining, among the more than one frequency points, the frequency point with the highest amplitude;
b3. moving along the time axis, and searching the preset frequency band for frequency points whose amplitude is lower than that of the highest-amplitude frequency point, to obtain low-amplitude frequency points;
b4. judging whether the low-amplitude frequency points conform to the energy-loss law; if so, the time range corresponding to the low-amplitude frequency points is judged to be a reverberation period, and the reverberation period is the tail segment.
Preferably, in the step b, the three-dimensional spectrogram is used as an input of a neural network, and a trailing segment of the reverberation voice in the three-dimensional spectrogram is obtained through an image detection function of the neural network.
Further, the neural network adopts a TDNN neural network or a CNN neural network.
Preferably, in the step d, a log1p function is adopted for smoothing; the calculation method is as follows:
log1p(x) = log(x + 1);
wherein x is an initial estimation value of the trailing segment.
Furthermore, to achieve the above object, the present invention further provides an apparatus including a memory, a processor and an image recognition based voice reverberation strength estimation program stored on the memory and executable on the processor, wherein the image recognition based voice reverberation strength estimation program, when executed by the processor, implements the steps of the image recognition based voice reverberation strength estimation method as described above.
Furthermore, to achieve the above object, the present invention also provides a computer readable storage medium having stored thereon an image recognition-based speech reverberation intensity estimation program, which when executed by a processor implements the steps of the image recognition-based speech reverberation intensity estimation method as described above.
The invention has the beneficial effects that:
(1) The method converts the reverberant speech into a three-dimensional spectrogram; performs image detection on the three-dimensional spectrogram to obtain the tail segments of the reverberant speech in the spectrogram; calculates the energy intensity of each tail segment as an initial estimation value of the reverberation intensity; and finally smooths the initial estimation values of two or more tail segments to obtain a final estimation value, which is used as the measure of the reverberation intensity of the reverberant speech, so that the anti-interference performance and the accuracy of the reverberation intensity measurement can be greatly improved;
(2) The energy intensity is calculated by adopting the color depth in the color mark based on image recognition, so that the method is more intuitive;
(3) The tail segment is identified, on the basis of image recognition, by using the amplitudes of the frequency points to determine the highest-amplitude frequency point, and then searching for low-amplitude frequency points relative to it, so the tail segment can be located quickly and accurately;
(4) The smoothing algorithm of the invention can ensure the validity of data, thereby improving the accuracy of the calculated result of the reverberation intensity.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions will be clearly and completely described below with reference to specific embodiments of the present invention, and it is obvious that the described embodiments are some embodiments of the present invention, but not all embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The spectrum of reverberant speech shows obvious tails in the gaps between utterances, and when represented on a spectrogram, these tail segments differ markedly from spectrogram content produced by other causes. Moreover, the stronger the reverberation, the greater the energy of the tail segment. These tail segments can therefore be found through image recognition, and their energy intensity can be calculated as the measure of the reverberation intensity of the reverberant speech; this yields the technical scheme of the present invention.
Specifically, the method for estimating the speech reverberation intensity based on image recognition comprises the following steps:
step a, converting the reverberation voice into a three-dimensional spectrogram;
b, carrying out image detection on the three-dimensional spectrogram to obtain a tail section of the reverberation voice in the three-dimensional spectrogram;
step c, calculating the energy intensity of the trailing section, and taking the energy intensity as an initial estimation value of the reverberation intensity;
and d, smoothing the initial estimation values of more than two tail sections to obtain a final estimation value, and taking the final estimation value as the measurement of the reverberation intensity of the reverberation voice.
In the step a, the three-dimensional spectrogram is a time-frequency amplitude three-dimensional map, and the frame number (time) is taken as an x-axis, the frequency is taken as a y-axis, and the amplitude is taken as a z-axis. In the embodiment, the three-dimensional spectrogram is subjected to color marking according to the intensity of the spectrogram energy; the color mark means that the stronger the speech spectrum energy is, the darker the color is, and the weaker the speech spectrum energy is, the lighter the color is. In this embodiment, the intensity of energy in the spectrogram is represented by red, and deeper red indicates greater energy.
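For illustration only (this sketch is not part of the patent; the function name, frame sizes, and the dB floor are assumptions), the conversion of step a can be realised with `scipy.signal.spectrogram`, producing the frame-time axis, frequency axis, and log-magnitude z-axis that the color marking encodes:

```python
import numpy as np
from scipy import signal

def speech_to_spectrogram(x, fs=16000, frame_ms=25, hop_ms=10):
    """Convert a speech waveform into a time-frequency-amplitude map.

    Returns the frequency axis (y), the frame-time axis (x), and the
    log magnitude in dB (z) -- the three axes described above.
    """
    nperseg = int(fs * frame_ms / 1000)
    noverlap = nperseg - int(fs * hop_ms / 1000)
    f, t, sxx = signal.spectrogram(x, fs=fs, nperseg=nperseg,
                                   noverlap=noverlap, mode="magnitude")
    sxx_db = 20 * np.log10(sxx + 1e-12)  # small floor avoids log(0)
    return f, t, sxx_db
```

The color marking of the embodiment then amounts to rendering `sxx_db` with a colormap in which stronger energy maps to a deeper color.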
In the step b, identifying the tail segment according to the energy-loss law of the reverberant speech specifically includes:
b1. searching for more than one frequency point within a preset time interval and a preset frequency band;
b2. determining, among the more than one frequency points, the frequency point with the highest amplitude;
b3. moving along the time axis, and searching the preset frequency band for frequency points whose amplitude is lower than that of the highest-amplitude frequency point, to obtain low-amplitude frequency points;
b4. judging whether the low-amplitude frequency points conform to the energy-loss law; if so, the time range corresponding to the low-amplitude frequency points is judged to be a reverberation period, and the reverberation period is the tail segment.
Speech reverberation is the result of multiple reflections; the reflections cause energy losses and therefore produce a visible smear in the three-dimensional spectrogram. After the highest-amplitude point over a continuous time span has been found, points of smaller amplitude at the same frequency are located, and these smaller-amplitude points constitute the tail segment of the reverberant speech.
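A minimal sketch of the search in steps b1 to b4 (illustrative only; the thresholds `decay_db` and `min_frames` and all names are assumptions, and the embodiment itself uses a neural detector):

```python
import numpy as np

def find_tail_segments(sxx_db, t, decay_db=10.0, min_frames=3):
    """Heuristic tail search following steps b1-b4.

    For each preset frequency bin, the highest-amplitude frame is
    located (b1, b2); moving along the time axis, frames whose level
    has dropped below that peak are collected (b3); a sufficiently
    long, monotonically decaying run is accepted as a reverberation
    (tail) period (b4).
    """
    tails = []
    for k in range(sxx_db.shape[0]):
        band = sxx_db[k]
        peak = int(np.argmax(band))                 # b2: highest amplitude
        end = peak
        while end + 1 < len(band) and band[end + 1] < band[end]:
            end += 1                                # b3/b4: energy-loss law
        if end - peak >= min_frames and band[peak] - band[end] >= decay_db:
            tails.append((k, t[peak], t[end]))      # (bin, start, end time)
    return tails
```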
In this embodiment, the three-dimensional spectrogram is used as an input of a neural network, and a trailing segment of the reverberation voice in the three-dimensional spectrogram is obtained through an image detection function of the neural network. Preferably, the neural network is a TDNN neural network or a CNN neural network. In the embodiment, the three-dimensional spectrogram and the color marks thereof are used as the input of a neural network, and the trailing segment of the reverberation voice is output; and meanwhile, outputting the characteristics of the frequency, the amplitude and the like corresponding to the reverberation section.
The TDNN is a Time-Delay Neural Network, which extends the input of each hidden layer along the time dimension: each hidden layer receives not only the output of the previous layer at the current time, but also the outputs of the previous layer at several times before and after the current time. The TDNN is multi-layered; each layer has a strong ability to abstract features, can express the temporal relations of speech features, and is time-invariant. By sharing weights along the time dimension, the TDNN reduces learning complexity; it is well suited to processing speech and time-series signals, and matches the delay characteristic of reverberant speech.
The CNN is a Convolutional Neural Network, invented under the influence of the time-delay neural network used in speech signal processing. As one kind of artificial neural network, its weight-sharing structure is closer to a biological neural network, which reduces the complexity of the network model and the number of weights. This advantage is more obvious when the input of the network is a multi-dimensional image: the image can be used directly as the network input, avoiding the complex feature extraction and data reconstruction of traditional recognition algorithms. A convolutional network is a multi-layer perceptron specially designed to recognize two-dimensional shapes, and its structure is highly invariant to translation, scaling, tilting, and other forms of deformation.
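As a minimal numpy illustration of the weight sharing over time described above (a sketch under assumed shapes, not the network of the embodiment), one time-delay layer splices the previous layer's output at several time offsets and applies a weight matrix shared across all frames:

```python
import numpy as np

def tdnn_layer(x, w, context=(-1, 0, 1)):
    """One time-delay layer.

    Every output frame sees the previous layer's output at the time
    offsets in `context`, and the weight matrix `w` is shared across
    all frames (weight sharing over the time dimension).

    x: (frames, in_dim); w: (len(context) * in_dim, out_dim).
    """
    frames, _ = x.shape
    lo, hi = -min(context), max(context)
    out = []
    for t in range(lo, frames - hi):
        ctx = np.concatenate([x[t + c] for c in context])  # spliced context
        out.append(np.maximum(ctx @ w, 0.0))               # ReLU activation
    return np.stack(out)  # (frames - lo - hi, out_dim)
```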
In the step c, the energy intensity of the trailing segment is calculated according to the color depth in the color mark. And, the longer the tail, the lower the energy and the smaller the amplitude. The formation of longer trailing segments may be a result of 2 to 3 reflections.
In the step d, a log1p function is adopted for smoothing; the calculation method is as follows:
log1p(x) = log(x + 1);
wherein x is an initial estimation value of the trailing segment.
The log1p function can be used to transform data with large skewness so that it conforms more closely to a Gaussian distribution. Moreover, the log1p function preserves the validity of the data in x: when x is very small (for example, the subtraction of two nearby values gives x = 10^-16), x is not lost beyond numerical validity, and log1p yields a small non-zero result instead of 0, which improves the accuracy of the computed reverberation intensity.
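The numerical point above can be checked directly; a small demonstration (illustrative only, using numpy):

```python
import numpy as np

x = 1e-16                  # e.g. the difference of two nearby values
naive = np.log(x + 1.0)    # 1.0 + 1e-16 rounds to 1.0, so this is exactly 0
stable = np.log1p(x)       # evaluates log(1 + x) accurately: about 1e-16
print(naive, stable)
```

This is why the smoothing keeps a small non-zero result where the naive form would collapse to 0.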
The method for estimating the reverberation intensity of the invention can be applied to the conjecture of the speaking environment. Specifically, a method for estimating a speech environment can be provided: calculating the reverberation intensity of the reverberation voice of the speaking environment by acquiring the reverberation voice of the speaking environment and adopting the voice reverberation intensity estimation method based on the image recognition, and inputting the reverberation intensity into a neural network model to predict the corresponding speaking environment; the reverberation intensity and other parameter characteristics corresponding to each speaking environment are preset in the neural network model.
Alternatively, the reverberation intensity estimation method of the present invention can be applied, without limitation, as one of the evaluation criteria for judging whether venues such as recording studios and concert halls meet their required acoustic indexes.
In addition, the present invention also provides an apparatus, such as a mobile phone, digital camera, or tablet computer, which has a photographing function, or an image-recognition-based speech reverberation intensity estimation function, or an image display function and a speech processing function. The apparatus may include components such as a memory, a processor, an input unit, a display unit, and a power supply.
The memory may be used to store software programs and modules, and the processor executes various functional applications and data processing by running the software programs and modules stored in the memory. The memory may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (e.g., an image playing function), and the like; the data storage area may store data created according to the use of the device. Further, the memory may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. Accordingly, the memory may further include a memory controller to provide the processor and the input unit with access to the memory.
The input unit may be used to receive input numeric or character or image information, voice information, and generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control. Specifically, the input unit of the present embodiment may include a microphone and other input devices in addition to the camera.
The display unit may be used to display information input by or provided to a user and various graphical user interfaces of the apparatus, which may be made up of graphics, text, icons, video, and any combination thereof. The Display unit may include a Display panel, and optionally, the Display panel may be configured in the form of an LCD (Liquid Crystal Display), an OLED (Organic Light-Emitting Diode), or the like.
An embodiment of the present invention further provides a computer-readable storage medium, which may be the computer-readable storage medium contained in the memory in the foregoing embodiment; or it may be a separate computer readable storage medium not incorporated into the device. The computer readable storage medium has stored therein at least one instruction that is loaded and executed by a processor to implement a method for speech reverberation strength estimation based on image recognition. The computer readable storage medium may be a read-only memory, a magnetic or optical disk, or the like.
It should be noted that, in the present specification, the embodiments are all described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments may be referred to each other. For the apparatus embodiment and the storage medium embodiment, since they are substantially similar to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to part of the description of the method embodiment.
Also, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a … …" does not exclude the presence of another identical element in a process, method, article, or apparatus that comprises the element.
While the above description shows and describes the preferred embodiments of the present invention, it is to be understood that the invention is not limited to the forms disclosed herein, but is not to be construed as excluding other embodiments and is capable of use in various other combinations, modifications, and environments and is capable of changes within the scope of the inventive concept as expressed herein, commensurate with the above teachings, or the skill or knowledge of the relevant art. And that modifications and variations may be effected by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (6)

1. A speech reverberation strength estimation method based on image recognition is characterized by comprising the following steps:
step a, converting the reverberation voice into a three-dimensional spectrogram; the three-dimensional spectrogram is subjected to color marking according to the intensity of the spectrogram energy;
b, carrying out image detection on the three-dimensional spectrogram to obtain a tail section of the reverberation voice in the three-dimensional spectrogram;
step c, calculating the energy intensity of the tail section according to the color depth in the color mark, and taking the energy intensity as an initial estimation value of the reverberation intensity;
step d, smoothing the initial estimation values of more than two tail sections to obtain a final estimation value, and taking the final estimation value as the measurement of the reverberation intensity of the reverberation voice;
in the step b, the three-dimensional spectrogram is used as the input of a neural network, and the trailing section of the reverberation voice in the three-dimensional spectrogram is obtained through the image detection function of the neural network; and, identifying the hangover segment according to the energy loss law of the reverberant speech, specifically including:
b1. searching for more than one frequency point within a preset time interval and a preset frequency band;
b2. determining, among the more than one frequency points, the frequency point with the highest amplitude;
b3. moving along the time axis, and searching the preset frequency band for frequency points whose amplitude is lower than that of the highest-amplitude frequency point, to obtain low-amplitude frequency points;
b4. judging whether the low-amplitude frequency points conform to the energy-loss law; if so, the time range corresponding to the low-amplitude frequency points is judged to be a reverberation period, and the reverberation period is the tail segment.
2. The method of claim 1, wherein the method comprises: the color mark means that the color is darker when the speech spectrum energy is stronger, and the color is lighter when the speech spectrum energy is weaker.
3. The method of claim 1, wherein the method comprises: the neural network adopts TDNN neural network or CNN neural network.
4. The method of claim 1, wherein the method comprises: in the step d, a log1p function is adopted for smoothing; the calculation method is as follows:
log1p(x) = log(x + 1);
wherein x is an initial estimation value of the trailing segment.
5. An image recognition-based speech reverberation strength estimation device, characterized in that the device comprises a memory, a processor and an image recognition-based speech reverberation strength estimation program stored on the memory and executable on the processor, wherein the image recognition-based speech reverberation strength estimation program, when executed by the processor, implements the steps of the image recognition-based speech reverberation strength estimation method according to any of the claims 1 to 4.
6. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon an image recognition-based speech reverberation strength estimation program, which when executed by a processor implements the steps of the image recognition-based speech reverberation strength estimation method according to any one of claims 1 to 4.
CN202010426246.0A 2020-05-19 2020-05-19 Speech reverberation intensity estimation method and device based on image recognition and storage medium Active CN111785292B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010426246.0A CN111785292B (en) 2020-05-19 2020-05-19 Speech reverberation intensity estimation method and device based on image recognition and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010426246.0A CN111785292B (en) 2020-05-19 2020-05-19 Speech reverberation intensity estimation method and device based on image recognition and storage medium

Publications (2)

Publication Number Publication Date
CN111785292A CN111785292A (en) 2020-10-16
CN111785292B true CN111785292B (en) 2023-03-31

Family

ID=72754277

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010426246.0A Active CN111785292B (en) 2020-05-19 2020-05-19 Speech reverberation intensity estimation method and device based on image recognition and storage medium

Country Status (1)

Country Link
CN (1) CN111785292B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114822589B (en) * 2022-04-02 2023-07-04 中科猷声(苏州)科技有限公司 Indoor acoustic parameter determination method, model construction method, device and electronic equipment

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8036767B2 (en) * 2006-09-20 2011-10-11 Harman International Industries, Incorporated System for extracting and changing the reverberant content of an audio input signal
JP5895529B2 (en) * 2011-12-28 2016-03-30 ヤマハ株式会社 Reverberation analysis apparatus and reverberation analysis method
EP2733700A1 (en) * 2012-11-16 2014-05-21 Nederlandse Organisatie voor toegepast -natuurwetenschappelijk onderzoek TNO Method of and apparatus for evaluating intelligibility of a degraded speech signal
JP6261043B2 (en) * 2013-08-30 2018-01-17 本田技研工業株式会社 Audio processing apparatus, audio processing method, and audio processing program
CN103440869B (en) * 2013-09-03 2017-01-18 大连理工大学 Audio-reverberation inhibiting device and inhibiting method thereof
CN104658543A (en) * 2013-11-20 2015-05-27 大连佑嘉软件科技有限公司 Method for eliminating indoor reverberation
US10382880B2 (en) * 2014-01-03 2019-08-13 Dolby Laboratories Licensing Corporation Methods and systems for designing and applying numerically optimized binaural room impulse responses
US9558757B1 (en) * 2015-02-20 2017-01-31 Amazon Technologies, Inc. Selective de-reverberation using blind estimation of reverberation level
CN107680603B (en) * 2016-08-02 2021-08-31 电信科学技术研究院 Reverberation time estimation method and device
CN109637553A (en) * 2019-01-08 2019-04-16 电信科学技术研究院有限公司 A kind of method and device of speech dereverbcration
CN110827821B (en) * 2019-12-04 2022-04-12 三星电子(中国)研发中心 Voice interaction device and method and computer readable storage medium

Also Published As

Publication number Publication date
CN111785292A (en) 2020-10-16

Similar Documents

Publication Publication Date Title
CN111161752B (en) Echo cancellation method and device
CN110491403B (en) Audio signal processing method, device, medium and audio interaction equipment
CN109087669B (en) Audio similarity detection method and device, storage medium and computer equipment
CN107481718B (en) Audio recognition method, device, storage medium and electronic equipment
CN111179961B (en) Audio signal processing method and device, electronic equipment and storage medium
CN110808063A (en) Voice processing method and device for processing voice
CN111312273A (en) Reverberation elimination method, apparatus, computer device and storage medium
CN105308679A (en) Method and system for identifying location associated with voice command to control home appliance
US10297251B2 (en) Vehicle having dynamic acoustic model switching to improve noisy speech recognition
Wang et al. Recurrent deep stacking networks for supervised speech separation
CN110047470A (en) A kind of sound end detecting method
JP2009271359A (en) Processing unit, speech recognition apparatus, speech recognition system, speech recognition method, and speech recognition program
CN108899047A (en) The masking threshold estimation method, apparatus and storage medium of audio signal
CN108206027A (en) A kind of audio quality evaluation method and system
CN112309417B (en) Method, device, system and readable medium for processing audio signal with wind noise suppression
CN111785292B (en) Speech reverberation intensity estimation method and device based on image recognition and storage medium
KR20210137146A (en) Speech augmentation using clustering of queues
JPWO2011122521A1 (en) Information display system, information display method and program
JP6265903B2 (en) Signal noise attenuation
Chan et al. Speech enhancement strategy for speech recognition microcontroller under noisy environments
CN113744730A (en) Sound detection method and device
CN111755029B (en) Voice processing method, device, storage medium and electronic equipment
JP2009276365A (en) Processor, voice recognition device, voice recognition system and voice recognition method
CN112967731B (en) Method, device and computer readable medium for eliminating voice echo
US11769486B2 (en) System and method for data augmentation and speech processing in dynamic acoustic environments

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant