CN113470628A - Voice recognition method and device - Google Patents

Voice recognition method and device

Info

Publication number
CN113470628A
CN113470628A (application number CN202110792834.0A)
Authority
CN
China
Prior art keywords
data
rir
present application
time period
computer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110792834.0A
Other languages
Chinese (zh)
Other versions
CN113470628B (en
Inventor
李程帅
周全
孙进伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingdao Xinxin Microelectronics Technology Co Ltd
Original Assignee
Qingdao Xinxin Microelectronics Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qingdao Xinxin Microelectronics Technology Co Ltd filed Critical Qingdao Xinxin Microelectronics Technology Co Ltd
Priority to CN202110792834.0A priority Critical patent/CN113470628B/en
Priority claimed from CN202110792834.0A external-priority patent/CN113470628B/en
Publication of CN113470628A publication Critical patent/CN113470628A/en
Application granted granted Critical
Publication of CN113470628B publication Critical patent/CN113470628B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/20 Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The application discloses a voice recognition method and device for enhancing the robustness of a voice recognition model. The voice recognition method provided by the application comprises the following steps: masking predetermined room impulse response (RIR) data; convolving the masked RIR data with original voice data to obtain new voice data; and training a voice recognition model with the new voice data.

Description

Voice recognition method and device
Technical Field
The present application relates to the field of information technology, and in particular, to a method and an apparatus for speech recognition.
Background
Existing voice recognition technology mainly relies on deep-learning-based algorithms. To obtain a voice recognition model with a high recognition rate, a large amount of voice data matched to real scenes is needed, and room reverberation and the distance and angle between the speaker and the microphone are among the important factors affecting model performance. However, the reverberation of occluded or irregularly shaped rooms is difficult to simulate algorithmically; for example, the recognition rate drops noticeably when the speaker is in a dining room while the microphone is in a living room, when the speaker faces away from the microphone, or when there is an obstruction between the speaker and the microphone. Moreover, such reverberation data is difficult to collect in quantity, so massive data coverage of these situations cannot be achieved.
Disclosure of Invention
The embodiment of the application provides a voice recognition method and a voice recognition device, which are used for enhancing the robustness of a voice recognition model.
The voice recognition method provided by the embodiment of the application comprises the following steps:
masking predetermined room impulse response, RIR, data;
convolving the masked RIR data with original voice data to obtain new voice data;
and training a voice recognition model by using the new voice data.
By this method, predetermined room impulse response (RIR) data is masked; the masked RIR data is convolved with original voice data to obtain new voice data; and the new voice data is used to train the voice recognition model. This enhances the robustness of the voice recognition model and improves its recognition rate under conditions such as room occlusion and varied speaker angles. The method is simple, efficient, and widely applicable.
Optionally, masking the predetermined room impulse response RIR data specifically includes:
determining a time period for masking the RIR data;
and replacing the RIR data of the time period with a preset value.
Optionally, the preset value is zero, or is an average value of a part of the RIR data in the RIR data, or is a random number.
Optionally, the RIR data for the time period comprises one or more periods of RIR data in the RIR data.
Optionally, the starting position of the time period comprises a random value within a preset range from 0.
Optionally, the duration of the time period comprises a random value within a preset range from 0.
Optionally, the RIR data is generated by simulation in advance, or acquired in a real scene.
An embodiment of the present application provides a speech recognition apparatus, including:
a first unit for masking predetermined room impulse response RIR data;
the second unit is used for convolving the masked RIR data with the original voice data to obtain new voice data;
a third unit for training a speech recognition model using the new speech data.
Another embodiment of the present application provides a computing device, which includes a memory and a processor, wherein the memory is used for storing program instructions, and the processor is used for calling the program instructions stored in the memory and executing any one of the above methods according to the obtained program.
Another embodiment of the present application provides a computer storage medium having stored thereon computer-executable instructions for causing a computer to perform any one of the methods described above.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application; those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic representation of RIR data provided in an embodiment of the present application;
FIG. 2 is a schematic diagram of masked RIR data provided in an embodiment of the present application;
fig. 3 is a schematic flowchart of a speech recognition method according to an embodiment of the present application;
fig. 4 is a schematic flowchart of a speech recognition method according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of a speech recognition apparatus according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of a speech recognition apparatus according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described below clearly and completely with reference to the drawings. The described embodiments are only a part of the embodiments of the present application, not all of them. All other embodiments derived by a person skilled in the art from these embodiments without creative effort fall within the protection scope of the present application.
The embodiment of the application provides a voice recognition method and a voice recognition device, which are used for enhancing the robustness of a voice recognition model, so that the voice recognition rate of the voice recognition model under the conditions of room shielding, multi-angle and the like is improved, and the method is simple, efficient and high in applicability.
The method and the device are based on the same application concept, and because the principles of solving the problems of the method and the device are similar, the implementation of the device and the method can be mutually referred, and repeated parts are not repeated.
To improve the generalization ability of the voice recognition model, the embodiment of the application randomly masks room impulse response (RIR) data, increasing sample diversity so that the model can recognize voice accurately without depending on specific direct or reflected sound, and improving its robustness to occlusion, irregularly shaped rooms (multiple angles), and similar situations. Room reverberation consists of direct sound and reflected sound: direct sound is the signal that propagates from the sound source to the microphone without reflection; reflected sound is the remainder of the signal, which reaches the microphone after being reflected and absorbed by obstacles.
The technical scheme provided by the embodiment of the application mainly comprises the following three aspects:
1. Generate RIR data by simulation, or directly use RIR data collected in a real scene. RIR data can be collected in various ways, for example by playing a known excitation signal in the room and recording the response it elicits.
2. Generate a set of random numbers: one random number for the starting position of the mask and one for its duration, so that a random portion of the direct sound or room reflections is masked. That is, this step determines the time period over which the RIR data is masked; the starting position of the period is a random value within a preset range starting from 0, and its duration is likewise a random value within a preset range starting from 0.
3. Replace the RIR samples within that time period with zero, then convolve the resulting new RIR data with the original voice data to obtain new voice data. The convolution multiplies the convolution kernel with the corresponding elements of the audio signal and sums the products. For example, the one-dimensional discrete convolution is given below, where f(n) is the RIR, g(n) is the original audio signal, N is the length of f(n), and s(n) is the result.
s(n) = \sum_{m=0}^{N-1} f(m)\, g(n-m)
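As an illustrative sketch (not the patent's implementation), the discrete convolution described above can be computed directly with NumPy; `add_reverb` is my own naming:

```python
import numpy as np

def add_reverb(rir: np.ndarray, speech: np.ndarray) -> np.ndarray:
    """Convolve an RIR f(n) with a speech signal g(n): s(n) = sum_m f(m) g(n-m)."""
    return np.convolve(speech, rir)

# Tiny illustration: a 3-tap "RIR" (direct sound plus two decaying reflections).
rir = np.array([1.0, 0.5, 0.25])
speech = np.array([1.0, 0.0, 0.0, 2.0])
out = add_reverb(rir, speech)  # length = len(rir) + len(speech) - 1 = 6
```

With these toy inputs the result is `[1.0, 0.5, 0.25, 2.0, 1.0, 0.5]`: each speech sample is echoed by the RIR taps.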
In the embodiment of the application, RIR data obtained by simulation (for example, with the image method) is randomly masked in the data augmentation stage, enhancing the generalization of the speech recognition model and, in particular, its robustness in occluded or irregularly shaped rooms and when the speaker faces away from the microphone. Data augmentation here means expanding the original data by methods such as adding noise or adding reverberation.
The technical scheme provided by the embodiment of the application comprises the following steps:
Step one: generate the RIR data by simulation, or use RIR data collected in a real scene. The RIR data can be generated with existing simulation tools, which allow room impulse responses to be produced efficiently.
The RIR data depends on parameters such as the distance and angle between the microphone and the speaker, the wall reflectivity, the room dimensions (length, width, and height), and the reverberation time. The wall reflectivity depends on the wall material and is a manually set parameter. The reverberation time is the time required for the sound to decay by 60 dB after the source stops, and can be estimated with Sabine's formula (an empirical formula). Simulation tools of this kind (e.g., pyroomacoustics) can only generate data for an unobstructed rectangular room.
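Where a full image-method simulator is unavailable, a toy RIR can stand in for experimentation. The sketch below is illustrative only (`toy_rir` and its parameters are my own naming, not the patent's or pyroomacoustics' API): it generates exponentially decaying noise whose envelope drops by 60 dB at the chosen reverberation time, ignoring geometry, angles, and wall materials.

```python
import numpy as np

def toy_rir(fs: int = 16000, rt60: float = 0.5, seed: int = 0) -> np.ndarray:
    """Toy RIR: decaying noise whose envelope reaches -60 dB at t = rt60."""
    rng = np.random.default_rng(seed)
    n = int(fs * rt60)                   # e.g. 8000 samples for 0.5 s at 16 kHz
    t = np.arange(n) / fs
    envelope = 10.0 ** (-3.0 * t / rt60)  # 10^-3 amplitude = -60 dB at t = rt60
    rir = envelope * rng.standard_normal(n)
    rir[0] = 1.0                          # direct sound at t = 0
    return rir

rir = toy_rir()  # 8000-point RIR, matching the half-second example of fig. 1
```

This stand-in reproduces only the decay behavior governed by the reverberation time; any dependence on distance, angle, or reflectivity would require a proper simulator.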
The method for generating RIR data is not unique, and the technical solution of this application does not depend on any particular generation method: RIR data generated in any way, or RIR data collected in a real scene, can be used directly.
For example, referring to fig. 1, an RIR sample is generated by simulation; the abscissa is time and the ordinate is the room impulse response amplitude (Amp) at that time. Fig. 1 shows an RIR of 8000 sampling points at a sampling rate of 16000 Hz, i.e., half a second in total. In step two, the RIR sample is convolved with the original speech data to obtain new reverberant speech data. The original speech data is reverberation-free speech collected with a high-fidelity microphone in an anechoic room; it is the raw material to which the RIR adds reverberation. The new speech data contains both the direct sound and the room reflections of the original speech.
Step two: the embodiment of the present application randomly masks part of the room response (the masked content is random and may include direct sound or reflected sound). Masking means replacing valid values with zero (values other than zero may also be used), with the aim of reducing the speech recognition model's dependence on part of the room reflections, so that the model can still recognize accurately when an obstruction is present. Here, direct sound is the signal that travels directly from the source to the microphone after it sounds; early reflected sound is, for example, the reflected sound within 100 ms after the direct sound (this value is not fixed and can be chosen according to actual needs).
Specifically, the method comprises the following steps:
First, determine the masking duration: for example, draw a uniformly distributed random value s between 0 and 200; if s equals 200, then 200 consecutive sampling points of the RIR data are masked. At the 16 kHz sampling rate of fig. 1, 200 sampling points correspond to 12.5 ms. Denote the masking duration mask_len. Then determine the starting position of the mask: for example, draw a uniformly distributed random value between 0 and 1000 and denote it mask_start (if the random number is 100 and mask_len is 200, the 100th to 300th sampling points of the RIR are masked). The sampling points from mask_start to mask_start + mask_len are then masked, i.e., their data is replaced by zero; the masked RIR data is shown in fig. 2.
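A minimal sketch of this masking step, using the example ranges from the text (0 to 1000 for the start, 0 to 200 for the length); `mask_rir` is my own naming:

```python
import numpy as np

def mask_rir(rir: np.ndarray, max_start: int = 1000, max_len: int = 200,
             seed=None) -> np.ndarray:
    """Zero out a random span [mask_start, mask_start + mask_len) of the RIR.

    At a 16 kHz sampling rate, 200 samples correspond to 12.5 ms.
    """
    rng = np.random.default_rng(seed)
    mask_start = int(rng.integers(0, max_start + 1))  # uniform in 0..1000
    mask_len = int(rng.integers(0, max_len + 1))      # uniform in 0..200
    masked = rir.copy()                               # leave the original intact
    masked[mask_start:mask_start + mask_len] = 0.0
    return masked

rir = np.ones(8000)          # placeholder RIR, 0.5 s at 16 kHz
masked = mask_rir(rir, seed=0)
```

Replacing the span with zeros matches the embodiment above; substituting an average of the RIR or random numbers, as the text later allows, would only change the assignment line.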
Then, convolve the new RIR data with the original voice data to obtain new voice data. For example, if mask_len equals 100 and mask_start equals 500, then in the reverberant room simulated by the new RIR data, the reflections arriving 31.25 ms to 37.5 ms after the direct sound are masked: at a 16 kHz sampling rate, 500/16000 = 0.03125 s (31.25 ms), 100/16000 = 0.00625 s, and 0.03125 + 0.00625 = 0.0375 s (37.5 ms).
Finally, the new speech data is used for training of the speech recognition model.
Fig. 3 is a schematic flow chart of a speech recognition method according to an embodiment of the present application.
It should be noted that the technical solutions above are only examples, and the implementation of the present application is not unique. For the random masking of the RIR data, for example, the embodiments may mask a continuous stretch of early reflected sound, or randomly mask several segments of the room impulse response (i.e., of the RIR data in fig. 1) with different lengths; all of these fall within the scope of the embodiments of the present application. Likewise, the masked positions of the RIR data are replaced with zero values above, but may also be replaced with the average of the whole RIR data or with other random numbers; these too fall within the scope of the embodiments.
In summary, referring to fig. 4, a speech recognition method provided in the embodiment of the present application includes:
s101, masking predetermined room impulse response RIR data;
s102, convolving the masked RIR data with original voice data to obtain new voice data;
and S103, training a voice recognition model by using the new voice data.
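Steps S101 and S102 can be combined into a single augmentation routine, sketched below under the same assumptions as before (names are illustrative; S103, the actual model training, is omitted):

```python
import numpy as np

def augment(rir: np.ndarray, speech: np.ndarray,
            mask_start: int, mask_len: int) -> np.ndarray:
    """S101: mask the RIR; S102: convolve it with the original speech."""
    masked = rir.copy()
    masked[mask_start:mask_start + mask_len] = 0.0   # S101
    return np.convolve(speech, masked)               # S102: new reverberant speech

# Toy example: mask the last two taps of a 4-tap RIR.
rir = np.array([1.0, 0.6, 0.3, 0.1])
speech = np.array([0.5, -0.5, 1.0])
new_speech = augment(rir, speech, mask_start=2, mask_len=2)
```

The resulting `new_speech` would then be fed to the speech recognition model's training loop (S103) alongside unmasked reverberant copies.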
Optionally, masking the predetermined room impulse response RIR data specifically includes:
determining a time period for masking the RIR data;
and replacing the RIR data of the time period with a preset value.
Optionally, the preset value is zero, or is an average value of a part of the RIR data in the RIR data, or is a random number.
Optionally, the RIR data for the time period comprises one or more periods of RIR data in the RIR data.
Optionally, the starting position of the time period comprises a random value within a preset range from 0. The predetermined range is, for example, 0 to 1000.
Optionally, the duration of the time period comprises a random value within a preset range from 0. The predetermined range is, for example, 0 to 200.
Optionally, the RIR data is generated by simulation in advance, or acquired in a real scene.
Referring to fig. 5, a computing device provided in this embodiment of the present application may be any kind of terminal device, such as an intelligent appliance (or a network device), and the apparatus includes:
a memory 11 for storing program instructions;
a processor 12 for calling the program instructions stored in the memory and executing, according to the obtained program:
masking predetermined room impulse response, RIR, data;
convolving the masked RIR data with original voice data to obtain new voice data;
and training a voice recognition model by using the new voice data.
Optionally, masking the predetermined room impulse response RIR data specifically includes:
determining a time period for masking the RIR data;
and replacing the RIR data of the time period with a preset value.
Optionally, the preset value is zero, or is an average value of a part of the RIR data in the RIR data, or is a random number.
Optionally, the RIR data for the time period comprises one or more periods of RIR data in the RIR data.
Optionally, the starting position of the time period comprises a random value within a preset range from 0. The predetermined range is, for example, 0 to 1000.
Optionally, the duration of the time period comprises a random value within a preset range from 0. The predetermined range is, for example, 0 to 200.
Optionally, the RIR data is generated by simulation in advance, or acquired in a real scene.
Referring to fig. 6, a speech recognition apparatus provided in this embodiment of the present application may be any kind of terminal device, such as an intelligent appliance (or a network device), for example, and the apparatus includes:
a first unit 21 for masking predetermined room impulse response RIR data;
a second unit 22, configured to convolve the masked RIR data with original voice data to obtain new voice data;
a third unit 23 for training a speech recognition model with the new speech data.
Optionally, masking the predetermined room impulse response RIR data specifically includes:
determining a time period for masking the RIR data;
and replacing the RIR data of the time period with a preset value.
Optionally, the preset value is zero, or is an average value of a part of the RIR data in the RIR data, or is a random number.
Optionally, the RIR data for the time period comprises one or more periods of RIR data in the RIR data.
Optionally, the starting position of the time period comprises a random value within a preset range from 0. The predetermined range is, for example, 0 to 1000.
Optionally, the duration of the time period comprises a random value within a preset range from 0. The predetermined range is, for example, 0 to 200.
Optionally, the RIR data is generated by simulation in advance, or acquired in a real scene.
It should be noted that the division of the unit in the embodiment of the present application is schematic, and is only a logic function division, and there may be another division manner in actual implementation. In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application may be substantially implemented or contributed by the prior art, or all or part of the technical solution may be embodied in a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor (processor) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The embodiment of the present application provides a computing device, which may specifically be a desktop computer, a portable computer, a smart phone, a tablet computer, a Personal Digital Assistant (PDA), and the like. The computing device may include a Central Processing Unit (CPU), memory, input/output devices, etc., the input devices may include a keyboard, mouse, touch screen, etc., and the output devices may include a Display device, such as a Liquid Crystal Display (LCD), a Cathode Ray Tube (CRT), etc.
The memory may include Read Only Memory (ROM) and Random Access Memory (RAM), and provides the processor with program instructions and data stored in the memory. In the embodiments of the present application, the memory may be used for storing a program of any one of the methods provided by the embodiments of the present application.
The processor is used for executing any one of the methods provided by the embodiment of the application according to the obtained program instructions by calling the program instructions stored in the memory.
Embodiments of the present application provide a computer storage medium for storing computer program instructions for an apparatus provided in the embodiments of the present application, which includes a program for executing any one of the methods provided in the embodiments of the present application.
The computer storage media may be any available media or data storage device that can be accessed by a computer, including, but not limited to, magnetic memory (e.g., floppy disks, hard disks, magnetic tape, magneto-optical disks (MOs), etc.), optical memory (e.g., CDs, DVDs, BDs, HVDs, etc.), and semiconductor memory (e.g., ROMs, EPROMs, EEPROMs, non-volatile memory (NAND FLASH), Solid State Disks (SSDs)), etc.
The technical scheme provided by the embodiment of the application can be applied to terminal equipment and network equipment.
The Terminal device may also be referred to as a User Equipment (User Equipment, abbreviated as "UE"), a Mobile Station (Mobile Station, abbreviated as "MS"), a Mobile Terminal (Mobile Terminal), or the like, and optionally, the Terminal may have a capability of communicating with one or more core networks via a Radio Access Network (RAN), for example, the Terminal may be an intelligent appliance, a Mobile phone (or referred to as a "cellular" phone), or a computer with Mobile property, and for example, the Terminal may also be a portable, pocket, hand-held, computer-built-in, or vehicle-mounted Mobile device.
A network device may be a base station (e.g., access point) that refers to a device in an access network that communicates over the air-interface, through one or more sectors, with wireless terminals. The base station may be configured to interconvert received air frames and IP packets as a router between the wireless terminal and the rest of the access network, which may include an Internet Protocol (IP) network. The base station may also coordinate management of attributes for the air interface. For example, the Base Station may be a Base Transceiver Station (BTS) in GSM or CDMA, a Base Station (NodeB) in WCDMA, an evolved Node B (NodeB or eNB or e-NodeB) in LTE, or a gNB in 5G system. The embodiments of the present application are not limited.
The above method process flow may be implemented by a software program, which may be stored in a storage medium, and when the stored software program is called, the above method steps are performed.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (10)

1. A method of speech recognition, the method comprising:
masking predetermined room impulse response, RIR, data;
convolving the masked RIR data with original voice data to obtain new voice data;
and training a voice recognition model by using the new voice data.
2. The method of claim 1, wherein masking predetermined Room Impulse Response (RIR) data comprises:
determining a time period for masking the RIR data;
and replacing the RIR data of the time period with a preset value.
3. The method of claim 2, wherein the preset value is zero, or is an average value of a part of the RIR data, or is a random number.
4. The method of claim 2, wherein the period of RIR data comprises one or more periods of RIR data in the RIR data.
5. The method of claim 2, wherein the start position of the time period comprises a random value within a preset range from 0.
6. The method of claim 2, wherein the duration of the time period comprises a random value within a preset range from 0.
7. The method of claim 1, wherein the RIR data is generated by pre-simulation or acquired in a real scene.
8. A speech recognition apparatus, comprising:
a first unit for masking predetermined room impulse response RIR data;
the second unit is used for convolving the masked RIR data with the original voice data to obtain new voice data;
a third unit for training a speech recognition model using the new speech data.
9. A computing device, comprising:
a memory for storing program instructions;
a processor for calling program instructions stored in said memory to perform the method of any of claims 1 to 7 in accordance with the obtained program.
10. A computer storage medium having stored thereon computer-executable instructions for causing a computer to perform the method of any one of claims 1 to 7.
CN202110792834.0A 2021-07-14 Voice recognition method and device Active CN113470628B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110792834.0A CN113470628B (en) 2021-07-14 Voice recognition method and device

Publications (2)

Publication Number Publication Date
CN113470628A true CN113470628A (en) 2021-10-01
CN113470628B CN113470628B (en) 2024-05-31


Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5544249A (en) * 1993-08-26 1996-08-06 Akg Akustische U. Kino-Gerate Gesellschaft M.B.H. Method of simulating a room and/or sound impression
EP2028883A2 (en) * 2007-08-22 2009-02-25 Gwangju Institute of Science and Technology Sound field generator and method of generating sound field using the same
US20170316773A1 (en) * 2015-01-20 2017-11-02 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Speech reproduction device configured for masking reproduced speech in a masked speech zone
US20180253648A1 (en) * 2017-03-01 2018-09-06 Synaptics Inc Connectionist temporal classification using segmented labeled sequence data
CN108734138A (en) * 2018-05-24 2018-11-02 浙江工业大学 A kind of melanoma skin disease image classification method based on integrated study
CN110379414A (en) * 2019-07-22 2019-10-25 出门问问(苏州)信息科技有限公司 Acoustic model enhances training method, device, readable storage medium storing program for executing and calculates equipment
US10582299B1 (en) * 2018-12-11 2020-03-03 Amazon Technologies, Inc. Modeling room acoustics using acoustic waves
CN111159416A (en) * 2020-04-02 2020-05-15 腾讯科技(深圳)有限公司 Language task model training method and device, electronic equipment and storage medium
CN111210802A (en) * 2020-01-08 2020-05-29 厦门亿联网络技术股份有限公司 Method and system for generating reverberation voice data
CN112257521A (en) * 2020-09-30 2021-01-22 中国人民解放军军事科学院国防科技创新研究院 CNN underwater acoustic signal target identification method based on data enhancement and time-frequency separation
CN112633171A (en) * 2020-12-23 2021-04-09 北京恒达时讯科技股份有限公司 Sea ice identification method and system based on multi-source optical remote sensing image
CN112767927A (en) * 2020-12-29 2021-05-07 平安科技(深圳)有限公司 Method, device, terminal and storage medium for extracting voice features

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
TIEMIN MEI: "ROOM IMPULSE RESPONSE RESHAPING/SHORTENING BASED ON LEAST MEAN SQUARES OPTIMIZATION WITH INFINITY NORM CONSTRAINT", IEEE, 31 December 2009 (2009-12-31), pages 1 - 6 *
ZHONG-QIU WANG: "Robust Speaker Localization Guided by Deep Learning Based Time-Frequency Masking", IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, 31 December 2018 (2018-12-31), pages 1 - 11 *
JIA HAIRONG: "Speech Enhancement Algorithm Based on Time-Frequency Masking with a Dual-Channel Neural Network", JOURNAL OF HUAZHONG UNIVERSITY OF SCIENCE AND TECHNOLOGY, vol. 49, no. 6, 30 June 2021 (2021-06-30), pages 43 - 49 *

Similar Documents

Publication Publication Date Title
CN106033419B (en) Method, device and system for pushing messages in real time
US8908875B2 (en) Electronic device with digital reverberator and method
CN110809214B (en) Audio playing method, audio playing device and terminal equipment
CN105657479B (en) Video processing method and device
CN105225674B (en) A kind of audio signal processing method, device and mobile terminal
CN108391199B (en) virtual sound image synthesis method, medium and terminal based on personalized reflected sound threshold
CN107301028B (en) Audio data processing method and device based on multi-person remote call
WO2019072180A1 (en) Method and apparatus for allocating resources to application
CN108549486A (en) The method and device of explanation is realized in virtual scene
CN107016990B (en) Audio signal generation method and device
WO2015062109A1 (en) Method and device for evaluating network key performance indicator
CN110493703A (en) Stereo audio processing method, system and the storage medium of virtual spectators
CN112770063B (en) Image generation method and device
CN112333608B (en) Voice data processing method and related product
CN113470628B (en) Voice recognition method and device
CN113470628A (en) Voice recognition method and device
WO2024027295A1 (en) Speech enhancement model training method and apparatus, enhancement method, electronic device, storage medium, and program product
CN111158907B (en) Data processing method and device, electronic equipment and storage medium
CN111225384A (en) Uplink interference modeling method, interference determining method and device
CN109362027B (en) Positioning method, device, equipment and storage medium
CN113936676A (en) Sound adjusting method and device and electronic equipment
CN115705839A (en) Voice playing method and device, computer equipment and storage medium
CN113112998A (en) Model training method, reverberation effect reproduction method, device and readable storage medium
CN110827851B (en) Method for adjusting volume, electronic device and computer storage medium
US20200304908A1 (en) Processing audio signals

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant