WO2020073566A1 - Audio processing method and apparatus - Google Patents

Audio processing method and apparatus

Info

Publication number
WO2020073566A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio
processed
characteristic data
frequency
frequency domain
Prior art date
Application number
PCT/CN2019/073127
Other languages
English (en)
French (fr)
Inventor
黄传增
Original Assignee
北京字节跳动网络技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京字节跳动网络技术有限公司
Publication of WO2020073566A1

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10K - SOUND-PRODUCING DEVICES; METHODS OR DEVICES FOR PROTECTING AGAINST, OR FOR DAMPING, NOISE OR OTHER ACOUSTIC WAVES IN GENERAL; ACOUSTICS NOT OTHERWISE PROVIDED FOR
    • G10K15/00 - Acoustics not otherwise provided for
    • G10K15/08 - Arrangements for producing a reverberation or echo sound
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 - Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02 - Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 - Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 - Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 - Speech enhancement, e.g. noise reduction or echo cancellation
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 - Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 - Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 - Noise filtering
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 - Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 - Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 - Noise filtering
    • G10L21/0216 - Noise filtering characterised by the method used for estimating noise
    • G10L21/0232 - Processing in the frequency domain
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/18 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04R - LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00 - Circuits for transducers, loudspeakers or microphones
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 - Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 - Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 - Noise filtering
    • G10L2021/02082 - Noise filtering the noise being echo, reverberation of the speech
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04R - LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2430/00 - Signal processing covered by H04R, not provided for in its groups

Definitions

  • The embodiments of the present disclosure relate to the field of computer technology, and in particular to audio processing methods and apparatuses.
  • The embodiments of the present disclosure propose an audio processing method and apparatus.
  • An embodiment of the present disclosure provides an audio processing method including: acquiring audio to be processed; extracting frequency-domain characteristics and tone characteristics of the audio to be processed to obtain frequency-domain characteristic data and tone characteristic data, wherein the frequency-domain characteristic data includes the spectrum or the spectral center point; performing equalization processing on the audio to be processed based on the frequency-domain characteristic data to obtain equalized audio; and performing reverberation processing on the equalized audio based on the tone characteristic data.
  • In some embodiments, the frequency-domain characteristic data includes the spectrum of the audio to be processed; and equalizing the audio to be processed based on the frequency-domain characteristic data includes: in response to determining that the energy difference between two frequency points in the spectrum is greater than a preset first energy-difference threshold, reducing the gain of the frequency band in which the two frequency points are located in the audio to be processed; and in response to determining that the energy difference between two frequency points in the spectrum is less than a preset second energy-difference threshold, increasing the gain of the frequency band in which the two frequency points are located.
  • In some embodiments, the frequency-domain characteristic data includes the spectral center point; and equalizing the audio to be processed based on the frequency-domain characteristic data includes: equalizing the audio to be processed based on the spectral center point.
  • In some embodiments, the method further includes: determining the loudness of the audio to be processed.
  • An embodiment of the present disclosure further provides an audio processing apparatus, which includes:
  • an audio acquisition unit configured to acquire audio to be processed; a first extraction unit configured to extract frequency-domain characteristics and tone characteristics of the audio to be processed to obtain frequency-domain characteristic data and tone characteristic data, wherein the frequency-domain characteristic data includes the spectrum or the spectral center point; an equalization processing unit configured to perform equalization processing on the audio to be processed based on the frequency-domain characteristic data to obtain equalized audio; and a reverberation processing unit configured to perform reverberation processing on the audio to be processed based on the tone characteristic data.
  • In some embodiments, the frequency-domain characteristic data includes the spectrum of the audio to be processed; and the equalization processing unit is further configured to: in response to determining that the energy difference between two frequency points in the spectrum is greater than a preset first energy-difference threshold, reduce the gain of the frequency band in which the two frequency points are located in the audio to be processed; and in response to determining that the energy difference between two frequency points in the spectrum is less than a preset second energy-difference threshold, increase the gain of the frequency band in which the two frequency points are located.
  • In some embodiments, the frequency-domain characteristic data includes the spectral center point; and the equalization processing unit is further configured to perform equalization processing on the audio to be processed based on the spectral center point.
  • In some embodiments, the apparatus further includes: a loudness determination unit configured to determine the loudness of the audio to be processed.
  • The method and apparatus provided by the embodiments of the present disclosure first extract the frequency-domain characteristics and tone characteristics of the acquired audio to be processed, thereby obtaining frequency-domain characteristic data and tone characteristic data. On this basis, the audio to be processed can be equalized based on the frequency-domain characteristic data, and the equalized audio can then be reverberated based on the tone characteristic data. Because different audio has different frequency-domain and tonal characteristics, the reverberation and equalization processing are more targeted.
  • FIG. 1 is an exemplary system architecture diagram to which an embodiment of the present disclosure can be applied;
  • FIG. 2 is a flowchart of an embodiment of an audio processing method according to the present disclosure;
  • FIG. 3 is a schematic diagram of an application scenario of an audio processing method according to an embodiment of the present disclosure;
  • FIG. 4 is a flowchart of yet another embodiment of an audio processing method according to the present disclosure;
  • FIG. 5 is a schematic structural diagram of an embodiment of an audio processing apparatus according to the present disclosure;
  • FIG. 6 is a schematic structural diagram of an electronic device suitable for implementing embodiments of the present disclosure.
  • FIG. 1 shows an exemplary system architecture 100 to which an audio processing method or apparatus of an embodiment of the present disclosure can be applied.
  • The system architecture 100 may include terminal devices 101, 102, and 103, a network 104, and a server 105.
  • The network 104 is a medium used to provide communication links between the terminal devices 101, 102, 103 and the server 105.
  • The network 104 may include various connection types, such as wired or wireless communication links, or fiber optic cables.
  • A user can use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 to receive or send messages and so on.
  • Various communication client applications, such as singing applications, video recording and sharing applications, and audio processing applications, can be installed on the terminal devices 101, 102, and 103.
  • The terminal devices 101, 102, and 103 may be hardware or software.
  • When the terminal devices 101, 102, and 103 are hardware, they may be various electronic devices that have a display screen and support audio processing.
  • When the terminal devices 101, 102, and 103 are software, they can be installed in the above electronic devices, and can be implemented as multiple pieces of software or software modules, or as a single piece of software or software module. No specific limitation is made here.
  • The server 105 may be a server that provides various services, for example, a back-end server that supports the applications installed on the terminal devices 101, 102, and 103.
  • The audio processing method provided by the embodiments of the present disclosure is generally executed by the terminal devices 101, 102, and 103.
  • Accordingly, the audio processing apparatus is generally provided in the terminal devices 101, 102, 103.
  • The server can be hardware or software.
  • When the server is hardware, it can be implemented as a distributed server cluster composed of multiple servers or as a single server.
  • When the server is software, it can be implemented as multiple pieces of software or software modules (for example, to provide distributed services), or as a single piece of software or software module. No specific limitation is made here.
  • The terminal devices, networks, and servers in FIG. 1 are only illustrative. There can be any number of terminal devices, networks, and servers according to implementation needs.
  • The audio processing method includes:
  • Step 201: Acquire audio to be processed.
  • The execution subject of the audio processing method can acquire the audio to be processed in various ways.
  • For example, the above-mentioned execution subject can record the user's singing voice through a recording device to obtain the audio to be processed.
  • The recording device may be integrated on the above-mentioned execution subject, or may be communicatively connected to it, which is not limited in this disclosure.
  • The above-mentioned execution subject may also obtain pre-stored audio from a local or communicatively connected storage device as the audio to be processed.
  • The audio to be processed may be any audio.
  • The audio to be processed can be specified by a technician or screened according to certain conditions.
  • The audio to be processed may be the complete audio sung by the user or an audio segment sung by the user.
  • In a real-time monitoring scenario, the audio to be processed may also be a short audio segment (for example, 30 milliseconds) sung by the user.
  • Step 202: Extract frequency-domain characteristics and tone characteristics of the audio to be processed to obtain frequency-domain characteristic data and tone characteristic data.
  • The above-mentioned execution subject can transform the audio to be processed from the time domain to the frequency domain.
  • On this basis, the frequency-domain characteristics of the audio to be processed are extracted to obtain frequency-domain characteristic data.
  • The frequency-domain characteristic data includes the spectrum or the spectral center point.
  • In practice, the above-mentioned execution subject can use some audio processing software to extract the frequency-domain characteristics and tone characteristic data of the audio to be processed. It is also possible to obtain the frequency-domain characteristic data and tone characteristic data according to some feature extraction algorithms.
  • The execution subject may extract the tone characteristics of the audio to be processed to obtain tone characteristic data.
  • As the audio to be processed differs, its tone characteristic data also differs.
  • This is especially true when the audio to be processed is audio recorded while the user sings.
  • The tone characteristic data may be data related to pitch. As an example, it may be the proportion of a certain frequency band of the audio to be processed within the entire spectrum.
  • When the audio to be processed includes accompaniment and vocals, the pitch difference between the human voice and the corresponding accompaniment may be determined as the tone characteristic data.
  • Step 203: Perform equalization processing on the audio to be processed based on the frequency-domain characteristic data to obtain equalized audio.
  • The execution subject may perform equalization processing on the audio to be processed based on the frequency-domain characteristic data to obtain equalized audio.
  • As an example, a correspondence table storing frequency-domain characteristic data and equalization parameters may be established in advance. The above-mentioned execution subject can then look up the frequency-domain characteristic data in the correspondence table to obtain the corresponding equalization parameters.
  • On this basis, the audio to be processed can be equalized through various filters based on the corresponding equalization parameters to obtain equalized audio.
  • As another example, the above-mentioned execution subject may select, according to preset processing logic, the processing logic corresponding to the frequency-domain characteristic data, and perform equalization processing on the audio to be processed to obtain equalized audio.
  • Performing equalization processing on the audio to be processed based on the frequency-domain characteristic data includes: in response to determining that the energy difference between two frequency points in the spectrum is greater than a preset first energy-difference threshold, decreasing the gain of the frequency band in which the two frequency points are located in the audio to be processed; and in response to determining that the energy difference between two frequency points in the spectrum is less than a preset second energy-difference threshold, increasing the gain of the frequency band in which the two frequency points are located.
  • The frequency-domain characteristic data includes the spectral center point; and equalizing the audio to be processed based on the frequency-domain characteristic data includes: equalizing the audio to be processed based on the spectral center point.
  • A correspondence table storing spectral center points and equalization parameters can be established in advance. The above-mentioned execution subject may then look up, in the correspondence table, the equalization parameters corresponding to the spectral center point included in the frequency-domain characteristic data, and process the audio to be processed according to the equalization parameters obtained from the lookup. Specifically, various existing equalizers can be used, with the equalization parameters obtained from the lookup fed into the equalizer, so that the audio to be processed is equalized.
  • Step 204: Perform reverberation processing on the equalized audio based on the tone characteristic data.
  • The above-mentioned execution subject may perform reverberation processing on the equalized audio according to the tone characteristic data.
  • Take, as an example, tone characteristic data that is the proportion of a certain frequency band of the audio to be processed within the entire spectrum.
  • According to this proportion, audio can be divided into different categories, such as high-frequency audio, low-frequency audio, and mid-frequency audio.
  • On this basis, correspondences between the audio categories and reverberation parameters or processing logic can be set.
  • Take, as another example, tone characteristic data that is the pitch difference between the human voice and the corresponding accompaniment.
  • Through a pre-established correspondence, the above-mentioned execution subject can determine the reverberation parameters or processing logic corresponding to the tone characteristic data of the audio to be processed, so as to perform reverberation processing on the audio to be processed.
  • FIG. 3 is a schematic diagram of an application scenario of the audio processing method according to this embodiment.
  • In this scenario, the execution subject of the audio processing method may be the smartphone 301.
  • The smartphone 301 can first obtain the audio to be processed 3011. After that, the smartphone 301 may extract the frequency-domain characteristics and tone characteristics of the audio to be processed to obtain frequency-domain characteristic data 3012 and tone characteristic data 3013. On this basis, the audio to be processed 3011 is equalized based on the frequency-domain characteristic data 3012 to obtain equalized audio 3011'. Based on the tone characteristic data 3013, the equalized audio 3011' is subjected to reverberation processing.
  • The method provided by the above embodiment of the present disclosure first extracts the frequency-domain characteristics and tone characteristics of the acquired audio to be processed, thereby obtaining frequency-domain characteristic data and tone characteristic data. On this basis, the audio to be processed can be equalized based on the frequency-domain characteristic data, and the equalized audio can then be reverberated based on the tone characteristic data. Because different audio has different frequency-domain and tonal characteristics, the reverberation and equalization processing are more targeted.
  • FIG. 4 shows a flow 400 of yet another embodiment of an audio processing method.
  • The flow 400 of the audio processing method includes the following steps:
  • Step 401: Acquire audio to be processed.
  • Step 402: Extract the frequency-domain characteristics and tone characteristics of the audio to be processed to obtain frequency-domain characteristic data and tone characteristic data.
  • Step 403: Perform equalization processing on the audio to be processed based on the frequency-domain characteristic data to obtain equalized audio.
  • Step 404: Perform reverberation processing on the equalized audio based on the tone characteristic data.
  • For the specific implementation of steps 401-404 and the technical effects they bring, reference may be made to steps 201-204 in the embodiment corresponding to FIG. 2, which will not be repeated here.
  • Step 405: Determine the loudness of the audio to be processed.
  • The above-mentioned execution subject may determine the loudness of the audio to be processed in various ways.
  • The loudness of the audio to be processed can be determined through some loudness algorithms, for example, a loudness algorithm based on octave or 1/3-octave bands.
  • The loudness of the audio to be processed can also be obtained through some existing loudness models.
  • Existing loudness models include, but are not limited to, the Moore loudness model, the Zwicker loudness model, and so on.
  • The Zwicker loudness model is a multi-band loudness calculation model based on excitation patterns, which can simulate the hearing mechanism of the human ear.
  • The cochlear basilar membrane can be likened to a set of band-pass filters with overlapping bandwidths, called characteristic frequency bands. Under external excitation, a corresponding excitation intensity is produced on each characteristic frequency band, which is called the "excitation pattern".
  • A characteristic loudness proportional to the excitation intensity can be obtained, and the loudness of each sub-band can be obtained by integrating the characteristic loudness.
  • The Moore loudness model is an improved loudness model based on the Zwicker loudness model. Compared with the Zwicker loudness model, the Moore loudness model is suitable for various steady-state noise signals and has a higher frequency resolution.
  • The audio processing method in this embodiment adds the step of determining the loudness of the audio to be processed. In practice, the loudness of the audio to be processed often changes after various kinds of processing. Therefore, determining the loudness of the audio to be processed lays the foundation for loudness comparison and adjustment.
  • The present disclosure further provides an audio processing apparatus.
  • The apparatus embodiment corresponds to the method embodiment shown in FIG. 2.
  • The apparatus can be applied to various electronic devices.
  • The audio processing apparatus 500 of this embodiment includes an audio acquisition unit 501, a first extraction unit 502, an equalization processing unit 503, and a reverberation processing unit 504.
  • The audio acquisition unit 501 is configured to acquire audio to be processed.
  • The first extraction unit 502 is configured to extract frequency-domain characteristics of the audio to be processed to obtain frequency-domain characteristic data.
  • The equalization processing unit 503 is configured to perform equalization processing on the audio to be processed based on the frequency-domain characteristic data.
  • The reverberation processing unit 504 is configured to perform reverberation processing on the audio to be processed based on the tone characteristic data.
  • For the specific processing of the audio acquisition unit 501, the first extraction unit 502, the equalization processing unit 503, and the reverberation processing unit 504 in the audio processing apparatus 500, reference may be made to steps 201-204 in the embodiment corresponding to FIG. 2, which will not be repeated here.
  • The frequency-domain characteristic data may include the spectrum of the audio to be processed; and the equalization processing unit 503 is further configured to: in response to determining that the energy difference between two frequency points in the spectrum is greater than a preset first energy-difference threshold, reduce the gain of the frequency band in which the two frequency points are located in the audio to be processed; and in response to determining that the energy difference between two frequency points in the spectrum is less than a preset second energy-difference threshold, increase the gain of the frequency band in which the two frequency points are located.
  • The frequency-domain characteristic data may include a spectral center point; and the equalization processing unit 503 is further configured to perform equalization processing on the audio to be processed based on the spectral center point.
  • The apparatus 500 may further include a loudness determination unit (not shown in the figure).
  • The loudness determination unit is configured to determine the loudness of the audio to be processed.
  • The first extraction unit 502 may extract the frequency-domain characteristics and tone characteristics of the acquired audio to be processed, thereby obtaining frequency-domain characteristic data and tone characteristic data.
  • The equalization processing unit 503 may perform equalization processing on the audio to be processed based on the frequency-domain characteristic data.
  • The reverberation processing unit 504 is configured to perform reverberation processing on the equalized audio based on the tone characteristic data. Because different audio has different frequency-domain and tonal characteristics, the reverberation and equalization processing are more targeted.
  • FIG. 6 shows a schematic structural diagram of an electronic device 600 (for example, the terminal device in FIG. 1) suitable for implementing the embodiments of the present disclosure.
  • Terminal devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), and in-vehicle terminals (for example, car navigation terminals), as well as fixed terminals such as digital TVs and desktop computers.
  • The electronic device shown in FIG. 6 is just an example, and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
  • The electronic device 600 may include a processing device (such as a central processing unit, a graphics processor, etc.) 601, which may perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage device 608 into a random access memory (RAM) 603.
  • The RAM 603 also stores various programs and data necessary for the operation of the electronic device 600.
  • The processing device 601, ROM 602, and RAM 603 are connected to each other via a bus 604.
  • An input/output (I/O) interface 605 is also connected to the bus 604.
  • The following devices can be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 607 including, for example, a liquid crystal display (LCD), speaker, vibrator, etc.; storage devices 608 including, for example, a magnetic tape, hard disk, etc.; and a communication device 609.
  • The communication device 609 may allow the electronic device 600 to perform wireless or wired communication with other devices to exchange data.
  • Although FIG. 6 shows an electronic device 600 having various devices, it should be understood that it is not required to implement or have all the devices shown. More or fewer devices may alternatively be implemented or provided.
  • The process described above with reference to the flowchart may be implemented as a computer software program.
  • Embodiments of the present disclosure include a computer program product that includes a computer program carried on a computer-readable medium, the computer program containing program code for performing the method shown in the flowchart.
  • The computer program may be downloaded and installed from the network through the communication device 609, or installed from the storage device 608, or installed from the ROM 602.
  • When the computer program is executed by the processing device 601, the above-described functions defined in the methods of the embodiments of the present disclosure are executed.
  • The computer-readable medium described in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two.
  • The computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
  • The computer-readable storage medium may be any tangible medium that contains or stores a program, and the program may be used by or in combination with an instruction execution system, apparatus, or device.
  • The computer-readable signal medium may include a data signal that is propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. This propagated data signal can take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the above.
  • The computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, and the computer-readable signal medium may send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device.
  • The program code contained on the computer-readable medium may be transmitted using any appropriate medium, including but not limited to: electric wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
  • The computer-readable medium may be included in the electronic device; or it may exist alone without being assembled into the electronic device.
  • The computer-readable medium carries one or more programs, and when the one or more programs are executed by the electronic device, the electronic device is caused to: acquire audio to be processed; extract frequency-domain characteristics and tone characteristics of the audio to be processed to obtain frequency-domain characteristic data and tone characteristic data, wherein the frequency-domain characteristic data includes the spectrum or the spectral center point; perform equalization processing on the audio to be processed based on the frequency-domain characteristic data to obtain equalized audio; and perform reverberation processing on the equalized audio based on the tone characteristic data.
  • Computer program code for performing the operations of the present disclosure may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • The program code may execute entirely on the user's computer, partly on the user's computer, as an independent software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server.
  • The remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider).
  • Each block in the flowchart or block diagrams may represent a module, program segment, or part of code, which contains one or more executable instructions for implementing the specified logical functions.
  • The functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession can actually be executed substantially in parallel, and sometimes they can also be executed in the reverse order, depending on the functions involved.
  • Each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented with a dedicated hardware-based system that performs the specified functions or operations, or can be implemented by a combination of dedicated hardware and computer instructions.
  • The units described in the embodiments of the present disclosure may be implemented in software or hardware.
  • The name of a unit does not constitute a limitation on the unit itself.
  • For example, the audio acquisition unit can also be described as "a unit that acquires audio to be processed".

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Quality & Reliability (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

An audio processing method (200) and apparatus. The method (200) includes: acquiring audio to be processed (201); extracting frequency-domain characteristics and tone characteristics of the audio to be processed to obtain frequency-domain characteristic data and tone characteristic data, wherein the frequency-domain characteristic data includes a spectrum or a spectral center point (202); performing equalization processing on the audio to be processed based on the frequency-domain characteristic data to obtain equalized audio (203); and performing reverberation processing on the equalized audio based on the tone characteristic data (204). The method (200) achieves more targeted reverberation and equalization processing.

Description

Audio processing method and apparatus
This patent application claims priority to the Chinese patent application filed on October 12, 2018, with application number 201811190954.8, by the applicant 北京微播视界科技有限公司, entitled "Audio processing method and apparatus", the entire contents of which are incorporated into this application by reference.
Technical Field
The embodiments of the present disclosure relate to the field of computer technology, and in particular to audio processing methods and apparatuses.
Background
With the popularization of electronic devices, people's requirements for the intelligence and user-friendliness of electronic devices are becoming increasingly high. Portable electronic terminals, represented by mobile phones, are in ever wider use, and multimedia functions are among the applications that users use most.
Summary
The embodiments of the present disclosure propose an audio processing method and apparatus.
In a first aspect, an embodiment of the present disclosure provides an audio processing method, which includes: acquiring audio to be processed; extracting frequency-domain characteristics and tone characteristics of the audio to be processed to obtain frequency-domain characteristic data and tone characteristic data, wherein the frequency-domain characteristic data includes a spectrum or a spectral center point; performing equalization processing on the audio to be processed based on the frequency-domain characteristic data to obtain equalized audio; and performing reverberation processing on the equalized audio based on the tone characteristic data.
In some embodiments, the frequency-domain characteristic data includes the spectrum of the audio to be processed; and performing equalization processing on the audio to be processed based on the frequency-domain characteristic data includes: in response to determining that the energy difference between two frequency points in the spectrum is greater than a preset first energy-difference threshold, reducing the gain of the frequency band in which the two frequency points are located in the audio to be processed; and in response to determining that the energy difference between two frequency points in the spectrum is less than a preset second energy-difference threshold, increasing the gain of the frequency band in which the two frequency points are located in the audio to be processed.
In some embodiments, the frequency-domain characteristic data includes a spectral center point; and performing equalization processing on the audio to be processed based on the frequency-domain characteristic data includes: performing equalization processing on the audio to be processed based on the spectral center point.
In some embodiments, after acquiring the audio to be processed, the method further includes: determining the loudness of the audio to be processed.
In a second aspect, an embodiment of the present disclosure provides an audio processing apparatus, which includes:
an audio acquisition unit configured to acquire audio to be processed; a first extraction unit configured to extract frequency-domain characteristics and tone characteristics of the audio to be processed to obtain frequency-domain characteristic data and tone characteristic data, wherein the frequency-domain characteristic data includes a spectrum or a spectral center point; an equalization processing unit configured to perform equalization processing on the audio to be processed based on the frequency-domain characteristic data to obtain equalized audio; and a reverberation processing unit configured to perform reverberation processing on the audio to be processed based on the tone characteristic data.
In some embodiments, the frequency-domain characteristic data includes the spectrum of the audio to be processed; and the equalization processing unit is further configured to: in response to determining that the energy difference between two frequency points in the spectrum is greater than a preset first energy-difference threshold, reduce the gain of the frequency band in which the two frequency points are located in the audio to be processed; and in response to determining that the energy difference between two frequency points in the spectrum is less than a preset second energy-difference threshold, increase the gain of the frequency band in which the two frequency points are located in the audio to be processed.
In some embodiments, the frequency-domain characteristic data includes a spectral center point; and the equalization processing unit is further configured to perform equalization processing on the audio to be processed based on the spectral center point.
In some embodiments, the apparatus further includes: a loudness determination unit configured to determine the loudness of the audio to be processed.
The method and apparatus provided by the embodiments of the present disclosure first extract the frequency-domain characteristics and tone characteristics of the acquired audio to be processed, thereby obtaining frequency-domain characteristic data and tone characteristic data. On this basis, the audio to be processed can be equalized based on the frequency-domain characteristic data, after which the equalized audio can be reverberated based on the tone characteristic data. Because different audio has different frequency-domain and tonal characteristics, the reverberation and equalization processing are more targeted.
Brief Description of the Drawings
Other features, objects, and advantages of the present disclosure will become more apparent by reading the detailed description of non-limiting embodiments made with reference to the following drawings:
FIG. 1 is an exemplary system architecture diagram to which an embodiment of the present disclosure can be applied;
FIG. 2 is a flowchart of an embodiment of an audio processing method according to the present disclosure;
FIG. 3 is a schematic diagram of an application scenario of an audio processing method according to an embodiment of the present disclosure;
FIG. 4 is a flowchart of yet another embodiment of an audio processing method according to the present disclosure;
FIG. 5 is a schematic structural diagram of an embodiment of an audio processing apparatus according to the present disclosure;
FIG. 6 is a schematic structural diagram of an electronic device suitable for implementing embodiments of the present disclosure.
Detailed Description
The present disclosure is further described in detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the relevant disclosure, not to limit it. It should also be noted that, for ease of description, only the parts related to the relevant disclosure are shown in the drawings.
It should be noted that, in the case of no conflict, the embodiments of the present disclosure and the features in the embodiments can be combined with each other. The present disclosure will be described in detail below with reference to the drawings and in conjunction with embodiments.
FIG. 1 shows an exemplary system architecture 100 to which an audio processing method or apparatus of an embodiment of the present disclosure can be applied.
As shown in FIG. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is a medium used to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links, or fiber optic cables.
A user may use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 to receive or send messages and so on. Various communication client applications, such as singing applications, video recording and sharing applications, and audio processing applications, may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices that have a display screen and support audio processing. When the terminal devices 101, 102, 103 are software, they may be installed in the above electronic devices, and may be implemented as multiple pieces of software or software modules, or as a single piece of software or software module. No specific limitation is made here.
The server 105 may be a server that provides various services, for example, a back-end server that provides support for the applications installed on the terminal devices 101, 102, 103.
It should be noted that the audio processing method provided by the embodiments of the present disclosure is generally executed by the terminal devices 101, 102, 103. Accordingly, the audio processing apparatus is generally provided in the terminal devices 101, 102, 103.
It should be noted that the server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster composed of multiple servers, or as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (for example, to provide distributed services), or as a single piece of software or software module. No specific limitation is made here.
It should be understood that the numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative. There may be any number of terminal devices, networks, and servers according to implementation needs.
Continuing to refer to FIG. 2, a flow 200 of an embodiment of an audio processing method according to the present disclosure is shown. The audio processing method includes:
Step 201: Acquire audio to be processed.
In this embodiment, the execution subject of the audio processing method (for example, the terminal devices 101, 102, 103 shown in FIG. 1) may acquire the audio to be processed in various ways. For example, the execution subject may record the user's singing voice through a recording device to obtain the audio to be processed. The recording device may be integrated on the execution subject, or may be communicatively connected to it, which is not limited in the present disclosure. As another example, the execution subject may also obtain pre-stored audio from a local or communicatively connected storage device as the audio to be processed.
In this embodiment, the audio to be processed may be any audio. It may be specified by a technician or screened according to certain conditions. For example, when a user records singing through a terminal device (for example, a smartphone), the audio to be processed may be the complete audio sung by the user, or an audio segment sung by the user. In a real-time monitoring scenario, the audio to be processed may also be a short audio segment (for example, 30 milliseconds) sung by the user.
Step 202: Extract frequency-domain characteristics and tone characteristics of the audio to be processed to obtain frequency-domain characteristic data and tone characteristic data.
In this embodiment, the execution subject may transform the audio to be processed from the time domain to the frequency domain, and on this basis extract the frequency-domain characteristics of the audio to be processed to obtain frequency-domain characteristic data. The frequency-domain characteristic data includes the spectrum or the spectral center point. In practice, the execution subject may use some audio processing software to extract the frequency-domain characteristics and tone characteristic data of the audio to be processed, or may obtain the frequency-domain characteristic data and tone characteristic data according to some feature extraction algorithms.
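As an illustrative sketch of this step (not part of the original application), the following Python fragment computes a magnitude spectrum and its spectral center point for a mono signal; the function name, the windowing choice, and treating the center point as the energy-weighted mean frequency are assumptions made for illustration.

```python
import numpy as np

def spectrum_and_centroid(x: np.ndarray, sr: int):
    """Return the magnitude spectrum, bin frequencies (Hz), and spectral center point (Hz)."""
    window = np.hanning(len(x))                  # taper to reduce spectral leakage
    mag = np.abs(np.fft.rfft(x * window))        # magnitude spectrum
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)  # frequency of each FFT bin in Hz
    # Spectral center point: magnitude-weighted mean frequency of the spectrum.
    centroid = float(np.sum(freqs * mag) / (np.sum(mag) + 1e-12))
    return mag, freqs, centroid
```

For a short real-time segment (for example, 30 milliseconds of audio), `x` would simply be that segment's samples.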
In this embodiment, the execution subject may extract the tone characteristics of the audio to be processed to obtain tone characteristic data. In practice, the tone characteristic data differs as the audio to be processed differs, especially when the audio to be processed is audio recorded while the user sings. The tone characteristic data may be data related to pitch. As an example, it may be the proportion of a certain frequency band of the audio to be processed within the entire spectrum. As another example, when the audio to be processed includes both accompaniment and vocals, the pitch difference between the human voice and the corresponding accompaniment may be determined as the tone characteristic data.
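As one illustrative sketch of the first kind of feature, the fragment below (not from the patent) computes the proportion of a chosen frequency band within the entire spectrum; the band edges in the usage note are assumptions, not values specified by the patent.

```python
import numpy as np

def band_proportion(x: np.ndarray, sr: int, lo: float, hi: float) -> float:
    """Proportion of the signal's spectral energy that falls in the band [lo, hi) Hz."""
    power = np.abs(np.fft.rfft(x * np.hanning(len(x)))) ** 2  # power spectrum
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)
    band = (freqs >= lo) & (freqs < hi)
    return float(np.sum(power[band]) / (np.sum(power) + 1e-12))

# Example (illustrative band edges): share of energy in a voice-dominated band.
# vocal_share = band_proportion(x, sr, 300.0, 3400.0)
```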
Step 203: Perform equalization processing on the audio to be processed based on the frequency-domain characteristic data to obtain equalized audio.
In this embodiment, the execution subject may perform equalization processing on the audio to be processed based on the frequency-domain characteristic data to obtain equalized audio. As an example, a correspondence table storing frequency-domain characteristic data and equalization parameters may be established in advance, so that the execution subject can look up the frequency-domain characteristic data in the correspondence table to obtain the corresponding equalization parameters. On this basis, the audio to be processed may be equalized through various filters based on the corresponding equalization parameters to obtain equalized audio. As another example, the execution subject may select, according to preset processing logic, the processing logic corresponding to the frequency-domain characteristic data, and perform equalization processing on the audio to be processed to obtain equalized audio.
In some optional implementations of this embodiment, performing equalization processing on the audio to be processed based on the frequency-domain characteristic data includes: in response to determining that the energy difference between two frequency points in the spectrum is greater than a preset first energy-difference threshold, reducing the gain of the frequency band in which the two frequency points are located in the audio to be processed; and in response to determining that the energy difference between two frequency points in the spectrum is less than a preset second energy-difference threshold, increasing the gain of the frequency band in which the two frequency points are located. Fine-grained adjustment of the audio to be processed can thereby be achieved.
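A minimal sketch of this thresholded rule follows. The patent only specifies the decrease/increase behavior; the threshold values, the step size, and treating each band's edge bins as the "two frequency points" are illustrative assumptions.

```python
def adjust_band_gains(mag_db, band_edges, gains_db,
                      t1_db=12.0, t2_db=3.0, step_db=1.5):
    """Nudge per-band EQ gains from the energy difference of two frequency points.

    mag_db     -- magnitude spectrum in dB, indexed by FFT bin
    band_edges -- list of (lo_bin, hi_bin) pairs delimiting each band
    gains_db   -- current per-band gains in dB, modified in place
    """
    for i, (lo, hi) in enumerate(band_edges):
        diff = abs(mag_db[lo] - mag_db[hi])  # energy difference of the two points
        if diff > t1_db:        # exceeds the first threshold: reduce the band's gain
            gains_db[i] -= step_db
        elif diff < t2_db:      # below the second threshold: increase the band's gain
            gains_db[i] += step_db
    return gains_db
```

The resulting gains would then drive whatever filter bank or equalizer performs the actual filtering.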
In some optional implementations of this embodiment, the frequency-domain characteristic data includes a spectral center point; and performing equalization processing on the audio to be processed based on the frequency-domain characteristic data includes: performing equalization processing on the audio to be processed based on the spectral center point.
In these implementations, a correspondence table storing spectral center points and equalization parameters can be established in advance. The execution subject can then look up, in the correspondence table, the equalization parameters corresponding to the spectral center point included in the frequency-domain characteristic data, and process the audio to be processed according to the equalization parameters obtained from the lookup. Specifically, various existing equalizers can be used, with the equalization parameters obtained from the lookup fed into the equalizer, so that the audio to be processed is equalized.
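A minimal sketch of such a correspondence table follows; the centroid ranges and per-band gains are hypothetical values chosen for illustration, not values from the patent.

```python
# Hypothetical correspondence table: spectral-centroid range -> per-band EQ gains (dB).
EQ_TABLE = [
    (0.0,    500.0,  [0.0, -2.0, 3.0]),   # dark material: lift the highs
    (500.0,  2000.0, [0.0,  0.0, 0.0]),   # balanced material: leave flat
    (2000.0, 1e9,    [3.0,  0.0, -2.0]),  # bright material: lift lows, tame highs
]

def eq_params_for_centroid(centroid_hz: float):
    """Return the EQ gains whose centroid range contains centroid_hz."""
    for lo, hi, gains in EQ_TABLE:
        if lo <= centroid_hz < hi:
            return gains
    return [0.0, 0.0, 0.0]  # fall back to a flat response
```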
Step 204: Perform reverberation processing on the equalized audio based on the tone characteristic data.
In this embodiment, the execution subject may perform reverberation processing on the equalized audio according to the tone characteristic data. Take, as an example, tone characteristic data that is the proportion of a certain frequency band of the audio to be processed within the entire spectrum. According to this proportion, audio can be divided into different categories, such as high-frequency audio, low-frequency audio, and mid-frequency audio. On this basis, correspondences between the audio categories and reverberation parameters or processing logic can be set. Take, as another example, tone characteristic data that is the pitch difference between the human voice and the corresponding accompaniment. Through a pre-established correspondence between pitch differences and reverberation parameters or processing logic, the execution subject can determine, from the tone characteristic data of the audio to be processed, the corresponding reverberation parameters or processing logic, and thereby perform reverberation processing on the audio to be processed.
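A minimal sketch of this category-based selection follows; the category thresholds, decay times, and wet/dry mixes are hypothetical, since the patent does not specify parameter values.

```python
# Hypothetical mapping from audio category to reverberation parameters.
REVERB_PRESETS = {
    "low":  {"decay_s": 1.8, "wet": 0.35},  # longer, warmer tail for bass-heavy audio
    "mid":  {"decay_s": 1.2, "wet": 0.25},
    "high": {"decay_s": 0.8, "wet": 0.15},  # shorter tail keeps bright audio clean
}

def reverb_params(low_prop: float, high_prop: float) -> dict:
    """Pick reverb parameters from band-energy proportions of the audio."""
    if low_prop > 0.5:       # illustrative threshold for "low-frequency audio"
        return REVERB_PRESETS["low"]
    if high_prop > 0.5:      # illustrative threshold for "high-frequency audio"
        return REVERB_PRESETS["high"]
    return REVERB_PRESETS["mid"]
```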
Continuing to refer to FIG. 3, FIG. 3 is a schematic diagram of an application scenario of the audio processing method according to this embodiment. In the application scenario of FIG. 3, the execution subject of the audio processing method may be a smartphone 301. The smartphone 301 may first acquire the audio to be processed 3011. After that, the smartphone 301 may extract the frequency-domain characteristics and tone characteristics of the audio to be processed to obtain frequency-domain characteristic data 3012 and tone characteristic data 3013. On this basis, the audio to be processed 3011 is equalized based on the frequency-domain characteristic data 3012 to obtain equalized audio 3011'. Based on the tone characteristic data 3013, reverberation processing is performed on the equalized audio 3011'.
The method provided by the above embodiment of the present disclosure first extracts the frequency-domain characteristics and tone characteristics of the acquired audio to be processed, thereby obtaining frequency-domain characteristic data and tone characteristic data. On this basis, the audio to be processed can be equalized based on the frequency-domain characteristic data, after which the equalized audio can be reverberated based on the tone characteristic data. Because different audio has different frequency-domain and tonal characteristics, the reverberation and equalization processing are more targeted.
With further reference to FIG. 4, a flow 400 of yet another embodiment of an audio processing method is shown. The flow 400 of the audio processing method includes the following steps:
Step 401: Acquire audio to be processed.
Step 402: Extract the frequency-domain characteristics and tone characteristics of the audio to be processed to obtain frequency-domain characteristic data and tone characteristic data.
Step 403: Perform equalization processing on the audio to be processed based on the frequency-domain characteristic data to obtain equalized audio.
Step 404: Perform reverberation processing on the equalized audio based on the tone characteristic data.
In this embodiment, for the specific implementation of steps 401-404 and the technical effects they bring, reference may be made to steps 201-204 in the embodiment corresponding to FIG. 2, which will not be repeated here.
Step 405: Determine the loudness of the audio to be processed.
In this embodiment, the execution subject may determine the loudness of the audio to be processed in various ways.
As an example, the loudness of the audio to be processed may be determined through some loudness algorithms, for example, a loudness algorithm based on octave or 1/3-octave bands.
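A simplified sketch of the band-analysis core of such an algorithm follows, assuming a mono signal `x` at `sr` Hz; `f_start` and `n_bands` are illustrative, and a true loudness algorithm would additionally apply ear-response weighting and band-summation rules.

```python
import numpy as np

def third_octave_levels(x: np.ndarray, sr: int,
                        f_start: float = 25.0, n_bands: int = 30) -> list:
    """Per-band power levels (dB) at 1/3-octave center frequencies.

    A simplified stand-in for an octave-band loudness algorithm; bands above
    the Nyquist frequency simply come out empty (very low level).
    """
    power = np.abs(np.fft.rfft(x * np.hanning(len(x)))) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)
    levels = []
    for k in range(n_bands):
        fc = f_start * 2.0 ** (k / 3.0)  # 1/3-octave center frequency
        lo, hi = fc * 2.0 ** (-1.0 / 6.0), fc * 2.0 ** (1.0 / 6.0)
        band = (freqs >= lo) & (freqs < hi)
        levels.append(10.0 * np.log10(np.sum(power[band]) + 1e-12))
    return levels
```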
As another example, the loudness of the audio to be processed may be obtained through some existing loudness models. Existing loudness models include, but are not limited to, the Moore loudness model, the Zwicker loudness model, and so on. The Zwicker loudness model is a multi-band loudness calculation model based on excitation patterns, which can simulate the mechanism by which the human ear produces hearing. According to this mechanism, the cochlear basilar membrane can be likened to a set of band-pass filters with overlapping bandwidths, called characteristic frequency bands. Under external excitation, a corresponding excitation intensity is produced on each characteristic frequency band, called the "excitation pattern". From the excitation intensity, a characteristic loudness proportional to the excitation intensity can be obtained, and the loudness of each sub-band can be obtained by integrating the characteristic loudness. The Moore loudness model is a loudness model improved on the basis of the Zwicker loudness model; compared with the Zwicker loudness model, the Moore loudness model is suitable for various steady-state noise signals and has a higher frequency resolution. As can be seen from FIG. 4, compared with the embodiment corresponding to FIG. 2, the audio processing method in this embodiment adds the step of determining the loudness of the audio to be processed. In practice, the loudness of the audio to be processed often changes after various kinds of processing. Therefore, determining the loudness of the audio to be processed lays the foundation for loudness comparison and adjustment.
With further reference to FIG. 5, as an implementation of the methods shown in the above figures, the present disclosure provides an audio processing apparatus. The apparatus embodiment corresponds to the method embodiment shown in FIG. 2, and the apparatus can be applied to various electronic devices.
As shown in FIG. 5, the audio processing apparatus 500 of this embodiment includes: an audio acquisition unit 501, a first extraction unit 502, an equalization processing unit 503, and a reverberation processing unit 504. The audio acquisition unit 501 is configured to acquire audio to be processed. The first extraction unit 502 is configured to extract frequency-domain characteristics of the audio to be processed to obtain frequency-domain characteristic data. The equalization processing unit 503 is configured to perform equalization processing on the audio to be processed based on the frequency-domain characteristic data. The reverberation processing unit 504 is configured to perform reverberation processing on the audio to be processed based on the tone characteristic data.
In this embodiment, for the specific processing of the audio acquisition unit 501, the first extraction unit 502, the equalization processing unit 503, and the reverberation processing unit 504 in the audio processing apparatus 500 and the technical effects they bring, reference may be made to steps 201-204 in the embodiment corresponding to FIG. 2, which will not be repeated here.
In some optional implementations of this embodiment, the frequency-domain characteristic data may include the spectrum of the audio to be processed; and the equalization processing unit 503 is further configured to: in response to determining that the energy difference between two frequency points in the spectrum is greater than a preset first energy-difference threshold, reduce the gain of the frequency band in which the two frequency points are located in the audio to be processed; and in response to determining that the energy difference between two frequency points in the spectrum is less than a preset second energy-difference threshold, increase the gain of the frequency band in which the two frequency points are located in the audio to be processed.
In some optional implementations of this embodiment, the frequency-domain characteristic data may include a spectral center point; and the equalization processing unit 503 is further configured to: perform equalization processing on the audio to be processed based on the spectral center point.
In some optional implementations of this embodiment, the apparatus 500 may further include: a loudness determination unit (not shown in the figure). The loudness determination unit is configured to determine the loudness of the audio to be processed.
In this embodiment, the first extraction unit 502 first extracts the frequency-domain characteristics and tone characteristics of the acquired audio to be processed, thereby obtaining frequency-domain characteristic data and tone characteristic data. On this basis, the equalization processing unit 503 can perform equalization processing on the audio to be processed based on the frequency-domain characteristic data. The reverberation processing unit 504 is configured to perform reverberation processing on the equalized audio based on the tone characteristic data. Because different audio has different frequency-domain and tonal characteristics, the reverberation and equalization processing are more targeted.
Referring now to FIG. 6, a schematic structural diagram of an electronic device 600 (for example, the terminal device in FIG. 1) suitable for implementing the embodiments of the present disclosure is shown. Terminal devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), and in-vehicle terminals (for example, car navigation terminals), as well as fixed terminals such as digital TVs and desktop computers. The electronic device shown in FIG. 6 is only an example, and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
As shown in FIG. 6, the electronic device 600 may include a processing device (for example, a central processing unit, a graphics processor, etc.) 601, which may perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage device 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data necessary for the operation of the electronic device 600. The processing device 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
Generally, the following devices can be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 607 including, for example, a liquid crystal display (LCD), speaker, vibrator, etc.; storage devices 608 including, for example, a magnetic tape, hard disk, etc.; and a communication device 609. The communication device 609 may allow the electronic device 600 to perform wireless or wired communication with other devices to exchange data. Although FIG. 6 shows an electronic device 600 having various devices, it should be understood that it is not required to implement or have all the devices shown. More or fewer devices may alternatively be implemented or provided.
In particular, according to embodiments of the present disclosure, the process described above with reference to the flowcharts may be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product, which includes a computer program carried on a computer-readable medium, the computer program containing program code for performing the methods shown in the flowcharts. In such embodiments, the computer program may be downloaded and installed from a network through the communication device 609, or installed from the storage device 608, or installed from the ROM 602. When the computer program is executed by the processing device 601, the above-described functions defined in the methods of the embodiments of the present disclosure are executed.
It should be noted that the computer-readable medium described in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program, and the program may be used by or in combination with an instruction execution system, apparatus, or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium; the computer-readable signal medium may send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium may be transmitted using any appropriate medium, including but not limited to: electric wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
The above computer-readable medium may be included in the above electronic device, or it may exist alone without being assembled into the electronic device. The above computer-readable medium carries one or more programs, and when the one or more programs are executed by the electronic device, the electronic device is caused to: acquire audio to be processed; extract frequency-domain characteristics and tone characteristics of the audio to be processed to obtain frequency-domain characteristic data and tone characteristic data, wherein the frequency-domain characteristic data includes a spectrum or a spectral center point; perform equalization processing on the audio to be processed based on the frequency-domain characteristic data to obtain equalized audio; and perform reverberation processing on the equalized audio based on the tone characteristic data.
Computer program code for performing the operations of the present disclosure may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as an independent software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In cases involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the drawings illustrate the possible architectures, functions, and operations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, program segment, or part of code, which contains one or more executable instructions for implementing the specified logical functions. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the drawings. For example, two blocks shown in succession may actually be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented with a dedicated hardware-based system that performs the specified functions or operations, or can be implemented with a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented in software or hardware. The name of a unit does not, in some cases, constitute a limitation on the unit itself; for example, the audio acquisition unit may also be described as "a unit that acquires audio to be processed".
The above description is only a preferred embodiment of the present disclosure and an explanation of the technical principles applied. Those skilled in the art should understand that the scope of disclosure involved in the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalent features without departing from the above disclosed concept, for example, technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the present disclosure.

Claims (10)

  1. An audio processing method, comprising:
    acquiring audio to be processed;
    extracting frequency-domain characteristics and tone characteristics of the audio to be processed to obtain frequency-domain characteristic data and tone characteristic data, wherein the frequency-domain characteristic data includes a spectrum or a spectral center point;
    performing equalization processing on the audio to be processed based on the frequency-domain characteristic data to obtain equalized audio; and
    performing reverberation processing on the equalized audio based on the tone characteristic data.
  2. The method according to claim 1, wherein the frequency-domain characteristic data includes the spectrum of the audio to be processed; and
    the performing equalization processing on the audio to be processed based on the frequency-domain characteristic data comprises:
    in response to determining that the energy difference between two frequency points in the spectrum is greater than a preset first energy-difference threshold, reducing the gain of the frequency band in which the two frequency points are located in the audio to be processed; and
    in response to determining that the energy difference between two frequency points in the spectrum is less than a preset second energy-difference threshold, increasing the gain of the frequency band in which the two frequency points are located in the audio to be processed.
  3. The method according to claim 1, wherein the frequency-domain characteristic data includes a spectral center point; and
    the performing equalization processing on the audio to be processed based on the frequency-domain characteristic data comprises:
    performing equalization processing on the audio to be processed based on the spectral center point.
  4. The method according to any one of claims 1-3, wherein after the acquiring audio to be processed, the method further comprises:
    determining the loudness of the audio to be processed.
  5. An audio processing apparatus, comprising:
    an audio acquisition unit configured to acquire audio to be processed;
    a first extraction unit configured to extract frequency-domain characteristics and tone characteristics of the audio to be processed to obtain frequency-domain characteristic data and tone characteristic data, wherein the frequency-domain characteristic data includes a spectrum or a spectral center point;
    an equalization processing unit configured to perform equalization processing on the audio to be processed based on the frequency-domain characteristic data to obtain equalized audio; and
    a reverberation processing unit configured to perform reverberation processing on the audio to be processed based on the tone characteristic data.
  6. The apparatus according to claim 5, wherein the frequency-domain characteristic data includes the spectrum of the audio to be processed; and
    the equalization processing unit is further configured to:
    in response to determining that the energy difference between two frequency points in the spectrum is greater than a preset first energy-difference threshold, reduce the gain of the frequency band in which the two frequency points are located in the audio to be processed; and
    in response to determining that the energy difference between two frequency points in the spectrum is less than a preset second energy-difference threshold, increase the gain of the frequency band in which the two frequency points are located in the audio to be processed.
  7. The apparatus according to claim 5, wherein the frequency-domain characteristic data includes a spectral center point; and
    the equalization processing unit is further configured to:
    perform equalization processing on the audio to be processed based on the spectral center point.
  8. The apparatus according to any one of claims 5-7, wherein the apparatus further comprises:
    a loudness determination unit configured to determine the loudness of the audio to be processed.
  9. A terminal device, comprising:
    one or more processors; and
    a storage device on which one or more programs are stored,
    wherein when the one or more programs are executed by the one or more processors, the one or more processors implement the method according to any one of claims 1-4.
  10. A computer-readable medium on which a computer program is stored, wherein when the program is executed by a processor, the method according to any one of claims 1-4 is implemented.
PCT/CN2019/073127 2018-10-12 2019-01-25 Audio processing method and apparatus WO2020073566A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811190954.8 2018-10-12
CN201811190954.8A CN111048107B (zh) Audio processing method and apparatus

Publications (1)

Publication Number Publication Date
WO2020073566A1 (zh)

Family

ID=70164378

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/073127 WO2020073566A1 (zh) 2018-10-12 2019-01-25 音频处理方法和装置

Country Status (2)

Country Link
CN (1) CN111048107B (zh)
WO (1) WO2020073566A1 (zh)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1672325A * 2002-06-05 2005-09-21 索尼克焦点公司 Acoustic virtual reality engine and advanced techniques for enhancing delivered sound
CN101155438A * 2006-09-26 2008-04-02 张秀丽 Frequency response adaptive equalization method for audio equipment
CN102667918A * 2009-10-21 2012-09-12 弗兰霍菲尔运输应用研究公司 Reverberator and method for reverberating an audio signal
CN103236263A * 2013-03-27 2013-08-07 东莞宇龙通信科技有限公司 Method, system, and mobile terminal for improving call quality
CN103559876A * 2013-11-07 2014-02-05 安徽科大讯飞信息科技股份有限公司 Sound effect processing method and system
CN104681034A * 2013-11-27 2015-06-03 杜比实验室特许公司 Audio signal processing
WO2017136573A1 * 2016-02-02 2017-08-10 Dts, Inc. Augmented reality headphone environment rendering
CN108022595A * 2016-10-28 2018-05-11 电信科学技术研究院 Speech signal noise reduction method and user terminal

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8831936B2 (en) * 2008-05-29 2014-09-09 Qualcomm Incorporated Systems, methods, apparatus, and computer program products for speech signal processing using spectral contrast enhancement
US20110251704A1 (en) * 2010-04-09 2011-10-13 Martin Walsh Adaptive environmental noise compensation for audio playback
CN103714824B * 2013-12-12 2017-06-16 小米科技有限责任公司 Audio processing method, apparatus, and terminal device
CN107705778B * 2017-08-23 2020-09-15 腾讯音乐娱乐(深圳)有限公司 Audio processing method, apparatus, storage medium, and terminal
CN108597527B * 2018-04-19 2020-01-24 北京微播视界科技有限公司 Multi-channel audio processing method, apparatus, computer-readable storage medium, and terminal

Also Published As

Publication number Publication date
CN111048107A (zh) 2020-04-21
CN111048107B (zh) 2022-09-23

Similar Documents

Publication Publication Date Title
CN109121057B Intelligent hearing assistance method and system
CN112306448A Method, apparatus, device, and medium for adjusting output audio according to environmental noise
CN109817238A Audio signal acquisition device, audio signal processing method and apparatus
WO2021203906A1 Automatic volume adjustment method, apparatus, medium, and device
US20130178964A1 Audio system with adaptable audio output
CN115442709A Audio processing method, virtual bass enhancement system, device, and storage medium
US20130178963A1 Audio system with adaptable equalization
CN112309418B Method and apparatus for suppressing wind noise
CN113362839A Audio data processing method and apparatus, computer device, and storage medium
WO2020073564A1 Method and apparatus for detecting the loudness of an audio signal
US11113092B2 Global HRTF repository
CN111045634A Audio processing method and apparatus
WO2020073566A1 Audio processing method and apparatus
US20200320979A1 Synthetic Narrowband Data Generation for Narrowband Automatic Speech Recognition Systems
CN114121050A Audio playback method and apparatus, electronic device, and storage medium
CN112307161B Method and apparatus for playing audio
CN115278456A Sound device and audio signal processing method
CN111147655B Model generation method and apparatus
WO2020073562A1 Audio processing method and apparatus
WO2020087788A1 Audio processing method and apparatus
CN109375892B Method and apparatus for playing audio
CN111145776B Audio processing method and apparatus
CN111145793B Audio processing method and apparatus
CN108932953B Audio equalization function determination method, audio equalization method, and device
CN111145792B Audio processing method and apparatus

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19870432

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 29.07.2021)

122 Ep: pct application non-entry in european phase

Ref document number: 19870432

Country of ref document: EP

Kind code of ref document: A1