WO2020087788A1 - Audio processing method and apparatus - Google Patents
Audio processing method and apparatus
- Publication number
- WO2020087788A1 (PCT/CN2019/072945; priority application CN2019072945W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- target
- data
- scene type
- processing method
- denoising processing
- Prior art date
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B20/00—Signal processing not specific to the method of recording or reproducing; Circuits therefor
- G11B20/10—Digital recording or reproducing
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B20/00—Signal processing not specific to the method of recording or reproducing; Circuits therefor
- G11B20/10—Digital recording or reproducing
- G11B20/10009—Improvement or modification of read or write signals
- G11B20/10046—Improvement or modification of read or write signals filtering or equalising, e.g. setting the tap weights of an FIR filter
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L2021/02082—Noise filtering the noise being echo, reverberation of the speech
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L2021/02087—Noise filtering the noise being separate speech, e.g. cocktail party
Definitions
- the embodiments of the present disclosure relate to the field of computer technology, and in particular to audio processing methods and devices.
- Recording, also called sound pickup, refers to the process of collecting sound.
- Electronic devices such as terminals can perform recording to obtain recording data, and the recording data can be used directly as playback data.
- the playback data can be played by the electronic device that collected the recording data, or by other electronic devices.
- the embodiments of the present disclosure propose an audio processing method and device.
- an embodiment of the present disclosure provides an audio processing method, which includes: acquiring recording data; selecting a denoising processing method from a pre-established denoising processing method set as a target denoising processing method; and processing the recording data based on the target denoising processing method.
- an embodiment of the present disclosure provides an audio processing device, including: an acquisition unit configured to acquire recording data; a selection unit configured to select a denoising processing method from a pre-established denoising processing method set as a target denoising processing method; and a processing unit configured to process the recording data based on the target denoising processing method.
- an embodiment of the present disclosure provides an electronic device, including: one or more processors; and a storage device on which one or more programs are stored, which, when executed by the one or more processors, cause the one or more processors to implement the method described in any one of the implementations of the first aspect.
- an embodiment of the present disclosure provides a computer-readable medium on which a computer program is stored, which, when executed by a processor, implements the method described in any one of the implementations of the first aspect.
- the audio processing method and device provided by the embodiments of the present disclosure select a denoising processing method from a pre-established denoising processing method set as the target denoising processing method, and then process the recording data based on the target denoising processing method. The technical effects at least include: providing a new audio processing method.
- FIG. 1 is an exemplary system architecture diagram to which some embodiments of the present disclosure may be applied;
- FIG. 2 is a flowchart of an embodiment of an audio processing method according to the present disclosure
- FIG. 3 is a schematic diagram of an application scenario according to the audio processing method of the present disclosure.
- FIG. 4 is a schematic diagram of another application scenario according to the audio processing method of the present disclosure.
- FIG. 5 is a schematic structural diagram of an embodiment of an audio processing device according to the present disclosure.
- FIG. 6 is a schematic structural diagram of a computer system suitable for implementing an electronic device according to an embodiment of the present disclosure.
- FIG. 1 shows an exemplary system architecture 100 to which embodiments of the audio processing method or audio processing apparatus of the present disclosure can be applied.
- the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105.
- the network 104 may be a medium to provide a communication link between the terminal devices 101, 102, 103 and the server 105.
- the network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, and so on.
- the user can use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 to receive or send messages, and so on.
- Various communication client applications may be installed on the terminal devices 101, 102, and 103, such as recording applications, call applications, live broadcast applications, search applications, instant communication tools, email clients, and social platform software.
- the terminal devices 101, 102, and 103 may be hardware or software.
- the terminal devices 101, 102, and 103 can be various electronic devices with communication functions, including but not limited to smartphones, tablets, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, desktop computers, and so on.
- If the terminal devices 101, 102, and 103 are software, they can be installed in the electronic devices listed above. They can be implemented as multiple pieces of software or software modules (for example, to provide distributed services), or as a single piece of software or software module. No specific limitation is made here.
- the server 105 may be a server that provides various services, such as a background server that supports the sound pickup function on the terminal devices 101, 102, and 103.
- the terminal device may package the original recording data obtained by sound pickup to obtain an audio processing request, and then send the audio processing request to the background server.
- the background server can analyze and process the received audio processing request and other data, and feed back the processing result (for example, playback data) to the terminal device.
- the audio processing method provided by the embodiments of the present disclosure is generally executed by the terminal devices 101, 102, and 103, and accordingly, the audio processing device is generally provided in the terminal devices 101, 102, and 103.
- the audio processing method provided by the embodiment of the present disclosure may also be executed by a server.
- the server may receive the recording data sent by the terminal device, execute the method shown in the present disclosure, and finally send the playback data generated based on the recording data to the terminal device.
- the server can be hardware or software.
- the server can be implemented as a distributed server cluster composed of multiple servers, or as a single server.
- If the server is software, it can be implemented as multiple pieces of software or software modules (for example, to provide distributed services), or as a single piece of software or software module. No specific limitation is made here.
- the numbers of terminal devices, networks, and servers in FIG. 1 are only illustrative. Depending on implementation needs, there can be any number of terminal devices, networks, and servers.
- FIG. 2 illustrates a process 200 of an embodiment of an audio processing method.
- This embodiment is mainly exemplified by applying the method to an electronic device with certain computing capabilities.
- the electronic device may be the terminal device shown in FIG. 1.
- the audio processing method includes the following steps:
- Step 201: Acquire recording data.
- the execution subject of the audio processing method (for example, the terminal device shown in FIG. 1) can acquire the recording data.
- the recorded data may be audio data collected by the above-mentioned execution subject or other electronic devices.
- the above-mentioned execution subject can directly collect or receive the recording data from other electronic devices to obtain the recording data.
- Step 202: Select a denoising processing method from the pre-established denoising processing method set as the target denoising processing method.
- the execution subject may select the denoising processing method as the target denoising processing method from the pre-established denoising processing method set.
- the denoising processing method may be a processing method for removing noise.
- the sound other than the target sound can be defined as noise.
- the target sound may be human speech, and the sound (noise) other than the target sound may be a car sound on the street.
- the target sound may be the voice of someone A, and the sound (noise) other than the target sound may include the voice of someone B and the sound of a car on the street.
- the denoising processing method may be a denoising processing function call interface, or a packaged denoising processing function.
- the denoising processing function may include parameters such as filters, noise determination thresholds, and band selection parameters.
- the denoising processing method set may be a pre-established collection of one or more denoising processing methods.
- the denoising processing methods in the denoising processing method set may differ in the following aspects but not limited to: filters, noise determination thresholds, band selection parameters, and so on.
- the first denoising processing method may have higher denoising accuracy and slower processing speed; the second denoising processing method may have lower denoising accuracy and faster processing speed.
- the target denoising processing method can be selected from the above denoising processing method set in various ways.
- by selecting a target denoising processing method, a denoising processing method suitable for different electronic devices can be provided, and a denoising processing method adapted to the current period can be provided for different periods of audio acquisition (the denoising requirements in different periods may differ). Therefore, adaptive denoising processing can be implemented, improving the universality and efficiency of the denoising processing.
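As an illustrative sketch only (the patent does not specify concrete data structures), the "pre-established denoising processing method set" and its accuracy/speed trade-off described above could be modeled like this; the threshold values, method names, and the trivial thresholding filter are invented stand-ins for a real filter:

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class DenoisingMethod:
    name: str
    noise_threshold: float  # amplitude below which a sample is treated as noise
    accuracy: str           # "high" or "low"
    speed: str              # "fast" or "slow"

    def process(self, samples: List[float]) -> List[float]:
        # Minimal stand-in for a real filter: zero out samples under the threshold.
        return [s if abs(s) >= self.noise_threshold else 0.0 for s in samples]

# The "pre-established denoising processing method set": entries differ in
# filter parameters / noise determination thresholds, as the text describes.
METHOD_SET: Dict[str, DenoisingMethod] = {
    "precise": DenoisingMethod("precise", noise_threshold=0.05, accuracy="high", speed="slow"),
    "fast":    DenoisingMethod("fast",    noise_threshold=0.20, accuracy="low",  speed="fast"),
}

def select_target_method(prefer_speed: bool) -> DenoisingMethod:
    # One of the "various ways" of selecting: a simple preference flag.
    return METHOD_SET["fast"] if prefer_speed else METHOD_SET["precise"]

recording = [0.01, 0.5, -0.1, 0.02, -0.6]
target = select_target_method(prefer_speed=True)
processed = target.process(recording)
```

A real implementation would hold filter coefficients and band selection parameters rather than a single amplitude threshold; the shape of the registry-plus-selector design is what matters here.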
- Step 203: Process the recording data based on the target denoising processing method.
- the above-mentioned execution subject may process the above-mentioned recording data based on the target denoising processing method selected in step 202.
- the execution subject may use the target denoising processing method to process the recorded data.
- FIG. 3 is a schematic diagram of an application scenario of the audio processing method according to the embodiment shown in FIG. 2.
- In the application scenario of FIG. 3:
- the terminal 301 can collect recording data.
- the denoising processing method is selected as the target denoising processing method.
- the terminal 301 can process the recording data based on the target denoising processing method.
- the terminal 301 may process the recording data to obtain the data to be played back, and then read the data to be played back to play the sound.
- FIG. 4 is a schematic diagram of an application scenario of the audio processing method according to the embodiment shown in FIG. 2.
- the terminal 401 can collect recorded data.
- the server 402 can acquire the above recording data.
- the server 402 may select the denoising processing method as the target denoising processing method from the pre-established denoising processing method set.
- the server 402 may process the recording data based on the target denoising method.
- the server 402 may generate the data to be played back and send it to the terminal 403, and the terminal 403 then reads the data to be played back to play the sound.
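The server-side flow of the FIG. 4 scenario can be condensed into one hypothetical function (the function and parameter names are illustrative, not from the disclosure): the server receives the recording data, performs the selection of step 202, performs the processing of step 203, and returns the result for the terminal to play.

```python
def server_handle_recording(recording, method_set, choose):
    """Sketch of the server 402 pipeline for one audio processing request."""
    target_method = choose(method_set)  # step 202: select the target denoising method
    return target_method(recording)     # step 203: process the recording data

# Usage with a toy method set: a "denoising method" that halves every sample.
method_set = {"halve": lambda xs: [x / 2 for x in xs]}
playback_data = server_handle_recording([2.0, 4.0], method_set, lambda ms: ms["halve"])
```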
- the method provided by the above embodiment of the present disclosure selects a denoising processing method from a pre-established denoising processing method set as the target denoising processing method, and then processes the recording data based on the target denoising processing method.
- the technical effects can at least include: providing a new audio processing method.
- step 202 may be implemented in the following manner: from the above set of denoising processing methods, a denoising processing method is randomly selected as the target denoising processing method.
- step 202 may be implemented by selecting the denoising processing method corresponding to the target scene type from the above denoising processing method set as the target denoising processing method.
- the target denoising processing method is selected according to the target scene type, and the denoising processing method suitable for processing the recording data can be determined according to the scene from which the recording data is collected. Therefore, the recorded data can be processed through a more suitable denoising method to achieve the desired effect.
- the expected effect may be higher processing accuracy or faster processing speed.
- the denoising processing method in the above denoising processing method set corresponds to a predefined scene type.
- the predefined scene type may indicate an application scene.
- application scenarios can be classified in different ways from different perspectives.
- scene types can be divided into high-noise scenes, medium-noise scenes, and low-noise scenes.
- scene types can be divided into call scenes and singing scenes (the user's singing voice is collected and then published).
- the target scene type may be the type to which the scene from which the recording data is collected belongs.
- the target scene type can be determined in various ways.
- the above target application may be an application that calls a recording collection function of an electronic device to collect the above recording data.
- the application that invokes the recording collection function may be an application with a recording collection function, for example, a call-type application or a singing-type application (collecting the singing voice of the user and publishing it).
- the above target scene type can be obtained by the following steps: according to the correspondence between the scene type and the application, from the preset set of scene types, the scene type corresponding to the target application is selected as the target scene type.
- scene types may include high-noise scenes and low-noise scenes
- applications may include call-type applications and singing-type applications. Call applications can correspond to high-noise scenes
- singing applications can correspond to low-noise scenes.
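The correspondence described above can be expressed as two lookup tables, one from application to scene type and one from scene type to denoising processing method. The table contents below simply restate the example in the text; the method names are invented placeholders:

```python
# Correspondence between applications and scene types (from the example above).
APP_TO_SCENE = {
    "call_app": "high_noise_scene",    # call applications correspond to high-noise scenes
    "singing_app": "low_noise_scene",  # singing applications correspond to low-noise scenes
}

# Correspondence between scene types and denoising processing methods
# (placeholder method names; the disclosure does not name concrete methods).
SCENE_TO_METHOD = {
    "high_noise_scene": "aggressive_denoise",
    "low_noise_scene": "light_denoise",
}

def target_scene_type(target_application: str) -> str:
    # Select, from the preset scene type set, the scene type
    # corresponding to the target application.
    return APP_TO_SCENE[target_application]

def target_denoising_method(target_application: str) -> str:
    return SCENE_TO_METHOD[target_scene_type(target_application)]
```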
- the selection of the target scene type according to the correspondence between scene types and applications may be executed by the above-mentioned execution subject, or by the electronic device that collects the recording data.
- by using the target application as a bridge for determining the scene type, the nature of the scene in which the target application is usually used can be exploited to determine the target scene type quickly and accurately.
- the above target scene type may be obtained by the following steps: acquiring a preset scene type in the target application, and using the acquired scene type as the target scene type.
- the application user or application provider can set the scene type according to the scene frequently used by the target application.
- the target scene type can be set for the application in advance according to the type of application (calling or singing) and demand (real-time requirements are high or low). Therefore, a denoising processing method suitable for the application can be determined for the application.
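For this variant, the target application simply carries a preset scene type (set by the user or the provider) that the execution subject reads. The configuration shape and names below are assumptions for illustration:

```python
# Hypothetical application configuration carrying a preset scene type.
APP_CONFIG = {"name": "karaoke_app", "preset_scene_type": "low_noise_scene"}

def preset_target_scene_type(app_config, default="medium_noise_scene"):
    """Acquire the scene type preset in the target application; fall back to a default."""
    return app_config.get("preset_scene_type", default)
```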
- acquiring the scene type preset in the target application as the target scene type may be executed by the above-mentioned execution subject, or by the electronic device that collects the recording data.
- the target scene type is obtained by the following steps: determining the target noise level of the recording data according to the recording data; and selecting, from the preset set of scene types, the scene type corresponding to the target noise level according to the preset correspondence between noise levels and scene types.
- the leading ("front-end") segment of the recording data can be selected and processed to determine the ratio of noise to the target sound, thereby determining the noise level of the recording data; the determined noise level is taken as the target noise level. Then, the target scene type is selected according to the correspondence between noise levels and scene types.
- the noise level may include a high noise level, a medium noise level, and a low noise level.
- Scene types can include high noise scenes, medium noise scenes, and low noise scenes.
- a high noise level corresponds to a high noise scene
- a medium noise level corresponds to a medium noise scene
- a low noise level corresponds to a low noise scene.
- the recording data is processed in real time to determine the noise level, and the noise level is then used as a bridge to determine the target scene type. This matches the noise situation of the current application scene and determines the target scene type accurately and in real time.
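A minimal sketch of this variant, under assumptions the disclosure leaves open (how "noise" is distinguished from the target sound, and what the level thresholds are): here low-amplitude samples in the leading segment are treated as noise, the noise-to-total energy ratio sets the level, and the level maps to a scene type.

```python
def estimate_noise_level(samples, noise_amp=0.2):
    """Determine the target noise level from the leading ("front-end") segment.

    Samples below `noise_amp` are assumed to be noise; the ratio thresholds
    (0.5, 0.1) are illustrative, not from the disclosure.
    """
    head = samples[: max(1, len(samples) // 2)]
    noise_energy = sum(s * s for s in head if abs(s) < noise_amp)
    total_energy = sum(s * s for s in head) or 1.0
    ratio = noise_energy / total_energy  # ratio of noise to the collected sound
    if ratio > 0.5:
        return "high"
    if ratio > 0.1:
        return "medium"
    return "low"

# Correspondence between noise levels and scene types, as stated in the text.
NOISE_LEVEL_TO_SCENE = {
    "high": "high_noise_scene",
    "medium": "medium_noise_scene",
    "low": "low_noise_scene",
}

def target_scene_type(recording):
    return NOISE_LEVEL_TO_SCENE[estimate_noise_level(recording)]
```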
- the recording data may include echo data of sound generated based on the playback data of the target electronic device.
- terminal device B may be used as the first end
- terminal device A may be used as the second end.
- User A makes a sound
- terminal device A collects the second end recording data.
- the terminal device A or the server generates the first-end playback data based on the second-end recording data.
- Terminal device B receives the first-end playback data and reads the first-end playback data for playback.
- the terminal device B can collect the sound of the space where it is located to obtain the first-end recording data. It can be understood that, when terminal device B plays sound based on the first-end playback data, the sound propagates into the space where terminal device B is located, and the first-end recording data collected by terminal device B therefore includes the sound based on the first-end playback data.
- the sound generated based on the first-end playback data propagates in the space, and the audio data formed by collecting the propagated sound may be referred to as echo data.
- the echo data and the first-end playback data have a certain degree of similarity but are different; for example, the semantics are the same but the volumes are different.
- the above step 203 may include: using the target denoising processing method to process the recording data to generate first intermediate data; using a preset echo cancellation processing method to eliminate the echo data in the first intermediate data to generate second intermediate data; and generating the data to be played back based on the second intermediate data.
- the principle of the echo cancellation processing method is as follows: acquiring the first-end playback data and the first-end recording data; determining, from the first-end recording data, a target data segment that matches the first-end playback data; determining, according to the acquisition start time of the target data segment, the delay time of the first-end playback data relative to the first-end recording data; and eliminating the echo data in the first-end recording data according to the delay time. Here, the first-end playback data is generated based on the second-end recording data, and the first-end recording data includes echo data of the sound generated based on the first-end playback data.
- the execution subject may eliminate the echo data in the first-end recording data according to the delay time.
- the principle of eliminating the echo data in the first-end recording data is as follows: shifting the start time of collecting the first-end recording data backward by the delay time gives the start time at which the echo data begins to be collected. In the first-end recording data, the position corresponding to this start time is located, and the echo data is subtracted from the first-end recording data from this position onward, thereby eliminating the echo data in the first-end recording data.
- a function that takes the echo data as the independent variable and the first-end recording data as the dependent variable may be generated in advance, and this function may be used to obtain the echo data.
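The matching-and-subtraction principle above can be sketched as follows, with simplifying assumptions: the matching segment is found by naive cross-correlation, and the echo is modeled as the playback data attenuated by a fixed factor. A production echo canceller would instead use an adaptive filter; this sketch only illustrates the delay-then-subtract idea.

```python
def find_delay(recording, playback):
    """Return the offset where `playback` best matches a segment of `recording`.

    Uses brute-force cross-correlation; the offset corresponds to the delay
    time of the playback data relative to the recording data.
    """
    best_offset, best_score = 0, float("-inf")
    for offset in range(len(recording) - len(playback) + 1):
        score = sum(recording[offset + i] * playback[i] for i in range(len(playback)))
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset

def cancel_echo(recording, playback, attenuation=0.5):
    """Subtract the (assumed attenuated) echo from the recording after the delay."""
    delay = find_delay(recording, playback)
    out = list(recording)
    for i, s in enumerate(playback):
        out[delay + i] -= attenuation * s  # echo model: attenuated playback sound
    return out

playback = [1.0, -1.0, 1.0]
recording = [0.0, 0.0, 0.5, -0.5, 0.5, 0.0]  # echo appears 2 samples late, halved
cleaned = cancel_echo(recording, playback)
```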
- generating the data to be played back based on the second intermediate data may include: processing the second intermediate data based on the target denoising processing method to generate the data to be played back.
- generating the data to be played back based on the above second intermediate data may involve various processing methods, including but not limited to: automatic gain control, time-frequency conversion, volume limiting, and the like.
- the present disclosure provides an embodiment of an audio processing device, which corresponds to the method embodiment shown in FIG. 2, and the device can be specifically applied to various electronic devices.
- the audio processing device 500 of this embodiment includes: an obtaining unit 501, a selecting unit 502 and a processing unit 503.
- the acquisition unit is configured to acquire the recording data
- the selection unit is configured to select the denoising processing method as the target denoising processing method from the pre-established denoising processing method set
- the processing unit is configured to process the above recording data based on the above target denoising processing method.
- for the specific processing of the acquisition unit 501, the selection unit 502, and the processing unit 503 of the audio processing device 500 and the technical effects they bring, reference may be made to the relevant descriptions of step 201, step 202, and step 203 in the embodiment corresponding to FIG. 2, respectively. Details are not repeated here.
- the above selection unit is further configured to: select the denoising processing method corresponding to the target scene type from the above denoising processing method set as the target denoising processing method; wherein the denoising processing methods in the above set correspond to predefined scene types, and the target scene type is the type to which the scene from which the recording data is collected belongs.
- the above target scene type is obtained by the following steps: selecting, from the preset scene type set, the scene type corresponding to the target application as the target scene type according to the correspondence between scene types and applications; wherein the above target application is an application that calls the recording collection function of the electronic device to collect the above recording data.
- the above target scene type is obtained by the following steps: acquiring a preset scene type in the target application, and determining the acquired scene type as the above target scene type; wherein, the above target The application is an application that calls the recording collection function of the electronic device to collect the above recording data.
- the target scene type is obtained by the following steps: determining the target noise level of the recording data according to the recording data; and selecting, from the preset set of scene types, the scene type corresponding to the target noise level according to the preset correspondence between noise levels and scene types.
- the recording data includes echo data of sound generated based on the playback data of the target electronic device; and the processing unit is further configured to: process the recording data using the target denoising processing method to generate first intermediate data; eliminate the echo data in the first intermediate data using a preset echo cancellation processing method to generate second intermediate data; and generate the data to be played back based on the second intermediate data.
- the processing unit is further configured to process the second intermediate data based on the target denoising processing manner to generate data to be played back.
- FIG. 6 shows a schematic structural diagram of an electronic device (such as the terminal or server in FIG. 1) 600 suitable for implementing the embodiments of the present disclosure.
- the electronic device shown in FIG. 6 is just an example, and should not bring any limitation to the functions and use scope of the embodiments of the present disclosure.
- the electronic device 600 may include a processing device (such as a central processing unit, a graphics processor, etc.) 601, which can perform various appropriate operations and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from the storage device 608 into a random access memory (RAM) 603.
- in the RAM 603, various programs and data necessary for the operation of the electronic device 600 are also stored.
- the processing device 601, ROM 602, and RAM 603 are connected to each other via a bus 604.
- An input/output (I/O) interface 605 is also connected to the bus 604.
- the following devices can be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 607 including, for example, a liquid crystal display (LCD), speaker, vibrator, etc.; storage devices 608 including, for example, a magnetic tape, hard disk, etc.; and a communication device 609.
- the communication device 609 may allow the electronic device 600 to perform wireless or wired communication with other devices to exchange data.
- Although FIG. 6 shows an electronic device 600 having various devices, it should be understood that it is not required to implement or have all the devices shown; more or fewer devices may alternatively be implemented or provided.
- the process described above with reference to the flowchart may be implemented as a computer software program.
- embodiments of the present disclosure include a computer program product that includes a computer program carried on a computer-readable medium, the computer program containing program code for performing the method shown in the flowchart.
- the computer program may be downloaded and installed from the network through the communication device 609, or from the storage device 608, or from the ROM 602.
- when the computer program is executed by the processing device 601, the above-mentioned functions defined in the methods of the embodiments of the present disclosure are executed.
- the above-mentioned computer-readable medium in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium or any combination of the two.
- the computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
- the computer-readable storage medium may be any tangible medium containing or storing a program, and the program may be used by or in combination with an instruction execution system, apparatus, or device.
- the computer-readable signal medium may include a data signal that is propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. This propagated data signal can take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
- the computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, and the computer-readable signal medium may send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device.
- the program code contained on the computer-readable medium may be transmitted using any appropriate medium, including but not limited to: electric wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
- the computer-readable medium may be included in the above-mentioned electronic device; or it may exist alone without being assembled into the electronic device.
- the computer-readable medium carries one or more programs. When the one or more programs are executed by the electronic device, the electronic device is caused to: acquire recording data; select a denoising processing method from a pre-established set of denoising processing methods as the target denoising processing method; and process the recording data based on the target denoising processing method.
- the computer program code for performing the operations of the present disclosure can be written in one or more programming languages or a combination thereof.
- the above programming languages include object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
- the program code may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server.
- the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
- each block in the flowchart or block diagram may represent a module, program segment, or part of code that contains one or more executable instructions for implementing the specified logic functions.
- the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved.
- each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented with dedicated hardware-based systems that perform the specified functions or operations, or with a combination of dedicated hardware and computer instructions.
- the units described in the embodiments of the present disclosure may be implemented in software or hardware.
- the name of the unit does not constitute a limitation on the unit itself.
- the acquisition unit can also be described as a “unit for acquiring recording data”.
Landscapes
- Engineering & Computer Science (AREA)
- Signal Processing (AREA)
- Computational Linguistics (AREA)
- Quality & Reliability (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Circuit For Audible Band Transducer (AREA)
- Signal Processing For Digital Recording And Reproducing (AREA)
Abstract
Embodiments of the present disclosure disclose an audio processing method and apparatus. A specific implementation of the method comprises: acquiring recording data; selecting a denoising processing method from a pre-established set of denoising processing methods as a target denoising processing method; and processing the recording data based on the target denoising processing method. This implementation provides a new audio processing method.
Description
This patent application claims priority to Chinese Patent Application No. 201811302472.7, filed on November 2, 2018 by Beijing Microlive Vision Technology Co., Ltd. and entitled "Audio Processing Method and Apparatus", the entire contents of which are incorporated herein by reference.

Embodiments of the present disclosure relate to the field of computer technology, and in particular to an audio processing method and apparatus.

Recording, also called sound pickup, refers to the process of collecting sound. An electronic device (for example, a terminal) can record sound. Recording produces recording data, which may be used directly as playback data. The playback data may be played by the electronic device that collected the recording data, or by another electronic device.

In the field of audio processing, audio data usually needs to be denoised.

Summary

Embodiments of the present disclosure propose an audio processing method and apparatus.

In a first aspect, an embodiment of the present disclosure provides an audio processing method, comprising: acquiring recording data; selecting a denoising processing method from a pre-established set of denoising processing methods as a target denoising processing method; and processing the recording data based on the target denoising processing method.

In a second aspect, an embodiment of the present disclosure provides an audio processing apparatus, comprising: an acquisition unit configured to acquire recording data; a selection unit configured to select a denoising processing method from a pre-established set of denoising processing methods as a target denoising processing method; and a processing unit configured to process the recording data based on the target denoising processing method.

In a third aspect, an embodiment of the present disclosure provides an electronic device, comprising: one or more processors; and a storage device on which one or more programs are stored, the one or more programs, when executed by the one or more processors, causing the one or more processors to implement the method described in any implementation of the first aspect.

In a fourth aspect, an embodiment of the present disclosure provides a computer-readable medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the method described in any implementation of the first aspect.

The audio processing method and apparatus provided by the embodiments of the present disclosure select a denoising processing method from a pre-established set of denoising processing methods as the target denoising processing method and then process the recording data based on the target denoising processing method. The technical effects may at least include: providing a new audio processing method.
Other features, objects, and advantages of the present disclosure will become more apparent from the detailed description of non-limiting embodiments made with reference to the following drawings:

FIG. 1 is an exemplary system architecture diagram to which some embodiments of the present disclosure may be applied;

FIG. 2 is a flowchart of an embodiment of an audio processing method according to the present disclosure;

FIG. 3 is a schematic diagram of an application scenario of the audio processing method according to the present disclosure;

FIG. 4 is a schematic diagram of another application scenario of the audio processing method according to the present disclosure;

FIG. 5 is a schematic structural diagram of an embodiment of an audio processing apparatus according to the present disclosure;

FIG. 6 is a schematic structural diagram of a computer system of an electronic device suitable for implementing embodiments of the present disclosure.

The present disclosure is further described in detail below with reference to the drawings and embodiments. It can be understood that the specific embodiments described here are only used to explain the related invention, not to limit that invention. It should also be noted that, for ease of description, only the parts related to the invention are shown in the drawings.

It should be noted that, in the absence of conflict, the embodiments in the present disclosure and the features in the embodiments may be combined with each other. The present disclosure will be described in detail below with reference to the drawings and in combination with the embodiments.
FIG. 1 shows an exemplary system architecture 100 to which embodiments of the audio processing method or audio processing apparatus of the present disclosure may be applied.

As shown in FIG. 1, the system architecture 100 may include terminal devices 101, 102, and 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105, and may include various connection types, such as wired or wireless communication links, or fiber-optic cables.

A user may use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 to receive or send messages. Various communication client applications may be installed on the terminal devices 101, 102, 103, such as recording applications, call applications, live-streaming applications, search applications, instant messaging tools, email clients, and social platform software.

The terminal devices 101, 102, 103 may be hardware or software. When they are hardware, they may be various electronic devices with a communication function, including but not limited to smartphones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop portable computers, desktop computers, and the like. When they are software, they may be installed in the electronic devices listed above, and may be implemented as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module. No specific limitation is made here.

The server 105 may be a server providing various services, for example, a back-end server supporting the sound-pickup function of the terminal devices 101, 102, 103. A terminal device may package the raw recording data obtained by sound pickup into an audio processing request and send the request to the back-end server. The back-end server may analyze and otherwise process the received audio processing request and feed the processing result (for example, playback data) back to the terminal device.

It should be noted that the audio processing method provided by the embodiments of the present disclosure is generally executed by the terminal devices 101, 102, 103; accordingly, the audio processing apparatus is generally disposed in the terminal devices 101, 102, 103. Optionally, the audio processing method provided by the embodiments of the present disclosure may also be executed by the server: the server may receive the recording data sent by a terminal device, execute the method shown in the present disclosure, and finally send the playback data generated based on the recording data to the terminal device.

It should be noted that the server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster composed of multiple servers or as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module. No specific limitation is made here.

It should be understood that the numbers of terminal devices, networks, and servers in FIG. 1 are only illustrative. There may be any number of terminal devices, networks, and servers according to implementation needs.
Referring to FIG. 2, a flow 200 of an embodiment of the audio processing method is shown. This embodiment is mainly illustrated with the method applied to an electronic device with certain computing capability, which may be the terminal device shown in FIG. 1. The audio processing method includes the following steps:

Step 201: acquire recording data.

In this embodiment, the execution subject of the audio processing method (for example, the terminal device shown in FIG. 1) may acquire recording data.

In this embodiment, the recording data may be audio data collected by the execution subject or by another electronic device. The execution subject may acquire the recording data by collecting it directly or by receiving it from another electronic device.

Step 202: select a denoising processing method from a pre-established set of denoising processing methods as the target denoising processing method.

In this embodiment, the execution subject may select a denoising processing method from the pre-established set of denoising processing methods as the target denoising processing method.

In this embodiment, a denoising processing method may be a processing method for removing noise. Sounds other than the target sound may be defined as noise. For example, the target sound may be human speech, and the sounds other than the target sound (the noise) may be the sound of cars on the street. For another example, the target sound may be the speech of person A, and the sounds other than the target sound (the noise) may include the speech of person B and the sound of cars on the street.

In this embodiment, a denoising processing method may be a calling interface of a denoising function, or a packaged denoising function.

As an example, a denoising function may include parameters such as a filter, a noise decision threshold, and band selection parameters.

In this embodiment, the set of denoising processing methods is a collection of denoising processing methods; the methods in the set may differ in, but are not limited to, the filter, the noise decision threshold, and the band selection parameters.

It should be noted that different denoising processing methods may have different emphases. For example, a first denoising processing method may denoise with higher precision but process more slowly, while a second denoising processing method may denoise with lower precision but process faster.
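As a concrete illustration of how entries in such a set might differ only in their parameters (filter, noise decision threshold, band selection), here is a minimal sketch. The parameter names and values are assumptions for illustration and are not taken from the patent:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DenoiseConfig:
    """Parameters one denoising method in the set might carry (illustrative)."""
    filter_taps: int        # longer filter: higher precision, slower processing
    noise_threshold: float  # level below which a sample is treated as noise
    band_hz: tuple          # frequency band kept by the method


# A pre-established set of denoising methods that differ only in parameters.
METHOD_SET = {
    # higher denoising precision, slower processing
    "first": DenoiseConfig(filter_taps=64, noise_threshold=0.02, band_hz=(80, 8000)),
    # lower denoising precision, faster processing
    "second": DenoiseConfig(filter_taps=8, noise_threshold=0.10, band_hz=(300, 3400)),
}
```

The trade-off described above shows up directly in the parameters: the "precise" entry uses a longer filter and a stricter noise threshold than the "fast" one.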
In this embodiment, the target denoising processing method may be selected from the set of denoising processing methods in various ways.

It should be noted that selecting the target denoising processing method from the set makes it possible to provide, for different electronic devices, a denoising processing method adapted to each device, or, for different audio collection periods of the same electronic device (denoising needs may differ between periods), a denoising processing method adapted to the current period. Adaptive denoising can thereby be achieved, improving the universality and efficiency of the denoising processing.

Step 203: process the recording data based on the target denoising processing method.

In this embodiment, the execution subject may process the recording data using the target denoising processing method selected in step 202.
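The acquire-select-process flow of steps 201 to 203 can be sketched as follows. This is a toy illustration, not the patent's implementation: the two denoisers (`moving_average`, `clip_small`) and their names are assumptions standing in for real denoising functions:

```python
def moving_average(samples, window=3):
    # Toy "higher-precision, slower" denoiser: smooth with a sliding window.
    out = []
    for i in range(len(samples)):
        lo = max(0, i - window + 1)
        out.append(sum(samples[lo:i + 1]) / (i + 1 - lo))
    return out


def clip_small(samples, threshold=0.1):
    # Toy "faster, lower-precision" denoiser: zero out sub-threshold samples.
    return [s if abs(s) >= threshold else 0.0 for s in samples]


# Pre-established set of denoising methods; step 202 selects from this set.
DENOISING_METHODS = {"precise": moving_average, "fast": clip_small}


def process_recording(recording, method_name):
    target = DENOISING_METHODS[method_name]  # step 202: target denoising method
    return target(recording)                 # step 203: apply it to the recording data
```

Selection here is by explicit name; the embodiments below replace that name with a random pick or a scene-type lookup.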
Continuing to refer to FIG. 3, FIG. 3 is a schematic diagram of an application scenario of the audio processing method according to the embodiment shown in FIG. 2. In the application scenario of FIG. 3:

First, the terminal 301 may collect recording data.

Then, a denoising processing method is selected from the pre-established set of denoising processing methods as the target denoising processing method.

Next, the terminal 301 may process the recording data based on the target denoising processing method.

Finally, as an example, the terminal 301 may obtain data to be played back from the processing, and then read that data for playback.

Continuing to refer to FIG. 4, FIG. 4 is a schematic diagram of another application scenario of the audio processing method according to the embodiment shown in FIG. 2. In the application scenario of FIG. 4:

First, the terminal 401 may collect recording data.

Then, the server 402 may acquire the recording data.

Then, the server 402 may select a denoising processing method from the pre-established set of denoising processing methods as the target denoising processing method.

Next, the server 402 may process the recording data based on the target denoising processing method.

Finally, as an example, the server 402 may obtain data to be played back from the processing and send it to the terminal 403, which then reads the data for playback.

The method provided by the above embodiments of the present disclosure selects a denoising processing method from a pre-established set of denoising processing methods as the target denoising processing method and then processes the recording data based on the target denoising processing method. The technical effects may at least include: providing a new audio processing method.
In some embodiments, step 202 may be implemented as follows: randomly select a denoising processing method from the set of denoising processing methods as the target denoising processing method.

In some embodiments, step 202 may be implemented as follows: select, from the set of denoising processing methods, the denoising processing method corresponding to the target scene type as the target denoising processing method.

It should be noted that selecting the target denoising processing method according to the target scene type makes it possible to determine, from the scene in which the recording data was collected, a denoising processing method suited to that data. The recording data can thus be processed with a more appropriate denoising method to achieve the expected effect. As an example, the expected effect may be higher processing precision or faster processing.

Here, the denoising processing methods in the set correspond to predefined scene types.

Here, a predefined scene type may indicate an application scenario. Application scenarios can be classified differently from different perspectives.

As an example, from the perspective of noise level, scene types may be divided into high-noise, medium-noise, and low-noise scenes. From the perspective of usage, scene types may be divided into call scenes and singing scenes (where the user's singing is played back).

Here, the target scene type may be the type of the scene from which the recording data was collected.

Optionally, the target scene type may be determined in various ways.

In the present disclosure, the target application may be the application that invokes the recording function of the electronic device to collect the recording data.

Here, an application that invokes the recording function may be an application with a recording function, for example, a call application or a singing application (which collects the user's singing and plays it back).

It can be understood that different applications may have different requirements for the recording function. For example, a call application may require stricter denoising and higher speech clarity, while a singing application may require less denoising.

In some embodiments, the target scene type may be obtained through the following step: according to the correspondence between scene types and applications, select the scene type corresponding to the target application from a preset set of scene types as the target scene type.

Here, the execution subject may store the correspondence between scene types and applications in advance. As an example, the scene types may include a high-noise scene and a low-noise scene, and the applications may include call applications and singing applications; a call application may correspond to the high-noise scene and a singing application to the low-noise scene.

Here, selecting the target scene type according to the correspondence between scene types and applications may be performed by the execution subject or by the electronic device that collects the recording data.

It should be noted that using the target application as a bridge for determining the scene type exploits the properties of the scene in which the target application is typically used, so the target scene type can be determined quickly and accurately.
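The application-to-scene correspondence described above can be sketched as a simple lookup table. The concrete application names and scene labels are assumptions for illustration only:

```python
# Pre-stored correspondence between applications and scene types (illustrative).
APP_TO_SCENE = {
    "call_app": "high_noise_scene",    # calls: stricter denoising, clearer speech
    "karaoke_app": "low_noise_scene",  # singing: lighter denoising
}


def scene_type_for_app(target_app, default="medium_noise_scene"):
    # The application that invoked the recording function acts as a bridge
    # to the target scene type.
    return APP_TO_SCENE.get(target_app, default)
```

A denoising method can then be picked by the returned scene label, for example via a second table keyed by scene type.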
In some embodiments, the target scene type may be obtained through the following steps: acquire the scene type preset in the target application, and use the acquired scene type as the target scene type.

Here, the scene type may be set by the application user or the application provider according to the scenes in which the target application is commonly used.

It should be noted that the target scene type can be set for an application in advance according to the application's type (call or singing) and requirements (high or low real-time demands), so that a denoising processing method suitable for that application can be determined.

Here, acquiring the scene type preset in the target application as the target scene type may be performed by the execution subject or by the electronic device that collects the recording data.
In some embodiments, the target scene type is obtained through the following steps: determine the target noise level of the recording data from the recording data; then, according to a preset correspondence between noise levels and scene types, select the scene type corresponding to the target noise level from a preset set of scene types as the target scene type.

Here, the leading segment of the recording data may be processed to determine the ratio of noise to target sound, thereby determining the noise level in the recording data, and the determined noise level is taken as the target noise level. The target scene type is then selected according to the correspondence between noise levels and scene types.

As an example, the noise levels may include high, medium, and low. The scene types may include high-noise, medium-noise, and low-noise scenes. The high noise level corresponds to the high-noise scene, the medium noise level to the medium-noise scene, and the low noise level to the low-noise scene.

It should be noted that the recording data is processed in real time to determine the noise level, and the noise level then serves as a bridge for determining the target application scenario. The target scene type can thus be determined accurately and in real time, in keeping with the noise conditions of the current application scenario.
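The noise-level route above can be sketched as follows, assuming the leading segment of the recording is dominated by background noise; the RMS thresholds and scene labels are illustrative assumptions, not values from the patent:

```python
def rms(samples):
    # Root-mean-square level of a segment of samples.
    return (sum(s * s for s in samples) / len(samples)) ** 0.5


def scene_type_from_noise(recording, head_len=4):
    head = recording[:head_len]  # leading segment used to gauge the noise level
    level = rms(head)
    if level > 0.5:
        noise_grade = "high"
    elif level > 0.1:
        noise_grade = "medium"
    else:
        noise_grade = "low"
    # Preset correspondence: noise level -> scene type.
    return {
        "high": "high_noise_scene",
        "medium": "medium_noise_scene",
        "low": "low_noise_scene",
    }[noise_grade]
```

In a real system the noise estimate would come from a voice-activity or spectral estimator rather than raw RMS, but the grading-then-lookup structure is the same.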
In some embodiments, the recording data may include echo data of the sound produced from the playback data of the target electronic device.

As an example, terminal device B may serve as the first end and terminal device A as the second end. User A speaks, and terminal device A collects second-end recording data. Terminal device A or the server generates the first-end playback data based on the second-end recording data. Terminal device B receives the first-end playback data and reads it for playback. Terminal device B may also collect the sound in the space where it is located, obtaining first-end recording data. It can be understood that, because the sound propagates into the space where terminal device B is located when terminal device B plays the first-end playback data, the first-end recording data collected by terminal device B includes sound based on the first-end playback data.

Here, the sound produced from the first-end playback propagates through the space, and the audio data formed by collecting the propagated sound may be called echo data. It can be understood that the echo data and the first-end playback data are similar to a certain degree but not identical; for example, the semantics are the same but the loudness differs.

In some embodiments, step 203 may include: processing the recording data using the target denoising processing method to generate first intermediate data; eliminating the echo data in the first intermediate data using a preset echo cancellation processing method to generate second intermediate data; and generating data to be played back based on the second intermediate data.
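The three-stage processing just described (denoise, cancel echo, generate playback data) can be sketched as a pipeline. Each stage below is a stub standing in for the real processing; the thresholds and the hard limiter are assumptions for illustration:

```python
def denoise(samples):
    # Stand-in for the target denoising method: drop very small samples.
    return [s if abs(s) >= 0.05 else 0.0 for s in samples]


def cancel_echo(samples, echo):
    # Stand-in for the preset echo cancellation: subtract known echo data.
    return [s - e for s, e in zip(samples, echo)]


def postprocess(samples):
    # Stand-in for further processing, e.g. volume limiting to [-1, 1].
    return [max(-1.0, min(1.0, s)) for s in samples]


def to_playback(recording, echo):
    first = denoise(recording)         # first intermediate data
    second = cancel_echo(first, echo)  # second intermediate data
    return postprocess(second)         # data to be played back
```

The point of the sketch is the ordering: echo cancellation operates on already-denoised data, and the playback data is generated from the second intermediate data.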
In some embodiments, the principle of the echo cancellation processing method is as follows: acquire first-end playback data and first-end recording data; determine, from the first-end recording data, the target data segment that matches the first-end playback data; determine, from the collection start time of the target data segment, the delay time of the first-end playback data relative to the first-end recording data; and eliminate the echo data in the first-end recording data according to the delay time. Here, the first-end playback data is generated based on second-end recording data, and the first-end recording data includes echo data of the sound produced from the first-end playback data.

The execution subject may eliminate the echo data in the first-end recording data according to the delay time. Here, the principle of eliminating the echo data in the first-end recording data is as follows: shifting the collection start time of the first-end recording data backward by the delay time gives the start time at which the echo data began to be collected. Find the position corresponding to that start time in the first-end recording data, and subtract the echo data from the first-end recording data after that position to eliminate the echo data. As an example, a function may be generated in advance with the echo data as the independent variable and the first-end recording data as the dependent variable, and the echo data obtained using this function.
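A pure-Python sketch of this delay-based principle follows: slide the playback data over the recording to find the best-matching segment (the delay), then subtract the aligned echo. Real echo cancellers use adaptive filters and account for attenuation; the unit `gain` here is an illustrative assumption:

```python
def estimate_delay(recording, playback):
    # The offset with the highest correlation marks the target data segment
    # matching the playback data; that offset is the delay in samples.
    best_offset, best_score = 0, float("-inf")
    for offset in range(len(recording) - len(playback) + 1):
        segment = recording[offset:offset + len(playback)]
        score = sum(r * p for r, p in zip(segment, playback))
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset


def cancel_echo(recording, playback, gain=1.0):
    # Shift by the delay, then subtract the (scaled) echo from that position on.
    delay = estimate_delay(recording, playback)
    out = list(recording)
    for i, p in enumerate(playback):
        out[delay + i] -= gain * p
    return out
```

With a synthetic signal where the echo is an exact delayed copy, the subtraction removes it completely; with real audio the echo is attenuated and filtered, which is why `gain` (or a full adaptive filter) must be estimated.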
In some embodiments, generating the data to be played back based on the second intermediate data may include: processing the second intermediate data based on the target denoising processing method to generate the data to be played back.

It should be noted that some noise may remain after the echo cancellation; therefore, applying denoising once more after the echo cancellation can further remove noise and improve the sound quality.

In some embodiments, generating the data to be played back based on the second intermediate data may use various processing methods, including but not limited to automatic gain control, time-frequency conversion, and volume limiting.
With further reference to FIG. 5, as an implementation of the methods shown in the above figures, the present disclosure provides an embodiment of an audio processing apparatus. This apparatus embodiment corresponds to the method embodiment shown in FIG. 2, and the apparatus is specifically applicable to various electronic devices.

As shown in FIG. 5, the audio processing apparatus 500 of this embodiment includes an acquisition unit 501, a selection unit 502, and a processing unit 503. The acquisition unit is configured to acquire recording data; the selection unit is configured to select a denoising processing method from a pre-established set of denoising processing methods as the target denoising processing method; and the processing unit is configured to process the recording data based on the target denoising processing method.

In this embodiment, for the specific processing of the acquisition unit 501, the selection unit 502, and the processing unit 503 of the audio processing apparatus 500 and the technical effects thereof, reference may be made to the descriptions of steps 201, 202, and 203 in the embodiment corresponding to FIG. 2, which will not be repeated here.

In some optional implementations of this embodiment, the selection unit is further configured to select, from the set of denoising processing methods, the denoising processing method corresponding to the target scene type as the target denoising processing method; the denoising processing methods in the set correspond to predefined scene types, and the target scene type is the type of the scene from which the recording data was collected.

In some optional implementations of this embodiment, the target scene type is obtained through the following steps: according to the correspondence between scene types and applications, select the scene type corresponding to the target application from a preset set of scene types as the target scene type; the target application is the application that invokes the recording function of the electronic device to collect the recording data.

In some optional implementations of this embodiment, the target scene type is obtained through the following steps: acquire the scene type preset in the target application, and determine the acquired scene type as the target scene type; the target application is the application that invokes the recording function of the electronic device to collect the recording data.

In some optional implementations of this embodiment, the target scene type is obtained through the following steps: determine the target noise level of the recording data from the recording data; according to a preset correspondence between noise levels and scene types, select the scene type corresponding to the target noise level from a preset set of scene types as the target scene type.

In some optional implementations of this embodiment, the recording data includes echo data of the sound produced from the playback data of the target electronic device; and the processing unit is further configured to: process the recording data using the target denoising processing method to generate first intermediate data; eliminate the echo data in the first intermediate data using a preset echo cancellation processing method to generate second intermediate data; and generate data to be played back based on the second intermediate data.

In some optional implementations of this embodiment, the processing unit is further configured to: process the second intermediate data based on the target denoising processing method to generate the data to be played back.

It should be noted that, for the implementation details and technical effects of the units in the audio processing apparatus provided by the embodiments of the present disclosure, reference may be made to the descriptions of other embodiments in the present disclosure, which will not be repeated here.
Referring now to FIG. 6, a schematic structural diagram of an electronic device 600 (for example, the terminal or server in FIG. 1) suitable for implementing embodiments of the present disclosure is shown. The electronic device shown in FIG. 6 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.

As shown in FIG. 6, the electronic device 600 may include a processing device (for example, a central processing unit, a graphics processing unit, etc.) 601, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage device 608 into a random access memory (RAM) 603. Various programs and data required for the operation of the electronic device 600 are also stored in the RAM 603. The processing device 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.

Generally, the following devices may be connected to the I/O interface 605: an input device 606 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, and gyroscope; an output device 607 including, for example, a liquid crystal display (LCD), speaker, and vibrator; a storage device 608 including, for example, a magnetic tape and hard disk; and a communication device 609. The communication device 609 may allow the electronic device 600 to communicate wirelessly or by wire with other devices to exchange data. Although FIG. 6 shows an electronic device 600 with various devices, it should be understood that it is not required to implement or possess all of the devices shown; more or fewer devices may alternatively be implemented or provided.

In particular, according to embodiments of the present disclosure, the process described above with reference to the flowchart may be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product that includes a computer program carried on a computer-readable medium, the computer program containing program code for performing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication device 609, or installed from the storage device 608, or installed from the ROM 602. When the computer program is executed by the processing device 601, the above-mentioned functions defined in the method of the embodiments of the present disclosure are executed.

It should be noted that the computer-readable medium in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium or any combination of the two. The computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer-readable storage medium may be any tangible medium containing or storing a program, and the program may be used by or in combination with an instruction execution system, apparatus, or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the foregoing. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium; the computer-readable signal medium may send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium may be transmitted using any appropriate medium, including but not limited to: an electric wire, an optical cable, RF (radio frequency), etc., or any suitable combination of the foregoing.

The above computer-readable medium may be included in the above electronic device, or may exist alone without being assembled into the electronic device.

The above computer-readable medium carries one or more programs. When the one or more programs are executed by the electronic device, the electronic device is caused to: acquire recording data; select a denoising processing method from a pre-established set of denoising processing methods as the target denoising processing method; and process the recording data based on the target denoising processing method.

Computer program code for performing the operations of the present disclosure may be written in one or more programming languages or a combination thereof. The above programming languages include object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).

The flowcharts and block diagrams in the drawings illustrate the possible architectures, functions, and operations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowcharts or block diagrams may represent a module, program segment, or part of code that contains one or more executable instructions for implementing the specified logic functions. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented with dedicated hardware-based systems that perform the specified functions or operations, or with a combination of dedicated hardware and computer instructions.

The units described in the embodiments of the present disclosure may be implemented in software or in hardware. The name of a unit does not in some cases constitute a limitation on the unit itself; for example, the acquisition unit may also be described as "a unit for acquiring recording data".

The above description is only a preferred embodiment of the present disclosure and an explanation of the technical principles employed. Those skilled in the art should understand that the scope of the disclosure involved herein is not limited to technical solutions formed by the specific combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalent features without departing from the above disclosed concept, for example, technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the present disclosure.
Claims (16)
- An audio processing method, comprising: acquiring recording data; selecting a denoising processing method from a pre-established set of denoising processing methods as a target denoising processing method; and processing the recording data based on the target denoising processing method.
- The method according to claim 1, wherein the selecting a denoising processing method from a pre-established set of denoising processing methods as a target denoising processing method comprises: selecting, from the set of denoising processing methods, the denoising processing method corresponding to a target scene type as the target denoising processing method; wherein the denoising processing methods in the set correspond to predefined scene types, and the target scene type is the type of the scene from which the recording data was collected.
- The method according to claim 2, wherein the target scene type is obtained through the following steps: according to a correspondence between scene types and applications, selecting the scene type corresponding to a target application from a preset set of scene types as the target scene type; wherein the target application is the application that invokes a recording function of an electronic device to collect the recording data.
- The method according to claim 2, wherein the target scene type is obtained through the following steps: acquiring the scene type preset in a target application, and determining the acquired scene type as the target scene type; wherein the target application is the application that invokes a recording function of an electronic device to collect the recording data.
- The method according to claim 2, wherein the target scene type is obtained through the following steps: determining a target noise level of the recording data from the recording data; and according to a preset correspondence between noise levels and scene types, selecting the scene type corresponding to the target noise level from a preset set of scene types as the target scene type.
- The method according to any one of claims 1-5, wherein the recording data comprises echo data of sound produced from playback data of the target electronic device; and the processing the recording data based on the target denoising processing method comprises: processing the recording data using the target denoising processing method to generate first intermediate data; eliminating the echo data in the first intermediate data using a preset echo cancellation processing method to generate second intermediate data; and generating data to be played back based on the second intermediate data.
- The method according to claim 6, wherein the generating data to be played back based on the second intermediate data comprises: processing the second intermediate data based on the target denoising processing method to generate the data to be played back.
- An audio processing apparatus, comprising: an acquisition unit configured to acquire recording data; a selection unit configured to select a denoising processing method from a pre-established set of denoising processing methods as a target denoising processing method; and a processing unit configured to process the recording data based on the target denoising processing method.
- The apparatus according to claim 8, wherein the selection unit is further configured to: select, from the set of denoising processing methods, the denoising processing method corresponding to a target scene type as the target denoising processing method; wherein the denoising processing methods in the set correspond to predefined scene types, and the target scene type is the type of the scene from which the recording data was collected.
- The apparatus according to claim 9, wherein the target scene type is obtained through the following steps: according to a correspondence between scene types and applications, selecting the scene type corresponding to a target application from a preset set of scene types as the target scene type; wherein the target application is the application that invokes a recording function of an electronic device to collect the recording data.
- The apparatus according to claim 9, wherein the target scene type is obtained through the following steps: acquiring the scene type preset in a target application, and determining the acquired scene type as the target scene type; wherein the target application is the application that invokes a recording function of an electronic device to collect the recording data.
- The apparatus according to claim 9, wherein the target scene type is obtained through the following steps: determining a target noise level of the recording data from the recording data; and according to a preset correspondence between noise levels and scene types, selecting the scene type corresponding to the target noise level from a preset set of scene types as the target scene type.
- The apparatus according to any one of claims 8-12, wherein the recording data comprises echo data of sound produced from playback data of the target electronic device; and the processing unit is further configured to: process the recording data using the target denoising processing method to generate first intermediate data; eliminate the echo data in the first intermediate data using a preset echo cancellation processing method to generate second intermediate data; and generate data to be played back based on the second intermediate data.
- The apparatus according to claim 13, wherein the processing unit is further configured to: process the second intermediate data based on the target denoising processing method to generate the data to be played back.
- An electronic device, comprising: one or more processors; and a storage device on which one or more programs are stored, the one or more programs, when executed by the one or more processors, causing the one or more processors to implement the method according to any one of claims 1-7.
- A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method according to any one of claims 1-7.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811302472.7 | 2018-11-02 | ||
CN201811302472.7A CN111145770B (zh) | 2018-11-02 | 2018-11-02 | 音频处理方法和装置 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020087788A1 true WO2020087788A1 (zh) | 2020-05-07 |
Family
ID=70462909
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2019/072945 WO2020087788A1 (zh) | 2018-11-02 | 2019-01-24 | 音频处理方法和装置 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN111145770B (zh) |
WO (1) | WO2020087788A1 (zh) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115050384A (zh) * | 2022-05-10 | 2022-09-13 | 广东职业技术学院 | 一种户外直播中背景音降噪方法、设备和系统 |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150162047A1 (en) * | 2013-12-10 | 2015-06-11 | Joseph J. Lacirignola | Methods and apparatus for recording impulsive sounds |
CN104991754A (zh) * | 2015-06-29 | 2015-10-21 | 小米科技有限责任公司 | 录音方法及装置 |
CN105551517A (zh) * | 2015-12-10 | 2016-05-04 | 深圳市中易腾达科技股份有限公司 | 一种具有应用场景识别控制的无线传输录音笔及录音系统 |
US20170223453A1 (en) * | 2014-10-21 | 2017-08-03 | Olympus Corporation | First recording device, second recording device, recording system, first recording method, second recording method, first computer program product, and second computer program product |
CN108022591A (zh) * | 2017-12-30 | 2018-05-11 | 北京百度网讯科技有限公司 | 车内环境中语音识别的处理方法、装置和电子设备 |
Family Cites Families (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
HUP0003010A2 (en) * | 2000-07-31 | 2002-08-28 | Herterkom Gmbh | Signal purification method for the discrimination of a signal from background noise |
CN101667426A (zh) * | 2009-09-23 | 2010-03-10 | 中兴通讯股份有限公司 | 一种消除环境噪声的装置及方法 |
CN102131014A (zh) * | 2010-01-13 | 2011-07-20 | 歌尔声学股份有限公司 | 时频域联合回声消除装置及方法 |
JP5561195B2 (ja) * | 2011-02-07 | 2014-07-30 | 株式会社Jvcケンウッド | ノイズ除去装置およびノイズ除去方法 |
CN103348408B (zh) * | 2011-02-10 | 2015-11-25 | 杜比实验室特许公司 | 噪声和位置外信号的组合抑制方法和系统 |
US9595997B1 (en) * | 2013-01-02 | 2017-03-14 | Amazon Technologies, Inc. | Adaption-based reduction of echo and noise |
CN103617797A (zh) * | 2013-12-09 | 2014-03-05 | 腾讯科技(深圳)有限公司 | 一种语音处理方法,及装置 |
CN104036786B (zh) * | 2014-06-25 | 2018-04-27 | 青岛海信电器股份有限公司 | 一种语音降噪的方法及装置 |
CN105719644A (zh) * | 2014-12-04 | 2016-06-29 | 中兴通讯股份有限公司 | 一种自适应调整语音识别率的方法及装置 |
CN104575510B (zh) * | 2015-02-04 | 2018-08-24 | 深圳酷派技术有限公司 | 降噪方法、降噪装置和终端 |
CN105554234B (zh) * | 2015-09-23 | 2019-08-02 | 宇龙计算机通信科技(深圳)有限公司 | 一种消噪处理的方法、装置和终端 |
WO2017136587A1 (en) * | 2016-02-02 | 2017-08-10 | Dolby Laboratories Licensing Corporation | Adaptive suppression for removing nuisance audio |
CN106910511B (zh) * | 2016-06-28 | 2020-08-14 | 阿里巴巴集团控股有限公司 | 一种语音去噪方法和装置 |
CN106572411A (zh) * | 2016-09-29 | 2017-04-19 | 乐视控股(北京)有限公司 | 降噪控制方法及相关装置 |
CN108461089A (zh) * | 2016-12-09 | 2018-08-28 | 青岛璐琪信息科技有限公司 | 基于流媒体技术的视频综合系统 |
CN108257617B (zh) * | 2018-01-11 | 2021-01-19 | 会听声学科技(北京)有限公司 | 一种噪声场景识别系统及方法 |
- 2018-11-02: CN CN201811302472.7A patent/CN111145770B/zh active Active
- 2019-01-24: WO PCT/CN2019/072945 patent/WO2020087788A1/zh active Application Filing
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150162047A1 (en) * | 2013-12-10 | 2015-06-11 | Joseph J. Lacirignola | Methods and apparatus for recording impulsive sounds |
US20170223453A1 (en) * | 2014-10-21 | 2017-08-03 | Olympus Corporation | First recording device, second recording device, recording system, first recording method, second recording method, first computer program product, and second computer program product |
CN104991754A (zh) * | 2015-06-29 | 2015-10-21 | 小米科技有限责任公司 | 录音方法及装置 |
CN105551517A (zh) * | 2015-12-10 | 2016-05-04 | 深圳市中易腾达科技股份有限公司 | 一种具有应用场景识别控制的无线传输录音笔及录音系统 |
CN108022591A (zh) * | 2017-12-30 | 2018-05-11 | 北京百度网讯科技有限公司 | 车内环境中语音识别的处理方法、装置和电子设备 |
Also Published As
Publication number | Publication date |
---|---|
CN111145770B (zh) | 2022-11-22 |
CN111145770A (zh) | 2020-05-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2016180100A1 (zh) | 一种音频处理的性能提升方法及装置 | |
JP6706633B2 (ja) | 通話音質改善のためのシステムおよび方法 | |
CN111435600B (zh) | 用于处理音频的方法和装置 | |
CN110931035A (zh) | 音频处理方法、装置、设备及存储介质 | |
CN112750452A (zh) | 语音处理方法、装置、系统、智能终端以及电子设备 | |
JP2022095689A (ja) | 音声データノイズ低減方法、装置、機器、記憶媒体及びプログラム | |
CN110096250B (zh) | 一种音频数据处理方法、装置、电子设备及存储介质 | |
WO2020087788A1 (zh) | 音频处理方法和装置 | |
CN112423019B (zh) | 调整音频播放速度的方法、装置、电子设备及存储介质 | |
WO2022227625A1 (zh) | 信号处理方法及装置 | |
CN113382119B (zh) | 消除回声的方法、装置、可读介质和电子设备 | |
CN114979344A (zh) | 回声消除方法、装置、设备及存储介质 | |
CN111147655B (zh) | 模型生成方法和装置 | |
CN114743571A (zh) | 一种音频处理方法、装置、存储介质及电子设备 | |
CN114121050A (zh) | 音频播放方法、装置、电子设备和存储介质 | |
CN111179970B (zh) | 音视频处理方法、合成方法、装置、电子设备及存储介质 | |
CN111145776B (zh) | 音频处理方法和装置 | |
CN111145769A (zh) | 音频处理方法和装置 | |
CN111210837B (zh) | 音频处理方法和装置 | |
CN111145792B (zh) | 音频处理方法和装置 | |
CN112307161A (zh) | 用于播放音频的方法和装置 | |
CN111145793B (zh) | 音频处理方法和装置 | |
CN110138991B (zh) | 回音消除方法和装置 | |
CN111131860A (zh) | 一种音视频播放方法、装置、设备及介质 | |
CN113241086B (zh) | 音频处理方法、装置、电子设备及存储介质 |
Legal Events
Date | Code | Title | Description
---|---|---|---
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 19878753; Country of ref document: EP; Kind code of ref document: A1
| NENP | Non-entry into the national phase | Ref country code: DE
| 32PN | Ep: public notification in the ep bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 17.08.2021)
| 122 | Ep: pct application non-entry in european phase | Ref document number: 19878753; Country of ref document: EP; Kind code of ref document: A1