Detailed Description
The present disclosure is described in further detail below with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should also be noted that, for convenience of description, only the portions related to the invention are shown in the drawings.
It should be noted that, in the present disclosure, the embodiments and features of the embodiments may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 shows an exemplary system architecture 100 to which embodiments of the audio processing method or audio processing apparatus of the present disclosure may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium providing communication links between the terminal devices 101, 102, 103 and the server 105.
The network 104 may include various connection types, such as wired or wireless communication links or fiber optic cables.
A user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages and the like. Various communication client applications, such as a recording application, a call application, a live streaming application, a search application, an instant messaging tool, a mailbox client, and social platform software, may be installed on the terminal devices 101, 102, and 103.
The terminal devices 101, 102, and 103 may be hardware or software. When the terminal devices 101, 102, and 103 are hardware, they may be various electronic devices with communication functions, including but not limited to smart phones, tablet computers, e-book readers, MP3 (MPEG Audio Layer III) players, MP4 players, laptop computers, desktop computers, and the like. When the terminal devices 101, 102, 103 are software, they may be installed in the electronic devices listed above, and may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services) or as a single piece of software or software module. No particular limitation is imposed herein.
The server 105 may be a server that provides various services, such as a background server that supports the sound pickup function of the terminal devices 101, 102, 103. A terminal device may package the original audio data obtained by sound pickup into an audio processing request and send the request to the background server. The background server may analyze and process the received data, such as the audio processing request, and feed the processing result (such as playback data) back to the terminal device.
It should be noted that the audio processing method provided by the embodiments of the present disclosure is generally executed by the terminal devices 101, 102, and 103, and accordingly, the audio processing apparatus is generally disposed in the terminal devices 101, 102, and 103. Optionally, the audio processing method may also be executed by a server: the server may receive the first recording data sent by a terminal device, execute the method of the present disclosure, and finally send the playback data generated from the first recording data to the terminal device.
The server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster formed by multiple servers or as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services) or as a single piece of software or software module. No particular limitation is imposed herein.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring to fig. 2, a flow 200 of one embodiment of an audio processing method is shown. This embodiment is mainly described by taking as an example the application of the method to an electronic device with a certain computing capability, which may be a terminal device shown in fig. 1. The audio processing method includes the following steps:
Step 201, acquiring first recording data.
In the present embodiment, the execution subject of the audio processing method (e.g., the terminal device shown in fig. 1) may acquire the first recording data.
In this embodiment, the first recording data may be audio data collected by the execution subject or by another electronic device; that is, the execution subject may collect the first recording data directly or receive it from another electronic device.
In this embodiment, the first recording data may be audio data collected by one or more sound pickup devices. For example, it may be audio data collected by a single recording device, by two recording devices, or by three or more recording devices, such as a microphone array.
Step 202, generating second sound recording data according to the first sound recording data.
In this embodiment, the execution subject may generate second recording data according to the first recording data.
In this embodiment, the second sound recording data may include first direction data, second direction data and omnidirectional data.
Here, there may be an angle between the first direction and the second direction, which may be preset according to actual conditions. The first direction is the direction corresponding to the first direction data, and the second direction is the direction corresponding to the second direction data.
As an example, the first direction and the second direction may be perpendicular to each other. For convenience of explanation, the present disclosure is described below taking the case in which the first direction is perpendicular to the second direction as an example; this description should not be construed as limiting the relationship between the first direction and the second direction.
Optionally, the sound pickup combination may include a left recording device and a right recording device, and the second recording data may be generated based on the audio data collected by the left recording device, the audio data collected by the right recording device, and a predetermined direction reconfiguration matrix.
Optionally, the sound pickup combination may include a left recording device, a right recording device, and a surround recording device. The first recording data collected by this combination may include left recording device recording data (denoted as left_mic), right recording device recording data (denoted as right_mic), and surround recording device recording data (denoted as surround_mic).
Optionally, the second recording data may be generated based on the left recording device recording data, the right recording device recording data, the surround recording device recording data, and a predetermined direction reconfiguration matrix.
As an example, the second sound recording data may be computationally generated using the following formula.
W = A00*left_mic + A11*right_mic + A22*surround_mic
X = A01*left_mic + A10*right_mic + A20*surround_mic
Y = A02*left_mic + A12*right_mic + A21*surround_mic
Here, W represents the omnidirectional data, X represents the first direction data, and Y represents the second direction data; * denotes multiplication; A00, A11, A22, A01, A10, A20, A02, A12, and A21 are elements of matrix (1) shown below. Matrix (1) is as follows:
Optionally, the values of the elements of matrix (1) may be set according to actual conditions. As an example, matrix (1) may take the values shown in matrix (2), which is as follows:
It should be noted that, as an intermediate form in audio processing, the second recording data has a clear directional correspondence, so the data for each direction can be processed separately in subsequent audio processing (for example, noise reduction). This improves the audio processing effect and yields data to be played with higher sound quality.
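The reconfiguration formulas above can be sketched in Python with NumPy as follows. This is a minimal sketch, assuming the nine coefficients are supplied as a 3x3 table indexed so that `a[i][j]` holds Aij; the function name `reconfigure` and the table layout are illustrative assumptions, and the actual coefficient values would come from matrix (2) or from calibration of the particular pickup combination.

```python
import numpy as np


def reconfigure(left_mic, right_mic, surround_mic, a):
    """Map three microphone signals to W (omni), X (first direction), Y (second direction).

    Assumption: a[i][j] holds coefficient Aij from the formulas; real values
    would come from matrix (2) or from calibration.
    """
    left = np.asarray(left_mic, dtype=float)
    right = np.asarray(right_mic, dtype=float)
    surround = np.asarray(surround_mic, dtype=float)
    w = a[0][0] * left + a[1][1] * right + a[2][2] * surround
    x = a[0][1] * left + a[1][0] * right + a[2][0] * surround
    y = a[0][2] * left + a[1][2] * right + a[2][1] * surround
    return w, x, y
```

Because each output is a fixed linear combination of the inputs, the same function applies unchanged to whole recordings or to individual frames.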
Step 203, generating data to be played based on the second recording data.
In this embodiment, the execution body may generate data to be played based on the second recording data.
In this embodiment, the data to be played may be time-domain audio data.
In this embodiment, the second recording data may be processed in various ways to generate the data to be played.
As an example, the processing may include, but is not limited to: performing a time-frequency transform on the time-domain second recording data to obtain spectral data; applying at least one of the following speech signal processing operations to the spectral data: noise removal, automatic gain control, and echo cancellation; and transforming the processed spectral data back to the time domain to serve as the data to be played.
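The time-frequency processing chain described above can be sketched as follows. This is a minimal, hypothetical illustration using NumPy, applied to one channel at a time: a short-time Fourier transform with square-root periodic Hann windows at 50% overlap (chosen because the analysis and synthesis windows then reconstruct the input exactly), spectral subtraction as a stand-in for noise removal with the noise magnitude estimated from the leading frames (an assumption), and a static gain toward a target RMS as a crude stand-in for automatic gain control. Echo cancellation is omitted because it requires a far-end reference signal that the text does not describe. The names `stft`, `istft`, `process`, `noise_frames`, `floor`, and `target_rms` are hypothetical.

```python
import numpy as np


def sqrt_hann(n):
    # Square root of a periodic Hann window. Using it for both analysis and
    # synthesis gives an effective Hann window, and Hann windows at 50%
    # overlap sum to one, so the STFT/ISTFT pair reconstructs the input.
    return np.sqrt(0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n) / n))


def stft(x, n=512):
    hop = n // 2
    win = sqrt_hann(n)
    # Pad so every original sample is covered by exactly two frames.
    padded = np.concatenate([np.zeros(hop), x, np.zeros(n)])
    n_frames = (len(padded) - n) // hop + 1
    frames = np.stack([padded[i * hop:i * hop + n] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def istft(spec, length, n=512):
    hop = n // 2
    win = sqrt_hann(n)
    frames = np.fft.irfft(spec, n=n, axis=1) * win
    out = np.zeros(hop * (len(frames) - 1) + n)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + n] += frame  # overlap-add
    return out[hop:hop + length]  # drop the leading pad


def process(x, n=512, noise_frames=0, floor=0.05, target_rms=None):
    x = np.asarray(x, dtype=float)
    spec = stft(x, n)
    if noise_frames > 0:
        # Assumption: the first `noise_frames` frames contain only noise.
        noise = np.abs(spec[:noise_frames]).mean(axis=0)
        mag = np.abs(spec)
        clean = np.maximum(mag - noise, floor * mag)
        spec = spec * (clean / np.maximum(mag, 1e-12))  # keep the phase
    if target_rms is not None:
        # Static gain toward a target RMS (crude stand-in for AGC).
        rms = np.sqrt(np.mean(x ** 2))
        if rms > 0:
            spec = spec * (target_rms / rms)
    return istft(spec, len(x), n)
```

With `noise_frames=0` and `target_rms=None` the chain returns its input unchanged up to floating-point error, which is a convenient check that the transform pair is set up correctly before any speech processing is enabled.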
In this embodiment, the playback device is the electronic device that subsequently plays the data to be played. As an example, in a call between a terminal device a (the execution subject) and a terminal device b, the playback device may be the terminal device b, which subsequently plays the data to be played.
With continued reference to fig. 3, fig. 3 is a schematic diagram of an application scenario of the audio processing method according to the embodiment shown in fig. 2. In the application scenario of fig. 3:
First, the terminal 301 may capture the voice of a user to acquire first recording data.
Then, the terminal 301 may generate second recording data according to the first recording data. The second recording data includes first direction data, second direction data, and omnidirectional data.
Then, the terminal 301 may generate data to be played based on the second recording data. In the application scenario, the playback device may be the terminal 301.
Finally, the terminal 301 may read the data to be played and play it back.
With continued reference to fig. 4, fig. 4 is a schematic diagram of another application scenario of the audio processing method according to the embodiment shown in fig. 2. In the application scenario of fig. 4:
First, the terminal 401 may capture the voice of a user to acquire first recording data.
Then, the terminal 401 may transmit the first recording data to the server 402.
Then, the server 402 may generate second recording data according to the first recording data.
Then, the server 402 may generate data to be played based on the second recording data. In this application scenario, the playback device may be the terminal 403.
Finally, the server 402 may send the data to be played to the terminal 403, which plays it.
The method provided by the foregoing embodiment of the present disclosure generates the second recording data by performing direction reconfiguration on the first recording data, and generates the data to be played based on the second recording data. Its technical effects include at least the provision of a new audio processing approach.
With further reference to fig. 5, a flow 500 of yet another embodiment of an audio processing method is shown. The flow 500 of the audio processing method includes the following steps:
Step 501, acquiring first recording data.
In the present embodiment, the execution subject of the audio processing method (e.g., the terminal device shown in fig. 1) may acquire the first recording data.
Step 502, generating second sound recording data according to the first sound recording data.
In this embodiment, for the details and technical effects of step 501 and step 502, refer to the descriptions of step 201 and step 202 in the embodiment shown in fig. 2; they are not repeated here.
In some embodiments, before step 503, the second recording data may further be high-pass filtered to generate new second recording data, which then serves as the basis for the sound field direction expansion in step 503.
It should be noted that high-pass filtering removes low-frequency components from the second recording data, thereby achieving preliminary noise reduction.
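A minimal sketch of the high-pass filtering step, assuming a first-order (RC) high-pass filter applied to each of the W, X, and Y channels independently. The filter order, the 100 Hz default cutoff, and the names `high_pass` and `high_pass_wxy` are illustrative assumptions; the text does not specify a filter design.

```python
import math


def high_pass(signal, sample_rate, cutoff_hz=100.0):
    """First-order RC high-pass filter: y[n] = a * (y[n-1] + x[n] - x[n-1]).

    The 100 Hz default cutoff is an assumed value, not taken from the text.
    """
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    a = rc / (rc + dt)  # closer to 1 means a lower cutoff
    out = []
    prev_x = prev_y = 0.0
    for x in signal:
        y = a * (prev_y + x - prev_x)
        out.append(y)
        prev_x, prev_y = x, y
    return out


def high_pass_wxy(w, x, y, sample_rate, cutoff_hz=100.0):
    # Filter each channel independently to form the new second recording data.
    return tuple(high_pass(ch, sample_rate, cutoff_hz) for ch in (w, x, y))
```

A first-order filter rolls off gently (6 dB per octave); a steeper design such as a second-order section could be substituted if stronger low-frequency suppression is needed.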
Step 503, performing sound field direction expansion based on the second recording data.
In this embodiment, the execution body may perform sound field direction expansion based on the second recording data.
In the present embodiment, sound field direction expansion may be used to derive data corresponding to more directions than the second recording data contains.
It should be noted that sound field direction expansion yields audio data for more directions and improves the spatial resolution of the audio. This enhances the sense of spatial surround and prepares the audio for playback through multiple spatially arranged playback devices.
Here, the third recording data may correspond to one or more directions in addition to the directions of the second recording data.
As an example, the data in the second recording data corresponds to a first direction and a second direction, and the data in the third recording data may correspond to the first direction, the second direction, and a third direction. The third direction may lie along the bisector of the angle between the first direction and the second direction.
As another example, the data in the second recording data corresponds to a first direction and a second direction, and the data in the third recording data may correspond to the first direction, the second direction, a fourth direction, and a fifth direction. The fourth and fifth directions may point at 30 degrees and 60 degrees from the first direction, respectively; that is, when the first and second directions are perpendicular, the fourth and fifth directions trisect the angle between them.
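The text does not specify how data for the additional directions is derived. One common technique for first-order W/X/Y data, offered here only as an assumption and not as the disclosed method, is to steer a virtual first-order microphone: for an angle θ measured from the first direction toward the second, D(θ) = p·W + (1 - p)(X cos θ + Y sin θ), where p = 1 gives an omnidirectional pattern, p = 0.5 a cardioid, and p = 0 a figure-eight. The sketch assumes W, X, and Y share the same gain normalization (some conventions scale W by 1/√2, which would change the formula); the names `steer` and `expand` are hypothetical.

```python
import math

import numpy as np


def steer(w, x, y, angle_deg, p=0.5):
    """Virtual first-order microphone aimed angle_deg from the first direction
    toward the second: p*W + (1 - p)*(X*cos(theta) + Y*sin(theta)).

    Assumption: W, X and Y share the same gain normalization.
    """
    theta = math.radians(angle_deg)
    w, x, y = (np.asarray(ch, dtype=float) for ch in (w, x, y))
    return p * w + (1.0 - p) * (math.cos(theta) * x + math.sin(theta) * y)


def expand(w, x, y, angles_deg=(0, 30, 60, 90), p=0.5):
    # One virtual-microphone signal per requested direction.
    return {angle: steer(w, x, y, angle, p) for angle in angles_deg}
```

Calling `expand(w, x, y, angles_deg=(0, 45, 90))` corresponds to the bisector case of the first example, and `(0, 30, 60, 90)` to the trisector case of the second.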
Step 504, generating third recording data according to the result of the sound field direction expansion.
In this embodiment, the execution body may generate the third recording data according to the result of the sound field direction expansion in step 503.
Optionally, the result of the sound field direction expansion may be used directly as the third recording data.
Optionally, the directions in the result of the sound field direction expansion may be further fine-tuned, and the fine-tuned result used as the third recording data. As an example, the directions may be fine-tuned by adding a time delay to the first direction data and the second direction data.
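Direction fine-tuning by time delay can be sketched as an integer-sample shift of the first and second direction data. This assumes whole-sample delays whose values (in samples) are chosen by the implementer; the helper names `delay` and `fine_tune` are hypothetical, and a fractional-delay filter would be needed for finer adjustment.

```python
import numpy as np


def delay(signal, samples):
    """Delay by a whole number of samples, keeping the length (zeros shifted in)."""
    signal = np.asarray(signal, dtype=float)
    samples = min(max(int(samples), 0), len(signal))
    if samples == 0:
        return signal.copy()
    return np.concatenate([np.zeros(samples), signal[:-samples]])


def fine_tune(x, y, delay_x, delay_y):
    # Hypothetical direction fine-tuning: shift the first (X) and second (Y)
    # direction data by assumed per-channel delays, leaving W untouched.
    return delay(x, delay_x), delay(y, delay_y)
```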
Step 505, processing the third recording data in a preset audio processing manner to generate data to be played.
In this embodiment, the execution subject may process the third recording data in a preset audio processing manner to generate the data to be played.
As an example, the preset audio processing manner may include, but is not limited to: performing a time-frequency transform on the time-domain third recording data to obtain spectral data; applying at least one of the following speech signal processing operations to the spectral data: noise removal, automatic gain control, and echo cancellation; and transforming the processed spectral data back to the time domain to serve as the data to be played.
As can be seen from fig. 5, compared with the embodiment corresponding to fig. 2, the flow 500 of the audio processing method in this embodiment highlights the step of performing sound field direction expansion on the second recording data. The technical effects of the solution described in this embodiment therefore include at least the provision of a new audio processing approach.
With further reference to fig. 6, a flow 600 of yet another embodiment of an audio processing method is shown. The flow 600 of the audio processing method includes the following steps:
Step 601, acquiring first recording data.
In the present embodiment, the execution subject of the audio processing method (e.g., the terminal device shown in fig. 1) may acquire the first recording data.
Step 602, generating second sound recording data according to the first sound recording data.
In this embodiment, for the details and technical effects of step 601 and step 602, refer to the descriptions of step 201 and step 202 in the embodiment shown in fig. 2; they are not repeated here.
Step 603, based on the second recording data, performing sound field direction compression to generate fourth recording data.
In this embodiment, the executing entity may perform sound field direction compression based on the second sound recording data to generate fourth sound recording data.
In this embodiment, sound field direction compression may be used to compress the data into data corresponding to fewer directions than the second recording data.
For example, the second recording data includes audio data corresponding to three directions, while the fourth recording data may include audio data corresponding to a single direction.
It should be noted that performing direction compression on the second recording data to generate the fourth recording data, and then generating the data to be played based on the fourth recording data, allows the method to meet the playback requirements of devices with fewer channels.
Optionally, step 603 may be implemented as follows: averaging, or weighted averaging, the first direction data, the second direction data, and the omnidirectional data in the second recording data to obtain single-channel fourth recording data.
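The averaging option can be sketched as a weighted sum of the three channels normalized by the sum of the weights; equal weights give a plain average. The function name `compress_to_mono` and its default weights are illustrative assumptions.

```python
import numpy as np


def compress_to_mono(w, x, y, weights=(1.0, 1.0, 1.0)):
    """Weighted average of the W, X, Y channels; equal weights give a plain mean.

    The weights are assumed, implementer-chosen values.
    """
    channels = np.stack([np.asarray(ch, dtype=float) for ch in (w, x, y)])
    weights = np.asarray(weights, dtype=float)
    # (3,) @ (3, N) -> (N,), then normalize so the weights sum to one.
    return weights @ channels / weights.sum()
```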
Optionally, step 603 may be implemented as follows: determining a speech energy value of each audio frame in the second recording data; selecting audio frames in the second recording data according to the speech energy values; and extracting the omnidirectional data corresponding to the selected audio frames to generate the fourth recording data.
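The energy-based option can be sketched as follows. The text does not fix the frame length, how the energy is measured, or the selection rule, so this sketch assumes: frame energy is the sum of squared samples over all three channels, a frame is selected when its energy reaches a fixed fraction (`threshold_ratio`) of the loudest frame's energy, non-selected frames are replaced with silence so that timing is preserved, and a trailing partial frame is dropped. The name `select_omni_frames` and its parameters are hypothetical.

```python
import numpy as np


def select_omni_frames(w, x, y, frame_len=320, threshold_ratio=0.1):
    """Keep W only in frames whose energy is at least threshold_ratio times
    the loudest frame's energy; other frames become silence (assumed rule)."""
    chans = [np.asarray(ch, dtype=float) for ch in (w, x, y)]
    n_frames = len(chans[0]) // frame_len  # a trailing partial frame is dropped
    if n_frames == 0:
        return np.zeros(0), np.zeros(0, dtype=bool)
    framed = [ch[:n_frames * frame_len].reshape(n_frames, frame_len) for ch in chans]
    # Assumed energy measure: sum of squared samples over all three channels.
    energy = sum(np.sum(f ** 2, axis=1) for f in framed)
    keep = energy >= threshold_ratio * energy.max()
    omni = np.where(keep[:, None], framed[0], 0.0)
    return omni.reshape(-1), keep
```

Zeroing rejected frames keeps the output aligned with the input timeline; dropping them instead would shorten the signal, which may be preferable for storage but not for real-time playback.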
Step 604, processing the fourth recording data in a preset audio processing manner to generate data to be played.
In this embodiment, the execution subject may process the fourth recording data in a preset audio processing manner to generate the data to be played.
As an example, the preset audio processing manner may include, but is not limited to: performing a time-frequency transform on the time-domain fourth recording data to obtain spectral data; applying at least one of the following speech signal processing operations to the spectral data: noise removal, automatic gain control, and echo cancellation; and transforming the processed spectral data back to the time domain to serve as the data to be played.
As can be seen from fig. 6, compared with the embodiment corresponding to fig. 2, the flow 600 of the audio processing method in this embodiment highlights the step of performing sound field direction compression on the second recording data. The technical effects of the solution described in this embodiment therefore include at least the provision of a new audio processing approach.
With further reference to fig. 7, as an implementation of the methods shown in the above figures, the present disclosure provides an embodiment of an audio processing apparatus. The apparatus embodiment corresponds to the method embodiment shown in fig. 2, and the apparatus is applicable to various electronic devices.
As shown in fig. 7, the audio processing apparatus 700 of this embodiment includes an acquisition unit 701, a first generation unit 702, and a second generation unit 70. The acquisition unit is configured to acquire first recording data; the first generation unit is configured to generate second recording data based on the first recording data; and the second generation unit is configured to generate data to be played based on the second recording data. The second recording data includes first direction data, second direction data, and omnidirectional data.
In this embodiment, for the specific processing performed by the acquisition unit 701, the first generation unit 702, and the second generation unit 703 of the audio processing apparatus 700 and the technical effects thereof, refer to the related descriptions of step 201, step 202, and step 203 in the embodiment corresponding to fig. 2; they are not repeated here.
In some optional implementations of this embodiment, the second generating unit is further configured to: perform sound field direction expansion based on the second recording data; generate third recording data according to the result of the sound field direction expansion; and process the third recording data in a preset audio processing manner to generate the data to be played.
In some optional implementations of this embodiment, the second generating unit is further configured to perform high-pass filtering on the second recording data to generate new second recording data.
In some optional implementations of this embodiment, the second generating unit is further configured to fine-tune the directions in the result of the sound field direction expansion and use the fine-tuned result as the third recording data.
In some optional implementations of this embodiment, the second generating unit is further configured to: perform sound field direction compression based on the second recording data to generate fourth recording data; and process the fourth recording data in a preset audio processing manner to generate the data to be played.
In some optional implementations of this embodiment, the second generating unit is further configured to: determine a speech energy value of each audio frame in the second recording data; select audio frames in the second recording data according to the speech energy values; and extract the omnidirectional data corresponding to the selected audio frames to generate the fourth recording data.
In some optional implementations of this embodiment, the first recording data includes left recording device recording data, right recording device recording data, and surround recording device recording data; and the first generating unit is further configured to generate the second recording data based on the left recording device recording data, the right recording device recording data, the surround recording device recording data, and a preset direction reconfiguration matrix.
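The unit structure of the apparatus 700 can be sketched as a small composition of callables, one per unit. This is a hypothetical illustration of the data flow only: the class and field names are assumptions, and each callable could be any of the processing steps described above (for example, direction reconfiguration for the first generation unit, and expansion or compression followed by spectral processing for the second).

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class AudioProcessingApparatus:
    # Hypothetical wiring; each field stands in for one unit of the apparatus.
    acquire: Callable[[], Any]             # acquisition unit 701
    first_generate: Callable[[Any], Any]   # first generation unit 702: -> (W, X, Y)
    second_generate: Callable[[Any], Any]  # second generation unit: -> data to be played

    def run(self) -> Any:
        first_recording = self.acquire()
        second_recording = self.first_generate(first_recording)
        return self.second_generate(second_recording)
```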
It should be noted that details of implementation and technical effects of each unit in the audio processing apparatus provided in the embodiment of the present disclosure may refer to descriptions of other embodiments in the present disclosure, and are not described herein again.
Referring now to fig. 8, a schematic diagram of an electronic device 800 (e.g., the terminal device or server in fig. 1) suitable for implementing embodiments of the present disclosure is shown. The electronic device shown in fig. 8 is merely an example and should not impose any limitation on the functions or scope of use of the embodiments of the present disclosure.
As shown in fig. 8, the electronic device 800 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 801 that may perform various appropriate actions and processes in accordance with a program stored in a read-only memory (ROM) 802 or a program loaded from a storage device 808 into a random access memory (RAM) 803. The RAM 803 also stores various programs and data necessary for the operation of the electronic device 800. The processing device 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
Generally, the following devices may be connected to the I/O interface 805: input devices 806 including, for example, a touch screen, touch pad, keyboard, mouse, camera, recording device, accelerometer, and gyroscope; output devices 807 including, for example, a liquid crystal display (LCD), speakers, and vibrators; storage devices 808 including, for example, magnetic tape and hard disks; and a communication device 809. The communication device 809 may allow the electronic device 800 to communicate wirelessly or by wire with other devices to exchange data. While fig. 8 illustrates an electronic device 800 having various devices, it is to be understood that not all of the illustrated devices are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, an embodiment of the present disclosure includes a computer program product comprising a computer program carried on a computer readable medium, the computer program containing program code for performing the method illustrated in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication device 809, installed from the storage device 808, or installed from the ROM 802. When the computer program is executed by the processing device 801, the above-described functions defined in the methods of the embodiments of the present disclosure are performed.
It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquire first recording data; generate second recording data according to the first recording data; and generate data to be played based on the second recording data, wherein the second recording data includes first direction data, second direction data, and omnidirectional data.
Computer program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. Here, the name of the unit does not constitute a limitation of the unit itself in some cases, and for example, the acquisition unit may also be described as "a unit that acquires first sound recording data".
The foregoing description is merely of preferred embodiments of the present disclosure and an illustration of the technical principles employed. Those skilled in the art will appreciate that the scope of the disclosure is not limited to technical solutions formed by the particular combination of the features described above, but also encompasses other technical solutions formed by any combination of the above features or their equivalents without departing from the concept of the disclosure; for example, technical solutions formed by replacing the above features with features having similar functions disclosed in (but not limited to) the present disclosure.