CN116825128A - Audio processing method and device, computer readable storage medium and electronic equipment - Google Patents

Audio processing method and device, computer readable storage medium and electronic equipment

Info

Publication number
CN116825128A
CN116825128A (application number CN202310803719.8A)
Authority
CN
China
Prior art keywords
spatial
audio
audio signal
initial
estimated
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310803719.8A
Other languages
Chinese (zh)
Inventor
曹健
张灵鲲
许逸君
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd filed Critical Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN202310803719.8A priority Critical patent/CN116825128A/en
Publication of CN116825128A publication Critical patent/CN116825128A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00, characterised by the type of extracted parameters
    • G10L25/27 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00, characterised by the analysis technique
    • G10L25/48 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00, specially adapted for particular use

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Stereophonic System (AREA)

Abstract

The disclosure relates to the technical field of audio processing, in particular to an audio processing method and device, a computer readable storage medium and electronic equipment, wherein the method comprises the following steps: acquiring an initial audio signal acquired by a dual-channel audio acquisition device, and extracting spatial characteristics of the initial audio signal to obtain spatial characteristic data corresponding to the initial audio signal; determining estimated audio signals for a plurality of set directions in space based on the spatial feature data; and processing the estimated audio signals in a plurality of set directions according to the first decoding matrix and the corresponding related transfer function of the binaural audio acquisition equipment so as to obtain a spatial audio signal. The technical scheme of the embodiment of the disclosure reduces the acquisition difficulty of the spatial audio signal.

Description

Audio processing method and device, computer readable storage medium and electronic equipment
Technical Field
The disclosure relates to the technical field of audio processing, and in particular relates to an audio processing method and device, a computer readable storage medium and electronic equipment.
Background
With the development of science and technology, spatial audio has drawn growing interest in recent years, and the spatial and azimuth information of sound is receiving more and more attention.
However, spatial audio content mainly depends on production by professional mixing engineers or on acquisition with professional sound field microphones, so the production threshold is high. The resulting scarcity of spatial audio content makes it difficult for the public to experience spatial audio technology.
Disclosure of Invention
The disclosure aims to provide an audio processing method, an audio processing device, a computer readable medium and an electronic device, so as to reduce the acquisition difficulty of a spatial audio signal at least to a certain extent.
According to a first aspect of the present disclosure, there is provided an audio processing method, comprising: acquiring an initial audio signal acquired by a dual-channel audio acquisition device, and extracting spatial characteristics of the initial audio signal to obtain spatial characteristic data corresponding to the initial audio signal; determining estimated audio signals for a plurality of set directions in space based on the spatial feature data; and processing the estimated audio signals in a plurality of set directions according to the first decoding matrix and the corresponding related transfer function of the binaural audio acquisition equipment so as to obtain a spatial audio signal.
According to a second aspect of the present disclosure, there is provided an audio processing apparatus comprising: the audio acquisition module is used for acquiring an initial audio signal acquired by the double-channel audio acquisition equipment and extracting spatial characteristics of the initial audio signal to obtain spatial characteristic data corresponding to the initial audio signal; an audio estimation module for determining estimated audio signals for a plurality of set directions in space based on the spatial feature data; and the audio processing module is used for processing the estimated audio signals in a plurality of set directions according to the first decoding matrix and the corresponding related transfer function of the binaural audio acquisition equipment so as to obtain the spatial audio signals.
According to a third aspect of the present disclosure, there is provided a computer readable medium having stored thereon a computer program which, when executed by a processor, implements the method described above.
According to a fourth aspect of the present disclosure, there is provided an electronic apparatus, comprising: one or more processors; and a memory for storing one or more programs that, when executed by the one or more processors, cause the one or more processors to implement the methods described above.
According to the audio processing method provided by the embodiments of the present disclosure, on one hand, the initial audio signal collected by the dual-channel audio acquisition device is acquired and spatial feature extraction is performed on it to obtain the spatial feature data corresponding to the initial audio signal; estimated audio signals for a plurality of set directions in space are then determined based on the spatial feature data, so that multichannel spatial audio is generated. This enables spatial audio playback from multiple viewing angles, ensures that the sound source objects captured in the recording keep their correct spatial orientation during playback, and improves the user's sense of immersion. On the other hand, the spatial audio signal can be obtained from an initial audio signal collected by a dual-channel audio acquisition device without purchasing professional sound field acquisition equipment, which reduces the cost of obtaining spatial audio signals. In yet another aspect, the estimated audio signals in the plurality of set directions are processed using the related transfer function corresponding to the binaural audio acquisition device and the first decoding matrix to obtain the spatial audio signal; since the spatial audio signal is derived from the spatial sound field data and the first decoding matrix, no professional mixing engineer is required, the production threshold of spatial audio signals is lowered, and the range of applications for converting binaural signals into spatial audio signals is broadened.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure. It will be apparent to those of ordinary skill in the art that the drawings in the following description are merely examples of the disclosure and that other drawings may be derived from them without undue effort. In the drawings:
FIG. 1 illustrates a schematic diagram of an exemplary system architecture to which embodiments of the present disclosure may be applied;
FIG. 2 schematically illustrates a flow chart of a method of audio processing in an exemplary embodiment of the present disclosure;
FIG. 3 schematically illustrates an interface diagram for determining an initial audio signal in an exemplary embodiment of the present disclosure;
FIG. 4 illustrates a schematic diagram of determining an angle of a headset in a horizontal direction in an exemplary embodiment of the present disclosure;
FIG. 5 illustrates a schematic diagram of a two-channel audio signal for acquiring multiple directions in an exemplary embodiment of the present disclosure;
FIG. 6 schematically illustrates a flow chart for determining spatial signature data in an exemplary embodiment of the present disclosure;
fig. 7 schematically illustrates a flowchart for determining a spatial audio signal in an exemplary embodiment of the present disclosure;
FIG. 8 schematically illustrates a flow chart of one method of determining spatial sound field data in an exemplary embodiment of the present disclosure;
FIG. 9 schematically illustrates a flow chart of another audio processing method in an exemplary embodiment of the present disclosure;
FIG. 10 schematically illustrates an effect diagram of spatial audio in an exemplary embodiment of the present disclosure;
fig. 11 schematically illustrates a composition diagram of an audio processing apparatus in an exemplary embodiment of the present disclosure;
fig. 12 shows a schematic diagram of an electronic device to which embodiments of the present disclosure may be applied.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments may be embodied in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus a repetitive description thereof will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in software or in one or more hardware modules or integrated circuits or in different networks and/or processor devices and/or microcontroller devices.
In the related art, wearable devices such as headphones and glasses are prone to the in-head effect, which reduces the sound field and spatial azimuth information perceived by the user; that is, the prior art cannot turn the ordinary binaural stereo on such wearable devices into spatial audio, resulting in a poor user experience.
Based on the above drawbacks, the present disclosure provides an audio processing method, and fig. 1 shows a schematic diagram of a system architecture in which the above audio processing method may be implemented, where the system architecture 100 may include a terminal 110 and a server 120. The terminal 110 may be a terminal device such as a smart phone, a tablet computer, a desktop computer, a notebook computer, etc., and the server 120 generally refers to a background system that provides services related to audio signals in the present exemplary embodiment, and may be a server or a cluster formed by multiple servers. The terminal 110 and the server 120 may form a connection through a wired or wireless communication link for data interaction.
In one embodiment, the above-described audio processing method may be performed by the terminal 110. For example, a user acquires an initial audio signal acquired by the dual-channel audio acquisition device by using the terminal 110, performs spatial feature extraction on the initial audio signal by using the terminal 110 to obtain spatial feature data corresponding to the initial audio signal, then determines estimated audio signals in a plurality of set directions in space based on the spatial feature data, and then processes the estimated audio signals in the plurality of set directions according to a corresponding correlation transfer function of the dual-channel audio acquisition device and a first decoding matrix corresponding to the spatial audio data to obtain the spatial audio signal.
In one embodiment, the above-described audio processing method may be performed by the server 120. For example, after the user acquires the initial audio signal acquired by the dual-channel audio acquisition device by using the terminal 110, the terminal 110 uploads the initial audio signal to the server 120, the server 120 performs spatial feature extraction on the initial audio signal to obtain spatial feature data corresponding to the initial audio signal, then determines estimated audio signals in a plurality of set directions in space based on the spatial feature data, and finally processes the estimated audio signals in the plurality of set directions according to a related transfer function corresponding to the dual-channel audio acquisition device and a first decoding matrix corresponding to the spatial audio data to obtain the spatial audio signal. The spatial audio signal is returned to the terminal 110.
As can be seen from the above, the main body of execution of the audio processing method in the present exemplary embodiment may be the terminal 110 or the server 120 described above, which is not limited by the present disclosure.
The audio processing method in the present exemplary embodiment will be described below with reference to fig. 2, and fig. 2 shows an exemplary flow of the audio processing method, which may include steps S210 to S230.
In step S210, an initial audio signal acquired by the binaural audio acquisition device is acquired, and spatial feature extraction is performed on the initial audio signal, so as to obtain spatial feature data corresponding to the initial audio signal.
In an example embodiment of the present disclosure, the processor may acquire an initial audio signal collected by a binaural audio acquisition device. The binaural audio acquisition device may be a wearable device such as an earphone, smart glasses, or a smart helmet, or a non-wearable device such as a pair of microphones; the initial audio signal may also be collected directly by a mobile terminal. This is not specifically limited in the present exemplary embodiment.
The initial audio signal is obtained by recording by the binaural audio acquisition device, and the content of the initial audio signal may be news audio, advertisement audio, movie audio, etc., which is not particularly limited in this exemplary embodiment.
After the binaural audio acquisition device records the initial audio signal, the binaural audio acquisition device may transmit the initial audio signal to the processor, and the processor may perform denoising processing on the initial audio signal first, so that the quality of the obtained spatial audio signal is higher.
In an example embodiment, if the binaural audio acquisition device has no storage space, the recorded audio may be transmitted to the processor in real time; if the binaural audio acquisition device has its own storage space, the initial audio signal may be recorded in advance and transmitted to the processor when an acquisition instruction from the processor is received. The processor and the binaural audio acquisition device may be connected wirelessly, for example via Bluetooth, or through a data line, which is not particularly limited in this exemplary embodiment.
In an exemplary embodiment, the processor may acquire a plurality of binaural audio signals in advance, store them in a memory connected to the processor, and determine the initial audio signal from the plurality of binaural audio signals when processing is required. Referring to fig. 3, one specific way of doing this is to display the stored binaural audio signals to the user, who selects one of them on the display interface; the selected binaural audio signal is displayed in a differentiated manner, and when the user triggers the selection control, the selected binaural audio signal is determined to be the initial audio signal.
After the initial audio signal is obtained, spatial feature extraction may be performed on it. Specifically, the set angles and weights of the two signal collectors of the binaural audio acquisition device may be obtained first. For example, as shown in fig. 4, if the binaural audio acquisition device is a pair of earphones, the angle of the two earphones in the horizontal direction may be obtained; when the binaural audio acquisition device is a wearable device, collecting the angles of the two signal collectors adapts the processing to the physical characteristics of different users and improves the processing precision of the initial audio signal. The weights may be, for example, 0.4 and 0.6, or both 0.5, and may be customized according to user requirements, which is not specifically limited in this exemplary embodiment.
It should be noted that configuring different initial audio signals with different weight information makes it possible to emphasize particular parts of the sound, so that the sound in the finally obtained spatial audio has a stronger sense of direction, which enhances the user experience.
After the set angles and weights of the two signal collectors of the binaural audio acquisition device are obtained, spatial feature extraction can be performed on the initial audio signal by using a spherical Bessel function and a spherical harmonic function of a first preset order to obtain spatial feature data corresponding to the initial audio signal, and specifically, the spatial feature data of the initial audio signal can be obtained by using the following first spatial feature extraction formula. The first preset order may be any positive integer greater than or equal to 1 and less than or equal to 10, or may be customized according to a user requirement, which is not specifically limited in this exemplary embodiment.
The details of the spherical bessel function and the spherical harmonic function may refer to the related art, and are not particularly limited in this exemplary embodiment.
The first spatial audio feature extraction formula is as follows:
where c_nm is the coefficient of the corresponding spherical harmonic Y_nm(·), i.e. the spatial feature data described above; c is the propagation velocity of the sound wave; x̂ is the direction vector of point x; j_n(·) is the spherical Bessel function of the first kind of order n; Y*_nm(·) is the conjugate of the spherical harmonic; S(x_q; f) is the initial audio signal; and ω_q is the weight corresponding to azimuth q. The spatial feature data c_nm of the binaural signal are obtained by the above formula. The spatial feature data describe the energy distribution over multiple spatial directions contained in the initial audio signal and can be used to estimate binaural signals for multiple viewing angles; n denotes the first preset order, and q takes the values 1 and 2 to identify the two signal collectors.
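For illustration only, the following Python sketch shows how spatial feature data of this kind can be computed. The exact extraction formula appears only as an image in the filing, so the sketch assumes the standard spherical-harmonic encoding c_nm(f) = Σ_q ω_q · S(x_q; f) · Y*_nm(x̂_q) / (4π i^n j_n(2πf·r/c)), which matches the symbols defined above; the function name, the capsule radius r, and the example shapes are illustrative and not taken from the patent.

```python
import numpy as np
from scipy.special import spherical_jn, sph_harm

def encode_spatial_features(S_qf, azimuths, elevations, weights, freqs,
                            order=3, r=0.09, c=343.0):
    """Encode the two capsule spectra S_qf (shape (2, F)) into spherical-
    harmonic coefficients c_nm, returned as a dict {(n, m): array of length F}."""
    Q, F = S_qf.shape
    k = 2.0 * np.pi * np.asarray(freqs) / c             # wavenumber per bin
    coeffs = {}
    for n in range(order + 1):
        jn = spherical_jn(n, k * r)                      # radial term j_n(kr)
        radial = 4.0 * np.pi * (1j ** n) * np.where(np.abs(jn) > 1e-8, jn, 1e-8)
        for m in range(-n, n + 1):
            acc = np.zeros(F, dtype=complex)
            for q in range(Q):
                # scipy's sph_harm takes (m, n, azimuth, colatitude)
                Ynm = sph_harm(m, n, azimuths[q], np.pi / 2 - elevations[q])
                acc += weights[q] * S_qf[q] * np.conj(Ynm)  # omega_q * S(x_q; f) * Y*_nm
            coeffs[(n, m)] = acc / radial
    return coeffs
```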
In another exemplary embodiment of the present disclosure, referring to fig. 5, the initial audio signal may include binaural audio signals recorded in multiple orientations, which may arise from the user moving or from the binaural audio acquisition device moving; for example, when the user's head turns from a first orientation to a second orientation, binaural audio signals for the two orientations are recorded. In this case, referring to fig. 6, performing spatial feature extraction on the initial audio signal to obtain the spatial feature data corresponding to the initial audio signal may include steps S610 to S620.
In step S610, intermediate spatial feature data corresponding to the binaural audio signals in each azimuth are determined respectively;
in step S620, the sum of all the intermediate spatial feature data is taken as spatial feature data.
In this exemplary embodiment, the intermediate spatial feature data corresponding to the binaural audio signal of each azimuth may be determined first, and the intermediate spatial feature data may then be summed to obtain the spatial feature data. Specifically, this can be realized by the second spatial feature extraction formula.
the second spatial audio feature extraction formula may be as follows:
where v_nm is the coefficient of the corresponding spherical harmonic Y_nm(·), i.e. the spatial feature data described above; c is the propagation velocity of the sound wave; x̂ is the direction vector of point x; j_n(·) is the spherical Bessel function of the first kind of order n; Y*_nm(·) is the conjugate of the spherical harmonic; S(x_q; f) is the initial audio signal; and ω_q is the weight corresponding to azimuth q. The spatial feature data v_nm of the initial audio signal are obtained by the above formula. The spatial feature data describe the energy distribution over the spatial directions contained in the initial audio signal and can be used to estimate the binaural signals in other set directions; n denotes the first preset order, q takes the values 1 and 2 to identify the two signal collectors, and k identifies the different orientations.
For example, if there is a single orientation, k takes only the value 1; if there are four orientations, k takes values from 1 to 4.
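A minimal sketch of steps S610 and S620, reusing the illustrative encode_spatial_features function from the sketch above: the intermediate spatial feature data of each orientation are computed separately and then summed. The segment layout is an assumption, not part of the patent.

```python
def encode_multi_azimuth(segments, order=3, **kw):
    """segments: list of dicts, one per orientation k, each holding the capsule
    spectra and capsule directions recorded while the device faced that way."""
    total = None
    for seg in segments:
        # intermediate spatial feature data for orientation k (step S610)
        c_k = encode_spatial_features(seg["S_qf"], seg["azimuths"],
                                      seg["elevations"], seg["weights"],
                                      seg["freqs"], order=order, **kw)
        # running sum over orientations (step S620)
        total = c_k if total is None else {nm: total[nm] + c_k[nm] for nm in total}
    return total   # v_nm: the summed spatial feature data
```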
By obtaining binaural signals in different orientations as the initial audio signal, richer data are collected, and the resulting spatial audio signal has better quality and effect.
In step S220, estimated audio signals for a plurality of set directions in space are determined based on the spatial feature data.
In an example embodiment of the present disclosure, after the spatial feature data are obtained, the estimated audio signals for a plurality of set directions in space may be determined using the spatial feature data. Specifically, the estimated audio signal for each set direction may be determined from the spatial feature data using a spherical Bessel function and a spherical harmonic function of a second preset order, where the first preset order is greater than or equal to the second preset order.
Specifically, the estimated audio signals in the multiple directions may be obtained by using a reconstruction formula, where the reconstruction formula is:
where S(x; f) is the estimated audio signal for the corresponding azimuth and N is the second preset order. Spherical harmonics of different orders represent the energy distribution over all spatial directions and can therefore represent arbitrary energy distribution information; the spatial feature data c_nm are weighted with the corresponding spherical harmonics Y_nm to estimate the audio signal in each set spatial direction.
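A hedged sketch of step S220: since the exact reconstruction formula is shown only as an image, the code assumes the plain weighted sum of the spatial feature data c_nm with the spherical harmonics Y_nm up to the second preset order N, as described in the text. The direction values in the example are illustrative.

```python
import numpy as np
from scipy.special import sph_harm

def estimate_direction(coeffs, azimuth, elevation, order_N):
    """Estimated spectrum S(x; f) for one set direction, using coefficients
    produced by the encoding sketch above (dict {(n, m): array of length F})."""
    F = next(iter(coeffs.values())).shape[0]
    S_est = np.zeros(F, dtype=complex)
    for n in range(order_N + 1):                       # N <= first preset order
        for m in range(-n, n + 1):
            Ynm = sph_harm(m, n, azimuth, np.pi / 2 - elevation)
            S_est += coeffs[(n, m)] * Ynm              # c_nm weighted by Y_nm
    return S_est

# Example: four horizontal set directions (front, left, back, right).
set_dirs = [(0.0, 0.0), (np.pi / 2, 0.0), (np.pi, 0.0), (3 * np.pi / 2, 0.0)]
```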
In step S230, the estimated audio signals of the plurality of set directions are processed according to the first decoding matrix and the corresponding correlation transfer function of the binaural audio acquisition device, so as to obtain a spatial audio signal.
In an example embodiment of the present disclosure, referring to fig. 7, the above steps may include steps S710 to S720.
In step S710, spatial sound field estimation is performed on the estimated audio signals in the plurality of set directions according to the correlation transfer function and the first decoding matrix to obtain spatial sound field data.
In this exemplary embodiment, the processor may first obtain a related transfer function corresponding to the binaural audio collection device, for example, if the binaural audio collection device is a head wearable device such as a headset or glasses with an audio collection function, the related transfer function may be a head related transfer function.
For another example, if the binaural acquisition device includes two microphones, the corresponding correlation transfer function may be determined according to the relative positions of the sound source and the two microphones, and the scene in which the device is located.
That is, the related transfer function is related to the relative positions of the audio collection device and the sound source and the scene in which the sound source is located, and the specific determination manner is not particularly limited in the present exemplary embodiment.
Head-related transfer function (HRTF): a function used in acoustic localization that describes the transmission of sound waves from a sound source to the two ears. It is the result of the comprehensive filtering of the sound waves by human physiological structures such as the head, pinna, and torso.
The processor may further obtain a first decoding matrix for the spatial audio signal. For example, the first decoding matrix may be obtained with a conventional sound field reconstruction method such as a mode-matching decoder (MMD). The first decoding matrix is used to decode the spatial sound field data into a spatial audio signal of the type corresponding to that matrix; the first decoding matrix thus corresponds to the spatial audio signal.
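For illustration, a minimal mode-matching decoder of the kind mentioned above: evaluate the spherical harmonics at each target loudspeaker direction and take the pseudo-inverse, so that re-encoding the loudspeaker feeds reproduces the sound field coefficients. The loudspeaker layout and function name are assumptions, not taken from the patent.

```python
import numpy as np
from scipy.special import sph_harm

def mode_matching_decoder(speaker_dirs, order):
    """speaker_dirs: list of (azimuth, elevation) pairs.  Returns H' with shape
    (n_speakers, (order + 1)**2): loudspeaker feeds = H' @ sound-field coeffs."""
    rows = []
    for az, el in speaker_dirs:
        rows.append([sph_harm(m, n, az, np.pi / 2 - el)
                     for n in range(order + 1) for m in range(-n, n + 1)])
    Y = np.asarray(rows).T          # (n_coeffs, n_speakers): re-encoding matrix
    return np.linalg.pinv(Y)        # H' = pinv(Y), the mode-matching decoder
```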
After the related transfer function and the first decoding matrix are obtained, spatial sound field data may be obtained by performing spatial sound field estimation on the estimated audio signals in the plurality of set directions.
Specifically, referring to fig. 8, performing spatial sound field estimation on the estimated audio signals of the plurality of set directions according to the correlation transfer function and the first decoding matrix to obtain spatial sound field data may include steps S810 to S820.
In step S810, a second decoding matrix corresponding to the estimated audio signal is determined according to the correlation transfer function and the first decoding matrix;
in this exemplary embodiment, the above-mentioned binaural audio acquisition device is a head wearable device, and the above-mentioned related transfer function is a head related transfer function.
First, the processor may obtain a second decoding matrix corresponding to the estimated audio signal from the head-related transfer function and the first decoding matrix, specifically:
H=HRTF*H'
wherein H is a second decoding matrix, HRTF is a head related transfer function, and H' is the first decoding matrix.
The second decoding matrix combines two operations. First, the sound field C is decoded onto a plurality of virtual speakers, i.e. the speaker signals are computed so that the sound field C is reconstructed in the target area. Second, the signals that these virtual speakers produce at the ears are superimposed: each speaker signal is multiplied by the head-related transfer function (HRTF) for that speaker's direction relative to the listener. For example, if a speaker lies at 30 degrees azimuth and 45 degrees elevation, its played signal is convolved with the HRTF for the 30-degree/45-degree direction from the HRTF database to obtain the binaural signal produced by that virtual speaker. Superimposing the binaural signals produced by all the speakers yields the spatial audio signal that is rendered to the ears after the sound field is decoded by the speaker array.
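A sketch of the combination H = HRTF * H' described above, assuming the related transfer functions are available per frequency bin for each virtual-speaker direction; the array layout is an assumption, since the patent leaves the exact dimensions implicit.

```python
import numpy as np

def second_decoding_matrix(H_prime, tf_bins):
    """H_prime: (n_speakers, n_coeffs) first decoding matrix.
    tf_bins: (n_sig, n_speakers, F) related transfer functions (e.g. HRTFs)
             from each virtual speaker to each output channel (e.g. both ears).
    Returns H with shape (F, n_sig, n_coeffs)."""
    F = tf_bins.shape[-1]
    H = np.empty((F, tf_bins.shape[0], H_prime.shape[1]), dtype=complex)
    for f in range(F):
        H[f] = tf_bins[:, :, f] @ H_prime    # HRTF * H' at this frequency bin
    return H
```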
In step S820, spatial sound field data is determined based on the inverse matrix of the second decoding matrix and the estimated audio signal.
After the second decoding matrix is obtained, the spatial sound field data may be determined using the inverse of the second decoding matrix and the estimated audio signal, specifically as follows:
C=H⁻¹S
where C represents the spatial sound field data, H⁻¹ represents the inverse of the second decoding matrix, and S represents the estimated audio signal described above.
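A sketch of step S820 under the same assumed array layout as above; since H is generally not square, the pseudo-inverse stands in for the inverse written in the text.

```python
import numpy as np

def estimate_sound_field(H, S_est):
    """H: (F, n_sig, n_coeffs) second decoding matrix; S_est: (F, n_sig)
    estimated audio signals.  Returns C with shape (F, n_coeffs)."""
    return np.stack([np.linalg.pinv(H[f]) @ S_est[f] for f in range(H.shape[0])])
```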
In step S720, a spatial audio signal is determined based on the first decoding matrix and the spatial sound field data.
In this exemplary embodiment, after the spatial sound field data is obtained, the spatial audio signal may be obtained by decoding the spatial sound field data using the first decoding matrix, which is specifically as follows:
S'=H'C
wherein S 'represents the spatial audio signal, H' represents the first decoding matrix, and C represents the spatial sound field data.
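A one-line sketch of step S720, applying S' = H'C per frequency bin; the shapes follow the illustrative helpers above.

```python
import numpy as np

def decode_spatial_audio(H_prime, C):
    """H_prime: (n_speakers, n_coeffs); C: (F, n_coeffs).
    Returns S' with shape (F, n_speakers), one spatial audio channel per speaker."""
    return C @ H_prime.T             # S' = H' C applied to every frequency bin
```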
The above steps are described below with reference to fig. 9. First, step S910 may be performed to obtain the initial audio signal. Step S920 is then performed to extract spatial features from the initial audio signal and obtain the spatial feature data, and step S930 determines the estimated audio signals in the set directions from the spatial feature data. Step S940 obtains the estimated audio signals, and step S950 performs spatial sound field estimation on the estimated audio signals to obtain the spatial sound field data; the spatial audio signal is then determined using the first decoding matrix and the spatial sound field data.
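Composing the illustrative sketches above into one end-to-end pipeline that mirrors fig. 9 (steps S910 to S950); all helper names, shapes, and orders are the assumptions introduced earlier, and tf_bins must provide one transfer-function set per estimated signal channel.

```python
import numpy as np

def binaural_to_spatial(S_qf, mic_geom, set_dirs, speaker_dirs, tf_bins,
                        freqs, order_enc=3, order_rec=2):
    # S910/S920: spatial feature data from the two-capsule recording
    coeffs = encode_spatial_features(S_qf, mic_geom["azimuths"],
                                     mic_geom["elevations"],
                                     mic_geom["weights"], freqs, order=order_enc)
    # S930/S940: estimated audio signals for the set directions, shape (F, n_dirs)
    S_est = np.stack([estimate_direction(coeffs, az, el, order_rec)
                      for az, el in set_dirs], axis=-1)
    # S950: spatial sound field estimation and decoding
    H_prime = mode_matching_decoder(speaker_dirs, order_rec)
    H = second_decoding_matrix(H_prime, tf_bins)   # tf_bins: (n_dirs, n_speakers, F)
    C = estimate_sound_field(H, S_est)
    return decode_spatial_audio(H_prime, C)        # spatial audio signal S'
```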
It should be noted that, the specific details of the above steps have been described in detail, and thus are not described herein.
In summary, in this exemplary embodiment, on one hand, by acquiring an initial audio signal collected by a dual-channel audio acquisition device and performing spatial feature extraction on it, spatial feature data corresponding to the initial audio signal are obtained; estimated audio signals for a plurality of set directions in space are then determined based on the spatial feature data, so that multichannel spatial audio is generated. This enables spatial audio playback from multiple viewing angles, ensures that the sound source objects captured in the recording keep their correct spatial orientation during playback, and improves the user's sense of immersion. On the other hand, the spatial audio signal can be obtained from an initial audio signal collected by a dual-channel audio acquisition device without purchasing professional sound field acquisition equipment, which reduces the cost of obtaining spatial audio signals. In yet another aspect, the estimated audio signals in the plurality of set directions are processed using the related transfer function corresponding to the binaural audio acquisition device and the first decoding matrix corresponding to the spatial audio data to obtain the spatial audio signal; since the spatial audio signal is derived from the spatial sound field data and the first decoding matrix, no professional mixing engineer is required, the production threshold of spatial audio signals is lowered, and the range of applications for converting binaural signals into spatial audio signals is broadened. In another aspect, collecting the angles of the two signal collectors adapts the processing to the physical characteristics of different users and improves the processing precision of the initial audio signal. In still another aspect, when binaural signals recorded in different orientations are used as the initial audio signal, richer data are collected and the resulting spatial audio signal has better quality and effect; as shown in fig. 10, converting the binaural initial audio signal into a spatial audio signal improves the user experience.
It is noted that the above-described figures are merely schematic illustrations of processes involved in a method according to exemplary embodiments of the present disclosure, and are not intended to be limiting. It will be readily appreciated that the processes shown in the above figures do not indicate or limit the temporal order of these processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, for example, among a plurality of modules.
Further, referring to fig. 11, in the embodiment of the present example, there is further provided an audio processing apparatus 1100, including an audio acquisition module 1110, an audio estimation module 1120, and an audio processing module 1130. Wherein:
the audio acquisition module 1110 may be configured to acquire an initial audio signal acquired by the binaural audio acquisition device, and perform spatial feature extraction on the initial audio signal to obtain spatial feature data corresponding to the initial audio signal. In an example embodiment, the initial audio signal comprises a plurality of azimuth binaural audio signals, and the audio acquisition module 1110 may be configured to determine intermediate spatial feature data corresponding to each azimuth binaural audio signal, respectively; the sum of all the intermediate spatial feature data is taken as spatial feature data.
The audio estimation module 1120 may be configured to determine the estimated audio signals for a plurality of set directions in space based on the spatial feature data. In one example embodiment, the audio estimation module 1120 may be configured to determine the estimated audio signal for each set direction from the spatial feature data using a spherical Bessel function and a spherical harmonic function of a second preset order.
The audio processing module 1130 may be configured to process the estimated audio signals in a plurality of set directions according to the first decoding matrix and a corresponding correlation transfer function of the binaural audio acquisition device, so as to obtain a spatial audio signal. In an exemplary embodiment, the audio processing module 1130 includes a spatial estimation module and an audio determination module, where the spatial estimation module is configured to perform spatial sound field estimation on the estimated audio signals in a plurality of set directions according to the correlation transfer function and the first decoding matrix to obtain spatial sound field data, and the audio determination module is configured to determine the spatial audio signal based on the first decoding matrix and the spatial sound field data.
In an exemplary embodiment, the spatial estimation module may be configured to determine a second decoding matrix corresponding to the estimated audio signal according to the correlation transfer function and the first decoding matrix; spatial sound field data is determined based on the inverse of the second decoding matrix and the estimated audio signal. Specifically, according to the set angles and weights of two signal collectors of the binaural audio acquisition equipment, the spatial feature extraction is performed on the initial audio signal by using a spherical Bessel function and a spherical harmonic function of a first preset order, so as to obtain spatial feature data corresponding to the initial audio signal.
When the binaural audio collection device is an earphone, the spatial estimation module may be configured to perform spatial sound field estimation on the estimated audio signals in a plurality of set directions according to a head related transfer function and a first decoding matrix of a user wearing the earphone to obtain spatial sound field data.
The audio determination module may be configured to decode the spatial sound field data using the first decoding matrix to obtain the spatial audio signal.
The specific details of each module in the above apparatus are already described in the method section, and the details that are not disclosed can be referred to the embodiment of the method section, so that they will not be described in detail.
The exemplary embodiments of the present disclosure also provide an electronic device for performing the above-described audio processing method, which may be the above-described terminal 110 or server 120. In general, the electronic device may include a processor and a memory for storing executable instructions of the processor, the processor being configured to perform the above-described audio processing method via execution of the executable instructions.
The configuration of the electronic device will be exemplarily described below using the mobile terminal 1200 in fig. 12 as an example. It will be appreciated by those skilled in the art that the configuration of fig. 12 can also be applied to stationary type devices in addition to components specifically for mobile purposes.
As shown in fig. 12, the mobile terminal 1200 may specifically include: the portable electronic device comprises a processor 1201, a memory 1202, a bus 1203, a mobile communication module 1204, an antenna 1, a wireless communication module 1205, an antenna 2, a display 1206, an image pickup module 1207, an audio module 1208, a power supply module 1209 and a sensor module 1210.
The processor 1201 may include one or more processing units. For example, the processor 1201 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, an encoder, a decoder, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), and the like. The audio processing method in the present exemplary embodiment may be performed by the AP, the GPU, or the DSP, and may be performed by the NPU when the method involves neural-network-related processing.
The processor 1201 may form a connection with the memory 1202 or other components through the bus 1203.
Memory 1202 may be used to store computer-executable program code that includes instructions. The processor 1201 performs various functional applications of the mobile terminal 1200 and data processing by executing instructions stored in the memory 1202. The memory 1202 may also store application data, such as files that store images, videos, and the like.
The communication functions of the mobile terminal 1200 may be implemented by the mobile communication module 1204, the antenna 1, the wireless communication module 1205, the antenna 2, a modem processor, a baseband processor, and the like. The antennas 1 and 2 are used for transmitting and receiving electromagnetic wave signals. The mobile communication module 1204 may provide a 2G, 3G, 4G, 5G, etc. mobile communication solution applied on the mobile terminal 1200. The wireless communication module 1205 may provide wireless communication solutions for wireless local area networks, bluetooth, near field communications, etc. as applied to the mobile terminal 1200.
The display 1206 is used to implement display functions, such as displaying user interfaces, images, and video. The image capturing module 1207 is used to implement capturing functions, such as capturing images or video. The audio module 1208 is used to implement audio functions, such as playing audio and collecting speech. The power module 1209 is used to implement power management functions, such as charging the battery, powering the device, and monitoring the battery status. The sensor module 1210 may include a depth sensor 12101, a pressure sensor 12102, a gyro sensor 12103, a barometric pressure sensor 12104, etc., to implement the corresponding sensing functions.
Those skilled in the art will appreciate that the various aspects of the present disclosure may be implemented as a system, method, or program product. Accordingly, various aspects of the disclosure may be embodied in the following forms, namely: an entirely hardware embodiment, an entirely software embodiment (including firmware, micro-code, etc.), or an embodiment combining hardware and software aspects, which may be referred to herein as a "circuit," "module," or "system."
Exemplary embodiments of the present disclosure also provide a computer-readable storage medium having stored thereon a program product capable of implementing the method described above in the present specification. In some possible implementations, various aspects of the disclosure may also be implemented in the form of a program product comprising program code for causing a terminal device to carry out the steps according to the various exemplary embodiments of the disclosure as described in the "exemplary methods" section of this specification, when the program product is run on the terminal device.
It should be noted that the computer readable medium shown in the present disclosure may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present disclosure, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Furthermore, the program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of remote computing devices, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., connected via the Internet using an Internet service provider).
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any adaptations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. An audio processing method, comprising:
acquiring an initial audio signal acquired by a dual-channel audio acquisition device, and extracting spatial characteristics of the initial audio signal to obtain spatial characteristic data corresponding to the initial audio signal;
determining estimated audio signals for a plurality of set directions in space based on the spatial feature data;
and processing the estimated audio signals in a plurality of set directions according to a first decoding matrix and a corresponding related transfer function of the binaural audio acquisition equipment so as to obtain a spatial audio signal.
2. The method of claim 1, wherein processing the estimated audio signals for a plurality of the set directions according to a first decoding matrix and a corresponding correlation transfer function of the binaural audio acquisition device to obtain a spatial audio signal comprises:
according to the related transfer function and the first decoding matrix, performing spatial sound field estimation on the estimated audio signals in the plurality of set directions to obtain spatial sound field data;
the spatial audio signal is determined based on the first decoding matrix and the spatial sound field data.
3. The method of claim 2, wherein performing spatial sound field estimation on the estimated audio signals of the plurality of set directions according to the correlation transfer function and the first decoding matrix to obtain spatial sound field data comprises:
determining a second decoding matrix corresponding to the estimated audio signal according to the correlation transfer function and the first decoding matrix;
the spatial sound field data is determined based on an inverse of the second decoding matrix and the estimated audio signal.
4. The method of claim 2, wherein the binaural audio acquisition device is a headphone, and performing spatial sound field estimation on the plurality of estimated audio signals in the set direction according to the correlation transfer function and the first decoding matrix to obtain spatial sound field data comprises:
and carrying out space sound field estimation on the estimated audio signals in a plurality of set directions according to the head related transfer function of the user wearing the earphone and the first decoding matrix to obtain space sound field data.
5. The method of claim 1, wherein the performing spatial feature extraction on the initial audio signal to obtain spatial feature data corresponding to the initial audio signal comprises:
and according to the set angles and weights of the two signal collectors of the binaural audio acquisition equipment, performing spatial feature extraction on the initial audio signal by utilizing a spherical Bessel function and a spherical harmonic function of a first preset order to obtain spatial feature data corresponding to the initial audio signal.
6. The method of claim 5, wherein determining estimated audio signals for a plurality of set directions in space based on the spatial signature data comprises:
according to the space characteristic data, determining estimated audio signals of each set direction by utilizing a spherical Bessel function and a spherical harmonic function of a second preset order;
wherein the first preset order is greater than or equal to the second preset order.
7. The method according to any one of claims 1 to 6, wherein the initial audio signal comprises a multi-azimuth binaural audio signal, and the performing spatial feature extraction on the initial audio signal to obtain spatial feature data corresponding to the initial audio signal comprises:
respectively determining intermediate space characteristic data corresponding to the binaural audio signals in each azimuth;
and taking the sum of all the intermediate spatial feature data as the spatial feature data.
8. An audio processing apparatus, comprising:
the audio acquisition module is used for acquiring an initial audio signal acquired by the double-channel audio acquisition equipment and extracting spatial characteristics of the initial audio signal to obtain spatial characteristic data corresponding to the initial audio signal;
an audio estimation module for determining estimated audio signals for a plurality of set directions in space based on the spatial feature data;
and the audio processing module is used for processing the estimated audio signals in a plurality of set directions according to the first decoding matrix and the corresponding related transfer function of the binaural audio acquisition equipment so as to obtain a spatial audio signal.
9. A computer-readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the audio processing method according to any one of claims 1 to 7.
10. An electronic device, comprising:
one or more processors; and
a memory for storing one or more programs that, when executed by the one or more processors, cause the one or more processors to implement the audio processing method of any of claims 1-7.
CN202310803719.8A 2023-06-30 2023-06-30 Audio processing method and device, computer readable storage medium and electronic equipment Pending CN116825128A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310803719.8A CN116825128A (en) 2023-06-30 2023-06-30 Audio processing method and device, computer readable storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310803719.8A CN116825128A (en) 2023-06-30 2023-06-30 Audio processing method and device, computer readable storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN116825128A (en) 2023-09-29

Family

ID=88125610

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310803719.8A Pending CN116825128A (en) 2023-06-30 2023-06-30 Audio processing method and device, computer readable storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN116825128A (en)

Similar Documents

Publication Publication Date Title
CN108156561B (en) Audio signal processing method and device and terminal
JP7210602B2 (en) Method and apparatus for processing audio signals
US20240155302A1 (en) Emoji that indicates a location of binaural sound
KR102462067B1 (en) Method for processing vr audio and corresponding equipment
CN111696513A (en) Audio signal processing method and device, electronic equipment and storage medium
US9838790B2 (en) Acquisition of spatialized sound data
US11223920B2 (en) Methods and systems for extended reality audio processing for near-field and far-field audio reproduction
CN114531640A (en) Audio signal processing method and device
CN110890100B (en) Voice enhancement method, multimedia data acquisition method, multimedia data playing method, device and monitoring system
CN113889140A (en) Audio signal playing method and device and electronic equipment
WO2023231787A9 (en) Audio processing method and apparatus
CN112599144A (en) Audio data processing method, audio data processing apparatus, medium, and electronic device
CN116825128A (en) Audio processing method and device, computer readable storage medium and electronic equipment
CN114339582B (en) Dual-channel audio processing method, device and medium for generating direction sensing filter
WO2022262576A1 (en) Three-dimensional audio signal encoding method and apparatus, encoder, and system
CN111246345B (en) Method and device for real-time virtual reproduction of remote sound field
US10764684B1 (en) Binaural audio using an arbitrarily shaped microphone array
WO2022242479A1 (en) Three-dimensional audio signal encoding method and apparatus, and encoder
WO2022178852A1 (en) Listening assisting method and apparatus
EP3569001B1 (en) Method for processing vr audio and corresponding equipment
CN115623156B (en) Audio processing method and related device
CN116781817A (en) Binaural sound pickup method and device
WO2021212287A1 (en) Audio signal processing method, audio processing device, and recording apparatus
CN118042345A (en) Method, device and storage medium for realizing space sound effect based on free view angle
JP2024517503A (en) 3D audio signal coding method and apparatus, and encoder

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination