WO2024082181A1

WO2024082181A1 - Spatial audio collection method and apparatus

Info

Publication number: WO2024082181A1
Application number: PCT/CN2022/126234
Authority: WO
Inventors: 王宾
Original assignee: 北京小米移动软件有限公司
Priority date: 2022-10-19
Filing date: 2022-10-19
Publication date: 2024-04-25
Also published as: CN118235431A

Abstract

The present disclosure relates to the technical field of mobile communications. Provided are a spatial audio collection method and apparatus. The method comprises: arranging a plurality of groups of mutually orthogonal microphone arrays in a UE, and performing differential beam processing on microphone signals acquired by the microphone arrays, so as to acquire a spatial audio signal. By means of the present disclosure, mutually orthogonal microphone arrays formed by miniature microphones that use a beam technique may be used, and the size of a spatial audio collection system is controlled to be within certain dimensions, such that the spatial audio collection system is built in an existing mobile device, and a pickup system that can be built in a mobile intelligent device is formed; in addition, the directivity of the microphone arrays is controlled by means of a differential beam technique, and additional electro-acoustic and acoustic hardware requirements are reduced, thereby meeting requirements for the mobile intelligent device regarding collecting immersive audio when the volume of the device is controlled.

Description

Spatial audio acquisition method and device

Technical Field

The present disclosure relates to the field of mobile communication technology, and in particular to a spatial audio acquisition method and device.

Background Art

With the development of technology, spatial audio has been widely used in multimedia and instant communication of civilian equipment. However, the acquisition of spatial audio currently relies on external devices and cannot be directly acquired through smart mobile devices. In addition, the current spatial audio acquisition devices are too large and difficult to operate, which is not suitable for users' growing demand for high-quality audio and video acquisition.

Summary of the invention

The present disclosure proposes a spatial audio acquisition method and device to solve the problem in the prior art that a spatial audio acquisition system cannot be integrated into a UE to perform effective and high-quality spatial audio acquisition.

A first aspect embodiment of the present disclosure provides a spatial audio acquisition method, which is executed by a user equipment UE, wherein multiple groups of microphone arrays are arranged in the UE, and the maximum response directions of each group of arrays are mutually orthogonal. The method includes: performing differential beam processing on microphone signals obtained by the microphone array to obtain spatial audio signals.

In some embodiments, differential beam processing is performed on microphone signals acquired by a microphone array to acquire spatial audio signals, including: adding appropriate delay filtering and corresponding compensation filters to the microphone signals to acquire array signals with desired directivity; and decoding the array signals to acquire spatial audio signals.

In some embodiments, the method further includes: acquiring multiple directivities of the microphone array, the directivities representing the sensitivity of signals in different directions.

In some embodiments, the method further includes: obtaining a plurality of directivity differential arrays; and obtaining the required directivity in three-dimensional space by combining different differential arrays to obtain a spatial audio signal.

In some embodiments, the method further includes: decoding the spatial audio signal to output immersive multi-channel audio and/or ambisonic audio.

In some embodiments, the method further includes: filtering the microphone signal to obtain a low-frequency component and a high-frequency component, wherein the low-frequency component is output as a low-frequency effect and the high-frequency component is used to form a spatial audio signal.

In some embodiments, the microphone array is arranged in the UE in any of the following ways: the microphone array is arranged in a position close to a human voice collection component in the UE; the microphone array is arranged in a position close to an image collection component in the UE.

In some embodiments, the microphone array includes a predetermined number of microphones, which form three groups of microphone arrays. The three groups of microphone arrays are orthogonal to each other or the angle deviation from the orthogonality error is within a predetermined range, and the centers of the three groups of microphone arrays coincide or have a distance that does not exceed an error threshold.

A second aspect of the present disclosure provides a spatial audio acquisition device, which is arranged to be executed in a user equipment UE. A plurality of microphone arrays are arranged in the UE, and the maximum response directions of each array are mutually orthogonal. The device includes: a spatial audio signal acquisition module, which is used to perform differential beam processing on microphone signals acquired by the microphone array to acquire spatial audio signals.

The third aspect embodiment of the present disclosure provides a communication device, including: a transceiver; a memory; a processor, which is connected to the transceiver and the memory respectively, and is configured to control the wireless signal reception and transmission of the transceiver by executing computer executable instructions on the memory, and can implement the spatial audio acquisition method of the above-mentioned first aspect embodiment.

The fourth aspect of the present disclosure provides a computer storage medium, wherein the computer storage medium stores computer executable instructions; after the computer executable instructions are executed by a processor, the spatial audio acquisition method of the first aspect of the present disclosure can be implemented.

The embodiments of the present disclosure provide a spatial audio acquisition method and device, which arranges multiple groups of mutually orthogonal microphone arrays in the UE, and performs differential beam processing on the microphone signals acquired by the microphone arrays to obtain spatial audio signals. The present disclosure controls the size of the acquisition system within a certain size while controlling the directivity of the microphone by setting up mutually orthogonal microphone arrays, so that it can be built into current mobile devices, forming a pickup system that can be built into mobile smart devices. And through differential beam technology, the signal collected by the pickup system is controlled to collect spatial audio, reducing the requirements for additional electroacoustic and acoustic hardware, thereby solving the requirements of mobile smart devices for collecting immersive audio while controlling the size of the device.

Additional aspects and advantages of the present disclosure will be given in part in the following description and in part will be obvious from the following description or learned through practice of the present disclosure.

BRIEF DESCRIPTION OF THE DRAWINGS

The above and/or additional aspects and advantages of the present disclosure will become apparent and easily understood from the following description of the embodiments in conjunction with the accompanying drawings, in which:

FIG1 is a schematic diagram of a flow chart of a spatial audio acquisition method according to an embodiment of the present disclosure;

FIG2 is a schematic diagram of a flow chart of a spatial audio acquisition method according to an embodiment of the present disclosure;

FIG3 is a schematic diagram of spatial audio acquisition logic according to an embodiment of the present disclosure;

FIG4 is a schematic diagram of a first-order differential array according to an embodiment of the present disclosure;

FIG5 is a schematic diagram of the directivity of a microphone array according to an embodiment of the present disclosure;

FIG6 is a schematic diagram of the directivity of the left channel and the right channel after decoding according to an embodiment of the present disclosure;

FIG7 is a schematic diagram of a first-order B format according to an embodiment of the present disclosure;

FIG8 is a schematic diagram of a first-order B-format directional signal component according to an embodiment of the present disclosure;

FIG9 is a schematic diagram of an arrangement of a microphone array in a mobile device according to an embodiment of the present disclosure;

FIG10 is a schematic diagram of an arrangement of a microphone array in a mobile device according to an embodiment of the present disclosure;

FIG11 is a block diagram of a spatial audio acquisition device according to an embodiment of the present disclosure;

FIG12 is a schematic diagram of the structure of a communication device provided in an embodiment of the present disclosure;

FIG. 13 is a schematic diagram of the structure of a chip provided in an embodiment of the present disclosure.

Detailed Description

Embodiments of the present disclosure are described in detail below, and examples of the embodiments are shown in the accompanying drawings, wherein the same or similar reference numerals throughout represent the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the accompanying drawings are exemplary and are intended to be used to explain the present disclosure, and should not be construed as limiting the present disclosure.

With the development of technology, spatial audio has been widely used in civilian devices. Websites such as YouTube and Facebook support spatial audio content. In real-time communication, audio and video coding standards such as AVS support spatial audio coding.

Although the existing spatial audio acquisition technology can achieve good audio quality and immersive sound field reproduction, the current spatial audio acquisition relies on external devices and cannot be directly acquired through smart mobile devices. In addition, the current spatial audio acquisition devices are too large and difficult to operate. The following table shows the 3D spatial audio acquisition devices in the prior art:

It can be seen that the spatial audio acquisition equipment in the related art cannot be built into current mobile smart devices, and is not suited to users' growing demand for high-quality audio and video acquisition. Taking the smartphone, the most common mobile smart device today, as an example, the size is about 7 inches, such as the Xiaomi 12S PRO (length: 163.6 mm, width: 74.6 mm, thickness: 8.16 mm). In addition, the hardware layout in a mobile smart device is very compact, so the volume available for a built-in audio acquisition system is very limited.

To this end, the present disclosure proposes a spatial audio acquisition method and device to solve the problem in the prior art that a spatial audio acquisition system cannot be integrated into a UE to perform effective and high-quality spatial audio acquisition.

The spatial audio acquisition method and device provided by the present application are described in detail below with reference to the accompanying drawings.

FIG1 shows a flow chart of a spatial audio acquisition method according to an embodiment of the present disclosure. The method may be implemented by a user equipment (UE). In the present disclosure, the UE is arranged with multiple microphone arrays, and the maximum response directions of each array are mutually orthogonal. As shown in FIG1 , the method may include the following steps.

S101, performing differential beam processing on microphone signals acquired by a microphone array to acquire spatial audio signals.

In an embodiment of the present disclosure, multiple groups of mutually orthogonal microphone arrays are set in the UE, and each microphone array may include multiple microphones. The present disclosure does not limit the type of microphone. To control the size of the sound pickup system, omnidirectional miniature microphones can be used, such as MEMS (micro-electromechanical system) microphones or electret microphones, which are small, have small errors, are well suited to integration in small devices such as user equipment, and are suitable for beam control. Compared with previous spatial audio acquisition devices, the use of omnidirectional miniature microphones greatly reduces the size of the spatial audio acquisition system.

Traditional microphone beamforming includes delay-sum, filter-sum, adaptive beamforming (MVDR), and differential beamforming. Differential beamforming has the advantages of compact layout and frequency-invariant beam pattern. In the embodiments of the present disclosure, for multiple groups of microphone arrays, the directivity can be controlled by differential microphone beam technology to assist in obtaining spatial audio signals. In the present disclosure, different differential beam designs are used to combine different low-order arrays to obtain the required directivity in three-dimensional space, thereby collecting spatial audio signals. The solution of the present disclosure relies only on beam technology to control directivity, which can effectively reduce the dependence of the pickup system on electroacoustic and acoustic hardware.

In summary, according to the spatial audio acquisition method provided by the present disclosure, multiple groups of mutually orthogonal microphone arrays are arranged in the UE, and differential beam processing is performed on the microphone signals obtained by the microphone array to obtain spatial audio signals. The present disclosure can use mutually orthogonal microphone arrays formed by micro-microphones using beam technology to control the size of the spatial audio acquisition system within a certain size so that it can be built into current mobile devices, forming a pickup system that can be built into mobile smart devices. At the same time, through differential beam technology, the directivity of the microphone array is controlled to reduce the requirements for additional electroacoustic and acoustic hardware, thereby solving the requirements of mobile smart devices for collecting immersive audio while controlling the size of the device.

Fig. 2 shows a schematic flow chart of a spatial audio acquisition method according to an embodiment of the present disclosure. The method may be executed by a UE. In the embodiment of the present disclosure, the arrangement of the microphone array is first introduced.

In some optional embodiments, the microphone array includes a predetermined number of microphones, which form three groups of microphone arrays. The three groups of microphone arrays are orthogonal to each other or the angle deviation from the orthogonality error is within a predetermined range, and the centers of the three groups of microphone arrays coincide or have a distance that does not exceed an error threshold.

In other words, the number of microphones in the microphone array of the present disclosure is not limited, and each array can be composed of any number of microphones. In a preferred embodiment, four MEMS microphones are the most cost-effective design, for example, arranged on four adjacent vertices of a regular hexahedron to form three mutually orthogonal microphone arrays.

The present disclosure does not limit the type of microphone, and a miniature microphone (such as MEMS) can be used to control the size of the sound pickup system. Compared with previous spatial audio acquisition devices, the solution provided by the present disclosure can greatly reduce the volume.

It should be understood that arranging the three microphone arrays at orthogonal angles is a preferred embodiment of the present invention. In some optional embodiments, the three microphone arrays may have a certain angular offset. Under ideal conditions, the centers of the three arrays coincide completely; in some optional embodiments, a separation or a certain distance between the centers can be treated as an error. Of course, the microphone arrays on the actual device should remain orthogonal to one another to reduce the interference caused by position errors. In addition, since position errors, inconsistencies between microphones, and interference from the device itself all affect the final performance, calibration is required based on actual conditions.

For example, the present disclosure uses omnidirectional miniature microphones with completely identical parameters to arrange three pairs of mutually orthogonal microphones, and the midpoints of the connecting lines of each pair of microphones coincide. The microphone signals constituting the array can be reused, so at least four microphones are needed to form the microphone array required by the present invention, which can be arranged at any four vertices of a regular hexahedron. Due to the volume limitation of mobile smart devices, a preferred embodiment of the present disclosure recommends the arrangement of four microphones, which are arranged at four adjacent vertices of a regular hexahedron, with the main axis directions of the microphones consistent and the spacing between the microphone arrays as small as possible.

In this example, a three-dimensional coordinate system is established with microphone 0 as the origin, microphone 1 on the x-axis, microphone 2 on the y-axis, and microphone 3 on the z-axis. The distances from microphone 0 to microphone 1, microphone 2, and microphone 3 are equal, forming three pairs of orthogonal first-order differential arrays. Since miniature microphones are much smaller than traditional condenser and dynamic microphones, the spacing of each of the three microphone pairs can be kept at 4 mm, which is much smaller than the shortest wavelength (about 1.7 cm, at 20 kHz) of the target signal (20 Hz to 20 kHz), so the error caused by the microphone distance can be ignored.
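The geometry described above can be checked with a short sketch. The 4 mm spacing and the 20 kHz upper limit come from the text; the speed of sound (343 m/s) and all names in the code are assumptions made for illustration only.

```python
# Hypothetical geometry check for the four-microphone layout described above.
SPACING_M = 0.004        # 4 mm spacing, from the text
SPEED_OF_SOUND = 343.0   # m/s, assumed (air at roughly 20 degrees C)

# Microphone 0 at the origin; microphones 1-3 on the x, y and z axes.
MIC_POSITIONS = [
    (0.0, 0.0, 0.0),
    (SPACING_M, 0.0, 0.0),
    (0.0, SPACING_M, 0.0),
    (0.0, 0.0, SPACING_M),
]


def pair_axis(i, j):
    """Vector pointing from microphone i to microphone j."""
    return tuple(b - a for a, b in zip(MIC_POSITIONS[i], MIC_POSITIONS[j]))


def dot(u, v):
    """Dot product of two 3-D vectors."""
    return sum(a * b for a, b in zip(u, v))


def shortest_wavelength(f_max_hz=20_000.0):
    """Wavelength at the highest frequency of interest."""
    return SPEED_OF_SOUND / f_max_hz


# The three differential pairs all share microphone 0.
axes = [pair_axis(0, k) for k in (1, 2, 3)]
orthogonal = all(
    dot(axes[a], axes[b]) == 0.0 for a in range(3) for b in range(a + 1, 3)
)

print("pairs mutually orthogonal:", orthogonal)
print("shortest wavelength [mm]:", round(shortest_wavelength() * 1000, 2))
print("spacing / shortest wavelength:", round(SPACING_M / shortest_wavelength(), 3))
```

The printed ratio gives a quick sense of how the spacing compares with the wavelength at the top of the audio band; at lower frequencies the ratio shrinks proportionally.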

Based on the embodiment shown in FIG. 1 , as shown in FIG. 2 , the method may include the following steps.

S201, filtering the microphone signal acquired by the microphone array to obtain low-frequency components and high-frequency components.

In an embodiment of the present disclosure, the microphone signals acquired by the microphone array are filtered, wherein the obtained low-frequency components are output as low-frequency effects, and the high-frequency components are used for subsequent processing to form spatial audio signals, as shown in FIG3 , which shows a logical schematic diagram of spatial audio acquisition described in the present disclosure.

It should be understood that, due to the high-pass characteristic of the differential beam, performance in the low-frequency range is poor. The original signal of microphone 0 (i.e., a microphone signal obtained by the microphone array) can therefore be passed through a low-pass filter so that only the low-frequency component is retained as the LFE (low-frequency effects) channel. Because the low-frequency component has a longer wavelength, it has little influence on localization by the human ear, so it strengthens the low-frequency effect without affecting the sense of space. For the remaining channels, a high-pass filter removes the low-frequency component, and the resulting high-frequency component is used in subsequent processing to form the spatial audio signal.
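The band split in step S201 might be sketched as follows. The text does not specify the filter type or the crossover frequency, so the one-pole low-pass, the 120 Hz cutoff, the 48 kHz sample rate and all function names here are assumptions chosen only to keep the example short.

```python
import math


def one_pole_lowpass(samples, cutoff_hz, sample_rate_hz):
    """First-order IIR low-pass: y[n] = y[n-1] + a * (x[n] - y[n-1])."""
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate_hz)
    out, state = [], 0.0
    for x in samples:
        state += a * (x - state)
        out.append(state)
    return out


def split_bands(samples, cutoff_hz=120.0, sample_rate_hz=48_000.0):
    """Return (low, high), where high is the complementary residual x - low."""
    low = one_pole_lowpass(samples, cutoff_hz, sample_rate_hz)
    high = [x - l for x, l in zip(samples, low)]
    return low, high


def rms(samples):
    """Root-mean-square level of a block of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def tone(freq_hz, n=48_000, sample_rate_hz=48_000.0):
    """One second of a unit-amplitude sine wave."""
    return [math.sin(2.0 * math.pi * freq_hz * k / sample_rate_hz) for k in range(n)]


# A 40 Hz tone should land mostly in the LFE path, a 4 kHz tone in the high band.
lfe_40, high_40 = split_bands(tone(40.0))
lfe_4k, high_4k = split_bands(tone(4_000.0))
print("40 Hz  -> low RMS %.3f, high RMS %.3f" % (rms(lfe_40), rms(high_40)))
print("4 kHz  -> low RMS %.3f, high RMS %.3f" % (rms(lfe_4k), rms(high_4k)))
```

In this sketch the low band of microphone 0 would feed the LFE channel, while the high band of every microphone would go on to the differential beam processing. A practical design would likely use steeper crossover filters than a single pole.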

S202, adding appropriate delay filtering and corresponding compensation filters to the microphone signal to obtain an array signal with required directivity.

It should be understood that, for the high frequency components obtained in step S201, appropriate delay filtering and corresponding compensation filters may be added to obtain an array signal with desired directivity.

S203, obtaining multiple directivities of the microphone array.

In the embodiments of the present disclosure, directivity represents the sensitivity of signals in different directions. The present disclosure obtains the required directivity in three-dimensional space by obtaining differential arrays of multiple directivities and combining different differential arrays, as shown in FIG4 , which shows a schematic diagram of a first-order differential array.

Specifically, the above steps S202-S203 are described in detail below.

The standard first-order differential array obtains the target signal by subtracting the signals of two microphones that share the same main axis direction, and controls the directivity by adding a constant (frequency-independent) delay to the subtracted microphone signal:

First, let τ_0 = δ/c,

where δ is the microphone spacing and c is the speed of sound.

The output compensation filter can be expressed as:

where ω is the angular frequency and α_{1,1} is the delay filter coefficient.

Therefore, for a signal at angle θ (the incident angle of the sound source at the microphone), the signal output by the array (i.e., the array signal mentioned above) is expressed as: Y(ω,θ) = (X_1(ω,θ) - X_2(ω,θ)) H_L(ω), where X_n(ω,θ) represents the nth microphone signal.

Since the distance between the microphones is much smaller than the wavelength, ω(τ_0 - α_{1,1} τ_0) << 2π, so the amplitude difference between X_1 and X_2 can be ignored and the approximation e^x ≈ 1 + x can be applied.

Therefore, the signal Y(ω,θ) output by the array can be expressed as:

Then the directivity of the array (signal sensitivity in different directions) is:

After simplification, it is expressed as:

The two most common orientations are (with the main axis at 90°):

Directivity	α_{1,1}	Angle at which sensitivity is 0
Figure-8 / dipole	0	0°, 180°
Cardioid	-1	-90°

Therefore, by controlling the delay filter coefficient, the direction of the differential beam can be controlled.
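The effect of the delay filter coefficient can be illustrated numerically. The sketch below is not taken from the disclosure: it assumes the textbook first-order differential formulation, in which the rear microphone is delayed by -α_{1,1}·τ_0 before subtraction and the output is compensated by 1/(jωτ_0(1 - α_{1,1})), giving an approximate pattern (sin θ - α_{1,1})/(1 - α_{1,1}) when the main axis is at 90°. The function names, the 1 kHz test frequency and the 343 m/s speed of sound are likewise assumptions.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def first_order_response(theta_deg, alpha, freq_hz=1_000.0, spacing_m=0.004):
    """Compensated output of a two-microphone differential pair for a unit plane wave."""
    omega = 2.0 * math.pi * freq_hz
    tau0 = spacing_m / SPEED_OF_SOUND
    s = math.sin(math.radians(theta_deg))        # main axis of the pair at 90 degrees
    x1 = cmath.exp(+0.5j * omega * tau0 * s)     # front microphone
    x2 = cmath.exp(-0.5j * omega * tau0 * s)     # rear microphone
    # alpha <= 0 corresponds to delaying the rear microphone by -alpha * tau0.
    delayed_x2 = x2 * cmath.exp(1j * omega * tau0 * alpha)
    compensation = 1.0 / (1j * omega * tau0 * (1.0 - alpha))
    return (x1 - delayed_x2) * compensation


def ideal_pattern(theta_deg, alpha):
    """Small-spacing approximation of the same pattern."""
    return (math.sin(math.radians(theta_deg)) - alpha) / (1.0 - alpha)


for name, alpha in (("figure-8", 0.0), ("cardioid", -1.0)):
    gains = {t: abs(first_order_response(t, alpha)) for t in (-90, 0, 90, 180)}
    print(name, {t: round(g, 3) for t, g in gains.items()})
```

Under these assumptions, a coefficient of 0 places the nulls at 0° and 180°, and a coefficient of -1 places a single null at -90°, in line with the table above.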

S204, decoding the array signal to obtain a spatial audio signal.

In the embodiments of the present disclosure, according to the principle of differential array, three pairs of microphones can form the following five first-order differential arrays with different directivities:

Serial number	Main axis direction	Directivity	Microphones used
Array 1	Positive X axis	Cardioid	Microphone 0, Microphone 1
Array 2	Negative X axis	Cardioid	Microphone 0, Microphone 1
Array 3	Positive Y axis	Figure-8	Microphone 0, Microphone 2
Array 4	Positive Z axis	Figure-8	Microphone 0, Microphone 3
Array 5	Positive X axis	Figure-8	Microphone 0, Microphone 1

By combining different first-order arrays, the required directivity in three-dimensional space is obtained, thereby collecting spatial audio signals.

S205, decoding the spatial audio signal to output immersive multi-channel audio and/or ambisonic audio.

In the embodiments of the present disclosure, different differential beam designs are used to obtain audio signals required for spatial audio. For example, different audio formats such as multi-channel audio and ambisonic (B-format) can be output, where multi-channel audio and ambisonic audio are two immersive (surround sound) formats.

For example, in an optional embodiment, an M/S-3D recording format is constructed according to the M/S (mid/side) recording principle, and 5.1.4-channel audio is output by decoding the spatial audio signal. Two cardioid arrays with opposite directivities point in the positive and negative directions of the X axis, and two figure-8 arrays point in the positive direction of the Y axis and the positive direction of the Z axis, respectively.

The decoding method for obtaining multi-channel audio is as follows, where "+" indicates that the array signal is added and "-" indicates that the phase-inverted array signal is added (i.e., the signal is subtracted).

Channel	Array 1	Array 2	Array 3	Array 4
Left	+		+	-
Center	+			
Right	+		-	-
Left surround		+	+	-
Right surround		+	-	-
Top front left	+		+	+
Top front right	+			+
Top rear left		+	+	+
Top rear right		+	-	+
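The decoding table above can be read as a sign matrix applied to the four array signals. The following sketch transcribes the signs exactly as they appear in the table; the channel identifiers, the function name and the absence of any gain normalization are assumptions made for illustration, and the LFE channel is not included because it is derived separately from the low-pass path.

```python
# Signs copied from the decoding table as given: +1 adds the array signal,
# -1 adds the phase-inverted signal, 0 means the array is not used.
# Column order: array 1, array 2, array 3, array 4.
DECODE_MATRIX = {
    "left":            (+1,  0, +1, -1),
    "center":          (+1,  0,  0,  0),
    "right":           (+1,  0, -1, -1),
    "left_surround":   ( 0, +1, +1, -1),
    "right_surround":  ( 0, +1, -1, -1),
    "top_front_left":  (+1,  0, +1, +1),
    "top_front_right": (+1,  0,  0, +1),
    "top_rear_left":   ( 0, +1, +1, +1),
    "top_rear_right":  ( 0, +1, -1, +1),
}


def decode_frame(array_samples):
    """Mix one sample from each of arrays 1-4 into the nine full-band channels."""
    if len(array_samples) != 4:
        raise ValueError("expected one sample for each of arrays 1-4")
    return {
        name: sum(sign * sample for sign, sample in zip(signs, array_samples))
        for name, signs in DECODE_MATRIX.items()
    }


# Example frame: one sample from array 1, 2, 3 and 4 respectively.
frame = (1.0, 0.5, 0.25, 0.125)
for channel, value in decode_frame(frame).items():
    print(f"{channel:16s} {value:+.3f}")
```

A real decoder would apply this mix per sample or per block and would probably scale the sums to avoid clipping; those details are not specified in the text.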

The microphone arrays proposed in the present invention are the five arrays listed above. The directivity of array 1 in the xOy plane section is shown in FIG5(a). The directivity of array 3 in the xOy plane section is shown in FIG5(b), and the directivity of array 4 in the xOz plane section is shown in FIG5(c); in both figures, + denotes positive phase and - denotes negative phase, and identical signals of opposite phase cancel each other out.

After decoding, the directivity of the left channel and the right channel in the coordinate axis plane section is shown in FIG6 , wherein the left channel is shown in FIG6( a ) and the right channel is shown in FIG6( b ).

In another embodiment of the present disclosure, standard ambisonic audio can be output. It should be understood that the first-order B-format is the first-order spherical harmonic decomposition of the sound field, as shown in FIG7. The standard B-format requires an omnidirectional signal (W) and three mutually orthogonal figure-8 directional signals (X, Y, Z). By selecting the corresponding figure-8 arrays from the five arrays listed above, the four components of the B-format can be expressed as: W = microphone 0; X = array 5; Y = array 3; Z = array 4, as shown in FIG8.
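As a reference point for the four B-format components, the sketch below computes the ideal first-order gains for a plane wave arriving from a given direction: an omnidirectional W and three figure-8 components along the coordinate axes. It makes no statement about which physical array feeds which component. The angle convention (azimuth measured from the x-axis toward the y-axis, elevation toward the z-axis), the unnormalized W, and the function name are assumptions; B-format variants differ in their normalization, which the text does not specify.

```python
import math


def encode_first_order(azimuth_deg, elevation_deg, sample=1.0):
    """Ideal first-order (W, X, Y, Z) components for one plane-wave sample."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample                                  # omnidirectional component
    x = sample * math.cos(az) * math.cos(el)    # figure-8 along the x-axis
    y = sample * math.sin(az) * math.cos(el)    # figure-8 along the y-axis
    z = sample * math.sin(el)                   # figure-8 along the z-axis
    return w, x, y, z


for label, (az, el) in {
    "along +x": (0.0, 0.0),
    "along +y": (90.0, 0.0),
    "along +z": (0.0, 90.0),
    "along -x": (180.0, 0.0),
}.items():
    print(label, tuple(round(v, 3) for v in encode_first_order(az, el)))
```

Comparing the measured array outputs against these ideal gains for a few known source directions is one possible way to check calibration.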

Therefore, the present disclosure decodes the spatial audio signal to obtain audio signals of different formats to meet the diverse needs of spatial audio acquisition.

In addition, in an optional example, the microphone array in the UE may be arranged according to actual needs.

In one example, when the handheld call requirement is taken into account, the microphone array is arranged in a position close to the human voice collection component in the UE. For example, the microphone array is arranged at the lower end of the mobile smart device, closer to the human mouth, to ensure a better signal-to-noise ratio. FIG9 shows a schematic diagram of the arrangement of the microphone array in the mobile device, where FIG9(a) is a schematic diagram of the back side of the mobile device, and FIG9(b) is a schematic diagram of the front side.

In another example, when taking into account the video effect, the microphone array is arranged in a position close to the image acquisition component in the UE. For example, the microphone array is arranged close to the camera and is consistent with the positive direction of the camera. By ensuring that the viewing angle is as consistent as possible with the camera, a better audio-visual effect is guaranteed. Figure 10 shows a schematic diagram of the arrangement of the microphone array in a mobile device, where Figure 10(a) is a schematic diagram of the back of the mobile device, and Figure 10(b) is a schematic diagram of the front.

In summary, according to the spatial audio acquisition method provided by the present disclosure, multiple groups of mutually orthogonal microphone arrays are arranged in the UE, and differential beam processing is performed on the microphone signals obtained by the microphone array to obtain spatial audio signals. The present disclosure can use mutually orthogonal microphone arrays formed by micro-microphones using beam technology to control the size of the spatial audio acquisition system within a certain size, so that it can be built into current mobile devices, forming a pickup system that can be built into mobile smart devices. At the same time, through differential beam technology, the directivity of the microphone array is controlled to reduce the requirements for additional electroacoustic and acoustic hardware, thereby solving the requirements of mobile smart devices for collecting immersive audio while controlling the size of the device. In addition, by outputting audio in different formats, different application requirements can be met, and the present disclosure can adapt to different application scenarios by arranging microphone arrays at different positions in mobile devices.

In the above embodiments provided by the present application, the method provided by the embodiment of the present application is introduced from the perspective of the user equipment. In order to implement the various functions in the method provided by the above embodiments of the present application, the user equipment may include a hardware structure and a software module, and implement the above functions in the form of a hardware structure, a software module, or a hardware structure plus a software module. A certain function of the above functions can be executed in the form of a hardware structure, a software module, or a hardware structure plus a software module.

Corresponding to the spatial audio acquisition methods provided in the above-mentioned embodiments, the present disclosure also provides a spatial audio acquisition device. Since the spatial audio acquisition device provided in the embodiment of the present disclosure corresponds to the spatial audio acquisition methods provided in the above-mentioned embodiments, the implementation method of the spatial audio acquisition method is also applicable to the spatial audio acquisition device provided in this embodiment, and will not be described in detail in this embodiment.

FIG11 is a schematic diagram of the structure of a spatial audio collection device 1100 provided in an embodiment of the present disclosure. The spatial audio collection device 1100 is arranged in a user equipment UE for execution. A plurality of microphone arrays are arranged in the UE, and the maximum response directions of each array are mutually orthogonal.

As shown in FIG. 11 , the apparatus 1100 includes: a spatial audio signal acquisition module 1110 for performing differential beam processing on microphone signals acquired by a microphone array to acquire spatial audio signals.

According to the spatial audio acquisition device provided by the present disclosure, multiple groups of mutually orthogonal microphone arrays are arranged in the UE, and differential beam processing is performed on the microphone signals obtained by the microphone array to obtain spatial audio signals. The present disclosure can use mutually orthogonal microphone arrays formed by micro-microphones using beam technology to control the size of the spatial audio acquisition system within a certain size while controlling the directivity of the microphones so that it can be built into current mobile devices, forming a pickup system that can be built into mobile smart devices. Through differential beam technology, the directivity of the signal collected by the pickup system is controlled, and the requirements for additional electroacoustic and acoustic hardware are reduced, thereby solving the requirements of mobile smart devices for collecting immersive audio while controlling the size of the device.

In some embodiments, the spatial audio signal acquisition module 1110 is further used to: add appropriate delay filtering and corresponding compensation filters to the microphone signal to obtain an array signal with desired directivity; and decode the array signal to obtain the spatial audio signal.

In some embodiments, the spatial audio signal acquisition module 1110 is further used to: acquire multiple directivities of the microphone array, where the directivities represent the sensitivity of signals in different directions.

In some embodiments, the spatial audio signal acquisition module 1110 is further used to: acquire a plurality of directivity differential arrays; and acquire the required directivity in three-dimensional space by combining different differential arrays to acquire the spatial audio signal.

In some embodiments, the spatial audio signal acquisition module 1110 is further configured to: decode the spatial audio signal to output immersive multi-channel audio and/or ambisonic audio.
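
The decoding to multi-channel audio might, for example, be approximated by aiming one virtual cardioid at each loudspeaker direction. The Python sketch below does this for horizontal first-order signals `w`, `x`, and `y`. The five-channel layout, the channel names, the unscaled `w` convention, and the function name `decode_to_speakers` are all assumptions chosen for illustration; the present disclosure does not specify a particular decoder or normalization.

```python
import numpy as np

# Assumed loudspeaker azimuths in degrees (positive = to the left).
SPEAKER_AZIMUTHS_DEG = {
    "C": 0.0,
    "L": 30.0,
    "R": -30.0,
    "Ls": 110.0,
    "Rs": -110.0,
}


def decode_to_speakers(w, x, y, alpha=0.5):
    """Derive one feed per loudspeaker from horizontal first-order signals.

    Each feed is a virtual first-order microphone (cardioid for
    alpha = 0.5) pointed at the corresponding loudspeaker azimuth.
    """
    feeds = {}
    for name, az_deg in SPEAKER_AZIMUTHS_DEG.items():
        az = np.deg2rad(az_deg)
        feeds[name] = alpha * w + (1.0 - alpha) * (np.cos(az) * x + np.sin(az) * y)
    return feeds


if __name__ == "__main__":
    t = np.arange(480) / 48000.0
    s = np.sin(2.0 * np.pi * 440.0 * t)
    az = np.deg2rad(30.0)  # source located at the left loudspeaker
    feeds = decode_to_speakers(s, np.cos(az) * s, np.sin(az) * s)
    for name, feed in feeds.items():
        print(name, round(float(np.sqrt(np.mean(feed ** 2))), 3))
```

In this sketch a source located exactly at a loudspeaker direction is reproduced at full level by that loudspeaker and at reduced level by its neighbours; practical decoders typically add layout-dependent weighting that is omitted here.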

In some embodiments, the spatial audio signal acquisition module 1110 is further used to filter the microphone signal to obtain low-frequency components and high-frequency components, wherein the low-frequency components are output as low-frequency effects and the high-frequency components are used to form a spatial audio signal.
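
The band splitting described above can be sketched, for example, with a complementary filter pair: a one-pole low-pass filter produces the low-frequency component and the remainder is taken as the high-frequency component. The crossover frequency of 120 Hz, the one-pole filter, and the function name `split_bands` are assumptions for illustration only; the present disclosure does not specify the filter type or the crossover frequency.

```python
import numpy as np


def split_bands(signal, sample_rate, crossover_hz=120.0):
    """Split a signal into complementary low and high components.

    The low component comes from a one-pole low-pass filter; the high
    component is the remainder, so low + high reproduces the input.
    """
    x = np.asarray(signal, dtype=float)
    # Feedback coefficient of the one-pole low-pass filter.
    a = np.exp(-2.0 * np.pi * crossover_hz / sample_rate)
    low = np.empty_like(x)
    state = 0.0
    for n, sample in enumerate(x):
        state = (1.0 - a) * sample + a * state
        low[n] = state
    high = x - low
    return low, high


if __name__ == "__main__":
    fs = 48000
    t = np.arange(fs) / fs
    mic = np.sin(2.0 * np.pi * 30.0 * t) + np.sin(2.0 * np.pi * 5000.0 * t)
    low, high = split_bands(mic, fs)
    # low would feed the low-frequency-effects output, while high would
    # continue to the differential beam processing.
    print("low RMS:", np.sqrt(np.mean(low ** 2)))
    print("high RMS:", np.sqrt(np.mean(high ** 2)))
```

A steeper crossover would usually be preferred in practice; the one-pole pair is used here only because it keeps the two components exactly complementary with very little code.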

According to the spatial audio acquisition device provided by the present disclosure, multiple groups of mutually orthogonal microphone arrays are arranged in the UE, and differential beam processing is performed on the microphone signals acquired by the microphone arrays to obtain spatial audio signals. The present disclosure can use mutually orthogonal microphone arrays formed by miniature microphones, together with beam technology, to keep the spatial audio acquisition system within a certain size while controlling the directivity of the microphones, so that the system can be built into existing mobile devices, forming a pickup system that can be built into mobile smart devices. Differential beam technology controls the directivity of the signal collected by the pickup system and reduces the need for additional electroacoustic and acoustic hardware, thereby meeting the requirement of mobile smart devices to collect immersive audio while keeping the size of the device under control. In addition, outputting audio in different formats allows different application requirements to be met, and arranging the microphone arrays at different positions in the mobile device allows the present disclosure to adapt to different application scenarios.

Please refer to FIG. 12, which is a schematic diagram of the structure of a communication device 1200 provided in an embodiment of the present application. The communication device 1200 may be a network device or a user device, or may be a chip, a chip system, or a processor that supports the network device or the user device in implementing the above method. The device can be used to implement the method described in the above method embodiments; for details, refer to the description in those embodiments.

The communication device 1200 may include one or more processors 1201. The processor 1201 may be a general-purpose processor or a dedicated processor, etc. For example, it may be a baseband processor or a central processing unit. The baseband processor may be used to process the communication protocol and the communication data, and the central processing unit may be used to control the communication device (such as a base station, a baseband chip, a terminal device, a terminal device chip, a DU or a CU, etc.), execute a computer program, and process the data of the computer program.

Optionally, the communication device 1200 may further include one or more memories 1202, on which a computer program 1204 may be stored, and the processor 1201 executes the computer program 1204 so that the communication device 1200 performs the method described in the above method embodiment. Optionally, data may also be stored in the memory 1202. The communication device 1200 and the memory 1202 may be provided separately or integrated together.

Optionally, the communication device 1200 may further include a transceiver 1205 and an antenna 1206. The transceiver 1205 may also be referred to as a transceiver unit or a transceiver circuit, and is used to implement a transceiver function. The transceiver 1205 may include a receiver and a transmitter. The receiver may also be referred to as a receiving unit or a receiving circuit, and is used to implement a receiving function; the transmitter may also be referred to as a transmitting unit or a transmitting circuit, and is used to implement a transmitting function.

Optionally, the communication device 1200 may further include one or more interface circuits 1207. The interface circuit 1207 is used to receive code instructions and transmit them to the processor 1201. The processor 1201 executes the code instructions to enable the communication device 1200 to execute the method described in the above method embodiment.

In one implementation, the processor 1201 may include a transceiver for implementing receiving and sending functions. For example, the transceiver may be a transceiver circuit, an interface, or an interface circuit. The transceiver circuit, interface, or interface circuit for implementing the receiving and sending functions may be separate or integrated. The above-mentioned transceiver circuit, interface, or interface circuit may be used for reading and writing code/data, or the above-mentioned transceiver circuit, interface, or interface circuit may be used for transmitting or delivering signals.

In one implementation, the processor 1201 may store a computer program 1203, which runs on the processor 1201 and enables the communication device 1200 to perform the method described in the above method embodiment. The computer program 1203 may be embedded in the processor 1201, in which case the processor 1201 may be implemented in hardware.

In one implementation, the communication device 1200 may include a circuit that can implement the functions of sending, receiving, or communicating in the aforementioned method embodiments. The processor and transceiver described in the present application may be implemented in an integrated circuit (IC), an analog IC, a radio frequency integrated circuit (RFIC), a mixed-signal IC, an application-specific integrated circuit (ASIC), a printed circuit board (PCB), an electronic device, etc. The processor and transceiver may also be manufactured using various IC process technologies, such as complementary metal oxide semiconductor (CMOS), N-type metal oxide semiconductor (NMOS), P-type metal oxide semiconductor (PMOS), bipolar junction transistor (BJT), bipolar CMOS (BiCMOS), silicon germanium (SiGe), gallium arsenide (GaAs), etc.

The communication device described in the above embodiments may be a network device or a user device, but the scope of the communication device described in the present application is not limited thereto, and the structure of the communication device is not limited by FIG. 12. The communication device may be an independent device or may be part of a larger device. For example, the communication device may be:

(1) An independent integrated circuit (IC), a chip, or a chip system or subsystem;

(2) A set of one or more ICs, which optionally may also include a storage component for storing data and computer programs;

(3) An ASIC, such as a modem;

(4) A module that can be embedded in other devices;

(5) A receiver, terminal device, intelligent terminal device, cellular phone, wireless device, handheld device, mobile unit, vehicle-mounted device, network device, cloud device, artificial intelligence device, etc.;

(6) Others.

For the case where the communication device is a chip or a chip system, please refer to the schematic diagram of the chip structure shown in FIG. 13. The chip shown in FIG. 13 includes a processor 1301 and an interface 1302. There may be one or more processors 1301, and there may be multiple interfaces 1302.

Optionally, the chip further includes a memory 1303, and the memory 1303 is used to store necessary computer programs and data.

Those skilled in the art may also understand that the various illustrative logical blocks and steps listed in the embodiments of the present application may be implemented by electronic hardware, computer software, or a combination of the two. Whether such functions are implemented by hardware or software depends on the specific application and the design requirements of the entire system. Those skilled in the art may use various methods to implement the functions for each specific application, but such implementation should not be understood as exceeding the scope of protection of the embodiments of the present application.

The present application also provides a readable storage medium having instructions stored thereon, which implement the functions of any of the above method embodiments when executed by a computer.

The present application also provides a computer program product, which implements the functions of any of the above method embodiments when executed by a computer.

The above embodiments can be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented using software, they can be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer programs. When the computer program is loaded and executed on a computer, the process or function according to the embodiments of the present application is generated in whole or in part. The computer can be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer program can be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another. For example, the computer program can be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless means (e.g., infrared, radio, microwave). The computer-readable storage medium can be any available medium that a computer can access, or a data storage device, such as a server or data center, that integrates one or more available media. The available media can be magnetic media (e.g., floppy disks, hard disks, tapes), optical media (e.g., high-density digital video discs (DVD)), or semiconductor media (e.g., solid state disks (SSD)), etc.

A person skilled in the art may understand that the various numerical designations such as first and second involved in the present application are used only for convenience of description; they do not limit the scope of the embodiments of the present application, nor do they indicate an order of precedence.

"At least one" in the present application can also be described as one or more, and "a plurality" can be two, three, four, or more, which is not limited in the present application. In the embodiments of the present application, instances of a given kind of technical feature are distinguished by "first", "second", "third", "A", "B", "C", "D", etc., and there is no order of precedence or magnitude among the technical features so labeled.

As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., disk, optical disk, memory, programmable logic device (PLD)) for providing machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal for providing machine instructions and/or data to a programmable processor.

The systems and techniques described herein may be implemented in a computing system that includes back-end components (e.g., as a data server), or a computing system that includes middleware components (e.g., an application server), or a computing system that includes front-end components (e.g., a user computer with a graphical user interface or a web browser through which a user can interact with implementations of the systems and techniques described herein), or a computing system that includes any combination of such back-end components, middleware components, or front-end components. The components of the system may be interconnected by any form or medium of digital data communication (e.g., a communications network). Examples of communications networks include: a local area network (LAN), a wide area network (WAN), and the Internet.

A computer system may include clients and servers. Clients and servers are generally remote from each other and usually interact through a communication network. The relationship of client and server is generated by computer programs running on respective computers and having a client-server relationship to each other.

It should be understood that steps may be reordered, added, or deleted in the various forms of processes shown above. For example, the steps recorded in this disclosure can be executed in parallel, sequentially, or in a different order, as long as the desired results of the technical solutions disclosed herein can be achieved, and this document does not limit this.

In addition, it should be understood that the various embodiments of the present application may be implemented individually or in combination with other embodiments when the solution permits.

Those of ordinary skill in the art will appreciate that the units and algorithm steps of each example described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and the design constraints of the technical solution. Skilled practitioners can use different methods to implement the described functions for each specific application, but such implementation should not be considered to be beyond the scope of this application.

Those skilled in the art can clearly understand that, for the convenience and brevity of description, the specific working processes of the systems, devices and units described above can refer to the corresponding processes in the aforementioned method embodiments and will not be repeated here.

The above are only specific implementations of the present application, but the protection scope of the present application is not limited thereto. Any change or substitution that a person skilled in the art can readily conceive within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

1. A spatial audio acquisition method, characterized in that the method is performed by a user equipment (UE), wherein multiple groups of microphone arrays are arranged in the UE and the maximum response directions of the groups of arrays are mutually orthogonal, the method comprising:

performing differential beam processing on microphone signals acquired by the microphone arrays to acquire a spatial audio signal.

2. The method according to claim 1, characterized in that the performing differential beam processing on the microphone signals acquired by the microphone arrays to acquire the spatial audio signal comprises:

applying appropriate delay filtering and corresponding compensation filtering to the microphone signals to obtain an array signal with the desired directivity; and

decoding the array signal to obtain the spatial audio signal.

3. The method according to claim 2, characterized in that the method further comprises:

acquiring a plurality of directivities of the microphone array, where each directivity represents the sensitivity to signals arriving from different directions.

4. The method according to claim 3, characterized in that the method further comprises:

acquiring differential arrays having the plurality of directivities; and

combining different differential arrays to obtain the required directivity in three-dimensional space, thereby acquiring the spatial audio signal.

5. The method according to any one of claims 1 to 4, characterized in that the method further comprises:

decoding the spatial audio signal to output immersive multi-channel audio and/or ambisonic audio.

6. The method according to any one of claims 1 to 5, characterized in that the method further comprises:

filtering the microphone signals to obtain low-frequency components and high-frequency components,

wherein the low-frequency components are output as low-frequency effects, and the high-frequency components are used to form the spatial audio signal.

7. The method according to any one of claims 1 to 6, characterized in that the microphone array is arranged in the UE in any of the following ways:

the microphone array is arranged at a position in the UE close to a human voice collection component;

the microphone array is arranged at a position in the UE close to an image acquisition component.

8. The method according to any one of claims 1 to 7, characterized in that the microphone array includes a predetermined number of microphones, the predetermined number of microphones form three groups of microphone arrays, the three groups of microphone arrays are orthogonal to each other or deviate from orthogonality by an angle within a predetermined range, and the centers of the three groups of microphone arrays coincide or are separated by a distance that does not exceed an error threshold.

9. A spatial audio acquisition device, characterized in that the device is arranged in a user equipment (UE), multiple groups of microphone arrays are arranged in the UE, and the maximum response directions of the groups of arrays are mutually orthogonal, the device comprising:

a spatial audio signal acquisition module, used to perform differential beam processing on microphone signals acquired by the microphone arrays to acquire a spatial audio signal.

10. A communication device, comprising: a transceiver; a memory; and a processor, connected to the transceiver and the memory respectively, configured to control the wireless signal reception and transmission of the transceiver by executing computer executable instructions on the memory, and capable of implementing the method according to any one of claims 1 to 8.

11. A computer storage medium, wherein the computer storage medium stores computer executable instructions, and the computer executable instructions, when executed by a processor, implement the method according to any one of claims 1 to 8.