CN115938339A - Audio data processing method and system - Google Patents

Audio data processing method and system

Info

Publication number
CN115938339A
CN115938339A
Authority
CN
China
Prior art keywords
audio data
processing
audio
real
node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210288316.XA
Other languages
Chinese (zh)
Inventor
张平
刘腾腾
夏溧
周健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Finite Element Technology Co Ltd
Original Assignee
Beijing Finite Element Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Finite Element Technology Co Ltd filed Critical Beijing Finite Element Technology Co Ltd
Priority to CN202210288316.XA priority Critical patent/CN115938339A/en
Publication of CN115938339A publication Critical patent/CN115938339A/en
Pending legal-status Critical Current

Landscapes

  • Signal Processing For Digital Recording And Reproducing (AREA)

Abstract

The application provides a method and a system for processing audio data. When the start of a double-recording process is detected, a real-time audio data stream is acquired through an audio engine, and an audio processing instance corresponding to the real-time audio data stream is created; at least one audio data processing node is selected, and an audio data processing chain for processing the real-time audio data stream corresponding to the audio processing instance is generated; the real-time audio data stream is input into the audio data processing chain for audio data processing, so as to obtain target audio data corresponding to the real-time audio data stream; and the target audio data is output through the audio engine. The audio data processing method provided by the application is mainly applied to double-recording scenarios, where the desired audio effects and latency characteristics can be flexibly configured and synthesized, thereby improving the user experience.

Description

Audio data processing method and system
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a method and a system for processing audio data.
Background
In the double-recording process, because of the quality requirements on the double-recorded video, the processing of the double-recorded audio data is very important. At present, a commonly used audio processing scheme performs simple audio recognition, acquisition, playback, and other processing based on the AVAudioPlayer framework. However, this approach cannot handle requirements such as voice changing, headphone audio synthesis, reverberation, and speech synthesis, so the finally generated audio data fails to meet the requirements, which affects the business process.
Disclosure of Invention
It is an object of the present application to overcome the above problems, or at least to partially solve or mitigate them.
According to an aspect of the present application, there is provided a method of processing audio data, including:
when detecting that the double-recording process is started, acquiring a real-time audio data stream through an audio engine, and creating an audio processing instance corresponding to the real-time audio data stream;
selecting at least one audio data processing node to generate an audio data processing chain for processing the real-time audio data stream corresponding to the audio processing instance;
inputting the real-time audio data stream into the audio data processing chain to perform audio data processing, so as to obtain target audio data corresponding to the real-time audio data stream;
outputting the target audio data through the audio engine.
Optionally, after selecting at least one audio data processing node and before creating the audio processing instance corresponding to the real-time audio data stream, the method further includes:
and establishing a connection relationship between each audio processing node and the audio engine, so that the audio engine can manage and invoke each audio processing node in an associated manner.
Optionally, the selecting at least one target audio data processing node to generate an audio data processing chain for processing a real-time audio data stream corresponding to the audio processing instance includes:
acquiring characteristic parameters corresponding to the real-time audio data stream;
selecting at least one audio data processing node according to the characteristic parameters corresponding to the real-time audio data stream in combination with audio processing requirements, and setting processing parameters of each audio data processing node; the processing parameters include, but are not limited to, a default profile, the number of channels, and the number of samples;
and connecting an input node, an output node and each audio data processing node in the audio engine to generate an audio data processing chain for processing the real-time audio data stream.
Optionally, the audio processing node includes a player node, a sound effect node, and a composition node.
According to another aspect of the present application, there is provided a system for processing audio data, comprising:
the data acquisition module is used for acquiring a real-time audio data stream through an audio engine when detecting that the double-recording process is started, and creating an audio processing instance corresponding to the real-time audio data stream;
the processing chain generating module is used for selecting at least one audio data processing node and generating an audio data processing chain for processing the real-time audio data stream corresponding to the audio processing instance;
the audio processing module is used for inputting the real-time audio data stream into the audio data processing chain to perform audio data processing so as to obtain target audio data corresponding to the real-time audio data stream;
and the output module is used for outputting the target audio data through the audio engine.
Optionally, the processing chain generation module is further configured to:
and establishing a connection relationship between each audio processing node and the audio engine, so that the audio engine can manage and invoke each audio processing node in an associated manner.
Optionally, the processing chain generating module is further configured to:
acquiring characteristic parameters corresponding to the real-time audio data stream;
selecting at least one audio data processing node according to the characteristic parameters corresponding to the real-time audio data stream in combination with audio processing requirements, and setting processing parameters of each audio data processing node; the processing parameters include, but are not limited to, a default profile, the number of channels, and the number of samples;
and connecting the input node, the output node and each audio data processing node in the audio engine to generate an audio data processing chain for processing the real-time audio data stream.
According to another aspect of the present application, there is also provided a computing device comprising a memory, a processor and a computer program stored in the memory and executable by the processor, wherein the processor when executing the computer program implements the audio data processing method according to any one of the above.
According to another aspect of the present application, there is also provided a computer-readable storage medium, preferably a non-volatile readable storage medium, having stored therein a computer program which, when executed by a processor, implements the method of processing audio data as described in any one of the above.
The application provides a method and a system for processing audio data, which realize audio processing in a double-recording service by selecting nodes and building an audio data processing chain with an audio engine. The method can flexibly configure and synthesize the desired audio effects and latency characteristics, thereby improving the user experience, and the audio engine enables low-latency, real-time audio processing. In addition, audio can be input through multiple paths, and special effects can be added.
The above and other objects, advantages and features of the present application will become more apparent to those skilled in the art from the following detailed description of specific embodiments thereof, taken in conjunction with the accompanying drawings.
Drawings
Some specific embodiments of the present application will be described in detail hereinafter by way of example and not by way of limitation with reference to the accompanying drawings. The same reference numbers in the drawings identify the same or similar elements or components. Those skilled in the art will appreciate that the drawings are not necessarily to scale. In the drawings:
FIG. 1 is a flow chart of a method for processing audio data according to an embodiment of the application;
FIG. 2 is a schematic diagram of an audio data processing architecture according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a system for processing audio data according to an embodiment of the application;
FIG. 4 is a schematic diagram of a computing device architecture according to an alternative embodiment of the present application;
FIG. 5 is a schematic diagram of a computer-readable storage medium according to an embodiment of the present application.
Detailed Description
The above and other objects, advantages and features of the present application will become more apparent to those skilled in the art from the following detailed description of specific embodiments thereof, taken in conjunction with the accompanying drawings.
As shown in fig. 1, the audio data processing method in the embodiment of the present application may include at least the following steps S101 to S104.
S101, when the double-recording process is detected to be started, a real-time audio data stream is obtained through an audio engine, and an audio processing example corresponding to the real-time audio data stream is created.
The audio engine in this embodiment may be a framework dedicated to audio data processing, specifically AVAudioEngine, which provides powerful data processing capabilities. AVAudioEngine is a simplified wrapper around Core Audio that makes it easy to apply processing to audio signals, and it operates at the level of audio data streams.
The scheme provided by this embodiment is mainly applied to a double-recording service. After it is detected that a terminal has started the double-recording process, the captured audio stream data serves as the real-time audio data stream, and an audio processing instance corresponding to the real-time audio data stream is created in AVAudioEngine.
S102, selecting at least one audio data processing node, and generating an audio data processing chain for processing the real-time audio data stream corresponding to the audio processing instance.
After the audio processing instance is created, an audio data processing chain corresponding to the audio processing instance needs to be generated to process the real-time audio data stream, and generating the chain requires selecting audio data processing nodes. The basic concept of the AVAudioEngine API is to build a node graph of the audio path, from source nodes (players and the microphone), through processing nodes (mixers and effects), to destination nodes (the hardware output). Each node has a certain number of input and output buses, and each bus has a well-defined data format.
After AVAudioEngine is instantiated, three nodes are provided by default: an inputNode and an outputNode, which cannot be detached, and an optional mainMixerNode. When using AVAudioEngine, the inputNode and outputNode correspond to the hardware microphone and speaker, respectively; their sample rate and channel count should be checked, and a sample rate of 0 means the node is unavailable.
With reference to fig. 2, this embodiment always uses the three nodes in AVAudioEngine, i.e., the inputNode, the outputNode, and the mainMixerNode; in addition, player nodes (PlayerNode) and sound effect nodes are added, where the player nodes may further include a player node for background sound playback (backing track) and a player node for sound effects.
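The node graph just described can be sketched as follows. This is a minimal illustration; the variable names such as `backingTrackPlayer` and the use of a reverb unit as the sound effect node are assumptions for this sketch, not taken from the patent:

```swift
import AVFoundation

// Minimal sketch of the node graph: two player nodes and a reverb effect
// feed the engine's mainMixerNode, which in turn feeds the outputNode
// (the hardware speaker).
let engine = AVAudioEngine()
let backingTrackPlayer = AVAudioPlayerNode()  // background-sound player
let soundEffectsPlayer = AVAudioPlayerNode()  // sound-effects player
let reverbEffect = AVAudioUnitReverb()

// Nodes must be attached to the engine before they can be connected.
engine.attach(backingTrackPlayer)
engine.attach(soundEffectsPlayer)
engine.attach(reverbEffect)

// Route the backing track through the reverb into the main mixer, and the
// sound-effects player directly into the main mixer.
let format = engine.mainMixerNode.outputFormat(forBus: 0)
engine.connect(backingTrackPlayer, to: reverbEffect, format: format)
engine.connect(reverbEffect, to: engine.mainMixerNode, format: format)
engine.connect(soundEffectsPlayer, to: engine.mainMixerNode, format: format)
```

The mainMixerNode is implicitly connected to the outputNode the first time it is referenced, so no explicit mixer-to-output connection is needed in this sketch.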
Optionally, step S102 of selecting at least one target audio data processing node and generating an audio data processing chain for processing the real-time audio data stream corresponding to the audio processing instance may include the following steps:
s102-1, obtaining the characteristic parameters corresponding to the real-time audio data stream. The characteristic parameters corresponding to the real-time audio data stream may include volume, speed, whether reverberation occurs, and the like.
S102-2, selecting at least one audio data processing node according to the characteristic parameters corresponding to the real-time audio data stream in combination with audio processing requirements, and setting processing parameters of each audio data processing node; the processing parameters include, but are not limited to, a default profile, the number of channels, and the number of samples. The audio processing requirements can be customized for different business scenarios, which is not limited in the embodiments of the present application.
As described above, the characteristic parameters corresponding to the real-time audio stream data can be obtained, the specific parameters for processing the audio stream data can be set adaptively in combination with the audio processing requirements configured by the user, and the parameters thus set are then used to process the real-time audio stream.
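As a hedged sketch of how the processing parameters above (channel count, sampling parameters, a default profile for an effect) might be expressed with AVAudioEngine types — the concrete values here are illustrative assumptions, not mandated by the patent:

```swift
import AVFoundation

// The number of channels and the sample rate map onto an AVAudioFormat,
// which is passed when connecting nodes in the chain.
guard let format = AVAudioFormat(standardFormatWithSampleRate: 44_100,
                                 channels: 2) else {
    fatalError("unsupported audio format")
}

// Per-node processing parameters are then configured on each effect node
// before the chain is assembled.
let reverb = AVAudioUnitReverb()
reverb.loadFactoryPreset(.largeHall)  // a default preset ("profile")
reverb.wetDryMix = 40                 // 0 = fully dry, 100 = fully wet
```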
And S102-3, connecting the input node, the output node and each audio data processing node in the audio engine, and generating an audio data processing chain for processing the real-time audio data stream.
In the above embodiment, the input node inputNode, the output node outputNode, and the mainMixerNode in AVAudioEngine are available; in addition, the selected player nodes (PlayerNode) and sound effect nodes may be inserted between the input node and the mixing node (MixerNode). Further, in conjunction with fig. 2, the input node has a single input and multiple outputs, while the MixerNode has multiple inputs and a single output. Optionally, a tap block may also be installed on the input node and on each player node and sound effect node, so that the data flowing through those nodes can be read out.
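The tap block mentioned above can be sketched with `installTap(onBus:)`, which lets the application read the buffers flowing out of a node — here the hardware input. The buffer size and the handling inside the closure are illustrative assumptions:

```swift
import AVFoundation

let engine = AVAudioEngine()
let input = engine.inputNode
let tapFormat = input.outputFormat(forBus: 0)

// The tap delivers each captured buffer, together with its timestamp, to
// the closure; from there the data can be handed to the processing chain.
input.installTap(onBus: 0, bufferSize: 1024, format: tapFormat) { buffer, time in
    // process or record the real-time audio data in `buffer`
}
```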
Optionally, after at least one audio data processing node is selected in step S102 and before the audio processing instance corresponding to the real-time audio data stream is created, a connection relationship between each audio processing node and the audio engine may be established, so that the audio engine can manage and invoke each audio processing node in an associated manner. Specifically, AVAudioEngine needs to be associated with each node, thereby completing the creation of the audio data processing chain and ensuring that the input and output of each node are properly chained together.
And S103, inputting the real-time audio data stream into the audio data processing chain to perform audio data processing, so as to obtain target audio data corresponding to the real-time audio data stream. The following is a description by way of specific examples.
1. Adjusting reverberation with AVAudioUnitReverb
The wetDryMix property ranges from 0 to 100: 0 is fully dry, and 100 is fully wet, giving a very strong sense of space. The dry sound is the original, unprocessed signal (for example, pure vocals without music), and the wet sound is the post-processed signal.
The codes are as follows:
func toSetReverb(value: Float) { reverbEffect.wetDryMix = value }
2. Adjusting the audio playback speed with AVAudioUnitVarispeed
The codes are as follows:
func toSetRate(value: Float) { rateEffect.rate = value }
3. Adjusting volume with AVAudioUnitEQ
The globalGain property ranges from -96 to 24, in decibels.
The codes are as follows:
func toSetVolume(value: Float) { volumeEffect.globalGain = value }
4. Speech synthesis with AVSpeechSynthesizer (and AVSpeechSynthesizerDelegate)
The code is as follows:
let synthesizer = AVSpeechSynthesizer()
// set the synthesizer's delegate to listen for synthesis events
synthesizer.delegate = self
When processing audio data, the method of this embodiment can perform technical processing such as voice changing, reverberation, and speech synthesis.
And S104, outputting the target audio data through the audio engine.
After the audio data processing chain is generated, the real-time audio data stream of the double-recorded video is fed into the audio data processing chain, and after a series of audio processing steps are completed, it is output by the output node of the audio engine. Based on the method provided by the embodiments of the present application, in a double-recording scenario the desired audio effects and latency characteristics can be flexibly configured and synthesized, thereby improving the user experience.
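Step S104 can be sketched as follows: once the chain is connected, the engine is prepared and started, and the processed audio then flows out of the engine's outputNode automatically. The error handling here is an illustrative assumption:

```swift
import AVFoundation

let engine = AVAudioEngine()
// ... attach and connect the processing nodes as in steps S101-S103 ...

engine.prepare()          // preallocate resources before starting
do {
    try engine.start()    // audio now flows through the chain to outputNode
} catch {
    print("Failed to start audio engine: \(error)")
}
```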
Based on the same inventive concept, an embodiment of the present application further provides a system for processing audio data, as shown in fig. 3, the system for processing audio data of the present embodiment may include:
the data acquisition module 310 is used for acquiring a real-time audio data stream through an audio engine when detecting that a double-recording process is started, and creating an audio processing instance corresponding to the real-time audio data stream;
a processing chain generating module 320, configured to select at least one audio data processing node, and generate an audio data processing chain for processing a real-time audio data stream corresponding to the audio processing instance;
the audio processing module 330 is configured to input the real-time audio data stream into the audio data processing chain to perform audio data processing, so as to obtain target audio data corresponding to the real-time audio data stream;
an output module 340, configured to output the target audio data through the audio engine.
In an optional embodiment of the present application, the processing chain generating module 320 is further configured to establish a connection relationship between each audio processing node and the audio engine, so that the audio engine can manage and invoke each audio processing node in an associated manner.
In an optional embodiment of the present application, the processing chain generating module 320 is further configured to:
acquiring characteristic parameters corresponding to the real-time audio data stream;
selecting at least one audio data processing node according to the characteristic parameters corresponding to the real-time audio data stream in combination with audio processing requirements, and setting processing parameters of each audio data processing node; the processing parameters include, but are not limited to, a default profile, the number of channels, and the number of samples;
and connecting the input node, the output node and each audio data processing node in the audio engine to generate an audio data processing chain for processing the real-time audio data stream.
The embodiment of the present application further provides a computing device, which includes a memory, a processor, and a computer program stored in the memory and capable of being executed by the processor, wherein the processor implements the audio data processing method according to any one of the above items when executing the computer program.
Embodiments of the present application further provide a computer-readable storage medium, preferably a non-volatile readable storage medium, in which a computer program is stored, and the computer program, when executed by a processor, implements the audio data processing method according to any one of the above items.
A computing device is also provided in embodiments of the present application. With reference to fig. 4, the computing device comprises a memory 420, a processor 410, and a computer program stored in the memory 420 and executable by the processor 410, the computer program being stored in a space 430 for program code in the memory 420, and the computer program, when executed by the processor 410, implementing the method steps 431 of any of the embodiments of the present application.
The embodiment of the application also provides a computer-readable storage medium. Referring to fig. 5, the computer-readable storage medium comprises a storage unit for program code, the storage unit being provided with a program 431' for performing the method steps according to an embodiment of the application, which program is executed by a processor.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the application are performed in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, from one website, computer, server, or data center to another website, computer, server, or data center via wired (e.g., coaxial cable, fiber optic, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
Those of skill would further appreciate that the various illustrative components and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It will be understood by those skilled in the art that all or part of the steps in the methods of the above embodiments may be implemented by a program, and the program may be stored in a computer-readable storage medium, where the storage medium is a non-transitory medium, such as a random access memory, a read-only memory, a flash memory, a hard disk, a solid state disk, a magnetic tape, a floppy disk, an optical disk, or any combination thereof.
The above description is only for the preferred embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application should be covered within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (9)

1. A method of processing audio data, comprising:
when detecting that the double-recording process is started, acquiring a real-time audio data stream through an audio engine, and creating an audio processing example corresponding to the real-time audio data stream;
selecting at least one audio data processing node to generate an audio data processing chain for processing the real-time audio data stream corresponding to the audio processing instance;
inputting the real-time audio data stream into the audio data processing chain to perform audio data processing, so as to obtain target audio data corresponding to the real-time audio data stream;
outputting the target audio data through the audio engine.
2. The method of claim 1, wherein after selecting the at least one audio data processing node and before creating the corresponding audio processing instance of the real-time audio data stream, the method further comprises:
and establishing a connection relationship between each audio processing node and the audio engine, so that the audio engine can manage and invoke each audio processing node in an associated manner.
3. The method of claim 2, wherein selecting at least one target audio data processing node to generate an audio data processing chain for processing the real-time audio data stream corresponding to the audio processing instance comprises:
acquiring characteristic parameters corresponding to the real-time audio data stream;
selecting at least one audio data processing node according to the characteristic parameters corresponding to the real-time audio data stream in combination with audio processing requirements, and setting processing parameters of each audio data processing node; the processing parameters include, but are not limited to, a default profile, the number of channels, and the number of samples;
and connecting an input node, an output node and each audio data processing node in the audio engine to generate an audio data processing chain for processing the real-time audio data stream.
4. The method of claim 1, wherein the audio processing nodes comprise a player node, a sound effect node, and a composition node.
5. A system for processing audio data, comprising:
the data acquisition module is used for acquiring a real-time audio data stream through an audio engine when detecting that the double-recording process is started, and creating an audio processing instance corresponding to the real-time audio data stream;
the processing chain generating module is used for selecting at least one audio data processing node and generating an audio data processing chain for processing the real-time audio data stream corresponding to the audio processing instance;
the audio processing module is used for inputting the real-time audio data stream into the audio data processing chain to perform audio data processing so as to obtain target audio data corresponding to the real-time audio data stream;
and the output module is used for outputting the target audio data through the audio engine.
6. The system of claim 5, wherein the processing chain generation module is further configured to:
and establishing a connection relationship between each audio processing node and the audio engine, so that the audio engine can manage and invoke each audio processing node in an associated manner.
7. The system of claim 5, wherein the processing chain generation module is further configured to:
acquiring characteristic parameters corresponding to the real-time audio data stream;
selecting at least one audio data processing node according to the characteristic parameters corresponding to the real-time audio data stream in combination with audio processing requirements, and setting processing parameters of each audio data processing node; the processing parameters include, but are not limited to, a default profile, the number of channels, and the number of samples;
and connecting the input node, the output node and each audio data processing node in the audio engine to generate an audio data processing chain for processing the real-time audio data stream.
8. A computing device comprising a memory, a processor and a computer program stored in the memory and executable by the processor, wherein the processor when executing the computer program implements the method of processing audio data according to any of claims 1-4.
9. A computer-readable storage medium, preferably a non-volatile readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method of processing audio data of one of claims 1 to 4.
CN202210288316.XA 2022-03-22 2022-03-22 Audio data processing method and system Pending CN115938339A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210288316.XA CN115938339A (en) 2022-03-22 2022-03-22 Audio data processing method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210288316.XA CN115938339A (en) 2022-03-22 2022-03-22 Audio data processing method and system

Publications (1)

Publication Number Publication Date
CN115938339A true CN115938339A (en) 2023-04-07

Family

ID=86651375

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210288316.XA Pending CN115938339A (en) 2022-03-22 2022-03-22 Audio data processing method and system

Country Status (1)

Country Link
CN (1) CN115938339A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination