WO2023273360A1 - Browser-based real-time audio processing method and system, and storage device - Google Patents

Browser-based real-time audio processing method and system, and storage device

Info

Publication number
WO2023273360A1
WO2023273360A1, PCT/CN2022/076304, CN2022076304W
Authority
WO
WIPO (PCT)
Prior art keywords
real-time audio
audio
browser
audio processing
Prior art date
Application number
PCT/CN2022/076304
Other languages
French (fr)
Chinese (zh)
Inventor
潘晨
Original Assignee
稿定(厦门)科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 稿定(厦门)科技有限公司
Publication of WO2023273360A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16Sound input; Sound output
    • G06F3/167Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/958Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16Sound input; Sound output
    • G06F3/162Interface to dedicated audio devices, e.g. audio drivers, interface to CODECs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45504Abstract machines for programme code execution, e.g. Java virtual machine [JVM], interpreters, emulators
    • G06F9/45508Runtime interpretation or emulation, e.g. emulator loops, bytecode interpretation

Definitions

  • the invention relates to the field of audio processing, and specifically refers to a browser-based real-time audio processing method, system, and storage device.
  • the present invention provides a browser-based real-time audio processing method, system, and storage device, which can effectively solve the above-mentioned problems in the prior art.
  • a browser-based real-time audio processing method comprising the following steps:
  • the container acquires the real-time audio through the real-time audio input interface, matches the input data type and output data type of the container, processes the real-time audio, and plays a sound.
  • in step S3, selecting the container of the Web-side audio processing module in the browser is specifically: in the browser, call the AudioWorklet interface to start a new independent thread as the container for the Web-side audio processing module.
  • step S4 is specifically:
  • the audio input source is a media stream audio input node
  • call the browser getUserMedia interface to obtain the real-time audio from the device media stream
  • call the browser createMediaStreamSource interface to obtain the real-time audio, and establish media stream audio input
  • the audio input source is a media element audio input node
  • connect the player to the browser, call the browser createMediaElementSource interface, and establish the media element audio input.
  • step S5 is specifically:
  • the container acquires the real-time audio through the real-time audio input interface
  • the web-side audio processing module processes the data in the input layer cache, and stores the processing result in the output layer cache;
  • both the input layer cache and the output layer cache are ring caches.
  • in step S5.3, before the container, having obtained the real-time audio, converts the sampling-point type of the channel data into the data type matched by the Web-side audio processing module and stores it in the input layer cache, the method further includes the step of interleaving and storing the channel data contained in the real-time audio.
  • step S5.5 after converting the sample point type of the processing result into a data type matched by the container, a step is further included: converting the interleaved processing result into a multi-channel processing result.
  • in step S5.4, when the amount of data buffered in the input layer cache reaches the frame length required by the Web-side audio processing module, the Web-side audio processing module processes the data in the input layer cache and stores the processing result in the output layer cache.
  • a browser-based real-time audio processing system comprising:
  • An acquisition unit configured to acquire a native audio processing module written in a non-JavaScript programming language
  • a compiling unit configured to compile the native audio processing module into a Web-side audio processing module
  • a container selection unit, configured to select the container of the Web-side audio processing module in the browser, the container being used to load and run the Web-side audio processing module;
  • a mapping unit configured to acquire real-time audio, establish a real-time audio input interface in the browser, and map the real-time audio to the real-time audio input interface;
  • a data type matching unit, used by the container to obtain the real-time audio through the real-time audio input interface, match the input and output data types of the container, process the real-time audio, and play the sound.
  • a computer-readable storage medium is further provided, storing a computer program that, when executed by a processor, implements the real-time audio processing method.
  • the present invention provides the following effects and/or advantages:
  • by compiling the module, loading it into a container, establishing a real-time audio input interface to feed in real-time audio, and matching the input and output data types of the container, the method provided by the present invention realizes real-time audio processing in the browser in a universal way.
  • the method provided by the present invention has simple and efficient processing steps and low latency. Since the human ear is very sensitive to discontinuities in sound, a delay or stutter longer than 16 ms can be perceived by the human ear; the low-latency processing and output of the audio signal leave the final output audio free of any sense of discontinuity.
  • the invention is universal: it not only handles the audio input sources available in the browser, but is also applicable to various existing c/c++ audio algorithm modules.
  • the container selected by the present invention establishes independent computing resources for the Web-side audio processing module rather than the main thread, which is burdened with heavy tasks such as interface rendering and event response, thereby ensuring minimal processing time for audio.
  • the present invention selects different interfaces for different audio sources to be connected to the container, and maps the real-time audio obtained from the audio input source that can be accepted by the container to a specific interface acceptable to the browser or container, thereby ensuring that the browser or container can obtain real-time audio.
  • the present invention matches the input and output data types of the container, so that the input audio data can be recognized and processed by the Web-side audio processing module, and the data produced by the Web-side audio processing module can in turn be recognized and played by the container.
  • Figure 1 is a schematic flow chart of Embodiment 1.
  • Fig. 2 is a schematic diagram of a link in which real-time audio is mapped to the real-time audio input interface.
  • Fig. 3 is a schematic diagram of input and output matching in the container.
  • FIG. 4 is a schematic diagram of interleaved storage of channel data.
  • Fig. 5 is a schematic diagram of the functional framework of the second embodiment.
  • a browser-based real-time audio processing method comprises the following steps:
  • the native audio processing module can be a low-level audio processing module written in c/c++ that uses a mature audio processing algorithm to realize different voice-changing effects on PCM data.
  • the native audio processing module may also be a program written in another programming language, and the audio processing module may also be one that realizes other audio processing effects.
  • the native audio processing module used in this embodiment is an existing technology, and its composition and function will not be described in detail here.
  • WebAssembly is a virtual-machine language whose MVP (Minimum Viable Product, i.e. the core feature set) is already widely supported across browsers, and its execution performance is close to native, a great improvement over traditional JavaScript processing modules running in the browser.
  • Module compilation can be done with the Emscripten toolchain.
  • Emscripten is an implementation of LLVM (Low Level Virtual Machine, a general-purpose compiler architecture) dedicated to converting c/c++ modules into WebAssembly modules; the Web-side audio processing module is obtained through this WebAssembly conversion.
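As a minimal sketch of how a module compiled this way might be brought into the page, the following loader fetches and instantiates a WebAssembly binary directly. The file name, build flags in the comment, and exported function name are illustrative assumptions, not from the patent; a real Emscripten build would normally use its generated JavaScript glue code instead.

```javascript
// Hypothetical sketch: loading an Emscripten-compiled audio module.
// Assumes a build along the lines of (flags illustrative only):
//   emcc voice_fx.c -O3 -s MODULARIZE=1 -s EXPORT_NAME=createVoiceFx \
//        -o voice_fx.js
async function loadWebAudioModule(wasmUrl) {
  // Fetch the binary and instantiate it; with no imports, the module
  // must be self-contained (real Emscripten output needs its glue).
  const response = await fetch(wasmUrl);
  const { instance } = await WebAssembly.instantiateStreaming(response, {});
  // The exports object would expose the compiled processing entry points.
  return instance.exports;
}
```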
  • the container is used to load and run the Web-side audio processing module; the container provides computing resources for the Web-side audio processing module, and a thread is the unit in which a computer schedules resources.
  • in a browser, the main thread can serve as a container for loading and running processing modules.
  • however, independent computing resources are required to minimize the processing time of audio, so the main thread, which is burdened with heavy tasks such as interface rendering and event response, is not selected.
  • S4: acquire real-time audio, establish a real-time audio input interface in the browser, and map the real-time audio to the real-time audio input interface. The container accepts only a few specific audio input sources, so audio input in any form must be mapped to these specific input sources. Mapping the real-time audio obtained from a container-acceptable audio input source onto a specific interface acceptable to the browser or container ensures that the browser or container can obtain the real-time audio.
  • the container acquires the real-time audio through the real-time audio input interface, matches the input data type and output data type of the container, processes the real-time audio, and plays a sound.
  • the compiled Web-side audio processing module internally retains the input/output data model of the c/c++ module, which does not match that of the container: a single audio data frame input to or output by the container is 128 sampling points long, each recorded as a 32-bit floating-point number, whereas a single audio data frame processed by the compiled Web-side audio processing module is generally 1024 sampling points long, each generally recorded as a 16-bit integer.
  • in step S3, selecting the container of the Web-side audio processing module in the browser is specifically: in the browser, call the AudioWorklet interface to start a new independent thread as the container for the Web-side audio processing module.
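The container-selection step can be sketched with the standard Web Audio AudioWorklet API. The module URL and processor name ("voice-fx-processor") are illustrative assumptions; only the `audioWorklet.addModule` and `AudioWorkletNode` interfaces are standard browser API.

```javascript
// Hypothetical sketch of step S3: AudioWorklet starts an independent
// audio-rendering thread that serves as the container for the compiled
// Web-side audio processing module.
async function createWorkletContainer(audioContext) {
  // addModule() loads the processor script into the worklet's own thread,
  // separate from the main thread's rendering and event handling.
  await audioContext.audioWorklet.addModule("voice-fx-processor.js");
  // The returned node bridges the Web Audio graph and the worklet thread.
  return new AudioWorkletNode(audioContext, "voice-fx-processor");
}
```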
  • step S4, referring to FIG. 2, is specifically:
  • the audio input source is one or more of a buffered audio input node, a media stream audio input node, and a media element audio input node. The buffered audio input node is generally used for inputting raw PCM audio data, such as sampled audio held in a buffer; the media stream audio input node is generally used for inputting a device media stream, such as audio obtained from a microphone; the media element audio input node is generally used for inputting player media files in the browser, such as audio currently being played by a player.
  • the audio input source is a media stream audio input node
  • call the browser getUserMedia interface to obtain the real-time audio from the device media stream
  • call the browser createMediaStreamSource interface to obtain the real-time audio, and establish media stream audio input
  • the audio input source is a media element audio input node
  • connect the player to the browser, call the browser createMediaElementSource interface, and establish the media element audio input.
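The three source mappings above can be sketched as follows. The helper function names are assumptions; the underlying interfaces (`createBufferSource`/`AudioBufferSourceNode`, `getUserMedia`, `createMediaStreamSource`, `createMediaElementSource`) are standard Web Audio and Media Capture APIs.

```javascript
// Hypothetical sketch of step S4: map each kind of audio input source
// onto an input node the container can accept.
function bufferedInput(ctx, pcmAudioBuffer) {
  const node = ctx.createBufferSource();   // buffered audio input node
  node.buffer = pcmAudioBuffer;            // write PCM into the buffer attribute
  return node;
}

async function mediaStreamInput(ctx) {
  // e.g. a microphone: obtain the device media stream first
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  return ctx.createMediaStreamSource(stream); // media stream audio input node
}

function mediaElementInput(ctx, audioElement) {
  // e.g. an <audio> player already attached to the page
  return ctx.createMediaElementSource(audioElement); // media element audio input node
}
```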
  • step S5, referring to FIG. 3, is specifically:
  • the container obtains the real-time audio through the real-time audio input interface; in this embodiment, a single audio data frame input to or output by the container is 128 sampling points long, and each sampling point is recorded as a 32-bit floating-point number.
  • each sampling point is recorded using a 16-bit integer data type, which is a data type that can be recognized and processed by the compiled web-side audio processing module.
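The 32-bit-float to 16-bit-integer conversion described here can be sketched with two small helpers. The function names and the 32767 scale factor are illustrative assumptions; the patent only specifies the two sample formats.

```javascript
// Hypothetical sketch of the data-type matching: the container exchanges
// 32-bit floats in [-1, 1], while the compiled module expects 16-bit integers.
function floatToInt16(float32Samples) {
  const out = new Int16Array(float32Samples.length);
  for (let i = 0; i < float32Samples.length; i++) {
    // Clamp to [-1, 1], then scale to the signed 16-bit range.
    const s = Math.max(-1, Math.min(1, float32Samples[i]));
    out[i] = Math.round(s * 32767);
  }
  return out;
}

function int16ToFloat(int16Samples) {
  const out = new Float32Array(int16Samples.length);
  for (let i = 0; i < int16Samples.length; i++) {
    out[i] = int16Samples[i] / 32767; // back to [-1, 1]
  }
  return out;
}
```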
  • the Web-side audio processing module processes the data in the input layer cache and stores the processing result in the output layer cache; in this embodiment, the Web-side audio processing module uses a mature audio processing algorithm to realize different voice-changing effects.
  • both the input layer cache and the output layer cache are ring caches.
  • the ring cache maximizes the reuse of memory resources.
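A ring cache of the kind described, a fixed block of memory reused by wrapping the read and write positions, can be sketched like this. The class and method names are illustrative assumptions.

```javascript
// Hypothetical sketch of the input/output layer caches as ring buffers:
// memory is allocated once and reused, avoiding repeated allocation.
class RingBuffer {
  constructor(capacity) {
    this.data = new Float32Array(capacity);
    this.readPos = 0;
    this.writePos = 0;
    this.size = 0; // number of samples currently stored
  }
  write(samples) {
    for (const s of samples) {
      this.data[this.writePos] = s;
      this.writePos = (this.writePos + 1) % this.data.length;
      this.size = Math.min(this.size + 1, this.data.length);
    }
  }
  read(count) {
    const out = new Float32Array(Math.min(count, this.size));
    for (let i = 0; i < out.length; i++) {
      out[i] = this.data[this.readPos];
      this.readPos = (this.readPos + 1) % this.data.length;
      this.size--;
    }
    return out;
  }
}
```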
  • in step S5.3, referring to FIG. 4, before the container, having obtained the real-time audio, converts the sampling-point type of the channel data into the data type matched by the Web-side audio processing module and stores it in the input layer cache, a further step interleaves and stores the channel data contained in the real-time audio. Since audio generally contains two channels, the two-channel audio data must be organized into a single sequence before it can be read by the Web-side audio processing module; specifically, the left-channel and right-channel data are stored interleaved. Correspondingly, in step S5.5, after the sampling-point type of the processing result is converted into the data type matched by the container, a further step converts the interleaved processing result back into a multi-channel processing result.
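The interleaving of FIG. 4 and its inverse can be sketched as two helpers: the container delivers planar channel data (one array per channel), while the module reads a single interleaved sequence L0 R0 L1 R1 and so on. The function names are illustrative assumptions.

```javascript
// Hypothetical sketch of the interleave/deinterleave steps for stereo audio.
function interleave(left, right) {
  const out = new Float32Array(left.length + right.length);
  for (let i = 0; i < left.length; i++) {
    out[2 * i] = left[i];       // left-channel sample
    out[2 * i + 1] = right[i];  // right-channel sample
  }
  return out;
}

function deinterleave(interleaved) {
  const half = interleaved.length / 2;
  const left = new Float32Array(half);
  const right = new Float32Array(half);
  for (let i = 0; i < half; i++) {
    left[i] = interleaved[2 * i];
    right[i] = interleaved[2 * i + 1];
  }
  return [left, right];
}
```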
  • in step S5.4, when the amount of data buffered in the input layer cache reaches the frame length required by the Web-side audio processing module, the Web-side audio processing module processes the data in the input layer cache and stores the processing result in the output layer cache.
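The frame-length trigger of step S5.4 can be sketched as an accumulator: the container hands over 128-sample frames, which are buffered until the module's required frame length is reached, then processed in one batch. In a real worklet this logic would live in a class extending AudioWorkletProcessor registered via registerProcessor(); it is shown as a plain class with an injected `processFrame` callback (a stand-in for the compiled module call) so the buffering logic stands alone. All names are assumptions.

```javascript
// Hypothetical sketch of step S5.4: accumulate container-sized frames
// until the module's frame length is reached, then process in one batch.
class FrameAccumulator {
  constructor(frameLength, processFrame) {
    this.frameLength = frameLength;   // e.g. 1024 for the compiled module
    this.processFrame = processFrame; // stand-in for the Wasm module call
    this.pending = [];                // input layer cache (simplified)
    this.processed = [];              // output layer cache (simplified)
  }
  push(samples) { // called once per 128-sample container frame
    this.pending.push(...samples);
    while (this.pending.length >= this.frameLength) {
      const frame = this.pending.splice(0, this.frameLength);
      this.processed.push(...this.processFrame(frame));
    }
  }
}
```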
  • when a browser implements the method of this embodiment, a single data-type conversion of a 128-sampling-point frame takes about 10 microseconds, and processing 1024 sampling points of audio data in the Web-side audio processing module takes about 300 microseconds, for a total of about 320 microseconds.
  • referring to Figure 5, a browser-based real-time audio processing system comprises:
  • An acquisition unit configured to acquire a native audio processing module written in a non-JavaScript programming language
  • a compiling unit configured to compile the native audio processing module into a Web-side audio processing module
  • a container selection unit, configured to select the container of the Web-side audio processing module in the browser, the container being used to load and run the Web-side audio processing module;
  • a mapping unit configured to acquire real-time audio, establish a real-time audio input interface in the browser, and map the real-time audio to the real-time audio input interface;
  • a data type matching unit, used by the container to obtain the real-time audio through the real-time audio input interface, match the input and output data types of the container, process the real-time audio, and play the sound.
  • a computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, the real-time audio processing method described in Embodiment 1 is realized.
  • Each functional unit in each embodiment of the present invention may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units can be implemented in the form of hardware or in the form of software functional units.
  • if the integrated unit is realized in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • the computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to execute all or part of the steps of the methods in the various embodiments of the present invention.
  • the aforementioned storage media include media that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Data Mining & Analysis (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The present invention relates to a browser-based real-time audio processing method, comprising the following steps: S1, acquiring a native audio processing module written in a non-JavaScript programming language; S2, compiling the native audio processing module into a Web-side audio processing module; S3, selecting a container for the Web-side audio processing module in a browser, the container being configured to load and run the Web-side audio processing module; S4, acquiring real-time audio, establishing a real-time audio input interface in the browser, and mapping the real-time audio to the real-time audio input interface; S5, the container obtaining the real-time audio by means of the real-time audio input interface, matching an input data type and an output data type of the container, processing the real-time audio, and playing the sound.

Description

A browser-based real-time audio processing method and system, and storage device

Technical Field

The invention relates to the field of audio processing, and in particular to a browser-based real-time audio processing method, system, and storage device.

Background Art

With the popularity of strongly real-time content carriers such as high-speed networks and live streaming, the demand for audio processing, such as applying voice-changing effects to make content more interactive and fun, is growing daily.

Browser-side audio processing currently on the market concentrates on visualizing audio waveforms, with the data coming from an entire pre-loaded audio file; it lacks a scheme for processing real-time audio data and therefore cannot meet the needs of real-time scenarios such as live streaming. Meanwhile, the platform's native audio processing algorithms are difficult to implement, support only simple audio processing, and suffer from problems such as high latency.

Designing a browser-based real-time audio processing method, system, and storage device that addresses the above problems in the prior art is the purpose of the research behind the present invention.
Summary of the Invention

In view of the above problems in the prior art, the present invention provides a browser-based real-time audio processing method, system, and storage device that can effectively solve them.

The technical scheme of the present invention is:

A browser-based real-time audio processing method, comprising the following steps:

S1: acquire a native audio processing module written in a non-JavaScript programming language;

S2: compile the native audio processing module into a Web-side audio processing module;

S3: select, in the browser, a container for the Web-side audio processing module, the container being used to load and run the Web-side audio processing module;

S4: acquire real-time audio, establish a real-time audio input interface in the browser, and map the real-time audio to the real-time audio input interface;

S5: the container acquires the real-time audio through the real-time audio input interface, matches the input and output data types of the container, processes the real-time audio, and plays the sound.
Further, in step S3, selecting the container of the Web-side audio processing module in the browser is specifically: in the browser, call the AudioWorklet interface to start a new independent thread as the container for the Web-side audio processing module.

Further, step S4 is specifically:

S4.1: obtain real-time audio through an audio input source, the audio input source being one or more of a buffered audio input node, a media stream audio input node, and a media element audio input node;

S4.2: if the audio input source is a buffered audio input node, call the browser AudioBufferSourceNode interface and write the real-time audio in turn into the buffer attribute of the object;

if the audio input source is a media stream audio input node, call the browser getUserMedia interface to obtain the real-time audio from the device media stream, then call the browser createMediaStreamSource interface to obtain the real-time audio and establish the media stream audio input;

if the audio input source is a media element audio input node, connect the player to the browser, call the browser createMediaElementSource interface, and establish the media element audio input.
Further, step S5 is specifically:

S5.1: the container acquires the real-time audio through the real-time audio input interface;

S5.2: establish an input layer cache and an output layer cache in the Web-side audio processing module;

S5.3: after the container obtains the real-time audio, convert the sampling-point type of the channel data into the data type matched by the Web-side audio processing module and store it in the input layer cache;

S5.4: the Web-side audio processing module processes the data in the input layer cache and stores the processing result in the output layer cache;

S5.5: read the output layer cache and convert the sampling-point type of the processing result into the data type matched by the container;

S5.6: play the sound.
Further, both the input layer cache and the output layer cache are ring caches.

Further, in step S5.3, before the container, having obtained the real-time audio, converts the sampling-point type of the channel data into the data type matched by the Web-side audio processing module and stores it in the input layer cache, the method further includes the step of interleaving and storing the channel data contained in the real-time audio.

Further, in step S5.5, after the sampling-point type of the processing result is converted into the data type matched by the container, a further step converts the interleaved processing result into a multi-channel processing result.

Further, in step S5.4, when the amount of data buffered in the input layer cache reaches the frame length required by the Web-side audio processing module, the Web-side audio processing module processes the data in the input layer cache and stores the processing result in the output layer cache.
A browser-based real-time audio processing system is further provided, comprising:

an acquisition unit, configured to acquire a native audio processing module written in a non-JavaScript programming language;

a compiling unit, configured to compile the native audio processing module into a Web-side audio processing module;

a container selection unit, configured to select the container of the Web-side audio processing module in the browser, the container being used to load and run the Web-side audio processing module;

a mapping unit, configured to acquire real-time audio, establish a real-time audio input interface in the browser, and map the real-time audio to the real-time audio input interface;

a data type matching unit, used by the container to obtain the real-time audio through the real-time audio input interface, match the input and output data types of the container, process the real-time audio, and play the sound.

A computer-readable storage medium is further provided, storing a computer program that, when executed by a processor, implements the real-time audio processing method.
Therefore, the present invention provides the following effects and/or advantages:

By compiling the module, loading it into a container, establishing a real-time audio input interface to feed in real-time audio, and matching the input and output data types of the container, the method provided by the present invention realizes real-time audio processing in the browser in a universal way.

The method provided by the present invention has simple and efficient processing steps and low latency. Since the human ear is very sensitive to discontinuities in sound, a delay or stutter longer than 16 ms can be perceived by the human ear; the low-latency processing and output of the audio signal leave the final output audio free of any sense of discontinuity.

The invention is universal: it not only handles the audio input sources available in the browser, but is also applicable to various existing c/c++ audio algorithm modules.

The container selected by the present invention establishes independent computing resources for the Web-side audio processing module rather than the main thread, which is burdened with heavy tasks such as interface rendering and event response, thereby ensuring minimal processing time for audio.

The present invention selects different interfaces through which different audio sources are connected to the container, and maps the real-time audio obtained from a container-acceptable audio input source onto a specific interface acceptable to the browser or container, thereby ensuring that the browser or container can obtain the real-time audio.

The present invention matches the input and output data types of the container, so that the input audio data can be recognized and processed by the Web-side audio processing module, and the data produced by the Web-side audio processing module can in turn be recognized and played by the container.

It should be understood that the foregoing overview and the following detailed description of the invention are exemplary and explanatory, and are intended to provide further explanation of the invention as claimed.
Description of Drawings

Figure 1 is a schematic flow chart of Embodiment 1.

Figure 2 is a schematic diagram of the link by which real-time audio is mapped to the real-time audio input interface.

Figure 3 is a schematic diagram of input/output matching inside the container.

Figure 4 is a schematic diagram of the interleaved storage of channel data.

Figure 5 is a schematic diagram of the functional framework of Embodiment 2.
Detailed Description
To facilitate understanding by those skilled in the art, the structure of the present invention is described in further detail below through the embodiments with reference to the accompanying drawings.
Referring to Fig. 1, a browser-based real-time audio processing method comprises the following steps:
S1: obtain a native audio processing module written in a non-JavaScript programming language. In this embodiment, the native audio processing module may be a low-level audio processing module written in C/C++ that applies mature audio processing algorithms to PCM data to produce different voice-changing effects. In other embodiments, the native audio processing module may be a program written in another programming language, and the module may implement other audio processing effects. The native audio processing module used in this embodiment belongs to the prior art, so its composition and function are not described in detail here.
S2: compile the native audio processing module into a web-side audio processing module. WebAssembly is a virtual-machine language whose MVP (Minimum Viable Product, i.e. the core feature set) is already widely supported across browsers, and its execution performance is close to native, a large improvement over traditional JavaScript processing modules running in the browser. The compilation can be carried out with the Emscripten toolchain; Emscripten is an implementation of LLVM (Low Level Virtual Machine, a general-purpose compiler architecture) dedicated to converting C/C++ modules into WebAssembly modules. The web-side audio processing module is thus obtained through WebAssembly conversion.
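As an illustration of this compilation step, a C/C++ module can be converted with an Emscripten invocation along the following lines. This is a sketch only: the source file name and the exported `process_pcm` function are hypothetical, and the exact set of flags depends on the Emscripten version in use.

```shell
# Hypothetical build: voice_changer.c is assumed to export a process_pcm()
# function that transforms PCM samples in place.
emcc voice_changer.c -O3 \
  -s MODULARIZE=1 \
  -s EXPORTED_FUNCTIONS='["_process_pcm","_malloc","_free"]' \
  -o voice_changer.js   # emits voice_changer.js + voice_changer.wasm
```

The emitted JavaScript glue file can then be loaded inside the container chosen in step S3.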
S3: in the browser, select a container for the web-side audio processing module; the container is used to load and run the web-side audio processing module. The container provides computing resources for the module, and a thread is the unit in which a computer schedules resources. In a browser, the main thread generally serves as the container for loading and running processing modules. In a real-time audio processing scenario, however, the human ear is very sensitive to audio discontinuities, so independent computing resources are needed to keep the processing latency to a minimum. The main thread, which also handles heavy tasks such as interface rendering and event response, is therefore not chosen; instead, a newly started thread is used as the container for the web-side audio processing module. This independent thread can be provided by the browser's AudioWorklet interface.
S4: obtain real-time audio, establish a real-time audio input interface in the browser, and map the real-time audio to the real-time audio input interface. The container accepts only a few specific audio input sources, so audio input of any possible form must be mapped onto these specific sources. Mapping the real-time audio obtained from an audio input source acceptable to the container onto a specific interface of the browser or container guarantees that the browser or container can obtain the real-time audio.
S5: the container obtains the real-time audio through the real-time audio input interface, the input data type and output data type of the container are matched, and the real-time audio is processed and the sound played. In this embodiment, the compiled web-side audio processing module internally retains the input/output data model of the C/C++ module, which does not match that of the container: a single audio frame input to or output from the container is 128 samples long, each sample recorded as a 32-bit floating-point number, whereas a single frame processed by the compiled module is typically 1024 samples long, each sample typically recorded as a 16-bit integer. The input and output data types of the container must therefore be matched, so that the input audio data can be recognized and processed by the web-side audio processing module, and the data produced by the module can in turn be recognized and played by the container.
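The sample-type half of this matching can be sketched as follows. The function names are illustrative, not taken from the patent; the container side exchanges 32-bit floats in the range [-1, 1], while the compiled module is assumed to expect 16-bit signed integers.

```javascript
// Convert one container frame (Float32, range [-1, 1]) into the 16-bit
// integer representation assumed for the compiled module.
function floatToInt16(float32Frame) {
  const out = new Int16Array(float32Frame.length);
  for (let i = 0; i < float32Frame.length; i++) {
    // Clamp to [-1, 1], then scale to the 16-bit signed range.
    const s = Math.max(-1, Math.min(1, float32Frame[i]));
    out[i] = Math.round(s * 32767);
  }
  return out;
}

// Inverse conversion: module output back into the container's float format.
function int16ToFloat(int16Frame) {
  const out = new Float32Array(int16Frame.length);
  for (let i = 0; i < int16Frame.length; i++) {
    out[i] = int16Frame[i] / 32767;
  }
  return out;
}
```

The frame-length half of the matching (128-sample container frames versus 1024-sample module frames) is handled by the input-layer and output-layer caches described in step S5.2 below.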
Further, in step S3, selecting the container of the web-side audio processing module in the browser specifically comprises: in the browser, calling the browser's AudioWorklet interface to start a new independent thread as the container of the web-side audio processing module.
Step S4 is specifically as follows (referring to Fig. 2):
S4.1: obtain real-time audio through an audio input source, the audio input source being one or more of a buffered audio input node, a media-stream audio input node, and a media-element audio input node. The buffered audio input node is generally used for inputting raw PCM audio data, for example sampled audio held in a buffer; the media-stream audio input node is generally used for inputting a device media stream, for example audio captured from a microphone; the media-element audio input node is generally used for inputting a media file played by a player in the browser, for example the audio a player is currently playing.
S4.2: if the audio input source is a buffered audio input node, call the browser's AudioBufferSourceNode interface and write the real-time audio in turn into the object's buffer attribute;
if the audio input source is a media-stream audio input node, call the browser's getUserMedia interface to obtain the real-time audio from the device media stream, then call the browser's createMediaStreamSource interface with that audio to establish the media-stream audio input;
if the audio input source is a media-element audio input node, connect the player to the browser and call the browser's createMediaElementSource interface to establish the media-element audio input.
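The three-way mapping of S4.2 can be sketched as a dispatcher over the source kind. The dispatcher itself and the `source` descriptor are illustrative; `createBufferSource`, `createMediaStreamSource`, and `createMediaElementSource` are real Web Audio API methods of an `AudioContext`. In the browser, `ctx` would be an `AudioContext` and a media stream would come from `navigator.mediaDevices.getUserMedia(...)`; any object exposing the same methods works for testing.

```javascript
// Map an arbitrary audio input source onto a node the container accepts.
function createInputNode(ctx, source) {
  switch (source.kind) {
    case 'buffer': {
      const node = ctx.createBufferSource();  // AudioBufferSourceNode
      node.buffer = source.audioBuffer;       // write PCM into the buffer attribute
      return node;
    }
    case 'stream':
      // source.mediaStream: obtained via getUserMedia in the browser.
      return ctx.createMediaStreamSource(source.mediaStream);
    case 'element':
      // source.element: an <audio>/<video> player connected to the browser.
      return ctx.createMediaElementSource(source.element);
    default:
      throw new Error('unsupported audio input source: ' + source.kind);
  }
}
```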
Step S5 is specifically as follows (referring to Fig. 3):
S5.1: the container obtains the real-time audio through the real-time audio input interface. In this embodiment, a single audio frame input to or output from the container is 128 samples long, each sample recorded as a 32-bit floating-point number.
S5.2: establish an input-layer cache and an output-layer cache in the web-side audio processing module.
S5.3: after the container obtains the real-time audio, convert the sample type of the channel data into the data type matched by the web-side audio processing module and store the result in the input-layer cache. The audio data is converted into frames of length 1024, each sample recorded as a 16-bit integer, which is the data type that the compiled web-side audio processing module can recognize and process.
S5.4: the web-side audio processing module processes the data in the input-layer cache and stores the processing result in the output-layer cache. In this embodiment, the web-side audio processing module applies mature audio processing algorithms to achieve different voice-changing effects.
S5.5: read the output-layer cache and convert the sample type of the processing result into the data type matched by the container. In this embodiment, the data in the output-layer cache is converted back into the data type input by the container mentioned in step S5.1.
S5.6: play the sound.
Further, both the input-layer cache and the output-layer cache are ring buffers. A ring buffer maximizes the reuse of memory resources.
Further, in step S5.3 (referring to Fig. 4), after the container obtains the real-time audio and before the sample type of the channel data is converted into the data type matched by the web-side audio processing module and stored in the input-layer cache, the method further comprises the step of interleaving the channel data contained in the real-time audio. Since audio generally contains two channels, the two-channel audio data must be arranged into a single chain of data before the web-side audio processing module can read it. Specifically, the left-channel and right-channel data are stored in interleaved order. Accordingly, in step S5.5, after the sample type of the processing result is converted into the data type matched by the container, the method further comprises the step of converting the interleaved processing result back into a multi-channel processing result.
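The interleaving step and its inverse can be sketched as below (function names illustrative): the container delivers planar channel data, one array per channel, while the module reads a single interleaved chain L0 R0 L1 R1 ...

```javascript
// Merge planar stereo data into one interleaved array for the module.
function interleave(left, right) {
  const out = new Float32Array(left.length * 2);
  for (let i = 0; i < left.length; i++) {
    out[2 * i] = left[i];
    out[2 * i + 1] = right[i];
  }
  return out;
}

// Split an interleaved processing result back into per-channel arrays.
function deinterleave(inter) {
  const n = inter.length / 2;
  const left = new Float32Array(n);
  const right = new Float32Array(n);
  for (let i = 0; i < n; i++) {
    left[i] = inter[2 * i];
    right[i] = inter[2 * i + 1];
  }
  return [left, right];
}
```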
Further, in step S5.4, when the amount of data stored in the input-layer cache reaches the frame length required by the web-side audio processing module, the web-side audio processing module processes the data in the input-layer cache and stores the processing result in the output-layer cache.
When the method of this embodiment is implemented in a browser, a single data-type matching pass over a frame of 128 samples takes about 10 microseconds, and processing a 1024-sample frame of audio data in the web-side audio processing module takes about 300 microseconds, for a total of roughly 320 microseconds.
Embodiment 2
Referring to Fig. 5, a browser-based real-time audio processing system comprises:
an acquisition unit for obtaining a native audio processing module written in a non-JavaScript programming language;
a compiling unit for compiling the native audio processing module into a web-side audio processing module;
a container selection unit for selecting, in a browser, a container for the web-side audio processing module, the container being used to load and run the web-side audio processing module;
a mapping unit for obtaining real-time audio, establishing a real-time audio input interface in the browser, and mapping the real-time audio to the real-time audio input interface;
a data type matching unit, used by the container to obtain the real-time audio through the real-time audio input interface, match the input data type and output data type of the container, process the real-time audio, and play the sound.
Embodiment 3
A computer-readable storage medium stores a computer program which, when executed by a processor, implements the real-time audio processing method described in Embodiment 1.
The functional units in the embodiments of the present invention may be integrated into one processing unit, may exist physically as separate units, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the essence of the technical solution of the present invention, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to execute all or part of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above are only preferred embodiments of the present invention; all equivalent changes and modifications made within the scope of the patent claims of the present invention shall fall within the scope of the present invention.

Claims (10)

  1. A browser-based real-time audio processing method, characterized by comprising the following steps:
    S1: obtaining a native audio processing module written in a non-JavaScript programming language;
    S2: compiling the native audio processing module into a web-side audio processing module;
    S3: selecting, in a browser, a container for the web-side audio processing module, the container being used to load and run the web-side audio processing module;
    S4: obtaining real-time audio, establishing a real-time audio input interface in the browser, and mapping the real-time audio to the real-time audio input interface;
    S5: the container obtaining the real-time audio through the real-time audio input interface, matching the input data type and output data type of the container, processing the real-time audio, and playing the sound.
  2. The browser-based real-time audio processing method according to claim 1, characterized in that in step S3, selecting the container of the web-side audio processing module in the browser specifically comprises: in the browser, calling the browser's AudioWorklet interface to start a new independent thread as the container of the web-side audio processing module.
  3. The browser-based real-time audio processing method according to claim 1, characterized in that step S4 specifically comprises:
    S4.1: obtaining real-time audio through an audio input source, the audio input source being one or more of a buffered audio input node, a media-stream audio input node, and a media-element audio input node;
    S4.2: if the audio input source is a buffered audio input node, calling the browser's AudioBufferSourceNode interface and writing the real-time audio in turn into the object's buffer attribute;
    if the audio input source is a media-stream audio input node, calling the browser's getUserMedia interface to obtain the real-time audio from the device media stream, and calling the browser's createMediaStreamSource interface with the real-time audio to establish the media-stream audio input;
    if the audio input source is a media-element audio input node, connecting the player to the browser and calling the browser's createMediaElementSource interface to establish the media-element audio input.
  4. The browser-based real-time audio processing method according to claim 1, characterized in that step S5 specifically comprises:
    S5.1: the container obtaining the real-time audio through the real-time audio input interface;
    S5.2: establishing an input-layer cache and an output-layer cache in the web-side audio processing module;
    S5.3: after the container obtains the real-time audio, converting the sample type of the channel data into the data type matched by the web-side audio processing module and storing it in the input-layer cache;
    S5.4: the web-side audio processing module processing the data in the input-layer cache and storing the processing result in the output-layer cache;
    S5.5: reading the output-layer cache and converting the sample type of the processing result into the data type matched by the container;
    S5.6: playing the sound.
  5. The browser-based real-time audio processing method according to claim 4, characterized in that both the input-layer cache and the output-layer cache are ring buffers.
  6. The browser-based real-time audio processing method according to claim 4, characterized in that in step S5.3, after the container obtains the real-time audio and before the sample type of the channel data is converted into the data type matched by the web-side audio processing module and stored in the input-layer cache, the method further comprises the step of: interleaving the channel data contained in the real-time audio.
  7. The browser-based real-time audio processing method according to claim 6, characterized in that in step S5.5, after the sample type of the processing result is converted into the data type matched by the container, the method further comprises the step of: converting the interleaved processing result into a multi-channel processing result.
  8. The browser-based real-time audio processing method according to claim 4, characterized in that in step S5.4, when the amount of data stored in the input-layer cache reaches the frame length required by the web-side audio processing module, the web-side audio processing module processes the data in the input-layer cache and stores the processing result in the output-layer cache.
  9. A browser-based real-time audio processing system, characterized by comprising:
    an acquisition unit for obtaining a native audio processing module written in a non-JavaScript programming language;
    a compiling unit for compiling the native audio processing module into a web-side audio processing module;
    a container selection unit for selecting, in a browser, a container for the web-side audio processing module, the container being used to load and run the web-side audio processing module;
    a mapping unit for obtaining real-time audio, establishing a real-time audio input interface in the browser, and mapping the real-time audio to the real-time audio input interface;
    a data type matching unit, used by the container to obtain the real-time audio through the real-time audio input interface, match the input data type and output data type of the container, process the real-time audio, and play the sound.
  10. A computer-readable storage medium storing a computer program, characterized in that, when the computer program is executed by a processor, the real-time audio processing method according to claim 1 is implemented.
PCT/CN2022/076304 2021-06-29 2022-02-15 Browser-based real-time audio processing method and system, and storage device WO2023273360A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110725948.3 2021-06-29
CN202110725948.3A CN113434110A (en) 2021-06-29 2021-06-29 Real-time audio processing method, system and storage device based on browser

Publications (1)

Publication Number Publication Date
WO2023273360A1 true WO2023273360A1 (en) 2023-01-05

Family

ID=77757577

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/076304 WO2023273360A1 (en) 2021-06-29 2022-02-15 Browser-based real-time audio processing method and system, and storage device

Country Status (2)

Country Link
CN (1) CN113434110A (en)
WO (1) WO2023273360A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113434110A (en) * 2021-06-29 2021-09-24 稿定(厦门)科技有限公司 Real-time audio processing method, system and storage device based on browser
CN114860328B (en) * 2022-07-07 2022-12-09 广东睿江云计算股份有限公司 Method for automatically detecting media equipment access in real time by front-end web page

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109547844A (en) * 2018-12-19 2019-03-29 网宿科技股份有限公司 Audio/video pushing method and plug-flow client based on WebRTC agreement
CN110198479A (en) * 2019-05-24 2019-09-03 浪潮软件集团有限公司 A kind of browser audio/video decoding playback method based on webassembly
CN111641838A (en) * 2020-05-13 2020-09-08 深圳市商汤科技有限公司 Browser video playing method and device and computer storage medium
CN112291628A (en) * 2020-11-25 2021-01-29 杭州视洞科技有限公司 Multithreading video decoding playing method based on web browser
US20210182123A1 (en) * 2019-11-01 2021-06-17 Grass Valley Limited System and method for constructing filter graph-based media processing pipelines in a browser
CN113434110A (en) * 2021-06-29 2021-09-24 稿定(厦门)科技有限公司 Real-time audio processing method, system and storage device based on browser

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9584564B2 (en) * 2007-12-21 2017-02-28 Brighttalk Ltd. Systems and methods for integrating live audio communication in a live web event

Also Published As

Publication number Publication date
CN113434110A (en) 2021-09-24


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22831193

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE