CN114546324A - Audio processing method and device - Google Patents


Info

Publication number
CN114546324A
CN114546324A
Authority
CN
China
Prior art keywords
audio
interface
audio data
instruction
browser
Prior art date
Legal status
Pending
Application number
CN202011255627.3A
Other languages
Chinese (zh)
Inventor
郑染秋
高裕轩
Current Assignee
Shanghai Bilibili Technology Co Ltd
Original Assignee
Shanghai Bilibili Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Bilibili Technology Co Ltd filed Critical Shanghai Bilibili Technology Co Ltd
Priority to CN202011255627.3A
Publication of CN114546324A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/16 - Sound input; Sound output
    • G06F 3/167 - Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/16 - Sound input; Sound output
    • G06F 3/165 - Management of the audio stream, e.g. setting of volume, audio stream path
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 - Arrangements for program control, e.g. control units
    • G06F 9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/44 - Arrangements for executing specific programs
    • G06F 9/451 - Execution arrangements for user interfaces
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00 - Network arrangements or protocols for supporting network services or applications
    • H04L 67/01 - Protocols
    • H04L 67/02 - Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The application provides an audio processing method and an audio processing apparatus, where the audio processing method is applied to a browser and includes the following steps: acquiring first audio data of audio to be processed through a data interface of the browser, and displaying the waveform corresponding to the first audio data on an audio display interface of the browser; then receiving, through the audio display interface of the browser, a preset interactive operation targeting the waveform, and calling an instruction execution interface to process the first audio data and obtain processed second audio data. The instruction execution interface is an interface in the browser that packages an instruction receiving unit, an instruction processing unit, an execution algorithm unit, and an execution result output unit, where the execution algorithm unit includes the algorithm corresponding to the preset interactive operation. By calling the packaged instruction execution interface inside the browser, the method and apparatus make computationally complex operations on audio data possible and thereby extend the functions of the browser.

Description

Audio processing method and device
Technical Field
The present application relates to the field of computer technologies, and in particular, to an audio processing method and apparatus.
Background
With the rapid development of computer technology, audio processing technology has advanced as well, and as people's work and life requirements diversify, the audio processing they need is becoming increasingly varied and complex. In the prior art, a user can process audio in a browser: with the support of the Web Audio API, the user can decode a segment of input audio, write it into a buffer, and perform some basic operations on it. However, more complicated operations (such as pitch detection and beat detection) require complex computation on the audio data and are not yet supported by the browser, so an audio processing apparatus capable of performing complex audio processing in the browser is needed.
Disclosure of Invention
In view of this, an embodiment of the present application provides an audio processing method that extends the functions of the browser. The application also relates to an audio processing apparatus, a computing device, and a computer-readable storage medium.
According to a first aspect of the embodiments of the present application, there is provided an audio processing method applied to a browser, the method including:
acquiring first audio data of audio to be processed through a data interface of a browser, and displaying a waveform corresponding to the first audio data on an audio display interface of the browser;
receiving, through the audio display interface of the browser, a preset interactive operation targeting the waveform, and calling an instruction execution interface to process the first audio data to obtain processed second audio data, where the instruction execution interface is an interface in the browser that packages an instruction receiving unit, an instruction processing unit, an execution algorithm unit, and an execution result output unit, the execution algorithm unit includes the algorithm corresponding to the preset interactive operation, and the interactive instruction triggered by the preset interactive operation is executed through the instruction execution interface;
and displaying the waveform corresponding to the second audio data on an audio display interface of the browser.
According to a second aspect of the embodiments of the present application, there is provided an audio processing apparatus configured in a browser, the audio processing apparatus including an instruction execution interface, the instruction execution interface being packaged with an instruction receiving unit, an instruction processing unit, an execution algorithm unit, and an execution result output unit;
the instruction receiving unit is configured to receive a call instruction, where the call instruction carries a preset interactive operation, received through an audio display interface of the browser, targeting the waveform of the first audio data;
the instruction processing unit is configured to determine a target execution algorithm unit corresponding to the call instruction according to the preset interactive operation, and operate the target execution algorithm unit to obtain an execution result of the call instruction;
the execution result output unit is configured to output the execution result.
According to a third aspect of embodiments herein, there is provided a computing device comprising:
a memory and a processor;
the memory is configured to store computer-executable instructions, and the processor is configured to execute the computer-executable instructions to implement the following method:
acquiring first audio data of audio to be processed through a data interface of a browser, and displaying a waveform corresponding to the first audio data on an audio display interface of the browser;
receiving, through the audio display interface of the browser, a preset interactive operation targeting the waveform, and calling an instruction execution interface to process the first audio data to obtain processed second audio data, where the instruction execution interface is an interface in the browser that packages an instruction receiving unit, an instruction processing unit, an execution algorithm unit, and an execution result output unit, the execution algorithm unit includes the algorithm corresponding to the preset interactive operation, and the interactive instruction triggered by the preset interactive operation is executed through the instruction execution interface;
and displaying the waveform corresponding to the second audio data on an audio display interface of the browser.
According to a fourth aspect of embodiments herein, there is provided a computer-readable storage medium storing computer-executable instructions that, when executed by a processor, implement the steps of the audio processing method.
With the audio processing method of the present application, first audio data of audio to be processed can be acquired through a data interface of the browser, and the waveform corresponding to the first audio data is displayed on an audio display interface of the browser; then a preset interactive operation targeting the waveform is received through the audio display interface, an instruction execution interface is called to process the first audio data and obtain processed second audio data, and the waveform corresponding to the second audio data is displayed on the audio display interface. The instruction execution interface is an interface in the browser that packages an instruction receiving unit, an instruction processing unit, an execution algorithm unit, and an execution result output unit; the execution algorithm unit includes the algorithm corresponding to the preset interactive operation, and the interactive instruction triggered by that operation is executed through the instruction execution interface. In this way, when a complex operation needs to be performed on the audio data, the packaged instruction execution interface is called directly to run the execution algorithm unit packaged inside it, and the audio data receives the complex processing.
The audio processing apparatus of the present application is configured in a browser and includes an instruction execution interface that packages an instruction receiving unit, an instruction processing unit, an execution algorithm unit, and an execution result output unit. The instruction receiving unit is configured to receive a call instruction carrying a preset interactive operation, received through an audio display interface of the browser, targeting the waveform of the first audio data; the instruction processing unit is configured to determine, according to the preset interactive operation, the target execution algorithm unit corresponding to the call instruction and to run that unit to obtain the execution result of the call instruction; and the execution result output unit is configured to output the execution result. In this way, the execution algorithms required for computationally complex operations on audio data can be pre-packaged in the instruction execution interface, so that such operations become possible, the functions of the browser are extended, and the experience of using the browser is improved.
Drawings
Fig. 1 is a schematic block diagram of an audio processing apparatus according to an embodiment of the present application;
FIG. 2 is a functional diagram of an instruction execution interface according to an embodiment of the present application;
FIG. 3 is a functional diagram of an audio processing apparatus according to an embodiment of the present application;
fig. 4 is a schematic diagram of data analysis by the AnalyserNode interface according to an embodiment of the present application;
fig. 5 is a schematic waveform diagram of first audio data according to an embodiment of the present application;
FIG. 6 is a waveform diagram of another first audio data provided according to an embodiment of the application;
FIG. 7 is a flow chart of an audio processing method provided according to an embodiment of the present application;
fig. 8 is a block diagram of a computing device according to an embodiment of the present application.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application. However, the application can be implemented in many ways other than those described herein, and those skilled in the art can make similar adaptations without departing from its spirit; the application is therefore not limited to the specific implementations disclosed below.
The terminology used in the one or more embodiments of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the one or more embodiments of the present application. As used in one or more embodiments of the present application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present application refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It will be understood that, although the terms first, second, etc. may be used in one or more embodiments of the present application to describe various information, this information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, without departing from the scope of one or more embodiments of the present application, a first aspect may be termed a second aspect, and similarly a second aspect may be termed a first aspect. Depending on the context, the word "if" as used herein may be interpreted as "when" or "upon" or "in response to a determination".
First, the noun terms referred to in one or more embodiments of the present application are explained.
Pitch: the perceived height of a tone, one of the basic characteristics of sound. It is determined by the vibration frequency, with which it is positively correlated: if the frequency (the number of vibrations per unit time) is high, the sound is "high"; otherwise it is "low".
BPM (Beats Per Minute): in its simplest sense, the number of beats sounded within one minute; its unit is BPM, and it is also called the beat count.
Tempo: the speed of a piece of music, usually measured in BPM (beats per minute).
PCM (Pulse Code Modulation): the standard format adopted by computers, DVDs, and digital telephony. A time-continuous, value-continuous analog signal is converted into a time-discrete, value-discrete digital signal before being transmitted over a channel; PCM is the process of sampling the analog signal, quantizing the sample amplitudes, and encoding them.
Sampling frequency: also known as the sampling rate or sampling speed, it is the number of samples per second extracted from a continuous signal to form a discrete signal, expressed in hertz (Hz). Its inverse, called the sampling period or sampling time, is the time interval between samples, that is, how often sampling occurs. Since the range of human hearing is about 20 Hz to 20 kHz, 44.1 kHz, 48 kHz, or 96 kHz is generally chosen as the sampling rate.
Web Audio API: provides a powerful and versatile system for controlling audio on the Web, allowing developers to decode, process, and output audio: for example, to choose audio sources, add effects to audio, visualize audio, and apply spatial effects (such as panning).
AudioBuffer interface: represents a short segment of audio held in memory, constructed from an audio file using AudioContext.decodeAudioData(). Once the audio is placed in an AudioBuffer, it can be passed to an AudioBufferSourceNode for playback.
AnalyserNode interface: represents a node that can provide real-time frequency-domain and time-domain analysis information. It is an AudioNode that makes no changes to the audio stream while allowing the data it generates to be retrieved and processed to create audio visualizations. An AnalyserNode has exactly one input and one output, and it works properly even if the output is not connected.
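As a quick numeric check of the sampling-rate terms defined above, the following sketch relates the rate, the sampling period, and the number of sampling points; the 44.1 kHz figure is the common rate the text mentions, and the two-second duration is an arbitrary assumption:

```javascript
// Numeric check of the sampling definitions above.
const sampleRate = 44100;                     // Hz, a commonly chosen rate
const samplingPeriod = 1 / sampleRate;        // seconds between samples (inverse of the rate)
const durationSec = 2;                        // assumed clip length
const sampleCount = sampleRate * durationSec; // sampling points in 2 s of audio
console.log(sampleCount);                     // 88200
```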
In the present application, an audio processing apparatus is provided, and the present application relates to an audio processing method, a computing device, and a computer-readable storage medium, which are described in detail in the following embodiments one by one.
Fig. 1 shows an architecture diagram of an audio processing apparatus according to an embodiment of the present application, fig. 2 shows a functional diagram of an instruction execution interface according to an embodiment of the present application, and fig. 3 shows a functional diagram of an audio processing apparatus according to an embodiment of the present application. As shown in fig. 2, the upper layer functions of the instruction execution interface include instruction receiving and execution result outputting, and the lower layer functions include instruction processing and execution algorithms; as shown in fig. 3, the upper layer functions of the audio processing apparatus include data reception and waveform presentation, and the lower layer functions include interactive functions and interface calls.
As shown in fig. 1, the audio processing apparatus is configured in the browser, and the audio processing apparatus includes an instruction execution interface 102, wherein the instruction execution interface 102 is packaged with an instruction receiving unit 1021, an instruction processing unit 1022, an execution algorithm unit 1023, and an execution result output unit 1024;
the instruction receiving unit 1021 is configured to receive a call instruction, where the call instruction carries a preset interactive operation, received through an audio display interface of the browser, targeting the waveform of the first audio data;
the instruction processing unit 1022 is configured to determine a target execution algorithm unit 1023 corresponding to the call instruction according to a preset interactive operation, and run the target execution algorithm unit 1023 to obtain an execution result of the call instruction;
an execution result output unit 1024 configured to output an execution result.
In practical applications, as people's work and life requirements diversify, the audio processing they need tends to become varied and complex, and operations that require complex computation on audio data (such as pitch detection and beat detection) are not supported by the browser.
Therefore, the present application provides an audio processing apparatus configured in a browser. The apparatus includes an instruction execution interface that packages an instruction receiving unit, an instruction processing unit, an execution algorithm unit, and an execution result output unit. The instruction receiving unit is configured to receive a call instruction carrying a preset interactive operation, received through an audio display interface of the browser, targeting the waveform of the first audio data; the instruction processing unit is configured to determine, according to the preset interactive operation, the target execution algorithm unit corresponding to the call instruction and to run it to obtain the execution result of the call instruction; and the execution result output unit is configured to output the execution result. In this way, the execution algorithms required for complex computation on audio data can be pre-packaged in the instruction execution interface; when a complex operation on audio data is needed, the packaged interface is called directly to run the execution algorithm unit packaged inside it, performing the complex processing and giving the browser more comprehensive audio processing capability.
There may be multiple execution algorithm units 1023, each of which includes an execution algorithm for complex computation and processing of audio. For example, three execution algorithm units 1023 may be packaged in the instruction execution interface: a pitch detection algorithm unit, which includes a pitch detection algorithm; a beat detection algorithm unit, which includes a beat detection algorithm; and a pitch adjustment algorithm unit, which includes a pitch adjustment algorithm.
It should be noted that the instruction receiving unit 1021 and the execution result output unit 1024 correspond, respectively, to the instruction receiving and execution result output in the upper-layer functions shown in fig. 2, and the instruction processing unit 1022 and the execution algorithm unit 1023 correspond, respectively, to the instruction processing and execution algorithm in the lower-layer functions shown in fig. 2.
In the present application, the execution algorithms required for complex computation on audio data can thus be pre-packaged in the instruction execution interface; when a complex operation on audio data is needed, the packaged instruction execution interface is called directly, the execution algorithm unit packaged inside it is run, and the audio data receives the complex processing.
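The packaging described above can be sketched in browser-style JavaScript as follows. All names, and the toy "pitch adjustment" that merely scales samples, are illustrative assumptions, not the patent's actual implementation:

```javascript
// Sketch of the instruction execution interface: an instruction
// receiving unit accepts a call instruction, an instruction processing
// unit picks the target execution algorithm unit, and an execution
// result output unit hands the result back.
class InstructionExecutionInterface {
  constructor() {
    // Execution algorithm units, keyed by the preset interactive
    // operation they implement.
    this.algorithmUnits = new Map();
  }

  // Package (register) an execution algorithm unit.
  registerUnit(operation, algorithm) {
    this.algorithmUnits.set(operation, algorithm);
  }

  // Instruction receiving unit: accepts a call instruction carrying
  // the preset interactive operation and the audio data.
  receive(callInstruction) {
    const { operation, audioData } = callInstruction;
    // Instruction processing unit: determine the target unit.
    const targetUnit = this.algorithmUnits.get(operation);
    if (!targetUnit) {
      throw new Error(`no algorithm unit for operation: ${operation}`);
    }
    // Run the target unit and pass the result to the output unit.
    return this.outputResult(targetUnit(audioData));
  }

  // Execution result output unit.
  outputResult(result) {
    return result;
  }
}

const iface = new InstructionExecutionInterface();
// Toy "pitch adjustment" unit: halve every sample, a stand-in for a
// real pitch-adjustment algorithm.
iface.registerUnit('pitchAdjust', (samples) => samples.map((s) => s * 0.5));
const second = iface.receive({ operation: 'pitchAdjust', audioData: [2, -4] });
console.log(second); // [ 1, -2 ]
```

A real pitch detection or beat detection unit would be registered the same way, so the calling side never needs to know which algorithm runs.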
In an optional implementation manner of this embodiment, as shown in fig. 1, the audio processing apparatus further includes a data acquisition module 104, a presentation module 106, an interaction module 108, and an interface calling module 110;
the data acquisition module 104 is configured to acquire first audio data of the audio to be processed through a data acquisition interface of the browser;
the presentation module 106 is configured to present a waveform corresponding to the first audio data on an audio presentation interface of the browser;
the interaction module 108 is configured to receive preset interaction operations for the waveforms through an audio presentation interface of the browser;
the interface calling module 110 is configured to call the instruction execution interface 102 to process the first audio data, so as to obtain processed second audio data;
the presentation module 106 is further configured to present the waveform corresponding to the second audio data on an audio presentation interface of the browser.
In actual implementation, the interface calling module 110 is further configured to:
sending a call instruction to the instruction receiving unit of the instruction execution interface, so as to run the target execution algorithm unit of the instruction execution interface to process the first audio data and obtain processed second audio data;
and receiving the second audio data output by the execution result output unit of the instruction execution interface.
The data acquisition module 104 and the presentation module 106 correspond, respectively, to the data reception and waveform presentation in the upper-layer functions shown in fig. 3, and the interaction module 108 and the interface calling module 110 correspond, respectively, to the interaction and interface calling in the lower-layer functions shown in fig. 3.
In actual implementation, the directly acquired audio data are generally time-domain signals, while the waveform displayed for the audio data is often a frequency-domain waveform diagram. Therefore, to make it convenient for the presentation module 106 to visualize the first audio data (that is, to display the corresponding waveform), the time-domain and frequency-domain information of the first audio data needs to be analyzed by means of the AnalyserNode interface of Web Audio, so that the presentation module 106 can draw the waveform corresponding to the first audio data on the audio display interface of the browser.
It should be noted that fig. 4 shows a schematic diagram of data analysis by the AnalyserNode interface according to an embodiment of the present application, which represents a node capable of providing real-time frequency-domain and time-domain analysis information. As shown in fig. 4, it is an AudioNode that makes no modification to the audio stream while allowing the data it generates to be acquired and processed, thereby creating an audio visualization.
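A hedged browser-side sketch of the wiring this implies, assuming the Web Audio API is available; the function and variable names are illustrative, not from the patent:

```javascript
// Attach an AnalyserNode so a presentation layer can read both
// time-domain and frequency-domain data for the audio being shown.
function createWaveformAnalyser(audioContext, sourceNode) {
  const analyser = audioContext.createAnalyser();
  analyser.fftSize = 2048;
  // The AnalyserNode does not change the stream; it only exposes data.
  sourceNode.connect(analyser);

  const timeDomain = new Float32Array(analyser.fftSize);
  const frequencyDomain = new Float32Array(analyser.frequencyBinCount);

  return {
    // Current waveform samples (time domain).
    readTimeDomain() {
      analyser.getFloatTimeDomainData(timeDomain);
      return timeDomain;
    },
    // Magnitude per frequency bin, in dB (frequency domain).
    readFrequencyDomain() {
      analyser.getFloatFrequencyData(frequencyDomain);
      return frequencyDomain;
    },
  };
}
```

Either reader can be called on each animation frame to drive the waveform drawing.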
In the present application, the first audio data of the audio to be processed can be acquired through a data acquisition interface of the browser, and the corresponding waveform is then displayed on the audio display interface of the browser. The user can then interact with the displayed waveform on that interface: when a preset interactive operation is received through the audio display interface, the packaged instruction execution interface can be called directly to run the corresponding execution algorithm unit packaged inside it and perform the complex processing of the audio data; and when the processed second audio data output by the execution result output unit packaged in the instruction execution interface is received, the waveform corresponding to the second audio data can be displayed on the audio display interface again.
In an optional implementation manner of this embodiment, as shown in fig. 1, the audio processing apparatus further includes a data processing module;
the data processing module is configured to decode the first audio data through a decoding interface of the browser and determine at least one channel data of the first audio data;
accordingly, the presentation module is further configured to:
and displaying, for each channel data in the at least one channel data, the waveform corresponding to that channel data on an audio display interface of the browser.
Specifically, after the first audio data of the audio to be processed is acquired through the data acquisition interface, and in order to facilitate its processing, the first audio data can be decoded through the decodeAudioData interface of Web Audio into an AudioBuffer, which represents the PCM data of a segment of audio in memory; once the first audio data is placed in the AudioBuffer, it can be controlled. For a song with multiple sound channels, the channel data of the different channels can be obtained from the AudioBuffer, the channels of the first audio data can be separated, and the waveform corresponding to each channel's data can then be displayed separately on the audio display interface of the browser.
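A minimal sketch of the decoding and channel-separation steps just described, assuming a browser with the Web Audio API; the function name is illustrative:

```javascript
// Decode the first audio data (encoded bytes) into an AudioBuffer and
// pull out each channel's samples so one waveform can be drawn per
// channel.
async function decodeToChannels(audioContext, arrayBuffer) {
  // decodeAudioData turns the encoded bytes into PCM in an AudioBuffer.
  const audioBuffer = await audioContext.decodeAudioData(arrayBuffer);
  const channels = [];
  for (let ch = 0; ch < audioBuffer.numberOfChannels; ch++) {
    // Float32Array of PCM samples for this channel, in [-1, 1].
    channels.push(audioBuffer.getChannelData(ch));
  }
  return channels;
}
```

For a stereo song this yields two arrays, one per channel, each of which can feed the waveform renderer independently.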
The present application can not only display the complete waveform corresponding to the first audio data; when the first audio data includes multiple channels, it can also separate the individual channel data and display the different channels separately. That is, per-channel visualization of the audio data is achieved, which makes it convenient for the user to interact with the waveform of a single channel, expands the objects of interactive operation, and improves the applicability of the audio processing apparatus.
In an optional implementation manner of this embodiment, the presentation module 106 is further configured to:
determining a first display precision of the first audio data according to the width of the audio display interface, where the first display precision means that a first number of pixel points are displayed on the audio display interface and each pixel point represents a second number of audio sampling points in the first audio data;
and generating the waveform corresponding to the first audio data according to the first number of pixel points displayed in the audio display interface and the second number of audio sampling points represented by each pixel point.
The first audio data of the audio to be processed is sampled at a certain sampling frequency, so it includes a plurality of audio sampling points; for example, a segment of the audio to be processed (i.e., the first audio data) may include 10000 audio sampling points. The width of the audio display interface, however, is fixed, meaning it can only display a fixed number of pixel points, and this number is generally far smaller than the number of audio sampling points in the audio data. A display precision therefore needs to be preset, i.e., how many audio sampling points one pixel point in the audio display interface represents. The first number is the number of pixel points displayed in the audio display interface, and the second number is the number of audio sampling points in the first audio data represented by one pixel point.
Illustratively, it is determined from the width of the audio display interface that 100 pixel points are displayed; when the first audio data includes 10000 audio sampling points, one pixel point in the audio display interface must represent 100 audio sampling points of the first audio data.
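The precision computation can be sketched as a pure function; the name is illustrative, and rounding up is an assumption for the case where the division is not exact:

```javascript
// Display precision: the audio display interface can show a fixed
// number of pixel points (the "first number"), so each pixel point
// must represent several audio sampling points (the "second number").
function samplesPerPixel(totalSamples, displayWidthPixels) {
  // Round up so every sampling point is covered by some pixel point.
  return Math.ceil(totalSamples / displayWidthPixels);
}

// The example from the text: 10000 sampling points across 100 pixels.
console.log(samplesPerPixel(10000, 100)); // 100
```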
In an optional implementation manner of this embodiment, the presentation module 106 is further configured to:
for each pixel point in the audio display interface, determining the coordinates of the pixel point according to the second number of audio sampling points it represents, and displaying the pixel point at those coordinates;
and connecting the first number of pixel points displayed in the audio display interface to form the waveform corresponding to the first audio data.
It should be noted that each pixel point in the audio display interface represents the second number of audio sampling points, so the coordinates of the corresponding pixel point in the audio display interface need to be determined from those sampling points.
Two forms of waveform are provided in the present application. One is a polyline waveform diagram: for the polyline waveform diagram, the abscissa and ordinate of each pixel point in the audio display interface need to be determined, each pixel point is then displayed at its corresponding coordinates, and the pixel points are connected to obtain the polyline waveform diagram. A specific implementation may be as follows:
determining a first audio sampling point with the highest frequency and a second audio sampling point with the lowest frequency among the second number of audio sampling points represented by the pixel point;
determining the average of the frequencies of the first audio sampling point and the second audio sampling point;
taking the frequency average as the ordinate of the pixel point;
and determining the abscissa of the pixel point according to the position of the pixel point among the first number of pixel points displayed on the audio display interface.
It should be noted that one pixel point represents the second number of audio sampling points, so the average frequency of those audio sampling points can be used as the ordinate of the pixel point. To reduce the amount of calculation, when determining this frequency average, the frequencies of all of the second number of audio sampling points are not averaged; only the highest frequency and the lowest frequency are taken, and their average is used as the frequency average of the second number of audio sampling points.
In addition, the audio display interface displays the first number of pixel points, that is, the horizontal axis of the audio display interface is evenly divided into the first number of positions, one per pixel point. The position of a pixel point among the first number of pixel points is therefore its abscissa: if a pixel point is the 30th one displayed in the audio display interface, its abscissa is 30.
For example, fig. 5 shows a waveform diagram of first audio data according to an embodiment of the present application. As shown in fig. 5, 5 pixel points are displayed in the audio display interface, and each pixel point represents 100 audio sampling points. For the first pixel point, the abscissa is 1; assuming the highest and lowest frequencies among its 100 audio sampling points are 5 and 3 respectively, its ordinate is determined to be 4. For the second pixel point, the abscissa is 2; assuming the highest frequency is 10 and the lowest is 2, its ordinate is 6. For the third pixel point, the abscissa is 3; assuming the highest and lowest frequencies are 9 and 7, its ordinate is 8. For the fourth pixel point, the abscissa is 4; assuming the highest and lowest frequencies are 4 and 2, its ordinate is 3. For the fifth pixel point, the abscissa is 5; assuming the highest frequency is 3 and the lowest is 1, its ordinate is 2. The pixel points are then connected to form the polyline waveform diagram of the first audio data.
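The point selection for the polyline waveform diagram can be sketched as follows. This is an illustrative reading of the steps above with assumed names; each chunk of sampling-point values stands in for the "frequencies" of the source text:

```typescript
// Sketch of polyline point selection: each pixel point covers one chunk of
// `samplesPerPixel` values; its ordinate is the mean of the chunk's highest
// and lowest value, its abscissa its 1-based position in the interface.
interface Point { x: number; y: number; }

function polylinePoints(samples: number[], samplesPerPixel: number): Point[] {
  const points: Point[] = [];
  for (let i = 0; i * samplesPerPixel < samples.length; i++) {
    const chunk = samples.slice(i * samplesPerPixel, (i + 1) * samplesPerPixel);
    const hi = Math.max(...chunk); // first audio sampling point (highest)
    const lo = Math.min(...chunk); // second audio sampling point (lowest)
    points.push({ x: i + 1, y: (hi + lo) / 2 });
  }
  return points;
}
```

Feeding it chunks whose extremes are (5, 3), (10, 2), (9, 7), (4, 2) and (3, 1) reproduces the ordinates 4, 6, 8, 3, 2 of the fig. 5 example.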
The first display precision of the first audio data can be determined according to the width of the audio display interface, audio sampling points are selected from the first audio data according to the first display precision, and the selected audio sampling points are mapped to abscissas and ordinates in the audio display interface, so the display position of every pixel point in the audio display interface can be determined. The pixel points are then connected to form the polyline waveform diagram of the first audio data. The point-selection, mapping, and line-connection steps of this audio visualization process are simple and computationally light, which improves visualization efficiency.
In an optional implementation manner of this embodiment, the presentation module 106 is further configured to:
for each pixel point in the audio display interface, generating a waveform corresponding to the pixel point according to the second number of audio sampling points represented by the pixel point;
and combining the waveforms corresponding to the first number of pixel points displayed in the audio display interface into the waveform corresponding to the first audio data.
It should be noted that two forms of waveform are provided in the present application. Besides the polyline waveform diagram above, the waveform may also be a columnar (bar) waveform diagram. For the columnar waveform diagram, each pixel point corresponds to one columnar waveform, and the columnar waveforms corresponding to the individual pixel points are combined to form the waveform of the first audio data. A specific implementation may be as follows:
determining the abscissa of the pixel point according to the position of the pixel point among the first number of pixel points displayed on the audio display interface;
determining a first audio sampling point with the highest frequency and a second audio sampling point with the lowest frequency among the second number of audio sampling points represented by the pixel point;
marking a first frequency of the first audio sampling point and a second frequency of the second audio sampling point at the position corresponding to the abscissa in the audio display interface;
and connecting the first frequency and the second frequency to generate the waveform corresponding to the pixel point.
It should be noted that the waveform corresponding to each pixel point is a vertical bar: its height is the frequency difference between the audio sampling point with the highest frequency and the audio sampling point with the lowest frequency, and its width is one pixel.
For example, fig. 6 shows a waveform diagram of another first audio data provided according to an embodiment of the present application. As shown in fig. 6, 5 pixel points are displayed in the audio display interface, and each pixel point represents 100 audio sampling points. For the first pixel point, the abscissa is 1; assuming the highest frequency among its 100 audio sampling points is 5 and the lowest is 3, a vertical line is drawn from 3 to 5 at abscissa 1. For the second pixel point, the abscissa is 2; assuming the highest frequency is 10 and the lowest is 2, a line is drawn from 2 to 10 at abscissa 2. For the third pixel point, the abscissa is 3; assuming the highest and lowest frequencies are 9 and 7, a line is drawn from 7 to 9 at abscissa 3. For the fourth pixel point, the abscissa is 4; assuming the highest frequency is 4 and the lowest is 2, a line is drawn from 2 to 4 at abscissa 4. For the fifth pixel point, the abscissa is 5; assuming the highest frequency is 3 and the lowest is 1, a line is drawn from 1 to 3 at abscissa 5. Combining the waveforms of these 5 pixel points forms the columnar waveform diagram of the first audio data.
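The columnar waveform construction can be sketched in the same illustrative style, again under assumed names:

```typescript
// Sketch of the columnar (bar) waveform: each pixel point becomes a
// one-pixel-wide vertical segment from the lowest to the highest value in
// its chunk of audio sampling points.
interface Bar { x: number; yLow: number; yHigh: number; }

function barSegments(samples: number[], samplesPerPixel: number): Bar[] {
  const bars: Bar[] = [];
  for (let i = 0; i * samplesPerPixel < samples.length; i++) {
    const chunk = samples.slice(i * samplesPerPixel, (i + 1) * samplesPerPixel);
    bars.push({ x: i + 1, yLow: Math.min(...chunk), yHigh: Math.max(...chunk) });
  }
  return bars;
}
```

For the second pixel point of the fig. 6 example (highest 10, lowest 2), this produces the segment drawn from 2 to 10 at abscissa 2.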
In addition to the polyline waveform diagram of the first audio data, the present application can also display a columnar waveform diagram of the first audio data. Different display forms suit different application scenarios, so the waveform display is flexible and highly adaptable, and can satisfy the waveform display requirements of different situations.
In addition, after the drawing parameters of the waveform of the first audio data are determined, the actual drawing may be performed in canvas format or in SVG format, which is not limited in the present application.
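As one illustration of the SVG option (the canvas alternative would issue moveTo/lineTo calls on a 2D context instead), the computed points can be serialized into an SVG polyline element. All names and the coordinate handling here are assumptions for illustration:

```typescript
// Serialize polyline points into a minimal SVG string. The ordinate is
// flipped because SVG's y axis grows downward.
function toSvgPolyline(
  points: { x: number; y: number }[],
  width: number,
  height: number,
): string {
  const attr = points.map((p) => `${p.x},${height - p.y}`).join(" ");
  return `<svg width="${width}" height="${height}">` +
         `<polyline fill="none" stroke="black" points="${attr}"/></svg>`;
}
```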
It should be noted that the instruction execution interface of the browser encapsulates the execution algorithms required by operations that involve complex computation on audio data, while some basic operations can be implemented without calling the instruction execution interface. Therefore, after an interactive operation for the waveform is received through the audio display interface of the browser, it needs to be determined whether the interactive operation is a preset interactive operation. If it is, the instruction execution interface can be called to process the first audio data to obtain processed second audio data; if it is not, the interactive operation can be implemented directly through a basic function module of the browser. A preset interactive operation is an operation for which a corresponding execution algorithm unit is encapsulated in the instruction execution interface (i.e., a complex operation that needs to be executed by an algorithm).
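A hedged sketch of this dispatch decision follows. The operation names, the preset set, and the handler signatures are assumptions for illustration; only operations whose execution algorithm is encapsulated in the instruction execution interface are forwarded to it:

```typescript
type Handler = (audio: number[]) => number[];

// Assumed set of preset interactive operations (complex, algorithm-backed).
const PRESET_OPERATIONS = new Set(["pitch-detect", "pitch-adjust", "beat-detect"]);

function handleInteraction(
  op: string,
  audio: number[],
  callInstructionInterface: Handler, // wraps the encapsulated execution algorithms
  basicBrowserHandler: Handler,      // zoom, clip, and other simple operations
): number[] {
  return PRESET_OPERATIONS.has(op)
    ? callInstructionInterface(audio)
    : basicBrowserHandler(audio);
}
```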
In an optional implementation manner of this embodiment, the received interactive operation is a waveform scaling operation (not belonging to a preset interactive operation), and the interactive module 108 is further configured to:
receiving a waveform scaling operation for the waveform through the audio display interface of the browser;
accordingly, the presentation module 106 is further configured to:
determining a second display precision of the first audio data according to the scaling operation and the first display precision, wherein the second display precision means that a third number of pixel points are displayed on the audio display interface, and each pixel point represents a fourth number of audio sampling points in the first audio data;
and generating the waveform corresponding to the first audio data according to the third number of pixel points displayed in the audio display interface and the fourth number of audio sampling points represented by each pixel point.
It should be noted that, in order to examine a certain segment of audio data more closely, a user may wish to interactively expand a certain segment of the waveform, so a waveform diagram zoom function is also provided in the present application. For example, the waveform may be zoomed in/out by scrolling the mouse wheel forward/backward.
In specific implementation, after each zoom operation on the waveform, the display precision of the audio display interface needs to be re-determined according to the zoom operation, and point selection, mapping, and the other operations are then performed again on the first audio data, so that the post-zoom waveform diagram is redrawn on the audio display interface. Specifically, a zoom scale may be preset for each operation: a target zoom scale is determined from the current zoom operation, and the current display precision is adjusted by the target zoom scale to obtain the post-zoom display precision.
Illustratively, the first audio data includes 10000 audio sampling points, and the first display precision is 100 pixel points displayed on the audio display interface. If the zoom operation indicates zooming in by a factor of two, 200 pixel points are displayed on the audio display interface, and each pixel point then represents 50 audio sampling points.
The function of zooming the displayed waveform provided in the present application can satisfy different user requirements, displaying the waveform of the first audio data more finely or more coarsely, so the display mode is flexible and highly adaptable. Moreover, each zoom operation re-performs point selection and mapping and redraws the waveform diagram, rather than simply scaling the already-drawn diagram, which ensures the accuracy of the displayed waveform diagram after the zoom operation.
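Re-deriving the display precision after a zoom operation, per the example above, can be sketched as follows (illustrative names; zooming in by a factor of 2 doubles the displayed pixel points and halves the sampling points each one represents):

```typescript
// Second display precision after a zoom: scale the pixel-point count by the
// target zoom scale, then recompute how many sampling points each covers.
function zoomPrecision(
  totalSamples: number,
  currentPixelCount: number,
  zoomFactor: number, // > 1 zooms in, < 1 zooms out
) {
  const pixelCount = Math.round(currentPixelCount * zoomFactor); // third number
  const samplesPerPixel = Math.ceil(totalSamples / pixelCount);  // fourth number
  return { pixelCount, samplesPerPixel };
}
```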
In an optional implementation manner of this embodiment, the received interactive operation is a waveform clipping operation (not belonging to a preset interactive operation), and the interactive module 108 is further configured to:
receiving a waveform clipping operation for a third target waveband of the waveform through the audio display interface of the browser;
accordingly, the presentation module 106 is further configured to:
determining third audio data corresponding to a third target waveband from the first audio data;
determining a third display precision of the third audio data according to the width of the audio display interface, wherein the third display precision means that a fifth number of pixel points are displayed on the audio display interface, and each pixel point represents a sixth number of audio sampling points in the third audio data;
and generating the waveform corresponding to the third audio data according to the fifth number of pixel points displayed in the audio display interface and the sixth number of audio sampling points represented by each pixel point.
The displayed waveform can be clipped, and when the clipped waveform is displayed, it can be displayed enlarged: that is, the display precision is re-determined for the third target waveband to be clipped (i.e., the selected waveband), and points are re-selected and re-mapped, so that the waveform of the clipped target waveband is displayed accurately.
In addition, besides redrawing the clipped waveband, the displayed waveform can also simply be cropped: the start point and end point of the third target waveband are selected, and the waveform between them is extracted and displayed. In specific implementation, either approach may be selected as required.
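The redraw variant of clipping can be sketched as follows (assumed names): the sampling points of the selected waveband are extracted, and a new display precision is derived for them from the interface width, so the clipped waveband is redrawn at full precision rather than merely cropped out of the existing diagram:

```typescript
// Extract the third audio data for the selected waveband and re-derive its
// display precision (fifth number of pixel points, sixth number of sampling
// points per pixel point) from the interface width.
function clipBand(
  samples: number[],
  startSample: number,
  endSample: number, // exclusive
  interfaceWidthPx: number,
) {
  const bandSamples = samples.slice(startSample, endSample); // third audio data
  const samplesPerPixel = Math.ceil(bandSamples.length / interfaceWidthPx);
  return { bandSamples, pixelCount: interfaceWidthPx, samplesPerPixel };
}
```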
In an optional implementation manner of this embodiment, the received interactive operation is a pitch detection operation (belonging to a preset interactive operation), and the interactive module 108 is further configured to:
receiving a selection operation for a first target waveband of the waveform through the audio display interface of the browser, and receiving a pitch detection operation for the target waveband;
accordingly, the interface invocation module is further configured to:
sending a calling instruction to an instruction receiving unit of the instruction execution interface to operate a pitch detection algorithm unit of the instruction execution interface to detect the audio data of the first target waveband, so as to obtain pitch data of the first target waveband;
and receiving pitch data output by an execution result output unit of the instruction execution interface.
In the present application, the encapsulated instruction execution interface can be called directly to run the pitch detection algorithm unit encapsulated in it and detect the pitch of the audio data, which extends the functions of the browser so that the pitch of audio can be detected through the browser.
In an optional implementation manner of this embodiment, the received interactive operation is a pitch adjustment operation (belonging to a preset interactive operation), and the interactive module 108 is further configured to:
receiving a pitch adjustment operation for a second target waveband of the waveform through the audio display interface of the browser, wherein the pitch adjustment operation carries a target pitch;
accordingly, the interface call module 110 is further configured to:
sending a calling instruction to an instruction receiving unit of the instruction execution interface to operate a pitch adjustment algorithm unit of the instruction execution interface to adjust the current pitch of the audio data of the second target waveband to a target pitch so as to obtain audio data corresponding to the target pitch;
and receiving audio data corresponding to the target pitch output by the execution result output unit of the instruction execution interface.
In the present application, the encapsulated instruction execution interface can be called directly to run the pitch adjustment algorithm unit encapsulated in it, adjusting the pitch of the audio data through the preset pitch adjustment algorithm, which extends the functions of the browser so that the pitch of audio can be adjusted through the browser.
In an optional implementation manner of this embodiment, the received interaction operation is a beat detection operation (belonging to a preset interaction operation), and the interaction module 108 is further configured to:
receiving a beat detection operation for the waveform through the audio display interface of the browser;
accordingly, the interface invocation module is further configured to:
sending a calling instruction to an instruction receiving unit of an instruction execution interface to operate a beat detection algorithm unit of the instruction execution interface to perform beat detection on the first audio data to obtain beat data of the first audio data;
and receiving beat data output by an execution result output unit of the instruction execution interface.
In specific implementation, after the beat of the first audio data is detected, a beat line corresponding to the beat data can be displayed on an audio display interface of the browser.
In the present application, the encapsulated instruction execution interface can be called directly to run the beat detection algorithm unit encapsulated in it, detecting the beat of the audio data and drawing the beat lines on the audio display interface, which extends the functions of the browser so that the beat of audio can be detected, and the beat lines drawn, through the browser.
It should be noted that the present application only takes the case in which the execution algorithm units encapsulated in the instruction execution interface include a pitch detection algorithm unit, a pitch adjustment algorithm unit, and a beat detection algorithm unit as an example. In actual implementation, the instruction execution interface may also encapsulate execution algorithm units for other complex algorithms to implement other complex operations, which is not limited in the present application.
The audio processing device is configured on a browser and comprises an instruction execution interface, wherein the instruction execution interface is packaged with an instruction receiving unit, an instruction processing unit, an execution algorithm unit and an execution result output unit; the instruction receiving unit is configured to receive a calling instruction, and the calling instruction carries preset interactive operation aiming at the waveform of the first audio data, which is acquired through an audio display interface of the browser; the instruction processing unit is configured to determine a target execution algorithm unit corresponding to the call instruction according to preset interactive operation, and operate the target execution algorithm unit to obtain an execution result of the call instruction; an execution result output unit configured to output an execution result. Under the condition, an execution algorithm required by the operation of performing complex calculation on the audio data can be pre-packaged in the instruction execution interface, and when the complex operation on the audio data is required, the packaged instruction execution interface is directly called to operate an execution algorithm unit packaged in the instruction execution interface to perform complex processing on the audio data.
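A minimal sketch, under assumed names, of the instruction execution interface just summarized: an instruction receiving unit accepts the call instruction, an instruction processing unit selects the matching execution algorithm unit, and an execution result output unit returns the result. The registered units here are stubs, not real pitch or beat algorithms:

```typescript
type AlgorithmUnit = (audio: number[]) => number[];

class InstructionExecutionInterface {
  private units = new Map<string, AlgorithmUnit>();

  // Encapsulate an execution algorithm unit for a preset interactive operation.
  register(operation: string, unit: AlgorithmUnit): void {
    this.units.set(operation, unit);
  }

  // Instruction receiving + processing + result output, collapsed for brevity:
  // the call instruction carries the operation name and the audio data.
  execute(operation: string, audio: number[]): number[] {
    const unit = this.units.get(operation); // instruction processing unit
    if (!unit) throw new Error(`no execution algorithm unit for ${operation}`);
    return unit(audio); // execution result output unit returns this
  }
}
```

Usage would register, e.g., a "pitch-detect" unit once and then route every matching preset interactive operation through `execute`.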
Fig. 7 is a flowchart of an audio processing method according to an embodiment of the present application, which is applied in a browser and specifically includes the following steps:
step 702: the method comprises the steps of obtaining first audio data of audio to be processed through a data interface of a browser, and displaying a waveform corresponding to the first audio data on an audio display interface of the browser.
In an optional implementation manner of this embodiment, a waveform corresponding to the first audio data is displayed on an audio display interface of the browser, and a specific implementation manner may be as follows:
determining a first display precision of the first audio data according to the width of the audio display interface, wherein the first display precision means that a first number of pixel points are displayed on the audio display interface, and each pixel point represents a second number of audio sampling points in the first audio data;
and generating the waveform corresponding to the first audio data according to the first number of pixel points displayed in the audio display interface and the second number of audio sampling points represented by each pixel point.
In an optional implementation manner of this example, the waveform corresponding to the first audio data is generated according to the first number of pixel points displayed in the audio display interface and the second number of audio sampling points represented by each pixel point, and one implementation may be as follows:
for each pixel point in the audio display interface, determining the coordinates of the pixel point according to the second number of audio sampling points represented by the pixel point, and displaying the pixel point at the coordinates;
and connecting the first number of pixel points displayed in the audio display interface to form the waveform corresponding to the first audio data.
The coordinates of the pixel point are determined according to the second number of audio sampling points represented by the pixel point, and a specific implementation process may be as follows:
determining a first audio sampling point with the highest frequency and a second audio sampling point with the lowest frequency among the second number of audio sampling points represented by the pixel point;
determining the average of the frequencies of the first audio sampling point and the second audio sampling point;
taking the frequency average as the ordinate of the pixel point;
and determining the abscissa of the pixel point according to the position of the pixel point among the first number of pixel points displayed on the audio display interface.
In an optional implementation manner of this example, the waveform corresponding to the first audio data is generated according to the first number of pixel points displayed in the audio display interface and the second number of audio sampling points represented by each pixel point, and another implementation may be as follows:
for each pixel point in the audio display interface, generating a waveform corresponding to the pixel point according to the second number of audio sampling points represented by the pixel point;
and combining the waveforms corresponding to the first number of pixel points displayed in the audio display interface into the waveform corresponding to the first audio data.
For each pixel point in the audio display interface, the waveform corresponding to the pixel point is generated according to the second number of audio sampling points represented by the pixel point, and a specific implementation process may be as follows:
determining the abscissa of the pixel point according to the position of the pixel point among the first number of pixel points displayed on the audio display interface;
determining a first audio sampling point with the highest frequency and a second audio sampling point with the lowest frequency among the second number of audio sampling points represented by the pixel point;
marking a first frequency of the first audio sampling point and a second frequency of the second audio sampling point at the position corresponding to the abscissa in the audio display interface;
and connecting the first frequency and the second frequency to generate the waveform corresponding to the pixel point.
Step 704: A preset interactive operation for the waveform is received through the audio display interface of the browser, and the instruction execution interface is called to process the first audio data to obtain processed second audio data.
The instruction execution interface is an interface in the browser that encapsulates an instruction receiving unit, an instruction processing unit, execution algorithm units, and an execution result output unit, wherein the execution algorithm units include the algorithms corresponding to the preset interactive operations, and the instruction execution interface executes the interactive instructions triggered by the preset interactive operations.
In an optional implementation manner of this example, the instruction execution interface is called to process the first audio data to obtain the processed second audio data, and a specific implementation process may be as follows:
sending a calling instruction to an instruction receiving unit of the instruction execution interface to operate a target execution algorithm unit of the instruction execution interface to process the first audio data to obtain processed second audio data;
and receiving second audio data output by an execution result output unit of the instruction execution interface.
It should be noted that the instruction execution interface of the browser encapsulates the execution algorithms required by operations that involve complex computation on audio data, while some basic operations can be implemented without calling the instruction execution interface. Therefore, after an interactive operation for the waveform is received through the audio display interface of the browser, it needs to be determined whether the interactive operation is a preset interactive operation. If it is, the instruction execution interface can be called to process the first audio data to obtain processed second audio data; if it is not, the interactive operation can be implemented directly through a basic function module of the browser. A preset interactive operation is an operation for which a corresponding execution algorithm unit is encapsulated in the instruction execution interface (i.e., a relatively complex operation that needs to be executed by an algorithm).
In an optional implementation manner of this example, the received interactive operation is a pitch detection operation (a preset interactive operation). A selection operation for a first target waveband of the waveform is received through the audio display interface of the browser, together with a pitch detection operation for the target waveband. The instruction execution interface is then called to process the first audio data to obtain the processed second audio data, and a specific implementation process may be as follows:
sending a calling instruction to an instruction receiving unit of the instruction execution interface to operate a pitch detection algorithm unit of the instruction execution interface to detect the audio data of the first target waveband, so as to obtain pitch data of the first target waveband;
and receiving pitch data output by an execution result output unit of the instruction execution interface.
In an optional implementation manner of this example, the preset interactive operation may be a pitch adjustment operation. A pitch adjustment operation for a second target waveband of the waveform, carrying a target pitch, is received through the audio display interface of the browser. The instruction execution interface is then called to process the first audio data to obtain the processed second audio data, and a specific implementation process may be as follows:
sending a calling instruction to an instruction receiving unit of the instruction execution interface to operate a pitch adjustment algorithm unit of the instruction execution interface to adjust the current pitch of the audio data of the second target waveband to a target pitch so as to obtain audio data corresponding to the target pitch;
and receiving audio data corresponding to the target pitch output by the execution result output unit of the instruction execution interface.
In an optional implementation manner of this example, the preset interactive operation may be a beat detection operation. A beat detection operation for the waveform is received through the audio display interface of the browser. The instruction execution interface is then called to process the first audio data to obtain the processed second audio data, and a specific implementation process may be as follows:
sending a calling instruction to an instruction receiving unit of an instruction execution interface to operate a beat detection algorithm unit of the instruction execution interface to perform beat detection on the first audio data to obtain beat data of the first audio data;
and receiving beat data output by an execution result output unit of the instruction execution interface.
In an optional implementation manner of this example, the received interactive operation is a waveform scaling operation (not a preset interactive operation), received for the waveform through the audio display interface of the browser. In this case, a second display precision of the first audio data can be determined directly from the scaling operation and the first display precision, where the second display precision means that a third number of pixel points are displayed on the audio display interface and each pixel point represents a fourth number of audio sampling points in the first audio data. The waveform corresponding to the first audio data is then generated according to the third number of pixel points displayed in the audio display interface and the fourth number of audio sampling points represented by each pixel point.
In an optional implementation manner of this example, the received interactive operation is a waveform clipping operation (not a preset interactive operation), received for a third target waveband of the waveform through the audio display interface of the browser. In this case, third audio data corresponding to the third target waveband can be determined directly from the first audio data. A third display precision of the third audio data is then determined according to the width of the audio display interface, where the third display precision means that a fifth number of pixel points are displayed on the audio display interface and each pixel point represents a sixth number of audio sampling points in the third audio data. The waveform corresponding to the third audio data is generated according to the fifth number of pixel points displayed in the audio display interface and the sixth number of audio sampling points represented by each pixel point.
Step 706: display the waveform corresponding to the second audio data on the audio display interface of the browser.
In an optional implementation of this embodiment, when the preset interactive operation is a beat detection operation, the obtained second audio data is the beat data of the first audio data, so beat lines corresponding to the beat data may also be displayed on the audio display interface of the browser.
According to the audio processing method described above, first audio data of audio to be processed can be acquired through a data interface of a browser, and the waveform corresponding to the first audio data is displayed on an audio display interface of the browser; a preset interactive operation for the waveform is then received through the audio display interface, an instruction execution interface is called to process the first audio data to obtain processed second audio data, and the waveform corresponding to the second audio data is displayed on the audio display interface. In this way, when a complex operation needs to be performed on the audio data, the encapsulated instruction execution interface is called directly to run the execution algorithm unit encapsulated in it, and the complex processing of the audio data is completed within the browser.
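The four units of the instruction execution interface described above can be sketched as a single object; this is a minimal in-process illustration under assumed names (the real execution algorithm units would typically be compiled routines, e.g. WebAssembly, which this registry merely stands in for):

```javascript
// Hypothetical sketch of the instruction execution interface: the
// instruction receiving, instruction processing, execution algorithm
// and execution result output units modeled as one object. The
// `algorithms` registry stands in for the encapsulated algorithm units.
function makeInstructionExecutionInterface(algorithms) {
  let lastResult = null;
  return {
    // Instruction receiving unit: accept a call instruction.
    receive(instruction) {
      // Instruction processing unit: select the target execution
      // algorithm unit according to the operation in the instruction.
      const algorithm = algorithms[instruction.op];
      if (!algorithm) throw new Error(`unknown operation: ${instruction.op}`);
      // Execution algorithm unit: run it on the audio payload.
      lastResult = algorithm(instruction.audioData, instruction.params);
    },
    // Execution result output unit: hand the result back to the caller.
    output() {
      return lastResult;
    },
  };
}
```

A caller would register, say, a gain algorithm, send a call instruction carrying the first audio data, and read the second audio data from the output unit.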
The foregoing is an illustrative scheme of the audio processing method of this embodiment. It should be noted that the technical solution of the audio processing method and the technical solution of the audio processing apparatus belong to the same concept; for details not described in the technical solution of the audio processing method, reference may be made to the description of the technical solution of the audio processing apparatus.
Fig. 8 illustrates a block diagram of a computing device 800 provided according to an embodiment of the present application. The components of the computing device 800 include, but are not limited to, a memory 810 and a processor 820. The processor 820 is coupled to the memory 810 via a bus 830, and a database 850 is used to store data.
Computing device 800 also includes an access device 840 that enables the computing device 800 to communicate via one or more networks 860. Examples of such networks include the Public Switched Telephone Network (PSTN), a Local Area Network (LAN), a Wide Area Network (WAN), a Personal Area Network (PAN), or a combination of communication networks such as the Internet. The access device 840 may include one or more of any type of network interface, wired or wireless (e.g., a Network Interface Card (NIC)), such as an IEEE 802.11 Wireless Local Area Network (WLAN) interface, a Worldwide Interoperability for Microwave Access (WiMAX) interface, an Ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a Bluetooth interface, a Near Field Communication (NFC) interface, and so on.
In one embodiment of the application, the above-described components of the computing device 800 and other components not shown in fig. 8 may also be connected to each other, for example, by a bus. It should be understood that the block diagram of the computing device architecture shown in FIG. 8 is for purposes of example only and is not limiting as to the scope of the present application. Those skilled in the art may add or replace other components as desired.
Computing device 800 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), a mobile phone (e.g., smartphone), a wearable computing device (e.g., smartwatch, smartglasses, etc.), or other type of mobile device, or a stationary computing device such as a desktop computer or PC. Computing device 800 may also be a mobile or stationary server.
Wherein the processor 820 is configured to execute the following computer-executable instructions to implement the following method:
acquiring first audio data of audio to be processed through a data interface of a browser, and displaying a waveform corresponding to the first audio data on an audio display interface of the browser;
receiving a preset interactive operation for the waveform through the audio display interface of the browser, and calling an instruction execution interface to process the first audio data to obtain processed second audio data, wherein the instruction execution interface is an interface in the browser that encapsulates an instruction receiving unit, an instruction processing unit, an execution algorithm unit and an execution result output unit, the execution algorithm unit comprises an algorithm corresponding to the preset interactive operation, and an interactive instruction triggered by the preset interactive operation is executed through the instruction execution interface;
and displaying the waveform corresponding to the second audio data on an audio display interface of the browser.
The above is an illustrative scheme of a computing device of the present embodiment. It should be noted that the technical solution of the computing device and the technical solution of the audio processing method belong to the same concept, and details that are not described in detail in the technical solution of the computing device can be referred to the description of the technical solution of the audio processing method.
An embodiment of the present application further provides a computer-readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the audio processing method described above.
The above is an illustrative scheme of a computer-readable storage medium of the present embodiment. It should be noted that the technical solution of the storage medium belongs to the same concept as the technical solution of the audio processing method, and details that are not described in detail in the technical solution of the storage medium can be referred to the description of the technical solution of the audio processing method.
The foregoing description of specific embodiments of the present application has been presented. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The computer instructions comprise computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content of the computer-readable medium may be increased or decreased as required by legislation and patent practice in a jurisdiction; for example, in some jurisdictions, computer-readable media exclude electrical carrier signals and telecommunications signals.
It should be noted that, for simplicity of description, the above method embodiments are described as a series of combined acts, but those skilled in the art should understand that the present application is not limited by the described order of acts, as some steps may be performed in other orders or simultaneously. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments, and the acts and modules involved are not necessarily required by this application.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to the related descriptions of other embodiments.
The preferred embodiments of the present application disclosed above are intended only to aid in the explanation of the application. Alternative embodiments are not exhaustive and do not limit the invention to the precise embodiments described. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the application and its practical applications, to thereby enable others skilled in the art to best understand and utilize the application. The application is limited only by the claims and their full scope and equivalents.

Claims (13)

1. An audio processing method applied to a browser, the method comprising:
acquiring first audio data of audio to be processed through a data interface of a browser, and displaying a waveform corresponding to the first audio data on an audio display interface of the browser;
receiving preset interactive operation aiming at the waveform through an audio display interface of the browser, calling an instruction execution interface to process the first audio data to obtain processed second audio data, wherein the instruction execution interface is an interface which is packaged with an instruction receiving unit, an instruction processing unit, an execution algorithm unit and an execution result output unit in the browser, the execution algorithm unit comprises an algorithm corresponding to the preset interactive operation, and an interactive instruction triggered by the preset interactive operation is executed through the instruction execution interface;
and displaying the waveform corresponding to the second audio data on an audio display interface of the browser.
2. The audio processing method according to claim 1, wherein after the first audio data of the audio to be processed is obtained through the data interface of the browser, the method further comprises:
decoding the first audio data through a decoding interface of the browser, and determining at least one channel data of the first audio data;
correspondingly, the displaying the waveform corresponding to the first audio data on the audio display interface of the browser includes:
and displaying the waveform corresponding to the channel data on an audio display interface of the browser aiming at each channel data in the at least one channel data.
3. The audio processing method according to claim 1 or 2, wherein the displaying the waveform corresponding to the first audio data on the audio display interface of the browser comprises:
determining first display precision of the first audio data according to the width of the audio display interface, wherein the first display precision refers to displaying a first number of pixel points on the audio display interface, and each pixel point identifies a second number of audio sampling points in the first audio data;
and generating the waveform corresponding to the first audio data according to the first number of pixel points displayed in the audio display interface and the second number of audio sampling points identified by each pixel point.
4. The audio processing method according to claim 3, wherein the generating the waveform corresponding to the first audio data according to the first number of pixel points displayed in the audio display interface and the second number of audio sampling points identified by each pixel point comprises:
for each pixel point in the audio display interface, determining the coordinates of the pixel point according to the second number of audio sampling points identified by the pixel point, and displaying the pixel point at the coordinates;
and connecting the first number of pixel points displayed in the audio display interface to form the waveform corresponding to the first audio data.
5. The audio processing method according to claim 4, wherein the determining the coordinates of the pixel point according to the second number of audio sampling points identified by the pixel point comprises:
determining a first audio sampling point with the highest frequency and a second audio sampling point with the lowest frequency among the second number of audio sampling points identified by the pixel point;
determining a frequency mean of the first audio sampling point and the second audio sampling point;
taking the frequency mean as the ordinate of the pixel point;
and determining the abscissa of the pixel point according to the position of the pixel point among the first number of pixel points displayed on the audio display interface.
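Outside the claim language, the coordinate rule of claims 3 to 5 can be sketched as follows. All names are illustrative assumptions, and plain sample values stand in for the per-sampling-point "frequency" values the claims refer to:

```javascript
// Hypothetical sketch of claims 3-5: each pixel point identifies a
// fixed block of audio sampling points; its ordinate is the mean of
// the largest and smallest values in that block, and its abscissa is
// the pixel point's position in the display interface.
function pixelCoordinates(samples, samplesPerPixel) {
  const points = [];
  for (let x = 0; x * samplesPerPixel < samples.length; x++) {
    const block = samples.slice(x * samplesPerPixel, (x + 1) * samplesPerPixel);
    // Mean of the extreme values in the block gives the ordinate.
    const y = (Math.max(...block) + Math.min(...block)) / 2;
    points.push({ x, y });
  }
  return points;
}
```

Connecting the returned points in order yields the displayed waveform of claim 4.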
6. The audio processing method according to claim 3, wherein the generating the waveform corresponding to the first audio data according to the first number of pixel points displayed in the audio display interface and the second number of audio sampling points identified by each pixel point comprises:
generating, for each pixel point in the audio display interface, a waveform corresponding to the pixel point according to the second number of audio sampling points identified by the pixel point;
and combining the waveforms corresponding to the first number of pixel points displayed in the audio display interface into the waveform corresponding to the first audio data.
7. The audio processing method according to claim 6, wherein the generating, for each pixel point in the audio display interface, the waveform corresponding to the pixel point according to the second number of audio sampling points identified by the pixel point comprises:
determining the abscissa of the pixel point according to the position of the pixel point among the first number of pixel points displayed on the audio display interface;
determining a first audio sampling point with the highest frequency and a second audio sampling point with the lowest frequency among the second number of audio sampling points identified by the pixel point;
marking a first frequency of the first audio sampling point and a second frequency of the second audio sampling point at the position corresponding to the abscissa in the audio display interface;
and connecting the first frequency and the second frequency to generate the waveform corresponding to the pixel point.
8. The audio processing method according to claim 1 or 2, wherein the receiving, through the audio display interface of the browser, the preset interactive operation for the waveform comprises:
receiving a selection operation for a first target waveband of the waveform through the audio display interface of the browser, and receiving a pitch detection operation for the first target waveband;
correspondingly, the calling an instruction execution interface to process the first audio data to obtain the processed second audio data comprises:
sending a calling instruction to the instruction receiving unit of the instruction execution interface to operate a pitch detection algorithm unit of the instruction execution interface to detect the audio data of the first target waveband, so as to obtain pitch data of the first target waveband;
receiving the pitch data output by the execution result output unit of the instruction execution interface.
9. The audio processing method according to claim 1 or 2, wherein the receiving, through the audio display interface of the browser, the preset interactive operation for the waveform comprises:
receiving a pitch adjustment operation aiming at a second target waveband of the waveform through an audio display interface of the browser, wherein the pitch adjustment operation carries a target pitch;
correspondingly, the calling an instruction execution interface to process the first audio data to obtain the processed second audio data comprises:
sending a calling instruction to the instruction receiving unit of the instruction execution interface to operate a pitch adjustment algorithm unit of the instruction execution interface to adjust the current pitch of the audio data of the second target waveband to the target pitch, so as to obtain audio data corresponding to the target pitch;
and receiving audio data corresponding to the target pitch output by the execution result output unit of the instruction execution interface.
10. The audio processing method according to claim 1 or 2, wherein the receiving, through the audio display interface of the browser, the preset interactive operation for the waveform comprises:
receiving beat detection operation aiming at the waveform through an audio display interface of the browser;
correspondingly, the calling an instruction execution interface to process the first audio data to obtain the processed second audio data comprises:
sending a calling instruction to the instruction receiving unit of the instruction execution interface to operate a beat detection algorithm unit of the instruction execution interface to perform beat detection on the first audio data to obtain beat data of the first audio data;
and receiving the beat data output by the execution result output unit of the instruction execution interface.
11. An audio processing apparatus, configured in a browser and comprising an instruction execution interface, wherein the instruction execution interface encapsulates an instruction receiving unit, an instruction processing unit, an execution algorithm unit and an execution result output unit;
the instruction receiving unit is configured to receive a call instruction, where the call instruction carries a preset interactive operation, received through an audio display interface of the browser, for a waveform of first audio data;
the instruction processing unit is configured to determine a target execution algorithm unit corresponding to the call instruction according to the preset interactive operation, and operate the target execution algorithm unit to obtain an execution result of the call instruction;
the execution result output unit is configured to output the execution result.
12. A computing device, comprising:
a memory and a processor;
the memory is configured to store computer-executable instructions, and the processor is configured to execute the computer-executable instructions to implement the method of:
acquiring first audio data of audio to be processed through a data interface of a browser, and displaying a waveform corresponding to the first audio data on an audio display interface of the browser;
receiving a preset interactive operation for the waveform through the audio display interface of the browser, and calling an instruction execution interface to process the first audio data to obtain processed second audio data, wherein the instruction execution interface is an interface in the browser that encapsulates an instruction receiving unit, an instruction processing unit, an execution algorithm unit and an execution result output unit, the execution algorithm unit comprises an algorithm corresponding to the preset interactive operation, and an interactive instruction triggered by the preset interactive operation is executed through the instruction execution interface;
and displaying the waveform corresponding to the second audio data on an audio display interface of the browser.
13. A computer readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the audio processing method of any of claims 1-10.
CN202011255627.3A 2020-11-11 2020-11-11 Audio processing method and device Pending CN114546324A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011255627.3A CN114546324A (en) 2020-11-11 2020-11-11 Audio processing method and device


Publications (1)

Publication Number Publication Date
CN114546324A true CN114546324A (en) 2022-05-27

Family

ID=81659715

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011255627.3A Pending CN114546324A (en) 2020-11-11 2020-11-11 Audio processing method and device

Country Status (1)

Country Link
CN (1) CN114546324A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103903625A * 2014-04-17 2014-07-02 Baidu Online Network Technology (Beijing) Co., Ltd. Audio sound mixing method and device
CN110246204A * 2019-05-28 2019-09-17 Shenzhen H&T Home Online Network Technology Co., Ltd. Waveform drawing method, apparatus, computer device and readable storage medium


Similar Documents

Publication Publication Date Title
US11158102B2 (en) Method and apparatus for processing information
US11587593B2 (en) Method and apparatus for displaying music points, and electronic device and medium
CN109120983B (en) Audio processing method and device
EP3092642B1 (en) Spatial error metrics of audio content
US11514923B2 (en) Method and device for processing music file, terminal and storage medium
CN111402843B (en) Rap music generation method and device, readable medium and electronic equipment
WO2002078328A1 (en) Multi-channel information processor
CN110534085B (en) Method and apparatus for generating information
US10409547B2 (en) Apparatus for recording audio information and method for controlling same
CN110070896B (en) Image processing method, device and hardware device
WO2021147157A1 (en) Game special effect generation method and apparatus, and storage medium and electronic device
CN113192152A (en) Audio-based image generation method, electronic device and storage medium
CN111833460A (en) Augmented reality image processing method and device, electronic equipment and storage medium
CN110718239A (en) Audio processing method and device, electronic equipment and storage medium
CN112995736A (en) Speech subtitle synthesis method, apparatus, computer device, and storage medium
JP2005241997A (en) Device, method, and program for speech analysis
CN111462727A (en) Method, apparatus, electronic device and computer readable medium for generating speech
CN113709578B (en) Bullet screen display method, bullet screen display device, bullet screen display equipment and bullet screen display medium
CN114546324A (en) Audio processing method and device
WO2024078293A1 (en) Image processing method and apparatus, electronic device, and storage medium
CN108962226A (en) Method and apparatus for detecting the endpoint of voice
CN111862933A (en) Method, apparatus, device and medium for generating synthesized speech
US20160277864A1 (en) Waveform Display Control of Visual Characteristics
CN112433697B (en) Resource display method and device, electronic equipment and storage medium
JP2018198043A (en) Method and input system for inputting characters and words

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination