CN113113046A - Audio processing performance detection method and device, storage medium and electronic equipment

Info

Publication number
CN113113046A (application CN202110399439.6A; granted as CN113113046B)
Authority
CN
China
Prior art keywords
processing, audio, component, audio data, processing component
Prior art date
Legal status
Granted
Application number
CN202110399439.6A
Other languages
Chinese (zh)
Other versions
CN113113046B (en)
Inventor
骆耀东
阮良
陈功
陈丽
Current Assignee
Hangzhou Netease Zhiqi Technology Co Ltd
Original Assignee
Hangzhou Langhe Technology Co Ltd
Application filed by Hangzhou Langhe Technology Co Ltd
Priority to CN202110399439.6A
Publication of CN113113046A
Application granted
Publication of CN113113046B
Status: Active


Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00 - G10L 21/00
    • G10L 25/48 - Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00 - G10L 21/00, specially adapted for particular use
    • G10L 25/51 - Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00 - G10L 21/00, specially adapted for particular use, for comparison or discrimination

Abstract

The embodiments of the invention relate to a method and apparatus for detecting the performance of audio processing, a computer-readable storage medium, and an electronic device, and belong to the field of audio technology. The performance detection method for audio processing comprises the following steps: acquiring original audio data; processing the original audio data through each processing component in an audio engine, and acquiring the audio data generated after processing by each processing component; comparing the audio data generated after processing by each processing component with the audio data before processing by that component to obtain a comparison result; and determining the information to be optimized in each processing component according to the comparison result. The invention can detect the performance of each processing component in the audio engine and improve the accuracy of fault localization within the audio engine.

Description

Audio processing performance detection method and device, storage medium and electronic equipment
Technical Field
The embodiments of the present invention relate to the field of audio technologies, and in particular, to a method and an apparatus for detecting performance of audio processing, a computer-readable storage medium, and an electronic device.
Background
This section is intended to provide a background or context for the embodiments of the invention recited in the claims; the description herein is not admitted to be prior art by inclusion in this section.
With the development of internet technology, Real-Time Communication (RTC) technology has been widely adopted for its strong real-time performance and low delay. Its most typical applications are real-time audio-video communication, live streaming, and the like. For example, in video conferencing, online learning, and similar scenarios, a user may engage in voice calls, audio chat, and so on with an interactive counterpart in real time. Audio is an important factor in audio-video communication, and its quality directly affects the user's communication experience. In order to reduce the influence of the recording device, the environment, and other factors on the audio data and to improve audio quality, an audio engine is often required to process the audio data.
Disclosure of Invention
In this context, embodiments of the present invention provide a method for detecting the performance of audio processing, an apparatus for detecting the performance of audio processing, a computer-readable storage medium, and an electronic device.
According to a first aspect of embodiments of the present invention, there is provided a method for detecting the performance of audio processing, the method including: acquiring original audio data; processing the original audio data through each processing component in an audio engine, and acquiring the audio data generated after processing by each processing component; comparing the audio data generated after processing by each processing component with the audio data before processing by that component to obtain a comparison result; and determining the information to be optimized in each processing component according to the comparison result.
In an optional implementation, processing the original audio data through the processing components in the audio engine and acquiring the audio data generated after processing by each processing component includes: acquiring the component information of each processing component in the audio engine, the component information of each processing component being recorded while the original audio data is processed by that component; setting the processing parameters of the corresponding processing components in the audio engine according to the component information of each processing component; and processing the original audio data through each processing component in the audio engine to obtain the audio data generated after processing by each component.
In an optional implementation, the original audio data includes one or more audio frames arranged in sequence, and setting the processing parameters of the corresponding processing components in the audio engine according to the component information of each processing component includes: determining the processing parameters of the processing components that process each audio frame according to the component information of each processing component; and setting the processing parameters of the processing components that process the corresponding audio frames according to the processing parameters determined for each audio frame.
In an optional implementation, setting the processing parameters of the corresponding processing components in the audio engine according to the component information of each processing component further includes: when a first audio frame is processed, determining target component information matched with the first audio frame according to the component information, and setting the processing parameters of each processing component according to the target component information; when a second audio frame is processed, determining according to the component information whether the component information of each processing component corresponding to the second audio frame has changed, the second audio frame being any audio frame after the first audio frame; and updating the component information of any one or more processing components upon determining that the component information of those components corresponding to the second audio frame has changed.
In an optional implementation, the component information includes an on state and a processing level of each processing component, and updating the component information of any one or more processing components includes: determining, according to the on state of each processing component, whether each processing component corresponding to the second audio frame is in the on state; when each processing component corresponding to the second audio frame is determined to be in the on state, determining whether the processing level of each processing component has changed; and updating the processing level of any one or more processing components upon determining that their processing level has changed.
In an optional implementation, processing the audio frames through each processing component in the audio engine further includes: determining the audio type of each audio frame, the audio types including near-end audio frames and far-end audio frames; and processing the near-end audio frames and the far-end audio frames through each processing component to obtain the near-end audio data generated after processing by each component.
In an optional implementation, comparing the audio data generated after processing by each processing component with the audio data before processing by that component to obtain the comparison result includes: comparing the near-end audio data generated after processing by each processing component with the near-end audio data before processing by the corresponding component to obtain the comparison result.
In an optional implementation, comparing the audio data generated after processing by each processing component with the audio data before processing by that component to obtain the comparison result includes: extracting the audio features of the audio data generated after processing by the processing component to obtain a first feature parameter; extracting the audio features of the audio data before processing by the processing component to obtain a second feature parameter; and determining the processing quality of the audio data processed by the processing component according to the first feature parameter and the second feature parameter to obtain the comparison result. Determining the information to be optimized in each processing component according to the comparison result includes: when the processing quality of the audio data processed by a processing component does not reach a preset standard, determining that component as a component to be optimized; the audio features of the audio data include any one or more of echo features, noise features, and gain features of the audio data.
In an alternative embodiment, the processing components include at least two of a noise processing component, an echo suppression component, and a gain processing component.
According to a second aspect of the embodiments of the present invention, there is provided an apparatus for detecting the performance of audio processing, the apparatus including: a first acquisition module, configured to acquire original audio data; a second acquisition module, configured to process the original audio data through each processing component in an audio engine and acquire the audio data generated after processing by each processing component; a comparison module, configured to compare the audio data generated after processing by each processing component with the audio data before processing by that component to obtain a comparison result; and a determination module, configured to determine the information to be optimized in each processing component according to the comparison result.
In an optional implementation, when the original audio data is processed through each processing component in the audio engine and the audio data generated after processing by each processing component is acquired, the second acquisition module is configured to: acquire the component information of each processing component in the audio engine, the component information of each processing component being recorded while the original audio data is processed by that component; set the processing parameters of the corresponding processing components in the audio engine according to the component information of each processing component; and process the original audio data through each processing component in the audio engine to obtain the audio data generated after processing by each component.
In an optional implementation, the original audio data includes one or more audio frames arranged in sequence, and the second acquisition module is configured to: determine the processing parameters of the processing components that process each audio frame according to the component information of each processing component; and set the processing parameters of the processing components that process the corresponding audio frames according to the processing parameters determined for each audio frame.
In an optional implementation, the second acquisition module is configured to: when a first audio frame is processed, determine target component information matched with the first audio frame according to the component information, and set the processing parameters of each processing component according to the target component information; when a second audio frame is processed, determine according to the component information whether the component information of each processing component corresponding to the second audio frame has changed, the second audio frame being any audio frame after the first audio frame; and update the component information of any one or more processing components upon determining that the component information of those components corresponding to the second audio frame has changed.
In an optional implementation, the component information includes an on state and a processing level of each processing component, and when updating the component information of any one or more processing components, the second acquisition module is further configured to: determine, according to the on state of each processing component, whether each processing component corresponding to the second audio frame is in the on state; when each processing component corresponding to the second audio frame is determined to be in the on state, determine whether the processing level of each processing component has changed; and update the processing level of any one or more processing components upon determining that their processing level has changed.
In an optional implementation, when the audio frames are processed through each processing component in the audio engine, the second acquisition module is configured to: determine the audio type of each audio frame, the audio types including near-end audio frames and far-end audio frames; and process the near-end audio frames and the far-end audio frames through each processing component to obtain the near-end audio data generated after processing by each component.
In an optional implementation, the comparison module is configured to: compare the near-end audio data generated after processing by each processing component with the near-end audio data before processing by the corresponding component to obtain the comparison result.
In an optional implementation, the comparison module is configured to: extract the audio features of the audio data generated after processing by the processing component to obtain a first feature parameter; extract the audio features of the audio data before processing by the processing component to obtain a second feature parameter; and determine the processing quality of the audio data processed by the processing component according to the first feature parameter and the second feature parameter to obtain the comparison result. The determination module is configured to: when the processing quality of the audio data processed by a processing component does not reach a preset standard, determine that component as a component to be optimized; the audio features of the audio data include any one or more of echo features, noise features, and gain features of the audio data.
In an alternative embodiment, the processing components include at least two of a noise processing component, an echo suppression component, and a gain processing component.
According to a third aspect of embodiments of the present invention, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements any one of the above-described performance detection methods of audio processing.
According to a fourth aspect of the embodiments of the present invention, there is provided an electronic apparatus including: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to perform any of the above-described audio processing performance detection methods via execution of the executable instructions.
According to the audio processing performance detection method and apparatus, computer-readable storage medium, and electronic device described above, original audio data can be processed through each processing component in the audio engine, the audio data generated after processing by each processing component acquired, that audio data compared with the audio data before processing by the component, and the information to be optimized in each processing component determined according to the comparison result. By comparing the audio data generated after processing by each processing component with the audio data before processing, the processing performance of each processing component in the audio engine can be determined, the components with poor performance and the information to be optimized for each component can be located, and technical support is provided for optimizing the performance of the audio engine.
Drawings
The above and other objects, features and advantages of exemplary embodiments of the present invention will become readily apparent from the following detailed description read in conjunction with the accompanying drawings. Several embodiments of the invention are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which:
FIG. 1 shows a schematic diagram of a system architecture according to an embodiment of the invention;
FIG. 2 is a flow chart illustrating a method for detecting performance of audio processing according to an embodiment of the present invention;
FIG. 3 illustrates a schematic diagram of the processing of raw audio data according to an embodiment of the present invention;
FIG. 4 is a flowchart illustrating comparison of audio data according to an embodiment of the present invention;
FIG. 5 illustrates a process flow diagram of raw audio data according to an embodiment of the present invention;
FIG. 6 illustrates a flow diagram for obtaining component information according to an embodiment of the present invention;
FIG. 7 illustrates a flow diagram of a setup processing component according to an embodiment of the invention;
FIG. 8 illustrates a flow diagram of another setup processing component according to an embodiment of the invention;
FIG. 9 illustrates a flow diagram for updating process component parameters according to an embodiment of the present invention;
FIG. 10 shows a flow chart of a process for audio frames according to an embodiment of the invention;
fig. 11 is a block diagram illustrating a performance detecting apparatus for audio processing according to an embodiment of the present invention; and
fig. 12 is a block diagram illustrating an electronic device according to an embodiment of the present invention.
In the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.
Detailed Description
The principles and spirit of the present invention will be described with reference to a number of exemplary embodiments. It is understood that these embodiments are given solely for the purpose of enabling those skilled in the art to better understand and to practice the invention, and are not intended to limit the scope of the invention in any way. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
As will be appreciated by one skilled in the art, embodiments of the present invention may be embodied as a system, apparatus, device, method, or computer program product. Thus, the present invention may be embodied in the form of: entirely hardware, entirely software (including firmware, resident software, micro-code, etc.), or a combination of hardware and software.
According to an embodiment of the present invention, a method for detecting performance of audio processing, an apparatus for detecting performance of audio processing, a computer-readable storage medium, and an electronic device are provided.
In this document, any number of elements in the drawings is by way of example and not by way of limitation, and any nomenclature is used solely for differentiation and not by way of limitation.
The principles and spirit of the present invention are explained in detail below with reference to several representative embodiments of the invention.
Summary of the Invention
The inventors have found that, owing to the complexity of audio data, the input audio data must pass through multiple processing stages in an audio engine; once any stage has a problem, the output audio signal has poor sound quality and low intelligibility, or no audio signal can be output at all.
In view of the above, the basic idea of the present invention is as follows: the original audio data is processed through each processing component in an audio engine, the audio data generated after processing by each processing component is acquired, the audio data generated after processing by each component is compared with the audio data before processing by that component, and the information to be optimized in each processing component is determined according to the comparison result. By comparing the audio data generated after processing by each processing component with the audio data before processing, the processing performance of each processing component in the audio engine can be determined, the components with poor performance and the information to be optimized for each component can be located, and technical support is provided for optimizing the performance of the audio engine.
Having described the general principles of the invention, various non-limiting embodiments of the invention are described in detail below.
Application scenario overview
It should be noted that the following application scenarios are merely illustrated to facilitate understanding of the spirit and principles of the present invention, and the embodiments of the present invention are not limited in this respect. Rather, embodiments of the present invention may be applied to any scenario where applicable.
When audio data is processed by the audio engine, the processing performance of the audio engine can be monitored. When the performance of the audio engine is poor, or the sound quality of the audio signal it outputs is poor, the processing performance of each processing component in the audio engine can be detected and the problematic components and their processing information located, so that an operator can optimize the performance of the audio engine according to the detection result.
Exemplary method
Audio is an important factor in audio-video communication, and its quality directly affects the user's communication experience; therefore, in order to reduce the influence of factors such as the recording device and the environment on the audio data and to improve audio quality, an audio engine is often required to process the audio data. However, owing to the complexity of audio data, the audio engine must perform multiple processing steps on the input audio data, and once a problem occurs in any step, the output audio signal will have poor sound quality and low intelligibility, or no audio signal may be output at all. Therefore, a performance detection method for audio processing is needed to detect the processing performance of the audio engine and effectively locate faults within it.
In view of the above problems, exemplary embodiments of the present invention first provide a performance detection method for audio processing. Fig. 1 shows a diagram of the system architecture of an environment in which the method operates. As shown in fig. 1, the system architecture 100 may include a client 110 and a server 120. The client 110 may be a personal computer, smartphone, tablet computer, smart in-car device, smart wearable device, game console, or the like. The client 110 may have installed on it an application capable of audio communication, such as an instant messaging application, a live-streaming application, an audio-video conferencing application, or an intelligent voice assistant. Through an application installed on the client 110, a user can communicate by audio with other users, and can also input voice to control the client 110, or to control other remote devices, such as smart home devices, through the client 110 to perform corresponding operations. The server 120 represents a backend service system that provides audio processing performance detection. The client 110 and the server 120 can exchange information through a network; for example, when a user operates the client 110 to send a voice message to another user, the client 110 and the server 120 cooperate to complete the operation.
It should be noted that, in the present exemplary embodiment, the number of each device in fig. 1 is not limited, for example, any number of clients 110 may be provided according to implementation needs, and the server 120 may be a cluster formed by a plurality of servers.
The performance detection method of audio processing provided by the present exemplary embodiment may be generally performed by the server 120. For example, a user may enter audio data through the client 110, and the server 120 may process the audio data using an audio engine disposed in the server 120 and determine the performance of the audio engine that processes the audio data. It is easily understood by those skilled in the art that the performance detection method of audio processing provided by the present exemplary embodiment may also be performed by the client 110. For example, when a user makes an audio call with another user through an application installed on the client 110, the client 110 may process an input audio signal through an audio engine provided in the client 110 and determine the processing performance of the audio engine.
Fig. 2 shows an exemplary flow of a performance detection method of audio processing performed by the client 110 and/or the server 120, which may include:
in step S210, original audio data is acquired.
The original audio data may be any audio data that needs to be processed, and may include audio data input by a user during real-time communication or instant messaging, audio data downloaded or intercepted by the user from the internet, and the like.
Step S220, processing the original audio data by each processing component in the audio engine, and acquiring the audio data generated after processing by each processing component.
An audio engine may be a computer program that performs audio processing, and each audio engine may be made up of multiple processing components, each of which performs a specific audio processing function, such as audio conversion, noise reduction, or echo suppression.
Step S230, comparing the audio data generated after the processing of each processing component with the audio data before the processing of the processing component, to obtain a comparison result.
Specifically, when comparing audio data before and after processing by each processing component, audio features such as time domain, frequency spectrum, and sound quality before and after processing by each processing component may be compared.
Step S240, determining the information to be optimized in each processing component according to the comparison result. For example, the components to be optimized among the processing components and the information to be optimized within those components may be determined according to the comparison result.
According to the performance detection method for audio processing in this exemplary embodiment, the processing performance of each processing component in the audio engine can be determined by comparing the audio data generated after processing by each processing component with the audio data before processing, the components with poor performance and the information to be optimized for each component can be located, and technical support is provided for tuning the performance of the audio engine.
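Purely as an illustration of steps S210 to S240, the following Python sketch runs raw audio through a chain of components and compares the signal before and after each stage. All names here (ProcessingComponent, compare, the energy-based check) are hypothetical stand-ins for illustration, not the interface of any actual audio engine:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class ProcessingComponent:
    name: str
    process: Callable[[List[float]], List[float]]  # audio in -> audio out

def compare(before: List[float], after: List[float]) -> bool:
    # Placeholder for the feature comparison of steps S410-S430 below;
    # a real check would compare echo, noise, and gain features instead.
    energy = lambda x: sum(v * v for v in x)
    return energy(after) <= energy(before)  # e.g. a denoiser should not add energy

def detect_performance(raw_audio: List[float],
                       components: List[ProcessingComponent]) -> List[str]:
    """Steps S220-S240: run the audio through each stage, compare the
    audio before and after it, and collect components to be optimized."""
    to_optimize = []
    current = raw_audio
    for comp in components:
        processed = comp.process(current)      # S220: per-component output
        if not compare(current, processed):    # S230: before vs. after
            to_optimize.append(comp.name)      # S240: flag for optimization
        current = processed                    # feed the next stage
    return to_optimize

# Usage: a trivial stand-in component that halves the signal passes the check.
engine = [ProcessingComponent("gain", lambda xs: [v * 0.5 for v in xs])]
print(detect_performance([0.2, -0.4, 0.3], engine))  # -> []
```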
Each step in fig. 2 is described in detail below.
In step S210, original audio data is acquired.
The original audio data may be audio data input in real-time communication, for example, audio and video communication, such as voice data input by a user during voice chat, video conference, live webcast, and the like. In addition, the original audio data may also be audio data that is entered by a user through a recording device on a client during an instant messaging process, such as a voice message sent by the user through social software, or may also be other audio data that needs to be processed, for example, audio data that is downloaded or intercepted by the user from the internet, and the like, which is not limited in this exemplary embodiment.
In step S220, the original audio data is processed by each processing component in the audio engine, and the audio data generated after processing by each processing component is acquired.
The audio engine may be a computer program that performs audio processing, or may be part of a computer program that performs audio processing, which is the main program of the audio processing software. In this exemplary embodiment, multiple processing components may be included in the audio engine, each of which may be used to perform a particular audio processing function. For example, the processing components may include noise processing components, echo suppression components, gain processing components, and so forth. The noise processing component may be configured to reduce or eliminate certain noise in the input audio signal, such as environmental noise, device noise, and the like; the echo suppression component can be used for eliminating echo in the input audio signal, wherein the echo can be an echo signal generated by playing through a client loudspeaker when audio is recorded through the client; the gain processing component may be configured to adjust the amplitude of the input audio signal to obtain a smooth audio signal output, for example, during an audio call, the gain processing component may convert the volume of the input audio signal to a volume range that a user can listen to normally.
In an alternative embodiment, the processing components in this exemplary embodiment may include at least two of the noise processing components, echo suppression components, and gain processing components described above. In addition, the processing component may further include an audio conversion component, an AI (Artificial Intelligence) noise reduction component, a sound effect addition component, a codec component, and the like.
In this exemplary embodiment, the raw audio data may be processed by each processing component in the audio engine, for example, the raw audio data may be input into the audio engine, the raw audio data may be sequentially processed by each processing component in the audio engine, and the audio data generated after processing by each processing component may be acquired.
Fig. 3 shows the processing flow for the raw audio data in this exemplary embodiment. As shown in the figure, the raw audio data includes three parts: the speaker's voice recorded by the microphone, noise from the user's environment recorded by the microphone, and the echo generated by the local loudspeaker, among other audio data. The audio engine may subtract the echo generated by the local speaker from the raw audio data through the echo suppression component 310, according to the audio sources present in the raw audio data; the user's voice and the environmental noise recorded by the microphone are processed sequentially by the echo suppression component 310, the noise processing component 320, and the gain processing component 330 to obtain the output speech of the speaker. After each processing component finishes processing the audio data, the processed audio data may be output. In this way, the audio quality of the speaker's voice can be improved, and the influence of noise, echo, and the like on it reduced.
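As a toy illustration of this signal flow (not the patent's implementation), the sketch below chains three deliberately simple stand-ins for the components of Fig. 3 and keeps every intermediate output; the coupling factor, noise gate, and peak normalization are all assumptions chosen for demonstration:

```python
import numpy as np

def suppress_echo(mic, far_end, coupling=0.3):
    # Assumes the recorded echo is roughly an attenuated copy of the
    # far-end signal; real echo paths require adaptive filtering.
    return mic - coupling * far_end

def reduce_noise(x, gate=0.02):
    return np.where(np.abs(x) < gate, 0.0, x)   # naive noise gate

def apply_gain(x, target_peak=0.8):
    peak = float(np.max(np.abs(x))) or 1.0
    return x * (target_peak / peak)             # normalize toward a target level

rng = np.random.default_rng(0)
far_end = 0.1 * rng.standard_normal(16000)      # stand-in far-end signal
speech = 0.5 * np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
mic = speech + 0.3 * far_end + 0.01 * rng.standard_normal(16000)

# Keep every intermediate output so each stage can be compared in isolation.
stages = {}
audio = suppress_echo(mic, far_end); stages["echo_suppression_310"] = audio
audio = reduce_noise(audio);         stages["noise_processing_320"] = audio
audio = apply_gain(audio);           stages["gain_processing_330"] = audio
```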
In step S230, the audio data generated after processing by each processing component is compared with the audio data before processing by the processing component, so as to obtain a comparison result.
In the exemplary embodiment, the audio data before and after the processing of each processing component may be compared to determine whether the audio data before and after the processing of each processing component satisfies the processing requirement, so as to determine the processing performance of each processing component. Specifically, the processing performance of each processing component may be determined by comparing audio time domain and spectrum information of audio data before and after processing by each processing component, or comparing sound quality, volume, presence or absence of echo, and the like of audio signals before and after processing by each processing component.
In an alternative embodiment, referring to fig. 4, step S230 may be implemented by the following steps S410 to S430:
in step S410, the audio feature of the audio data generated after the processing by the processing component is extracted to obtain a first feature parameter.
In step S420, the audio feature of the audio data before being processed by the processing component is extracted to obtain a second feature parameter.
In the present exemplary embodiment, the audio features of the audio data may include any one or more of echo features, noise features, and gain features of the audio data; accordingly, the first feature parameter may include any one or more of an echo parameter for the echo feature, a noise parameter for the noise feature, and a gain parameter for the gain feature. Specifically, the echo parameters may include the echo duration and the echo return loss; the noise parameter may be the signal-to-noise ratio for the noise feature; and the gain parameter may include the volume envelope. The echo duration represents the length of time over which the echo is cancelled, and the echo return loss represents the echo cancellation capability: the larger the echo return loss, the more cleanly the echo is cancelled. The signal-to-noise ratio reflects the proportion of signal to noise in the audio signal. The volume envelope represents the magnitude of the volume of the audio signal: the larger the volume envelope, the greater the volume.
For the audio data generated after the processing of each processing component, the characteristic parameters of the audio data can be calculated by analyzing the time domain information, the frequency spectrum characteristics and the like of the audio data. For example, the audio data may be converted into a spectral feature, and the feature parameter of the audio data may be determined according to a relationship between a peak value, a valley value, and a preset value in the spectral feature.
In step S430, the processing quality of the audio data processed by the processing component is determined according to the first characteristic parameter and the second characteristic parameter, so as to obtain a comparison result.
The first characteristic parameter and the second characteristic parameter are compared, so that the processing quality of the audio data before and after processing of each processing component, such as echo cancellation quality, denoising quality, volume gain processing quality and the like, can be determined.
Through the steps S410 to S430, the audio features of the audio data before and after the processing of the processing components can be extracted to obtain the feature parameters, and the processing quality of the audio data processed by each processing component is determined, so that the effective detection of the processing performance of each processing component can be realized, and a basis is provided for improving the audio processing process.
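A minimal sketch of steps S410 to S430 follows, assuming crude stand-ins for two of the features named above (a percentile-based signal-to-noise estimate and a per-frame volume envelope); the feature definitions and frame size are illustrative, not the patent's formulas:

```python
import numpy as np

FRAME = 160  # 10 ms frames at 16 kHz (an assumed frame size)

def frame_levels(x):
    usable = x[: len(x) // FRAME * FRAME]
    return np.abs(usable.reshape(-1, FRAME))

def snr_db(x):
    # Crude SNR stand-in: treat the quietest 10% of frames as the noise floor.
    level = frame_levels(x).mean(axis=1)
    noise = max(float(np.percentile(level, 10)), 1e-6)
    return 20.0 * np.log10(float(level.mean()) / noise)

def volume_envelope(x):
    return frame_levels(x).max(axis=1)  # per-frame peak as a volume envelope

def compare_stage(before, after):
    first = {"snr": snr_db(after), "env": float(volume_envelope(after).mean())}    # S410
    second = {"snr": snr_db(before), "env": float(volume_envelope(before).mean())} # S420
    return {"snr_gain_db": first["snr"] - second["snr"],                           # S430
            "env_ratio": first["env"] / max(second["env"], 1e-9)}
```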
Step S240, determining information to be optimized in each processing module according to the comparison result.
The information to be optimized may include the components to be optimized in each processing component, and the information to be optimized of the components to be optimized, such as whether the filter needs to be replaced or the coefficients of the filter need to be adjusted.
In an alternative embodiment, when the processing quality of the audio data processed by the processing component does not reach the preset standard, the processing component may be determined as the component to be optimized. The preset standard may be set according to the processing component and actual requirements, for example, for the noise processing component, the preset standard may be that the noise of the processed audio data is smaller than a preset noise range; for the echo suppression component, the preset criterion may be an echo cancellation degree of the processed audio data, for example, the echo of the processed audio data is smaller than a preset echo range; for the gain processing component, the preset criterion may be whether the volume of the processed audio data reaches a preset threshold, or the like.
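Continuing the sketch, such preset standards might be expressed as per-component predicates over the comparison results; the threshold values below are placeholders chosen for illustration (erle_db standing in for an echo-return-loss measure), not values from the patent:

```python
# Per-component preset standards expressed as predicates over the
# comparison result of step S230; all thresholds are illustrative.
PRESET_STANDARDS = {
    "noise_processing": lambda r: r["snr_gain_db"] >= 5.0,        # noise reduced enough
    "echo_suppression": lambda r: r.get("erle_db", 0.0) >= 18.0,  # echo well cancelled
    "gain_processing":  lambda r: 0.5 <= r["env_ratio"] <= 1.5,   # volume within range
}

def components_to_optimize(results):
    """results: component name -> comparison result (step S230)."""
    return [name for name, result in results.items()
            if name in PRESET_STANDARDS and not PRESET_STANDARDS[name](result)]
```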
Through the above method, the information to be optimized for each processing component can be determined, developers can adjust each processing component in a targeted manner according to that information, and technical and data support is provided for improving the processing performance of the audio engine.
Through research, the applicant has found that in an ideal environment, such as an anechoic chamber, listening room, or other laboratory setting, each processing component in an audio engine can achieve its expected effect. In a real environment, however, the audio engine is affected by external conditions: the acoustic transmission path is highly complex, the recording device may introduce nonlinear distortion, and the processing components also affect one another, so the processing performance of each component suffers greatly and the sound quality of the output audio signal deteriorates. Therefore, in order to accurately locate problematic processing components in the audio engine when processing the raw audio data according to step S220, in an alternative embodiment, referring to fig. 5, the following steps S510 to S530 may be performed:
in step S510, component information of each processing component in the audio engine is acquired.
The component information of each processing component may be component information recorded when the raw audio data is processed by each processing component in actual application. The component information of each processing component may include all parameters affecting the processing performance of the processing component, for example, may include a buffer variable of each processing component, that is, intermediate information calculated by each processing component when processing a previous piece of original audio data, the intermediate information being reused when processing a current piece of original audio data, and may further include component parameters of each processing component, such as filter type and coefficients.
In practical applications, the audio engine processes the received raw audio data. In the present exemplary embodiment, to improve the accuracy of determining the audio engine's performance, the processing performance of each processing component may be detected by reproducing the processing of the raw audio data. To this end, the component information of each processing component in the audio engine can be acquired; that is, while the audio engine processes the raw audio data, the component information of each processing component is recorded as it processes the data.
In an alternative embodiment, in order to improve the processing efficiency of the audio data, the original audio data may be divided into one or more audio frames arranged in sequence. When acquiring the component information of each processing component, the component information at the current time may then be recorded first; after that, starting from the current time, each audio frame input into the audio engine, the order of that frame, and the component information of each processing component as it processes the frame are recorded in sequence. Fig. 6 shows the flow of acquiring component information in this exemplary embodiment, which may include the following steps:
in step S610, at time t, component information of each process component is recorded. The component information of each processing component may include buffer variables and component parameters, such as filter coefficients, when the processing component processes the previous audio frame.
In step S620, the current audio frame and the order of the current audio frame are recorded. For example, the order of the current audio frame may be labeled as n, which is the total number of audio frames input into the audio engine from time t to the current time.
In step S630, audio data obtained after each processing component processes the current audio frame is recorded.
In step S640, a next audio frame is obtained and used as a new current audio frame.
In step S650, it is determined whether the component information of the processing component that processes the current audio frame changes, if so, step S660 is performed, and if not, step S620 is performed.
In step S660, component information of a processing component that processes the current audio frame is recorded. Then, step S620 is performed to record the current audio frame and the order of the current audio frame.
Through the above steps S610 to S660, component information of respective processing components of which the audio engine processes each audio frame in actual application can be recorded.
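A compact sketch of this recording flow, assuming a hypothetical engine object whose snapshot() returns the buffer variables and component parameters of every processing component and whose process() returns the per-component outputs for one frame:

```python
def record_session(engine, audio_frames):
    """Steps S610-S660: record component info at time t, then each frame's
    order, per-component outputs, and any component-info changes."""
    log = {"initial": engine.snapshot(), "frames": []}  # S610: info at time t
    last = log["initial"]
    for n, frame in enumerate(audio_frames):            # S620: frame order n
        outputs = engine.process(frame)                 # S630: per-component outputs
        entry = {"order": n, "outputs": outputs}
        current = engine.snapshot()
        if current != last:                             # S650: info changed?
            entry["component_info"] = current           # S660: record new info
            last = current
        log["frames"].append(entry)
    return log
```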
In step S520, the processing parameters of the corresponding processing component in the audio engine are set according to the component information of each processing component.
After the component information of each processing component has been obtained, the processing parameters of each processing component, such as its buffer variables and filter coefficients, may be set according to that component information, so that the processing parameters of each processing component are exactly consistent with the parameters each component had in actual application when processing the original audio data.
In step S530, the original audio data is processed by each processing component in the audio engine, so as to obtain the audio data generated after processing by each processing component.
Specifically, the raw audio data may be processed sequentially by each processing component in the audio engine. For example, the original audio data may be input to the noise processing component for denoising, then the denoised audio signal may be input to the echo suppression component for canceling the echo in the audio signal, and finally the audio signal processed by the echo suppression component may be input to the gain processing component for adjusting the volume of the audio signal. According to the method, the processing process of the original audio data is reproduced, and the audio data generated after the processing of each processing component is obtained.
In an alternative embodiment, in order to improve the reproducibility of the processing procedure of the original audio data, the component information of each processing component may further include an order in which each processing component processes the original audio data, so that when the original audio data is processed by each processing component in the audio engine, the original audio data may be re-input for processing according to the order in which each processing component processes the original audio data, and the processed audio data of each processing component is obtained.
Through the above steps S510 to S530, the processing parameters of the corresponding processing components can be set according to the component information of each processing component in the audio engine, and the original audio data can be processed by each processing component in the audio engine, so that the audio engine's processing of the original audio data is reproduced with high fidelity, making it convenient to accurately determine the faulty components in the audio engine. Moreover, after the parameters of a processing component are subsequently adjusted, the performance of the audio engine and of that component can be verified, making the detection and verification of the audio engine's processing performance automatic.
In addition, since the component information covers more of the data each processing component uses to process the original audio data, the processing of the original audio data can be reproduced accurately, in contrast to simply printing logs at the important nodes of each processing component, providing support for accurately locating faulty components.
In an alternative embodiment, for one or more audio frames sequentially arranged in the original audio data, as shown in fig. 7, when the processing parameters of the corresponding processing component in the audio engine are set according to step S520, the following steps may be performed:
in step S710, the processing parameters of the processing components that process each audio frame are determined based on the component information of each processing component. The processing parameter of the processing component may be parameter information in the component information of each processing component, and for example, may include a buffer variable parameter and a component parameter, etc. for each processing component to process each audio frame.
In step S720, the processing parameters of the processing components for processing the corresponding audio frame are set according to the processing parameters of the processing components of each audio frame.
As the individual audio frames in the raw audio data are processed by the audio engine, the component parameters of the processing components may change; for example, in some audio engines, a user may change the processing parameters of each component through an SDK (Software Development Kit) interface for adjusting component parameters. Therefore, the processing parameters of the components that process each audio frame can be determined from the component information of each processing component, so that when each audio frame is processed, the processing parameters of each component are set according to the parameters recorded for that frame, completely reproducing the processing of each audio frame.
By determining the processing parameters of the processing components of each audio frame and setting the processing parameters of the corresponding processing components according to the processing parameters, the processing parameters of the processing components for processing each audio frame can be ensured to be consistent with the component parameters in actual application, and the reproducibility of the original audio data processing process is improved.
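A minimal replay sketch under the same assumptions as the recording sketch above (hypothetical set_parameters() and process() methods), showing each frame being processed with exactly the parameters recorded for it:

```python
def replay_session(engine, audio_frames, log):
    """Steps S710-S720: re-run the recorded session so every frame is
    processed with exactly the parameters it saw in actual use."""
    engine.set_parameters(log["initial"])            # restore the time-t state
    reproduced = []
    for entry, frame in zip(log["frames"], audio_frames):
        if "component_info" in entry:                # parameters changed here
            engine.set_parameters(entry["component_info"])
        reproduced.append(engine.process(frame))     # same parameters as recorded
    return reproduced
```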
In an alternative embodiment, when setting the processing parameters of the corresponding processing component in the audio engine according to the component information of each processing component, referring to fig. 8, the following method may also be performed:
in step S810, when the first audio frame is processed, target component information matched with the first audio frame is determined according to the component information, and processing parameters of each processing component are set according to the target component information.
The first audio frame may be any audio frame in the original audio data; the target component information is the component information of each processing component at the time the first audio frame was processed. When processing the first audio frame, the target component information for it may be determined from the recorded component information; for example, the matching entry may be found according to the position of the first audio frame within the original audio data, thereby determining the target component information. The processing parameters of the corresponding processing components for the first audio frame are then set according to the target component information.
In step S820, when the second audio frame is processed, whether the component information of each processing component corresponding to the second audio frame changes is determined according to the component information.
And the second audio frame is any one audio frame after the first audio frame.
In an alternative embodiment, the component information may include identification information indicating whether the component information has changed, and when the second audio frame is processed, the identification information in the component information may be read to determine whether the component information of each processing component processing the second audio frame has changed.
In step S830, when it is determined that the component information of any one or more processing components corresponding to the second audio frame changes, the component information of any one or more processing components is updated. For example, the component information of any one or more processing components corresponding to the second audio frame may be read from the component information, and the component parameters of the processing components may be set, so as to update the component information.
When the component information of a processing component corresponding to the second audio frame changes, updating that component's information allows its processing parameters to be adjusted in time, avoiding changes in processing effect caused by out-of-sync parameters; it also reduces how often the component information must be set, improving the processing efficiency of the audio frames.
In an optional implementation, the component information may further include the on state and the processing level of each processing component, where the on state indicates whether the processing function of the component is switched on, and the processing level indicates the degree of processing the component applies. Thus, when updating the component information of any one or more processing components, referring to fig. 9, the following method may also be performed:
in step S910, it is determined whether each processing element corresponding to the second audio frame is in an on state according to the on state of each processing element.
In step S920, when each processing component corresponding to the second audio frame is determined to be in the on state, it is determined whether the processing level of each component has changed. Taking the noise processing component as an example, the higher the processing level, the more thoroughly the component removes noise.
In step S930, when it is determined that the processing level of any one or more of the processing components has changed, the processing level of any one or more of the processing components is updated.
By this method, the processing of the original audio data can be kept completely consistent with the actual processing, exactly reproducing the original processing, and after a processing component is subsequently adjusted it can further be verified whether the component's performance has improved. Meanwhile, updating the processing level only when the component is determined to be in the on state and its level has changed reduces the number of updates to the processing components and improves the efficiency of reproducing the processing of the original audio data.
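For illustration only, the update rule of steps S910 to S930 might look like the following, where the fields "enabled" and "level" are assumed stand-ins for the on state and processing level described above:

```python
def update_component(component, new_info):
    if not new_info["enabled"]:                     # S910: is the component on?
        component.set_enabled(False)
        return
    component.set_enabled(True)
    if new_info["level"] != component.get_level():  # S920: level changed?
        component.set_level(new_info["level"])      # S930: update the level
```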
When a user records audio through a recording device on a client, the resulting raw audio data may consist of two parts: far-end audio and near-end audio. The near-end original audio may be the audio data collected by the user through the recording device on the client, such as the speaker's speech signal and the ambient noise collected through the microphone in fig. 3; the far-end original audio may be a signal sent by another device through the network, such as the echo generated by playback through the local speaker shown in fig. 3. In order to determine the quality of the near-end audio processed by each processing component, in an alternative embodiment, step S220 may also be implemented by:
determining an audio type of the audio frame;
and processing the near-end audio frames and the far-end audio frames through each processing component to obtain the near-end audio data generated after processing by each component. The audio types include near-end audio frames and far-end audio frames.
In practical applications, the near-end audio may be an audio signal input by the user through the client's recording device, and the far-end audio may be an audio signal arriving through a connection interface between the client and another device, such as a physical interface or a network interface; that is, the near-end and far-end audio may enter the client through different data interfaces. The audio type of each audio frame in the original audio data, near-end or far-end, can therefore be determined according to the interface through which the audio signal was input. The audio frames are then fed in according to their order, so that each input frame is processed by every processing component and the near-end audio data generated after processing by each component is output. By determining the audio type of each audio frame and outputting only the near-end audio data generated after processing by each component, the far-end audio data can be excluded from the output, so that the processing performance of each component can be determined from the output near-end audio data, which is instructive for improving the speaker's voice quality.
As mentioned above, the original audio data may be composed of near-end audio and far-end audio, and accordingly, each audio frame in the original audio data may also include a near-end audio frame and a far-end audio frame. Therefore, in an alternative embodiment, it may be determined whether each audio frame in the original audio data includes a near-end audio frame and/or a far-end audio frame, so that the corresponding near-end and far-end audio frames are input to the audio engine for processing in frame order. For example, referring to fig. 10, a method for processing an audio frame is shown, which may include the following steps:
In step S1001, each audio frame in the original audio data and the component information of each processing component are acquired. The component information of each processing component may include the component information recorded for the processing of each audio frame.
In step S1002, processing parameters, an on state, and a processing level of each processing component that processes the current audio frame are set.
In step S1003, it is determined whether the current audio frame includes a near-end audio frame. If yes, step S1005 is executed, and if no, step S1009 is executed.
In step S1004, it is determined whether the current audio frame includes a far-end audio frame. If yes, step S1006 is executed, and if no, step S1009 is executed.
In step S1005, a near-end audio frame in the current audio frame is input.
In step S1006, a far-end audio frame in the current audio frame is input.
In step S1007, the current audio frame is processed. That is, the near-end audio frame and the far-end audio frame in the current audio frame are processed. For example, the near-end audio frame and the far-end audio frame in the current audio frame may be processed separately, and the processing methods of the near-end audio frame and the far-end audio frame may be the same or different.
In step S1008, the near-end audio data obtained after the processing by each processing component is output.
In step S1009, the next audio frame is acquired as a new current audio frame.
In step S1010, it is determined whether the on state or the processing level of each processing component that processes the current audio frame has changed. If there is a change in the on state or the processing level of the processing component, step S1002 is executed, and if there is no change in the on state or the processing level of the processing component, step S1003 and step S1004 are executed.
In this way, the near-end audio data can be extracted from the audio data generated after the processing of each processing component, and the processing performance of each processing component on the near-end audio data can be determined.
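The flow of fig. 10 can be sketched as the following loop; the engine object and its set_components, process and output_near_end methods are hypothetical stand-ins for the audio engine of this embodiment, and the frame dictionaries are an assumed record format.

def replay_frames(frames, engine):
    # frames: ordered records, each holding the recorded component information
    # plus optional near-end and far-end audio frames, e.g.
    # {"info": {...}, "near_end": b"...", "far_end": b"..."}.
    current_info = None
    for frame in frames:                           # step S1001
        if frame["info"] != current_info:          # step S1010: state changed?
            engine.set_components(frame["info"])   # step S1002
            current_info = frame["info"]
        near = frame.get("near_end")               # step S1003
        far = frame.get("far_end")                 # step S1004
        if near is None and far is None:
            continue                               # step S1009: next frame
        engine.process(near, far)                  # steps S1005 to S1007
        yield engine.output_near_end()             # step S1008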
In order for the user to clearly hear the speaker's voice, the quality of the near-end audio needs to be improved and the influence of the far-end audio eliminated as far as possible. Therefore, in an alternative embodiment, step S230 may also be implemented by:
and comparing the near-end audio data generated after the processing of each processing component with the near-end audio data before the processing of the corresponding processing component to obtain a comparison result.
By comparing the near-end audio data before and after the processing of each processing component, the quality of the near-end audio data produced when each processing component processes the original audio data can be determined, and thus the processing performance of each processing component. This yields stronger detection capability when the audio being processed is voice data recorded by the user through the recording device.
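As a sketch, the before/after comparison for one processing component might look as follows; extract_features is an assumed callable returning echo, noise and gain characteristics of a piece of near-end audio as a dict of floats.

def compare_near_end(before, after, extract_features):
    f_before = extract_features(before)  # characteristics before processing
    f_after = extract_features(after)    # characteristics after processing
    # A positive delta means the component reduced that characteristic,
    # e.g. less residual echo or noise after processing.
    return {name: f_before[name] - f_after[name] for name in f_before}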
Exemplary devices
The exemplary embodiment of the present invention also provides a performance detection apparatus for audio processing. Referring to fig. 11, the performance detection apparatus 1100 for audio processing may include:
a first obtaining module 1110, configured to obtain original audio data;
the second obtaining module 1120 may be configured to process the original audio data through each processing component in the audio engine, and obtain audio data generated after processing by each processing component;
the comparison module 1130 may be configured to compare the audio data generated after the processing of each processing component with the audio data before the processing of the processing component, so as to obtain a comparison result;
the determining module 1140 may be configured to determine information to be optimized in each processing component according to the comparison result.
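For illustration only, the four modules might be wired together as in the following sketch; the load_raw, engine and compare callables and the scalar preset standard are assumptions, not interfaces defined by this embodiment.

class PerformanceDetector:
    def __init__(self, load_raw, engine, compare, standard):
        self.load_raw = load_raw  # first obtaining module 1110
        self.engine = engine      # second obtaining module 1120
        self.compare = compare    # comparison module 1130
        self.standard = standard  # preset standard used by module 1140

    def detect(self, source):
        raw = self.load_raw(source)
        outputs = self.engine(raw)             # audio data per processing component
        results = self.compare(raw, outputs)   # comparison result per component
        # determining module 1140: flag components below the preset standard
        return [name for name, quality in results.items()
                if quality < self.standard]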
In an alternative embodiment, when processing the original audio data through each processing component in the audio engine and acquiring the audio data generated after the processing of each processing component, the second obtaining module 1120 is configured to:
acquiring component information of each processing component in the audio engine, wherein the component information of each processing component is recorded when the original audio data is processed by each processing component;
setting processing parameters of corresponding processing components in the audio engine according to the component information of each processing component;
and processing the original audio data through each processing component in the audio engine to obtain the audio data generated after the processing of each processing component.
In an alternative embodiment, the raw audio data includes one or more audio frames arranged in sequence, and the second obtaining module 1120 is configured to:
determining processing parameters of processing components for processing each audio frame according to the component information of each processing component;
and setting the processing parameters of the processing components for processing the corresponding audio frames according to the processing parameters of the processing components of each audio frame.
In an alternative embodiment, the second obtaining module 1120 is configured to:
when the first audio frame is processed, determining target component information matched with the first audio frame according to the component information, and setting processing parameters of each processing component according to the target component information; and
when a second audio frame is processed, determining whether component information of each processing component corresponding to the second audio frame changes or not according to the component information, wherein the second audio frame is any audio frame behind the first audio frame;
and updating the component information of any one or more processing components when the component information of any one or more processing components corresponding to the second audio frame is determined to be changed.
In an optional implementation manner, the component information includes an on state and a processing level of each processing component, and when updating the component information of any one or more processing components, the second obtaining module 1120 is further configured to:
determining whether each processing component corresponding to the second audio frame is in the on state according to the on state of each processing component;
when the processing components corresponding to the second audio frame are determined to be in the on state, determining whether the processing levels of the processing components have changed;
the processing level of any one or more processing components is updated upon determining that the processing level of any one or more processing components has changed.
In an alternative embodiment, in processing the audio frame by the processing components in the audio engine, the second obtaining module 1120 is configured to:
determining an audio type of the audio frame, wherein the audio type comprises a near-end audio frame and a far-end audio frame;
and processing the near-end audio frame and the far-end audio frame through each processing component to obtain the near-end audio data generated after the processing of each processing component.
In an alternative embodiment, the comparison module 1130 is configured to:
and comparing the near-end audio data generated after the processing of each processing component with the near-end audio data before the processing of the corresponding processing component to obtain a comparison result.
In an alternative embodiment, the comparison module 1130 is configured to:
extracting audio features of the audio data generated after the processing of the processing component to obtain a first feature parameter; and
extracting the audio features of the audio data before processing by the processing component to obtain second feature parameters;
determining the processing quality of the audio data processed by the processing component according to the first characteristic parameter and the second characteristic parameter to obtain a comparison result;
a determination module 1140 configured to:
when the processing quality of the audio data processed by the processing component does not reach a preset standard, determining the processing component as a component to be optimized;
wherein the audio characteristics of the audio data include any one or more of echo characteristics, noise characteristics, and gain characteristics of the audio data.
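A minimal sketch of this determination follows; the per-characteristic thresholds standing in for the preset standard are hypothetical values chosen for illustration.

def components_to_optimize(comparison_results, thresholds):
    # comparison_results: component name -> measured improvement per audio
    # characteristic, e.g. {"noise": {"echo": 0.1, "noise": 12.0, "gain": 0.5}}.
    # thresholds: the assumed preset standard for each characteristic.
    to_optimize = []
    for name, result in comparison_results.items():
        if any(result[feature] < threshold
               for feature, threshold in thresholds.items()):
            to_optimize.append(name)  # processing quality below the standard
    return to_optimize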
In an alternative embodiment, the processing component comprises at least two of a noise processing component, an echo suppression component and a gain processing component.
In addition, other specific details of the embodiments of the present invention have already been described in detail in the above method embodiments and are not repeated here.
Exemplary storage medium
The storage medium of the exemplary embodiment of the present invention is explained below.
In the present exemplary embodiment, the above-described method may be implemented by a program product, such as a portable compact disc read-only memory (CD-ROM) including program code, which may be executed on a device such as a personal computer. However, the program product of the present invention is not limited thereto; in this document, a readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java or C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the Internet using an Internet service provider).
Exemplary electronic device
An electronic device of an exemplary embodiment of the present invention is explained with reference to fig. 12. The electronic device may be the client 110 or the server 120 described above.
The electronic device 1200 shown in fig. 12 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 12, the electronic device 1200 is embodied in the form of a general purpose computing device. The components of the electronic device 1200 may include, but are not limited to: at least one processing unit 1210, at least one memory unit 1220, a bus 1230 connecting the various system components including the memory unit 1220 and the processing unit 1210, and a display unit 1240.
Wherein the storage unit stores program code, which may be executed by the processing unit 1210, to cause the processing unit 1210 to perform the steps according to various exemplary embodiments of the present invention described in the above section "exemplary methods" of the present description. For example, the processing unit 1210 may perform the method steps as shown in fig. 2, 4-10, etc.
The storage unit 1220 may include volatile storage units such as a random access memory unit (RAM) 1221 and/or a cache memory unit 1222, and may further include a read-only memory unit (ROM) 1223.
Storage unit 1220 may also include a program/utility 1224 having a set (at least one) of program modules 1225, such program modules 1225 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 1230 may include a data bus, an address bus, and a control bus.
The electronic device 1200 may also communicate with one or more external devices 1300 (e.g., keyboard, pointing device, Bluetooth device, etc.) via an input/output (I/O) interface 1250. The electronic device 1200 further comprises a display unit 1240 connected to the input/output (I/O) interface 1250 for display. Also, the electronic device 1200 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) via the network adapter 1260. As shown, the network adapter 1260 communicates with the other modules of the electronic device 1200 via the bus 1230. It should be appreciated that although not shown, other hardware and/or software modules may be used in conjunction with the electronic device 1200, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
It should be noted that although several modules or sub-modules of the apparatus are mentioned in the above detailed description, such division is merely exemplary and not mandatory. Indeed, according to embodiments of the invention, the features and functionality of two or more of the units/modules described above may be embodied in one unit/module. Conversely, the features and functions of one unit/module described above may be further divided and embodied by a plurality of units/modules.
Moreover, while the operations of the method of the invention are depicted in the drawings in a particular order, this does not require or imply that the operations must be performed in this particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions.
While the spirit and principles of the invention have been described with reference to several particular embodiments, it is to be understood that the invention is not limited to the disclosed embodiments, and that the division into aspects does not mean that features in those aspects cannot be combined to advantage; such division is made for convenience of presentation only. The invention is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (10)

1. A method for performance detection of audio processing, the method comprising:
acquiring original audio data;
processing the original audio data through each processing component in an audio engine, and acquiring audio data generated after the processing of each processing component;
comparing the audio data generated after the processing of each processing component with the audio data before the processing of the processing component to obtain a comparison result;
and determining the information to be optimized in each processing component according to the comparison result.
2. The method of claim 1, wherein when processing the raw audio data by processing components in an audio engine and obtaining audio data generated after processing by each of the processing components, the method comprises:
acquiring component information of each processing component in the audio engine, wherein the component information of each processing component is recorded when the original audio data is processed by each processing component;
setting processing parameters of corresponding processing components in the audio engine according to the component information of each processing component;
and processing the original audio data through each processing component in the audio engine to obtain the audio data generated after the processing of each processing component.
3. The method of claim 2, wherein the raw audio data comprises one or more audio frames arranged in sequence, and the setting of the processing parameters of the corresponding processing component in the audio engine according to the component information of each processing component comprises:
determining processing parameters of processing components for processing the audio frames according to the component information of the processing components;
and setting the processing parameters of the processing components for processing the corresponding audio frames according to the processing parameters of the processing components of the audio frames.
4. The method of claim 3, wherein when setting the processing parameters of the corresponding processing component in the audio engine according to the component information of each of the processing components, the method further comprises:
when a first audio frame is processed, determining target component information matched with the first audio frame according to the component information, and setting processing parameters of each processing component according to the target component information; and
when a second audio frame is processed, determining whether component information of each processing component corresponding to the second audio frame changes or not according to the component information, wherein the second audio frame is any one audio frame after the first audio frame;
and updating the component information of any one or more processing components when the component information of any one or more processing components corresponding to the second audio frame is determined to be changed.
5. The method of claim 4, wherein the component information includes an on state and a processing level of each of the processing components, and wherein when updating the component information of any one or more of the processing components, the method comprises:
determining whether each processing component corresponding to the second audio frame is in the on state according to the on state of each processing component;
when determining that each processing component corresponding to the second audio frame is in an on state, determining whether the processing level of each processing component changes;
updating the processing level of any one or more of the processing components upon determining that the processing level of any one or more of the processing components has changed.
6. The method of claim 3, wherein in processing the audio frames by each of the processing components in the audio engine, the method further comprises:
determining an audio type of the audio frame, wherein the audio type comprises a near-end audio frame and a far-end audio frame;
and processing the near-end audio frame and the far-end audio frame through each processing component to obtain near-end audio data generated after the processing of each processing component.
7. The method of claim 6, wherein comparing the audio data generated after the processing of each processing component with the audio data before the processing of the processing component to obtain a comparison result comprises:
and comparing the near-end audio data generated after the processing of each processing component with the near-end audio data before the processing of the corresponding processing component to obtain the comparison result.
8. An apparatus for performance detection of audio processing, the apparatus comprising:
the first acquisition module is used for acquiring original audio data;
the second acquisition module is used for processing the original audio data through each processing component in an audio engine and acquiring the audio data generated after the processing of each processing component;
the comparison module is used for comparing the audio data generated after the processing of each processing component with the audio data before the processing of the processing component to obtain a comparison result;
and the determining module is used for determining the information to be optimized in each processing component according to the comparison result.
9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method of any one of claims 1 to 7.
10. An electronic device, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the method of any of claims 1-7 via execution of the executable instructions.
CN202110399439.6A 2021-04-14 2021-04-14 Performance detection method and device for audio processing, storage medium and electronic equipment Active CN113113046B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110399439.6A CN113113046B (en) 2021-04-14 2021-04-14 Performance detection method and device for audio processing, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN113113046A true CN113113046A (en) 2021-07-13
CN113113046B CN113113046B (en) 2024-01-19

Family

ID=76716866

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110399439.6A Active CN113113046B (en) 2021-04-14 2021-04-14 Performance detection method and device for audio processing, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN113113046B (en)

Citations (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130144631A1 (en) * 2010-08-23 2013-06-06 Panasonic Corporation Audio signal processing apparatus and audio signal processing method
CN103617797A (en) * 2013-12-09 2014-03-05 腾讯科技(深圳)有限公司 Voice processing method and device
CN104392727A (en) * 2014-11-12 2015-03-04 华为技术有限公司 Audio signal processing method and related device
CN104980337A (en) * 2015-05-12 2015-10-14 腾讯科技(深圳)有限公司 Method and device for improving audio processing performance
CN104991755A (en) * 2015-07-10 2015-10-21 联想(北京)有限公司 Information processing method and electronic device
US20160314789A1 (en) * 2015-04-27 2016-10-27 Nuance Communications, Inc. Methods and apparatus for speech recognition using visual information
KR20180048786A (en) * 2015-09-29 2018-05-10 광저우 쿠고우 컴퓨터 테크놀로지 컴퍼니, 리미티드 Sound-mixing processing method, apparatus and device, and storage medium
WO2018099143A1 (en) * 2016-11-30 2018-06-07 华为技术有限公司 Method and device for processing audio data
CN109389988A (en) * 2017-08-08 2019-02-26 腾讯科技(深圳)有限公司 Audio adjusts control method and device, storage medium and electronic device
US10332513B1 (en) * 2016-06-27 2019-06-25 Amazon Technologies, Inc. Voice enablement and disablement of speech processing functionality
JP2019118038A (en) * 2017-12-27 2019-07-18 ヤマハ株式会社 Audio data processing apparatus and control method of audio data processing apparatus
US20190267022A1 (en) * 2016-10-18 2019-08-29 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for processing an audio signal
CN111210842A (en) * 2019-12-27 2020-05-29 中移(杭州)信息技术有限公司 Voice quality inspection method, device, terminal and computer readable storage medium
CN111432233A (en) * 2020-03-20 2020-07-17 北京字节跳动网络技术有限公司 Method, apparatus, device and medium for generating video
CN112037810A (en) * 2020-09-25 2020-12-04 杭州朗和科技有限公司 Echo processing method, device, medium and computing equipment
CN112185410A (en) * 2020-10-21 2021-01-05 北京猿力未来科技有限公司 Audio processing method and device
CN112466319A (en) * 2020-11-06 2021-03-09 浙江华创视讯科技有限公司 Audio processing method and device, computer equipment and storage medium
CN112634925A (en) * 2020-11-10 2021-04-09 浙江华创视讯科技有限公司 Audio debugging method and device and computer equipment

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117499163A (en) * 2024-01-03 2024-02-02 成都数之联科技股份有限公司 WebRTC-based server remote maintenance method, system and equipment
CN117499163B (en) * 2024-01-03 2024-03-01 成都数之联科技股份有限公司 WebRTC-based server remote maintenance method, system and equipment

Also Published As

Publication number Publication date
CN113113046B (en) 2024-01-19

Similar Documents

Publication Publication Date Title
JP5085556B2 (en) Configure echo cancellation
JP5053285B2 (en) Determining audio device quality
US11587575B2 (en) Hybrid noise suppression
CN106170929B (en) Communication system, method and apparatus with improved noise immunity
CN112601045A (en) Speaking control method, device, equipment and storage medium for video conference
CN113113046B (en) Performance detection method and device for audio processing, storage medium and electronic equipment
CN114792524A (en) Audio data processing method, apparatus, program product, computer device and medium
CN112997249B (en) Voice processing method, device, storage medium and electronic equipment
Beerends et al. Degradation decomposition of the perceived quality of speech signals on the basis of a perceptual modeling approach
CN114155852A (en) Voice processing method and device, electronic equipment and storage medium
CN111383629B (en) Voice processing method and device, electronic equipment and storage medium
CN104078049B (en) Signal processing apparatus and signal processing method
CN113453124B (en) Audio processing method, device and system
JP6994221B2 (en) Extraction generation sound correction device, extraction generation sound correction method, program
US20230282225A1 (en) Dynamic noise and speech removal
CN111899764B (en) Audio monitoring method and device, computer equipment and storage medium
CN111145776B (en) Audio processing method and device
CN116524944A (en) Audio noise reduction method, medium, device and computing equipment
JP6779489B2 (en) Extraction generated sound correction device, extraction generation sound correction method, program
Dupont et al. Comparative Evaluation of Public Address Feedback Controllers: A Preliminary Assessment Methodology
CN115910090A (en) Data signal processing method, device, equipment and storage medium
CN114724572A (en) Method and device for determining echo delay
WO2020087788A1 (en) Audio processing method and device
CN113470677A (en) Audio processing method, device and system
CN114863941A (en) Howling suppression method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20210928

Address after: 310000 Room 408, building 3, No. 399, Wangshang Road, Changhe street, Binjiang District, Hangzhou City, Zhejiang Province

Applicant after: Hangzhou Netease Zhiqi Technology Co.,Ltd.

Address before: 310052 Room 301, Building No. 599, Changhe Street Network Business Road, Binjiang District, Hangzhou City, Zhejiang Province

Applicant before: HANGZHOU LANGHE TECHNOLOGY Ltd.

GR01 Patent grant