CN117472321A - Audio processing method and device, storage medium and electronic equipment - Google Patents

Info

Publication number: CN117472321A
Application number: CN202311833039.7A
Authority: CN (China)
Original language: Chinese (zh)
Inventors: 肖梦育, 邓财祥
Applicant / assignee: Guangdong Chaoge Smart Internet Technology Co., Ltd.
Legal status: Pending
Prior art keywords: audio, audio source type, audio source, data, application program

Classifications

    • G06F 3/167: Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • G06F 3/165: Management of the audio stream, e.g. setting of volume, audio stream path
    • G11B 20/10527: Audio or video recording; data buffering arrangements
    • G11B 2020/10546: Audio or video recording specifically adapted for audio data

Abstract

The invention provides an audio processing method and device, a storage medium, and an electronic device. The method includes: when an audio recording start instruction for a first application program is detected, determining a first audio source type of the first application program; determining a first audio acquisition function based on the first audio source type; calling the first audio acquisition function to acquire first audio data under the first audio source type; and sending the first audio data to the first application program, so that the first application program captures data from the target audio input device. While the first application program is capturing data, the target audio input device also supports simultaneous data capture by a second application program whose second audio source type differs from the first audio source type. Embodiments of the invention allow different application programs to capture data from the same audio input device at the same time, without requiring the application programs to be specially adapted to, or made compatible with, such simultaneous capture.

Description

Audio processing method and device, storage medium and electronic equipment
Technical Field
The present invention relates to the field of computer technologies, and in particular, to an audio processing method, an audio processing device, a storage medium, and an electronic apparatus.
Background
At present, an audio input device is occupied once it has been opened and cannot be opened again until it is released. A conflict therefore arises when multiple application programs try to use the same audio input device at the same time: only one application program can use the device normally, and the others cannot. To allow multiple application programs to capture data from the same audio input device simultaneously, related technologies typically create a Java (an object-oriented programming language) service in the framework layer of the Android platform and distribute the data to the different application programs through that service. Each application program then has to be specially adapted to, and made compatible with, that service before it can capture data from the same audio input device simultaneously with others, which is inefficient. How to conveniently let different application programs capture data from the same audio input device at the same time, without requiring each application program to be specially adapted for such simultaneous capture, has therefore become a research hotspot.
Disclosure of Invention
In view of this, embodiments of the present invention provide an audio processing method, apparatus, storage medium, and electronic device to address the problem in the prior art that application programs must be specially adapted to be compatible with simultaneous data capture from the same audio input device, which is inefficient. That is, embodiments of the invention allow different application programs to capture data from the same audio input device at the same time without such per-application adaptation and compatibility work, and can thereby improve efficiency.
According to an aspect of the present invention, there is provided an audio processing method, the method comprising:
when an audio recording starting instruction aiming at a first application program is detected, determining a first audio source type of the first application program, wherein the first audio source type is one audio source type in an audio source type set;
determining a first audio acquisition function based on the first audio source type, wherein different audio source types correspond to different audio acquisition functions;
calling the first audio acquisition function to acquire first audio data under the first audio source type; the audio acquisition functions corresponding to all the audio source types in the audio source type set support to acquire audio data under the corresponding audio source types, and the audio data under one audio source type is determined based on initial audio data input by the target audio input device;
sending the first audio data to the first application program to achieve data capture of the target audio input device by the first application program; the target audio input device supports simultaneous data capture of a second application program under the data capture of the first application program, and a second audio source type of the second application program belongs to the audio source type set, wherein the second audio source type is different from the first audio source type.
According to another aspect of the present invention, there is provided an audio processing apparatus, the apparatus comprising:
the processing unit is used for determining a first audio source type of the first application program when an audio recording starting instruction aiming at the first application program is detected, wherein the first audio source type is one audio source type in an audio source type set;
the processing unit is further configured to determine a first audio acquisition function based on the first audio source type, where different audio source types correspond to different audio acquisition functions;
the acquisition unit is used for calling the first audio acquisition function and acquiring first audio data under the first audio source type; the audio acquisition functions corresponding to all the audio source types in the audio source type set support to acquire audio data under the corresponding audio source types, and the audio data under one audio source type is determined based on initial audio data input by the target audio input device;
The processing unit is further configured to send the first audio data to the first application program, so as to achieve data capturing of the target audio input device by the first application program; the target audio input device supports simultaneous data capture of a second application program under the data capture of the first application program, and a second audio source type of the second application program belongs to the audio source type set, wherein the second audio source type is different from the first audio source type.
According to another aspect of the invention there is provided an electronic device comprising a processor, and a memory storing a program, wherein the program comprises instructions which, when executed by the processor, cause the processor to perform the above mentioned method.
According to another aspect of the present invention there is provided a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the above mentioned method.
When an audio recording start instruction for a first application program is detected, an embodiment of the invention determines the first audio source type of the first application program, the first audio source type being one audio source type in a set of audio source types. On this basis, a first audio acquisition function can be determined based on the first audio source type, with different audio source types corresponding to different audio acquisition functions, and the first audio acquisition function can be called to acquire first audio data under the first audio source type. The audio acquisition function corresponding to each audio source type in the set supports acquiring audio data under that audio source type, and the audio data under an audio source type is determined based on the initial audio data input by the target audio input device. The first audio data can then be sent to the first application program, so that the first application program captures data from the target audio input device. While the first application program is capturing data, the target audio input device also supports simultaneous data capture by a second application program whose second audio source type belongs to the set of audio source types and differs from the first audio source type. Embodiments of the invention therefore allow different application programs to capture data from the same audio input device at the same time, without requiring the application programs to be specially adapted to, or made compatible with, such simultaneous capture. In addition, the audio data required by each application program is obtained directly through its audio acquisition function, that is, the splitting for different audio source types is performed closer to the audio input device, which improves efficiency.
Drawings
Further details, features and advantages of the invention are disclosed in the following description of exemplary embodiments with reference to the following drawings, in which:
fig. 1 shows a flow diagram of an audio processing method according to an exemplary embodiment of the invention;
fig. 2 shows a schematic diagram of a framework according to an exemplary embodiment of the present invention;
fig. 3 shows a flow diagram of another audio processing method according to an exemplary embodiment of the present invention;
fig. 4 shows a flow diagram of yet another audio processing method according to an exemplary embodiment of the invention;
fig. 5 shows a flow diagram of yet another audio processing method according to an exemplary embodiment of the present invention;
fig. 6 shows a schematic block diagram of an audio processing device according to an exemplary embodiment of the present invention;
fig. 7 shows a block diagram of an exemplary electronic device that can be used to implement an embodiment of the invention.
Detailed Description
Embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. Although certain embodiments of the invention are shown in the drawings, it should be understood that the invention may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that the invention will be understood more thoroughly and completely. It should be understood that the drawings and embodiments of the invention are for illustration purposes only and are not intended to limit the scope of the present invention.
It should be understood that the various steps recited in the method embodiments of the present invention may be performed in a different order and/or performed in parallel. Furthermore, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the invention is not limited in this respect.
The term "including" and variations thereof as used herein are intended to be open-ended, i.e., including, but not limited to. The term "based on" is based at least in part on. The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments. Related definitions of other terms will be given in the description below. It should be noted that the terms "first," "second," and the like herein are merely used for distinguishing between different devices, modules, or units and not for limiting the order or interdependence of the functions performed by such devices, modules, or units.
It should be noted that references to "a" and "an" in this disclosure are intended to be illustrative rather than limiting, and those skilled in the art will appreciate that they should be understood as "one or more" unless the context clearly indicates otherwise.
The names of messages or information interacted between the devices in the embodiments of the present invention are for illustrative purposes only and are not intended to limit the scope of such messages or information.
It should be noted that the execution body of the audio processing method provided by the embodiments of the present invention may be one or more electronic devices, which is not limited herein. The electronic device may be a terminal (i.e., a client) or a server; when the execution body includes multiple electronic devices comprising at least one terminal and at least one server, the audio processing method provided by the embodiments of the present invention may be executed jointly by the terminal and the server. Accordingly, the terminals mentioned herein may include, but are not limited to: smart phones, tablet computers, notebook computers, desktop computers, smart watches, intelligent voice interaction devices, smart appliances, and the like. The server mentioned herein may be an independent physical server, a server cluster or distributed system formed by multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), big data, and artificial intelligence platforms.
Based on the above description, an embodiment of the present invention proposes an audio processing method that can be executed by the above-mentioned electronic device (terminal or server); alternatively, the audio processing method may be performed by a terminal and a server together. For convenience of explanation, the following description will take the electronic device to execute the audio processing method as an example; as shown in fig. 1, the audio processing method may include the following steps S101 to S104:
s101, when an audio recording starting instruction aiming at a first application program is detected, determining a first audio source type of the first application program, wherein the first audio source type is one audio source type in an audio source type set.
The audio source type of an application (i.e., an application program) is specified when that application is developed; thus each application corresponds to one audio source type, i.e., an application has one audio source type. On this basis, the audio source type of the first application may be determined as the first audio source type. Optionally, the first application may be any application program installed on the electronic device, such as a security monitoring application, a voice assistant, or a voice call application, which is not limited in the embodiments of the present invention.
Alternatively, the user (i.e., the object) may perform an audio recording start operation on any application in the display interface of the electronic device, in which case the electronic device detects an audio recording start instruction for that application. Optionally, the audio recording start operation may be a long-press operation (i.e., an operation whose continuous press duration exceeds a preset press duration), a single-click operation, a multi-click operation (e.g., a double-click operation), and so on; the embodiments of the present invention are not limited in this respect. Optionally, the preset press duration may be set empirically or according to actual requirements, which is not limited in the embodiments of the present invention.
In an embodiment of the present invention, the set of audio source types may include at least two audio source types, that is, the same audio input device (i.e., audio input hardware) may be used to support audio data under each of the at least two audio source types. Optionally, the audio input device in the electronic device may include, but is not limited to: an internal Microphone (MIC), at least one external Microphone, an external camera with a recording function, etc.; the embodiment of the present invention is not limited thereto.
Optionally, the set of audio source types may include, but is not limited to: AUDIO_SOURCE_MIC (one audio source type), AUDIO_SOURCE_VOICE_RECOGNITION, and so on; the embodiments of the present invention are not limited in this respect. The audio source type AUDIO_SOURCE_MIC indicates direct capture from the microphone and may also be referred to herein as the microphone audio source type; for example, when an application program is a call application, its audio source type may be AUDIO_SOURCE_MIC. Correspondingly, the audio source type AUDIO_SOURCE_VOICE_RECOGNITION indicates a voice recognition type and may also be referred to herein as the voice recognition audio source type; for example, when an application program is a voice assistant, its audio source type may be AUDIO_SOURCE_VOICE_RECOGNITION, and so on.
Alternatively, the electronic device may be an android platform, or may be a Windows (Windows operating system) platform, which is not limited in this embodiment of the present invention. For convenience of explanation, the electronic device will be taken as an android platform for illustration.
It should be noted that the electronic device may include an application layer and an application framework layer, as shown in fig. 2, where application programs are located at the application layer. Specifically, the application programs of the application layer are developed on top of the services of the framework layer, so that a user can use the hardware functions of the system or device; that is, an application program in the application layer can capture data from an audio input device of the electronic device through the framework layer. Accordingly, the application framework layer provides application programming interfaces (APIs) for developers of the application layer; it is effectively the framework of an application program and serves the application programs in the layer above it.
Specifically, the electronic device may create a first audio resource management object (which may be denoted as an AudioRecord object, also written AudioRecorder) corresponding to the first application program, where an audio resource management object is used to support the corresponding application program in recording audio from the audio input device, and any audio resource management object is located in the application framework layer. Alternatively, the first audio source type may be determined by the electronic device through the first audio resource management object, that is, the electronic device may invoke the first audio resource management object to determine the first audio source type; alternatively, the first audio source type may be determined at the application layer, or by a module other than the first audio resource management object in the application framework layer, in which case the electronic device may create the first audio resource management object based on the first audio source type, so that the first audio source type is an input parameter of the first audio resource management object, and so on; the embodiments of the present invention are not limited in this respect. Further, the electronic device may invoke the first audio resource management object to issue the first audio source type.
On the Android platform, an application program collects audio data by reading from an AudioRecord object. When the AudioRecord object is created, the corresponding audio input device can be opened by specifying the audio source type.
Based on the above, when invoking the first audio resource management object and issuing the first audio source type, the electronic device may invoke the first audio resource management object to send the first audio source type to the policy execution module (which may also be denoted AudioFlinger), and the policy execution module creates a first audio input stream object corresponding to the first application program based on the first audio source type, that is, opens the first audio input stream of the first application program in the policy execution module. The first audio input stream object may then be invoked to send the first audio source type to the audio hardware abstraction layer (which may also be denoted AudioHAL), triggering the determination of the first audio acquisition function based on the first audio source type in the audio hardware abstraction layer.
AudioFlinger is responsible for executing policies, transferring data, and so on; the audio data obtained from the AudioRecord object ultimately comes from AudioFlinger. AudioFlinger in turn obtains data by opening the corresponding audio input device through an audio input stream created in the AudioHAL. The audio hardware abstraction layer is located in the hardware abstraction layer (HAL), which shields differences between hardware and is compatible with different hardware.
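As an illustration of the routing step just described, the hedged C sketch below shows how an audio hardware abstraction layer could pick a per-source read function when AudioFlinger opens an input stream for a given audio source type. The function and struct names (my_adev_open_input_stream, my_stream_in) are illustrative placeholders rather than the patent's or AOSP's actual identifiers; only audio_source_t and the AUDIO_SOURCE_* constants are standard Android definitions, and the in_read / in_read_record names anticipate the acquisition functions described later.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <system/audio.h>   /* audio_source_t, AUDIO_SOURCE_* (AOSP header, assumed available in a HAL build) */

/* Illustrative per-stream state kept by the HAL for one audio source type. */
struct my_stream_in {
    audio_source_t source;                        /* source type requested by the app */
    ssize_t (*read_func)(struct my_stream_in *,   /* per-source acquisition function  */
                         void *buf, size_t bytes);
};

/* Forward declarations of the per-source acquisition functions. */
ssize_t in_read(struct my_stream_in *in, void *buf, size_t bytes);
ssize_t in_read_record(struct my_stream_in *in, void *buf, size_t bytes);

/* Sketch of a HAL entry point: when AudioFlinger opens an input stream it
 * passes the audio source type, and the HAL picks the matching read function. */
int my_adev_open_input_stream(audio_source_t source, struct my_stream_in **out)
{
    struct my_stream_in *in = calloc(1, sizeof(*in));
    if (!in)
        return -1;

    in->source = source;
    switch (source) {
    case AUDIO_SOURCE_MIC:
        in->read_func = in_read;           /* microphone audio source type        */
        break;
    case AUDIO_SOURCE_VOICE_RECOGNITION:
        in->read_func = in_read_record;    /* voice recognition audio source type */
        break;
    default:
        in->read_func = in_read;           /* fall back to plain microphone data  */
        break;
    }
    *out = in;
    return 0;
}
```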
S102, determining a first audio acquisition function based on the first audio source type, wherein different audio source types correspond to different audio acquisition functions.
Optionally, the electronic device may further determine a target audio device type and the audio hardware abstraction layer based on a preset configuration file and the available audio input devices; the preset configuration file may include the parameters of the different hardware modules, the audio device types supported by each hardware module, and so on. The preset configuration file may be set empirically or according to actual requirements, which is not limited in the embodiments of the present invention. The number of available audio input devices may be one or more. When there are multiple available audio input devices, the one with the highest priority may be selected based on the priority of each available audio input device; the selected device is the audio input device to be used, i.e., the target audio input device, and its audio device type is the target audio device type. When there is only one available audio input device, that device is taken as the target audio input device, and its audio device type is the target audio device type. Accordingly, the electronic device may determine an audio hardware abstraction layer that supports the target audio device type and send the target audio device type and the first audio source type to that audio hardware abstraction layer.
Alternatively, the target audio device type and the audio hardware abstraction layer may be determined by a policy scheduling module (also denoted AudioPolicy) in the application framework layer; that is, the policy scheduling module may be responsible for audio policy scheduling.
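A minimal sketch of the device-selection step just described, assuming the preset configuration file has been parsed into a list of candidate input devices with priorities; the struct and field names are hypothetical and only illustrate picking the highest-priority available device (or the single available one).

```c
#include <stddef.h>

/* Hypothetical entry parsed from the preset configuration file. */
struct input_device_entry {
    unsigned int device_type;   /* e.g. a built-in-microphone device type constant */
    int          priority;      /* larger value = higher priority                  */
    int          available;     /* nonzero if the device is currently usable       */
};

/* Pick the available input device with the highest priority; if only one
 * device is available it is returned directly. Returns NULL if none. */
const struct input_device_entry *
select_target_input_device(const struct input_device_entry *list, size_t n)
{
    const struct input_device_entry *best = NULL;
    for (size_t i = 0; i < n; i++) {
        if (!list[i].available)
            continue;
        if (best == NULL || list[i].priority > best->priority)
            best = &list[i];
    }
    return best;   /* its device_type is the target audio device type */
}
```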
Based on this, the electronic device may determine a first audio acquisition function based on the target audio device type and the first audio source type; in this case, one audio device type together with one audio source type corresponds to one audio acquisition function. That is, the electronic device may determine, through the audio hardware abstraction layer, the first audio acquisition function based on the target audio device type and the first audio source type. It should be appreciated that in the AudioHAL, a first target audio input stream object may be created and invoked according to the target audio device type and the first audio source type (i.e., the first audio input stream of the first application program may be opened in the audio hardware abstraction layer), and the first audio acquisition function may be determined, triggering the subsequent call to the first audio acquisition function to acquire the first audio data under the first audio source type.
Alternatively, when an audio hardware abstraction layer corresponds to an audio device type, the first audio acquisition function may be determined in the determined audio hardware abstraction layer based on only the first audio source type, in which case an audio source type corresponds to an audio acquisition function in an audio hardware abstraction layer.
S103, calling a first audio acquisition function to acquire first audio data under a first audio source type; the audio acquisition functions corresponding to the audio source types in the audio source type set support to acquire audio data under the corresponding audio source types, and the audio data under one audio source type is determined based on initial audio data input by the target audio input device.
In the embodiment of the invention, the electronic device can call different audio acquisition functions to acquire the audio data under different audio source types. Optionally, in the embodiments of the present invention, the audio acquisition function corresponding to the audio source type AUDIO_SOURCE_MIC may be denoted in_read, and the audio acquisition function corresponding to the audio source type AUDIO_SOURCE_VOICE_RECOGNITION may be denoted in_read_record.
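Under the same hypothetical naming as the earlier sketch, the code below shows one way in_read and in_read_record could be implemented: each simply drains a per-source ring buffer that the multi-stream processing thread (described later, steps S303 to S305) fills after splitting. The source_cache layout and ring_buffer_read helper are assumptions for illustration, not the patent's actual data structures.

```c
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical per-source slot in the multi-stream thread's data cache. */
struct source_cache {
    unsigned char *ring;         /* audio data already split out for this source type */
    size_t         capacity;
    size_t         read_pos;     /* monotonically increasing byte counters            */
    size_t         write_pos;
};

static struct source_cache g_mic_cache;          /* AUDIO_SOURCE_MIC data               */
static struct source_cache g_recognition_cache;  /* AUDIO_SOURCE_VOICE_RECOGNITION data */

/* Copy up to 'bytes' of buffered data for one source, return bytes copied. */
static ssize_t ring_buffer_read(struct source_cache *c, void *buf, size_t bytes)
{
    if (c->capacity == 0)
        return 0;
    size_t avail = c->write_pos - c->read_pos;   /* bytes currently buffered */
    size_t n = bytes < avail ? bytes : avail;
    for (size_t i = 0; i < n; i++)
        ((unsigned char *)buf)[i] = c->ring[(c->read_pos + i) % c->capacity];
    c->read_pos += n;
    return (ssize_t)n;
}

struct my_stream_in;   /* per-stream state, as in the earlier sketch */

/* Acquisition function for the microphone audio source type. */
ssize_t in_read(struct my_stream_in *in, void *buf, size_t bytes)
{
    (void)in;
    return ring_buffer_read(&g_mic_cache, buf, bytes);
}

/* Acquisition function for the voice recognition audio source type. */
ssize_t in_read_record(struct my_stream_in *in, void *buf, size_t bytes)
{
    (void)in;
    return ring_buffer_read(&g_recognition_cache, buf, bytes);
}
```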
It can be seen that embodiments of the invention implement the audio multi-stream mechanism in the AudioHAL, which is closer to the audio input device, and can therefore provide multiple audio streams more efficiently: the hardware is accessed through standard Linux drivers, which is more efficient than operating at the application layer. In other words, the processing and distribution of the audio is implemented in the AudioHAL in the C language, which offers higher efficiency and better real-time behavior than processing media data in the Java virtual machine of the application layer. For example, when the set of audio source types includes AUDIO_SOURCE_MIC and AUDIO_SOURCE_VOICE_RECOGNITION, an audio dual stream can be provided, separating the audio data under the two audio source types. In the embodiments of the invention, by managing the opening, closing, data reading, processing, and distribution of the audio input device in a unified way, different audio input streams of the same audio input device can be opened, closed, and read, thereby providing audio multi-stream. For ease of explanation, the following description takes providing an audio dual stream as an example.
Optionally, various peripherals or interfaces are integrated in the electronic device to provide rich information input and output capabilities for users. The target audio input device may be a built-in microphone (MIC), an external microphone, an external camera with a recording function, or the like; the embodiments of the present invention are not limited in this respect. Preferably, the target audio input device is a built-in microphone and the target audio device type is AUDIO_DEVICE_IN_BUILTIN_MIC (a microphone audio device type); for ease of explanation, the following description takes the target audio input device as a built-in microphone whose target audio device type is AUDIO_DEVICE_IN_BUILTIN_MIC.
Alternatively, the target audio input device may be a hardware peripheral in the electronic device. Optionally, the electronic device may further include a Linux kernel, and Android is an operating system based on Linux free and open source codes. In the Linux kernel, a rich driving framework is provided to access and manage various hardware devices.
S104, sending the first audio data to the first application program to achieve data capture of the target audio input device by the first application program; the target audio input device supports simultaneous data capture of a second application program under the data capture of the first application program, and a second audio source type of the second application program belongs to an audio source type set, wherein the second audio source type is different from the first audio source type.
Alternatively, the electronic device may send the first audio data to the first audio input stream object via the first audio acquisition function, send the first audio data to the first audio resource management object via the first audio input stream object, and thereby send the first audio data to the first application program via the first audio resource management object.
Correspondingly, when the electronic device detects an audio recording starting instruction for the second application program, a second audio resource management object corresponding to the second application program can be created, and the second audio resource management object is called to issue a second audio source type to the policy execution module, so that a second audio input stream object corresponding to the second application program is created through the policy execution module based on the second audio source type (namely, the second audio input stream is opened in the policy execution module); further, a second audio input stream object may be invoked, a second audio source type may be sent to the audio hardware abstraction layer, and a second target audio input stream object may be created through the audio hardware abstraction layer, thereby invoking the second target audio input stream object, determining a second audio acquisition function based on the second audio source type, and invoking the second audio acquisition function to acquire second audio data under the second audio source type. It should be appreciated that the electronic device may send second audio data for the second audio source type to the second application to enable data capture by the second application for the target audio input device.
When an audio recording start instruction for a first application program is detected, an embodiment of the invention determines the first audio source type of the first application program, the first audio source type being one audio source type in a set of audio source types. On this basis, a first audio acquisition function can be determined based on the first audio source type, with different audio source types corresponding to different audio acquisition functions, and the first audio acquisition function can be called to acquire first audio data under the first audio source type. The audio acquisition function corresponding to each audio source type in the set supports acquiring audio data under that audio source type, and the audio data under an audio source type is determined based on the initial audio data input by the target audio input device. The first audio data can then be sent to the first application program, so that the first application program captures data from the target audio input device. While the first application program is capturing data, the target audio input device also supports simultaneous data capture by a second application program whose second audio source type belongs to the set of audio source types and differs from the first audio source type. Embodiments of the invention therefore allow different application programs to capture data from the same audio input device at the same time, without requiring the application programs to be specially adapted to, or made compatible with, such simultaneous capture. In addition, the audio data required by each application program is obtained directly through its audio acquisition function, that is, the splitting for different audio source types is performed closer to the audio input device, which improves efficiency.
Based on the above description, the embodiments of the present invention also propose a more specific audio processing method, which can be executed by the above-mentioned electronic device (terminal or server); alternatively, the audio processing method may be performed by a terminal and a server together. For convenience of explanation, the following description will take the electronic device to execute the audio processing method as an example; referring to fig. 3, the audio processing method may include the following steps S301 to S306:
s301, when an audio recording starting instruction aiming at a first application program is detected, determining a first audio source type of the first application program, wherein the first audio source type is one audio source type in an audio source type set.
S302, determining a first audio acquisition function based on a first audio source type, wherein different audio source types correspond to different audio acquisition functions.
S303, when the first application program starts to read the audio data, whether the multi-stream processing thread is started or not is checked.
Wherein the multi-stream processing thread may also be referred to as a dual-stream processing thread when the set of audio source types comprises two audio source types, that is, when the same audio input device supports two applications for data capture simultaneously.
By way of example, as shown in fig. 4, when the set of audio source types includes two audio source types and the first application program is application 1, it may be determined that the dual-stream processing thread has not been started; when the first application program is application 2, it may be determined that the dual-stream processing thread has already been started. In the latter case, before the first application program starts reading audio data, another application (e.g., application 1) in the electronic device has already started reading audio data, so the dual-stream processing thread is already running, and application 1 may be the second application program.
It can be seen that, by checking whether the multi-stream processing thread has been started, embodiments of the invention avoid starting the multi-stream processing thread repeatedly and avoid opening the audio input device repeatedly.
S304, if the multi-stream processing thread is not started, starting the multi-stream processing thread and opening the target audio input device.
Based on the above, the electronic device may trigger the subsequent call to the first audio acquisition function to acquire the first audio data under the first audio source type; that is, after the multi-stream processing thread is started and the target audio input device is opened, the call to the first audio acquisition function can be triggered. The open states of the multi-stream processing thread and of the target audio input device stay consistent: when the multi-stream processing thread is running, the target audio input device is open, and when the multi-stream processing thread is not running, the target audio input device is closed. Correspondingly, if the multi-stream processing thread has already been started, the call to the first audio acquisition function to acquire the first audio data under the first audio source type can be triggered directly.
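A hedged sketch of the check in steps S303 and S304: a mutex-protected flag ensures the multi-stream processing thread is started, and the target audio input device opened, only once. pthread is standard; the function names are assumptions, and open_target_input_device / capture_thread_loop stand for the driver-node opening and thread body sketched after step S305 below.

```c
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t g_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_t       g_capture_thread;
static bool            g_thread_started = false;   /* multi-stream thread state */

/* Assumed hooks: open the target audio input device's driver node, and the
 * thread body that reads from it (both sketched later in this embodiment). */
int   open_target_input_device(void);
void *capture_thread_loop(void *arg);

/* Called when an application program starts reading audio data (S303/S304). */
int ensure_multi_stream_thread_started(void)
{
    int ret = 0;
    pthread_mutex_lock(&g_lock);
    if (!g_thread_started) {                 /* not started yet: start it once            */
        ret = open_target_input_device();    /* open the device together with the thread  */
        if (ret == 0 &&
            pthread_create(&g_capture_thread, NULL, capture_thread_loop, NULL) == 0)
            g_thread_started = true;
        else
            ret = -1;
    }
    pthread_mutex_unlock(&g_lock);
    return ret;   /* 0: thread (already) running and device open */
}
```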
Alternatively, the multi-stream processing thread may be located in an audio hardware abstraction layer; it should be noted that, because the audio input device cannot be opened multiple times at the same time, in the existing audio hardware abstraction layer, the same audio input device cannot be opened in two paths of different audio input streams to obtain audio data.
In the embodiment of the invention, after the multi-stream processing thread is created (i.e., started), the driver node of the target audio input device (i.e., its audio driver) can be opened so as to acquire initial audio data from the driver node; the audio driver is located in the Linux kernel of the electronic device.
S305, calling a first audio acquisition function to acquire first audio data under a first audio source type; the audio acquisition functions corresponding to the audio source types in the audio source type set support to acquire audio data under the corresponding audio source types, and the audio data under one audio source type is determined based on initial audio data input by the target audio input device.
The first audio data is acquired from the data cache of the multi-stream processing thread through the first audio acquisition function. In the embodiment of the invention, the electronic device may acquire the initial audio data from the audio driver of the target audio input device through the multi-stream processing thread, and determine the audio input stream processing manner corresponding to each of M specified audio source types, where the M specified audio source types are the audio source types, within the audio source type set, of each application program that has started audio recording for the target audio input device, and M is a positive integer. By way of example, assuming application 1 has started audio recording for the target audio input device, the M specified audio source types may include the audio source type of application 1; assuming application 1 and application 2 have both started audio recording for the target audio input device, their audio source types differ, and both belong to the set of audio source types, then the M specified audio source types may include the audio source type of application 1 and the audio source type of application 2.
Based on the above, the multi-stream processing thread can split the initial audio data according to the audio input stream processing manner corresponding to each specified audio source type, obtaining the audio data of the initial audio data under each specified audio source type, and store that audio data in the data cache of the multi-stream processing thread; the first audio data is the audio data of the initial audio data under the first audio source type.
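Continuing the same hypothetical naming, one possible body for the multi-stream processing thread is sketched below: it reads initial audio data from the driver node (here via tinyalsa's pcm_open / pcm_read, used only as an example of reaching the Linux audio driver; the patent does not name a specific interface), then lets every active specified audio source type derive and cache its own audio data from that shared read.

```c
#include <stdbool.h>
#include <stddef.h>
#include <tinyalsa/asoundlib.h>   /* pcm_open/pcm_read: one common way to reach the driver node */

#define PERIOD_BYTES 4096
#define MAX_SOURCES  2            /* e.g. MIC and VOICE_RECOGNITION */

static struct pcm   *g_pcm;       /* handle to the target audio input device */
static volatile bool g_running = true;

/* Per-source state: whether that source type currently has an open stream,
 * and how its audio input stream is derived from the initial data
 * (channel split, copy, or 3A processing) before being cached. */
struct source_slot {
    bool active;
    void (*process_and_store)(const void *initial, size_t bytes);
};
static struct source_slot g_sources[MAX_SOURCES];   /* filled when streams are opened */

/* Open the driver node of the target audio input device (sketch). */
int open_target_input_device(void)
{
    struct pcm_config cfg = {
        .channels = 2, .rate = 48000,
        .period_size = 1024, .period_count = 4,
        .format = PCM_FORMAT_S16_LE,
    };
    g_pcm = pcm_open(0 /*card*/, 0 /*device*/, PCM_IN, &cfg);
    return (g_pcm && pcm_is_ready(g_pcm)) ? 0 : -1;
}

/* Multi-stream processing thread: one read from the device feeds every
 * active audio source type, so the device itself is opened only once. */
void *capture_thread_loop(void *arg)
{
    static char initial[PERIOD_BYTES];       /* initial audio data from the driver */
    (void)arg;
    while (g_running) {
        if (pcm_read(g_pcm, initial, sizeof(initial)) != 0)
            continue;                        /* read error: try again              */
        for (int i = 0; i < MAX_SOURCES; i++)
            if (g_sources[i].active && g_sources[i].process_and_store)
                g_sources[i].process_and_store(initial, sizeof(initial));
    }
    return NULL;
}
```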
Then, correspondingly, the electronic device may call a first audio acquisition function to acquire first audio data for the first audio source type from the data cache.
In the embodiments of the present invention, the audio input stream processing manner corresponding to an audio source type includes at least one of the following: a specified-channel data separation mode, a copy mode, and an audio data processing mode, where the audio data processing mode includes at least one of an acoustic echo cancellation mode, a background noise suppression mode, and an automatic gain control mode. When splitting the initial audio data according to the audio input stream processing manner corresponding to each specified audio source type, if the processing manner corresponding to a specified audio source type is the specified-channel data separation mode, the electronic device determines the specified channel corresponding to that audio source type, separates the initial audio data by channel, obtains the specified channel data of the initial audio data under the specified channel, and uses that specified channel data as the audio data of the initial audio data under that specified audio source type; that is, the electronic device determines, through the multi-stream processing thread, the audio data of the initial audio data under that specified audio source type. Optionally, the specified channel corresponding to a specified audio source type may be a single channel or multiple channels, which is not limited in the embodiments of the present invention; in other words, the electronic device separates the initial audio data into the audio data of each channel, so as to obtain the specified channel data under the specified channel.
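For the specified-channel data separation mode, a minimal sketch, assuming 16-bit interleaved PCM from the target audio input device; the function name and frame layout are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Extract one specified channel from interleaved 16-bit PCM.
 * 'initial'      : interleaved frames as delivered by the target input device
 * 'total_samples': total number of 16-bit samples in 'initial'
 * 'num_channels' : channels per frame in the initial data
 * 'channel'      : the specified channel for this audio source type (0-based)
 * 'out'          : mono buffer receiving the specified channel data
 * Returns the number of mono samples written. */
size_t split_specified_channel(const int16_t *initial, size_t total_samples,
                               unsigned num_channels, unsigned channel,
                               int16_t *out)
{
    size_t frames = total_samples / num_channels;
    for (size_t f = 0; f < frames; f++)
        out[f] = initial[f * num_channels + channel];   /* keep only that channel */
    return frames;
}
```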
Correspondingly, if the audio input stream processing manner corresponding to a specified audio source type is the copy mode, the initial audio data itself is used as the audio data under that specified audio source type. In this case, the initial audio data may be copied, and the copied initial audio data is used as the audio data under that specified audio source type, thereby implementing the splitting.
Correspondingly, if the audio input stream processing manner corresponding to a specified audio source type is the audio data processing mode, the initial audio data is processed according to that audio data processing mode to obtain the audio data under that specified audio source type. For example, when the audio data processing mode includes the acoustic echo cancellation mode, acoustic echo cancellation may be performed on the initial audio data; when it includes the background noise suppression mode, background noise suppression may be performed; when it includes the automatic gain control mode, automatic gain control may be performed; and so on. The embodiments of the present invention are not limited in this respect. Optionally, when the audio data processing mode includes the acoustic echo cancellation mode, the background noise suppression mode, and the automatic gain control mode, the processing may be referred to as audio 3A data processing or 3A audio processing.
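The copy mode and the audio data processing mode can be sketched in the same style; the 3A stage below is deliberately a stub that only marks where acoustic echo cancellation, background noise suppression, and automatic gain control would be applied, since the patent does not tie the processing to any particular implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy mode: the initial audio data itself becomes the audio data for this
 * audio source type (duplicated so the shared initial buffer stays intact). */
void copy_mode(const int16_t *initial, size_t samples, int16_t *out)
{
    memcpy(out, initial, samples * sizeof(int16_t));
}

/* Audio data processing mode: apply whichever of AEC / noise suppression /
 * automatic gain control this source type asked for. The bodies are stubs;
 * a real implementation would call an echo-cancellation / NS / AGC component
 * here before the result is stored in that source's data cache. */
void audio_data_processing_mode(int16_t *buf, size_t samples,
                                int use_aec, int use_ns, int use_agc)
{
    (void)buf;
    (void)samples;
    if (use_aec) { /* acoustic echo cancellation would run here  */ }
    if (use_ns)  { /* background noise suppression would run here */ }
    if (use_agc) { /* automatic gain control would run here       */ }
}
```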
S306, the first audio data is sent to the first application program so as to achieve data capture of the target audio input device by the first application program; the target audio input device supports simultaneous data capture of a second application program under the data capture of the first application program, and a second audio source type of the second application program belongs to an audio source type set, wherein the second audio source type is different from the first audio source type.
It should be appreciated that the electronic device may obtain the audio data under the second audio source type (i.e., the second audio data) through the second audio acquisition function and send the second audio data to the second application program, so that the second application program captures data from the target audio input device. In summary, the way audio data is acquired in the embodiments of the invention stays consistent with the standard flow of the Android platform, so security is effectively preserved. In addition, the embodiments of the invention add, in the audio hardware abstraction layer, data processing for the same audio input device, such as multi-channel data splitting or audio 3A data processing, and distribute the split or processed audio data to the different application programs in the audio hardware abstraction layer according to the audio source type of each audio input stream (e.g., the first audio source type of the first audio input stream).
In the embodiment of the invention, the first audio data is obtained by the first audio acquisition function from the splitting performed by the multi-stream processing thread. On this basis, when an audio recording stop instruction for the first application program is detected, the electronic device may destroy the first audio resource management object corresponding to the first application program, i.e., the first audio resource management object of the first audio source type; close the first audio input stream of the first application program in the audio hardware abstraction layer, i.e., destroy the first audio input stream object and the first target audio input stream object; and stop the multi-stream processing thread from splitting for the first audio source type. It may then be checked whether the multi-stream processing thread is still splitting for any other source; if not, the target audio input device and the multi-stream processing thread are closed.
For example, as shown in fig. 5, taking the multi-stream processing thread as a dual-stream processing thread, suppose both application 1 and application 2 have started audio recording. If the first application program is application 1, then after application 1 stops recording it may be determined that the dual-stream processing thread is still splitting (it can still produce the audio data under the second audio source type), and no further action is taken. Correspondingly, if the first application program is application 2 and application 1 had already stopped recording before application 2, then after application 2 stops recording it may be determined that the dual-stream processing thread is no longer splitting, so the corresponding audio input device (e.g., the target audio input device) and the dual-stream processing thread can be closed. It can be seen that the dual-stream processing thread is created when the AudioRecord object of one audio source type is created, and is exited when the AudioRecord objects of both audio source types have been destroyed, which saves resources.
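Finally, a hedged sketch of the teardown path just described, continuing the earlier hypothetical names: closing a stream marks its source slot inactive, and only when no specified audio source type still needs splitting are the multi-stream processing thread and the target audio input device closed.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <tinyalsa/asoundlib.h>

/* Mirrors the globals of the earlier sketches (in a real HAL they would all
 * live in the same source file; g_running is set to true when the thread starts). */
#define MAX_SOURCES 2
struct source_slot { bool active; void (*process_and_store)(const void *, size_t); };
static struct source_slot g_sources[MAX_SOURCES];
static pthread_mutex_t    g_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_t          g_capture_thread;
static bool               g_thread_started = false;
static volatile bool      g_running = false;
static struct pcm        *g_pcm = NULL;

/* Called when an application program stops recording for the stream in 'slot'. */
void close_input_stream_for_source(int slot)
{
    pthread_mutex_lock(&g_lock);
    g_sources[slot].active = false;            /* stop splitting for this source type   */

    bool any_active = false;                   /* is any source type still splitting?   */
    for (int i = 0; i < MAX_SOURCES; i++)
        any_active = any_active || g_sources[i].active;

    if (!any_active && g_thread_started) {     /* last stream gone: tear everything down */
        g_running = false;                     /* let the capture loop exit              */
        pthread_mutex_unlock(&g_lock);
        pthread_join(g_capture_thread, NULL);  /* stop the multi-stream thread           */
        pthread_mutex_lock(&g_lock);
        if (g_pcm)
            pcm_close(g_pcm);                  /* close the target audio input device    */
        g_pcm = NULL;
        g_thread_started = false;
    }
    pthread_mutex_unlock(&g_lock);
}
```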
In the embodiment of the invention, when an audio recording start instruction for a first application program is detected, the first audio source type of the first application program is determined, the first audio source type being one audio source type in a set of audio source types; a first audio acquisition function is determined based on the first audio source type, with different audio source types corresponding to different audio acquisition functions; when the first application program starts to read audio data, it is checked whether the multi-stream processing thread has been started; and if it has not, the multi-stream processing thread is started and the target audio input device is opened, which triggers the call to the first audio acquisition function to acquire the first audio data under the first audio source type. On this basis, the first audio acquisition function can be called to acquire the first audio data under the first audio source type; the audio acquisition function corresponding to each audio source type in the set supports acquiring audio data under that audio source type, and the audio data under an audio source type is determined based on the initial audio data input by the target audio input device. Further, the first audio data can be sent to the first application program, so that the first application program captures data from the target audio input device; while the first application program is capturing data, the target audio input device also supports simultaneous data capture by a second application program whose second audio source type belongs to the set of audio source types and differs from the first audio source type. The embodiments of the invention therefore allow different application programs to obtain audio data from the same audio input device at the same time, resolving the conflict that arises when different application programs need to use the same audio input device and supporting richer service scenarios. Moreover, an application program obtains its audio data simply by specifying an audio source type through the standard Android service, without being specially adapted to or made compatible with simultaneous capture from the same audio input device, so simultaneous data capture by different application programs on the same audio input device is conveniently achieved.
Based on the above description of related embodiments of the audio processing method, embodiments of the present invention also provide an audio processing apparatus, which may be a computer program (including program code) running in an electronic device; as shown in fig. 6, the audio processing apparatus may include a processing unit 601 and an acquisition unit 602. The audio processing device may perform the audio processing method shown in fig. 1 or 3, i.e. the audio processing device may operate the above units:
the processing unit 601 is configured to determine, when an audio recording start instruction for a first application program is detected, a first audio source type of the first application program, where the first audio source type is one audio source type in a set of audio source types;
the processing unit 601 is further configured to determine a first audio acquisition function based on the first audio source type, where different audio source types correspond to different audio acquisition functions;
an obtaining unit 602, configured to call the first audio obtaining function, and obtain first audio data under the first audio source type; the audio acquisition functions corresponding to all the audio source types in the audio source type set support to acquire audio data under the corresponding audio source types, and the audio data under one audio source type is determined based on initial audio data input by the target audio input device;
The processing unit 601 is further configured to send the first audio data to the first application program, so as to enable the first application program to capture data of the target audio input device; the target audio input device supports simultaneous data capture of a second application program under the data capture of the first application program, and a second audio source type of the second application program belongs to the audio source type set, wherein the second audio source type is different from the first audio source type.
In one embodiment, the processing unit 601 may be further configured to:
creating a first audio resource management object corresponding to the first application program, wherein one audio resource management object is used for supporting the corresponding application program to record audio from the audio input equipment, and any audio resource management object is positioned in an application program framework layer;
and calling the first audio resource management object and issuing the first audio source type.
In another embodiment, when the first audio resource management object is called and the first audio source type is issued, the processing unit 601 may be specifically configured to:
invoking the first audio resource management object, sending the first audio source type to a strategy execution module, and creating a first audio input stream object corresponding to the first application program based on the first audio source type through the strategy execution module;
And calling the first audio input stream object, and sending the first audio source type to an audio hardware abstraction layer so as to trigger execution of the first audio acquisition function based on the first audio source type through the audio hardware abstraction layer.
In another embodiment, the acquisition unit 602 may be further configured to:
checking whether a multi-stream processing thread is started when the first application program starts to read audio data;
and if the multi-stream processing thread has not been started, starting the multi-stream processing thread and opening the target audio input device, so as to trigger the call to the first audio acquisition function to acquire the first audio data under the first audio source type.
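A minimal sketch of this check-then-start behaviour is given below, assuming a single shared multi-stream processing thread guarded by a lock; MultiStreamThreadManager and its parameters are placeholder names, not the disclosed implementation.

```java
// Hypothetical lazy start of the multi-stream processing thread.
public class MultiStreamThreadManager {

    private Thread multiStreamThread;            // the single multi-stream processing thread
    private final Object lock = new Object();

    // Called when an application starts to read audio data.
    public void ensureStarted(Runnable multiStreamLoop, Runnable openTargetInputDevice) {
        synchronized (lock) {
            // Check whether the multi-stream processing thread is already started.
            if (multiStreamThread != null && multiStreamThread.isAlive()) {
                return;
            }
            // Not started: open the target audio input device first, then start
            // the thread so subsequent acquisition calls can be served.
            openTargetInputDevice.run();
            multiStreamThread = new Thread(multiStreamLoop, "multi-stream-processing");
            multiStreamThread.start();
        }
    }
}
```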
In another embodiment, the first audio data is acquired from the data cache of the multi-stream processing thread through the first audio acquisition function, and the acquisition unit 602 is further configured to:
acquiring the initial audio data from an audio driver of the target audio input device through the multi-stream processing thread, and determining an audio input stream processing mode corresponding to each specified audio source type among M specified audio source types, where the M specified audio source types are the audio source types, in the audio source type set, of all application programs that have started audio recording for the target audio input device, and M is a positive integer;
The processing unit 601 may further be configured to:
splitting the initial audio data according to the audio input stream processing modes respectively corresponding to the specified audio source types, to obtain audio data of the initial audio data under each specified audio source type;
storing the audio data of the initial audio data under each specified audio source type into the data cache of the multi-stream processing thread; where the first audio data is the audio data of the initial audio data under the first audio source type.
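The capture loop of the multi-stream processing thread could be sketched as follows, with the driver read, the per-source-type splitting, and the per-source-type data caches reduced to placeholders (DriverReader, Splitter, and a map of queues are assumptions made for the example); a concurrent map is assumed so source types can be added or removed while the loop runs.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;

// Hypothetical core loop of the multi-stream processing thread: read initial
// audio data from the audio driver, derive audio data for each specified audio
// source type, and store it into that source type's data cache.
public class MultiStreamLoop implements Runnable {

    public interface DriverReader { byte[] readFromDriver(); }             // initial audio data
    public interface Splitter { byte[] splitFor(int sourceType, byte[] initial); }

    private final DriverReader driver;
    private final Splitter splitter;
    private final Map<Integer, BlockingQueue<byte[]>> cachePerSourceType;
    private volatile boolean running = true;

    public MultiStreamLoop(DriverReader driver, Splitter splitter,
                           Map<Integer, BlockingQueue<byte[]>> cachePerSourceType) {
        this.driver = driver;
        this.splitter = splitter;
        this.cachePerSourceType = cachePerSourceType;
    }

    @Override
    public void run() {
        while (running) {
            byte[] initial = driver.readFromDriver();
            // For each specified audio source type with an active recording,
            // derive its audio data and put it into that type's data cache,
            // where the matching audio acquisition function can fetch it.
            for (Map.Entry<Integer, BlockingQueue<byte[]>> entry : cachePerSourceType.entrySet()) {
                byte[] perType = splitter.splitFor(entry.getKey(), initial);
                entry.getValue().offer(perType);
            }
        }
    }

    public void stop() { running = false; }
}
```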
In another embodiment, the audio input stream processing mode corresponding to one audio source type includes at least one of the following: a specified channel data separation mode, a copying mode, and an audio data processing mode, where the audio data processing mode includes at least one of the following: an acoustic echo cancellation mode, a background noise suppression mode, and an automatic gain control mode; when splitting the initial audio data according to the audio input stream processing modes respectively corresponding to the specified audio source types to obtain the audio data of the initial audio data under each specified audio source type, the processing unit 601 may be specifically configured to:
for any one of the M specified audio source types, if the audio input stream processing mode corresponding to that specified audio source type is the specified channel data separation mode, determining a specified channel corresponding to that specified audio source type, performing separation and splitting processing on the initial audio data according to the specified channel to obtain specified channel data of the initial audio data under the specified channel, and taking the specified channel data as the audio data of the initial audio data under that specified audio source type; or,
if the audio input stream processing mode corresponding to that specified audio source type is the copying mode, taking the initial audio data as the audio data of the initial audio data under that specified audio source type; or,
if the audio input stream processing mode corresponding to that specified audio source type is the audio data processing mode, performing data processing on the initial audio data according to the audio data processing mode to obtain the audio data of the initial audio data under that specified audio source type.
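One way to dispatch among the three processing modes is sketched below. The channel extraction assumes interleaved 16-bit PCM, and the acoustic echo cancellation / background noise suppression / automatic gain control branch is a stub, since the disclosure does not specify the concrete algorithms; all names are illustrative.

```java
// Hypothetical per-source-type splitting following the three modes above.
// Assumes interleaved 16-bit PCM; the audio-processing branch is a stub.
public class SplitModes {

    public enum Mode { SPECIFIED_CHANNEL_SEPARATION, COPY, AUDIO_PROCESSING }

    public static byte[] split(Mode mode, byte[] initial, int channelCount, int channelIndex) {
        switch (mode) {
            case SPECIFIED_CHANNEL_SEPARATION:
                // Separate the specified channel's data out of the initial audio data.
                return extractChannel(initial, channelCount, channelIndex);
            case COPY:
                // Copying mode: the initial audio data itself is the per-type data.
                return initial.clone();
            case AUDIO_PROCESSING:
                // Placeholder for acoustic echo cancellation, background noise
                // suppression and/or automatic gain control.
                return applyAudioProcessing(initial);
            default:
                throw new IllegalArgumentException("Unknown mode: " + mode);
        }
    }

    // Pull one channel out of interleaved 16-bit PCM frames.
    private static byte[] extractChannel(byte[] interleaved, int channelCount, int channelIndex) {
        int bytesPerSample = 2;
        int frameBytes = channelCount * bytesPerSample;
        int frames = interleaved.length / frameBytes;
        byte[] mono = new byte[frames * bytesPerSample];
        for (int f = 0; f < frames; f++) {
            int src = f * frameBytes + channelIndex * bytesPerSample;
            int dst = f * bytesPerSample;
            mono[dst] = interleaved[src];
            mono[dst + 1] = interleaved[src + 1];
        }
        return mono;
    }

    private static byte[] applyAudioProcessing(byte[] input) {
        return input;   // stub: real AEC/NS/AGC would transform the samples here
    }
}
```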
In another embodiment, the first audio data is obtained through the first audio acquisition function based on splitting processing performed by a multi-stream processing thread, and the processing unit 601 is further configured to:
destroying a first audio resource management object corresponding to the first application program when an audio recording stop instruction for the first application program is detected;
closing a first audio input stream of the first application program in an audio hardware abstraction layer, and stopping the splitting processing performed by the multi-stream processing thread for the first audio source type;
and checking whether the multi-stream processing thread still performs splitting processing, and if the multi-stream processing thread no longer performs any splitting processing, closing the target audio input device and the multi-stream processing thread.
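Under the same assumptions as the earlier sketches, the stop path could look like the following: splitting for the departing audio source type is removed first, and the target audio input device and the multi-stream processing thread are closed only when no splitting processing remains. All names are placeholders.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;

// Hypothetical stop path: remove the departing audio source type from the
// splitting table, and close the device and the multi-stream processing
// thread only when no source type is being split any more.
public class StopRecordingSketch {

    private final Map<Integer, BlockingQueue<byte[]>> cachePerSourceType;
    private final Runnable stopMultiStreamThread;
    private final Runnable closeTargetInputDevice;

    public StopRecordingSketch(Map<Integer, BlockingQueue<byte[]>> cachePerSourceType,
                               Runnable stopMultiStreamThread,
                               Runnable closeTargetInputDevice) {
        this.cachePerSourceType = cachePerSourceType;
        this.stopMultiStreamThread = stopMultiStreamThread;
        this.closeTargetInputDevice = closeTargetInputDevice;
    }

    // Called after an audio recording stop instruction for an application has
    // been detected and its resource management object and input stream released.
    public synchronized void onStopRecording(int audioSourceType) {
        // Stop the splitting processing for this audio source type.
        cachePerSourceType.remove(audioSourceType);

        // Check whether any splitting processing remains; if not, close the
        // target audio input device and the multi-stream processing thread.
        if (cachePerSourceType.isEmpty()) {
            stopMultiStreamThread.run();
            closeTargetInputDevice.run();
        }
    }
}
```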
According to one embodiment of the invention, the steps involved in the method shown in fig. 1 or 3 may be performed by the units in the audio processing device shown in fig. 6. For example, steps S101, S102, and S104 shown in fig. 1 may each be performed by the processing unit 601 shown in fig. 6, and step S103 may be performed by the acquisition unit 602 shown in fig. 6. As another example, steps S301, S302, and S306 shown in fig. 3 may each be performed by the processing unit 601 shown in fig. 6, steps S303 to S305 may each be performed by the acquisition unit 602 shown in fig. 6, and so on.
According to another embodiment of the present invention, the units in the audio processing apparatus shown in fig. 6 may be individually or entirely combined into one or several other units, or one or more of these units may be further split into a plurality of units with smaller functions, which can achieve the same operation without affecting the technical effects of the embodiments of the present invention. The above units are divided based on logical functions; in practical applications, the functions of one unit may be implemented by a plurality of units, or the functions of a plurality of units may be implemented by one unit. In other embodiments of the present invention, the audio processing apparatus may also include other units, and in practical applications these functions may be realized with the assistance of, or through the cooperation of, a plurality of other units.
According to another embodiment of the present invention, the audio processing apparatus shown in fig. 6 may be constructed, and the audio processing method of the embodiment of the present invention implemented, by running a computer program (including program code) capable of executing the steps involved in the methods shown in fig. 1 or fig. 3 on a general-purpose electronic device, such as a computer, that includes processing elements such as a central processing unit (CPU) and storage elements such as a random access memory (RAM) and a read-only memory (ROM). The computer program may be recorded on, for example, a computer storage medium, and loaded into and run in the above-described electronic device through the computer storage medium.
The embodiment of the invention can, when an audio recording start instruction for a first application program is detected, determine a first audio source type of the first application program, where the first audio source type is one audio source type in an audio source type set. On this basis, a first audio acquisition function may be determined based on the first audio source type, with different audio source types corresponding to different audio acquisition functions, and the first audio acquisition function may be called to acquire first audio data under the first audio source type; the audio acquisition function corresponding to each audio source type in the audio source type set supports acquiring audio data under that audio source type, and the audio data under one audio source type is determined based on initial audio data input by the target audio input device. The first audio data can then be sent to the first application program so that the first application program captures data of the target audio input device; the target audio input device supports simultaneous data capture by a second application program while the first application program is capturing data, and a second audio source type of the second application program belongs to the audio source type set, where the second audio source type is different from the first audio source type. Therefore, the embodiment of the invention can conveniently achieve simultaneous data capture of the same audio input device by different application programs without the application programs being additionally adapted to, or made compatible with, a function for simultaneous data capture of the same audio input device. In addition, the audio data required by an application program can be acquired directly through the corresponding audio acquisition function, which means that the splitting processing under different audio source types can be performed closer to the audio input device, thereby improving efficiency.
Based on the description of the method embodiments and the apparatus embodiments, an exemplary embodiment of the present invention further provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor. The memory stores a computer program executable by the at least one processor, and the computer program, when executed by the at least one processor, causes the electronic device to perform a method according to an embodiment of the invention.
The exemplary embodiments of the present invention also provide a non-transitory computer readable storage medium storing a computer program, wherein the computer program, when executed by a processor of a computer, is for causing the computer to perform a method according to an embodiment of the present invention.
The exemplary embodiments of the invention also provide a computer program product comprising a computer program, wherein the computer program, when being executed by a processor of a computer, is for causing the computer to perform a method according to an embodiment of the invention.
Referring to fig. 7, a block diagram of an electronic device 700, which may be a server or a client of the present invention, will now be described; it is an example of a hardware device that may be applied to aspects of the present invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only, and are not meant to limit implementations of the invention described and/or claimed herein.
As shown in fig. 7, the electronic device 700 includes a computing unit 701 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 702 or a computer program loaded from a storage unit 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data required for the operation of the electronic device 700 may also be stored. The computing unit 701, the ROM 702, and the RAM 703 are connected to each other through a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.
Various components in the electronic device 700 are connected to the I/O interface 705, including: an input unit 706, an output unit 707, a storage unit 708, and a communication unit 709. The input unit 706 may be any type of device capable of inputting information to the electronic device 700; it may receive input numeric or character information and generate key signal inputs related to user settings and/or function control of the electronic device. The output unit 707 may be any type of device capable of presenting information and may include, but is not limited to, a display, speakers, video/audio output terminals, vibrators, and/or printers. The storage unit 708 may include, but is not limited to, magnetic disks and optical disks. The communication unit 709 allows the electronic device 700 to exchange information/data with other devices through computer networks, such as the Internet, and/or various telecommunications networks, and may include, but is not limited to, modems, network cards, infrared communication devices, wireless communication transceivers and/or chipsets, such as Bluetooth™ devices, WiFi devices, WiMax devices, cellular communication devices, and/or the like.
The computing unit 701 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 701 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 701 performs the various methods and processes described above. For example, in some embodiments, the audio processing method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 708. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 700 via the ROM 702 and/or the communication unit 709. In some embodiments, the computing unit 701 may be configured to perform the audio processing method by any other suitable means (e.g., by means of firmware).
Program code for carrying out the methods of the present invention may be written in any combination of one or more programming languages. Such program code may be provided to a processor or controller of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of the present invention, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
It is also to be understood that the foregoing is merely illustrative of the present invention and is not to be construed as limiting the scope of the invention, which is defined by the appended claims.

Claims (10)

1. An audio processing method, comprising:
when an audio recording start instruction for a first application program is detected, determining a first audio source type of the first application program, wherein the first audio source type is one audio source type in an audio source type set;
determining a first audio acquisition function based on the first audio source type, wherein different audio source types correspond to different audio acquisition functions;
calling the first audio acquisition function to acquire first audio data under the first audio source type; wherein the audio acquisition function corresponding to each audio source type in the audio source type set supports acquiring audio data under the corresponding audio source type, and the audio data under one audio source type is determined based on initial audio data input by a target audio input device;
sending the first audio data to the first application program, so as to enable the first application program to capture data of the target audio input device; wherein the target audio input device supports simultaneous data capture by a second application program while the first application program is capturing data, and a second audio source type of the second application program belongs to the audio source type set, the second audio source type being different from the first audio source type.
2. The method according to claim 1, wherein the method further comprises:
creating a first audio resource management object corresponding to the first application program, wherein an audio resource management object is used to support its corresponding application program in recording audio from an audio input device, and any audio resource management object is located in an application framework layer;
and calling the first audio resource management object and issuing the first audio source type.
3. The method of claim 2, wherein the invoking the first audio resource management object, issuing the first audio source type, comprises:
invoking the first audio resource management object, sending the first audio source type to a strategy execution module, and creating a first audio input stream object corresponding to the first application program based on the first audio source type through the strategy execution module;
and calling the first audio input stream object, and sending the first audio source type to an audio hardware abstraction layer, so as to trigger execution of the first audio acquisition function based on the first audio source type through the audio hardware abstraction layer.
4. A method according to any one of claims 1-3, wherein the method further comprises:
checking whether a multi-stream processing thread is started when the first application program starts to read audio data;
and if the multi-stream processing thread has not been started, starting the multi-stream processing thread and opening the target audio input device, so as to trigger the call to the first audio acquisition function to acquire the first audio data under the first audio source type.
5. The method of claim 4, wherein the first audio data is acquired from a data cache of the multi-stream processing thread through the first audio acquisition function, the method further comprising:
acquiring the initial audio data from an audio driver of the target audio input device through the multi-stream processing thread, and determining an audio input stream processing mode corresponding to each specified audio source type among M specified audio source types, wherein the M specified audio source types are the audio source types, in the audio source type set, of all application programs that have started audio recording for the target audio input device, and M is a positive integer;
splitting the initial audio data according to the audio input stream processing modes respectively corresponding to the specified audio source types, to obtain audio data of the initial audio data under each specified audio source type;
storing the audio data of the initial audio data under each specified audio source type into the data cache of the multi-stream processing thread; wherein the first audio data is the audio data of the initial audio data under the first audio source type.
6. The method of claim 5, wherein the audio input stream processing mode corresponding to one audio source type comprises at least one of the following: a specified channel data separation mode, a copying mode, and an audio data processing mode, wherein the audio data processing mode comprises at least one of the following: an acoustic echo cancellation mode, a background noise suppression mode, and an automatic gain control mode; and the splitting of the initial audio data according to the audio input stream processing modes respectively corresponding to the specified audio source types to obtain the audio data of the initial audio data under each specified audio source type comprises:
for any one of the M specified audio source types, if the audio input stream processing mode corresponding to that specified audio source type is the specified channel data separation mode, determining a specified channel corresponding to that specified audio source type, performing separation and splitting processing on the initial audio data according to the specified channel to obtain specified channel data of the initial audio data under the specified channel, and taking the specified channel data as the audio data of the initial audio data under that specified audio source type; or,
if the audio input stream processing mode corresponding to that specified audio source type is the copying mode, taking the initial audio data as the audio data of the initial audio data under that specified audio source type; or,
if the audio input stream processing mode corresponding to that specified audio source type is the audio data processing mode, performing data processing on the initial audio data according to the audio data processing mode to obtain the audio data of the initial audio data under that specified audio source type.
7. A method according to any one of claims 1-3, wherein the first audio data is obtained through the first audio acquisition function based on splitting processing performed by a multi-stream processing thread, the method further comprising:
destroying a first audio resource management object corresponding to the first application program when an audio recording stop instruction for the first application program is detected;
closing a first audio input stream of the first application program in an audio hardware abstraction layer, and stopping the splitting processing performed by the multi-stream processing thread for the first audio source type;
and checking whether the multi-stream processing thread still performs splitting processing, and if the multi-stream processing thread no longer performs any splitting processing, closing the target audio input device and the multi-stream processing thread.
8. An audio processing apparatus, the apparatus comprising:
a processing unit, configured to determine, when an audio recording start instruction for a first application program is detected, a first audio source type of the first application program, wherein the first audio source type is one audio source type in an audio source type set;
the processing unit is further configured to determine a first audio acquisition function based on the first audio source type, wherein different audio source types correspond to different audio acquisition functions;
an acquisition unit, configured to call the first audio acquisition function and acquire first audio data under the first audio source type; wherein the audio acquisition function corresponding to each audio source type in the audio source type set supports acquiring audio data under the corresponding audio source type, and the audio data under one audio source type is determined based on initial audio data input by a target audio input device;
the processing unit is further configured to send the first audio data to the first application program, so as to enable the first application program to capture data of the target audio input device; wherein the target audio input device supports simultaneous data capture by a second application program while the first application program is capturing data, and a second audio source type of the second application program belongs to the audio source type set, the second audio source type being different from the first audio source type.
9. An electronic device, comprising:
a processor; and
a memory in which a program is stored,
wherein the program comprises instructions which, when executed by the processor, cause the processor to perform the method according to any of claims 1-7.
10. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-7.
CN202311833039.7A 2023-12-28 2023-12-28 Audio processing method and device, storage medium and electronic equipment Pending CN117472321A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311833039.7A CN117472321A (en) 2023-12-28 2023-12-28 Audio processing method and device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311833039.7A CN117472321A (en) 2023-12-28 2023-12-28 Audio processing method and device, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN117472321A true CN117472321A (en) 2024-01-30

Family

ID=89624242

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311833039.7A Pending CN117472321A (en) 2023-12-28 2023-12-28 Audio processing method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN117472321A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7096187B1 (en) * 2002-07-23 2006-08-22 Harris Scott C Compressed audio information
US20140052438A1 (en) * 2012-08-20 2014-02-20 Microsoft Corporation Managing audio capture for audio applications
CN109976698A (en) * 2017-12-28 2019-07-05 深圳市优必选科技有限公司 Obtain method and device, the equipment, computer readable storage medium of audio data
KR20230086561A (en) * 2021-12-08 2023-06-15 주식회사 텔레칩스 Application loaded with path change software, and method for changing audio stream output path of android audio system using the same
CN116709112A (en) * 2022-02-24 2023-09-05 比亚迪股份有限公司 Audio data processing method, system, data processing device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination