CN117389506A - Audio data processing method, device and system, electronic equipment and storage medium

Info

Publication number
CN117389506A
Authority
CN
China
Prior art keywords
audio
target application
target
audio data
channel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311412427.8A
Other languages
Chinese (zh)
Inventor
张利群
吉超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
iFlytek Co Ltd
Original Assignee
iFlytek Co Ltd
Application filed by iFlytek Co Ltd
Priority to CN202311412427.8A
Publication of CN117389506A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16 Sound input; Sound output
    • G06F3/162 Interface to dedicated audio devices, e.g. audio drivers, interface to CODECs

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Stereophonic System (AREA)

Abstract

The application provides an audio data processing method, an audio data processing device, electronic equipment and a storage medium, wherein the audio data processing method comprises the following steps: acquiring multi-channel audio data and determining the number of audio channels of the audio data required by a target application; under the condition that the number of the audio channels is smaller than the preset number of channels, processing the multi-channel audio data according to the audio use scene of the target application to obtain first target audio conforming to the audio use scene, and sending the first target audio to the target application; and under the condition that the number of the audio channels is not smaller than the preset number of channels, sending the multi-channel audio data to the target application so that the target application processes the multi-channel audio data to obtain second target audio which accords with the audio use scene of the target application.

Description

Audio data processing method, device and system, electronic equipment and storage medium
Technical Field
The present invention relates to the field of audio processing, and in particular, to a method, an apparatus, a system, an electronic device, and a storage medium for processing audio data.
Background
Recording is a basic function of equipment with an audio signal module, and multi-channel recording based on a microphone array formed by a plurality of microphones is a common way to improve recording quality.
To improve recording quality, the prior art generally optimizes the audio for the specific scene in which the recorded audio is used, thereby improving the quality of the recorded audio in that scene. However, this approach suits only a single usage scene; it cannot cover the multiple usage scenes of audio and cannot satisfy users' demand for high-quality recording across scenes.
Therefore, how to process audio data to meet the requirement of users for multi-scene high-quality audio becomes a technical problem that needs to be solved by those skilled in the art.
Disclosure of Invention
Based on the above state of the art, the present application proposes an audio data processing method, device, system, electronic device and storage medium, so as to meet the requirements of users for multi-scene high-quality audio.
In order to achieve the technical purpose, the application provides the following technical scheme:
An audio data processing method, comprising:
acquiring multi-channel audio data and determining the number of audio channels of the audio data required by a target application;
under the condition that the number of the audio channels is smaller than the preset number of channels, processing the multi-channel audio data according to the audio use scene of the target application to obtain first target audio conforming to the audio use scene, and sending the first target audio to the target application;
and under the condition that the number of the audio channels is not smaller than the preset number of channels, sending the multi-channel audio data to the target application so that the target application processes the multi-channel audio data to obtain second target audio which accords with the audio use scene of the target application.
Optionally, the audio usage scenario of the target application is determined by:
acquiring audio use conditions of the target application, wherein the audio use conditions of the target application comprise the number of audio channels of audio data required by the target application and a recording source of audio required by the target application;
and determining the audio use scene of the target application according to the audio use condition of the target application.
Optionally, the processing the multi-channel audio data according to the audio usage scenario of the target application to obtain a first target audio conforming to the audio usage scenario includes:
determining an audio processing algorithm corresponding to the audio use scene according to the audio use scene;
and performing data processing on the multi-channel audio data by utilizing an audio processing algorithm corresponding to the audio use scene to obtain first target audio conforming to the audio use scene.
Optionally, the determining an audio processing algorithm corresponding to the audio usage scenario according to the audio usage scenario includes:
when the audio use scene is audio recording, determining that the audio processing algorithm is an audio noise reduction algorithm and an audio enhancement algorithm;
and,
when the audio use scene is call recording, determining that the audio processing algorithm is an echo cancellation algorithm.
Optionally, the target application performs data processing on the multi-channel audio data in the following manner to obtain a second target audio which accords with an audio usage scene of the target application:
calling an audio processing algorithm corresponding to the audio use scene of the target application, and performing data processing on the multi-channel audio data to obtain second target audio conforming to the audio use scene of the target application;
or,
and selecting an algorithm corresponding to the audio use scene of the target application from the audio processing algorithms stored in the target application, and performing data processing on the multi-channel audio data to obtain second target audio conforming to the audio use scene of the target application.
Optionally, the usage scenario of the second target audio includes: at least one of voice wake-up recognition, voice evaluation.
An audio data processing apparatus comprising:
a first unit for acquiring multi-channel audio data and determining the number of audio channels of the audio data required by the target application;
the second unit is used for processing the multi-channel audio data according to the audio use scene of the target application to obtain first target audio conforming to the audio use scene and sending the first target audio to the target application under the condition that the number of the audio channels is smaller than the preset number of channels;
and the third unit is used for sending the multi-channel audio data to the target application under the condition that the number of the audio channels is not smaller than the preset number of channels, so that the target application processes the multi-channel audio data to obtain second target audio which accords with the audio use scene of the target application.
An audio data processing system, comprising:
the audio acquisition unit is used for acquiring multi-channel audio data;
the audio processing unit is used for acquiring multi-channel audio data and determining the number of audio channels of the audio data required by the target application; under the condition that the number of the audio channels is smaller than the preset number of channels, processing the multi-channel audio data according to the audio use scene of the target application to obtain first target audio conforming to the audio use scene, and sending the first target audio to the target application; and under the condition that the number of the audio channels is not smaller than the preset number of channels, sending the multi-channel audio data to the target application so that the target application processes the multi-channel audio data to obtain second target audio which accords with the audio use scene of the target application.
An electronic device, comprising:
a processor and a memory;
wherein the memory is used for storing a computer program; the processor is configured to implement the above-mentioned audio data processing method by running a computer program stored in the memory.
A computer storage medium storing a computer program which, when executed, implements the above-described audio data processing method.
The application provides an audio data processing method, an apparatus, a system, an electronic device and a storage medium, wherein the audio data processing method comprises the following steps: acquiring multi-channel audio data and determining the number of audio channels of the audio data required by a target application; under the condition that the number of the audio channels is smaller than the preset number of channels, processing the multi-channel audio data according to the audio use scene of the target application to obtain first target audio conforming to the audio use scene, and sending the first target audio to the target application; and under the condition that the number of the audio channels is not smaller than the preset number of channels, sending the multi-channel audio data to the target application so that the target application processes the multi-channel audio data to obtain second target audio which accords with the audio use scene of the target application.
According to the audio data processing method, the multi-channel audio is subjected to data processing by combining the audio use scenes of the target application, so that the target audio meeting the requirements of the target application is obtained, the use requirements of users on the audio in different scenes are met, and the use quality of the audio is improved.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present application, and that other drawings may be obtained according to the provided drawings without inventive effort to a person skilled in the art.
Fig. 1 is a flowchart of an audio data processing method provided in an embodiment of the present application;
Fig. 2 is a flowchart of a first multi-channel audio data processing method provided in an embodiment of the present application;
Fig. 3 is a flowchart of a second multi-channel audio data processing method provided in an embodiment of the present application;
Fig. 4 is a flowchart of the recording system processing multi-channel audio data, provided in an embodiment of the present application;
Fig. 5 is a block diagram of an audio data processing device provided in an embodiment of the present application;
Fig. 6 is a block diagram of an electronic device provided in an embodiment of the present application.
Detailed Description
Recording is a basic function of equipment with an audio signal module, and multi-channel recording based on a microphone array formed by a plurality of microphones is a common way to improve recording quality.
To improve recording quality, the prior art generally optimizes the audio for the specific scene in which the recorded audio is used, thereby improving the quality of the recorded audio in that scene. However, this approach suits only a single usage scene; it cannot cover the multiple usage scenes of audio and cannot satisfy users' demand for high-quality recording across scenes.
Therefore, the application provides a processing method, device, system, electronic equipment and storage medium for audio data, so as to meet the requirements of users for multi-scene high-quality audio.
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. The embodiments described are evidently only some, not all, of the embodiments of the present application. All other embodiments that a person of ordinary skill in the art can obtain from the present disclosure without inventive effort fall within the scope of the present disclosure.
Exemplary method
The embodiment of the application firstly provides a processing method of audio data, which is characterized in that data processing is carried out on multichannel audio by combining the audio use scene of a target application, so that the target audio meeting the requirements of the target application is obtained, the use requirements of users on the audio in different scenes are met, and the use quality of the audio is improved.
In the embodiment of the application, the implementation subject of the audio data processing method may be a notebook computer, a tablet computer, a desktop computer, a mobile device (e.g., a learning machine, a mobile phone, a host, an intelligent robot, a portable music player), or any other type of user terminal or intelligent device, or a combination of any two or more of these data processing devices.
Referring to fig. 1, the method includes:
S101, acquiring multi-channel audio data and determining the number of audio channels of the audio data required by a target application.
The multi-channel audio data refers to an audio signal which simultaneously contains a plurality of independent audio channels in the recording or audio data processing process, and each audio channel can contain independent audio information.
The target application may be understood as an application program. In an optional embodiment of the present application, the target application is an application program installed in the recording system. One or more application programs may be installed in the recording system, each corresponding to one recording scene; for example, the recording scene may be an audio recording scene, a call recording scene, and so on.
In another optional embodiment of the present application, the target application and the recording system may also reside on different devices connected through a data cable, a wireless network, or the like. For example, the target application may be an application program installed on a mobile phone, such as an audio recording program or a voice wake-up program, while a tablet computer is used to collect the audio information.
Furthermore, the target application can send the number of audio channels of the audio data it requires to the recording system that collects the audio, so that the audio hardware abstraction layer (Audio HAL) in the recording system learns in time the number of audio channels required by the target application. This makes it convenient to process the collected multi-channel audio into audio with the required number of channels, which in turn improves the quality of the audio obtained by the target application.
S102, under the condition that the number of the audio channels is smaller than the preset number of channels, processing the multi-channel audio data according to the audio use scene of the target application to obtain first target audio conforming to the audio use scene, and sending the first target audio to the target application.
In an alternative embodiment of the present application, the target applications can be categorized into two categories according to the number of channels required for recording: single-channel or double-channel recording and multi-channel recording, wherein the single-channel or double-channel recording comprises audio use scenes such as audio recording, voice call recording, camera recording and the like; the multi-channel recording comprises audio use scenes such as voice awakening, voice evaluation and the like. The preset channel number can be set to 3 according to the category of the recording, namely, under the condition that the audio channel number is smaller than 3, the recording is single-channel recording or double-channel recording, otherwise, the recording is multi-channel recording.
It can be understood that setting the preset channel number to 3 is only one optional implementation of the present application. In practice, the preset channel number may be set according to the actual application scenario of the audio data processing method, may be set by a user according to actual requirements, or may be a number of audio channels automatically determined by the audio processing system according to the scenario and task, which is not limited in this application. For example, in application scenarios where recording is divided only into single-channel recording and multi-channel recording, the preset number of channels may be 2.
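The branching between steps S102 and S103 can be illustrated with a minimal Python sketch. It is not the patented implementation: the function name `route_audio`, the `Frames` layout, and the returned labels are hypothetical, and the scene-specific processing is passed in as a callable rather than implemented.

```python
from typing import Callable, List, Tuple

# Hypothetical layout: one inner list of per-channel samples per frame.
Frames = List[List[float]]

PRESET_CHANNELS = 3  # value used in the embodiment; configurable in practice


def route_audio(
    frames: Frames,
    required_channels: int,
    process_for_scene: Callable[[Frames], Frames],
    preset_channels: int = PRESET_CHANNELS,
) -> Tuple[str, Frames]:
    """Decide who processes the captured multi-channel audio."""
    if required_channels < preset_channels:
        # Single- or dual-channel request: the recording system processes the
        # audio for the scene and delivers the first target audio.
        return "first_target_audio", process_for_scene(frames)
    # Multi-channel request: hand over the raw data so the target application
    # can derive the second target audio itself.
    return "raw_multichannel", frames
```

Passing `preset_channels=2` gives the variant in which recording is divided only into single-channel and multi-channel recording.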
In an alternative embodiment of the present application, the audio usage scenario of the target application is determined by:
acquiring audio use conditions of the target application, wherein the audio use conditions of the target application comprise the number of audio channels of audio data required by the target application and recording sources required by the target application; and determining the audio use scene of the target application according to the audio use condition of the target application.
The recording source may be understood as the audio signal source used when collecting the multi-channel audio data; it indicates where the recording system obtains its audio input. For example, assuming that the target application requires single-channel or dual-channel audio and the recording source is a microphone, it may be determined that the audio usage scenario of the target application is ordinary audio recording.
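A minimal sketch of this determination follows, assuming hypothetical recording-source labels (`"mic"`, `"voice_call"`) and a hypothetical `Scene` enumeration. Only the microphone case comes from the text; the call mapping and the generic multi-channel scene are assumptions for illustration.

```python
from enum import Enum


class Scene(Enum):
    AUDIO_RECORDING = "audio_recording"
    CALL_RECORDING = "call_recording"
    MULTI_CHANNEL = "multi_channel"  # e.g. wake-up recognition or speech evaluation


# Hypothetical recording-source labels; only the microphone case is from the text.
SCENE_BY_SOURCE = {
    "mic": Scene.AUDIO_RECORDING,
    "voice_call": Scene.CALL_RECORDING,
}


def determine_scene(required_channels: int, recording_source: str,
                    preset_channels: int = 3) -> Scene:
    """Map the audio usage condition (channel count, source) to a scene."""
    if required_channels >= preset_channels:
        return Scene.MULTI_CHANNEL
    try:
        return SCENE_BY_SOURCE[recording_source]
    except KeyError:
        # Unknown sources are rejected rather than silently guessed.
        raise ValueError(f"unknown recording source: {recording_source!r}") from None
```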
Specifically, the audio usage condition of the target application may be sent to the recording system after the user opens the recording function of the target application, so that the recording system determines the audio usage scenario of the target application.
In another optional embodiment of the present application, the target application may also directly send the corresponding audio usage scene label to the recording system, so that the recording system directly obtains the audio usage scenario of the target application.
When the number of audio channels is smaller than the preset number of channels, the target application requires single-channel or dual-channel audio. In this case, processing the multi-channel audio data according to the audio use scene of the target application to obtain the first target audio conforming to the audio use scene comprises the following steps:
determining an audio processing algorithm corresponding to the audio use scene according to the audio use scene; and performing data processing on the multi-channel audio data by utilizing an audio processing algorithm corresponding to the audio use scene to obtain first target audio conforming to the audio use scene.
For example, when the audio usage scene is audio recording, the audio processing algorithm is determined to be an audio noise reduction algorithm and an audio enhancement algorithm; when the audio usage scene is call recording, the audio processing algorithm is determined to be an echo cancellation algorithm.
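The scene-to-algorithm mapping can be sketched as a lookup table followed by a processing chain. The scene keys and algorithm names below are hypothetical labels, and the algorithms themselves are supplied by the caller; the sketch shows only the selection and ordering, not any signal processing.

```python
from typing import Callable, Dict, List, Tuple

Frames = List[List[float]]  # hypothetical layout: per-channel samples per frame
Algorithm = Callable[[Frames], Frames]

# Scene -> ordered algorithm names, following the examples in the text.
SCENE_ALGORITHMS: Dict[str, Tuple[str, ...]] = {
    "audio_recording": ("noise_reduction", "enhancement"),
    "call_recording": ("echo_cancellation",),
}


def process_for_scene(frames: Frames, scene: str,
                      implementations: Dict[str, Algorithm]) -> Frames:
    """Apply, in order, the algorithms registered for the given scene."""
    out = frames
    for name in SCENE_ALGORITHMS[scene]:
        out = implementations[name](out)
    return out
```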
Further, in order to facilitate understanding of the process of processing the multi-channel audio data in the case where the number of audio channels is smaller than the preset number of channels, the process will be described in detail with reference to fig. 2.
As shown in fig. 2, fig. 2 includes:
multi-channel original audio 201, audio hardware abstraction layer 202, target application 203.
First, after the multi-channel original audio is acquired through the hardware device of the recording system, the multi-channel original audio 201 is sent to the audio hardware abstraction layer 202.
The audio hardware abstraction layer 202 is a software component that manages the audio hardware and provides audio functions, acting as an intermediate layer between the audio hardware and the operating system. The audio hardware abstraction layer 202 is further configured to receive, from the target application, the number of audio channels of the audio data required by the target application and the recording source of the audio required by the target application, and to determine the audio use scene of the target application from these two items.
Further, after determining the audio usage scenario of the target application, the audio hardware abstraction layer 202 invokes an algorithm corresponding to the audio usage scenario, processes the multi-channel audio data to obtain a first target audio corresponding to the number of audio channels required by the target application, and sends the first target audio to the target application 203.
For example, assume that the multi-channel original audio consists of 8 audio signals and the audio usage scene of the target application is a call scene. The algorithm corresponding to the call scene is determined to be an echo cancellation algorithm, and the 3rd output of the algorithm is read. This output mainly applies processing such as echo cancellation to the multi-channel audio, converting the 8 audio signals into 1 audio signal, which is output to the target application.
As another example, assume that the multi-channel original audio consists of 8 audio signals and the audio usage scene of the target application is determined to be the audio recording scene. The algorithms corresponding to the audio recording scene are determined to be noise reduction, enhancement and similar algorithms, and the 4th output of the algorithm is read. This output mainly applies processing such as noise reduction and enhancement, converting the 8 audio signals into 1 audio signal, which is output to the target application.
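The output selection in these two examples can be sketched as follows. The 1-based output indices (3 for the call scene, 4 for the audio recording scene) come from the examples; everything else is an assumption. In particular, `toy_engine` is a stand-in that simply averages the channels of each frame and performs no echo cancellation or noise reduction.

```python
from typing import Callable, List

Frames = List[List[float]]   # hypothetical layout: per-channel samples per frame
Mono = List[float]           # one sample per frame

# 1-based output index read for each scene, as in the examples above.
OUTPUT_INDEX_BY_SCENE = {"call_recording": 3, "audio_recording": 4}


def toy_engine(frames: Frames, n_outputs: int = 4) -> List[Mono]:
    """Stand-in for the algorithm engine: every output is a plain channel
    average. A real engine would run different processing per output."""
    mono = [sum(frame) / len(frame) for frame in frames]
    return [list(mono) for _ in range(n_outputs)]


def first_target_audio(frames: Frames, scene: str,
                       engine: Callable[[Frames], List[Mono]] = toy_engine) -> Mono:
    """Run the engine and read the output assigned to the scene."""
    outputs = engine(frames)
    return outputs[OUTPUT_INDEX_BY_SCENE[scene] - 1]
```

With 8-channel input frames, the returned stream has one sample per frame, which corresponds to converting the 8 audio signals into 1.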
S103, under the condition that the number of the audio channels is not smaller than the preset number of channels, the multi-channel audio data are sent to the target application, so that the target application processes the multi-channel audio data to obtain second target audio which accords with the audio use scene of the target application.
The native Android system logic supports only single-channel or dual-channel recording and does not support multi-channel recording, whereas applications such as wake-up recognition and speech evaluation often need multiple channels of original audio to improve recognition efficiency. In the prior art, after the audio data reaches the audio hardware abstraction layer, 1 or 2 channels of audio are usually sent directly to the relevant application, or the multi-channel audio data is first converted by some algorithm into 1 or 2 channels of audio and then sent to the relevant application. Neither approach can meet the high audio quality requirements of scenes such as wake-up recognition and speech evaluation.
In this embodiment of the present application, the software logic may be modified to enable the recording system to support multi-channel recording, and in the case where the number of audio channels required by the target application is greater than or equal to 3, it is indicated that the audio usage scenario of the target application may be wake-up recognition, speech evaluation, or the like, and in this case, the multi-channel audio data may be directly sent to the target application based on the step S103, so that the target application processes the multi-channel audio data to obtain the second target audio that accords with the audio usage scenario of the target application.
Specifically, the target application may perform data processing on the multi-channel audio data to obtain a second target audio that accords with an audio usage scenario of the target application in the following manner:
and calling an audio processing algorithm corresponding to the audio use scene of the target application, and performing data processing on the multi-channel audio data to obtain second target audio conforming to the audio use scene of the target application.
Or,
and selecting an algorithm corresponding to the audio use scene of the target application from the audio processing algorithms stored in the target application, and performing data processing on the multi-channel audio data to obtain second target audio conforming to the audio use scene of the target application.
That is, the target application may invoke a corresponding algorithm from an audio hardware abstraction layer of the recording system according to its own audio usage scenario, and perform data processing on the multi-channel audio data, so as to obtain the second target audio; corresponding algorithms can be written at the bottom layer of the target application, so that the target application can directly use the algorithms to process the multichannel audio after receiving the multichannel audio, and the second target audio is obtained.
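The two alternatives can be sketched as a lookup across two algorithm tables, one bundled with the target application and one exposed by the audio hardware abstraction layer. The function and parameter names are hypothetical, and trying the application's own algorithms first is an assumption; the text presents the two options as alternatives without an order of preference.

```python
from typing import Callable, Dict, List

Frames = List[List[float]]  # hypothetical layout: per-channel samples per frame
Algorithm = Callable[[Frames], Frames]


def second_target_audio(frames: Frames, scene: str,
                        app_algorithms: Dict[str, Algorithm],
                        hal_algorithms: Dict[str, Algorithm]) -> Frames:
    """Process raw multi-channel audio on behalf of the target application."""
    # Assumption: prefer an algorithm deployed inside the application.
    algorithm = app_algorithms.get(scene)
    if algorithm is None:
        # Otherwise call the algorithm provided by the audio HAL.
        algorithm = hal_algorithms.get(scene)
    if algorithm is None:
        raise LookupError(f"no algorithm available for scene {scene!r}")
    return algorithm(frames)
```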
Further, in order to facilitate understanding of the process of processing the multi-channel audio data in the case where the number of audio channels is not less than the preset number of channels, the process will be described in detail with reference to fig. 3.
As shown in fig. 3, fig. 3 includes:
a multi-channel original audio 301, an audio hardware abstraction layer 302, a target application 303 and an algorithm processing module 304.
After the recording system collects the multi-channel original audio 301, the multi-channel original audio 301 is sent to the audio hardware abstraction layer 302, and then the multi-channel original audio 301 is directly sent to the corresponding target application under the condition that the number of audio channels required by the target application received by the audio hardware abstraction layer 302 is not smaller than the preset number of channels.
After the target application 303 obtains the multi-channel original audio 301, an algorithm in the audio hardware abstraction layer 302 can be called by combining the audio usage scene of the application, so as to obtain the second target audio, thereby improving the recording quality;
or in the case that a corresponding algorithm is deployed in the target application 303, after the target application 303 obtains the multi-channel original audio, the multi-channel original audio 301 may be processed by the algorithm to obtain the second target audio, so as to improve recording quality.
Further, in order to facilitate understanding of the audio data processing method provided in the embodiments of the present application, the audio data processing method is described in detail below with reference to fig. 4.
As shown in fig. 4, fig. 4 includes: an audio digital signal processor 401, a kernel 402, an audio hardware abstraction layer 403, an audio framework 404, an application 405.
Wherein the application 405 includes: a speech evaluation class application, a wake-up recognition class application, a talk class application, and an audio recording class application.
Still taking a preset number of channels of 3 channels as an example, the number of audio channels required by the conversation-type application and the audio recording-type application is less than 3 (i.e., the number of audio channels required by such applications is single or dual channel); the number of audio channels required by the speech evaluation class application and the wake recognition class application is greater than 3 (i.e., the number of audio channels required by such applications is multiple channels).
For the above-mentioned conversation type application and audio recording type application, taking the conversation type application as an example, when the conversation type application starts the recording function, the conversation type application sends the number of audio channels required by the application and the recording source of the required audio to a recording system so that the recording system can determine the audio use scene of the voice conversation type application according to the number of voice channels required by the conversation type application and the recording source of the required audio.
Further, the recording system collects multi-channel audio data in combination with the recording source, and sends the multi-channel audio data to the audio hardware abstraction layer 403 through the audio digital signal processor 401 and the kernel 402, the audio hardware abstraction layer 403 is provided with a plurality of audio processing algorithms, after the audio hardware abstraction layer 403 receives the multi-channel audio, an echo cancellation algorithm corresponding to an audio use scene of the call class application is determined, the multi-channel audio data is processed by using the echo cancellation algorithm, so as to obtain single-channel audio or multi-channel audio, and the single-channel audio or the multi-channel audio is sent to the call class application through the audio framework 404.
Taking the speech evaluation class application as an example, when it starts the recording function, it sends the number of audio channels it requires to the recording system. After determining that the number of audio channels required by the speech evaluation class application is greater than 3, the recording system passes the collected multi-channel audio data directly to the speech evaluation class application.
After receiving the multi-channel audio data, the speech evaluation class application may, according to its own audio use scene, call the speech evaluation algorithm corresponding to that scene from the audio hardware abstraction layer 403 to process the multi-channel audio data and obtain single-channel or dual-channel audio conforming to the audio use scene.
Alternatively, if a speech evaluation algorithm corresponding to the audio use scene is deployed in the speech evaluation class application itself, the application may, after receiving the multi-channel audio data, process it directly with that algorithm to obtain single-channel or dual-channel audio conforming to the audio use scene.
In summary, the audio data processing method processes the multi-channel audio according to the audio use scene of the target application to obtain target audio that meets the requirements of the target application, thereby satisfying users' audio requirements in different scenes and improving the quality of the audio delivered.
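The routing decision in the example above can be sketched in Python as follows. This is a minimal illustration only: the channel counts assigned to each application class and the labels "hal" and "app" are assumptions chosen for the sketch, not values given in the present application.

```python
PRESET_CHANNELS = 3  # preset number of channels used in the example above

# Hypothetical channel requirements per application class (illustrative only).
REQUIRED_CHANNELS = {
    "call": 1,
    "audio_recording": 2,
    "speech_evaluation": 4,
    "wake_up_recognition": 4,
}


def route(required_channels: int, preset: int = PRESET_CHANNELS) -> str:
    """Return where the multi-channel audio data is processed.

    "hal": processed in the audio hardware abstraction layer, and the
           resulting first target audio is sent to the application.
    "app": forwarded unprocessed, and the application derives the second
           target audio itself.
    """
    if required_channels < preset:
        return "hal"
    return "app"


routes = {name: route(n) for name, n in REQUIRED_CHANNELS.items()}
```

Under these assumed requirements, the call and audio recording classes are routed to processing in the audio hardware abstraction layer, while the speech evaluation and wake-up recognition classes receive the multi-channel audio data directly.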
Exemplary apparatus
Corresponding to the above audio data processing method, an embodiment of the present application further provides an audio data processing apparatus, referring to fig. 5, including:
a first unit 501, configured to acquire multi-channel audio data, and determine the number of audio channels of the audio data required by a target application;
a second unit 502, configured to, when the number of audio channels is less than a preset number of channels, process the multi-channel audio data according to an audio usage scenario of the target application to obtain a first target audio that accords with the audio usage scenario, and send the first target audio to the target application;
and a third unit 503, configured to send the multi-channel audio data to the target application if the number of audio channels is not less than the preset number of channels, so that the target application processes the multi-channel audio data to obtain a second target audio that accords with the audio usage scenario of the target application.
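As a rough sketch of how the three units could cooperate in software, the following Python code models each unit as a method. The class names, the frames-by-channels list layout, and the channel-averaging function are hypothetical; in particular, the averaging is only a placeholder for the scene-specific processing, which the present application does not define in this form.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Audio = List[List[float]]  # frames x channels (hypothetical in-memory layout)


@dataclass
class TargetApp:
    """Hypothetical stand-in for a target application."""
    required_channels: int
    scene: str
    inbox: List[Audio] = field(default_factory=list)

    def receive(self, audio: Audio) -> None:
        self.inbox.append(audio)


def downmix_to_mono(audio: Audio, scene: str) -> Audio:
    """Placeholder for scene-specific processing: average all channels."""
    return [[sum(frame) / len(frame)] for frame in audio]


class AudioDataProcessingApparatus:
    def __init__(self, preset_channels: int,
                 capture: Callable[[], Audio],
                 process: Callable[[Audio, str], Audio] = downmix_to_mono):
        self.preset_channels = preset_channels
        self.capture = capture
        self.process = process

    def first_unit(self, app: TargetApp) -> Tuple[Audio, int]:
        # Acquire multi-channel audio data and the required channel count.
        return self.capture(), app.required_channels

    def second_unit(self, audio: Audio, app: TargetApp) -> None:
        # Process according to the app's scene, then send the first target audio.
        app.receive(self.process(audio, app.scene))

    def third_unit(self, audio: Audio, app: TargetApp) -> None:
        # Forward the multi-channel audio data unprocessed.
        app.receive(audio)

    def handle(self, app: TargetApp) -> None:
        audio, needed = self.first_unit(app)
        if needed < self.preset_channels:
            self.second_unit(audio, app)
        else:
            self.third_unit(audio, app)
```

Passing the capture and processing functions in as parameters keeps the sketch independent of any particular recording hardware or algorithm library.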
As an alternative embodiment, the audio usage scenario of the target application is determined by:
acquiring audio use conditions of the target application, wherein the audio use conditions of the target application comprise the number of audio channels of audio data required by the target application and a recording source of audio required by the target application;
and determining the audio use scene of the target application according to the audio use condition of the target application.
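One possible way to derive the audio use scene from the audio use condition is a simple lookup, sketched below in Python. The recording-source labels and the rule table are assumptions made for illustration; the present application states only that the scene is determined from the required number of audio channels and the recording source, without fixing a particular mapping.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AudioUseCondition:
    channel_count: int       # number of audio channels the application requires
    recording_source: str    # hypothetical label for the recording source


# Hypothetical rules keyed on (fewer than preset?, recording source).
SCENE_RULES = {
    (True, "voice_communication"): "call_recording",
    (True, "mic"): "audio_recording",
    (False, "voice_recognition"): "voice_wake_up_recognition",
    (False, "mic"): "voice_evaluation",
}


def determine_scene(cond: AudioUseCondition, preset: int = 3) -> Optional[str]:
    """Return the audio use scene, or None if no rule matches."""
    below_preset = cond.channel_count < preset
    return SCENE_RULES.get((below_preset, cond.recording_source))
```

Note that the same recording source can map to different scenes depending on the channel count, which is why both parts of the audio use condition are needed.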
As an optional implementation manner, the processing the multi-channel audio data according to the audio usage scenario of the target application to obtain the first target audio conforming to the audio usage scenario includes:
determining an audio processing algorithm corresponding to the audio use scene according to the audio use scene;
and performing data processing on the multi-channel audio data by utilizing an audio processing algorithm corresponding to the audio use scene to obtain first target audio conforming to the audio use scene.
As an optional implementation manner, the determining an audio processing algorithm corresponding to the audio usage scenario according to the audio usage scenario includes:
when the audio use scene is audio recording, determining that the audio processing algorithm is an audio noise reduction algorithm and an audio enhancement algorithm;
and,
and when the audio use scene is call recording, determining that the audio processing algorithm is an echo cancellation algorithm.
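The scene-to-algorithm correspondence described above can be expressed as a small lookup table. The following Python sketch uses hypothetical scene and algorithm identifiers, and it only selects algorithm names; the algorithms themselves are not implemented here.

```python
from typing import Dict, Tuple

# Scene -> algorithms, following the two cases described above.
# The string identifiers are hypothetical names chosen for this sketch.
ALGORITHMS_BY_SCENE: Dict[str, Tuple[str, ...]] = {
    "audio_recording": ("noise_reduction", "enhancement"),
    "call_recording": ("echo_cancellation",),
}


def select_algorithms(scene: str) -> Tuple[str, ...]:
    """Return the audio processing algorithms corresponding to a scene."""
    try:
        return ALGORITHMS_BY_SCENE[scene]
    except KeyError:
        raise ValueError(f"no audio processing algorithm registered for scene {scene!r}")
```

For example, `select_algorithms("audio_recording")` yields both the noise reduction and the enhancement entries, while `select_algorithms("call_recording")` yields only echo cancellation.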
As an optional implementation, the target application processes the multi-channel audio data to obtain second target audio conforming to the audio use scene of the target application in the following manner:
calling an audio processing algorithm corresponding to the audio use scene of the target application, and performing data processing on the multi-channel audio data to obtain second target audio conforming to the audio use scene of the target application;
or,
and selecting an algorithm corresponding to the audio use scene of the target application from the audio processing algorithms stored in the target application, and performing data processing on the multi-channel audio data to obtain second target audio conforming to the audio use scene of the target application.
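The two alternatives above can be sketched in Python as a lookup over two algorithm registries: one stored in the target application and one that the application can call from outside itself. The function name, the registry dictionaries, and the choice to try the application's own algorithms first are all assumptions of this sketch; the present application presents the two options only as alternatives.

```python
from typing import Callable, Dict, List

Audio = List[List[float]]            # frames x channels (hypothetical layout)
Algorithm = Callable[[Audio], Audio]


def process_in_target_app(audio: Audio,
                          scene: str,
                          own_algorithms: Dict[str, Algorithm],
                          callable_algorithms: Dict[str, Algorithm]) -> Audio:
    """Produce the second target audio for the given audio use scene.

    own_algorithms:      algorithms stored in the target application.
    callable_algorithms: algorithms the application can call externally.
    The lookup order (own first) is an assumption of this sketch.
    """
    algorithm = own_algorithms.get(scene)
    if algorithm is None:
        algorithm = callable_algorithms.get(scene)
    if algorithm is None:
        raise LookupError(f"no algorithm available for scene {scene!r}")
    return algorithm(audio)
```

Any callable that maps multi-channel audio data to the desired output can be registered, so the same dispatch serves voice evaluation and voice wake-up recognition alike.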
As an alternative embodiment, the usage scenario of the second target audio includes: at least one of voice wake-up recognition, voice evaluation.
The audio data processing apparatus provided in this embodiment is based on the same inventive concept as the audio data processing method provided in the foregoing embodiments of the present application, can execute the audio data processing method provided in any of those embodiments, and has the functional modules and beneficial effects corresponding to executing that method. For technical details not described in this embodiment, refer to the description of the audio data processing method in the foregoing embodiments; they are not repeated here.
The functions performed by the first unit 501, the second unit 502, and the third unit 503 may be implemented by the same or different processors, which are not limited in the embodiment of the present application.
It will be appreciated that the elements of the above apparatus may be implemented in the form of processor-invoked software. For example, the device includes a processor, where the processor is connected to a memory, and the memory stores instructions, and the processor invokes the instructions stored in the memory to implement any of the methods above or to implement functions of each unit of the device, where the processor may be a general-purpose processor, such as a CPU or a microprocessor, and the memory may be a memory within the device or a memory outside the device. Alternatively, the units in the apparatus may be implemented in the form of hardware circuits, and the functions of some or all of the units may be implemented by designing hardware circuits, which may be understood as one or more processors; for example, in one implementation, the hardware circuit is an ASIC, and the functions of some or all of the above units are implemented by designing the logic relationships of the elements in the circuit; for another example, in another implementation, the hardware circuit may be implemented by a PLD, for example, an FPGA may include a large number of logic gates, and the connection relationship between the logic gates is configured by a configuration file, so as to implement the functions of some or all of the above units. All units of the above device may be realized in the form of processor calling software, or in the form of hardware circuits, or in part in the form of processor calling software, and in the rest in the form of hardware circuits.
In the embodiments of the present application, the processor is a circuit with signal processing capability. In one implementation, the processor may be a circuit capable of reading and executing instructions, such as a CPU, a microprocessor, a GPU, or a DSP. In another implementation, the processor may implement a function through the logical relationships of a hardware circuit, which may be fixed or reconfigurable; for example, the processor is a hardware circuit implemented as an ASIC or a PLD, such as an FPGA. In a reconfigurable hardware circuit, the process in which the processor loads a configuration file to configure the hardware circuit may be understood as the processor loading instructions to implement the functions of some or all of the above units. The processor may also be a hardware circuit designed for artificial intelligence, which may be understood as an ASIC, such as an NPU, a TPU, or a DPU.
It will be seen that each of the units in the above apparatus may be one or more processors (or processing circuits) configured to implement the above method, for example: CPU, GPU, NPU, TPU, DPU, microprocessor, DSP, ASIC, FPGA, or a combination of at least two of these processor forms.
Furthermore, the units in the above apparatus may be integrated together in whole or in part, or may be implemented independently. In one implementation, these units are integrated together and implemented in the form of an SOC. The SOC may include at least one processor for implementing any of the methods above or for implementing the functions of the units of the apparatus, where the at least one processor may be of different types, including, for example, a CPU and an FPGA, a CPU and an artificial intelligence processor, a CPU and a GPU, and the like.
Exemplary System
Another embodiment of the present application also proposes an audio data processing system, the system comprising:
the audio acquisition unit is used for acquiring multi-channel audio data;
the audio processing unit is used for acquiring multi-channel audio data and determining the number of audio channels of the audio data required by the target application; under the condition that the number of the audio channels is smaller than the preset number of channels, processing the multi-channel audio data according to the audio use scene of the target application to obtain first target audio conforming to the audio use scene, and sending the first target audio to the target application; and under the condition that the number of the audio channels is not smaller than the preset number of channels, sending the multi-channel audio data to the target application so that the target application processes the multi-channel audio data to obtain second target audio which accords with the audio use scene of the target application.
The audio data processing system provided in this embodiment is based on the same inventive concept as the audio data processing method provided in the foregoing embodiments of the present application, can execute the audio data processing method provided in any of those embodiments, and has the functional modules and beneficial effects corresponding to executing that method. For technical details not described in this embodiment, refer to the description of the audio data processing method in the foregoing embodiments; they are not repeated here.
Exemplary electronic device
Another embodiment of the present application further provides an electronic device, referring to fig. 6, including:
a memory 200 and a processor 210;
wherein the memory 200 is connected to the processor 210, and is used for storing a program;
the processor 210 is configured to implement the audio data processing method disclosed in any one of the above embodiments by executing the program stored in the memory 200.
Specifically, the electronic device may further include: a bus, a communication interface 220, an input device 230, and an output device 240.
The processor 210, the memory 200, the communication interface 220, the input device 230, and the output device 240 are interconnected by a bus. Wherein:
a bus may comprise a path that communicates information between components of a computer system.
The processor 210 may be a general-purpose processor, such as a central processing unit (CPU) or a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling execution of the programs of the present application. It may also be a digital signal processor (DSP), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
Processor 210 may include a main processor, and may also include a baseband chip, modem, and the like.
The memory 200 stores the program for implementing the technical solution of the present application, and may also store an operating system and other key services. Specifically, the program may include program code, and the program code includes computer operating instructions. More specifically, the memory 200 may include read-only memory (ROM), other types of static storage devices that can store static information and instructions, random access memory (RAM), other types of dynamic storage devices that can store information and instructions, disk storage, flash memory, and the like.
The input device 230 may include means for receiving data and information entered by a user, such as a keyboard, mouse, camera, scanner, light pen, voice input device, touch screen, pedometer, or gravity sensor, among others.
Output device 240 may include means, such as a display screen, printer, speakers, etc., that allow information to be output to a user.
The communication interface 220 may include any transceiver-type device for communicating with other devices or communication networks, such as Ethernet, a radio access network (RAN), or a wireless local area network (WLAN).
The processor 210 executes programs stored in the memory 200 and invokes other devices that may be used to implement the steps of any of the audio data processing methods provided in the above-described embodiments of the present application.
The electronic device may be a handheld electronic device, a wearable electronic device, an intelligent terminal, a computer, etc. with audio acquisition and processing functions.
Exemplary computer program product and storage Medium
In addition to the methods and apparatus described above, embodiments of the present application may also be a computer program product comprising computer program instructions which, when executed by a processor, cause the processor to perform the steps in an audio data processing method according to the various embodiments of the present application described in the "exemplary methods" section of the present specification.
The program code of the computer program product for performing the operations of the embodiments of the present application may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's computing device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on a remote computing device or server.
Furthermore, embodiments of the present application may also be a storage medium having stored thereon a computer program that is executed by a processor to perform the steps in the audio data processing method according to the various embodiments of the present application described in the above-described "exemplary method" section of the present specification, and specifically may implement the following steps:
step S101, acquiring multi-channel audio data and determining the number of audio channels of the audio data required by a target application;
step S102, processing the multi-channel audio data according to the audio use scene of the target application to obtain first target audio conforming to the audio use scene and sending the first target audio to the target application under the condition that the number of the audio channels is smaller than the preset number of channels;
step S103, if the number of audio channels is not less than the preset number of channels, sending the multi-channel audio data to the target application, so that the target application processes the multi-channel audio data to obtain a second target audio that accords with the audio usage scenario of the target application.
For the foregoing method embodiments, for simplicity of explanation, the methodologies are shown as a series of acts, but one of ordinary skill in the art will appreciate that the present application is not limited by the order of acts described, as some acts may, in accordance with the present application, occur in other orders or concurrently. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily required in the present application.
It should be noted that the embodiments in this specification are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and for identical or similar parts the embodiments may be referred to one another. Because the apparatus embodiments are substantially similar to the method embodiments, they are described relatively briefly; for relevant details, refer to the description of the method embodiments.
The steps in the method of each embodiment of the application can be sequentially adjusted, combined and deleted according to actual needs, and the technical features described in each embodiment can be replaced or combined.
The modules and sub-modules in the device and the terminal of the embodiments of the present application may be combined, divided, and deleted according to actual needs.
In the embodiments provided in the present application, it should be understood that the disclosed terminal, apparatus and method may be implemented in other manners. For example, the above-described terminal embodiments are merely illustrative, and for example, the division of modules or sub-modules is merely a logical function division, and there may be other manners of division in actual implementation, for example, multiple sub-modules or modules may be combined or integrated into another module, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or modules, which may be in electrical, mechanical, or other forms.
The modules or sub-modules illustrated as separate components may or may not be physically separate, and components that are modules or sub-modules may or may not be physical modules or sub-modules, i.e., may be located in one place, or may be distributed over multiple network modules or sub-modules. Some or all of the modules or sub-modules may be selected according to actual needs to achieve the purpose of the embodiment.
In addition, each functional module or sub-module in each embodiment of the present application may be integrated in one processing module, or each module or sub-module may exist alone physically, or two or more modules or sub-modules may be integrated in one module. The integrated modules or sub-modules may be implemented in hardware or in software functional modules or sub-modules.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative elements and steps are described above generally in terms of functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software unit executed by a processor, or in a combination of the two. The software unit may reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
Finally, it is further noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method of processing audio data, comprising:
acquiring multi-channel audio data and determining the number of audio channels of the audio data required by a target application;
under the condition that the number of the audio channels is smaller than the preset number of channels, processing the multi-channel audio data according to the audio use scene of the target application to obtain first target audio conforming to the audio use scene, and sending the first target audio to the target application;
and under the condition that the number of the audio channels is not smaller than the preset number of channels, sending the multi-channel audio data to the target application so that the target application processes the multi-channel audio data to obtain second target audio which accords with the audio use scene of the target application.
2. The method of claim 1, wherein the audio usage scenario of the target application is determined by:
acquiring audio use conditions of the target application, wherein the audio use conditions of the target application comprise the number of audio channels of audio data required by the target application and a recording source of audio required by the target application;
and determining the audio use scene of the target application according to the audio use condition of the target application.
3. The method of claim 1, wherein processing the multi-channel audio data according to the audio usage scenario of the target application results in a first target audio that conforms to the audio usage scenario, comprising:
determining an audio processing algorithm corresponding to the audio use scene according to the audio use scene;
and performing data processing on the multi-channel audio data by utilizing an audio processing algorithm corresponding to the audio use scene to obtain first target audio conforming to the audio use scene.
4. A method according to claim 3, wherein said determining an audio processing algorithm corresponding to said audio usage scenario from said audio usage scenario comprises:
when the audio use scene is audio recording, determining that the audio processing algorithm is an audio noise reduction algorithm and an audio enhancement algorithm;
and,
and when the audio use scene is call recording, determining that the audio processing algorithm is an echo cancellation algorithm.
5. The method of claim 1, wherein the target application performs data processing on the multi-channel audio data to obtain a second target audio that conforms to an audio usage scenario of the target application by:
calling an audio processing algorithm corresponding to the audio use scene of the target application, and performing data processing on the multi-channel audio data to obtain second target audio conforming to the audio use scene of the target application;
or,
and selecting an algorithm corresponding to the audio use scene of the target application from the audio processing algorithms stored in the target application, and performing data processing on the multi-channel audio data to obtain second target audio conforming to the audio use scene of the target application.
6. The method of claim 5, wherein the usage scenario of the second target audio comprises: at least one of voice wake-up recognition, voice evaluation.
7. An audio data processing apparatus, comprising:
a first unit for acquiring multi-channel audio data and determining the number of audio channels of the audio data required by the target application;
the second unit is used for processing the multi-channel audio data according to the audio use scene of the target application to obtain first target audio conforming to the audio use scene and sending the first target audio to the target application under the condition that the number of the audio channels is smaller than the preset number of channels;
and the third unit is used for sending the multi-channel audio data to the target application under the condition that the number of the audio channels is not smaller than the preset number of channels, so that the target application processes the multi-channel audio data to obtain second target audio which accords with the audio use scene of the target application.
8. An audio data processing system, comprising:
the audio acquisition unit is used for acquiring multi-channel audio data;
the audio processing unit is used for acquiring multi-channel audio data and determining the number of audio channels of the audio data required by the target application; under the condition that the number of the audio channels is smaller than the preset number of channels, processing the multi-channel audio data according to the audio use scene of the target application to obtain first target audio conforming to the audio use scene, and sending the first target audio to the target application; and under the condition that the number of the audio channels is not smaller than the preset number of channels, sending the multi-channel audio data to the target application so that the target application processes the multi-channel audio data to obtain second target audio which accords with the audio use scene of the target application.
9. An electronic device, comprising:
a processor and a memory;
wherein the memory is used for storing a computer program; the processor is configured to implement the audio data processing method according to any one of claims 1 to 6 by running a computer program stored in the memory.
10. A computer storage medium, characterized in that the computer storage medium stores a computer program which, when executed, implements the audio data processing method of any one of claims 1-6.
CN202311412427.8A 2023-10-26 2023-10-26 Audio data processing method, device and system, electronic equipment and storage medium Pending CN117389506A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311412427.8A CN117389506A (en) 2023-10-26 2023-10-26 Audio data processing method, device and system, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311412427.8A CN117389506A (en) 2023-10-26 2023-10-26 Audio data processing method, device and system, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117389506A true CN117389506A (en) 2024-01-12

Family

ID=89438813

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311412427.8A Pending CN117389506A (en) 2023-10-26 2023-10-26 Audio data processing method, device and system, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117389506A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination