CN116709112A - Audio data processing method, system, data processing device and storage medium - Google Patents


Info

Publication number
CN116709112A
Authority
CN
China
Prior art keywords
audio data
audio
data
channel
channels
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210173756.0A
Other languages
Chinese (zh)
Inventor
黄日来
刘玲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BYD Co Ltd
Original Assignee
BYD Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BYD Co Ltd
Priority to CN202210173756.0A
Publication of CN116709112A
Legal status: Pending


Classifications

    • H — ELECTRICITY
    • H04 — ELECTRIC COMMUNICATION TECHNIQUE
    • H04R — LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R 3/00 — Circuits for transducers, loudspeakers or microphones
    • H04R 3/04 — Circuits for transducers, loudspeakers or microphones for correcting frequency response
    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 — Speech recognition
    • G10L 15/20 — Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress-induced speech
    • G10L 15/22 — Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 15/28 — Constructional details of speech recognition systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

Embodiments of the present application provide an audio data processing method, an audio data processing system, a data processing device, and a storage medium. The audio data processing method includes: the data processing device acquires N-channel audio data from an audio processor through an audio bus; and screens, from the N-channel audio data, the audio data of the a channels required by a target application, where the target application is any one of at least two applications installed on the data processing device. The embodiments of the present application can reduce the number of audio buses required when at least two applications need to acquire audio data at the same time.

Description

Audio data processing method, system, data processing device and storage medium
Technical Field
The present application relates to the field of audio technologies, and in particular, to an audio data processing method, an audio data processing system, a data processing device, and a storage medium.
Background
As automobiles become more intelligent and connected, longer, more frequent, and deeper human-computer interaction is becoming an important development theme in the automotive industry, and intelligent voice is one of the most important human-computer interaction modes of the intelligent cockpit. Here, intelligent voice refers to a first application that runs continuously after startup and serves as an important interaction entrance of the intelligent cockpit, providing functions such as speech recognition, wake-up, and sound-source positioning. To improve the success rate of speech recognition, noise reduction is required. Ordinary recording refers to the voice function of a second application and does not need noise reduction, so the number of audio data channels required by intelligent voice and by ordinary recording differs.
To ensure that intelligent voice and ordinary recording do not interfere with each other, the first application and the second application read audio data from different audio buses. For example, the first application reads the audio data of its required number of channels from a first audio bus, and the second application reads the audio data of its required number of channels from a second audio bus, so the two applications can work simultaneously without interference. At present, however, running intelligent voice and ordinary recording at the same time requires independent audio buses, a minimum of 2 groups, which means more audio buses are needed.
Disclosure of Invention
Embodiments of the present application provide an audio data processing method, an audio data processing system, a data processing device, and a storage medium, which can reduce the number of audio buses required when at least two applications need to acquire audio data at the same time.
A first aspect of an embodiment of the present application provides an audio data processing method, applied to a data processing device, the method including:
the data processing device acquiring N-channel audio data from an audio processor through an audio bus, N being an integer greater than or equal to 2;
and screening, from the N-channel audio data, the audio data of the a channels required by a target application, where the target application is any one of at least two applications installed on the data processing device, and a is a positive integer less than or equal to N.
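As a rough illustration of this first aspect (not an implementation from the patent; the names and data layout are hypothetical), screening the audio data of a channels out of N-channel audio data might look like:

```python
# Minimal sketch: select the audio data of the "a" channels a target
# application needs from N-channel audio data that arrived over a single
# audio bus. Channel indices and the dict layout are assumptions.

def screen_channels(n_channel_audio, wanted_channels):
    """n_channel_audio: dict mapping channel index -> list of samples (N entries).
    wanted_channels: indices of the a channels the target application needs."""
    missing = [ch for ch in wanted_channels if ch not in n_channel_audio]
    if missing:
        raise ValueError(f"channels not present on the bus: {missing}")
    return {ch: n_channel_audio[ch] for ch in wanted_channels}

# Example: N = 4 (2 microphone channels + 2 reference channels); an ordinary
# recording application only needs the microphone channels 0 and 1.
audio = {0: [1, 2], 1: [3, 4], 2: [5, 6], 3: [7, 8]}
mic_only = screen_channels(audio, [0, 1])
```

Each application performs its own screening against the same buffered N-channel data, which is what lets one bus serve several applications.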
A second aspect of an embodiment of the present application provides an audio processing system, including an audio processor, an audio bus, and a data processing device, the audio processor being connected to the data processing device through the audio bus;
the data processing device is configured to acquire N-channel audio data from the audio processor through the audio bus, N being an integer greater than or equal to 2;
the data processing device is further configured to screen, from the N-channel audio data, the audio data of the a channels required by a target application, where the target application is any one of at least two applications installed on the data processing device, and a is a positive integer less than or equal to N.
A third aspect of an embodiment of the present application provides a data processing apparatus, including:
an acquisition unit configured to acquire N-channel audio data from an audio processor through an audio bus, N being an integer greater than or equal to 2;
a screening unit configured to screen, from the N-channel audio data, the audio data of the a channels required by a target application, where the target application is any one of at least two applications installed on the data processing apparatus, and a is a positive integer less than or equal to N.
A fourth aspect of the embodiments of the present application provides a data processing device including a processor and a memory, the memory being configured to store a computer program including program instructions, and the processor being configured to invoke the program instructions to perform the steps of the method of the first aspect.
A fifth aspect of the embodiments of the present application provides a computer-readable storage medium storing a computer program for electronic data exchange, the computer program causing a computer to perform some or all of the steps described in the first aspect of the embodiments of the present application.
A sixth aspect of the embodiments of the present application provides a computer program product comprising a computer program operable to cause a computer to perform some or all of the steps described in the first aspect. The computer program product may be a software installation package.
In the embodiments of the present application, the data processing device acquires N-channel audio data from the audio processor through the audio bus, N being an integer greater than or equal to 2, and screens, from the N-channel audio data, the audio data of the a channels required by a target application, the target application being any one of at least two applications installed on the data processing device and a being a positive integer less than or equal to N. Because the audio data required by each of the at least two applications can be screened from the N-channel audio data acquired over a single audio bus, the audio data acquisition requirements of at least two applications can be met with only one group of audio buses, reducing the number of audio buses required when at least two applications need to acquire audio data at the same time.
Drawings
To illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings required for the embodiments or the prior-art description are briefly introduced below. The drawings described below are only some embodiments of the present application; other drawings may be obtained from them by a person skilled in the art without inventive effort.
Fig. 1 is a schematic structural diagram of an audio processing system according to an embodiment of the present application;
fig. 2 is a flow chart of an audio data processing method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a 1-channel demand data processing according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a data flow direction according to an embodiment of the present application;
FIG. 5 is a schematic diagram of 2-channel demand data processing according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a data processing apparatus according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of another data processing apparatus according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described below clearly and completely with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person skilled in the art based on the embodiments of the present application without inventive effort fall within the scope of protection of the present application.
The terms "first", "second", and the like in the description, the claims, and the drawings are used to distinguish different objects, not to describe a particular order. Furthermore, the terms "comprise" and "have", and any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to the listed steps or elements, but may include other steps or elements that are not listed or that are inherent to the process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present application. The appearances of this phrase in various places in the specification do not necessarily all refer to the same embodiment, nor to separate or alternative embodiments mutually exclusive of other embodiments. A person skilled in the art will understand, explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.
Referring to fig. 1, fig. 1 is a schematic structural diagram of an audio processing system according to an embodiment of the application. As shown in fig. 1, the audio processing system 100 may include: an audio processor 10, an audio bus 20 and a data processing device 30. The audio processor 10 is connected with the data processing device 30 through the audio bus 20, and the data processing device 30 is provided with at least two applications;
The data processing device 30 is configured to obtain N-channel audio data from the audio processor 10 through the audio bus 20; n is an integer greater than or equal to 2;
the data processing device 30 is further configured to screen audio data of a channels required by a target application from the audio data of the N channels, where the target application is any one of at least two applications installed by the data processing device, and a is a positive integer less than or equal to N.
The data processing device 30 may be an integral part of a vehicle. For example, the data processing device 30 may be a module with data processing functions on the vehicle; specifically, it may be a System on Chip (SoC). The audio processor 10, the audio bus 20, and the data processing device 30 may be integrated in the same device or may exist separately.
In an embodiment of the present application, the at least two applications may include a first application (e.g., an intelligent voice application) and a second application (e.g., an ordinary recording application). After the data processing device 30 is powered on, the first application may run continuously; it may be an important interaction entrance of the intelligent cockpit, providing functions such as speech recognition, wake-up, and sound-source positioning. To improve the success rate of speech recognition, noise reduction is required. When the first application is an intelligent voice application, the function responsible for speech recognition needs at least 2 channels of audio data: at least 1 channel of microphone (mic) audio data and at least 1 channel of reference signal data.
For example, if 1 microphone is installed in the vehicle (e.g., near the driver's seat), then N=2, and the N-channel audio data includes 1 channel of microphone audio data and 1 channel of reference audio data (e.g., left-channel reference audio data). Screening the audio data of the a channels required by the first application from the N-channel audio data may specifically include: screening out the 2 channels of audio data required by the first application.
If 2 microphones are installed (e.g., near the driver's seat and near the front passenger seat), then N=4, and the N-channel audio data includes 2 channels of microphone audio data and 2 channels of reference audio data (one left-channel and one right-channel). The screening may specifically include: screening out the 4 channels of audio data required by the first application.
If 4 microphones are installed (e.g., near the driver's seat, near the front passenger seat, near the second-row left seat, and near the second-row right seat), then N=6, and the N-channel audio data includes 4 channels of microphone audio data and 2 channels of reference audio data (one left-channel and one right-channel). The screening may specifically include: screening out the 4 channels (2 microphone channels and 2 reference channels) or the 6 channels of audio data required by the first application.
If 6 microphones are installed (e.g., near the driver's seat, near the front passenger seat, and near the left and right seats of the second and third rows), then N=8, and the N-channel audio data includes 6 channels of microphone audio data and 2 channels of reference audio data (one left-channel and one right-channel). The screening may specifically include: screening out the 4 channels (2 microphone channels and 2 reference channels), the 6 channels (4 microphone channels and 2 reference channels), or the 8 channels of audio data required by the first application.
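The microphone-count examples above all follow N = M + P, with 2 reference channels except in the single-microphone example. A one-line sketch of this relationship (the rule for the number of reference channels is inferred from the examples, not stated as a general rule in the text):

```python
# Channel-count relationship from the examples: N = M + P, where M is the
# number of microphones and P the number of reference channels (1 reference
# channel in the single-mic example, 2 otherwise). Inferred, not normative.

def total_channels(num_mics):
    p = 1 if num_mics == 1 else 2  # per the examples in the description
    return num_mics + p

# Reproduces the four configurations given in the text.
for m, expected_n in [(1, 2), (2, 4), (4, 6), (6, 8)]:
    assert total_channels(m) == expected_n
```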
The second application may be an ordinary recording application, covering the voice functions built into applications such as map applications (e.g., Baidu Map), music applications, and chat applications. The second application may collect microphone (mic) audio data of b channels.
The first application is one of the human-computer interaction entrances; it is started when the data processing device 30 is powered on, so it can continuously acquire microphone (mic) voice data and monitor user voice input. The second application is launched when the user needs it (e.g., when the user wants to navigate by voice-controlling a map application). For a good human-computer interaction experience, application voice functions such as Baidu Map voice, QQ voice, and WeChat voice should be usable while the intelligent voice is working, and both should work normally and accurately recognize the user's voice commands.
The audio processor 10 may acquire the audio data collected by the microphones, and may also acquire reference audio data from a running application with an audio playing function (e.g., a music application). After processing the acquired data, the audio processor 10 obtains the N-channel audio data and transmits it to the data processing device 30 through the audio bus 20. The data processing device 30 may select the required audio data from the N-channel audio data for processing according to the requirements of the first application and of the second application.
When the first application is an intelligent voice application, noise reduction is required because it must consider the success rate of speech recognition. For example, in the in-vehicle environment a music signal and the user's voice signal reach the microphone (mic) at the same time; during speech recognition the music signal must be suppressed. This is echo cancellation: the music signal serves as the reference signal, environmental noise is suppressed, and the effective voice signal is enhanced. Echo cancellation is handled by a voice processing module with a noise reduction function in the data processing device, and requires at least 2 channels: at least 1 microphone (mic) channel plus at least 1 reference channel (e.g., the music signal).
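The patent does not specify an echo-cancellation algorithm. As an illustrative sketch only, a normalized-LMS (NLMS) adaptive filter, one common way to cancel an echo against a reference signal, might look like this; the filter length, step size, and signals are arbitrary assumptions:

```python
# Illustrative NLMS echo canceller: use the music (reference) channel to
# subtract the estimated echo from the microphone channel, leaving the
# voice. Not from the patent; all parameters here are assumptions.

def nlms_echo_cancel(mic, ref, taps=4, mu=0.5, eps=1e-8):
    w = [0.0] * taps                                  # adaptive filter weights
    out = []
    for n in range(len(mic)):
        # most recent `taps` reference samples, zero-padded at the start
        x = [ref[n - k] if n - k >= 0 else 0.0 for k in range(taps)]
        y = sum(wk * xk for wk, xk in zip(w, x))      # estimated echo
        e = mic[n] - y                                # echo-suppressed output
        norm = sum(xk * xk for xk in x) + eps
        w = [wk + mu * e * xk / norm for wk, xk in zip(w, x)]
        out.append(e)
    return out

# If the mic picks up only the reference scaled by 0.5 (pure echo, no voice),
# the residual shrinks toward zero as the filter converges.
ref = [1.0, -1.0] * 50
mic = [0.5 * r for r in ref]
residual = nlms_echo_cancel(mic, ref)
```

In the real system the microphone signal would be voice plus echo, and only the echo component is removed.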
The second application, an ordinary recording application (which may also be an applet), only needs to acquire the microphone (mic) signal, not a reference signal; it typically requires 1 or 2 channels, has no noise reduction module, and has no noise reduction requirement.
The audio bus 20 may be any communication bus capable of transmitting audio data, for example an Inter-IC Sound bus (I2S), also referred to as an integrated-circuit built-in audio bus.
In the embodiments of the present application, the audio data required by each of the at least two applications can be screened from the N-channel audio data acquired over a single audio bus, so the audio data acquisition requirements of at least two applications can be met with only one group of audio buses, reducing the number of audio buses required when at least two applications need to acquire audio data at the same time.
Optionally, as shown in fig. 1, the audio processing system 100 may further include M microphones (41, 42, ..., 4M in fig. 1);
the audio processor 10 is configured to obtain M channels of microphone raw data from the M microphones and P channels of raw reference data from a reference audio channel;
the audio processor 10 is further configured to process the M channels of microphone raw data into M channels of microphone audio data, and the P channels of raw reference data into P channels of reference audio data; the N-channel audio data includes the M channels of microphone audio data and the P channels of reference audio data; N is the sum of M and P, M is an integer greater than or equal to 1, and P is an integer greater than or equal to 1. For ease of understanding, fig. 1 shows the case where M is greater than or equal to 3.
In an embodiment of the present application, the reference audio channel may be an audio transmission channel established between an application with an audio playing function (e.g., a music application) and the audio processor 10. The application with the audio playing function may be installed on the data processing device 30.
After the audio processor 10 acquires the M channels of microphone raw data from the M microphones, it may process them; similarly, after acquiring the P channels of raw reference data from the reference audio channel, it may process them into the P channels of reference audio data. For example, the audio processor 10 may be an audio digital signal processor (ADSP), which may encode/decode, analog-to-digital convert, and encapsulate the raw audio data; the processed audio data is easier for the first application and the second application to process further.
The following method embodiments may be applied to the audio processing system shown in fig. 1.
Referring to fig. 2, fig. 2 is a flowchart of an audio data processing method according to an embodiment of the application. As shown in fig. 2, the method includes the following steps.
201: The data processing device obtains N-channel audio data from the audio processor through the audio bus.
Wherein N is an integer greater than or equal to 2.
In the embodiment of the present application, the N-channel audio data that the data processing device acquires from the audio processor through the audio bus can be provided to at least two applications (e.g., a first application and a second application).
Optionally, the N-channel audio data includes M channels of microphone audio data and P channels of reference audio data; N is the sum of M and P, M is an integer greater than or equal to 1, and P is an integer greater than or equal to 1.
When the first application is an intelligent voice application, it needs noise reduction to improve the success rate of speech recognition: when processing the N-channel audio data, the first application can use the P channels of reference audio data to denoise the M channels of microphone audio data, filtering out noise (e.g., music) in the voice collected by the microphones and thereby improving its speech recognition success rate.
P may be equal to 1 or 2. When P=1 and M=1, the P-channel reference audio data may include left-channel or right-channel reference audio data, and the M-channel microphone audio data may include the audio data collected by a microphone near the driver's seat or near the front passenger seat. For example, if the driver's seat is on the left side of the front row, the reference audio data may be the left-channel reference audio data and the microphone audio data may be collected by a microphone near the driver's seat; if the driver's seat is on the right side, the reference audio data may be the right-channel reference audio data and the microphone audio data may again be collected by a microphone near the driver's seat.
When P=2 and M=2, the P-channel reference audio data may include the left-channel and right-channel reference audio data, and the M-channel microphone audio data may include the audio data collected by microphones near the driver's seat and near the front passenger seat.
When P=2 and M=4, the P-channel reference audio data may include the left-channel and right-channel reference audio data, and the M-channel microphone audio data may include the audio data collected by microphones near the driver's seat, near the front passenger seat, near the second-row left seat, and near the second-row right seat.
The cases where P=2 and M=6, 8, etc. follow the same pattern and are not repeated here.
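The P/M configurations above can be summarized in a small sketch (channel names and ordering are hypothetical; the text does not fix a channel order):

```python
# Hypothetical channel layout for a given configuration: M microphone
# channels followed by P reference channels, N = M + P in total. The
# names (mic_0, ref_left, ...) are illustrative only.

def channel_layout(p, m):
    refs = ["ref_left", "ref_right"][:p]
    mics = [f"mic_{i}" for i in range(m)]   # e.g. mic_0 near the driver's seat
    return mics + refs                      # N = M + P channels in total

# P=2, M=4: the configuration with microphones at the front seats and the
# second-row seats.
layout = channel_layout(p=2, m=4)
```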
Optionally, the data processing device acquires the N-channel audio data from the audio processor through the audio bus by the following steps:
(11) the data processing device receives, through the audio bus, K audio data frames output by the audio processor, each of the K audio data frames including frame data of the N channels, K being a positive integer;
(12) the data processing device places the K audio data frames into an audio buffer in the order in which they are received, obtaining the N-channel audio data.
In the embodiment of the present application, the audio processor can collect raw audio data from the M microphones at a set sampling frequency and bit depth, and can simultaneously obtain reference audio data, at the same sampling frequency and bit depth, from a running application with an audio playing function (e.g., a music application). The audio processor may encode/decode, analog-to-digital convert, and encapsulate the collected raw audio data and reference audio data into the K audio data frames.
After receiving the K audio data frames output by the audio processor through the audio bus, the data processing device places them into the audio buffer in receive order, obtaining the N-channel audio data.
Each of the K audio data frames includes frame data of the N channels, and together the frame data in the K frames constitute the N-channel audio data: the audio data of each of the N channels is the combination of that channel's frame data across the K audio data frames.
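Steps (11) and (12) can be sketched as follows (hypothetical names; for simplicity each frame is modeled as carrying one sample of frame data per channel):

```python
# Sketch of steps (11)/(12): K received audio data frames, each with frame
# data for all N channels, are appended to the audio buffer in arrival
# order; each channel's audio data is then the concatenation of that
# channel's frame data across the K frames.

def frames_to_channels(frames, n_channels):
    """frames: list of K frames; each frame is a list of n_channels samples."""
    buffer = []                               # audio buffer, receive order
    for frame in frames:
        assert len(frame) == n_channels       # every frame carries N channels
        buffer.append(frame)
    # channel c's audio data = its frame data from each of the K frames
    return [[frame[c] for frame in buffer] for c in range(n_channels)]

# K = 3 frames, N = 2 channels
channels = frames_to_channels([[1, 10], [2, 20], [3, 30]], n_channels=2)
```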
202: Screen the audio data of the a channels required by a target application from the N-channel audio data.
Wherein the target application is any one of at least two applications installed by the data processing device, and a is a positive integer less than or equal to N.
In one possible embodiment, when a=n, the target application is an intelligent voice application, in order to improve the success rate of voice recognition, the target application may need as much audio data as possible, and the first application may not perform filtering on the N-channel audio data obtained from the audio bus by the data processing device, but may all be used for processing.
In another possible embodiment, a < N, where the target application is a normal recording application, in order to reduce complexity of speech processing, the target application may screen audio data of a channels required by the target application from audio data of N channels.
In the embodiment of the application, the audio data of the N channels may include the audio data of the microphone channel and the audio data of the reference channel. If the target application does not have noise reduction processing requirements (for example, the target application is a common recording application), the audio data of the reference channel is not required to be acquired, and only the audio data of the microphone channel is required to be acquired. The audio data of a channels required by the target application can be screened from the audio data of the N channels. For example, when the target application needs to acquire audio data of one channel, the audio data of one microphone channel needs to be screened from the audio data of the N channels. When the target application needs to acquire the audio data of at least two channels, the audio data of at least two microphone channels need to be screened from the audio data of the N channels.
For example, some target applications support only single-channel audio processing. If only one microphone is installed in the vehicle, the target application needs the audio data from that microphone channel, and the audio data of that microphone channel can be screened from the audio data of the N channels.
As another example, if such a single-channel target application runs in a vehicle with 2 or more microphones, it needs to screen the audio data of the desired channel from the audio data acquired through those microphone channels.
Similarly, some target applications support two-channel audio processing; if 2 or more microphones are installed in the vehicle, the target application needs to screen the audio data of the desired two channels from the audio data acquired through those microphone channels.
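The channel-selection rule in the examples above can be sketched as a small helper. This is illustrative only (the class and method names are not from the patent), and it assumes the common interleaving where the M microphone channels occupy the first indices and the P reference channels follow:

```java
// Illustrative sketch (names not from the patent): pick the channel indices an
// application needs, assuming mic channels occupy indices 0..M-1 and the P
// reference channels occupy indices M..N-1.
class ChannelSelection {
    // Returns the a microphone-channel indices for an app with no
    // noise-reduction requirement (it never needs reference channels).
    static int[] requiredMicChannels(int a, int m) {
        if (a > m) {
            throw new IllegalArgumentException("app needs more mic channels than installed");
        }
        int[] indices = new int[a];
        for (int i = 0; i < a; i++) {
            indices[i] = i; // first a of the M microphone channels
        }
        return indices;
    }
}
```

An application with no noise-reduction requirement would then request only indices drawn from the microphone range 0..M-1.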
After the target application obtains the audio data of the a channels, it can perform subsequent processing on that data. For a map application, for example, after the voice recognition function is awakened, the user may utter a voice signal (such as a place name); once the map application has obtained the audio data of one channel or of at least two channels by the method of fig. 2, it can perform voice recognition, generate a corresponding search term, search for the corresponding place, and generate a navigation route.
Optionally, the N-channel audio data are encapsulated in K audio data frames, where the K audio data frames are stored in an audio buffer, and each of the K audio data frames includes N-channel frame data, where K is a positive integer.
In the embodiment of the application, the N-channel audio data that the data processing device acquires from the audio processor through the audio bus can be stored in the audio buffer. Specifically, the data processing device may obtain K audio data frames from the audio processor through the audio bus, where each of the K audio data frames includes frame data of the N channels, and K is a positive integer.
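As a minimal sketch of this layout (all names hypothetical), each audio data frame can be modeled as one interleaved group of N samples, and the audio buffer as a queue of K such frames kept in receive order:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Minimal model (all names hypothetical): an audio data frame is one group of
// N interleaved samples; the audio buffer queues K such frames in receive order.
class AudioFrames {
    static short[] makeFrame(int n, short seed) {
        short[] frame = new short[n]; // one sample per channel, interleaved
        for (int ch = 0; ch < n; ch++) {
            frame[ch] = (short) (seed + ch);
        }
        return frame;
    }

    // Fills an audio buffer with K N-channel frames, oldest first.
    static Deque<short[]> fillAudioBuffer(int k, int n) {
        Deque<short[]> audioBuffer = new ArrayDeque<>();
        for (int i = 0; i < k; i++) {
            audioBuffer.addLast(makeFrame(n, (short) (i * 100)));
        }
        return audioBuffer;
    }
}
```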
Optionally, in step 202, the screening of audio data of a channels required by the target application from audio data of N channels may specifically include the following steps:
(21) Creating a first buffer, where the ratio of the capacity of the first buffer to the capacity of the audio buffer is greater than or equal to a/N;
(22) Screening out frame data of a channels from the frame data of the N channels contained in each of the K audio data frames, to obtain a first screening data frame corresponding to each audio data frame, where each first screening data frame contains the frame data of the a channels screened from the corresponding audio data frame;
(23) Placing each first screening data frame into the first buffer according to the receiving time order of its corresponding audio data frame, to obtain the audio data of the a channels required by the target application.
In the embodiment of the application, the screening of the audio data of a channels from the audio data of N channels can be implemented by creating a thread and a buffer; both may be implemented in Java code.
The size of the created buffer may be determined by the number of channels to be screened; in general, the more channels, the larger the buffer. When the number of screened channels is a, the first buffer can be sized so that the ratio of its capacity to that of the audio buffer is greater than or equal to a/N. The size of the first buffer may also be determined from a correspondence between the number of channels and buffer size (the two may be positively correlated, in particular directly proportional). The audio data of the screened a channels are placed into the first buffer so that the target application can subsequently fetch them from it. The first buffer may be a first-in first-out (First Input First Output, FIFO) buffer. Placing the screened a-channel audio data into the first buffer in the receiving time order of the K audio data frames ensures that the target application obtains the audio data in time order, prevents the data it obtains from being out of order in time, and improves its processing of the audio data.
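Step (21) can be sketched in Java, the language the text itself mentions. The ceiling-division sizing rule below (so that the capacity ratio stays at or above a/N) is one reasonable reading of the text, not the patent's mandated formula:

```java
import java.util.concurrent.ArrayBlockingQueue;

// Sketch of step (21): size the first buffer so that
// capacity(first) / capacity(audio) >= a / N. Ceiling division is one way to
// guarantee the ratio; the patent does not mandate a specific formula.
class FirstBufferFactory {
    static ArrayBlockingQueue<short[]> createFirstBuffer(int audioBufferCapacity, int a, int n) {
        int capacity = (audioBufferCapacity * a + n - 1) / n; // ceil(audioCap * a / n)
        return new ArrayBlockingQueue<>(capacity); // FIFO, as the text suggests
    }
}
```

`ArrayBlockingQueue` is a natural fit here because it is inherently first-in first-out, matching the FIFO behavior the text requires of the first buffer.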
Optionally, the frame data of the a channels screened from each audio data frame are the frame data of the same a channels in every frame.
In the embodiment of the present application, each channel of the N-channel audio data may include K frames of data. The following takes a=1, N=4, M=2, and P=2 as an example. Referring to fig. 3, fig. 3 is a schematic diagram of 1-channel demand data processing according to an embodiment of the application. As shown in fig. 3, each of the K frames includes left-channel microphone audio data (MIC_L), right-channel microphone audio data (MIC_R), left-channel reference audio data (REF_L), and right-channel reference audio data (REF_R). The left-channel microphone audio data of each of the K frames is screened out, yielding the left-channel microphone audio data contained in the K frames. The screened data are arranged in time order from the first frame to the Kth frame and enter the first-in first-out first buffer in that order. When the target application needs to process the left-channel microphone audio data of the K frames, the data are fetched from the first buffer according to a first-in first-out policy.
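The fig. 3 flow for a=1 can be sketched as follows, assuming (as in fig. 3) the per-frame interleaving order MIC_L, MIC_R, REF_L, REF_R; the class and method names are illustrative:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of the fig. 3 flow (a = 1): keep only MIC_L from each interleaved
// 4-channel frame, preserving first-to-Kth frame order. Names are illustrative.
class OneChannelFilter {
    static final int MIC_L = 0; // assumed interleaving: MIC_L, MIC_R, REF_L, REF_R

    static Deque<Short> filterMicLeft(short[][] frames) {
        Deque<Short> firstBuffer = new ArrayDeque<>(); // FIFO first buffer
        for (short[] frame : frames) {
            firstBuffer.addLast(frame[MIC_L]); // each frame keeps only MIC_L
        }
        return firstBuffer;
    }
}
```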
Optionally, in step 202, the screening of audio data of a channels required by the target application from audio data of N channels may specifically include the following steps:
If a is equal to 1, audio data of a first channel is screened from the microphone audio data of the M channels, where the first channel is one of the M channels.
In the embodiment of the application, the read function may be a Java read function that continuously reads the audio data stream.
Referring to fig. 4, fig. 4 is a schematic diagram illustrating a data flow direction according to an embodiment of the present application. As shown in fig. 4, the data processing device may include an audio platform layer, an audio hardware abstraction layer, and a kernel layer.
In the embodiment of the present application, the audio platform layer may be AudioFlinger, which serves as the Android audio framework layer; it can continuously acquire data from the audio HAL through a created thread (such as the first thread/second thread in fig. 4), and also provides control logic, resampling, and similar functions.
The audio hardware abstraction layer may be the audio HAL, which can execute a read function (the first read function/second read function) to read audio data from the kernel layer.
The kernel layer may store the audio data acquired from the audio processor by DMA over the I2S bus.
The at least two applications may include a first application and a second application, and the target application may be either of them. The first application (e.g., an intelligent voice APP) and the second application (e.g., an ordinary recording APP) may obtain audio data through the native Android AudioFlinger layer. The AudioFlinger layer creates a corresponding thread that cyclically acquires audio data from the audio HAL layer, stores the audio data in a corresponding FIFO buffer, and provides it for the application to fetch. The read function of the audio HAL layer can acquire data from the kernel layer through a tinyalsa module (an Android audio processing framework layer); the intelligent voice APP and the ordinary recording APP in the embodiment of the application thus acquire data from the same I2S bus.
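The thread-and-FIFO pattern described above can be sketched as follows. The `HalReader` interface is a hypothetical stand-in for the audio HAL read function; it is not an Android API:

```java
import java.util.concurrent.BlockingQueue;

// Sketch of the capture loop: a thread cyclically reads N-channel frames from
// the HAL and queues them for the application. HalReader is a hypothetical
// stand-in for the audio HAL read function, not an Android API.
class CaptureThread extends Thread {
    interface HalReader {
        short[] read(); // one interleaved frame, or null when the stream ends
    }

    private final HalReader hal;
    private final BlockingQueue<short[]> fifo;

    CaptureThread(HalReader hal, BlockingQueue<short[]> fifo) {
        this.hal = hal;
        this.fifo = fifo;
    }

    @Override
    public void run() {
        short[] frame;
        while ((frame = hal.read()) != null) {
            try {
                fifo.put(frame); // hand the frame to the app-facing FIFO
            } catch (InterruptedException e) {
                return;
            }
        }
    }
}
```

A blocking FIFO lets the application consume frames at its own pace while the capture thread keeps draining the HAL.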
The embodiment of the application may connect to an audio processor (such as an external ADSP) through one group of I2S buses. Two channels of MIC data (MIC_L and MIC_R) and two channels of reference signals can be transmitted to I2S1 by DMA (direct memory access). Both the first application (such as the intelligent voice APP) and the second application (such as the ordinary recording APP) obtain their data from the I2S1 hardware: the first application can obtain all four channels of data, and the second application can obtain one or two channels of MIC data by the method of the embodiment of the application.
The following describes in detail, with reference to fig. 3 and fig. 4, the case (a=1) in which the target application needs to acquire audio data of one channel.
When the target application requires 1-channel data, the target application (for example, an ordinary recording APP) can submit a 1-channel acquisition request to the audio platform layer (AudioFlinger), which creates a thread to cyclically acquire data from the audio hardware abstraction layer (audio HAL) and provides a first buffer of size size1. The audio HAL acquires the same 4 channels as the intelligent voice application from the kernel layer, using a buffer of 4 x size1 to fetch the 4-channel data; it then screens out 1 channel from that data, fills it into the first buffer of AudioFlinger, and provides it to the target application.
As can be seen from fig. 3, there are K audio data frames, and the data the audio hardware abstraction layer (audio HAL) obtains from the kernel layer with a size of 4 x size1 likewise spans K frames. In a software implementation, all audio blocks in the K frames can be traversed and the MIC_L data screened out by its position in the interleaving order; each frame keeps only its MIC_L data, and all the MIC_L data is filled into the first buffer and provided to the AudioFlinger layer, from which the ordinary recording APP acquires the corresponding data.
In the embodiment of the present application, each channel of the N-channel audio data may include K frames of data. The following takes a=2, N=4, M=2, and P=2 as an example. Referring to fig. 5, fig. 5 is a schematic diagram of 2-channel demand data processing according to an embodiment of the application. As shown in fig. 5, each of the K frames includes left-channel microphone audio data (MIC_L), right-channel microphone audio data (MIC_R), left-channel reference audio data (REF_L), and right-channel reference audio data (REF_R). The left-channel and right-channel microphone audio data of each of the K frames are screened out, yielding the left-channel and right-channel microphone audio data contained in the K frames. The screened data are arranged in time order from the first frame to the Kth frame and enter the first-in first-out first buffer in that order. When the target application needs to process the left-channel and right-channel microphone audio data of the K frames, the data are fetched from the first buffer according to a first-in first-out policy.
Optionally, in step 202, the screening of audio data of a channels required by the target application from audio data of N channels may specifically include the following steps:
If a is equal to 2, audio data of a second channel and audio data of a third channel are screened from the microphone audio data of the M channels, where the second channel is one of the M channels and the third channel is one of the M channels other than the second channel.
In the embodiment of the application, the read function may be a Java read function that continuously reads the audio data stream.
The second channel and the third channel may be channels corresponding to different microphone positions among the M channels. For example, the second channel corresponds to a microphone arranged on the left side of the vehicle and is the left channel; the third channel corresponds to a microphone arranged on the right side of the vehicle and is the right channel.
The case where the target application needs to acquire audio data of 2 channels is described in detail below with reference to fig. 5 and 4.
When the target application requires 2-channel data, the target application (for example, an ordinary recording APP) can submit a 2-channel acquisition request to the audio platform layer (AudioFlinger), which creates a thread to cyclically acquire data from the audio hardware abstraction layer (audio HAL) and provides a first buffer of size 2 x size1. The audio HAL acquires the same 4 channels as the intelligent voice application from the kernel layer, using a buffer of 4 x size1 to fetch the 4-channel data; it then screens out 2 channels from the acquired 4-channel data, fills them into the first buffer of AudioFlinger, and provides them to the target application.
As can be seen from fig. 5, there are K audio data frames, and the data the audio hardware abstraction layer (audio HAL) obtains from the kernel layer with a size of 4 x size1 likewise spans K frames. In a software implementation, all audio blocks in the K frames can be traversed and the MIC_L and MIC_R data screened out by their positions in the interleaving order; each frame keeps only its MIC_L and MIC_R data, and all the MIC_L and MIC_R data is filled into the first buffer and provided to the AudioFlinger layer, from which the ordinary recording APP acquires the corresponding data.
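The fig. 5 flow for a=2 can be sketched in the same way, again assuming the interleaving order MIC_L, MIC_R, REF_L, REF_R and using illustrative names; the screened pair stays interleaved and frame-ordered:

```java
// Sketch of the fig. 5 flow (a = 2): keep MIC_L and MIC_R from each frame,
// assuming the interleaving MIC_L, MIC_R, REF_L, REF_R. Output stays
// frame-ordered and interleaved as [MIC_L, MIC_R] pairs.
class TwoChannelFilter {
    static short[] filterMicPair(short[][] frames) {
        short[] out = new short[frames.length * 2];
        for (int i = 0; i < frames.length; i++) {
            out[2 * i] = frames[i][0];     // MIC_L
            out[2 * i + 1] = frames[i][1]; // MIC_R
        }
        return out;
    }
}
```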
In the embodiment of the application, the audio data acquisition requirements of at least two applications can be met with only one group of audio buses; when at least two applications need to acquire audio data simultaneously, the number of audio buses required is reduced.
The solution of the embodiment of the present application has been described above mainly from the perspective of the method procedure. It will be appreciated that the data processing apparatus, in order to implement the above functions, includes corresponding hardware structures and/or software modules for performing each function. Those skilled in the art will readily appreciate that the various illustrative units and algorithm steps described in connection with the embodiments disclosed herein may be implemented as hardware or as combinations of hardware and computer software. Whether a function is implemented as hardware or as computer-software-driven hardware depends on the particular application and design constraints of the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The embodiment of the present application may divide the data processing apparatus into functional units according to the above method example; for example, each functional unit may correspond to one function, or two or more functions may be integrated into one processing unit. The integrated unit may be implemented in hardware or as a software functional unit. It should be noted that the division of units in the embodiment of the present application is schematic and is merely a logical function division; other division manners may be used in actual implementation.
Referring to fig. 6, fig. 6 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application, where the data processing apparatus 600 may include an obtaining unit 601 and a filtering unit 602, where:
an obtaining unit 601, configured to obtain the audio data of the N channels from the audio processor through an audio bus; N is an integer greater than or equal to 2;
and a screening unit 602, configured to screen audio data of a channels required by a target application from the audio data of the N channels, where the target application is any one of at least two applications installed by the data processing device, and a is a positive integer less than or equal to N.
Optionally, the audio data of the N channels includes microphone audio data of the M channels and reference audio data of the P channels; N is the sum of M and P, M is an integer greater than or equal to 1, and P is an integer greater than or equal to 1.
Optionally, the N-channel audio data are encapsulated in K audio data frames, where the K audio data frames are stored in an audio buffer, and each of the K audio data frames includes N-channel frame data, where K is a positive integer.
Optionally, the screening the audio data of the a channels required by the target application from the audio data of the N channels includes:
creating a first buffer, wherein the capacity ratio of the first buffer to the audio buffer is greater than or equal to a/N; screening out frame data of a channels from frame data of N channels contained in each audio data frame in the K audio data frames respectively to obtain a first screening data frame corresponding to each audio data frame; wherein each first screening data frame comprises frame data of a channel screened from the corresponding audio data frame; and placing each first screening data frame into the first buffer according to the receiving time sequence of the audio data frames corresponding to each first screening data frame to obtain the audio data of a channels required by the target application.
Optionally, the frame data of the a channels screened from each audio data frame are the frame data of the same a channels.
Optionally, the filtering unit 602 filters audio data of a channels required by the target application from the audio data of N channels, including:
if a is equal to 1, screening audio data of a first channel from the microphone audio data of the M channels, where the first channel is one of the M channels;
and if a is equal to 2, screening audio data of a second channel and audio data of a third channel from the microphone audio data of the M channels, where the second channel is one of the M channels and the third channel is one of the M channels other than the second channel.
Optionally, the second channel and the third channel are channels corresponding to different microphone orientations in the M channels.
Optionally, the acquiring unit 601 acquires the N-channel audio data from the audio processor through the audio bus by: receiving K audio data frames output by the audio processor through the audio bus; and putting the K audio data frames into the audio buffer in receiving time order to obtain the audio data of the N channels.
The acquiring unit 601 in the embodiment of the present application may be an audio bus in the data processing device, and the filtering unit 602 may be a processor in the data processing device.
In the embodiment of the application, the audio data acquisition requirements of at least two applications can be met with only one group of audio buses; when at least two applications need to acquire audio data simultaneously, the number of audio buses required is reduced.
Referring to fig. 7, fig. 7 is a schematic structural diagram of another data processing apparatus according to an embodiment of the present application. As shown in fig. 7, the data processing apparatus 700 includes a processor 701 and a memory 702, which may be connected to each other through a communication bus 703. The communication bus 703 may be a peripheral component interconnect standard (Peripheral Component Interconnect, PCI) bus, an extended industry standard architecture (Extended Industry Standard Architecture, EISA) bus, or the like. The communication bus 703 may be classified into an address bus, a data bus, a control bus, and so on; the data bus may include, for example, an audio bus, specifically an I2S bus. For ease of illustration, only one thick line is shown in fig. 7, but this does not mean there is only one bus or only one type of bus. The memory 702 is used to store a computer program comprising program instructions, and the processor 701 is configured to invoke the program instructions to perform part or all of the steps of the method in fig. 2.
The processor 701 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling execution of the above programs.
The memory 702 may be, but is not limited to, read-only memory (ROM) or another type of static storage device that can store static information and instructions, random access memory (RAM) or another type of dynamic storage device that can store information and instructions, electrically erasable programmable read-only memory (EEPROM), compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. The memory may be standalone and coupled to the processor via a bus, or it may be integrated with the processor.
In addition, the data processing device 700 may further include general components such as a communication interface (e.g., a USB interface, a microphone interface, etc.), an antenna, etc., which are not described in detail herein.
In the embodiment of the application, the audio data acquisition requirements of at least two applications can be met with only one group of audio buses; when at least two applications need to acquire audio data simultaneously, the number of audio buses required is reduced.
The embodiment of the present application also provides a computer-readable storage medium storing a computer program for electronic data exchange, the computer program causing a computer to execute part or all of the steps of any one of the audio data processing methods described in the above method embodiments.
It should be noted that, for simplicity of description, the foregoing method embodiments are described as a series of acts, but those skilled in the art will understand that the present application is not limited by the order of acts described, as some steps may be performed in other orders or concurrently. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules involved are not necessarily required by the present application.
In the foregoing embodiments, the descriptions of the embodiments are emphasized, and for parts of one embodiment that are not described in detail, reference may be made to related descriptions of other embodiments.
In the several embodiments provided by the present application, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative: the division of the units is merely a logical function division, and other division manners may exist in actual implementation; for instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings or direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, devices, or units, and may be electrical or take other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units described above may be implemented either in hardware or in software program modules.
The integrated units, if implemented in the form of software program modules and sold or used as a stand-alone product, may be stored in a computer-readable memory. Based on this understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a memory, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods of the embodiments of the present application. The aforementioned memory includes: a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disc, or other media capable of storing program code.
Those of ordinary skill in the art will appreciate that all or a portion of the steps in the various methods of the above embodiments may be implemented by a program that instructs associated hardware, and the program may be stored in a computer readable memory, which may include: flash disk, read-only memory, random access memory, magnetic or optical disk, etc.
The embodiments of the present application have been described above in detail; the principles and implementations of the application are explained herein using specific examples, which serve only to help understand the method and core ideas of the application. Meanwhile, those skilled in the art may make changes to the specific implementations and application scope in accordance with the ideas of the present application. In view of the above, the contents of this description should not be construed as limiting the present application.

Claims (12)

1. An audio data processing method, characterized in that the method is applied to a data processing device, the method comprising:
the data processing device acquires audio data of N channels from an audio processor through an audio bus; N is an integer greater than or equal to 2;
and screening the audio data of a channels required by a target application from the audio data of the N channels, wherein the target application is any one of at least two applications installed by the data processing equipment, and a is a positive integer less than or equal to N.
2. The method of claim 1, wherein the N-channel audio data is encapsulated in K audio data frames, the K audio data frames being stored in an audio buffer, each of the K audio data frames comprising N-channel frame data, K being a positive integer.
3. The method according to claim 2, wherein the screening the audio data of a channels required for the target application from the audio data of N channels includes:
creating a first buffer, wherein the capacity ratio of the first buffer to the audio buffer is greater than or equal to a/N;
screening out frame data of a channels from frame data of N channels contained in each audio data frame in the K audio data frames respectively to obtain a first screening data frame corresponding to each audio data frame; wherein each first screening data frame comprises frame data of a channel screened from the corresponding audio data frame;
and placing each first screening data frame into the first buffer according to the receiving time sequence of the audio data frames corresponding to each first screening data frame to obtain the audio data of a channels required by the target application.
4. A method according to claim 3, wherein the frame data of the a channels selected from each of the frames of audio data are all the same a-channel frame data.
5. The method of claim 1, wherein the N-channel audio data comprises M-channel microphone audio data and P-channel reference audio data; n is the sum of M and P, M is an integer greater than or equal to 1, and P is an integer greater than or equal to 1.
6. The method of claim 5, wherein the screening the audio data of a channels required for the target application from the audio data of N channels comprises:
if a is equal to 1, screening audio data of a first channel from the audio data of the microphones of the M channels, wherein the first channel is one of the M channels;
and if a is equal to 2, screening out audio data of a second channel and audio data of a third channel from the audio data of the microphone of the M channels, wherein the second channel is one of the M channels, and the third channel is one of the M channels except the second channel.
7. The method of claim 6, wherein the second channel and the third channel are channels corresponding to different microphone orientations in the M channels.
8. The method of claim 2, wherein the data processing device obtaining N-channel audio data from the audio processor over the audio bus, comprising:
receiving K audio data frames output by the audio processor through the audio bus;
and putting the K audio data frames into the audio buffer according to the receiving time sequence to obtain the audio data of the N channels.
9. An audio processing system, comprising: the device comprises an audio processor, an audio bus and data processing equipment, wherein the audio processor is connected with the data processing equipment through the audio bus;
the data processing device is used for acquiring N-channel audio data from the audio processor through the audio bus; n is an integer greater than or equal to 2;
the data processing device is further configured to screen audio data of a channels required by a target application from the audio data of the N channels, where the target application is any one of at least two applications installed by the data processing device, and a is a positive integer less than or equal to N.
10. The system of claim 9, further comprising M microphones;
The audio processor is used for acquiring M paths of microphone original data from the M microphones and acquiring P paths of original reference data from a reference audio channel;
the audio processor is further configured to process the M paths of microphone original data to obtain M paths of microphone audio data, and process the P paths of original reference data to obtain P paths of reference audio data; the audio data of the N channels comprise microphone audio data of the M channels and reference audio data of the P channels; n is the sum of M and P, M is an integer greater than or equal to 1, and P is an integer greater than or equal to 1.
11. A data processing apparatus comprising a processor and a memory, the memory for storing a computer program, the computer program comprising program instructions, the processor being configured to invoke the program instructions to perform the method of any of claims 1-8.
12. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program comprising program instructions which, when executed by a processor, cause the processor to perform the method of any of claims 1-8.
CN202210173756.0A 2022-02-24 2022-02-24 Audio data processing method, system, data processing device and storage medium Pending CN116709112A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210173756.0A CN116709112A (en) 2022-02-24 2022-02-24 Audio data processing method, system, data processing device and storage medium


Publications (1)

Publication Number Publication Date
CN116709112A true CN116709112A (en) 2023-09-05

Family

ID=87839776

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210173756.0A Pending CN116709112A (en) 2022-02-24 2022-02-24 Audio data processing method, system, data processing device and storage medium

Country Status (1)

Country Link
CN (1) CN116709112A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117472321A (en) * 2023-12-28 2024-01-30 广东朝歌智慧互联科技有限公司 Audio processing method and device, storage medium and electronic equipment
CN117472321B (en) * 2023-12-28 2024-09-17 广东朝歌智慧互联科技有限公司 Audio processing method and device, storage medium and electronic equipment

Similar Documents

Publication Publication Date Title
US10867618B2 (en) Speech noise reduction method and device based on artificial intelligence and computer device
CN106910510A (en) Vehicle-mounted power amplifying device, vehicle and its audio play handling method
US11587560B2 (en) Voice interaction method, device, apparatus and server
CN110060685A (en) Voice awakening method and device
WO1999035009A1 (en) Vehicle computer system with audio entertainment system
CN211543441U (en) Active noise reduction system with low-delay interface
CN116709112A (en) Audio data processing method, system, data processing device and storage medium
CN110970051A (en) Voice data acquisition method, terminal and readable storage medium
CN113421578A (en) Audio processing method and device, electronic equipment and storage medium
CN109215675A (en) A kind of method, device and equipment of chauvent&#39;s criterion
CN109215666A (en) Intelligent Supports Made, the transmission method of audio signal, human-computer interaction method and terminal
CN104978966B (en) Frame losing compensation implementation method and device in audio stream
CN111417054B (en) Multi-audio-frequency data channel array generating method and device, electronic equipment and storage medium
US5832445A (en) Method and apparatus for decoding of digital audio data coded in layer 1 or 2 of MPEG format
CN113035223B (en) Audio processing method, device, equipment and storage medium
CN113053402B (en) Voice processing method and device and vehicle
CN111768791A (en) Audio playing method and device and vehicle
CN109599098A (en) Audio-frequency processing method and device
CN114501296A (en) Audio processing method and vehicle-mounted multimedia equipment
CN115703388A (en) Seat embedded speech sensor
CN115223582B (en) Audio noise processing method, system, electronic device and medium
CN113973149A (en) Electronic apparatus, device failure detection method and medium thereof
US7873424B1 (en) System and method for optimizing digital audio playback
CN116092465B (en) Vehicle-mounted audio noise reduction method and device, storage medium and electronic equipment
CN116841950A (en) Audio data transmission method, device, chip and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination