WO2022017007A1 - Audio data processing method, server, and storage medium - Google Patents


Info

Publication number
WO2022017007A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio data
program
audio
detection
module
Prior art date
Application number
PCT/CN2021/097794
Other languages
English (en)
French (fr)
Inventor
吴家平 (Wu Jiaping)
Original Assignee
腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Co., Ltd.)
Priority to EP21846422.0A (published as EP4047471A4)
Priority to KR1020227017498A (published as KR20220080198A)
Priority to JP2022548829A (published as JP7476327B2)
Publication of WO2022017007A1
Priority to US17/737,886 (published as US20220261217A1)


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/16 - Sound input; Sound output
    • G06F 3/162 - Interface to dedicated audio devices, e.g. audio drivers, interface to CODECs
    • G06F 3/165 - Management of the audio stream, e.g. setting of volume, audio stream path

Definitions

  • the present application relates to the field of computer technology, in particular to audio data processing.
  • A cloud application refers to an application running on a server.
  • the server runs the cloud application, generates corresponding audio data, and sends the audio data to the local application of the terminal for playback.
  • the local application of the terminal only needs to play the received audio data.
  • the server provides an AudioRecord (audio recording) interface for the audio capture program.
  • the server records the audio data through a recording thread; the audio collection program can call the AudioRecord interface to read the recorded audio data from the recording thread and send it to the local application of the terminal.
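As background, the conventional capture path above can be sketched as follows. This is a minimal Python sketch, not the actual Android AudioRecord API; the class and method names are hypothetical stand-ins for the recording thread and the AudioRecord-style read.

```python
import queue


class RecordingThread:
    """Stands in for the system framework's recording-thread buffer."""

    def __init__(self):
        self._frames = queue.Queue()

    def deposit(self, frame: bytes) -> None:
        # The framework records mixed audio data into this buffer.
        self._frames.put(frame)

    def read(self, timeout: float = 0.1) -> bytes:
        # An AudioRecord-style blocking read performed by the capture program.
        return self._frames.get(timeout=timeout)


def capture(thread: RecordingThread, n_frames: int) -> list:
    # The audio collection program polls the recording thread frame by frame,
    # adding one extra hop before audio reaches the terminal.
    return [thread.read() for _ in range(n_frames)]
```

The extra recording-thread hop in this path is precisely what the method below removes by connecting the transfer program directly to the audio collection program.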
  • an embodiment of the present application provides an audio data processing method applied to a server, where the server includes a cloud application program, a system framework, a transfer program, and an audio collection program; the method includes: inputting first audio data of the cloud application program into the system framework; processing the first audio data through the system framework to obtain second audio data, and sending the second audio data to the transfer program; and sending, by the transfer program, the second audio data to the audio collection program according to the communication connection between the transfer program and the audio collection program, where the audio collection program is used to send the second audio data to the local application program of the terminal.
  • an embodiment of the present application provides a delay acquisition method applied to a server, where the server includes a detection application program, a system framework, a transfer program, and an audio collection program; the method includes: inputting first detection audio data of the detection application program into the system framework, and recording the sending time of the first detection audio data; processing the first detection audio data through the system framework to obtain second detection audio data, and sending the second detection audio data to the transfer program; sending, by the transfer program, the second detection audio data to the audio collection program according to the communication connection between the transfer program and the audio collection program, and recording the first reception time at which the audio collection program receives the second detection audio data, where the audio collection program is used to send the second detection audio data to the local application program of the terminal; and acquiring a first time difference between the sending time and the first reception time, where the first time difference represents the delay in transmitting the detection audio data from the detection application program to the audio collection program.
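The first time difference described above amounts to a timestamp pair taken around the transmission path. A minimal Python sketch, assuming a `transmit` callable that stands in for the framework-to-transfer-program-to-collection-program path:

```python
import time


def first_time_difference(transmit) -> float:
    # Record the sending time as the detection audio data enters the framework.
    sending_time = time.monotonic()
    # The data traverses the system framework, transfer program,
    # and audio collection program.
    transmit()
    # Record the first reception time at the audio collection program.
    first_reception_time = time.monotonic()
    return first_reception_time - sending_time
```

A monotonic clock is used so that wall-clock adjustments on the server cannot distort the measured delay.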
  • an embodiment of the present application provides a server including an application running module, a framework running module, a transfer module, and a collection module; the application running module is used to input first audio data of the cloud application program into the framework running module; the framework running module is used to process the first audio data to obtain second audio data and send the second audio data to the transfer module; the transfer module is used to send the second audio data to the collection module according to the communication connection between the transfer module and the collection module; and the collection module is used to send the second audio data to the local application program of the terminal.
  • an embodiment of the present application provides a server including an application running module, a framework running module, a transfer module, a collection module, a recording module, and an acquisition module; the application running module is used to input first detection audio data of the detection application program into the framework running module; the recording module is used to record the sending time of the first detection audio data; the framework running module is used to process the first detection audio data to obtain second detection audio data and send the second detection audio data to the transfer module; the transfer module is used to send the second detection audio data to the collection module according to the communication connection between the transfer module and the collection module; the collection module is used to send the second detection audio data to the local application program of the terminal; the recording module is further used to record the first reception time at which the collection module receives the second detection audio data; and the acquisition module is used to acquire a first time difference between the sending time and the first reception time, where the first time difference represents the delay in transmitting the detection audio data from the application running module to the collection module.
  • an embodiment of the present application provides a computer-readable storage medium storing a computer program, where the computer program is used to execute the audio data processing method described in the above aspect, or to execute the delay acquisition method described in the above aspect.
  • embodiments of the present application provide a computer program product or computer program, where the computer program product or computer program includes computer program code, and the computer program code is stored in a computer-readable storage medium.
  • a processor of a computer device reads the computer program code from the computer-readable storage medium and executes it, so that the computer device implements the audio data processing method described in the above aspect, or implements the delay acquisition method described in the above aspect.
  • an embodiment of the present application provides a server, where the server includes:
  • a processor, a communication interface, a memory, and a communication bus;
  • the processor, the communication interface and the memory communicate with each other through the communication bus;
  • the communication interface is an interface of a communication module;
  • the memory is used for storing program code and transmitting the program code to the processor;
  • the processor is configured to invoke the instructions of the program code in the memory to execute the audio data processing method described in the above aspect; or, to execute the delay acquisition method described in the above aspect.
  • FIG. 1 is a schematic diagram of an implementation environment provided by an embodiment of the present application.
  • FIG. 2 is an optional schematic structural diagram of the distributed system provided by the embodiment of the present application applied to the blockchain system;
  • FIG. 3 is a flowchart of an audio data processing method provided by an embodiment of the present application.
  • FIG. 4 is a flowchart of audio data transmission in a process of delivering audio data to a terminal by a server provided by an embodiment of the present application;
  • FIG. 5 is a flowchart of an audio data processing method provided by an embodiment of the present application.
  • FIG. 6 is a flowchart of a hardware abstraction layer provided by an embodiment of the present application sending audio data to an audio acquisition program.
  • FIG. 7 is a flow chart of audio data transmission in a process of delivering audio data to a terminal by a server provided by an embodiment of the present application;
  • FIG. 8 is a flowchart of an audio data processing method provided by an embodiment of the present application.
  • FIG. 9 is a flowchart of a method for obtaining a delay provided by an embodiment of the present application.
  • FIG. 10 is a flowchart of a method for obtaining a delay provided by an embodiment of the present application.
  • FIG. 11 is a schematic diagram of multiple audio data output by a detection application provided by an embodiment of the present application.
  • FIG. 12 is a schematic diagram of a delay in acquiring audio data by multiple programs in a server provided by an embodiment of the present application.
  • FIG. 13 is a flowchart of a method for obtaining a delay provided by an embodiment of the present application.
  • FIG. 14 is a schematic diagram of a delay in acquiring audio data by multiple programs in a server provided by an embodiment of the present application.
  • FIG. 15 is a schematic diagram of a delay in acquiring audio data by multiple programs in a server provided by an embodiment of the present application.
  • FIG. 16 is a schematic structural diagram of an audio data processing apparatus provided by an embodiment of the present application.
  • FIG. 17 is a schematic structural diagram of another audio data processing apparatus provided by an embodiment of the present application.
  • FIG. 18 is a schematic structural diagram of another audio data processing apparatus provided by an embodiment of the present application.
  • FIG. 19 is a schematic structural diagram of a delay acquisition apparatus provided by an embodiment of the present application.
  • FIG. 20 is a structural block diagram of a terminal provided by an embodiment of the present application.
  • FIG. 21 is a schematic structural diagram of a server provided by an embodiment of the present application.
  • for example, without departing from the scope of the embodiments of the present application, first audio data may be referred to as second audio data, and similarly, second audio data may be referred to as first audio data.
  • Cloud application: an application running in a server. Optionally, the cloud application is a game application or an audio processing application.
  • Container: a container encapsulates the details necessary to run an application, such as the operating system.
  • Optionally, a server can run multiple containers, and each container can run a cloud application and an operating system, where the operating system is any operating system, such as the Android operating system or iOS (iPhone Operating System).
  • Hardware abstraction layer: the layer between the system framework and the hardware driver; it is responsible for receiving the audio data issued by the system framework and outputting the audio data to the hardware through the hardware driver.
  • System framework: a framework provided in the operating system; optionally, the audio processing framework (AudioFlinger) in the operating system.
  • Resampling program (RemoteSubmix): a module in the operating system that mixes the audio in the operating system and sends it to a remote end through the network.
  • Audio collection program: a program used to collect audio data from the operating system of the server. It can send the collected audio data to the encoding module (WebrtcProxy), which encodes the audio data and sends it to the local application program of the terminal. Optionally, when the cloud application is a cloud game program, the audio collection program is the CloudGame cloud game backend.
  • Audio recording interface (AudioRecord): the interface for audio data collection in the operating system; sources of audio data include the microphone, RemoteSubmix, and so on.
  • MixerThread: the thread responsible for mixing in the system framework.
  • Recording thread: the thread responsible for recording in the system framework.
  • FIG. 1 is a schematic diagram of an implementation environment provided by an embodiment of the present application.
  • the implementation environment includes: a terminal 101 and a server 102.
  • the terminal 101 and the server 102 can be directly or indirectly connected through wired or wireless communication, which is not limited in this application.
  • the terminal 101 is a device such as a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, etc., but is not limited thereto.
  • the server 102 is an independent physical server; optionally, the server 102 is a server cluster or a distributed system composed of multiple physical servers; optionally, the server 102 is a cloud server providing basic cloud computing services such as cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, CDN (Content Delivery Network), big data, and artificial intelligence platforms.
  • the server 102 runs a cloud application. During the running of the cloud application, the cloud application generates audio data, and the server 102 sends the audio data to the terminal 101, so that the terminal 101 can play the audio data generated by the cloud application without running the application.
  • the terminal 101 is installed with a local application program
  • the user can send a control instruction to the server 102 through the local application program
  • the cloud application program in the server 102 runs according to the control instruction, and generates audio data corresponding to the control instruction
  • the server 102 delivers the audio data to the terminal 101, so that the user can play the audio data through the local application on the terminal 101.
  • the terminal and the server involved in the embodiments of the present application are connected to form a distributed system.
  • taking a blockchain system as an example of the distributed system, refer to FIG. 2, which is an optional schematic structural diagram of the distributed system 200 provided in the embodiment of the present application applied to the blockchain system.
  • the distributed system is formed of nodes (computing devices in any form entering the network, such as servers and terminals) and clients 202, and a peer-to-peer (P2P, Peer To Peer) network is formed between the nodes.
  • the P2P protocol is an application layer protocol on top of the Transmission Control Protocol (TCP).
  • the involved functions include:
  • Routing: a basic function of a node, used to support communication between nodes.
  • a node can also have the following functions:
  • each server is a node in the blockchain, and data obtained by running the cloud applications on the multiple servers is synchronized.
  • the user controls the running of the cloud game through the terminal; the audio data processing method provided in the embodiment of the present application is used to send the audio data generated during the running of the cloud game to the terminal, and the terminal plays the audio data, so that the user can listen to the audio data during the game.
  • the server can send the audio data to the terminal faster, which reduces the delay of the audio data and enables the user to hear it sooner.
  • the embodiments of the present application can also be applied to other scenarios where the server runs the cloud application, and the embodiments of the present application do not limit the application scenarios.
  • FIG. 3 is a flowchart of an audio data processing method provided by an embodiment of the present application.
  • the execution body of the embodiment of the present application is a server. Referring to FIG. 3 , the method includes the following steps.
  • the cloud application is any application running in the server.
  • the cloud application is a game application, or the cloud application is an audio processing application. This embodiment of the present application does not limit the type of the cloud application.
  • the first audio data is audio data generated during the running of the cloud application.
  • 302. Process the first audio data through the system framework to obtain second audio data, and send the second audio data to the transfer program.
  • the system framework is a framework in the operating system of the server, and is used for processing audio data.
  • the transfer program is a program between the system framework and the audio collection program, and is used to transmit the audio data processed by the system framework to the audio collection program.
  • the transfer program has the function of forwarding audio data.
  • the transfer program can also have other functions, which is not limited in the embodiments of the present application.
  • a communication connection is established between the transfer program and the audio collection program, and the transfer program can directly send the second audio data to the audio collection program through the communication connection.
  • a local application program is installed on the terminal, and the local application program is an application program that supports the interaction between the terminal and the server.
  • after receiving the second audio data, the audio collection program sends the second audio data to the local application program of the terminal, so that the terminal plays the second audio data; the local application here is the terminal-local application in step 303.
  • in the embodiment of the present application, a transfer program is set between the system framework and the audio collection program, and a communication connection between the transfer program and the audio collection program is established, so that the audio data processed by the system framework can be sent directly to the audio collection program through the communication connection.
  • sending the audio data directly through the communication connection shortens the transmission link of the audio data and the time required for the audio collection program to obtain the audio data, thereby reducing the delay with which the server delivers audio data.
  • the transfer program in the above steps 302 and 303 is a hardware abstraction layer, the original resampling program in the operating system, or another program, which is not limited in this embodiment of the present application.
  • the embodiment of the present application describes the server by taking the transfer program as the hardware abstraction layer as an example.
  • the server 400 includes a cloud application program 401 , a system framework 402 , a hardware abstraction layer 403 and an audio collection program 404 .
  • the cloud application 401 can call the interface of the system framework 402 to write audio data into the system framework 402 through, for example, a mixing thread, and the system framework 402 can call the interface of the hardware abstraction layer 403 to write audio data into the hardware abstraction layer 403.
  • a communication connection is established between the hardware abstraction layer 403 and the audio collection program 404, through which audio data can be sent to the audio collection program 404.
  • the cloud application program 401, the system framework 402, the hardware abstraction layer 403, and the audio collection program 404 all run in the operating system container of the server 400.
  • the server 400 also includes an encoding program 405; the audio collection program 404 sends the audio data to the encoding program 405, and the encoding program 405 encodes the audio data and sends the encoded audio data to the local application program of the terminal.
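The direct communication connection between the hardware abstraction layer 403 and the audio collection program 404 can be sketched with a local socket pair standing in for whatever IPC channel the connection actually uses; the function name and channel choice are assumptions for illustration.

```python
import socket


def relay_frames(frames):
    # A local socket pair stands in for the established communication
    # connection between the HAL end and the audio collection end.
    hal_end, capture_end = socket.socketpair()
    received = []
    try:
        for frame in frames:
            # The HAL forwards the framework's output directly,
            # with no recording-thread hop in between.
            hal_end.sendall(frame)
            received.append(capture_end.recv(len(frame)))
    finally:
        hal_end.close()
        capture_end.close()
    return received
```

In the server described here, the capture end would hand each received frame to the encoding program 405 rather than collect it in a list.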
  • FIG. 5 is a flowchart of an audio data processing method provided by an embodiment of the present application.
  • the execution body is the server shown in FIG. 4 .
  • the method includes the following steps.
  • the cloud application is an application running in the server
  • the local application is an application installed on the terminal
  • the local application is an application that supports the interaction between the terminal and the server
  • the server can send the data generated during the running of the cloud application to the local application of the terminal, so that the terminal can display the data; therefore, the terminal can obtain the data generated by the cloud application without running the cloud application.
  • the user can also send an instruction to the server through the local application of the terminal; the server runs the cloud application according to the instruction and sends the data generated by the cloud application to the local application of the terminal, so that the terminal can control the running of the cloud application in the server and obtain the data it generates; therefore, the terminal can use the cloud application without installing and running it.
  • for example, the user triggers an operation for virtual character A to release skill a in the local application of the terminal; the local application responds to the operation and sends a skill release instruction to the cloud application in the server, where the skill release instruction carries the virtual character identifier of virtual character A and the skill identifier corresponding to skill a.
  • after receiving the skill release instruction, the cloud application renders the video data of virtual character A releasing skill a according to the instruction and sends the video data to the local application of the terminal, which displays it, so that the user can watch the picture of virtual character A releasing skill a.
  • that is, the operation of virtual character A releasing skill a is realized through cooperation between the cloud application program in the server and the local application program of the terminal.
  • the cloud application will generate audio data, and the server can send the audio data to the local application of the terminal, so that the terminal can play the audio data or store the audio data.
  • the cloud application acquires the first audio data according to the virtual character identifier and the skill identifier in the skill release instruction, and sends the first audio data to the local application of the terminal, where the first audio data is the skill release sound effect corresponding to virtual character A releasing skill a.
  • the local application of the terminal plays the first audio data, so that the user can hear the corresponding skill release sound effect when the virtual character A releases the skill a.
  • the cloud application stores multiple types of audio data, and the multiple types of audio data include the following types.
  • the background music is audio data played as the cloud application runs. Optionally, the cloud application stores one piece of background music, which is played in a loop as the cloud application runs; optionally, the cloud application stores multiple pieces of background music, which are played cyclically as the cloud application runs; or different background music suits different running stages, and the cloud application selects, from the multiple pieces of background music, the background music corresponding to the current running stage to play in a loop.
  • the cloud application can also render video data during the running process, and according to the rendered video data, the cloud application selects background music corresponding to the video data from a plurality of background music for loop playback.
  • the audio system notification is an audio notification message sent to the terminal during the running of the cloud application.
  • for example, the audio system notification is "The enemy will reach the battlefield in XX seconds" or "Our teammate XXX is besieged"; after receiving the audio system notification, the terminal plays it.
  • the operation sound effect is audio data played along with a user operation, giving the user an immersive experience. For example, if the user operates virtual character A to release a skill, the sound effect of releasing the skill is played, so that the user clearly perceives that the release operation was performed, creating a sense of immersion.
  • the cloud application can select audio data corresponding to the current running state from various types of audio data according to the current running state and send it to the terminal, where the first audio data is the audio data corresponding to the current running state.
  • the running state of the cloud application includes: a startup state of the cloud application, a state in which the cloud application executes an operation instruction, or a loading scene state of the cloud application, and the like.
  • the cloud application selects audio data corresponding to the startup state from multiple types of audio data, and the audio data is the first audio data.
  • the startup process of the cloud application means that the cloud application has begun starting but has not finished starting; at this time, the cloud application can already implement some functions, such as acquiring audio data and delivering audio data.
  • the audio data corresponding to the startup state is audio data of background music.
  • the cloud application is a game application.
  • the startup process takes a certain amount of time; therefore, during the startup process of the cloud application, audio data is sent to the terminal and played by the terminal, so that the user is not bored while waiting.
  • optionally, an operation instruction sent by the local application of the terminal is received; the cloud application, in response to the operation instruction, executes the operation corresponding to the operation instruction and selects, from the multiple types of audio data, the audio data corresponding to the operation instruction, and this audio data is the first audio data.
  • the cloud application is a game application.
  • the cloud application receives a skill release instruction sent by the terminal, where the skill release instruction carries a virtual character identifier and a skill identifier; in response to the skill release instruction, the cloud application controls the corresponding virtual character to release the corresponding skill and selects, from the multiple types of audio data, the audio data corresponding to the skill release.
  • one or more audio sources are included in the cloud application, and the multiple types of audio data are stored in the one or more audio sources.
  • each audio source stores one type of audio data, and different audio sources store different types of audio data.
  • the cloud application selecting the first audio data corresponding to the current running state from multiple types of audio data and sending it to the terminal includes: the cloud application reads, from any audio source, the first audio data corresponding to the current running state and sends it to the terminal; or, the cloud application determines a target audio source according to the current running state, reads the first audio data corresponding to the current running state from the target audio source, and sends it to the terminal.
  • the cloud application will first input the first audio data into the system framework for processing.
  • the system framework is a framework in an operating system, where the operating system is the Android system, iOS (iPhone Operating System), or the like; optionally, the system framework is the audio processing framework (AudioFlinger).
  • optionally, the first audio data includes multiple channels of audio data; the first audio data is mixed into one channel of audio data, and the third audio data obtained by the mixing process is one channel of audio data.
  • for example, the first audio data includes audio data corresponding to background music and audio data corresponding to operation sound effects, that is, two channels of audio data; the audio data corresponding to the background music and the audio data corresponding to the operation sound effects are mixed into one channel to obtain the third audio data, so that the third audio data heard by the user is smoother and the listening experience is guaranteed.
  • optionally, the first audio data includes multiple channels of audio data; for example, the first audio data includes audio data corresponding to background music and audio data corresponding to operation sound effects.
  • the background music is played as the cloud application runs, while the operation sound effect is played along with a user operation; therefore, the user may pay more attention to the audio data corresponding to the operation sound effect.
  • when the first audio data includes multiple channels of audio data, performing mixing processing on the first audio data to obtain the third audio data includes: determining the weight of each channel of audio data in the first audio data, and mixing the multiple channels of audio data into one channel according to the weights to obtain the third audio data.
  • optionally, the weight of each channel of audio data is determined according to the type of the audio data; for example, the system notification has the largest weight, the operation sound effect is second, and the background music has the smallest weight; or, the operation sound effect has the largest weight, the system notification is second, and the background music has the smallest weight.
  • the system framework includes a processing thread
  • performing sound mixing processing on the first audio data, and obtaining the third audio data includes: performing sound mixing processing on the first audio data through the processing thread to obtain the third audio data.
  • the processing thread is a mixing thread.
  • the embodiment of the present application customizes the hardware abstraction layer.
  • the hardware abstraction layer is different from the hardware abstraction layer on the terminal.
  • the hardware abstraction layer on the terminal is used to call the interface of hardware such as speakers, and input audio data into the hardware for playback.
  • the hardware abstraction layer in the embodiment of the present application is not connected with the hardware, but establishes a communication connection with the audio collection program, and sends the audio data to the audio collection program.
  • the audio collection program is configured with an audio parameter, and the audio parameter indicates that the audio data received by the audio collection program needs to meet the audio parameter.
  • the audio parameter is 24KHz (kilohertz) dual-channel, indicating that the audio collection program is configured to receive 24KHz dual-channel audio data.
  • the hardware abstraction layer stores audio parameters, and the audio parameters are set according to the requirements of the audio collection program, so that the system framework can obtain the audio parameters from the hardware abstraction layer and generate audio data that meets them, allowing the hardware abstraction layer to successfully send the audio data to the audio collection program.
  • the audio collection program receives 24KHz (kilohertz) audio data, and the audio parameters include: a sampling rate of 24KHz.
  • the audio parameter includes at least one of a target sampling rate, a target number of channels, or a target sampling depth.
  • this step 503 is performed before step 502 , or this step 503 is performed simultaneously with step 502 , or this step 503 is performed after step 502 .
  • this step 503 is performed only once, or the system framework needs to perform this step 503 every time the audio data is processed, which is not limited in this embodiment of the present application.
  • the third audio data can be processed according to the audio parameters in the hardware abstraction layer to obtain the second audio data, so that the audio parameters of the second audio data are consistent with the audio parameters in the hardware abstraction layer and therefore meet the requirements of the audio collection program. That is, processing the third audio data according to the audio parameters through the system framework to obtain the second audio data is equivalent to adjusting the audio parameters of the audio data.
  • the audio parameter includes at least one of the target sampling rate, the target channel number, or the target sampling depth; processing the third audio data according to the audio parameter through the system framework to obtain the second audio data includes at least one of the following (1) to (3).
  • the audio parameters include the target sampling rate, and through the system framework, the third audio data is resampled according to the target sampling rate to obtain the second audio data.
  • the third audio data is resampled to obtain the second audio data with a sampling rate of 24KHz.
  • the audio parameter includes the target channel number, and through the system framework, the channel number conversion processing is performed on the third audio data according to the target channel number to obtain the second audio data.
  • the channel number conversion process is performed on the third audio data to obtain two-channel second audio data.
  • the audio parameter includes the target sampling depth.
  • the third audio data is resampled according to the target sampling depth to obtain the second audio data.
  • the third audio data is re-sampled to obtain second audio data with a sampling depth of 8 bits.
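The three conversions in (1) to (3) can be sketched as below. This is an illustrative sketch under assumed helper names; a real system would use a proper resampling filter rather than the linear interpolation shown here.

```python
def resample(samples, src_rate, dst_rate):
    """(1) Sampling-rate conversion by linear interpolation."""
    if src_rate == dst_rate:
        return list(samples)
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate        # position in the source signal
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out

def to_stereo(mono):
    """(2) Channel-number conversion: duplicate mono into two channels."""
    return [(s, s) for s in mono]

def requantize_16_to_8(samples_16bit):
    """(3) Sampling-depth conversion from 16-bit to 8-bit integers."""
    return [s >> 8 for s in samples_16bit]
```

For example, resampling 480 samples from 48KHz down to 24KHz yields 240 samples, halving the data while preserving the 10 ms duration.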
  • the system framework includes a processing thread, and processing the third audio data according to the audio parameters through the system framework to obtain the second audio data includes: processing the third audio data according to the audio parameters through the processing thread to obtain the second audio data.
  • in the system framework, the mixing processing of the first audio data and the processing of the third audio data according to the audio parameters are completed by the same thread, with no need for multiple threads to process separately, which reduces the transmission of audio data between threads and speeds up the processing of the audio data.
  • the processing thread is a mixing thread.
  • the second audio data is sent to the hardware abstraction layer, and the hardware abstraction layer sends the second audio data to the audio collection program. However, if the audio collection program has not been started, or the hardware abstraction layer and the audio collection program have not established a communication connection, the hardware abstraction layer cannot send the second audio data to the audio collection program even if it receives the data. Therefore, through the system framework, the second audio data is sent to the hardware abstraction layer only when a communication connection has been successfully established between the hardware abstraction layer and the audio collection program.
  • sending the second audio data to the hardware abstraction layer through the system framework includes: if a communication connection has been established between the hardware abstraction layer and the audio collection program, sending the second audio data to the hardware abstraction layer through the system framework; if the hardware abstraction layer has not established a communication connection with the audio collection program, controlling the hardware abstraction layer to establish a communication connection with the audio collection program.
  • the second audio data is sent to the hardware abstraction layer.
  • controlling the hardware abstraction layer to establish a communication connection with the audio collection program includes: controlling the hardware abstraction layer to send a communication connection establishment request to the audio collection program, and if the audio collection program detects the communication connection establishment request, establishing the communication connection between the hardware abstraction layer and the audio collection program.
  • if the hardware abstraction layer and the audio collection program fail to establish a communication connection, the system framework discards the second audio data and no longer sends the second audio data to the hardware abstraction layer.
  • the audio acquisition program does not detect the communication connection establishment request sent by the hardware abstraction layer, which may be because the audio acquisition program has not been successfully started.
  • the audio collection program is used not only to send the audio data generated by the cloud application to the local application of the terminal, but also to send the video data generated by the cloud application to the local application of the terminal. If the audio collection program has not been successfully started, it will not send the video data generated by the cloud application to the local application of the terminal, so the terminal cannot render the screen of the cloud application from the video data; in that case, discarding the second audio data of the cloud application will not affect the user.
  • the hardware abstraction layer includes a writing interface
  • sending the second audio data to the hardware abstraction layer includes: calling the writing interface of the hardware abstraction layer through the system framework to write the second audio data into the hardware abstraction layer.
  • the system framework periodically calls the writing interface of the hardware abstraction layer. In the writing interface, it is determined whether the hardware abstraction layer has established a communication connection with the audio collection program. If a communication connection has been established, the second audio data is written into the hardware abstraction layer; if not, the hardware abstraction layer is controlled to try to establish a communication connection with the audio collection program, and the second audio data is written into the hardware abstraction layer if the connection is established successfully, or discarded if the connection fails.
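The decision flow inside the writing interface can be sketched as follows. This is an illustrative sketch; the class name, method names, and the callback for attempting the connection are all assumptions, not the patent's interface.

```python
class HalWriter:
    """Sketch of the HAL writing interface's connect-or-discard logic."""

    def __init__(self, try_connect):
        self.connected = False
        self._try_connect = try_connect   # callable returning True/False
        self.sent = []
        self.discarded = 0

    def write(self, audio_data):
        # If no connection to the audio collection program exists yet,
        # attempt to establish one before writing.
        if not self.connected:
            self.connected = self._try_connect()
        if self.connected:
            self.sent.append(audio_data)  # connection up: write the data
            return True
        self.discarded += 1               # connection failed: drop the data
        return False
```

Discarding on failure avoids sending useless data, matching the burden-reduction rationale described later in the text.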
  • a communication connection is established between the hardware abstraction layer and the audio acquisition program, and the communication connection can be any form of communication connection.
  • the communication connection between the hardware abstraction layer and the audio collection program is a socket (socket) connection.
  • the hardware abstraction layer 601 is used as the client of the socket, and the audio collection program 602 is used as the server of the socket.
  • the accept (receive) function call on the listening socket is a blocking call, and waits until a socket client connects.
  • the audio collection program 602 calls the read function of the socket; the read function is configured as a blocking function and always waits for the hardware abstraction layer 601 to send audio data.
  • sending the second audio data to the audio collection program 602 through the hardware abstraction layer 601 is equivalent to sending the second audio data locally, and the delay is at the microsecond level.
  • the transmission time of the second audio data is greatly reduced, and the delay for the server to obtain the audio data is shortened.
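The client/server roles above can be sketched with a loopback socket. This is an illustrative sketch only: the port (any free loopback port), the buffer size, and the placeholder payload are assumptions; the blocking accept() and recv() calls mirror the blocking accept/read behavior described.

```python
import socket
import threading

def audio_collection_server(server_sock, received):
    """Server side (audio collection program 602)."""
    conn, _ = server_sock.accept()        # blocks until the HAL connects
    with conn:
        while True:
            chunk = conn.recv(4096)       # blocks waiting for audio data
            if not chunk:                 # empty read: client closed
                break
            received.append(chunk)

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))             # assumed: any free loopback port
server.listen(1)
received = []
t = threading.Thread(target=audio_collection_server, args=(server, received))
t.start()

# Client side (hardware abstraction layer 601): connect and send one buffer.
hal = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
hal.connect(server.getsockname())
hal.sendall(b"\x00\x01" * 480)            # placeholder audio bytes
hal.close()
t.join()
server.close()
```

Because both endpoints are on the same host, the transfer is a local send, which is what keeps the delay at the microsecond level.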
  • the communication connection between the hardware abstraction layer and the audio acquisition program is: a shared memory connection.
  • the shared memory connection means: program A and program B share a memory; program A stores data in the memory, and program B can read data from the memory, realizing the connection between program A and program B and achieving the effect of program A sending data to program B.
  • sending the second audio data to the hardware abstraction layer through the system framework includes: sending the second audio data to a target memory of the hardware abstraction layer through the system framework, where the target memory is the memory shared by the hardware abstraction layer and the audio collection program. Accordingly, sending the second audio data to the audio collection program through the hardware abstraction layer according to the communication connection between the hardware abstraction layer and the audio collection program includes: the audio collection program reads the second audio data from the target memory.
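The shared-memory variant can be sketched as below. This is an illustrative sketch using Python's standard shared-memory facility; the payload is a placeholder, and real programs A and B would be separate processes attaching to the segment by name.

```python
from multiprocessing import shared_memory

audio = bytes(range(16))                       # placeholder audio bytes

# Program A (writer side): create the shared segment and store the data.
shm_a = shared_memory.SharedMemory(create=True, size=len(audio))
shm_a.buf[:len(audio)] = audio

# Program B (audio collection side): attach by name and read the data.
shm_b = shared_memory.SharedMemory(name=shm_a.name)
read_back = bytes(shm_b.buf[:len(audio)])

shm_b.close()
shm_a.close()
shm_a.unlink()                                 # release the segment
```

Reading from the shared segment achieves the "A sends data to B" effect without any copy over a network transport.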
  • any kind of communication connection can be established between the hardware abstraction layer and the audio collection program; the embodiment of the present application does not limit the communication connection mode between the two, and uses the socket connection and the shared memory connection only as examples, without limiting how the two can be connected to communicate.
  • the audio collection program sends the second audio data to the encoding program, the encoding program encodes the second audio data, and then the encoding program sends the encoded second audio data to the local application of the terminal.
  • the encoding program and the terminal can establish a communication connection, and according to the communication connection, the encoded second audio data is sent to a local application program of the terminal, and the local application program of the terminal decodes and plays the data.
  • the communication connection is a webrtc peer-to-peer connection.
  • the embodiment of the present application only takes the cloud application program outputting the first audio data and the audio collection program acquiring the second audio data as an example to illustrate the processing and transmission of audio data among multiple programs in the server.
  • the cloud application can generate audio data continuously, or generate audio data multiple times, and each time the process of transmitting the audio data from the cloud application to the audio collection program is similar to the process of the above-mentioned steps 501 to 506, which is not repeated in this embodiment of the present application.
  • the cloud application continuously outputs audio data
  • the cloud application periodically outputs audio data of the target size.
  • the target size of the audio data depends on the size of the terminal audio data buffer.
  • the target size of the audio data depends on the system framework, the hardware abstraction layer, or the buffer size in the audio collection program.
  • the audio data is audio data with a playback duration of 10ms.
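The byte size of one such chunk follows directly from the audio parameters. This is illustrative arithmetic, with the 16-bit sampling depth assumed (the text states rate and channels but not depth for this example).

```python
def chunk_bytes(sample_rate_hz, channels, depth_bits, duration_ms):
    """Bytes in one audio chunk of the given duration."""
    return sample_rate_hz * channels * (depth_bits // 8) * duration_ms // 1000

# 24KHz dual-channel 16-bit audio, 10 ms playback duration:
size = chunk_bytes(24_000, 2, 16, 10)   # 960 bytes per chunk
```

The buffer sizes in the system framework, hardware abstraction layer, or audio collection program would then be multiples of this chunk size.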
  • a relay program is set between the system framework and the audio collection program, and a communication connection between the relay program and the audio collection program is established, through which the audio data processed by the system framework can be sent directly to the audio collection program.
  • the above method of directly sending audio data through a communication connection reduces the transmission links of the audio data and shortens the time required for the audio collection program to obtain the audio data, thereby reducing the delay for the server to deliver the audio data.
  • the thread for mixing processing and the thread for processing according to audio parameters in the system framework are the same processing thread; one thread can perform both kinds of processing, which reduces the transmission of audio data, thereby shortening the time for the hardware abstraction layer to acquire the audio data and further reducing the delay for the server to deliver the audio data.
  • if the hardware abstraction layer and the audio collection program have not established a communication connection, the hardware abstraction layer cannot send the second audio data to the audio collection program.
  • before the system framework sends the second audio data to the hardware abstraction layer, it determines whether a communication connection is established between the hardware abstraction layer and the audio collection program. For example, as shown in Figure 6, if the hardware abstraction layer and the audio collection program have not established a communication connection, the hardware abstraction layer is controlled to attempt to establish one; if the attempt succeeds, the second audio data is sent to the hardware abstraction layer, and if the connection fails to be established, the second audio data is discarded, which reduces the sending of useless data and reduces the burden on the server.
  • the transfer program is a resampling program.
  • the server 700 includes a cloud application program 701 , a system framework 702 , a resampling program 703 and an audio collection program 704 .
  • the cloud application 701 can call the interface of the system framework 702 to write audio data into the system framework 702 , and the system framework 702 sends the obtained audio data to the resampling program 703 after processing the audio data.
  • a communication connection is established between the resampling program 703 and the audio collection program 704 , and the audio data can be directly sent to the audio collection program 704 .
  • the cloud application program 701, system framework 702, resampling program 703 and audio collection program 704 all run in the operating system container of the server 700.
  • the server 700 also includes an encoding program 705; the audio collection program 704 sends the audio data to the encoding program 705, and the encoding program 705 encodes the audio data and sends the encoded audio data to the local application program of the terminal.
  • FIG. 8 is a flowchart of an audio data processing method provided by an embodiment of the present application.
  • the execution body is the server shown in FIG. 7 . Referring to FIG. 8 , the method includes the following steps.
  • This step 801 is similar to the above-mentioned step 501 and will not be repeated here.
  • This step 802 is similar to the above-mentioned step 502, and details are not repeated here.
  • the resampling program is configured with an audio parameter, and the audio parameter indicates that the audio data received by the resampling program needs to satisfy the audio parameter.
  • the audio parameter is 48KHz dual-channel, indicating that the resampling program is configured to receive 48KHz dual-channel audio data. Therefore, the system framework obtains the audio parameters from the resampling program so as to generate audio data that meets the requirements of the resampling program.
  • This step 804 is similar to the above-mentioned step 504, and details are not repeated here.
  • the second audio data is sent to the resampling program, and the resampling program sends the second audio data to the audio collection program. However, if the audio collection program has not been started, or the resampling program and the audio collection program have not established a communication connection, the resampling program cannot send the second audio data to the audio collection program even if it receives the data.
  • therefore, the second audio data is sent to the resampling program only when the resampling program and the audio collection program have successfully established a communication connection.
  • sending the second audio data to the resampling program through the system framework includes: if a communication connection has been established between the resampling program and the audio collection program, sending the second audio data to the resampling program through the system framework; if the resampling program has not established a communication connection with the audio collection program, controlling the resampling program to establish a communication connection with the audio collection program.
  • the second audio data is sent to the resampling program.
  • controlling the resampling program to establish a communication connection with the audio collection program includes: controlling the resampling program to send a communication connection establishment request to the audio collection program, and if the audio collection program detects the communication connection establishment request, establishing the communication connection between the resampling program and the audio collection program.
  • if the resampling program and the audio collection program fail to establish a communication connection, the system framework discards the second audio data and no longer sends the second audio data to the resampling program.
  • the audio collection program does not detect the communication connection establishment request sent by the resampling program, possibly because the audio collection program has not been successfully started.
  • the audio collection program is used not only to send the audio data generated by the cloud application to the local application of the terminal, but also to send the video data generated by the cloud application to the local application of the terminal. If the audio collection program has not been successfully started, it will not send the video data generated by the cloud application to the local application of the terminal, so the terminal cannot render the screen of the cloud application from the video data; in that case, discarding the second audio data of the cloud application will not affect the user.
  • the embodiment of the present application only takes as an example the case in which the system framework sends the second audio data to the resampling program on the condition that the resampling program and the audio collection program have successfully established a communication connection; in another embodiment, the system framework sends the second audio data to the resampling program regardless of whether the resampling program has established a communication connection with the audio collection program.
  • the resampling program includes a receiving thread
  • sending the second audio data to the resampling program through the system framework includes: sending the second audio data to the receiving thread of the resampling program through the system framework.
  • the system framework processes the first audio data through a processing thread to obtain the second audio data. Therefore, in a possible implementation manner, sending the second audio data to the receiving thread of the resampling program through the system framework includes: sending the second audio data to the receiving thread of the resampling program through the processing thread.
  • a communication connection is established between the resampling program and the audio collection program, and the communication connection is any form of communication connection.
  • the communication connection between the resampling program and the audio acquisition program is a socket connection, wherein the resampling program acts as a client of the socket, and the audio acquisition program acts as a server of the socket.
  • the manner in which the second audio data is sent to the audio collection program is similar to the manner in which the hardware abstraction layer sends the second audio data to the audio collection program according to the socket connection between the hardware abstraction layer and the audio collection program, which will not be repeated here.
  • the communication connection between the resampling program and the audio acquisition program is a shared memory connection.
  • the manner in which the second audio data is sent to the audio collection program is similar to the manner in which the hardware abstraction layer sends the second audio data to the audio collection program according to the shared memory connection between the hardware abstraction layer and the audio collection program, which will not be repeated here.
  • the resampling program includes a receiving thread
  • the communication connection between the resampling program and the audio collection program is: the communication connection between the receiving thread and the audio collection program; or, the resampling program includes the receiving thread and a first sending thread, wherein the receiving thread is used to receive the second audio data sent by the system framework, and the first sending thread is used to send the second audio data received by the receiving thread to the audio collection program.
  • the communication connection between the resampling program and the audio collection program is: the communication connection between the first sending thread and the audio collection program.
  • the audio parameters of the second audio data meet the requirements of the resampling program. If the audio parameters of the second audio data also meet the requirements of the audio collection program, the resampling program can directly send the second audio data to the audio collection program; if the audio parameters of the second audio data do not meet the requirements of the audio collection program, the resampling program needs to perform resampling processing on the second audio data so that the processed second audio data meets the requirements of the audio collection program, and then send the processed second audio data to the audio collection program.
  • the audio parameter configured by the resampling program is 48KHz dual-channel. If the audio parameter of the audio acquisition program is 48KHz dual-channel, the resampling program does not need to resample the second audio data, and directly sends the second audio data to the audio Acquisition program; if the audio parameter of the audio acquisition program is 16KHz dual-channel, the resampling program needs to perform resampling processing on the second audio data, so that the sampling rate of the processed second audio data is 16KHz.
  • when the audio parameters configured by the resampling program are the same as the audio parameters configured by the audio collection program, the resampling program does not need to perform resampling processing. Therefore, the resampling program can be configured according to the audio parameters configured by the audio collection program, so that the audio parameters configured by the resampling program are the same as those configured by the audio collection program.
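The pass-through-or-resample decision above can be sketched as follows. This is an illustrative sketch with assumed names; the crude decimation lambda stands in for a real resampler and only demonstrates the branch, not production-quality rate conversion.

```python
def forward(audio, src_params, dst_params, resample_fn):
    """Send directly when parameters match; otherwise resample first."""
    if src_params == dst_params:
        return audio                       # matching parameters: pass through
    return resample_fn(audio, src_params, dst_params)

params_resampler = {"rate": 48_000, "channels": 2}
params_collector = {"rate": 16_000, "channels": 2}

# Mismatched rates: the (placeholder) resampler runs, here crude decimation.
out = forward([1, 2, 3], params_resampler, params_collector,
              lambda a, s, d: a[:: s["rate"] // d["rate"]])
```

Configuring both programs with identical parameters makes the cheap pass-through branch the common case, which is the optimization the text describes.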
  • the system framework also includes a recording thread and a detection thread.
  • the detection thread in the system framework detects whether any other program is currently reading the data in the recording thread; if no program reads the data in the recording thread, the system framework no longer sends data to the resampling program. The original intention of the detection thread is to save unnecessary operations and reduce power consumption.
  • therefore, if the system framework includes a recording thread, the server also needs to perform the following steps 807 to 810; if the system framework does not include a recording thread, after acquiring the second audio data, the audio collection program sends the second audio data to the local application of the terminal.
  • the second audio data is sent to the recording thread through the resampling program, and the recording thread records the received second audio data. Because the recording thread records the second audio data while receiving it, the recording process takes a certain amount of time, and it also takes a certain amount of time for the resampling program to send the second audio data to the recording thread.
  • the resampling program includes a receiving thread and a second sending thread, wherein the receiving thread is used to receive the second audio data from the system framework and, when the second sending thread has an available buffer, to send the second audio data to the second sending thread.
  • after receiving the second audio data, the second sending thread determines whether to perform resampling processing on the second audio data according to the audio parameters configured in the recording thread.
  • if resampling is needed, the second sending thread performs resampling processing on the second audio data according to the audio parameters configured by the recording thread, obtains the processed second audio data, and sends the processed second audio data to the recording thread; if it is not necessary to perform resampling processing on the second audio data, the second audio data is directly sent to the recording thread.
  • the existence of an available buffer in the second sending thread means that the second sending thread has sent all the audio data it previously received from the resampling program to the recording thread.
  • if the audio parameters of the second audio data are the same as the audio parameters configured by the recording thread, the resampling program directly sends the second audio data to the recording thread, and the recording thread can record the second audio data. If the audio parameters of the second audio data are different from the audio parameters configured by the recording thread and the resampling program directly sends the second audio data to the recording thread, the recording thread may not be able to receive the second audio data in sequence.
  • the second sending thread determining whether to perform resampling processing on the second audio data according to the audio parameters configured in the recording thread includes: the second sending thread determining whether the audio parameters of the second audio data are the same as the audio parameters configured by the recording thread; if they are the same, it is determined that there is no need to perform resampling processing on the second audio data; if they are different, it is determined that the second audio data needs to be resampled.
  • the system framework also includes a buffer corresponding to the recording thread, and recording the second audio data through the recording thread to obtain the third audio data includes: copying the second audio data to the corresponding buffer through the recording thread to obtain the third audio data, where the data content of the third audio data is the same as the data content of the second audio data.
  • the recording thread copies the third audio data into the corresponding buffer, and calling the audio recording interface through the audio collection program to read the third audio data from the recording thread includes: through the audio collection program, invoking the audio recording interface to read the third audio data from the buffer corresponding to the recording thread.
  • the audio recording interface includes a read (reading) function, and calling the audio recording interface through the audio collection program to read the third audio data from the recording thread includes: the audio collection program calls the read function of the audio recording interface to read the third audio data from the buffer corresponding to the recording thread. If the third audio data does not exist in the buffer corresponding to the recording thread, the audio collection program waits until the recording thread copies the third audio data to the buffer, and then reads it.
  • the data content of the second audio data is the same as the data content of the third audio data, but the second audio data is sent directly by the resampling program to the audio collection program, whereas the third audio data is sent by the resampling program to the recording thread and then read from the recording thread by the audio collection program. Therefore, the second audio data reaches the audio collection program faster than the third audio data; the second audio data is sent to the local application of the terminal, and the third audio data is discarded.
  • a communication connection is established between the audio collection program and the resampling program; the second audio data is the audio data obtained according to the communication connection, and the third audio data is the audio data obtained by calling the audio recording interface through the audio collection program. Therefore, the acquisition methods of the second audio data and the third audio data are different.
  • the second audio data and the third audio data are distinguished according to the acquisition methods, and the second audio data is sent to the local application of the terminal.
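Distinguishing the two copies by acquisition method can be sketched as follows. This is an illustrative sketch; the source labels and the function name are assumptions used only to show the select-and-discard step.

```python
def select_for_terminal(items):
    """Keep only audio acquired over the direct communication connection;
    copies read via the audio recording interface are discarded."""
    return [data for source, data in items if source == "connection"]

# The same content arrives twice, tagged by how it was acquired.
collected = [
    ("connection", b"chunk1"),             # second audio data (faster path)
    ("recording_interface", b"chunk1"),    # third audio data (slower path)
]
to_send = select_for_terminal(collected)   # only the connection copy remains
```

Only the connection copy is forwarded to the terminal's local application, which avoids the terminal receiving duplicate audio.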
  • the audio collection program includes a first collection thread and a second collection thread
  • the first collection thread is used to collect the second audio data
  • a communication connection is established between the first collection thread and the resampling program
  • the resampling program sends the second audio data to the first collection thread according to the communication connection between the resampling program and the first collection thread; the second collection thread is used to collect the third audio data, and the second collection thread calls the audio recording interface to read the third audio data from the recording thread.
  • the server sends the audio data collected by the first collection thread to the local application of the terminal, and discards the audio data collected by the second collection thread.
  • the resampling program in the embodiment of the present application is a program in the operating system, that is, a built-in program of the operating system; the present application improves this original program of the operating system to implement the above audio data processing method.
  • the embodiment of the present application only takes the cloud application outputting the first audio data and the audio collection program acquiring the second audio data as an example to illustrate the processing and transmission of audio data among the multiple programs in the server.
  • the cloud application can generate audio data continuously, or generate audio data multiple times; each transfer of audio data from the cloud application to the audio capture program is similar to the process of steps 801 to 810 above, and details are not repeated in this embodiment of the present application.
  • the cloud application continuously outputs audio data
  • the cloud application periodically outputs audio data of the target size.
  • the target size of the audio data depends on the size of the terminal audio data buffer.
  • the target size of the audio data depends on the size of the buffer in the system framework, the resampling program, or the audio collection program.
  • the audio data is audio data with a playback duration of 10ms.
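The size of one such fixed-duration chunk follows directly from the audio parameters. A minimal sketch, with illustrative parameter values (48 kHz stereo 16-bit is an assumption, not stated in the text):

```python
def chunk_bytes(sample_rate_hz, channels, bytes_per_sample, duration_ms=10):
    """Bytes needed to hold `duration_ms` of interleaved PCM audio."""
    frames = sample_rate_hz * duration_ms // 1000  # frames per chunk
    return frames * channels * bytes_per_sample

# 48 kHz, stereo, 16-bit samples, 10 ms per chunk:
# 480 frames * 2 channels * 2 bytes = 1920 bytes
size = chunk_bytes(48000, 2, 2)
```

The same formula gives the target size when the buffer in the system framework, the resampling program, or the audio collection program dictates a different duration.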
  • the audio data processing method provided by the embodiment of the present application improves the resampling program in the operating system and establishes a communication connection between the resampling program and the audio acquisition program, so that the resampling program can send the second audio data directly to the audio capture program.
  • sending audio data directly over the communication connection shortens the transmission link of the audio data and the time required for the audio acquisition program to obtain it, thereby reducing the delay with which the server delivers audio data.
  • the resampling program also sends audio data to the recording thread, and the audio acquisition program reads the audio data from the recording thread; this ensures that the system framework keeps sending audio data to the resampling program, so that audio data is processed and sent continuously.
  • the audio acquisition program forwards the audio data sent by the resampling program and discards the audio data read from the recording thread, which keeps the delay in sending the audio data small.
  • FIG. 9 is a flowchart of a delay acquisition method provided by an embodiment of the present application.
  • the execution body of the embodiment of the present application is a server. Referring to FIG. 9 , the method includes the following steps.
  • the detection application is an application that runs in the server and is used to detect the delay of audio data delivered by the server.
  • the detection application can output detection audio data and then obtain the time consumed transmitting the detection audio data through other programs by acquiring the time at which each of those programs receives it, where the other programs are the programs in the server other than the detection application.
  • the first detection audio data is any detection audio data output by the detection application program.
  • the detection application can output audio data continuously; besides the detection audio data it can also output other audio data, where the detection audio data differs from the other audio data so that the two can be distinguished, making it possible to obtain the time at which a program receives the detection audio data.
  • the system framework is a framework in the operating system and is used for processing audio data.
  • the transfer program is a program between the system framework and the audio collection program, and is used to transmit the audio data processed by the system framework to the audio collection program.
  • the transfer program has the function of forwarding audio data.
  • the transfer program also has other functions. This embodiment of the present application does not limit this.
  • the second detection audio data is the audio data obtained after the first detection audio data is processed by the system framework, but both the second and the first detection audio data remain distinguishable from other audio data. Therefore, even after the first detection audio data is processed into the second detection audio data, the second detection audio data can still be distinguished from other audio data, so the time at which a program receives it can be obtained.
  • the audio collection program is a program in the server for collecting audio data and sending the audio data to the terminal.
  • a communication connection is established between the relay program and the audio collection program, and the relay program sends the second detection audio data directly to the audio collection program through this communication connection.
  • the sending time is the time when the detection application program outputs the detected audio data
  • the first receiving time is the time when the audio collection program receives the detected audio data
  • the audio collection program is used in the server to collect audio data and send the collected audio data to the terminal.
  • the time at which the audio collection program receives the detection audio data can be regarded as the time at which the server obtains the audio data, so the first time difference between the sending time and the first receiving time also represents the time the server takes to obtain the audio data, that is, the delay with which the server delivers audio data.
  • the detection audio data is sent out by the detection application, the reception time at which the audio acquisition program receives the detection audio data is acquired, and from the time difference between the sending time and the reception time, the time taken to transmit the audio data from the detection application to the audio collection program, that is, the time required for the server to obtain the audio data, can be accurately obtained. This duration represents the delay with which the server sends audio data; from it, it can be determined whether the delay in obtaining the audio data affects the playback of the audio data and the hearing experience of the end user, and hence whether the server should be improved further, which provides developers with a better basis for improvement.
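The measurement described above amounts to timestamping a send event and a receive event and taking their difference. A minimal sketch, with illustrative class and event names (the patent does not prescribe them):

```python
import time

class DelayRecorder:
    """Stamps named events and reports the elapsed time between them."""

    def __init__(self):
        self.times = {}

    def stamp(self, event):
        # e.g. "sent" when the detection app outputs the detection audio,
        # "capture_received" when the audio collection program receives it
        self.times[event] = time.monotonic()

    def diff(self, start, end):
        """Time difference between two stamped events, in seconds."""
        return self.times[end] - self.times[start]
```

`diff("sent", "capture_received")` then corresponds to the first time difference, i.e. the server's audio delivery delay. A monotonic clock is used so the difference is unaffected by wall-clock adjustments.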
  • the transfer program in steps 902 and 903 above is a hardware abstraction layer, an original resampling program in the operating system, or another program, which is not limited in this embodiment of the present application.
  • the transfer program is a hardware abstraction layer.
  • the server 400 includes a cloud application program 401 , a system framework 402 , a hardware abstraction layer 403 and an audio collection program 404 .
  • an embodiment of the present application further provides a delay acquisition method, which can be used to detect the delay of the audio data delivered by the server shown in FIG. 4 .
  • FIG. 10 is a flowchart of a delay acquisition method provided by an embodiment of the present application.
  • the execution body of the embodiment of the present application is the server shown in FIG. 4 . Referring to FIG. 10 , the method includes the following steps.
  • the detection application is an application that runs in the server and is used to detect the delay of audio data delivered by the server.
  • the detection application can output detection data and then obtain the time consumed transmitting the detection data through other programs by acquiring the time at which each of those programs receives it, where the other programs are the programs in the server other than the detection application.
  • the detection data output by the detection application is the detection audio data. Since the detection audio data is audio data, after it is output to other programs such as the system framework, those programs can simulate the processing of real audio data, so determining the delay by obtaining the times at which other programs receive the detection audio data is more accurate.
  • the detection application differs from the cloud application in step 501 above: the cloud application outputs audio data according to received operation instructions, while the detection application outputs audio data according to configured detection logic.
  • the configured detection logic is to send the detection audio data once every first duration.
  • the first duration may be any duration such as 4 seconds or 5 seconds.
  • the first detection audio data is any detection audio data output by the detection application program.
  • the detection application can output audio data continuously; besides the detection audio data it can also output other audio data, where the detection audio data differs from the other audio data so that the two can be distinguished, making it possible to obtain the time at which a program receives the detection audio data.
  • the first detected audio data is audio data carrying a tag, and it can be subsequently determined whether the first detected audio data is received according to the tag carried by the first detected audio data.
  • the first detection audio data is audio data of a fixed value
  • the first detection audio data is different from other audio data output by the detection application program.
  • the value of the first detection audio data is 0xffff (where 0x indicates hexadecimal notation and ffff is the value)
  • the value of the other audio data output by the detection application is 0; as shown in Figure 11, the detection application outputs audio data whose value is 0 and periodically outputs detection audio data 1101 whose value is 0xffff.
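The marking scheme in the figure can be sketched as follows; the 0xffff marker value and zero fill come from the text, while the function names and chunk-level granularity are illustrative assumptions:

```python
MARKER = 0xFFFF   # detection audio sample value (hexadecimal)
SILENCE = 0x0000  # ordinary audio output by the detection application

def make_stream(num_chunks, marker_every):
    """Zero-valued audio with a detection marker every `marker_every` chunks."""
    return [MARKER if i % marker_every == 0 else SILENCE
            for i in range(num_chunks)]

def is_detection_chunk(value):
    # After mixing/resampling the marker's exact value may change, but it
    # stays nonzero while zero-valued audio stays zero, so any nonzero
    # value identifies a detection chunk.
    return value != 0
```

Downstream programs only need the nonzero test to recognize the marker, which is why the detection still works after the system framework processes the data.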
  • the server further includes a recording program, which records the current time when the detection application program inputs the first detection audio data into the system framework, where the current time is the sending time of the first detection audio data.
  • when the detection application inputs the first detection audio data into the system framework, it sends a message to the recording program indicating this input, and the recording program records the time at which it receives the message as the sending time of the first detection audio data.
  • the recording program is a program other than the detection application, or a program with a recording function in the detection application.
  • the recording program also has the function of monitoring other programs: it can inspect the data in the system framework, and when it detects that the system framework contains the detection audio data, it records the current time as the sending time of the first detection audio data.
  • the method of processing the first detection audio data through the system framework is similar to the method of processing the first audio data in step 502 above, and the manner of sending the second detection audio data to the hardware abstraction layer through the system framework is similar to the manner of sending the second audio data in step 505 above; details are not repeated here.
  • the obtained second detection audio data is similar to the first detection audio data, and both are audio data that can be distinguished from other audio data.
  • the second detection audio data also carries the tag. If the value of the first detection audio data is 0xffff and the value of other audio data is 0, then after processing the value of the second detection audio data is nonzero while the value of other audio data remains 0; that is, processing the detection audio data does not invalidate its detection function.
  • the recording program is further configured to record the second receiving time at which the hardware abstraction layer receives the second detection audio data; before recording the second receiving time, it must be determined that the hardware abstraction layer has received the second detection audio data.
  • the hardware abstraction layer reports a message to the recording program informing it that the second detection audio data has been received, and when the recording program receives the reported message, it records the current time as the second receiving time at which the hardware abstraction layer received the second detection audio data.
  • the recording program also has the function of detecting other programs.
  • the recording program detects whether the second detection audio data is present in the hardware abstraction layer, and when it is detected, records the current time as the second receiving time at which the hardware abstraction layer received the second detection audio data.
  • the manner of sending the second detection audio data to the audio collection program through the hardware abstraction layer, according to the communication connection between the hardware abstraction layer and the audio collection program, is similar to the manner of sending the second audio data to the audio collection program in step 506 above; details are not repeated here.
  • the recording program is also used to record the first receiving time at which the audio collection program receives the second detection audio data; before recording the first receiving time, it must be determined that the audio collection program has received the second detection audio data.
  • the audio collection program reports a message to the recording program informing it that the second detection audio data has been received, and when the recording program receives the reported message, it records the current time as the first receiving time at which the audio collection program received the second detection audio data.
  • the recording program also has the function of monitoring other programs; for example, it detects whether the second detection audio data is present in the audio collection program, and when it is detected, records the current time as the first receiving time at which the audio collection program received the second detection audio data.
  • the embodiments of the present application only take the detection application outputting the first detection audio data, and obtaining the durations consumed as the first detection audio data is transmitted between multiple programs in the server, as an example to illustrate obtaining the delay with which the server delivers audio data.
  • the detection application can output audio data continuously, and outputs the detection audio data once every certain period of time.
  • each piece of detection audio data yields one measurement of the delay with which the server delivers audio data.
  • statistical processing is performed on the multiple delays to obtain the target delay of the audio data delivered by the server; because the target delay takes multiple measurements into account, it is more accurate.
  • the statistical process is an averaging process.
  • the detection application sends the detection audio data every certain period of time, so multiple first time differences and second time differences can subsequently be obtained. By performing statistical processing on the multiple first time differences, the resulting value more accurately represents the delay of transmitting the detection audio data from the detection application to the audio acquisition program; by performing statistical processing on the multiple second time differences, the resulting value more accurately represents the delay of transmitting the detection audio data from the detection application to the hardware abstraction layer.
  • a larger time interval may be set, such as 4 seconds, 5 seconds, and the like.
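The statistical processing above, taken as plain averaging per the text, reduces to one line; the function name and millisecond units are illustrative:

```python
def target_delay(delays_ms):
    """Average multiple per-marker delay measurements into a target delay."""
    return sum(delays_ms) / len(delays_ms)

# three first-time-difference measurements, in milliseconds
avg = target_delay([40, 42, 38])  # -> 40.0
```

The same function applies to the second time differences; averaging over markers sent several seconds apart smooths out per-measurement jitter.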
  • the sending time at which the detection application 1201 sends the first detection audio data is t0, and the second receiving time at which the hardware abstraction layer 1202 receives the second detection audio data is t1; the delay of the detection audio data from the detection application 1201 to the hardware abstraction layer 1202 is about 40 ms (milliseconds).
  • the time at which the audio collection program 1203 receives the second detection audio data is t2; the delay of the second detection audio data from the hardware abstraction layer 1202 to the audio collection program 1203 is about 0 ms. Therefore, the time taken by the server to obtain audio data from the operating system is controlled to about 40 ms, which greatly shortens the time the server takes to obtain the audio data.
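The figures above are pairwise timestamp differences. A sketch using the t0, t1, t2 defined above (dictionary keys are illustrative; times in milliseconds):

```python
def stage_delays(t0, t1, t2):
    """t0: detection app sends; t1: transfer program (HAL or resampler)
    receives; t2: audio collection program receives. Times in ms."""
    return {
        "app_to_transfer": t1 - t0,      # second time difference
        "transfer_to_capture": t2 - t1,  # per-hop residual
        "total": t2 - t0,                # first time difference
    }
```

With t0 = 0, t1 = 40, t2 = 40 this reproduces the example: about 40 ms to the transfer program and about 0 ms onward to the capture program.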
  • the server not only sends the audio data generated by the cloud application to the terminal, but also sends the video data generated by the cloud application to the terminal.
  • through delay detection, it is found that the audio and video are not synchronized: the delay from playing the video to playing the audio corresponding to the video is about 0.37 seconds.
  • when the delay between video and audio is higher than 0.3 seconds, the human ear perceives a relatively obvious delay, which degrades the user's experience. If the audio data processing method provided by the embodiment of the present application is adopted, the delay in sending audio data from the server can be reduced, and the video-to-audio delay can be reduced to about 0.242 seconds, so that the human ear no longer perceives an obvious delay, improving the user experience.
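The 0.3-second perceptibility threshold cited above can be expressed as a trivial check; the constant comes from the text, the function name is illustrative:

```python
PERCEPTIBLE_AV_DELAY_S = 0.3  # above this, the human ear notices the lag

def av_delay_noticeable(video_to_audio_delay_s):
    """True if the video-to-audio delay exceeds the perceptibility threshold."""
    return video_to_audio_delay_s > PERCEPTIBLE_AV_DELAY_S
```

Under this check, the 0.37 s delay measured before the improvement is noticeable, while the 0.242 s delay after it is not.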
  • the detection audio data is sent out by the detection application, the reception time at which the audio acquisition program receives it is acquired, and from the time difference between the sending time and the reception time, the time consumed transmitting the audio data from the detection application to the audio collection program, that is, the time required for the server to obtain the audio data, can be accurately obtained. This duration represents the delay with which the server sends audio data; based on it, it can be determined whether the delay affects the playback of the audio data and the hearing experience of the end user, and hence whether the server should be improved further, which provides developers with a better basis for improvement.
  • by recording the time at which the hardware abstraction layer receives the detection audio data, the time consumed transmitting the detection audio data from the detection application to the hardware abstraction layer, and from the hardware abstraction layer to the audio acquisition program, can also be obtained, so that the duration consumed in each transmission stage is known accurately and developers can subsequently improve the server in a targeted manner.
  • the transfer program in the server is a resampling program.
  • the server 700 includes a cloud application program 701 , a system framework 702 , a resampling program 703 and an audio collection program 704 .
  • an embodiment of the present application further provides a delay acquisition method, which can be used to detect the delay of the audio data delivered by the server shown in FIG. 7 .
  • FIG. 13 is a flowchart of a delay acquisition method provided by an embodiment of the present application.
  • the execution body of the embodiment of the present application is the server shown in FIG. 7 . Referring to FIG. 13 , the method includes the following steps.
  • through the resampling program, according to the communication connection between the resampling program and the audio collection program, the second detection audio data is sent to the audio collection program, and the first receiving time at which the audio collection program receives the second detection audio data is recorded; the audio collection program is used to send the second detection audio data to a local application of the terminal.
  • steps 1301 to 1305 are similar to steps 1101 to 1105 above; the only difference is that in steps 1101 to 1105 the second detection audio data is sent to the hardware abstraction layer, while in steps 1301 to 1305 it is sent to the resampling program.
  • the sending time at which the detection application 1401 sends the first detection audio data is t0, and the second receiving time at which the resampling program 1402 receives the second detection audio data is t1; the delay of the audio data from the detection application 1401 to the resampling program 1402 is about 40 ms (milliseconds).
  • the first receiving time at which the audio acquisition program 1403 receives the second detection audio data is t2; the delay of the audio data from the resampling program 1402 to the audio acquisition program 1403 is about 0 ms. Therefore, the time taken by the server to obtain audio data from the operating system is controlled to about 40 ms, which greatly shortens the time the server takes to obtain the audio data.
  • the resampling program is further configured to send the second detected audio data to the recording thread.
  • the recording thread records the second detection audio data to obtain the third detection audio data, and the audio collection program reads the third detection audio data from the recording thread.
  • the recording program is further configured to record the third receiving time at which the audio collection program receives the third detection audio data; the server obtains the third time difference between the sending time and the third receiving time, and the third time difference represents the delay of transmitting the detection audio data from the detection application, through the resampling program and the recording thread, to the audio collection program.
  • this yields the delay from the detection application outputting the detection audio data until it reaches the audio collection program through the resampling program and the recording thread. In this case, the audio collection program receives only the third detection audio data and not the second detection audio data, so the obtained third receiving time is more accurate.
  • the sending time at which the detection application 1501 sends the first detection audio data is t0, and the second receiving time at which the resampling program 1502 receives the second detection audio data is t1; the delay of the audio data from the detection application 1501 to the resampling program 1502 is about 40 ms (milliseconds).
  • the third receiving time at which the audio acquisition program 1503 receives the third detection audio data is t2; the delay of the audio data from the resampling program 1502 to the audio acquisition program 1503 is about 90 ms.
  • the server not only sends the audio data generated by the cloud application to the terminal, but also sends the video data generated by the cloud application to the terminal.
  • through delay detection, it is found that the audio and video are not synchronized: the delay from playing the video to playing the audio corresponding to the video is about 0.37 seconds.
  • when the delay between video and audio is higher than 0.3 seconds, the human ear perceives a relatively obvious delay, which degrades the user's experience. If the audio data processing method provided by the embodiment of the present application is adopted, the delay in sending audio data from the server can be reduced, and the video-to-audio delay can be reduced to about 0.242 seconds, so that the human ear no longer perceives an obvious delay, improving the user experience.
  • the detection audio data is sent out by the detection application, the reception time at which the audio acquisition program receives it is acquired, and from the time difference between the sending time and the reception time, the time consumed transmitting the audio data from the detection application to the audio collection program, that is, the time required for the server to obtain the audio data, can be accurately obtained. This duration represents the delay with which the server sends audio data; based on it, it can be determined whether the delay affects the playback of the audio data and the hearing experience of the end user, and hence whether the server should be improved further, which provides developers with a better basis for improvement.
  • by recording the receiving time at which the resampling program receives the detection audio data, the time consumed transmitting the detection audio data from the detection application to the resampling program, and from the resampling program to the audio acquisition program, can also be obtained, so that the duration consumed in each transmission stage is known accurately and developers can subsequently improve the server in a targeted manner.
  • Table 1 compares the video-to-audio delay obtained using any of the audio data processing methods provided by the embodiments of the present application with the video-to-audio delay obtained using other audio data processing methods in the related art, as shown in Table 1:
  • product A adopts the audio data processing method provided in the embodiment of the present application
  • product B and product C adopt other audio data processing methods.
  • product A, product B, and product C each provide at least one type of game, and the application for the at least one type of game runs in the server; that is, the server runs at least one type of cloud application.
  • FIG. 16 is a schematic structural diagram of a server provided by an embodiment of the present application.
  • the server includes: an application running module 1601, a framework running module 1602, a transfer module 1603, and a collection module 1604;
  • the application running module 1601 for inputting the first audio data of the cloud application to the framework running module 1602;
  • the framework running module 1602 is configured to process the first audio data to obtain the second audio data, and send the second audio data to the relay module 1603;
  • the relay module 1603 is configured to send the second audio data to the collection module 1604 according to the communication connection between the relay module 1603 and the collection module 1604, and the collection module 1604 is configured to send the second audio data to the local application of the terminal.
  • the framework operation module 1602 is configured to send the second audio data to the relay module 1603 if the relay module 1603 and the acquisition module 1604 have established a communication connection;
  • the framework operation module 1602 is configured to, if the relay module 1603 has not established a communication connection with the acquisition module 1604, control the relay module 1603 to establish the communication connection with the acquisition module 1604, and to send the second audio data to the relay module 1603 when the communication connection is successfully established.
  • the framework operation module 1602 is configured to perform mixing processing on the first audio data to obtain third audio data, and process the third audio data according to audio parameters to obtain the second audio data.
  • the framework operation module 1602 is configured to execute at least one of the following:
  • the audio parameter includes a target sampling rate, and the third audio data is resampled according to the target sampling rate to obtain the second audio data;
  • the audio parameter includes the target channel number, and the third audio data is subjected to channel number conversion processing according to the target channel number to obtain the second audio data;
  • the audio parameter includes a target sampling depth, and the third audio data is resampled according to the target sampling depth to obtain the second audio data.
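As one concrete instance of the channel-number conversion listed above, a minimal sketch is given below; interleaved integer PCM and averaging for the downmix are assumptions, since the patent does not specify the conversion algorithm:

```python
def mono_to_stereo(samples):
    """Duplicate each mono sample into an interleaved L/R pair."""
    out = []
    for s in samples:
        out.extend((s, s))  # same value on both channels
    return out

def stereo_to_mono(samples):
    """Average each interleaved L/R pair into one mono sample."""
    return [(samples[i] + samples[i + 1]) // 2
            for i in range(0, len(samples), 2)]
```

Resampling to a target sampling rate or depth follows the same pattern: the third audio data is transformed sample by sample until it matches the parameters expected downstream.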
  • the framework operation module 1602 includes a processing unit 1612, and the processing unit 1612 is configured to perform mixing processing on the first audio data to obtain the third audio data;
  • the processing unit 1612 is configured to process the third audio data according to the audio parameters to obtain the second audio data.
  • the relay module 1603 is a hardware abstraction layer operation module 1613
  • the framework operation module 1602 is used to obtain the audio parameters from the hardware abstraction layer operation module 1613
  • the hardware abstraction layer operation module 1613 stores the audio parameters.
  • the relay module 1603 is a hardware abstraction layer operation module 1613
  • the framework operation module 1602 is used to call the writing interface of the hardware abstraction layer operation module 1613 and write the second audio data into the hardware abstraction layer operation module 1613.
  • the relay module 1603 is a resampling module 1623
  • the framework operation module 1602 is further configured to obtain the audio parameter from the resampling module 1623, which is configured with the audio parameter.
  • the relay module 1603 is a resampling module 1623;
  • the resampling module 1623 is further configured to perform resampling processing on the second audio data to obtain processed second audio data;
  • the resampling module 1623 is configured to send the processed second audio data to the collection module 1604 according to the communication connection between the resampling module 1623 and the collection module 1604.
  • the framework running module 1602 includes a recording unit 1622;
  • the resampling module 1623 is configured to send the second audio data to the recording unit 1622;
  • the recording unit 1622 is used to record the second audio data to obtain the third audio data
  • the acquisition module 1604 is configured to call the audio recording interface to read the third audio data from the recording unit 1622 .
  • the collection module 1604 is configured to discard the third audio data and send the second audio data to a local application of the terminal.
  • FIG. 19 is a schematic structural diagram of a server provided by an embodiment of the present application.
  • the server includes: an application running module 1901, a framework running module 1902, a relay module 1903, a collection module 1904, a recording module 1905, and an acquisition module 1906;
  • the application running module 1901 is configured to input the first detection audio data of the detection application to the framework running module 1902;
  • the recording module 1905 is configured to record the sending time of the first detection audio data;
  • the framework running module 1902 is configured to process the first detection audio data to obtain the second detection audio data, and to send the second detection audio data to the relay module 1903;
  • the relay module 1903 is configured to send the second detection audio data to the collection module 1904 according to the communication connection between the relay module 1903 and the collection module 1904, and the collection module 1904 is configured to send the second detection audio data to a local application of the terminal;
  • the recording module 1905 is further configured to record the first receiving time at which the collection module 1904 receives the second detection audio data;
  • the acquisition module 1906 is configured to obtain a first time difference between the sending time and the first receiving time, where the first time difference represents the delay in transmitting the detection audio data from the application running module 1901 to the collection module 1904.
  • the recording module 1905 is further configured to record the second receiving time at which the relay module 1903 receives the second detection audio data;
  • the acquisition module 1906 is configured to obtain a second time difference between the sending time and the second receiving time, where the second time difference represents the delay in transmitting the detection audio data from the application running module 1901 to the relay module 1903.
  • the relay module 1903 is a hardware abstraction layer running module; or, the relay module 1903 is a resampling module.
  • FIG. 20 is a structural block diagram of a terminal provided by an embodiment of the present application.
  • the terminal 2000 is used to perform the steps performed by the terminal in the above-mentioned embodiments.
  • the terminal 2000 is a portable mobile terminal, such as a smartphone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a laptop, or a desktop computer.
  • Terminal 2000 may also be called user equipment, portable terminal, laptop terminal, desktop terminal, and the like by other names.
  • the terminal 2000 includes: a processor 2001 and a memory 2002 .
  • the processor 2001 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and the like.
  • the processor 2001 may be implemented in at least one hardware form of a DSP (Digital Signal Processor), an FPGA (Field-Programmable Gate Array), or a PLA (Programmable Logic Array).
  • the processor 2001 may also include a main processor and a coprocessor.
  • the main processor is a processor for processing data in the wake-up state, also called a CPU (Central Processing Unit); the coprocessor is a low-power processor for processing data in a standby state.
  • the processor 2001 may be integrated with a GPU (Graphics Processing Unit), which is used for rendering and drawing the content to be displayed on the display screen.
  • the processor 2001 may further include an AI (Artificial Intelligence) processor, which is used to process computing operations related to machine learning.
  • the memory 2002 may include one or more computer-readable storage media, which may be non-transitory. The memory 2002 may also include high-speed random access memory and non-volatile memory, such as one or more disk storage devices or flash storage devices. In some embodiments, the non-transitory computer-readable storage medium in the memory 2002 stores at least one piece of program code, which is executed by the processor 2001 to implement the audio data processing method or the delay acquisition method provided by the method embodiments of this application.
  • the terminal 2000 may optionally further include: a peripheral device interface 2003 and at least one peripheral device.
  • the processor 2001, the memory 2002 and the peripheral device interface 2003 may be connected through a bus or a signal line.
  • Each peripheral device can be connected to the peripheral device interface 2003 through a bus, a signal line or a circuit board.
  • the peripheral device includes: at least one of a radio frequency circuit 2004 , a display screen 2005 , a camera assembly 2006 , an audio circuit 2007 , a positioning assembly 2008 and a power supply 2009 .
  • FIG. 20 does not constitute a limitation on the terminal 2000, which may include more or fewer components than shown, combine some components, or adopt a different component arrangement.
  • the server 2100 may vary greatly with configuration or performance, and may include one or more processors (Central Processing Unit, CPU) 2101 and one or more memories 2102, where the memory 2102 stores at least one piece of program code, and the at least one piece of program code is loaded and executed by the processor 2101 to implement the methods provided by the above method embodiments.
  • the server may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface, and the server may also include other components for implementing device functions, which are not described here.
  • the server 2100 may be configured to execute the steps executed by the server in the above-mentioned audio data processing method; or, used to execute the steps executed by the server in the above-mentioned delay acquisition method.
  • An embodiment of the present application further provides a computer device, the computer device includes a processor and a memory, the memory stores at least one piece of program code, and the at least one piece of program code is loaded and executed by the processor to implement the methods provided by the above embodiments.
  • an embodiment of the present application further provides a storage medium, where the storage medium is used to store a computer program, and the computer program is used to execute the method provided by the foregoing embodiment.
  • the embodiments of the present application also provide a computer program product including instructions, which, when executed on a computer, cause the computer to execute the methods provided by the above embodiments.


Abstract

Embodiments of this application disclose an audio data processing method, a server, and a storage medium, belonging to the field of computer technology. The method is applied to a server including a cloud application, a system framework, a relay program, and an audio collection program, and includes: inputting first audio data of the cloud application into the system framework; processing the first audio data through the system framework to obtain second audio data, and sending the second audio data to the relay program; and sending, through the relay program, the second audio data to the audio collection program according to the communication connection between the relay program and the audio collection program, where the audio collection program is configured to send the second audio data to a local application of a terminal. In the cloud server, the relay program directly sends the second audio data processed by the system framework to the audio collection program, which shortens the transmission link of the audio data and reduces the delay with which the cloud server delivers audio data.

Description

Audio data processing method, server, and storage medium
This application claims priority to Chinese Patent Application No. 202010716978.3, entitled "Audio data processing method, server, and storage medium" and filed with the China National Intellectual Property Administration on July 23, 2020, which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the field of computer technology, and in particular to audio data processing.
Background
A cloud application is an application that runs on a server. The server runs the cloud application, generates corresponding audio data, and delivers the audio data to a local application of a terminal for playback; the local application of the terminal only needs to play the audio data.
Currently, the server provides an AudioRecord (audio recording) interface for the audio collection program. While the cloud application is running, the server records audio data through a recording thread, and the audio collection program calls the AudioRecord interface to read the recorded audio data from the recording thread and then sends it to the local application of the terminal.
Summary
In one aspect, an embodiment of this application provides an audio data processing method applied to a server, the server including a cloud application, a system framework, a relay program, and an audio collection program. The method includes: inputting first audio data of the cloud application into the system framework; processing the first audio data through the system framework to obtain second audio data, and sending the second audio data to the relay program; and sending, through the relay program, the second audio data to the audio collection program according to the communication connection between the relay program and the audio collection program, where the audio collection program is configured to send the second audio data to a local application of a terminal.
In another aspect, an embodiment of this application provides a delay acquisition method applied to a server, the server including a detection application, a system framework, a relay program, and an audio collection program. The method includes: inputting first detection audio data of the detection application into the system framework, and recording the sending time of the first detection audio data; processing the first detection audio data through the system framework to obtain second detection audio data, and sending the second detection audio data to the relay program; sending, through the relay program, the second detection audio data to the audio collection program according to the communication connection between the relay program and the audio collection program, and recording the first receiving time at which the audio collection program receives the second detection audio data, where the audio collection program is configured to send the second detection audio data to a local application of a terminal; and obtaining a first time difference between the sending time and the first receiving time, where the first time difference represents the delay in transmitting detection audio data from the detection application to the audio collection program.
In another aspect, an embodiment of this application provides a server including an application running module, a framework running module, a relay module, and a collection module. The application running module is configured to input first audio data of a cloud application into the framework running module; the framework running module is configured to process the first audio data to obtain second audio data and send the second audio data to the relay module; the relay module is configured to send the second audio data to the collection module according to the communication connection between the relay module and the collection module; and the collection module is configured to send the second audio data to a local application of a terminal.
In one aspect, an embodiment of this application provides a server including an application running module, a framework running module, a relay module, a collection module, a recording module, and an acquisition module. The application running module is configured to input first detection audio data of a detection application into the framework running module; the recording module is configured to record the sending time of the first detection audio data; the framework running module is configured to process the first detection audio data to obtain second detection audio data and send the second detection audio data to the relay module; the relay module is configured to send the second detection audio data to the collection module according to the communication connection between the relay module and the collection module, and the collection module is configured to send the second detection audio data to a local application of a terminal; the recording module is further configured to record the first receiving time at which the collection module receives the second detection audio data; and the acquisition module is configured to obtain a first time difference between the sending time and the first receiving time, where the first time difference represents the delay in transmitting detection audio data from the application running module to the collection module.
In another aspect, an embodiment of this application provides a computer-readable storage medium storing a computer program, where the computer program is used to execute the audio data processing method described above, or to execute the delay acquisition method described above.
In a further aspect, an embodiment of this application provides a computer program product or computer program including computer program code stored in a computer-readable storage medium. A processor of a computer device reads the computer program code from the computer-readable storage medium and executes it, so that the computer device implements the audio data processing method described above, or the delay acquisition method described above.
In a further aspect, an embodiment of this application provides a server including:
a processor, a communication interface, a memory, and a communication bus;
where the processor, the communication interface, and the memory communicate with each other through the communication bus, and the communication interface is an interface of a communication module;
the memory is configured to store program code and transmit the program code to the processor;
the processor is configured to call the instructions of the program code in the memory to execute the audio data processing method described above, or to execute the delay acquisition method described above.
Brief Description of the Drawings
FIG. 1 is a schematic diagram of an implementation environment provided by an embodiment of this application;
FIG. 2 is a schematic diagram of an optional structure in which a distributed system provided by an embodiment of this application is applied to a blockchain system;
FIG. 3 is a flowchart of an audio data processing method provided by an embodiment of this application;
FIG. 4 is a flowchart of the transmission of audio data while a server delivers audio data to a terminal, provided by an embodiment of this application;
FIG. 5 is a flowchart of an audio data processing method provided by an embodiment of this application;
FIG. 6 is a flowchart of a hardware abstraction layer sending audio data to an audio collection program, provided by an embodiment of this application;
FIG. 7 is a flowchart of the transmission of audio data while a server delivers audio data to a terminal, provided by an embodiment of this application;
FIG. 8 is a flowchart of an audio data processing method provided by an embodiment of this application;
FIG. 9 is a flowchart of a delay acquisition method provided by an embodiment of this application;
FIG. 10 is a flowchart of a delay acquisition method provided by an embodiment of this application;
FIG. 11 is a schematic diagram of multiple pieces of audio data output by a detection application, provided by an embodiment of this application;
FIG. 12 is a schematic diagram of the delays with which multiple programs in a server obtain audio data, provided by an embodiment of this application;
FIG. 13 is a flowchart of a delay acquisition method provided by an embodiment of this application;
FIG. 14 is a schematic diagram of the delays with which multiple programs in a server obtain audio data, provided by an embodiment of this application;
FIG. 15 is a schematic diagram of the delays with which multiple programs in a server obtain audio data, provided by an embodiment of this application;
FIG. 16 is a schematic structural diagram of an audio data processing apparatus provided by an embodiment of this application;
FIG. 17 is a schematic structural diagram of another audio data processing apparatus provided by an embodiment of this application;
FIG. 18 is a schematic structural diagram of another audio data processing apparatus provided by an embodiment of this application;
FIG. 19 is a schematic structural diagram of a delay acquisition apparatus provided by an embodiment of this application;
FIG. 20 is a structural block diagram of a terminal provided by an embodiment of this application;
FIG. 21 is a schematic structural diagram of a server provided by an embodiment of this application.
Detailed Description
To make the objectives, technical solutions, and advantages of this application clearer, the implementations of this application are described in further detail below with reference to the accompanying drawings.
It should be understood that the terms "first", "second", and the like used in this application may be used herein to describe various concepts, but unless otherwise specified, these concepts are not limited by these terms. These terms are only used to distinguish one concept from another. For example, without departing from the scope of this application, first audio data may be called second audio data, and similarly, second audio data may be called first audio data.
Before the embodiments of this application are described in detail, the concepts involved are explained as follows.
1. Cloud application: an application running on a server; optionally, the cloud application is a game application, an audio processing application, or the like.
2. Container: a container encapsulates the details necessary to run an application, such as an operating system. One server can run multiple containers, and each container can run a cloud application and an operating system, where the operating system is any operating system, such as the Android operating system or iOS (iPhone Operation System).
3. Hardware abstraction layer (AudioHal): located between the system framework and the hardware driver, it is responsible for receiving the audio data delivered by the system framework and outputting that audio data to the hardware through the hardware driver.
4. System framework: a framework provided in the operating system; optionally, it is the audio processing framework (AudioFlinger) in the operating system.
5. Resampling program (RemoteSubmix): a module in the operating system used to mix the audio in the operating system and then send it to a remote end over the network.
6. Audio collection program: a program used to collect audio data from the server's operating system. It can send the collected audio data to an encoding module (WebrtcProxy), which encodes the audio data and delivers it to the application of the terminal. Optionally, when the cloud application is a cloud game program, the audio collection program is the CloudGame cloud-game backend.
7. Audio recording interface (AudioRecord): the interface for audio data collection in the operating system; the sources of the audio data include the microphone, RemoteSubmix, and the like.
8. Mixing thread (MixerThread): the thread in the system framework responsible for mixing.
9. Recording thread (RecordThread): the thread in the system framework responsible for recording.
FIG. 1 is a schematic diagram of an implementation environment provided by an embodiment of this application. Referring to FIG. 1, the implementation environment includes a terminal 101 and a server 102. The terminal 101 and the server 102 can be directly or indirectly connected through wired or wireless communication, which is not limited in this application.
The terminal 101 is a smartphone, tablet computer, laptop, desktop computer, smart speaker, smartwatch, or other device, but is not limited thereto. Optionally, the server 102 is an independent physical server; optionally, the server 102 is a server cluster or distributed system composed of multiple physical servers; optionally, the server 102 is a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms.
A cloud application runs on the server 102. While the cloud application is running, it generates audio data, and the server 102 sends the audio data to the terminal 101, so that the terminal 101 can play the audio data generated by the application without running the application itself.
Optionally, the terminal 101 has a local application installed, through which a user can send a control instruction to the server 102. The cloud application in the server 102 runs according to the control instruction and generates the audio data corresponding to it, and the server 102 delivers the audio data to the terminal 101, so that the user plays the audio data through the local application on the terminal 101.
Optionally, the terminal and server involved in the embodiments of this application are connected to form a distributed system. Taking the distributed system being a blockchain system as an example, see FIG. 2, which is a schematic diagram of an optional structure in which the distributed system 200 provided by an embodiment of this application is applied to a blockchain system. The system is formed by multiple nodes 201 (computing devices in any form that access the network, such as servers and terminals) and a client 202; the nodes form a peer-to-peer (P2P, Peer To Peer) network, and the P2P protocol is an application-layer protocol running on top of the Transmission Control Protocol (TCP). In a distributed system, any machine, such as a server or a terminal, can join and become a node.
Referring to the functions of each node in the blockchain system shown in FIG. 2, the functions involved include:
(1) Routing: a basic function of a node, used to support communication between nodes.
In addition to the routing function, a node can also have the following function:
(2) Application: deployed in the blockchain to implement a particular business according to actual business needs, recording the data related to the implemented function to form record data, carrying a digital signature in the record data to indicate the source of the task data, and sending the record data to other nodes in the blockchain system, so that the other nodes add the record data to a temporary block when the source and integrity of the record data are verified successfully.
For example, a cloud application runs on each of multiple servers, each server is a node in the blockchain, and the data obtained by running the cloud application on the multiple servers is synchronized.
The embodiments of this application can be applied to cloud gaming scenarios.
For example, a user controls the running of a cloud game through a terminal. With the audio data processing method provided by the embodiments of this application, the audio data generated while the cloud game is running is sent to the terminal, and the terminal plays the audio data, so that the user can listen to it during the game.
With the audio data processing method provided by the embodiments of this application, the server can send audio data to the terminal faster, reducing the delay of the audio data so that the user hears it sooner.
The embodiments of this application can also be applied to other scenarios in which a server runs a cloud application; the application scenario is not limited in the embodiments of this application.
FIG. 3 is a flowchart of an audio data processing method provided by an embodiment of this application. The execution subject of this embodiment is a server. Referring to FIG. 3, the method includes the following steps.
301. Input the first audio data of the cloud application into the system framework.
The cloud application is any application running on the server; optionally, the cloud application is a game application, an audio processing application, or the like. The type of the cloud application is not limited in the embodiments of this application. The first audio data is audio data generated by the cloud application while it is running.
302. Process the first audio data through the system framework to obtain second audio data, and send the second audio data to the relay program.
The system framework is a framework in the server's operating system used to process audio data. The relay program is a program between the system framework and the audio collection program, used to transmit the audio data processed by the system framework to the audio collection program. The relay program has the function of forwarding audio data; optionally, it can also have other functions, which is not limited in the embodiments of this application.
303. Through the relay program, send the second audio data to the audio collection program according to the communication connection between the relay program and the audio collection program, where the audio collection program is configured to send the second audio data to a local application of the terminal.
A communication connection is established between the relay program and the audio collection program, through which the relay program can send the second audio data directly to the audio collection program.
A local application is installed on the terminal; it is an application that supports interaction between the terminal and the server. After receiving the second audio data, the audio collection program sends it to the local application of the terminal so that the terminal plays it, where this local application is the local application of the terminal in step 303.
In the audio data processing method provided by the embodiments of this application, a relay program is set between the system framework and the audio collection program, and a communication connection is established between them, through which the audio data processed by the system framework can be sent directly to the audio collection program. Compared with the audio collection program reading audio data from the recording thread by calling the audio recording interface, sending the audio data directly through the communication connection shortens the transmission link of the audio data, shortens the time for the audio collection program to obtain the audio data, and reduces the delay with which the server delivers audio data.
It should be noted that the relay program in steps 302 and 303 is the hardware abstraction layer, or the resampling program native to the operating system, or another program, which is not limited in the embodiments of this application.
First, the embodiments of this application describe the server by taking the relay program being the hardware abstraction layer as an example. As shown in FIG. 4, the server 400 includes a cloud application 401, a system framework 402, a hardware abstraction layer 403, and an audio collection program 404.
The cloud application 401 can call the interface of the system framework 402 to write audio data into the system framework 402, for example through a mixing thread, and the system framework 402 can call the interface of the hardware abstraction layer 403 to write the audio data into the hardware abstraction layer 403. A communication connection is established between the hardware abstraction layer 403 and the audio collection program 404, through which the audio data can be sent to the audio collection program 404.
It should be noted that the cloud application 401, the system framework 402, the hardware abstraction layer 403, and the audio collection program 404 all run in the operating system container of the server 400.
Optionally, the server 400 also includes an encoding program 405. The audio collection program 404 sends the audio data to the encoding program 405, which encodes the audio data and sends the encoded audio data to the local application of the terminal.
Based on the server shown in FIG. 4, an embodiment of this application further provides an audio data processing method. FIG. 5 is a flowchart of an audio data processing method provided by an embodiment of this application; the execution subject of this embodiment is the server shown in FIG. 4. Referring to FIG. 5, the method includes the following steps.
501. Input the first audio data of the cloud application into the system framework.
The cloud application is an application running on the server, and the local application is an application installed on the terminal that supports interaction between the terminal and the server. The server can send the data generated while the cloud application is running to the local application of the terminal for display, so the terminal can obtain the data generated by the cloud application without running it.
Optionally, the user can also send instructions to the server through the local application of the terminal. The server runs the cloud application according to the instruction and sends the data generated by the cloud application to the local application of the terminal, so that the terminal can control the running of the cloud application in the server and obtain the data it generates; the terminal can therefore use the cloud application without installing or running it.
For example, the user triggers, in the local application of the terminal, an operation in which virtual character A releases skill a. In response to the operation, the local application of the terminal sends a skill release instruction to the cloud application in the server, the instruction carrying the virtual identifier of virtual character A and the skill identifier corresponding to skill a. After receiving the skill release instruction, the cloud application renders, according to the instruction, the video data of virtual character A releasing skill a and sends the video data to the local application of the terminal, which displays it so that the user sees the picture of virtual character A releasing skill a. It can be seen that, in the embodiments of this application, the operation of virtual character A releasing skill a is implemented through the cooperation of the cloud application in the server and the local application of the terminal.
While the cloud application is running, it generates audio data, and the server can send the audio data to the local application of the terminal so that the terminal plays or stores it.
For example, the cloud application obtains the first audio data according to the virtual identifier and skill identifier in the skill release instruction and sends the first audio data to the local application of the terminal. The first audio data is the skill-release sound effect corresponding to virtual character A releasing skill a. After receiving the first audio data, the local application of the terminal plays it, so that the user hears the corresponding skill-release sound effect when seeing virtual character A release skill a.
Optionally, the cloud application stores multiple types of audio data, including the following types.
Background music: audio data played while the cloud application runs. Optionally, the cloud application stores one piece of background music, which loops as the cloud application runs; optionally, the cloud application stores multiple pieces of background music, which loop as the cloud application runs, or different background music applies to different running stages, and the cloud application selects, according to the running stage, the background music corresponding to that stage from the multiple pieces for loop playback. Optionally, the cloud application can also render video data while running, and selects, according to the rendered video data, the background music corresponding to that video data for loop playback.
Audio system notification: an audio notification message sent to the terminal while the cloud application is running. For example, when the cloud application is a game application, the audio system notification is "the enemy will reach the battlefield in XX seconds", "our teammate XXX is being besieged", and so on; after receiving the audio system notification, the terminal plays it.
Operation sound effect: audio data played along with an operation, giving the user an immersive experience. For example, when the user operates virtual character A to release a skill, the skill-release sound effect is played, so that the user clearly perceives that they performed the skill-release operation, producing an immersive feeling.
It should be noted that the background music, audio system notification, and operation sound effect above are only examples of the multiple types of audio data and do not limit them.
While the cloud application is running, it can select, from the multiple types of audio data according to the current running state, the audio data corresponding to the current running state and send it to the terminal; the first audio data is the audio data corresponding to the current running state. The running state of the cloud application includes: the startup state of the cloud application, the state of the cloud application executing an operation instruction, the scene-loading state of the cloud application, and so on.
Optionally, during startup, the cloud application selects, from the multiple types of audio data, the audio data corresponding to the startup state; this audio data is the first audio data. The startup process of the cloud application means that the cloud application has started but has not yet finished starting; at this point, the cloud application can implement some functions, such as obtaining audio data and delivering audio data. Optionally, the audio data corresponding to the startup state is background music.
For example, the cloud application is a game application. For some relatively large game applications, the startup process takes some time; therefore, during the startup of the cloud application, audio data is sent to the terminal and played by the terminal, so that the user does not get bored while waiting.
Optionally, while running, the cloud application receives an operation instruction sent by the local application of the terminal, executes the operation corresponding to the instruction in response to it, and selects, from the multiple types of audio data, the audio data corresponding to the operation instruction; this audio data is the first audio data.
For example, the cloud application is a game application. While running, it receives a skill release instruction sent by the terminal, the instruction carrying a virtual character identifier and a skill identifier. In response to the instruction, the cloud application controls the corresponding virtual character to release the corresponding skill according to the virtual character identifier and skill identifier in the instruction, and selects the audio data corresponding to the skill release from the multiple types of audio data.
Optionally, the cloud application includes one or more audio sources, and the multiple types of audio data are stored in the one or more audio sources. Optionally, each audio source stores one type of audio data, and different audio sources store different types of audio data.
Accordingly, the cloud application can select, from the multiple types of audio data according to the current running state, the first audio data corresponding to the current running state and send it to the terminal, including: the cloud application reads, from any audio source, the first audio data corresponding to the current running state and sends it to the terminal; or the cloud application determines a target audio source according to the current running state, reads the first audio data corresponding to the current running state from the target audio source, and sends it to the terminal.
In addition, in the process of delivering the first audio data to the terminal, the cloud application first inputs the first audio data into the system framework for processing.
502. Through the system framework, perform mixing processing on the first audio data to obtain third audio data.
The system framework is a framework in the operating system, such as the Android system or iOS (iPhone Operation System); optionally, the system framework is the audio processing framework (AudioFlinger).
Optionally, the first audio data includes multiple channels of audio data; mixing the first audio data means mixing the multiple channels into one channel, so the third audio data obtained by mixing is a single channel of audio data.
For example, the first audio data includes the audio data corresponding to the background music and the audio data corresponding to the operation sound effect, that is, two channels of audio data. To make the audio data played by the terminal smoother, the audio data corresponding to the background music and the audio data corresponding to the operation sound effect are mixed into one channel to obtain the third audio data, so that the third audio data the user subsequently hears is smoother, ensuring the user's listening experience.
In addition, if the first audio data includes multiple channels of audio data, the user may pay more attention to one of them. For example, the first audio data includes the audio data corresponding to the background music and the audio data corresponding to the operation sound effect; since the background music plays continuously as the cloud application runs while the operation sound effect plays along with the user's operations, the user may pay more attention to the operation sound effect. Therefore, when the first audio data includes multiple channels of audio data, performing mixing processing on the first audio data to obtain the third audio data includes: determining the weight of each channel of audio data in the first audio data, and mixing the multiple channels into one channel according to the weight of each channel, to obtain the third audio data.
Optionally, the weight of each channel of audio data is determined according to its type; for example, the system notification has the largest weight, the operation sound effect the next largest, and the background music the smallest; or the operation sound effect has the largest weight, the system notification the next largest, and the background music the smallest.
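The weighted mixing described above can be outlined as a short sketch; this is an illustrative Python outline only, and the weight values and sample data are assumptions for demonstration, not values from this application:

```python
def mix(streams, weights):
    """Mix equal-length PCM sample lists into one stream, weighted per channel."""
    total = sum(weights)
    return [sum(s * w for s, w in zip(frame, weights)) / total
            for frame in zip(*streams)]

# Dummy samples: background music and an operation sound effect.
bgm = [0.1, 0.1, 0.1]
sfx = [0.5, 0.0, -0.5]
# Give the operation sound effect a larger weight than the background music.
third_audio = mix([bgm, sfx], weights=[1, 3])
```

Frame-by-frame, each output sample is the weight-normalized sum of the corresponding input samples, so the higher-weighted channel dominates the mix.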
Optionally, the system framework includes a processing thread, and performing mixing processing on the first audio data through the system framework to obtain the third audio data includes: performing mixing processing on the first audio data through the processing thread to obtain the third audio data. Optionally, the processing thread is the mixing thread.
503. Obtain the audio parameters from the hardware abstraction layer through the system framework, where the hardware abstraction layer stores the audio parameters.
The embodiments of this application customize the hardware abstraction layer; it differs from the hardware abstraction layer on a terminal. The hardware abstraction layer on a terminal is used to call the interface of hardware such as a speaker and input audio data into the hardware for playback, whereas the hardware abstraction layer in the embodiments of this application is not connected to hardware; instead, it establishes a communication connection with the audio collection program and sends the audio data to the audio collection program.
It should be noted that the audio collection program is configured with audio parameters indicating that the audio data it receives needs to satisfy those parameters; for example, an audio parameter of 24 KHz (kilohertz) dual-channel indicates that the audio collection program is configured to receive 24 KHz dual-channel audio data.
If the audio parameters of the audio data sent by the hardware abstraction layer do not meet the requirements of the audio collection program, the audio collection program may not be able to receive the audio data smoothly. Optionally, the hardware abstraction layer stores audio parameters set according to the needs of the audio collection program; in this way, the system framework can obtain the audio parameters from the hardware abstraction layer and generate audio data satisfying them, so that the hardware abstraction layer successfully sends the audio data to the audio collection program.
For example, if the audio collection program receives 24 KHz (kilohertz) audio data, the audio parameters include: a sampling rate of 24 KHz.
Optionally, the audio parameters include at least one of a target sampling rate, a target channel count, or a target sampling depth.
In addition, it should be noted that, optionally, step 503 is performed before step 502, simultaneously with step 502, or after step 502.
Optionally, while the cloud application is running, step 503 is performed only once; or the system framework performs step 503 each time it processes audio data; this is not limited in the embodiments of this application.
504. Through the system framework, process the third audio data according to the audio parameters to obtain the second audio data.
To make the audio parameters of the audio data output by the system framework meet the needs of the audio collection program, the third audio data can be processed according to the audio parameters in the hardware abstraction layer to obtain the second audio data, so that the audio parameters of the second audio data are consistent with those in the hardware abstraction layer and thus meet the needs of the audio collection program. In other words, processing the third audio data according to the audio parameters through the system framework to obtain the second audio data is equivalent to adjusting the audio parameters of the audio data.
Optionally, the audio parameters include at least one of a target sampling rate, a target channel count, or a target sampling depth; processing the third audio data according to the audio parameters through the system framework to obtain the second audio data includes at least one of (1) to (3) below.
(1) The audio parameters include the target sampling rate: through the system framework, resample the third audio data according to the target sampling rate to obtain the second audio data.
For example, if the target sampling rate is 24 KHz and the sampling rate of the third audio data is 48 KHz, the third audio data is resampled to obtain second audio data with a sampling rate of 24 KHz.
(2) The audio parameters include the target channel count: through the system framework, perform channel-count conversion on the third audio data according to the target channel count to obtain the second audio data.
For example, if the target channel count is dual-channel and the third audio data is single-channel audio data, channel-count conversion is performed on the third audio data to obtain dual-channel second audio data.
(3) The audio parameters include the target sampling depth: through the system framework, resample the third audio data according to the target sampling depth to obtain the second audio data.
For example, if the target sampling depth is 8 bits and the sampling depth of the third audio data is 16 bits, the third audio data is resampled to obtain second audio data with a sampling depth of 8 bits.
Optionally, the system framework includes a processing thread, and processing the third audio data according to the audio parameters through the system framework to obtain the second audio data includes: processing the third audio data according to the audio parameters through the processing thread to obtain the second audio data. In the system framework, both the mixing of the first audio data and the processing of the third audio data according to the audio parameters are completed by the same thread, without multiple threads processing them separately, which reduces the transmission of audio data during processing and thereby speeds up the processing of audio data.
Optionally, the processing thread is the mixing thread.
505. Send the second audio data to the hardware abstraction layer through the system framework.
The second audio data is sent to the hardware abstraction layer through the system framework, and the hardware abstraction layer sends it to the audio collection program. However, if the audio collection program has not started, or the hardware abstraction layer has not established a communication connection with the audio collection program, then even if the second audio data is sent to the hardware abstraction layer, the hardware abstraction layer cannot send it to the audio collection program. Therefore, through the system framework, the second audio data is sent to the hardware abstraction layer when the hardware abstraction layer has successfully established a communication connection with the audio collection program.
In one possible implementation, sending the second audio data to the hardware abstraction layer through the system framework includes: if the hardware abstraction layer has established a communication connection with the audio collection program, sending the second audio data to the hardware abstraction layer through the system framework; if the hardware abstraction layer has not yet established a communication connection with the audio collection program, controlling the hardware abstraction layer to establish a communication connection with the audio collection program, and, when the connection is established successfully, sending the second audio data to the hardware abstraction layer through the system framework.
Controlling the hardware abstraction layer to establish a communication connection with the audio collection program includes: controlling the hardware abstraction layer to send a communication connection establishment request to the audio collection program; if the audio collection program detects the request through listening, the communication connection between the hardware abstraction layer and the audio collection program is established.
However, if the audio collection program does not detect the communication connection establishment request sent by the hardware abstraction layer, the connection is not established successfully, and the system framework discards the second audio data and no longer sends it to the hardware abstraction layer.
The audio collection program may fail to detect the request because it has not yet started successfully. In one possible implementation, the audio collection program is used not only to send the audio data generated by the cloud application to the local application of the terminal but also to send the video data generated by the cloud application to the local application of the terminal. If the audio collection program has not started successfully, it will not send the video data generated by the cloud application to the local application of the terminal either, so the terminal cannot render the picture of the cloud application from the video data; at this point, discarding the second audio data of the cloud application does not affect the user.
Optionally, the hardware abstraction layer includes a write interface, and sending the second audio data to the hardware abstraction layer includes: calling the write interface of the hardware abstraction layer through the system framework to write the second audio data into the hardware abstraction layer.
For example, the system framework periodically calls the write interface of the hardware abstraction layer and determines, in the write interface, whether the hardware abstraction layer has established a communication connection with the audio collection program. If the connection has been established, the second audio data is written into the hardware abstraction layer; if not, the hardware abstraction layer is controlled to try to establish the connection. If the connection is established successfully, the second audio data is written into the hardware abstraction layer; if the connection fails, the second audio data is discarded.
506. Through the hardware abstraction layer, send the second audio data to the audio collection program according to the communication connection between the hardware abstraction layer and the audio collection program, where the audio collection program is configured to send the second audio data to a local application of the terminal.
A communication connection is established between the hardware abstraction layer and the audio collection program; it can be any form of communication connection.
Optionally, the communication connection between the hardware abstraction layer and the audio collection program is a socket connection.
As shown in FIG. 6, the hardware abstraction layer 601 acts as the socket client, and the audio collection program 602 acts as the socket server. In the audio collection program 602, a separate thread is bound to the socket and listens on it; the socket's accept call is a blocking call that waits until a socket client connects. After the hardware abstraction layer 601 establishes a socket connection with the audio collection program 602, the audio collection program 602 calls the socket's read function, which is configured as a blocking function that waits for the hardware abstraction layer 601 to send audio data. Since the hardware abstraction layer 601 and the audio collection program 602 both run in the same container, sending the second audio data to the audio collection program 602 through the hardware abstraction layer 601 is equivalent to a local transfer with microsecond-level delay, which greatly reduces the transmission time of the second audio data and shortens the delay with which the server obtains audio data.
Optionally, the communication connection between the hardware abstraction layer and the audio collection program is a shared memory connection. Taking program A and program B as an example, a shared memory connection means that program A and program B share a piece of memory: program A stores data into the memory, and program B can read the data from it, which connects program A and program B and achieves the effect of program A sending data to program B.
In one possible implementation, sending the second audio data to the hardware abstraction layer through the system framework includes: sending, through the system framework, the second audio data to a target memory of the hardware abstraction layer, where the target memory is the memory shared by the hardware abstraction layer and the audio collection program. Accordingly, sending the second audio data to the audio collection program through the hardware abstraction layer according to the communication connection between them includes: the audio collection program reading the second audio data from the target memory.
It should be noted that any communication connection may be established between the hardware abstraction layer and the audio collection program; the embodiments of this application do not limit the form of the connection, and the socket connection and shared memory above are only examples.
Optionally, after obtaining the second audio data, the audio collection program sends it to the encoding program, which encodes the second audio data and then sends the encoded second audio data to the local application of the terminal.
Optionally, the encoding program can establish a communication connection with the terminal and, according to that connection, send the encoded second audio data to the local application of the terminal, which decodes and plays it. Optionally, the communication connection is a webrtc peer connection.
It should be noted that the embodiments of this application only take the cloud application outputting the first audio data and the audio collection program obtaining the second audio data as an example to illustrate the processing and transmission of audio data among the multiple programs in the server. In one possible implementation, while the cloud application is running, it can generate audio data continuously, or generate audio data multiple times; each transmission of audio data from the cloud application to the audio collection program is similar to steps 501 to 506 above and is not repeated here.
It should be noted that, if the cloud application continuously outputs audio data, it periodically outputs audio data of a target size. Optionally, the target size of the audio data depends on the size of the terminal's audio data buffer; optionally, it depends on the size of the buffers in the system framework, the hardware abstraction layer, or the audio collection program. For example, the audio data is audio data with a playback duration of 10 ms.
In the audio data processing method provided by the embodiments of this application, a relay program is set between the system framework and the audio collection program, and a communication connection is established between them, through which the audio data processed by the system framework can be sent directly to the audio collection program. Compared with the audio collection program reading audio data from the recording thread by calling the audio recording interface, sending the audio data directly through the communication connection shortens the transmission link of the audio data, shortens the time for the audio collection program to obtain the audio data, and reduces the delay with which the server delivers audio data.
In addition, since the thread that performs mixing in the system framework and the thread that processes according to the audio parameters are the same processing thread, one thread can perform both processing steps, which reduces the transmission of audio data, shortens the time for the hardware abstraction layer to obtain the audio data, and further reduces the delay with which the server delivers audio data.
In addition, because the hardware abstraction layer cannot send the second audio data to the audio collection program if establishing the communication connection between them fails, the system framework, when sending the second audio data to the hardware abstraction layer, determines whether the communication connection has been established. For example, as shown in FIG. 6, if the hardware abstraction layer has not yet established a communication connection with the audio collection program, the hardware abstraction layer is controlled to try to establish one; the second audio data is sent to the hardware abstraction layer only when the connection is established successfully, and is discarded when establishment fails, which reduces the sending of useless data and lightens the load on the server.
In one possible implementation, the relay program is the resampling program. As shown in FIG. 7, the server 700 includes a cloud application 701, a system framework 702, a resampling program 703, and an audio collection program 704.
The cloud application 701 can call the interface of the system framework 702 to write audio data into the system framework 702. After processing the audio data, the system framework 702 sends the resulting audio data to the resampling program 703. A communication connection is established between the resampling program 703 and the audio collection program 704, through which the audio data can be sent directly to the audio collection program 704.
It should be noted that the cloud application 701, the system framework 702, the resampling program 703, and the audio collection program 704 all run in the operating system container of the server 700.
Optionally, the server 700 also includes an encoding program 705. The audio collection program 704 sends the audio data to the encoding program 705, which encodes the audio data and sends the encoded audio data to the local application of the terminal.
Based on the server shown in FIG. 7, an embodiment of this application further provides an audio data processing method. FIG. 8 is a flowchart of an audio data processing method provided by an embodiment of this application; the execution subject of this embodiment is the server shown in FIG. 7. Referring to FIG. 8, the method includes the following steps.
801. Input the first audio data of the cloud application into the system framework.
Step 801 is similar to step 501 above and is not repeated here.
802. Perform mixing processing on the first audio data through the system framework to obtain third audio data.
Step 802 is similar to step 502 above and is not repeated here.
803. Obtain the audio parameters from the resampling program through the system framework.
The resampling program is configured with audio parameters indicating that the audio data it receives needs to satisfy those parameters; for example, an audio parameter of 48 KHz dual-channel indicates that the resampling program is configured to receive 48 KHz dual-channel audio data. Therefore, the system framework obtains the audio parameters from the resampling program so as to generate audio data meeting the resampling program's requirements.
804. Through the system framework, process the third audio data according to the audio parameters to obtain the second audio data.
Step 804 is similar to step 504 above and is not repeated here.
805. Send the second audio data to the resampling program through the system framework.
The second audio data is sent to the resampling program through the system framework, and the resampling program sends it to the audio collection program. However, if the audio collection program has not started, or the resampling program has not established a communication connection with the audio collection program, then even if the second audio data is sent to the resampling program, the resampling program cannot send it to the audio collection program. Optionally, through the system framework, the second audio data is sent to the resampling program when the resampling program has successfully established a communication connection with the audio collection program.
In one possible implementation, sending the second audio data to the resampling program through the system framework includes: if the resampling program has established a communication connection with the audio collection program, sending the second audio data to the resampling program through the system framework; if not, controlling the resampling program to establish a communication connection with the audio collection program and, when the connection is established successfully, sending the second audio data to the resampling program through the system framework.
Controlling the resampling program to establish a communication connection with the audio collection program includes: controlling the resampling program to send a communication connection establishment request to the audio collection program; if the audio collection program detects the request through listening, the communication connection between the resampling program and the audio collection program is established.
However, if the audio collection program does not detect the communication connection establishment request sent by the resampling program, the connection is not established successfully, and the system framework discards the second audio data and no longer sends it to the resampling program.
The audio collection program may fail to detect the request because it has not yet started successfully. In one possible implementation, the audio collection program is used not only to send the audio data generated by the cloud application to the local application of the terminal but also to send the video data generated by the cloud application to the local application of the terminal. If the audio collection program has not started successfully, it will not send the video data either, so the terminal cannot render the picture of the cloud application from the video data; at this point, discarding the second audio data of the cloud application does not affect the user.
It should be noted that the embodiments of this application only take, as an example, the system framework sending the second audio data to the resampling program when the resampling program has successfully established a communication connection with the audio collection program, to illustrate the transmission of audio data in the server; in another embodiment, the system framework sends the second audio data to the resampling program regardless of whether the resampling program has established a communication connection with the audio collection program.
Optionally, the resampling program includes a receiving thread, and sending the second audio data to the resampling program through the system framework includes: sending, through the system framework, the second audio data to the receiving thread of the resampling program.
Optionally, the system framework processes the first audio data through the processing thread to obtain the second audio data; therefore, in one possible implementation, sending the second audio data to the receiving thread of the resampling program through the system framework includes: sending, through the processing thread, the second audio data to the receiving thread of the resampling program.
806. Through the resampling program, send the second audio data to the audio collection program according to the communication connection between the resampling program and the audio collection program, where the audio collection program is configured to send the second audio data to a local application of the terminal.
A communication connection is established between the resampling program and the audio collection program; it is any form of communication connection.
Optionally, the communication connection between the resampling program and the audio collection program is a socket connection, where the resampling program acts as the socket client and the audio collection program as the socket server. The manner of sending the second audio data to the audio collection program through the resampling program according to the socket connection between them is similar to the manner in step 506 of sending the second audio data through the hardware abstraction layer according to the socket connection, and is not repeated here.
Optionally, the communication connection between the resampling program and the audio collection program is a shared memory connection. The manner of sending the second audio data to the audio collection program through the resampling program according to the shared memory connection between them is similar to the manner in step 506 of sending the second audio data through the hardware abstraction layer according to the shared memory connection, and is not repeated here.
Optionally, the resampling program includes a receiving thread, and the communication connection between the resampling program and the audio collection program is the communication connection between the receiving thread and the audio collection program; or the resampling program includes a receiving thread and a first sending thread, where the receiving thread is used to receive the second audio data sent by the system framework and the first sending thread is used to send the second audio data received by the receiving thread to the audio collection program, and the communication connection between the resampling program and the audio collection program is the communication connection between the first sending thread and the audio collection program.
In addition, according to step 803 above, the audio parameters of the second audio data meet the requirements of the resampling program. If they also meet the requirements of the audio collection program, the resampling program can send the second audio data to the audio collection program directly; if not, the resampling program needs to resample the second audio data so that the processed second audio data meets the audio collection program's requirements, and then sends the processed second audio data to the audio collection program.
For example, if the audio parameter configured in the resampling program is 48 KHz dual-channel and that of the audio collection program is also 48 KHz dual-channel, the resampling program does not need to resample the second audio data and sends it to the audio collection program directly; if the audio parameter of the audio collection program is 16 KHz dual-channel, the resampling program needs to resample the second audio data so that the processed second audio data has a sampling rate of 16 KHz.
Since the resampling program does not need to resample when its configured audio parameters are the same as those of the audio collection program, the resampling program can be configured according to the audio parameters configured in the audio collection program, so that the two configurations are the same.
It should be noted that, in one possible implementation, the system framework also includes a recording thread and a detection thread. The detection thread in the system framework detects whether any other program is currently reading the data in the recording thread; if no other program is reading that data, the system framework stops sending data to the resampling program. The original intent of the detection thread is to save unnecessary computation and reduce power consumption.
In addition, if the system framework includes the recording thread, the server also needs to perform the following steps 807 to 810. If the system framework does not include the recording thread, the audio collection program, after obtaining the second audio data, sends it to the local application of the terminal.
807. Send the second audio data to the recording thread through the resampling program.
The second audio data is sent to the recording thread through the resampling program, and the recording thread records the received second audio data. Since the recording thread records the second audio data while receiving it, and the recording process takes some time, sending the second audio data from the resampling program to the recording thread also takes some time. Optionally, the resampling program includes a receiving thread and a second sending thread, where the receiving thread is used to receive the second audio data from the system framework and to send the second audio data to the second sending thread when the second sending thread has an available buffer. After receiving the second audio data, the second sending thread determines, according to the audio parameters configured in the recording thread, whether to resample the second audio data; if resampling is needed, it resamples the second audio data according to the audio parameters configured in the recording thread to obtain the processed second audio data, and sends the processed second audio data to the recording thread; if resampling is not needed, it sends the second audio data to the recording thread directly.
The second sending thread having an available buffer means that the second sending thread has sent all of the audio data that the resampling program received last time to the recording thread.
In addition, if the audio parameters of the second audio data are the same as those configured in the recording thread, the resampling program sends the second audio data to the recording thread directly, and the recording thread can record it; if they differ and the resampling program sends the second audio data to the recording thread directly, the recording thread may not be able to receive it smoothly. The second sending thread determining, according to the audio parameters configured in the recording thread, whether to resample the second audio data includes: the second sending thread determining whether the audio parameters of the second audio data are the same as those configured in the recording thread; if they are the same, determining that resampling is not needed; if they differ, determining that resampling is needed.
808. Record the second audio data through the recording thread to obtain third audio data.
The system framework also includes a buffer corresponding to the recording thread, and recording the second audio data through the recording thread to obtain the third audio data includes: copying, through the recording thread, the second audio data into the corresponding buffer to obtain the third audio data, where the data content of the third audio data is the same as that of the second audio data.
809. Through the audio collection program, call the audio recording interface to read the third audio data from the recording thread.
The recording thread copies the third audio data into the corresponding buffer, and calling the audio recording interface through the audio collection program to read the third audio data from the recording thread includes: calling, through the audio collection program, the audio recording interface to read the third audio data from the buffer corresponding to the recording thread.
Optionally, the audio recording interface includes a read function, and calling the audio recording interface through the audio collection program to read the third audio data from the recording thread includes: the audio collection program calling the read function of the audio recording interface to read the third audio data from the buffer corresponding to the recording thread; if the third audio data is not in the buffer corresponding to the recording thread, the audio collection program waits until the recording thread copies the third audio data into the buffer, and then reads it.
810. Through the audio collection program, discard the third audio data and send the second audio data to the local application of the terminal.
The data content of the second audio data is the same as that of the third audio data, but the second audio data is sent directly from the resampling program to the audio collection program, whereas the third audio data is sent by the resampling program to the recording thread and then read from the recording thread by the audio collection program. Therefore, the second audio data reaches the audio collection program faster than the third audio data. To reduce the delay with which the server delivers audio data, the audio collection program sends the second audio data to the local application of the terminal and discards the third audio data.
In addition, a communication connection is established between the audio collection program and the resampling program; the second audio data is obtained according to the communication connection, while the third audio data is obtained by the audio collection program calling the audio recording interface. Since the second and third audio data are obtained in different ways, optionally, the second audio data is distinguished from the third audio data by the way it was obtained, and the second audio data is sent to the local application of the terminal.
For example, the audio collection program includes a first collection thread and a second collection thread. The first collection thread is used to collect the second audio data; a communication connection is established between the first collection thread and the resampling program, and the resampling program sends the second audio data to the first collection thread according to that connection. The second collection thread is used to collect the third audio data and calls the audio recording interface to read the third audio data from the recording thread. The server sends the audio data collected by the first collection thread to the local application of the terminal and discards the audio data collected by the second collection thread.
It should be noted that the resampling program in the embodiments of this application is a program in the operating system, that is, a program that comes with the operating system; this application implements the above audio data processing method by improving a program native to the operating system.
It should be noted that the embodiments of this application only take the cloud application outputting the first audio data and the audio collection program obtaining the second audio data as an example to illustrate the processing and transmission of audio data among the multiple programs in the server. In one possible implementation, while the cloud application is running, it can generate audio data continuously, or generate audio data multiple times; each transmission of audio data from the cloud application to the audio collection program is similar to steps 801 to 810 above and is not repeated here.
It should be noted that, if the cloud application continuously outputs audio data, it periodically outputs audio data of a target size. Optionally, the target size of the audio data depends on the size of the terminal's audio data buffer; optionally, it depends on the size of the buffers in the system framework, the resampling program, or the audio collection program. For example, the audio data is audio data with a playback duration of 10 ms.
The audio data processing method provided by the embodiments of this application improves the resampling program in the operating system and establishes a communication connection between the resampling program and the audio collection program, so that the resampling program can send the second audio data directly to the audio collection program according to that connection. Compared with the audio collection program reading audio data from the recording thread by calling the audio recording interface, sending the audio data directly through the communication connection shortens the transmission link of the audio data, shortens the time for the audio collection program to obtain the audio data, and reduces the delay with which the server delivers audio data.
In addition, the resampling program also sends the audio data to the recording thread, and the audio collection program reads the audio data from the recording thread, which ensures that the system framework keeps sending audio data to the resampling program and guarantees the continuous processing and sending of audio data; moreover, the audio collection program sends the audio data that the resampling program sends over and discards the audio data read from the recording thread, keeping the delay of delivering audio data small.
In addition, an embodiment of this application further provides a delay acquisition method used to obtain the delay with which the server obtains audio data in the above audio data processing method. FIG. 9 is a flowchart of a delay acquisition method provided by an embodiment of this application; the execution subject of this embodiment is a server. Referring to FIG. 9, the method includes the following steps.
901. Input the first detection audio data of the detection application into the system framework, and record the sending time of the first detection audio data.
The detection application is an application running on the server that is used to detect the delay with which the server delivers audio data. The detection application can output detection audio data; the time consumed in transmitting the detection audio data through other programs is subsequently obtained by obtaining the times at which other programs in the server receive the detection audio data, where the other programs in the server are programs other than the detection application.
The first detection audio data is any detection audio data output by the detection application. Optionally, the detection application can output audio data continuously and, in addition to the detection audio data, outputs other audio data, where the detection audio data differs from the other audio data so that the two can be distinguished and the time at which a program receives the detection audio data can be obtained.
902. Process the first detection audio data through the system framework to obtain second detection audio data, and send the second detection audio data to the relay program.
The system framework is a framework in the operating system used to process audio data. The relay program is a program between the system framework and the audio collection program, used to transmit the audio data processed by the system framework to the audio collection program; it has the function of forwarding audio data and, optionally, other functions, which is not limited in the embodiments of this application.
The second detection audio data is the audio data obtained by processing the first detection audio data through the system framework; both the first and the second detection audio data are audio data distinguishable from other audio data, so even after the first detection audio data is processed into the second detection audio data, the second detection audio data can still be distinguished from other audio data, and the time at which a program receives the second detection audio data can be obtained.
903. Through the relay program, send the second detection audio data to the audio collection program according to the communication connection between the relay program and the audio collection program, and record the first receiving time at which the audio collection program receives the second detection audio data, where the audio collection program is configured to send the second detection audio data to a local application of the terminal.
The audio collection program is the program in the server used to collect audio data and send it to the terminal. A communication connection is established between the relay program and the audio collection program, through which the relay program sends the second detection audio data directly to the audio collection program.
904. Obtain the first time difference between the sending time and the first receiving time, where the first time difference represents the delay in transmitting the detection audio data from the detection application to the audio collection program.
Since the sending time is the time at which the detection application outputs the detection audio data and the first receiving time is the time at which the audio collection program receives it, and since the audio collection program is the program in the server used to collect audio data and send it to the terminal, the time at which the audio collection program receives the detection audio data can be regarded as the time at which the server obtains the audio data. Therefore, the first time difference between the sending time and the first receiving time also represents the time consumed for the server to obtain the audio data, that is, the delay with which the server delivers audio data.
In the delay acquisition method provided by the embodiments of this application, the detection application sends out detection audio data, and the receiving time at which the audio collection program receives the detection audio data is obtained; from the time difference between the sending time and the receiving time, the time consumed in transmitting the audio data from the detection application to the audio collection program can be obtained accurately, that is, the time it takes for the server to obtain audio data. This time can represent the delay with which the server delivers audio data; it can subsequently be used to determine whether the delay in the server obtaining audio data affects the playback of the audio data and the listening experience of the terminal user, and thus whether to continue improving the server, providing developers with a good basis for improvement.
It should be noted that the relay program in steps 902 and 903 above is the hardware abstraction layer, or the resampling program native to the operating system, or another program, which is not limited in the embodiments of this application.
Optionally, the relay program is the hardware abstraction layer. As shown in FIG. 4, the server 400 includes a cloud application 401, a system framework 402, a hardware abstraction layer 403, and an audio collection program 404. Based on the server shown in FIG. 4, an embodiment of this application further provides a delay acquisition method that can be used to detect the delay with which the server shown in FIG. 4 delivers audio data. FIG. 10 is a flowchart of a delay acquisition method provided by an embodiment of this application; the execution subject of this embodiment is the server shown in FIG. 4. Referring to FIG. 10, the method includes the following steps.
1001. Input the first detection audio data of the detection application into the system framework, and record the sending time of the first detection audio data.
The detection application is an application running on the server that is used to detect the delay with which the server delivers audio data. The detection application can output detection data; the time consumed in transmitting the detection data through other programs is subsequently obtained by obtaining the times at which other programs in the server receive the detection data, where the other programs in the server are programs other than the detection application.
To obtain more accurately the time consumed in transmitting audio data among the multiple programs in the server, the detection data output by the detection application is detection audio data. Since the detection audio data is audio data, after it is output to the system framework and other programs, those programs can simulate the real audio data processing flow; in this way, subsequently determining the delay from the times at which the other programs receive the detection audio data is more accurate.
Optionally, the detection application differs from the cloud application in step 501 above: the cloud application outputs audio data according to received operation instructions, whereas the detection application outputs audio data according to configured detection logic. Optionally, the configured detection logic is to send detection audio data once every first duration, where the first duration can be any duration, such as 4 seconds or 5 seconds.
The first detection audio data is any detection audio data output by the detection application. Optionally, the detection application can output audio data continuously and, in addition to the detection audio data, outputs other audio data, where the detection audio data differs from the other audio data so that the two can be distinguished and the time at which a program receives the detection audio data can be obtained.
Optionally, the first detection audio data is audio data carrying a tag; whether the first detection audio data has been received can subsequently be determined from the tag it carries.
Optionally, the first detection audio data is audio data of a fixed value, and differs from the other audio data output by the detection application. For example, the value of the first detection audio data is 0xffff (0x denotes hexadecimal, and ffff is a hexadecimal value), while the value of the other audio data output by the detection application is 0. As shown in FIG. 11, the detection application outputs audio data with value 0 and periodically outputs detection audio data 1101 with value 0xffff.
可选地,服务器还包括记录程序,该记录程序在检测应用程序将第一检测音频数据输入至系统框架时,记录下当前时间,该当前时间为第一检测音频数据的发送时间。可选地,检测应用程序在将第一检测音频数据输入至系统框架时,会向记录程序发送消息,该消息指示检测应用程序将第一检测音频数据 输入至系统框架中,该记录程序记录接收该消息的时间,作为第一检测音频数据的发送时间。
需要说明的是,该记录程序为检测应用程序之外的程序,或者为检测应用程序中的拥有记录功能的程序。
可选地,该记录程序还具有检测其他程序的功能,该记录程序能够检测系统框架中的数据,在检测到系统框架中包含检测音频数据时,记录当前时间,该当前时间为第一检测音频数据的发送时间。
1002、通过系统框架对第一检测音频数据进行处理,得到第二检测音频数据,将第二检测音频数据发送至硬件抽象层。
需要说明的是,通过系统框架对第一检测音频数据进行处理的方式,与上述步骤502中通过系统框架对第一音频数据进行处理的方式类似,通过系统框架将第二检测音频数据发送至硬件抽象层的方式,与上述步骤505中通过系统框架将第二音频数据发送至硬件抽象层的方式类似,在此不再一一赘述。
需要说明的是,通过系统框架对第一检测音频数据进行处理后,得到的第二检测音频数据与第一检测音频数据类似,均为能够与其他音频数据进行区分的音频数据。
例如,如果第一检测音频数据为携带标签的音频数据,那么第二检测音频数据也携带该标签。如果第一检测音频数据的数值为0xffff,其他音频数据的数值为0,那么第二检测音频数据的数值为非0数值,而其他音频数据在处理后,数值依然为0,也即是不会因为对检测音频数据进行处理,而使得检测音频数据的检测功能失效。
1003、记录硬件抽象层接收第二检测音频数据的第二接收时间,获取发送时间与第二接收时间之间的第二时间差,第二时间差表示检测音频数据从检测应用程序传输至硬件抽象层的延时。
其中,记录程序还用于记录硬件抽象层接收第二检测音频数据的第二接收时间,在记录第二接收时间之前,先确定硬件抽象层接收到第二检测音频数据。可选地,硬件抽象层在接收到第二检测音频数据之后,向记录程序上报消息,告知记录程序已接收到第二检测音频数据,记录程序接收到上报的消息时,记录当前时间,该当前时间为硬件抽象层接收第二检测音频数据的第二接收时间。
可选地,该记录程序还具有检测其他程序的功能,例如,该记录程序检测硬件抽象层的代码中是否包括第二检测音频数据,在检测到第二检测音频数据时,记录当前时间,该当前时间为硬件抽象层接收第二检测音频数据的第二接收时间。
1004、通过硬件抽象层,根据硬件抽象层与音频采集程序之间的通信连接,将第二检测音频数据发送至音频采集程序,记录音频采集程序接收第二检测音频数据的第一接收时间,音频采集程序用于将第二检测音频数据发送至终端本地的应用程序。
其中,通过硬件抽象层,根据硬件抽象层与音频采集程序之间的通信连接,将第二检测音频数据发送至音频采集程序,与上述步骤506中通过硬件抽象层,根据硬件抽象层与音频采集程序之间的通信连接,将第二音频数据发送至音频采集程序类似,在此不再一一赘述。
其中,记录程序还用于记录音频采集程序接收第二检测音频数据的第一接收时间,在记录第一接收时间之前,要先确定音频采集程序接收到第二检测音频数据。可选地,音频采集程序在接收到第二检测音频数据之后,向记录程序上报消息,告知记录程序已接收到第二检测音频数据,记录程序接收到上报的消息时,记录当前时间,该当前时间为音频采集程序接收第二检测音频数据的第一接收时间。
可选地,所述记录程序还具有检测其他程序的功能,例如,该记录程序检测音频采集程序的代码中是否包括第二检测音频数据,在检测到第二检测音频数据时,记录当前时间,该当前时间为音频采集程 序接收第二检测音频数据的第一接收时间。
1005、获取发送时间和第一接收时间之间的第一时间差,第一时间差表示检测音频数据从检测应用程序传输至音频采集程序的延时。
需要说明的是,本申请实施例仅是以检测应用程序输出第一检测音频数据,获取第一检测音频数据在服务器中的多个程序之间传输所消耗的时长为例,对获取服务器下发音频数据的延时进行示例性说明,在一种可能实现方式中,在检测应用程序的运行过程中,该检测应用程序能够一直输出音频数据,且每隔一定时长输出一次检测音频数据,根据每个检测音频数据,均能获取一个服务器下发音频数据的延时,可选地,对多个延时进行统计处理,得到服务器下发音频数据的目标延时,由于该目标延时考虑了多个检测音频数据的传输过程,因此,该目标延时更加准确。可选地,统计处理为平均处理。
例如,检测应用程序每隔一定时长发送一次检测音频数据,后续能够获取多个第一时间差和第二时间差,通过将多个第一时间差进行统计处理,得到的时间差能够更加准确地表示检测音频数据从检测应用程序传输至音频采集程序的延时;通过将多个第二时间差进行统计处理,得到的时间差能够更加准确地表示检测音频数据从检测应用程序传输至硬件抽象层的延时。
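对多个时间差做统计处理(如平均)得到目标延时的过程,可示意如下(函数名与示例数据均为本文假设):

```python
def target_delay(send_times, recv_times):
    """示意:由多组发送/接收时间求平均延时,作为目标延时。"""
    if not send_times or len(send_times) != len(recv_times):
        raise ValueError("需要等长且非空的时间序列")
    diffs = [recv - send for send, recv in zip(send_times, recv_times)]
    return sum(diffs) / len(diffs)
```

例如检测音频数据每隔 5 秒发送一次,发送时间为 [0, 5, 10](秒),对应接收时间为 [0.04, 5.05, 10.03],则目标延时约为 0.04 秒,比任一单次测量更能代表传输延时。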
需要说明的是,为了确保相邻的两个检测音频数据能够明显区分开,可以设置较大的时长间隔,例如4秒、5秒等。
如图12所示,在实际获取延时的过程中,检测应用程序1201发送第一检测音频数据的发送时间为t0,硬件抽象层1202接收第二检测音频数据的第二接收时间为t1,根据t0和t1可知,检测音频数据从检测应用程序1201到硬件抽象层1202的延时大约为40ms(毫秒),音频采集程序1203接收第二检测音频数据的第一接收时间为t2,根据t1和t2可知,第二检测音频数据从硬件抽象层1202到音频采集程序1203的延时大约为0ms。因此,将服务器从操作系统中获取音频数据的耗时控制在40ms左右,极大地缩短了服务器获取音频数据的耗时。
需要说明的是,在一种可能实现方式中,服务器不仅将云应用程序生成的音频数据发送至终端,还会将云应用程序生成的视频数据发送至终端,通过对终端播放的音频和视频进行延时检测,发现音频与视频并不同步,从播放视频到播放该视频对应的音频,延时大约为0.37秒。当视频与音频的延时高于0.3秒时,人耳能够感觉到比较明显的延时,从而影响用户的体验。如果采用本申请实施例提供的音频数据处理方法,能够减小服务器下发音频数据的延时,将视频到音频的延时降低至0.242秒左右,从而人耳感觉不到明显的延时,提高用户的体验。
本申请实施例提供的延时获取方法,通过检测应用程序发出检测音频数据,获取音频采集程序接收检测音频数据的接收时间,根据发送时间与接收时间的时间差,能够准确得到音频数据从检测应用程序传输至音频采集程序所消耗的时长,也即是得到服务器获取音频数据的时长,该时长能够代表服务器下发音频数据的延时,后续根据该时长能够确定服务器获取音频数据的延时是否会影响音频数据的播放效果,是否会影响终端用户的听觉效果,进而确定是否继续对服务器进行改进,为开发人员提供了较好的改进依据。
另外,还能够获取硬件抽象层接收到检测音频数据的接收时间,进一步得到检测音频数据从检测应用程序传输至硬件抽象层所消耗的时长,以及检测音频数据从硬件抽象层传输至音频采集程序所消耗的时长,从而准确得到音频数据在每个传输阶段所消耗的时长,以便后续开发人员有针对性对服务器进行改进。
可选地,服务器中的中转程序为重采样程序,如图7所示,该服务器700包括云应用程序701、系统框架702、重采样程序703和音频采集程序704。在上述图7的基础上,本申请实施例还提供了一种延时获取方法,能够用于检测图7所示的服务器下发音频数据的延时。图13是本申请实施例提供的一种延时获取方法的流程图,本申请实施例的执行主体为图7所示的服务器,参见图13,该方法包括以下步骤。
1301、将检测应用程序的第一检测音频数据输入至系统框架,记录第一检测音频数据的发送时间。
1302、通过系统框架对第一检测音频数据进行处理,得到第二检测音频数据,将第二检测音频数据发送至重采样程序。
1303、记录重采样程序接收第二检测音频数据的第二接收时间,获取发送时间与第二接收时间之间的第二时间差,第二时间差表示检测音频数据从检测应用程序传输至重采样程序的延时。
1304、通过重采样程序,根据重采样程序与音频采集程序之间的通信连接,将第二检测音频数据发送至音频采集程序,记录音频采集程序接收第二检测音频数据的第一接收时间,音频采集程序用于将第二检测音频数据发送至终端本地的应用程序。
1305、获取发送时间和第一接收时间之间的第一时间差,第一时间差表示检测音频数据从检测应用程序传输至音频采集程序的延时。
需要说明的是,上述步骤1301至步骤1305与上述步骤1001至步骤1005类似,区别仅在于上述步骤1001至步骤1005中是将第二检测音频数据发送至硬件抽象层,而上述步骤1301至步骤1305是将第二检测音频数据发送至重采样程序。
如图14所示,在实际获取延时的过程中,检测应用程序1401发送第一检测音频数据的发送时间为t0,重采样程序1402接收第二检测音频数据的第二接收时间为t1,音频数据从检测应用程序1401到重采样程序1402的延时大约为40ms(毫秒),音频采集程序1403接收第二检测音频数据的第一接收时间为t2,音频数据从重采样程序1402到音频采集程序1403的延时大约为0ms。因此,将服务器从操作系统中获取音频数据的耗时控制在40ms左右,极大地缩短了服务器获取音频数据的耗时。
需要说明的是,在一种可能实现方式中,重采样程序还用于将第二检测音频数据发送至录制线程。录制线程对第二检测音频数据进行录制,得到第三检测音频数据,音频采集程序从录制线程中读取第三检测音频数据。
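重采样程序将第二检测音频数据写入录制线程、录制线程"录制"出第三检测音频数据、音频采集程序再从录制线程读取的流程,可以用一个简化的队列模型示意(RecordThread 类及其 write/read 接口均为本文假设的名称):

```python
from collections import deque

class RecordThread:
    """示意:录制线程,把收到的第二检测音频数据录制为第三检测音频数据。"""

    def __init__(self):
        self._queue = deque()

    def write(self, second_data):
        # "录制":这里简化为复制一份数据作为第三检测音频数据
        self._queue.append(list(second_data))

    def read(self):
        """音频采集程序调用音频录制接口,从录制线程读取第三检测音频数据。"""
        return self._queue.popleft() if self._queue else None
```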
在一种可能实现方式中,记录程序还用于记录音频采集程序接收第三检测音频数据的第三接收时间,服务器获取发送时间和第三接收时间之间的第三时间差,第三时间差表示检测音频数据从检测应用程序经由重采样程序和录制线程传输至音频采集程序的延时。
为了使记录的第三接收时间更加准确,可选地,在重采样程序与音频采集程序未建立通信连接的情况下,获取音频数据从检测应用程序输出、经过重采样程序和录制线程到达音频采集程序的延时。这样,音频采集程序仅接收到第三检测音频数据,而接收不到第二检测音频数据,因此,得到的第三接收时间更加准确。
如图15所示,在实际获取延时的过程中,检测应用程序1501发送第一检测音频数据的发送时间为t0,重采样程序1502接收第二检测音频数据的第二接收时间为t1,音频数据从检测应用程序1501到重采样程序1502的延时大约为40ms(毫秒),音频采集程序1503接收第三检测音频数据的第三接收时间为t2,音频数据从重采样程序1502到音频采集程序1503的延时大约为90ms。
需要说明的是,在一种可能实现方式中,服务器不仅将云应用程序生成的音频数据发送至终端,还会将云应用程序生成的视频数据发送至终端,通过对终端播放的音频和视频进行延时检测,发现音频与视频并不同步,从播放视频到播放该视频对应的音频,延时大约为0.37秒。当视频与音频的延时高于0.3秒时,人耳能够感觉到比较明显的延时,从而影响用户的体验。如果采用本申请实施例提供的音频数据处理方法,能够减小服务器下发音频数据的延时,将视频到音频的延时降低至0.242秒左右,从而人耳感觉不到明显的延时,提高用户的体验。
本申请实施例提供的延时获取方法,通过检测应用程序发出检测音频数据,获取音频采集程序接收检测音频数据的接收时间,根据发送时间与接收时间的时间差,能够准确得到音频数据从检测应用程序传输至音频采集程序所消耗的时长,也即是得到服务器获取音频数据的时长,该时长能够代表服务器下发音频数据的延时,后续根据该时长能够确定服务器获取音频数据的延时是否会影响音频数据的播放效果,是否会影响终端用户的听觉效果,进而确定是否继续对服务器进行改进,为开发人员提供了较好的改进依据。
另外,还能够获取重采样程序接收到检测音频数据的接收时间,进一步得到检测音频数据从检测应用程序传输至重采样程序所消耗的时长,以及检测音频数据从重采样程序传输至音频采集程序所消耗的时长,从而准确得到音频数据在每个传输阶段所消耗的时长,以便后续开发人员有针对性对服务器进行改进。
需要说明的是,本申请实施例提供的两种音频数据处理方法,对于减小服务器下发音频数据的延时的效果几乎相同。表1给出了采用本申请实施例提供的任一种音频数据处理方法得到的视频到音频的延时,以及采用相关技术中其他音频数据处理方法得到的视频到音频的延时:
表1
[表1的内容在原文中以图像(PCTCN2021097794-appb-000001)形式给出,具体延时数值无法从文本中恢复]
其中,产品A采用本申请实施例提供的音频数据处理方法,而产品B和产品C采用其他音频数据处理方法。其中,产品A、产品B和产品C提供有至少一种类型的游戏,该至少一种类型的游戏的应用程序在服务器中运行,也即是,服务器中运行有至少一种类型的云应用程序。
图16是本申请实施例提供的一种服务器的结构示意图,参见图16,该服务器包括:应用运行模块1601、框架运行模块1602、中转模块1603和采集模块1604;
该应用运行模块1601,用于将云应用程序的第一音频数据输入至该框架运行模块1602;
该框架运行模块1602,用于对该第一音频数据进行处理,得到第二音频数据,将该第二音频数据发送至该中转模块1603;
该中转模块1603,用于根据该中转模块1603与该采集模块1604之间的通信连接,将该第二音频数据发送至该采集模块1604,该采集模块1604用于将该第二音频数据发送至终端本地的应用程序。
可选地,该框架运行模块1602,用于若该中转模块1603与该采集模块1604已建立通信连接,则将该第二音频数据发送至该中转模块1603;
该框架运行模块1602,用于若该中转模块1603还未与该采集模块1604建立通信连接,则控制该中转模块1603与该采集模块1604建立通信连接,在该中转模块1603与该采集模块1604成功建立通信连接的情况下,将该第二音频数据发送至该中转模块1603。
可选地,该框架运行模块1602,用于对该第一音频数据进行混音处理,得到第三音频数据,按照音频参数对该第三音频数据进行处理,得到该第二音频数据。
可选地,该框架运行模块1602用于执行以下至少一项:
该音频参数包括目标采样率,按照该目标采样率对该第三音频数据进行重采样处理,得到该第二音频数据;
该音频参数包括目标通道数,按照该目标通道数对该第三音频数据进行通道数转换处理,得到该第二音频数据;
该音频参数包括目标采样深度,按照该目标采样深度对该第三音频数据进行重采样处理,得到该第二音频数据。
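按音频参数对第三音频数据进行处理的两种典型情形(按目标采样率重采样、按目标通道数做通道数转换),可以用如下极简草图示意(仅覆盖整数倍上采样与单声道转双声道两种最简单的情形,属于本文为说明所作的简化假设):

```python
def resample(samples, src_rate, dst_rate):
    """示意:整数倍上采样(样本重复);非整数倍情形从略。"""
    if dst_rate % src_rate != 0:
        raise NotImplementedError("仅示意整数倍上采样")
    factor = dst_rate // src_rate
    return [s for s in samples for _ in range(factor)]

def mono_to_stereo(samples):
    """示意:通道数转换,把单声道样本复制到左右两个声道。"""
    return [(s, s) for s in samples]
```

例如将 22050Hz 的样本重采样到 44100Hz 时,每个样本重复一次;单声道转双声道时,每个样本被复制到左右声道。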
如图17所示,可选地,该框架运行模块1602包括处理单元1612,该处理单元1612用于对该第一音频数据进行混音处理,得到该第三音频数据;
该处理单元1612,用于按照音频参数,对该第三音频数据进行处理,得到该第二音频数据。
可选地,该中转模块1603为硬件抽象层运行模块1613,该框架运行模块1602,用于从该硬件抽象层运行模块1613获取该音频参数,该硬件抽象层运行模块1613存储有音频参数。
可选地,该中转模块1603为硬件抽象层运行模块1613,该框架运行模块1602,用于调用该硬件抽象层运行模块1613的写入接口,将该第二音频数据写入该硬件抽象层运行模块1613。
如图18所示,可选地,该中转模块1603为重采样模块1623,该框架运行模块1602,还用于从该重采样模块1623获取该音频参数,该重采样模块1623配置有音频参数。
可选地,该中转模块1603为重采样模块1623;
该重采样模块1623,还用于对该第二音频数据进行重采样处理,得到处理后的第二音频数据;
该重采样模块1623,用于根据该重采样模块1623与该采集模块1604之间的通信连接,将该处理后的第二音频数据发送至该采集模块1604。
可选地,该框架运行模块1602包括录制单元1622;
该重采样模块1623,用于将该第二音频数据发送至该录制单元1622;
该录制单元1622,用于对该第二音频数据进行录制,得到第三音频数据;
该采集模块1604,用于调用音频录制接口,从该录制单元1622中读取该第三音频数据。
可选地,该采集模块1604,用于丢弃该第三音频数据,将该第二音频数据发送至终端本地的应用程序。
图19是本申请实施例提供的一种服务器的结构示意图,参见图19,该服务器包括:应用运行模块1901、框架运行模块1902、中转模块1903、采集模块1904、记录模块1905和获取模块1906;
该应用运行模块1901,用于将检测应用程序的第一检测音频数据输入至该框架运行模块1902;
该记录模块1905,用于记录该第一检测音频数据的发送时间;
该框架运行模块1902,用于对该第一检测音频数据进行处理,得到第二检测音频数据,将该第二检测音频数据发送至该中转模块1903;
该中转模块1903,用于根据该中转模块1903与该采集模块1904之间的通信连接,将该第二检测音频数据发送至该采集模块1904,该采集模块1904用于将该第二检测音频数据发送至终端本地的应用程序;
该记录模块1905,还用于记录该采集模块1904接收该第二检测音频数据的第一接收时间;
该获取模块1906,用于获取该发送时间和该第一接收时间的第一时间差,该第一时间差表示该检测音频数据从该应用运行模块1901传输至该采集模块1904的延时。
可选地,该记录模块1905,还用于记录该中转模块接收该第二检测音频数据的第二接收时间;
该获取模块1906,用于获取该发送时间与该第二接收时间之间的第二时间差,该第二时间差表示该检测音频数据从该应用运行模块1901传输至该中转模块1903的延时。
可选地,该中转模块1903为硬件抽象层运行模块;或者,该中转模块1903为重采样模块。
图20是本申请实施例提供的一种终端的结构框图。该终端2000用于执行上述实施例中终端执行的步骤,可选地,该终端2000是便携式移动终端,比如:智能手机、平板电脑、MP3播放器(Moving Picture Experts Group Audio Layer III,动态影像专家压缩标准音频层面3)、MP4(Moving Picture Experts Group Audio Layer IV,动态影像专家压缩标准音频层面4)播放器、笔记本电脑或台式电脑。终端2000还可能被称为用户设备、便携式终端、膝上型终端、台式终端等其他名称。
通常,终端2000包括有:处理器2001和存储器2002。
处理器2001可以包括一个或多个处理核心,比如4核心处理器、8核心处理器等。处理器2001可以采用DSP(Digital Signal Processing,数字信号处理)、FPGA(Field-Programmable Gate Array,现场可编程门阵列)、PLA(Programmable Logic Array,可编程逻辑阵列)中的至少一种硬件形式来实现。处理器2001也可以包括主处理器和协处理器,主处理器是用于对在唤醒状态下的数据进行处理的处理器,也称CPU(Central Processing Unit,中央处理器);协处理器是用于对在待机状态下的数据进行处理的低功耗处理器。在一些实施例中,处理器2001可以集成有GPU(Graphics Processing Unit,图像处理器),GPU用于负责显示屏所需要显示的内容的渲染和绘制。一些实施例中,处理器2001还可以包括AI(Artificial Intelligence,人工智能)处理器,该AI处理器用于处理有关机器学习的计算操作。
存储器2002可以包括一个或多个计算机可读存储介质,该计算机可读存储介质可以是非暂态的。存储器2002还可包括高速随机存取存储器,以及非易失性存储器,比如一个或多个磁盘存储设备、闪存存储设备。在一些实施例中,存储器2002中的非暂态的计算机可读存储介质用于存储至少一个程序代码,该至少一个程序代码用于被处理器2001所执行以实现本申请中方法实施例提供的音频数据处理方法,或者延时获取方法。
在一些实施例中,终端2000还可选包括有:外围设备接口2003和至少一个外围设备。处理器2001、存储器2002和外围设备接口2003之间可以通过总线或信号线相连。各个外围设备可以通过总线、信号线或电路板与外围设备接口2003相连。具体地,外围设备包括:射频电路2004、显示屏2005、摄像头组件2006、音频电路2007、定位组件2008和电源2009中的至少一种。
本领域技术人员可以理解,图20中示出的结构并不构成对终端2000的限定,可以包括比图示更多或更少的组件,或者组合某些组件,或者采用不同的组件布置。
图21是本申请实施例提供的一种服务器的结构示意图,该服务器2100可因配置或性能不同而产生比较大的差异,可以包括一个或一个以上处理器(Central Processing Units,CPU)2101和一个或一个以上的存储器2102,其中,存储器2102中存储有至少一条程序代码,至少一条程序代码由处理器2101加载并执行以实现上述各个方法实施例提供的方法。当然,该服务器还可以具有有线或无线网络接口、键盘以及输入输出接口等部件,以便进行输入输出,该服务器还可以包括其他用于实现设备功能的部件,在此不做赘述。
服务器2100可以用于执行上述音频数据处理方法中服务器所执行的步骤;或者,用于执行上述延时获取方法中服务器所执行的步骤。
本申请实施例还提供了一种计算机设备,该计算机设备包括处理器和存储器,该存储器中存储有至少一条程序代码,该至少一条程序代码由该处理器加载并执行以实现上述实施例所述的音频数据处理方法中所执行的操作;或者,以实现上述实施例所述的延时获取方法中所执行的操作。
另外,本申请实施例还提供了一种存储介质,所述存储介质用于存储计算机程序,所述计算机程序用于执行上述实施例提供的方法。
本申请实施例还提供了一种包括指令的计算机程序产品,当其在计算机上运行时,使得计算机执行上述实施例提供的方法。
本领域普通技术人员可以理解,实现上述实施例的全部或部分步骤可以通过硬件来完成,也可以通过程序指令相关的硬件完成,所述的程序可以存储于一种计算机可读存储介质中,上述提到的存储介质可以是只读存储器、磁盘或光盘等。
以上所述仅为本申请的可选实施例,并不用以限制本申请,凡在本申请的精神和原则之内,所作的任何修改、等同替换、改进等,均应包含在本申请的保护范围之内。

Claims (17)

  1. 一种音频数据处理方法,所述方法应用于服务器,所述服务器包括云应用程序、系统框架、中转程序和音频采集程序,所述方法包括:
    将所述云应用程序的第一音频数据输入至所述系统框架;
    通过所述系统框架对所述第一音频数据进行处理,得到第二音频数据,将所述第二音频数据发送至所述中转程序;
    通过所述中转程序,根据所述中转程序与所述音频采集程序之间的通信连接,将所述第二音频数据发送至所述音频采集程序,所述音频采集程序用于将所述第二音频数据发送至终端本地的应用程序。
  2. 根据权利要求1所述的方法,所述将所述第二音频数据发送至所述中转程序,包括:
    若所述中转程序与所述音频采集程序已建立通信连接,则将所述第二音频数据发送至所述中转程序;
    若所述中转程序还未与所述音频采集程序建立通信连接,则控制所述中转程序与所述音频采集程序建立通信连接,在所述中转程序与所述音频采集程序成功建立通信连接的情况下,将所述第二音频数据发送至所述中转程序。
  3. 根据权利要求1所述的方法,所述通过所述系统框架对所述第一音频数据进行处理,得到第二音频数据,包括:
    通过所述系统框架,对所述第一音频数据进行混音处理,得到第三音频数据;
    通过所述系统框架,按照音频参数对所述第三音频数据进行处理,得到所述第二音频数据。
  4. 根据权利要求3所述的方法,所述通过所述系统框架,按照音频参数对所述第三音频数据进行处理,得到所述第二音频数据,包括以下至少一项:
    所述音频参数包括目标采样率,通过所述系统框架,按照所述目标采样率对所述第三音频数据进行重采样处理,得到所述第二音频数据;
    所述音频参数包括目标通道数,通过所述系统框架,按照所述目标通道数对所述第三音频数据进行通道数转换处理,得到所述第二音频数据;
    所述音频参数包括目标采样深度,通过所述系统框架,按照所述目标采样深度对所述第三音频数据进行重采样处理,得到所述第二音频数据。
  5. 根据权利要求3所述的方法,所述系统框架包括处理线程,所述通过所述系统框架,对所述第一音频数据进行混音处理,得到第三音频数据,包括:
    通过所述处理线程对所述第一音频数据进行混音处理,得到所述第三音频数据;
    所述通过所述系统框架,按照音频参数对所述第三音频数据进行处理,得到所述第二音频数据,包括:
    通过所述处理线程,按照所述音频参数对所述第三音频数据进行处理,得到所述第二音频数据。
  6. 根据权利要求3所述的方法,所述中转程序为硬件抽象层,所述通过所述系统框架,按照音频参数对所述第三音频数据进行处理,得到所述第二音频数据之前,所述方法还包括:
    通过所述系统框架从所述硬件抽象层获取所述音频参数,所述硬件抽象层存储有所述音频参数。
  7. 根据权利要求1所述的方法,所述中转程序为硬件抽象层,所述将所述第二音频数据发送至所述中转程序,包括:
    通过所述系统框架调用所述硬件抽象层的写入接口,将所述第二音频数据写入所述硬件抽象层。
  8. 根据权利要求1所述的方法,所述中转程序为重采样程序,所述通过所述中转程序,根据所述中转程序与所述音频采集程序之间的通信连接,将所述第二音频数据发送至所述音频采集程序,包括:
    通过所述重采样程序,对所述第二音频数据进行重采样处理,得到处理后的第二音频数据;
    通过所述重采样程序,根据所述重采样程序与所述音频采集程序之间的通信连接,将所述处理后的第二音频数据发送至所述音频采集程序。
  9. 根据权利要求1所述的方法,所述中转程序为重采样程序,所述系统框架包括录制线程,所述将所述第二音频数据发送至所述中转程序之后,所述方法还包括:
    通过所述重采样程序,将所述第二音频数据发送至所述录制线程;
    通过所述录制线程对所述第二音频数据进行录制,得到第三音频数据;
    通过所述音频采集程序,调用音频录制接口从所述录制线程中读取所述第三音频数据。
  10. 根据权利要求9所述的方法,所述方法还包括:
    通过所述音频采集程序,丢弃所述第三音频数据,将所述第二音频数据发送至所述终端本地的应用程序。
  11. 一种延时获取方法,所述方法应用于服务器,所述服务器包括检测应用程序、系统框架、中转程序和音频采集程序,所述方法包括:
    将所述检测应用程序的第一检测音频数据输入至所述系统框架,记录所述第一检测音频数据的发送时间;
    通过所述系统框架对所述第一检测音频数据进行处理,得到第二检测音频数据,将所述第二检测音频数据发送至所述中转程序;
    通过所述中转程序,根据所述中转程序与所述音频采集程序之间的通信连接,将所述第二检测音频数据发送至所述音频采集程序,记录所述音频采集程序接收所述第二检测音频数据的第一接收时间,所述音频采集程序用于将所述第二检测音频数据发送至终端本地的应用程序;
    获取所述发送时间和所述第一接收时间之间的第一时间差,所述第一时间差表示检测音频数据从所述检测应用程序传输至所述音频采集程序的延时。
  12. 根据权利要求11所述的方法,所述将所述第二检测音频数据发送至所述中转程序之后,所述方法还包括:
    记录所述中转程序接收所述第二检测音频数据的第二接收时间;
    获取所述发送时间与所述第二接收时间之间的第二时间差,所述第二时间差表示检测音频数据从所述检测应用程序传输至所述中转程序的延时。
  13. 一种服务器,所述服务器包括应用运行模块、框架运行模块、中转模块和采集模块;
    所述应用运行模块,用于将云应用程序的第一音频数据输入至所述框架运行模块;
    所述框架运行模块,用于对所述第一音频数据进行处理,得到第二音频数据,将所述第二音频数据发送至所述中转模块;
    所述中转模块,用于根据所述中转模块与所述采集模块之间的通信连接,将所述第二音频数据发送至所述采集模块,所述采集模块用于将所述第二音频数据发送至终端本地的应用程序。
  14. 一种服务器,所述服务器包括应用运行模块、框架运行模块、中转模块、采集模块、记录模块和获取模块,
    所述应用运行模块,用于将检测应用程序的第一检测音频数据输入至所述框架运行模块;
    所述记录模块,用于记录所述第一检测音频数据的发送时间;
    所述框架运行模块,用于对所述第一检测音频数据进行处理,得到第二检测音频数据,将所述第二检测音频数据发送至所述中转模块;
    所述中转模块,用于根据所述中转模块与所述采集模块之间的通信连接,将所述第二检测音频数据发送至所述采集模块,所述采集模块用于将所述第二检测音频数据发送至终端本地的应用程序;
    所述记录模块,还用于记录所述采集模块接收所述第二检测音频数据的第一接收时间;
    所述获取模块,用于获取所述发送时间和所述第一接收时间的第一时间差,所述第一时间差表示检测音频数据从所述应用运行模块传输至所述采集模块的延时。
  15. 一种计算机可读存储介质,所述计算机可读存储介质中存储有计算机程序,所述计算机程序用于执行如权利要求1至10任一项所述的音频数据处理方法;或者,用于执行如权利要求11或12所述的延时获取方法。
  16. 一种服务器,所述服务器包括:
    处理器、通信接口、存储器和通信总线;
    其中,所述处理器、所述通信接口和所述存储器通过所述通信总线完成相互间的通信;所述通信接口为通信模块的接口;
    所述存储器,用于存储程序代码,并将所述程序代码传输给所述处理器;
    所述处理器,用于调用存储器中程序代码的指令执行如权利要求1至10任一项所述的音频数据处理方法;或者,执行如权利要求11或12所述的延时获取方法。
  17. 一种包括指令的计算机程序产品,当其在计算机上运行时,使得所述计算机执行如权利要求1至10任一项所述的音频数据处理方法;或者,执行如权利要求11或12所述的延时获取方法。
PCT/CN2021/097794 2020-07-23 2021-06-02 音频数据处理方法、服务器及存储介质 WO2022017007A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
EP21846422.0A EP4047471A4 (en) 2020-07-23 2021-06-02 AUDIO DATA PROCESSING METHOD, SERVER AND STORAGE MEDIA
KR1020227017498A KR20220080198A (ko) 2020-07-23 2021-06-02 오디오 데이터 프로세싱 방법, 서버, 및 저장 매체
JP2022548829A JP7476327B2 (ja) 2020-07-23 2021-06-02 オーディオデータ処理方法、遅延時間取得方法、サーバ、及びコンピュータプログラム
US17/737,886 US20220261217A1 (en) 2020-07-23 2022-05-05 Audio data processing method, server, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010716978.3 2020-07-23
CN202010716978.3A CN111596885B (zh) 2020-07-23 2020-07-23 音频数据处理方法、服务器及存储介质

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/737,886 Continuation US20220261217A1 (en) 2020-07-23 2022-05-05 Audio data processing method, server, and storage medium

Publications (1)

Publication Number Publication Date
WO2022017007A1 true WO2022017007A1 (zh) 2022-01-27

Family

ID=72186622

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/097794 WO2022017007A1 (zh) 2020-07-23 2021-06-02 音频数据处理方法、服务器及存储介质

Country Status (6)

Country Link
US (1) US20220261217A1 (zh)
EP (1) EP4047471A4 (zh)
JP (1) JP7476327B2 (zh)
KR (1) KR20220080198A (zh)
CN (1) CN111596885B (zh)
WO (1) WO2022017007A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114879930A (zh) * 2022-07-07 2022-08-09 北京麟卓信息科技有限公司 一种安卓兼容环境的音频输出优化方法
CN116132413A (zh) * 2023-01-13 2023-05-16 深圳市瑞云科技股份有限公司 一种基于WebRTC实时语音透传改进方法

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111596885B (zh) * 2020-07-23 2020-11-17 腾讯科技(深圳)有限公司 音频数据处理方法、服务器及存储介质
CN112206520B (zh) * 2020-10-21 2022-09-02 深圳市欢太科技有限公司 实时音频采集方法、系统、服务端、客户端及存储介质
CN114338621A (zh) * 2021-11-30 2022-04-12 北京金山云网络技术有限公司 一种云应用程序运行方法、系统及装置

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012086917A2 (ko) * 2010-12-24 2012-06-28 (주)케이티 클라우드 컴퓨팅 환경에서 게임 서비스 제공 방법, 클라우드 컴퓨팅 서버, 및 클라우드 컴퓨팅 시스템
CN105491021A (zh) * 2015-11-24 2016-04-13 华东师范大学 一种Android云应用服务器及Android云应用服务器系统
CN106961421A (zh) * 2017-02-17 2017-07-18 浙江大学 一种Android系统服务端、远程桌面音频重定向方法及系统
CN110034828A (zh) * 2018-01-12 2019-07-19 网宿科技股份有限公司 云应用的音频采集方法及服务器
CN110694267A (zh) * 2019-11-14 2020-01-17 珠海金山网络游戏科技有限公司 一种云游戏实现方法及装置
CN110841278A (zh) * 2019-11-14 2020-02-28 珠海金山网络游戏科技有限公司 一种云游戏实现方法及装置
CN111596885A (zh) * 2020-07-23 2020-08-28 腾讯科技(深圳)有限公司 音频数据处理方法、服务器及存储介质

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1113361A1 (en) * 2000-01-03 2001-07-04 Wimba.Com S.A. Process of communication between an applet and a local agent using a socket communication channel
US8446823B2 (en) * 2009-06-23 2013-05-21 Magor Communications Corporation Method of managing the flow of time-sensitive data over packet networks
CN103023872B (zh) * 2012-11-16 2016-01-06 杭州顺网科技股份有限公司 一种云游戏服务平台
CN104093046B (zh) * 2013-04-01 2019-02-15 天津米游科技有限公司 一种基于云游戏的视频插播系统及方法
CN104637488B (zh) 2013-11-07 2018-12-25 华为终端(东莞)有限公司 声音处理的方法和终端设备
CN105280212A (zh) * 2014-07-25 2016-01-27 中兴通讯股份有限公司 混音播放方法及装置
CN111324576B (zh) 2018-12-14 2023-08-08 深圳市优必选科技有限公司 一种录音数据保存的方法、装置、存储介质及终端设备
CN109947387B (zh) * 2019-03-28 2022-10-21 阿波罗智联(北京)科技有限公司 音频采集方法、音频播放方法、系统、设备及存储介质
CN111294438B (zh) * 2020-01-22 2021-06-01 华为技术有限公司 实现立体声输出的方法及终端

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JIANG PENGFEI: "The Design and Implementation of Multimedia Redirection Oriented for Android Mobile Desktop Cloud", MASTER THESIS, TIANJIN POLYTECHNIC UNIVERSITY, CN, no. 12, 15 January 2019 (2019-01-15), CN , XP055889780, ISSN: 1674-0246 *
See also references of EP4047471A4 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114879930A (zh) * 2022-07-07 2022-08-09 北京麟卓信息科技有限公司 一种安卓兼容环境的音频输出优化方法
CN114879930B (zh) * 2022-07-07 2022-09-06 北京麟卓信息科技有限公司 一种安卓兼容环境的音频输出优化方法
CN116132413A (zh) * 2023-01-13 2023-05-16 深圳市瑞云科技股份有限公司 一种基于WebRTC实时语音透传改进方法

Also Published As

Publication number Publication date
KR20220080198A (ko) 2022-06-14
EP4047471A4 (en) 2023-01-25
JP7476327B2 (ja) 2024-04-30
JP2023516905A (ja) 2023-04-21
EP4047471A1 (en) 2022-08-24
CN111596885A (zh) 2020-08-28
CN111596885B (zh) 2020-11-17
US20220261217A1 (en) 2022-08-18


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 21846422; Country of ref document: EP; Kind code of ref document: A1)
ENP Entry into the national phase (Ref document number: 20227017498; Country of ref document: KR; Kind code of ref document: A)
ENP Entry into the national phase (Ref document number: 2021846422; Country of ref document: EP; Effective date: 20220517)
ENP Entry into the national phase (Ref document number: 2022548829; Country of ref document: JP; Kind code of ref document: A)
NENP Non-entry into the national phase (Ref country code: DE)