CN109448676B - Audio processing method, device and storage medium


Info

Publication number
CN109448676B
CN109448676B (application CN201811334209.6A)
Authority
CN
China
Prior art keywords
audio
sampling rate
audio data
target
terminal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811334209.6A
Other languages
Chinese (zh)
Other versions
CN109448676A (en)
Inventor
夏佳权
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Kugou Computer Technology Co Ltd
Original Assignee
Guangzhou Kugou Computer Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Kugou Computer Technology Co Ltd filed Critical Guangzhou Kugou Computer Technology Co Ltd
Priority to CN201811334209.6A priority Critical patent/CN109448676B/en
Publication of CN109448676A publication Critical patent/CN109448676A/en
Application granted granted Critical
Publication of CN109448676B publication Critical patent/CN109448676B/en

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H 1/00 Details of electrophonic musical instruments
    • G10H 1/0008 Associated control or indicating means
    • G10H 1/0025 Automatic or semi-automatic music composition, e.g. producing random music, applying rules from music theory or modifying a musical piece
    • G10H 1/0033 Recording/reproducing or transmission of music for electrophonic musical instruments
    • G10H 1/0041 Recording/reproducing or transmission of music for electrophonic musical instruments in coded form
    • G10H 1/02 Means for controlling the tone frequencies, e.g. attack or decay; Means for producing special musical effects, e.g. vibratos or glissandos
    • G10H 1/36 Accompaniment arrangements
    • G10H 1/361 Recording/reproducing of accompaniment for use with an external source, e.g. karaoke systems
    • G10H 1/40 Rhythm
    • G10H 2210/00 Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H 2210/005 Musical accompaniment, i.e. complete instrumental rhythm synthesis added to a performed melody, e.g. as output by drum machines
    • G10H 2210/031 Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
    • G10H 2210/155 Musical effects
    • G10H 2210/195 Modulation effects, i.e. smooth non-discontinuous variations over a time interval, e.g. within a note, melody or musical transition, of any sound parameter, e.g. amplitude, pitch, spectral response, playback speed
    • G10H 2240/00 Data organisation or data communication aspects, specifically adapted for electrophonic musical tools or instruments
    • G10H 2240/011 Files or data streams containing coded musical information, e.g. for transmission
    • G10H 2240/046 File format, i.e. specific or non-standard musical file format used in or adapted for electrophonic musical instruments, e.g. in wavetables
    • G10H 2240/071 Wave, i.e. Waveform Audio File Format, coding, e.g. uncompressed PCM audio according to the RIFF bitstream format method

Abstract

The invention discloses an audio processing method, an audio processing device and a storage medium, and belongs to the field of information processing. The method is applied to an audio client and comprises the following steps: acquiring a target sampling rate of a terminal; starting an accompaniment audio playing function and an audio recording function; after the accompaniment audio playing function is started, if the preset sampling rate of the accompaniment audio to be played is detected to be inconsistent with the target sampling rate, resampling the accompaniment audio to obtain target accompaniment audio data with the target sampling rate, and controlling the terminal to play the target accompaniment audio data; after the audio recording function is started, controlling the terminal to record audio according to the target sampling rate; and synthesizing the played audio data and the recorded audio data to obtain synthesized audio. Because the audio client resamples the accompaniment audio, the delay of the terminal's playing system and the misalignment of the rhythms of the accompaniment audio and the recorded audio in the synthesized audio are both avoided.

Description

Audio processing method, device and storage medium
Technical Field
The present invention relates to the field of information processing, and in particular, to an audio processing method, an audio processing device, and a storage medium.
Background
Current audio clients can provide a karaoke function: a user can not only sing along with an accompaniment played as background sound, but also record the sung song through the karaoke function. For example, when a user sings, a terminal on which the audio client is installed can record the user's voice while playing the accompaniment audio, then synthesize the played accompaniment with the recorded voice, and use the synthesized audio as the singing recording result.
In the related art, when a terminal plays accompaniment audio through its playing system, it generally samples the accompaniment audio directly at the sampling rate of the playing system and plays the sampled data. If, during playback, the sampling rate of the playing system is detected to be inconsistent with the preset sampling rate of the accompaniment audio, the sampling rate of the playing system is adjusted to the preset sampling rate, the accompaniment audio is then sampled at the preset sampling rate, and the sampled data is played. Similarly, the recorded audio also has a preset sampling rate, and when the terminal records audio through its recording system, it likewise adjusts the sampling rate of the recording system if it is detected to be inconsistent with that preset sampling rate.
In the related art, the sampling rate is therefore checked and adjusted by the system hardware only after playing and recording have started. This introduces delay in both the playing system and the recording system, and the delay time is uncertain, so when the played accompaniment audio and the recorded audio are synthesized, their rhythms may be misaligned, which degrades the synthesis result.
Disclosure of Invention
The embodiments of the invention provide an audio processing method, an audio processing device and a storage medium, which can solve the problem in the related art that the rhythms of the accompaniment audio and the recorded audio in the synthesized audio are not aligned. The technical scheme is as follows:
in one aspect, there is provided an audio processing method, the method including:
acquiring a target sampling rate of a terminal;
starting an accompaniment audio playing function and an audio recording function;
after the accompaniment audio playing function is started, if the preset sampling rate of the accompaniment audio to be played is detected to be inconsistent with the target sampling rate, resampling the accompaniment audio according to the target sampling rate to obtain target accompaniment audio data, and controlling the terminal to play the target accompaniment audio data, wherein the sampling rate of the target accompaniment audio data is the target sampling rate;
After the audio recording function is started, controlling the terminal to record audio according to the target sampling rate;
and synthesizing the played audio data and the recorded audio data to obtain synthesized audio.
Optionally, the resampling processing is performed on the accompaniment audio according to the target sampling rate to obtain target accompaniment audio data, including:
decoding the accompaniment audio to obtain first pulse code modulation (PCM) audio data;
resampling the first PCM audio data according to the target sampling rate to obtain second PCM audio data, wherein the sampling rate of the second PCM audio data is the target sampling rate;
the second PCM audio data is determined as the target accompaniment audio data.
Optionally, the resampling processing of the first PCM audio data according to the target sampling rate includes:
if the target sampling rate is greater than the preset sampling rate, up-sampling the first PCM audio data according to the target sampling rate to obtain the second PCM audio data;
and if the target sampling rate is smaller than the preset sampling rate, downsampling the first PCM audio data according to the target sampling rate to obtain the second PCM audio data.
Optionally, the acquiring the target sampling rate of the terminal includes:
and calling a system interface of the terminal, and acquiring a target sampling rate of the terminal through the system interface.
Optionally, the synthesizing the played audio data and the recorded audio data to obtain the synthesized audio includes:
synthesizing the played audio data and the recorded audio data;
and encoding the synthesized audio data to obtain the synthesized audio.
Optionally, after the synthesizing process is performed on the played audio data and the recorded audio data to obtain the synthesized audio, the method further includes:
and playing the synthesized audio.
Optionally, the target sampling rate is a sampling rate supported by the terminal and located within a preset sampling rate range, or is a sampling rate with optimal performance among the sampling rates supported by the terminal.
In one aspect, there is provided an audio processing apparatus, the apparatus comprising:
the acquisition module is used for acquiring the target sampling rate of the terminal;
the starting module is used for starting an accompaniment audio playing function and an audio recording function;
the resampling module is used for resampling the accompaniment audio according to the target sampling rate to obtain target accompaniment audio data if the preset sampling rate of the accompaniment audio to be played is detected to be inconsistent with the target sampling rate after the accompaniment audio playing function is started, wherein the sampling rate of the target accompaniment audio data is the target sampling rate;
The first playing module is used for controlling the terminal to play the target accompaniment audio data;
the recording module is used for controlling the terminal to record the audio according to the target sampling rate after the audio recording function is started;
and the synthesis module is used for synthesizing the played audio data and the recorded audio data to obtain synthesized audio.
Optionally, the resampling module comprises:
the decoding unit is used for decoding the accompaniment audio to obtain first Pulse Code Modulation (PCM) audio data;
the resampling unit is used for resampling the first PCM audio data according to the target sampling rate to obtain second PCM audio data, and the sampling rate of the second PCM audio data is the target sampling rate;
and a determining unit for determining the second PCM audio data as the target accompaniment audio data.
Optionally, the resampling unit is specifically configured to:
if the target sampling rate is greater than the preset sampling rate, up-sampling the first PCM audio data according to the target sampling rate to obtain the second PCM audio data;
and if the target sampling rate is smaller than the preset sampling rate, downsampling the first PCM audio data according to the target sampling rate to obtain the second PCM audio data.
Optionally, the acquiring module is specifically configured to:
and calling a system interface of the terminal, and acquiring a target sampling rate of the terminal through the system interface.
Optionally, the synthesis module is specifically configured to:
synthesizing the played audio data and the recorded audio data;
and encoding the synthesized audio data to obtain the synthesized audio.
Optionally, the apparatus further comprises:
and the second playing module is used for controlling the terminal to play the synthesized audio.
Optionally, the target sampling rate is a sampling rate supported by the terminal and located within a preset sampling rate range, or is a sampling rate with optimal performance among the sampling rates supported by the terminal.
In one aspect, there is provided an audio processing apparatus, the apparatus comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to perform the steps of any of the methods described above.
In one aspect, a computer readable storage medium having instructions stored thereon which when executed by a processor perform the steps of any of the methods described above is provided.
The technical scheme provided by the embodiment of the invention has the beneficial effects that:
In the embodiments of the invention, the audio client obtains the target sampling rate of the terminal before the terminal plays the accompaniment audio. If the preset sampling rate of the accompaniment audio is detected to be inconsistent with the target sampling rate, the accompaniment audio is resampled according to the target sampling rate to obtain target accompaniment audio data consistent with the target sampling rate, and the terminal plays this accompaniment audio data. The terminal therefore does not need to adjust the sampling rate during playback, which avoids the delay introduced by the playing system. In addition, the recording system of the terminal records directly at the target sampling rate. Because playback and recording use the same sampling rate, no deviation arises between the accompaniment audio and the recorded audio in the synthesized audio, and the problem of their rhythms being misaligned is avoided.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for describing the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present invention, and a person skilled in the art may derive other drawings from them without inventive effort.
FIG. 1 is a schematic illustration of an implementation environment in which embodiments of the present invention are directed;
FIG. 2 is a flow chart of an audio processing method according to an embodiment of the present invention;
FIG. 3 is a flow chart of another audio processing method according to an embodiment of the present invention;
fig. 4 is a block diagram of an audio processing apparatus according to an embodiment of the present invention;
fig. 5 is a block diagram of a terminal 500 according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the embodiments of the present invention will be described in further detail with reference to the accompanying drawings.
Before explaining the embodiment of the present invention in detail, an application scenario of the embodiment of the present invention is described.
The audio processing method provided by the embodiments of the invention can be applied to a karaoke scenario, for example a scenario in which a user sings karaoke with an audio client installed on a terminal and records the sung song. When the sung song is recorded and synthesized, the rhythms of the accompaniment and the human voice in the synthesized audio often cannot be aligned. The audio processing method provided by the embodiments of the invention solves this problem by having the audio client resample the accompaniment audio before the accompaniment audio is played.
Next, an implementation environment according to an embodiment of the present invention will be described.
Fig. 1 is a schematic diagram of an implementation environment according to an embodiment of the present invention, and as shown in fig. 1, the implementation environment includes a terminal 10, where the terminal 10 may be a mobile phone, a tablet computer, or a computer. The terminal 10 includes a playing system 11 and a recording system 12, wherein the playing system 11 refers to audio playing hardware in the terminal 10, and the recording system 12 refers to audio recording hardware in the terminal 10.
In addition, the terminal 10 is further provided with an audio client 20, and the audio client 20 can be respectively connected with the playing system 11 and the recording system 12, and can respectively control the playing system 11 and the recording system 12 to play and record, and perform synthesis processing on the played audio and the recorded audio, so as to provide a singing function for a user. In addition, the audio client 20 may also provide the user with a plurality of audio files for the user to select a desired accompaniment audio therefrom to sing as background sound.
Fig. 2 is a flowchart of an audio processing method according to an embodiment of the present invention. The method is applied to an audio client, for example the audio client 20 shown in fig. 1. Referring to fig. 2, the method includes:
Step 201: and obtaining the target sampling rate of the terminal.
Step 202: and starting an accompaniment audio playing function and an audio recording function.
Step 203: after the accompaniment audio playing function is started, if the preset sampling rate of the accompaniment audio to be played is detected to be inconsistent with the target sampling rate, resampling processing is carried out on the accompaniment audio according to the target sampling rate, and target accompaniment audio data is obtained, wherein the sampling rate of the target accompaniment audio data is the target sampling rate.
Step 204: and the control terminal plays the target accompaniment audio data.
Step 205: after the audio recording function is started, the control terminal records the audio according to the target sampling rate.
Step 206: and synthesizing the played audio data and the recorded audio data to obtain synthesized audio.
In the embodiments of the invention, the audio client obtains the target sampling rate of the terminal before the terminal plays the accompaniment audio. If the preset sampling rate of the accompaniment audio is detected to be inconsistent with the target sampling rate, the accompaniment audio is resampled according to the target sampling rate to obtain target accompaniment audio data consistent with the target sampling rate, and the terminal plays this accompaniment audio data. The terminal therefore does not need to adjust the sampling rate during playback, which avoids the delay introduced by the playing system. In addition, the recording system of the terminal records directly at the target sampling rate. Because playback and recording use the same sampling rate, no deviation arises between the accompaniment audio and the recorded audio in the synthesized audio, and the problem of their rhythms being misaligned is avoided.
Optionally, resampling the accompaniment audio according to the target sampling rate to obtain target accompaniment audio data, including:
decoding the accompaniment audio to obtain first PCM audio data;
resampling the first PCM audio data according to a target sampling rate to obtain second PCM audio data, wherein the sampling rate of the second PCM audio data is the target sampling rate;
the second PCM audio data is determined as target accompaniment audio data.
Optionally, resampling the first PCM audio data according to the target sample rate comprises:
if the target sampling rate is greater than the preset sampling rate, up-sampling the first PCM audio data according to the target sampling rate to obtain second PCM audio data;
and if the target sampling rate is smaller than the preset sampling rate, downsampling the first PCM audio data according to the target sampling rate to obtain second PCM audio data.
Optionally, obtaining the target sampling rate of the terminal includes:
and calling a system interface of the terminal, and acquiring a target sampling rate of the terminal through the system interface.
Optionally, synthesizing the played audio data and the recorded audio data to obtain synthesized audio, including:
Synthesizing the played audio data and the recorded audio data;
and encoding the synthesized audio data to obtain the synthesized audio.
Optionally, after synthesizing the played audio data and the recorded audio data to obtain the synthesized audio, the method further includes:
and playing the synthesized audio.
Optionally, the target sampling rate is a sampling rate supported by the terminal and within a preset sampling rate range, or is a sampling rate with optimal performance among the sampling rates supported by the terminal.
All the above optional technical solutions may be combined in any manner to form optional embodiments of the present invention, which are not described in detail here.
Fig. 3 is a flowchart of another audio processing method according to an embodiment of the present invention, where the method is applied to an audio client, for example, may be applied to the audio client 20 shown in fig. 1. Referring to fig. 3, the method includes:
step 301: and obtaining the target sampling rate of the terminal.
The target sampling rate of the terminal is a sampling rate supported by the terminal; specifically, it may be a sampling rate that is supported by the terminal and lies within a preset sampling rate range, or the optimal sampling rate supported by the terminal.
The optimal sampling rate is the sampling rate with the best performance among all the sampling rates supported by the terminal, for example the maximum supported sampling rate. The preset sampling rate range may be a default of the terminal or may be preset by a technician, which is not limited in the embodiments of the present invention. Optionally, the preset sampling rate range is a sampling rate range that meets the playing requirement of the accompaniment audio to be played, and is set by the terminal according to that playing requirement.
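As a minimal illustration of this selection rule (the class and method names, and the candidate rate list, are assumptions made for illustration and are not taken from the patent), the target sampling rate could be chosen as the highest supported rate that falls inside the preset range:

```java
public class TargetRateSelector {
    /**
     * Picks the target sampling rate: the highest rate supported by the
     * terminal that lies inside the preset range [minHz, maxHz].
     * Returns -1 if no supported rate falls inside the range.
     */
    public static int selectTargetRate(int[] supportedRatesHz, int minHz, int maxHz) {
        int best = -1;
        for (int rate : supportedRatesHz) {
            if (rate >= minHz && rate <= maxHz && rate > best) {
                best = rate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int[] supported = {8000, 16000, 44100, 48000};   // hypothetical capability list
        System.out.println(selectTargetRate(supported, 44100, 48000)); // prints 48000
    }
}
```

With the hypothetical capability list above, a preset range of 44100 to 48000 yields 48000 as the target sampling rate, which also happens to be the maximum supported rate.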
Specifically, when the audio client receives a singing instruction based on accompaniment audio to be played, the audio client may acquire the target sampling rate of the terminal. The accompaniment audio is used as background sound while the user sings; it may be selected by the user from a plurality of audio files provided by the audio client, or from a plurality of audio files stored in the terminal, which is not limited in the embodiments of the present invention.
That is, in the embodiments of the present invention, the audio client first obtains the target sampling rate of the terminal before playing and recording, so that both playback and recording are performed at the target sampling rate of the terminal and no deviation arises between them.
Specifically, the audio client may invoke a system interface of the terminal, and obtain a target sampling rate of the terminal through the system interface. The system interface is used for acquiring the target sampling rate of the terminal, and after the system interface is called, the system interface can acquire the target sampling rate of the terminal and return the target sampling rate to the audio client.
Further, the system interface may also acquire a plurality of sampling rates supported by the terminal, then determine an optimal sampling rate from the plurality of sampling rates according to a preset rule, and return the optimal sampling rate to the audio client as a target sampling rate. For example, the system interface may determine a maximum sample rate from a plurality of sample rates, and return the maximum sample rate to the audio client as a target sample rate.
It should be noted that the system interface may be provided in advance by the terminal and called directly by the audio client, or it may be an interface that a technician sets up specifically for the audio client and integrates into the audio client or the terminal, so that the audio client can obtain the target sampling rate of the terminal while it runs.
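On Android, for example, one plausible form of such a system interface is the AudioManager property for the device's native output sample rate. The sketch below is an assumption about how an audio client might call it; the patent itself does not name a specific platform API.

```java
import android.content.Context;
import android.media.AudioManager;

public class SampleRateQuery {
    /**
     * Queries the terminal's native output sample rate through a system
     * interface (here, Android's AudioManager property). Falls back to
     * 44100 Hz if the property is unavailable.
     */
    public static int getTargetSampleRate(Context context) {
        AudioManager am = (AudioManager) context.getSystemService(Context.AUDIO_SERVICE);
        String value = am.getProperty(AudioManager.PROPERTY_OUTPUT_SAMPLE_RATE);
        return value != null ? Integer.parseInt(value) : 44100;
    }
}
```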
Step 302: and starting an accompaniment audio playing function and an audio recording function.
The audio client has an accompaniment playing function and an audio recording function, and after the target sampling rate of the terminal is obtained, the audio client can simultaneously start the accompaniment audio playing function and the audio recording function so as to simultaneously play accompaniment and record human voice.
The accompaniment playing function is used for processing accompaniment audio and controlling the terminal to play the accompaniment audio, and the audio recording function is used for controlling the terminal to record audio.
Step 303: after the accompaniment audio playing function is started, whether the preset sampling rate of the accompaniment audio to be played is consistent with the target sampling rate is detected.
After the accompaniment audio playing function is started, the audio client can acquire a preset sampling rate of the accompaniment audio to be played, and then detect whether the preset sampling rate of the accompaniment audio is consistent with the target sampling rate.
The sampling rate, also called the sampling frequency or sampling speed, defines the number of samples taken from a continuous signal per second to form a discrete signal; colloquially, it is the number of signal samples a computer collects per second.
The preset sampling rate of the accompaniment audio is the accompaniment audio's inherent sampling rate, generally preset by the audio client, and it indicates how many samples should be collected per second when the audio data of the accompaniment audio is sampled. For example, a preset sampling rate of 44k/s for a certain accompaniment audio indicates that 44k samples are collected from the audio data of the accompaniment audio every second. In general, the higher the sampling rate, the more information the sampled audio data carries and the better the audio quality. A lower sampling rate, however, generally does not affect how the accompaniment audio sounds to the user, because the interval between samples is still very small.
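A sketch of the consistency check in step 303, under the assumption that the accompaniment file's preset sampling rate is read from its container metadata with Android's MediaExtractor (the patent does not specify how the preset sampling rate is obtained):

```java
import android.media.MediaExtractor;
import android.media.MediaFormat;
import java.io.IOException;

public class RateCheck {
    /** Returns true if the accompaniment's preset sampling rate differs from the target rate. */
    public static boolean needsResampling(String accompanimentPath, int targetRateHz)
            throws IOException {
        MediaExtractor extractor = new MediaExtractor();
        extractor.setDataSource(accompanimentPath);
        try {
            for (int i = 0; i < extractor.getTrackCount(); i++) {
                MediaFormat format = extractor.getTrackFormat(i);
                String mime = format.getString(MediaFormat.KEY_MIME);
                if (mime != null && mime.startsWith("audio/")) {
                    int presetRateHz = format.getInteger(MediaFormat.KEY_SAMPLE_RATE);
                    return presetRateHz != targetRateHz;
                }
            }
            return false; // no audio track found, nothing to resample
        } finally {
            extractor.release();
        }
    }
}
```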
Step 304: if the preset sampling rate of the accompaniment audio is detected to be inconsistent with the target sampling rate, resampling the accompaniment audio according to the target sampling rate to obtain target accompaniment audio data, wherein the sampling rate of the target accompaniment audio data is the target sampling rate.
That is, if the preset sampling rate of the accompaniment audio does not coincide with the target sampling rate, resampling processing is performed on the accompaniment audio to obtain target accompaniment audio data which coincides with the target sampling rate, i.e., resampling the accompaniment audio to the target sampling rate.
Resampling refers to reconstructing a sampled signal at a new sampling rate from its existing samples. Resampling the accompaniment audio therefore changes its sampling rate, adjusting it from the original preset sampling rate to the target sampling rate of the terminal. In this way, the terminal can directly play the target accompaniment audio data obtained by resampling without adjusting the sampling rate of the playing system, which avoids the delay of the playing system.
Specifically, the operation of resampling the accompaniment audio according to the target sampling rate to obtain the target accompaniment audio data includes: decoding the accompaniment audio to obtain first PCM audio data; resampling the first PCM audio data according to the target sampling rate to obtain second PCM audio data, wherein the sampling rate of the second PCM audio data is the target sampling rate; and determining the second PCM audio data as the target accompaniment audio data.
Because the playing system of the terminal plays PCM audio data directly, the accompaniment audio needs to be decoded to obtain the first PCM audio data. After the first PCM audio data is obtained, it can be resampled to obtain second PCM audio data whose sampling rate is the target sampling rate.
Specifically, the operation of resampling the first PCM audio data to obtain the second PCM audio data according to the target sampling rate includes: if the target sampling rate is greater than the preset sampling rate, up-sampling the first PCM audio data according to the target sampling rate to obtain second PCM audio data; and if the target sampling rate is smaller than the preset sampling rate, downsampling the first PCM audio data according to the target sampling rate to obtain second PCM audio data.
Resampling is generally divided into upsampling and downsampling, and the distinction depends on comparing the new sampling rate with the original sampling rate: if the original sampling rate is lower than the new one, upsampling is required; if it is higher, downsampling is required.
Specifically, if the target sampling rate is higher than the original sampling rate of the first PCM audio data, the first PCM audio data needs to be upsampled so that its sampling rate increases. The essence of upsampling is interpolation, i.e. inserting additional sample points into the original data so that its resolution increases. Specific upsampling methods include nearest neighbor interpolation, bilinear interpolation, cubic convolution interpolation and the like. For example, assuming that the preset sampling rate of the first PCM audio data is 44k/s and the target sampling rate is 48k/s, 44k sample points may be collected per second during resampling and 4k sample points may then be inserted, where the inserted sample points may be chosen from the collected sample points or may be preset data carrying no information.
Specifically, if the target sampling rate is lower than the original sampling rate of the first PCM audio data, the first PCM audio data needs to be downsampled so that its sampling rate decreases. The essence of downsampling is to delete some of the sample points from the original data so that its resolution decreases. For example, assuming that the preset sampling rate of the first PCM audio data is 48k/s and the target sampling rate is 44k/s, 48k sample points may be collected per second during resampling and 4k sample points may be deleted from them.
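The patent does not prescribe a particular resampling algorithm; the sketch below uses simple linear interpolation on 16-bit mono PCM as one possible implementation, and the same loop handles both upsampling and downsampling by evaluating the signal at the new sample positions. Class and method names are illustrative.

```java
public class PcmResampler {
    /**
     * Resamples 16-bit mono PCM from srcRateHz to dstRateHz using linear
     * interpolation between neighbouring input samples. Works for both
     * upsampling (dstRateHz > srcRateHz) and downsampling (dstRateHz < srcRateHz).
     */
    public static short[] resample(short[] input, int srcRateHz, int dstRateHz) {
        if (srcRateHz == dstRateHz || input.length == 0) {
            return input.clone();
        }
        int outLength = (int) ((long) input.length * dstRateHz / srcRateHz);
        short[] output = new short[outLength];
        double step = (double) srcRateHz / dstRateHz;   // input samples per output sample
        for (int i = 0; i < outLength; i++) {
            double srcPos = i * step;                   // position in the input signal
            int idx = (int) srcPos;
            double frac = srcPos - idx;
            int next = Math.min(idx + 1, input.length - 1);
            output[i] = (short) Math.round(input[idx] * (1.0 - frac) + input[next] * frac);
        }
        return output;
    }
}
```

In a production resampler, downsampling would normally be preceded by a low-pass filter to avoid aliasing; that step is omitted here for brevity.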
Step 305: if the preset sampling rate of the accompaniment audio is detected to be consistent with the target sampling rate, the audio data of the accompaniment audio is determined to be the target accompaniment audio data.
That is, if the preset sampling rate of the accompaniment audio is consistent with the target sampling rate, the sampling rate of the accompaniment audio is already the target sampling rate; in this case the accompaniment audio does not need to be resampled, and its audio data is directly determined as the target accompaniment audio data.
Specifically, the operation of determining the audio data of the accompaniment audio as target accompaniment audio data includes: decoding the accompaniment audio to obtain first PCM audio data, and determining the first PCM audio data as target accompaniment audio data.
Step 306: and the control terminal plays the target accompaniment audio data.
After the target accompaniment audio data with the target sampling rate is obtained, the target accompaniment audio data can be sent to the terminal for playing. Specifically, the target accompaniment audio data may be sent to a playing system of the terminal for playing, where the playing system of the terminal refers to audio hardware of the terminal.
Because the sampling rate of the playing system of the terminal is also the target sampling rate and is the same as the target accompaniment audio data, the playing system of the terminal can directly play the target accompaniment audio data without adjusting the sampling rate, thereby avoiding the delay of the playing system of the terminal.
Step 307: after the audio recording function is started, the control terminal records the audio according to the target sampling rate.
Specifically, after the audio recording function is started, the audio client may set the sampling rate of the recording system of the terminal to a target sampling rate, so that the recording system of the terminal directly records audio according to the target sampling rate when recording.
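A sketch of steps 306 and 307 on Android, assuming the playing system is driven through AudioTrack and the recording system through AudioRecord (an assumption; the patent speaks only of the terminal's playing and recording hardware). Both are opened at the same target sampling rate, so neither side has to renegotiate the rate after playback or recording starts:

```java
import android.media.AudioFormat;
import android.media.AudioManager;
import android.media.AudioRecord;
import android.media.AudioTrack;
import android.media.MediaRecorder;

public class KaraokeAudioIo {
    /** Builds a playback sink for the resampled accompaniment at the target rate. */
    public static AudioTrack createPlayer(int targetRateHz) {
        int bufSize = AudioTrack.getMinBufferSize(targetRateHz,
                AudioFormat.CHANNEL_OUT_STEREO, AudioFormat.ENCODING_PCM_16BIT);
        return new AudioTrack(AudioManager.STREAM_MUSIC, targetRateHz,
                AudioFormat.CHANNEL_OUT_STEREO, AudioFormat.ENCODING_PCM_16BIT,
                bufSize, AudioTrack.MODE_STREAM);
    }

    /** Builds a recorder that captures the voice at the same target rate. */
    public static AudioRecord createRecorder(int targetRateHz) {
        int bufSize = AudioRecord.getMinBufferSize(targetRateHz,
                AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT);
        return new AudioRecord(MediaRecorder.AudioSource.MIC, targetRateHz,
                AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT, bufSize);
    }
}
```

Opening both ends with the same rate is exactly the point of the method: neither the playing system nor the recording system needs a mid-session sampling rate adjustment, which is the source of the delay described in the background section.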
Step 308: and synthesizing the played audio data and the recorded audio data to obtain synthesized audio.
That is, the played audio track and the recorded audio track may be combined into a single audio track, the combined audio track including both the played audio data and the recorded audio data. Specifically, the played audio data and the recorded audio data may be synthesized, and then the synthesized audio data may be encoded to obtain the synthesized audio.
Because the accompaniment audio has been resampled, the delay of the terminal's playing system is avoided, and the sampling rates of the playing system and the recording system of the terminal are consistent. As a result, the accompaniment audio and the recorded audio in the synthesized audio data do not deviate from each other, their rhythms stay consistent, and the user's karaoke experience is improved.
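A minimal sketch of the synthesis in step 308, assuming both tracks are 16-bit PCM at the same target sampling rate (class and method names are illustrative; the patent does not specify the mixing formula):

```java
public class PcmMixer {
    /**
     * Mixes two 16-bit PCM tracks (already at the same sampling rate) sample
     * by sample, clamping the sum to the 16-bit range to avoid overflow.
     * The shorter track is treated as silence once it runs out.
     */
    public static short[] mix(short[] accompaniment, short[] voice) {
        int length = Math.max(accompaniment.length, voice.length);
        short[] mixed = new short[length];
        for (int i = 0; i < length; i++) {
            int a = i < accompaniment.length ? accompaniment[i] : 0;
            int v = i < voice.length ? voice[i] : 0;
            int sum = a + v;
            if (sum > Short.MAX_VALUE) sum = Short.MAX_VALUE;
            if (sum < Short.MIN_VALUE) sum = Short.MIN_VALUE;
            mixed[i] = (short) sum;
        }
        return mixed;
    }
}
```

The mixed PCM data would then be encoded, for example into a compressed format, to produce the synthesized audio file described above.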
Step 309: and the control terminal plays the synthesized audio.
Specifically, after the synthesized audio is obtained, the terminal may be controlled to play it directly, or to play it when a play instruction for the synthesized audio is received; of course, the synthesized audio may also be stored.
Specifically, when the terminal plays the synthesized audio, the synthesized audio can be played through the playing system of the terminal, i.e. its audio hardware. In addition, because the sampling rates of the accompaniment audio and the recorded audio are both the target sampling rate, the sampling rate of the synthesized audio obtained by the synthesis processing is also the target sampling rate, so the terminal can play the synthesized audio directly without adjusting the sampling rate, which improves the playing efficiency of the synthesized audio.
In the embodiments of the invention, the audio client obtains the target sampling rate of the terminal before the terminal plays the accompaniment audio. If the preset sampling rate of the accompaniment audio is detected to be inconsistent with the target sampling rate, the accompaniment audio is resampled according to the target sampling rate to obtain target accompaniment audio data consistent with the target sampling rate, and the terminal plays this accompaniment audio data. The terminal therefore does not need to adjust the sampling rate during playback, which avoids the delay introduced by the playing system. In addition, the recording system of the terminal records directly at the target sampling rate. Because playback and recording use the same sampling rate, no deviation arises between the accompaniment audio and the recorded audio in the synthesized audio, and the problem of their rhythms being misaligned is avoided.
Fig. 4 is a block diagram of an audio processing apparatus according to an embodiment of the present invention, in which an audio client is integrated. Referring to fig. 4, the apparatus includes: an acquisition module 401, a starting module 402, a resampling module 403, a first playing module 404, a recording module 405 and a synthesizing module 406.
An obtaining module 401, configured to obtain a target sampling rate of a terminal;
A starting module 402, configured to start an accompaniment audio playing function and an audio recording function;
the resampling module 403 is configured to resample the accompaniment audio according to the target sampling rate to obtain target accompaniment audio data if the preset sampling rate of the accompaniment audio to be played is detected to be inconsistent with the target sampling rate after the accompaniment audio playing function is started, where the sampling rate of the target accompaniment audio data is the target sampling rate;
a first playing module 404, configured to control the terminal to play the target accompaniment audio data;
a recording module 405, configured to control the terminal to record audio according to the target sampling rate after the audio recording function is started;
and the synthesizing module 406 is configured to synthesize the played audio data and the recorded audio data to obtain synthesized audio.
In the embodiments of the invention, the audio client obtains the target sampling rate of the terminal before the terminal plays the accompaniment audio. If the preset sampling rate of the accompaniment audio is detected to be inconsistent with the target sampling rate, the accompaniment audio is resampled according to the target sampling rate to obtain target accompaniment audio data consistent with the target sampling rate, and the terminal plays this accompaniment audio data. The terminal therefore does not need to adjust the sampling rate during playback, which avoids the delay introduced by the playing system. In addition, the recording system of the terminal records directly at the target sampling rate. Because playback and recording use the same sampling rate, no deviation arises between the accompaniment audio and the recorded audio in the synthesized audio, and the problem of their rhythms being misaligned is avoided.
Optionally, the resampling module 403 includes:
the decoding unit is used for decoding the accompaniment audio to obtain first Pulse Code Modulation (PCM) audio data;
the resampling unit is used for resampling the first PCM audio data according to the target sampling rate to obtain second PCM audio data, and the sampling rate of the second PCM audio data is the target sampling rate;
and a determining unit for determining the second PCM audio data as the target accompaniment audio data.
Optionally, the resampling unit is specifically configured to:
if the target sampling rate is greater than the preset sampling rate, up-sampling the first PCM audio data according to the target sampling rate to obtain the second PCM audio data;
and if the target sampling rate is smaller than the preset sampling rate, downsampling the first PCM audio data according to the target sampling rate to obtain the second PCM audio data.
Optionally, the obtaining module 401 is specifically configured to:
and calling a system interface of the terminal, and acquiring a target sampling rate of the terminal through the system interface.
Optionally, the synthesis module 406 is specifically configured to:
synthesizing the played audio data and the recorded audio data;
And encoding the synthesized audio data to obtain the synthesized audio.
Optionally, the apparatus further comprises:
and the second playing module is used for controlling the terminal to play the synthesized audio.
Optionally, the target sampling rate is a sampling rate supported by the terminal and within a preset sampling rate range, or is a sampling rate with optimal performance among the sampling rates supported by the terminal.
It should be noted that: in the audio processing device provided in the above embodiment, only the division of the above functional modules is used for illustration, and in practical application, the above functional allocation may be performed by different functional modules according to needs, that is, the internal structure of the device is divided into different functional modules, so as to complete all or part of the functions described above. In addition, the audio processing apparatus and the audio processing method provided in the foregoing embodiments belong to the same concept, and specific implementation processes thereof are detailed in the method embodiments and are not described herein again.
Fig. 5 is a block diagram of a terminal 500 according to an embodiment of the present invention. The terminal 500 may be a smart phone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a notebook computer, or a desktop computer. The terminal 500 may also be called user equipment, a portable terminal, a laptop terminal, a desktop terminal, or other names.
In general, the terminal 500 includes: a processor 501 and a memory 502.
Processor 501 may include one or more processing cores, for example a 4-core processor or an 8-core processor. The processor 501 may be implemented in at least one hardware form of a DSP (Digital Signal Processor), an FPGA (Field-Programmable Gate Array), or a PLA (Programmable Logic Array). The processor 501 may also include a main processor and a coprocessor: the main processor is a processor for processing data in the awake state, also called a CPU (Central Processing Unit), and the coprocessor is a low-power processor for processing data in the standby state. In some embodiments, the processor 501 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be shown on the display screen. In some embodiments, the processor 501 may also include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
Memory 502 may include one or more computer-readable storage media, which may be non-transitory. Memory 502 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 502 is used to store at least one instruction for execution by processor 501 to implement the audio processing methods provided by the method embodiments herein.
In some embodiments, the terminal 500 may further optionally include: a peripheral interface 503 and at least one peripheral. The processor 501, memory 502, and peripheral interface 503 may be connected by buses or signal lines. The individual peripheral devices may be connected to the peripheral device interface 503 by buses, signal lines or circuit boards. Specifically, the peripheral device includes: at least one of radio frequency circuitry 504, touch display 505, camera 506, audio circuitry 507, positioning component 508, and power supply 509.
Peripheral interface 503 may be used to connect at least one Input/Output (I/O) related peripheral to processor 501 and memory 502. In some embodiments, processor 501, memory 502, and peripheral interface 503 are integrated on the same chip or circuit board; in some other embodiments, either or both of the processor 501, memory 502, and peripheral interface 503 may be implemented on separate chips or circuit boards, which is not limited in this embodiment.
The Radio Frequency circuit 504 is configured to receive and transmit RF (Radio Frequency) signals, also known as electromagnetic signals. The radio frequency circuitry 504 communicates with a communication network and other communication devices via electromagnetic signals. The radio frequency circuit 504 converts an electrical signal into an electromagnetic signal for transmission, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 504 includes: antenna systems, RF transceivers, one or more amplifiers, tuners, oscillators, digital signal processors, codec chipsets, subscriber identity module cards, and so forth. The radio frequency circuitry 504 may communicate with other terminals via at least one wireless communication protocol. The wireless communication protocol includes, but is not limited to: metropolitan area networks, various generations of mobile communication networks (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity ) networks. In some embodiments, the radio frequency circuitry 504 may also include NFC (Near Field Communication ) related circuitry, which is not limited in this application.
The display 505 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display 505 is a touch display, the display 505 also has the ability to collect touch signals at or above the surface of the display 505. The touch signal may be input as a control signal to the processor 501 for processing. At this time, the display 505 may also be used to provide virtual buttons and/or virtual keyboards, also referred to as soft buttons and/or soft keyboards. In some embodiments, the display 505 may be one, providing a front panel of the terminal 500; in other embodiments, the display 505 may be at least two, respectively disposed on different surfaces of the terminal 500 or in a folded design; in still other embodiments, the display 505 may be a flexible display disposed on a curved surface or a folded surface of the terminal 500. Even more, the display 505 may be arranged in a non-rectangular irregular pattern, i.e., a shaped screen. The display 505 may be made of LCD (Liquid Crystal Display ), OLED (Organic Light-Emitting Diode) or other materials.
The camera assembly 506 is used to capture images or video. Optionally, the camera assembly 506 includes a front camera and a rear camera. Typically, the front camera is disposed on the front panel of the terminal and the rear camera is disposed on the rear surface of the terminal. In some embodiments, there are at least two rear cameras, each being one of a main camera, a depth camera, a wide-angle camera and a telephoto camera, so that the main camera and the depth camera can be fused to realize a background blurring function, or the main camera and the wide-angle camera can be fused to realize panoramic and Virtual Reality (VR) shooting or other fused shooting functions. In some embodiments, camera assembly 506 may also include a flash. The flash may be a single-color-temperature flash or a dual-color-temperature flash; a dual-color-temperature flash is a combination of a warm-light flash and a cold-light flash, and can be used for light compensation under different color temperatures.
The audio circuitry 507 may include a microphone and a speaker. The microphone is used for collecting sound waves of users and environments, converting the sound waves into electric signals, and inputting the electric signals to the processor 501 for processing, or inputting the electric signals to the radio frequency circuit 504 for voice communication. For the purpose of stereo acquisition or noise reduction, a plurality of microphones may be respectively disposed at different portions of the terminal 500. The microphone may also be an array microphone or an omni-directional pickup microphone. The speaker is used to convert electrical signals from the processor 501 or the radio frequency circuit 504 into sound waves. The speaker may be a conventional thin film speaker or a piezoelectric ceramic speaker. When the speaker is a piezoelectric ceramic speaker, not only the electric signal can be converted into a sound wave audible to humans, but also the electric signal can be converted into a sound wave inaudible to humans for ranging and other purposes. In some embodiments, audio circuitry 507 may also include a headphone jack.
The positioning component 508 is used to locate the current geographic location of the terminal 500 to enable navigation or LBS (Location Based Service). The positioning component 508 may be a positioning component based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, the GLONASS system of Russia, or the Galileo system of the European Union.
A power supply 509 is used to power the various components in the terminal 500. The power supply 509 may be an alternating current, a direct current, a disposable battery, or a rechargeable battery. When the power supply 509 comprises a rechargeable battery, the rechargeable battery may support wired or wireless charging. The rechargeable battery may also be used to support fast charge technology.
In some embodiments, the terminal 500 further includes one or more sensors 510. The one or more sensors 510 include, but are not limited to: an acceleration sensor 511, a gyro sensor 512, a pressure sensor 513, a fingerprint sensor 514, an optical sensor 515, and a proximity sensor 516.
The acceleration sensor 511 can detect the magnitudes of accelerations on three coordinate axes of the coordinate system established with the terminal 500. For example, the acceleration sensor 511 may be used to detect components of gravitational acceleration on three coordinate axes. The processor 501 may control the touch display 505 to display a user interface in a landscape view or a portrait view according to a gravitational acceleration signal acquired by the acceleration sensor 511. The acceleration sensor 511 may also be used for acquisition of motion data of a game or a user.
The gyro sensor 512 may detect a body direction and a rotation angle of the terminal 500, and the gyro sensor 512 may collect a 3D motion of the user to the terminal 500 in cooperation with the acceleration sensor 511. The processor 501 may implement the following functions based on the data collected by the gyro sensor 512: motion sensing (e.g., changing UI according to a tilting operation by a user), image stabilization at shooting, game control, and inertial navigation.
The pressure sensor 513 may be disposed at a side frame of the terminal 500 and/or at a lower layer of the touch display 505. When the pressure sensor 513 is disposed at a side frame of the terminal 500, a grip signal of the user to the terminal 500 may be detected, and the processor 501 performs left-right hand recognition or quick operation according to the grip signal collected by the pressure sensor 513. When the pressure sensor 513 is disposed at the lower layer of the touch display screen 505, the processor 501 controls the operability control on the UI interface according to the pressure operation of the user on the touch display screen 505. The operability controls include at least one of a button control, a scroll bar control, an icon control, and a menu control.
The fingerprint sensor 514 is used for collecting the fingerprint of the user, and the processor 501 identifies the identity of the user according to the fingerprint collected by the fingerprint sensor 514, or the fingerprint sensor 514 identifies the identity of the user according to the collected fingerprint. Upon recognizing that the user's identity is a trusted identity, the user is authorized by the processor 501 to perform relevant sensitive operations including unlocking the screen, viewing encrypted information, downloading software, paying for and changing settings, etc. The fingerprint sensor 514 may be provided on the front, back or side of the terminal 500. When a physical key or a vendor Logo is provided on the terminal 500, the fingerprint sensor 514 may be integrated with the physical key or the vendor Logo.
The optical sensor 515 is used to collect the ambient light intensity. In one embodiment, the processor 501 may control the display brightness of the touch screen 505 based on the ambient light intensity collected by the optical sensor 515. Specifically, when the intensity of the ambient light is high, the display brightness of the touch display screen 505 is turned up; when the ambient light intensity is low, the display brightness of the touch display screen 505 is turned down. In another embodiment, the processor 501 may also dynamically adjust the shooting parameters of the camera assembly 506 based on the ambient light intensity collected by the optical sensor 515.
The proximity sensor 516, also referred to as a distance sensor, is typically provided on the front panel of the terminal 500. The proximity sensor 516 is used to collect the distance between the user and the front of the terminal 500. In one embodiment, when the proximity sensor 516 detects that the distance between the user and the front of the terminal 500 gradually decreases, the processor 501 controls the touch display 505 to switch from the screen-on state to the screen-off state; when the proximity sensor 516 detects that the distance between the user and the front of the terminal 500 gradually increases, the processor 501 controls the touch display 505 to switch from the screen-off state to the screen-on state.
Those skilled in the art will appreciate that the structure shown in FIG. 5 is not limiting, and that the terminal may include more or fewer components than shown, combine certain components, or employ a different arrangement of components.
In this embodiment, the terminal further includes one or more programs that are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for performing the audio processing method provided by the embodiments of the present invention described above. The one or more programs include an audio client.
In another embodiment, there is also provided a computer readable storage medium having instructions stored thereon, which when executed by a processor, implement the steps of the above-described audio processing method.
In another embodiment, a computer program product is also provided which, when run on a computer, causes the computer to perform the above-described audio processing method.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the storage medium may be a read-only memory, a magnetic disk, an optical disk, or the like.
The foregoing description covers merely preferred embodiments of the present invention and is not intended to limit the present invention; any modifications, equivalent replacements, and improvements made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (8)

1. An audio processing method, characterized by being applied to an audio client, the method comprising:
when a singing instruction is received based on accompaniment audio to be played, acquiring a target sampling rate of a terminal, wherein the target sampling rate is a sampling rate that is supported by the terminal and falls within a preset sampling rate range, or is an optimal sampling rate among the sampling rates supported by the terminal; the terminal comprises the audio client; and the preset sampling rate range is a sampling rate range that meets the playing requirement of the accompaniment audio to be played and is set by the terminal according to that playing requirement;
starting an accompaniment audio playing function and an audio recording function;
after the accompaniment audio playing function is started, if the preset sampling rate of the accompaniment audio to be played is detected to be inconsistent with the target sampling rate, resampling the accompaniment audio according to the target sampling rate to obtain target accompaniment audio data, and controlling the terminal to play the target accompaniment audio data, wherein the sampling rate of the target accompaniment audio data is the target sampling rate, and the preset sampling rate is the inherent sampling rate of the accompaniment audio preset by the audio client;
after the audio recording function is started, controlling the terminal to record audio according to the target sampling rate;
and synthesizing the played audio data and the recorded audio data to obtain synthesized audio.
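Purely for illustration, a minimal sketch of how the flow recited in claim 1 might map onto an Android audio client. The KaraokeEngine class is hypothetical, Resampler is the linear-interpolation helper sketched after claim 3, the RECORD_AUDIO permission is assumed to be granted, and a real client would run playback and recording on separate threads rather than sequentially as shown.

```java
import android.media.AudioFormat;
import android.media.AudioManager;
import android.media.AudioRecord;
import android.media.AudioTrack;
import android.media.MediaRecorder;

// Illustrative sketch only: play the accompaniment and record the voice at the same
// target sampling rate so the two streams stay aligned. Requires the RECORD_AUDIO
// permission; a real client runs playback and recording on separate threads.
public class KaraokeEngine {

    public void start(short[] accompanimentPcm, int presetSampleRate) {
        // Target sampling rate: the rate natively supported by the terminal's output path.
        int targetRate = AudioTrack.getNativeOutputSampleRate(AudioManager.STREAM_MUSIC);

        // Resample the accompaniment only when its preset rate differs from the target rate.
        short[] playPcm = (presetSampleRate == targetRate)
                ? accompanimentPcm
                : Resampler.resample(accompanimentPcm, presetSampleRate, targetRate);

        // Playback and recording are both configured for 16-bit mono PCM at the target rate.
        int outBuf = AudioTrack.getMinBufferSize(targetRate,
                AudioFormat.CHANNEL_OUT_MONO, AudioFormat.ENCODING_PCM_16BIT);
        AudioTrack track = new AudioTrack(AudioManager.STREAM_MUSIC, targetRate,
                AudioFormat.CHANNEL_OUT_MONO, AudioFormat.ENCODING_PCM_16BIT,
                outBuf, AudioTrack.MODE_STREAM);

        int inBuf = AudioRecord.getMinBufferSize(targetRate,
                AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT);
        AudioRecord recorder = new AudioRecord(MediaRecorder.AudioSource.MIC, targetRate,
                AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT, inBuf);

        track.play();
        recorder.startRecording();

        track.write(playPcm, 0, playPcm.length);      // play the target accompaniment data
        short[] recorded = new short[playPcm.length];
        recorder.read(recorded, 0, recorded.length);  // recorded vocal at the same target rate

        recorder.stop();
        recorder.release();
        track.stop();
        track.release();

        // The played and recorded buffers can now be mixed and encoded (see claims 2-5).
    }
}
```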
2. The method of claim 1, wherein resampling the accompaniment audio according to the target sampling rate to obtain target accompaniment audio data comprises:
decoding the accompaniment audio to obtain first Pulse Code Modulation (PCM) audio data;
resampling the first PCM audio data according to the target sampling rate to obtain second PCM audio data, wherein the sampling rate of the second PCM audio data is the target sampling rate;
the second PCM audio data is determined as the target accompaniment audio data.
3. The method of claim 2, wherein resampling the first PCM audio data according to the target sampling rate comprises:
if the target sampling rate is greater than the preset sampling rate, up-sampling the first PCM audio data according to the target sampling rate to obtain the second PCM audio data;
and if the target sampling rate is less than the preset sampling rate, down-sampling the first PCM audio data according to the target sampling rate to obtain the second PCM audio data.
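As a non-normative illustration of the up-sampling and down-sampling recited in claims 2 and 3, assuming the accompaniment has already been decoded to 16-bit mono PCM: the Resampler below uses simple linear interpolation, which is only one possible technique; a production resampler would normally add band-limiting (e.g. a windowed-sinc filter) to avoid aliasing.

```java
// Minimal linear-interpolation resampler for 16-bit mono PCM (illustrative only).
// If dstRate > srcRate the output is up-sampled; if dstRate < srcRate it is
// down-sampled. A production resampler would add low-pass filtering to avoid aliasing.
public final class Resampler {

    public static short[] resample(short[] src, int srcRate, int dstRate) {
        if (srcRate == dstRate || src.length == 0) {
            return src; // nothing to do when the rates already match
        }
        int dstLength = (int) ((long) src.length * dstRate / srcRate);
        short[] dst = new short[dstLength];
        double step = (double) srcRate / dstRate; // source samples per output sample
        for (int i = 0; i < dstLength; i++) {
            double pos = i * step;
            int idx = (int) pos;
            double frac = pos - idx;
            short a = src[Math.min(idx, src.length - 1)];
            short b = src[Math.min(idx + 1, src.length - 1)];
            dst[i] = (short) Math.round(a + (b - a) * frac); // linear interpolation
        }
        return dst;
    }

    private Resampler() { }
}
```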
4. The method of claim 1, wherein the obtaining the target sampling rate of the terminal comprises:
and calling a system interface of the terminal and acquiring the target sampling rate of the terminal through the system interface.
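As an example of the kind of system interface claim 4 refers to, assuming an Android terminal (an assumption, not a statement of the claimed implementation): the platform reports its native output sampling rate through AudioManager.getProperty and AudioTrack.getNativeOutputSampleRate.

```java
import android.content.Context;
import android.media.AudioManager;
import android.media.AudioTrack;

// Two Android system interfaces that report the terminal's native output sampling rate.
public final class SampleRateQuery {

    public static int queryTargetSampleRate(Context context) {
        AudioManager am = (AudioManager) context.getSystemService(Context.AUDIO_SERVICE);
        String prop = am.getProperty(AudioManager.PROPERTY_OUTPUT_SAMPLE_RATE);
        if (prop != null) {
            return Integer.parseInt(prop);   // e.g. 44100 or 48000
        }
        // Fallback: ask the audio output path directly.
        return AudioTrack.getNativeOutputSampleRate(AudioManager.STREAM_MUSIC);
    }

    private SampleRateQuery() { }
}
```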
5. The method of claim 1, wherein synthesizing the played audio data and the recorded audio data to obtain the synthesized audio comprises:
synthesizing the played audio data and the recorded audio data to obtain synthesized audio data;
and encoding the synthesized audio data to obtain the synthesized audio.
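For claim 5, a minimal sketch of synthesizing the two PCM streams; the equal-weight sum with clamping is an assumption, and the final encoding step (for example to AAC via Android's MediaCodec) is only indicated by a comment.

```java
// Illustrative mixer: sums the played accompaniment and the recorded vocal sample
// by sample, clamping to the 16-bit range. The mixed PCM would then be passed to
// an encoder (for example Android's MediaCodec configured for AAC) to produce the
// synthesized audio; that step is omitted here.
public final class Mixer {

    public static short[] mix(short[] played, short[] recorded) {
        int length = Math.min(played.length, recorded.length);
        short[] mixed = new short[length];
        for (int i = 0; i < length; i++) {
            int sum = played[i] + recorded[i];
            if (sum > Short.MAX_VALUE) sum = Short.MAX_VALUE;   // clamp positive overflow
            if (sum < Short.MIN_VALUE) sum = Short.MIN_VALUE;   // clamp negative overflow
            mixed[i] = (short) sum;
        }
        return mixed;
    }

    private Mixer() { }
}
```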
6. The method according to any one of claims 1-5, wherein after synthesizing the played audio data and the recorded audio data to obtain the synthesized audio, the method further comprises:
and controlling the terminal to play the synthesized audio.
7. An audio processing apparatus, the apparatus comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to perform the steps of the method according to any one of claims 1-6.
8. A computer readable storage medium having instructions stored thereon, which when executed by a processor, implement the steps of the method of any of claims 1-6.
CN201811334209.6A 2018-11-09 2018-11-09 Audio processing method, device and storage medium Active CN109448676B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811334209.6A CN109448676B (en) 2018-11-09 2018-11-09 Audio processing method, device and storage medium


Publications (2)

Publication Number Publication Date
CN109448676A CN109448676A (en) 2019-03-08
CN109448676B true CN109448676B (en) 2024-01-19

Family

ID=65551976

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811334209.6A Active CN109448676B (en) 2018-11-09 2018-11-09 Audio processing method, device and storage medium

Country Status (1)

Country Link
CN (1) CN109448676B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113012722B (en) 2019-12-19 2022-06-10 腾讯科技(深圳)有限公司 Sampling rate processing method, device, system, storage medium and computer equipment
CN112017622B (en) * 2020-09-04 2023-02-21 广州趣丸网络科技有限公司 Audio data alignment method, device, equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101034545A (en) * 2006-03-08 2007-09-12 海尔集团公司 Mobile phone for playing the mixed Kara OK music and its implementation method
CN102710857A (en) * 2012-05-24 2012-10-03 武汉诚迈科技有限公司 Karaoke method suitable for android smart phone
CN106293659A (en) * 2015-05-21 2017-01-04 阿里巴巴集团控股有限公司 A kind of audio frequency real-time processing method, device and intelligent terminal
CN106528040A (en) * 2016-11-02 2017-03-22 福建星网视易信息系统有限公司 Method and apparatus for improving audio quality of android device
CN107481709A (en) * 2017-08-11 2017-12-15 腾讯音乐娱乐(深圳)有限公司 Audio data transmission method and device


Also Published As

Publication number Publication date
CN109448676A (en) 2019-03-08

Similar Documents

Publication Publication Date Title
CN108401124B (en) Video recording method and device
CN109033335B (en) Audio recording method, device, terminal and storage medium
CN108965757B (en) Video recording method, device, terminal and storage medium
CN108538302B (en) Method and apparatus for synthesizing audio
CN108965922B (en) Video cover generation method and device and storage medium
CN109327608B (en) Song sharing method, terminal, server and system
CN109147757B (en) Singing voice synthesis method and device
CN109346111B (en) Data processing method, device, terminal and storage medium
CN109144346B (en) Song sharing method and device and storage medium
CN109192218B (en) Method and apparatus for audio processing
CN110931053B (en) Method, device, terminal and storage medium for detecting recording time delay and recording audio
CN109635133B (en) Visual audio playing method and device, electronic equipment and storage medium
CN111142838B (en) Audio playing method, device, computer equipment and storage medium
CN110956971B (en) Audio processing method, device, terminal and storage medium
CN111061405B (en) Method, device and equipment for recording song audio and storage medium
CN109743461B (en) Audio data processing method, device, terminal and storage medium
CN111276122B (en) Audio generation method and device and storage medium
CN110688082A (en) Method, device, equipment and storage medium for determining adjustment proportion information of volume
CN109065068B (en) Audio processing method, device and storage medium
CN111402844B (en) Song chorus method, device and system
CN111092991B (en) Lyric display method and device and computer storage medium
CN111081277B (en) Audio evaluation method, device, equipment and storage medium
CN109819314B (en) Audio and video processing method and device, terminal and storage medium
CN109448676B (en) Audio processing method, device and storage medium
CN112086102B (en) Method, apparatus, device and storage medium for expanding audio frequency band

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant