CN110931053A

CN110931053A - Method, device, terminal and storage medium for detecting recording time delay and recording audio

Info

Publication number: CN110931053A
Application number: CN201911253603.1A
Authority: CN
Inventors: 肖纯智; 劳振锋
Original assignee: Guangzhou Kugou Computer Technology Co Ltd
Current assignee: Guangzhou Kugou Computer Technology Co Ltd
Priority date: 2019-12-09
Filing date: 2019-12-09
Publication date: 2020-03-27
Anticipated expiration: 2039-12-09
Also published as: CN110931053B

Abstract

The disclosure provides a method, a device, a terminal and a storage medium for detecting recording time delay and recording audio, and belongs to the field of audio signal processing. The method is applied to a terminal, the terminal is provided with an audio playing device and a sound collecting device, and the method comprises the following steps: controlling the audio playing equipment to play a test signal so as to generate sound; acquiring the sound generated by the audio playing equipment playing the test signal through the sound acquisition equipment to obtain a recorded signal; and determining the recording time delay of the terminal based on the test signal and the recording signal. The method can determine the recording time delay according to the recording signal and the test signal, avoids determining the recording time delay in a human ear audition mode, and is more accurate and higher in efficiency.

Description

Method, device, terminal and storage medium for detecting recording time delay and recording audio

Technical Field

The present disclosure relates to the field of audio signal processing, and in particular, to a method, an apparatus, a terminal, and a storage medium for detecting recording delay and recording audio.

Background

In the era of the mobile internet, with the continuous development of terminals such as smartphones, a terminal can be used for karaoke by installing a karaoke application (also called a singing application). The user selects a song in the karaoke application; the terminal plays the accompaniment through the audio playing device while collecting the vocal audio through the sound collecting device, and then synthesizes the collected vocal audio with the accompaniment. In this way, a terminal with a karaoke application installed allows the user to practice singing or sing for entertainment anytime and anywhere.

When a terminal is used for karaoke recording, a recording delay generally exists: the user sings in time with the accompaniment during recording, but the vocals are found to lag behind the beat after mixing. Therefore, it is necessary to obtain this recording delay and compensate for it when synthesizing the vocal audio and the accompaniment.

In the related art, the recording delay is usually determined by manual trial listening, in which a person aligns the accompaniment and the vocal audio by ear. However, the human ear has a limited ability to resolve time delay: the minimum perceivable delay is typically 30 ms to 50 ms and varies from person to person. Therefore, a person with sensitive hearing is needed to keep the estimation error as small as possible, and even a single delay estimate requires recording followed by repeated alignment and trial listening, which is time-consuming and extremely inefficient.

Disclosure of Invention

The embodiment of the disclosure provides a method, a device, a terminal and a storage medium for detecting recording time delay and recording audio. The technical scheme is as follows:

In one aspect, an embodiment of the present disclosure provides a method for detecting a recording delay, which is applied to a terminal having an audio playing device and a sound collecting device, where the method includes:

controlling the audio playing equipment to play a test signal so as to generate sound;

acquiring the sound generated by the audio playing equipment playing the test signal through the sound acquisition equipment to obtain a recorded signal;

and determining the recording time delay of the terminal based on the test signal and the recording signal.

Optionally, the determining, based on the test signal and the recording signal, a recording delay of the terminal includes:

performing correlation operation on the test signal and the recording signal;

and determining the recording time delay of the terminal according to the occurrence time of the correlation peak.

Optionally, the determining the recording delay of the terminal according to the time of the occurrence of the correlation peak includes:

and taking the time of the occurrence of the correlation peak as the recording time delay of the terminal.

Optionally, the method further comprises:

acquiring the model of the terminal;

and recording the corresponding relation between the model of the terminal and the recording time delay.

Optionally, the test signal is a swept frequency signal.

In another aspect, an embodiment of the present disclosure provides a method for recording audio, which is applied to a terminal, where the terminal has an audio playing device and a sound collecting device, and the method includes:

controlling the audio playing equipment to play the accompaniment of the target song;

acquiring a human voice audio collected by the voice collecting equipment;

acquiring the recording time delay of the terminal, wherein the recording time delay of the terminal is obtained based on the method for detecting the recording time delay;

and synthesizing the accompaniment and the voice audio based on the recording time delay.

Optionally, the obtaining of the recording delay of the terminal includes:

acquiring the model of the terminal;

and using the model of the terminal to determine the recording delay of the terminal based on a preset correspondence between recording delays and terminal models.

Optionally, the synthesizing the accompaniment and the vocal audio based on the recording time delay includes:

carrying out time delay processing on the accompaniment according to the recording time delay;

and synthesizing the accompaniment subjected to time delay processing and the voice audio.

In another aspect, an embodiment of the present disclosure further provides a device for detecting recording time delay, which is applied to a terminal, where the terminal has an audio playing device and a sound collecting device, and the device includes:

the playing module is used for controlling the audio playing equipment to play the test signal so as to generate sound;

the acquisition module is used for acquiring the sound generated by the audio playing equipment playing the test signal through the sound acquisition equipment to obtain a recorded signal;

and the determining module is used for determining the recording time delay of the terminal based on the test signal and the recording signal.

Optionally, the determining module includes:

the calculation submodule is used for carrying out correlation operation on the test signal and the recording signal;

and the determining submodule is used for determining the recording time delay of the terminal according to the occurrence time of the correlation peak.

Optionally, the determining submodule is configured to use the time when the correlation peak occurs as the recording delay of the terminal.

Optionally, the apparatus further comprises:

the obtaining module is used for obtaining the model of the terminal;

and the recording module is used for recording the corresponding relation between the model of the terminal and the recording time delay.

Optionally, the test signal is a swept frequency signal.

In another aspect, an embodiment of the present disclosure further provides an apparatus for recording audio, which is applied to a terminal, where the terminal has an audio playing device and a sound collecting device, and the apparatus includes:

the playing module is used for controlling the audio playing equipment to play the accompaniment of the target song;

the acquisition module is used for acquiring the human voice audio collected by the voice collection equipment;

the obtaining module is used for obtaining the recording time delay of the terminal, and the recording time delay of the terminal is obtained by the method for detecting the recording time delay;

and the synthesis module is used for synthesizing the accompaniment and the voice audio based on the recording time delay.

Optionally, the obtaining module includes:

the obtaining submodule is used for obtaining the model of the terminal;

and the determining submodule is used for using the model of the terminal to determine the recording delay of the terminal based on a preset correspondence between recording delays and terminal models.

Optionally, the synthesis module comprises:

the delay submodule is used for carrying out delay processing on the accompaniment according to the recording delay;

and the synthesis submodule is used for synthesizing the accompaniment subjected to the time delay processing and the voice audio.

In another aspect, an embodiment of the present disclosure further provides a terminal, where the terminal includes: a processor; a memory configured to store processor-executable instructions; wherein the instructions are loaded and executed by the processor to implement any of the methods of detecting recording delay or recording audio described above.

In another aspect, the present disclosure also provides a computer-readable storage medium, where instructions of the computer-readable storage medium, when executed by a processor of a terminal, cause the terminal to perform any one of the methods for detecting a recording delay or recording audio described above.

In the embodiment of the disclosure, the sound generated by the audio playing device playing the test signal is collected through the sound collecting device to obtain the recording signal, so the recording signal is correlated with the test signal. The recording delay can therefore be determined from the recording signal and the test signal rather than by human trial listening, which makes the determined recording delay more accurate and the process more efficient.

When the terminal is utilized to record audio, the recording time delay of the terminal can be acquired firstly, and accompaniment and voice audio are synthesized based on the recording time delay, so that the accompaniment and the voice audio can be aligned, and a better sound mixing effect is achieved.

Drawings

In order to illustrate the technical solutions in the embodiments of the present disclosure more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present disclosure, and those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a flowchart illustrating a method for detecting audio record delays according to an exemplary embodiment of the present disclosure;

FIG. 2 is a flowchart illustrating a method for detecting audio record delays according to an exemplary embodiment of the present disclosure;

FIG. 3 is a schematic diagram illustrating a signal processing procedure of a method for detecting a recording delay according to an exemplary embodiment of the present disclosure;

FIG. 4 illustrates a flow chart of a method of recording audio according to an exemplary embodiment of the present disclosure;

FIG. 5 illustrates a flow chart of a method of recording audio according to an exemplary embodiment of the present disclosure;

FIG. 6 is a block diagram illustrating an apparatus for detecting latency of a sound recording according to an exemplary embodiment of the present disclosure;

FIG. 7 is a block diagram illustrating an apparatus for recording audio according to an exemplary embodiment of the present disclosure;

FIG. 8 shows a block diagram of a terminal according to an exemplary embodiment of the present disclosure.

Detailed Description

To make the objects, technical solutions and advantages of the present disclosure more apparent, embodiments of the present disclosure will be described in detail with reference to the accompanying drawings.

The embodiments of the present disclosure are applicable to a terminal having an audio playing device (e.g., a speaker), and meanwhile, the terminal further has a sound collecting device (e.g., a built-in microphone, an external microphone, an earphone integrated with a microphone, etc.), which is also called a sound pick-up. The terminal can play sound through the audio playing device and collect sound through the sound collecting device.

A karaoke application is installed on the terminal, and the user can select a song through the karaoke application. After receiving the user's playing instruction, the terminal plays the accompaniment of the song corresponding to the playing instruction through the audio playing device while collecting the vocal audio through the sound collecting device, and obtains a recording file by synthesizing the accompaniment and the vocal audio. In addition, some karaoke applications have a scoring function that scores the user's performance according to the collected vocal audio; the higher the score, the better the singing. With the karaoke application, the user can sing karaoke at any time.

Illustratively, the terminal includes, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, or the like.

Fig. 1 shows a flowchart of a method for detecting a recording delay according to an exemplary embodiment of the present disclosure. The method may be performed by the aforementioned terminal, and referring to fig. 1, the method includes:

Step 101: Control the audio playing device to play a test signal so as to generate sound.

Here, the audio playing device may be a speaker built in the terminal. The test signal may be any audio signal, such as a frequency sweep signal. The frequency sweep signal refers to an audio signal whose frequency changes with time.

Step 102: Collect, through the sound collecting device, the sound generated by the audio playing device playing the test signal, to obtain a recording signal.

In step 102, the sound collection device may be a microphone, and the microphone may be a microphone built in the terminal or an external microphone connected to the terminal.

Step 103: Determine the recording delay of the terminal based on the test signal and the recording signal.

The recording signal is obtained by collecting, through the sound collecting device, the sound played by the audio playing device, so the waveforms of the recording signal and the test signal are basically the same. The difference is that the sound has to propagate through the air and be converted from an acoustic signal into an electrical signal, among other steps, so the recording signal obtained by the terminal lags behind the test signal by a certain delay. Meanwhile, due to interference such as channel modulation, device background noise, and external environmental noise, the waveform of the recording signal differs somewhat from that of the test signal, but the basic shape of the waveform is the same. Therefore, the recording delay of the terminal can be determined according to the similarity of the waveforms of the test signal and the recording signal.
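The relationship described above between the test signal and the recording signal can be illustrated with a small simulation. This is only a sketch: the function name `simulate_recording`, the attenuation factor `gain`, and the Gaussian noise model are assumptions made for illustration and do not come from the disclosure, which records real sound through a microphone.

```python
import random


def simulate_recording(test_signal, delay_samples, gain=0.8, noise_level=0.05, seed=0):
    """Return a hypothetical recording: the test signal delayed, attenuated and noisy.

    delay_samples models the recording delay, gain models playback/capture loss,
    and noise_level is the standard deviation of additive Gaussian noise.
    """
    if delay_samples < 0:
        raise ValueError("delay_samples must be non-negative")
    rng = random.Random(seed)
    total = delay_samples + len(test_signal)
    # Start from background noise only (device noise, ambient noise).
    recorded = [rng.gauss(0.0, noise_level) for _ in range(total)]
    # Add the attenuated test signal, shifted by the delay.
    for i, sample in enumerate(test_signal):
        recorded[delay_samples + i] += gain * sample
    return recorded
```

With `noise_level=0.0` the output is an exact delayed, scaled copy of the input, which makes the lag between the two waveforms easy to see.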

Fig. 2 is a flowchart of a method for detecting a recording delay according to an embodiment of the present disclosure. The method may be performed in a terminal, see fig. 2, the method comprising:

Step 201: Control the audio playing device to play a frequency sweep signal so as to generate sound.

Here, the audio playing device may be a speaker built in the terminal.

In the embodiment of the present disclosure, the test signal is described by taking a frequency sweep signal as an example. A frequency sweep signal is an audio signal whose frequency changes with time. For example, the frequency of the sweep signal may vary linearly with time. A frequency sweep signal has only one autocorrelation peak, and that peak is pronounced. Using a frequency sweep signal as the test signal therefore yields a single, pronounced correlation peak in the subsequent correlation operation, which gives strong resistance to interference and avoids misjudgment.

Optionally, the maximum frequency variation range of the frequency sweep signal is 0 to Fs/2, where Fs is the sampling rate of the recording signal, so as to satisfy the Nyquist sampling theorem. The frequency variation range of the sweep signal may be a sub-interval of the maximum frequency variation range, for example 0 to Fs/3, or Fs/4 to Fs/2. The sub-interval may be set according to actual needs, which is not limited by the present disclosure.

Illustratively, the frequency sweep signal may be any one of a sine wave frequency sweep signal, a cosine wave frequency sweep signal, a triangular wave frequency sweep signal, and a square wave frequency sweep signal. Compared with a triangular wave frequency sweep signal and a square wave frequency sweep signal, the correlation peak generated after the sine wave frequency sweep signal and the cosine wave frequency sweep signal are subjected to correlation operation is more obvious, and the time delay can be accurately determined.

Optionally, the method may further include:

generating a frequency sweep signal.

For example, the following method may be used to generate the sweep signal:

determining parameter values of a sweep frequency signal, wherein the parameter values of the sweep frequency signal comprise signal duration, starting frequency (namely frequency corresponding to starting time), cut-off frequency (namely frequency corresponding to ending time) and waveform;

and generating the frequency sweep signal according to the parameter value of the frequency sweep signal.

These parameter values may be set by the tester as desired.
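As a minimal sketch of the generation step described above, the following produces a linear sine wave sweep from a duration, a starting frequency, and a cut-off frequency. The function name `generate_sweep` and the default sample rate are assumptions for illustration; only the sine waveform is covered, and the range check reflects the 0 to Fs/2 limit mentioned earlier.

```python
import math


def generate_sweep(duration_s, start_hz, stop_hz, sample_rate=44100):
    """Generate a linear sine sweep as a list of floats in [-1, 1].

    The instantaneous frequency moves linearly from start_hz at t = 0
    to stop_hz at t = duration_s.
    """
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    nyquist = sample_rate / 2
    if not (0 <= start_hz <= nyquist and 0 <= stop_hz <= nyquist):
        raise ValueError("sweep frequencies must lie within 0..Fs/2")
    count = round(duration_s * sample_rate)
    rate = (stop_hz - start_hz) / duration_s  # sweep rate in Hz per second
    samples = []
    for i in range(count):
        t = i / sample_rate
        # Phase is the integral of the instantaneous frequency f(t) = start + rate * t.
        phase = 2.0 * math.pi * (start_hz * t + 0.5 * rate * t * t)
        samples.append(math.sin(phase))
    return samples
```

For example, `generate_sweep(1.0, 100, 8000)` would give a one-second sweep from 100 Hz to 8 kHz at 44.1 kHz; these particular values are illustrative only.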

Step 202: Collect, through the sound collecting device, the sound generated by the audio playing device playing the frequency sweep signal, to obtain a recording signal.

The recording signal is obtained by collecting, through the sound collecting device, the sound played by the audio playing device, so the waveforms of the recording signal and the frequency sweep signal are basically the same. The difference is that the sound has to propagate through the air and be converted from an acoustic signal into an electrical signal, so the recording signal obtained by the terminal lags behind the frequency sweep signal by a certain delay. Meanwhile, due to interference such as channel modulation, device background noise, and external environmental noise, the waveform of the recording signal differs somewhat from that of the frequency sweep signal, but the basic shape of the waveform is the same.

Step 203: Perform a correlation operation on the frequency sweep signal and the recording signal.

Step 204: Determine the recording delay of the terminal according to the time at which the correlation peak occurs.

Step 204 may include: taking the time at which the correlation peak occurs as the recording delay of the terminal.

In the embodiment of the present disclosure, the correlation operation between the two signals is a cross-correlation operation. A cross-correlation function is generally used to measure the degree of correlation between two time series, that is, to describe how strongly two signals are correlated at each relative time offset, so the time delay between the two signals can be determined from the peak (i.e., the correlation peak) of the cross-correlation function. Determining the recording delay from the correlation peak is robust against interference and allows the recording delay to be determined accurately.
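Steps 203 and 204 can be sketched as a direct time-domain cross-correlation followed by a peak search. This is an illustrative sketch, not the implementation used in the disclosure: the function name `estimate_delay` is hypothetical, the search covers only non-negative lags (the recording is assumed never to lead the test signal), and the peak is taken as the largest raw correlation value.

```python
def estimate_delay(test_signal, recorded_signal, sample_rate):
    """Return (lag_in_samples, delay_in_seconds) at the cross-correlation peak.

    The test signal is slid along the recording; the lag with the largest
    correlation value is taken as the recording delay.
    """
    max_lag = len(recorded_signal) - len(test_signal)
    if max_lag < 0:
        raise ValueError("recording must be at least as long as the test signal")
    best_lag, best_value = 0, float("-inf")
    for lag in range(max_lag + 1):
        # Cross-correlation value at this lag.
        value = sum(t * recorded_signal[lag + i] for i, t in enumerate(test_signal))
        if value > best_value:
            best_lag, best_value = lag, value
    return best_lag, best_lag / sample_rate
```

The direct sum costs on the order of N × M multiplications, which is acceptable for short test signals; an FFT-based correlation is a common alternative for longer recordings.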

Fig. 3 is a schematic diagram illustrating a signal processing procedure of a method for detecting a recording delay according to an exemplary embodiment of the present disclosure. In fig. 3, the abscissa represents time and the ordinate represents amplitude, which is unitless. Taking a sampling rate Fs = 44100 as an example, an abscissa value of 1 × 10⁴ represents (10000/44100) seconds. The uppermost frame shows the waveform of the sweep signal, which is a sine wave sweep signal. The middle frame shows the waveform of the recording signal, which is similar to that of the sweep signal but not identical, because some noise is introduced during recording. As can be seen from the figure, the start of the recording signal is delayed by a time t relative to the start of the sweep signal, where t is less than (0.5 × 10⁴)/Fs seconds. The lowermost frame shows the waveform obtained after the correlation operation is performed on the sweep signal and the recording signal. This waveform has only one significant correlation peak, and the position of the correlation peak corresponds to the delay t of the recording signal; that is, the time at which the correlation peak appears is the delay of the recording signal.
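As a worked check of the axis conversion in fig. 3, assuming the abscissa counts samples so that a sample index n corresponds to a time of n divided by the sampling rate, the two values mentioned for Fs = 44100 come out as follows:

```latex
t = \frac{n}{F_s}, \qquad
\frac{1 \times 10^{4}}{44100} \approx 0.227\ \text{s}, \qquad
\frac{0.5 \times 10^{4}}{44100} \approx 0.113\ \text{s}
```

Under this reading, the delay t in the example of fig. 3 is below roughly 0.113 seconds.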

Because the hardware equipment used by the terminals of different models is different, and the recording time delays corresponding to the terminals of different models are different, before the Karaoke application is released, the recording time delays of the terminals of various models can be detected, and the detection result can be recorded, so that the terminals of various models can adopt the respective corresponding recording time delays to perform sound mixing. Thus, optionally, the method further comprises:

Step 205: Acquire the model of the terminal.

Alternatively, the model of the terminal may be acquired from configuration information of the terminal, or the model of the terminal input by the user may be acquired. Illustratively, the model of the terminal may be a combination of the brand and model of the terminal.

Step 206: Record the correspondence between the model of the terminal and the recording delay.

In a possible implementation manner, the correspondence between the model of the terminal and the recording delay may be recorded in the form of a table, for example in the form of Table 1.

Table 1

Model	Recording delay
Brand A model a1	Delay 1
Brand A model a2	Delay 2
Brand B model B	Delay 3

In another possible implementation manner, the correspondence between the model of the terminal and the recording delay may be recorded in the form of an array, for example: [Brand A model a1, delay 1], [Brand A model a2, delay 2], [Brand B model B, delay 3], and so on.
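The two recording forms described above (table and array) might be sketched as follows. The function names, the "brand model" key format, and the use of milliseconds are assumptions for illustration; the disclosure does not specify a storage format or units.

```python
def record_delay(table, brand, model, delay_ms):
    """Store a measured delay under a 'brand model' key (table form)."""
    table[f"{brand} {model}"] = delay_ms
    return table


def as_pairs(table):
    """Convert the table into a list of [model, delay] pairs (array form)."""
    return [[name, delay] for name, delay in sorted(table.items())]


# Hypothetical measurements; the delay values are placeholders, not real data.
delays = {}
record_delay(delays, "Brand A", "a1", 120.0)
record_delay(delays, "Brand A", "a2", 95.0)
pairs = as_pairs(delays)
```

Combining brand and model into one key follows the earlier remark that the model of the terminal may be a combination of the two.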

In the embodiment of the disclosure, the sound collecting device collects the sound played by the audio playing device to obtain the recording signal, so the recording signal is correlated with the test signal. The recording delay can therefore be determined from the recording signal and the test signal rather than by human trial listening, which makes the determined recording delay more accurate and the process more efficient.

Fig. 4 shows a flowchart of a method of recording audio according to an exemplary embodiment of the present disclosure. The method may be performed in a terminal, see fig. 4, the method comprising:

Step 401: Control the audio playing device to play the accompaniment of the target song.

Step 402: Acquire the vocal audio collected by the sound collecting device.

Step 403: Acquire the recording delay of the terminal.

The recording delay of the terminal is the recording delay obtained by the method shown in fig. 1 or fig. 2.

Step 404: Synthesize the accompaniment and the vocal audio based on the recording delay.

In the embodiment of the disclosure, when the terminal is used for recording audio, the recording time delay of the terminal can be acquired first, and the accompaniment and the voice audio are synthesized based on the recording time delay, so that the accompaniment and the voice audio can be aligned, and a better sound mixing effect is achieved.

Fig. 5 shows a flowchart of a method of recording audio according to an exemplary embodiment of the present disclosure. The method may be performed in a terminal, see fig. 5, the method comprising:

Step 501: Control the audio playing device to play the accompaniment of the target song.

Optionally, this step 501 may include:

First, receive a playing instruction from the user, where the playing instruction indicates the target song to be played.

Second, acquire the accompaniment data of the target song corresponding to the playing instruction.

Third, control the audio playing device to play the accompaniment of the target song.

For example, the playing instruction may carry an identifier of the target song, and the second step may include downloading accompaniment data corresponding to the identifier of the target song, or may search for the accompaniment data corresponding to the identifier of the target song from a local memory.

Step 502: Acquire the vocal audio collected by the sound collecting device.

While the accompaniment is playing, the user sings along with it, and the sound collecting device collects the user's singing to obtain the vocal audio.

Step 503: Acquire the model of the terminal.

Illustratively, the model of the terminal may be looked up from the configuration information of the terminal.

Step 504: Use the model of the terminal to determine the recording delay of the terminal based on a pre-configured correspondence between recording delays and terminal models.

In a possible implementation manner, the correspondence between the recording delay and the model of the terminal may be obtained by using the method shown in fig. 2, and stored in the terminal when the terminal installs the karaoke application.

In another possible implementation manner, the correspondence between the recording delay and the model of the terminal may be obtained by using the method shown in fig. 2, and stored in a server corresponding to the karaoke application. After the terminal installs the karaoke application, a request can be sent to the server, the request carries the model of the terminal, and the recording time delay of the terminal returned by the server is received.

The recording delay of the terminal can be determined through steps 503 and 504.
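The two lookup options described for step 504 (a correspondence stored on the terminal, or a request to the server of the karaoke application) can be sketched as below. Everything here is hypothetical: `lookup_delay` and `fetch_from_server` are illustrative names, the server request is represented only by a caller-supplied function, and caching the server result locally is a design choice not stated in the disclosure.

```python
def lookup_delay(model, local_table, fetch_from_server=None):
    """Return the recording delay for a terminal model.

    local_table: dict mapping model name to delay, stored with the application.
    fetch_from_server: optional callable taking the model name and returning
    its delay; it stands in for the request sent to the application server.
    """
    if model in local_table:
        return local_table[model]
    if fetch_from_server is not None:
        delay = fetch_from_server(model)
        local_table[model] = delay  # cache so later lookups stay local
        return delay
    raise LookupError(f"no recording delay known for model {model!r}")
```

If neither source knows the model, the sketch raises an error; an application could instead fall back to running the detection method on the device itself.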

Step 505: Perform delay processing on the accompaniment according to the recording delay.

Step 506: Synthesize the delayed accompaniment and the vocal audio.

That is, the accompaniment is delayed by the recording delay so that it is synchronized with the vocal audio, and the delayed accompaniment and the vocal audio are then synthesized, thereby achieving a better mixing effect.
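Steps 505 and 506 can be sketched as padding the accompaniment with leading silence and then summing it with the vocal audio sample by sample. The function names, the float sample format in the range -1 to 1, and the hard clipping of the mixed result are assumptions made for illustration.

```python
def delay_signal(samples, delay_s, sample_rate):
    """Delay a signal by prepending silence equal to the recording delay."""
    if delay_s < 0:
        raise ValueError("delay_s must be non-negative")
    pad = round(delay_s * sample_rate)
    return [0.0] * pad + list(samples)


def mix(accompaniment, vocal):
    """Sum two signals sample by sample, clipping the result to [-1, 1]."""
    length = max(len(accompaniment), len(vocal))
    mixed = []
    for i in range(length):
        a = accompaniment[i] if i < len(accompaniment) else 0.0
        v = vocal[i] if i < len(vocal) else 0.0
        mixed.append(max(-1.0, min(1.0, a + v)))
    return mixed
```

A typical call would be `mix(delay_signal(accompaniment, delay_s, fs), vocal)`, where `delay_s` is the recording delay obtained for the terminal.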

The following are apparatus embodiments of the present disclosure. For details not described in the apparatus embodiments, reference may be made to the method embodiments above.

Fig. 6 is a block diagram of an apparatus for detecting recording delay according to an embodiment of the present disclosure. The apparatus has functions of implementing the above method examples, and the functions may be implemented by hardware or by hardware executing corresponding software. Referring to fig. 6, the apparatus 600 includes: a playing module 601, an acquisition module 602 and a determining module 603.

The playing module 601 is configured to control the audio playing device to play a test signal so as to generate sound. The acquisition module 602 is configured to collect, through the sound collecting device, the sound generated by the audio playing device playing the test signal, to obtain a recording signal. The determining module 603 is configured to determine a recording delay of the terminal based on the test signal and the recording signal.

Optionally, the determining module 603 includes a calculating sub-module 6031 and a determining sub-module 6032. The calculation sub-module 6031 is configured to perform correlation operation on the test signal and the recording signal. The determining submodule 6032 is configured to determine the recording delay of the terminal according to the time when the correlation peak occurs.

Optionally, the determining sub-module 6032 is configured to use the time when the correlation peak occurs as the recording delay of the terminal.

Optionally, the apparatus 600 further comprises an obtaining module 604 and a recording module 605. The obtaining module 604 is configured to obtain a model of the terminal. The recording module 605 is configured to record a corresponding relationship between the model of the terminal and the recording delay.

Fig. 7 is a block diagram of an apparatus for recording audio according to an embodiment of the present disclosure. The apparatus has functions of implementing the above method examples, and the functions may be implemented by hardware or by hardware executing corresponding software. Referring to fig. 7, the apparatus 700 includes: a playing module 701, an acquisition module 702, an obtaining module 703 and a synthesis module 704.

The playing module 701 is configured to control the audio playing device to play the accompaniment of the target song. The acquisition module 702 is configured to acquire a human voice audio acquired by the sound acquisition device. The obtaining module 703 is configured to obtain the recording delay of the terminal, where the recording delay of the terminal is obtained by using the method shown in fig. 1 or fig. 2. The synthesis module 704 is configured to synthesize the accompaniment and the vocal audio based on the recording time delay.

Optionally, the obtaining module 703 includes an obtaining sub-module 7031 and a determining sub-module 7032. The obtaining sub-module 7031 is configured to obtain a model of the terminal. The determining submodule 7032 is configured to determine, based on a correspondence between a preset recording delay and a terminal model, a recording delay of the terminal, by using the terminal model.

Optionally, the synthesis module 704 includes a delay sub-module 7041 and a synthesis sub-module 7042. The delay sub-module 7041 is configured to perform delay processing on the accompaniment according to the recording delay. The synthesis sub-module 7042 is configured to synthesize the delayed accompaniment and the vocal audio.

Fig. 8 shows a block diagram of a terminal 800 according to an exemplary embodiment of the disclosure. The terminal 800 may be: a smartphone, a tablet, a laptop, or a desktop computer. The terminal 800 may also be referred to by other names such as user equipment, portable terminal, laptop terminal, desktop terminal, etc.

As shown in fig. 8, the terminal 800 includes: a processor 801 and a memory 802.

The processor 801 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and so forth. The processor 801 may be implemented in at least one hardware form of a DSP (Digital Signal Processing), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 801 may also include a main processor and a coprocessor, where the main processor is a processor for processing data in an awake state, and is also called a Central Processing Unit (CPU); a coprocessor is a low power processor for processing data in a standby state. In some embodiments, the processor 801 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content required to be displayed on the display screen. In some embodiments, the processor 801 may further include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.

Memory 802 may include one or more computer-readable storage media, which may be non-transitory. Memory 802 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices or flash memory storage devices. In some embodiments, a non-transitory computer-readable storage medium in memory 802 is used to store at least one instruction, which is executed by processor 801 to implement the method for detecting recording delay or the method for recording audio provided by the method embodiments of the present disclosure.

In some embodiments, the terminal 800 may further include: a peripheral interface 803 and at least one peripheral. The processor 801, memory 802 and peripheral interface 803 may be connected by bus or signal lines. Various peripheral devices may be connected to peripheral interface 803 by a bus, signal line, or circuit board. Specifically, the peripheral device includes: at least one of a radio frequency circuit 804, a display screen 805, a camera assembly 806, an audio circuit 807, a positioning assembly 808, and a power supply 809.

The peripheral interface 803 may be used to connect at least one I/O (Input/Output) related peripheral to the processor 801 and the memory 802. In some embodiments, the processor 801, memory 802, and peripheral interface 803 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 801, the memory 802, and the peripheral interface 803 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.

The radio frequency circuit 804 is used for receiving and transmitting RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuit 804 communicates with communication networks and other communication devices via electromagnetic signals. The radio frequency circuit 804 converts an electrical signal into an electromagnetic signal for transmission, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 804 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuit 804 may communicate with other terminals via at least one wireless communication protocol. The wireless communication protocols include, but are not limited to: metropolitan area networks, mobile communication networks of various generations (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 804 may further include NFC (Near Field Communication) related circuits, which is not limited in this application.

The display screen 805 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 805 is a touch display screen, the display screen 805 also has the ability to capture touch signals on or above its surface. The touch signal may be input to the processor 801 as a control signal for processing. In this case, the display screen 805 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there may be one display screen 805, disposed on the front panel of the terminal 800; in other embodiments, there may be at least two display screens 805, respectively disposed on different surfaces of the terminal 800 or in a folded design; in still other embodiments, the display screen 805 may be a flexible display screen disposed on a curved surface or a folded surface of the terminal 800. Furthermore, the display screen 805 may be arranged in a non-rectangular irregular shape, i.e., a specially shaped screen. The display screen 805 may be made of materials such as LCD (Liquid Crystal Display) or OLED (Organic Light-Emitting Diode).

The camera assembly 806 is used to capture images or video. Optionally, camera assembly 806 includes a front camera and a rear camera. Generally, the front camera is disposed on the front panel of the terminal, and the rear camera is disposed on the rear surface of the terminal. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so that the main camera and the depth-of-field camera can be fused to realize a background blurring function, and the main camera and the wide-angle camera can be fused to realize panoramic shooting and VR (Virtual Reality) shooting functions or other fusion shooting functions. In some embodiments, camera assembly 806 may also include a flash. The flash may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash is a combination of a warm-light flash and a cold-light flash, and can be used for light compensation at different color temperatures.

The audio circuit 807 may include a microphone and a speaker. The microphone is used for collecting sound waves from the user and the environment, converting the sound waves into electrical signals, and inputting the electrical signals to the processor 801 for processing or to the radio frequency circuit 804 to realize voice communication. For the purpose of stereo sound collection or noise reduction, a plurality of microphones may be provided at different portions of the terminal 800. The microphone may also be an array microphone or an omnidirectional pickup microphone. The speaker is used to convert electrical signals from the processor 801 or the radio frequency circuit 804 into sound waves. The speaker may be a conventional diaphragm speaker or a piezoelectric ceramic speaker. When the speaker is a piezoelectric ceramic speaker, it can convert an electrical signal into a sound wave audible to humans, or into a sound wave inaudible to humans for purposes such as distance measurement. In some embodiments, the audio circuit 807 may also include a headphone jack.

The positioning component 808 is used to locate the current geographic position of the terminal 800 for navigation or LBS (Location Based Service). The positioning component 808 may be a positioning component based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, the GLONASS system of Russia, or the Galileo system of the European Union.

Power supply 809 is used to provide power to the various components in terminal 800. The power supply 809 may be an alternating current supply, a direct current supply, a disposable battery, or a rechargeable battery. When the power supply 809 includes a rechargeable battery, the rechargeable battery may support wired or wireless charging. The rechargeable battery may also support fast charging technology.

In some embodiments, terminal 800 also includes one or more sensors 810. The one or more sensors 810 include, but are not limited to: acceleration sensor 811, gyro sensor 812, pressure sensor 813, fingerprint sensor 814, optical sensor 815 and proximity sensor 816.

The acceleration sensor 811 may detect the magnitude of acceleration in three coordinate axes of the coordinate system established with the terminal 800. For example, the acceleration sensor 811 may be used to detect the components of the gravitational acceleration in three coordinate axes. The processor 801 may control the touch screen 805 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 811. The acceleration sensor 811 may also be used for acquisition of motion data of a game or a user.

The gyro sensor 812 may detect a body direction and a rotation angle of the terminal 800, and the gyro sensor 812 may cooperate with the acceleration sensor 811 to acquire a 3D motion of the user with respect to the terminal 800. From the data collected by the gyro sensor 812, the processor 801 may implement the following functions: motion sensing (such as changing the UI according to a user's tilting operation), image stabilization at the time of photographing, game control, and inertial navigation.

The pressure sensor 813 may be disposed on the side frame of terminal 800 and/or beneath the touch display screen 805. When the pressure sensor 813 is disposed on the side frame of the terminal 800, it can detect the user's grip signal on the terminal 800, and the processor 801 performs left-hand or right-hand recognition or a shortcut operation according to the grip signal collected by the pressure sensor 813. When the pressure sensor 813 is disposed beneath the touch display screen 805, the processor 801 controls the operable controls on the UI according to the user's pressure operation on the touch display screen 805. The operable controls include at least one of a button control, a scroll bar control, an icon control, and a menu control.

The fingerprint sensor 814 is used for collecting a fingerprint of the user, and the processor 801 identifies the user according to the fingerprint collected by the fingerprint sensor 814, or the fingerprint sensor 814 itself identifies the user according to the collected fingerprint. Upon identifying the user as a trusted identity, the processor 801 authorizes the user to perform relevant sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments, and changing settings. Fingerprint sensor 814 may be disposed on the front, back, or side of terminal 800. When a physical button or a vendor logo is provided on the terminal 800, the fingerprint sensor 814 may be integrated with the physical button or the vendor logo.

The optical sensor 815 is used to collect the ambient light intensity. In one embodiment, the processor 801 may control the display brightness of the touch screen 805 based on the ambient light intensity collected by the optical sensor 815. Specifically, when the ambient light intensity is high, the display brightness of the touch display screen 805 is increased; when the ambient light intensity is low, the display brightness of the touch display 805 is turned down. In another embodiment, the processor 801 may also dynamically adjust the shooting parameters of the camera assembly 806 based on the ambient light intensity collected by the optical sensor 815.

A proximity sensor 816, also known as a distance sensor, is typically provided on the front panel of the terminal 800. The proximity sensor 816 is used to collect the distance between the user and the front surface of the terminal 800. In one embodiment, when the proximity sensor 816 detects that the distance between the user and the front surface of the terminal 800 gradually decreases, the processor 801 controls the touch display screen 805 to switch from the screen-on state to the screen-off state; when the proximity sensor 816 detects that the distance between the user and the front surface of the terminal 800 gradually increases, the processor 801 controls the touch display screen 805 to switch from the screen-off state to the screen-on state.

Those skilled in the art will appreciate that the configuration shown in Fig. 8 does not limit the terminal 800, which may include more or fewer components than those shown, combine some components, or use a different arrangement of components.

The embodiment of the present disclosure also provides a computer-readable storage medium, and when instructions in the computer-readable storage medium are executed by a processor of a terminal, the terminal is caused to execute the method for detecting recording time delay or the method for recording audio. The computer readable storage medium may be non-transitory. For example, the computer readable storage medium may be a read-only memory, a magnetic or optical disk, or the like.

The above description is exemplary only and is not intended to limit the present disclosure. Any modification, equivalent replacement, or improvement made without departing from the spirit and scope of the present disclosure shall fall within the scope of the present disclosure.

Claims

1. A method for detecting recording delay, applied to a terminal having an audio playing device and a sound collecting device, the method comprising:

2. The method of claim 1, wherein the determining the recording delay of the terminal based on the test signal and the recording signal comprises:

performing correlation operation on the test signal and the recording signal;

3. The method of claim 2, wherein the determining the recording delay of the terminal according to the time of the occurrence of the correlation peak comprises:

4. The method according to any one of claims 1 to 3, further comprising:

acquiring the model of the terminal;

5. The method according to any one of claims 1 to 3, wherein the test signal is a swept-frequency signal.

6. A method for recording audio, applied to a terminal having an audio playing device and a sound collecting device, the method comprising:

acquiring vocal audio collected by the sound collecting device;

acquiring the recording time delay of the terminal, wherein the recording time delay of the terminal is obtained based on the method of any one of claims 1 to 5;

7. The method of claim 6, wherein the obtaining the recording delay of the terminal comprises:

acquiring the model of the terminal;

8. The method of claim 6 or 7, wherein the synthesizing the accompaniment and the vocal audio based on the recording time delay comprises:

9. An apparatus for detecting recording delay, applied to a terminal having an audio playing device and a sound collecting device, the apparatus comprising:

10. An apparatus for recording audio, applied to a terminal having an audio playing device and a sound collecting device, the apparatus comprising:

an obtaining module, configured to obtain a recording delay of the terminal, where the recording delay of the terminal is obtained based on the method of any one of claims 1 to 5;

11. A terminal, characterized in that the terminal comprises: a processor; a memory configured to store processor-executable instructions; wherein the instructions are loaded and executed by the processor to implement the method of detecting recording delays of any of claims 1 to 5 or the method of recording audio of any of claims 6 to 8.

12. A computer-readable storage medium, wherein instructions in the computer-readable storage medium, when executed by a processor of a terminal, cause the terminal to perform the method of detecting recording latency of any one of claims 1 to 5 or perform the method of recording audio of any one of claims 6 to 8.