WO2022160669A1 - Audio processing method and audio processing apparatus - Google Patents

Audio processing method and audio processing apparatus

Info

Publication number
WO2022160669A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
audio
background audio
background
time
Prior art date
Application number
PCT/CN2021/113086
Other languages
French (fr)
Chinese (zh)
Inventor
邢文浩
张晨
Original Assignee
北京达佳互联信息技术有限公司
Priority date
Filing date
Publication date
Application filed by 北京达佳互联信息技术有限公司
Publication of WO2022160669A1


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00 Details of electrophonic musical instruments
    • G10H1/0008 Associated control or indicating means
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00 Details of electrophonic musical instruments
    • G10H1/36 Accompaniment arrangements
    • G10H1/361 Recording/reproducing of accompaniment for use with an external source, e.g. karaoke systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2210/00 Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/005 Musical accompaniment, i.e. complete instrumental rhythm synthesis added to a performed melody, e.g. as output by drum machines
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2210/00 Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/031 Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2240/00 Data organisation or data communication aspects, specifically adapted for electrophonic musical tools or instruments
    • G10H2240/171 Transmission of musical instrument data, control or status information; Transmission, remote access or control of music data for electrophonic musical instruments

Definitions

  • the present disclosure relates to the field of signal processing, and in particular, to an audio processing method, an apparatus, an electronic device, and a storage medium.
  • Online KTV chorus means that two people (for example, A and B) choose the same song and sing it together; both A and B can then hear each other's singing as well as their own accompaniment, just like an offline KTV chorus.
  • the present disclosure provides an audio processing method, apparatus, electronic device and storage medium.
  • an audio processing method comprising: receiving an audio segment of a first user collected during singing, together with a playback time of the first user's background audio corresponding to the audio segment; and adjusting a playback position of a second user's background audio according to the background audio playback time, so that the adjusted background audio of the second user is aligned with the received audio segment of the first user, wherein the background audio of the second user is the same as the background audio of the first user.
  • the background audio playback moment is obtained by subtracting a time delay due to audio capture from the current playback moment of the background audio of the first user.
  • adjusting the playback position of the second user's background audio according to the background audio playback time includes: determining the playback position of the second user's background audio when the audio segment of the first user is received; and, when that playback position falls within the time interval from the end of the second user's singing to the start of the first user's singing, or within the time interval from the start of playback of the second user's background audio to the start of the first user's singing, adjusting the playback position to correspond to the received background audio playback time.
  • adjusting the playback of the second user's background audio according to the background audio playback time includes: determining, within the time interval from the end of the second user's singing to the start of the first user's singing, or within the time interval from the start of playback of the second user's background audio to the start of the first user's singing, the sub-interval with the smallest average audio energy; and adjusting the playback position of the second user's background audio in that sub-interval according to the background audio playback time.
  • adjusting the playback position of the second user's background audio in the sub-interval according to the background audio playback time includes: determining the playback position of the second user's background audio when the audio segment of the first user is received; and, in response to that playback position falling within the sub-interval, adjusting it to correspond to the received background audio playback time.
  • determining the sub-interval with the smallest average audio energy includes: calculating the average audio energy of each sub-interval of the time interval according to the following formula, and selecting the sub-interval with the smallest calculated average energy: $E(ab) = \frac{1}{TS_b - TS_a}\sum_{i=TS_a+1}^{TS_b} S(i)^2$, where E(ab) is the average energy of sub-interval ab, TS_b is the number of sampling points up to time b, TS_a is the number of sampling points up to time a, TS_b - TS_a is the number of sampling points within sub-interval ab, and S(i) is the amplitude of the i-th sampling point.
  • the audio processing method further includes: sending, to the first user, the audio segment of the second user collected during singing and the playback time of the second user's background audio corresponding to that audio segment.
  • the audio processing method further includes: establishing a communication connection with the first user; playing the background audio, and playing the received audio clip of the first user.
  • receiving the audio segment of the first user collected during singing and the corresponding background audio playback time includes: receiving, at a predetermined time interval, the audio segment of the first user collected during singing and the playback time of the first user's background audio corresponding to that segment.
  • an audio processing apparatus comprising: a receiving unit configured to receive an audio segment of a first user collected during singing and a playback time of the first user's background audio corresponding to the audio segment; and an adjusting unit configured to adjust a playback position of a second user's background audio according to the background audio playback time, so that the adjusted background audio of the second user is aligned with the received audio segment of the first user, wherein the background audio of the second user is the same as the background audio of the first user.
  • the background audio playback moment is obtained by subtracting a time delay due to audio capture from the current playback moment of the background audio of the first user.
  • adjusting the playback position of the second user's background audio according to the background audio playback time includes: determining the playback position of the second user's background audio when the audio segment of the first user is received; and, when that playback position falls within the time interval from the end of the second user's singing to the start of the first user's singing, or within the time interval from the start of playback of the second user's background audio to the start of the first user's singing, adjusting the playback position to correspond to the received background audio playback time.
  • adjusting the playback of the second user's background audio according to the background audio playback time includes: determining, within the time interval from the end of the second user's singing to the start of the first user's singing, or within the time interval from the start of playback of the second user's background audio to the start of the first user's singing, the sub-interval with the smallest average audio energy; and adjusting the playback position of the second user's background audio in that sub-interval according to the background audio playback time.
  • adjusting the playback position of the second user's background audio in the sub-interval according to the background audio playback time includes: determining the playback position of the second user's background audio when the audio segment of the first user is received; and, in response to that playback position falling within the sub-interval, adjusting it to correspond to the received background audio playback time.
  • determining the sub-interval with the smallest average audio energy includes: calculating the average audio energy of each sub-interval of the time interval according to the following formula, and selecting the sub-interval with the smallest calculated average energy: $E(ab) = \frac{1}{TS_b - TS_a}\sum_{i=TS_a+1}^{TS_b} S(i)^2$, where E(ab) is the average energy of sub-interval ab, TS_b is the number of sampling points up to time b, TS_a is the number of sampling points up to time a, TS_b - TS_a is the number of sampling points within sub-interval ab, and S(i) is the amplitude of the i-th sampling point.
  • the audio processing apparatus further includes: a sending unit configured to send, to the first user, the audio segment of the second user collected during singing and the playback time of the second user's background audio corresponding to that audio segment.
  • the audio processing apparatus further includes: a communication unit configured to establish a communication connection with the first user; and an audio playback unit configured to play the background audio and to play the received audio segment of the first user.
  • the receiving unit receives the audio segment of the first user collected during singing and the background audio playing time of the background audio of the first user corresponding to the audio segment at predetermined time intervals.
  • an electronic device comprising: at least one processor; and at least one memory storing computer-executable instructions, wherein the computer-executable instructions, when executed by the at least one processor, cause the at least one processor to perform the audio processing method described above.
  • a computer-readable storage medium storing instructions that, when executed by at least one processor, cause the at least one processor to perform the audio processing method described above.
  • a computer program product comprising computer instructions, which when executed by a processor implement the audio processing method as described above.
  • in the embodiments of the present disclosure, the playback position of the second user's background audio is adjusted according to the playback time of the first user's background audio corresponding to the first user's audio segment, so that the adjusted background audio of the second user is aligned with the received audio segment of the first user. This prevents the first user's audio segment from being misaligned with the second user's locally played background audio due to transmission delay, which would otherwise degrade the chorus experience.
  • the embodiments of the present disclosure can also reduce the impact on the sense of hearing when adjusting the playing position of the background audio of the second user.
  • FIG. 1 is an exemplary system architecture in which exemplary embodiments of the present disclosure may be applied;
  • FIG. 2 is a flowchart of an audio processing method of an exemplary embodiment of the present disclosure
  • FIG. 3 is a schematic diagram illustrating acquiring a background audio playback moment corresponding to an audio segment according to an exemplary embodiment of the present disclosure
  • FIG. 4 is a schematic diagram of adjusting the playback position of background audio according to an exemplary embodiment of the present disclosure
  • FIG. 5 is a schematic diagram of an application scenario of the audio processing method according to an exemplary embodiment of the present disclosure
  • FIG. 6 is a block diagram of an audio processing apparatus of an exemplary embodiment of the present disclosure.
  • FIG. 7 is a block diagram of an electronic device according to an exemplary embodiment of the present disclosure.
  • FIG. 1 illustrates an exemplary system architecture 100 in which exemplary embodiments of the present disclosure may be applied.
  • the system architecture 100 may include terminal devices 101 , 102 , and 103 , a network 104 and a server 105 .
  • the network 104 is a medium used to provide a communication link between the terminal devices 101 , 102 , 103 and the server 105 .
  • the network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
  • the user can use the terminal devices 101 , 102 and 103 to interact with the server 105 through the network 104 to receive or send messages (eg, audio and video data upload requests, audio and video data acquisition requests) and the like.
  • Various communication client applications may be installed on the terminal devices 101 , 102 and 103 , such as singing applications, audio and video recording software, audio and video players, instant communication tools, email clients, social platform software, and the like.
  • the terminal devices 101, 102, and 103 may be hardware or software.
  • when the terminal devices 101, 102, and 103 are hardware, they can be various electronic devices with a display screen that are capable of audio and video playback and recording, including but not limited to smartphones, tablet computers, laptop computers, and desktop computers.
  • when the terminal devices 101, 102, and 103 are software, they can be installed in the electronic devices listed above and can be implemented as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module.
  • the terminal devices 101, 102, and 103 may be installed with image capture devices (eg, cameras) to capture video data.
  • the smallest visual unit that composes a video is a frame.
  • Each frame is a static image.
  • a dynamic video is formed by synthesizing a sequence of temporally consecutive frames together.
  • the terminal devices 101, 102, 103 may also be installed with components for converting electrical signals into sounds (such as speakers) to play sounds, and may also be installed with devices for converting analog audio signals into digital audio signals (for example, microphone) to capture sound.
  • the server 105 may be a server that provides various services, such as a background server that provides support for multimedia applications installed on the terminal devices 101 , 102 , and 103 .
  • the background server can parse and store received data such as audio and video data upload requests, and can also receive audio and video data acquisition requests sent by the terminal devices 101, 102, and 103 and feed the requested audio and video data back to the terminal devices 101, 102, and 103.
  • the server 105 may, in response to a user's query request (eg, song query request), feed back information (eg, song information) corresponding to the query request to the terminal devices 101 , 102 , and 103 .
  • the server may be hardware or software.
  • when the server is hardware, it can be implemented as a distributed server cluster composed of multiple servers or as a single server.
  • when the server is software, it can be implemented as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module.
  • the audio processing methods provided by the embodiments of the present disclosure are generally executed by the terminal devices 101 , 102 , and 103 , and correspondingly, the audio processing apparatuses are generally set in the terminal devices 101 , 102 , and 103 .
  • the numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative; there can be any number of terminal devices, networks, and servers according to implementation needs.
  • FIG. 2 is a flowchart of an audio processing method of an exemplary embodiment of the present disclosure.
  • the method shown in FIG. 2 can be performed by any electronic device with audio processing function.
  • the electronic device may be a PC computer, a tablet device, a personal digital assistant, a smart phone, or other devices capable of executing the above set of instructions.
  • in step S201, the audio segment of the first user collected during singing and the playback time of the first user's background audio corresponding to the audio segment are received.
  • the background audio may be background music or accompaniment when the user sings a song.
  • the background audio of the first user is the background audio played when the first user sings.
  • the audio clip of the first user collected during singing and the background audio playing time of the background audio of the first user corresponding to the audio clip may be received at predetermined time intervals.
  • the predetermined time interval may be a user-defined time interval, such as 20ms, but is not limited thereto.
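  • As a purely illustrative aid (not part of the original disclosure), the following Python sketch shows the kind of message such a periodic exchange could carry: one captured audio frame plus the background audio playback time it corresponds to. The names (ChorusFrame, FRAME_MS) and the framing helper are assumptions made for illustration only.

```python
from dataclasses import dataclass

FRAME_MS = 20  # hypothetical predetermined interval, matching the 20 ms example above


@dataclass
class ChorusFrame:
    """One message sent from the singing user's client to the other user's client."""
    pcm: bytes         # raw audio samples captured during singing (the "audio segment")
    bg_time_ms: float  # playback time T1 of the sender's background audio for this segment


def split_into_frames(pcm: bytes, sample_rate: int, sample_width: int) -> list[bytes]:
    """Chop a captured PCM buffer into consecutive FRAME_MS-long chunks."""
    frame_bytes = int(sample_rate * FRAME_MS / 1000) * sample_width
    return [pcm[i:i + frame_bytes] for i in range(0, len(pcm), frame_bytes)]
```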
  • the above-mentioned background audio playback time corresponding to the audio segment (hereinafter denoted as T1) is obtained by subtracting the time delay caused by audio capture from the current playback time of the first user's background audio.
  • FIG. 3 is a schematic diagram illustrating acquiring a background audio playback moment corresponding to an audio segment according to an exemplary embodiment of the present disclosure. As shown in FIG. 3 , in the case where the user sings along with the background audio after the background audio is played, the current playing time (which may be represented as T0 ) of the background audio played locally by the user may be acquired.
  • there is a time delay due to audio capture (i.e., the time difference between when a sound, such as the user's singing, is emitted and when it is captured by a capture device such as a microphone; this delay may be denoted as Tr), so the playback time of the background audio corresponding to the user's audio segment is not T0 but an earlier time, namely T0 - Tr.
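  • A minimal sketch of the timestamping just described, assuming the sender can read the current local playback position T0 and has an estimate of the capture delay Tr; the function name is hypothetical.

```python
def background_time_for_segment(t0_ms: float, capture_delay_ms: float) -> float:
    """Return T1, the background audio playback time that a just-captured
    audio segment actually corresponds to (T1 = T0 - Tr)."""
    return t0_ms - capture_delay_ms


# Example: the background audio is at T0 = 12,000 ms and capture adds Tr = 80 ms,
# so the segment just captured corresponds to T1 = 11,920 ms.
assert background_time_for_segment(12_000.0, 80.0) == 11_920.0
```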
  • the playback position of the background audio of the second user may be adjusted according to the playback time of the background audio, so that the adjusted background audio of the second user is aligned with the received audio clip of the first user.
  • the background audio of the second user is the same as the background audio of the first user.
  • the background audio of the second user refers to the background audio played locally by the second user. Alignment of the adjusted background audio of the second user with the received audio segment of the first user means that there is no deviation between the background audio played locally by the second user and the received audio segment of the first user; in short, the singing voice of the first user sounds in time with the accompaniment played locally by the second user.
  • the adjustment of the background audio playback position is performed because there is a transmission delay when the audio clip of the first user is transmitted to the user equipment of the second user.
  • the playback position of the second user's background audio may first be determined when the audio segment of the first user is received; then, when that playback position falls within the time interval from the end of the second user's singing to the start of the first user's singing, or within the time interval from the start of playback of the second user's background audio to the start of the first user's singing, the playback position is adjusted to correspond to the received background audio playback time.
  • FIG. 4 is a schematic diagram of adjusting the playback position of background audio according to an exemplary embodiment of the present disclosure.
  • due to the transmission delay, after A sings a line, B only hears it after a period of time. At that moment, B perceives that A's singing and the accompaniment played locally by B are out of sync (A's singing lags behind B's own accompaniment). For example, when B receives the singing that A sang at time T1, the background audio played locally by B has already been played up to time T2, where T2 equals T1 plus the transmission delay Td. In this case, the playback position of the background audio played locally by B can be adjusted according to T1.
  • B can roll the local accompaniment back from time T2 to time T1 and resume playback from there, after which B's accompaniment is aligned with A's singing.
  • the rollback operation will make the user feel that the music has gone backwards, affecting the sense of hearing.
  • Adjusting the background audio playback position under the above circumstance, i.e., while the playback position is within one of the above time intervals during which the second user is not singing, can reduce the impact of the adjustment on the user's sense of hearing.
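  • The receiver-side adjustment described above can be sketched as follows. This is a simplified illustration that assumes the local player exposes a current position and a seek operation and that the interval in which the local user is not singing is known (for example, from the lyric file); all identifiers are hypothetical.

```python
class FakePlayer:
    """Tiny stand-in for a real background-audio player, for illustration only."""
    def __init__(self, pos_ms: float) -> None:
        self._pos = pos_ms

    def position_ms(self) -> float:
        return self._pos

    def seek_ms(self, pos_ms: float) -> None:
        self._pos = pos_ms


def maybe_realign(player, received_bg_time_ms: float,
                  adjust_window_ms: tuple[float, float]) -> bool:
    """Roll the local background audio back to the received time T1, but only if
    the current position T2 lies inside the interval where adjustment is allowed."""
    t2 = player.position_ms()          # local playback has advanced to T2 = T1 + Td
    start, end = adjust_window_ms
    if start <= t2 <= end:             # adjust only while the local user is not singing
        player.seek_ms(received_bg_time_ms)
        return True
    return False


# Example: the remote segment carries T1 = 30,000 ms, local playback is at
# T2 = 30,150 ms (150 ms of transmission delay), and the local user is silent
# between 29 s and 35 s, so the rollback is performed.
player = FakePlayer(30_150.0)
assert maybe_realign(player, 30_000.0, (29_000.0, 35_000.0))
assert player.position_ms() == 30_000.0
```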
  • the adjustment of the background audio playback position of the second user may also be performed in other time intervals.
  • in step S202, the sub-interval with the smallest average audio energy may first be determined within the time interval from the end of the second user's singing to the start of the first user's singing, or within the time interval from the start of playback of the second user's background audio to the start of the first user's singing; then, within that sub-interval, the playback position of the second user's background audio is adjusted according to the background audio playback time. Since the playback position is adjusted in the sub-interval where the average audio energy is smallest, the influence of the adjustment on the user's sense of hearing can be minimized.
  • the playback position of the second user's background audio may first be determined when the audio segment of the first user is received; then, in response to that playback position falling within the above-mentioned sub-interval, the playback position is adjusted to correspond to the received background audio playback time.
  • the average audio energy of each sub-interval of the time interval can be calculated according to the following formula, and the sub-interval with the smallest average energy is determined from the calculated values: $E(ab) = \frac{1}{TS_b - TS_a}\sum_{i=TS_a+1}^{TS_b} S(i)^2$, where E(ab) is the average energy of sub-interval ab, TS_b is the number of sampling points up to time b, TS_a is the number of sampling points up to time a, TS_b - TS_a is the number of sampling points within sub-interval ab, and S(i) is the amplitude of the i-th sampling point.
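  • The following sketch evaluates the average-energy formula above over fixed-size candidate windows and picks the quietest one. It assumes the average energy is the mean squared amplitude, as the variable definitions suggest, and the fixed window length is an illustrative choice rather than something specified by the disclosure.

```python
import numpy as np


def average_energy(samples: np.ndarray, ts_a: int, ts_b: int) -> float:
    """E(ab): mean squared amplitude of the samples in sub-interval (a, b]."""
    return float(np.mean(samples[ts_a:ts_b] ** 2))


def quietest_subinterval(samples: np.ndarray, interval: tuple[int, int],
                         window: int) -> tuple[int, int]:
    """Return the (start, end) sample indices of the window with the smallest
    average energy inside the given interval (indices into samples)."""
    start, end = interval
    candidates = [(a, min(a + window, end)) for a in range(start, end, window)]
    return min(candidates, key=lambda ab: average_energy(samples, *ab))


# Example with synthetic audio: a deliberately quiet stretch is selected.
rng = np.random.default_rng(0)
audio = rng.normal(0.0, 0.5, 48_000)   # 1 s of noise at 48 kHz
audio[24_000:28_800] *= 0.01           # make 0.5 s to 0.6 s nearly silent
print(quietest_subinterval(audio, (0, 48_000), window=4_800))  # -> (24000, 28800)
```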
  • a communication connection can be established with the first user, and the background audio can be played, and the received audio segment of the first user is played.
  • the first user and the second user may connect microphones first, and then select a song to sing, and then both start to play the same background music at the same time.
  • the second user can adjust the playback position of the background audio played locally by the second user when receiving the audio clip of the first user.
  • similarly, the audio segment of the second user collected during singing, together with the playback time of the second user's background audio corresponding to that segment, can be sent to the first user, so that the first user can adjust the playback position of the first user's background audio according to the received playback time and the adjusted background audio of the first user is aligned with the received audio segment of the second user.
  • the audio processing method shown in FIG. 2 may further include: sending, to the first user, the audio segment of the second user collected during singing and the playback time of the second user's background audio corresponding to that segment.
  • the audio segment of the second user collected during singing and the background audio playback time of the second user's background audio corresponding to the audio segment of the second user may be sent to the first user at predetermined time intervals.
  • the time interval for transmitting the audio segment of the second user may be the same as or different from the time interval for receiving the audio segment of the first user.
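  • For symmetry with the receiving side, here is a rough sketch of the sending loop: at each predetermined interval the client captures one frame of the local user's singing, tags it with the corresponding background audio time (current position minus capture delay), and transmits both to the other user. The capture and transport callables are placeholders, not APIs from the disclosure.

```python
import time
from typing import Callable


def send_loop(capture_frame: Callable[[], bytes],
              bg_position_ms: Callable[[], float],
              send: Callable[[bytes, float], None],
              capture_delay_ms: float,
              interval_ms: float = 20.0,
              n_frames: int = 3) -> None:
    """Send n_frames audio segments, each tagged with its background audio time T1."""
    for _ in range(n_frames):
        frame = capture_frame()                   # audio segment of the local user
        t1 = bg_position_ms() - capture_delay_ms  # T1 = T0 - Tr
        send(frame, t1)                           # transmit segment + timestamp
        time.sleep(interval_ms / 1000.0)


# Example with stub callables standing in for real capture/transport code.
clock = {"t0_ms": 12_000.0}

def fake_capture() -> bytes:
    return b"\x00" * 1920  # 20 ms of 16-bit mono silence at 48 kHz

def fake_position() -> float:
    clock["t0_ms"] += 20.0
    return clock["t0_ms"]

def fake_send(frame: bytes, t1: float) -> None:
    print(f"sent {len(frame)} bytes, T1 = {t1} ms")

send_loop(fake_capture, fake_position, fake_send, capture_delay_ms=80.0)
```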
  • the audio processing method according to the exemplary embodiment of the present disclosure has been described above with reference to FIGS. 2 to 4 . According to the above audio processing method, deviations between the audio segment sent by the other party and the local background audio due to transmission delay can be avoided. In addition, the embodiments of the present disclosure can also reduce the impact on the sense of hearing when adjusting the playback position of the background audio.
  • FIG. 5 is a schematic diagram of an application scenario of an audio processing method according to an exemplary embodiment of the present disclosure.
  • FIG. 5 shows that in the online KTV scene, when the first user and the second user perform a K-song chorus, the two users jointly sing a song "The Girl by the Bridge".
  • the devices of the first user (A) and the second user (B) can display the lyrics corresponding to the background music; each line in the lyrics file is marked as to whether A or B sings it, and A and B take turns singing their own lines.
  • at the switch from B to A (i.e., B has finished singing and A starts singing), or when A sings the first line of the song, B needs to perform a rollback operation (i.e., the operation of adjusting the background audio playback position described above with reference to FIG. 2 to FIG. 4) according to the background audio playback time T1 received from A, so that the background music is played from position T1.
  • the rollback operation may be performed according to the playing time T1 of the background audio received from the other party.
  • FIG. 6 is a block diagram of an audio processing apparatus of an exemplary embodiment of the present disclosure.
  • the audio processing apparatus 600 may include a receiving unit 601 and an adjusting unit 602 .
  • the receiving unit 601 may be configured to receive an audio clip of the first user collected during singing and a background audio playback moment of the first user's background audio corresponding to the audio clip.
  • the adjustment unit 602 may be configured to adjust the playback position of the background audio of the second user according to the background audio playback time, so that the adjusted background audio of the second user is aligned with the received audio segment of the first user.
  • the background audio of the second user is the same as the background audio of the first user.
  • the audio processing apparatus 600 may further include a sending unit (not shown), and the sending unit may send, to the first user, the audio segment of the second user collected during singing and the playback time of the second user's background audio corresponding to that segment.
  • the audio processing apparatus 600 may further include a communication unit (not shown) and an audio playback unit (not shown).
  • the communication unit may establish a communication connection with the first user before receiving the audio segment and the background audio playing time.
  • the audio playing unit may play the background audio and play the received audio clip of the first user.
  • the audio playing unit may also play the collected audio of the second user.
  • the audio processing method shown in FIG. 2 can be performed by the audio processing apparatus 600 shown in FIG. 6, and the receiving unit 601 and the adjusting unit 602 can respectively perform operations corresponding to steps S201 and S202 in FIG. 2.
  • although the audio processing apparatus 600 is described above as being divided into units that each perform corresponding processing, it is clear to those skilled in the art that the processing performed by the above units can also be performed by the audio processing apparatus 600 without any specific unit division or without clear demarcation between the units.
  • the audio processing apparatus 600 may further include other units, for example, a storage unit.
  • FIG. 7 is a block diagram of an electronic device 700 according to an embodiment of the present disclosure.
  • an electronic device 700 may include at least one memory 701 and at least one processor 702; the at least one memory stores a set of computer-executable instructions that, when executed by the at least one processor, cause the audio processing method according to the embodiments of the present disclosure to be performed.
  • the electronic device may be a PC computer, a tablet device, a personal digital assistant, a smart phone, or other device capable of executing the above set of instructions.
  • the electronic device does not have to be a single electronic device, but can also be any set of devices or circuits that can individually or jointly execute the above-mentioned instructions (or instruction sets).
  • the electronic device may also be part of an integrated control system or system manager, or may be configured as a portable electronic device that interfaces locally or remotely (e.g., via wireless transmission).
  • a processor may include a central processing unit (CPU), a graphics processing unit (GPU), a programmable logic device, a special purpose processor system, a microcontroller, or a microprocessor.
  • processors may also include analog processors, digital processors, microprocessors, multi-core processors, processor arrays, network processors, and the like.
  • the processor may execute instructions or code stored in memory, which may also store data. Instructions and data may also be sent and received over a network via a network interface device, which may employ any known transport protocol.
  • the memory may be integrated with the processor, eg, RAM or flash memory arranged within an integrated circuit microprocessor or the like. Additionally, the memory may comprise a separate device such as an external disk drive, a storage array, or any other storage device that may be used by a database system.
  • the memory and the processor may be operatively coupled, or may communicate with each other, eg, through I/O ports, network connections, etc., to enable the processor to read files stored in the memory.
  • the electronic device may also include a video display (such as a liquid crystal display) and a user interaction interface (such as a keyboard, mouse, touch input device, etc.). All components of the electronic device can be connected to each other via a bus and/or a network.
  • a computer-readable storage medium storing instructions, wherein the instructions, when executed by at least one processor, cause the at least one processor to perform the audio processing method according to an exemplary embodiment of the present disclosure .
  • Examples of the computer-readable storage medium herein include: read-only memory (ROM), programmable read-only memory (PROM), electrically erasable programmable read-only memory (EEPROM), random-access memory (RAM), dynamic random-access memory (DRAM), static random-access memory (SRAM), flash memory, non-volatile memory, CD-ROM, CD-R, CD+R, CD-RW, CD+RW, DVD-ROM, DVD-R, DVD+R, DVD-RW, DVD+RW, DVD-RAM, BD-ROM, BD-R, BD-R LTH, BD-RE, Blu-ray or optical disc storage, hard disk drive (HDD), solid-state drive (SSD), and card memory (such as a multimedia card or a Secure Digital (SD) card), among others.
  • the computer program in the above-mentioned computer readable storage medium can be executed in an environment deployed in a computer device such as a client, a host, a proxy device, a server, etc.
  • the computer program and any associated data, data files and data structures are distributed over networked computer systems so that the computer programs and any associated data, data files and data structures are stored, accessed and executed in a distributed fashion by one or more processors or computers.
  • a computer program product may also be provided, including computer instructions, which when executed by a processor implement the audio processing method according to an exemplary embodiment of the present disclosure.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Reverberation, Karaoke And Other Acoustics (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

An audio processing method, comprising: receiving an audio clip, which is collected during singing, of a first user and a background audio playing time, which corresponds to the audio clip, of background audio of the first user (S201); and adjusting a playing position of background audio of a second user according to the background audio playing time, so that the background audio of the second user after the adjustment is aligned with the received audio clip of the first user (S202), wherein the background audio of the second user is the same as the background audio of the first user. Further disclosed are an audio processing apparatus, an electronic device and a storage medium.

Description

Audio Processing Method and Audio Processing Apparatus
Cross-Reference to Related Applications
This application is based on and claims priority to the Chinese patent application with application number 202110110917.7, filed on January 26, 2021, the entire content of which is incorporated herein by reference.
Technical Field
The present disclosure relates to the field of signal processing, and in particular, to an audio processing method, an apparatus, an electronic device, and a storage medium.
Background Art
Online KTV chorus is becoming more and more popular. It means that two people (for example, A and B) choose the same song and sing it together; both A and B can then hear each other's singing as well as their own accompaniment, just like an offline KTV chorus.
Summary of the Invention
The present disclosure provides an audio processing method, an apparatus, an electronic device, and a storage medium.
According to a first aspect of the embodiments of the present disclosure, an audio processing method is provided, comprising: receiving an audio segment of a first user collected during singing and a playback time of the first user's background audio corresponding to the audio segment; and adjusting a playback position of a second user's background audio according to the background audio playback time, so that the adjusted background audio of the second user is aligned with the received audio segment of the first user, wherein the background audio of the second user is the same as the background audio of the first user.
In some embodiments, the background audio playback time is obtained by subtracting the time delay caused by audio capture from the current playback time of the first user's background audio.
In some embodiments, adjusting the playback position of the second user's background audio according to the background audio playback time includes: determining the playback position of the second user's background audio when the audio segment of the first user is received; and, when that playback position falls within the time interval from the end of the second user's singing to the start of the first user's singing, or within the time interval from the start of playback of the second user's background audio to the start of the first user's singing, adjusting the playback position to correspond to the received background audio playback time.
In some embodiments, adjusting the playback of the second user's background audio according to the background audio playback time includes: determining, within the time interval from the end of the second user's singing to the start of the first user's singing, or within the time interval from the start of playback of the second user's background audio to the start of the first user's singing, the sub-interval with the smallest average audio energy; and adjusting the playback position of the second user's background audio in that sub-interval according to the background audio playback time.
In some embodiments, adjusting the playback position of the second user's background audio in the sub-interval according to the background audio playback time includes: determining the playback position of the second user's background audio when the audio segment of the first user is received; and, in response to that playback position falling within the sub-interval, adjusting it to correspond to the received background audio playback time.
In some embodiments, determining the sub-interval with the smallest average audio energy within the time interval from the end of the second user's singing to the start of the first user's singing, or within the time interval from the start of playback of the second user's background audio to the start of the first user's singing, includes: calculating the average audio energy of each sub-interval of the time interval according to the following formula, and selecting the sub-interval with the smallest calculated average energy:
$E(ab) = \frac{1}{TS_b - TS_a}\sum_{i=TS_a+1}^{TS_b} S(i)^2$
where E(ab) is the average energy of sub-interval ab, TS_b is the number of sampling points up to time b, TS_a is the number of sampling points up to time a, TS_b - TS_a is the number of sampling points within sub-interval ab, and S(i) is the amplitude of the i-th sampling point.
In some embodiments, the audio processing method further includes: sending, to the first user, the audio segment of the second user collected during singing and the playback time of the second user's background audio corresponding to that audio segment.
In some embodiments, the audio processing method further includes: establishing a communication connection with the first user; and playing the background audio and the received audio segment of the first user.
In some embodiments, receiving the audio segment of the first user collected during singing and the corresponding background audio playback time includes: receiving, at a predetermined time interval, the audio segment of the first user collected during singing and the playback time of the first user's background audio corresponding to that segment.
According to a second aspect of the embodiments of the present disclosure, an audio processing apparatus is provided, comprising: a receiving unit configured to receive an audio segment of a first user collected during singing and a playback time of the first user's background audio corresponding to the audio segment; and an adjusting unit configured to adjust a playback position of a second user's background audio according to the background audio playback time, so that the adjusted background audio of the second user is aligned with the received audio segment of the first user, wherein the background audio of the second user is the same as the background audio of the first user.
In some embodiments, the background audio playback time is obtained by subtracting the time delay caused by audio capture from the current playback time of the first user's background audio.
In some embodiments, adjusting the playback position of the second user's background audio according to the background audio playback time includes: determining the playback position of the second user's background audio when the audio segment of the first user is received; and, when that playback position falls within the time interval from the end of the second user's singing to the start of the first user's singing, or within the time interval from the start of playback of the second user's background audio to the start of the first user's singing, adjusting the playback position to correspond to the received background audio playback time.
In some embodiments, adjusting the playback of the second user's background audio according to the background audio playback time includes: determining, within the time interval from the end of the second user's singing to the start of the first user's singing, or within the time interval from the start of playback of the second user's background audio to the start of the first user's singing, the sub-interval with the smallest average audio energy; and adjusting the playback position of the second user's background audio in that sub-interval according to the background audio playback time.
In some embodiments, adjusting the playback position of the second user's background audio in the sub-interval according to the background audio playback time includes: determining the playback position of the second user's background audio when the audio segment of the first user is received; and, in response to that playback position falling within the sub-interval, adjusting it to correspond to the received background audio playback time.
In some embodiments, determining the sub-interval with the smallest average audio energy within the time interval from the end of the second user's singing to the start of the first user's singing, or within the time interval from the start of playback of the second user's background audio to the start of the first user's singing, includes: calculating the average audio energy of each sub-interval of the time interval according to the following formula, and selecting the sub-interval with the smallest calculated average energy:
$E(ab) = \frac{1}{TS_b - TS_a}\sum_{i=TS_a+1}^{TS_b} S(i)^2$
where E(ab) is the average energy of sub-interval ab, TS_b is the number of sampling points up to time b, TS_a is the number of sampling points up to time a, TS_b - TS_a is the number of sampling points within sub-interval ab, and S(i) is the amplitude of the i-th sampling point.
In some embodiments, the audio processing apparatus further includes: a sending unit configured to send, to the first user, the audio segment of the second user collected during singing and the playback time of the second user's background audio corresponding to that audio segment.
In some embodiments, the audio processing apparatus further includes: a communication unit configured to establish a communication connection with the first user; and an audio playback unit configured to play the background audio and to play the received audio segment of the first user.
In some embodiments, the receiving unit receives, at a predetermined time interval, the audio segment of the first user collected during singing and the playback time of the first user's background audio corresponding to the audio segment.
According to a third aspect of the embodiments of the present disclosure, an electronic device is provided, comprising: at least one processor; and at least one memory storing computer-executable instructions, wherein the computer-executable instructions, when executed by the at least one processor, cause the at least one processor to perform the audio processing method described above.
According to a fourth aspect of the embodiments of the present disclosure, a computer-readable storage medium storing instructions is provided; when the instructions are executed by at least one processor, they cause the at least one processor to perform the audio processing method described above.
According to a fifth aspect of the embodiments of the present disclosure, a computer program product is provided, comprising computer instructions that, when executed by a processor, implement the audio processing method described above.
In the embodiments of the present disclosure, the playback position of the second user's background audio is adjusted according to the playback time of the first user's background audio corresponding to the first user's audio segment, so that the adjusted background audio of the second user is aligned with the received audio segment of the first user. This prevents the first user's audio segment from being misaligned with the second user's locally played background audio due to transmission delay, which would otherwise degrade the chorus experience. In addition, the embodiments of the present disclosure can reduce the impact on the sense of hearing when adjusting the playback position of the second user's background audio.
It should be understood that the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the present disclosure.
Brief Description of the Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the present disclosure; they do not unduly limit the present disclosure.
FIG. 1 is an exemplary system architecture to which exemplary embodiments of the present disclosure may be applied;
FIG. 2 is a flowchart of an audio processing method according to an exemplary embodiment of the present disclosure;
FIG. 3 is a schematic diagram illustrating acquisition of the background audio playback time corresponding to an audio segment according to an exemplary embodiment of the present disclosure;
FIG. 4 is a schematic diagram of adjusting the playback position of background audio according to an exemplary embodiment of the present disclosure;
FIG. 5 is a schematic diagram of an application scenario of the audio processing method according to an exemplary embodiment of the present disclosure;
FIG. 6 is a block diagram of an audio processing apparatus according to an exemplary embodiment of the present disclosure;
FIG. 7 is a block diagram of an electronic device according to an exemplary embodiment of the present disclosure.
Detailed Description
In order to enable those of ordinary skill in the art to better understand the technical solutions of the present disclosure, the technical solutions in the embodiments of the present disclosure are described clearly and completely below with reference to the accompanying drawings.
It should be noted that the terms "first", "second", and the like in the description, claims, and drawings of the present disclosure are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that data so used may be interchanged where appropriate, so that the embodiments of the present disclosure described herein can be practiced in orders other than those illustrated or described herein. The implementations described in the following examples do not represent all implementations consistent with the present disclosure; rather, they are merely examples of apparatuses and methods consistent with some aspects of the present disclosure as recited in the appended claims.
It should be noted here that "at least one of several items" in the present disclosure covers three parallel cases: "any one of the several items", "a combination of any number of the several items", and "all of the several items". For example, "including at least one of A and B" covers three parallel cases: (1) including A; (2) including B; (3) including A and B. Similarly, "performing at least one of step 1 and step 2" covers three parallel cases: (1) performing step 1; (2) performing step 2; (3) performing step 1 and step 2.
FIG. 1 illustrates an exemplary system architecture 100 to which exemplary embodiments of the present disclosure may be applied.
As shown in FIG. 1, the system architecture 100 may include terminal devices 101, 102, and 103, a network 104, and a server 105. The network 104 is a medium for providing communication links between the terminal devices 101, 102, and 103 and the server 105, and may include various connection types, such as wired links, wireless communication links, or fiber-optic cables. A user may use the terminal devices 101, 102, and 103 to interact with the server 105 through the network 104 to receive or send messages (for example, audio/video data upload requests and audio/video data acquisition requests). Various communication client applications may be installed on the terminal devices 101, 102, and 103, such as singing applications, audio/video recording software, audio/video players, instant messaging tools, email clients, and social platform software. The terminal devices 101, 102, and 103 may be hardware or software. When the terminal devices 101, 102, and 103 are hardware, they may be various electronic devices that have a display screen and are capable of playing and recording audio and video, including but not limited to smartphones, tablet computers, laptop computers, and desktop computers. When the terminal devices 101, 102, and 103 are software, they may be installed in the electronic devices listed above and may be implemented as multiple pieces of software or multiple software modules (for example, to provide distributed services) or as a single piece of software or a single software module.
The terminal devices 101, 102, and 103 may be equipped with image capture devices (for example, cameras) to capture video data. In practice, the smallest visual unit of a video is a frame; each frame is a static image, and a dynamic video is formed by combining a sequence of temporally consecutive frames. In addition, the terminal devices 101, 102, and 103 may be equipped with components for converting electrical signals into sound (for example, loudspeakers) to play sound, and with devices for converting analog audio signals into digital audio signals (for example, microphones) to capture sound.
The server 105 may be a server that provides various services, for example, a backend server that supports the multimedia applications installed on the terminal devices 101, 102, and 103. The backend server may parse, store, and otherwise process received data such as audio/video data upload requests; it may also receive audio/video data acquisition requests sent by the terminal devices 101, 102, and 103 and feed the audio/video data indicated by those requests back to the terminal devices 101, 102, and 103. In addition, the server 105 may, in response to a query request from a user (for example, a song query request), return information corresponding to the query request (for example, song information) to the terminal devices 101, 102, and 103.
It should be noted that the server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster composed of multiple servers or as a single server. When the server is software, it may be implemented as multiple pieces of software or multiple software modules (for example, to provide distributed services) or as a single piece of software or a single software module.
It should be noted that the audio processing methods provided by the embodiments of the present disclosure are generally executed by the terminal devices 101, 102, and 103, and accordingly the audio processing apparatuses are generally provided in the terminal devices 101, 102, and 103.
It should be understood that the numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative. There may be any number of terminal devices, networks, and servers according to implementation needs.
FIG. 2 is a flowchart of an audio processing method according to an exemplary embodiment of the present disclosure. The method shown in FIG. 2 may be performed by any electronic device with an audio processing function. The electronic device may be a PC, a tablet device, a personal digital assistant, a smartphone, or another device capable of executing the above set of instructions.
In step S201, an audio segment of a first user collected during singing and a background audio playback time of the first user's background audio corresponding to the audio segment are received. Here, the background audio may be the background music or accompaniment played while the user sings a song, and the first user's background audio is the background audio played while the first user sings. In some embodiments, the audio segment of the first user collected during singing and the corresponding background audio playback time may be received at a predetermined time interval. The predetermined time interval may be a user-defined interval, for example 20 ms, but is not limited thereto.
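For illustration only, the following Python sketch models the pair of values that arrives in each interval: one captured vocal chunk plus the background audio playback time it corresponds to. It is not part of the disclosure; the class name, field names, and the 48 kHz sample rate implied by the example are assumptions.

```python
from dataclasses import dataclass
from typing import List

CHUNK_INTERVAL_MS = 20  # the predetermined interval mentioned above (assumed default)

@dataclass
class AudioChunk:
    """One message received per interval from the singing (first) user."""
    samples: List[float]      # captured vocal samples covering one interval
    playback_time_ms: float   # background audio playback time T1 matching the samples

# Example: a 20 ms chunk (960 samples at an assumed 48 kHz) whose vocals
# correspond to the 35 170 ms position of the accompaniment.
chunk = AudioChunk(samples=[0.0] * 960, playback_time_ms=35_170.0)
```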
In some embodiments, the background audio playback time corresponding to the audio segment (denoted below as T1) is obtained by subtracting, from the current playback time of the first user's background audio, the time delay caused by audio capture. FIG. 3 is a schematic diagram illustrating how the background audio playback time corresponding to an audio segment is obtained according to an exemplary embodiment of the present disclosure. As shown in FIG. 3, when the user sings along with the background audio while it is playing, the current playback time of the background audio played locally by the user (denoted as T0) can be obtained. However, because of the time delay caused by audio capture (that is, the time difference between a sound, such as the user's singing, being produced and being picked up by a capture device such as a microphone, denoted as Tr), the playback time of the background audio corresponding to the captured audio segment is not T0 but an earlier time, namely T0 - Tr.
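A minimal sketch of this timestamp computation, assuming the capture delay Tr has been measured or estimated elsewhere; the function name and the example numbers are illustrative, not taken from the disclosure:

```python
def background_time_for_segment(current_playback_ms: float,
                                capture_delay_ms: float) -> float:
    """Return T1 = T0 - Tr: the background audio time that actually
    corresponds to the vocal samples just captured."""
    return current_playback_ms - capture_delay_ms

# Example: the accompaniment has played for 35 250 ms (T0) and the capture
# path adds roughly 80 ms of delay (Tr), so the captured vocals match 35 170 ms.
t1 = background_time_for_segment(35_250.0, 80.0)   # 35170.0
```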
In step S202, the playback position of a second user's background audio may be adjusted according to the background audio playback time, so that the second user's adjusted background audio is aligned with the received audio segment of the first user. Here, the second user's background audio is the same as the first user's background audio and refers to the background audio played locally by the second user. The second user's adjusted background audio being aligned with the received audio segment of the first user means that there is no offset between the background audio played locally by the second user and the received audio segment of the first user; in short, the first user's singing sounds in time with the accompaniment played locally by the second user. The playback position of the background audio is adjusted because there is a transmission delay when the first user's audio segment is transmitted to the second user's device.
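The following sketch illustrates the basic adjustment under the assumption of a simple seekable player: when the received time T1 differs from the local background audio position by more than a tolerance, the local accompaniment is moved to T1. The player class, the tolerance value, and the method names are assumptions made for illustration:

```python
class BackgroundPlayer:
    """Minimal stand-in for a local background audio player (illustrative only)."""

    def __init__(self, position_ms: float = 0.0) -> None:
        self._position_ms = position_ms

    def position_ms(self) -> float:
        return self._position_ms

    def seek_ms(self, target_ms: float) -> None:
        self._position_ms = target_ms


ALIGN_TOLERANCE_MS = 30.0  # assumed tolerance below which no seek is performed

def align_background(player: BackgroundPlayer, received_t1_ms: float) -> None:
    """Move the local accompaniment to T1 when it has drifted away from the
    timestamp attached to the remote user's vocal segment."""
    if abs(player.position_ms() - received_t1_ms) > ALIGN_TOLERANCE_MS:
        player.seek_ms(received_t1_ms)   # e.g. roll back from T2 = T1 + Td to T1
```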
In some embodiments, in step S202, the second user's background audio playback position at the moment the first user's audio segment is received may be determined first; then, when that playback position falls within the time interval from the end of the second user's singing to the start of the first user's singing, or within the time interval from the start of playback of the second user's background audio to the start of the first user's singing, the background audio playback position is adjusted to correspond to the received background audio playback time.
FIG. 4 is a schematic diagram of adjusting the playback position of background audio according to an exemplary embodiment of the present disclosure. As described in the background of the present disclosure, because of the transmission delay, after A finishes singing a line, B actually hears it only after some time has passed; B then feels that A's singing does not match the accompaniment B is playing locally (A's singing lags behind B's own accompaniment). For example, when B receives the singing that A produced at time T1, the background audio played locally by B has already reached time T2, where T2 equals T1 plus the transmission delay Td. In this case, the playback position of the background audio played locally by B can be adjusted according to T1: as shown in FIG. 4, B rolls its own accompaniment back from time T2 to time T1 and resumes playback from there, so that the accompaniment is aligned with A's singing. However, the rollback makes the user hear the music jump backwards, which affects the listening experience. In the above embodiment, the background audio playback position is adjusted only when it falls within the time interval from the end of the second user's singing to the start of the first user's singing, or within the time interval from the start of playback of the second user's background audio to the start of the first user's singing, which reduces the impact of the adjustment on the user's listening experience. In the present disclosure, however, the adjustment of the second user's background audio playback position may also be performed in other time intervals.
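A sketch of this gated rollback, assuming the lyric file provides the boundaries of the gap between the two users' parts; every name and number here is an illustrative assumption rather than part of the disclosure:

```python
def should_roll_back(local_pos_ms: float,
                     gap_start_ms: float,
                     gap_end_ms: float) -> bool:
    """Allow the rollback from T2 to T1 only while the local accompaniment
    sits in the gap between the local user's last line and the remote
    user's upcoming line, so the jump is far less audible."""
    return gap_start_ms <= local_pos_ms <= gap_end_ms

# Example: B's accompaniment is at T2 = 35 420 ms and the gap between B's part
# and A's part spans 34 900 ms to 36 000 ms, so rolling back to T1 is allowed.
ok = should_roll_back(35_420.0, 34_900.0, 36_000.0)   # True
```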
In other embodiments, in step S202, the sub-interval with the smallest average audio energy may first be determined within the time interval from the end of the second user's singing to the start of the first user's singing, or within the time interval from the start of playback of the second user's background audio to the start of the first user's singing; the playback position of the second user's background audio is then adjusted within that sub-interval according to the background audio playback time. Because the playback position of the second user's background audio is adjusted in the sub-interval with the smallest average audio energy, the impact of the adjustment on the user's listening experience is minimized. In some embodiments, the second user's background audio playback position at the moment the first user's audio segment is received may be determined first; then, in response to that playback position falling within the above sub-interval, the background audio playback position is adjusted to correspond to the received background audio playback time.
In some embodiments, the average audio energy of each sub-interval of the time interval may be computed according to the following formula, and the sub-interval with the smallest average audio energy is determined from the computed average audio energies of the respective sub-intervals:
E(ab) = \frac{1}{TS_b - TS_a} \sum_{i = TS_a + 1}^{TS_b} S(i)^2
where E(ab) is the average energy of the sub-interval ab, TS_b is the number of sampling points up to time b, TS_a is the number of sampling points up to time a, TS_b - TS_a is the number of sampling points within the interval ab, and S(i) is the amplitude of the i-th sampling point.
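For illustration only, the following sketch computes E(ab) as defined above (assuming the average energy is the mean of the squared sample amplitudes) and scans the gap in fixed-length windows to find the quietest sub-interval; the window length and all function names are assumptions. The playback-position adjustment described above would then be applied only while the local playback position falls inside the returned sub-interval.

```python
from typing import List, Optional, Tuple

def average_energy(samples: List[float], ts_a: int, ts_b: int) -> float:
    """E(ab): mean of the squared amplitudes of the TSb - TSa samples between a and b."""
    return sum(s * s for s in samples[ts_a:ts_b]) / (ts_b - ts_a)

def quietest_subinterval(samples: List[float],
                         start: int, end: int,
                         window_len: int) -> Tuple[int, int]:
    """Scan the gap [start, end) in fixed windows of window_len samples and
    return the (a, b) sample indices of the window with the lowest E(ab)."""
    best: Optional[Tuple[int, int]] = None
    best_energy = float("inf")
    for a in range(start, end - window_len + 1, window_len):
        b = a + window_len
        energy = average_energy(samples, a, b)
        if energy < best_energy:
            best_energy, best = energy, (a, b)
    return best if best is not None else (start, end)  # gap shorter than one window
```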
In addition, in some embodiments, before the audio segment and the background audio playback time are received, a communication connection may be established with the first user, the background audio may be played, and the received audio segment of the first user may be played. For example, in a real-time karaoke chorus, the first user and the second user may first connect their microphones (co-stream), and after a song is selected, both start playing the same background music at the same time.
The method described above explains how the second user can adjust the playback position of the locally played background audio upon receiving the first user's audio segment. In a chorus, however, so that the second user's audio segment heard by the first user is also aligned with the background audio played locally by the first user, the second user's audio segment and the background audio playback time corresponding to it may likewise be sent to the first user, so that the first user can adjust the playback position of the first user's background audio according to the received background audio playback time and thereby align the first user's adjusted background audio with the received audio segment of the second user. Therefore, in some embodiments, the audio processing method of FIG. 2 may further include: sending, to the first user, the second user's audio segment collected during singing and the background audio playback time of the second user's background audio corresponding to that audio segment. In some embodiments, the second user's audio segment and the corresponding background audio playback time may be sent to the first user at a predetermined time interval. The interval at which the second user's audio segments are sent may be the same as or different from the interval at which the first user's audio segments are received.
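A sketch of the symmetric sending path, with purely illustrative names and a JSON payload chosen only for readability (the disclosure does not specify a message format):

```python
import json
from typing import List

def build_outgoing_message(samples: List[float],
                           local_playback_ms: float,
                           capture_delay_ms: float) -> str:
    """Package the locally captured vocals together with the background audio
    time they correspond to (T1 = T0 - Tr), so the peer can align its own
    accompaniment in the same way."""
    payload = {
        "samples": samples,
        "playback_time_ms": local_playback_ms - capture_delay_ms,
    }
    return json.dumps(payload)

# Example: 20 ms of silence captured while the local accompaniment is at 41 000 ms.
message = build_outgoing_message([0.0] * 960, 41_000.0, 80.0)
```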
The audio processing method according to the exemplary embodiments of the present disclosure has been described above with reference to FIG. 2 to FIG. 4. With this audio processing method, the offset between the audio segment sent by the other party and the locally played background audio caused by the transmission delay can be avoided. In addition, the embodiments of the present disclosure can reduce the impact on the listening experience when the playback position of the background audio is adjusted.
To facilitate understanding of the above audio processing method, an exemplary application scenario is briefly described below. FIG. 5 is a schematic diagram of an application scenario of the audio processing method according to an exemplary embodiment of the present disclosure. FIG. 5 shows an online KTV scenario in which a first user (A) and a second user (B) sing the song "桥边姑娘" together in a karaoke chorus. During the performance, the devices of A and B may display the lyrics corresponding to the background music; the lyric file marks whether each line is to be sung by A or by B, and A and B take turns singing the lines marked as their own. According to the above audio processing method, when B receives the singing sent by A, at a switch from B to A (B has finished singing and A starts singing), or when A sings the first line of the song, B performs the rollback operation (that is, the operation of adjusting the playback position of the background audio described above with reference to FIG. 2 to FIG. 4) according to the background audio playback time T1 received from A, so that the background music plays from position T1. In the example scenario of FIG. 5, for instance, the rollback can be performed, according to the background audio playback time T1 received from the other party, before the other party sings the line "暖阳下我迎芬芳".
FIG. 6 is a block diagram of an audio processing apparatus according to an exemplary embodiment of the present disclosure.
Referring to FIG. 6, the audio processing apparatus 600 may include a receiving unit 601 and an adjusting unit 602. In some embodiments, the receiving unit 601 may be configured to receive an audio segment of a first user collected during singing and a background audio playback time of the first user's background audio corresponding to the audio segment. The adjusting unit 602 may be configured to adjust the playback position of a second user's background audio according to the background audio playback time, so that the second user's adjusted background audio is aligned with the received audio segment of the first user. Here, the second user's background audio is the same as the first user's background audio. In some embodiments, the audio processing apparatus 600 may further include a sending unit (not shown) that sends, to the first user, the second user's audio segment collected during singing and the background audio playback time of the second user's background audio corresponding to that audio segment. In some embodiments, the audio processing apparatus 600 may further include a communication unit (not shown) and an audio playback unit (not shown). The communication unit may establish a communication connection with the first user before the audio segment and the background audio playback time are received. The audio playback unit may play the background audio and play the received audio segment of the first user; it may also play the collected audio of the second user.
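Purely as a reading aid, a skeleton of how the receiving unit and the adjusting unit described above might be composed in code; the class and method names are invented for illustration and are not part of the disclosure:

```python
class AudioProcessingApparatusSketch:
    """Illustrative composition of a receiving unit and an adjusting unit."""

    def __init__(self, background_position_ms: float = 0.0) -> None:
        self._background_position_ms = background_position_ms  # local accompaniment position

    def receive(self, remote_samples: list, remote_playback_time_ms: float) -> None:
        """Receiving unit: accept the remote vocal segment and its background
        audio playback time, then hand the time to the adjusting unit."""
        self._play_remote_vocal(remote_samples)
        self.adjust(remote_playback_time_ms)

    def adjust(self, remote_playback_time_ms: float) -> None:
        """Adjusting unit: align the local accompaniment with the received time."""
        if self._background_position_ms != remote_playback_time_ms:
            self._background_position_ms = remote_playback_time_ms

    def _play_remote_vocal(self, samples: list) -> None:
        pass  # placeholder: mix the remote vocals into local playback
```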
Since the audio processing method shown in FIG. 2 can be performed by the audio processing apparatus 600 shown in FIG. 6, and the receiving unit 601 and the adjusting unit 602 can respectively perform operations corresponding to step S201 and step S202 in FIG. 2, any relevant details of the operations performed by the units in FIG. 6 can be found in the corresponding description of FIG. 2 and are not repeated here.
In addition, it should be noted that although the audio processing apparatus 600 is described above as being divided into units that respectively perform the corresponding processing, it is clear to those skilled in the art that the processing performed by the above units may also be performed without any specific division into units, or without clear boundaries between the units. Moreover, the audio processing apparatus 600 may further include other units, for example, a storage unit.
FIG. 7 is a block diagram of an electronic device 700 according to an embodiment of the present disclosure. Referring to FIG. 7, the electronic device 700 may include at least one memory 701 and at least one processor 702. The at least one memory stores a set of computer-executable instructions which, when executed by the at least one processor, cause the audio processing method according to the embodiments of the present disclosure to be performed.
In some embodiments, the electronic device may be a PC, a tablet device, a personal digital assistant, a smartphone, or another device capable of executing the above set of instructions. The electronic device need not be a single electronic device; it may be any assembly of devices or circuits capable of executing the above instructions (or instruction set) individually or jointly. The electronic device may also be part of an integrated control system or system manager, or may be configured as a portable electronic device that interfaces locally or remotely (for example, via wireless transmission).
In the electronic device, the processor may include a central processing unit (CPU), a graphics processing unit (GPU), a programmable logic device, a dedicated processor system, a microcontroller, or a microprocessor. In some embodiments, the processor may further include an analog processor, a digital processor, a microprocessor, a multi-core processor, a processor array, a network processor, and the like.
The processor may execute instructions or code stored in the memory, and the memory may also store data. Instructions and data may also be sent and received over a network via a network interface device, which may use any known transport protocol.
The memory may be integrated with the processor, for example by arranging RAM or flash memory within an integrated circuit microprocessor. In addition, the memory may include a standalone device, such as an external disk drive, a storage array, or any other storage device usable by a database system. The memory and the processor may be operatively coupled, or may communicate with each other, for example through I/O ports or network connections, so that the processor can read files stored in the memory.
In addition, the electronic device may further include a video display (such as a liquid crystal display) and a user interaction interface (such as a keyboard, a mouse, or a touch input device). All components of the electronic device may be connected to each other via a bus and/or a network.
According to an embodiment of the present disclosure, a computer-readable storage medium storing instructions may also be provided, wherein the instructions, when executed by at least one processor, cause the at least one processor to perform the audio processing method according to the exemplary embodiments of the present disclosure. Examples of the computer-readable storage medium here include read-only memory (ROM), random-access programmable read-only memory (PROM), electrically erasable programmable read-only memory (EEPROM), random-access memory (RAM), dynamic random-access memory (DRAM), static random-access memory (SRAM), flash memory, non-volatile memory, CD-ROM, CD-R, CD+R, CD-RW, CD+RW, DVD-ROM, DVD-R, DVD+R, DVD-RW, DVD+RW, DVD-RAM, BD-ROM, BD-R, BD-R LTH, BD-RE, Blu-ray or optical disc storage, a hard disk drive (HDD), a solid-state drive (SSD), card memory (such as a multimedia card, a Secure Digital (SD) card, or an eXtreme Digital (XD) card), magnetic tape, a floppy disk, a magneto-optical data storage device, an optical data storage device, a hard disk, a solid-state disk, and any other device configured to store, in a non-transitory manner, a computer program and any associated data, data files, and data structures, and to provide the computer program and any associated data, data files, and data structures to a processor or computer so that the processor or computer can execute the computer program. The computer program in the above computer-readable storage medium can run in an environment deployed on computer devices such as clients, hosts, proxy devices, and servers. Furthermore, in one example, the computer program and any associated data, data files, and data structures are distributed over networked computer systems so that the computer program and any associated data, data files, and data structures are stored, accessed, and executed in a distributed manner by one or more processors or computers.
According to an embodiment of the present disclosure, a computer program product may also be provided, including computer instructions which, when executed by a processor, implement the audio processing method according to the exemplary embodiments of the present disclosure.
All the embodiments of the present disclosure may be implemented individually or in combination with other embodiments, and all such implementations fall within the scope of protection claimed by the present disclosure.

Claims (20)

  1. An audio processing method, comprising:
    receiving an audio segment of a first user collected during singing and a background audio playback time of background audio of the first user corresponding to the audio segment;
    adjusting a playback position of background audio of a second user according to the background audio playback time, so that the adjusted background audio of the second user is aligned with the received audio segment of the first user, wherein the background audio of the second user is the same as the background audio of the first user.
  2. The audio processing method according to claim 1, wherein the adjusting the playback of the background audio of the second user according to the background audio playback time comprises:
    determining a sub-interval of the background audio of the second user with a smallest average audio energy within a time interval from an end of the second user's singing to a start of the first user's singing, or within a time interval from a start of playback of the background audio of the second user to the start of the first user's singing;
    adjusting, within the sub-interval, the playback position of the background audio of the second user according to the background audio playback time.
  3. The audio processing method according to claim 1, wherein the background audio playback time is obtained by subtracting a time delay caused by audio capture from a current playback time of the background audio of the first user.
  4. The audio processing method according to claim 1, wherein the adjusting the playback position of the background audio of the second user according to the background audio playback time comprises:
    determining a background audio playback position of the second user at a time when the audio segment of the first user is received;
    when the background audio playback position is within a time interval from an end of the second user's singing to a start of the first user's singing, or within a time interval from a start of playback of the background audio of the second user to the start of the first user's singing, adjusting the background audio playback position to correspond to the received background audio playback time.
  5. The audio processing method according to claim 2, wherein the adjusting, within the sub-interval, the playback position of the background audio of the second user according to the background audio playback time comprises:
    determining a background audio playback position of the second user at a time when the audio segment of the first user is received;
    in response to the background audio playback position being within the sub-interval, adjusting the background audio playback position to correspond to the received background audio playback time.
  6. The audio processing method according to claim 2, wherein the determining the sub-interval of the background audio of the second user with the smallest average audio energy within the time interval from the end of the second user's singing to the start of the first user's singing, or within the time interval from the start of playback of the background audio of the second user to the start of the first user's singing, comprises:
    computing an average audio energy of each sub-interval of the time interval according to the following formula, and determining the sub-interval with the smallest average audio energy according to the computed average audio energies of the respective sub-intervals:
    E(ab) = \frac{1}{TS_b - TS_a} \sum_{i = TS_a + 1}^{TS_b} S(i)^2
    where E(ab) is the average energy of the sub-interval ab, TS_b is the number of sampling points up to time b, TS_a is the number of sampling points up to time a, TS_b - TS_a is the number of sampling points within the interval ab, and S(i) is the amplitude of the i-th sampling point.
  7. The audio processing method according to claim 2, further comprising: sending, to the first user, an audio segment of the second user collected during singing and a background audio playback time of the background audio of the second user corresponding to the audio segment of the second user.
  8. The audio processing method according to claim 1, further comprising:
    establishing a communication connection with the first user;
    playing the background audio, and playing the received audio segment of the first user.
  9. The audio processing method according to claim 1, wherein the receiving the audio segment of the first user collected during singing and the background audio playback time of the background audio of the first user corresponding to the audio segment comprises:
    receiving, at a predetermined time interval, the audio segment of the first user collected during singing and the background audio playback time of the background audio of the first user corresponding to the audio segment.
  10. An audio processing apparatus, comprising:
    a receiving unit configured to receive an audio segment of a first user collected during singing and a background audio playback time of background audio of the first user corresponding to the audio segment;
    an adjusting unit configured to adjust a playback position of background audio of a second user according to the background audio playback time, so that the adjusted background audio of the second user is aligned with the received audio segment of the first user, wherein the background audio of the second user is the same as the background audio of the first user.
  11. The audio processing apparatus according to claim 10, wherein the adjusting the playback of the background audio of the second user according to the background audio playback time comprises:
    determining a sub-interval of the background audio of the second user with a smallest average audio energy within a time interval from an end of the second user's singing to a start of the first user's singing, or within a time interval from a start of playback of the background audio of the second user to the start of the first user's singing;
    adjusting, within the sub-interval, the playback position of the background audio of the second user according to the background audio playback time.
  12. The audio processing apparatus according to claim 10, wherein the background audio playback time is obtained by subtracting a time delay caused by audio capture from a current playback time of the background audio of the first user.
  13. The audio processing apparatus according to claim 10, wherein the adjusting the playback position of the background audio of the second user according to the background audio playback time comprises:
    determining a background audio playback position of the second user at a time when the audio segment of the first user is received;
    when the background audio playback position is within a time interval from an end of the second user's singing to a start of the first user's singing, or within a time interval from a start of playback of the background audio of the second user to the start of the first user's singing, adjusting the background audio playback position to correspond to the received background audio playback time.
  14. The audio processing apparatus according to claim 11, wherein the adjusting, within the sub-interval, the playback position of the background audio of the second user according to the background audio playback time comprises:
    determining a background audio playback position of the second user at a time when the audio segment of the first user is received;
    in response to the background audio playback position being within the sub-interval, adjusting the background audio playback position to correspond to the received background audio playback time.
  15. The audio processing apparatus according to claim 11, wherein the determining the sub-interval of the background audio of the second user with the smallest average audio energy within the time interval from the end of the second user's singing to the start of the first user's singing, or within the time interval from the start of playback of the background audio of the second user to the start of the first user's singing, comprises:
    computing an average audio energy of each sub-interval of the time interval according to the following formula, and determining the sub-interval with the smallest average audio energy according to the computed average audio energies of the respective sub-intervals:
    E(ab) = \frac{1}{TS_b - TS_a} \sum_{i = TS_a + 1}^{TS_b} S(i)^2
    where E(ab) is the average energy of the sub-interval ab, TS_b is the number of sampling points up to time b, TS_a is the number of sampling points up to time a, TS_b - TS_a is the number of sampling points within the interval ab, and S(i) is the amplitude of the i-th sampling point.
  16. The audio processing apparatus according to claim 11, further comprising: a sending unit configured to send, to the first user, an audio segment of the second user collected during singing and a background audio playback time of the background audio of the second user corresponding to the audio segment of the second user.
  17. The audio processing apparatus according to claim 10, further comprising:
    a communication unit configured to establish a communication connection with the first user;
    an audio playback unit configured to play the background audio and play the received audio segment of the first user.
  18. The audio processing apparatus according to claim 10, wherein the receiving unit receives, at a predetermined time interval, the audio segment of the first user collected during singing and the background audio playback time of the background audio of the first user corresponding to the audio segment.
  19. An electronic device, comprising:
    at least one processor; and
    at least one memory storing computer-executable instructions,
    wherein the computer-executable instructions, when executed by the at least one processor, cause the at least one processor to perform the following steps:
    receiving an audio segment of a first user collected during singing and a background audio playback time of background audio of the first user corresponding to the audio segment;
    adjusting a playback position of background audio of a second user according to the background audio playback time, so that the adjusted background audio of the second user is aligned with the received audio segment of the first user, wherein the background audio of the second user is the same as the background audio of the first user.
  20. A computer-readable storage medium storing instructions, wherein the instructions, when executed by at least one processor, cause the at least one processor to perform the following steps:
    receiving an audio segment of a first user collected during singing and a background audio playback time of background audio of the first user corresponding to the audio segment;
    adjusting a playback position of background audio of a second user according to the background audio playback time, so that the adjusted background audio of the second user is aligned with the received audio segment of the first user, wherein the background audio of the second user is the same as the background audio of the first user.
PCT/CN2021/113086 2021-01-26 2021-08-17 Audio processing method and audio processing apparatus WO2022160669A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110110917.7A CN112927666B (en) 2021-01-26 2021-01-26 Audio processing method, device, electronic equipment and storage medium
CN202110110917.7 2021-01-26

Publications (1)

Publication Number Publication Date
WO2022160669A1 true WO2022160669A1 (en) 2022-08-04

Family

ID=76166954

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/113086 WO2022160669A1 (en) 2021-01-26 2021-08-17 Audio processing method and audio processing apparatus

Country Status (2)

Country Link
CN (1) CN112927666B (en)
WO (1) WO2022160669A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112927666B (en) * 2021-01-26 2023-11-28 北京达佳互联信息技术有限公司 Audio processing method, device, electronic equipment and storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002251194A (en) * 2001-10-05 2002-09-06 Yamaha Corp Karaoke device
CN111261133A (en) * 2020-01-15 2020-06-09 腾讯科技(深圳)有限公司 Singing processing method and device, electronic equipment and storage medium
CN111383669A (en) * 2020-03-19 2020-07-07 杭州网易云音乐科技有限公司 Multimedia file uploading method, device, equipment and computer readable storage medium
CN111524494A (en) * 2020-04-27 2020-08-11 腾讯音乐娱乐科技(深圳)有限公司 Remote real-time chorus method and device and storage medium
CN112017622A (en) * 2020-09-04 2020-12-01 广州趣丸网络科技有限公司 Audio data alignment method, device, equipment and storage medium
CN112118062A (en) * 2019-06-19 2020-12-22 华为技术有限公司 Multi-terminal multimedia data communication method and system
CN112148248A (en) * 2020-09-28 2020-12-29 腾讯音乐娱乐科技(深圳)有限公司 Online song room implementation method, electronic device and computer readable storage medium
CN112927666A (en) * 2021-01-26 2021-06-08 北京达佳互联信息技术有限公司 Audio processing method and device, electronic equipment and storage medium

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9406341B2 (en) * 2011-10-01 2016-08-02 Google Inc. Audio file processing to reduce latencies in play start times for cloud served audio files
CN107093419B (en) * 2016-02-17 2020-04-24 广州酷狗计算机科技有限公司 Dynamic vocal accompaniment method and device
KR101987473B1 (en) * 2017-12-12 2019-06-10 미디어스코프 주식회사 System for synchronization between accompaniment and singing voice of online singing room service and apparatus for executing the same
CN109033335B (en) * 2018-07-20 2021-03-26 广州酷狗计算机科技有限公司 Audio recording method, device, terminal and storage medium
CN110267081B (en) * 2019-04-02 2021-01-22 北京达佳互联信息技术有限公司 Live stream processing method, device and system, electronic equipment and storage medium
CN110491358B (en) * 2019-08-15 2023-06-27 广州酷狗计算机科技有限公司 Method, device, equipment, system and storage medium for audio recording
CN111028818B (en) * 2019-11-14 2022-11-22 北京达佳互联信息技术有限公司 Chorus method, apparatus, electronic device and storage medium


Also Published As

Publication number Publication date
CN112927666A (en) 2021-06-08
CN112927666B (en) 2023-11-28

Similar Documents

Publication Publication Date Title
US10043504B2 (en) Karaoke processing method, apparatus and system
WO2020253806A1 (en) Method and apparatus for generating display video, device and storage medium
JP2018519538A (en) Karaoke processing method and system
US11120782B1 (en) System, method, and non-transitory computer-readable storage medium for collaborating on a musical composition over a communication network
JP6785904B2 (en) Information push method and equipment
WO2017177621A1 (en) Data synchronization method in local area network, and apparatus and user terminal therefor
JP2020174339A (en) Method, device, server, computer-readable storage media, and computer program for aligning paragraph and image
WO2022142619A1 (en) Method and device for private audio or video call
US20220047954A1 (en) Game playing method and system based on a multimedia file
CN110312162A (en) Selected stage treatment method, device, electronic equipment and readable medium
US20170092253A1 (en) Karaoke system
WO2022110943A1 (en) Speech preview method and apparatus
US9405501B2 (en) System and method for automatic synchronization of audio layers
WO2022160669A1 (en) Audio processing method and audio processing apparatus
CN112365868B (en) Sound processing method, device, electronic equipment and storage medium
CN112687247B (en) Audio alignment method and device, electronic equipment and storage medium
US20160307551A1 (en) Multifunctional Media Players
US11862187B2 (en) Systems and methods for jointly estimating sound sources and frequencies from audio
WO2022227625A1 (en) Signal processing method and apparatus
JP6170692B2 (en) A communication karaoke system that can continue duet singing in the event of a communication failure
US11297368B1 (en) Methods, systems, and apparatuses and live audio capture
US20220303152A1 (en) Recordation of video conference based on bandwidth issue(s)
JP4981631B2 (en) Content transmission apparatus, content transmission method, and computer program
CN116450256A (en) Editing method, device, equipment and storage medium for audio special effects
US11522936B2 (en) Synchronization of live streams from web-based clients

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21922264

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 06.11.2023)

122 Ep: pct application non-entry in european phase

Ref document number: 21922264

Country of ref document: EP

Kind code of ref document: A1