WO2022160669A1 - Audio processing method and audio processing apparatus - Google Patents

Audio processing method and audio processing apparatus

Info

Publication number
WO2022160669A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
audio
background audio
background
time
Prior art date
Application number
PCT/CN2021/113086
Other languages
French (fr)
Chinese (zh)
Inventor
邢文浩
张晨
Original Assignee
北京达佳互联信息技术有限公司
Priority date
Filing date
Publication date
Application filed by 北京达佳互联信息技术有限公司
Publication of WO2022160669A1


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00 Details of electrophonic musical instruments
    • G10H1/0008 Associated control or indicating means
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00 Details of electrophonic musical instruments
    • G10H1/36 Accompaniment arrangements
    • G10H1/361 Recording/reproducing of accompaniment for use with an external source, e.g. karaoke systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2210/00 Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/005 Musical accompaniment, i.e. complete instrumental rhythm synthesis added to a performed melody, e.g. as output by drum machines
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2210/00 Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/031 Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2240/00 Data organisation or data communication aspects, specifically adapted for electrophonic musical tools or instruments
    • G10H2240/171 Transmission of musical instrument data, control or status information; Transmission, remote access or control of music data for electrophonic musical instruments

Definitions

  • the present disclosure relates to the field of signal processing, and in particular, to an audio processing method, an apparatus, an electronic device, and a storage medium.
  • Online KTV chorus means that two people (for example, A and B) choose the same song and sing it together; both A and B can then hear each other's singing as well as their own accompaniment, just like an offline KTV chorus.
  • the present disclosure provides an audio processing method, apparatus, electronic device and storage medium.
  • an audio processing method comprising: receiving an audio segment of a first user collected during singing, together with a playback time of the first user's background audio corresponding to the audio segment; and adjusting a playback position of a second user's background audio according to the background audio playback time, so that the adjusted background audio of the second user is aligned with the received audio segment of the first user, wherein the background audio of the second user is the same as the background audio of the first user.
  • the background audio playback moment is obtained by subtracting a time delay due to audio capture from the current playback moment of the background audio of the first user.
  • adjusting the playback position of the second user's background audio according to the background audio playback time includes: determining the playback position of the second user's background audio when the audio segment of the first user is received; and, when that playback position falls within the time interval from the end of the second user's singing to the start of the first user's singing, or within the time interval from the start of playback of the second user's background audio to the start of the first user's singing, adjusting the playback position to correspond to the received background audio playback time.
  • adjusting the playback of the second user's background audio according to the background audio playback time includes: determining, within the time interval from the end of the second user's singing to the start of the first user's singing, or within the time interval from the start of playback of the second user's background audio to the start of the first user's singing, the sub-interval with the smallest average audio energy; and adjusting the playback position of the second user's background audio in that sub-interval according to the background audio playback time.
  • adjusting the playback position of the second user's background audio in the sub-interval according to the background audio playback time includes: determining the playback position of the second user's background audio when the audio segment of the first user is received; and, in response to that playback position falling within the sub-interval, adjusting it to correspond to the received background audio playback time.
  • determining the sub-interval with the smallest average audio energy includes: calculating the average audio energy of each sub-interval of the time interval according to the following formula, and selecting the sub-interval with the smallest calculated average energy: $E(ab) = \frac{1}{TS_b - TS_a}\sum_{i=TS_a+1}^{TS_b} S(i)^2$, where E(ab) is the average energy of sub-interval ab, TS_b is the number of sampling points up to time b, TS_a is the number of sampling points up to time a, TS_b - TS_a is the number of sampling points within sub-interval ab, and S(i) is the amplitude of the i-th sampling point.
  • the audio processing method further includes: sending, to the first user, the audio segment of the second user collected during singing and the playback time of the second user's background audio corresponding to that audio segment.
  • the audio processing method further includes: establishing a communication connection with the first user; playing the background audio, and playing the received audio clip of the first user.
  • receiving the audio segment of the first user collected during singing and the corresponding background audio playback time includes: receiving, at a predetermined time interval, the audio segment of the first user collected during singing and the playback time of the first user's background audio corresponding to that segment.
  • an audio processing apparatus comprising: a receiving unit configured to receive an audio segment of a first user collected during singing and a playback time of the first user's background audio corresponding to the audio segment; and an adjusting unit configured to adjust a playback position of a second user's background audio according to the background audio playback time, so that the adjusted background audio of the second user is aligned with the received audio segment of the first user, wherein the background audio of the second user is the same as the background audio of the first user.
  • the background audio playback moment is obtained by subtracting a time delay due to audio capture from the current playback moment of the background audio of the first user.
  • adjusting the playback position of the second user's background audio according to the background audio playback time includes: determining the playback position of the second user's background audio when the audio segment of the first user is received; and, when that playback position falls within the time interval from the end of the second user's singing to the start of the first user's singing, or within the time interval from the start of playback of the second user's background audio to the start of the first user's singing, adjusting the playback position to correspond to the received background audio playback time.
  • adjusting the playback of the second user's background audio according to the background audio playback time includes: determining, within the time interval from the end of the second user's singing to the start of the first user's singing, or within the time interval from the start of playback of the second user's background audio to the start of the first user's singing, the sub-interval with the smallest average audio energy; and adjusting the playback position of the second user's background audio in that sub-interval according to the background audio playback time.
  • adjusting the playback position of the second user's background audio in the sub-interval according to the background audio playback time includes: determining the playback position of the second user's background audio when the audio segment of the first user is received; and, in response to that playback position falling within the sub-interval, adjusting it to correspond to the received background audio playback time.
  • determining the sub-interval with the smallest average audio energy includes: calculating the average audio energy of each sub-interval of the time interval according to the following formula, and selecting the sub-interval with the smallest calculated average energy: $E(ab) = \frac{1}{TS_b - TS_a}\sum_{i=TS_a+1}^{TS_b} S(i)^2$, where E(ab) is the average energy of sub-interval ab, TS_b is the number of sampling points up to time b, TS_a is the number of sampling points up to time a, TS_b - TS_a is the number of sampling points within sub-interval ab, and S(i) is the amplitude of the i-th sampling point.
  • the audio processing apparatus further includes: a sending unit configured to send, to the first user, the audio segment of the second user collected during singing and the playback time of the second user's background audio corresponding to that audio segment.
  • the audio processing apparatus further includes: a communication unit configured to establish a communication connection with the first user; and an audio playback unit configured to play the background audio and to play the received audio segment of the first user.
  • the receiving unit receives the audio segment of the first user collected during singing and the background audio playing time of the background audio of the first user corresponding to the audio segment at predetermined time intervals.
  • an electronic device comprising: at least one processor; and at least one memory storing computer-executable instructions, wherein the computer-executable instructions, when executed by the at least one processor, cause the at least one processor to perform the audio processing method described above.
  • a computer-readable storage medium storing instructions that, when executed by at least one processor, cause the at least one processor to perform the audio processing method described above.
  • a computer program product comprising computer instructions, which when executed by a processor implement the audio processing method as described above.
  • in the embodiments of the present disclosure, the playback position of the second user's background audio is adjusted according to the playback time of the first user's background audio corresponding to the first user's audio segment, so that the adjusted background audio of the second user is aligned with the received audio segment of the first user. This prevents the first user's audio segment from being misaligned with the second user's locally played background audio due to transmission delay, which would otherwise degrade the chorus experience.
  • the embodiments of the present disclosure can also reduce the impact on the sense of hearing when adjusting the playing position of the background audio of the second user.
  • FIG. 1 is an exemplary system architecture in which exemplary embodiments of the present disclosure may be applied;
  • FIG. 2 is a flowchart of an audio processing method of an exemplary embodiment of the present disclosure
  • FIG. 3 is a schematic diagram illustrating acquiring a background audio playback moment corresponding to an audio segment according to an exemplary embodiment of the present disclosure
  • FIG. 4 is a schematic diagram of adjusting the playback position of background audio according to an exemplary embodiment of the present disclosure
  • FIG. 5 is a schematic diagram of an application scenario of the audio processing method according to an exemplary embodiment of the present disclosure
  • FIG. 6 is a block diagram of an audio processing apparatus of an exemplary embodiment of the present disclosure.
  • FIG. 7 is a block diagram of an electronic device according to an exemplary embodiment of the present disclosure.
  • FIG. 1 illustrates an exemplary system architecture 100 in which exemplary embodiments of the present disclosure may be applied.
  • the system architecture 100 may include terminal devices 101 , 102 , and 103 , a network 104 and a server 105 .
  • the network 104 is a medium used to provide a communication link between the terminal devices 101 , 102 , 103 and the server 105 .
  • the network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
  • the user can use the terminal devices 101 , 102 and 103 to interact with the server 105 through the network 104 to receive or send messages (eg, audio and video data upload requests, audio and video data acquisition requests) and the like.
  • Various communication client applications may be installed on the terminal devices 101 , 102 and 103 , such as singing applications, audio and video recording software, audio and video players, instant communication tools, email clients, social platform software, and the like.
  • the terminal devices 101, 102, and 103 may be hardware or software.
  • when the terminal devices 101, 102, and 103 are hardware, they can be various electronic devices with a display screen that are capable of audio and video playback and recording, including but not limited to smartphones, tablet computers, laptop computers, and desktop computers.
  • when the terminal devices 101, 102, and 103 are software, they can be installed in the electronic devices listed above and can be implemented as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module.
  • the terminal devices 101, 102, and 103 may be installed with image capture devices (eg, cameras) to capture video data.
  • the smallest visual unit that composes a video is a frame.
  • Each frame is a static image.
  • a dynamic video is formed by synthesizing a sequence of temporally consecutive frames together.
  • the terminal devices 101, 102, 103 may also be installed with components for converting electrical signals into sounds (such as speakers) to play sounds, and may also be installed with devices for converting analog audio signals into digital audio signals (for example, microphone) to capture sound.
  • the server 105 may be a server that provides various services, such as a background server that provides support for multimedia applications installed on the terminal devices 101 , 102 , and 103 .
  • the background server can parse and store received data such as audio and video data upload requests, and can also receive audio and video data acquisition requests sent by the terminal devices 101, 102, and 103 and feed the requested audio and video data back to the terminal devices 101, 102, and 103.
  • the server 105 may, in response to a user's query request (eg, song query request), feed back information (eg, song information) corresponding to the query request to the terminal devices 101 , 102 , and 103 .
  • the server may be hardware or software.
  • when the server is hardware, it can be implemented as a distributed server cluster composed of multiple servers or as a single server.
  • when the server is software, it can be implemented as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module.
  • the audio processing methods provided by the embodiments of the present disclosure are generally executed by the terminal devices 101 , 102 , and 103 , and correspondingly, the audio processing apparatuses are generally set in the terminal devices 101 , 102 , and 103 .
  • the numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative; there can be any number of terminal devices, networks, and servers according to implementation needs.
  • FIG. 2 is a flowchart of an audio processing method of an exemplary embodiment of the present disclosure.
  • the method shown in FIG. 2 can be performed by any electronic device with audio processing function.
  • the electronic device may be a PC computer, a tablet device, a personal digital assistant, a smart phone, or other devices capable of executing the above set of instructions.
  • in step S201, the audio segment of the first user collected during singing and the playback time of the first user's background audio corresponding to the audio segment are received.
  • the background audio may be background music or accompaniment when the user sings a song.
  • the background audio of the first user is the background audio played when the first user sings.
  • the audio clip of the first user collected during singing and the background audio playing time of the background audio of the first user corresponding to the audio clip may be received at predetermined time intervals.
  • the predetermined time interval may be a user-defined time interval, such as 20ms, but is not limited thereto.
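  • As a purely illustrative aid (not part of the original disclosure), the following Python sketch shows the kind of message such a periodic exchange could carry: one captured audio frame plus the background audio playback time it corresponds to. The names (ChorusFrame, FRAME_MS) and the framing helper are assumptions made for illustration only.

```python
from dataclasses import dataclass

FRAME_MS = 20  # hypothetical predetermined interval, matching the 20 ms example above


@dataclass
class ChorusFrame:
    """One message sent from the singing user's client to the other user's client."""
    pcm: bytes         # raw audio samples captured during singing (the "audio segment")
    bg_time_ms: float  # playback time T1 of the sender's background audio for this segment


def split_into_frames(pcm: bytes, sample_rate: int, sample_width: int) -> list[bytes]:
    """Chop a captured PCM buffer into consecutive FRAME_MS-long chunks."""
    frame_bytes = int(sample_rate * FRAME_MS / 1000) * sample_width
    return [pcm[i:i + frame_bytes] for i in range(0, len(pcm), frame_bytes)]
```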
  • the above-mentioned background audio playback time corresponding to the audio segment (hereinafter denoted as T1) is obtained by subtracting the time delay caused by audio capture from the current playback time of the first user's background audio.
  • FIG. 3 is a schematic diagram illustrating acquiring a background audio playback moment corresponding to an audio segment according to an exemplary embodiment of the present disclosure. As shown in FIG. 3 , in the case where the user sings along with the background audio after the background audio is played, the current playing time (which may be represented as T0 ) of the background audio played locally by the user may be acquired.
  • there is a time delay due to audio capture (i.e., the time difference between when a sound, such as the user's singing, is emitted and when it is captured by a capture device such as a microphone; this delay may be denoted as Tr), so the playback time of the background audio corresponding to the user's audio segment is not T0 but an earlier time, namely T0 - Tr.
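  • A minimal sketch of the timestamping just described, assuming the sender can read the current local playback position T0 and has an estimate of the capture delay Tr; the function name is hypothetical.

```python
def background_time_for_segment(t0_ms: float, capture_delay_ms: float) -> float:
    """Return T1, the background audio playback time that a just-captured
    audio segment actually corresponds to (T1 = T0 - Tr)."""
    return t0_ms - capture_delay_ms


# Example: the background audio is at T0 = 12,000 ms and capture adds Tr = 80 ms,
# so the segment just captured corresponds to T1 = 11,920 ms.
assert background_time_for_segment(12_000.0, 80.0) == 11_920.0
```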
  • the playback position of the background audio of the second user may be adjusted according to the playback time of the background audio, so that the adjusted background audio of the second user is aligned with the received audio clip of the first user.
  • the background audio of the second user is the same as the background audio of the first user.
  • the background audio of the second user refers to the background audio played locally by the second user. Alignment of the adjusted background audio of the second user with the received audio segment of the first user means that there is no deviation between the background audio played locally by the second user and the received audio segment of the first user; in short, the singing voice of the first user sounds in time with the accompaniment played locally by the second user.
  • the adjustment of the background audio playback position is performed because there is a transmission delay when the audio clip of the first user is transmitted to the user equipment of the second user.
  • the playback position of the second user's background audio may first be determined when the audio segment of the first user is received; then, when that playback position falls within the time interval from the end of the second user's singing to the start of the first user's singing, or within the time interval from the start of playback of the second user's background audio to the start of the first user's singing, the playback position is adjusted to correspond to the received background audio playback time.
  • FIG. 4 is a schematic diagram of adjusting the playback position of background audio according to an exemplary embodiment of the present disclosure.
  • due to the transmission delay, after A sings a line, B only hears it after a period of time. At that moment, B perceives that A's singing and the accompaniment played locally by B are out of sync (A's singing lags behind B's own accompaniment). For example, when B receives the singing that A sang at time T1, the background audio played locally by B has already been played up to time T2, where T2 equals T1 plus the transmission delay Td. In this case, the playback position of the background audio played locally by B can be adjusted according to T1.
  • B can roll the local accompaniment back from time T2 to time T1 and resume playback from there, after which B's accompaniment is aligned with A's singing.
  • the rollback operation will make the user feel that the music has gone backwards, affecting the sense of hearing.
  • Adjusting the background audio playback position under the above circumstance, i.e., while the playback position is within one of the above time intervals during which the second user is not singing, can reduce the impact of the adjustment on the user's sense of hearing.
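  • The receiver-side adjustment described above can be sketched as follows. This is a simplified illustration that assumes the local player exposes a current position and a seek operation and that the interval in which the local user is not singing is known (for example, from the lyric file); all identifiers are hypothetical.

```python
class FakePlayer:
    """Tiny stand-in for a real background-audio player, for illustration only."""
    def __init__(self, pos_ms: float) -> None:
        self._pos = pos_ms

    def position_ms(self) -> float:
        return self._pos

    def seek_ms(self, pos_ms: float) -> None:
        self._pos = pos_ms


def maybe_realign(player, received_bg_time_ms: float,
                  adjust_window_ms: tuple[float, float]) -> bool:
    """Roll the local background audio back to the received time T1, but only if
    the current position T2 lies inside the interval where adjustment is allowed."""
    t2 = player.position_ms()          # local playback has advanced to T2 = T1 + Td
    start, end = adjust_window_ms
    if start <= t2 <= end:             # adjust only while the local user is not singing
        player.seek_ms(received_bg_time_ms)
        return True
    return False


# Example: the remote segment carries T1 = 30,000 ms, local playback is at
# T2 = 30,150 ms (150 ms of transmission delay), and the local user is silent
# between 29 s and 35 s, so the rollback is performed.
player = FakePlayer(30_150.0)
assert maybe_realign(player, 30_000.0, (29_000.0, 35_000.0))
assert player.position_ms() == 30_000.0
```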
  • the adjustment of the background audio playback position of the second user may also be performed in other time intervals.
  • in step S202, the sub-interval with the smallest average audio energy may first be determined within the time interval from the end of the second user's singing to the start of the first user's singing, or within the time interval from the start of playback of the second user's background audio to the start of the first user's singing; then, within that sub-interval, the playback position of the second user's background audio is adjusted according to the background audio playback time. Since the playback position is adjusted in the sub-interval where the average audio energy is smallest, the influence of the adjustment on the user's sense of hearing can be minimized.
  • the playback position of the second user's background audio may first be determined when the audio segment of the first user is received; then, in response to that playback position falling within the above-mentioned sub-interval, the playback position is adjusted to correspond to the received background audio playback time.
  • the average audio energy of each sub-interval of the time interval can be calculated according to the following formula, and the sub-interval with the smallest average energy is determined from the calculated values: $E(ab) = \frac{1}{TS_b - TS_a}\sum_{i=TS_a+1}^{TS_b} S(i)^2$, where E(ab) is the average energy of sub-interval ab, TS_b is the number of sampling points up to time b, TS_a is the number of sampling points up to time a, TS_b - TS_a is the number of sampling points within sub-interval ab, and S(i) is the amplitude of the i-th sampling point.
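  • The following sketch evaluates the average-energy formula above over fixed-size candidate windows and picks the quietest one. It assumes the average energy is the mean squared amplitude, as the variable definitions suggest, and the fixed window length is an illustrative choice rather than something specified by the disclosure.

```python
import numpy as np


def average_energy(samples: np.ndarray, ts_a: int, ts_b: int) -> float:
    """E(ab): mean squared amplitude of the samples in sub-interval (a, b]."""
    return float(np.mean(samples[ts_a:ts_b] ** 2))


def quietest_subinterval(samples: np.ndarray, interval: tuple[int, int],
                         window: int) -> tuple[int, int]:
    """Return the (start, end) sample indices of the window with the smallest
    average energy inside the given interval (indices into samples)."""
    start, end = interval
    candidates = [(a, min(a + window, end)) for a in range(start, end, window)]
    return min(candidates, key=lambda ab: average_energy(samples, *ab))


# Example with synthetic audio: a deliberately quiet stretch is selected.
rng = np.random.default_rng(0)
audio = rng.normal(0.0, 0.5, 48_000)   # 1 s of noise at 48 kHz
audio[24_000:28_800] *= 0.01           # make 0.5 s to 0.6 s nearly silent
print(quietest_subinterval(audio, (0, 48_000), window=4_800))  # -> (24000, 28800)
```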
  • a communication connection can be established with the first user, and the background audio can be played, and the received audio segment of the first user is played.
  • the first user and the second user may connect microphones first, and then select a song to sing, and then both start to play the same background music at the same time.
  • the second user can adjust the playback position of the background audio played locally by the second user when receiving the audio clip of the first user.
  • similarly, the audio segment of the second user collected during singing, together with the playback time of the second user's background audio corresponding to that segment, can be sent to the first user, so that the first user can adjust the playback position of the first user's background audio according to the received playback time and the adjusted background audio of the first user is aligned with the received audio segment of the second user.
  • the audio processing method shown in FIG. 2 may further include: sending, to the first user, the audio segment of the second user collected during singing and the playback time of the second user's background audio corresponding to that segment.
  • the audio segment of the second user collected during singing and the background audio playback time of the second user's background audio corresponding to the audio segment of the second user may be sent to the first user at predetermined time intervals.
  • the time interval for transmitting the audio segment of the second user may be the same as or different from the time interval for receiving the audio segment of the first user.
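  • For symmetry with the receiving side, here is a rough sketch of the sending loop: at each predetermined interval the client captures one frame of the local user's singing, tags it with the corresponding background audio time (current position minus capture delay), and transmits both to the other user. The capture and transport callables are placeholders, not APIs from the disclosure.

```python
import time
from typing import Callable


def send_loop(capture_frame: Callable[[], bytes],
              bg_position_ms: Callable[[], float],
              send: Callable[[bytes, float], None],
              capture_delay_ms: float,
              interval_ms: float = 20.0,
              n_frames: int = 3) -> None:
    """Send n_frames audio segments, each tagged with its background audio time T1."""
    for _ in range(n_frames):
        frame = capture_frame()                   # audio segment of the local user
        t1 = bg_position_ms() - capture_delay_ms  # T1 = T0 - Tr
        send(frame, t1)                           # transmit segment + timestamp
        time.sleep(interval_ms / 1000.0)


# Example with stub callables standing in for real capture/transport code.
clock = {"t0_ms": 12_000.0}

def fake_capture() -> bytes:
    return b"\x00" * 1920  # 20 ms of 16-bit mono silence at 48 kHz

def fake_position() -> float:
    clock["t0_ms"] += 20.0
    return clock["t0_ms"]

def fake_send(frame: bytes, t1: float) -> None:
    print(f"sent {len(frame)} bytes, T1 = {t1} ms")

send_loop(fake_capture, fake_position, fake_send, capture_delay_ms=80.0)
```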
  • the audio processing method according to the exemplary embodiment of the present disclosure has been described above with reference to FIGS. 2 to 4 . According to the above audio processing method, deviations between the audio segment sent by the other party and the local background audio due to transmission delay can be avoided. In addition, the embodiments of the present disclosure can also reduce the impact on the sense of hearing when adjusting the playback position of the background audio.
  • FIG. 5 is a schematic diagram of an application scenario of an audio processing method according to an exemplary embodiment of the present disclosure.
  • FIG. 5 shows that in the online KTV scene, when the first user and the second user perform a K-song chorus, the two users jointly sing a song "The Girl by the Bridge".
  • the devices of the first user (A) and the second user (B) can display the lyrics corresponding to the background music; each line in the lyrics file is marked as to whether A or B sings it, and A and B take turns singing their own lines.
  • at the switch from B to A (i.e., B has finished singing and A starts singing), or when A sings the first line of the song, B needs to perform a rollback operation (i.e., the operation of adjusting the background audio playback position described above with reference to FIG. 2 to FIG. 4) according to the background audio playback time T1 received from A, so that the background music is played from position T1.
  • the rollback operation may be performed according to the playing time T1 of the background audio received from the other party.
  • FIG. 6 is a block diagram of an audio processing apparatus of an exemplary embodiment of the present disclosure.
  • the audio processing apparatus 600 may include a receiving unit 601 and an adjusting unit 602 .
  • the receiving unit 601 may be configured to receive an audio clip of the first user collected during singing and a background audio playback moment of the first user's background audio corresponding to the audio clip.
  • the adjustment unit 602 may be configured to adjust the playback position of the background audio of the second user according to the background audio playback time, so that the adjusted background audio of the second user is aligned with the received audio segment of the first user.
  • the background audio of the second user is the same as the background audio of the first user.
  • the audio processing apparatus 600 may further include a sending unit (not shown), and the sending unit may send, to the first user, the audio segment of the second user collected during singing and the playback time of the second user's background audio corresponding to that segment.
  • the audio processing apparatus 600 may further include a communication unit (not shown) and an audio playback unit (not shown).
  • the communication unit may establish a communication connection with the first user before receiving the audio segment and the background audio playing time.
  • the audio playing unit may play the background audio and play the received audio clip of the first user.
  • the audio playing unit may also play the collected audio of the second user.
  • the audio processing method shown in FIG. 2 can be performed by the audio processing apparatus 600 shown in FIG. 6, and the receiving unit 601 and the adjusting unit 602 can respectively perform operations corresponding to steps S201 and S202 in FIG. 2.
  • although the audio processing apparatus 600 is described above as being divided into units that each perform corresponding processing, it is clear to those skilled in the art that the processing performed by the above units can also be performed by the audio processing apparatus 600 without any specific unit division or without clear demarcation between the units.
  • the audio processing apparatus 600 may further include other units, for example, a storage unit.
  • FIG. 7 is a block diagram of an electronic device 700 according to an embodiment of the present disclosure.
  • an electronic device 700 may include at least one memory 701 and at least one processor 702; the at least one memory stores a set of computer-executable instructions that, when executed by the at least one processor, cause the audio processing method according to the embodiments of the present disclosure to be performed.
  • the electronic device may be a PC computer, a tablet device, a personal digital assistant, a smart phone, or other device capable of executing the above set of instructions.
  • the electronic device does not have to be a single electronic device, but can also be any set of devices or circuits that can individually or jointly execute the above-mentioned instructions (or instruction sets).
  • the electronic device may also be part of an integrated control system or system manager, or may be configured as a portable electronic device that interfaces locally or remotely (e.g., via wireless transmission).
  • a processor may include a central processing unit (CPU), a graphics processing unit (GPU), a programmable logic device, a special purpose processor system, a microcontroller, or a microprocessor.
  • processors may also include analog processors, digital processors, microprocessors, multi-core processors, processor arrays, network processors, and the like.
  • the processor may execute instructions or code stored in memory, which may also store data. Instructions and data may also be sent and received over a network via a network interface device, which may employ any known transport protocol.
  • the memory may be integrated with the processor, eg, RAM or flash memory arranged within an integrated circuit microprocessor or the like. Additionally, the memory may comprise a separate device such as an external disk drive, a storage array, or any other storage device that may be used by a database system.
  • the memory and the processor may be operatively coupled, or may communicate with each other, eg, through I/O ports, network connections, etc., to enable the processor to read files stored in the memory.
  • the electronic device may also include a video display (such as a liquid crystal display) and a user interaction interface (such as a keyboard, mouse, touch input device, etc.). All components of the electronic device can be connected to each other via a bus and/or a network.
  • a computer-readable storage medium storing instructions, wherein the instructions, when executed by at least one processor, cause the at least one processor to perform the audio processing method according to an exemplary embodiment of the present disclosure .
  • Examples of the computer-readable storage medium herein include: read-only memory (ROM), programmable read-only memory (PROM), electrically erasable programmable read-only memory (EEPROM), random-access memory (RAM), dynamic random-access memory (DRAM), static random-access memory (SRAM), flash memory, non-volatile memory, CD-ROM, CD-R, CD+R, CD-RW, CD+RW, DVD-ROM, DVD-R, DVD+R, DVD-RW, DVD+RW, DVD-RAM, BD-ROM, BD-R, BD-R LTH, BD-RE, Blu-ray or optical disc storage, hard disk drive (HDD), solid-state drive (SSD), and card memory (such as a multimedia card or a Secure Digital (SD) card), among others.
  • the computer program in the above-mentioned computer readable storage medium can be executed in an environment deployed in a computer device such as a client, a host, a proxy device, a server, etc.
  • the computer program and any associated data, data files and data structures are distributed over networked computer systems so that the computer programs and any associated data, data files and data structures are stored, accessed and executed in a distributed fashion by one or more processors or computers.
  • a computer program product may also be provided, including computer instructions, which when executed by a processor implement the audio processing method according to an exemplary embodiment of the present disclosure.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Reverberation, Karaoke And Other Acoustics (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

An audio processing method, comprising: receiving an audio clip, which is collected during singing, of a first user and a background audio playing time, which corresponds to the audio clip, of background audio of the first user (S201); and adjusting a playing position of background audio of a second user according to the background audio playing time, so that the background audio of the second user after the adjustment is aligned with the received audio clip of the first user (S202), wherein the background audio of the second user is the same as the background audio of the first user. Further disclosed are an audio processing apparatus, an electronic device and a storage medium.

Description

Audio Processing Method and Audio Processing Apparatus
Cross-Reference to Related Applications
This application is based on and claims priority to the Chinese patent application with application number 202110110917.7, filed on January 26, 2021, the entire content of which is incorporated herein by reference.
Technical Field
The present disclosure relates to the field of signal processing, and in particular, to an audio processing method, an apparatus, an electronic device, and a storage medium.
Background Art
Online KTV chorus is becoming more and more popular. It means that two people (for example, A and B) choose the same song and sing it together; both A and B can then hear each other's singing as well as their own accompaniment, just like an offline KTV chorus.
Summary of the Invention
The present disclosure provides an audio processing method, an apparatus, an electronic device, and a storage medium.
According to a first aspect of the embodiments of the present disclosure, an audio processing method is provided, comprising: receiving an audio segment of a first user collected during singing and a playback time of the first user's background audio corresponding to the audio segment; and adjusting a playback position of a second user's background audio according to the background audio playback time, so that the adjusted background audio of the second user is aligned with the received audio segment of the first user, wherein the background audio of the second user is the same as the background audio of the first user.
In some embodiments, the background audio playback time is obtained by subtracting the time delay caused by audio capture from the current playback time of the first user's background audio.
In some embodiments, adjusting the playback position of the second user's background audio according to the background audio playback time includes: determining the playback position of the second user's background audio when the audio segment of the first user is received; and, when that playback position falls within the time interval from the end of the second user's singing to the start of the first user's singing, or within the time interval from the start of playback of the second user's background audio to the start of the first user's singing, adjusting the playback position to correspond to the received background audio playback time.
In some embodiments, adjusting the playback of the second user's background audio according to the background audio playback time includes: determining, within the time interval from the end of the second user's singing to the start of the first user's singing, or within the time interval from the start of playback of the second user's background audio to the start of the first user's singing, the sub-interval with the smallest average audio energy; and adjusting the playback position of the second user's background audio in that sub-interval according to the background audio playback time.
In some embodiments, adjusting the playback position of the second user's background audio in the sub-interval according to the background audio playback time includes: determining the playback position of the second user's background audio when the audio segment of the first user is received; and, in response to that playback position falling within the sub-interval, adjusting it to correspond to the received background audio playback time.
In some embodiments, determining the sub-interval with the smallest average audio energy within the time interval from the end of the second user's singing to the start of the first user's singing, or within the time interval from the start of playback of the second user's background audio to the start of the first user's singing, includes: calculating the average audio energy of each sub-interval of the time interval according to the following formula, and selecting the sub-interval with the smallest calculated average energy:
$E(ab) = \frac{1}{TS_b - TS_a}\sum_{i=TS_a+1}^{TS_b} S(i)^2$
where E(ab) is the average energy of sub-interval ab, TS_b is the number of sampling points up to time b, TS_a is the number of sampling points up to time a, TS_b - TS_a is the number of sampling points within sub-interval ab, and S(i) is the amplitude of the i-th sampling point.
In some embodiments, the audio processing method further includes: sending, to the first user, the audio segment of the second user collected during singing and the playback time of the second user's background audio corresponding to that audio segment.
In some embodiments, the audio processing method further includes: establishing a communication connection with the first user; and playing the background audio and the received audio segment of the first user.
In some embodiments, receiving the audio segment of the first user collected during singing and the corresponding background audio playback time includes: receiving, at a predetermined time interval, the audio segment of the first user collected during singing and the playback time of the first user's background audio corresponding to that segment.
According to a second aspect of the embodiments of the present disclosure, an audio processing apparatus is provided, comprising: a receiving unit configured to receive an audio segment of a first user collected during singing and a playback time of the first user's background audio corresponding to the audio segment; and an adjusting unit configured to adjust a playback position of a second user's background audio according to the background audio playback time, so that the adjusted background audio of the second user is aligned with the received audio segment of the first user, wherein the background audio of the second user is the same as the background audio of the first user.
In some embodiments, the background audio playback time is obtained by subtracting the time delay caused by audio capture from the current playback time of the first user's background audio.
In some embodiments, adjusting the playback position of the second user's background audio according to the background audio playback time includes: determining the playback position of the second user's background audio when the audio segment of the first user is received; and, when that playback position falls within the time interval from the end of the second user's singing to the start of the first user's singing, or within the time interval from the start of playback of the second user's background audio to the start of the first user's singing, adjusting the playback position to correspond to the received background audio playback time.
In some embodiments, adjusting the playback of the second user's background audio according to the background audio playback time includes: determining, within the time interval from the end of the second user's singing to the start of the first user's singing, or within the time interval from the start of playback of the second user's background audio to the start of the first user's singing, the sub-interval with the smallest average audio energy; and adjusting the playback position of the second user's background audio in that sub-interval according to the background audio playback time.
In some embodiments, adjusting the playback position of the second user's background audio in the sub-interval according to the background audio playback time includes: determining the playback position of the second user's background audio when the audio segment of the first user is received; and, in response to that playback position falling within the sub-interval, adjusting it to correspond to the received background audio playback time.
In some embodiments, determining the sub-interval with the smallest average audio energy within the time interval from the end of the second user's singing to the start of the first user's singing, or within the time interval from the start of playback of the second user's background audio to the start of the first user's singing, includes: calculating the average audio energy of each sub-interval of the time interval according to the following formula, and selecting the sub-interval with the smallest calculated average energy:
$E(ab) = \frac{1}{TS_b - TS_a}\sum_{i=TS_a+1}^{TS_b} S(i)^2$
where E(ab) is the average energy of sub-interval ab, TS_b is the number of sampling points up to time b, TS_a is the number of sampling points up to time a, TS_b - TS_a is the number of sampling points within sub-interval ab, and S(i) is the amplitude of the i-th sampling point.
In some embodiments, the audio processing apparatus further includes: a sending unit configured to send, to the first user, the audio segment of the second user collected during singing and the playback time of the second user's background audio corresponding to that audio segment.
In some embodiments, the audio processing apparatus further includes: a communication unit configured to establish a communication connection with the first user; and an audio playback unit configured to play the background audio and to play the received audio segment of the first user.
In some embodiments, the receiving unit receives, at a predetermined time interval, the audio segment of the first user collected during singing and the playback time of the first user's background audio corresponding to the audio segment.
According to a third aspect of the embodiments of the present disclosure, an electronic device is provided, comprising: at least one processor; and at least one memory storing computer-executable instructions, wherein the computer-executable instructions, when executed by the at least one processor, cause the at least one processor to perform the audio processing method described above.
According to a fourth aspect of the embodiments of the present disclosure, a computer-readable storage medium storing instructions is provided; when the instructions are executed by at least one processor, they cause the at least one processor to perform the audio processing method described above.
According to a fifth aspect of the embodiments of the present disclosure, a computer program product is provided, comprising computer instructions that, when executed by a processor, implement the audio processing method described above.
In the embodiments of the present disclosure, the playback position of the second user's background audio is adjusted according to the playback time of the first user's background audio corresponding to the first user's audio segment, so that the adjusted background audio of the second user is aligned with the received audio segment of the first user. This prevents the first user's audio segment from being misaligned with the second user's locally played background audio due to transmission delay, which would otherwise degrade the chorus experience. In addition, the embodiments of the present disclosure can reduce the impact on the sense of hearing when adjusting the playback position of the second user's background audio.
It should be understood that the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the present disclosure.
Brief Description of the Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the present disclosure; they do not unduly limit the present disclosure.
FIG. 1 is an exemplary system architecture to which exemplary embodiments of the present disclosure may be applied;
FIG. 2 is a flowchart of an audio processing method according to an exemplary embodiment of the present disclosure;
FIG. 3 is a schematic diagram illustrating acquisition of the background audio playback time corresponding to an audio segment according to an exemplary embodiment of the present disclosure;
FIG. 4 is a schematic diagram of adjusting the playback position of background audio according to an exemplary embodiment of the present disclosure;
FIG. 5 is a schematic diagram of an application scenario of the audio processing method according to an exemplary embodiment of the present disclosure;
FIG. 6 is a block diagram of an audio processing apparatus according to an exemplary embodiment of the present disclosure;
FIG. 7 is a block diagram of an electronic device according to an exemplary embodiment of the present disclosure.
Detailed Description
In order to enable those of ordinary skill in the art to better understand the technical solutions of the present disclosure, the technical solutions in the embodiments of the present disclosure are described clearly and completely below with reference to the accompanying drawings.
It should be noted that the terms "first", "second", and the like in the description, claims, and drawings of the present disclosure are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that data so used may be interchanged where appropriate, so that the embodiments of the present disclosure described herein can be practiced in orders other than those illustrated or described herein. The implementations described in the following examples do not represent all implementations consistent with the present disclosure; rather, they are merely examples of apparatuses and methods consistent with some aspects of the present disclosure as recited in the appended claims.
It should be noted here that "at least one of several items" in the present disclosure covers three parallel cases: "any one of the several items", "a combination of any number of the several items", and "all of the several items". For example, "including at least one of A and B" covers three parallel cases: (1) including A; (2) including B; (3) including A and B. Similarly, "performing at least one of step 1 and step 2" covers three parallel cases: (1) performing step 1; (2) performing step 2; (3) performing step 1 and step 2.
FIG. 1 illustrates an exemplary system architecture 100 to which exemplary embodiments of the present disclosure may be applied.
As shown in FIG. 1, the system architecture 100 may include terminal devices 101, 102, and 103, a network 104, and a server 105. The network 104 is a medium for providing communication links between the terminal devices 101, 102, and 103 and the server 105, and may include various connection types, such as wired links, wireless communication links, or fiber-optic cables. A user may use the terminal devices 101, 102, and 103 to interact with the server 105 through the network 104 to receive or send messages (for example, audio/video data upload requests and audio/video data acquisition requests). Various communication client applications may be installed on the terminal devices 101, 102, and 103, such as singing applications, audio/video recording software, audio/video players, instant messaging tools, email clients, and social platform software. The terminal devices 101, 102, and 103 may be hardware or software. When the terminal devices 101, 102, and 103 are hardware, they may be various electronic devices that have a display screen and are capable of playing and recording audio and video, including but not limited to smartphones, tablet computers, laptop computers, and desktop computers. When the terminal devices 101, 102, and 103 are software, they may be installed in the electronic devices listed above and may be implemented as multiple pieces of software or multiple software modules (for example, to provide distributed services) or as a single piece of software or a single software module.
The terminal devices 101, 102, and 103 may be equipped with image capture devices (for example, cameras) to capture video data. In practice, the smallest visual unit of a video is a frame; each frame is a static image, and a dynamic video is formed by combining a sequence of temporally consecutive frames. In addition, the terminal devices 101, 102, and 103 may be equipped with components for converting electrical signals into sound (for example, loudspeakers) to play sound, and with devices for converting analog audio signals into digital audio signals (for example, microphones) to capture sound.
The server 105 may be a server that provides various services, for example, a backend server that supports the multimedia applications installed on the terminal devices 101, 102, and 103. The backend server may parse, store, and otherwise process received data such as audio/video data upload requests; it may also receive audio/video data acquisition requests sent by the terminal devices 101, 102, and 103 and feed the audio/video data indicated by those requests back to the terminal devices 101, 102, and 103. In addition, the server 105 may, in response to a query request from a user (for example, a song query request), return information corresponding to the query request (for example, song information) to the terminal devices 101, 102, and 103.
It should be noted that the server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster composed of multiple servers or as a single server. When the server is software, it may be implemented as multiple pieces of software or multiple software modules (for example, to provide distributed services) or as a single piece of software or a single software module.
It should be noted that the audio processing methods provided by the embodiments of the present disclosure are generally executed by the terminal devices 101, 102, and 103, and accordingly the audio processing apparatuses are generally provided in the terminal devices 101, 102, and 103.
It should be understood that the numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative. There may be any number of terminal devices, networks, and servers according to implementation needs.
FIG. 2 is a flowchart of an audio processing method according to an exemplary embodiment of the present disclosure. The method shown in FIG. 2 may be performed by any electronic device with an audio processing function. The electronic device may be a PC, a tablet device, a personal digital assistant, a smartphone, or another device capable of executing the above set of instructions.
In step S201, an audio segment of a first user collected during singing and a background audio playback time of the first user's background audio corresponding to the audio segment are received. Here, the background audio may be the background music or accompaniment played while the user sings a song, and the first user's background audio is the background audio played while the first user sings. In some embodiments, the audio segment of the first user collected during singing and the corresponding background audio playback time may be received at a predetermined time interval. The predetermined time interval may be a user-defined interval, for example 20 ms, but is not limited thereto.
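For illustration only, the following Python sketch models the pair of values that arrives in each interval: one captured vocal chunk plus the background audio playback time it corresponds to. It is not part of the disclosure; the class name, field names, and the 48 kHz sample rate implied by the example are assumptions.

```python
from dataclasses import dataclass
from typing import List

CHUNK_INTERVAL_MS = 20  # the predetermined interval mentioned above (assumed default)

@dataclass
class AudioChunk:
    """One message received per interval from the singing (first) user."""
    samples: List[float]      # captured vocal samples covering one interval
    playback_time_ms: float   # background audio playback time T1 matching the samples

# Example: a 20 ms chunk (960 samples at an assumed 48 kHz) whose vocals
# correspond to the 35 170 ms position of the accompaniment.
chunk = AudioChunk(samples=[0.0] * 960, playback_time_ms=35_170.0)
```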
In some embodiments, the background audio playback time corresponding to the audio segment (denoted below as T1) is obtained by subtracting, from the current playback time of the first user's background audio, the time delay caused by audio capture. FIG. 3 is a schematic diagram illustrating how the background audio playback time corresponding to an audio segment is obtained according to an exemplary embodiment of the present disclosure. As shown in FIG. 3, when the user sings along with the background audio while it is playing, the current playback time of the background audio played locally by the user (denoted as T0) can be obtained. However, because of the time delay caused by audio capture (that is, the time difference between a sound, such as the user's singing, being produced and being picked up by a capture device such as a microphone, denoted as Tr), the playback time of the background audio corresponding to the captured audio segment is not T0 but an earlier time, namely T0 - Tr.
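A minimal sketch of this timestamp computation, assuming the capture delay Tr has been measured or estimated elsewhere; the function name and the example numbers are illustrative, not taken from the disclosure:

```python
def background_time_for_segment(current_playback_ms: float,
                                capture_delay_ms: float) -> float:
    """Return T1 = T0 - Tr: the background audio time that actually
    corresponds to the vocal samples just captured."""
    return current_playback_ms - capture_delay_ms

# Example: the accompaniment has played for 35 250 ms (T0) and the capture
# path adds roughly 80 ms of delay (Tr), so the captured vocals match 35 170 ms.
t1 = background_time_for_segment(35_250.0, 80.0)   # 35170.0
```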
In step S202, the playback position of a second user's background audio may be adjusted according to the background audio playback time, so that the second user's adjusted background audio is aligned with the received audio segment of the first user. Here, the second user's background audio is the same as the first user's background audio and refers to the background audio played locally by the second user. The second user's adjusted background audio being aligned with the received audio segment of the first user means that there is no offset between the background audio played locally by the second user and the received audio segment of the first user; in short, the first user's singing sounds in time with the accompaniment played locally by the second user. The playback position of the background audio is adjusted because there is a transmission delay when the first user's audio segment is transmitted to the second user's device.
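The following sketch illustrates the basic adjustment under the assumption of a simple seekable player: when the received time T1 differs from the local background audio position by more than a tolerance, the local accompaniment is moved to T1. The player class, the tolerance value, and the method names are assumptions made for illustration:

```python
class BackgroundPlayer:
    """Minimal stand-in for a local background audio player (illustrative only)."""

    def __init__(self, position_ms: float = 0.0) -> None:
        self._position_ms = position_ms

    def position_ms(self) -> float:
        return self._position_ms

    def seek_ms(self, target_ms: float) -> None:
        self._position_ms = target_ms


ALIGN_TOLERANCE_MS = 30.0  # assumed tolerance below which no seek is performed

def align_background(player: BackgroundPlayer, received_t1_ms: float) -> None:
    """Move the local accompaniment to T1 when it has drifted away from the
    timestamp attached to the remote user's vocal segment."""
    if abs(player.position_ms() - received_t1_ms) > ALIGN_TOLERANCE_MS:
        player.seek_ms(received_t1_ms)   # e.g. roll back from T2 = T1 + Td to T1
```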
In some embodiments, in step S202, the second user's background audio playback position at the moment the first user's audio segment is received may be determined first; then, when that playback position falls within the time interval from the end of the second user's singing to the start of the first user's singing, or within the time interval from the start of playback of the second user's background audio to the start of the first user's singing, the background audio playback position is adjusted to correspond to the received background audio playback time.
FIG. 4 is a schematic diagram of adjusting the playback position of background audio according to an exemplary embodiment of the present disclosure. As described in the background of the present disclosure, because of the transmission delay, after A finishes singing a line, B actually hears it only after some time has passed; B then feels that A's singing does not match the accompaniment B is playing locally (A's singing lags behind B's own accompaniment). For example, when B receives the singing that A produced at time T1, the background audio played locally by B has already reached time T2, where T2 equals T1 plus the transmission delay Td. In this case, the playback position of the background audio played locally by B can be adjusted according to T1: as shown in FIG. 4, B rolls its own accompaniment back from time T2 to time T1 and resumes playback from there, so that the accompaniment is aligned with A's singing. However, the rollback makes the user hear the music jump backwards, which affects the listening experience. In the above embodiment, the background audio playback position is adjusted only when it falls within the time interval from the end of the second user's singing to the start of the first user's singing, or within the time interval from the start of playback of the second user's background audio to the start of the first user's singing, which reduces the impact of the adjustment on the user's listening experience. In the present disclosure, however, the adjustment of the second user's background audio playback position may also be performed in other time intervals.
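A sketch of this gated rollback, assuming the lyric file provides the boundaries of the gap between the two users' parts; every name and number here is an illustrative assumption rather than part of the disclosure:

```python
def should_roll_back(local_pos_ms: float,
                     gap_start_ms: float,
                     gap_end_ms: float) -> bool:
    """Allow the rollback from T2 to T1 only while the local accompaniment
    sits in the gap between the local user's last line and the remote
    user's upcoming line, so the jump is far less audible."""
    return gap_start_ms <= local_pos_ms <= gap_end_ms

# Example: B's accompaniment is at T2 = 35 420 ms and the gap between B's part
# and A's part spans 34 900 ms to 36 000 ms, so rolling back to T1 is allowed.
ok = should_roll_back(35_420.0, 34_900.0, 36_000.0)   # True
```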
In other embodiments, in step S202, the sub-interval with the smallest average audio energy may first be determined within the time interval from the end of the second user's singing to the start of the first user's singing, or within the time interval from the start of playback of the second user's background audio to the start of the first user's singing; the playback position of the second user's background audio is then adjusted within that sub-interval according to the background audio playback time. Because the playback position of the second user's background audio is adjusted in the sub-interval with the smallest average audio energy, the impact of the adjustment on the user's listening experience is minimized. In some embodiments, the second user's background audio playback position at the moment the first user's audio segment is received may be determined first; then, in response to that playback position falling within the above sub-interval, the background audio playback position is adjusted to correspond to the received background audio playback time.
In some embodiments, the average audio energy of each sub-interval of the time interval may be computed according to the following formula, and the sub-interval with the smallest average audio energy is determined from the computed average audio energies of the respective sub-intervals:
E(ab) = \frac{1}{TS_b - TS_a} \sum_{i = TS_a + 1}^{TS_b} S(i)^2
where E(ab) is the average energy of the sub-interval ab, TS_b is the number of sampling points up to time b, TS_a is the number of sampling points up to time a, TS_b - TS_a is the number of sampling points within the interval ab, and S(i) is the amplitude of the i-th sampling point.
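For illustration only, the following sketch computes E(ab) as defined above (assuming the average energy is the mean of the squared sample amplitudes) and scans the gap in fixed-length windows to find the quietest sub-interval; the window length and all function names are assumptions. The playback-position adjustment described above would then be applied only while the local playback position falls inside the returned sub-interval.

```python
from typing import List, Optional, Tuple

def average_energy(samples: List[float], ts_a: int, ts_b: int) -> float:
    """E(ab): mean of the squared amplitudes of the TSb - TSa samples between a and b."""
    return sum(s * s for s in samples[ts_a:ts_b]) / (ts_b - ts_a)

def quietest_subinterval(samples: List[float],
                         start: int, end: int,
                         window_len: int) -> Tuple[int, int]:
    """Scan the gap [start, end) in fixed windows of window_len samples and
    return the (a, b) sample indices of the window with the lowest E(ab)."""
    best: Optional[Tuple[int, int]] = None
    best_energy = float("inf")
    for a in range(start, end - window_len + 1, window_len):
        b = a + window_len
        energy = average_energy(samples, a, b)
        if energy < best_energy:
            best_energy, best = energy, (a, b)
    return best if best is not None else (start, end)  # gap shorter than one window
```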
In addition, in some embodiments, before the audio segment and the background audio playback time are received, a communication connection may be established with the first user, the background audio may be played, and the received audio segment of the first user may be played. For example, in a real-time karaoke chorus, the first user and the second user may first connect their microphones (co-stream), and after a song is selected, both start playing the same background music at the same time.
The method described above explains how the second user can adjust the playback position of the locally played background audio upon receiving the first user's audio segment. In a chorus, however, so that the second user's audio segment heard by the first user is also aligned with the background audio played locally by the first user, the second user's audio segment and the background audio playback time corresponding to it may likewise be sent to the first user, so that the first user can adjust the playback position of the first user's background audio according to the received background audio playback time and thereby align the first user's adjusted background audio with the received audio segment of the second user. Therefore, in some embodiments, the audio processing method of FIG. 2 may further include: sending, to the first user, the second user's audio segment collected during singing and the background audio playback time of the second user's background audio corresponding to that audio segment. In some embodiments, the second user's audio segment and the corresponding background audio playback time may be sent to the first user at a predetermined time interval. The interval at which the second user's audio segments are sent may be the same as or different from the interval at which the first user's audio segments are received.
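A sketch of the symmetric sending path, with purely illustrative names and a JSON payload chosen only for readability (the disclosure does not specify a message format):

```python
import json
from typing import List

def build_outgoing_message(samples: List[float],
                           local_playback_ms: float,
                           capture_delay_ms: float) -> str:
    """Package the locally captured vocals together with the background audio
    time they correspond to (T1 = T0 - Tr), so the peer can align its own
    accompaniment in the same way."""
    payload = {
        "samples": samples,
        "playback_time_ms": local_playback_ms - capture_delay_ms,
    }
    return json.dumps(payload)

# Example: 20 ms of silence captured while the local accompaniment is at 41 000 ms.
message = build_outgoing_message([0.0] * 960, 41_000.0, 80.0)
```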
The audio processing method according to the exemplary embodiments of the present disclosure has been described above with reference to FIG. 2 to FIG. 4. With this audio processing method, the offset between the audio segment sent by the other party and the locally played background audio caused by the transmission delay can be avoided. In addition, the embodiments of the present disclosure can reduce the impact on the listening experience when the playback position of the background audio is adjusted.
To facilitate understanding of the above audio processing method, an exemplary application scenario is briefly described below. FIG. 5 is a schematic diagram of an application scenario of the audio processing method according to an exemplary embodiment of the present disclosure. FIG. 5 shows an online KTV scenario in which a first user (A) and a second user (B) sing the song "桥边姑娘" together in a karaoke chorus. During the performance, the devices of A and B may display the lyrics corresponding to the background music; the lyric file marks whether each line is to be sung by A or by B, and A and B take turns singing the lines marked as their own. According to the above audio processing method, when B receives the singing sent by A, at a switch from B to A (B has finished singing and A starts singing), or when A sings the first line of the song, B performs the rollback operation (that is, the operation of adjusting the playback position of the background audio described above with reference to FIG. 2 to FIG. 4) according to the background audio playback time T1 received from A, so that the background music plays from position T1. In the example scenario of FIG. 5, for instance, the rollback can be performed, according to the background audio playback time T1 received from the other party, before the other party sings the line "暖阳下我迎芬芳".
FIG. 6 is a block diagram of an audio processing apparatus according to an exemplary embodiment of the present disclosure.
Referring to FIG. 6, the audio processing apparatus 600 may include a receiving unit 601 and an adjusting unit 602. In some embodiments, the receiving unit 601 may be configured to receive an audio segment of a first user collected during singing and a background audio playback time of the first user's background audio corresponding to the audio segment. The adjusting unit 602 may be configured to adjust the playback position of a second user's background audio according to the background audio playback time, so that the second user's adjusted background audio is aligned with the received audio segment of the first user. Here, the second user's background audio is the same as the first user's background audio. In some embodiments, the audio processing apparatus 600 may further include a sending unit (not shown) that sends, to the first user, the second user's audio segment collected during singing and the background audio playback time of the second user's background audio corresponding to that audio segment. In some embodiments, the audio processing apparatus 600 may further include a communication unit (not shown) and an audio playback unit (not shown). The communication unit may establish a communication connection with the first user before the audio segment and the background audio playback time are received. The audio playback unit may play the background audio and play the received audio segment of the first user; it may also play the collected audio of the second user.
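Purely as a reading aid, a skeleton of how the receiving unit and the adjusting unit described above might be composed in code; the class and method names are invented for illustration and are not part of the disclosure:

```python
class AudioProcessingApparatusSketch:
    """Illustrative composition of a receiving unit and an adjusting unit."""

    def __init__(self, background_position_ms: float = 0.0) -> None:
        self._background_position_ms = background_position_ms  # local accompaniment position

    def receive(self, remote_samples: list, remote_playback_time_ms: float) -> None:
        """Receiving unit: accept the remote vocal segment and its background
        audio playback time, then hand the time to the adjusting unit."""
        self._play_remote_vocal(remote_samples)
        self.adjust(remote_playback_time_ms)

    def adjust(self, remote_playback_time_ms: float) -> None:
        """Adjusting unit: align the local accompaniment with the received time."""
        if self._background_position_ms != remote_playback_time_ms:
            self._background_position_ms = remote_playback_time_ms

    def _play_remote_vocal(self, samples: list) -> None:
        pass  # placeholder: mix the remote vocals into local playback
```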
Since the audio processing method shown in FIG. 2 can be performed by the audio processing apparatus 600 shown in FIG. 6, and the receiving unit 601 and the adjusting unit 602 can respectively perform operations corresponding to step S201 and step S202 in FIG. 2, any relevant details of the operations performed by the units in FIG. 6 can be found in the corresponding description of FIG. 2 and are not repeated here.
In addition, it should be noted that although the audio processing apparatus 600 is described above as being divided into units that respectively perform the corresponding processing, it is clear to those skilled in the art that the processing performed by the above units may also be performed without any specific division into units, or without clear boundaries between the units. Moreover, the audio processing apparatus 600 may further include other units, for example, a storage unit.
FIG. 7 is a block diagram of an electronic device 700 according to an embodiment of the present disclosure. Referring to FIG. 7, the electronic device 700 may include at least one memory 701 and at least one processor 702. The at least one memory stores a set of computer-executable instructions which, when executed by the at least one processor, cause the audio processing method according to the embodiments of the present disclosure to be performed.
In some embodiments, the electronic device may be a PC, a tablet device, a personal digital assistant, a smartphone, or another device capable of executing the above set of instructions. The electronic device need not be a single electronic device; it may be any assembly of devices or circuits capable of executing the above instructions (or instruction set) individually or jointly. The electronic device may also be part of an integrated control system or system manager, or may be configured as a portable electronic device that interfaces locally or remotely (for example, via wireless transmission).
In the electronic device, the processor may include a central processing unit (CPU), a graphics processing unit (GPU), a programmable logic device, a dedicated processor system, a microcontroller, or a microprocessor. In some embodiments, the processor may further include an analog processor, a digital processor, a microprocessor, a multi-core processor, a processor array, a network processor, and the like.
The processor may execute instructions or code stored in the memory, and the memory may also store data. Instructions and data may also be sent and received over a network via a network interface device, which may use any known transport protocol.
The memory may be integrated with the processor, for example by arranging RAM or flash memory within an integrated circuit microprocessor. In addition, the memory may include a standalone device, such as an external disk drive, a storage array, or any other storage device usable by a database system. The memory and the processor may be operatively coupled, or may communicate with each other, for example through I/O ports or network connections, so that the processor can read files stored in the memory.
In addition, the electronic device may further include a video display (such as a liquid crystal display) and a user interaction interface (such as a keyboard, a mouse, or a touch input device). All components of the electronic device may be connected to each other via a bus and/or a network.
According to an embodiment of the present disclosure, a computer-readable storage medium storing instructions may also be provided, wherein the instructions, when executed by at least one processor, cause the at least one processor to perform the audio processing method according to the exemplary embodiments of the present disclosure. Examples of the computer-readable storage medium here include read-only memory (ROM), random-access programmable read-only memory (PROM), electrically erasable programmable read-only memory (EEPROM), random-access memory (RAM), dynamic random-access memory (DRAM), static random-access memory (SRAM), flash memory, non-volatile memory, CD-ROM, CD-R, CD+R, CD-RW, CD+RW, DVD-ROM, DVD-R, DVD+R, DVD-RW, DVD+RW, DVD-RAM, BD-ROM, BD-R, BD-R LTH, BD-RE, Blu-ray or optical disc storage, a hard disk drive (HDD), a solid-state drive (SSD), card memory (such as a multimedia card, a Secure Digital (SD) card, or an eXtreme Digital (XD) card), magnetic tape, a floppy disk, a magneto-optical data storage device, an optical data storage device, a hard disk, a solid-state disk, and any other device configured to store, in a non-transitory manner, a computer program and any associated data, data files, and data structures, and to provide the computer program and any associated data, data files, and data structures to a processor or computer so that the processor or computer can execute the computer program. The computer program in the above computer-readable storage medium can run in an environment deployed on computer devices such as clients, hosts, proxy devices, and servers. Furthermore, in one example, the computer program and any associated data, data files, and data structures are distributed over networked computer systems so that the computer program and any associated data, data files, and data structures are stored, accessed, and executed in a distributed manner by one or more processors or computers.
According to an embodiment of the present disclosure, a computer program product may also be provided, including computer instructions which, when executed by a processor, implement the audio processing method according to the exemplary embodiments of the present disclosure.
All the embodiments of the present disclosure may be implemented individually or in combination with other embodiments, and all such implementations fall within the scope of protection claimed by the present disclosure.

Claims (20)

  1. An audio processing method, comprising:
    receiving an audio segment of a first user collected during singing and a background audio playback time of background audio of the first user corresponding to the audio segment;
    adjusting a playback position of background audio of a second user according to the background audio playback time, so that the adjusted background audio of the second user is aligned with the received audio segment of the first user, wherein the background audio of the second user is the same as the background audio of the first user.
  2. The audio processing method according to claim 1, wherein the adjusting the playback of the background audio of the second user according to the background audio playback time comprises:
    determining a sub-interval of the background audio of the second user with a smallest average audio energy within a time interval from an end of the second user's singing to a start of the first user's singing, or within a time interval from a start of playback of the background audio of the second user to the start of the first user's singing;
    adjusting, within the sub-interval, the playback position of the background audio of the second user according to the background audio playback time.
  3. The audio processing method according to claim 1, wherein the background audio playback time is obtained by subtracting a time delay caused by audio capture from a current playback time of the background audio of the first user.
  4. The audio processing method according to claim 1, wherein the adjusting the playback position of the background audio of the second user according to the background audio playback time comprises:
    determining a background audio playback position of the second user at a time when the audio segment of the first user is received;
    when the background audio playback position is within a time interval from an end of the second user's singing to a start of the first user's singing, or within a time interval from a start of playback of the background audio of the second user to the start of the first user's singing, adjusting the background audio playback position to correspond to the received background audio playback time.
  5. The audio processing method according to claim 2, wherein the adjusting, within the sub-interval, the playback position of the background audio of the second user according to the background audio playback time comprises:
    determining a background audio playback position of the second user at a time when the audio segment of the first user is received;
    in response to the background audio playback position being within the sub-interval, adjusting the background audio playback position to correspond to the received background audio playback time.
  6. The audio processing method according to claim 2, wherein the determining the sub-interval of the background audio of the second user with the smallest average audio energy within the time interval from the end of the second user's singing to the start of the first user's singing, or within the time interval from the start of playback of the background audio of the second user to the start of the first user's singing, comprises:
    computing an average audio energy of each sub-interval of the time interval according to the following formula, and determining the sub-interval with the smallest average audio energy according to the computed average audio energies of the respective sub-intervals:
    E(ab) = \frac{1}{TS_b - TS_a} \sum_{i = TS_a + 1}^{TS_b} S(i)^2
    where E(ab) is the average energy of the sub-interval ab, TS_b is the number of sampling points up to time b, TS_a is the number of sampling points up to time a, TS_b - TS_a is the number of sampling points within the interval ab, and S(i) is the amplitude of the i-th sampling point.
  7. The audio processing method according to claim 2, further comprising: sending, to the first user, an audio segment of the second user collected during singing and a background audio playback time of the background audio of the second user corresponding to the audio segment of the second user.
  8. The audio processing method according to claim 1, further comprising:
    establishing a communication connection with the first user;
    playing the background audio, and playing the received audio segment of the first user.
  9. The audio processing method according to claim 1, wherein the receiving the audio segment of the first user collected during singing and the background audio playback time of the background audio of the first user corresponding to the audio segment comprises:
    receiving, at a predetermined time interval, the audio segment of the first user collected during singing and the background audio playback time of the background audio of the first user corresponding to the audio segment.
  10. An audio processing apparatus, comprising:
    a receiving unit configured to receive an audio segment of a first user collected during singing and a background audio playback time of background audio of the first user corresponding to the audio segment;
    an adjusting unit configured to adjust a playback position of background audio of a second user according to the background audio playback time, so that the adjusted background audio of the second user is aligned with the received audio segment of the first user, wherein the background audio of the second user is the same as the background audio of the first user.
  11. The audio processing apparatus according to claim 10, wherein the adjusting the playback of the background audio of the second user according to the background audio playback time comprises:
    determining a sub-interval of the background audio of the second user with a smallest average audio energy within a time interval from an end of the second user's singing to a start of the first user's singing, or within a time interval from a start of playback of the background audio of the second user to the start of the first user's singing;
    adjusting, within the sub-interval, the playback position of the background audio of the second user according to the background audio playback time.
  12. The audio processing apparatus according to claim 10, wherein the background audio playback time is obtained by subtracting a time delay caused by audio capture from a current playback time of the background audio of the first user.
  13. The audio processing apparatus according to claim 10, wherein the adjusting the playback position of the background audio of the second user according to the background audio playback time comprises:
    determining a background audio playback position of the second user at a time when the audio segment of the first user is received;
    when the background audio playback position is within a time interval from an end of the second user's singing to a start of the first user's singing, or within a time interval from a start of playback of the background audio of the second user to the start of the first user's singing, adjusting the background audio playback position to correspond to the received background audio playback time.
  14. The audio processing apparatus according to claim 11, wherein the adjusting, within the sub-interval, the playback position of the background audio of the second user according to the background audio playback time comprises:
    determining a background audio playback position of the second user at a time when the audio segment of the first user is received;
    in response to the background audio playback position being within the sub-interval, adjusting the background audio playback position to correspond to the received background audio playback time.
  15. The audio processing apparatus according to claim 11, wherein the determining the sub-interval of the background audio of the second user with the smallest average audio energy within the time interval from the end of the second user's singing to the start of the first user's singing, or within the time interval from the start of playback of the background audio of the second user to the start of the first user's singing, comprises:
    computing an average audio energy of each sub-interval of the time interval according to the following formula, and determining the sub-interval with the smallest average audio energy according to the computed average audio energies of the respective sub-intervals:
    E(ab) = \frac{1}{TS_b - TS_a} \sum_{i = TS_a + 1}^{TS_b} S(i)^2
    where E(ab) is the average energy of the sub-interval ab, TS_b is the number of sampling points up to time b, TS_a is the number of sampling points up to time a, TS_b - TS_a is the number of sampling points within the interval ab, and S(i) is the amplitude of the i-th sampling point.
  16. The audio processing apparatus according to claim 11, further comprising: a sending unit configured to send, to the first user, an audio segment of the second user collected during singing and a background audio playback time of the background audio of the second user corresponding to the audio segment of the second user.
  17. The audio processing apparatus according to claim 10, further comprising:
    a communication unit configured to establish a communication connection with the first user;
    an audio playback unit configured to play the background audio and play the received audio segment of the first user.
  18. The audio processing apparatus according to claim 10, wherein the receiving unit receives, at a predetermined time interval, the audio segment of the first user collected during singing and the background audio playback time of the background audio of the first user corresponding to the audio segment.
  19. An electronic device, comprising:
    at least one processor; and
    at least one memory storing computer-executable instructions,
    wherein the computer-executable instructions, when executed by the at least one processor, cause the at least one processor to perform the following steps:
    receiving an audio segment of a first user collected during singing and a background audio playback time of background audio of the first user corresponding to the audio segment;
    adjusting a playback position of background audio of a second user according to the background audio playback time, so that the adjusted background audio of the second user is aligned with the received audio segment of the first user, wherein the background audio of the second user is the same as the background audio of the first user.
  20. A computer-readable storage medium storing instructions, wherein the instructions, when executed by at least one processor, cause the at least one processor to perform the following steps:
    receiving an audio segment of a first user collected during singing and a background audio playback time of background audio of the first user corresponding to the audio segment;
    adjusting a playback position of background audio of a second user according to the background audio playback time, so that the adjusted background audio of the second user is aligned with the received audio segment of the first user, wherein the background audio of the second user is the same as the background audio of the first user.
PCT/CN2021/113086 2021-01-26 2021-08-17 Audio processing method and audio processing apparatus WO2022160669A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110110917.7A CN112927666B (en) 2021-01-26 2021-01-26 Audio processing method, device, electronic equipment and storage medium
CN202110110917.7 2021-01-26

Publications (1)

Publication Number Publication Date
WO2022160669A1 true WO2022160669A1 (en) 2022-08-04

Family

ID=76166954

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/113086 WO2022160669A1 (en) 2021-01-26 2021-08-17 Audio processing method and audio processing apparatus

Country Status (2)

Country Link
CN (1) CN112927666B (en)
WO (1) WO2022160669A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112927666B (en) * 2021-01-26 2023-11-28 北京达佳互联信息技术有限公司 Audio processing method, device, electronic equipment and storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002251194A (en) * 2001-10-05 2002-09-06 Yamaha Corp Karaoke device
CN111261133A (en) * 2020-01-15 2020-06-09 腾讯科技(深圳)有限公司 Singing processing method and device, electronic equipment and storage medium
CN111383669A (en) * 2020-03-19 2020-07-07 杭州网易云音乐科技有限公司 Multimedia file uploading method, device, equipment and computer readable storage medium
CN111524494A (en) * 2020-04-27 2020-08-11 腾讯音乐娱乐科技(深圳)有限公司 Remote real-time chorus method and device and storage medium
CN112017622A (en) * 2020-09-04 2020-12-01 广州趣丸网络科技有限公司 Audio data alignment method, device, equipment and storage medium
CN112118062A (en) * 2019-06-19 2020-12-22 华为技术有限公司 Multi-terminal multimedia data communication method and system
CN112148248A (en) * 2020-09-28 2020-12-29 腾讯音乐娱乐科技(深圳)有限公司 Online song room implementation method, electronic device and computer readable storage medium
CN112927666A (en) * 2021-01-26 2021-06-08 北京达佳互联信息技术有限公司 Audio processing method and device, electronic equipment and storage medium

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9406341B2 (en) * 2011-10-01 2016-08-02 Google Inc. Audio file processing to reduce latencies in play start times for cloud served audio files
CN107093419B (en) * 2016-02-17 2020-04-24 广州酷狗计算机科技有限公司 Dynamic vocal accompaniment method and device
KR101987473B1 (en) * 2017-12-12 2019-06-10 미디어스코프 주식회사 System for synchronization between accompaniment and singing voice of online singing room service and apparatus for executing the same
CN109033335B (en) * 2018-07-20 2021-03-26 广州酷狗计算机科技有限公司 Audio recording method, device, terminal and storage medium
CN110267081B (en) * 2019-04-02 2021-01-22 北京达佳互联信息技术有限公司 Live stream processing method, device and system, electronic equipment and storage medium
CN110491358B (en) * 2019-08-15 2023-06-27 广州酷狗计算机科技有限公司 Method, device, equipment, system and storage medium for audio recording
CN111028818B (en) * 2019-11-14 2022-11-22 北京达佳互联信息技术有限公司 Chorus method, apparatus, electronic device and storage medium


Also Published As

Publication number Publication date
CN112927666A (en) 2021-06-08
CN112927666B (en) 2023-11-28

Similar Documents

Publication Publication Date Title
US10043504B2 (en) Karaoke processing method, apparatus and system
WO2020253806A1 (en) Method and apparatus for generating display video, device and storage medium
JP2018519538A (en) Karaoke processing method and system
US11120782B1 (en) System, method, and non-transitory computer-readable storage medium for collaborating on a musical composition over a communication network
JP6785904B2 (en) Information push method and equipment
WO2017177621A1 (en) Data synchronization method in local area network, and apparatus and user terminal therefor
JP2020174339A (en) Method, device, server, computer-readable storage media, and computer program for aligning paragraph and image
WO2022142619A1 (en) Method and device for private audio or video call
US20220047954A1 (en) Game playing method and system based on a multimedia file
CN110312162A (en) Selected stage treatment method, device, electronic equipment and readable medium
US20170092253A1 (en) Karaoke system
WO2022110943A1 (en) Speech preview method and apparatus
US9405501B2 (en) System and method for automatic synchronization of audio layers
WO2022160669A1 (en) Audio processing method and audio processing apparatus
CN112365868B (en) Sound processing method, device, electronic equipment and storage medium
CN112687247B (en) Audio alignment method and device, electronic equipment and storage medium
US20160307551A1 (en) Multifunctional Media Players
US11862187B2 (en) Systems and methods for jointly estimating sound sources and frequencies from audio
WO2022227625A1 (en) Signal processing method and apparatus
JP6170692B2 (en) A communication karaoke system that can continue duet singing in the event of a communication failure
US11297368B1 (en) Methods, systems, and apparatuses and live audio capture
US20220303152A1 (en) Recordation of video conference based on bandwidth issue(s)
JP4981631B2 (en) Content transmission apparatus, content transmission method, and computer program
CN116450256A (en) Editing method, device, equipment and storage medium for audio special effects
US11522936B2 (en) Synchronization of live streams from web-based clients

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21922264

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 06.11.2023)

122 Ep: pct application non-entry in european phase

Ref document number: 21922264

Country of ref document: EP

Kind code of ref document: A1