CN115631738A - Audio data processing method and device, electronic equipment and storage medium - Google Patents

Audio data processing method and device, electronic equipment and storage medium

Info

Publication number
CN115631738A
Authority
CN
China
Prior art keywords
audio
singing
equipment
accompaniment
terminal equipment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211316116.7A
Other languages
Chinese (zh)
Inventor
林敏洁
吴海全
姜德军
周浩
杨斌
郭世文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Grandsun Electronics Co Ltd
Shenzhen Feikedi System Development Co Ltd
Original Assignee
Shenzhen Grandsun Electronics Co Ltd
Shenzhen Feikedi System Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Grandsun Electronics Co Ltd, Shenzhen Feikedi System Development Co Ltd filed Critical Shenzhen Grandsun Electronics Co Ltd
Priority to CN202211316116.7A priority Critical patent/CN115631738A/en
Publication of CN115631738A publication Critical patent/CN115631738A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00 Details of electrophonic musical instruments
    • G10H1/36 Accompaniment arrangements
    • G10H1/361 Recording/reproducing of accompaniment for use with an external source, e.g. karaoke systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/43 Querying
    • G06F16/435 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16 Sound input; Sound output
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00 Details of electrophonic musical instruments
    • G10H1/36 Accompaniment arrangements
    • G10H1/361 Recording/reproducing of accompaniment for use with an external source, e.g. karaoke systems
    • G10H1/365 Recording/reproducing of accompaniment for use with an external source, e.g. karaoke systems the accompaniment information being stored on a host computer and transmitted to a reproducing terminal by means of a network, e.g. public telephone lines

Abstract

The application is applicable to the field of computer technologies, and provides an audio data processing method, an audio data processing device, electronic equipment and a storage medium, wherein the method includes: when accompaniment audio of a target song sent by terminal equipment is received, playing the accompaniment audio and collecting singing audio from the environment where the audio playing equipment is located; and sending the singing audio to the terminal equipment, wherein the singing audio is used for triggering the terminal equipment to perform song integration on the accompaniment audio and the singing audio based on a predetermined data transmission delay. In the application, when the user records a song, the user's voice can be collected through the audio playing equipment, so that the user can sing beside the audio playing equipment without staying close to the terminal equipment, which expands the voice recording range and improves the user experience. In addition, the terminal equipment integrates the song based on the data transmission delay, so that the singing audio and the accompaniment audio can be synchronized, which improves the accuracy of the audio integration and further improves the user experience.

Description

Audio data processing method and device, electronic equipment and storage medium
Technical Field
The present application belongs to the field of computer technologies, and in particular, to an audio data processing method and apparatus, an electronic device, and a storage medium.
Background
With the development of Internet technology, singing software has become increasingly widespread, and a user can record singing through singing software installed on terminal equipment such as a mobile phone.
In the related art, when recording a song, terminal equipment such as a mobile phone usually transmits the accompaniment audio data of the song to audio playing equipment, for example a Bluetooth headset or a Bluetooth microphone, which plays the accompaniment audio data through its loudspeaker. The user's voice is then collected through the terminal equipment, and the terminal equipment mixes the collected voice with the accompaniment to obtain the complete recording. Because the user's voice must be collected through the terminal equipment, the user needs to stay close to the terminal equipment while singing, so the recording of the user's voice is limited by distance and the voice recording range is reduced.
Disclosure of Invention
The embodiments of the application provide an audio data processing method and device, electronic equipment and a storage medium, which can solve the problem in the related art that, when a user records a song, the user must stay close to the terminal equipment, so that the recording of the user's voice is limited by distance and the voice recording range is reduced.
A first aspect of an embodiment of the present application provides an audio data processing method, including:
when the accompaniment audio of the target song sent by the terminal equipment is received, playing the accompaniment audio and collecting the singing audio of the environment where the audio playing equipment is located;
and sending the singing audio to the terminal equipment, wherein the singing audio is used for triggering the terminal equipment to carry out song integration on the accompaniment audio and the singing audio based on the predetermined data transmission time delay.
In some embodiments, the data transmission delay is determined by:
the terminal equipment sends detection audio to the audio playing equipment;
when the audio playing device receives the detection audio, the audio playing device sends feedback audio to the terminal device;
and when the terminal equipment receives the feedback audio, determining the data transmission delay according to the receiving time of the feedback audio and the sending time of the detection audio.
In some embodiments, the terminal device and the audio playing device perform data transmission through a pre-established data transmission channel, and the data transmission channel includes a first transmission channel and a second transmission channel;
the first transmission channel is used for transmitting data to be transmitted in the terminal equipment to the audio playing equipment, and the second transmission channel is used for transmitting the data to be transmitted in the audio playing equipment to the terminal equipment.
In some embodiments, prior to transmitting the singing audio to the terminal device, the method further comprises:
and performing noise filtering processing on the collected singing audio, and updating the singing audio to the noise-filtered audio.
In some embodiments, the method further comprises:
when a first singing confirmation operation aiming at the audio playing equipment is detected, first operation information corresponding to the first singing confirmation operation is sent to the terminal equipment, wherein the first operation information is used for triggering the terminal equipment to determine a target song based on the first operation information.
A second aspect of the embodiments of the present application provides another audio data processing method, including:
sending the accompaniment audio of the target song to audio playing equipment, wherein the accompaniment audio is used for triggering the audio playing equipment to play the accompaniment audio and collecting the singing audio of the environment where the audio playing equipment is located;
and receiving singing audio sent by the audio playing equipment, and performing song integration on the accompaniment audio and the singing audio based on predetermined data transmission time delay.
In some embodiments, the target song is determined by any one of:
receiving first operation information sent by audio playing equipment, and determining a target song based on the first operation information;
and when a second singing confirmation operation aiming at the terminal equipment is detected, determining the target song according to second operation information corresponding to the second singing confirmation operation.
A third aspect of an embodiment of the present application provides an audio data processing apparatus, including:
the audio acquisition unit is used for playing the accompaniment audio and acquiring the singing audio of the environment where the audio playing equipment is located when the accompaniment audio of the target song sent by the terminal equipment is received;
and the audio transmitting unit is used for transmitting the singing audio to the terminal equipment, wherein the singing audio is used for triggering the terminal equipment to carry out song integration on the accompaniment audio and the singing audio based on the predetermined data transmission time delay.
In some embodiments, the data transmission delay is determined by:
the terminal equipment sends detection audio to the audio playing equipment;
when the audio playing device receives the detection audio, the audio playing device sends feedback audio to the terminal device;
and when the terminal equipment receives the feedback audio, determining the data transmission time delay according to the receiving time of the feedback audio and the sending time of the detection audio.
In some embodiments, the terminal device and the audio playing device perform data transmission through a pre-established data transmission channel, and the data transmission channel includes a first transmission channel and a second transmission channel;
the first transmission channel is used for transmitting data to be transmitted in the terminal equipment to the audio playing equipment, and the second transmission channel is used for transmitting the data to be transmitted in the audio playing equipment to the terminal equipment.
In some embodiments, the apparatus further comprises a noise filtering unit for performing noise filtering processing on the collected singing audio and updating the singing audio to the noise-filtered audio.
In some embodiments, the apparatus further includes an information sending unit, configured to send, when a first singing confirmation operation for the audio playback device is detected, first operation information corresponding to the first singing confirmation operation to the terminal device, where the first operation information is used to trigger the terminal device to determine the target song based on the first operation information.
A fourth aspect of the embodiments of the present application provides another audio data processing apparatus, including:
the accompaniment transmitting unit is used for transmitting accompaniment audio of the target song to the audio playing device, wherein the accompaniment audio is used for triggering the audio playing device to play the accompaniment audio and collecting singing audio of the environment where the audio playing device is located;
and the song integration unit is used for receiving the singing audio sent by the audio playing equipment and performing song integration on the accompaniment audio and the singing audio based on the predetermined data transmission time delay.
In some embodiments, the target song is determined by any one of:
receiving first operation information sent by audio playing equipment, and determining a target song based on the first operation information;
and when a second singing confirmation operation aiming at the terminal equipment is detected, determining the target song according to second operation information corresponding to the second singing confirmation operation.
In some embodiments, the data transmission delay is determined by:
the terminal equipment sends detection audio to the audio playing equipment;
when the audio playing device receives the detection audio, the audio playing device sends feedback audio to the terminal device;
and when the terminal equipment receives the feedback audio, determining the data transmission time delay according to the receiving time of the feedback audio and the sending time of the detection audio.
In some embodiments, the terminal device and the audio playing device perform data transmission through a pre-established data transmission channel, and the data transmission channel includes a first transmission channel and a second transmission channel;
the first transmission channel is used for transmitting data to be transmitted in the terminal equipment to the audio playing equipment, and the second transmission channel is used for transmitting the data to be transmitted in the audio playing equipment to the terminal equipment.
A fifth aspect of embodiments of the present application provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor implements the steps of the audio data processing method provided in the first aspect or the second aspect when executing the computer program.
A sixth aspect of embodiments of the present application provides a computer-readable storage medium, which stores a computer program that, when executed by a processor, implements the steps of the audio data processing method provided in the first aspect or the second aspect.
The audio data processing method and device, the electronic equipment and the storage medium provided by the embodiments of the application have the following beneficial effects: firstly, when the accompaniment audio of a target song sent by the terminal equipment is received, the accompaniment audio is played and the singing audio of the environment where the audio playing equipment is located is collected. Then, the singing audio is sent to the terminal equipment, where the singing audio is used for triggering the terminal equipment to perform song integration on the accompaniment audio and the singing audio based on the predetermined data transmission delay. When recording a song, the user's voice can be collected through the audio playing equipment, so that the user can sing beside the audio playing equipment without staying close to the terminal equipment, which expands the voice recording range and improves the user experience. In addition, the terminal equipment integrates the song based on the data transmission delay, so that the singing audio and the accompaniment audio can be synchronized, which improves the accuracy of the audio integration and further improves the user experience.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings required in the description of the embodiments or the related art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application, and other drawings can be obtained by those skilled in the art based on these drawings without creative effort.
Fig. 1 is a flowchart illustrating an implementation of an audio data processing method according to an embodiment of the present application;
fig. 2 is a flowchart of an implementation of determining a data transmission delay according to an embodiment of the present application;
fig. 3 is a schematic diagram of an application scenario of an audio data processing method provided in an embodiment of the present application;
FIG. 4 is a flowchart illustrating an implementation of a method for processing audio data according to another embodiment of the present application;
fig. 5 is a block diagram of an audio data processing apparatus according to an embodiment of the present application;
fig. 6 is a block diagram of an audio data processing apparatus according to another embodiment of the present application;
fig. 7 is a block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon", "in response to determining", or "in response to detecting". Similarly, the phrase "if it is determined" or "if a [described condition or event] is detected" may be interpreted contextually to mean "upon determining", "in response to determining", "upon detecting [the described condition or event]", or "in response to detecting [the described condition or event]".
Reference throughout this specification to "one embodiment" or "some embodiments," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," or the like, in various places throughout this specification are not necessarily all referring to the same embodiment, but rather "one or more but not all embodiments" unless specifically stated otherwise. The terms "comprising," "including," "having," and variations thereof mean "including, but not limited to," unless otherwise specifically stated.
In order to explain the technical means of the present application, the following examples are given below.
Referring to fig. 1, fig. 1 is a flowchart illustrating an implementation of an audio data processing method according to an embodiment of the present application, including:
step 101, when the accompaniment audio of the target song sent by the terminal device is received, playing the accompaniment audio and collecting the singing audio of the environment where the audio playing device is located.
The terminal device is generally a device with a song recording function, such as a mobile phone, a tablet computer, and the like.
Wherein the target song is a song that the user wants to record.
The audio playing device is generally a device with an audio playing function, on which a microphone may be configured. In practical applications, the audio playing device may be a Bluetooth microphone, a Bluetooth headset, or the like.
Wherein the singing audio is the audio singing by the user.
In this embodiment, the audio data processing method may be applied to an audio playing device, and in this case, the main execution body of the audio data processing method is the audio playing device.
In practice, a communication connection, such as a Bluetooth connection, is usually established between the audio playing device and the terminal device, and the audio playing device may receive audio data sent by the terminal device over this connection.
In practice, when the audio playing device receives the accompaniment audio of the target song sent by the terminal device, the accompaniment audio can be decoded first, and then the decoded accompaniment audio is played through a loudspeaker in the audio playing device. Meanwhile, the audio playing device can collect singing audio in the environment through the microphone.
And 102, transmitting the singing audio to the terminal equipment.
The singing audio is used for triggering the terminal equipment to carry out song integration on the accompaniment audio and the singing audio based on the predetermined data transmission time delay.
Here, the data transmission delay is the time interval from when the terminal device sends audio data to the audio playing device until the terminal device receives the audio data returned by the audio playing device. In practical applications, the data transmission delay may include a first transmission time interval and a second transmission time interval, where the first transmission time interval may be the time taken for the audio data to travel from the terminal device to the audio playing device, and the second transmission time interval may be the time taken for the audio playing device to receive the audio data, decode it, re-encode the decoded audio data, and send the encoded audio data back to the terminal device.
In practice, the audio playing device may transmit the singing audio to the terminal device through a communication channel established with the terminal device.
According to the audio data processing method provided by this embodiment, firstly, when the accompaniment audio of a target song sent by the terminal device is received, the accompaniment audio is played and the singing audio of the environment where the audio playing device is located is collected. Then, the singing audio is sent to the terminal device, where it is used to trigger the terminal device to perform song integration on the accompaniment audio and the singing audio based on the predetermined data transmission delay. When the user records a song, the user's voice can be collected through the audio playing device, so that the user can sing beside the audio playing device without staying close to the terminal device, which expands the voice recording range and improves the user experience. In addition, the terminal device integrates the song based on the data transmission delay, so that the singing audio and the accompaniment audio can be synchronized, which improves the accuracy of the audio integration and further improves the user experience.
Referring to fig. 2, fig. 2 is a flowchart illustrating an implementation of determining a data transmission delay according to an embodiment of the present application, including:
step 201, the terminal device sends a detection audio to the audio playing device.
The detection audio is usually a preset audio used for detection. For example, the detection audio may be audio whose content is all zeros and whose duration is 0.1 seconds. It should be noted that the content and duration of the detection audio are not specifically limited in this embodiment.
In practice, a communication connection is usually established between the audio playing device and the terminal device, and the terminal device can send the detection audio to the audio playing device over this connection. Here, the terminal device may record the sending time when the detection audio is sent to the audio playing device.
Step 202, when the audio playing device receives the detected audio, it sends a feedback audio to the terminal device.
The feedback audio is typically audio that is used to respond to the detection audio. In practical applications, the content of the feedback audio is generally the same as the content of the detection audio.
In practice, when receiving the detection audio sent by the terminal device, the audio playing device generally decodes the detection audio, then encodes the decoded detection audio in a preset encoding format to obtain the feedback audio corresponding to the detection audio, and then sends the feedback audio to the terminal device.
And 203, when the terminal equipment receives the feedback audio, determining the data transmission delay according to the receiving time of the feedback audio and the sending time of the detection audio.
In practice, when receiving the feedback audio sent by the audio playing device, the terminal device may record the receiving time of the feedback audio, and then obtain the data transmission delay by taking the difference between the receiving time of the feedback audio and the sending time of the detection audio. For example, if the sending time recorded when the terminal device sends the detection audio is 9:30:10 and the receiving time recorded when it receives the feedback audio is 9:30:11, the difference between the two gives a data transmission delay of 1 second.
In practice, the terminal device may start timing when the detection audio is transmitted, and stop timing when the feedback audio is received, so as to determine the data transmission delay.
In practice, the terminal device may also determine a pre-stored empirical time value as the data transmission delay. For example, the terminal device may determine a pre-stored empirical time value of 0.5 seconds as the data transmission delay.
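The round-trip measurement described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: `send_probe` and `wait_for_feedback` are hypothetical callables standing in for the terminal device's Bluetooth send and receive operations, and the averaging of several measurements follows the option mentioned later in this description.

```python
import time
from statistics import mean

def measure_transmission_delay(send_probe, wait_for_feedback, samples=5):
    """Estimate the data transmission delay from probe/feedback round trips.

    `send_probe` and `wait_for_feedback` are hypothetical callables standing
    in for sending the detection audio and blocking until the feedback
    audio arrives.
    """
    delays = []
    for _ in range(samples):
        sent_at = time.monotonic()       # record the detection audio's sending time
        send_probe()                     # terminal device -> audio playing device
        wait_for_feedback()              # returns once the feedback audio is received
        received_at = time.monotonic()   # record the feedback audio's receiving time
        delays.append(received_at - sent_at)
    # average several actual measurements, as the description also suggests
    return mean(delays)
```

Using a monotonic clock avoids errors if the system wall clock is adjusted between sending and receiving.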
After the data transmission delay is determined, the terminal device can use it as the audio time offset between the user's singing audio and the accompaniment audio in the terminal device, and perform song integration on the accompaniment audio and the singing audio based on this audio time offset, so that the accompaniment audio and the singing audio are synchronized.
In some application scenarios, when receiving the feedback audio, the terminal device generally needs to decode it. The terminal device may record the decoding time consumed from the start to the end of decoding, and add the data transmission delay to this decoding time to obtain a detection time offset spanning from the start of sending the detection audio to the completion of decoding the feedback audio. The terminal device may use this detection time offset as the audio time offset between the user's singing audio and the accompaniment audio, and perform song integration on the accompaniment audio and the singing audio based on the audio time offset, so that the two are synchronized. For example, when the audio time offset is 1 second, the terminal device may delay the accompaniment audio by 1 second before integrating it with the user's singing audio.
In practice, the terminal device may also obtain the audio time offset between the user's singing audio and the accompaniment audio by taking the difference between the decoding completion time and the sending time of the detection audio. Alternatively, the terminal device may start timing when sending the detection audio and stop timing when the feedback audio has been received and decoded, thereby obtaining the audio time offset between the singing audio and the accompaniment audio.
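The offset-compensated integration can be illustrated with a short sketch. The details here are assumptions for illustration, not taken from the patent: both streams are lists of float samples at the same rate, the singing buffer starts when the accompaniment is sent (so its first `offset` samples contain no voice yet), and a fixed 0.5 gain per stream is used to avoid clipping.

```python
def integrate_song(accompaniment, singing, time_offset, sample_rate=44100):
    """Mix accompaniment samples with singing samples after compensating
    for the measured audio time offset.

    Illustrative assumptions: float sample lists at a common rate; the
    singing buffer's first `offset` samples precede any captured voice.
    """
    offset = int(time_offset * sample_rate)
    # drop the leading lag so singing sample i lines up with accompaniment sample i
    aligned_singing = singing[offset:]
    n = min(len(accompaniment), len(aligned_singing))
    # simple additive mix with equal gains on both streams
    return [0.5 * a + 0.5 * s
            for a, s in zip(accompaniment[:n], aligned_singing[:n])]
```

With a measured offset of 1 second, this is equivalent to the example above in which the terminal delays the accompaniment by 1 second before mixing.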
It is noted that the terminal device typically needs to determine the data transmission delay before song integration is performed.
In practice, the terminal device may detect a plurality of actual transmission delays, and determine an average value of the plurality of actual transmission delays as the data transmission delay.
In this embodiment, the transmission of audio data between the terminal device and the audio playing device is simulated by transmitting the detection audio between them. The terminal device determines the data transmission delay based on the receiving time of the feedback audio and the sending time of the detection audio, so the delay reflects the actual time offset of audio data over the whole transmission path, which improves the accuracy of the data transmission delay.
In some embodiments, the terminal device and the audio playing device perform data transmission through a pre-established data transmission channel, and the data transmission channel includes a first transmission channel and a second transmission channel.
The first transmission channel is used for transmitting data to be transmitted in the terminal equipment to the audio playing equipment. The second transmission channel is used for transmitting the data to be transmitted in the audio playing device to the terminal device.
In practice, the first transmission channel may be an A2DP channel established by the terminal device and the audio playing device based on the Bluetooth Advanced Audio Distribution Profile (A2DP). The terminal device may establish an A2DP connection between the terminal Bluetooth module in the terminal device and the microphone Bluetooth module in the audio playing device, thereby establishing an A2DP channel between the two devices. Here, the terminal Bluetooth module and the microphone Bluetooth module each carry the A2DP protocol; the terminal device can discover the audio playing device through Bluetooth scanning, and the user can tap a connection button in the terminal device interface to establish the A2DP channel. The A2DP channel is a unidirectional audio data transmission channel used to transmit high-definition audio data from the terminal device to the audio playing device.
In practice, the second transmission channel may be a Bluetooth Low Energy (BLE) channel established by the terminal device and the audio playing device based on the BLE protocol, or an SPP channel established based on the Bluetooth Serial Port Profile (SPP). For example, when the terminal Bluetooth module in the terminal device and the microphone Bluetooth module in the audio playing device each carry the BLE protocol, the singing software in the terminal device can discover the audio playing device through Bluetooth scanning, and the user can tap a connection button in the terminal device interface to establish a BLE channel between the terminal device and the audio playing device. Here, the BLE channel is a bidirectional data transmission channel that does not limit the type of data transmitted; that is, the terminal device may send data to the audio playing device through the BLE channel, and the audio playing device may also send data to the terminal device through the BLE channel.
Referring to fig. 3, fig. 3 is a schematic view of an application scenario of an audio data processing method according to an embodiment of the present application. As shown in fig. 3, the terminal device and the audio playing device perform data transmission through the established A2DP channel and BLE channel. When the user records a song, the singing software in the terminal device sends the accompaniment audio of the target song to the audio playing device through the A2DP channel. The audio playing device decodes the accompaniment audio and plays it through its loudspeaker. The user sings toward the audio playing device along with the accompaniment, and the audio playing device collects the user's voice through its microphone as the singing audio, encodes the singing audio in a preset encoding format, and sends the encoded singing audio to the terminal device through the BLE channel. The terminal device decodes the received singing audio, performs accompaniment integration between the decoded singing audio and the locally stored song audio of the target song to obtain the integrated song audio, and plays the integrated song audio, so that the user can listen to it through the terminal device.
In the audio data processing method provided by this embodiment, the terminal device and the audio playing device perform data transmission through the first transmission channel and the second transmission channel, and the first transmission channel and the second transmission channel are parallel and independent from each other, so that data transmitted in the first transmission channel and data transmitted in the second transmission channel do not interfere with each other, and stability of audio data transmission can be improved.
In some embodiments, before transmitting the singing audio to the terminal device, the audio data processing method may further include:
and carrying out noise filtering processing on the collected singing audio, and replacing the singing audio with the noise-filtered audio.
The singing audio usually includes noise, such as environmental noise and accompaniment audio, in addition to the audio of the user singing.
The noise filtering process is usually used to filter noise out of the singing audio. In practice, the bluetooth microphone may employ an adaptive filtering algorithm, such as the Least Mean Squares (LMS) algorithm, the Recursive Least Squares (RLS) algorithm, or the QR-decomposition-based RLS (QR-RLS) algorithm, to perform noise filtering processing on the collected singing audio, so that the noise-filtered audio contains only the audio of the user singing.
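As a concrete illustration of the LMS approach mentioned above, the sketch below uses the known accompaniment signal as the filter reference and adaptively subtracts its echo from the microphone signal; the tap count, step size, and function name are illustrative assumptions, not values specified by this application.

```python
import numpy as np

def lms_cancel(mic, ref, num_taps=8, mu=0.02):
    """LMS adaptive filter: predict the accompaniment echo present in `mic`
    from the known reference `ref` and subtract it, leaving an estimate of
    the user's voice."""
    w = np.zeros(num_taps)                      # adaptive FIR weights
    out = np.zeros_like(mic, dtype=float)
    for n in range(num_taps - 1, len(mic)):
        x = ref[n - num_taps + 1:n + 1][::-1]   # newest reference sample first
        e = mic[n] - w @ x                      # error = mic minus predicted echo
        w += 2 * mu * e * x                     # LMS weight update
        out[n] = e                              # the error is the cleaned signal
    return out
```

With a stationary reference, the residual echo decays geometrically as the weights converge, which is why LMS is a common choice on low-power microphone hardware.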
In practice, the audio playing device may also input the collected singing audio into a pre-trained noise filtering model, and perform noise filtering processing on the collected singing audio through the noise filtering model to obtain the noise-filtered audio. The noise filtering model is used to represent the correspondence between the singing audio and the noise-filtered audio. Here, the noise filtering model may be obtained by training an initial model (for example, a Convolutional Neural Network (CNN), a residual network (ResNet), or the like) on training samples by a machine learning method.
After the collected singing audio is subjected to the noise filtering processing, the audio playing device can replace the singing audio with the noise-filtered audio.
In some embodiments, the audio processing method may further include: when a first singing confirmation operation aiming at the audio playing equipment is detected, first operation information corresponding to the first singing confirmation operation is sent to the terminal equipment.
The first singing confirmation operation is generally an operation performed by the user at the audio playing device side to confirm singing. For example, the first singing confirmation operation may be implemented as the user inputting, by voice, speech containing the song information of the target song, or as the user tapping the audio playing device. The song information is generally information describing a song, and may include, but is not limited to, a song title, a singer, and the like.
The first operation information is used to trigger the terminal device to determine the target song based on the first operation information. In practice, the first operation information may include song information of the target song, such as a song title, a singer, and the like.
In practice, the audio playing device may collect the user's voice through the microphone, and when the voice input by the user matches the preset first singing confirmation operation, the audio playing device may send the first operation information corresponding to the first singing confirmation operation to the terminal device through a pre-established data transmission channel, such as the BLE channel.
According to the audio data processing method provided by this embodiment, the user can control, at the audio playing device side, which accompaniment audio the audio playing device plays, without having to walk over to the terminal device each time to select it. This facilitates user operation and helps improve the user experience.
Referring to fig. 4, fig. 4 is a flowchart illustrating an implementation of an audio data processing method according to another embodiment of the present application, including:
step 401, sending the accompaniment audio of the target song to the audio playing device.
The accompaniment audio is used for triggering the audio playing device to play the accompaniment audio and collecting the singing audio in the environment where the audio playing device is located.
In this embodiment, the audio data processing method may be applied to a terminal device, and in this case, the main execution body of the audio data processing method is the terminal device.
In practice, a communication connection, such as a bluetooth connection, is usually established between the terminal device and the audio playing device, and the terminal device may send the accompaniment audio of the target song to the audio playing device through this connection.
In some application scenarios, an A2DP connection may be established between the terminal device and the audio playing device, and the terminal device may send the accompaniment audio of the target song to the audio playing device through an A2DP channel corresponding to the A2DP connection.
Step 402, receiving singing audio sent by an audio playing device, and performing song integration on the accompaniment audio and the singing audio based on predetermined data transmission delay.
In practice, the terminal device may receive the singing audio transmitted by the audio playing device through the network.
In some application scenarios, a BLE connection may be established between the terminal device and the audio playing device, and the terminal device may receive a singing audio sent by the audio playing device through a BLE channel corresponding to the BLE connection.
After receiving the singing audio sent by the audio playing device, the terminal device can perform song integration on the accompaniment audio and the singing audio based on the predetermined data transmission delay. In practice, the terminal device may adopt an audio mixing algorithm, such as direct summing, weighted-average mixing, or summing with clamping, to perform song integration on the accompaniment audio and the singing audio. Specifically, the terminal device may delay the accompaniment audio by a time interval corresponding to the data transmission delay, and then mix the delayed accompaniment audio with the singing audio using the audio mixing algorithm.
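The delay-then-mix step described above can be sketched as follows, using the sum-and-clamp variant on floating-point samples in [-1, 1]; the function name, the sample format, and the assumption that the delay has already been converted to a whole number of samples are all illustrative.

```python
import numpy as np

def integrate_song(accompaniment, singing, delay_samples):
    """Delay the accompaniment by the measured transmission delay, then mix
    the two tracks by summing and clamping to the valid sample range."""
    delayed = np.concatenate([np.zeros(delay_samples), accompaniment])
    n = max(len(delayed), len(singing))
    mix = np.zeros(n)
    mix[:len(delayed)] += delayed
    mix[:len(singing)] += singing
    return np.clip(mix, -1.0, 1.0)  # sum-and-clamp mixing
```

Clamping keeps loud passages from wrapping around, at the cost of hard distortion when both tracks peak together; weighted-average mixing trades that distortion for lower overall level.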
The audio data processing method provided by this embodiment first sends the accompaniment audio of the target song to the audio playing device, where the accompaniment audio is used to trigger the audio playing device to play the accompaniment audio and collect the singing audio of the environment where the audio playing device is located. Then, the singing audio sent by the audio playing device is received, and song integration is performed on the accompaniment audio and the singing audio based on the predetermined data transmission delay. When recording a song, the user's voice can be collected through the audio playing device, so the user can sing at the audio playing device side without being close to the terminal device, which expands the recording range and improves the user experience. In addition, because the terminal device integrates the song based on the data transmission delay, the singing audio and the accompaniment audio can be synchronized, which improves the accuracy of the audio integration and further improves the user experience.
In some embodiments, the target song may be determined by any one of the following first to second items.
The first item is used for receiving first operation information sent by the audio playing device and determining a target song based on the first operation information.
The first operation information may include song information of the target song, such as a song title, a singer, and the like.
In practice, the terminal device may receive the first operation information sent by the audio playing device through a data transmission channel established with the audio playing device, for example, a BLE channel. Then, the terminal device may search for a target song corresponding to the first operation information from a pre-established operation information-song correspondence table by using the first operation information. The operation information-song correspondence table may be a correspondence table that is pre-established and stores a plurality of operation information-song correspondences.
And the second item is used for determining the target song according to second operation information corresponding to the second singing confirmation operation when a second singing confirmation operation for the terminal device is detected.
The second singing confirmation operation is an operation of confirming singing performed by the user on the terminal device. As an example, the second singing confirmation operation may be the user inputting a singing confirmation instruction in the terminal device interface.
In practice, the terminal device may detect instructions input by the user in the terminal device interface, and when a second singing confirmation operation for the terminal device is detected, the terminal device may use the second operation information corresponding to the second singing confirmation operation to search a pre-established operation information-song correspondence table for the target song corresponding to that information. The operation information-song correspondence table may be a pre-established correspondence table storing a plurality of operation information-song correspondences.
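The table lookup described above can be sketched as a simple keyed mapping; the table contents, the key normalization, and the names are hypothetical, since the application does not specify the table's format.

```python
# Hypothetical operation information-song correspondence table; the entries
# and the key format are illustrative only.
SONG_TABLE = {
    "song a - artist a": {"title": "Song A", "artist": "Artist A"},
    "song b - artist b": {"title": "Song B", "artist": "Artist B"},
}

def find_target_song(operation_info: str):
    """Normalize the operation information and look up the target song;
    returns None when no matching entry exists."""
    key = operation_info.strip().lower()
    return SONG_TABLE.get(key)
```

The same lookup serves both entry points: first operation information arriving from the audio playing device and second operation information entered at the terminal device.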
According to the audio data processing method provided by this embodiment, the terminal device can determine the target song either from the first operation information sent by the audio playing device or from the second operation information input by the user, so the user can control the terminal device to determine the target song as needed, which expands the applicable scenarios of song recording.
Referring to fig. 5, fig. 5 is a block diagram of an audio data processing apparatus 500 according to an embodiment of the present application, and in practice, the audio data processing apparatus 500 may be applied to an audio playing device. As shown in fig. 5, the audio data processing apparatus 500 may include an audio collecting unit 501 and an audio transmitting unit 502.
The audio acquisition unit 501 is configured to play accompaniment audio and acquire singing audio of an environment where the audio playing device is located when the accompaniment audio of the target song sent by the terminal device is received;
the audio sending unit 502 is configured to send a singing audio to the terminal device, where the singing audio is used to trigger the terminal device to perform song integration on the accompaniment audio and the singing audio based on a predetermined data transmission delay.
In some embodiments, the data transmission delay is determined by:
the terminal equipment sends detection audio to the audio playing equipment;
when the audio playing device receives the detection audio, the audio playing device sends feedback audio to the terminal device;
and when the terminal equipment receives the feedback audio, determining the data transmission time delay according to the receiving time of the feedback audio and the sending time of the detection audio.
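A minimal sketch of the round-trip measurement described in the three steps above. Halving the round-trip time assumes the forward and return paths take equal time, which the embodiment does not state explicitly; the millisecond units and the function name are likewise illustrative.

```python
def estimate_transmission_delay(send_time_ms: float, receive_time_ms: float) -> float:
    """Estimate the one-way data transmission delay from the send time of the
    detection audio and the receive time of the feedback audio.
    Assumes symmetric paths, so one-way delay = round trip / 2."""
    round_trip_ms = receive_time_ms - send_time_ms
    if round_trip_ms < 0:
        raise ValueError("feedback received before detection audio was sent")
    return round_trip_ms / 2.0
```

In practice several probes would likely be averaged to smooth out jitter before the result is used to align the accompaniment and singing tracks.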
In some embodiments, the terminal device and the audio playing device perform data transmission through a pre-established data transmission channel, and the data transmission channel includes a first transmission channel and a second transmission channel;
the first transmission channel is used for transmitting data to be transmitted in the terminal equipment to the audio playing equipment, and the second transmission channel is used for transmitting the data to be transmitted in the audio playing equipment to the terminal equipment.
In some embodiments, the apparatus further includes a noise filtering unit (not shown in the figure) for performing noise filtering processing on the collected singing audio and replacing the singing audio with the noise-filtered audio.
In some embodiments, the apparatus further includes an information sending unit (not shown in the figure) configured to, when a first singing confirmation operation for the audio playback device is detected, send first operation information corresponding to the first singing confirmation operation to the terminal device, where the first operation information is used to trigger the terminal device to determine the target song based on the first operation information.
The audio data processing device provided by this embodiment first plays the accompaniment audio and collects the singing audio of the environment where the audio playing device is located when the accompaniment audio of the target song sent by the terminal device is received. Then, the singing audio is sent to the terminal device, where the singing audio is used to trigger the terminal device to perform song integration on the accompaniment audio and the singing audio based on the predetermined data transmission delay. When recording a song, the user's voice can be collected through the audio playing device, so the user can sing at the audio playing device side without being close to the terminal device, which expands the recording range and improves the user experience. In addition, because the terminal device integrates the song based on the data transmission delay, the singing audio and the accompaniment audio can be synchronized, which improves the accuracy of the audio integration and further improves the user experience.
It should be noted that, for the information interaction, execution process and other contents between the above-mentioned devices/units, the specific functions and technical effects thereof are based on the same concept as the method embodiment of the audio playing device side in the present application, and reference may be made to the part of the method embodiment of the audio playing device side specifically, and details are not described here again.
Referring to fig. 6, fig. 6 is a block diagram of an audio data processing apparatus 600 according to another embodiment of the present application, and in practice, the audio data processing apparatus 600 may be applied to a terminal device. As shown in fig. 6, the audio data processing device 600 may include an accompaniment transmission unit 601 and a song integration unit 602.
The accompaniment sending unit 601 is configured to send accompaniment audio of a target song to an audio playing device, where the accompaniment audio is used to trigger the audio playing device to play the accompaniment audio and collect singing audio of an environment where the audio playing device is located;
the song integration unit 602 is configured to receive the singing audio sent by the audio playing device, and perform song integration on the accompaniment audio and the singing audio based on a predetermined data transmission delay.
In some embodiments, the target song is determined by any one of:
receiving first operation information sent by audio playing equipment, and determining a target song based on the first operation information;
and when a second singing confirmation operation for the terminal device is detected, determining the target song according to second operation information corresponding to the second singing confirmation operation.
In some embodiments, the data transmission delay is determined by:
the terminal equipment sends detection audio to the audio playing equipment;
when the audio playing device receives the detection audio, the audio playing device sends feedback audio to the terminal device;
and when the terminal equipment receives the feedback audio, determining the data transmission time delay according to the receiving time of the feedback audio and the sending time of the detection audio.
In some embodiments, the terminal device and the audio playing device perform data transmission through a pre-established data transmission channel, and the data transmission channel includes a first transmission channel and a second transmission channel;
the first transmission channel is used for transmitting data to be transmitted in the terminal equipment to the audio playing equipment, and the second transmission channel is used for transmitting the data to be transmitted in the audio playing equipment to the terminal equipment.
The audio data processing apparatus provided by this embodiment first sends the accompaniment audio of the target song to the audio playing device, where the accompaniment audio is used to trigger the audio playing device to play the accompaniment audio and collect the singing audio of the environment where the audio playing device is located. Then, the singing audio sent by the audio playing device is received, and song integration is performed on the accompaniment audio and the singing audio based on the predetermined data transmission delay. When recording a song, the user's voice can be collected through the audio playing device, so the user can sing at the audio playing device side without being close to the terminal device, which expands the recording range and improves the user experience. In addition, because the terminal device integrates the song based on the data transmission delay, the singing audio and the accompaniment audio can be synchronized, which improves the accuracy of the audio integration and further improves the user experience.
It should be noted that, for the information interaction, execution process and other contents between the above-mentioned devices/units, the specific functions and technical effects thereof are based on the same concept as the method embodiment of the terminal device side in the present application; reference may be made to the part of the method embodiment of the terminal device side specifically, and details are not described here again.
Referring to fig. 7, fig. 7 is a block diagram of an electronic device 700 according to an embodiment of the present disclosure, where the electronic device 700 of the embodiment includes: at least one processor 701 (only one processor is shown in fig. 7), a memory 702, and a computer program 703, such as an audio data processing program, stored in the memory 702 and executable on the at least one processor 701. The steps in the embodiments of the respective audio data processing methods described above are implemented when the processor 701 executes the computer program 703. The processor 701 executes the computer program 703 to implement the functions of the modules/units in the above-described embodiments of the apparatus, for example, the functions of the audio acquisition unit 501 to the audio transmission unit 502 shown in fig. 5, or the functions of the accompaniment transmission unit 601 and the song integration unit 602 shown in fig. 6.
Illustratively, the computer program 703 may be divided into one or more units, which are stored in the memory 702 and executed by the processor 701 to accomplish the present application. One or more of the elements may be a series of computer program instruction segments capable of performing certain functions, which are used to describe the execution of the computer program 703 in the electronic device 700. For example, the computer program 703 may be divided into an audio acquisition unit, an audio transmission unit, an accompaniment transmission unit, and a song integration unit, and specific functions of each unit are described in the above embodiments, and are not described herein again.
The electronic device 700 may be a computing device such as a desktop computer, a tablet computer, a cloud server, or a mobile terminal. The electronic device 700 may include, but is not limited to, the processor 701 and the memory 702. Those skilled in the art will appreciate that fig. 7 is merely an example of the electronic device 700 and does not constitute a limitation of the electronic device 700, which may include more or fewer components than shown, or combine some components, or have different components; for example, the electronic device may also include input-output devices, network access devices, buses, and the like.
The processor 701 may be a Central Processing Unit (CPU), another general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 702 may be an internal storage unit of the electronic device 700, such as a hard disk or a memory of the electronic device 700. The memory 702 may also be an external storage device of the electronic device 700, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the electronic device 700. Alternatively, the memory 702 may include both an internal storage unit and an external storage device of the electronic device 700. The memory 702 is used for storing the computer program and other programs and data required by the electronic device 700. The memory 702 may also be used to temporarily store data that has been output or is to be output.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as an independent product, may be stored in a computer-readable storage medium. Based on such understanding, all or part of the processes in the methods of the above embodiments may be implemented by a computer program instructing related hardware; the computer program may be stored in a computer-readable storage medium, and when executed by a processor, implements the steps of the above method embodiments. The computer program includes computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be appropriately increased or decreased as required by legislation and patent practice in the relevant jurisdiction; for example, in some jurisdictions, in accordance with legislation and patent practice, the computer-readable medium does not include electrical carrier signals and telecommunications signals.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
The above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the embodiments of the present application, and they should be construed as being included in the present application.

Claims (10)

1. A method of audio data processing, the method comprising:
when accompaniment audio of a target song sent by terminal equipment is received, playing the accompaniment audio and collecting singing audio of an environment where audio playing equipment is located;
and sending the singing audio to the terminal equipment, wherein the singing audio is used for triggering the terminal equipment to carry out song integration on the accompaniment audio and the singing audio based on a predetermined data transmission time delay.
2. The audio data processing method according to claim 1, wherein the data transmission delay is determined by:
the terminal equipment sends detection audio to the audio playing equipment;
when the audio playing device receives the detection audio, the audio playing device sends feedback audio to the terminal device;
and when the terminal equipment receives the feedback audio, determining the data transmission delay according to the receiving time of the feedback audio and the sending time of the detection audio.
3. The audio data processing method according to claim 1, wherein the terminal device and the audio playing device perform data transmission through a pre-established data transmission channel, and the data transmission channel includes a first transmission channel and a second transmission channel;
the first transmission channel is used for transmitting the data to be transmitted in the terminal equipment to the audio playing equipment, and the second transmission channel is used for transmitting the data to be transmitted in the audio playing equipment to the terminal equipment.
4. The audio data processing method of claim 1, wherein before transmitting the singing audio to the terminal device, the method further comprises:
and carrying out noise filtering processing on the collected singing audio, and replacing the singing audio with the noise-filtered audio.
5. The audio data processing method according to any one of claims 1 to 4, characterized in that the method further comprises:
when a first singing confirmation operation aiming at the audio playing equipment is detected, first operation information corresponding to the first singing confirmation operation is sent to the terminal equipment, wherein the first operation information is used for triggering the terminal equipment to determine the target song based on the first operation information.
6. A method of audio data processing, the method comprising:
sending accompaniment audio of a target song to audio playing equipment, wherein the accompaniment audio is used for triggering the audio playing equipment to play the accompaniment audio and collecting singing audio of an environment where the audio playing equipment is located;
and receiving the singing audio sent by the audio playing equipment, and performing song integration on the accompaniment audio and the singing audio based on predetermined data transmission time delay.
7. The audio data processing method of claim 6, wherein the target song is determined by any one of:
receiving first operation information sent by the audio playing equipment, and determining the target song based on the first operation information;
and when a second singing confirmation operation for the terminal device is detected, determining the target song according to second operation information corresponding to the second singing confirmation operation.
8. An audio data processing apparatus, characterized by comprising:
the audio acquisition unit is used for playing the accompaniment audio and acquiring the singing audio of the environment where the audio playing equipment is located when the accompaniment audio of the target song sent by the terminal equipment is received;
and the audio sending unit is used for sending the singing audio to the terminal equipment, wherein the singing audio is used for triggering the terminal equipment to carry out song integration on the accompaniment audio and the singing audio based on a predetermined data transmission time delay.
9. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the audio data processing method according to any of claims 1-7 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the audio data processing method according to any one of claims 1 to 7.
CN202211316116.7A 2022-10-26 2022-10-26 Audio data processing method and device, electronic equipment and storage medium Pending CN115631738A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211316116.7A CN115631738A (en) 2022-10-26 2022-10-26 Audio data processing method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211316116.7A CN115631738A (en) 2022-10-26 2022-10-26 Audio data processing method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115631738A true CN115631738A (en) 2023-01-20

Family

ID=84906466

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211316116.7A Pending CN115631738A (en) 2022-10-26 2022-10-26 Audio data processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115631738A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116168712A (en) * 2023-02-23 2023-05-26 广州趣研网络科技有限公司 Audio delay cancellation method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
JP6811758B2 (en) Voice interaction methods, devices, devices and storage media
CN108510982B (en) Audio event detection method and device and computer readable storage medium
CN112130616B (en) Clock synchronization method, device and storage medium
US11244686B2 (en) Method and apparatus for processing speech
CN109195090B (en) Method and system for testing electroacoustic parameters of microphone in product
CN113127609A (en) Voice control method, device, server, terminal equipment and storage medium
CN110086549A (en) Audio data transmission method and device
CN104464743B (en) Method for playing background music in voice chat room and mobile terminal
KR100783113B1 (en) Method for shortened storing of music file in mobile communication terminal
CN115631738A (en) Audio data processing method and device, electronic equipment and storage medium
CN112687286A (en) Method and device for adjusting noise reduction model of audio equipment
KR100504141B1 (en) Apparatus and method for having a three-dimensional surround effect in potable terminal
CN112053669B (en) Method, device, equipment and medium for eliminating human voice
CN101888430A (en) Method and system capable of inducing incoming call of mobile phone and performing reminding processing
CN105893430A (en) Lyrics matching method and device
CN104851441A (en) Method of realizing karaoke, device and home audio
CN104104997A (en) Television silent starting control method, device and system
CN111161734A (en) Voice interaction method and device based on designated scene
CN111556406B (en) Audio processing method, audio processing device and earphone
CN110970032A (en) Sound box voice interaction control method and device
CN113766385B (en) Earphone noise reduction method and device
CN112883222B (en) Song recommendation playing method, electronic equipment and computer readable storage medium
CN113517000A (en) Echo cancellation test method, terminal and storage device
CN110246517B (en) Radio station music identification method, vehicle-mounted system and vehicle
CN113707128A (en) Test method and system for full-duplex voice interaction system

Legal Events

Date Code Title Description
PB01 Publication