CN116030778A - Audio data processing method, device, computer equipment and storage medium - Google Patents


Info

Publication number: CN116030778A
Application number: CN202211632587.9A
Authority: CN (China)
Prior art keywords: sound, sound signal, song, signal, audio
Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Other languages: Chinese (zh)
Inventors: 吴楠 (Wu Nan), 党晓妍 (Dang Xiaoyan)
Current and original assignee: Weilai Automobile Technology Anhui Co Ltd
Application filed by Weilai Automobile Technology Anhui Co Ltd
Priority to CN202211632587.9A, published as CN116030778A

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The present application relates to an audio data processing method, apparatus, computer device, storage medium and computer program product. The terminal acquires a sound signal from the environment; when the sound signal is determined to belong to the target sound category, the terminal identifies the song track corresponding to the sound signal, acquires the accompaniment music of that song track, and plays the audio obtained by mixing the accompaniment music with the sound signal. The karaoke mode is thus entered automatically, which simplifies the cumbersome karaoke workflow of the conventional technology and broadens the range of scenarios in which the karaoke mode can be applied.

Description

Audio data processing method, device, computer equipment and storage medium
Technical Field
The present application relates to the field of information processing technology, and in particular to an audio data processing method, apparatus, computer device, storage medium and computer program product.
Background
With the continuous development of science and technology, smart devices are becoming more and more widespread. For example, singing karaoke through a smart device has become a common form of entertainment in daily life.
In the conventional technology, when a user sings karaoke with a smart device, the user usually has to manually open a karaoke application on the device and connect a handheld microphone before entering the karaoke mode.
However, this karaoke workflow is cumbersome, which greatly limits where karaoke can be used, especially in scenarios where manual operation is inconvenient; for example, a user cannot operate a karaoke application normally while driving.
Disclosure of Invention
Based on this, in order to solve the above technical problem of the cumbersome karaoke workflow in the conventional technology, it is necessary to provide an audio data processing method, apparatus, computer device, computer-readable storage medium and computer program product that simplify the karaoke operation.
In a first aspect, the present application provides an audio data processing method. The method comprises the following steps:
acquiring a sound signal from the environment;
when the sound signal is determined to belong to a target sound category, identifying the song track corresponding to the sound signal;
acquiring accompaniment music of the song track;
and playing the audio obtained by mixing the accompaniment music with the sound signal.
In one embodiment, identifying the song track corresponding to the sound signal when the sound signal is determined to belong to the target sound category includes: returning a karaoke prompt when the sound signal is determined to belong to the target sound category; and identifying the song track corresponding to the sound signal when a confirmation instruction for the karaoke prompt is received.
In one embodiment, the karaoke prompt includes at least one of text information and voice information.
In one embodiment, the method further comprises: detecting sound signals in the environment in real time; when a sound signal is detected, performing feature extraction on the sound signal to obtain corresponding sound features; and inputting the sound features into a preset classification neural network to obtain the sound category of the sound signal.
In one embodiment, the method further comprises: detecting sound signals in the environment in real time; when a sound signal is detected, performing feature extraction on the sound signal to obtain corresponding sound features; and determining that the sound signal belongs to the target sound category when the sound features match preset song voiceprint features.
In one embodiment, playing the audio obtained by mixing the accompaniment music with the sound signal includes: performing echo cancellation on the sound signal to obtain a target sound signal; mixing the target sound signal with the accompaniment music to obtain the mixed audio; and playing the mixed audio.
In a second aspect, the present application further provides an audio data processing apparatus. The apparatus comprises:
a signal acquisition module, configured to acquire a sound signal from the environment;
a song identification module, configured to identify the song track corresponding to the sound signal when the sound signal is determined to belong to the target sound category;
a music acquisition module, configured to acquire accompaniment music of the song track;
and a playing module, configured to play the audio obtained by mixing the accompaniment music with the sound signal.
In a third aspect, the present application further provides a computer device. The computer device comprises a memory storing a computer program and a processor which, when executing the computer program, implements the steps of the method according to the first aspect above.
In a fourth aspect, the present application further provides a computer-readable storage medium. The computer-readable storage medium stores a computer program which, when executed by a processor, implements the steps of the method according to the first aspect above.
In a fifth aspect, the present application further provides a computer program product. The computer program product comprises a computer program which, when executed by a processor, implements the steps of the method according to the first aspect above.
According to the above audio data processing method, apparatus, computer device, storage medium and computer program product, the terminal acquires a sound signal from the environment; when the sound signal is determined to belong to the target sound category, it identifies the song track corresponding to the sound signal, acquires the accompaniment music of that song track, and plays the audio obtained by mixing the accompaniment music with the sound signal. The karaoke mode is thus entered automatically, which simplifies the cumbersome karaoke workflow of the conventional technology and broadens the range of scenarios in which the karaoke mode can be applied.
Drawings
FIG. 1 is a flowchart of an audio data processing method in one embodiment;
FIG. 2 is a flowchart of the steps for identifying the sound category in one embodiment;
FIG. 3 is a flowchart of the steps for identifying the sound category in another embodiment;
FIG. 4 is a flowchart of the steps for identifying a song track in one embodiment;
FIG. 5 is a flowchart of the steps for playing the mixed audio in one embodiment;
FIG. 6 is a block diagram of an audio data processing apparatus in one embodiment;
FIG. 7A is a schematic diagram of an audio data processing apparatus in one embodiment;
FIG. 7B is a schematic diagram of an audio data processing apparatus in another embodiment;
FIG. 8 is a flowchart of an audio data processing method in another embodiment;
FIG. 9 is a diagram of the internal structure of a computer device in one embodiment.
Detailed Description
To make the objects, technical solutions and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are for illustration only and are not intended to limit the present application.
In one embodiment, as shown in FIG. 1, an audio data processing method is provided. This embodiment is illustrated with the method applied to a terminal; it should be understood that the method can also be applied to a system comprising a terminal and a server, and be implemented through interaction between the terminal and the server. The terminal can be, but is not limited to, a personal computer, notebook computer, smartphone, tablet computer, Internet of Things device or portable wearable device; the Internet of Things device may be a smart speaker, smart television, smart air conditioner, smart in-vehicle device or the like, and the portable wearable device may be a smart watch, smart bracelet, headset or the like. In this embodiment, the method may include the following steps:
step 102, acquiring sound signals in the environment.
A sound signal is an acoustic wave generated by vibration; it is an information carrier whose wavelength and intensity vary. The environment is the environment in which the terminal device is currently located. In this embodiment, the terminal monitors sound signals in the environment, acquires the monitored sound signal, and processes it through the subsequent steps so as to enter the karaoke mode automatically. Specifically, the terminal can monitor and collect sound signals in the environment in real time through a built-in audio collection module (such as a built-in microphone or microphone array), so that no external handheld microphone is needed, which lowers the hardware requirements of the karaoke scenario.
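The acquisition step can be sketched as follows: a minimal Python frame splitter that groups monitored samples into fixed-size analysis frames for the later detection steps. The frame size is an assumed placeholder value; a real terminal would read these frames from its built-in microphone driver rather than from a list.

```python
FRAME_SIZE = 512  # samples per analysis frame (an assumed value, not from the patent)

def frames_from_stream(samples, frame_size=FRAME_SIZE):
    """Group a stream of audio samples into fixed-size frames for later
    feature extraction; a trailing partial frame is held back, as a
    real-time monitor would simply wait for more samples."""
    frame = []
    for s in samples:
        frame.append(s)
        if len(frame) == frame_size:
            yield list(frame)  # emit a copy so the buffer can be reused
            frame.clear()

# Simulate roughly three frames' worth of monitored environmental samples.
stream = [0.0] * (FRAME_SIZE * 3 + 100)
frames = list(frames_from_stream(stream))  # three complete frames; remainder held back
```

Each yielded frame would then be passed to the humming-detection stage described below.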
Step 104, when the sound signal is determined to be the target sound category, identifying the song track corresponding to the sound signal.
Sound categories are categories into which sound signals are classified; in this embodiment, the sound categories include a song category and a non-song category. The target sound category is a preset sound category to be processed; in this embodiment, the target sound category is the song category. The song track is the title of the song or musical composition.
Specifically, the terminal identifies the sound category of the sound signal obtained in the previous step, and when that category is determined to be the target sound category, identifies the song track corresponding to the sound signal, i.e. the name of the song or musical piece. Recognition therefore does not need to be triggered by the user, which broadens the application range of the karaoke scenario and makes it more convenient to use.
Step 106, acquiring accompaniment music of the song track.
Accompaniment music refers to the instrumental performance that accompanies singing, i.e. the backing music. In this embodiment, after identifying the song track corresponding to the sound signal, the terminal further acquires the accompaniment music of that song track.
Step 108, playing the audio after mixing the accompaniment music and the sound signal.
Specifically, after acquiring the accompaniment music of the song track, the terminal enters the karaoke mode and plays the audio obtained by mixing the accompaniment music with the sound signal.
In this audio data processing method, the terminal acquires a sound signal from the environment; when the sound signal is determined to belong to the target sound category, it identifies the song track corresponding to the sound signal, acquires the accompaniment music of that song track, and plays the audio obtained by mixing the accompaniment music with the sound signal. The karaoke mode is thus entered automatically, which simplifies the cumbersome karaoke workflow of the conventional technology and broadens the range of scenarios in which the karaoke mode can be applied.
In one embodiment, as shown in fig. 2, the method may further include the following steps:
step 202, detecting a sound signal in an environment in real time.
Because the sound signal is a sound wave signal generated by vibration, the terminal can monitor the sound wave signal in the environment through the built-in audio acquisition module, so that the sound signal in the environment is detected in real time.
Step 204, when a sound signal is detected, performing feature extraction on the sound signal to obtain the corresponding sound features.
The sound features include pitch, loudness, timbre and the like. Since the sound signal is an acoustic wave generated by vibration, and an acoustic wave is a carrier whose wavelength and intensity vary, pitch is determined by the vibration frequency of the wave: the faster the vibration, the higher the pitch; the slower the vibration, the lower the pitch. Loudness is determined by the amplitude of the wave: the larger the amplitude, the greater the loudness. Timbre is determined by the waveform of the wave.
Specifically, when the terminal detects a sound signal, it performs feature extraction on the signal, for example extracting features such as pitch, loudness and timbre, to obtain the corresponding sound features.
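A very small illustrative feature extractor in the spirit of this step: loudness computed as RMS amplitude (which tracks the wave's amplitude) and a crude pitch estimate from the zero-crossing rate (which tracks the vibration frequency). The function and field names are assumptions for illustration; a production system would typically use spectral features instead.

```python
import math

def extract_features(samples, sample_rate):
    """Minimal sketch of feature extraction: loudness as RMS amplitude,
    pitch as a zero-crossing-rate estimate (two crossings per cycle)."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if a * b < 0)
    duration = len(samples) / sample_rate
    pitch_hz = crossings / (2 * duration)  # each cycle crosses zero twice
    return {"loudness": rms, "pitch_hz": pitch_hz}

# One second of a pure 440 Hz tone at a 16 kHz sample rate.
sr = 16000
tone = [math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]
feats = extract_features(tone, sr)  # pitch near 440 Hz, loudness near 1/sqrt(2)
```

The zero-crossing estimate only works for roughly periodic signals such as sustained humming; that limitation is why the patent's later embodiments delegate classification to a neural network or voiceprint comparison.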
Step 206, inputting the sound characteristics into a preset classification neural network to obtain the sound category of the sound signal.
Wherein the classification neural network is a pre-trained network that classifies sound signals based on sound categories, which in this embodiment include song categories and non-song categories.
Specifically, the terminal inputs the acquired sound characteristics into a preset classification neural network, so that a sound category corresponding to a sound signal output by the classification neural network can be obtained.
In the above embodiment, the terminal detects sound signals in the environment in real time; when a sound signal is detected, it performs feature extraction to obtain the corresponding sound features and inputs those features into a preset classification neural network to obtain the sound category of the signal, thereby determining whether the sound signal belongs to the target sound category, i.e. whether it is of the song category. Since this embodiment determines the sound category by running the sound features through a classification neural network, the accuracy of sound category identification can be improved.
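The classification step can be sketched as a minimal logistic model over the extracted features. The weights below are made-up placeholders standing in for a trained classification network; the feature scaling and threshold are likewise assumptions, not the patent's implementation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Placeholder weights standing in for a pre-trained classification network.
# Input features: [pitch_hz / 1000, loudness]; output: probability of "song".
WEIGHTS = [2.0, 3.0]
BIAS = -2.0

def classify(features):
    """Return ('song' or 'non-song', probability) for a feature vector."""
    z = sum(w * x for w, x in zip(WEIGHTS, features)) + BIAS
    p = sigmoid(z)
    return ("song" if p > 0.5 else "non-song", p)

# A sustained 440 Hz tone with high loudness scores as "song" here;
# a quiet, near-zero-pitch signal does not.
label, p = classify([0.44, 0.7])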
In one embodiment, as shown in fig. 3, the method may further include the following steps:
step 302, detecting an acoustic signal in an environment in real time.
Since the sound signal is an acoustic wave signal generated by vibration, the terminal detects the sound signal in the environment in real time by listening to the acoustic wave signal in the environment.
Step 304, when a sound signal is detected, performing feature extraction on the sound signal to obtain the corresponding sound features.
The sound features include pitch, loudness, timbre, etc. Specifically, when the terminal detects a sound signal, feature extraction is performed on the sound signal, for example, features such as pitch, loudness, tone, and the like of the sound signal are extracted to obtain corresponding sound features.
Step 306, when the sound characteristic matches with the preset song voiceprint characteristic, determining the sound signal as the target sound category.
The song voiceprint features may be an acoustic spectrum carrying song information, and the sound features are spectral features corresponding to the vibration frequency, amplitude, waveform and the like of the acoustic signal. The terminal therefore compares the sound features of the detected sound signal with the preset song voiceprint features: when they match, the sound signal can be determined to belong to the target sound category, i.e. the song category; when they do not match, the sound signal can be determined not to belong to the target sound category.
In the above embodiment, the terminal detects sound signals in the environment in real time; when a sound signal is detected, it performs feature extraction to obtain the corresponding sound features, and determines that the sound signal belongs to the target sound category when those features match the preset song voiceprint features. By comparing the sound features of the signal with preset song voiceprint features, this embodiment can determine quickly and accurately whether a sound signal belongs to the target sound category.
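The voiceprint comparison can be sketched as a similarity test between two spectral feature vectors. Cosine similarity and the 0.9 threshold are common illustrative choices here, not something the patent specifies.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def matches_song_voiceprint(features, reference, threshold=0.9):
    """Treat the signal as the target (song) category when its spectral
    feature vector is close enough to a stored song voiceprint."""
    return cosine_similarity(features, reference) >= threshold

# Hypothetical stored voiceprint and two candidate feature vectors.
ref = [0.9, 0.1, 0.4, 0.6]
is_song = matches_song_voiceprint(ref, ref)            # identical: matches
is_other = matches_song_voiceprint([1.0, 0.0, 0.0, 0.0],
                                   [0.0, 1.0, 0.0, 0.0])  # orthogonal: no match
```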
In one embodiment, as shown in FIG. 4, identifying the song track corresponding to the sound signal in step 104, when the sound signal is determined to belong to the target sound category, may specifically include the following steps:
Step 402, returning a karaoke prompt when the sound signal is determined to belong to the target sound category.
The karaoke prompt includes at least one of text information and voice information, and asks whether to enter the karaoke mode. Specifically, when the terminal determines that the sound signal belongs to the target sound category, i.e. the song category, it may return a karaoke prompt asking whether to enter the karaoke mode. The prompt can be displayed on the terminal as text, or broadcast by voice while the text prompt is displayed.
Step 404, identifying the song track corresponding to the sound signal when a confirmation instruction for the karaoke prompt is received.
The confirmation instruction is the user's feedback on the karaoke prompt; specifically, it may be an instruction or command agreeing to enter the karaoke mode.
In this embodiment, when the terminal receives a confirmation instruction for the karaoke prompt, it proceeds to identify the song track corresponding to the sound signal.
In the above embodiment, when the terminal determines that the sound signal belongs to the target sound category, it returns a karaoke prompt to actively ask whether to enter the karaoke mode, and identifies the song track corresponding to the sound signal only after receiving a confirmation instruction for the prompt. Because the terminal prompts actively and enters the karaoke mode only according to the feedback on that prompt, the power consumption of the terminal can be reduced and the flexibility of entering the karaoke mode improved.
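The control flow of steps 402 and 404 can be sketched as a small function: prompt only for the target category, and run track recognition only after the user confirms. The callback names (`prompt_user`, `recognize_track`) are illustrative assumptions.

```python
def handle_sound(sound_class, prompt_user, recognize_track):
    """Sketch of steps 402-404: return the recognized track only when the
    signal is of the song category AND the user confirms the karaoke prompt."""
    if sound_class != "song":
        return None                       # not the target category: do nothing
    if not prompt_user("Enter karaoke mode?"):
        return None                       # user declined the karaoke prompt
    return recognize_track()              # confirmed: identify the song track

result = handle_sound("song", lambda msg: True, lambda: "Track A")    # confirmed
declined = handle_sound("song", lambda msg: False, lambda: "Track A")  # declined
ignored = handle_sound("speech", lambda msg: True, lambda: "Track A")  # not a song
```

Deferring `recognize_track` until after confirmation mirrors the power-saving rationale stated above: recognition never runs unless the user opts in.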
In one embodiment, as shown in fig. 5, in step 108, playing audio obtained by mixing accompaniment music with a sound signal may specifically include the following steps:
step 502, echo cancellation processing is performed on the sound signal to obtain a target sound signal.
Echo refers to the reflected acoustic wave produced when a sound wave is reflected during propagation. Echo cancellation refers to filtering the reflected waves out of the acoustic signal. The target sound signal is the sound signal after the reflected sound has been filtered out.
In this embodiment, the terminal performs echo cancellation processing on the detected sound signal, that is, filters out the reflected sound wave in the sound signal, thereby obtaining the target sound signal.
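One standard way to realize this echo-cancellation step is a normalized-LMS (NLMS) adaptive filter that estimates the echo of a reference signal (e.g. loudspeaker playback) present in the microphone signal and subtracts it; the patent does not specify an algorithm, so this is an illustrative sketch with assumed parameters.

```python
import random

def nlms_echo_cancel(mic, ref, taps=8, mu=0.5, eps=1e-8):
    """Adaptively estimate the echo of `ref` contained in `mic` with an
    NLMS FIR filter and return the residual (echo-cancelled) signal."""
    w = [0.0] * taps
    out = []
    for n in range(len(mic)):
        # Most recent `taps` reference samples, newest first (zeros before start).
        x = [ref[n - k] if n - k >= 0 else 0.0 for k in range(taps)]
        est = sum(wi * xi for wi, xi in zip(w, x))   # estimated echo
        err = mic[n] - est                           # residual = near-end signal
        norm = sum(xi * xi for xi in x) + eps
        w = [wi + mu * err * xi / norm for wi, xi in zip(w, x)]
        out.append(err)
    return out

# Simulated setup: the microphone picks up only an attenuated, delayed echo
# of the reference, so after adaptation the residual should approach zero.
random.seed(0)
ref = [random.uniform(-1, 1) for _ in range(2000)]
mic = [0.5 * ref[n - 3] if n >= 3 else 0.0 for n in range(2000)]
residual = nlms_echo_cancel(mic, ref)
```

In the apparatus of FIG. 7A this role belongs to the signal processing module, which runs before both humming detection and mixing.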
Step 504, the target sound signal and the accompaniment music are subjected to sound mixing processing, and the audio after sound mixing is obtained.
Mixing refers to integrating sounds from multiple sources into a stereo or mono track. In this embodiment, the terminal mixes the target sound signal, from which reflected sound has been filtered, with the corresponding accompaniment music to obtain the mixed audio.
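The mixing operation itself can be sketched as a per-sample weighted sum with clamping to the valid amplitude range. The gain values are illustrative defaults, not specified by the patent.

```python
def mix(vocal, accompaniment, vocal_gain=1.0, music_gain=0.6):
    """Mix the (echo-cancelled) vocal with accompaniment sample by sample,
    clamping to [-1, 1] so loud passages do not overflow the output range."""
    mixed = []
    for v, m in zip(vocal, accompaniment):
        s = vocal_gain * v + music_gain * m
        mixed.append(max(-1.0, min(1.0, s)))
    return mixed

out = mix([0.5, -0.9, 0.2], [0.5, -0.5, 0.1])  # middle sample clamps to -1.0
```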
Step 506, play the audio.
Specifically, the terminal plays the mixed audio, i.e. the sound signal hummed by the user mixed with the accompaniment music of the identified song track, thereby realizing karaoke.
In the above embodiment, the terminal performs echo cancellation on the sound signal to obtain the target sound signal, mixes the target sound signal with the accompaniment music to obtain the mixed audio, and plays the mixed audio, thereby realizing karaoke.
It should be understood that, although the steps in the flowcharts of the above embodiments are shown sequentially as indicated by the arrows, they are not necessarily executed in that order. Unless explicitly stated herein, the order of execution is not strictly limited, and the steps may be executed in other orders. Moreover, at least some of the steps in these flowcharts may comprise multiple sub-steps or stages, which are not necessarily executed at the same time but may be executed at different times, and which are not necessarily executed sequentially but may be executed in turn or in alternation with at least some of the sub-steps or stages of other steps.
Based on the same inventive concept, an embodiment of the present application further provides an audio data processing apparatus for implementing the above audio data processing method. The implementation of the solution provided by this apparatus is similar to that described for the method above, so for the specific limitations of the one or more audio data processing apparatus embodiments provided below, reference may be made to the limitations of the audio data processing method above, which are not repeated here.
In one embodiment, as shown in FIG. 6, there is provided an audio data processing apparatus comprising: a signal acquisition module 602, a song identification module 604, a music acquisition module 606 and a playing module 608, wherein:
the signal acquisition module 602 is configured to acquire a sound signal from the environment;
the song identification module 604 is configured to identify the song track corresponding to the sound signal when the sound signal is determined to belong to the target sound category;
the music acquisition module 606 is configured to acquire accompaniment music of the song track;
and the playing module 608 is configured to play the audio obtained by mixing the accompaniment music with the sound signal.
In one embodiment, the song identification module is further configured to: return a karaoke prompt when the sound signal is determined to belong to the target sound category; and identify the song track corresponding to the sound signal when a confirmation instruction for the karaoke prompt is received.
In one embodiment, the karaoke prompt includes at least one of text information and voice information.
In one embodiment, the apparatus further comprises a sound category identification module configured to: detect sound signals in the environment in real time; when a sound signal is detected, perform feature extraction on the sound signal to obtain corresponding sound features; and input the sound features into a preset classification neural network to obtain the sound category of the sound signal.
In one embodiment, the sound category identification module is further configured to: detect sound signals in the environment in real time; when a sound signal is detected, perform feature extraction on the sound signal to obtain corresponding sound features; and determine that the sound signal belongs to the target sound category when the sound features match preset song voiceprint features.
In one embodiment, the playing module is further configured to: perform echo cancellation on the sound signal to obtain a target sound signal; mix the target sound signal with the accompaniment music to obtain the mixed audio; and play the mixed audio.
In one embodiment, to further illustrate the working principle of the audio data processing apparatus, a schematic diagram of the apparatus is shown in FIG. 7A. It specifically includes a microphone or microphone array, a signal processing module, a humming detection module, a logic control module, a karaoke mixing module, a power amplification module, a speaker and a karaoke application module. The microphone or microphone array may be configured in the signal acquisition module of FIG. 6; the karaoke mixing module, signal processing module, power amplification module and speaker may be configured in the playing module of FIG. 6; the humming detection module and logic control module may be configured in the song identification module of FIG. 6; and the karaoke application module may be configured in the music acquisition module of FIG. 6.
Specifically, the audio data processing apparatus collects sound signals in the environment through the microphone or microphone array. The signal processing module performs echo cancellation, noise reduction and other processing on the collected sound signal, and the processed signal is fed both to the humming detection module and to the karaoke mixing module. The humming detection module detects the sound category of the signal, i.e. whether the user is humming or singing, and feeds the detection result to the logic control module, which acts accordingly. For example, if the detection result indicates that the sound category of the signal is not the target sound category, i.e. the user is not humming or singing, the logic control module does nothing. If the detection result indicates that the sound category is the target sound category, i.e. the user is humming or singing, the logic control module identifies the song track corresponding to the sound signal and starts the karaoke application module. The karaoke application module acquires the corresponding accompaniment music according to the song track, and the power amplification module amplifies the mix of the accompaniment music and the signal processed by the signal processing module and plays it through the speaker. This not only simplifies the cumbersome karaoke workflow of the conventional technology but also broadens the range of karaoke scenarios; for example, it can be applied while driving, so that the user can sing karaoke while driving.
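The module chain of FIG. 7A can be sketched as a small pipeline function wired with callables. All module names and the stub implementations below are illustrative assumptions; each stub stands in for the corresponding hardware or software module.

```python
def karaoke_pipeline(raw_signal, modules):
    """Wire the FIG. 7A modules together: signal processing -> humming
    detection -> track recognition -> karaoke app -> mixing. Returns the
    mixed audio, or None when the logic control decides to do nothing."""
    clean = modules["signal_processing"](raw_signal)       # echo cancel / denoise
    if not modules["humming_detection"](clean):
        return None                                        # not humming: do nothing
    track = modules["recognize_track"](clean)
    accompaniment = modules["karaoke_app"](track)          # fetch accompaniment
    return modules["mixer"](clean, accompaniment)          # feed to amp / speaker

demo = {
    "signal_processing": lambda s: s,                      # pass-through stub
    "humming_detection": lambda s: len(s) > 0,             # stub detector
    "recognize_track": lambda s: "demo-track",             # stub recognizer
    "karaoke_app": lambda t: [0.1] * 4,                    # stub accompaniment
    "mixer": lambda v, a: [x + y for x, y in zip(v, a)],   # simple sum mixer
}
out = karaoke_pipeline([0.2, 0.2, 0.2, 0.2], demo)
silent = karaoke_pipeline([], demo)  # nothing detected: pipeline stops early
```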
In one embodiment, as shown in FIG. 7A and FIG. 7B, the working principle of the audio data processing apparatus is further described below. The structure shown in FIG. 7B adds a speech recognition module, a semantic understanding module and a speech synthesis module to the structure shown in FIG. 7A. The audio data processing method of the present application is further described with reference to the structure of FIG. 7B and, as shown in FIG. 8, includes the following steps:
in step 801, a microphone detects a sound signal in an environment and sends the detected sound signal to an information processing module.
In step 802, the information processing module performs processing such as echo cancellation and noise reduction on the sound signal.
In step 803, the information processing module sends the processed signal to the humming detection module, and the humming detection module detects whether the sound signal is humming.
Step 804, when the sound signal is detected to be humming, the detection result is sent to the logic control module.
Step 805, the logic control module asks the user whether to turn on the karaoke mode.
Step 806, the speech synthesis module processes the query from the logic control module and generates a voice broadcast.
Step 807, the speech recognition module detects a feedback instruction for the voice broadcast.
The feedback instruction may be a confirmation instruction or a cancellation instruction; this example takes the feedback instruction to be a confirmation instruction.
Step 808, the speech recognition module sends the detected feedback instruction to the semantic understanding module.
Step 809, the semantic understanding module interprets the feedback instruction and sends the interpreted information to the logic control module.
Step 810, the logic control module starts the karaoke application module according to the received information.
Step 811, the karaoke application module acquires the corresponding accompaniment music and sends it to the karaoke mixing module.
Step 812, the signal processing module sends the processed sound signal to the karaoke mixing module.
Step 813, the karaoke mixing module mixes the received sound signal with the accompaniment music and outputs the mixed audio to the speaker.
According to this embodiment, the terminal detects the sound signal in the environment; when the signal is detected to be humming, it identifies the corresponding song track and actively asks the user whether to enter the karaoke mode, and acquires the accompaniment music of the song track upon receiving the user's confirmation. The karaoke mode is thus entered automatically even when no handheld microphone is available, which simplifies the cumbersome karaoke workflow of the conventional technology, broadens the range of karaoke scenarios, and gives the user a better karaoke experience.
The respective modules in the above-described audio data processing device may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in, or independent of, a processor of the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can call and execute the operations corresponding to the above modules.
In one embodiment, a computer device is provided, which may be a terminal whose internal structure may be as shown in fig. 9. The computer device includes a processor, a memory, an input/output interface, a communication interface, a display unit, and an input device. The processor, the memory, and the input/output interface are connected through a system bus, while the communication interface, the display unit, and the input device are connected to the system bus through the input/output interface. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program, and the internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The input/output interface of the computer device is used to exchange information between the processor and external devices. The communication interface of the computer device is used for wired or wireless communication with an external terminal; the wireless mode may be realized through WIFI, a mobile cellular network, NFC (near field communication), or other technologies. The computer program, when executed by the processor, implements an audio data processing method. The display unit of the computer device is used to present a visual picture and may be a display screen, a projection device, or a virtual reality imaging device; the display screen may be a liquid crystal display screen or an electronic ink display screen. The input device of the computer device may be a touch layer covering the display screen, a key, a trackball, or a touch pad arranged on the housing of the computer device, or an external keyboard, touch pad, mouse, or the like.
It will be appreciated by those skilled in the art that the structure shown in fig. 9 is merely a block diagram of a portion of the structure associated with the present application and does not limit the computer device to which the present application is applied; a particular computer device may include more or fewer components than shown, combine some of the components, or have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having a computer program stored therein; the processor, when executing the computer program, implements the following steps:
acquiring sound signals in the environment;
when the sound signal is determined to be the target sound category, identifying song tracks corresponding to the sound signal;
acquiring accompaniment music of the song track;
and playing the audio after mixing the accompaniment music and the sound signal.
In one embodiment, the processor when executing the computer program further performs the steps of: when the sound signal is determined to be the target sound category, returning K song prompt information; and when receiving a confirmation instruction for the K song prompt information, identifying the song track corresponding to the sound signal.
In one embodiment, the processor when executing the computer program further performs the steps of: detecting sound signals in the environment in real time; when the sound signal is detected, extracting the characteristics of the sound signal to obtain corresponding sound characteristics; and inputting the sound characteristics into a preset classification neural network to obtain the sound category of the sound signal.
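The feature-extraction and classification path can be sketched as below. This is a toy illustration, not the patent's network: the two features (short-time energy and zero-crossing rate), the category set, and the single linear layer with softmax are all illustrative assumptions standing in for the "preset classification neural network".

```python
import math

# Hypothetical category set; the patent only names a "target sound category".
CATEGORIES = ["speech", "humming", "noise"]

def extract_features(frame):
    # Two toy features for one audio frame: short-time energy and
    # zero-crossing rate. A real system would use richer features.
    energy = sum(x * x for x in frame) / len(frame)
    zcr = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0) / len(frame)
    return [energy, zcr]

def classify(features, weights, biases):
    # One linear layer plus softmax stands in for the pre-trained
    # classification network; weights/biases would come from training.
    scores = [sum(w * f for w, f in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    return CATEGORIES[probs.index(max(probs))]
```

With hand-picked weights, a sustained high-energy, low-zero-crossing frame such as `[0.9, 0.8, 0.9, 0.8]` scores highest on the "humming" row.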
In one embodiment, the processor when executing the computer program further performs the steps of: detecting sound signals in the environment in real time; when the sound signal is detected, extracting the characteristics of the sound signal to obtain corresponding sound characteristics; and when the sound characteristics are matched with preset song voiceprint characteristics, determining the sound signal as a target sound category.
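The voiceprint-matching variant can be sketched as a similarity test: compare the extracted sound features against preset song voiceprint features and treat the signal as the target sound category when any similarity clears a threshold. Cosine similarity and the 0.9 threshold are illustrative choices, not specified by the patent.

```python
import math

def cosine_similarity(a, b):
    # Standard cosine similarity between two equal-length feature vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def is_target_sound(sound_features, preset_voiceprints, threshold=0.9):
    # Match against every preset song voiceprint; one hit is enough.
    return any(cosine_similarity(sound_features, vp) >= threshold
               for vp in preset_voiceprints)
```

A vector that is a scalar multiple of a stored voiceprint has similarity 1.0 and matches; an orthogonal vector scores 0.0 and is rejected.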
In one embodiment, the processor when executing the computer program further performs the steps of: performing echo cancellation processing on the sound signal to obtain a target sound signal; mixing the target sound signal with the accompaniment music to obtain mixed audio; and playing the audio.
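The playback path (echo cancellation, then mixing) can be sketched as follows. A least-mean-squares (LMS) adaptive filter is one common way to estimate and subtract the accompaniment echo from the microphone signal; the patent does not specify the algorithm, and the filter length, step size, and gain values here are illustrative assumptions.

```python
def lms_echo_cancel(mic, reference, taps=4, mu=0.05):
    # LMS adaptive filter: estimate the echo of `reference` (the played
    # accompaniment) present in `mic`, and return the residual, which
    # approximates the clean target sound signal.
    w = [0.0] * taps
    buf = [0.0] * taps
    out = []
    for m, r in zip(mic, reference):
        buf = [r] + buf[:-1]                      # newest reference sample first
        echo_est = sum(wi * bi for wi, bi in zip(w, buf))
        e = m - echo_est                          # error = estimated clean voice
        w = [wi + mu * e * bi for wi, bi in zip(w, buf)]
        out.append(e)
    return out

def mix(voice, accompaniment, voice_gain=1.0, music_gain=0.6):
    # Simple gain-weighted sum of the cleaned voice and the accompaniment.
    return [voice_gain * v + music_gain * a
            for v, a in zip(voice, accompaniment)]
```

A production system would typically use a tuned acoustic echo canceller (e.g. a frequency-domain NLMS variant) rather than this time-domain sketch, but the cancel-then-mix ordering matches the embodiment.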
In one embodiment, a computer readable storage medium is provided, having a computer program stored thereon which, when executed by a processor, implements the following steps:
acquiring sound signals in the environment;
when the sound signal is determined to be the target sound category, identifying song tracks corresponding to the sound signal;
acquiring accompaniment music of the song track;
and playing the audio after mixing the accompaniment music and the sound signal.
In one embodiment, the computer program when executed by the processor further performs the steps of: when the sound signal is determined to be the target sound category, returning K song prompt information; and when receiving a confirmation instruction for the K song prompt information, identifying the song track corresponding to the sound signal.
In one embodiment, the computer program when executed by the processor further performs the steps of: detecting sound signals in the environment in real time; when the sound signal is detected, extracting the characteristics of the sound signal to obtain corresponding sound characteristics; and inputting the sound characteristics into a preset classification neural network to obtain the sound category of the sound signal.
In one embodiment, the computer program when executed by the processor further performs the steps of: detecting sound signals in the environment in real time; when the sound signal is detected, extracting the characteristics of the sound signal to obtain corresponding sound characteristics; and when the sound characteristics are matched with preset song voiceprint characteristics, determining the sound signal as a target sound category.
In one embodiment, the computer program when executed by the processor further performs the steps of: performing echo cancellation processing on the sound signal to obtain a target sound signal; mixing the target sound signal with the accompaniment music to obtain mixed audio; and playing the audio.
In one embodiment, a computer program product is provided comprising a computer program which, when executed by a processor, performs the steps of:
acquiring sound signals in the environment;
when the sound signal is determined to be the target sound category, identifying song tracks corresponding to the sound signal;
acquiring accompaniment music of the song track;
and playing the audio after mixing the accompaniment music and the sound signal.
In one embodiment, the computer program when executed by the processor further performs the steps of: when the sound signal is determined to be the target sound category, returning K song prompt information; and when receiving a confirmation instruction for the K song prompt information, identifying the song track corresponding to the sound signal.
In one embodiment, the computer program when executed by the processor further performs the steps of: detecting sound signals in the environment in real time; when the sound signal is detected, extracting the characteristics of the sound signal to obtain corresponding sound characteristics; and inputting the sound characteristics into a preset classification neural network to obtain the sound category of the sound signal.
In one embodiment, the computer program when executed by the processor further performs the steps of: detecting sound signals in the environment in real time; when the sound signal is detected, extracting the characteristics of the sound signal to obtain corresponding sound characteristics; and when the sound characteristics are matched with preset song voiceprint characteristics, determining the sound signal as a target sound category.
In one embodiment, the computer program when executed by the processor further performs the steps of: performing echo cancellation processing on the sound signal to obtain a target sound signal; mixing the target sound signal with the accompaniment music to obtain mixed audio; and playing the audio.
Those skilled in the art will appreciate that all or part of the flows of the above method embodiments may be implemented by a computer program stored on a non-transitory computer readable storage medium which, when executed, may include the flows of the embodiments of the methods described above. Any reference to memory, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, and the like. Volatile memory may include random access memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM is available in many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM). The databases referred to in the embodiments provided herein may include at least one of relational and non-relational databases; non-relational databases may include, but are not limited to, blockchain-based distributed databases and the like. The processors referred to in the embodiments provided herein may be, without limitation, general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic units, data processing logic units based on quantum computing, and the like.
The technical features of the above embodiments may be combined arbitrarily. For brevity of description, not all possible combinations of these technical features are described; however, as long as a combination of technical features contains no contradiction, it should be considered within the scope of this specification.
The above examples merely represent a few embodiments of the present application; their description is relatively specific and detailed, but should not therefore be construed as limiting the scope of the application. It should be noted that those skilled in the art can make various modifications and improvements without departing from the concept of the present application, and these all fall within the protection scope of the present application. Accordingly, the protection scope of the present application shall be subject to the appended claims.

Claims (10)

1. A method of audio data processing, the method comprising:
acquiring sound signals in the environment;
when the sound signal is determined to be the target sound category, identifying song tracks corresponding to the sound signal;
acquiring accompaniment music of the song track;
and playing the audio after mixing the accompaniment music and the sound signal.
2. The method of claim 1, wherein when the sound signal is determined to be the target sound category, identifying the song track corresponding to the sound signal comprises:
when the sound signal is determined to be the target sound category, returning K song prompt information;
and when receiving a confirmation instruction for the K song prompt information, identifying the song track corresponding to the sound signal.
3. The method of claim 2, wherein the K song prompt message includes at least one of text information or voice information.
4. The method according to claim 1, wherein the method further comprises:
detecting sound signals in the environment in real time;
when the sound signal is detected, extracting the characteristics of the sound signal to obtain corresponding sound characteristics;
and inputting the sound characteristics into a preset classification neural network to obtain the sound category of the sound signal.
5. The method according to claim 1, wherein the method further comprises:
detecting sound signals in the environment in real time;
when the sound signal is detected, extracting the characteristics of the sound signal to obtain corresponding sound characteristics;
and when the sound characteristics are matched with preset song voiceprint characteristics, determining the sound signal as a target sound category.
6. The method according to any one of claims 1 to 5, wherein the playing audio after mixing the accompaniment music with the sound signal includes:
performing echo cancellation processing on the sound signal to obtain a target sound signal;
mixing the target sound signal with the accompaniment music to obtain mixed audio;
and playing the audio.
7. An audio data processing device, the device comprising:
the signal acquisition module is used for acquiring sound signals in the environment;
the song identification module is used for identifying song tracks corresponding to the sound signals when the sound signals are determined to be the target sound categories;
a music acquisition module for acquiring accompaniment music of the song track;
and the playing module is used for playing the audio after the accompaniment music and the sound signal are mixed.
8. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1 to 6 when the computer program is executed.
9. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 6.
10. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 6.
CN202211632587.9A 2022-12-19 2022-12-19 Audio data processing method, device, computer equipment and storage medium Pending CN116030778A (en)

Publications (1)

Publication Number Publication Date
CN116030778A (en) 2023-04-28



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination