CN111046226B - Tuning method and device for music - Google Patents

Tuning method and device for music

Info

Publication number
CN111046226B
CN111046226B (granted publication of application CN201811196608.0A)
Authority
CN
China
Prior art keywords
data
recorded
audio
tuning
singing
Prior art date
Legal status (the legal status is an assumption and not a legal conclusion)
Active
Application number
CN201811196608.0A
Other languages
Chinese (zh)
Other versions
CN111046226A (en)
Inventor
孙浩华 (Sun Haohua)
Current Assignee (the listed assignees may be inaccurate)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (the priority date is an assumption and not a legal conclusion)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN201811196608.0A priority Critical patent/CN111046226B/en
Publication of CN111046226A publication Critical patent/CN111046226A/en
Application granted granted Critical
Publication of CN111046226B publication Critical patent/CN111046226B/en

Landscapes

  • Reverberation, Karaoke And Other Acoustics (AREA)

Abstract

The embodiments of the application disclose a tuning method and device for music. The method includes: receiving recording data for a selected song sent by a client, and identifying the recorded audio fingerprint features contained in the recording data; determining, from among at least one piece of original audio data associated with the selected song, target original audio data whose audio fingerprint features match the recorded audio fingerprint features; performing tuning processing on the recording data so that it matches the pitch of the target original audio data; and feeding the tuned recording data back to the client. This technical solution can improve the user's experience.

Description

Tuning method and device for music
Technical Field
The application relates to the field of Internet technology, and in particular to a tuning method and device for music.
Background
With the continuous development of Internet technology, more and more users record songs using music-related applications (apps), such as QQ Music, Kugou Music, and NetEase Cloud Music. However, after recording a song, users often find that the result is unsatisfactory, with singing problems such as being off-key or the voice cracking.
Currently, some music production studios use Auto-Tune software to tune a singer's work. However, because the software's interface is aimed at professionals, tuning often requires overly complex manual operation. This raises the barrier to use and degrades the user's experience.
Disclosure of Invention
The embodiments of the application aim to provide a tuning method and device for music that can improve the user's experience.
To achieve the above object, an embodiment of the present application provides a tuning method for music, the method including: receiving recording data for a selected song sent by a client, and identifying the recorded audio fingerprint features contained in the recording data; determining, from among at least one piece of original audio data associated with the selected song, target original audio data whose audio fingerprint features match the recorded audio fingerprint features; performing tuning processing on the recording data so that it matches the pitch of the target original audio data; and feeding the tuned recording data back to the client.
To achieve the above object, embodiments of the present application further provide a tuning device for music, the device including a memory and a processor, the memory storing a computer program which, when executed by the processor, implements the following steps: receiving recording data for a selected song sent by a client, and identifying the recorded audio fingerprint features contained in the recording data; determining, from among at least one piece of original audio data associated with the selected song, target original audio data whose audio fingerprint features match the recorded audio fingerprint features; performing tuning processing on the recording data so that it matches the pitch of the target original audio data; and feeding the tuned recording data back to the client.
As can be seen from the above, after receiving the recording data for the selected song from the client, the recorded audio fingerprint features contained in it can be identified. A selected song is typically associated with at least one piece of original audio data; for example, a song usually has at least one performed version, including the original version by the original singer and cover versions by other singers, each version corresponding to one piece of original audio data. The target original audio data whose audio fingerprint features match the recorded audio fingerprint features can then be determined from among these. Once the target original audio data is determined, the recording data can be tuned against it so that the recording data matches the pitch of the target original audio data, and the tuned recording data can be fed back to the client. When the client plays the audio represented by the tuned recording data, singing problems such as being off-key or the voice cracking no longer occur. The user therefore only needs to record the selected song, and tuning is completed automatically by the technical solution provided in this application. It is simple, intelligent, and has a low barrier to use, which improves the user's experience.
Drawings
To explain the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below illustrate only some embodiments of the present application; a person skilled in the art could obtain other drawings from them without inventive effort.
FIG. 1 is a flow chart of a music tuning method in an embodiment of the present application;
fig. 2 is a schematic diagram of an application scenario of the music tuning method in an embodiment of the present application;
fig. 3 is a schematic diagram of the beat adjustment process in an embodiment of the present application;
fig. 4 is a schematic structural diagram of a music tuning device in an embodiment of the present application.
Detailed Description
To help those skilled in the art better understand the technical solutions in the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art from the present disclosure without inventive effort fall within the scope of protection of the present application.
The embodiments of the application provide a music tuning method that can be applied to a standalone terminal device performing tuning processing. The terminal device may be an electronic device with data computation, storage, page display, and network interaction functions; for example, a desktop computer, a notebook computer, or a tablet computer. The terminal device may also be software running on such an electronic device that provides support for data processing, storage, page display, and network interaction. Referring to fig. 1, the method may include the following steps.
S11: recording data for the selected song sent by the client is received, and recorded audio fingerprint characteristics contained in the recording data are identified.
In this embodiment, the terminal device is provided with a lyrics library covering different songs. The lyrics library may be a data set storing lyrics and the corresponding singing-beat data; it may use any database format such as MySQL, Oracle, DB2, or Sybase, and may be deployed on a storage medium in the terminal device. In addition, the lyrics and corresponding singing-beat data of some songs may be downloaded from the terminal device through the client and stored in the client's cache.
In this embodiment, the client may be an electronic device having a recording function and a shooting function. Specifically, the client may be, for example, a tablet computer, a notebook computer, a smart phone, a smart wearable device, or the like. Alternatively, the client may be software that can be run in the electronic device. The client can be provided with a communication module and can be in communication connection with remote terminal equipment to realize data transmission with the terminal equipment.
In this embodiment, the client may present links to selectable songs to the user. The links may be text links or picture links. If the user wants to sing and record a certain song, the user can click the link of that song to select it. After the user clicks the text link of the song name, the client sends the terminal device a lyrics loading request that includes a lyrics identifier and a beat identifier, where the lyrics identifier identifies the lyrics of the song and the beat identifier identifies the song's singing-beat data. After receiving the lyrics loading request, the terminal device may extract the two identifiers from it, read the lyrics bearing the lyrics identifier and the singing-beat data bearing the beat identifier from the lyrics library, and feed both back to the client. The client can then display the song's lyrics to the user; when recording starts, the client scrolls the lyrics word by word or line by line according to the singing-beat data, so that the user can sing along with the rhythm of the lyrics display. Meanwhile, the client records the user's singing as audio data through its built-in microphone, or as video data through the microphone and the front-facing camera. In this embodiment, the lyrics loading request may be a character string written according to a preset rule, where the preset rule may be the network communication protocol agreed between the client and the terminal device. For example, the lyrics loading request may be a string written according to the HTTP protocol.
The preset rule may define the individual components of the lyrics loading request and the order in which they are arranged.
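As an illustrative sketch of such a request string (the field names `lyrics_id` and `beat_id` and the `/lyrics` path are assumptions, not specified by the application; only the presence and ordering of the two identifiers is required):

```python
from urllib.parse import urlencode

def build_lyrics_request(lyrics_id: str, beat_id: str) -> str:
    """Assemble a lyrics loading request as an HTTP-style request line.

    The preset rule fixes which components appear and in what order;
    here the lyrics identifier is placed before the beat identifier.
    """
    params = [("lyrics_id", lyrics_id), ("beat_id", beat_id)]  # ordered pairs
    return "GET /lyrics?" + urlencode(params) + " HTTP/1.1"

request = build_lyrics_request("song-123-lyrics", "song-123-beat")
```

The terminal device can then parse the two identifiers back out of the query string in the agreed order.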
In this embodiment, in practice, the client typically does not play the accompaniment music of the selected song while scrolling the lyrics word by word or line by line. The recording data therefore contains audio data characterizing the user's singing (or video data characterizing the singing together with the captured picture of the user), but no accompaniment music data for the selected song. The recorded audio data may be complete recorded audio data for the whole selected song, or segment recorded audio data for a specified time segment of it; for example, the segment recorded audio data may cover the chorus of the selected song. The specified time segment may be set according to the actual application and is not limited here.
In this embodiment, after the client finishes recording the song selected by the user, it may display to the user a play control for playing back the singing content represented by the recording data, a re-record control for recording again, and a tuning control for tuning the recording data. After recording a song, the user can thus play back the singing audio or video by clicking the play control. If dissatisfied, the user can click the re-record control to record again, or click the tuning control to send the terminal device a tuning request that includes the recording data and a song identifier, where the song identifier identifies the original audio data associated with the selected song. On receiving the tuning request, the terminal device receives the recording data for the selected song: it obtains the recording data from the request and extracts the song identifier. In this embodiment, an audio database may also be provided in the terminal device. The audio database may be a data set storing audio data; it may use any database format such as MySQL, Oracle, DB2, or Sybase, and may be deployed on a storage medium in the terminal device. The audio database may store at least one piece of original audio data per song. For example, a song usually has at least one performed version, including the original version by the original singer and cover versions by other singers, each corresponding to one piece of original audio data. After extracting the song identifier, the terminal device can therefore read the original audio data bearing that identifier from the audio database.
In this embodiment, each piece of original audio data in the audio database may have its own song identifier. Song identifiers and original audio data may be stored as key-value pairs, so that given a song identifier, the corresponding original audio data can be retrieved from the audio database.
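A minimal in-memory sketch of this key-value lookup (the dictionary shape and field names are illustrative stand-ins for the database, not part of the application):

```python
# Hypothetical stand-in for the audio database: each song identifier
# keys a list of original-recording entries, one per performed version.
audio_db = {
    "song-123": [
        {"version": "original", "audio": b"..."},
        {"version": "cover",    "audio": b"..."},
    ],
}

def lookup_originals(song_id: str):
    """Key-value lookup: return every original recording bound to the id."""
    return audio_db.get(song_id, [])
```

A real deployment would back this with one of the database systems named above rather than a dictionary.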
In this embodiment, in practice, both the recorded audio data and the original audio data in the audio database have corresponding audio fingerprint features. To facilitate the subsequent tuning, after receiving the recording data for the selected song, the terminal device may identify the recorded audio fingerprint features contained in it, so that the target original audio data whose audio fingerprint features match them can be determined from among the at least one piece of original audio data associated with the selected song, and the recording data can then be tuned against that target. Specifically, if the recording data is recorded audio data or segment recorded audio data, the recorded (or segment recorded) audio fingerprint features contained in it can be identified directly. For example, after the recorded audio data is received, it may be converted from the time domain to the frequency domain, and the frequency-domain data may then be mapped to the Bark domain within a preset frequency interval to obtain a number of frequency-domain subbands, each corresponding to a range of frequency bins. Finally, the energy of each subband may be computed, and the vector of subband energies used as the recorded audio fingerprint. The preset frequency interval may be set according to the application, with reference to the human auditory system; for example, it may span 300 Hz to 2,000 Hz.
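A minimal sketch of this fingerprint extraction, assuming uniformly spaced band edges within the 300–2,000 Hz interval (a real Bark mapping uses psychoacoustic band boundaries) and a 16 kHz sample rate:

```python
import numpy as np

def recording_fingerprint(samples, sample_rate=16000, n_bands=8,
                          f_lo=300.0, f_hi=2000.0):
    """Time domain -> frequency domain -> subbands -> per-band energy.

    Each subband's summed energy becomes one component of the
    fingerprint vector, as described above.  Uniform band edges are a
    simplifying assumption standing in for the Bark-domain mapping.
    """
    spectrum = np.abs(np.fft.rfft(samples)) ** 2            # power spectrum
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
    edges = np.linspace(f_lo, f_hi, n_bands + 1)            # subband edges
    fingerprint = np.empty(n_bands)
    for i in range(n_bands):
        in_band = (freqs >= edges[i]) & (freqs < edges[i + 1])
        fingerprint[i] = spectrum[in_band].sum()            # subband energy
    return fingerprint

# A pure 440 Hz test tone concentrates its energy in the lowest subband.
t = np.arange(16000) / 16000.0
fp = recording_fingerprint(np.sin(2 * np.pi * 440.0 * t))
```

The resulting vector can be stored and compared exactly like the fingerprints bound to the original audio data.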
If the recording data is recorded video data, the recorded picture data and recorded audio data can first be separated from it. Since recorded video data is usually in a specified container format, such as FLV, MP4, or MKV, it must be demultiplexed to obtain the compressed audio and compressed picture streams, which are then decoded separately to yield the recorded audio data and the recorded picture data. After the recorded audio data is separated out, the recorded audio fingerprint features contained in the recorded video data can be identified in the manner described above.
S13: determining, from among at least one piece of original audio data associated with the selected song, target original audio data whose audio fingerprint features match the recorded audio fingerprint features.
In this embodiment, to facilitate tuning the recording data, after receiving it and identifying its recorded audio fingerprint features, the terminal device may determine, from among the at least one piece of original audio data associated with the selected song, the target original audio data whose audio fingerprint features match the recorded ones. In practice, the terminal device may also provide an audio fingerprint database: a data set storing audio fingerprint features, in any database format such as MySQL, Oracle, DB2, or Sybase, deployed on a storage medium in the terminal device. The audio fingerprint database may store the audio fingerprint features bound to each piece of original audio data, together with binding identifiers and fingerprint feature identifiers. A binding identifier identifies which original audio data a fingerprint is bound to; a fingerprint feature identifier identifies the fingerprint itself. The tuning request sent by the client may also carry the binding identifier and the fingerprint feature identifier. After receiving the tuning request, the terminal device therefore not only reads the at least one piece of original audio data for the selected song via the song identifier in the request, but also extracts the binding identifier and the fingerprint feature identifier and uses them to read the matching audio fingerprint features from the audio fingerprint database, thereby obtaining the fingerprints bound to the original audio data.
After obtaining the audio fingerprint features bound to the original audio data, the terminal device may determine the target original audio data whose fingerprint matches the recorded audio fingerprint features. Specifically, if the recording data is recorded audio data, the terminal device may compute, for each piece of original audio data associated with the selected song, the similarity between the recorded audio fingerprint features and the fingerprint bound to that original audio data, and take the original audio data whose fingerprint yields the maximum similarity as the target. If the recording data is segment recorded audio data, the terminal device may first obtain, for each associated piece of original audio data, the segment of original audio data within the specified time segment and the segment audio fingerprint features within that segment, and then take as the target the segment original audio data whose segment fingerprint matches the segment recorded audio fingerprint features identified in step S11. If the recording data is recorded video data, the terminal device may compute the similarity between the fingerprint of the recorded audio data separated from the video and the fingerprint bound to each piece of original audio data, and again take the original audio data with the maximum similarity as the target.
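The maximum-similarity selection above can be sketched as follows; cosine similarity is one reasonable choice of measure (an assumption on my part, since the application only requires taking the maximum similarity):

```python
import numpy as np

def pick_target_original(recorded_fp, originals):
    """Return the name of the original recording whose bound fingerprint
    is most similar to the recorded fingerprint.

    `originals` is a list of (name, fingerprint_vector) pairs; the
    fingerprint with the maximum cosine similarity wins.
    """
    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    scores = [cosine(recorded_fp, fp) for _, fp in originals]
    return originals[int(np.argmax(scores))][0]

# Two candidate versions of the selected song, each with a bound fingerprint.
originals = [
    ("original-version", np.array([1.0, 0.0, 0.2])),
    ("cover-version",    np.array([0.1, 1.0, 0.9])),
]
target = pick_target_original(np.array([0.9, 0.1, 0.3]), originals)
```

For segment recorded audio data, the same comparison would simply be restricted to the fingerprint components covering the specified time segment.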
S15: performing tuning processing on the recording data so that it matches the pitch of the target original audio data.
In this embodiment, to fix singing problems such as being off-key or the voice cracking in the user's performance of the selected song, after receiving the recording data and determining the target original audio data whose audio fingerprint features match the recorded ones, the terminal device may tune the recording data against the target so that the recording data matches the pitch of the target original audio data. The singing work represented by the tuned recording data then no longer exhibits such problems. Specifically, if the recording data is recorded audio data or segment recorded audio data, the terminal device may first determine whether the pitch at a designated singing moment in it is the same as the pitch at that moment in the target original audio data; if the recording data is recorded video data, the comparison is made on the recorded audio data separated from it. A designated singing moment may be any singing moment in the recording data or the target original audio data. If the pitch at a designated singing moment in the recording data differs from the pitch at that moment in the target original audio data, the target's pitch at that moment can be substituted as the recording's pitch at that moment.
For example, suppose a user records a performance of a difficult song and can never hit the high notes, or the voice even cracks there. The pitch at each singing moment of the high-pitched part in the target original audio data can then be used as the pitch at the corresponding moments in the recorded audio data, compensating for the flaws in the user's recording.
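The per-moment substitution can be sketched on pitch tracks as follows; representing pitch in semitone numbers and treating near-equal pitches as "the same" via a tolerance are both assumptions for illustration:

```python
def tune_pitch_track(recorded, original, tolerance=0.5):
    """For each singing moment, keep the recorded pitch if it already
    matches the target original's pitch; otherwise substitute the
    original's pitch, as described above.  Pitches are in semitones;
    `tolerance` (semitones) decides what counts as the same pitch.
    """
    tuned = []
    for rec, ref in zip(recorded, original):
        if abs(rec - ref) > tolerance:
            tuned.append(ref)   # off-pitch moment: take the original's pitch
        else:
            tuned.append(rec)   # already in tune: keep the singer's pitch
    return tuned

# Moments 1 and 3 are off-pitch (e.g. a missed high note) and get corrected.
tuned = tune_pitch_track([60.0, 58.5, 64.0, 61.0], [60.0, 62.0, 64.0, 67.0])
```

A production implementation would apply the corrected pitch track back to the waveform with a pitch-shifting algorithm rather than operating on note numbers directly.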
In one embodiment, if the recording data is recorded video data, then after tuning the recorded audio data separated from it, the terminal device may merge the separated recorded picture data with the tuned recorded audio data to obtain the tuned recorded video data. Specifically, for a designated time node among the time nodes of the pre-tuning recorded video data, the picture data at that node and the tuned audio data at that node are merged to yield the tuned video data at that node. Repeating this over all time nodes yields the tuned video data at every node, and hence the tuned recorded video data.
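A node-by-node sketch of that merge, with plain records standing in for actual container muxing (a real implementation would remux through a tool such as ffmpeg; the record layout here is an assumption):

```python
def merge_audio_video(picture_frames, tuned_audio_chunks):
    """Pair the picture data and the tuned audio data at each time node
    of the pre-tuning video, yielding tuned video data node by node."""
    assert len(picture_frames) == len(tuned_audio_chunks)  # aligned nodes
    merged = []
    for node, (frame, audio) in enumerate(zip(picture_frames,
                                              tuned_audio_chunks)):
        merged.append({"node": node, "frame": frame, "audio": audio})
    return merged

video = merge_audio_video(["f0", "f1", "f2"], ["a0", "a1", "a2"])
```

Because the picture data is untouched by tuning, alignment reduces to pairing streams by their original time nodes.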
S17: and feeding back the recorded data after tuning processing to the client.
In this embodiment, after tuning the recording data, the terminal device feeds the tuned recording data back to the client over the remote connection established with it. After receiving the tuned recording data, the client may play the audio or video it represents to the user.
In one embodiment of the application, since in practice the user has no accompaniment while recording, the user often wants the corresponding accompaniment music when enjoying the finished singing work. To meet this demand, the terminal device may also determine suitable accompaniment music data after tuning the recording data. Specifically, the terminal device may provide an accompaniment music database: a data set storing accompaniment music data and the correspondence between original audio data and accompaniment music data, in any database format such as MySQL, Oracle, DB2, or Sybase, deployed on a storage medium in the terminal device. After tuning the recording data, the terminal device may read from this database, according to the determined target original audio data and the stored correspondence, the accompaniment music data corresponding to the target original audio data, and use it as the accompaniment for the tuned recording data. The tuned recording data and its accompaniment music data are then mixed to obtain the mixed recording data. After the mixed recording data is fed back to the client, the client can play the tuned recorded audio and the corresponding accompaniment music to the user at the same time.
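The mixing step can be sketched as weighted sample-wise addition of the two tracks; the gain values are illustrative defaults, not specified by the application:

```python
import numpy as np

def mix_with_accompaniment(tuned_vocals, accompaniment,
                           vocal_gain=1.0, accomp_gain=0.6):
    """Mix the tuned recording with its accompaniment track.

    The tracks are float sample arrays in [-1, 1]; they are trimmed to
    a common length, scaled by their gains, summed, and clipped back
    into the valid sample range.
    """
    n = min(len(tuned_vocals), len(accompaniment))       # align lengths
    mixed = vocal_gain * tuned_vocals[:n] + accomp_gain * accompaniment[:n]
    return np.clip(mixed, -1.0, 1.0)

mixed = mix_with_accompaniment(np.array([0.5, -0.2, 0.9]),
                               np.array([0.2, 0.3, 0.4]))
```

Clipping is the crudest way to avoid overflow; a real mixer would more likely normalize or apply a limiter.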
In one embodiment of the application, to let the user compare the recording before and after tuning, after the terminal device feeds back the tuned recording data, the client may display the pre-tuning and post-tuning recording data to the user at the same time. For example, if the recording data is recorded video data, the client may use a multi-pane video display to show the pre-tuning and post-tuning recorded video data side by side, so that the user can directly compare the difference between them.
In a specific application scenario, the client may be a smartphone. The user can record video data for a selected song through the smartphone's front-facing camera and microphone. As shown in fig. 2, the user may start a song recording application on the smartphone, which displays links to different songs, for example text links with the song names "talk about scatter", "talk about true", and "learn cat"; the user selects a song to record as desired. For example, the user clicks the text link of the song named "talk about true", and the client sends the terminal device a lyrics loading request including a lyrics identifier and a beat identifier. After receiving the request, the terminal device extracts the two identifiers, reads the lyrics bearing the lyrics identifier and the singing-beat data bearing the beat identifier from the lyrics library, and feeds both back to the client. The client then displays the song's lyrics to the user; after the user clicks the start-recording control shown in the application, the user sings along with the lyrics, which the client scrolls word by word or line by line in rhythm. When finished, the user clicks the end-recording control, and the application presents a re-record control for recording again, a tuning control for tuning, and a play control for playing the current recording data.
If the user clicks the tuning control, a tuning request including the recording data and the song identifier is sent to the terminal device. On receiving it, the terminal device receives the recorded video data for the selected song; it first separates the recorded picture data and recorded audio data from the video, then identifies the recorded audio fingerprint features contained in the recorded audio data. Using the song identifier extracted from the tuning request, it determines, from among the at least one piece of original audio data associated with the selected song, the target original audio data whose audio fingerprint features match the recorded ones. The recorded audio data is then tuned against the target original audio data so that their pitches match. Next, according to the correspondence between original audio data and accompaniment music data, the accompaniment music data for the tuned recording is determined; the tuned recorded audio data is mixed with its accompaniment music data, and the resulting audio is merged with the recorded picture data separated earlier to obtain the tuned recorded video data. Finally, the tuned recorded video data is fed back to the smartphone.
After receiving the tuned recorded video data, the smart phone can display the recorded video data before tuning and the recorded video data after tuning to the user simultaneously in a multi-format video display mode. The recorded video before tuning contains the original vocal without accompaniment, while the recorded video after tuning contains the tuned vocal with accompaniment. The user therefore only needs to record the selected song, and tuning is completed automatically by the song recording application; the process is simple, intelligent, and low-threshold, which can improve the user experience.
In one embodiment of the present application, there is usually noise in the user's surrounding environment while audio is being recorded, so noise may be introduced into the recorded data. Once noise is introduced, the signal-to-noise ratio of the recorded data may be low, which affects the subsequent tuning process. In this case, in order to improve the signal-to-noise ratio of the recorded data, after receiving the recorded data for the selected song sent by the client, the terminal device may further perform smoothing filtering on the recorded data to filter out the noise, so that the signal-to-noise ratio of the filtered recorded data is greater than or equal to a specified signal-to-noise ratio threshold. The value range of the specified signal-to-noise ratio threshold may comprise 80 to 100 percent and can be set according to the actual situation. The recorded data before smoothing filtering may then be replaced with the recorded data after smoothing filtering, so that subsequent tuning is performed on the filtered recorded data. In practical applications, the smoothing filtering may include neighborhood average filtering, median filtering, Gaussian filtering, frequency-domain filtering, and so on.
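Of the smoothing methods listed, neighborhood average filtering is the simplest to show. The sketch below is a minimal pure-Python version; the window radius and sample values are illustrative only, and production code would operate on real PCM buffers.

```python
# Neighborhood-average smoothing: each sample is replaced by the mean
# of the samples in a small window around it, which attenuates
# isolated noise spikes.

def neighborhood_average(samples, radius=1):
    """Smooth `samples` with a window of `2*radius + 1`,
    clamping the window at the signal edges."""
    out = []
    for i in range(len(samples)):
        lo = max(0, i - radius)
        hi = min(len(samples), i + radius + 1)
        window = samples[lo:hi]
        out.append(sum(window) / len(window))
    return out
```

A single spike of amplitude 10 in an otherwise silent signal is spread out and reduced, raising the effective signal-to-noise ratio before tuning.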
In one embodiment of the present application, when a singing work is recorded, the user's singing beat may be out of sync with the singing beat of the selected song, which affects the subsequent tuning of the recorded data. In this case, after receiving the recorded data for the selected song sent by the client, the terminal device may further perform beat debugging on the recorded data, so that the singing beat in the debugged recorded data is synchronized with the singing beat in the singing beat data of the selected song. For example, when the user records audio for the song "talk about true", as shown in fig. 3, if beat A in the sung lyrics of the recorded audio is 0.1 seconds faster than the corresponding beat B in the song, beat A in the recorded audio data may be lengthened by 0.1 seconds to obtain beat A1, and then 0.1 seconds may be truncated from the front of beat A1 (the partial beat in the dashed box in fig. 3) to obtain beat A2; at this point beat A2 in the recorded audio data is synchronized with the corresponding beat B in the song. The recorded data before beat debugging may then be replaced with the recorded data after beat debugging, so that subsequent tuning is performed on the debugged recorded data.
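The net effect of the lengthen-then-truncate operation is that a phrase sung too early is pushed later so its start lines up with the reference beat. A minimal sketch, working in samples rather than seconds and using simple silence-padding as a crude stand-in for the time-stretching step, might look like this (the function name and the padding strategy are illustrative assumptions):

```python
# Align a sung phrase that starts `lead_samples` too early relative
# to the song's reference beat: delay it by that amount and keep the
# overall length unchanged.

def align_phrase(phrase, lead_samples, silence=0):
    """Shift `phrase` later by `lead_samples` samples.

    Prepends silence so the phrase starts later, then trims the tail
    so the total duration matches the original.
    """
    delayed = [silence] * lead_samples + phrase
    return delayed[:len(phrase)]
```

After alignment the phrase content begins `lead_samples` later, matching the position of the corresponding beat in the song.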
In one embodiment of the present application, after the recorded data has been tuned, some users may want to know the difference between the audio before and after tuning in order to improve their own singing. To meet this demand, after the recorded data is tuned, a data display page may be shown to the user. The data display page can display the recorded data before tuning and the recorded data after tuning, and can include a pre-tuning play control and a post-tuning play control. The pre-tuning play control may be placed near the area where the recorded data before tuning is displayed, and the post-tuning play control near the area where the recorded data after tuning is displayed. When the pre-tuning play control is triggered, the audio represented by the portion of recorded data to be tuned in the pre-tuning recorded data may be played. When the post-tuning play control is triggered, the audio represented by the portion of recorded data in the time slice corresponding to that portion in the post-tuning recorded data may be played.
For example, on a data display page presenting the recorded data before and after tuning, the portions of the pre-tuning recorded data that are to be tuned and the portions that are not may be shown in different colors so as to distinguish them. Similarly, the portions of the post-tuning recorded data that were tuned and those that were not may be shown in different colors. In general, since there may be multiple portions to be tuned in the pre-tuning recorded data, a corresponding pre-tuning play control may be placed near the display area of each such portion, so that when the user clicks a nearby pre-tuning play control, the audio represented by the corresponding portion of the pre-tuning recorded data may be played. Similarly, multiple post-tuning play controls can be placed at corresponding positions near the display area of the post-tuning recorded data, so that when the user clicks a post-tuning play control, the audio represented by the portion of the post-tuning recorded data in the corresponding time slice may be played. In this way, the user can easily perceive the difference in the audio before and after each tuning operation.
In one embodiment of the present application, the terminal device may also be a system architecture formed by a client and a server. The client may be an electronic device having sound recording and photographing functions, for example a tablet computer, a notebook computer, a smart phone, or a smart wearable device. Alternatively, the client may be software that runs in such an electronic device. The client can be provided with a communication module and can be communicatively connected with a remote server to exchange data with it. The server may be a device that stores audio data. Specifically, the server may be an electronic device having data computation, storage, and network interaction functions, or software running in an electronic device that supports data processing, storage, and network interaction. The number of servers is not particularly limited in this embodiment: the server may be a single server, several servers, or a server cluster formed by several servers. In some application scenarios, the client sends the recorded audio data to the server in real time, the server performs the tuning processing and feeds the tuned recorded audio data back to the client, and the client displays the data display page to the user. In embodiments where processing such as music tuning is performed on the server side, the processing speed is generally higher than on the client side, which reduces the processing pressure on the client and increases the tuning speed. Of course, this description does not exclude other embodiments where all or part of the above processing is performed on the client side, for example where the client performs real-time tuning.
In this embodiment, the functions implemented in the above-described method steps may be implemented by a computer program, which may be stored in a computer storage medium. In particular, the computer storage medium may be coupled to a processor, which may thereby read a computer program in the computer storage medium. The computer program, when executed by a processor, may perform the following functions:
s11: receiving recording data aiming at a selected song sent by a client and identifying recording audio fingerprint characteristics contained in the recording data;
s13: determining target original audio data with audio fingerprint characteristics matched with the recorded audio fingerprint characteristics from at least one original audio data associated with the selected song;
s15: performing tuning processing on the recorded data so that the recorded data is matched with the pitch of the target original singing audio data;
s17: and feeding back the recorded data after tuning processing to the client.
In one embodiment, the computer program when executed by the processor further performs the steps of:
performing audio-video combination processing on the recorded picture data and the recorded audio data subjected to tuning processing to obtain recorded video data subjected to tuning processing;
And correspondingly, feeding back the recorded video data after tuning processing to the client.
In one embodiment, the memory is further for storing an audio fingerprint database; wherein the audio fingerprint database comprises audio fingerprint characteristics bound with the original audio data; the target original singing audio data is determined according to the following steps:
calculating the similarity between the recorded audio fingerprint characteristics and the audio fingerprint characteristics bound with the original singing audio data;
and taking the original singing audio data associated with the audio fingerprint characteristics corresponding to the maximum similarity as the target original singing audio data.
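The two matching steps above — compute a similarity against each stored fingerprint, then take the version with the maximum — can be sketched as follows. Cosine similarity is one common choice of metric; the description does not fix a particular one, so the metric and the helper names here are illustrative assumptions.

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two fingerprint vectors; 1.0 means
    # identical direction, 0.0 means orthogonal (or a zero vector).
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def match_original(recorded_fp, fingerprint_db):
    """`fingerprint_db` maps an original-audio id to its fingerprint
    vector; return the id with the greatest similarity to the
    recorded fingerprint."""
    return max(
        fingerprint_db,
        key=lambda k: cosine_similarity(recorded_fp, fingerprint_db[k]),
    )
```

A recorded fingerprint close to one stored version's fingerprint selects that version as the target original singing audio data.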
In one embodiment, the computer program, when executed by the processor, performs tuning processing on the recorded data, including the steps of:
judging whether the pitch at a designated singing moment in the recorded data is the same as the pitch at the designated singing moment in the target original singing audio data;
and if they are different, taking the pitch at the designated singing moment in the target original singing audio data as the pitch at the designated singing moment in the recorded data.
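The compare-and-replace rule above can be sketched per singing moment as follows. The pitch contours are given as one value in Hz per moment, and the `tolerance` parameter is an added assumption (real pitch analysis never yields exact equality between floats):

```python
# Per-moment pitch correction: where the recorded pitch differs from
# the original's pitch at the same moment, the original's pitch is
# used; otherwise the recorded pitch is kept.

def tune_pitches(recorded, original, tolerance=0.0):
    tuned = []
    for rec_hz, orig_hz in zip(recorded, original):
        if abs(rec_hz - orig_hz) > tolerance:
            tuned.append(orig_hz)   # different -> replace with original
        else:
            tuned.append(rec_hz)    # already matching -> keep as sung
    return tuned
```

Moments sung on pitch pass through unchanged, while off-pitch moments are snapped to the original's contour.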
In one embodiment, the computer program when executed by the processor further performs the steps of:
performing smoothing filtering processing on the recorded data so that the signal-to-noise ratio of the recorded data after the smoothing filtering processing is greater than or equal to a specified signal-to-noise ratio threshold;
and replacing the recorded data before the smoothing filtering processing with the recorded data after the smoothing filtering processing.
In one embodiment, the memory is further for storing singing tempo data of the selected song; the computer program, when executed by the processor, further performs the steps of:
performing beat debugging processing on the recorded data to enable the singing beat in the recorded data after the beat debugging processing to be synchronous with the singing beat in the singing beat data of the selected song;
and replacing the recorded data before the beat debugging processing with the recorded data after the beat debugging processing.
In one embodiment, the memory is further configured to store a correspondence between the original audio data and accompaniment music data; the computer program, when executed by the processor, further performs the steps of:
according to the corresponding relation, determining the accompaniment music data corresponding to the target original singing audio data, and taking the accompaniment music data corresponding to the target original singing audio data as accompaniment music data corresponding to the recorded data after tuning processing;
performing audio mixing processing on the recorded data after tuning processing and the accompaniment music data corresponding to the recorded data after tuning processing;
correspondingly, feeding back the recorded data after the audio mixing processing to the client.
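The mixing step — combining the tuned vocal with its accompaniment into one stream — can be sketched as a sample-wise sum with clamping. The 16-bit PCM bounds are an illustrative assumption; real mixers may also apply gain staging to avoid clipping:

```python
# Mix a vocal track with its accompaniment by summing samples and
# clamping to the 16-bit PCM range so overflow does not wrap around.

def mix(vocal, accompaniment, lo=-32768, hi=32767):
    return [
        max(lo, min(hi, v + a))
        for v, a in zip(vocal, accompaniment)
    ]
```

Where the summed amplitude exceeds the sample range, the value is clamped rather than allowed to overflow.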
In one embodiment, the computer program when executed by the processor further performs the steps of:
displaying a data display page; the data display page is used for displaying recorded data before tuning processing and recorded data after tuning processing; the data display page comprises a play control before tuning and a play control after tuning;
when the play control before tuning is triggered, playing the audio represented by the part of the recorded data to be tuned in the recorded data before tuning;
and when the play control after tuning is triggered, playing the audio represented by the part of the recorded data in the time slice corresponding to the part of the recorded data to be tuned in the recorded data after tuning.
It should be noted that the functions that can be implemented by the computer program in the computer storage medium may be understood by reference to the foregoing method implementations; the technical effects achieved are similar to those of the method implementations and will not be repeated here.
Referring to fig. 4, the present application further provides a tuning device for music, the device may include a memory and a processor, the memory storing a computer program, the computer program when executed by the processor implementing the steps of:
s11: receiving recording data aiming at a selected song sent by a client and identifying recording audio fingerprint characteristics contained in the recording data;
s13: determining target original audio data with audio fingerprint characteristics matched with the recorded audio fingerprint characteristics from at least one original audio data associated with the selected song;
s15: performing tuning processing on the recorded data so that the recorded data is matched with the pitch of the target original singing audio data;
s17: and feeding back the recorded data after tuning processing to the client.
In this embodiment, the memory may include a physical means for storing information, typically by digitizing the information and then storing it in a medium using electrical, magnetic, or optical methods. The memory according to this embodiment may further include: devices that store information by means of electrical energy, such as RAM and ROM; devices that store information by magnetic energy, such as hard disks, floppy disks, magnetic tapes, magnetic core memory, bubble memory, and USB flash drives; and devices that store information optically, such as CDs or DVDs. Of course, there are other ways of storing information, such as quantum storage, graphene storage, and so on.
In this embodiment, the processor may be implemented in any suitable manner. For example, the processor may take the form of, for example, a microprocessor or processor, and a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro) processor, logic gates, switches, application specific integrated circuits (Application Specific Integrated Circuit, ASIC), a programmable logic controller, and an embedded microcontroller, among others.
In one embodiment, the computer program when executed by the processor further performs the steps of:
performing audio-video combination processing on the recorded picture data and the recorded audio data subjected to tuning processing to obtain recorded video data subjected to tuning processing;
and correspondingly, feeding back the recorded video data after tuning processing to the client.
In one embodiment, the memory is further for storing an audio fingerprint database; wherein the audio fingerprint database comprises audio fingerprint characteristics bound with the original audio data; the target original singing audio data is determined according to the following steps:
calculating the similarity between the recorded audio fingerprint characteristics and the audio fingerprint characteristics bound with the original singing audio data;
And taking the original singing audio data associated with the audio fingerprint characteristics corresponding to the maximum similarity as the target original singing audio data.
In one embodiment, the computer program, when executed by the processor, performs tuning processing on the recorded data, including the steps of:
judging whether the pitch at a designated singing moment in the recorded data is the same as the pitch at the designated singing moment in the target original singing audio data;
and if they are different, taking the pitch at the designated singing moment in the target original singing audio data as the pitch at the designated singing moment in the recorded data.
In one embodiment, the computer program when executed by the processor further performs the steps of:
performing smoothing filtering processing on the recorded data so that the signal-to-noise ratio of the recorded data after the smoothing filtering processing is greater than or equal to a specified signal-to-noise ratio threshold;
and replacing the recorded data before the smoothing filtering processing with the recorded data after the smoothing filtering processing.
In one embodiment, the memory is further for storing singing tempo data of the selected song; the computer program, when executed by the processor, further performs the steps of:
Performing beat debugging processing on the recorded data to enable the singing beat in the recorded data after the beat debugging processing to be synchronous with the singing beat in the singing beat data of the selected song;
and replacing the recorded data before the beat debugging processing with the recorded data after the beat debugging processing.
In one embodiment, the memory is further configured to store a correspondence between the original audio data and accompaniment music data; the computer program, when executed by the processor, further performs the steps of:
according to the corresponding relation, determining the accompaniment music data corresponding to the target original singing audio data, and taking the accompaniment music data corresponding to the target original singing audio data as accompaniment music data corresponding to the recorded data after tuning processing;
performing audio mixing processing on the recorded data after tuning processing and the accompaniment music data corresponding to the recorded data after tuning processing;
correspondingly, feeding back the recorded data after the audio mixing processing to the client.
In one embodiment, the computer program when executed by the processor further performs the steps of:
displaying a data display page; the data display page is used for displaying recorded data before tuning processing and recorded data after tuning processing; the data display page comprises a play control before tuning and a play control after tuning;
When the play control before tuning is triggered, playing the audio represented by the part of the recorded data to be tuned in the recorded data before tuning;
and when the play control after tuning is triggered, playing the audio represented by the part of the recorded data in the time slice corresponding to the part of the recorded data to be tuned in the recorded data after tuning.
The specific functions implemented by the memory and the processor of the device provided in the embodiments of the present disclosure may be explained in comparison with the previous embodiments in the present disclosure, and may achieve the technical effects of the previous embodiments, which will not be repeated here.
As can be seen from the above, in the present application, after the recorded data for the selected song sent by the client is received, the recorded audio fingerprint features contained in the recorded data may be identified. The selected song is typically associated with at least one original singing audio data: a song will usually have at least one sung version, including the original singer's version and cover versions by other singers, where each version corresponds to one original singing audio data. Target original singing audio data whose audio fingerprint features match the recorded audio fingerprint features can then be determined from these original singing audio data. After the target original singing audio data is determined, the recorded data can be tuned based on it, so that the recorded data matches the pitch of the target original singing audio data, and the tuned recorded data can then be fed back to the client. As a result, when the client plays the audio represented by the tuned recorded data, singing problems such as off-pitch or broken notes do not occur. The user therefore only needs to record the selected song, and tuning is completed automatically by the technical solution provided by the present application; the process is simple, intelligent, and low-threshold, which can improve the user experience.
In the 1990s, an improvement to a technology could clearly be distinguished as an improvement in hardware (for example, an improvement to a circuit structure such as a diode, transistor, or switch) or an improvement in software (an improvement to a method flow). With the development of technology, however, many improvements to method flows can now be regarded as direct improvements to hardware circuit structures. Designers almost always obtain a corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement of a method flow cannot be realized by a hardware entity module. For example, a programmable logic device (Programmable Logic Device, PLD), such as a field programmable gate array (Field Programmable Gate Array, FPGA), is an integrated circuit whose logic function is determined by the user's programming of the device. A designer programs to "integrate" a digital system onto a PLD without asking a chip manufacturer to design and fabricate an application-specific integrated circuit chip. Moreover, instead of manually manufacturing integrated circuit chips, such programming is nowadays mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development; the source code to be compiled is written in a specific programming language called a hardware description language (Hardware Description Language, HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are currently the most commonly used.
It will also be apparent to those skilled in the art that a hardware circuit implementing the logic method flow can be readily obtained by merely slightly programming the method flow into an integrated circuit using several of the hardware description languages described above.
Those skilled in the art will also appreciate that, in addition to implementing clients, servers in the form of pure computer readable program code, it is well possible to implement the same functions by logically programming method steps such that clients, servers are implemented in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such clients, servers may therefore be considered as a hardware component, and the means included therein for performing various functions may also be considered as structures within the hardware component. Or even means for achieving the various functions may be regarded as either software modules implementing the methods or structures within hardware components.
From the above description of embodiments, it will be apparent to those skilled in the art that the present application may be implemented in software plus a necessary general purpose hardware platform. Based on such understanding, the technical solutions of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to perform the methods described in the embodiments or some parts of the embodiments of the present application.
In this specification, each embodiment is described in a progressive manner; identical and similar parts of the embodiments may be referred to each other, and each embodiment focuses on its differences from the other embodiments. In particular, for the embodiments of the computer storage medium, the device, the server, and the client, reference may be made to the description of the foregoing method embodiments for comparison.
The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
Although the present application has been described by way of embodiments, those of ordinary skill in the art will recognize that many variations and modifications are possible without departing from the spirit of the present application, and it is intended that the appended claims encompass such variations and modifications.

Claims (18)

1. A method of tuning music, the method comprising:
receiving recording data aiming at a selected song sent by a client and identifying recording audio fingerprint characteristics contained in the recording data;
determining target original audio data with audio fingerprint characteristics matched with the recorded audio fingerprint characteristics from a plurality of original audio data associated with the selected song;
performing tuning processing on the recorded data so that the recorded data is matched with the pitch of the target original singing audio data;
and feeding back the recorded data after tuning processing to the client.
2. The method of claim 1, wherein the recording data comprises recording video data; identifying recorded audio fingerprint features contained in the recorded data includes:
separating recorded picture data and recorded audio data from the recorded video data;
identifying recorded audio fingerprint features contained in the recorded audio data;
accordingly, tuning processing is carried out on the recorded audio data, so that the pitch of the recorded audio data is matched with that of the target original audio data.
3. The method of claim 2, wherein after tuning the recorded audio data, the method further comprises:
Performing audio-video combination processing on the recorded picture data and the recorded audio data subjected to tuning processing to obtain recorded video data subjected to tuning processing;
and correspondingly, feeding back the recorded video data after tuning processing to the client.
4. The method according to claim 1, characterized in that an audio fingerprint database is provided; wherein the audio fingerprint database comprises audio fingerprint characteristics bound with the original audio data; the target original singing audio data is determined according to the following steps:
calculating the similarity between the recorded audio fingerprint characteristics and the audio fingerprint characteristics bound with the original singing audio data;
and taking the original singing audio data associated with the audio fingerprint characteristics corresponding to the maximum similarity as the target original singing audio data.
5. The method of claim 1, wherein tuning the recorded data comprises:
judging whether the pitch at a designated singing moment in the recorded data is the same as the pitch at the designated singing moment in the target original singing audio data;
and if they are different, taking the pitch at the designated singing moment in the target original singing audio data as the pitch at the designated singing moment in the recorded data.
6. The method of claim 1, wherein after receiving the recording data for the selected song from the client, the method further comprises:
performing smoothing filtering processing on the recorded data so that the signal-to-noise ratio of the recorded data after the smoothing filtering processing is greater than or equal to a specified signal-to-noise ratio threshold;
and replacing the recorded data before the smoothing filtering processing with the recorded data after the smoothing filtering processing.
7. The method of claim 1, wherein the selected song is provided with singing tempo data; after receiving the recording data for the selected song sent from the client, the method further includes:
performing beat debugging processing on the recorded data to enable the singing beat in the recorded data after the beat debugging processing to be synchronous with the singing beat in the singing beat data of the selected song;
and replacing the recorded data before the beat debugging processing with the recorded data after the beat debugging processing.
8. The method according to claim 1, wherein a correspondence relation of the original audio data and accompaniment music data is provided; after tuning the recorded data, the method further comprises:
According to the corresponding relation, determining the accompaniment music data corresponding to the target original singing audio data, and taking the accompaniment music data corresponding to the target original singing audio data as accompaniment music data corresponding to the recorded data after tuning processing;
performing audio mixing processing on the recorded data after tuning processing and the accompaniment music data corresponding to the recorded data after tuning processing;
correspondingly, feeding back the recorded data after the audio mixing processing to the client.
9. The method of claim 1, wherein after the tuning processing of the recorded data, the method further comprises:
displaying a data display page, wherein the data display page is used for displaying the recorded data before the tuning processing and the recorded data after the tuning processing, and comprises a pre-tuning play control and a post-tuning play control;
when the pre-tuning play control is triggered, playing the audio represented by the portion of the recorded data to be tuned in the recorded data before the tuning processing;
and when the post-tuning play control is triggered, playing the audio represented by the portion of the recorded data, within the recorded data after the tuning processing, in the time slice corresponding to the portion to be tuned.
10. A tuning device for music, the device comprising a memory and a processor, the memory for storing a computer program which, when executed by the processor, performs the steps of:
receiving recorded data for a selected song sent by a client, and identifying the recorded audio fingerprint features contained in the recorded data;
determining, from a plurality of original singing audio data associated with the selected song, target original singing audio data whose audio fingerprint features match the recorded audio fingerprint features;
performing tuning processing on the recorded data so that the recorded data is matched with the pitch of the target original singing audio data;
and feeding back the recorded data after tuning processing to the client.
11. The apparatus of claim 10, wherein the recorded data comprises recorded video data; when the computer program is executed by the processor, identifying the recorded audio fingerprint features contained in the recorded data comprises the steps of:
separating recorded picture data and recorded audio data from the recorded video data;
identifying the recorded audio fingerprint features contained in the recorded audio data;
and, correspondingly, performing tuning processing on the recorded audio data so that the pitch of the recorded audio data matches that of the target original singing audio data.
12. The apparatus of claim 11, wherein the computer program when executed by the processor further performs the steps of:
performing audio-video combination processing on the recorded picture data and the recorded audio data subjected to tuning processing to obtain recorded video data subjected to tuning processing;
and correspondingly, feeding back the recorded video data after tuning processing to the client.
13. The apparatus of claim 10, wherein the memory is further configured to store an audio fingerprint database; wherein the audio fingerprint database comprises audio fingerprint characteristics bound with the original audio data; the target original singing audio data is determined according to the following steps:
calculating the similarity between the recorded audio fingerprint characteristics and the audio fingerprint characteristics bound with the original singing audio data;
and taking the original singing audio data associated with the audio fingerprint characteristics corresponding to the maximum similarity as the target original singing audio data.
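The matching rule in this claim — compute a similarity against each fingerprint bound to original singing audio data and pick the maximum — can be sketched as below. Real audio fingerprints are typically hash sequences rather than dense vectors, so the cosine-similarity model and the names `cosine`, `best_match`, and the dictionary-shaped fingerprint database are purely illustrative assumptions.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length numeric fingerprint vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def best_match(recorded_fp, fingerprint_db):
    """Return the id of the original singing audio whose bound fingerprint
    is most similar to the recorded fingerprint (the claimed argmax)."""
    return max(fingerprint_db, key=lambda k: cosine(recorded_fp, fingerprint_db[k]))
```

This mirrors the claim's two steps: score every candidate, then take the original singing audio data associated with the maximum similarity as the target.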
14. The apparatus of claim 10, wherein, when the computer program is executed by the processor, the tuning processing performed on the recorded data comprises the steps of:
judging whether the pitch at a specified singing moment in the recorded data is the same as the pitch at the specified singing moment in the target original singing audio data;
and if they are different, taking the pitch at the specified singing moment in the target original singing audio data as the pitch at the specified singing moment in the recorded data.
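The pitch-correction rule just claimed — frame by frame, substitute the original singer's pitch wherever the recorded pitch differs — can be sketched as below. Frame-level pitch tracks are assumed to be pre-extracted; the tolerance parameter is an addition of this example (exact float equality is rarely meaningful for pitch values), and all names are hypothetical.

```python
def correct_pitch(recorded_pitches, original_pitches, tolerance_hz=1.0):
    """Per-frame pitch substitution: where the recorded pitch deviates from
    the original singing audio's pitch, take the original pitch instead."""
    corrected = []
    for rec, orig in zip(recorded_pitches, original_pitches):
        if abs(rec - orig) > tolerance_hz:
            corrected.append(orig)   # off-pitch: use the original singer's pitch
        else:
            corrected.append(rec)    # already in tune: keep as recorded
    return corrected
```

Applying the corrected pitch track back to the audio would require a pitch-shifting algorithm (e.g. PSOLA or a phase vocoder), which the claim leaves unspecified.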
15. The apparatus of claim 10, wherein the computer program when executed by the processor further performs the steps of:
performing smoothing filtering processing on the recorded data so that the signal-to-noise ratio of the recorded data after the smoothing filtering processing is greater than or equal to a specified signal-to-noise ratio threshold;
and replacing the recorded data before the smoothing filtering processing with the recorded data after the smoothing filtering processing.
16. The apparatus of claim 10, wherein the memory is further for storing singing tempo data for the selected song; the computer program, when executed by the processor, further performs the steps of:
performing tempo adjustment processing on the recorded data so that the singing tempo in the recorded data after the tempo adjustment processing is synchronized with the singing tempo in the singing tempo data of the selected song;
and replacing the recorded data before the tempo adjustment processing with the recorded data after the tempo adjustment processing.
17. The apparatus of claim 10, wherein the memory is further configured to store a correspondence of the original audio data and accompaniment music data; the computer program, when executed by the processor, further performs the steps of:
determining, according to the correspondence, the accompaniment music data corresponding to the target original singing audio data, and taking that accompaniment music data as the accompaniment music data corresponding to the recorded data after the tuning processing;
performing audio mixing processing on the recorded data after the tuning processing and the accompaniment music data corresponding to the recorded data after the tuning processing;
and, correspondingly, feeding back the recorded data after the audio mixing processing to the client.
18. The apparatus of claim 10, wherein the computer program when executed by the processor further performs the steps of:
displaying a data display page, wherein the data display page is used for displaying the recorded data before the tuning processing and the recorded data after the tuning processing, and comprises a pre-tuning play control and a post-tuning play control;
when the pre-tuning play control is triggered, playing the audio represented by the portion of the recorded data to be tuned in the recorded data before the tuning processing;
and when the post-tuning play control is triggered, playing the audio represented by the portion of the recorded data, within the recorded data after the tuning processing, in the time slice corresponding to the portion to be tuned.
CN201811196608.0A 2018-10-15 2018-10-15 Tuning method and device for music Active CN111046226B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811196608.0A CN111046226B (en) 2018-10-15 2018-10-15 Tuning method and device for music


Publications (2)

Publication Number Publication Date
CN111046226A CN111046226A (en) 2020-04-21
CN111046226B true CN111046226B (en) 2023-05-05

Family

ID=70230567

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811196608.0A Active CN111046226B (en) 2018-10-15 2018-10-15 Tuning method and device for music

Country Status (1)

Country Link
CN (1) CN111046226B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111859015A (en) * 2020-07-01 2020-10-30 腾讯音乐娱乐科技(深圳)有限公司 Music response method and related equipment
CN113192533A (en) * 2021-04-29 2021-07-30 北京达佳互联信息技术有限公司 Audio processing method and device, electronic equipment and storage medium
CN113628637A (en) * 2021-07-02 2021-11-09 北京达佳互联信息技术有限公司 Audio identification method, device, equipment and storage medium
CN115762546A (en) * 2021-09-03 2023-03-07 腾讯科技(深圳)有限公司 Audio data processing method, apparatus, device and medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104715760A (en) * 2015-02-13 2015-06-17 朱威 KTV song matching analyzing method and system
CN105006234A (en) * 2015-05-27 2015-10-28 腾讯科技(深圳)有限公司 Karaoke processing method and apparatus
CN105989842A (en) * 2015-01-30 2016-10-05 福建星网视易信息系统有限公司 Method and device for voiceprint similarity comparison and application thereof in digital entertainment on-demand system
CN106157979A (en) * 2016-06-24 2016-11-23 广州酷狗计算机科技有限公司 A kind of method and apparatus obtaining voice pitch data
CN106469557A (en) * 2015-08-18 2017-03-01 阿里巴巴集团控股有限公司 The offer method and apparatus of accompaniment music
CN106971704A (en) * 2017-04-27 2017-07-21 维沃移动通信有限公司 A kind of audio-frequency processing method and mobile terminal
CN108074557A (en) * 2017-12-11 2018-05-25 深圳Tcl新技术有限公司 Tone regulating method, device and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140129571A1 (en) * 2012-05-04 2014-05-08 Axwave Inc. Electronic media signature based applications
WO2016188211A1 (en) * 2015-05-27 2016-12-01 腾讯科技(深圳)有限公司 Audio processing method, apparatus and system


Also Published As

Publication number Publication date
CN111046226A (en) 2020-04-21

Similar Documents

Publication Publication Date Title
CN111046226B (en) Tuning method and device for music
CN106960051B (en) Audio playing method and device based on electronic book and terminal equipment
US9472209B2 (en) Deep tagging background noises
CN108307250B (en) Method and device for generating video abstract
US20180226101A1 (en) Methods and systems for interactive multimedia creation
CN104980790A (en) Voice subtitle generating method and apparatus, and playing method and apparatus
US11593422B2 (en) System and method for automatic synchronization of video with music, and gaming applications related thereto
EP3916538B1 (en) Creating a cinematic storytelling experience using network-addressable devices
US9286943B2 (en) Enhancing karaoke systems utilizing audience sentiment feedback and audio watermarking
JP2007012013A (en) Video data management device and method, and program
WO2023029984A1 (en) Video generation method and apparatus, terminal, server, and storage medium
WO2023128877A2 (en) Video generating method and apparatus, electronic device, and readable storage medium
CN112911332A (en) Method, apparatus, device and storage medium for clipping video from live video stream
CN110324702B (en) Information pushing method and device in video playing process
US11386163B2 (en) Data search method and data search system thereof for generating and comparing strings
CN113012723B (en) Multimedia file playing method and device and electronic equipment
US11482243B2 (en) System and method for automatically identifying and ranking key moments in media
WO2017107309A1 (en) Control method, control device, terminal, and synchronous audio playback system
US11989231B2 (en) Audio recommendation based on text information and video content
US20110184955A1 (en) Organizing data
US9953032B2 (en) System and method for characterization of multimedia content signals using cores of a natural liquid architecture system
JP7230085B2 (en) Method and device, electronic device, storage medium and computer program for processing sound
US20240127860A1 (en) Audio/video processing method and apparatus, device, and storage medium
US20230410848A1 (en) Method and apparatus of generating audio and video materials
JP2005341138A (en) Video summarizing method and program, and storage medium with the program stored therein

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant