WO2021068903A1 - Method, apparatus, device and storage medium for determining volume adjustment ratio information - Google Patents

Method, apparatus, device and storage medium for determining volume adjustment ratio information

Info

Publication number: WO2021068903A1
Authority: WIPO (PCT)
Prior art keywords: audio, singing, segment, loudness, accompaniment
Application number: PCT/CN2020/120044
Other languages: English (en), French (fr)
Inventors: 庄晓滨, 林森
Original Assignee: 腾讯音乐娱乐科技(深圳)有限公司 (Tencent Music Entertainment Technology (Shenzhen) Co., Ltd.)
Application filed by 腾讯音乐娱乐科技(深圳)有限公司 (Tencent Music Entertainment Technology (Shenzhen) Co., Ltd.)
Priority to US 17/766,911, published as US20230252964A1
Publication of WO2021068903A1

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H: ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H 1/00: Details of electrophonic musical instruments
    • G10H 1/46: Volume control
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/16: Sound input; Sound output
    • G06F 3/165: Management of the audio stream, e.g. setting of volume, audio stream path
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H: ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H 1/00: Details of electrophonic musical instruments
    • G10H 1/0008: Associated control or indicating means
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H: ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H 1/00: Details of electrophonic musical instruments
    • G10H 1/36: Accompaniment arrangements
    • G10H 1/361: Recording/reproducing of accompaniment for use with an external source, e.g. karaoke systems
    • G10H 1/366: Recording/reproducing of accompaniment for use with an external source, e.g. karaoke systems with means for modifying or correcting the external signal, e.g. pitch correction, reverberation, changing a singer's voice
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H: ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H 2210/00: Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H 2210/005: Musical accompaniment, i.e. complete instrumental rhythm synthesis added to a performed melody, e.g. as output by drum machines
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H: ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H 2210/00: Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H 2210/031: Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal

Definitions

  • This application relates to the field of Internet technology, and in particular to a method, device, equipment, and storage medium for determining volume adjustment ratio information.
  • The singing application provides users with accompaniment audio, and the user sings a song along with the accompaniment audio.
  • The terminal running the singing application records the user's vocal audio, synthesizes the vocal audio and the accompaniment audio into singing audio, and publishes it on the Internet.
  • Before synthesis, the user can adjust the volume of the accompaniment audio to obtain the volume-adjusted accompaniment audio, which is then synthesized with the vocal audio.
  • A published singing audio can be recorded a second time; that is, the user can select a poorly sung segment of the original singing audio (that is, the published singing audio) for a second recording.
  • During re-recording, the terminal provides the user with the volume-adjusted accompaniment audio, records the user's vocal audio, synthesizes the singing audio of the selected segment, and replaces the corresponding segment of the original singing audio with the newly recorded segment, thereby realizing the second recording of the original singing audio.
  • In the related art, the terminal obtains the user-adjusted accompaniment audio by filtering the vocal audio out of the original singing audio based on a preset algorithm, thereby extracting the accompaniment audio.
  • Due to defects of the algorithm itself, the extracted accompaniment audio contains noise.
  • The extracted accompaniment audio can be compared with the original accompaniment audio (that is, the accompaniment audio whose volume has not been adjusted by the user) to obtain the user's adjustment ratio information for the original accompaniment audio volume, and the noise-free user-adjusted accompaniment audio can then be obtained according to that adjustment ratio information.
  • However, the noise in the algorithm-extracted accompaniment audio may affect the comparison result with the original accompaniment audio, resulting in inaccurate volume adjustment ratio information.
  • The embodiments of the present application provide a method, apparatus, device, and storage medium for determining volume adjustment ratio information, which avoid the inaccurate adjustment ratio information that noise in the accompaniment audio extracted from the singing audio would otherwise cause. The technical solution is as follows:
  • the ratio of the loudness characteristic of the first audio to the loudness characteristic of the second audio is determined as the adjustment ratio information for adjusting the accompaniment volume of the first singing audio.
  • the acquiring the first audio of the non-singing part of the first singing audio includes:
  • The acquiring, in the original accompaniment audio, of the second audio whose playing time period corresponds to that of the first audio includes:
  • the acquiring the loudness characteristic of the first audio includes: dividing the first audio into a plurality of third audio segments of preset duration, and determining the loudness characteristic of each third audio segment;
  • the acquiring the loudness characteristic of the second audio includes: dividing the second audio into a plurality of fourth audio segments of preset duration, and determining the loudness characteristic of each fourth audio segment.
  • the determining the ratio of the loudness characteristic of the first audio to the loudness characteristic of the second audio as the adjustment ratio information for adjusting the accompaniment volume of the first singing audio includes:
  • Among all the loudness characteristics of the third audio segments, the smallest first preset number are selected as first loudness features, and among all the loudness characteristics of the fourth audio segments, the smallest first preset number are selected as second loudness features;
  • the ratio of the sum of the first preset number of first loudness features to the sum of the first preset number of second loudness features is determined as the adjustment ratio information for adjusting the accompaniment volume of the first singing audio.
  • The determining of the loudness characteristic of each third audio segment includes: for each third audio segment, uniformly selecting a second preset number of playback time points in the third audio segment, and determining the root mean square of the audio amplitudes corresponding to the selected playback time points as the loudness characteristic of the third audio segment;
  • The determining of the loudness characteristic of each fourth audio segment includes: for each fourth audio segment, uniformly selecting a second preset number of playback time points in the fourth audio segment, and determining the root mean square of the audio amplitudes corresponding to the selected playback time points as the loudness characteristic of the fourth audio segment.
  • the method further includes:
  • the recording the second singing audio based on the adjusted accompaniment audio includes:
  • a device for determining adjustment ratio information of a volume includes:
  • the first acquisition module is configured to acquire a first singing audio and an original accompaniment audio corresponding to the first singing audio, wherein the first singing audio is a user singing audio;
  • the second acquiring module is configured to acquire the first audio of the non-singing part of the first singing audio, and acquire the loudness characteristic of the first audio;
  • a third acquiring module configured to acquire, in the original accompaniment audio, a second audio corresponding to the first audio with a playing time period, and acquire a loudness characteristic of the second audio
  • the determining module is configured to determine the ratio of the loudness characteristic of the first audio to the loudness characteristic of the second audio as the adjustment ratio information for adjusting the accompaniment volume of the first singing audio.
  • the second acquiring module is configured to:
  • the third acquiring module is configured to:
  • the second acquisition module is configured to: divide the first audio into a plurality of third audio segments of preset duration, and determine the loudness characteristics of each third audio segment;
  • the third acquisition module is configured to divide the second audio into a plurality of fourth audio segments with a preset duration, and determine the loudness characteristics of each fourth audio segment.
  • the determining module is configured to:
  • Among all the loudness characteristics of the third audio segments, the smallest first preset number are selected as first loudness features, and among all the loudness characteristics of the fourth audio segments, the smallest first preset number are selected as second loudness features;
  • the ratio of the sum of the first preset number of first loudness features to the sum of the first preset number of second loudness features is determined as the adjustment ratio information for adjusting the accompaniment volume of the first singing audio.
  • The second acquisition module is configured to: for each third audio segment, uniformly select a second preset number of playback time points in the third audio segment, and determine the root mean square of the audio amplitudes corresponding to the selected playback time points as the loudness characteristic of the third audio segment;
  • The third acquisition module is configured to: for each fourth audio segment, uniformly select a second preset number of playback time points in the fourth audio segment, and determine the root mean square of the audio amplitudes corresponding to the selected playback time points as the loudness characteristic of the fourth audio segment.
  • the device further includes a recording module configured to:
  • the recording module is configured as:
  • In yet another aspect, a computer device includes a processor and a memory; at least one instruction is stored in the memory, and the at least one instruction is loaded and executed by the processor to implement the operations performed by the above method for determining volume adjustment ratio information.
  • A computer-readable storage medium stores at least one instruction, and the at least one instruction is loaded and executed by a processor to implement the operations performed by the method for determining volume adjustment ratio information as described above.
  • The loudness characteristics of the first audio and the second audio are determined, and the ratio of the loudness characteristic of the first audio to that of the second audio is determined as the adjustment ratio information for adjusting the accompaniment volume of the singing audio. Since this application does not use an algorithm to extract the accompaniment audio from the singing audio, it avoids the inaccurate adjustment ratio information that noise in the extracted accompaniment audio would otherwise cause.
  • FIG. 1 is a flowchart of a method for determining volume adjustment ratio information provided by an embodiment of the present application.
  • FIG. 2 is a schematic diagram of a method for processing singing audio provided by an embodiment of the present application.
  • FIG. 3 is a schematic diagram of a method for processing original accompaniment audio provided by an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of an apparatus for determining volume adjustment ratio information provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of a terminal structure provided by an embodiment of the present application.
  • the method for determining volume adjustment ratio information provided in this application can be implemented by a terminal.
  • The terminal can run a singing application and can have components such as a microphone, a screen, and a speaker.
  • the terminal has a communication function and can access the Internet.
  • the terminal is equipped with a processor to process data information.
  • the terminal can be a mobile phone, a tablet computer, a smart wearable device, a desktop computer, a notebook computer, etc.
  • the singing application can run on the terminal, and the user can select the song they want to record in the singing application.
  • the singing application will send the identification information of the song selected by the user to the server.
  • the server can send the accompaniment audio and lyrics file corresponding to the song to the terminal according to the identification information of the song.
  • After the terminal receives the accompaniment audio, it can play the accompaniment audio and, according to the lyric file, display the lyrics on the terminal screen in step with the playback progress.
  • the terminal turns on the recording function.
  • The user can sing the song according to the lyrics prompted by the singing application on the terminal screen, and the terminal records the user's vocal audio.
  • The recorded vocal audio and the accompaniment audio corresponding to the song are synthesized to generate the singing audio. Users can publish the singing audio as karaoke works to the Internet for other users to listen to.
  • Before the terminal synthesizes the vocal audio and the accompaniment audio into the singing audio, the user can adjust the volume of the accompaniment audio so that it better matches the vocal audio, making the synthesized singing audio more pleasing to the user's ear.
  • A published karaoke work can be recorded a second time; that is, the user can select a poorly sung segment of the original singing audio (that is, the published karaoke work) to re-record, and the re-recorded singing audio replaces the poorly sung segment selected by the user.
  • To do so, the terminal obtains the user-adjusted accompaniment audio of the original singing audio, records the vocal audio the user sings again, synthesizes the singing audio of the selected segment, and replaces the corresponding segment of the original singing audio with it, thereby realizing the second recording of the karaoke work.
  • The method for determining volume adjustment ratio information provided by the embodiments of the present application obtains the user's adjustment information for the original accompaniment audio volume from the user's singing audio and the original accompaniment audio, and the terminal then obtains the user's volume-adjusted accompaniment audio according to this volume adjustment information.
  • FIG. 1 is a flowchart of a method for determining volume adjustment ratio information provided by an embodiment of the present application. Referring to FIG. 1, this embodiment includes:
  • Step 101: Obtain the first singing audio and the original accompaniment audio corresponding to the first singing audio.
  • the first singing audio is the user singing audio, which is synthesized from the user's vocal audio and the original accompaniment audio.
  • the first singing audio (that is, the K song work) may be obtained by synthesizing the vocal audio recorded by the user through the singing application program and the accompaniment audio of the corresponding singing song.
  • the original accompaniment audio is the accompaniment audio of the song corresponding to the first singing audio.
  • the first singing audio can be stored locally in the terminal or can be obtained from the server.
  • When the terminal obtains the first singing audio from the server, it can send a download request for the first singing audio to the server, and according to the download request the server can send the terminal the first singing audio, the original accompaniment audio of the first singing audio, and the lyric file of the song corresponding to the first singing audio.
  • The lyric file records the time point at which each line of lyrics starts to be played and the time point at which it ends.
  • Step 102: Obtain the first audio of the non-singing part of the first singing audio, and obtain the loudness characteristic of the first audio.
  • the loudness feature is the feature information of the audio volume level, and may be a numerical value representing the volume level.
  • the first singing audio is synthesized from the vocal audio sung by the user and the accompaniment audio of the corresponding singing song.
  • the terminal can process the first singing audio.
  • the audio part of the first singing audio that does not contain the singing voice is cut out, and the cut out pieces of audio that do not contain the singing voice are spliced to obtain the first audio.
  • the volume information of the first audio can be obtained, that is, the loudness characteristic of the first audio can be obtained.
  • the first singing audio may be segmented according to the time point recorded in the lyric file of the song corresponding to the first singing audio to obtain the first audio of the non-singing part.
  • The corresponding processing can be as follows: obtain the start and end playing time points of each lyric line in the lyric data corresponding to the first singing audio; based on the start and end time points of each line, determine, in the first singing audio, multiple first audio segments of the non-singing part, and splice the multiple first audio segments in playing-time order to obtain the first audio.
  • Each first audio segment is a pure-accompaniment part of the first singing audio.
  • The first audio is obtained by splicing the multiple first audio segments in playing-time order.
  • the server will send the lyric file of the song corresponding to the first singing audio to the terminal.
  • Multiple time points are marked in the lyric file, including the time point when each sentence of lyrics starts to be played and the time point when the playback ends.
  • The first singing audio can be divided into multiple audio segments according to the time points at which each line of lyrics starts and stops playing in the lyric file; the audio of the lyric parts is removed, the pure-accompaniment audio is retained as the first audio segments, and the segments are then spliced in the order of the time points in the lyric file to obtain the first audio.
  • For example, the first singing audio A can be divided into audio segments a, b, c, d, e, and f, where segments b, d, and f contain both vocal audio and accompaniment audio, while segments a, c, and e contain only accompaniment audio; segments a, c, and e are spliced in chronological order to obtain the first audio B.
  • The terminal can also set a duration threshold: if the interval between the time point at which one line of lyrics ends and the time point at which the next line begins (that is, the gap between lyrics) is shorter than the threshold, these two time points can be ignored and the audio is not segmented at them.
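  • The splitting described above can be sketched as follows (a minimal illustration, not the patent's implementation; the function name, the `min_gap` threshold, and the sample-array representation are assumptions):

```python
import numpy as np

def extract_non_vocal_audio(samples, sample_rate, lyric_lines, min_gap=1.0):
    """Concatenate the portions of `samples` outside every lyric line's
    [start, end) interval.  `lyric_lines` is a list of (start_sec, end_sec)
    tuples; gaps shorter than `min_gap` seconds before a line are ignored
    (left unsegmented), mirroring the duration threshold described above."""
    segments = []
    cursor = 0.0
    for start, end in sorted(lyric_lines):
        if start - cursor >= min_gap:
            a = int(cursor * sample_rate)
            b = int(start * sample_rate)
            segments.append(samples[a:b])
        cursor = max(cursor, end)
    # trailing accompaniment after the last lyric line
    tail = int(cursor * sample_rate)
    if tail < len(samples):
        segments.append(samples[tail:])
    return np.concatenate(segments) if segments else np.array([])
```

  • The same function applies unchanged to the original accompaniment audio, which yields segments covering exactly the same playing time periods.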
  • the first audio can be divided into multiple audio segments, and the loudness characteristics of each audio segment can be acquired.
  • The corresponding processing is as follows: divide the first audio into multiple third audio segments of preset duration, and determine the loudness characteristic of each third audio segment.
  • the terminal may divide the first audio into a plurality of third audio segments according to a preset duration, and each third audio segment is equal in duration.
  • a sampling rate may be set in the terminal to sample the audio amplitude of each divided third audio segment, and the loudness characteristics of each third audio segment can be determined according to the sampling value of each audio segment.
  • For determining the loudness characteristic of each third audio segment, the following processing can be performed: for each third audio segment, uniformly select a second preset number of playback time points in the third audio segment, and determine the root mean square of the audio amplitudes corresponding to the selected playback time points as the loudness characteristic of the third audio segment.
  • the audio amplitude of each divided third audio segment can be sampled multiple times, and the root mean square of the multiple sample amplitudes of each third audio segment can be obtained as each third audio The loudness characteristics of the segment.
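  • The per-segment RMS loudness described above can be sketched as follows (the segment duration and number of sampled points stand in for the patent's "preset duration" and "second preset number"; names and defaults are illustrative):

```python
import numpy as np

def segment_loudness(audio, sample_rate, seg_seconds=1.0, n_points=100):
    """Split `audio` into fixed-length segments and return, for each
    segment, the root mean square of `n_points` uniformly spaced
    amplitude samples, used as that segment's loudness characteristic."""
    seg_len = int(seg_seconds * sample_rate)
    loudness = []
    for start in range(0, len(audio) - seg_len + 1, seg_len):
        seg = audio[start:start + seg_len]
        # uniformly select n_points playback time points in the segment
        idx = np.linspace(0, seg_len - 1, n_points).astype(int)
        loudness.append(float(np.sqrt(np.mean(seg[idx] ** 2))))
    return loudness
```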
  • Step 103: In the original accompaniment audio, obtain the second audio corresponding to the first audio in playing time period, and obtain the loudness characteristic of the second audio.
  • The first audio may include multiple audio segments that contain no singing vocals, and each audio segment corresponds to its own playing time period.
  • Each playing time period is determined by the segment's start and end playing time points in the original accompaniment audio, and the duration of the first audio may be the sum of these playing time periods.
  • The server sends the terminal the original accompaniment audio of the song corresponding to the first singing audio (that is, the accompaniment audio whose volume has not been adjusted by the user), and the terminal can process the original accompaniment audio in the same way as the first singing audio in step 102.
  • That is, it intercepts the parts of the original accompaniment audio whose playing time periods are the same as those of the vocal-free parts of the first singing audio, and splices the intercepted audio segments to obtain the second audio.
  • the play time period of each audio segment included in the second audio corresponds to the play time period of the audio segment that does not include singing human voice in the first audio.
  • the terminal may segment the original accompaniment audio according to the lyric file of the song corresponding to the first singing audio to obtain the second audio.
  • The corresponding processing can be as follows: based on the start and end playing time points of each lyric line, determine, in the original accompaniment audio, multiple second audio segments corresponding to the non-singing part of the first singing audio, and splice the multiple second audio segments in playing-time order to obtain the second audio.
  • The start and end time points of each second audio segment in the original accompaniment audio correspond to the start and end time points of each first audio segment in the first singing audio in step 102.
  • The second audio is obtained by splicing the multiple second audio segments in chronological order.
  • The terminal can divide the original accompaniment audio into multiple audio segments according to the time points at which each line of lyrics starts and stops playing, remove the audio corresponding to the lyric parts, and splice the remaining second audio segments in the order of the time points in the lyric file to obtain the second audio.
  • For example, the original accompaniment audio C can be divided into audio segments g, h, i, j, k, and l, where segments g, i, and k correspond to the aforementioned segments a, c, and e; segments g, i, and k are spliced in chronological order to obtain the second audio D.
  • In implementation, the second audio can be divided into multiple audio segments, and the loudness characteristic of each audio segment can be acquired.
  • The corresponding processing is as follows: divide the second audio into multiple fourth audio segments of preset duration, and determine the loudness characteristic of each fourth audio segment.
  • the terminal may divide the second audio into a plurality of fourth audio segments according to a preset duration, and each fourth audio segment is equal in duration.
  • A sampling rate may also be set in the terminal, the same as the sampling rate in step 102 above; the amplitude of each divided fourth audio segment is sampled at this rate, and the loudness characteristic of each fourth audio segment is determined from the sampled values.
  • For determining the loudness characteristic of each fourth audio segment, the following processing can be performed: for each fourth audio segment, uniformly select a second preset number of playback time points in the fourth audio segment, and determine the root mean square of the audio amplitudes corresponding to the selected playback time points as the loudness characteristic of the fourth audio segment.
  • the audio amplitude of each divided fourth audio segment can be sampled multiple times, and the root mean square of the multiple sample amplitudes of each fourth audio segment can be obtained as each fourth audio The loudness characteristics of the segment.
  • Step 104: Determine the ratio of the loudness characteristic of the first audio to the loudness characteristic of the second audio as the adjustment ratio information for adjusting the accompaniment volume of the first singing audio.
  • The ratio of the loudness characteristic of the first audio to the loudness characteristic of the second audio can be calculated, and the volume gain of the second audio relative to the first audio can be determined from the ratio, so as to determine the user's adjustment ratio information for the original accompaniment audio volume.
  • The loudness characteristic of the first audio and the loudness characteristic of the second audio can be obtained from the loudness characteristics of the third audio segments and the fourth audio segments in steps 102 and 103 above. The corresponding processing is as follows: among the loudness characteristics of all third audio segments, select the smallest first preset number as first loudness features, and among the loudness characteristics of all fourth audio segments, select the smallest first preset number as second loudness features;
  • The ratio of the sum of the first preset number of first loudness features to the sum of the first preset number of second loudness features is determined as the adjustment ratio information for adjusting the accompaniment volume of the first singing audio.
  • In implementation, the loudness characteristics of all third audio segments can be arranged in ascending order by value, and a front part of the sorted loudness characteristics, for example the first half, can be summed to obtain the loudness characteristic of the first audio.
  • Likewise, the loudness characteristics of all fourth audio segments can be arranged in ascending order by value, and the same number of the smallest sorted loudness characteristics as for the third audio segments, for example the first half, can be summed to obtain the loudness characteristic of the second audio.
  • the ratio of the volume information of the first audio to the volume information of the second audio is calculated, and the ratio is determined as the adjustment ratio information for the user to adjust the accompaniment volume of the first singing audio.
  • The formula is as follows: L_i = sqrt((1/N) * Σ_{j=1}^{N} S_j^2), where L_i is the loudness characteristic of the i-th audio segment, N is the number of sampling points in each audio segment, and S_j is the audio amplitude of the j-th sampling point.
  • The resulting values L_1 through L_i are sorted in ascending order to obtain a sorted array, and the elements in the front half of the sorted array are summed to obtain the loudness characteristic V_1 of the first audio.
  • The second audio can be processed in the same manner as the first audio to obtain the loudness characteristic V_2 of the second audio. The ratio of V_1 to V_2 is then calculated, and the ratio is determined as the adjustment ratio information for the user's adjustment of the accompaniment volume of the first singing audio.
  • After the adjustment ratio information is determined, the original accompaniment audio is volume-adjusted to obtain the adjusted accompaniment audio, and the second singing audio is recorded based on the adjusted accompaniment audio.
  • The adjustment ratio information may be the ratio of the loudness characteristic of the first audio to the loudness characteristic of the second audio.
  • The original accompaniment audio can be adjusted according to the adjustment ratio information to obtain the adjusted accompaniment audio.
  • The terminal can play the adjusted accompaniment audio and simultaneously turn on the recording function, record the vocal audio of the user singing again, and synthesize the adjusted accompaniment audio with the newly recorded vocal audio to obtain the second singing audio.
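  • Applying the determined ratio to the original accompaniment then amounts to a per-sample scaling (a sketch assuming normalized floating-point samples; the clipping is an added safeguard, not from the patent):

```python
import numpy as np

def apply_volume_ratio(accompaniment, ratio):
    """Scale the original accompaniment samples by the adjustment
    ratio, clipping to the normalized [-1.0, 1.0] range."""
    return np.clip(accompaniment * ratio, -1.0, 1.0)
```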
  • the second singing audio can be recorded over a time period specified by the user. The corresponding processing is as follows: obtain segment time information for re-recording a segment of the first singing audio; based on the segment time information, intercept part of the adjusted accompaniment audio, and record a singing audio segment based on that part of the accompaniment audio; use the singing audio segment to replace the singing audio segment in the first singing audio corresponding to the segment time information, to obtain the second singing audio.
  • the segment time information includes the start time point and end time point of the re-recorded segment.
  • before recording the second singing audio, the user selects the audio time period in the first singing audio that needs to be sung again.
  • for example, the singing application can pre-store the singing start time point and singing end time point of each lyric line of the song corresponding to the first singing audio, and the user can select the lyric lines that need to be sung again.
  • when the singing application receives the lyric selection instruction, it can, according to the pre-stored singing start and end time points of each lyric line, determine the singing start time point of the first selected lyric line and the singing end time point of the last selected lyric line as the start and end time points of the re-recorded segment, and then cut out part of the accompaniment audio from the adjusted accompaniment audio according to those time points.
  • the terminal plays the part of the accompaniment audio while recording the user's vocal audio, then synthesizes the part of the accompaniment audio with the vocal audio, and replaces the singing audio segment in the first singing audio that has the same playback time as the part of the accompaniment audio, obtaining the second singing audio after the second recording.
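The final splice step, dropping the re-sung segment into the first singing audio, might look like the following minimal sketch. It is illustrative only: the sample-index arithmetic, the function name, and the toy 4 Hz sample rate are assumptions (real audio would use a rate in the tens of kilohertz).

```python
# Sketch of the replacement step: the re-recorded segment overwrites the
# samples of the first singing audio that fall inside its playback window.

def replace_segment(first_audio, new_segment, start_s, sample_rate):
    """Return a copy of first_audio with new_segment spliced in at start_s seconds."""
    start = int(start_s * sample_rate)
    end = start + len(new_segment)
    return first_audio[:start] + list(new_segment) + first_audio[end:]

# Toy example at a 4 Hz "sample rate": replace 1 s of audio starting at t = 1 s.
original = [0, 1, 2, 3, 4, 5, 6, 7]    # 2 s of samples
rerecorded = [40, 50, 60, 70]          # 1 s re-recorded segment
print(replace_segment(original, rerecorded, 1.0, 4))  # → [0, 1, 2, 3, 40, 50, 60, 70]
```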
  • the first audio of the non-singing part of the user's singing audio is intercepted, and the second audio corresponding to the first audio in the original accompaniment audio is intercepted, to determine the loudness features of the first audio and the second audio; then the ratio of the loudness features of the first audio and the second audio is determined as the adjustment ratio information for adjusting the accompaniment volume of the singing audio. Since this application does not use an algorithm to extract the accompaniment audio from the singing audio, the problem that the obtained adjustment ratio information for the volume of the original accompaniment audio is inaccurate due to noise in the accompaniment audio of the singing audio can be avoided.
  • FIG. 4 is a schematic structural diagram of an apparatus for determining volume adjustment ratio information provided by an embodiment of the present application.
  • the apparatus may be the terminal in the foregoing embodiment.
  • the device includes:
  • the first obtaining module 410 is configured to obtain a first singing audio and an original accompaniment audio corresponding to the first singing audio, where the first singing audio is a user singing audio;
  • the second acquiring module 420 is configured to acquire the first audio of the non-singing part of the first singing audio, and acquire the loudness characteristic of the first audio;
  • the third obtaining module 430 is configured to obtain, in the original accompaniment audio, a second audio whose playing time period corresponds to the first audio, and obtain a loudness characteristic of the second audio;
  • the determining module 440 is configured to determine the ratio of the loudness characteristic of the first audio to the loudness characteristic of the second audio as the adjustment ratio information for adjusting the accompaniment volume of the first singing audio.
  • the second obtaining module 420 is configured to: obtain the start playback time point and end playback time point of each sentence in the lyric data corresponding to the first singing audio; and, based on the start and end playback time points of each sentence, determine multiple first audio segments of the non-singing part of the first singing audio and concatenate them in playback-time order to obtain the first audio;
  • the third obtaining module 430 is configured to: based on the start and end playback time points of each sentence, determine, in the original accompaniment audio, multiple second audio segments corresponding to the non-singing part of the first singing audio, and concatenate them in playback-time order to obtain the second audio.
  • the second acquisition module 420 is configured to: divide the first audio into a plurality of third audio segments of preset duration, and determine the loudness characteristics of each third audio segment;
  • the third acquisition module is configured to divide the second audio into a plurality of fourth audio segments with a preset duration, and determine the loudness characteristics of each fourth audio segment.
  • the determining module 440 is configured to:
  • among all the loudness features of the third audio segments, the smallest first preset number of first loudness features are selected, and among all the loudness features of the fourth audio segments, the smallest first preset number of second loudness features are selected;
  • the ratio of the sum of the first preset number of first loudness features to the sum of the first preset number of second loudness features is determined as the adjustment ratio information for adjusting the accompaniment volume of the first singing audio.
  • the second acquisition module 420 is configured to: for each third audio segment, uniformly select a second preset number of playback time points in the third audio segment, and determine the root mean square of the audio amplitudes corresponding to the selected playback time points as the loudness feature of the third audio segment;
  • the third acquisition module 430 is configured to: for each fourth audio segment, uniformly select a second preset number of playback time points in the fourth audio segment, and determine the root mean square of the audio amplitudes corresponding to the selected playback time points as the loudness feature of the fourth audio segment.
  • the device further includes a recording module configured to: adjust the volume of the original accompaniment audio based on the adjustment ratio information to obtain adjusted accompaniment audio, and record a second singing audio based on the adjusted accompaniment audio;
  • the recording module is further configured to: obtain segment time information for re-recording a segment of the first singing audio; based on the segment time information, intercept part of the adjusted accompaniment audio and record a singing audio segment based on it; and use the singing audio segment to replace the singing audio segment in the first singing audio corresponding to the segment time information, to obtain the second singing audio.
  • when the device for determining volume adjustment ratio information provided in the above embodiment determines the volume adjustment ratio information, the division into the above functional modules is used only as an example; in practical applications, the above functions can be allocated to different functional modules as needed, that is, the internal structure of the device can be divided into different functional modules to complete all or part of the functions described above.
  • the device for determining volume adjustment ratio information provided in the foregoing embodiment and the method embodiment for determining volume adjustment ratio information belong to the same concept. For the specific implementation process, please refer to the method embodiment, which will not be repeated here.
  • FIG. 5 shows a structural block diagram of a terminal 500 provided by an exemplary embodiment of the present application.
  • the terminal 500 can be: a smart phone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a laptop computer, or a desktop computer.
  • the terminal 500 may also be called user equipment, portable terminal, laptop terminal, desktop terminal and other names.
  • the terminal 500 includes a processor 501 and a memory 502.
  • the processor 501 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and so on.
  • the processor 501 may be implemented in at least one hardware form of DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array).
  • the processor 501 may also include a main processor and a coprocessor.
  • the main processor is a processor for processing data in the awake state, also called a CPU (Central Processing Unit); the coprocessor is a low-power processor for processing data in the standby state.
  • the processor 501 may be integrated with a GPU (Graphics Processing Unit, image processor), and the GPU is used to render and draw content that needs to be displayed on the display screen.
  • the processor 501 may further include an AI (Artificial Intelligence) processor, and the AI processor is used to process computing operations related to machine learning.
  • the memory 502 may include one or more computer-readable storage media, which may be non-transitory.
  • the memory 502 may also include high-speed random access memory and non-volatile memory, such as one or more magnetic disk storage devices and flash memory storage devices.
  • the non-transitory computer-readable storage medium in the memory 502 is used to store at least one instruction, and the at least one instruction is used to be executed by the processor 501 to implement the volume determination provided in the method embodiment of the present application. The method of adjusting the ratio information.
  • the terminal 500 may optionally further include: a peripheral device interface 503 and at least one peripheral device.
  • the processor 501, the memory 502, and the peripheral device interface 503 may be connected by a bus or a signal line.
  • Each peripheral device can be connected to the peripheral device interface 503 through a bus, a signal line, or a circuit board.
  • the peripheral device includes: at least one of a radio frequency circuit 504, a touch screen 505, a camera 506, an audio circuit 507, a positioning component 508, and a power supply 509.
  • the peripheral device interface 503 can be used to connect at least one peripheral device related to I/O (Input/Output) to the processor 501 and the memory 502.
  • in some embodiments, the processor 501, the memory 502, and the peripheral device interface 503 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 501, the memory 502, and the peripheral device interface 503 can be implemented on a separate chip or circuit board, which is not limited in this embodiment.
  • the radio frequency circuit 504 is used for receiving and transmitting RF (Radio Frequency, radio frequency) signals, also called electromagnetic signals.
  • the radio frequency circuit 504 communicates with a communication network and other communication devices through electromagnetic signals.
  • the radio frequency circuit 504 converts electrical signals into electromagnetic signals for transmission, or converts received electromagnetic signals into electrical signals.
  • the radio frequency circuit 504 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so on.
  • the radio frequency circuit 504 can communicate with other terminals through at least one wireless communication protocol.
  • the wireless communication protocol includes, but is not limited to: metropolitan area networks, various generations of mobile communication networks (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity, wireless fidelity) networks.
  • the radio frequency circuit 504 may also include a circuit related to NFC (Near Field Communication), which is not limited in this application.
  • the display screen 505 is used to display a UI (User Interface, user interface).
  • the UI can include graphics, text, icons, videos, and any combination thereof.
  • the display screen 505 also has the ability to collect touch signals on or above the surface of the display screen 505.
  • the touch signal may be input to the processor 501 as a control signal for processing.
  • the display screen 505 may also be used to provide virtual buttons and/or virtual keyboards, also called soft buttons and/or soft keyboards.
  • in some embodiments, there may be one display screen 505, arranged on the front panel of the terminal 500; in other embodiments, there may be at least two display screens 505, respectively arranged on different surfaces of the terminal 500 or in a folded design; in still other embodiments, the display screen 505 may be a flexible display screen, arranged on a curved or folding surface of the terminal 500. The display screen 505 may even be set in a non-rectangular irregular shape, that is, a special-shaped screen.
  • the display screen 505 may be made of materials such as LCD (Liquid Crystal Display), OLED (Organic Light-Emitting Diode, organic light-emitting diode).
  • the camera assembly 506 is used to capture images or videos.
  • the camera assembly 506 includes a front camera and a rear camera.
  • the front camera is set on the front panel of the terminal, and the rear camera is set on the back of the terminal.
  • the camera assembly 506 may also include a flash.
  • the flash can be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash is a combination of a warm-light flash and a cold-light flash and can be used for light compensation at different color temperatures.
  • the audio circuit 507 may include a microphone and a speaker.
  • the microphone is used to collect sound waves of the user and the environment, and convert the sound waves into electrical signals to be input to the processor 501 for processing, or input to the radio frequency circuit 504 to implement voice communication. For the purpose of stereo collection or noise reduction, there may be multiple microphones, which are respectively set in different parts of the terminal 500.
  • the microphone can also be an array microphone or an omnidirectional collection microphone.
  • the speaker is used to convert the electrical signal from the processor 501 or the radio frequency circuit 504 into sound waves.
  • the speaker can be a traditional thin-film speaker or a piezoelectric ceramic speaker.
  • when the speaker is a piezoelectric ceramic speaker, it can not only convert electrical signals into sound waves audible to humans, but also convert electrical signals into sound waves inaudible to humans for purposes such as distance measurement.
  • the audio circuit 507 may also include a headphone jack.
  • the positioning component 508 is used to locate the current geographic location of the terminal 500 to implement navigation or LBS (Location Based Service, location-based service).
  • the positioning component 508 may be a positioning component based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, the GLONASS system of Russia, or the Galileo system of the European Union.
  • the power supply 509 is used to supply power to various components in the terminal 500.
  • the power source 509 may be an alternating current, a direct current, a disposable battery, or a rechargeable battery.
  • the rechargeable battery may support wired charging or wireless charging.
  • the rechargeable battery can also be used to support fast charging technology.
  • the terminal 500 further includes one or more sensors 510.
  • the one or more sensors 510 include, but are not limited to: an acceleration sensor 511, a gyroscope sensor 512, a pressure sensor 513, a fingerprint sensor 514, an optical sensor 515, and a proximity sensor 516.
  • the acceleration sensor 511 can detect the magnitude of acceleration on the three coordinate axes of the coordinate system established by the terminal 500.
  • the acceleration sensor 511 can be used to detect the components of gravitational acceleration on three coordinate axes.
  • the processor 501 may control the touch screen 505 to display the user interface in a horizontal view or a vertical view according to the gravity acceleration signal collected by the acceleration sensor 511.
  • the acceleration sensor 511 may also be used for the collection of game or user motion data.
  • the gyroscope sensor 512 can detect the body direction and rotation angle of the terminal 500, and the gyroscope sensor 512 can cooperate with the acceleration sensor 511 to collect the user's 3D actions on the terminal 500.
  • the processor 501 can implement the following functions according to the data collected by the gyroscope sensor 512: motion sensing (for example, changing the UI according to the user's tilt operation), image stabilization during shooting, game control, and inertial navigation.
  • the pressure sensor 513 may be disposed on the side frame of the terminal 500 and/or the lower layer of the touch screen 505.
  • when the pressure sensor 513 is disposed on the side frame of the terminal 500, the user's holding signal can be detected, and the processor 501 performs left/right-hand recognition or a shortcut operation according to the holding signal collected by the pressure sensor 513.
  • the processor 501 controls the operability controls on the UI interface according to the user's pressure operation on the touch display screen 505.
  • the operability control includes at least one of a button control, a scroll bar control, an icon control, and a menu control.
  • the fingerprint sensor 514 is used to collect the user's fingerprint.
  • the processor 501 can identify the user's identity based on the fingerprint collected by the fingerprint sensor 514, or the fingerprint sensor 514 can identify the user's identity based on the collected fingerprint. When it is recognized that the user's identity is a trusted identity, the processor 501 authorizes the user to perform related sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, paying, and changing settings.
  • the fingerprint sensor 514 may be provided on the front, back or side of the terminal 500. When a physical button or a manufacturer logo is provided on the terminal 500, the fingerprint sensor 514 can be integrated with the physical button or the manufacturer logo.
  • the optical sensor 515 is used to collect the ambient light intensity.
  • the processor 501 may control the display brightness of the touch screen 505 according to the ambient light intensity collected by the optical sensor 515. Specifically, when the ambient light intensity is high, the display brightness of the touch screen 505 is increased; when the ambient light intensity is low, the display brightness of the touch screen 505 is decreased.
  • the processor 501 may also dynamically adjust the shooting parameters of the camera assembly 506 according to the ambient light intensity collected by the optical sensor 515.
  • the proximity sensor 516, also called a distance sensor, is usually arranged on the front panel of the terminal 500.
  • the proximity sensor 516 is used to collect the distance between the user and the front of the terminal 500.
  • when the proximity sensor 516 detects that the distance between the user and the front of the terminal 500 gradually decreases, the processor 501 controls the touch display screen 505 to switch from the bright-screen state to the off-screen state; when the proximity sensor 516 detects that the distance between the user and the front of the terminal 500 gradually increases, the processor 501 controls the touch display screen 505 to switch from the off-screen state to the bright-screen state.
  • FIG. 5 does not constitute a limitation on the terminal 500, and may include more or fewer components than shown, or combine certain components, or adopt different component arrangements.
  • a computer-readable storage medium, such as a memory including instructions, is also provided, and the instructions can be executed by a processor in a terminal to complete the method for determining volume adjustment ratio information in the foregoing embodiments.
  • the computer-readable storage medium may be ROM, random access memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, etc.
  • the program can be stored in a computer-readable storage medium.
  • the storage medium mentioned can be a read-only memory, a magnetic disk or an optical disk, etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Reverberation, Karaoke And Other Acoustics (AREA)

Abstract

A method, apparatus, device, and storage medium for determining volume adjustment ratio information, belonging to the field of Internet technologies. The method includes: obtaining a first singing audio and an original accompaniment audio corresponding to the first singing audio, where the first singing audio is a user singing audio (101); obtaining a first audio of the non-singing part of the first singing audio, and obtaining a loudness feature of the first audio (102); obtaining, in the original accompaniment audio, a second audio whose playback time period corresponds to the first audio, and obtaining a loudness feature of the second audio (103); and determining the ratio of the loudness feature of the first audio to the loudness feature of the second audio as the adjustment ratio information for adjusting the accompaniment volume of the first singing audio (104). The method cuts out the corresponding first audio and second audio from the user singing audio and the original accompaniment audio to determine the ratio information for adjusting the accompaniment volume, which can improve the accuracy of the accompaniment volume adjustment ratio information.

Description

Method, apparatus, device, and storage medium for determining volume adjustment ratio information

This application claims priority to Chinese Patent Application No. 201910958720.1, entitled "Method, apparatus, device, and storage medium for determining volume adjustment ratio information" and filed on October 10, 2019, the entire contents of which are incorporated herein by reference.

Technical Field

This application relates to the field of Internet technologies, and in particular to a method, apparatus, device, and storage medium for determining volume adjustment ratio information.

Background Art

With the development of Internet technologies, more and more forms of entertainment are available to people, and singing songs in a singing application has become a common one.

A singing application provides the user with accompaniment audio, and the user sings along with it. The terminal running the singing application records the user's vocal audio, synthesizes the vocal audio and the accompaniment audio into a singing audio, and publishes it online. Before the vocal audio and the accompaniment audio are synthesized, the user can adjust the volume of the accompaniment audio to obtain volume-adjusted accompaniment audio, which is then synthesized with the vocal audio. After publishing the singing audio online, the user can re-record parts of it, that is, select poorly sung segments of the original singing audio (i.e., the published singing audio) for a second recording. During the second recording, the terminal provides the user with the volume-adjusted accompaniment audio, records the user's vocal audio, synthesizes the singing audio of the selected segment, and replaces the audio of the selected segment in the original singing audio with it, thereby re-recording the original singing audio. At present, the terminal obtains the user-adjusted accompaniment audio by filtering the vocal audio out of the original singing audio with a preset algorithm to extract the accompaniment audio. Because of the algorithm's inherent flaws, the extracted accompaniment audio contains considerable noise. To eliminate the noise, the extracted accompaniment audio can be compared with the original accompaniment audio (i.e., the accompaniment audio whose volume the user has not adjusted) to obtain the user's adjustment ratio information for the volume of the original accompaniment audio, and the user-adjusted, noise-free accompaniment audio is then obtained from this volume adjustment ratio information.

In the process of implementing this application, the inventor found that the prior art has at least the following problem:

The noise in the accompaniment audio extracted by the algorithm may affect the comparison with the original accompaniment audio, making the obtained volume adjustment ratio information inaccurate.
Summary

The embodiments of this application provide a method, apparatus, device, and storage medium for determining volume adjustment ratio information, which can avoid the problem that the obtained adjustment ratio information for the volume of the original accompaniment audio is inaccurate due to noise in the accompaniment audio of the singing audio. The technical solutions are as follows:

obtaining a first singing audio and an original accompaniment audio corresponding to the first singing audio, where the first singing audio is a user singing audio;

obtaining a first audio of the non-singing part of the first singing audio, and obtaining a loudness feature of the first audio;

obtaining, in the original accompaniment audio, a second audio whose playback time period corresponds to the first audio, and obtaining a loudness feature of the second audio;

determining the ratio of the loudness feature of the first audio to the loudness feature of the second audio as the adjustment ratio information for adjusting the accompaniment volume of the first singing audio.

Optionally, obtaining the first audio of the non-singing part of the first singing audio includes:

obtaining the start playback time point and end playback time point of each sentence in the lyric data corresponding to the first singing audio;

based on the start and end playback time points of each sentence, determining multiple first audio segments of the non-singing part of the first singing audio, and concatenating the multiple first audio segments in playback-time order to obtain the first audio.

Optionally, obtaining, in the original accompaniment audio, the second audio whose playback time period corresponds to the first audio includes:

based on the start and end playback time points of each sentence, determining, in the original accompaniment audio, multiple second audio segments corresponding to the non-singing part of the first singing audio, and concatenating the multiple second audio segments in playback-time order to obtain the second audio.

Optionally, obtaining the loudness feature of the first audio includes: dividing the first audio into multiple third audio segments of a preset duration, and determining the loudness feature of each third audio segment;

obtaining the loudness feature of the second audio includes: dividing the second audio into multiple fourth audio segments of the preset duration, and determining the loudness feature of each fourth audio segment.

Optionally, determining the ratio of the loudness feature of the first audio to the loudness feature of the second audio as the adjustment ratio information for adjusting the accompaniment volume of the first singing audio includes:

selecting, among the loudness features of all third audio segments, the smallest first preset number of first loudness features, and selecting, among the loudness features of all fourth audio segments, the smallest first preset number of second loudness features;

determining the ratio of the sum of the first preset number of first loudness features to the sum of the first preset number of second loudness features as the adjustment ratio information for adjusting the accompaniment volume of the first singing audio.

Optionally, determining the loudness feature of each third audio segment includes: for each third audio segment, uniformly selecting a second preset number of playback time points in the third audio segment, and determining the root mean square of the audio amplitudes corresponding to the selected playback time points as the loudness feature of the third audio segment;

determining the loudness feature of each fourth audio segment includes: for each fourth audio segment, uniformly selecting a second preset number of playback time points in the fourth audio segment, and determining the root mean square of the audio amplitudes corresponding to the selected playback time points as the loudness feature of the fourth audio segment.

Optionally, after determining the ratio of the loudness feature of the first audio to the loudness feature of the second audio as the adjustment ratio information for adjusting the accompaniment volume of the first singing audio, the method further includes:

adjusting the volume of the original accompaniment audio based on the adjustment ratio information to obtain adjusted accompaniment audio;

recording a second singing audio based on the adjusted accompaniment audio.

Optionally, recording the second singing audio based on the adjusted accompaniment audio includes:

obtaining segment time information for re-recording a segment of the first singing audio;

based on the segment time information, intercepting part of the adjusted accompaniment audio, and recording a singing audio segment based on the part of the accompaniment audio;

using the singing audio segment to replace the singing audio segment in the first singing audio corresponding to the segment time information, to obtain the second singing audio.
In another aspect, an apparatus for determining volume adjustment ratio information is provided, the apparatus including:

a first obtaining module configured to obtain a first singing audio and an original accompaniment audio corresponding to the first singing audio, where the first singing audio is a user singing audio;

a second obtaining module configured to obtain a first audio of the non-singing part of the first singing audio, and obtain a loudness feature of the first audio;

a third obtaining module configured to obtain, in the original accompaniment audio, a second audio whose playback time period corresponds to the first audio, and obtain a loudness feature of the second audio;

a determining module configured to determine the ratio of the loudness feature of the first audio to the loudness feature of the second audio as the adjustment ratio information for adjusting the accompaniment volume of the first singing audio.

Optionally, the second obtaining module is configured to:

obtain the start playback time point and end playback time point of each sentence in the lyric data corresponding to the first singing audio;

based on the start and end playback time points of each sentence, determine multiple first audio segments of the non-singing part of the first singing audio, and concatenate the multiple first audio segments in playback-time order to obtain the first audio.

Optionally, the third obtaining module is configured to:

based on the start and end playback time points of each sentence, determine, in the original accompaniment audio, multiple second audio segments corresponding to the non-singing part of the first singing audio, and concatenate the multiple second audio segments in playback-time order to obtain the second audio.

Optionally, the second obtaining module is configured to: divide the first audio into multiple third audio segments of a preset duration, and determine the loudness feature of each third audio segment;

the third obtaining module is configured to: divide the second audio into multiple fourth audio segments of the preset duration, and determine the loudness feature of each fourth audio segment.

Optionally, the determining module is configured to:

select, among the loudness features of all third audio segments, the smallest first preset number of first loudness features, and select, among the loudness features of all fourth audio segments, the smallest first preset number of second loudness features;

determine the ratio of the sum of the first preset number of first loudness features to the sum of the first preset number of second loudness features as the adjustment ratio information for adjusting the accompaniment volume of the first singing audio.

Optionally, the second obtaining module is configured to: for each third audio segment, uniformly select a second preset number of playback time points in the third audio segment, and determine the root mean square of the audio amplitudes corresponding to the selected playback time points as the loudness feature of the third audio segment;

the third obtaining module is configured to: for each fourth audio segment, uniformly select a second preset number of playback time points in the fourth audio segment, and determine the root mean square of the audio amplitudes corresponding to the selected playback time points as the loudness feature of the fourth audio segment.

Optionally, the apparatus further includes a recording module configured to:

adjust the volume of the original accompaniment audio based on the adjustment ratio information to obtain adjusted accompaniment audio;

record a second singing audio based on the adjusted accompaniment audio.

Optionally, the recording module is configured to:

obtain segment time information for re-recording a segment of the first singing audio;

based on the segment time information, intercept part of the adjusted accompaniment audio, and record a singing audio segment based on the part of the accompaniment audio;

use the singing audio segment to replace the singing audio segment in the first singing audio corresponding to the segment time information, to obtain the second singing audio.

In yet another aspect, a computer device is provided, including a processor and a memory, where at least one instruction is stored in the memory, and the at least one instruction is loaded and executed by the processor to implement the operations performed by the above method for determining volume adjustment ratio information.

In yet another aspect, a computer-readable storage medium is provided, where at least one instruction is stored in the storage medium, and the at least one instruction is loaded and executed by a processor to implement the operations performed by the above method for determining volume adjustment ratio information.
The beneficial effects brought by the technical solutions provided by the embodiments of this application are as follows:

By cutting out the first audio of the non-singing part of the user singing audio and cutting out the second audio of the original accompaniment audio that corresponds to the first audio in time, the loudness features of the first audio and the second audio are determined, and the ratio of the loudness features of the first audio and the second audio is determined as the adjustment ratio information for adjusting the accompaniment volume of the singing audio. Since this application does not use an algorithm to extract the accompaniment audio from the singing audio, the problem that the obtained adjustment ratio information for the volume of the original accompaniment audio is inaccurate due to noise in the accompaniment audio of the singing audio can be avoided.
Brief Description of Drawings

To describe the technical solutions in the embodiments of this application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of this application, and those of ordinary skill in the art can obtain other drawings from these drawings without creative effort.

FIG. 1 is a flowchart of a method for determining volume adjustment ratio information provided by an embodiment of this application;

FIG. 2 is a schematic diagram of a method for processing a singing audio provided by an embodiment of this application;

FIG. 3 is a schematic diagram of a method for processing an original accompaniment audio provided by an embodiment of this application;

FIG. 4 is a schematic structural diagram of an apparatus for determining volume adjustment ratio information provided by an embodiment of this application;

FIG. 5 is a schematic structural diagram of a terminal provided by an embodiment of this application.
Detailed Description

To make the objectives, technical solutions, and advantages of this application clearer, the embodiments of this application are described in further detail below with reference to the drawings.

The method for determining volume adjustment ratio information provided by this application can be implemented by a terminal. The terminal can run a singing application and can be equipped with components such as a microphone, a screen, and a speaker; it has a communication function and can access the Internet, and it is provided with a processor that can process data information. The terminal can be a mobile phone, a tablet computer, a smart wearable device, a desktop computer, a laptop computer, or the like.

The singing application can run on the terminal, and the user can select the song to record in the singing application. The singing application sends the identification information of the song selected by the user to a server, and the server can send the accompaniment audio and lyric file corresponding to the song to the terminal according to the song's identification information. After receiving the accompaniment audio, the terminal can play it and display the lyrics on the terminal screen according to the lyric file as playback progresses; at the same time, the terminal turns on the recording function, and the user can sing the song following the lyrics prompted by the singing application on the terminal screen. The terminal records the user's vocal audio and synthesizes it with the song's accompaniment audio, thereby generating a singing audio. The user can publish the singing audio online as a karaoke work for other users to listen to. Before the terminal synthesizes the vocal audio and the accompaniment audio into the singing audio, the user can adjust the volume of the accompaniment audio so that it better matches the vocal audio in volume, making the synthesized singing audio more satisfying to the user's ear.

After publishing a karaoke work online, the user can re-record it, that is, select poorly sung segments of the original singing audio (i.e., the published karaoke work) to record again, and replace the selected, poorly sung segments with the re-recorded singing audio. The terminal obtains the user-adjusted accompaniment audio of the original singing audio, records the vocal audio the user sings again, synthesizes the singing audio of the selected segment, and replaces the audio of the selected segment in the original singing audio with it, thereby re-recording the karaoke work. With the method for determining volume adjustment ratio information provided by the embodiments of this application, the user's adjustment information for the volume of the original accompaniment audio can be obtained from the user's singing audio and the original accompaniment audio, and the terminal obtains the accompaniment audio after the user's volume adjustment according to the volume adjustment information.
FIG. 1 is a flowchart of a method for determining volume adjustment ratio information provided by an embodiment of this application. Referring to FIG. 1, this embodiment includes:

Step 101: obtain a first singing audio and an original accompaniment audio corresponding to the first singing audio.

The first singing audio is a user singing audio, synthesized from the user's vocal audio and the original accompaniment audio.

In implementation, the first singing audio (i.e., the karaoke work) can be obtained by synthesizing the vocal audio recorded by the user through the singing application with the accompaniment audio of the corresponding song. The original accompaniment audio is the song accompaniment audio corresponding to the first singing audio. The first singing audio can be stored locally on the terminal or obtained from the server. When the terminal obtains the first singing audio from the server, it can send the server a download request for the first singing audio, and the server can, according to the download request, send the terminal the first singing audio, the original accompaniment audio of the first singing audio, and the lyric file of the song corresponding to the first singing audio, where the lyric file records the start and end playback time points of each line of lyrics.

Step 102: obtain a first audio of the non-singing part of the first singing audio, and obtain a loudness feature of the first audio.

The loudness feature is feature information of the audio volume and can be a numerical value representing the volume.

In implementation, the first singing audio is synthesized from the vocal audio sung by the user and the accompaniment audio of the corresponding song. After obtaining the first singing audio, the terminal can process it: cut out the audio parts of the first singing audio that do not contain the singing voice, and concatenate the multiple cut-out segments that contain no singing voice to obtain the first audio. After the first audio is obtained, its volume information, i.e., the loudness feature of the first audio, can be obtained.

Optionally, the first singing audio can be segmented according to the time points recorded in the lyric file of the song corresponding to the first singing audio to obtain the first audio of the non-singing part. The corresponding processing can be as follows: obtain the start playback time point and end playback time point of each sentence in the lyric data corresponding to the first singing audio; based on the start and end playback time points of each sentence, determine multiple first audio segments of the non-singing part of the first singing audio, and concatenate the multiple first audio segments in playback-time order to obtain the first audio.

Each first audio segment is a purely accompaniment part of the first singing audio, and the first audio is obtained by concatenating the audio of the multiple first audio segments in playback-time order.

In implementation, the server sends the terminal the lyric file of the song corresponding to the first singing audio. The lyric file marks multiple time points, including the start and end playback time points of each line of lyrics. According to these time points, the first singing audio can be split into multiple audio segments; the segments corresponding to the lyric parts are removed, the purely accompaniment segments are kept as the first audio segments, and they are concatenated in the chronological order of the time points in the lyric file to obtain the first audio. As shown in FIG. 2, the first singing audio A can be split into audio segments a, b, c, d, e, and f, where segments b, d, and f contain vocal audio and accompaniment audio, and segments a, c, and e contain only accompaniment audio; segments a, c, and e are concatenated in chronological order to obtain the first audio B.
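The splitting-and-concatenation described above (keeping only the a/c/e-style non-singing stretches) can be sketched as follows. The interval representation and the function name are illustrative assumptions, with lyric times given in sample indices rather than seconds for simplicity:

```python
# Sketch of step 102: given per-line lyric intervals, keep the complementary
# (non-singing) stretches of the recording and concatenate them in playback order.

def non_singing_audio(audio, lyric_intervals):
    """Concatenate the parts of `audio` outside every (start, end) lyric interval."""
    pieces, cursor = [], 0
    for start, end in sorted(lyric_intervals):
        pieces.extend(audio[cursor:start])  # pure-accompaniment stretch before this line
        cursor = end                        # skip the sung part
    pieces.extend(audio[cursor:])           # tail after the last lyric line
    return pieces

# Samples 2-3 and 6-7 are sung; samples 0-1, 4-5, 8-9 form the "first audio".
song = list(range(10))
print(non_singing_audio(song, [(2, 4), (6, 8)]))  # → [0, 1, 4, 5, 8, 9]
```

The same helper applied to the original accompaniment audio with the same intervals would yield the second audio of step 103, since the two use identical lyric time points.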
In addition, the terminal can also be set with a time-length threshold. If the interval between the end playback time point of one lyric line and the start playback time point of the next lyric line (i.e., the gap between lyrics) is smaller than the set time-length threshold, these two time points can be ignored, and the audio between them is not split out.
After the first audio is obtained, it can be divided into multiple audio segments and the loudness feature of each segment obtained. The corresponding processing is as follows: divide the first audio into multiple third audio segments of a preset duration, and determine the loudness feature of each third audio segment.

In implementation, the terminal can divide the first audio into multiple third audio segments of the preset duration, each third audio segment being equal in duration. A sampling rate can also be set in the terminal; the amplitude of the audio in each divided third audio segment is sampled, and the loudness feature of each third audio segment can be determined from the sampled values of each segment.

To determine the loudness feature of each third audio segment, the following processing can be performed: for each third audio segment, uniformly select a second preset number of playback time points in the third audio segment, and determine the root mean square of the audio amplitudes corresponding to the selected playback time points as the loudness feature of the third audio segment.

In implementation, the amplitude of the audio in each divided third audio segment can be sampled multiple times, and the root mean square of the multiple sampled amplitudes of each third audio segment is taken as the loudness feature of that segment.
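The per-segment RMS loudness feature described here could be computed as in this sketch, which follows the formula L_i = sqrt((1/N) · Σ S_j²) given later in step 104. Function and variable names are illustrative, and amplitudes are plain numbers rather than decoded PCM frames:

```python
# Sketch: split an amplitude sequence into fixed-length segments and take the
# root mean square of the sampled amplitudes in each segment as its loudness.
import math

def segment_loudness(samples, segment_len):
    """RMS amplitude of each consecutive segment of `segment_len` samples."""
    feats = []
    for i in range(0, len(samples) - segment_len + 1, segment_len):
        seg = samples[i : i + segment_len]
        feats.append(math.sqrt(sum(s * s for s in seg) / segment_len))
    return feats

# One loud segment followed by one silent segment, with N = 4 samples each.
print(segment_loudness([3, 4, 3, 4, 0, 0, 0, 0], 4))
```

With a preset duration t and sampling rate s, `segment_len` would be N = s·t, matching the definition of N in the formula.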
Step 103: obtain, in the original accompaniment audio, a second audio whose playback time period corresponds to the first audio, and obtain a loudness feature of the second audio.

The first audio can include multiple audio segments that contain no singing voice, each with its own playback time period, which can be represented by the segment's start and end playback time points in the original accompaniment audio; the duration of the first audio can be the sum of the playback time periods of these segments.

In implementation, the server sends the terminal the original accompaniment audio of the song corresponding to the first singing audio (i.e., the accompaniment audio whose volume the user has not adjusted). The terminal can process the original accompaniment audio in the same way as the first singing audio in step 102: cut out the parts of the original accompaniment audio whose playback time periods are the same as the parts of the first singing audio that contain no singing voice, and concatenate the cut-out audio segments to obtain the second audio. The playback time periods of the audio segments included in the second audio correspond to those of the segments of the first audio that contain no singing voice.

Optionally, the terminal can segment the original accompaniment audio according to the lyric file of the song corresponding to the first singing audio to obtain the second audio. The corresponding processing can be as follows: based on the start and end playback time points of each sentence, determine, in the original accompaniment audio, multiple second audio segments corresponding to the non-singing part of the first singing audio, and concatenate the multiple second audio segments in playback-time order to obtain the second audio.

The start and end time points of each second audio segment in the original accompaniment audio coincide with the start and end time points of each first audio segment in the first singing audio in step 102; the second audio is obtained by concatenating the multiple second audio segments in chronological order.

In implementation, the terminal can split the original accompaniment audio into multiple audio segments according to the start and end playback time points of each lyric line in the lyric file, remove the segments corresponding to the lyric parts, take the remaining segments as the second audio segments, and concatenate them in the chronological order of the time points in the lyric file to obtain the second audio. As shown in FIG. 3, the original accompaniment audio C can be split into audio segments g, h, i, j, k, and l, where segments g, i, and k correspond to the above segments a, c, and e; segments g, i, and k are concatenated in chronological order to obtain the second audio D.
After the second audio is obtained, it can be divided into multiple audio segments and the loudness feature of each segment obtained. The corresponding processing is as follows: divide the second audio into multiple fourth audio segments of the preset duration, and determine the loudness feature of each fourth audio segment.

In implementation, the terminal can divide the second audio into multiple fourth audio segments of the preset duration, each fourth audio segment being equal in duration. A sampling rate, the same as the sampling rate in step 102, can also be set in the terminal; the amplitude of the audio in each divided fourth audio segment is sampled at this rate, and the loudness feature of each fourth audio segment can be determined from the sampled values of each segment.

To determine the loudness feature of each fourth audio segment, the following processing can be performed: for each fourth audio segment, uniformly select a second preset number of playback time points in the fourth audio segment, and determine the root mean square of the audio amplitudes corresponding to the selected playback time points as the loudness feature of the fourth audio segment.

In implementation, the amplitude of the audio in each divided fourth audio segment can be sampled multiple times, and the root mean square of the multiple sampled amplitudes of each fourth audio segment is taken as the loudness feature of that segment.
Step 104: determine the ratio of the loudness feature of the first audio to the loudness feature of the second audio as the adjustment ratio information for adjusting the accompaniment volume of the first singing audio.

In implementation, the ratio of the loudness feature of the first audio to the loudness feature of the second audio can be calculated, and the volume gain of the second audio relative to the first audio determined from the ratio, thereby determining the ratio information of the user's volume adjustment of the original accompaniment audio.

Optionally, the loudness features of the first audio and the second audio can be obtained from the loudness features of the third audio segments and the fourth audio segments in steps 102 and 103 above. The corresponding processing is as follows: among the loudness features of all third audio segments, select the smallest first preset number of first loudness features, and among the loudness features of all fourth audio segments, select the smallest first preset number of second loudness features;

determine the ratio of the sum of the first preset number of first loudness features to the sum of the first preset number of second loudness features as the adjustment ratio information for adjusting the accompaniment volume of the first singing audio.

In implementation, the loudness features of all third audio segments can be sorted in ascending order by value. To reduce the influence of noise in the first audio on the volume value, the first part of the sorted loudness features of all third audio segments can be selected and summed; for example, the smallest half of the loudness features are summed to obtain the loudness feature of the first audio. Similarly, the loudness features of all fourth audio segments can be sorted in ascending order, and the same number of loudness features as selected for the third audio segments can be selected from the sorted loudness features of all fourth audio segments and summed; for example, the smallest half of the loudness features are summed to obtain the loudness feature of the second audio. Finally, the ratio of the volume information of the first audio to the volume information of the second audio is calculated, and the ratio is determined as the adjustment ratio information for the user's adjustment of the accompaniment volume of the first singing audio.

For example, the preset duration can be set to t and the sampling rate to s. Then the number of sampling points in each audio segment is N = s·t, so that N amplitude values are obtained for each segment, and the volume value of each segment can be obtained from its N audio amplitudes using the root-mean-square formula:

L_i = sqrt((1/N) · Σ_{j=1}^{N} S_j²)

where L_i is the loudness feature of the i-th audio segment, N is the number of sampling points in each audio segment, and S_j is the audio amplitude of the j-th sampling point. The resulting values L_1 to L_i are then sorted in ascending order to obtain a sorted array, and the elements in the first half of the sorted array are summed to obtain the loudness feature V_1 of the first audio. The second audio can be processed in the same way as the first audio to obtain its loudness feature V_2. Then the ratio of V_1 to V_2 is calculated, and the ratio is determined as the adjustment ratio information for the user's adjustment of the accompaniment volume of the first singing audio.
Optionally, based on the adjustment ratio information, the volume of the original accompaniment audio is adjusted to obtain adjusted accompaniment audio; based on the adjusted accompaniment audio, a second singing audio is recorded.

In implementation, the adjustment ratio information can be the ratio of the loudness feature of the first audio to the loudness feature of the second audio. After the adjustment ratio information of the audio is obtained, the original accompaniment audio can be adjusted according to it: multiplying the amplitude of each sampling point of the original accompaniment audio by the adjustment ratio information yields the adjusted accompaniment audio. After the adjusted accompaniment audio is obtained, the terminal can play it while turning on the recording function, record the vocal audio the user sings again, and synthesize the adjusted accompaniment audio with the recorded vocal audio of the user singing again to obtain the second singing audio.
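The per-sample volume adjustment described here, multiplying each sampling point's amplitude by the ratio, can be sketched as follows (illustrative names; real code would operate on decoded PCM and clamp to the sample format's range):

```python
# Sketch of the adjustment step: scale every sample amplitude of the original
# accompaniment by the adjustment ratio V_1/V_2 to recover the user-adjusted
# accompaniment volume.

def scale_accompaniment(samples, ratio):
    """Multiply each sample of the original accompaniment by the adjustment ratio."""
    return [s * ratio for s in samples]

# A ratio of 0.5 reproduces an accompaniment the user had turned down by half.
print(scale_accompaniment([0.2, -0.4, 0.8], 0.5))
```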
Optionally, the second singing audio can be recorded over a time period specified by the user. The corresponding processing is as follows: obtain segment time information for re-recording a segment of the first singing audio; based on the segment time information, intercept part of the adjusted accompaniment audio, and record a singing audio segment based on the part of the accompaniment audio; use the singing audio segment to replace the singing audio segment in the first singing audio corresponding to the segment time information, to obtain the second singing audio.

The segment time information includes the start time point and end time point of the re-recorded segment.

In implementation, before recording the second singing audio, the user selects the audio time period in the first singing audio that needs to be sung again. For example, the singing application can pre-store the singing start time point and singing end time point of each lyric line of the song corresponding to the first singing audio, and the user can select the lyric lines that need to be sung again. After receiving the lyric selection instruction, the singing application can, according to the pre-stored singing start and end time points of each lyric line, determine the singing start time point of the first selected lyric line and the singing end time point of the last selected lyric line as the start and end time points of the re-recorded segment, and then cut out part of the accompaniment audio from the adjusted accompaniment audio according to the start and end time points of the re-recorded segment. The terminal plays the part of the accompaniment audio while recording the user's vocal audio, then synthesizes the part of the accompaniment audio with the vocal audio and replaces the singing audio segment in the first singing audio that has the same playback time as the part of the accompaniment audio, obtaining the second singing audio after the second recording.

In this application, by cutting out the first audio of the non-singing part of the user singing audio and cutting out the second audio of the original accompaniment audio that corresponds to the first audio in time, the loudness features of the first audio and the second audio are determined, and the ratio of the loudness features of the first audio and the second audio is determined as the adjustment ratio information for adjusting the accompaniment volume of the singing audio. Since this application does not use an algorithm to extract the accompaniment audio from the singing audio, the problem that the obtained adjustment ratio information for the volume of the original accompaniment audio is inaccurate due to noise in the accompaniment audio of the singing audio can be avoided.
All of the above optional technical solutions can be combined in any manner to form optional embodiments of this disclosure, which will not be described one by one here.
FIG. 4 is a schematic structural diagram of an apparatus for determining volume adjustment ratio information provided by an embodiment of this application; the apparatus can be the terminal in the above embodiments. Referring to FIG. 4, the apparatus includes:

a first obtaining module 410 configured to obtain a first singing audio and an original accompaniment audio corresponding to the first singing audio, where the first singing audio is a user singing audio;

a second obtaining module 420 configured to obtain a first audio of the non-singing part of the first singing audio, and obtain a loudness feature of the first audio;

a third obtaining module 430 configured to obtain, in the original accompaniment audio, a second audio whose playback time period corresponds to the first audio, and obtain a loudness feature of the second audio;

a determining module 440 configured to determine the ratio of the loudness feature of the first audio to the loudness feature of the second audio as the adjustment ratio information for adjusting the accompaniment volume of the first singing audio.

Optionally, the second obtaining module 420 is configured to:

obtain the start playback time point and end playback time point of each sentence in the lyric data corresponding to the first singing audio;

based on the start and end playback time points of each sentence, determine multiple first audio segments of the non-singing part of the first singing audio, and concatenate the multiple first audio segments in playback-time order to obtain the first audio.

Optionally, the third obtaining module 430 is configured to:

based on the start and end playback time points of each sentence, determine, in the original accompaniment audio, multiple second audio segments corresponding to the non-singing part of the first singing audio, and concatenate the multiple second audio segments in playback-time order to obtain the second audio.

Optionally, the second obtaining module 420 is configured to: divide the first audio into multiple third audio segments of a preset duration, and determine the loudness feature of each third audio segment;

the third obtaining module 430 is configured to: divide the second audio into multiple fourth audio segments of the preset duration, and determine the loudness feature of each fourth audio segment.

Optionally, the determining module 440 is configured to:

select, among the loudness features of all third audio segments, the smallest first preset number of first loudness features, and select, among the loudness features of all fourth audio segments, the smallest first preset number of second loudness features;

determine the ratio of the sum of the first preset number of first loudness features to the sum of the first preset number of second loudness features as the adjustment ratio information for adjusting the accompaniment volume of the first singing audio.

Optionally, the second obtaining module 420 is configured to: for each third audio segment, uniformly select a second preset number of playback time points in the third audio segment, and determine the root mean square of the audio amplitudes corresponding to the selected playback time points as the loudness feature of the third audio segment;

the third obtaining module 430 is configured to: for each fourth audio segment, uniformly select a second preset number of playback time points in the fourth audio segment, and determine the root mean square of the audio amplitudes corresponding to the selected playback time points as the loudness feature of the fourth audio segment.

Optionally, the apparatus further includes a recording module configured to:

adjust the volume of the original accompaniment audio based on the adjustment ratio information to obtain adjusted accompaniment audio;

record a second singing audio based on the adjusted accompaniment audio.

Optionally, the recording module is configured to:

obtain segment time information for re-recording a segment of the first singing audio;

based on the segment time information, intercept part of the adjusted accompaniment audio, and record a singing audio segment based on the part of the accompaniment audio;

use the singing audio segment to replace the singing audio segment in the first singing audio corresponding to the segment time information, to obtain the second singing audio.

It should be noted that, when the apparatus for determining volume adjustment ratio information provided by the above embodiment determines the volume adjustment ratio information, the division into the above functional modules is used only as an example; in practical applications, the above functions can be allocated to different functional modules as needed, that is, the internal structure of the device can be divided into different functional modules to complete all or part of the functions described above. In addition, the apparatus for determining volume adjustment ratio information provided by the above embodiment belongs to the same concept as the method embodiment for determining volume adjustment ratio information; for its specific implementation process, refer to the method embodiment, which will not be repeated here.
图5示出了本申请一个示例性实施例提供的终端500的结构框图。该终端500可以是:智能手机、平板电脑、MP3播放器(Moving Picture Experts Group Audio Layer III,动态影像专家压缩标准音频层面3)、MP4(Moving Picture Experts Group Audio Layer IV,动态影像专家压缩标准音频层面4)播放器、笔记本电脑或台式电脑。终端500还可能被称为用户设备、便携式终端、膝上型终端、台式终端等其他名称。
Generally, the terminal 500 includes a processor 501 and a memory 502.
The processor 501 may include one or more processing cores, for example a 4-core or 8-core processor. The processor 501 may be implemented in at least one of the following hardware forms: DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array) and PLA (Programmable Logic Array). The processor 501 may also include a main processor and a coprocessor. The main processor is a processor for processing data in the awake state, also called a CPU (Central Processing Unit); the coprocessor is a low-power processor for processing data in the standby state. In some embodiments, the processor 501 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 501 may also include an AI (Artificial Intelligence) processor for handling computing operations related to machine learning.
The memory 502 may include one or more computer-readable storage media, which may be non-transitory. The memory 502 may also include high-speed random access memory and non-volatile memory, such as one or more disk storage devices or flash storage devices. In some embodiments, the non-transitory computer-readable storage medium in the memory 502 is used to store at least one instruction, which is executed by the processor 501 to implement the method for determining volume adjustment ratio information provided by the method embodiments of this application.
In some embodiments, the terminal 500 may optionally further include a peripheral device interface 503 and at least one peripheral device. The processor 501, the memory 502 and the peripheral device interface 503 may be connected by a bus or signal lines. Each peripheral device may be connected to the peripheral device interface 503 by a bus, a signal line or a circuit board. Specifically, the peripheral devices include at least one of: a radio frequency circuit 504, a touch display screen 505, a camera 506, an audio circuit 507, a positioning component 508 and a power supply 509.
The peripheral device interface 503 may be used to connect at least one I/O (Input/Output)-related peripheral device to the processor 501 and the memory 502. In some embodiments, the processor 501, the memory 502 and the peripheral device interface 503 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 501, the memory 502 and the peripheral device interface 503 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The radio frequency circuit 504 is used to receive and transmit RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuit 504 communicates with communication networks and other communication devices via electromagnetic signals, converting electrical signals into electromagnetic signals for transmission and converting received electromagnetic signals into electrical signals. Optionally, the radio frequency circuit 504 includes an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and the like. The radio frequency circuit 504 may communicate with other terminals via at least one wireless communication protocol, including but not limited to metropolitan area networks, the various generations of mobile communication networks (2G, 3G, 4G and 5G), wireless local area networks and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 504 may also include NFC (Near Field Communication)-related circuits, which is not limited in this application.
The display screen 505 is used to display a UI (User Interface), which may include graphics, text, icons, video and any combination thereof. When the display screen 505 is a touch display screen, it is also capable of capturing touch signals on or above its surface, which may be input to the processor 501 as control signals for processing. In this case, the display screen 505 may also be used to provide virtual buttons and/or a virtual keyboard, also called soft buttons and/or a soft keyboard. In some embodiments, there may be one display screen 505, arranged on the front panel of the terminal 500; in other embodiments, there may be at least two display screens 505, arranged on different surfaces of the terminal 500 or in a folding design; in still other embodiments, the display screen 505 may be a flexible display arranged on a curved or folded surface of the terminal 500. The display screen 505 may even be set in a non-rectangular irregular shape, i.e. a shaped screen. The display screen 505 may be made of materials such as LCD (Liquid Crystal Display) or OLED (Organic Light-Emitting Diode).
The camera component 506 is used to capture images or video. Optionally, the camera component 506 includes a front camera and a rear camera. Generally, the front camera is arranged on the front panel of the terminal and the rear camera on the back. In some embodiments, there are at least two rear cameras, each of which is any one of a main camera, a depth-of-field camera, a wide-angle camera and a telephoto camera, so that the main camera and the depth-of-field camera can be fused to realize a background-blur function, the main camera and the wide-angle camera can be fused to realize panoramic and VR (Virtual Reality) shooting, or other fused shooting functions can be realized. In some embodiments, the camera component 506 may also include a flash. The flash may be a single-color-temperature flash or a dual-color-temperature flash; a dual-color-temperature flash is a combination of a warm-light flash and a cold-light flash, and can be used for light compensation at different color temperatures.
The audio circuit 507 may include a microphone and a speaker. The microphone is used to collect sound waves from the user and the environment and convert them into electrical signals input to the processor 501 for processing, or to the radio frequency circuit 504 for voice communication. For stereo capture or noise reduction, there may be multiple microphones arranged at different parts of the terminal 500. The microphone may also be an array microphone or an omnidirectional microphone. The speaker is used to convert electrical signals from the processor 501 or the radio frequency circuit 504 into sound waves. The speaker may be a traditional thin-film speaker or a piezoelectric ceramic speaker. A piezoelectric ceramic speaker can convert electrical signals not only into sound waves audible to humans but also into sound waves inaudible to humans, for purposes such as ranging. In some embodiments, the audio circuit 507 may also include a headphone jack.
The positioning component 508 is used to determine the current geographic location of the terminal 500 for navigation or LBS (Location Based Service). The positioning component 508 may be a positioning component based on the US GPS (Global Positioning System), China's BeiDou system, Russia's GLONASS system or the EU's Galileo system.
The power supply 509 is used to supply power to the components of the terminal 500. The power supply 509 may be alternating current, direct current, a disposable battery or a rechargeable battery. When the power supply 509 includes a rechargeable battery, the battery may support wired or wireless charging, and may also support fast-charging technology.
In some embodiments, the terminal 500 further includes one or more sensors 510, including but not limited to an acceleration sensor 511, a gyroscope sensor 512, a pressure sensor 513, a fingerprint sensor 514, an optical sensor 515 and a proximity sensor 516.
The acceleration sensor 511 can detect the magnitude of acceleration on the three coordinate axes of the coordinate system established with the terminal 500; for example, the acceleration sensor 511 can be used to detect the components of gravitational acceleration on the three coordinate axes. Based on the gravity signal collected by the acceleration sensor 511, the processor 501 can control the touch display screen 505 to display the user interface in landscape or portrait view. The acceleration sensor 511 can also be used to collect motion data for games or of the user.
The gyroscope sensor 512 can detect the body orientation and rotation angle of the terminal 500, and can cooperate with the acceleration sensor 511 to collect the user's 3D motion on the terminal 500. Based on the data collected by the gyroscope sensor 512, the processor 501 can implement functions such as motion sensing (for example, changing the UI according to the user's tilt operation), image stabilization during shooting, game control and inertial navigation.
The pressure sensor 513 may be arranged on the side frame of the terminal 500 and/or under the touch display screen 505. When the pressure sensor 513 is arranged on the side frame, it can detect the user's grip signal on the terminal 500, and the processor 501 performs left/right-hand recognition or quick operations based on the grip signal. When the pressure sensor 513 is arranged under the touch display screen 505, the processor 501 controls operable controls on the UI according to the user's pressure operation on the touch display screen 505. The operable controls include at least one of a button control, a scroll-bar control, an icon control and a menu control.
The fingerprint sensor 514 is used to collect the user's fingerprint; the processor 501 identifies the user from the fingerprint collected by the fingerprint sensor 514, or the fingerprint sensor 514 identifies the user from the collected fingerprint. When the user's identity is recognized as trusted, the processor 501 authorizes the user to perform relevant sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments, changing settings, and so on. The fingerprint sensor 514 may be arranged on the front, back or side of the terminal 500. When the terminal 500 is provided with a physical button or a manufacturer logo, the fingerprint sensor 514 may be integrated with the physical button or the manufacturer logo.
The optical sensor 515 is used to collect the ambient light intensity. In one embodiment, the processor 501 can control the display brightness of the touch display screen 505 according to the ambient light intensity collected by the optical sensor 515: when the ambient light intensity is high, the display brightness of the touch display screen 505 is raised; when the ambient light intensity is low, the display brightness is lowered. In another embodiment, the processor 501 can also dynamically adjust the shooting parameters of the camera component 506 according to the ambient light intensity collected by the optical sensor 515.
The proximity sensor 516, also called a distance sensor, is usually arranged on the front panel of the terminal 500 and is used to measure the distance between the user and the front of the terminal 500. In one embodiment, when the proximity sensor 516 detects that the distance between the user and the front of the terminal 500 is gradually decreasing, the processor 501 controls the touch display screen 505 to switch from the screen-on state to the screen-off state; when the proximity sensor 516 detects that the distance is gradually increasing, the processor 501 controls the touch display screen 505 to switch from the screen-off state back to the screen-on state.
Those skilled in the art will understand that the structure shown in FIG. 5 does not constitute a limitation on the terminal 500, which may include more or fewer components than shown, combine certain components, or use a different component arrangement.
In an exemplary embodiment, a computer-readable storage medium is also provided, for example a memory including instructions that can be executed by a processor in the terminal to perform the method for determining volume adjustment ratio information in the above embodiments. For example, the computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
Those of ordinary skill in the art will understand that all or part of the steps of the above embodiments may be implemented by hardware, or by a program instructing relevant hardware; the program may be stored in a computer-readable storage medium, which may be a read-only memory, a magnetic disk, an optical disc, or the like.
The above are merely preferred embodiments of this application and are not intended to limit it; any modification, equivalent replacement, improvement and the like made within the spirit and principles of this application shall fall within its scope of protection.

Claims (18)

  1. A method for determining volume adjustment ratio information, wherein the method comprises:
    acquiring first singing audio and original accompaniment audio corresponding to the first singing audio, wherein the first singing audio is audio of a user singing;
    acquiring first audio of a non-singing part of the first singing audio, and acquiring a loudness feature of the first audio;
    acquiring, from the original accompaniment audio, second audio whose playback time period corresponds to the first audio, and acquiring a loudness feature of the second audio; and
    determining a ratio of the loudness feature of the first audio to the loudness feature of the second audio as adjustment ratio information for accompaniment volume adjustment of the first singing audio.
  2. The method according to claim 1, wherein the acquiring first audio of a non-singing part of the first singing audio comprises:
    acquiring a playback start time point and a playback end time point of each sentence in lyric data corresponding to the first singing audio; and
    based on the playback start and end time points of each sentence, determining multiple first audio segments of the non-singing part of the first singing audio, and splicing the multiple first audio segments in playback-time order to obtain the first audio.
  3. The method according to claim 2, wherein the acquiring, from the original accompaniment audio, second audio whose playback time period corresponds to the first audio comprises:
    based on the playback start and end time points of each sentence, determining, in the original accompaniment audio, multiple second audio segments corresponding to the non-singing part of the first singing audio, and splicing the multiple second audio segments in playback-time order to obtain the second audio.
  4. The method according to claim 1, wherein the acquiring a loudness feature of the first audio comprises: dividing the first audio into multiple third audio segments of a preset duration, and determining a loudness feature of each third audio segment; and
    the acquiring a loudness feature of the second audio comprises: dividing the second audio into multiple fourth audio segments of the preset duration, and determining a loudness feature of each fourth audio segment.
  5. The method according to claim 4, wherein the determining a ratio of the loudness feature of the first audio to the loudness feature of the second audio as adjustment ratio information for accompaniment volume adjustment of the first singing audio comprises:
    selecting, from the loudness features of all the third audio segments, a first preset number of smallest first loudness features, and selecting, from the loudness features of all the fourth audio segments, the first preset number of smallest second loudness features; and
    determining a ratio of a sum of the first preset number of first loudness features to a sum of the first preset number of second loudness features as the adjustment ratio information for accompaniment volume adjustment of the first singing audio.
  6. The method according to claim 4, wherein the determining a loudness feature of each third audio segment comprises: for each third audio segment, uniformly selecting a second preset number of playback time points within the third audio segment, and determining a root mean square of audio amplitudes at the selected playback time points as the loudness feature of the third audio segment; and
    the determining a loudness feature of each fourth audio segment comprises: for each fourth audio segment, uniformly selecting the second preset number of playback time points within the fourth audio segment, and determining a root mean square of audio amplitudes at the selected playback time points as the loudness feature of the fourth audio segment.
  7. The method according to any one of claims 1 to 6, wherein after the determining a ratio of the loudness feature of the first audio to the loudness feature of the second audio as adjustment ratio information for accompaniment volume adjustment of the first singing audio, the method further comprises:
    adjusting the volume of the original accompaniment audio based on the adjustment ratio information to obtain adjusted accompaniment audio; and
    recording second singing audio based on the adjusted accompaniment audio.
  8. The method according to claim 7, wherein the recording second singing audio based on the adjusted accompaniment audio comprises:
    acquiring segment time information for re-recording a segment of the first singing audio;
    based on the segment time information, extracting partial accompaniment audio from the adjusted accompaniment audio, and recording a singing audio segment based on the partial accompaniment audio; and
    replacing, with the singing audio segment, a singing audio segment in the first singing audio that corresponds to the segment time information, to obtain the second singing audio.
  9. An apparatus for determining volume adjustment ratio information, wherein the apparatus comprises:
    a first acquisition module, configured to acquire first singing audio and original accompaniment audio corresponding to the first singing audio, wherein the first singing audio is audio of a user singing;
    a second acquisition module, configured to acquire first audio of a non-singing part of the first singing audio, and acquire a loudness feature of the first audio;
    a third acquisition module, configured to acquire, from the original accompaniment audio, second audio whose playback time period corresponds to the first audio, and acquire a loudness feature of the second audio; and
    a determination module, configured to determine a ratio of the loudness feature of the first audio to the loudness feature of the second audio as adjustment ratio information for accompaniment volume adjustment of the first singing audio.
  10. The apparatus according to claim 9, wherein the second acquisition module is configured to:
    acquire a playback start time point and a playback end time point of each sentence in lyric data corresponding to the first singing audio; and
    based on the playback start and end time points of each sentence, determine multiple first audio segments of the non-singing part of the first singing audio, and splice the multiple first audio segments in playback-time order to obtain the first audio.
  11. The apparatus according to claim 10, wherein the third acquisition module is configured to:
    based on the playback start and end time points of each sentence, determine, in the original accompaniment audio, multiple second audio segments corresponding to the non-singing part of the first singing audio, and splice the multiple second audio segments in playback-time order to obtain the second audio.
  12. The apparatus according to claim 9, wherein the second acquisition module is configured to divide the first audio into multiple third audio segments of a preset duration and determine a loudness feature of each third audio segment; and
    the third acquisition module is configured to divide the second audio into multiple fourth audio segments of the preset duration and determine a loudness feature of each fourth audio segment.
  13. The apparatus according to claim 12, wherein the determination module is configured to:
    select, from the loudness features of all the third audio segments, a first preset number of smallest first loudness features, and select, from the loudness features of all the fourth audio segments, the first preset number of smallest second loudness features; and
    determine a ratio of a sum of the first preset number of first loudness features to a sum of the first preset number of second loudness features as the adjustment ratio information for accompaniment volume adjustment of the first singing audio.
  14. The apparatus according to claim 12, wherein the second acquisition module is configured to: for each third audio segment, uniformly select a second preset number of playback time points within the third audio segment, and determine a root mean square of audio amplitudes at the selected playback time points as the loudness feature of the third audio segment; and
    the third acquisition module is configured to: for each fourth audio segment, uniformly select the second preset number of playback time points within the fourth audio segment, and determine a root mean square of audio amplitudes at the selected playback time points as the loudness feature of the fourth audio segment.
  15. The apparatus according to any one of claims 9 to 14, wherein the apparatus further comprises a recording module configured to:
    adjust the volume of the original accompaniment audio based on the adjustment ratio information to obtain adjusted accompaniment audio; and
    record second singing audio based on the adjusted accompaniment audio.
  16. The apparatus according to claim 15, wherein the recording module is configured to:
    acquire segment time information for re-recording a segment of the first singing audio;
    based on the segment time information, extract partial accompaniment audio from the adjusted accompaniment audio, and record a singing audio segment based on the partial accompaniment audio; and
    replace, with the singing audio segment, a singing audio segment in the first singing audio that corresponds to the segment time information, to obtain the second singing audio.
  17. A computer device, wherein the computer device comprises a processor and a memory, the memory storing at least one instruction, and the at least one instruction is loaded and executed by the processor to implement the operations performed by the method for determining volume adjustment ratio information according to any one of claims 1 to 8.
  18. A computer-readable storage medium, wherein the storage medium stores at least one instruction, and the at least one instruction is loaded and executed by a processor to implement the operations performed by the method for determining volume adjustment ratio information according to any one of claims 1 to 8.
PCT/CN2020/120044 2019-10-10 2020-10-09 Method and apparatus for determining volume adjustment ratio information, device, and storage medium WO2021068903A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/766,911 US20230252964A1 (en) 2019-10-10 2020-10-09 Method and apparatus for determining volume adjustment ratio information, device, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910958720.1A 2019-10-10 Method and apparatus for determining volume adjustment ratio information, device, and storage medium
CN201910958720.1 2019-10-10

Publications (1)

Publication Number Publication Date
WO2021068903A1 true WO2021068903A1 (zh) 2021-04-15

Family

ID=69112019

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/120044 2019-10-10 2020-10-09 Method and apparatus for determining volume adjustment ratio information, device, and storage medium WO2021068903A1 (zh)

Country Status (3)

Country Link
US (1) US20230252964A1 (zh)
CN (1) CN110688082B (zh)
WO (1) WO2021068903A1 (zh)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110688082B (zh) * 2019-10-10 2021-08-03 腾讯音乐娱乐科技(深圳)有限公司 Method and apparatus for determining volume adjustment ratio information, device, and storage medium
CN111326132B (zh) * 2020-01-22 2021-10-22 北京达佳互联信息技术有限公司 Audio processing method and apparatus, storage medium and electronic device
CN111491176B (zh) * 2020-04-27 2022-10-14 百度在线网络技术(北京)有限公司 Video processing method, apparatus, device and storage medium
CN111813367A (zh) * 2020-07-22 2020-10-23 广州繁星互娱信息科技有限公司 Method, apparatus, device and storage medium for adjusting volume
CN112216294B (zh) * 2020-08-31 2024-03-19 北京达佳互联信息技术有限公司 Audio processing method and apparatus, electronic device and storage medium
CN114466241A (zh) * 2022-01-27 2022-05-10 海信视像科技股份有限公司 Display device and audio processing method
CN114863953A (zh) * 2022-04-21 2022-08-05 杭州网易云音乐科技有限公司 Volume adjustment method and apparatus, storage medium and computing device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107680571A (zh) * 2017-10-19 2018-02-09 百度在线网络技术(北京)有限公司 Song accompaniment method, apparatus, device and medium
US20180247675A1 (en) * 2015-11-23 2018-08-30 Guangzhou Kugou Computer Technology Co., Ltd. Audio file re-recording method, device and storage medium
CN109300482A (zh) * 2018-09-13 2019-02-01 广州酷狗计算机科技有限公司 Audio recording method and apparatus, storage medium and terminal
CN109828740A (zh) * 2019-01-21 2019-05-31 北京小唱科技有限公司 Audio adjustment method and apparatus
CN110688082A (zh) * 2019-10-10 2020-01-14 腾讯音乐娱乐科技(深圳)有限公司 Method and apparatus for determining volume adjustment ratio information, device, and storage medium

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7521623B2 (en) * 2004-11-24 2009-04-21 Apple Inc. Music synchronization arrangement
CN1924992A (zh) * 2006-09-12 2007-03-07 东莞市步步高视听电子有限公司 Karaoke vocal playback method
US10001968B1 (en) * 2016-03-18 2018-06-19 Audio Fusion Systems, LLC Monitor mixing apparatus that presents each musician with summary control of both their contributed channels and the remaining channels, for rapid and accurate sound balance
US10008188B1 (en) * 2017-01-31 2018-06-26 Kyocera Document Solutions Inc. Musical score generator
CN107705778B (zh) * 2017-08-23 2020-09-15 腾讯音乐娱乐(深圳)有限公司 Audio processing method and apparatus, storage medium and terminal
CN109003627B (zh) * 2018-09-07 2021-02-12 广州酷狗计算机科技有限公司 Method, apparatus, terminal and storage medium for determining audio score
CN109859729B (zh) * 2019-01-21 2021-03-05 北京小唱科技有限公司 Method and apparatus for waveform amplitude control of audio

Also Published As

Publication number Publication date
US20230252964A1 (en) 2023-08-10
CN110688082A (zh) 2020-01-14
CN110688082B (zh) 2021-08-03


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20874311

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 23.08.2022)

122 Ep: pct application non-entry in european phase

Ref document number: 20874311

Country of ref document: EP

Kind code of ref document: A1