CN111862913B - Method, device, equipment and storage medium for converting voice into rap music - Google Patents

Method, device, equipment and storage medium for converting voice into rap music

Info

Publication number
CN111862913B
CN111862913B CN202010688502.3A CN202010688502A CN111862913B
Authority
CN
China
Prior art keywords
alignment
rhythm
information
period
music
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010688502.3A
Other languages
Chinese (zh)
Other versions
CN111862913A (en)
Inventor
徐雯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Baiguoyuan Information Technology Co Ltd
Original Assignee
Guangzhou Baiguoyuan Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Baiguoyuan Information Technology Co Ltd filed Critical Guangzhou Baiguoyuan Information Technology Co Ltd
Priority to CN202010688502.3A priority Critical patent/CN111862913B/en
Publication of CN111862913A publication Critical patent/CN111862913A/en
Priority to PCT/CN2021/095236 priority patent/WO2022012164A1/en
Application granted granted Critical
Publication of CN111862913B publication Critical patent/CN111862913B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10HELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00Details of electrophonic musical instruments
    • G10H1/0008Associated control or indicating means
    • G10H1/0025Automatic or semi-automatic music composition, e.g. producing random music, applying rules from music theory or modifying a musical piece
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/04Segmentation; Word boundary detection
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78Detection of presence or absence of voice signals
    • G10L25/87Detection of discrete points within a voice signal
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10HELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2210/00Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/101Music Composition or musical creation; Tools or processes therefor
    • G10H2210/111Automatic composing, i.e. using predefined musical rules
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10HELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2210/00Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/341Rhythm pattern selection, synthesis or composition
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02BCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO BUILDINGS, e.g. HOUSING, HOUSE APPLIANCES OR RELATED END-USER APPLICATIONS
    • Y02B20/00Energy efficient lighting technologies, e.g. halogen lamps or gas discharge lamps
    • Y02B20/40Control techniques providing energy savings, e.g. smart controller or presence detection

Abstract

The embodiment of the invention discloses a method, a device, equipment and a storage medium for converting voice into rap music. The method comprises: recognizing an obtained speech segment and processing selected background music to obtain character attribute information of the characters in the speech segment and music rhythm information of the background music; determining, according to the character attribute information and the music rhythm information, at least one alignment period for aligning the speech segment with the background music, and obtaining an alignment information table for each alignment period; and controlling, according to each alignment information table, the alignment of the characters in the speech segment with the rhythm points in the background music, and forming the rap audio after pitch-shifting adjustment and special-effect processing. The method converts freely recorded speech into a rap segment matched with the background music: it places no restriction on the speech content to be converted, guarantees free recording of that content, simplifies the conversion process, avoids misalignment between speech characters and music rhythm points, and widens the application range of speech-to-rap conversion.

Description

Method, device, equipment and storage medium for converting voice into rap music
Technical Field
The embodiment of the invention relates to the technical field of music production, in particular to a method, a device, equipment and a storage medium for converting voice into rap music.
Background
With the popularization of karaoke applications, research on pitch-correction and speech-to-music algorithms has attracted increasingly wide attention, and people's interest in automatic pitch correction and rap singing keeps growing. In recent years, rap culture has gradually entered the public's field of view. Rap music is characterized by a creator quickly and rhythmically speaking a series of rhyming words over background music. Producing rap music is often a complex process: most people without audio-processing experience would have to learn professional audio-processing software and perform time-consuming, complicated manual operations.
To address this, some speech-conversion software suitable for non-specialists has appeared; however, the existing tools each have drawbacks. In one speech-to-rap scheme, specific lyrics must be read aloud: because the lyrics are completely matched to the background music, the alignment positions of the words and the rhythm points are fixed, and the scheme cannot handle lyrics of unknown content and length well, which reduces the user's creative space and limits the scheme's applicability. In another speech-to-rap scheme, the algorithms for audio segmentation and audio alignment are complex, which increases the conversion difficulty; at the same time, speech characters and music rhythm points can become misaligned, and this conversion approach is not well suited to processing user-uploaded music effectively.
Disclosure of Invention
In view of this, the embodiments of the present invention provide a method, apparatus, device and storage medium for converting speech into rap music, so as to solve the problems of limited speech content and poor speech conversion effect in the existing speech conversion.
In a first aspect, an embodiment of the present invention provides a method for converting speech into rap music, including:
identifying the obtained voice segment and processing the selected background music to obtain character attribute information of characters in the voice segment and music rhythm information of the background music;
determining at least one alignment period for aligning the speech segment with the background music according to the text attribute information and the music rhythm information, and obtaining an alignment information table of each alignment period;
and controlling the alignment of the characters in the speech segment with the rhythm points in the background music according to each alignment information table, and forming the rap audio after pitch-shifting adjustment and special-effect processing.
In a second aspect, an embodiment of the present invention provides an apparatus for converting speech into rap music, including:
the information determining module is used for identifying the obtained voice segment and processing the selected background music to obtain character attribute information of characters in the voice segment and music rhythm information of the background music;
An alignment information determining module, configured to determine at least one alignment period for aligning the speech segment with the background music according to the text attribute information and the music tempo information, and obtain an alignment information table of each alignment period;
and the conversion control module is used for controlling the alignment of the characters in the speech segment with the rhythm points in the background music according to each alignment information table, and forming the rap audio after pitch-shifting adjustment and special-effect processing.
In a third aspect, an embodiment of the present invention provides a computer apparatus, including:
one or more processors;
a storage means for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method for converting speech into rap music provided by the embodiment of the first aspect of the present invention.
In a fourth aspect, embodiments of the present invention provide a computer readable storage medium having stored thereon a computer program which when executed by a processor implements the method for converting speech to rap music provided by the embodiments of the first aspect of the present invention.
In the method, device, equipment and storage medium for converting voice into rap music provided by the embodiments of the present invention, the obtained speech segment is first recognized and the selected background music is processed to obtain the character attribute information of the characters in the speech segment and the music rhythm information of the background music; then, at least one alignment period for aligning the speech segment with the background music is determined according to the character attribute information and the music rhythm information, and an alignment information table is obtained for each alignment period; finally, according to each alignment information table, the alignment of the characters in the speech segment with the rhythm points in the background music is controlled, and the rap audio is formed after pitch-shifting adjustment and special-effect processing. This technical scheme effectively converts speech content recorded at will by the user into a rap segment matched with the background music, simplifies the tedious process of manual audio clipping, and makes rap-music production possible for people without professional audio-processing skills. Meanwhile, compared with existing speech-to-rap methods, it places no restriction on the speech content to be converted, guarantees free recording of that content, simplifies the conversion process, avoids misalignment between speech characters and music rhythm points, and widens the application range of speech-to-rap conversion.
Drawings
FIG. 1 is a flow chart of a method for converting speech into rap music according to a first embodiment of the present invention;
fig. 2 is a flowchart of a method for converting speech into rap music according to a second embodiment of the present invention;
FIG. 3 is a flowchart showing the implementation of determining an alignment period in the method for converting speech into rap music according to this embodiment;
FIG. 4 is a flowchart showing the implementation of determining the alignment units and the alignment unit information in the alignment period in the method for converting speech into rap music according to this embodiment;
FIG. 5 is a flowchart showing a specific expansion of determining the alignment units and the alignment unit information in the alignment period according to an embodiment of the present invention;
fig. 6 is a block diagram showing a device for converting speech into rap music according to the third embodiment of the present invention;
fig. 7 is a schematic hardware structure of a computer device according to a fourth embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the following detailed description of the embodiments of the present invention will be given with reference to the accompanying drawings. It should be understood that the described embodiments are merely some, but not all, embodiments of the invention. Furthermore, embodiments of the invention and features of the embodiments may be combined with each other without conflict.
In the description of the present application, it should be understood that the terms "first," "second," "third," and the like are used merely to distinguish between similar objects and are not necessarily used to describe a particular order or sequence, nor should they be construed to indicate or imply relative importance. The specific meaning of the above terms in the present application can be understood by those of ordinary skill in the art according to the specific circumstances.
Example 1
Fig. 1 is a flowchart of a method for converting speech into rap music, which is applicable to the case of converting speech segments recorded by a user into rap music, and may be performed by an apparatus for converting speech into rap music, where the apparatus may be implemented by software and/or hardware, and may be generally integrated on a computer device.
In one application mode, a background-music selection interface can first be provided to the user to obtain the background music the user selects; a voice-content selection interface can then be provided, through which the user either records a speech segment in real time by triggering a record button or uploads a pre-recorded speech segment by triggering an upload button; finally, the method for converting speech into rap music provided by this embodiment converts the obtained speech segment into a rap segment matched with the background music.
As shown in fig. 1, a method for converting speech into rap music according to a first embodiment of the present invention includes the following operations:
s101, recognizing the obtained voice segment, processing the selected background music, and obtaining character attribute information of characters in the voice segment and music rhythm information of the background music.
In this embodiment, the obtained speech segment may be understood as a speech segment recorded or prerecorded in real time by the user obtained before the step is performed, and the selected background music may be understood as music to be used selected from the background music set by the user received before the step is performed.
In this step, speech recognition can be performed on the speech segment to acquire related character attribute information such as the serial number of each character in the speech segment, the character's pronunciation time (its start and end times), and the start position of the first vowel within the character. Beat detection can also be performed on the background music to obtain related music rhythm information such as the serial number and position of each rhythm point in the background music and the rhythm points contained in each beat period formed by division.
It is to be understood that the specific modes of voice recognition, text detection, and rhythm point detection are not limited in this embodiment, as long as the required text attribute information and music rhythm information can be obtained.
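For illustration, the character attribute information and music rhythm information described above could be represented by structures like the following. All type and field names here are assumptions for the sketch, not taken from the patent:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class CharInfo:
    """One recognized character in the speech segment."""
    index: int          # character serial number
    start: float        # character start time in seconds (relative)
    vowel_start: float  # start time of the first vowel in the character
    end: float          # character end time

@dataclass
class RhythmPoint:
    """One detected accent (rhythm) point in the background music."""
    index: int   # rhythm-point serial number
    time: float  # rhythm-point position in seconds

@dataclass
class BeatPeriod:
    """A minimal repeating rhythm unit found by beat division."""
    number: int                                          # cycle number
    points: List[RhythmPoint] = field(default_factory=list)
```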
S102, determining at least one alignment period for aligning the voice segment with the background music according to the text attribute information and the music rhythm information, and obtaining an alignment information table of each alignment period.
It should be noted that, to turn a user's speech segment into a rap piece, besides the speech recognition and beat detection described above, the most important link is aligning the corresponding characters in the speech segment with the rhythm points in the background music. This alignment can be thought of as dividing the speech into individual characters and then placing each character on a strongly rhythmic, regular accent point; repeated use of some head, tail or middle characters may accompany an enhanced sense of rhythm. Therefore, in this embodiment, when converting speech into rap music, the alignment period for aligning the speech segment with the background music and the corresponding alignment information table need to be determined in this step.
In particular, an alignment period is understood as a minimal repeating unit containing rhythm points that can be aligned with all the characters in the speech segment; that is, starting from a certain time t, the rhythm of the background music repeats with a fixed period T, and one such period can be aligned with all the characters in the speech segment. The alignment information table can be understood as a statement table that, for one alignment period, records the correspondence between each required rhythm point and the character to be aligned with it (e.g., the rhythm-point serial number and the character serial number) together with the speed ratio used when aligning them.
The specific implementation of this step can be expressed as:
first, the total number of words included in the speech segment can be determined from the word attribute information, and the total number of rhythm points included in the background music and the period information of the beat period formed by dividing the rhythm points can be determined from the music rhythm information. The beat period is understood to mean a smallest repeating unit of the rhythm found from the rhythm point, i.e. starting from a certain time, the rhythm of the background music is repeated with a fixed period Z.
Then, according to the total number of characters and the number of rhythm points contained in one beat period, it can be determined whether the existing beat period satisfies the condition for serving as an alignment period. If it does, each beat period is directly taken as an alignment period; if not, the period length needs to be updated to obtain a period that can serve as the alignment period.
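One illustrative way to realize the period check just described is to merge consecutive beat periods until a single period holds at least as many rhythm points as there are characters. The merge-until-enough rule is a guess for the sketch; the patent only states that the period length is "updated" when the check fails:

```python
def build_alignment_period(total_chars: int, points_per_beat: int) -> int:
    """Return how many consecutive beat periods must be merged so that
    one alignment period contains at least as many rhythm points as the
    speech segment has characters. Illustrative rule only."""
    beats = 1
    while beats * points_per_beat < total_chars:
        beats += 1
    return beats
```

For example, with 7 characters and 4 rhythm points per beat period, two beat periods would be merged into one alignment period.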
Then, because the rhythm of every alignment period repeats, one alignment period can be selected arbitrarily. Combining the character start/end times and first-vowel start positions from the character attribute information with the rhythm-point information (taken relative to one alignment period) extracted from the music rhythm information, the rhythm point each character in the speech segment should align to within the period, and the speed ratio required for that alignment, can be determined. This yields an information table containing the rhythm-point serial numbers, the serial numbers of the associated characters and the corresponding speed ratios, which serves as the alignment information table of the alignment period.
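Under simplifying assumptions, such a table might be built as follows. The one-to-one pairing of characters to rhythm points and the speed-ratio formula (character duration divided by the gap to the next rhythm point) are illustrative; the patent specifies the table's columns but not the exact computation:

```python
def build_alignment_table(chars, points):
    """chars: list of (char_no, start_s, end_s) from the character
    attribute information; points: list of (point_no, time_s) within one
    alignment period. Pairs characters with rhythm points in order and
    computes the speed ratio that fits each character's duration into
    the gap before the next rhythm point. Illustrative sketch only."""
    table = []
    for (c_no, c_start, c_end), ((p_no, p_time), (_, next_time)) in zip(
            chars, zip(points, points[1:])):
        table.append({"point": p_no, "char": c_no,
                      "speed_ratio": (c_end - c_start) / (next_time - p_time)})
    return table
```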
Finally, this table can be taken as the alignment information table of every complete alignment period, whereas for an incomplete alignment period, part of the alignment information can be extracted from it to construct a corresponding table. This step therefore produces at least one alignment period together with an alignment information table for each alignment period.
And S103, controlling the alignment of the characters in the speech segment with the rhythm points in the background music according to each alignment information table, and forming the rap audio after pitch-shifting adjustment and special-effect processing.
In this embodiment, using the alignment periods formed by dividing the rhythm points of the background music and the alignment information tables containing the alignment relations between the speech-segment characters and the rhythm points, this step can directly determine each matched character and rhythm point, control their alignment, and change the speed of the aligned audio based on the corresponding speed ratio. Afterwards, the speed-changed audio can be pitch-shifted according to the pitch of the background music and effects such as reverberation can be added, forming the converted rap audio.
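The speed change driven by the speed ratio could be sketched as below. A naive index-remapping resampler is shown purely for illustration; a production system would use a time-scale modification algorithm such as a phase vocoder or PSOLA so that speed changes without shifting pitch, with the pitch shift toward the background music's key applied as a separate step:

```python
def change_speed(samples, ratio):
    """Change playback speed by naive index remapping: ratio > 1 speeds
    the audio up (fewer output samples). This alters pitch along with
    speed and stands in for a real time-stretching algorithm."""
    n = max(1, int(len(samples) / ratio))
    return [samples[min(int(i * ratio), len(samples) - 1)] for i in range(n)]
```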
The first embodiment of the present invention thus provides a method for converting speech into rap music: the obtained speech segment is recognized and the selected background music is processed to obtain the character attribute information of the characters in the speech segment and the music rhythm information of the background music; at least one alignment period for aligning the speech segment with the background music is then determined according to the character attribute information and the music rhythm information, and an alignment information table is obtained for each period; finally, according to each alignment information table, the alignment of the characters in the speech segment with the rhythm points in the background music is controlled, and the rap audio is formed after pitch-shifting adjustment and special-effect processing. This technical scheme effectively converts speech content recorded at will by the user into a rap segment matched with the background music, simplifies the tedious process of manual audio clipping, and makes rap-music production possible for people without professional audio-processing skills; meanwhile, compared with existing speech-to-rap methods, it places no restriction on the speech content to be converted, guarantees free recording of that content, simplifies the conversion process, avoids misalignment between speech characters and music rhythm points, and widens the application range of speech-to-rap conversion.
As an optional embodiment of the first embodiment of the present invention, on the basis of the foregoing embodiment, the method is further optimized to include, before determining at least one alignment period for aligning the speech segment with the background music according to the character attribute information and the music rhythm information and obtaining an alignment information table for each alignment period: if the total number of characters in the character attribute information is greater than the total number of rhythm points in the music rhythm information, ending the process of converting the speech segment into rap music and giving a prompt to reacquire the speech segment or the background music.
It should be noted that, in the implementation of the method for converting speech into rap music according to this embodiment, executing S102 and S103 presupposes that the total number of characters in the character attribute information obtained in S101 is less than or equal to the total number of rhythm points in the music rhythm information; that is, the number of characters in the obtained speech segment must not exceed the number of rhythm points in the background music. When this condition is not satisfied, the conditions for continuing the conversion are considered unmet, and the operation of this alternative embodiment can be performed: upon determining that the total number of characters exceeds the total number of rhythm points, the subsequent conversion steps end, and the user is prompted to re-record the speech segment. Alternatively, other optional operations are possible; for example, this alternative embodiment may instead give a prompt to reselect the background music, informing the user to choose other background music.
The operation of the alternative embodiment ensures the effective matching of the voice segment to be converted and the background music, thereby improving the user experience of converting the voice into the rap music.
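The precondition of this alternative embodiment reduces to a single comparison; a minimal sketch (the function name and return convention are illustrative, not from the patent):

```python
def check_convertible(total_chars: int, total_points: int) -> bool:
    """Conversion continues only when the speech segment has no more
    characters than the background music has rhythm points; otherwise
    the caller should prompt the user to re-record the speech segment
    or reselect the background music."""
    return total_chars <= total_points
```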
Example two
Fig. 2 is a flowchart of a method for converting speech into rap music according to a second embodiment of the present invention. The second embodiment is optimized on the basis of the first. In this embodiment, the step of recognizing the obtained speech segment and processing the selected background music to obtain the character attribute information of the characters in the speech segment and the music rhythm information of the background music is further refined as: performing noise reduction and endpoint detection on the speech segment selected by the user, and obtaining, through speech recognition of the processed speech segment, the serial number, start and end times, and first-vowel start position of each character together with the total number of characters, forming the character attribute information of the speech segment; performing rhythm-point detection and beat-period division on the background music selected by the user, and determining the total number of rhythm points, the rhythm-point serial numbers and the period information of each beat period contained in the background music, forming the music rhythm information of the background music; wherein the period information includes: the cycle number, the number of rhythm points contained in the beat period, the serial number of each of those rhythm points and each rhythm point's start time.
Meanwhile, this embodiment further refines the step of determining, according to the character attribute information and the music rhythm information, at least one alignment period for aligning the speech segment with the background music and obtaining an alignment information table for each alignment period as: determining at least one alignment period for aligning the speech segment with the background music according to the total number of characters in the character attribute information and the period information of each beat period in the music rhythm information; selecting one complete alignment period as the rhythm segment to be aligned, and determining at least one alignment unit and corresponding alignment unit information according to the character attribute information and the rhythm-point information of the rhythm points to be aligned in that segment; and summarizing each piece of alignment unit information to form the current alignment information table of the rhythm segment to be aligned, and determining the alignment information table of each remaining alignment period according to the current alignment information table.
As shown in fig. 2, a method for converting speech into rap music according to the second embodiment includes the following operations:
s201, carrying out noise reduction processing and end point detection processing on the voice section selected by the user, and obtaining the character serial number, the start and stop time, the initial position of the first vowel and the total amount of characters of each character in the voice section through voice recognition of the processed voice section to form character attribute information of the voice section.
In this embodiment, the noise processing policy in the audio processing may be used to perform noise reduction processing on the recorded speech segment, the endpoint detection policy may be used to remove the silence segment from the noise-reduced speech segment, and then the speech recognition policy may be used to recognize the processed speech segment, so as to obtain relevant information of each text that forms the speech segment.
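The endpoint detection mentioned above could be approximated by a simple energy-based trim. This is a stand-in for the unspecified "endpoint detection strategy" and is not the patent's actual algorithm:

```python
def trim_silence(frames, energy_thresh):
    """Energy-based endpoint detection: drop leading and trailing frames
    whose mean squared amplitude falls below a threshold, keeping the
    voiced span in between. frames is a list of sample lists."""
    def energy(frame):
        return sum(x * x for x in frame) / len(frame)
    voiced = [i for i, f in enumerate(frames) if energy(f) >= energy_thresh]
    if not voiced:
        return []
    return frames[voiced[0]:voiced[-1] + 1]
```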
The obtained information may specifically include the total number of characters in the whole speech segment, the serial number of each character, and the character's corresponding start and end times within the segment. The character's start/end times and the first-vowel start position can be regarded as relative time points; that is, following the playing order of the whole speech segment, the start time of the first character can be taken as 0 seconds. This embodiment records this information as the character attribute information of the speech segment.
As an example, Table 1 shows the character attribute information as a data table. Each column of Table 1 can be regarded as a character attribute item, including at least the character serial number, the character's start time, the start time of the first vowel in the character and the character's end time; the number of rows can be regarded as the total number of characters in the speech segment.
TABLE 1 text attribute information for text in speech segment
S202, detecting rhythm points in, and dividing beat periods of, the background music selected by the user, and determining the total number of rhythm points, the serial number of each rhythm point, and the period information of each beat period in the background music, to form the music rhythm information of the background music.
In this embodiment, strongly accented stress points (i.e., rhythm points) may be detected in the background music by a rhythm-point detection strategy from audio processing, and a beat-division strategy may then be used to find the repetition pattern of the detected rhythm points, so that the music is divided into beat periods, each being a minimum repeating rhythm unit. For a piece of background music, the detected rhythm points carry certain attribute information, such as the serial number of each rhythm point, the total number of rhythm points, and the position of each rhythm point (i.e., the relative time at which it occurs). After beat division, corresponding period information is also formed for each beat period, which may include the period number, the number of rhythm points contained in the period, and, for each contained rhythm point, its serial number and start time. This embodiment aggregates this information to form one piece of music rhythm information.
As an example, this embodiment gives the music rhythm information in the form of a data table, shown in Table 2. Table 2 is a cascade table: its first column lists the beat periods identified by period number, and its second column lists the rhythm point numbers, showing in cascaded form which rhythm points belong to each period (e.g., the beat period with period number 1). The rhythm point numbers and rhythm point positions (i.e., start times) are cascaded under each period number, and the number of cascaded rows under a period number equals the number of rhythm points in that beat period.
Table 2 music tempo information corresponding to background music
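The cascaded structure of Table 2 can likewise be sketched as periods that each own their rhythm points; the names (`RhythmPoint`, `BeatPeriod`, `music_rhythm`) are illustrative:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class RhythmPoint:
    serial_no: int     # rhythm point serial number within the background music
    start_time: float  # relative position (occurrence time) of the rhythm point

@dataclass
class BeatPeriod:
    period_no: int
    points: List[RhythmPoint] = field(default_factory=list)

    @property
    def point_count(self) -> int:
        # "number of rhythm points" entry of the period information
        return len(self.points)

# Music rhythm information as a cascade: periods, each listing its points.
music_rhythm = [
    BeatPeriod(1, [RhythmPoint(0, 0.0), RhythmPoint(1, 0.5)]),
    BeatPeriod(2, [RhythmPoint(2, 1.0), RhythmPoint(3, 1.5)]),
]
total_points = sum(p.point_count for p in music_rhythm)
```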
The following S203 to S205 of this embodiment give a specific implementation for determining, from the text attribute information and the music rhythm information, the alignment periods and the alignment information tables required for aligning the voice segment with the background music.
S203, determining at least one alignment period for aligning the voice segment with the background music according to the text total amount in the text attribute information and the period information of each beat period in the music rhythm information.
In this embodiment, this step first determines how many alignment periods capable of holding all the characters of the voice segment the whole piece of background music contains; when the number of alignment periods is greater than 1, the last alignment period may be an incomplete one (i.e., one that cannot hold all the characters). This step amounts to a rough division of the background music into alignment periods. The division compares the total number of characters in the voice segment with the number of rhythm points in a complete beat period of the background music, and thereby decides whether a beat period can serve directly as an alignment period or whether beat periods must first be merged to obtain one.
Further, fig. 3 is a flowchart of the method for converting speech into rap music according to this embodiment. As shown in fig. 3, determining at least one alignment period for aligning the voice segment with the background music, according to the total number of characters in the text attribute information and the period information of each beat period in the music rhythm information, may be refined as follows:
S2031, selecting a complete beat period, and acquiring the number of rhythm points in the corresponding period information.
It will be appreciated that at least one beat period may be detected in the whole piece of background music. When exactly one beat period is detected, it is considered a complete period; when more than one is detected, the last period formed by the division may be incomplete, i.e., it may not contain all the rhythm points of a full period. In this embodiment, one of the complete beat periods may be selected and the number of rhythm points in its period information obtained; this number is the same for every complete beat period.
S2032, judging whether the number of rhythm points is greater than or equal to the total number of characters; if so, executing S2033; if not, executing S2034.
The purpose of this judgment is to determine whether the currently selected complete beat period can hold all the characters of the voice segment. If so, S2033 is executed; if not, S2034 is executed.
S2033, each of the beat periods is regarded as one alignment period.
Following the above judgment, when the number of rhythm points is greater than or equal to the total number of characters, the beat period can directly serve as an alignment period. Note that once one complete beat period is found to satisfy this condition, every other detected complete beat period can likewise be taken as a complete alignment period, and any incomplete beat period can be taken as an incomplete alignment period.
S2034, judging whether the number of beat periods included in the background music is greater than 1; if so, executing S2035; if not, executing S2036.
When the number of rhythm points is smaller than the total number of characters, a single complete beat period cannot hold all the characters of the voice segment, and beat periods must be merged in this step; merging requires the background music to contain at least two beat periods. This step therefore checks whether the number of beat periods in the background music is greater than 1: if so, the merging condition is met and S2035 can be executed; otherwise the background music does not match the voice segment and S2036 is executed.
S2035, merging the beat periods in pairs in period-number order to form at least one new beat period, and returning to S2031.
In this embodiment, when the number of beat periods is greater than 1, the beat periods may be merged in pairs in period-number order to form new beat periods, whose period information changes accordingly. Taking Table 2 above as an example, suppose the two beat periods numbered 1 and 2 are merged: the number of rhythm points in the new beat period is the sum of the two original counts. After one round of pairwise merging, the number of beat periods is roughly halved (when the original count is odd, the last period is carried over unmerged). Execution then returns to S2031 to re-evaluate the alignment periods using the period information of the newly formed beat periods, looping until a suitable alignment period is found, or until the search fails and the subsequent speech-to-rap conversion is terminated.
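The loop of S2031 to S2036 can be sketched as follows, under the simplifying assumption that each beat period is represented only by its rhythm-point count, in period-number order (the function name and this representation are illustrative):

```python
def determine_alignment_periods(point_counts, total_chars):
    """Decide alignment periods per S2031-S2036 (sketch).

    `point_counts` holds the number of rhythm points in each beat period in
    period-number order; the first entry is taken as a complete period (the
    last entry may be incomplete). Returns the per-period point counts once
    a complete period can hold all characters, or None when no fit exists
    (the voice segment and the background music do not match).
    """
    while point_counts[0] < total_chars:          # S2032: complete period too small
        if len(point_counts) <= 1:                # S2034/S2036: nothing left to merge
            return None
        # S2035: pairwise merge in period-number order; an odd last period
        # is carried over unmerged (the "+ 0" branch).
        point_counts = [
            point_counts[i] + (point_counts[i + 1] if i + 1 < len(point_counts) else 0)
            for i in range(0, len(point_counts), 2)
        ]                                         # then back to S2031
    return point_counts                           # S2033: each period = one alignment period
```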
S2036, ending the process of converting the voice segment into rap music, and prompting the user to re-acquire the voice segment or the background music.
In this embodiment, if there is only one beat period and its number of rhythm points is smaller than the total number of characters, the voice segment may be considered not to match the selected background music. In an alternative embodiment, the voice segment is then re-uploaded or re-recorded, or the background music is re-selected.
S204, selecting a complete alignment period as the rhythm segment to be aligned, and determining at least one alignment unit and the corresponding alignment unit information according to the text attribute information and the rhythm point information of the rhythm points to be aligned in the rhythm segment to be aligned.
After the above division into alignment periods, this step can take one alignment period as a reference and determine the matching of each character of the voice segment to each rhythm point in that period. In this embodiment, the matching of a rhythm point within the period to characters of the voice segment is regarded as an alignment unit, and each piece of alignment unit information specifically includes the serial number of the contained rhythm point, the serial numbers of the characters matched to it, and the speed ratio required to align that rhythm point with its matched characters.
Each alignment unit has alignment unit information including at least a rhythm point number, a character number, and a speed ratio. Moreover, since every alignment period contains the same number of rhythm points and the same musical rhythm, this step need only determine the alignment units and their information for one complete alignment period.
Specifically, the process of determining the alignment units and the alignment unit information in this step may be described as follows. First, the alignment period selected for this determination is marked as the rhythm segment to be aligned, and its rhythm point information serves directly as the rhythm point information of the rhythm points to be aligned. Next, an alignment matching value for aligning characters with rhythm points is determined from the text attribute information and the rhythm point information. The range containing this alignment matching value is then located in a preset rhythm point-character alignment rule table. Finally, the alignment units in the rhythm segment to be aligned, and the alignment unit information of each, are determined from the alignment rule associated with that range; the ranges and the corresponding rules of the rhythm point-character alignment rule table can be preset from historical experience.
Further, fig. 4 shows a flowchart of determining the alignment units and alignment unit information within one alignment period in the method for converting speech into rap music. As shown in fig. 4, determining at least one alignment unit and the corresponding alignment unit information according to the text attribute information and the rhythm point information of the rhythm points to be aligned in the rhythm segment to be aligned may be refined as follows:
It is to be noted that the following S2041 to S2048 are the specific expansion of S204 above in this embodiment.
S2041, selecting a complete alignment period as a to-be-aligned rhythm segment, forming to-be-aligned rhythm blocks corresponding to-be-aligned rhythm points one by one based on rhythm point information of to-be-aligned rhythm points in the to-be-aligned rhythm segment, and recording the number of the to-be-aligned rhythm points as the initial number of remaining points.
In this embodiment, one complete alignment period may be selected from the determined alignment periods as the rhythm segment to be aligned for building the alignment information table. The rhythm points to be aligned in this segment are the rhythm points contained in the chosen alignment period, and their rhythm point information is that of the contained points.
It should be noted that, in this embodiment, the interval formed by two adjacent rhythm points to be aligned may be recorded as one rhythm block to be aligned. The number of rhythm blocks thus formed equals the number of rhythm points to be aligned in the segment, i.e., the blocks can be considered to correspond one-to-one to the points, and a block serial number may be assigned to each block. This step then preferably records the number of rhythm points to be aligned as the initial number of remaining points.
S2042, determining the ratio of the number of remaining points to the total number of characters in the text attribute information, and recording the ratio as the alignment matching value.
To match each rhythm point to be aligned in the rhythm segment with the characters of the voice segment, this step determines the ratio of the number of rhythm points not yet matched to characters to the total number of characters, and records it as the alignment matching value.
It can be understood that when no rhythm point in the segment has yet been matched, the number of points to be matched equals the number of all rhythm points to be aligned; therefore, the number of remaining points is initialized to the number of rhythm points to be aligned.
S2043, searching a preset rhythm point-character alignment rule table, and determining the length ratio range to which the alignment matching value belongs.
In this embodiment, a rhythm point-character alignment rule table is preset. The rule table is a binary association table whose two associated objects are a length ratio range and an alignment rule. The length ratio range is defined by the ratio of the number of unmatched rhythm points in an alignment period to the total number of characters in the whole voice segment. Preferably, six length ratio ranges are formed from historical experience: (0, 0.2], (0.2, 0.8], (0.8, 1], (1, 1.1], (1.1, 1.3] and (1.3, +∞).
In this embodiment, the length ratio range of the rhythm point-character alignment rule table to which the alignment matching value obtained above belongs may then be determined.
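Assuming the six ranges above are open on the left and closed on the right, locating the range of an alignment matching value can be sketched with a sorted list of upper bounds (the names `RANGE_UPPER_BOUNDS` and `length_ratio_range` are illustrative):

```python
import bisect

# Upper bounds of the ranges (0, 0.2], (0.2, 0.8], (0.8, 1], (1, 1.1],
# (1.1, 1.3] and (1.3, +inf); boundaries follow the text above.
RANGE_UPPER_BOUNDS = [0.2, 0.8, 1.0, 1.1, 1.3]

def length_ratio_range(match_value: float) -> int:
    """Return the 0-based index of the length ratio range that contains
    the alignment matching value; the last index means (1.3, +inf)."""
    # bisect_left maps a value equal to a bound to that bound's own range,
    # matching intervals that are open on the left and closed on the right.
    return bisect.bisect_left(RANGE_UPPER_BOUNDS, match_value)
```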
S2044, determining, according to the alignment rule corresponding to the length ratio range, which rhythm blocks to be aligned have matched characters, and recording those blocks as candidate alignment units.
Through the above steps, the length ratio range to which the alignment matching value belongs has been determined, and the alignment rule associated with that range can be obtained.
In this embodiment, matching a character to a rhythm point can be regarded as matching the character to one rhythm block to be aligned. Based on the alignment rule for the length ratio range, the characters matched to each rhythm block (at least one per matched block, though the exact count is not fixed) can be determined, and each matched rhythm block serves as a candidate alignment unit.
In line with the above description of the rhythm point-character alignment rule table, this embodiment sets a corresponding alignment rule for each length ratio range. As an example, Table 3 gives a preset rhythm point-character alignment rule table; the remaining rhythm points (the remaining rhythm blocks to be aligned) can be matched to characters by the alignment rule corresponding to each length ratio range in Table 3.
TABLE 3 rhythm point-character alignment rule List
S2045, counting the number of blocks of the remaining rhythm blocks to be aligned as the new number of remaining points.
After one round of alignment matching by S2044, unmatched rhythm blocks to be aligned may remain. This step counts the number of remaining blocks in the rhythm segment and takes that count as the new number of remaining points.
S2046, determining whether the number of the residual points is 0, and if yes, executing S2047; if not, the process returns to S2042.
This step judges whether the number of remaining points is 0. If so, no rhythm blocks to be aligned remain in the segment, i.e., all have been matched, and S2047 can be executed; if not, unmatched rhythm blocks remain, and execution returns to the alignment-matching-value determination of S2042.
It can be understood that, by this step, once all rhythm blocks in a rhythm segment to be aligned have been matched, the number of candidate alignment units formed equals the number of rhythm points to be aligned. That is, one candidate alignment unit exists for each rhythm point (rhythm block) to be aligned, and the unit serial numbers of the candidate alignment units may be numbered sequentially from 0 in alignment order.
To aid understanding of how candidate alignment units are determined, this embodiment gives an example. Assume that a rhythm segment to be aligned contains 8 rhythm points, so the current number of remaining points is 8, and that the user's voice segment contains 5 characters, say "pale yellow long skirt". The process of matching "pale yellow long skirt" against the 8 remaining rhythm points to determine the candidate alignment units may be described as:
1) The alignment matching value is 8/5 = 1.6, which falls within the length ratio range (1.3, +∞); looking up Table 3 above gives the corresponding alignment rule.
2) Match the characters to the rhythm points according to the alignment rule associated with the length ratio range (1.3, +∞).
Specifically, the alignment rule is: "starting from the first remaining rhythm point, repeat characters amounting to a 10% word length beginning with the first character; then match remaining rhythm points amounting to a 100% word length to the characters one by one in order; then, for the remaining rhythm points, repeat characters amounting to a 20% word length beginning with the last character." Under this rule, a 10% word length, i.e., 0.5 characters, must first be repeated starting from the first character of "pale yellow long skirt"; since a fractional repeat length is rounded down, the number of characters to repeat here is 0. Next, starting directly from the first remaining rhythm point, rhythm points amounting to a 100% word length are matched one by one to the 5 characters, so the rhythm blocks 0-4 formed by rhythm points 0-4 correspond to the 5 characters "pale", "yellow", "color", "long", "skirt". Then a 20% word length must be repeated starting from the last character, which rounds down to 1 character, namely the final "skirt"; at this point the rhythm block formed by rhythm point 6 corresponds to "skirt". This completes the matching under the rule for the range (1.3, +∞): the candidate alignment units determined so far have unit serial numbers 0-5, and their matched characters are, respectively, "pale", "yellow", "color", "long", "skirt", "skirt".
3) After the above operations, 2 of the 8 rhythm blocks to be aligned remain unmatched. The number of remaining points is therefore greater than 0, so the alignment matching value is determined again: the new value is 2/5 = 0.4, which falls within the length ratio range (0.2, 0.8], and looking up Table 3 gives the corresponding alignment rule.
4) Match the characters to the rhythm points according to the alignment rule associated with the length ratio range (0.2, 0.8].
Specifically, the alignment rule is: "when L ≤ 0.5, randomly select characters amounting to an L word length to repeat, adjust the positions of the matched rhythm points and characters, and add each repeat after its selected character; when L > 0.5, randomly select characters amounting to a 50% word length to repeat, adjust the positions of the matched rhythm points and characters, add each repeat after its selected character, and add silent segments for the remaining rhythm points amounting to an (L - 0.5) word length, where L is the alignment matching value."
Since the alignment matching value 0.4 is not greater than 0.5, the rule calls for randomly selecting a 40% word length (i.e., 2 characters) to repeat. Suppose the randomly selected character serial numbers among 0-4 are 1 and 3, whose characters are "yellow" and "long". The two remaining rhythm blocks to be aligned are therefore matched to "yellow" and "long", forming two new candidate alignment units, and the previously matched sequence is adjusted so that each repeated character sits immediately after its selected character in "pale yellow long skirt".
5) After the above operations, no unmatched rhythm blocks to be aligned remain, i.e., the number of remaining points is 0, which satisfies the condition for ending candidate-alignment-unit matching, so the process can end.
After step 5), 8 candidate alignment units with unit serial numbers 0 to 7 have been formed, completing the alignment matching of the characters in the voice segment to the rhythm segment to be aligned.
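The round-down arithmetic of this worked example can be checked directly; the variable names below are illustrative:

```python
import math

total_chars = 5                                # "pale yellow long skirt" has 5 characters
total_points = 8                               # rhythm points in the segment to be aligned

head_repeat = math.floor(0.10 * total_chars)   # 10% word length -> floor(0.5) = 0 characters
tail_repeat = math.floor(0.20 * total_chars)   # 20% word length -> 1 character ("skirt")

# First pass: 5 points matched one-to-one, 0 head repeats, 1 tail repeat,
# giving 6 candidate alignment units and leaving 2 rhythm blocks unmatched.
matched_units = total_chars + head_repeat + tail_repeat
remaining = total_points - matched_units
new_match_value = remaining / total_chars      # second-pass alignment matching value
```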
S2047, determining at least one alignment unit according to the unit duration of each candidate alignment unit and the text attribute information of its matched characters, and obtaining the corresponding speed ratio.
From the above description, the number of candidate alignment units determined from the rhythm segment to be aligned equals the number of rhythm blocks it contains, and one rhythm block to be aligned is the interval from a rhythm point to the next adjacent rhythm point or to the end of the rhythm segment (the latter applies mainly to the last rhythm point); that is, the duration of a rhythm block is the duration of the interval between two rhythm points (or between the last point and the segment end). In this embodiment, since one candidate alignment unit corresponds to one rhythm block, the duration of the rhythm block may serve as the unit duration of the corresponding candidate alignment unit.
After the characters matched to each candidate alignment unit have been determined, what remains is to align the pronunciation of those characters with the unit duration. Typically, such alignment may be achieved directly by playing the audio of the candidate alignment unit while mixing in the pronunciation of the matched characters. However, some characters have a pronunciation shorter than the unit duration of their matched candidate alignment unit, while others have a pronunciation longer than it; to align character and unit, the pronunciation rate must be adjusted, e.g., the pronunciation stretched (slowed down) or compressed (sped up) until it equals the unit duration.
In this embodiment, the ratio by which a character's pronunciation is stretched or compressed is recorded as the speed ratio. This step may determine the speed ratio required to align the matched characters with their candidate alignment unit from the unit duration of the candidate alignment unit and the attribute information of the matched characters (such as their start and stop times and the start position of the first vowel in each character).
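Under the assumption that the speed ratio is simply the target unit duration divided by the character's pronunciation duration (the patent states only that a stretch-or-compress ratio is derived from the unit duration and the character's attribute information, not the exact formula), the computation can be sketched as:

```python
def speed_ratio(unit_duration: float, char_duration: float) -> float:
    """Ratio by which a character's pronunciation must be stretched
    (ratio > 1, slower) or compressed (ratio < 1, faster) so that it
    fills the unit duration of its candidate alignment unit.

    Interpreting the ratio as target/source duration is an assumption.
    """
    return unit_duration / char_duration

# A 0.3 s pronunciation matched to a 0.6 s unit must be stretched by 2x;
# the suitability check of S2047 would then test this value against a
# preset normal ratio range before accepting the unit.
```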
However, when aligning characters with a unit by stretching or compressing their pronunciation, there is a limit to how far the pronunciation can be stretched or compressed: if alignment were pursued by unbounded stretching or compression alone, the audio produced by the actual alignment operation would risk distortion. This embodiment therefore sets an appropriate range for the compression or stretching of character pronunciation, i.e., it ensures that the speed ratio of a character lies within a normal ratio range, which can be regarded as the suitability condition for stretching or compression.
Accordingly, based on the speed ratio calculated above, this step may also decide whether the corresponding candidate alignment unit is suitable as an alignment unit by comparing the speed ratio against the suitability condition. If it is, the candidate alignment unit is taken directly as an alignment unit and its speed ratio as that unit's speed ratio; if not, the candidate alignment unit must undergo silence filling or be merged with one or more other candidate alignment units, yielding an alignment unit that satisfies the suitability condition, whose speed ratio is the one that passed the suitability check.
Through this step, the candidate alignment units (as many as there are rhythm points to be aligned) finally form at least one alignment unit. Each alignment unit contains at least one rhythm point and at least one matched character, and its speed ratio is the ratio by which the contained characters must be stretched or compressed to align with the contained rhythm points.
S2048, determining the unit serial number of each alignment unit, the serial number of its starting rhythm point, the character serial numbers of its matched characters, and its speed ratio as the corresponding alignment unit information.
It is clear that, while the above determination of alignment units and their speed ratios is performed, the unit serial number of each alignment unit and the serial numbers of the rhythm points it contains are obtained as well, together with the serial numbers of the characters matched within it. This step summarizes that information per alignment unit, forming the corresponding alignment unit information for each.
S205, summarizing the information of each alignment unit to form a current alignment information table of the rhythm segment to be aligned, and determining the alignment information table of each remaining alignment period according to the current alignment information table.
In this embodiment, the at least one alignment unit in the rhythm segment to be aligned and its corresponding alignment unit information may be determined through S204. This step arranges and summarizes the determined alignment unit information in unit-serial-number order, forming the current alignment information table; the alignment information tables of the remaining alignment periods determined in S203 are then derived from the current alignment information table.
Specifically, for each remaining alignment period: if it is a complete alignment period, the current alignment information table can be duplicated and used directly as its alignment information table; if it is an incomplete alignment period, the rows of alignment unit information equal in number to the rhythm points contained in that period can be taken from the current alignment information table to form its alignment information table.
Table 4 alignment information table formed based on information of each alignment unit in one alignment period
Unit number    Start rhythm point number    Character serial number    Speed ratio
1              1                            2                          1.0
2              2                            3, 4                       1.2
3              3                            5                          0.9
For example, Table 4 shows an alignment information table formed from the information of each alignment unit in one alignment period. As shown in Table 4, each column corresponds to an attribute of the alignment unit, and may include the unit serial number, the serial number of the unit's starting rhythm point, the serial numbers of the matched characters, and the speed ratio required for alignment; the number of rows of the table equals the number of alignment units in the alignment period.
Further, determining the alignment information table of each remaining alignment period from the current alignment information table may be embodied as: for each remaining alignment period, if it is a complete period, use the current alignment information table as its alignment information table; if it is an incomplete period, determine the target number of rhythm points it contains, and select that many rows of alignment unit information from the current alignment information table in reverse order to form its alignment information table.
The above gives the process of determining the alignment information tables of the remaining alignment periods in the background music. For an incomplete alignment period containing, say, 2 rhythm points, two rows of alignment unit information can be selected directly from the current alignment information table, from bottom to top, to form its alignment information table.
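The complete/incomplete distinction of this derivation can be sketched as follows; treating table rows as opaque values and keeping the bottom-up-selected rows in their original table order are assumptions (the function name is illustrative):

```python
def table_for_period(current_table, target_point_count, is_complete):
    """Derive an alignment information table for a remaining alignment
    period from the current table (a list of alignment-unit rows).

    A complete period duplicates the whole table; an incomplete period
    takes the last `target_point_count` rows, i.e., the rows selected
    bottom-up as described above (kept here in table order, which is
    an assumption about the final row ordering).
    """
    if is_complete:
        return list(current_table)
    return current_table[-target_point_count:]
```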
S206, controlling the text in the voice segment to be aligned with the rhythm points in the background music according to each alignment information table, and forming the rap audio after tone changing adjustment and special effect processing.
In this embodiment, the alignment information table formed for each alignment period includes at least one alignment unit and its corresponding alignment unit information, and each piece of alignment unit information includes the rhythm point number actually used for aligning text with a rhythm point, the numbers of the matched characters, the speed-change ratio required for alignment, and so on. After the alignment information table of each alignment period is obtained through the above steps, this embodiment can, according to the information of each alignment unit in each alignment information table, control the corresponding rhythm point to be aligned with the matched text at the corresponding speed-change ratio, thereby achieving the alignment of the voice segment with the rhythm points in the background music.
It should be noted that when the text in the voice segment is aligned with the matched rhythm points in this step, for the matching in each alignment period, the audio data actually corresponding to each alignment unit is first extracted from the voice segment according to the pronunciation occupation duration of the text contained in that alignment unit (the interval from the first-vowel starting point of the alignment unit to the first-vowel starting point of the next unit); the extracted audio data is then speed-adjusted according to the speed-change ratio of each alignment unit; and finally, tone changing adjustment, special effect processing and other operations are performed on the speed-adjusted audio data to form the converted rap music.
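The per-unit variable-speed adjustment can be illustrated with a naive linear resampler. The patent does not specify the speed-change algorithm (a production system would use a pitch-preserving method before the separate tone changing step), so this is only a stand-in:

```python
def stretch_segment(samples, ratio):
    # Naive time-stretch: resample the unit's audio to `ratio` times its
    # original length by nearest-index lookup. Unlike pitch-preserving
    # processing, this also shifts pitch; it only illustrates the idea of
    # scaling each alignment unit's audio by its speed-change ratio.
    n_out = max(1, round(len(samples) * ratio))
    step = len(samples) / n_out
    return [samples[min(len(samples) - 1, int(k * step))] for k in range(n_out)]
```

Stretching a four-sample unit by a ratio of 2.0 doubles it to eight samples, and a ratio of 0.5 halves it to two.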
The second embodiment of the invention provides a method for converting voice into rap music. It specifically provides the operations for determining the text attribute information and the music rhythm information, as well as the specific operations for determining the alignment periods and the associated alignment information tables required for aligning a voice segment with background music. With the method provided by this embodiment, after a user selects background music and records spoken voice with arbitrary content, an alignment strategy for matching and speed-changing characters against rhythm points is determined from the obtained rhythm point positions, the start and end times of individual characters and the first-vowel start times, so that the rap music formed by aligning characters with rhythm points can be obtained in a short time. The whole technical scheme simplifies the tedious process of manual audio editing and makes rap music production possible for non-professional audio processing staff; meanwhile, compared with existing voice-to-rap methods, the voice content to be converted does not need to be restricted, free recording of the content to be converted is guaranteed, the implementation of voice conversion is simplified, misalignment between voice characters and music rhythm points is avoided, and the application range of converting voice into rap music is widened.
As an optional embodiment of the second embodiment of the present invention, before the determination in S202 of the total number of rhythm points, the rhythm point numbers and the period information of each beat period contained in the background music, the method is further optimized to include:
acquiring detected initial rhythm points, and determining interval duration formed by two adjacent initial rhythm points; and determining the rhythm points to be deleted in the initial rhythm points according to the average word length of the words included in the voice segment and the interval time length, and deleting the rhythm points to be deleted to obtain the effective rhythm points in the background music.
This alternative embodiment specifically provides an operation for optimizing the rhythm points detected from the background music (denoted in this alternative embodiment as initial rhythm points), by which rhythm points spaced too closely — that is, pairs of adjacent rhythm points whose interval is less than half the average word duration — can be removed from the detected rhythm points.
Specifically, the average word duration is the ratio of the time occupied by all characters to the total number of characters. In general, if the interval between two adjacent rhythm points is less than half the average word duration, it is unfavorable for aligning characters with rhythm points; therefore, either one of the two adjacent rhythm points is deleted, so that the remaining rhythm point forms a new interval with the rhythm point before or after the deleted one. Each newly formed interval can then be evaluated again in the manner of this alternative embodiment, thereby cyclically removing invalid rhythm points and retaining the valid rhythm points.
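The cyclic pruning described above can be sketched as a single pass, under the assumption that the later of two too-close rhythm points is the one deleted (the text allows either choice); each kept point then measures a new interval against the next candidate:

```python
def prune_rhythm_points(points, avg_word_duration):
    # Keep a rhythm point only if its interval to the last kept point is
    # at least half the average word duration; otherwise drop it, and the
    # interval is re-measured from the kept point to the next candidate.
    if not points:
        return []
    kept = [points[0]]
    for p in points[1:]:
        if p - kept[-1] >= avg_word_duration / 2:
            kept.append(p)
    return kept
```

With an average word duration of 0.5 s, initial points at 0.0, 0.1, 0.6, 0.7 and 1.2 s reduce to the valid points 0.0, 0.6 and 1.2 s.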
As another optional embodiment of the second embodiment of the present invention, the execution of S2047 is further optimized. Fig. 5 shows a detailed flowchart for determining the alignment units and the alignment unit information in an alignment period according to an embodiment of the present invention. As shown in fig. 5, at least one alignment unit is determined according to the unit duration of each candidate alignment unit in combination with the matched-text attribute information of the matched characters, and the corresponding speed-change ratio is obtained, through the following steps:
It should be noted that this alternative embodiment is a specific implementation of S2047 described above. Through the operation of S2046, a certain number of candidate alignment units can be obtained in the rhythm segment to be aligned, and the following operations of this alternative embodiment determine, from among these candidates, the alignment units and their corresponding speed-change ratios.
S1, selecting an unselected candidate alignment unit as a current processing unit according to the sequence of the unit serial numbers.
In this embodiment, the candidate alignment units in the rhythm segment to be aligned have corresponding unit serial numbers. This step first selects, in order of unit serial number, a candidate alignment unit that has not been selected before as the current processing unit, where "unselected" means not yet having served as the current processing unit.
Illustratively, this step first selects the first candidate alignment unit as the current processing unit.
S2, determining the current speed-change ratio of the current processing unit according to the unit duration of the current processing unit, in combination with the start and end times of the characters respectively matched in the current processing unit and the adjacent next candidate alignment unit and the starting positions of their first vowels.
From the above description of this embodiment, the alignment of text with a candidate alignment unit is mainly reflected in matching the actual pronunciation duration of the text to the unit duration of the candidate alignment unit. This can be achieved by stretching or compressing the pronunciation duration of the text, and the amount of stretching or compression is determined by a speed-change ratio, which is the ratio of the unit duration to the actual pronunciation duration of the text.
It should be noted that, for a character, the actual pronunciation starts from the starting position of its first vowel, and its actual pronunciation end time can be taken as the first-vowel starting position of the next character. When combining text with candidate alignment units, the time occupied by the actual pronunciation of all characters matched to one candidate alignment unit runs from the first-vowel position of the first matched character in that unit to the first-vowel position of the first matched character in the adjacent next candidate alignment unit. Therefore, the actual pronunciation duration of all characters matched in the current processing unit can be determined from the start and end times of those characters and the first-vowel starting position in the adjacent next candidate alignment unit, and the current speed-change ratio of the current processing unit can then be obtained from the known unit duration and the determined actual pronunciation duration.
Specifically, in this embodiment, determining the current speed-change ratio of the current processing unit according to the unit duration of the current processing unit, in combination with the start and end times of the characters and the first-vowel starting positions in the current processing unit and the adjacent next candidate alignment unit, may be further refined as:
S21, determining the pronunciation occupation duration of all the characters matched in the current processing unit according to the start and end times and the first-vowel starting positions of those characters.
This step acquires the matched-text attribute information of all characters matched in the current processing unit, specifically the start and end times of each character and the starting position of its first vowel; based on this information, the pronunciation occupation duration of all the matched characters in the current processing unit can be determined.
For example, assuming that there is only one character in the current processing unit, with start and end times t1 and t2 respectively and a first-vowel starting position t3, where t1 < t3 < t2, the pronunciation occupation duration of the character in the current processing unit is t2-t3.
Assuming that there are two characters in the current processing unit, with the first character's start and end times t1 and t2 and first-vowel starting position t3, and the second character's start and end times t2 and t4 and first-vowel starting position t5, where t1 < t3 < t2 < t5 < t4, the pronunciation occupation duration of the two characters in the current processing unit is t4-t3, rather than (t2-t3)+(t4-t5). In other words, the pronunciation occupation duration of all characters matched in the current processing unit is the sum of the durations of all the characters minus the interval from the start of the first character to its first vowel.
It should be noted that the pronunciation occupation duration is not yet the actual pronunciation duration of all the characters matched in the current processing unit; the actual pronunciation duration also includes the vowel interval duration of the first character matched by the next candidate alignment unit adjacent to the current processing unit, which is obtained in S22 below. The purpose of determining the pronunciation occupation duration in this way is to align the first-vowel position of each character in the voice segment to the rhythm point of its matched candidate alignment unit, ensuring that the playback effect of this alignment is better than aligning the start of the character directly with the rhythm point.
S22, determining the vowel interval duration of the first character according to the start and end times and the first-vowel starting position of the first character matched by the next candidate alignment unit adjacent to the current processing unit.
Continuing the example in which the current processing unit includes two characters, assuming that the first character matched by the next candidate alignment unit adjacent to the current processing unit has start and end times t4 and t6 and a first-vowel starting position t7, where t1 < t3 < t2 < t5 < t4 < t7 < t6, the vowel interval duration of that first character is t7-t4.
Through S21 and S22, the actual pronunciation duration of all the characters matched in the current processing unit can be determined as the sum of their pronunciation occupation duration and the vowel interval duration of the first character matched in the next candidate alignment unit. In the above example, the actual pronunciation duration of all the characters matched in the current processing unit is (t4-t3)+(t7-t4).
S23, taking the ratio of the unit duration of the current processing unit to the determined actual pronunciation duration as the current speed-change ratio of the current processing unit, where the actual pronunciation duration is the sum of the pronunciation occupation duration and the vowel interval duration.
In the above example, assuming that the unit duration of the current processing unit is t, the current speed-change ratio of the current processing unit may be expressed as: t/[(t4-t3)+(t7-t4)].
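Steps S21–S23 can be checked numerically with assumed times satisfying t1 < t3 < t2 < t5 < t4 < t7 < t6; all values (in seconds) are illustrative:

```python
# Assumed times (seconds): first character t1–t2 with first vowel at t3;
# second character t2–t4 with first vowel at t5; the next unit's first
# character has its first vowel at t7.
t1, t3, t2 = 0.00, 0.05, 0.30
t5, t4 = 0.35, 0.60
t7 = 0.65

occupation = t4 - t3                  # S21: pronunciation occupation duration
vowel_interval = t7 - t4              # S22: vowel interval of next unit's first character
actual = occupation + vowel_interval  # actual pronunciation duration

unit_duration = 0.72                  # assumed unit duration t
speed_ratio = unit_duration / actual  # S23: t / [(t4-t3) + (t7-t4)]
```

With these values the speed-change ratio is 0.72 / 0.60 = 1.2, i.e. the pronunciation must be stretched slightly to fill the unit.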
S3, comparing the current speed-change ratio with a set first speed-change ratio and a set second speed-change ratio, where the second speed-change ratio is larger than the first.
After the current speed-change ratio of the current processing unit is determined in S2, it may be compared with the set first and second speed-change ratios to determine whether stretching or compressing the matched text by the current ratio satisfies the normal stretching/compression conditions.
In this embodiment, a speed-change ratio between the first and second speed-change ratios preferably satisfies the stretching/compression condition; a ratio smaller than the first speed-change ratio fails the compression condition, and a ratio larger than the second speed-change ratio fails the stretching condition.
S4, if the current speed-change ratio is larger than or equal to the first speed-change ratio and smaller than or equal to the second speed-change ratio, determining the current processing unit as an alignment unit, recording the current speed-change ratio as the speed-change ratio of the alignment unit, and executing S7.
Specifically, when the current speed-change ratio is greater than or equal to the first speed-change ratio and less than or equal to the second, the current processing unit is considered to satisfy the normal stretching/compression condition; the current processing unit can then be taken directly as an alignment unit, with the current speed-change ratio as its speed-change ratio, and execution jumps to S7.
S5, if the current speed-change ratio is larger than the second speed-change ratio, determining a mute duration for padding the current processing unit, determining a new current speed-change ratio according to the mute duration, and returning to S3.
Specifically, when the current speed-change ratio is greater than the second speed-change ratio, the current processing unit is considered not to satisfy the normal stretching condition: the unit duration of the current processing unit is too long relative to the actual pronunciation duration of the matched characters, and a mute duration needs to be added to the current processing unit to lengthen the effective pronunciation duration.
In this step, the added mute duration is preferably the duration of one character. The current speed-change ratio is then re-determined with the unit duration as the numerator and the sum of the mute duration and the determined actual pronunciation duration as the denominator, after which execution returns to S3 to repeat the comparison of the speed-change ratio.
S6, if the current speed-change ratio is smaller than the first speed-change ratio, merging the current processing unit with the adjacent next candidate alignment unit to form a new current processing unit, and returning to S2.
Specifically, when the current speed-change ratio is smaller than the first speed-change ratio, the current processing unit is considered not to satisfy the normal compression condition: the unit duration of the current processing unit is too short relative to the actual pronunciation duration of the matched characters, so the unit needs to be enlarged by merging.
The candidate alignment unit to be merged in is preferably the next candidate alignment unit adjacent to the existing current processing unit. The unit duration of the newly formed current processing unit is then the sum of the original unit duration and the unit duration of that next candidate alignment unit, after which execution returns to S2 to recalculate the actual pronunciation duration of all characters matched in the newly formed current processing unit.
It should be noted that, after the operation of merging the next candidate alignment unit into the existing current processing unit, that next candidate alignment unit is considered to have been selected; when S1 is executed later, it is skipped and is not selected again on its own as the current processing unit.
S7, judging whether all candidate alignment units have been selected to participate in the above processing; if yes, executing S8; if not, returning to S1.
After an alignment unit is determined through the above steps, unselected candidate alignment units may still remain in the rhythm segment to be aligned, which is judged in this step: if all candidate alignment units have been selected to participate in the above processing, S8 is executed; otherwise, execution returns to S1 to select another unselected candidate alignment unit and repeat the above operations.
S8, summarizing the determined alignment units and their corresponding speed-change ratios.
This step summarizes the alignment units determined above and their corresponding speed-change ratios, obtaining the at least one alignment unit and speed-change ratio contained in the rhythm segment to be aligned.
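The S1–S8 loop can be condensed into a sketch in which each candidate unit is reduced to a (unit duration, pronunciation duration) pair. This glosses over the start/stop-time bookkeeping of S21–S22, and the merged pronunciation duration is approximated as a plain sum, so it is a structural illustration rather than the patent's exact procedure:

```python
def determine_alignment_units(units, r_min, r_max, word_duration):
    # `units`: list of (unit_duration, pronunciation_duration) pairs.
    # `r_min`, `r_max`: the first and second speed-change ratios.
    # `word_duration`: mute duration added per padding step (S5).
    result = []
    i = 0
    while i < len(units):                      # S1: next unselected unit
        unit_len, speech_len = units[i]
        i += 1
        while True:
            ratio = unit_len / speech_len      # S2 (simplified)
            if r_min <= ratio <= r_max:        # S3/S4: within stretch/compress range
                result.append((unit_len, ratio))
                break
            if ratio > r_max:                  # S5: pad pronunciation with silence
                speech_len += word_duration
            elif i < len(units):               # S6: merge next candidate unit
                unit_len += units[i][0]
                speech_len += units[i][1]      # approximation of recomputing S2
                i += 1                         # merged unit counts as selected
            else:                              # no unit left to merge: accept as-is
                result.append((unit_len, ratio))
                break
    return result                              # S7/S8: all units processed, summarized
```

With first/second ratios 0.8 and 1.5 and a per-step mute duration of 0.4 s, a unit of (2.0, 0.8) is padded twice (0.8 → 1.2 → 1.6) before its ratio 1.25 falls into range.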
This alternative embodiment provides the implementation of determining the valid alignment units and their corresponding speed-change ratios in the rhythm segment to be aligned. Through it, effective alignment of the rhythm points in the segment with the characters in the voice segment can be ensured and misalignment between voice characters and music rhythm points avoided, thereby providing effective support for the voice-to-rap conversion of this embodiment.
Example III
Fig. 6 is a block diagram of an apparatus for converting speech into rap music according to a third embodiment of the present invention, where the apparatus is suitable for rap music conversion on speech recorded by a user, and the apparatus may be implemented by software or hardware, and may be generally integrated on a computer device. As shown in fig. 6, the apparatus includes: an information determination module 31, an alignment information determination module 32, and a conversion control module 33.
An information determining module 31, configured to identify the obtained speech segment and process the selected background music, and obtain text attribute information of text in the speech segment and music rhythm information of the background music;
an alignment information determining module 32, configured to determine at least one alignment period for aligning the speech segment with the background music according to the text attribute information and the music tempo information, and obtain an alignment information table of each of the alignment periods;
The conversion control module 33 is configured to control, according to each of the alignment information tables, alignment of text in the speech segment with a rhythm point in the background music, and form a rap audio after pitch-shifting and special effects processing.
The device for converting voice into rap music provided by the third embodiment of the invention effectively converts voice content segments freely recorded by a user into rap segments matched with background music. It simplifies the tedious process of manual audio editing and makes rap music production possible for non-professional audio processing staff; meanwhile, compared with existing voice-to-rap methods, the voice content to be converted does not need to be restricted, free recording of the content to be converted is guaranteed, the implementation of voice conversion is simplified, misalignment between voice characters and music rhythm points is avoided, and the application range of converting voice into rap music is enlarged.
Example IV
Fig. 7 is a schematic hardware structure of a computer device according to a fourth embodiment of the present invention, and specifically, the computer device includes: a processor and a storage device. At least one instruction is stored in the storage means and executed by the processor, causing the computer device to perform the method of converting speech to rap music as described in the method embodiments above.
Referring to fig. 7, the computer device may specifically include: a processor 40, a storage device 41, a display 42, an input device 43, an output device 44, and a communication device 45. The number of processors 40 in the computer device may be one or more, one processor 40 being illustrated in fig. 7. The number of storage devices 41 in the computer device may be one or more, one storage device 41 being illustrated in fig. 7. The processor 40, the storage device 41, the display 42, the input device 43, the output device 44 and the communication device 45 of the computer device may be connected by a bus or by other means; connection by a bus is taken as the example in fig. 7.
Specifically, in the embodiment, when the processor 40 executes one or more programs stored in the storage device 41, the following operations are specifically implemented: identifying the obtained voice segment and processing the selected background music to obtain character attribute information of characters in the voice segment and music rhythm information of the background music; determining at least one alignment period for aligning the speech segment with the background music according to the text attribute information and the music rhythm information, and obtaining an alignment information table of each alignment period; and controlling the alignment of the characters in the voice section and the rhythm points in the background music according to each alignment information table, and forming the rap audio after tone changing adjustment and special effect processing.
The embodiment of the present invention also provides a computer-readable storage medium, in which a program is executed by a processor of a computer device, to enable the computer device to perform the method of converting speech into rap music as described in the above embodiment. Illustratively, the method for converting speech into rap music according to the above embodiment includes: identifying the obtained voice segment and processing the selected background music to obtain character attribute information of characters in the voice segment and music rhythm information of the background music; determining at least one alignment period for aligning the speech segment with the background music according to the text attribute information and the music rhythm information, and obtaining an alignment information table of each alignment period; and controlling the alignment of the characters in the voice section and the rhythm points in the background music according to each alignment information table, and forming the rap audio after tone changing adjustment and special effect processing.
It should be noted that, for the apparatus, computer device, and storage medium embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and the relevant points refer to the part of the description of the method embodiments.
From the above description of embodiments, it will be clear to a person skilled in the art that the present invention may be implemented by means of software and necessary general purpose hardware, but of course also by means of hardware, although in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a FLASH Memory (FLASH), a hard disk or an optical disk of a computer, etc., comprising several instructions for causing a computer device (which may be a robot, a personal computer, a server, or a network device, etc.) to execute the method for converting speech into rap music according to any embodiment of the present invention.
It should be noted that, in the above-mentioned device for converting voice into rap music, each unit and module included are only divided according to the functional logic, but not limited to the above-mentioned division, so long as the corresponding functions can be realized; in addition, the specific names of the functional units are also only for distinguishing from each other, and are not used to limit the protection scope of the present invention.
It is to be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they may be implemented using any one or a combination of the following techniques well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), and the like.
Note that the above is only a preferred embodiment of the present invention and the technical principle applied. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, while the invention has been described in connection with the above embodiments, the invention is not limited to the embodiments, but may be embodied in many other equivalent forms without departing from the spirit or scope of the invention, which is set forth in the following claims.

Claims (12)

1. A method of converting speech to rap music, comprising:
identifying the obtained voice segment and processing the selected background music to obtain character attribute information of characters in the voice segment and music rhythm information of the background music;
determining at least one alignment period for aligning the speech segment with the background music according to the text attribute information and the music rhythm information, and obtaining an alignment information table of each alignment period;
according to each alignment information table, controlling the alignment of the characters in the voice section and the rhythm points in the background music, and forming the rap audio after tone changing adjustment and special effect processing;
the determining at least one alignment period for aligning the speech segment with the background music according to the text attribute information and the music rhythm information, and obtaining an alignment information table of each alignment period, includes:
determining at least one alignment period for aligning the speech segment with the background music according to the total text amount in the text attribute information and the period information of each beat period in the music rhythm information;
Selecting a complete alignment period as a to-be-aligned rhythm segment, and determining at least one alignment unit and corresponding alignment unit information according to the character attribute information and rhythm point information of to-be-aligned rhythm points in the to-be-aligned rhythm segment;
summarizing the alignment unit information to form a current alignment information table of the rhythm segment to be aligned, and determining the alignment information table of the rest alignment periods according to the current alignment information table.
2. The method of claim 1, wherein the identifying the obtained speech segment and processing the selected background music to obtain text attribute information of text within the speech segment and music rhythm information of the background music comprises:
noise reduction processing and end point detection processing are carried out on the voice section selected by the user, the character serial numbers, the start and stop time, the initial position of the first vowel and the total quantity of characters of each character in the voice section are obtained through voice recognition of the processed voice section, and character attribute information of the voice section is formed;
detecting rhythm points and dividing beat periods of background music selected by a user, determining the total quantity of the rhythm points, the sequence number of the rhythm points and the period information of each beat period contained in the background music, and forming music rhythm information of the background music;
Wherein the period information includes: the cycle number, the number of rhythm points of the rhythm points included in the cycle of the beat, the sequence number of the rhythm points of each rhythm point and the starting time of the rhythm points.
3. The method according to claim 2, further comprising, before determining the total number of rhythm points, the rhythm point numbers and the period information of each beat period contained in the background music:
acquiring detected initial rhythm points, and determining interval duration formed by two adjacent initial rhythm points;
and determining the rhythm points to be deleted in the initial rhythm points according to the average word length of the words included in the voice segment and the interval time length, and deleting the rhythm points to be deleted to obtain the effective rhythm points in the background music.
4. The method according to claim 1, wherein determining at least one alignment period for aligning the speech segment with the background music according to the total number of characters in the text attribute information and the period information of each beat period in the music tempo information comprises:
judging whether the number of rhythm points in the period information of a complete beat period is greater than or equal to the total number of characters;
if yes, taking each beat period as an alignment period;
if not, and the number of beat periods contained in the background music is greater than 1, merging the beat periods pairwise in order of period number to form at least one new beat period, and returning to the judgment of the number of rhythm points against the total number of characters.
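The judge-then-merge loop of this claim can be sketched as follows. This is a simplified illustration under stated assumptions: each period is represented only by its rhythm-point count, merging simply sums the counts of adjacent pairs, and the first period is treated as the "complete" one to test (the patent may define completeness differently):

```python
from typing import List

def choose_alignment_periods(period_point_counts: List[int],
                             total_chars: int) -> List[List[int]]:
    """Merge beat periods pairwise until one complete period can hold all characters.

    Input: rhythm-point count of each beat period, in period-number order.
    Output: groups of original period indices, each group one alignment period.
    """
    groups = [[i] for i in range(len(period_point_counts))]
    counts = list(period_point_counts)
    # Test the first period (assumed complete); merge pairwise while it is
    # too small to hold the whole text and more than one period remains.
    while counts and counts[0] < total_chars and len(counts) > 1:
        merged_groups, merged_counts = [], []
        for i in range(0, len(counts) - 1, 2):
            merged_groups.append(groups[i] + groups[i + 1])
            merged_counts.append(counts[i] + counts[i + 1])
        if len(counts) % 2:  # an odd trailing period carries over unmerged
            merged_groups.append(groups[-1])
            merged_counts.append(counts[-1])
        groups, counts = merged_groups, merged_counts
    return groups
```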
5. The method of claim 1, wherein determining the alignment information table of each remaining alignment period according to the current alignment information table comprises:
for each remaining alignment period, if the alignment period is a complete period, taking the current alignment information table as the alignment information table of that alignment period;
if the alignment period is an incomplete period, determining the target number of rhythm points included in that alignment period;
and selecting, in reverse order, as many rows of alignment unit information from the current alignment information table as the target number, forming the alignment information table of that alignment period.
6. The method according to claim 1, wherein determining at least one alignment unit and the corresponding alignment unit information according to the text attribute information and the rhythm point information of the rhythm points to be aligned in the rhythm segment to be aligned comprises:
forming rhythm blocks to be aligned in one-to-one correspondence with the rhythm points to be aligned, based on the rhythm point information of the rhythm points to be aligned in the rhythm segment to be aligned, and recording the number of rhythm points to be aligned as the initial number of remaining points;
determining the ratio of the number of remaining points to the total number of characters in the text attribute information, and recording the ratio as the alignment matching value;
looking up a preset rhythm point-character alignment rule table to determine the length-ratio range in which the alignment matching value falls;
determining, according to the alignment rule corresponding to that length-ratio range, the rhythm blocks to be aligned that are matched with characters, and marking them as candidate alignment units;
counting the number of remaining rhythm blocks to be aligned, taking that number as the new number of remaining points, and returning to the determination of the alignment matching value until the number of remaining points is 0;
determining at least one alignment unit according to the unit duration of each candidate alignment unit combined with the matching-character attribute information of its matched characters, and obtaining the corresponding speed ratio;
and taking the unit serial number of each alignment unit, the serial number of its starting rhythm point, the character serial numbers of its matched characters, and the speed ratio as the corresponding alignment unit information.
7. The method of claim 6, wherein determining at least one alignment unit and obtaining the corresponding speed ratio according to the unit duration of each candidate alignment unit combined with the matching-character attribute information of the matched characters comprises:
a) selecting an unselected candidate alignment unit as the current processing unit in order of unit serial number;
b) determining the current speed ratio of the current processing unit according to its unit duration, combined with the start and stop times and first-vowel starting positions of the characters matched in the current processing unit and in the next adjacent candidate alignment unit;
c) comparing the current speed ratio with a set first speed ratio and a set second speed ratio, the second speed ratio being larger than the first speed ratio;
d) if the current speed ratio is greater than or equal to the first speed ratio and less than or equal to the second speed ratio, determining the current processing unit as an alignment unit, recording the current speed ratio as the speed ratio of that alignment unit, and then executing step g);
e) if the current speed ratio is larger than the second speed ratio, determining a silence duration for padding the current processing unit, determining a new current speed ratio according to the silence duration, and returning to step c);
f) if the current speed ratio is smaller than the first speed ratio, merging the current processing unit with the next adjacent candidate alignment unit to form a new current processing unit, and returning to step b);
g) judging whether all candidate alignment units have been selected for processing; if so, executing step h); if not, returning to step a);
h) summarizing the determined alignment units and their corresponding speed ratios.
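Steps a) through h) describe a loop that either accepts a unit, pads it with silence, or merges it forward. A compressed sketch of that control flow is below, where the speed ratio is unit duration divided by the speech (pronunciation) duration, as in the speed-ratio definition of the following claim. The padding policy (adding exactly enough silence to bring the ratio down to the second threshold) is one assumed choice; the patent leaves it open:

```python
from typing import List, Tuple

def resolve_units(unit_durations: List[float],
                  speech_durations: List[float],
                  r1: float, r2: float) -> List[Tuple[float, float]]:
    """Sketch of steps a)-h): returns (effective unit duration, speed ratio)
    per final alignment unit, with speed ratio = unit duration / speech duration.

    - ratio < r1 (speech too long for the unit): merge with the next
      candidate unit and re-evaluate (step f).
    - ratio > r2 (unit much longer than the speech): pad the speech with
      silence so the ratio drops to exactly r2 (step e, assumed policy).
    """
    assert r1 <= r2
    units = []
    i = 0
    while i < len(unit_durations):
        unit_d = unit_durations[i]
        speech_d = speech_durations[i]
        # Step f): keep merging forward while the speech overflows the unit.
        while unit_d / speech_d < r1 and i + 1 < len(unit_durations):
            i += 1
            unit_d += unit_durations[i]
            speech_d += speech_durations[i]
        ratio = unit_d / speech_d
        # Step e): pad with silence until the ratio falls back into range.
        if ratio > r2:
            speech_d = unit_d / r2  # silence added = new speech_d - old speech_d
            ratio = r2
        units.append((unit_d, ratio))
        i += 1
    return units
```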
8. The method of claim 7, wherein determining the current speed ratio of the current processing unit according to its unit duration, combined with the start and stop times and first-vowel starting positions of the characters in the current processing unit and the next adjacent candidate alignment unit, comprises:
determining the pronunciation duration of all matched characters in the current processing unit according to their start and stop times and first-vowel starting positions;
determining the vowel interval duration of the first character according to the start and stop time and first-vowel starting position of the first character matched by the next candidate alignment unit adjacent to the current processing unit;
and taking the ratio of the unit duration of the current processing unit to the determined actual pronunciation duration as the current speed ratio of the current processing unit, wherein the actual pronunciation duration is the sum of the pronunciation duration and the vowel interval duration.
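The speed-ratio formula of this claim reduces to a short computation. In this sketch the pronunciation duration is simplified to the sum of each matched character's (stop - start) span, and the vowel interval duration is the gap between the next unit's first-character start and its first-vowel onset; both simplifications are assumptions about details the claim does not spell out:

```python
from typing import List, Tuple

def current_speed_ratio(unit_duration: float,
                        char_times: List[Tuple[float, float]],
                        next_first_char_start: float,
                        next_first_vowel_start: float) -> float:
    """speed ratio = unit duration / (pronunciation duration + vowel interval).

    char_times: (start, stop) of each matched character in the current unit.
    The vowel interval is the consonant onset of the next unit's first
    character, which must finish before that unit's first rhythm point.
    """
    pronunciation = sum(stop - start for start, stop in char_times)
    vowel_interval = next_first_vowel_start - next_first_char_start
    return unit_duration / (pronunciation + vowel_interval)
```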
9. The method according to any one of claims 1 to 8, further comprising, before determining at least one alignment period for aligning the speech segment with the background music according to the text attribute information and the music tempo information and obtaining the alignment information table of each alignment period:
if the total number of characters in the text attribute information is larger than the total number of rhythm points in the music tempo information, ending the process of converting the speech segment into rap music and prompting the user to reacquire the speech segment or the background music.
10. An apparatus for converting speech into rap music, comprising:
an information determining module, configured to identify the obtained speech segment and process the selected background music to obtain text attribute information of the text in the speech segment and music tempo information of the background music;
an alignment information determining module, configured to determine at least one alignment period for aligning the speech segment with the background music according to the text attribute information and the music tempo information, and to obtain an alignment information table of each alignment period;
a conversion control module, configured to control the alignment of the characters in the speech segment with the rhythm points in the background music according to each alignment information table, and to form the rap audio after pitch adjustment and special-effect processing;
wherein the alignment information determining module is specifically configured to:
determine at least one alignment period for aligning the speech segment with the background music according to the total number of characters in the text attribute information and the period information of each beat period in the music tempo information;
select a complete alignment period as the rhythm segment to be aligned, and determine at least one alignment unit and the corresponding alignment unit information according to the text attribute information and the rhythm point information of the rhythm points to be aligned in the rhythm segment to be aligned;
summarize the alignment unit information to form the current alignment information table of the rhythm segment to be aligned, and determine the alignment information tables of the remaining alignment periods according to the current alignment information table.
11. A computer device, comprising:
one or more processors; and
a storage means for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of converting speech into rap music according to any one of claims 1 to 9.
12. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of converting speech into rap music according to any one of claims 1 to 9.
CN202010688502.3A 2020-07-16 2020-07-16 Method, device, equipment and storage medium for converting voice into rap music Active CN111862913B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010688502.3A CN111862913B (en) 2020-07-16 2020-07-16 Method, device, equipment and storage medium for converting voice into rap music
PCT/CN2021/095236 WO2022012164A1 (en) 2020-07-16 2021-05-21 Method and apparatus for converting voice into rap music, device, and storage medium

Publications (2)

Publication Number Publication Date
CN111862913A CN111862913A (en) 2020-10-30
CN111862913B true CN111862913B (en) 2023-09-05

Family

ID=72984100

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010688502.3A Active CN111862913B (en) 2020-07-16 2020-07-16 Method, device, equipment and storage medium for converting voice into rap music

Country Status (2)

Country Link
CN (1) CN111862913B (en)
WO (1) WO2022012164A1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111862913B (en) * 2020-07-16 2023-09-05 广州市百果园信息技术有限公司 Method, device, equipment and storage medium for converting voice into rap music
CN113823281B (en) * 2020-11-24 2024-04-05 北京沃东天骏信息技术有限公司 Voice signal processing method, device, medium and electronic equipment
CN112669849A (en) * 2020-12-18 2021-04-16 百度国际科技(深圳)有限公司 Method, apparatus, device and storage medium for outputting information
CN112712783B (en) * 2020-12-21 2023-09-29 北京百度网讯科技有限公司 Method and device for generating music, computer equipment and medium
CN112700781B (en) * 2020-12-24 2022-11-11 江西台德智慧科技有限公司 Voice interaction system based on artificial intelligence
CN114566191A (en) * 2022-02-25 2022-05-31 腾讯音乐娱乐科技(深圳)有限公司 Sound correcting method for recording and related device

Citations (5)

Publication number Priority date Publication date Assignee Title
US5811707A (en) * 1994-06-24 1998-09-22 Roland Kabushiki Kaisha Effect adding system
CN101399036A (en) * 2007-09-30 2009-04-01 三星电子株式会社 Device and method for conversing voice to be rap music
CN103440862A (en) * 2013-08-16 2013-12-11 北京奇艺世纪科技有限公司 Method, device and equipment for synthesizing voice and music
CN107170464A (en) * 2017-05-25 2017-09-15 厦门美图之家科技有限公司 A kind of changing speed of sound method and computing device based on music rhythm
CN111402843A (en) * 2020-03-23 2020-07-10 北京字节跳动网络技术有限公司 Rap music generation method and device, readable medium and electronic equipment

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
CN103035235A (en) * 2011-09-30 2013-04-10 西门子公司 Method and device for transforming voice into melody
CN105931625A (en) * 2016-04-22 2016-09-07 成都涂鸦科技有限公司 Rap music automatic generation method based on character input
CN111862913B (en) * 2020-07-16 2023-09-05 广州市百果园信息技术有限公司 Method, device, equipment and storage medium for converting voice into rap music

Also Published As

Publication number Publication date
CN111862913A (en) 2020-10-30
WO2022012164A1 (en) 2022-01-20

Similar Documents

Publication Publication Date Title
CN111862913B (en) Method, device, equipment and storage medium for converting voice into rap music
CN107123415B (en) Automatic song editing method and system
CN105718503B (en) Voice search device and speech search method
CN112382257B (en) Audio processing method, device, equipment and medium
CN110782869A (en) Speech synthesis method, apparatus, system and storage medium
CN105161116A (en) Method and device for determining climax fragment of multimedia file
CN105206264A (en) Speech synthesis method and device
CN111354325A (en) Automatic word and song creation system and method thereof
JPH11272274A (en) Method for retrieving piece of music by use of singing voice
CN109841203B (en) Electronic musical instrument music harmony determination method and system
CN110134823B (en) MIDI music genre classification method based on normalized note display Markov model
CN110942765A (en) Method, device, server and storage medium for constructing corpus
CN111785236A (en) Automatic composition method based on motivational extraction model and neural network
JP4395493B2 (en) Karaoke equipment
CN113571030A (en) MIDI music correction method and device based on auditory sense harmony evaluation
CN112825244B (en) Music audio generation method and device
CN109033110B (en) Method and device for testing quality of extended questions in knowledge base
CN112528631B (en) Intelligent accompaniment system based on deep learning algorithm
JP6565416B2 (en) Voice search device, voice search method and program
KR100762079B1 (en) Automatic musical composition method and system thereof
Köküer et al. Curating and annotating a collection of traditional Irish flute recordings to facilitate stylistic analysis
JP2005242231A (en) Device, method, and program for speech synthesis
KR100981540B1 (en) Speech recognition method of processing silence model in a continous speech recognition system
CN113140202A (en) Information processing method, information processing device, electronic equipment and storage medium
JP2003022091A (en) Method, device, and program for voice recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant