CN112420004A - Method and device for generating songs, electronic equipment and computer readable storage medium - Google Patents
- Publication number: CN112420004A
- Application number: CN201910779948.4A
- Authority: CN (China)
- Prior art keywords: information, character, pitch information, pitch, user
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G10L21/0216: Speech or voice signal processing; speech enhancement; noise filtering characterised by the method used for estimating noise
- G10H1/0025: Details of electrophonic musical instruments; automatic or semi-automatic music composition, e.g. producing random music, applying rules from music theory or modifying a musical piece
- G10H1/366: Accompaniment arrangements; recording/reproducing of accompaniment for use with an external source, e.g. karaoke systems, with means for modifying or correcting the external signal, e.g. pitch correction, reverberation, changing a singer's voice
- G10L25/24: Speech or voice analysis techniques characterised by the type of extracted parameters, the extracted parameters being the cepstrum
Abstract
The embodiments of the present application provide a method and an apparatus for generating songs, an electronic device, and a computer-readable storage medium, relating to the technical field of speech processing. The method comprises: extracting a plurality of pieces of first pitch information corresponding to melody information from audio information selected by a user; extracting timbre-related parameter information corresponding to each character from speech information input by the user; determining the first pitch information corresponding to each character based on the plurality of pieces of first pitch information corresponding to the melody information; and generating a song based on the first pitch information and the timbre-related parameter information corresponding to each character. The embodiments of the present application improve the quality of the generated song, do not require the user to have strong singing skills, and can therefore improve the user experience.
Description
Technical Field
The present application relates to the field of speech processing technologies, and in particular, to a method and an apparatus for generating a song, an electronic device, and a computer-readable storage medium.
Background
With the development of information technology, various kinds of application software have emerged, in particular song-related applications such as singing synthesis systems.
An existing singing synthesis system plays the melody of a song selected by the user, or displays on-screen prompts for that melody. The user must sing along with the played melody or the on-screen prompts while the singing is recorded, and the recorded singing voice is then simply combined with the melody of the selected song to obtain the synthesized song.
However, such a singing synthesis system requires the user to know the melody of the song to be recorded and to sing it for recording, which places high demands on the user's singing skill. Moreover, since the synthesis process merely adds the melody to the recorded singing voice, the resulting song is of poor quality and the user experience is low.
Disclosure of Invention
The present application provides a method, an apparatus, an electronic device, and a computer-readable storage medium for generating a song, intended to solve at least one of the above technical problems. The specific technical solution is as follows:
in a first aspect, a method for generating a song is provided, the method comprising:
extracting a plurality of pieces of first pitch information corresponding to melody information from audio information selected by a user;
extracting timbre-related parameter information corresponding to each character from speech information input by the user;
determining the first pitch information corresponding to each character based on the plurality of pieces of first pitch information corresponding to the melody information;
and generating a song based on the first pitch information corresponding to each character and the timbre-related parameter information corresponding to each character.
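The four steps above can be illustrated end to end with a minimal sketch (all function names and data below are illustrative placeholders, not the patent's actual implementation):

```python
# Toy end-to-end sketch of the claimed method: the melody supplies the
# pitch for each character, and the user's speech supplies the timbre.
def generate_song(melody_pitches, char_timbres):
    """melody_pitches: pitch (Hz) per melody note, in order.
    char_timbres: per-character timbre parameters from the user's speech.
    Returns one synthesis unit (pitch + timbre) per character."""
    song = []
    for pitch, timbre in zip(melody_pitches, char_timbres):
        # Step S103: the melody pitch replaces the spoken pitch.
        # Step S104: pair it with the character's own timbre.
        song.append({"pitch_hz": pitch, "timbre": timbre})
    return song

melody = [262.0, 294.0, 330.0]                           # e.g. C4, D4, E4
timbres = [{"sp": [0.9]}, {"sp": [0.8]}, {"sp": [0.7]}]  # placeholder SP data
units = generate_song(melody, timbres)
```

Each unit would then be rendered by a vocoder-style synthesizer; the sketch only shows how the pitch and timbre streams are paired.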
In one possible implementation, determining the first pitch information corresponding to each character based on the plurality of pieces of first pitch information corresponding to the melody information includes:
determining, from the plurality of pieces of first pitch information corresponding to the melody information and based on the matching relationship between the melody information and each character in the speech information input by the user, the first pitch information used to replace each piece of second pitch information, thereby obtaining the first pitch information corresponding to each character;
where each piece of second pitch information is the original pitch information corresponding to a character in the speech information input by the user.
In another possible implementation, generating a song based on the first pitch information corresponding to each character and the timbre-related parameter information corresponding to each character includes:
generating the song based on first-class pitch information, second-class pitch information, and the timbre-related parameter information corresponding to each character;
where the first-class pitch information comprises the first pitch information used to replace each piece of second pitch information, and the second-class pitch information comprises the remaining pitch information among the plurality of pieces of first pitch information corresponding to the melody information.
In another possible implementation, before generating a song based on the first pitch information corresponding to each character and the timbre-related parameter information corresponding to each character, the method further includes:
performing preset processing on the timbre-related parameter information corresponding to each character, where the preset processing includes at least one of interpolation processing and sampling processing;
correspondingly, generating the song based on the first pitch information corresponding to each character and the timbre-related parameter information corresponding to each character includes:
generating the song based on the first pitch information corresponding to each character and the preset-processed timbre-related parameter information corresponding to each character.
In another possible implementation, before performing the preset processing on the timbre-related parameter information corresponding to each character, the method further includes:
labeling the start time and stop time corresponding to each note in the melody information to obtain labeled melody information;
correspondingly, performing the preset processing on the timbre-related parameter information corresponding to each character includes:
performing the preset processing on the timbre-related parameter information corresponding to each character based on the labeled melody information.
In another possible implementation, before extracting the timbre-related parameter information corresponding to each character from the speech information input by the user, the method further includes:
acquiring the speech information input by the user;
and denoising the speech information input by the user.
In another possible implementation, the timbre-related parameter information includes at least one of:
a spectral envelope parameter SP; an aperiodic sequence signal AP.
In a second aspect, there is provided an apparatus for generating a song, the apparatus comprising:
a first extraction module, configured to extract a plurality of pieces of first pitch information corresponding to melody information from audio information selected by a user;
a second extraction module, configured to extract timbre-related parameter information corresponding to each character from speech information input by the user;
a determining module, configured to determine the first pitch information corresponding to each character based on the plurality of pieces of first pitch information corresponding to the melody information;
and a generating module, configured to generate a song based on the first pitch information corresponding to each character and the timbre-related parameter information corresponding to each character.
In a possible implementation, the determining module is specifically configured to determine, from the plurality of pieces of first pitch information corresponding to the melody information and based on the matching relationship between the melody information and each character in the speech information input by the user, the first pitch information used to replace each piece of second pitch information, thereby obtaining the first pitch information corresponding to each character;
where each piece of second pitch information is the original pitch information corresponding to a character in the speech information input by the user.
In another possible implementation, the generating module is specifically configured to generate the song based on first-class pitch information, second-class pitch information, and the timbre-related parameter information corresponding to each character;
where the first-class pitch information comprises the first pitch information used to replace each piece of second pitch information, and the second-class pitch information comprises the remaining pitch information among the plurality of pieces of first pitch information corresponding to the melody information.
In another possible implementation, the apparatus further includes a preset processing module, where
the preset processing module is configured to perform preset processing on the timbre-related parameter information corresponding to each character, the preset processing including at least one of interpolation processing and sampling processing;
and the generating module is specifically configured to generate the song based on the first pitch information corresponding to each character and the preset-processed timbre-related parameter information corresponding to each character.
In another possible implementation, the apparatus further includes a labeling module, where
the labeling module is configured to label the start time and stop time corresponding to each note in the melody information to obtain labeled melody information;
and the preset processing module is specifically configured to perform the preset processing on the timbre-related parameter information corresponding to each character based on the labeled melody information.
In another possible implementation, the apparatus further includes an acquisition module and a denoising module, where
the acquisition module is configured to acquire the speech information input by the user;
and the denoising module is configured to denoise the speech information input by the user.
In another possible implementation, the timbre-related parameter information includes at least one of:
a spectral envelope parameter SP; an aperiodic sequence signal AP.
In a third aspect, an electronic device is provided, which includes:
one or more processors;
a memory;
one or more application programs, wherein the one or more application programs are stored in the memory and configured to be executed by the one or more processors to perform the operations corresponding to the method for generating a song shown in the first aspect or any possible implementation thereof.
In a fourth aspect, there is provided a computer readable storage medium having stored thereon at least one instruction, at least one program, set of codes, or set of instructions, which is loaded and executed by a processor to implement the method of generating songs as shown in the first aspect or any possible implementation manner of the first aspect.
The technical scheme provided by the embodiment of the application has the following beneficial effects:
compared with the prior art, the method, the device, the electronic equipment and the computer-readable storage medium for generating the song extract a plurality of first pitch information corresponding to melody information from audio information selected by a user, extract parameter information related to tone color and corresponding to each character from voice information input by the user, determine the first pitch information corresponding to each character based on the first pitch information corresponding to the melody information, and generate the song based on the first pitch information corresponding to each character and the parameter information related to tone color and corresponding to each character. That is, when the song is generated, the song is generated based on the first pitch information in the melody information selected by the user and the parameter information related to the tone in the voice information input by the user, and the melody is not simply added to the singing voice recorded by the user, so that the effect of generating the song can be improved, and the user only needs the voice information input by the user when generating the song, so that the user does not need to have higher singing skill, and the user experience can be improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings used in the description of the embodiments of the present application will be briefly described below.
Fig. 1 is a schematic flowchart of a method for generating songs according to an embodiment of the present application;
fig. 2 is a schematic structural diagram of an apparatus for generating songs according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of an electronic device for generating songs according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to the embodiments of the present application, examples of which are illustrated in the accompanying drawings, where like or similar reference numerals refer to like or similar elements or to elements having like or similar functions throughout. The embodiments described below with reference to the drawings are exemplary, serve only to explain the present application, and are not to be construed as limiting the present application.
As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may be present. As used herein, "connected" or "coupled" may include wirelessly connected or wirelessly coupled. The term "and/or" includes all or any combination of one or more of the associated listed items.
The following describes the technical solutions of the present application and how to solve the above technical problems with specific embodiments. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
An embodiment of the present application provides a method for generating a song, as shown in fig. 1, the method includes:
step S101, extracting a plurality of first pitch information corresponding to the melody information from the audio information selected by the user.
In the embodiments of the present application, the audio information selected by the user may contain only the melody information, or may contain the melody information together with other information; the embodiments of the present application impose no limitation on this.
In the embodiments of the present application, pitch is one of the basic attributes of a sound: how high or low it is perceived to be. Pitch is determined by the vibration frequency and is positively correlated with it: the higher the frequency (the number of vibrations per unit time), the higher the pitch, and vice versa.
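As a concrete illustration of pitch extraction (the patent does not specify an algorithm, so the autocorrelation method below is only one plausible sketch), the fundamental frequency of a signal can be estimated from the lag at which the signal best correlates with a delayed copy of itself:

```python
import math

def estimate_f0(signal, sr, fmin=80.0, fmax=500.0):
    """Estimate fundamental frequency by finding the autocorrelation
    peak within the plausible pitch-period range [sr/fmax, sr/fmin]."""
    best_lag, best_corr = 0, 0.0
    lo, hi = int(sr / fmax), int(sr / fmin)
    for lag in range(lo, hi + 1):
        corr = sum(signal[i] * signal[i - lag] for i in range(lag, len(signal)))
        if corr > best_corr:
            best_corr, best_lag = corr, lag
    return sr / best_lag if best_lag else 0.0

sr = 8000
# 0.2 s of a pure 220 Hz tone standing in for a voiced frame.
tone = [math.sin(2 * math.pi * 220.0 * n / sr) for n in range(1600)]
f0 = estimate_f0(tone, sr)  # close to 220 Hz
```

Production systems typically use more robust estimators (e.g. YIN or the WORLD vocoder's Harvest), but the lag-search idea is the same.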
Step S102, extracting parameter information related to tone color corresponding to each character from voice information input by a user.
In the embodiments of the present application, timbre is the characteristic quality by which different sounds are distinguished: sounds from different sources always show distinctive characteristics in their waveform and frequency content. In the embodiments of the present application, the timbre-related parameter information includes at least one of a spectral envelope parameter SP and an aperiodic sequence signal AP.
Speech is a complex multi-frequency signal in which each frequency component has a different amplitude. When the components are arranged in order of frequency, the curve joining their peaks is called the spectral envelope of the speech. The shape of the envelope changes with the sound being produced: the sound waves generated by the vocal cords resonate as they pass through the vocal tract formed by the oral cavity, the nasal cavity, and so on, and resonance emphasizes some regions of the spectrum. The shape of the spectral envelope therefore varies from person to person.
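A toy sketch of this idea (a naive DFT plus a running maximum standing in for a proper envelope estimator such as cepstral smoothing; all parameters are illustrative):

```python
import cmath
import math

def magnitude_spectrum(frame):
    """Naive O(n^2) DFT magnitude spectrum (fine for one short frame)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))) for k in range(n // 2)]

def envelope(mags, width=4):
    """Crude spectral envelope: running maximum over neighbouring bins,
    approximating the curve drawn across the spectral peaks."""
    return [max(mags[max(0, k - width):k + width + 1]) for k in range(len(mags))]

sr, n = 8000, 256
# A harmonic-rich frame: fundamental at 250 Hz plus two weaker harmonics,
# loosely imitating a voiced speech frame.
frame = [math.sin(2 * math.pi * 250 * t / sr)
         + 0.5 * math.sin(2 * math.pi * 500 * t / sr)
         + 0.25 * math.sin(2 * math.pi * 750 * t / sr) for t in range(n)]
env = envelope(magnitude_spectrum(frame))
```

The envelope is large near the harmonic peaks (250 Hz falls in bin 8 here) and near zero where the frame has no energy; real systems estimate SP with methods such as WORLD's CheapTrick rather than a running maximum.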
For the embodiment of the present application, step S101 may be executed before step S102, may be executed after step S102, and may also be executed simultaneously with step S102. The embodiments of the present application are not limited.
Step S103, determining first pitch information corresponding to each character based on the plurality of first pitch information corresponding to the melody information.
In the embodiments of the present application, the first pitch information corresponding to each character is selected from the plurality of pieces of first pitch information corresponding to the melody information, and serves as the pitch information for that character in place of the pitch extracted from the speech information input by the user.
And step S104, generating a song based on the first pitch information corresponding to each character and the parameter information related to tone corresponding to each character.
For the embodiment of the present application, a song is generated based on the first pitch information corresponding to each word determined in step S103 and the parameter information related to the tone color corresponding to each word extracted in step S102.
Compared with the prior art, the method for generating a song in the embodiments of the present application extracts a plurality of pieces of first pitch information corresponding to melody information from audio information selected by a user, extracts timbre-related parameter information corresponding to each character from speech information input by the user, determines the first pitch information corresponding to each character based on the plurality of pieces of first pitch information, and generates a song from the first pitch information and the timbre-related parameter information corresponding to each character. That is, the song is generated from the pitch information of the user-selected melody and the timbre information of the user's input speech, rather than by simply adding a melody to a recording of the user's singing. This improves the quality of the generated song; and because only spoken input is required, the user does not need strong singing skills, which further improves the user experience.
In a possible implementation manner of the embodiment of the present application, step S103 may specifically include: step S1031 (not shown in the figure), in which,
Step S1031: determining, from the plurality of pieces of first pitch information corresponding to the melody information and based on the matching relationship between the melody information and each character in the speech information input by the user, the first pitch information used to replace each piece of second pitch information, thereby obtaining the first pitch information corresponding to each character.
Each piece of second pitch information is the original pitch information corresponding to a character in the speech information input by the user.
In the embodiments of the present application, the first pitch information corresponding to the note matched with each character in the user's speech is determined from the melody information, and that melody pitch replaces the second pitch information originally corresponding to the character, i.e., it is used as the pitch information of the character in the speech information input by the user.
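A minimal sketch of this replacement step, assuming a simple positional matching between characters and melody notes (the patent leaves the matching relationship abstract):

```python
def replace_pitches(char_pitches, melody_pitches):
    """For each spoken character, replace its original (second) pitch with
    the first pitch of the melody note it is matched to. The matching here
    is assumed to be purely positional: character i maps to note i."""
    replaced = {}
    for i, (char, _spoken_pitch) in enumerate(char_pitches):
        replaced[char] = melody_pitches[i]  # melody pitch overrides speech pitch
    return replaced

spoken = [("ni", 180.0), ("hao", 175.0)]  # original pitches from the speech
melody = [262.0, 330.0]                   # pitches extracted from the melody
result = replace_pitches(spoken, melody)
```

In a real system the matching relationship would come from alignment between the lyrics and the score, not from position alone.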
In another possible implementation manner of the embodiment of the present application, step S104 may specifically include: step S1041 (not shown), in which,
and S1041, generating a song based on the first class pitch information, the second class pitch information and the parameter information which is respectively corresponding to each character and is related to the tone.
The first-class pitch information comprises the first pitch information used to replace each piece of second pitch information; the second-class pitch information comprises the remaining pitch information among the plurality of pieces of first pitch information corresponding to the melody information.
In the embodiments of the present application, the second-class pitch information may include pitch information corresponding to a prelude, an interlude, and the like. Generating the song from the first-class pitch information, the second-class pitch information, and the timbre-related parameter information corresponding to each character improves the completeness of the generated song, and therefore its quality and the user experience.
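A sketch of how the two classes of pitch information could be laid out in melody order (the data and the vocal/instrumental split below are illustrative):

```python
def assemble_pitch_track(melody, vocal_note_ids):
    """Split the melody's pitches into the first class (notes matched to
    sung characters) and the second class (prelude/interlude notes kept
    as-is), laying out the full track in melody order."""
    track = []
    for note_id, pitch in enumerate(melody):
        kind = "vocal" if note_id in vocal_note_ids else "instrumental"
        track.append((kind, pitch))
    return track

# Notes 0 and 3 are a short prelude and interlude; notes 1 and 2 carry lyrics.
melody = [196.0, 262.0, 294.0, 220.0]
track = assemble_pitch_track(melody, {1, 2})
```

Keeping the instrumental notes in the track is what preserves the completeness of the song mentioned above.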
In another possible implementation manner of the embodiment of the present application, step S104 may further include: step Sa (not shown in the figure), in which,
Step Sa: performing preset processing on the timbre-related parameter information corresponding to each character.
The preset processing includes at least one of interpolation processing and sampling processing.
In the embodiments of the present application, interpolation is an important method of discrete-function approximation: the value of a function at other points can be estimated from its values at a finite number of known points.
Interpolation processing and/or sampling processing is applied to the timbre-related parameter information corresponding to each character so that, after processing, its length matches the duration of the note corresponding to that character in the melody.
For the embodiment of the present application, the parameter information related to the tone color may include: at least one of spectral envelope parameters SP and a non-periodic sequence signal AP. In this embodiment, step Sa may specifically include: carrying out interpolation processing and/or sampling processing on the spectrum envelope parameters respectively corresponding to each word; and performing interpolation processing and/or sampling processing on the non-periodic sequence signals respectively corresponding to the words.
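The interpolation/sampling step can be sketched as a linear resampling of a per-frame parameter sequence to the target note length (illustrative only; a real system would operate on whole SP/AP matrices rather than a scalar sequence):

```python
def resample(params, target_len):
    """Linearly interpolate (stretch) or sample (shrink) a per-frame
    parameter sequence so its length matches the duration of the
    melody note assigned to that character."""
    if target_len == 1 or len(params) == 1:
        return [params[0]] * target_len
    out = []
    for i in range(target_len):
        pos = i * (len(params) - 1) / (target_len - 1)
        lo = int(pos)
        hi = min(lo + 1, len(params) - 1)
        frac = pos - lo
        out.append(params[lo] * (1 - frac) + params[hi] * frac)
    return out

sp = [0.0, 1.0, 2.0]          # 3 frames of some envelope coefficient
stretched = resample(sp, 5)   # note lasts 5 frames: interpolate up
shrunk = resample(sp, 2)      # note lasts 2 frames: sample down
```

The same routine covers both cases named in the claim: interpolation when the note is longer than the spoken character, sampling when it is shorter.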
In another possible implementation manner of the embodiment of the present application, on the basis of step Sa, step S104 may specifically include: step S1042 (not shown in the figure), in which,
Step S1042: generating the song based on the first pitch information corresponding to each character and the preset-processed timbre-related parameter information corresponding to each character.
In the embodiments of the present application, the song is generated from the first pitch information corresponding to each character and the timbre-related parameter information corresponding to each character after interpolation processing and/or sampling processing.
In another possible implementation manner of the embodiment of the present application, before the step Sa, the step may further include: step Sb (not shown in the figure), in which,
Step Sb: labeling the start time and stop time corresponding to each note in the melody information to obtain labeled melody information.
In the embodiments of the present application, the start time and stop time of each note in the melody information are labeled by a trained labeling model, yielding the labeled melody information.
For example, melody information hummed by the user is obtained, such as a "la-la-la" input; audio features such as Mel-frequency cepstral coefficients (MFCC) are extracted from it, and the MFCC features are passed through the trained labeling model to label the start time and stop time of each "la".
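The patent uses a trained labeling model for this step; as a hedged stand-in, the toy energy-threshold segmenter below shows what "start time and stop time per note" means in code (thresholds and data are illustrative, not the patent's method):

```python
def label_notes(frames, threshold=0.1):
    """Toy stand-in for the trained labeling model: mark each note's
    start and stop frame wherever frame energy crosses a threshold."""
    notes, start = [], None
    for i, energy in enumerate(frames):
        if energy >= threshold and start is None:
            start = i                 # note onset
        elif energy < threshold and start is not None:
            notes.append((start, i))  # note offset
            start = None
    if start is not None:
        notes.append((start, len(frames)))
    return notes

# Frame energies for a hummed "la la": two bursts separated by silence.
energies = [0.0, 0.5, 0.6, 0.0, 0.0, 0.7, 0.8, 0.7, 0.0]
boundaries = label_notes(energies)
```

Multiplying the frame indices by the frame hop duration converts these boundaries into the start and stop times used in step Sa1.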
In another possible implementation manner of the embodiment of the present application, on the basis of step Sb, step Sa specifically may include: step Sa1 (not shown), in which,
step Sa1 is to perform a preset process on the parameter information related to the timbre corresponding to each character based on the labeled melody information.
In the embodiments of the present application, labeling the melody information determines the start time and stop time of each note in the melody information; interpolation processing or sampling processing is then applied to the timbre information of the character corresponding to each note according to those determined times.
For example, if the start time and the end time of the first note in the melody information are 0 '10 "and 0' 15", respectively, then the words corresponding to the first note in the speech information input by the user are interpolated or sampled according to the start time and the end time of the first note.
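A minimal sketch of the interpolation branch, under the assumption (not stated in the patent) that a character's timbre parameters are stored as frames which get resampled to the note's duration; names and frame rate are illustrative:

```python
import numpy as np

def fit_frames_to_note(frames, start_s, end_s, frame_s=0.005):
    """Interpolate one character's timbre frames to a note's duration.

    `frames` is (n_frames, n_dims), e.g. spectral-envelope frames
    extracted from one spoken character; the result has as many frames
    as fit between the note's start and end time. The frame rate and
    all names here are illustrative assumptions.
    """
    n_out = max(1, round((end_s - start_s) / frame_s))
    src = np.linspace(0.0, 1.0, len(frames))   # original frame positions
    dst = np.linspace(0.0, 1.0, n_out)         # target frame positions
    return np.stack(
        [np.interp(dst, src, frames[:, d]) for d in range(frames.shape[1])],
        axis=1)

# A 4-frame character stretched to fill a 50 ms note (10 frames at 5 ms).
char_frames = np.array([[0.0, 1.0], [1.0, 2.0], [2.0, 3.0], [3.0, 4.0]])
out = fit_frames_to_note(char_frames, start_s=10.0, end_s=10.05)
print(out.shape)
```

Sampling (downsampling frames when the note is shorter than the spoken character) falls out of the same resampling: `n_out` is simply smaller than `len(frames)`.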
In another possible implementation manner of the embodiment of the present application, step S102 may further include: step Sc (not shown) and step Sd (not shown), wherein,
and step Sc, acquiring voice information input by a user.
And Sd, denoising the voice information input by the user.
For the embodiment of the application, the voice information input by the user is denoised by at least one of the following algorithms:
a least mean square (LMS) adaptive filter; an LMS adaptive notch filter; spectral subtraction; Wiener filtering.
For this embodiment of the application, the LMS adaptive filter automatically adjusts its current filter coefficients from the coefficients obtained at the previous moment, adapting to the unknown or randomly changing statistics of the signal and the noise and thereby approaching optimal filtering.
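A minimal numpy sketch of LMS adaptive noise cancellation, assuming a separate noise reference signal is available (the signals, step size, and tap count below are illustrative, not taken from the patent):

```python
import numpy as np

def lms_cancel(noisy, ref, n_taps=8, mu=0.01):
    """LMS adaptive noise cancellation, a minimal illustrative sketch.

    At each sample the filter weights, carried over from the previous
    step, predict the noise in `noisy` from the noise reference `ref`;
    the prediction error is the cleaned output, and that same error
    drives the weight update.
    """
    w = np.zeros(n_taps)
    out = np.zeros(len(noisy))
    for n in range(n_taps - 1, len(noisy)):
        x = ref[n - n_taps + 1:n + 1][::-1]  # newest reference sample first
        y = w @ x                            # estimated noise component
        e = noisy[n] - y                     # error = cleaned sample
        w = w + 2 * mu * e * x               # adjust weights for next step
        out[n] = e
    return out

rng = np.random.default_rng(0)
t = np.arange(4000) / 8000.0
speech = np.sin(2 * np.pi * 440 * t)                 # stand-in for speech
ref = rng.normal(size=t.size)                        # noise reference
noisy = speech + 0.5 * np.convolve(ref, [0.6, 0.3], mode="same")
cleaned = lms_cancel(noisy, ref)
```

After the weights converge, the residual in `cleaned` is much closer to the clean signal than `noisy` was.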
For this embodiment of the application, the LMS adaptive notch filter is suited to monochromatic interference, such as single-frequency sine-wave noise. Ideally the notch is arbitrarily narrow, with the frequency response rising immediately from the notch into the flat region on either side.
For this embodiment of the present application, spectral subtraction performs noise reduction in the frequency domain of the speech signal.
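A basic magnitude spectral-subtraction sketch; the frame size, the use of a noise-only segment for the noise estimate, and the zero-flooring rule are illustrative choices, not specified by the patent:

```python
import numpy as np

def spectral_subtract(noisy, noise_only, frame=256):
    """Basic magnitude spectral subtraction, an illustrative sketch.

    The average noise magnitude spectrum, estimated from a noise-only
    segment, is subtracted from each frame's magnitude spectrum; the
    result is floored at zero and resynthesized with the noisy phase.
    """
    k = len(noise_only) // frame
    noise_mag = np.mean(
        [np.abs(np.fft.rfft(noise_only[i * frame:(i + 1) * frame]))
         for i in range(k)], axis=0)
    out = np.zeros(len(noisy))
    for i in range(len(noisy) // frame):
        seg = noisy[i * frame:(i + 1) * frame]
        spec = np.fft.rfft(seg)
        mag = np.maximum(np.abs(spec) - noise_mag, 0.0)  # floor at zero
        out[i * frame:(i + 1) * frame] = np.fft.irfft(
            mag * np.exp(1j * np.angle(spec)), n=frame)
    return out

rng = np.random.default_rng(1)
t = np.arange(4096) / 8000.0
clean = np.sin(2 * np.pi * 1000 * t)                  # stand-in for speech
noisy = clean + 0.3 * rng.normal(size=t.size)
denoised = spectral_subtract(noisy, 0.3 * rng.normal(size=2048))
```

Production implementations add overlapping windows and an over-subtraction factor to reduce "musical noise"; those refinements are omitted here for brevity.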
For this embodiment of the present application, the Wiener filtering method designs a digital filter h(n) whose output, given the noisy speech signal as input, minimizes the mean-square error with respect to the clean speech signal.
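The per-bin Wiener gain implied by this criterion can be sketched as follows, assuming the noisy and noise power spectral densities have already been estimated (the estimation step itself is not shown):

```python
import numpy as np

def wiener_gain(noisy_psd, noise_psd):
    """Per-bin Wiener gain from estimated power spectral densities.

    The clean-speech PSD is estimated by subtracting the noise PSD from
    the noisy PSD (floored at zero); the resulting gain minimizes the
    mean-square error per frequency bin under these estimates.
    """
    speech_psd = np.maximum(noisy_psd - noise_psd, 0.0)
    return speech_psd / np.maximum(noisy_psd, 1e-12)

# A bin dominated by speech keeps most of its energy; a noise-only
# bin is suppressed entirely.
g = wiener_gain(np.array([4.0, 1.0]), np.array([1.0, 1.0]))
print(g)
```

Multiplying each frame's spectrum by this gain and inverse-transforming yields the filtered signal.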
For this embodiment of the application, because the timbre-related parameter information corresponding to each character is extracted from the voice information input by the user, denoising that voice information improves the accuracy of the extraction, which in turn improves the quality of the generated song and the user experience.
The foregoing embodiments introduce a method for generating a song from the perspective of the method flow. The following introduces an apparatus for generating a song from the perspective of virtual modules or virtual units; the apparatus is applicable to the method above, specifically as follows:
an embodiment of the present application provides an apparatus for generating a song, and as shown in fig. 2, the apparatus 20 for generating a song may specifically include: a first extraction module 21, a second extraction module 22, a determination module 23 and a generation module 24, wherein,
the first extraction module 21 is configured to extract a plurality of first pitch information corresponding to the melody information from the audio information selected by the user.
And a second extracting module 22, configured to extract, from the voice information input by the user, parameter information related to the tone color corresponding to each word.
For this embodiment of the present application, the first extraction module 21 and the second extraction module 22 may be the same extraction module or different extraction modules; this is not limited in the embodiments of the present application.
The determining module 23 is configured to determine, based on the plurality of first pitch information corresponding to the melody information, first pitch information corresponding to each character.
And the generating module 24 is configured to generate a song based on the first pitch information corresponding to each word and the parameter information related to the tone color corresponding to each word.
In another possible implementation manner of the embodiment of the application, the determining module 23 is specifically configured to determine, based on a matching relationship between the melody information and each word in the voice information input by the user, first pitch information used for replacing each second pitch information from a plurality of first pitch information corresponding to the melody information, respectively, so as to obtain first pitch information corresponding to each word.
And each second pitch information is the original pitch information corresponding to each word in the voice information input by the user.
In another possible implementation manner of the embodiment of the present application, the generating module 24 is specifically configured to generate a song based on the first type pitch information, the second type pitch information, and the parameter information related to the tone color and corresponding to each word.
Wherein the first type pitch information comprises: first pitch information for replacing each second pitch information; the second type pitch information includes: pitch information except the first pitch information in the plurality of first pitch information corresponding to the melody information.
In another possible implementation manner of the embodiment of the present application, the apparatus 20 further includes: presetting a processing module, wherein,
and the preset processing module is used for presetting the parameter information which corresponds to each character and is related to the tone.
Wherein the preset treatment comprises: at least one of interpolation processing and sampling processing.
The generating module 24 is specifically configured to generate a song based on the first pitch information corresponding to each word and the parameter information related to the tone color corresponding to each word after the preset processing.
In another possible implementation manner of the embodiment of the present application, the apparatus 20 further includes: a labeling module, wherein,
and the marking module is used for marking the starting time and the ending time corresponding to each note in the melody information respectively to obtain the marked melody information.
And the presetting processing module is specifically used for presetting the tone-related parameter information corresponding to each character based on the labeled melody information.
In another possible implementation manner of the embodiment of the present application, the apparatus 20 further includes: an acquisition module and a denoising processing module, wherein,
the acquisition module is used for acquiring voice information input by a user;
and the denoising processing module is used for denoising the voice information input by the user.
In another possible implementation manner of the embodiment of the present application, the parameter information related to the tone includes at least one of the following:
spectral envelope parameters SP; an aperiodic sequence signal AP.
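The patent does not say how the spectral envelope SP and aperiodic sequence AP are computed; in practice such parameters are commonly produced by a vocoder analysis (for example, the WORLD vocoder's CheapTrick step for the spectral envelope and D4C for aperiodicity). The numpy sketch below only illustrates the envelope idea via cepstral smoothing; the lifter order and frame are illustrative assumptions:

```python
import numpy as np

def spectral_envelope(frame, n_lifter=30):
    """Smooth spectral-envelope estimate via cepstral liftering.

    A sketch of the idea behind an SP-style feature: keep only the
    low-quefrency cepstral coefficients of the log-magnitude spectrum,
    discarding the fine harmonic structure.
    """
    log_mag = np.log(np.abs(np.fft.rfft(frame)) + 1e-10)
    cep = np.fft.irfft(log_mag)            # real cepstrum of the frame
    cep[n_lifter:-n_lifter] = 0.0          # keep low-quefrency part (both ends)
    return np.exp(np.fft.rfft(cep).real)   # smooth envelope, linear magnitude

frame = np.hanning(256) * np.sin(2 * np.pi * 1000 * np.arange(256) / 8000.0)
env = spectral_envelope(frame)
print(env.shape)
```

Because the envelope varies smoothly with frequency, it captures the timbre of the voiced frame while remaining independent of the pitch that the generation step later imposes.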
Compared with the prior art, the apparatus for generating a song provided by this embodiment extracts a plurality of first pitch information corresponding to melody information from audio information selected by the user, extracts timbre-related parameter information corresponding to each character from voice information input by the user, determines the first pitch information corresponding to each character based on the plurality of first pitch information corresponding to the melody information, and generates the song based on the first pitch information corresponding to each character and the timbre-related parameter information corresponding to each character. That is, in this embodiment the song is generated from the first pitch information in the melody information selected by the user and the timbre-related parameter information in the voice information input by the user, rather than by simply adding a melody to singing recorded by the user. This improves the quality of the generated song; and since only the user's voice input is needed, the user does not need strong singing skills, which further improves the user experience.
The apparatus for generating a song according to this embodiment may execute the method for generating a song provided in the foregoing method embodiment, and the implementation principles thereof are similar, and are not described herein again.
The embodiment describes a method for generating a song from the perspective of a method flow and a device for generating a song from the perspective of a virtual module and a virtual unit, and the following describes an electronic device from the perspective of a physical structure, and is specifically as follows:
An embodiment of the present application provides an electronic device. As shown in fig. 3, the electronic device 3000 includes: a processor 3001 and a memory 3003, where the processor 3001 is coupled to the memory 3003, for example via a bus 3002. Optionally, the electronic device 3000 may further comprise a transceiver 3004. It should be noted that in practical applications the transceiver 3004 is not limited to one, and the structure of the electronic device 3000 does not limit the embodiments of the present application.
The processor 3001 may be a CPU, general purpose processor, DSP, ASIC, FPGA or other programmable logic device, transistor logic device, hardware component, or any combination thereof. Which may implement or perform the various illustrative logical blocks, modules, and circuits described in connection with the disclosure. The processor 3001 may also be a combination of computing functions, e.g., comprising one or more microprocessors, a combination of a DSP and a microprocessor, or the like.
The memory 3003 stores application program code for executing the present solution, and its execution is controlled by the processor 3001. The processor 3001 is configured to execute the application program code stored in the memory 3003 to implement any of the method embodiments shown above.
An embodiment of the present application provides an electronic device, where the electronic device includes: a memory and a processor; at least one program is stored in the memory and, when executed by the processor, implements the following: extracting a plurality of first pitch information corresponding to melody information from audio information selected by a user, extracting timbre-related parameter information corresponding to each character from voice information input by the user, determining the first pitch information corresponding to each character based on the plurality of first pitch information corresponding to the melody information, and generating a song based on the first pitch information corresponding to each character and the timbre-related parameter information corresponding to each character. That is, in this embodiment the song is generated from the first pitch information in the melody information selected by the user and the timbre-related parameter information in the voice information input by the user, rather than by simply adding a melody to singing recorded by the user. This improves the quality of the generated song; and since only the user's voice input is needed, the user does not need strong singing skills, which further improves the user experience.
The present application provides a computer-readable storage medium on which a computer program is stored; when run on a computer, the program enables the computer to execute the corresponding content in the foregoing method embodiments. Compared with the prior art, a plurality of first pitch information corresponding to melody information is extracted from audio information selected by a user, timbre-related parameter information corresponding to each character is extracted from voice information input by the user, the first pitch information corresponding to each character is determined based on the plurality of first pitch information corresponding to the melody information, and a song is generated based on the first pitch information corresponding to each character and the timbre-related parameter information corresponding to each character. That is, in this embodiment the song is generated from the first pitch information in the melody information selected by the user and the timbre-related parameter information in the voice information input by the user, rather than by simply adding a melody to singing recorded by the user. This improves the quality of the generated song; and since only the user's voice input is needed, the user does not need strong singing skills, which further improves the user experience.
It should be understood that, although the steps in the flowcharts of the figures are shown in an order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, there is no strict ordering restriction, and the steps may be performed in other orders. Moreover, at least a portion of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be performed at different moments, and not necessarily in sequence; they may be performed in turn or alternately with other steps, or with at least a portion of the sub-steps or stages of other steps.
The foregoing describes only some embodiments of the present invention. It should be noted that those skilled in the art can make various modifications and refinements without departing from the principle of the present invention, and these modifications and refinements shall also fall within the protection scope of the present invention.
Claims (16)
1. A method of generating a song, comprising:
extracting a plurality of first pitch information corresponding to the melody information from the audio information selected by the user;
extracting parameter information which is corresponding to each character and is related to tone from voice information input by a user;
determining first pitch information corresponding to each character based on a plurality of first pitch information corresponding to the melody information;
and generating a song based on the first pitch information corresponding to each character and the parameter information related to tone corresponding to each character.
2. The method of claim 1, wherein the determining the first pitch information corresponding to each word based on the plurality of first pitch information corresponding to the melody information comprises:
respectively determining first pitch information for replacing each second pitch information from a plurality of first pitch information corresponding to the melody information based on the matching relationship between the melody information and each word in the voice information input by the user, and obtaining first pitch information corresponding to each word;
and each second pitch information is original pitch information corresponding to each word in the voice information input by the user.
3. The method of claim 2, wherein generating a song based on the first pitch information corresponding to each of the words and the parameter information related to timbre corresponding to each of the words comprises:
generating a song based on the first class pitch information, the second class pitch information and the parameter information which is respectively corresponding to each character and is related to the tone;
wherein the first type pitch information comprises: the first pitch information replacing each second pitch information; the second-class pitch information includes: and the pitch information except the first type pitch information in the plurality of first pitch information corresponding to the melody information.
4. The method of claim 1, wherein generating a song based on the first pitch information corresponding to each of the words and the parameter information related to timbre corresponding to each of the words further comprises:
and presetting the parameter information which is respectively corresponding to each character and is related to the tone, wherein the presetting comprises the following steps: at least one of interpolation processing and sampling processing;
wherein the generating a song based on the first pitch information corresponding to each character and the parameter information related to tone color corresponding to each character comprises:
and generating a song based on the first pitch information corresponding to each character and the parameter information which is corresponding to each character and is related to tone after the preset processing.
5. The method of claim 4, wherein the pre-setting of the parameter information related to tone color corresponding to each of the words further comprises:
marking the start time and the cut-off time corresponding to each note in the melody information to obtain marked melody information;
the preset processing is performed on the parameter information related to the tone color and corresponding to each character, and includes:
and presetting the parameter information related to the tone corresponding to each character based on the labeled melody information.
6. The method according to claim 1, wherein the extracting, from the speech information input by the user, the parameter information related to tone color corresponding to each word further comprises:
acquiring voice information input by a user;
and denoising the voice information input by the user.
7. The method according to any of claims 1-6, wherein the parameter information related to timbre comprises at least one of:
spectral envelope parameters SP; an aperiodic sequence signal AP.
8. An apparatus for generating songs, comprising:
the first extraction module is used for extracting a plurality of first pitch information corresponding to the melody information from the audio information selected by the user;
the second extraction module is used for extracting the parameter information which corresponds to each character and is related to the tone from the voice information input by the user;
the determining module is used for determining first pitch information corresponding to each character based on a plurality of first pitch information corresponding to the melody information;
and the generating module is used for generating a song based on the first pitch information corresponding to each character and the parameter information which is corresponding to each character and is related to tone.
9. The apparatus of claim 8,
the determining module is specifically configured to determine, based on a matching relationship between the melody information and each word in the voice information input by the user, first pitch information for replacing each second pitch information from multiple first pitch information corresponding to the melody information, respectively, and obtain first pitch information corresponding to each word;
and each second pitch information is original pitch information corresponding to each word in the voice information input by the user.
10. The apparatus of claim 9,
the generating module is specifically configured to generate a song based on the first class pitch information, the second class pitch information, and the parameter information related to the tone color and corresponding to each character;
wherein the first type pitch information comprises: the first pitch information replacing each second pitch information; the second-class pitch information includes: and the pitch information except the first type pitch information in the plurality of first pitch information corresponding to the melody information.
11. The apparatus of claim 8, further comprising: presetting a processing module, wherein,
the preset processing module is configured to perform preset processing on the parameter information, which corresponds to each character and is related to the tone, where the preset processing includes: at least one of interpolation processing and sampling processing;
the generating module is specifically configured to generate a song based on the first pitch information corresponding to each character and the parameter information related to the tone color corresponding to each character after the preset processing.
12. The apparatus of claim 11, further comprising: a labeling module, wherein,
the marking module is used for marking the starting time and the ending time corresponding to each note in the melody information respectively to obtain marked melody information;
the preset processing module is specifically configured to preset, based on the labeled melody information, the tone-related parameter information corresponding to each character.
13. The apparatus of claim 8, further comprising: an acquisition module and a denoising processing module, wherein,
the acquisition module is used for acquiring voice information input by a user;
and the denoising processing module is used for denoising the voice information input by the user.
14. The apparatus according to any one of claims 8-13, wherein the parameter information related to timbre comprises at least one of:
spectral envelope parameters SP; an aperiodic sequence signal AP.
15. An electronic device, comprising:
one or more processors;
a memory;
one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more programs being configured to perform the method of generating a song according to any one of claims 1 to 7.
16. A computer readable storage medium having stored thereon at least one instruction, at least one program, a set of codes, or a set of instructions, which is loaded and executed by a processor to implement a method of generating songs as claimed in any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910779948.4A CN112420004A (en) | 2019-08-22 | 2019-08-22 | Method and device for generating songs, electronic equipment and computer readable storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910779948.4A CN112420004A (en) | 2019-08-22 | 2019-08-22 | Method and device for generating songs, electronic equipment and computer readable storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112420004A true CN112420004A (en) | 2021-02-26 |
Family
ID=74779901
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910779948.4A Pending CN112420004A (en) | 2019-08-22 | 2019-08-22 | Method and device for generating songs, electronic equipment and computer readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112420004A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113808555A (en) * | 2021-09-17 | 2021-12-17 | 广州酷狗计算机科技有限公司 | Song synthesis method and device, equipment, medium and product thereof |
Citations (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1036282A (en) * | 1988-03-08 | 1989-10-11 | 雅马哈株式会社 | Musical-tone-generating-control apparatus |
CA2090948A1 (en) * | 1992-03-09 | 1993-09-10 | Brian C. Gibson | Musical entertainment system |
JPH0720861A (en) * | 1993-06-30 | 1995-01-24 | Casio Comput Co Ltd | Automatic playing device |
JPH0950287A (en) * | 1995-08-04 | 1997-02-18 | Yamaha Corp | Automatic singing device |
CN1325104A (en) * | 2000-05-22 | 2001-12-05 | 董红伟 | Language playback device with automatic music composing function |
JP2004077608A (en) * | 2002-08-12 | 2004-03-11 | Yamaha Corp | Apparatus and method for chorus synthesis and program |
KR20070016750A (en) * | 2005-08-05 | 2007-02-08 | 모두스타 주식회사 | Ubiquitous music information retrieval system and method based on query pool with feedback of customer characteristics |
US20090314155A1 (en) * | 2008-06-20 | 2009-12-24 | Microsoft Corporation | Synthesized singing voice waveform generator |
US20110000360A1 (en) * | 2009-07-02 | 2011-01-06 | Yamaha Corporation | Apparatus and Method for Creating Singing Synthesizing Database, and Pitch Curve Generation Apparatus and Method |
TW201108202A (en) * | 2009-08-25 | 2011-03-01 | Inst Information Industry | System, method, and apparatus for singing voice synthesis |
CN102024453A (en) * | 2009-09-09 | 2011-04-20 | 财团法人资讯工业策进会 | Singing sound synthesis system, method and device |
CN102820027A (en) * | 2012-06-21 | 2012-12-12 | 福建星网视易信息系统有限公司 | Accompaniment subtitle display system and method |
CN104392731A (en) * | 2014-11-30 | 2015-03-04 | 陆俊 | Singing practicing method and system |
CN105118519A (en) * | 2015-07-10 | 2015-12-02 | 中山大学孙逸仙纪念医院 | Hearing evaluation system |
CN105825844A (en) * | 2015-07-30 | 2016-08-03 | 维沃移动通信有限公司 | Sound repairing method and device |
CN106547797A (en) * | 2015-09-23 | 2017-03-29 | 腾讯科技(深圳)有限公司 | Audio frequency generation method and device |
CN106971703A (en) * | 2017-03-17 | 2017-07-21 | 西北师范大学 | A kind of song synthetic method and device based on HMM |
CN107203571A (en) * | 2016-03-18 | 2017-09-26 | 腾讯科技(深圳)有限公司 | Song lyric information processing method and device |
CN108428441A (en) * | 2018-02-09 | 2018-08-21 | 咪咕音乐有限公司 | Multimedia file producting method, electronic equipment and storage medium |
CN108630243A (en) * | 2018-05-09 | 2018-10-09 | 福建星网视易信息系统有限公司 | A kind of method and terminal that auxiliary is sung |
CN109741724A (en) * | 2018-12-27 | 2019-05-10 | 歌尔股份有限公司 | Make the method, apparatus and intelligent sound of song |
- 2019-08-22: CN application CN201910779948.4A filed, published as CN112420004A, status Pending
Non-Patent Citations (1)
Title |
---|
WU, YAOZHONG et al.: "A Music Synthesis System Based on a Sinusoidal Model", Journal of Fujian Normal University (Natural Science Edition), vol. 16, no. 1, 31 March 2000 (2000-03-31) *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP6027087B2 (en) | Acoustic signal processing system and method for performing spectral behavior transformations | |
CN111383646B (en) | Voice signal transformation method, device, equipment and storage medium | |
CN110010151A (en) | A kind of acoustic signal processing method and equipment, storage medium | |
CN111445900A (en) | Front-end processing method and device for voice recognition and terminal equipment | |
Mittal et al. | Study of characteristics of aperiodicity in Noh voices | |
CN108269579B (en) | Voice data processing method and device, electronic equipment and readable storage medium | |
CN108369803B (en) | Method for forming an excitation signal for a parametric speech synthesis system based on a glottal pulse model | |
AU2014395554B2 (en) | Method for forming the excitation signal for a glottal pulse model based parametric speech synthesis system | |
CN112420004A (en) | Method and device for generating songs, electronic equipment and computer readable storage medium | |
JP6193395B2 (en) | Digital watermark detection apparatus, method and program | |
CN111782868B (en) | Audio processing method, device, equipment and medium | |
CN109697985B (en) | Voice signal processing method and device and terminal | |
Lee et al. | Excitation signal extraction for guitar tones | |
CN112164387A (en) | Audio synthesis method and device, electronic equipment and computer-readable storage medium | |
JP6213217B2 (en) | Speech synthesis apparatus and computer program for speech synthesis | |
JP5879813B2 (en) | Multiple sound source identification device and information processing device linked to multiple sound sources | |
CN111862931A (en) | Voice generation method and device | |
CN108780634B (en) | Sound signal processing method and sound signal processing device | |
JP6834370B2 (en) | Speech synthesis method | |
CN113450768B (en) | Speech synthesis system evaluation method and device, readable storage medium and terminal equipment | |
CN116189636B (en) | Accompaniment generation method, device, equipment and storage medium based on electronic musical instrument | |
WO2014108890A1 (en) | Method and apparatus for phoneme separation in an audio signal | |
JP6329408B2 (en) | Speech processing apparatus, analysis method and program for speech processing apparatus | |
JP5877823B2 (en) | Speech recognition apparatus, speech recognition method, and program | |
WO2018043708A1 (en) | Method for extracting intonation structure of speech, and computer program therefor |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||