CN112420004A - Method and device for generating songs, electronic equipment and computer readable storage medium - Google Patents


Info

Publication number
CN112420004A
CN112420004A (application CN201910779948.4A)
Authority
CN
China
Prior art keywords
information, character, pitch information, pitch, user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910779948.4A
Other languages
Chinese (zh)
Inventor
郝舫
张跃
白云飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Fengqu Internet Information Service Co ltd
Original Assignee
Beijing Fengqu Internet Information Service Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Fengqu Internet Information Service Co ltd
Priority to CN201910779948.4A
Publication of CN112420004A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208: Noise filtering
    • G10L21/0216: Noise filtering characterised by the method used for estimating noise
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H: ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00: Details of electrophonic musical instruments
    • G10H1/0008: Associated control or indicating means
    • G10H1/0025: Automatic or semi-automatic music composition, e.g. producing random music, applying rules from music theory or modifying a musical piece
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H: ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00: Details of electrophonic musical instruments
    • G10H1/36: Accompaniment arrangements
    • G10H1/361: Recording/reproducing of accompaniment for use with an external source, e.g. karaoke systems
    • G10H1/366: Recording/reproducing of accompaniment for use with an external source, e.g. karaoke systems with means for modifying or correcting the external signal, e.g. pitch correction, reverberation, changing a singer's voice
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/24: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters, the extracted parameters being the cepstrum

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Auxiliary Devices For Music (AREA)

Abstract

The embodiments of the present application provide a method and apparatus for generating songs, an electronic device, and a computer-readable storage medium, relating to the technical field of speech processing. The method comprises: extracting a plurality of pieces of first pitch information corresponding to melody information from audio information selected by a user; extracting timbre-related parameter information corresponding to each character from speech information input by the user; determining the first pitch information corresponding to each character based on the plurality of pieces of first pitch information corresponding to the melody information; and generating a song based on the first pitch information corresponding to each character and the timbre-related parameter information corresponding to each character. The embodiments of the present application improve the quality of the generated song and do not require the user to have strong singing skills, thereby improving the user experience.

Description

Method and device for generating songs, electronic equipment and computer readable storage medium
Technical Field
The present application relates to the field of speech processing technologies, and in particular, to a method and an apparatus for generating a song, an electronic device, and a computer-readable storage medium.
Background
With the development of information technology, various kinds of application software have emerged, in particular song-related software such as singing-synthesis systems.
An existing singing-synthesis system plays the melody of a song selected by the user, or displays prompts for that melody on the screen; the user must sing along with the played melody or the on-screen prompts while being recorded, and the recorded singing is then simply combined with the melody of the selected song to obtain the synthesized song.
However, such a system requires the user to know the melody of the song to be recorded and to sing it for the recording, so it demands considerable singing skill; moreover, since synthesis merely adds the melody to the recorded singing, the resulting song is of poor quality and the user experience is low.
Disclosure of Invention
The present application provides a method and apparatus for generating a song, an electronic device, and a computer-readable storage medium, intended to solve at least one of the above technical problems. The specific technical solutions are as follows:
in a first aspect, a method for generating a song is provided, the method comprising:
extracting a plurality of pieces of first pitch information corresponding to melody information from audio information selected by a user;
extracting timbre-related parameter information corresponding to each character from speech information input by the user;
determining the first pitch information corresponding to each character based on the plurality of pieces of first pitch information corresponding to the melody information;
and generating a song based on the first pitch information corresponding to each character and the timbre-related parameter information corresponding to each character.
In one possible implementation, determining the first pitch information corresponding to each character based on the plurality of pieces of first pitch information corresponding to the melody information includes:
determining, from the plurality of pieces of first pitch information corresponding to the melody information and based on the matching relationship between the melody information and each character in the speech information input by the user, the first pitch information used to replace each piece of second pitch information, thereby obtaining the first pitch information corresponding to each character;
wherein each piece of second pitch information is the original pitch information corresponding to a character in the speech information input by the user.
In another possible implementation, generating a song based on the first pitch information corresponding to each character and the timbre-related parameter information corresponding to each character includes:
generating a song based on first-class pitch information, second-class pitch information, and the timbre-related parameter information corresponding to each character;
wherein the first-class pitch information includes the first pitch information used to replace each piece of second pitch information, and the second-class pitch information includes the pitch information, among the plurality of pieces of first pitch information corresponding to the melody information, other than the first-class pitch information.
In another possible implementation, before generating a song based on the first pitch information corresponding to each character and the timbre-related parameter information corresponding to each character, the method further includes:
performing preset processing on the timbre-related parameter information corresponding to each character, the preset processing including at least one of interpolation processing and sampling processing;
and generating a song based on the first pitch information corresponding to each character and the timbre-related parameter information corresponding to each character then includes:
generating a song based on the first pitch information corresponding to each character and the preset-processed timbre-related parameter information corresponding to each character.
In another possible implementation, before the preset processing is performed on the timbre-related parameter information corresponding to each character, the method further includes:
labeling the onset time and offset time corresponding to each note in the melody information to obtain labeled melody information;
and performing the preset processing on the timbre-related parameter information corresponding to each character then includes:
performing the preset processing on the timbre-related parameter information corresponding to each character based on the labeled melody information.
In another possible implementation, before the timbre-related parameter information corresponding to each character is extracted from the speech information input by the user, the method further includes:
acquiring the speech information input by the user;
and denoising the speech information input by the user.
In another possible implementation, the timbre-related parameter information includes at least one of:
a spectral envelope parameter SP; an aperiodic sequence signal AP.
In a second aspect, there is provided an apparatus for generating a song, the apparatus comprising:
a first extraction module, configured to extract a plurality of pieces of first pitch information corresponding to melody information from audio information selected by a user;
a second extraction module, configured to extract timbre-related parameter information corresponding to each character from speech information input by the user;
a determining module, configured to determine the first pitch information corresponding to each character based on the plurality of pieces of first pitch information corresponding to the melody information;
and a generating module, configured to generate a song based on the first pitch information corresponding to each character and the timbre-related parameter information corresponding to each character.
In one possible implementation, the determining module is specifically configured to determine, from the plurality of pieces of first pitch information corresponding to the melody information and based on the matching relationship between the melody information and each character in the speech information input by the user, the first pitch information used to replace each piece of second pitch information, thereby obtaining the first pitch information corresponding to each character;
wherein each piece of second pitch information is the original pitch information corresponding to a character in the speech information input by the user.
In another possible implementation, the generating module is specifically configured to generate a song based on first-class pitch information, second-class pitch information, and the timbre-related parameter information corresponding to each character;
wherein the first-class pitch information includes the first pitch information used to replace each piece of second pitch information, and the second-class pitch information includes the pitch information, among the plurality of pieces of first pitch information corresponding to the melody information, other than the first-class pitch information.
In another possible implementation, the apparatus further includes a preset processing module, wherein
the preset processing module is configured to perform preset processing on the timbre-related parameter information corresponding to each character, the preset processing including at least one of interpolation processing and sampling processing;
and the generating module is specifically configured to generate a song based on the first pitch information corresponding to each character and the preset-processed timbre-related parameter information corresponding to each character.
In another possible implementation, the apparatus further includes a labeling module, wherein
the labeling module is configured to label the onset time and offset time corresponding to each note in the melody information to obtain labeled melody information;
and the preset processing module is specifically configured to perform the preset processing on the timbre-related parameter information corresponding to each character based on the labeled melody information.
In another possible implementation, the apparatus further includes an acquisition module and a denoising module, wherein
the acquisition module is configured to acquire the speech information input by the user;
and the denoising module is configured to denoise the speech information input by the user.
In another possible implementation, the timbre-related parameter information includes at least one of:
a spectral envelope parameter SP; an aperiodic sequence signal AP.
In a third aspect, an electronic device is provided, which includes:
one or more processors;
a memory;
one or more application programs, wherein the one or more application programs are stored in the memory and configured to be executed by the one or more processors to perform the operations corresponding to the method for generating a song shown in the first aspect or in any possible implementation of the first aspect.
In a fourth aspect, a computer-readable storage medium is provided, having stored thereon at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by a processor to implement the method for generating a song shown in the first aspect or in any possible implementation of the first aspect.
The technical solutions provided by the embodiments of the present application have the following beneficial effects:
Compared with the prior art, the method, apparatus, electronic device, and computer-readable storage medium for generating a song extract a plurality of pieces of first pitch information corresponding to melody information from audio information selected by a user, extract timbre-related parameter information corresponding to each character from speech information input by the user, determine the first pitch information corresponding to each character based on the plurality of pieces of first pitch information, and generate a song from the per-character first pitch information and timbre-related parameter information. In other words, the song is generated from the pitch of the user-selected melody and the timbre of the user's input speech, rather than by simply overlaying a melody on a recording of the user singing. This improves the quality of the generated song; and because only spoken input is needed, the user does not need strong singing skills, which improves the user experience.
Drawings
To more clearly illustrate the technical solutions in the embodiments of the present application, the drawings used in the description of the embodiments are briefly introduced below.
Fig. 1 is a schematic flowchart of a method for generating songs according to an embodiment of the present application;
fig. 2 is a schematic structural diagram of an apparatus for generating songs according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of an electronic device for generating songs according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to the embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer throughout to elements that are identical or similar, or have identical or similar functions. The embodiments described below with reference to the drawings are exemplary, are intended only to explain the present application, and are not to be construed as limiting the present application.
As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes all or any combination of one or more of the associated listed items.
The following describes the technical solutions of the present application and how to solve the above technical problems with specific embodiments. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
An embodiment of the present application provides a method for generating a song, as shown in fig. 1, the method includes:
step S101, extracting a plurality of first pitch information corresponding to the melody information from the audio information selected by the user.
For the embodiments of the present application, the audio information selected by the user may contain only melody information, or may contain melody information together with other information; the embodiments of the present application do not limit this.
For the embodiments of the present application, pitch is one of the basic perceptual attributes of sound: how high or low it sounds. Pitch is determined by the vibration frequency and is positively correlated with it: the higher the frequency (the number of vibrations per unit time), the higher the sound; the lower the frequency, the lower the sound.
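The patent does not name a particular F0-extraction algorithm, so as a hedged illustration of how one piece of "first pitch information" can be obtained from an audio frame, the sketch below uses plain autocorrelation (an assumption made only for this example):

```python
import numpy as np

def estimate_pitch(frame, fs, fmin=80.0, fmax=500.0):
    """Estimate the pitch (F0, in Hz) of one audio frame by finding the
    autocorrelation peak within the allowed lag range."""
    frame = frame - frame.mean()
    corr = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(fs / fmax)          # shortest period we accept
    lag_max = int(fs / fmin)          # longest period we accept
    lag = lag_min + np.argmax(corr[lag_min:lag_max])
    return fs / lag                   # period in samples -> frequency in Hz

fs = 16000
t = np.arange(fs // 10) / fs          # 100 ms test frame
tone = np.sin(2 * np.pi * 220.0 * t)  # 220 Hz (A3) test tone
print(round(estimate_pitch(tone, fs), 1))  # close to 220 Hz
```

Real systems typically use more robust estimators (e.g. YIN, or the Harvest algorithm of the WORLD vocoder) and run them frame by frame over the whole recording to obtain a pitch contour.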
Step S102, extracting timbre-related parameter information corresponding to each character from the speech information input by the user.
For the embodiments of the present application, timbre refers to the characteristic that the frequency content of different sounds always shows distinctive features in the waveform. In the embodiments of the present application, the timbre-related parameter information includes at least one of a spectral envelope parameter SP and an aperiodic sequence signal AP.
For the embodiments of the present application, speech is a complex multi-frequency signal in which each frequency component has a different amplitude; when the components are arranged by frequency, the curve connecting the tops of their amplitudes forms the spectral envelope of the speech. The shape of the envelope changes with the sound being produced. The sound waves generated by vocal-cord vibration resonate as they pass through the vocal tract formed by the oral cavity, nasal cavity, and so on, and this resonance emphasizes certain regions of the spectrum; the shape of the spectral envelope therefore varies from person to person.
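To make the SP parameter concrete: in practice SP and AP tracks are usually produced by a vocoder analysis such as WORLD (the `pyworld` Python package exposes them via its CheapTrick and D4C steps). The numpy-only sketch below is a deliberately crude stand-in that just smooths a magnitude spectrum into an envelope curve; the synthetic signal, harmonic weights, and window size are illustrative assumptions:

```python
import numpy as np

fs = 16000
t = np.arange(fs // 4) / fs
# Toy "voiced" sound: harmonics of 200 Hz with formant-like weights
# (the amplitudes are made up for illustration)
x = sum(a * np.sin(2 * np.pi * 200 * k * t)
        for k, a in [(1, 1.0), (2, 0.6), (3, 0.9), (4, 0.3)])

spectrum = np.abs(np.fft.rfft(x * np.hanning(len(x))))
# Crude spectral envelope: smooth the magnitude spectrum so the curve
# follows the tops of the harmonic peaks (cf. the SP parameter)
win = 101
envelope = np.convolve(spectrum, np.ones(win) / win, mode="same")

strongest = np.argmax(spectrum) * fs / len(x)  # frequency of the tallest peak
```

Given an envelope (SP), an aperiodicity measure (AP), and an F0 track, a vocoder can resynthesize the voice, which is what makes these parameters suitable carriers of a speaker's timbre.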
For the embodiments of the present application, step S101 may be executed before step S102, after step S102, or simultaneously with step S102; the embodiments of the present application do not limit this.
Step S103, determining first pitch information corresponding to each character based on the plurality of first pitch information corresponding to the melody information.
For the embodiments of the present application, the first pitch information corresponding to each character is selected from the plurality of pieces of first pitch information corresponding to the melody information and serves as the pitch information for that character as extracted from the speech information input by the user.
And step S104, generating a song based on the first pitch information corresponding to each character and the timbre-related parameter information corresponding to each character.
For the embodiments of the present application, a song is generated based on the first pitch information corresponding to each character determined in step S103 and the timbre-related parameter information corresponding to each character extracted in step S102.
Compared with the prior art, the method for generating a song extracts a plurality of pieces of first pitch information corresponding to melody information from audio information selected by a user, extracts timbre-related parameter information corresponding to each character from speech information input by the user, determines the first pitch information corresponding to each character based on the plurality of pieces of first pitch information, and generates a song from the per-character first pitch information and timbre-related parameter information. That is, in the embodiments of the present application, the song is generated from the pitch of the user-selected melody and the timbre of the user's input speech, rather than by simply overlaying a melody on a recording of the user singing. This improves the quality of the generated song; and because only spoken input is needed, the user does not need strong singing skills, which further improves the user experience.
In a possible implementation manner of the embodiment of the present application, step S103 may specifically include: step S1031 (not shown in the figure), in which,
Step S1031, based on the matching relationship between the melody information and each character in the speech information input by the user, determining, from the plurality of pieces of first pitch information corresponding to the melody information, the first pitch information used to replace each piece of second pitch information, thereby obtaining the first pitch information corresponding to each character.
Each piece of second pitch information is the original pitch information corresponding to a character in the speech information input by the user.
For the embodiments of the present application, the first pitch information corresponding to the note matched with each character in the speech information input by the user is determined from the melody information, and this first pitch information replaces the second pitch information of that character, i.e. it is used as the pitch information corresponding to that character in the speech information input by the user.
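A minimal sketch of this replacement, under the simplifying assumption (not required by the patent, which only posits a general matching relationship) that each character matches exactly one melody note; the character names and pitch values are hypothetical:

```python
# Hypothetical lyric characters (pinyin) and their original spoken
# pitches ("second pitch information"), in Hz: these get replaced
chars = ["ni", "hao", "shi", "jie"]
spoken_f0 = [180.0, 175.0, 190.0, 170.0]

# Matched melody notes ("first pitch information"): C4, D4, E4, G4
melody_f0 = [261.6, 293.7, 329.6, 392.0]

# Step S1031: each character's spoken pitch is replaced by the pitch of
# the melody note it matches
char_pitch = dict(zip(chars, melody_f0))
print(char_pitch["ni"])  # 261.6 (was 180.0 in the spoken input)
```

The character keeps its own timbre parameters (SP/AP) but takes on the melody's pitch, which is what turns spoken input into singing.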
In another possible implementation manner of the embodiment of the present application, step S104 may specifically include: step S1041 (not shown), in which,
Step S1041, generating a song based on first-class pitch information, second-class pitch information, and the timbre-related parameter information corresponding to each character.
The first-class pitch information includes the first pitch information used to replace each piece of second pitch information; the second-class pitch information includes the pitch information, among the plurality of pieces of first pitch information corresponding to the melody information, other than the first-class pitch information.
For the embodiments of the present application, the second-class pitch information may include pitch information outside the first class, such as the pitch information corresponding to a prelude or an interlude. Generating the song from the first-class pitch information, the second-class pitch information, and the per-character timbre-related parameter information improves the completeness of the generated song, and thus the quality of the result and the user experience.
In another possible implementation of the embodiments of the present application, before step S104, the method may further include: step Sa (not shown in the figure), wherein,
Step Sa, performing preset processing on the timbre-related parameter information corresponding to each character.
The preset processing includes at least one of interpolation processing and sampling processing.
For the embodiments of the present application, interpolation is an important method of discrete-function approximation: from the values of a function at a finite number of points, its values at other points can be estimated.
For the embodiments of the present application, interpolation and/or sampling are applied to the timbre-related parameter information corresponding to each character, so that after processing its length is consistent with the length of the note corresponding to that character in the melody.
For the embodiments of the present application, the timbre-related parameter information may include at least one of a spectral envelope parameter SP and an aperiodic sequence signal AP. In this case, step Sa may specifically include: performing interpolation and/or sampling on the spectral envelope parameters corresponding to each character, and performing interpolation and/or sampling on the aperiodic sequence signals corresponding to each character.
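A sketch of the interpolation case: stretching one hypothetical per-character parameter track (a single bin of an SP envelope over time) so its frame count matches the note duration, via linear interpolation:

```python
import numpy as np

def fit_to_note(param_frames, note_frames):
    """Stretch or compress a per-character parameter track (e.g. one bin
    of the SP envelope over time) so its length matches the note
    duration in frames."""
    src = np.asarray(param_frames, dtype=float)
    old_idx = np.linspace(0.0, 1.0, num=len(src))
    new_idx = np.linspace(0.0, 1.0, num=note_frames)
    return np.interp(new_idx, old_idx, src)

sp_track = [0.1, 0.4, 0.9, 0.4, 0.1]   # 5 frames of one envelope bin
stretched = fit_to_note(sp_track, 9)    # the note lasts 9 frames
print(len(stretched))  # 9
```

Sampling (decimating frames) handles the opposite case, where the spoken character is longer than its note; `np.interp` covers both directions here, since it maps any source length onto any target length.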
In another possible implementation manner of the embodiment of the present application, on the basis of step Sa, step S104 may specifically include: step S1042 (not shown in the figure), in which,
Step S1042, generating a song based on the first pitch information corresponding to each character and the preset-processed timbre-related parameter information corresponding to each character.
For the embodiments of the present application, the song is generated based on the first pitch information corresponding to each character and the timbre-related parameter information corresponding to each character after interpolation or sampling.
In another possible implementation of the embodiments of the present application, before step Sa, the method may further include: step Sb (not shown in the figure), wherein,
Step Sb, labeling the onset time and offset time corresponding to each note in the melody information to obtain labeled melody information.
For the embodiments of the present application, the onset time and offset time corresponding to each note in the melody information are labeled by a trained labeling model, yielding the labeled melody information.
For the embodiments of the present application, melody information hummed by the user is acquired, for example a melody hummed as "la la la"; audio features such as Mel-frequency cepstral coefficients (MFCC) are extracted from it, and the MFCCs are passed through the trained labeling model to label the onset time and offset time of each "la".
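The patent labels note onsets and offsets with a trained model over MFCC features; as a hedged stand-in, the sketch below marks note boundaries with a simple frame-energy threshold (the frame size and threshold are illustrative assumptions, not the patent's method):

```python
import numpy as np

def label_notes(signal, fs, frame_len=400, threshold=0.02):
    """Very rough stand-in for the patent's trained labeling model:
    mark a note's onset/offset wherever frame energy crosses a threshold."""
    n = len(signal) // frame_len
    energy = np.array([np.mean(signal[i * frame_len:(i + 1) * frame_len] ** 2)
                       for i in range(n)])
    voiced = energy > threshold
    notes, start = [], None
    for i, v in enumerate(voiced):
        if v and start is None:
            start = i                      # note onset
        elif not v and start is not None:  # note offset
            notes.append((start * frame_len / fs, i * frame_len / fs))
            start = None
    if start is not None:                  # note still sounding at the end
        notes.append((start * frame_len / fs, n * frame_len / fs))
    return notes  # list of (onset_s, offset_s)

fs = 16000
t = np.arange(fs) / fs
hum = np.sin(2 * np.pi * 440 * t)
hum[: fs // 2] = 0.0                       # half a second of silence, then tone
print(label_notes(hum, fs))  # [(0.5, 1.0)]
```

A trained model can of course also separate consecutive notes that have no silent gap between them, which this energy threshold cannot.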
In another possible implementation manner of the embodiment of the present application, on the basis of step Sb, step Sa specifically may include: step Sa1 (not shown), in which,
Step Sa1, performing the preset processing on the timbre-related parameter information corresponding to each character based on the labeled melody information.
For the embodiments of the present application, labeling the melody information determines the onset time and offset time of each note, and the timbre information of the character corresponding to each note is then interpolated or sampled according to those times.
For example, if the onset time and offset time of the first note in the melody information are 0'10" and 0'15" respectively, the character corresponding to the first note in the speech information input by the user is interpolated or sampled according to the onset and offset times of that note.
In another possible implementation of the embodiments of the present application, before step S102, the method may further include: step Sc (not shown in the figure) and step Sd (not shown in the figure), wherein,
and step Sc, acquiring voice information input by a user.
And Sd, denoising the voice information input by the user.
For the embodiment of the application, the voice information input by the user is denoised by at least one of the following algorithms:
a least mean squares (LMS) adaptive filter; an LMS adaptive notch filter; spectral subtraction; Wiener filtering.
For the embodiment of the application, the LMS adaptive filter automatically adjusts its current filter parameters using the filter parameters obtained at the previous moment, so as to adapt to the unknown or randomly varying statistical characteristics of the signal and the noise, thereby achieving optimal filtering.
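A minimal sketch of that weight-update loop, assuming an adaptive noise-cancellation setup with a separate noise-reference input (an assumption; the patent does not specify the configuration). The function name and step size are illustrative:

```python
import numpy as np

def lms_denoise(noisy, reference, taps=8, mu=0.005):
    """LMS adaptive noise canceller.

    The filter weights at each moment are updated from the error at the
    previous moment, as described in the text. `reference` is a signal
    correlated with the noise but not with the speech.
    """
    w = np.zeros(taps)
    out = np.zeros_like(noisy)
    for n in range(taps, len(noisy)):
        x = reference[n - taps:n][::-1]   # most recent reference samples
        e = noisy[n] - w @ x              # error = estimate of clean speech
        w += 2 * mu * e * x               # steepest-descent weight update
        out[n] = e
    return out

rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 5 * np.arange(4000) / 1000)  # stand-in "speech"
ref = rng.standard_normal(4000)                         # noise reference
noisy = clean + 0.5 * np.roll(ref, 1)                   # noise path: delay + gain
cleaned = lms_denoise(noisy, ref)
```

After the weights converge, the error output `e` approximates the clean speech because the filter has learned the noise path from the reference to the primary input.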
For the embodiment of the application, the LMS adaptive notch filter is suitable for monochromatic interference, such as single-frequency sine-wave noise. Ideally, the notch of such a filter is infinitely narrow, with shoulders sharp enough that the frequency response returns immediately to the flat region.
For the embodiment of the present application, spectral subtraction performs noise reduction on the speech signal in the frequency domain.
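A basic magnitude spectral-subtraction sketch in NumPy, under the common assumption (not stated in the text) that the opening frames of the recording contain noise only, from which the noise spectrum is estimated:

```python
import numpy as np

def spectral_subtraction(noisy, sr, noise_frames=10, frame_len=512):
    """Magnitude spectral subtraction with overlap-add resynthesis.

    The average noise magnitude spectrum, estimated from the first
    `noise_frames` frames, is subtracted from every frame's magnitude;
    the noisy phase is kept and negative magnitudes are floored at zero.
    """
    hop = frame_len // 2
    window = np.hanning(frame_len)
    spectra = np.array([
        np.fft.rfft(noisy[i:i + frame_len] * window)
        for i in range(0, len(noisy) - frame_len + 1, hop)
    ])
    noise_mag = np.abs(spectra[:noise_frames]).mean(axis=0)
    mag = np.maximum(np.abs(spectra) - noise_mag, 0.0)
    cleaned_spectra = mag * np.exp(1j * np.angle(spectra))
    out = np.zeros(len(noisy))
    for k, frame in enumerate(cleaned_spectra):
        out[k * hop:k * hop + frame_len] += np.fft.irfft(frame, frame_len)
    return out

rng = np.random.default_rng(1)
sr = 16000
noise = 0.05 * rng.standard_normal(sr)
tone = np.zeros(sr)
tone[sr // 2:] = 0.3 * np.sin(2 * np.pi * 440 * np.arange(sr // 2) / sr)
noisy = tone + noise          # half a second of noise, then tone plus noise
cleaned = spectral_subtraction(noisy, sr)
```

Production systems add refinements (over-subtraction factors, spectral floors, musical-noise suppression) omitted here for brevity.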
For the embodiment of the present application, the Wiener filtering method designs a digital filter h(n) such that the error between its output for the noisy input speech signal and the clean speech signal is minimized in the mean-square sense.
For the embodiment of the application, since the timbre-related parameter information corresponding to each character is extracted from the voice information input by the user, denoising that voice information can improve the accuracy of this extraction, which in turn improves the quality of the generated song and the user experience.
The foregoing embodiment introduces a method for generating a song from the perspective of the method flow. The following introduces an apparatus for generating a song from the perspective of virtual modules or virtual units; this apparatus is applicable to the method described above. The details are as follows:
an embodiment of the present application provides an apparatus for generating a song, and as shown in fig. 2, the apparatus 20 for generating a song may specifically include: a first extraction module 21, a second extraction module 22, a determination module 23 and a generation module 24, wherein,
the first extraction module 21 is configured to extract a plurality of first pitch information corresponding to the melody information from the audio information selected by the user.
And a second extracting module 22, configured to extract, from the voice information input by the user, parameter information related to the tone color corresponding to each word.
For the embodiment of the present application, the first extraction module 21 and the second extraction module 22 may be the same extraction module or different extraction modules; the embodiment of the present application is not limited in this respect.
The determining module 23 is configured to determine, based on the plurality of first pitch information corresponding to the melody information, first pitch information corresponding to each character.
And the generating module 24 is configured to generate a song based on the first pitch information corresponding to each word and the parameter information related to the tone color corresponding to each word.
In another possible implementation manner of the embodiment of the application, the determining module 23 is specifically configured to determine, based on a matching relationship between the melody information and each word in the voice information input by the user, first pitch information used for replacing each second pitch information from a plurality of first pitch information corresponding to the melody information, respectively, so as to obtain first pitch information corresponding to each word.
And each second pitch information is the original pitch information corresponding to each word in the voice information input by the user.
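For illustration only, the replacement could be sketched as a positional mapping of the melody's first pitch information onto the characters' original second pitch information; the actual matching relationship between melody and characters is not specified here:

```python
def assign_first_pitches(second_pitches, first_pitches):
    """Replace each character's original (second) pitch with the melody
    (first) pitch it matches. Matching here is simply positional, one
    melody note per character in order: a simplistic stand-in for the
    matching relationship described in the text.
    """
    return [first_pitches[i % len(first_pitches)]
            for i in range(len(second_pitches))]

spoken = [182.0, 176.5, 191.2]          # original pitches of three words (Hz)
melody = [261.6, 293.7, 329.6, 349.2]   # C4 D4 E4 F4 from the chosen audio
print(assign_first_pitches(spoken, melody))  # [261.6, 293.7, 329.6]
```

The remaining, unused melody pitches correspond to the "second type pitch information" below, which the generating module can still use when synthesizing the song.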
In another possible implementation manner of the embodiment of the present application, the generating module 24 is specifically configured to generate a song based on the first type pitch information, the second type pitch information, and the parameter information related to the tone color and corresponding to each word.
Wherein the first type pitch information comprises: first pitch information for replacing each second pitch information; the second type pitch information includes: pitch information except the first pitch information in the plurality of first pitch information corresponding to the melody information.
In another possible implementation manner of the embodiment of the present application, the apparatus 20 further includes: presetting a processing module, wherein,
and the preset processing module is used for presetting the parameter information which corresponds to each character and is related to the tone.
Wherein the preset treatment comprises: at least one of interpolation processing and sampling processing.
The generating module 24 is specifically configured to generate a song based on the first pitch information corresponding to each word and the parameter information related to the tone color corresponding to each word after the preset processing.
In another possible implementation manner of the embodiment of the present application, the apparatus 20 further includes: a labeling module, wherein,
and the marking module is used for marking the starting time and the ending time corresponding to each note in the melody information respectively to obtain the marked melody information.
And the presetting processing module is specifically used for presetting the tone-related parameter information corresponding to each character based on the labeled melody information.
In another possible implementation manner of the embodiment of the present application, the apparatus 20 further includes: an acquisition module and a denoising processing module, wherein,
the acquisition module is used for acquiring voice information input by a user;
and the denoising processing module is used for denoising the voice information input by the user.
In another possible implementation manner of the embodiment of the present application, the parameter information related to the tone includes at least one of the following:
spectral envelope parameters SP; an aperiodic sequence signal AP.
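The names SP and AP suggest vocoder-style synthesis parameters. As a heavily simplified, hypothetical sketch of generating audio from each character's assigned pitch plus a timbre envelope, the envelope here is reduced to a few harmonic gains and the aperiodicity is ignored entirely:

```python
import numpy as np

def synthesize_word(pitch_hz, duration_s, harmonic_gains, sr=16000):
    """Render one character as a harmonic tone at its assigned first pitch.

    A drastically simplified stand-in for vocoder synthesis from pitch,
    spectral envelope (SP) and aperiodicity (AP): the envelope becomes
    per-harmonic gains, and AP is not modeled at all.
    """
    t = np.arange(int(round(duration_s * sr))) / sr
    out = np.zeros_like(t)
    for h, gain in enumerate(harmonic_gains, start=1):
        out += gain * np.sin(2 * np.pi * pitch_hz * h * t)
    return out / max(np.max(np.abs(out)), 1e-9)  # normalize per word

# One (pitch, duration) pair per character, concatenated into a "song"
words = [(261.6, 0.4), (293.7, 0.4), (329.6, 0.6)]
song = np.concatenate([synthesize_word(p, d, [1.0, 0.4, 0.2])
                       for p, d in words])
print(len(song) / 16000)  # total duration in seconds
```

An actual implementation would drive a full vocoder with the per-frame SP and AP sequences rather than fixed harmonic gains; this sketch only shows how pitch and timbre parameters combine per character.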
Compared with the prior art, the device for generating a song provided by the embodiment of the application extracts a plurality of first pitch information corresponding to melody information from audio information selected by a user, extracts timbre-related parameter information corresponding to each character from voice information input by the user, determines the first pitch information corresponding to each character based on the plurality of first pitch information corresponding to the melody information, and generates a song based on the first pitch information corresponding to each character and the timbre-related parameter information corresponding to each character. That is, in the embodiment of the present application, a song is generated based on the first pitch information in the melody information selected by the user and the timbre-related parameter information in the voice information input by the user, rather than by simply adding a melody to singing recorded by the user. This improves the quality of the generated song; and since only spoken voice input is required from the user, no advanced singing skill is needed, which further improves the user experience.
The apparatus for generating a song according to this embodiment may execute the method for generating a song provided in the foregoing method embodiment, and the implementation principles thereof are similar, and are not described herein again.
The embodiment describes a method for generating a song from the perspective of a method flow and a device for generating a song from the perspective of a virtual module and a virtual unit, and the following describes an electronic device from the perspective of a physical structure, and is specifically as follows:
an embodiment of the present application provides an electronic device, as shown in fig. 3, an electronic device 3000 shown in fig. 3 includes: a processor 3001 and a memory 3003. The processor 3001 is coupled to the memory 3003, such as via a bus 3002. Optionally, the electronic device 3000 may further comprise a transceiver 3004. It should be noted that the transceiver 3004 is not limited to one in practical applications, and the structure of the electronic device 3000 is not limited to the embodiment of the present application.
The processor 3001 may be a CPU, general purpose processor, DSP, ASIC, FPGA or other programmable logic device, transistor logic device, hardware component, or any combination thereof. Which may implement or perform the various illustrative logical blocks, modules, and circuits described in connection with the disclosure. The processor 3001 may also be a combination of computing functions, e.g., comprising one or more microprocessors, a combination of a DSP and a microprocessor, or the like.
Bus 3002 may include a path that conveys information between the aforementioned components. The bus 3002 may be a PCI bus or an EISA bus, etc. The bus 3002 may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 3, but this does not mean only one bus or one type of bus.
Memory 3003 may be, but is not limited to, a ROM or other type of static storage device that can store static information and instructions, a RAM or other type of dynamic storage device that can store information and instructions, an EEPROM, a CD-ROM or other optical disc storage (including compact disc, laser disc, digital versatile disc, Blu-ray disc, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
The memory 3003 is used for storing application program codes for performing the present scheme, and is controlled to be executed by the processor 3001. The processor 3001 is configured to execute application program code stored in the memory 3003 to implement any of the method embodiments shown above.
An embodiment of the present application provides an electronic device, where the electronic device includes: a memory and a processor; at least one program is stored in the memory for execution by the processor, and when executed by the processor, it implements the following: extracting a plurality of first pitch information corresponding to melody information from audio information selected by a user, extracting timbre-related parameter information corresponding to each character from voice information input by the user, determining the first pitch information corresponding to each character based on the plurality of first pitch information corresponding to the melody information, and generating a song based on the first pitch information corresponding to each character and the timbre-related parameter information corresponding to each character. That is, in the embodiment of the present application, a song is generated based on the first pitch information in the melody information selected by the user and the timbre-related parameter information in the voice information input by the user, rather than by simply adding a melody to singing recorded by the user. This improves the quality of the generated song; and since only spoken voice input is required from the user, no advanced singing skill is needed, which further improves the user experience.
The present application provides a computer-readable storage medium on which a computer program is stored; when running on a computer, the program enables the computer to execute the corresponding content in the foregoing method embodiments. Compared with the prior art, the method extracts a plurality of first pitch information corresponding to melody information from audio information selected by a user, extracts timbre-related parameter information corresponding to each character from voice information input by the user, determines the first pitch information corresponding to each character based on the plurality of first pitch information corresponding to the melody information, and generates a song based on the first pitch information corresponding to each character and the timbre-related parameter information corresponding to each character. That is, in the embodiment of the present application, a song is generated based on the first pitch information in the melody information selected by the user and the timbre-related parameter information in the voice information input by the user, rather than by simply adding a melody to singing recorded by the user. This improves the quality of the generated song; and since only spoken voice input is required from the user, no advanced singing skill is needed, which further improves the user experience.
It should be understood that, although the steps in the flowcharts of the figures are shown in an order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the steps are not limited to the exact order shown and may be performed in other orders. Moreover, at least a portion of the steps in the flowcharts may include multiple sub-steps or multiple stages, which are not necessarily completed at the same moment but may be performed at different moments, and whose execution order is not necessarily sequential; they may be performed in turn or alternately with other steps, or with at least a portion of the sub-steps or stages of other steps.
The foregoing describes only some embodiments of the present invention. It should be noted that those skilled in the art can make various modifications and improvements without departing from the principle of the present invention, and such modifications and improvements shall also fall within the protection scope of the present invention.

Claims (16)

1. A method of generating a song, comprising:
extracting a plurality of first pitch information corresponding to the melody information from the audio information selected by the user;
extracting parameter information which is corresponding to each character and is related to tone from voice information input by a user;
determining first pitch information corresponding to each character based on a plurality of first pitch information corresponding to the melody information;
and generating a song based on the first pitch information corresponding to each character and the parameter information related to tone corresponding to each character.
2. The method of claim 1, wherein the determining the first pitch information corresponding to each word based on the plurality of first pitch information corresponding to the melody information comprises:
respectively determining first pitch information for replacing each second pitch information from a plurality of first pitch information corresponding to the melody information based on the matching relationship between the melody information and each word in the voice information input by the user, and obtaining first pitch information corresponding to each word;
and each second pitch information is original pitch information corresponding to each word in the voice information input by the user.
3. The method of claim 2, wherein generating a song based on the first pitch information corresponding to each of the words and the parameter information related to timbre corresponding to each of the words comprises:
generating a song based on the first class pitch information, the second class pitch information and the parameter information which is respectively corresponding to each character and is related to the tone;
wherein the first type pitch information comprises: the first pitch information replacing each second pitch information; the second-class pitch information includes: and the pitch information except the first type pitch information in the plurality of first pitch information corresponding to the melody information.
4. The method of claim 1, wherein generating a song based on the first pitch information corresponding to each of the words and the parameter information related to timbre corresponding to each of the words further comprises:
and presetting the parameter information which is respectively corresponding to each character and is related to the tone, wherein the presetting comprises the following steps: at least one of interpolation processing and sampling processing;
wherein the generating a song based on the first pitch information corresponding to each character and the parameter information related to tone color corresponding to each character comprises:
and generating a song based on the first pitch information corresponding to each character and the parameter information which is corresponding to each character and is related to tone after the preset processing.
5. The method of claim 4, wherein the pre-setting of the parameter information related to tone color corresponding to each of the words further comprises:
marking the start time and the cut-off time corresponding to each note in the melody information to obtain marked melody information;
the preset processing is performed on the parameter information related to the tone color and corresponding to each character, and includes:
and presetting the parameter information related to the tone corresponding to each character based on the labeled melody information.
6. The method according to claim 1, wherein the extracting, from the speech information input by the user, the parameter information related to tone color corresponding to each word further comprises:
acquiring voice information input by a user;
and denoising the voice information input by the user.
7. The method according to any of claims 1-6, wherein the parameter information related to timbre comprises at least one of:
spectral envelope parameters SP; an aperiodic sequence signal AP.
8. An apparatus for generating songs, comprising:
the first extraction module is used for extracting a plurality of first pitch information corresponding to the melody information from the audio information selected by the user;
the second extraction module is used for extracting the parameter information which corresponds to each character and is related to the tone from the voice information input by the user;
the determining module is used for determining first pitch information corresponding to each character based on a plurality of first pitch information corresponding to the melody information;
and the generating module is used for generating a song based on the first pitch information corresponding to each character and the parameter information which is corresponding to each character and is related to tone.
9. The apparatus of claim 8,
the determining module is specifically configured to determine, based on a matching relationship between the melody information and each word in the voice information input by the user, first pitch information for replacing each second pitch information from multiple first pitch information corresponding to the melody information, respectively, and obtain first pitch information corresponding to each word;
and each second pitch information is original pitch information corresponding to each word in the voice information input by the user.
10. The apparatus of claim 9,
the generating module is specifically configured to generate a song based on the first class pitch information, the second class pitch information, and the parameter information related to the tone color and corresponding to each character;
wherein the first type pitch information comprises: the first pitch information replacing each second pitch information; the second-class pitch information includes: and the pitch information except the first type pitch information in the plurality of first pitch information corresponding to the melody information.
11. The apparatus of claim 8, further comprising: presetting a processing module, wherein,
the preset processing module is configured to perform preset processing on the parameter information, which corresponds to each character and is related to the tone, where the preset processing includes: at least one of interpolation processing and sampling processing;
the generating module is specifically configured to generate a song based on the first pitch information corresponding to each character and the parameter information related to the tone color corresponding to each character after the preset processing.
12. The apparatus of claim 11, further comprising: a labeling module, wherein,
the marking module is used for marking the starting time and the ending time corresponding to each note in the melody information respectively to obtain marked melody information;
the preset processing module is specifically configured to preset, based on the labeled melody information, the tone-related parameter information corresponding to each character.
13. The apparatus of claim 8, further comprising: an acquisition module and a denoising processing module, wherein,
the acquisition module is used for acquiring voice information input by a user;
and the denoising processing module is used for denoising the voice information input by the user.
14. The apparatus according to any one of claims 8-13, wherein the parameter information related to timbre comprises at least one of:
spectral envelope parameters SP; an aperiodic sequence signal AP.
15. An electronic device, comprising:
one or more processors;
a memory;
one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more programs configured to: a method of generating a song according to any one of claims 1 to 7 is performed.
16. A computer readable storage medium having stored thereon at least one instruction, at least one program, a set of codes, or a set of instructions, which is loaded and executed by a processor to implement a method of generating songs as claimed in any one of claims 1 to 7.
CN201910779948.4A 2019-08-22 2019-08-22 Method and device for generating songs, electronic equipment and computer readable storage medium Pending CN112420004A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910779948.4A CN112420004A (en) 2019-08-22 2019-08-22 Method and device for generating songs, electronic equipment and computer readable storage medium


Publications (1)

Publication Number Publication Date
CN112420004A true CN112420004A (en) 2021-02-26

Family

ID=74779901

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910779948.4A Pending CN112420004A (en) 2019-08-22 2019-08-22 Method and device for generating songs, electronic equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN112420004A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113808555A (en) * 2021-09-17 2021-12-17 广州酷狗计算机科技有限公司 Song synthesis method and device, equipment, medium and product thereof

Citations (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1036282A (en) * 1988-03-08 1989-10-11 雅马哈株式会社 Musical-tone-generating-control apparatus
CA2090948A1 (en) * 1992-03-09 1993-09-10 Brian C. Gibson Musical entertainment system
JPH0720861A (en) * 1993-06-30 1995-01-24 Casio Comput Co Ltd Automatic playing device
JPH0950287A (en) * 1995-08-04 1997-02-18 Yamaha Corp Automatic singing device
CN1325104A (en) * 2000-05-22 2001-12-05 董红伟 Language playback device with automatic music composing function
JP2004077608A (en) * 2002-08-12 2004-03-11 Yamaha Corp Apparatus and method for chorus synthesis and program
KR20070016750A (en) * 2005-08-05 2007-02-08 모두스타 주식회사 Ubiquitous music information retrieval system and method based on query pool with feedback of customer characteristics
US20090314155A1 (en) * 2008-06-20 2009-12-24 Microsoft Corporation Synthesized singing voice waveform generator
US20110000360A1 (en) * 2009-07-02 2011-01-06 Yamaha Corporation Apparatus and Method for Creating Singing Synthesizing Database, and Pitch Curve Generation Apparatus and Method
TW201108202A (en) * 2009-08-25 2011-03-01 Inst Information Industry System, method, and apparatus for singing voice synthesis
CN102024453A (en) * 2009-09-09 2011-04-20 财团法人资讯工业策进会 Singing sound synthesis system, method and device
CN102820027A (en) * 2012-06-21 2012-12-12 福建星网视易信息系统有限公司 Accompaniment subtitle display system and method
CN104392731A (en) * 2014-11-30 2015-03-04 陆俊 Singing practicing method and system
CN105118519A (en) * 2015-07-10 2015-12-02 中山大学孙逸仙纪念医院 Hearing evaluation system
CN105825844A (en) * 2015-07-30 2016-08-03 维沃移动通信有限公司 Sound repairing method and device
CN106547797A (en) * 2015-09-23 2017-03-29 腾讯科技(深圳)有限公司 Audio frequency generation method and device
CN106971703A (en) * 2017-03-17 2017-07-21 西北师范大学 A kind of song synthetic method and device based on HMM
CN107203571A (en) * 2016-03-18 2017-09-26 腾讯科技(深圳)有限公司 Song lyric information processing method and device
CN108428441A (en) * 2018-02-09 2018-08-21 咪咕音乐有限公司 Multimedia file producting method, electronic equipment and storage medium
CN108630243A (en) * 2018-05-09 2018-10-09 福建星网视易信息系统有限公司 A kind of method and terminal that auxiliary is sung
CN109741724A (en) * 2018-12-27 2019-05-10 歌尔股份有限公司 Make the method, apparatus and intelligent sound of song


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Wu Yaozhong et al.: "A Music Synthesis System Based on the Sinusoidal Model", Journal of Fujian Normal University (Natural Science Edition), vol. 16, no. 1, 31 March 2000 (2000-03-31) *


Similar Documents

Publication Publication Date Title
JP6027087B2 (en) Acoustic signal processing system and method for performing spectral behavior transformations
CN111383646B (en) Voice signal transformation method, device, equipment and storage medium
CN110010151A (en) A kind of acoustic signal processing method and equipment, storage medium
CN111445900A (en) Front-end processing method and device for voice recognition and terminal equipment
Mittal et al. Study of characteristics of aperiodicity in Noh voices
CN108269579B (en) Voice data processing method and device, electronic equipment and readable storage medium
CN108369803B (en) Method for forming an excitation signal for a parametric speech synthesis system based on a glottal pulse model
AU2014395554B2 (en) Method for forming the excitation signal for a glottal pulse model based parametric speech synthesis system
CN112420004A (en) Method and device for generating songs, electronic equipment and computer readable storage medium
JP6193395B2 (en) Digital watermark detection apparatus, method and program
CN111782868B (en) Audio processing method, device, equipment and medium
CN109697985B (en) Voice signal processing method and device and terminal
Lee et al. Excitation signal extraction for guitar tones
CN112164387A (en) Audio synthesis method and device, electronic equipment and computer-readable storage medium
JP6213217B2 (en) Speech synthesis apparatus and computer program for speech synthesis
JP5879813B2 (en) Multiple sound source identification device and information processing device linked to multiple sound sources
CN111862931A (en) Voice generation method and device
CN108780634B (en) Sound signal processing method and sound signal processing device
JP6834370B2 (en) Speech synthesis method
CN113450768B (en) Speech synthesis system evaluation method and device, readable storage medium and terminal equipment
CN116189636B (en) Accompaniment generation method, device, equipment and storage medium based on electronic musical instrument
WO2014108890A1 (en) Method and apparatus for phoneme separation in an audio signal
JP6329408B2 (en) Speech processing apparatus, analysis method and program for speech processing apparatus
JP5877823B2 (en) Speech recognition apparatus, speech recognition method, and program
WO2018043708A1 (en) Method for extracting intonation structure of speech, and computer program therefor

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination