CN107045867A - Automatic composing method, device and terminal device - Google Patents


Info

Publication number
CN107045867A
Authority
CN
China
Prior art keywords
frame
music
note
energy value
difference
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710175115.8A
Other languages
Chinese (zh)
Other versions
CN107045867B (en)
Inventor
何江聪
潘青华
胡国平
胡郁
刘庆峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
iFlytek Co Ltd
Original Assignee
iFlytek Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by iFlytek Co Ltd
Priority to CN201710175115.8A
Publication of CN107045867A
Application granted
Publication of CN107045867B
Legal status: Active


Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H: ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H 1/00: Details of electrophonic musical instruments
    • G10H 1/0008: Associated control or indicating means
    • G10H 1/0025: Automatic or semi-automatic music composition, e.g. producing random music, applying rules from music theory or modifying a musical piece
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/03: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/27: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L 25/30: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/48: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Signal Processing (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Theoretical Computer Science (AREA)
  • Auxiliary Devices For Music (AREA)

Abstract

The application proposes an automatic composing method, device and terminal device. The automatic composing method includes: receiving the music file of a given preceding segment of music from which a continuation is to be predicted, where the music file contains the audio data or music description information of that preceding segment; extracting frame-level audio features of the music corresponding to the music file; obtaining frame-level audio features carrying frequency-band information according to those frame-level audio features and a pre-built music frequency-band feature combination model; and obtaining predicted music according to the frame-level audio features carrying frequency-band information and a pre-built music prediction model, thereby realizing automatic composition. The application can realize automatic composition, improve the efficiency and feasibility of automatic composition, and reduce the influence of subjective factors on it.

Description

Automatic composing method, device and terminal device
Technical field
The application relates to the technical field of audio signal processing, and in particular to an automatic composing method, device and terminal device.
Background technology
With the application of computer technology to music processing, computer music has emerged. As a new-generation art form, computer music has gradually penetrated into many aspects of music, such as composition, instrument performance, education and entertainment. Automatic composition using artificial intelligence techniques is a relatively new research direction within computer music and has received close attention from researchers in related fields in recent years.
Existing automatic composing methods based on artificial intelligence mainly fall into two categories: automatic composition based on heuristic search and automatic composition based on genetic algorithms. However, the existing heuristic-search approach is only suitable for short pieces, because its search efficiency declines exponentially as the length of the piece grows, so the method is not feasible for longer pieces. The genetic-algorithm approach inherits some typical shortcomings of genetic algorithms, for example a strong dependence on the initial population and the difficulty of choosing genetic operators precisely.
The content of the invention
The purpose of the application is to solve, at least to some extent, one of the technical problems in the related art.
Therefore, the first purpose of the application is to propose an automatic composing method. By building a music frequency-band feature combination model and a music prediction model, the method realizes automatic composition. It is a brand-new automatic composing method that addresses the problems of low efficiency, poor feasibility and strong subjective influence in the prior art.
The second purpose of the application is to propose an automatic composition device.
The third purpose of the application is to propose a terminal device.
The fourth purpose of the application is to propose a storage medium containing computer-executable instructions.
To achieve these goals, the automatic composing method of the first-aspect embodiment of the application includes: receiving the music file of the preceding music segment from which prediction is to be made, where the music file includes the audio data or music description information of that preceding segment; extracting frame-level audio features of the music corresponding to the music file; obtaining frame-level audio features carrying frequency-band information according to those frame-level audio features and a pre-built music frequency-band feature combination model; and obtaining the predicted music according to the frame-level audio features carrying frequency-band information and a pre-built music prediction model, thereby realizing automatic composition.
In the automatic composing method of the embodiment of the application, after the music file of the preceding music segment is received, the frame-level audio features of the music corresponding to the music file are extracted; frame-level audio features carrying frequency-band information are then obtained according to those features and the pre-built music frequency-band feature combination model; finally, the predicted music is obtained according to the frame-level audio features carrying frequency-band information and the pre-built music prediction model. Automatic composition is thus realized, the efficiency and feasibility of automatic composition can be improved, and the influence of subjective factors on automatic composition is reduced.
To achieve these goals, the automatic composition device of the second-aspect embodiment of the application includes: a receiving module for receiving the music file of the preceding music segment, where the music file includes the audio data or music description information of that preceding segment; an extraction module for extracting the frame-level audio features of the music corresponding to the music file received by the receiving module; and an acquisition module for obtaining frame-level audio features carrying frequency-band information according to the frame-level audio features and a pre-built music frequency-band feature combination model, and for obtaining the predicted music according to the frame-level audio features carrying frequency-band information and a pre-built music prediction model, thereby realizing automatic composition.
In the automatic composition device of the embodiment of the application, after the receiving module receives the music file of the preceding music segment, the extraction module extracts the frame-level audio features of the corresponding music; the acquisition module then obtains frame-level audio features carrying frequency-band information according to those features and the pre-built music frequency-band feature combination model, and obtains the predicted music according to the frame-level audio features carrying frequency-band information and the pre-built music prediction model. Automatic composition can thus be realized, the efficiency and feasibility of automatic composition can be improved, and the influence of subjective factors on automatic composition is reduced.
To achieve these goals, the terminal device of the third-aspect embodiment of the application includes: one or more processors; and a storage device for storing one or more programs; when the one or more programs are executed by the one or more processors, the one or more processors implement the method described above.
To achieve these goals, the fourth-aspect embodiment of the application provides a storage medium containing computer-executable instructions which, when executed by a computer processor, perform the method described above.
Additional aspects and advantages of the application will be set forth in part in the following description, will in part become apparent from the description, or will be learned through practice of the application.
Brief description of the drawings
The above-mentioned and/or additional aspect of the application and advantage will become from the following description of the accompanying drawings of embodiments Substantially and be readily appreciated that, wherein:
Fig. 1 is a flow chart of one embodiment of the automatic composing method of the application;
Fig. 2 is a flow chart of another embodiment of the automatic composing method of the application;
Fig. 3 is a schematic diagram of one embodiment of a topology used in the automatic composing method of the application;
Fig. 4 is a flow chart of a further embodiment of the automatic composing method of the application;
Fig. 5 is a schematic diagram of the coordinate representation of energy values in the automatic composing method of the application;
Fig. 6 is a flow chart of a further embodiment of the automatic composing method of the application;
Fig. 7 is a flow chart of a further embodiment of the automatic composing method of the application;
Fig. 8 is a schematic diagram of another embodiment of a topology used in the automatic composing method of the application;
Fig. 9 is a structural schematic diagram of one embodiment of the automatic composition device of the application;
Fig. 10 is a structural schematic diagram of another embodiment of the automatic composition device of the application;
Fig. 11 is a structural schematic diagram of one embodiment of the terminal device of the application.
Embodiment
Embodiments of the application are described in detail below; examples of the embodiments are shown in the drawings, in which the same or similar reference numbers throughout denote the same or similar elements, or elements with the same or similar functions. The embodiments described below with reference to the drawings are exemplary, are only used to explain the application, and should not be construed as limiting it. On the contrary, the embodiments of the application include all changes, modifications and equivalents that fall within the spirit and scope of the appended claims.
Fig. 1 is a flow chart of one embodiment of the automatic composing method of the application. As shown in Fig. 1, the automatic composing method may include:
Step 101: receive the music file of the preceding music segment from which prediction is to be made; the music file includes the audio data or music description information of that preceding segment.
Here, the audio data or music description information of the preceding segment means that a short piece of music is given as audio data or as music description information, and the music that follows can then be predicted from it.
The music description information can generally be converted into audio data; it may, for example, be a Musical Instrument Digital Interface (MIDI) file.
Step 102: extract the frame-level audio features of the music corresponding to the music file.
Step 103: obtain frame-level audio features carrying frequency-band information according to the frame-level audio features and the pre-built music frequency-band feature combination model.
Step 104: obtain the predicted music according to the frame-level audio features carrying frequency-band information and the pre-built music prediction model, thereby realizing automatic composition.
In the automatic composing method above, after the music file of the preceding segment is received, the frame-level audio features of the corresponding music are extracted; frame-level audio features carrying frequency-band information are then obtained according to those features and the pre-built music frequency-band feature combination model; finally, the predicted music is obtained according to the frame-level audio features carrying frequency-band information and the pre-built music prediction model. Automatic composition is thus realized, its efficiency and feasibility can be improved, and the influence of subjective factors on it is reduced.
Fig. 2 is a flow chart of another embodiment of the automatic composing method of the application. As shown in Fig. 2, before step 103 the method may further include:
Step 201: collect music files and convert them to audio files of a common format.
Specifically, a large amount of training data can be obtained by crawling music files on the Internet. These music files may be audio data or music description information, for example MIDI files. The music files can then be converted to audio files of a common format; the format only needs to support a Fast Fourier Transform (FFT), for example ".PCM" or ".WAV". This embodiment does not limit the format of the audio files and takes the ".PCM" format as an example. Note that if a music file is music description information, such as a MIDI file, the MIDI file must first be rendered to audio and then converted to a ".PCM" audio file.
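As a rough illustration of this preparation step (the patent names no tooling; the 16-bit ".WAV" input, the normalization convention and the function name are assumptions, and the MIDI-to-audio rendering is not shown), raw PCM samples on which an FFT can be run could be loaded like this:

```python
# Sketch only: load 16-bit PCM samples from a ".WAV" file for later FFT.
import wave
import numpy as np

def load_pcm(path):
    with wave.open(path, "rb") as wav:
        raw = wav.readframes(wav.getnframes())
    samples = np.frombuffer(raw, dtype=np.int16).astype(np.float32)
    return samples / 32768.0  # scale 16-bit PCM to [-1, 1]
```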
Step 202: extract the frame-level audio features of the audio files.
Step 203: determine the topology of the music frequency-band feature combination model.
Specifically, the topology is an opposing-direction neural network structure, i.e. two networks scanning the frequency axis in opposite directions. This embodiment takes opposing-direction recurrent neural networks (RNN) as an example; the topology comprises two independent RNNs and a connection unit, as shown in Fig. 3, which is a schematic diagram of one embodiment of a topology used in the automatic composing method of the application. The two independent RNNs, named LF_RNN and HF_RNN, perform multi-frequency feature combination over the low band and over the high band respectively.
The input of LF_RNN at a frame Tm is the energy values E(Tm, Fi) taken starting from the lowest frequency, i = 1, 2, ..., k, where k = 1, 2, ..., N/2 (N is the number of FFT points), together with the LF_RNN output Li-1 at the previous frequency point. The LF_RNN output Li represents the energy value of the i-th frequency of frame Tm after the low-frequency information has been taken into account.
Similarly, the input of HF_RNN at frame Tm is the energy values E(Tm, Fj) taken starting from the highest frequency, j = N/2, N/2-1, ..., k, where k = 1, 2, ..., N/2 (N is the number of FFT points), together with the HF_RNN output Hj+1 at the previous (higher) frequency point. The HF_RNN output Hj represents the energy value of the j-th frequency of frame Tm after the high-frequency information has been taken into account.
The connection unit is the "concatenate" in Fig. 3: when i = j = k it joins the two outputs into N(Tm, Fk), the energy value of the k-th frequency of frame Tm with the information of the other frequency points taken into account.
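To make the topology concrete, here is a minimal sketch of the two opposing-direction RNNs and the "concatenate" unit. The GRU cell type, the hidden size and the final linear projection are assumptions; the patent fixes only the two scan directions and the per-bin combination:

```python
# Minimal sketch of the Fig. 3 frequency-band feature combination topology.
import torch
import torch.nn as nn

class BandFeatureCombiner(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.lf_rnn = nn.GRU(input_size=1, hidden_size=hidden, batch_first=True)
        self.hf_rnn = nn.GRU(input_size=1, hidden_size=hidden, batch_first=True)
        self.proj = nn.Linear(2 * hidden, 1)  # "concatenate" unit -> N(Tm, Fk)

    def forward(self, frame_energies):
        # frame_energies: (batch, n_bins) energy values E(Tm, Fk) of one frame
        x = frame_energies.unsqueeze(-1)                      # (batch, n_bins, 1)
        low_up, _ = self.lf_rnn(x)                            # scan lowest bin upward
        high_down, _ = self.hf_rnn(torch.flip(x, dims=[1]))   # scan highest bin downward
        high_down = torch.flip(high_down, dims=[1])           # realign so index k matches
        combined = torch.cat([low_up, high_down], dim=-1)     # Lk and Hk at bin k
        return self.proj(combined).squeeze(-1)                # (batch, n_bins): N(Tm, Fk)

# usage: a batch of 8 frames with N/2 = 256 frequency bins
model = BandFeatureCombiner()
out = model(torch.rand(8, 256))   # -> torch.Size([8, 256])
```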
Step 204: train the music frequency-band feature combination model according to the determined topology and the frame-level audio features.
Specifically, when training the music frequency-band feature combination model, the training algorithm used may be a neural network training algorithm such as back propagation (BP); this embodiment does not limit the choice of training algorithm.
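The patent does not specify the training target or loss for this model; purely as an illustration of BP-style supervised training, one generic regression step over the BandFeatureCombiner sketch above (mean-squared error with SGD, both assumptions) would look like this:

```python
# Illustrative only: one gradient step; the supervision target is assumed,
# since the patent does not state what the combination model regresses to.
import torch
import torch.nn.functional as F

model = BandFeatureCombiner()                       # from the sketch above
opt = torch.optim.SGD(model.parameters(), lr=1e-3)  # gradient descent via BP

frames = torch.rand(8, 256)    # E(Tm, Fk) for a batch of 8 frames
target = torch.rand(8, 256)    # assumed ground-truth N(Tm, Fk)

loss = F.mse_loss(model(frames), target)
opt.zero_grad()
loss.backward()                # back propagation of the error
opt.step()
```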
Fig. 4 is a flow chart of a further embodiment of the automatic composing method of the application. As shown in Fig. 4, in the embodiment shown in Fig. 2, step 202 may include:
Step 401: perform a Fast Fourier Transform with a fixed number of points on the audio file, frame by frame.
Specifically, an FFT with a fixed number of points can be performed frame by frame on the ".PCM"-format audio file.
Step 402: calculate the energy value of every frame of the audio file at each frequency point according to the result of the Fast Fourier Transform.
Fig. 5 is a schematic diagram of the coordinate representation of energy values in the automatic composing method of the application; it shows the energy value of each frame at each frequency in coordinate form, where the horizontal axis t represents frames in time, the vertical axis f represents frequency points, the coordinate E(t, f) represents an energy value, M is the total number of frames, and N is the number of FFT points.
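A minimal sketch of steps 401 and 402 follows; the frame length and hop size are illustrative assumptions, since the description fixes only that the number of FFT points N is constant across frames:

```python
# Sketch of steps 401-402: frame-by-frame FFT and per-bin energy E(t, f).
import numpy as np

def frame_energies(samples, n_fft=512, hop=256):
    n_frames = 1 + (len(samples) - n_fft) // hop          # M frames
    E = np.empty((n_frames, n_fft // 2))                  # E(t, f), f = 1..N/2
    for t in range(n_frames):
        frame = samples[t * hop : t * hop + n_fft]
        spectrum = np.fft.rfft(frame)                     # N-point FFT
        E[t] = np.abs(spectrum[: n_fft // 2]) ** 2        # energy per frequency bin
    return E

# e.g. E = frame_energies(load_pcm("example.wav"))  # path illustrative
```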
Step 403: determine the note that each frame belongs to according to the energy values.
Specifically, at each frequency point, the first and second frames of the audio file are taken to belong to the first note. It is then judged whether the absolute value of a first difference is less than a second difference, where the first difference is the difference between the energy value of the third frame of the audio file and the mean of the energy values of the first and second frames, and the second difference is the difference between the maximum and minimum of the energy values of the first and second frames. If it is, the third frame of the audio file belongs to the first note, and the fourth frame and onward, up to the last frame, are judged in turn in the same way.
If the absolute value of the first difference is greater than or equal to the second difference, the third frame of the audio file is taken as the beginning of a second note, and the fourth frame of the audio file is determined to belong to that second note. Starting from the fifth frame, it is judged whether the absolute value of a third difference is less than a fourth difference, where the third difference is the difference between the energy value of the fifth frame and the mean of the energy values of the third and fourth frames, and the fourth difference is the difference between the maximum and minimum of the energy values of the third and fourth frames. The note ownership of the fifth frame is determined in the same way as that of the third frame, and so on, until the note ownership of the last frame of the audio file has been determined.
In other words, the note ownership of each frame can be determined by processing each frequency point as follows. Frames T1 and T2 are taken to belong to the first note. Judging starts from frame T3: if |E(T3, F1) - Emean(T1, T2)| < (Emax(T1, T2) - Emin(T1, T2)), then frame T3 belongs to the first note, and the ownership of each subsequent frame is judged in turn, where Emean(T1, T2), Emax(T1, T2) and Emin(T1, T2) denote the mean, maximum and minimum of the energy values of frames T1 to T2. Otherwise, frame T3 is taken as the beginning of the second note and frame T4 is determined to belong to it; judging then resumes from frame T5, again by the test |E(T5, F1) - Emean(T3, T4)| < (Emax(T3, T4) - Emin(T3, T4)), until the note ownership of all frames has been determined.
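Read literally, each frame is tested against the first two frames (the seed pair) of the current note. A sketch of this rule for a single frequency bin, under that reading:

```python
# Sketch of the step 403 note-ownership rule for one frequency bin:
# frames 1-2 seed the first note; frame t joins the current note iff
# |E(t) - mean(seed pair)| < max(seed pair) - min(seed pair),
# otherwise it opens a new note together with the following frame.
import numpy as np

def assign_notes(energy):
    """energy: 1-D array of E(t, f) for one bin; returns a note id per frame."""
    note_id = np.zeros(len(energy), dtype=int)
    start = 0          # first frame of the current note's seed pair
    current = 0
    t = 2
    while t < len(energy):
        pair = energy[start:start + 2]
        if abs(energy[t] - pair.mean()) < pair.max() - pair.min():
            note_id[t] = current          # frame joins the current note
            t += 1
        else:
            current += 1                  # frame t begins a new note...
            note_id[t] = current
            if t + 1 < len(energy):
                note_id[t + 1] = current  # ...and the next frame joins it too
            start = t                     # the new seed pair is (t, t+1)
            t += 2
    return note_id
```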
Step 404: calculate the energy value of each note, and obtain the frame-level audio features from the energy values of the notes.
Fig. 6 is a flow chart of a further embodiment of the automatic composing method of the application. As shown in Fig. 6, in the embodiment shown in Fig. 4, step 404 may include:
Step 601: calculate the average energy of all frames contained in each note, as the energy value of that note.
Step 602: normalize the energy value of every frame contained in each note to the energy value of the note it belongs to.
Step 603: filter out the notes whose energy value is below a predetermined threshold, to obtain the frame-level audio features.
The predetermined threshold can be set according to system performance and/or implementation requirements; this embodiment does not limit its size.
In this embodiment, the energy of a note is defined as the average energy of all frames contained in the note. The average energy of all frames in each note can therefore be calculated as the energy value E(i) of that note, and the energy value of every frame in the note is then normalized to the note's energy value. Further, after the energy value of each note has been calculated, notes with too small an energy value can be filtered out according to the average note energy Emean; such low-energy notes are probably noise. That is, for each E(i), if E(i) < α·Emean, the energy value of that note can be set to 0, where α·Emean is the predetermined threshold above; the value of α can be chosen according to the actual application, and this embodiment does not limit it.
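A sketch of steps 601 to 603, reusing the note ids from assign_notes above; the default value of α is an illustrative assumption, since the patent leaves it open:

```python
# Sketch of steps 601-603: per-note mean energy, per-frame normalization,
# and zeroing of notes below alpha * Emean as probable noise.
import numpy as np

def note_level_features(energy, note_id, alpha=0.1):
    energy = energy.astype(float).copy()
    note_energy = {n: energy[note_id == n].mean() for n in np.unique(note_id)}
    e_mean = np.mean(list(note_energy.values()))   # average note energy Emean
    for n, e in note_energy.items():
        # every frame of a note takes the note's energy, or 0 if filtered out
        energy[note_id == n] = 0.0 if e < alpha * e_mean else e
    return energy                                  # frame-level feature for this bin
```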
It should be noted that in the embodiment shown in Fig. 2 of the application, steps 201-204 may be performed sequentially with steps 101-102 or in parallel with them; the embodiment of the application does not limit this.
Fig. 7 is a flow chart of a further embodiment of the automatic composing method of the application. As shown in Fig. 7, in the embodiment shown in Fig. 1, before step 104 the method may further include:
Step 701: determine the topology of the music prediction model.
In this embodiment the music prediction model is an RNN model, as shown in Fig. 8, which is a schematic diagram of another embodiment of a topology used in the automatic composing method of the application. The input of the RNN model in Fig. 8 is the output N(Tm, Fk) of the music frequency-band feature combination model together with the model's own output hm for the previous frame, and its output is the energy values N(Tm+1, Fk) of the next frame.
Step 702: train the music prediction model according to the output of the music frequency-band feature combination model and the determined topology.
It should be noted that steps 701 and 702 may be performed sequentially with steps 101-103 or in parallel with them; this embodiment does not limit this.
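A minimal sketch of the prediction model of Fig. 8, together with an autoregressive unrolling that composes a continuation from a given last frame; the GRU cell and the layer sizes are assumptions:

```python
# Sketch of the Fig. 8 music prediction model: input is the combined feature
# N(Tm, F1..Fk) of the current frame plus the previous hidden state hm;
# output is the energy values N(Tm+1, Fk) of the next frame.
import torch
import torch.nn as nn

class MusicPredictor(nn.Module):
    def __init__(self, n_bins=256, hidden=128):
        super().__init__()
        self.cell = nn.GRUCell(n_bins, hidden)
        self.out = nn.Linear(hidden, n_bins)

    def forward(self, frame, h):
        h = self.cell(frame, h)        # h_{m+1} from N(Tm, :) and h_m
        return self.out(h), h          # predicted N(Tm+1, :), new hidden state

# autoregressive generation from a given preceding segment (composition):
model = MusicPredictor()
h = torch.zeros(1, 128)
frame = torch.rand(1, 256)             # last combined frame of the given music
generated = []
for _ in range(100):                   # predict 100 future frames
    frame, h = model(frame, h)
    generated.append(frame)
```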
The automatic composing method above can realize automatic composition, improve the efficiency and feasibility of automatic composition, and reduce the influence of subjective factors on it. It is a brand-new automatic composing method that addresses the problems of low efficiency, poor feasibility and strong subjective influence in the prior art.
Fig. 9 is a structural schematic diagram of one embodiment of the automatic composition device of the application. The automatic composition device in this embodiment can serve as a terminal device, or as part of a terminal device, to realize the automatic composing method provided by the application. The terminal device may be a client device or a server device; the application does not limit its form.
As shown in Fig. 9, the automatic composition device may include: a receiving module 91, an extraction module 92 and an acquisition module 93.
The receiving module 91 is used to receive the music file of the preceding music segment from which prediction is to be made; the music file includes the audio data or music description information of that preceding segment. Here, the audio data or music description information of the preceding segment means that a short piece of music is given as audio data or as music description information, and the music that follows can then be predicted from it. The music description information can generally be converted into audio data and may, for example, be a MIDI file.
The extraction module 92 is used to extract the frame-level audio features of the music corresponding to the music file received by the receiving module 91.
The acquisition module 93 is used to obtain frame-level audio features carrying frequency-band information according to the frame-level audio features and the pre-built music frequency-band feature combination model, and to obtain the predicted music according to the frame-level audio features carrying frequency-band information and the pre-built music prediction model, thereby realizing automatic composition.
In the automatic composition device above, after the receiving module 91 receives the music file of the preceding segment, the extraction module 92 extracts the frame-level audio features of the corresponding music; the acquisition module 93 then obtains frame-level audio features carrying frequency-band information according to those features and the pre-built music frequency-band feature combination model, and obtains the predicted music according to the frame-level audio features carrying frequency-band information and the pre-built music prediction model. Automatic composition can thus be realized, its efficiency and feasibility can be improved, and the influence of subjective factors on it is reduced.
Fig. 10 is a structural schematic diagram of another embodiment of the automatic composition device. Compared with the device shown in Fig. 9, the difference is that the automatic composition device shown in Fig. 10 may further include: a collection module 94, a conversion module 95, a determination module 96 and a training module 97.
The collection module 94 is used to collect music files before the acquisition module 93 obtains the frame-level audio features carrying frequency-band information.
The conversion module 95 is used to convert the music files collected by the collection module 94 to audio files of a common format.
Specifically, the collection module 94 can obtain a large amount of training data by crawling music files on the Internet; these music files may be audio data or music description information, for example MIDI files. The conversion module 95 can then convert the music files to audio files of a common format; the format only needs to support an FFT, for example ".PCM" or ".WAV". This embodiment does not limit the format of the audio files and takes the ".PCM" format as an example. Note that if a music file is music description information, such as a MIDI file, the MIDI file must first be rendered to audio and then converted to a ".PCM" audio file.
The extraction module 92 is also used to extract the frame-level audio features of the audio files converted by the conversion module 95.
The determination module 96 is used to determine the topology of the music frequency-band feature combination model. Specifically, the topology determined by the determination module 96 is an opposing-direction neural network structure; this embodiment takes opposing-direction RNNs as an example. The topology comprises two independent RNNs and a connection unit, as shown in Fig. 3. The two independent RNNs, named LF_RNN and HF_RNN, perform multi-frequency feature combination over the low band and over the high band respectively.
The input of LF_RNN at a frame Tm is the energy values E(Tm, Fi) taken starting from the lowest frequency, i = 1, 2, ..., k, where k = 1, 2, ..., N/2 (N is the number of FFT points), together with the LF_RNN output Li-1 at the previous frequency point; the LF_RNN output Li represents the energy value of the i-th frequency of frame Tm after the low-frequency information has been taken into account.
Similarly, the input of HF_RNN at frame Tm is the energy values E(Tm, Fj) taken starting from the highest frequency, j = N/2, N/2-1, ..., k, where k = 1, 2, ..., N/2 (N is the number of FFT points), together with the HF_RNN output Hj+1 at the previous (higher) frequency point; the HF_RNN output Hj represents the energy value of the j-th frequency of frame Tm after the high-frequency information has been taken into account.
The connection unit is the "concatenate" in Fig. 3: when i = j = k it joins the two outputs into N(Tm, Fk), the energy value of the k-th frequency of frame Tm with the information of the other frequency points taken into account.
The training module 97 is used to train the music frequency-band feature combination model according to the topology determined by the determination module 96 and the frame-level audio features extracted by the extraction module 92. Specifically, when training the model the training module 97 may use a neural network training algorithm such as the BP algorithm; this embodiment does not limit the choice of training algorithm.
In this embodiment, the extraction module 92 may include: a transform submodule 921, a calculation submodule 922, a determination submodule 923 and an acquisition submodule 924.
The transform submodule 921 is used to perform a Fast Fourier Transform with a fixed number of points on the audio file, frame by frame; specifically, it can perform the fixed-point-count FFT frame by frame on the ".PCM"-format audio file.
The calculation submodule 922 is used to calculate the energy value of every frame of the audio file at each frequency point according to the result of the Fast Fourier Transform performed by the transform submodule 921. Fig. 5 shows the energy value of each frame at each frequency in coordinate form, where the horizontal axis t represents frames in time, the vertical axis f represents frequency points, the coordinate E(t, f) represents an energy value, M is the total number of frames, and N is the number of FFT points.
The determination submodule 923 is used to determine the note that each frame belongs to according to the energy values calculated by the calculation submodule 922.
The calculation submodule 922 is also used to calculate the energy value of each note.
The acquisition submodule 924 is used to obtain the frame-level audio features from the energy values of the notes calculated by the calculation submodule 922.
Specifically, the calculation submodule 922 calculates the average energy of all frames contained in each note as the energy value of that note, and normalizes the energy value of every frame contained in each note to the energy value of the note it belongs to.
The acquisition submodule 924 filters out the notes whose energy value is below a predetermined threshold, to obtain the frame-level audio features. The predetermined threshold can be set according to system performance and/or implementation requirements; this embodiment does not limit its size.
In this embodiment, the energy of a note is defined as the average energy of all frames contained in the note. The average energy of all frames in each note can therefore be calculated as the energy value E(i) of that note, and the energy value of every frame in the note is then normalized to the note's energy value. Further, after the energy value of each note has been calculated, notes with too small an energy value can be filtered out according to the average note energy Emean; such low-energy notes are probably noise. That is, for each E(i), if E(i) < α·Emean, the energy value of that note can be set to 0, where α·Emean is the predetermined threshold above; the value of α can be chosen according to the actual application, and this embodiment does not limit it.
In this embodiment, the determination submodule 923 may include: a note determination unit 9231 and a judgment unit 9232.
The note determination unit 9231 is used to determine, at each frequency point, that the first and second frames of the audio file belong to the first note.
The judgment unit 9232 is used to judge whether the absolute value of a first difference is less than a second difference, where the first difference is the difference between the energy value of the third frame of the audio file and the mean of the energy values of the first and second frames, and the second difference is the difference between the maximum and minimum of the energy values of the first and second frames.
The note determination unit 9231 is also used to determine, when the absolute value of the first difference is less than the second difference, that the third frame of the audio file belongs to the first note, and then to judge the fourth frame and onward in turn, up to the note ownership of the last frame.
The note determination unit 9231 is also used, when the absolute value of the first difference is greater than or equal to the second difference, to take the third frame of the audio file as the beginning of a second note and to determine that the fourth frame of the audio file belongs to that second note.
The judgment unit 9232 is also used to judge, starting from the fifth frame of the audio file, whether the absolute value of a third difference is less than a fourth difference, where the third difference is the difference between the energy value of the fifth frame and the mean of the energy values of the third and fourth frames, and the fourth difference is the difference between the maximum and minimum of the energy values of the third and fourth frames; the note ownership of the fifth frame is determined in the same way as that of the third frame, and so on, until the note ownership of the last frame of the audio file has been determined.
In other words, the determination submodule 923 can determine the note ownership of each frame by processing each frequency point as follows. The note determination unit 9231 takes frames T1 and T2 as belonging to the first note; the judgment unit 9232 starts judging from frame T3: if |E(T3, F1) - Emean(T1, T2)| < (Emax(T1, T2) - Emin(T1, T2)), then frame T3 belongs to the first note, and the ownership of each subsequent frame is judged in turn, where Emean(T1, T2), Emax(T1, T2) and Emin(T1, T2) denote the mean, maximum and minimum of the energy values of frames T1 to T2. Otherwise frame T3 is taken as the beginning of the second note and frame T4 is determined to belong to it; judging resumes from frame T5, again by the test |E(T5, F1) - Emean(T3, T4)| < (Emax(T3, T4) - Emin(T3, T4)), until the note ownership of all frames has been determined.
Further, the automatic composition device may also include the determination module 96 and the training module 97 for the prediction model.
The determination module 96 is used to determine the topology of the music prediction model before the acquisition module 93 obtains the predicted music. In this embodiment the topology determined by the determination module 96 is an RNN model, as shown in Fig. 8: the input of the RNN model is the output N(Tm, Fk) of the music frequency-band feature combination model together with the model's own output hm for the previous frame, and its output is the energy values N(Tm+1, Fk) of the next frame.
The training module 97 is used to train the music prediction model according to the output of the music frequency-band feature combination model and the topology determined by the determination module 96.
The automatic composition device above can realize automatic composition, improve the efficiency and feasibility of automatic composition, and reduce the influence of subjective factors on it. It embodies a brand-new automatic composing method that addresses the problems of low efficiency, poor feasibility and strong subjective influence in the prior art.
Fig. 11 is a structural schematic diagram of one embodiment of the terminal device of the application. The terminal device in the application can realize the automatic composing method provided by the application; it may be a client device or a server device, and the application does not limit its form. The terminal device may include: one or more processors; and a storage device for storing one or more programs; when the one or more programs are executed by the one or more processors, the one or more processors realize the automatic composing method provided by the application.
Fig. 11 shows a block diagram of an exemplary terminal device 12 suitable for implementing embodiments of the application. The terminal device 12 shown in Fig. 11 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the application.
As shown in Fig. 11, the terminal device 12 takes the form of a general-purpose computing device. Its components may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 connecting the different system components (including the system memory 28 and the processing unit 16).
The bus 18 represents one or more of several kinds of bus structures, including a memory bus or memory controller, a peripheral bus, a graphics acceleration port, a processor, or a local bus using any of a variety of bus architectures. By way of example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus and the Peripheral Component Interconnection (PCI) bus.
The terminal device 12 typically comprises a variety of computer-system-readable media. These media can be any usable media that can be accessed by the terminal device 12, including volatile and non-volatile media, removable and non-removable media.
The system memory 28 can include computer-system-readable media in the form of volatile memory, such as a random access memory (RAM) 30 and/or a cache memory 32. The terminal device 12 can further include other removable/non-removable, volatile/non-volatile computer-system storage media. By way of example only, a storage system 34 can be used to read and write non-removable, non-volatile magnetic media (not shown in Fig. 11 and commonly referred to as a "hard disk drive"). Although not shown in Fig. 11, a disk drive for reading and writing a removable non-volatile magnetic disk (such as a "floppy disk") and an optical disc drive for reading and writing a removable non-volatile optical disc (for example a Compact Disc Read Only Memory (CD-ROM), a Digital Video Disc Read Only Memory (DVD-ROM) or other optical media) can be provided. In these cases, each drive can be connected to the bus 18 through one or more data-media interfaces. The memory 28 can include at least one program product having a set of (for example, at least one) program modules configured to perform the functions of the embodiments of the application.
A program/utility 40 having a set of (at least one) program modules 42 can be stored, for example, in the memory 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules and program data; each of these examples, or some combination of them, may include an implementation of a network environment. The program modules 42 generally perform the automatic composing method in the embodiments described herein.
The terminal device 12 can also communicate with one or more external devices 14 (such as a keyboard, a pointing device or a display 24), with one or more devices that enable a user to interact with the terminal device 12, and/or with any device (such as a network card or a modem) that enables the terminal device 12 to communicate with one or more other computing devices. Such communication can be carried out through an input/output (I/O) interface 22. Moreover, the terminal device 12 can also communicate through a network adapter 20 with one or more networks (such as a Local Area Network (LAN), a Wide Area Network (WAN) and/or a public network such as the Internet). As shown in Fig. 11, the network adapter 20 communicates with the other modules of the terminal device 12 through the bus 18. It should be understood that, although not shown in Fig. 11, other hardware and/or software modules can be used in conjunction with the terminal device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives and data backup storage systems.
The processing unit 16 executes various functional applications and data processing by running the programs stored in the system memory 28, for example realizing the automatic composing method provided by the application.
It should be noted that in the description of the application the terms "first", "second" and so on are only used for descriptive purposes and cannot be understood as indicating or implying relative importance. In addition, in the description of the application, unless otherwise indicated, "multiple" means two or more.
Any process or method description in a flow chart, or otherwise described herein, can be understood as representing a module, fragment or portion of code that includes one or more executable instructions for realizing the steps of a specific logical function or process. The scope of the preferred embodiments of the application includes other realizations in which functions may be performed out of the order shown or discussed, including in a substantially simultaneous manner or in the reverse order according to the functions involved; this should be understood by those skilled in the field to which the embodiments of the application belong.
It should be appreciated that the parts of the application can be realized in hardware, software, firmware or a combination of them. In the embodiments above, multiple steps or methods can be realized with software or firmware that is stored in a memory and executed by a suitable instruction execution system. For example, if realized in hardware, as in another embodiment, they can be realized with any one of the following techniques known in the art, or a combination of them: a discrete logic circuit with logic gate circuits for realizing logic functions on data signals, an application-specific integrated circuit with suitable combinational logic gate circuits, a programmable gate array (PGA), a field programmable gate array (FPGA), and so on.
The application also provides a storage medium containing computer-executable instructions which, when executed by a computer processor, perform the automatic composing method provided by the application.
The storage medium containing computer-executable instructions can employ any combination of one or more computer-readable media. A computer-readable medium can be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example, but is not limited to, an electric, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection with one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium can be any tangible medium that contains or stores a program that can be used by, or in connection with, an instruction execution system, apparatus or device.
A computer-readable signal medium can include a data signal propagated in baseband or as part of a carrier wave, which carries computer-readable program code. Such a propagated data signal can take various forms, including, but not limited to, an electromagnetic signal, an optical signal or any suitable combination of the above. A computer-readable signal medium can also be any computer-readable medium other than a computer-readable storage medium; it can send, propagate or transmit a program for use by, or in connection with, an instruction execution system, apparatus or device.
The program code contained on a computer-readable medium can be transmitted by any appropriate medium, including, but not limited to, wireless, wire, optical cable, RF and so on, or any suitable combination of the above.
Computer program code for performing the operations of the application can be written in one or more programming languages or a combination of them, including object-oriented programming languages such as Java, Smalltalk or C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code can execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case involving a remote computer, the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or can be connected to an external computer (for example through the Internet using an Internet service provider).
Those skilled in the art can appreciate that all or part of the steps carried by the method of the embodiments above can be completed by instructing the relevant hardware through a program; the program can be stored in a computer-readable storage medium and, when executed, includes one of the steps of the method embodiment or a combination of them.
In addition, the functional modules in the embodiments of the application can be integrated in one processing module, or each module can exist physically alone, or two or more modules can be integrated in one module. The integrated module can be realized in the form of hardware or in the form of a software function module. If the integrated module is realized in the form of a software function module and sold or used as an independent product, it can also be stored in a computer-readable storage medium.
The storage medium mentioned above can be a read-only memory, a magnetic disk, an optical disc or the like.
In the description of this specification, descriptions referring to the terms "one embodiment", "some embodiments", "an example", "a specific example" or "some examples" mean that the specific features, structures, materials or characteristics described in connection with the embodiment or example are contained in at least one embodiment or example of the application. In this specification, schematic references to these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials or characteristics described can be combined in a suitable manner in any one or more embodiments or examples.
Although the embodiments of the application have been shown and described above, it can be understood that the embodiments above are exemplary and cannot be construed as limiting the application; within the scope of the application, one of ordinary skill in the art can change, modify, replace and vary the embodiments above.

Claims (16)

1. An automatic composing method, characterised by including:
receiving the music file of a preceding music segment from which prediction is to be made, the music file including the audio data or music description information of the preceding segment;
extracting frame-level audio features of the music corresponding to the music file;
obtaining frame-level audio features carrying frequency-band information according to the frame-level audio features and a pre-built music frequency-band feature combination model;
obtaining the predicted music according to the frame-level audio features carrying frequency-band information and a pre-built music prediction model, to realize automatic composition.
2. The method according to claim 1, characterised in that before obtaining the frame-level audio features carrying frequency-band information according to the frame-level audio features and the pre-built music frequency-band feature combination model, the method further includes:
collecting music files, and converting the music files to audio files of a common format;
extracting the frame-level audio features of the audio files;
determining the topology of the music frequency-band feature combination model;
training the music frequency-band feature combination model according to the determined topology and the frame-level audio features.
3. The method according to claim 2, characterised in that extracting the frame-level audio features of the audio file includes:
performing a Fast Fourier Transform with a fixed number of points on the audio file, frame by frame;
calculating the energy value of every frame of the audio file at each frequency point according to the result of the Fast Fourier Transform;
determining the note that each frame belongs to according to the energy values;
calculating the energy value of each note, and obtaining the frame-level audio features from the energy values of the notes.
4. The method according to claim 3, characterised in that determining the note that each frame belongs to according to the energy values includes:
at each frequency point, determining that the first frame and the second frame of the audio file belong to a first note;
judging whether the absolute value of a first difference is less than a second difference, the first difference being the difference between the energy value of the third frame of the audio file and the mean of the energy values of the first frame to the second frame, and the second difference being the difference between the maximum and minimum of the energy values of the first frame to the second frame;
if so, determining that the third frame of the audio file belongs to the first note, and then judging the fourth frame and onward in turn, up to the note ownership of the last frame.
5. The method according to claim 4, characterized in that, after judging whether the absolute value of the first difference is smaller than the second difference, the method further comprises:
if the absolute value of the first difference is greater than or equal to the second difference, taking the third frame of the audio file as the beginning of a second note, and determining that the fourth frame of the audio file belongs to the second note; and
judging, starting from the fifth frame of the audio file, whether the absolute value of a third difference is smaller than a fourth difference, wherein the third difference is the difference between the energy value of the fifth frame of the audio file and the average of the energy values of the third frame to the fourth frame of the audio file, and the fourth difference is the difference between the maximum and the minimum of the energy values of the third frame to the fourth frame of the audio file; and so on, until the note ownership of the last frame of the audio file is determined.
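Claims 4 and 5 together define a greedy, frame-by-frame note segmentation applied independently at each frequency point. A compact sketch for a single point follows; extending the two-frame comparison window illustrated in the claims to all frames already assigned to the current note is an assumption made for illustration.

```python
# Note segmentation per claims 4-5, for one frequency point.
# Assumes the input has at least two frames; window generalization noted above.
import numpy as np

def segment_notes(energy: np.ndarray) -> list:
    """Group frame indices (0-based) into notes for one frequency point."""
    notes = [[0, 1]]                              # the first two frames seed the first note
    k = 2
    while k < len(energy):
        current = energy[notes[-1]]
        # |E(k) - mean(current note)| vs. (max - min) of the current note.
        if abs(energy[k] - current.mean()) < current.max() - current.min():
            notes[-1].append(k)                   # the frame joins the current note
            k += 1
        else:
            # The frame opens a new note, and the next frame joins it directly.
            notes.append([k, k + 1] if k + 1 < len(energy) else [k])
            k += 2
    return notes
```

Applied to each column of the energies array from the previous sketch, this yields the per-frame note ownership that claim 6 then aggregates.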
6. The method according to claim 3, characterized in that calculating the energy value of each note and obtaining the frame-level audio features according to the energy value of each note comprises:
calculating the average energy value of all frames contained in each note, as the energy value of the note;
normalizing the energy value of every frame contained in each note to the energy value of the note it belongs to; and
filtering out the notes whose energy value is lower than a predetermined threshold, so as to obtain the frame-level audio features.
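The three steps of claim 6 reduce frame energies to note-level features. A sketch follows; the default threshold value and the choice to zero out filtered notes (rather than drop their frames) are assumptions.

```python
# Note-level features per claim 6: per-note average energy, per-frame
# normalization, and threshold filtering, under the assumptions above.
import numpy as np

def note_level_features(energy: np.ndarray, notes: list, threshold: float = 1e-3) -> np.ndarray:
    """Map each frame's energy (one frequency point) to its note's energy."""
    features = np.zeros_like(energy, dtype=float)
    for frames in notes:
        note_energy = energy[frames].mean()       # average energy of the note's frames
        if note_energy >= threshold:              # keep only sufficiently strong notes
            features[frames] = note_energy        # normalize frames to the note's energy
    return features
```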
7. The method according to claim 1, characterized in that, before obtaining the predicted music according to the frame-level audio features carrying the band information and the pre-built music prediction model, the method further comprises:
determining a topological structure of the music prediction model; and
training the music prediction model according to the output of the music band feature binding model and the determined topological structure.
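Since claim 7 trains the prediction model on the binding model's output, the training might be sketched as below. The Keras framework, the LSTM topology, the tensor shapes and the next-frame regression objective are all assumptions, and the random arrays merely stand in for the binding model's output on a real corpus; the patent leaves topology and training objective open.

```python
# A hedged training sketch for the music prediction model of claim 7.
import numpy as np
from tensorflow import keras

band_dim, feature_dim = 32, 64
banded = np.random.rand(8, 100, band_dim).astype("float32")     # (songs, frames, band features)
targets = np.random.rand(8, 100, feature_dim).astype("float32")  # per-frame prediction targets

prediction_model = keras.Sequential([
    keras.Input(shape=(None, band_dim)),
    keras.layers.LSTM(128, return_sequences=True),  # assumed recurrent topology
    keras.layers.Dense(feature_dim),                # per-frame feature prediction
])
prediction_model.compile(optimizer="adam", loss="mse")
prediction_model.fit(banded, targets, epochs=10)    # train the music prediction model
```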
8. An automatic composition device, characterized by comprising:
a receiving module, configured to receive a music file of a preceding music segment on which prediction is to be based, wherein the music file of the preceding music segment comprises audio data or music description information of the preceding music segment;
an extraction module, configured to extract frame-level audio features of the music corresponding to the music file received by the receiving module; and
an obtaining module, configured to obtain frame-level audio features carrying band information according to the frame-level audio features and a pre-built music band feature binding model, and to obtain predicted music according to the frame-level audio features carrying the band information and a pre-built music prediction model, so as to realize automatic composition.
9. The device according to claim 8, characterized by further comprising a collection module, a conversion module, a determination module and a training module, wherein:
the collection module is configured to collect music files before the obtaining module obtains the frame-level audio features carrying the band information;
the conversion module is configured to convert the music files collected by the collection module into audio files of the same format;
the extraction module is further configured to extract frame-level audio features of the audio files converted by the conversion module;
the determination module is configured to determine a topological structure of the music band feature binding model; and
the training module is configured to train the music band feature binding model according to the topological structure determined by the determination module and the frame-level audio features extracted by the extraction module.
10. The device according to claim 9, characterized in that the extraction module comprises:
a transform submodule, configured to perform a fast Fourier transform with a fixed number of points on the audio file frame by frame;
a calculation submodule, configured to calculate, according to the result of the fast Fourier transform performed by the transform submodule, an energy value of each frame of the audio file at each frequency point;
a determination submodule, configured to determine the note ownership of each frame according to the energy values calculated by the calculation submodule;
the calculation submodule being further configured to calculate an energy value of each note; and
an acquisition submodule, configured to obtain frame-level audio features according to the energy value of each note calculated by the calculation submodule.
11. The device according to claim 10, characterized in that the determination submodule comprises:
a note determining unit, configured to determine, at each frequency point, that the first frame and the second frame of the audio file belong to a first note; and
a judging unit, configured to judge whether the absolute value of a first difference is smaller than a second difference, wherein the first difference is the difference between the energy value of the third frame of the audio file and the average of the energy values of the first frame to the second frame, and the second difference is the difference between the maximum and the minimum of the energy values of the first frame to the second frame;
the note determining unit being further configured to determine, when the absolute value of the first difference is smaller than the second difference, that the third frame of the audio file belongs to the first note, and then to judge the fourth frame and each subsequent frame in turn until the note ownership of the last frame is determined.
12. The device according to claim 11, characterized in that:
the note determining unit is further configured to, when the absolute value of the first difference is greater than or equal to the second difference, take the third frame of the audio file as the beginning of a second note, and determine that the fourth frame of the audio file belongs to the second note; and
the judging unit is further configured to judge, starting from the fifth frame of the audio file, whether the absolute value of a third difference is smaller than a fourth difference, wherein the third difference is the difference between the energy value of the fifth frame of the audio file and the average of the energy values of the third frame to the fourth frame of the audio file, and the fourth difference is the difference between the maximum and the minimum of the energy values of the third frame to the fourth frame of the audio file, and so on, until the note ownership of the last frame of the audio file is determined.
13. The device according to claim 10, characterized in that:
the calculation submodule is specifically configured to calculate the average energy value of all frames contained in each note as the energy value of the note, and to normalize the energy value of every frame contained in each note to the energy value of the note it belongs to; and
the acquisition submodule is specifically configured to filter out the notes whose energy value is lower than a predetermined threshold, so as to obtain the frame-level audio features.
14. The device according to claim 8, characterized by further comprising a determination module and a training module, wherein:
the determination module is configured to determine a topological structure of the music prediction model before the obtaining module obtains the predicted music; and
the training module is configured to train the music prediction model according to the output of the music band feature binding model and the topological structure determined by the determination module.
15. A terminal device, characterized by comprising:
one or more processors; and
a storage device, configured to store one or more programs,
wherein, when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method according to any one of claims 1 to 7.
16. A storage medium containing computer-executable instructions which, when executed by a computer processor, are used to perform the method according to any one of claims 1 to 7.
CN201710175115.8A 2017-03-22 2017-03-22 Automatic composition method and device and terminal equipment Active CN107045867B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710175115.8A CN107045867B (en) 2017-03-22 2017-03-22 Automatic composition method and device and terminal equipment

Publications (2)

Publication Number Publication Date
CN107045867A true CN107045867A (en) 2017-08-15
CN107045867B CN107045867B (en) 2020-06-02

Family

ID=59544865

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710175115.8A Active CN107045867B (en) 2017-03-22 2017-03-22 Automatic composition method and device and terminal equipment

Country Status (1)

Country Link
CN (1) CN107045867B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040089142A1 (en) * 2002-11-12 2004-05-13 Alain Georges Systems and methods for creating, modifying, interacting with and playing musical compositions
US20110161078A1 (en) * 2007-03-01 2011-06-30 Microsoft Corporation Pitch model for noise estimation
CN104050972A (en) * 2013-03-14 2014-09-17 雅马哈株式会社 Sound signal analysis apparatus, sound signal analysis method and sound signal analysis program
CN104282300A (en) * 2013-07-05 2015-01-14 中国移动通信集团公司 Non-periodic component syllable model building and speech synthesizing method and device
CN105374347A (en) * 2015-09-22 2016-03-02 中国传媒大学 A mixed algorithm-based computer-aided composition method for popular tunes in regions south of the Yangtze River
US20170046973A1 (en) * 2015-10-27 2017-02-16 Thea Kuddo Preverbal elemental music: multimodal intervention to stimulate auditory perception and receptive language acquisition

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108538301A (en) * 2018-02-13 2018-09-14 吟飞科技(江苏)有限公司 A kind of intelligent digital musical instrument based on neural network Audiotechnica
CN108538301B (en) * 2018-02-13 2021-05-07 吟飞科技(江苏)有限公司 Intelligent digital musical instrument based on neural network audio technology
CN109192187A (en) * 2018-06-04 2019-01-11 平安科技(深圳)有限公司 Composing method, system, computer equipment and storage medium based on artificial intelligence
CN110660375A (en) * 2018-06-28 2020-01-07 北京搜狗科技发展有限公司 Method, device and equipment for generating music
CN110660375B (en) * 2018-06-28 2024-06-04 北京搜狗科技发展有限公司 Method, device and equipment for generating music
CN109285560A (en) * 2018-09-28 2019-01-29 北京奇艺世纪科技有限公司 A kind of music features extraction method, apparatus and electronic equipment
CN109285560B (en) * 2018-09-28 2021-09-03 北京奇艺世纪科技有限公司 Music feature extraction method and device and electronic equipment
CN109727590A (en) * 2018-12-24 2019-05-07 成都嗨翻屋科技有限公司 Music generating method and device based on Recognition with Recurrent Neural Network
CN109872709A (en) * 2019-03-04 2019-06-11 湖南工程学院 A kind of new bent generation method of the low similarity based on note complex network

Also Published As

Publication number Publication date
CN107045867B (en) 2020-06-02

Similar Documents

Publication Publication Date Title
CN107045867A (en) Automatic composing method, device and terminal device
CN107221326B (en) Voice awakening method and device based on artificial intelligence and computer equipment
CN109859772B (en) Emotion recognition method, emotion recognition device and computer-readable storage medium
KR102128926B1 (en) Method and device for processing audio information
CN105702250B (en) Speech recognition method and device
CN107134279A (en) A kind of voice awakening method, device, terminal and storage medium
CN108573694A (en) Language material expansion and speech synthesis system construction method based on artificial intelligence and device
CN109036396A (en) A kind of exchange method and system of third-party application
CN111081280B (en) Text-independent speech emotion recognition method and device and emotion recognition algorithm model generation method
CN108281138A (en) Age discrimination model training and intelligent sound exchange method, equipment and storage medium
CN108269567A (en) For generating the method, apparatus of far field voice data, computing device and computer readable storage medium
CN108922564A (en) Emotion identification method, apparatus, computer equipment and storage medium
CN112017650B (en) Voice control method and device of electronic equipment, computer equipment and storage medium
CN110459207A (en) Wake up the segmentation of voice key phrase
CN111192594B (en) Method for separating voice and accompaniment and related product
WO2023116660A2 (en) Model training and tone conversion method and apparatus, device, and medium
CN107978308A (en) A kind of K songs methods of marking, device, equipment and storage medium
CN113053410B (en) Voice recognition method, voice recognition device, computer equipment and storage medium
CN109785846A (en) The role recognition method and device of the voice data of monophonic
CN112948623B (en) Music heat prediction method, device, computing equipment and medium
EP4033483A2 (en) Method and apparatus for testing vehicle-mounted voice device, electronic device and storage medium
WO2021227308A1 (en) Video resource generation method and apparatus
CN112562725A (en) Mixed voice emotion classification method based on spectrogram and capsule network
CN112420079B (en) Voice endpoint detection method and device, storage medium and electronic equipment
CN106228976A (en) Audio recognition method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant