CN113053336A - Method, device and equipment for generating musical composition and storage medium - Google Patents

Method, device and equipment for generating musical composition and storage medium

Info

Publication number
CN113053336A
CN113053336A (application CN202110285844.5A)
Authority
CN
China
Prior art keywords
vector
preset
quantization
query
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110285844.5A
Other languages
Chinese (zh)
Inventor
刘奡智
党艺飞
韩宝强
肖京
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN202110285844.5A
Publication of CN113053336A
Legal status: Pending

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H 1/00 Details of electrophonic musical instruments
    • G10H 1/0008 Associated control or indicating means
    • G10H 1/0025 Automatic or semi-automatic music composition, e.g. producing random music, applying rules from music theory or modifying a musical piece
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L 21/003 Changing voice quality, e.g. pitch or formants
    • G10L 21/007 Changing voice quality, e.g. pitch or formants characterised by the process used
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H 2210/00 Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H 2210/101 Music Composition or musical creation; Tools or processes therefor
    • G10H 2210/111 Automatic composing, i.e. using predefined musical rules

Abstract

The invention relates to the technical field of artificial intelligence and discloses a method, an apparatus, a device and a storage medium for generating musical compositions, which generate musical compositions from original audio data according to a preset vector quantized variational autoencoder (VQ-VAE) model, improving music generation efficiency, improving pitch and rhythm accuracy, and giving the compositions greater uniqueness and expressiveness. The method for generating a musical composition comprises the following steps: acquiring original data, wherein the original data is the audio data to be processed; invoking a preset autoregressive discrete autoencoder to perform feature extraction on the original data and generate a query vector, wherein the query vector is used for querying key information; quantizing the query vector to obtain a quantized vector, wherein the quantized vector comprises numerically quantized pitch and rhythm information; invoking the preset VQ-VAE model, substituting the query vector and the quantized vector into a preset formula, and calculating the target data; and inputting the target data into a preset decoder to generate the musical composition.

Description

Method, device and equipment for generating musical composition and storage medium
Technical Field
The present invention relates to the field of audio conversion, and in particular, to a method, an apparatus, a device, and a storage medium for generating musical compositions.
Background
Music is an auditory image formed from organized sound, an artistic form that expresses human thought and emotion and the reality of social life; it is far more than the mechanical output of rhythm and pitch. The effect of music derives largely from its expressive appeal, which comes from the unique way each performer processes and plays a piece: different performers bring their own experience and understanding to the music, producing a moving musical effect.
In the prior art, music is generated with the symbolic score as input. This neglects the uniqueness of emotion and expression in musical performance, offers limited control over rhythm accuracy and pitch, and can only handle a limited variety of instruments, resulting in a loss of musical effect.
Disclosure of Invention
The invention provides a method for generating musical compositions, which generates musical compositions from original audio data according to a preset vector quantized variational autoencoder (VQ-VAE) model, improving music generation efficiency, improving pitch and rhythm accuracy, and making the compositions more unique and expressive.
The first aspect of the present invention provides a method for generating a musical composition, comprising: acquiring original data, wherein the original data is audio data to be processed; calling a preset autoregressive discrete autoencoder to extract features of the original data to generate a query vector, wherein the query vector is used for querying key information; quantizing the query vector to obtain a quantized vector, wherein the quantized vector comprises numerical quantized pitch and rhythm information; calling a preset vector quantization variation automatic coding VQ-VAE model, substituting the query vector and the quantization vector into a preset formula, and calculating to obtain target data; and inputting the target data into a preset decoder to generate the musical composition.
Optionally, in a first implementation manner of the first aspect of the present invention, the invoking a preset vector quantization variation automatic coding VQ-VAE model, substituting the query vector and the quantization vector into a preset formula, and calculating and obtaining target data includes: calculating and obtaining target data according to a preset formula based on the query vector and the quantization vector, wherein the preset formula is as follows:
L_VQ-VAE = -log p(x|q') + (q' - [q])^2 + β([q'] - q)^2; wherein L_VQ-VAE is the loss function, p(x|q') is the probability that the quantized vector q' occurs in the case of class x, q is the query vector, q' is the quantized vector, and β is the weight applied to the averaged squared difference between the query vector and the quantized vector.
Optionally, in a second implementation manner of the first aspect of the present invention, the invoking a preset autoregressive discrete autoencoder to perform feature extraction on the original data to generate a query vector, where the query vector is used for querying key information, includes: inputting the original data into a preset autoregressive discrete autoencoder, wherein the preset autoregressive discrete autoencoder directly takes the original data as a learning object; extracting features of the original data to obtain a plurality of target features, and converting the plurality of target features into initial vectors based on a preset algorithm; and filtering the initial vectors to generate a query vector, wherein the query vector is used for querying key information.
Optionally, in a third implementation manner of the first aspect of the present invention, the performing feature extraction on the original data to obtain a plurality of target features, and converting the plurality of target features into an initial vector based on a preset algorithm includes: extracting features of the original data based on a preset autoregressive discrete autoencoder to obtain a plurality of initial features; calling a preset music knowledge base to carry out normalization processing on the plurality of initial features to obtain a plurality of target features, wherein the plurality of target features comprise pitch, rhythm, speed and timbre; and converting the target characteristics according to a preset algorithm to obtain an initial vector.
Optionally, in a fourth implementation manner of the first aspect of the present invention, the quantizing the query vector to obtain a quantized vector, where the quantized vector includes pitch and tempo information of number quantization, and the quantizing includes: randomly selecting a vector from the query vectors as a base vector; in each iteration, randomly selecting an iteration vector, calculating the distance between the iteration vector and the basic vector, determining cluster marks, if the cluster marks are equal, reducing the distance between the basic vector and the iteration vector, and if the cluster marks are not equal, increasing the distance between the basic vector and the iteration vector; and when the preset iteration times are reached, generating a quantization vector by taking the current basic vector as a final result, wherein the quantization vector comprises the pitch and rhythm information of number quantization.
Optionally, in a fifth implementation manner of the first aspect of the present invention, the inputting the target data into a preset decoder to generate a musical piece includes: calling a modulator in a preset decoder to read the target data, wherein the preset decoder comprises the modulator and a preset local music model; and combining the target data with the preset local music model based on the modulator to generate the musical composition.
Optionally, in a sixth implementation manner of the first aspect of the present invention, before the acquiring the original data, the method further includes: and constructing a preset music knowledge base, wherein the preset music knowledge base comprises basic element information of music.
A second aspect of the present invention provides an apparatus for generating a musical composition, comprising: the acquisition module is used for acquiring original data, wherein the original data is audio data to be processed; the characteristic extraction module is used for calling a preset autoregressive discrete autoencoder to perform characteristic extraction on the original data to generate a query vector, and the query vector is used for querying key information; the quantization module is used for performing quantization processing on the query vector to obtain a quantization vector, and the quantization vector comprises number-quantized pitch and rhythm information; the calculation module is used for calling a preset vector quantization variation automatic coding VQ-VAE model, substituting the query vector and the quantization vector into a preset formula, and calculating and obtaining target data; and the generating module is used for inputting the target data into a preset decoder to generate the musical composition.
Optionally, in a first implementation manner of the second aspect of the present invention, the calculation module is specifically configured to: calculate and obtain target data according to a preset formula based on the query vector and the quantization vector, wherein the preset formula is as follows: L_VQ-VAE = -log p(x|q') + (q' - [q])^2 + β([q'] - q)^2; wherein L_VQ-VAE is the loss function, p(x|q') is the probability that the quantized vector q' occurs in the case of class x, q is the query vector, q' is the quantized vector, and β is the weight applied to the averaged squared difference between the query vector and the quantized vector.
Optionally, in a second implementation manner of the second aspect of the present invention, the feature extraction module includes: an input unit, configured to input the original data into a preset autoregressive discrete autoencoder, wherein the preset autoregressive discrete autoencoder directly takes the original data as a learning object; a feature extraction unit, configured to extract features from the original data to obtain a plurality of target features, and to convert the plurality of target features into initial vectors based on a preset algorithm; and a filtering unit, configured to filter the initial vectors to generate a query vector, wherein the query vector is used for querying key information.
Optionally, in a third implementation manner of the second aspect of the present invention, the feature extraction unit is specifically configured to: extracting features of the original data based on a preset autoregressive discrete autoencoder to obtain a plurality of initial features; calling a preset music knowledge base to carry out normalization processing on the plurality of initial features to obtain a plurality of target features, wherein the plurality of target features comprise pitch, rhythm, speed and timbre; and converting the target characteristics according to a preset algorithm to obtain an initial vector.
Optionally, in a fourth implementation manner of the second aspect of the present invention, the quantization module includes: a random selection unit, configured to randomly select a vector from the query vectors as a base vector; a calculating unit, configured to randomly select an iteration vector in each iteration, calculate the distance between the iteration vector and the base vector, and determine the cluster marks, where if the cluster marks are equal, the distance between the base vector and the iteration vector is decreased, and if the cluster marks are not equal, the distance is increased; and a first generation unit, configured to generate a quantized vector by taking the current base vector as the final result when a preset number of iterations is reached, wherein the quantized vector comprises the numerically quantized pitch and rhythm information.
Optionally, in a fifth implementation manner of the second aspect of the present invention, the generating module includes: the reading unit is used for calling a modulator in a preset decoder to read the target data, and the preset decoder comprises the modulator and a preset local music model; and the second generation unit is used for combining the target data with the preset local music model based on the modulator to generate the musical composition.
Optionally, in a sixth implementation manner of the second aspect of the present invention, before the acquiring the original data, the apparatus further includes: the building module is used for building a preset music knowledge base, and the preset music knowledge base comprises basic element information of music.
A third aspect of the present invention provides a musical piece generating apparatus comprising: a memory and at least one processor, the memory having instructions stored therein; the at least one processor invokes the instructions in the memory to cause the musical piece generating apparatus to perform the above-described musical piece generating method.
A fourth aspect of the present invention provides a computer-readable storage medium having stored therein instructions, which when run on a computer, cause the computer to execute the above-described musical piece generation method.
According to the technical scheme provided by the invention, original data are obtained, wherein the original data are audio data to be processed; calling a preset autoregressive discrete autoencoder to extract features of the original data to generate a query vector, wherein the query vector is used for querying key information; quantizing the query vector to obtain a quantized vector, wherein the quantized vector comprises numerical quantized pitch and rhythm information; calling a preset vector quantization variation automatic coding VQ-VAE model, substituting the query vector and the quantization vector into a preset formula, and calculating to obtain target data; and inputting the target data into a preset decoder to generate the musical composition. In the embodiment of the invention, the musical works are generated according to the preset vector quantization variation automatic coding VQ-VAE model and the original audio data, so that the music generation efficiency is improved, the accuracy of pitch and rhythm is improved, and the musical works have more uniqueness and expressive force.
Drawings
FIG. 1 is a diagram of an embodiment of a method for generating a musical composition according to an embodiment of the present invention;
FIG. 2 is a diagram of another embodiment of a method for generating a musical composition according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of an embodiment of a musical composition generating apparatus according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of another embodiment of a musical composition generating apparatus according to an embodiment of the present invention;
FIG. 5 is a diagram of an embodiment of a musical composition generating apparatus according to an embodiment of the present invention.
Detailed Description
The embodiments of the present invention provide a method, an apparatus, a device and a storage medium for generating musical compositions, which generate musical compositions from original audio data according to a preset vector quantized variational autoencoder (VQ-VAE) model, improving music generation efficiency and the uniqueness and expressiveness of the musical compositions.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein. Furthermore, the terms "comprises," "comprising," or "having," and any variations thereof, are intended to cover non-exclusive inclusions, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
For convenience of understanding, a specific flow of an embodiment of the present invention is described below, and referring to fig. 1, an embodiment of a method for generating a musical composition according to an embodiment of the present invention includes:
101. and acquiring original data, wherein the original data is audio data to be processed.
The server acquires original data, where the original data is the audio data to be processed. The server takes the original data as input, and the encoder learns directly from this raw audio, which may contain multiple musical styles, rather than learning only from musical symbols as in traditional music generation techniques. A sketch of this acquisition step follows.
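As a purely illustrative sketch of this acquisition step (the file path, sample rate and the use of the librosa library are assumptions, not part of the patent), loading raw audio for the encoder might look like:

```python
# A minimal sketch of acquiring original data: loading the raw audio to be
# processed. The file path and target sample rate are hypothetical.
import librosa

def load_original_data(path: str, sample_rate: int = 16000):
    # librosa.load decodes the file, resamples it and returns a float array
    audio, sr = librosa.load(path, sr=sample_rate, mono=True)
    return audio, sr

audio, sr = load_original_data("performance.wav")  # hypothetical input file
print(audio.shape, sr)
```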
It is to be understood that the execution subject of the present invention may be a musical composition generation apparatus, and may also be a terminal or a server, which is not limited herein. The embodiment of the present invention is described by taking a server as an execution subject.
102. And calling a preset autoregressive discrete autoencoder to perform feature extraction on the original data to generate a query vector, wherein the query vector is used for querying key information.
The server invokes a preset autoregressive discrete autoencoder to perform feature extraction on the original data and generate a query vector, which is used for querying key information. Specifically, the server inputs the original data into the preset autoregressive discrete autoencoder, which directly takes the original data as its learning object; the server extracts features from the original data to obtain a plurality of target features and converts them into initial vectors based on a preset algorithm; and the server filters the initial vectors to generate the query vector, which is used for querying the key information.
The feature extraction process is mainly based on the principal component analysis (PCA) algorithm. PCA projects high-dimensional data into a low-dimensional space through a linear transformation, and the dimensions removed by PCA correspond to noise or redundant data: denoising removes the eigenvectors associated with small eigenvalues, since the magnitude of an eigenvalue reflects the amplitude of variation along its eigenvector direction after the transformation, while redundancy elimination removes linearly correlated vectors that can be represented by other vectors. Through this process, key information is extracted and redundant information is filtered out to generate the query vector. The query vector makes it convenient to extract key information from the music data and to split the music into smaller parts, so that learning can be performed with minimal computing power and the hardware requirements are lowered. For example, music data may be divided into categories such as pitch, rhythm, harmony and speed, and key information belonging to these categories may be extracted from the original data to generate the query vector; a sketch of the PCA projection follows.
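A minimal sketch of the PCA projection described above, assuming the extracted features are arranged as rows of a matrix (the shapes and component count are illustrative assumptions):

```python
# A minimal sketch of PCA-based feature extraction. The feature matrix and
# the number of retained components are hypothetical.
import numpy as np

def pca_query_vectors(features: np.ndarray, n_components: int) -> np.ndarray:
    # Center the data, then diagonalize the covariance matrix
    centered = features - features.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Keep the directions with the largest eigenvalues; small eigenvalues
    # correspond to the noise/redundancy that the denoising step removes
    top = np.argsort(eigvals)[::-1][:n_components]
    return centered @ eigvecs[:, top]

features = np.random.randn(128, 40)     # 128 frames, 40 raw features (dummy)
query = pca_query_vectors(features, 8)  # project onto 8 principal directions
print(query.shape)                      # (128, 8)
```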
103. And quantizing the query vector to obtain a quantized vector, wherein the quantized vector comprises the pitch and rhythm information of number quantization.
The server quantizes the query vector to obtain a quantized vector, where the quantized vector comprises numerically quantized pitch and rhythm information. Specifically, the server randomly selects one vector from the query vectors as the base vector. In each iteration, the server randomly selects an iteration vector, calculates the distance between the iteration vector and the base vector, and determines the cluster marks: if the cluster marks are equal, the distance between the base vector and the iteration vector is decreased, and if the cluster marks are not equal, the distance is increased. When the preset number of iterations is reached, the server takes the current base vector as the final result and generates the quantized vector, which comprises the numerically quantized pitch and rhythm information. This update rule is sketched below.
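The iterative procedure above can be sketched as follows; the cluster marks (labels), learning rate and iteration count are assumptions used only for illustration:

```python
# A minimal sketch of the iterative quantization step described above.
# Cluster labels, learning rate and iteration count are assumptions.
import numpy as np

def quantize(query_vectors: np.ndarray, labels: np.ndarray,
             n_iters: int = 1000, lr: float = 0.05) -> np.ndarray:
    rng = np.random.default_rng(0)
    base_idx = rng.integers(len(query_vectors))   # random base vector
    base = query_vectors[base_idx].copy()
    base_label = labels[base_idx]
    for _ in range(n_iters):
        i = rng.integers(len(query_vectors))      # random iteration vector
        v = query_vectors[i]
        if labels[i] == base_label:
            base += lr * (v - base)               # equal marks: move closer
        else:
            base -= lr * (v - base)               # unequal marks: move away
    return base                                    # final quantized vector

vecs = np.random.randn(200, 8)                     # dummy query vectors
labs = np.random.randint(0, 4, size=200)           # hypothetical cluster marks
print(quantize(vecs, labs).shape)                  # (8,)
```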
104. And calling a preset vector quantization variation automatic coding VQ-VAE model, substituting the query vector and the quantization vector into a preset formula, and calculating to obtain target data.
The server invokes a preset vector quantized variational autoencoder (VQ-VAE) model, substitutes the query vector and the quantized vector into a preset formula, and calculates the target data. Specifically, the server calculates the target data according to the preset formula based on the query vector and the quantized vector, where the preset formula is as follows:
L_VQ-VAE = -log p(x|q') + (q' - [q])^2 + β([q'] - q)^2; wherein L_VQ-VAE is the loss function, p(x|q') is the probability that the quantized vector q' occurs in the case of class x, q is the query vector, q' is the quantized vector, [·] denotes the stop-gradient operation, and β is the weight applied to the averaged squared difference between the query vector and the quantized vector.
Measuring the log conditional probability of the input quantized vector reduces fluctuation, making the characteristics of the input quantized vector more stable, so that structure can be preserved in music of longer duration; calculating the squared differences between the query vector and the quantized vector reduces the deviation between the finally generated data and the ideal data. Vector quantization (VQ) is a supervised neural-network classification method with a simple structure and powerful capability. As a nearest-neighbor prototype classifier, it continuously updates the weight vectors of the neurons and adjusts their learning rates during training, so that the boundaries between the weight vectors of different classes gradually converge to the Bayes classification boundary. In this algorithm, the nearest-neighbor weight vector is selected by calculating the distance between the input sample and each weight vector. A sketch of the loss computation follows.
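A minimal sketch of the preset formula, assuming a PyTorch setting in which detach() plays the role of the bracket operator [·] (stop-gradient) and recon_nll stands in for -log p(x|q'):

```python
# A minimal sketch of the preset VQ-VAE loss, assuming PyTorch tensors.
# recon_nll stands in for -log p(x | q'); its exact form is an assumption.
import torch

def vq_vae_loss(recon_nll: torch.Tensor, q: torch.Tensor,
                q_prime: torch.Tensor, beta: float = 0.25) -> torch.Tensor:
    # (q' - [q])^2 : pull the quantized vector toward the (frozen) query vector
    codebook_term = ((q_prime - q.detach()) ** 2).mean()
    # beta * ([q'] - q)^2 : commit the query vector to the (frozen) code
    commitment_term = beta * ((q_prime.detach() - q) ** 2).mean()
    return recon_nll + codebook_term + commitment_term

q = torch.randn(8, 64, requires_grad=True)        # query vectors (dummy)
q_prime = torch.randn(8, 64, requires_grad=True)  # quantized vectors (dummy)
loss = vq_vae_loss(torch.tensor(1.0), q, q_prime)
print(loss.item())
```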
105. And inputting the target data into a preset decoder to generate the musical composition.
The server inputs the target data into a preset decoder to generate the musical composition. Specifically, the server calls a modulator in a preset decoder to read target data, wherein the preset decoder comprises the modulator and a preset local music model; the server combines the target data with a preset local music model based on the modulator to generate a musical composition.
The decoder exists because audio and video data are stored in compressed form, as the data volume would otherwise be too large; compression requires a coding scheme that stores the audio and video data at the highest quality in the smallest capacity, so when the data needs to be played it is decoded by a decoder. Examples of digital coding formats that can be decoded are Audio Coding 3 (AC-3), High Definition Compatible Digital (HDCD) and Digital Theater Systems (DTS), which are multi-channel audio-video coding formats; to reach a high-fidelity level there is also two-channel pulse-code modulation (PCM) digital coding. The modulator is a part of the decoder: preset musical performance styles are stored in the decoder and combined with the target data to finally generate a musical composition with performance characteristics, as sketched below.
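As a purely illustrative sketch of this decoding step (the Modulator and LocalMusicModel names, and the envelope-based combination, are hypothetical and not the patent's actual implementation):

```python
# A purely illustrative sketch of combining target data with a preset local
# music model inside a decoder. All class and method names are hypothetical.
from dataclasses import dataclass
import numpy as np

@dataclass
class LocalMusicModel:
    # A stored performance style, represented here as a simple gain envelope
    style_envelope: np.ndarray

class Modulator:
    def __init__(self, model: LocalMusicModel):
        self.model = model

    def read(self, target_data: np.ndarray) -> np.ndarray:
        # Combine decoded target data with the stored performance style by
        # applying the style envelope (a stand-in for real modulation)
        n = min(len(target_data), len(self.model.style_envelope))
        return target_data[:n] * self.model.style_envelope[:n]

model = LocalMusicModel(style_envelope=np.linspace(0.5, 1.0, 16000))
modulator = Modulator(model)
composition = modulator.read(np.random.randn(16000))  # dummy target data
print(composition.shape)
```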
In the embodiment of the invention, the musical works are generated according to the preset vector quantization variation automatic coding VQ-VAE model and the original audio data, so that the music generation efficiency is improved, the accuracy of pitch and rhythm is improved, and the musical works have more uniqueness and expressive force.
Referring to fig. 2, another embodiment of the method for generating a musical composition according to an embodiment of the present invention includes:
201. and acquiring original data, wherein the original data is audio data to be processed.
The server acquires original data, where the original data is the audio data to be processed. The server takes the original data as input, and the encoder learns directly from this raw audio, which may contain multiple musical styles, rather than learning only from musical symbols as in traditional music generation techniques.
It is to be understood that the execution subject of the present invention may be a musical composition generation apparatus, and may also be a terminal or a server, which is not limited herein. The embodiment of the present invention is described by taking a server as an execution subject.
202. And inputting the original data into a preset autoregressive discrete autoencoder, wherein the preset autoregressive discrete autoencoder directly takes the original data as a learning object.
The server inputs the original data into a preset autoregressive discrete autoencoder, which takes the original data as its learning object and extracts high-order common features from the input, thereby reducing the demands on the data processing capacity of any single decoder. A sketch of such an encoder follows.
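For intuition, a minimal sketch of such an encoder is shown below; the convolutional architecture, channel counts and strides are assumptions, since the patent does not specify them:

```python
# A minimal sketch of an encoder that maps raw audio to a sequence of latent
# query vectors. The architecture, channel counts and strides are assumptions.
import torch
import torch.nn as nn

class AudioEncoder(nn.Module):
    def __init__(self, latent_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv1d(64, latent_dim, kernel_size=4, stride=2, padding=1),
        )

    def forward(self, audio: torch.Tensor) -> torch.Tensor:
        # audio: (batch, 1, samples) -> latents: (batch, latent_dim, frames)
        return self.net(audio)

encoder = AudioEncoder()
latents = encoder(torch.randn(2, 1, 16000))  # two dummy one-second clips
print(latents.shape)                          # (2, 64, 2000)
```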
203. And extracting features of the original data to obtain a plurality of target features, and converting the plurality of target features into initial vectors based on a preset algorithm.
The server extracts the features of the original data to obtain a plurality of target features, and converts the plurality of target features into initial vectors based on a preset algorithm. Specifically, the server performs feature extraction on original data based on a preset autoregressive discrete autoencoder to obtain a plurality of initial features; the server calls a preset music knowledge base to carry out normalization processing on the plurality of initial features to obtain a plurality of target features, wherein the plurality of target features comprise pitch, rhythm, speed and timbre; and the server converts the target characteristics according to a preset algorithm to obtain an initial vector.
The feature extraction process is mainly based on the principal component analysis (PCA) algorithm. PCA projects high-dimensional data into a low-dimensional space through a linear transformation, and the dimensions removed by PCA correspond to noise or redundant data: denoising removes the eigenvectors associated with small eigenvalues, since the magnitude of an eigenvalue reflects the amplitude of variation along its eigenvector direction after the transformation, while redundancy elimination removes linearly correlated vectors that can be represented by other vectors.
For example, the duration and intensity of notes are normalized into the rhythm of the music, and the pace of the music's progression is normalized into its speed; calling the preset music knowledge base to normalize the plurality of initial features helps subsequent identification and grouping and avoids redundancy. The algorithms for converting the plurality of target features into initial vectors include the word2vec text-vectorization algorithm, sketched below.
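A minimal sketch of this vectorization step, assuming the normalized target features have been discretized into tokens and using gensim's Word2Vec as one possible realization of the "preset algorithm":

```python
# A minimal sketch of converting normalized target features into initial
# vectors with word2vec. The token scheme and model parameters are assumptions.
from gensim.models import Word2Vec

# Each "sentence" is a sequence of discretized feature tokens for one segment
sentences = [
    ["pitch_C4", "rhythm_quarter", "speed_allegro", "timbre_piano"],
    ["pitch_E4", "rhythm_eighth", "speed_allegro", "timbre_piano"],
    ["pitch_G4", "rhythm_quarter", "speed_andante", "timbre_violin"],
]

model = Word2Vec(sentences, vector_size=16, window=2, min_count=1, epochs=50)
initial_vector = model.wv["pitch_C4"]   # initial vector for one target feature
print(initial_vector.shape)             # (16,)
```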
204. And filtering the initial vector to generate a query vector, wherein the query vector is used for querying key information.
And the server filters the initial vector to generate a query vector, and the query vector is used for querying the key information.
The filtering process includes high-correlation filtering: when two columns of data have similar variation trends, the information they carry is also similar, so one of the similar columns is sufficient for the machine learning model. The similarity between numerical columns is expressed by a correlation coefficient, and for two columns whose correlation coefficient exceeds a preset threshold, only one column is retained. For example, music data can be split into pitch, rhythm, harmony, speed and so on; key information belonging to these categories is extracted from the initial vectors, duplicate data is deleted when two columns have similar variation trends, and the query vector is finally generated. A sketch of this filtering step follows.
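A minimal sketch of high-correlation filtering, assuming the initial vectors are arranged as named columns and using a hypothetical threshold of 0.95:

```python
# A minimal sketch of high-correlation filtering over feature columns.
# The correlation threshold and the pandas-based approach are assumptions.
import numpy as np
import pandas as pd

def drop_correlated_columns(df: pd.DataFrame, threshold: float = 0.95) -> pd.DataFrame:
    corr = df.corr().abs()
    # Look only at the upper triangle so each pair is considered once
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)  # keep one column of each correlated pair

rng = np.random.default_rng(0)
base = rng.normal(size=100)
df = pd.DataFrame({"pitch": base,
                   "pitch_copy": base + 1e-3 * rng.normal(size=100),
                   "rhythm": rng.normal(size=100)})
print(drop_correlated_columns(df).columns.tolist())  # ['pitch', 'rhythm']
```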
205. And quantizing the query vector to obtain a quantized vector, wherein the quantized vector comprises the pitch and rhythm information of number quantization.
The server quantizes the query vector to obtain a quantized vector, where the quantized vector comprises numerically quantized pitch and rhythm information. Specifically, the server randomly selects one vector from the query vectors as the base vector. In each iteration, the server randomly selects an iteration vector, calculates the distance between the iteration vector and the base vector, and determines the cluster marks: if the cluster marks are equal, the distance between the base vector and the iteration vector is decreased, and if the cluster marks are not equal, the distance is increased. When the preset number of iterations is reached, the server takes the current base vector as the final result and generates the quantized vector, which comprises the numerically quantized pitch and rhythm information.
206. And calling a preset vector quantization variation automatic coding VQ-VAE model, substituting the query vector and the quantization vector into a preset formula, and calculating to obtain target data.
The server invokes a preset vector quantized variational autoencoder (VQ-VAE) model, substitutes the query vector and the quantized vector into a preset formula, and calculates the target data. Specifically, the server calculates the target data according to the preset formula based on the query vector and the quantized vector, where the preset formula is as follows:
L_VQ-VAE = -log p(x|q') + (q' - [q])^2 + β([q'] - q)^2; wherein L_VQ-VAE is the loss function, p(x|q') is the probability that the quantized vector q' occurs in the case of class x, q is the query vector, q' is the quantized vector, [·] denotes the stop-gradient operation, and β is the weight applied to the averaged squared difference between the query vector and the quantized vector.
Measuring the log conditional probability of the input quantized vector reduces fluctuation, making the characteristics of the input quantized vector more stable, so that structure can be preserved in music of longer duration; calculating the squared differences between the query vector and the quantized vector reduces the deviation between the finally generated data and the ideal data. Vector quantization (VQ) is a supervised neural-network classification method with a simple structure and powerful capability. As a nearest-neighbor prototype classifier, it continuously updates the weight vectors of the neurons and adjusts their learning rates during training, so that the boundaries between the weight vectors of different classes gradually converge to the Bayes classification boundary. In this algorithm, the nearest-neighbor weight vector is selected by calculating the distance between the input sample and each weight vector.
207. And inputting the target data into a preset decoder to generate the musical composition.
The server inputs the target data into a preset decoder to generate the musical composition. Specifically, the server calls a modulator in a preset decoder to read target data, wherein the preset decoder comprises the modulator and a preset local music model; the server combines the target data with a preset local music model based on the modulator to generate a musical composition.
The decoder exists because audio and video data are stored in compressed form, as the data volume would otherwise be too large; compression requires a coding scheme that stores the audio and video data at the highest quality in the smallest capacity, so when the data needs to be played it is decoded by a decoder. Examples of digital coding formats that can be decoded are Audio Coding 3 (AC-3), High Definition Compatible Digital (HDCD) and Digital Theater Systems (DTS), which are multi-channel audio-video coding formats; to reach a high-fidelity level there is also two-channel pulse-code modulation (PCM) digital coding. The modulator is a part of the decoder: preset musical performance styles are stored in the decoder and combined with the target data to finally generate a musical composition with performance characteristics.
In the embodiment of the invention, the musical works are generated according to the preset vector quantization variation automatic coding VQ-VAE model and the original audio data, so that the music generation efficiency is improved, the accuracy of pitch and rhythm is improved, and the musical works have more uniqueness and expressive force.
The method for generating musical compositions in the embodiment of the present invention is described above. A device for generating musical compositions in the embodiment of the present invention is described below with reference to fig. 3; an embodiment of the device for generating musical compositions in the embodiment of the present invention includes:
an obtaining module 301, configured to obtain original data, where the original data is audio data to be processed;
the feature extraction module 302 is configured to invoke a preset autoregressive discrete autoencoder to perform feature extraction on original data, so as to generate a query vector, where the query vector is used for querying key information;
the quantization module 303 is configured to perform quantization processing on the query vector to obtain a quantized vector, where the quantized vector includes pitch and rhythm information of number quantization;
the calculation module 304 is used for calling a preset vector quantization variation automatic coding VQ-VAE model, substituting the query vector and the quantization vector into a preset formula, and calculating and obtaining target data;
the generating module 305 is used for inputting the target data into a preset decoder to generate the musical composition.
In the embodiment of the invention, the musical works are generated according to the preset vector quantization variation automatic coding VQ-VAE model and the original audio data, so that the music generation efficiency is improved, the accuracy of pitch and rhythm is improved, and the musical works have more uniqueness and expressive force.
Referring to fig. 4, another embodiment of the apparatus for generating a musical composition according to an embodiment of the present invention includes:
an obtaining module 301, configured to obtain original data, where the original data is audio data to be processed;
the feature extraction module 302 is configured to invoke a preset autoregressive discrete autoencoder to perform feature extraction on original data, so as to generate a query vector, where the query vector is used for querying key information;
the quantization module 303 is configured to perform quantization processing on the query vector to obtain a quantized vector, where the quantized vector includes pitch and rhythm information of number quantization;
the calculation module 304 is used for calling a preset vector quantization variation automatic coding VQ-VAE model, substituting the query vector and the quantization vector into a preset formula, and calculating and obtaining target data;
the generating module 305 is used for inputting the target data into a preset decoder to generate the musical composition.
Optionally, the calculating module 304 is specifically configured to:
calculating and obtaining target data according to a preset formula based on the query vector and the quantization vector, wherein the preset formula is as follows: L_VQ-VAE = -log p(x|q') + (q' - [q])^2 + β([q'] - q)^2; wherein L_VQ-VAE is the loss function, p(x|q') is the probability that the quantized vector q' occurs in the case of class x, q is the query vector, q' is the quantized vector, and β is the weight applied to the averaged squared difference between the query vector and the quantized vector.
Optionally, the feature extraction module 302 includes:
an input unit 3021, configured to input the original data into a preset autoregressive discrete autoencoder, where the preset autoregressive discrete autoencoder directly uses the original data as a learning object;
a feature extraction unit 3022, configured to perform feature extraction on the original data to obtain a plurality of target features, and convert the plurality of target features into initial vectors based on a preset algorithm;
and the filtering unit 3023 is configured to perform filtering processing on the initial vector to generate a query vector, where the query vector is used to query the key information.
Optionally, the feature extraction unit 3022 is specifically configured to:
extracting features of the original data based on a preset autoregressive discrete autoencoder to obtain a plurality of initial features; calling a preset music knowledge base to carry out normalization processing on the plurality of initial characteristics to obtain a plurality of target characteristics, wherein the plurality of target characteristics comprise pitch, rhythm, speed and timbre; and converting the plurality of target characteristics according to a preset algorithm to obtain an initial vector.
Optionally, the quantization module 303 includes:
a random selection unit 3031, configured to randomly select a vector as a base vector from the query vectors;
a calculating unit 3032, configured to randomly select an iteration vector in each iteration, calculate the distance between the iteration vector and the base vector, and determine the cluster marks, where if the cluster marks are equal, the distance between the base vector and the iteration vector is decreased, and if the cluster marks are not equal, the distance between the base vector and the iteration vector is increased;
a first generating unit 3033, configured to generate a quantized vector including numerical quantized pitch and tempo information with the current basis vector as a final result when a preset number of iterations is reached.
Optionally, the generating module 305 includes:
a reading unit 3051, configured to call a modulator in a preset decoder to read target data, where the preset decoder includes the modulator and a preset local music model;
a second generation unit 3052, configured to combine the target data with a preset local music model based on the modulator to generate a musical piece.
Optionally, the apparatus for generating musical composition further comprises:
and the constructing module 306 is configured to construct a preset music knowledge base, where the preset music knowledge base includes basic element information of music.
In the embodiment of the invention, the musical works are generated according to the preset vector quantization variation automatic coding VQ-VAE model and the original audio data, so that the music generation efficiency is improved, the accuracy of pitch and rhythm is improved, and the musical works have more uniqueness and expressive force.
Figs. 3 and 4 above describe the musical composition generating apparatus in the embodiment of the present invention in detail from the perspective of modular functional entities; the musical composition generating device in the embodiment of the present invention is described in detail below from the perspective of hardware processing.
Fig. 5 is a schematic structural diagram of a musical composition generating device 500 according to an embodiment of the present invention. The device 500 may vary considerably in configuration or performance, and may include one or more processors (CPUs) 510 (e.g., one or more processors), a memory 520, and one or more storage media 530 (e.g., one or more mass storage devices) storing applications 533 or data 532. The memory 520 and the storage media 530 may be transient or persistent storage. A program stored on the storage medium 530 may include one or more modules (not shown), each of which may include a series of instruction operations for the musical composition generating device 500. Further, the processor 510 may be configured to communicate with the storage medium 530 and execute, on the musical composition generating device 500, the series of instruction operations in the storage medium 530.
The musical composition generating device 500 may also include one or more power supplies 540, one or more wired or wireless network interfaces 550, one or more input-output interfaces 560, and/or one or more operating systems 531, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, and the like. Those skilled in the art will appreciate that the musical composition generating device structure shown in fig. 5 does not limit the musical composition generating device, which may include more or fewer components than those illustrated, a combination of some components, or a different arrangement of components.
The invention also provides a musical composition generating device, which comprises a memory and a processor, wherein the memory stores computer readable instructions, and the computer readable instructions, when executed by the processor, cause the processor to execute the steps of the musical composition generating method in the above embodiments.
The present invention also provides a computer-readable storage medium, which may be a non-volatile computer-readable storage medium, which may also be a volatile computer-readable storage medium, having stored therein instructions, which, when run on a computer, cause the computer to perform the steps of the method of generating a musical composition.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a read-only memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A method of generating a musical composition, the method comprising:
acquiring original data, wherein the original data is audio data to be processed;
calling a preset autoregressive discrete autoencoder to extract features of the original data to generate a query vector, wherein the query vector is used for querying key information;
quantizing the query vector to obtain a quantized vector, wherein the quantized vector comprises numerical quantized pitch and rhythm information;
calling a preset vector quantization variation automatic coding VQ-VAE model, substituting the query vector and the quantization vector into a preset formula, and calculating to obtain target data;
and inputting the target data into a preset decoder to generate the musical composition.
2. The method of generating a musical composition according to claim 1, wherein said invoking a preset vector quantization variational auto-encoding VQ-VAE model, substituting the query vector and the quantization vector into a preset formula, and calculating and obtaining target data comprises:
calculating and obtaining target data according to a preset formula based on the query vector and the quantization vector, wherein the preset formula is as follows:
L_VQ-VAE = -log p(x|q') + (q' - [q])^2 + β([q'] - q)^2;
wherein L_VQ-VAE is the loss function, p(x|q') is the probability that the quantized vector q' occurs in the case of class x, q is the query vector, q' is the quantized vector, and β is the weight applied to the averaged squared difference between the query vector and the quantized vector.
3. The method of generating musical compositions according to claim 1, wherein said invoking a preset autoregressive discrete autoencoder to perform feature extraction on said raw data to generate a query vector, said query vector being used to query key information comprising:
inputting the original data into a preset autoregressive discrete autoencoder, wherein the preset autoregressive discrete autoencoder directly takes the original data as a learning object;
extracting features of the original data to obtain a plurality of target features, and converting the plurality of target features into initial vectors based on a preset algorithm;
and filtering the initial vector to generate a query vector, wherein the query vector is used for querying key information.
4. The method of generating a musical composition according to claim 3, wherein said extracting features from said original data to obtain a plurality of target features, and converting said plurality of target features into an initial vector based on a preset algorithm comprises:
extracting features of the original data based on a preset autoregressive discrete autoencoder to obtain a plurality of initial features;
calling a preset music knowledge base to carry out normalization processing on the plurality of initial features to obtain a plurality of target features, wherein the plurality of target features comprise pitch, rhythm, speed and timbre;
and converting the target characteristics according to a preset algorithm to obtain an initial vector.
5. The method of generating a musical composition according to claim 1, wherein quantizing the query vector to obtain a quantized vector, the quantized vector including numerically quantized pitch and tempo information comprises:
randomly selecting a vector from the query vectors as a base vector;
in each iteration, randomly selecting an iteration vector, calculating the distance between the iteration vector and the basic vector, determining cluster marks, if the cluster marks are equal, reducing the distance between the basic vector and the iteration vector, and if the cluster marks are not equal, increasing the distance between the basic vector and the iteration vector;
and when the preset iteration times are reached, generating a quantization vector by taking the current basic vector as a final result, wherein the quantization vector comprises the pitch and rhythm information of number quantization.
6. The method of generating a musical composition according to claim 1, wherein said inputting said target data into a preset decoder, generating a musical composition comprises:
calling a modulator in a preset decoder to read the target data, wherein the preset decoder comprises the modulator and a preset local music model;
and combining the target data with the preset local music model based on the modulator to generate the musical composition.
7. The method of generating a musical composition according to any one of claims 1-6 wherein prior to said obtaining the raw data, the method further comprises:
and constructing a preset music knowledge base, wherein the preset music knowledge base comprises basic element information of music.
8. An apparatus for generating a musical composition, comprising:
the acquisition module is used for acquiring original data, wherein the original data is audio data to be processed;
the characteristic extraction module is used for calling a preset autoregressive discrete autoencoder to perform characteristic extraction on the original data to generate a query vector, and the query vector is used for querying key information;
the quantization module is used for performing quantization processing on the query vector to obtain a quantization vector, and the quantization vector comprises number-quantized pitch and rhythm information;
the calculation module is used for calling a preset vector quantization variation automatic coding VQ-VAE model, substituting the query vector and the quantization vector into a preset formula, and calculating and obtaining target data;
and the generating module is used for inputting the target data into a preset decoder to generate the musical composition.
9. A musical composition generating apparatus, comprising:
a memory and at least one processor, the memory having instructions stored therein;
the at least one processor invokes the instructions in the memory to cause the musical piece generation apparatus to perform the musical piece generation method of any one of claims 1-7.
10. A computer readable storage medium having instructions stored thereon, wherein the instructions, when executed by a processor, implement a method of generating a musical composition according to any one of claims 1-7.
Application: CN202110285844.5A; Priority Date: 2021-03-17; Filing Date: 2021-03-17; Title: Method, device and equipment for generating musical composition and storage medium; Status: Pending; Publication: CN113053336A (en)

Priority Applications (1)

Application Number: CN202110285844.5A; Publication: CN113053336A (en); Priority Date: 2021-03-17; Filing Date: 2021-03-17; Title: Method, device and equipment for generating musical composition and storage medium

Applications Claiming Priority (1)

Application Number: CN202110285844.5A; Publication: CN113053336A (en); Priority Date: 2021-03-17; Filing Date: 2021-03-17; Title: Method, device and equipment for generating musical composition and storage medium

Publications (1)

Publication Number: CN113053336A (en); Publication Date: 2021-06-29

Family

ID=76512981

Family Applications (1)

Application Number: CN202110285844.5A; Status: Pending; Publication: CN113053336A (en); Priority Date: 2021-03-17; Filing Date: 2021-03-17; Title: Method, device and equipment for generating musical composition and storage medium

Country Status (1)

Country: CN; Link: CN113053336A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103793446A (en) * 2012-10-29 2014-05-14 汤晓鸥 Music video generation method and system
US10068557B1 (en) * 2017-08-23 2018-09-04 Google Llc Generating music with deep neural networks
CN108806657A (en) * 2018-06-05 2018-11-13 平安科技(深圳)有限公司 Music model training, musical composition method, apparatus, terminal and storage medium
WO2019232928A1 (en) * 2018-06-05 2019-12-12 平安科技(深圳)有限公司 Musical model training method, music creation method, devices, terminal and storage medium
JP2020003536A (en) * 2018-06-25 2020-01-09 カシオ計算機株式会社 Learning device, automatic music transcription device, learning method, automatic music transcription method and program
CN110164463A (en) * 2019-05-23 2019-08-23 北京达佳互联信息技术有限公司 A kind of phonetics transfer method, device, electronic equipment and storage medium
CN110853604A (en) * 2019-10-30 2020-02-28 西安交通大学 Automatic generation method of Chinese folk songs with specific region style based on variational self-encoder

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113838041A (en) * 2021-09-29 2021-12-24 西安工程大学 Method for detecting defect area of color texture fabric based on self-encoder
CN113838041B (en) * 2021-09-29 2023-09-08 西安工程大学 Method for detecting defect area of color texture fabric based on self-encoder

Similar Documents

Publication Title
CN110223705B (en) Voice conversion method, device, equipment and readable storage medium
US7460994B2 (en) Method and apparatus for producing a fingerprint, and method and apparatus for identifying an audio signal
KR101036712B1 (en) Adaptation of compressed acoustic models
CN109493881B (en) Method and device for labeling audio and computing equipment
EP1758097B1 (en) Compression of gaussian models
WO2016119604A1 (en) Voice information search method and apparatus, and server
CN111554255A (en) MIDI playing style automatic conversion system based on recurrent neural network
CN112184859A (en) End-to-end virtual object animation generation method and device, storage medium and terminal
CN111108557A (en) Method of modifying a style of an audio object, and corresponding electronic device, computer-readable program product and computer-readable storage medium
CN113035228A (en) Acoustic feature extraction method, device, equipment and storage medium
CN113160848A (en) Dance animation generation method, dance animation model training method, dance animation generation device, dance animation model training device, dance animation equipment and storage medium
CN113053336A (en) Method, device and equipment for generating musical composition and storage medium
CN113409803B (en) Voice signal processing method, device, storage medium and equipment
Lyon et al. Sparse coding of auditory features for machine hearing in interference
CN112818098B (en) Knowledge base-based dialogue generation method, device, terminal and storage medium
CN115019824A (en) Video processing method and device, computer equipment and readable storage medium
CN115424616A (en) Audio data screening method, device, equipment and computer readable medium
CN112906872B (en) Method, device, equipment and storage medium for generating conversion of music score into sound spectrum
CN114783417B (en) Voice detection method and device, electronic equipment and storage medium
CN113066457B (en) Fan-exclamation music generation method, device, equipment and storage medium
US20060136210A1 (en) System and method for tying variance vectors for speech recognition
CN113436621B (en) GPU (graphics processing Unit) -based voice recognition method and device, electronic equipment and storage medium
JPS62245294A (en) Voice recognition system
JPH07160287A (en) Standard pattern making device
JPH07248791A (en) Method and device for identifying speaker

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination