WO2020151150A1 - Dcgan-based music generation method, and music generation apparatus - Google Patents


Info

Publication number
WO2020151150A1
Authority
WO
WIPO (PCT)
Prior art keywords
chord
matrix
melody
track
target
Prior art date
Application number
PCT/CN2019/088805
Other languages
French (fr)
Chinese (zh)
Inventor
王义文 (Wang Yiwen)
王健宗 (Wang Jianzong)
Original Assignee
平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology (Shenzhen) Co., Ltd. (平安科技(深圳)有限公司)
Publication of WO2020151150A1

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 — Computing arrangements based on biological models
    • G06N3/02 — Neural networks
    • G06N3/04 — Architecture, e.g. interconnection topology
    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H — ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00 — Details of electrophonic musical instruments
    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H — ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00 — Details of electrophonic musical instruments
    • G10H1/36 — Accompaniment arrangements
    • G10H1/38 — Chord
    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H — ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H7/00 — Instruments in which the tones are synthesised from a data store, e.g. computer organs

Definitions

  • This application relates to the field of computer technology, in particular to a DCGAN-based music generation method and device.
  • In existing music generation methods, a melody is usually given, and professional musicians add chords to the given melody to obtain a music file with matching chords.
  • This requires the musician to have a strong command of music theory and instrument operation.
  • The musician is also required to have strong and sensitive musical experience. Therefore, generating a high-quality music file is bound to be restricted by the musician's skill level.
  • the embodiment of the present application provides a DCGAN-based music generation method, which can automatically generate music files with chord matching and reduce manual processing links.
  • an embodiment of the present application provides a DCGAN-based music generation method, which includes:
  • the training data set includes N melody matrices and corresponding N chord matrices, where the melody matrix and the chord matrix are both binary matrices;
  • an embodiment of the present application provides a music generating device, which includes:
  • Construction module, used to construct a deep convolutional generative adversarial network (DCGAN) model;
  • the first acquisition module is used to acquire a training data set, the training data set includes N melody matrices and corresponding N chord matrices, wherein the melody matrix and the chord matrix are both binary matrices;
  • the training module is used to input the N melody matrices and the corresponding N chord matrices in the training data set into the DCGAN model for training to obtain a trained DCGAN model;
  • the input module is used to input the obtained target melody matrix into the trained DCGAN model for processing, and obtain the target chord matrix generated by the trained DCGAN model that matches the target melody matrix;
  • the output module is used to output a music file obtained by merging the melody track mapped from the target melody matrix and the chord track mapped from the target chord matrix.
  • an embodiment of the present application provides a terminal, including a processor, an input device, an output device, and a memory.
  • the processor, input device, output device, and memory are connected to each other, wherein the memory is used to store a computer program that supports the terminal in executing the above method;
  • the computer program includes program instructions, and the processor is configured to call the program instructions to execute the DCGAN-based music generation method of the above first aspect.
  • an embodiment of the present application provides a computer-readable storage medium that stores a computer program, and the computer program includes program instructions that, when executed by a processor, cause the processor to execute the DCGAN-based music generation method of the above first aspect.
  • a music file with chord matching can be automatically generated and manual processing steps are reduced.
  • FIG. 1 is a schematic flowchart of a DCGAN-based music generation method provided by an embodiment of the present application
  • FIG. 2 is a schematic diagram of the network structure of the DCGAN model provided by an embodiment of the present application.
  • FIG. 3 is another schematic flowchart of a DCGAN-based music generation method provided by an embodiment of the present application.
  • 4a is a schematic diagram of MIDI notes provided by an embodiment of the present application.
  • Figure 4b is a schematic diagram of a melody matrix provided by an embodiment of the present application.
  • Figure 5a is a schematic diagram of 24 chords provided by an embodiment of the present application.
  • Figure 5b is a schematic diagram of a chord matrix provided by an embodiment of the present application.
  • Fig. 6 is a schematic block diagram of a music generating device provided by an embodiment of the present application.
  • FIG. 7 is a schematic block diagram of a terminal provided by an embodiment of the present application.
  • Fig. 1 is a schematic flowchart of a DCGAN-based music generation method provided by an embodiment of the present application.
  • the DCGAN-based music generation method may include steps:
  • the terminal may construct a Deep Convolution Generative Adversarial Networks (DCGAN) model.
  • the DCGAN model can include a generator, a discriminator and a regulator.
  • the generator, discriminator and regulator are all convolutional neural networks (Convolutional Neural Network, CNN). The generator may include at least one fully connected layer and at least one transposed convolutional layer; the discriminator may include at least one convolutional layer and at least one fully connected layer; the regulator may be an inverted generator, including at least one convolutional layer and at least one fully connected layer.
  • the generator can be used to generate, from a given random sequence, a piece of music that is as realistic as possible in order to deceive the discriminator.
  • the discriminator can be used to distinguish the music generated by the generator from real music as far as possible, so that the generator and the discriminator constitute a dynamic "game process". The regulator can be used to adjust the parameters of the generator's transposed convolutional layers, so that the music generated by the generator can better deceive the discriminator.
  • FIG. 2 is a schematic diagram of the network structure of the DCGAN model provided by an embodiment of the present application.
  • Conditioner CNN represents the regulator in the DCGAN model
  • Generator CNN represents the generator in the DCGAN model
  • Discriminator CNN represents the discriminator in the DCGAN model.
  • the regulator is essentially a reverse generator
  • the regulator and the generator have convolution kernels of the same shape, and their outputs have the same shape, so the output of each convolutional layer of the regulator can be fed to the generator's corresponding transposed convolutional layer. In this way, the parameters of the generator's transposed convolutional layers can be adjusted, and the output of the generator is used as an input of the discriminator.
  • Noise z represents the random sequence of the input generator
  • X or G(z) represents the output of the generator
  • 2D conditions represents the real data (here, data not generated by the generator).
  • the terminal may obtain N training samples for training the above-mentioned DCGAN model from a preset training database, and each training sample may include a melody matrix and a corresponding chord matrix.
  • the terminal may determine the N training samples as the training sample set of the aforementioned DCGAN model, then the training sample set includes N training samples, that is, the training sample set may include N melody matrices and corresponding N chord matrices.
  • N can be an integer greater than or equal to 2.
  • the melody matrix can be a 128*16 binary matrix
  • the chord matrix can be a 16*13 binary matrix.
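The two data formats above can be sketched as binary NumPy arrays; the shapes come from the application, but the specific note and chord placements below are made-up illustrative values:

```python
import numpy as np

# A melody matrix is 128 x 16 (MIDI pitches x time steps) and a chord
# matrix is 16 x 13 (bars x chord parameters); both are binary.

melody = np.zeros((128, 16), dtype=np.uint8)
melody[60, 0] = 1   # MIDI note 60 sounds at the first time step
melody[64, 1] = 1   # MIDI note 64 sounds at the second time step

chord = np.zeros((16, 13), dtype=np.uint8)
chord[0, 0] = 1     # bar 1: a 1 in one of the first 12 columns selects the chord
                    # (the 13th column stays 0, i.e. the chord category is "major")

assert melody.shape == (128, 16) and chord.shape == (16, 13)
```

Every element of both matrices is either 0 or 1, so each matrix is simply a presence/absence grid over pitches (or chords) against time (or bars).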
  • the aforementioned DCGAN model includes a generator, a discriminator, and a regulator.
  • the generator, discriminator, and regulator are all CNN.
  • the terminal can use an alternating training method to train the generator and discriminator in the DCGAN model. Specifically, take one iteration of the training process as an example.
  • the terminal can fix the parameters on the convolutional layer of the discriminator and train the generator.
  • the terminal can input any melody matrix i in the above training data set into the generator of the DCGAN model to generate a first chord matrix j that matches the melody matrix i.
  • Then, with the parameters of the generator's transposed convolutional layers fixed, the discriminator is trained.
  • the terminal can input the first chord matrix j and the chord matrix k corresponding to the melody matrix i in the training data set into the discriminator of the DCGAN model, which discriminates the probability that the first chord matrix j is the same as the chord matrix k (i.e., the similarity between the first chord matrix j and the chord matrix k). It is then determined whether the probability that the discriminator outputs for the first chord matrix j is within a preset range (for example, between 0.85 and 1, inclusive of 0.85 and 1). If the probability is not within the preset range, the terminal can input the probability output by the discriminator into the regulator of the DCGAN model to adjust the parameters of the generator's transposed convolutional layers.
  • the terminal can then re-input the melody matrix i into the adjusted generator to regenerate the first chord matrix j that matches the melody matrix i, and can input the regenerated first chord matrix j and the chord matrix k into the discriminator of the DCGAN model to again discriminate the probability that the first chord matrix j is the same as the chord matrix k. If the probability that the discriminator outputs for the first chord matrix j is within the preset range, the terminal can select another melody matrix from the training data set and perform another iteration of the training process. Each melody matrix in the above training data set requires one round of iteration during training; that is, with N melody matrices in the training data set, the training process has at least N rounds of iteration. When the probability that the discriminator outputs for each first chord matrix generated by the generator is within the preset range, a trained DCGAN model is obtained.
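The alternating scheme above can be sketched end to end with toy stand-ins. The real model uses (transposed) convolutional CNNs; here the generator is a single linear map with a sigmoid, the discriminator is a simple similarity score, and the "regulator" is a plain gradient nudge; all three stand-ins are assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)

def generator(melody_i, params):
    # 128x16 melody matrix -> 16x13 chord matrix of values in (0, 1)
    return 1.0 / (1.0 + np.exp(-(melody_i.T @ params)))

def discriminator(chord_j, chord_k):
    # probability that the generated chord matrix matches the real one
    return 1.0 - float(np.abs(chord_j - chord_k).mean())

def train(dataset, params, lr=0.01, accept=(0.85, 1.0), max_steps=200):
    # one iteration per melody/chord pair: regenerate until the
    # discriminator's output falls inside the accepted range
    for melody_i, chord_k in dataset:
        for _ in range(max_steps):
            chord_j = generator(melody_i, params)
            if accept[0] <= discriminator(chord_j, chord_k) <= accept[1]:
                break
            # "regulator" step: adjust the generator's parameters
            params += lr * (melody_i @ (chord_k - chord_j))
    return params

# toy training data set with N = 2 melody/chord pairs
melodies = [rng.integers(0, 2, (128, 16)).astype(float) for _ in range(2)]
chords = [rng.integers(0, 2, (16, 13)).astype(float) for _ in range(2)]
params = train(list(zip(melodies, chords)), rng.normal(0.0, 0.01, (128, 13)))
```

The inner loop mirrors the text: generate, score, and only move on to the next melody matrix once the score reaches the preset 0.85–1 range (or the step budget runs out).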
  • the training process of the aforementioned DCGAN model can be represented by the following function 1-1 (the standard GAN minimax objective):

    min_G max_D V(D, G) = E_{x∼p_data(x)}[log D(x)] + E_{z∼p_z(z)}[log(1 − D(G(z)))]    (1-1)
  • p_data in function 1-1 represents the N chord matrices in the training data set
  • p_z represents the N melody matrices in the training data set.
  • D stands for discriminator and G stands for generator.
  • G(z) represents the output of the generator
  • D(x) represents the output of the discriminator (the value of D(x) is in the range of 0 to 1, including 0 and 1).
  • Training D maximizes log D(x), while training G minimizes log(1 − D(G(z))), that is, G tries to maximize D's error.
  • the training process usually fixes one party (such as the discriminator D) and updates the parameters of the other network (such as the generator G), alternating iterations so that each maximizes the other's error.
  • When G converges, the training of G and D is completed, and a trained DCGAN model is obtained.
  • the generator of the DCGAN model adds feature matching during the learning process.
  • Feature matching can be represented by the following function 1-2:

    L_FM = λ1 · ‖E[x] − E[G(z)]‖² + λ2 · ‖E[f(x)] − E[f(G(z))]‖²    (1-2)

  • E in function 1-2 represents the mean value, x represents the chord matrices in the training data set, z represents the melody matrices in the training data set, and G(z) represents the output of the generator.
  • f denotes the first convolutional layer of the discriminator, and λ1, λ2 represent the tuning parameters of the generator.
  • the tuning parameters can be kept within a range that avoids systematic distortion.
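A numerical sketch of a feature-matching term of the form λ1·‖E[x] − E[G(z)]‖² + λ2·‖E[f(x)] − E[f(G(z))]‖², consistent with the symbols defined above. The stand-in feature map f (a fixed random linear layer with ReLU), the toy batches, and the λ values are all assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

x = rng.integers(0, 2, (8, 16 * 13)).astype(float)  # batch of real chord matrices, flattened
g_z = rng.random((8, 16 * 13))                      # batch of generator outputs

W = rng.normal(size=(16 * 13, 32))                  # stand-in for the discriminator's
                                                    # first convolutional layer
def f(batch):
    return np.maximum(batch @ W, 0.0)               # ReLU "first-layer" features

lam1, lam2 = 0.1, 1.0                               # tuning parameters λ1, λ2 (assumed values)
loss = (lam1 * np.sum((x.mean(axis=0) - g_z.mean(axis=0)) ** 2)
        + lam2 * np.sum((f(x).mean(axis=0) - f(g_z).mean(axis=0)) ** 2))
```

Both terms compare batch means (the E operator) rather than individual samples, which is what lets the generator match feature statistics of real data instead of memorizing single chords.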
  • S104: Input the obtained target melody matrix into the trained DCGAN model for processing, obtain a target chord matrix matching the target melody matrix generated by the trained DCGAN model, and output the music file obtained by merging the melody track mapped from the target melody matrix and the chord track mapped from the target chord matrix.
  • the terminal may obtain a target melody matrix after obtaining the trained DCGAN model.
  • the target melody matrix may be a binary matrix directly input by the user, or may be a binary matrix randomly generated by the terminal. For example, the terminal may first obtain random noise (Gaussian noise, uniform noise, etc.), then process the obtained random noise into a matrix with the same data format as the melody matrices in the above training data set, and determine the resulting matrix as the target melody matrix. After obtaining the target melody matrix, the terminal can input it into the above-mentioned trained DCGAN model to generate a target chord matrix that matches the target melody matrix.
  • the terminal can obtain the target chord matrix generated by the trained DCGAN model, and can map the target melody matrix to a melody track, and the target chord matrix to a chord track.
  • the terminal can merge the melody track mapped from the target melody matrix and the chord track mapped from the target chord matrix to obtain a merged music file, which can be output in musical instrument digital interface (MIDI) format.
  • the size of the target chord matrix is the same as the size of the chord matrix in the training data set.
  • the merged music file includes both melody and chords. For example, at time t, the melody track mapped from the target melody matrix and the chord track mapped from the target chord matrix simultaneously sound their respective notes.
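A hypothetical in-memory "merge" of the two tracks, showing how a melody matrix and a chord matrix line up on a shared timeline. A real implementation would write a MIDI file; the dictionary timeline and the chord column orders (C-based for major, A-based for minor, inferred from the Figure 5a/5b examples) are assumptions:

```python
import numpy as np

MAJOR = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
MINOR = ["A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"]

def merge(melody, chord):
    # melody: 128x16 binary, chord: 16x13 binary -> {time: {"notes", "chord"}}
    timeline = {t: {"notes": [], "chord": None} for t in range(melody.shape[1])}
    for pitch, t in zip(*np.nonzero(melody)):
        timeline[int(t)]["notes"].append(int(pitch))
    for bar, row in enumerate(chord):
        order = MINOR if row[12] else MAJOR
        quality = "minor" if row[12] else "major"
        timeline[bar]["chord"] = f"{order[int(np.argmax(row[:12]))]} {quality}"
    return timeline

melody = np.zeros((128, 16), dtype=np.uint8)
melody[60, 0] = 1                     # one melody note at step 0
chord = np.zeros((16, 13), dtype=np.uint8)
chord[:, 0] = 1                       # every bar: C major
merged = merge(melody, chord)         # at step 0, note 60 and "C major" sound together
```

At each time index, the melody note(s) and the bar's chord are emitted together, matching the "simultaneously sound their respective notes" behavior described above.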
  • the embodiment of this application constructs a DCGAN model and trains it with the melody matrices and chord matrices to obtain a trained DCGAN model, and then inputs the target melody matrix (which can be derived from random noise) into the trained DCGAN model.
  • the trained DCGAN model generates a target chord matrix that matches the target melody matrix, so that music files with matching chords can be generated automatically, thereby saving manpower and reducing manual processing steps.
  • the terminal constructs a deep convolutional generative adversarial network (DCGAN) model, obtains a training data set, and then inputs the N melody matrices and the corresponding N chord matrices in the training data set into the DCGAN model for training.
  • FIG. 3 is another schematic flowchart of a DCGAN-based music generation method provided by an embodiment of the present application.
  • the DCGAN-based music generation method may include steps:
  • step S301 in the embodiment of the present application can refer to the implementation manner of step S101 in the embodiment shown in FIG. 1, which will not be repeated here.
  • S302 Acquire a dual-track data set including multiple dual-track music files.
  • the terminal may obtain a MIDI data set, and the MIDI data set may include multiple music files in MIDI format.
  • the terminal may determine a music file including a melody track and a chord track in the MIDI data set as a dual-track music file, and may use multiple dual-track music files in the MIDI data set as a dual-track data set.
  • the terminal may determine a dual-track music file whose chords in the dual-track data set belong to the preset basic chord set and whose number of bars is equal to the target threshold as the target dual-track music file, and obtain N target dual-track music files.
  • the preset basic chord set may include 12 major chords and 12 minor chords.
  • the 12 major chords are: C, C#, D, D#, E, F, F#, G, G#, A, A#, B; the 12 minor chords are: A, A#, B, C, C#, D, D#, E, F, F#, G, G#.
  • the target threshold may be 16, that is, each target dual-track music file includes 16 bars.
  • N can be an integer greater than or equal to 2.
  • the terminal may sequentially divide each of the aforementioned N target dual-track music files into groups of 8 bars.
  • For example, if a target dual-track music file has a total of 18 bars, it is divided into groups of 8 bars: the first group is the first 8 bars, the second group is the middle 8 bars, and the third group is the last 2 bars.
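The grouping rule above can be sketched as a one-line helper (the function name is illustrative):

```python
def split_into_groups(n_bars, group=8):
    # sequentially split a file's bars into groups of up to 8 bars;
    # e.g. 18 bars -> groups of 8, 8 and 2, as in the example above
    return [min(group, n_bars - start) for start in range(0, n_bars, group)]
```

So `split_into_groups(18)` yields `[8, 8, 2]`, and a 16-bar target file splits evenly into `[8, 8]`.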
  • S304 Obtain the melody matrix of each target dual-track music file on the melody track among the N target dual-track music files, to obtain N melody matrices.
  • the terminal may adjust the melody of each target dual-track music file among the N target dual-track music files to within a preset pitch range.
  • the preset pitch range can be between the two octaves of C4 to B5.
  • the terminal removes the melody notes in each target dual-track music file whose pitches are not within the preset two octaves, and keeps only the melody notes whose pitches lie between C4 and B5.
  • the terminal may obtain the adjusted melody notes in each target dual-track music file, and may generate the melody matrix of each target dual-track music file according to the adjusted melody notes in each target dual-track music file.
  • Element 0 in the melody matrix can be used to indicate that the adjusted target dual-track music file has no MIDI notes at the position corresponding to the 0 element, and element 1 in the melody matrix can be used to indicate the adjusted target dual-track music file There is a corresponding MIDI note at the position corresponding to the 1 element.
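The pitch filtering and matrix construction steps above can be sketched as follows. Taking C4..B5 as MIDI numbers 60..83 is an assumption (the common "C4 = 60" convention; MIDI octave numbering varies between tools), and the `(time_step, midi_pitch)` note representation is illustrative:

```python
import numpy as np

LOW, HIGH = 60, 83   # C4..B5 under the C4 = 60 convention (assumed)

def melody_matrix(notes, n_steps=16):
    # notes: (time_step, midi_pitch) pairs; pitches outside C4..B5 are
    # removed, the rest become 1-elements of a 128 x n_steps binary matrix
    m = np.zeros((128, n_steps), dtype=np.uint8)
    for t, p in notes:
        if LOW <= p <= HIGH:
            m[p, t] = 1
    return m

m = melody_matrix([(0, 60), (1, 84), (2, 72)])  # pitch 84 (above B5) is filtered out
```

The resulting matrix has a 1 exactly where a retained MIDI note sounds, and 0 everywhere else, matching the element semantics described above.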
  • FIG. 4a is a schematic diagram of MIDI notes provided by an embodiment of the present application
  • FIG. 4b is a schematic diagram of a melody matrix provided by an embodiment of the present application.
  • M represents the melody matrix
  • the size of M is 128 rows and 16 columns.
  • Each line in M represents a MIDI note.
  • the first row represents the first of the 128 MIDI notes, note 00 (hexadecimal note code)
  • the second row represents the second of the 128 MIDI notes, note 01
  • the thirteenth row represents the thirteenth of the 128 MIDI notes, note 0C, and so on.
  • Each column in M represents a measure, for example, the first column represents the first measure in the target two-track music file, and the tenth column represents the tenth measure in the target two-track music file.
  • the element 1 in the second row and first column of M indicates that the first note in the first bar of the target dual-track music file is MIDI note 01;
  • the element 1 in the second row and third column of M indicates that the second note of the third bar of the target dual-track music file is MIDI note 01.
  • the element 0 in the first row and fifth column of M indicates that MIDI note 00 does not sound in the fifth bar of the target dual-track music file.
  • only one chord is used in each measure of the target two-track music file.
  • the terminal can obtain the chord used in each bar of each of the above N target dual-track music files, and can determine the chord category of the chord used in each bar (that is, whether each bar's chord is a major chord or a minor chord).
  • the terminal may generate the chord matrix of each target dual-track music file according to the chord adopted in each bar of the target dual-track music file and the chord type of the chord adopted in each bar.
  • of the 13 chord parameters, the first 12 represent the 12 chords respectively, and the 13th represents the chord category, namely major chord or minor chord.
  • Fig. 5a is a schematic diagram of the 24 chords provided in an embodiment of the present application. In Figure 5a, "major" represents a major chord and "minor" a minor chord; "13" denotes the 13th chord parameter: a value of 0 means the chord category is a major chord, and a value of 1 means it is a minor chord. As shown in Fig. 5b, Fig. 5b is a schematic diagram of a chord matrix provided by an embodiment of the present application, in which Y represents the chord matrix; Y has 16 rows and 13 columns.
  • Each row in Y represents a bar; for example, the first row represents the first bar in the target dual-track music file, the fourth row represents the fourth bar, and so on.
  • Each column in Y represents a chord parameter. A 0 element in the first 12 columns indicates that the corresponding chord is absent, a 1 element indicates that the corresponding chord is present, and each row of Y contains exactly one 1 among its first 12 elements.
  • the 13th column of Y indicates the chord category, 0 indicates a major chord, and 1 indicates a minor chord.
  • the element in the first row and the 13th column is 1, indicating a minor chord
  • the element 1 in the first row and the second column indicates that the first bar of the target dual-track music file adopts the minor chord A#.
  • the element in the second row and the 13th column is 0, indicating a major chord
  • the element 1 in the second row and the fourth column indicates that the second measure of the target dual-track music file uses the major chord D#.
  • the element in row 16 and column 13 is 0, indicating a major chord
  • the element 1 in row 16 and column 1 indicates that the 16th bar of the target dual-track music file uses a major chord C.
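Encoding per-bar chord labels into the 16 x 13 layout can be sketched as below. The column orders are inferred from the worked examples above ("A# minor" in column 2, "D# major" in column 4, "C major" in column 1): major chords appear to be indexed in C-based order and minor chords in A-based order, which is an assumption, as is the `(root, quality)` label format:

```python
import numpy as np

MAJOR = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
MINOR = ["A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"]

def chord_matrix(bar_chords):
    # bar_chords: one (root, quality) label per bar, e.g. ("A#", "minor")
    y = np.zeros((16, 13), dtype=np.uint8)
    for bar, (root, quality) in enumerate(bar_chords):
        order = MINOR if quality == "minor" else MAJOR
        y[bar, order.index(root)] = 1                 # one-hot chord in the first 12 columns
        y[bar, 12] = 1 if quality == "minor" else 0   # 13th column: chord category
    return y

# first bar A# minor, second bar D# major, remaining bars C major,
# mirroring the worked examples in the text
y = chord_matrix([("A#", "minor"), ("D#", "major")] + [("C", "major")] * 14)
```

Each row then carries exactly one 1 in its first 12 columns plus the category flag, as the matrix description above requires.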
  • step S306 in the embodiment of the present application can refer to the implementation manner of step S103 in the embodiment shown in FIG. 1, which will not be repeated here.
  • the terminal may obtain any single-track music file including a melody track from the MIDI data set.
  • the terminal can remove the melody notes in the single-track music file whose pitches fall outside the preset pitch range (the two octaves from C4 to B5), retaining only the melody notes whose pitches are within the preset pitch range, to obtain an adjusted single-track music file.
  • the terminal may obtain the adjusted melody note of the single-track music file, and may generate the target melody matrix of the adjusted single-track music file according to the adjusted melody note of the single-track music file.
  • the target melody matrix is a binary matrix of 128*16.
  • Element 0 in the target melody matrix can be used to indicate that the adjusted single-track music file has no MIDI notes at the position corresponding to the 0 element
  • element 1 in the target melody matrix can be used to indicate that the adjusted single-track music file has a corresponding MIDI note at the position corresponding to the 1 element.
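Alternatively, as described earlier, a target melody matrix can come from random noise. The sketch below draws Gaussian or uniform noise and binarizes it into the 128 x 16 format; the thresholding step and the `density` parameter are assumptions, since the application only says the noise is processed into the same data format as the melody matrices:

```python
import numpy as np

rng = np.random.default_rng(42)

def target_melody_from_noise(kind="gaussian", density=0.05):
    # draw noise, then keep roughly the top `density` fraction of cells as 1s
    z = rng.normal(size=(128, 16)) if kind == "gaussian" else rng.uniform(size=(128, 16))
    return (z > np.quantile(z, 1.0 - density)).astype(np.uint8)

target = target_melody_from_noise()   # binary 128 x 16, ready for the trained model
```

A low density keeps the matrix sparse, which resembles the real melody matrices where most pitch/time-step cells are 0.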
  • step S308 in the embodiment of the present application can refer to the implementation manner of step S104 in the embodiment shown in FIG. 1, and details are not described herein again.
  • the terminal constructs a deep convolutional generative adversarial network (DCGAN) model, then obtains a dual-track data set including multiple dual-track music files, determines N target dual-track music files from the dual-track data set, obtains the melody matrix of each target dual-track music file on its melody track to obtain N melody matrices, and obtains the chord matrix of each target dual-track music file on its chord track to obtain N chord matrices.
  • the N melody matrices and the corresponding N chord matrices in the training data set are input into the DCGAN model for training, and a trained DCGAN model is obtained.
  • FIG. 6 is a schematic block diagram of a music generating apparatus provided by an embodiment of the present application.
  • the music generating device of the embodiment of the present application includes:
  • the construction module 10 is used to construct a deep convolutional generative adversarial network (DCGAN) model
  • the first obtaining module 20 is configured to obtain a training data set, the training data set includes N melody matrices and corresponding N chord matrices, wherein the melody matrix and the chord matrix are both binary matrices;
  • the training module 30 is configured to input the N melody matrices and the corresponding N chord matrices in the training data set into the DCGAN model for training to obtain a trained DCGAN model;
  • the input module 40 is configured to input the obtained target melody matrix into the trained DCGAN model for processing, and obtain the target chord matrix generated by the trained DCGAN model that matches the target melody matrix;
  • the output module 50 is configured to output a music file obtained by merging the melody track mapped from the target melody matrix and the chord track mapped from the target chord matrix.
  • the aforementioned DCGAN model includes a generator, a discriminator, and a regulator, and the generator, the discriminator, and the regulator are all convolutional neural networks (CNN).
  • the above-mentioned training module 30 is specifically configured to: for any melody matrix i in the training data set, input the melody matrix i into the generator of the DCGAN model to generate a first chord matrix j that matches the melody matrix i; input the first chord matrix j and the chord matrix k corresponding to the melody matrix i in the training data set into the discriminator of the DCGAN model, which discriminates the probability that the first chord matrix j is the same as the chord matrix k; determine whether the probability output for the first chord matrix j is within the preset range, and if not, input the probability into the regulator of the DCGAN model to adjust the parameters of the generator's transposed convolutional layers, re-input the melody matrix i into the adjusted generator to regenerate the first chord matrix j matching the melody matrix i, and input the regenerated first chord matrix j and the chord matrix k corresponding to the melody matrix i into the discriminator of the DCGAN model to discriminate the probability that the regenerated first chord matrix j is the same as the chord matrix k. When the probability that the discriminator outputs for each first chord matrix generated by the generator is within the preset range, a trained DCGAN model is obtained.
  • the aforementioned first acquiring module 20 includes a first acquiring unit 201, a determining unit 202, a second acquiring unit 203, and a third acquiring unit 204.
  • the above-mentioned first obtaining unit 201 is configured to obtain a dual-track data set including a plurality of dual-track music files, where a dual-track music file is a music file containing a melody track and a chord track; the determining unit 202 is used to determine N target dual-track music files from the dual-track data set; the second obtaining unit 203 is used to obtain the melody matrix of each of the N target dual-track music files on its melody track, to obtain N melody matrices; the third obtaining unit 204 is used to obtain the chord matrix of each of the N target dual-track music files on its chord track, to obtain N chord matrices.
  • the chords in the target dual-track music file belong to a preset basic chord set.
  • the basic chord set includes 12 major chords and 12 minor chords. Each measure of the target dual-track music file uses one chord.
  • the aforementioned second obtaining unit 203 is specifically configured to: adjust the melody of each of the N target dual-track music files to within a preset pitch range; obtain the adjusted melody notes in each target dual-track music file; and, according to the adjusted melody notes in each target dual-track music file, generate the melody matrix of each adjusted target dual-track music file, the melody matrix being an h*w binary matrix,
  • where h is used to represent the preset number of notes
  • and w is used to represent the number of bars of the target dual-track music file.
  • the above-mentioned third obtaining unit 204 is specifically configured to: obtain the chord used in each bar of each of the N target dual-track music files and the chord category of the chord used in each bar; and, according to the chord used in each bar and the chord category of the chord used in each bar, generate the chord matrix of each target dual-track music file, the chord matrix being a w*m binary matrix, where w is used to represent the number of bars of the target dual-track music file and m is used to represent the chord parameters of each bar.
  • the device further includes a second acquisition module 60.
  • the second acquiring module 60 is configured to: acquire a single-track music file including a melody track; adjust the melody of the single-track music file to within a preset pitch range; acquire the melody notes in the adjusted single-track music file; and, according to the melody notes in the adjusted single-track music file, generate the target melody matrix of the adjusted single-track music file.
  • the generator includes at least one fully connected layer and at least one transposed convolutional layer
  • the discriminator includes at least one convolutional layer and at least one fully connected layer
  • the regulator includes at least one convolutional layer.
  • the regulator is a reverse generator.
  • through the above-mentioned modules, the above-mentioned music generating device can execute the implementation of each step provided in the embodiments of Figure 1 or Figure 3, so as to realize the functions realized in the above-mentioned embodiments.
  • for details, please refer to the corresponding descriptions of each step in the method embodiments shown in FIG. 1 or FIG. 3, which will not be repeated here.
  • the music generating device constructs a deep convolutional generative adversarial network (DCGAN) model, obtains a training data set, inputs the N melody matrices and the corresponding N chord matrices in the training data set into the DCGAN model for training to obtain a trained DCGAN model, then inputs the obtained target melody matrix into the trained DCGAN model for processing, obtains the target chord matrix generated by the trained DCGAN model that matches the target melody matrix, and finally outputs a music file in which the melody track mapped from the target melody matrix and the chord track mapped from the target chord matrix are combined.
  • FIG. 7 is a schematic block diagram of a terminal provided in an embodiment of the present application.
  • the terminal in this embodiment of the present application may include: one or more processors 701; one or more input devices 702, one or more output devices 703, and a memory 704.
  • the aforementioned processor 701, input device 702, output device 703, and memory 704 are connected via a bus 705.
  • the memory 704 is configured to store a computer program including program instructions, and the processor 701 is configured to execute the program instructions stored in the memory 704.
  • the processor 701 is configured to call the program instructions to: construct a deep convolutional generative adversarial network (DCGAN) model; obtain a training data set that includes N melody matrices and corresponding N chord matrices, where the melody matrices and the chord matrices are both binary matrices; and input the N melody matrices and the corresponding N chord matrices in the training data set into the DCGAN model for training, to obtain a trained DCGAN model.
  • the input device 702 is configured to input the acquired target melody matrix into the trained DCGAN model for processing, and acquire a target chord matrix generated by the trained DCGAN model that matches the target melody matrix.
  • the output device 703 is configured to output a music file in which the melody track mapped from the target melody matrix and the chord track mapped from the target chord matrix are combined.
  • the processor 701 may be a central processing unit (CPU), and the processor may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the input device 702 may include a touch panel, a microphone, and the like.
  • the output device 703 may include a display (e.g., an LCD), a speaker, and the like.
  • the memory 704 may include a read-only memory and a random access memory, and provides instructions and data to the processor 701. A part of the memory 704 may also include a non-volatile random access memory. For example, the memory 704 may also store device type information.
  • the processor 701, input device 702, and output device 703 described in the embodiments of this application can perform the implementations described in the DCGAN-based music generation method provided in the embodiments of this application, and can also perform the implementations of the music generating device described in the embodiments of this application, which are not repeated here.
  • the embodiment of the present application also provides a computer-readable storage medium, the computer-readable storage medium stores a computer program, and the computer program includes program instructions.
  • when the program instructions are executed by a processor, the DCGAN-based music generation method shown in FIG. 1 or FIG. 3 is implemented.
  • for the specific details of the DCGAN-based music generation method, refer to the descriptions of the embodiments shown in FIG. 1 or FIG. 3, which are not repeated here.
  • the above-mentioned computer-readable storage medium may be the internal storage unit of the music generating apparatus or terminal described in any of the foregoing embodiments, such as the hard disk or memory of the terminal.
  • the computer-readable storage medium may also be an external storage device of the terminal, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card equipped on the terminal.
  • the computer-readable storage medium may also include both an internal storage unit of the terminal and an external storage device.
  • the computer-readable storage medium is used to store the computer program and other programs and data required by the terminal.
  • the computer-readable storage medium can also be used to temporarily store data that has been output or will be output.


Abstract

Provided are a DCGAN-based music generation method, and a music generation apparatus. The method comprises: constructing a deep convolution generative adversarial network (DCGAN) model (S101); then acquiring a training data set (S102); next, inputting N melody matrices and N corresponding chord matrices in the training data set into the DCGAN model for training so as to obtain a trained DCGAN model (S103); and then inputting an acquired target melody matrix into the trained DCGAN model for processing, acquiring a target chord matrix generated by the trained DCGAN model and matching the target melody matrix, and finally, outputting a music file formed after combining a melody track mapped by the target melody matrix with a chord track mapped by the target chord matrix (S104). By means of the music generation method, a music file with matching chords can be automatically generated, thereby reducing manual handling steps.

Description

A DCGAN-based music generation method and device
This application claims priority to a Chinese patent application filed with the Chinese Patent Office on January 23, 2019, with application number 2019100661308 and entitled "A DCGAN-based music generation method and device", the entire content of which is incorporated herein by reference.
Technical field
This application relates to the field of computer technology, and in particular to a DCGAN-based music generation method and device.
Background
At present, music with chord accompaniment is usually produced by giving a melody to a professional musician, who writes matching chords for it to obtain a music file with matched chords. Concretely, writing chords for a melody requires the arranger to have solid technical grounding in music theory and instrument operation, as well as a strong and sensitive musical ear. Generating a high-quality piece of music is therefore inevitably limited by the arranger's skill level.
Summary of the invention
The embodiments of this application provide a DCGAN-based music generation method that can automatically generate music files with matching chords, reducing manual processing steps.
In a first aspect, an embodiment of this application provides a DCGAN-based music generation method, which includes:
constructing a deep convolutional generative adversarial network (DCGAN) model;
obtaining a training data set that includes N melody matrices and N corresponding chord matrices, where both the melody matrices and the chord matrices are binary matrices;
inputting the N melody matrices and the N corresponding chord matrices in the training data set into the DCGAN model for training, to obtain a trained DCGAN model; and
inputting an obtained target melody matrix into the trained DCGAN model for processing, obtaining a target chord matrix generated by the trained DCGAN model that matches the target melody matrix, and outputting a music file in which the melody track mapped from the target melody matrix is merged with the chord track mapped from the target chord matrix.
In a second aspect, an embodiment of this application provides a music generation device, which includes:
a construction module, configured to construct a deep convolutional generative adversarial network (DCGAN) model;
a first acquisition module, configured to obtain a training data set that includes N melody matrices and N corresponding chord matrices, where both the melody matrices and the chord matrices are binary matrices;
a training module, configured to input the N melody matrices and the N corresponding chord matrices in the training data set into the DCGAN model for training, to obtain a trained DCGAN model;
an input module, configured to input an obtained target melody matrix into the trained DCGAN model for processing and obtain a target chord matrix generated by the trained DCGAN model that matches the target melody matrix; and
an output module, configured to output a music file in which the melody track mapped from the target melody matrix is merged with the chord track mapped from the target chord matrix.
In a third aspect, an embodiment of this application provides a terminal, including a processor, an input device, an output device, and a memory that are connected to one another, where the memory is configured to store a computer program supporting the terminal in executing the above method. The computer program includes program instructions, and the processor is configured to call the program instructions to execute the DCGAN-based music generation method of the first aspect.
In a fourth aspect, an embodiment of this application provides a computer-readable storage medium storing a computer program. The computer program includes program instructions that, when executed by a processor, cause the processor to execute the DCGAN-based music generation method of the first aspect.
By training a DCGAN model and using the trained model to generate chord matrices, the embodiments of this application can automatically generate music files with matching chords and reduce manual processing steps.
Description of the drawings
FIG. 1 is a schematic flowchart of a DCGAN-based music generation method provided by an embodiment of this application;
FIG. 2 is a schematic diagram of the network structure of the DCGAN model provided by an embodiment of this application;
FIG. 3 is another schematic flowchart of a DCGAN-based music generation method provided by an embodiment of this application;
FIG. 4a is a schematic diagram of MIDI notes provided by an embodiment of this application;
FIG. 4b is a schematic diagram of a melody matrix provided by an embodiment of this application;
FIG. 5a is a schematic diagram of the 24 chords provided by an embodiment of this application;
FIG. 5b is a schematic diagram of a chord matrix provided by an embodiment of this application;
FIG. 6 is a schematic block diagram of a music generation device provided by an embodiment of this application;
FIG. 7 is a schematic block diagram of a terminal provided by an embodiment of this application.
Detailed description
The DCGAN-based music generation method and device provided by the embodiments of this application are described below with reference to FIG. 1 to FIG. 7.
Refer to FIG. 1, which is a schematic flowchart of a DCGAN-based music generation method provided by an embodiment of this application. As shown in FIG. 1, the DCGAN-based music generation method may include the following steps.
S101: Construct a deep convolutional generative adversarial network (DCGAN) model.
In some feasible implementations, the terminal may construct a deep convolutional generative adversarial network (DCGAN) model. The DCGAN model may include a generator, a discriminator, and a conditioner, all of which are convolutional neural networks (CNNs). The generator may include at least one fully connected layer and at least one transposed convolutional layer; the discriminator may include at least one convolutional layer and at least one fully connected layer; the conditioner may be a reversed generator, including at least one convolutional layer and at least one fully connected layer. The generator is used to generate, from a given random sequence, a piece of music that is as realistic as possible so as to fool the discriminator, and the discriminator is used to distinguish the music generated by the generator from real music as well as possible; the generator and the discriminator thus form a dynamic game. The conditioner is used to adjust the parameters of the generator's transposed convolutional layers so that the music generated by the generator can better fool the discriminator.
As shown in FIG. 2, which is a schematic diagram of the network structure of the DCGAN model provided by an embodiment of this application, Conditioner CNN denotes the conditioner in the DCGAN model, Generator CNN denotes the generator, and Discriminator CNN denotes the discriminator. Since the conditioner is essentially a reversed generator, it has the same convolution kernel shapes as the generator, and its outputs have the same shapes as the generator's; the output of each convolutional layer of the conditioner is therefore fed to the corresponding transposed convolutional layer of the generator so that the parameters of the generator's transposed convolutional layers can be adjusted, and the output of the generator serves as one input of the discriminator. Noise z denotes the random sequence input to the generator, X or G(z) denotes the output of the generator, and 2D conditions denote real data (here, data not produced by the generator).
S102: Obtain a training data set.
In some feasible implementations, the terminal may obtain, from a preset training database, N training samples for training the above DCGAN model, where each training sample may include one melody matrix and one corresponding chord matrix. The terminal may determine the N training samples as the training sample set of the DCGAN model, so the training sample set includes N training samples, i.e., N melody matrices and the corresponding N chord matrices. N may be an integer greater than or equal to 2. A melody matrix may be a 128*16 binary matrix, and a chord matrix may be a 16*13 binary matrix.
S103: Input the N melody matrices and the N corresponding chord matrices in the training data set into the DCGAN model for training, to obtain a trained DCGAN model.
In some feasible implementations, the above DCGAN model includes a generator, a discriminator, and a conditioner, all of which are CNNs. The terminal may train the generator and the discriminator in the DCGAN model alternately. Take one iteration of the training process as an example. With the parameters of the discriminator's convolutional layers fixed, the generator is trained: the terminal may input any melody matrix i from the training data set into the generator of the DCGAN model to generate a first chord matrix j matching the melody matrix i. Then, with the parameters of the generator's transposed convolutional layers fixed, the discriminator is trained: the terminal may input the first chord matrix j together with the chord matrix k corresponding to the melody matrix i in the training data set into the discriminator of the DCGAN model, which estimates the probability that the first chord matrix j is the same as the chord matrix k (i.e., the similarity between the first chord matrix j and the chord matrix k). The terminal determines whether the probabilities output by the discriminator for the first chord matrix j are all within a preset range (for example, between 0.85 and 1, inclusive). If the probability output by the discriminator for the first chord matrix j is not within the preset range, the terminal may input the probability output by the discriminator into the conditioner of the DCGAN model to adjust the parameters of the generator's transposed convolutional layers. The terminal may then re-input the melody matrix i into the adjusted generator to regenerate the first chord matrix j, and input the regenerated first chord matrix j together with the chord matrix k into the discriminator again. If the probability output by the discriminator for the first chord matrix j is within the preset range, the terminal may select another melody matrix from the training data set and perform another iteration of the training process. One iteration is performed for each melody matrix in the training data set, i.e., with N melody matrices in the training data set, the training process has at least N iterations. When the probabilities output by the discriminator for all the first chord matrices generated by the generator are within the preset range, the trained DCGAN model is obtained.
In some feasible implementations, the training process of the above DCGAN model can be expressed by the following function (1-1):
$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))] \qquad (1\text{-}1)$$
Here, p_data in function (1-1) denotes the N chord matrices in the training data set, and p_z denotes the N melody matrices in the training data set. D denotes the discriminator and G denotes the generator. G(z) denotes the output of the generator, and D(x) denotes the output of the discriminator (the value of D(x) lies between 0 and 1, inclusive). Training D maximizes log D(x), while training G minimizes log(1 - D(G(z))), i.e., maximizes D's loss. The training process usually fixes one side (e.g., the discriminator D) and updates the parameters of the other network (e.g., the generator G), alternating iterations so as to maximize the other side's error. Finally, when G converges, the training of G and D is complete and the trained DCGAN model is obtained.
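As a quick numeric check of function (1-1), the value V(D, G) for a small batch can be computed directly. The discriminator outputs below are made-up illustrative values, chosen only to show that training D pushes the value up while training G pushes it down.

```python
import math

# Made-up discriminator outputs for a tiny batch (illustrative values only,
# not produced by any real model):
d_real = [0.9, 0.8]   # D(x) for real chord matrices x ~ p_data
d_fake = [0.3, 0.1]   # D(G(z)) for generated chord matrices, z ~ p_z

# V(D, G) from function (1-1): E[log D(x)] + E[log(1 - D(G(z)))]
v = (sum(math.log(p) for p in d_real) / len(d_real)
     + sum(math.log(1.0 - p) for p in d_fake) / len(d_fake))
print(round(v, 4))  # training D increases this value; training G decreases it
```

A confident discriminator (d_real near 1, d_fake near 0) drives V toward its maximum of 0; a successful generator forces d_fake upward and V back down.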
In some feasible implementations, the generator of the DCGAN model incorporates feature matching in its learning process. Feature matching can be expressed by the following function (1-2):
$$\lambda_1 \left\| \mathbb{E}\,X - \mathbb{E}\,G(z) \right\|_2^2 + \lambda_2 \left\| \mathbb{E}\,f(X) - \mathbb{E}\,f(G(z)) \right\|_2^2 \qquad (1\text{-}2)$$
Here, E in function (1-2) denotes the mean, X denotes a chord matrix in the above training data set, z denotes a melody matrix in the training data set, and G(z) denotes the output of the generator. f denotes the first convolutional layer of the discriminator, and λ1 and λ2 denote the tuning parameters of the generator, which may take any values within the range in which the system is not distorted.
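Function (1-2) can be illustrated numerically. The tiny 2*2 "matrices", the global-average-pooling stand-in for f, and the λ values below are all assumptions made for the sake of a runnable sketch; in the model itself, f is the discriminator's first convolutional layer and the matrices are 16*13 chord matrices.

```python
def mean_matrix(batch):
    # element-wise mean over a batch of equally sized matrices (the E[.] terms)
    n, rows, cols = len(batch), len(batch[0]), len(batch[0][0])
    return [[sum(m[r][c] for m in batch) / n for c in range(cols)]
            for r in range(rows)]

def sq_l2(a, b):
    # squared L2 distance between two matrices
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def f(m):
    # stand-in for the discriminator's first layer: global average pooling
    vals = [v for row in m for v in row]
    return [[sum(vals) / len(vals)]]

real = [[[1, 0], [0, 1]], [[1, 1], [0, 0]]]   # "real" chord matrices X
fake = [[[0, 0], [1, 0]]]                     # generator outputs G(z)

lam1, lam2 = 0.1, 1.0                         # tuning parameters (assumed values)
loss = (lam1 * sq_l2(mean_matrix(real), mean_matrix(fake))
        + lam2 * sq_l2(mean_matrix([f(m) for m in real]),
                       mean_matrix([f(m) for m in fake])))
print(round(loss, 4))  # prints 0.3125
```

The first term matches raw batch means, the second matches means of the first-layer features, which is what keeps the generator's outputs statistically close to real chords rather than merely fooling the discriminator pointwise.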
S104: Input the obtained target melody matrix into the trained DCGAN model for processing, obtain the target chord matrix generated by the trained DCGAN model that matches the target melody matrix, and output a music file in which the melody track mapped from the target melody matrix is merged with the chord track mapped from the target chord matrix.
In some feasible implementations, after obtaining the trained DCGAN model, the terminal may obtain a target melody matrix. The target melody matrix may be a binary matrix input directly by the user, or a binary matrix randomly generated by the terminal: for example, the terminal first obtains random noise (Gaussian noise, uniform noise, etc.), processes the obtained random noise into a matrix with the same data format as the melody matrices in the above training data set, and determines the matrix obtained from the processed random noise as the target melody matrix. After obtaining the target melody matrix, the terminal may input it into the above trained DCGAN model to generate a target chord matrix matching the target melody matrix. The terminal may obtain the target chord matrix generated by the trained DCGAN model, map the target melody matrix to a melody track and the target chord matrix to a chord track, merge the melody track mapped from the target melody matrix with the chord track mapped from the target chord matrix to obtain a merged music file, and output the merged music file in musical instrument digital interface (MIDI) format. The size of the target chord matrix is the same as the size of the chord matrices in the above training data set. The merged music file includes both melody and chords; for example, at time t, the melody track mapped from the target melody matrix and the chord track mapped from the target chord matrix simultaneously sound their respective notes at time t. By constructing a DCGAN model, training it with melody matrices and chord matrices to obtain a trained DCGAN model, and then feeding the trained DCGAN model a target melody matrix (which may be random noise), the embodiment of this application has the trained model generate a target chord matrix matching that target melody matrix, and can automatically generate music files with matching chords, thereby saving manpower and reducing manual processing steps.
In the embodiment of this application, the terminal constructs a deep convolutional generative adversarial network (DCGAN) model, obtains a training data set, inputs the N melody matrices and the corresponding N chord matrices in the training data set into the DCGAN model for training to obtain a trained DCGAN model, then inputs the obtained target melody matrix into the trained DCGAN model for processing, obtains the target chord matrix generated by the trained DCGAN model that matches the target melody matrix, and finally outputs a music file in which the melody track mapped from the target melody matrix is merged with the chord track mapped from the target chord matrix. Music files with matching chords can thus be generated automatically, reducing manual processing steps.
Refer to FIG. 3, which is another schematic flowchart of a DCGAN-based music generation method provided by an embodiment of this application. As shown in FIG. 3, the DCGAN-based music generation method may include the following steps.
S301: Construct a deep convolutional generative adversarial network (DCGAN) model.
In some feasible implementations, step S301 in this embodiment may be implemented in the same way as step S101 of the embodiment shown in FIG. 1, and details are not repeated here.
S302: Obtain a dual-track data set including multiple dual-track music files.
S303: Determine N target dual-track music files from the dual-track data set.
In some feasible implementations, the terminal may obtain a MIDI data set that includes multiple music files in MIDI format. The terminal may determine the music files in the MIDI data set that include both a melody track and a chord track as dual-track music files, and take the multiple dual-track music files in the MIDI data set as a dual-track data set. The terminal may then determine the dual-track music files in the dual-track data set whose chords belong to a preset basic chord set and whose number of bars equals a target threshold as target dual-track music files, obtaining N target dual-track music files. The preset basic chord set may include 12 major chords and 12 minor chords. The 12 major chords are C, C#, D, D#, E, F, F#, G, G#, A, A#, and B; the 12 minor chords are A, A#, B, C, C#, D, D#, E, F, F#, G, and G#. Each bar of each target dual-track music file uses only one chord. The target threshold may be 16, i.e., each target dual-track music file includes 16 bars. N may be an integer greater than or equal to 2.
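For reference, the 24 basic chords can be encoded one per bar as a 13-element binary vector, so that 16 bars give a 16*13 chord matrix of the kind described earlier. Treating the vector as 12 one-hot root slots plus one major/minor bit is an assumption made here for illustration; the text only specifies that the chord matrix is a 16*13 binary matrix with one chord per bar.

```python
ROOTS = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def encode_chord(root, is_major):
    # 13-element binary vector: 12 one-hot root slots + 1 major/minor flag.
    # (Interpreting the 13th chord parameter as a major/minor bit is an
    # assumed encoding; the text only requires a 16*13 binary chord matrix.)
    vec = [0] * 13
    vec[ROOTS.index(root)] = 1
    vec[12] = 1 if is_major else 0
    return vec

# One chord per bar, 16 bars -> a 16*13 binary chord matrix
progression = [("C", True), ("A", False), ("F", True), ("G", True)] * 4
chord_matrix = [encode_chord(root, major) for root, major in progression]

assert len(chord_matrix) == 16 and all(len(row) == 13 for row in chord_matrix)
```

Any of the 24 chords in the basic set maps to exactly one such vector, so filtering to the basic chord set guarantees every bar is encodable.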
In some feasible implementations, in order to meet the input format of the above DCGAN model, the terminal may sequentially divide each of the N target dual-track music files into groups of 8 bars. For example, a target dual-track music file with 18 bars in total is divided into groups of 8 bars: the first group is the first 8 bars, the second group is the middle 8 bars, and the third group is the last 2 bars.
S304: Obtain the melody matrix on the melody track of each of the N target dual-track music files, to obtain N melody matrices.
在一些可行的实施方式中,终端可以将上述N个目标双音轨音乐文件中的各个目标双音轨音乐文件的旋律调整至预设音高范围内。该预设的音高范围可以为C4到B5这两个八度之间。例如,终端将各个目标双音轨音乐文件中旋律音符的音高不在预设的两个八度之间的旋律音符去掉,只保留各个目标双音轨音乐文件中旋律音符的音高在C4到B5这两个八度之间的旋律音符。终端可以获取调整后的各个目标双音轨音乐文件中的旋律音符,并可以根据调整后各个目标双音轨音乐文件中的旋律音符生成该各个目标双音轨音乐文件的旋律矩阵。其中,旋律矩阵可以为h*w的二元矩阵,h可以用于表示MIDI音符数,h=128;w可以用于表示目标双音轨音乐文件的小节数,w=16。旋律矩阵中的元素0可以用于表示调整后的目标双音轨音乐文件在该0元素对应的位置上无MIDI音符,旋律矩阵中的元素1可以用于表示调整后的目标双音轨音乐文件在该1元素对应的位置上有对应的MIDI音符。In some feasible implementation manners, the terminal may adjust the melody of each target dual-track music file among the N target dual-track music files to within a preset pitch range. The preset pitch range can be between the two octaves of C4 to B5. For example, the terminal removes the melody notes whose pitches of the melody notes in each target dual-track music file are not between the preset two octaves, and only keeps the pitches of the melody notes in each target dual-track music file between C4 and C4. B5 The melody note between these two octaves. The terminal may obtain the adjusted melody notes in each target dual-track music file, and may generate the melody matrix of each target dual-track music file according to the adjusted melody notes in each target dual-track music file. Among them, the melody matrix can be a binary matrix of h*w, h can be used to represent the number of MIDI notes, h=128; w can be used to represent the number of bars of the target two-track music file, w=16. Element 0 in the melody matrix can be used to indicate that the adjusted target dual-track music file has no MIDI notes at the position corresponding to the 0 element, and element 1 in the melody matrix can be used to indicate the adjusted target dual-track music file There is a corresponding MIDI note at the position corresponding to the 1 element.
例如，以一个目标双音轨音乐文件生成一个旋律矩阵为例。如图4a所示，图4a是本申请实施例提供的MIDI音符的示意图；如图4b所示，图4b是本申请实施例提供的旋律矩阵的示意图。其中，M表示旋律矩阵，M的大小为128行16列。M中的每一行表示一个MIDI音符，如第一行表示128个MIDI音符中的第一个MIDI音符00(十六进制的音符代码)，第二行表示128个MIDI音符中的第二个MIDI音符01，第十三行表示128个MIDI音符中的第十三个MIDI音符0C等。M中的每一列表示一个小节，如第一列表示目标双音轨音乐文件中的第一个小节，第十列表示目标双音轨音乐文件中的第十个小节等。如图4b所示，M的第2行第1列中的元素1表示目标双音轨音乐文件的第1个小节的第1个音符为MIDI音符01；M的第2行第3列中的元素1表示目标双音轨音乐文件的第3个小节的第2个音符为MIDI音符01。M的第1行第5列中的元素0表示目标双音轨音乐文件的第5个小节中没有MIDI音符00。For example, take the generation of one melody matrix from one target dual-track music file. As shown in FIG. 4a, FIG. 4a is a schematic diagram of MIDI notes provided by an embodiment of the present application; as shown in FIG. 4b, FIG. 4b is a schematic diagram of a melody matrix provided by an embodiment of the present application. M represents the melody matrix, and the size of M is 128 rows by 16 columns. Each row in M represents one MIDI note: the first row represents the first of the 128 MIDI notes, note 00 (hexadecimal note code), the second row represents the second MIDI note 01, the thirteenth row represents the thirteenth MIDI note 0C, and so on. Each column in M represents one bar: the first column represents the first bar in the target dual-track music file, the tenth column represents the tenth bar, and so on. As shown in FIG. 4b, the element 1 in row 2, column 1 of M indicates that the first note of the first bar of the target dual-track music file is MIDI note 01; the element 1 in row 2, column 3 of M indicates that the second note of the third bar is MIDI note 01. The element 0 in row 1, column 5 of M indicates that there is no MIDI note 00 in the fifth bar of the target dual-track music file.
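The melody-matrix construction of S304 can be sketched as follows. This is a simplified illustration: the `(midi_pitch, bar_index)` input format is an assumption, and it assumes the common MIDI mapping in which C4 is pitch 60 and B5 is pitch 83.

```python
# Hypothetical sketch of building the 128x16 binary melody matrix M described
# above: notes is a list of (midi_pitch, bar_index) pairs; pitches outside the
# preset C4-B5 range (assumed to be MIDI 60-83) are dropped, and
# M[pitch][bar] = 1 marks a note while 0 marks its absence.
def build_melody_matrix(notes, h=128, w=16, lo=60, hi=83):
    m = [[0] * w for _ in range(h)]
    for pitch, bar in notes:
        if lo <= pitch <= hi:  # keep only notes within the preset two octaves
            m[pitch][bar] = 1
    return m
```

The same construction, with the adjusted single-track file as input, yields the target melody matrix used at inference time.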
S305,获取N个目标双音轨音乐文件中各个目标双音轨音乐文件在和弦音轨上的和弦矩阵,得到N个和弦矩阵。S305: Obtain the chord matrix of each target dual-track music file on the chord track among the N target dual-track music files, to obtain N chord matrices.
在一些可行的实施方式中，上述目标双音轨音乐文件的每小节只采用一个和弦。终端可以获取上述N个目标双音轨音乐文件中各个目标双音轨音乐文件的各小节所采用的和弦，并可以判断该各小节所采用和弦的和弦类别(即各小节的和弦属于大和弦或小和弦)。终端可以根据该各个目标双音轨音乐文件的各小节所采用的和弦以及各小节所采用和弦的和弦类别，生成该各个目标双音轨音乐文件的和弦矩阵。其中，和弦矩阵可以为w*m的二元矩阵，w可以用于表示目标双音轨音乐文件的小节数，w=16；m可以用于表示各小节的和弦参数，m=13，这13个和弦参数的前12个和弦参数分别表示12个和弦，第13个和弦参数表示和弦类别，即大和弦或小和弦。In some feasible implementation manners, each bar of the target dual-track music file uses only one chord. The terminal may obtain the chord used in each bar of each of the N target dual-track music files, and may determine the chord category of the chord used in each bar (that is, whether the chord of each bar is a major chord or a minor chord). The terminal may generate the chord matrix of each target dual-track music file according to the chord used in each bar and the chord category of that chord. The chord matrix may be a w*m binary matrix, where w represents the number of bars of the target dual-track music file, w=16, and m represents the chord parameters of each bar, m=13: the first 12 chord parameters represent the 12 chords respectively, and the 13th chord parameter represents the chord category, namely major chord or minor chord.
例如，以一个目标双音轨音乐文件生成一个和弦矩阵为例。如图5a所示，图5a是本申请实施例提供的24个和弦的示意图。其中，图5a中的major表示大和弦，minor表示小和弦；“13”表示第13个和弦参数，第13个和弦参数为0表示和弦类别为大和弦，第13个和弦参数为1表示和弦类别为小和弦。如图5b所示，图5b是本申请实施例提供的和弦矩阵的示意图。其中，图5b中的Y表示和弦矩阵，Y共有16行13列。Y中的每一行表示一个小节，如第一行表示目标双音轨音乐文件中的第一小节，第四行表示目标双音轨音乐文件中的第四小节等等。Y中的每一列表示一个和弦参数，前12列中的0元素表示没有对应的和弦，前12列中的1元素表示有对应的和弦，且Y每行的前12个元素中有且仅有一个1元素。Y的第13列表示和弦类别，0表示大和弦，1表示小和弦。如图5b所示，第1行第13列元素为1，表示小和弦，那么第1行第2列中的元素1表示目标双音轨音乐文件的第1个小节采用小和弦A#。第2行第13列元素为0，表示大和弦，那么第2行第4列的元素1表示目标双音轨音乐文件的第2个小节采用大和弦D#。因为第16行第13列元素为0，表示大和弦，那么第16行第1列的元素1表示目标双音轨音乐文件的第16个小节采用大和弦C。For example, take the generation of one chord matrix from one target dual-track music file. As shown in FIG. 5a, FIG. 5a is a schematic diagram of the 24 chords provided in an embodiment of the present application. In FIG. 5a, "major" denotes a major chord and "minor" denotes a minor chord; "13" denotes the 13th chord parameter, where a value of 0 indicates that the chord category is a major chord and a value of 1 indicates that it is a minor chord. As shown in FIG. 5b, FIG. 5b is a schematic diagram of a chord matrix provided by an embodiment of the present application. Y in FIG. 5b represents the chord matrix, which has 16 rows and 13 columns. Each row in Y represents one bar: the first row represents the first bar in the target dual-track music file, the fourth row represents the fourth bar, and so on. Each column in Y represents one chord parameter: in the first 12 columns, a 0 element indicates that the corresponding chord is absent and a 1 element indicates that it is present, and exactly one of the first 12 elements in each row of Y is 1. The 13th column of Y indicates the chord category, where 0 indicates a major chord and 1 indicates a minor chord. As shown in FIG. 5b, the element in row 1, column 13 is 1, indicating a minor chord, so the element 1 in row 1, column 2 indicates that the first bar of the target dual-track music file uses the minor chord A#. The element in row 2, column 13 is 0, indicating a major chord, so the element 1 in row 2, column 4 indicates that the second bar uses the major chord D#. Because the element in row 16, column 13 is 0, indicating a major chord, the element 1 in row 16, column 1 indicates that the 16th bar uses the major chord C.
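The chord-matrix layout described above can be sketched as follows; the chord-root column ordering and the `(root_index, is_minor)` input format are illustrative assumptions, not fixed by the embodiment:

```python
# Hypothetical sketch of the 16x13 binary chord matrix Y described above: one
# row per bar, columns 0-11 one-hot encode the 12 chord roots (an assumed
# ordering), and column 12 encodes the chord category (0 = major, 1 = minor).
def build_chord_matrix(bar_chords):
    # bar_chords: one (root_index, is_minor) pair per bar
    y = []
    for root, is_minor in bar_chords:
        row = [0] * 13
        row[root] = 1
        row[12] = 1 if is_minor else 0
        y.append(row)
    return y
```

Note that, as in FIG. 5b, each row carries exactly one 1 among its first 12 elements.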
S306,将训练数据集中的N个旋律矩阵以及对应的N个和弦矩阵输入DCGAN模型中进行训练,得到训练好的DCGAN模型。S306: Input the N melody matrices and the corresponding N chord matrices in the training data set into the DCGAN model for training, to obtain a trained DCGAN model.
在一些可行的实施方式中,本申请实施例中步骤S306的实现方式可参考图1所示实施例的步骤S103的实现方式,在此不再赘述。In some feasible implementation manners, the implementation manner of step S306 in the embodiment of the present application can refer to the implementation manner of step S103 in the embodiment shown in FIG. 1, which will not be repeated here.
S307,获取目标旋律矩阵。S307: Acquire a target melody matrix.
在一些可行的实施方式中，终端可以从MIDI数据集中获取包括旋律音轨的任一单音轨音乐文件。终端可以去掉该单音轨音乐文件中旋律音符的音高在预设音高范围(C4到B5这两个八度)外的旋律音符，只保留该单音轨音乐文件中旋律音符的音高在预设音高范围内的旋律音符，得到调整后的单音轨音乐文件。终端可以获取调整后的单音轨音乐文件的旋律音符，并可以根据该调整后的单音轨音乐文件的旋律音符生成该调整后单音轨音乐文件的目标旋律矩阵。其中，目标旋律矩阵为128*16的二元矩阵。目标旋律矩阵中的元素0可以用于表示调整后的单音轨音乐文件在该0元素对应的位置上无MIDI音符，目标旋律矩阵中的元素1可以用于表示调整后的单音轨音乐文件在该1元素对应的位置上有对应的MIDI音符。In some feasible implementation manners, the terminal may obtain, from the MIDI data set, any single-track music file that includes a melody track. The terminal may remove the melody notes whose pitches fall outside the preset pitch range (the two octaves from C4 to B5), and keep only the melody notes whose pitches lie within the preset pitch range, to obtain an adjusted single-track music file. The terminal may obtain the melody notes of the adjusted single-track music file, and may generate the target melody matrix of the adjusted single-track music file according to those melody notes. The target melody matrix is a 128*16 binary matrix. An element 0 in the target melody matrix indicates that the adjusted single-track music file has no MIDI note at the position corresponding to that element, and an element 1 indicates that there is a corresponding MIDI note at the position corresponding to that element.
S308，将获取到的目标旋律矩阵输入训练好的DCGAN模型中进行处理，并获取训练好的DCGAN模型生成的与目标旋律矩阵匹配的目标和弦矩阵，输出目标旋律矩阵映射出的旋律音轨与目标和弦矩阵映射出的和弦音轨进行合并后的音乐文件。S308: Input the obtained target melody matrix into the trained DCGAN model for processing, obtain the target chord matrix generated by the trained DCGAN model that matches the target melody matrix, and output a music file in which the melody track mapped from the target melody matrix and the chord track mapped from the target chord matrix are merged.
在一些可行的实施方式中,本申请实施例中步骤S308的实现方式可参考图1所示实施例的步骤S104的实现方式,在此不再赘述。In some feasible implementation manners, the implementation manner of step S308 in the embodiment of the present application can refer to the implementation manner of step S104 in the embodiment shown in FIG. 1, and details are not described herein again.
在本申请实施例中，终端通过构造深度卷积生成式对抗网络DCGAN模型，再获取包括多个双音轨音乐文件的双音轨数据集，从双音轨数据集确定出N个目标双音轨音乐文件，获取N个目标双音轨音乐文件中各个目标双音轨音乐文件在旋律音轨上的旋律矩阵，得到N个旋律矩阵，获取N个目标双音轨音乐文件中各个目标双音轨音乐文件在和弦音轨上的和弦矩阵，得到N个和弦矩阵。将训练数据集中的N个旋律矩阵以及对应的N个和弦矩阵输入DCGAN模型中进行训练，得到训练好的DCGAN模型。再获取目标旋律矩阵，将目标旋律矩阵输入训练好的DCGAN模型中进行处理，并获取训练好的DCGAN模型生成的与目标旋律矩阵匹配的目标和弦矩阵，输出目标旋律矩阵映射出的旋律音轨与目标和弦矩阵映射出的和弦音轨进行合并后的音乐文件。可以自动生成带有和弦匹配的音乐文件，减少人工处理环节。In the embodiment of this application, the terminal constructs a deep convolutional generative adversarial network (DCGAN) model, obtains a dual-track data set including multiple dual-track music files, determines N target dual-track music files from the dual-track data set, obtains the melody matrix of each target dual-track music file on the melody track to obtain N melody matrices, and obtains the chord matrix of each target dual-track music file on the chord track to obtain N chord matrices. The N melody matrices and the corresponding N chord matrices in the training data set are input into the DCGAN model for training to obtain a trained DCGAN model. The terminal then obtains a target melody matrix, inputs it into the trained DCGAN model for processing, obtains the target chord matrix generated by the trained DCGAN model that matches the target melody matrix, and outputs a music file in which the melody track mapped from the target melody matrix and the chord track mapped from the target chord matrix are merged. Music files with matching chords can thus be generated automatically, reducing manual processing.
参见图6,是本申请实施例提供的音乐生成装置的一示意性框图。如图6所示,本申请实施例的音乐生成装置包括:Refer to FIG. 6, which is a schematic block diagram of a music generating apparatus provided by an embodiment of the present application. As shown in FIG. 6, the music generating device of the embodiment of the present application includes:
构造模块10，用于构造深度卷积生成式对抗网络DCGAN模型；The construction module 10 is configured to construct a deep convolutional generative adversarial network (DCGAN) model;
第一获取模块20,用于获取训练数据集,该训练数据集中包括N个旋律矩阵以及对应的N个和弦矩阵,其中旋律矩阵以及和弦矩阵均为二元矩阵;The first obtaining module 20 is configured to obtain a training data set, the training data set includes N melody matrices and corresponding N chord matrices, wherein the melody matrix and the chord matrix are both binary matrices;
训练模块30,用于将该训练数据集中的N个旋律矩阵以及对应的N个和弦矩阵输入该DCGAN模型中进行训练,得到训练好的DCGAN模型;The training module 30 is configured to input the N melody matrices and the corresponding N chord matrices in the training data set into the DCGAN model for training to obtain a trained DCGAN model;
输入模块40,用于将获取到的目标旋律矩阵输入该训练好的DCGAN模型中进行处理,并获取该训练好的DCGAN模型生成的与该目标旋律矩阵匹配的目标和弦矩阵;The input module 40 is configured to input the obtained target melody matrix into the trained DCGAN model for processing, and obtain the target chord matrix generated by the trained DCGAN model that matches the target melody matrix;
输出模块50,用于输出该目标旋律矩阵映射出的旋律音轨与该目标和弦矩阵映射出的和弦音轨进行合并后的音乐文件。The output module 50 is configured to output a music file obtained by merging the melody track mapped from the target melody matrix and the chord track mapped from the target chord matrix.
在一些可行的实施方式中，上述DCGAN模型中包括生成器、辨别器以及调节器，该生成器、该辨别器以及该调节器均为卷积神经网络CNN。上述训练模块30具体用于：针对该训练数据集中的任一旋律矩阵i，将该旋律矩阵i输入该DCGAN模型的生成器中生成与该旋律矩阵i匹配的第一和弦矩阵j；将该第一和弦矩阵j以及该旋律矩阵i在该训练数据集中对应的和弦矩阵k输入该DCGAN模型的辨别器中辨别该第一和弦矩阵j与该和弦矩阵k相同的概率；判断该辨别器针对该第一和弦矩阵j输出的概率是否均在预设范围内，若否则将该概率输入该DCGAN模型的调节器中对该生成器的转置卷积层上的参数进行调整，将该旋律矩阵i重新输入经过调整后的生成器中重新生成与该旋律矩阵i匹配的第一和弦矩阵j，并将重新生成的第一和弦矩阵j以及该旋律矩阵i在该训练数据集中对应的和弦矩阵k输入该DCGAN模型的辨别器中辨别该重新生成的第一和弦矩阵j与该和弦矩阵k相同的概率。当该辨别器针对该生成器生成的各个第一和弦矩阵输出的概率均在预设范围内时，得到训练好的DCGAN模型。In some feasible implementation manners, the DCGAN model includes a generator, a discriminator, and a regulator, all of which are convolutional neural networks (CNN). The training module 30 is specifically configured to: for any melody matrix i in the training data set, input the melody matrix i into the generator of the DCGAN model to generate a first chord matrix j that matches the melody matrix i; input the first chord matrix j and the chord matrix k corresponding to the melody matrix i in the training data set into the discriminator of the DCGAN model to determine the probability that the first chord matrix j is the same as the chord matrix k; determine whether the probabilities output by the discriminator for the first chord matrix j are all within a preset range, and if not, input the probability into the regulator of the DCGAN model to adjust the parameters on the transposed convolutional layers of the generator, re-input the melody matrix i into the adjusted generator to regenerate the first chord matrix j matching the melody matrix i, and input the regenerated first chord matrix j and the chord matrix k corresponding to the melody matrix i into the discriminator of the DCGAN model to determine the probability that the regenerated first chord matrix j is the same as the chord matrix k. When the probabilities output by the discriminator for each first chord matrix generated by the generator are all within the preset range, the trained DCGAN model is obtained.
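The generator/discriminator/regulator loop described above can be reduced to the following control-flow sketch. The three callables are placeholders (a real implementation would use CNNs and gradient updates), and the concrete preset probability range is an assumed value for illustration only:

```python
# Hypothetical control-flow sketch of the training procedure described above.
# generate, discriminate, and adjust stand in for the CNN-based generator,
# discriminator, and regulator; lo/hi define the preset probability range.
def train(melody_matrices, chord_matrices, generate, discriminate, adjust,
          lo=0.45, hi=0.55, max_rounds=1000):
    for melody, real_chord in zip(melody_matrices, chord_matrices):
        for _ in range(max_rounds):
            fake_chord = generate(melody)             # first chord matrix j
            p = discriminate(fake_chord, real_chord)  # P(j same as k)
            if lo <= p <= hi:   # discriminator can no longer tell them apart
                break
            adjust(p)  # regulator tunes the generator's transposed-conv params
```

The outer loop walks the training pairs; the inner loop repeats generate/discriminate/adjust until the discriminator's output falls inside the preset range, matching the stopping condition stated above.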
在一些可行的实施方式中,上述第一获取模块20包括第一获取单元201、确定单元202、第二获取单元203以及第三获取单元204。In some feasible implementation manners, the aforementioned first acquiring module 20 includes a first acquiring unit 201, a determining unit 202, a second acquiring unit 203, and a third acquiring unit 204.
上述第一获取单元201，用于获取包括多个双音轨音乐文件的双音轨数据集，该双音轨音乐文件用于表示包含旋律音轨以及和弦音轨的音乐文件；上述确定单元202，用于从该双音轨数据集确定出N个目标双音轨音乐文件；上述第二获取单元203，用于获取该N个目标双音轨音乐文件中各个目标双音轨音乐文件在旋律音轨上的旋律矩阵，得到N个旋律矩阵；上述第三获取单元204，用于获取该N个目标双音轨音乐文件中各个目标双音轨音乐文件在和弦音轨上的和弦矩阵，得到N个和弦矩阵。其中，该目标双音轨音乐文件中的和弦属于预设的基本和弦集合，该基本和弦集合中包括12个大和弦和12个小和弦，该目标双音轨音乐文件的每小节采用一个和弦。The first obtaining unit 201 is configured to obtain a dual-track data set including multiple dual-track music files, where a dual-track music file represents a music file containing a melody track and a chord track; the determining unit 202 is configured to determine N target dual-track music files from the dual-track data set; the second obtaining unit 203 is configured to obtain the melody matrix of each target dual-track music file on the melody track to obtain N melody matrices; the third obtaining unit 204 is configured to obtain the chord matrix of each target dual-track music file on the chord track to obtain N chord matrices. The chords in the target dual-track music files belong to a preset basic chord set, which includes 12 major chords and 12 minor chords, and each bar of a target dual-track music file uses one chord.
在一些可行的实施方式中，上述第二获取单元203具体用于：将该N个目标双音轨音乐文件中各个目标双音轨音乐文件的旋律调整至预设音高范围内；获取调整后各个目标双音轨音乐文件中的旋律音符；根据该调整后各个目标双音轨音乐文件中的旋律音符，生成该调整后各个目标双音轨音乐文件的旋律矩阵，该旋律矩阵为h*w的二元矩阵，该h用于表示预设的音符数，该w用于表示目标双音轨音乐文件的小节数。In some feasible implementation manners, the second obtaining unit 203 is specifically configured to: adjust the melody of each of the N target dual-track music files to within a preset pitch range; obtain the adjusted melody notes in each target dual-track music file; and generate, according to the adjusted melody notes, the melody matrix of each adjusted target dual-track music file, where the melody matrix is an h*w binary matrix, h represents the preset number of notes, and w represents the number of bars of the target dual-track music file.
在一些可行的实施方式中，上述第三获取单元204具体用于：获取该N个目标双音轨音乐文件中各个目标双音轨音乐文件的各小节所采用的和弦以及该各小节所采用和弦的和弦类别；根据该各个目标双音轨音乐文件的各小节所采用的和弦以及该各小节所采用和弦的和弦类别，生成该各个目标双音轨音乐文件的和弦矩阵，该和弦矩阵为w*m的二元矩阵，该w用于表示目标双音轨音乐文件的小节数，该m用于表示各小节的和弦参数。In some feasible implementation manners, the third obtaining unit 204 is specifically configured to: obtain the chord used in each bar of each of the N target dual-track music files and the chord category of the chord used in each bar; and generate, according to the chord used in each bar and its chord category, the chord matrix of each target dual-track music file, where the chord matrix is a w*m binary matrix, w represents the number of bars of the target dual-track music file, and m represents the chord parameters of each bar.
在一些可行的实施方式中，该装置还包括第二获取模块60。该第二获取模块60，用于获取包括旋律音轨的单音轨音乐文件；将该单音轨音乐文件的旋律调整至预设音高范围内；获取调整后单音轨音乐文件中的旋律音符；根据该调整后单音轨音乐文件中的旋律音符，生成该调整后单音轨音乐文件的目标旋律矩阵。In some feasible implementation manners, the apparatus further includes a second obtaining module 60. The second obtaining module 60 is configured to: obtain a single-track music file including a melody track; adjust the melody of the single-track music file to within a preset pitch range; obtain the melody notes in the adjusted single-track music file; and generate, according to the melody notes in the adjusted single-track music file, the target melody matrix of the adjusted single-track music file.
在一些可行的实施方式中，该生成器包括至少一个全连接层和至少一个转置卷积层，该辨别器包括至少一个卷积层和至少一个全连接层，该调节器包括至少一个卷积层和至少一个全连接层，该调节器为反向的生成器。In some feasible implementation manners, the generator includes at least one fully connected layer and at least one transposed convolutional layer, the discriminator includes at least one convolutional layer and at least one fully connected layer, and the regulator includes at least one convolutional layer and at least one fully connected layer; the regulator is a reversed generator.
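As a small illustration of why the generator ends in transposed convolutions: they upsample the fully connected layer's output toward the target matrix shape. The output-size relation below is the standard transposed-convolution formula, and the concrete sizes are illustrative assumptions, not layer sizes taken from the embodiment:

```python
# Output length of a 1-D transposed convolution:
#   out = (n - 1) * stride - 2 * padding + kernel
# Transposed-convolution layers use this relation to upsample a small feature
# map toward, e.g., the chord matrix's 16-bar dimension.
def conv_transpose_len(n, kernel, stride=1, padding=0):
    return (n - 1) * stride - 2 * padding + kernel
```

For example, a length-8 feature map upsampled with kernel 2 and stride 2 yields length 16, which would match the 16-bar dimension of the chord matrix.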
具体实现中，上述音乐生成装置可通过上述各个模块执行上述图1或图3所提供的实现方式中各个步骤所提供的实现方式，实现上述各实施例中所实现的功能，具体可参见上述图1或图3所示的方法实施例中各个步骤提供的相应描述，在此不再赘述。In specific implementation, the above music generating apparatus may perform, through the above modules, the implementations provided by the steps of the implementations provided in FIG. 1 or FIG. 3, so as to realize the functions realized in the above embodiments. For details, refer to the corresponding descriptions of the steps in the method embodiments shown in FIG. 1 or FIG. 3, which will not be repeated here.
在本申请实施例中，音乐生成装置通过构造深度卷积生成式对抗网络DCGAN模型，再获取训练数据集，接着将该训练数据集中的N个旋律矩阵以及对应的N个和弦矩阵输入该DCGAN模型中进行训练，从而得到训练好的DCGAN模型，然后将获取到的目标旋律矩阵输入该训练好的DCGAN模型中进行处理，并获取该训练好的DCGAN模型生成的与该目标旋律矩阵匹配的目标和弦矩阵，最后输出该目标旋律矩阵映射出的旋律音轨与该目标和弦矩阵映射出的和弦音轨进行合并后的音乐文件。可以自动生成带有和弦匹配的音乐文件，减少人工处理环节。In the embodiment of this application, the music generating apparatus constructs a deep convolutional generative adversarial network (DCGAN) model, obtains a training data set, inputs the N melody matrices and the corresponding N chord matrices in the training data set into the DCGAN model for training to obtain a trained DCGAN model, then inputs the obtained target melody matrix into the trained DCGAN model for processing, obtains the target chord matrix generated by the trained DCGAN model that matches the target melody matrix, and finally outputs a music file in which the melody track mapped from the target melody matrix and the chord track mapped from the target chord matrix are merged. Music files with matching chords can thus be generated automatically, reducing manual processing.
参见图7，是本申请实施例提供的终端的一示意性框图。如图7所示，本申请实施例中的终端可以包括：一个或多个处理器701；一个或多个输入设备702，一个或多个输出设备703和存储器704。上述处理器701、输入设备702、输出设备703和存储器704通过总线705连接。存储器704用于存储计算机程序，所述计算机程序包括程序指令，处理器701用于执行存储器704存储的程序指令。Refer to FIG. 7, which is a schematic block diagram of a terminal provided in an embodiment of the present application. As shown in FIG. 7, the terminal in this embodiment of the present application may include: one or more processors 701; one or more input devices 702, one or more output devices 703, and a memory 704. The processor 701, the input device 702, the output device 703, and the memory 704 are connected via a bus 705. The memory 704 is configured to store a computer program including program instructions, and the processor 701 is configured to execute the program instructions stored in the memory 704.
其中，处理器701被配置用于调用所述程序指令执行：构造深度卷积生成式对抗网络DCGAN模型；获取训练数据集，该训练数据集中包括N个旋律矩阵以及对应的N个和弦矩阵，其中旋律矩阵以及和弦矩阵均为二元矩阵；将该训练数据集中的N个旋律矩阵以及对应的N个和弦矩阵输入该DCGAN模型中进行训练，得到训练好的DCGAN模型。输入设备702用于将获取到的目标旋律矩阵输入该训练好的DCGAN模型中进行处理，并获取该训练好的DCGAN模型生成的与该目标旋律矩阵匹配的目标和弦矩阵。输出设备703用于输出该目标旋律矩阵映射出的旋律音轨与该目标和弦矩阵映射出的和弦音轨进行合并后的音乐文件。The processor 701 is configured to invoke the program instructions to: construct a deep convolutional generative adversarial network (DCGAN) model; obtain a training data set, which includes N melody matrices and corresponding N chord matrices, where the melody matrices and the chord matrices are all binary matrices; and input the N melody matrices and the corresponding N chord matrices in the training data set into the DCGAN model for training to obtain a trained DCGAN model. The input device 702 is configured to input the obtained target melody matrix into the trained DCGAN model for processing, and obtain a target chord matrix generated by the trained DCGAN model that matches the target melody matrix. The output device 703 is configured to output a music file in which the melody track mapped from the target melody matrix and the chord track mapped from the target chord matrix are merged.
应当理解，在本申请实施例中，所称处理器701可以是中央处理单元(Central Processing Unit，CPU)，该处理器还可以是其他通用处理器、数字信号处理器(Digital Signal Processor，DSP)、专用集成电路(Application Specific Integrated Circuit，ASIC)、现成可编程门阵列(Field-Programmable Gate Array，FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。It should be understood that, in the embodiments of the present application, the processor 701 may be a central processing unit (CPU), and the processor may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor, etc.
输入设备702可以包括触控板、麦克风等,输出设备703可以包括显示器(LCD等)、扬声器等。The input device 702 may include a touch panel, a microphone, etc., and the output device 703 may include a display (LCD, etc.), a speaker, etc.
该存储器704可以包括只读存储器和随机存取存储器,并向处理器701提供指令和数据。存储器704的一部分还可以包括非易失性随机存取存储器。例如,存储器704还可以存储设备类型的信息。The memory 704 may include a read-only memory and a random access memory, and provides instructions and data to the processor 701. A part of the memory 704 may also include a non-volatile random access memory. For example, the memory 704 may also store device type information.
具体实现中，本申请实施例中所描述的处理器701、输入设备702、输出设备703可执行本申请实施例提供的基于DCGAN的音乐生成方法中所描述的实现方式，也可执行本申请实施例所描述的音乐生成装置的实现方式，在此不再赘述。In specific implementation, the processor 701, the input device 702, and the output device 703 described in the embodiments of this application can perform the implementations described in the DCGAN-based music generation method provided in the embodiments of this application, and can also perform the implementation of the music generating apparatus described in the embodiments of this application, which will not be repeated here.
本申请实施例还提供一种计算机可读存储介质，该计算机可读存储介质存储有计算机程序，该计算机程序包括程序指令，该程序指令被处理器执行时实现图1或图3所示的基于DCGAN的音乐生成方法，具体细节请参照图1或图3所示实施例的描述，在此不再赘述。The embodiment of the present application also provides a computer-readable storage medium that stores a computer program including program instructions. When the program instructions are executed by a processor, the DCGAN-based music generation method shown in FIG. 1 or FIG. 3 is implemented. For specific details, refer to the description of the embodiment shown in FIG. 1 or FIG. 3, which will not be repeated here.
上述计算机可读存储介质可以是前述任一实施例所述的音乐生成装置或终端的内部存储单元，例如终端的硬盘或内存。该计算机可读存储介质也可以是该终端的外部存储设备，例如该终端上配备的插接式硬盘，智能存储卡(smart media card，SMC)，安全数字(secure digital，SD)卡，闪存卡(flash card)等。进一步地，该计算机可读存储介质还可以既包括该终端的内部存储单元也包括外部存储设备。该计算机可读存储介质用于存储该计算机程序以及该终端所需的其他程序和数据。该计算机可读存储介质还可以用于暂时地存储已经输出或者将要输出的数据。The above computer-readable storage medium may be an internal storage unit of the music generating apparatus or terminal described in any of the foregoing embodiments, such as the hard disk or memory of the terminal. The computer-readable storage medium may also be an external storage device of the terminal, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card equipped on the terminal. Further, the computer-readable storage medium may include both an internal storage unit of the terminal and an external storage device. The computer-readable storage medium is used to store the computer program and other programs and data required by the terminal, and may also be used to temporarily store data that has been output or will be output.
以上所述，仅为本申请的具体实施方式，但本申请的保护范围并不局限于此，任何熟悉本技术领域的技术人员在本申请揭露的技术范围内，可轻易想到变化或替换，都应涵盖在本申请的保护范围之内。因此，本申请的保护范围应以所述权利要求的保护范围为准。The above are only specific implementations of this application, but the protection scope of this application is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive of within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (20)

  1. 一种基于DCGAN的音乐生成方法,其特征在于,包括:A DCGAN-based music generation method is characterized in that it includes:
    构造深度卷积生成式对抗网络DCGAN模型；Construct a deep convolutional generative adversarial network (DCGAN) model;
    获取训练数据集,所述训练数据集中包括N个旋律矩阵以及对应的N个和弦矩阵,其中旋律矩阵以及和弦矩阵均为二元矩阵;Acquiring a training data set, the training data set including N melody matrices and corresponding N chord matrices, wherein the melody matrix and the chord matrix are both binary matrices;
    将所述训练数据集中的N个旋律矩阵以及对应的N个和弦矩阵输入所述DCGAN模型中进行训练,得到训练好的DCGAN模型;Input the N melody matrices and the corresponding N chord matrices in the training data set into the DCGAN model for training, to obtain a trained DCGAN model;
    将获取到的目标旋律矩阵输入所述训练好的DCGAN模型中进行处理，并获取所述训练好的DCGAN模型生成的与所述目标旋律矩阵匹配的目标和弦矩阵，输出所述目标旋律矩阵映射出的旋律音轨与所述目标和弦矩阵映射出的和弦音轨进行合并后的音乐文件。Input the obtained target melody matrix into the trained DCGAN model for processing, obtain the target chord matrix generated by the trained DCGAN model that matches the target melody matrix, and output a music file in which the melody track mapped from the target melody matrix and the chord track mapped from the target chord matrix are merged.
  2. 根据权利要求1所述的方法，其特征在于，所述DCGAN模型中包括生成器、辨别器以及调节器，所述生成器、所述辨别器以及所述调节器均为卷积神经网络CNN；The method according to claim 1, wherein the DCGAN model includes a generator, a discriminator, and a regulator, and the generator, the discriminator, and the regulator are all convolutional neural networks (CNN);
    所述将所述训练数据集中的N个旋律矩阵以及对应的N个和弦矩阵输入所述DCGAN模型中进行训练,得到训练好的DCGAN模型,包括:The inputting the N melody matrices and the corresponding N chord matrices in the training data set into the DCGAN model for training to obtain a trained DCGAN model includes:
    针对所述训练数据集中的任一旋律矩阵i,将所述旋律矩阵i输入所述DCGAN模型的生成器中生成与所述旋律矩阵i匹配的第一和弦矩阵j;For any melody matrix i in the training data set, input the melody matrix i into the generator of the DCGAN model to generate a first chord matrix j that matches the melody matrix i;
    将所述第一和弦矩阵j以及所述旋律矩阵i在所述训练数据集中对应的和弦矩阵k输入所述DCGAN模型的辨别器中辨别所述第一和弦矩阵j与所述和弦矩阵k相同的概率；Input the first chord matrix j and the chord matrix k corresponding to the melody matrix i in the training data set into the discriminator of the DCGAN model to determine the probability that the first chord matrix j is the same as the chord matrix k;
    判断所述辨别器针对所述第一和弦矩阵j输出的概率是否均在预设范围内，若否则将所述概率输入所述DCGAN模型的调节器中对所述生成器的转置卷积层上的参数进行调整，将所述旋律矩阵i重新输入经过调整后的生成器中重新生成与所述旋律矩阵i匹配的第一和弦矩阵j，并将重新生成的第一和弦矩阵j以及所述旋律矩阵i在所述训练数据集中对应的和弦矩阵k输入所述DCGAN模型的辨别器中辨别所述重新生成的第一和弦矩阵j与所述和弦矩阵k相同的概率；Determine whether the probabilities output by the discriminator for the first chord matrix j are all within a preset range; if not, input the probability into the regulator of the DCGAN model to adjust the parameters on the transposed convolutional layer of the generator, re-input the melody matrix i into the adjusted generator to regenerate the first chord matrix j matching the melody matrix i, and input the regenerated first chord matrix j and the chord matrix k corresponding to the melody matrix i in the training data set into the discriminator of the DCGAN model to determine the probability that the regenerated first chord matrix j is the same as the chord matrix k;
    当所述辨别器针对所述生成器生成的各个第一和弦矩阵输出的概率均在预设范围内时,得到训练好的DCGAN模型。When the probability that the discriminator outputs for each first chord matrix generated by the generator is within a preset range, a trained DCGAN model is obtained.
  3. 根据权利要求1或2所述的方法,其特征在于,所述获取训练数据集,包括:The method according to claim 1 or 2, wherein the obtaining a training data set comprises:
    获取包括多个双音轨音乐文件的双音轨数据集,所述双音轨音乐文件用于表示包含旋律音轨以及和弦音轨的音乐文件;Acquiring a dual-track data set including a plurality of dual-track music files, where the dual-track music file is used to represent a music file containing a melody track and a chord track;
    从所述双音轨数据集确定出N个目标双音轨音乐文件，所述目标双音轨音乐文件中的和弦属于预设的基本和弦集合，所述基本和弦集合中包括12个大和弦和12个小和弦，所述目标双音轨音乐文件的每小节采用一个和弦；N target dual-track music files are determined from the dual-track data set, the chords in the target dual-track music files belong to a preset basic chord set, the basic chord set includes 12 major chords and 12 minor chords, and each bar of the target dual-track music file uses one chord;
    获取所述N个目标双音轨音乐文件中各个目标双音轨音乐文件在旋律音轨上的旋律矩阵,得到N个旋律矩阵;Obtaining a melody matrix of each target dual-track music file on the melody track among the N target dual-track music files to obtain N melody matrices;
    获取所述N个目标双音轨音乐文件中各个目标双音轨音乐文件在和弦音轨上的和弦矩阵,得到N个和弦矩阵。Obtain the chord matrix of each target dual-track music file on the chord track among the N target dual-track music files to obtain N chord matrices.
  4. 根据权利要求3所述的方法，其特征在于，所述获取所述N个目标双音轨音乐文件中各个目标双音轨音乐文件在旋律音轨上的旋律矩阵，得到N个旋律矩阵，包括：The method according to claim 3, wherein the obtaining the melody matrix of each target dual-track music file on the melody track among the N target dual-track music files to obtain N melody matrices comprises:
    将所述N个目标双音轨音乐文件中各个目标双音轨音乐文件的旋律调整至预设音高范围内;Adjusting the melody of each target dual-track music file in the N target dual-track music files to within a preset pitch range;
    获取调整后各个目标双音轨音乐文件中的旋律音符;Obtain the melody notes in each target dual-track music file after adjustment;
    根据所述调整后各个目标双音轨音乐文件中的旋律音符，生成所述调整后各个目标双音轨音乐文件的旋律矩阵，所述旋律矩阵为h*w的二元矩阵，所述h用于表示预设的音符数，所述w用于表示目标双音轨音乐文件的小节数。According to the adjusted melody notes in each target dual-track music file, generate the melody matrix of each adjusted target dual-track music file, where the melody matrix is an h*w binary matrix, h represents the preset number of notes, and w represents the number of bars of the target dual-track music file.
  5. 根据权利要求3所述的方法，其特征在于，所述获取所述N个目标双音轨音乐文件中各个目标双音轨音乐文件在和弦音轨上的和弦矩阵，得到N个和弦矩阵，包括：The method according to claim 3, wherein the obtaining the chord matrix of each target dual-track music file on the chord track among the N target dual-track music files to obtain N chord matrices comprises:
    获取所述N个目标双音轨音乐文件中各个目标双音轨音乐文件的各小节所采用的和弦以及所述各小节所采用和弦的和弦类别;Acquiring the chord used in each bar of each target dual-track music file in the N target dual-track music files and the chord category of the chord used in each bar;
    根据所述各个目标双音轨音乐文件的各小节所采用的和弦以及所述各小节所采用和弦的和弦类别，生成所述各个目标双音轨音乐文件的和弦矩阵，所述和弦矩阵为w*m的二元矩阵，所述w用于表示目标双音轨音乐文件的小节数，所述m用于表示各小节的和弦参数。A chord matrix of each target dual-track music file is generated according to the chord used in each bar of that file and the chord category of that chord; the chord matrix is a w*m binary matrix, where w represents the number of bars of the target dual-track music file and m represents the chord parameters of each bar.
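The claim leaves m ("the chord parameters of each bar") open. One plausible choice, assumed here purely for illustration, is m = 13: a 12-way one-hot for the chord root plus one bit for the chord category (major or minor):

```python
import numpy as np

# Illustrative sketch (assumed encoding, not from the claims): build a w*m
# binary chord matrix with one row per bar, m = 13 here as an example.

ROOTS = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def chord_matrix(chords_per_bar):
    """chords_per_bar: length-w list of (root, is_minor) pairs, one per bar."""
    w = len(chords_per_bar)
    mat = np.zeros((w, 13), dtype=np.uint8)
    for bar, (root, is_minor) in enumerate(chords_per_bar):
        mat[bar, ROOTS.index(root)] = 1  # chord root, one-hot over 12 pitch classes
        if is_minor:
            mat[bar, 12] = 1             # chord category bit: 0 = major, 1 = minor
    return mat
```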
  6. 根据权利要求1-5任意一项所述的方法,其特征在于,所述在将获取到的目标旋律矩阵输入所述训练好的DCGAN模型中进行处理之前,所述方法还包括:The method according to any one of claims 1 to 5, wherein before inputting the acquired target melody matrix into the trained DCGAN model for processing, the method further comprises:
    获取包括旋律音轨的单音轨音乐文件;Obtain a single-track music file including a melody track;
    将所述单音轨音乐文件的旋律调整至预设音高范围内;Adjusting the melody of the single-track music file to a preset pitch range;
    获取调整后单音轨音乐文件中的旋律音符;Get the melody notes in the adjusted single track music file;
    根据所述调整后单音轨音乐文件中的旋律音符,生成所述调整后单音轨音乐文件的目标旋律矩阵。According to the melody notes in the adjusted single-track music file, a target melody matrix of the adjusted single-track music file is generated.
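The pitch-adjustment step above can be read as octave normalization. A minimal sketch, assuming an example MIDI range of 48-83 (the actual preset range is not specified in the claims), shifts each note by whole octaves until it lies in range, changing register while preserving the pitch class:

```python
# Hypothetical sketch of adjusting a melody to a preset pitch range:
# shift each note up or down by octaves (12 semitones) until it fits.
# The range [48, 83] is an assumed example, not taken from the claims.

def adjust_to_range(pitches, low=48, high=83):
    adjusted = []
    for p in pitches:
        while p < low:
            p += 12   # raise by an octave
        while p > high:
            p -= 12   # lower by an octave
        adjusted.append(p)
    return adjusted
```

Because every shift is a multiple of 12 semitones, the adjusted melody keeps the same note names as the original.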
  7. 根据权利要求2所述的方法，其特征在于，所述生成器包括至少一个全连接层和至少一个转置卷积层，所述辨别器包括至少一个卷积层和至少一个全连接层，所述调节器包括至少一个卷积层和至少一个全连接层，所述调节器为反向的生成器。The method according to claim 2, wherein the generator includes at least one fully connected layer and at least one transposed convolutional layer, the discriminator includes at least one convolutional layer and at least one fully connected layer, the regulator includes at least one convolutional layer and at least one fully connected layer, and the regulator is a reversed generator.
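The layer layout of claim 7 can be illustrated by tracking tensor sizes alone: a transposed convolution upsamples in the generator, and a plain convolution with the same kernel and stride undoes that size change, which is why the regulator can be described as a reversed generator. Kernel size, stride, and the fully connected width below are assumed values, not taken from the patent:

```python
# Shape-only sketch (no weights) of the claim-7 layout, 1-D for simplicity.

def conv_len(n, k, s):
    """Output length of an unpadded 1-D convolution."""
    return (n - k) // s + 1

def tconv_len(n, k, s):
    """Output length of an unpadded 1-D transposed convolution."""
    return (n - 1) * s + k

fc_out = 8                         # generator: fully connected layer output length
gen_out = tconv_len(fc_out, 4, 2)  # generator: transposed conv upsamples
disc_in = conv_len(gen_out, 4, 2)  # discriminator/regulator: conv downsamples back
```

With these assumed parameters the convolution exactly inverts the transposed convolution's size change, mirroring the generator/regulator symmetry described in the claim.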
  8. 一种音乐生成装置,其特征在于,包括:A music generating device, characterized by comprising:
    构造模块，用于构造深度卷积生成式对抗网络DCGAN模型；A construction module, configured to construct a deep convolutional generative adversarial network (DCGAN) model;
    获取模块,用于获取训练数据集,所述训练数据集中包括N个旋律矩阵以及对应的N个和弦矩阵,其中旋律矩阵以及和弦矩阵均为二元矩阵;An acquiring module, configured to acquire a training data set, the training data set includes N melody matrices and corresponding N chord matrices, wherein the melody matrix and the chord matrix are both binary matrices;
    训练模块,用于将所述训练数据集中的N个旋律矩阵以及对应的N个和弦矩阵输入所述DCGAN模型中进行训练,得到训练好的DCGAN模型;A training module, configured to input the N melody matrices and the corresponding N chord matrices in the training data set into the DCGAN model for training, to obtain a trained DCGAN model;
    输入模块,用于将获取到的目标旋律矩阵输入所述训练好的DCGAN模型中进行处理,并获取所述训练好的DCGAN模型生成的与所述目标旋律矩阵匹配的目标和弦矩阵;An input module, configured to input the acquired target melody matrix into the trained DCGAN model for processing, and acquire a target chord matrix generated by the trained DCGAN model that matches the target melody matrix;
    输出模块,用于输出所述目标旋律矩阵映射出的旋律音轨与所述目标和弦矩阵映射出的和弦音轨进行合并后的音乐文件。The output module is used to output a music file obtained by combining the melody track mapped from the target melody matrix and the chord track mapped from the target chord matrix.
  9. 根据权利要求8所述的装置，其特征在于，所述DCGAN模型中包括生成器、辨别器以及调节器，所述生成器、所述辨别器以及所述调节器均为卷积神经网络CNN；The device according to claim 8, wherein the DCGAN model includes a generator, a discriminator, and a regulator, and the generator, the discriminator, and the regulator are all convolutional neural networks (CNN);
    所述训练模块,具体用于:The training module is specifically used for:
    针对所述训练数据集中的任一旋律矩阵i，将所述旋律矩阵i输入所述DCGAN模型的生成器中生成与所述旋律矩阵i匹配的第一和弦矩阵j；For any melody matrix i in the training data set, input the melody matrix i into the generator of the DCGAN model to generate a first chord matrix j that matches the melody matrix i;
    将所述第一和弦矩阵j以及所述旋律矩阵i在所述训练数据集中对应的和弦矩阵k输入所述DCGAN模型的辨别器中辨别所述第一和弦矩阵j与所述和弦矩阵k相同的概率；Input the first chord matrix j and the chord matrix k corresponding to the melody matrix i in the training data set into the discriminator of the DCGAN model to discriminate the probability that the first chord matrix j is the same as the chord matrix k;
    判断所述辨别器针对所述第一和弦矩阵j输出的概率是否均在预设范围内，若否则将所述概率输入所述DCGAN模型的调节器中对所述生成器的转置卷积层上的参数进行调整，将所述旋律矩阵i重新输入经过调整后的生成器中重新生成与所述旋律矩阵i匹配的第一和弦矩阵j，并将重新生成的第一和弦矩阵j以及所述旋律矩阵i在所述训练数据集中对应的和弦矩阵k输入所述DCGAN模型的辨别器中辨别所述重新生成的第一和弦矩阵j与所述和弦矩阵k相同的概率；Determine whether each probability output by the discriminator for the first chord matrix j is within a preset range; if not, input the probability into the regulator of the DCGAN model to adjust the parameters on the transposed convolutional layer of the generator, re-input the melody matrix i into the adjusted generator to regenerate a first chord matrix j matching the melody matrix i, and input the regenerated first chord matrix j and the chord matrix k corresponding to the melody matrix i in the training data set into the discriminator of the DCGAN model to discriminate the probability that the regenerated first chord matrix j is the same as the chord matrix k;
    当所述辨别器针对所述生成器生成的各个第一和弦矩阵输出的概率均在预设范围内时，得到训练好的DCGAN模型。When the probabilities output by the discriminator for all of the first chord matrices generated by the generator are within the preset range, a trained DCGAN model is obtained.
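The generate-discriminate-adjust loop above can be illustrated as a toy control flow. This is not the actual DCGAN training algorithm (real training uses gradient-based updates on network weights); a single scalar parameter and a hand-written update stand in for the generator's transposed-convolution parameters and the regulator:

```python
# Toy illustration of the claimed control flow: generate chords from a melody,
# score them against the ground truth, and keep adjusting the generator until
# the "discriminator" score falls inside a preset range. Everything numeric
# here (the scalar model, the similarity score) is an assumption for the sketch.

def train_toy(melody, true_chords, preset=(0.45, 0.55), step=0.1, max_iter=1000):
    theta = 0.0                                   # stand-in generator parameter
    prob = 0.0
    low, high = preset
    for _ in range(max_iter):
        generated = [theta * x for x in melody]   # "generator": melody -> chords
        err = sum(abs(g - t) for g, t in zip(generated, true_chords))
        prob = 1.0 / (1.0 + err)                  # "discriminator": similarity score
        if low <= prob <= high:                   # within the preset range: done
            return theta, prob
        theta += step if prob < low else -step    # "regulator": adjust and retry
    return theta, prob
```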
  10. 根据权利要求8或9所述的装置,其特征在于,所述第一获取模块包括:The device according to claim 8 or 9, wherein the first obtaining module comprises:
    第一获取单元,用于获取包括多个双音轨音乐文件的双音轨数据集,所述双音轨音乐文件用于表示包含旋律音轨以及和弦音轨的音乐文件;The first acquiring unit is configured to acquire a dual-track data set including a plurality of dual-track music files, where the dual-track music file is used to represent a music file containing a melody track and a chord track;
    确定单元,用于从所述双音轨数据集确定出N个目标双音轨音乐文件,所述目标双音轨音乐文件中的和弦属于预设的基本和弦集合,所述基本和弦集合中包括12个大和弦和12个小和弦,所述目标双音轨音乐文件的每小节采用一个和弦;The determining unit is configured to determine N target dual-track music files from the dual-track data set, the chords in the target dual-track music files belong to a preset basic chord set, and the basic chord set includes 12 major chords and 12 minor chords, each measure of the target dual-track music file adopts one chord;
    第二获取单元,用于获取所述N个目标双音轨音乐文件中各个目标双音轨音乐文件在旋律音轨上的旋律矩阵,得到N个旋律矩阵;The second acquiring unit is configured to acquire the melody matrix of each target double-track music file on the melody track among the N target double-track music files to obtain N melody matrices;
    第三获取单元,用于获取所述N个目标双音轨音乐文件中各个目标双音轨音乐文件在和弦音轨上的和弦矩阵,得到N个和弦矩阵。The third obtaining unit is used to obtain the chord matrix of each target dual-track music file on the chord track among the N target dual-track music files to obtain N chord matrices.
  11. 根据权利要求10所述的装置,其特征在于,所述第二获取单元具体用于:The device according to claim 10, wherein the second acquiring unit is specifically configured to:
    将所述N个目标双音轨音乐文件中各个目标双音轨音乐文件的旋律调整至预设音高范围内;Adjusting the melody of each target dual-track music file in the N target dual-track music files to within a preset pitch range;
    获取调整后各个目标双音轨音乐文件中的旋律音符;Obtain the melody notes in each target dual-track music file after adjustment;
    根据所述调整后各个目标双音轨音乐文件中的旋律音符，生成所述调整后各个目标双音轨音乐文件的旋律矩阵，所述旋律矩阵为h*w的二元矩阵，所述h用于表示预设的音符数，所述w用于表示目标双音轨音乐文件的小节数。A melody matrix of each adjusted target dual-track music file is generated according to the melody notes in that adjusted file; the melody matrix is an h*w binary matrix, where h represents the preset number of notes and w represents the number of bars of the target dual-track music file.
  12. 根据权利要求10所述的装置,其特征在于,所述第三获取单元具体用于:The device according to claim 10, wherein the third acquiring unit is specifically configured to:
    获取所述N个目标双音轨音乐文件中各个目标双音轨音乐文件的各小节所采用的和弦以及所述各小节所采用和弦的和弦类别;Acquiring the chord used in each bar of each target dual-track music file in the N target dual-track music files and the chord category of the chord used in each bar;
    根据所述各个目标双音轨音乐文件的各小节所采用的和弦以及所述各小节所采用和弦的和弦类别，生成所述各个目标双音轨音乐文件的和弦矩阵，所述和弦矩阵为w*m的二元矩阵，所述w用于表示目标双音轨音乐文件的小节数，所述m用于表示各小节的和弦参数。A chord matrix of each target dual-track music file is generated according to the chord used in each bar of that file and the chord category of that chord; the chord matrix is a w*m binary matrix, where w represents the number of bars of the target dual-track music file and m represents the chord parameters of each bar.
  13. 根据权利要求8-12任意一项所述的装置,其特征在于,所述装置还包括:The device according to any one of claims 8-12, wherein the device further comprises:
    第二获取模块，用于获取包括旋律音轨的单音轨音乐文件，将所述单音轨音乐文件的旋律调整至预设音高范围内，获取调整后单音轨音乐文件中的旋律音符，根据所述调整后单音轨音乐文件中的旋律音符，生成所述调整后单音轨音乐文件的目标旋律矩阵。A second acquiring module, configured to acquire a single-track music file including a melody track, adjust the melody of the single-track music file to within a preset pitch range, acquire the melody notes in the adjusted single-track music file, and generate a target melody matrix of the adjusted single-track music file according to those melody notes.
  14. 根据权利要求9所述的装置，其特征在于，所述生成器包括至少一个全连接层和至少一个转置卷积层，所述辨别器包括至少一个卷积层和至少一个全连接层，所述调节器包括至少一个卷积层和至少一个全连接层，所述调节器为反向的生成器。The device according to claim 9, wherein the generator includes at least one fully connected layer and at least one transposed convolutional layer, the discriminator includes at least one convolutional layer and at least one fully connected layer, the regulator includes at least one convolutional layer and at least one fully connected layer, and the regulator is a reversed generator.
  15. 一种终端，其特征在于，包括处理器、输入设备、输出设备和存储器，所述处理器、输入设备、输出设备和存储器相互连接，所述存储器用于存储计算机程序，所述计算机程序包括程序指令，所述处理器用于执行所述存储器的所述程序指令，其中：A terminal, including a processor, an input device, an output device, and a memory that are connected to each other, where the memory is used to store a computer program, the computer program includes program instructions, and the processor is used to execute the program instructions stored in the memory, wherein:
    所述处理器，用于构造深度卷积生成式对抗网络DCGAN模型；获取训练数据集，所述训练数据集中包括N个旋律矩阵以及对应的N个和弦矩阵，其中旋律矩阵以及和弦矩阵均为二元矩阵；将所述训练数据集中的N个旋律矩阵以及对应的N个和弦矩阵输入所述DCGAN模型中进行训练，得到训练好的DCGAN模型；The processor is configured to construct a deep convolutional generative adversarial network (DCGAN) model; acquire a training data set, the training data set including N melody matrices and corresponding N chord matrices, where the melody matrices and the chord matrices are all binary matrices; and input the N melody matrices and the corresponding N chord matrices in the training data set into the DCGAN model for training, to obtain a trained DCGAN model;
    所述输入设备,用于将获取到的目标旋律矩阵输入所述训练好的DCGAN模型中进行处理,并获取所述训练好的DCGAN模型生成的与所述目标旋律矩阵匹配的目标和弦矩阵;The input device is configured to input the acquired target melody matrix into the trained DCGAN model for processing, and acquire a target chord matrix generated by the trained DCGAN model that matches the target melody matrix;
    所述输出设备,用于输出所述目标旋律矩阵映射出的旋律音轨与所述目标和弦矩阵映射出的和弦音轨进行合并后的音乐文件。The output device is configured to output a music file obtained by merging a melody track mapped from the target melody matrix and a chord track mapped from the target chord matrix.
  16. 根据权利要求15所述的终端，其特征在于，所述DCGAN模型中包括生成器、辨别器以及调节器，所述生成器、所述辨别器以及所述调节器均为卷积神经网络CNN；The terminal according to claim 15, wherein the DCGAN model includes a generator, a discriminator, and a regulator, and the generator, the discriminator, and the regulator are all convolutional neural networks (CNN);
    所述处理器具体用于:The processor is specifically used for:
    针对所述训练数据集中的任一旋律矩阵i,将所述旋律矩阵i输入所述DCGAN模型的生成器中生成与所述旋律矩阵i匹配的第一和弦矩阵j;For any melody matrix i in the training data set, input the melody matrix i into the generator of the DCGAN model to generate a first chord matrix j that matches the melody matrix i;
    将所述第一和弦矩阵j以及所述旋律矩阵i在所述训练数据集中对应的和弦矩阵k输入所述DCGAN模型的辨别器中辨别所述第一和弦矩阵j与所述和弦矩阵k相同的概率；Input the first chord matrix j and the chord matrix k corresponding to the melody matrix i in the training data set into the discriminator of the DCGAN model to discriminate the probability that the first chord matrix j is the same as the chord matrix k;
    判断所述辨别器针对所述第一和弦矩阵j输出的概率是否均在预设范围内，若否则将所述概率输入所述DCGAN模型的调节器中对所述生成器的转置卷积层上的参数进行调整，将所述旋律矩阵i重新输入经过调整后的生成器中重新生成与所述旋律矩阵i匹配的第一和弦矩阵j，并将重新生成的第一和弦矩阵j以及所述旋律矩阵i在所述训练数据集中对应的和弦矩阵k输入所述DCGAN模型的辨别器中辨别所述重新生成的第一和弦矩阵j与所述和弦矩阵k相同的概率；Determine whether each probability output by the discriminator for the first chord matrix j is within a preset range; if not, input the probability into the regulator of the DCGAN model to adjust the parameters on the transposed convolutional layer of the generator, re-input the melody matrix i into the adjusted generator to regenerate a first chord matrix j matching the melody matrix i, and input the regenerated first chord matrix j and the chord matrix k corresponding to the melody matrix i in the training data set into the discriminator of the DCGAN model to discriminate the probability that the regenerated first chord matrix j is the same as the chord matrix k;
    当所述辨别器针对所述生成器生成的各个第一和弦矩阵输出的概率均在预设范围内时,得到训练好的DCGAN模型。When the probability that the discriminator outputs for each first chord matrix generated by the generator is within a preset range, a trained DCGAN model is obtained.
  17. 根据权利要求15或16所述的终端,其特征在于,所述处理器具体用于:The terminal according to claim 15 or 16, wherein the processor is specifically configured to:
    获取包括多个双音轨音乐文件的双音轨数据集,所述双音轨音乐文件用于表示包含旋律音轨以及和弦音轨的音乐文件;Acquiring a dual-track data set including a plurality of dual-track music files, where the dual-track music file is used to represent a music file containing a melody track and a chord track;
    从所述双音轨数据集确定出N个目标双音轨音乐文件，所述目标双音轨音乐文件中的和弦属于预设的基本和弦集合，所述基本和弦集合中包括12个大和弦和12个小和弦，所述目标双音轨音乐文件的每小节采用一个和弦；N target dual-track music files are determined from the dual-track data set, where the chords in the target dual-track music files belong to a preset basic chord set, the basic chord set includes 12 major chords and 12 minor chords, and each bar of a target dual-track music file uses one chord;
    获取所述N个目标双音轨音乐文件中各个目标双音轨音乐文件在旋律音轨上的旋律矩阵,得到N个旋律矩阵;Obtaining a melody matrix of each target dual-track music file on the melody track among the N target dual-track music files to obtain N melody matrices;
    获取所述N个目标双音轨音乐文件中各个目标双音轨音乐文件在和弦音轨上的和弦矩阵，得到N个和弦矩阵。Acquire the chord matrix of each of the N target dual-track music files on the chord track, to obtain N chord matrices.
  18. 根据权利要求17所述的终端,其特征在于,所述处理器还具体用于:The terminal according to claim 17, wherein the processor is further specifically configured to:
    将所述N个目标双音轨音乐文件中各个目标双音轨音乐文件的旋律调整至预设音高范围内;Adjusting the melody of each target dual-track music file in the N target dual-track music files to within a preset pitch range;
    获取调整后各个目标双音轨音乐文件中的旋律音符;Obtain the melody notes in each target dual-track music file after adjustment;
    根据所述调整后各个目标双音轨音乐文件中的旋律音符，生成所述调整后各个目标双音轨音乐文件的旋律矩阵，所述旋律矩阵为h*w的二元矩阵，所述h用于表示预设的音符数，所述w用于表示目标双音轨音乐文件的小节数。A melody matrix of each adjusted target dual-track music file is generated according to the melody notes in that adjusted file; the melody matrix is an h*w binary matrix, where h represents the preset number of notes and w represents the number of bars of the target dual-track music file.
  19. 根据权利要求17所述的终端,其特征在于,所述处理器还具体用于:The terminal according to claim 17, wherein the processor is further specifically configured to:
    获取所述N个目标双音轨音乐文件中各个目标双音轨音乐文件的各小节所采用的和弦以及所述各小节所采用和弦的和弦类别;Acquiring the chord used in each bar of each target dual-track music file in the N target dual-track music files and the chord category of the chord used in each bar;
    根据所述各个目标双音轨音乐文件的各小节所采用的和弦以及所述各小节所采用和弦的和弦类别，生成所述各个目标双音轨音乐文件的和弦矩阵，所述和弦矩阵为w*m的二元矩阵，所述w用于表示目标双音轨音乐文件的小节数，所述m用于表示各小节的和弦参数。A chord matrix of each target dual-track music file is generated according to the chord used in each bar of that file and the chord category of that chord; the chord matrix is a w*m binary matrix, where w represents the number of bars of the target dual-track music file and m represents the chord parameters of each bar.
  20. 一种计算机可读存储介质，其特征在于，所述计算机可读存储介质存储有计算机程序，所述计算机程序包括程序指令，所述程序指令当被处理器执行时使所述处理器执行如权利要求1-7任一项所述的方法。A computer-readable storage medium storing a computer program, the computer program including program instructions that, when executed by a processor, cause the processor to perform the method according to any one of claims 1-7.
PCT/CN2019/088805 2019-01-23 2019-05-28 Dcgan-based music generation method, and music generation apparatus WO2020151150A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910066130.8 2019-01-23
CN201910066130.8A CN109872708B (en) 2019-01-23 2019-01-23 Music generation method and device based on DCGAN

Publications (1)

Publication Number Publication Date
WO2020151150A1 true WO2020151150A1 (en) 2020-07-30

Family

ID=66918056

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/088805 WO2020151150A1 (en) 2019-01-23 2019-05-28 Dcgan-based music generation method, and music generation apparatus

Country Status (2)

Country Link
CN (1) CN109872708B (en)
WO (1) WO2020151150A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113763910A (en) * 2020-11-25 2021-12-07 北京沃东天骏信息技术有限公司 Music generation method and device

Families Citing this family (4)

Publication number Priority date Publication date Assignee Title
CN111477200B (en) * 2020-04-03 2023-08-25 深圳市人工智能与机器人研究院 Music score file generation method, device, computer equipment and storage medium
CN113012665B (en) * 2021-02-19 2024-04-19 腾讯音乐娱乐科技(深圳)有限公司 Music generation method and training method of music generation model
CN112818164B (en) * 2021-03-24 2023-09-15 平安科技(深圳)有限公司 Music type identification method, device, equipment and storage medium
CN112915525B (en) * 2021-03-26 2023-06-16 平安科技(深圳)有限公司 Game music generation method, device, equipment and storage medium

Citations (3)

Publication number Priority date Publication date Assignee Title
CN1573915A (en) * 2003-06-06 2005-02-02 明基电通股份有限公司 Method of creating music file with main melody and accompaniment
CN109086416A (en) * 2018-08-06 2018-12-25 中国传媒大学 A kind of generation method of dubbing in background music, device and storage medium based on GAN
CN109189974A (en) * 2018-08-08 2019-01-11 平安科技(深圳)有限公司 A kind of method for building up, system, equipment and the storage medium of model of writing music


Non-Patent Citations (4)

Title
AKBARI MOHAMMAD; LIANG JIE: "Semi-Recurrent CNN-Based VAE-GAN for Sequential Data Generation", 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 20 April 2018 (2018-04-20), pages 2321 - 2325, XP033401002, DOI: 10.1109/ICASSP.2018.8461724 *
K1D: "Music Generation", 15 October 2018 (2018-10-15), XP055723613, Retrieved from the Internet <URL:https://www.jianshu.com/p/b4cdc377845f> *
LI-CHIA YANG, SZU-YU CHOU, YI-HSUAN YANG: "MidiNet: A Convolutional Generative Adversarial Network for Symbolic-domain Music Generation", COMPUTER SCIENCE, 18 July 2017 (2017-07-18), pages 1 - 8, XP081399939, DOI: 10.5281/zenodo.1415990 *
LIU HAO-MIN; YANG YI-HSUAN: "Lead Sheet Generation and Arrangement by Conditional Generative Adversarial Network,", 2018 17TH IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA), 20 December 2018 (2018-12-20), pages 722 - 727, XP033502358, DOI: 10.1109/ICMLA.2018.00114 *


Also Published As

Publication number Publication date
CN109872708A (en) 2019-06-11
CN109872708B (en) 2023-04-28

Similar Documents

Publication Publication Date Title
WO2020151150A1 (en) Dcgan-based music generation method, and music generation apparatus
US10025773B2 (en) System and method for natural language processing using synthetic text
Ni et al. An end-to-end machine learning system for harmonic analysis of music
WO2020248393A1 (en) Speech synthesis method and system, terminal device, and readable storage medium
WO2022121257A1 (en) Model training method and apparatus, speech recognition method and apparatus, device, and storage medium
US20210232929A1 (en) Neural architecture search
WO2019232928A1 (en) Musical model training method, music creation method, devices, terminal and storage medium
JP6793708B2 (en) Music synthesis methods, systems, terminals, computer readable storage media and programs
CN111681631B (en) Collocation harmony method, collocation harmony device, electronic equipment and computer readable medium
CN116072098B (en) Audio signal generation method, model training method, device, equipment and medium
US20210241735A1 (en) Systems, devices, and methods for computer-generated musical compositions
CN104392716B (en) The phoneme synthesizing method and device of high expressive force
US20230230571A1 (en) Audio processing method and apparatus based on artificial intelligence, device, storage medium, and computer program product
CN113470664B (en) Voice conversion method, device, equipment and storage medium
CN113470684A (en) Audio noise reduction method, device, equipment and storage medium
WO2020029382A1 (en) Method, system and apparatus for building music composition model, and storage medium
TWI740315B (en) Sound separation method, electronic and computer readable storage medium
CN109448697B (en) Poem melody generation method, electronic device and computer readable storage medium
CN111477200A (en) Music score file generation method and device, computer equipment and storage medium
WO2021190660A1 (en) Music chord recognition method and apparatus, and electronic device and storage medium
CN113140230A (en) Method, device and equipment for determining pitch value of note and storage medium
US20200243162A1 (en) Method, system, and computing device for optimizing computing operations of gene sequencing system
Chen et al. Lightgrad: Lightweight diffusion probabilistic model for text-to-speech
WO2023245523A1 (en) Method and apparatus for generating training data
CN113112969A (en) Buddhism music score recording method, device, equipment and medium based on neural network

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 19911264; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 19911264; Country of ref document: EP; Kind code of ref document: A1)