CN113077770B - Buddha music generation method, device, equipment and storage medium - Google Patents


Info

Publication number
CN113077770B
CN113077770B (application CN202110301852.4A)
Authority
CN
China
Prior art keywords
buddha
preset
buddha music
music
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110301852.4A
Other languages
Chinese (zh)
Other versions
CN113077770A
Inventor
蒋慧军
王若竹
Current Assignee
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN202110301852.4A
Publication of CN113077770A
Application granted
Publication of CN113077770B
Legal status: Active (current)
Anticipated expiration


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00 Details of electrophonic musical instruments
    • G10H1/0008 Associated control or indicating means
    • G10H1/0025 Automatic or semi-automatic music composition, e.g. producing random music, applying rules from music theory or modifying a musical piece
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/08 Learning methods
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/04 Inference or reasoning models
    • G06N5/046 Forward inferencing; Production systems
    • G06N5/047 Pattern matching networks; Rete networks
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
    • G10H2210/00 Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/021 Background music, e.g. for video sequences, elevator music
    • G10H2210/101 Music composition or musical creation; Tools or processes therefor
    • G10H2210/105 Composing aid, e.g. for supporting creation, edition or modification of a piece of music

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Signal Processing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Electrophonic Musical Instruments (AREA)

Abstract

The invention relates to the field of artificial intelligence and discloses a Buddha music generation method, device, equipment, and storage medium, applied in the field of intelligent education, for generating Buddha music works that better match a user's expectations from a preset Buddha music generation model and Buddha music segments, thereby improving the efficiency of Buddha music generation. The Buddha music generation method comprises the following steps: obtaining a Buddha music segment to be authored; calling a preset variational autoencoder (VAE) to convert the Buddha music segment to be authored into a latent variable and decompose the latent variable into a pitch variable and a rhythm variable; calling a preset melody inpainter (Inpainter) to obtain an intermediate Buddha music segment; processing the intermediate Buddha music segment based on random unmasking to generate a target Buddha music segment; calling a preset connector (Connector) to combine the target Buddha music segment with a preset Buddha music summary sketch, generating a target latent variable; and calling a preset decoder to decode the target latent variable, generating the final Buddha music work.

Description

Buddha music generation method, device, equipment and storage medium
Technical Field
The present invention relates to the field of audio conversion, and in particular to a Buddha music generation method, apparatus, device, and storage medium.
Background
Automatic music generation has a great influence on the study and expansion of human creative expression. In recent years, neural network techniques have achieved good results in the field of automatic music generation. Previous related research has supported various forms of music generation: some work provides a constraint mechanism that allows users to limit the generated results to match the style of a composition, other work can compose an accompaniment for an existing classical melody, and so on. However, all of the above methods require the user's preferences to be specified as a relatively complete track, which is difficult for people without composing experience.
Among existing schemes, the one most relevant to music generation is music inpainting, i.e., generating a series of missing measures from the surrounding musical context to complete a piece. This approach, however, hardly considers the user's preferences. Alternatively, several inpainted fragments can be generated at random from the same musical context and the user selects the fragment they prefer, but this approach cannot directly generate the user's optimal music fragment from the user's preference settings.
Disclosure of Invention
The invention provides a Buddha music generation method, device, equipment, and storage medium for generating Buddha music works that better match a user's expectations from a preset Buddha music generation model and Buddha music segments, thereby reducing the difficulty of creating Buddha music for the user and improving the efficiency of Buddha music generation.
The first aspect of the invention provides a Buddha music generation method, comprising the following steps: obtaining a Buddha music segment to be authored, wherein the Buddha music segment to be authored comprises a first Buddha music segment and a second Buddha music segment, and the starting time of the second Buddha music segment is later than the ending time of the first Buddha music segment; calling a preset variational autoencoder (VAE) to convert the Buddha music segment to be authored into a latent variable and decompose the latent variable into a pitch variable and a rhythm variable; calling a preset melody inpainter (Inpainter) to predict the corresponding Buddha music segment based on the musical background and the latent variable, obtaining an intermediate Buddha music segment; processing the intermediate Buddha music segment based on random unmasking to generate a target Buddha music segment, and calling a preset connector (Connector) to combine the target Buddha music segment with a preset Buddha music summary sketch, generating a target latent variable; and calling a preset decoder to decode the target latent variable, generating the final Buddha music work.
Optionally, in a first implementation manner of the first aspect of the present invention, calling the preset variational autoencoder (VAE) to convert the Buddha music segment to be authored into a latent variable and decompose the latent variable into a pitch variable and a rhythm variable comprises: converting the Buddha music segment to be authored into a subsequence consisting of a pitch sequence P and a rhythm sequence R, wherein the pitch sequence P consists of the pitch types present in the Buddha music segment to be authored, and the rhythm sequence R consists of the duration types present in the Buddha music segment to be authored; inputting the pitch sequence P and the rhythm sequence R into the preset variational autoencoder (VAE) to generate a latent variable; and decomposing the latent variable into a pitch variable and a rhythm variable based on a preset factorized inference network.
Optionally, in a second implementation manner of the first aspect of the present invention, calling the preset melody inpainter (Inpainter) to predict the corresponding Buddha music segment based on the musical background and the latent variable, obtaining the intermediate Buddha music segment, comprises: calling the preset melody inpainter (Inpainter) to read the latent variable; inputting the latent variable into a pitch gated recurrent unit (GRU) and a rhythm gated recurrent unit (GRU) to obtain a basic Buddha music segment; and generating an intermediate Buddha music segment based on the musical background and the basic Buddha music segment.
Optionally, in a third implementation manner of the first aspect of the present invention, processing the intermediate Buddha music segment based on random unmasking to generate a target Buddha music segment, and calling the preset connector (Connector) to combine the target Buddha music segment with the preset Buddha music summary sketch, generating the target latent variable, comprises: controlling and modifying the intermediate Buddha music segment based on preset random unmasking to generate a target Buddha music segment; calling the preset connector (Connector) to read a preset Buddha music summary sketch, wherein the preset Buddha music summary sketch comprises pitch and rhythm information input by a user; and combining the target Buddha music segment with the preset Buddha music summary sketch based on the preset connector (Connector), generating a target latent variable and sending the target latent variable to a preset decoder.
Optionally, in a fourth implementation manner of the first aspect of the present invention, calling the preset decoder to decode the target latent variable comprises: calling the preset decoder to read the target latent variable; and decoding the target latent variable based on the preset decoder, generating the final Buddha music work.
Optionally, in a fifth implementation manner of the first aspect of the present invention, after obtaining the Buddha music segment to be authored (which comprises a past Buddha music segment and a future Buddha music segment) and before calling the preset variational autoencoder (VAE) to convert the Buddha music segment to be authored into a latent variable and decompose the latent variable into a pitch part and a rhythm part, the method further comprises: receiving a preset Buddha music summary sketch, wherein the preset Buddha music summary sketch comprises pitch and rhythm information input by a user.
Optionally, in a sixth implementation manner of the first aspect of the present invention, after calling the preset decoder to decode the target latent variable to generate the final Buddha music work, the method further comprises: calculating a loss function ℓ_i(θ, φ), the specific formula being:
ℓ_i(θ, φ) = -E_{z∼q_θ(z|x_i)}[log p_φ(x_i|z)] + KL(q_θ(z|x_i) ‖ p(z))
where θ is a parameter of the preset variational autoencoder (VAE) and φ is a parameter of the preset decoder; θ denotes the mapping from x to z and φ the reconstruction from z to x; q_θ(z|x_i) is the posterior distribution of z derived from x; and p(z) is the prior distribution of z, assumed to be a Gaussian distribution N(0, 1) with mean 0 and variance 1.
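As an illustration, a loss of this shape can be computed directly: a reconstruction term plus the closed-form KL divergence between a diagonal Gaussian posterior N(μ, σ²) and the standard Gaussian prior N(0, 1). The squared-error reconstruction term and all names below are illustrative assumptions, not the patent's implementation.

```python
import math

def vae_loss(x, x_recon, mu, log_var):
    """Per-example VAE-style loss: reconstruction error plus
    KL(q(z|x) || N(0, 1)).

    Squared error stands in for -log p_phi(x|z). The KL term uses the
    closed form for a diagonal Gaussian N(mu, sigma^2) against N(0, 1):
        KL = 0.5 * sum(sigma^2 + mu^2 - 1 - log sigma^2)
    """
    recon = sum((a - b) ** 2 for a, b in zip(x, x_recon))
    kl = 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                   for m, lv in zip(mu, log_var))
    return recon + kl
```

With a perfect reconstruction and a posterior equal to the prior (μ = 0, log σ² = 0), both terms vanish and the loss is zero, which is a quick sanity check on the formula.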
A second aspect of the present invention provides a Buddha music generating device, comprising: an acquisition module for obtaining a Buddha music segment to be authored, wherein the Buddha music segment to be authored comprises a first Buddha music segment and a second Buddha music segment, and the starting time of the second Buddha music segment is later than the ending time of the first Buddha music segment; a conversion module for calling a preset variational autoencoder (VAE) to convert the Buddha music segment to be authored into a latent variable and decompose the latent variable into a pitch variable and a rhythm variable; a prediction module for calling a preset melody inpainter (Inpainter) to predict the corresponding Buddha music segment based on the musical background and the latent variable, obtaining an intermediate Buddha music segment; a processing module for processing the intermediate Buddha music segment based on random unmasking to generate a target Buddha music segment, and calling a preset connector (Connector) to combine the target Buddha music segment with a preset Buddha music summary sketch, generating a target latent variable; and a decoding module for calling a preset decoder to decode the target latent variable, generating the final Buddha music work.
Optionally, in a first implementation manner of the second aspect of the present invention, the conversion module comprises: a conversion unit for converting the Buddha music segment to be authored into a subsequence consisting of a pitch sequence P and a rhythm sequence R, wherein the pitch sequence P consists of the pitch types present in the Buddha music segment to be authored, and the rhythm sequence R consists of the duration types present in the Buddha music segment to be authored; a first input unit for inputting the pitch sequence P and the rhythm sequence R into the preset variational autoencoder (VAE) to generate a latent variable; and a decomposition unit for decomposing the latent variable into a pitch variable and a rhythm variable based on a preset factorized inference network.
Optionally, in a second implementation manner of the second aspect of the present invention, the prediction module comprises: a first reading unit for calling the preset melody inpainter (Inpainter) to read the latent variable; a second input unit for inputting the latent variable into a pitch gated recurrent unit (GRU) and a rhythm gated recurrent unit (GRU) to obtain a basic Buddha music segment; and a generation unit for generating an intermediate Buddha music segment based on the musical background and the basic Buddha music segment.
Optionally, in a third implementation manner of the second aspect of the present invention, the processing module comprises: a modification unit for controlling and modifying the intermediate Buddha music segment based on preset random unmasking to generate a target Buddha music segment; a second reading unit for calling the preset connector (Connector) to read a preset Buddha music summary sketch, wherein the preset Buddha music summary sketch comprises pitch and rhythm information input by a user; and a combining unit for combining the target Buddha music segment with the preset Buddha music summary sketch based on the preset connector (Connector), generating a target latent variable and sending the target latent variable to a preset decoder.
Optionally, in a fourth implementation manner of the second aspect of the present invention, the decoding module includes: a third reading unit, configured to invoke a preset decoder to read the target latent variable; and the decoding unit is used for decoding the target latent variable based on the preset decoder to generate a final Buddha musical composition.
Optionally, in a fifth implementation manner of the second aspect of the present invention, after obtaining the Buddha music segment to be authored (which comprises a past Buddha music segment and a future Buddha music segment) and before calling the preset variational autoencoder (VAE) to convert the Buddha music segment to be authored into a latent variable and decompose the latent variable into a pitch part and a rhythm part, the apparatus further comprises: a receiving module for receiving a preset Buddha music summary sketch, wherein the preset Buddha music summary sketch comprises pitch and rhythm information input by a user.
Optionally, in a sixth implementation manner of the second aspect of the present invention, after calling the preset decoder to decode the target latent variable to generate the final Buddha music work, the apparatus is further configured to calculate a loss function ℓ_i(θ, φ), the specific formula being:
ℓ_i(θ, φ) = -E_{z∼q_θ(z|x_i)}[log p_φ(x_i|z)] + KL(q_θ(z|x_i) ‖ p(z))
where θ is a parameter of the preset variational autoencoder (VAE) and φ is a parameter of the preset decoder; θ denotes the mapping from x to z and φ the reconstruction from z to x; q_θ(z|x_i) is the posterior distribution of z derived from x; and p(z) is the prior distribution of z, assumed to be a Gaussian distribution N(0, 1) with mean 0 and variance 1.
A third aspect of the present invention provides a Buddha music generating device, comprising: a memory and at least one processor, the memory having instructions stored therein; the at least one processor invokes the instructions in the memory to cause the Buddha music generating device to perform the Buddha music generation method described above.
A fourth aspect of the present invention provides a computer-readable storage medium having instructions stored therein which, when run on a computer, cause the computer to perform the Buddha music generation method described above.
According to the technical scheme provided by the invention, a Buddha music segment to be authored is obtained, wherein the Buddha music segment to be authored comprises a first Buddha music segment and a second Buddha music segment, and the starting time of the second Buddha music segment is later than the ending time of the first Buddha music segment; a preset variational autoencoder (VAE) is called to convert the Buddha music segment to be authored into a latent variable and decompose the latent variable into a pitch variable and a rhythm variable; a preset melody inpainter (Inpainter) is called to predict the corresponding Buddha music segment based on the musical background and the latent variable, obtaining an intermediate Buddha music segment; the intermediate Buddha music segment is processed based on random unmasking to generate a target Buddha music segment, and a preset connector (Connector) is called to combine the target Buddha music segment with a preset Buddha music summary sketch, generating a target latent variable; and a preset decoder is called to decode the target latent variable, generating the final Buddha music work. In the embodiment of the invention, Buddha music works that better match the user's expectations are generated from the preset Buddha music generation model and Buddha music segments, reducing the difficulty of creating Buddha music for the user and improving the efficiency of Buddha music generation.
Drawings
FIG. 1 is a diagram illustrating a Buddha music generating method according to an embodiment of the present invention;
FIG. 2 is a diagram of another embodiment of a Buddha music generating method according to the present invention;
FIG. 3 is a schematic diagram of an embodiment of a Buddha music generating apparatus according to the present invention;
fig. 4 is a schematic diagram of another embodiment of a Buddha music generating apparatus according to an embodiment of the present invention;
fig. 5 is a schematic view of an embodiment of a Buddha music generating apparatus according to an embodiment of the present invention.
Detailed Description
The embodiments of the invention provide a Buddha music generation method, device, equipment, and storage medium for generating Buddha music works that better match a user's expectations from a preset Buddha music generation model and Buddha music segments, thereby reducing the difficulty of creating Buddha music for the user and improving the efficiency of Buddha music generation.
The terms "first," "second," "third," "fourth" and the like in the description and in the claims and in the above drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments described herein may be implemented in other sequences than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus.
For ease of understanding, a specific flow of an embodiment of the present invention is described below with reference to FIG. 1. An embodiment of the Buddha music generation method according to an embodiment of the present invention comprises:
101. Obtain a Buddha music segment to be authored, wherein the Buddha music segment to be authored comprises a first Buddha music segment and a second Buddha music segment, and the starting time of the second Buddha music segment is later than the ending time of the first Buddha music segment.
The server obtains a Buddha music segment to be authored, wherein the Buddha music segment to be authored comprises a first Buddha music segment and a second Buddha music segment, and the starting time of the second Buddha music segment is later than the ending time of the first Buddha music segment. A preset duration exists between the first Buddha music segment and the second Buddha music segment, into which pitch and rhythm information input by the user is inserted to generate the final Buddha music work.
It should be understood that the execution subject of the present invention may be a Buddha music generating device, or may be a terminal or a server, which is not limited herein. The embodiment of the present invention is described with a server as the execution subject by way of example.
102. Call a preset variational autoencoder (VAE) to convert the Buddha music segment to be authored into a latent variable, and decompose the latent variable into a pitch variable and a rhythm variable.
The server calls a preset variational autoencoder (VAE), converts the Buddha music segment to be authored into a latent variable, and decomposes the latent variable into a pitch variable and a rhythm variable. Specifically, the server converts the Buddha music segment to be authored into a subsequence consisting of a pitch sequence P and a rhythm sequence R, wherein the pitch sequence P consists of the pitch types present in the Buddha music segment to be authored, and the rhythm sequence R consists of the duration types present in the Buddha music segment to be authored; the server inputs the pitch sequence P and the rhythm sequence R into the preset variational autoencoder (VAE) to generate a latent variable; and the server decomposes the latent variable into a pitch variable and a rhythm variable based on a preset factorized inference network.
For example, the pitch sequence P may use D5, A4, B5, and G4 to represent the pitch of each note in the score, with a hold mark "-" in the rhythm sequence R representing note duration and the sixteenth note as the minimum duration unit. The preset variational autoencoder (VAE) comprises a learnable embedding layer, a pitch gated recurrent unit (GRU), a rhythm gated recurrent unit (GRU), and two linear layers, and the latent variable is obtained through a normal distribution: the VAE assumes that the hidden representation produced by the neural network encoder follows a standard Gaussian distribution, samples a feature from this distribution, and decodes it, expecting a result identical to the original input. Compared with an ordinary autoencoder, a regularization term is added: the KL divergence between the inferred encoding distribution and the standard Gaussian distribution, where KL divergence refers to relative entropy, an asymmetric measure of the difference between two probability distributions.
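The pitch/rhythm encoding described above can be illustrated with a small sketch. The exact token format is an assumption inferred from the description (one pitch token per sixteenth-note step, with "-" marking a held note); the function name and note list are purely illustrative.

```python
def encode_melody(notes):
    """Encode a melody as a pitch sequence P and a rhythm sequence R.

    notes: list of (pitch_name, duration_in_sixteenths) pairs.
    P has one token per sixteenth-note step: the pitch name at an onset,
    then "-" for each step the note is held. R lists each note's duration
    counted in sixteenth notes (the minimum duration unit).
    """
    P, R = [], []
    for pitch, dur in notes:
        P.append(pitch)
        P.extend("-" * (dur - 1))  # hold marks for the remaining steps
        R.append(dur)
    return P, R

# Hypothetical four-note fragment using the pitches named in the text.
P, R = encode_melody([("D5", 4), ("A4", 2), ("B5", 1), ("G4", 1)])
```

Here P spans eight sixteenth-note steps and R lists the four durations, so the two sequences carry the pitch and rhythm information separately, matching the P/R split fed to the VAE.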
103. Call the preset melody inpainter (Inpainter) to predict the corresponding Buddha music segment based on the musical background and the latent variable, obtaining an intermediate Buddha music segment.
The server calls the preset melody inpainter (Inpainter) and predicts the corresponding Buddha music segment based on the musical background and the latent variable, obtaining an intermediate Buddha music segment. Specifically, the server calls the preset melody inpainter (Inpainter) to read the latent variable; the server inputs the latent variable into a pitch gated recurrent unit (GRU) and a rhythm gated recurrent unit (GRU) to obtain a basic Buddha music segment; and the server generates an intermediate Buddha music segment based on the musical background and the basic Buddha music segment.
The melody inpainter (Inpainter) comprises a preset inpainting algorithm and the preset gradient-based optimizer Adam. It predicts the corresponding Buddha music segment based on the background style characteristics of the music and specifies the style of the Buddha music by controlling pitch and rhythm, rather than composing through a complete track, which reduces the difficulty of creating Buddha music.
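The pitch and rhythm GRUs mentioned in this step are standard gated recurrent units. A minimal scalar GRU update, the textbook recurrence rather than anything taken from the patent, can be sketched as follows (weight names are illustrative):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(x, h, w):
    """One scalar GRU update: the building block of the pitch GRU and
    rhythm GRU. w maps weight names to (illustrative) scalar values."""
    z = sigmoid(w["wz"] * x + w["uz"] * h)               # update gate
    r = sigmoid(w["wr"] * x + w["ur"] * h)               # reset gate
    h_cand = math.tanh(w["wh"] * x + w["uh"] * (r * h))  # candidate state
    return (1.0 - z) * h + z * h_cand                    # blended new state
```

With all weights zero both gates sit at 0.5 and the candidate state is 0, so the new state is simply half the old one, a convenient check that the gating arithmetic is wired correctly.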
104. Process the intermediate Buddha music segment based on random unmasking to generate a target Buddha music segment, and call the preset connector (Connector) to combine the target Buddha music segment with the preset Buddha music summary sketch, generating a target latent variable.
The server processes the intermediate Buddha music segment based on random unmasking to generate a target Buddha music segment, then calls the preset connector (Connector) to combine the target Buddha music segment with the preset Buddha music summary sketch, generating a target latent variable. Specifically, the server controls and modifies the intermediate Buddha music segment based on preset random unmasking to generate the target Buddha music segment; the server calls the preset connector (Connector) to read the preset Buddha music summary sketch, which comprises pitch and rhythm information input by the user; and the server combines the target Buddha music segment with the preset Buddha music summary sketch based on the preset connector (Connector), generates the target latent variable, and sends it to the preset decoder.
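The "random unmasking" control in step 104 is not specified in detail. One plausible reading, randomly masking a fraction of the intermediate segment's tokens so that only the masked positions are regenerated while the rest are kept, can be sketched as follows (the function name, ratio, and procedure are all assumptions, not the patent's method):

```python
import random

def mask_segment(tokens, mask_ratio=0.3, seed=None):
    """Randomly mask a fraction of a segment's tokens.

    Unmasked positions keep their original token; masked positions are
    set to None as placeholders for the model to regenerate. This is a
    guess at the random-unmasking step, for illustration only.
    """
    rng = random.Random(seed)
    n_mask = int(len(tokens) * mask_ratio)
    masked_idx = set(rng.sample(range(len(tokens)), n_mask))
    return [None if i in masked_idx else t for i, t in enumerate(tokens)]
```

Seeding the generator makes the masking reproducible, which is useful when the same intermediate segment must be re-processed deterministically.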
The preset connector (Connector) combines the user-input Buddha music summary sketch containing pitch and rhythm information with the target Buddha music segment. It comprises a Kafka connector for providing streaming integration between data stores and Kafka queues; the Kafka connector has rich application programming interfaces (APIs), including a representational state transfer (REST) API for configuring and managing connectors. Kafka Connect itself is modular, with key components comprising connectors, which define a set of JAR files for integrating with a data store, and converters, which handle the serialization and deserialization of data.
105. Call the preset decoder to decode the target latent variable, generating the final Buddha music work.
The server calls the preset decoder to decode the target latent variable and generate the final Buddha music work. Specifically, the server calls the preset decoder to read the target latent variable; the server then decodes the target latent variable based on the preset decoder, generating the final Buddha music work.
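The patent does not spell out how the latent variable fed to the decoder is drawn. In a VAE-style model it is typically sampled with the reparameterization trick, sketched here under that assumption (names are illustrative):

```python
import math
import random

def sample_latent(mu, log_var, rng):
    """Reparameterized sample z = mu + sigma * eps, with eps ~ N(0, 1).

    mu and log_var parameterize the (diagonal Gaussian) latent
    distribution; rng is a random.Random instance. This is the standard
    VAE sampling step, assumed rather than taken from the patent.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]
```

As log σ² tends to negative infinity the noise term vanishes and the sample collapses to the mean, which is the deterministic decoding limit.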
In the embodiment of the invention, Buddha music works that better match the user's expectations are generated from the preset Buddha music generation model and Buddha music segments, which reduces the difficulty of creating Buddha music for the user and improves the efficiency of Buddha music generation. The scheme can be applied in the field of intelligent education to promote the construction of smart cities.
Referring to fig. 2, another embodiment of the Buddha music generation method according to the embodiment of the present invention includes:
201. Obtaining a Buddha music segment to be authored, where the Buddha music segment to be authored includes a first Buddha music segment and a second Buddha music segment, and the start time of the second Buddha music segment is later than the end time of the first Buddha music segment.
The server obtains the Buddha music segment to be authored, where the segment includes a first Buddha music segment and a second Buddha music segment, and the start time of the second Buddha music segment is later than the end time of the first Buddha music segment. A preset duration exists between the first Buddha music segment and the second Buddha music segment, and is reserved for inserting the pitch and rhythm information input by the user to generate the final Buddha music work.
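The time relationship between the two segments and the reserved gap can be sketched as note-event lists (a minimal illustration; the tuple layout, pitch names, and durations below are hypothetical, not the patent's internal format):

```python
# Hypothetical note-event representation: (onset_beat, pitch_name, duration_beats).

def segment_bounds(segment):
    """Return (start, end) times of a note-event segment."""
    start = segment[0][0]
    last_onset, _, last_dur = segment[-1]
    return start, last_onset + last_dur

first = [(0.0, "D5", 1.0), (1.0, "A4", 0.5), (1.5, "B5", 0.5)]
second = [(4.0, "G4", 1.0), (5.0, "D5", 1.0)]   # starts after a 2-beat gap

f_start, f_end = segment_bounds(first)
s_start, _ = segment_bounds(second)

assert s_start > f_end        # second segment begins after the first ends
gap = s_start - f_end         # preset duration reserved for user pitch/rhythm input
print(gap)                    # 2.0
```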
It is to be understood that the execution subject of the present invention may be a Buddha music generation apparatus, or a terminal or a server, which is not limited herein. The embodiment of the present invention is described by taking a server as the execution subject.
202. Calling a preset variational auto-encoder (VAE), converting the Buddha music segment to be authored into a latent variable, and decomposing the latent variable into a pitch variable and a rhythm variable.
The server calls a preset variational auto-encoder (VAE), converts the Buddha music segment to be authored into a latent variable, and decomposes the latent variable into a pitch variable and a rhythm variable. Specifically, the server converts the Buddha music segment to be authored into a subsequence consisting of a pitch sequence P and a rhythm sequence R, where the pitch sequence P consists of the pitch types presented in the segment and the rhythm sequence R consists of the duration types presented in the segment; the server inputs the pitch sequence P and the rhythm sequence R into the preset VAE to generate the latent variable; and the server decomposes the latent variable into the pitch variable and the rhythm variable based on a preset factorized inference network.
For example, the pitch sequence P may use pitch names such as D5, A4, B5 and G4 to represent the pitch of each note in the score, while the rhythm sequence R may represent note durations with a hold symbol "—", the sixteenth note being the minimum duration unit. The preset VAE includes a learnable embedding layer, a pitch gated recurrent unit (GRU), a rhythm GRU and two linear layers, and the latent variable is obtained from a normal distribution: the VAE assumes that the hidden representation produced by the encoder network follows a standard Gaussian distribution, samples a feature from that distribution, and decodes the feature, expecting a result identical to the original input. Compared with an ordinary auto-encoder, a regularization term is added, namely the KL divergence between the inferred encoding distribution and the standard Gaussian distribution. The KL divergence, also called relative entropy, is an asymmetric measure of the difference between two probability distributions.
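The regularization just described can be sketched numerically. The snippet below is a minimal illustration, not the patent's trained VAE: the encoder is assumed to output a mean and log-variance per latent dimension, a latent is sampled with the reparameterization trick, and the KL divergence against the standard Gaussian N(0, 1) is computed in closed form:

```python
import math
import random

def reparameterize(mu, logvar, rng=random):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, logvar)]

def kl_to_standard_normal(mu, logvar):
    """KL( N(mu, sigma^2) || N(0, 1) ), summed over latent dimensions."""
    return sum(0.5 * (math.exp(lv) + m * m - 1.0 - lv)
               for m, lv in zip(mu, logvar))

mu, logvar = [0.0, 0.5], [0.0, 0.0]                 # sigma = 1 in both dimensions
z = reparameterize(mu, logvar)
assert len(z) == 2
assert kl_to_standard_normal([0.0], [0.0]) == 0.0   # identical distributions -> 0
assert kl_to_standard_normal(mu, logvar) == 0.125   # 0.5 * (0.5^2) from the mean shift
```

The closed form 0.5 * (sigma^2 + mu^2 - 1 - log sigma^2) per dimension is the standard expression for the KL divergence between a diagonal Gaussian and N(0, 1), which is why the regularizer vanishes exactly when the encoder matches the prior.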
203. Calling a preset melody healer, Inpainter, to read the latent variable.
The server calls the preset melody healer Inpainter to read the latent variable. The melody healer Inpainter includes a preset melody inpainting (Inpaint) algorithm and a preset gradient optimization algorithm, Adam. Adam is a first-order optimization algorithm that can replace the traditional stochastic gradient descent procedure and iteratively updates the weights of a neural network based on training data.
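As an illustration of the Adam update rule mentioned above (a scalar sketch with the usual default hyperparameters, not the patent's training code):

```python
import math

def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update on a scalar parameter."""
    m = b1 * m + (1 - b1) * grad          # first-moment estimate (mean of gradients)
    v = b2 * v + (1 - b2) * grad ** 2     # second-moment estimate (uncentered variance)
    m_hat = m / (1 - b1 ** t)             # bias corrections for the zero init
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

theta, m, v = 1.0, 0.0, 0.0
for t in range(1, 101):                   # descend on f(theta) = theta^2
    grad = 2 * theta
    theta, m, v = adam_step(theta, grad, m, v, t)
assert 0.0 < theta < 1.0                  # moved toward the minimum at 0
```

Because each step is scaled by the ratio of bias-corrected moment estimates, the effective step size stays near the learning rate regardless of the raw gradient magnitude, which is what makes Adam a drop-in replacement for plain stochastic gradient descent.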
204. Inputting the latent variable into the pitch gated recurrent unit (GRU) and the rhythm GRU to obtain a basic Buddha music segment.
The server inputs the latent variable into the pitch GRU and the rhythm GRU to obtain a basic Buddha music segment.
The gated recurrent unit (GRU) is a model that retains the effectiveness of the long short-term memory (LSTM) network while having a simpler structure, fewer parameters and better convergence. The GRU consists of an update gate and a reset gate. The update gate controls how strongly the hidden-layer output at the previous time step influences the current hidden layer: the larger its value, the greater the influence. The reset gate controls how much of the hidden-layer information at the previous time step is ignored: the smaller its value, the more is ignored. With one gate fewer than the LSTM, the GRU requires fewer matrix multiplications and can therefore save considerable time when the training data are large.
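The update-gate and reset-gate behaviour described above can be sketched with a scalar GRU cell (illustrative scalar weights, not trained values from the patent's model):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_cell(x, h_prev, w):
    z = sigmoid(w["wz"] * x + w["uz"] * h_prev)           # update gate
    r = sigmoid(w["wr"] * x + w["ur"] * h_prev)           # reset gate
    h_cand = math.tanh(w["wh"] * x + w["uh"] * (r * h_prev))  # candidate state
    return (1 - z) * h_prev + z * h_cand                  # blend old and new state

w = {"wz": 0.5, "uz": 0.1, "wr": 0.5, "ur": 0.1, "wh": 1.0, "uh": 0.8}
h = 0.0
for x in [0.2, -0.1, 0.4]:        # e.g. a short pitch (or rhythm) input sequence
    h = gru_cell(x, h, w)
assert -1.0 < h < 1.0             # hidden state stays bounded by the tanh blending
```

A large z pushes the state toward the new candidate (strong influence of the input), while a small r makes the candidate ignore the previous state, exactly the two roles described in the text.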
205. Generating an intermediate Buddha music segment based on the background of Buddha music and the basic Buddha music segment.
The server generates an intermediate Buddha music segment based on the background of Buddha music and the basic Buddha music segment. The background of Buddha music includes the history of its creation, transmission and development, its value for protection and inheritance, and its representative works.
206. Processing the intermediate Buddha music segment based on random unmasking to generate a target Buddha music segment, calling a preset Connector, and combining the target Buddha music segment with a preset Buddha music summary sketch to generate a target latent variable.
Specifically, the server controls and modifies the intermediate Buddha music segment based on a preset random unmasking scheme to generate the target Buddha music segment; the server calls the preset Connector to read a preset Buddha music summary sketch, where the Buddha music summary sketch includes pitch and rhythm information input by a user; and the server combines the target Buddha music segment with the preset Buddha music summary sketch based on the preset Connector, generates a target latent variable, and sends the target latent variable to a preset decoder.
The preset Connector combines the user-input Buddha music summary sketch containing pitch and rhythm information with the target Buddha music segment, and includes a Kafka Connector for providing streaming integration between data storage and a Kafka queue. The Kafka Connector offers rich application program interfaces (APIs), including a representational state transfer (REST) API for configuring and managing connectors. The Kafka Connector itself is modular, and its key components comprise connectors, which define a set of JAR files for integrating with the data store, and converters, which handle the serialization and deserialization of data.
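As an illustration of configuring a connector through the Kafka Connect REST API mentioned above (the connector name, topic, and file path are hypothetical placeholders; FileStreamSinkConnector is Kafka's stock example connector, not something specified by the patent):

```python
import json

# Build the JSON body that the Kafka Connect REST API expects when creating
# a connector. In a real deployment this would be POSTed to the Connect
# worker, e.g. http://<connect-worker>:8083/connectors.
config = {
    "name": "buddha-music-sink",                        # hypothetical connector name
    "config": {
        "connector.class":
            "org.apache.kafka.connect.file.FileStreamSinkConnector",
        "topics": "buddha-music-sketches",              # hypothetical topic
        "file": "/tmp/buddha-music-sketches.txt",
        "value.converter":
            "org.apache.kafka.connect.storage.StringConverter",
    },
}

payload = json.dumps(config)
assert json.loads(payload)["config"]["topics"] == "buddha-music-sketches"
```

The converter entry is what the text calls the serialization/deserialization component, while the `connector.class` names the JAR-packaged connector that performs the data-store integration.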
207. Calling a preset decoder to decode the target latent variable to generate a final Buddha music work.
The server calls the preset decoder to decode the target latent variable and generate the final Buddha music work. Specifically, the server calls the preset decoder to read the target latent variable, and then decodes the target latent variable based on the preset decoder to generate the final Buddha music work.
In the embodiment of the present invention, a Buddha music work that better matches the user's expectations is generated according to a preset Buddha music generation model and the Buddha music segments, which reduces the difficulty for the user of composing Buddha music and improves the efficiency of Buddha music generation. The scheme can be applied to the field of intelligent education, thereby promoting the construction of smart cities.
The Buddha music generation method in the embodiment of the present invention is described above, and the Buddha music generation apparatus in the embodiment of the present invention is described below. Referring to fig. 3, one embodiment of the Buddha music generation apparatus in the embodiment of the present invention includes:
an obtaining module 301, configured to obtain a Buddha music segment to be authored, where the Buddha music segment to be authored includes a first Buddha music segment and a second Buddha music segment, and the start time of the second Buddha music segment is later than the end time of the first Buddha music segment;
a conversion module 302, configured to call a preset variational auto-encoder (VAE), convert the Buddha music segment to be authored into a latent variable, and decompose the latent variable into a pitch variable and a rhythm variable;
a prediction module 303, configured to call a preset melody healer Inpainter, and predict a corresponding Buddha music segment based on the background of Buddha music and the latent variable to obtain an intermediate Buddha music segment;
a processing module 304, configured to process the intermediate Buddha music segment based on random unmasking to generate a target Buddha music segment, call a preset Connector, and combine the target Buddha music segment with a preset Buddha music summary sketch to generate a target latent variable;
and a decoding module 305, configured to call a preset decoder to decode the target latent variable to generate a final Buddha music work.
In the embodiment of the present invention, a Buddha music work that better matches the user's expectations is generated according to a preset Buddha music generation model and the Buddha music segments, which reduces the difficulty for the user of composing Buddha music and improves the efficiency of Buddha music generation. The scheme can be applied to the field of intelligent education, thereby promoting the construction of smart cities.
Referring to fig. 4, another embodiment of the Buddha music generation apparatus according to the embodiment of the present invention includes:
an obtaining module 301, configured to obtain a Buddha music segment to be authored, where the Buddha music segment to be authored includes a first Buddha music segment and a second Buddha music segment, and the start time of the second Buddha music segment is later than the end time of the first Buddha music segment;
a conversion module 302, configured to call a preset variational auto-encoder (VAE), convert the Buddha music segment to be authored into a latent variable, and decompose the latent variable into a pitch variable and a rhythm variable;
a prediction module 303, configured to call a preset melody healer Inpainter, and predict a corresponding Buddha music segment based on the background of Buddha music and the latent variable to obtain an intermediate Buddha music segment;
a processing module 304, configured to process the intermediate Buddha music segment based on random unmasking to generate a target Buddha music segment, call a preset Connector, and combine the target Buddha music segment with a preset Buddha music summary sketch to generate a target latent variable;
and a decoding module 305, configured to call a preset decoder to decode the target latent variable to generate a final Buddha music work.
Optionally, the conversion module 302 includes:
a conversion unit 3021, configured to convert the Buddha music segment to be authored into a subsequence consisting of a pitch sequence P and a rhythm sequence R, where the pitch sequence P consists of the pitch types presented in the segment and the rhythm sequence R consists of the duration types presented in the segment;
a first input unit 3022, configured to input the pitch sequence P and the rhythm sequence R into the preset variational auto-encoder (VAE) to generate the latent variable;
a decomposition unit 3023, configured to decompose the latent variable into the pitch variable and the rhythm variable based on a preset factorized inference network.
Optionally, the prediction module 303 includes:
a first reading unit 3031, configured to call the preset melody healer Inpainter to read the latent variable;
a second input unit 3032, configured to input the latent variable into the pitch gated recurrent unit (GRU) and the rhythm GRU to obtain a basic Buddha music segment;
a generating unit 3033, configured to generate an intermediate Buddha music segment based on the background of Buddha music and the basic Buddha music segment.
Optionally, the processing module 304 includes:
a modification unit 3041, configured to control and modify the intermediate Buddha music segment based on the preset random unmasking scheme to generate a target Buddha music segment;
a second reading unit 3042, configured to call the preset Connector to read a preset Buddha music summary sketch, where the preset Buddha music summary sketch includes pitch and rhythm information input by a user;
a combining unit 3043, configured to combine the target Buddha music segment with the preset Buddha music summary sketch based on the preset Connector, generate a target latent variable, and send the target latent variable to the preset decoder.
Optionally, the decoding module 305 includes:
a third reading unit 3051, configured to call the preset decoder to read the target latent variable;
and a decoding unit 3052, configured to decode the target latent variable based on the preset decoder to generate a final Buddha music work.
In the embodiment of the present invention, a Buddha music work that better matches the user's expectations is generated according to a preset Buddha music generation model and the Buddha music segments, which reduces the difficulty for the user of composing Buddha music and improves the efficiency of Buddha music generation. The scheme can be applied to the field of intelligent education, thereby promoting the construction of smart cities.
Fig. 3 and fig. 4 above describe the Buddha music generation apparatus in the embodiment of the present invention in detail from the perspective of modularized functional entities; the Buddha music generation device in the embodiment of the present invention is described in detail below from the perspective of hardware processing.
Fig. 5 is a schematic structural diagram of a Buddha music generation device according to an embodiment of the present invention. The Buddha music generation device 500 may vary considerably depending on configuration or performance, and may include one or more processors (central processing units, CPUs) 510 (e.g., one or more processors), a memory 520, and one or more storage media 530 (e.g., one or more mass storage devices) storing application programs 533 or data 532. The memory 520 and the storage medium 530 may be transitory or persistent storage. The program stored in the storage medium 530 may include one or more modules (not shown), each of which may include a series of instruction operations on the Buddha music generation device 500. Further, the processor 510 may be configured to communicate with the storage medium 530 and execute the series of instruction operations in the storage medium 530 on the Buddha music generation device 500.
The Buddha music generation device 500 may also include one or more power supplies 540, one or more wired or wireless network interfaces 550, one or more input/output interfaces 560, and/or one or more operating systems 531, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD and the like. Those skilled in the art will appreciate that the configuration shown in fig. 5 does not limit the device, which may include more or fewer components than shown, combine certain components, or arrange the components differently.
The present invention also provides a Buddha music generation device including a memory and a processor, the memory storing computer readable instructions that, when executed by the processor, cause the processor to perform the steps of the Buddha music generation method in the above embodiments.
The present invention also provides a computer readable storage medium, which may be a non-volatile or a volatile computer readable storage medium, having stored therein instructions that, when run on a computer, cause the computer to perform the steps of the Buddha music generation method.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied, in essence or in the part contributing to the prior art, in the form of a software product stored in a storage medium, including instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (7)

1. A method of generating Buddha music, the method comprising:
obtaining a Buddha music segment to be authored, wherein the Buddha music segment to be authored comprises a first Buddha music segment and a second Buddha music segment, and the start time of the second Buddha music segment is later than the end time of the first Buddha music segment;
calling a preset variational auto-encoder (VAE), converting the Buddha music segment to be authored into a latent variable, and decomposing the latent variable into a pitch variable and a rhythm variable;
wherein the calling a preset variational auto-encoder (VAE), converting the Buddha music segment to be authored into a latent variable, and decomposing the latent variable into a pitch variable and a rhythm variable comprises:
converting the Buddha music segment to be authored into a subsequence consisting of a pitch sequence P and a rhythm sequence R, wherein the pitch sequence P consists of the pitch types presented in the Buddha music segment to be authored, and the rhythm sequence R consists of the duration types presented in the Buddha music segment to be authored;
inputting the pitch sequence P and the rhythm sequence R into the preset variational auto-encoder (VAE) to generate the latent variable;
decomposing the latent variable into the pitch variable and the rhythm variable based on a preset factorized inference network;
calling a preset melody healer Inpainter, and predicting a corresponding Buddha music segment based on the background of Buddha music and the latent variable to obtain an intermediate Buddha music segment;
wherein the calling a preset melody healer Inpainter and predicting a corresponding Buddha music segment based on the background of Buddha music and the latent variable to obtain an intermediate Buddha music segment comprises:
calling the preset melody healer Inpainter to read the latent variable;
inputting the latent variable into a pitch gated recurrent unit (GRU) and a rhythm gated recurrent unit (GRU) to obtain a basic Buddha music segment;
generating the intermediate Buddha music segment based on the background of Buddha music and the basic Buddha music segment;
processing the intermediate Buddha music segment based on random unmasking to generate a target Buddha music segment, calling a preset Connector, and combining the target Buddha music segment with a preset Buddha music summary sketch to generate a target latent variable;
wherein the processing the intermediate Buddha music segment based on random unmasking to generate a target Buddha music segment, calling a preset Connector, and combining the target Buddha music segment with a preset Buddha music summary sketch to generate a target latent variable comprises:
controlling and modifying the intermediate Buddha music segment based on a preset random unmasking scheme to generate the target Buddha music segment;
calling the preset Connector to read the preset Buddha music summary sketch, wherein the preset Buddha music summary sketch comprises pitch and rhythm information input by a user;
combining the target Buddha music segment with the preset Buddha music summary sketch based on the preset Connector, generating the target latent variable and sending the target latent variable to a preset decoder;
and calling the preset decoder to decode the target latent variable to generate a final Buddha music work.
2. The method of claim 1, wherein the calling the preset decoder to decode the target latent variable to generate a final Buddha music work comprises:
calling the preset decoder to read the target latent variable;
and decoding the target latent variable based on the preset decoder to generate the final Buddha music work.
3. The method of generating Buddha music of claim 1 or 2, wherein the Buddha music segment to be authored comprises a past Buddha music segment and a future Buddha music segment, and wherein after the obtaining of the Buddha music segment to be authored, and before the calling of the preset variational auto-encoder (VAE) to convert the Buddha music segment to be authored into a latent variable and decompose the latent variable into a pitch part and a rhythm part, the method further comprises:
receiving the preset Buddha music summary sketch, the preset Buddha music summary sketch comprising pitch and rhythm information input by a user.
4. The method of claim 1, wherein after the calling the preset decoder to decode the target latent variable to generate the final Buddha music work, the method further comprises:
calculating a loss function l_i(θ, φ), where the specific formula is:
l_i(θ, φ) = -E_{z∼q_θ(z|x_i)}[log p_φ(x_i|z)] + KL(q_θ(z|x_i) ‖ p(z))
where θ is a parameter of the preset variational auto-encoder (VAE) and φ is a parameter of the preset decoder, θ denoting the mapping from x to z and φ the reconstruction from z to x; q_θ(z|x_i) is the posterior distribution of z derived from x; and p(z) is the prior distribution of z, assumed to be a Gaussian distribution N(0, 1) with mean 0 and variance 1.
5. A Buddha music generation apparatus that performs the Buddha music generation method according to claim 1, wherein the Buddha music generation apparatus comprises:
an obtaining module, configured to obtain a Buddha music segment to be authored, wherein the Buddha music segment to be authored comprises a past Buddha music segment and a future Buddha music segment;
a conversion module, configured to call a preset variational auto-encoder (VAE), convert the Buddha music segment to be authored into a latent variable, and decompose the latent variable into a pitch variable and a rhythm variable;
a prediction module, configured to call a preset melody healer Inpainter, and predict a corresponding Buddha music segment based on the background of Buddha music and the latent variable to obtain an intermediate Buddha music segment;
a processing module, configured to process the intermediate Buddha music segment based on random unmasking to generate a target Buddha music segment, call a preset Connector, and combine the target Buddha music segment with a preset Buddha music summary sketch to generate a target latent variable;
and a decoding module, configured to call a preset decoder to decode the target latent variable to generate a final Buddha music work.
6. A Buddha music generation device, wherein the Buddha music generation device comprises: a memory and at least one processor, the memory having instructions stored therein;
the at least one processor invokes the instructions in the memory to cause the Buddha music generation device to perform the Buddha music generation method of any of claims 1-4.
7. A computer readable storage medium having instructions stored thereon, which, when executed by a processor, implement the Buddha music generation method of any of claims 1-4.
Publications (2)

Publication Number Publication Date
CN113077770A (en) 2021-07-06
CN113077770B (en) 2024-03-05
Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant