CN112951239A - Buddhist music generation method, device, equipment and storage medium based on an attention model - Google Patents

Buddhist music generation method, device, equipment and storage medium based on an attention model

Info

Publication number
CN112951239A
CN112951239A (application CN202110311437.7A)
Authority
CN
China
Prior art keywords
matrix
attention
generating
vector
output matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110311437.7A
Other languages
Chinese (zh)
Other versions
CN112951239B (en)
Inventor
刘奡智
郭锦岳
韩宝强
肖京
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202110311437.7A priority Critical patent/CN112951239B/en
Publication of CN112951239A publication Critical patent/CN112951239A/en
Application granted granted Critical
Publication of CN112951239B publication Critical patent/CN112951239B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/26 Speech to text systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H 1/00 Details of electrophonic musical instruments
    • G10H 1/0033 Recording/reproducing or transmission of music for electrophonic musical instruments
    • G10H 1/0041 Recording/reproducing or transmission of music for electrophonic musical instruments in coded form
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/08 Speech classification or search
    • G10L 15/16 Speech classification or search using artificial neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Human Computer Interaction (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Auxiliary Devices For Music (AREA)

Abstract

The invention relates to the field of artificial intelligence and discloses a Buddhist music generation method, device, equipment and storage medium based on an attention model, which adopt an improved relative self-attention algorithm in the autoencoder so that the generated Buddhist music works better conform to the rules of traditional music and the accuracy of the lyric tones and rhymes is improved. The attention-model-based Buddhist music generation method comprises the following steps: acquiring an original audio file; extracting the lyric characters based on the original audio file and generating a plurality of individual events; generating a query matrix Q, a key matrix K and a value matrix V; generating the relative attention matrix z_h of each segment, generating an output matrix z, and generating a weighting result from the embedded vector x and the output matrix z; generating an encoder output matrix a_e; and inputting the encoder output matrix a_e into a preset decoder to obtain a target output matrix, converting the target output matrix into a MIDI file, and generating the final Buddhist music work. In addition, the invention also relates to blockchain technology, and the generated Buddhist music works can be stored in blockchain nodes.

Description

Buddhist music generation method, device, equipment and storage medium based on an attention model
Technical Field
The invention relates to the field of audio conversion, and in particular to a Buddhist music generation method, device, equipment and storage medium based on an attention model.
Background
Buddhist music is a highly distinctive cultural form in China that embodies the unique cultural features of the Chinese cultural sphere. In various regions, traditional Buddhist music has combined local culture with phonological features such as fixed tune patterns (qupai) and lyric meters, developing a unique musical structure in which the rhythm of poetry and song is seamlessly integrated and a high level of artistic expressiveness is achieved.
Existing music generation methods can generate music fragments with a certain long-range structure. However, because most researchers are concentrated in Western countries, the proposed models focus mainly on Western classical music. When these existing models are applied to traditional Chinese musical styles, they fail to reflect the special relationship between lyrics and melody in traditional Chinese Buddhist music, and problems such as mismatched tone and rhythm and unclear articulation of the lyrics arise.
Disclosure of Invention
The invention provides a Buddhist music generation method, device, equipment and storage medium based on an attention model, which adopt an improved relative self-attention algorithm in the autoencoder so that the generated Buddhist music works better conform to the rules of traditional music and the accuracy of the lyric tones and rhymes is improved.
The first aspect of the invention provides a Buddhist music generation method based on an attention model, which comprises the following steps: acquiring an original audio file, wherein the original audio file is a Musical Instrument Digital Interface (MIDI) file of Buddhist music; extracting the lyric characters based on the original audio file, looking up the tones corresponding to the lyric characters according to a preset pronunciation table, and generating a plurality of individual events; taking each individual event as an embedded vector x, generating a plurality of groups of query vectors q, key vectors k and value vectors v corresponding to the embedded vectors x based on a preset vector formula, and stacking the plurality of groups of query vectors q, key vectors k and value vectors v to obtain the corresponding query matrix Q, key matrix K and value matrix V; generating the relative attention matrix z_h of each segment of the attention unit based on a preset relative attention matrix calculation formula, generating an output matrix z, generating a weighting result from the embedded vector x and the output matrix z, and inputting the weighting result into a feedforward neural network; iterating a preset number of times, and generating an encoder output matrix a_e based on the output matrix z obtained in the last iteration; and inputting the encoder output matrix a_e into a preset decoder to obtain a target output matrix, converting the target output matrix into a MIDI file, and generating the final Buddhist music work.
Optionally, in a first implementation manner of the first aspect of the invention, the extracting the lyric characters based on the original audio file, looking up the tones corresponding to the lyric characters according to a preset pronunciation table, and generating a plurality of individual events comprises: extracting the lyric characters corresponding to each melody note and its timestamp based on the original audio file to obtain a plurality of groups of lyric characters; looking up the lyric tones corresponding to the plurality of groups of lyric characters according to a Hakka pronunciation table to obtain a plurality of groups of lyric tones; and generating a plurality of individual events based on the plurality of groups of lyric tones, each individual event containing a group of lyric tones and the melody notes and timestamps corresponding to that group of lyric tones.
Optionally, in a second implementation manner of the first aspect of the invention, the taking each individual event as an embedded vector x, generating a plurality of groups of query vectors q, key vectors k and value vectors v corresponding to the embedded vectors x based on a preset vector formula, and stacking them to obtain the corresponding query matrix Q, key matrix K and value matrix V comprises: taking each individual event as an embedded vector x and calculating a plurality of groups of query vectors q, key vectors k and value vectors v corresponding to the embedded vectors x based on a preset vector formula, wherein the preset vector formula is: q = W_Q·x, k = W_K·x, v = W_V·x, where W_Q, W_K and W_V are preset parameter matrices, each of size 512 × n, n is the length of the embedded vector x, and the generated query vector q, key vector k and value vector v are each vectors of length 512; and stacking the query vectors q, key vectors k and value vectors v generated from all the individual events in a 512 × n manner to obtain the corresponding query matrix Q, key matrix K and value matrix V.
Optionally, in a third implementation manner of the first aspect of the invention, the generating the relative attention matrix z_h of each segment of the attention unit based on a preset relative attention matrix calculation formula, generating an output matrix z, generating a weighting result from the embedded vector x and the output matrix z, and inputting the weighting result into a feedforward neural network comprises: calculating the relative attention matrix z_h of each segment of the attention unit based on a preset relative attention matrix calculation formula, the attention unit comprising eight segments, each segment having a length of one eighth of the attention unit, the preset relative attention matrix calculation formula being:
z_h = Softmax((Q·K^T + S_rel) / √D_h)·V
where D_h is the length of the key vector divided by the length of an attention unit segment, and S_rel is the relative position matrix of each segment, the relative position matrix having dimensions of 512 × 512; and connecting the relative attention matrices z_h of the segments in sequence to generate the output matrix z, performing a weighted addition of the embedded vector x, taken as a residual, with the output matrix z to generate the weighting result, and inputting the weighting result into the feedforward neural network.
Optionally, in a fourth implementation manner of the first aspect of the invention, the iterating a preset number of times and generating the encoder output matrix a_e based on the output matrix z obtained in the last iteration comprises: stacking the query vectors q, key vectors k and value vectors v the preset number of times and iteratively computing the relative attention matrix z_h to generate the output matrix z; and splitting the output matrix z obtained in the last iteration into h parts and adding them in sequence to obtain the encoder output matrix a_e.
Optionally, in a fifth implementation manner of the first aspect of the invention, the inputting the encoder output matrix a_e into a preset decoder to obtain a target output matrix, converting the target output matrix into a MIDI file, and generating the final Buddhist music work comprises: inputting the encoder output matrix a_e into the preset decoder, and generating the target output matrix based on the attention units and feedforward neural network of the preset decoder; and performing the inverse operation of the embedded vector based on the target output matrix, outputting a MIDI file, and generating the final Buddhist music work.
Optionally, in a sixth implementation manner of the first aspect of the invention, before the encoder output matrix a_e is input into the preset decoder, the method further comprises: inserting an intermediate attention layer in the preset decoder, the intermediate attention layer comprising attention units, the input of the first h-1 units of the intermediate attention layer being z_h and the input of the last unit of the intermediate attention layer being a_e.
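For illustration only and not as a limitation of the claimed method, the following minimal Python sketch shows one way such an intermediate attention layer could route its inputs; the attention unit is abstracted as a callable passed in by the caller, which is an assumption made purely for the sketch.

    def intermediate_attention_layer(z_heads, a_e, attention_unit):
        # z_heads: the h-1 per-segment relative attention matrices z_h from the encoder
        # a_e: the encoder output matrix, fed only to the last unit
        outputs = [attention_unit(z_h) for z_h in z_heads]  # first h-1 units take z_h
        outputs.append(attention_unit(a_e))                 # the last unit takes a_e
        return outputs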
The second aspect of the invention provides a Buddhist music generation device based on an attention model, comprising: an acquisition module, configured to acquire an original audio file, wherein the original audio file is a Musical Instrument Digital Interface (MIDI) file of Buddhist music; an extraction module, configured to extract the lyric characters based on the original audio file, look up the tones corresponding to the lyric characters according to a preset pronunciation table, and generate a plurality of individual events; a first generation module, configured to take each individual event as an embedded vector x, generate a plurality of groups of query vectors q, key vectors k and value vectors v corresponding to the embedded vectors x based on a preset vector formula, and stack them to obtain the corresponding query matrix Q, key matrix K and value matrix V; a second generation module, configured to generate the relative attention matrix z_h of each segment of the attention unit based on a preset relative attention matrix calculation formula, generate an output matrix z, generate a weighting result from the embedded vector x and the output matrix z, and input the weighting result into a feedforward neural network; an iteration module, configured to iterate a preset number of times and generate an encoder output matrix a_e based on the output matrix z obtained in the last iteration; and a decoding module, configured to input the encoder output matrix a_e into a preset decoder to obtain a target output matrix, convert the target output matrix into a MIDI file, and generate the final Buddhist music work.
Optionally, in a first implementation manner of the second aspect of the invention, the extraction module comprises: an extraction unit, configured to extract the lyric characters corresponding to each melody note and its timestamp based on the original audio file to obtain a plurality of groups of lyric characters; a lookup unit, configured to look up the lyric tones corresponding to the plurality of groups of lyric characters according to a Hakka pronunciation table to obtain a plurality of groups of lyric tones; and a generation unit, configured to generate a plurality of individual events based on the plurality of groups of lyric tones, each individual event containing a group of lyric tones and the melody notes and timestamps corresponding to that group of lyric tones.
Optionally, in a second implementation manner of the second aspect of the invention, the first generation module comprises: a first calculation unit, configured to take each individual event as an embedded vector x and generate a plurality of groups of query vectors q, key vectors k and value vectors v corresponding to the embedded vectors x based on a preset vector formula, wherein the preset vector formula is: q = W_Q·x, k = W_K·x, v = W_V·x, where W_Q, W_K and W_V are preset parameter matrices, each of size 512 × n, n is the length of the embedded vector x, and the generated query vector q, key vector k and value vector v are each vectors of length 512; and a first stacking unit, configured to stack the query vectors q, key vectors k and value vectors v generated from all the individual events in a 512 × n manner to obtain the corresponding query matrix Q, key matrix K and value matrix V.
Optionally, in a third implementation manner of the second aspect of the invention, the second generation module comprises: a second calculation unit, configured to calculate the relative attention matrix z_h of each segment of the attention unit based on a preset relative attention matrix calculation formula, the attention unit comprising eight segments, each segment having a length of one eighth of the attention unit, the preset relative attention matrix calculation formula being:
z_h = Softmax((Q·K^T + S_rel) / √D_h)·V
where D_h is the length of the key vector divided by the length of an attention unit segment, and S_rel is the relative position matrix of each segment, the relative position matrix having dimensions of 512 × 512; and a connection unit, configured to connect the relative attention matrices z_h of the segments in sequence to generate the output matrix z, perform a weighted addition of the embedded vector x, taken as a residual, with the output matrix z to generate the weighting result, and input the weighting result into the feedforward neural network.
Optionally, in a fourth implementation manner of the second aspect of the invention, the iteration module comprises: a second stacking unit, configured to stack the query vectors q, key vectors k and value vectors v a preset number of times and iteratively compute the relative attention matrix z_h to generate the output matrix z; and a splitting unit, configured to split the output matrix z obtained in the last iteration into h parts and add them in sequence to obtain the encoder output matrix a_e.
Optionally, in a fifth implementation manner of the second aspect of the invention, the decoding module comprises: an input unit, configured to input the encoder output matrix a_e into a preset decoder and generate the target output matrix based on the attention units and feedforward neural network of the preset decoder; and an output unit, configured to perform the inverse operation of the embedded vector based on the target output matrix, output the MIDI file, and generate the final Buddhist music work.
Optionally, in a sixth implementation manner of the second aspect of the invention, before the encoder output matrix a_e is input into the preset decoder, the device further comprises:
an insertion module, configured to insert an intermediate attention layer in the preset decoder, the intermediate attention layer comprising attention units, the input of the first h-1 units of the intermediate attention layer being z_h and the input of the last unit of the intermediate attention layer being a_e.
The third aspect of the invention provides a Buddhist music generation equipment based on an attention model, comprising: a memory and at least one processor, the memory having instructions stored therein; the at least one processor invokes the instructions in the memory to cause the attention-model-based Buddhist music generation equipment to perform the above attention-model-based Buddhist music generation method.
The fourth aspect of the invention provides a computer-readable storage medium having instructions stored therein which, when run on a computer, cause the computer to execute the above attention-model-based Buddhist music generation method.
In the technical scheme provided by the invention, an original audio file is acquired, wherein the original audio file is a Musical Instrument Digital Interface (MIDI) file of Buddhist music; the lyric characters are extracted based on the original audio file, the tones corresponding to the lyric characters are looked up according to a preset pronunciation table, and a plurality of individual events are generated; each individual event is taken as an embedded vector x, a plurality of groups of query vectors q, key vectors k and value vectors v corresponding to the embedded vectors x are generated based on a preset vector formula, and the plurality of groups of query vectors q, key vectors k and value vectors v are stacked to obtain the corresponding query matrix Q, key matrix K and value matrix V; the relative attention matrix z_h of each segment of the attention unit is generated based on a preset relative attention matrix calculation formula, an output matrix z is generated, a weighting result is generated from the embedded vector x and the output matrix z, and the weighting result is input into a feedforward neural network; iteration is carried out a preset number of times, and an encoder output matrix a_e is generated based on the output matrix z obtained in the last iteration; the encoder output matrix a_e is input into a preset decoder to obtain a target output matrix, the target output matrix is converted into a MIDI file, and the final Buddhist music work is generated. In the embodiments of the invention, an improved relative self-attention algorithm is adopted in the autoencoder, so that the generated Buddhist music works better conform to the rules of traditional music and the accuracy of the lyric tones and rhymes is improved.
Drawings
FIG. 1 is a schematic diagram of an embodiment of the attention-model-based Buddhist music generation method in an embodiment of the invention;
FIG. 2 is a schematic diagram of another embodiment of the attention-model-based Buddhist music generation method in an embodiment of the invention;
FIG. 3 is a schematic diagram of an embodiment of the attention-model-based Buddhist music generation device in an embodiment of the invention;
FIG. 4 is a schematic diagram of another embodiment of the attention-model-based Buddhist music generation device in an embodiment of the invention;
FIG. 5 is a schematic diagram of an embodiment of the attention-model-based Buddhist music generation equipment in an embodiment of the invention.
Detailed Description
The embodiments of the invention provide a Buddhist music generation method, device, equipment and storage medium based on an attention model, which adopt an improved relative self-attention algorithm in the autoencoder so that the generated Buddhist music works better conform to the rules of traditional music and the accuracy of the lyric tones and rhymes is improved.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein. Furthermore, the terms "comprises," "comprising," or "having," and any variations thereof, are intended to cover non-exclusive inclusions, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
For ease of understanding, a specific flow of an embodiment of the invention is described below. Referring to FIG. 1, an embodiment of the attention-model-based Buddhist music generation method in an embodiment of the invention comprises:
101. Acquire an original audio file, wherein the original audio file is a Musical Instrument Digital Interface (MIDI) file of Buddhist music.
The server acquires an original audio file, which is a MIDI file of Buddhist music, from a relevant Buddhist website or a preset music library. The Musical Instrument Digital Interface (MIDI) is a communication standard used to define the information and control signals exchanged among computer music programs, synthesizers and other electronic sound equipment. A MIDI file contains the played-note information of each channel, such as key and channel number, duration, volume and velocity. Because a MIDI file is a series of instructions rather than a waveform, it requires very little disk space, and editing and modifying MIDI data is very flexible: a note can easily be added or deleted, or its attributes changed.
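For illustration only and not as part of the claimed method, the following minimal Python sketch shows how the note and lyric meta-events of such a MIDI file could be inspected; the mido library, the file name and the printed fields are assumptions made for the sketch.

    import mido  # third-party MIDI library, assumed available

    mid = mido.MidiFile("buddhist_chant.mid")  # hypothetical input file
    abs_time = 0
    for msg in mido.merge_tracks(mid.tracks):
        abs_time += msg.time  # delta times accumulate into a timestamp in ticks
        if msg.type == "lyrics":                          # lyric character attached to the melody
            print(abs_time, "lyric:", msg.text)
        elif msg.type == "note_on" and msg.velocity > 0:  # melody note
            print(abs_time, "note:", msg.note, "velocity:", msg.velocity)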
It is to be understood that the executing subject of the invention may be the attention-model-based Buddhist music generation device, or a terminal or a server, which is not limited herein. The embodiments of the invention are described with a server as the executing subject.
102. Extract the lyric characters based on the original audio file, and look up the tones corresponding to the lyric characters according to a preset pronunciation table to generate a plurality of individual events.
The server extracts the lyric characters based on the original audio file, looks up the tones corresponding to the lyric characters according to a preset pronunciation table, and generates a plurality of individual events. Specifically, the server extracts the lyric characters from the MIDI file of the original audio and looks up the tones corresponding to the lyric characters according to a Meizhou Hakka pronunciation table, where the Meizhou dialect is a Hakka dialect spoken in the Meizhou area of Guangdong Province, and the Meizhou Hakka pronunciation table contains the corresponding initial, final and tone information.
103. Take each individual event as an embedded vector x, generate a plurality of groups of query vectors q, key vectors k and value vectors v corresponding to the embedded vectors x based on a preset vector formula, and stack the query vectors q, key vectors k and value vectors v to obtain the corresponding query matrix Q, key matrix K and value matrix V.
The server takes each individual event as an embedded vector x, generates a plurality of groups of query vectors q, key vectors k and value vectors v corresponding to the embedded vectors x based on a preset vector formula, and stacks them to obtain the corresponding query matrix Q, key matrix K and value matrix V. Specifically, the server takes each individual event as an embedded vector x and calculates the corresponding query vector q, key vector k and value vector v based on the preset vector formula: q = W_Q·x, k = W_K·x, v = W_V·x, where W_Q, W_K and W_V are preset parameter matrices, each of size 512 × n, n is the length of the embedded vector x, and the generated query vector q, key vector k and value vector v are each vectors of length 512. The server then stacks the query vectors q, key vectors k and value vectors v generated from all the individual events in a 512 × n manner to obtain the corresponding query matrix Q, key matrix K and value matrix V.
That is, the query vector q, key vector k and value vector v are generated by multiplying the embedded vector x by the respective preset parameter matrices; the server stacks the generated query vectors q, key vectors k and value vectors v in a 512 × n manner and finally obtains the corresponding query matrix Q, key matrix K and value matrix V.
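As an illustrative sketch only, the per-event projection and stacking described above can be written as follows; the random parameter values, the embedding length n and the number of events are placeholders, while the dimension 512 follows the text.

    import numpy as np

    d, n, num_events = 512, 32, 16                      # n and num_events are assumed
    W_Q, W_K, W_V = [np.random.randn(d, n) * 0.02 for _ in range(3)]  # preset 512 x n parameter matrices

    events = [np.random.randn(n) for _ in range(num_events)]  # placeholder embedded vectors x

    # q = W_Q x, k = W_K x, v = W_V x for every individual event
    qs = [W_Q @ x for x in events]
    ks = [W_K @ x for x in events]
    vs = [W_V @ x for x in events]

    # stack the per-event vectors column-wise into the query, key and value matrices
    Q = np.stack(qs, axis=1)                            # 512 x num_events
    K = np.stack(ks, axis=1)
    V = np.stack(vs, axis=1)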
104. Generate the relative attention matrix z_h of each segment of the attention unit based on a preset relative attention matrix calculation formula, generate an output matrix z, generate a weighting result from the embedded vector x and the output matrix z, and input the weighting result into the feedforward neural network.
The server generates the relative attention matrix z_h of each segment of the attention unit based on a preset relative attention matrix calculation formula, generates an output matrix z, generates a weighting result from the embedded vector x and the output matrix z, and inputs the weighting result into the feedforward neural network. Specifically, the server calculates the relative attention matrix z_h of each segment of the attention unit based on the preset relative attention matrix calculation formula, the attention unit comprising eight segments, each segment having a length of one eighth of the attention unit. The preset relative attention matrix calculation formula is:
z_h = Softmax((Q·K^T + S_rel) / √D_h)·V
where D_h is the length of the key vector divided by the length of an attention unit segment, and S_rel is the relative position matrix of each segment, with dimensions of 512 × 512. The server connects the relative attention matrices z_h of the segments in sequence to generate the output matrix z, performs a weighted addition of the embedded vector x, taken as a residual, with the output matrix z to generate the weighting result, and inputs the weighting result into the feedforward neural network.
That is, the embedded vector x is used as a residual, a weighted sum of the residual and the output matrix z is computed according to preset weights, and the resulting weighting result is input into the feedforward neural network.
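A minimal sketch of the per-segment relative attention and the residual weighting follows, assuming the formula reconstructed above, a row-wise softmax and equal preset weights; the matrix shapes are simplified and chosen only for the example.

    import numpy as np

    def softmax(x, axis=-1):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    def relative_attention(Q, K, V, S_rel, D_h):
        # z_h = Softmax((Q K^T + S_rel) / sqrt(D_h)) V
        logits = (Q @ K.T + S_rel) / np.sqrt(D_h)
        return softmax(logits, axis=-1) @ V

    L, D_h, h = 512, 64, 8                          # placeholder sizes for one segment
    Q = np.random.randn(L, D_h)
    K = np.random.randn(L, D_h)
    V = np.random.randn(L, D_h)
    S_rel = np.random.randn(L, L)                   # relative position matrix of the segment
    z_h = relative_attention(Q, K, V, S_rel, D_h)

    # residual weighting: the embedded input is weighted and added to the connected output z
    z = np.concatenate([z_h] * h, axis=-1)          # stand-in for the eight connected segments
    x = np.random.randn(*z.shape)                   # placeholder embedding used as the residual
    weighted = 0.5 * x + 0.5 * z                    # assumed equal preset weights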
105. Iterate a preset number of times, and generate an encoder output matrix a_e based on the output matrix z obtained in the last iteration.
The server iterates a preset number of times and generates the encoder output matrix a_e based on the output matrix z obtained in the last iteration. Specifically, the query vectors q, key vectors k and value vectors v are stacked the preset number of times and the relative attention matrix z_h is iteratively computed to generate the output matrix z; the output matrix z obtained in the last iteration is split into h parts, which are added in sequence to obtain the encoder output matrix a_e.
That is, the server stacks the vectors and computes the relative attention matrix z_h repeatedly, and the output matrix z obtained in the last iteration is split into h parts and added in sequence to obtain the encoder output matrix a_e.
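The final reduction step can be sketched as follows, assuming the output matrix z concatenates the h = 8 segment outputs along its last dimension; the shapes are placeholders.

    import numpy as np

    h = 8
    z = np.random.randn(512, 64 * h)     # placeholder for the output matrix z of the last iteration
    parts = np.split(z, h, axis=-1)      # split z into h equal pieces
    a_e = np.sum(parts, axis=0)          # add them in sequence to obtain the encoder output matrix a_e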
106. Input the encoder output matrix a_e into a preset decoder to obtain a target output matrix, convert the target output matrix into a MIDI file, and generate the final Buddhist music work.
The server inputs the encoder output matrix a_e into a preset decoder to obtain a target output matrix, converts the target output matrix into a MIDI file, and generates the final Buddhist music work. Specifically, the server inputs the encoder output matrix a_e into the preset decoder and generates the target output matrix based on the attention units and feedforward neural network of the preset decoder; the server then performs the inverse operation of the embedded vector based on the target output matrix, outputs a MIDI file, and generates the final Buddhist music work.
In the decoder, the tone information in the embedded matrix of the encoder output a_e is fixed as the sung-lyric information of the target character, while the pitch and time-position information of the melody is generated by the model; the final output matrix is then converted back into a MIDI file to output the final Buddhist music work.
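For illustration only, a minimal sketch of the inverse step, turning decoded events back into a MIDI file; the event fields and the use of the mido library are assumptions and do not reproduce the patented conversion.

    import mido

    def events_to_midi(events, path="buddhist_output.mid"):
        # events: list of dicts with hypothetical fields: pitch, delta (ticks), duration (ticks)
        mid = mido.MidiFile()
        track = mido.MidiTrack()
        mid.tracks.append(track)
        for ev in events:
            track.append(mido.Message("note_on", note=ev["pitch"], velocity=64, time=ev["delta"]))
            track.append(mido.Message("note_off", note=ev["pitch"], velocity=64, time=ev["duration"]))
        mid.save(path)

    events_to_midi([{"pitch": 60, "delta": 0, "duration": 480},
                    {"pitch": 62, "delta": 0, "duration": 480}])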
In the embodiment of the invention, an improved relative self-attention algorithm is adopted in the autoencoder, so that the generated Buddhist music works better conform to the rules of traditional music and the accuracy of the lyric tones and rhymes is improved.
Referring to FIG. 2, another embodiment of the attention-model-based Buddhist music generation method in an embodiment of the invention comprises:
201. Acquire an original audio file, wherein the original audio file is a Musical Instrument Digital Interface (MIDI) file of Buddhist music.
The server acquires an original audio file, which is a MIDI file of Buddhist music, from a relevant Buddhist website or a preset music library. The Musical Instrument Digital Interface (MIDI) is a communication standard used to define the information and control signals exchanged among computer music programs, synthesizers and other electronic sound equipment. A MIDI file contains the played-note information of each channel, such as key and channel number, duration, volume and velocity. Because a MIDI file is a series of instructions rather than a waveform, it requires very little disk space, and editing and modifying MIDI data is very flexible: a note can easily be added or deleted, or its attributes changed.
It is to be understood that the executing subject of the invention may be the attention-model-based Buddhist music generation device, or a terminal or a server, which is not limited herein. The embodiments of the invention are described with a server as the executing subject.
202. Extract the lyric characters corresponding to each melody note and its timestamp based on the original audio file to obtain a plurality of groups of lyric characters.
The server extracts the lyric characters corresponding to each melody note and its timestamp based on the original audio file to obtain a plurality of groups of lyric characters. That is, the server extracts the melody notes from the original audio file, determines the timestamp corresponding to each melody note and the lyric character corresponding to each melody note, and thereby obtains a plurality of groups of lyric characters.
203. Look up the lyric tones corresponding to the plurality of groups of lyric characters according to the Hakka pronunciation table to obtain a plurality of groups of lyric tones.
The server looks up the lyric tones corresponding to the plurality of groups of lyric characters according to the Hakka pronunciation table and obtains a plurality of groups of lyric tones. The server refers to the six-tone system of Meizhou Hakka, using yin ping, yang ping, shang sheng, qu sheng, yin ru and yang ru as the tone and rhyme labels of the lyrics; these labels are added to the embedding matrix of the attention model as one of the input parameters, and the lyric tones corresponding to the plurality of groups of lyric characters are looked up based on the Hakka pronunciation table.
204. Based on the plurality of groups of lyric tones, generate a plurality of individual events, each individual event containing a group of lyric tones and the melody notes and timestamps corresponding to that group of lyric tones.
The server generates a plurality of individual events based on the plurality of groups of lyric tones, each individual event containing a group of lyric tones and the melody notes and timestamps corresponding to that group of lyric tones. Each individual event is composed of the lyric tone looked up for a lyric character, the corresponding melody note and the corresponding timestamp, and together the events form an event sequence of finite length.
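An illustrative sketch of assembling individual events from lyric characters, melody notes and timestamps follows; the tiny tone table is hypothetical and merely stands in for the Meizhou Hakka pronunciation table.

    # hypothetical fragment of a Hakka pronunciation table: character -> tone label
    HAKKA_TONES = {"南": "yang ping", "无": "yang ping", "佛": "yang ru"}

    def build_events(aligned):
        # aligned: list of (lyric_character, midi_pitch, timestamp) triples from the MIDI file
        events = []
        for char, pitch, ts in aligned:
            tone = HAKKA_TONES.get(char, "unknown")   # look up the lyric tone
            events.append({"char": char, "tone": tone, "pitch": pitch, "time": ts})
        return events

    print(build_events([("南", 62, 0.0), ("无", 64, 0.5)]))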
205. Take each individual event as an embedded vector x, generate a plurality of groups of query vectors q, key vectors k and value vectors v corresponding to the embedded vectors x based on a preset vector formula, and stack the query vectors q, key vectors k and value vectors v to obtain the corresponding query matrix Q, key matrix K and value matrix V.
The server takes each individual event as an embedded vector x, generates a plurality of groups of query vectors q, key vectors k and value vectors v corresponding to the embedded vectors x based on a preset vector formula, and stacks them to obtain the corresponding query matrix Q, key matrix K and value matrix V. Specifically, the server takes each individual event as an embedded vector x and calculates the corresponding query vector q, key vector k and value vector v based on the preset vector formula: q = W_Q·x, k = W_K·x, v = W_V·x, where W_Q, W_K and W_V are preset parameter matrices, each of size 512 × n, n is the length of the embedded vector x, and the generated query vector q, key vector k and value vector v are each vectors of length 512. The server then stacks the query vectors q, key vectors k and value vectors v generated from all the individual events in a 512 × n manner to obtain the corresponding query matrix Q, key matrix K and value matrix V.
That is, the query vector q, key vector k and value vector v are generated by multiplying the embedded vector x by the respective preset parameter matrices; the server stacks the generated query vectors q, key vectors k and value vectors v in a 512 × n manner and finally obtains the corresponding query matrix Q, key matrix K and value matrix V.
206. Generate the relative attention matrix z_h of each segment of the attention unit based on a preset relative attention matrix calculation formula, generate an output matrix z, generate a weighting result from the embedded vector x and the output matrix z, and input the weighting result into the feedforward neural network.
The server generates the relative attention matrix z_h of each segment of the attention unit based on a preset relative attention matrix calculation formula, generates an output matrix z, generates a weighting result from the embedded vector x and the output matrix z, and inputs the weighting result into the feedforward neural network. Specifically, the server calculates the relative attention matrix z_h of each segment of the attention unit based on the preset relative attention matrix calculation formula, the attention unit comprising eight segments, each segment having a length of one eighth of the attention unit. The preset relative attention matrix calculation formula is:
z_h = Softmax((Q·K^T + S_rel) / √D_h)·V
where D_h is the length of the key vector divided by the length of an attention unit segment, and S_rel is the relative position matrix of each segment, with dimensions of 512 × 512. The server connects the relative attention matrices z_h of the segments in sequence to generate the output matrix z, performs a weighted addition of the embedded vector x, taken as a residual, with the output matrix z to generate the weighting result, and inputs the weighting result into the feedforward neural network.
That is, the embedded vector x is used as a residual, a weighted sum of the residual and the output matrix z is computed according to preset weights, and the resulting weighting result is input into the feedforward neural network.
207. Iterate a preset number of times, and generate an encoder output matrix a_e based on the output matrix z obtained in the last iteration.
The server iterates a preset number of times and generates the encoder output matrix a_e based on the output matrix z obtained in the last iteration. Specifically, the query vectors q, key vectors k and value vectors v are stacked the preset number of times and the relative attention matrix z_h is iteratively computed to generate the output matrix z; the output matrix z obtained in the last iteration is split into h parts, which are added in sequence to obtain the encoder output matrix a_e.
That is, the server stacks the vectors and computes the relative attention matrix z_h repeatedly, and the output matrix z obtained in the last iteration is split into h parts and added in sequence to obtain the encoder output matrix a_e.
208. Input the encoder output matrix a_e into a preset decoder to obtain a target output matrix, convert the target output matrix into a MIDI file, and generate the final Buddhist music work.
The server inputs the encoder output matrix a_e into a preset decoder to obtain a target output matrix, converts the target output matrix into a MIDI file, and generates the final Buddhist music work. Specifically, the server inputs the encoder output matrix a_e into the preset decoder and generates the target output matrix based on the attention units and feedforward neural network of the preset decoder; the server then performs the inverse operation of the embedded vector based on the target output matrix, outputs a MIDI file, and generates the final Buddhist music work.
In the decoder, the tone information in the embedded matrix of the encoder output a_e is fixed as the sung-lyric information of the target character, while the pitch and time-position information of the melody is generated by the model; the final output matrix is then converted back into a MIDI file to output the final Buddhist music work.
In the embodiment of the invention, an improved relative self-attention algorithm is adopted in the autoencoder, so that the generated Buddhist music works better conform to the rules of traditional music and the accuracy of the lyric tones and rhymes is improved.
The attention-model-based Buddhist music generation method in the embodiments of the invention is described above. Referring to FIG. 3, the attention-model-based Buddhist music generation device in the embodiments of the invention is described below; an embodiment of the attention-model-based Buddhist music generation device in an embodiment of the invention comprises:
an acquisition module 301, configured to acquire an original audio file, wherein the original audio file is a Musical Instrument Digital Interface (MIDI) file of Buddhist music;
an extraction module 302, configured to extract the lyric characters based on the original audio file, look up the tones corresponding to the lyric characters according to a preset pronunciation table, and generate a plurality of individual events;
a first generation module 303, configured to take each individual event as an embedded vector x, generate a plurality of groups of query vectors q, key vectors k and value vectors v corresponding to the embedded vectors x based on a preset vector formula, and stack them to obtain the corresponding query matrix Q, key matrix K and value matrix V;
a second generation module 304, configured to generate the relative attention matrix z_h of each segment of the attention unit based on a preset relative attention matrix calculation formula, generate an output matrix z, generate a weighting result from the embedded vector x and the output matrix z, and input the weighting result into a feedforward neural network;
an iteration module 305, configured to iterate a preset number of times and generate an encoder output matrix a_e based on the output matrix z obtained in the last iteration;
a decoding module 306, configured to input the encoder output matrix a_e into a preset decoder to obtain a target output matrix, convert the target output matrix into a MIDI file, and generate the final Buddhist music work.
In the embodiment of the invention, an improved relative self-attention algorithm is adopted in the autoencoder, so that the generated Buddhist music works better conform to the rules of traditional music and the accuracy of the lyric tones and rhymes is improved.
Referring to FIG. 4, another embodiment of the attention-model-based Buddhist music generation device in an embodiment of the invention comprises:
an acquisition module 301, configured to acquire an original audio file, wherein the original audio file is a Musical Instrument Digital Interface (MIDI) file of Buddhist music;
an extraction module 302, configured to extract the lyric characters based on the original audio file, look up the tones corresponding to the lyric characters according to a preset pronunciation table, and generate a plurality of individual events;
a first generation module 303, configured to take each individual event as an embedded vector x, generate a plurality of groups of query vectors q, key vectors k and value vectors v corresponding to the embedded vectors x based on a preset vector formula, and stack them to obtain the corresponding query matrix Q, key matrix K and value matrix V;
a second generation module 304, configured to generate the relative attention matrix z_h of each segment of the attention unit based on a preset relative attention matrix calculation formula, generate an output matrix z, generate a weighting result from the embedded vector x and the output matrix z, and input the weighting result into a feedforward neural network;
an iteration module 305, configured to iterate a preset number of times and generate an encoder output matrix a_e based on the output matrix z obtained in the last iteration;
a decoding module 306, configured to input the encoder output matrix a_e into a preset decoder to obtain a target output matrix, convert the target output matrix into a MIDI file, and generate the final Buddhist music work.
Optionally, the extraction module 302 comprises:
an extraction unit 3021, configured to extract the lyric characters corresponding to each melody note and its timestamp based on the original audio file to obtain a plurality of groups of lyric characters;
a lookup unit 3022, configured to look up the lyric tones corresponding to the plurality of groups of lyric characters according to the Hakka pronunciation table to obtain a plurality of groups of lyric tones;
a generation unit 3023, configured to generate a plurality of individual events based on the plurality of groups of lyric tones, each individual event containing a group of lyric tones and the melody notes and timestamps corresponding to that group of lyric tones.
Optionally, the first generation module 303 comprises:
a first calculation unit 3031, configured to take each individual event as an embedded vector x and generate a plurality of groups of query vectors q, key vectors k and value vectors v corresponding to the embedded vectors x based on a preset vector formula, wherein the preset vector formula is: q = W_Q·x, k = W_K·x, v = W_V·x, where W_Q, W_K and W_V are preset parameter matrices, each of size 512 × n, n is the length of the embedded vector x, and the generated query vector q, key vector k and value vector v are each vectors of length 512;
a first stacking unit 3032, configured to stack the query vectors q, key vectors k and value vectors v generated from all the individual events in a 512 × n manner to obtain the corresponding query matrix Q, key matrix K and value matrix V.
Optionally, the second generation module 304 comprises:
a second calculation unit 3041, configured to calculate the relative attention matrix z_h of each segment of the attention unit based on a preset relative attention matrix calculation formula, the attention unit comprising eight segments, each segment having a length of one eighth of the attention unit, the preset relative attention matrix calculation formula being:
z_h = Softmax((Q·K^T + S_rel) / √D_h)·V
where D_h is the length of the key vector divided by the length of an attention unit segment, and S_rel is the relative position matrix of each segment, with dimensions of 512 × 512;
a connection unit 3042, configured to connect the relative attention matrices z_h of the segments in sequence to generate the output matrix z, perform a weighted addition of the embedded vector x, taken as a residual, with the output matrix z to generate the weighting result, and input the weighting result into the feedforward neural network.
Optionally, the iteration module 305 comprises:
a second stacking unit 3051, configured to stack the query vectors q, key vectors k and value vectors v a preset number of times and iteratively compute the relative attention matrix z_h to generate the output matrix z;
a splitting unit 3052, configured to split the output matrix z obtained in the last iteration into h parts and add them in sequence to obtain the encoder output matrix a_e.
Optionally, the decoding module 306 comprises:
an input unit 3061, configured to input the encoder output matrix a_e into a preset decoder and generate the target output matrix based on the attention units and feedforward neural network of the preset decoder;
an output unit 3062, configured to perform the inverse operation of the embedded vector based on the target output matrix, output the MIDI file, and generate the final Buddhist music work.
Optionally, the attention-model-based Buddhist music generation device further comprises:
an insertion module 307, configured to insert an intermediate attention layer in the preset decoder, the intermediate attention layer comprising attention units, the input of the first h-1 units of the intermediate attention layer being z_h and the input of the last unit of the intermediate attention layer being a_e.
In the embodiment of the invention, an improved relative self-attention algorithm is adopted in the autoencoder, so that the generated Buddhist music works better conform to the rules of traditional music and the accuracy of the lyric tones and rhymes is improved.
FIGS. 3 and 4 describe the attention-model-based Buddhist music generation device in the embodiments of the invention in detail from the perspective of modular functional entities; the attention-model-based Buddhist music generation equipment in the embodiments of the invention is described in detail below from the perspective of hardware processing.
FIG. 5 is a schematic structural diagram of an attention-model-based Buddhist music generation equipment 500 according to an embodiment of the invention. The equipment may vary considerably in configuration or performance and may comprise one or more processors (CPUs) 510 (e.g., one or more processors), a memory 520, and one or more storage media 530 (e.g., one or more mass storage devices) storing applications 533 or data 532. The memory 520 and the storage media 530 may provide transient or persistent storage. A program stored on a storage medium 530 may comprise one or more modules (not shown), each of which may comprise a series of instruction operations on the attention-model-based Buddhist music generation equipment 500. Furthermore, the processor 510 may be configured to communicate with the storage medium 530 and execute the series of instruction operations in the storage medium 530 on the attention-model-based Buddhist music generation equipment 500.
The attention-model-based Buddhist music generation equipment 500 may also comprise one or more power supplies 540, one or more wired or wireless network interfaces 550, one or more input/output interfaces 560, and/or one or more operating systems 531, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD and the like. Those skilled in the art will appreciate that the configuration shown in FIG. 5 does not constitute a limitation of the attention-model-based Buddhist music generation equipment, which may include more or fewer components than those shown, a combination of some components, or a different arrangement of components.
The invention also provides an attention-model-based Buddhist music generation device, the computer device comprising a memory and a processor; the memory stores computer-readable instructions which, when executed by the processor, cause the processor to execute the steps of the attention-model-based Buddhist music generation method in the above embodiments.
The invention also provides a computer-readable storage medium, which may be a non-volatile or a volatile computer-readable storage medium, having instructions stored therein which, when run on a computer, cause the computer to execute the steps of the attention-model-based Buddhist music generation method.
The block chain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, a consensus mechanism, an encryption algorithm and the like. A block chain (Blockchain), which is essentially a decentralized database, is a series of data blocks associated by using a cryptographic method, and each data block contains information of a batch of network transactions, so as to verify the validity (anti-counterfeiting) of the information and generate a next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a read-only memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. An attention model-based Buddhist music generation method, characterized by comprising:
acquiring an original audio file, wherein the original audio file is a Musical Instrument Digital Interface (MIDI) file of Buddhist music;
extracting the lyric characters based on the original audio file, searching the tones corresponding to the lyric characters according to a preset pronunciation table, and generating a plurality of individual events;
taking each individual event as an embedded vector x, generating a plurality of groups of query vectors q, key vectors k and value vectors v corresponding to the embedded vectors x based on a preset vector formula, and stacking the plurality of groups of query vectors q, key vectors k and value vectors v to obtain a corresponding query matrix Q, key matrix K and value matrix V;
generating a relative attention matrix z_h for each segment of the attention unit based on a preset relative attention matrix calculation formula, generating an output matrix z, generating a weighting result according to the embedded vector x and the output matrix z, and inputting the weighting result into a feedforward neural network;
iterating according to a preset number of times, and generating an encoder output matrix a_e based on the output matrix z obtained in the last iteration;
inputting the encoder output matrix a_e into a preset decoder to obtain a target output matrix, converting the target output matrix into a MIDI file, and generating a final Buddhist music work.
2. The attention model-based Buddhist music generation method of claim 1, wherein said extracting the lyric characters based on the original audio file, searching the tones corresponding to the lyric characters according to a preset pronunciation table, and generating a plurality of individual events, each individual event containing a lyric character and the tone corresponding to that lyric character, comprises:
extracting lyric characters corresponding to each melody note and the timestamp thereof based on the original audio file to obtain a plurality of groups of lyric characters;
searching the lyric tones corresponding to the plurality of groups of lyric characters according to a Hakka pronunciation table to obtain a plurality of groups of lyric tones;
generating a plurality of individual events based on the plurality of groups of lyric tones, each individual event comprising a group of lyric tones and the melody notes and timestamps corresponding to that group of lyric tones.
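For illustration only, a minimal Python sketch of the event construction in claim 2, assuming a pre-parsed list of (lyric character, MIDI pitch, timestamp) triples and a hypothetical HAKKA_TONES lookup standing in for the Hakka pronunciation table; neither the table contents nor the Event fields beyond those named in the claim come from the patent.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical Hakka pronunciation table: lyric character -> tone category.
HAKKA_TONES = {"南": 2, "无": 2, "阿": 1, "弥": 2, "陀": 2, "佛": 5}

@dataclass
class Event:
    lyric: str        # lyric character extracted from the original audio file
    tone: int         # tone looked up from the pronunciation table (0 = unknown)
    pitch: int        # MIDI note number of the corresponding melody note
    timestamp: float  # onset time of that note

def build_events(notes: List[Tuple[str, int, float]]) -> List[Event]:
    """Turn (lyric character, pitch, timestamp) triples into individual events."""
    return [Event(lyric, HAKKA_TONES.get(lyric, 0), pitch, ts)
            for lyric, pitch, ts in notes]

if __name__ == "__main__":
    demo = [("南", 62, 0.0), ("无", 64, 0.5), ("阿", 65, 1.0)]
    for ev in build_events(demo):
        print(ev)
```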
3. The attention model-based Buddhist music generation method according to claim 1, wherein the taking each individual event as an embedded vector x, generating a plurality of groups of query vectors q, key vectors k and value vectors v corresponding to the embedded vectors x based on a preset vector formula, and stacking the plurality of groups of query vectors q, key vectors k and value vectors v to obtain a corresponding query matrix Q, key matrix K and value matrix V comprises:
taking each individual event as an embedded vector x, and calculating and generating a plurality of groups of query vectors q, key vectors k and value vectors v corresponding to the embedded vector x based on a preset vector formula, wherein the preset vector formula is: q = W_Q·x, k = W_K·x, v = W_V·x, where W_Q, W_K and W_V are preset parameter matrices each of size 512 × n, n is the length of the embedded vector x, and the generated query vector q, key vector k and value vector v are each vectors of length 512;
and stacking the query vectors q, the key vectors k and the value vectors v generated by the individual events in a 512 × n manner to obtain a corresponding query matrix Q, key matrix K and value matrix V.
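A minimal NumPy sketch of the projection and stacking in claim 3; the embedding length n_embed, the number of events and the random parameter matrices are illustrative assumptions, and only the 512-dimensional projection size comes from the claim.

```python
import numpy as np

rng = np.random.default_rng(0)

n_embed = 64    # length n of each embedding vector x (assumed for the demo)
d_model = 512   # projection size stated in the claim
n_events = 10   # number of individual events in the sequence (assumed)

# Preset parameter matrices of size 512 x n, randomly initialised here.
W_Q = rng.standard_normal((d_model, n_embed))
W_K = rng.standard_normal((d_model, n_embed))
W_V = rng.standard_normal((d_model, n_embed))

# One embedding vector x per individual event.
X = rng.standard_normal((n_events, n_embed))

# q = W_Q x, k = W_K x, v = W_V x for every event, stacked column by column.
Q = np.stack([W_Q @ x for x in X], axis=1)  # shape (512, n_events)
K = np.stack([W_K @ x for x in X], axis=1)
V = np.stack([W_V @ x for x in X], axis=1)

print(Q.shape, K.shape, V.shape)  # (512, 10) each
```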
4. The attention model-based Buddhist music generation method according to claim 1, wherein the generating a relative attention matrix z_h for each segment of the attention unit based on a preset relative attention matrix calculation formula, generating an output matrix z, generating a weighting result according to the embedded vector x and the output matrix z, and inputting the weighting result into a feedforward neural network comprises:
calculating a relative attention matrix z_h for each segment of the attention unit based on a preset relative attention matrix calculation formula, the attention unit comprising eight segments, each segment having a length equal to the length of the key vector divided by the number of segments (512/8 = 64);
the preset relative attention matrix calculation formula is:
z_h = softmax((Q·K^T + S_rel) / √D_h)·V
wherein D_h is the length of the key vector divided by the number of segments of the attention unit, and S_rel is a relative position matrix for each segment, the relative position matrix having a size of 512 × 512;
concatenating the relative attention matrices z_h of the segments in sequence to generate an output matrix z, performing weighted addition of the embedded vector x, taken as a residual, with the output matrix z to generate a weighted result, and inputting the weighted result into a feedforward neural network.
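A sketch of the per-segment relative attention in claim 4: softmax((Q·K^T + S_rel)/√D_h)·V is computed for each of the eight segments and the results are concatenated into z. The sequence length, the random S_rel values, and the plain addition x + z (standing in for the weighted residual addition) are assumptions made for the demo.

```python
import numpy as np

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def relative_attention(Q, K, V, S_rel, n_segments=8):
    """Compute z_h = softmax((Q_h K_h^T + S_rel) / sqrt(D_h)) V_h per segment
    and concatenate the segment outputs into the output matrix z."""
    seq_len, d_model = Q.shape
    d_head = d_model // n_segments            # D_h
    outputs = []
    for h in range(n_segments):
        sl = slice(h * d_head, (h + 1) * d_head)
        Qh, Kh, Vh = Q[:, sl], K[:, sl], V[:, sl]
        logits = (Qh @ Kh.T + S_rel[:seq_len, :seq_len]) / np.sqrt(d_head)
        outputs.append(softmax(logits) @ Vh)  # z_h for this segment
    return np.concatenate(outputs, axis=-1)   # output matrix z

rng = np.random.default_rng(0)
seq_len, d_model = 10, 512
Q, K, V, x = (rng.standard_normal((seq_len, d_model)) for _ in range(4))
S_rel = rng.standard_normal((512, 512))       # relative position matrix, 512 x 512

z = relative_attention(Q, K, V, S_rel)
weighted = x + z   # residual addition of the embedding with z, fed to the feed-forward network
print(weighted.shape)  # (10, 512)
```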
5. The method of claim 1, wherein the iterating according to a preset number of times and generating an encoder output matrix a_e based on the output matrix z obtained in the last iteration comprises:
stacking the query vectors q, the key vectors k and the value vectors v according to the preset number of times, and iteratively calculating the relative attention matrix z_h to generate the output matrix z;
splitting the output matrix z obtained in the last iteration into h parts, and summing them in sequence to obtain the encoder output matrix a_e.
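A sketch of the iteration and final split-and-sum in claim 5; the stand-in identity_layer and the iteration count of 6 are assumptions that merely mark where the real attention and feed-forward computation of each pass would go.

```python
import numpy as np

def encoder_output(x, encoder_layer, n_iterations=6, n_segments=8):
    """Iterate the encoder layer a preset number of times, then split the last
    output matrix z into h parts and sum them to obtain a_e (a sketch)."""
    z = x
    for _ in range(n_iterations):      # preset number of iterations
        z = encoder_layer(z)           # recomputes Q/K/V and relative attention each pass
    parts = np.split(z, n_segments, axis=-1)  # split z into h pieces along the feature axis
    return sum(parts)                  # element-wise sum -> encoder output matrix a_e

rng = np.random.default_rng(0)
x = rng.standard_normal((10, 512))
identity_layer = lambda z: z           # stand-in for the real attention + feed-forward layer
a_e = encoder_output(x, identity_layer)
print(a_e.shape)                       # (10, 64)
```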
6. The attention model-based Buddhist music generation method according to any one of claims 1-5, wherein the inputting the encoder output matrix a_e into a preset decoder to obtain a target output matrix, converting the target output matrix into a MIDI file, and generating a final Buddhist music work comprises:
inputting the encoder output matrix a_e into a preset decoder, and generating a target output matrix based on the attention units and the feedforward neural network of the preset decoder;
and performing the inverse operation of the embedded vector based on the target output matrix, outputting a MIDI file, and generating the final Buddhist music work.
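A sketch of the inverse-embedding step in claim 6 that writes the decoded events out as MIDI; the mido library, the greedy argmax mapping, and the fixed pitch vocabulary are assumptions made for the demo, as the patent only states that the target output matrix is converted back into a MIDI file.

```python
import numpy as np
import mido  # assumption: the patent does not name a particular MIDI library

def decode_to_midi(target_output, pitch_vocab, path="buddhist_music_out.mid"):
    """Map each row of the target output matrix back to a discrete event
    (here simply a MIDI pitch) and write the result as a MIDI file."""
    token_ids = target_output.argmax(axis=-1) % len(pitch_vocab)  # greedy inverse embedding
    mid = mido.MidiFile()
    track = mido.MidiTrack()
    mid.tracks.append(track)
    for tid in token_ids:
        pitch = int(pitch_vocab[tid])
        track.append(mido.Message("note_on", note=pitch, velocity=64, time=0))
        track.append(mido.Message("note_off", note=pitch, velocity=64, time=240))
    mid.save(path)
    return path

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target_output = rng.standard_normal((16, 512))  # stand-in for the decoder output
    pentatonic = [60, 62, 64, 67, 69]               # illustrative pitch vocabulary
    print(decode_to_midi(target_output, pentatonic))
```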
7. The attention model-based Buddhist music generation method of claim 6, wherein before the inputting the encoder output matrix a_e into the preset decoder, the method further comprises:
inserting an intermediate attention layer into the preset decoder, the intermediate attention layer comprising attention units, wherein the input of the first h-1 units of the intermediate attention layer is the z_h and the input of the last unit of the intermediate attention layer is a_e.
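A sketch of the decoder wiring in claim 7: the first h-1 units of the intermediate attention layer receive the per-segment matrices z_h while the last unit receives the encoder output a_e. The stand-in identity_unit and the tensor shapes are illustrative assumptions.

```python
import numpy as np

def intermediate_attention_layer(z_segments, a_e, attention_unit):
    """Claim 7 wiring: the first h-1 units take the per-segment matrices z_h,
    the last unit takes the encoder output matrix a_e."""
    inputs = list(z_segments[:-1]) + [a_e]      # swap the last segment input for a_e
    outputs = [attention_unit(inp) for inp in inputs]
    return np.concatenate(outputs, axis=-1)

rng = np.random.default_rng(0)
z_segments = [rng.standard_normal((10, 64)) for _ in range(8)]  # h = 8 segments
a_e = rng.standard_normal((10, 64))
identity_unit = lambda m: m          # stand-in for a real attention unit
out = intermediate_attention_layer(z_segments, a_e, identity_unit)
print(out.shape)  # (10, 512)
```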
8. An attention model-based Buddhist music generation apparatus, characterized by comprising:
the acquisition module is used for acquiring an original audio file, wherein the original audio file is a Musical Instrument Digital Interface (MIDI) file of Buddhist music;
the extraction module is used for extracting the lyric characters based on the original audio file, searching the tones corresponding to the lyric characters according to a preset pronunciation table, and generating a plurality of individual events;
the first generation module is used for taking each individual event as an embedded vector x, generating a plurality of groups of query vectors q, key vectors k and value vectors v corresponding to the embedded vector x based on a preset vector formula, and stacking the plurality of groups of query vectors q, key vectors k and value vectors v corresponding to the embedded vector x to obtain a corresponding query matrix Q, key matrix K and value matrix V;
a second generation module for generating a relative attention matrix z_h for each segment of the attention unit based on a preset relative attention matrix calculation formula, generating an output matrix z, generating a weighting result according to the embedded vector x and the output matrix z, and inputting the weighting result into a feedforward neural network;
an iteration module for iterating according to a preset number of times and generating an encoder output matrix a_e based on the output matrix z obtained in the last iteration;
a decoding module for inputting the encoder output matrix a_e into a preset decoder to obtain a target output matrix, converting the target output matrix into a MIDI file, and generating a final Buddhist music work.
9. An attention model-based Buddhist music generation device, characterized by comprising: a memory and at least one processor, the memory having instructions stored therein;
the at least one processor invokes the instructions in the memory to cause the attention model-based Buddhist music generation device to perform the attention model-based Buddhist music generation method according to any one of claims 1-7.
10. A computer readable storage medium having instructions stored thereon which, when executed by a processor, implement the attention model-based Buddhist music generation method according to any one of claims 1-7.
CN202110311437.7A 2021-03-24 2021-03-24 Buddha music generation method, device, equipment and storage medium based on attention model Active CN112951239B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110311437.7A CN112951239B (en) 2021-03-24 2021-03-24 Buddha music generation method, device, equipment and storage medium based on attention model

Publications (2)

Publication Number Publication Date
CN112951239A true CN112951239A (en) 2021-06-11
CN112951239B CN112951239B (en) 2023-07-28

Family

ID=76228479

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110311437.7A Active CN112951239B (en) 2021-03-24 2021-03-24 Buddha music generation method, device, equipment and storage medium based on attention model

Country Status (1)

Country Link
CN (1) CN112951239B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090272251A1 (en) * 2002-11-12 2009-11-05 Alain Georges Systems and methods for portable audio synthesis
CN105788589A (en) * 2016-05-04 2016-07-20 腾讯科技(深圳)有限公司 Audio data processing method and device
WO2018194456A1 (en) * 2017-04-20 2018-10-25 Universiteit Van Amsterdam Optical music recognition omr : converting sheet music to a digital format
CN109492227A (en) * 2018-11-16 2019-03-19 大连理工大学 It is a kind of that understanding method is read based on the machine of bull attention mechanism and Dynamic iterations
CN110853626A (en) * 2019-10-21 2020-02-28 成都信息工程大学 Bidirectional attention neural network-based dialogue understanding method, device and equipment
US20200135174A1 (en) * 2018-10-24 2020-04-30 Tencent America LLC Multi-task training architecture and strategy for attention-based speech recognition system
CN111477221A (en) * 2020-05-28 2020-07-31 中国科学技术大学 Speech recognition system using bidirectional time sequence convolution and self-attention mechanism network
CN111524503A (en) * 2020-04-15 2020-08-11 上海明略人工智能(集团)有限公司 Audio data processing method and device, audio recognition equipment and storage medium
CN112331170A (en) * 2020-10-28 2021-02-05 平安科技(深圳)有限公司 Method, device and equipment for analyzing similarity of Buddha music melody and storage medium

Also Published As

Publication number Publication date
CN112951239B (en) 2023-07-28

Similar Documents

Publication Publication Date Title
US7488886B2 (en) Music information retrieval using a 3D search algorithm
US7696426B2 (en) Recombinant music composition algorithm and method of using the same
Dighe et al. Swara Histogram Based Structural Analysis And Identification Of Indian Classical Ragas.
Lemström et al. SEMEX-An efficient Music Retrieval Prototype.
Birmingham et al. Query by humming with the vocalsearch system
US12014707B2 (en) Systems, devices, and methods for varying digital representations of music
US12014708B2 (en) Systems, devices, and methods for harmonic structure in digital representations of music
CN113035161B (en) Song melody generation method, device and equipment based on chord and storage medium
Kızrak et al. Classification of classic Turkish music makams
Syarif et al. Human and computation-based music representation for gamelan music
Lemström et al. On comparing edit distance and geometric frameworks in content-based retrieval of symbolically encoded polyphonic music
CN112951239A (en) Fole generation method, device, equipment and storage medium based on attention model
JP2010026337A (en) Method, and program and device for creating song, and song providing system
López et al. Harmonic reductions as a strategy for creative data augmentation
CN113032615A (en) Meditation music generation method, device, equipment and storage medium
KR102227415B1 (en) System, device, and method to generate polyphonic music
Suzuki Score Transformer: Generating Musical Score from Note-level Representation
CN113033778B (en) Buddha music generation method, device, equipment and storage medium
JP4447567B2 (en) How to add singing melody data to karaoke works, how to generate singing melody data
CN109241312A (en) Compose a poem to a given tune of ci method, apparatus and the terminal device of melody
Ye et al. A Cross-language Music Retrieval Method by Using Misheard Lyrics
Perttu Combinatorial pattern matching in musical sequences
Karamanolakis et al. Audio-based distributional semantic models for music auto-tagging and similarity measurement
Murthy et al. Computational Aspects of Classical Music
Karkera et al. Unveiling the Art of Music Generation with LSTM

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant