CN112951239B - Buddha music generation method, device, equipment and storage medium based on attention model - Google Patents

Buddha music generation method, device, equipment and storage medium based on attention model

Info

Publication number
CN112951239B
CN112951239B
Authority
CN
China
Prior art keywords
matrix
attention
generating
vectors
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110311437.7A
Other languages
Chinese (zh)
Other versions
CN112951239A (en)
Inventor
刘奡智
郭锦岳
韩宝强
肖京
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202110311437.7A priority Critical patent/CN112951239B/en
Publication of CN112951239A publication Critical patent/CN112951239A/en
Application granted granted Critical
Publication of CN112951239B publication Critical patent/CN112951239B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/26 - Speech to text systems
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/044 - Recurrent networks, e.g. Hopfield networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H - ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H 1/00 - Details of electrophonic musical instruments
    • G10H 1/0033 - Recording/reproducing or transmission of music for electrophonic musical instruments
    • G10H 1/0041 - Recording/reproducing or transmission of music for electrophonic musical instruments in coded form
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/02 - Feature extraction for speech recognition; Selection of recognition unit
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/08 - Speech classification or search
    • G10L 15/16 - Speech classification or search using artificial neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Auxiliary Devices For Music (AREA)

Abstract

The invention relates to the field of artificial intelligence and discloses a Buddha music generation method, device, equipment and storage medium based on an attention model, which adopt an improved relative self-attention algorithm in the autoencoder so that the generated Buddha musical works better conform to the rules of traditional music and the accuracy of the lyric tones is improved. The attention-model-based Buddha music generation method comprises the following steps: acquiring an original audio file; extracting lyric characters from the original audio file and generating a plurality of individual events; generating a query matrix Q, a key matrix K and a value matrix V; generating a relative attention matrix z_h for each segment, generating an output matrix z, and generating a weighted result from the embedded vector x and the output matrix z; generating an encoder output matrix a_e; and inputting the encoder output matrix a_e into a preset decoder to obtain a target output matrix, which is converted into a MIDI file to generate the final Buddha musical work. The invention also relates to blockchain technology, and the generated Buddha musical works can be stored in blockchain nodes.

Description

Buddha music generation method, device, equipment and storage medium based on attention model
Technical Field
The present invention relates to the field of audio conversion, and in particular to an attention-model-based method, apparatus, device and storage medium for generating Buddha music.
Background
Buddhist music is a highly distinctive cultural form in China and embodies the unique characteristics of the Chinese cultural sphere. In the traditional Buddhist music of different regions, a distinctive musical structure has developed, often in combination with Chinese literary forms that carry strong phonological features, such as fixed tune patterns (qupai) and prosodic verse forms, so that the sung verse and the melody are tightly fused; it is a highly condensed art.
Existing music generation methods can produce musical passages with a certain long-range structure, but because most researchers are based in Western countries, the proposed models focus mainly on Western classical music. When these models are applied to traditional Chinese musical styles, they cannot reflect the particular relationship between the sung lyrics and the melody in traditional Chinese Buddhist music, which leads to melodies that do not match the tones of the lyrics and to unclear meaning.
Disclosure of Invention
The invention provides a Buddha music generation method, device, equipment and storage medium based on an attention model, which adopt an improved relative self-attention algorithm in the autoencoder so that the generated Buddha musical works better conform to the rules of traditional music and the accuracy of the lyric tones is improved.
The first aspect of the invention provides an attention-model-based Buddha music generation method, which comprises the following steps: acquiring an original audio file, wherein the original audio file is a musical instrument digital interface (MIDI) file of Buddhist music; extracting lyric characters from the original audio file, looking up the tone corresponding to each lyric character in a preset pronunciation table, and generating a plurality of individual events; taking each individual event as an embedded vector x, generating a plurality of groups of query vectors q, key vectors k and value vectors v corresponding to the embedded vectors x based on a preset vector formula, and stacking the groups of query vectors q, key vectors k and value vectors v to obtain the corresponding query matrix Q, key matrix K and value matrix V; generating a relative attention matrix z_h for each segment of the attention unit based on a preset relative attention matrix calculation formula, generating an output matrix z, generating a weighted result from the embedded vector x and the output matrix z, and inputting the weighted result into a feedforward neural network; iterating a preset number of times and generating an encoder output matrix a_e from the output matrix z obtained in the last iteration; and inputting the encoder output matrix a_e into a preset decoder to obtain a target output matrix, converting the target output matrix into a MIDI file, and generating the final Buddha musical work.
Optionally, in a first implementation manner of the first aspect of the present invention, the extracting lyric characters from the original audio file, looking up the tone corresponding to each lyric character in a preset pronunciation table, and generating a plurality of individual events includes: extracting, from the original audio file, the lyric character corresponding to each melody note together with its timestamp, to obtain a plurality of groups of lyric characters; looking up the lyric tones corresponding to the plurality of groups of lyric characters in a Hakka pronunciation table, to obtain a plurality of groups of lyric tones; and generating a plurality of individual events based on the plurality of groups of lyric tones, each individual event comprising one group of lyric tones and the melody notes and timestamps corresponding to that group of lyric tones.
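To make this step concrete, the following minimal Python sketch assembles such individual events from a MIDI file. It assumes the source MIDI stores one lyric character per 'lyrics' meta-event aligned with the next note-on, and it uses a toy stand-in for the Hakka pronunciation table; the choice of the mido library, the table contents and all identifiers are illustrative assumptions rather than details taken from the patent.

```python
from dataclasses import dataclass
import mido

# Toy stand-in for the Meizhou-Hakka pronunciation table (character -> tone category).
HAKKA_TONES = {"南": "yang-ping", "無": "yang-ping", "阿": "yin-ping"}

@dataclass
class IndividualEvent:
    char: str    # lyric character
    tone: str    # tone looked up in the pronunciation table
    pitch: int   # melody note (MIDI pitch)
    tick: int    # timestamp in MIDI ticks

def extract_events(path: str) -> list[IndividualEvent]:
    events, pending_char, now = [], None, 0
    for msg in mido.merge_tracks(mido.MidiFile(path).tracks):
        now += msg.time                          # accumulate delta times into a timestamp
        if msg.type == "lyrics":                 # lyric character attached to the next note
            pending_char = msg.text.strip()
        elif msg.type == "note_on" and msg.velocity > 0 and pending_char:
            tone = HAKKA_TONES.get(pending_char, "unknown")
            events.append(IndividualEvent(pending_char, tone, msg.note, now))
            pending_char = None
    return events
```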
Optionally, in a second implementation manner of the first aspect of the present invention, the taking each individual event as an embedded vector x, generating a plurality of groups of query vectors q, key vectors k and value vectors v corresponding to the embedded vectors x based on a preset vector formula, and stacking the groups of query vectors q, key vectors k and value vectors v to obtain the corresponding query matrix Q, key matrix K and value matrix V includes: taking each individual event as an embedded vector x, and calculating the query vector q, key vector k and value vector v corresponding to each embedded vector x based on the preset vector formula q = x·W_Q, k = x·W_K, v = x·W_V, where W_Q, W_K and W_V are preset parameter matrices, each of size N × 512, N is the length of the embedded vector x, and the generated query vector q, key vector k and value vector v are each vectors of length 512; and stacking the query vectors q, key vectors k and value vectors v generated for the individual events to obtain the corresponding query matrix Q, key matrix K and value matrix V.
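As a rough illustration of the reconstructed vector formula, the numpy sketch below projects a stack of embedded vectors x into Q, K and V using random stand-ins for the learned parameter matrices W_Q, W_K and W_V; the event count and the random initialization are assumptions made purely for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d_model = 64, 512                      # N: length of one embedded vector x
W_Q, W_K, W_V = (rng.standard_normal((N, d_model)) * 0.02 for _ in range(3))

def project(x: np.ndarray) -> tuple[np.ndarray, np.ndarray, np.ndarray]:
    """x: (num_events, N) stack of embedded vectors -> Q, K, V of shape (num_events, 512)."""
    return x @ W_Q, x @ W_K, x @ W_V

x = rng.standard_normal((10, N))          # ten individual events, stacked row-wise
Q, K, V = project(x)                      # query, key and value matrices
```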
Optionally, in a third implementation manner of the first aspect of the present invention, the generating a relative attention matrix z_h for each segment of the attention unit based on a preset relative attention matrix calculation formula, generating an output matrix z, generating a weighted result from the embedded vector x and the output matrix z, and inputting the weighted result into a feedforward neural network includes: calculating the relative attention matrix z_h of each segment of the attention unit based on the preset relative attention matrix calculation formula, wherein the attention unit comprises eight segments and the preset relative attention matrix calculation formula is:
z_h = softmax((Q·K^T + S_rel) / √D_h) · V,
where D_h is the length of the key vector divided by the length of an attention-unit segment, S_rel is the relative position matrix of each segment, and T denotes the transpose of the key matrix K; and connecting the relative attention matrices z_h of the segments in sequence to generate the output matrix z, taking the embedded vector x as a residual, performing weighted addition of the embedded vector x and the output matrix z to generate a weighted result, and inputting the weighted result into the feedforward neural network.
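The following hedged numpy sketch evaluates the reconstructed formula z_h = softmax((Q·K^T + S_rel)/√D_h)·V for a single segment. The segment length, the per-segment width D_h and the random construction of S_rel are illustrative assumptions; the patent does not specify how S_rel is built.

```python
import numpy as np

def softmax(a: np.ndarray, axis: int = -1) -> np.ndarray:
    a = a - a.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def relative_attention(Q, K, V, S_rel, d_h):
    """Q, K, V: (L, d_h); S_rel: (L, L) relative-position scores for this segment."""
    scores = (Q @ K.T + S_rel) / np.sqrt(d_h)
    return softmax(scores) @ V                # z_h for this segment

rng = np.random.default_rng(1)
L, d_h = 16, 64                               # assumed segment length and per-segment width
Q, K, V = (rng.standard_normal((L, d_h)) for _ in range(3))
S_rel = rng.standard_normal((L, L)) * 0.1     # stand-in relative position matrix
z_h = relative_attention(Q, K, V, S_rel, d_h)
```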
Optionally, in a fourth implementation manner of the first aspect of the present invention, the iterating a preset number of times and generating an encoder output matrix a_e from the output matrix z obtained in the last iteration includes: stacking the query vectors q, key vectors k and value vectors v and iterating the calculation of the relative attention matrix z_h the preset number of times to generate the output matrix z; and splitting the output matrix z obtained in the last iteration into h parts and adding them in sequence to obtain the encoder output matrix a_e.
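A minimal sketch of this split-and-sum step, assuming the split of the last output matrix z into h parts is taken along the feature axis (the patent does not state the axis):

```python
import numpy as np

def encoder_output(z: np.ndarray, h: int = 8) -> np.ndarray:
    """z: (num_events, d_model), d_model divisible by h -> a_e: (num_events, d_model // h)."""
    parts = np.split(z, h, axis=-1)       # h equal slices of the last iteration's output z
    return np.sum(parts, axis=0)          # added in sequence to give the encoder output a_e

z = np.random.default_rng(2).standard_normal((10, 512))
a_e = encoder_output(z)                   # shape (10, 64)
```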
Optionally, in a fifth implementation manner of the first aspect of the present invention, the inputting the encoder output matrix a_e into a preset decoder to obtain a target output matrix, converting the target output matrix into a MIDI file, and generating the final Buddha musical work includes: inputting the encoder output matrix a_e into the preset decoder, and generating the target output matrix through the attention units and the feedforward neural network of the preset decoder; and performing the inverse operation of the embedding on the target output matrix, outputting a MIDI file, and generating the final Buddha musical work.
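As a toy illustration of converting a decoded matrix back into a MIDI file, the sketch below assumes each decoded row reduces to a (pitch, duration) pair; this is a simplification of the inverse embedding operation described above, and the velocity, file name and mido-based rendering are assumptions.

```python
import mido
import numpy as np

def matrix_to_midi(rows: np.ndarray, path: str = "buddhist_music.mid") -> None:
    mid = mido.MidiFile()
    track = mido.MidiTrack()
    mid.tracks.append(track)
    for pitch, dur in rows:                              # one (pitch, duration-in-ticks) per event
        track.append(mido.Message("note_on", note=int(pitch), velocity=80, time=0))
        track.append(mido.Message("note_off", note=int(pitch), velocity=0, time=int(dur)))
    mid.save(path)

matrix_to_midi(np.array([[60, 480], [62, 480], [64, 960]]))  # three toy notes
```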
Optionally, in a sixth implementation manner of the first aspect of the present invention, before the inputting the encoder output matrix a_e into the preset decoder, the method further comprises: inserting an intermediate attention layer into the preset decoder, the intermediate attention layer comprising attention units, wherein the inputs of the preceding units of the intermediate attention layer are the matrices z_h and the input of the last unit of the intermediate attention layer is a_e.
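A rough wiring sketch of this optional intermediate attention layer is given below: its earlier units take the per-segment matrices z_h as input and its last unit takes a_e. The attention unit itself is replaced by a trivial identity stand-in, so only the routing described above is illustrated; everything else is assumed.

```python
import numpy as np

def intermediate_layer(z_heads: list[np.ndarray], a_e: np.ndarray, attend) -> list[np.ndarray]:
    """z_heads: per-segment encoder outputs z_h; attend: any attention unit (callable)."""
    outputs = [attend(z_h) for z_h in z_heads]   # preceding units: inputs are z_h
    outputs.append(attend(a_e))                  # last unit: input is a_e
    return outputs

identity_attend = lambda m: m                    # placeholder attention unit
outs = intermediate_layer([np.ones((4, 8))] * 3, np.zeros((4, 8)), identity_attend)
```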
A second aspect of the present invention provides an attention-model-based Buddha music generation apparatus, comprising: an acquisition module, configured to acquire an original audio file, wherein the original audio file is a musical instrument digital interface (MIDI) file of Buddhist music; an extraction module, configured to extract lyric characters from the original audio file, look up the tone corresponding to each lyric character in a preset pronunciation table, and generate a plurality of individual events; a first generation module, configured to take each individual event as an embedded vector x, generate a plurality of groups of query vectors q, key vectors k and value vectors v corresponding to the embedded vectors x based on a preset vector formula, and stack the groups of query vectors q, key vectors k and value vectors v to obtain the corresponding query matrix Q, key matrix K and value matrix V; a second generation module, configured to generate a relative attention matrix z_h for each segment of the attention unit based on a preset relative attention matrix calculation formula, generate an output matrix z, generate a weighted result from the embedded vector x and the output matrix z, and input the weighted result into a feedforward neural network; an iteration module, configured to iterate a preset number of times and generate an encoder output matrix a_e from the output matrix z obtained in the last iteration; and a decoding module, configured to input the encoder output matrix a_e into a preset decoder to obtain a target output matrix, convert the target output matrix into a MIDI file, and generate the final Buddha musical work.
Optionally, in a first implementation manner of the second aspect of the present invention, the extraction module includes:
an extraction unit, configured to extract, from the original audio file, the lyric character corresponding to each melody note together with its timestamp, to obtain a plurality of groups of lyric characters; a lookup unit, configured to look up the lyric tones corresponding to the plurality of groups of lyric characters in the Hakka pronunciation table, to obtain a plurality of groups of lyric tones; and a generation unit, configured to generate a plurality of individual events based on the plurality of groups of lyric tones, each individual event comprising one group of lyric tones and the melody notes and timestamps corresponding to that group of lyric tones.
Optionally, in a second implementation manner of the second aspect of the present invention, the first generation module includes: a first calculation unit, configured to take each individual event as an embedded vector x and generate the query vector q, key vector k and value vector v corresponding to each embedded vector x based on the preset vector formula q = x·W_Q, k = x·W_K, v = x·W_V, where W_Q, W_K and W_V are preset parameter matrices, each of size N × 512, N is the length of the embedded vector x, and the generated query vector q, key vector k and value vector v are each vectors of length 512; and a first stacking unit, configured to stack the query vectors q, key vectors k and value vectors v generated for the individual events to obtain the corresponding query matrix Q, key matrix K and value matrix V.
Optionally, in a third implementation manner of the second aspect of the present invention, the second generation module includes: a second calculation unit, configured to calculate the relative attention matrix z_h of each segment of the attention unit based on the preset relative attention matrix calculation formula, wherein the attention unit comprises eight segments and the preset relative attention matrix calculation formula is:
z_h = softmax((Q·K^T + S_rel) / √D_h) · V,
where D_h is the length of the key vector divided by the length of an attention-unit segment, S_rel is the relative position matrix of each segment, and T denotes the transpose of the key matrix K; and a connection unit, configured to connect the relative attention matrices z_h of the segments in sequence to generate the output matrix z, take the embedded vector x as a residual, perform weighted addition of the embedded vector x and the output matrix z to generate a weighted result, and input the weighted result into the feedforward neural network.
Optionally, in a fourth implementation manner of the second aspect of the present invention, the iteration module includes: a second stacking unit, configured to stack the query vectors q, key vectors k and value vectors v and iterate the calculation of the relative attention matrix z_h a preset number of times to generate the output matrix z; and a splitting unit, configured to split the output matrix z obtained in the last iteration into h parts and add them in sequence to obtain the encoder output matrix a_e.
Optionally, in a fifth implementation manner of the second aspect of the present invention, the decoding module includes: an input unit, configured to input the encoder output matrix a_e into a preset decoder and generate a target output matrix through the attention units and the feedforward neural network of the preset decoder; and an output unit, configured to perform the inverse operation of the embedding on the target output matrix, output a MIDI file, and generate the final Buddha musical work.
Optionally, in a sixth implementation manner of the second aspect of the present invention, before the encoder output matrix a_e is input into the preset decoder, the apparatus further comprises:
an insertion module, configured to insert an intermediate attention layer into the preset decoder, the intermediate attention layer comprising attention units, wherein the inputs of the preceding units of the intermediate attention layer are the matrices z_h and the input of the last unit of the intermediate attention layer is a_e.
A third aspect of the present invention provides an attention-model-based Buddha music generation device, comprising: a memory and at least one processor, the memory having instructions stored therein; the at least one processor invokes the instructions in the memory to cause the attention-model-based Buddha music generation device to perform the attention-model-based Buddha music generation method described above.
A fourth aspect of the present invention provides a computer-readable storage medium having instructions stored therein which, when run on a computer, cause the computer to perform the above attention-model-based Buddha music generation method.
In the technical scheme provided by the invention, an original audio file is acquired, wherein the original audio file is a musical instrument digital interface (MIDI) file of Buddhist music; lyric characters are extracted from the original audio file, the tone corresponding to each lyric character is looked up in a preset pronunciation table, and a plurality of individual events are generated; each individual event is taken as an embedded vector x, a plurality of groups of query vectors q, key vectors k and value vectors v corresponding to the embedded vectors x are generated based on a preset vector formula, and the groups of query vectors q, key vectors k and value vectors v are stacked to obtain the corresponding query matrix Q, key matrix K and value matrix V; a relative attention matrix z_h is generated for each segment of the attention unit based on a preset relative attention matrix calculation formula, an output matrix z is generated, a weighted result is generated from the embedded vector x and the output matrix z, and the weighted result is input into a feedforward neural network; a preset number of iterations are performed, and an encoder output matrix a_e is generated from the output matrix z obtained in the last iteration; and the encoder output matrix a_e is input into a preset decoder to obtain a target output matrix, which is converted into a MIDI file to generate the final Buddha musical work. In the embodiment of the invention, an improved relative self-attention algorithm is adopted in the autoencoder, so that the generated Buddha musical works better conform to the rules of traditional music and the accuracy of the lyric tones is improved.
Drawings
FIG. 1 is a diagram of an embodiment of a method for generating a Buddha music based on an attention model according to an embodiment of the present invention;
FIG. 2 is a diagram of another embodiment of a method for generating a Buddha music based on an attention model according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of an embodiment of a Buddha music generating apparatus based on an attention model according to an embodiment of the present invention;
FIG. 4 is a schematic view of another embodiment of a Buddha music generating apparatus based on an attention model according to an embodiment of the present invention;
fig. 5 is a schematic diagram of an embodiment of an attention-model-based Buddha music generation device in an embodiment of the present invention.
Detailed Description
The embodiments of the invention provide a Buddha music generation method, device, equipment and storage medium based on an attention model, which adopt an improved relative self-attention algorithm in the autoencoder so that the generated Buddha musical works better conform to the rules of traditional music and the accuracy of the lyric tones is improved.
The terms "first," "second," "third," "fourth" and the like in the description and in the claims and in the above drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments described herein may be implemented in other sequences than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus.
For ease of understanding, a specific flow of an embodiment of the present invention is described below with reference to fig. 1. One embodiment of the attention-model-based Buddha music generation method in the embodiments of the present invention includes:
101. Acquire an original audio file, wherein the original audio file is a musical instrument digital interface (MIDI) file of Buddhist music.
The server acquires an original audio file, which is a musical instrument digital interface (MIDI) file of Buddhist music, from a relevant Buddhist website or a preset music library. MIDI is a communication standard used so that computer music programs, synthesizers and other electronic sound equipment can exchange information and control signals with one another. A MIDI file contains the note information of each channel, such as the key, channel number, duration, volume and velocity. Because a MIDI file is a sequence of instructions rather than a waveform, it requires very little disk space, and MIDI data are very flexible to edit and modify: notes can easily be added or deleted and note attributes can easily be changed.
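For readers unfamiliar with the format, the short sketch below prints the note-level information a MIDI file carries (channel, pitch, velocity, delta time). Using the mido library and the placeholder path "example.mid" are assumptions for illustration only.

```python
import mido

mid = mido.MidiFile("example.mid")          # placeholder path
for i, track in enumerate(mid.tracks):
    for msg in track:
        if msg.type in ("note_on", "note_off"):
            # track index, channel, pitch, velocity, delta time in ticks
            print(i, msg.channel, msg.note, msg.velocity, msg.time)
```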
It will be appreciated that the execution subject of the present invention may be the attention-model-based Buddha music generation apparatus, or may be a terminal or a server, which is not limited herein. The embodiments of the present invention are described with a server as the execution subject by way of example.
102. Extract lyric characters from the original audio file, look up the tone corresponding to each lyric character in a preset pronunciation table, and generate a plurality of individual events.
The server extracts the lyric characters from the MIDI file of the original audio and looks up the tone corresponding to each lyric character in the Meizhou pronunciation table. Meizhou pronunciation refers to the Hakka dialect spoken in the Meizhou region of Guangdong, and the Meizhou pronunciation table contains the corresponding initials, finals and tone information.
103. Take each individual event as an embedded vector x, generate a plurality of groups of query vectors q, key vectors k and value vectors v corresponding to the embedded vectors x based on a preset vector formula, and stack the groups of query vectors q, key vectors k and value vectors v to obtain the corresponding query matrix Q, key matrix K and value matrix V.
The server takes each individual event as an embedded vector x, generates a plurality of groups of query vectors q, key vectors k and value vectors v corresponding to the embedded vectors x based on a preset vector formula, and stacks the groups of query vectors q, key vectors k and value vectors v to obtain the corresponding query matrix Q, key matrix K and value matrix V. Specifically, the server calculates the query vector q, key vector k and value vector v for each embedded vector x based on the preset vector formula q = x·W_Q, k = x·W_K, v = x·W_V, where W_Q, W_K and W_V are preset parameter matrices, each of size N × 512, N is the length of the embedded vector x, and the generated query vector q, key vector k and value vector v are each vectors of length 512. The server then stacks these vectors to obtain the corresponding query matrix Q, key matrix K and value matrix V.
The query vector q, key vector k and value vector v are generated by multiplying the embedded vector x by the respective preset parameter matrices, and the server stacks the generated query vectors q, key vectors k and value vectors v to finally produce the corresponding query matrix Q, key matrix K and value matrix V.
104. Generate a relative attention matrix z_h for each segment of the attention unit based on a preset relative attention matrix calculation formula, generate an output matrix z, generate a weighted result from the embedded vector x and the output matrix z, and input the weighted result into the feedforward neural network.
The server generates a relative attention matrix z_h for each segment of the attention unit based on a preset relative attention matrix calculation formula, generates an output matrix z, generates a weighted result from the embedded vector x and the output matrix z, and inputs the weighted result into the feedforward neural network. Specifically, the server calculates the relative attention matrix z_h of each segment of the attention unit, the attention unit comprising eight segments, based on the preset relative attention matrix calculation formula:
z_h = softmax((Q·K^T + S_rel) / √D_h) · V,
where D_h is the length of the key vector divided by the length of an attention-unit segment, S_rel is the relative position matrix of each segment, and T denotes the transpose of the key matrix K. The server connects the relative attention matrices z_h of the segments in sequence to generate the output matrix z, takes the embedded vector x as a residual, performs weighted addition of the residual and the output matrix z to generate a weighted result, and inputs the weighted result into the feedforward neural network.
The feedforward neural network is also called a multi-layer perceptron. Its neurons are arranged in layers, each neuron is connected only to neurons of the preceding layer, receives the output of the preceding layer and passes its own output to the next layer, and there is no feedback between layers. In this scheme, the embedded vector x is used as a residual and is summed with the output matrix z according to preset weights, and the result is input into the feedforward neural network.
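A compact numpy sketch of this residual weighting followed by a feedforward network is shown below; the 0.5/0.5 weighting, the hidden width and the ReLU activation are assumptions, since the patent only states that x and z are weight-added and fed forward.

```python
import numpy as np

rng = np.random.default_rng(3)
d_model, d_ff = 512, 2048
W1, b1 = rng.standard_normal((d_model, d_ff)) * 0.02, np.zeros(d_ff)
W2, b2 = rng.standard_normal((d_ff, d_model)) * 0.02, np.zeros(d_model)

def feed_forward(h: np.ndarray) -> np.ndarray:
    return np.maximum(h @ W1 + b1, 0.0) @ W2 + b2       # two layers with a ReLU in between

def weighted_residual(x: np.ndarray, z: np.ndarray, w: float = 0.5) -> np.ndarray:
    return w * x + (1.0 - w) * z                        # weighted addition with x as residual

x = rng.standard_normal((10, d_model))                  # embedded vectors
z = rng.standard_normal((10, d_model))                  # attention output matrix
out = feed_forward(weighted_residual(x, z))
```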
105. Iterate a preset number of times and generate an encoder output matrix a_e from the output matrix z obtained in the last iteration.
The server iterates a preset number of times and generates an encoder output matrix a_e from the output matrix z obtained in the last iteration. Specifically, the server stacks the query vectors q, key vectors k and value vectors v and iterates the calculation of the relative attention matrix z_h the preset number of times to generate the output matrix z; it then splits the output matrix z obtained in the last iteration into h parts and adds them in sequence to obtain the encoder output matrix a_e.
The server performs five iterations of stacking the vectors and calculating the relative attention matrix z_h, then splits the resulting output matrix z into h parts and adds them in sequence to obtain the encoder output matrix a_e.
106. Input the encoder output matrix a_e into a preset decoder to obtain a target output matrix, convert the target output matrix into a MIDI file, and generate the final Buddha musical work.
The server inputs the encoder output matrix a_e into a preset decoder to obtain a target output matrix, converts the target output matrix into a MIDI file, and generates the final Buddha musical work. Specifically, the server inputs the encoder output matrix a_e into the preset decoder and generates the target output matrix through the attention units and the feedforward neural network of the preset decoder; the server then performs the inverse operation of the embedding on the target output matrix, outputs a MIDI file, and generates the final Buddha musical work.
In the decoder, the tone information in the input embedded matrix a_e is fixed to the lyric information of the target characters through the attention units and the feedforward neural network; the model then generates the pitch and time-position information of the melody, and the final output matrix is converted back into a MIDI file, so that the final Buddha musical work can be output.
In this embodiment of the invention, an improved relative self-attention algorithm is adopted in the autoencoder, so that the generated Buddha musical works better conform to the rules of traditional music and the accuracy of the lyric tones is improved.
Referring to fig. 2, another embodiment of the attention-model-based Buddha music generation method according to the present invention includes:
201. and obtaining an original audio file, wherein the original audio file is a musical instrument digital interface MIDI file of Buddhism music.
The server acquires an original audio file, wherein the original audio file is a musical instrument digital interface MIDI file of Buddhism music. The server obtains the MIDI file of Buddhism music from the relevant Buddhism website or preset music library, the digital interface of musical instrument (musical instrument digital interface, MIDI) is a kind of communication standard, the apparatus used for confirming the computer music procedure, synthesizer and other electronic sound exchanges information and control signal each other, the MIDI file includes the musical note information of every channel, such as key channel number, duration, volume and dynamics, etc., because the MIDI file is a series of instructions, rather than the waveform, it needs very little disk space, edit and modify very flexible to MIDI data, can increase or delete some notes or change the attribute of the note conveniently.
It will be appreciated that the execution subject of the present invention may be a focus model-based Buddha music generating apparatus, or may be a terminal or a server, which is not limited herein. The embodiment of the invention is described by taking a server as an execution main body as an example.
202. Extract, from the original audio file, the lyric character corresponding to each melody note together with its timestamp, to obtain a plurality of groups of lyric characters.
The server extracts, from the original audio file, the lyric character corresponding to each melody note together with its timestamp, to obtain a plurality of groups of lyric characters. That is, the server extracts the melody notes from the original audio file and determines the timestamp and the lyric character corresponding to each melody note, obtaining a plurality of groups of lyric characters.
203. Look up the lyric tones corresponding to the plurality of groups of lyric characters in the Hakka pronunciation table, to obtain a plurality of groups of lyric tones.
The server looks up the lyric tones corresponding to the plurality of groups of lyric characters in the Hakka pronunciation table, obtaining a plurality of groups of lyric tones. The server refers to the six-tone system of Meizhou Hakka, using yin-ping (dark level), yang-ping (light level), shang (rising), qu (departing), yin-ru (dark entering) and yang-ru (light entering) as the tone labels of the lyric characters, adds these labels into the embedded matrix of the attention model as one of the input parameters, and looks up the lyric tones corresponding to the groups of lyric characters based on the Hakka pronunciation table.
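As a small illustration of attaching these six tone categories as model inputs, the sketch below maps each category to an integer label that could index the embedding matrix; the mapping order and the fallback index are illustrative assumptions.

```python
# Six Meizhou-Hakka tone categories mapped to integer labels for the embedding.
TONE_LABELS = {
    "yin-ping": 0,   # 阴平 (dark level)
    "yang-ping": 1,  # 阳平 (light level)
    "shang": 2,      # 上声 (rising)
    "qu": 3,         # 去声 (departing)
    "yin-ru": 4,     # 阴入 (dark entering)
    "yang-ru": 5,    # 阳入 (light entering)
}

def tone_feature(tone: str) -> int:
    """Map a looked-up tone category to the integer fed into the embedding layer."""
    return TONE_LABELS.get(tone, len(TONE_LABELS))  # unknown tones get a spare index
```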
204. Generate a plurality of individual events based on the plurality of groups of lyric tones, each individual event comprising one group of lyric tones and the melody notes and timestamps corresponding to that group of lyric tones.
The server generates a plurality of individual events based on the plurality of groups of lyric tones, each individual event comprising one group of lyric tones and the melody notes and timestamps corresponding to that group of lyric tones. Each individual event consists of a lyric tone looked up from the lyric character, the corresponding melody note and the corresponding timestamp, and the individual events form a finite event sequence.
205. Take each individual event as an embedded vector x, generate a plurality of groups of query vectors q, key vectors k and value vectors v corresponding to the embedded vectors x based on a preset vector formula, and stack the groups of query vectors q, key vectors k and value vectors v to obtain the corresponding query matrix Q, key matrix K and value matrix V.
The server takes each individual event as an embedded vector x, generates a plurality of groups of query vectors q, key vectors k and value vectors v corresponding to the embedded vectors x based on a preset vector formula, and stacks the groups of query vectors q, key vectors k and value vectors v to obtain the corresponding query matrix Q, key matrix K and value matrix V. Specifically, the server calculates the query vector q, key vector k and value vector v for each embedded vector x based on the preset vector formula q = x·W_Q, k = x·W_K, v = x·W_V, where W_Q, W_K and W_V are preset parameter matrices, each of size N × 512, N is the length of the embedded vector x, and the generated query vector q, key vector k and value vector v are each vectors of length 512. The server then stacks these vectors to obtain the corresponding query matrix Q, key matrix K and value matrix V.
The query vector q, key vector k and value vector v are generated by multiplying the embedded vector x by the respective preset parameter matrices, and the server stacks the generated query vectors q, key vectors k and value vectors v to finally produce the corresponding query matrix Q, key matrix K and value matrix V.
206. Generate a relative attention matrix z_h for each segment of the attention unit based on a preset relative attention matrix calculation formula, generate an output matrix z, generate a weighted result from the embedded vector x and the output matrix z, and input the weighted result into the feedforward neural network.
The server generates a relative attention matrix z_h for each segment of the attention unit based on a preset relative attention matrix calculation formula, generates an output matrix z, generates a weighted result from the embedded vector x and the output matrix z, and inputs the weighted result into the feedforward neural network. Specifically, the server calculates the relative attention matrix z_h of each segment of the attention unit, the attention unit comprising eight segments, based on the preset relative attention matrix calculation formula:
z_h = softmax((Q·K^T + S_rel) / √D_h) · V,
where D_h is the length of the key vector divided by the length of an attention-unit segment, S_rel is the relative position matrix of each segment, and T denotes the transpose of the key matrix K. The server connects the relative attention matrices z_h of the segments in sequence to generate the output matrix z, takes the embedded vector x as a residual, performs weighted addition of the residual and the output matrix z to generate a weighted result, and inputs the weighted result into the feedforward neural network.
The feedforward neural network is also called a multi-layer perceptron. Its neurons are arranged in layers, each neuron is connected only to neurons of the preceding layer, receives the output of the preceding layer and passes its own output to the next layer, and there is no feedback between layers. In this scheme, the embedded vector x is used as a residual and is summed with the output matrix z according to preset weights, and the result is input into the feedforward neural network.
207. Iterate a preset number of times and generate an encoder output matrix a_e from the output matrix z obtained in the last iteration.
The server iterates a preset number of times and generates an encoder output matrix a_e from the output matrix z obtained in the last iteration. Specifically, the server stacks the query vectors q, key vectors k and value vectors v and iterates the calculation of the relative attention matrix z_h the preset number of times to generate the output matrix z; it then splits the output matrix z obtained in the last iteration into h parts and adds them in sequence to obtain the encoder output matrix a_e.
The server performs five iterations of stacking the vectors and calculating the relative attention matrix z_h, then splits the output matrix z obtained in the last iteration into h parts and adds them in sequence to obtain the encoder output matrix a_e.
208. Input the encoder output matrix a_e into a preset decoder to obtain a target output matrix, convert the target output matrix into a MIDI file, and generate the final Buddha musical work.
The server inputs the encoder output matrix a_e into a preset decoder to obtain a target output matrix, converts the target output matrix into a MIDI file, and generates the final Buddha musical work. Specifically, the server inputs the encoder output matrix a_e into the preset decoder and generates the target output matrix through the attention units and the feedforward neural network of the preset decoder; the server then performs the inverse operation of the embedding on the target output matrix, outputs a MIDI file, and generates the final Buddha musical work.
In the decoder, the tone information in the input embedded matrix a_e is fixed to the lyric information of the target characters through the attention units and the feedforward neural network; the model then generates the pitch and time-position information of the melody, and the final output matrix is converted back into a MIDI file, so that the final Buddha musical work can be output.
In this embodiment of the invention, an improved relative self-attention algorithm is adopted in the autoencoder, so that the generated Buddha musical works better conform to the rules of traditional music and the accuracy of the lyric tones is improved.
The attention-model-based Buddha music generation method in the embodiments of the present invention has been described above; the attention-model-based Buddha music generation apparatus in the embodiments of the present invention is described below. Referring to fig. 3, one embodiment of the attention-model-based Buddha music generation apparatus in the embodiments of the present invention includes:
an acquisition module 301, configured to acquire an original audio file, wherein the original audio file is a MIDI file;
an extraction module 302, configured to extract lyric characters from the original audio file, look up the tone corresponding to each lyric character in a preset pronunciation table, and generate a plurality of individual events;
a first generation module 303, configured to take each individual event as an embedded vector x, generate a plurality of groups of query vectors q, key vectors k and value vectors v corresponding to the embedded vectors x based on a preset vector formula, and stack the groups of query vectors q, key vectors k and value vectors v to obtain the corresponding query matrix Q, key matrix K and value matrix V;
a second generation module 304, configured to generate a relative attention matrix z_h for each segment of the attention unit based on a preset relative attention matrix calculation formula, generate an output matrix z, generate a weighted result from the embedded vector x and the output matrix z, and input the weighted result into a feedforward neural network;
an iteration module 305, configured to iterate a preset number of times and generate an encoder output matrix a_e from the output matrix z obtained in the last iteration;
a decoding module 306, configured to input the encoder output matrix a_e into a preset decoder to obtain a target output matrix, convert the target output matrix into a MIDI file, and generate the final Buddha musical work.
In this embodiment of the invention, an improved relative self-attention algorithm is adopted in the autoencoder, so that the generated Buddha musical works better conform to the rules of traditional music and the accuracy of the lyric tones is improved.
Referring to fig. 4, another embodiment of the attention-model-based Buddha music generation apparatus in the embodiments of the present invention includes:
an acquisition module 301, configured to acquire an original audio file, wherein the original audio file is a MIDI file;
an extraction module 302, configured to extract lyric characters from the original audio file, look up the tone corresponding to each lyric character in a preset pronunciation table, and generate a plurality of individual events;
a first generation module 303, configured to take each individual event as an embedded vector x, generate a plurality of groups of query vectors q, key vectors k and value vectors v corresponding to the embedded vectors x based on a preset vector formula, and stack the groups of query vectors q, key vectors k and value vectors v to obtain the corresponding query matrix Q, key matrix K and value matrix V;
a second generation module 304, configured to generate a relative attention matrix z_h for each segment of the attention unit based on a preset relative attention matrix calculation formula, generate an output matrix z, generate a weighted result from the embedded vector x and the output matrix z, and input the weighted result into a feedforward neural network;
an iteration module 305, configured to iterate a preset number of times and generate an encoder output matrix a_e from the output matrix z obtained in the last iteration;
a decoding module 306, configured to input the encoder output matrix a_e into a preset decoder to obtain a target output matrix, convert the target output matrix into a MIDI file, and generate the final Buddha musical work.
Optionally, the extraction module 302 includes:
an extraction unit 3021, configured to extract, from the original audio file, the lyric character corresponding to each melody note together with its timestamp, to obtain a plurality of groups of lyric characters;
a lookup unit 3022, configured to look up the lyric tones corresponding to the plurality of groups of lyric characters in the Hakka pronunciation table, to obtain a plurality of groups of lyric tones;
a generation unit 3023, configured to generate a plurality of individual events based on the plurality of groups of lyric tones, each individual event comprising one group of lyric tones and the melody notes and timestamps corresponding to that group of lyric tones.
Optionally, the first generation module 303 includes:
a first calculation unit 3031, configured to take each individual event as an embedded vector x and generate the query vector q, key vector k and value vector v corresponding to each embedded vector x based on the preset vector formula q = x·W_Q, k = x·W_K, v = x·W_V, where W_Q, W_K and W_V are preset parameter matrices, each of size N × 512, N is the length of the embedded vector x, and the generated query vector q, key vector k and value vector v are each vectors of length 512;
a first stacking unit 3032, configured to stack the query vectors q, key vectors k and value vectors v generated for each individual event to obtain the corresponding query matrix Q, key matrix K and value matrix V.
Optionally, the second generation module 304 includes:
a second calculation unit 3041, configured to calculate the relative attention matrix z_h of each segment of the attention unit based on the preset relative attention matrix calculation formula, wherein the attention unit comprises eight segments and the preset relative attention matrix calculation formula is:
z_h = softmax((Q·K^T + S_rel) / √D_h) · V,
where D_h is the length of the key vector divided by the length of an attention-unit segment, S_rel is the relative position matrix of each segment, and T denotes the transpose of the key matrix K;
a connection unit 3042, configured to connect the relative attention matrices z_h of the segments in sequence to generate the output matrix z, take the embedded vector x as a residual, perform weighted addition of the residual and the output matrix z to generate a weighted result, and input the weighted result into the feedforward neural network.
Optionally, the iteration module 305 includes:
a second stacking unit 3051, configured to stack the query vectors q, key vectors k and value vectors v and iterate the calculation of the relative attention matrix z_h a preset number of times to generate the output matrix z;
a splitting unit 3052, configured to split the output matrix z obtained in the last iteration into h parts and add them in sequence to obtain the encoder output matrix a_e.
Optionally, the decoding module 306 includes:
an input unit 3061, configured to input the encoder output matrix a_e into a preset decoder and generate a target output matrix through the attention units and the feedforward neural network of the preset decoder;
an output unit 3062, configured to perform the inverse operation of the embedding on the target output matrix, output a MIDI file, and generate the final Buddha musical work.
Optionally, the attention-model-based Buddha music generation apparatus further includes:
an insertion module 307, configured to insert an intermediate attention layer into the preset decoder, the intermediate attention layer comprising attention units, wherein the inputs of the preceding units of the intermediate attention layer are z_h and the input of the last unit of the intermediate attention layer is a_e.
In this embodiment of the invention, an improved relative self-attention algorithm is adopted in the autoencoder, so that the generated Buddha musical works better conform to the rules of traditional music and the accuracy of the lyric tones is improved.
The attention-model-based Buddha music generation apparatus in the embodiments of the present invention has been described in detail above in terms of modularized functional entities with reference to figs. 3 and 4; the attention-model-based Buddha music generation device in the embodiments of the present invention is described in detail below in terms of hardware processing.
Fig. 5 is a schematic structural diagram of an attention-model-based Buddha music generation device 500 according to an embodiment of the present invention. The attention-model-based Buddha music generation device 500 may vary considerably in configuration or performance, and may include one or more processors (central processing units, CPU) 510 (e.g., one or more processors), a memory 520, and one or more storage media 530 (e.g., one or more mass storage devices) storing application programs 533 or data 532. The memory 520 and the storage medium 530 may be transitory or persistent storage. The program stored in the storage medium 530 may include one or more modules (not shown), each of which may include a series of instruction operations for the attention-model-based Buddha music generation device 500. Further, the processor 510 may be configured to communicate with the storage medium 530 to execute the series of instruction operations in the storage medium 530 on the attention-model-based Buddha music generation device 500.
The attention-model-based Buddha music generation device 500 may also include one or more power supplies 540, one or more wired or wireless network interfaces 550, one or more input/output interfaces 560, and/or one or more operating systems 531, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, and the like. It will be appreciated by those skilled in the art that the structure shown in fig. 5 does not constitute a limitation on the attention-model-based Buddha music generation device, which may include more or fewer components than illustrated, or combine certain components, or have a different arrangement of components.
The present invention also provides an attention-model-based Buddha music generation device, the computer device including a memory and a processor, the memory storing computer-readable instructions that, when executed by the processor, cause the processor to perform the steps of the attention-model-based Buddha music generation method in the above embodiments. The present invention also provides a computer-readable storage medium, which may be a non-volatile or a volatile computer-readable storage medium, in which instructions are stored that, when run on a computer, cause the computer to perform the steps of the attention-model-based Buddha music generation method.
The blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks generated and linked by cryptographic methods, each data block containing a batch of network-transaction information used to verify the validity of the information (anti-counterfeiting) and to generate the next block. A blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, or in whole or in part, may be embodied in the form of a software product stored in a storage medium, including instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for generating Buddha music based on an attention model, characterized in that the attention model-based Buddha music generating method comprises:
acquiring an original audio file, wherein the original audio file is a Musical Instrument Digital Interface (MIDI) file of Buddhist music;
extracting lyric characters based on the original audio file, looking up the tones corresponding to the lyric characters in a preset pronunciation table, and generating a plurality of individual events;
taking each individual event as an embedded vector x, generating a plurality of groups of query vectors q, key vectors k and value vectors v corresponding to the embedded vectors x based on a preset vector formula, and stacking the query vectors q, key vectors k and value vectors v corresponding to the embedded vectors x to obtain a corresponding query matrix Q, key matrix K and value matrix V;
generating a relative attention matrix z_h for each segment of the attention unit based on a preset relative attention matrix calculation formula, generating an output matrix z, generating a weighted result from the embedded vector x and the output matrix z, and inputting the weighted result into a feedforward neural network;
iterating according to a preset number of times, and generating an encoder output matrix a_e based on the output matrix z obtained in the last iteration;
inputting the encoder output matrix a_e into a preset decoder to obtain a target output matrix, converting the target output matrix into a MIDI file, and generating a final Buddha music work;
the preset relative attention moment array calculation formula is as follows:
wherein,,dividing the length of the key vector by the length of the attention unit segment, < >>For each segmented relative position matrix, T represents the transpose of the key matrix K.
2. The attention model-based Buddha music generating method as set forth in claim 1, wherein the extracting lyric characters based on the original audio file, looking up the tones corresponding to the lyric characters in a preset pronunciation table, and generating a plurality of individual events, each individual event including a lyric character and the tone corresponding to the lyric character, comprises:
extracting, based on the original audio file, the lyric characters corresponding to each melody note and the timestamps thereof, to obtain a plurality of groups of lyric characters;
looking up the tones corresponding to the plurality of groups of lyric characters in a Hakka pronunciation table, to obtain a plurality of groups of tones;
generating, based on the plurality of groups of tones, a plurality of individual events, each individual event comprising a group of tones and the melody notes and timestamps corresponding to that group of tones.
3. The method of claim 1, wherein taking each individual event as the embedded vector x, generating a plurality of groups of query vectors q, key vectors k and value vectors v corresponding to the embedded vectors x based on a preset vector formula, and stacking the plurality of groups of query vectors q, key vectors k and value vectors v corresponding to the embedded vectors x to obtain a corresponding query matrix Q, key matrix K and value matrix V comprises:
taking each individual event as an embedded vector x, and calculating and generating a plurality of groups of query vectors q, key vectors k and value vectors v corresponding to the embedded vector x based on a preset vector formula, wherein the preset vector formula is as follows:

q = xW^{Q}, \quad k = xW^{K}, \quad v = xW^{V}

wherein W^{Q}, W^{K} and W^{V} are preset parameter matrices of size n \times 512, n is the length of the embedded vector x, and the generated query vector q, key vector k and value vector v are each vectors of length 512;
stacking the query vectors q, the key vectors k and the value vectors v generated for each individual event, to obtain the corresponding query matrix Q, key matrix K and value matrix V.
4. The attention model-based Buddha music generating method of claim 1, wherein generating the relative attention matrix z_h for each segment of the attention unit based on the preset relative attention matrix calculation formula, generating the output matrix z, generating the weighted result from the embedded vector x and the output matrix z, and inputting the weighted result into the feedforward neural network comprises:
calculating, based on the preset relative attention matrix calculation formula, the relative attention matrix z_h of each segment of the attention unit, the attention unit comprising eight segments, each segment having a length of n/8, where n is the length of the embedded vector x; the preset relative attention matrix calculation formula is as follows:

z_h = \mathrm{softmax}\left(\frac{QK^{T} + S^{rel}}{\sqrt{d_h}}\right)V

wherein d_h is the length of the key vector divided by the segment length of the attention unit, and S^{rel} is the relative position matrix of each segment;
concatenating the relative attention matrices z_h of the segments in sequence to generate the output matrix z, taking the embedded vector x as a residual, performing weighted addition of the embedded vector x and the output matrix z to generate the weighted result, and inputting the weighted result into the feedforward neural network.
5. The attention model-based Buddha music generating method as claimed in claim 1, wherein the iterating according to a preset number of times and generating the encoder output matrix a_e based on the output matrix z obtained in the last iteration comprises:
performing, according to the preset number of times, iterative computation on the stacked query vectors q, key vectors k and value vectors v and the relative attention matrix z_h, to generate the output matrix z;
splitting the output matrix z obtained in the last iteration into h pieces, and adding the h pieces in sequence to obtain the encoder output matrix a_e.
6. The attention model-based Buddha music generating method according to any one of claims 1 to 5, wherein inputting the encoder output matrix a_e into the preset decoder to obtain the target output matrix, converting the target output matrix into a MIDI file, and generating the final Buddha music work comprises:
inputting the encoder output matrix a_e into the preset decoder, and generating the target output matrix based on the attention unit and the feedforward neural network of the preset decoder;
performing the inverse operation of the embedded vector on the target output matrix, outputting a MIDI file, and generating the final Buddha music work.
7. The attention model-based Buddha music generating method of claim 6, wherein before inputting the encoder output matrix a_e into the preset decoder, the method further comprises:
inserting an intermediate attention layer in the preset decoder, the intermediate attention layer comprising attention units, wherein the inputs of the preceding units of the intermediate attention layer are the relative attention matrices z_h, and the input of the last unit of the intermediate attention layer is the encoder output matrix a_e.
8. An attention model-based Buddha music generating apparatus, characterized in that the attention model-based Buddha music generating apparatus comprises:
The acquisition module is used for acquiring an original audio file, wherein the original audio file is a Musical Instrument Digital Interface (MIDI) file of Buddhist music;
The extraction module is used for extracting lyric characters based on the original audio file, looking up the tones corresponding to the lyric characters in a preset pronunciation table, and generating a plurality of individual events;
The first generation module is used for taking each individual event as an embedded vector x, generating a plurality of groups of query vectors q, key vectors k and value vectors v corresponding to the embedded vectors x based on a preset vector formula, and stacking the plurality of groups of query vectors q, key vectors k and value vectors v corresponding to the embedded vectors x to obtain a corresponding query matrix Q, key matrix K and value matrix V;
The second generation module is used for generating a relative attention matrix z_h for each segment of the attention unit based on a preset relative attention matrix calculation formula, generating an output matrix z, generating a weighted result from the embedded vector x and the output matrix z, and inputting the weighted result into a feedforward neural network;
The iteration module is used for iterating according to a preset number of times and generating an encoder output matrix a_e based on the output matrix z obtained in the last iteration;
The decoding module is used for inputting the encoder output matrix a_e into a preset decoder to obtain a target output matrix, converting the target output matrix into a MIDI file, and generating a final Buddha music work;
wherein the preset relative attention matrix calculation formula is as follows:

z_h = \mathrm{softmax}\left(\frac{QK^{T} + S^{rel}}{\sqrt{d_h}}\right)V

wherein d_h is the length of the key vector divided by the segment length of the attention unit, S^{rel} is the relative position matrix of each segment, and T denotes the transpose of the key matrix K.
9. An attention model-based Buddha music generating device, characterized in that the attention model-based Buddha music generating device comprises: a memory and at least one processor, the memory having instructions stored therein;
the at least one processor invokes the instructions in the memory to cause the attention model-based Buddha music generating device to perform the attention model-based Buddha music generating method of any one of claims 1 to 7.
10. A computer-readable storage medium having instructions stored thereon, which, when executed by a processor, implement the attention model-based Buddha music generating method of any one of claims 1 to 7.
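As a purely illustrative companion to claims 1 and 2, the sketch below builds individual events by pairing lyric characters extracted from a MIDI file with tones looked up in a pronunciation table; the toy table contents, the IndividualEvent fields and the input triples are hypothetical and are not taken from the patent.

```python
# Hypothetical sketch of the event-construction step (claims 1-2): lyric characters
# extracted from a MIDI file are paired with tones from a (toy) pronunciation table.
from dataclasses import dataclass
from typing import List, Tuple

# Toy stand-in for a Hakka pronunciation table: character -> tone category.
PRONUNCIATION_TABLE = {"南": 2, "無": 2, "阿": 1, "彌": 2, "陀": 2, "佛": 4}

@dataclass
class IndividualEvent:
    character: str    # lyric character
    tone: int         # tone looked up from the pronunciation table
    pitch: int        # MIDI note number of the melody note
    timestamp: float  # onset time of the note, in seconds

def build_events(lyrics: List[Tuple[str, int, float]]) -> List[IndividualEvent]:
    """lyrics: (character, MIDI pitch, onset time) triples extracted from the MIDI file."""
    events = []
    for char, pitch, onset in lyrics:
        tone = PRONUNCIATION_TABLE.get(char, 0)  # 0 = unknown tone
        events.append(IndividualEvent(char, tone, pitch, onset))
    return events

if __name__ == "__main__":
    toy_lyrics = [("南", 62, 0.0), ("無", 64, 0.5), ("阿", 65, 1.0), ("彌", 67, 1.5)]
    for e in build_events(toy_lyrics):
        print(e)
```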
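The next sketch, also illustrative only, walks through the encoder side of claims 3 and 5: projecting the embedded vectors to q/k/v, iterating an attention-plus-residual block a preset number of times, and finally splitting the last output matrix into h pieces and summing them into the encoder output a_e. The plain (non-relative) attention inside the loop, the dimensions and the row-wise split are simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def project_qkv(x, W_q, W_k, W_v):
    # Stack per-event q/k/v vectors into the matrices Q, K, V (claim 3).
    return x @ W_q, x @ W_k, x @ W_v

def encoder(x, num_iterations=6, num_pieces=8):
    n, d = x.shape
    W_q, W_k, W_v = (rng.standard_normal((d, d)) * 0.02 for _ in range(3))
    z = x
    for _ in range(num_iterations):                      # preset number of iterations
        Q, K, V = project_qkv(z, W_q, W_k, W_v)
        logits = (Q @ K.T) / np.sqrt(d)
        att = np.exp(logits - logits.max(axis=-1, keepdims=True))
        att /= att.sum(axis=-1, keepdims=True)           # plain softmax attention in this toy
        z = z + att @ V                                   # residual connection
    # Claim 5: split the last output matrix z into h pieces and sum them to get a_e.
    pieces = np.array_split(z, num_pieces, axis=0)
    min_len = min(p.shape[0] for p in pieces)
    return sum(p[:min_len] for p in pieces)

a_e = encoder(rng.standard_normal((16, 64)))
print(a_e.shape)
```

Whether the h pieces are split along the time axis or the feature axis is not specified in the claim text, so the row-wise split used here is only one possible reading.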
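Finally, a hedged sketch of the last step of claim 6: the "inverse operation of the embedded vector" is approximated by an argmax over a toy pitch vocabulary, and the result is written to a MIDI file with the mido package (assumed to be available). The vocabulary, velocities and note durations are arbitrary choices for illustration, not the patented mapping.

```python
import numpy as np
import mido

def matrix_to_midi(target_output, path="buddha_music_sketch.mid", ticks_per_event=480):
    """Map each row of the decoder's target output matrix to a pitch and write a MIDI file."""
    pitch_vocab = np.arange(48, 48 + target_output.shape[1])   # toy pitch vocabulary
    mid = mido.MidiFile()
    track = mido.MidiTrack()
    mid.tracks.append(track)
    for row in target_output:
        pitch = int(pitch_vocab[int(np.argmax(row))])           # inverse embedding via argmax
        track.append(mido.Message('note_on', note=pitch, velocity=64, time=0))
        track.append(mido.Message('note_off', note=pitch, velocity=64, time=ticks_per_event))
    mid.save(path)
    return path

print(matrix_to_midi(np.random.randn(8, 24)))
```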
CN202110311437.7A 2021-03-24 2021-03-24 Buddha music generation method, device, equipment and storage medium based on attention model Active CN112951239B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110311437.7A CN112951239B (en) 2021-03-24 2021-03-24 Buddha music generation method, device, equipment and storage medium based on attention model

Publications (2)

Publication Number Publication Date
CN112951239A CN112951239A (en) 2021-06-11
CN112951239B (en) 2023-07-28

Family

ID=76228479

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110311437.7A Active CN112951239B (en) 2021-03-24 2021-03-24 Buddha music generation method, device, equipment and storage medium based on attention model

Country Status (1)

Country Link
CN (1) CN112951239B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105788589A (en) * 2016-05-04 2016-07-20 腾讯科技(深圳)有限公司 Audio data processing method and device
WO2018194456A1 (en) * 2017-04-20 2018-10-25 Universiteit Van Amsterdam Optical music recognition omr : converting sheet music to a digital format
CN109492227A (en) * 2018-11-16 2019-03-19 大连理工大学 It is a kind of that understanding method is read based on the machine of bull attention mechanism and Dynamic iterations
CN110853626A (en) * 2019-10-21 2020-02-28 成都信息工程大学 Bidirectional attention neural network-based dialogue understanding method, device and equipment
CN111477221A (en) * 2020-05-28 2020-07-31 中国科学技术大学 Speech recognition system using bidirectional time sequence convolution and self-attention mechanism network
CN111524503A (en) * 2020-04-15 2020-08-11 上海明略人工智能(集团)有限公司 Audio data processing method and device, audio recognition equipment and storage medium
CN112331170A (en) * 2020-10-28 2021-02-05 平安科技(深圳)有限公司 Method, device and equipment for analyzing similarity of Buddha music melody and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9065931B2 (en) * 2002-11-12 2015-06-23 Medialab Solutions Corp. Systems and methods for portable audio synthesis
US11257481B2 (en) * 2018-10-24 2022-02-22 Tencent America LLC Multi-task training architecture and strategy for attention-based speech recognition system



Similar Documents

Publication Publication Date Title
CN100397387C (en) Summarizing digital audio data
CN102176310B (en) Speech recognition system with huge vocabulary
Liang et al. Pirhdy: Learning pitch-, rhythm-, and dynamics-aware embeddings for symbolic music
Lu et al. SpecTNT: A time-frequency transformer for music audio
CN102822889B (en) Pre-saved data compression for tts concatenation cost
Yu et al. On-device neural language model based word prediction
Zheng et al. Music genre classification: A n-gram based musicological approach
Ren et al. Discovering time-constrained sequential patterns for music genre classification
Van Balen et al. Cognition-inspired descriptors for scalable cover song retrieval
Thomas et al. Transliteration Based Data Augmentation for Training Multilingual ASR Acoustic Models in Low Resource Settings.
Chowdhuri Phononet: multi-stage deep neural networks for raga identification in hindustani classical music
Mikami Long short-term memory recurrent neural network architectures for generating music and japanese lyrics
Wu et al. Clamp: Contrastive language-music pre-training for cross-modal symbolic music information retrieval
CN112951239B (en) Buddha music generation method, device, equipment and storage medium based on attention model
CN111507101B (en) Ironic detection method based on multi-level semantic capsule routing
Van Balen Audio description and corpus analysis of popular music
CN114662659B (en) Multi-stage transfer learning strategy synthesis-based crowdsourcing text integration method
Deepaisarn et al. NLP-based music processing for composer classification
JP2004348552A (en) Voice document search device, method, and program
López et al. Harmonic reductions as a strategy for creative data augmentation
CN112927667B (en) Chord identification method, device, equipment and storage medium
CN113033778B (en) Buddha music generation method, device, equipment and storage medium
CN113379875B (en) Cartoon character animation generation method, device, equipment and storage medium
Suzuki Score Transformer: Generating Musical Score from Note-level Representation
CN113066457B (en) Fan-exclamation music generation method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant