CN113112969A - Buddhism music score recording method, device, equipment and medium based on neural network

Publication number: CN113112969A (application CN202110308570.7A; granted as CN113112969B)
Authority: CN (China)
Other languages: Chinese (zh)
Inventors: 刘奡智, 韩宝强, 肖京
Assignee: Ping An Technology Shenzhen Co Ltd
Filing/priority date: 2021-03-23
Legal status: granted, active

Classifications

    • G10G 3/04 - Recording music in notation form, e.g. recording the mechanical operation of a musical instrument, using electrical means
    • G10L 15/02 - Feature extraction for speech recognition; selection of recognition unit
    • G10L 15/063 - Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L 15/16 - Speech classification or search using artificial neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Auxiliary Devices For Music (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the field of artificial intelligence, relates to the field of blockchains, and discloses a Buddhist music score recording method, device, equipment and medium based on a neural network. The method comprises: acquiring original Buddhist audio data that needs to be converted into a music score; converting the original Buddhist audio data into a time-frequency spectrum matrix; acquiring a pitch recognition network structure comprising a pitch recognition model and a lyric recognition network structure comprising a lyric recognition model; inputting the time-frequency spectrum matrix into the pitch recognition network structure and the lyric recognition network structure respectively to obtain pitch recognition data and lyric recognition data of the original Buddhist audio data; and generating a numbered musical notation of the original Buddhist audio data according to the pitch recognition data and the lyric recognition data. Using transfer learning, with a pre-trained pitch recognition model and a pre-trained speech-to-text model as the basis, the invention can automatically recognize the melody and lyrics of Buddhist audio to obtain a numbered musical notation, which reduces the time cost and improves the efficiency of transcribing Buddhist music.

Description

Buddhism music score recording method, device, equipment and medium based on neural network
Technical Field
The invention relates to the field of artificial intelligence, in particular to a Buddhist music score recording method, device, equipment and medium based on a neural network.
Background
The field collection and transcription of Buddhist music is of great significance for the preservation and inheritance of Buddhist culture. Because of the particular vocabulary and musical characteristics of Buddhist music, the practical automatic transcription systems that currently exist for popular music and Western classical music achieve low accuracy when recognizing and transcribing Buddhist music, and are therefore difficult to apply in this field.
Therefore, after Buddhist music is collected in the field, its transcription is mainly completed manually. Manual transcription involves a great deal of repetitive work, and the transcriber must have a high level of both Buddhist knowledge and musical skill. Even a scholar proficient in music and Buddhist culture often needs several months to transcribe a collection of one to two hundred scores, consuming a great deal of energy and time; the transcription cost is high and the transcription efficiency is low.
Disclosure of Invention
The invention provides a Buddhist music score recording method, device, equipment and medium based on a neural network, aiming to solve the problem in the prior art that the transcription of Buddhist music depends on manual work, which makes transcription costly and inefficient.
A Buddhism music score recording method based on a neural network comprises the following steps:
acquiring original Buddhist audio data that needs to be converted into a music score, and converting the original Buddhist audio data into a time-frequency spectrum matrix, wherein the original Buddhist audio data comprises the music melody and lyrics of the Buddhist audio;
acquiring a pitch recognition network structure comprising a pitch recognition model, wherein the pitch recognition model is a neural network recognition model formed by transfer learning from a pre-trained sound scene classification model;
acquiring a lyric recognition network structure comprising a lyric recognition model, wherein the lyric recognition model is a neural network recognition model formed by transfer learning from a pre-trained speech-to-text model;
inputting the time-frequency spectrum matrix into the pitch recognition network structure to obtain pitch recognition data of the original Buddhist audio data;
inputting the time-frequency spectrum matrix into the lyric recognition network structure to obtain lyric recognition data of the original Buddhist audio data;
and generating a numbered musical notation of the original Buddhist audio data according to the pitch recognition data and the lyric recognition data.
Further, generating the numbered musical notation of the original Buddhist audio data according to the pitch recognition data and the lyric recognition data includes:
determining the pitch of each frame in the pitch recognition data and determining the lyric of each frame in the lyric recognition data;
correspondingly connecting the pitch of each frame with its lyric to obtain initial data of the original Buddhist audio data;
and performing beat quantization on the initial data with a metronome to generate a numbered musical notation of the initial data.
Further, after generating the numbered musical notation of the original Buddhist audio data according to the pitch recognition data and the lyric recognition data, the method further comprises:
converting the format of the numbered musical notation into the MusicXML format;
acquiring a Buddhist music expert's manual proofreading result on the numbered musical notation in the MusicXML format;
and updating the numbered musical notation according to the manual proofreading result.
Further, converting the original Buddhist audio data into a time-frequency spectrum matrix comprises:
determining the window function for converting the original Buddhist audio data to be a Hanning window function;
and performing a short-time Fourier transform on the original Buddhist audio data with the Hanning window function to obtain the time-frequency spectrum matrix.
Further, the calculation formula of the time-frequency spectrum matrix is as follows:
X(m,\omega) = \sum_{n=0}^{N-1} x[n+m]\, w[n]\, e^{-j\omega n}

wherein X(m, ω) is the time-frequency spectrum matrix, n is the sample ordinal of the signal of the original Buddhist audio data, x[n] is the signal input sequence of the original Buddhist audio data, w[n] is the Hanning window function, m is the time-frame ordinal of the original Buddhist audio data, ω is the digital-frequency ordinal of the original Buddhist audio data, and N is the frame length of the original Buddhist audio data.
A Buddhism music score recording device based on a neural network comprises:
a conversion module, configured to acquire original Buddhist audio data that needs to be converted into a music score and convert the original Buddhist audio data into a time-frequency spectrum matrix, wherein the original Buddhist audio data comprises the music melody and lyrics of the Buddhist audio;
a first acquisition module, configured to acquire a pitch recognition network structure comprising a pitch recognition model, wherein the pitch recognition model is a neural network recognition model formed by transfer learning from a pre-trained sound scene classification model;
a second acquisition module, configured to acquire a lyric recognition network structure comprising a lyric recognition model, wherein the lyric recognition model is a neural network recognition model formed by transfer learning from a pre-trained speech-to-text model;
a first input module, configured to input the time-frequency spectrum matrix into the pitch recognition network structure to obtain pitch recognition data of the original Buddhist audio data;
a second input module, configured to input the time-frequency spectrum matrix into the lyric recognition network structure to obtain lyric recognition data of the original Buddhist audio data;
and a generating module, configured to generate a numbered musical notation of the original Buddhist audio data according to the pitch recognition data and the lyric recognition data.
Further, the generating module is specifically configured to:
determine the pitch of each frame in the pitch recognition data and determine the lyric of each frame in the lyric recognition data;
correspondingly connect the pitch of each frame with its lyric to obtain initial data of the original Buddhist audio data;
and perform beat quantization on the initial data with a metronome to generate a numbered musical notation of the initial data.
Further, the conversion module is specifically configured to:
determine the window function for converting the original Buddhist audio data to be a Hanning window function;
and perform a short-time Fourier transform on the original Buddhist audio data with the Hanning window function to obtain the time-frequency spectrum matrix.
A computer device comprises a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the above neural-network-based Buddhist music score recording method.
A computer-readable storage medium stores a computer program which, when executed by a processor, implements the steps of the above neural-network-based Buddhist music score recording method.
In the scheme provided by the above neural-network-based Buddhist music score recording method, device, equipment and medium, original Buddhist audio data that needs to be converted into a music score is acquired and converted into a time-frequency spectrum matrix, the original Buddhist audio data comprising the music melody and lyrics of the Buddhist audio. A pitch recognition network structure comprising a pitch recognition model is acquired, the pitch recognition model being a neural network recognition model formed by transfer learning from a pre-trained sound scene classification model; a lyric recognition network structure comprising a lyric recognition model is acquired, the lyric recognition model being a neural network recognition model formed by transfer learning from a pre-trained speech-to-text model. The time-frequency spectrum matrix is then input into the pitch recognition network structure to obtain pitch recognition data of the original Buddhist audio data, and into the lyric recognition network structure to obtain lyric recognition data of the original Buddhist audio data; finally, a numbered musical notation of the original Buddhist audio data is generated according to the pitch recognition data and the lyric recognition data. After the original Buddhist audio data is converted into the time-frequency spectrum matrix, transfer learning on the pre-trained pitch recognition model and speech-to-text model allows the melody and lyrics of the Buddhist audio to be recognized automatically, yielding its numbered musical notation, which reduces the time cost and improves the efficiency of transcribing Buddhist music.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive labor.
FIG. 1 is a diagram illustrating an application environment of a Buddhist music score recording method based on a neural network according to an embodiment of the present invention;
FIG. 2 is a flow chart of a Buddhist music score recording method based on a neural network according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the process by which the pitch recognition network structure and the lyric recognition network structure generate a numbered musical notation according to an embodiment of the present invention;
FIG. 4 is a schematic flow chart of a Buddhist music score recording method based on a neural network according to an embodiment of the present invention;
FIG. 5 is a flowchart illustrating an implementation of step S60 in FIG. 2;
FIG. 6 is a flowchart illustrating an implementation of step S10 in FIG. 2;
FIG. 7 is a diagram of a Buddhist music score recording device based on a neural network according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of a computer device according to an embodiment of the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The Buddhist music score recording method based on a neural network provided by the embodiment of the invention can be applied in the application environment shown in FIG. 1, where a terminal device communicates with a server through a network. A user sends original Buddhist audio data that needs to be converted into a music score to the server through the terminal device. The server acquires the original Buddhist audio data, which comprises the music melody and lyrics of the Buddhist audio, and converts it into a time-frequency spectrum matrix. The server then acquires a pitch recognition network structure comprising a pitch recognition model (a neural network recognition model formed by transfer learning from a pre-trained sound scene classification model) and a lyric recognition network structure comprising a lyric recognition model (a neural network recognition model formed by transfer learning from a pre-trained speech-to-text model), inputs the time-frequency spectrum matrix into the pitch recognition network structure to obtain pitch recognition data of the original Buddhist audio data, inputs it into the lyric recognition network structure to obtain lyric recognition data, and finally generates a numbered musical notation of the original Buddhist audio data according to the pitch recognition data and the lyric recognition data. With the pre-trained pitch recognition model and speech-to-text model as the basis, the melody and lyrics of the Buddhist audio can be recognized automatically and its numbered musical notation obtained, realizing an automatic transcription process for Buddhist music that reduces the time cost of transcription and improves its efficiency.
The database in this embodiment is a blockchain database stored in a blockchain network, used to store the data used and generated in implementing the neural-network-based Buddhist music score recording method, such as the original Buddhist audio data, the time-frequency spectrum matrix, pitch recognition data, lyric recognition data, numbered musical notation and other related data. The blockchain referred to in this application is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks linked by cryptographic methods, where each data block contains the information of a batch of network transactions, used to verify the validity (anti-counterfeiting) of the information and to generate the next block. A blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer and the like. Deploying the database on a blockchain improves the security of data storage.
The terminal device may be, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices. The server may be implemented as a stand-alone server or as a server cluster consisting of a plurality of servers.
In one embodiment, as shown in FIG. 2, a Buddhist music score recording method based on a neural network is provided. Taking the server in FIG. 1 as an example, the method includes the following steps:
S10: acquiring original Buddhist audio data that needs to be converted into a music score, and converting the original Buddhist audio data into a time-frequency spectrum matrix, wherein the original Buddhist audio data comprises the music melody and lyrics of the Buddhist audio.
After obtaining the original Buddhist audio data, the user sends the data that needs to be converted into a music score to the server through the terminal device. On receiving it, the server converts the original Buddhist audio data into a time-frequency spectrum matrix, that is, it converts the sound signal into a digital signal for subsequent recognition. The original Buddhist audio data comprises the music melody and lyrics of the Buddhist audio, so that it can later be transcribed into a music score carrying lyric information.
S20: acquiring a pitch recognition network structure comprising a pitch recognition model, wherein the pitch recognition model is a neural network recognition model formed by transfer learning from a pre-trained sound scene classification model.
After receiving the original Buddhist audio data, the server acquires a pitch recognition network structure comprising a pitch recognition model, in order to recognize the sung pitch information in the original Buddhist audio data with this structure. The pitch recognition model of the pitch recognition network structure is a neural network recognition model formed by transfer learning from a pre-trained sound scene classification model.
S30: acquiring a lyric recognition network structure comprising a lyric recognition model, wherein the lyric recognition model is a neural network recognition model formed by transfer learning from a pre-trained speech-to-text model.
After receiving the original Buddhist audio data, the server also acquires a lyric recognition network structure comprising a lyric recognition model, in order to recognize the lyric information in the original Buddhist audio data with this structure. The lyric recognition model is a neural network recognition model formed by transfer learning from a pre-trained Speech-To-Text model.
S40: inputting the time-frequency spectrum matrix into the pitch recognition network structure to obtain pitch recognition data of the original Buddhist audio data.
After converting the original Buddhist audio data into a time-frequency spectrum matrix and obtaining the pitch recognition network structure, the server inputs the converted time-frequency spectrum matrix into the pitch recognition network structure to perform note recognition on it, receives the note information output by the structure, and obtains the pitch recognition data of the original Buddhist audio data.
S50: inputting the time-frequency spectrum matrix into the lyric recognition network structure to obtain lyric recognition data of the original Buddhist audio data.
Likewise, after converting the original Buddhist audio data into a time-frequency spectrum matrix and obtaining the lyric recognition network structure, the server inputs the time-frequency spectrum matrix into the lyric recognition network structure to perform lyric recognition on it, receives the lyric information output by the structure, and obtains the lyric recognition data of the original Buddhist audio data.
The pitch recognition network structure comprises the pitch recognition model and two fully connected layers; to ensure the fitting effect of the pitch recognition network structure, two fully connected layers with sequentially decreasing dimensions are added after the pitch recognition model. Likewise, to ensure the fitting effect of the lyric recognition network structure, two fully connected layers with sequentially decreasing dimensions are added after the lyric recognition model. Because there are fewer musical notes than commonly used characters, the two fully connected layers in the pitch recognition network structure have smaller dimensions than the two fully connected layers in the lyric recognition network structure.
For example, the pitch recognition network structure and the lyric recognition network structure are as shown in FIG. 3:
The pitch recognition model can adopt a structure based on a Convolutional Neural Network (CNN), namely a VGGish model pre-trained on the AudioSet dataset, with the network structure and parameters of the pre-trained model left unchanged. The VGGish pre-trained model comprises 12 convolutional layers and 4 pooling layers, and two added fully connected layers of dimensions 1024 and 180 are connected after the last pooling layer. The lyric recognition model adopts a pre-trained Deep Speech 2 hybrid model of a Recurrent Neural Network (RNN) and a Convolutional Neural Network (CNN), with the network structure and parameters of the pre-trained model left unchanged. This pre-trained hybrid model comprises 4 convolutional layers, with the recurrent network, consisting of several groups of Gated Recurrent Units (GRUs), located between the 3rd and 4th convolutional layers, and two fully connected layers of dimensions 2048 and 360 connected after the fourth convolutional layer. The pitch recognition network structure outputs a 180-dimensional probability distribution corresponding to the 36 notes of the three octaves C4-B6 resolved in 20-cent steps (five bins per semitone), which guarantees the accuracy of the pitch recognition data. The lyric recognition network structure outputs a 360-dimensional probability distribution corresponding to 360 labels of characters common in Buddhist music, each label corresponding to a single character in a predefined library of Buddhist lyric characters.
After the original Buddhist audio data is obtained and converted into a time-frequency spectrum matrix, the matrix is input into the VGGish pre-trained model and the pre-trained hybrid model respectively. The VGGish pre-trained model performs convolution and pooling on the time-frequency spectrum matrix, feeds the resulting features into the fully connected layer of dimension 1024 and then into the fully connected layer of dimension 180 for classification, applies a Softmax function for normalization, and finally outputs the pitch recognition data. The pre-trained hybrid model convolves the time-frequency spectrum matrix 3 times, passes the result through the recurrent network, convolves the recurrent network's output once more, feeds the result sequentially into its two fully connected layers for classification, normalizes the recognition result with a Softmax function, and finally outputs the lyric recognition data. The output pitch recognition data and lyric recognition data are then correspondingly connected to obtain a numbered musical notation of the original Buddhist audio data.
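Continuing the sketch above, the joint inference just described might look as follows; `vggish_trunk`, `ds2_trunk`, the feature widths, and the spectrum tensor `spec` are placeholders, and the Deep Speech 2 branch is wrapped in the same generic class purely for brevity.

```python
pitch_net = RecognitionNetwork(vggish_trunk, feat_dim=512, hidden_dim=1024, num_labels=180)
lyric_net = RecognitionNetwork(ds2_trunk, feat_dim=1024, hidden_dim=2048, num_labels=360)

with torch.no_grad():
    pitch_probs = torch.softmax(pitch_net(spec), dim=-1)   # (frames, 180), Softmax-normalized
    lyric_probs = torch.softmax(lyric_net(spec), dim=-1)   # (frames, 360), Softmax-normalized

pitch_per_frame = pitch_probs.argmax(dim=-1)   # one note label per frame
lyric_per_frame = lyric_probs.argmax(dim=-1)   # one character label per frame
```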
In this embodiment, the structures of the pitch recognition model and the libretto recognition model are merely exemplary illustrations, and in other embodiments, the pitch recognition model and the libretto recognition model may also be neural network models with other structures, which are not described herein again.
In this embodiment, the dimensions of the full connection layer in the pitch recognition network structure and the lyric recognition network structure are only exemplary illustrations, and in other embodiments, the dimensions of the full connection layer may be other, which is not described herein again.
In addition, when training the pitch recognition network structure and the lyric recognition network structure, the cross-entropy of each structure's output is computed as its loss function, and the parameters of the two fully connected layers in each structure are updated by gradient-descent back-propagation, which improves the training speed and recognition effect.
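A minimal training-step sketch following that description; the optimizer, learning rate, and data loader are assumptions, and because the trunk is frozen, gradients reach only the two new fully connected layers.

```python
import torch.optim as optim

criterion = nn.CrossEntropyLoss()   # cross-entropy of the network's output as the loss
optimizer = optim.SGD([p for p in pitch_net.parameters() if p.requires_grad], lr=1e-3)

for spec_batch, note_labels in train_loader:   # labelled Buddhist music samples (assumed)
    logits = pitch_net(spec_batch)             # per-frame logits
    loss = criterion(logits.reshape(-1, 180), note_labels.reshape(-1))
    optimizer.zero_grad()
    loss.backward()                            # back-propagation updates fc1/fc2 only
    optimizer.step()
```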
S60: generating a numbered musical notation of the original Buddhist audio data according to the pitch recognition data and the lyric recognition data.
After the pitch recognition data and the lyric recognition data of the original Buddhist audio data are obtained, the pitch recognition data is matched to the lyric recognition data frame by frame over the signal of the original Buddhist audio data, thereby generating a numbered musical notation of the original Buddhist audio data, and the server outputs the generated numbered musical notation to the terminal device for the user to consult.
By applying transfer learning to the pre-trained pitch recognition model and speech-to-text model to generate the pitch recognition network structure and the lyric recognition network structure, an automatic transcription model of relatively high accuracy can be trained with only a small number of Buddhist music samples. The pitch recognition data and lyric recognition data of the original Buddhist audio data are recognized through these network structures and a numbered musical notation is then generated, which can replace Buddhist music scholars in the repetitive and tedious melody and lyric recognition part of transcription work, improving the transcription efficiency of Buddhist music and contributing to the preservation and inheritance of traditional Buddhist culture.
In this embodiment, original Buddhist audio data that needs to be converted into a music score is acquired and converted into a time-frequency spectrum matrix, the original Buddhist audio data comprising the music melody and lyrics of the Buddhist audio. A pitch recognition network structure comprising a pitch recognition model (formed by transfer learning from a pre-trained sound scene classification model) and a lyric recognition network structure comprising a lyric recognition model (formed by transfer learning from a pre-trained speech-to-text model) are acquired. The time-frequency spectrum matrix is input into the pitch recognition network structure to obtain pitch recognition data of the original Buddhist audio data, and into the lyric recognition network structure to obtain lyric recognition data; finally, a numbered musical notation of the original Buddhist audio data is generated according to the pitch recognition data and the lyric recognition data. After the original Buddhist audio data is converted into a time-frequency spectrum matrix, the transfer-learned pitch recognition model and speech-to-text model allow the melody and lyrics of the Buddhist audio to be recognized automatically, yielding its numbered musical notation, which reduces the time cost and improves the efficiency of transcribing Buddhist music.
In an embodiment, as shown in FIG. 4, after step S60, that is, after generating the numbered musical notation of the original Buddhist audio data according to the pitch recognition data and the lyric recognition data, the method further includes the following steps:
S71: converting the format of the numbered musical notation into the MusicXML format.
After the numbered musical notation of the original Buddhist audio data is generated according to the pitch recognition data and the lyric recognition data, its format is converted into MusicXML and the MusicXML notation is output, so that a Buddhist music expert can manually proofread and modify it.
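The patent does not name a serialization tool; one hedged possibility is the music21 library, sketched below with an assumed (pitch, lyric, duration) event format.

```python
from music21 import note, stream

def events_to_musicxml(events, path="score.musicxml"):
    """events: iterable of (pitch_name, lyric_char, quarter_length) tuples."""
    part = stream.Part()
    for pitch_name, lyric_char, quarter_length in events:
        n = note.Note(pitch_name, quarterLength=quarter_length)
        n.lyric = lyric_char          # attach the recognized character to the note
        part.append(n)
    part.write("musicxml", fp=path)   # the file the expert proofreads
```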
S72: and acquiring the manual proofreading result of the Buddhist music experts on the numbered musical notation in the MusicXML format.
After the Buddhist music expert manually proofreads and modifies the numbered musical notation in the MusicXML format, the expert's manual proofreading result on the MusicXML notation is acquired.
S73: updating the numbered musical notation according to the manual proofreading result.
After the manual proofreading result on the MusicXML notation is obtained, the numbered musical notation is updated according to it and finally serves as the finished score corresponding to the original Buddhist audio data, which guarantees the accuracy of the numbered musical notation and makes it convenient to maintain and use subsequently.
In this embodiment, after the numbered musical notation of the original Buddhist audio data is generated according to the pitch recognition data and the lyric recognition data, its format is converted into MusicXML, the Buddhist music expert's manual proofreading result on the MusicXML notation is acquired, and the numbered musical notation is updated accordingly, so that the music score corresponding to the original Buddhist audio data is output.
In one embodiment, as shown in FIG. 5, in step S60, generating a numbered musical notation of the original Buddhist audio data according to the pitch recognition data and the lyric recognition data specifically includes the following steps:
S61: determining the pitch of each frame in the pitch recognition data, and determining the lyric of each frame in the lyric recognition data.
S62: correspondingly connecting the pitch of each frame with its lyric to obtain initial data of the original Buddhist audio data.
After the pitch recognition data and the lyric recognition data are obtained, the pitch of each frame in the pitch recognition data is correspondingly connected with the lyric of the same frame in the lyric recognition data, yielding the initial data of the original Buddhist audio data.
For example, if the pitch corresponding to the first frame in the pitch recognition data is C4 and the lyric corresponding to the first frame in the lyric recognition data is the word "good", then C4 is connected with "good", and the pitch and lyric of each frame are connected in turn, obtaining initial data in which pitches and lyrics correspond.
In this embodiment, the pitch C4 and the lyric "good" for the first frame are used only as an example; in other embodiments the frames may carry other values, which is not described herein again.
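A minimal sketch of this per-frame pairing; merging runs of identical (pitch, lyric) pairs into single events with a frame count is an assumed representation, and the label values are illustrative.

```python
def pair_frames(pitches, lyrics):
    """pitches, lyrics: equal-length per-frame label sequences."""
    events = []
    for pitch, lyric in zip(pitches, lyrics):
        if events and events[-1][0] == pitch and events[-1][1] == lyric:
            events[-1][2] += 1                  # extend the running note
        else:
            events.append([pitch, lyric, 1])    # new [pitch, lyric, n_frames] event
    return events

# e.g. pair_frames(["C4", "C4", "D4"], ["na", "na", "mo"]) -> [["C4", "na", 2], ["D4", "mo", 1]]
```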
S63: and performing beat quantization on the initial data by adopting a metronome to generate a numbered musical notation of the initial data.
After the initial data in which pitches correspond to lyrics is obtained, the metronome detects the beat count of the initial data, beat quantization is performed according to that beat count, and finally a numbered musical notation carrying the lyric information is generated and output to the terminal device.
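A hedged sketch of beat quantization: given the beat period the metronome detects, each event's duration in frames is snapped to the nearest grid unit; the sixteenth-note grid is an assumed subdivision.

```python
def quantize_events(events, frames_per_beat, subdivision=4):
    """Snap [pitch, lyric, n_frames] events to a beat grid; durations are returned in beats."""
    grid = frames_per_beat / subdivision         # frames per grid unit
    quantized = []
    for pitch, lyric, n_frames in events:
        units = max(1, round(n_frames / grid))   # never quantize a note away entirely
        quantized.append((pitch, lyric, units / subdivision))
    return quantized
```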
In this embodiment, the pitch of each frame in the pitch recognition data and the lyric of each frame in the lyric recognition data are determined, the pitch of each frame is correspondingly connected with its lyric to obtain the initial data of the original Buddhist audio data, and finally a metronome is used to beat-quantize the initial data and generate its numbered musical notation. This refines the specific steps of generating the numbered musical notation of the original Buddhist audio data from the pitch recognition data and the lyric recognition data, and provides a basis for generating the numbered musical notation.
In an embodiment, as shown in FIG. 6, in step S10, converting the original Buddhist audio data into a time-frequency spectrum matrix specifically includes the following steps:
S11: determining the window function for converting the original Buddhist audio data to be a Hanning window function.
After the original Buddhist audio data is obtained, a window function must be determined, with which the original Buddhist audio data is cut into frames for conversion into a time-frequency spectrum matrix, reducing leakage during the conversion. The window function in this embodiment is the relatively smooth Hanning window function, which ensures that the transformed time-frequency spectrum matrix is closer to the real spectrum of the original Buddhist audio data.
S12: performing a short-time Fourier transform on the original Buddhist audio data with the Hanning window function to obtain the time-frequency spectrum matrix.
After the window function is determined to be a Hanning window function, a short-time Fourier transform is performed on the original Buddhist audio data with it to obtain the time-frequency spectrum matrix. Converting the original Buddhist audio data by short-time Fourier transform loses neither the amplitude nor the phase information of the signal, so the resulting time-frequency spectrum matrix has clear instantaneous frequency and time-delay structure, which facilitates the subsequent feature recognition by the models.
The short-time Fourier transform of the original Buddhist audio data with the Hanning window function yields the time-frequency spectrum matrix, whose calculation formula is:

X(m,\omega) = \sum_{n=0}^{N-1} x[n+m]\, w[n]\, e^{-j\omega n}

wherein X(m, ω) is the time-frequency spectrum matrix, n is the sample ordinal of the signal of the original Buddhist audio data, x[n] is the signal input sequence of the original Buddhist audio data, w[n] is the Hanning window function, m is the time-frame ordinal of the original Buddhist audio data, ω is the digital-frequency ordinal of the original Buddhist audio data, and N is the frame length of the original Buddhist audio data.
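A small SciPy sketch of this step; the frame length and hop size are illustrative choices, and window="hann" supplies the Hanning window w[n] of the formula above.

```python
import numpy as np
from scipy.signal import stft

def to_tf_matrix(x: np.ndarray, sample_rate: int, frame_len: int = 2048) -> np.ndarray:
    """Short-time Fourier transform of an audio signal using a Hanning window."""
    _, _, X = stft(x, fs=sample_rate, window="hann",
                   nperseg=frame_len, noverlap=frame_len // 2)
    return X    # complex matrix: both amplitude and phase are retained
```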
In this embodiment, the window function for converting the original Buddhist audio data is determined to be a Hanning window function, and a short-time Fourier transform is performed on the original Buddhist audio data with it to obtain the time-frequency spectrum matrix. This refines the step of converting the original Buddhist audio data into a time-frequency spectrum matrix; using a Hanning window for the short-time Fourier transform reduces the possibility of spectral leakage and reduces high-frequency interference.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
In an embodiment, a Buddhist music score recording device based on a neural network is provided, corresponding one-to-one to the Buddhist music score recording method based on a neural network in the above embodiments. As shown in FIG. 7, the Buddhist music score recording device based on a neural network includes a conversion module 701, a first acquisition module 702, a second acquisition module 703, a first input module 704, a second input module 705 and a generating module 706. The functional modules are explained in detail as follows:
a conversion module 701, configured to acquire original Buddhist audio data that needs to be converted into a music score and convert the original Buddhist audio data into a time-frequency spectrum matrix, where the original Buddhist audio data includes the music melody and lyrics of the Buddhist audio;
a first acquisition module 702, configured to acquire a pitch recognition network structure including a pitch recognition model, where the pitch recognition model is a neural network recognition model formed by transfer learning from a pre-trained sound scene classification model;
a second acquisition module 703, configured to acquire a lyric recognition network structure including a lyric recognition model, where the lyric recognition model is a neural network recognition model formed by transfer learning from a pre-trained speech-to-text model;
a first input module 704, configured to input the time-frequency spectrum matrix into the pitch recognition network structure to obtain pitch recognition data of the original Buddhist audio data;
a second input module 705, configured to input the time-frequency spectrum matrix into the lyric recognition network structure to obtain lyric recognition data of the original Buddhist audio data;
and a generating module 706, configured to generate a numbered musical notation of the original Buddhist audio data according to the pitch recognition data and the lyric recognition data.
Further, the generating module 706 is specifically configured to:
determine the pitch of each frame in the pitch recognition data and determine the lyric of each frame in the lyric recognition data;
correspondingly connect the pitch of each frame with its lyric to obtain initial data of the original Buddhist audio data;
and perform beat quantization on the initial data with a metronome to generate a numbered musical notation of the initial data.
Further, after generating the numbered musical notation of the original Buddhist audio data according to the pitch recognition data and the lyric recognition data, the generating module 706 is further configured to:
convert the format of the numbered musical notation into the MusicXML format;
acquire the Buddhist music expert's manual proofreading result on the numbered musical notation in the MusicXML format;
and update the numbered musical notation according to the manual proofreading result.
Further, the conversion module 701 is specifically configured to:
determine the window function for converting the original Buddhist audio data to be a Hanning window function;
and perform a short-time Fourier transform on the original Buddhist audio data with the Hanning window function to obtain the time-frequency spectrum matrix.
Further, the conversion module 701 is specifically configured to obtain the time-frequency spectrum matrix according to the following formula:
X(m,\omega) = \sum_{n=0}^{N-1} x[n+m]\, w[n]\, e^{-j\omega n}

wherein X(m, ω) is the time-frequency spectrum matrix, n is the sample ordinal of the signal of the original Buddhist audio data, x[n] is the signal input sequence of the original Buddhist audio data, w[n] is the Hanning window function, m is the time-frame ordinal of the original Buddhist audio data, ω is the digital-frequency ordinal of the original Buddhist audio data, and N is the frame length of the original Buddhist audio data.
For the specific definition of the Buddhist music score recording device based on a neural network, refer to the definition of the Buddhist music score recording method based on a neural network above, which is not repeated here. Each module in the above device may be implemented wholly or partially by software, hardware or a combination thereof. The modules may be embedded in or independent of a processor in the computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can invoke and execute the operations corresponding to each module.
In one embodiment, a computer device is provided, which may be a server, and whose internal structure may be as shown in FIG. 8. The computer device includes a processor, a memory, a network interface and a database connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory; the non-volatile storage medium stores an operating system, a computer program and a database, while the internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The database of the computer device stores the data used and generated by the neural-network-based Buddhist music score recording method, including the original Buddhist audio data, pitch recognition network structure data, lyric recognition network structure data, pitch recognition data, lyric recognition data, numbered musical notation and the like. The network interface of the computer device communicates with external terminals through a network connection. The computer program, when executed by the processor, implements the neural-network-based Buddhist music score recording method.
In one embodiment, a computer device is provided, comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the following steps when executing the computer program:
acquiring original Buddhist audio data that needs to be converted into a music score, and converting the original Buddhist audio data into a time-frequency spectrum matrix, wherein the original Buddhist audio data comprises the music melody and lyrics of the Buddhist audio;
acquiring a pitch recognition network structure comprising a pitch recognition model, wherein the pitch recognition model is a neural network recognition model formed by transfer learning from a pre-trained sound scene classification model;
acquiring a lyric recognition network structure comprising a lyric recognition model, wherein the lyric recognition model is a neural network recognition model formed by transfer learning from a pre-trained speech-to-text model;
inputting the time-frequency spectrum matrix into the pitch recognition network structure to obtain pitch recognition data of the original Buddhist audio data;
inputting the time-frequency spectrum matrix into the lyric recognition network structure to obtain lyric recognition data of the original Buddhist audio data;
and generating a numbered musical notation of the original Buddhist audio data according to the pitch recognition data and the lyric recognition data.
In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon which, when executed by a processor, performs the following steps:
acquiring original Buddhist audio data that needs to be converted into a music score, and converting the original Buddhist audio data into a time-frequency spectrum matrix, wherein the original Buddhist audio data comprises the music melody and lyrics of the Buddhist audio;
acquiring a pitch recognition network structure comprising a pitch recognition model, wherein the pitch recognition model is a neural network recognition model formed by transfer learning from a pre-trained sound scene classification model;
acquiring a lyric recognition network structure comprising a lyric recognition model, wherein the lyric recognition model is a neural network recognition model formed by transfer learning from a pre-trained speech-to-text model;
inputting the time-frequency spectrum matrix into the pitch recognition network structure to obtain pitch recognition data of the original Buddhist audio data;
inputting the time-frequency spectrum matrix into the lyric recognition network structure to obtain lyric recognition data of the original Buddhist audio data;
and generating a numbered musical notation of the original Buddhist audio data according to the pitch recognition data and the lyric recognition data.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database or other media used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM) and Rambus dynamic RAM (RDRAM).
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present invention, and are intended to be included within the scope of the present invention.

Claims (10)

1. A Buddhism music score recording method based on a neural network, characterized by comprising the following steps:
acquiring original Buddhist audio data that needs to be converted into a music score, and converting the original Buddhist audio data into a time-frequency spectrum matrix, wherein the original Buddhist audio data comprises the music melody and lyrics of the Buddhist audio;
acquiring a pitch recognition network structure comprising a pitch recognition model, wherein the pitch recognition model is a neural network recognition model formed by transfer learning from a pre-trained sound scene classification model;
acquiring a lyric recognition network structure comprising a lyric recognition model, wherein the lyric recognition model is a neural network recognition model formed by transfer learning from a pre-trained speech-to-text model;
inputting the time-frequency spectrum matrix into the pitch recognition network structure to obtain pitch recognition data of the original Buddhist audio data;
inputting the time-frequency spectrum matrix into the lyric recognition network structure to obtain lyric recognition data of the original Buddhist audio data;
and generating a numbered musical notation of the original Buddhist audio data according to the pitch recognition data and the lyric recognition data.
2. The Buddhism music score recording method based on a neural network of claim 1, wherein generating the numbered musical notation of the original Buddhist audio data according to the pitch recognition data and the lyric recognition data comprises:
determining the pitch of each frame in the pitch recognition data and determining the lyric of each frame in the lyric recognition data;
correspondingly connecting the pitch of each frame with its lyric to obtain initial data of the original Buddhist audio data;
and performing beat quantization on the initial data with a metronome to generate a numbered musical notation of the initial data.
3. The Buddhism music score recording method based on a neural network of claim 1, wherein after generating the numbered musical notation of the original Buddhist audio data according to the pitch recognition data and the lyric recognition data, the method further comprises:
converting the format of the numbered musical notation into the MusicXML format;
acquiring a Buddhist music expert's manual proofreading result on the numbered musical notation in the MusicXML format;
and updating the numbered musical notation according to the manual proofreading result.
4. The Buddhism music score recording method based on a neural network of claim 1, wherein converting the original Buddhist audio data into a time-frequency spectrum matrix comprises:
determining the window function for converting the original Buddhist audio data to be a Hanning window function;
and performing a short-time Fourier transform on the original Buddhist audio data with the Hanning window function to obtain the time-frequency spectrum matrix.
5. The Buddhism music score recording method based on a neural network of claim 4, wherein the time-frequency spectrum matrix is calculated as follows:

X(m,\omega) = \sum_{n=0}^{N-1} x[n+m]\, w[n]\, e^{-j\omega n}

wherein X(m, ω) is the time-frequency spectrum matrix, n is the sample ordinal of the signal of the original Buddhist audio data, x[n] is the signal input sequence of the original Buddhist audio data, w[n] is the Hanning window function, m is the time-frame ordinal of the original Buddhist audio data, ω is the digital-frequency ordinal of the original Buddhist audio data, and N is the frame length of the original Buddhist audio data.
6. A neural network-based Buddhist music notation apparatus, characterized by comprising:
a conversion module, configured to acquire original Buddhist audio data to be converted into a music score and convert the original Buddhist audio data into a time-frequency spectrum matrix, wherein the original Buddhist audio data comprises the musical melody and libretto of the Buddhist audio;
a first acquisition module, configured to acquire a pitch recognition network structure comprising a pitch recognition model, wherein the pitch recognition model is a neural network recognition model formed by transfer learning from a pre-trained sound scene classification model;
a second acquisition module, configured to acquire a lyric recognition network structure comprising a lyric recognition model, wherein the lyric recognition model is a neural network recognition model formed by transfer learning from a pre-trained speech-to-text model;
a first input module, configured to input the time-frequency spectrum matrix into the pitch recognition network structure to obtain pitch recognition data of the original Buddhist audio data;
a second input module, configured to input the time-frequency spectrum matrix into the lyric recognition network structure to obtain lyric recognition data of the original Buddhist audio data;
and a generating module, configured to generate a numbered musical notation of the original Buddhist audio data according to the pitch recognition data and the lyric recognition data.
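Mapped onto code, the claim-6 modules might be organized as below; all names are hypothetical, and the bodies would delegate to the sketches given under claims 1 to 3:

    # Hypothetical skeleton grouping the claim-6 modules into one apparatus class.
    class BuddhistNotationApparatus:
        def __init__(self, pitch_net, lyric_net):
            self.pitch_net = pitch_net  # obtained by the first acquisition module
            self.lyric_net = lyric_net  # obtained by the second acquisition module

        def convert(self, audio):
            """Conversion module: audio -> time-frequency spectrum matrix."""
            raise NotImplementedError

        def recognize(self, spectrum):
            """First and second input modules: spectrum -> (pitch, lyric) data."""
            return self.pitch_net(spectrum), self.lyric_net(spectrum)

        def generate(self, pitch_data, lyric_data):
            """Generating module: recognition data -> numbered musical notation."""
            raise NotImplementedError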
7. The neural network-based Buddhist music notation apparatus as claimed in claim 6, wherein generating the numbered musical notation of the original Buddhist audio data according to the pitch recognition data and the lyric recognition data comprises:
determining the pitch of each frame in the pitch recognition data and determining the lyric of each frame in the lyric recognition data;
pairing the pitch of each frame with its corresponding lyric to obtain initial data of the original Buddhist audio data;
and performing beat quantization on the initial data using a metronome to generate a numbered musical notation of the initial data.
8. The neural network-based Buddhist music notation apparatus as claimed in claim 6, wherein converting the original Buddhist audio data into a time-frequency spectrum matrix comprises:
determining the window function used for converting the original Buddhist audio data to be a Hanning window function;
and performing a short-time Fourier transform on the original Buddhist audio data according to the Hanning window function to obtain the time-frequency spectrum matrix.
9. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the neural network-based Buddhist music notation method of any one of claims 1 to 5.
10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the neural network-based Buddhist music notation method of any one of claims 1 to 5.
CN202110308570.7A 2021-03-23 2021-03-23 Buddhism music notation method, device, equipment and medium based on neural network Active CN113112969B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110308570.7A CN113112969B (en) 2021-03-23 2021-03-23 Buddhism music notation method, device, equipment and medium based on neural network

Publications (2)

Publication Number Publication Date
CN113112969A true CN113112969A (en) 2021-07-13
CN113112969B CN113112969B (en) 2024-04-05

Family

ID=76712109

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110308570.7A Active CN113112969B (en) 2021-03-23 2021-03-23 Buddhism music notation method, device, equipment and medium based on neural network

Country Status (1)

Country Link
CN (1) CN113112969B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113781988A (en) * 2021-07-30 2021-12-10 北京达佳互联信息技术有限公司 Subtitle display method, subtitle display device, electronic equipment and computer-readable storage medium

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101853650A (en) * 2009-04-03 2010-10-06 王路露 Music processing and output system and method thereof
CN105788589A (en) * 2016-05-04 2016-07-20 腾讯科技(深圳)有限公司 Audio data processing method and device
CN107293290A (en) * 2017-07-31 2017-10-24 郑州云海信息技术有限公司 The method and apparatus for setting up Speech acoustics model
CN108597539A (en) * 2018-02-09 2018-09-28 桂林电子科技大学 Speech-emotion recognition method based on parameter migration and sound spectrograph
CN108805000A (en) * 2018-04-09 2018-11-13 平安科技(深圳)有限公司 Electronic device, the music score recognition method based on deep learning and storage medium
CN110188235A (en) * 2019-05-05 2019-08-30 平安科技(深圳)有限公司 Music style classification method, device, computer equipment and storage medium
CN110675845A (en) * 2019-09-25 2020-01-10 杨岱锦 Human voice humming accurate recognition algorithm and digital notation method
CN111402855A (en) * 2020-03-06 2020-07-10 北京字节跳动网络技术有限公司 Speech synthesis method, speech synthesis device, storage medium and electronic equipment
CN112133288A (en) * 2020-09-22 2020-12-25 中用科技有限公司 Method, system and equipment for processing voice to character
CN112365877A (en) * 2020-11-27 2021-02-12 北京百度网讯科技有限公司 Speech synthesis method, speech synthesis device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant