CN111063327A - Audio processing method and device, electronic equipment and storage medium - Google Patents


Publication number
CN111063327A
Authority
CN
China
Prior art keywords: music score, audio data, notes, identified, score
Legal status: Pending (assumed; not a legal conclusion)
Application number: CN201911398886.9A
Other languages: Chinese (zh)
Inventor: 马丹
Current Assignee: China Mobile Communications Group Co Ltd; MIGU Culture Technology Co Ltd
Original Assignee: China Mobile Communications Group Co Ltd; MIGU Culture Technology Co Ltd
Application filed by China Mobile Communications Group Co Ltd and MIGU Culture Technology Co Ltd
Priority to CN201911398886.9A
Publication of CN111063327A

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00 Details of electrophonic musical instruments
    • G10H1/0008 Associated control or indicating means
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques characterised by the type of extracted parameters
    • G10L25/48 Speech or voice analysis techniques specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques specially adapted for comparison or discrimination

Abstract

Embodiments of the present invention provide an audio processing method and apparatus, an electronic device, and a storage medium. The method comprises: acquiring acoustic features of audio data to be identified, the acoustic features being used to represent pitch; and determining, according to the acoustic features of the audio data to be identified, the positions in an electronic music score of the notes corresponding to the audio data to be identified. The audio processing method, apparatus, electronic device, and storage medium provided by the embodiments of the present invention locate the notes in the audio data to be identified within the electronic music score on the basis of the acoustic features of the audio data to be identified.

Description

Audio processing method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of audio technologies, and in particular, to an audio processing method and apparatus, an electronic device, and a storage medium.
Background
When performing on a musical instrument, a musician usually needs to refer to a music score, and a paper music score brings several problems caused by page turning:
1. the page turning operation seriously affects the fluency of the performance and increases its difficulty;
2. even if the player can turn pages smoothly, doing so still detracts somewhat from the experience of the live audience.
To solve the above problems, those skilled in the art have devised a series of automatic or semi-automatic page turning devices, which fall mainly into the following categories:
1. semi-automatic mechanical page turning devices: the player controls the page turning of a paper music score through a mechanical device such as a pedal;
2. semi-automatic electronic page turning devices: the player must touch a designated area of the screen of an electronic device to make the electronic music score turn pages;
3. fully automatic, time-controlled electronic or mechanical page turning devices: the time points for page turning are preset in advance, and the mechanical or electronic device turns pages on that schedule.
In practical use, the above devices have been found to have certain drawbacks.
The semi-automatic mechanical and semi-automatic electronic page turning devices, although they greatly reduce the operational burden on the player, still require manual intervention and therefore risk interfering with the performance.
The fully automatic, time-controlled electronic or mechanical page turning devices require a timing mechanism to be set up in advance; the timing requirements are strict, and once the timing is misaligned, the whole performance is affected.
If the music played live by a musician could be collected and compared against the notes in the music score, the performer's progress could be tracked in real time, enabling automatic page turning of an electronic music score. However, the prior art provides no method or apparatus that can automatically match audio data to the notes in an electronic score.
Disclosure of Invention
Embodiments of the present invention provide an audio processing method, an audio processing apparatus, an electronic device, and a storage medium, so as to overcome the defect that there is no method or apparatus for automatically matching audio data with notes in an electronic score in the prior art.
An embodiment of a first aspect of the present invention provides an audio processing method, including:
acquiring acoustic features of audio data to be identified, wherein the acoustic features are used for representing pitches;
and determining the positions of notes corresponding to the audio data to be identified in the electronic music score according to the acoustic characteristics of the audio data to be identified.
In the above technical solution, further comprising:
when the position in the electronic music score of a note corresponding to the audio data to be recognized coincides with the position of a note bearing a page turning mark in the electronic music score, performing a page turning operation on the display page of the electronic music score; the page turning mark is a mark applied, when the electronic music score is displayed, to a note located at the end of a display page.
In the above technical solution, the acquiring the acoustic features of the audio data to be recognized includes:
performing framing processing on the audio data to be identified to obtain frame data of the audio data to be identified;
converting the frame data of the audio data to be identified from a time domain space to a frequency domain space to obtain a spectrogram of the audio data to be identified;
extracting acoustic features of notes in the audio data to be identified according to the spectrogram of the audio data to be identified; the acoustic feature is a 12-dimensional vector, and the numerical value of each dimension in the 12-dimensional vector represents the frequency of the musical note appearing in the half-tone interval corresponding to the dimension.
In the above technical solution, the determining, according to the acoustic feature of the audio data to be identified, the position of the note corresponding to the audio data to be identified in the electronic score includes:
converting the acoustic features of the audio data to be recognized into a first note set, acquiring a plurality of continuous notes from the electronic music score to form a second note set, calculating the difference between the notes in the first note set and the notes in the second note set, and determining, according to the difference, how the audio data to be recognized matches the notes in the electronic music score; and determining, according to the matching result, the positions in the electronic music score of the notes corresponding to the audio data to be identified; wherein
the notes in the first note set and the notes in the second note set have the same format, and the number of notes in the first note set is the same as the number of notes in the second note set;
or, converting and mapping a plurality of continuous notes obtained from the electronic score to obtain a first acoustic feature set; calculating the difference between the acoustic features of the audio data to be identified and the first acoustic feature set, and determining, according to the difference, how the audio data to be identified matches the notes in the electronic music score; and determining, according to the matching result, the positions in the electronic music score of the notes corresponding to the audio data to be identified; wherein
the number of the acoustic features contained in the first acoustic feature set is the same as the number of the acoustic features of the audio data to be identified.
In the above technical solution, further comprising:
inputting a music score to be converted into a music score conversion model to generate an electronic music score; wherein
the music score to be converted is a music score that can be recognized by optical character recognition technology; the music score conversion model is a model for generating an electronic music score, obtained by machine learning using music score notes and the labels of those notes as training input data; and the labels are used to represent the coding of the music score notes in the electronic music score.
In the above technical solution, further comprising:
acquiring a plurality of score note representations; the musical score note representation is a picture for representing musical score notes, the picture being a picture recognizable by optical character recognition technology;
obtaining labels of the plurality of music score note representations;
and taking the plurality of music score note representation diagrams and the labels of the plurality of music score note representation diagrams as training input data, and training a music score conversion model capable of converting a music score to be converted into an electronic music score by adopting a machine learning method.
In the above technical solution, further comprising:
and marking page turning marks for notes positioned at the tail of a display page when the electronic music score is displayed.
The embodiment of the second aspect of the present invention provides an audio processing apparatus, including:
the acoustic feature acquisition module is used for acquiring acoustic features of the audio data to be identified, and the acoustic features are used for representing pitches;
and the positioning module is used for determining the positions of the musical notes corresponding to the audio data to be identified in the electronic music score according to the acoustic characteristics of the audio data to be identified.
In an embodiment of the third aspect of the present invention, there is provided an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor executes the computer program to implement the steps of the audio processing method according to the embodiment of the first aspect of the present invention.
A fourth aspect of the present invention provides a non-transitory computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the steps of the audio processing method as provided in the first aspect of the present invention.
The audio processing method, the audio processing device, the electronic equipment and the storage medium provided by the embodiment of the invention realize the positioning of the musical notes in the audio data to be recognized in the electronic music score based on the acoustic characteristics of the audio data to be recognized.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
Fig. 1 is a flowchart of an audio processing method according to an embodiment of the present invention;
FIG. 2 is a flowchart of an audio processing method according to another embodiment of the invention;
FIG. 3 is a flowchart of an audio processing method according to another embodiment of the invention;
FIG. 4 is a flowchart of an audio processing method according to another embodiment of the invention;
FIG. 5 is a diagram of a partial MIDI encoding format;
FIG. 6 is a diagram of an audio processing apparatus according to an embodiment of the present invention;
fig. 7 illustrates a physical structure diagram of an electronic device.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a flowchart of an audio processing method according to an embodiment of the present invention, and as shown in fig. 1, the audio processing method according to the embodiment of the present invention includes:
step 101, obtaining acoustic characteristics of audio data to be identified.
In the embodiment of the present invention, the audio data to be recognized is music data, such as music collected live at a concert, or music that was recorded previously.
An acoustic feature is a feature used to characterize pitch (pitch, i.e., tone or frequency of sound).
According to twelve-tone equal temperament in music theory, any octave is composed of 12 semitones. Therefore, for audio data in the frequency-domain representation, the note carried by a short-time stationary signal can be described by a 12-dimensional vector, where the value of each dimension represents the weight of the note within the corresponding semitone interval. Thus, in embodiments of the present invention, the acoustic feature of a note is a 12-dimensional vector, and the value of each dimension in the 12-dimensional vector represents the frequency with which the note appears in the semitone interval corresponding to that dimension.
In the embodiment of the present invention, the audio data to be identified refers to the audio data to be identified after removing noise. In other embodiments of the present invention, the method further includes the step of performing denoising processing on the acquired audio data to obtain the audio data to be identified.
This step may further comprise:
step 101-1, performing framing processing on the audio data to be identified obtained by sampling to obtain frame data corresponding to the audio data to be identified.
When the audio data to be identified obtained by sampling is subjected to framing processing, in order to ensure the stability of signals in a frame, the frame length ranges from 20 milliseconds to 50 milliseconds. However, the length of each frame obtained by framing a segment of audio data should be consistent.
A series of frame data F obtained by framing a continuous piece of audio waveform data S can be represented as F = [f0, f1, f2, …, fn].
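As a concrete illustration of the framing step, the following sketch splits a waveform into equal-length frames. The 16 kHz sample rate and 25 ms frame length are assumptions for this sketch; the embodiment only requires frame lengths between 20 and 50 milliseconds, consistent within one piece of audio.

```python
# Minimal framing sketch.  Sample rate and frame length are illustrative
# assumptions; the patent only constrains frames to 20-50 ms.
import numpy as np

def frame_signal(samples, sample_rate=16000, frame_ms=25):
    """Split a 1-D waveform into equal-length frames f0, f1, ..., fn."""
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(samples) // frame_len          # drop the trailing remainder
    return samples[:n_frames * frame_len].reshape(n_frames, frame_len)

s = np.zeros(16000)                               # one second of audio
frames = frame_signal(s)
print(frames.shape)                               # (40, 400)
```

Each row of the result is one frame fi, ready for the frequency-domain conversion of the next step.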
Step 101-2, converting frame data of the audio data to be identified from a time domain space to a frequency domain space to obtain a spectrogram of the audio data to be identified.
In the embodiment of the present invention, the conversion of frame data from the time domain to the frequency domain may be implemented with the short-time Fourier transform (STFT), computed frame by frame using the fast Fourier transform (FFT). Specifically:
First, the discrete Fourier transform of each frame is computed:

X[k] = Σ_{n=0}^{N−1} x[n] · e^(−j·2πkn/N),  k = 0, 1, 2, …, N−1

where x[n] (n = 0, 1, 2, …, N−1) is one frame of the sampled discrete speech sequence, N is the frame length, and X[k] is a complex sequence of N points.
Then the modulus of X[k] is taken to obtain the signal magnitude spectrum |X[k]|.
Arranging the magnitude spectra |X[k]| of successive frames in time order forms the spectrogram.
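The two operations just described (per-frame FFT, then taking the modulus) can be sketched as follows; NumPy's FFT stands in for whatever implementation an embodiment would use.

```python
# Sketch of step 101-2: per-frame FFT and magnitude spectrum.
import numpy as np

def magnitude_spectrum(frame):
    """|X[k]| for one frame: N-point FFT followed by taking the modulus."""
    return np.abs(np.fft.fft(frame))

def spectrogram(frames):
    """Stack successive magnitude spectra in time order."""
    return np.stack([magnitude_spectrum(f) for f in frames])

# A 440 Hz sine sampled at 8 kHz, one 256-sample frame: the bin spacing is
# 8000/256 = 31.25 Hz, so the spectral peak lands in the bin nearest
# 440 / 31.25, i.e. bin 14.
sr, n = 8000, 256
frame = np.sin(2 * np.pi * 440 * np.arange(n) / sr)
spec = magnitude_spectrum(frame)
print(int(np.argmax(spec[: n // 2])))  # 14
```

Only the first N/2 bins are inspected in the example because the magnitude spectrum of a real signal is mirrored about N/2.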
And step 101-3, extracting acoustic features of notes in the audio data to be identified according to the spectrogram of the audio data to be identified.
The meaning of the acoustic features of notes has been described above. For ease of understanding, the acoustic feature is further described here with a mathematical expression. Suppose a note is represented by the 12-dimensional vector

Vi = (v0, v1, …, v11)

where i is the sequence number of the note. If the 16th note is D#, the note may be represented by a feature vector

V16 = (…, 0.874, …)

in which the dimension corresponding to D# takes the maximum value, 0.874.
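A minimal sketch of extracting such a 12-dimensional vector from a magnitude spectrum follows. The bin-to-semitone mapping and the sum-to-one normalisation are assumptions made for this sketch; the patent does not fix a particular normalisation.

```python
# Illustrative sketch of step 101-3: fold a magnitude spectrum into a
# 12-dimensional pitch-class vector.  Normalisation is an assumption.
import math
import numpy as np

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F",
                 "F#", "G", "G#", "A", "A#", "B"]

def chroma(mag_spec, sample_rate):
    """12-dim vector; each entry is the relative weight of one semitone class."""
    n = len(mag_spec)
    v = np.zeros(12)
    for k in range(1, n // 2):                      # skip DC and the mirror half
        freq = k * sample_rate / n
        midi = 69 + 12 * math.log2(freq / 440.0)    # frequency -> MIDI pitch
        v[round(midi) % 12] += mag_spec[k]          # accumulate into its class
    return v / v.sum()

# A pure 440 Hz tone concentrates its weight on pitch class A.
sr, n = 8000, 1024
frame = np.sin(2 * np.pi * 440 * np.arange(n) / sr)
v = chroma(np.abs(np.fft.fft(frame)), sr)
print(PITCH_CLASSES[int(np.argmax(v))])             # A
```

With this normalisation a dominant entry such as 0.874 reads directly as the relative weight of that semitone within the frame.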
And step 102, determining the positions of notes corresponding to the audio data to be identified in the electronic music score according to the acoustic characteristics of the audio data to be identified.
In an embodiment of the present invention, the electronic score is a score file in a MIDI format, and in other embodiments of the present invention, the electronic score may also be an electronic score in other formats.
In the embodiment of the present invention, determining the positions of notes in an electronic musical score corresponding to the audio data to be identified according to the acoustic features of the audio data to be identified may be implemented as follows: converting and mapping a plurality of continuous notes obtained from the electronic score to obtain a first set of acoustic features; calculating the difference between the acoustic features of the audio data to be identified and the first acoustic feature set, and determining the matching condition of the audio data to be identified and the musical notes in the electronic music score according to the difference; and determining the positions of the notes corresponding to the audio data to be identified in the electronic music score according to the matching condition.
For example, when matching the acoustic features of the audio data to be recognized with the acoustic features of a music score file in MIDI format, the comparison may be performed window by window to avoid drift in the comparison.
Assume a window comprises n frames. For each comparison, acoustic feature vectors of n frames are selected from the audio data to be identified, and audio data of n frames is selected from the MIDI score file; the acoustic features of the two are then compared to obtain a total comparison score within the window. If the total score within the window exceeds a recognition threshold, the notes of the n frames selected from the audio data to be identified correspond to the notes of the n frames selected from the MIDI score file. The size of the recognition threshold can be determined according to the practical application.
The total comparison score within a window, obtained by comparing the acoustic features of the audio data to be identified with those of the MIDI score file, can be expressed mathematically as follows.
Assume the mth window contains n frames and each frame contains k notes. The score of one frame is

s = Σ abs(V − F_mid(key_midi))

where abs is the absolute value function and F_mid is the energy conversion function, responsible for transforming and mapping the MIDI score note key_midi into a 12-dimensional vector.
The score S_m of the mth window containing n frames is then the sum of the scores of its n frames:

S_m = Σ_{i=1}^{n} s_i
after the notes corresponding to the audio data to be recognized are found in the electronic music score, the positions of the notes in the electronic music score are the positions of the notes corresponding to the audio data to be recognized.
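The windowed comparison described above can be sketched as follows. `midi_to_chroma` is a hypothetical stand-in for the energy conversion function F_mid, and the distance-below-threshold test is an illustrative formulation of the window score comparison (a small distance corresponds to a high match score).

```python
# Illustrative window matching sketch; function names and the threshold
# convention are assumptions, not taken from the patent text.
import numpy as np

def midi_to_chroma(midi_keys):
    """Hypothetical stand-in for F_mid: map the MIDI keys sounding in one
    score frame to a normalised 12-dim pitch-class vector."""
    v = np.zeros(12)
    for key in midi_keys:
        v[key % 12] = 1.0
    return v / max(v.sum(), 1.0)

def window_distance(audio_chromas, score_frames):
    """Sum of per-frame absolute differences over one n-frame window."""
    return sum(np.abs(a - midi_to_chroma(s)).sum()
               for a, s in zip(audio_chromas, score_frames))

def locate(audio_chromas, score, threshold=0.5):
    """Slide an n-frame window over the score; return the first window
    start whose total distance falls below the threshold, else -1."""
    n = len(audio_chromas)
    for start in range(len(score) - n + 1):
        if window_distance(audio_chromas, score[start:start + n]) < threshold:
            return start
    return -1

# Toy score of single-note frames; the "performance" plays frames 2..3.
score = [[60], [62], [64], [65], [67]]          # C D E F G as MIDI keys
audio = [midi_to_chroma(f) for f in score[2:4]] # perfectly played E, F
print(locate(audio, score))                     # 2
```

The returned window start is the position in the electronic score of the notes corresponding to the audio data to be recognized.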
In other embodiments of the present invention, other methods may also be adopted to determine the positions of the notes in the electronic score corresponding to the audio data to be recognized. For example, acoustic features of the audio data to be recognized are converted into a first note set, a plurality of continuous notes are obtained from the electronic music score to form a second note set, differences between notes in the first note set and notes in the second note set are calculated, and matching conditions of the audio data to be recognized and the notes in the electronic music score are determined according to the differences; determining the positions of notes corresponding to the audio data to be identified in the electronic music score according to the matching condition; wherein the notes in the first note set and the notes in the second note set have the same format and the number of notes in the first note set is the same as the number of notes in the second note set.
The conversion of the acoustic features of the audio data to be recognized into the first set of notes may be performed using known techniques known to those skilled in the art and will therefore not be described further herein.
The audio processing method provided by the embodiment of the invention realizes the positioning of the musical notes in the audio data to be identified in the electronic music score based on the acoustic characteristics of the audio data to be identified.
Based on any of the above embodiments, fig. 2 is a flowchart of an audio processing method according to still another embodiment of the present invention, and as shown in fig. 2, the audio processing method according to still another embodiment of the present invention includes:
step 201, obtaining the acoustic characteristics of the audio data to be identified.
In the embodiment of the invention, the audio data to be identified can be music data collected in real time at a concert site.
Step 202, determining the positions of the musical notes corresponding to the audio data to be identified in the electronic music score according to the acoustic characteristics of the audio data to be identified.
And 203, enabling the positions of the notes corresponding to the audio data to be identified in the electronic music score to coincide with the positions of the notes with the page turning marks in the electronic music score, and performing page turning operation on a display page of the electronic music score.
In the preceding steps, the audio data to be recognized has been located in the electronic score. When an electronic music score is shown on a display device, unless it is short enough to fit on the display screen all at once, its content generally must be displayed page by page. When the position in the electronic score of the audio data to be identified reaches the bottom of a page, the displayed electronic score can be turned automatically, that is, the content of the next page of the electronic score is shown on the screen of the display device.
For convenience of processing, in the embodiment of the present invention, a plurality of notes at the end of each page are marked with a page turning mark when the electronic music score is displayed, and when the position of a note corresponding to the audio data to be recognized in the electronic music score coincides with the position of a note with a page turning mark in the electronic music score, a page turning operation is automatically performed.
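The page turning trigger reduces to a simple coincidence check; all names below are illustrative, not taken from the patent.

```python
# Sketch: turn the page when the located note carries a page-turn mark.
def maybe_turn_page(note_index, page_turn_marks, turn_page):
    """Call turn_page() once the matched note coincides with a marked note."""
    if note_index in page_turn_marks:
        turn_page()
        return True
    return False

turned = []
marks = {47, 95}                 # e.g. the last notes of pages 1 and 2
maybe_turn_page(47, marks, lambda: turned.append("page turned"))
print(turned)                    # ['page turned']
```

In practice several notes at the end of each page may carry the mark, so the trigger fires even if a single note is missed in matching.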
The audio processing method provided by the embodiment of the invention can realize the matching of the sound and the content of the electronic music score according to the sound acquired in real time on site, and realizes the automatic page turning of the electronic music score according to the sound on site without human intervention.
Based on any of the above embodiments, fig. 3 is a flowchart of an audio processing method according to another embodiment of the present invention, and as shown in fig. 3, the audio processing method according to another embodiment of the present invention includes:
step 301, inputting the music score to be converted into a music score conversion model to generate an electronic music score.
The music score to be converted refers to a music score file that can be recognized by an OCR (Optical Character Recognition) technology, such as a music score picture in a format of JPG, PNG, GIF, BMP, DOC, PDF, and the like.
The conventional paper music score can be converted into the format capable of being recognized by the OCR technology through scanning, photographing and the like, so that the music score to be converted is obtained.
The music score to be converted is not limited to one notation format: widely used formats such as numbered musical notation, staff notation, and guitar tablature can all serve as the music score to be converted. However, within one conversion process, only music scores of the same type can be converted.
In the embodiment of the present invention, the electronic score refers to an electronic score in a MIDI (Musical Instrument digital interface) format. In other embodiments of the present invention, the electronic music score may be an electronic music score in other formats.
In the embodiment of the invention, the music score conversion model is obtained by training music score notes and coding labels of the music score notes based on MIDI standard as training input data in a machine learning mode and is used for obtaining a model of an electronic music score.
A given music score conversion model can only convert scores of one specific format; for example, a model that converts staff notation into electronic scores in MIDI format cannot convert numbered musical notation into electronic scores in MIDI format.
In the embodiment of the present invention, the score transformation model is trained based on the convolutional neural network ResNet-18, and in other embodiments of the present invention, the score transformation model can be implemented by other techniques known to those skilled in the art.
And step 302, acquiring acoustic characteristics of the audio data to be identified.
Step 303, determining the positions of the musical notes corresponding to the audio data to be identified in the electronic music score according to the acoustic characteristics of the audio data to be identified.
The audio processing method provided by the embodiment of the invention can convert the traditional paper music score into the electronic music score, further match the acoustic characteristics of the audio data to be recognized with the acoustic characteristics of the musical notes in the electronic music score, and realize the correspondence between the musical notes in the audio data to be recognized and the musical notes in the music scores in various forms.
Based on any of the above embodiments, fig. 4 is a flowchart of an audio processing method according to another embodiment of the present invention, and as shown in fig. 4, the audio processing method according to another embodiment of the present invention includes:
step 401, obtaining a plurality of score note representations.
The score notes are the minimum units constituting the score, and the score note representation is a picture for representing the score notes, which should be a picture recognizable by OCR technology, such as pictures in the formats of JPG, PNG, GIF, BMP, DOC, PDF, etc. The score note representation should contain the score note information in its entirety, e.g. for notes in the staff, not only the notes themselves, but also the lines in the staff.
Generally, in practical applications, it is difficult to directly find a score note representation corresponding to a single score note, so the score note representation can be obtained from a score picture in a picture slicing manner or the like.
In the embodiment of the present invention, the number of acquired musical score note representations is about 5000. In other embodiments of the present invention, the number of the representation diagrams of the musical notes can be adjusted according to actual needs.
Step 402, obtaining the label of the representation diagram of the musical score notes.
In an embodiment of the invention, the representation of the notes of the score is labeled according to the MIDI standard. In the MIDI standard, the scale, pitch and corresponding coding format of a note are specified. FIG. 5 is a diagram of a partial MIDI encoding format. According to MIDI standard, the music score note representation can be coded and labeled.
Each score note representation should have a corresponding coding label.
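For reference, the MIDI standard's note numbering, which such coding labels follow, assigns key 60 to middle C (C4) and key 69 to A4; a tiny helper illustrates the mapping (the helper itself is illustrative, not part of the patent).

```python
# MIDI key number = 12 * (octave + 1) + pitch-class index, so C4 = 60.
PITCH_INDEX = {"C": 0, "C#": 1, "D": 2, "D#": 3, "E": 4, "F": 5,
               "F#": 6, "G": 7, "G#": 8, "A": 9, "A#": 10, "B": 11}

def midi_key(name, octave):
    """MIDI key number for a note name and octave (scientific pitch notation)."""
    return 12 * (octave + 1) + PITCH_INDEX[name]

print(midi_key("C", 4), midi_key("A", 4), midi_key("D#", 5))  # 60 69 75
```

A score note representation labelled with such a key number gives the network an unambiguous target during training.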
And 403, taking the multiple music score note representation diagrams and the labels of the multiple music score note representation diagrams as training input data, and training a music score conversion model capable of converting a music score to be converted into an electronic music score by adopting a machine learning method.
In this step, the score to be converted refers to a score that can be recognized by optical character recognition technology.
In this embodiment of the invention, each score note representation and its MIDI-based coding label are fed as training input into a ResNet-18 convolutional neural network, which is trained iteratively until the recognition accuracy reaches a threshold.
In this embodiment the recognition accuracy threshold is 98%; in other embodiments it may be adjusted as needed.
This embodiment uses the ResNet-18 convolutional neural network for machine learning; other embodiments of the invention may use other convolutional networks such as ResNet-34, ResNet-50, ResNet-101, or ResNet-152.
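Training an actual ResNet-18 requires a deep-learning framework, but the iterate-until-threshold control flow of this step can be sketched framework-independently; `train_step` and `evaluate` are hypothetical callables standing in for one training epoch and a validation pass.

```python
def train_until_threshold(train_step, evaluate, threshold=0.98, max_epochs=100):
    """Run training epochs until validation accuracy reaches the
    recognition accuracy threshold (98% in this embodiment)."""
    acc = 0.0
    for epoch in range(1, max_epochs + 1):
        train_step()      # e.g. one pass of ResNet-18 over the labeled note images
        acc = evaluate()  # recognition accuracy on held-out note images
        if acc >= threshold:
            break
    return epoch, acc
```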
Step 404: input the score to be converted into the score conversion model to generate the electronic score.
In this embodiment of the invention, after the electronic score is generated, page-turn marks can further be applied to the last few notes of each page as displayed. These marks make automatic page turning of the electronic score possible.
Step 405: obtain the acoustic features of the audio data to be identified.
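Claim 3 describes the acoustic feature as a 12-dimensional vector with one bin per semitone pitch class. Below is a minimal sketch of building such a vector from detected note frequencies, assuming the standard A4 = 440 Hz / MIDI 69 reference; the function name and input format are illustrative assumptions:

```python
import math

def pitch_class_histogram(frequencies):
    """Fold detected note frequencies (Hz) into a 12-dimensional vector:
    each dimension counts how often notes fall in that semitone class."""
    vec = [0] * 12
    for f in frequencies:
        midi = round(69 + 12 * math.log2(f / 440.0))  # nearest MIDI number
        vec[midi % 12] += 1                           # fold into one octave
    return vec
```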
Step 406: determine, according to the acoustic features of the audio data to be identified, the positions in the electronic score of the notes corresponding to that audio data.
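One simple way to realize this step, sketched under the assumption that both the recognized audio and the electronic score have been reduced to comparable MIDI note sequences (claim 4's first alternative), is a sliding-window comparison:

```python
def locate_in_score(score_notes, heard_notes):
    """Slide a window the size of the recognized note sequence over the
    electronic score's note sequence and return the start index with the
    fewest mismatches, i.e. the best-matching position."""
    n = len(heard_notes)
    best_pos, best_diff = 0, float('inf')
    for start in range(len(score_notes) - n + 1):
        window = score_notes[start:start + n]
        diff = sum(1 for a, b in zip(window, heard_notes) if a != b)
        if diff < best_diff:
            best_pos, best_diff = start, diff
    return best_pos
```

A real system would additionally need to tolerate tempo variation and recognition errors, for example with dynamic time warping rather than a rigid element-wise difference.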
Step 407: when the position of a note corresponding to the audio data to be identified coincides with the position of a note carrying a page-turn mark, perform a page-turn operation on the displayed page of the electronic score.
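The page-turn decision itself then reduces to a coincidence test between the located position and the marked positions; a minimal sketch, in which the positions and mark set are hypothetical:

```python
def should_turn_page(current_position, turn_mark_positions):
    """Return True when the note currently located in the score carries
    a page-turn mark, i.e. playback has reached the end of the page."""
    return current_position in turn_mark_positions

# Hypothetical marks on the last two notes of each 32-note display page.
marks = {30, 31, 62, 63}
```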
By building a score conversion model, the audio processing method provided by this embodiment of the invention converts a traditional paper score into an electronic score, matches the notes in the audio data to be recognized against the notes of scores in various forms based on the acoustic features of that audio, and thereby realizes automatic page turning of the electronic score.
Based on any of the above embodiments, fig. 6 is a schematic diagram of an audio processing apparatus according to an embodiment of the present invention, and as shown in fig. 6, the audio processing apparatus according to the embodiment of the present invention includes:
the acoustic feature acquisition module 601 is configured to acquire an acoustic feature of the audio data to be identified, where the acoustic feature is used to characterize a pitch;
a positioning module 602, configured to determine, according to the acoustic features of the audio data to be identified, positions of notes in the electronic music score corresponding to the audio data to be identified.
The audio processing device provided by the embodiment of the invention realizes the positioning of notes in the audio data to be identified in the electronic music score based on the acoustic characteristics of the audio data to be identified.
Fig. 7 illustrates a physical structure diagram of an electronic device, and as shown in fig. 7, the electronic device may include: a processor (processor)710, a communication Interface (Communications Interface)720, a memory (memory)730, and a communication bus 740, wherein the processor 710, the communication Interface 720, and the memory 730 communicate with each other via the communication bus 740. Processor 710 may call logic instructions in memory 730 to perform the following method: acquiring acoustic features of audio data to be identified, wherein the acoustic features are used for representing pitches; and determining the positions of notes corresponding to the audio data to be identified in the electronic music score according to the acoustic characteristics of the audio data to be identified.
It should be noted that the electronic device in this embodiment may be a server, a PC, or another device, as long as it includes the processor 710, communication interface 720, memory 730, and communication bus 740 shown in fig. 7, where the processor 710, communication interface 720, and memory 730 communicate with each other via the bus 740 and the processor 710 can call the logic instructions in the memory 730 to execute the above method. This embodiment does not limit the specific form of the electronic device.
In addition, the logic instructions in the memory 730 may be implemented as software functional units and, when sold or used as an independent product, stored in a computer-readable storage medium. On this understanding, the technical solution of the present invention may be embodied as a software product stored on a storage medium, including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods of the embodiments of the present invention. Such storage media include: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and other media capable of storing program code.
Further, embodiments of the present invention disclose a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform the methods provided by the method embodiments above, for example: acquiring acoustic features of audio data to be identified, the acoustic features characterizing pitch; and determining, according to those acoustic features, the positions in the electronic music score of the notes corresponding to the audio data.
In another aspect, an embodiment of the present invention further provides a non-transitory computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program performs the audio processing method provided by the foregoing embodiments, for example: acquiring acoustic features of audio data to be identified, the acoustic features characterizing pitch; and determining, according to those acoustic features, the positions in the electronic music score of the notes corresponding to the audio data.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. An audio processing method, comprising:
acquiring acoustic features of audio data to be identified, wherein the acoustic features are used for representing pitches;
and determining the positions of notes corresponding to the audio data to be identified in the electronic music score according to the acoustic characteristics of the audio data to be identified.
2. The audio processing method according to claim 1, further comprising:
the positions of the notes corresponding to the audio data to be recognized in the electronic music score are coincided with the positions of the notes with page turning marks in the electronic music score, and page turning operation is carried out on a display page of the electronic music score; the page turning mark is a mark made for a note positioned at the tail of a display page when the electronic music score is displayed.
3. The audio processing method according to claim 1, wherein the obtaining the acoustic feature of the audio data to be recognized comprises:
performing framing processing on the audio data to be identified to obtain frame data of the audio data to be identified;
converting the frame data of the audio data to be identified from a time domain space to a frequency domain space to obtain a spectrogram of the audio data to be identified;
extracting acoustic features of notes in the audio data to be identified according to the spectrogram of the audio data to be identified; the acoustic feature is a 12-dimensional vector, and the value of each dimension represents how often the notes fall within the semitone interval corresponding to that dimension.
4. The audio processing method of claim 1, wherein the determining the positions of notes in the electronic score corresponding to the audio data to be recognized according to the acoustic features of the audio data to be recognized comprises:
converting the acoustic features of the audio data to be recognized into a first note set, acquiring a plurality of continuous notes from the electronic music score to form a second note set, calculating the difference between the notes in the first note set and the notes in the second note set, determining, according to the difference, how the audio data to be recognized matches the notes in the electronic music score, and determining, according to the matching result, the positions of the notes corresponding to the audio data to be identified in the electronic music score; wherein
the notes in the first note set have the same format, and the number of notes in the first note set is the same as the number of notes in the second note set;
or, converting and mapping a plurality of continuous notes obtained from the electronic music score to obtain a first acoustic feature set; calculating the difference between the acoustic features of the audio data to be identified and the first acoustic feature set, determining, according to the difference, how the audio data to be identified matches the notes in the electronic music score, and determining, according to the matching result, the positions of the notes corresponding to the audio data to be identified in the electronic music score; wherein
the number of acoustic features contained in the first acoustic feature set is the same as the number of acoustic features of the audio data to be identified.
5. The audio processing method of claim 1, further comprising:
inputting a music score to be converted into a music score conversion model to generate an electronic music score; wherein
the music score to be converted is a music score recognizable by optical character recognition technology; the music score conversion model is a model for obtaining an electronic music score, trained by machine learning with music score notes and the labels of those notes as training input data; and the labels represent the coding of music score notes in the electronic music score.
6. The audio processing method of claim 5, further comprising:
acquiring a plurality of score note representations; the musical score note representation is a picture for representing musical score notes, the picture being a picture recognizable by optical character recognition technology;
obtaining labels of the plurality of music score note representations;
and taking the plurality of music score note representation diagrams and the labels of the plurality of music score note representation diagrams as training input data, and training a music score conversion model capable of converting a music score to be converted into an electronic music score by adopting a machine learning method.
7. The audio processing method of claim 5, further comprising:
and marking page turning marks for notes positioned at the tail of a display page when the electronic music score is displayed.
8. An audio processing apparatus, comprising:
the acoustic feature acquisition module is used for acquiring acoustic features of the audio data to be identified, and the acoustic features are used for representing pitches;
and the positioning module is used for determining the positions of the musical notes corresponding to the audio data to be identified in the electronic music score according to the acoustic characteristics of the audio data to be identified.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the audio processing method according to any of claims 1 to 7 are implemented when the processor executes the program.
10. A non-transitory computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the audio processing method according to any one of claims 1 to 7.
CN201911398886.9A 2019-12-30 2019-12-30 Audio processing method and device, electronic equipment and storage medium Pending CN111063327A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911398886.9A CN111063327A (en) 2019-12-30 2019-12-30 Audio processing method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911398886.9A CN111063327A (en) 2019-12-30 2019-12-30 Audio processing method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111063327A true CN111063327A (en) 2020-04-24

Family

ID=70304946

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911398886.9A Pending CN111063327A (en) 2019-12-30 2019-12-30 Audio processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111063327A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050081701A1 (en) * 2003-10-15 2005-04-21 Sunplus Technology Co., Ltd. Electronic musical score device
US20090217803A1 (en) * 2008-02-29 2009-09-03 Yan-Liang Guo Electronic musical score display device
CN205311124U (en) * 2016-01-28 2016-06-15 河南理工大学万方科技学院 Piano musical instrument register for easy reference automatic page turning device
WO2018194456A1 (en) * 2017-04-20 2018-10-25 Universiteit Van Amsterdam Optical music recognition omr : converting sheet music to a digital format
CN109376776A (en) * 2018-10-15 2019-02-22 百度在线网络技术(北京)有限公司 Method and apparatus for playing music
CN110148394A (en) * 2019-04-26 2019-08-20 平安科技(深圳)有限公司 Song synthetic method, device, computer equipment and storage medium
CN110310621A (en) * 2019-05-16 2019-10-08 平安科技(深圳)有限公司 Sing synthetic method, device, equipment and computer readable storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
GUI Wenming: "Research on Note Onset Detection Algorithms", China Doctoral Dissertations Full-text Database, Information Science and Technology *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111554250A (en) * 2020-04-26 2020-08-18 苏州缪斯谈谈科技有限公司 Automatic music score turning method, system, electronic equipment and computer readable storage medium
CN112420071A (en) * 2020-11-09 2021-02-26 上海交通大学 Constant Q transformation based polyphonic electronic organ music note identification method
CN112579818A (en) * 2020-12-29 2021-03-30 玖月音乐科技(北京)有限公司 Staff voice annotation method and system
CN112579818B (en) * 2020-12-29 2021-08-13 玖月音乐科技(北京)有限公司 Staff voice annotation method and system
CN112908339A (en) * 2021-03-18 2021-06-04 龙马智芯(珠海横琴)科技有限公司 Conference link positioning method and device, positioning equipment and readable storage medium

Similar Documents

Publication Publication Date Title
CN111063327A (en) Audio processing method and device, electronic equipment and storage medium
US6856923B2 (en) Method for analyzing music using sounds instruments
US7189912B2 (en) Method and apparatus for tracking musical score
CN110148394B (en) Singing voice synthesizing method, singing voice synthesizing device, computer equipment and storage medium
CN110364140B (en) Singing voice synthesis model training method, singing voice synthesis model training device, computer equipment and storage medium
WO2020248393A1 (en) Speech synthesis method and system, terminal device, and readable storage medium
CN108038146B (en) Music playing artificial intelligence analysis method, system and equipment
US9478201B1 (en) System and method for optical music recognition
CN113053357B (en) Speech synthesis method, apparatus, device and computer readable storage medium
CN112633175A (en) Single note real-time recognition algorithm based on multi-scale convolution neural network under complex environment
CN110969141A (en) Music score generation method and device based on audio file identification and terminal equipment
KR102306765B1 (en) Sheet Music recognition method using artificial intelligence processing
CN112992162B (en) Tone cloning method, system, device and computer readable storage medium
CN110415730B (en) Music analysis data set construction method and pitch and duration extraction method based on music analysis data set construction method
KR20190080437A (en) Apparatus and method for searching music source using machine learning
CN111816148A (en) Virtual human voice and video singing method and system based on generation countermeasure network
JP2021128297A (en) Estimation model construction method, performance analysis method, estimation model construction device, performance analysis device, and program
CN115331648A (en) Audio data processing method, device, equipment, storage medium and product
CN113196381B (en) Acoustic analysis method and acoustic analysis device
CN115359775A (en) End-to-end tone and emotion migration Chinese voice cloning method
CN110853457B (en) Interactive music teaching guidance method
JP5879813B2 (en) Multiple sound source identification device and information processing device linked to multiple sound sources
CN117219116B (en) Modern Chinese language voice analysis method, system and storage medium
KR102480360B1 (en) Apparatus, method and computer program for generating synthesized sound source using learning through image
CN113823268A (en) Intelligent music identification method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200424