CN111950552A - Method for recognizing southern music score by using computer - Google Patents

Method for recognizing southern music score by using computer

Info

Publication number
CN111950552A
CN111950552A (application CN202010819712.1A)
Authority
CN
China
Prior art keywords
image
southern
music score
model
characters
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010819712.1A
Other languages
Chinese (zh)
Inventor
徐凌云
肖继华
唐文千
卓佳源
杨晓琪
郭奕晗
武星
晃国清
郁抒思
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Huasheng Intelligent Technology Co Ltd
Original Assignee
Shanghai Huasheng Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Huasheng Intelligent Technology Co Ltd filed Critical Shanghai Huasheng Intelligent Technology Co Ltd
Priority to CN202010819712.1A
Publication of CN111950552A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/24Aligning, centring, orientation detection or correction of the image
    • G06V10/243Aligning, centring, orientation detection or correction of the image by compensating for image skew or non-uniform image deformations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/103Formatting, i.e. changing of presentation of documents
    • G06F40/117Tagging; Marking up; Designating a block; Setting of attributes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformation in the plane of the image
    • G06T3/40Scaling the whole image or part thereof
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/20Image enhancement or restoration by the use of local operators
    • G06T5/30Erosion or dilatation, e.g. thinning
    • G06T5/70
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/90Determination of colour characteristics
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/32Normalisation of the pattern dimensions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/14Image acquisition
    • G06V30/148Segmentation of character regions
    • G06V30/153Segmentation of character regions using recognition of characters or words

Abstract

The invention relates to the technical field of Nanyin (southern music) score information acquisition, and discloses a method for recognizing a Nanyin score by using a computer, which solves the technical problem that existing Nanyin scores cannot be recognized by a computer. The method comprises the following steps: first, model training, through which a note recognizer is obtained; and second, image recognition, which comprises the following steps: S1, image loading; S2, image scaling; S3, graying; S4, binarization; S5, tilt correction; S6, column cutting; S7, character cutting; S8, note recognition; S9, XML conversion. According to the technical scheme, on the basis of image recognition, a series of preprocessing operations are performed on the image, and an image classifier capable of recognizing the special characters of the Nanyin score is then obtained by training samples with a convolutional neural network, so that a computer can recognize the Nanyin score and typeset an electronic version of it, achieving the purposes of convenient copying, popularization and inheritance.

Description

Method for recognizing southern music score by using computer
Technical Field
The invention relates to the technical field of southern music score information acquisition, in particular to a method for identifying a southern music score by using a computer.
Background
Nanyin of Quanzhou, Fujian, has long attracted attention as a precious "living fossil" of traditional music. However, under the impact of the multicultural environment of the new century, the complexity and uniqueness of the Nanyin notation have made its protection and inheritance difficult. First, the Nanyin score uses its own special notation: its unique pitch names, fingering marks and beat marks cannot be recognized by a computer, so scores have always been copied and handed down by hand. This makes Nanyin scores hard to recognize, edit and typeset, and leads to high learning costs and a high learning threshold, which hinders popularization and inheritance.
Disclosure of Invention
Aiming at the technical problem, raised in the background art, that a Nanyin music score cannot be recognized by a computer, the method of the invention performs a series of preprocessing operations on the image on the basis of image recognition, and then obtains an image classifier capable of recognizing the special characters of the Nanyin score by training samples with a convolutional neural network (CNN), so that a computer can recognize the Nanyin score and typeset an electronic version of it, thereby achieving the purposes of convenient copying, popularization and inheritance.
In order to achieve the purpose, the invention provides the following technical scheme:
an identification method for identifying a southern musical score by using a computer, comprising the steps of:
firstly, model training: obtaining a note recognizer through the model training steps;
and secondly, image recognition:
the image recognition comprises the following steps:
s1, image loading: loading the scanned whole batch of southern music score images into a memory;
s2, image scaling: adjusting the image to a set size;
s3, graying: converting the color image into a grayscale image;
s4, binarization: processing the image into black and white;
s5, tilt correction: for the image with the inclination angle, performing inclination correction on the image;
s6, column cut: performing opening operation on the binary image in the vertical direction, wherein the opening operation is used for eliminating characters in the image, and leaving a vertical frame, so that each column of the music score can be distinguished;
s7, character cutting: obtaining cut and square character images as samples needed by the model;
s8, note identification: outputting the character images preprocessed in steps S1 to S7 to the note recognizer obtained by model training, outputting the codes corresponding to the notes, and finally mapping the codes to the specific Nanyin pitch names, fingering marks and beat marks;
s9, XML conversion: and outputting the text into an XML file to form formatted text.
Through the technical scheme, the core of the Nanyin score recognition algorithm is that, after special processing of the images, a convolutional neural network (CNN) is used to train samples and obtain a note recognizer capable of recognizing the special characters of the Nanyin score, so that the special characters on the Nanyin score are recognized. The Nanyin score is thus successfully recognized by the algorithm, and traditional scores of different forms and versions are converted into standard electronic scores. The learning cost and threshold of Nanyin are greatly reduced, so that this precious "living fossil" of traditional music can be better popularized and inherited.
The invention is further configured to: the model training comprises the following steps:
first step, image scanning: scanning the written southern music score into an image;
secondly, image loading: loading the scanned whole batch of southern music score images into a memory;
thirdly, zooming the image: adjusting the image to a set size;
fourthly, graying: processing the color image;
fifthly, binarization: processing the image into black and white;
sixthly, correcting the inclination: for the image with the inclination angle, performing inclination correction on the image;
seventh step, cutting: performing opening operation on the binary image in the vertical direction, wherein the opening operation is used for eliminating characters in the image, and leaving a vertical frame, so that each column of the music score can be distinguished;
eighth step, character cutting: obtaining cut and square character images as samples needed by the model;
ninth step, note labeling: mapping the character image to a unique ASCII code, labeling the sample, and generating a set of high-quality data set after labeling;
step ten, model training: training the network model by adopting an Adam optimizer and a cross entropy loss function, and iterating for a set number of times;
step eleven, outputting a model result: outputting the optimal model for use in the note recognizer for image recognition in claim 1.
Through the technical scheme, the note recognizer can be obtained through the model training step.
The invention is further configured to: in the image scaling step, the image is adjusted to a uniform size of 2000x3000.
Through the technical scheme, the images are adjusted to the uniform size of 2000x3000, which facilitates subsequent processing.
The invention is further configured to: the color image is grayed by using the average value method:
Gray(x, y) = (R(x, y) + G(x, y) + B(x, y)) / 3
the image processed by the average value method retains only a single-channel grayscale image.
By the technical scheme, when the color image is processed, three channels are required to be processed in sequence, and time overhead is large. Therefore, in order to increase the processing speed of the entire application system, it is necessary to reduce the amount of data to be processed by graying a color image.
The invention is further configured to: an optimal threshold value T is determined by an algorithm; a pixel whose gray value is greater than T is set to 255, and a pixel whose gray value is not greater than T is set to 0:
g(x, y) = 255 if f(x, y) > T, and g(x, y) = 0 otherwise;
the processed image contains only black and white, so that the gray-scale range is divided into a target class and a background class, realizing the binarization of the image.
Through the technical scheme, the purpose of binarization is to convert the image of the gray scale of the previous step into a black-and-white binary image, so that a cleaned edge contour line can be obtained, and follow-up processing services such as edge extraction, image segmentation, target identification and the like can be better served.
The invention is further configured to: the network model in the model training comprises four convolutional layers, two pooling layers and two fully connected layers.
The invention is further configured to: the fixed number of iterations is 1000.
Through the technical scheme, experiments show that the optimal model for image recognition can be output when the fixed number of iterations is 1000.
In conclusion, the invention has the following beneficial effects:
(1) the core of the Nanyin score recognition algorithm is that, after special processing of the image, samples are trained with a convolutional neural network (CNN) to obtain a note recognizer capable of recognizing the special characters of the Nanyin score, so that the special characters on the Nanyin score are recognized;
(2) the Nanyin score is successfully recognized by the algorithm, and traditional scores of different forms and versions are converted into standard electronic scores;
(3) the learning cost and threshold of Nanyin are greatly reduced, so that this precious "living fossil" of traditional music can be better popularized and inherited.
Drawings
FIG. 1 is a schematic diagram of the Nanyin pitch-name characters;
FIG. 2 is a schematic diagram of the Nanyin fingering characters;
FIG. 3 is a schematic diagram of the Nanyin beat (liaopai) characters;
FIG. 4 is a schematic block diagram of model training;
FIG. 5 is a schematic block diagram of an image recognition principle;
FIG. 6 is a schematic view of a projection, i.e., a horizontal projection, of a two-dimensional image on the y-axis;
FIG. 7 is a schematic view of the projection of a two-dimensional image onto the x-axis, i.e., a vertical projection;
FIG. 8 is a schematic diagram of a network training model.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but the present invention is not limited to these examples.
A method for recognizing the southern music score by computer includes such steps as generating a note recognizer by model training, and recognizing the image of southern music score.
Specifically, as shown in fig. 1, fig. 2, fig. 3 and fig. 4, the model training steps are as follows:
first step, image scanning: the whole algorithm identifies the specific notes in a written southern music score, so the written score is first scanned into images and stored on a computer, with one page forming one image.
Secondly, image loading: with the scanned images, we load the images of the southern music score into memory. In the training phase, we load a batch of such images; and in the subsequent identification phase, one sheet is loaded at a time.
Thirdly, zooming the image: adjusting the image to a uniform size of 2000x 3000;
fourthly, graying: when a color image is processed, the three channels often need to be processed in turn, which incurs a large time overhead. Therefore, in order to increase the processing speed of the whole application system, the color image is grayed to reduce the amount of data to be processed. In the RGB model, if R = G = B, the color is a gray color, and the common value of R, G and B is called the gray value; each pixel of a grayscale image therefore needs only one byte to store the gray value (also called the intensity or luminance value), with a gray range of 0-255. A color image can be grayed by four methods: the component method, the maximum value method, the average value method and the weighted average method. We chose the average value method:
Gray(x, y) = (R(x, y) + G(x, y) + B(x, y)) / 3
The processed image retains only a single-channel grayscale image.
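As an illustration of the graying step, the following is a minimal Python sketch assuming OpenCV and NumPy; the function name to_gray_average and the file path are illustrative only, not part of the patent.

```python
import numpy as np
import cv2

def to_gray_average(image_bgr: np.ndarray) -> np.ndarray:
    """Convert a color image to grayscale with the average-value method:
    Gray = (R + G + B) / 3, keeping a single 8-bit channel."""
    # OpenCV loads images as BGR; the channel order does not matter for an average.
    b, g, r = cv2.split(image_bgr.astype(np.float32))
    gray = (r + g + b) / 3.0
    return np.clip(gray, 0, 255).astype(np.uint8)

# Example usage on a scanned score page (path is illustrative):
# page = cv2.imread("nanyin_page_001.png")
# page = cv2.resize(page, (2000, 3000))
# gray = to_gray_average(page)
```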
Fifthly, binarization: the purpose of binarization is to convert the grayscale image of the previous step into a black-and-white binary image, so that clean edge contours can be obtained, which better serves subsequent processing such as edge extraction, image segmentation and target recognition. The specific method is to set the gray value of each pixel in the image's pixel matrix to 0 (black) or 255 (white), so that the whole image shows only a black-and-white effect. In the grayscale image the gray values range from 0 to 255; an optimal threshold value T is determined by an algorithm, a pixel whose gray value is greater than T is set to 255, and a pixel whose gray value is smaller than T is set to 0:
g(x, y) = 255 if f(x, y) > T, and g(x, y) = 0 otherwise
The processed image contains only black and white, so that the gray-scale range is divided into a target class and a background class, realizing the binarization of the image.
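A minimal sketch of the binarization step follows. The patent only states that the optimal threshold T is determined "by an algorithm"; Otsu's method is assumed here as one such algorithm.

```python
import cv2

def binarize(gray, threshold=None):
    """Map each pixel to 0 (black) or 255 (white).
    If no threshold is given, let Otsu's method pick the optimal T (an assumption;
    the patent only requires that T be chosen algorithmically)."""
    if threshold is None:
        # THRESH_OTSU ignores the supplied threshold value and computes T itself.
        t, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    else:
        t, binary = cv2.threshold(gray, threshold, 255, cv2.THRESH_BINARY)
    return t, binary
```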
Sixthly, tilt correction: for images with a tilt angle, tilt correction is performed before training or subsequent recognition. Before the tilt correction, we need to process the image to eliminate the characters in it. The specific method is to perform an opening operation on the processed binary image in the horizontal direction; the opening of image I by structuring element S is written
I ∘ S and is defined as
I ∘ S = (I ⊖ S) ⊕ S
The opening operation erodes the image and then dilates it. Erosion: the highlighted parts of the image are eroded and the highlighted region shrinks, so the result has a smaller highlighted area than the original image; during the operation each pixel is replaced by the minimum value in its neighbourhood, shrinking the highlighted area. Dilation: the highlighted parts of the image are expanded and the highlighted region grows, so the result has a larger highlighted area than the original image; during the operation each pixel is replaced by the maximum value in its neighbourhood, enlarging the highlighted area.
Opening operation:
1) The opening operation can remove isolated dots and burrs, while the overall position and shape remain unchanged.
2) The opening operation is a filter based on geometric operations.
3) Differences in the size of the structuring element will result in different filtering effects.
4) The selection of different structural elements results in different segmentations, i.e. different features are extracted.
After the opening operation, the text in the image disappears and the horizontal frame lines remain. With the horizontal frame lines obtained, we calculate the angle between the score frame and the image border using the angle formula for two lines with slopes k1 and k2:
tan θ = |(k2 − k1) / (1 + k1·k2)|
After the angle is obtained by calculation, the image is rotated, finally yielding a set of corrected images.
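The tilt-correction step could be sketched as follows: a horizontal opening removes the characters and keeps the frame lines, the skew angle is estimated from the remaining line pixels, and the page is rotated back. The kernel size and the use of cv2.minAreaRect for the angle estimate are assumptions, not prescribed by the patent.

```python
import cv2
import numpy as np

def deskew(binary):
    """Estimate the skew angle from the horizontal frame lines and rotate the page."""
    # Opening with a wide, flat structuring element erases characters and keeps
    # only long horizontal strokes (the frame lines). The kernel size is a guess.
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (60, 1))
    lines = cv2.morphologyEx(255 - binary, cv2.MORPH_OPEN, kernel)

    # Fit a rotated rectangle to the remaining line pixels to estimate the tilt angle.
    ys, xs = np.nonzero(lines)
    if len(xs) == 0:
        return binary  # nothing to correct
    rect = cv2.minAreaRect(np.column_stack([xs, ys]).astype(np.float32))
    angle = rect[-1]
    if angle > 45:  # minAreaRect's angle convention varies between OpenCV versions
        angle -= 90

    h, w = binary.shape
    rotation = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    return cv2.warpAffine(binary, rotation, (w, h),
                          flags=cv2.INTER_NEAREST, borderValue=255)
```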
Seventh step, column cutting: after the tilt correction, an opening operation is performed on the binary image in the vertical direction. The method is the same as in the tilt-correction step, only the chosen structuring element S differs; the purpose is to eliminate the characters in the image and leave the vertical frame lines, so that each column of the score can be distinguished.
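Under the same assumptions, the column-cutting step might look like the sketch below: a tall, narrow structuring element keeps only the vertical frame lines, whose x-positions then bound the columns. The kernel size and the grouping threshold are illustrative.

```python
import cv2
import numpy as np

def cut_columns(binary, min_gap=20):
    """Split a deskewed binary page into single-column images."""
    # A tall, narrow kernel keeps only long vertical strokes (the column frame lines).
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1, 60))
    frames = cv2.morphologyEx(255 - binary, cv2.MORPH_OPEN, kernel)

    # x-positions that contain frame-line pixels mark the column boundaries.
    line_cols = np.where(frames.sum(axis=0) > 0)[0]
    if len(line_cols) < 2:
        return [binary]

    # Group neighbouring x-positions into individual frame lines.
    boundaries = [int(line_cols[0])]
    for x in line_cols[1:]:
        if x - boundaries[-1] > min_gap:
            boundaries.append(int(x))

    # Every pair of adjacent frame lines bounds one score column.
    return [binary[:, left:right]
            for left, right in zip(boundaries, boundaries[1:])]
```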
Eighth step, character cutting: referring to fig. 6 and fig. 7, after column cutting there are still several characters in one column, arranged both horizontally and vertically, so the image cannot yet be fed into a recognizer, and a further character-cutting step is needed. We cut the characters in the horizontal and vertical directions by horizontal and vertical projection. Horizontal projection is the projection of the two-dimensional image onto the y-axis, and vertical projection is the projection onto the x-axis. From the projection curves we can see many gaps in both directions, which correspond to the gaps between characters in the image; in this way the image can be cut into individual characters. This image preprocessing greatly reduces the complexity of the model, improves its performance, and shortens the time for training and subsequent image recognition.
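A sketch of projection-based character cutting, assuming the column image is a binary array with dark ink on a light background; the thresholds min_ink and min_size are illustrative.

```python
import numpy as np

def split_by_projection(image, axis, min_ink=1, min_size=8):
    """Cut an image along one axis wherever the ink projection drops to (near) zero.

    axis=1 sums over x and gives the horizontal projection (per-row ink counts);
    axis=0 sums over y and gives the vertical projection (per-column ink counts).
    """
    ink = (image < 128).sum(axis=axis)          # black-pixel count per row/column
    segments, start = [], None
    for i, value in enumerate(ink):
        if value >= min_ink and start is None:
            start = i                           # a character run begins
        elif value < min_ink and start is not None:
            if i - start >= min_size:
                segments.append((start, i))     # a character run ends
            start = None
    if start is not None and len(ink) - start >= min_size:
        segments.append((start, len(ink)))
    return segments

def cut_characters(column_img):
    """First cut rows with the horizontal projection, then cut each row with the vertical one."""
    chars = []
    for top, bottom in split_by_projection(column_img, axis=1):
        row = column_img[top:bottom, :]
        for left, right in split_by_projection(row, axis=0):
            chars.append(row[:, left:right])
    return chars
```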
Ninth step, note labeling: the preceding image preprocessing yields cut, upright character images, which are the samples needed by the final model. The samples are available, but the computer cannot know what each sample represents, so the samples must first be labeled. Since most of the Nanyin note characters fall outside the UTF-8 encoding set, we need to encode the characters and map these special characters to unique ASCII codes. After encoding, the samples can be labeled. Finally, a high-quality data set is produced for model training.
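The labeling step only needs a stable mapping from each special Nanyin glyph to a code the classifier can output. The sketch below uses a hypothetical code table; the patent does not disclose the actual character-to-ASCII assignments.

```python
# Hypothetical code table: each special Nanyin character is assigned a unique ASCII code
# to be used as its training label. The names and codes here are placeholders only.
NOTE_TO_CODE = {
    "pitch_gong":     65,   # e.g. mapped to ASCII 'A'
    "pitch_si":       66,   # e.g. mapped to ASCII 'B'
    "fingering_dian": 97,   # e.g. mapped to ASCII 'a'
    "beat_liao":      48,   # e.g. mapped to ASCII '0'
}
CODE_TO_NOTE = {code: name for name, code in NOTE_TO_CODE.items()}

def label_sample(image_path, note_name):
    """Pair a cut character image with its class code for the training data set."""
    return image_path, NOTE_TO_CODE[note_name]
```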
Step ten, model training: referring to FIG. 8, with the data set we can train the model. This is the heart of the Nanyin score recognition algorithm. We have devised a Nanyin score recognition method based on a convolutional neural network (CNN). The network model includes four convolutional layers, two pooling layers and two fully connected layers. Convolutional layers: kernel size 3x3, stride 2x2, ReLU activation function. Pooling layers: kernel size 2x2, stride 2x2. Fully connected layers: probability values are output through a Softmax activation function. The network model is trained with an Adam optimizer and a cross entropy loss function for a fixed number of iterations, about 1000 being recommended. The optimal model is output for image recognition.
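A minimal PyTorch sketch of a network matching this description — four 3x3/stride-2x2 convolutional layers with ReLU, two 2x2 pooling layers, two fully connected layers, trained with Adam and cross entropy for about 1000 iterations. The layer ordering, channel widths, 64x64 input size and class count are assumptions, not specified by the patent.

```python
import torch
import torch.nn as nn

class NanyinNoteNet(nn.Module):
    """Four 3x3/stride-2 conv layers, two 2x2 max-pool layers, two fully connected layers.
    Layer ordering, channel widths and the 64x64 single-channel input are assumptions."""
    def __init__(self, num_classes: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
            nn.MaxPool2d(kernel_size=2, stride=2),                             # 16 -> 8
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),  # 8 -> 4
            nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1), nn.ReLU(), # 4 -> 2
            nn.MaxPool2d(kernel_size=2, stride=2),                             # 2 -> 1
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, num_classes),  # softmax is applied by the loss / at inference
        )

    def forward(self, x):
        return self.classifier(self.features(x))

def train(model, loader, iterations=1000, device="cpu"):
    """Adam + cross-entropy training loop, iterating a fixed number of steps (~1000)."""
    model.to(device).train()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    criterion = nn.CrossEntropyLoss()  # includes log-softmax internally
    step, data = 0, iter(loader)
    while step < iterations:
        try:
            images, labels = next(data)
        except StopIteration:
            data = iter(loader)
            images, labels = next(data)
        optimizer.zero_grad()
        loss = criterion(model(images.to(device)), labels.to(device))
        loss.backward()
        optimizer.step()
        step += 1
    return model
```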
And step eleven, outputting a model training result, namely outputting an optimal model for a note recognizer of subsequent image recognition.
As shown in fig. 1, 2, 3 and 5, the image recognition steps are as follows:
s1, image loading: with the scanned images, we load the images of the southern music score into memory. In the training phase, we load a batch of such images; and in the subsequent identification phase, one sheet is loaded at a time.
S2, image scaling: adjusting the image to a uniform size of 2000x3000;
s3, graying: when a color image is processed, the three channels often need to be processed in turn, which incurs a large time overhead. Therefore, in order to increase the processing speed of the whole application system, the color image is grayed to reduce the amount of data to be processed. In the RGB model, if R = G = B, the color is a gray color, and the common value of R, G and B is called the gray value; each pixel of a grayscale image therefore needs only one byte to store the gray value (also called the intensity or luminance value), with a gray range of 0-255. A color image can be grayed by four methods: the component method, the maximum value method, the average value method and the weighted average method. We chose the average value method:
Gray(x, y) = (R(x, y) + G(x, y) + B(x, y)) / 3
The processed image retains only a single-channel grayscale image.
S4, binarization: the purpose of binarization is to convert the grayscale image of the previous step into a black-and-white binary image, so that clean edge contours can be obtained, which better serves subsequent processing such as edge extraction, image segmentation and target recognition. The specific method is to set the gray value of each pixel in the image's pixel matrix to 0 (black) or 255 (white), so that the whole image shows only a black-and-white effect. In the grayscale image the gray values range from 0 to 255; an optimal threshold value T is determined by an algorithm, a pixel whose gray value is greater than T is set to 255, and a pixel whose gray value is smaller than T is set to 0:
g(x, y) = 255 if f(x, y) > T, and g(x, y) = 0 otherwise
The processed image contains only black and white, so that the gray-scale range is divided into a target class and a background class, realizing the binarization of the image.
S5, tilt correction: for images with a tilt angle, tilt correction is performed before training or subsequent recognition. Before the tilt correction, we need to process the image to eliminate the characters in it. The specific method is to perform an opening operation on the processed binary image in the horizontal direction; the opening of image I by structuring element S is written
I ∘ S and is defined as
I ∘ S = (I ⊖ S) ⊕ S
The opening operation erodes the image and then dilates it. Erosion: the highlighted parts of the image are eroded and the highlighted region shrinks, so the result has a smaller highlighted area than the original image; during the operation each pixel is replaced by the minimum value in its neighbourhood, shrinking the highlighted area. Dilation: the highlighted parts of the image are expanded and the highlighted region grows, so the result has a larger highlighted area than the original image; during the operation each pixel is replaced by the maximum value in its neighbourhood, enlarging the highlighted area.
Opening operation:
1) The opening operation can remove isolated dots and burrs, while the overall position and shape remain unchanged.
2) The opening operation is a filter based on geometric operations.
3) Differences in the size of the structuring element will result in different filtering effects.
4) The selection of different structural elements results in different segmentations, i.e. different features are extracted.
After the opening operation, the text in the image disappears and the horizontal frame lines remain. With the horizontal frame lines obtained, we calculate the angle between the score frame and the image border using the angle formula for two lines with slopes k1 and k2:
tan θ = |(k2 − k1) / (1 + k1·k2)|
After the angle is obtained by calculation, the image is rotated, finally yielding a set of corrected images.
S6, column cutting: after the tilt correction, an opening operation is performed on the binary image in the vertical direction. The method is the same as in the tilt-correction step, only the chosen structuring element S differs; the purpose is to eliminate the characters in the image and leave the vertical frame lines, so that each column of the score can be distinguished.
S7, character cutting: referring to fig. 6 and fig. 7, after column cutting there are still several characters in one column, arranged both horizontally and vertically, so the image cannot yet be fed into a recognizer, and a further character-cutting step is needed. We cut the characters in the horizontal and vertical directions by horizontal and vertical projection. Horizontal projection is the projection of the two-dimensional image onto the y-axis, and vertical projection is the projection onto the x-axis. From the projection curves we can see many gaps in both directions, which correspond to the gaps between characters in the image; in this way the image can be cut into individual characters. This image preprocessing greatly reduces the complexity of the model, improves its performance, and shortens the time for training and subsequent image recognition.
S8, note recognition: the character images are fed into the note recognizer obtained by model training, which outputs the codes corresponding to the notes; the codes are finally mapped back to the specific Nanyin pitch names, fingering marks and beat marks.
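Inference could then look like the following sketch: each preprocessed character image is classified, the class index is mapped to its character code, and the code to a human-readable pitch/fingering/beat name. The mapping tables are the hypothetical ones from the labeling sketch above.

```python
import torch

def recognize_characters(model, char_images, code_for_class, name_for_code, device="cpu"):
    """Run the note recognizer on cut character images and map outputs to note names."""
    model.to(device).eval()
    results = []
    with torch.no_grad():
        for img in char_images:
            # img: single-channel float tensor of shape (1, H, W), already resized/normalized.
            logits = model(img.unsqueeze(0).to(device))
            probs = torch.softmax(logits, dim=1)      # probability per class
            class_idx = int(probs.argmax(dim=1))
            code = code_for_class[class_idx]           # class index -> character code
            results.append((name_for_code[code], float(probs.max())))
    return results
```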
S9, XML conversion: the recognized notes, together with their position information in the score, are output to an XML file, forming formatted text.
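Finally, a sketch of the XML output: each recognized note is written together with its position in the score. The element and attribute names are illustrative; the patent does not specify the XML schema.

```python
import xml.etree.ElementTree as ET

def notes_to_xml(recognized, path="score.xml"):
    """Write recognized notes and their positions in the score to a formatted XML file.

    `recognized` is assumed to be a list of dicts such as
    {"column": 3, "row": 7, "code": 65, "name": "pitch_gong"} — illustrative fields only.
    """
    root = ET.Element("nanyinScore")
    for note in recognized:
        ET.SubElement(root, "note",
                      column=str(note["column"]),
                      row=str(note["row"]),
                      code=str(note["code"]),
                      name=note["name"])
    tree = ET.ElementTree(root)
    ET.indent(tree)  # pretty-print (Python 3.9+)
    tree.write(path, encoding="utf-8", xml_declaration=True)
```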
The above description is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above embodiments, and all technical solutions belonging to the idea of the present invention belong to the protection scope of the present invention. It should be noted that modifications and embellishments within the scope of the invention may occur to those skilled in the art without departing from the principle of the invention, and are considered to be within the scope of the invention.

Claims (7)

1. A recognition method for recognizing a southern musical score by using a computer is characterized by comprising the following steps:
firstly, model training: obtaining a note recognizer through the model training steps;
and secondly, image recognition:
the image recognition comprises the following steps:
s1, image loading: loading the scanned whole batch of southern music score images into a memory;
s2, image scaling: adjusting the image to a set size;
s3, graying: processing the color image to convert the color image into a gray image with one channel;
s4, binarization: the image is processed into black and white, so that the gray scale range is divided into a target class and a background class, and the binaryzation of the image is realized;
s5, tilt correction: for the image with the inclination angle, performing inclination correction on the image;
s6, column cut: performing opening operation on the binary image in the vertical direction, wherein the opening operation is used for eliminating characters in the image, and leaving a vertical frame, so that each column of the music score can be distinguished;
s7, character cutting: after column cutting, a plurality of characters are arranged in the horizontal row and the vertical row of the single column, and the characters in the horizontal direction and the vertical direction are cut through horizontal projection and vertical projection to obtain character images which are cut well and are square and upright, and the character images are used as samples needed by a model;
s8, note identification: outputting the character images preprocessed in steps S1 to S7 to the note recognizer obtained by model training, outputting the codes corresponding to the notes, and finally mapping the codes to the specific Nanyin pitch names, fingering marks and beat marks;
s9, XML conversion: the recognized notes, together with their position information in the score, are output to an XML file, forming formatted text.
2. The method for recognizing a southern musical score according to claim 1, wherein:
the model training comprises the following steps:
first step, image scanning: scanning the written southern music score into images and storing them on a computer, one page forming one image;
secondly, image loading: loading the scanned whole batch of southern music score images into a memory;
thirdly, zooming the image: adjusting the image to a set size;
fourthly, graying: processing the color image to convert the color image into a gray image with one channel;
fifthly, binarization: the image is processed into black and white, so that the gray scale range is divided into a target class and a background class, and the binaryzation of the image is realized;
sixthly, correcting the inclination: for the image with the inclination angle, performing inclination correction on the image;
seventh step, cutting: performing opening operation on the binary image in the vertical direction, wherein the opening operation is used for eliminating characters in the image, and leaving a vertical frame, so that each column of the music score can be distinguished;
eighth step, character cutting: after column cutting, a plurality of characters are arranged in the horizontal row and the vertical row of the single column, and the characters in the horizontal direction and the vertical direction are cut through horizontal projection and vertical projection to obtain character images which are cut well and are square and upright, and the character images are used as samples needed by a model;
ninth step, note labeling: mapping the character image to a unique ASCII code, labeling the sample, and generating a set of high-quality data set after labeling;
step ten, model training: training the network model by adopting an Adam optimizer and a cross entropy loss function, and iterating for a set number of times;
step eleven, outputting a model result: outputting the optimal model for use in the note recognizer for image recognition in claim 1.
3. A recognition method for recognizing a southern musical score using a computer according to claim 1 or 2, wherein: in the image scaling step, the image is adjusted to a uniform size of 2000x 3000.
4. A recognition method for recognizing a southern musical score using a computer according to claim 1 or 2, wherein: the color image is grayed in the graying step to reduce the amount of data to be processed; in the RGB model, the value of R, G, B is called the gray value, and the color image is grayed by selecting the average value method:
Gray(x, y) = (R(x, y) + G(x, y) + B(x, y)) / 3
the image processed by the average value method retains only a single-channel grayscale image.
5. A recognition method for recognizing a southern musical score using a computer according to claim 1 or 2, wherein: the binarization step specifically sets the gray value of each pixel in the pixel matrix of the image to 0 (black) or 255 (white), so that the whole image shows only a black-and-white effect; the gray values in the grayed image range from 0 to 255, an optimal threshold value T is determined by an algorithm, a pixel greater than the optimal threshold is set to 255, and a pixel smaller than the optimal threshold is set to 0:
g(x, y) = 255 if f(x, y) > T, and g(x, y) = 0 otherwise
the processed image contains only black and white, so that the gray-scale range is divided into a target class and a background class, realizing the binarization of the image.
6. The method for recognizing a southern musical score using a computer as claimed in claim 2, wherein: the network model in the model training comprises four convolutional layers, two pooling layers and two fully connected layers;
convolutional layers: kernel size 3x3, stride 2x2, ReLU activation function;
pooling layers: kernel size 2x2, stride 2x2;
fully connected layers: probability values are output through a Softmax activation function;
and training the network model by adopting an Adam optimizer and a cross entropy loss function, iterating for a fixed number of times, and outputting the optimal model for image recognition.
7. The method of claim 6, wherein the step of identifying the southern musical score comprises: the fixed number of iterations is 1000.
CN202010819712.1A 2020-08-14 2020-08-14 Method for recognizing southern music score by using computer Pending CN111950552A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010819712.1A CN111950552A (en) 2020-08-14 2020-08-14 Method for recognizing southern music score by using computer

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010819712.1A CN111950552A (en) 2020-08-14 2020-08-14 Method for recognizing southern music score by using computer

Publications (1)

Publication Number Publication Date
CN111950552A true CN111950552A (en) 2020-11-17

Family

ID=73342398

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010819712.1A Pending CN111950552A (en) 2020-08-14 2020-08-14 Method for recognizing southern music score by using computer

Country Status (1)

Country Link
CN (1) CN111950552A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105139866A (en) * 2015-08-10 2015-12-09 泉州师范学院 Nanyin music recognition method and device
CN106446952A (en) * 2016-09-28 2017-02-22 北京邮电大学 Method and apparatus for recognizing score image
CN110598581A (en) * 2019-08-25 2019-12-20 南京理工大学 Optical music score recognition method based on convolutional neural network
CN111104869A (en) * 2019-11-26 2020-05-05 杭州电子科技大学 Method for digitizing work-ruler spectrum capable of identifying content of small characters
CN111291696A (en) * 2020-02-19 2020-06-16 南京大学 Handwritten Dongba character recognition method based on convolutional neural network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
刘乃歌 等: "AI自动翻译南音——以古典名曲《夫为功名》为例" (AI automatic translation of Nanyin, taking the classical piece "Fu Wei Gong Ming" as an example), 《艺术教育》 (Art Education) *

Similar Documents

Publication Publication Date Title
CN111723585B (en) Style-controllable image text real-time translation and conversion method
JP3133403B2 (en) Neighborhood block prediction bit compression method
JP2014106961A (en) Method executed by computer for automatically recognizing text in arabic, and computer program
CN109886174A (en) A kind of natural scene character recognition method of warehouse shelf Sign Board Text region
CN110766020A (en) System and method for detecting and identifying multi-language natural scene text
Jacobs et al. Text recognition of low-resolution document images
CN113255659A (en) License plate correction detection and identification method based on MSAFF-yolk 3
CN113435436A (en) Scene character recognition method based on linear constraint correction network
Sethy et al. Off-line Odia handwritten numeral recognition using neural network: a comparative analysis
Anam et al. An approach for recognizing Modi Lipi using Otsu’s Binarization algorithm and kohenen neural network
US20220262006A1 (en) Device for detecting an edge using segmentation information and method thereof
JP2997403B2 (en) Handwritten character recognition method and apparatus
Rahiman et al. Printed Malayalam character recognition using back-propagation neural networks
Herwanto et al. Zoning feature extraction for handwritten Javanese character recognition
CN111950552A (en) Method for recognizing southern music score by using computer
CN112149644A (en) Two-dimensional attention mechanism text recognition method based on global feature guidance
Chandio et al. Multi-font and multi-size printed Sindhi character recognition using Convolutional Neural Networks
CN112036290A (en) Complex scene character recognition method and system based on class mark coding representation
Jameel et al. A REVIEW ON RECOGNITION OF HANDWRITTEN URDU CHARACTERS USING NEURAL NETWORKS.
Farkya et al. Hindi speech synthesis by concatenation of recognized hand written devnagri script using support vector machines classifier
KR100456620B1 (en) Hangul character recognition method
CN112926603A (en) Music score recognition method, device, equipment and storage medium
CN111738255A (en) Guideboard text detection and recognition algorithm based on deep learning
Rahiman et al. Bilingual OCR system for printed documents in Malayalam and English
Radhi Text Recognition using Image Segmentation and Neural Network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20201117)