CN117253240B - Numbered musical notation extracting and converting method based on image recognition technology - Google Patents


Info

Publication number: CN117253240B
Authority: CN (China)
Prior art keywords: row, pixel, content, image, black
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Application number: CN202311110239.XA
Other languages: Chinese (zh)
Other versions: CN117253240A
Inventors: 林义尊, 任世龙
Current Assignee: Jinan University
Original Assignee: Jinan University
Application filed by Jinan University
Priority claimed from application CN202311110239.XA
Publication of CN117253240A, followed by grant and publication of CN117253240B
Current legal status: Active; anticipated expiration status noted


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/30Character recognition based on the type of data
    • G06V30/304Music notations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/14Image acquisition
    • G06V30/1444Selective acquisition, locating or processing of specific regions, e.g. highlighted text, fiducial marks or predetermined fields
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/14Image acquisition
    • G06V30/148Segmentation of character regions
    • G06V30/153Segmentation of character regions using recognition of characters or words
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/22Character recognition characterised by the type of writing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/26Techniques for post-processing, e.g. correcting the recognition result

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)
  • Character Discrimination (AREA)
  • Auxiliary Devices For Music (AREA)

Abstract

The patent provides a rapid, accurate and universal method for recognizing and converting numbered musical notation images. The method analyzes a numbered musical notation image: after image cleaning, an algorithm locates the required content, a character recognition technique extracts it, key information is recognized and supplemented based on the position of the content, the key of the score is adjusted, and a PDF file is output. The method is convenient to use, highly accurate, adaptable, easy to adjust and efficient, and recognizes well even the low-resolution, watermarked images that are mainstream on the network.

Description

Numbered musical notation extracting and converting method based on image recognition technology
Technical Field
The invention relates to the field of image recognition, in particular to a numbered musical notation extraction and conversion method based on an image recognition technology.
Background
In recent years, image processing and recognition techniques have matured. Music fades with the passage of time, but a written score preserves it for the long term. The prosperity of Western classical music is inseparable from the staff notation that records and transmits it, while traditional Chinese instruments use numbered musical notation. If staff-notation users who wish to study and play traditional Chinese music written in numbered musical notation must translate and arrange the scores by hand, the threshold is greatly raised; if recognition and translation software can replace that manual work, traditional Chinese music written in numbered musical notation can receive more attention and study.
The development of optical music recognition (OMR) currently focuses on staff notation, while recognition of numbered musical notation still needs more research. If a numbered musical notation is extracted directly with a general OCR tool, much of the characteristic information in the score (upper and lower octave dots, special symbols, and so on) is lost, and the score content cannot be separated from titles, watermarks and other information; indeed, because of the many special elements in numbered musical notation, many OCR tools cannot even recognize the digits accurately. Mature document OCR tools are numerous, but none is easy to apply to numbered musical notation recognition. Numbered musical notation recognition is a niche problem that lacks the flourishing, competitive ecosystem of the general OCR market, and mature, usable recognition software or algorithms are hard to find.
Disclosure of Invention
In order to let music written in numbered musical notation be used and spread more conveniently, the invention provides a rapid, accurate and universal numbered musical notation extraction and conversion method based on image recognition technology.
The specific implementation scheme of the invention is as follows:
A numbered musical notation extraction and conversion method based on image recognition technology recognizes a numbered musical notation and converts it into staff notation; the implementation process is as follows:
(1) Picture preprocessing: scale the resolution of the original numbered musical notation content into a set range, filter out color watermarks and noise, enhance the bar lines, and binarize;
(2) Content positioning: extract horizontal features of the image, clean the features to obtain the positions of the rows where the main content lies, and divide those rows into measures;
(3) Main content recognition (OCR): use a purpose-trained OCR model to recognize the digits, symbols and grace-note marks in the segmented measures, and filter out false recognition results according to the rules of the score;
(4) Rhythm recognition and adjustment: identify the underlines and the upper and lower octave dots based on the position of each digit, judge the beat of the piece from the overall content, and adjust the score content to the user-specified key;
(5) File output: arrange the score in the format of the open-source staff-typesetting software LilyPond, and output the staff notation file through that software.
The invention uses image recognition technology in place of manual labor to extract the score content from a numbered musical notation image and convert it into staff notation, so that staff users can better study and play music recorded in the numbered musical notation format.
Preferably, the specific process of enhancing the bar lines in the picture preprocessing of step (1) is as follows:
(11) Perform a preliminary binarization of the image with color-watermark removal and binarization cleaning, label the image with a connected-domain algorithm, keep only the connected domains 2-22 pixels wide and 88-188 pixels high, treat each as a bar line, and cover it with a solid black rectangle, thereby marking and enhancing it.
Preferably, the preliminary binarization of the image by color-watermark removal and binarization cleaning proceeds as follows:
First: in an 8-bit RGB image file, traverse every pixel and compute its "three-color sum" and "three-color range", defined as follows:
variable name: variable description
"three-color sum": the sum of the R, G and B values within a single pixel
"three-color range": the maximum difference obtained by pairwise subtraction among the R, G and B values within a single pixel
Second: traverse every pixel and, according to the "three-color sum" and "three-color range", classify part of the pixels as black and convert them to pure black. The specific filtering logic is: first, classify as black every pixel whose R, G and B values are all below 138; then, also classify as black the pixels whose "three-color sum" is below 220 and whose "three-color range" is below 78, cycling this step while progressively raising the "three-color sum" threshold and tightening the "three-color range" tolerance; finally, also classify as black the pixels whose "three-color range" is below 25;
Third: use a connected-domain algorithm to delete the connected domains whose area is below a set threshold, thereby filtering out the residual noise left by the previous step.
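The pixel-classification pass above can be sketched in pure Python. The thresholds 138, 220, 78 and 25 come from the text; the loop schedule (+100 on the sum threshold, -10 on the range tolerance per cycle) and the brightness cap guarding the final low-range rule are assumptions, since the text does not specify them.

```python
# A pure-Python sketch of the pixel-classification rules above. The
# thresholds 138 / 220 / 78 / 25 come from the text; the loop schedule
# and the brightness cap in rule 3 are assumptions.

def is_black(pixel, sum_cap=600):
    r, g, b = pixel
    tri_sum = r + g + b                      # "three-color sum"
    tri_range = max(r, g, b) - min(r, g, b)  # "three-color range"
    # Rule 1: all three channels dark.
    if r < 138 and g < 138 and b < 138:
        return True
    # Rule 2: cycled thresholds - raise the sum limit, tighten the range.
    sum_limit, range_limit = 220, 78
    while range_limit > 25:                  # assumed schedule
        if tri_sum < sum_limit and tri_range < range_limit:
            return True
        sum_limit += 100
        range_limit -= 10
    # Rule 3: near-grey pixels, under an assumed brightness cap so the
    # white background is not classified as black.
    return tri_range < 25 and tri_sum < sum_cap

def binarize(image):
    """Map an RGB raster (rows of (R, G, B) tuples) to 1=black / 0=white."""
    return [[1 if is_black(px) else 0 for px in row] for row in image]
```

Under these rules a saturated watermark pixel such as (250, 120, 120) stays white, while greyish handwriting such as (180, 180, 185) is kept as black.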
Preferably, the specific process of content positioning in step (2) is as follows:
(21) Use a ten-step extraction method to remove titles, watermarks and lyrics and to locate the rows in which the digit notes lie;
(22) Segment each measure according to the bar line features.
Preferably, the ten-step extraction method comprises the following steps:
S1: traverse every pixel row of the original image and extract three features per row, namely the number of black pixels in the row and the positions at which the first black pixel appears from the left and from the right, expressed as follows:
variable name: variable description
"row pixel count": the number of black pixels contained in a single row
"row leftmost pixel position": the position, counting from left to right within a single row, at which the first black pixel appears
"row rightmost pixel position": the position, counting from right to left within a single row, at which the first black pixel appears
S2: extract the number of black pixels contained in each column of the image, discard the lowest values and take the average; treat the first column whose black-pixel count exceeds that average, approaching from the left and from the right, as the left and right boundaries of the score content; traverse all rows and mark those whose "row leftmost pixel position" is clearly offset from the left boundary computed above; at the same time, reset the "row rightmost pixel position" of each marked row to that right boundary;
S3: for sufficiently long continuous runs of "row rightmost pixel position" (adjacent rows differ by less than 8, sustained over more than 38 rows), set the "row rightmost pixel position" to the right boundary estimated in S2, so that content rows that do not fill the whole line are not filtered out; then filter out the rows marked in S2;
S4: compute the average of the "row rightmost pixel positions", treat rows whose "row rightmost pixel position" deviates from the average as non-main-content rows, and filter them out;
S5: count the unfiltered rows, treat scattered, discontinuous runs (runs of fewer than 38 consecutive pixel rows) as image noise, and filter them out;
S6: count the unfiltered rows again, compute the average of the "row rightmost pixel positions", and filter out the rows that are offset too far;
S7: count the unfiltered rows again and filter out scattered, discontinuous runs (runs of fewer than 26 consecutive pixel rows) as image noise;
S8: traverse each remaining run of consecutive rows and compute its weighted average "row pixel count", trimming the head and tail of the run accordingly so that the digits are located accurately;
S9: for the rows near the top, count the black pixels in the middle part of each row, between the left and right content boundaries computed in S2; weight the "row pixel count" together with this middle black-pixel count and compare against the global average "row pixel count" to judge whether a composer-information row exists near the top; if so, filter it out;
S10: count the unfiltered rows once more, compute the average run length, and treat scattered, discontinuous runs (runs whose length deviates from the average) as image noise, filtering them out.
Preferably, the main content recognition (OCR) of step (3) is implemented as follows:
(1) Recognize with the trained OCR model;
(2) Correct the recognition result for common recognition errors;
(3) Further recognize the grace notes;
A Tesseract OCR model trained for this task is used to recognize the main content. The training used about 4000 characters extracted from numbered musical notation pictures on the network; besides the digits 0 to 7, the model recognizes well the special symbols such as horizontal lines, dots, brackets and grace-note marks. For the segmented measures, the model accurately identifies the content while giving accurate position coordinates for it, which is essential for the next step of rhythm recognition and key adjustment.
Preferably, the specific process of rhythm recognition and adjustment in step (4) is as follows:
(41) Based on the position of each digit, identify the underlines below it;
(42) Based on the digit and the position of the underlines, identify the dots above and below;
(43) Based on the content identified in each measure, determine the time signature of the score;
(44) Adjust the content of the score to the user-specified key.
Preferably, step (41) identifies the underlines below each digit as follows:
based on the position of each digit, an image slice is cut above and below it in the picture for analysis. The slice height extends 40 pixels down from the bottom of the digit; the slice width starts from the positions of the left and right ends of the digit and is fine-tuned by the following logic: if the preceding note in the measure has no underline, the slice is shortened from the left; if the preceding note has an underline, the slice is shortened from the right;
the number of white pixels contained in each row of the slice is then counted row by row; whenever the count changes markedly between two consecutive rows, the position is recorded, and after the traversal the number and thickness of the underlines are computed. The thicknesses are then compared with those of the underlines of the preceding digit: if the two differ markedly but the inferred underline counts are the same, the underlines are judged to have stuck together because of low resolution, and the underline count is adjusted accordingly;
this underline recognition algorithm overcomes ragged underline ends, included noise, and underlines that stick to each other at low resolution in typeset scores, and achieves accurate recognition of the underlines in a numbered musical notation image.
Preferably, step (42) identifies the dots above the digit and below the underlines as follows:
use a connected-domain algorithm to extract the connected domains in the slices above the digit and below the underlines; filter them, keeping only the connected domains whose width and height both lie within 8-22 pixels and whose actual area exceeds 60% of the product of their width and height, and mark these as upper and lower octave dots. With this algorithm, the octave dots in the numbered musical notation image are accurately identified.
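The dot filter above can be sketched as follows: label the black connected domains of a binary slice (4-connectivity is assumed, since the text does not specify) and keep those whose bounding box is 8-22 pixels per side with a fill ratio above 60%. The bounds are parameters here so that a small test grid fits.

```python
# Connected-domain labeling plus the size / fill-ratio filter above.
from collections import deque

def components(grid):
    """Yield (area, box_width, box_height) for each black connected domain."""
    h, w = len(grid), len(grid[0])
    seen = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if grid[y][x] and not seen[y][x]:
                queue = deque([(y, x)])
                seen[y][x] = True
                area, xs, ys = 0, [], []
                while queue:
                    cy, cx = queue.popleft()
                    area += 1
                    xs.append(cx)
                    ys.append(cy)
                    for ny, nx in ((cy-1, cx), (cy+1, cx),
                                   (cy, cx-1), (cy, cx+1)):
                        if 0 <= ny < h and 0 <= nx < w and grid[ny][nx] \
                                and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                yield area, max(xs) - min(xs) + 1, max(ys) - min(ys) + 1

def count_dots(grid, lo=8, hi=22, fill=0.6):
    """Count components that qualify as octave dots."""
    return sum(1 for area, cw, ch in components(grid)
               if lo <= cw <= hi and lo <= ch <= hi and area > fill * cw * ch)
```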
The beneficial effects of the invention are as follows:
1. With this method the user only has to supply a numbered musical notation picture to complete recognition; there is no need to crop titles and lyrics or to mark note positions manually.
2. The method is highly compatible with the mainstream numbered musical notation picture resources on the network, such as low-resolution images, watermarked images, and images containing lyrics.
3. The user can adjust the key of the piece and transpose it as required.
On top of this basic idea, the invention filters out titles, composer information and lyrics according to actual demand, simplifying everyday use; the stability of the algorithm is optimized in a targeted way, so that pictures with lower resolution, large color watermarks and some sticking of detail content can still be recognized accurately; and detection and correction of partial misrecognitions are introduced, greatly improving the practical usability of the algorithm.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Fig. 2 is the numbered musical notation score of the song "Blue and White Porcelain".
Fig. 3 shows the effect of image preprocessing.
Fig. 4 is a schematic drawing of the extracted horizontal features.
Fig. 5 illustrates feature cleaning with the ten-step extraction method.
FIG. 6 shows the filtered content rows and the measure positioning.
Fig. 7 shows the effect of extracting the main content.
Fig. 8 shows the effect of text extraction.
Fig. 9 illustrates the effects of rhythm recognition and key adjustment.
FIG. 10 shows the output LilyPond file.
Fig. 11 is the staff notation file output by converting the song "Blue and White Porcelain".
Fig. 12 is the numbered musical notation score of the song "Little White Boat".
Fig. 13 illustrates the processing of the score of "Little White Boat".
Fig. 14 is the final output staff notation file for the score of "Little White Boat".
Fig. 15 is the original numbered musical notation score of the song "Defend the Yellow River".
Fig. 16 illustrates the processing of the score of "Defend the Yellow River".
Fig. 17 is the final output staff notation file for the score of "Defend the Yellow River".
Detailed Description
The invention is further described below with reference to the drawings and a detailed embodiment.
A content recognition and staff-notation output method based on a numbered musical notation image, as shown in figure 1, comprises the following steps:
1. Picture preprocessing: scale the resolution of the original image content into a suitable range, filter out color watermarks and noise, enhance the bar lines, and binarize.
2. Content positioning: extract horizontal features of the image, clean the features step by step to obtain the positions of the rows where the main content lies, and divide the content rows into measures.
3. Main content recognition (OCR): use the purpose-trained OCR model to recognize the digits, symbols and grace-note marks within the measures, and filter out wrong recognition results according to the rules of the score.
4. Rhythm recognition and adjustment: based on the position of each digit, identify the underlines and the upper and lower octave dots with a targeted algorithm, judge the beat of the piece from the overall content, and adjust the score content to the user-specified key.
5. File output: arrange the above content in the format of the open-source staff-typesetting software LilyPond, and output a staff-notation PDF file through that software.
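The hand-off to LilyPond in step 5 can be sketched minimally as below. The digit-to-pitch table assumes C major, every note is emitted as a quarter note and digit 0 as a rest; octave dots, underline durations and the user-chosen key, all handled by the method, are omitted here.

```python
# A minimal sketch of emitting recognized jianpu digits as LilyPond
# source. Assumes C major and quarter notes throughout.

PITCH = {1: "c'", 2: "d'", 3: "e'", 4: "f'", 5: "g'", 6: "a'", 7: "b'", 0: "r"}

def to_lilypond(digits, time="4/4"):
    """Emit a LilyPond source string for a flat list of jianpu digits."""
    notes = " ".join(PITCH[d] + "4" for d in digits)
    return ("\\version \"2.24.0\"\n"
            "{ \\time " + time + " " + notes + " }\n")
```

Writing the returned string to a .ly file and running lilypond on it produces the staff-notation PDF of step 5.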
Specific description of the corresponding model and algorithm:
image preprocessing:
(1) Image resolution adjustment
(2) Bar line reinforcement
(3) Watermark removal and binarization
In this embodiment, the input pictures are first uniformly scaled to about 300 PPI, which optimizes the accuracy of the character recognition model and standardizes the thresholds used in the subsequent processing. Next, the vertical bar lines are identified and painted over, so that the key "bar line" content of the score is better preserved. Finally, color-watermark removal and binarization cleaning are performed: a binarization algorithm tailored to the writing filters out the watermark in the picture while clearly retaining the score content.
The flow of "numbered musical notation image resolution adjustment" is as follows:
scores of different lengths have different proportions and sizes, but since a score is generally designed for printing on A4 paper, this embodiment calculates that, with A4 paper at 300 PPI, the horizontal edge has at least 2362 pixels; to keep the dot details in the image sufficiently clear, practical tests set the ideal horizontal pixel range of a numbered musical notation image at 2800-3000, and the original image is scaled proportionally to a suitable resolution.
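The proportional rescale can be sketched as follows; aiming for the midpoint of the range (2900 px) when the image falls outside it is an assumption, since the text gives only the interval.

```python
# Sketch of the proportional rescale above: choose a factor that brings
# the horizontal edge into the 2800-3000 px target range.

TARGET_LO, TARGET_HI = 2800, 3000

def scale_factor(width):
    """Factor that brings `width` into the target interval (1.0 if inside)."""
    if TARGET_LO <= width <= TARGET_HI:
        return 1.0
    return (TARGET_LO + TARGET_HI) / 2 / width   # assumed midpoint target

def scaled_size(width, height):
    """New (width, height) after proportional scaling."""
    f = scale_factor(width)
    return round(width * f), round(height * f)
```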
The "bar line enhancement" flow is as follows:
the image is given a preliminary binarization by a more relaxed color-watermark-removal and binarization-cleaning algorithm (see below) and is then labeled with a connected-domain algorithm; only connected domains 2-22 pixels wide and 88-188 pixels high are kept, each is treated as a bar line, and each is covered with a solid black rectangle, thereby marking and enhancing it.
The following is the process of "color watermark removal and binarization cleaning":
1: first, in an 8-bit RGB image file, traverse every pixel and compute its "three-color sum" and "three-color range", defined as follows:
variable name: variable description
"three-color sum": the sum of the R, G and B values within a single pixel.
"three-color range": the maximum difference obtained by pairwise subtraction among the R, G and B values within a single pixel.
2: traverse every pixel and, according to these two values, classify part of the pixels as black and convert them to pure black. The specific filtering logic is: first, classify as black every pixel whose R, G and B values are all below 138; then, also classify as black the pixels whose "three-color sum" is below 220 and whose "three-color range" is below 78, cycling this step while progressively raising the "three-color sum" threshold and tightening the "three-color range" tolerance; finally, also classify as black the pixels whose "three-color range" is below 25.
3: use a connected-domain algorithm to delete the connected domains whose area is below a certain threshold, thereby filtering out the residual noise left by the previous step.
This color-watermark removal and binarization algorithm achieves the desired effect against the realities of numbered musical notation images: hazy, greyish handwriting, and watermarks interleaved with the content that may tint it.
After these operations, this embodiment obtains a clean binarized numbered musical notation picture with enhanced measure features, which eases the subsequent positioning and accurate extraction of the content.
Content positioning:
(1) A ten step extraction method is used to exclude titles, watermarks, lyrics and locate the positions of the rows where the digital notes are located.
(2) Each bar is segmented according to bar line characteristics.
In this step, the main content of the score is located algorithmically and divided precisely, which eases the next character-recognition step. A "ten-step extraction method" is used here: based on the black-pixel features of each row, ten filtering steps remove interference such as titles and composer information and accurately locate each content row. A further algorithm then accurately recognizes the bar lines from the connected domains within the row and divides the content row into measure blocks, achieving accurate extraction of the main content.
The flow of the "ten-step extraction method" is as follows:
1: traverse every pixel row of the original image and extract three features per row, namely the number of black pixels in the row and the positions at which the first black pixel appears from the left and from the right:
variable name: variable description
"row pixel count": the number of black pixels contained in a single row.
"row leftmost pixel position": the position, counting from left to right within a single row, at which the first black pixel appears.
"row rightmost pixel position": the position, counting from right to left within a single row, at which the first black pixel appears.
2: extract the number of black pixels contained in each column of the image, discard the lowest values and take the average; treat the first column whose black-pixel count exceeds that average, approaching from the left and from the right, as the left and right boundaries of the score content. Traverse all rows and mark those whose "row leftmost pixel position" is clearly offset from the left boundary computed above; at the same time, reset the "row rightmost pixel position" of each marked row to that right boundary.
3: for sufficiently long continuous runs of "row rightmost pixel position" (adjacent rows differ by less than 8, sustained over more than 38 rows), set the "row rightmost pixel position" to the right boundary estimated in step 2, so that content rows that do not fill the whole line are not filtered out. Then filter out the rows marked in step 2.
4: compute the average of the "row rightmost pixel positions"; treat rows whose "row rightmost pixel position" deviates from the average as non-main-content rows and filter them out.
5: count the unfiltered rows and treat scattered, discontinuous runs (fewer than 38 consecutive pixel rows) as image noise, filtering them out.
6: (as in step 4) count the unfiltered rows again, compute the average of the "row rightmost pixel positions", and filter out the rows that are offset too far.
7: (as in step 5) count the unfiltered rows again and filter out scattered, discontinuous runs (fewer than 26 consecutive pixel rows) as image noise.
8: traverse each remaining run of consecutive rows and compute its weighted average "row pixel count", trimming the head and tail of the run accordingly so that the digits are located accurately.
9: for the rows near the top, count the black pixels in the middle part of each row, between the left and right content boundaries computed in step 2; weight the "row pixel count" together with this middle black-pixel count and compare against the global average "row pixel count" to judge whether a composer-information row exists near the top; if so, filter it out.
10: (as in step 5) count the unfiltered rows once more, compute the average run length, and treat scattered, discontinuous runs (runs whose length deviates from the average) as image noise, filtering them out.
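The boundary estimation of step 2 can be sketched as follows; "removing the excessively low value" is interpreted here as dropping empty columns before averaging, which is an assumption.

```python
# Sketch of step 2: estimate the left and right content boundaries from
# per-column black-pixel counts of a binarized raster (1 = black).

def content_bounds(image):
    """Return (left, right) column indices bounding the score content."""
    counts = [sum(col) for col in zip(*image)]     # black pixels per column
    nonzero = [c for c in counts if c > 0] or [0]  # drop empty columns
    mean = sum(nonzero) / len(nonzero)
    left = next((i for i, c in enumerate(counts) if c > mean), 0)
    right = next((i for i in range(len(counts) - 1, -1, -1)
                  if counts[i] > mean), len(counts) - 1)
    return left, right
```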
Subject content recognition (OCR):
(1) Recognition using trained OCR models
(2) Correcting the recognition result aiming at common recognition errors
(3) For decorative sounds, further recognition is carried out
At this step, the present embodiment uses a Tesseract OCR model trained for the task to identify the main content. The training used about 4000 characters extracted from numbered musical notation pictures on the network; besides the digits 0 to 7, the model recognizes well the special symbols such as horizontal lines, dots, brackets and grace-note marks. For the segmented measures, the model accurately identifies the content while giving accurate position coordinates for it, which is essential for the next step of rhythm recognition and key adjustment.
Rhythm recognition and adjustment:
(1) Based on the location of each digit, the underlines that exist below it are identified.
(2) Based on the number and the position of the underline, the points that exist above and below are identified.
(3) Based on the content identified in each measure, the time signature of the score is determined.
(4) The content of the score is adjusted based on the user-specified key.
At this step, the present embodiment formally converts the recognized digits into notes by deploying the recognition algorithm: a small area above and below each digit is extracted; the horizontal lines and dots inside it are recognized; and the area and the recognition threshold are adjusted according to the logic of the preceding and following recognition results, so that easily-confused symbol fragments are not misrecognized. The beat of the piece is then determined from the identified content, and the piece is adjusted according to the user's input requirements.
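One way the beat could be determined from the identified content is sketched below. Durations are in quarter-note beats (a bare digit is one beat, each underline halves it, each dash after a note adds a beat); taking the most common per-measure total as the meter is an assumption, since the text only says the beat is judged from the overall content.

```python
# Hypothetical time-signature inference from recognized measures, each a
# list of (digit, underline_count, dash_count) tuples.
from collections import Counter
from fractions import Fraction

def measure_beats(measure):
    """Total duration of one measure, in quarter-note beats."""
    total = Fraction(0)
    for _digit, underlines, dashes in measure:
        total += Fraction(1, 2 ** underlines) + dashes
    return total

def infer_time_signature(measures):
    """Take the most common per-measure total as the meter (assumption)."""
    totals = Counter(measure_beats(m) for m in measures)
    beats = totals.most_common(1)[0][0]
    return f"{beats}/4"
```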
The "underline identification" flow is as follows:
Based on the position of each digit, an image slice is cut above and below it from the picture for analysis. The slice height is fixed: 40 pixels downward from the bottom of the digit. The slice width is based on the positions of the left and right ends of the digit, fine-tuned by the following logic: if the preceding note in the bar has no underline, the slice range is shortened from the left; if the following note in the bar has an underline, the slice range is shortened from the right.
After that, the number of white pixels contained in each row of the slice is counted row by row; whenever the count changes sharply between two consecutive rows, the position is recorded, and after the traversal the number and thickness of the underlines are computed. These are then compared with the thicknesses of the underlines under the preceding digit: if the thicknesses differ markedly but the inferred underline counts are the same, it is concluded that low resolution has caused the underlines to stick together, and the underline count is adjusted.
This underline identification algorithm handles ragged underline ends introduced by typesetting, noise points, and underlines sticking together at low resolution, achieving precise identification of the underlines in numbered-notation images.
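The row-scan just described can be sketched as follows, assuming a binarized slice in which content pixels are 1 (the "white pixel" counts above correspond to content pixels after inversion); the jump threshold of 5 pixels is an assumed parameter:

```python
# Count underlines in a binary slice by scanning rows: a sharp rise in
# the per-row content-pixel count marks the start of an underline, a
# sharp fall marks its end; each start/end pair yields one underline
# and its thickness in rows.
def count_underlines(slice_rows: list[list[int]], jump: int = 5):
    counts = [sum(row) for row in slice_rows]
    underlines = []            # (start_row, thickness) per detected line
    start = None
    prev = 0
    for y, c in enumerate(counts + [0]):   # sentinel row flushes the last line
        if start is None and c - prev >= jump:
            start = y                      # sharp rise: underline begins
        elif start is not None and prev - c >= jump:
            underlines.append((start, y - start))
            start = None                   # sharp fall: underline ends
        prev = c
    return underlines
```

The stuck-underline adjustment from the text would then compare the returned thicknesses against those under the preceding digit.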
The "upper and lower dot identification" flow is as follows:
Using a connected-component algorithm, the connected components in the slices above each digit and below its underlines are extracted; they are then filtered, keeping only components whose width and height are both within 8-22 pixels and whose actual area exceeds 60% of the width-height product, and these are marked as upper and lower dots. Through this algorithm, accurate identification of the upper and lower dots in numbered-notation images is achieved.
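A self-contained sketch of this dot filter, using a basic 4-connected BFS labeling in place of a library connected-component routine; the 8-22 pixel and 60% thresholds are the ones quoted above, while the pixel convention (1 = content) is an assumption:

```python
from collections import deque

# Label 4-connected components in a binary slice, then keep only those
# whose bounding box is 8-22 px on each side and whose pixel count
# exceeds 60% of the bounding-box area: roughly circular blobs, i.e.
# octave dots. Returns one (x0, y0, x1, y1) box per accepted dot.
def find_dots(img: list[list[int]]):
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    dots = []
    for y in range(h):
        for x in range(w):
            if img[y][x] and not seen[y][x]:
                q = deque([(y, x)])        # BFS over one component
                seen[y][x] = True
                xs, ys, area = [x], [y], 0
                while q:
                    cy, cx = q.popleft()
                    area += 1
                    xs.append(cx); ys.append(cy)
                    for ny, nx in ((cy-1, cx), (cy+1, cx), (cy, cx-1), (cy, cx+1)):
                        if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            q.append((ny, nx))
                bw = max(xs) - min(xs) + 1
                bh = max(ys) - min(ys) + 1
                # size and fill-ratio filter from the text
                if 8 <= bw <= 22 and 8 <= bh <= 22 and area > 0.6 * bw * bh:
                    dots.append((min(xs), min(ys), max(xs), max(ys)))
    return dots
```

The fill-ratio test is what rejects thin underline fragments, whose bounding boxes are wide but sparse or outside the 8-22 px range.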
To allow music written in numbered notation to be used and disseminated more conveniently and widely, the invention provides a fast, accurate, and general method for recognizing and converting numbered notation.
The content extraction flow is shown in figs. 2-11: fig. 3 shows "image preprocessing", figs. 4-7 show "content positioning", fig. 8 shows "subject content recognition", figs. 9-10 show "rhythm recognition and key adjustment", and the conversion result is shown in fig. 11.
Figs. 12-17 show further practical results of the method.
The embodiments of the present invention described above do not limit the scope of the present invention. Any modifications, equivalent substitutions, and improvements made within the spirit and principles of the present invention shall fall within the scope of the claims of the present invention.

Claims (8)

1. A numbered musical notation extraction and conversion method based on image recognition technology, characterized in that a numbered musical notation score is recognized and converted into staff notation, implemented as follows:
(1) Picture preprocessing: scaling the resolution of the original numbered-notation content into a set range, filtering out color watermarks and noise, enhancing the bar lines, and binarizing the image;
(2) Content positioning: extracting horizontal features from the image and cleaning the features to obtain the positions of the rows containing the main content, then segmenting those rows into bars;
(3) Subject content recognition: using an OCR model trained for this task to recognize the digits, symbols, and grace-note marks on the segmented bars, and filtering out false recognition results according to score rules;
(4) Rhythm recognition and key adjustment: identifying the underlines and the upper and lower dots based on the position of each digit, determining the time signature from the overall content, and adjusting the score content according to the user-specified key;
(5) Staff file output: arranging the score into the input format of the open-source staff engraving software LilyPond, and outputting a staff file through that software;
the specific process of enhancing the bar lines in the picture preprocessing of step (1) is as follows: after the image has been preliminarily binarized by color-watermark removal and binarization cleaning, the image is labeled with a connected-component algorithm and only connected components 2-22 pixels wide and 88-188 pixels tall are kept; each such component is regarded as a bar line and covered with a pure-black rectangle, thereby marking and enhancing it.
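The bar-line enhancement above can be sketched as follows; the box list is assumed to come from any prior connected-component labeling pass, and the pixel convention (1 = black content) is an assumption:

```python
# A connected component is treated as a bar line when its bounding box
# is 2-22 px wide and 88-188 px tall (the thresholds quoted in claim 1);
# qualifying boxes are painted solid to enhance the bar line.
def is_bar_line(w: int, h: int) -> bool:
    return 2 <= w <= 22 and 88 <= h <= 188

def enhance_bar_lines(img: list[list[int]], boxes):
    """Paint each qualifying (x0, y0, x1, y1) box solid black (value 1)."""
    for x0, y0, x1, y1 in boxes:
        if is_bar_line(x1 - x0 + 1, y1 - y0 + 1):
            for y in range(y0, y1 + 1):
                for x in range(x0, x1 + 1):
                    img[y][x] = 1
    return img
```

Painting the box solid repairs bar lines that binarization left broken or thinned, which later makes bar segmentation more reliable.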
2. The method for extracting and converting numbered musical notation based on image recognition technology according to claim 1, wherein the preliminary binarization of the image by color-watermark removal and binarization cleaning proceeds as follows:
First: in an 8-bit RGB image file, traverse every pixel and compute its "tri-color sum" and "tri-color range", defined as follows:
Tri-color sum: the sum of the R, G, and B values of a single pixel.
Tri-color range: the maximum pairwise difference between the R, G, and B values of a single pixel.
Second: traverse every pixel and, based on its "tri-color sum" and "tri-color range", classify a subset of pixels as black and convert them to pure black. The specific filtering logic is: first, pixels whose R, G, and B values are all below 138 are classified as black; then, pixels with "tri-color sum" below 220 and "tri-color range" below 78 are also classified as black, and this pass is repeated cyclically, progressively raising the "tri-color sum" threshold while tightening the "tri-color range" tolerance; finally, the pixels with "tri-color range" below 25 are also classified as black;
Third: using a connected-component algorithm, delete connected components whose area is below a set threshold, thereby filtering out the residual noise left by the previous step.
3. The method for extracting and converting numbered musical notation based on image recognition technology according to any one of claims 1-2, wherein the specific process of content positioning in step (2) is as follows:
(21) Remove titles, watermarks, and lyrics using a ten-step extraction method, and locate the rows containing the digit notes;
(22) Segment each bar according to bar-line features.
4. The method for extracting and converting numbered musical notation based on image recognition technology according to claim 3, wherein the ten-step extraction method comprises the following steps:
S1: traverse each row of pixels in the original image and extract three features per row: the number of black pixels it contains, and the positions at which the first black pixel appears from the left and from the right, expressed as follows:
Row pixel count: the number of black pixels contained in a single row.
Row leftmost pixel position: counting from left to right in a single row, the position where the first black pixel appears.
Row rightmost pixel position: counting from right to left in a single row, the position where the first black pixel appears.
S2: extract the number of black pixels in each column of the image, discard overly low values, and take the average; the first columns from the left and from the right whose black-pixel counts exceed the average are taken as the left and right boundaries of the score content. Traverse all rows and mark those whose leftmost pixel position is offset from the computed left boundary; meanwhile, reset the rightmost pixel position of each marked row to the computed right boundary;
S3: for rows whose rightmost pixel positions are sufficiently continuous (the difference between adjacent rows is less than 8, sustained over more than 38 consecutive rows), reset the rightmost pixel position to the right boundary estimated in S2, preventing content rows that do not fill the whole line from being filtered; then filter out the rows marked in S2;
S4: compute the average of the rightmost pixel positions over the rows; rows whose rightmost pixel position deviates from the average are regarded as non-main-content rows and filtered out;
S5: count the unfiltered rows and treat scattered, discontinuous runs (runs shorter than 38 consecutive rows) as image noise, filtering them out;
S6: count the unfiltered rows again, compute the average rightmost pixel position, and filter out rows with excessive offset;
S7: count the unfiltered rows again and filter out scattered, discontinuous runs (shorter than 26 consecutive rows), which are regarded as image noise;
S8: traverse each remaining run of consecutive rows, compute a weighted average "row pixel count" for each, and trim the head and tail of the run accordingly so that the digits are positioned accurately;
S9: for rows near the top, count the black pixels in the middle portion of each row, based on the content boundaries computed in S2; weight the row pixel count together with the middle-portion black-pixel count and compare against the global average row pixel count to decide whether a composer-information row exists near the top; if so, filter it out;
S10: count the unfiltered rows once more, compute the average run length of consecutive rows, and treat scattered, discontinuous runs (those deviating from the average) as image noise, filtering them out.
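Step S1's per-row features can be sketched directly; the binary-image convention (1 = black content pixel) is an assumption, and empty rows yield None positions:

```python
# Per-row features over a binary image: the black-pixel count, and the
# column indices of the first black pixel from the left and from the
# right. Steps S2-S10 filter rows on exactly these three features.
def row_features(img: list[list[int]]):
    feats = []
    for row in img:
        count = sum(row)
        left = right = None
        if count:
            left = next(x for x, v in enumerate(row) if v)
            right = len(row) - 1 - next(x for x, v in enumerate(reversed(row)) if v)
        feats.append((count, left, right))
    return feats
```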
5. The method for extracting and converting numbered musical notation based on image recognition technology according to claim 4, wherein the subject content recognition of step (3) is implemented as follows:
(1) Recognize the content using the trained OCR model;
(2) Correct the recognition results against common recognition errors;
(3) Further recognize grace-note symbols;
the subject content is recognized using a Tesseract OCR model trained for this task; the training used about 4000 characters extracted from numbered-notation images found on the network, and besides the digits 0-7 the model can recognize the special symbols: horizontal lines, dots, brackets, and grace-note marks; for the segmented bars, the model accurately gives the position coordinates of the content while accurately recognizing it, which is essential for the next step of rhythm recognition and key adjustment.
6. The method for extracting and converting numbered musical notation based on image recognition technology according to claim 5, wherein the specific process of rhythm recognition and key adjustment in step (4) is as follows:
(41) Identify the underlines existing below each digit based on its position;
(42) Identify the dots existing above and below, based on the positions of the digits and underlines;
(43) Determine the time signature of the score based on the content identified in each bar;
(44) Adjust the content of the score according to the user-specified key.
7. The method of claim 6, wherein the flow of step (41), identifying the underlines existing below each digit based on its position, is:
based on the position of each digit, cut an image slice above and below it from the picture for analysis; the slice height is: extending 40 pixels down from the bottom of the digit; the slice width is: based on the positions of the left and right ends of the digit, fine-tuned by the following logic: if the preceding note in the bar has no underline, shorten the slice range from the left; if the following note in the bar has an underline, shorten the slice range from the right;
count row by row the number of white pixels contained in each row of the slice, record the position whenever the count changes sharply between two consecutive rows, and after the traversal compute the number and thickness of the underlines; then compare these with the thicknesses of the underlines under the preceding digit: if the thicknesses differ markedly but the inferred underline counts are the same, conclude that low resolution has caused the underlines to stick together, and adjust the underline count accordingly;
this underline identification algorithm handles ragged underline ends introduced by typesetting, noise points, and underlines sticking together at low resolution, achieving precise identification of the underlines in numbered-notation images.
8. The method of claim 6, wherein the flow of step (42), identifying the dots existing above and below based on the positions of the digits and underlines, is:
using a connected-component algorithm, extract the connected components in the slices above each digit and below its underlines; filter them, keeping only connected components whose width and height are both within 8-22 pixels and whose actual area exceeds 60% of the width-height product, and mark these as upper and lower dots; through this algorithm, the upper and lower dots in numbered-notation images are accurately identified.
CN202311110239.XA 2023-08-31 2023-08-31 Numbered musical notation extracting and converting method based on image recognition technology Active CN117253240B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311110239.XA CN117253240B (en) 2023-08-31 2023-08-31 Numbered musical notation extracting and converting method based on image recognition technology


Publications (2)

Publication Number Publication Date
CN117253240A CN117253240A (en) 2023-12-19
CN117253240B (en) 2024-03-26

Family

ID=89132203


Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007121563A (en) * 2005-10-26 2007-05-17 Kawai Musical Instr Mfg Co Ltd Musical score recognition device and musical score recognition program
CN102663423A (en) * 2012-03-28 2012-09-12 北京航空航天大学 Method for automatic recognition and playing of numbered musical notation image
JP2014170146A (en) * 2013-03-05 2014-09-18 Univ Of Tokyo Method and device for automatically composing chorus from japanese lyrics
KR20160002111A (en) * 2014-06-30 2016-01-07 한국전자통신연구원 Method for describing traditional music symbols in musicxml
CN112183658A (en) * 2020-10-14 2021-01-05 小叶子(北京)科技有限公司 Music score identification method and device, electronic equipment and storage medium
CN115311665A (en) * 2022-08-29 2022-11-08 泓宇星私人有限责任公司 Handwritten numbered musical notation recognition method and system
CN115393875A (en) * 2022-08-30 2022-11-25 杭州电子科技大学 Method and system for staff identification and numbered musical notation conversion based on MobileNet V3

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080167739A1 (en) * 2007-01-05 2008-07-10 National Taiwan University Of Science And Technology Autonomous robot for music playing and related method
JP2009151712A (en) * 2007-12-21 2009-07-09 Canon Inc Sheet music creation method and image processing system
US9263013B2 (en) * 2014-04-30 2016-02-16 Skiptune, LLC Systems and methods for analyzing melodies




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant