CN106297755B - Electronic equipment and identification method for music score image identification

Info

Publication number: CN106297755B
Application number: CN201610859907.2A
Authority: CN
Inventors: 宋晴; 杨录; 贾文赫; 王智慧; 杨李怡; 刘小欧; 辛学仕; 陈海鹏; 杨敏; 姜佳男
Original assignee: Beijing University of Posts and Telecommunications
Current assignee: Beijing University of Posts and Telecommunications
Priority date: 2016-09-28
Filing date: 2016-09-28
Publication date: 2023-06-13
Anticipated expiration: 2036-09-28
Also published as: CN106297755A

Abstract

The invention discloses an electronic device and a recognition method for music score image recognition. The electronic device comprises a housing, a sound generating component, a main board arranged in the housing, and an image scanning component arranged at a first end of the housing; the main board carries a main control circuit, together with a sound card circuit and a power supply circuit that are each electrically connected to the main control circuit. In the method, a staff image to be processed is acquired by a camera and transmitted to the main control circuit; the main control circuit recognizes the staff image and identifies each complete note; the main control circuit then sends the corresponding digital sound signals to the sound card circuit, which converts them into playable analog signals and passes them to the sound generating component for playback. The device solves the prior-art problem that the image acquisition module is separate from the recognition module and is therefore inconvenient to use. The method cascades a note classifier with a convolutional neural network for note recognition, and offers high recognition speed and high recognition accuracy.

Description

Electronic equipment and identification method for music score image identification

Technical Field

The invention relates to the technical field of image recognition, in particular to electronic equipment and a recognition method for recognizing music score images.

Background

Image recognition refers to techniques that use a computer to process, analyze, and understand images in order to recognize targets and objects in various patterns.

The music score image recognition device in the prior art comprises an image acquisition module and a computer, wherein the image acquisition module acquires image data of a music score in a photographing or music score scanning mode, inputs the image data into the computer, and analyzes and recognizes the acquired image data through a recognition module in the computer.

However, the above music score image recognition device has the following technical problem: the image acquisition module is separate from the recognition module, and both depend on a computer to work, so the workflow is long and the device is inconvenient to use.

Most prior-art music score image recognition methods are based on traditional computer vision techniques. Their recognition accuracy and recognition speed are not ideal, so they cannot recognize scores quickly and accurately, and they impose strict normalization requirements on the score to be recognized, which makes them ill-suited to everyday use.

Disclosure of Invention

The embodiment of the invention aims to provide electronic equipment and an identification method for identifying music score images, which can solve the problems that an image acquisition module and an identification module of the music score image identification equipment in the prior art are separated, the use is inconvenient, and the identification precision and the identification speed of the music score image identification method in the prior art are not ideal.

To achieve the above object, an embodiment of the present invention discloses an electronic device for music score image recognition, including a housing, a sound emitting part, a main board disposed in the housing, and an image scanning part disposed at a first end of the housing;

the main board is provided with a main control circuit, a sound card circuit and a power supply circuit which are respectively and electrically connected with the main control circuit;

the image scanning component comprises a scanning roller and a camera arranged above the scanning roller, and the scanning roller and the camera are electrically connected with the main control circuit; the camera sends the shot music score image to the main control circuit for processing;

the sound generating component is connected with the sound card circuit and generates sound according to a sound signal sent by the main control circuit;

the power supply circuit is respectively and electrically connected with the scanning roller, the camera and the sounding component to supply power to the scanning roller, the camera and the sounding component;

the second end of the shell is provided with a battery compartment and a compartment cover, and the battery compartment is connected with a power circuit on the main board.

Preferably, the shell is a pen-shaped shell; the image scanning component is arranged at the first end part of the pen-shaped shell;

the sound generating component is arranged above the image scanning component, and the image scanning component and the sound generating component form a first end part into a pen point shape;

the main board is arranged at a position close to the pen point in the pen-shaped shell;

at least 2 main board mounting columns are arranged in the pen-shaped shell; the main board is fixed in the pen-shaped shell through the at least 2 main board mounting columns.

Preferably, a battery compartment and a compartment cover are arranged at the second end of the pen-shaped shell, and the battery compartment is connected with a power circuit on the main board.

Preferably, an external power line is arranged at the second end part of the pen-shaped shell, and the external power line is connected with a power circuit on the main board.

The embodiment of the invention also discloses a music score image recognition method, which comprises the following steps,

acquiring a staff image to be processed through a camera and transmitting the staff image to a main control circuit;

the main control circuit identifies the staff images to be processed and identifies each complete note;

the main control circuit sends corresponding sound digital signals to the sound card circuit according to the recognized complete notes, and the sound card circuit converts the received sound digital signals into playable analog signals and transmits the playable analog signals to the sounding component for playing;

the main control circuit identifies the staff image to be processed, including,

drawing the edge information of the image by adopting an edge detection method on the staff image to be processed, and detecting the position coordinates of the staff by adopting a straight line detection method;

carrying out note positioning segmentation on the staff image to be processed by adopting a preset note classifier to obtain the position of each complete note in the image;

identifying the note heads obtained by segmentation by adopting a preset convolutional neural network, judging whether each note head is solid or hollow, and obtaining the position of the note head;

and identifying each complete note according to the obtained staff line position coordinates, the relative position of each complete note, whether its note head is solid or hollow, and the position of the note head.

Preferably, the training process of the note classifier includes:

establishing a positive sample data set and a negative sample data set, wherein each data set comprises position data of a positioning frame and image data of the staff image within the positioning frame, the positive sample data set comprises image data of complete notes, and the negative sample data set comprises image data of score content other than complete notes;

the channel characteristics of each sample in the positive sample data set and the negative sample data set are extracted, and a note classifier is trained.

Preferably, the musical note location segmentation is performed on the staff image to be processed, including,

randomly selecting a plurality of candidate positioning frames on the staff image to be processed, scanning the positioning frames one by one, extracting the channel characteristics from the image in each positioning frame, inputting the extracted channel characteristics into the note classifier, and judging whether the image in the positioning frame is a positive sample or a negative sample, wherein a positive sample is a complete note in the music score and a negative sample is score background, which is discarded; the complete notes in the staff image to be processed are thereby obtained, and the position of each complete note in the image is obtained by comparing the position data of the positioning frames in the note classifier.

Preferably, the training process of the convolutional neural network comprises,

establishing a note head data set which comprises data of three classes: solid note heads, hollow note heads and background;

constructing a convolutional neural network, which comprises 2 convolutional layers, 2 downsampling layers and 1 fully-connected layer;

and inputting the note head image data in the note head data set into the convolutional neural network to complete training.

Preferably, the convolutional neural network is used to identify the note heads obtained by segmentation, including,

inputting each complete note obtained by note positioning segmentation into the convolutional neural network, classifying it as a solid note head, a hollow note head or background by comparison with the data in the note head data set, discarding the background, and determining the position of the note head within the complete note by comparing the position data of the note heads in the note head data set.

Preferably, the staff image to be processed is obtained by denoising, contrast-enhancing and graying the captured staff image so as to reduce noise and uneven illumination, yielding a binary image.

According to the technical scheme, the sound generating component, the main board and the image scanning component are all integrated into one device, so that the portability of a product is greatly improved, and the problem that an image acquisition module and an identification module are separated and inconvenient to use in the prior art is solved.

According to the recognition method embodiment, an edge detection method is applied to the staff image to be processed to draw the edge information of the image, and a straight line detection method is then used to detect the position coordinates of the staff lines; a preset note classifier performs note positioning segmentation on the staff image to be processed to obtain the position of each complete note in the image; a preset convolutional neural network identifies the note heads obtained by segmentation, judges whether each note head is solid or hollow, and obtains the position of the note head; and each complete note is identified according to the obtained staff line position coordinates, the relative position of each complete note, whether its note head is solid or hollow, and the position of the note head. Compared with traditional computer vision methods, the method cascades the note classifier with the convolutional neural network for note recognition, and offers high recognition speed and high recognition accuracy.

Of course, it is not necessary for any one product or method of practicing the invention to achieve all of the advantages set forth above at the same time.

Drawings

In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a schematic diagram of an embodiment of an electronic device of the present invention;

FIG. 2 is a schematic circuit diagram of a motherboard in an embodiment of the electronic device of the present invention;

FIG. 3 is a control schematic diagram of a motherboard in an embodiment of the electronic device of the present invention;

FIG. 4 is a flowchart of a first embodiment of the score recognition method of the present invention;

FIG. 5 is a flow chart of the identification of a staff image to be processed by the master control circuit in a first embodiment of the identification method of the present invention;

FIG. 6 is a flow chart of the identification of a staff image to be processed by the master control circuit in a second embodiment of the identification method of the present invention;

FIG. 7 is a schematic diagram of the single-sided edge detection method in a second embodiment of the score recognition method of the present invention;

FIG. 8 is an effect diagram of staff line position coordinate detection in the second embodiment of the score recognition method of the present invention;

FIG. 9 is a diagram showing the training process of the note classifier in a second embodiment of the score recognition method of the present invention;

FIG. 10 is a sample schematic of a positive sample data set and a negative sample data set in a second embodiment of the score recognition method of the present invention;

FIG. 11 is a flowchart of note positioning segmentation in a second embodiment of the score recognition method of the present invention;

FIG. 12 is an effect diagram of note positioning segmentation in a second embodiment of the score recognition method of the present invention;

FIG. 13 is a schematic diagram of a training process of a convolutional neural network in a second embodiment of the score recognition method of the present invention;

FIG. 14 is a block diagram of a convolutional neural network in a second embodiment of the score recognition method of the present invention;

FIG. 15 is a flowchart of note head recognition in a second embodiment of the score recognition method of the present invention;

In the figures: 1, compartment cover; 2, battery compartment; 3, main board; 4, camera; 5, scanning roller; 6, main board mounting post; 7, sound generating component; 8, LED fill light.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

In the structure of one embodiment of the electronic device for music score image recognition of the present invention, as shown in fig. 1, the housing is a pen-shaped housing, the image scanning component is disposed at a first end of the pen-shaped housing, the sound generating component 7 is mounted above the image scanning component, and the image scanning component and the sound generating component 7 form the first end into a pen point shape; the image scanning means comprises a scanning roller 5 and a camera 4 arranged above the scanning roller 5.

The main board 3 is mounted in the pen-like housing at a position close to the pen point. At least 2 main board mounting posts 6 are arranged in the pen-shaped shell, and the main board 3 is fixed in the pen-shaped shell through the at least 2 main board mounting posts 6. As shown in fig. 2, a main control circuit, a sound card circuit and a power supply circuit are arranged on the main board 3, and the sound card circuit and the power supply circuit are respectively and electrically connected with the main control circuit; the scanning roller 5 and the camera 4 are electrically connected with a main control circuit; the camera 4 sends the shot music score image to the main control circuit for processing; the sounding component 7 is connected with the sound card circuit and sounds according to the sound signal sent by the main control circuit.

The second end of the pen-shaped shell is provided with a battery compartment 2 and a compartment cover 1, and the battery compartment 2 is connected with a power circuit on the main board 3. It should be noted that the battery compartment 2 and the cover 1 are provided for supplying power to the power circuit on the main board 3, and other structures may be selected for supplying power, for example: an external power line is arranged at the second end part of the pen-shaped shell and is connected with a power circuit on the main board 3.

Preferably, the camera 4 is further provided with an LED light supplementing lamp 8 for supplementing light to the camera 4.

Preferably, the sound generating component 7 is a speaker. It should be noted that the sound emitting component 7 is a sound emitting device in the prior art, and is intended to perform the function of sound emission.

Preferably, the camera 4 is implemented with an OV7620 CMOS image sensor, and the main control circuit is implemented with an Argus3 microprocessor chip. As shown in FIG. 3, the Argus3 chip embeds an ARM9TDMI core and integrates a cache, a dedicated RAM and a variety of application interfaces; it supports formats such as SPAM and FLASH and provides a video processing engine and an image processor.

Preferably, a protective sleeve movably connected with the pen-shaped shell is arranged outside the image scanning component, and the shape of the protective sleeve is matched with the shape of the pen point so as to protect the camera 4.

A first embodiment of the score image recognition method of the present invention, as shown in fig. 4, includes,

step 101: acquiring a staff image to be processed through a camera and transmitting the staff image to a main control circuit;

step 102: the main control circuit identifies the staff images to be processed and identifies each complete note;

step 103: the main control circuit sends corresponding sound digital signals to the sound card circuit according to the recognized complete notes, and the sound card circuit converts the received sound digital signals into playable analog signals and transmits the playable analog signals to the sounding component for playing;

the master circuit identifies the staff images to be processed, as shown in fig. 5, including,

step 1021: drawing the edge information of the image by adopting an edge detection method on the staff image to be processed, and detecting the position coordinates of the staff by adopting a straight line detection method;

step 1022: carrying out note positioning segmentation on the staff image to be processed by adopting a preset note classifier to obtain the position of each complete note in the image;

step 1023: identifying the note heads obtained by segmentation by adopting a preset convolutional neural network, judging whether each note head is solid or hollow, and obtaining the position of the note head;

step 1024: and identifying each complete note according to the obtained staff line position coordinates, the relative position of each complete note, whether its note head is solid or hollow, and the position of the note head.
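As an illustration of step 1024, the following minimal Python sketch maps a note head position to a pitch name from the staff line coordinates. It is not taken from the patent: the function name `pitch_from_head`, the treble clef (bottom line = E4) and the absence of accidentals are all assumptions made for the example.

```python
# Diatonic note names in ascending order.
NAMES = "CDEFGAB"

def pitch_from_head(head_y, line_ys):
    """Map a note-head centre to a pitch name.

    Assumptions (not from the patent): treble clef, no accidentals.
    line_ys: y coordinates of the five staff lines, in pixels, y grows downward.
    head_y:  y coordinate of the note-head centre.
    """
    lines = sorted(line_ys)                   # top line first
    interval = (lines[-1] - lines[0]) / 4.0   # spacing between adjacent lines
    bottom = lines[-1]
    # Moving up by half an interval (line -> space -> line) is one diatonic step.
    steps = round((bottom - head_y) / (interval / 2.0))
    index = NAMES.index("E") + steps          # bottom line of a treble staff is E4
    octave = 4 + index // 7
    return f"{NAMES[index % 7]}{octave}"
```

With staff lines at y = 10, 20, 30, 40, 50, a note head centred on the bottom line maps to E4 and one centred on the top line maps to F5. Whether the head is solid or hollow would then select the duration, which this sketch leaves out.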

A second embodiment of the music score image recognition method of the present invention, as shown in fig. 6, is different from the first embodiment of the recognition method in that the main control circuit recognizes the staff image to be processed, including,

step 2021: denoising, contrast enhancement and graying the obtained staff image, and reducing noise or uneven illumination to obtain a binary image;

step 2022: drawing the edge information of the image by applying a single-sided edge detection method to the obtained binary image, and detecting the staff line position coordinates by adopting a Hough straight line detection method;

step 2023: carrying out note positioning segmentation on the obtained binary image by adopting a preset note classifier to obtain the position of each complete note in the image;

step 2024: identifying the note heads obtained by segmentation by adopting a preset convolutional neural network, judging whether each note head is solid or hollow, and obtaining the position of the note head;

step 2025: and identifying each complete note according to the obtained staff line position coordinates, the relative position of each complete note, whether its note head is solid or hollow, and the position of the note head.

Other steps in the second embodiment of the music score image recognition method of the present invention may refer to the first embodiment, and will not be described herein.
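The preprocessing of step 2021 can be sketched as follows in plain Python. The patent does not specify the graying weights, the contrast enhancement or the threshold, so the BT.601 luminance weights, the linear contrast stretch and the fixed threshold of 128 below are assumptions, and denoising is left out for brevity.

```python
def to_gray(rgb_image):
    """Convert rows of (r, g, b) tuples to 0-255 grey levels (BT.601 weights, an assumption)."""
    return [[int(round(0.299 * r + 0.587 * g + 0.114 * b)) for r, g, b in row]
            for row in rgb_image]

def stretch_contrast(gray):
    """Linearly stretch grey levels so the darkest pixel becomes 0 and the brightest 255."""
    lo = min(min(row) for row in gray)
    hi = max(max(row) for row in gray)
    if hi == lo:                      # flat image: nothing to stretch
        return [[0 for _ in row] for row in gray]
    return [[(v - lo) * 255 // (hi - lo) for v in row] for row in gray]

def binarize(gray, threshold=128):
    """Return 1 for ink (dark) pixels and 0 for paper (bright) pixels."""
    return [[1 if v < threshold else 0 for v in row] for row in gray]

# A 3x3 image with one dark horizontal line on grey paper.
image = [[(200, 200, 200)] * 3, [(50, 50, 50)] * 3, [(200, 200, 200)] * 3]
binary = binarize(stretch_contrast(to_gray(image)))
```

Stretching before thresholding is one simple way to reduce the effect of a dim or washed-out capture; a locally adaptive threshold would handle uneven illumination better, at the cost of more code.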

Preferably, the single edge detection method of step 2022 in the second embodiment of the identification method of the present invention includes:

a) Sobel operators are used to calculate the gradient values in the horizontal direction and the vertical direction respectively:

Horizontal gradient: $s_x = (a_2 + 2a_3 + a_4) - (a_0 + 2a_7 + a_6)$

Vertical gradient: $s_y = (a_0 + 2a_1 + a_2) - (a_6 + 2a_5 + a_4)$

Amplitude value:

Sobel template:

wherein $a_0$ to $a_7$ denote the 8 neighborhood pixels;

b) Non-maximum suppression is applied to the gradient values in the horizontal and vertical directions, that is, only the maximum point on the gradient line in each direction is retained and the values of all other points are set to 0;

c) An adaptive threshold method is used to obtain the threshold for each region; the threshold serves as the condition for deciding whether edges are connected, and the edge information of the image is drawn.
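The gradient computation of step a) can be sketched in Python as below. The neighborhood layout (a0, a1, a2 across the top row, a3 at the right, a4, a5, a6 across the bottom row from right to left, a7 at the left) is inferred from the two gradient formulas, and the Euclidean magnitude is an assumption, since the amplitude formula is not reproduced in this text.

```python
import math

def sobel(gray, y, x):
    """Return (s_x, s_y, magnitude) at interior pixel (y, x).

    Assumed neighbourhood layout, inferred from the gradient formulas:
        a0 a1 a2
        a7  p a3
        a6 a5 a4
    """
    a0, a1, a2 = gray[y - 1][x - 1], gray[y - 1][x], gray[y - 1][x + 1]
    a7, a3 = gray[y][x - 1], gray[y][x + 1]
    a6, a5, a4 = gray[y + 1][x - 1], gray[y + 1][x], gray[y + 1][x + 1]
    s_x = (a2 + 2 * a3 + a4) - (a0 + 2 * a7 + a6)   # right column minus left column
    s_y = (a0 + 2 * a1 + a2) - (a6 + 2 * a5 + a4)   # top row minus bottom row
    # Euclidean magnitude: an assumption, the source formula is missing.
    return s_x, s_y, math.hypot(s_x, s_y)

# Bright paper above a dark staff line: the vertical gradient is positive.
patch = [[255, 255, 255], [255, 255, 255], [0, 0, 0]]
print(sobel(patch, 1, 1))   # (0, 1020, 1020.0)
```

Note that with this sign convention the upper edge of a dark line gives a positive $s_y$ and the lower edge a negative one, which is what makes it possible to keep only one side of each staff line.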

To better illustrate the beneficial effects of the single-sided edge detection method, the conventional Canny edge detection method is compared below with the single-sided edge detection method adopted by the invention:

1) The traditional Canny edge detection method comprises the following steps:

a) The first-order partial derivatives of each pixel in the image are computed, and the gradient direction and amplitude are calculated, giving the amplitude of each point in different directions; different operator templates, such as the Roberts operator and the Prewitt operator, may be used in this process;

b) Non-extremum suppression is applied to the gradient amplitude. The larger an element of the gradient amplitude matrix, the larger the gradient at that point in the image, but this alone is not enough to mark the point as an edge point. The extremum among the pixels along a straight line is therefore found, and the gray value of every non-extremum point is set to 0, which removes most non-edge points;

c) Edges are detected and connected with a double-threshold algorithm. Two thresholds are selected and an edge image is obtained from the high threshold. The edges in the high-threshold image are linked into contours; when the end point of a contour is reached, the 8-neighborhood of the break point is searched for a point that satisfies the low threshold, and new edges are collected from that point until the edges of the whole image are closed, forming the complete edge image.

2) The single-side edge detection method adopted by the invention comprises the following steps:

a) The template operator commonly used in the original Canny algorithm is replaced by the Sobel operator ($a_0$ to $a_7$ denote the 8 neighborhood pixels), and the gradient values in the horizontal direction and the vertical direction are calculated respectively;

Amplitude value:

Sobel template:

b) The gradient values in each direction are likewise suppressed, but because a single-sided edge of each straight line is required, the suppression method is changed: the non-extremum suppression of the original method is replaced by non-maximum suppression, that is, only the maximum point on the gradient line in each direction is retained and the values of all other points are set to 0. As shown in FIG. 7, a 3×3 area is used as the comparison block, the central pixel is compared with the neighbor pairs (1, 5), (2, 6), (3, 7) and (4, 8) respectively, and non-maximum points are set to 0;

c) An adaptive threshold method is used to obtain the threshold for each region, and the threshold serves as the condition for deciding whether edges are connected.

It should be noted that the adaptive threshold method is a common method in the prior art.
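The difference between non-extremum and non-maximum suppression can be illustrated with a short Python sketch. This is one possible reading of step b), not the patent's implementation: it works on a signed vertical gradient map only (the direction relevant to horizontal staff lines) and compares each pixel with the single neighbor pair above and below it, rather than with all four pairs of FIG. 7. The name `suppress_non_maxima` is hypothetical.

```python
def suppress_non_maxima(grad):
    """Keep only positive local maxima of a signed gradient map, column by column.

    Simplification (an assumption): only the vertical neighbour pair is compared.
    Because the sign is kept, the negative response of the opposite edge
    is discarded instead of being treated as a second extremum.
    """
    h, w = len(grad), len(grad[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(w):
            g = grad[y][x]
            if g > 0 and g >= grad[y - 1][x] and g >= grad[y + 1][x]:
                out[y][x] = g
    return out

# One column crossing a staff line: a positive peak at the upper edge
# and a negative peak at the lower edge.
column = [[0], [510], [1020], [510], [0], [-510], [-1020], [-510], [0]]
single_edge = [row[0] for row in suppress_non_maxima(column)]
```

In this example only the upper edge survives, whereas a suppression based on the absolute value of the gradient would keep both peaks and produce the double edge described for the conventional method.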

The comparison shows that with the traditional Canny method each staff line yields a double edge, which degrades the positioning result, whereas the invention uses non-maximum suppression to retain only the single-sided gradient extremum and adds the adaptive threshold condition, so that each staff line is better represented by a single edge.

It should be noted that the Hough straight line detection method in step 2022 is a conventional straight line detection method in the prior art, which can detect the position coordinates of the staff lines from the obtained edge information of the image; FIG. 8 shows the staff positioning result in this embodiment.

Preferably, the training process of the note classifier in step 2023 in the second embodiment of the identification method of the present invention, as shown in fig. 9, includes:

step 301: establishing a positive sample data set and a negative sample data set, wherein the positive sample data set contains image data of complete notes and the negative sample data set contains image data of score content other than complete notes; as shown in FIG. 10, each data set includes the image data of the staff image within the positioning frames;

step 302: the channel characteristics of each sample in the positive sample data set and the negative sample data set are extracted, and a note classifier is trained.

It should be noted that the negative examples herein may be incomplete note images, staff images, score background images, etc., but are not limited to the listed above.

Preferably, the channel characteristics of each sample include gray scale and color, linear filtering, nonlinear transformation, point-by-point transformation, and gradient histogram. It should be noted that these 5 channel characteristics are the integral channel features of the prior art, and are defined and explained as follows:

gray scale and color: gray scale is the simplest channel, and the three channels of the LUV color space are also commonly used;

linear filtering: channels are obtained by linear transformation, for example by convolving the image with Gabor filters in different directions; each resulting channel contains edge information in a different direction, so that texture information of the image in different dimensions is obtained;

nonlinear transformation: the gradient amplitude of the image is calculated to capture edge strength information; edge gradient information is captured, where the gradient comprises edge strength and edge direction, and for a color image the gradient is calculated in each of the 3 channels and the maximum response of the 3 gradients at each position is taken as the final output; the image is binarized, using two different thresholds respectively;

point-by-point transformation: any pixel in a channel may be transformed by an arbitrary function as a post-processing step. A local product can be obtained through the log operation, since $\exp\left(\sum_i \log x_i\right) = \prod_i x_i$; similarly, raising each pixel to the power p can be used to compute the generalized mean;

gradient histogram: a weighted histogram whose bin index is calculated from the direction of the gradient and whose weight is calculated from the magnitude of the gradient, i.e. the channel is calculated as $Q_\theta(x, y) = G(x, y) \cdot \mathbf{1}[\Theta(x, y) = \theta]$, where $G(x, y)$ and $\Theta(x, y)$ denote the gradient magnitude and the quantized gradient direction of the image respectively. By also blurring the image at different scales, gradient information at different scales can be calculated. Furthermore, the calculated histogram is normalized using the gradient magnitude information, in a way similar to the HOG feature.
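The gradient histogram channel defined above can be sketched as follows. The sketch assumes unsigned orientations in [0, π), a Euclidean gradient magnitude and a hypothetical function name `gradient_histogram_channels`; blurring and normalization are omitted.

```python
import math

def gradient_histogram_channels(s_x, s_y, n_bins=4):
    """Build n_bins channels with Q_theta(x, y) = G(x, y) * 1[Theta(x, y) == theta].

    s_x, s_y: 2-D lists holding the horizontal and vertical gradients.
    Each pixel writes its gradient magnitude into exactly one channel,
    the one matching its quantised (unsigned) gradient direction.
    """
    h, w = len(s_x), len(s_x[0])
    channels = [[[0.0] * w for _ in range(h)] for _ in range(n_bins)]
    for y in range(h):
        for x in range(w):
            g = math.hypot(s_x[y][x], s_y[y][x])                 # G(x, y)
            angle = math.atan2(s_y[y][x], s_x[y][x]) % math.pi   # fold into [0, pi)
            theta = min(int(angle / math.pi * n_bins), n_bins - 1)  # Theta(x, y)
            channels[theta][y][x] = g
    return channels

# Two pixels: one gradient near 18 degrees, one near 108 degrees.
q = gradient_histogram_channels([[3, -1]], [[1, 3]], n_bins=4)
```

Summing each channel over a rectangular region then gives one histogram bin for that region, which is why such channels pair well with integral images.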

Preferably, the positioning frame is a rectangular block positioning frame, the size of the positioning frame is determined according to the interval between five lines, and the height and width of the positioning frame are calculated according to the formula respectively:

height = 5 * interval; width = 2.5 * interval.
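As a worked example of this formula, the following Python sketch derives the frame size from detected staff line coordinates. The helper name `frame_size` and the averaging of the spacing over the five lines are assumptions.

```python
def frame_size(line_ys):
    """Return (height, width) of the positioning frame.

    line_ys: y coordinates of the five staff lines in pixels.
    The interval is taken as the mean spacing between adjacent lines.
    """
    lines = sorted(line_ys)
    interval = (lines[-1] - lines[0]) / (len(lines) - 1)
    return 5 * interval, 2.5 * interval

print(frame_size([10, 20, 30, 40, 50]))   # (50.0, 25.0)
```

Tying the frame to the line spacing makes the detector independent of the scan resolution: a larger rendering of the same score simply produces proportionally larger frames.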

Preferably, the note positioning segmentation of the staff image to be processed in step 2023 of the second embodiment of the recognition method, as shown in FIG. 11, includes:

randomly selecting a plurality of candidate positioning frames on the binary image to be recognized, scanning the positioning frames one by one, extracting the channel characteristics from the image in each positioning frame, inputting the extracted channel characteristics into the note classifier, and judging whether the image in the positioning frame is a positive sample or a negative sample, wherein a positive sample is a complete note in the music score and a negative sample is score background, which is discarded; the complete notes in the binary image to be recognized are thereby obtained, and the position of each complete note in the image is obtained by comparing the position data of the positioning frames in the note classifier, as shown in FIG. 12.

In this embodiment, 2000 candidate positioning frames are randomly selected.
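The candidate-frame scan can be sketched as below. The `classifier` argument is a hypothetical stand-in for the combination of channel feature extraction and the trained note classifier, which the sketch does not implement; the fixed seed is only there to make the example repeatable.

```python
import random

def locate_notes(image, classifier, frame_h, frame_w, n_candidates=2000, seed=0):
    """Sample random positioning frames and keep those the classifier accepts.

    image:      2-D list (binary image).
    classifier: callable taking a patch (2-D list) and returning True for a
                complete note; a placeholder for features + note classifier.
    Returns the (top, left) corners of the accepted frames.
    """
    rng = random.Random(seed)
    h, w = len(image), len(image[0])
    hits = []
    for _ in range(n_candidates):
        top = rng.randint(0, h - frame_h)      # randint is inclusive at both ends
        left = rng.randint(0, w - frame_w)
        patch = [row[left:left + frame_w] for row in image[top:top + frame_h]]
        if classifier(patch):                  # positive sample: keep its position
            hits.append((top, left))
    return hits                                # negatives are simply discarded

# Toy example: the "note" is a 2x2 block of ink in a 10x10 blank image.
img = [[0] * 10 for _ in range(10)]
for yy in (4, 5):
    for xx in (4, 5):
        img[yy][xx] = 1
found = locate_notes(img, lambda p: all(all(r) for r in p), 2, 2)
```

Because frames are sampled at random, the same note is usually hit several times; a real pipeline would merge overlapping detections before passing each note to the note head network.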

Preferably, the training process of the convolutional neural network in step 2024 in the second embodiment of the identification method of the present invention, as shown in fig. 13, includes,

step 401: establishing a note head data set which comprises data of three classes: solid note heads, hollow note heads and background;

step 402: as shown in FIG. 14, a convolutional neural network is constructed, which comprises 2 convolutional layers, 2 downsampling layers and 1 fully-connected layer;

step 403: and inputting the note head image data in the note head data set into the convolutional neural network to complete training.

The note head data set in this embodiment includes 2000 solid note heads, 1500 hollow note heads and 4000 background images.
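To make the layer structure of step 402 concrete, the following Python sketch walks the feature map sizes through 2 convolutional layers, 2 downsampling layers and 1 fully-connected layer with 3 outputs. The 28×28 input, 5×5 kernels, 2×2 pooling and channel counts (6, 16) are LeNet-style values assumed for illustration; the patent does not state them.

```python
def conv_out(size, kernel, stride=1):
    """Output side length of a valid (unpadded) convolution or pooling window."""
    return (size - kernel) // stride + 1

def head_net_shapes(input_size=28, kernel=5, pool=2, channels=(6, 16), n_classes=3):
    """List the layer output shapes of a conv-pool-conv-pool-fc network.

    All hyperparameters are illustrative assumptions; only the layer counts
    and the three output classes come from the description.
    """
    shapes = []
    size = input_size
    for c in channels:
        size = conv_out(size, kernel)             # convolutional layer
        shapes.append(("conv", c, size, size))
        size = conv_out(size, pool, stride=pool)  # downsampling layer
        shapes.append(("pool", c, size, size))
    # Fully-connected layer: flattened features -> solid / hollow / background.
    shapes.append(("fc", channels[-1] * size * size, n_classes))
    return shapes

for layer in head_net_shapes():
    print(layer)
```

Under these assumed values the network flattens to 256 features before the final layer, which is consistent with the description's remark that the model has few parameters.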

This embodiment implements the convolutional neural network with the Caffe framework, a clear, readable and fast deep learning framework. The model has a simple structure and few parameters, so note recognition in many environments (laptops, mobile phones and the like) only requires implementing a simple forward network of convolution and fully-connected layers, without additionally configuring the Caffe framework, which is very convenient.

Preferably, the recognition method of the present invention in the second embodiment uses a convolutional neural network to recognize the note headers obtained by segmentation, as shown in fig. 15, including,

In practical application, playable electronic music score can be generated according to the identified note information for playing.

Note identification using the second embodiment was tested on the CPU of a Samsung Galaxy S3: the note identification speed reached 500 fps, and the accuracy was 98.71%.

It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

In this specification, the embodiments are described in a related manner: for parts that are identical or similar between embodiments, reference may be made from one embodiment to another, and each embodiment mainly describes its differences from the other embodiments.

The foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention are included in the protection scope of the present invention.

Claims

1. A music score image recognition method for an electronic device for music score image recognition, characterized by comprising,

the main control circuit identifies the staff image to be processed, including,

carrying out note positioning segmentation on the staff image to be processed by adopting a preset note classifier to obtain the position of each complete note in the image, wherein the training process of the note classifier comprises the following steps: establishing a positive sample data set and a negative sample data set, wherein the positive sample data set comprises position data of a positioning frame and image data of the staff image in the positioning frame, the positive sample data set comprises image data of complete notes, and the negative sample data set comprises image data of the parts of the music score other than the complete notes; extracting channel characteristics of each sample in the positive sample data set and the negative sample data set, and training the note classifier; and wherein carrying out note positioning segmentation on the staff image to be processed comprises: randomly selecting a plurality of candidate positioning frames on the staff image to be processed, scanning the positioning frames one by one, extracting channel characteristics from the image in each positioning frame, inputting the extracted channel characteristics into the note classifier, judging whether the image in the positioning frame is a positive sample or a negative sample, judging that a positive sample is a complete note in the music score, judging that a negative sample is the background of the music score, discarding the background of the music score, thereby obtaining the complete notes in the staff image to be processed, and obtaining the position of each complete note in the image by comparing the position data of the positioning frames in the note classifier;

identifying each complete note according to the obtained staff line position coordinates, the position of each complete note in the image, whether the note head of the complete note is solid or hollow, and the position of the note head;

the electronic equipment for music score image recognition comprises a shell, a sounding component, a main board arranged in the shell and an image scanning component arranged at the first end part of the shell;

the main board is provided with a main control circuit and a sound card circuit electrically connected with the main control circuit;

the image scanning component comprises the camera, and the camera sends the shot music score image to the main control circuit for processing;

the sound generating component is connected with the sound card circuit and generates sound according to the sound signal sent by the main control circuit.

2. The score image recognition method of claim 1, wherein the training process of the convolutional neural network comprises,

3. The method of recognizing a score image as claimed in claim 2, wherein the recognizing of the note heads obtained by division using the convolutional neural network comprises,

4. The music score image recognition method of claim 1, wherein the staff image to be processed is specifically obtained by: denoising, contrast-enhancing and graying the staff image to reduce noise and uneven illumination, thereby obtaining a binary image.

5. The music score image recognition method of claim 1, wherein a power circuit electrically connected to the main control circuit is further provided on the main board;

the image scanning component further comprises a scanning roller, and the scanning roller and the camera are electrically connected with the main control circuit;

6. The musical score image recognition method as claimed in claim 1, wherein the housing is a pen-shaped housing; the image scanning component is arranged at the first end part of the pen-shaped shell;

7. The music score image recognition method of claim 6, wherein the second end of the pen-shaped housing is provided with a battery compartment and a hatch cover, the battery compartment being connected to a power circuit on the main board.

8. The music score image recognition method of claim 6, wherein the second end of the pen-shaped case is provided with an external power line connected to a power circuit on the main board.