CN113450623A

CN113450623A - Singing training system

Info

Publication number: CN113450623A
Application number: CN202110609060.3A
Authority: CN
Inventors: 黄军; 沈梓力; 赵杰
Original assignee: Zhejiang Industry and Trade Vocational College
Current assignee: Zhejiang Industry and Trade Vocational College
Priority date: 2021-06-01
Filing date: 2021-06-01
Publication date: 2021-09-28

Abstract

The invention discloses a singing training system comprising a singing information acquisition module, an MCU module and a human-computer interaction module. The singing information acquisition module acquires audio and video information while a user sings and transmits it to the MCU module. The MCU module processes the information acquired by the singing information acquisition module and sends the processing result to the human-computer interaction module. The human-computer interaction module allows the user to input operation instructions and receives feedback information from the MCU module. Because the MCU module processes the audio and video information of the singing process to form guidance opinions, and the human-computer interaction module displays those guidance opinions together with the audio and video information of the singing process, the user can clearly see the problems in each sung segment, and a better training effect is achieved.

Description

Singing training system

Technical Field

The invention relates to the technical field of training systems, in particular to a singing training system.

Background

In the prior art, some people record their singing practice as a video and look for problems by watching it, but finer details are difficult to find through subjective evaluation alone. Others sing through application products such as 'singing bar' and 'national karaoke' and judge their singing by the score given; however, these products do not clearly indicate the specific problems that occur during singing. It is therefore necessary to develop a feasible and effective method for the scientific and objective evaluation of singing, so as to provide a training direction for improving singing technique.

Therefore, a new technical solution is urgently needed to solve the above problems.

Disclosure of Invention

In view of the above, the present invention provides a singing training system to solve the above technical problems.

In order to achieve the purpose, the invention provides the following technical scheme:

a singing training system comprises a singing information acquisition module, an MCU module and a man-machine interaction module.

In the above scheme, the singing information acquisition module is configured to acquire audio and video information during a singing process of a user and send the audio and video information to the MCU module.

In the above scheme, the MCU module is configured to process the information acquired by the singing information acquisition module, and send an information processing result to the human-computer interaction module.

In the above scheme, the human-computer interaction module is used for a user to input an operation instruction and receive feedback information of the MCU module.

In the above scheme, the singing information acquisition module includes an audio acquisition unit, which includes a microphone, a power button, an infrared receiver, an amplifier, an adaptive filter, a bracket, an infrared distance measuring sensor, an adjusting button and a driving motor. The microphone is used for acquiring sound signals. The power button and the infrared receiver are both disposed on the microphone; the power button is used for turning the microphone on and off, and the infrared receiver is used for turning the microphone on and off according to infrared signals received from an infrared remote controller. The amplifier is connected to the microphone and amplifies the sound signals acquired by the microphone. The adaptive filter is connected to the amplifier and filters the sound signals amplified by the amplifier. The bracket is connected to the microphone and supports it. The infrared distance measuring sensor and the adjusting button are both disposed on the bracket; the infrared distance measuring sensor detects the distance between the user and the bracket, and the adjusting button allows the user to manually adjust the height, angle and horizontal extension of the bracket. The driving motor is connected to the bracket and drives its adjustment; the driving motor includes a height adjustment driving motor, an angle adjustment driving motor and a horizontal telescopic driving motor.

In the above scheme, the singing information acquisition module further includes a video acquisition unit, which includes a panoramic camera, a close-up camera, a zoom driving motor, a pan-tilt and a pan-tilt driving motor. The panoramic camera is used for acquiring panoramic information of the user's singing process, and the close-up camera is used for capturing close-ups of the user's face and actions during singing. The zoom driving motor is connected to the close-up camera and drives it to zoom. The panoramic camera and the close-up camera are both mounted on the pan-tilt, which includes a panoramic camera support and a close-up camera support. The pan-tilt driving motor is connected to the pan-tilt and drives it to change position; the pan-tilt driving motor includes a rotary stepping motor, a vertical stepping motor and a horizontal stepping motor.

In the above scheme, the MCU module includes a database unit, a data processing unit, and a singing skill training unit, the database unit is configured to store a model singing skill database, a singing image database, preset values of various parameters, and information acquired by the singing information acquisition module, the model singing skill database includes a reference music score, the data processing unit is connected to the database unit, the data processing unit is configured to process the information acquired by the singing information acquisition module, the singing skill training unit is connected to the data processing unit, and the singing skill training unit is configured to form a guidance suggestion according to a processing result of the data processing unit.

In the above solution, the data processing unit includes an audio processing module, the audio processing module includes a keyword recognition unit and a sound source positioning unit, the keyword recognition unit is configured to convert voice information input by a user into text information, compare the text information with keywords stored in the database unit, and output a comparison result; the sound source positioning unit is connected with the keyword identification unit and is used for acquiring sound source position information through an iterative optimization algorithm according to the comparison result of the keyword identification unit.

In the above scheme, the audio processing module further includes a pitch extraction unit, an alignment unit, a tone setting unit and a note boundary optimization unit. The pitch extraction unit includes a singing pitch extraction module and a reference score pitch extraction module; the singing pitch extraction module is used for extracting a singing pitch sequence through the YIN algorithm, and the reference score pitch extraction module is used for obtaining a reference score pitch sequence according to the absolute time positions and absolute pitches of the notes provided by the reference score. The alignment unit is connected to the pitch extraction unit and is used for constructing a similarity matrix from the singing pitch sequence and the reference score pitch sequence and obtaining the approximate positions of the reference score notes in the singing voice through a dynamic programming algorithm. The tone setting unit is connected to the alignment unit and is used for regenerating the pitch of the reference score; taking the total cost of the optimal matching path as the measure of the difference between the regenerated reference score and the singing, it selects the semitone with the minimum cost as the main tone (tonic) of the regenerated reference score. The note boundary optimization unit is connected to the tone setting unit and detects the start and end positions of sung initial consonants through a consonant boundary discrimination algorithm.

In the above scheme, the data processing unit further includes a video processing module, which includes a noise reduction unit, a correction unit, a video synthesis module and a close-up image recognition unit. The noise reduction unit is configured to perform noise reduction on the singing video information collected by the singing information acquisition module through a Beta algorithm. The correction unit is connected to the noise reduction unit and is configured to correct, through a computer vision algorithm, the singing video information processed by the noise reduction unit. The video synthesis module and the close-up image recognition unit are both connected to the correction unit. The video synthesis module is configured to fuse the panoramic singing video information processed by the correction unit with a simulated scene selected by the user. The close-up image recognition unit is configured to classify the singing emotion according to the singer's expression in the close-up singing image information and to judge the degree of standardization of the singing action according to the singer's action.

In the above scheme, the singing skill training unit includes a singing guidance module, an expression guidance module and an action guidance module. The singing guidance module is configured to obtain the pitch deviation and rhythm accuracy of different singing segments according to the audio processing result of the data processing unit, and to mark any singing segment whose pitch deviation is higher than a preset pitch deviation value or whose rhythm accuracy is higher than a preset rhythm accuracy value. The expression guidance module is used for judging whether the singing emotion classification result of the data processing unit is consistent with the preset singing emotion and outputting the judgment result. The action guidance module is used for marking non-standard actions according to the singing action standard degree determined by the data processing unit.

In the above scheme, the human-computer interaction module includes a touch display screen, a speaker and a key, the touch display screen is used for a user to set parameters and input an operation instruction through touch operation, and is used for displaying information acquired by the singing information acquisition module and information sent by the MCU module, the speaker and the key are both connected to the touch display screen, the speaker is used for playing audio information according to the audio playing instruction information sent by the touch display screen, the key includes a page turning key, a brightness adjustment key and a sound adjustment key, the page turning key is used for performing a page turning operation on content displayed on the touch display screen, the brightness adjustment key is used for adjusting screen brightness of the touch display screen, and the sound adjustment key is used for adjusting playing volume of the speaker.

In the scheme, the human-computer interaction module further comprises a light supplement lamp, the light supplement lamp is connected with the touch display screen, and the light supplement lamp is used for supplementing light to the position of the singer according to the light supplement lamp adjusting instruction information sent by the touch display screen and the sound source position information sent by the MCU module.

In conclusion, the beneficial effects of the invention are as follows: the singing information acquisition module is used for acquiring audio and video information in the singing process of a user, the MCU module is used for processing the audio and video information in the singing process of the user to form guide opinions, and the man-machine interaction module is used for displaying the guide opinions and the audio and video information in the singing process, so that the user can clearly know the problems existing in singing each segment, and a better training effect is achieved.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the invention and, together with the description, serve to explain the invention and not to limit the invention.

FIG. 1 is a schematic diagram of the composition of the singing training system of the present invention.

Fig. 2 is a schematic diagram of the composition of the audio acquisition unit.

Fig. 3 is a schematic diagram of the composition of the video capture unit.

Fig. 4 is a schematic diagram of the MCU module.

Fig. 5 is a schematic diagram of the composition of the data processing unit.

Fig. 6 is a schematic diagram of the composition of the singing skill training unit.

FIG. 7 is a schematic diagram of the components of the human interaction module.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the following embodiments and accompanying drawings. The exemplary embodiments and descriptions of the present invention are provided to explain the present invention, but not to limit the present invention.

As shown in fig. 1, a singing training system of the present invention includes a singing information obtaining module, an MCU module, and a human-computer interaction module.

The connection relationship between the above modules of the present invention will be further described in detail with reference to the accompanying drawings.

The singing information acquisition module is used for acquiring audio and video information in the singing process of a user and sending the audio and video information to the MCU module; the MCU module is used for processing the information acquired by the singing information acquisition module and sending an information processing result to the man-machine interaction module; the human-computer interaction module is used for inputting an operation instruction by a user and receiving feedback information of the MCU module.

As shown in fig. 2, the singing information acquisition module includes an audio acquisition unit, which includes a microphone, a power button, an infrared receiver, an amplifier, an adaptive filter, a bracket, an infrared distance measuring sensor, an adjusting button and a driving motor. The microphone is used for acquiring sound signals. The power button and the infrared receiver are both disposed on the microphone; the power button is used for turning the microphone on and off, and the infrared receiver is used for turning the microphone on and off according to infrared signals received from an infrared remote controller. The amplifier is connected to the microphone and amplifies the sound signals acquired by the microphone. The adaptive filter is connected to the amplifier and filters the sound signals amplified by the amplifier. The bracket is connected to the microphone and supports it. The infrared distance measuring sensor and the adjusting button are both disposed on the bracket; the infrared distance measuring sensor detects the distance between the user and the bracket, and the adjusting button allows the user to manually adjust the height, angle and horizontal extension of the bracket. The driving motor is connected to the bracket and drives its adjustment; the driving motor includes a height adjustment driving motor, an angle adjustment driving motor and a horizontal telescopic driving motor.

In this embodiment, the user can remotely turn the microphone on and off through an infrared remote controller, and can manually adjust the height, angle and horizontal extension of the bracket through the adjusting button. The adjusting button comprises a height adjusting button, an angle adjusting button and a horizontal extension adjusting button. Each time the user presses an adjusting button, the height, angle or horizontal extension of the bracket changes by one unit amount; the height unit amount, the angle unit amount and the horizontal extension unit amount can each be set through the human-computer interaction module.
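As a non-limiting illustration of the one-unit-per-press behaviour described above, the following minimal Python sketch changes one bracket quantity by one unit amount per button press. The names `BracketState`, `UNIT_AMOUNTS` and `press`, the field names, and all numeric values are hypothetical and are not taken from the embodiment.

```python
from dataclasses import dataclass

# Hypothetical unit amounts; in the described system these would be set
# through the human-computer interaction module.
UNIT_AMOUNTS = {"height_mm": 10.0, "angle_deg": 2.0, "extension_mm": 5.0}


@dataclass
class BracketState:
    """Assumed representation of the current bracket position."""
    height_mm: float = 1200.0
    angle_deg: float = 0.0
    extension_mm: float = 0.0


def press(state: BracketState, quantity: str, direction: int = 1) -> BracketState:
    """Apply one button press: change one quantity by exactly one unit amount."""
    if quantity not in UNIT_AMOUNTS:
        raise ValueError(f"unknown quantity: {quantity}")
    if direction not in (1, -1):
        raise ValueError("direction must be +1 or -1")
    new_value = getattr(state, quantity) + direction * UNIT_AMOUNTS[quantity]
    setattr(state, quantity, new_value)
    return state
```

For example, two presses of the height adjusting button followed by one downward press of the angle adjusting button would be `press(s, "height_mm")` twice and `press(s, "angle_deg", -1)` once.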

As shown in fig. 3, the singing information acquisition module further includes a video acquisition unit, which includes a panoramic camera, a close-up camera, a zoom driving motor, a pan-tilt and a pan-tilt driving motor. The panoramic camera is used for acquiring panoramic information of the user's singing process, and the close-up camera is used for capturing close-ups of the user's face and actions during singing. The zoom driving motor is connected to the close-up camera and drives it to zoom. The panoramic camera and the close-up camera are both mounted on the pan-tilt, which includes a panoramic camera support and a close-up camera support. The pan-tilt driving motor is connected to the pan-tilt and drives it to change position; the pan-tilt driving motor includes a rotary stepping motor, a vertical stepping motor and a horizontal stepping motor.

In this embodiment, the panoramic camera and the close-up camera jointly acquire information on the user's singing process. This allows the user to observe both the overall picture and the specific details of the singing process, and also helps the MCU module process the video information of the singing process in finer detail.

As shown in fig. 4, the MCU module includes a database unit, a data processing unit, and a singing skill training unit, the database unit is configured to store a model singing skill database, a singing image database, preset values of various parameters, and information collected by the singing information obtaining module, the model singing skill database includes a reference music score, the data processing unit is connected to the database unit, the data processing unit is configured to process the information collected by the singing information obtaining module, the singing skill training unit is connected to the data processing unit, and the singing skill training unit is configured to form a guidance suggestion according to a processing result of the data processing unit.

As shown in fig. 5, the data processing unit includes an audio processing module, the audio processing module includes a keyword recognition unit and a sound source positioning unit, the keyword recognition unit is configured to convert voice information input by a user into text information, compare the text information with keywords stored in the database unit, and output a comparison result; the sound source positioning unit is connected with the keyword identification unit and is used for acquiring sound source position information through an iterative optimization algorithm according to the comparison result of the keyword identification unit.

In this embodiment, the MCU module sends a support adjustment driving signal to the height adjustment driving motor, the angle adjustment driving motor and the horizontal telescopic driving motor according to the sound source position information of the sound source positioning unit to adjust the height, angle and horizontal distance of the microphone from the singer, so that the microphone is located at the optimal position for the user to sing.
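The mapping from sound source position to bracket adjustment is not specified in detail. Purely as a sketch under stated assumptions, the Python function below derives a target height, angle and horizontal extension from a sound source position. It assumes the position is given as `(x, y, z)` in metres in a coordinate frame centred on the bracket base, and that the microphone should stop a fixed gap short of the singer; the function name `bracket_targets` and the parameter `gap_m` are hypothetical.

```python
import math


def bracket_targets(source_xyz, gap_m=0.10):
    """Return assumed bracket targets for a sound source at (x, y, z) metres.

    Assumptions (not from the embodiment): the bracket base is the origin,
    z is the vertical axis, and the microphone stays gap_m short of the source.
    """
    x, y, z = source_xyz
    horizontal = math.hypot(x, y)  # horizontal distance from base to source
    return {
        "height_m": z,  # raise the microphone to the source height
        "angle_deg": math.degrees(math.atan2(y, x)),  # rotate toward the source
        "extension_m": max(0.0, horizontal - gap_m),  # never a negative extension
    }
```

The resulting targets would then be converted into drive signals for the height adjustment, angle adjustment and horizontal telescopic driving motors; that conversion depends on the motors used and is not sketched here.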

Further, the audio processing module also includes a pitch extraction unit, an alignment unit, a tone setting unit and a note boundary optimization unit. The pitch extraction unit includes a singing pitch extraction module and a reference score pitch extraction module; the singing pitch extraction module is used for extracting a singing pitch sequence through the YIN algorithm, and the reference score pitch extraction module is used for obtaining a reference score pitch sequence according to the absolute time positions and absolute pitches of the notes provided by the reference score. The alignment unit is connected to the pitch extraction unit and is used for constructing a similarity matrix from the singing pitch sequence and the reference score pitch sequence and obtaining the approximate positions of the reference score notes in the singing voice through a dynamic programming algorithm. The tone setting unit is connected to the alignment unit and is used for regenerating the pitch of the reference score; taking the total cost of the optimal matching path as the measure of the difference between the regenerated reference score and the singing, it selects the semitone with the minimum cost as the main tone (tonic) of the regenerated reference score. The note boundary optimization unit is connected to the tone setting unit and detects the start and end positions of sung initial consonants through a consonant boundary discrimination algorithm.
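For illustration only, the frame-wise pitch extraction can be sketched in Python as follows. This is a simplified YIN-style estimator (difference function, cumulative mean normalized difference, absolute threshold) without the parabolic interpolation of the full algorithm; the 50% frame shift follows this embodiment, while the function names and the values of `fmin`, `fmax` and `threshold` are assumptions.

```python
def split_frames(signal, frame_len):
    """Split a signal into frames with a frame shift of 50% of frame_len."""
    hop = frame_len // 2
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def yin_pitch(frame, sample_rate, fmin=80.0, fmax=1000.0, threshold=0.15):
    """Return (f0_hz or None, aperiodicity) for one frame (simplified YIN)."""
    tau_min = max(1, int(sample_rate / fmax))
    tau_max = int(sample_rate / fmin)
    window = len(frame) - tau_max
    if window <= 0:
        raise ValueError("frame too short for the chosen fmin")
    # Difference function d(tau).
    diff = [0.0] * (tau_max + 1)
    for tau in range(1, tau_max + 1):
        diff[tau] = sum((frame[j] - frame[j + tau]) ** 2 for j in range(window))
    # Cumulative mean normalized difference d'(tau).
    cmnd = [1.0] * (tau_max + 1)
    running = 0.0
    for tau in range(1, tau_max + 1):
        running += diff[tau]
        cmnd[tau] = diff[tau] * tau / running if running > 0 else 1.0
    # First lag under the threshold, then descend to the local minimum.
    tau = tau_min
    while tau <= tau_max:
        if cmnd[tau] < threshold:
            while tau + 1 <= tau_max and cmnd[tau + 1] < cmnd[tau]:
                tau += 1
            return sample_rate / tau, cmnd[tau]
        tau += 1
    # No periodic lag found: treat the frame as unvoiced.
    return None, min(cmnd[tau_min:])
```

The second return value plays the role of the aperiodicity measure: a frame for which no lag falls below the threshold is returned with a pitch of `None` and can be marked as unvoiced.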

In this embodiment, the sound source positioning unit obtains sound source position information through an iterative optimization algorithm according to the comparison result of the keyword recognition unit. The pitch extraction unit includes a singing pitch extraction module and a reference score pitch extraction module. The singing pitch extraction module divides the singer's voice into frames of a preset frame length with a frame shift of 50% of that length, extracts a pitch from each frame through the YIN algorithm to obtain the singing pitch sequence, performs a preliminary unvoiced/voiced division using the aperiodicity measure output by the YIN algorithm, and marks frames with unstable pitch as unvoiced. The reference score pitch extraction module directly generates the reference score pitch sequence from the absolute time positions and absolute pitches of the notes provided by the reference score, with a step of 50% of the preset frame length, and marks the start frame of each reference score note. The alignment unit constructs a similarity matrix from the singing pitch sequence and the reference score pitch sequence, searches for the optimal matching path through a dynamic programming algorithm, and obtains the approximate position of each reference score note in the singing voice from the sequence of sound frames matched to that note on the optimal matching path. The tone setting unit, starting from the main tone position of each reference score, shifts the reference score through the 12 semitones of an octave and regenerates the pitch of the reference score; each regenerated reference score pitch sequence is aligned again with the singing pitch sequence by the alignment unit, the total cost of the optimal matching path is taken as the measure of the difference between the regenerated reference score and the singing, and the semitone with the minimum cost is selected as the main tone of the regenerated reference score.
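The alignment and tone setting steps can be illustrated by the following minimal Python sketch. It assumes that both pitch sequences are already expressed in semitones (for example MIDI note numbers) and contain only voiced frames, and it uses a local distance (the absolute pitch difference) in place of a similarity value, so that the optimal matching path is the one with the lowest accumulated cost. The function names are hypothetical.

```python
def alignment_cost(sung, score):
    """Total cost of the optimal matching path between two pitch sequences.

    Classic dynamic-programming alignment (dynamic time warping) with the
    absolute semitone difference as the local cost.
    """
    if not sung or not score:
        raise ValueError("both sequences must be non-empty")
    inf = float("inf")
    n, m = len(sung), len(score)
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(sung[i - 1] - score[j - 1])
            acc[i][j] = local + min(acc[i - 1][j],      # sung frame repeated
                                    acc[i][j - 1],      # score frame repeated
                                    acc[i - 1][j - 1])  # both advance
    return acc[n][m]


def best_transposition(sung, score):
    """Try all 12 semitone shifts of the score and keep the cheapest one."""
    costs = {shift: alignment_cost(sung, [p + shift for p in score])
             for shift in range(12)}
    best = min(costs, key=costs.get)
    return best, costs[best]
```

For instance, with `score = [60, 60, 62, 62, 64, 64]` and a performance sung three semitones higher with a lengthened first note, `[63, 63, 63, 65, 65, 67, 67]`, the search selects a shift of 3 with a total cost of 0. Backtracking through the accumulated cost matrix, omitted here for brevity, would yield the frames matched to each reference score note.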

Further, the data processing unit also includes a video processing module, which includes a noise reduction unit, a correction unit, a video synthesis module and a close-up image recognition unit. The noise reduction unit is used for performing noise reduction on the singing video information collected by the singing information acquisition module through a Beta algorithm. The correction unit is connected to the noise reduction unit and is used for correcting, through a computer vision algorithm, the singing video information processed by the noise reduction unit. The video synthesis module and the close-up image recognition unit are both connected to the correction unit. The video synthesis module is used for fusing the panoramic singing video information processed by the correction unit with a simulated scene selected by the user. The close-up image recognition unit is used for classifying the singing emotion according to the singer's expression in the close-up singing image information and for judging the degree of standardization of the singing action according to the singer's action.

In this embodiment, the close-up image recognition unit classifies the singing emotion, according to the singer's expression, as tense and anxious, joyful, sad and depressed, natural and calm, and the like; the close-up image recognition unit judges the degree of standardization of the singing action as standard, relatively standard, non-standard, extremely non-standard, and the like.

As shown in fig. 6, the singing skill training unit includes a singing guidance module, an expression guidance module and an action guidance module. The singing guidance module is used for obtaining the pitch deviation and rhythm accuracy of different singing segments according to the audio processing result of the data processing unit, and for marking any singing segment whose pitch deviation is higher than a preset pitch deviation value or whose rhythm accuracy is higher than a preset rhythm accuracy value. The expression guidance module is used for judging whether the singing emotion classification result of the data processing unit is consistent with the preset singing emotion and outputting the judgment result. The action guidance module is used for marking non-standard actions according to the singing action standard degree determined by the data processing unit.

As shown in fig. 7, the human-computer interaction module includes a touch display screen, a speaker and a key, the touch display screen is used for a user to set parameters and input an operation instruction through touch operation, and is used for displaying information acquired by the singing information acquisition module and information sent by the MCU module, the speaker and the key are both connected to the touch display screen, the speaker is used for playing audio information according to the audio playing instruction information sent by the touch display screen, the key includes a page turning key, a brightness adjustment key and a sound adjustment key, the page turning key is used for performing a page turning operation on content displayed on the touch display screen, the brightness adjustment key is used for adjusting screen brightness of the touch display screen, and the sound adjustment key is used for adjusting playing volume of the speaker.

In this embodiment, the user may select a simulation scene through the touch display screen, and may view the video information obtained by fusing the panoramic singing video information and the simulation scene selected by the user, so as to watch the overall singing effect.

In this embodiment, the touch display screen displays the guidance opinions formed by the singing skill training unit as follows. For the singing guidance opinions, the lyrics of singing segments whose pitch deviation is higher than the preset pitch deviation value or whose rhythm accuracy is higher than the preset rhythm accuracy value are displayed in a color different from the normal lyric color of the singing score; segments marked for pitch deviation and segments marked for rhythm accuracy are displayed in colors that also differ from each other; and when a singing segment has both a pitch deviation higher than the preset pitch deviation value and a rhythm accuracy higher than the preset rhythm accuracy value, it is marked with a fourth color different from all of the above colors. For the expression guidance opinions, the singing emotion judgment results of all images are displayed; in addition, images in which the singer's singing emotion is inconsistent with the preset singing emotion are displayed separately, together with the singer's singing emotion type, the preset singing emotion type and an image template of the preset singing emotion. For the action guidance opinions, the action standard degree judgment results of all images are displayed; in addition, images whose singing action standard degree is below standard are displayed separately, together with an image template of the standard action.
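The four-color marking rule for the singing guidance opinions can be summarized by the following minimal Python sketch. The comparisons follow the text as stated above; the color names and the function name `segment_color` are hypothetical placeholders, since the embodiment only requires that the four colors differ from one another.

```python
# Hypothetical color names keyed by (pitch marked, rhythm marked).
COLORS = {
    (False, False): "default_lyric_color",
    (True, False): "pitch_color",
    (False, True): "rhythm_color",
    (True, True): "fourth_color",
}


def segment_color(pitch_deviation, rhythm_accuracy, pitch_preset, rhythm_preset):
    """Pick the display color for the lyrics of one singing segment."""
    pitch_marked = pitch_deviation > pitch_preset
    rhythm_marked = rhythm_accuracy > rhythm_preset
    return COLORS[(pitch_marked, rhythm_marked)]
```

A segment that exceeds neither preset value keeps the normal lyric color, so only marked segments stand out on the touch display screen.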

Further, the human-computer interaction module also includes a light supplement lamp. The light supplement lamp is connected to the touch display screen and is used for supplementing light at the singer's position according to the light supplement lamp adjustment instruction information sent by the touch display screen and the sound source position information sent by the MCU module.

In this embodiment, the MCU module determines, from the sound source position information and the position of the light supplement lamp, the relative position between the center of the light supplement lamp and the vertical projection of the sound source onto the plane in which the light supplement lamp is located, and controls the brightness of the light supplement lamp according to this relative position, thereby achieving an optimal light supplement effect.
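The brightness control law is not specified. As one possible sketch under stated assumptions, the Python function below projects the sound source onto a horizontal lamp plane, measures its offset from the lamp center, and raises the brightness linearly with that offset up to full brightness. The horizontal-plane assumption, the linear law, the function name and the parameters `base` and `gain_per_m` are all hypothetical.

```python
import math


def fill_light_brightness(source_xyz, lamp_center_xyz, base=0.4, gain_per_m=0.3):
    """Return a brightness level in [0, 1] for the light supplement lamp.

    Assumes the lamp plane is horizontal, so the vertical projection of the
    sound source onto that plane simply keeps its x and y coordinates.
    """
    dx = source_xyz[0] - lamp_center_xyz[0]
    dy = source_xyz[1] - lamp_center_xyz[1]
    offset_m = math.hypot(dx, dy)  # distance of the projection from the lamp center
    return min(1.0, base + gain_per_m * offset_m)
```

With the singer directly under the lamp center the function returns the base level; the further the projection lies from the center, the brighter the lamp is driven, capped at 1.0.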

The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and various modifications and changes may be made to the embodiment of the present invention by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A singing training system, comprising: a singing information acquisition module, an MCU module and a human-computer interaction module;

the singing information acquisition module is used for acquiring audio and video information in the singing process of a user and sending the audio and video information to the MCU module;

the MCU module is used for processing the information acquired by the singing information acquisition module and sending an information processing result to the human-computer interaction module;

the human-computer interaction module is used for the user to input an operation instruction and for receiving feedback information of the MCU module.

2. The singing training system according to claim 1, wherein the singing information acquisition module comprises an audio acquisition unit, the audio acquisition unit comprises a microphone, a power button, an infrared receiver, an amplifier, an adaptive filter, a bracket, an infrared distance measuring sensor, an adjusting button and a driving motor, the microphone is used for acquiring sound signals, the power button and the infrared receiver are both arranged on the microphone, the power button is used for controlling the microphone to be turned on and off, the infrared receiver is used for controlling the microphone to be turned on and off according to received infrared signals transmitted by an infrared remote controller, the amplifier is connected with the microphone, the amplifier is used for amplifying the sound signals acquired by the microphone, the adaptive filter is connected with the amplifier, the adaptive filter is used for filtering the sound signals amplified by the amplifier, the bracket is connected with the microphone, the bracket is used for supporting the microphone, the infrared distance measuring sensor and the adjusting button are both arranged on the bracket, the infrared distance measuring sensor is used for detecting the distance between the user and the bracket, the adjusting button is used for the user to manually adjust the height, the angle and the horizontal extension amount of the bracket, the driving motor is connected with the bracket, the driving motor is used for driving the bracket to adjust, and the driving motor comprises a height adjustment driving motor, an angle adjustment driving motor and a horizontal extension driving motor.

3. The singing training system of claim 2, wherein the singing information acquisition module further comprises a video acquisition unit, the video acquisition unit comprises a panoramic camera, a close-up camera, a zoom drive motor, a pan-tilt and a pan-tilt drive motor, the panoramic camera is used for acquiring panoramic information of the singing process of the user, the close-up camera is used for taking close-ups of the face and the actions of the user during the singing process, the zoom drive motor is connected with the close-up camera, the zoom drive motor is used for driving the close-up camera to zoom, the panoramic camera and the close-up camera are both mounted on the pan-tilt, the pan-tilt comprises a panoramic camera support and a close-up camera support, the pan-tilt drive motor is connected with the pan-tilt, the pan-tilt drive motor is used for driving the pan-tilt to change position, and the pan-tilt drive motor comprises a rotary stepping motor, a vertical stepping motor and a horizontal stepping motor.

4. The singing training system of claim 1, wherein the MCU module comprises a database unit, a data processing unit, and a singing skill training unit, the database unit is configured to store a model singing skill database, a singing image database, preset values of parameters, and information collected by the singing information acquisition module, the model singing skill database includes a reference music score, the data processing unit is connected to the database unit, the data processing unit is configured to process the information collected by the singing information acquisition module, the singing skill training unit is connected to the data processing unit, and the singing skill training unit is configured to form guidance opinions according to a processing result of the data processing unit.

5. The singing training system of claim 4, wherein the data processing unit comprises an audio processing module, the audio processing module comprises a keyword recognition unit and a sound source positioning unit, the keyword recognition unit is configured to convert voice information input by a user into text information, compare the text information with keywords stored in the database unit, and output a comparison result; the sound source positioning unit is connected with the keyword recognition unit and is used for acquiring sound source position information through an iterative optimization algorithm according to the comparison result of the keyword recognition unit.

6. The singing training system of claim 5, wherein the audio processing module further comprises a pitch extraction unit, an alignment unit, a tone setting unit and a note boundary optimization unit; the pitch extraction unit comprises a singing pitch extraction module for extracting a singing pitch sequence by a YIN algorithm and a reference music score pitch extraction module for obtaining a reference music score pitch sequence from the absolute time positions and absolute pitches of the notes provided by a reference music score; the alignment unit is connected with the pitch extraction unit and is used for constructing a similarity matrix according to the singing pitch sequence and the reference music score pitch sequence and obtaining the approximate positions of the reference music score notes in the singing voice through a dynamic programming algorithm; the tone setting unit is connected with the alignment unit and is used for regenerating the pitch of the reference music score, taking the total cost of the optimal matching path as the measure of the degree of difference between the regenerated reference music score and the singing, and selecting the semitone with the minimum cost as the main key of the regenerated reference music score; the note boundary optimization unit is connected with the tone setting unit and detects the start position and the end position of the sung initial consonant through a consonant boundary discrimination algorithm.

7. The singing training system of claim 5, wherein the data processing unit further comprises a video processing module, the video processing module comprises a noise reduction unit, a correction unit, a video synthesis module and a close-up image recognition unit, the noise reduction unit is configured to perform noise reduction processing, through a Beta algorithm, on the singing video information acquired by the singing information acquisition module, the correction unit is connected with the noise reduction unit, the correction unit is configured to perform correction processing, through a computer vision algorithm, on the singing video information processed by the noise reduction unit, the video synthesis module and the close-up image recognition unit are both connected with the correction unit, the video synthesis module is configured to perform fusion processing on the panoramic singing video information processed by the correction unit and a simulated scene selected by a user, and the close-up image recognition unit is configured to classify the singing emotion according to the expression of the singer in the singing image information and to judge the singing action standard degree according to the action of the singer.

8. The singing training system of claim 5, wherein the singing skill training unit comprises a singing guidance module, an expression guidance module and an action guidance module, the singing guidance module is configured to obtain the pitch deviation and rhythm accuracy of different singing segments according to the audio processing result of the data processing unit, and to mark the singing segments whose pitch deviation is higher than a preset pitch deviation value or whose rhythm accuracy is higher than a preset rhythm accuracy value; the expression guidance module is used for judging whether the singing emotion classification result of the data processing unit is consistent with the preset singing emotion and outputting a judgment result; the action guidance module is used for marking non-standard actions according to the singing action standard degree determined by the data processing unit.

9. The singing training system of claim 1, wherein the human-computer interaction module comprises a touch display screen, a loudspeaker and keys, the touch display screen is used for a user to set parameters and input operation instructions through touch operation, and is used for displaying the information collected by the singing information acquisition module and the information sent by the MCU module, the loudspeaker and the keys are connected with the touch display screen, the loudspeaker is used for playing audio information according to the audio playing instruction information sent by the touch display screen, the keys comprise a page turning key, a brightness adjusting key and a sound adjusting key, the page turning key is used for performing page turning operation on the content displayed on the touch display screen, the brightness adjusting key is used for adjusting the screen brightness of the touch display screen, and the sound adjusting key is used for adjusting the playing volume of the loudspeaker.

10. The singing training system of claim 9, wherein the human-computer interaction module further comprises a light supplement lamp, the light supplement lamp is connected with the touch display screen, and the light supplement lamp is used for supplementing light to the position of the singer according to light supplement lamp adjustment instruction information sent by the touch display screen and sound source position information sent by the MCU module.