CN107146631B - Music identification method, note identification model establishment method, device and electronic equipment


Info

Publication number: CN107146631B (grant of application publication CN107146631A)
Application number: CN201610113604.6A
Authority: CN (China)
Original language: Chinese (zh)
Inventors: 李贝 (Li Bei), 陈伟 (Chen Wei), 姚光超 (Yao Guangchao), 唐文琦 (Tang Wenqi)
Assignee (original and current): Beijing Sogou Technology Development Co., Ltd.
Legal status: Active (granted)

Classifications

    • G10L 25/90: Pitch determination of speech signals
    • G10L 19/02: Speech or audio signal analysis-synthesis techniques for redundancy reduction using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L 25/51: Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00 - G10L 21/00, specially adapted for a particular use, for comparison or discrimination

All three classes fall under G (Physics), G10 (Musical instruments; acoustics), G10L (Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding).

Abstract

The invention relates to the field of automation, and discloses a music identification method, a note identification model establishment method, a device and electronic equipment, which are used for solving the technical problem that music identification in the prior art depends on specific hardware equipment. The method comprises the following steps: after first audio data corresponding to music are obtained, the first audio data may be matched and identified based on the M note segments contained in the first audio data and a note model, where the note model contains the correspondence between at least one group of note segments and feature information. In other words, the related information in the first audio data can be identified without any hardware modification, thereby achieving the technical effect of improving the compatibility of music identification.

Description

Music identification method, note identification model establishment method, device and electronic equipment
Technical Field
The invention relates to the field of automation, in particular to a music identification method, a note identification model establishing method, a device and electronic equipment.
Background
With the continuous development of science and technology, electronic technology has also developed rapidly, the variety of electronic products keeps growing, and people enjoy the various conveniences this development brings. Through various types of electronic equipment, people can enjoy a more comfortable life. For example, in the prior art, piano practice assistance can be provided through an APP built into an electronic device.
Piano partner training in the prior art mainly works as follows (as shown in fig. 1): for a piano equipped with special hardware, such as an electric piano or electronic organ with a MIDI output interface, the APP identifies whether the user's performance is correct by receiving key-switch information from the hardware. This partner-training mode has the following technical problems:
First, the compatibility of the scheme is poor: it is only applicable to the specific equipment supported by the APP and cannot be used with an ordinary acoustic piano or in places where only such a piano is available.
Second, existing electric pianos and electronic organs can hardly reproduce the touch and tone of a real piano, so practicing on them feels entirely different from practicing on a real piano; as a result, most parents are unwilling to let their children learn the piano on an electric piano.
Disclosure of Invention
The invention provides a music identification method, a note identification model establishing method and a note identification model establishing device, and aims to solve the technical problem that music identification in the prior art needs to depend on specific hardware equipment.
In a first aspect, an embodiment of the present invention provides a music identification method, including:
obtaining first audio data corresponding to music, wherein the first audio data comprises M note segments, and M is a positive integer;
and performing matching identification on the first audio data based on the M note segments and a note model, wherein the note model comprises at least one group of corresponding relations between note segments and characteristic information.
Optionally, the performing matching identification on the first audio data includes:
identifying a performance error in the first audio data; and/or obtaining first score information for the first audio data based on the M note segments and a note model.
Optionally, the performance error includes at least one of: a note error, a rhythm error, and an intonation error.
Optionally, the identifying the performance error in the first audio data includes:
acquiring reference score information for generating the first audio data;
determining N note segments contained in the reference music score information, wherein N is a positive integer;
and determining the segments of the M note segments which do not match with the notes in the N note segments as the note segments corresponding to the playing errors based on the note models.
Optionally, the method further includes:
determining the playing time length of each note segment in the N note segments;
after the determining, based on the note model, that the segments of the M note segments which do not match with the notes of the N note segments are the note segments corresponding to the performance errors, the method further includes:
and determining the segments of the M note segments which are matched with the notes in the N note segments but not matched with the playing duration as the note segments corresponding to the playing errors.
Optionally, the determining, based on the note model, a segment in which the notes in the M note segments are not matched with the notes in the N note segments as a note segment corresponding to the performance error includes:
searching and obtaining the characteristic information of each note segment in the N note segments from the note model, and further determining first note characteristic information corresponding to the reference music score information;
extracting and obtaining the characteristic information of each note segment in the M note segments, and further obtaining second note characteristic information;
and matching the second note characteristic information with the first note characteristic information, and determining note segments with unmatched characteristic information as the note segments corresponding to the playing errors.
Optionally, the determining, based on the note model, a segment in which the notes in the M note segments are not matched with the notes in the N note segments as a note segment corresponding to the performance error includes:
extracting and obtaining the characteristic information of each note segment in the M note segments, and further obtaining second note characteristic information;
identifying the second note characteristic information through the note model, and further obtaining first music score information corresponding to the first audio data;
and matching the first music score information with the reference music score information, and further determining note segments with different notes as note segments corresponding to the playing errors.
Optionally, the obtaining of reference score information for generating the first audio data includes:
responding to the selection operation of a user to further acquire the reference music score information; or
Scanning a music score on a paper course to obtain music score image information, and carrying out image recognition on the music score image information to further obtain reference music score information; or
And performing acoustic recognition on second audio data associated with the first audio data to determine the reference music score information.
Optionally, the obtaining the first score information of the first audio data includes:
extracting first characteristic information of a first note segment, wherein the first note segment is any one note segment in the M note segments;
determining a note segment with similarity values of the feature information and the first feature information meeting a preset similarity condition from note segments contained in the note model;
taking notes of the note segments with the similarity values meeting preset similarity conditions as notes of the first note segment;
and determining the first score information based on the notes of the first note segment or the notes of the first note segment and the playing time length of the first note segment.
Optionally, after the determining the first score information based on the notes of the first note segment and the duration of the first note segment, the method further includes:
searching and obtaining reference score information with the similarity value of the first score information being larger than a preset similarity value;
and providing the reference music score information as the recommended music score information of the first music score information to a user.
Optionally, the note model includes the correspondence between at least one group of note segments and characteristic information under a specific tone.
In a second aspect, an embodiment of the present invention provides a method for establishing a note recognition model, including:
acquiring standard voice data corresponding to each note segment;
extracting characteristic information of standard voice data of each note segment;
and establishing a note model based on the corresponding relation between the note segments and the characteristic information, wherein the note model comprises the corresponding relation between at least one group of note segments and the characteristic information.
Optionally, the note segments include: a single note consisting of a single key; and/or a plurality of notes formed by a plurality of key combinations.
Optionally, the extracting feature information of the standard speech data of each note segment includes: spectral features and/or frequency features of each note segment are extracted.
Optionally, the extracting feature information of the standard speech data of each note segment includes:
converting the standard voice data from time domain data into frequency domain data;
dividing the frequency domain data into at least one subdata;
and calculating to obtain the energy of each subdata in the at least one subdata, wherein the energy of each subdata in the at least one subdata is taken as the characteristic information of the corresponding note segment.
Optionally, the extracting feature information of the standard speech data of each note segment includes:
converting the standard voice data from time domain data into frequency domain data;
dividing the frequency domain data into at least one subdata;
calculating to obtain the energy of each subdata in the at least one subdata;
and determining the frequency of the subdata corresponding to the preset energy as the characteristic information of the corresponding note segment.
Optionally, the obtaining of the standard voice data corresponding to each note includes:
acquiring standard voice data corresponding to each note under a specific tone;
the establishing of the note model based on the corresponding relation between the note segments and the characteristic information, wherein the note model comprises the corresponding relation between at least one group of note segments and the characteristic information, and comprises the following steps:
and establishing the note model based on the corresponding relation between the note segments and the characteristic information under the specific tone, wherein the note model comprises the corresponding relation between the at least one group of note segments and the characteristic information under the specific tone.
In a third aspect, an embodiment of the present invention provides a music recognition apparatus, including:
the music playing device comprises an obtaining module, a playing module and a playing module, wherein the obtaining module is used for obtaining first audio data corresponding to music, the first audio data comprises M note segments, and M is a positive integer;
and the identification module is used for carrying out matching identification on the first audio data based on the M note segments and the note model, wherein the note model comprises at least one group of corresponding relations between the note segments and the characteristic information.
In a fourth aspect, an embodiment of the present invention provides a note identification model building apparatus, including:
the acquisition module is used for acquiring standard voice data corresponding to each note segment;
the extraction module is used for extracting the characteristic information of the standard voice data of each note segment;
and the establishing module is used for establishing a note model based on the corresponding relation between the note segments and the characteristic information, and the note model comprises the corresponding relation between at least one group of note segments and the characteristic information.
In a fifth aspect, an embodiment of the present invention provides an electronic device, including a memory, and one or more programs, where the one or more programs are stored in the memory, and configured to be executed by one or more processors, the one or more programs including instructions for:
obtaining first audio data corresponding to music, wherein the first audio data comprises M note segments, and M is a positive integer;
and performing matching identification on the first audio data based on the M note segments and a note model, wherein the note model comprises at least one group of corresponding relations between note segments and characteristic information.
In a sixth aspect, an embodiment of the present invention provides an electronic device, including a memory, and one or more programs, where the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs including instructions for:
acquiring standard voice data corresponding to each note segment;
extracting characteristic information of standard voice data of each note segment;
and establishing a note model based on the corresponding relation between the note segments and the characteristic information, wherein the note model comprises the corresponding relation between at least one group of note segments and the characteristic information.
The invention has the following beneficial effects:
in the embodiment of the present invention, after the first audio data corresponding to music are obtained, the first audio data may be matched and identified based on the M note segments contained in the first audio data and a note model, where the note model contains the correspondence between at least one group of note segments and feature information. That is, the related information in the first audio data can be identified without any hardware modification, which achieves the technical effect of improving the compatibility of music identification and also reduces its cost. In addition, since no hardware modification is required, first audio data produced in any manner can be identified without relying on an electric piano or electronic organ, which broadens the application range of the scheme.
Drawings
FIG. 1 is a schematic diagram of prior-art piano partner training;
FIG. 2 is a flowchart illustrating a music recognition method according to an embodiment of the present invention;
FIG. 3 is a flowchart illustrating a method for obtaining a note model according to an embodiment of the present invention;
fig. 4 is a flowchart of identifying a performance error in first audio data in the music identification method according to the embodiment of the present invention;
FIG. 5 is a flowchart illustrating a first manner of determining segments with unmatched notes when performing an identification of a performance error in the first audio data according to the music identification method of the embodiment of the present invention;
FIG. 6 is a flowchart illustrating a second manner of determining segments with unmatched notes when performing an identification of a performance error in the first audio data according to the music identification method of the embodiment of the present invention;
fig. 7 is a diagram illustrating an output of a recognition result after a performance error in first audio data is recognized in a music recognition method according to an embodiment of the present invention;
fig. 8 is a flowchart of determining first score information in a music recognition method according to an embodiment of the present invention;
FIG. 9 is a flow chart of a note identification model building method according to an embodiment of the present invention;
FIG. 10 is a block diagram of a music recognition device according to an embodiment of the present invention;
FIG. 11 is a block diagram of an apparatus for building a note recognition model according to an embodiment of the present invention;
FIG. 12 is a block diagram of an electronic device shown in accordance with an example embodiment.
Detailed Description
The invention provides a music identification method, a note identification model establishing method and a note identification model establishing device, and aims to solve the technical problem that music identification in the prior art needs to depend on specific hardware equipment.
In order to solve the technical problems, the general idea of the embodiment of the present application is as follows:
after the first audio data corresponding to music are obtained, the first audio data are matched and identified based on the M note segments contained in the first audio data and a note model, where the note model contains the correspondence between at least one group of note segments and feature information. That is, the related information in the first audio data can be identified without any hardware modification, which achieves the technical effect of improving the compatibility of music identification and also reduces its cost. In addition, since no hardware modification is required, first audio data produced in any manner can be identified without relying on an electric piano or electronic organ, which broadens the application range of the scheme.
In order to better understand the technical solutions of the present invention, the following detailed descriptions of the technical solutions of the present invention are provided with the accompanying drawings and the specific embodiments, and it should be understood that the specific features in the embodiments and the examples of the present invention are the detailed descriptions of the technical solutions of the present invention, and are not limitations of the technical solutions of the present invention, and the technical features in the embodiments and the examples of the present invention may be combined with each other without conflict.
In a first aspect, an embodiment of the present invention provides a music identification method, please refer to fig. 2, including:
step S201: obtaining first audio data corresponding to music, wherein the first audio data comprises M note segments, and M is a positive integer;
step S202: and performing matching identification on the first audio data based on the M note segments and a note model, wherein the note model comprises at least one group of corresponding relations between note segments and characteristic information.
In step S201, the first audio data may be, for example, audio data obtained from the user's instrument playing, audio data obtained by singing, audio data carried on the electronic device or downloaded from a network, and so on; the embodiment of the present invention does not limit what kind of audio data it is.
A note segment includes one or more notes, and each note includes a scale degree and the register of that degree. The scale degrees are, for example, 1, 2, 3, 4, 5, 6, 7, and the registers are, for example, bass, middle, treble, and so on. A note segment may contain only one note, for example (middle) 1, (middle) 2, etc., or it may contain multiple notes, such as (middle) 1 (middle) 2, or (middle) 1 (middle) 2 (bass) 3, and so on. If the musical instrument generating the first audio data is a keyboard instrument (e.g., a piano or electronic organ) and the note segment is a single note, the single note may be a note produced by a single key, or multiple notes produced by a combination of several keys, for example a chord.
Each note contained in the first audio data can be identified by performing feature recognition on the first audio data, and the data can then be divided into a plurality of parts according to the notes. For example, if the first audio data contain 10 notes, they may be divided into 10 parts, and the 10 parts are then combined into M note segments based on the number of notes per segment: if each note segment contains only one note, the 10 parts can be used directly as 10 note segments; if each note segment contains two notes, the 10 parts can be combined pairwise in order to obtain 5 note segments, and so on.
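An illustrative sketch of this grouping step, assuming the per-note parts have already been produced by an upstream feature-recognition front end (the function and variable names are hypothetical, not from the patent):

```python
# Sketch: group detected per-note parts into M note segments.
# `parts` is assumed to be a list of per-note audio chunks produced by an
# upstream onset/feature detector (hypothetical input).

def group_into_segments(parts, notes_per_segment=2):
    """Combine consecutive per-note parts into note segments."""
    segments = []
    for i in range(0, len(parts), notes_per_segment):
        segments.append(parts[i:i + notes_per_segment])
    return segments

# 10 detected notes, 2 notes per segment -> 5 note segments, as in the text.
parts = [f"note_{k}" for k in range(10)]
print(len(group_into_segments(parts, 2)))  # 5
```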
After the first audio data is acquired, the first audio data can be directly adopted for subsequent processing, or silence and invalid voice can be filtered from the first audio data.
Before step S202 is executed, a note model needs to be obtained first, and in the implementation process, referring to fig. 3, the note model can be obtained through the following steps:
step S301: acquiring standard voice data corresponding to each note segment;
step S302: extracting characteristic information of standard voice data of each note segment;
step S303: and establishing a note model based on the corresponding relation between the note segments and the characteristic information, wherein the note model comprises the corresponding relation between at least one group of note segments and the characteristic information.
In step S301, take a piano note model as an example, where each note segment includes one note: a piano usually has 88 keys, and each key corresponds to one note, so the standard voice data for each key can be obtained by recording that key.
In step S302, the feature data may include several kinds of feature information. Two kinds are described below; of course, the specific implementation is not limited to these two cases, and, where there is no conflict, the two can be used in combination.
Firstly, the feature data includes a spectrum feature, and the extracting feature information of the standard voice data of each note segment includes: converting the standard voice data from time domain data into frequency domain data; dividing the frequency domain data into at least one subdata; and calculating the energy of each subdata in the at least one subdata, wherein the energy of each subdata in the at least one subdata is the characteristic information of the corresponding note segment.
For example, the frequency domain data may be divided into different numbers of sub-data according to actual requirements, for example 5, 10, etc.; the embodiment of the present invention is not limited. Suppose the frequency domain data are divided into 5 sub-data, and the energies of the 5 sub-data for certain standard voice data (e.g., treble 1) are 5, 8, 10, 10, 12; then the feature information of that standard voice data can be determined to be (5, 8, 10, 10, 12).
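An illustrative sketch of this spectral-feature extraction, assuming the standard voice data are available as a mono sample array; the band count and the synthetic test tone are assumptions made for the example:

```python
import numpy as np

def band_energy_features(samples, n_bands=5):
    """Convert time-domain samples to the frequency domain, split the spectrum
    into n_bands sub-data blocks, and use each block's energy as the feature."""
    spectrum = np.abs(np.fft.rfft(samples)) ** 2   # time domain -> frequency domain
    bands = np.array_split(spectrum, n_bands)      # divide into sub-data
    return [float(np.sum(b)) for b in bands]       # energy per sub-data block

# Example: a synthetic 440 Hz tone standing in for one recorded key.
sr = 16000
t = np.arange(sr) / sr
feature = band_energy_features(np.sin(2 * np.pi * 440 * t))
print(feature)  # a 5-value vector, analogous to (5, 8, 10, 10, 12) after scaling
```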
Standard voice data may be collected only once for a given note segment, or multiple times. If multiple recordings are collected, multiple groups of the aforementioned feature information are obtained, and these groups are then averaged to obtain the feature information finally used for building the note model.
Secondly, the feature data includes frequency features, and the extracting of feature information of the standard voice data of each note segment includes: converting the standard voice data from time domain data into frequency domain data; dividing the frequency domain data into at least one subdata; calculating to obtain the energy of each subdata in the at least one subdata; and determining the frequency of the subdata corresponding to the preset energy as the characteristic information of the corresponding note segment.
For example, the embodiment of the present invention does not limit how the predetermined energy is set: the frequency corresponding to the sub-data with the highest energy may be determined as the frequency feature of the corresponding note segment, or the frequency corresponding to the sub-data with the median energy may be determined as the frequency feature of the corresponding note.
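A corresponding sketch for the frequency feature, here taking the predetermined energy to be the maximum, which is one of the options mentioned above (the band splitting mirrors the previous sketch):

```python
import numpy as np

def peak_frequency_feature(samples, sample_rate, n_bands=5):
    """Return the representative frequency of the sub-data block with the highest energy."""
    spectrum = np.abs(np.fft.rfft(samples)) ** 2
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
    band_bounds = np.array_split(np.arange(len(spectrum)), n_bands)
    energies = [spectrum[idx].sum() for idx in band_bounds]
    best = int(np.argmax(energies))                 # sub-data with the highest energy
    return float(freqs[band_bounds[best]].mean())   # its representative frequency

sr = 16000
t = np.arange(sr) / sr
print(peak_frequency_feature(np.sin(2 * np.pi * 440 * t), sr))
```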
In the specific implementation process, if the note model is a note model for a fixed tuning instrument, the corresponding standard voice data of a certain note segment is directly obtained, and then the corresponding relation between the note segment and the characteristic information is established; and if the note model is the note model for the tonal modification musical instrument, the obtaining of the standard voice data corresponding to each note comprises: acquiring standard voice data corresponding to each note under a specific tone; the establishing of the note model based on the corresponding relation between the note segments and the characteristic information, wherein the note model comprises the corresponding relation between at least one group of note segments and the characteristic information, and comprises the following steps: and establishing the note model based on the corresponding relation between the note segments and the characteristic information under the specific tone, wherein the note model comprises the corresponding relation between the at least one group of note segments and the characteristic information under the specific tone.
For example, on a given instrument the same note produces different feature information under different keys (such as the C key, D key, etc.). Therefore, standard voice data corresponding to each note segment are collected for each key, and the resulting note model contains the correspondence between at least one group of note segments and feature information under every key; based on this scheme, music can be identified more accurately.
In a specific implementation process, the note model may be a model in various forms, two of which are listed below for description, and of course, in a specific implementation process, the note model is not limited to the following two cases.
First, the note model is specifically a note template, and the note template includes the correspondence between notes and feature information, for example, as shown in Table 1:

TABLE 1

Note      Characteristic information
Treble 1  (5, 8, 10, 10, 12)
Middle 7  (5, 13, 12, 9, 12)
Middle 6  (12, 6, 18, 10, 6)
Middle 5  (9, 8, 10, 10, 7)
……        ……
Of course, in the specific implementation process, the note and the feature information may also be in other corresponding relationships, and the embodiment of the present invention is not illustrated in detail and is not limited.
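For illustration, such a note template can be represented as a simple mapping; the following sketch uses the values from Table 1 (the representation itself is an assumption, not prescribed by the patent):

```python
# Note template from Table 1: note name -> characteristic information vector.
NOTE_TEMPLATE = {
    "treble 1": (5, 8, 10, 10, 12),
    "middle 7": (5, 13, 12, 9, 12),
    "middle 6": (12, 6, 18, 10, 6),
    "middle 5": (9, 8, 10, 10, 7),
}

def lookup_feature(note):
    """Search the feature information of a note segment from the note template."""
    return NOTE_TEMPLATE[note]

print(lookup_feature("middle 7"))  # (5, 13, 12, 9, 12)
```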
Second, the note model is specifically: statistical-based models, such as: training HMMs (Hidden Markov models), decision tree models, etc. based on extracted features.
Taking the HMM model as an example, the feature information corresponding to each note segment may be obtained first, and the feature information of each note segment is then fed into an HMM model for training, yielding a set of probability vectors such that the probability of the feature information under the corresponding note segment tends to the maximum. This gives the HMM model used for identifying that note segment; one HMM model is trained per note segment, so there are as many HMM models as there are note segments.
The HMM model for note recognition includes probabilities that feature information belongs to a certain note segment, for example: after the feature information (5, 8, 10, 10, 12) is obtained, it may be input into an HMM model corresponding to each note segment, the probability of the feature information under each note segment is obtained, and then the note segment with the highest probability value is determined as the note segment corresponding to the feature information.
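An illustrative sketch of this statistical-model variant using the third-party hmmlearn package (an assumption; the patent names no library). One Gaussian HMM is trained per note segment, and recognition selects the model giving the observed features the highest log-likelihood:

```python
import numpy as np
from hmmlearn import hmm  # assumed dependency, not named in the patent

def train_note_models(training_data, n_states=3):
    """training_data: {note -> list of feature sequences (2-D arrays)}.
    Trains one HMM per note segment, maximizing the likelihood of its features."""
    models = {}
    for note, sequences in training_data.items():
        X = np.vstack(sequences)
        lengths = [len(s) for s in sequences]
        m = hmm.GaussianHMM(n_components=n_states, n_iter=50)
        m.fit(X, lengths)
        models[note] = m
    return models

def recognize(models, feature_seq):
    """Return the note whose HMM assigns the observed features the highest probability."""
    return max(models, key=lambda note: models[note].score(feature_seq))
```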
In step S202, the matching identification of the first audio data can be divided into several cases; two are described below, and of course the specific implementation is not limited to these two cases.
The first mode for carrying out matching identification on the first audio data is as follows: performance errors in the first audio data are identified. The performance errors can be divided into a plurality of errors, three of which are listed below for description, and certainly, the implementation process is not limited to the following three cases.
First, the performance error includes: a note error.
For example, assuming that the user generates the first audio data based on the reference score information, where the second note segment in the reference score information is the high pitch 1 and the second note segment in the first audio data is the middle pitch 7, it can be determined that a note error occurs during the performance.
The note error may be playing one note as another note, omitting a note, or playing an extra note; the embodiments of the present invention are not limited.
Second, the performance error includes: a rhythm error.
For example, it is assumed that the user generates the first audio data based on the reference score information, wherein the second note segment in the reference score information is high pitch 1 and is a full note, and the second note segment in the first audio data is high pitch 1 but is a half note, indicating that a rhythm error occurs.
Third, the performance error includes: inaccurate intonation.
For example, some wind instruments place requirements on breath; if the breath is unstable or insufficient, or the fingering used while blowing a note segment is incorrect, the resulting pitch will be inaccurate.
In a specific implementation process, the identifying the performance error in the first audio data, referring to fig. 4, includes:
step S401: acquiring reference score information for generating the first audio data;
step S402: determining N note segments contained in the reference music score information, wherein N is a positive integer;
step S403: and determining the segment of the M note segments which is not matched with the notes in the N note segments as the note segment corresponding to the playing error based on the note model.
In step S401, the reference score information is, for example: numbered musical notation, staff, abbreviated notation, etc., embodiments of the present invention are not limited. In the specific implementation process, the reference score information may be obtained in various ways, and three of them are listed below for description, of course, the specific implementation process is not limited to the following three cases.
First manner of determining the reference score information: responding to a selection operation of the user to acquire the reference score information.
For example, when a user plays music, if the user wants to identify a performance error in the music played by the user through the APP, the corresponding APP may be opened first, and then the music played by the user is selected from the APP, where a music score corresponding to the music selected by the user is the reference music score information.
Second manner of determining the reference score information: scanning a score in a paper tutorial to obtain score image information, and performing image recognition on the score image information to obtain the reference score information.
For example, paper tutorials exist for instrument performance and contain a large amount of score information. If the user wants to practice a particular piece, the score can be scanned with a camera to obtain score image information of that piece, and the electronic device then performs image recognition on it to obtain the reference score information.
Third manner of determining the reference score information: performing acoustic recognition on second audio data associated with the first audio data to determine the reference score information.
For example, the second audio data may be audio data different from the first audio data. For instance, when a user wishes to play music A, the user may first play a short segment of music A (i.e., the second audio data); after the second audio data are recorded, the electronic device performs acoustic recognition on them to determine the matching reference score information. Suppose the user has played three notes and they are recognized as 1 2 5; matching with 1 2 5 in the system determines the following recognition results:

[figure: candidate reference scores beginning with the notes 1 2 5]

When the user has played five notes and they are recognized as 1 2 5 3 1, matching with 1 2 5 3 1 in the system determines the following recognition results:

[figure: candidate reference scores beginning with the notes 1 2 5 3 1]

At this point, the reference score information may be determined based on a selection operation by the user, or the notes played by the user may be detected continuously until only one recognition result remains, in which case the remaining recognition result is the reference score information.
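An illustrative sketch of this progressive narrowing, with a hypothetical stand-in for the score database (the titles and note sequences are invented for the example):

```python
# Hypothetical score database: title -> numbered-notation note sequence.
SCORE_DB = {
    "music A": [1, 2, 5, 3, 1, 6, 5],
    "music B": [1, 2, 5, 1, 1, 2, 3],
    "music C": [3, 2, 1, 1, 2, 3, 5],
}

def candidates(played_notes):
    """Return all reference scores whose opening matches the notes played so far."""
    n = len(played_notes)
    return [t for t, seq in SCORE_DB.items() if seq[:n] == played_notes]

print(candidates([1, 2, 5]))        # ['music A', 'music B'] - still ambiguous
print(candidates([1, 2, 5, 3, 1]))  # ['music A'] - unique, used as reference score
```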
After determining the reference score information based on the second audio data, the user plays music a again, and the electronic device continues to collect audio data played by the user as the first audio data.
Still alternatively, after determining the reference score information based on the second audio data, the user continues to perform the music a, in which case the second audio data may be the same audio data as the first audio data, for example: the user directly plays the music A, the electronic equipment collects first audio data played by the user, corresponding reference music score information is determined through the first audio data, and playing errors in the first audio data are identified.
In step S402, each note in the reference score information is extracted directly, yielding the N note segments. When extracting, all notes in the reference score information may be extracted at once, or notes may be extracted in sequence; once part of the notes has been extracted, performance errors in the first audio data can be identified in real time.
In step S403, the segments in which the notes in the M note segments are not matched with the notes in the N note segments can be determined in various ways, and two of them are listed below for description, which is not limited to the following two cases in the specific implementation process.
First, the determining, based on the note model, a segment in which the notes in the M note segments do not match the notes in the N note segments as a note segment corresponding to the performance error, please refer to fig. 5, which includes:
step S501: searching and obtaining the characteristic information of each note segment in the N note segments from the note model, and further determining first note characteristic information corresponding to the reference music score information;
step S502: extracting and obtaining the characteristic information of each note segment in the M note segments, and further obtaining second note characteristic information;
step S503: and matching the second note characteristic information with the first note characteristic information, and determining note segments with unmatched characteristic information as the note segments corresponding to the playing errors.
In step S502, the manner of obtaining the feature information of each note segment of the M note segments is similar to the manner of the feature information when the note model is established, and thus is not described herein again.
In step S503, the matching manner between the second note characteristic information and the first note characteristic information differs according to the note model. Two manners are described below; of course, the specific implementation is not limited to these two cases.
The note model is a note template, and similarity parameters of the feature information of each note segment in the second note feature information and the feature information of the corresponding note segment in the first note feature information can be calculated, and unmatched note segments are determined based on the similarity values.
Taking the first note segment in the second note characteristic information as an example, a distance value between the characteristic information of the first note segment in the second note characteristic information and the characteristic information of the corresponding note segment in the first note characteristic information (for example, the larger the distance value is, the smaller the similarity value is) can be calculated, and then whether the distance value is smaller than a preset distance threshold value is judged, if so, the first note segment in the second note characteristic information is matched with the first note segment in the first note characteristic information, otherwise, the first note segment in the second note characteristic information is not matched; or calculating the similarity value between the feature information of the first note segment in the second note feature information and the feature information of the corresponding note segment in the first note feature information, then judging whether the similarity value is greater than a preset similarity threshold value, if so, indicating that the first note segment in the second note feature information is matched with the first note segment in the first note feature information, otherwise, indicating that the first note segment is not matched.
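An illustrative sketch of this threshold test, using Euclidean distance as the similarity parameter (the measure and the threshold are assumptions; the patent leaves both open):

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def segment_matches(played_feat, reference_feat, max_distance=3.0):
    """A note segment matches when the distance between its feature vector and
    the reference feature vector is below the preset distance threshold
    (a larger distance means a smaller similarity)."""
    return euclidean(played_feat, reference_feat) < max_distance

# Played treble 1 vs. reference treble 1 (values from Table 1): a close match.
print(segment_matches((5, 8, 10, 11, 12), (5, 8, 10, 10, 12)))  # True
# Played middle 7 where treble 1 was expected: flagged as a performance error.
print(segment_matches((5, 13, 12, 9, 12), (5, 8, 10, 10, 12)))  # False
```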
Secondly, if the note model is a statistical model, each piece of feature information in the second note feature information can be input into the statistical model of the corresponding note in the first note feature information to obtain a calculated probability P(X | M), where X is the extracted feature information and M is the statistical model. It is then judged whether the calculated probability is greater than a preset probability value; if so, the corresponding note segment in the second note feature information matches the corresponding note segment in the first note feature information, and otherwise it does not.
Second, the determining, based on the note model, a segment in which the notes in the M note segments do not match the notes in the N note segments as a note segment corresponding to the performance error, please refer to fig. 6, includes:
step S601: extracting and obtaining the characteristic information of each note segment in the M note segments, and further obtaining second note characteristic information;
step S602: identifying the second note characteristic information through the note model, and further obtaining first music score information corresponding to the first audio data;
step S603: and matching the first music score information with the reference music score information, and further determining note segments with different notes as note segments corresponding to the playing errors.
In step S602, the manner of obtaining the first score information is different based on different note models, and two of them are listed below for description, and certainly, in the specific implementation process, the two cases are not limited to the following two cases.
First, the note model is a note template. In this case, for the first feature information corresponding to a certain note segment in the second note feature information, similarity parameters (for example a similarity value or a distance value) may be calculated between the first feature information and the feature information of all note segments in the note model, and the note segment whose template feature information has the highest similarity value (or lowest distance value) with the first feature information is taken as the recognized note segment. Each piece of feature information in the second note feature information can be processed in this way to obtain its corresponding note segment, and thus the first score information.
The note model is a statistical model, in this case, for the first feature information corresponding to a certain note segment in the second note feature information, the first feature information may be input into the statistical models corresponding to all note segments, and then the note segment corresponding to the statistical model with the highest obtained probability value is determined as the note segment corresponding to the first feature information.
In step S603, directly comparing the first score information with the reference score information to determine whether the note segments at the same positions are the same, and if not, determining that the corresponding note segments are the note segments with errors in playing.
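An illustrative sketch of this second manner, comparing the recognized first score information with the reference score position by position (all names hypothetical):

```python
def performance_errors(first_score, reference_score):
    """Compare recognized notes with the reference score position by position;
    positions whose notes differ correspond to performance errors."""
    errors = []
    for i, (played, expected) in enumerate(zip(first_score, reference_score)):
        if played != expected:
            errors.append((i, played, expected))
    return errors

first_score = ["treble 1", "middle 7", "middle 6"]   # recognized from the audio
reference   = ["treble 1", "treble 1", "middle 6"]   # from the reference score
print(performance_errors(first_score, reference))
# [(1, 'middle 7', 'treble 1')] - the second note segment was played wrongly
```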
As an alternative embodiment, the method further comprises: determining the playing time length of each note segment in the N note segments;
after the determining, based on the note model, that the segments of the M note segments which do not match the notes of the N note segments are the note segments corresponding to the performance errors, the method may further include: and determining the segments of the M note segments which are matched with the notes in the N note segments but not matched with the playing duration as the note segments corresponding to the playing errors.
For example, if the third note segment in the first audio data is a middle 3 held for a whole note, while the third note segment in the reference score information is a middle 3 with a length of a half note, then the note played is correct but the playing duration does not match (here the note is held twice as long as required); in this case the segment can also be regarded as a performance-error segment. Of course, there may be other situations in which the playing durations do not match; the embodiments of the present invention do not enumerate them and are not limited.
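An illustrative sketch of the duration check described above (the tolerance is an assumed parameter; the patent does not fix one):

```python
def duration_errors(segments, reference, tolerance=0.2):
    """segments/reference: lists of (note, duration_in_beats). A segment whose
    note matches but whose playing duration deviates beyond the tolerance is
    still treated as a performance-error segment."""
    errors = []
    for i, ((note, dur), (ref_note, ref_dur)) in enumerate(zip(segments, reference)):
        if note == ref_note and abs(dur - ref_dur) > tolerance * ref_dur:
            errors.append(i)
    return errors

# Third segment: correct note (middle 3) but held a whole note instead of a half note.
played = [("middle 1", 1.0), ("middle 2", 1.0), ("middle 3", 4.0)]
ref    = [("middle 1", 1.0), ("middle 2", 1.0), ("middle 3", 2.0)]
print(duration_errors(played, ref))  # [2]
```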
In a specific implementation, after the note segments with performance errors in the first audio data are identified, the places where performance errors occurred can be provided to the user, collected, and output, as shown in fig. 7. A performance error may be indicated to the user as soon as its position is identified, or the recognition results may be provided to the user after the performance is finished.
And a second mode of performing matching identification on the first audio data is as follows: obtaining first score information of the first audio data based on the M note segments and a note model.
In a specific implementation process, the obtaining the first score information of the first audio data, please refer to fig. 8, which includes:
step S801: extracting first characteristic information of a first note segment, wherein the first note segment is any one note segment in the M note segments;
step S802: determining a note segment with similarity values of the feature information and the first feature information meeting a preset similarity condition from note segments contained in the note model;
step S803: taking notes of the note segments with the similarity values meeting preset similarity conditions as notes of the first note segment;
step S804: and determining the first score information based on the notes of the first note segment or the notes of the first note segment and the playing time length of the first note segment.
In step S801, how to extract the first feature information of the first note segment is described above, and therefore, the description is omitted here.
In step S802, the similarity value satisfying the preset similarity condition may be classified into various cases, for example:
similarity values of the first feature information and feature information of all note segments in the note model can be calculated, and then the note segment with the highest similarity value of the feature information and the first feature information in the note model is obtained as the first note segment; or calculating the distance values between the first characteristic information and the characteristic information of all the note segments in the note model, and then acquiring the note segment with the lowest distance value between the characteristic information and the first characteristic information in the note model as the first note segment.
Alternatively, the similarity value between the first feature information and the feature information of any one note segment in the note model can be calculated, and it is judged whether the similarity value is greater than a preset similarity threshold; if so, the corresponding note segment is taken as the match for the first note segment; if not, the feature information of the next note segment in the note model is obtained and its similarity value with the first feature information is calculated, and so on until the note segment matching the first note segment is determined.
In step S804, since the note segments corresponding to all the feature information have been determined through step S803, the notes of the note segments are arranged in order of occurrence time to obtain the first score information. The playing duration of the first note segment can also be obtained directly from the first audio data, so the first score information can be determined based on each note segment together with its playing duration; determining it this way yields more accurate first score information.
As an optional embodiment, after determining the first score information, the method further includes: searching and obtaining reference score information with the similarity value of the first score information being larger than a preset similarity value; and providing the reference music score information as the recommended music score information of the first music score information to a user.
For example, more authoritative scores, containing essentially no errors, may be collected in advance and organized into a score database. The first score information obtained by the user may come from a network search or from the user's own recording and may contain errors; therefore, after the first score information is obtained, reference score information whose similarity value with the first score information is greater than a preset similarity value can be retrieved from the score database and provided to the user, ensuring the accuracy of the score information the user obtains.
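An illustrative sketch of this recommendation step, using Python's difflib sequence similarity as a stand-in for the unspecified similarity measure (the database contents are hypothetical):

```python
from difflib import SequenceMatcher

# Hypothetical database of authoritative reference scores (note sequences).
SCORE_DB = {
    "music A (authoritative)": "1253165",
    "music C (authoritative)": "3211235",
}

def recommend(first_score, min_similarity=0.8):
    """Find reference scores whose similarity to the recognized score exceeds
    the preset similarity value, and offer them as recommendations."""
    results = []
    for title, ref in SCORE_DB.items():
        sim = SequenceMatcher(None, first_score, ref).ratio()
        if sim > min_similarity and first_score != ref:  # identical scores need no recommendation
            results.append((title, sim))
    return results

print(recommend("1253155"))  # e.g. [('music A (authoritative)', 0.857...)]
```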
In addition, before the reference music score information is provided for the user, whether the reference music score information is completely the same as the first music score information can be judged, and if the reference music score information is completely the same as the first music score information, the reference music score information does not need to be provided for the user; the reference score information is provided to the user when not identical.
In a second aspect, based on the same inventive concept, an embodiment of the present invention provides a method for establishing a note recognition model, please refer to fig. 9, which includes:
step S901: acquiring standard voice data corresponding to each note segment;
step S902: extracting characteristic information of standard voice data of each note segment;
step S903: and establishing a note model based on the corresponding relation between the note segments and the characteristic information, wherein the note model comprises the corresponding relation between at least one group of note segments and the characteristic information.
Optionally, the note segments include: a single note consisting of a single key; and/or a plurality of notes formed by a plurality of key combinations.
Optionally, the extracting feature information of the standard speech data of each note segment includes: spectral features and/or frequency features of each note segment are extracted.
Optionally, the extracting feature information of the standard speech data of each note segment includes:
converting the standard voice data from time domain data into frequency domain data;
dividing the frequency domain data into at least one subdata;
and calculating to obtain the energy of each subdata in the at least one subdata, wherein the energy of each subdata in the at least one subdata is taken as the characteristic information of the corresponding note segment.
Optionally, the extracting feature information of the standard speech data of each note segment includes:
converting the standard voice data from time domain data into frequency domain data;
dividing the frequency domain data into at least one subdata;
calculating to obtain the energy of each subdata in the at least one subdata;
and determining the frequency of the subdata corresponding to the preset energy as the characteristic information of the corresponding note segment.
Optionally, the obtaining of the standard voice data corresponding to each note includes:
acquiring standard voice data corresponding to each note under a specific tone;
the method for establishing the note model based on the corresponding relation between the note segments and the characteristic information comprises the following steps:
and establishing the note model based on the corresponding relation between the note segments and the characteristic information under the specific tone, wherein the note model comprises the corresponding relation between the at least one group of note segments and the characteristic information under the specific tone.
Since the method for establishing a note recognition model introduced in the second aspect of the embodiment of the present invention corresponds to the method for identifying music introduced in the first aspect of the embodiment of the present invention, a person skilled in the art can understand the specific implementation process of the method for establishing a note recognition model introduced in the second aspect of the embodiment of the present invention based on the method for identifying music introduced in the first aspect of the embodiment of the present invention, and thus, no further description is given here.
In a third aspect, based on the same inventive concept, an embodiment of the present invention provides a music recognition apparatus, please refer to fig. 10, including:
an obtaining module 10, configured to obtain first audio data corresponding to music, where the first audio data includes M note segments, and M is a positive integer;
and the identifying module 11 is configured to perform matching identification on the first audio data based on the M note segments and a note model, where the note model includes at least one group of corresponding relationships between note segments and feature information.
Optionally, the identification module 11 is configured to:
identifying a performance error in the first audio data; and/or obtaining first score information for the first audio data based on the M note segments and a note model.
Optionally, the performance error includes at least one of: a note error, a rhythm error, and an intonation error.
Optionally, the identification module 11 includes:
an acquisition unit configured to acquire reference score information used to generate the first audio data;
a first determining unit, configured to determine N note segments included in the reference score information, where N is a positive integer;
and a second determining unit, configured to determine, based on the note model, a segment in which the notes in the M note segments do not match the notes in the N note segments as a note segment corresponding to the performance error.
Optionally, the apparatus further comprises:
the first determining module is used for determining the playing time length of each note segment in the N note segments;
and the second determining module is used for determining the segments of the M note segments which are matched with the notes in the N note segments but not matched with the playing duration as the note segments corresponding to the playing errors.
Optionally, the second determining unit includes:
an obtaining subunit, configured to search and obtain the feature information of each note segment in the N note segments from the note model, and further determine first note feature information corresponding to the reference score information;
the first extraction subunit is used for extracting and obtaining the characteristic information of each note segment in the M note segments so as to obtain second note characteristic information;
and the first matching subunit is configured to match the second note characteristic information with the first note characteristic information, and determine a note segment with unmatched characteristic information as the note segment corresponding to the performance error.
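A minimal sketch of this feature-space comparison follows, assuming the feature information is a numeric vector per note segment, the note model is a mapping from note labels to such vectors, and a Euclidean distance threshold decides whether features "match"; the metric and the threshold are illustrative assumptions.

```python
import numpy as np

def feature_mismatch_segments(reference_notes, note_model, played_features, threshold=0.5):
    """Look up the first note feature information for each of the N
    reference segments from the note model, compare it with the second
    note feature information extracted from the M played segments, and
    flag indices whose features do not match."""
    mismatched = []
    for i, (ref_note, played) in enumerate(zip(reference_notes, played_features)):
        ref_feature = np.asarray(note_model[ref_note])     # first note feature information
        if np.linalg.norm(ref_feature - np.asarray(played)) > threshold:
            mismatched.append(i)                           # unmatched -> performance error
    return mismatched
```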
Optionally, the second determining unit includes:
a second extracting subunit, configured to extract and obtain feature information of each note segment in the M note segments, so as to obtain second note feature information;
the identification subunit is configured to identify the second note feature information through the note model, so as to obtain first score information corresponding to the first audio data;
and the second matching subunit is used for matching the first music score information with the reference music score information so as to determine note segments with different notes as the note segments corresponding to the playing errors.
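In contrast to the feature-space variant above, this variant compares at the symbol level after recognition. A minimal sketch, assuming both scores are note-name sequences aligned by index:

```python
def score_mismatch_segments(first_score, reference_score):
    """first_score is the note sequence recognized from the first audio
    data via the note model; reference_score is the reference note
    sequence; indices where the notes differ are treated as the note
    segments corresponding to performance errors."""
    return [i for i, (played, ref) in enumerate(zip(first_score, reference_score))
            if played != ref]
```

For example, score_mismatch_segments(["C4", "E4", "G4"], ["C4", "F4", "G4"]) returns [1], flagging the second segment as the performance error.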
Optionally, the obtaining unit is configured to:
responding to a selection operation of a user to acquire the reference music score information; or
scanning a music score in a paper textbook to obtain music score image information, and performing image recognition on the music score image information to obtain the reference music score information; or
performing acoustic recognition on second audio data associated with the first audio data to determine the reference music score information.
Optionally, the identification module 11 includes:
an extracting unit, configured to extract first feature information of a first note segment, where the first note segment is any one of the M note segments;
a third determining unit, configured to determine, from the note segments included in the note model, a note segment whose similarity value between the feature information and the first feature information satisfies a preset similarity condition;
a fourth determining unit, configured to use the note of the note segment with the similarity value satisfying a preset similarity condition as the note of the first note segment;
a fifth determining unit, configured to determine the first score information based on the notes of the first note segment or the notes of the first note segment and a playing time length of the first note segment.
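The similarity match performed by the third and fourth determining units may be sketched as follows, assuming vector features, cosine similarity as the similarity value, and 0.9 as the preset similarity condition; all three are assumptions of the sketch.

```python
import numpy as np

def recognize_note(first_feature, note_model, preset_similarity=0.9):
    """Compare the first feature information against the feature
    information of every note segment in the note model and return the
    note of the segment whose similarity value satisfies the preset
    similarity condition (here: highest cosine similarity >= 0.9)."""
    first_feature = np.asarray(first_feature, dtype=float)
    best_note, best_sim = None, preset_similarity
    for note, feature in note_model.items():
        feature = np.asarray(feature, dtype=float)
        sim = float(np.dot(first_feature, feature) /
                    (np.linalg.norm(first_feature) * np.linalg.norm(feature)))
        if sim >= best_sim:
            best_note, best_sim = note, sim
    return best_note  # None when no segment meets the preset condition
```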
Optionally, the apparatus further comprises:
the searching module is used for searching and obtaining reference music score information of which the similarity value with the first music score information is greater than a preset similarity value;
and the third determining module is used for providing the reference music score information as the recommended music score information of the first music score information to a user.
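A minimal sketch of such a recommendation search follows, assuming each music score is represented as a note-name sequence and that the ratio given by difflib's SequenceMatcher stands in for the similarity value; both representations are assumptions, not the recited implementation.

```python
from difflib import SequenceMatcher

def recommend_scores(first_score, score_library, preset_similarity=0.8):
    """Search the library for reference music score information whose
    similarity value with the first music score information exceeds the
    preset similarity value; score_library maps titles to note
    sequences."""
    hits = [(title, SequenceMatcher(None, first_score, ref).ratio())
            for title, ref in score_library.items()]
    return sorted([h for h in hits if h[1] > preset_similarity],
                  key=lambda h: -h[1])  # best recommendations first
```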
Optionally, the note model includes the corresponding relation between at least one group of note segments and characteristic information under a specific tone.
Since the music recognition apparatus described in the third aspect of the embodiment of the present invention implements the music recognition method described in the first aspect, a person skilled in the art can derive its specific structure and variations from the description of that method; details are therefore not repeated here. All devices used to implement the music recognition method described in the first aspect fall within the intended scope of protection of the present invention.
In a fourth aspect, based on the same inventive concept, an embodiment of the present invention provides a note recognition model building apparatus, please refer to fig. 11, including:
an obtaining module 20, configured to obtain standard voice data corresponding to each note segment;
an extracting module 21, configured to extract feature information of standard voice data of each note segment;
the establishing module 22 is configured to establish a note model based on a corresponding relationship between the note segments and the feature information, where the note model includes a corresponding relationship between at least one group of note segments and the feature information.
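The establishment flow of these three modules may be sketched end to end as follows, assuming the standard voice data arrives as a mapping from note segment labels to mono sample arrays and using a normalized magnitude spectrum as a stand-in feature; both choices are assumptions of the sketch.

```python
import numpy as np

def build_note_model(standard_voice_data):
    """standard_voice_data maps each note segment label (e.g. "C4" or a
    chord label such as "C4+E4+G4") to a mono sample array; the stored
    feature is a normalized magnitude spectrum standing in for whatever
    characteristic information the extracting module produces."""
    model = {}
    for note_segment, samples in standard_voice_data.items():
        spectrum = np.abs(np.fft.rfft(samples))     # extract feature information
        norm = np.linalg.norm(spectrum)
        model[note_segment] = spectrum / norm if norm else spectrum
    return model                                    # note segment -> feature correspondence
```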
Optionally, the note segments include: a single note produced by a single key; and/or multiple notes produced by a combination of keys.
Optionally, the extracting module 21 is configured to: spectral features and/or frequency features of each note segment are extracted.
Optionally, the extracting module 21 includes:
a first conversion unit, configured to convert the standard voice data from time-domain data into frequency-domain data;
a first dividing unit, configured to divide the frequency-domain data into at least one piece of sub-data;
and a first calculating unit, configured to calculate the energy of each piece of sub-data, wherein the energy of each piece of sub-data is taken as the characteristic information of the corresponding note segment.
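A minimal sketch of this first extraction variant, in which the energies of all pieces of sub-data together form the characteristic information (the band count and the normalization are illustrative assumptions):

```python
import numpy as np

def band_energy_feature(samples, num_bands=64):
    """Convert the standard voice data to the frequency domain, divide
    the spectrum into pieces of sub-data (bands), and use the energies
    of all pieces, as a vector, as the characteristic information."""
    spectrum = np.fft.rfft(samples)
    edges = np.linspace(0, len(spectrum), num_bands + 1, dtype=int)
    energies = np.array([np.sum(np.abs(spectrum[s:e]) ** 2)
                         for s, e in zip(edges[:-1], edges[1:])])
    total = energies.sum()
    return energies / total if total else energies  # normalized energy vector
```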
Optionally, the extracting module 21 includes:
a second conversion unit, configured to convert the standard voice data from time-domain data into frequency-domain data;
a second dividing unit, configured to divide the frequency-domain data into at least one piece of sub-data;
a second calculating unit, configured to calculate the energy of each piece of sub-data;
and a determining unit, configured to determine the frequency of the sub-data whose energy matches a preset energy as the characteristic information of the corresponding note segment.
Optionally, the obtaining module 20 is configured to obtain standard voice data corresponding to each note under a specific tone;
the establishing module 22 is configured to establish the note model based on the corresponding relation between the note segments and the feature information under the specific tone, where the note model includes the corresponding relation between at least one group of note segments and the feature information under the specific tone.
Since the note recognition model establishment apparatus described in the fourth aspect of the embodiment of the present invention implements the note recognition model establishment method described in the second aspect, a person skilled in the art can derive its specific structure and variations from the description of that method; details are therefore not repeated here. All apparatuses used to implement the note recognition model establishment method described in the second aspect fall within the intended scope of protection of the present invention.
In a fifth aspect, based on the same inventive concept, an embodiment of the present invention provides an electronic device, including a memory and one or more programs, where the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs including instructions for:
obtaining first audio data corresponding to music, wherein the first audio data comprises M note segments, and M is a positive integer;
and performing matching identification on the first audio data based on the M note segments and a note model, wherein the note model comprises at least one group of corresponding relations between note segments and characteristic information.
Since the electronic device described in the fifth aspect of the embodiment of the present invention implements the music recognition method described in the first aspect, a person skilled in the art can derive its specific structure and variations from the description of that method; details are therefore not repeated here. All electronic devices used to implement the music recognition method described in the first aspect fall within the intended scope of protection of the present invention.
In a sixth aspect, based on the same inventive concept, an embodiment of the present invention provides an electronic device, including a memory and one or more programs, where the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs including instructions for:
acquiring standard voice data corresponding to each note segment;
extracting characteristic information of standard voice data of each note segment;
and establishing a note model based on the corresponding relation between the note segments and the characteristic information, wherein the note model comprises the corresponding relation between at least one group of note segments and the characteristic information.
Since the electronic device introduced in the sixth aspect of the embodiment of the present invention implements the note recognition model establishment method introduced in the second aspect, a person skilled in the art can derive its specific structure and variations from the description of that method; details are therefore not repeated here. All electronic devices used to implement the note recognition model establishment method introduced in the second aspect fall within the intended scope of protection of the present invention.
Fig. 12 is a block diagram of an electronic device 800 for implementing a music recognition method or a note recognition model establishment method, according to an exemplary embodiment. For example, the electronic device 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, exercise equipment, a personal digital assistant, or the like.
Referring to fig. 12, electronic device 800 may include one or more of the following components: processing component 802, memory 804, power component 806, multimedia component 808, audio component 810, input/output (I/O) interface 812, sensor component 814, and communication component 816.
The processing component 802 generally controls overall operation of the electronic device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 802 may include one or more processors 820 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 802 may include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 may include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support the operation of the electronic device 800. Examples of such data include instructions for any application or method operating on the electronic device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type of volatile or non-volatile memory device, or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, or magnetic or optical disk.
The power supply component 806 provides power to the various components of the electronic device 800. Power components 806 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for electronic device 800.
The multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and a user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may not only sense the boundary of a touch or swipe action, but also detect the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 808 includes a front camera and/or a rear camera. The front camera and/or the rear camera may receive external multimedia data when the electronic device 800 is in an operation mode, such as a shooting mode or a video mode. Each of the front camera and the rear camera may be a fixed optical lens system or have focus and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the electronic device 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 814 includes one or more sensors for providing status assessments of various aspects of the electronic device 800. For example, the sensor assembly 814 may detect the open/closed state of the electronic device 800 and the relative positioning of components, such as its display and keypad. The sensor assembly 814 may also detect a change in position of the electronic device 800 or of one of its components, the presence or absence of user contact with the electronic device 800, the orientation or acceleration/deceleration of the electronic device 800, and a change in its temperature. The sensor assembly 814 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices. The electronic device 800 may access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast-associated information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium comprising instructions, such as the memory 804 comprising instructions, executable by the processor 820 of the electronic device 800 to perform the above-described method is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
A non-transitory computer-readable storage medium having instructions therein which, when executed by a processor of an electronic device, enable the electronic device to perform a music recognition method, the method comprising:
obtaining first audio data corresponding to music, wherein the first audio data comprises M note segments, and M is a positive integer;
and performing matching identification on the first audio data based on the M note segments and a note model, wherein the note model comprises at least one group of corresponding relations between note segments and characteristic information.
A non-transitory computer readable storage medium having instructions therein which, when executed by a processor of an electronic device, enable the electronic device to perform a note recognition model building method, the method comprising:
acquiring standard voice data corresponding to each note segment;
extracting characteristic information of standard voice data of each note segment;
and establishing a note model based on the corresponding relation between the note segments and the characteristic information, wherein the note model comprises the corresponding relation between at least one group of note segments and the characteristic information.
Servers in embodiments of the present invention may vary significantly in configuration or performance, and may include one or more central processing units (CPUs) (e.g., one or more processors), memory, and one or more storage media (e.g., one or more mass storage devices) storing applications or data. The memory and the storage media may provide transient or persistent storage. The program stored on a storage medium may include one or more modules (not shown), each of which may include a series of instruction operations for the server. Further, the central processing unit may be configured to communicate with the storage medium and execute, on the server, the series of instruction operations in the storage medium.
The server may also include one or more power supplies, one or more wired or wireless network interfaces, one or more input/output interfaces, one or more keyboards, and/or one or more operating systems, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and the like.
One or more embodiments of the invention have at least the following beneficial effects:
in the embodiment of the present invention, after the first audio data corresponding to music is obtained, the first audio data may be matched and identified based on the M note segments it contains and a note model, where the note model includes the corresponding relation between at least one group of note segments and feature information. In other words, the relevant information in the first audio data can be identified without any hardware modification, which improves the compatibility of music identification and reduces its cost. In addition, because no hardware modification is required, first audio data produced in any manner can be identified without relying on an electric piano or electronic organ, which broadens the applicability of the scheme.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, those skilled in the art may make additional variations and modifications to these embodiments once they learn of the basic inventive concept. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments and all alterations and modifications that fall within the scope of the invention.

Claims (36)

1. A music recognition method, comprising:
obtaining first audio data corresponding to music, wherein the first audio data comprises M note segments, and M is a positive integer;
matching and identifying the first audio data based on the M note segments and a note model, wherein the note model comprises the corresponding relation between at least one group of note segments and characteristic information, and the characteristic information is the characteristic information of standard voice data of the note segments;
the performing matching identification on the first audio data comprises: identifying performance errors in the first audio data, including: acquiring reference score information for generating the first audio data; determining N note segments contained in the reference music score information, wherein N is a positive integer; and determining the segments of the M note segments which do not match with the notes in the N note segments as the note segments corresponding to the playing errors based on the note models.
2. The method of claim 1, wherein the performing matching identification on the first audio data comprises:
obtaining first score information of the first audio data based on the M note segments and a note model.
3. The method of claim 1, wherein the performance error comprises at least one of a note error, a rhythm error, and an intonation error.
4. The method of claim 1, wherein the method further comprises:
determining the playing time length of each note segment in the N note segments;
after the determining, based on the note model, that the segments of the M note segments which do not match with the notes of the N note segments are the note segments corresponding to the performance errors, the method further includes:
and determining the segments of the M note segments which are matched with the notes in the N note segments but not matched with the playing duration as the note segments corresponding to the playing errors.
5. The method as claimed in claim 1, wherein the determining, based on the note model, a segment in which the notes of the M note segments do not match the notes of the N note segments as the note segment corresponding to the performance error comprises:
searching and obtaining the characteristic information of each note segment in the N note segments from the note model, and further determining first note characteristic information corresponding to the reference music score information;
extracting and obtaining the characteristic information of each note segment in the M note segments, and further obtaining second note characteristic information;
and matching the second note characteristic information with the first note characteristic information, and determining note segments with unmatched characteristic information as the note segments corresponding to the playing errors.
6. The method as claimed in claim 1, wherein the determining, based on the note model, a segment in which the notes of the M note segments do not match the notes of the N note segments as the note segment corresponding to the performance error comprises:
extracting and obtaining the characteristic information of each note segment in the M note segments, and further obtaining second note characteristic information;
identifying the second note characteristic information through the note model, and further obtaining first music score information corresponding to the first audio data;
and matching the first music score information with the reference music score information, and further determining note segments with different notes as note segments corresponding to the playing errors.
7. The method of any of claims 4-6, wherein said obtaining reference score information for use in generating said first audio data comprises:
responding to a selection operation of a user to acquire the reference music score information; or
scanning a music score in a paper textbook to obtain music score image information, and performing image recognition on the music score image information to obtain the reference music score information; or
performing acoustic recognition on second audio data associated with the first audio data to determine the reference music score information.
8. The method of claim 2, wherein the obtaining first score information for the first audio data comprises:
extracting first characteristic information of a first note segment, wherein the first note segment is any one note segment in the M note segments;
determining a note segment with similarity values of the feature information and the first feature information meeting a preset similarity condition from note segments contained in the note model;
taking notes of the note segments with the similarity values meeting preset similarity conditions as notes of the first note segment;
and determining the first score information based on the notes of the first note segment or the notes of the first note segment and the playing time length of the first note segment.
9. The method of claim 8, wherein after said determining the first score information based on the notes of the first note segment and the duration of the first note segment, the method further comprises:
searching and obtaining reference score information with the similarity value of the first score information being larger than a preset similarity value;
and providing the reference music score information as the recommended music score information of the first music score information to a user.
10. The method according to any one of claims 1-3, wherein the note model comprises the corresponding relation between at least one group of note segments and characteristic information under a specific tone.
11. A note recognition model building method is characterized by comprising the following steps:
acquiring standard voice data corresponding to each note segment;
extracting characteristic information of standard voice data of each note segment;
establishing a note model based on the corresponding relationship between the note segments and the feature information, wherein the note model comprises the corresponding relationship between at least one group of note segments and the feature information, and is used for identifying the performance errors in the audio data corresponding to the music, and the identifying the performance errors in the audio data corresponding to the music comprises the following steps:
obtaining first audio data corresponding to music, wherein the first audio data comprises M note segments, and M is a positive integer; acquiring reference music score information for generating the audio data, and determining N note segments contained in the reference music score information, wherein N is a positive integer; and determining the segments of the M note segments which do not match with the notes in the N note segments as the note segments corresponding to the playing errors based on the note models.
12. The method of claim 11, wherein the note segments comprise: a single note produced by a single key; and/or multiple notes produced by a combination of keys.
13. The method as claimed in claim 11, wherein said extracting feature information of the standard voice data of each note segment comprises: spectral features and/or frequency features of each note segment are extracted.
14. The method as claimed in claim 13, wherein said extracting feature information of the standard voice data of each note segment comprises:
converting the standard voice data from time-domain data into frequency-domain data;
dividing the frequency-domain data into at least one piece of sub-data;
and calculating the energy of each piece of sub-data, wherein the energy of each piece of sub-data is taken as the characteristic information of the corresponding note segment.
15. The method as claimed in claim 13, wherein said extracting feature information of the standard voice data of each note segment comprises:
converting the standard voice data from time-domain data into frequency-domain data;
dividing the frequency-domain data into at least one piece of sub-data;
calculating the energy of each piece of sub-data;
and determining the frequency of the sub-data whose energy matches a preset energy as the characteristic information of the corresponding note segment.
16. The method according to any one of claims 11-15, wherein the obtaining of standard voice data corresponding to each note comprises:
acquiring standard voice data corresponding to each note under a specific tone;
the establishing of the note model based on the corresponding relation between the note segments and the characteristic information, wherein the note model comprises the corresponding relation between at least one group of note segments and the characteristic information, and comprises the following steps:
and establishing the note model based on the corresponding relation between the note segments and the characteristic information under the specific tone, wherein the note model comprises the corresponding relation between the at least one group of note segments and the characteristic information under the specific tone.
17. A music recognition device, comprising:
the music playing device comprises an obtaining module, a playing module and a playing module, wherein the obtaining module is used for obtaining first audio data corresponding to music, the first audio data comprises M note segments, and M is a positive integer;
the recognition module is used for carrying out matching recognition on the first audio data based on the M note segments and the note models, wherein the note models comprise the corresponding relation between at least one group of note segments and characteristic information, and the characteristic information is the characteristic information of standard voice data of the note segments;
the performing matching identification on the first audio data comprises: identifying a performance error in the first audio data,
the identification module comprises: an acquisition unit configured to acquire reference score information used to generate the first audio data; a first determining unit, configured to determine N note segments included in the reference score information, where N is a positive integer; and a second determining unit, configured to determine, based on the note model, a segment in which the notes in the M note segments do not match the notes in the N note segments as a note segment corresponding to the performance error.
18. The apparatus of claim 17, wherein the identification module is configured to:
obtaining first score information of the first audio data based on the M note segments and a note model.
19. The apparatus of claim 17, wherein the performance error comprises at least one of a note error, a rhythm error, and an intonation error.
20. The apparatus of claim 17, wherein the apparatus further comprises:
the first determining module is used for determining the playing time length of each note segment in the N note segments;
and the second determining module is used for determining the segments of the M note segments which are matched with the notes in the N note segments but not matched with the playing duration as the note segments corresponding to the playing errors.
21. The apparatus of claim 17, wherein the second determining unit comprises:
an obtaining subunit, configured to search and obtain the feature information of each note segment in the N note segments from the note model, and further determine first note feature information corresponding to the reference score information;
the first extraction subunit is used for extracting and obtaining the characteristic information of each note segment in the M note segments so as to obtain second note characteristic information;
and the first matching subunit is configured to match the second note characteristic information with the first note characteristic information, and determine a note segment with unmatched characteristic information as the note segment corresponding to the performance error.
22. The apparatus of claim 17, wherein the second determining unit comprises:
a second extracting subunit, configured to extract and obtain feature information of each note segment of the M note segments, so as to obtain second note feature information;
the identification subunit is configured to identify the second note feature information through the note model, so as to obtain first score information corresponding to the first audio data;
and the second matching subunit is used for matching the first music score information with the reference music score information so as to determine note segments with different notes as the note segments corresponding to the playing errors.
23. The apparatus according to any one of claims 20 to 22, wherein the obtaining unit is configured to:
responding to a selection operation of a user to acquire the reference music score information; or
scanning a music score in a paper textbook to obtain music score image information, and performing image recognition on the music score image information to obtain the reference music score information; or
performing acoustic recognition on second audio data associated with the first audio data to determine the reference music score information.
24. The apparatus of claim 18, wherein the identification module comprises:
an extracting unit, configured to extract first feature information of a first note segment, where the first note segment is any one of the M note segments;
a third determining unit, configured to determine, from the note segments included in the note model, a note segment whose similarity value between the feature information and the first feature information satisfies a preset similarity condition;
a fourth determining unit, configured to use the note of the note segment with the similarity value satisfying a preset similarity condition as the note of the first note segment;
a fifth determining unit, configured to determine the first score information based on the notes of the first note segment or the notes of the first note segment and a playing time length of the first note segment.
25. The apparatus of claim 24, wherein the apparatus further comprises:
the searching module is used for searching and obtaining reference music score information of which the similarity value with the first music score information is greater than a preset similarity value;
and the third determining module is used for providing the reference music score information as the recommended music score information of the first music score information to a user.
26. The apparatus according to any one of claims 17 to 19, wherein the note model comprises the corresponding relation between at least one group of note segments and feature information under a specific tone.
27. A note recognition model creation apparatus, comprising:
the acquisition module is used for acquiring standard voice data corresponding to each note segment;
the extraction module is used for extracting the characteristic information of the standard voice data of each note segment;
the establishing module is used for establishing a note model based on the corresponding relation between the note segments and the characteristic information, the note model comprises the corresponding relation between at least one group of note segments and the characteristic information, the note model is used for identifying the playing errors in the audio data corresponding to the music,
wherein, the identifying the performance error in the audio data corresponding to the music comprises:
obtaining first audio data corresponding to music, wherein the first audio data comprises M note segments, and M is a positive integer; acquiring reference music score information for generating the audio data, and determining N note segments contained in the reference music score information, wherein N is a positive integer; and determining the segments of the M note segments which do not match with the notes in the N note segments as the note segments corresponding to the playing errors based on the note models.
28. The apparatus of claim 27, wherein the note segments comprise: a single note produced by a single key; and/or multiple notes produced by a combination of keys.
29. The apparatus of claim 27, wherein the extraction module is configured to: spectral features and/or frequency features of each note segment are extracted.
30. The apparatus of claim 29, wherein the extraction module comprises:
a first conversion unit, configured to convert the standard voice data from time-domain data into frequency-domain data;
a first dividing unit, configured to divide the frequency-domain data into at least one piece of sub-data;
and a first calculating unit, configured to calculate the energy of each piece of sub-data, wherein the energy of each piece of sub-data is taken as the characteristic information of the corresponding note segment.
31. The apparatus of claim 29, wherein the extraction module comprises:
a second conversion unit, configured to convert the standard voice data from time-domain data into frequency-domain data;
a second dividing unit, configured to divide the frequency-domain data into at least one piece of sub-data;
a second calculating unit, configured to calculate the energy of each piece of sub-data;
and a determining unit, configured to determine the frequency of the sub-data whose energy matches a preset energy as the characteristic information of the corresponding note segment.
32. The apparatus according to any one of claims 27 to 31, wherein the obtaining module is configured to obtain standard voice data corresponding to each note under a specific tone;
the establishing module is used for establishing the note model based on the corresponding relation between the note segments and the characteristic information under the specific tone, and the note model comprises the corresponding relation between the plurality of groups of note segments and the characteristic information under the specific tone.
33. An electronic device comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs including instructions for:
obtaining first audio data corresponding to music, wherein the first audio data comprises M note segments, and M is a positive integer;
matching and identifying the first audio data based on the M note segments and a note model, wherein the note model comprises the corresponding relation between at least one group of note segments and characteristic information, and the characteristic information is the characteristic information of standard voice data of the note segments;
the performing matching identification on the first audio data comprises: identifying performance errors in the first audio data, including: acquiring reference score information for generating the first audio data; determining N note segments contained in the reference music score information, wherein N is a positive integer; and determining the segments of the M note segments which do not match with the notes in the N note segments as the note segments corresponding to the playing errors based on the note models.
34. An electronic device comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs including instructions for:
acquiring standard voice data corresponding to each note segment;
extracting characteristic information of standard voice data of each note segment;
establishing a note model based on the corresponding relationship between the note segments and the feature information, wherein the note model comprises the corresponding relationship between at least one group of note segments and the feature information, and is used for identifying the performance errors in the audio data corresponding to the music, and the identifying the performance errors in the audio data corresponding to the music comprises the following steps:
obtaining first audio data corresponding to music, wherein the first audio data comprises M note segments, and M is a positive integer; acquiring reference music score information for generating the audio data, and determining N note segments contained in the reference music score information, wherein N is a positive integer; and determining the segments of the M note segments which do not match with the notes in the N note segments as the note segments corresponding to the playing errors based on the note models.
35. A computer-readable storage medium on which a computer program is stored which, when executed by a processor, carries out the method steps of any one of claims 1 to 10.
36. A computer-readable storage medium on which a computer program is stored which, when executed by a processor, carries out the method steps of any one of claims 11 to 16.
CN201610113604.6A 2016-02-29 2016-02-29 Music identification method, note identification model establishment method, device and electronic equipment Active CN107146631B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610113604.6A CN107146631B (en) 2016-02-29 2016-02-29 Music identification method, note identification model establishment method, device and electronic equipment

Publications (2)

Publication Number Publication Date
CN107146631A CN107146631A (en) 2017-09-08
CN107146631B true CN107146631B (en) 2020-11-10

Family

ID=59783025

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610113604.6A Active CN107146631B (en) 2016-02-29 2016-02-29 Music identification method, note identification model establishment method, device and electronic equipment

Country Status (1)

Country Link
CN (1) CN107146631B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108735230B (en) * 2018-05-10 2020-12-04 上海麦克风文化传媒有限公司 Background music identification method, device and equipment based on mixed audio
CN108664977B (en) * 2018-05-14 2020-12-25 中国地质大学(武汉) Staff image identification and coding method and system based on real-time video stream
CN109285560B (en) * 2018-09-28 2021-09-03 北京奇艺世纪科技有限公司 Music feature extraction method and device and electronic equipment
CN109801645B (en) * 2019-01-21 2021-11-26 深圳蜜蜂云科技有限公司 Musical tone recognition method
CN109754657A (en) * 2019-03-13 2019-05-14 何昶昕 A kind of intelligence white silk qin training mate software
CN111210841B (en) * 2020-01-13 2022-07-29 杭州矩阵之声科技有限公司 Musical instrument phoneme recognition model establishing method and musical instrument phoneme recognition method
CN111508454B (en) * 2020-04-09 2023-12-26 百度在线网络技术(北京)有限公司 Music score processing method and device, electronic equipment and storage medium
CN113076967B (en) * 2020-12-08 2022-09-23 无锡乐骐科技股份有限公司 Image and audio-based music score dual-recognition system

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2001011603A1 (en) * 1999-08-05 2001-02-15 Yamaha Corporation Music reproducing apparatus, music reproducing method and telephone terminal device
CN101271457A (en) * 2007-03-21 2008-09-24 中国科学院自动化研究所 Music retrieval method and device based on rhythm
JP2009192399A (en) * 2008-02-15 2009-08-27 Honda Motor Co Ltd Strain gauge and its manufacturing method
CN102446504A (en) * 2010-10-08 2012-05-09 华为技术有限公司 Voice/Music identifying method and equipment
CN102498514A (en) * 2009-08-04 2012-06-13 诺基亚公司 Method and apparatus for audio signal classification
CN103035235A (en) * 2011-09-30 2013-04-10 西门子公司 Method and device for transforming voice into melody
CN103824565A (en) * 2014-02-26 2014-05-28 曾新 Humming music reading method and system based on music note and duration modeling
CN104143324A (en) * 2014-07-14 2014-11-12 电子科技大学 Musical tone note identification method
CN105118352A (en) * 2015-09-14 2015-12-02 刘健婷 Full-automatic musical instrument playing error correction method
CN105280170A (en) * 2015-10-10 2016-01-27 北京百度网讯科技有限公司 Method and device for playing music score
CN105304073A (en) * 2014-07-09 2016-02-03 中国科学院声学研究所 Method and system for estimating multiple music notes of music played by percussion string instruments

Also Published As

Publication number Publication date
CN107146631A (en) 2017-09-08

Similar Documents

Publication Publication Date Title
CN107146631B (en) Music identification method, note identification model establishment method, device and electronic equipment
CN109801644B (en) Separation method, separation device, electronic equipment and readable medium for mixed sound signal
CN107291690B (en) Punctuation adding method and device and punctuation adding device
CN109147826B (en) Music emotion recognition method and device, computer equipment and computer storage medium
JP2020518861A (en) Speech recognition method, apparatus, device, and storage medium
CN108399914B (en) Voice recognition method and device
CN106796785A (en) Sample sound for producing sound detection model is verified
CN111583944A (en) Sound changing method and device
CN111508511A (en) Real-time sound changing method and device
US20140358566A1 (en) Methods and devices for audio processing
CN107909995B (en) Voice interaction method and device
CN108345581A (en) A kind of information identifying method, device and terminal device
CN110033784A (en) A kind of detection method of audio quality, device, electronic equipment and storage medium
CN112735429A (en) Method for determining lyric timestamp information and training method of acoustic model
CN111199730B (en) Voice recognition method, device, terminal and storage medium
CN109102813B (en) Voiceprint recognition method and device, electronic equipment and storage medium
CN112133295A (en) Speech recognition method, apparatus and storage medium
CN109102812B (en) Voiceprint recognition method and system and electronic equipment
CN113113040B (en) Audio processing method and device, terminal and storage medium
CN107037887B (en) Method and device for Chinese character input and electronic equipment
CN109524025B (en) Singing scoring method and device, electronic equipment and storage medium
CN111276113B (en) Method and device for generating key time data based on audio
CN113707122B (en) Method and device for constructing voice synthesis model
JP7230085B2 (en) Method and device, electronic device, storage medium and computer program for processing sound
CN107068125B (en) Musical instrument control method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant