CN105895079B - Voice data processing method and device - Google Patents

Voice data processing method and device

Info

Publication number
CN105895079B
Authority
CN
China
Prior art keywords
acoustic feature
voice data
feature information
processed
music score
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510926346.9A
Other languages
Chinese (zh)
Other versions
CN105895079A (en)
Inventor
刘方宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin Zhirong Innovation Technology Development Co., Ltd.
Original Assignee
Tianjin Zhirong Innovation Technology Development Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin Zhirong Innovation Technology Development Co., Ltd.
Priority to CN201510926346.9A
Publication of CN105895079A
Application granted
Publication of CN105895079B

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/02: Feature extraction for speech recognition; Selection of recognition unit
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00, characterised by the type of extracted parameters

Abstract

An embodiment of the invention provides a method and a device for processing voice data. The processing method includes the following steps: acquiring voice data to be processed; extracting corresponding acoustic feature information from the voice data to be processed; and searching a pre-stored reference acoustic feature score table according to the acoustic feature information to obtain the music score corresponding to the voice data to be processed. With the embodiments of the invention, the music score of the voice data can be acquired quickly, the transmissibility of the music score is enhanced, and the user experience is improved.

Description

Voice data processing method and device
Technical Field
The present invention relates to computer technologies, and in particular, to a method and an apparatus for processing voice data.
Background
With the popularization of the internet and advances in audio and video technology, people's daily entertainment has become increasingly rich; for example, people can sing at KTV, or sing for online audiences via live video streaming, and so on.
Music can be enjoyed and can cultivate a person's temperament, so many people love it. Music consists not only of lyrics but also of a music score; the music score is the carrier that accurately records music, a regular combination of written symbols recording the pitch and rhythm of a piece. The music score is an essential component of music.
However, people who have not studied music know only the lyrics; they cannot read or write a music score. A new musical idea that flashes through a user's mind is quickly forgotten, so the user can only record a few phrases with a recording device. This approach spreads the music poorly and gives a poor user experience.
Disclosure of Invention
The invention aims to provide a method of setting voice data to a music score, and a device implementing the method, which obtain the music score corresponding to voice data to be processed based on acoustic feature information extracted from that voice data, thereby quickly acquiring the music score of the voice data, enhancing the transmissibility of the music score, and improving the user experience.
According to one aspect of the present invention, a method for processing voice data is provided. The processing method comprises: acquiring voice data to be processed; extracting corresponding acoustic feature information from the voice data to be processed; and searching a pre-stored reference acoustic feature score table according to the acoustic feature information to obtain the music score corresponding to the voice data to be processed.
Preferably, searching the pre-stored reference acoustic feature score table according to the acoustic feature information and obtaining the music score corresponding to the voice data to be processed includes: searching the pre-stored reference acoustic feature score table for a reference acoustic feature information range value according to the acoustic feature information; and taking the music score corresponding to the found reference acoustic feature information range value as the music score corresponding to the voice data to be processed.
Preferably, the processing method further comprises: outputting the voice data to be processed and the acquired music score.
Preferably, extracting corresponding acoustic feature information from the voice data to be processed includes: dividing the voice data to be processed into a plurality of data segments of preset duration according to the sampling time of the voice data to be processed, and extracting corresponding acoustic feature information from each data segment.
Preferably, the reference acoustic feature score table includes scale, pitch, chromatic-scale and/or long-note information.
According to another aspect of the present invention, an apparatus for processing voice data is provided. The processing apparatus comprises: a voice data acquisition module, configured to acquire voice data to be processed; an acoustic feature acquisition module, configured to extract corresponding acoustic feature information from the voice data to be processed acquired by the voice data acquisition module; and a music score acquisition module, configured to search a pre-stored reference acoustic feature score table according to the acoustic feature information acquired by the acoustic feature acquisition module and obtain the music score corresponding to the voice data to be processed.
Preferably, the music score acquisition module includes: an information search unit, configured to search the pre-stored reference acoustic feature score table for a reference acoustic feature information range value according to the acoustic feature information acquired by the acoustic feature acquisition module; and a music score acquisition unit, configured to take the music score corresponding to the reference acoustic feature information range value found by the information search unit as the music score corresponding to the voice data to be processed.
Preferably, the processing apparatus further comprises a music score output module, configured to output the voice data to be processed and the acquired music score.
Preferably, the acoustic feature acquisition module is configured to divide the voice data to be processed into a plurality of data segments of preset duration according to the sampling time of the voice data to be processed acquired by the voice data acquisition module, and to extract corresponding acoustic feature information from each data segment.
Preferably, the reference acoustic feature score table includes scale, pitch, chromatic-scale and/or long-note information.
According to the voice data processing method and device provided by the embodiments of the invention, corresponding acoustic feature information is extracted from the acquired voice data to be processed, a pre-stored reference acoustic feature score table is searched according to the acoustic feature information, and the music score corresponding to the voice data to be processed is obtained. The music score of the voice data can thus be acquired quickly, the transmissibility of the music score is enhanced, and the user experience is improved.
Drawings
Fig. 1 is a flowchart illustrating a method of processing voice data according to a first embodiment of the present invention;
Fig. 2 is an exemplary diagram illustrating the home-page display interface of an application for voice data processing;
Fig. 3 is a flowchart illustrating a method of processing voice data according to a second embodiment of the present invention;
Fig. 4 is an exemplary diagram illustrating the home-page display interface of the voice data processing application containing a music score;
Fig. 5 is a logic block diagram of a voice data processing apparatus according to a third embodiment of the present invention;
Fig. 6 is another logic block diagram of the voice data processing apparatus according to the third embodiment of the present invention;
Fig. 7 is a further logic block diagram of the voice data processing apparatus according to the third embodiment of the present invention.
Detailed Description
The technical scheme can be applied to voice data processing scenarios such as a recording studio or live online video. Corresponding acoustic feature information is extracted from the acquired voice data to be processed, a pre-stored reference acoustic feature score table is searched according to the acoustic feature information, and the music score corresponding to the voice data to be processed is obtained; the music score of the voice data can thus be acquired quickly, the transmissibility of the music score is enhanced, and the user experience is improved.
Exemplary embodiments of the present invention are described in detail below with reference to the accompanying drawings.
Embodiment 1
Fig. 1 is a flowchart illustrating the method of processing voice data according to the first embodiment of the present invention. The method is performed by a computer system that includes the processing apparatus shown in Fig. 5.
Referring to Fig. 1, voice data to be processed is acquired in step S110.
A terminal device may have an application for processing voice data installed. When a user wants to set a song or tune sung by himself or by another user to a music score, he can click the application's shortcut icon; the terminal device then starts the application and displays its home page. As shown in Fig. 2, the home page may include a microphone icon, a voice input box, an output box, a help icon, and the like. The microphone icon has an activated state and an inactivated state: when the user clicks the icon, the terminal device turns on the microphone and collects the voice data the user inputs through it, and the icon is in the activated state; if the user inputs no voice data within a preset time, the terminal device may turn off the microphone, and the icon is in the inactivated state. The voice input box may display an icon for the voice data input by the user, or its text, so that the user can check whether the voice data collected by the terminal device is accurate; the output box may display the data obtained by processing the voice data. After the terminal device displays the home page of the application, the microphone can be turned on (the icon is then in the activated state); the user can point the terminal device's microphone toward the person singing the song or tune, and the terminal device collects the voice data input by the user, i.e., the voice data to be processed. The home page may further include a confirm key: the user can click it after input is complete, and the terminal device takes the voice data collected by the microphone as the voice data to be processed. Alternatively, a receiving-duration threshold may be preset; when the time since the user stopped inputting reaches that threshold, the voice data input before the stop is taken as the voice data to be processed.
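For illustration only (this sketch is not part of the original disclosure), the two ways of finalizing the voice data to be processed described above, a confirm key and a preset receiving-duration threshold, could be combined roughly as follows; the mic interface, chunk format, and threshold value are assumptions:

```python
# Hypothetical capture loop: stop on an explicit confirm, or when silence
# lasts as long as the preset receiving-duration threshold.
import time

RECEIVE_TIMEOUT_S = 2.0  # assumed receiving-duration threshold

def capture_voice(mic, confirm_pressed):
    """Collect audio chunks until the user confirms or stays silent too long."""
    chunks = []
    last_input = time.monotonic()
    while True:
        chunk = mic.read()  # assumed interface: a bytes chunk, or None if silent
        now = time.monotonic()
        if chunk is not None:
            chunks.append(chunk)
            last_input = now
        if confirm_pressed():                      # user clicked the confirm key
            break
        if now - last_input >= RECEIVE_TIMEOUT_S:  # silence reached the threshold
            break
    if not chunks:
        raise RuntimeError("voice data reception failed")  # prompt re-input
    return b"".join(chunks)

class FakeMic:
    """Stand-in for a real microphone driver, for demonstration only."""
    def __init__(self, chunks):
        self._chunks = iter(chunks)
    def read(self):
        time.sleep(0.01)
        return next(self._chunks, None)

print(capture_voice(FakeMic([b"la", b"la"]), lambda: False))  # b'lala'
```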
It should be noted that if the user's voice is too quiet for the terminal device to receive the voice data, the terminal device may issue a prompt signal indicating that reception of the voice data failed, prompting the user to input the voice data again.
In step S120, corresponding acoustic feature information is obtained from the voice data to be processed.
Specifically, the terminal device may first preprocess the voice data to be processed, for example by sampling (the sampling frequency may be 10 kHz or 16 kHz, etc.), anti-aliasing filtering, and removing the influence of glottal excitation and noise. It may then perform feature extraction on the processed voice data, that is, extract from the waveform one or more sets of parameters that describe the acoustic attributes of the voice data, such as average energy, zero-crossing count, formants, cepstrum, and linear prediction coefficients, for subsequent voice training and acquisition of acoustic feature information; the choice of parameters directly determines how accurate the acoustic feature information of the voice data will be. By analyzing these parameters, the acoustic feature information of the voice data, such as pitch information, timbre information, loudness information and/or scale information, can be obtained.
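As a rough illustration of the per-frame parameter extraction described above (not taken from the disclosure), the following NumPy sketch computes two of the named parameters, average energy and zero-crossing count, per short frame; the frame length and sample rate are assumptions:

```python
import numpy as np

def frame_features(signal: np.ndarray, sample_rate: int = 16000,
                   frame_ms: int = 20) -> np.ndarray:
    """Return one (average_energy, zero_crossing_count) row per frame."""
    frame_len = sample_rate * frame_ms // 1000
    n_frames = len(signal) // frame_len
    feats = []
    for i in range(n_frames):
        frame = signal[i * frame_len:(i + 1) * frame_len]
        avg_energy = float(np.mean(frame ** 2))
        # Count sign changes between consecutive samples.
        signs = np.signbit(frame).astype(np.int8)
        zero_crossings = int(np.abs(np.diff(signs)).sum())
        feats.append((avg_energy, zero_crossings))
    return np.array(feats)

# Toy usage: a 440 Hz tone sampled at 16 kHz.
t = np.arange(16000) / 16000.0
print(frame_features(np.sin(2 * np.pi * 440 * t))[:3])
```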
In step S130, a pre-stored reference acoustic feature score table is searched according to the acoustic feature information, and the music score corresponding to the voice data to be processed is obtained.
Specifically, the terminal device may store a reference acoustic feature score table in advance. The table may include many pieces of reference acoustic feature information; it can be obtained through extensive training on processed voice data, or composed of common standard acoustic feature information. The terminal device may compare each piece of reference acoustic feature information in the table with the extracted acoustic feature information and calculate the matching degree between them, determine the piece of reference acoustic feature information with the highest matching degree as the acoustic feature information corresponding to the voice data, analyze that piece, and set a corresponding music score based on its pitch information, timbre information, loudness information and/or scale information, thereby obtaining the music score corresponding to the voice data.
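A minimal sketch of the matching-degree lookup just described, under the assumption that features are numeric vectors and the matching degree is a simple inverse-distance score (the table contents are invented for illustration):

```python
import numpy as np

# Assumed example data: (reference feature vector, score fragment) pairs.
REFERENCE_TABLE = [
    (np.array([0.8, 2.0]), "do"),
    (np.array([0.5, 4.0]), "re"),
    (np.array([0.2, 8.0]), "mi"),
]

def best_score(features: np.ndarray) -> str:
    """Return the score attached to the reference entry with the highest matching degree."""
    def matching_degree(ref: np.ndarray) -> float:
        return 1.0 / (1.0 + float(np.linalg.norm(features - ref)))  # closer => higher
    _, score = max(REFERENCE_TABLE, key=lambda entry: matching_degree(entry[0]))
    return score

print(best_score(np.array([0.55, 3.5])))  # -> "re"
```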
According to the voice data processing method provided by this embodiment of the invention, corresponding acoustic feature information is extracted from the acquired voice data to be processed, the pre-stored reference acoustic feature score table is searched according to that information, and the music score corresponding to the voice data to be processed is obtained; the music score of the voice data can thus be acquired quickly, its transmissibility is enhanced, and the user experience is improved.
Embodiment 2
Fig. 3 is a flowchart illustrating a method of processing voice data according to the second embodiment of the present invention, which can be regarded as another specific implementation of the method of Fig. 1.
Referring to fig. 3, in step S310, voice data to be processed is acquired.
The content of step S310 is the same as that of step S110 in the first embodiment and is not repeated here.
In step S320, the voice data to be processed is divided into a plurality of data segments of preset duration according to the sampling time of the voice data to be processed, and corresponding acoustic feature information is acquired from each data segment.
Specifically, the speech signal corresponding to the voice data can generally be regarded as a short-time stationary signal: within a short interval (e.g., 10-20 ms) its spectral characteristics and some physical parameters can be treated as approximately constant, so the voice data to be processed can be analyzed with methods for stationary processes. Concretely, the voice data to be processed can be divided, according to sampling time, into a number of data segments of preset duration (e.g., 10-20 ms), and endpoint detection can be performed on each segment; endpoint detection means determining the start point and end point of speech within a stretch of data containing speech. Feature extraction can then be performed on each data segment, extracting from it one or more sets of parameters that describe its acoustic attributes, and the acoustic feature information of each segment can be obtained by analyzing those parameters.
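The segmentation and endpoint detection described above might look roughly like this; the energy threshold and durations are illustrative assumptions, and real endpoint detectors are considerably more elaborate:

```python
import numpy as np

def split_segments(signal: np.ndarray, sample_rate: int = 16000,
                   seg_ms: int = 20) -> list:
    """Cut the samples into fixed-duration segments (e.g., 10-20 ms)."""
    seg_len = sample_rate * seg_ms // 1000
    n = len(signal) // seg_len
    return [signal[i * seg_len:(i + 1) * seg_len] for i in range(n)]

def detect_endpoints(segments: list, energy_threshold: float = 1e-4):
    """Return (first, last) indices of segments whose energy clears the threshold."""
    voiced = [i for i, seg in enumerate(segments)
              if float(np.mean(seg ** 2)) > energy_threshold]
    return (voiced[0], voiced[-1]) if voiced else None
```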
In step S330, a reference acoustic feature information range value in the pre-stored reference acoustic feature profile table is searched according to the acoustic feature information.
Here, the reference acoustic feature score table includes scale, pitch, chromatic-scale and/or long-note information.
Specifically, the terminal device may store the reference acoustic feature score table in advance. The table may include many pieces of reference acoustic feature information such as scale, pitch, chromatic-scale and/or long-note information; different recognition ranges can be divided for the scale, pitch, chromatic scale and/or long notes according to a predetermined division standard, with corresponding range values set for each. The table can be obtained through extensive training on voice data, or composed of common standard acoustic feature information. The acoustic feature information of each data segment may be assigned a feature value according to a predetermined standard. For the acoustic feature information of a given data segment in the voice data, the terminal device may compare each piece of reference acoustic feature information in the table with that segment's acoustic feature information and find, in the table, the reference acoustic feature information range value within which the segment's feature value falls. The other data segments in the voice data can be processed in the same way, finding for each segment the reference acoustic feature information range value in which the feature value of its acoustic feature information lies.
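One plausible realization of the range-value search (boundaries and labels invented for illustration) is a bisection over sorted range boundaries:

```python
import bisect

# Assumed recognition ranges for a single feature, e.g. pitch in Hz:
# below 100 -> "low", 100-200 -> "mid-low", 200-300 -> "mid-high", above -> "high".
BOUNDARIES = [100.0, 200.0, 300.0]
RANGE_LABELS = ["low", "mid-low", "mid-high", "high"]

def find_range(feature_value: float) -> str:
    """Map a segment's feature value to the reference range it falls into."""
    return RANGE_LABELS[bisect.bisect_right(BOUNDARIES, feature_value)]

print(find_range(150.0))  # -> "mid-low"
```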
In step S340, the music score corresponding to the found reference acoustic feature information range value is taken as the music score corresponding to the voice data to be processed.
Specifically, the matching degree between the acoustic feature information of a data segment and each piece of reference acoustic feature information is obtained by calculation, and the piece of reference acoustic feature information with the highest matching degree can be determined as the acoustic feature information corresponding to the segment. The terminal device can analyze the reference acoustic feature information corresponding to each found reference acoustic feature information range value and set a corresponding music score based on the scale, pitch, chromatic-scale and/or long-note information it contains, thereby obtaining the music score corresponding to that data segment. The other data segments in the voice data can be processed in the same way, yielding the music score corresponding to each segment. The position of each data segment in the voice data can then be determined from the segment's start point and end point, and the corresponding music scores can be ordered according to the position of each segment to obtain the music score corresponding to the voice data.
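A sketch of the final assembly step, assuming each segment carries its start and end points and a score fragment (the fragment notation is invented):

```python
from dataclasses import dataclass

@dataclass
class SegmentScore:
    start: float    # segment start point within the voice data (seconds)
    end: float      # segment end point
    fragment: str   # score fragment set for this segment

def assemble_score(segment_scores: list) -> str:
    """Order the fragments by segment position and join them into one score."""
    ordered = sorted(segment_scores, key=lambda s: s.start)
    return " ".join(s.fragment for s in ordered)

print(assemble_score([SegmentScore(0.04, 0.06, "mi"),
                      SegmentScore(0.00, 0.02, "do"),
                      SegmentScore(0.02, 0.04, "re")]))  # -> "do re mi"
```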
In addition to the above, the voice data can be set to music in various other ways, for example through a voice-to-score model trained before the voice data is processed. A technician may obtain voice data through various channels (such as purchasing it from users) before developing the voice-to-score mechanism and then train the model with it. Specifically, the parameters of several voice-to-score models may be set; after the voice data is obtained, the relevant parameters are extracted from it and the acoustic feature information of the voice data is derived from those parameters, after which each frame of voice data can be state-labeled. Concretely, a neural network model can be set up and the speech data divided into three layers; using a neural network model over contextual acoustic features, the acoustic feature information of the head, middle, and tail layers is extracted from the speech data. The acoustic feature information of the three layers can serve as the sample feature space, the acoustic feature information corresponding to the sample feature space is obtained from it, and the acoustic feature information corresponding to the middle layer can be used as the label. An artificial neural network topology can serve as the core of the recognition model; it may comprise three layers, such as an input layer, a hidden layer, and an output layer. First, the network is initialized: the connection weight between every two directly connected neurons is initialized to a very small random number (for example, between -1.0 and 1.0), and each neuron's bias is likewise initialized to a random number. The output of each neuron is then computed from the network input layer fed with the input speech data; every neuron is computed in the same way, as a linear combination of its inputs. Finally, the actual output, i.e., the corresponding score spectrum, is compared with the expected output to obtain the error of each output unit. The obtained error is propagated back from the output layer toward the input layer; the error of a unit in a given layer is computed from the errors of all units in the next layer connected to it, and the network weights and neuron biases are adjusted accordingly. For each piece of voice data, if the final output error is within a preset acceptable range or a preset iteration threshold is reached, processing continues with the next piece of voice data; training continuously in this way yields the voice-to-score model. After the terminal device obtains the voice data to be processed, it can input the voice data into this model and obtain the voice-to-score result.
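To make the training procedure above concrete, here is a minimal one-hidden-layer network trained by backpropagation in NumPy; the layer sizes, learning rate, activation, and toy data are all illustrative assumptions, not values from the disclosure:

```python
import numpy as np

rng = np.random.default_rng(0)

def init_layer(n_in: int, n_out: int):
    # Weights and biases start as small random numbers, as described above.
    w = rng.uniform(-1.0, 1.0, (n_in, n_out)) * 0.1
    b = rng.uniform(-1.0, 1.0, n_out) * 0.1
    return w, b

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Illustrative shapes: 13 acoustic features in, 8 hidden units, 5 note classes out.
W1, b1 = init_layer(13, 8)
W2, b2 = init_layer(8, 5)
LR = 0.5

def train_step(x, target):
    global W1, b1, W2, b2
    # Forward pass: each neuron is a linear combination of its inputs,
    # followed by a squashing nonlinearity.
    h = sigmoid(x @ W1 + b1)
    y = sigmoid(h @ W2 + b2)
    # Output-unit errors, propagated back toward the input layer.
    err_out = (target - y) * y * (1 - y)
    err_hid = (err_out @ W2.T) * h * (1 - h)
    # Adjust weights and biases from the propagated errors.
    W2 += LR * np.outer(h, err_out); b2 += LR * err_out
    W1 += LR * np.outer(x, err_hid); b1 += LR * err_hid
    return float(np.mean((target - y) ** 2))

# Toy usage: random "frame features" mapped to a one-hot "note" label.
x, t = rng.random(13), np.eye(5)[2]
for _ in range(200):
    loss = train_step(x, t)
print(f"final squared error: {loss:.4f}")
```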
In step S350, the voice data to be processed and the acquired music score are output.
Specifically, as shown in Fig. 4, the terminal device may display the text of the voice data to be processed and the acquired music score at a preset position in the output box on the home page of the voice data processing application; in Fig. 4, "XXXX" represents the text of the voice data and "a a …" represents the music score.
It should be noted that the text of the voice data to be processed and the acquired music score may be displayed in correspondence; for example, the first character of the text may correspond to the first note of the music score, the second character of the text to the second and third notes, and so on.
In addition, the home page of the voice data processing application may also include a key for playing the music score; when the user wants to listen to it, he can click the key and the terminal device plays the music score. To improve the user experience, the voice data to be processed that the user input may be played while the music score is played, so that the user can judge from the playback how well the voice data matches the music score.
According to the voice data processing method provided by this embodiment of the invention, on the one hand, the acquired voice data to be processed is divided into a plurality of data segments of preset duration, corresponding acoustic feature information is extracted from each segment, and the pre-stored reference acoustic feature score table is searched according to that information to obtain the music score corresponding to the voice data to be processed; the music score of the voice data can thus be acquired quickly, its transmissibility is enhanced, and the user experience is improved. On the other hand, the voice data to be processed and the acquired music score are output and displayed, and the music score can be played, so the user can judge how well the voice data matches the music score, further improving the user experience.
Embodiment 3
Based on the same technical concept, Fig. 5 is a logic block diagram of a voice data processing apparatus according to the third embodiment of the present invention. Referring to Fig. 5, the processing apparatus includes a voice data acquisition module 510, an acoustic feature acquisition module 520, and a music score acquisition module 530; the voice data acquisition module 510 is connected to the acoustic feature acquisition module 520, and the acoustic feature acquisition module 520 is connected to the music score acquisition module 530.
The voice data acquisition module 510 is configured to acquire the voice data to be processed.
The acoustic feature acquisition module 520 is configured to extract corresponding acoustic feature information from the voice data to be processed acquired by the voice data acquisition module 510.
The music score acquisition module 530 is configured to search the pre-stored reference acoustic feature score table according to the acoustic feature information acquired by the acoustic feature acquisition module 520 and obtain the music score corresponding to the voice data to be processed.
According to the voice data processing apparatus provided by this embodiment of the invention, corresponding acoustic feature information is extracted from the acquired voice data to be processed, the pre-stored reference acoustic feature score table is searched according to that information, and the music score corresponding to the voice data to be processed is obtained; the music score of the voice data can thus be acquired quickly, its transmissibility is enhanced, and the user experience is improved.
Further, building on the embodiment shown in Fig. 5, the music score acquisition module 530 shown in Fig. 6 includes: an information search unit 531, configured to search the pre-stored reference acoustic feature score table for a reference acoustic feature information range value according to the acoustic feature information acquired by the acoustic feature acquisition module 520; and a music score acquisition unit 532, configured to take the music score corresponding to the reference acoustic feature information range value found by the information search unit 531 as the music score corresponding to the voice data to be processed.
Further, building on the embodiment shown in Fig. 6, the processing apparatus shown in Fig. 7 further includes a music score output module 540, configured to output the voice data to be processed and the acquired music score.
Preferably, the acoustic feature acquisition module 520 is configured to divide the voice data to be processed into a plurality of data segments of preset duration according to the sampling time of the voice data to be processed acquired by the voice data acquisition module 510, and to extract corresponding acoustic feature information from each data segment.
Preferably, the reference acoustic feature score table includes scale, pitch, chromatic-scale and/or long-note information.
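For illustration, the module structure of Figs. 5-7 could be wired up as below; the method bodies are placeholders and only the composition follows the text:

```python
class VoiceDataAcquisitionModule:            # module 510
    def acquire(self) -> bytes:
        raise NotImplementedError            # e.g., read from the microphone

class AcousticFeatureAcquisitionModule:      # module 520
    def extract(self, voice_data: bytes) -> list:
        raise NotImplementedError            # segment the data, extract features

class ScoreAcquisitionModule:                # module 530
    def lookup(self, features: list) -> str:
        raise NotImplementedError            # search the reference score table

class ScoreOutputModule:                     # module 540
    def output(self, voice_data: bytes, score: str) -> None:
        raise NotImplementedError            # display/play voice data and score

class ProcessingApparatus:
    """Modules connected in sequence, mirroring Figs. 5 and 7."""
    def __init__(self):
        self.voice = VoiceDataAcquisitionModule()
        self.features = AcousticFeatureAcquisitionModule()
        self.score = ScoreAcquisitionModule()
        self.out = ScoreOutputModule()

    def run(self) -> None:
        data = self.voice.acquire()
        score = self.score.lookup(self.features.extract(data))
        self.out.output(data, score)
```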
Further, according to the voice data processing apparatus provided by this embodiment of the invention, on the one hand, the acquired voice data to be processed is divided into a plurality of data segments of preset duration, corresponding acoustic feature information is extracted from each segment, and the pre-stored reference acoustic feature score table is searched according to that information to obtain the music score corresponding to the voice data to be processed; the music score of the voice data can thus be acquired quickly, its transmissibility is enhanced, and the user experience is improved. On the other hand, the voice data to be processed and the acquired music score are output and displayed, and the music score can be played, so the user can judge how well the voice data matches the music score, further improving the user experience.
It should be noted that, depending on implementation requirements, each step/component described in this application can be split into more steps/components, and two or more steps/components or partial operations of steps/components can be combined into a new step/component to achieve the purpose of the present invention.
The above method according to the present invention can be implemented in hardware or firmware, or as software or computer code storable on a recording medium such as a CD-ROM, RAM, floppy disk, hard disk, or magneto-optical disk, or as computer code originally stored on a remote recording medium or a non-transitory machine-readable medium and downloaded over a network for storage on a local recording medium, so that the method described herein can be processed by such software on a recording medium using a general-purpose computer, a dedicated processor, or programmable or dedicated hardware such as an ASIC or FPGA. It will be appreciated that the computer, processor, microprocessor controller, or programmable hardware includes memory components (e.g., RAM, ROM, flash memory) that can store or receive software or computer code which, when accessed and executed by the computer, processor, or hardware, implements the processing methods described herein. Further, when a general-purpose computer accesses code for implementing the processing shown herein, execution of the code transforms the general-purpose computer into a special-purpose computer for performing that processing.
The above description covers only specific embodiments of the present invention, but the scope of the invention is not limited to them; any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the invention shall be covered by its scope of protection. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (2)

1. A method for processing voice data, the method comprising:
acquiring voice data to be processed;
acquiring corresponding acoustic feature information from the voice data to be processed, which includes: dividing the voice data to be processed into a plurality of data segments of preset duration according to the sampling time of the voice data to be processed, and acquiring the corresponding acoustic feature information from each data segment;
searching a pre-stored reference acoustic feature score table according to the acoustic feature information to obtain a music score corresponding to the voice data to be processed; and
outputting the voice data to be processed and the acquired music score;
wherein searching the pre-stored reference acoustic feature score table according to the acoustic feature information and obtaining the music score corresponding to the voice data to be processed includes:
searching the pre-stored reference acoustic feature score table for a reference acoustic feature information range value according to the acoustic feature information, which includes: comparing each piece of reference acoustic feature information in the reference acoustic feature score table with the acoustic feature information of each data segment in the voice data to be processed, and finding in the reference acoustic feature score table the reference acoustic feature information range value within which the feature value of the acoustic feature information of each data segment falls; the reference acoustic feature information includes scale, pitch, chromatic-scale and/or long-note information, and different recognition ranges are divided for the scale, pitch, chromatic scale and/or long notes according to a preset division standard; and
taking the music score corresponding to the found reference acoustic feature information range value as the music score corresponding to the voice data to be processed, which includes: calculating the matching degree between the acoustic feature information of each data segment and each piece of reference acoustic feature information, determining the piece of reference acoustic feature information with the highest matching degree as the acoustic feature information corresponding to the data segment, analyzing the reference acoustic feature information corresponding to each found reference acoustic feature information range value, and setting a corresponding music score based on the scale, pitch, chromatic-scale and/or long-note information in that reference acoustic feature information, thereby obtaining the music score corresponding to each data segment; and determining the position of each data segment in the voice data according to the start point and end point of the segment, and ordering the corresponding music scores according to the position of each data segment to obtain the music score corresponding to the voice data.
2. A processing apparatus of voice data, the processing apparatus comprising:
a voice data acquisition module, configured to acquire voice data to be processed;
an acoustic feature acquisition module, configured to acquire corresponding acoustic feature information from the voice data to be processed acquired by the voice data acquisition module, which includes: dividing the voice data to be processed into a plurality of data segments of preset duration according to the sampling time of the voice data to be processed, and acquiring the corresponding acoustic feature information from each data segment;
a music score acquisition module, configured to search a pre-stored reference acoustic feature score table according to the acoustic feature information acquired by the acoustic feature acquisition module and obtain a music score corresponding to the voice data to be processed; and
a music score output module, configured to output the voice data to be processed and the acquired music score;
wherein the music score acquisition module includes:
an information search unit, configured to search the pre-stored reference acoustic feature score table for a reference acoustic feature information range value according to the acoustic feature information acquired by the acoustic feature acquisition module, which includes: comparing each piece of reference acoustic feature information in the reference acoustic feature score table with the acoustic feature information of each data segment in the voice data to be processed, and finding in the reference acoustic feature score table the reference acoustic feature information range value within which the feature value of the acoustic feature information of each data segment falls; the reference acoustic feature information includes scale, pitch, chromatic-scale and/or long-note information, and different recognition ranges are divided for the scale, pitch, chromatic scale and/or long notes according to a preset division standard; and
a music score acquisition unit, configured to take the music score corresponding to the reference acoustic feature information range value found by the information search unit as the music score corresponding to the voice data to be processed, which includes: calculating the matching degree between the acoustic feature information of each data segment and each piece of reference acoustic feature information, determining the piece of reference acoustic feature information with the highest matching degree as the acoustic feature information corresponding to the data segment, analyzing the reference acoustic feature information corresponding to each found reference acoustic feature information range value, and setting a corresponding music score based on the scale, pitch, chromatic-scale and/or long-note information in that reference acoustic feature information, thereby obtaining the music score corresponding to each data segment; and determining the position of each data segment in the voice data according to the start point and end point of the segment, and ordering the corresponding music scores according to the position of each data segment to obtain the music score corresponding to the voice data.
CN201510926346.9A 2015-12-14 2015-12-14 Voice data processing method and device Active CN105895079B (en)

Priority Applications (1)

Application Number: CN201510926346.9A
Priority Date: 2015-12-14
Filing Date: 2015-12-14
Title: Voice data processing method and device

Publications (2)

CN105895079A, published 2016-08-24
CN105895079B, published 2022-07-29

Family

ID=57002399

Family Applications (1)

Application Number: CN201510926346.9A
Title: Voice data processing method and device
Status: Active

Country Status (1)

CN: CN105895079B

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108986841B (en) * 2018-08-08 2023-07-11 百度在线网络技术(北京)有限公司 Audio information processing method, device and storage medium
CN109920449B (en) * 2019-03-18 2022-03-04 广州市百果园网络科技有限公司 Beat analysis method, audio processing method, device, equipment and medium
CN111081248A (en) * 2019-12-27 2020-04-28 安徽仁昊智能科技有限公司 Artificial intelligence speech recognition device
CN113823281B (en) * 2020-11-24 2024-04-05 北京沃东天骏信息技术有限公司 Voice signal processing method, device, medium and electronic equipment

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2003005242A1 (en) * 2001-03-23 2003-01-16 Kent Ridge Digital Labs Method and system of representing musical information in a digital representation for use in content-based multimedia information retrieval
TWI342009B (en) * 2007-12-31 2011-05-11 Inventec Appliances Corp Method of converting voice into music score
CN104978962B (en) * 2014-04-14 2019-01-18 科大讯飞股份有限公司 Singing search method and system
CN104992712B (en) * 2015-07-06 2019-02-12 成都云创新科技有限公司 It can identify music automatically at the method for spectrum

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101271457A (en) * 2007-03-21 2008-09-24 中国科学院自动化研究所 Music retrieval method and device based on rhythm
CN101930732A (en) * 2010-06-29 2010-12-29 中兴通讯股份有限公司 Music producing method and device based on user input voice and intelligent terminal

Also Published As

Publication number Publication date
CN105895079A (en) 2016-08-24

Similar Documents

Publication Publication Date Title
CN102664016B (en) Singing evaluation method and system
US8535236B2 (en) Apparatus and method for analyzing a sound signal using a physiological ear model
Mion et al. Score-independent audio features for description of music expression
CN105895079B (en) Voice data processing method and device
JP6060867B2 (en) Information processing apparatus, data generation method, and program
CN106898339B (en) Song chorusing method and terminal
CN106971743B (en) User singing data processing method and device
KR101325722B1 (en) Apparatus for generating musical note fit in user's song and method for the same
CN113782032A (en) Voiceprint recognition method and related device
Pikrakis et al. Tracking melodic patterns in flamenco singing by analyzing polyphonic music recordings
TWI299855B (en) Detection method for voice activity endpoint
Pendekar et al. Harmonium raga recognition
CN105244021B (en) Conversion method of the humming melody to MIDI melody
Rao Audio signal processing
JP2010060846A (en) Synthesized speech evaluation system and synthesized speech evaluation method
JP6098422B2 (en) Information processing apparatus and program
Tsai et al. Automatic Singing Performance Evaluation Using Accompanied Vocals as Reference Bases.
JP6252420B2 (en) Speech synthesis apparatus and speech synthesis system
CN110299049B (en) Intelligent display method of electronic music score
CN108182946B (en) Vocal music mode selection method and device based on voiceprint recognition
CN112837698A (en) Singing or playing evaluation method and device and computer readable storage medium
JP2008040258A (en) Musical piece practice assisting device, dynamic time warping module, and program
JP6365483B2 (en) Karaoke device, karaoke system, and program
KR20110076314A (en) Apparatus and method for estimating a musical performance
KR101236435B1 (en) Recognizable karaoke player of voice of words and method of controlling the same

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20220714

Address after: 300467 917-2, Chuangzhi building, 482 Zhongxin eco city, Binhai New Area, Tianjin

Applicant after: Tianjin Zhirong Innovation Technology Development Co.,Ltd.

Address before: 100025 LETV building, 105 yaojiayuan Road, Chaoyang District, Beijing

Applicant before: LE SHI INTERNET INFORMATION & TECHNOLOGY CORP., BEIJING

GR01 Patent grant