CN106548784A - Voice data evaluation method and system - Google Patents

Voice data evaluation method and system

Info

Publication number
CN106548784A
CN106548784A CN201510586445.7A CN201510586445A
Authority
CN
China
Prior art keywords
voice data
fundamental frequency
note
quantization
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510586445.7A
Other languages
Chinese (zh)
Other versions
CN106548784B (en)
Inventor
傅鸿城
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Kugou Computer Technology Co Ltd
Original Assignee
Guangzhou Kugou Computer Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Kugou Computer Technology Co Ltd filed Critical Guangzhou Kugou Computer Technology Co Ltd
Priority to CN201510586445.7A priority Critical patent/CN106548784B/en
Priority to PCT/CN2016/083043 priority patent/WO2017045428A1/en
Publication of CN106548784A publication Critical patent/CN106548784A/en
Application granted granted Critical
Publication of CN106548784B publication Critical patent/CN106548784B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination

Abstract

The embodiments of the invention disclose a voice data evaluation method and system, applied to the field of information processing. In the method of this embodiment, the voice data evaluation system quantizes, respectively, the voice data contained in multiple pieces of audio data of one accompaniment, then obtains and stores the optimal voice data of the accompaniment according to the multiple pieces of quantized voice data. The preset standard data, i.e. the optimal voice data, is thus generated automatically by the voice data evaluation system, which makes it convenient for the system to evaluate the to-be-evaluated voice data of the accompaniment. Compared with the prior art, in which the standard data must be produced manually offline and preset in the system, the method of this embodiment has low cost, low difficulty, and high timeliness.

Description

Voice data evaluation method and system
Technical field
The present invention relates to the field of information processing, and in particular to a voice data evaluation method and system.
Background art
An existing evaluation system for voice data (for example, songs) can evaluate the voice data uploaded by a user and provide the evaluation result to the user in the form of a score. Specifically, the system compares the uploaded voice data with preset standard data and scores it according to the comparison result.
In a traditional voice data evaluation system, however, the preset standard data is produced manually offline and then preset in the system, which is costly, difficult, and poor in timeliness.
Summary of the invention
The embodiments of the present invention provide a voice data evaluation method and system, so that the preset standard data is generated automatically by the voice data evaluation system.
An embodiment of the present invention provides a voice data evaluation method, including:
quantizing, respectively, the voice data contained in multiple pieces of audio data of one accompaniment to obtain multiple pieces of quantized voice data;
clustering the multiple pieces of quantized voice data to obtain the optimal voice data of the accompaniment;
storing the optimal voice data of the accompaniment, the optimal voice data being used to evaluate the to-be-evaluated voice data of the accompaniment.
An embodiment of the present invention also provides a voice data evaluation system, including:
a first quantization unit, configured to quantize, respectively, the voice data contained in multiple pieces of audio data of one accompaniment to obtain multiple pieces of quantized voice data;
an optimum acquisition unit, configured to cluster the multiple pieces of quantized voice data obtained by the first quantization unit to obtain the optimal voice data of the accompaniment;
a storage unit, configured to store the optimal voice data of the accompaniment obtained by the optimum acquisition unit, the optimal voice data being used to evaluate the to-be-evaluated voice data of the accompaniment.
It can be seen that, in the method of this embodiment, the voice data evaluation system quantizes the voice data contained in multiple pieces of audio data of one accompaniment, then obtains the optimal voice data of the accompaniment according to the multiple pieces of quantized voice data and stores it. The preset standard data, i.e. the optimal voice data, is thus generated automatically by the voice data evaluation system, which makes it convenient for the system to evaluate the to-be-evaluated voice data of the accompaniment. Compared with the prior art, in which the standard data must be produced manually offline and preset in the system, the method of this embodiment has low cost, low difficulty, and high timeliness.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings required for describing the embodiments or the prior art are briefly introduced below. Apparently, the accompanying drawings in the following description show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from these accompanying drawings without creative effort.
Fig. 1 is a flowchart of the method for presetting standard data in a voice data evaluation method provided by an embodiment of the present invention;
Fig. 2 is a flowchart of the method for evaluating voice data in an embodiment of the present invention;
Fig. 3 is a flowchart of the method for quantizing the voice data contained in audio data in an embodiment of the present invention;
Fig. 4 is a flowchart of the method for obtaining the optimal voice data in an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a voice data evaluation system provided by an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of another voice data evaluation system provided by an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of another voice data evaluation system provided by an embodiment of the present invention;
Fig. 8 is a schematic structural diagram of another voice data evaluation system provided by an embodiment of the present invention;
Fig. 9 is a flowchart of extracting voice data from audio data in an application embodiment of the present invention;
Fig. 10 is a flowchart of normalizing multiple fundamental frequency subsequences in an application embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings in the embodiments of the present invention. Apparently, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
The terms "first", "second", "third", "fourth" and so on (if present) in the specification, the claims and the accompanying drawings are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that data used in this way may be interchanged where appropriate, so that the embodiments of the present invention described herein can be implemented in an order other than that illustrated or described here. In addition, the terms "comprise" and "have" and any variants thereof are intended to cover a non-exclusive inclusion; for example, a process, method, system, product or device that contains a series of steps or units is not necessarily limited to those steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product or device.
An embodiment of the present invention provides a voice data evaluation method, which is mainly performed by a voice data evaluation system. The flowchart is shown in Fig. 1 and includes the following steps:
Step 101: quantize, respectively, the voice data contained in multiple pieces of audio data of one accompaniment to obtain multiple pieces of quantized voice data.
It can be understood that, for one accompaniment, any user can sing along with the accompaniment at least once to produce at least one piece of audio data, and a piece of audio data contains both the accompaniment music and the user's singing data (i.e. the voice data). In this embodiment, each user can upload his or her own audio data to the voice data evaluation system by operating the system, and the system then processes the voice data contained in each user's audio data. Therefore, the voice data evaluation system first extracts the voice data from the audio data and then quantizes it. The quantization here standardizes the audio data, expressing it as the standardized data defaulted by the voice data evaluation system, to facilitate subsequent processing.
Step 102: cluster the multiple pieces of quantized voice data to obtain the optimal voice data of the accompaniment. Clustering here refers to the analysis process of grouping a set of physical or abstract objects into multiple classes composed of similar objects. In a specific implementation, for each piece of quantized voice data, the distances between that piece and the other pieces of quantized voice data are calculated respectively, and the optimal voice data can then be obtained according to the calculated distances.
Step 103: store the optimal voice data of the accompaniment. The optimal voice data serves as the standard data for evaluating voice data and is mainly used to evaluate the to-be-evaluated voice data of this accompaniment.
Through the above steps 101 to 103, the system automatically presets the standard data used to evaluate voice data. Further, as shown in Fig. 2, when the to-be-evaluated voice data of this accompaniment is evaluated, the following steps can be performed:
Step 201: quantize the to-be-evaluated voice data of the accompaniment to obtain quantized to-be-evaluated voice data. Specifically, after a user sings along with the accompaniment, a piece of audio data is obtained and uploaded to the voice data evaluation system for evaluation. The evaluation system first extracts the to-be-evaluated voice data from the uploaded audio data and then quantizes it using the method of step 101 above.
Step 202: calculate a first distance between the quantized to-be-evaluated voice data obtained in step 201 and the optimal voice data; the first distance may be a Euclidean distance.
Step 203: determine the evaluation score of the to-be-evaluated voice data according to the first distance calculated in step 202.
Specifically, the voice data evaluation system may quantize the first distance to a score between 0 and 100 as the evaluation score. It first obtains, among the multiple pieces of quantized voice data obtained in step 101, the second distance, i.e. the maximum distance to the optimal voice data obtained in step 102. Assuming the first distance is k and the second distance is m, the evaluation score determined by the voice data evaluation system is 100*(m-k)/m.
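A minimal Python sketch of this scoring rule (the function names and the Euclidean-distance helper are illustrative assumptions; the patent text only fixes the 100*(m-k)/m mapping):

```python
import math

def euclidean_distance(a, b):
    """Distance between two equal-length quantized voice data vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def evaluation_score(candidate, optimal, quantized_pieces):
    """Map the distance to the optimal voice data onto a 0-100 score.

    k: first distance (quantized to-be-evaluated voice data vs. optimal voice data)
    m: second distance (largest distance between any stored quantized piece
       and the optimal voice data)
    """
    k = euclidean_distance(candidate, optimal)
    m = max(euclidean_distance(piece, optimal) for piece in quantized_pieces)
    return 100.0 * (m - k) / m if m > 0 else 0.0
```

Under this rule a perfect match (k = 0) scores 100, while a recording as far from the optimal voice data as the worst stored piece (k = m) scores 0.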
Further, the voice data evaluation system may also output the evaluation score, and may additionally output the positions where the to-be-evaluated voice data is inconsistent with the optimal voice data, so that the user can intuitively see the defects in his or her singing and improve the relevant passages accordingly. Since the distances between the positions of the to-be-evaluated voice data and the corresponding positions of the optimal voice data are calculated when the two are compared, if the distance corresponding to a certain position is greater than a preset value, the to-be-evaluated voice data is inconsistent with the optimal voice data at that position.
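A small companion sketch for flagging the inconsistent positions; the per-position distances and the preset threshold are assumed inputs, not values given in the text:

```python
def inconsistent_positions(per_position_distances, preset_value=1.0):
    """Return the indices where the to-be-evaluated voice data deviates
    from the optimal voice data by more than the preset value."""
    return [i for i, d in enumerate(per_position_distances) if d > preset_value]
```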
It can be seen that, in the method of this embodiment, the voice data evaluation system quantizes the voice data contained in multiple pieces of audio data of one accompaniment, then obtains the optimal voice data of the accompaniment according to the multiple pieces of quantized voice data and stores it. The preset standard data, i.e. the optimal voice data, is thus generated automatically by the voice data evaluation system, which makes it convenient for the system to evaluate the to-be-evaluated voice data of the accompaniment. Compared with the prior art, in which the standard data must be produced manually offline and preset in the system, the method of this embodiment has low cost, low difficulty, and high timeliness.
As shown in Fig. 3, in a specific embodiment, when performing step 101 above, the voice data evaluation system quantizes the voice data contained in a certain piece of audio data to obtain one piece of quantized voice data through the following steps:
Step A1: extract the fundamental frequency information of the audio data. Since the frequency produced by vocal cord vibration generates a large number of overtones after being filtered by the vocal tract, to facilitate subsequent operations, the voice data evaluation system extracts from the audio data the data that directly reflects the vocal cord vibration frequency, i.e. the fundamental frequency information. The fundamental tone determines the pitch of the whole note, so the fundamental frequency information extracted in this step can represent the voice data contained in a piece of audio data.
The extracted fundamental frequency information generally includes a fundamental frequency sequence, which consists of multiple fundamental frequency points. Each fundamental frequency point corresponds to a time point and carries a fundamental frequency value, i.e. a frequency value indicating the voice intensity at the corresponding time point.
Step B1: convert the fundamental frequency information so that the fundamental frequency values contained in the converted fundamental frequency information are small-range values. Here, "small-range" is relative to the original fundamental frequency values: the converted values lie within a certain range smaller than the original values, which makes subsequent processing easier.
Specifically, the voice data evaluation system may directly convert the multiple fundamental frequency values in the fundamental frequency information into small-range values, for example by taking the logarithm of each fundamental frequency value.
Alternatively, the voice data evaluation system may first apply a second preprocessing to the fundamental frequency information and then convert the fundamental frequency values contained in the second-preprocessed fundamental frequency information into small-range values. The second preprocessing includes at least one of the following: low-pass filtering, compression, zeroing of abnormal fundamental frequency values, and zero fundamental frequency filling. Low-pass filtering may be median filtering or mean filtering. If median filtering is used and the length of the fundamental frequency sequence is less than a preset length (for example 35 frames), the voice data evaluation system performs median filtering according to the actual length of the fundamental frequency sequence; if the length of the fundamental frequency sequence is greater than or equal to the preset length, the system performs median filtering on short segments of a points each, for example a 10-point median filter per frame.
The compression processing may include: for the fundamental frequency sequence contained in the fundamental frequency information, keeping the fundamental frequency value of one point out of every N (for example 5) points to form a new fundamental frequency sequence, which is equivalent to compressing the fundamental frequency sequence by a factor of N.
Zeroing of abnormal fundamental frequency values may include: comparing the fundamental frequency value of a point with the fundamental frequency values of the points before and after it; if the difference exceeds a preset difference, the point is detected as an abnormal fundamental frequency point and its fundamental frequency value is set to zero.
Zero fundamental frequency filling may include: if the fundamental frequency values of multiple consecutive points in the fundamental frequency sequence are zero and the length of this run is less than a preset length (for example 15 frames), setting the fundamental frequency values of these consecutive points to the value of the non-zero point immediately before the run. Points with zero fundamental frequency values correspond to silent sections of the voice data, i.e. parts without voice.
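A rough Python sketch of this second preprocessing and the subsequent small-range conversion. The 35-frame rule, the compression factor 5, the 15-frame zero-filling limit and the logarithm come from the examples above; the abnormal-value threshold (50 Hz) and the function names are assumptions:

```python
import math
from statistics import median

def median_filter(seq, window=10):
    """Low-pass filtering by a sliding median; for sequences shorter than
    35 frames the window shrinks to the actual length."""
    if len(seq) < 35:
        window = max(1, len(seq))
    half = window // 2
    return [median(seq[max(0, i - half):i + half + 1]) for i in range(len(seq))]

def compress(seq, n=5):
    """Compression: keep one fundamental frequency point out of every n."""
    return seq[::n]

def zero_abnormal(seq, preset_diff=50.0):
    """Zero points that differ from both neighbours by more than the preset
    difference (abnormal fundamental frequency values)."""
    out = list(seq)
    for i in range(1, len(seq) - 1):
        if (abs(seq[i] - seq[i - 1]) > preset_diff
                and abs(seq[i] - seq[i + 1]) > preset_diff):
            out[i] = 0.0
    return out

def fill_zero_runs(seq, preset_len=15):
    """Zero fundamental frequency filling: runs of zeros shorter than the
    preset length take the value of the non-zero point before the run."""
    out = list(seq)
    i = 0
    while i < len(out):
        if out[i] == 0.0:
            j = i
            while j < len(out) and out[j] == 0.0:
                j += 1
            if i > 0 and (j - i) < preset_len:
                out[i:j] = [out[i - 1]] * (j - i)
            i = j
        else:
            i += 1
    return out

def to_small_range(seq):
    """Convert non-zero fundamental frequency values into small-range values
    by taking the base-2 logarithm."""
    return [math.log2(v) if v > 0 else 0.0 for v in seq]
```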
Step C1: quantize the converted fundamental frequency information into a note sequence, or apply a first preprocessing to the converted fundamental frequency information and quantize the first-preprocessed fundamental frequency information into a note sequence; the piece of quantized voice data then contains the information of the note sequence. The note sequence includes multiple notes, and the information of each note includes the note's start time, duration and corresponding pitch value. The start time is the start time of a fundamental frequency subsequence contained in the converted (or first-preprocessed) fundamental frequency information, the duration is the length of the normalized fundamental frequency subsequence, and the pitch value is the frequency value of the normalized fundamental frequency subsequence.
The first preprocessing may include at least one of the following: low-pass filtering and three-point smoothing. Low-pass filtering may be median filtering or mean filtering. Three-point smoothing may include: if the fundamental frequency value of a point differs from the fundamental frequency values of both the preceding and the following point by more than a preset difference (for example 0.01), setting the fundamental frequency value of that point to the value of whichever neighbouring point differs from it less.
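A sketch of the three-point smoothing rule, assuming it is applied to the converted (small-range) values with the 0.01 example threshold:

```python
def three_point_smooth(seq, preset_diff=0.01):
    """If a point differs from both its neighbours by more than the preset
    difference, replace it with whichever neighbour is closer to it."""
    out = list(seq)
    for i in range(1, len(seq) - 1):
        prev_diff = abs(seq[i] - seq[i - 1])
        next_diff = abs(seq[i] - seq[i + 1])
        if prev_diff > preset_diff and next_diff > preset_diff:
            out[i] = seq[i - 1] if prev_diff <= next_diff else seq[i + 1]
    return out
```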
When performing the quantization of this step, the voice data evaluation system segments the fundamental frequency sequence in the converted (or first-preprocessed) fundamental frequency information to obtain multiple fundamental frequency subsequences. Specifically, if the difference between the fundamental frequency values of two adjacent points is greater than a preset value, for example 0.05, the sequence is split at the earlier or the later of the two points, which yields the start time of one fundamental frequency subsequence and the end time of another. The whole fundamental frequency sequence is segmented in this way to obtain multiple fundamental frequency subsequences together with the start and end time of each. The length of each fundamental frequency subsequence can be determined from the difference between its start and end time. In this embodiment, the length of each fundamental frequency subsequence needs to be normalized to a preset range (for example between 0 and 20).
The frequency value of each fundamental frequency subsequence is then represented by the median fundamental frequency of that subsequence, and this median is normalized to an integer between 0 and 24. Since the human voice spans about two octaves, one octave contains 12 semitones and two octaves therefore contain 24 semitones; this operation thus normalizes the median fundamental frequency of each subsequence into a note value.
Through the above operations, a note sequence containing multiple notes is obtained. The information of each note includes the note's start time (i.e. the start time of the fundamental frequency subsequence), its duration (i.e. the length of the normalized fundamental frequency subsequence) and its pitch value (i.e. the normalized median fundamental frequency of the subsequence). In the note sequence obtained after quantization, the precision of each note's pitch value is one semitone, with a maximum of 23, and the precision of each note's duration is 10 frames (i.e. 0.1 second), with a maximum of 19.
Further, if at least two adjacent notes in the note sequence obtained above have the same pitch value, these notes are merged into a single note. The start time of the merged note is the start time of the merged notes (specifically, the earliest start time among them), its duration is the sum of the durations of the merged notes, and its pitch value is the pitch value of any of the merged notes.
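A sketch pulling this quantization together: segmentation of the converted sequence, normalization of each fundamental frequency subsequence into a note, and merging of equal-pitch neighbours. The 0.05 split threshold and the 0-23 / 0-19 ranges come from the text; the exact pitch mapping (12 semitones per octave relative to the lowest median) and the duration clipping are assumptions:

```python
from statistics import median

def segment(seq, preset_diff=0.05):
    """Split the converted sequence wherever adjacent fundamental frequency
    values differ by more than the preset value; returns (start, values) pairs."""
    segments, start = [], 0
    for i in range(1, len(seq)):
        if abs(seq[i] - seq[i - 1]) > preset_diff:
            segments.append((start, seq[start:i]))
            start = i
    if start < len(seq):
        segments.append((start, seq[start:]))
    return segments

def to_notes(segments):
    """Turn each fundamental frequency subsequence into a note
    [start time, duration, pitch value]."""
    if not segments:
        return []
    medians = [median(values) for _, values in segments]
    ref = min(medians)  # assumed reference for the 2-octave (24-semitone) range
    notes = []
    for (start, values), med in zip(segments, medians):
        duration = min(len(values), 19)                 # duration normalized to 0..19
        pitch = min(23, int(round((med - ref) * 12)))   # 12 semitones per octave
        notes.append([start, duration, pitch])
    return notes

def merge_equal_pitch(notes):
    """Merge adjacent notes with identical pitch values; durations add up,
    the start time of the earliest note is kept."""
    merged = []
    for note in notes:
        if merged and merged[-1][2] == note[2]:
            merged[-1][1] += note[1]
        else:
            merged.append(list(note))
    return merged
```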
As shown in Fig. 4, in another specific embodiment, if there are n pieces of quantized voice data obtained in step 101, where n is a positive integer greater than 1, the voice data evaluation system performs step 102 through the following steps:
Step A2: calculate, respectively, the distance between every two of the n pieces of quantized voice data.
It can be understood that each piece of quantized voice data obtained above contains the information of a note sequence, which includes the start time, duration and pitch value of each of multiple notes. The distance D(S_i, S_j) between a note S_i in a first piece of quantized voice data and a note S_j in a second piece of quantized voice data among the n pieces is calculated from the following quantities:
Δp is the pitch difference between S_i and S_j, Δp = min(abs(p_i - p_j), abs(p_i - p_j - 24) + 1.0, abs(p_i - p_j + 24) + 1.0), where abs takes the absolute value, min takes the minimum, p_i is the pitch value of note S_i and p_j is the pitch value of note S_j;
Δd is the time difference between S_i and S_j, i.e. the difference between the durations of the two notes, and σ is the weight of the time difference, for example 0.4.
This gives the distance between a note in one piece (for example the first) of quantized voice data and a note in another piece (for example the second); the distance between the first piece and the second piece of quantized voice data is then directly the maximum of the calculated distances between the notes of the first piece and the notes of the second piece.
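A sketch of this note and piece distance. The Δp wrap-around and the σ = 0.4 example are taken from the text, but the exact formula combining Δp and σ·Δd is given only as an image in the original publication, so the weighted sum below and the index-wise pairing of the notes are assumptions:

```python
SIGMA = 0.4  # example weight for the time difference given in the text

def note_distance(si, sj, sigma=SIGMA):
    """Distance between two notes given as [start, duration, pitch]."""
    p_i, p_j = si[2], sj[2]
    delta_p = min(abs(p_i - p_j),
                  abs(p_i - p_j - 24) + 1.0,
                  abs(p_i - p_j + 24) + 1.0)
    delta_d = abs(si[1] - sj[1])
    return delta_p + sigma * delta_d  # assumed combination of the two terms

def piece_distance(notes_a, notes_b):
    """Distance between two pieces of quantized voice data: the maximum
    note distance (index-wise pairing of the notes is an assumption)."""
    return max((note_distance(a, b) for a, b in zip(notes_a, notes_b)), default=0.0)
```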
Step B2: for each of the n pieces of quantized voice data, calculate the sum of its distances to the other n-1 pieces of quantized voice data, so that each piece of quantized voice data corresponds to one distance sum.
Step C2: take the piece of quantized voice data with the smallest distance sum as the optimal voice data.
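Steps A2 to C2 as a short sketch, reusing piece_distance from the previous sketch:

```python
def optimal_piece(quantized_pieces):
    """Pick the piece of quantized voice data whose summed distance to all
    other pieces is smallest (steps B2 and C2)."""
    def distance_sum(idx):
        return sum(piece_distance(quantized_pieces[idx], other)
                   for j, other in enumerate(quantized_pieces) if j != idx)
    best = min(range(len(quantized_pieces)), key=distance_sum)
    return quantized_pieces[best]
```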
An embodiment of the present invention also provides a voice data evaluation system, whose structure is shown schematically in Fig. 5 and which may specifically include:
a first quantization unit 10, configured to quantize, respectively, the voice data contained in multiple pieces of audio data of one accompaniment to obtain multiple pieces of quantized voice data.
The first quantization unit 10 first extracts the voice data from the audio data and then quantizes it; the quantization here standardizes the audio data, expressing it as the standardized data defaulted by the voice data evaluation system, to facilitate subsequent processing.
an optimum acquisition unit 11, configured to cluster the multiple pieces of quantized voice data obtained by the first quantization unit 10 to obtain the optimal voice data of the accompaniment.
Specifically, for each piece of quantized voice data, the optimum acquisition unit 11 may calculate the distances between that piece and the other pieces of quantized voice data respectively, and then obtain the optimal voice data according to the calculated distances.
a storage unit 12, configured to store the optimal voice data of the accompaniment obtained by the optimum acquisition unit 11, the optimal voice data being used to evaluate the to-be-evaluated voice data of the accompaniment.
In the system of this embodiment, the first quantization unit 10 quantizes the voice data contained in multiple pieces of audio data of one accompaniment, the optimum acquisition unit 11 obtains the optimal voice data of the accompaniment according to the multiple pieces of quantized voice data, and the storage unit 12 stores it. The preset standard data, i.e. the optimal voice data, is thus generated automatically by the voice data evaluation system, which makes it convenient for the system to evaluate the to-be-evaluated voice data of the accompaniment. Compared with the prior art, in which the standard data must be produced manually offline and preset in the system, the system of this embodiment has low cost, low difficulty, and high timeliness.
As shown in Fig. 6, in a specific embodiment, the first quantization unit 10 in the voice data evaluation system may be implemented by an information extraction unit 110, a conversion unit 120 and an information quantization unit 130, and the optimum acquisition unit 11 may be implemented by a first calculation unit 111, a second calculation unit 121 and an optimum determination unit 131, where:
the information extraction unit 110 is configured to extract the fundamental frequency information of the audio data; the fundamental frequency information includes a fundamental frequency sequence, which contains the fundamental frequency values of multiple points.
the conversion unit 120 is configured to convert the fundamental frequency information extracted by the information extraction unit 110 so that the fundamental frequency values contained in the converted fundamental frequency information are small-range values.
Specifically, if the fundamental frequency information of the voice data extracted by the information extraction unit 110 includes multiple fundamental frequency values, the conversion unit 120 is configured to convert the multiple fundamental frequency values into small-range values directly; or the conversion unit 120 is configured to apply a second preprocessing to the fundamental frequency information and convert the fundamental frequency values contained in the second-preprocessed fundamental frequency information into small-range values. The second preprocessing includes at least one of the following: low-pass filtering, compression, zeroing of abnormal fundamental frequency values, and zero fundamental frequency filling.
the information quantization unit 130 is configured to quantize the fundamental frequency information converted by the conversion unit 120 into a note sequence, or to apply a first preprocessing to the converted fundamental frequency information and quantize the first-preprocessed fundamental frequency information into a note sequence; a piece of quantized voice data contains the information of the note sequence. The first preprocessing includes at least one of the following: low-pass filtering and three-point smoothing.
It can be understood that the information of the note sequence includes the start time, duration and corresponding pitch value of each of the multiple notes, where the start time is the start time of a fundamental frequency subsequence contained in the converted (or first-preprocessed) fundamental frequency information, the duration is the length of the normalized fundamental frequency subsequence, and the pitch value is the frequency value of the normalized fundamental frequency subsequence.
The information quantization unit 130 segments the fundamental frequency sequence in the converted (or first-preprocessed) fundamental frequency information to obtain multiple fundamental frequency subsequences. Specifically, if the difference between the fundamental frequency values of two adjacent points is greater than a preset value, the sequence is split at the earlier or the later of the two points, yielding the start time of one fundamental frequency subsequence and the end time of another. The whole fundamental frequency sequence is segmented in this way to obtain multiple fundamental frequency subsequences together with the start and end time of each; the length of each subsequence is determined by the difference between its start and end time. In this embodiment, the information quantization unit 130 normalizes the length of each fundamental frequency subsequence to a preset range (for example between 0 and 20).
The information quantization unit 130 then represents the frequency value of each fundamental frequency subsequence by the median fundamental frequency of that subsequence, and normalizes this median to an integer between 0 and 24.
Further, if at least two adjacent notes in the note sequence obtained above have the same pitch value, the information quantization unit 130 merges them into a single note: the start time of the merged note is the start time of the merged notes (specifically, the earliest start time among them), its duration is the sum of the durations of the merged notes, and its pitch value is the pitch value of any of the merged notes.
the first calculation unit 111 is configured to, when there are n pieces of quantized voice data obtained by the first quantization unit 10, n being a positive integer greater than 1, calculate respectively the distance between every two of the n pieces of quantized voice data. Suppose a piece of quantized voice data obtained by the first quantization unit 10 contains the information of a note sequence, and the information of the note sequence includes the duration and pitch value of each of multiple notes;
then the first calculation unit 111 is specifically configured to calculate the distance D(S_i, S_j) between a note S_i in a first piece of quantized voice data and a note S_j in a second piece of quantized voice data among the n pieces from the following quantities:
Δp is the pitch difference between S_i and S_j, Δp = min(abs(p_i - p_j), abs(p_i - p_j - 24) + 1.0, abs(p_i - p_j + 24) + 1.0), where p_i is the pitch value of note S_i and p_j is the pitch value of note S_j;
Δd is the time difference between S_i and S_j, and σ is the weight of the time difference.
This gives the distance between a note in the first piece of quantized voice data and a note in the second piece; the distance between the first piece and the second piece of quantized voice data is then the maximum distance between the notes of the first piece and the notes of the second piece.
the second calculation unit 121 is configured to calculate, from the distances obtained by the first calculation unit 111, for each of the n pieces of quantized voice data, the sum of its distances to the other n-1 pieces of quantized voice data, so that each piece of quantized voice data corresponds to one distance sum.
the optimum determination unit 131 is configured to take the piece of quantized voice data with the smallest distance sum calculated by the second calculation unit 121 as the optimal voice data.
As shown in Fig. 7, in another specific embodiment, in addition to the structure shown in Fig. 5, the voice data evaluation system may also include a second quantization unit 13, a third calculation unit 14, a score determination unit 15 and an output unit 16, where:
the second quantization unit 13 is configured to quantize the to-be-evaluated voice data of the accompaniment to obtain quantized to-be-evaluated voice data.
Specifically, the second quantization unit 13 may first extract the to-be-evaluated voice data from the audio data and then quantize it using the method applied by the first quantization unit 10.
the third calculation unit 14 is configured to calculate a first distance between the quantized to-be-evaluated voice data obtained by the second quantization unit 13 and the optimal voice data obtained by the optimum acquisition unit 11;
the score determination unit 15 is configured to determine the evaluation score of the to-be-evaluated voice data according to the first distance calculated by the third calculation unit 14.
Specifically, the score determination unit 15 is configured to obtain, among the multiple pieces of quantized voice data, the second distance, i.e. the maximum distance to the optimal voice data; with the first distance being k and the second distance being m, it determines the evaluation score to be 100*(m-k)/m.
the output unit 16 is configured to output the positions where the to-be-evaluated voice data is inconsistent with the optimal voice data; the output unit 16 may also output the evaluation score determined by the score determination unit 15.
Since the distances between the positions of the to-be-evaluated voice data and the corresponding positions of the optimal voice data are calculated when the two are compared, if the distance corresponding to a certain position is greater than a preset value, the output unit 16 determines that the to-be-evaluated voice data is inconsistent with the optimal voice data at that position and outputs the inconsistent position.
An embodiment of the present invention also provides a voice data evaluation system, whose structure is shown schematically in Fig. 8. The voice data evaluation system may differ considerably depending on its configuration or performance, and may include one or more central processing units (CPUs) 20 (for example, one or more processors), a memory 21, and one or more storage media 22 (for example, one or more mass storage devices) storing application programs 221 or data 222. The memory 21 and the storage media 22 may provide transient or persistent storage. A program stored on the storage medium 22 may include one or more modules (not marked in the figure), each of which may include a series of instruction operations in the voice data evaluation system. Further, the central processing unit 20 may be configured to communicate with the storage medium 22 and to perform, in the voice data evaluation system, the series of instruction operations stored in the storage medium 22.
The voice data evaluation system may also include one or more power supplies 23, one or more wired or wireless network interfaces 24, one or more input/output interfaces 25, and/or one or more operating systems 223, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.
The steps performed by the voice data evaluation system described in the above method embodiments may be based on the structure of the voice data evaluation system shown in Fig. 8.
The method of the embodiments of the present invention is illustrated below with a specific application example. In this example, the voice data evaluation system performs the evaluation method as follows:
1. Quantize the voice data contained in the audio data
For one accompaniment, for example a certain song, users upload their own singing data (i.e. the users' audio data) to the voice data evaluation system, and the voice data evaluation system quantizes the voice data contained in each user's audio data. Specifically, for the voice data contained in a certain piece of audio data (denoted midi):
As shown in Fig. 9, the voice data evaluation system first extracts the midi from the audio data, specifically:
Step 301: preprocess the audio data, including removing the DC component of the audio data; if the volume of the audio data is less than a preset volume, an energy gain is applied first and noise reduction is performed afterwards; if the volume of the audio data is greater than or equal to the preset volume, noise reduction is performed on the audio data directly.
Step 302: extract the fundamental frequency information of the preprocessed audio data; the fundamental frequency information includes a fundamental frequency sequence containing the fundamental frequency values of multiple points. Then find the effective fundamental frequency sequence, i.e. the sequence with silence removed: if the fundamental frequency values of x consecutive points in the fundamental frequency sequence are zero and x is greater than a preset value, the fundamental frequency sequence spanned by these x points is a silent section, and the effective fundamental frequency sequence is the fundamental frequency sequence with such silent sections removed.
Step 303: judge whether the length of the effective fundamental frequency sequence is less than a threshold (for example 35 frames); if it is, perform step 304 and then step 306; if it is greater than or equal to the threshold, perform step 305 and then step 306.
Step 304: perform median filtering according to the actual length of the effective fundamental frequency sequence.
Step 305: perform median filtering on the effective fundamental frequency sequence with a preset window, for example a 10-point median filter per frame.
Step 306: reduce the granularity of the filtered fundamental frequency sequence, typically by a factor of 5, i.e. keep the fundamental frequency value of one point out of every 5 points and discard the others.
Step 307: for the fundamental frequency sequence with reduced granularity, take the base-2 logarithm (log2) of the fundamental frequency value of each point to convert it into a small-range value.
Step 308: segment the converted fundamental frequency sequence according to the difference between the fundamental frequency values of adjacent points: if the difference between the fundamental frequency values of two adjacent points is greater than 0.05, the sequence is split at that point. This divides the sequence into multiple fundamental frequency subsequences, and the information of each subsequence, including its start time, length and frequency value, is output; this is the information of the extracted midi.
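A condensed sketch of steps 302-308, reusing the helpers sketched earlier (median_filter, compress, to_small_range, segment). The silence threshold x = 30 is an assumption (the text only says x exceeds a preset value), and step 301 plus the pitch tracking of step 302 are assumed to have produced the input sequence already:

```python
def remove_silence(f0, x=30):
    """Drop runs of at least x consecutive zero fundamental frequency values
    (silent sections); shorter zero runs are kept (step 302)."""
    kept, run = [], []
    for v in f0:
        if v == 0.0:
            run.append(v)
        else:
            if 0 < len(run) < x:
                kept.extend(run)
            run = []
            kept.append(v)
    if 0 < len(run) < x:
        kept.extend(run)
    return kept

def audio_to_midi(f0_sequence):
    """Steps 302-308 on an already extracted fundamental frequency sequence."""
    f0 = remove_silence(f0_sequence)      # step 302: effective sequence
    f0 = median_filter(f0, window=10)     # steps 303-305: median filtering
    f0 = compress(f0, n=5)                # step 306: reduce granularity by 5
    f0 = to_small_range(f0)               # step 307: log2 conversion
    return segment(f0, preset_diff=0.05)  # step 308: fundamental frequency subsequences
```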
As shown in Fig. 10, the voice data evaluation system normalizes the information of the multiple fundamental frequency subsequences and quantizes them into a note sequence, i.e. the quantized voice data, specifically:
Step 401: for the information of each midi extracted by the above steps, normalize the frequency value of each fundamental frequency subsequence to an integer between 0 and 24, converting the frequency value into the pitch value of a note; the precision of the pitch value is one semitone.
Step 402: normalize the length of each fundamental frequency subsequence to an integer between 0 and 20, converting the length into the duration of a note; the precision of the duration is 10 frames, i.e. 0.1 s.
Step 403: judge whether adjacent notes need to be merged; specifically, if the pitch values of adjacent notes are identical, they need to be merged. If a merge is needed, perform step 404; otherwise, end the procedure.
Step 404: merge the adjacent notes. The duration of the merged note is the sum of the durations of the merged notes, and its pitch value is the pitch value of any of the merged notes. Since the duration of the merged note may have changed, the duration of the merged note needs to be normalized again, so return to step 402.
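A sketch of steps 401-404, reusing to_notes and merge_equal_pitch from the earlier sketch; re-clipping the merged durations to the normalized range is an assumed reading of step 404:

```python
def normalize_and_merge(segments):
    """Steps 401-402: turn each subsequence into a note with normalized pitch
    and duration; steps 403-404: merge adjacent equal-pitch notes and re-clip
    the merged durations to the normalized range."""
    notes = to_notes(segments)
    merged = merge_equal_pitch(notes)
    for note in merged:          # step 404: merged durations may exceed the range
        note[1] = min(note[1], 19)
    return merged
```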
2. Obtain the optimal voice data, denoted the optimal midi
Cluster the multiple quantized midis of one accompaniment to obtain the optimal midi. The specific method for obtaining the optimal midi is described in the embodiment corresponding to Fig. 4 above and is not repeated here; during this process, the maximum distance m to the optimal midi needs to be recorded.
3. Preset the optimal midi in the voice data evaluation system.
4. Evaluate the voice data contained in the to-be-evaluated audio data; the specific evaluation method is described in the embodiment corresponding to Fig. 2 above and is not repeated here.
5. Output the evaluation score and the positions where the to-be-evaluated voice data is inconsistent with the optimal midi. Since the distance between a note in one midi and a note in another midi is calculated in the process of obtaining the optimal midi, if that distance is greater than a preset value, the to-be-evaluated voice data is determined to be inconsistent with the optimal midi at the position of that note.
A person of ordinary skill in the art will understand that all or part of the steps in the various methods of the above embodiments can be completed by a program instructing the relevant hardware; the program may be stored in a computer-readable storage medium, and the storage medium may include a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, etc.
The voice data evaluation method and system provided by the embodiments of the present invention have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present invention, and the description of the above embodiments is only intended to help understand the method of the present invention and its core idea. Meanwhile, a person of ordinary skill in the art may make changes to the specific implementations and the application scope according to the idea of the present invention. In summary, the content of this specification should not be construed as a limitation on the present invention.

Claims (18)

1. A voice data evaluation method, characterized by comprising:
quantizing, respectively, the voice data contained in multiple pieces of audio data of one accompaniment to obtain multiple pieces of quantized voice data;
clustering the multiple pieces of quantized voice data to obtain the optimal voice data of the accompaniment;
storing the optimal voice data of the accompaniment, the optimal voice data being used to evaluate the to-be-evaluated voice data of the accompaniment.
2. The method of claim 1, characterized in that quantizing the voice data contained in one piece of audio data to obtain one piece of quantized voice data specifically comprises:
extracting the fundamental frequency information of the audio data;
converting the fundamental frequency information so that the fundamental frequency values contained in the converted fundamental frequency information are small-range values;
quantizing the converted fundamental frequency information into a note sequence, or applying a first preprocessing to the converted fundamental frequency information and quantizing the first-preprocessed fundamental frequency information into a note sequence, the piece of quantized voice data containing the information of the note sequence;
wherein the information of the note sequence comprises the start time, duration and corresponding pitch value of each of the multiple notes, the start time being the start time of a fundamental frequency subsequence contained in the converted or first-preprocessed fundamental frequency information, the duration being the length of the normalized fundamental frequency subsequence, and the pitch value being the frequency value of the normalized fundamental frequency subsequence.
3. The method of claim 2, characterized in that the fundamental frequency information of the voice data comprises multiple fundamental frequency values, and converting the fundamental frequency information so that the fundamental frequency values contained in the converted fundamental frequency information are small-range values specifically comprises: converting the multiple fundamental frequency values into small-range values directly;
or applying a second preprocessing to the fundamental frequency information and converting the fundamental frequency values contained in the second-preprocessed fundamental frequency information into small-range values.
4. The method of claim 3, characterized in that the second preprocessing comprises at least one of the following: low-pass filtering, compression, zeroing of abnormal fundamental frequency values, and zero fundamental frequency filling;
and the first preprocessing comprises at least one of the following: low-pass filtering and three-point smoothing.
5. The method of claim 1, characterized in that there are n pieces of quantized voice data, n being a positive integer greater than 1, and clustering the multiple pieces of quantized voice data to obtain the optimal voice data of the accompaniment specifically comprises:
calculating, respectively, the distance between every two of the n pieces of quantized voice data;
calculating, for each of the n pieces of quantized voice data, the sum of its distances to the other n-1 pieces of quantized voice data, and taking the piece of quantized voice data with the smallest distance sum as the optimal voice data.
6. The method of claim 5, characterized in that a piece of quantized voice data contains the information of a note sequence, and the information of the note sequence comprises the duration and pitch value of each of multiple notes;
the distance D(S_i, S_j) between a note S_i in a first piece of quantized voice data and a note S_j in a second piece of quantized voice data among the n pieces is calculated from the following quantities:
Δp is the pitch difference between S_i and S_j, Δp = min(abs(p_i - p_j), abs(p_i - p_j - 24) + 1.0, abs(p_i - p_j + 24) + 1.0), where p_i is the pitch value of note S_i and p_j is the pitch value of note S_j;
Δd is the time difference between S_i and S_j, and σ is the weight of the time difference;
and the distance between the first piece of quantized voice data and the second piece of quantized voice data is: the maximum distance between the notes of the first piece of quantized voice data and the notes of the second piece of quantized voice data.
7. The method of any one of claims 1 to 6, characterized in that the method further comprises:
quantizing the to-be-evaluated voice data of the accompaniment to obtain quantized to-be-evaluated voice data;
calculating a first distance between the quantized to-be-evaluated voice data and the optimal voice data;
determining the evaluation score of the to-be-evaluated voice data according to the calculated first distance.
8. The method of claim 7, characterized in that determining the evaluation score of the to-be-evaluated voice data according to the calculated first distance specifically comprises:
obtaining, among the multiple pieces of quantized voice data, the second distance, i.e. the maximum distance to the optimal voice data, the first distance being k and the second distance being m;
determining the evaluation score to be 100*(m-k)/m.
9. The method of any one of claims 1 to 6, characterized in that the method further comprises:
outputting the positions where the to-be-evaluated voice data is inconsistent with the optimal voice data.
10. A voice data evaluation system, characterized by comprising:
a first quantization unit, configured to quantize, respectively, the voice data contained in multiple pieces of audio data of one accompaniment to obtain multiple pieces of quantized voice data;
an optimum acquisition unit, configured to cluster the multiple pieces of quantized voice data obtained by the first quantization unit to obtain the optimal voice data of the accompaniment;
a storage unit, configured to store the optimal voice data of the accompaniment obtained by the optimum acquisition unit, the optimal voice data being used to evaluate the to-be-evaluated voice data of the accompaniment.
11. The system of claim 10, characterized in that the first quantization unit specifically comprises:
an information extraction unit, configured to extract the fundamental frequency information of the audio data;
a conversion unit, configured to convert the fundamental frequency information extracted by the information extraction unit so that the fundamental frequency values contained in the converted fundamental frequency information are small-range values;
an information quantization unit, configured to quantize the fundamental frequency information converted by the conversion unit into a note sequence, or to apply a first preprocessing to the converted fundamental frequency information and quantize the first-preprocessed fundamental frequency information into a note sequence, a piece of quantized voice data containing the information of the note sequence;
wherein the information of the note sequence comprises the start time, duration and corresponding pitch value of each of the multiple notes, the start time being the start time of a fundamental frequency subsequence contained in the converted or first-preprocessed fundamental frequency information, the duration being the length of the normalized fundamental frequency subsequence, and the pitch value being the frequency value of the normalized fundamental frequency subsequence.
12. The system of claim 11, characterized in that the fundamental frequency information of the voice data extracted by the information extraction unit comprises multiple fundamental frequency values, and the conversion unit is specifically configured to convert the multiple fundamental frequency values into small-range values directly;
or the conversion unit is specifically configured to apply a second preprocessing to the fundamental frequency information and convert the fundamental frequency values contained in the second-preprocessed fundamental frequency information into small-range values.
13. The system of claim 12, characterized in that the second preprocessing comprises at least one of the following: low-pass filtering, compression, zeroing of abnormal fundamental frequency values, and zero fundamental frequency filling;
and the first preprocessing comprises at least one of the following: low-pass filtering and three-point smoothing.
14. The system of claim 10, characterized in that
there are n pieces of quantized voice data obtained by the first quantization unit, n being a positive integer greater than 1, and the optimum acquisition unit specifically comprises:
a first calculation unit, configured to calculate, respectively, the distance between every two of the n pieces of quantized voice data;
a second calculation unit, configured to calculate, for each of the n pieces of quantized voice data, the sum of its distances to the other n-1 pieces of quantized voice data;
an optimum determination unit, configured to take the piece of quantized voice data with the smallest distance sum calculated by the second calculation unit as the optimal voice data.
15. The system as claimed in claim 14, characterised in that each piece of quantized speech data obtained by the first quantifying unit includes the information of a note sequence, and the information of the note sequence includes the duration and the pitch value of each note in a plurality of notes;
The first computing unit is then specifically configured to compute the distance D(Si, Sj) between a note Si in the first piece of quantized speech data and a note Sj in the second piece of quantized speech data among the n pieces of quantized speech data, from the pitch difference Δp and the time difference Δd, wherein:
Δp denotes the pitch difference between notes Si and Sj, Δp = min(abs(pi - pj), abs(pi - pj - 24) + 1.0, abs(pi - pj + 24) + 1.0), pi being the pitch value of note Si and pj being the pitch value of note Sj;
Δd is the time difference between notes Si and Sj, and σ is the weight of the time difference;
The distance between the first piece of quantized speech data and the second piece of quantized speech data is then the maximum distance between a note of the first piece of quantized speech data and a note of the second piece of quantized speech data.
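The combining formula for D(Si, Sj) is not reproduced in the text of claim 15, so the sketch below assumes the simple additive form D = Δp + σ·Δd; the value σ = 0.5, the representation of a note as a (duration, pitch) pair, and the literal reading of the piece distance as the maximum over all note pairs are likewise assumptions (an aligned, note-by-note reading is equally possible).

```python
def note_distance(si, sj, sigma=0.5):
    """Distance between two notes given as (duration, pitch) pairs.

    Delta_p follows the octave-wrapped definition in claim 15; the additive
    combination with sigma * delta_d and sigma itself are assumptions."""
    di, pi = si
    dj, pj = sj
    delta_p = min(abs(pi - pj), abs(pi - pj - 24) + 1.0, abs(pi - pj + 24) + 1.0)
    delta_d = abs(di - dj)
    return delta_p + sigma * delta_d

def piece_distance(notes_a, notes_b, sigma=0.5):
    """Distance between two pieces of quantized speech data, read literally as
    the maximum note-to-note distance between the two note sequences."""
    return max(note_distance(a, b, sigma) for a in notes_a for b in notes_b)
```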
16. The system as claimed in any one of claims 10 to 15, characterised in that the system further comprises:
A second quantifying unit, configured to quantize the speech data to be evaluated of the accompaniment to obtain quantized speech data to be evaluated;
A third computing unit, configured to compute the first distance between the quantized speech data to be evaluated and the optimum speech data;
A score determining unit, configured to determine the evaluation score of the speech data to be evaluated according to the first distance computed by the third computing unit.
17. The system as claimed in claim 16, characterised in that the score determining unit is specifically configured to obtain, among the plurality of pieces of quantized speech data, the second distance, namely the maximum distance to the optimum speech data; with the first distance denoted k and the second distance denoted m, the evaluation score is determined as 100*(m-k)/m.
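For instance, with illustrative numbers, if the maximum distance to the optimum speech data among the quantized pieces is m = 40 and the speech data to be evaluated lies at distance k = 10 from it, the evaluation score is 100*(40-10)/40 = 75; a recording coinciding with the optimum speech data (k = 0) scores 100, while one at the maximum distance (k = m) scores 0.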
18. The system as claimed in any one of claims 10 to 15, characterised in that the system further comprises:
An output unit, configured to output the positions at which the speech data to be evaluated is inconsistent with the optimum speech data.
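Claim 18 does not say how the inconsistent positions are located. One plausible sketch, assuming the notes of the speech data to be evaluated and of the optimum speech data are compared index by index and a position is reported when the octave-wrapped pitch difference exceeds a threshold; the index-wise pairing and the threshold are assumptions, not part of the claim:

```python
def inconsistent_positions(eval_notes, optimum_notes, threshold=1.0):
    """Return the start times (seconds) of notes in the speech data to be
    evaluated whose pitch deviates from the optimum speech data by more
    than `threshold`; notes are (start_time, duration, pitch) triples."""
    positions = []
    for (start, _dur, p_eval), (_s, _d, p_opt) in zip(eval_notes, optimum_notes):
        delta_p = min(abs(p_eval - p_opt),
                      abs(p_eval - p_opt - 24) + 1.0,
                      abs(p_eval - p_opt + 24) + 1.0)
        if delta_p > threshold:
            positions.append(start)
    return positions
```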
CN201510586445.7A 2015-09-16 2015-09-16 Voice data evaluation method and system Active CN106548784B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201510586445.7A CN106548784B (en) 2015-09-16 2015-09-16 Voice data evaluation method and system
PCT/CN2016/083043 WO2017045428A1 (en) 2015-09-16 2016-05-23 Voice data evaluation method and system and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510586445.7A CN106548784B (en) 2015-09-16 2015-09-16 Voice data evaluation method and system

Publications (2)

Publication Number Publication Date
CN106548784A true CN106548784A (en) 2017-03-29
CN106548784B CN106548784B (en) 2020-04-24

Family

ID=58288032

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510586445.7A Active CN106548784B (en) 2015-09-16 2015-09-16 Voice data evaluation method and system

Country Status (2)

Country Link
CN (1) CN106548784B (en)
WO (1) WO2017045428A1 (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4204941B2 (en) * 2003-09-30 2009-01-07 ヤマハ株式会社 Karaoke equipment
CN101441865A (en) * 2007-11-19 2009-05-27 盛趣信息技术(上海)有限公司 Method and system for grading sing genus game
CN102664018B (en) * 2012-04-26 2014-01-08 杭州来同科技有限公司 Singing scoring method with radial basis function-based statistical model

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101430876A (en) * 2007-11-08 2009-05-13 中国科学院声学研究所 Singing marking system and method
CN102915725A (en) * 2012-09-10 2013-02-06 福建星网视易信息系统有限公司 Human-computer interaction song singing system and method
WO2015030319A1 (en) * 2013-08-28 2015-03-05 Lee Sung-Ho Sound source evaluation method, performance information analysis method and recording medium used therein, and sound source evaluation apparatus using same

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110060702A (en) * 2019-04-29 2019-07-26 北京小唱科技有限公司 For singing the data processing method and device of the detection of pitch accuracy
WO2020237769A1 (en) * 2019-05-30 2020-12-03 腾讯音乐娱乐科技(深圳)有限公司 Accompaniment purity evaluation method and related device

Also Published As

Publication number Publication date
CN106548784B (en) 2020-04-24
WO2017045428A1 (en) 2017-03-23

Similar Documents

Publication Publication Date Title
US10261965B2 (en) Audio generation method, server, and storage medium
US8977551B2 (en) Parametric speech synthesis method and system
Bello Measuring structural similarity in music
KR102128926B1 (en) Method and device for processing audio information
CN110310666B (en) Musical instrument identification method and system based on SE convolutional network
Roma et al. Recurrence quantification analysis features for environmental sound recognition
CN104252862B (en) The method and apparatus for handling audio signal
CN105810213A (en) Typical abnormal sound detection method and device
WO2007070007A1 (en) A method and system for extracting audio features from an encoded bitstream for audio classification
JP2020140193A (en) Voice feature extraction algorithm based on dynamic division of cepstrum coefficient of inverse discrete cosine transform
WO2015114216A2 (en) Audio signal analysis
CN109979428B (en) Audio generation method and device, storage medium and electronic equipment
CN102945673A (en) Continuous speech recognition method with speech command range changed dynamically
CN106887233A (en) Audio data processing method and system
CN111508475B (en) Robot awakening voice keyword recognition method and device and storage medium
CN105718486B (en) Online humming retrieval method and system
CN106548784A (en) A kind of evaluation methodology of speech data and system
Foster et al. Sequential complexity as a descriptor for musical similarity
Rosenzweig et al. Detecting Stable Regions in Frequency Trajectories for Tonal Analysis of Traditional Georgian Vocal Music.
Thiruvengatanadhan Music genre classification using gmm
JP2003524218A (en) Speech processing using HMM trained with TESPAR parameters
Bammer et al. Invariance and stability of Gabor scattering for music signals
CN111696500B (en) MIDI sequence chord identification method and device
Chuan et al. A Dynamic Programming Approach to the Extraction of Phrase Boundaries from Tempo Variations in Expressive Performances.
CN107437414A (en) Parallelization visitor's recognition methods based on embedded gpu system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 510000 Guangzhou City, Guangzhou, Guangdong, Whampoa Avenue, No. 315, self - made 1-17

Applicant after: Guangzhou KuGou Networks Co., Ltd.

Address before: 510000 Guangzhou, Tianhe District branch Yun Yun Road, No. 16, self built room 2, building 1301

Applicant before: Guangzhou KuGou Networks Co., Ltd.

GR01 Patent grant