CN106971737A - A voiceprint recognition method for multi-speaker speech - Google Patents
A voiceprint recognition method for multi-speaker speech
- Publication number
- CN106971737A CN106971737A CN201610024134.6A CN201610024134A CN106971737A CN 106971737 A CN106971737 A CN 106971737A CN 201610024134 A CN201610024134 A CN 201610024134A CN 106971737 A CN106971737 A CN 106971737A
- Authority
- CN
- China
- Prior art keywords
- frequency range
- sequence number
- data group
- sequence
- multiple speakers
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000000034 method Methods 0.000 title claims abstract description 46
- 238000012360 testing method Methods 0.000 claims description 28
- 238000012549 training Methods 0.000 claims description 27
- 238000013144 data compression Methods 0.000 claims description 18
- 230000008569 process Effects 0.000 claims description 9
- 230000002123 temporal effect Effects 0.000 claims description 7
- 230000015572 biosynthetic process Effects 0.000 claims description 6
- 238000012790 confirmation Methods 0.000 claims description 6
- 238000012545 processing Methods 0.000 claims description 6
- 239000000284 extract Substances 0.000 claims description 3
- 238000004364 calculation method Methods 0.000 abstract description 4
- 230000008859 change Effects 0.000 description 2
- 238000006243 chemical reaction Methods 0.000 description 2
- 238000000605 extraction Methods 0.000 description 2
- 230000001755 vocal effect Effects 0.000 description 2
- 230000009286 beneficial effect Effects 0.000 description 1
- 230000008901 benefit Effects 0.000 description 1
- 230000001186 cumulative effect Effects 0.000 description 1
- 238000000151 deposition Methods 0.000 description 1
- 230000004069 differentiation Effects 0.000 description 1
- 230000006870 function Effects 0.000 description 1
- 239000000203 mixture Substances 0.000 description 1
- 238000006467 substitution reaction Methods 0.000 description 1
- 238000012795 verification Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/06—Decision making techniques; Pattern matching strategies
- G10L17/08—Use of distortion metrics or a particular distance between probe pattern and reference templates
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/06—Decision making techniques; Pattern matching strategies
- G10L17/14—Use of phonemic categorisation or speech recognition prior to speaker recognition or verification
Landscapes
- Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Game Theory and Decision Science (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Collating Specific Patterns (AREA)
Abstract
The invention discloses a voiceprint recognition method for multi-speaker speech, belonging to the technical field of biometric identification. The method can decompose the sound source when several people speak at the same time to obtain each speaker's voice, match each voice against a preset frequency band, and select the voice of the speaker to be recognized according to the matching similarity before performing voiceprint recognition on that voice. The computation load is small, storage and computing resources are saved, the recognition accuracy is high, and the problems of modeling methods based on probability statistics are overcome, making the method suitable for intelligent systems with limited resources. In addition, a first frequency band representing child speakers and a second frequency band representing adult speakers are preset and compared separately, which further improves the accuracy of voiceprint recognition for multi-speaker speech.
Description
Technical field
The present invention relates to the technical field of biometric identification, and more particularly to a voiceprint recognition method for multi-speaker speech.
Background technology
Voiceprint recognition, like fingerprint, iris, and face recognition, is a form of biometric identification, and is regarded as the most natural biometric identity authentication method. Voiceprint recognition makes it easy to verify a speaker's identity, and this verification method offers strong privacy, because a voiceprint usually cannot be fraudulently copied or stolen. Voiceprint recognition therefore has outstanding application advantages in many fields, especially in smart devices.
The basic process of voiceprint recognition is voice collection, feature extraction, and classification modeling. Common voice feature extraction methods exploit the short-term stationarity of speech, converting speech into a recognition feature set with the Mel cepstrum transform; a classification model for each speaker is then built from the speaker's voice through a learning process, and the voiceprint recognition result is obtained from the per-class recognition models. This process has the following problems: (1) the recognition model needs many training samples before it can be applied; (2) the computation performed by the recognition model is complex; (3) the model data produced by the recognition model is large; (4) when multiple speakers speak at the same time, the voice of the speaker to be recognized cannot be identified. In summary, for intelligent systems with limited resources, these problems limit the application of prior-art voiceprint recognition algorithms.
Summary of the invention
In view of the above problems in the prior art, a technical scheme of a voiceprint recognition method for multi-speaker speech is now provided, specifically including:
A voiceprint recognition method for multi-speaker speech, wherein a first frequency band and a second frequency band are preset, the first frequency band being higher than the second frequency band, comprising the steps of:
Step S1, receiving the sound source of multiple speakers;
Step S2, decomposing the sound source to obtain each person's voice separately;
Step S3, matching each person's voice against the first frequency band to obtain a corresponding matching degree, or matching each person's voice against the second frequency band to obtain a corresponding matching degree;
Step S4, extracting the voice with the largest matching degree, and fitting that voice to the first frequency band or the second frequency band;
Step S5, dividing voices of different backgrounds and different speakers within the first frequency band or the second frequency band into recognition segments of a specific length;
Step S6, applying a feature transform to each recognition segment to obtain multiple corresponding recognition features, and using all recognition features associated with all recognition segments to form the recognition feature space corresponding to the first frequency band, or the recognition feature space corresponding to the second frequency band;
Step S7, dividing the recognition feature space into a plurality of subspaces, describing each divided subspace with description information, and assigning a corresponding sequence number to each subspace;
Step S8, applying the feature transform to every training sentence associated with the training model in the first frequency band or the second frequency band to obtain a temporal feature point set containing corresponding temporal feature points; allocating each temporal feature point to one of the subspaces under the same frequency band; forming, from the sequence numbers of the subspaces corresponding to the temporal feature points, a first sequence associated with the first frequency band or the second frequency band; and thereby forming a corresponding training recognition feature;
Step S9, applying the feature transform to every test sentence associated with the test model in the first frequency band or the second frequency band to obtain a temporal feature point set; allocating each temporal feature point to one of the subspaces; forming, from the sequence numbers of the subspaces corresponding to the temporal feature points, a second sequence associated with the first frequency band or the second frequency band; and thereby forming a corresponding test recognition feature;
Step S10, comparing whether the training recognition feature associated with the first frequency band is similar to the test recognition feature, and processing the comparison result to obtain the confirmation result of the multi-speaker voiceprint recognition; or comparing whether the training recognition feature associated with the second frequency band is similar to the test recognition feature, and processing the comparison result to obtain the confirmation result of the multi-speaker voiceprint recognition.
Preferably, in the voiceprint recognition method for multi-speaker speech, in step S8 each temporal feature point is allocated to a subspace according to the nearest-neighbor rule.
Preferably, in the voiceprint recognition method for multi-speaker speech, in step S8 the subspaces to which the temporal feature points are allocated form a spatial sequence according to their sequence numbers, and the spatial sequence is used as the first sequence to form the training recognition feature.
Preferably, in the voiceprint recognition method for multi-speaker speech, in step S9 the subspaces to which the temporal feature points are allocated form a spatial sequence according to their sequence numbers, and the spatial sequence is used as the second sequence to form the test recognition feature.
Preferably, in the voiceprint recognition method for multi-speaker speech, in step S8 the spatial sequence includes data groups associated with the subspaces, each data group corresponding to one sequence number;
after the spatial sequence is formed, the method further includes a first data compression process performed on the spatial sequence in the first frequency band or the second frequency band, specifically:
Step S81, recording the sequence number of each data group, and recording the repeat count associated with each sequence number;
Step S82, judging whether any sequence number has a repeat count of 1, and turning to step S83 when a data group with a repeat count of 1 exists;
Step S83, deleting the data group whose sequence number has a repeat count of 1;
Step S84, judging whether the sequence number of the data group preceding the deleted data group is identical to the sequence number of the data group following it:
if identical, merging the preceding and following data groups into one;
if not, retaining both the preceding and following data groups;
the first sequence being formed after the first data compression has been performed on all data groups in the spatial sequence.
Preferably, in the voiceprint recognition method for multi-speaker speech, in step S9 the spatial sequence includes data groups associated with the subspaces, each data group corresponding to one sequence number;
after the spatial sequence is formed, the method further includes a second data compression process performed on the spatial sequence in the first frequency band or the second frequency band, specifically:
Step S91, recording the sequence number of each data group, and recording the repeat count associated with each sequence number;
Step S92, judging whether any sequence number has a repeat count of 1, and turning to step S93 when a data group with a repeat count of 1 exists;
Step S93, deleting the data group whose sequence number has a repeat count of 1;
Step S94, judging whether the sequence number of the data group preceding the deleted data group is identical to the sequence number of the data group following it:
if identical, merging the preceding and following data groups into one;
if not, retaining both the preceding and following data groups;
the second sequence being formed after the second data compression has been performed on all data groups in the spatial sequence.
Preferably, in the voiceprint recognition method for multi-speaker speech, the feature transform is the Mel cepstrum transform.
Preferably, in the voiceprint recognition method for multi-speaker speech, during the Mel cepstrum transform each sentence is divided into 20 ms frames with a 10 ms frame shift to obtain the sentence frames associated with the sentence;
silence is then removed frame by frame, 12 coefficients are kept for each frame after the cepstrum transform, and the recognition feature is composed of the 12 coefficients.
Preferably, in the voiceprint recognition method for multi-speaker speech, in step S7 the recognition feature space is divided into several subspaces using the K-means algorithm, and each subspace after division is recorded with its K-means center point as the description information of that subspace.
The beneficial effects of the above technical solution are that it provides a voiceprint recognition method for multi-speaker speech which can decompose the sound source when several people speak at the same time, obtain each speaker's voice, match each voice against a preset frequency band, select the voice of the speaker to be recognized according to the matching similarity, and then perform voiceprint recognition on that voice. The computation load is small, storage and computing resources are saved, the recognition accuracy is high, and the problems of modeling methods based on probability statistics are overcome, making the method suitable for intelligent systems with limited resources. Presetting a first frequency band representing child speakers and a second frequency band representing adult speakers, and comparing within each band separately, further improves the accuracy of voiceprint recognition for multi-speaker speech.
Brief description of the drawings
Fig. 1 is the overall flowchart of a voiceprint recognition method for multi-speaker speech in a preferred embodiment of the present invention;
Fig. 2 is the flow diagram of the first data compression in a preferred embodiment of the present invention;
Fig. 3 is the flow diagram of the second data compression in a preferred embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention will now be described clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the invention.
It should be noted that, where no conflict arises, the embodiments of the invention and the features of the embodiments may be combined with one another.
The invention is further described below with reference to the drawings and specific embodiments, which are not intended to limit the invention.
In a preferred embodiment of the present invention, in view of the above problems in the prior art, a voiceprint recognition method for multi-speaker speech is provided. The method can be applied to smart devices with voice control functions, for example intelligent robots used in personal spaces.
In this voiceprint recognition method, a first frequency band and a second frequency band are preset, the first frequency band being higher than the second frequency band. Specifically, the voice frequency may differ between users; a rough division by frequency yields a lower band corresponding to adult speakers and a higher band corresponding to child speakers.
Further, voiceprint recognition for multi-speaker speech may differ between adult speakers and child speakers; in particular, the extraction of voiceprint features and the construction of the corresponding voiceprint models may differ. Therefore, in the technical solution of the invention, two voice-reception frequency bands are set, and the voices of adults and children are distinguished and recognized according to these two bands, further improving the recognition accuracy. In other words, the first frequency band can represent the voice band of child speakers, and the second frequency band can represent the voice band of adult speakers. In a preferred embodiment of the invention, the two bands can accordingly be adjusted as experimental data accumulates, so that each band accurately represents the voice band of adult speakers and child speakers respectively.
In a preferred embodiment of the invention, as shown in Fig. 1, the voiceprint recognition method for multi-speaker speech specifically includes the following steps:
Step S1, receiving the sound source of multiple speakers;
Step S2, decomposing the sound source to obtain each person's voice separately;
Step S3, matching each person's voice against the first frequency band to obtain a corresponding matching degree, or matching each person's voice against the second frequency band to obtain a corresponding matching degree;
Step S4, extracting the voice with the largest matching degree, and fitting that voice to the first frequency band or the second frequency band;
Step S5, dividing voices of different backgrounds and different speakers within the first frequency band or the second frequency band into recognition segments of a specific length;
Step S6, applying a feature transform to each recognition segment to obtain multiple corresponding recognition features, and using all recognition features associated with all recognition segments to form the recognition feature space corresponding to the first frequency band, or the recognition feature space corresponding to the second frequency band;
Step S7, dividing the recognition feature space into a plurality of subspaces, describing each divided subspace with description information, and assigning a corresponding sequence number to each subspace;
Step S8, applying the feature transform to every training sentence associated with the training model in the first frequency band or the second frequency band to obtain a temporal feature point set containing corresponding temporal feature points; allocating each temporal feature point to a subspace under the same frequency band; forming, from the sequence numbers of the subspaces corresponding to the temporal feature points, a first sequence associated with the first frequency band or the second frequency band; and thereby forming the corresponding training recognition feature;
Step S9, applying the feature transform to every test sentence associated with the test model in the first frequency band or the second frequency band to obtain a temporal feature point set; allocating each temporal feature point to a subspace; forming, from the sequence numbers of the subspaces corresponding to the temporal feature points, a second sequence associated with the first frequency band or the second frequency band; and thereby forming the corresponding test recognition feature;
Step S10, comparing whether the training recognition feature associated with the first frequency band is similar to the test recognition feature, and processing the comparison result to obtain the confirmation result of the voiceprint recognition; or comparing whether the training recognition feature associated with the second frequency band is similar to the test recognition feature, and processing the comparison result to obtain the confirmation result of the voiceprint recognition.
In this embodiment, when several people speak at the same time, the voiceprint recognition method can decompose the sound source, obtain each speaker's voice, match each voice against the preset frequency bands, select the voice of the speaker to be recognized according to the matching similarity, and perform voiceprint recognition on that voice. The computation load is small, storage and computing resources are saved, the recognition accuracy is high, the problems of modeling methods based on probability statistics are overcome, and the method is suitable for intelligent systems with limited resources. Presetting a first frequency band representing child speakers and a second frequency band representing adult speakers, and comparing within each band separately, further improves the accuracy of the voiceprint recognition.
In a preferred embodiment of the present invention, on the basis of the above presets, in steps S5-S6 the voices of different backgrounds and different speakers within the first frequency band or the second frequency band are first obtained, and these voices are divided into recognition segments of a specific length. Specifically, every sentence in these voices of different backgrounds and different speakers is divided into sentence frames of 20 ms each with a 10 ms frame shift; silence is then removed frame by frame, a cepstrum transform is applied, and 12 coefficients are kept per frame, these 12 coefficients constituting the recognition feature. The recognition features of all voice segments constitute the recognition feature set, that is, the corresponding recognition feature space.
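A minimal sketch of this segmentation and feature extraction is given below, following the 20 ms frame, 10 ms shift and 12 coefficients stated above; the silence threshold and the plain log-spectrum DCT standing in for a full Mel filterbank are simplifying assumptions.

```python
import numpy as np
from scipy.fftpack import dct

def identification_features(signal, sample_rate, n_coeffs=12):
    """Steps S5-S6 sketch: frame the signal, drop silent frames, and keep
    12 cepstral coefficients per remaining frame as the recognition feature."""
    frame_len = int(0.020 * sample_rate)    # 20 ms frame
    frame_shift = int(0.010 * sample_rate)  # 10 ms frame shift
    features = []
    for start in range(0, len(signal) - frame_len + 1, frame_shift):
        frame = signal[start:start + frame_len]
        if np.mean(frame ** 2) < 1e-6:      # assumed silence threshold
            continue
        log_spectrum = np.log(np.abs(np.fft.rfft(frame)) + 1e-10)
        cepstrum = dct(log_spectrum, norm='ortho')
        features.append(cepstrum[:n_coeffs])  # keep 12 coefficients
    return np.array(features)  # one row per frame: the recognition features
```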
In a preferred embodiment of the present invention, in step S7 the recognition feature space is divided into a plurality of subspaces using the K-means algorithm; each subspace after division is recorded with its K-means center point as the description information of that subspace, the subspaces are numbered, and the description information of each subspace is recorded together with its sequence number. These operations are performed separately on the recognition feature spaces under the first frequency band and the second frequency band.
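A sketch of this division, assuming scikit-learn's KMeans as the K-means implementation and an arbitrary codebook size of 64 (the patent does not fix the number of subspaces):

```python
from sklearn.cluster import KMeans

def divide_feature_space(features, n_subspaces=64):
    """Step S7 sketch: partition the recognition feature space with K-means.
    Each centroid is a subspace's description information; its index in
    cluster_centers_ serves as the subspace's sequence number."""
    return KMeans(n_clusters=n_subspaces, n_init=10).fit(features)
```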
In a preferred embodiment of the present invention, the operation of step S8 is performed separately on the subspaces under the first frequency band and the second frequency band: the feature transform is applied to every training sentence associated with the training model to obtain a temporal feature point set containing the corresponding temporal feature points; each temporal feature point is allocated to a subspace under the same frequency band; a first sequence associated with the first frequency band or the second frequency band is formed from the sequence numbers of the subspaces corresponding to the temporal feature points; and the corresponding training recognition feature is then formed.
Specifically, in a preferred embodiment of the invention, a so-called training sentence is, after repeated training, the part of the training model stored inside the system and used as the reference when the system performs comparisons.
Specifically, in a preferred embodiment of the invention, in step S8 each temporal feature point is allocated, according to the nearest-neighbor rule, to a subspace under the same frequency band (the first or the second frequency band); the sequence number of the subspace corresponding to each temporal feature point is recorded, finally forming a first sequence composed of the sequence numbers of the subspaces, for example (2, 2, 4, 8, 8, 8, 5, 5, 5, 5, 5); the corresponding training recognition feature is then formed from this first sequence.
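Reusing the fitted K-means model from step S7, the nearest-neighbor allocation and sequence formation could look like this sketch:

```python
def to_sequence(temporal_points, kmeans):
    """Steps S8/S9 sketch: allocate each temporal feature point to its
    nearest subspace and read off the subspace sequence numbers,
    e.g. [2, 2, 4, 8, 8, 8, 5, 5, 5, 5, 5]."""
    # KMeans.predict assigns each point to the nearest stored centroid,
    # which is exactly the nearest-neighbor rule over the subspaces.
    return kmeans.predict(temporal_points).tolist()
```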
In a preferred embodiment of the present invention, step S9 similarly operates on the subspaces under the first frequency band or the second frequency band: the feature transform is applied to the test sentences associated with the test model to obtain a temporal feature point set; each temporal feature point is allocated to a subspace; a second sequence associated with the first frequency band or the second frequency band is formed from the sequence numbers of the subspaces corresponding to the temporal feature points; and the corresponding test recognition feature is then formed.
In a preferred embodiment of the invention, a so-called test sentence is associated with the test model, that is, it is the sentence to be compared.
Specifically, in step S9, each temporal feature point of the test sentence is likewise allocated, according to the nearest-neighbor rule, to a subspace under the same frequency band (the first or the second frequency band); the sequence number of the subspace corresponding to each temporal feature point is recorded, finally forming a second sequence likewise composed of the sequence numbers of the subspaces, for example (2, 3, 3, 5, 5, 8, 6, 6, 6, 4, 4); the corresponding test recognition feature is then formed from this second sequence. In a preferred embodiment of the invention, steps S8 and S9 do not depend on each other (the execution of step S9 is not premised on step S8 having finished), so steps S8 and S9 can be performed simultaneously; Fig. 1 nevertheless shows an embodiment in which steps S8 and S9 are performed in order.
In a preferred embodiment of the present invention, in step S10 the training recognition feature formed above is compared with the test recognition feature, and the final voiceprint recognition result is obtained from the comparison result. Specifically, step S10 compares within the first frequency band and within the second frequency band separately: the test recognition feature under the first frequency band is compared with the training recognition feature under the same band, and the voiceprint recognition result is obtained from the comparison result; likewise, the test recognition feature under the second frequency band is compared with the training recognition feature under the same band, and the voiceprint recognition result is obtained from the comparison result.
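The patent does not name a similarity measure for the two sequences, so the sketch below uses a normalized edit distance as one plausible choice; the acceptance threshold is hypothetical.

```python
def sequence_similarity(train_seq, test_seq):
    """Step S10 sketch: similarity of the training and test sequences
    within the same frequency band, via normalized edit distance."""
    m, n = len(train_seq), len(test_seq)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if train_seq[i - 1] == test_seq[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return 1.0 - d[m][n] / max(m, n, 1)

ACCEPT_THRESHOLD = 0.8  # hypothetical confirmation threshold
```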
Further, in a preferred embodiment of the invention, in step S8 the spatial sequence includes data groups associated with the subspaces, each data group corresponding to one sequence number;
after the spatial sequence is formed, the method further includes a first data compression process performed on the spatial sequence in the first frequency band or the second frequency band, specifically, as shown in Fig. 2:
Step S81, recording the sequence number of each data group, and recording the repeat count associated with each sequence number;
Step S82, judging whether any sequence number has a repeat count of 1, and turning to step S83 when a data group with a repeat count of 1 exists;
Step S83, deleting the data group whose sequence number has a repeat count of 1;
Step S84, judging whether the sequence number of the data group preceding the deleted data group is identical to the sequence number of the data group following it:
if identical, merging the preceding and following data groups into one;
if not, retaining both the preceding and following data groups;
the first sequence is formed after the first data compression has been performed on all data groups in the spatial sequence.
Specifically, in a preferred embodiment of the invention, during the first data compression the sequence number of each subspace and the count of identical consecutive sequence numbers are recorded, and each sequence number is arranged with its count as one data group; when the count of a sequence number is 1, that data group is removed. In the example above, the data group with sequence number 4 occurs only once, so it is deleted during the first data compression.
If, after a data group is removed, the sequence number of the preceding data group is identical to that of the following data group, the two are merged into one: the newly formed data group takes the sequence number of the group preceding the deleted group, and its count is the sum of the counts of the preceding and following groups. Otherwise, if the sequence numbers of the preceding and following groups differ, both groups are retained. For example, in the preferred embodiment of the invention, after the data group with sequence number 4 is removed, the preceding group has sequence number 2 and the following group has sequence number 8; since 2 and 8 differ, both original data groups are retained.
In a preferred embodiment of the invention, the first sequence obtained after the first data compression is the training recognition feature described above.
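A compact sketch of steps S81-S84, treating the spatial sequence as run-length encoding (the second data compression of step S9 below is the same procedure applied to the test sequence):

```python
from itertools import groupby

def first_data_compression(spatial_sequence):
    """Steps S81-S84: run-length encode, drop runs whose repeat count is 1,
    then merge adjacent runs left with identical sequence numbers."""
    # Step S81: record each sequence number with its repeat count.
    runs = [(num, len(list(group))) for num, group in groupby(spatial_sequence)]
    # Steps S82-S83: delete data groups whose repeat count is 1.
    runs = [(num, count) for num, count in runs if count > 1]
    # Step S84: merge neighbors that now share the same sequence number.
    merged = []
    for num, count in runs:
        if merged and merged[-1][0] == num:
            merged[-1] = (num, merged[-1][1] + count)
        else:
            merged.append((num, count))
    return merged

# Example from the text: [2,2,4,8,8,8,5,5,5,5,5] -> [(2, 2), (8, 3), (5, 5)];
# the lone 4 is deleted, and since its neighbors 2 and 8 differ, both remain.
```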
Correspondingly, in a preferred embodiment of the invention, in step S9 the spatial sequence includes data groups associated with the subspaces, each data group corresponding to one sequence number;
after the spatial sequence is formed, the method further includes a second data compression process performed on the spatial sequence in the first frequency band or the second frequency band, specifically, as shown in Fig. 3:
Step S91, recording the sequence number of each data group, and recording the repeat count associated with each sequence number;
Step S92, judging whether any sequence number has a repeat count of 1, and turning to step S93 when a data group with a repeat count of 1 exists;
Step S93, deleting the data group whose sequence number has a repeat count of 1;
Step S94, judging whether the sequence number of the data group preceding the deleted data group is identical to the sequence number of the data group following it:
if identical, merging the preceding and following data groups into one;
if not, retaining both the preceding and following data groups;
the second sequence is formed after the second data compression has been performed on all data groups in the spatial sequence.
Specifically, as in step S8, step S9 records the sequence number of each subspace and the count of identical consecutive sequence numbers, arranges each sequence number with its count as one data group, and removes any data group whose count is 1.
If, after a data group is removed, the sequence number of the preceding data group is identical to that of the following data group, the two are merged into one: the newly formed data group takes the sequence number of the group preceding the deleted group, and its count is the sum of the counts of the preceding and following groups. Otherwise, if the sequence numbers differ, both groups are retained. For example, in the preferred embodiment of the invention, after the data group with sequence number 4 is removed, the preceding group has sequence number 2 and the following group has sequence number 8; since 2 and 8 differ, both original data groups are retained.
Similarly, in a preferred embodiment of the invention, the second sequence obtained after the second data compression is the test recognition feature.
In step S10, finally, the training recognition feature and the test recognition feature under the same frequency band (the first or the second frequency band) are compared, and the final voiceprint recognition result is obtained from the comparison result.
Executing the above steps makes the computation load of the voiceprint recognition small and the recognition rate better, and the amount of data to be processed is also relatively small.
The foregoing are only preferred embodiments of the present invention and do not limit its embodiments or scope of protection. Those skilled in the art will appreciate that all equivalent substitutions and obvious variations made using the description and drawings of the invention fall within the scope of the invention.
Claims (9)
1. A voiceprint recognition method for multi-speaker speech, characterized in that a first frequency band and a second frequency band are preset, the first frequency band being higher than the second frequency band, the method comprising the steps of:
Step S1, receiving the sound source of multiple speakers;
Step S2, decomposing the sound source to obtain each person's voice separately;
Step S3, matching each person's voice against the first frequency band to obtain a corresponding matching degree, or matching each person's voice against the second frequency band to obtain a corresponding matching degree;
Step S4, extracting the voice with the largest matching degree, and fitting that voice to the first frequency band or the second frequency band;
Step S5, dividing voices of different backgrounds and different speakers within the first frequency band or the second frequency band into recognition segments of a specific length;
Step S6, applying a feature transform to each recognition segment to obtain multiple corresponding recognition features, and using all recognition features associated with all recognition segments to form the recognition feature space corresponding to the first frequency band, or the recognition feature space corresponding to the second frequency band;
Step S7, dividing the recognition feature space into a plurality of subspaces, describing each divided subspace with description information, and assigning a corresponding sequence number to each subspace;
Step S8, applying the feature transform to every training sentence associated with the training model in the first frequency band or the second frequency band to obtain a temporal feature point set containing corresponding temporal feature points; allocating each temporal feature point to one of the subspaces under the same frequency band; forming, from the sequence numbers of the subspaces corresponding to the temporal feature points, a first sequence associated with the first frequency band or the second frequency band; and thereby forming a corresponding training recognition feature;
Step S9, applying the feature transform to every test sentence associated with the test model in the first frequency band or the second frequency band to obtain a temporal feature point set; allocating each temporal feature point to one of the subspaces; forming, from the sequence numbers of the subspaces corresponding to the temporal feature points, a second sequence associated with the first frequency band or the second frequency band; and thereby forming a corresponding test recognition feature;
Step S10, comparing whether the training recognition feature associated with the first frequency band is similar to the test recognition feature, and processing the comparison result to obtain the confirmation result of the multi-speaker voiceprint recognition; or comparing whether the training recognition feature associated with the second frequency band is similar to the test recognition feature, and processing the comparison result to obtain the confirmation result of the multi-speaker voiceprint recognition.
2. The voiceprint recognition method for multi-speaker speech of claim 1, characterized in that in step S8, each temporal feature point is allocated to a subspace according to the nearest-neighbor rule.
3. The voiceprint recognition method for multi-speaker speech of claim 1, characterized in that in step S8, the subspaces to which the temporal feature points are allocated form a spatial sequence according to their sequence numbers, and the spatial sequence is used as the first sequence to form the training recognition feature.
4. The voiceprint recognition method for multi-speaker speech of claim 1, characterized in that in step S9, the subspaces to which the temporal feature points are allocated form a spatial sequence according to their sequence numbers, and the spatial sequence is used as the second sequence to form the test recognition feature.
5. The voiceprint recognition method for multi-speaker speech of claim 3, characterized in that in step S8, the spatial sequence includes data groups associated with the subspaces, each data group corresponding to one sequence number;
after the spatial sequence is formed, the method further includes a first data compression process performed on the spatial sequence in the first frequency band or the second frequency band, specifically:
Step S81, recording the sequence number of each data group, and recording the repeat count associated with each sequence number;
Step S82, judging whether any sequence number has a repeat count of 1, and turning to step S83 when a data group with a repeat count of 1 exists;
Step S83, deleting the data group whose sequence number has a repeat count of 1;
Step S84, judging whether the sequence number of the data group preceding the deleted data group is identical to the sequence number of the data group following it:
if identical, merging the preceding and following data groups into one;
if not, retaining both the preceding and following data groups;
the first sequence being formed after the first data compression has been performed on all data groups in the spatial sequence.
6. The voiceprint recognition method for multi-speaker speech of claim 4, characterized in that in step S9, the spatial sequence includes data groups associated with the subspaces, each data group corresponding to one sequence number;
after the spatial sequence is formed, the method further includes a second data compression process performed on the spatial sequence in the first frequency band or the second frequency band, specifically:
Step S91, recording the sequence number of each data group, and recording the repeat count associated with each sequence number;
Step S92, judging whether any sequence number has a repeat count of 1, and turning to step S93 when a data group with a repeat count of 1 exists;
Step S93, deleting the data group whose sequence number has a repeat count of 1;
Step S94, judging whether the sequence number of the data group preceding the deleted data group is identical to the sequence number of the data group following it:
if identical, merging the preceding and following data groups into one;
if not, retaining both the preceding and following data groups;
the second sequence being formed after the second data compression has been performed on all data groups in the spatial sequence.
7. The voiceprint recognition method for multi-speaker speech of claim 1, characterized in that the feature transform is the Mel cepstrum transform.
8. The voiceprint recognition method for multi-speaker speech of claim 7, characterized in that during the Mel cepstrum transform, each sentence is divided into 20 ms frames with a 10 ms frame shift to obtain the sentence frames associated with the sentence;
silence is then removed frame by frame, 12 coefficients are kept for each frame after the cepstrum transform, and the recognition feature is composed of the 12 coefficients.
9. The voiceprint recognition method for multi-speaker speech of claim 1, characterized in that in step S7, the recognition feature space is divided into several subspaces using the K-means algorithm, and each subspace after division is recorded with its K-means center point as the description information of that subspace.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610024134.6A CN106971737A (en) | 2016-01-14 | 2016-01-14 | A voiceprint recognition method for multi-speaker speech |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610024134.6A CN106971737A (en) | 2016-01-14 | 2016-01-14 | A voiceprint recognition method for multi-speaker speech |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106971737A true CN106971737A (en) | 2017-07-21 |
Family
ID=59335025
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610024134.6A Pending CN106971737A (en) | 2016-01-14 | 2016-01-14 | A voiceprint recognition method for multi-speaker speech |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106971737A (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108447502A (en) * | 2018-03-09 | 2018-08-24 | 福州米鱼信息科技有限公司 | A kind of memo method and terminal based on voice messaging |
CN109051405A (en) * | 2018-08-31 | 2018-12-21 | 深圳市研本品牌设计有限公司 | A kind of intelligent dustbin and storage medium |
CN109256121A (en) * | 2018-08-31 | 2019-01-22 | 深圳市研本品牌设计有限公司 | The rubbish put-on method and system of multi-person speech identification |
CN109256120A (en) * | 2018-08-31 | 2019-01-22 | 深圳市研本品牌设计有限公司 | A kind of voice dustbin and storage medium |
CN110473566A (en) * | 2019-07-25 | 2019-11-19 | 深圳壹账通智能科技有限公司 | Audio separation method, device, electronic equipment and computer readable storage medium |
CN111694539A (en) * | 2020-06-23 | 2020-09-22 | 北京小米松果电子有限公司 | Method, apparatus and medium for switching between earpiece and speaker |
CN115331673A (en) * | 2022-10-14 | 2022-11-11 | 北京师范大学 | Voiceprint recognition household appliance control method and device in complex sound scene |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6130949A (en) * | 1996-09-18 | 2000-10-10 | Nippon Telegraph And Telephone Corporation | Method and apparatus for separation of source, program recorded medium therefor, method and apparatus for detection of sound source zone, and program recorded medium therefor |
CN101661754A (en) * | 2003-10-03 | 2010-03-03 | 旭化成株式会社 | Data processing unit, method and control program |
CN101944359A (en) * | 2010-07-23 | 2011-01-12 | 杭州网豆数字技术有限公司 | Voice recognition method facing specific crowd |
CN102354496A (en) * | 2011-07-01 | 2012-02-15 | 中山大学 | PSM-based (pitch scale modification-based) speech identification and restoration method and device thereof |
CN102623008A (en) * | 2011-06-21 | 2012-08-01 | 中国科学院苏州纳米技术与纳米仿生研究所 | Voiceprint identification method |
CN103943104A (en) * | 2014-04-15 | 2014-07-23 | 海信集团有限公司 | Voice information recognition method and terminal equipment |
CN104185868A (en) * | 2012-01-24 | 2014-12-03 | 澳尔亚有限公司 | Voice authentication and speech recognition system and method |
CN104392718A (en) * | 2014-11-26 | 2015-03-04 | 河海大学 | Robust voice recognition method based on acoustic model array |
- 2016
- 2016-01-14 CN CN201610024134.6A patent/CN106971737A/en active Pending
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6130949A (en) * | 1996-09-18 | 2000-10-10 | Nippon Telegraph And Telephone Corporation | Method and apparatus for separation of source, program recorded medium therefor, method and apparatus for detection of sound source zone, and program recorded medium therefor |
CN101661754A (en) * | 2003-10-03 | 2010-03-03 | 旭化成株式会社 | Data processing unit, method and control program |
CN101944359A (en) * | 2010-07-23 | 2011-01-12 | 杭州网豆数字技术有限公司 | Voice recognition method facing specific crowd |
CN102623008A (en) * | 2011-06-21 | 2012-08-01 | 中国科学院苏州纳米技术与纳米仿生研究所 | Voiceprint identification method |
CN102354496A (en) * | 2011-07-01 | 2012-02-15 | 中山大学 | PSM-based (pitch scale modification-based) speech identification and restoration method and device thereof |
CN104185868A (en) * | 2012-01-24 | 2014-12-03 | 澳尔亚有限公司 | Voice authentication and speech recognition system and method |
CN103943104A (en) * | 2014-04-15 | 2014-07-23 | 海信集团有限公司 | Voice information recognition method and terminal equipment |
CN104392718A (en) * | 2014-11-26 | 2015-03-04 | 河海大学 | Robust voice recognition method based on acoustic model array |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108447502A (en) * | 2018-03-09 | 2018-08-24 | 福州米鱼信息科技有限公司 | A kind of memo method and terminal based on voice messaging |
CN109051405A (en) * | 2018-08-31 | 2018-12-21 | 深圳市研本品牌设计有限公司 | A kind of intelligent dustbin and storage medium |
CN109256121A (en) * | 2018-08-31 | 2019-01-22 | 深圳市研本品牌设计有限公司 | The rubbish put-on method and system of multi-person speech identification |
CN109256120A (en) * | 2018-08-31 | 2019-01-22 | 深圳市研本品牌设计有限公司 | A kind of voice dustbin and storage medium |
CN110473566A (en) * | 2019-07-25 | 2019-11-19 | 深圳壹账通智能科技有限公司 | Audio separation method, device, electronic equipment and computer readable storage medium |
CN111694539A (en) * | 2020-06-23 | 2020-09-22 | 北京小米松果电子有限公司 | Method, apparatus and medium for switching between earpiece and speaker |
CN111694539B (en) * | 2020-06-23 | 2024-01-30 | 北京小米松果电子有限公司 | Method, device and medium for switching between earphone and loudspeaker |
CN115331673A (en) * | 2022-10-14 | 2022-11-11 | 北京师范大学 | Voiceprint recognition household appliance control method and device in complex sound scene |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106971737A (en) | A voiceprint recognition method for multi-speaker speech | |
CN107464568B (en) | Speaker identification method and system based on three-dimensional convolution neural network text independence | |
CN108597496B (en) | Voice generation method and device based on generation type countermeasure network | |
CN108281137A (en) | A kind of universal phonetic under whole tone element frame wakes up recognition methods and system | |
CN102509547B (en) | Method and system for voiceprint recognition based on vector quantization based | |
CN108172218B (en) | Voice modeling method and device | |
CN104167208B (en) | A kind of method for distinguishing speek person and device | |
CN107731233A (en) | A kind of method for recognizing sound-groove based on RNN | |
CN108364662B (en) | Voice emotion recognition method and system based on paired identification tasks | |
CN106898355B (en) | Speaker identification method based on secondary modeling | |
CN103971690A (en) | Voiceprint recognition method and device | |
CN101540170B (en) | Voiceprint recognition method based on biomimetic pattern recognition | |
CN109473105A (en) | The voice print verification method, apparatus unrelated with text and computer equipment | |
CN111091809B (en) | Regional accent recognition method and device based on depth feature fusion | |
CN112507311A (en) | High-security identity verification method based on multi-mode feature fusion | |
Fong | Using hierarchical time series clustering algorithm and wavelet classifier for biometric voice classification | |
CN105845141A (en) | Speaker confirmation model, speaker confirmation method and speaker confirmation device based on channel robustness | |
CN105679323B (en) | A kind of number discovery method and system | |
CN103811000A (en) | Voice recognition system and voice recognition method | |
CN106971730A (en) | A kind of method for recognizing sound-groove based on channel compensation | |
CN106971727A (en) | A kind of verification method of Application on Voiceprint Recognition | |
CN105845143A (en) | Speaker confirmation method and speaker confirmation system based on support vector machine | |
CN106971731A (en) | A kind of modification method of Application on Voiceprint Recognition | |
CN116434758A (en) | Voiceprint recognition model training method and device, electronic equipment and storage medium | |
CN106887230A (en) | A kind of method for recognizing sound-groove in feature based space |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20170721 |