CN109872725A - Multi-angle of view vector processing method and equipment - Google Patents


Publication number
CN109872725A
Authority
CN
China
Prior art keywords
angle
visual angle
component
view vector
likelihood
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711267389.6A
Other languages
Chinese (zh)
Other versions
CN109872725B (en)
Inventor
石自强
刘柳
刘汝杰
林慧镔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujitsu Ltd
Original Assignee
Fujitsu Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujitsu Ltd
Priority to CN201711267389.6A
Publication of CN109872725A
Application granted
Publication of CN109872725B
Legal status: Active
Anticipated expiration


Abstract

The present application discloses a multi-view vector processing method and device. A multi-view vector x represents an object containing information on at least two non-discrete (inseparable) views. The method comprises: a modeling step of establishing a model of the multi-view vector that includes at least the following components: a population mean μ of the multi-view vector, a component for each view of the multi-view vector, and noise ε; a training step of using training data of the multi-view vector x to obtain the population mean μ, the parameters of the component of each view, and the parameters of the noise ε; and a matching step of using the population mean μ, the parameters of the component of each view, and the parameters of the noise ε to calculate likelihood components for the cases in which the view components of two multi-view vectors are the same and different, preprocessing the calculated likelihood components to obtain an approximate likelihood, and judging from the approximate likelihood whether the two multi-view vectors match.

Description

Multi-view vector processing method and device
Technical field
The present application relates to the field of information processing, and in particular to the analysis and comparison of multi-view vectors.
Background
In various pattern recognition technologies, the features that are extracted and used are often intuitive and discrete (separable). For example, the shape feature and the texture feature of an object are discrete: for a given object, one can consider only its shape and ignore its texture (the texture term is 0), or conversely consider only its texture and ignore its shape (the shape term is 0). As another example, when waveforms are superimposed in the frequency domain, the resulting waveform appears indivisible, yet its high-frequency and low-frequency components can be separated and can exist independently, that is, the other component is 0. In such cases each independent feature can be modeled separately and the models then simply superimposed.
Summary of the invention
A brief summary of the invention is given below to provide a basic understanding of certain aspects of the invention. It should be understood that this summary is not an exhaustive overview of the invention. It is not intended to identify key or essential parts of the invention, nor to limit its scope. Its sole purpose is to present some concepts in simplified form as a prelude to the more detailed description that follows.
According to one aspect of the invention, a multi-view vector processing method is provided, in which a multi-view vector x represents an object containing information on at least two non-discrete views. The method comprises: a modeling step of establishing a model of the multi-view vector that includes at least the following components: a population mean μ of the multi-view vector, a component for each view of the multi-view vector, and noise ε; a training step of using training data of the multi-view vector x to obtain the population mean μ, the parameters of the component of each view, and the parameters of the noise ε; and a matching step of using the population mean μ, the parameters of the component of each view, and the parameters of the noise ε to calculate likelihood components for the cases in which the view components of two multi-view vectors are the same and different, preprocessing the calculated likelihood components to obtain an approximate likelihood, and judging from the approximate likelihood whether the two multi-view vectors match.
A multi-view vector processing device is also provided, comprising a processor and a storage medium storing program code which, when executed by the processor, implements the method described above.
According to other aspects of the invention, corresponding computer program code, a computer-readable storage medium and a computer program product are also provided.
According to the multi-view vector processing method and device of the present application, multiple relatively non-discrete views can be separated by modeling, and the resulting model can then be used to judge whether each view component is the same between different multi-view vectors, for example for voiceprint verification.
These and other advantages of the invention will become more apparent from the following detailed description of embodiments of the invention in conjunction with the accompanying drawings.
Brief description of the drawings
To further describe the above and other advantages and features of the present application, specific embodiments of the application are described in more detail below with reference to the accompanying drawings. The drawings, together with the detailed description below, are included in and form part of this specification. Elements with the same function and structure are denoted by the same reference numerals. It should be understood that these drawings depict only typical examples of the application and are not to be taken as limiting its scope. In the drawings:
Fig. 1 is a schematic diagram of relatively non-discrete components;
Fig. 2 is a schematic diagram of relatively discrete components;
Fig. 3 is a schematic flow chart of one embodiment of the disclosed multi-view vector processing method;
Fig. 4 is a schematic flow chart of another embodiment of the disclosed multi-view vector processing method;
Fig. 5A is a schematic diagram of a scenario in which it is judged whether all views of two multi-view vectors are the same;
Fig. 5B is a schematic diagram of a scenario in which it is judged whether one view of two multi-view vectors is the same;
Fig. 6 is a schematic flow chart of a further embodiment of the disclosed multi-view vector processing method; and
Fig. 7 is a block diagram of an exemplary structure of a general-purpose personal computer in which the method and/or device according to embodiments of the invention can be implemented.
Detailed description of embodiments
Exemplary embodiments of the invention are described below with reference to the accompanying drawings. For clarity and conciseness, not all features of an actual implementation are described in this specification. It should be understood, however, that many implementation-specific decisions must be made in developing any such actual embodiment in order to achieve the developer's specific goals, for example compliance with system-related and business-related constraints, and that these constraints may vary from one implementation to another. It should also be understood that, although such development work may be complex and time-consuming, it is merely a routine task for those skilled in the art having the benefit of this disclosure.
It should also be noted that, to avoid obscuring the invention with unnecessary detail, the drawings show only the device structures and/or processing steps closely related to the solution of the invention, and omit other details of little relevance to it.
The discussion below proceeds in the following order:
1. Modeling of the multi-view vector
2. Use of the multi-view vector model
3. Extraction of the multi-view vector
4. Multi-view vector processing device
5. Computing device for implementing the device and method of the application
[1. Modeling of the multi-view vector]
As mentioned in the Background section, features that are intuitive and discrete can be modeled separately and then simply superimposed. However, there are cases in which the features to be extracted and used are non-discrete. In such cases the prior art often models only the features associated with the feature to be extracted and used. In speech recognition, for example, training can only be carried out on a large number of features extracted with content recognition as the training objective, and the influence of different speakers can never be fully removed, so that a speech recognition product always needs a fairly long period of training and adaptation when it is used by different people. The reason is that the voice content and the vocal organs of a specific person cannot be separated. In language content recognition, for example, a sentence that is spoken is necessarily spoken by some person and cannot exist apart from a person; and in speaker identification, a voiceprint is necessarily extracted from specific speech.
Similar cases include the interweaving of language, age, gender, voice content and the identity of a specific person, and, in applications related to image recognition, the interweaving of age, gender, race and specific identity, and so on.
In other words, if the voice or image of a person is expressed as a feature vector, the above factors necessarily exist simultaneously in that feature vector, which is therefore called a "multi-view vector". Each of voice content, language, age, gender, race and so on is a non-discrete "view": every view necessarily takes some option and cannot be zero. Specifically, a sentence is necessarily spoken by person A, or person B, or some other person; it cannot be spoken by "nobody" (in this sense a robot is also a "person"; in other words, a sentence is necessarily uttered by some entity). Conversely, to extract a voiceprint a person must speak, so the voice-content view cannot be 0 either.
Fig. 1 and Fig. 2 further illustrate non-discrete and discrete views. Fig. 1 shows two non-discrete views u and v, which always exist together and in association and cannot be separated. View u always takes some option, such as $u_1$ or $u_2$, and cannot be empty or zero; view v likewise always takes some option, such as $v_1$, $v_2$ or $v_3$, and cannot be empty or zero (the number of options of u or v is of course not limited to these). The joint action of the two views produces the object or information x to be studied: for example, $u_1$ and $v_1$ produce $x_{11n}$ (n is a natural number indexing different samples), and so on.
In Fig. 2, by contrast, the two views u and v are discrete. This means that either view may be empty or zero while the other exists on its own. When both views are present, the resulting object or information can be expressed simply as the sum of the two views. For example, the information sample $x_{21n}$ generated by the specific selection $u_2$ of view u and the specific selection $v_1$ of view v can be expressed as the sum of the information $x_{20n}$ generated by $u_2$ alone and the information $x_{01n}$ generated by $v_1$ alone.
Of course, the concepts of discrete and non-discrete in this disclosure should not be taken too absolutely. There may be cases that are not absolutely separable but in which the two views are only weakly entangled. Whether such a case is treated as discrete or non-discrete can depend on actual needs.
This disclosure is directed to multi-view vectors that contain information on multiple (relatively) non-discrete views. The multi-view vector itself may be obtained in any conventional way. For example, referring to Fig. 3, an object or information to be processed 302 can be vectorized 304 in any manner to obtain a multi-view vector 306. For instance, collected voice data may be processed as follows: the voice data is divided into frames with a frame length of 25 ms and a frame shift of 10 ms; 13-dimensional mel-frequency cepstral coefficients (MFCCs) are extracted and concatenated with their first-order and second-order differences, giving 39 dimensions per frame. Each frame is then combined with its context, 39 frames in total (25 frames to the left and 13 frames to the right), giving a final feature of 1521 dimensions (39 × 39). The resulting 1521-dimensional vector can serve as the object processed by the technical solution of this disclosure. Those skilled in the art will understand that voice data may also be processed by other methods well known in the art, which are not described here. Vectorization is of course not limited to voice data.
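The context splicing described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `splice_context` is hypothetical, the per-frame 39-dimensional features are taken as already computed, and frames beyond either end of the utterance are filled by repeating the edge frame, a choice the text does not specify.

```python
def splice_context(frames, t, left=25, right=13):
    """Concatenate frame t with its `left` preceding and `right` following frames."""
    last = len(frames) - 1
    spliced = []
    for offset in range(-left, right + 1):
        # Assumption: indices past either end of the utterance reuse the edge frame.
        idx = min(max(t + offset, 0), last)
        spliced.extend(frames[idx])
    return spliced


# Toy input: 100 frames, each a 39-dimensional feature
# (13 MFCCs plus first-order and second-order differences).
frames = [[float(n)] * 39 for n in range(100)]
vector = splice_context(frames, t=50)
print(len(vector))  # (25 + 1 + 13) frames * 39 dims = 1521
```

With a frame shift of 10 ms, the 39 spliced frames cover roughly 0.4 s of speech around the center frame.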
After the multi-view vector 306 is obtained, vector decomposition 308 can be performed with the method proposed here in order to carry out a specific application 310, such as judging whether two multi-view vectors match. The present application also proposes new schemes for the vectorization method 304, the vector decomposition method 308 (i.e., the modeling of the multi-view vector) and the application 310. The new vector decomposition method, in other words the multi-view vector modeling method, is discussed first. According to the application, a vector decomposition model is established or trained 508 for the multi-view vector 306 in the manner proposed by the application, yielding vector decomposition model parameters 510, with which the vector decomposition 308 can be performed. From the application point of view, the model parameters 510 can also be used directly in the application 310, because once the model parameters 510 are available, whether an explicit vector decomposition 308 is performed is not important. In some models, applying the model parameters 510 directly to the information of the object under study is equivalent to decomposing the object information first and then applying it.
According to one embodiment of the application, a multi-view vector processing method is proposed, in which a multi-view vector x represents an object containing information on at least two non-discrete views. The method comprises: a modeling step of establishing a model of the multi-view vector that includes at least the following components: a population mean μ of the multi-view vector, a component for each view of the multi-view vector, and noise ε; a training step of using training data of the multi-view vector x to obtain the population mean μ, the parameters of the component of each view, and the parameters of the noise ε; and a matching step of using the population mean μ, the parameters of the component of each view, and the parameters of the noise ε to calculate likelihood components for the cases in which the view components of two multi-view vectors are the same and different, preprocessing the calculated likelihood components to obtain an approximate likelihood, and judging from the approximate likelihood whether the two multi-view vectors match.
According to this embodiment, the following model is established for the multi-view vector x, and the model parameters are obtained by training so that the influence of each component in x can be determined:
$$x = \mu + \sum_i C_i + \epsilon \tag{1}$$
where $C_i$ is the component of each view and i is the index of the view.
The inventors recognized that the population mean can also be distributed among the components of the views. Therefore, in one embodiment, the population mean μ can be set to 0, and the model becomes:
$$x = \sum_i C_i + \epsilon \tag{2}$$
In addition, the component $C_i$ of each view can be regarded as the product of the space basis $S_i$ of that view and the coefficient $u_i$ of the specific selection of that view, where i is the index of the view. That is:
$$x = \mu + \sum_i S_i u_i + \epsilon \tag{3}$$
The noise ε can be regarded as following a Gaussian distribution whose covariance is the diagonal matrix Σ.
The training step can use the expectation-maximization (EM) algorithm to obtain the population mean μ, the space basis $S_i$ of each view, and Σ from the training data. Specifically, based on μ, $S_i$ and Σ, two expectations are calculated: the expectation of the mean over all samples of x for the specific selections of the view components, and the covariance-related expectation for the specific selections of the view components. μ, $S_i$ and Σ are then recalculated from these expectations, and the process is repeated until convergence.
For ease of description, only two views are considered below, taking as an example a voiceprint (i.e., a multi-view vector) extracted from speech that contains the two views speaker and text. Assume the training data contains I speakers and J texts, and that each speaker has $H_{ij}$ speech segments for each text. Denote by $x_{ijk}$ the multi-view voiceprint of the k-th speech segment of the j-th text spoken by the i-th person. Formula (3) can then be written as:
$$x_{ijk} = \mu + S u_i + T v_j + \epsilon_{ijk} \tag{4}$$
where μ is the average of all $x_{ijk}$, i.e., the population mean; S and T are the space bases of the speaker view and the text view respectively; $u_i$ is the coefficient of the i-th selection of the view with basis S; and $v_j$ is the coefficient of the j-th selection of the view with basis T. $\epsilon_{ijk}$ is the noise signal (following a Gaussian distribution whose covariance is the diagonal matrix Σ), and k indexes the k-th sample under the above selections. Let θ = {μ, S, T, Σ} denote all parameters of the multi-view vector model and, to simplify the description below, let B = [S T]. The parameters of the model are assumed to follow the distributions:
$$P(x_{ijk} \mid u_i, v_j, \theta) = \mathcal{N}(x_{ijk} \mid \mu + S u_i + T v_j, \Sigma), \quad P(u_i) = \mathcal{N}(u_i \mid 0, I), \quad P(v_j) = \mathcal{N}(v_j \mid 0, I) \tag{5}$$
where $\mathcal{N}(x \mid \mu, \Sigma)$ is the normal distribution with mean μ and covariance Σ. That is, given the parameters θ and the specific selections $u_i$ and $v_j$ of the two views S and T, the multi-view vector $x_{ijk}$ is normally distributed with mean $\mu + S u_i + T v_j$ and covariance Σ, while $u_i$ and $v_j$ are themselves normally distributed with mean 0 and covariance equal to the identity matrix I.
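To make the two-view generative model of formula (4) concrete, the following sketch draws synthetic voiceprints from it. It is a minimal illustration using only the Python standard library; the function names and the toy parameter values are hypothetical, and a real system would work with much higher-dimensional vectors.

```python
import random


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vec)) for row in matrix]


def sample_voiceprints(mu, S, T, sigma_diag, n_speakers, n_texts, n_segments, rng):
    """Draw x_ijk = mu + S u_i + T v_j + eps_ijk for every (i, j, k)."""
    dim = len(mu)
    # Latent coefficients: u_i ~ N(0, I) per speaker, v_j ~ N(0, I) per text.
    u = [[rng.gauss(0.0, 1.0) for _ in S[0]] for _ in range(n_speakers)]
    v = [[rng.gauss(0.0, 1.0) for _ in T[0]] for _ in range(n_texts)]
    samples = {}
    for i in range(n_speakers):
        Su = matvec(S, u[i])
        for j in range(n_texts):
            Tv = matvec(T, v[j])
            for k in range(n_segments):
                # Noise with diagonal covariance: independent per dimension.
                eps = [rng.gauss(0.0, var ** 0.5) for var in sigma_diag]
                samples[(i, j, k)] = [mu[d] + Su[d] + Tv[d] + eps[d] for d in range(dim)]
    return samples


# Hypothetical toy parameters: 3-dimensional vectors, 2 speaker factors, 1 text factor.
mu = [0.5, -1.0, 2.0]
S = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
T = [[0.3], [0.3], [0.3]]
data = sample_voiceprints(mu, S, T, [0.01, 0.01, 0.01], 4, 3, 2, random.Random(0))
print(len(data))  # 4 speakers * 3 texts * 2 segments = 24
```

Every sample of speaker i shares the same $u_i$ and every sample of text j shares the same $v_j$, which is what ties the two views together in each observed vector.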
The basic process of the EM algorithm mentioned above is as follows.
First, the parameters θ = {μ, S, T, Σ} are initialized randomly.
Then, for the multi-view vectors (voiceprints) $X = \{x_{ijk}: i = 1, \ldots, I;\ j = 1, \ldots, J;\ k = 1, \ldots, H_{ij}\}$ of all I speakers and J texts in the training data, with $H_{ij}$ speech segments per speaker and text, the two expectations referred to as formula (6) and formula (7) are calculated.
Formula (6) is the expectation of the mean over all samples of X, and formula (7) is the covariance-related expectation for the specific selections of each view component. Here $\theta_t$ denotes the parameters θ at step t; in the first iteration, t = 1, it is the arbitrarily chosen initial value mentioned above. U denotes the set of variables $u_i$, V denotes the set of variables $v_j$, and Z is related to U and V by the Cartesian product, i.e., Z = U × V.
Next, new parameter values are calculated from the expectations obtained above.
The new parameter values are then used to recalculate the expectations of formulas (6) and (7), and the loop is repeated until convergence, yielding θ = {μ, S, T, Σ}. Once the model parameters are obtained, the component of each view is obtained as well, in an expression that uses $B = 2SS^{\top} + TT^{\top}$.
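The alternation between expectation and re-estimation can be illustrated on a deliberately reduced case. The sketch below is not the update of formulas (6) and (7), which are not reproduced in this text; it is the standard EM iteration for a scalar, single-view analogue $x_{ik} = \mu + s\,u_i + \epsilon_{ik}$ with $u_i \sim \mathcal{N}(0, 1)$ and $\epsilon_{ik} \sim \mathcal{N}(0, \sigma^2)$. The function name, the fixed iteration count and the synthetic data are all assumptions made for illustration.

```python
import random


def em_single_view(groups, iters=200):
    """EM for the scalar model x_ik = mu + s*u_i + eps_ik,
    with u_i ~ N(0, 1) and eps_ik ~ N(0, sigma2)."""
    all_x = [x for g in groups for x in g]
    n = len(all_x)
    mu = sum(all_x) / n                      # population mean, fixed at the sample mean
    total_sq = sum((x - mu) ** 2 for x in all_x)
    s, sigma2 = 1.0, 1.0                     # arbitrary initial values
    for _ in range(iters):
        sum_mD = 0.0                         # sum_i E[u_i] * D_i
        sum_Eu2 = 0.0                        # sum_i H_i * E[u_i^2]
        for g in groups:
            D = sum(x - mu for x in g)       # centered group total
            # E-step: Gaussian posterior of u_i given the samples of group i.
            post_var = 1.0 / (1.0 + len(g) * s * s / sigma2)
            post_mean = post_var * s * D / sigma2
            sum_mD += post_mean * D
            sum_Eu2 += len(g) * (post_var + post_mean ** 2)
        # M-step: closed-form maximizers of the expected complete-data log-likelihood.
        s = sum_mD / sum_Eu2
        sigma2 = (total_sq - s * sum_mD) / n
    return mu, s, sigma2


rng = random.Random(0)
groups = []
for _ in range(200):                         # 200 synthetic "speakers", 5 samples each
    u = rng.gauss(0.0, 1.0)
    groups.append([3.0 + 2.0 * u + rng.gauss(0.0, 1.0) for _ in range(5)])
mu_hat, s_hat, sigma2_hat = em_single_view(groups)
print(round(mu_hat, 2), round(s_hat, 2), round(sigma2_hat, 2))
```

In the two-view vector case the same pattern applies, with the posterior taken jointly over the speaker and text coefficients and with S, T and Σ re-estimated in the M-step.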
[2. Use of the multi-view vector model]
The multi-view vector modeling proposed by the application can be applied to vectors representing any kind of information, such as vectors representing images or sound. A multi-view vector representing sound can be called a voiceprint. Voiceprint verification is widely used in many fields, including intelligent user interfaces, homeland security and telephone banking. Based on the proposed modeling method for multi-view voiceprints, a voiceprint recognition method is further provided: the likelihood that two multi-view voiceprints do or do not belong to the same person and the same text is calculated, and this likelihood is then used for the further decision.
Specifically, in the scheme discussed above, the multi-view vector $x_{ijk}$ can represent the voiceprint of the k-th sample of the j-th text spoken by the i-th speaker, $u_i$ is the coefficient of the i-th speaker, and $v_j$ is the coefficient of the j-th text. In one embodiment, the population mean μ, the parameters of the component of each view, and the parameters of the noise ε can then be used to calculate the likelihoods that at least one view component of two multi-view vectors is the same and is different, and whether that view component is the same is judged from these likelihoods. For example, this can be used to judge whether the speaker is the same, i.e., for identity recognition; to judge whether the speech content is the same, i.e., for speech recognition or password authentication; or for more precise voiceprint recognition, for example requiring the correct person to say the correct content.
With the model parameters obtained above, the model can be applied in the different scenarios described above.
Fig. 5A is a schematic diagram of the scenario in which it is judged whether all views of two multi-view vectors are the same. As a specific but non-limiting example, consider judging whether two voiceprints containing the two views speaker and speech content are completely consistent. Fig. 5A corresponds to formula (4): $u_1$ and $u_2$ denote specific selections of the speaker view, and $v_1$ and $v_2$ denote specific selections of the speech-content view. x denotes the speech samples of the various combinations, and ε is the final noise term in formula (4).
Continuing from formula (5) above, given the parameters θ, the voiceprint recognition problem can be treated as a hypothesis testing problem, in which the data decide which conclusion holds. As shown in Fig. 5A, hypothesis $H_0$ can be defined to mean that both the speaker and the speech content are the same, and hypothesis $H_1$ to cover all cases in which one or both of the speaker and the speech content differ. Specifically, hypothesis $H_0$ on the left of Fig. 5A contains mode $M_0$, in which the speaker and the speech content of the two speech segments are both the same. Hypothesis $H_1$ on the right contains modes $M_1$, $M_2$ and $M_3$: in $M_1$ the speakers differ and the speech content is the same, in $M_2$ the speaker is the same and the speech content differs, and in $M_3$ both the speaker and the speech content differ.
The voiceprint recognition problem above therefore reduces to comparing the likelihood of the mode under $H_0$ ($M_0$: the two voiceprints match exactly, i.e., same speaker and same content) with the likelihood of all modes under $H_1$ ($M_1$, $M_2$, $M_3$: the two voiceprints do not match, i.e., one or both of speaker and content differ). The likelihood ratio can be introduced here, i.e., the ratio of the likelihood of the vectors under $H_0$ to their likelihood under $H_1$. If the ratio exceeds a threshold, for example 1, the two voiceprints are judged to match exactly; otherwise they are judged not to match.
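The four modes and the threshold decision can be written down directly. The following sketch is illustrative only: the function names are hypothetical, the labels are toy strings, and the two log-likelihoods passed to `exact_match` are assumed to have been computed elsewhere from the trained model.

```python
import math


def mode_of(sample_a, sample_b):
    """Return the Fig. 5A mode index for two (speaker, text) label pairs."""
    same_speaker = sample_a[0] == sample_b[0]
    same_text = sample_a[1] == sample_b[1]
    if same_speaker and same_text:
        return 0   # M0: same speaker, same content
    if same_text:
        return 1   # M1: different speaker, same content
    if same_speaker:
        return 2   # M2: same speaker, different content
    return 3       # M3: both differ


def exact_match(log_like_h0, log_like_h1, threshold=1.0):
    """Accept H0 when P(x|H0) / P(x|H1) exceeds the threshold (compared in the log domain)."""
    return log_like_h0 - log_like_h1 > math.log(threshold)


print(mode_of(("alice", "open"), ("alice", "open")))  # 0
print(exact_match(-12.0, -20.0))                      # True
```

Working with log-likelihoods avoids underflow when the likelihoods themselves are very small, and a threshold of 1 on the ratio becomes a threshold of 0 on the difference.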
Specifically, for two voiceprints, a target voiceprint $x_s$ and a test voiceprint $x_t$, let $A = SS^{\top} + TT^{\top}$, $B = 2SS^{\top} + TT^{\top}$ and $C = SS^{\top} + 2TT^{\top}$, and let the prior probabilities of the modes $M_1$, $M_2$ and $M_3$ under hypothesis $H_1$ be $p_1 = P(M_1 \mid H_1)$, $p_2 = P(M_2 \mid H_1)$ and $p_3 = P(M_3 \mid H_1)$. The likelihood ratio described above can then be calculated according to formulas (13) and (14).
In formula (13), $l(x_t, x_s)$ denotes this likelihood ratio, i.e., the ratio of the likelihood $P(x_t, x_s \mid H_0)$ of the vectors under hypothesis $H_0$ to the likelihood $P(x_t, x_s \mid H_1)$ of the vectors under hypothesis $H_1$.
As formulas (13) and (14) show, calculating the likelihood ratio requires four exponential operations, which may impose a heavy computational burden in practical applications and thus impair real-time performance. For this reason, according to embodiments of this disclosure, the calculated likelihood components can be preprocessed to obtain an approximate likelihood, and whether the two voiceprints match is judged from the approximate likelihood. This simplifies the calculation and speeds up the judgment.
According to embodiments of this disclosure, the preprocessing can select one likelihood component as the approximate likelihood according to the attributes of the view components of the multi-view vector. For example, the likelihood component with the maximum value can be selected as the approximate likelihood.
For the specific voiceprint recognition example above, the likelihood components of mode $M_1$ (different speakers, same speech content) and mode $M_2$ (same speaker, different speech content) are small in practice, so these two terms, i.e., the $M_1$ and $M_2$ terms in formula (14), can be omitted from the calculation. Taking the logarithm of both sides of formula (13) then yields formula (15).
Formula (15) shows that preprocessing the likelihood components greatly reduces the computational cost and increases the calculation speed, making the method suitable for real-time processing.
According to this disclosure, the preprocessing of the likelihood components is not limited to the above embodiment. For example, the exponential terms in formula (14) that correspond to modes $M_1$, $M_2$ and $M_3$ can be compared with one another. If the difference between the largest and the second-largest exponential term exceeds a threshold, for example 15, only the largest exponential term is retained and the other two are omitted. Otherwise, the exponential terms corresponding to modes $M_1$ and $M_2$ are omitted and only the term corresponding to mode $M_3$ is retained. Then, as for formula (15), the logarithm of the numerator and of the denominator of the likelihood ratio is taken separately, which simplifies the computation.
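The term-selection rule just described can be sketched as follows. Two assumptions are made that the text does not settle: the comparison is carried out on the exponents $e_1$, $e_2$, $e_3$ of the three terms (i.e., in the log domain), and each exponent is taken to already absorb its prior and normalizing factor. The function names and the numeric inputs are hypothetical.

```python
import math


def exact_log_sum(exponents):
    """log(sum(exp(e))) computed stably; this is the cost the approximation avoids."""
    top = max(exponents)
    return top + math.log(sum(math.exp(e - top) for e in exponents))


def approx_log_h1(e1, e2, e3, gap=15.0):
    """Single-term stand-in for log(exp(e1) + exp(e2) + exp(e3))."""
    ranked = sorted((e1, e2, e3), reverse=True)
    if ranked[0] - ranked[1] > gap:
        return ranked[0]   # one term dominates: keep only the largest
    return e3              # otherwise keep only the M3 term


dominant = (-100.0, -50.0, -20.0)
print(approx_log_h1(*dominant))                                         # -20.0
print(abs(exact_log_sum(dominant) - approx_log_h1(*dominant)) < 1e-6)   # True
```

When the largest exponent leads by more than 15, dropping the other two terms changes the logarithm by at most $\log(1 + 2e^{-15})$, which is below $10^{-6}$, so no exponential needs to be evaluated for the denominator.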
The above discussed the embodiment that judges whether all view components of two multi-view vectors are the same. Specifically, it covered judging whether two voiceprints match exactly, i.e., the voiceprint recognition application in which the correct speaker must say the correct content.
Next, the embodiment that judges whether at least one view component of two multi-view vectors is the same is described. In voiceprint recognition, this can be used to judge whether the speaker is the same, i.e., for identity recognition, or whether the speech content is the same, i.e., for speech recognition or password authentication.
Fig. 5B is a schematic diagram of the scenario in which it is judged whether one view of two multi-view vectors is the same. As a specific but non-limiting example, consider judging in voiceprint recognition whether the speech content is the same, which can be applied to speech recognition or password authentication. As in Fig. 5A, Fig. 5B corresponds to formula (4): $u_1$ and $u_2$ denote specific selections of the speaker view, and $v_1$ and $v_2$ denote specific selections of the speech-content view. x denotes the speech samples of the various combinations, and ε is the final noise term in formula (4).
Unlike in Fig. 5A, in Fig. 5B hypothesis $H_0$ covers all cases in which the speech content is the same (modes $M_0$ and $M_1$ on the left of Fig. 5B), and hypothesis $H_1$ covers all cases in which the speech content differs (modes $M_2$ and $M_3$ on the right of Fig. 5B). The voiceprint recognition problem then reduces to comparing the likelihood of all modes under $H_0$ ($M_0$ and $M_1$: the speech content of the two voiceprints is the same) with the likelihood of all modes under $H_1$ ($M_2$ and $M_3$: the speech content of the two voiceprints differs). If the ratio of the likelihood of the vectors under $H_0$ to their likelihood under $H_1$ exceeds a threshold, for example 1, the speech content of the two voiceprints is judged to be the same; otherwise it is judged to be different.
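The regrouping of modes for the content-only decision can be sketched as a ratio of two mixtures. This is a hedged illustration: the per-mode log-likelihoods are hypothetical inputs assumed to come from the trained model, the equal priors within each hypothesis are an assumption, and the function names are invented for the example.

```python
import math


def log_mixture(log_likes, priors):
    """Stable log of sum_m priors[m] * exp(log_likes[m])."""
    terms = [math.log(p) + ll for p, ll in zip(priors, log_likes)]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def same_content(log_like_by_mode, priors_h0=(0.5, 0.5), priors_h1=(0.5, 0.5), threshold=1.0):
    """Fig. 5B grouping: H0 = {M0, M1} (same content), H1 = {M2, M3} (different content)."""
    log_h0 = log_mixture(log_like_by_mode[0:2], priors_h0)
    log_h1 = log_mixture(log_like_by_mode[2:4], priors_h1)
    return log_h0 - log_h1 > math.log(threshold)


# Hypothetical per-mode log-likelihoods for modes M0..M3 of one voiceprint pair.
print(same_content([-10.0, -12.0, -30.0, -40.0]))  # True
```

Grouping by speaker instead (for identity recognition) would only change which modes are assigned to each hypothesis.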
Specifically, for two sections of vocal prints, i.e. target vocal print xs, and test vocal print xt, set A=SST+TTT, B=2SST+ TTT, C=SST+2TTT, and set and assume H1Under all mode M1、M2、M3Prior probability be respectively p1=(M1|H1)、p2 =(M2|H1) and p3=(M3|H1).Accordingly, likelihood ratio as described above can be calculated as following formula (16).
Wherein
And
As formulas (16) to (18) above show, calculating the likelihood ratio requires four exponential operations, which can impose a heavy computational burden in practice and thus impair real-time performance. For this reason, according to an embodiment of the present disclosure, the calculated likelihood components can be preprocessed to obtain an approximate likelihood, and the judgment is made according to the approximate likelihood. This simplifies the calculation and speeds up the judgment.
According to an embodiment of the present disclosure, the preprocessing can select one likelihood component as the approximate likelihood according to the attributes of each view component in the multi-view vector. For example, the likelihood component with the maximum value can be selected as the approximate likelihood.
For the specific voiceprint recognition example described above, in practice the likelihood components corresponding to mode M0 (speaker and speech content both the same) and mode M2 (same speaker, different speech content) are relatively small, so these two terms can be omitted from the calculation, namely the term for M0 in formula (17) and the term for M2 in formula (18). The logarithm is then taken of both sides of formula (16), which reduces the computational cost.
According to the present disclosure, the preprocessing performed on the likelihood components is not limited to the above embodiment. For example, the exponential terms in formula (17) that correspond to modes M0 and M1 can be compared with each other. If the difference between the two is greater than a certain threshold, for example 15, only the larger exponential term is retained and the other is omitted; otherwise, the exponential term corresponding to mode M0 is omitted. The exponential terms in formula (18) that correspond to modes M2 and M3 can be processed similarly: the two terms are compared with each other, and if the difference between them is greater than a certain threshold, for example 15, only the larger exponential term is retained and the other is omitted; otherwise, the exponential term corresponding to mode M2 is omitted. The logarithm is then taken of the numerator and the denominator of the likelihood ratio separately, which reduces the computational cost.
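The pruning rule described above can be sketched as follows. This is a minimal sketch under stated assumptions: the comparison is assumed to be made between the exponents (the log-domain values) of the two exponential terms, and the names `pruned_log_sum` and `approx_log_ratio` and their parameters are hypothetical. When the exponents differ by more than 15, the omitted term changes the logarithm of the sum by less than e^-15, roughly 3.1e-7, which is why only the larger term needs to be kept.

```python
import math

def pruned_log_sum(log_first, log_second, gap=15.0):
    """Approximate log(exp(log_first) + exp(log_second)) with one retained term.

    log_first is the exponent for the first mode of the pair (M0 or M2),
    log_second the exponent for the second mode (M1 or M3).
    """
    if abs(log_first - log_second) > gap:
        # One term dominates: keep only the larger exponential term.
        return max(log_first, log_second)
    # Otherwise omit the term of the first mode (M0 or M2).
    return log_second

def approx_log_ratio(e0, e1, e2, e3, gap=15.0):
    """Approximate log-likelihood ratio from the four exponents (hypothetical inputs)."""
    numerator = pruned_log_sum(e0, e1, gap)    # modes M0 and M1
    denominator = pruned_log_sum(e2, e3, gap)  # modes M2 and M3
    return numerator - denominator

# Usage with made-up exponents.
score = approx_log_ratio(-3.0, -5.0, -40.0, -60.0)
```

After pruning, no exponential or logarithm has to be evaluated at all, since each side of the ratio reduces to a single exponent.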
The above discusses judging whether the speech content of two voiceprints is the same. However, those skilled in the art will recognize that the technical solution discussed above applies equally to judging whether the speakers of two voiceprints are the same. In other words, the technical solution of the present disclosure can be applied not only to judging whether all view components of two multi-view vectors are the same, but also to judging whether one or more view components of two multi-view vectors are the same.
[3. Extraction of multi-view vectors]
The multi-view vector can be obtained by directly vectorizing the object to be characterized. As a non-limiting example, collected speech data can be processed with the following method to obtain one example of a multi-view vector: a voiceprint. The voiceprint can be extracted, for example, with the previously described method of extracting Mel-frequency cepstral coefficients (MFCCs). Of course, those skilled in the art will understand that other methods well known in the art can also be used to process the speech data, which are not described here.
When a multi-view vector is extracted directly from an object containing information of multiple non-separable views, it can characterize the object comprehensively. The modeling method proposed in this application can then be used to model the multi-view vector on the basis of a large number of object samples, so as to reflect the influence of the features of the different views on the multi-view vector. The resulting model can be applied to a test object to identify or use the features of one or more views of the test object.
That is, if, for example, the features of an object are influenced by non-separable views A and B, the present disclosure no longer tries to extract directly from the object view-A features that are as unaffected by view B as possible, or view-B features that are as unaffected by view A as possible, nor does it label samples with view A and view B separately in order to train separate classifiers for view A and view B. Instead, the present disclosure accepts that views A and B cannot be separated when extracting object features, extracts the features of the object together to form a multi-view vector, and then uses the modeling method of the disclosure to measure the respective influence of view A and view B.
In some cases, however, an object that has not undergone any processing may be influenced by multiple factors, some of which are relatively separable and others relatively non-separable. If the relatively separable views are also included, the computational cost of modeling and vector decomposition increases unnecessarily, and the excessive increase in variables may even make the problem unsolvable. In such cases, one can consider first separating out the relatively non-separable views when extracting the multi-view vector.
One such method is to process the feature vector obtained by directly vectorizing the object with a classifier, thereby obtaining a multi-view vector from which the relatively separable views have been excluded and in which the relatively non-separable views are retained. In other words, the separability between an excluded view and the views of the multi-view vector is higher than the separability among those views themselves. Note that "relatively separable" and "relatively non-separable" are relative concepts, not absolute "separable" and "non-separable". Indeed, in some cases, for example when there are many views, all of them may be non-separable, in which case which views are excluded and which are retained depends on the user's choice.
The classifier can be a neural network. In the training stage, the training samples are labeled, for example with each view of interest. For images of people, for instance, the age and gender of interest can be labeled. The neural network is trained with these labeled image samples. Processing a test image with the trained neural network then yields a multi-view vector of the image that contains the two views of age and gender. Multi-view vectors of speech can be extracted in the same way. Depending on the feature views labeled on the samples when the neural network is trained, such as age, gender, race, specific personal identity, language, and specific speech content, the multi-view vector obtained by processing a test speech sample with the trained neural network contains the features of these selected views.
Fig. 6 illustrates the overall architecture from classifier training to multi-view vector decomposition. In the classifier training stage, the relatively separable views S1 and S2 are not used to label the training samples; instead, the training samples are labeled simultaneously with the features of the non-separable views S3 and S4 to obtain training sample set 410, which is used to train classifier 408. Classifier 408 processes the test samples in test sample set 412 to obtain multi-view vector set 414, which contains the information of views S3 and S4. The process of training the multi-view vector decomposition model on the vectors in multi-view vector set 414 is not shown in Fig. 6. Based on this model, each multi-view vector can be decomposed. For example, vectors x1 and x2 (which can come from multi-view vector set 414 or can be multi-view vectors newly extracted with classifier 408 in practical applications) can be decomposed into components of views S3 and S4. This decomposition into view components can be explicit, for example when an application needs to obtain the component of one or each view directly. It can also be implicit: in the voiceprint comparison embodiment discussed in this application, the voiceprint is not decomposed explicitly, but calculating the likelihood that each view of the voiceprints is the same or different is equivalent to decomposing the voiceprint.
[4. Multi-view vector processing device]
The method discussed above can be implemented entirely by a computer-executable program, or partially or entirely by hardware and/or firmware. When it is implemented by hardware and/or firmware, or when the computer-executable program is loaded into a hardware device capable of running programs, the multi-view vector processing device described below is realized. In the following, an overview of these devices is given without repeating details already discussed above. Note that although these devices can perform the method described above, the method does not necessarily use, and is not necessarily performed by, the components of the devices described.
According to one embodiment, a multi-view vector processing device is provided, wherein the multi-view vector x is used to characterize an object containing information of at least two non-separable views. The device includes a training apparatus for training a multi-view vector model that includes at least the following components: the population mean μ of the multi-view vector; the component of each view of the multi-view vector; and noise ε. The training apparatus uses training data of the multi-view vector x to obtain the population mean μ, the parameters of the component of each view, and the parameters of the noise ε. The multi-view vector processing device can use the population mean μ, the parameters of the component of each view, and the parameters of the noise ε to calculate likelihood components for each view component of two multi-view vectors being the same and being different, preprocess the calculated likelihood components to obtain an approximate likelihood, and judge whether the two multi-view vectors match according to the approximate likelihood.
The preprocessing can select one likelihood component as the approximate likelihood according to the attributes of each view component in the multi-view vector. In addition, the preprocessing can select the likelihood component with the maximum value as the approximate likelihood.
Similarly, the population mean μ can be set to 0. The component of each view can be based on the product of the space basis S_i of the corresponding view and the coefficient u_i of the specific choice of that view, where i is the index of the view. The noise can be set to follow a Gaussian distribution with the diagonal matrix Σ as its covariance.
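The model components described here can be illustrated with a short sketch that draws one sample in the two-view case, x = μ + S u + T v + ε. The function names `mat_vec` and `sample_multiview` and the toy matrices are hypothetical and chosen only for illustration; the diagonal covariance Σ is represented by a list of per-dimension variances.

```python
import random

def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def sample_multiview(mu, S, T, u, v, noise_var, rng):
    """Draw one sample x = mu + S u + T v + eps.

    eps is Gaussian with zero mean and diagonal covariance; noise_var holds
    the diagonal entries. u and v are the coefficients of the chosen options
    of the two views (for example speaker and speech content).
    """
    s_part = mat_vec(S, u)
    t_part = mat_vec(T, v)
    return [m + s + t + rng.gauss(0.0, var ** 0.5)
            for m, s, t, var in zip(mu, s_part, t_part, noise_var)]

# Usage with made-up toy values: a 2-dimensional vector, population mean set to 0.
rng = random.Random(0)
S = [[1.0, 0.0], [0.0, 1.0]]   # space basis of the first view
T = [[1.0], [1.0]]             # space basis of the second view
x = sample_multiview([0.0, 0.0], S, T, [1.0, 2.0], [0.5], [0.1, 0.1], rng)
```

Samples that share the same u but differ in v correspond to the same choice of the first view combined with different choices of the second view, which is the situation the likelihood components distinguish.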
In one embodiment, the training apparatus can be configured to use the expectation-maximization (EM) algorithm to obtain the population mean μ, the space basis S_n of each view, and Σ from the training data. In the EM algorithm, the expected mean of all samples of x for the specific choice of each view component, and the expectations related to the covariance and the specific choice of each view component, can be calculated on the basis of μ, S_n, and Σ; μ, S_n, and Σ are then recalculated on the basis of these expectation values, until convergence.
For the case in which the multi-view vector contains two views, the representation of the multi-view vector model and the distributions of the various parameters and variables in the model are as explained above for the method and are not repeated here.
The device can also include a probability calculation apparatus, which uses the likelihoods to calculate the probabilities that at least one view component of two multi-view vectors is the same and is different. The judging apparatus is configured to judge, according to these probabilities, whether at least one view component of the two multi-view vectors is the same.
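One way to turn the two likelihoods into probabilities is Bayes' rule, sketched below. This is an assumption made for illustration rather than the calculation prescribed by the disclosure; the function name `same_view_probability` and the prior `prior_same` are hypothetical.

```python
import math

def same_view_probability(log_lik_same, log_lik_diff, prior_same=0.5):
    """Posterior probability that the view component is the same.

    log_lik_same / log_lik_diff are the log-likelihoods of the two vectors
    under the 'same' and 'different' hypotheses (hypothetical inputs).
    """
    a = log_lik_same + math.log(prior_same)
    b = log_lik_diff + math.log(1.0 - prior_same)
    peak = max(a, b)                      # subtract the peak to avoid underflow
    ea, eb = math.exp(a - peak), math.exp(b - peak)
    return ea / (ea + eb)

# Usage with made-up values: judge 'same' when the probability exceeds 0.5.
p_same = same_view_probability(-10.0, -14.0)
is_same = p_same > 0.5
```

The probability that the component is different is simply one minus the returned value, so a single function covers both outputs mentioned above.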
For applications that judge whether both view components are the same, the operation of the device has been described in the method section above and is not repeated here. In this case, the device can be a voiceprint confirmation device that confirms whether a test voiceprint is identical to a target voiceprint, that is, whether both view components (speaker and speech content) are the same.
The relevant details of the above embodiments have been given in the description of the multi-view vector processing method and are not repeated here.
[5. Computing device for implementing the device and method of the application]
All modules and units in the above devices can be configured by software, firmware, hardware, or a combination thereof. The specific means or manner of configuration is well known to those skilled in the art and is not described here. In the case of implementation by software or firmware, a program constituting the software is installed from a storage medium or a network onto a computer with a dedicated hardware structure (for example, the general-purpose computer 700 shown in Fig. 7), which can perform various functions when the various programs are installed.
In Fig. 7, a central processing unit (CPU) 701 performs various processing according to a program stored in read-only memory (ROM) 702 or a program loaded from storage section 708 into random access memory (RAM) 703. RAM 703 also stores, as needed, the data required when CPU 701 performs the various processing. CPU 701, ROM 702, and RAM 703 are connected to one another via bus 704. Input/output interface 705 is also connected to bus 704.
The following components are connected to input/output interface 705: input section 706 (including a keyboard, a mouse, etc.), output section 707 (including a display, such as a cathode-ray tube (CRT) or liquid crystal display (LCD), and a speaker, etc.), storage section 708 (including a hard disk, etc.), and communication section 709 (including a network interface card such as a LAN card, a modem, etc.). Communication section 709 performs communication processing via a network such as the Internet. Drive 710 can also be connected to input/output interface 705 as needed. Removable medium 711, such as a magnetic disk, optical disc, magneto-optical disk, or semiconductor memory, is mounted on drive 710 as needed, so that a computer program read from it is installed into storage section 708 as needed.
When the above series of processing is implemented by software, the program constituting the software is installed from a network such as the Internet or from a storage medium such as removable medium 711.
Those skilled in the art will understand that such a storage medium is not limited to removable medium 711 shown in Fig. 7, in which the program is stored and which is distributed separately from the device to provide the program to the user. Examples of removable medium 711 include magnetic disks (including floppy disks (registered trademark)), optical discs (including compact disc read-only memory (CD-ROM) and digital versatile discs (DVD)), magneto-optical disks (including MiniDisc (MD) (registered trademark)), and semiconductor memory. Alternatively, the storage medium can be ROM 702, a hard disk contained in storage section 708, or the like, in which the program is stored and which is distributed to the user together with the device containing it.
The invention also provides corresponding computer program code and a computer program product storing machine-readable instruction code. When the instruction code is read and executed by a machine, the above method according to the embodiments of the invention can be performed.
Accordingly, a storage medium configured to carry the above program product storing machine-readable instruction code is also included in the disclosure of the invention. The storage medium includes, but is not limited to, floppy disks, optical discs, magneto-optical disks, memory cards, memory sticks, and the like.
A computing device including the above storage medium is also included in the disclosure of the invention, for example a multi-view vector processing device including a processor and a storage medium storing program code, wherein the program code, when executed by the processor, implements the method described above.
From the above description, the embodiments of the invention provide the following technical schemes, but are not limited thereto.
Scheme 1. A multi-view vector processing method, wherein the multi-view vector x is used to characterize an object containing information of at least two non-separable views, the method comprising:
a modeling step of establishing a model of the multi-view vector such that it includes at least the following components: the population mean μ of the multi-view vector; the component of each view of the multi-view vector; and noise ε;
a training step of using training data of the multi-view vector x to obtain the population mean μ, the parameters of the component of each view, and the parameters of the noise ε; and
a matching step of using the population mean μ, the parameters of the component of each view, and the parameters of the noise ε to calculate likelihood components for each view component of two multi-view vectors being the same and being different, preprocessing the calculated likelihood components to obtain an approximate likelihood, and judging whether the two multi-view vectors match according to the approximate likelihood.
Scheme 2. The method of scheme 1, wherein the preprocessing selects one likelihood component as the approximate likelihood according to the attributes of each view component in the multi-view vector.
Scheme 3. The method of scheme 1, wherein the preprocessing selects the likelihood component with the maximum value as the approximate likelihood.
Scheme 4. The method of scheme 1, wherein the component of each view is based on the product of the space basis S_i of the corresponding view and the coefficient u_i of the specific choice of that view, where i is the index of the view.
Scheme 5. The method of scheme 4, wherein the noise is set to follow a Gaussian distribution with the diagonal matrix Σ as its covariance.
Scheme 6. The method of scheme 5, wherein the training step comprises: using the EM algorithm to obtain the population mean μ, the space basis S_n of each view, and Σ from the training data.
Scheme 7. The method of scheme 5, wherein the multi-view vector contains two views, and, denoting the space bases of the corresponding views by S and T, the multi-view vector is represented as:
x_ijk = μ + S u_i + T v_j + ε_ijk
where μ denotes the population mean, u_i is the coefficient of the i-th choice of the view corresponding to space basis S, v_j is the coefficient of the j-th choice of the view corresponding to space basis T, ε_ijk denotes the noise, and k denotes the k-th sample under the aforementioned choices.
Scheme 8. The method of any one of schemes 1 to 7, wherein the multi-view vector is a two-view vector characterizing the voiceprint of a speaker, and the speaker and the content spoken by the speaker are the view components of the two-view vector.
Scheme 9. A multi-view vector processing device, including a processor and a storage medium storing program code, wherein the program code, when executed by the processor, implements the method of any one of schemes 1 to 7.
Scheme 10. A computer-readable storage medium storing program code, wherein the program code, when executed by a processor, implements the method of any one of schemes 1 to 7.
Finally, it should be noted that the terms "include", "comprise", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Furthermore, without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article, or device that includes the element.
Although the embodiments of the invention have been described in detail above with reference to the drawings, it should be understood that the embodiments described above are only intended to illustrate the invention and do not limit it. Those skilled in the art can make various changes and modifications to the above embodiments without departing from the spirit and scope of the invention. Therefore, the scope of the invention is defined only by the appended claims and their equivalents.

Claims (9)

1. A multi-view vector processing method, wherein the multi-view vector x is used to characterize an object containing information of at least two non-separable views, the method comprising:
a modeling step of establishing a model of the multi-view vector such that it includes at least the following components: the population mean μ of the multi-view vector; the component of each view of the multi-view vector; and noise ε;
a training step of using training data of the multi-view vector x to obtain the population mean μ, the parameters of the component of each view, and the parameters of the noise ε; and
a matching step of using the population mean μ, the parameters of the component of each view, and the parameters of the noise ε to calculate likelihood components for each view component of two multi-view vectors being the same and being different, preprocessing the calculated likelihood components to obtain an approximate likelihood, and judging whether the two multi-view vectors match according to the approximate likelihood.
2. The method of claim 1, wherein the preprocessing selects one likelihood component as the approximate likelihood according to the attributes of each view component in the multi-view vector.
3. The method of claim 1, wherein the preprocessing selects the likelihood component with the maximum value as the approximate likelihood.
4. The method of claim 1, wherein the component of each view is based on the product of the space basis S_i of the corresponding view and the coefficient u_i of the specific choice of that view, where i is the index of the view.
5. The method of claim 4, wherein the noise is set to follow a Gaussian distribution with the diagonal matrix Σ as its covariance.
6. The method of claim 5, wherein the training step comprises: using the EM algorithm to obtain the population mean μ, the space basis S_n of each view, and Σ from the training data.
7. The method of claim 5, wherein the multi-view vector contains two views, and, denoting the space bases of the corresponding views by S and T, the multi-view vector is represented as:
x_ijk = μ + S u_i + T v_j + ε_ijk
where μ denotes the population mean, u_i is the coefficient of the i-th choice of the view corresponding to space basis S, v_j is the coefficient of the j-th choice of the view corresponding to space basis T, ε_ijk denotes the noise, and k denotes the k-th sample under the aforementioned choices.
8. The method of any one of claims 1 to 7, wherein the multi-view vector is a two-view vector characterizing the voiceprint of a speaker, and the speaker and the content spoken by the speaker are the view components of the two-view vector.
9. A multi-view vector processing device, including a processor and a storage medium storing program code, wherein the program code, when executed by the processor, implements the method of any one of claims 1 to 7.
CN201711267389.6A 2017-12-05 2017-12-05 Multi-view vector processing method and device Active CN109872725B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711267389.6A CN109872725B (en) 2017-12-05 2017-12-05 Multi-view vector processing method and device

Publications (2)

Publication Number Publication Date
CN109872725A true CN109872725A (en) 2019-06-11
CN109872725B CN109872725B (en) 2022-10-18

Family

ID=66916428

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711267389.6A Active CN109872725B (en) 2017-12-05 2017-12-05 Multi-view vector processing method and device

Country Status (1)

Country Link
CN (1) CN109872725B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080208581A1 (en) * 2003-12-05 2008-08-28 Queensland University Of Technology Model Adaptation System and Method for Speaker Recognition
US20090137924A1 (en) * 2007-08-27 2009-05-28 Microsoft Corporation Method and system for meshing human and computer competencies for object categorization
US20120194649A1 (en) * 2009-07-31 2012-08-02 University Of Connecticut System and methods for three-dimensional imaging of objects in a scattering medium
CN103049751A (en) * 2013-01-24 2013-04-17 苏州大学 Improved weighting region matching high-altitude video pedestrian recognizing method
CN104268586A (en) * 2014-10-17 2015-01-07 北京邮电大学 Multi-visual-angle action recognition method

Also Published As

Publication number Publication date
CN109872725B (en) 2022-10-18

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant