CN109473091A - Speech sample generation method and device - Google Patents

Speech sample generation method and device

Info

Publication number
CN109473091A
CN109473091A
Authority
CN
China
Prior art keywords
voice
variable
voice variable
mel
characteristic value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811593971.6A
Other languages
Chinese (zh)
Other versions
CN109473091B (en)
Inventor
魏华强
李锐
彭凝多
唐博
彭恒进
Current Assignee
Sichuan Hongwei Technology Co Ltd
Original Assignee
Sichuan Hongwei Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Sichuan Hongwei Technology Co Ltd
Priority to CN201811593971.6A
Publication of CN109473091A
Application granted
Publication of CN109473091B
Legal status: Active

Classifications

    • G: Physics
    • G10: Musical instruments; acoustics
    • G10L: Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding
    • G10L15/00: Speech recognition
    • G10L15/06: Creation of reference templates; training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063: Training
    • G10L15/08: Speech classification or search
    • G10L15/16: Speech classification or search using artificial neural networks
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03: Speech or voice analysis techniques characterised by the type of extracted parameters
    • G10L25/24: Speech or voice analysis techniques in which the extracted parameters are the cepstrum
    • G10L2015/0635: Training; updating or merging of old and new templates; mean values; weighting
    • G10L2015/0636: Threshold criteria for the updating

Abstract

The present invention provides a speech sample generation method and device. The method comprises: after obtaining a first speech variable, extracting the mel-frequency feature values of the first speech variable; using a neural network to compute a loss function between the mel-frequency feature values of the first speech variable and those of a target speech; and optimizing the loss function with the neural network's optimization algorithm by adjusting the values of sampling points in the first speech variable, until the value of the optimized loss function is below a preset threshold. The speech variable whose loss value is below the preset threshold is the target speech sample. In effect, the inverse mel transform of a speech variable is solved with a neural network: the error between the variable's mel-frequency feature values and the target's is minimized until it falls below the preset threshold, and the resulting variable is taken as an adversarial sample, thereby enriching the speech sample set of a speech recognition system.

Description

Speech sample generation method and device
Technical field
The present invention relates to the technical field of speech recognition, and in particular to a speech sample generation method and device.
Background art
In existing speech recognition systems based on deep learning models, incomplete corpora and scarce speech sample sets leave the systems insufficiently robust and easily disturbed by adversarial samples.
Summary of the invention
The present invention provides a speech sample generation method and device to address the scarcity of speech sample sets for speech recognition systems.
To achieve the above goal, the technical solutions provided by the embodiments of the present invention are as follows:
In a first aspect, an embodiment of the present invention provides a speech sample generation method, comprising: after obtaining a first speech variable, extracting the mel-frequency feature values of the first speech variable, where the characteristic parameters of the first speech variable are identical to those of a target speech and include length, sample rate, and channel count; using a neural network to compute a loss function between the mel-frequency feature values of the first speech variable and those of the target speech; and optimizing the loss function with the neural network's optimization algorithm by adjusting the values of sampling points in the first speech variable, until the value of the optimized loss function is below a preset threshold. The speech variable whose loss value is below the preset threshold is the target speech sample. The inverse mel transform of the speech variable is thus solved with a neural network: the error between the variable's and the target's mel-frequency feature values is minimized until it falls below the preset threshold, and the resulting variable serves as an adversarial sample, enriching the speech sample set of a speech recognition system.
In an optional embodiment of the invention, extracting the mel-frequency feature values of the first speech variable comprises: applying a Fourier transform to each frame of the first speech variable to obtain a second speech variable; applying mel filtering to the second speech variable to obtain a third speech variable; and applying a discrete cosine transform to the third speech variable to obtain a mel-scale cepstrum, which serves as the mel-frequency feature values of the first speech variable. The extraction pipeline is therefore: Fourier transform, mel filtering, and discrete cosine transform; using the resulting mel-scale cepstrum as the feature values gives the speech variable a better representation.
In an optional embodiment of the invention, after the discrete cosine transform yields the mel-scale cepstrum, the method further comprises: performing a difference operation on the mel-scale cepstrum, and inserting the result of the difference operation into the mel-scale cepstrum to obtain the mel-frequency feature values of the first speech variable. The differences of the mel-scale cepstra extracted from adjacent frames, which represent the inter-frame dynamics of the speech variable, are thus added to the mel-scale cepstrum, and together they form the mel-frequency feature values, so that a speech recognition system trained with the variable has a wider range of application.
In an optional embodiment of the invention, before the Fourier transform of each frame of the first speech variable, the method further comprises: applying high-pass filtering to the first speech variable, dividing the filtered variable into consecutive frames, and applying a window to each frame. Pre-processing the speech variable (filtering, framing, windowing, and pre-emphasis) before solving for its mel-frequency feature values makes the processed variable better suited to the computation.
In an optional embodiment of the invention, obtaining the first speech variable comprises: generating a speech fragment, and formatting the fragment into the first speech variable so that its characteristic parameters are identical to those of the target speech. The speech variable can thus be a randomly generated segment of speech whose length, sample rate, and channel count match those of the target speech, which guarantees that the variable obtained by the final optimization can serve as a sample for a speech recognition system.
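A minimal sketch of the formatting step described above, in Python: the fragment is brought to the target's sample rate, length, and channel count. The function name and the naive linear-interpolation resampler are illustrative assumptions, not the patent's implementation.

```python
def format_fragment(samples, src_rate, tgt_rate, tgt_len, tgt_channels=1):
    """Naively resample, then pad or truncate, so a random mono fragment
    matches the target's sample rate, length, and channel count."""
    # 1. Linear-interpolation resample from src_rate to tgt_rate.
    if src_rate != tgt_rate:
        ratio = src_rate / tgt_rate
        n_out = int(len(samples) / ratio)
        resampled = []
        for i in range(n_out):
            pos = i * ratio
            lo = int(pos)
            hi = min(lo + 1, len(samples) - 1)
            frac = pos - lo
            resampled.append(samples[lo] * (1 - frac) + samples[hi] * frac)
        samples = resampled
    # 2. Pad with silence, or truncate, to the target length.
    if len(samples) < tgt_len:
        samples = samples + [0.0] * (tgt_len - len(samples))
    else:
        samples = samples[:tgt_len]
    # 3. Duplicate the mono signal across the required channels.
    return [list(samples) for _ in range(tgt_channels)]

out = format_fragment([0.0, 1.0, 0.0, -1.0], src_rate=8000, tgt_rate=4000, tgt_len=3)
# Each channel now has exactly tgt_len samples at the target rate.
```

In practice a band-limited resampler would replace step 1, but the shape guarantees (rate, length, channels) are what matter for the optimization that follows.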
In an optional embodiment of the invention, before extracting the mel-frequency feature values of the first speech variable, the method further comprises: obtaining the target speech, and extracting the mel-frequency feature values of the target speech. A segment of target speech can thus be obtained first, and it is the optimization target for the speech variable.
In an optional embodiment of the invention, after the speech variable whose loss value is below the preset threshold becomes the target speech sample, the method further comprises: training a speech recognition system with the neural network, using the target speech sample as a training sample. Once a compliant speech variable is obtained with the neural network, it can be used as a training sample to improve the robustness of the speech system.
In a second aspect, an embodiment of the present invention provides a speech sample generation device, comprising: a first extraction module, configured to extract the mel-frequency feature values of a first speech variable after it is obtained, where the characteristic parameters of the first speech variable are identical to those of a target speech and include length, sample rate, and channel count; a first computation module, configured to use a neural network to compute a loss function between the mel-frequency feature values of the first speech variable and those of the target speech; and an optimization module, configured to optimize the loss function with the neural network's optimization algorithm by adjusting the values of sampling points in the first speech variable, until the value of the optimized loss function is below a preset threshold, the speech variable whose loss value is below the threshold being the target speech sample. The first extraction module thus solves the inverse mel transform of the speech variable with a neural network, and the optimization module minimizes the error between the variable's and the target's mel-frequency feature values until it falls below the preset threshold; the resulting variable serves as an adversarial sample, enriching the speech sample set of a speech recognition system.
In an optional embodiment of the invention, the first extraction module comprises: a first transform module, configured to apply a Fourier transform to each frame of the first speech variable to obtain a second speech variable; a first filter module, configured to apply mel filtering to the second speech variable to obtain a third speech variable; and a second transform module, configured to apply a discrete cosine transform to the third speech variable to obtain a mel-scale cepstrum, which serves as the mel-frequency feature values of the first speech variable. The first extraction module thus performs a Fourier transform with the first transform module, mel filtering with the first filter module, and a discrete cosine transform with the second transform module, and using the resulting mel-scale cepstrum as the feature values gives the speech variable a better representation.
In an optional embodiment of the invention, the device further comprises a second computation module, configured to perform a difference operation on the mel-scale cepstrum, and the second transform module comprises an insertion module, configured to insert the result of the difference operation into the mel-scale cepstrum to obtain the mel-frequency feature values of the first speech variable. The inter-frame differences of the mel-scale cepstra computed by the second computation module, which represent the dynamics of the speech variable, are thus added to the cepstrum by the insertion module and together form the mel-frequency feature values, so that a speech recognition system trained with the variable has a wider range of application.
In an optional embodiment of the invention, the device further comprises a third filter module, configured to apply high-pass filtering to the first speech variable, divide the filtered variable into consecutive frames, and apply a window to each frame. Before the first extraction module solves for the mel-frequency feature values, the speech variable is thus pre-processed (filtering, framing, windowing, and pre-emphasis) by the third filter module, making it better suited to the computation.
In an optional embodiment of the invention, the first extraction module comprises: a generation module, configured to generate a speech fragment; and a formatting module, configured to format the fragment into the first speech variable so that its characteristic parameters are identical to those of the target speech. The speech variable can thus be a segment of speech generated at random by the generation module whose length, sample rate, and channel count match those of the target speech, which guarantees that the variable obtained by the final optimization can serve as a sample for a speech recognition system.
In an optional embodiment of the invention, the device further comprises: an acquisition module, configured to obtain the target speech; and a second extraction module, configured to extract the mel-frequency feature values of the target speech. Before the speech variable is processed, a segment of target speech, which is the optimization target of the speech variable, can thus be obtained with the acquisition module.
In an optional embodiment of the invention, the device further comprises a training module, configured to train a speech recognition system with the neural network, using the target speech sample as a training sample. Once a compliant speech variable is obtained with the neural network, the training module can use it as a training sample to improve the robustness of the speech system.
In a third aspect, an embodiment of the present invention provides an electronic device, comprising a processor, a memory, and a bus. The memory stores machine-readable instructions executable by the processor. When the electronic device runs, the processor and the memory communicate over the bus, and when the machine-readable instructions are executed by the processor, any method of the first aspect is performed.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium storing a computer program which, when run by a processor, performs the method of any optional implementation of the first aspect.
To make the above objects, features, and advantages of the present invention clearer and easier to understand, embodiments of the present invention are described in detail below with reference to the accompanying drawings.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed for the embodiments are briefly described below. It should be understood that the drawings show only certain embodiments of the present invention and should not be regarded as limiting its scope; those of ordinary skill in the art can obtain other relevant drawings from them without creative effort.
Fig. 1 is a flow chart of a speech sample generation method provided by an embodiment of the present invention;
Fig. 2 is a flow chart of another speech sample generation method provided by an embodiment of the present invention;
Fig. 3 is a flow chart of another speech sample generation method provided by an embodiment of the present invention;
Fig. 4 is a flow chart of another speech sample generation method provided by an embodiment of the present invention;
Fig. 5 is a flow chart of another speech sample generation method provided by an embodiment of the present invention;
Fig. 6 is a structural block diagram of a speech sample generation device provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention will now be described clearly and completely with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. The components of the embodiments, as generally described and illustrated in the drawings, can be arranged and designed in many different configurations.
Therefore, the following detailed description of the embodiments provided in the drawings is not intended to limit the claimed scope of the present invention but merely represents selected embodiments. All other embodiments obtained by those skilled in the art on the basis of the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
It should also be noted that similar labels and letters denote similar items in the following drawings; once an item is defined in one drawing, it need not be defined or explained again in subsequent drawings.
In the description of the present invention, it should be noted that terms such as "center", "upper", "lower", "horizontal", "inner", and "outer" indicate orientations or positional relationships based on the drawings or on the usual placement of the invented product in use. They are used only for convenience and simplicity of description and do not indicate or imply that the device or element referred to must have a particular orientation or be constructed and operated in a particular orientation, and therefore should not be understood as limiting the invention. In addition, terms such as "first" and "second" are used only to distinguish descriptions and are not to be understood as indicating or implying relative importance.
Furthermore, terms such as "horizontal" and "vertical" do not require components to be absolutely horizontal or suspended; they may be slightly inclined. "Horizontal" merely means that a direction is more horizontal than "vertical", not that the structure must be perfectly level.
In the description of the present invention, it should also be noted that, unless otherwise clearly specified and limited, terms such as "arranged", "connected", and "coupled" are to be understood broadly: a connection may be fixed, detachable, or integral; it may be mechanical or electrical; and it may be direct, indirect through an intermediary, or internal between two elements. Those of ordinary skill in the art can understand the specific meanings of these terms in the present invention according to the specific circumstances.
Some embodiments of the present invention are described in detail below with reference to the drawings. Where no conflict arises, the features of the following embodiments can be combined with one another.
First embodiment
An embodiment of the present invention provides a speech sample generation method. Referring to Fig. 1, which is a flow chart of a speech sample generation method provided by an embodiment of the present invention, the method comprises the following steps:
Step S100: after obtaining the first speech variable, extract the mel-frequency feature values of the first speech variable.
Specifically, in the field of acoustic processing, the mel-frequency cepstrum is a linear transform of the logarithmic energy spectrum on a nonlinear mel scale of frequency. Its bands approximate the human auditory system more closely than the linearly spaced bands of an ordinary cepstrum, so this nonlinear representation gives the speech signal a better representation in many fields. Extracting the mel-frequency cepstral coefficient (MFCC) feature values of a segment of speech is a conventional technique for those skilled in the art and can be done in many ways; the embodiments of the present invention impose no specific restriction.
For example, pre-emphasis, framing, and windowing are first applied to the speech; then, for each short-time analysis window, the corresponding spectrum is obtained through a fast Fourier transform (FFT). The spectrum is passed through a mel filter bank to obtain a mel spectrum, and cepstral analysis is finally performed on the mel spectrum to obtain the mel-frequency cepstral coefficients, which are the features of that frame of speech. Cepstral analysis can include taking the logarithm and applying an inverse transform, the inverse transform generally being realized by a discrete cosine transform (DCT).
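The pipeline in the preceding paragraph can be sketched end to end. This is a toy illustration under stated assumptions, not the patent's implementation: it uses a direct DFT instead of an FFT, spaces the triangular filters linearly rather than on a true mel scale for brevity, and the frame and filter counts are arbitrary.

```python
import cmath
import math

def mfcc(signal, frame_len=64, n_filters=8, n_coeffs=4, alpha=0.97):
    """Toy end-to-end MFCC: pre-emphasis -> framing -> window -> DFT power
    spectrum -> triangular filter bank -> log -> DCT (one vector per frame)."""
    # Pre-emphasis (a simple high-pass) boosts the high frequencies.
    emph = [signal[0]] + [signal[i] - alpha * signal[i - 1]
                          for i in range(1, len(signal))]
    frames = [emph[i:i + frame_len]
              for i in range(0, len(emph) - frame_len + 1, frame_len)]
    n_bins = frame_len // 2 + 1
    # Triangular filters spaced evenly over the bins (mel-spaced in practice).
    centres = [int(round((m + 1) * (n_bins - 1) / (n_filters + 1)))
               for m in range(n_filters)]
    width = max(1, (n_bins - 1) // (n_filters + 1))
    coeffs = []
    for frame in frames:
        win = [s * (0.54 - 0.46 * math.cos(2 * math.pi * k / (frame_len - 1)))
               for k, s in enumerate(frame)]                   # Hamming window
        power = [abs(sum(w * cmath.exp(-2j * math.pi * k * n / frame_len)
                         for n, w in enumerate(win))) ** 2
                 for k in range(n_bins)]                       # direct DFT
        energies = [math.log(1e-10 + sum(
            power[b] * max(0.0, 1.0 - abs(b - c) / width)
            for b in range(max(0, c - width), min(n_bins, c + width + 1))))
            for c in centres]
        # DCT-II of the log filter energies gives the cepstral coefficients.
        coeffs.append([sum(e * math.cos(math.pi * i * (m + 0.5) / n_filters)
                           for m, e in enumerate(energies))
                       for i in range(n_coeffs)])
    return coeffs

tone = [math.sin(2 * math.pi * 5 * t / 64) for t in range(128)]
feats = mfcc(tone)  # two frames, each with n_coeffs coefficients
```

Each stage below (framing and FFT, mel filtering, difference features) is treated individually in the corresponding steps of the method.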
It should be noted that, in embodiments of the invention, the first speech variable from which mel-frequency features are extracted can be a randomly generated segment of speech, noise, silence, or any speech. After obtaining such a segment, it must be formatted so that its characteristic parameters are identical to those of a segment of target speech, where the characteristic parameters can include length, sample rate, and channel count. The target speech is the reference object of the speech variable, which ultimately needs to be as close as possible to the target speech. Besides using formatting to guarantee that the first speech variable's characteristic parameters match the target's, a segment whose characteristic parameters already match those of the target speech can be generated directly, simplifying the formatting step.
Step S200: use a neural network to compute the loss function between the mel-frequency feature values of the first speech variable and those of the target speech.
Specifically, after the mel-frequency feature values of the first speech variable are computed in step S100, a loss function can be used to represent the error between the first speech variable's mel-frequency feature values and the target speech's, in order to judge the error between the first speech variable and the target speech at this point. The process can be realized by a neural network, using a logarithmic loss function, a quadratic loss function, a hinge loss function, or another loss function in the neural network to obtain this error.
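As a concrete illustration, a quadratic (mean-squared-error) loss between two MFCC matrices might look as follows. The function name is an assumption for illustration; in the patent's setting the loss would be evaluated inside the neural network framework so that it can be differentiated.

```python
def mfcc_mse_loss(feat_a, feat_b):
    """Quadratic (MSE) loss between two equal-shaped MFCC matrices, one
    choice among the logarithmic/quadratic/hinge losses the text mentions."""
    total, count = 0.0, 0
    for row_a, row_b in zip(feat_a, feat_b):
        for a, b in zip(row_a, row_b):
            total += (a - b) ** 2
            count += 1
    return total / count

loss_value = mfcc_mse_loss([[1.0, 2.0]], [[1.0, 4.0]])  # ((1-1)**2 + (2-4)**2) / 2 = 2.0
```

The choice of loss only changes how the error is aggregated; the optimization in step S300 works the same way for any differentiable choice.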
It should be noted that solving for an error with a loss function in a neural network is a conventional technique for those skilled in the art and can be done in many ways; the embodiments of the present invention impose no specific restriction.
Step S300: use the optimization algorithm in the neural network to optimize the loss function by adjusting the values of sampling points in the first speech variable, until the value of the optimized loss function is below the preset threshold; the speech variable whose loss value is below the preset threshold is the target speech sample.
Specifically, after the loss function in the neural network expresses the error between the first speech variable's and the target speech's mel-frequency feature values in step S200, the optimization algorithm of the neural network can be used to optimize the loss function. Each time an error value is computed, it is compared with the preset threshold. If the error value exceeds the preset threshold, the values of several sampling points in the speech variable are changed, the mel-frequency feature values of the new speech variable are computed along with the loss between them and the target speech's mel-frequency feature values, and the optimized error value is again compared with the preset threshold; this process repeats as long as the error value exceeds the threshold. When the error value falls below the preset threshold, optimization ends and the speech variable corresponding to the current error value is output; this variable is the compliant speech sample closest to the target speech in the embodiment of the present invention. The preset threshold is a fixed value set in advance by the user according to the actual situation: during the optimization of the loss function by the neural network, the loss is considered to have reached its minimum once the optimization result is below this fixed value, which prevents the optimization process from dragging on indefinitely without ever producing a compliant speech variable.
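The optimize-until-below-threshold loop described above can be sketched as follows. This is a hedged toy: the `feature` function stands in for the MFCC extractor, and the gradient is estimated by finite differences rather than by backpropagation through a real network; all names and constants are illustrative assumptions.

```python
def feature(x):
    # Stand-in for the MFCC extractor: mean and mean energy of the samples.
    n = len(x)
    return [sum(x) / n, sum(v * v for v in x) / n]

def loss(x, target_feat):
    # Quadratic loss between the variable's features and the target's.
    return sum((f - t) ** 2 for f, t in zip(feature(x), target_feat))

def optimise(x, target_feat, threshold=1e-6, lr=0.1, eps=1e-5, max_iter=5000):
    """Adjust the sampling-point values of the speech variable until the
    loss against the target's features falls below the preset threshold."""
    x = list(x)
    for _ in range(max_iter):
        if loss(x, target_feat) < threshold:
            break                        # compliant sample found
        for i in range(len(x)):          # finite-difference gradient per point
            bumped = list(x)
            bumped[i] += eps
            g = (loss(bumped, target_feat) - loss(x, target_feat)) / eps
            x[i] -= lr * g
    return x

target = feature([0.5, -0.5, 0.25, -0.25])  # features of a stand-in target speech
start = [0.1, 0.2, -0.1, 0.3]               # initial (random-ish) speech variable
sample = optimise(start, target)
final = loss(sample, target)                # below the threshold on convergence
```

In the patent's setting, gradient descent through the network replaces the finite-difference estimate, but the control flow (compare error to threshold, adjust sampling points, repeat) is the same.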
It should be noted that using a neural network to optimize a loss function until it is small enough to yield an acceptable solution is a conventional technique for those skilled in the art; the loss function can be optimized in many ways, such as by gradient descent or least squares, and the embodiments of the present invention impose no specific restriction.
In the embodiments of the present invention, the inverse mel transform of the speech variable is solved with a neural network, and the error between the variable's and the target's mel-frequency feature values is minimized until it falls below the preset threshold; the resulting speech variable serves as an adversarial sample, enriching the speech sample set of a speech recognition system.
Further, referring to Fig. 2, a flow chart of another speech sample generation method provided by an embodiment of the present invention, step S100 comprises the following steps:
Step S110: apply a Fourier transform to each frame of the first speech variable to obtain the second speech variable.
Specifically, in the process of solving for the mel-frequency feature values of the first speech variable, a Fourier transform can first be applied: each frame of the first speech variable undergoes a short-time Fourier transform and the power spectrum of each frame is computed, yielding the second speech variable and transforming the information of the first speech variable from the time domain to the frequency domain.
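The per-frame transform described above can be sketched as follows, assuming non-overlapping frames for brevity (real front ends usually overlap frames and use an FFT rather than a direct DFT):

```python
import cmath
import math

def power_spectra(signal, frame_len=8):
    """Split the signal into non-overlapping frames and take the power
    spectrum of each via a direct short-time DFT (an FFT in practice)."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, frame_len)]
    spectra = []
    for frame in frames:
        bins = frame_len // 2 + 1   # real signal: keep non-negative bins only
        spectra.append([abs(sum(s * cmath.exp(-2j * math.pi * k * n / frame_len)
                                for n, s in enumerate(frame))) ** 2
                        for k in range(bins)])
    return spectra

sig = [math.cos(2 * math.pi * 2 * n / 8) for n in range(16)]  # bin-2 cosine
spec = power_spectra(sig)
# The energy of each frame concentrates in bin 2.
```

The output, one power spectrum per frame, is what the mel filter bank of step S120 consumes.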
Step S120: apply mel filtering to the second speech variable to obtain the third speech variable.
Specifically, the Fourier-transformed second speech variable is mel-filtered through a mel filter bank. In the embodiments of the present invention, n triangular band-pass filters, evenly spaced on the mel frequency scale, can serve as the mel filter bank. Multiplying the second speech variable's signal by the n triangular band-pass filters and taking the logarithmic energy of each filter's output yields the third speech variable.
Step S130: apply a discrete cosine transform to the third speech variable to obtain the mel-scale cepstrum, and use the mel-scale cepstrum as the mel-frequency feature values of the first speech variable.
Specifically, the third speech variable, i.e. the n logarithmic energies, undergoes a discrete cosine transform to obtain the mel-scale cepstrum of order L, which constitutes the mel-frequency feature values of the first speech variable.
In the embodiments of the present invention, the process of extracting the Mel-frequency characteristic value of the voice variable may be: performing a Fourier transform, applying Mel filtering, and performing a discrete cosine transform, so that the resulting Mel-scale cepstrum serves as the Mel-frequency characteristic value of the voice variable and the voice variable has a better representation.
Further, referring to Fig. 3, which is the flow chart of another speech sample generation method provided in an embodiment of the present invention, steps S110 to S130 may be replaced by the following steps:
Step S110: performing a Fourier transform on each frame of the first voice variable to obtain a second voice variable.
Step S120: applying Mel filtering to the second voice variable to obtain a third voice variable.
Step S131: performing a discrete cosine transform on the third voice variable to obtain a Mel-scale cepstrum.
Step S140: performing a difference operation on the Mel-scale cepstrum.
Specifically, after the Mel-scale cepstrum of the first voice variable has been obtained by the discrete cosine transform in step S131, a discrete difference operation may be applied to the Mel-scale cepstrum: a discrete first-order difference calculation, a discrete second-order difference calculation, or both, yielding the values after the difference calculation.
Step S150: inserting the result of the difference operation into the Mel-scale cepstrum to obtain the Mel-frequency characteristic value of the first voice variable.
Specifically, the values obtained by the discrete difference calculation in step S140 are inserted into the Mel-scale cepstrum of the first voice variable to obtain its Mel-frequency characteristic value, which serves as dynamic information between frames of the first voice variable. It should be noted that only the values of the discrete first-order difference may be inserted, only those of the discrete second-order difference, or both at the same time.
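The difference features can be sketched as below; the regression width of 2 frames and the appending of the differences as extra columns are common conventions assumed for illustration.

```python
import numpy as np

def delta(features, width=2):
    """Regression-style difference of cepstral features along the frame axis."""
    padded = np.pad(features, ((width, width), (0, 0)), mode='edge')
    denom = 2 * sum(i * i for i in range(1, width + 1))
    n = len(features)
    return sum(i * (padded[width + i : n + width + i] -
                    padded[width - i : n + width - i])
               for i in range(1, width + 1)) / denom

def add_dynamic_features(mfcc):
    d1 = delta(mfcc)        # discrete first-order difference
    d2 = delta(d1)          # discrete second-order difference
    return np.concatenate([mfcc, d1, d2], axis=1)
```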
In the embodiments of the present invention, the differences between neighbouring frames of the Mel-scale cepstrum of the voice variable are extracted as a parameter representing the inter-frame dynamic information, and are added to the Mel-scale cepstrum; together they form the Mel-frequency characteristic value of the voice variable, so that a speech recognition system trained with this voice variable has a wider range of application.
Further, before step S110 the method further includes the following step:
Step S160: applying high-pass filtering to the first voice variable, dividing the filtered first voice variable into consecutive frames, and applying a window to each frame.
Specifically, before solving the Mel-frequency characteristic value of the first voice variable, a series of processing may first be applied to it. First, pre-emphasis is applied to the first voice variable, i.e. the first voice signal is passed through a high-pass filter, so as to counteract the influence of the vocal cords and lips on the signal during its generation, thereby compensating the high-frequency part of the first voice signal that is suppressed by the articulatory system.
Secondly, the pre-emphasized first voice signal is divided into frames: the continuous first voice signal is split into consecutive frames, the length of each frame can be kept within the range of 20-50 milliseconds, and the corresponding number of sample points equals the product of the sampling rate of the first voice signal and the frame length.
Finally, in order to keep the two endpoints of each frame smooth and continuous, a window is applied to every frame of the framed first voice signal. This is because the Fourier transform in the subsequent steps assumes that the signal within a frame represents one period of a periodic signal; if this periodicity does not hold, the Fourier transform attempts to accommodate the discontinuity at the two ends and produces energy distributions that do not exist in the original signal, causing errors in the analysis. In the embodiments of the present invention, each frame of the framed first voice signal is multiplied by a Hamming window of the same length so as to keep the left and right ends of the frame continuous.
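The pre-emphasis, framing and Hamming-windowing steps above can be sketched as follows; the 25 ms frame length, 10 ms hop and pre-emphasis coefficient 0.97 are conventional illustrative values, not fixed by the disclosure.

```python
import numpy as np

def preprocess(signal, sample_rate=16000, frame_ms=25, hop_ms=10, alpha=0.97):
    """Pre-emphasis (a first-order high-pass filter), framing, and Hamming windowing."""
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])
    frame_len = int(sample_rate * frame_ms / 1000)   # sample points = rate x frame length
    hop = int(sample_rate * hop_ms / 1000)
    num_frames = 1 + max(0, (len(emphasized) - frame_len) // hop)
    frames = np.stack([emphasized[i * hop : i * hop + frame_len]
                       for i in range(num_frames)])
    return frames * np.hamming(frame_len)            # smooth both endpoints of every frame
```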
It should be noted that the above specific manner of processing the first voice signal and its data is only one of several schemes provided by the embodiments of the present invention; on this basis, those skilled in the art can readily conceive of other signal-processing manners, which also fall within the protection of the embodiments of the present invention.
In the embodiments of the present invention, before the Mel-frequency characteristic value of the voice variable is solved, the voice variable may first undergo pre-processing such as filtering, framing and windowing, so that the processed voice variable is better suited for solving the Mel-frequency characteristic value.
Further, referring to Fig. 4, which is the flow chart of another speech sample generation method provided in an embodiment of the present invention, step S100 further includes the following steps:
Step S170: generating a speech fragment.
Specifically, the generation of the speech fragment is a random process; the fragment may be obtained by recording a segment of audio, downloading a segment of speech, and the like, and the generated speech fragment may be a segment of noise, silence or any speech.
Step S180: formatting the speech fragment to obtain the first voice variable, so that the characteristic parameters of the first voice variable are identical to the characteristic parameters of the target voice.
Specifically, after the speech fragment is generated in step S170, it may be formatted so that the formatted first voice variable has the same characteristic parameters as a segment of target voice; the characteristic parameters may be length, sampling rate, channel and the like. The target voice mentioned above is the original speech fragment used as a sample for training the neural network.
In the embodiments of the present invention, the voice variable may be a randomly generated segment of speech whose length, sampling rate, channel and other characteristic parameters should be identical to those of the target voice, which guarantees that the voice variable finally obtained by the optimization can serve as a sample for the speech recognition system.
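A sketch of one possible formatting step, matching only the length parameter (sampling-rate and channel conversion are omitted for brevity); the function name and the choice of zero-padding are illustrative assumptions.

```python
import numpy as np

def format_segment(segment, target_len):
    """Pad with silence or truncate so the fragment's length matches the target voice."""
    if len(segment) >= target_len:
        return segment[:target_len]
    return np.pad(segment, (0, target_len - len(segment)))  # append zeros (silence)
```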
Further, referring to Fig. 5, which is the flow chart of another speech sample generation method provided in an embodiment of the present invention, the following steps are further included before step S100:
Step S400: obtaining the target voice.
Step S500: extracting the Mel-frequency characteristic value of the target voice.
Specifically, the target voice is the original speech fragment used as a sample for training the neural network. Its Mel-frequency characteristic value may be extracted in the same manner as in step S100, so that the optimization algorithm of the neural network makes the first voice variable approach the target voice as closely as possible, which guarantees that the finally optimized first voice variable can serve as a training sample for the neural network, thereby achieving the purpose of increasing the robustness of the speech recognition system.
In the embodiments of the present invention, before the voice variable is processed, a segment of target voice may first be obtained; this target voice is the target of the voice variable optimization.
Further, after step S300 the method further includes the following step:
Step S600: training the speech recognition system with the neural network, using the target voice sample as a sample.
Specifically, after a suitable first voice variable has been obtained, the first voice variable is a new adversarial sample for the speech recognition system. The first voice variable may be saved as speech having the same length, sampling rate, channel and other features as the target voice, with the amplitude of the speech waveform generally kept within the normal range, i.e. between -2^15 and 2^15 - 1; this speech is then added to the original speech recognition system for adversarial training, so as to enhance the robustness of the existing speech recognition system.
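Clamping the optimized waveform into the stated normal range of a 16-bit sample can be sketched as below; the function name is illustrative.

```python
import numpy as np

def to_int16(waveform):
    """Clamp the optimised waveform into the normal 16-bit range [-2**15, 2**15 - 1]."""
    return np.clip(np.round(waveform), -2**15, 2**15 - 1).astype(np.int16)
```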
In the embodiments of the present invention, after a voice variable meeting the criterion has been obtained with the neural network, the voice variable may be used as a training sample to train the speech recognition system, thereby improving the robustness of the speech system.
Second embodiment
The embodiment of the present invention provides a speech sample generating apparatus 600. Referring to Fig. 6, which is a structural block diagram of a speech sample generating apparatus provided in an embodiment of the present invention, the apparatus comprises: a first extraction module 610 for extracting the Mel-frequency characteristic value of a first voice variable after the first voice variable is obtained, wherein the characteristic parameters of the first voice variable are identical to the characteristic parameters of a target voice, the characteristic parameters comprising length, sampling rate and channel; a first computing module 620 for calculating, with a neural network, the loss function between the Mel-frequency characteristic value of the first voice variable and the Mel-frequency characteristic value of the target voice; and an optimization module 630 for optimizing the loss function with the optimization algorithm of the neural network by adjusting the values of the sample points in the first voice variable, until the value of the optimized loss function is less than a preset threshold, the voice variable whose loss function value is less than the preset threshold being the target voice sample.
In the embodiments of the present invention, the first extraction module 610 solves the inverse Mel transform of the voice variable with the neural network, and the optimization module 630 optimizes, through the neural network, the error between the Mel-frequency characteristic value of the voice variable and that of the target voice, so as to obtain the voice variable whose error is less than the preset threshold; that voice variable is taken as an adversarial sample, thereby enriching the speech sample set of the speech recognition system.
Further, the first extraction module 610 comprises: a first transform module for performing a Fourier transform on each frame of the first voice variable to obtain a second voice variable; a first filtering module for applying Mel filtering to the second voice variable to obtain a third voice variable; and a second transform module for performing a discrete cosine transform on the third voice variable to obtain a Mel-scale cepstrum and taking the Mel-scale cepstrum as the Mel-frequency characteristic value of the first voice variable.
In the embodiments of the present invention, the process by which the first extraction module 610 extracts the Mel-frequency characteristic value of the voice variable may be: performing the Fourier transform with the first transform module, applying Mel filtering with the first filtering module, and performing the discrete cosine transform with the second transform module, so that the resulting Mel-scale cepstrum serves as the Mel-frequency characteristic value of the voice variable and the voice variable has a better representation.
Further, the apparatus further comprises: a second computing module for performing a difference operation on the Mel-scale cepstrum; and the second transform module comprises an insertion module for inserting the result of the difference operation into the Mel-scale cepstrum to obtain the Mel-frequency characteristic value of the first voice variable.
In the embodiments of the present invention, the differences between neighbouring frames of the Mel-scale cepstrum computed by the second computing module are extracted as a parameter representing the inter-frame dynamic information of the voice variable, and are added to the Mel-scale cepstrum by the insertion module; together they form the Mel-frequency characteristic value of the voice variable, so that a speech recognition system trained with this voice variable has a wider range of application.
Further, the apparatus further comprises: a third filtering module for applying high-pass filtering to the first voice variable, dividing the filtered first voice variable into consecutive frames, and applying a window to each frame.
In the embodiments of the present invention, before the first extraction module 610 solves the Mel-frequency characteristic value of the voice variable, the third filtering module may first apply pre-processing such as filtering, framing and windowing, so that the processed voice variable is better suited for solving the Mel-frequency characteristic value.
Further, the first extraction module 610 comprises: a generation module for generating a speech fragment; and a formatting module for formatting the speech fragment to obtain the first voice variable, so that the characteristic parameters of the first voice variable are identical to those of the target voice.
In the embodiments of the present invention, the voice variable may be a segment of speech randomly generated by the generation module, whose length, sampling rate, channel and other characteristic parameters should be identical to those of the target voice, which guarantees that the voice variable finally obtained by the optimization can serve as a sample for the speech recognition system.
Further, the apparatus further comprises: an obtaining module for obtaining the target voice; and a second extraction module for extracting the Mel-frequency characteristic value of the target voice.
In the embodiments of the present invention, before the voice variable is processed, a segment of target voice may first be obtained with the obtaining module; this target voice is the target of the voice variable optimization.
Further, the apparatus further comprises: a training module for training a speech recognition system with the neural network, using the target voice sample as a sample.
In the embodiments of the present invention, after a voice variable meeting the criterion has been obtained with the neural network, the training module may use the voice variable as a training sample to train the speech recognition system, thereby improving the robustness of the speech system.
3rd embodiment
The embodiment of the present invention provides an electronic device, comprising: a processor, a memory and a bus. The memory stores machine-readable instructions executable by the processor. When the electronic device runs, the processor and the memory communicate over the bus, and when the machine-readable instructions are executed by the processor, any of the methods described in the first embodiment is performed.
The memory may include, but is not limited to, random access memory (RAM), read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), and the like.
The processor may be an integrated circuit chip with signal-processing capability. The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP) and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or execute the methods, steps and logic diagrams disclosed in the embodiments of the present invention. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
Fourth embodiment
The embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored; when the computer program is run by a processor, any of the methods of the optional implementations of the first embodiment is performed.
In conclusion, the present invention provides a speech sample generation method and apparatus. The method comprises: after obtaining a first voice variable, extracting the Mel-frequency characteristic value of the first voice variable, wherein the characteristic parameters of the first voice variable are identical to those of a target voice, the characteristic parameters including length, sampling rate and channel; calculating, with a neural network, the loss function between the Mel-frequency characteristic value of the first voice variable and the Mel-frequency characteristic value of the target voice; and optimizing the loss function with the optimization algorithm of the neural network by adjusting the values of the sample points in the first voice variable, until the value of the optimized loss function is less than a preset threshold, the voice variable whose loss function value is less than the preset threshold being the target voice sample. The inverse Mel transform of the voice variable is thus solved with the neural network, and the error between the Mel-frequency characteristic values of the voice variable and of the target voice is optimized through the neural network so as to obtain the voice variable whose error is less than the preset threshold; that voice variable serves as an adversarial sample, thereby enriching the speech sample set of the speech recognition system.
The foregoing is only a preferred embodiment of the present invention and is not intended to limit it; for those skilled in the art, the invention may be modified and varied in various ways. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present invention shall be included in the protection scope of the present invention.
The above is merely a specific embodiment, but the protection scope of the present invention is not limited thereto; any person familiar with the art can easily conceive of changes or replacements within the technical scope disclosed by the present invention, and these shall all be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
It should be noted that, in this document, relational terms such as first and second are used merely to distinguish one entity or operation from another, without necessarily requiring or implying any such actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or device including a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. In the absence of further restrictions, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article or device that includes the element.

Claims (10)

1. A speech sample generation method, characterized by comprising:
after obtaining a first voice variable, extracting a Mel-frequency characteristic value of the first voice variable; wherein the characteristic parameters of the first voice variable are identical to the characteristic parameters of a target voice, the characteristic parameters comprising: length, sampling rate and channel;
calculating, with a neural network, a loss function between the Mel-frequency characteristic value of the first voice variable and the Mel-frequency characteristic value of the target voice;
optimizing the loss function with an optimization algorithm of the neural network by adjusting the values of the sample points in the first voice variable, until the value of the optimized loss function is less than a preset threshold, the voice variable whose loss function value is less than the preset threshold being the target voice sample.
2. The speech sample generation method according to claim 1, wherein extracting the Mel-frequency characteristic value of the first voice variable comprises:
performing a Fourier transform on each frame of the first voice variable to obtain a second voice variable;
applying Mel filtering to the second voice variable to obtain a third voice variable;
performing a discrete cosine transform on the third voice variable to obtain a Mel-scale cepstrum, and taking the Mel-scale cepstrum as the Mel-frequency characteristic value of the first voice variable.
3. The speech sample generation method according to claim 2, wherein after performing the discrete cosine transform on the third voice variable to obtain the Mel-scale cepstrum, the method further comprises:
performing a difference operation on the Mel-scale cepstrum;
and taking the Mel-scale cepstrum as the Mel-frequency characteristic value of the first voice variable comprises:
inserting the result of the difference operation into the Mel-scale cepstrum to obtain the Mel-frequency characteristic value of the first voice variable.
4. The speech sample generation method according to claim 2, wherein before performing the Fourier transform on each frame of the first voice variable to obtain the second voice variable, the method further comprises:
applying high-pass filtering to the first voice variable, dividing the filtered first voice variable into consecutive frames, and applying a window to each frame.
5. The speech sample generation method according to claim 1, wherein obtaining the first voice variable comprises:
generating a speech fragment;
formatting the speech fragment to obtain the first voice variable, so that the characteristic parameters of the first voice variable are identical to the characteristic parameters of the target voice.
6. The speech sample generation method according to claim 1, wherein before extracting the Mel-frequency characteristic value of the first voice variable, the method further comprises:
obtaining the target voice;
extracting the Mel-frequency characteristic value of the target voice.
7. The speech sample generation method according to any one of claims 1-6, wherein after the voice variable whose loss function value is less than the preset threshold is taken as the target voice sample, the method further comprises:
training a speech recognition system with the neural network, using the target voice sample as a sample.
8. A speech sample generating apparatus, characterized by comprising:
a first extraction module for extracting the Mel-frequency characteristic value of a first voice variable after the first voice variable is obtained; wherein the characteristic parameters of the first voice variable are identical to the characteristic parameters of a target voice, the characteristic parameters comprising: length, sampling rate and channel;
a first computing module for calculating, with a neural network, a loss function between the Mel-frequency characteristic value of the first voice variable and the Mel-frequency characteristic value of the target voice;
an optimization module for optimizing the loss function with an optimization algorithm of the neural network by adjusting the values of the sample points in the first voice variable, until the value of the optimized loss function is less than a preset threshold, the voice variable whose loss function value is less than the preset threshold being the target voice sample.
9. The speech sample generating apparatus according to claim 8, wherein the first extraction module comprises:
a first transform module for performing a Fourier transform on each frame of the first voice variable to obtain a second voice variable;
a first filtering module for applying Mel filtering to the second voice variable to obtain a third voice variable;
a second transform module for performing a discrete cosine transform on the third voice variable to obtain a Mel-scale cepstrum, and taking the Mel-scale cepstrum as the Mel-frequency characteristic value of the first voice variable.
10. The speech sample generating apparatus according to claim 9, wherein the apparatus further comprises:
a second computing module for performing a difference operation on the Mel-scale cepstrum;
and the second transform module comprises:
an insertion module for inserting the result of the difference operation into the Mel-scale cepstrum to obtain the Mel-frequency characteristic value of the first voice variable.
CN201811593971.6A 2018-12-25 2018-12-25 Voice sample generation method and device Active CN109473091B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811593971.6A CN109473091B (en) 2018-12-25 2018-12-25 Voice sample generation method and device


Publications (2)

Publication Number Publication Date
CN109473091A true CN109473091A (en) 2019-03-15
CN109473091B CN109473091B (en) 2021-08-10

Family

ID=65676987


Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111292766A (en) * 2020-02-07 2020-06-16 北京字节跳动网络技术有限公司 Method, apparatus, electronic device, and medium for generating speech samples
CN111477247A (en) * 2020-04-01 2020-07-31 宁波大学 GAN-based voice countermeasure sample generation method
WO2020232860A1 (en) * 2019-05-22 2020-11-26 平安科技(深圳)有限公司 Speech synthesis method and apparatus, and computer readable storage medium
CN112201227A (en) * 2020-09-28 2021-01-08 海尔优家智能科技(北京)有限公司 Voice sample generation method and device, storage medium and electronic device
CN112216296A (en) * 2020-09-25 2021-01-12 脸萌有限公司 Audio anti-disturbance testing method and device and storage medium
CN112466298A (en) * 2020-11-24 2021-03-09 网易(杭州)网络有限公司 Voice detection method and device, electronic equipment and storage medium
WO2021137754A1 (en) * 2019-12-31 2021-07-08 National University Of Singapore Feedback-controlled voice conversion

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107293289A (en) * 2017-06-13 2017-10-24 南京医科大学 A kind of speech production method that confrontation network is generated based on depth convolution
CN108182936A (en) * 2018-03-14 2018-06-19 百度在线网络技术(北京)有限公司 Voice signal generation method and device
CN108597496A (en) * 2018-05-07 2018-09-28 广州势必可赢网络科技有限公司 A kind of speech production method and device for fighting network based on production
CN108899032A (en) * 2018-06-06 2018-11-27 平安科技(深圳)有限公司 Method for recognizing sound-groove, device, computer equipment and storage medium
US20180342258A1 (en) * 2017-05-24 2018-11-29 Modulate, LLC System and Method for Creating Timbres
CN109036389A (en) * 2018-08-28 2018-12-18 出门问问信息科技有限公司 The generation method and device of a kind of pair of resisting sample



Also Published As

Publication number Publication date
CN109473091B (en) 2021-08-10

Similar Documents

Publication Publication Date Title
CN109473091A (en) A kind of speech samples generation method and device
RU2685391C1 (en) Method, device and system for noise rejection
EP2178082B1 (en) Cyclic signal processing method, cyclic signal conversion method, cyclic signal processing device, and cyclic signal analysis method
JP4177755B2 (en) Utterance feature extraction system
CN102054480B (en) Method for separating monaural overlapping speeches based on fractional Fourier transform (FrFT)
CN109767783A (en) Speech enhancement method, device, equipment and storage medium
CN109256127B (en) Robust speech feature extraction method based on nonlinear power-transformed gammachirp filter
KR20120090086A (en) Determining an upperband signal from a narrowband signal
Kesarkar et al. Feature extraction for speech recognition
US6701291B2 (en) Automatic speech recognition with psychoacoustically-based feature extraction, using easily-tunable single-shape filters along logarithmic-frequency axis
CN103258537A (en) Method utilizing characteristic combination to identify speech emotions and device thereof
JP2013512475A (en) Complex acoustic resonance speech analysis system
JP2002507776A (en) Signal processing method for analyzing transients in audio signals
CN103778914A (en) Anti-noise voice identification method and device based on signal-to-noise ratio weighing template characteristic matching
RU2013119828A (en) METHOD FOR DETERMINING THE RISK OF DEVELOPMENT OF INDIVIDUAL DISEASES BY ITS VOICE AND HARDWARE AND SOFTWARE COMPLEX FOR IMPLEMENTING THE METHOD
JP2006215228A (en) Speech signal analysis method and device for implementing this analysis method, speech recognition device using this device for analyzing speech signal, program for implementing this analysis method, and recording medium thereof
JP4166405B2 (en) Drive signal analyzer
CN112863517A (en) Speech recognition method based on perceptual spectrum convergence rate
CN107993666B (en) Speech recognition method, speech recognition device, computer equipment and readable storage medium
Singh et al. A comparative study of recognition of speech using improved MFCC algorithms and Rasta filters
Zouhir et al. Speech Signals Parameterization Based on Auditory Filter Modeling
Flynn et al. A comparative study of auditory-based front-ends for robust speech recognition using the Aurora 2 database
JP4537821B2 (en) Audio signal analysis method, audio signal recognition method using the method, audio signal section detection method, apparatus, program and recording medium thereof
Singh et al. A novel algorithm using MFCC and ERB gammatone filters in speech recognition
Bhore et al. Comparison of Formant Estimation Techniques

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant