SI25265A - The process and the device for marking the period of speech pitch and audio/non-audio segments - Google Patents


Info

Publication number
SI25265A
Authority
SI
Slovenia
Prior art keywords
speech
signal
time
autocorrelation
short
Prior art date
Application number
SI201600184A
Other languages
Slovenian (sl)
Inventor
Kačič Zdravko
Original Assignee
Univerza v Mariboru Fakulteta za elektrotehniko, računalništvo in informatiko
Priority date
Filing date
Publication date
Application filed by Univerza v Mariboru Fakulteta za elektrotehniko, računalništvo in informatiko
Priority to SI201600184A
Priority to PCT/SI2017/000007 (WO2018026329A1)
Publication of SI25265A


Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/90Pitch determination of speech signals
    • G10L2025/906Pitch tracking

Abstract

The proposed solution relates to the field of speech analysis and synthesis. More precisely, it relates to a method and a device for marking voiced/unvoiced speech segments and speech pitch periods in the analysis of complex signals.

The pitch-period marking method comprises the following operations:
- removing the DC component from the input signal;
- zero-phase low-pass filtering with cutoff frequency Flp;
- computing the short-time autocorrelation with a sliding window of variable size;
- determining the short-time autocorrelation pitch time index;
- computing the band-pass filter coefficients from the value of that index;
- filtering the speech signal with an adaptive zero-phase band-pass filter with centre frequency Fcf, whose output is the pitch signal;
- defining pitch period marks for the pitch signal from the half-periods of the pitch signal;
- defining pitch segments;
- mapping the pitch period marks of the pitch signal to pitch period marks of the speech signal.

The pitch period mark can be placed in one of three ways:
- at the negative peak within the pitch segment;
- at the positive peak;
- at the start and end samples of the speech period that contains the positive and negative peaks.

Voiced/unvoiced detection uses a thresholding method. The detection and marking criterion for voiced/unvoiced segments combines two quantities:
- the short-time average of the absolute instantaneous amplitude values of the low-pass filtered signal;
- the short-time autocorrelation pitch time indices.

Description

METHOD AND DEVICE FOR MARKING SPEECH PITCH PERIODS AND VOICED/UNVOICED SEGMENTS

Technical field

The proposed solution relates to the field of speech analysis and synthesis. More precisely, it relates to a method and a device for marking voiced/unvoiced speech segments and speech pitch periods in the analysis of complex signals.

Technical problem

The invention addresses the technical problem of how to mark the voiced and unvoiced segments of a signal so that pitch period marks can be determined reliably and with high time resolution.

A further problem is how to mark the pitch periods of voiced segments, and place marks in unvoiced segments, of a complex signal such as speech or music. This must be done in a way that avoids the following problems:
- windowing effects;
- low time or frequency resolution;
- averaging;
- insertion or deletion of period marks;
- lack of robustness to noise.

The object of the invention is a pitch-period marking method with the following properties:
- high time and frequency resolution;
- insensitivity to noise;
- high reliability;
- computational efficiency.

Pitch is a fundamental characteristic of the speech signal. In the time domain it appears as the pitch period within a voiced speech segment.

For a speech signal, pitch marking means marking the instants at which the vocal folds close, also called epochs or glottal closure instants (GCIs). The pitch mark is usually placed at an amplitude extremum of the speech signal within the pitch period. Most often this is the negative extremum, which corresponds to the glottal closure instant.

Automatic pitch-marking methods must decide whether to mark the instants of the positive or the negative amplitude extrema of the signal. This can be decided by a signal polarity detection procedure.

Pitch marks may also be placed at the zero-crossing instants of the signal. These points are then the start and end points of the speech-signal period that contains the positive and negative amplitude extrema.
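The three placement options can be made concrete with a minimal NumPy sketch. It assumes that the period boundaries are the upward (negative-to-positive) zero crossings of the signal; the text does not say which crossing direction is used. The names `upward_zero_crossings` and `place_marks` are hypothetical.

```python
import numpy as np

def upward_zero_crossings(x):
    """Indices n with x[n-1] < 0 <= x[n]; using these as period boundaries is an assumption."""
    x = np.asarray(x, dtype=float)
    return np.flatnonzero((x[:-1] < 0) & (x[1:] >= 0)) + 1

def place_marks(x, boundaries, mode="negative"):
    """One mark per period [b_k, b_{k+1}) (hypothetical helper).

    'negative' -> sample of the negative extremum (usual GCI convention)
    'positive' -> sample of the positive extremum
    'bounds'   -> (start, end) samples of the period itself
    """
    x = np.asarray(x, dtype=float)
    marks = []
    for start, end in zip(boundaries[:-1], boundaries[1:]):
        seg = x[start:end]
        if mode == "negative":
            marks.append(int(start + np.argmin(seg)))
        elif mode == "positive":
            marks.append(int(start + np.argmax(seg)))
        elif mode == "bounds":
            marks.append((int(start), int(end)))
        else:
            raise ValueError(f"unknown mode: {mode}")
    return marks

# Usage: a 100 Hz tone at 8 kHz has an 80-sample period.
fs = 8000
n = np.arange(800)
x = np.sin(2 * np.pi * 100 * n / fs + 0.1)
bounds = upward_zero_crossings(x)
negative_marks = place_marks(x, bounds, "negative")
```

On a clean tone, successive marks are exactly one period apart. On real speech the extremum within each period moves with the glottal waveform.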

Reliable pitch-period marking is important in many areas of speech signal processing:
- pitch-synchronous speech enhancement;
- clinical diagnostics;
- speech coding;
- automatic phonetic segmentation;
- pitch-synchronous speech analysis and processing;
- speaker characterization;
- voice conversion;
- speech recognition;
- speech synthesis.

State of the art

In recent years many pitch-marking methods have been proposed, based on a range of speech processing techniques:
- linear prediction;
- cepstral analysis;
- the autocorrelation function;
- the average magnitude difference function;
- Cohen's class of time-frequency representations;
- group-delay methods;
- multiphase algorithms;
- empirical mode decomposition;
- thresholding procedures;
- peak-picking procedures.

Many of these methods suffer from windowing effects, low time or frequency resolution, averaging, insertion or deletion of period marks, lack of robustness to noise, and similar problems. Many are also computationally inefficient.

Autocorrelation-based methods are among the most widely used.
- Patents US 8280725 B2 and US 8214201 B2 compute autocorrelation values of overlapping parts of the speech signal. They combine these into a single autocorrelation value, from which the pitch period is estimated.
- Patent application US 2004/0260537 A1 uses an iterative technique to compute the index of the autocorrelation sequence that indicates the pitch period.

Most autocorrelation-based methods achieve good accuracy, but time resolution remains a problem. To determine the autocorrelation sequence reliably, the analysis window must contain at least three periods of the speech signal. This requires long analysis windows and therefore lowers the time resolution of the method. High time resolution matters most at the boundaries between voiced and unvoiced speech segments.

Some known methods use filtering to extract the pitch signal, for example patent US 6349277 B1. In that solution, a pitch frequency analyzer estimates the pitch frequency of the input signal. The cutoff frequency of an adaptive filter is then set so that the filter extracts the pitch signal from the input speech signal. Any known pitch-frequency analysis method can serve as the analyzer.

Another embodiment of the same patent works as follows:
1. A bank of low-pass filters feeds a set of peak detectors, which find the peaks in the filtered signals.
2. From these peaks, a channel selector chooses the most suitable channel (one filter of the bank) at each time instant and derives the period marks from the selected filter outputs.
3. To remove irregularities in the resulting mark sequence, the marks are converted into a pitch-frequency contour.
4. The contour sets the parameters of an adaptive low-pass filter that extracts the pitch signal from the input speech signal.

In patent US 6470311 B1, the parameters of the optimal filter are found as follows:
1. Portions of the speech signal longer than 50 ms are filtered with a filter bank.
2. The average values of the filter output signals, and the differences between these averages, are computed.
3. The first peak of the energy difference that lies above the average energy-difference curve is located.
4. This peak defines the parameters of the optimal filter for filtering the input signal.
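The three-period requirement translates directly into a minimum analysis-window length. The sketch below computes that length. `min_window` is a hypothetical helper, and the pitch values and 16 kHz sampling rate in the usage are assumed for illustration.

```python
import math

def min_window(f0_hz, fs_hz, periods=3):
    """Return (duration in ms, length in samples) of a window holding `periods` pitch periods."""
    duration_ms = 1000.0 * periods / f0_hz
    n_samples = math.ceil(periods * fs_hz / f0_hz)
    return duration_ms, n_samples

# Usage with assumed pitch values, sampled at 16 kHz.
for f0 in (200.0, 100.0, 80.0):
    print(f0, min_window(f0, 16000))
```

Take an 80 Hz voice sampled at 16 kHz as an example. Three periods already span 37.5 ms (600 samples), so any voicing change shorter than that is smeared across one window.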

Approaches based on filtering the speech signal have relatively good time resolution. When a pitch-marking method relies on pitch-frequency estimation, its success depends mainly on the accuracy, robustness and time resolution of that estimator. Methods that use a filter bank are often sensitive to noise, which can give poor results on noisy speech.

Patent EP 2278581 B1 describes a band-pass filter that extracts the pitch signal from the speech signal. The filter parameters (the passband characteristic) are set from the pitch frequency estimated by a pitch-frequency estimation module. That module has a complex structure, which generally means higher computational complexity.

The solutions described above differ from the present invention in one respect. The present method includes a simple mechanism for determining the parameters of an adaptive zero-phase band-pass filter.
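A parameter mechanism like this can be very simple, as the sketch below shows. It maps an autocorrelation pitch lag to a centre frequency Fcf = fs / lag and builds a second-order band-pass section at that frequency. It then runs the filter forward and backward to obtain zero phase.

The patent text here does not specify the filter type, order or bandwidth. The RBJ-cookbook biquad, the quality factor `q=1.0` and the helper names `bandpass_coeffs`, `biquad` and `zero_phase` are therefore hypothetical choices, not the patented design.

```python
import math
import numpy as np

def bandpass_coeffs(lag, fs, q=1.0):
    """Hypothetical: RBJ band-pass biquad (0 dB peak gain) centred on Fcf = fs / lag."""
    fcf = fs / lag
    w0 = 2.0 * math.pi * fcf / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = np.array([alpha, 0.0, -alpha]) / a0
    a = np.array([1.0 + alpha, -2.0 * math.cos(w0), 1.0 - alpha]) / a0
    return fcf, b, a

def biquad(b, a, x):
    """Direct-form I second-order IIR filter (a[0] is assumed normalised to 1)."""
    y = np.zeros(len(x))
    x1 = x2 = y1 = y2 = 0.0
    for i, xn in enumerate(x):
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y[i] = yn
    return y

def zero_phase(b, a, x):
    """Forward-backward filtering: phase cancels, magnitude response is squared."""
    y = biquad(b, a, np.asarray(x, dtype=float))
    return biquad(b, a, y[::-1])[::-1]
```

Running the same filter in both directions cancels its phase response, so the extracted pitch signal stays time-aligned with the speech signal. Edge transients remain, which is why practical forward-backward filters usually pad the signal first.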

Description of the solution to the technical problem

The method according to the invention for marking pitch periods and voiced/unvoiced segments comprises the following steps:
- The DC component is first removed from the speech signal. The signal is then filtered with a zero-phase low-pass filter with cutoff frequency Flp.
- The filtered speech signal is used for two computations, both with a sliding short-time analysis window of variable length:
  - the short-time average of the absolute instantaneous amplitude values;
  - the short-time biased autocorrelation sequence.
- The first two harmonically related peaks (harmonic peaks) of the short-time autocorrelation sequence are located, and the pitch time index of the autocorrelation sequence is defined.
- From this time index, the centre frequency Fcf of the band-pass filter is defined and the filter coefficients are computed.
- The low-pass filtered speech segment for which the pitch time index was defined is filtered with an adaptive zero-phase band-pass filter using the computed coefficients. The filtered output is the pitch signal of the speech signal.
- The short-time average of the absolute instantaneous amplitude values of the low-pass filtered speech signal is computed and used to determine voiced segments.
- The current speech segment is marked as voiced if both of the following hold; otherwise it is marked as unvoiced:
  - the autocorrelation pitch time indices are defined for the segment;
  - the short-time averages of the absolute instantaneous amplitude values exceed the threshold TRE1.
- For voiced speech, the pitch-signal period marks are placed at zero-crossing points. Two adjacent marks delimit one period of the pitch signal, within which a positive and a negative signal peak occur.
- In the final phase, the pitch period marks of the pitch signal are mapped to pitch period marks of the speech signal.
- For unvoiced segments, the positions of the pitch marks follow rules for determining the boundaries of unvoiced segments. These rules may place the marks at constant time intervals of a selected length. That length may be any of the following:
  - the mean distance between pitch period marks in voiced segments;
  - another statistic of those distances;
  - a value set by some other criterion.
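The autocorrelation and voicing-decision steps can be sketched as follows. This is one plausible reading of the steps, not the patented procedure, and it makes three simplifications:
- It uses a fixed-length frame instead of the variable-length window.
- It accepts the first autocorrelation peak only if a second peak appears near twice its lag. This is an interpretation of "the first two harmonically related peaks".
- It treats the frame as voiced when that pitch index exists and the mean absolute amplitude exceeds TRE1.

The search range (60 to 400 Hz), `peak_min`, `tol` and the function names are hypothetical.

```python
import numpy as np

def biased_autocorr(frame):
    """Biased short-time autocorrelation r[k] = (1/N) * sum_n x[n] * x[n+k], k = 0..N-1."""
    x = np.asarray(frame, dtype=float)
    n = len(x)
    full = np.correlate(x, x, mode="full")  # lags -(N-1) .. N-1
    return full[n - 1:] / n                  # keep lags 0 .. N-1

def _is_peak(r, k):
    return r[k] >= r[k - 1] and r[k] > r[k + 1]

def pitch_lag(frame, fs, f_min=60.0, f_max=400.0, peak_min=0.3, tol=2):
    """Lag (in samples) of the first normalised autocorrelation peak >= peak_min
    that has a harmonic partner peak within +/- tol samples of twice its lag.
    Returns None when no such pair exists (pitch time index undefined)."""
    r = biased_autocorr(frame)
    if r[0] <= 0.0:
        return None
    r = r / r[0]
    lo = max(int(fs / f_max), tol + 1)
    hi = min(int(fs / f_min), (len(r) - tol - 2) // 2)  # keeps 2*k + tol + 1 in range
    for k in range(lo, hi + 1):
        if r[k] < peak_min or not _is_peak(r, k):
            continue
        seg = r[2 * k - tol: 2 * k + tol + 1]
        j = 2 * k - tol + int(np.argmax(seg))
        if r[j] > 0.0 and r[j] >= r[j - 1] and r[j] >= r[j + 1]:
            return k
    return None

def is_voiced(frame, fs, tre1):
    """Voiced if the pitch index is defined AND the mean |amplitude| exceeds TRE1.
    `frame` is assumed to be already DC-free and low-pass filtered."""
    mean_abs = float(np.mean(np.abs(frame)))
    return pitch_lag(frame, fs) is not None and mean_abs > tre1
```

Because the biased estimate divides every lag by N rather than N - k, its peaks decay with lag. This keeps the first true period peak ahead of its multiples. The two voicing criteria then reject silence (low amplitude) and noise (no harmonic peak pair) independently.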

A device implementing the method described above comprises:
- a DC-removal unit;
- a zero-phase low-pass filter;
- a unit for computing the autocorrelation sequence and finding its maxima;
- a unit for computing the short-time average of the absolute instantaneous signal amplitude values;
- a voiced/unvoiced segment detection unit;
- a voiced-speech memory;
- a band-pass filter coefficient generation unit;
- an adaptive zero-phase band-pass filter;
- a speech polarity detection unit;
- a pitch-period marking unit.

The method and device according to the invention are explained in more detail below through embodiments and the figures. The figures show:

Figure 1: block diagram of the first embodiment of the method

Figure 2: block diagram of the pitch-period marking method according to the second embodiment

Figure 3: flow of the pitch-period marking procedure

Figure 4a: example of a speech signal

Figure 4b: curve of the normalized short-time autocorrelation pitch time index

Figure 4c: normalized short-time average of the absolute instantaneous amplitude values

Figure 5: example of a voiced segment of a speech signal and the pitch signal extracted from it by the adaptive zero-phase band-pass filter

Figure 6: example of a voiced speech signal segment

Figure 7: example of the biased short-time autocorrelation sequence of a voiced segment of the speech signal

Figure 8: example of a transition from voiced to unvoiced speech signal

Figure 9: example of a biased autocorrelation sequence

Figure 10: example of a signal transition from voiced to unvoiced speech

Figure 11: example of a biased autocorrelation sequence

Figure 12: example of a speech signal with marked pitch periods

Figure 13: example of a speech signal with pitch-period marks shown

Figure 14: example of a speech signal with the pitch-period marks of the speech signal shown

Figure 15: device according to the invention

The detailed description that follows shows and describes only individual embodiments. The figures and description are therefore illustrative only and, as such, non-limiting.

Figure 1 shows a block diagram of the method for determining speech pitch periods according to the invention (embodiment I). Processing of the speech signal begins in step 1010 with removal of the DC component. In step 1020, the DC-free signal is filtered with a zero-phase low-pass filter. In step 1030, autocorrelation processing is performed. It comprises step 1031, the calculation of the short-time autocorrelation sequence, and step 1032, the determination of the time indices of the peaks of the autocorrelation sequence, from which the autocorrelation time index of the pitch period of the speech signal is obtained. The signal produced in step 1020 is also fed directly to step 1060 (the voiced-speech memory) and to step 1045, where the short-time average of the absolute instantaneous amplitude values of the signal is calculated. In step 1050, voiced/unvoiced speech detection is performed, and the output of step 1032 is also fed to this step. The voiced signal from step 1050 is fed to step 1060 (the voiced-speech memory) and to step 2080. The output of step 1032 is also fed to step 2080, where the band-pass filter coefficients are generated. The coefficients obtained in step 2080 are fed to step 2070, the adaptive zero-phase band-pass filter with center frequency F_cf. Step 2070 is followed by step 2090, the generation of the pitch signal, which is fed to the pitch-period marking algorithm in step 1111; the signal obtained after DC removal is also fed to step 1111.

The outputs of step 2090, which yields the pitch signal, and of step 1010, which yields the DC-free signal, are passed, together with the speech polarity detected in step 2100, to step 1110. Besides step 1111 for marking voiced speech segments, step 1110 includes step 1112, in which the periods of unvoiced speech segments are marked, taking into account the unvoiced-segment boundaries detected in step 1050 and the DC-free signal. The processing results of steps 1050 and 1110 are written to the label sets in step 1120: the result of step 1050 is written to the set of voiced/unvoiced segment labels in step 1122, and the results of steps 1111 and 1112 are written as sets of pitch-period marks for voiced and unvoiced speech segments in step 1121.
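
Steps 1010 and 1020 can be illustrated with a minimal pure-Python sketch. The description does not specify the low-pass filter design, so a first-order (one-pole) low-pass run forward and then backward over the signal is assumed here; running a causal filter in both directions cancels its phase response, which is what makes the result zero-phase. The names `remove_dc`, `one_pole_lowpass` and `zero_phase_lowpass` and the parameters `f_lp` and `fs` are hypothetical.

```python
import math


def remove_dc(x):
    """Step 1010 (sketch): remove the DC component by subtracting the mean."""
    if not x:
        return []
    mean = sum(x) / len(x)
    return [v - mean for v in x]


def one_pole_lowpass(x, alpha):
    """Causal first-order low-pass: y[n] = y[n-1] + alpha * (x[n] - y[n-1])."""
    y = []
    state = x[0] if x else 0.0  # start from the first sample to limit the start-up transient
    for v in x:
        state += alpha * (v - state)
        y.append(state)
    return y


def zero_phase_lowpass(x, f_lp, fs):
    """Step 1020 (sketch, assumed filter type): forward-backward one-pole low-pass."""
    rc = 1.0 / (2.0 * math.pi * f_lp)  # RC time constant for cutoff f_lp
    dt = 1.0 / fs
    alpha = dt / (rc + dt)
    forward = one_pole_lowpass(x, alpha)
    backward = one_pole_lowpass(forward[::-1], alpha)
    return backward[::-1]  # reverse again to restore the original time order
```

With `fs = 8000` and `f_lp = 500`, a 100 Hz component passes almost unchanged while a 3 kHz component is strongly attenuated, and a symmetric pulse keeps its peak position because the forward-backward pass adds no delay. Forward-backward filtering squares the magnitude response, so the effective cutoff lies somewhat below `f_lp`.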

The speech signal is processed in steps 1111, 1112 and 1050, and the results of this processing are boundaries. The boundaries can mark voiced/unvoiced speech, which is determined in step 1050, or the periods of the speech signal, which are determined in step 1111 for voiced speech and in step 1112 for unvoiced speech. Step 1120 therefore does not take any further signals; it only collects the processing results: step 1122 records the boundaries of the voiced/unvoiced speech segments, and step 1121 records the pitch-period boundaries (marks) of voiced and unvoiced speech determined in processing steps 1111 and 1112.

The pitch-period marking method according to embodiment II is illustrated by the block diagram in Figure 2. Step 2080 (filter coefficient generation) is replaced by step 2082, which holds a set of predefined band-pass filter coefficients. This set is fed as input to step 2081, the filter coefficient selection module. The output of step 2081 is the coefficient set of the selected filter, which is one of the two inputs to step 2070. This makes the proposed method more computationally efficient, since fewer calculations are needed to define the filter coefficients for the current speech segment.
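
Embodiment II can be sketched as a lookup table. The description does not say how the predefined band-pass filters are designed or indexed; the sketch below assumes second-order (biquad) band-pass sections from the RBJ Audio EQ Cookbook with 0 dB peak gain, precomputed on a 10 Hz grid from 60 Hz to 400 Hz, and selects the entry whose center frequency is nearest to `fs / lag`. All function names, the grid and the quality factor `q` are hypothetical.

```python
import math


def bandpass_biquad(fc, fs, q=2.0):
    """RBJ cookbook band-pass (0 dB peak gain), coefficients normalized so a0 = 1."""
    w0 = 2.0 * math.pi * fc / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)
    a = (1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0)
    return b, a


def build_filter_bank(fs, f_min=60.0, f_max=400.0, step=10.0, q=2.0):
    """Step 2082 (sketch): predefined coefficient sets on an assumed frequency grid."""
    count = int(round((f_max - f_min) / step)) + 1
    return {f_min + k * step: bandpass_biquad(f_min + k * step, fs, q) for k in range(count)}


def select_filter(bank, lag, fs, voiced):
    """Step 2081 (sketch): choose the coefficient set nearest to fs / lag."""
    if not voiced or lag is None or lag <= 0:
        return None  # unvoiced segment or no pitch index: nothing to select
    f0 = fs / lag
    fc = min(bank, key=lambda c: abs(c - f0))
    return fc, bank[fc]


def magnitude_response(b, a, f, fs):
    """|H(e^{jw})| of a biquad at frequency f, used here only to check the design."""
    w = 2.0 * math.pi * f / fs
    z1 = complex(math.cos(w), -math.sin(w))  # e^{-jw}
    z2 = z1 * z1
    return abs((b[0] + b[1] * z1 + b[2] * z2) / (a[0] + a[1] * z1 + a[2] * z2))
```

Because the table is built once, the per-segment work in this sketch reduces to one division and a nearest-neighbour lookup.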

Figure 3 shows the flow of the pitch-period marking method. The method is started by START. In step 1010, the DC component is removed from the input speech signal 1000. In step 1020, the speech signal resulting from step 1010 is filtered with a zero-phase low-pass filter with cutoff frequency F_lp, where F_lp is higher than the highest expected pitch of the input signal. In the presence of low-frequency noise, the input low-pass filter with cutoff frequency F_lp can be replaced by a zero-phase band-pass filter that removes the low-frequency noise from the signal, which increases the robustness of the proposed method. The low-pass filtered speech signal from step 1020 is processed by the correlation module, which computes the short-time autocorrelation in step 1031 and, in step 1032, finds the time indices of the peaks of the autocorrelation sequence. The short-time autocorrelation module determines the biased autocorrelation sequence over a sliding analysis window W of varying length along the entire input signal. For each time instant, the autocorrelation sequence in step 1031 is first computed with the initial long-time analysis window W_LT. The resulting autocorrelation sequence is normalized to 1. In step 1032, the normalized autocorrelation sequence from step 1031 is processed by the autocorrelation peak-search module, which looks for the first two harmonic peaks. These harmonic peaks are determined with respect to the predefined autocorrelation thresholds TRA1, TRA2 and TRA3, shown in Figures 7, 9 and 11; in the illustrative example, TRA1 = 0.3, TRA2 = 0.2 and TRA3 = 0.38. The values of TRA1, TRA2 and TRA3 were determined experimentally and may differ between acoustic environments. Depending on the predefined thresholds, three cases are distinguished.
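
Steps 1031 and 1032 can be sketched as follows. The biased autocorrelation divides every lag by the same N, so normalizing the sequence to 1 at lag 0 amounts to dividing by the signal energy. The description does not define how the first two harmonic peaks are located; this sketch assumes the first peak is the maximum over a lag range set by the expected pitch range, and the second is the maximum within ±`tol` of twice that lag. The function names and the parameters `lag_min`, `lag_max` and `tol` are hypothetical.

```python
import math


def biased_autocorr(x, max_lag):
    """Step 1031 (sketch): biased autocorrelation normalized so that r[0] = 1."""
    energy = sum(v * v for v in x)
    if energy == 0.0:
        return [0.0] * (max_lag + 1)
    n = len(x)
    # The biased estimator divides each lag by N; that factor cancels in the normalization.
    return [sum(x[i] * x[i + k] for i in range(n - k)) / energy for k in range(max_lag + 1)]


def harmonic_peaks(r, lag_min, lag_max, tol=0.1):
    """Step 1032 (sketch): lags and values of two harmonically related peaks.

    Returns (t1, p1, t2, p2); t2 is None if r is too short to hold the second peak.
    """
    t1 = max(range(lag_min, lag_max + 1), key=lambda k: r[k])
    lo = int(2 * t1 * (1 - tol))
    hi = min(len(r) - 1, int(math.ceil(2 * t1 * (1 + tol))))
    if lo > hi:
        return t1, r[t1], None, 0.0
    t2 = max(range(lo, hi + 1), key=lambda k: r[k])
    return t1, r[t1], t2, r[t2]
```

The biased estimate shrinks with lag by (N - k)/N, so the second peak is lower than the first even for a pure tone: for a 200 Hz sine sampled at 8 kHz over 400 samples, the peaks are 0.9 at lag 40 and 0.8 at lag 80.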
In the first case, both harmonic peaks of the autocorrelation sequence exceed the defined thresholds: the first peak is greater than TRA1 and the second peak is greater than TRA2 (Figure 7). The signal segment for which the autocorrelation sequence was calculated is then voiced (Figure 6). In the second case, only the first peak exceeds the corresponding thresholds TRA1 and TRA3 (Figure 9), while the second peak is below TRA2; the signal segment for which the short-time autocorrelation sequence was calculated is partially periodic (Figure 8). In the third case, neither of the first two harmonic peaks exceeds its corresponding threshold, TRA1 or TRA2 (Figure 11); the signal segment is mostly aperiodic (Figure 10). If both harmonic peaks exceed the defined thresholds, the index of the first peak of the autocorrelation sequence is taken, in step 1033, as the autocorrelation time index of the long-time window W_LT (Figure 3). If, in step 1034, a time index for window W_LT exists, a new analysis window length W_N (the short-time window) is determined to improve the time resolution of the method. In step 1035, the length of the short-time window W_N is set to a multiple of the autocorrelation time index of the long-time window W_LT. For the short-time window W_N determined in this way, the autocorrelation sequence is recomputed in step 1036, and its first two harmonic peaks are searched for in step 1037. If, in step 1038, both peaks exist and exceed thresholds TRA1 and TRA2, or if only the first peak exceeds threshold TRA3 while the second peak is below TRA2, the time index of the first peak is taken as the autocorrelation time index of the short-time window W_N. If neither peak exceeds the defined thresholds, no autocorrelation time index is defined for the short-time window W_N.
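
The three cases and the window logic of steps 1033 to 1038 reduce to a few comparisons. The thresholds below are the illustrative values given in the description. Peak combinations that the description does not list (for example, a first peak between TRA1 and TRA3 with a weak second peak) are treated as aperiodic here, which is an assumption. The multiple used for the short-time window in step 1035 is not specified, so `multiple=3` is hypothetical, as are the function names.

```python
TRA1, TRA2, TRA3 = 0.3, 0.2, 0.38  # illustrative thresholds from the description


def classify_peaks(p1, p2):
    """Return 'voiced', 'partial' or 'aperiodic' from the two harmonic peak values."""
    if p1 > TRA1 and p2 > TRA2:
        return "voiced"      # case 1 (Figures 6 and 7)
    if p1 > TRA1 and p1 > TRA3 and p2 < TRA2:
        return "partial"     # case 2 (Figures 8 and 9)
    return "aperiodic"       # case 3 (Figures 10 and 11) and, by assumption, unlisted combinations


def window_lag(t1, p1, p2, short_window):
    """Steps 1033 and 1038 (sketch): lag index of a window, or None if undefined.

    The long window W_LT accepts only case 1; the short window W_N accepts cases 1 and 2.
    """
    case = classify_peaks(p1, p2)
    if case == "voiced" or (short_window and case == "partial"):
        return t1
    return None


def short_window_length(long_lag, multiple=3):
    """Step 1035 (sketch): W_N as a multiple of the long-window lag index (multiple assumed)."""
    return multiple * long_lag
```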
If an autocorrelation time index exists for the short-time window W_N, it is compared in step 1039 with the autocorrelation time index of the long-time window W_LT. If, in step 1040, the difference between the two indices is smaller than threshold TRL1, the autocorrelation time index of the short-time analysis window W_N is taken, in step 1041, as the autocorrelation pitch time index of the current window. If the difference is larger, the autocorrelation time index of the long-time analysis window W_LT is taken, in step 1042, as the autocorrelation pitch time index of the current window. If no autocorrelation time index is defined for the short-time analysis window W_N of the current window, the autocorrelation time index of the long-time analysis window W_LT is taken as the autocorrelation pitch time index of the current window. If none of the peaks of the autocorrelation sequence of the long-time analysis window W_LT exceeds the defined thresholds, no autocorrelation pitch time index is defined for the current analysis window. From steps 1041 and 1042, as well as from step 1034 when no time index exists for window W_LT, the method continues with step 1045.
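
The decision in steps 1039 to 1042 can be written as one function. TRL1 is not given a value in the description; `trl1=3` samples below is a placeholder, `None` stands for an undefined index, and the function name is hypothetical.

```python
def select_pitch_lag(long_lag, short_lag, trl1=3):
    """Steps 1039-1042 (sketch): pitch lag index of the current window, or None."""
    if long_lag is None:
        return None          # no W_LT peak above the thresholds (step 1034): no index
    if short_lag is not None and abs(short_lag - long_lag) < trl1:
        return short_lag     # step 1041: indices agree, keep the finer W_N estimate
    return long_lag          # step 1042, or the W_N index is undefined
```

Applied frame by frame, `[select_pitch_lag(l, s) for l, s in [(40, 41), (40, 60), (None, None)]]` gives `[41, 40, None]`.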

Alongside the short-time autocorrelation, step 1045 computes the short-time average of the absolute instantaneous amplitude values of the low-pass filtered input signal. In step 1050, the voiced/unvoiced segment detection module determines the boundaries of voiced/unvoiced segments from the autocorrelation pitch time indices of step 1041 or step 1042, the short-time average absolute amplitude curve obtained in step 1045, and threshold TRE1. If autocorrelation pitch time indices exist for the current segment and the short-time average of the absolute instantaneous amplitude exceeds threshold TRE1 over the entire current segment, the current segment is marked as periodic (voiced). If no autocorrelation pitch time indices are defined for the current segment, or the short-time average of the absolute instantaneous amplitude is below threshold TRE1, or both, the current segment is marked as unvoiced. Figure 4a shows an example speech signal, Figure 4b the normalized curve of the autocorrelation pitch time index, and Figure 4c the normalized short-time average of the absolute instantaneous amplitude of the low-pass filtered speech signal together with threshold TRE1. The vertical lines in Figures 4a, 4b and 4c mark the boundaries of the voiced/unvoiced segments.
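
Steps 1045 and 1050 can be sketched with a centred moving average of |x| and a per-frame rule. The frame length (`hop`), the averaging window (`win`), normalization of the curve to its maximum (as in Figure 4c) and the value of `tre1` are assumptions, since the description gives no numbers for them; the function names are hypothetical.

```python
def short_time_mean_abs(x, win):
    """Step 1045 (sketch): centred moving average of |x|, normalized to a maximum of 1."""
    half = win // 2
    n = len(x)
    env = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        env.append(sum(abs(v) for v in x[lo:hi]) / (hi - lo))
    peak = max(env) if env else 0.0
    return [e / peak for e in env] if peak > 0 else env


def voiced_flags(frame_lags, envelope, hop, tre1):
    """Step 1050 (sketch): a frame is voiced if it has a pitch lag and the
    envelope exceeds tre1 over the whole frame; otherwise it is unvoiced."""
    flags = []
    for j, lag in enumerate(frame_lags):
        seg = envelope[j * hop:(j + 1) * hop]
        flags.append(lag is not None and len(seg) > 0 and min(seg) > tre1)
    return flags


def vuv_boundaries(flags, hop):
    """Sample indices where the voiced/unvoiced decision changes (labels for step 1122)."""
    return [j * hop for j in range(1, len(flags)) if flags[j] != flags[j - 1]]
```

Both conditions matter: a frame with a pitch lag but a near-zero envelope is still labelled unvoiced.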

In step 2080, the autocorrelation pitch time index is passed to the filter coefficient generation module, where it is used to define the center frequency F_cf of the band-pass filter, from which the filter coefficients are computed. A mapping function converts the value of the autocorrelation pitch time index into the center frequency F_cf of the band-pass filter. In step 2070, the adaptive zero-phase band-pass filter uses the computed coefficients to filter the low-pass filtered input speech signal so as to extract the pitch signal. Figure 5 shows an example of a voiced speech signal and the pitch signal extracted from it by the adaptive zero-phase band-pass filter. The next stage is pitch-period marking. In step 1111, the pitch-period marking module first marks the periods of the extracted pitch signal. This is done by determining the half-periods of the pitch signal and placing the pitch-period marks of the pitch signal at its zero-crossing points, as the first and last samples of a pitch-signal period that contains one positive and one negative peak. The pitch period delimited by two adjacent marks defines the span of that pitch-period mark. The pitch-period marks of the pitch signal are then mapped to pitch-period marks of the speech signal. The mapping can target three different points: a) the marks are placed at the negative peak of each speech-signal period (Figure 12); b) the marks are placed at the positive peak of each speech-signal period (Figure 13); c) the marks are placed at the zero-crossing points that form the first and last samples of the period containing the positive and negative peaks of the speech signal (Figure 14).
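
Steps 2080 and 2070 and the first part of step 1111 can be sketched together. The mapping function from lag index to F_cf is not given in the description; the simplest choice, F_cf = fs / lag, is assumed. The band-pass design (an RBJ cookbook biquad with `q=2`) is also an assumption, and forward-backward filtering is used to obtain zero phase. For brevity, the sketch filters one segment with a single lag, whereas an adaptive filter would switch coefficients as the lag changes. Period marks are taken at upward zero crossings, so each pair of adjacent marks encloses one positive and one negative peak of the pitch signal. All names are hypothetical.

```python
import math


def lag_to_center_frequency(lag, fs):
    """Step 2080 mapping (assumed): F_cf = fs / lag."""
    return fs / lag


def bandpass_coeffs(fc, fs, q=2.0):
    """RBJ cookbook band-pass (0 dB peak gain): b = (b0, b1, b2), a = (a1, a2) with a0 = 1."""
    w0 = 2.0 * math.pi * fc / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    return (alpha / a0, 0.0, -alpha / a0), (-2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0)


def biquad(x, b, a):
    """Direct form I biquad with zero initial state."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for v in x:
        out = b[0] * v + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        x2, x1 = x1, v
        y2, y1 = y1, out
        y.append(out)
    return y


def zero_phase_bandpass(x, lag, fs, q=2.0):
    """Step 2070 (sketch): forward-backward band-pass centred on fs / lag."""
    b, a = bandpass_coeffs(lag_to_center_frequency(lag, fs), fs, q)
    forward = biquad(x, b, a)
    return biquad(forward[::-1], b, a)[::-1]


def zero_crossing_marks(p):
    """Step 1111, first part (sketch): sample indices of upward zero crossings."""
    return [i for i in range(1, len(p)) if p[i - 1] < 0 <= p[i]]
```

For a 200 Hz voiced segment sampled at 8 kHz (lag 40), the marks away from the segment edges come out exactly 40 samples apart.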
In Figures 12, 13 and 14, arrows show the mapping of the pitch-period marks of the pitch signal to the pitch-period marks of the speech signal. When peaks are used as the pitch-period mark points, speech polarity detection in step 2100 (Figure 1) determines whether positive or negative peaks are to be marked. In step 1112, pitch-period marks for aperiodic segments are defined by rules. The rules may place the pitch-period marks of unvoiced speech segments at constant time intervals of a chosen length, at the average distance between the pitch-period marks of voiced segments, or at a distance derived from any other statistical characteristic of the distances between the pitch-period marks of voiced speech segments.
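
The mapping to speech-signal points (Figures 12 to 14), the polarity decision of step 2100 and one of the rules of step 1112 can be sketched as below. The description does not say how polarity is detected; comparing the summed positive and negative per-period extremes is a hypothetical heuristic. The search radius for variant c), the fallback period for unvoiced segments and all function names are likewise assumptions.

```python
def detect_polarity(speech, marks):
    """Step 2100 (hypothetical heuristic): +1 if positive peaks dominate, else -1."""
    pos = neg = 0.0
    for a, b in zip(marks, marks[1:]):
        seg = speech[a:b]
        pos += max(seg)
        neg += -min(seg)
    return 1 if pos >= neg else -1


def map_marks(speech, marks, mode, search=5):
    """Map pitch-signal period marks to speech-signal points.

    mode 'neg' -> negative peak per period (Figure 12),
         'pos' -> positive peak per period (Figure 13),
         'zc'  -> nearest upward zero crossing of the speech signal (Figure 14).
    """
    if mode in ("pos", "neg"):
        pick = max if mode == "pos" else min
        return [pick(range(a, b), key=lambda i: speech[i]) for a, b in zip(marks, marks[1:])]
    mapped = []
    for m in marks:
        lo, hi = max(1, m - search), min(len(speech), m + search + 1)
        cands = [i for i in range(lo, hi) if speech[i - 1] < 0 <= speech[i]]
        mapped.append(min(cands, key=lambda i: abs(i - m)) if cands else m)
    return mapped


def unvoiced_marks(start, end, voiced_marks, default_period):
    """Step 1112 (sketch): equally spaced marks at the mean voiced mark spacing."""
    gaps = [b - a for a, b in zip(voiced_marks, voiced_marks[1:])]
    period = max(1, round(sum(gaps) / len(gaps))) if gaps else default_period
    return list(range(start, end, period))
```

Variants a) and b) can then be chosen automatically with `mode = "pos" if detect_polarity(speech, marks) > 0 else "neg"`.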

Figure 5 shows an example of a voiced segment of a speech signal and the pitch signal extracted from the voiced speech with the adaptive zero-phase band-pass filter. Figure 6 shows an example segment of a voiced speech signal. Figure 7 shows an example of the biased short-time autocorrelation sequence of a voiced speech segment, with a first peak P1 above threshold TRA1 and a second peak P2 above threshold TRA2; the autocorrelation sequence is computed for the voiced segment of length W_LT of the speech signal shown in Figure 6.

Figure 8 shows an example of a transition from voiced to unvoiced speech. Figure 9 shows an example of a biased autocorrelation sequence with a first peak P1 above thresholds TRA1 and TRA3 and a second peak P2 below threshold TRA2, computed for the voiced-to-unvoiced transition segment of length W_LT shown in Figure 8.

Figure 10 shows an example of a voiced-to-unvoiced transition in which most of the signal is unvoiced. Figure 11 shows an example of a biased autocorrelation sequence with no peak values above TRA1 and TRA2, computed for the voiced-to-unvoiced transition of the speech segment of length W_LT shown in Figure 10.

Figure 12 shows an example of a speech signal with marked pitch periods (vertical dashed lines), where the negative peaks of the speech signal are chosen as the pitch-period mark points. Arrows show the mapping of the pitch-period marks of the pitch signal, which lie at the zero-crossing points of the pitch-signal periods, to pitch-period marks placed at the negative extremes of the speech signal.

Figure 13 shows an example of a speech signal with pitch-period marks (vertical dashed lines) placed at the positive peaks. Arrows show the mapping of the pitch-period marks of the pitch signal, which lie at the zero-crossing points of the pitch-signal periods, to pitch-period points placed at the positive peaks of the speech signal.

Figure 14 shows an example of a speech signal with pitch-period marks (vertical dashed lines), where the marks indicate the first and last samples of a pitch period containing one positive and one negative peak. Arrows show the mapping of the pitch-period marks of the pitch signal, which lie at the zero-crossing points of the pitch-signal periods, to pitch-period marks placed at the zero-crossing points of the speech signal.

The pitch-period marking device (Figure 15), implemented on the basis of the method according to the invention, consists of a DC-component removal unit 1010a, a zero-phase low-pass filter 1020a, a unit 1030a for autocorrelation computation and autocorrelation-peak search, a unit 1045a for computing the short-time average of the absolute instantaneous signal amplitude, a voiced/unvoiced speech detection unit 1050a, a voiced-speech memory 1060a, a band-pass filter coefficient generation unit 2080a, an adaptive zero-phase band-pass filter 2070a, a pitch-period marking unit 1110a with a voiced-speech marking module 1111a and an unvoiced-speech marking module 1112a, and a speech polarity detection unit 2100a.

The final result of the proposed method is the label set of step 1120 (Figure 1): the set of pitch-period marks for voiced/unvoiced speech from step 1121 and the set of voiced/unvoiced segment labels from step 1122. To reduce the computational complexity of the method, in embodiment II the filter coefficient generation module of step 2080 is replaced by the set of predefined band-pass filter coefficients of step 2082 (Figure 2) and the filter coefficient selection module of step 2081. The voiced/unvoiced detection data from step 1060 and the autocorrelation pitch time index from step 1032 serve as criteria for selecting the appropriate coefficient set from the predefined band-pass filter coefficients of step 2082. The proposed solution provides the means for robust and reliable determination of pitch-period marks with high time resolution. Because pitch varies over time, a time-varying filter is used to extract the pitch signal from the time-varying speech signal. Computing the coefficients of an adaptive filter requires complex iterative algorithms. The goal of filtering in the proposed invention is to extract the pitch signal from the speech signal using appropriately defined coefficients of the adaptive zero-phase band-pass filter with center frequency F_cf.

The present method, as well as the device based on it, is highly insensitive to noise and offers high time resolution, with no windowing or averaging effects when marking the pitch periods and the voiced/unvoiced segments of the speech signal.

The method according to the invention can be applied in speech signal processing to improve speech quality with pitch taken into account, in clinical diagnostics, speech coding, automatic phonetic segmentation, pitch-aware speech analysis and processing, speaker characterization, voice conversion, speech recognition and speech synthesis.

Claims (8)

Patent claims

1. A method for marking pitch periods and voiced/unvoiced segments according to the invention comprises the following steps:
- the DC component is first removed from the speech signal, followed by filtering with a zero-phase low-pass filter with cutoff frequency F_LP;
- the filtered output speech signal is used to compute the short-time average of the absolute instantaneous amplitude values of the signal and the short-time biased autocorrelation sequence, using a sliding short-time analysis window of variable length;
- the values of the first two harmonically related peaks (harmonic peaks) of the short-time autocorrelation sequence are found, and the pitch time index of the autocorrelation sequence is defined;
- based on the value of the time index of the autocorrelation sequence, the center frequency F_cf of the band-pass filter is defined and the filter coefficients are computed;
- the low-pass filtered speech segment for which the pitch time index of the autocorrelation sequence is defined is filtered with an adaptive zero-phase band-pass filter using the computed filter coefficients; the signal filtered in this way is the pitch signal of the speech signal;
- the short-time average of the absolute instantaneous amplitude values of the low-pass filtered speech signal is computed and used to determine the voiced segments;
- if the pitch time indices of the autocorrelation sequence are defined for the current speech segment and the short-time averages of the absolute instantaneous amplitude values of the signal are greater than the threshold TRE1, the current segment is marked as voiced; otherwise it is marked as unvoiced;
- for voiced speech, the pitch-period marks of the pitch signal are placed at zero-crossing points, where two adjacent marks represent the beginning and the end of a period of the pitch signal within which a positive and a negative peak of the signal occur;
- in the final phase, the pitch-period marks of the pitch signal are mapped to pitch-period marks of the speech signal;
- for unvoiced segments, the positions of the pitch marks are defined by rules for determining the boundaries within unvoiced segments, which place the boundaries at constant time intervals of a selected length, or at the average distance between the pitch-period marks of the voiced segments, or set the distance between period marks on the basis of any otherwise defined statistical characteristics of the distances between the pitch-period marks of the voiced speech segments.

2. The method according to claim 1, characterized in that: processing of the speech signal begins with step (1010), removal of the DC component; the DC-free signal is filtered in step (1020) with a zero-phase low-pass filter; autocorrelation processing is performed in step (1030), comprising step (1031), computation of the short-time autocorrelation sequence, and step (1032), determination of the time indices of the peaks of the autocorrelation sequence in order to determine the autocorrelation index of the pitch period of the speech signal; the signal produced in step (1020) is also fed directly to step (1060), the voiced-speech memory, and to step (1045), where the short-time average of the absolute instantaneous amplitude values of the signal is computed; voiced/unvoiced speech detection is performed in step (1050), which also receives the signal from step (1032); the voiced signal from step (1050) is fed to step (1060), the voiced-speech memory, and to step (2080); the signal from step (1032) is also fed to step (2080), where the band-pass filter coefficients are generated; the band-pass filter coefficients obtained in step (2080) are fed to the adaptive zero-phase band-pass filter with center frequency F_cf, i.e. to step (2070); step (2070) is followed by step (2090), generation of the pitch signal, which is fed to the pitch-period marking algorithm of step (1111), which also receives the signal after DC removal; the signals with the speech polarity detected in step (2100) are fed to step (1110), which, in addition to step (1111) for marking voiced speech segments, also includes step (1112), in which the periods of unvoiced speech segments are marked, taking into account the unvoiced-segment boundaries detected in step (1050) and the DC-free signal; the processing results of steps (1050) and (1110) are written to the sets of marks in step (1120); the processing result of step (1050) is written to the set of voiced/unvoiced segment marks in step (1122); and the processing results of steps (1111) and (1112) are written, as sets of pitch-period marks for voiced and unvoiced speech segments, in step (1121).

3. The method according to claim 2, characterized in that the step for generating the filter coefficients consists of step (2082), which comprises a set of predefined band-pass filter coefficients, the set being supplied as input to step (2081), which comprises the filter-coefficient selection module. The output of step (2081) is the selected set of filter coefficients for a band-pass filter with center frequency F_cf.

4. The method according to any one of the preceding claims 1 to 3, characterized in that it proceeds as follows: the START trigger is activated, whereupon in step (1010) the DC component is removed from the input speech signal (1000); in step (1020) the resulting speech signal from step (1010) is filtered with a zero-phase low-pass filter with cutoff frequency F_LP, the value of F_LP being higher than the highest expected pitch of the input signal; in the presence of low-frequency noise, the input low-pass filter with cutoff frequency F_LP may be replaced with a zero-phase band-pass filter that removes the low-frequency noise from the signal; the low-pass filtered output speech signal from step (1020) is processed by a correlation module, which computes the short-time autocorrelation in step (1031) and finds the time indices of the peaks of the autocorrelation sequence in step (1032); the short-time autocorrelation module determines the biased autocorrelation sequence for a sliding analysis window W of various lengths over the entire length of the input signal; for each time instant, the autocorrelation sequence of step (1031) is first computed with an initial long-time analysis window W_LT; the resulting autocorrelation sequence is normalized to 1; in step (1032) the normalized autocorrelation sequence from step (1031) is processed in the autocorrelation peak-search module, which searches for the first two harmonic peaks; the harmonic peaks are determined with respect to predefined thresholds of the average absolute amplitude; the thresholds are given by the values TRA1, TRA2 and TRA3; when both harmonic peaks exceed the defined thresholds, the index of the first peak of the autocorrelation sequence is defined in step (1033) as the autocorrelation time index of the long-time window W_LT; if, in step (1034), the time index of window W_LT exists, a new analysis-window length W_N (the short-time window) is determined in order to improve the time resolution of the method; the length of the short-time window W_N is determined in step (1035) as a multiple of the autocorrelation time index of the long-time window W_LT; for the short-time window W_N so determined, the autocorrelation sequence is recomputed in step (1036); in step (1037) the first two harmonic peaks are searched for in the computed autocorrelation sequence; if, in step (1038), both peaks exist and exceed the thresholds TRA1 and TRA2, or if only the first peak exceeds the threshold TRA3 while the second peak is smaller than TRA2, the time index of the first peak of the autocorrelation sequence is defined as the autocorrelation time index of the short-time window W_N; if neither peak exceeds the defined thresholds, the autocorrelation time index of the short-time window W_N is undefined; if the autocorrelation time index of the short-time window W_N exists, it is compared in step (1039) with the autocorrelation time index of the long-time window W_LT; if, in step (1040), the difference between the two time indices is smaller than the threshold TRL1, then in step (1041) the autocorrelation time index of the short-time analysis window W_N is taken as the autocorrelation pitch time index of the current window; if the difference is larger, then in step (1042) the autocorrelation time index of the long-time analysis window W_LT is taken as the autocorrelation pitch time index of the current window; if the autocorrelation time index of the short-time analysis window W_N is undefined for the current window, the autocorrelation time index of the long-time analysis window W_LT is taken as the autocorrelation pitch time index of the current window; if none of the peaks of the autocorrelation sequence of the long-time analysis window W_LT exceeds the defined thresholds, the autocorrelation pitch time index is undefined for the current analysis window; the signals from steps (1041) and (1042), like the signal from step (1034) when the time index of window W_LT does not exist, are the input to step (1045).

5. The method according to claim 4, characterized in that the thresholds have the values TRA1 = 0.3, TRA2 = 0.2 and TRA3 = 0.38.

6. The method according to claim 4, characterized in that the values of TRA1, TRA2 and TRA3 are determined experimentally and may differ for different acoustic environments.

7. A device for marking pitch periods and voiced/unvoiced speech segments, based on the method according to any one of the preceding claims, comprising:
- a DC-removal unit,
- a zero-phase low-pass filter,
- a unit for computing the autocorrelation sequence and searching for the maxima of the autocorrelation sequence,
- a unit for computing the short-time average of the absolute instantaneous amplitude values of the signal,
- a voiced/unvoiced signal-segment detection unit,
- a voiced-speech memory,
- a band-pass filter coefficient generation unit,
- an adaptive zero-phase band-pass filter,
- a speech-polarity detection unit, and
- a pitch-period marking unit.

8. Use of the results of the method according to any one of claims 1 to 6, characterized in that they are used to improve speech quality with pitch taken into account, in clinical diagnostics, speech coding, automatic phonetic segmentation, pitch-aware speech analysis and processing, speaker characterization, voice conversion, speech recognition and speech synthesis.
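The first two steps of claims 1 and 4 (steps 1010 and 1020) remove the DC component and apply a zero-phase low-pass filter with cutoff F_LP. The claims do not name a filter type, so the plain-Python sketch below assumes a first-order low-pass run forward and then backward over the signal (the forward-backward technique), which cancels the phase shift of the causal filter. The one-pole coefficient formula and the function names are assumptions for illustration.

```python
import math


def remove_dc(x):
    """Step 1010: subtract the mean (DC component) of the signal."""
    mean = sum(x) / len(x)
    return [v - mean for v in x]


def one_pole(x, a):
    """Causal first-order low-pass: y[n] = y[n-1] + a * (x[n] - y[n-1])."""
    y, prev = [], 0.0
    for v in x:
        prev += a * (v - prev)
        y.append(prev)
    return y


def zero_phase_lowpass(x, f_lp, fs):
    """Step 1020 (sketch): filter forward, then filter the reversed output and
    reverse again, so the phase shifts of the two passes cancel."""
    a = 1.0 - math.exp(-2.0 * math.pi * f_lp / fs)   # common one-pole approximation
    forward = one_pole(x, a)
    backward = one_pole(forward[::-1], a)
    return backward[::-1]
```

Because the magnitude response is applied twice, the -3 dB point of the combined filter lies somewhat below `f_lp`; per claim 4, F_LP only needs to exceed the highest expected pitch, so a margin should be chosen accordingly.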
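Claims 1 and 4 rely on a short-time biased autocorrelation normalized to 1 and on a search for its first two harmonically related peaks, gated by TRA1, TRA2 and TRA3. The sketch below uses the threshold values quoted in claim 5; everything else is an assumption not stated in the claims: the first peak is the first local maximum within a lag range [min_lag, max_lag], the second peak is the largest value within a hypothetical tolerance around twice that lag, TRA1 is compared with the first peak and TRA2 with the second, and the acceptance rule is the one claim 4 gives for the short-time window.

```python
TRA1, TRA2, TRA3 = 0.3, 0.2, 0.38   # threshold values from claim 5


def biased_autocorr(frame, max_lag):
    """Biased autocorrelation r[k] = (1/N) * sum x[n] x[n+k], normalized so
    r[0] == 1. Returns None for an empty or all-zero frame."""
    n = len(frame)
    if n == 0:
        return None
    r = [sum(frame[i] * frame[i + k] for i in range(n - k)) / n
         for k in range(min(max_lag, n - 1) + 1)]
    if r[0] <= 0.0:
        return None
    return [v / r[0] for v in r]


def first_local_max(r, lo, hi):
    """Lag of the first local maximum of r within [lo, hi], or None."""
    for k in range(max(lo, 1), min(hi, len(r) - 2) + 1):
        if r[k - 1] < r[k] >= r[k + 1]:
            return k
    return None


def pitch_lag(frame, min_lag, max_lag):
    """Autocorrelation pitch time index in samples, or None if no pitch."""
    tol = 2 + max_lag // 20                 # assumed tolerance around 2 * first lag
    r = biased_autocorr(frame, 2 * max_lag + tol)
    if r is None:
        return None
    l1 = first_local_max(r, min_lag, max_lag)
    if l1 is None:
        return None
    p1 = r[l1]
    near_2l1 = r[max(0, 2 * l1 - tol):2 * l1 + tol + 1]
    p2 = max(near_2l1) if near_2l1 else 0.0
    # both peaks strong enough, or a strong first peak with a weak second one
    if (p1 > TRA1 and p2 > TRA2) or (p1 > TRA3 and p2 < TRA2):
        return l1
    return None
```

For a 100 Hz sine sampled at 8 kHz (period 80 samples) in a 480-sample window, the normalized biased autocorrelation equals 400/480 at lag 80 and 320/480 at lag 160, so both peaks pass and `pitch_lag` returns 80. The bias term (N - k)/N is what makes the second harmonic peak lower than the first.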
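Steps 1033 to 1042 of claim 4 combine a long-time window W_LT with a short-time window W_N whose length is a multiple of the long-window index. The sketch below isolates that decision logic and takes the lag estimator as a parameter, so any autocorrelation peak picker can be plugged in. The multiple (3) and the value of TRL1 (5 samples) are hypothetical defaults, since the claims give no values, and the centred-window helper `frame_at` is an illustrative choice.

```python
from typing import Callable, Optional, Sequence

# Any estimator mapping a frame to a pitch lag in samples, or None if no pitch.
LagEstimator = Callable[[Sequence[float]], Optional[int]]


def frame_at(x: Sequence[float], centre: int, length: int) -> Sequence[float]:
    """Analysis window of `length` samples centred on `centre`, clipped to x."""
    start = max(0, centre - length // 2)
    return x[start:start + length]


def pitch_index_at(x: Sequence[float], centre: int, estimate_lag: LagEstimator,
                   w_lt: int, multiple: int = 3, trl1: int = 5) -> Optional[int]:
    """Long/short window decision paraphrased from claim 4 (steps 1033-1042)."""
    lag_lt = estimate_lag(frame_at(x, centre, w_lt))       # long window W_LT
    if lag_lt is None:                                      # step 1034: no index
        return None
    w_n = multiple * lag_lt                                 # step 1035: W_N length
    lag_n = estimate_lag(frame_at(x, centre, w_n))          # steps 1036-1038
    if lag_n is not None and abs(lag_n - lag_lt) < trl1:    # steps 1039-1041
        return lag_n
    return lag_lt                                           # step 1042, or W_N undefined
```

The short window only refines the estimate: whenever it disagrees with the long window by TRL1 or more, or finds no pitch at all, the long-window index is kept, which matches the fallback order described in claim 4.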
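Claim 1 marks a segment as voiced only when a pitch time index exists and the short-time average of the absolute amplitude exceeds TRE1; claim 2 then collects the results as voiced/unvoiced segment marks (step 1122). The sketch below assumes a centred analysis window and a fixed hop between frames; the value passed for TRE1 in the usage is hypothetical, as the text gives none.

```python
def short_time_mean_abs(x, centre, length):
    """Step 1045: mean of |x[n]| over `length` samples centred on `centre`."""
    start = max(0, centre - length // 2)
    seg = x[start:start + length]
    return sum(abs(v) for v in seg) / len(seg) if seg else 0.0


def is_voiced(x, centre, length, pitch_lag, tre1):
    """Step 1050: voiced only if a pitch index exists AND the mean |x| > TRE1."""
    return pitch_lag is not None and short_time_mean_abs(x, centre, length) > tre1


def voicing_segments(flags, hop):
    """Merge per-frame decisions into (start, end, voiced) sample ranges."""
    segments = []
    for i, v in enumerate(flags):
        if segments and segments[-1][2] == v:
            start, _, _ = segments[-1]
            segments[-1] = (start, (i + 1) * hop, v)   # extend the current run
        else:
            segments.append((i * hop, (i + 1) * hop, v))
    return segments
```

Requiring both conditions means that a clearly periodic but very quiet frame, or a loud frame without a pitch index, is still labelled unvoiced.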
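For voiced speech, claim 1 places pitch-period marks at zero crossings of the band-pass pitch signal, with each pair of adjacent marks enclosing one positive and one negative peak. The claims do not state which crossing direction is used, so the sketch below assumes upward (negative-to-non-negative) crossings; between two consecutive upward crossings the signal completes one cycle containing a positive and a negative half-wave. The function names are illustrative.

```python
def pitch_marks(pitch_signal, start=0, end=None):
    """Sample indices of upward zero crossings of the pitch signal in [start, end).
    A mark is the first non-negative sample after a negative one."""
    end = len(pitch_signal) if end is None else end
    return [i for i in range(max(start, 1), end)
            if pitch_signal[i - 1] < 0.0 <= pitch_signal[i]]


def periods(marks):
    """Pitch-period lengths in samples between consecutive marks."""
    return [b - a for a, b in zip(marks, marks[1:])]
```

Restricting `start` and `end` to a voiced segment keeps marks out of unvoiced regions, which claim 1 handles with separate rules.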
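For unvoiced segments, claim 1 lists alternative rules for placing marks, two of which are a constant interval of chosen length and the average distance between the pitch-period marks of voiced segments. The sketch below implements only those two alternatives; the function name and the preference for the voiced-mark average when at least two voiced marks are available are assumptions.

```python
def unvoiced_marks(start, end, voiced_marks=None, fixed_period=None):
    """Place marks in an unvoiced segment [start, end), spaced by the mean
    distance between voiced marks, or by a fixed interval if none are given."""
    if voiced_marks is not None and len(voiced_marks) >= 2:
        # mean of consecutive differences telescopes to (last - first) / (count - 1)
        period = (voiced_marks[-1] - voiced_marks[0]) / (len(voiced_marks) - 1)
    elif fixed_period is not None:
        period = float(fixed_period)
    else:
        raise ValueError("need voiced marks or a fixed period")
    step = max(1, round(period))
    return list(range(start, end, step))
```

Tying the unvoiced spacing to the voiced average keeps the mark density continuous across voiced/unvoiced boundaries, which is convenient for pitch-synchronous processing.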
SI201600184A 2016-08-02 2016-08-02 The process and the device for marking the period of speech pitch and audio/non-audio segments SI25265A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
SI201600184A SI25265A (en) 2016-08-02 2016-08-02 The process and the device for marking the period of speech pitch and audio/non-audio segments
PCT/SI2017/000007 WO2018026329A1 (en) 2016-08-02 2017-04-25 Pitch period and voiced/unvoiced speech marking method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
SI201600184A SI25265A (en) 2016-08-02 2016-08-02 The process and the device for marking the period of speech pitch and audio/non-audio segments

Publications (1)

Publication Number Publication Date
SI25265A (en) 2018-02-28

Family

ID=59067869

Family Applications (1)

Application Number Title Priority Date Filing Date
SI201600184A SI25265A (en) 2016-08-02 2016-08-02 The process and the device for marking the period of speech pitch and audio/non-audio segments

Country Status (2)

Country Link
SI (1) SI25265A (en)
WO (1) WO2018026329A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11443761B2 (en) 2018-09-01 2022-09-13 Indian Institute Of Technology Bombay Real-time pitch tracking by detection of glottal excitation epochs in speech signal using Hilbert envelope
CN116432007B (en) * 2023-06-13 2023-08-22 天津精仪精测科技有限公司 Optical fiber early warning mode identification method based on airspace characteristics and machine learning

Family Cites Families (7)

Publication number Priority date Publication date Assignee Title
JPS6119718A (en) 1984-07-06 1986-01-28 Nippon Steel Corp Operating method of exhaust gas disposing system of closed-type converter in abnormal time
US6490562B1 (en) 1997-04-09 2002-12-03 Matsushita Electric Industrial Co., Ltd. Method and system for analyzing voices
US6470311B1 (en) 1999-10-15 2002-10-22 Fonix Corporation Method and apparatus for determining pitch synchronous frames
TWI225637B (en) 2003-06-09 2004-12-21 Ali Corp Method for calculation a pitch period estimation of speech signals with variable step size
US8396704B2 (en) * 2007-10-24 2013-03-12 Red Shift Company, Llc Producing time uniform feature vectors
US8214201B2 (en) 2008-11-19 2012-07-03 Cambridge Silicon Radio Limited Pitch range refinement
US8280725B2 (en) 2009-05-28 2012-10-02 Cambridge Silicon Radio Limited Pitch or periodicity estimation

Also Published As

Publication number Publication date
WO2018026329A1 (en) 2018-02-08

Similar Documents

Publication Publication Date Title
Drugman et al. Glottal closure and opening instant detection from speech signals
KR100930584B1 (en) Speech discrimination method and apparatus using voiced sound features of human speech
JP5229234B2 (en) Non-speech segment detection method and non-speech segment detection apparatus
Evangelopoulos et al. Multiband modulation energy tracking for noisy speech detection
CN105118502A (en) End point detection method and system of voice identification system
Greenwood et al. SUVing: automatic silence/unvoiced/voiced classification of speech
Khoa Noise robust voice activity detection
SI25265A (en) The process and the device for marking the period of speech pitch and audio/non-audio segments
US6865529B2 (en) Method of estimating the pitch of a speech signal using an average distance between peaks, use of the method, and a device adapted therefor
US10522160B2 (en) Methods and apparatus to identify a source of speech captured at a wearable electronic device
CN104732984B (en) A kind of method and system of quick detection single-frequency prompt tone
US6954726B2 (en) Method and device for estimating the pitch of a speech signal using a binary signal
JPS60200300A (en) Voice head/end detector
US11443761B2 (en) Real-time pitch tracking by detection of glottal excitation epochs in speech signal using Hilbert envelope
Bouzid et al. Voice source parameter measurement based on multi-scale analysis of electroglottographic signal
Vieira et al. Robust F/sub 0/and jitter estimation in pathological voices
CN1971707B (en) Method and apparatus for estimating fundamental tone period and adjudging unvoiced/voiced classification
Vijayan et al. Epoch extraction from allpass residual of speech signals
Lin et al. A Novel Normalization Method for Autocorrelation Function for Pitch Detection and for Speech Activity Detection.
Sudhakar et al. Automatic speech segmentation to improve speech synthesis performance
Govind et al. Epoch extraction in high pass filtered speech using hilbert envelope
Bőhm et al. Automatic classification of regular vs. irregular phonation types
WO2001013360A1 (en) Pitch and voicing estimation for low bit rate speech coders
Jijomon et al. An offline signal processing technique for accurate localisation of stop release bursts in vowel-consonant-vowel utterances
Rachel et al. Estimation of glottal closure instants from telephone speech using a group delay-based approach that considers speech signal as a spectrum

Legal Events

Date Code Title Description
OO00 Grant of patent

Effective date: 20180301