WO2013014876A1 - Fragment processing device, fragment processing method, and fragment processing program - Google Patents


Info

Publication number
WO2013014876A1
Authority
WO
WIPO (PCT)
Prior art keywords
time
orthogonal transform
orthogonal
coefficient
corrected
Prior art date
Application number
PCT/JP2012/004540
Other languages
French (fr)
Japanese (ja)
Inventor
正徳 加藤
玲史 近藤
康行 三井
Original Assignee
日本電気株式会社 (NEC Corporation)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 日本電気株式会社 (NEC Corporation)
Publication of WO2013014876A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/06 Elementary speech units used in speech synthesisers; Concatenation rules
    • G10L13/07 Concatenation rules
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/003 Changing voice quality, e.g. pitch or formants
    • G10L21/007 Changing voice quality, e.g. pitch or formants characterised by the process used

Definitions

  • The present invention relates to a segment processing apparatus, a segment processing method, and a segment processing program that process speech segments so that consecutive segments connect smoothly.
  • As a speech synthesis method, a method is known that generates synthesized speech by concatenating segments, which are speech waveforms cut out from recorded speech.
  • In this method, for example, segments corresponding to the reading of the input text are selected and concatenated.
  • Speech synthesis that employs this method is called segment selection (unit selection) speech synthesis.
  • The segments are generated in advance from recorded speech.
  • Each segment is cut out, for example, per half-syllable.
  • For a single sound (here, a sound of about half a syllable), multiple types of segments are generated from various recordings.
  • A segment is sometimes also referred to as a speech segment.
  • Patent Document 1 describes a technique that calculates the frequency characteristics of the junction between concatenated segments, calculates their spectral envelope, and corrects the frequency characteristics of the junction using that spectral envelope.
  • Correcting the frequency characteristics with the spectral envelope can improve the perceived continuity of sound at the segment junction.
  • With that technique, however, the spectrum may be greatly deformed locally.
  • An object of the present invention is to provide a segment processing apparatus, a segment processing method, and a segment processing program that can improve the perceived continuity of sound at segment junctions while reducing local deformation of the spectrum.
  • The segment processing apparatus according to the present invention includes: spectral envelope extraction means that extracts, from two consecutive segments, the spectral envelope in a period including the boundary time between the two segments; orthogonal transform means that applies an orthogonal transform to the spectral envelope; smoothing means that smooths the orthogonal transform coefficients, which are the expansion coefficients obtained by applying the orthogonal transform to the spectral envelope; inverse transform means that applies the inverse of the orthogonal transform to the corrected orthogonal transform coefficients; and time domain waveform generation means that generates a time domain waveform from the spectral envelope obtained by that inverse transform.
  • The segment processing method according to the present invention extracts, from two consecutive segments, the spectral envelope in a period including the boundary time between the two segments; applies an orthogonal transform to the spectral envelope; smooths the orthogonal transform coefficients, which are the expansion coefficients obtained by the orthogonal transform of the spectral envelope; applies the inverse of the orthogonal transform to the corrected orthogonal transform coefficients; and generates a time domain waveform from the spectral envelope obtained by that inverse transform.
  • The segment processing program according to the present invention causes a computer to execute: spectral envelope extraction processing that extracts, from two consecutive segments, the spectral envelope in a period including the boundary time between the two segments; orthogonal transform processing that applies an orthogonal transform to the spectral envelope; smoothing processing that smooths the orthogonal transform coefficients, which are the expansion coefficients obtained by the orthogonal transform of the spectral envelope; inverse transform processing that applies the inverse of the orthogonal transform to the corrected orthogonal transform coefficients; and time domain waveform generation processing that generates a time domain waveform from the spectral envelope obtained by that inverse transform.
  • FIG. 1 is a block diagram showing an example of an embodiment of the segment processing apparatus of the present invention.
  • The segment processing apparatus of this embodiment includes a correction range determination unit 1, a spectrum envelope extraction unit 2, an orthogonal transform unit 3, a smoothing processing unit 4, an inverse transform unit 5, a pitch waveform generation unit 6, and a superposition addition unit 7.
  • The individual segments selected according to the reading of the text to be synthesized are input to the correction range determination unit 1.
  • A segment is expressed as a change in amplitude over a certain time.
  • That is, a segment is represented by individual times and the amplitude at each time. Plotting time on the horizontal axis and amplitude on the vertical axis, the segment can be represented as a waveform.
  • To improve the perceived continuity of sound at the segment junction, the correction range determination unit 1 determines, for two consecutive segments, the range (period) over which the orthogonal transform coefficients are corrected.
  • the range to be corrected is referred to as a correction target period.
  • the orthogonal transform coefficient will be described later.
  • FIG. 2 is an explanatory diagram illustrating an example of the correction target period of the orthogonal transform coefficients for two consecutive segments. Of the two consecutive segments, the earlier one is called the preceding segment and the later one the subsequent segment.
  • the correction range determination unit 1 may determine a period that satisfies the following conditions as the correction target period.
  • The first condition is that the period includes the segment boundary time (hereinafter denoted t_bnd).
  • The second condition is that the period includes neither any time before the center time t_c1 of the preceding segment nor any time after the center time t_c2 of the subsequent segment. Accordingly, the correction target period is longest when it runs from the center time t_c1 of the preceding segment to the center time t_c2 of the subsequent segment (see FIG. 2). However, a period shorter than t_c1 to t_c2 may also be determined as the correction target period.
  • The correction range determination unit 1 may determine a period of fixed length as the correction target period. Alternatively, it may determine a period whose length is a predetermined ratio of the segment length (for example, 30% of the total length of the preceding and subsequent segments). In either case, the two conditions above must be satisfied.
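The two conditions and the ratio-based choice can be illustrated with a small helper. This is a sketch under assumptions not stated in the text: the segments are contiguous (the preceding segment ends exactly where the subsequent one starts, at t_bnd), the period is centered on t_bnd before clipping, and the names `Segment` and `correction_target_period` are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Segment:
    start: float  # start time of the segment (s)
    end: float    # end time of the segment (s)

    @property
    def center(self) -> float:
        return 0.5 * (self.start + self.end)


def correction_target_period(prev: Segment, nxt: Segment, ratio: float = 0.3) -> Tuple[float, float]:
    """Return (t_str, t_fin) for a preceding segment `prev` and subsequent segment `nxt`.

    Assumptions (hypothetical): prev.end == nxt.start == t_bnd, and the period is
    centred on t_bnd with length ratio * (len(prev) + len(nxt)) before clipping.
    """
    t_bnd = prev.end
    half = 0.5 * ratio * ((prev.end - prev.start) + (nxt.end - nxt.start))
    # Second condition: nothing before t_c1 and nothing after t_c2.
    t_str = max(t_bnd - half, prev.center)
    t_fin = min(t_bnd + half, nxt.center)
    # First condition: the period must contain the boundary time.
    if not t_str <= t_bnd <= t_fin:
        raise ValueError("correction target period does not contain t_bnd")
    return t_str, t_fin
```

For a 100 ms preceding segment followed by a 200 ms subsequent segment, a 30% ratio gives a 90 ms window around the boundary; with `ratio=1.0` the window is clipped to the longest allowed period, t_c1 to t_c2.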
  • The spectrum envelope extraction unit 2 extracts the spectral envelope of the part of the preceding segment that overlaps the correction target period. Similarly, the spectrum envelope extraction unit 2 extracts the spectral envelope of the part of the subsequent segment that overlaps the correction target period.
  • The method by which the spectrum envelope extraction unit 2 extracts the spectral envelope from a segment is not particularly limited.
  • For example, the spectrum envelope extraction unit 2 may extract the spectral envelope from a segment by STRAIGHT analysis.
  • STRAIGHT analysis is an example of a method for extracting a spectral envelope.
  • the spectral envelope may be extracted from the segment by performing, for example, linear prediction analysis, LSP (Line Spectral Pair) analysis, or cepstrum analysis.
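Of the methods listed, cepstrum analysis is the easiest to show compactly. The sketch below computes a log-magnitude spectral envelope for one analysis frame by keeping only the low-quefrency part of the real cepstrum (liftering). The parameter `n_lifter` and the function names are hypothetical, and a naive O(N²) DFT is used only to keep the example self-contained; a real system would use an FFT library.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) DFT; adequate for a short illustrative frame."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * f * k / n) for k in range(n)) for f in range(n)]


def idft(spec):
    """Naive inverse DFT matching dft() above."""
    n = len(spec)
    return [sum(spec[f] * cmath.exp(2j * math.pi * f * k / n) for f in range(n)) / n for k in range(n)]


def cepstral_envelope(frame, n_lifter=8, floor=1e-10):
    """Log-magnitude spectral envelope of one frame via real-cepstrum liftering.

    n_lifter (hypothetical) is how many low-quefrency coefficients are kept on
    each side of quefrency 0; smaller values give a smoother envelope.
    """
    n = len(frame)
    log_mag = [math.log(max(abs(s), floor)) for s in dft(frame)]
    cep = [c.real for c in idft(log_mag)]  # real cepstrum (symmetric for a real frame)
    # Keep quefrencies 0..n_lifter and their mirror images; zero the rest.
    lifted = [c if (q <= n_lifter or q >= n - n_lifter) else 0.0 for q, c in enumerate(cep)]
    return [v.real for v in dft(lifted)]  # smoothed log-magnitude spectrum per bin
```

Keeping every quefrency reproduces the original log-magnitude spectrum exactly, so `n_lifter` directly trades detail (pitch harmonics) for smoothness (envelope).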
  • FIG. 3 is an explanatory diagram schematically showing an example of the spectrum envelope extracted by the spectrum envelope extraction unit 2.
  • the horizontal axis shown in FIG. 3 represents time, and the vertical axis represents spectral density.
  • The start time and end time of the correction target period are denoted t_str and t_fin, respectively.
  • The spectral density is obtained at discrete times.
  • Here, as an example, the spectral density is calculated synchronously with the pitch period of the segment. A time at which the spectral density is calculated is called a pitch-synchronous position time and is denoted t_p.
  • The first t_p (time at which the spectral density is calculated) in the spectral envelope extracted from the preceding segment is denoted t_beg1, and the last t_p in that envelope is denoted t_end1. Likewise, the first t_p in the spectral envelope extracted from the subsequent segment is denoted t_beg2, and the last t_p in that envelope is denoted t_end2. These times satisfy the following relationship:
    t_str ≤ t_beg1 ≤ t_end1 ≤ t_bnd ≤ t_beg2 ≤ t_end2 ≤ t_fin
  • In FIG. 3, the axis in the depth direction is the frequency axis; that is, the spectral envelope illustrated in FIG. 3 is obtained for each subband.
  • the orthogonal transform unit 3 performs orthogonal transform on the spectrum envelope extracted by the spectrum envelope extraction unit 2.
  • An orthogonal transform obtains expansion coefficients with respect to a set of orthogonal functions.
  • The orthogonal functions here are orthonormal.
  • The expansion coefficients obtained by the orthogonal transform are called orthogonal transform coefficients.
  • The orthogonal transform of the spectral envelope yields a vector whose components are the orthogonal transform coefficients (hereinafter, an orthogonal transform coefficient vector).
  • An orthogonal transform coefficient vector is obtained for each pitch-synchronous position time t_p.
  • Examples of the orthogonal transform include the discrete Fourier transform (DFT), the fast Fourier transform (FFT), the discrete cosine transform (DCT), and the wavelet transform.
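As an illustration of one listed transform, the DCT in its orthonormal DCT-II form is sketched below together with its inverse (the orthonormal DCT-III), which is what the inverse transform unit 5 would apply if the DCT were chosen. Each input is assumed to be one spectral envelope (for example, log-amplitude values over frequency) at a single pitch-synchronous position time t_p; the function names and the choice of normalization are assumptions.

```python
import math


def _scale(k, n_len):
    # Orthonormal scaling: sqrt(1/N) for k = 0, sqrt(2/N) otherwise.
    return math.sqrt(1.0 / n_len) if k == 0 else math.sqrt(2.0 / n_len)


def dct_ortho(x):
    """Orthonormal DCT-II: X[k] = s(k) * sum_n x[n] * cos(pi * (n + 0.5) * k / N)."""
    n_len = len(x)
    return [
        _scale(k, n_len) * sum(x[n] * math.cos(math.pi * (n + 0.5) * k / n_len) for n in range(n_len))
        for k in range(n_len)
    ]


def idct_ortho(c):
    """Inverse of dct_ortho (orthonormal DCT-III)."""
    n_len = len(c)
    return [
        sum(_scale(k, n_len) * c[k] * math.cos(math.pi * (n + 0.5) * k / n_len) for k in range(n_len))
        for n in range(n_len)
    ]
```

Applying `dct_ortho` to the envelope at each t_p gives one orthogonal transform coefficient vector per pitch-synchronous time. Because the basis is orthonormal, energy is preserved and `idct_ortho` recovers the envelope exactly.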
  • FIG. 4 is an explanatory diagram schematically showing the result of orthogonal transformation with respect to the spectrum envelope.
  • the horizontal axis shown in FIG. 4 represents time, and the vertical axis represents the orthogonal transform coefficient.
  • the orthogonal transform coefficient vector has n dimensions, and the results illustrated in FIG. 4 are obtained for each dimension. That is, in the example shown in FIG. 4, the axis in the depth direction (in other words, the axis perpendicular to the paper surface) is the axis of the dimension.
  • A gap arises at the junction (near time t_bnd) between the orthogonal transform result for the spectral envelope extracted from the preceding segment and the orthogonal transform result for the spectral envelope extracted from the subsequent segment.
  • The smoothing processing unit 4 corrects the orthogonal transform coefficients obtained by the orthogonal transform so that the change from the coefficients on the preceding segment side to the coefficients on the subsequent segment side becomes smooth.
  • This smoothing eliminates the gap between the orthogonal transform coefficients near the time t_bnd.
  • the smoothing processing unit 4 performs this process for each dimension.
  • FIG. 5 is a block diagram illustrating an example of the smoothing processing unit 4.
  • the smoothing processing unit 4 includes a reference value calculation unit 41 and a correction unit 42.
  • FIG. 6 is an explanatory diagram for illustrating processing of the reference value calculation unit 41.
  • The orthogonal transform coefficient before correction is denoted X(ω).
  • The orthogonal transform coefficient after correction is denoted Y(ω).
  • Here, ω is a frequency bin.
  • The time is indicated by a subscript.
  • For example, the pre-correction orthogonal transform coefficient at time t_end1 is denoted X_tend1(ω).
  • Likewise, the pre-correction orthogonal transform coefficient at time t_beg2 is denoted X_tbeg2(ω).
  • The corrected orthogonal transform coefficient Y(ω) is likewise written with the corresponding time as a subscript. The pre-correction and corrected orthogonal transform coefficients at an arbitrary pitch-synchronous position time t_p are denoted X_tp(ω) and Y_tp(ω), respectively.
  • The reference value calculation unit 41 calculates a reference value used when smoothing the change of each orthogonal transform coefficient over the correction target period t_str to t_fin.
  • Specifically, the corrected orthogonal transform coefficient Y_tbnd(ω) at the boundary time t_bnd is assumed, and Y_tbnd(ω) is used as the reference value.
  • If the orthogonal transform coefficient X_tend1(ω) is below the reference value, each orthogonal transform coefficient on the preceding segment side is increased; if X_tend1(ω) exceeds the reference value, each orthogonal transform coefficient on the preceding segment side is decreased.
  • Likewise, if the orthogonal transform coefficient X_tbeg2(ω) is below the reference value, each orthogonal transform coefficient on the subsequent segment side is increased; if X_tbeg2(ω) exceeds the reference value, each orthogonal transform coefficient on the subsequent segment side is decreased.
  • The reference value calculation unit 41 calculates the reference value Y_tbnd(ω) from the orthogonal transform coefficient X_tend1(ω) at the pitch-synchronous position time t_end1 at the end of the preceding segment side and the orthogonal transform coefficient X_tbeg2(ω) at the pitch-synchronous position time t_beg2 at the start of the subsequent segment side.
  • For example, the reference value calculation unit 41 sets the average of X_tend1(ω) and X_tbeg2(ω) as Y_tbnd(ω), so that Y_tbnd(ω) lies between X_tend1(ω) and X_tbeg2(ω).
  • Based on the reference value Y_tbnd(ω), the correction unit 42 corrects the orthogonal transform coefficients at each pitch-synchronous position time t_p in the correction target period t_str to t_fin so that the closer t_p is to the boundary time t_bnd, the larger the change from the pre-correction coefficient, and the farther t_p is from t_bnd, the smaller that change.
  • FIG. 7 is an explanatory diagram schematically showing the orthogonal transform coefficients corrected by the correction unit 42. As illustrated in FIG. 7, the correction by the correction unit 42 eliminates the gap near the time t_bnd and smooths the change of the orthogonal transform coefficients over time.
  • the inverse transform unit 5 performs the inverse transform of the orthogonal transform performed by the orthogonal transform unit 3 on the orthogonal transform coefficient corrected by the correction unit 42.
  • the inverse transform performed by the inverse transform unit 5 may be determined in advance according to the type of orthogonal transform performed by the orthogonal transform unit 3. For example, in the configuration in which the orthogonal transform unit 3 performs discrete Fourier transform, the inverse transform unit 5 performs inverse discrete Fourier transform (Inverse DFT). In the configuration in which the orthogonal transform unit 3 performs the fast Fourier transform, the inverse transform unit 5 performs the inverse fast Fourier transform (Inverse FFT). In the configuration in which the orthogonal transform unit 3 performs discrete cosine transform, the inverse transform unit 5 performs inverse discrete cosine transform (Inverse DCT). In the configuration in which the orthogonal transform unit 3 performs wavelet transform, the inverse transform unit 5 performs inverse wavelet transform.
  • the corrected spectral envelope is obtained by the inverse transformation by the inverse transformation unit 5.
  • the pitch waveform generation unit 6 generates a pitch waveform from the corrected spectral envelope (result of inverse transformation by the inverse transformation unit 5).
  • The superposition addition unit 7 generates a synthesized speech waveform by superimposing and adding the pitch waveforms generated by the pitch waveform generation unit 6.
  • A pitch pattern is input to the superposition addition unit 7.
  • The pitch pattern is a time series of pitch frequencies. It suffices to input to the superposition addition unit 7, in sequence, the pitch frequency corresponding to each pitch waveform generated by the pitch waveform generation unit 6.
  • The superposition addition unit 7 determines the spacing at which the pitch waveforms are placed according to the input pitch frequency, and superimposes and adds the pitch waveforms.
  • The correction range determination unit 1, spectrum envelope extraction unit 2, orthogonal transform unit 3, smoothing processing unit 4 (reference value calculation unit 41 and correction unit 42), inverse transform unit 5, pitch waveform generation unit 6, and superposition addition unit 7 are implemented, for example, by the CPU of a computer operating according to a segment processing program.
  • For example, a program storage device (not shown) of the computer may store the segment processing program, and the CPU may read the program and, according to it, operate as the correction range determination unit 1, the spectrum envelope extraction unit 2, the orthogonal transform unit 3, the smoothing processing unit 4 (the reference value calculation unit 41 and the correction unit 42), the inverse transform unit 5, the pitch waveform generation unit 6, and the superposition addition unit 7.
  • Alternatively, the correction range determination unit 1, the spectrum envelope extraction unit 2, the orthogonal transform unit 3, the reference value calculation unit 41, the correction unit 42, the inverse transform unit 5, the pitch waveform generation unit 6, and the superposition addition unit 7 may each be implemented by separate units.
  • FIG. 8 is a flowchart showing an example of the processing flow of the segment processing apparatus of the present invention.
  • the individual segments selected in accordance with the reading of the text to be subjected to speech synthesis are sequentially input to the correction range determination unit 1 according to the reading order.
  • The correction range determination unit 1 determines the correction target period of the orthogonal transform coefficients for two consecutive segments (step S1). As already described, the correction range determination unit 1 can determine the correction target period so that it includes the segment boundary time t_bnd (first condition) and includes neither the period before the center time t_c1 of the preceding segment nor the period after the center time t_c2 of the subsequent segment (second condition; see FIG. 2).
  • Next, the spectrum envelope extraction unit 2 extracts the spectral envelope of the part of the preceding segment that overlaps the correction target period and, likewise, the spectral envelope of the part of the subsequent segment that overlaps the correction target period (step S2).
  • The spectrum envelope extraction unit 2 may extract the spectral envelope from a segment by, for example, STRAIGHT analysis.
  • the spectral envelope may be extracted from the segment by other methods such as linear prediction analysis, LSP analysis, and cepstrum analysis.
  • the spectrum envelope illustrated in FIG. 3 is obtained for each subband.
  • the orthogonal transform unit 3 performs orthogonal transform on the spectrum envelope extracted in step S2 (step S3).
  • an orthogonal transform coefficient vector is obtained for each pitch synchronization position time.
  • the relationship between the time and the orthogonal transformation coefficient as illustrated in FIG. 4 is obtained for each dimension.
  • the axis in the depth direction is the axis of the dimension.
  • the orthogonal transform unit 3 may perform, for example, a fast Fourier transform as the orthogonal transform.
  • Although the fast Fourier transform is given here as an example of the orthogonal transform, the orthogonal transform unit 3 may perform another orthogonal transform such as the discrete Fourier transform, the discrete cosine transform, or the wavelet transform.
  • Next, the reference value calculation unit 41 calculates the reference value Y_tbnd(ω) based on the pre-correction orthogonal transform coefficient X_tend1(ω) at the last pitch-synchronous position time t_end1 on the preceding segment side and the pre-correction orthogonal transform coefficient X_tbeg2(ω) at the first pitch-synchronous position time t_beg2 on the subsequent segment side (step S4).
  • For example, the reference value calculation unit 41 uses the average of X_tend1(ω) and X_tbeg2(ω) as the reference value. That is, in this example, the reference value calculation unit 41 may calculate the reference value by the following Equation (1):
    Y_tbnd(ω) = (X_tend1(ω) + X_tbeg2(ω)) / 2   (1)
  • Equation (1) is only one example of how to calculate the reference value Y_tbnd(ω). Y_tbnd(ω) may be calculated by another method, provided that X_tend1(ω) ≤ Y_tbnd(ω) ≤ X_tbeg2(ω) or X_tbeg2(ω) ≤ Y_tbnd(ω) ≤ X_tend1(ω) is satisfied.
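The reference value of step S4 and the condition it must satisfy can be written as two short functions over one coefficient vector per side. The averaging function follows the average described above; `weighted_reference` is a hypothetical illustration of "another method" that still satisfies the between-the-endpoints condition for any weight in [0, 1]. All names are assumptions.

```python
def reference_value(x_end1, x_beg2):
    """Average of X_tend1 and X_tbeg2, computed independently for each dimension."""
    if len(x_end1) != len(x_beg2):
        raise ValueError("coefficient vectors must have the same dimension")
    return [(a + b) / 2.0 for a, b in zip(x_end1, x_beg2)]


def weighted_reference(x_end1, x_beg2, w):
    """Hypothetical alternative: a convex combination, valid for 0 <= w <= 1."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    return [(1.0 - w) * a + w * b for a, b in zip(x_end1, x_beg2)]


def is_valid_reference(x_end1, x_beg2, y_bnd):
    """True if Y_tbnd lies between X_tend1 and X_tbeg2 in every dimension."""
    return all(min(a, b) <= y <= max(a, b) for a, b, y in zip(x_end1, x_beg2, y_bnd))
```

Any convex combination of the two boundary-side coefficients passes the check, which is why the average is a natural default.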
  • Next, the correction unit 42 corrects the orthogonal transform coefficients at the pitch-synchronous position times t_p in the correction target period (step S5). By this correction, the correction unit 42 smooths the change from the orthogonal transform coefficients on the preceding segment side to those on the subsequent segment side.
  • Specifically, using the reference value Y_tbnd(ω), the correction unit 42 corrects the orthogonal transform coefficient at each pitch-synchronous position time t_p in the preceding segment part of the correction target period (the period before the boundary time t_bnd).
  • Likewise, using the reference value Y_tbnd(ω), the correction unit 42 corrects the orthogonal transform coefficient at each pitch-synchronous position time t_p in the subsequent segment part of the correction target period (the period after the boundary time t_bnd).
  • FIG. 9 is an explanatory diagram schematically illustrating the correction in the period before the boundary time t_bnd in the correction target period.
  • FIG. 9 illustrates the case where the reference value Y_tbnd(ω) is larger than X_tend1(ω), so that the orthogonal transform coefficients in the period before the boundary time t_bnd are increased.
  • Equation (2) states, treating t_end1 ≈ t_bnd, that the ratio of the time from t_str to an arbitrary pitch-synchronous position time t_p to the time from t_str to t_bnd equals the ratio of the correction amount Y_tp(ω) − X_tp(ω) at t_p to Y_tbnd(ω) − X_tend1(ω):
    (t_p − t_str) / (t_bnd − t_str) = (Y_tp(ω) − X_tp(ω)) / (Y_tbnd(ω) − X_tend1(ω))   (2)
  • Solving Equation (2) for Y_tp(ω) gives Equation (3):
    Y_tp(ω) = X_tp(ω) + ((t_p − t_str) / (t_bnd − t_str)) · (Y_tbnd(ω) − X_tend1(ω))   (3)
    The correction unit 42 calculates the corrected orthogonal transform coefficient Y_tp(ω) at each pitch-synchronous position time t_p in the period before the boundary time t_bnd by evaluating Equation (3).
  • As a result, the closer t_p is to the boundary time t_bnd, the larger the change from the pre-correction orthogonal transform coefficient, and the farther t_p is from t_bnd, the smaller that change.
  • FIG. 10 is an explanatory diagram schematically illustrating the correction in the period after the boundary time t_bnd in the correction target period.
  • FIG. 10 illustrates the case where the reference value Y_tbnd(ω) is larger than X_tbeg2(ω), so that the orthogonal transform coefficients in the period after the boundary time t_bnd are increased.
  • The left-hand side of Equation (4) is the ratio of the time from an arbitrary pitch-synchronous position time t_p to the end time t_fin of the correction target period to the time from the segment boundary time t_bnd to t_fin (see FIG. 10).
  • The right-hand side of Equation (4) is the ratio of the correction amount Y_tp(ω) − X_tp(ω) at t_p to Y_tbnd(ω) − X_tbeg2(ω). That is, Equation (4) states, treating t_beg2 ≈ t_bnd, that these two ratios are equal:
    (t_fin − t_p) / (t_fin − t_bnd) = (Y_tp(ω) − X_tp(ω)) / (Y_tbnd(ω) − X_tbeg2(ω))   (4)
  • Solving Equation (4) for Y_tp(ω) gives Equation (5):
    Y_tp(ω) = X_tp(ω) + ((t_fin − t_p) / (t_fin − t_bnd)) · (Y_tbnd(ω) − X_tbeg2(ω))   (5)
    The correction unit 42 calculates the corrected orthogonal transform coefficient Y_tp(ω) at each pitch-synchronous position time t_p in the period after the boundary time t_bnd by evaluating Equation (5).
  • As a result, the closer t_p is to the boundary time t_bnd, the larger the change from the pre-correction orthogonal transform coefficient, and the farther t_p is from t_bnd, the smaller that change.
  • the reference value calculation unit 41 and the correction unit 42 perform the processes of steps S4 and S5 for each dimension of the orthogonal transform coefficient.
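Steps S4 and S5, run independently for each dimension, can be sketched as follows. The sketch reads the correction as linear in time: on the preceding side Y_tp = X_tp + ((t_p − t_str)/(t_bnd − t_str))·(Y_tbnd − X_tend1), and on the subsequent side Y_tp = X_tp + ((t_fin − t_p)/(t_fin − t_bnd))·(Y_tbnd − X_tbeg2), with Y_tbnd the average of X_tend1 and X_tbeg2. That linear reading, the input layout (one coefficient vector per pitch-synchronous time, preceding and subsequent sides passed separately), and the function names are assumptions; t_str < t_bnd < t_fin is required to avoid division by zero.

```python
def smooth_boundary(prev_times, prev_coeffs, next_times, next_coeffs, t_str, t_bnd, t_fin):
    """Correct one dimension. prev_* cover t_beg1..t_end1, next_* cover t_beg2..t_end2."""
    x_end1 = prev_coeffs[-1]          # X_tend1: last coefficient on the preceding side
    x_beg2 = next_coeffs[0]           # X_tbeg2: first coefficient on the subsequent side
    y_bnd = 0.5 * (x_end1 + x_beg2)   # reference value Y_tbnd (average)
    # Preceding side: weight rises linearly from 0 at t_str to 1 at t_bnd.
    prev_out = [x + (t - t_str) / (t_bnd - t_str) * (y_bnd - x_end1)
                for t, x in zip(prev_times, prev_coeffs)]
    # Subsequent side: weight falls linearly from 1 at t_bnd to 0 at t_fin.
    next_out = [x + (t_fin - t) / (t_fin - t_bnd) * (y_bnd - x_beg2)
                for t, x in zip(next_times, next_coeffs)]
    return prev_out, next_out


def smooth_vectors(prev_times, prev_vecs, next_times, next_vecs, t_str, t_bnd, t_fin):
    """Apply smooth_boundary to every dimension of the coefficient vectors."""
    prev_cols, next_cols = [], []
    for d in range(len(prev_vecs[0])):
        p, n = smooth_boundary(prev_times, [v[d] for v in prev_vecs],
                               next_times, [v[d] for v in next_vecs], t_str, t_bnd, t_fin)
        prev_cols.append(p)
        next_cols.append(n)
    # Transpose back to one corrected vector per pitch-synchronous time.
    return [list(r) for r in zip(*prev_cols)], [list(r) for r in zip(*next_cols)]
```

With a jump from 0 to 2 at the boundary, the corrected sequence becomes 0, 0.5, 1 on the preceding side and 1, 1.5, 2 on the subsequent side, so the gap disappears while the coefficients at t_str and t_fin are left unchanged.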
  • After step S5, the inverse transform unit 5 applies the inverse of the orthogonal transform performed in step S3 to the corrected orthogonal transform coefficients obtained in step S5 (step S6). If the fast Fourier transform was performed in step S3, the inverse transform unit 5 may perform the inverse fast Fourier transform. Step S6 yields a spectral envelope, which can be regarded as the corrected spectral envelope.
  • the pitch waveform generation unit 6 converts the corrected spectral envelope (the spectral envelope obtained as a result of step S6) into a time domain waveform (that is, a pitch waveform) using inverse Fourier transform (step S7). In this conversion, an appropriate phase spectrum is used.
  • the pitch waveform generation unit 6 may use, for example, a zero phase or a fixed phase as the phase spectrum. Further, if the phase before extraction of the spectral envelope in step S2 can be used, that phase may be used.
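The zero-phase option can be sketched as follows: the one-sided amplitude envelope is mirrored into a real, even spectrum (so every bin has phase 0), inverse-transformed, and rotated so the pulse peak sits in the middle of the buffer. Assumptions: the envelope is on a linear amplitude scale for bins 0 to N/2 with N even, the centering rotation is a common convention rather than something stated in the text, and the function name is hypothetical.

```python
import cmath
import math


def pitch_waveform_zero_phase(amp_half):
    """Zero-phase pitch waveform from a one-sided amplitude envelope (bins 0..N/2)."""
    n = 2 * (len(amp_half) - 1)
    # Zero phase: the full spectrum is real and even, A[k] = A[n - k].
    full = list(amp_half) + [amp_half[n - k] for k in range(len(amp_half), n)]
    # Inverse DFT; the result is real because the spectrum is real and even.
    wave = [
        sum(full[k] * cmath.exp(2j * math.pi * k * m / n) for k in range(n)).real / n
        for m in range(n)
    ]
    # Rotate by n/2 so the peak at index 0 moves to the centre (index n/2).
    half = n // 2
    return wave[half:] + wave[:half]
```

A flat envelope yields a single centered impulse; any other envelope yields a waveform that is symmetric about its center, which is the defining property of a zero-phase pulse.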
  • The time domain waveform (pitch waveform) is represented by individual times and the amplitude at each time.
  • the superposition adding unit 7 generates a synthesized speech waveform by superposing and adding the pitch waveforms generated by the pitch waveform generating unit 6 (step S8).
  • the superposition adding unit 7 may determine the pitch waveform arrangement interval in accordance with the input pitch frequency and superimpose and add the pitch waveforms.
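Step S8 can be sketched as a plain overlap-add in which each pitch waveform is placed one pitch period after the previous one. Assumptions not in the source: each waveform is centered in its buffer (as a zero-phase generator would produce), the spacing after waveform i is round(sample_rate / f0_i) samples, and the function name is hypothetical.

```python
from typing import List, Sequence


def overlap_add(pitch_waveforms: Sequence[Sequence[float]],
                pitch_freqs_hz: Sequence[float],
                sample_rate: int) -> List[float]:
    """Superimpose and add centred pitch waveforms spaced one pitch period apart."""
    if len(pitch_waveforms) != len(pitch_freqs_hz):
        raise ValueError("need one pitch frequency per pitch waveform")
    # Centre of each waveform on the output time axis, in samples.
    centres, pos = [], 0
    for f0 in pitch_freqs_hz:
        centres.append(pos)
        pos += int(round(sample_rate / f0))
    # Offset so that no waveform starts before sample 0.
    offset = max(len(w) // 2 for w in pitch_waveforms)
    starts = [c + offset - len(w) // 2 for c, w in zip(centres, pitch_waveforms)]
    out = [0.0] * max(s + len(w) for s, w in zip(starts, pitch_waveforms))
    for s, w in zip(starts, pitch_waveforms):
        for i, v in enumerate(w):
            out[s + i] += v  # superposition (overlap-add)
    return out
```

A higher pitch frequency shortens the spacing, so neighboring pitch waveforms overlap more; this is how the input pitch pattern controls the pitch of the synthesized speech.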
  • In this way, the orthogonal transform coefficients obtained by orthogonally transforming the spectral envelope are smoothed, which eliminates the gap in the orthogonal transform coefficients near the time t_bnd. The inverse of the orthogonal transform is then applied to the smoothed (that is, corrected) orthogonal transform coefficients to generate a spectral envelope, and a pitch waveform is generated from that spectral envelope. Because the correction adjusts the overall shape of the spectrum, large local deformation of the spectrum can be reduced. In addition, because the pitch waveforms obtained in this way are superimposed and added, the perceived continuity of sound at segment junctions can be improved.
  • the spectral envelope is extracted, and the orthogonal transformation result (orthogonal transformation coefficient) for the spectral envelope is smoothed. Therefore, the influence of pitch can be eliminated during smoothing.
  • In the embodiment above, the smoothing processing unit 4 includes the reference value calculation unit 41 and the correction unit 42; the reference value calculation unit 41 calculates the reference value, and the correction unit 42 corrects the orthogonal transform coefficients using that reference value.
  • Alternatively, the orthogonal transform coefficients may be corrected by another method.
  • FIG. 11 is a block diagram showing a modification of the embodiment of the present invention. The elements other than the smoothing processing unit 14 are the same as those described in the above embodiment and are denoted by the same reference numerals as in FIG. 1. In addition, the smoothing processing unit 14 is implemented …
  • The pre-correction orthogonal transform coefficient at time t is denoted X_t(ω).
  • The corrected orthogonal transform coefficient at time t is denoted Y_t(ω).
  • The time t is expressed as follows.
  • Here, ω is a frequency bin, and ω ∈ {0, 1, 2, ..., FFT_LEN − 1}.
  • steps S1 to S3 and steps S6 to S8 are the same as those in the above embodiment.
  • the smoothing processing unit 14 may calculate the corrected orthogonal transform coefficient $Y_t(\omega)$ as follows. That is, the smoothing processing unit 14 calculates $Y_t(\omega)$ by evaluating Expression (7) below for all $\omega$ and $t$.
  • $X_{t_{str} + k t_s}(\omega)$ is the orthogonal transform coefficient before correction at time $t_{str} + k t_s$.
  • the smoothed orthogonal transform coefficient $Y_t(\omega)$ can be obtained by evaluating Expression (7).
  • that is, each orthogonal transform coefficient is corrected by a moving average.
  • alternatively, the smoothing processing unit 14 may calculate the corrected orthogonal transform coefficient $Y_t(\omega)$ by evaluating Equation (8) below for all $\omega$ and $t$.
  • $Y_{t-t_s}(\omega)$ is the orthogonal transform coefficient after correction at time $t - t_s$.
  • $\alpha$ in Equation (8) is a time constant.
  • $Y_{t_{str}-t_s}(\omega) = 0$. That is, the corrected orthogonal transform coefficient at time $t_{str} - t_s$ is assumed to be 0.
  • Equation (8), $Y_t(\omega) = \alpha X_t(\omega) + (1 - \alpha)\, Y_{t-t_s}(\omega)$, obtains the corrected coefficient $Y_t(\omega)$ at time $t$ as the sum of the pre-correction coefficient $X_t(\omega)$ at time $t$ multiplied by the time constant and the corrected coefficient $Y_{t-t_s}(\omega)$ one fixed period earlier multiplied by 1 minus the time constant.
  • the smoothed orthogonal transform coefficient $Y_t(\omega)$ can be obtained by evaluating Equation (8). This method can be described as smoothing by first-order leaky integration.
  • the smoothing processing unit 14 may also calculate the corrected orthogonal transform coefficient $Y_t(\omega)$ as follows.
  • $\alpha_1$ in Expression (9) is a time constant. Expression (10) below is also assumed to hold; the subscript on its left-hand side refers to time $t_{str} - t_s$.
  • Expression (9) gives the first provisional corrected orthogonal transform coefficient.
  • that is, Expression (9) obtains the first provisional corrected coefficient at time $t$ as the sum of the pre-correction coefficient $X_t(\omega)$ at time $t$ multiplied by the first time constant $\alpha_1$ and the first provisional corrected coefficient one fixed period earlier multiplied by $1 - \alpha_1$.
  • next, Expression (11) below is calculated in reverse time order.
  • (b), written as a subscript in Expression (11), is a label.
  • $\alpha_2$ in Expression (11) is a time constant. Expression (12) below is also assumed to hold; the subscript on its left-hand side refers to time $t_{str} + N t_s$.
  • Expression (11) gives the second provisional corrected orthogonal transform coefficient.
  • that is, Expression (11) obtains the second provisional corrected coefficient at time $t$ as the sum of the pre-correction coefficient $X_t(\omega)$ at time $t$ multiplied by the second time constant $\alpha_2$ and the second provisional corrected coefficient one fixed period later multiplied by $1 - \alpha_2$.
  • the smoothing processing unit 14 then calculates the corrected orthogonal transform coefficient $Y_t(\omega)$ for all $t$ as the average of the result of Expression (9) and the result of Expression (11). That is, the smoothing processing unit 14 calculates $Y_t(\omega)$ by evaluating Expression (13) below for all $t$.
  • after calculating $Y_t(\omega)$ for all $t$, the smoothing processing unit 14 selects the next $\omega$ and repeats the same processing. It ends the process when no unselected $\omega$ remains.
  • the smoothed orthogonal transform coefficient $Y_t(\omega)$ can also be obtained by the above processing.
  • this method can likewise be described as smoothing by first-order leaky integration.
  • FIG. 12 is a block diagram showing an example of the minimum configuration of the fragment processing apparatus of the present invention.
  • the segment processing apparatus of the present invention includes a spectrum envelope extracting unit 71, an orthogonal transform unit 72, a smoothing unit 73, an inverse transform unit 74, and a time domain waveform generating unit 75.
  • the spectrum envelope extracting means 71 (for example, the spectrum envelope extracting unit 2) extracts, from two consecutive segments, a spectrum envelope in a period (for example, the correction target period) that includes the boundary time between the two segments.
  • Orthogonal transformation means 72 (for example, orthogonal transformation unit 3) performs orthogonal transformation on the spectrum envelope.
  • the smoothing means 73 (for example, the smoothing processing unit 4) smooths the orthogonal transform coefficients, which are the expansion coefficients obtained by performing the orthogonal transform on the spectrum envelope.
  • the inverse transform means 74 (for example, the inverse transform unit 5) performs the inverse of the orthogonal transform on the corrected orthogonal transform coefficients.
  • the time domain waveform generation means 75 (for example, the pitch waveform generation unit 6) generates a time domain waveform from the spectrum envelope obtained by performing the inverse of the orthogonal transform on the corrected orthogonal transform coefficients.
  • with such a configuration, the outline of the spectrum can be aligned, so large local deformations of the spectrum can be reduced. Moreover, the audible continuity at the junction between segments can be improved.
  • spectral envelope extraction means for extracting, from two consecutive segments, a spectral envelope in a period including the boundary time between the two segments; orthogonal transform means for performing an orthogonal transform on the spectral envelope; smoothing means for smoothing the orthogonal transform coefficients, which are the expansion coefficients obtained by performing the orthogonal transform on the spectral envelope; inverse transform means for performing the inverse of the orthogonal transform on the corrected orthogonal transform coefficients; and time domain waveform generation means for generating a time domain waveform from the spectral envelope obtained by performing the inverse of the orthogonal transform on the corrected orthogonal transform coefficients.
  • the smoothing means includes: reference determining means for determining, as the orthogonal transform coefficient at the boundary time, a value between the orthogonal transform coefficient at the time closest to the boundary time of the two segments among the coefficients on the preceding-segment side of two consecutive segments and the orthogonal transform coefficient at the time closest to the boundary time among the coefficients on the subsequent-segment side; and correction means for correcting the orthogonal transform coefficients on the preceding-segment side and on the subsequent-segment side based on the orthogonal transform coefficient at the boundary time.
  • the segment processing apparatus according to Appendix 1, wherein the smoothing means corrects the orthogonal transform coefficients on the preceding-segment side and on the subsequent-segment side by calculating, for each time, the sum of the result of multiplying the pre-correction orthogonal transform coefficient by a constant and the result of multiplying the corrected orthogonal transform coefficient at a time one fixed period earlier by the value obtained by subtracting the constant from 1.
  • the smoothing means calculates the first provisional corrected orthogonal transform coefficient at each time in time order, by calculating for each time the sum of the result of multiplying the pre-correction orthogonal transform coefficient by a first constant and the result of multiplying the first provisional corrected coefficient at a time one fixed period earlier by the value obtained by subtracting the first constant from 1; calculates the second provisional corrected orthogonal transform coefficient at each time in reverse time order, by calculating for each time the sum of the result of multiplying the pre-correction coefficient by a second constant and the result of multiplying the second provisional corrected coefficient at a time one fixed period later by the value obtained by subtracting the second constant from 1; and calculates, for each time, the average of the first and second provisional corrected coefficients as the corrected orthogonal transform coefficient.
  • a segment processing method comprising: extracting, from two consecutive segments, a spectrum envelope in a period including the boundary time between the two segments; performing an orthogonal transform on the spectrum envelope; smoothing the orthogonal transform coefficients, which are the expansion coefficients obtained by performing the orthogonal transform on the spectrum envelope; performing the inverse of the orthogonal transform on the corrected orthogonal transform coefficients; and generating a time domain waveform from the spectrum envelope obtained by performing the inverse of the orthogonal transform on the corrected orthogonal transform coefficients.
  • a segment processing program for causing a computer to execute: spectral envelope extraction processing for extracting, from two consecutive segments, a spectral envelope in a period including the boundary time between the two segments; orthogonal transform processing for performing an orthogonal transform on the spectral envelope; smoothing processing for smoothing the orthogonal transform coefficients, which are the expansion coefficients obtained by performing the orthogonal transform on the spectral envelope; inverse transform processing for performing the inverse of the orthogonal transform on the corrected orthogonal transform coefficients; and time domain waveform generation processing for generating a time domain waveform from the spectral envelope obtained by performing the inverse of the orthogonal transform on the corrected orthogonal transform coefficients.
  • a spectrum envelope extraction unit that extracts, from two consecutive segments, a spectrum envelope in a period including the boundary time between the two segments; an orthogonal transform unit that performs an orthogonal transform on the spectrum envelope; a smoothing unit that smooths the orthogonal transform coefficients, which are the expansion coefficients obtained by performing the orthogonal transform on the spectrum envelope; an inverse transform unit that performs the inverse of the orthogonal transform on the corrected orthogonal transform coefficients; and a time domain waveform generation unit that generates a time domain waveform from the spectrum envelope obtained by performing the inverse of the orthogonal transform on the corrected orthogonal transform coefficients.
  • the segment processing apparatus according to Appendix 9, wherein the smoothing unit includes: a reference determining unit that determines, as the orthogonal transform coefficient at the boundary time, a value between the orthogonal transform coefficient at the time closest to the boundary time of the two segments among the coefficients on the preceding-segment side of two consecutive segments and the orthogonal transform coefficient at the time closest to the boundary time among the coefficients on the subsequent-segment side; and a correction unit that corrects the orthogonal transform coefficients on the preceding-segment side and on the subsequent-segment side based on the orthogonal transform coefficient at the boundary time.
  • the segment processing apparatus according to Appendix 10, wherein the correction unit corrects the orthogonal transform coefficients on the preceding-segment side and on the subsequent-segment side so that the amount of change from the pre-correction coefficient is larger at times closer to the boundary time between the two segments and smaller at times farther from it.
  • the smoothing unit calculates, for each time, the sum of the result of multiplying the pre-correction orthogonal transform coefficient by a constant and the result of multiplying the corrected orthogonal transform coefficient at a time one fixed period earlier by the value obtained by subtracting the constant from 1.
  • the segment processing apparatus according to Appendix 9, wherein the smoothing unit calculates the first provisional corrected orthogonal transform coefficient at each time in time order, by calculating for each time the sum of the result of multiplying the pre-correction orthogonal transform coefficient by a first constant and the result of multiplying the first provisional corrected coefficient at a time one fixed period earlier by the value obtained by subtracting the first constant from 1; calculates the second provisional corrected orthogonal transform coefficient at each time in reverse time order, by calculating for each time the sum of the result of multiplying the pre-correction coefficient by a second constant and the result of multiplying the second provisional corrected coefficient at a time one fixed period later by the value obtained by subtracting the second constant from 1; and calculates, for each time, the average of the first and second provisional corrected coefficients as the corrected orthogonal transform coefficient.
  • the present invention is suitably applied to a segment processing apparatus that processes segments so that they connect well in segment selection type speech synthesis.
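The smoothing variants listed above (Expression (7), Equation (8), and Expressions (9) to (13)) can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the published formulas: `X[k, w]` is taken to hold the pre-correction coefficient at time $t_{str} + k t_s$ for bin `w`; the moving-average `half_width` is a hypothetical parameter because Expression (7) is not reproduced here; and the backward pass is assumed to start from 0, by analogy with the stated $Y_{t_{str}-t_s}(\omega) = 0$, since Expressions (10) and (12) are not reproduced either. All function names are hypothetical.

```python
import numpy as np


def _float_like(X):
    """Return X as an array with a floating (or complex) dtype."""
    X = np.asarray(X)
    return X.astype(np.result_type(X.dtype, np.float64))


def moving_average(X, half_width=2):
    """Moving-average smoothing (cf. Expression (7)) along time, axis 0.

    half_width is hypothetical; windows are truncated at both edges.
    """
    X = _float_like(X)
    K = X.shape[0]
    Y = np.empty_like(X)
    for k in range(K):
        lo, hi = max(0, k - half_width), min(K, k + half_width + 1)
        Y[k] = X[lo:hi].mean(axis=0)
    return Y


def leaky_forward(X, alpha):
    """Equation (8): Y_t = alpha*X_t + (1 - alpha)*Y_{t - t_s}, Y_{t_str - t_s} = 0."""
    X = _float_like(X)
    Y = np.empty_like(X)
    prev = np.zeros_like(X[0])  # corrected value one step before t_str
    for k in range(X.shape[0]):
        prev = alpha * X[k] + (1.0 - alpha) * prev
        Y[k] = prev
    return Y


def leaky_backward(X, alpha):
    """Reverse-time recursion (cf. Expression (11)):
    Y_t = alpha*X_t + (1 - alpha)*Y_{t + t_s}, assumed to start from 0."""
    return leaky_forward(np.asarray(X)[::-1], alpha)[::-1]


def forward_backward(X, alpha1, alpha2):
    """Average of the forward and backward passes (cf. Expressions (9), (11), (13))."""
    return 0.5 * (leaky_forward(X, alpha1) + leaky_backward(X, alpha2))
```

A single forward leaky integrator lags behind the data; averaging it with a reverse-time pass that leads the data gives a response that is roughly symmetric about each time, which is one plausible reason for the two-pass variant.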

Abstract

A fragment processing device is provided which can reduce significant local spectral deformation while improving the continuity of audible sound in a continuous portion between fragments. A spectral envelope extraction means (71) extracts the spectral envelope from two continuous fragments in the period including the boundary time of the two fragments. An orthogonal transformation means (72) performs an orthogonal transformation on the spectral envelope. A smoothing means (73) smoothes the orthogonal transformation coefficient, which is the expansion coefficient resulting from the orthogonal transformation of the spectral envelope. An inverse transform means (74) performs an inverse transform of the orthogonal transformation on the corrected orthogonal transformation coefficient. A time-domain waveform generating means (75) generates a time-domain waveform from the spectral envelope.
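The five-stage chain in the abstract can be sketched end to end as follows. This is a minimal illustration, not the patented implementation. It assumes magnitude envelopes on an rfft grid (one row per frame) and uses the FFT as the orthogonal transform (the description itself picks the FFT as its example). It substitutes a hypothetical 3-point moving average for the smoothing step, builds each time-domain pitch waveform as a zero-phase pulse, and overlap-adds the pulses at a constant spacing of `fs / f0` samples. The function names and parameters are hypothetical.

```python
import numpy as np


def smooth_over_time(coeffs, half_width=1):
    """Centred moving average of every coefficient along time (axis 0).

    half_width is a hypothetical parameter; windows are truncated at the edges.
    """
    num_frames = coeffs.shape[0]
    out = np.empty_like(coeffs)
    for k in range(num_frames):
        lo, hi = max(0, k - half_width), min(num_frames, k + half_width + 1)
        out[k] = coeffs[lo:hi].mean(axis=0)
    return out


def process_junction(envelopes, f0, fs, n_fft):
    """envelopes: (num_frames, n_fft // 2 + 1) magnitude envelopes, one per frame."""
    # Orthogonal transform (FFT) of each frame's envelope along the frequency axis.
    coeffs = np.fft.fft(envelopes, axis=1)
    # Smooth each coefficient dimension across time.
    coeffs_s = smooth_over_time(coeffs)
    # Inverse transform back to envelopes; the imaginary part is only round-off.
    env_s = np.fft.ifft(coeffs_s, axis=1).real
    env_s = np.maximum(env_s, 0.0)  # assumption: keep magnitudes non-negative
    # Spacing between pitch waveforms, in samples, from the pitch frequency.
    period = int(round(fs / f0))
    num_frames = env_s.shape[0]
    out = np.zeros(period * (num_frames - 1) + n_fft)
    for k in range(num_frames):
        # Zero-phase pitch waveform centred in an n_fft-sample buffer.
        pulse = np.fft.fftshift(np.fft.irfft(env_s[k], n=n_fft))
        out[k * period : k * period + n_fft] += pulse  # overlap-add
    return out
```

For example, four flat envelopes with `n_fft=16`, `fs=16000`, and `f0=1000` produce four unit impulses spaced 16 samples apart.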

Description

Segment processing apparatus, segment processing method, and segment processing program
The present invention relates to a segment processing apparatus, a segment processing method, and a segment processing program that process speech segments so that they connect well.
One known speech synthesis method generates synthesized speech by concatenating segments, which are speech waveforms cut out from recorded speech. In this method, for example, the segments corresponding to the reading of the input text are selected and concatenated. Speech synthesis that uses this method is called segment selection (unit selection) speech synthesis. The segments are generated in advance from the recorded speech, each cut out in units of, for example, about a half syllable. In general, several candidate segments are generated from various recordings for each sound (here, a sound about a half syllable long).
A segment is sometimes also called a speech segment.
Patent Document 1 describes a technique that calculates the frequency characteristic of the junction between segments to be connected, calculates its spectral envelope, and corrects the frequency characteristic of the junction using that spectral envelope.
Patent Document 1: JP 2009-237015 A
In segment selection speech synthesis, a listener may perceive the synthesized speech as discontinuous. This is because a spectral gap arises between the connected segments. When segments are selected, the combination that makes such gaps as small as possible is chosen and those segments are concatenated. However, it has been difficult to eliminate the spectral gap.
With the technique described in Patent Document 1, correcting the frequency characteristic with the spectral envelope improves the audible continuity at the junction between segments. However, the spectrum is sometimes deformed greatly in a local region.
Accordingly, an object of the present invention is to provide a segment processing apparatus, a segment processing method, and a segment processing program that can improve the audible continuity of sound at the junction between segments while reducing large local deformations of the spectrum.
A segment processing apparatus according to the present invention includes: spectral envelope extraction means for extracting, from two consecutive segments, a spectral envelope in a period including the boundary time between the two segments; orthogonal transform means for performing an orthogonal transform on the spectral envelope; smoothing means for smoothing the orthogonal transform coefficients, which are the expansion coefficients obtained by performing the orthogonal transform on the spectral envelope; inverse transform means for performing the inverse of the orthogonal transform on the corrected orthogonal transform coefficients; and time domain waveform generation means for generating a time domain waveform from the spectral envelope obtained by performing the inverse of the orthogonal transform on the corrected orthogonal transform coefficients.
A segment processing method according to the present invention extracts, from two consecutive segments, a spectral envelope in a period including the boundary time between the two segments; performs an orthogonal transform on the spectral envelope; smooths the orthogonal transform coefficients, which are the expansion coefficients obtained by performing the orthogonal transform on the spectral envelope; performs the inverse of the orthogonal transform on the corrected orthogonal transform coefficients; and generates a time domain waveform from the spectral envelope obtained by that inverse transform.
A segment processing program according to the present invention causes a computer to execute: spectral envelope extraction processing for extracting, from two consecutive segments, a spectral envelope in a period including the boundary time between the two segments; orthogonal transform processing for performing an orthogonal transform on the spectral envelope; smoothing processing for smoothing the orthogonal transform coefficients, which are the expansion coefficients obtained by performing the orthogonal transform on the spectral envelope; inverse transform processing for performing the inverse of the orthogonal transform on the corrected orthogonal transform coefficients; and time domain waveform generation processing for generating a time domain waveform from the spectral envelope obtained by performing the inverse of the orthogonal transform on the corrected orthogonal transform coefficients.
According to the present invention, the audible continuity of sound at the junction between segments can be improved while large local deformations of the spectrum are reduced.
FIG. 1 is a block diagram showing an example of an embodiment of the segment processing apparatus of the present invention.
FIG. 2 is an explanatory diagram showing an example of the correction target period for the orthogonal transform coefficients in two consecutive segments.
FIG. 3 is an explanatory diagram schematically showing an example of the spectral envelope extracted by the spectral envelope extraction unit 2.
FIG. 4 is an explanatory diagram schematically showing the result of the orthogonal transform of the spectral envelope.
FIG. 5 is a block diagram showing an example of the smoothing processing unit 4.
FIG. 6 is an explanatory diagram illustrating the processing of the reference value calculation unit 41.
FIG. 7 is an explanatory diagram schematically showing corrected orthogonal transform coefficients.
FIG. 8 is a flowchart showing an example of the processing flow of the segment processing apparatus of the present invention.
FIG. 9 is an explanatory diagram schematically showing the correction in the part of the correction target period before the boundary time $t_{bnd}$.
FIG. 10 is an explanatory diagram schematically showing the correction in the part of the correction target period after the boundary time $t_{bnd}$.
FIG. 11 is a block diagram showing a modification of the embodiment of the present invention.
FIG. 12 is a block diagram showing an example of the minimum configuration of the segment processing apparatus of the present invention.
Hereinafter, embodiments of the present invention will be described with reference to the drawings.
FIG. 1 is a block diagram showing an example of an embodiment of the segment processing apparatus of the present invention. The segment processing apparatus of this embodiment includes a correction range determination unit 1, a spectrum envelope extraction unit 2, an orthogonal transformation unit 3, a smoothing processing unit 4, an inverse transformation unit 5, a pitch waveform generation unit 6, and a superposition addition unit 7.
Each segment selected according to the reading of the text to be synthesized is input to the correction range determination unit 1.
A segment is expressed as a change in amplitude over a certain time. For example, a segment is represented by individual times and the amplitude at each of those times. By plotting time on the horizontal axis and amplitude on the vertical axis, a segment can be represented as a waveform.
To improve the audible continuity of sound at the junction between segments, the correction range determination unit 1 determines the range (period) between two consecutive segments over which the orthogonal transform coefficients are to be corrected. This range is hereinafter called the correction target period. The orthogonal transform coefficients are described later. FIG. 2 is an explanatory diagram showing an example of the correction target period for the orthogonal transform coefficients in two consecutive segments. Of the two consecutive segments, the earlier one is called the preceding segment and the later one the subsequent segment. The correction range determination unit 1 may determine any period that satisfies the following conditions as the correction target period. The first condition is that the period includes the boundary time between the segments (hereinafter denoted $t_{bnd}$). The second condition is that the period includes neither any time before the center time $t_{c1}$ of the preceding segment nor any time after the center time $t_{c2}$ of the subsequent segment. The longest possible correction target period is therefore the period from $t_{c1}$ to $t_{c2}$ (see FIG. 2). However, a period shorter than the period from $t_{c1}$ to $t_{c2}$ may also be determined as the correction target period.
The correction range determination unit 1 may determine a predetermined fixed-length period as the correction target period. Alternatively, it may determine a period that is a predetermined proportion of the segment length (for example, 30% of the combined length of the preceding and subsequent segments) as the correction target period. In either case, the above conditions must be satisfied.
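The two ways of choosing the correction target period (a fixed length, or a proportion of the combined segment length), together with the two conditions above, can be sketched as follows. The function and its parameters are hypothetical. The document does not say where the window sits relative to $t_{bnd}$, so this sketch assumes it is centred on $t_{bnd}$ and then clipped to $[t_{c1}, t_{c2}]$ so that both conditions hold.

```python
def correction_range(seg1_start, t_bnd, seg2_end, mode="fixed", length=0.05, ratio=0.3):
    """Return (t_str, t_fin) for a preceding segment [seg1_start, t_bnd]
    and a subsequent segment [t_bnd, seg2_end], with times in seconds.

    Assumption (not from the document): the window is centred on t_bnd
    before being clipped to [t_c1, t_c2].
    """
    t_c1 = 0.5 * (seg1_start + t_bnd)  # centre time of the preceding segment
    t_c2 = 0.5 * (t_bnd + seg2_end)    # centre time of the subsequent segment
    if mode == "fixed":
        width = length                            # predetermined fixed length
    elif mode == "ratio":
        width = ratio * (seg2_end - seg1_start)   # e.g. 30% of the combined length
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    # Condition 1: the period contains t_bnd.
    # Condition 2: it does not extend before t_c1 or after t_c2.
    t_str = max(t_c1, t_bnd - 0.5 * width)
    t_fin = min(t_c2, t_bnd + 0.5 * width)
    return t_str, t_fin
```

For two 0.2 s segments meeting at 0.2 s, a 30% ratio gives roughly (0.14, 0.26), while a very long fixed length is clipped to the longest allowed period (0.1, 0.3).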
The spectral envelope extraction unit 2 extracts the spectral envelope corresponding to the portion of the preceding segment that overlaps the correction target period. Similarly, it extracts the spectral envelope corresponding to the portion of the subsequent segment that overlaps the correction target period.
The method by which the spectral envelope extraction unit 2 extracts the spectral envelope from a segment is not particularly limited. For example, it may extract the spectral envelope by performing STRAIGHT analysis, which is one example of a spectral envelope extraction method. Instead of STRAIGHT analysis, the spectral envelope may be extracted from the segment by, for example, linear prediction analysis, LSP (Line Spectral Pair) analysis, or cepstrum analysis.
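Of the alternatives listed, cepstrum analysis is the simplest to illustrate. The sketch below estimates one frame's envelope by cepstral liftering: it keeps only the low-quefrency part of the real cepstrum, which removes the fine harmonic structure caused by pitch. It is a generic textbook technique shown for illustration, not the document's implementation. The Hann window, `n_fft`, and `n_ceps` are assumed values, and the function name is hypothetical.

```python
import numpy as np


def cepstral_envelope(frame, n_fft=512, n_ceps=30):
    """Estimate a magnitude spectral envelope by cepstral liftering.

    n_fft and n_ceps are illustrative values, not taken from the document.
    Returns n_fft // 2 + 1 positive magnitude values.
    """
    frame = np.asarray(frame, dtype=float)
    if len(frame) > n_fft:
        raise ValueError("frame must not be longer than n_fft")
    spectrum = np.fft.rfft(frame * np.hanning(len(frame)), n=n_fft)
    log_mag = np.log(np.abs(spectrum) + 1e-12)  # small offset avoids log(0)
    cepstrum = np.fft.irfft(log_mag, n=n_fft)   # real cepstrum
    lifter = np.zeros(n_fft)
    lifter[:n_ceps] = 1.0                       # keep low quefrencies ...
    lifter[n_fft - n_ceps + 1:] = 1.0           # ... and their mirror images
    log_env = np.fft.rfft(cepstrum * lifter, n=n_fft).real
    return np.exp(log_env)
```

A smaller `n_ceps` gives a smoother envelope. It must stay well below the pitch period in samples, otherwise harmonic structure leaks back into the envelope.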
FIG. 3 is an explanatory diagram schematically showing an example of the spectral envelope extracted by the spectral envelope extraction unit 2. The horizontal axis in FIG. 3 represents time and the vertical axis represents spectral density. The start and end times of the correction target period are denoted $t_{str}$ and $t_{fin}$, respectively. The spectral density is obtained at discrete times. This example assumes that it is calculated in synchronization with the pitch period of the segment. A time at which the spectral density is calculated is called a pitch-synchronous position time and is denoted $t_p$. The first $t_p$ (time at which the spectral density is calculated) in the spectral envelope extracted from the preceding segment is denoted $t_{beg1}$, and the last $t_p$ in that envelope is denoted $t_{end1}$. Likewise, the first $t_p$ in the spectral envelope extracted from the subsequent segment is denoted $t_{beg2}$, and the last $t_p$ in that envelope is denoted $t_{end2}$. The following relationship then holds.
$t_{str} < t_{beg1} < t_{end1} < t_{bnd} < t_{beg2} < t_{end2} < t_{fin}$
 図3に示す例において、奥行き方向の軸(換言すれば、紙面に垂直方向の軸)が、周波数軸となる。すなわち、図3に例示するスペクトル包絡が、サブバンド毎に得られることになる。 In the example shown in FIG. 3, the axis in the depth direction (in other words, the axis perpendicular to the paper surface) is the frequency axis. That is, the spectrum envelope illustrated in FIG. 3 is obtained for each subband.
 直交変換部3は、スペクトル包絡抽出部2によって抽出されたスペクトル包絡に対して直交変換を行う。直交変換とは、直交関数による展開係数を求めることである。また、直交関数とは、正規直交性をなす関数である。以下、直交変換により求められる展開係数を直交変換係数と記す。具体的には、スペクトル包絡に対する直交変換によって、直交変換係数を成分とするベクトル(以下、直交変換係数ベクトルと記す。)が得られる。直交変換係数ベクトルは、ピッチ同期位置時刻t毎に得られる。 The orthogonal transform unit 3 performs orthogonal transform on the spectrum envelope extracted by the spectrum envelope extraction unit 2. The orthogonal transformation is to obtain a expansion coefficient by an orthogonal function. Further, the orthogonal function is a function having orthonormality. Hereinafter, the expansion coefficient obtained by the orthogonal transformation is referred to as an orthogonal transformation coefficient. Specifically, a vector having an orthogonal transform coefficient as a component (hereinafter referred to as an orthogonal transform coefficient vector) is obtained by orthogonal transform on the spectrum envelope. Orthogonal transformation coefficient vector is obtained for each pitch synchronization position time t p.
Examples of orthogonal transforms include the discrete Fourier transform (DFT), the fast Fourier transform (FFT), the discrete cosine transform (DCT), and the wavelet transform. Here, the case where the orthogonal transform unit 3 performs an FFT is taken as an example.
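The following is a minimal sketch of the FFT case using NumPy. It assumes the spectral envelopes are held as a real array with one row per pitch-synchronous position time t_p and one column per frequency bin. That layout and the function names are illustrative and are not taken from the source. Passing norm="ortho" makes the DFT unitary, so the coefficients form an orthonormal expansion as described above.

```python
import numpy as np


def envelope_to_coefficients(envelopes: np.ndarray) -> np.ndarray:
    """FFT of each spectral envelope (one row per pitch-synchronous time t_p).

    envelopes: real array of shape (num_frames, FFT_LEN).
    Returns a complex array of the same shape; row k is the orthogonal
    transform coefficient vector for the k-th time t_p.
    """
    # norm="ortho" scales the DFT to be unitary (an orthonormal expansion).
    return np.fft.fft(envelopes, axis=-1, norm="ortho")


def coefficients_to_envelope(coeffs: np.ndarray) -> np.ndarray:
    """Inverse FFT; the imaginary part is rounding noise for real envelopes."""
    return np.fft.ifft(coeffs, axis=-1, norm="ortho").real


# Example: 5 pitch-synchronous times, 16 frequency bins of synthetic data.
rng = np.random.default_rng(0)
envelopes = rng.random((5, 16)) + 0.5
coeffs = envelope_to_coefficients(envelopes)
```

Each row of the returned array then plays the role of one orthogonal transform coefficient vector.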
FIG. 4 is an explanatory diagram schematically showing the result of applying the orthogonal transform to the spectral envelope. In FIG. 4, the horizontal axis represents time and the vertical axis represents the orthogonal transform coefficient. The orthogonal transform coefficient vector is n-dimensional, and a result like the one illustrated in FIG. 4 is obtained for each dimension. That is, in FIG. 4 the depth axis (the axis perpendicular to the page) is the dimension axis.
As illustrated in FIG. 4, there is a gap in the orthogonal transform coefficients at the junction (near time t_bnd) between two transform results: the one for the spectral envelope extracted from the preceding segment, and the one for the envelope extracted from the following segment. The smoothing processing unit 4 corrects the orthogonal transform coefficients so that the change from the coefficients on the preceding-segment side to those on the following-segment side becomes smooth. This smoothing eliminates the coefficient gap near t_bnd. The smoothing processing unit 4 performs this processing for each dimension.
FIG. 5 is a block diagram showing an example of the smoothing processing unit 4. The smoothing processing unit 4 includes a reference value calculation unit 41 and a correction unit 42.
FIG. 6 is an explanatory diagram illustrating the processing of the reference value calculation unit 41. The orthogonal transform coefficient before correction is denoted X(ω), and the coefficient after correction is denoted Y(ω). Here ω is a frequency bin, ω ∈ {0, 1, 2, ..., FFT_LEN-1}. A coefficient at a particular time is indicated by attaching that time as a subscript. For example, the uncorrected coefficient at time t_end1 is written X_tend1(ω), and the uncorrected coefficient at time t_beg2 is written X_tbeg2(ω). Corrected coefficients Y(ω) are subscripted with the corresponding time in the same way. The uncorrected and corrected coefficients at an arbitrary pitch-synchronous position time t_p are written X_tp(ω) and Y_tp(ω), respectively.
The reference value calculation unit 41 calculates a reference value. This value serves as the basis for smoothing the change of each orthogonal transform coefficient over the correction target period t_str to t_fin. No uncorrected coefficient exists at the boundary time t_bnd itself. For convenience, however, a corrected coefficient Y_tbnd(ω) at t_bnd is assumed, and this Y_tbnd(ω) is used as the reference value.

On the preceding-segment side, the coefficient of interest is X_tend1(ω) at t_end1, the last pitch-synchronous position time on that side. If X_tend1(ω) is below the reference value, the coefficients on the preceding-segment side are increased; if it exceeds the reference value, they are decreased. Similarly, on the following-segment side, the coefficient of interest is X_tbeg2(ω) at t_beg2, the first pitch-synchronous position time on that side. If X_tbeg2(ω) is below the reference value, the coefficients on the following-segment side are increased; if it exceeds the reference value, they are decreased.

The reference value calculation unit 41 obtains, as Y_tbnd(ω), a value between X_tend1(ω) and X_tbeg2(ω). That is, it suffices to find a reference value that satisfies X_tend1(ω) ≤ Y_tbnd(ω) ≤ X_tbeg2(ω) or X_tbeg2(ω) ≤ Y_tbnd(ω) ≤ X_tend1(ω).
In this example, the reference value calculation unit 41 sets Y_tbnd(ω) to the average of X_tend1(ω) and X_tbeg2(ω).
Depending on how the reference value is chosen, Y_tbnd(ω) may equal X_tend1(ω) or X_tbeg2(ω). If Y_tbnd(ω) = X_tend1(ω), the coefficients on the preceding-segment side need not be changed. Likewise, if Y_tbnd(ω) = X_tbeg2(ω), the coefficients on the following-segment side need not be changed.
Based on the reference value Y_tbnd(ω), the correction unit 42 corrects the coefficient at each pitch-synchronous position time t_p in the correction target period t_str to t_fin. The change from the uncorrected coefficient is made larger for t_p closer to the boundary time t_bnd and smaller for t_p farther from it.
FIG. 7 is an explanatory diagram schematically showing the orthogonal transform coefficients after correction by the correction unit 42. As shown in FIG. 7, the correction eliminates the gap near t_bnd and smooths the change of the coefficients over time.
The inverse transform unit 5 applies the inverse of the orthogonal transform performed by the orthogonal transform unit 3 to the coefficients corrected by the correction unit 42.
The inverse transform performed by the inverse transform unit 5 can be fixed in advance according to the type of orthogonal transform used by the orthogonal transform unit 3:
- If the orthogonal transform unit 3 performs a DFT, the inverse transform unit 5 performs an inverse DFT.
- If unit 3 performs an FFT, unit 5 performs an inverse FFT.
- If unit 3 performs a DCT, unit 5 performs an inverse DCT.
- If unit 3 performs a wavelet transform, unit 5 performs an inverse wavelet transform.
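The pairing of forward and inverse transforms can be sketched as a lookup table. This sketch assumes NumPy and builds the orthonormal DCT-II as an explicit matrix, which favors clarity over speed. It leaves out the wavelet pair. The registry name TRANSFORMS and the helper functions are hypothetical.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix C: C @ x is the DCT of x and C.T @ y inverts it."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)  # scale the DC row so that C is orthonormal
    return c


def dft(x):
    return np.fft.fft(x, norm="ortho")


def idft(y):
    # Keep the real part: the spectral envelopes being transformed are real.
    return np.fft.ifft(y, norm="ortho").real


def dct(x):
    return dct_matrix(len(x)) @ x


def idct(y):
    return dct_matrix(len(y)).T @ y


# Hypothetical registry: forward transform name -> (forward, inverse).
TRANSFORMS = {
    "dft": (dft, idft),
    "fft": (dft, idft),  # the FFT computes the DFT, so the inverse is the same
    "dct": (dct, idct),
}
```

Because the FFT is only a fast algorithm for the DFT, both names map to the same pair here.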
The inverse transform performed by the inverse transform unit 5 yields the corrected spectral envelope.
The pitch waveform generation unit 6 generates pitch waveforms from the corrected spectral envelope (the result of the inverse transform by the inverse transform unit 5).
The overlap-add unit 7 generates a synthesized speech waveform by overlapping and adding the pitch waveforms generated by the pitch waveform generation unit 6. A pitch pattern, which is a time series of pitch frequencies, is input to the overlap-add unit 7. It suffices to input, one at a time, the pitch frequency that corresponds to each pitch waveform generated by the pitch waveform generation unit 6. The overlap-add unit 7 determines the spacing of the pitch waveforms according to the input pitch frequencies, and then overlaps and adds them.
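The following is a simplified overlap-add sketch using NumPy. It assumes a known sampling rate fs. It also assumes that each pitch waveform is added starting at its mark, and that the next mark is placed one pitch period (fs / f0 samples) later. Many practical systems instead center each waveform on its mark. The function name and this placement rule are illustrative assumptions.

```python
import numpy as np


def overlap_add(pitch_waveforms, pitch_freqs_hz, fs):
    """Overlap-add pitch waveforms spaced by one pitch period each.

    pitch_waveforms: sequence of 1-D arrays, one per pitch period.
    pitch_freqs_hz: pitch frequency (Hz) for each waveform, i.e. the pitch pattern.
    fs: sampling rate in Hz (an assumption; the source does not specify one).
    """
    if len(pitch_waveforms) != len(pitch_freqs_hz):
        raise ValueError("need one pitch frequency per pitch waveform")
    # Each waveform starts at its mark; the next mark follows one period later.
    marks = [0]
    for f0 in pitch_freqs_hz[:-1]:
        marks.append(marks[-1] + int(round(fs / f0)))
    total = max(m + len(w) for m, w in zip(marks, pitch_waveforms))
    out = np.zeros(total)
    for m, w in zip(marks, pitch_waveforms):
        out[m:m + len(w)] += w
    return out
```

For example, two 4-sample waveforms at f0 = 4 Hz with fs = 8 Hz are placed 2 samples apart and overlap in their middle two samples.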
The following units are implemented, for example, by the CPU of a computer that operates according to a segment processing program: the correction range determination unit 1, the spectral envelope extraction unit 2, the orthogonal transform unit 3, the smoothing processing unit 4 (the reference value calculation unit 41 and the correction unit 42), the inverse transform unit 5, the pitch waveform generation unit 6, and the overlap-add unit 7. In this case, for example, a program storage device of the computer (not shown) stores the segment processing program. The CPU reads the program and, according to it, operates as each of these units. Alternatively, the correction range determination unit 1, the spectral envelope extraction unit 2, the orthogonal transform unit 3, the reference value calculation unit 41, the correction unit 42, the inverse transform unit 5, the pitch waveform generation unit 6, and the overlap-add unit 7 may each be implemented as a separate unit.
Next, the operation will be described.
FIG. 8 is a flowchart showing an example of the processing performed by the segment processing device of the present invention. The individual segments are selected according to the reading of the text to be synthesized. They are input one at a time to the correction range determination unit 1, in reading order. For each pair of consecutive segments, the correction range determination unit 1 determines the correction target period for the orthogonal transform coefficients (step S1). As already described, the correction range determination unit 1 determines the correction target period so that it satisfies two conditions:
- First condition: the period includes the segment boundary time t_bnd.
- Second condition: the period contains no time before the center time t_c1 of the preceding segment (see FIG. 2) and no time after the center time t_c2 of the following segment (see FIG. 2).
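The two conditions of step S1 can be expressed as a simple check. This is a hypothetical helper for illustration only. It validates a candidate period but does not reproduce how the correction range determination unit 1 actually chooses t_str and t_fin. The strict inequalities around t_bnd follow the ordering t_str < t_beg1 < ... < t_end2 < t_fin stated for FIG. 3.

```python
def is_valid_correction_period(t_str, t_fin, t_bnd, t_c1, t_c2):
    """Check both step-S1 conditions for a candidate period [t_str, t_fin].

    First condition: the period contains the segment boundary time t_bnd.
    Second condition: the period does not start before the center time t_c1
    of the preceding segment nor end after the center time t_c2 of the
    following segment.
    """
    contains_boundary = t_str < t_bnd < t_fin
    within_centers = t_c1 <= t_str and t_fin <= t_c2
    return contains_boundary and within_centers
```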
Next, the spectral envelope extraction unit 2 takes the portion of the preceding segment that overlaps the correction target period and extracts the spectral envelope corresponding to that portion. Likewise, it takes the portion of the following segment that overlaps the correction target period and extracts the corresponding spectral envelope (step S2). In step S2, the spectral envelope extraction unit 2 may extract the spectral envelope by, for example, STRAIGHT analysis. Alternatively, it may use another method, such as linear prediction analysis, LSP analysis, or cepstrum analysis. Step S2 yields a spectral envelope like the one illustrated in FIG. 3 for each subband.
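Of the methods listed, cepstrum analysis is the simplest to sketch. The example below uses NumPy. It estimates a magnitude envelope from one frame by keeping only the low-quefrency part of the real cepstrum. The Hann window, the FFT size, and the number of retained coefficients n_ceps are illustrative assumptions. This is not STRAIGHT analysis.

```python
import numpy as np


def cepstral_envelope(frame, fft_len=512, n_ceps=30):
    """Spectral envelope of one frame by low-quefrency cepstral liftering.

    frame: 1-D array of samples around one analysis time.
    fft_len: FFT size; n_ceps: number of low-quefrency cepstral
    coefficients kept (both are illustrative defaults).
    Returns the magnitude envelope on fft_len // 2 + 1 bins.
    """
    windowed = np.asarray(frame, dtype=float) * np.hanning(len(frame))
    spectrum = np.fft.rfft(windowed, n=fft_len)
    log_mag = np.log(np.abs(spectrum) + 1e-12)      # small floor avoids log(0)
    cepstrum = np.fft.irfft(log_mag, n=fft_len)     # real (even) cepstrum
    lifter = np.zeros(fft_len)
    lifter[:n_ceps] = 1.0
    lifter[fft_len - n_ceps + 1:] = 1.0             # mirror half keeps it even
    smoothed_log = np.fft.rfft(cepstrum * lifter).real
    return np.exp(smoothed_log)
```

A smaller n_ceps gives a smoother envelope. A larger n_ceps lets more of the harmonic (pitch) structure leak into the envelope.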
Next, the orthogonal transform unit 3 applies an orthogonal transform to the spectral envelope extracted in step S2 (step S3). This yields an orthogonal transform coefficient vector for each pitch-synchronous position time. In other words, a relationship between time and coefficient like the one illustrated in FIG. 4 is obtained for each dimension. As explained above, in FIG. 4 the depth axis is the dimension axis.
In step S3, the orthogonal transform unit 3 may perform, for example, an FFT. The FFT case is illustrated here, but the orthogonal transform unit 3 may instead perform another orthogonal transform, such as a DFT, a DCT, or a wavelet transform.
Next, the reference value calculation unit 41 calculates the reference value Y_tbnd(ω) from two uncorrected coefficients (step S4). The first is X_tend1(ω) at t_end1, the last pitch-synchronous position time on the preceding-segment side. The second is X_tbeg2(ω) at t_beg2, the first pitch-synchronous position time on the following-segment side. In this example, the average of X_tend1(ω) and X_tbeg2(ω) is used as the reference value. That is, the reference value calculation unit 41 calculates Y_tbnd(ω) using Equation (1) below.
$$Y_{t_{bnd}}(\omega) = \frac{X_{t_{end1}}(\omega) + X_{t_{beg2}}(\omega)}{2} \qquad (1)$$
Equation (1) is only one way to calculate the reference value Y_tbnd(ω). Any other method may be used, provided that X_tend1(ω) ≤ Y_tbnd(ω) ≤ X_tbeg2(ω) or X_tbeg2(ω) ≤ Y_tbnd(ω) ≤ X_tend1(ω) holds.
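Step S4 can be sketched with NumPy as follows, assuming real-valued coefficients such as DCT coefficients; the ordering constraint is only meaningful for real values. The weight w is a hypothetical generalization. Setting w = 0.5 gives Equation (1), and setting w = 0 gives Y_tbnd(ω) = X_tend1(ω). Any w in [0, 1] satisfies the constraint above.

```python
import numpy as np


def reference_value(x_end1, x_beg2, w=0.5):
    """Reference value Y_tbnd(ω) as a convex combination of X_tend1(ω) and X_tbeg2(ω).

    w = 0.5 reproduces Equation (1) (the plain average); any w in [0, 1]
    keeps Y_tbnd(ω) between the two values. The weight w is an
    illustrative parameter, not part of the source.
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    x_end1 = np.asarray(x_end1, dtype=float)
    x_beg2 = np.asarray(x_beg2, dtype=float)
    return (1.0 - w) * x_end1 + w * x_beg2


def lies_between(y, a, b):
    """True if min(a, b) <= y <= max(a, b) holds in every (real) dimension."""
    y, a, b = (np.asarray(v, dtype=float) for v in (y, a, b))
    return bool(np.all((np.minimum(a, b) <= y) & (y <= np.maximum(a, b))))
```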
After step S4, the correction unit 42 corrects the coefficients at the pitch-synchronous position times t_p within the correction target period (step S5). This correction smooths the change from the coefficients on the preceding-segment side to those on the following-segment side.
Specifically, the correction unit 42 uses the reference value Y_tbnd(ω) to correct the coefficients at each t_p on the preceding-segment side of the correction target period (the part before t_bnd). Similarly, it uses Y_tbnd(ω) to correct the coefficients at each t_p on the following-segment side of the correction target period (the part after t_bnd).
First, the correction in the period before the segment boundary time t_bnd is described. FIG. 9 is an explanatory diagram that schematically illustrates this correction within the correction target period. FIG. 9 shows the case in which the reference value Y_tbnd(ω) is larger than X_tend1(ω), so the coefficients are increased in the period before t_bnd.
In this correction, t_end1 is regarded as approximately equal to t_bnd (t_end1 ≈ t_bnd), and Y_tp(ω) is obtained so as to satisfy Equation (2) below.
$$\frac{t_p - t_{str}}{t_{bnd} - t_{str}} = \frac{Y_{t_p}(\omega) - X_{t_p}(\omega)}{Y_{t_{bnd}}(\omega) - X_{t_{end1}}(\omega)} \qquad (2)$$
The left side of Equation (2) is a ratio of two durations (see FIG. 9). The numerator is the time from t_str to an arbitrary pitch-synchronous position time t_p. The denominator is the time from t_str, the start of the correction target period, to the segment boundary time t_bnd. The right side is the ratio of the change Y_tp(ω) - X_tp(ω) produced by the correction at t_p to the difference Y_tbnd(ω) - X_tend1(ω). In other words, Equation (2) states that these two ratios are equal, under the approximation t_end1 ≈ t_bnd.
Choosing Y_tp(ω) so that Equation (2) holds corrects X_tp(ω) over the period t_str to t_bnd. The change from the uncorrected coefficient is larger for t_p closer to t_bnd and smaller for t_p farther from t_bnd.
Solving Equation (2) for Y_tp(ω) gives Equation (3) below.
$$Y_{t_p}(\omega) = X_{t_p}(\omega) + \frac{t_p - t_{str}}{t_{bnd} - t_{str}}\,\bigl(Y_{t_{bnd}}(\omega) - X_{t_{end1}}(\omega)\bigr) \qquad (3)$$
That is, the correction unit 42 evaluates Equation (3) at each t_p in the period before t_bnd, which gives the corrected coefficient Y_tp(ω) at each of those times. As a result, the change from the uncorrected coefficient is larger for t_p closer to t_bnd and smaller for t_p farther from t_bnd.
Equation (3) also covers the case in which Y_tbnd(ω) is smaller than X_tend1(ω), where the coefficients are decreased in the period before t_bnd. In that case, too, the correction unit 42 computes the corrected coefficient Y_tp(ω) with Equation (3).
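Equation (3) vectorizes directly over ω. The following minimal sketch assumes NumPy arrays indexed by ω; the function and argument names are illustrative. The same expression holds whether Y_tbnd(ω) lies above or below X_tend1(ω), so no branching is needed.

```python
import numpy as np


def correct_before_boundary(x_tp, t_p, t_str, t_bnd, y_bnd, x_end1):
    """Equation (3): corrected coefficients Y_tp(ω) for t_str <= t_p <= t_bnd.

    x_tp: uncorrected coefficients X_tp(ω) at time t_p (array over ω).
    y_bnd: reference value Y_tbnd(ω); x_end1: X_tend1(ω).
    The offset (y_bnd - x_end1) is scaled linearly from 0 at t_str to 1 at t_bnd.
    """
    weight = (t_p - t_str) / (t_bnd - t_str)
    return np.asarray(x_tp) + (np.asarray(y_bnd) - np.asarray(x_end1)) * weight
```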
Next, the correction in the period after the segment boundary time t_bnd is described. FIG. 10 is an explanatory diagram that schematically illustrates this correction within the correction target period. FIG. 10 shows the case in which the reference value Y_tbnd(ω) is larger than X_tbeg2(ω), so the coefficients are increased in the period after t_bnd.
In this correction, t_beg2 is regarded as approximately equal to t_bnd (t_beg2 ≈ t_bnd), and Y_tp(ω) is obtained so as to satisfy Equation (4) below.
$$\frac{t_{fin} - t_p}{t_{fin} - t_{bnd}} = \frac{Y_{t_p}(\omega) - X_{t_p}(\omega)}{Y_{t_{bnd}}(\omega) - X_{t_{beg2}}(\omega)} \qquad (4)$$
The left side of Equation (4) is a ratio of two durations (see FIG. 10). The numerator is the time from an arbitrary pitch-synchronous position time t_p to t_fin. The denominator is the time from the segment boundary time t_bnd to t_fin, the end of the correction target period. The right side is the ratio of the change Y_tp(ω) - X_tp(ω) produced by the correction at t_p to the difference Y_tbnd(ω) - X_tbeg2(ω). In other words, Equation (4) states that these two ratios are equal, under the approximation t_beg2 ≈ t_bnd.
Choosing Y_tp(ω) so that Equation (4) holds corrects X_tp(ω) over the period t_bnd to t_fin. The change from the uncorrected coefficient is larger for t_p closer to t_bnd and smaller for t_p farther from t_bnd.
Solving Equation (4) for Y_tp(ω) gives Equation (5) below.
$$Y_{t_p}(\omega) = X_{t_p}(\omega) + \frac{t_{fin} - t_p}{t_{fin} - t_{bnd}}\,\bigl(Y_{t_{bnd}}(\omega) - X_{t_{beg2}}(\omega)\bigr) \qquad (5)$$
That is, the correction unit 42 evaluates Equation (5) at each t_p in the period after t_bnd, which gives the corrected coefficient Y_tp(ω) at each of those times. As a result, the change from the uncorrected coefficient is larger for t_p closer to t_bnd and smaller for t_p farther from t_bnd.
Equation (5) also covers the case in which Y_tbnd(ω) is smaller than X_tbeg2(ω), where the coefficients are decreased in the period after t_bnd. In that case, too, the correction unit 42 computes the corrected coefficient Y_tp(ω) with Equation (5).
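The corresponding sketch for Equation (5) makes the same assumptions: NumPy arrays over ω and illustrative names. Here the weight falls from 1 at t_bnd to 0 at t_fin.

```python
import numpy as np


def correct_after_boundary(x_tp, t_p, t_bnd, t_fin, y_bnd, x_beg2):
    """Equation (5): corrected coefficients Y_tp(ω) for t_bnd <= t_p <= t_fin.

    x_tp: uncorrected coefficients X_tp(ω) at time t_p (array over ω).
    y_bnd: reference value Y_tbnd(ω); x_beg2: X_tbeg2(ω).
    The offset (y_bnd - x_beg2) is scaled linearly from 1 at t_bnd to 0 at t_fin.
    """
    weight = (t_fin - t_p) / (t_fin - t_bnd)
    return np.asarray(x_tp) + (np.asarray(y_bnd) - np.asarray(x_beg2)) * weight
```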
The reference value calculation unit 41 and the correction unit 42 perform steps S4 and S5 for each dimension of the orthogonal transform coefficients.
After step S5, the inverse transform unit 5 applies the inverse of the orthogonal transform used in step S3 to the corrected coefficients obtained in step S5 (step S6). If an FFT was used in step S3, the inverse transform unit 5 performs an inverse FFT. Step S6 yields a spectral envelope, which can be regarded as the corrected spectral envelope.
The pitch waveform generation unit 6 uses an inverse Fourier transform to convert the corrected spectral envelope (the envelope obtained in step S6) into a time-domain waveform, that is, a pitch waveform (step S7). This conversion requires a suitable phase spectrum. The pitch waveform generation unit 6 may use, for example, zero phase or a fixed phase. If the phase from before the spectral envelope extraction in step S2 is available, that phase may be used instead.
Let Y_tp(ω) denote the corrected spectral envelope at the pitch-synchronous position time t_p, and let θ(ω) denote the phase used to generate the pitch waveform. The pitch waveform generation unit 6 then generates the pitch waveform y_tp(t) using Equation (6) below.
$$y_{t_p}(t) = \mathcal{F}^{-1}\left\{ Y_{t_p}(\omega)\, e^{j\theta(\omega)} \right\} \qquad (6)$$
In Equation (6), F^{-1}{·} denotes the inverse Fourier transform. If zero phase is used to generate the pitch waveform, for example, set θ(ω) = 0 in Equation (6).
A time-domain waveform (pitch waveform) is represented by a sequence of times and the amplitude at each of those times.
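Equation (6) can be sketched with NumPy. The sketch assumes the envelope is stored on the non-negative frequency bins only (FFT_LEN / 2 + 1 values), so that numpy.fft.irfft can supply the conjugate-symmetric half and return a real waveform. The function name and the one-sided layout are assumptions of this sketch.

```python
import numpy as np


def pitch_waveform(envelope, phase=None):
    """Equation (6): y_tp(t) = F^-1{ Y_tp(ω) exp(j θ(ω)) }.

    envelope: corrected magnitude envelope Y_tp(ω) on the FFT_LEN // 2 + 1
              non-negative frequency bins.
    phase: θ(ω) on the same bins; None means zero phase (θ(ω) = 0).
    Returns a real waveform of length FFT_LEN = 2 * (len(envelope) - 1).
    """
    envelope = np.asarray(envelope, dtype=float)
    theta = np.zeros_like(envelope) if phase is None else np.asarray(phase, dtype=float)
    fft_len = 2 * (len(envelope) - 1)
    # irfft supplies the conjugate-symmetric negative-frequency half,
    # so the inverse transform is real-valued.
    return np.fft.irfft(envelope * np.exp(1j * theta), n=fft_len)
```

With zero phase and a positive envelope, y_tp(t) peaks at t = 0 and wraps around the end of the buffer. Applying numpy.fft.fftshift moves the peak to the center before windowing and overlap-add.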
The overlap-add unit 7 generates a synthesized speech waveform by overlapping and adding the pitch waveforms generated by the pitch waveform generation unit 6 (step S8). The overlap-add unit 7 sets the spacing of the pitch waveforms according to the input pitch frequencies, and then overlaps and adds them.
According to this embodiment, the spectral envelope is orthogonally transformed, and the resulting coefficients are smoothed to eliminate the coefficient gap near t_bnd. The inverse of the orthogonal transform is then applied to the smoothed (that is, corrected) coefficients to produce a spectral envelope, and a pitch waveform is generated from that envelope. Because this matches the overall shape of the spectrum across the junction, large local deformations of the spectrum can be reduced. Moreover, because the resulting pitch waveforms are overlap-added, the perceived continuity of sound at the junction between segments is improved.
In addition, as described above, the spectral envelope is extracted first, and what is smoothed is the orthogonal transform result (the coefficients) of that envelope. The influence of pitch can therefore be excluded from the smoothing.
Next, modifications of the above embodiment are described. In the embodiment above, the smoothing processing unit 4 includes the reference value calculation unit 41 and the correction unit 42. The reference value calculation unit 41 calculates the reference value, and the correction unit 42 uses it to correct the coefficients. In each modification below, the coefficients are corrected by a different method.
FIG. 11 is a block diagram showing a modification of the embodiment of the present invention. All elements other than the smoothing processing unit 14 are the same as those described in the above embodiment. They are given the same reference numerals as in FIG. 1, and their description is omitted. The smoothing processing unit 14 is implemented, for example, by the CPU of a computer that operates according to a segment processing program.
In the above embodiment, the spectral density and the orthogonal transform coefficient vector are calculated at each pitch-synchronous time t_p, following the pitch period of the segment. In the modifications below, they are instead calculated at a fixed period starting from the start time t_str of the correction target period. Let this period be t_s, and let the correction target period run from t_str to t_str + (N−1)·t_s. N can then be called the number of frames in the correction target period, and t_s the frame shift length of the correction target period.
Let X_t(ω) denote the orthogonal transform coefficient before correction at time t, and let Y_t(ω) denote the orthogonal transform coefficient after correction at time t. Time t takes the following values:
$$t \in \{t_{\mathrm{str}},\ t_{\mathrm{str}} + t_s,\ t_{\mathrm{str}} + 2t_s,\ \ldots,\ t_{\mathrm{str}} + (N-1)\,t_s\}$$
ω is the frequency bin index, with ω ∈ {0, 1, 2, ..., FFT_LEN−1}.
Steps S1 to S3 and S6 to S8 (see FIG. 8) operate as in the above embodiment. In place of steps S4 and S5 (see FIG. 8), the smoothing processing unit 14 performs the processing below to calculate the corrected orthogonal transform coefficients Y_t(ω). Specifically, it computes Equation (7) below for every ω and t.
[Equation (7)]
In Equation (7), X_{t_str+k·t_s}(ω) is the orthogonal transform coefficient before correction at time t_str + k·t_s.
M in Equation (7) is the window width of the moving average. X_t(ω) = 0 for t in the range t ≤ 0 or t_str + N·t_s ≤ t.
Computing Equation (7) yields the smoothed orthogonal transform coefficients Y_t(ω). In this method, each orthogonal transform coefficient is corrected by a moving average.
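Equation (7) itself is not reproduced here. The code below is therefore only a sketch of a moving average consistent with the surrounding description, and it rests on several assumptions. The window is centered on each frame and spans an odd number M of frames, and the sum is divided by M. X_t(ω) is treated as 0 for every frame outside the correction target period. The text above states that zero rule for t ≤ 0 and t ≥ t_str + N·t_s; the sketch assumes frames before t_str are handled the same way. In the code, `frames[n][w]` stands for X_t(ω) at t = t_str + n·t_s and ω = w.

```python
def moving_average_frames(frames, m):
    """Centered moving average over time, applied independently to every bin.

    frames: frames[n][w] is the uncorrected coefficient for frame n, bin w.
    m:      window width in frames (assumed odd in this sketch).
    """
    if m < 1 or m % 2 == 0:
        raise ValueError("this sketch assumes an odd window width m")
    half = m // 2
    n_frames = len(frames)
    n_bins = len(frames[0])
    smoothed = []
    for n in range(n_frames):
        row = []
        for w in range(n_bins):
            total = 0.0
            for k in range(n - half, n + half + 1):
                if 0 <= k < n_frames:  # frames outside the period count as 0
                    total += frames[k][w]
            row.append(total / m)
        smoothed.append(row)
    return smoothed


frames = [[0.0, 3.0], [0.0, 3.0], [3.0, 3.0], [0.0, 3.0], [0.0, 3.0]]
print(moving_average_frames(frames, 3))
# [[0.0, 2.0], [1.0, 3.0], [1.0, 3.0], [1.0, 3.0], [0.0, 2.0]]
```

The second bin shows a side effect of the zero rule. Even a constant track is pulled toward 0 in the first and last frames, because part of the window falls outside the period.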
Alternatively, the smoothing processing unit 14 may calculate the corrected orthogonal transform coefficients Y_t(ω) by computing Equation (8) below for every ω and t.
$$Y_t(\omega) = \alpha X_t(\omega) + (1-\alpha)\,Y_{t-t_s}(\omega) \qquad (8)$$
In Equation (8), Y_{t−t_s}(ω) is the orthogonal transform coefficient after correction at time t − t_s.
α in Equation (8) is a time constant. Y_{t_str−t_s}(ω) = 0; that is, the corrected orthogonal transform coefficient at time t_str − t_s is taken to be 0.
Equation (8) obtains the corrected coefficient Y_t(ω) at time t as the sum of two terms. The first is the uncorrected coefficient X_t(ω) at time t multiplied by the time constant. The second is the corrected coefficient Y_{t−t_s}(ω) at the time one fixed period earlier, multiplied by 1 minus the time constant. Computing Equation (8) yields the smoothed orthogonal transform coefficients Y_t(ω). This method can be described as smoothing by first-order leaky integration.
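The sketch below shows Equation (8) applied to the coefficient track of a single bin ω. It takes the initial value Y_{t_str−t_s}(ω) = 0 exactly as stated above. It also assumes 0 < α ≤ 1, a range the text does not state explicitly. The function name is hypothetical.

```python
def leaky_integrate(track, alpha):
    """First-order leaky integration of one coefficient track, as in Eq. (8).

    track: uncorrected values X_t(omega) for t = t_str, t_str + t_s, ...
    alpha: the constant alpha of Eq. (8); assumed here to satisfy 0 < alpha <= 1.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha is assumed to lie in (0, 1]")
    y_prev = 0.0  # Y at t_str - t_s is 0, as stated for Eq. (8)
    smoothed = []
    for x in track:
        # Y_t = alpha * X_t + (1 - alpha) * Y_{t - t_s}
        y_prev = alpha * x + (1.0 - alpha) * y_prev
        smoothed.append(y_prev)
    return smoothed


print(leaky_integrate([2.0, 2.0, 2.0], 0.5))  # [1.0, 1.5, 1.75]
```

A smaller α smooths more heavily, but the output lags further behind the input. The zero initial value also makes the first frames start low, as the example shows.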
The smoothing processing unit 14 may also calculate the corrected orthogonal transform coefficients Y_t(ω) as follows.
The smoothing processing unit 14 selects one ω and performs the following processing for that ω. First, it computes Equation (9) below for t = t_str, t_str + t_s, t_str + 2t_s, ..., in that order.
$$Y^{(a)}_t(\omega) = \alpha_1 X_t(\omega) + (1-\alpha_1)\,Y^{(a)}_{t-t_s}(\omega) \qquad (9)$$
In Equation (9), the label (a) attached to Y distinguishes this result from the result of Equation (11) described later.
α_1 in Equation (9) is a time constant. Equation (10) below is also assumed to hold; the subscript on the left-hand side of Equation (10) denotes time t_str − t_s.
[Equation (10)]
The value calculated by Equation (9) can be regarded as a first provisional corrected orthogonal transform coefficient. Equation (9) obtains the first provisional corrected coefficient at time t as the sum of two terms. The first is the uncorrected coefficient X_t(ω) at time t multiplied by a first time constant α_1. The second is the first provisional corrected coefficient at the time one fixed period earlier, multiplied by 1 − α_1.
Next, the smoothing processing unit 14 computes Equation (11) below for t = t_str + (N−1)·t_s, t_str + (N−2)·t_s, t_str + (N−3)·t_s, ..., in that order.
$$Y^{(b)}_t(\omega) = \alpha_2 X_t(\omega) + (1-\alpha_2)\,Y^{(b)}_{t+t_s}(\omega) \qquad (11)$$
In Equation (11), the label (b) attached to Y distinguishes this result from the result of Equation (9) described above.
α_2 in Equation (11) is a time constant. Equation (12) below is also assumed to hold; the subscript on the left-hand side of Equation (12) denotes time t_str + N·t_s.
[Equation (12)]
The value calculated by Equation (11) can be regarded as a second provisional corrected orthogonal transform coefficient. Equation (11) obtains the second provisional corrected coefficient at time t as the sum of two terms. The first is the uncorrected coefficient X_t(ω) at time t multiplied by a second time constant α_2. The second is the second provisional corrected coefficient at the time one fixed period later, multiplied by 1 − α_2.
Next, for every t, the smoothing processing unit 14 calculates the corrected orthogonal transform coefficient Y_t(ω) as the average of the results of Equations (9) and (11). That is, it computes Equation (13) below for every t.
$$Y_t(\omega) = \frac{Y^{(a)}_t(\omega) + Y^{(b)}_t(\omega)}{2} \qquad (13)$$
After calculating Y_t(ω) for every t, the smoothing processing unit 14 selects the next ω and repeats the same processing. When no unselected ω remains, the processing ends.
The above processing also yields smoothed orthogonal transform coefficients Y_t(ω). This method, too, can be described as smoothing by first-order leaky integration.
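The two-pass procedure of Equations (9), (11) and (13) can be sketched for one bin ω as follows. Equations (10) and (12) are not reproduced in this text, so both passes here assume a starting value of 0. That choice mirrors the condition stated for Equation (8) and should be treated as an assumption. The function name is hypothetical.

```python
def forward_backward_leaky(track, alpha1, alpha2):
    """Two-pass leaky integration of one coefficient track (Eqs. (9), (11), (13)).

    track: uncorrected values X_t(omega) for t = t_str, ..., t_str + (N-1) t_s.
    The initial values of both passes are assumed to be 0 in this sketch.
    """
    n = len(track)
    # Forward pass over t = t_str, t_str + t_s, ...: first provisional values Y^(a).
    forward = []
    y = 0.0
    for x in track:
        y = alpha1 * x + (1.0 - alpha1) * y
        forward.append(y)
    # Backward pass over t = t_str + (N-1) t_s, t_str + (N-2) t_s, ...: values Y^(b).
    backward = [0.0] * n
    y = 0.0
    for i in range(n - 1, -1, -1):
        y = alpha2 * track[i] + (1.0 - alpha2) * y
        backward[i] = y
    # Eq. (13): average of the two provisional values at each time.
    return [0.5 * (a + b) for a, b in zip(forward, backward)]


print(forward_backward_leaky([2.0, 2.0, 2.0], 0.5, 0.5))  # [1.375, 1.5, 1.375]
```

With α_1 = α_2, a time-symmetric input gives a time-symmetric output. The backward pass lags in the opposite direction to the forward pass, so averaging the two roughly cancels the one-sided lag of a single pass of Equation (8).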
Each of the above modifications provides the same effects as the embodiment described earlier.
Next, the minimum configuration of the present invention will be described. FIG. 12 is a block diagram showing an example of the minimum configuration of the segment processing apparatus of the present invention. The segment processing apparatus of the present invention includes spectral envelope extraction means 71, orthogonal transform means 72, smoothing means 73, inverse transform means 74, and time domain waveform generation means 75.
The spectral envelope extraction means 71 (for example, the spectral envelope extraction unit 2) works on two consecutive segments. It extracts the spectral envelope over a period (for example, the correction target period) that includes the boundary time between the two segments.
The orthogonal transform means 72 (for example, the orthogonal transform unit 3) applies an orthogonal transform to the spectral envelope.
The smoothing means 73 (for example, the smoothing processing unit 4) smooths the orthogonal transform coefficients, which are the expansion coefficients obtained by applying the orthogonal transform to the spectral envelope.
The inverse transform means 74 (for example, the inverse transform unit 5) applies the inverse of the orthogonal transform to the corrected orthogonal transform coefficients.
The time domain waveform generation means 75 (for example, the pitch waveform generation unit 6) generates a time domain waveform from the spectral envelope that results from applying the inverse of the orthogonal transform to the corrected orthogonal transform coefficients.
With this configuration, the outline of the spectrum is matched, so large local deformations of the spectrum can be reduced. In addition, the perceived continuity of the sound at the junction between segments can be improved.
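The chain from means 72 to means 74 can be sketched as below. This section does not fix which orthogonal transform is used. As an assumption, the sketch applies an orthonormal DCT-II to each frame's spectral envelope (for example, a log-amplitude envelope) and uses the matching DCT-III as the exact inverse. The names `dct_ortho`, `idct_ortho`, `smooth_envelopes` and the `smoother` callback are hypothetical. Envelope extraction (means 71) and time domain waveform generation (means 75) are outside the sketch.

```python
import math


def dct_ortho(v):
    """Orthonormal DCT-II of one envelope frame (assumed orthogonal transform)."""
    n = len(v)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        s = sum(v[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        out.append(scale * s)
    return out


def idct_ortho(c):
    """Inverse of dct_ortho (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        s = 0.0
        for k in range(n):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            s += scale * c[k] * math.cos(math.pi * (i + 0.5) * k / n)
        out.append(s)
    return out


def smooth_envelopes(envelopes, smoother):
    """Transform each frame, smooth every coefficient over time, transform back.

    envelopes: envelopes[n] is the spectral envelope of frame n (list of floats).
    smoother:  function mapping one coefficient track (list over frames) to a
               smoothed track of the same length.
    """
    coefs = [dct_ortho(e) for e in envelopes]                  # orthogonal transform
    n_frames, n_coefs = len(coefs), len(coefs[0])
    tracks = [[coefs[n][w] for n in range(n_frames)] for w in range(n_coefs)]
    smoothed = [smoother(track) for track in tracks]           # smoothing over time
    corrected = [[smoothed[w][n] for w in range(n_coefs)] for n in range(n_frames)]
    return [idct_ortho(c) for c in corrected]                  # inverse transform


envs = [[1.0, 2.0, 3.0, 4.0], [0.5, 0.5, 0.5, 0.5]]
averaged = smooth_envelopes(envs, lambda tr: [sum(tr) / len(tr)] * len(tr))
print([round(v, 6) for v in averaged[0]])  # [0.75, 1.25, 1.75, 2.25]
```

Any per-track smoother can be passed as `smoother`, such as a moving average or the leaky integrators of Equations (7) to (13). Because the transform is orthonormal, an identity smoother returns the input envelopes up to rounding error.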
Part or all of the above embodiment and its modifications can also be described as in the following supplementary notes, but they are not limited to these.
(Supplementary note 1) A segment processing apparatus comprising: spectral envelope extraction means for extracting, from two consecutive segments, a spectral envelope over a period including the boundary time between the two segments; orthogonal transform means for applying an orthogonal transform to the spectral envelope; smoothing means for smoothing orthogonal transform coefficients, which are expansion coefficients obtained by applying the orthogonal transform to the spectral envelope; inverse transform means for applying the inverse of the orthogonal transform to the corrected orthogonal transform coefficients; and time domain waveform generation means for generating a time domain waveform from the spectral envelope obtained by applying the inverse of the orthogonal transform to the corrected orthogonal transform coefficients.
(Supplementary note 2) The segment processing apparatus according to Supplementary note 1, wherein the smoothing means includes: reference determination means; and correction means. The reference determination means determines the orthogonal transform coefficient at the boundary time between two consecutive segments. It sets this value between (i) the orthogonal transform coefficient of the preceding segment at the time closest to the boundary time and (ii) the orthogonal transform coefficient of the following segment at the time closest to the boundary time. The correction means corrects the orthogonal transform coefficients of the preceding segment and of the following segment based on the orthogonal transform coefficient at the boundary time.
(Supplementary note 3) The segment processing apparatus according to Supplementary note 2, wherein the correction means corrects the orthogonal transform coefficients of the preceding segment and of the following segment such that the amount of change from the uncorrected orthogonal transform coefficient is larger at times closer to the boundary time between the two segments and smaller at times farther from the boundary time.
(Supplementary note 4) The segment processing apparatus according to Supplementary note 1, wherein the smoothing means corrects the orthogonal transform coefficients of the preceding segment and of the following segment by a moving average.
(Supplementary note 5) The segment processing apparatus according to Supplementary note 1, wherein the smoothing means corrects the orthogonal transform coefficients of the preceding segment and of the following segment by calculating, for each time, a sum of two terms. The first term is the uncorrected orthogonal transform coefficient multiplied by a constant. The second term is the corrected orthogonal transform coefficient at the time one fixed period earlier, multiplied by the value obtained by subtracting the constant from 1.
(Supplementary note 6) The segment processing apparatus according to Supplementary note 1, wherein the smoothing means performs the following:
- it calculates a first provisional corrected orthogonal transform coefficient at each time, in chronological order, as the sum of the uncorrected orthogonal transform coefficient multiplied by a first constant and the first provisional corrected orthogonal transform coefficient at the time one fixed period earlier multiplied by the value obtained by subtracting the first constant from 1;
- it calculates a second provisional corrected orthogonal transform coefficient at each time, in reverse chronological order, as the sum of the uncorrected orthogonal transform coefficient multiplied by a second constant and the second provisional corrected orthogonal transform coefficient at the time one fixed period later multiplied by the value obtained by subtracting the second constant from 1; and
- it calculates, as the corrected orthogonal transform coefficient at each time, the average of the first and second provisional corrected orthogonal transform coefficients at that time.
(Supplementary note 7) A segment processing method comprising: extracting, from two consecutive segments, a spectral envelope over a period including the boundary time between the two segments; applying an orthogonal transform to the spectral envelope; smoothing orthogonal transform coefficients, which are expansion coefficients obtained by applying the orthogonal transform to the spectral envelope; applying the inverse of the orthogonal transform to the corrected orthogonal transform coefficients; and generating a time domain waveform from the spectral envelope obtained by applying the inverse of the orthogonal transform to the corrected orthogonal transform coefficients.
(Supplementary note 8) A segment processing program for causing a computer to execute: spectral envelope extraction processing for extracting, from two consecutive segments, a spectral envelope over a period including the boundary time between the two segments; orthogonal transform processing for applying an orthogonal transform to the spectral envelope; smoothing processing for smoothing orthogonal transform coefficients, which are expansion coefficients obtained by applying the orthogonal transform to the spectral envelope; inverse transform processing for applying the inverse of the orthogonal transform to the corrected orthogonal transform coefficients; and time domain waveform generation processing for generating a time domain waveform from the spectral envelope obtained by applying the inverse of the orthogonal transform to the corrected orthogonal transform coefficients.
(Supplementary note 9) A segment processing apparatus comprising: a spectral envelope extraction unit that extracts, from two consecutive segments, a spectral envelope over a period including the boundary time between the two segments; an orthogonal transform unit that applies an orthogonal transform to the spectral envelope; a smoothing unit that smooths orthogonal transform coefficients, which are expansion coefficients obtained by applying the orthogonal transform to the spectral envelope; an inverse transform unit that applies the inverse of the orthogonal transform to the corrected orthogonal transform coefficients; and a time domain waveform generation unit that generates a time domain waveform from the spectral envelope obtained by applying the inverse of the orthogonal transform to the corrected orthogonal transform coefficients.
(Supplementary note 10) The segment processing apparatus according to Supplementary note 9, wherein the smoothing unit includes: a reference determination unit; and a correction unit. The reference determination unit determines the orthogonal transform coefficient at the boundary time between two consecutive segments. It sets this value between (i) the orthogonal transform coefficient of the preceding segment at the time closest to the boundary time and (ii) the orthogonal transform coefficient of the following segment at the time closest to the boundary time. The correction unit corrects the orthogonal transform coefficients of the preceding segment and of the following segment based on the orthogonal transform coefficient at the boundary time.
(Supplementary note 11) The segment processing apparatus according to Supplementary note 10, wherein the correction unit corrects the orthogonal transform coefficients of the preceding segment and of the following segment such that the amount of change from the uncorrected orthogonal transform coefficient is larger at times closer to the boundary time between the two segments and smaller at times farther from the boundary time.
(Supplementary note 12) The segment processing apparatus according to Supplementary note 9, wherein the smoothing unit corrects the orthogonal transform coefficients of the preceding segment and of the following segment by a moving average.
(Supplementary note 13) The segment processing apparatus according to Supplementary note 9, wherein the smoothing unit corrects the orthogonal transform coefficients of the preceding segment and of the following segment by calculating, for each time, a sum of two terms. The first term is the uncorrected orthogonal transform coefficient multiplied by a constant. The second term is the corrected orthogonal transform coefficient at the time one fixed period earlier, multiplied by the value obtained by subtracting the constant from 1.
(Supplementary note 14) The segment processing apparatus according to Supplementary note 9, wherein the smoothing unit performs the following:
- it calculates a first provisional corrected orthogonal transform coefficient at each time, in chronological order, as the sum of the uncorrected orthogonal transform coefficient multiplied by a first constant and the first provisional corrected orthogonal transform coefficient at the time one fixed period earlier multiplied by the value obtained by subtracting the first constant from 1;
- it calculates a second provisional corrected orthogonal transform coefficient at each time, in reverse chronological order, as the sum of the uncorrected orthogonal transform coefficient multiplied by a second constant and the second provisional corrected orthogonal transform coefficient at the time one fixed period later multiplied by the value obtained by subtracting the second constant from 1; and
- it calculates, as the corrected orthogonal transform coefficient at each time, the average of the first and second provisional corrected orthogonal transform coefficients at that time.
This application claims priority based on Japanese Patent Application No. 2011-165707 filed on July 28, 2011, the entire disclosure of which is incorporated herein.
Although the present invention has been described above with reference to the embodiments, it is not limited to those embodiments. Various changes that can be understood by those skilled in the art may be made to the configuration and details of the present invention within its scope.
Industrial Applicability
The present invention is suitably applied to a segment processing apparatus used in segment-selection (unit-selection) speech synthesis, which processes segments so that they connect well.
DESCRIPTION OF SYMBOLS
 1 Correction range determination unit
 2 Spectral envelope extraction unit
 3 Orthogonal transform unit
 4, 14 Smoothing processing unit
 5 Inverse transform unit
 6 Pitch waveform generation unit
 7 Overlap-add unit
 41 Reference value calculation unit
 42 Correction unit

Claims (8)

  1.  A segment processing apparatus comprising:
     spectral envelope extraction means for extracting, from two consecutive segments, a spectral envelope over a period including the boundary time between the two segments;
     orthogonal transform means for applying an orthogonal transform to the spectral envelope;
     smoothing means for smoothing orthogonal transform coefficients, which are expansion coefficients obtained by applying the orthogonal transform to the spectral envelope;
     inverse transform means for applying the inverse of the orthogonal transform to the corrected orthogonal transform coefficients; and
     time domain waveform generation means for generating a time domain waveform from the spectral envelope obtained by applying the inverse of the orthogonal transform to the corrected orthogonal transform coefficients.
  2.  The segment processing apparatus according to claim 1, wherein the smoothing means includes:
     reference determination means for determining the orthogonal transform coefficient at the boundary time between the two consecutive segments as a value lying between (i) the orthogonal transform coefficient of the preceding segment at the time closest to the boundary time and (ii) the orthogonal transform coefficient of the following segment at the time closest to the boundary time; and
     correction means for correcting the orthogonal transform coefficients of the preceding segment and of the following segment based on the orthogonal transform coefficient at the boundary time.
  3.  The segment processing apparatus according to claim 2, wherein the correction means corrects the orthogonal transform coefficients of the preceding segment and of the following segment such that the amount of change from the uncorrected orthogonal transform coefficient is larger at times closer to the boundary time between the two segments and smaller at times farther from the boundary time.
  4.  The segment processing apparatus according to claim 1, wherein the smoothing means corrects the orthogonal transform coefficients of the preceding segment and of the following segment by a moving average.
  5.  The segment processing apparatus according to claim 1, wherein the smoothing means corrects the orthogonal transform coefficients of the preceding segment and of the following segment by calculating, for each time, the sum of the uncorrected orthogonal transform coefficient multiplied by a constant and the corrected orthogonal transform coefficient at the time one fixed period earlier multiplied by the value obtained by subtracting the constant from 1.
  6.  The segment processing apparatus according to claim 1, wherein the smoothing means:
     calculates a first provisional corrected orthogonal transform coefficient at each time, in chronological order, by computing for each time the sum of the pre-correction orthogonal transform coefficient multiplied by a first constant and the first provisional corrected orthogonal transform coefficient at the time one fixed period earlier multiplied by the value obtained by subtracting the first constant from 1;
     calculates a second provisional corrected orthogonal transform coefficient at each time, in reverse chronological order, by computing for each time the sum of the pre-correction orthogonal transform coefficient multiplied by a second constant and the second provisional corrected orthogonal transform coefficient at the time one fixed period later multiplied by the value obtained by subtracting the second constant from 1; and
     calculates, as the corrected orthogonal transform coefficient at each time, the average of the first and second provisional corrected orthogonal transform coefficients at that time.
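Claim 6 runs the recursion of claim 5 once forward and once backward in time and averages the two results, which roughly cancels the lag that a one-directional pass introduces. A minimal sketch, again assuming the fixed period is one frame and that each pass starts from the uncorrected value at its first frame; the function name `exp_smooth_bidirectional` is hypothetical.

```python
def exp_smooth_bidirectional(traj, alpha1, alpha2):
    """Average of forward and backward exponential smoothing of one trajectory."""
    n = len(traj)
    fwd = [0.0] * n
    bwd = [0.0] * n
    # First provisional values: computed in time order (uses the previous frame).
    for t in range(n):
        fwd[t] = traj[t] if t == 0 else alpha1 * traj[t] + (1.0 - alpha1) * fwd[t - 1]
    # Second provisional values: computed in reverse time order (uses the next frame).
    for t in range(n - 1, -1, -1):
        bwd[t] = traj[t] if t == n - 1 else alpha2 * traj[t] + (1.0 - alpha2) * bwd[t + 1]
    # Corrected value at each time: mean of the two provisional values.
    return [(f + b) / 2 for f, b in zip(fwd, bwd)]
```

With equal constants the result is symmetric in time: `exp_smooth_bidirectional([0, 4, 0], 0.5, 0.5)` returns `[0.5, 2.0, 0.5]`, whereas the forward pass alone would give `[0, 2.0, 1.0]`.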
  7.  A segment processing method comprising:
     extracting, from two consecutive segments, a spectral envelope over a period that includes the boundary time between the two segments;
     performing an orthogonal transform on the spectral envelope;
     smoothing the orthogonal transform coefficients, which are the expansion coefficients obtained by performing the orthogonal transform on the spectral envelope;
     performing the inverse of the orthogonal transform on the corrected orthogonal transform coefficients; and
     generating a time-domain waveform from the spectral envelope obtained by performing the inverse transform on the corrected orthogonal transform coefficients.
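The transform, smoothing and inverse-transform steps of claim 7 can be sketched as follows. The claims do not name the orthogonal transform, so an orthonormal DCT-II is assumed here, and the moving average of claim 4 is used as the smoothing step. The input is assumed to be a list of per-frame log-magnitude spectral envelopes already extracted around the boundary; envelope extraction and time-domain waveform generation are outside the scope of this sketch. All function names are hypothetical.

```python
import math


def dct2(x):
    """Orthonormal DCT-II of a list of floats."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def idct2(c):
    """Inverse of dct2 (the orthonormal DCT-III, i.e. the transposed matrix)."""
    n = len(c)
    out = []
    for i in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        s += sum(c[k] * math.sqrt(2.0 / n) * math.cos(math.pi * (i + 0.5) * k / n)
                 for k in range(1, n))
        out.append(s)
    return out


def smooth_envelopes(envelopes, radius):
    """envelopes: list of frames, each a list of log-magnitude envelope values."""
    coeffs = [dct2(frame) for frame in envelopes]  # orthogonal transform per frame
    n_frames, n_coef = len(coeffs), len(coeffs[0])
    smoothed = [[0.0] * n_coef for _ in range(n_frames)]
    for k in range(n_coef):  # smooth each coefficient index along time
        for t in range(n_frames):
            lo, hi = max(0, t - radius), min(n_frames, t + radius + 1)
            smoothed[t][k] = sum(coeffs[u][k] for u in range(lo, hi)) / (hi - lo)
    return [idct2(c) for c in smoothed]  # inverse transform per frame
```

Because both the DCT and the moving average are linear, this particular combination gives the same result as averaging the envelopes directly; the coefficient domain becomes useful when different coefficient indices are smoothed differently, for example only the low-order ones, or with the nonlinear recursions of claims 5 and 6.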
  8.  A segment processing program for causing a computer to execute:
     a spectral envelope extraction process of extracting, from two consecutive segments, a spectral envelope over a period that includes the boundary time between the two segments;
     an orthogonal transform process of performing an orthogonal transform on the spectral envelope;
     a smoothing process of smoothing the orthogonal transform coefficients, which are the expansion coefficients obtained by performing the orthogonal transform on the spectral envelope;
     an inverse transform process of performing the inverse of the orthogonal transform on the corrected orthogonal transform coefficients; and
     a time-domain waveform generation process of generating a time-domain waveform from the spectral envelope obtained by performing the inverse transform on the corrected orthogonal transform coefficients.
PCT/JP2012/004540 2011-07-28 2012-07-13 Fragment processing device, fragment processing method, and fragment processing program WO2013014876A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2011165707 2011-07-28
JP2011-165707 2011-07-28

Publications (1)

Publication Number Publication Date
WO2013014876A1 (en) 2013-01-31

Family

ID=47600753

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2012/004540 WO2013014876A1 (en) 2011-07-28 2012-07-13 Fragment processing device, fragment processing method, and fragment processing program

Country Status (2)

Country Link
JP (1) JPWO2013014876A1 (en)
WO (1) WO2013014876A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0247700A (en) * 1988-08-10 1990-02-16 Nippon Hoso Kyokai (NHK) Speech synthesizing method
JPH03501896A (en) * 1988-09-02 1991-04-25 French Republic Processing device for speech synthesis by adding and superimposing waveforms
JP2009163121A (en) * 2008-01-09 2009-07-23 Toshiba Corp Voice processor, and program therefor

Also Published As

Publication number Publication date
JPWO2013014876A1 (en) 2015-02-23

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 12818118

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2013525565

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 12818118

Country of ref document: EP

Kind code of ref document: A1