JP2007148172A5 - Voice quality control device and method, and program storage medium - Google Patents

Publication number
JP2007148172A5
Authority
JP
Japan
Prior art keywords
speech
unit
priority
units
selection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP2005344737A
Other languages
Japanese (ja)
Other versions
JP2007148172A (en)
JP4664194B2 (en)
Filing date
2005-11-29
Publication date
2008-12-25
2005-11-29 Application filed
2005-11-29 Priority to JP2005344737A
Priority claimed from JP2005344737A
2007-06-14 Publication of JP2007148172A
2008-12-25 Publication of JP2007148172A5
Application granted
2011-04-06 Publication of JP4664194B2
Current legal status: Expired - Fee Related
Anticipated expiration

Claims (8)

A voice quality control device that synthesizes speech from a sequence of speech units and, in accordance with a user's designation, changes speech units contained in the synthesized speech to speech units of a different voice quality, the device comprising:
unit storage means for storing, for each input containing at least phoneme information and prosody information, a plurality of speech units that are candidates for selection;
unit selection means for selecting, for the inputs, a speech unit sequence from the unit groups, each consisting of a plurality of candidate speech units, while maintaining a predetermined continuity between adjacent speech units;
synthesis means for synthesizing speech from the sequence of speech units selected by the unit selection means and presenting it to the user;
input means for accepting input from the user that designates, from a unit group, a speech unit to be selected in preference to the speech unit contained in the synthesized sequence; and
priority determination means for assigning the speech unit designated by the user a priority higher than that of the speech unit selected by the unit selection means,
wherein the unit selection means reselects the speech unit designated by the user from the unit group on the basis of the priority determined by the priority determination means, and further reselects the preceding and following speech units, allowing the same selection as before the reselection,
the voice quality control device further comprising:
connection distortion detection means for detecting, in a speech unit sequence containing a plurality of speech units successively designated by the user, connection distortion between speech units by measuring the continuity between the speech units by a predetermined method; and
priority adjustment means for adjusting the priority of the speech unit determined by the priority determination means when, after the unit selection means has reselected speech units, the connection distortion detection means detects distortion equal to or greater than a predetermined threshold,
wherein the unit selection means further reselects speech units on the basis of the adjusted priority.
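The loop described in this claim (select, let the user prioritize a unit, reselect, check connection distortion, adjust the priority, reselect again) can be sketched in Python as follows. This is only an illustrative sketch, not the patented implementation: the claim leaves the continuity measure and the adjustment rule as "predetermined", so the Viterbi search, the single `pitch` feature used for both target and join costs, the `decay` factor, and all names here are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    uid: str      # hypothetical unit identifier
    pitch: float  # one feature standing in for real spectral/prosodic features


def join_cost(a: Unit, b: Unit) -> float:
    """Connection distortion between adjacent units (assumed: pitch gap)."""
    return abs(a.pitch - b.pitch)


def select(candidates, targets, priority):
    """Viterbi search minimising target cost + join cost - priority bonus."""
    best = []  # best[i][k] = (cost of best path ending at candidates[i][k], back-pointer)
    for i, group in enumerate(candidates):
        row = []
        for u in group:
            local = abs(u.pitch - targets[i]) - priority.get(u.uid, 0.0)
            if i == 0:
                row.append((local, None))
                continue
            prev_group = candidates[i - 1]
            prev = min(
                range(len(prev_group)),
                key=lambda j: best[i - 1][j][0] + join_cost(prev_group[j], u),
            )
            cost = best[i - 1][prev][0] + join_cost(prev_group[prev], u) + local
            row.append((cost, prev))
        best.append(row)
    # Trace back from the cheapest final unit.
    k = min(range(len(best[-1])), key=lambda j: best[-1][j][0])
    path = []
    for i in range(len(candidates) - 1, -1, -1):
        path.append(candidates[i][k])
        k = best[i][k][1]
    return path[::-1]


def max_distortion(path):
    """Largest connection distortion between any two adjacent units."""
    return max((join_cost(a, b) for a, b in zip(path, path[1:])), default=0.0)


def reselect_with_adjustment(candidates, targets, priority, threshold,
                             decay=0.5, max_rounds=10):
    """Reselect; while distortion >= threshold, scale priorities down and retry."""
    priority = dict(priority)
    path = select(candidates, targets, priority)
    for _ in range(max_rounds):
        if max_distortion(path) < threshold:
            break
        priority = {uid: p * decay for uid, p in priority.items()}
        path = select(candidates, targets, priority)
    return path, priority


# Hypothetical data: three inputs, two candidate units each.
cands = [
    [Unit("a1", 100.0), Unit("a2", 140.0)],
    [Unit("b1", 100.0), Unit("b2", 105.0)],
    [Unit("c1", 100.0), Unit("c2", 160.0)],
]
targets = [100.0, 100.0, 100.0]
path, adjusted = reselect_with_adjustment(cands, targets, {"c2": 200.0}, threshold=50.0)
print([u.uid for u in path], adjusted)
```

Because the whole sequence is searched again on every pass, the units before and after the designated one are free to stay the same or change, which corresponds to the claim's "allowing the same selection as before the reselection".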
The voice quality control device according to claim 1, wherein the priority adjustment means adjusts the priority on the basis of at least one of: the relative magnitudes of the priorities determined for speech units designated close to one another in the speech unit sequence, the temporal relationship between the designations of those speech units, and the number of times each designated speech unit has been designated.
The voice quality control device according to claim 2, wherein the priority adjustment means determines, on the basis of a function that expresses priority with time as its variable, the relative priority of the speech units according to the temporal order of their designation, and, for a speech unit determined to have the lower priority, adjusts the priority so that a speech unit, including the preceding and following speech units, having smaller connection distortion with the speech unit determined to have the higher priority is reselected from the unit group.
The voice quality control device according to claim 3, wherein the function takes positive values and is monotonically increasing or monotonically decreasing according to the sign of the first-order coefficient of the variable, and the first-order coefficient is set to a positive value when later user designations are to be given more weight and to a negative value when earlier designations are to be given more weight.
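One function with the stated properties is a linear function of designation time whose intercept is chosen so that it stays positive over a bounded interval. The sketch below assumes that form; the claim does not fix the function, and the names `make_priority_fn`, `lower_priority`, `t_max`, and `floor` are hypothetical.

```python
def make_priority_fn(a: float, t_max: float, floor: float = 1.0):
    """Return f(t) = a*t + b with b chosen so that f(t) >= floor > 0 on [0, t_max].

    a > 0 favours later designations; a < 0 favours earlier ones.
    """
    if floor <= 0 or t_max < 0:
        raise ValueError("floor must be positive and t_max non-negative")
    # For a negative slope the minimum is at t_max, so lift the intercept.
    b = floor if a >= 0 else floor - a * t_max

    def f(t: float) -> float:
        if not 0 <= t <= t_max:
            raise ValueError("t outside [0, t_max]")
        return a * t + b

    return f


def lower_priority(designation_times: dict, f) -> str:
    """Return the id of the designated unit with the smallest priority f(t)."""
    return min(designation_times, key=lambda uid: f(designation_times[uid]))


# Hypothetical designation times (seconds since editing began).
times = {"u1": 2.0, "u2": 8.0}
favour_later = make_priority_fn(0.5, t_max=10.0)
favour_earlier = make_priority_fn(-0.5, t_max=10.0)
print(lower_priority(times, favour_later), lower_priority(times, favour_earlier))
```

The unit returned by `lower_priority` is the one whose priority would be adjusted so that it, together with its neighbours, is reselected to connect more smoothly with the higher-priority unit.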
The voice quality control device according to claim 2, wherein the unit storage means stores a plurality of speech units clustered on the basis of similarity, and,
when the priority adjusted by the priority adjustment means is smaller than a predetermined threshold, the priority of the cluster to which the adjusted speech unit belongs is set to the priority adjusted by the priority adjustment means.
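A minimal sketch of this cluster rule follows. It assumes that "the priority of the cluster" means the priority applied to every unit in that cluster; the claim does not say how cluster priority is represented, and the function and parameter names are hypothetical.

```python
def propagate_to_cluster(priority, cluster_of, adjusted_uid, threshold):
    """Copy a low adjusted priority to every unit in the same cluster.

    priority:     dict mapping unit id -> priority
    cluster_of:   dict mapping unit id -> cluster label
    adjusted_uid: the unit whose priority was just adjusted
    Returns a new dict; the input is left unchanged.
    """
    updated = dict(priority)
    p = updated[adjusted_uid]
    if p < threshold:
        target = cluster_of[adjusted_uid]
        for uid, label in cluster_of.items():
            if label == target:
                updated[uid] = p
    return updated


# Hypothetical priorities and similarity clusters.
prio = {"u1": 0.2, "u2": 0.9, "u3": 0.7}
clusters = {"u1": "A", "u2": "A", "u3": "B"}
print(propagate_to_cluster(prio, clusters, "u1", threshold=0.5))
```

The effect is that once one unit has been demoted below the threshold, acoustically similar units are demoted with it, so the next reselection does not simply pick a near-identical neighbour.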
The voice quality control device according to claim 1, further comprising
display means for displaying the sequence of speech units presented to the user through the speech synthesis, together with the unit groups that were candidates for selection of the speech units contained in the sequence,
wherein the input means accepts input designating, from the displayed unit groups, a speech unit to be selected preferentially.
A voice quality control method for synthesizing speech from a sequence of speech units and, in accordance with a user's designation, changing speech units contained in the synthesized speech to speech units of a different voice quality, the method comprising:
a unit storage step of storing, for each input containing at least phoneme information and prosody information, a plurality of speech units that are candidates for selection;
a unit selection step of selecting, for the inputs, a speech unit sequence from the unit groups, each consisting of a plurality of candidate speech units, while maintaining a predetermined continuity between adjacent speech units;
a synthesis step of synthesizing speech from the sequence of speech units selected in the unit selection step and presenting it to the user;
an input step of accepting input from the user that designates, from a unit group, a speech unit to be selected in preference to the speech unit contained in the synthesized sequence; and
a priority determination step of assigning the speech unit designated by the user a priority higher than that of the speech unit selected by the unit selection means,
wherein, in the unit selection step, the speech unit designated by the user is reselected from the unit group on the basis of the priority determined in the priority determination step, and the preceding and following speech units are further reselected, allowing the same selection as before the reselection,
the voice quality control method further comprising:
a connection distortion detection step of detecting, in a speech unit sequence containing a plurality of speech units successively designated by the user, connection distortion between speech units by measuring the continuity between the speech units by a predetermined method; and
a priority adjustment step of adjusting the priority of the speech unit determined in the priority determination step when, after speech units have been reselected in the unit selection step, distortion equal to or greater than a predetermined threshold is detected in the connection distortion detection step,
wherein, in the unit selection step, speech units are further reselected on the basis of the adjusted priority.
A program for a voice quality control device that synthesizes speech from a sequence of speech units and, in accordance with a user's designation, changes speech units contained in the synthesized speech to speech units of a different voice quality, the program causing a computer to execute:
a unit storage step of storing, for each input containing at least phoneme information and prosody information, a plurality of speech units that are candidates for selection;
a unit selection step of selecting, for the inputs, a speech unit sequence from the unit groups, each consisting of a plurality of candidate speech units, while maintaining a predetermined continuity between adjacent speech units;
a synthesis step of synthesizing speech from the sequence of speech units selected in the unit selection step and presenting it to the user;
an input step of accepting input from the user that designates, from a unit group, a speech unit to be selected in preference to the speech unit contained in the synthesized sequence; and
a priority determination step of assigning the speech unit designated by the user a priority higher than that of the speech unit selected by the unit selection means,
wherein, in the unit selection step, the speech unit designated by the user is reselected from the unit group on the basis of the priority determined in the priority determination step, and the preceding and following speech units are further reselected, allowing the same selection as before the reselection,
the program further causing the computer to execute:
a connection distortion detection step of detecting, in a speech unit sequence containing a plurality of speech units successively designated by the user, connection distortion between speech units by measuring the continuity between the speech units by a predetermined method; and
a priority adjustment step of adjusting the priority of the speech unit determined in the priority determination step when, after speech units have been reselected in the unit selection step, distortion equal to or greater than a predetermined threshold is detected in the connection distortion detection step,
wherein, in the unit selection step, speech units are further reselected on the basis of the adjusted priority.
JP2005344737A 2005-11-29 2005-11-29 Voice quality control device and method, and program storage medium Expired - Fee Related JP4664194B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2005344737A JP4664194B2 (en) 2005-11-29 2005-11-29 Voice quality control device and method, and program storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP2005344737A JP4664194B2 (en) 2005-11-29 2005-11-29 Voice quality control device and method, and program storage medium

Publications (3)

Publication Number Publication Date
JP2007148172A (en) 2007-06-14
JP2007148172A5 (en) 2008-12-25
JP4664194B2 (en) 2011-04-06

Family

ID=38209625

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2005344737A Expired - Fee Related JP4664194B2 (en) 2005-11-29 2005-11-29 Voice quality control device and method, and program storage medium

Country Status (1)

Country Link
JP (1) JP4664194B2 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5198200B2 (en) * 2008-09-25 2013-05-15 株式会社東芝 Speech synthesis apparatus and method
JP2011180368A (en) * 2010-03-01 2011-09-15 Fujitsu Ltd Synthesized voice correction device and synthesized voice correction method
JP5123347B2 (en) * 2010-03-31 2013-01-23 株式会社東芝 Speech synthesizer
JP5648347B2 (en) * 2010-07-14 2015-01-07 ヤマハ株式会社 Speech synthesizer
KR101201913B1 (en) * 2010-11-08 2012-11-15 주식회사 보이스웨어 Voice Synthesizing Method and System Based on User Directed Candidate-Unit Selection
JP5712818B2 (en) * 2011-06-30 2015-05-07 富士通株式会社 Speech synthesis apparatus, sound quality correction method and program

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3281281B2 (en) * 1996-03-12 2002-05-13 株式会社東芝 Speech synthesis method and apparatus
JP3423276B2 (en) * 2000-08-10 2003-07-07 三洋電機株式会社 Voice synthesis method
JP2004145015A (en) * 2002-10-24 2004-05-20 Fujitsu Ltd System and method for text speech synthesis
JP4311710B2 (en) * 2003-02-14 2009-08-12 株式会社アルカディア Speech synthesis controller
JP2005181998A (en) * 2003-11-28 2005-07-07 Matsushita Electric Ind Co Ltd Speech synthesizer and speech synthesizing method

Similar Documents

Publication Publication Date Title
JP2007148172A5 (en)
JP6264081B2 (en) Program, method and system for selecting video segment, and program for selecting segment of media clip
TWI337740B (en)
US9087501B2 (en) Sound signal analysis apparatus, sound signal analysis method and sound signal analysis program
US9747876B1 (en) Adaptive layout of sheet music in coordination with detected audio
CN104517605B (en) A kind of sound bite splicing system and method for phonetic synthesis
JP2001265326A (en) Performance position detecting device and score display device
JP2006295514A5 (en)
JP2016156996A (en) Electronic device, method, and program
JP5888356B2 (en) Voice search device, voice search method and program
JP4664194B2 (en) Voice quality control device and method, and program storage medium
JP2012108451A (en) Audio processor, method and program
US8728822B2 (en) Method, apparatus, and recording medium for performance game
CN104205212A (en) Talker collision in auditory scene
GB2506404A (en) Computer implemented iterative method of cross-fading between two audio tracks
JP4921343B2 (en) Performance evaluation device, program and electronic musical instrument
JPWO2018207936A1 (en) Automatic musical score detection method and device
CN103974142B (en) A kind of video broadcasting method and system
JP2006313176A (en) Speech synthesizer
JP4687297B2 (en) Image processing apparatus and program
JP4442239B2 (en) Voice speed conversion device and voice speed conversion method
JP7232654B2 (en) karaoke equipment
JP4846548B2 (en) Audio information selection device and audio information selection method
JP6569343B2 (en) Voice search device, voice search method and program
WO2017164216A1 (en) Acoustic processing method and acoustic processing device