WO2012043168A1 - Voice conversion device, mobile phone terminal, voice conversion method, and recording medium - Google Patents
Voice conversion device, mobile phone terminal, voice conversion method, and recording medium
- Publication number
- WO2012043168A1 (PCT/JP2011/070248)
- Authority
- WO
- WIPO (PCT)
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/221—Announcement of recognition results
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M2250/00—Details of telephonic subscriber devices
- H04M2250/70—Details of telephonic subscriber devices methods for entering alphabetical characters, e.g. multi-tap or dictionary disambiguation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M2250/00—Details of telephonic subscriber devices
- H04M2250/74—Details of telephonic subscriber devices with voice recognition means
Definitions
- The present invention relates to a voice conversion device, a mobile phone terminal, a voice conversion method, and a recording medium.
- Patent Document 1 describes a voice recognition device that, when an error in a speech recognition result is corrected by a user's correction utterance, stores the contents of the correction, specifically the speech recognition result before correction and the speech recognition result after correction.
- An object of the present invention is to provide a voice conversion device, a mobile phone terminal, a voice conversion method, and a recording medium that can solve the above-described problems.
- The voice conversion device of the present invention includes: a voice recognition unit that converts a voice into a character string each time a voice is received; a display unit that displays the character string; a correction unit that, upon receiving a correction instruction to correct a phrase that is a part of the character string displayed on the display unit, corrects the phrase in accordance with the correction instruction; a storage unit that stores the contents of the correction of the phrase executed by the correction unit; and a control unit that, when the voice recognition unit converts a voice into a character string and the contents of a correction for a phrase in the character string are stored in the storage unit, generates a selection candidate reflecting the contents of the correction and displays the selection candidate on the display unit as a recognition result candidate of the voice.
- The voice conversion device of the present invention, in another aspect, is a voice conversion device capable of communicating with a voice recognition device that converts voice data into a character string each time the voice data is received and transmits the character string to the transmission source of the voice data, and includes: an output unit that converts an input voice into voice data; a communication unit that transmits the voice data to the voice recognition device and then receives, from the voice recognition device, a character string that is the conversion result of the voice data; a display unit that displays the character string; a correction unit that, upon receiving a correction instruction to correct a phrase that is a part of the character string displayed on the display unit, corrects the phrase in the character string in accordance with the correction instruction; a storage unit that stores the contents of the correction of the phrase executed by the correction unit; and a control unit that, when the communication unit receives a character string from the voice recognition device and the contents of a correction for a phrase in the character string are stored in the storage unit, generates a selection candidate reflecting the contents of the correction and displays the selection candidate on the display unit as a recognition result candidate of the voice.
- The voice conversion method of the present invention is a voice conversion method performed by a voice conversion device, in which: each time a voice is received, the voice is converted into a character string; the character string is displayed on a display unit; upon receipt of a correction instruction to correct a phrase that is a part of the character string displayed on the display unit, the phrase is corrected in accordance with the correction instruction; the contents of the correction of the corrected phrase are stored in a storage unit; and when the voice is converted into a character string and the contents of a correction for a phrase in the character string are stored in the storage unit, a selection candidate reflecting the contents of the correction is generated and displayed on the display unit as a recognition result candidate of the voice.
- The voice conversion method of the present invention, in another aspect, is performed by a voice conversion device capable of communicating with a voice recognition device that converts voice data into a character string each time the voice data is received and transmits the character string to the transmission source of the voice data. In this method: an input voice is converted into voice data; the voice data is transmitted to the voice recognition device, after which a character string that is the conversion result of the voice data is received from the voice recognition device; the character string is displayed on a display unit; upon receipt of a correction instruction to correct a phrase that is a part of the character string displayed on the display unit, the phrase in the character string is corrected in accordance with the correction instruction; the contents of the correction are stored in a storage unit; and when a character string is received from the voice recognition device and the contents of a correction for a phrase in the character string are stored in the storage unit, a selection candidate reflecting the contents of the correction is generated and displayed on the display unit as a recognition result candidate of the voice.
- The recording medium of the present invention is a computer-readable recording medium storing a program for causing a computer to execute: a voice recognition procedure of converting a voice into a character string each time a voice is received; a display procedure of displaying the character string on a display unit; a correction procedure of, upon receiving a correction instruction to correct a phrase that is a part of the character string displayed on the display unit, correcting the phrase in accordance with the correction instruction; a storage procedure of storing, in a storage unit, the contents of the correction of the corrected phrase; and a control procedure of, when the voice is converted into a character string in the voice recognition procedure and the contents of a correction for a phrase in the character string are stored in the storage unit, generating a selection candidate reflecting the contents of the correction and displaying the selection candidate on the display unit as a recognition result candidate of the voice.
- The recording medium of the present invention, in another aspect, is a computer-readable recording medium storing a program for causing a computer capable of communicating with a voice recognition device that converts voice data into a character string each time the voice data is received and transmits the character string to the transmission source of the voice data to execute: an output procedure of converting an input voice into voice data; a communication procedure of transmitting the voice data to the voice recognition device and then receiving, from the voice recognition device, a character string that is the conversion result of the voice data; a display procedure of displaying the character string on a display unit; a correction procedure of, upon receiving a correction instruction to correct a phrase that is a part of the character string displayed on the display unit, correcting the phrase in the character string in accordance with the correction instruction; a storage procedure of storing, in a storage unit, the contents of the correction of the corrected phrase; and a control procedure of, when a character string is received from the voice recognition device and the contents of a correction for a phrase in the character string are stored in the storage unit, generating a selection candidate reflecting the contents of the correction and displaying the selection candidate on the display unit as a recognition result candidate of the voice.
- FIG. 6 is a diagram for explaining the operation of the mobile phone terminal 1.
- FIG. 1 is a block diagram showing a mobile phone terminal 1 according to an embodiment of the present invention.
- The mobile phone terminal 1 has a function of handling character data such as e-mail.
- The mobile phone terminal 1 includes a voice conversion device 10 according to an embodiment of the present invention.
- The voice conversion device 10 includes a conversion unit 11, a display unit 12, a correction unit 13, a storage device 14, a control unit 15, a communication unit 16, and an antenna 17.
- The conversion unit 11 includes a microphone 11a and a voice recognition unit 11b.
- The correction unit 13 includes an operation unit 13a and a character editing unit 13b.
- The conversion unit 11 can generally be referred to as voice recognition means.
- Each time it receives a voice, the conversion unit 11 converts the voice into a character string by performing voice recognition processing on the voice.
- The microphone 11a can generally be referred to as output means. Each time it accepts an input of the user's voice, the microphone 11a converts the voice into voice data and outputs the voice data. The voice data is provided to the voice recognition unit 11b via the control unit 15, for example.
- Each time it receives voice data, the voice recognition unit 11b performs voice recognition processing on the voice data, thereby converting the voice data into a character string and outputting the character string.
- The voice recognition unit 11b outputs a string of kana characters (katakana or hiragana) as the character string.
- The display unit 12 can generally be referred to as display means.
- The display unit 12 displays the character string output from the voice recognition unit 11b.
- The display unit 12 also displays the character editing status of the character editing unit 13b.
- The correction unit 13 can generally be referred to as correction means.
- The correction unit 13 receives a correction instruction to correct a phrase (a phrase composed of one or more characters) that is a part of the character string output by the voice recognition unit 11b.
- The correction instruction designates the phrase to be corrected and indicates the phrase after correction.
- The correction unit 13 corrects the phrase designated for correction by the correction instruction, among the phrases in the character string, to the phrase indicated by the correction instruction as the corrected phrase.
- Hereinafter, the phrase designated for correction by the correction instruction is referred to as the "pre-correction phrase", and the phrase indicated by the correction instruction as the corrected phrase is referred to as the "post-correction phrase".
- The operation unit 13a is an operation button.
- The operation button may be displayed on the display unit 12.
- When operated by the user, the operation unit 13a receives various inputs (for example, correction instructions) from the user.
- The correction instruction is provided to the character editing unit 13b via the control unit 15.
- When the character editing unit 13b receives a correction instruction, it edits the character string output from the voice recognition unit 11b in accordance with the correction instruction. In the present embodiment, upon receiving the correction instruction, the character editing unit 13b replaces the pre-correction phrase in the character string with the post-correction phrase.
- The storage device 14 can generally be referred to as storage means.
- The storage device 14 stores dictionaries (dictionary data) necessary for the character editing in the character editing unit 13b and the voice recognition processing in the voice recognition unit 11b.
- The storage device 14 also stores the contents of corrections (pairs of a pre-correction phrase and a post-correction phrase) executed by the character editing unit 13b.
- Specifically, the storage device 14 stores a difference dictionary (difference dictionary data) indicating the contents of the corrections.
- The difference dictionary holds pre-correction phrases and post-correction phrases in association with each other.
- The control unit 15 can generally be referred to as control means.
- The control unit 15 controls each unit in the mobile phone terminal 1.
- When the voice recognition unit 11b converts a voice into a character string and the contents of a correction for a phrase in the character string are stored in the storage device 14, the control unit 15 generates a selection candidate reflecting the contents of the correction and displays the selection candidate on the display unit 12 as a speech recognition result candidate.
- Specifically, when a phrase in the character string is stored in the storage device 14 as a pre-correction phrase, the control unit 15 generates, as a selection candidate, a replacement character string in which the pre-correction phrase is replaced with the post-correction phrase associated with that pre-correction phrase.
- The control unit 15 displays the post-correction phrase on the display unit 12 in a display form different from that of the other characters in the replacement character string. For example, the control unit 15 displays the corrected characters in a different color, a different size, or a different typeface from the other characters in the replacement character string.
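The replacement and highlighting described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name, the bracket markers standing in for the different display form, and the kana pair are all illustrative.

```python
# Sketch of generating a replacement character string: a stored pre-correction
# phrase is replaced by its post-correction phrase, and the replaced span is
# wrapped in markers standing in for the different display form (a different
# color, size, or typeface on the actual display unit).

def make_candidate(recognized: str, pre: str, post: str,
                   mark: tuple = ("[", "]")) -> str:
    """Replace `pre` with `post` in `recognized`, marking the replaced span."""
    open_mark, close_mark = mark
    return recognized.replace(pre, open_mark + post + close_mark)

# With an illustrative recognition result and a stored correction すう -> しゅう:
print(make_candidate("へんすう", "すう", "しゅう"))  # へん[しゅう]
```

On a real terminal the markers would be replaced by styling attributes; the string operation itself is the same.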
- The communication unit 16 can generally be referred to as communication means.
- The communication unit 16 transmits the voice data output from the microphone 11a to the speech recognition device 2 via the antenna 17, and then receives, via the antenna 17, a character string that is the conversion result of the voice data from the speech recognition device 2.
- Each time it receives voice data, the speech recognition device 2 converts the voice data into a character string and transmits the conversion result (the character string) to the transmission source of the voice data.
- FIG. 2 is a diagram showing an example of a difference dictionary (database) stored in the storage device 14.
- The difference dictionary 14A is provided with a plurality of recognition result difference storage areas 14A1.
- In each recognition result difference storage area 14A1, the control unit 15 registers recognition result difference information (the contents of a correction) representing the difference between the speech recognition result of the voice recognition unit 11b and the user's intended input.
- Each recognition result difference storage area 14A1 has a recognition result kana storage area 14A2, a correction result kana storage area 14A3, and a difference occurrence count storage area 14A4.
- In the recognition result kana storage area 14A2, the kana of the phrase designated for correction by a correction instruction (the pre-correction phrase), out of the kana character string output from the voice recognition unit 11b, is stored (hereinafter, the "recognition result kana").
- In the correction result kana storage area 14A3, the kana of the phrase indicated by the correction instruction as the corrected phrase (the post-correction phrase) is stored (hereinafter, the "correction result kana").
- In the difference occurrence count storage area 14A4, the number of times the "recognition result kana" stored in the recognition result kana storage area 14A2 has been corrected to the "correction result kana" stored in the correction result kana storage area 14A3 is stored (hereinafter, the "difference occurrence count").
- In other words, the storage device 14 stores a plurality of pairs of a pre-correction phrase and a post-correction phrase and, for each pair, the number of times the correction specified by that pair has been executed (hereinafter, the "execution count").
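The difference dictionary structure just described (pairs of pre-correction and post-correction kana with an execution count) can be sketched as a small mapping. The class and method names below are illustrative, not taken from the patent, and the registered kana pairs are invented examples.

```python
# Sketch of the difference dictionary (14A): each entry pairs a recognition
# result kana (pre-correction phrase) with a correction result kana
# (post-correction phrase) and counts how often that correction occurred.

class DifferenceDictionary:
    def __init__(self):
        # (pre_correction, post_correction) -> difference occurrence count
        self.entries = {}

    def register(self, pre: str, post: str) -> None:
        """Record one occurrence of correcting `pre` to `post`."""
        key = (pre, post)
        self.entries[key] = self.entries.get(key, 0) + 1

    def lookup(self, recognized: str):
        """Return entries whose pre-correction kana appears in `recognized`."""
        return [(pre, post, count)
                for (pre, post), count in self.entries.items()
                if pre in recognized]

diff = DifferenceDictionary()
diff.register("しゅ", "ちょ")
diff.register("しゅ", "ちょ")   # second occurrence of the same correction
diff.register("すう", "しゅう")
print(diff.lookup("へんしゅう"))  # entries matching part of the recognition result
```

`lookup` corresponds to the partial-match search against the current recognition result (steps 303 and 304 in the flowchart); a production version would also persist the entries.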
- For each phrase in the character string, when that phrase is stored as a pre-correction phrase, the control unit 15 generates, as a selection candidate, a replacement character string in which the phrase is replaced with the post-correction phrase paired with that pre-correction phrase.
- The control unit 15 determines the display order of the selection candidates on the display unit 12 based on the execution count of the pair used to generate each selection candidate and the number of characters of the pre-correction phrase used to generate it.
- Specifically, the control unit 15 gives each selection candidate a value that increases with the execution count and with the number of characters of the pre-correction phrase, and displays the selection candidates on the display unit 12 in descending order of that value.
- The voice conversion device 10 may be realized by a computer.
- In that case, the computer reads and executes a program recorded on a computer-readable recording medium such as a CD-ROM (Compact Disc Read Only Memory), and thereby functions as the conversion unit 11, the display unit 12, the correction unit 13, the storage device 14, and the control unit 15.
- The recording medium is not limited to a CD-ROM and can be changed as appropriate.
- Difference information representing differences in reading kana between the user's utterance and the character string output as the speech recognition result is accumulated in the storage device 14 of the mobile phone terminal 1.
- The mobile phone terminal 1 generates a selection candidate reflecting the difference information for the result of the speech recognition processing executed by the voice recognition unit 11b, and displays the selection candidate as a speech recognition result candidate.
- Specifically, the mobile phone terminal 1 generates, as a selection candidate, a replacement character string in which the pre-correction phrase (recognition result kana) in the character string output from the voice recognition unit 11b is replaced with the post-correction phrase (correction result kana).
- The corrected characters in the replacement character string are displayed in a different color, a different size, or a different typeface from the other characters.
- FIG. 3 is a flowchart for explaining the operation of the mobile phone terminal 1 according to the user's operation.
- When the user performs character input by voice on the mobile phone terminal 1, the user performs voice input by uttering the words to be input into the microphone 11a (step 301).
- The input voice is converted into voice data by the microphone 11a, and voice recognition processing on the voice data is then executed by the voice recognition unit 11b or the external speech recognition device 2. The control unit 15 then obtains kana information (a character string) as the voice recognition result (step 302).
- The control unit 15 generates recognition result candidates based on the kana information (character string) that is the voice recognition result.
- The character editing unit 13b executes kanji conversion processing on the recognition result candidates.
- The control unit 15 displays the kanji-converted recognition result candidates on the display unit 12.
- Specifically, the control unit 15 collates the kana information that is the current speech recognition result with the difference information stored in the difference dictionary 14A (step 303), and searches for a recognition result kana, among those indicated in the difference information, that matches a part of the kana information that is the current speech recognition result (step 304).
- Suppose that the difference dictionary 14A stores the difference information shown in FIG. 4, that the user utters "Hencho", and that the voice is recognized by the speech recognition engine in the voice recognition unit 11b or the speech recognition engine in the speech recognition device 2.
- Suppose also that the kana information that is the recognition result is "Henshu".
- When the control unit 15 collates the kana information that is the current speech recognition result with the recognition result kana in the difference dictionary 14A, "shu" and "shu" are obtained as partially matching recognition result kana.
- The control unit 15 then creates recognition result candidate kana (replacement character strings) in which, in the kana information that is the current speech recognition result, the kana matching a recognition result kana is replaced with the correction result kana associated with that recognition result kana (step 305).
- The control unit 15 calculates an importance for each recognition result candidate kana as importance = A × a + B × b, where a is the character string length of the matched recognition result kana, b is the difference occurrence count, A is the recognition result kana coefficient, and B is the difference occurrence count coefficient; both coefficients are stored in the control unit 15 in advance.
- The longer the character string of the recognition result kana, the higher the possibility that the candidate is similar to the utterance, and the difference occurrence count reflects the frequency with which the recognition difference occurs; the importance is calculated by combining these values.
- The control unit 15 displays the recognition result candidate kana "Hencho", created using recognition result difference 1, and "Henshuu", created using recognition result difference 2, on the display unit 12 in descending order of importance.
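The importance calculation and ordering above can be sketched directly from the formula importance = A × a + B × b. The coefficient values, candidate strings, and counts below are illustrative, not values from the patent.

```python
# Sketch of ranking candidates by importance = A*a + B*b, where a is the
# character length of the matched recognition result kana and b is its
# difference occurrence count. A and B are device-configured coefficients;
# the values here are illustrative.

A = 1.0  # recognition result kana (string length) coefficient
B = 2.0  # difference occurrence count coefficient

def importance(pre_kana: str, occurrence_count: int) -> float:
    return A * len(pre_kana) + B * occurrence_count

# Candidates: (candidate string, matched pre-correction kana, occurrence count)
candidates = [
    ("へんちょう", "しゅ", 3),  # importance = 1*2 + 2*3 = 8
    ("へんしゅう", "すう", 1),  # importance = 1*2 + 2*1 = 4
]
ranked = sorted(candidates, key=lambda c: importance(c[1], c[2]), reverse=True)
print([c[0] for c in ranked])  # highest importance first
```

A longer matched kana and a more frequent correction both push a candidate toward the top of the list, matching the ordering rule in the text.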
- The recognition result candidate kana is collated by the character editing unit 13b with the character strings registered in the Japanese dictionary, and is displayed as a recognition result candidate only when it matches Japanese registered in the dictionary. If the recognition result candidate kana does not match Japanese registered in the dictionary, the character editing unit 13b determines that the recognition result candidate kana is not a correct Japanese word, and the control unit 15 does not adopt it as a recognition result candidate.
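The dictionary check just described amounts to a membership filter. The word set below is an illustrative stand-in for the Japanese dictionary data held in the storage device; the function name is likewise illustrative.

```python
# Sketch of filtering candidate kana against a Japanese dictionary: a
# candidate survives only if it matches a word registered in the dictionary.
JAPANESE_DICTIONARY = {"へんしゅう", "へんすう", "へんちょう"}  # illustrative stand-in

def filter_candidates(candidates, dictionary=JAPANESE_DICTIONARY):
    """Keep only candidates that are registered Japanese words."""
    return [c for c in candidates if c in dictionary]

print(filter_candidates(["へんしゅう", "へんしゆ"]))  # ['へんしゅう']
```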
- The recognition result candidate kana is displayed as a recognition result candidate together with the kana information that is the current speech recognition result (step 306).
- The kana information that is the current speech recognition result is displayed at the top, followed by the recognition result candidates in descending order of importance.
- The control unit 15 displays, as the recognition result candidates, the results of the kanji conversion performed by the character editing unit 13b on the recognition result candidate kana.
- The control unit 15 likewise displays, as a recognition result candidate, a character string obtained by converting the kana information that is the speech recognition result into kanji.
- The user selects, from the displayed recognition result candidates, the character string that matches the uttered character string (step 307).
- When the selected character string is the kana information that is the current speech recognition result, the control unit 15 determines that the user's utterance matches the speech recognition result and does not change the difference dictionary (step 308).
- Otherwise, the control unit 15 determines that there is a difference between the user's utterance and the speech recognition result, acquires the kana difference, and registers the difference in the difference dictionary (step 310).
- The registration of the difference information is not limited to word or phrase units: a pair of the recognition result kana "shu" and the correction result kana "so", in which only the corrected part is extracted, or a pair of the recognition result kana "shu" and the correction result kana "sou", in which the character strings before and after the corrected part are added, may be registered in the difference dictionary.
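Extracting the corrected span, either alone or with surrounding characters, can be sketched with `difflib`. This is one possible realization under assumed behavior, not the method the patent prescribes, and the kana strings are illustrative.

```python
# Sketch of extracting the kana difference between a recognition result and
# the user's corrected string, either as the minimal changed span (context=0)
# or widened by `context` characters on each side.
import difflib

def extract_diff(recognized: str, corrected: str, context: int = 0):
    sm = difflib.SequenceMatcher(None, recognized, corrected)
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag != "equal":
            lo = max(0, i1 - context)
            hi = min(len(recognized), i2 + context)
            # map the widened window onto the corrected string
            pre = recognized[lo:hi]
            post = corrected[max(0, j1 - (i1 - lo)):j2 + (hi - i2)]
            return pre, post
    return None  # strings are identical; nothing to register

print(extract_diff("へんすう", "へんしゅう"))             # ('す', 'しゅ')
print(extract_diff("へんすう", "へんしゅう", context=1))  # ('んすう', 'んしゅう')
```

The two calls correspond to the two registration granularities in the text: the corrected part alone, and the corrected part with its surrounding characters attached.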
- The updated difference dictionary is reflected in the next speech recognition.
- As described above, the control unit 15 stores the contents of corrections for phrases in character strings in the storage device 14 and, when the contents of a correction for a phrase in a newly recognized character string are stored, generates a selection candidate reflecting the contents of the correction and displays the selection candidate on the display unit 12 as a recognition result candidate of the character string.
- When the conversion unit 11 converts a voice into a character string and a phrase in the character string is stored in the storage device 14 as a pre-correction phrase, the control unit 15 generates, as a selection candidate, a replacement character string in which the pre-correction phrase in the character string is replaced with the post-correction phrase associated with that pre-correction phrase. In this case, there is a high possibility that a correction made before will be reproduced.
- The control unit 15 displays the post-correction phrase on the display unit 12 in a display form different from that of the other characters in the replacement character string.
- For example, the control unit 15 displays the corrected characters in a different color, a different size, or a different typeface from the other characters. In this case, it is possible to emphasize to the user what kind of replacement has been performed, making it easy for the user to notice speech recognition errors caused by the user's speech habits or the characteristics of the microphone.
- The difference information is reflected in the speech recognition result as information indicating the user's speech habits or the characteristics of the microphone, without depending on the speech recognition engine, and the reflection result is presented.
- Instead of A × a + B × b using the character string length and the number of occurrences, another calculation formula may be used that takes as parameters time information such as the data update date, or information obtained by comparing the recognition result kana and the correction result kana and quantifying the similarity of consonants (such as "ma" and "mu") and vowels (such as "ka" and "ha").
- Data may be registered in the difference dictionary not only when speech recognition is performed but also by direct editing by the user.
Description
10 voice conversion device
11 conversion unit
11a microphone
11b voice recognition unit
12 display unit
13 correction unit
13a operation unit
13b character editing unit
14 storage device
15 control unit
16 communication unit
17 antenna
2 speech recognition device
Claims (10)
- 1. A voice conversion device comprising: voice recognition means for converting a voice into a character string each time a voice is received; display means for displaying the character string; correction means for, upon receiving a correction instruction to correct a phrase that is a part of the character string displayed on the display means, correcting the phrase in accordance with the correction instruction; storage means for storing the contents of the correction of the phrase executed by the correction means; and control means for, when the voice recognition means converts a voice into a character string and the contents of a correction for a phrase in the character string are stored in the storage means, generating a selection candidate reflecting the contents of the correction and displaying the selection candidate on the display means as a recognition result candidate of the voice.
- 2. The voice conversion device according to claim 1, wherein the storage means stores, as the contents of the correction, a pre-correction phrase that is the phrase before being corrected by the correction means and a post-correction phrase obtained by correcting the pre-correction phrase, and wherein, when the voice recognition means converts a voice into a character string and a phrase in the character string is stored in the storage means as the pre-correction phrase, the control means generates, as the selection candidate, a replacement character string in which the phrase indicated as the pre-correction phrase in the character string is replaced with the post-correction phrase.
- 3. The voice conversion device according to claim 2, wherein the control means displays the post-correction phrase on the display means in a display form different from that of the characters other than the post-correction phrase in the replacement character string.
- 4. A voice conversion device capable of communicating with a voice recognition device that converts voice data into a character string each time the voice data is received and transmits the character string to the transmission source of the voice data, the voice conversion device comprising: output means for converting an input voice into voice data; communication means for transmitting the voice data to the voice recognition device and then receiving, from the voice recognition device, a character string that is the conversion result of the voice data; display means for displaying the character string; correction means for, upon receiving a correction instruction to correct a phrase that is a part of the character string displayed on the display means, correcting the phrase in the character string in accordance with the correction instruction; storage means for storing the contents of the correction of the phrase executed by the correction means; and control means for, when the communication means receives a character string from the voice recognition device and the contents of a correction for a phrase in the character string are stored in the storage means, generating a selection candidate reflecting the contents of the correction and displaying the selection candidate on the display means as a recognition result candidate of the voice.
- 5. The voice conversion device according to claim 4, wherein the storage means stores, as the contents of the correction, a pre-correction phrase that is the phrase before being corrected by the correction means and a post-correction phrase obtained by correcting the pre-correction phrase, and wherein, when the communication means receives a character string from the voice recognition device and a phrase in the character string is stored in the storage means as the pre-correction phrase, the control means generates, as the selection candidate, a replacement character string in which the phrase indicated as the pre-correction phrase in the character string is replaced with the post-correction phrase.
- 6. A mobile phone terminal comprising the voice conversion device according to any one of claims 1 to 5.
- 7. A voice conversion method performed by a voice conversion device, the method comprising: converting a voice into a character string each time a voice is received; displaying the character string on display means; upon receiving a correction instruction to correct a phrase that is a part of the character string displayed on the display means, correcting the phrase in accordance with the correction instruction; storing, in storage means, the contents of the correction of the phrase for which the correction was executed; and, when the voice is converted into a character string and the contents of a correction for a phrase in the character string are stored in the storage means, generating a selection candidate reflecting the contents of the correction and displaying the selection candidate on the display means as a recognition result candidate of the voice.
- 8. A voice conversion method performed by a voice conversion device capable of communicating with a voice recognition device that converts voice data into a character string each time the voice data is received and transmits the character string to the transmission source of the voice data, the method comprising: converting an input voice into voice data; transmitting the voice data to the voice recognition device and then receiving, from the voice recognition device, a character string that is the conversion result of the voice data; displaying the character string on display means; upon receiving a correction instruction to correct a phrase that is a part of the character string displayed on the display means, correcting the phrase in the character string in accordance with the correction instruction; storing, in storage means, the contents of the correction of the phrase for which the correction was executed; and, when a character string is received from the voice recognition device and the contents of a correction for a phrase in the character string are stored in the storage means, generating a selection candidate reflecting the contents of the correction and displaying the selection candidate on the display means as a recognition result candidate of the voice.
- 9. A computer-readable recording medium recording a program for causing a computer to execute: a voice recognition procedure of converting a voice into a character string each time a voice is received; a display procedure of displaying the character string on display means; a correction procedure of, upon receiving a correction instruction to correct a phrase that is a part of the character string displayed on the display means, correcting the phrase in accordance with the correction instruction; a storage procedure of storing, in storage means, the contents of the correction of the phrase for which the correction was executed; and a control procedure of, when the voice is converted into a character string in the voice recognition procedure and the contents of a correction for a phrase in the character string are stored in the storage means, generating a selection candidate reflecting the contents of the correction and displaying the selection candidate on the display means as a recognition result candidate of the voice.
- 10. A computer-readable recording medium recording a program for causing a computer capable of communicating with a voice recognition device that converts voice data into a character string each time the voice data is received and transmits the character string to the transmission source of the voice data to execute: an output procedure of converting an input voice into voice data; a communication procedure of transmitting the voice data to the voice recognition device and then receiving, from the voice recognition device, a character string that is the conversion result of the voice data; a display procedure of displaying the character string on display means; a correction procedure of, upon receiving a correction instruction to correct a phrase that is a part of the character string displayed on the display means, correcting the phrase in the character string in accordance with the correction instruction; a storage procedure of storing, in storage means, the contents of the correction of the phrase for which the correction was executed; and a control procedure of, when a character string is received from the voice recognition device and the contents of a correction for a phrase in the character string are stored in the storage means, generating a selection candidate reflecting the contents of the correction and displaying the selection candidate on the display means as a recognition result candidate of the voice.
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2012536306A JP5874640B2 (ja) | 2010-09-29 | 2011-09-06 | 音声変換装置、携帯電話端末、音声変換方法およびプログラム |
US13/818,889 US20130179166A1 (en) | 2010-09-29 | 2011-09-06 | Voice conversion device, portable telephone terminal, voice conversion method, and record medium |
CN201180047298.6A CN103140889B (zh) | 2010-09-29 | 2011-09-06 | 语音转换装置、便携电话终端、语音转换方法 |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2010-219053 | 2010-09-29 | ||
JP2010219053 | 2010-09-29 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2012043168A1 true WO2012043168A1 (ja) | 2012-04-05 |
Family
ID=45892641
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2011/070248 WO2012043168A1 (ja) | 2010-09-29 | 2011-09-06 | 音声変換装置、携帯電話端末、音声変換方法および記録媒体 |
Country Status (4)
Country | Link |
---|---|
US (1) | US20130179166A1 (ja) |
JP (1) | JP5874640B2 (ja) |
CN (1) | CN103140889B (ja) |
WO (1) | WO2012043168A1 (ja) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2020107130A (ja) * | 2018-12-27 | 2020-07-09 | Canon Inc. | Information processing system, information processing apparatus, control method, and program |
JP7463690B2 (ja) | 2019-10-31 | 2024-04-09 | Ricoh Company, Ltd. | Server device, communication system, information processing method, program, and recording medium |
Families Citing this family (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9130778B2 (en) | 2012-01-25 | 2015-09-08 | Bitdefender IPR Management Ltd. | Systems and methods for spam detection using frequency spectra of character strings |
US8954519B2 (en) * | 2012-01-25 | 2015-02-10 | Bitdefender IPR Management Ltd. | Systems and methods for spam detection using character histograms |
CN103647880B (zh) * | 2013-12-13 | 2015-11-18 | Nanjing Fengtai Communication Technology Co., Ltd. | Telephone with a function of transcribing telephone speech into text messages |
CN103944983B (zh) * | 2014-04-14 | 2017-09-29 | Guangdong Midea Refrigeration Equipment Co., Ltd. | Voice control instruction error correction method and system |
KR102261552B1 (ko) | 2014-06-30 | 2021-06-07 | Samsung Electronics Co., Ltd. | Method for providing voice commands and electronic device supporting the same |
CN105786438A (zh) * | 2014-12-25 | 2016-07-20 | Lenovo (Beijing) Co., Ltd. | Electronic system |
US20180315415A1 (en) * | 2017-04-26 | 2018-11-01 | Soundhound, Inc. | Virtual assistant with error identification |
CN107731229B (zh) * | 2017-09-29 | 2021-06-08 | Baidu Online Network Technology (Beijing) Co., Ltd. | Method and apparatus for recognizing speech |
JP7159756B2 (ja) | 2018-09-27 | 2022-10-25 | Fujitsu Limited | Speech playback section control method, speech playback section control program, and information processing apparatus |
JP7243106B2 (ja) * | 2018-09-27 | 2023-03-22 | Fujitsu Limited | Correction candidate presentation method, correction candidate presentation program, and information processing apparatus |
US11263198B2 (en) | 2019-09-05 | 2022-03-01 | Soundhound, Inc. | System and method for detection and correction of a query |
CN116312509B (zh) * | 2023-01-13 | 2024-03-01 | Shandong Sanhong Information Technology Co., Ltd. | Speech-recognition-based terminal ID text correction method, device, and medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2002287792A (ja) * | 2001-03-27 | 2002-10-04 | Denso Corp | Speech recognition device |
JP2004240234A (ja) * | 2003-02-07 | 2004-08-26 | Nippon Hoso Kyokai &lt;Nhk&gt; | Character string correction training server, character string correction training device, character string correction training method, and character string correction training program |
JP2004309928A (ja) * | 2003-04-09 | 2004-11-04 | Casio Comput Co Ltd | Speech recognition device, electronic dictionary device, speech recognition method, search method, and program |
JP2006521578A (ja) * | 2003-03-26 | 2006-09-21 | Koninklijke Philips Electronics N.V. | Speech recognition system |
JP2011002656A (ja) * | 2009-06-18 | 2011-01-06 | Nec Corp | Speech recognition result correction candidate detection device, speech transcription support device, method, and program |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6791529B2 (en) * | 2001-12-13 | 2004-09-14 | Koninklijke Philips Electronics N.V. | UI with graphics-assisted voice control system |
US8996379B2 (en) * | 2007-03-07 | 2015-03-31 | Vlingo Corporation | Speech recognition text entry for software applications |
KR101462932B1 (ko) * | 2008-05-28 | 2014-12-04 | 엘지전자 주식회사 | 이동 단말기 및 그의 텍스트 수정방법 |
CN101655837B (zh) * | 2009-09-08 | 2010-10-13 | Beijing University of Posts and Telecommunications | Method for detecting and correcting errors in text after speech recognition |
2011
- 2011-09-06 JP JP2012536306A patent/JP5874640B2/ja not_active Expired - Fee Related
- 2011-09-06 CN CN201180047298.6A patent/CN103140889B/zh not_active Expired - Fee Related
- 2011-09-06 WO PCT/JP2011/070248 patent/WO2012043168A1/ja active Application Filing
- 2011-09-06 US US13/818,889 patent/US20130179166A1/en not_active Abandoned
Also Published As
Publication number | Publication date |
---|---|
US20130179166A1 (en) | 2013-07-11 |
CN103140889A (zh) | 2013-06-05 |
JPWO2012043168A1 (ja) | 2014-02-06 |
JP5874640B2 (ja) | 2016-03-02 |
CN103140889B (zh) | 2015-01-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP5874640B2 (ja) | Voice conversion device, portable telephone terminal, voice conversion method, and program | |
US7552045B2 (en) | Method, apparatus and computer program product for providing flexible text based language identification | |
US8423351B2 (en) | Speech correction for typed input | |
US7983912B2 (en) | Apparatus, method, and computer program product for correcting a misrecognized utterance using a whole or a partial re-utterance | |
US20080126093A1 (en) | Method, Apparatus and Computer Program Product for Providing a Language Based Interactive Multimedia System | |
JP2009098490A (ja) | Speech recognition result editing device, speech recognition device, and computer program | |
JP2013065188A (ja) | Automaton determinization method, automaton determinization device, and automaton determinization program | |
JPWO2007097390A1 (ja) | Speech recognition system, speech recognition result output method, and speech recognition result output program | |
JP3104661B2 (ja) | Japanese sentence creation device | |
JP2021139994A (ja) | Speech recognition error correction device, speech recognition error correction method, and speech recognition error correction program | |
JP2002014693A (ja) | Method for providing a dictionary for a speech recognition system, and speech recognition interface | |
JP4189336B2 (ja) | Speech information processing system, speech information processing method, and program | |
JP2013050742A (ja) | Speech recognition device and speech recognition method | |
JP2010164918A (ja) | Speech translation device and method | |
WO2017159207A1 (ja) | Process execution device, control method for process execution device, and control program | |
JP2000056795A (ja) | Speech recognition device | |
JP6197523B2 (ja) | Speech synthesis device, language dictionary correction method, and computer program for language dictionary correction | |
JP2009199434A (ja) | Device and program for converting alphabetic character strings into Japanese readings | |
JP6411015B2 (ja) | Speech synthesis device, speech synthesis method, and program | |
JP2002123281A (ja) | Speech synthesis device | |
JP3414326B2 (ja) | Device and method for registering a speech synthesis dictionary | |
JP4445371B2 (ja) | Recognition vocabulary registration device, speech recognition device, and method | |
JP3036591B2 (ja) | Speech recognition device | |
JP2006309469A (ja) | Search device, search method, program, and computer-readable recording medium | |
JP2007293595A (ja) | Information processing apparatus and information processing method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
WWE | Wipo information: entry into national phase |
Ref document number: 201180047298.6 Country of ref document: CN |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 11828729 Country of ref document: EP Kind code of ref document: A1 |
|
WWE | Wipo information: entry into national phase |
Ref document number: 13818889 Country of ref document: US |
|
ENP | Entry into the national phase |
Ref document number: 2012536306 Country of ref document: JP Kind code of ref document: A |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 11828729 Country of ref document: EP Kind code of ref document: A1 |