WO2016075794A1 - Voice conversion device, voice conversion method, and voice conversion program - Google Patents
- Publication number
- WO2016075794A1 (PCT/JP2014/080103)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- conversion
- voice
- character
- dictionary
- unit
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/12—Use of codes for handling textual entities
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
Definitions
- the present invention relates to a voice conversion device, a voice conversion method, and a voice conversion program, and more particularly to a technique for converting voice uttered by a user.
- in recent years, broadcast receiving apparatuses that can use the Internet to search for and display information on a server device, and to view audio and video content distributed from the server device, have become widespread. When using these services, the user must input a character string such as a search keyword or credentials for logging in to a content distribution service. In a conventional broadcast receiving apparatus, the user inputs characters by pressing the operation buttons of a remote controller with a finger, which is cumbersome. For this reason, methods of inputting search character strings by voice recognition have been proposed.
- in one proposed method, keywords are extracted from program information, a speech recognition dictionary in which the keywords are registered together with speech recognition information is generated, and, in response to a user request, the keywords are displayed on the screen based on a partial dictionary of the speech recognition dictionary. The user then inputs a keyword by voice by selecting from the keywords displayed on the screen (see the summary of the cited document).
- character strings such as search keywords are usually everyday words, which are easy to recognize, so voice input is convenient for entering them. However, the account name and password entered when logging in to a server apparatus may be an irregular combination of alphanumeric characters and special characters or symbols such as "@". When inputting such a string by voice recognition, the user must utter the constituent alphanumeric characters one by one, which results in poor usability.
- An object of the present invention is to provide a more convenient voice conversion technology.
- to this end, the present invention converts speech information uttered by a user into a recognized character string composed of phonetic characters representing the reading of the speech information, refers to special character conversion dictionary information that associates each phonetic character string with a special character including at least one character or symbol whose usual reading differs from that phonetic string, and converts the recognized character string into the special character.
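As a rough sketch, the two-stage conversion described above (speech to a recognized phonetic string, then the recognized string to a special character) might look like the following. The recognizer stub, the dictionary entries, and the readings are hypothetical illustrations, not taken from the patent:

```python
# Illustrative sketch of the two-stage conversion. The recognizer stub
# and the dictionary entries are hypothetical, not the patent's data.

def recognize_speech(audio: bytes) -> str:
    """Stage 1: speech recognition yields a phonetic recognized
    character string (stubbed out here with a romanized reading)."""
    return "atto"  # e.g. a Japanese reading used for the "@" symbol

# Stage 2: the special character conversion dictionary associates a
# reading with a character or symbol whose usual reading differs.
SPECIAL_CHAR_DICT = {
    "atto": "@",
    "anpasando": "&",
}

def convert_special(recognized: str) -> str:
    """Look up the recognized string; pass unknown strings through."""
    return SPECIAL_CHAR_DICT.get(recognized, recognized)

print(convert_special(recognize_speech(b"")))  # prints "@"
```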
- A block diagram showing a configuration example of the broadcast receiving apparatus according to the present embodiment.
- A diagram showing the internal configuration of the broadcast receiving apparatus according to the present embodiment, in which (a) is a software block diagram of the broadcast receiving apparatus and (b) shows the dictionaries stored in the voice conversion information storage area.
- A diagram showing an example of the key arrangement of the remote controller 120.
- A sequence diagram showing an example of character input processing by speech recognition according to the first embodiment.
- A sequence diagram showing the flow of dictionary registration processing according to the first embodiment.
- A diagram showing screen display examples in the dictionary registration processing according to the first embodiment, in which (a) shows a special dictionary registration list display screen, (b) shows a new dictionary registration screen, and (c) shows a dictionary registration change screen.
- A sequence diagram showing the flow of the character conversion dictionary registration processing within the dictionary registration processing according to the first embodiment.
- A diagram showing the structure of the dictionary information according to the first embodiment, in which (a) shows the normal character conversion dictionary and (b) shows the special character conversion dictionary.
- A flowchart showing the flow of the dictionary conversion input processing according to the first embodiment.
- A sequence diagram showing the flow of dictionary registration processing including user authentication processing.
- A diagram showing screen display examples in the dictionary registration processing according to the second embodiment, in which (a) shows a special dictionary registration list display screen, (b) shows a new dictionary registration screen, and (c) shows a dictionary registration change screen.
- A sequence diagram showing the flow of the character conversion dictionary registration processing within the dictionary registration processing according to the second embodiment.
- A diagram showing an example of the dictionary registration information stored according to the second embodiment.
- A flowchart showing the flow of the dictionary conversion input processing according to the second embodiment.
- A sequence diagram showing an example of character input processing by speech recognition according to the third embodiment.
- A flowchart showing the flow of the dictionary conversion input processing according to the third embodiment.
- A flowchart showing the flow of the identification sound determination processing.
- A diagram showing the signal waveforms used in the identification sound determination processing, in which (a1) and (b1) are output waveforms of the identification sound and (a2) is an audio waveform.
- A diagram showing examples of identification sound waveforms when two or more timing patterns of the period Ton for outputting a signal of the predetermined frequency F0 and the period Toff for stopping it are provided in the identification sound setting processing S341, in which (a) shows an output pattern in which the period Ton is substantially equal to the period Toff, and (b) shows an output pattern in which the period Ton differs from the period Toff.
- A block diagram showing an example of the internal configuration of the information terminal device 2, in which (a) is a software block diagram of the information terminal device 2 and (b) shows the dictionaries stored therein.
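The identification sound referenced in the figures above, with a period Ton in which a signal of frequency F0 is output and a period Toff in which it is stopped, could be generated along these lines. All concrete values (sample rate, frequency, durations) are assumptions for illustration; the patent only names F0, Ton, and Toff:

```python
import math

def identification_tone(f0_hz: float, t_on: float, t_off: float,
                        repeats: int, sample_rate: int = 8000) -> list[float]:
    """Emit a sine wave at frequency F0 for Ton seconds, stay silent
    for Toff seconds, and repeat. Parameter values are illustrative."""
    samples: list[float] = []
    for _ in range(repeats):
        n_on = round(t_on * sample_rate)
        samples += [math.sin(2 * math.pi * f0_hz * i / sample_rate)
                    for i in range(n_on)]
        samples += [0.0] * round(t_off * sample_rate)
    return samples

# Pattern (a): Ton roughly equal to Toff; pattern (b): Ton != Toff.
pattern_a = identification_tone(1000.0, t_on=0.10, t_off=0.10, repeats=3)
pattern_b = identification_tone(1000.0, t_on=0.05, t_off=0.15, repeats=3)
```

Distinguishing the two patterns by their on/off timing is what allows the determination processing to tell identification sounds apart from ordinary audio.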
- the present invention is not limited to a broadcast receiving device; it can be applied to any device that performs voice input, for example, to user login in an information processing apparatus such as a PC (Personal Computer), a smartphone, or a tablet terminal device, or to an entry/exit management system for a monitored area.
- FIG. 1 is a block diagram showing a configuration example of a broadcast receiving apparatus 1 according to an embodiment of the present invention.
- the broadcast receiving apparatus 1 is installed in a home, for example, and is electrically connected to the router 3 via a wireless or wired LAN (Local Area Network) (not shown).
- an information terminal device 2 such as a smartphone or a tablet terminal is connected to the router 3 via a wireless LAN, and these information terminal devices 2 are connected to the broadcast receiving device 1 via the router 3 and the LAN.
- the router device 3 is also connected to an external public network 4 such as the Internet.
- the broadcast receiving apparatus 1 is wirelessly connected to the remote controller 120 by infrared communication.
- the broadcast receiving apparatus 1 includes an antenna 107 for receiving broadcast waves from the broadcast station 5 and receives public broadcast waves.
- the broadcast receiving apparatus 1 includes a main control unit 101, a ROM (Read Only Memory) 102, a RAM (Random Access Memory) 103, a storage unit 104, an external interface (hereinafter, "I/F") unit 105, an operation unit 106, an antenna 107 for receiving broadcast waves from the broadcasting station 5, a tuner/demodulation unit 108 connected to the antenna 107, a separation unit 109, a decoder unit 110, an audio input unit 111, an audio processing unit 112, an audio output unit 113, an imaging unit 114, an image processing unit 115, a display unit 116, and a LAN communication unit 117; these components are electrically connected to each other via a system bus 100.
- the system bus 100 is a data communication path for performing data transmission / reception between the main control unit 101 and each unit in the broadcast receiving apparatus 1.
- the main control unit 101 includes a CPU (Central Processing Unit) that performs arithmetic and control processing.
- the main control unit 101 develops (loads) various operation programs and data stored in the ROM 102 and/or the storage unit 104 into the RAM 103 and executes them. The operation programs (software) thereby cooperate with the CPU (hardware) to realize the various functions of the broadcast receiving apparatus 1.
- the main control unit 101 controls the entire broadcast receiving apparatus 1.
- the ROM 102 is a memory in which a basic operation program such as an operating system and other operation programs are stored.
- a rewritable ROM such as an EEPROM (Electrically Erasable Programmable ROM) or a flash ROM is used.
- the RAM 103 serves as a work area for executing a basic operation program and other operation programs (applications).
- the ROM 102 and the RAM 103 may be configured integrally with the main control unit 101. Further, instead of the independent configuration shown in FIG. 1, the ROM 102 may use a partial storage area within the storage unit 104.
- the storage unit 104 is configured using a device capable of holding and storing data even when the power is turned off, such as an HDD (Hard Disc Drive), and stores an operation program, an operation setting value, and the like of the broadcast receiving device 1.
- the storage unit 104 can also store a new operation program (application) downloaded from a server device (not shown) via the LAN communication unit 117, the router device 3, and the external network 4, as well as various data created by the operation programs.
- contents such as moving images, still images, and voices acquired from broadcast waves or downloaded from server devices on the network can be stored.
- the external I / F unit 105 is an interface group for extending the function of the broadcast receiving apparatus 1.
- the external I/F unit 105 includes a video input I/F unit 105a, an audio input I/F unit 105b, and a USB (Universal Serial Bus) I/F unit 105c.
- the video input I / F unit 105a and the audio input I / F unit 105b input video signals / audio signals from an external video / audio output device.
- the USB I / F unit 105c connects a USB device such as a keyboard or a memory card.
- an HDD device or the like may be connected to the USB I/F unit 105c. Further, the video input I/F unit 105a and the audio input I/F unit 105b may input video and audio together using HDMI (High-Definition Multimedia Interface: registered trademark).
- the operation unit 106 is configured by using an input device for a user to input an operation instruction to the broadcast receiving apparatus 1.
- the operation unit 106 includes an operation key 106 a in which button switches are arranged and a remote control reception unit 106 b that receives an infrared signal from the remote control 120.
- the broadcast receiving apparatus 1 may be operated using a keyboard or the like connected to the USB I / F unit 105c.
- these information terminal apparatuses 2 and the remote controller 120 function as the operation unit 106.
- the tuner/demodulation unit 108 extracts the signal of the channel selected by the user from the broadcast wave received by the antenna 107 and demodulates it into a TS (Transport Stream) signal.
- the separation unit 109 separates the TS signal into packetized video data, audio data, and accompanying information data, and outputs them to the decoder unit 110.
- the decoder unit 110 includes an audio decoder 110a, a video decoder 110b, and an information decoder 110c.
- the audio data packetized by the separation unit 109 is output to the audio decoder 110a, the video data is output to the video decoder 110b, and the accompanying information data is output to the information decoder 110c.
- the audio decoder 110a decodes the audio data output from the separation unit 109 and outputs the audio data to the audio processing unit 112 as an audio signal.
- the video decoder 110b decodes the video data output from the separation unit 109 and outputs the decoded video data to the image processing unit 115 as a video signal.
- the information decoder 110c processes the accompanying information data output from the separation unit 109 and acquires SI (Service Information) information, which includes program information such as each program's name, genre, and broadcast start/end date and time.
- the audio input unit 111 is a microphone, and takes in external audio and outputs it to the audio processing unit 112.
- the audio processing unit 112 performs A/D (analog-to-digital) conversion on the audio captured by the audio input unit 111, and D/A (digital-to-analog) conversion on the audio signal output to the audio output unit 113.
- the audio output unit 113 is a speaker and outputs the audio signal processed by the audio processing unit 112.
- the imaging unit 114 is a camera that converts light input through a lens into an electrical signal using an electronic device such as a CCD (Charge-Coupled Device) or CMOS (Complementary Metal-Oxide-Semiconductor) sensor, captures an image of the surroundings of the broadcast receiving apparatus 1, and outputs it to the image processing unit 115.
- the image processing unit 115 performs format conversion, menu and other OSD (On Screen Display) signal superimposition processing on the input video signal as necessary.
- the display unit 116 is a display device such as a liquid crystal panel, and displays the video signal processed by the image processing unit 115.
- the LAN communication unit 117 is connected to the router device 3 by wire or wirelessly, and transmits / receives information to / from a device connected to a local network in a home or a server device connected to an external network 4 such as the Internet. Further, the information terminal device 2 is connected to the router device 3, and the user can operate the broadcast receiving device 1 using the information terminal device 2.
- in addition to the LAN communication unit 117, the broadcast receiving apparatus 1 may further include other communication units such as a Bluetooth (registered trademark) communication unit or an NFC (Near Field Communication) communication unit.
- in the present embodiment, the audio input unit 111 and the imaging unit 114 are built into the broadcast receiving apparatus 1, but an external camera or microphone connected via the USB I/F unit 105c may be used instead.
- the communication from the remote control 120 to the remote control receiving unit 106b is configured to be performed by infrared rays, but the present invention is not limited to this, and other communication methods such as Bluetooth may be used.
- the broadcast receiving apparatus 1 may be a broadcast recording / playback apparatus such as a BD (Blu-ray Disc: registered trademark) recorder or HDD recorder, an STB (Set Top Box), or the like.
- a video signal output unit and an audio signal output unit may be provided instead of the display unit 116 and the audio output unit 113.
- FIG. 2 is a diagram illustrating the internal configuration of the broadcast receiving apparatus 1 according to the present embodiment, in which (a) is a software configuration diagram of the broadcast receiving apparatus 1 and (b) shows the dictionaries stored in the voice conversion information storage area.
- the software configuration diagram of FIG. 2A shows a software configuration in the ROM 102, the RAM 103, and the storage unit 104.
- the basic operation program 1021 stored in the ROM 102 is expanded in the RAM 103, and the main control unit 101 executes the expanded basic operation program to constitute the basic operation execution unit 1031.
- the application program 1041, content processing program 1042, voice conversion program 1043, user authentication program 1044, cooperative terminal management program 1045, and dictionary registration program 1046 stored in the storage unit 104 are likewise expanded in the RAM 103, and the main control unit 101 executes these expanded operation programs to constitute the application execution unit 1032, the content processing execution unit 1033, the voice conversion execution unit 1034, the user authentication execution unit 1035, the cooperative terminal management execution unit 1036, and the dictionary registration execution unit 1037.
- the RAM 103 includes a temporary storage area 1038 for temporarily storing data created when each operation program is executed as necessary.
- the storage unit 104 includes a content information storage area 104a for storing video content downloaded from a server device on the network as recorded content and for managing information related to the recorded content, a voice conversion information storage area 104b for storing the conversion dictionaries, a user authentication information storage area 104c for storing user authentication information including voice or image data for user authentication, a cooperative terminal information storage area 104d for storing identification information and the like of the information terminal devices 2 capable of cooperative operation with the broadcast receiving apparatus 1, and various information storage areas 104e for storing other miscellaneous information.
- the voice conversion information storage area 104b stores a speech recognition dictionary 104b1 for recognizing the components of input voice (for example, phonemes and syllables) and converting them into a character string consisting of phonetic characters (for example, hiragana, katakana, and the alphabet in Japanese; hereinafter a "recognized character string"), a normal character conversion dictionary (first dictionary) 104b2 for converting a recognized character string into characters and symbols of different types and shapes having the same sound (reading) as the recognized character string, including ideographic characters (for example, kanji in Japanese), and a special character conversion dictionary (second dictionary) 104b3 for converting a recognized character string into phonograms and ideograms having different sounds.
- here, phonetic characters are characters whose phonemes and character types correspond almost one-to-one, such as hiragana and katakana in Japanese. Some scripts, such as the English alphabet, have multiple letters for the same sound, and the same letter may represent different sounds (for example, the "a" in "apple" and the "a" in "ace").
- the ideographic characters mentioned above include numbers and symbols. For example, when a voice corresponding to "and" is input in English, the recognized character string is converted into "and", and the normal character conversion may present both the three-letter string "and" and the symbol "&" as conversion candidates from which the user can select. Alternatively, the reading "and" may be registered in association with the symbol "&" in the special character conversion dictionary instead of the normal character conversion dictionary.
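The two possible placements of the "&" entry described above can be sketched as follows; the dictionary format and the candidate lists are assumptions, since the patent does not fix a concrete data structure:

```python
# Hypothetical sketch of the two dictionaries. The entry format is an
# assumption; the patent does not specify one.

NORMAL_DICT = {           # first dictionary: same-reading candidates,
    "and": ["and", "&"],  # offering both the word and the symbol
}
SPECIAL_DICT = {          # second dictionary: "&" registered here instead
    "and": ["&"],
}

def candidates(recognized: str, dictionary: dict[str, list[str]]) -> list[str]:
    """Return the conversion candidates for a recognized string,
    falling back to the recognized string itself."""
    return dictionary.get(recognized, [recognized])

print(candidates("and", NORMAL_DICT))   # ['and', '&']
print(candidates("and", SPECIAL_DICT))  # ['&']
```

Keeping "&" only in the second dictionary means the normal conversion path never surprises the user with a symbol, at the cost of requiring the special dictionary to be selected explicitly.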
- in the following, the process in which the main control unit 101 expands the basic operation program 1021 stored in the ROM 102 into the RAM 103 and executes it to control each operation block is described as the basic operation execution unit 1031 performing the control of each operation block. The same convention is used for the other operation programs.
- the application execution unit 1032 executes various operation programs downloaded from a server device or the like. Each application is activated when the operation unit 106 receives an operation from the user and the user selects an application activation icon displayed on the display unit 116.
- the content processing execution unit 1033 accumulates content data from the server device in advance in the content information storage area 104a via the external network 4, reproduces the accumulated content, and displays it on the display unit 116 (download playback). Alternatively, the content processing execution unit 1033 receives content data and content information distributed from the server device via the external network 4, and reproduces the sequentially received video, audio, and the like and displays them on the display unit 116 (streaming playback).
- the voice conversion execution unit 1034 converts the user's voice captured by the voice input unit 111 into a recognized character string based on the speech recognition dictionary 104b1, and uses the result either as an operation input such as channel selection of the broadcast receiving apparatus 1, or as character input after converting the recognized character string into a predetermined character string according to the dictionaries 104b2 and 104b3.
- the user authentication execution unit 1035 authenticates the user based on the user authentication information stored in the user authentication information storage area 104c together with the user's voice captured by the voice input unit 111 or the user's face image captured by the imaging unit 114.
- the cooperative terminal management execution unit 1036 registers and manages the information terminal devices 2 connected via the home local network or an external network 4 such as the Internet, and the broadcast receiving apparatus 1 executes various operations according to operation inputs from a registered information terminal device 2.
- the operation programs may be stored in advance in the ROM 102 and/or the storage unit 104 at the time of product shipment, may be acquired after shipment from a server device on the external network 4 via the LAN communication unit 117, or may be acquired from a memory card, an optical disk, or the like via the USB I/F unit 105c.
- FIG. 3 is a diagram illustrating an example of the key arrangement of the remote controller 120.
- the remote controller 120 includes a power key 120a1, broadcast wave selection keys (terrestrial digital, BS, CS) 120a2, channel/character input keys (1-12) 120a3, volume UP/DOWN keys 120a4, channel UP/DOWN keys 120a5, an input switch key 120a6, a program guide key 120a7, a data key 120a8, a voice input key 120a9, a menu key 120a10, a return key 120a11, cursor keys (up, down, left, right) 120a12, an enter key 120a13, and color keys (blue, red, green, yellow) 120a14. Other operation keys may also be provided.
- the power key 120a1, the broadcast wave selection key 120a2, and the like have the same functions as the operation keys of a known TV remote controller, and will not be described in detail.
- the voice input key 120a9 is an operation key prepared for the voice input function of this embodiment.
- FIG. 4 is a sequence diagram showing an example of character input processing by voice recognition according to the present embodiment.
- the broadcast receiving apparatus 1 displays a character input screen (S400).
- the character input screen is displayed, for example, when the user inputs an account name or a password for logging in to a server device via the external network 4 in order to stream or download content distributed from the server device.
- the main control unit 101 of the broadcast receiving apparatus 1 determines how the voice input key 120a9 has been pressed (the operation input state) based on the key input information (S403).
- the main control unit 101 executes a branch process (S404) based on the operation input state determination result.
- when the main control unit 101 determines in the operation input state determination process that the voice input key 120a9 has been pressed once within a predetermined time, the normal character conversion dictionary (first dictionary) is selected as the dictionary for converting the character string recognized by the voice conversion execution unit 1034 (the recognized character string) into another character string (a normal character string) (S411). Otherwise, the special character conversion dictionary (second dictionary) is selected as the dictionary with which the voice conversion execution unit 1034 converts the recognized character string into another character string (a special character string) (S412).
- when the user utters a voice (S421), the voice is captured by the voice input unit 111 (S422), and the voice conversion execution unit 1034 performs voice recognition processing based on the captured voice and the speech recognition dictionary 104b1 stored in the voice conversion information storage area 104b, converting the voice into a recognized character string (S423).
- the voice conversion execution unit 1034 then converts the recognized character string into a normal character string or a special character string by the dictionary conversion input processing described later, and uses the converted result as the character input (S424).
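A minimal sketch of this flow, combining the dictionary selection branch (S411/S412) with the dictionary conversion input (S424): the single-press rule comes from the text above, while the double-press alternative and the dictionary entries are assumptions for illustration:

```python
# Sketch of the S404 branch followed by the conversion in S424.
# A single press selects the normal dictionary; any other press
# pattern (assumed here to be a double press) selects the special
# dictionary. Dictionary entries are illustrative stand-ins.

NORMAL_DICT = {"haru": "spring-kanji"}  # stand-in for a kanji entry
SPECIAL_DICT = {"atto": "@"}

def select_dictionary(pressed_once: bool) -> dict[str, str]:
    """S411 selects the first dictionary, S412 the second."""
    return NORMAL_DICT if pressed_once else SPECIAL_DICT

def dictionary_conversion_input(recognized: str, pressed_once: bool) -> str:
    """Convert the recognized character string with the selected
    dictionary; unknown strings pass through unchanged."""
    return select_dictionary(pressed_once).get(recognized, recognized)

print(dictionary_conversion_input("haru", pressed_once=True))   # spring-kanji
print(dictionary_conversion_input("atto", pressed_once=False))  # @
```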
- the normal character conversion dictionary stores general words and phrases and is stored in the ROM 102 and/or the storage unit 104 in advance at the time of product shipment. Alternatively, it may be acquired after shipment from a server device on the external network 4 via the LAN communication unit 117, or a normal character conversion dictionary stored in a memory card, an optical disk, or the like may be acquired via the USB I/F unit 105c. The user may also register entries through the dictionary registration processing described later.
- FIG. 5 is a sequence diagram showing the flow of dictionary registration processing.
- FIG. 6 shows screen display examples in the dictionary registration processing according to the embodiment: FIG. 6A shows the special dictionary registration list display screen, FIG. 6B shows the new dictionary registration screen, and FIG. 6C shows the dictionary registration change screen.
- the dictionary registration execution unit 1037 executes the dictionary registration processing in the order of the steps in FIG. 5. The processing is repeated until the user selects the end of dictionary registration (S500). First, a list of the character strings registered in the dictionary is displayed on the display unit 116 (S501).
- the special dictionary registration list display screen shows the registration number 6a1, the voice-recognized character string (recognized character string) 6a2, the special character string corresponding to the recognized character string ("conversion" in the figure) 6a3, the selection frame 6a4, and the functions 6a5 assigned to the color keys 120a14 of the remote controller 120. In the illustrated assignment, "red" is new registration, "blue" is registration change, "yellow" is registration deletion, and "green" switches between the normal character conversion dictionary (first dictionary) and the special character conversion dictionary (second dictionary).
- the user performs a process selection input using the remote controller 120 (S502). For example, by moving the selection frame 6a4 with the up/down cursor keys 120a12 of the remote controller 120 and pressing the "blue" or "yellow" color key 120a14, the user can select the process of changing or deleting the dictionary registration contents displayed in the selection frame.
- When the user selects [New registration], the dictionary registration execution unit 1037 displays a new dictionary registration screen (see FIG. 6B) on the display unit 116 (S511), and executes a character conversion dictionary registration process S512 described later.
- On the new dictionary registration screen, a new registration number 6b1, a character string input frame 6b2 before conversion by speech recognition input (an input frame in which a recognized character string is displayed), a character string input frame 6b3 after conversion, and the functions 6b4 assigned to the color keys 120a14 of the remote controller 120 are displayed.
- the selected character string input frame is displayed as a thick line frame.
- the soft keyboard 6b5 may be displayed when a character string is input to the converted character string input frame 6b3.
- The user may input the converted character string by selecting and confirming characters on the soft keyboard 6b5 with the cursor keys of the remote controller 120, or may input a character string into the converted character string input frame 6b3 by operating the channel/character input keys (1 to 12) 120a3 of the remote controller 120.
- Alternatively, the voice input key 120a9 of the remote controller 120 may be pressed to input the converted character string by voice input.
- Further, by selecting the character string input frame 6b2 again and pressing the determination key 120a13 of the remote controller 120, the user can speak again to input voice, and the voice recognition processing can be performed again.
- When the user selects [Registration change], the dictionary registration execution unit 1037 displays the dictionary registration change screen (FIG. 6C) on the display unit 116 (S513), and executes a character conversion dictionary registration process S514 described later.
- Fig. 6 (c) shows a display example of the dictionary registration change screen.
- On the dictionary registration change screen, the registration number 6c1 of the character string to be changed, a character string input frame 6c2 before conversion by speech recognition input (an input frame in which a recognized character string is displayed), a character string input frame 6c3 after conversion, and the functions 6c4 assigned to the color keys 120a14 of the remote controller 120 are displayed.
- On the dictionary registration change screen, the recognized character string currently registered in the dictionary and the corresponding special character string are displayed first, and either the recognized character string or the special character string, or both, can be changed and re-registered.
- a soft keyboard may be further displayed as in the new dictionary registration screen when a character string is input to the converted character string input frame 6c3.
- When the user selects [Registration deletion], the dictionary registration execution unit 1037 deletes the registration of the character string in the selection frame portion (S515).
- FIG. 7 is a sequence diagram showing the flow of the character conversion dictionary registration process in the dictionary registration process.
- the process is repeated (S700) until the user selects the end of dictionary registration.
- In the repetitive process S700, first, the user performs process selection input using the remote controller 120 (S701).
- In the dictionary registration process S512 at the time of new dictionary registration, the character string input frame 6b2 before conversion or the character string input frame 6b3 after conversion by voice recognition input shown in FIG. 6B is selected by the up and down cursor keys 120a12 of the remote controller 120, and the process is selected by pressing the “red” key of the color keys 120a14.
- In the dictionary registration process S514 at the time of dictionary registration change, the character string input frame 6c2 before conversion or the character string input frame 6c3 after conversion by voice recognition input shown in FIG. 6C is selected by the up and down cursor keys 120a12 of the remote controller 120, and the process is selected by pressing the “red” key of the color keys 120a14.
- the dictionary registration execution unit 1037 executes a branch process S702 according to the selected process.
- When the user selects [Registration], the dictionary registration execution unit 1037 associates the recognized character string displayed in the character string input frame 6b2 or 6c2 with the converted character string displayed in the character string input frame 6b3 or 6c3, and stores the dictionary information in the normal character conversion dictionary 104b2 or the special character conversion dictionary 104b3 of the storage unit 104, according to whether the normal character conversion dictionary (first dictionary) or the special character conversion dictionary (second dictionary) has been selected by the dictionary switching process S516 (S708).
- When the user selects [End], the dictionary registration execution unit 1037 ends the character conversion dictionary registration process by the interruption process (S709), and exits the repetitive process S700.
- FIG. 8 is a diagram showing the configuration of dictionary information, where (a) shows a normal character conversion dictionary and (b) shows a special character conversion dictionary.
- The normal character conversion dictionary stores, as dictionary information, a registration number 8a1, a character string (recognized character string) 8a2 before conversion by speech recognition input, and a character string (converted character string) 8a3 after conversion.
- The special character conversion dictionary stores, as dictionary information, a registration number 8b1, a character string (recognized character string) 8b2 before conversion by voice recognition input, and a character string (converted character string) 8b3 after conversion.
- In the normal character conversion dictionary, for example, the character string “Hirake Sesame” before conversion by voice recognition input is converted into the character string “open sesame”.
- In the special character conversion dictionary, the character string “Hirake Sesame” before conversion by voice recognition input is converted into the character string “6922 # 7MgkRH”.
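The two-dictionary structure of FIG. 8 can be sketched as follows. This is a minimal illustrative model, not the patent's implementation; the data layout and function names are assumptions.

```python
# Minimal sketch of the FIG. 8 dictionary information: each entry maps a
# registration number to a pre-conversion (recognized) string and a
# post-conversion string. Structure and names are illustrative assumptions.

# Normal character conversion dictionary (first dictionary), cf. FIG. 8(a)
normal_dictionary = {
    1: {"recognized": "Hirake Sesame", "converted": "open sesame"},
}

# Special character conversion dictionary (second dictionary), cf. FIG. 8(b)
special_dictionary = {
    1: {"recognized": "Hirake Sesame", "converted": "6922#7MgkRH"},
}

def lookup(dictionary, recognized):
    """Return the converted string for a recognized string, or None."""
    for entry in dictionary.values():
        if entry["recognized"] == recognized:
            return entry["converted"]
    return None
```

The same recognized string can thus yield a readable phrase or an unreadable password-like string, depending only on which dictionary is selected.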
- FIG. 9 is a flowchart showing the flow of dictionary conversion input processing.
- The speech conversion execution unit 1034 confirms whether the recognized character string recognized in S423 is registered in the selected character conversion dictionary (S901). For example, when the normal character conversion dictionary is selected in S411, the speech conversion execution unit 1034 confirms whether the voice-recognized character string is registered in the normal character conversion dictionary, and when the special character conversion dictionary is selected in S412, it confirms whether the voice-recognized character string is registered in the special character conversion dictionary.
- Next, the voice conversion execution unit 1034 performs a branch process (S902) based on the confirmation result. If the recognized character string is not registered in the character conversion dictionary (S902 / No), the speech conversion execution unit 1034 displays an error such as “not registered in the dictionary” (S903), and ends the process. If the voice-recognized character string is registered in the character conversion dictionary (S902 / Yes), the speech conversion execution unit 1034 converts the recognized character string into a normal character string or a special character string according to the information in the character conversion dictionary (S904), uses the converted character string as the input characters, and ends the process.
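The S901 to S904 flow can be sketched as follows; function and field names are assumptions for illustration only.

```python
# Hedged sketch of the S901-S904 dictionary conversion input flow: confirm
# registration (S901), branch on the result (S902), report an error when the
# string is unregistered (S903), otherwise use the converted string as the
# input characters (S904).

special_dictionary = {
    1: {"recognized": "Hirake Sesame", "converted": "6922#7MgkRH"},
}

def dictionary_conversion_input(recognized, selected_dictionary):
    # S901: confirm whether the recognized string is registered
    converted = None
    for entry in selected_dictionary.values():
        if entry["recognized"] == recognized:
            converted = entry["converted"]
            break
    # S902: branch based on the confirmation result
    if converted is None:
        # S903: stands in for the on-screen error display
        return {"error": "not registered in dictionary"}
    # S904: the converted string becomes the input characters
    return {"input": converted}
```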
- As described above, for a complicated character string entered as characters such as an account name or a password when logging in to a server device, once the string is registered in the character conversion dictionary, it can be easily input by speaking a simple character string for voice recognition, and the usability for the user can be improved.
- user authentication using a password or the like may be performed when displaying a special character conversion dictionary used when inputting a character string such as a password.
- Alternatively, the input character string may be hidden by a mask display such as “●”, and a screen as shown in FIG. 6C may be displayed only after performing user authentication using a password or the like.
- The normal character conversion dictionary and the special character conversion dictionary may also be displayed together in one list without switching the dictionary display. When they are displayed simultaneously, the list indicates which dictionary each entry is registered in; for example, “N” is prefixed to the registration numbers of character strings registered in the normal character conversion dictionary, and “S” is prefixed to those registered in the special character conversion dictionary.
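The combined list display with “N”/“S” prefixes can be sketched as follows; the data layout and helper name are illustrative assumptions.

```python
# Illustrative sketch of the combined list display: entries from both
# dictionaries are listed at once, with "N" prefixed to registration numbers
# from the normal character conversion dictionary and "S" to those from the
# special character conversion dictionary.

def combined_list(normal, special):
    rows = []
    for number, entry in sorted(normal.items()):
        rows.append(("N%d" % number, entry["recognized"], entry["converted"]))
    for number, entry in sorted(special.items()):
        rows.append(("S%d" % number, entry["recognized"], entry["converted"]))
    return rows

normal = {1: {"recognized": "Hirake Sesame", "converted": "open sesame"}}
special = {1: {"recognized": "Hirake Sesame", "converted": "6922#7MgkRH"}}
```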
- When the recognized character string is not registered in the character conversion dictionary in the dictionary conversion input process, the recognized character string may be used directly as the input characters without displaying an error.
- FIG. 10 is a sequence diagram showing the flow of dictionary registration processing including user authentication processing.
- FIG. 11 is a diagram showing a screen display example in the dictionary registration processing according to the second embodiment, where (a) shows a special dictionary registration list display screen, (b) shows a new dictionary registration screen, and (c) Indicates a dictionary registration change screen.
- the process is repeatedly performed (S1000) until the user selects the end of dictionary registration.
- the dictionary registration execution unit 1037 displays a list of character strings registered in the dictionary on the display unit 116 (S1001).
- FIG. 11A shows a display example of the dictionary registration character string list display screen.
- the special dictionary registration list shown in FIG. 11A displays a registration number 11a1, a pre-conversion character string 11a2, a post-conversion character string 11a3, a selection frame 11a4, and a function 11a5 assigned to the color key 120a14 of the remote controller 120.
- As the functions assigned to the color keys 120a14, an example is shown in which “red” is assigned to new registration, “blue” to registration change, “yellow” to registration deletion, and “green” to switching between the normal character conversion dictionary and the special character conversion dictionary.
- For character strings that require authentication, such as registration numbers S4 and S5, the converted character strings are masked so that they cannot be read.
- Next, the user performs process selection input using the remote controller 120 (S1002). For example, by moving the selection frame 11a4 with the up and down cursor keys 120a12 of the remote controller 120 and pressing the “blue” or “yellow” key of the color keys 120a14, the user can select change processing or deletion processing of the dictionary registration contents displayed in the selection frame.
- the dictionary registration execution unit 1037 executes the branch process S1010.
- the dictionary registration execution unit 1037 displays a new dictionary registration screen on the display unit 116 (S1011).
- a character conversion dictionary registration process (S1012) is executed.
- Fig. 11B shows a display example of the new dictionary registration screen.
- The new dictionary registration screen shown in FIG. 11B displays a new registration number 11b1, a character string (recognized character string) input frame 11b2 before conversion by voice recognition input, a character string (special character string) input frame 11b3 after conversion, and the functions 11b4 assigned to the color keys 120a14 of the remote controller 120.
- “Red” is assigned to registration as a character string that requires user authentication at the time of conversion, and “blue” is assigned to registration as a character string that does not require user authentication at the time of conversion.
- When the user selects [Registration change] (when the “blue” key of the color keys 120a14 is pressed), the user authentication execution unit 1035 performs branch processing (S1013) depending on whether or not the character string to be changed is a character string that requires user authentication at the time of conversion.
- If the character string does not require user authentication, the dictionary registration execution unit 1037 displays a dictionary registration change screen on the display unit 116 (S1014), and executes a character conversion dictionary registration process S1015 described later.
- Fig. 11 (c) shows a display example of the dictionary registration change screen.
- The dictionary registration change screen displays the registration number 11c1 of the character string to be changed, the character string input frame 11c2 before conversion by voice recognition input, the character string input frame 11c3 after conversion, and the functions 11c4 assigned to the color keys 120a14 of the remote controller 120.
- “Red” is assigned to registration as a character string that requires user authentication at the time of conversion, and “blue” is assigned to registration as a character string that does not require user authentication at the time of conversion.
- the registered character string is displayed first, and either the character string before conversion, the character string after conversion, or both can be changed.
- When the selected character string is a character string that requires user authentication, the user authentication execution unit 1035 performs user authentication processing (S1016), and performs branch processing (S1017) based on the authentication result.
- If the authentication is invalid, the user authentication execution unit 1035 displays that the authentication is invalid (S1018).
- If the authentication is valid, the user authentication execution unit 1035 outputs the authentication result to the dictionary registration execution unit 1037; the dictionary registration execution unit 1037 displays a dictionary registration change screen on the display unit 116 as in S1014 (S1019), and executes a character conversion dictionary registration process S1020 described later.
- When the user selects [Registration deletion], the dictionary registration execution unit 1037 deletes the registration of the character string in the selection frame portion (S1021).
- When the user selects [Dictionary switching] (when the “green” key of the color keys 120a14 is pressed), the dictionary registration execution unit 1037 accepts an operation from the user for switching between registration processing of the normal character conversion dictionary and registration processing of the special character conversion dictionary, and switches the dictionary to be registered according to the operation result (S1022). As a result, the dictionary displayed when the process returns to the dictionary registration character string list display process S1001 by the repetition process is also switched.
- the dictionary registration execution unit 1037 ends the dictionary registration process by the interruption process (S1023), and ends the repeat process S1000.
- When deleting a character string that requires authentication, the user authentication execution unit 1035 may perform user authentication, and the dictionary registration execution unit 1037 may perform the deletion only when the authentication is valid.
- FIG. 12 is a sequence diagram showing the flow of the character conversion dictionary registration process in the dictionary registration process of the second embodiment.
- the repetitive process (S1200) is executed until the user selects the end of dictionary registration.
- the user inputs a process selection using the remote controller 120 (S1201).
- In the character conversion dictionary registration process at the time of new dictionary registration, the character string input frame 11b2 before conversion or the character string input frame 11b3 after conversion by voice recognition input is selected by the up and down cursor keys 120a12 of the remote controller 120, and the process is selected by pressing the “red” or “blue” key of the color keys 120a14.
- In the character conversion dictionary registration process at the time of dictionary registration change, the character string input frame 11c2 before conversion or the character string input frame 11c3 after conversion by voice recognition input is selected by the up and down cursor keys 120a12 of the remote controller 120, and the process is selected by pressing the “red” or “green” key of the color keys 120a14.
- the dictionary registration execution unit 1037 executes the branch process S1202 according to the result of the process selected by the user.
- When the user selects [Voice input], the voice input from the user (S1203) is captured by the voice input unit 111 (S1204); the voice conversion execution unit 1034 converts the captured voice into a recognized character string by voice recognition, and the recognized character string is displayed in the character string input frame 11b2 or 11c2 (S1205).
- When the user selects [Input character string after conversion] (when the character string input frame 11b3 or 11c3 is selected and the determination key 120a13 of the remote controller 120 is pressed), the user inputs the converted character string using the channel/character input keys 120a3 of the remote controller 120 or the like (S1206), and the dictionary registration execution unit 1037 displays it in the character string input frame 11b3 or 11c3 (S1207).
- When the user selects [Registration with user authentication] (when the “red” key of the color keys 120a14 is pressed), the user authentication execution unit 1035 performs an authentication information acquisition process (S1208), and stores the acquired authentication information in the user authentication information storage area 104c in association with the registration number. Further, the dictionary registration execution unit 1037 registers the character string displayed in the character string input frame 11b2 or 11c2 before conversion by voice recognition input and the character string displayed in the character string input frame 11b3 or 11c3 after conversion, in association with each other, in the normal character conversion dictionary or the special character conversion dictionary switched by the dictionary switching process S1022 (S1209). The registered dictionary information is stored in the voice conversion information storage area 104b of the storage unit 104.
- When the user selects [Registration without user authentication] (when the “blue” key of the color keys 120a14 is pressed), the dictionary registration execution unit 1037 registers the character string displayed in the character string input frame 11b2 or 11c2 before conversion by voice recognition input and the character string displayed in the converted character string input frame 11b3 or 11c3, in association with each other, in the normal character conversion dictionary or the special character conversion dictionary switched by the dictionary switching process S1022 (S1210).
- the registered dictionary information is stored in the voice conversion information storage area 104b of the storage unit 104.
- When the user selects [End], the dictionary registration execution unit 1037 ends the character conversion dictionary registration process by the interruption process (S1211), and ends the repetition process S1200.
- the user authentication execution unit 1035 performs user authentication information acquisition processing S1208, and stores the acquired authentication information in the user authentication information storage area 104c in association with the character string to be registered.
- For example, voiceprint data may be acquired as the authentication information using the voice data acquired in the voice input acquisition process S1204.
- an image of the user's face may be acquired by the imaging unit 114, and the face recognition data may be used as authentication information.
- In this way, the user authentication execution unit 1035 acquires authentication information using an authentication method that can identify the user; when a voice-recognized character string is to be converted according to a character conversion dictionary, user authentication is performed using the acquired authentication information, and the conversion using the character conversion dictionary may be performed only when the user authentication is valid.
- Either voiceprint recognition or face recognition may be selected as the user authentication execution method for the broadcast receiving apparatus 1.
- a screen for selecting a user authentication method may be displayed on the display unit 116 so that the user can select using the cursor keys (up, down, left, right) 120a12 and the enter key 120a13 of the remote controller 120.
- For example, when the user is in poor physical condition and the voice differs from normal, the apparatus may be configured to switch to face recognition when it is determined that the captured voice does not match the voiceprint data stored as authentication information.
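The fallback just described can be sketched as follows. The equality checks are stand-ins for real biometric matching, and all names are illustrative assumptions.

```python
# Hedged sketch of the fallback: voiceprint matching is tried first and, when
# the captured voice does not match the stored authentication information,
# face recognition via the imaging unit is tried instead.

def authenticate_user(captured_voiceprint, captured_face, stored):
    # Primary method: voiceprint comparison
    if captured_voiceprint == stored["voiceprint"]:
        return True
    # The voice may differ when the user is unwell: switch to face recognition
    return captured_face == stored["face"]

stored_auth = {"voiceprint": "vp_father", "face": "face_father"}
```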
- registration of a character string for performing user authentication may be performed using both the normal character conversion dictionary and the special character conversion dictionary, or may be performed using only the special character conversion dictionary.
- When registration with user authentication can be used only with the special character conversion dictionary, the function corresponding to “red” of the color keys 120a14 in FIG. 11B and FIG. 11C (registration with authentication) is not displayed on the new dictionary registration screen and the dictionary registration change screen of the normal dictionary.
- registration of user authentication for a single character string may be performed for a plurality of people.
- For example, if user authentication for the same character string is registered for both a father and a mother, the mother can also input the password by voice input of that character string.
- FIG. 13 is a diagram illustrating an example of dictionary registration information stored in the special character conversion dictionary according to the second embodiment.
- As dictionary registration information, a registration number 13b1, a character string (recognized character string) 13b2 before conversion by voice recognition input, a character string 13b3 after conversion, an authentication necessity 13b4, and authentication information 13b5 stored in the user authentication information storage area 104c are stored.
- For entries registered as requiring authentication, such as registration numbers 3 and 4, the user is authenticated at the time of voice input; even if a person other than the registered person inputs the recognized character string by voice, the authentication becomes invalid and conversion to the converted character string is not performed, so that a password or the like can be prevented from being entered without permission.
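One record of the FIG. 13 dictionary registration information can be sketched as follows; the field names and sample values are illustrative assumptions, not the patent's actual data layout.

```python
# Sketch of a FIG. 13 record: registration number 13b1, recognized string
# 13b2, converted string 13b3, authentication necessity 13b4, and
# authentication information 13b5 (held in the user authentication
# information storage area).
from dataclasses import dataclass
from typing import Optional

@dataclass
class DictionaryEntry:
    number: int                       # 13b1
    recognized: str                   # 13b2: string before conversion
    converted: str                    # 13b3: string after conversion
    needs_auth: bool                  # 13b4: authentication necessity
    auth_info: Optional[str] = None   # 13b5: e.g. a voiceprint reference

entries = [
    DictionaryEntry(1, "Hirake Sesame", "6922#7MgkRH", False),
    DictionaryEntry(3, "open my account", "x9!TqLw2", True, "voiceprint_father"),
]
```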
- FIG. 14 is a flowchart showing the flow of dictionary conversion input processing according to the second embodiment.
- The speech conversion execution unit 1034 confirms whether or not the recognized character string generated by speech recognition in step S423 (see FIG. 4) is registered in the character conversion dictionary selected by the user (S1401). For example, when the normal character conversion dictionary is selected in S411, it checks whether the voice-recognized character string is registered in the normal character conversion dictionary, and when the special character conversion dictionary is selected in S412, it checks whether the voice-recognized character string is registered in the special character conversion dictionary.
- the speech conversion execution unit 1034 performs a branching process (S1402) based on the confirmation result.
- If the voice-recognized character string is not registered in the selected character conversion dictionary (S1402 / No), the speech conversion execution unit 1034 displays an error such as “not registered in the dictionary” (S1403), and the process ends. If the voice-recognized character string is registered in the selected character conversion dictionary (S1402 / Yes), the voice conversion execution unit 1034 confirms from the dictionary registration information whether the voice-recognized character string is a character string that requires user authentication (S1404). Next, based on the result of the confirmation, the voice conversion execution unit 1034 performs a branch process (S1405).
- If the character string does not require user authentication (S1405 / No), the speech conversion execution unit 1034 converts the recognized character string according to the information in the selected character conversion dictionary (S1409), and the process ends. If the character string requires user authentication (S1405 / Yes), the user authentication execution unit 1035 performs user authentication using the authentication information stored in the user authentication information storage area 104c corresponding to the dictionary registration number (S1406), and outputs authentication determination information indicating whether the user authentication is invalid or valid to the voice conversion execution unit 1034. The voice conversion execution unit 1034 performs a branch process (S1407) based on the authentication determination information.
- If the user authentication is invalid (S1407 / No), the voice conversion execution unit 1034 displays that the user authentication is invalid (S1408) and ends the process. If the user authentication is valid (S1407 / Yes), the speech conversion execution unit 1034 converts the recognized character string into a character string according to the information in the character conversion dictionary (S1409), and ends the process.
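The S1401 to S1409 flow above can be sketched as follows. The `authenticate` callback stands in for the user authentication execution unit; names and sample entries are assumptions.

```python
# Hedged sketch of the second embodiment's dictionary conversion input flow:
# dictionary check (S1401, S1402), authentication-necessity check (S1404,
# S1405), user authentication (S1406, S1407), then conversion (S1409).

entries = [
    {"recognized": "Hirake Sesame", "converted": "6922#7MgkRH",
     "needs_auth": False, "auth_info": None},
    {"recognized": "open my account", "converted": "pw#123",
     "needs_auth": True, "auth_info": "vp_father"},
]

def convert_with_auth(recognized, dictionary, authenticate):
    entry = next((e for e in dictionary if e["recognized"] == recognized), None)
    if entry is None:                             # S1402/No
        return "error: not registered"            # S1403
    if entry["needs_auth"]:                       # S1405/Yes
        if not authenticate(entry["auth_info"]):  # S1406, S1407/No
            return "error: authentication invalid"  # S1408
    return entry["converted"]                     # S1409
```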
- In the above description, one special character conversion dictionary 104b3 is provided. However, a different special character conversion dictionary 104b3 may be provided for each user, and the special character conversion dictionary 104b3 provided for the user identified as a result of user authentication may be selected and used to convert the character string in step S1409.
- a first special character conversion dictionary is provided for father and mother, and a second special character conversion dictionary is provided for children.
- the first special character conversion dictionary registers the converted character string “Suzuki_parents” for the recognized character string “Suzuki Adores”.
- the second special character conversion dictionary registers the conversion character string “Suzuki_kid” for the recognized character string “Suzuki Adores”.
- The user authentication execution unit 1035 acquires the recognized character string generated by the speech conversion execution unit 1034 in step S1401, and uniquely identifies the user in the user authentication of S1406. If the user is determined to be the father or the mother, the dictionary conversion process is executed in S1409 after selecting the first special character conversion dictionary; if the user is determined to be a child, the dictionary conversion process is executed after selecting the second special character conversion dictionary.
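The per-user dictionary selection can be sketched as follows. The mapping and helper names are illustrative assumptions; the example strings follow the “Suzuki Adores” entries above.

```python
# Sketch of per-user special dictionaries: user authentication identifies the
# speaker, and the special character conversion dictionary provided for that
# user is chosen before the S1409 conversion.

first_special = {"Suzuki Adores": "Suzuki_parents"}   # for father and mother
second_special = {"Suzuki Adores": "Suzuki_kid"}      # for children

user_dictionaries = {
    "father": first_special,
    "mother": first_special,
    "child": second_special,
}

def convert_for_user(user, recognized):
    # Select the dictionary for the authenticated user, then convert
    dictionary = user_dictionaries.get(user, {})
    return dictionary.get(recognized)
```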
- The third embodiment uses voice input by the user in the user authentication process. More specifically, when voice-based user authentication such as a voiceprint is used for a character string that requires user authentication, this embodiment prevents user authentication from being validated by playing back a recording of the voice captured when the character string was input in the past.
- FIG. 15 is a sequence diagram illustrating an example of a character input process by voice recognition according to the third embodiment.
- The voice conversion execution unit 1034 sets the identification sound (S431), and starts outputting the set identification sound from the voice output unit 113 (S432).
- the user utters a voice (S433)
- the voice input unit 111 captures the voice (S434)
- the voice conversion execution unit 1034 finishes outputting the identification sound from the voice output unit 113 (S435).
- Based on the captured voice and the voice recognition dictionary 104b1, the voice conversion execution unit 1034 converts the voice into a recognized character string by voice recognition (S436). In a dictionary conversion input process (S437) described later, the user authentication execution unit 1035 performs user authentication using the input voice; if the user authentication is valid, the recognized character string recognized by the voice conversion execution unit 1034 is converted according to the character conversion dictionary selected by the user, and the converted result is used as the character string input.
- FIG. 16 is a flowchart showing the flow of dictionary conversion input processing according to the third embodiment.
- In FIG. 16, the same processing portions as those in FIG. 14 are denoted by the same step numbers, and their description is omitted.
- In the dictionary conversion input process of the third embodiment, user authentication is performed by the user authentication execution unit 1035 using the voiceprint data stored in the user authentication information storage area 104c corresponding to the dictionary registration number (S1406a1), and branch processing (S1406a2) is performed based on authentication determination information indicating whether the user authentication by voiceprint is invalid or valid. If the voiceprint stored as authentication information at the time of dictionary registration differs from the voiceprint captured in S434 (see FIG. 15) and the user authentication is invalid (S1406a2 / No), the voice conversion execution unit 1034 displays that the user authentication is invalid (S1408), and the process ends.
- If the user authentication by voiceprint is valid, the user authentication execution unit 1035 performs an identification sound determination process (S1406a3) described later, and outputs authentication determination information indicating whether the user authentication is invalid or valid to the voice conversion execution unit 1034.
- the voice conversion execution unit 1034 performs a branch process (S1407) based on the authentication determination information.
- If the identification sound determination indicates that the authentication is invalid (S1407 / No), the voice conversion execution unit 1034 determines that recorded voice has been input, displays that the user authentication is invalid (S1408), and ends the process.
- If the user authentication is valid (S1407 / Yes), the speech conversion execution unit 1034 converts the character string according to the information of the selected character conversion dictionary (S1409), and ends the process.
- FIG. 17 is a flowchart showing the flow of the identification sound determination process.
- FIG. 18 is a diagram illustrating signal waveforms used in the identification sound determination process; FIGS. 18(a1) to (a5) are examples of waveform diagrams in the case where the user inputs voice directly and the user authentication is determined to be valid by the identification sound determination.
- FIGS. 18(b1) to (b5) are examples of waveform diagrams in the case where recorded voice is input and the user authentication is determined to be invalid by the identification sound determination.
- As the identification sound, the user authentication execution unit 1035 outputs a signal having a predetermined frequency F0 at predetermined intervals, repeating output for a period Ton and output stop for a period Toff. The predetermined periods Ton and Toff are desirably short, on the order of milliseconds, so that it is difficult for the user to synchronize the utterance timing with them.
- FIGS. 18(a2) and (b2) show waveforms of the audio signal captured in S434. In FIG. 18(a2), the identification sound output from the audio output unit 113 and the user's voice are captured. In FIG. 18(b2), the identification sound output from the audio output unit 113, the recorded user's voice, and the recorded identification sound are captured.
- In the identification sound determination process, the user authentication execution unit 1035 first performs filter processing (S1701) on the audio signal captured in S434 (FIGS. 18(a2) and (b2)) to detect the signal component corresponding to the frequency F0 of the identification sound. As a result, signals in which the identification sound component is detected (FIGS. 18(a3) and (b3)) are obtained.
- FIG. 18 (a3) the identification sound output from the audio output unit 113 is detected, whereas in FIG. 18 (b3), the identification sound output from the audio output unit 113 and the identification sound of the recorded voice are displayed. Detected. Further, the timing is different between the identification sound output from the audio output unit 113 and the recorded identification sound. For this reason, during the period in which the identification sound output from the audio output unit 113 and the identification sound of the recorded sound are detected simultaneously, the interference between the identification sound output from the audio output unit 113 and the identification sound of the recorded sound And the amplitude of the identification sound captured by the voice input unit 111 changes depending on the way of interference.
- FIG. 18 (b3) shows a case where the amplitude of the identification sound is reduced due to interference.
- next, the user authentication execution unit 1035 detects the amplitude of the filtered signal (S1702) and compares it with a predetermined threshold value, thereby converting it into a binary signal of H level and L level (S1703).
- FIGS. 18(a4) and (b4) are signals obtained by detecting the amplitude of the identification sounds shown in FIGS. 18(a3) and (b3) by the amplitude detection processing S1702.
- in FIG. 18(a4), only the amplitude of the identification sound output from the audio output unit 113 is detected.
- FIG. 18(b4) shows that the detected amplitude differs between the periods in which only the identification sound output from the audio output unit 113 is present and the periods in which it interferes with the recorded identification sound.
- FIGS. 18(a5) and 18(b5) are signals obtained by comparing the signal detected by the amplitude detection process S1702 with the predetermined threshold value Vt in the binarization process S1703 and converting it into a binary value of H level and L level.
- in FIG. 18(a5), the H level period is substantially Ton.
- in FIG. 18(b5), the H level period deviates greatly from Ton due to the influence of the identification sound of the recorded voice.
- the user authentication execution unit 1035 detects the H level period of the binarized signal (S1704) and performs branch processing S1705 based on whether the H level period is within a predetermined range (here, as an example, Ton × 0.9 to Ton × 1.1). If it is not within the predetermined range (S1705/No), the user authentication execution unit 1035 determines that a recorded voice has been input and that the user authentication is invalid (S1706), and ends the process. If it is within the predetermined range (S1705/Yes), the user authentication execution unit 1035 determines that the user authentication is valid (S1707), and ends the process.
- since it is difficult to input a recorded voice while synchronizing the timing of its recorded identification sound with the identification sound output from the audio output unit 113, it can be determined that the user authentication is invalid when a recorded voice is input.
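The filter, amplitude-detection, binarization, and period-check steps (S1701 to S1705) can be sketched as follows. This is a minimal illustration only: the sample rate, identification frequency F0, period Ton, averaging window, and threshold Vt are assumed values the description leaves unspecified, and a phase-insensitive I/Q correlation stands in for the filter processing.

```python
import numpy as np

FS  = 8000    # sample rate in Hz (assumed for illustration)
F0  = 1000    # identification-sound frequency F0 (assumed)
TON = 0.2     # output period Ton in seconds (assumed)

def h_level_period(signal, fs=FS, f0=F0, vt=0.3):
    """S1701-S1704 sketch: detect the F0 component, detect its amplitude,
    binarize against threshold Vt, and measure the longest H-level period."""
    t = np.arange(len(signal)) / fs
    win = int(fs * 0.02)                     # 20 ms averaging window
    kern = np.ones(win) / win
    # S1701: extract the F0 component (phase-insensitive I/Q correlation)
    i = np.convolve(signal * np.cos(2 * np.pi * f0 * t), kern, "same")
    q = np.convolve(signal * np.sin(2 * np.pi * f0 * t), kern, "same")
    amp = 2 * np.hypot(i, q)                 # S1702: amplitude detection
    binary = amp > vt                        # S1703: binarization (H/L)
    # S1704: longest contiguous H-level run, converted to seconds
    runs, run = [0], 0
    for b in binary:
        run = run + 1 if b else 0
        runs.append(run)
    return max(runs) / fs

def user_authentication_valid(signal, ton=TON):
    """S1705 sketch: valid only if the H period is within Ton*0.9 .. Ton*1.1."""
    return ton * 0.9 <= h_level_period(signal) <= ton * 1.1
```

A recorded identification sound that interferes with the live one stretches or shrinks the H-level run, pushing it outside the ±10% window, which is exactly the condition S1705 tests.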
- FIG. 19 is a diagram illustrating an example of the waveform of the identification sound when a plurality of (two or more) timing patterns are provided for the output period Ton of the signal having the predetermined frequency F0 and the stop period Toff in the identification sound setting process S341.
- FIG. 20 is a diagram showing an example of a frequency spectrum of voice input when a plurality (two or more) of identification sound frequencies are provided.
- FIG. 20(a) shows the frequency spectrum when the voice actually uttered by the user is input, and (b) shows the frequency spectrum when the recorded voice is input.
- a plurality of (two or more) timing patterns of the output period Ton of the signal having the predetermined frequency F0 and the stop period Toff are provided.
- by detecting the timing difference between the output identification sound and the period Ton and stop period Toff of the identification sound included in the captured voice, it can be determined whether or not a recorded voice has been input.
- in the identification sound setting process S441, when a plurality of (two or more) identification sound frequencies are set and changed for each voice input, and a voice that has not been recorded is input, the spectrum corresponding to the frequency F0 of the identification signal output from the audio output unit 113 is detected, as shown in FIG. 20(a).
- when a recorded voice is input, the spectrum corresponding to the frequency F0 of the identification signal output from the audio output unit 113 and the spectrum corresponding to the frequency F1 of the recorded identification signal are detected, as shown in FIG. 20(b). Therefore, the frequency spectrum of the input signal is analyzed, and when a spectrum having a frequency different from that of the identification signal output from the audio output unit 113 is detected, it is determined that a recorded voice has been input, and the user authentication may be invalidated.
- the identification sound may be an audible frequency signal or a non-audible frequency signal (ultrasound).
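The frequency-changing variant can be sketched as a simple spectrum check: if a strong component appears at a frequency other than the currently output identification frequency F0 (for example, a stale F1 embedded in a recording), the input is treated as recorded. The thresholds and tolerances are assumptions for illustration; in practice the check would be applied to the isolated identification band rather than raw broadband speech.

```python
import numpy as np

def recorded_voice_detected(signal, fs, f0, tol_hz=50.0, rel_thresh=0.6):
    """Sketch of the FIG. 20 check: return True when a strong spectral
    component exists farther than tol_hz from the current identification
    frequency f0, suggesting a recorded identification sound (F1)."""
    spec = np.abs(np.fft.rfft(signal * np.hanning(len(signal))))
    freqs = np.fft.rfftfreq(len(signal), 1.0 / fs)
    strong = freqs[spec > rel_thresh * spec.max()]   # dominant bins
    return bool(np.any(np.abs(strong - f0) > tol_hz))
```

If this function returns True, the user authentication may be invalidated, as described above.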
- FIG. 21 is a sequence diagram showing an example of character input processing by voice recognition according to the fourth embodiment.
- FIG. 22 is a flowchart showing the flow of dictionary conversion input processing according to the fourth embodiment.
- FIG. 23 shows the contents of the lip movement determination process in face authentication, where (a) shows the movement of the lips and (b) shows the time-series change of the x and y components of the lip movement.
- the processing from S401 to S412 is the same as that in FIG.
- the imaging unit 114 starts capturing an image (S441).
- the voice input unit 111 captures the voice uttered by the user (S442, S443), and the imaging unit 114 ends the capturing of the image (S444).
- the voice conversion execution unit 1034 converts the captured voice into a recognized character string by voice recognition based on the voice recognition dictionary 104b1 (S445). In the dictionary conversion input processing (S446) described later, conversion processing is performed on the recognized character string according to the character conversion dictionary selected by the user, and the converted result is used as the character string input.
- FIG. 22 shows an example of a flowchart of the dictionary conversion input process S446.
- in FIG. 22, processing parts identical to those in the flowchart described above are given the same step numbers, and their description is omitted.
- the user authentication execution unit 1035 performs user authentication by face authentication based on the images captured by the imaging unit 114 between S441 and S444 (S1406b1), and performs a branch process (S1406b2) based on the authentication determination information indicating whether the user authentication is invalid or valid.
- if the user authentication is invalid, the voice conversion execution unit 1034 displays that the user authentication is invalid (S1408), and the process is terminated.
- based on the images captured between S441 and S444, the user authentication execution unit 1035 outputs authentication determination information indicating whether the user authentication is invalid or valid to the voice conversion execution unit 1034.
- the voice conversion execution unit 1034 performs a branch process (S1407) based on the authentication determination information.
- the user authentication execution unit 1035 detects the movement of the lips of the recognized face from the images captured between S441 and S444. When the user authentication execution unit 1035 detects lip movement, it determines that the captured image is not a photograph and that the user authentication is valid (S1407/Yes). If it cannot detect lip movement, it determines that the captured image is a photograph and that the user authentication based on the face image is invalid (S1407/No).
- the voice conversion execution unit 1034 displays that the user authentication is invalid (S1408), and ends the process. If the user authentication is valid based on the authentication determination information (S1407 / Yes), the speech conversion execution unit 1034 converts the character string according to the information in the character conversion dictionary (S1409), and the process ends.
- the user authentication execution unit 1035 may determine based on not only the presence / absence of the movement of the lips but also the degree of opening of the lips corresponding to the character string at the time of voice input.
- as shown in FIG. 23(a), the user authentication execution unit 1035 detects the size of the lips, with the horizontal size as X and the vertical size as Y. When the user authentication information (13a5 in FIG. 13) is registered in advance in association with the dictionary, for example, special character information, a face image is captured when the character string to be recognized is input by voice. From the captured image, as shown in FIG. 23(b), the X and Y components of the lip size corresponding to the voice-input character string are detected, and the lip sizes X and Y corresponding to the character string are stored as dictionary registration information together with the user authentication information based on the face image.
- in the determination process S1406b3 based on the movement of the lips, the user authentication execution unit 1035 may detect the lip sizes X and Y corresponding to the voice-input character string from the images captured between S441 and S444 and compare them with the lip sizes registered as dictionary information. As a result, user authentication using a face image can be performed with higher accuracy.
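The lip-based liveness check described above can be sketched as follows. This is an illustrative sketch only: the per-frame lip sizes are assumed to come from a face-detection stage not shown here, and the thresholds (minimum variation, maximum trajectory distance) are assumptions.

```python
import numpy as np

def lips_moving(y_sizes, min_range=2.0):
    """S1406b3 sketch: a still photograph yields an almost constant lip
    height Y; live speech makes Y vary from frame to frame."""
    y = np.asarray(y_sizes, dtype=float)
    return (y.max() - y.min()) >= min_range

def matches_registered(y_sizes, registered, max_dist=1.5):
    """Optional stricter check: compare the captured Y trajectory with the
    trajectory stored as dictionary registration information (13a5)."""
    a = np.asarray(y_sizes, dtype=float)
    b = np.asarray(registered, dtype=float)
    n = min(len(a), len(b))
    a, b = a[:n], b[:n]
    # normalize out absolute lip size and camera distance
    a = (a - a.mean()) / (a.std() + 1e-9)
    b = (b - b.mean()) / (b.std() + 1e-9)
    return float(np.mean(np.abs(a - b))) <= max_dist
```

A flat trajectory fails `lips_moving` (photograph case, S1407/No); a varying trajectory that also matches the registered one passes the stricter comparison.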
- voice input processing is performed using an information terminal device 2 linked with the broadcast receiving device 1.
- FIG. 24 is a block diagram illustrating an example of an internal configuration of the information terminal device 2.
- the information terminal device 2 includes a system bus 200, a main control unit 201, a ROM 202, a RAM 203, a storage unit 204, an expansion I/F unit 205, an operation unit 206, a sensor unit 210, a communication processing unit 220, an image processing unit 230, and an audio processing unit 240.
- the main control unit 201 is a microprocessor unit that controls the entire information terminal device 2.
- the system bus 200 is a data communication path for transmitting and receiving data between the main control unit 201 and each operation block in the information terminal device 2.
- the ROM 202 is a memory in which a basic operation program such as an operating system and other operation programs are stored. For example, a rewritable ROM such as an EEPROM or a flash ROM is used.
- the RAM 203 serves as a work area when the basic operation program and other operation programs are executed.
- the ROM 202 and RAM 203 may be integrated with the main control unit 201. Further, the ROM 202 may not use an independent configuration as shown in FIG. 24 but may use a partial storage area in the storage unit 204.
- the storage unit 204 stores an operation program and an operation setting value of the information terminal device 2, personal information of the user of the information terminal device 2, and the like. Further, it is possible to store an operation program downloaded from the network, various data created by the operation program, and the like. Also, contents such as moving images, still images, and audio downloaded from the network can be stored. All or some of the functions of the ROM 202 may be replaced by a partial area of the storage unit 204. In addition, the storage unit 204 needs to hold stored information even when power is not supplied to the information terminal device 2 from the outside. Therefore, for example, devices such as a flash ROM, SSD, and HDD are used.
- each operation program stored in the ROM 202 or the storage unit 204 can be updated and expanded by a download process from a server device on the external network 4.
- the expansion I/F unit 205 is a group of interfaces for extending the functions of the information terminal device 2, and in this embodiment includes a video/audio I/F, a USB I/F, a memory I/F, and the like.
- the video / audio I / F performs input of video signals / audio signals from external video / audio output devices, output of video signals / audio signals to external video / audio input devices, and the like.
- the USB I / F transmits and receives data by connecting to a PC or the like. A keyboard or other USB device may be connected.
- the memory I / F connects a memory card or other memory medium to transmit / receive data.
- the operation unit 206 is an instruction input unit that inputs operation instructions to the information terminal device 2, and in this embodiment is configured by a touch panel arranged over the display unit 231. Operation input is possible by detecting, for example, a gesture called a swipe (moving a finger in a specific direction while touching the touch panel) and a gesture called a tap (quickly releasing the finger after touching the touch panel).
- the sensor unit 210 is a sensor group for detecting the state of the information terminal device 2.
- the information terminal device 2 may further include other sensors such as a barometric pressure sensor.
- the communication processing unit 220 includes a LAN communication unit 221, a mobile telephone network communication unit 222, and a Bluetooth communication unit 223.
- the LAN communication unit 221 is connected to the external network 4 via the router device 3 and transmits / receives data to / from a server device on the external network 4. It is assumed that the connection with the router device 3 is performed by a wireless connection such as WiFi (registered trademark).
- the mobile telephone network communication unit 222 performs telephone communication (call) and data transmission / reception by wireless communication with a base station (not shown) of the mobile telephone communication network.
- the LAN communication unit 221, the mobile telephone network communication unit 222, and the Bluetooth communication unit 223 are each provided with a coding circuit, a decoding circuit, an antenna, and the like.
- the communication processing unit 220 may further include another communication unit such as an NFC communication unit or an infrared communication unit.
- the image processing unit 230 includes a display unit 231, an image signal processing unit 232, a first image input unit 233, and a second image input unit 234.
- the display unit 231 is a display device such as a liquid crystal panel, for example, and provides the image data processed by the image signal processing unit 232 to the user of the information terminal device 2.
- the image signal processing unit 232 includes a video RAM (not shown), and image data input to the video RAM is displayed on the display unit 231.
- the image signal processing unit 232 has a function of performing format conversion, menu and other OSD signal superimposition processing as necessary.
- the first image input unit 233 and the second image input unit 234 input image data of surroundings and objects by converting light input from a lens into an electrical signal using an electronic device such as a CCD or a CMOS sensor. It is a camera unit.
- the audio processing unit 240 includes an audio output unit 241, an audio signal processing unit 242, and an audio input unit 243.
- the audio output unit 241 is a speaker, and provides the audio signal processed by the audio signal processing unit 242 to the user of the information terminal device 2.
- the voice input unit 243 is a microphone, which converts a user's voice and the like into voice data and inputs the voice data.
- the information terminal device 2 may be a mobile phone, a smartphone, a tablet terminal, or the like. It may be a PDA (Personal Digital Assistant) or a notebook PC (Personal Computer).
- the configuration example of the information terminal device 2 illustrated in FIG. 24 includes a number of components that are not essential to the present embodiment, such as the sensor unit 210; the effect of this embodiment is not lost even if these components are omitted. Further, configurations not shown in the figure, such as a digital broadcast receiving function and an electronic money settlement function, may be added.
- FIG. 25 is a diagram illustrating an internal configuration of the information terminal device 2 according to the present embodiment, in which (a) is a software configuration diagram of the information terminal device 2 and (b) is stored in the voice conversion information storage area. Indicates the dictionary to be used.
- FIG. 25A shows a software configuration in the ROM 202, the RAM 203, and the storage unit 204.
- the basic operation program 2021 stored in the ROM 202 is expanded in the RAM 203, and the main control unit 201 executes the expanded basic operation program to constitute the basic operation execution unit 2031.
- the application program 2041, the voice conversion program 2042, and the cooperation device management program 2043 stored in the storage unit 204 are expanded in the RAM 203, and the main control unit 201 executes the expanded operation programs, thereby configuring an application execution unit 2032, a voice conversion execution unit 2033, and a linked device management execution unit 2034.
- the RAM 203 includes a temporary storage area that temporarily stores data created when each operation program is executed as necessary.
- the storage unit 204 also includes a voice conversion information storage area 204a for storing a dictionary and the like for converting a voice-recognized character string into a predetermined character string, a cooperative device information storage area 204b for storing authentication information used in cooperative operation with the broadcast receiving apparatus 1 and the like, and a various information storage area 204c for storing various other information.
- the voice conversion information storage area 204a stores a voice recognition dictionary 204a1 that recognizes an input voice and converts it into a character string, as well as a normal character conversion dictionary (first dictionary) 204a2 and a special character conversion dictionary (second dictionary) 204a3 that convert the voice-recognized character string into a predetermined character string.
- the normal character conversion dictionary (first dictionary) 204a2 and the special character conversion dictionary (second dictionary) 204a3 are the same as the normal character conversion dictionary and the special character conversion dictionary of the first embodiment.
- when a user is specified, as with a smartphone or a mobile phone, the dictionaries can be configured as a normal character conversion dictionary and a special character conversion dictionary corresponding to that user.
- when the information terminal device 2 is a device used by a plurality of users, for example, a PC or a tablet terminal, a normal character conversion dictionary and a special character conversion dictionary are prepared for each user, and for a device on which a user logs in, the normal character conversion dictionary and special character conversion dictionary corresponding to the logged-in user may be selected by the voice conversion execution unit 2033 described later.
- in the following description, the main control unit 201 expands the basic operation program 2021 stored in the ROM 202 into the RAM 203 and executes it, and the resulting basic operation execution unit 2031 is described as performing control of each operation block. The same description applies to the other operation programs.
- the application execution unit 2032 executes various operation programs downloaded from the server device. Each application is activated by receiving an operation from the user via the operation unit 206 and selecting an application activation icon displayed on the display unit 231.
- the voice conversion execution unit 2033 recognizes the user's voice captured by the voice input unit 243 as a character string based on the voice recognition dictionary 204a1, and performs operation input such as channel selection of the broadcast receiving apparatus 1, or converts the voice-recognized character string (recognized character string) into a predetermined character string according to the dictionaries 204a2 and 204a3 and inputs the characters.
- the linked device management execution unit 2034 registers and manages the broadcast receiving device 1 connected to the local network in the house or the external network 4 such as the Internet, and enables the information terminal device 2 to operate the registered broadcast receiving device 1.
- the RAM 203 includes a temporary storage area 2035 that serves as a work area for the main control unit 201.
- the operation programs may be stored in the ROM 202 and/or the storage unit 204 in advance at the time of product shipment. After product shipment, they may be acquired from a server device or the like on the external network 4 via the LAN communication unit 221 or the mobile telephone network communication unit 222. Further, the operation programs stored in a memory card, an optical disk, or the like may be acquired via the expansion I/F unit 205 or the like.
- FIG. 26 is a sequence diagram illustrating an example of a character input process by voice recognition according to the fifth embodiment.
- FIG. 27 is a diagram showing a screen display example of the information terminal device 2, where (a) shows an application startup screen, (b) shows a device authentication screen, and (c) shows a character input screen.
- FIG. 27(a) shows an example of the screen for selecting and starting an application on the information terminal device 2, and a button (icon) corresponding to each application is displayed on the display unit 231.
- when the user taps the "TV cooperation" button 231a1 to input activation of the application, the application for operating the broadcast receiving apparatus 1 in cooperation with the information terminal device 2 is activated.
- the main control unit 201 displays an authentication screen for device authentication on the display unit 231 (S2603), and the user performs selection input of a cooperation device (S2604).
- FIG. 27(b) is an example of the device authentication screen, on which a device list 231b1, a selection frame 231b2, and an enter button 231b3 are displayed.
- the linked device management execution unit 2034 displays, in the device list 231b1, the names of the found devices, regardless of whether each device is authenticated or unauthenticated.
- the linked device management execution unit 2034 stores each found device name in the cooperation device information storage area 204b of the storage unit 204 together with information indicating whether it is authenticated or unauthenticated. A device that was found in the past but could not be found this time can be displayed in a different display color to distinguish it from other devices.
- in the device list 231b1, "TV1 (broadcast receiving device 1)" and "TV2 (broadcast receiving device 2)" are displayed, and it is shown that TV1 has been authenticated previously.
- the selection frame 231b2 is displayed on the display portion of "TV1", indicating that TV1 is selected.
- the cooperation device management execution unit 2034 transmits authentication information such as a user ID and a password stored in advance in the cooperation device information storage area 204b to the cooperation terminal management execution unit 1036 of the broadcast receiving device 1 via the LAN communication unit 221, the router device 3, and the LAN communication unit 117 (S2605).
- the cooperation terminal management execution unit 1036 of the broadcast receiving device 1 performs authentication by comparing the authentication information stored in the cooperation terminal information storage area 104d with the authentication information transmitted from the cooperation device management execution unit 2034 of the information terminal device 2 (S2606), and returns the authentication result to the linked device management execution unit 2034 of the information terminal device 2 (S2607). If the authentication information matches, the cooperation terminal management execution unit 1036 authenticates the connection with the cooperation device management execution unit 2034.
- the information terminal device 2 can omit the authentication screen of FIG. 27B by storing the last authenticated device in the linked device information storage area 204b. If the authentication information does not match, the authentication screen shown in FIG. 27B may be displayed again.
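The credential exchange (S2605 to S2607) and the last-authenticated-device shortcut can be sketched as follows. The field names and stored values are purely illustrative stand-ins for the contents of the cooperation device information storage area 204b (terminal side) and the cooperation terminal information storage area 104d (receiver side).

```python
# Hypothetical credential records; names and values are illustrative only.
terminal_side_204b = {"user_id": "user01", "password": "pw1234",
                      "last_authenticated": None}        # 204b contents
receiver_side_104d = {("user01", "pw1234")}              # 104d contents

def authenticate(sent, registered):
    """S2606 sketch: the receiver compares the transmitted authentication
    information with its registered pairs and returns the result (S2607)."""
    return (sent["user_id"], sent["password"]) in registered

def needs_auth_screen(store, device_name):
    """Sketch of the FIG. 27(b) shortcut: the authentication screen can be
    skipped when this device was the last one successfully authenticated."""
    return store["last_authenticated"] != device_name
```

On a successful match the terminal would record the device as last authenticated, so a subsequent session can skip the screen; on a mismatch the screen is shown again, as described above.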
- the information terminal device 2 is authenticated as a cooperation terminal of the broadcast receiving device 1, and the user can operate the broadcast receiving device 1 using the information terminal device 2.
- the broadcast receiving apparatus 1 displays a character input screen (S2610), and transmits a software keyboard activation request to the cooperating information terminal apparatus 2 (S2611).
- the character input screen is displayed, for example, when an account name or password for logging in to a server is input in order to perform streaming viewing or download viewing of content distributed from the server device via the external network 4.
- the information terminal device 2 displays a character input screen on the display unit 231 (S2612).
- FIG. 27(c) shows an example of the character input screen of the information terminal device 2, which includes a display frame 231c1 for displaying the input characters, character input keys 231c2, a transmission key 231c3 for transmitting the input characters to the broadcast receiving device 1, and a character input key 231c4 for input by voice recognition.
- the main control unit 201 repeats the process (S2613) until the transmission key 231c3 is selected.
- the main control unit 201 determines the key input state by the key input state determination based on the type of the key tapped by the user (S2622), and executes a branch process (S2623) based on the determination result.
- when the main control unit 201 determines that the key input is from a character input key 231c2, the main control unit 201 displays the character input by the key in the display frame 231c1 (S2631).
- the voice input unit 243 captures the voice input from the user (S2641, S2642), and the voice conversion execution unit 2033 converts the voice into a recognized character string by voice recognition based on the captured voice and the voice recognition dictionary 204a1 (S2643).
- in the normal dictionary conversion/display process, the speech conversion execution unit 2033 performs conversion processing on the recognized character string according to the normal character conversion dictionary (first dictionary) 204a2 and displays the converted result in the display frame 231c1 (S2644).
- in the normal character conversion dictionary, general words and phrases are stored in advance in the ROM 202 and/or the storage unit 204 at the time of product shipment. Alternatively, they may be acquired from a server device on the external network 4 via the LAN communication unit 221 or the mobile telephone network communication unit 222 after product shipment, or acquired from a memory card, an optical disk, or the like via the expansion I/F unit 205. The user may also register entries in the same manner as the dictionary registration process described above.
- the voice input from the user (S2651) is captured by the voice input unit 243 (S2652), converted into a recognized character string by the voice conversion execution unit 2033 through voice recognition based on the captured voice and the voice recognition dictionary 204a1 (S2653), and the special dictionary conversion/display processing (S2654) is performed.
- in the special dictionary conversion/display processing, the speech conversion execution unit 2033 performs conversion processing on the recognized character string according to the special character conversion dictionary (second dictionary) 204a3 registered by the user, and displays the converted result in the display frame 231c1 (S2654).
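The two conversion paths (normal dictionary S2644, special dictionary S2654) can be sketched as a simple table lookup. The dictionary entries below are hypothetical examples only; the real contents of 204a2 and 204a3 are registered by the user or preloaded at shipment.

```python
# Hypothetical dictionary entries; real contents are user-registered.
normal_dict_204a2 = {"hello": "Hello"}
special_dict_204a3 = {"open sesame": "Xy7#kQ92"}   # e.g. a password string

def dictionary_convert(recognized, use_special_dictionary):
    """S2644 / S2654 sketch: replace the recognized character string
    according to the selected dictionary; unregistered strings pass
    through unchanged."""
    table = special_dict_204a3 if use_special_dictionary else normal_dict_204a2
    return table.get(recognized, recognized)
```

This illustrates the key property noted later: speaking a simple registered phrase yields a complex character string, while the same phrase on a device without that special dictionary passes through unconverted.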
- when the main control unit 201 determines that the transmission key 231c3 has been input based on the key input state determination result, the main control unit 201 transmits the character string displayed in the display frame 231c1 to the broadcast receiving apparatus 1 in the interruption process (S2661), and ends the repetitive process S2613.
- the broadcast receiving apparatus 1 processes the character string transmitted from the information terminal apparatus 2 as a character input (S2614).
- the registration process for the normal character conversion dictionary and the special character conversion dictionary may be performed in the same manner as in the first embodiment.
- the dictionary registration process may be activated when the user touches the voice input key 231c4 for a predetermined time or more.
- the information terminal device 2 stores the normal character conversion dictionary and the special character conversion dictionary. For this reason, even if a user voice-inputs, on another information terminal device B, a character string registered in an information terminal device A, it is not converted by the normal character conversion dictionary or the special character conversion dictionary of the information terminal device A. Therefore, in an information terminal device such as a smartphone that performs user authentication when used, user authentication at the time of voice input may be omitted.
- the broadcast receiving device 1 and the information terminal device 2 are linked by the LAN communication units 117 and 221 and the router device 3, but may be linked by a communication method such as Bluetooth.
- since the password is not directly input by voice when inputting a password, the password is not revealed even if the utterance is overheard by another person.
- complex character strings can be input by inputting simple character strings by voice.
- character conversion is switched between the first operation input operation (for example, pressing the voice input key once within a predetermined time) and the second operation input operation (for example, pressing the voice input key twice within a predetermined time).
- character input by Japanese speech has been described.
- in the first operation input operation, the result of speech recognition may be used directly as the character input, and in the second operation input operation, the result of converting the speech-recognized character string using the special character conversion dictionary may be used as the character input. That is, a character string recognized by voice during a predetermined operation input operation (for example, pressing the voice input key twice within a predetermined time) may be converted based on a character conversion dictionary.
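As a rough sketch of this operation-dependent behavior, the following Python fragment switches between using the recognition result as-is and converting it with a special character conversion dictionary, depending on how many times the voice input key was pressed within the predetermined time. The dictionary entries, function names, and press-count convention are illustrative assumptions, not details taken from the embodiments.

```python
# Illustrative special character conversion dictionary: a simple spoken
# phrase maps to a complex registered string (entries are hypothetical).
SPECIAL_DICT = {"open sesame": "0p3n_5e5@me"}

def handle_voice_input(recognized: str, key_presses: int) -> str:
    """First operation (1 press): use the recognition result as-is.
    Second operation (2 presses): convert via the special dictionary."""
    if key_presses >= 2:
        # Fall back to the raw recognition result if nothing is registered.
        return SPECIAL_DICT.get(recognized, recognized)
    return recognized

print(handle_voice_input("open sesame", 1))  # -> open sesame
print(handle_voice_input("open sesame", 2))  # -> 0p3n_5e5@me
```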
- in the embodiments above, the voice input unit 111 of the broadcast receiving apparatus 1 is used for voice input. Alternatively, the remote control 120 may be provided with a voice input unit, and voice input may be performed by transmitting the captured voice to the broadcast receiving apparatus 1 by a communication method such as Bluetooth.
- voice may also be captured by the voice input unit 243 of the information terminal device 2, and the captured voice data transmitted to the broadcast receiving device 1 via the LAN communication unit 221, the router device 3, and the LAN communication unit 117.
- the broadcast receiving apparatus 1 and the information terminal apparatus 2 may be provided with a communication unit such as Bluetooth, and voice input may be performed by transmitting audio data from the information terminal apparatus 2 to the broadcast receiving apparatus 1.
- the captured speech data may be transmitted to a server device connected to the network, and speech recognition may be performed on the server device.
- the broadcast receiving device 1 or the information terminal device 2 may receive the character string information recognized by the server device and perform character string conversion using the normal character conversion dictionary or special character conversion dictionary.
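A minimal sketch of this division of labor, with recognition on the server and dictionary-based conversion on the device. The recognizer is stubbed out, and all names and the example registration are hypothetical:

```python
# Sketch: speech recognition happens on a networked server; the device
# then converts the returned text with its locally stored dictionary.

def server_recognize(audio: bytes) -> str:
    """Stand-in for the server-side speech recognizer; in the described
    system the captured audio would be sent over the network."""
    return "maru hi pass"  # pretend recognition result

def device_convert(recognized: str, dictionary: dict) -> str:
    # Unregistered strings pass through unchanged.
    return dictionary.get(recognized, recognized)

special_dict = {"maru hi pass": "#M@ru-Hi!"}  # illustrative registration
audio_from_mic = b"...captured samples..."
text = server_recognize(audio_from_mic)
print(device_convert(text, special_dict))  # -> #M@ru-Hi!
```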
- in the embodiments above, the first operation and the second operation for selecting the normal character conversion dictionary and the special character conversion dictionary, respectively, are determined in advance, and the dictionaries are switched based on whether the user has performed the first operation or the second operation. However, the dictionaries may also be switched not by a user operation but based on information described in the program, such as the type of the character input field.
- the voice conversion execution unit 1034 or the dictionary registration execution unit 1037 may determine the type of the character input field, for example whether it is a user password input field or a search keyword input field. When an input field to be filled with a special character string (for example, a user password input field) is selected as the input destination for character input, the special character conversion dictionary may be selected as the dictionary for converting the recognized character string; otherwise, the normal character conversion dictionary may be selected.
- in this way, the user can obtain conversion to a normal character string or to a special character string merely by selecting an input field (for example, by moving the cursor to it), without performing a separate dictionary selection operation.
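This field-driven dictionary selection could be sketched as follows; the field-type labels and dictionary entries are illustrative assumptions, not values from the embodiments:

```python
# Illustrative dictionaries (contents hypothetical).
NORMAL_DICT = {"tokyo": "Tokyo"}
SPECIAL_DICT = {"my pass": "mY_p@55"}

def dictionary_for_field(field_type: str) -> dict:
    # Password-like fields get the special dictionary; search fields and
    # other ordinary fields get the normal dictionary.
    return SPECIAL_DICT if field_type == "password" else NORMAL_DICT

def convert(recognized: str, field_type: str) -> str:
    return dictionary_for_field(field_type).get(recognized, recognized)

print(convert("my pass", "password"))       # special conversion
print(convert("tokyo", "search_keyword"))   # normal conversion
```

The point of the sketch is that the caller never names a dictionary: the selected input field alone decides which table is consulted.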
- the functions and the like of the present invention described above may be realized in hardware by designing some or all of them as, for example, an integrated circuit. They may also be realized in software by having a microprocessor unit or the like interpret and execute operation programs that implement the respective functions. Hardware and software may be used together.
- the control lines and information lines shown in the figures are those considered necessary for the explanation, and not all control lines and information lines in an actual product are necessarily shown. In practice, almost all components may be considered to be connected to one another.
- 1 broadcast receiving device
- 2 information terminal device
- 3 router device
- 4 external network
- 101 main control unit
- 102 ROM
- 103 RAM
- 104 storage unit
- 111 audio input unit
- 113 audio output unit
- 114 imaging unit
- 116 LAN communication unit
- 120 remote controller
- 1034 voice conversion execution unit
- 1035 user authentication execution unit
- 1036 linkage terminal management unit
- 201 main control unit
- 203 RAM
- 204 storage unit
- 243 voice input unit
- 2033 voice conversion execution unit
- 2034 cooperation device management execution unit
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Acoustics & Sound (AREA)
- Human Computer Interaction (AREA)
- Multimedia (AREA)
- Artificial Intelligence (AREA)
- Theoretical Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
The purpose of the present invention is to provide a voice conversion technique with improved usability. To achieve this purpose, the present invention: (S412) refers to special character conversion dictionary information that associates each of a plurality of first characters with a special character including at least one character or symbol that is pronounced differently from the first character; and (S423) converts voice information spoken by a user into one or more recognized characters that are pronounced the same as the voice information, and converts the one or more recognized characters into a special character by reference to the special character conversion dictionary.
Description
The present invention relates to a voice conversion device, a voice conversion method, and a voice conversion program, and more particularly to a technique for converting voice uttered by a user.
In recent years, broadcast receiving apparatuses have become widespread that not only receive broadcast waves for viewing programs but also support services that use the Internet to search for and display information held on server devices, and to view audio and video content distributed from server devices. When using these services, it becomes necessary to input character strings such as search keywords, or characters for logging in to a content distribution service. In conventional broadcast receiving apparatuses, the user performs such character input by pressing the operation buttons of a remote controller with a finger, but this operation is difficult. For this reason, methods of inputting character strings such as search keywords by voice recognition have been proposed.
For example, Patent Document 1 describes a method in which "keywords are extracted from program information, a speech recognition dictionary in which the keywords are registered together with speech recognition information is generated, keywords are displayed on the screen in response to a user request based on a partial dictionary of the speech recognition dictionary, and the user inputs a keyword by voice by selecting from the keywords displayed on the screen" (see the abstract).
Character strings such as search keywords usually consist of words used in everyday life and are easy to recognize by speech, so voice input is convenient for entering them. In contrast, the account name and password entered when logging in to a server device may be a combination of irregular alphanumeric characters and special characters or symbols such as "@". When such a string is input by voice recognition, the user must speak the alphanumeric characters constituting the string one by one, which results in poor usability.
An object of the present invention is to provide a more convenient voice conversion technology.
In order to solve the above problem, the present invention converts voice information uttered by a user into recognized characters consisting of phonetic characters representing the reading of the voice information, refers to special character conversion dictionary information that stores each phonetic character in association with a special character including at least one character or symbol having a reading different from that of the phonetic character, and converts the recognized characters into the special character.
By using the technology of the present invention, a more convenient voice conversion technology can be provided. Problems, configurations, and effects other than those described above will be clarified by the following description of the embodiments.
Hereinafter, embodiments of the present invention will be described with reference to the drawings. The following description shows specific examples of the contents of the present invention; the present invention is not limited to these descriptions, and various changes and modifications by those skilled in the art are possible within the scope of the technical idea disclosed in this specification. In all drawings for explaining the present invention, components having the same function are denoted by the same reference numerals, and repeated description thereof may be omitted.
<First embodiment>
In the following embodiments, an embodiment in which the voice conversion technology according to the present invention is applied to a broadcast receiving apparatus is described as an example. However, the present invention is not limited to broadcast receiving apparatuses, and can be applied to any technology to which voice input is applicable, such as user login in an information processing apparatus like a PC (Personal Computer), smartphone, or tablet terminal, or an entry/exit management system for a monitored area.
[Hardware configuration of broadcast receiver]
FIG. 1 is a block diagram showing a configuration example of a broadcast receiving apparatus 1 according to an embodiment of the present invention. As shown in FIG. 1, the broadcast receiving apparatus 1 is installed, for example, in a home and is electrically connected to a router device 3 via a wireless or wired LAN (Local Area Network), not shown. Information terminal devices 2 such as smartphones and tablet terminals are connected to the router device 3, for example via a wireless LAN, and these information terminal devices 2 are connected to the broadcast receiving apparatus 1 via the router device 3 and the LAN. The router device 3 is also connected to an external public network 4 such as the Internet. The broadcast receiving apparatus 1 is further connected wirelessly to a remote controller 120 by infrared communication. The broadcast receiving apparatus 1 also includes an antenna 107 for receiving broadcast waves from a broadcast station 5, and receives public broadcast waves.
The broadcast receiving apparatus 1 includes a main control unit 101, a ROM (Read Only Memory) 102, a RAM (Random Access Memory) 103, a storage unit 104, an external interface unit (hereinafter, interface is abbreviated as "I/F") 105, an operation unit 106, an antenna 107 for receiving broadcast waves from the broadcast station 5, a tuner/demodulation unit 108 connected to the antenna 107, a separation unit 109, a decoder unit 110, a voice input unit 111, an audio processing unit 112, an audio output unit 113, an imaging unit 114, an image processing unit 115, a display unit 116, and a LAN communication unit 117; these components are electrically connected to one another by a system bus 100. The system bus 100 is a data communication path for transmitting and receiving data between the main control unit 101 and each unit in the broadcast receiving apparatus 1.
The main control unit 101 is composed of a CPU (Central Processing Unit) or the like that performs arithmetic and control processing. The main control unit 101 loads various operation programs and data stored in the ROM 102, the RAM 103, and/or the storage unit 104 into the RAM 103 and executes them. The operation programs (software) thereby cooperate with the CPU (hardware) to realize the various functions of the broadcast receiving apparatus 1, and the main control unit 101 controls the broadcast receiving apparatus 1 as a whole.
The ROM 102 is a memory that stores a basic operation program such as an operating system and other operation programs; a rewritable ROM such as an EEPROM (Electrically Erasable Programmable ROM) or a flash ROM is used, for example.
The RAM 103 serves as a work area when the basic operation program and other operation programs (applications) are executed. The ROM 102 and the RAM 103 may be configured integrally with the main control unit 101. Further, instead of the independent configuration shown in FIG. 1, the ROM 102 may use part of the storage area of the storage unit 104.
The storage unit 104 is configured using a device that can retain stored data even when the power is turned off, for example an HDD (Hard Disc Drive), and stores the operation programs, operation setting values, and the like of the broadcast receiving apparatus 1. It can also connect to a server device (not shown) via the LAN communication unit 117, the router device 3, and the external network 4, and store new operation programs (applications) downloaded from the server device as well as various data created by those operation programs. It can also store content such as moving images, still images, and audio acquired from broadcast waves or downloaded from server devices on the network.
The external I/F unit 105 is a group of interfaces for extending the functions of the broadcast receiving apparatus 1; in the present embodiment, it has a video input I/F unit 105a, an audio input I/F unit 105b, and a USB (Universal Serial Bus) I/F unit 105c. The video input I/F unit 105a and the audio input I/F unit 105b receive video and audio signals from external video/audio output devices. The USB I/F unit 105c connects USB devices such as keyboards, memory cards, and the like. When the broadcast receiving apparatus 1 records a digital broadcast program on an externally connected HDD device or the like, the HDD device may be connected to the USB I/F unit 105c. The video input I/F unit 105a and the audio input I/F unit 105b may also input video and audio together using HDMI (High-Definition Multimedia Interface: registered trademark).
The operation unit 106 is configured using input devices with which the user inputs operation instructions to the broadcast receiving apparatus 1. In the present embodiment, the operation unit 106 has operation keys 106a in which button switches are arranged, and a remote control receiving unit 106b that receives infrared signals from the remote controller 120. The broadcast receiving apparatus 1 may also be operated using a keyboard or the like connected to the USB I/F unit 105c. Further, when the broadcast receiving apparatus 1 is operated using an information terminal device 2 or another PC connected to the home local network via the LAN communication unit 117 and the router device 3, these information terminal devices 2 and the remote controller 120 function as the operation unit 106.
The tuner/demodulation unit 108 extracts the signal of the channel selected by the user from the broadcast waves received by the antenna 107 and demodulates a TS (Transport Stream) signal.
The separation unit 109 separates the TS signal into packetized video data, audio data, and accompanying information data, and outputs them to the decoder unit 110.
The decoder unit 110 includes an audio decoder 110a, a video decoder 110b, and an information decoder 110c. The audio data packetized by the separation unit 109 is output to the audio decoder 110a, the video data to the video decoder 110b, and the accompanying information data to the information decoder 110c.
The audio decoder 110a decodes the audio data output from the separation unit 109 and outputs it to the audio processing unit 112 as an audio signal. The video decoder 110b decodes the video data output from the separation unit 109 and outputs it to the image processing unit 115 as a video signal. The information decoder 110c processes the accompanying information data output from the separation unit 109 and acquires, in particular, SI (Service Information) including program information such as the program name, genre, and broadcast start/end date and time of each program.
The voice input unit 111 is a microphone; it captures external sound and outputs it to the audio processing unit 112.
The audio processing unit 112 performs A/D (Analog/Digital) conversion on the sound captured by the voice input unit 111, and D/A (Digital/Analog) conversion on the audio signal output to the audio output unit 113.
The audio output unit 113 is a speaker, and outputs the audio signal processed by the audio processing unit 112.
The imaging unit 114 is a camera that captures video around the broadcast receiving apparatus 1 by converting light entering through a lens into an electrical signal using an electronic device such as a CCD (Charge Coupled Device) or CMOS (Complementary Metal Oxide Semiconductor) sensor, and outputs the video to the image processing unit 115.
The image processing unit 115 performs format conversion and superimposition of menus and other OSD (On Screen Display) signals on the input video signal as necessary.
The display unit 116 is a display device such as a liquid crystal panel, and displays the video signal processed by the image processing unit 115.
The LAN communication unit 117 is connected to the router device 3 by wire or wirelessly, and transmits and receives information to and from devices connected to the home local network or server devices connected to the external network 4 such as the Internet. It is also connected to the information terminal device 2 via the router device 3, allowing the user to operate the broadcast receiving apparatus 1 using the information terminal device 2. In addition to the LAN communication unit 117, the broadcast receiving apparatus 1 may further include other communication units, such as a Bluetooth (registered trademark) communication unit or an NFC (Near Field Communication) communication unit.
In the above description, the voice input unit 111 and the imaging unit 114 are built into the broadcast receiving apparatus 1, but a camera or microphone provided externally via the USB I/F unit 105c may be used instead. Communication from the remote controller 120 to the remote control receiving unit 106b is performed by infrared, but this is not restrictive, and other communication methods such as Bluetooth may be used.
The broadcast receiving apparatus 1 may also be a broadcast recording/playback apparatus such as a BD (Blu-ray Disc: registered trademark) recorder or an HDD recorder, an STB (Set Top Box), or the like. When the broadcast receiving apparatus 1 is a DVD recorder, HDD recorder, STB, or the like, a video signal output unit and an audio signal output unit may be provided in place of the display unit 116 and the audio output unit 113. By connecting an external monitor and external speaker to the video signal output unit and the audio signal output unit, operation similar to that of the broadcast receiving apparatus 1 of the present embodiment becomes possible.
[Broadcast receiving device software configuration]
FIG. 2 shows the internal configuration of the broadcast receiving apparatus 1 according to the present embodiment: (a) is a software configuration diagram of the broadcast receiving apparatus 1, and (b) shows the dictionaries stored in the voice conversion information storage area. The software configuration diagram of FIG. 2(a) shows the software configuration in the ROM 102, the RAM 103, and the storage unit 104.
The basic operation program 1021 stored in the ROM 102 is loaded into the RAM 103, and the main control unit 101 executes the loaded basic operation program, thereby constituting the basic operation execution unit 1031.
Similarly, the application program 1041, content processing program 1042, voice conversion program 1043, user authentication program 1044, cooperative terminal management program 1045, and dictionary registration program 1046 stored in the storage unit 104 are each loaded into the RAM 103, and the main control unit 101 executes the loaded operation programs, thereby constituting the application execution unit 1032, the content processing execution unit 1033, the voice conversion execution unit 1034, the user authentication execution unit 1035, the cooperative terminal management execution unit 1036, and the dictionary registration execution unit 1037. The RAM 103 also includes a temporary storage area 1038 that temporarily holds data created during execution of each operation program, as necessary.
The storage unit 104 includes: a content information storage area 104a that stores video content downloaded from server devices on the network as recorded content and manages information related to the recorded content; a voice conversion information storage area 104b that stores dictionaries and the like for converting a speech-recognized character string into a predetermined character string; a user authentication information storage area 104c that stores user authentication information consisting of voice or image data for performing user authentication; a cooperative terminal information storage area 104d that stores identification information and the like of information terminal devices 2 capable of cooperative operation with the broadcast receiving apparatus 1; and a various-information storage area 104e that stores other various information.
As shown in FIG. 2(b), the voice conversion information storage area 104b stores: a speech recognition dictionary 104b1 for recognizing input speech, identifying its components (for example, phonemes and syllables), and converting them into a character string composed of phonetic characters (for example, hiragana and katakana in Japanese, or the alphabet in English; hereinafter a "recognized character string"); a normal character conversion dictionary (first dictionary) 104b2 for converting the recognized character string into characters and symbols of different types and shapes that have the same sound (reading), including ideographic characters (for example, kanji in Japanese); and a special character conversion dictionary (second dictionary) 104b3 for converting the recognized character string into phonetic or ideographic characters with a different sound.
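A minimal sketch of the two conversion stages that follow recognition. The dictionary entries below are illustrative only; the actual dictionaries 104b2 and 104b3 would hold system- or user-registered pairs:

```python
# Sketch: after the speech recognition dictionary (104b1) has produced a
# recognized (phonetic) character string, one of two tables converts it.

normal_dict = {"かんじ": "漢字"}             # 104b2: same reading, ideographs
special_dict = {"ひらけごま": "0pen_G0ma!"}  # 104b3: different reading / symbols

def convert_recognized(recognized: str, use_special: bool) -> str:
    """Convert a recognized string with the normal or special dictionary,
    falling back to the recognized string itself when unregistered."""
    table = special_dict if use_special else normal_dict
    return table.get(recognized, recognized)

print(convert_recognized("かんじ", False))     # -> 漢字
print(convert_recognized("ひらけごま", True))  # -> 0pen_G0ma!
```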
The above phonetic characters include characters with an almost completely one-to-one correspondence between phoneme and character type, such as hiragana and katakana in Japanese, as well as cases where multiple characters correspond to the same sound, as with the alphabet in English (the "J" in "Japan" and the "G" in "Germany"), and cases where the same character is used for different sounds (for example, the "a" in "apple" and the "a" in "ace").
The above ideographic characters also include numbers and symbols. For example, when speech corresponding to the English word "and" is input, the recognized character string is converted to "and", and normal character conversion may provide both the three-letter string "and" and the symbol "&" as normal conversion candidates for the user to select. Alternatively, the base character string "and" may be registered in association with the symbol "&" in the special character conversion dictionary rather than the normal character conversion dictionary.
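The "and" example above amounts to one recognized string mapping to several candidate conversions, from which the user picks one. A hypothetical sketch:

```python
# Sketch: a recognized string can map to multiple conversion candidates,
# as in "and" -> {"and", "&"}. The registration below is illustrative.

normal_candidates = {"and": ["and", "&"]}

def candidates_for(recognized: str) -> list:
    # If nothing is registered, the recognition result itself is the
    # only candidate.
    return normal_candidates.get(recognized, [recognized])

print(candidates_for("and"))  # -> ['and', '&']  (user selects one)
print(candidates_for("cat"))  # -> ['cat']
```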
In the following, to simplify the explanation, the process in which the main control unit 101 controls each operation block by loading the basic operation program 1021 stored in the ROM 102 into the RAM 103 and executing it is described as the basic operation execution unit 1031 controlling each operation block. The same notation is used for the other operation programs.
The application execution unit 1032 executes various operation programs downloaded from a server device or the like. Each application is activated when the user, via the operation unit 106, selects the application activation icon displayed on the display unit 116.
The content processing execution unit 1033 accumulates content data from the server device in the content information storage area 104a in advance via the external network 4, then reproduces the accumulated content and displays it on the display unit 116 (download playback). Alternatively, the content processing execution unit 1033 can receive content data and content information distributed from the server device via the external network 4 and reproduce the successively received video, audio, and the like, displaying them on the display unit 116 (streaming playback).
The voice conversion execution unit 1034 converts the user's voice captured by the voice input unit 111 into a recognized character string based on the speech recognition dictionary 104b1, and uses it for operation input such as channel selection of the broadcast receiving apparatus 1, or performs character input by converting the recognized character string into a predetermined character string according to the dictionaries 104b2 and 104b3.
The user authentication execution unit 1035 authenticates the user based on the user authentication information stored in the user authentication information storage area 104c and either the user's voice captured by the voice input unit 111 or the user's face image captured by the imaging unit 114.
The linked terminal management execution unit 1036 registers and manages information terminal devices 2 connected to the home local network or to an external network 4 such as the Internet, and the broadcast receiving apparatus 1 executes various operations according to operation inputs from a registered information terminal device 2.
Each of the operation programs may be stored in the ROM 102 and/or the storage unit 104 in advance at the time of product shipment. They may also be acquired after product shipment from a server device on the Internet 4 via the LAN communication unit 117, or the operation programs stored on a memory card, optical disc, or the like may be acquired via the USB interface unit 105c or similar.
[Remote control key layout]
FIG. 3 is a diagram illustrating an example of the key arrangement of the remote controller 120.
The remote control shown in FIG. 3 includes a power key 120a1, broadcast wave selection keys (terrestrial digital, BS, CS) 120a2, channel/character input keys (1-12) 120a3, volume UP/DOWN keys 120a4, channel UP/DOWN keys 120a5, an input switching key 120a6, a program guide key 120a7, a Data key 120a8, a voice input key 120a9, a menu key 120a10, a return key 120a11, cursor keys (up, down, left, right) 120a12, an enter key 120a13, and color keys (blue, red, green, yellow) 120a14. Other operation keys may also be provided.
The power key 120a1, the broadcast wave selection keys 120a2, and so on have the same functions as the corresponding operation keys of a known TV remote controller, so a detailed description is omitted. The voice input key 120a9 is an operation key provided for the voice input function of this embodiment.
[Voice character input process]
FIG. 4 is a sequence diagram showing an example of character input processing by voice recognition according to the present embodiment. The flow of character input processing by voice recognition is described below, following the order of the steps in FIG. 4. First, the broadcast receiving apparatus 1 displays a character input screen (S400). The character string input screen is displayed, for example, when entering the account name or password for logging in to a server in order to stream or download content distributed from the server device via the external network 4.
Next, when the user presses the voice input key 120a9 of the remote controller 120 (S401), key input information is transmitted from the remote controller 120 to the broadcast receiving apparatus 1 (S402).
The main control unit 101 of the broadcast receiving apparatus 1 determines from the key input information how the voice input key 120a9 was pressed (the operation input state) (S403).
Next, the main control unit 101 executes a branch process (S404) based on the operation input state determination result. When the main control unit 101 determines in the operation input state determination process that the voice input key 120a9 was pressed once within a predetermined time, the voice conversion execution unit 1034 selects the normal character conversion dictionary (first dictionary) as the dictionary for converting a speech-recognized character string (recognized character string) into another character string (normal character string) (S411).
When it is determined from the operation input state determination result that the voice input key 120a9 was pressed twice within the predetermined time, the voice conversion execution unit 1034 selects the special character conversion dictionary (second dictionary) as the dictionary for converting the recognized character string into another character string (special character string) (S412).
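The branch in S404/S411/S412 can be sketched as follows. This is a hedged illustration under assumed timing semantics (the patent does not specify the time window); the function and variable names are assumptions, not part of the disclosure.

```python
# Hypothetical sketch of dictionary selection by key-press count:
# one press of the voice input key within the window selects the normal
# dictionary, two presses select the special dictionary.

NORMAL = "normal character conversion dictionary (first dictionary)"
SPECIAL = "special character conversion dictionary (second dictionary)"

def select_dictionary(press_times, window: float = 0.5) -> str:
    """Count presses falling within `window` seconds of the first press
    and map one press to the normal dictionary, two or more to the
    special dictionary (S411 / S412)."""
    if not press_times:
        raise ValueError("no key press recorded")
    first = press_times[0]
    presses = sum(1 for t in press_times if t - first <= window)
    return SPECIAL if presses >= 2 else NORMAL
```

A single press (or a second press arriving after the window) selects the first dictionary; a double press within the window selects the second.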
Next, the user utters speech (S421), the speech is captured by the voice input unit 111 (S422), and the voice conversion execution unit 1034 performs speech recognition processing based on the captured speech and the speech recognition dictionary 104b1 stored in the voice conversion information storage area 104b, converting the speech into a recognized character string (S423).
Next, the voice conversion execution unit 1034 converts the recognized character string into a normal character string or a special character string by the dictionary conversion input process described later, and uses the converted result as the character input (S424).
The normal character conversion dictionary stores general words and phrases and is stored in the ROM 102 and/or the storage unit 104 in advance at the time of product shipment. Alternatively, it may be acquired after product shipment from a server device on the Internet 4 via the LAN communication unit 117, or a normal character conversion dictionary stored on a memory card, optical disc, or the like may be acquired via the USB interface unit 105c or similar. The user may also register entries through the dictionary registration process described later.
Next, an example of the dictionary registration process is described with reference to the sequence diagram of FIG. 5 and the screen display examples of FIG. 6. FIG. 5 is a sequence diagram showing the flow of the dictionary registration process. FIG. 6 shows screen display examples in the dictionary registration process according to the embodiment: (a) shows the special dictionary registration list display screen, (b) shows the new dictionary registration screen, and (c) shows the dictionary registration change screen.
In the dictionary registration process shown in FIG. 5, the dictionary registration execution unit 1037 is activated, for example, when the user presses the voice input key 120a9 of the remote controller 120 for a predetermined time or longer. The dictionary registration execution unit 1037 then executes the dictionary registration process following the order of the steps in FIG. 5. In the dictionary registration process, a loop (S500) is repeated until the user selects the end of dictionary registration. In the loop S500, first, a list of the character strings registered in the dictionary (see FIG. 6(a)) is displayed on the display unit 116 (S501).
As shown in FIG. 6(a), the special dictionary registration list display screen displays a registration number 6a1, a speech-recognized character string (recognized character string) 6a2, the special character string corresponding to the recognized character string (corresponding to the "converted character string" in the figure) 6a3, a selection frame 6a4, and the functions 6a5 assigned to the color keys 120a14 of the remote controller 120. In this example, the functions assigned to the color keys 120a14 are: "red" for new registration, "blue" for changing a registration, "yellow" for deleting a registration, and "green" for switching between the normal character conversion dictionary (first dictionary) and the special character conversion dictionary (second dictionary).
Next, the user performs a process selection input using the remote controller 120 (S502). For example, by moving the selection frame 6a4 with the up and down cursor keys 120a12 of the remote controller 120 and pressing the "blue" or "yellow" color key 120a14, the user can select the change or deletion process for the dictionary entry shown in the selection frame.
When the user selects [New Registration] (presses the "red" color key 120a14), the dictionary registration execution unit 1037 displays the new dictionary registration screen (see FIG. 6(b)) on the display unit 116 (S511) and executes the character conversion dictionary registration process S512 described later.
FIG. 6(b) shows a display example of the new dictionary registration screen. The new dictionary registration screen displays a new registration number 6b1, a character string input frame 6b2 for the string before conversion by speech recognition input (the input frame in which the recognized character string is displayed), a character string input frame 6b3 for the string after conversion, and the functions 6b4 assigned to the color keys 120a14 of the remote controller 120. When the user selects the pre-conversion character string input frame 6b2 or the post-conversion character string input frame 6b3 with the up and down cursor keys 120a12 of the remote controller 120, the selected input frame is displayed with a thick border. Pressing the enter key 120a13 of the remote controller 120 with an input frame selected enters the pre-conversion character string or the post-conversion character string.
A soft keyboard 6b5 may also be displayed when a character string is entered into the post-conversion character string input frame 6b3. The user may enter the converted character string by selecting and confirming characters on the soft keyboard 6b5 with the cursor operation of the remote controller 120, or by operating the channel/character input keys (1-12) 120a3 of the remote controller 120 to enter a character string into the post-conversion character string input frame 6b3. Furthermore, with the post-conversion character string input frame 6b3 selected, the user may, for example, press the voice input key 120a9 of the remote controller 120 and enter the converted character string by voice input.
If the recognized character string displayed in the character string input frame 6b2 on the new dictionary registration screen differs from the character string the user intended to have recognized, the user can select the character string input frame 6b2 again and press the enter key 120a13 of the remote controller 120 to speak and input the voice again, redoing the speech recognition process. Alternatively, a function for selecting voice input again may be separately assigned to one of the color keys 120a14 (for example, the "blue" key) so that the user can speak and input the voice again and redo the speech recognition process.
When the user selects [Change Registration] (presses the "blue" color key 120a14), the dictionary registration execution unit 1037 displays the dictionary registration change screen (FIG. 6(c)) on the display unit 116 (S513) and executes the character conversion dictionary registration process S514 described later.
FIG. 6(c) shows a display example of the dictionary registration change screen. The dictionary registration change screen displays the registration number 6c1 of the character string to be changed, a character string input frame 6c2 for the string before conversion by speech recognition input (the input frame in which the recognized character string is displayed), a character string input frame 6c3 for the string after conversion, and the functions 6c4 assigned to the color keys 120a14 of the remote controller 120. On the dictionary registration change screen, the recognized character string currently registered in the dictionary and the corresponding special character string are displayed first, and the user can change and register the recognized character string, the special character string, or both. As on the new dictionary registration screen, a soft keyboard may also be displayed when a character string is entered into the post-conversion character string input frame 6c3.
When the user selects [Delete Registration] (presses the "yellow" color key 120a14), the dictionary registration execution unit 1037 deletes the registration of the character string in the selection frame (S515).
When the user selects [Switch Dictionary] (presses the "green" color key 120a14), the dictionary registration execution unit 1037 switches between performing registration processing for the normal character conversion dictionary and for the special character conversion dictionary (S516). This switches the dictionary that is displayed when the loop returns to the dictionary-registered character string list display process S501. When the user selects registration processing for the normal character conversion dictionary, the dictionary registration execution unit 1037 displays a normal dictionary registration list display screen, which has the same configuration as the special dictionary registration list display screen shown in FIG. 6(a) and shows the recognized character strings and normal character strings registered in the normal dictionary; as above, the user can perform new registration, change, and deletion processing.
When [Return] is selected (the return key 120a11 of the remote controller 120 is pressed), the dictionary registration process is terminated by an interruption process (S517), and the loop S500 ends.
Next, the character conversion dictionary registration processes (S512 and S514) are described with reference to the sequence diagram of FIG. 7 and the screen display examples of FIG. 6. FIG. 7 is a sequence diagram showing the flow of the character conversion dictionary registration process within the dictionary registration process.
In FIG. 7, a loop (S700) is repeated until the user selects the end of dictionary registration. In the loop S700, first, the user performs a process selection input using the remote controller 120 (S701). For example, in the dictionary registration process S512 for new dictionary registration, the user selects the pre-conversion character string input frame 6b2 or the post-conversion character string input frame 6b3 shown in FIG. 6(b) with the up and down cursor keys 120a12 of the remote controller 120, and selects a process by pressing the "red" color key 120a14. Likewise, in the dictionary registration process S514 for changing a registration, the user selects the pre-conversion character string input frame 6c2 or the post-conversion character string input frame 6c3 shown in FIG. 6(c) with the up and down cursor keys 120a12 of the remote controller 120, and selects a process by pressing the "red" color key 120a14.
Next, the dictionary registration execution unit 1037 executes a branch process S702 according to the selected process.
When the user selects [Voice-recognized character string input] (selects the character string input frame 6b2 or 6c2 and presses the enter key 120a13 of the remote controller 120), the user's voice input (S703) is captured by the voice input unit 111 (S704), the voice conversion execution unit 1034 converts the captured speech into a recognized character string by speech recognition based on the speech recognition dictionary 104b1, and the recognized character string is displayed in the character string input frame 6b2 or 6c2 (S705).
When the user selects [Post-conversion character string input] (selects the character string input frame 6b3 or 6c3 and presses the enter key 120a13 of the remote controller 120), the user enters the converted character string (normal character string or special character string) using the channel/character input keys 120a3 of the remote controller 120 or the like (S706), and the dictionary registration execution unit 1037 displays the entered character string in the character string input frame 6b3 or 6c3 (S707). As a method for entering the converted character string, a soft keyboard as shown in FIG. 6(b) may be displayed for character string input.
When the user selects [Register] (presses the "red" color key 120a14), the dictionary registration execution unit 1037 associates the character string in the recognized character string input frame (6b2 or 6c2) with the character string displayed in the post-conversion character string input frame 6b3 or 6c3, and stores the pair as dictionary information of the normal character conversion dictionary (first dictionary) or the special character conversion dictionary (second dictionary) selected by the dictionary switching process S516, in the normal character conversion dictionary 104b2 or the special character conversion dictionary 104b3 of the storage unit 104 (S708).
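The registration step S708 amounts to storing an association between the recognized string and its converted form in whichever dictionary is currently selected. A minimal sketch, with all names as illustrative assumptions:

```python
# Hypothetical stand-ins for the stored dictionaries 104b2 (normal) and
# 104b3 (special); S708 writes the recognized->converted association into
# the currently selected one.

dictionaries = {"normal": {}, "special": {}}

def register_entry(selected: str, recognized: str, converted: str) -> None:
    """Associate a recognized string with its converted form (S708)."""
    dictionaries[selected][recognized] = converted
```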
When the user selects [Return] (presses the return key 120a11 of the remote controller 120), the dictionary registration execution unit 1037 ends the character string conversion dictionary registration process by a dictionary interruption process (S709), and the loop S700 ends.
The dictionary registration information is described with reference to FIG. 8. FIG. 8 shows the configuration of the dictionary information: (a) shows the normal character conversion dictionary and (b) shows the special character conversion dictionary.
As shown in FIG. 8(a), the normal character conversion dictionary stores, as dictionary information, a registration number 8a1, the character string before conversion by speech recognition input (recognized character string) 8a2, and the character string after conversion (converted character string) 8a3. Similarly, as shown in FIG. 8(b), the special character conversion dictionary stores, as dictionary information, a registration number 8b1, the character string before conversion by speech recognition input (recognized character string) 8b2, and the character string after conversion (converted character string) 8b3. As a result, for example, the normal character conversion dictionary converts the pre-conversion character string "ひらけごま" ("open sesame") from speech recognition input into the character string "開けゴマ", whereas the special character conversion dictionary converts the same pre-conversion character string "ひらけごま" into the character string "6922#7MgkRH".
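The record layout in FIG. 8 can be sketched as a small data structure. The entries below mirror the "ひらけごま" example from the text; the class and function names are illustrative assumptions, not part of the disclosure.

```python
# Minimal sketch of the dictionary records in FIG. 8: each entry holds a
# registration number, the recognized (pre-conversion) string, and the
# converted string.

from dataclasses import dataclass
from typing import List, Optional

@dataclass
class DictEntry:
    number: int      # registration number (8a1 / 8b1)
    recognized: str  # pre-conversion recognized string (8a2 / 8b2)
    converted: str   # post-conversion string (8a3 / 8b3)

normal_dict = [DictEntry(1, "ひらけごま", "開けゴマ")]
special_dict = [DictEntry(1, "ひらけごま", "6922#7MgkRH")]

def convert(entries: List[DictEntry], recognized: str) -> Optional[str]:
    """Look up a recognized string; return its converted form, or None."""
    for e in entries:
        if e.recognized == recognized:
            return e.converted
    return None
```

The same recognized string thus yields an ordinary phrase from the first dictionary and a password-like string from the second.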
The dictionary conversion input process shown in step S424 of FIG. 4 is described with reference to FIG. 9. FIG. 9 is a flowchart showing the flow of the dictionary conversion input process.
In the dictionary conversion input process, the voice conversion execution unit 1034 first checks whether the recognized character string obtained by speech recognition in S423 is registered in the selected character conversion dictionary (S901). For example, when the normal character conversion dictionary was selected in S411, the voice conversion execution unit 1034 checks whether the speech-recognized character string is registered in the normal character conversion dictionary, and when the special character conversion dictionary was selected in S412, it checks whether the speech-recognized character string is registered in the special character conversion dictionary.
Next, the voice conversion execution unit 1034 performs a branch process (S902) based on the result of the check. If the recognized character string is not registered in the character conversion dictionary (S902/No), the voice conversion execution unit 1034 displays an error such as "not registered in the dictionary" (S903) and ends the process. If the speech-recognized character string is registered in the character conversion dictionary (S902/Yes), the voice conversion execution unit 1034 converts the recognized character string into the normal character string or special character string according to the information in the character conversion dictionary (S904), uses the converted character string as the input characters, and ends the process.
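The S901-S904 flow can be sketched as a single lookup with an error branch. This is a hedged illustration; the function name and the use of an exception for the error display are assumptions.

```python
# Sketch of the dictionary conversion input process in FIG. 9:
# S901 checks registration, S902 branches, S903 reports an error,
# S904 returns the converted string as the input characters.

def dictionary_conversion_input(recognized: str, selected_dict: dict) -> str:
    # S901: is the recognized string registered in the selected dictionary?
    if recognized not in selected_dict:  # S902/No
        # S903: error display (modeled here as an exception)
        raise LookupError("not registered in the dictionary")
    # S902/Yes -> S904: convert according to the dictionary information
    return selected_dict[recognized]
```

The later variant described below, where an unregistered string is passed through unchanged instead of raising an error, would replace the `raise` with `return recognized`.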
In the embodiment described above, for character input such as the account name or password used when logging in to a server device, registering the string in a character conversion dictionary in advance allows a complex character string to be entered easily by speaking a simple character string for voice recognition, improving usability for the user.
In the dictionary-registered character string list display process S501, user authentication by password or the like may be performed when displaying the special character conversion dictionary used for entering character strings such as passwords. Alternatively, in the list display of the special character conversion dictionary, the converted character strings may be masked, for example as "●●●" (a display for hiding the input string), so that they cannot be read, and a screen such as FIG. 6(c) may be displayed only after user authentication by password or the like when a converted character string is to be displayed or changed. This prevents anyone other than the person who registered it in the special character conversion dictionary from learning the registered password string.
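The mask display described above could be sketched as a gate on the list rendering. A minimal sketch, assuming a fixed "●●●" mask as in the text; the function name is hypothetical.

```python
# Hypothetical mask display for the special dictionary list: the converted
# string is shown only after the user passes authentication; otherwise a
# fixed mask is displayed in its place.

def display_converted(converted: str, authenticated: bool) -> str:
    """Return the converted string for an authenticated user, else a mask."""
    return converted if authenticated else "●●●"
```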
In the dictionary registration process, the normal character conversion dictionary and the special character conversion dictionary may also be displayed together in a single list without switching the dictionary display. When they are displayed together, the display indicates in which dictionary each entry is registered; for example, the registration numbers of character strings registered in the normal character conversion dictionary are prefixed with "N", and those registered in the special character conversion dictionary are prefixed with "S".
また、辞書変換入力処理において認識文字列が文字変換辞書に登録されていない場合にエラー表示を行わず、認識文字列をそのまま入力文字とするようにしてもよい。
Further, when the recognized string is not registered in the character conversion dictionary in the dictionary conversion input process, the recognized string may be used as the input text as-is instead of displaying an error.
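This pass-through behavior can be sketched as below — a minimal illustration; the dictionary contents and the `convert` helper are hypothetical, not taken from the patent:

```python
# Hypothetical conversion dictionary: recognized string -> converted string
conversion_dict = {"ひらけごま": "OpenSesame123"}

def convert(recognized, dictionary, fallback=True):
    # Return the registered converted string; if the recognized string has
    # no entry, either pass it through unchanged (fallback) or signal an error.
    if recognized in dictionary:
        return dictionary[recognized]
    if fallback:
        return recognized  # use the recognized string as the input text as-is
    raise KeyError("not registered in the conversion dictionary")
```

With `fallback=False` the function raises instead, corresponding to the error-display variant.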
<第2実施形態>
第2実施形態は、音声入力を行ったユーザの認証を行い、辞書登録したユーザ以外の人が登録した文字列を音声入力したときには文字列変換を行わないようにするものである。以下、図10及び図11を参照して第2実施形態に係る辞書登録処理について説明する。図10は、ユーザ認証処理を含む辞書登録処理の流れを示すシーケンス図である。図11は、第2実施形態に係る辞書登録処理における画面表示例を示す図であり、(a)は特殊辞書登録リスト表示画面を示し、(b)は新規辞書登録画面を示し、(c)は辞書登録変更画面を示す。
The second embodiment authenticates the user who performed the voice input, and does not perform string conversion when a registered string is voice-input by someone other than the user who registered it in the dictionary. The dictionary registration process according to the second embodiment is described below with reference to FIGS. 10 and 11. FIG. 10 is a sequence diagram showing the flow of the dictionary registration process including user authentication. FIG. 11 shows example screen displays in the dictionary registration process of the second embodiment: (a) shows the special dictionary registration list screen, (b) the new dictionary registration screen, and (c) the dictionary registration change screen.
図10において、辞書登録処理ではユーザが辞書登録終了を選択するまで繰り返し処理(S1000)が実行される。繰り返し処理S1000では、まず辞書登録実行部1037が辞書登録された文字列のリストを表示部116に表示する(S1001)。
In FIG. 10, the dictionary registration process repeats a loop (S1000) until the user chooses to end dictionary registration. In the loop S1000, the dictionary registration execution unit 1037 first displays the list of dictionary-registered strings on the display unit 116 (S1001).
図11(a)に辞書登録文字列リスト表示画面の表示例を示す。図11の(a)に示す特殊辞書登録リストは、登録番号11a1、変換前文字列11a2、変換後文字列11a3、選択枠11a4、リモコン120のカラーキー120a14に割り当てた機能11a5が表示される。ここではカラーキー120a14に割り当てた機能として、“赤”を新規登録、“青”を登録変更、“黄”を登録削除、“緑”を通常文字変換辞書と特殊文字変換辞書の切り替えに割り当てている例を示している。また、登録番号S4及びS5のように認証が必要な文字列は変換文字列が分からないようにマスク表示している。
FIG. 11(a) shows a display example of the dictionary registration string list screen. The special dictionary registration list in FIG. 11(a) displays the registration number 11a1, pre-conversion string 11a2, post-conversion string 11a3, selection frame 11a4, and the functions 11a5 assigned to the color keys 120a14 of the remote controller 120. In this example, the color keys 120a14 are assigned as follows: “red” for new registration, “blue” for changing a registration, “yellow” for deleting a registration, and “green” for switching between the normal and special character conversion dictionaries. Strings that require authentication, such as registration numbers S4 and S5, are masked so that the converted strings cannot be read.
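The mask display for entries that require authentication can be sketched roughly as follows; the entry fields mirror the list columns of FIG. 11(a), but the field names and sample strings are illustrative:

```python
MASK = "●●●"  # display that hides the converted string

# Sample entries; "auth" marks strings that require user authentication.
entries = [
    {"no": "S3", "before": "ひらけごま", "after": "OpenSesame123", "auth": False},
    {"no": "S4", "before": "ぱすわーど", "after": "pw1234", "auth": True},
]

def list_row(entry):
    # Entries flagged as requiring authentication are masked in the list
    # so the converted string cannot be read from the screen.
    shown = MASK if entry["auth"] else entry["after"]
    return f'{entry["no"]}  {entry["before"]}  {shown}'
```

The converted string of a protected entry never appears in the rendered row; only the mask does.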
次に、ユーザがリモコン120により処理選択入力を行う(S1002)。例えばリモコン120の上下のカーソルキー120a12によりユーザが選択枠11a4を移動させ、カラーキー120a14の“青”或いは“黄”のキーを押下することで、ユーザは選択枠表示された辞書登録内容の変更処理或いは削除処理を選択することができる。
Next, the user performs a process selection input with the remote controller 120 (S1002). For example, the user moves the selection frame 11a4 with the up and down cursor keys 120a12 of the remote controller 120 and presses the “blue” or “yellow” color key 120a14 to select the change or deletion process for the dictionary entry shown in the selection frame.
次に、ユーザの処理選択入力S1002に従って、辞書登録実行部1037は分岐処理S1010を実行する。
Next, according to the user process selection input S1002, the dictionary registration execution unit 1037 executes the branch process S1010.
ユーザが[新規登録]を選択した場合(カラーキー120a14の“赤”のキーを押下した場合)には、辞書登録実行部1037が新規辞書登録画面を表示部116に表示し(S1011)、後述する文字変換辞書登録処理(S1012)を実行する。
When the user selects [New Registration] (presses the “red” color key 120a14), the dictionary registration execution unit 1037 displays the new dictionary registration screen on the display unit 116 (S1011) and executes the character conversion dictionary registration process (S1012) described later.
図11の(b)に新規辞書登録画面の表示例を示す。図11の(b)に示す新規辞書登録画面は、新規の登録番号11b1、音声認識入力による変換前の文字列(認識文字列)入力枠11b2、変換後の文字列(特殊文字列)入力枠11b3、リモコン120のカラーキー120a14に割り当てた機能11b4が表示される。ここでは“赤”を変換時にユーザの認証を必要とする文字列として登録、“青” を変換時にユーザの認証を必要としない文字列として登録するのに割り当てている。
FIG. 11(b) shows a display example of the new dictionary registration screen. It displays a new registration number 11b1, an input box 11b2 for the pre-conversion string (recognized string) entered by voice recognition, an input box 11b3 for the post-conversion string (special string), and the functions 11b4 assigned to the color keys 120a14 of the remote controller 120. Here, “red” registers the entry as a string that requires user authentication at conversion time, and “blue” registers it as a string that does not.
ユーザが[登録変更]を選択した場合(カラーキー120a14の“青”のキーを押下した場合)には、ユーザ認証実行部1035は変更する文字列が音声認識した文字列を変換する際にユーザの認証を必要とする文字列か否かにより分岐処理(S1013)を行う。
When the user selects [Change Registration] (presses the “blue” color key 120a14), the user authentication execution unit 1035 branches (S1013) depending on whether the string to be changed is one that requires user authentication when a voice-recognized string is converted.
選択した文字列がユーザ認証を必要としない文字列の場合には、辞書登録実行部1037は辞書登録変更画面を表示部116に表示し(S1014)、後述する文字変換辞書登録処理S1015を実行する。
If the selected string does not require user authentication, the dictionary registration execution unit 1037 displays the dictionary registration change screen on the display unit 116 (S1014) and executes the character conversion dictionary registration process S1015 described later.
図11の(c)に辞書登録変更画面の表示例を示す。図11の(c)において、辞書登録変更画面は、変更する文字列の登録番号11c1、音声認識入力による変換前の文字列入力枠11c2、変換後の文字列入力枠11c3、リモコン120のカラーキー120a14に割り当てた機能11b4が表示される。ここでは“赤”を変換時にユーザの認証を必要とする文字列として登録、“青”を変換時にユーザの認証を必要としない文字列として登録するのに割り当てている。また、登録されている文字列が最初に表示され、変換前の文字列、変換後の文字列の何れか、或いは両方を変更することができる。
FIG. 11(c) shows a display example of the dictionary registration change screen. In FIG. 11(c), the screen displays the registration number 11c1 of the string to be changed, an input box 11c2 for the pre-conversion string entered by voice recognition, an input box 11c3 for the post-conversion string, and the functions 11b4 assigned to the color keys 120a14 of the remote controller 120. As before, “red” registers the entry as requiring user authentication at conversion time, and “blue” as not requiring it. The registered strings are shown initially, and the pre-conversion string, the post-conversion string, or both can be changed.
選択した文字列がユーザ認証を必要とする文字列の場合には、ユーザ認証実行部1035がユーザ認証処理(S1016)を行い、認証結果による分岐処理(S1017)を行う。
If the selected string requires user authentication, the user authentication execution unit 1035 performs user authentication (S1016) and branches on the result (S1017).
分岐処理S1017において、ユーザ認証が無効の場合(辞書登録したユーザと音声入力したユーザが異なると判断した場合)には、ユーザ認証実行部1035は認証が無効であることを表示する(S1018)。
In the branch S1017, if the user authentication is invalid (it is determined that the user who registered the entry and the user who performed the voice input differ), the user authentication execution unit 1035 displays that authentication failed (S1018).
分岐処理S1017において、ユーザ認証が有効の場合(辞書登録したユーザと音声入力したユーザが同じと判断した場合)には、ユーザ認証実行部1035は辞書登録実行部1037に対し認証が有効であった結果を出力し、辞書登録実行部1037がS1014と同様に辞書登録変更画面を表示部116に表示し(S1019)、後述する文字変換辞書登録処理S1020を実行する。
In the branch S1017, if the user authentication is valid (it is determined that the user who registered the entry and the user who performed the voice input are the same), the user authentication execution unit 1035 outputs the result to the dictionary registration execution unit 1037, which displays the dictionary registration change screen on the display unit 116 as in S1014 (S1019) and executes the character conversion dictionary registration process S1020 described later.
ユーザが[登録削除]を選択した場合(カラーキー120a14の“黄”のキーを押下した場合)には、辞書登録実行部1037が選択枠部分の文字列の登録を削除する(S1021)。
When the user selects [Delete Registration] (presses the “yellow” color key 120a14), the dictionary registration execution unit 1037 deletes the registration of the string in the selection frame (S1021).
ユーザが[辞書切替]を選択した場合(カラーキー120a14の“緑”のキーを押下した場合)には、辞書登録実行部1037は、ユーザから通常文字変換辞書の登録処理を行うのか、特殊文字変換辞書の登録処理を行うのかを切り替える操作を受け付け、その操作結果に従って登録処理の対象となる辞書を切り替える(S1022)。これにより、繰り返し処理により辞書登録文字列リスト表示処理S1001に戻ったときに表示する辞書が切り替わる。
When the user selects [Switch Dictionary] (presses the “green” color key 120a14), the dictionary registration execution unit 1037 accepts an operation that switches whether registration processing targets the normal character conversion dictionary or the special character conversion dictionary, and switches the target dictionary accordingly (S1022). As a result, the dictionary displayed when the loop returns to the dictionary registration string list display process S1001 is switched.
ユーザが[戻る]を選択した場合(リモコン120の戻るキー120a11を押下した場合)には、辞書登録実行部1037は中断処理により辞書登録処理を終了し(S1023)、繰り返し処理S1000を終了する。
When the user selects [Back] (presses the back key 120a11 of the remote controller 120), the dictionary registration execution unit 1037 aborts and ends the dictionary registration process (S1023), terminating the loop S1000.
なお、上記の実施形態における削除の処理において、認証が必要な文字列を削除する場合にはユーザ認証実行部1035がユーザ認証を行い、認証が有効なときにのみ辞書登録実行部1037が削除を行うようにしてもよい。
In the deletion processing of the above embodiment, when a string that requires authentication is to be deleted, the user authentication execution unit 1035 may perform user authentication, and the dictionary registration execution unit 1037 may delete the entry only when the authentication succeeds.
次に第2実施形態における文字変換辞書登録処理(図10のS1012、S1015及びS1020)について、図11の画面表示例及び図12のシーケンス図を用いて説明する。図12は、第2実施形態の辞書登録処理における文字変換辞書登録処理の流れを示すシーケンス図である。
Next, the character conversion dictionary registration process (S1012, S1015, and S1020 in FIG. 10) in the second embodiment is described with reference to the screen display examples in FIG. 11 and the sequence diagram in FIG. 12. FIG. 12 is a sequence diagram showing the flow of the character conversion dictionary registration process within the dictionary registration process of the second embodiment.
図12において、ユーザが辞書登録終了を選択するまで繰り返し処理(S1200)が実行される。繰り返し処理S1200では、まず、ユーザがリモコン120により処理選択入力を行う(S1201)。例えば新規辞書登録における文字変換辞書登録処理S1012ではリモコン120の上下のカーソルキー120a12により音声認識入力による変換前の文字列入力枠11b2或いは変換後の文字列入力枠11b3を選択、カラーキー120a14の“赤”或いは”青”のキーを押下することにより処理選択を行う。また、辞書登録変更における文字変換辞書登録処理S1015、S1020ではリモコン120の上下のカーソルキー120a12により音声認識入力による変換前の文字列入力枠11c2或いは変換後の文字列入力枠11c3を選択、カラーキー120a14の“赤”或いは”緑”のキーを押下することにより処理選択を行う。
In FIG. 12, a loop (S1200) is executed until the user chooses to end dictionary registration. In the loop S1200, the user first performs a process selection input with the remote controller 120 (S1201). For example, in the character conversion dictionary registration process S1012 for a new registration, the user selects the pre-conversion string input box 11b2 or the post-conversion string input box 11b3 with the up and down cursor keys 120a12 of the remote controller 120, and selects a process by pressing the “red” or “blue” color key 120a14. In the character conversion dictionary registration processes S1015 and S1020 for changing a registration, the user likewise selects the input box 11c2 or 11c3 with the cursor keys 120a12 and selects a process by pressing the “red” or “green” color key 120a14.
次にユーザが選択した処理の結果に従って辞書登録実行部1037が分岐処理S1202を実行する。
Next, the dictionary registration execution unit 1037 executes the branch process S1202 according to the result of the process selected by the user.
ユーザが[音声認識文字列入力]を選択した場合(文字列入力枠11b2、或いは文字列入力枠11c2を選択してリモコン120の決定キー120a13を押下した場合)には、ユーザからの音声入力(S1203)を音声入力部111が取り込み(S1204)、音声変換実行部1034が取り込んだ音声に基づいて音声認識により認識文字列に変換し、認識文字列を文字列入力枠11b2、或いは文字列入力枠11c2に表示する(S1205)。
When the user selects [Voice Recognition String Input] (selects the input box 11b2 or 11c2 and presses the enter key 120a13 of the remote controller 120), the voice input unit 111 captures the user's utterance (S1203, S1204), the voice conversion execution unit 1034 converts the captured voice into a recognized string by voice recognition, and the recognized string is displayed in the input box 11b2 or 11c2 (S1205).
ユーザが[変換後文字列入力]を選択した場合(文字列入力枠11b3、或いは文字列入力枠11c3を選択してリモコン120の決定キー120a13を押下した場合)には、ユーザがリモコン120のチャンネル文字入力キー120a3等を用いて変換後の文字列を入力し(S1206)、辞書登録実行部1037が文字列入力枠11b3、或いは文字列入力枠11c3に表示する(S1207)。
When the user selects [Post-Conversion String Input] (selects the input box 11b3 or 11c3 and presses the enter key 120a13 of the remote controller 120), the user enters the post-conversion string using the channel/character input keys 120a3 of the remote controller 120 or the like (S1206), and the dictionary registration execution unit 1037 displays it in the input box 11b3 or 11c3 (S1207).
ユーザが[ユーザ認証有登録]を選択した場合(カラーキー120a14の“赤”のキーを押下した場合)には、ユーザ認証実行部1035が認証情報取得処理(S1208)を行い、ユーザ認証実行部1035が取得した認証情報を登録番号に対応させてユーザ情報記憶領域104cに記憶する。また、辞書登録実行部1037が音声認識入力による変換前の文字列入力枠11b2或いは11c2、変換後の文字列入力枠11b3或いは11c3に表示されている文字列を辞書切替処理S1022によって切り替えられた通常文字変換辞書或いは特殊文字変換辞書に登録する(S1209)。登録した辞書情報はストレージ部104の音声変換情報記憶領域104bに記憶される。
When the user selects [Register With User Authentication] (presses the “red” color key 120a14), the user authentication execution unit 1035 performs the authentication information acquisition process (S1208) and stores the acquired authentication information in the user information storage area 104c in association with the registration number. The dictionary registration execution unit 1037 then registers the strings displayed in the pre-conversion input box 11b2 or 11c2 and the post-conversion input box 11b3 or 11c3 in the normal or special character conversion dictionary selected by the dictionary switching process S1022 (S1209). The registered dictionary information is stored in the voice conversion information storage area 104b of the storage unit 104.
ユーザが[ユーザ認証無登録]を選択した場合(カラーキー120a14の“青”のキーを押下した場合)には、辞書登録実行部1037が音声認識入力による変換前の文字列入力枠11b2或いは11c2、変換後の文字列入力枠11b3或いは11c3に表示されている文字列を辞書切替処理S1022によって切り替えられた通常文字変換辞書或いは特殊文字変換辞書に登録する(S1210)。登録した辞書情報はストレージ部104の音声変換情報記憶領域104bに記憶される。
When the user selects [Register Without User Authentication] (presses the “blue” color key 120a14), the dictionary registration execution unit 1037 registers the strings displayed in the pre-conversion input box 11b2 or 11c2 and the post-conversion input box 11b3 or 11c3 in the normal or special character conversion dictionary selected by the dictionary switching process S1022 (S1210). The registered dictionary information is stored in the voice conversion information storage area 104b of the storage unit 104.
ユーザが[戻る]を選択した場合(リモコン120の戻るキー120a11を押下した場合)には、辞書登録実行部1037が中断処理により文字変換辞書登録処理を終了し(S1211)、繰り返し処理S1200を終了する。
When the user selects [Back] (presses the back key 120a11 of the remote controller 120), the dictionary registration execution unit 1037 aborts and ends the character conversion dictionary registration process (S1211), terminating the loop S1200.
上記において、ユーザ認証実行部1035がユーザ認証情報取得処理S1208を行い、取得した認証情報を登録する文字列に関連付けてユーザ認証情報記憶領域104cに記憶を行う。また、ユーザ認証情報取得処理S1208において、認証情報として音声入力取り込み処理S1204で取り込んだ音声データを用いて声紋データを取得し、認証情報としてもよい。或いは、撮像部114によりユーザの顔の画像を取得し、顔認識データを認証情報としてもよい。要は、ユーザ認証実行部1035がユーザを特定できる認証方法を用いて認証情報を取得し、音声認識した文字列を文字変換辞書に従って変換する時に、取得した認証情報を用いてユーザ認証を行い、ユーザ認証が有効なときには文字変換辞書による変換を行うようにすればよい。
In the above, the user authentication execution unit 1035 performs the user authentication information acquisition process S1208 and stores the acquired authentication information in the user authentication information storage area 104c in association with the string being registered. In the acquisition process S1208, voiceprint data may be extracted from the voice data captured in the voice input capture process S1204 and used as the authentication information. Alternatively, an image of the user's face may be acquired by the imaging unit 114 and face recognition data used as the authentication information. In short, it suffices that the user authentication execution unit 1035 acquires authentication information by any method that can identify the user, performs user authentication with that information when a voice-recognized string is converted according to the character conversion dictionary, and applies the dictionary conversion only when the user authentication succeeds.
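The association between a registered string and its authentication information might look like the following sketch; `auth_data` is treated as an opaque blob, since voiceprint and face-feature extraction are outside the scope of this illustration, and all names are hypothetical:

```python
special_dict = {}     # stands in for the special character conversion dictionary 104b3
auth_info_store = {}  # stands in for the user authentication information storage area 104c

def register_entry(reg_no, before, after, auth_data=None):
    # auth_data is an opaque blob (voiceprint or face-recognition data in the
    # patent); entries registered with it require authentication at conversion time.
    special_dict[reg_no] = {"before": before, "after": after,
                            "auth_required": auth_data is not None}
    if auth_data is not None:
        auth_info_store[reg_no] = auth_data  # keyed by registration number
```

Conversion-time authentication would look up `auth_info_store` by the entry's registration number and compare it against freshly acquired data.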
また、放送受信装置1にユーザ認証の実行方法として音声認識又は顔認識のどちらかを選択するようにしてもよい。例えば表示部116にユーザ認証方法を選択する画面を表示して、リモコン120のカーソルキー(上、下、左、右)120a12及び決定キー120a13を用いてユーザが選択できるように構成してもよい。例えばユーザが体調不良で声が通常とは変わったときやかすれていて、認証情報として記憶された声紋データと不一致と判定される場合には、顔認識に切り替えるように構成してもよい。
The broadcast receiving apparatus 1 may also allow selection of either voice recognition or face recognition as the user authentication method. For example, a screen for selecting the authentication method may be displayed on the display unit 116 so that the user can choose with the cursor keys (up, down, left, right) 120a12 and the enter key 120a13 of the remote controller 120. For example, when the user's voice has changed or become hoarse due to poor health and is judged not to match the stored voiceprint data, the apparatus may be configured to switch to face recognition.
また、ユーザ認証を行う文字列の登録は通常文字変換辞書及び特殊文字変換辞書の両方で行えるようにしてもよく、特殊文字変換辞書のみで行えるようにしてもよい。特殊文字変換辞書のみで行えるようにした場合、通常辞書の新規辞書登録画面、辞書登録変更画面では図11の(b)、図11の(c)のカラーキー120a14の“赤”に対応した認証有の登録の表示は行わない。
Registration of strings that require user authentication may be allowed in both the normal and special character conversion dictionaries, or in the special character conversion dictionary only. In the latter case, the new dictionary registration screen and dictionary registration change screen for the normal dictionary do not display the authenticated-registration option corresponding to “red” on the color keys 120a14 in FIGS. 11(b) and 11(c).
また、一つの文字列に対してユーザ認証の登録を複数の人について行うようにしてもよい。これにより、例えばユーザ認証に父親と母親を登録することで、父親が留守の場合でも母親が同じ文字列を音声入力することによりパスワードの入力を行うことが可能となる。
User authentication may also be registered for multiple people for a single string. For example, by registering both the father and the mother for user authentication, the mother can enter the password by speaking the same string even when the father is away.
図13を参照して第2実施形態に係る音声変換情報記憶領域104bの特殊文字変換辞書104b3に記憶される辞書登録情報について説明する。図13は、第2実施形態に係る特殊文字変換辞書に記憶される辞書登録情報の例を示す図である。
The dictionary registration information stored in the special character conversion dictionary 104b3 of the voice conversion information storage area 104b according to the second embodiment is described with reference to FIG. 13. FIG. 13 shows an example of the dictionary registration information stored in the special character conversion dictionary according to the second embodiment.
図13において、辞書登録情報は、登録番号13b1、音声認識入力による変換前の文字列(認識文字列)13b2、変換後の文字列13b3、認証要否13b4、及びユーザ認証情報記憶領域104cに記憶された認証情報13b5が記憶されている。これにより、例えば認証要で登録された登録番号3或いは4は音声入力時にユーザの認証を行う必要があり、登録した人以外が認識文字列を音声入力しても認証が無効となるため、文字列への変換ができず、パスワード等が勝手に入力されることを防ぐことが可能となる。
In FIG. 13, the dictionary registration information stores the registration number 13b1, the pre-conversion string (recognized string) 13b2 entered by voice recognition, the post-conversion string 13b3, the authentication requirement 13b4, and the authentication information 13b5 stored in the user authentication information storage area 104c. Thus, for example, registration numbers 3 and 4, which are registered as requiring authentication, require user authentication at voice input time; if anyone other than the registered person speaks the recognized string, the authentication fails and the conversion to the string is not performed, preventing a password or the like from being entered without permission.
図14を参照して辞書変換入力処理S424の流れについて説明する。図14は、第2実施形態に係る辞書変換入力処理の流れを示すフローチャートである。
The flow of the dictionary conversion input process S424 will be described with reference to FIG. FIG. 14 is a flowchart showing the flow of dictionary conversion input processing according to the second embodiment.
図14において、まず音声変換実行部1034は、ステップS423(図4参照)で音声変換実行部1034が音声認識して生成した認識文字列が、ユーザが選択した文字変換辞書に登録されているかを確認する(S1401)。例えば、S411で通常文字変換辞書が選択されている場合には通常文字変換辞書に音声認識された文字列が登録されているかを確認し、S412で特殊文字変換辞書が選択されている場合には特殊文字変換辞書に音声認識された文字列が登録されているかを確認する。
In FIG. 14, the voice conversion execution unit 1034 first checks whether the recognized string it generated by voice recognition in step S423 (see FIG. 4) is registered in the character conversion dictionary selected by the user (S1401). For example, when the normal character conversion dictionary was selected in S411, it checks whether the voice-recognized string is registered in the normal character conversion dictionary, and when the special character conversion dictionary was selected in S412, it checks the special character conversion dictionary.
次に、音声変換実行部1034は、確認の結果に基づいて分岐処理(S1402)を行う。
Next, the voice conversion execution unit 1034 branches (S1402) based on the result of the check.
分岐処理S1402において、音声認識された文字列が選択した文字変換辞書に登録されていない場合(S1402/No)には、音声変換実行部1034は、辞書登録されていない等のエラーの表示を行い(S1403)、処理を終了する。音声認識された文字列が選択した文字変換辞書に登録されている場合(S1402/Yes)には、音声変換実行部1034は、音声認識された文字列がユーザ認証を必要とする文字列か否かを辞書登録情報により確認する(S1404)。次に、確認の結果に基づいて音声変換実行部1034は、分岐処理(S1405)を行う。
In the branch S1402, if the voice-recognized string is not registered in the selected character conversion dictionary (S1402/No), the voice conversion execution unit 1034 displays an error such as “not registered in the dictionary” (S1403) and ends the process. If it is registered (S1402/Yes), the voice conversion execution unit 1034 checks from the dictionary registration information whether the recognized string requires user authentication (S1404), and then branches on the result (S1405).
分岐処理S1405において、ユーザ認証を必要としない文字列の場合(S1405/No)には、音声変換実行部1034は、文字変換処理の音声認識した認識文字列を選択されている文字変換辞書の情報に従った文字列に変換し(S1409)、処理を終了する。ユーザ認証を必要とする文字列の場合(S1405/Yes)には、ユーザ認証実行部1035は、辞書登録番号に対応してユーザ認証情報記憶領域104cに記憶された認証情報を用いてユーザ認証を行い(S1406)、音声変換実行部1034に対してユーザ認証が無効であるか有効であるかを示す認証判定情報を出力する。音声変換実行部1034は認証判定情報に基づいて分岐処理(S1407)を行う。
In the branch S1405, if the string does not require user authentication (S1405/No), the voice conversion execution unit 1034 converts the recognized string according to the selected character conversion dictionary (S1409) and ends the process. If the string requires user authentication (S1405/Yes), the user authentication execution unit 1035 performs user authentication using the authentication information stored in the user authentication information storage area 104c for the dictionary registration number (S1406), and outputs to the voice conversion execution unit 1034 authentication result information indicating whether the authentication succeeded or failed. The voice conversion execution unit 1034 branches (S1407) on this information.
分岐処理S1407において、ユーザ認証が無効の場合(S1407/No)には、音声変換実行部1034はユーザ認証が無効であることなどを表示し(S1408)、処理を終了する。ユーザ認証が有効の場合(S1407/Yes)には、音声変換実行部1034は、文字変換処理の音声認識した認識文字列を、文字変換辞書の情報に従って文字列を変換し(S1409)、処理を終了する。
In the branch S1407, if the user authentication is invalid (S1407/No), the voice conversion execution unit 1034 displays that the user authentication failed (S1408) and ends the process. If the user authentication is valid (S1407/Yes), the voice conversion execution unit 1034 converts the recognized string according to the character conversion dictionary (S1409) and ends the process.
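The flow of S1401 through S1409 can be summarized in the following sketch; `verify_user` stands in for the user authentication of S1406 (voiceprint or face matching, not modeled here), and the sample dictionary entries are hypothetical:

```python
# Hypothetical selected dictionary: recognized string -> entry
dictionary = {
    "てれび": {"after": "テレビ", "auth_required": False},
    "ひらけごま": {"after": "OpenSesame123", "auth_required": True},
}

def dict_convert(recognized, dictionary, verify_user):
    # verify_user(recognized) returns True when user authentication succeeds.
    entry = dictionary.get(recognized)
    if entry is None:                                           # S1402/No
        return None, "error: not registered"                    # S1403
    if entry["auth_required"] and not verify_user(recognized):  # S1404-S1407
        return None, "error: authentication invalid"            # S1408
    return entry["after"], "ok"                                 # S1409
```

An entry without the authentication flag converts regardless of the verifier; a flagged entry converts only when verification succeeds.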
以上の実施形態では、辞書登録するとき、及び辞書による音声認識した文字列を変換するときにユーザ認証を行うことにより、辞書登録したユーザ以外の人が登録した文字列を音声入力したときには文字列の変換を行わないようにすることができる。これにより、例えば辞書登録していない子供がパスワードに対応する文字列を音声入力しても、パスワードに対応する文字列に変換されることがなく、子供が勝手に有料コンテンツの配信サービス等にログインするのを防止することができる。
In the embodiment above, user authentication is performed both when registering dictionary entries and when converting a voice-recognized string with the dictionary, so that string conversion is not performed when someone other than the registering user voice-inputs a registered string. For example, even if a child who is not registered in the dictionary speaks the string corresponding to a password, it is not converted into the password string, preventing the child from logging in to a paid content distribution service or the like without permission.
上記の説明では、特殊文字変換辞書104b3を一つ備えたが、ユーザ毎に異なる特殊文字変換辞書104b3を備えておき、ユーザ認証の結果、特定されたユーザに対して設けられた特殊文字変換辞書104b3を選択し、これを使ってステップS1409において文字列を変換してもよい。例えば、上記例で父及び母に対して第一特殊文字変換辞書を、子供に対して第二の特殊文字変換辞書を設けておく。そして、第一特殊文字変換辞書は、認識文字列「すずきあどれす」に対して変換文字列「Suzuki_parents」を登録する。また第二特殊文字変換辞書は、認識文字列「すずきあどれす」に対して変換文字列「Suzuki_kid」を登録する。そして、ステップS1401で音声変換実行部1034が生成した認識文字列をユーザ認証実行部1035が取得し、S1406のユーザ認証において、ユーザを固有に認証する。その結果、ユーザが父であると判定された場合には、S1409において第一特殊文字変換辞書を選択してから辞書変換処理を実行する。また、ユーザが子供であると判定された場合には、第二特殊文字変換辞書を選択してから辞書変換処理を実行する。これにより、同一の認識文字列「すずきあどれす」が生成された場合にも、ユーザにより異なる特殊文字列に変換を行うことができる。
In the description above, a single special character conversion dictionary 104b3 is provided, but a separate special character conversion dictionary 104b3 may be provided for each user; the special character conversion dictionary of the user identified by user authentication is then selected and used to convert the string in step S1409. For example, a first special character conversion dictionary may be provided for the father and mother and a second one for the child. The first registers the converted string "Suzuki_parents" for the recognized string 「すずきあどれす」, and the second registers the converted string "Suzuki_kid" for the same recognized string. The user authentication execution unit 1035 obtains the recognized string generated by the voice conversion execution unit 1034 in step S1401, and in the user authentication of S1406 identifies the user individually. If the user is determined to be the father, the first special character conversion dictionary is selected before the dictionary conversion in S1409; if the user is determined to be the child, the second is selected. Thus even when the same recognized string 「すずきあどれす」 is generated, it can be converted into a different special string depending on the user.
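The per-user dictionary selection in this example can be sketched as follows; the user identifiers and dictionary layout are illustrative, while the converted strings follow the "Suzuki_parents" / "Suzuki_kid" example above:

```python
# Hypothetical per-user special dictionaries, keyed by authenticated user.
special_dicts = {
    "parent": {"すずきあどれす": "Suzuki_parents"},  # first special character conversion dictionary
    "kid":    {"すずきあどれす": "Suzuki_kid"},      # second special character conversion dictionary
}

def convert_for_user(recognized, authenticated_user):
    # Select the dictionary assigned to the authenticated user, then convert.
    user_dict = special_dicts.get(authenticated_user, {})
    return user_dict.get(recognized)  # None when no entry (or unknown user)
```

The same recognized string thus maps to different converted strings depending on who was authenticated.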
<第3実施形態>
第3実施形態は、ユーザ認証処理においてユーザが入力した音声を用いる実施形態であり、より詳しくは、声紋のように音声によるユーザ認証を用いた場合に、ユーザの認証を必要とする文字列の音声入力において、過去にユーザの認証を必要とする文字列の音声入力を行った際に録音された音声を入力することによりユーザ認証が有効とされるのを防止するものである。以下、図15を参照して、第3実施形態について説明する。図15は、第3実施形態に係る音声認識による文字入力処理の一例を示すシーケンス図である。
The third embodiment uses the voice input by the user in the user authentication process. More specifically, when voice-based user authentication such as a voiceprint is used, it prevents user authentication from succeeding when a recording made during a past voice input of a string requiring user authentication is played back as the input. The third embodiment is described below with reference to FIG. 15. FIG. 15 is a sequence diagram showing an example of the character input process by voice recognition according to the third embodiment.
In FIG. 15, the processing from S400 to S404 (including S411 and S412 within the branch processing of S404) is the same as in FIG. 4, and its description is omitted.
After the branch process S404, the voice conversion execution unit 1034 sets an identification sound (S431) and starts outputting the set identification sound from the voice output unit 113 (S432).
Next, the user utters a voice (S433), the voice input unit 111 captures the voice (S434), and the voice conversion execution unit 1034 stops the output of the identification sound from the voice output unit 113 (S435).
The voice conversion execution unit 1034 converts the captured voice into a recognized character string by voice recognition based on the voice recognition dictionary 104b1 (S436). In the dictionary conversion input process (S437) described later, the user authentication execution unit 1035 performs user authentication using the input voice. If the authentication is valid, the voice conversion execution unit 1034 converts the recognized character string according to the character conversion dictionary selected by the user, and the converted result is used as the character string input.
The dictionary conversion input process of step S437 in FIG. 15 will be described with reference to FIG. 16, a flowchart showing the flow of the dictionary conversion input process according to the third embodiment. In FIG. 16, the same processing steps as in FIG. 14 are given the same numbers, and their description is omitted.
In the user authentication process S1406 of FIG. 16, the user authentication execution unit 1035 first performs user authentication using the voiceprint data stored in the user information storage area 104c in association with the dictionary registration number (S1406a1), and then branches (S1406a2) based on authentication determination information indicating whether the voiceprint-based user authentication is valid or invalid. If the voiceprint stored as authentication information at dictionary registration differs from the voiceprint of the voice captured in S434 (see FIG. 15) and the user authentication is therefore invalid (S1406a2/No), the voice conversion execution unit 1034 displays a message such as that the user authentication is invalid (S1408) and ends the process. If the voiceprint stored at dictionary registration matches the voiceprint of the voice captured in S434 and the user authentication is valid (S1406a2/Yes), the user authentication execution unit 1035 performs the identification sound determination process (S1406a3) described later and outputs authentication determination information indicating whether the user authentication is valid or invalid to the voice conversion execution unit 1034. The voice conversion execution unit 1034 then branches (S1407) based on the authentication determination information.
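As a minimal illustrative sketch (not part of the embodiment; the function and message strings are assumptions of the example), the two-stage decision of S1406a1 through S1407 can be expressed as:

```python
def dictionary_conversion_auth(voiceprint_matches, identification_sound_ok):
    """Two-stage gate of the third embodiment: the voiceprint comparison
    (S1406a1/S1406a2) must pass first, then the identification sound
    determination (S1406a3/S1407) must rule out a replayed recording."""
    if not voiceprint_matches:        # S1406a2/No -> display error (S1408)
        return "invalid: voiceprint mismatch"
    if not identification_sound_ok:   # S1407/No -> recorded voice suspected
        return "invalid: recorded voice suspected"
    return "valid"                    # S1407/Yes -> convert string (S1409)
```

Only when both checks return valid does the dictionary conversion of S1409 proceed.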
In the branch process S1407, if the authentication determination information indicates that the user authentication is invalid (S1407/No), the voice conversion execution unit 1034 determines that a recorded voice was input, displays a message such as that the user authentication is invalid (S1408), and ends the process.
If the authentication determination information indicates that the user authentication is valid (S1407/Yes), the voice conversion execution unit 1034 converts the character string according to the information of the selected character conversion dictionary (S1409) and ends the process.
An example of the identification sound determination process S1406a3 will be described with reference to the flowchart of FIG. 17 and the waveform diagrams of FIG. 18. FIG. 17 is a flowchart showing the flow of the identification sound determination process. FIG. 18 shows signal waveforms used in the identification sound determination process: (a1) to (a5) are example waveforms for the case where the user actually speaks and the identification sound determination validates the user authentication, and (b1) to (b5) are example waveforms for the case where a recorded voice is input and the identification sound determination invalidates the user authentication.
FIGS. 18(a1) and (b1) show part of the waveform of the identification sound output from the voice output unit 113 during the period from the start of identification sound output S432 to the end of identification sound output S435. The user authentication execution unit 1035 outputs a signal of a predetermined frequency F0 at predetermined intervals; here, output for a period Ton and silence for a period Toff are repeated. The predetermined periods Ton and Toff are desirably short enough, for example on the order of milliseconds, that it is difficult for a user to deliberately time an utterance to them.
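The gated identification sound described here (frequency F0 emitted for a period Ton, silent for Toff, repeating) might be generated as in the following Python sketch; the function name, the 16 kHz sample rate, and all parameter values are assumptions chosen for illustration, not part of the embodiment.

```python
import math

def make_identification_tone(f0_hz, t_on_s, t_off_s, total_s, sample_rate=16000):
    """Return float samples in [-1, 1]: a sine at f0_hz gated on for t_on_s
    and off for t_off_s, repeating over total_s seconds."""
    period_s = t_on_s + t_off_s
    samples = []
    for n in range(int(total_s * sample_rate)):
        t = n / sample_rate
        # Position within the current on/off cycle decides whether the tone sounds.
        if t % period_s < t_on_s:
            samples.append(math.sin(2 * math.pi * f0_hz * t))
        else:
            samples.append(0.0)
    return samples

# Example: F0 = 1 kHz with Ton = Toff = 5 ms; millisecond-order periods make
# it difficult for a user to synchronize an utterance with the gating.
tone = make_identification_tone(1000, 0.005, 0.005, 0.1)
```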
FIGS. 18(a2) and (b2) show the waveforms of the audio signal captured in S434. In FIG. 18(a2), the identification sound output from the voice output unit 113 and the user's voice are captured. In FIG. 18(b2), the identification sound output from the voice output unit 113, the recorded user's voice, and the recorded identification sound are captured.
In the identification sound determination process shown in FIG. 17, the user authentication execution unit 1035 first applies a filter process (S1701) to the audio signal captured in S434 (FIGS. 18(a2) and (b2)) to detect the signal component corresponding to the frequency F0 of the identification sound. This yields a signal in which the identification sound component has been detected (FIGS. 18(a3) and (b3)).
In FIG. 18(a3), only the identification sound output from the voice output unit 113 is detected, whereas in FIG. 18(b3), both the identification sound output from the voice output unit 113 and the identification sound contained in the recording are detected. Moreover, the output identification sound and the recorded identification sound are misaligned in timing. Consequently, during the periods in which both are detected simultaneously, they interfere with each other, and the amplitude of the identification sound captured by the voice input unit 111 varies depending on how they interfere. The example of FIG. 18(b3) shows a case where the interference reduces the amplitude of the identification sound.
Next, the user authentication execution unit 1035 detects the amplitude of the detected signal (S1702) and, by comparing it with a predetermined threshold, converts the detected signal into a binary signal of H level and L level (S1703).
FIGS. 18(a4) and (b4) show the signals obtained by detecting the amplitude of the identification sounds of FIGS. 18(a3) and (b3) in the amplitude detection process S1702. In FIG. 18(a4), a nearly constant amplitude corresponding to the identification sound output from the voice output unit 113 is detected, whereas FIG. 18(b4) shows that different amplitudes are detected for the identification sound output from the voice output unit 113, the recorded identification sound, and the portions where the two interfere. FIGS. 18(a5) and (b5) show the signals obtained by comparing the amplitude detected in S1702 with a predetermined threshold Vt in the binarization process S1703 and converting them into the two levels H and L. In FIG. 18(a5), the H-level period is approximately Ton, whereas in FIG. 18(b5), the H-level period deviates greatly from Ton owing to the influence of the identification sound in the recording.
Next, the user authentication execution unit 1035 detects the H-level periods of the binarized signal (S1704) and branches (S1705) according to whether the H-level period is within a predetermined range (here, as an example, from Ton × 0.9 to Ton × 1.1). If it is not within the predetermined range (S1705/No), the user authentication execution unit 1035 determines that a recorded voice was input, judges the user authentication invalid (S1706), and ends the process. If it is within the predetermined range (S1705/Yes), the user authentication execution unit 1035 judges the user authentication valid (S1707) and ends the process.
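The chain S1702 to S1705 might be sketched as follows in Python, taking as input the F0-band amplitude envelope assumed to have been produced by the filter process S1701; all names, the threshold handling, and the ±10% tolerance default are illustrative assumptions, not part of the embodiment.

```python
def identification_sound_valid(envelope, t_on_samples, threshold, tolerance=0.1):
    """Binarize the F0-band amplitude envelope (S1703), measure each H-level
    run (S1704), and accept only if every run stays within
    Ton * (1 +/- tolerance) (S1705). A replayed recording carries a second,
    out-of-phase identification tone whose interference distorts the runs."""
    binary = [1 if a >= threshold else 0 for a in envelope]
    runs, count = [], 0
    for b in binary + [0]:            # trailing 0 flushes the final run
        if b:
            count += 1
        elif count:
            runs.append(count)
            count = 0
    lo = t_on_samples * (1 - tolerance)
    hi = t_on_samples * (1 + tolerance)
    return bool(runs) and all(lo <= r <= hi for r in runs)
```

An envelope whose H-level runs match Ton is judged a live input; runs shortened or stretched by interference with a recorded identification sound cause the authentication to be invalidated.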
According to the above processing, since it is difficult to play back a recording so that its recorded identification sound is aligned in timing with the identification sound currently output from the voice output unit 113, the user authentication can be judged invalid when a recorded voice is input.
In the above embodiment, a signal of a predetermined frequency F0 is output as the identification sound at predetermined intervals, but the invention is not limited to this. Other examples of user authentication using the identification sound will be described with reference to FIGS. 19 and 20. FIG. 19 shows examples of identification sound waveforms when a plurality of (two or more) timing patterns of the output period Ton and stop period Toff of the frequency F0 signal are provided in the identification sound setting process S431: (a) shows an output pattern in which the output period Ton and the stop period Toff are approximately equal, and (b) shows an identification sound output pattern in which the output period Ton and the stop period Toff differ. FIG. 20 shows examples of the frequency spectrum of the voice input when a plurality of (two or more) identification sound frequencies are provided: (a) shows the frequency spectrum when a voice actually uttered by the user is input, and (b) shows the frequency spectrum when a recorded voice is input.
As shown in FIGS. 19(a) and (b), by providing a plurality of (two or more) timing patterns of the output period Ton and stop period Toff of the frequency F0 signal in the identification sound setting process S431 and changing the pattern for each voice input, it is possible to determine whether a recorded voice was input by detecting differences in the timing of the period Ton and the stop period Toff between the output identification sound and the identification sound contained in the captured voice.
Alternatively, a plurality of (two or more) identification sound frequencies may be provided in the identification sound setting process S431 and changed for each voice input. When a voice that is not a recording is input, a spectrum corresponding to the frequency F0 of the identification signal output from the voice output unit 113 is detected, as shown in FIG. 20(a). In contrast, when a recorded voice is input, a spectrum corresponding to the frequency F0 of the identification signal output from the voice output unit 113 and a spectrum corresponding to the frequency F1 of the recorded identification signal are both detected, as shown in FIG. 20(b). Therefore, the frequency spectrum of the input audio signal may be analyzed, and if a spectrum at a frequency different from that of the identification signal output from the voice output unit 113 is detected, it may be determined that a recorded voice was input and the user authentication may be invalidated.
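This spectrum-based variant might be sketched as follows in Python (a single-bin correlation is used here in place of a full spectrum analysis; the function names, the candidate frequency set, and the 0.5 energy-ratio criterion are assumptions of the example):

```python
import math

def tone_energy(signal, freq_hz, sample_rate):
    """Energy of 'signal' at freq_hz via a single-bin DFT correlation."""
    re = sum(s * math.cos(2 * math.pi * freq_hz * n / sample_rate)
             for n, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * freq_hz * n / sample_rate)
             for n, s in enumerate(signal))
    return (re * re + im * im) / len(signal)

def replay_detected(signal, emitted_f0, candidate_freqs, sample_rate, ratio=0.5):
    """If an identification frequency other than the one emitted this session
    (e.g. F1 captured in a recorded session) carries comparable energy, judge
    that a recorded voice was input."""
    ref = tone_energy(signal, emitted_f0, sample_rate)
    return any(f != emitted_f0 and tone_energy(signal, f, sample_rate) > ratio * ref
               for f in candidate_freqs)
```

Because the emitted frequency changes with every voice input, a recording made during an earlier session carries the wrong identification frequency and is flagged.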
In short, it suffices to output an identification sound during voice input and to determine, from the identification sound contained in the captured audio signal, whether a recorded voice was input. The identification sound may be a signal at an audible frequency or at an inaudible frequency (ultrasound).
By the above processing, even when voice-based user authentication such as voiceprint authentication is used, the user authentication is judged invalid when a recorded voice is used as the input, so that the character string conversion is not performed.
<Fourth Embodiment>
The fourth embodiment prevents user authentication based on a face image from being passed by presenting a photograph of the registered user during voice input of a character string that requires user authentication. The processing flow of this embodiment will be described below with reference to FIGS. 21 to 23. FIG. 21 is a sequence diagram showing an example of character input processing by voice recognition according to the fourth embodiment. FIG. 22 is a flowchart showing the flow of the dictionary conversion input process according to the fourth embodiment. FIG. 23 illustrates the determination process based on lip movement in face authentication: (a) shows the movement of the lips, and (b) shows the changes of the x and y components of the lip movement over time.
In FIG. 21, the processing from S401 to S412 is the same as in FIG. 4, and its description is omitted. After the branch process S404, the imaging unit 114 starts capturing images (S441). Next, the user utters a voice (S442), the voice input unit 111 captures it (S443), and the imaging unit 114 ends the image capture (S444).
The voice conversion execution unit 1034 converts the captured voice into a recognized character string by voice recognition based on the voice recognition dictionary 104b1 (S445), and in the dictionary conversion input process (S446) described later, converts the recognized character string according to the character conversion dictionary selected by the user, using the converted result as the character string input.
FIG. 22 shows an example of a flowchart of the dictionary conversion input process S446. In FIG. 22, the same processing steps as in FIG. 14 are given the same numbers, and their description is omitted.
In the user authentication process S1406 of FIG. 22, the user authentication execution unit 1035 first performs user authentication by face authentication based on the images captured by the imaging unit 114 between S441 and S444 (S1406b1), and branches (S1406b2) based on authentication determination information indicating whether the user authentication is valid or invalid. If the face image stored as authentication information at voice dictionary registration differs from the images captured between S441 and S444 and the user authentication is therefore invalid (S1406b2/No), the voice conversion execution unit 1034 displays a message such as that the user authentication is invalid (S1408) and ends the process.
If the face image stored as authentication information at voice dictionary registration matches the images captured between S441 and S444 and the user authentication is valid (S1406b2/Yes), the user authentication execution unit 1035 performs a determination based on the mouth movement of the authenticated face in the images captured between S441 and S444 (S1406b3), and outputs authentication determination information indicating whether the user authentication is valid or invalid to the voice conversion execution unit 1034. The voice conversion execution unit 1034 then branches (S1407) based on the authentication determination information.
In the determination based on lip movement, the user authentication execution unit 1035 detects the movement of the lips of the recognized face from the images captured between S441 and S444. If lip movement is detected, the captured images are judged not to be of a photograph, and the user authentication is judged valid (S1407/Yes). If no lip movement can be detected, the captured images are judged to be of a photograph, and the user authentication based on the face image is judged invalid (S1407/No).
In the branch process S1407, if the authentication determination information indicates that the user authentication is invalid (S1407/No), the voice conversion execution unit 1034 displays a message such as that the user authentication is invalid (S1408) and ends the process. If the user authentication is valid (S1407/Yes), the voice conversion execution unit 1034 converts the character string according to the information of the character conversion dictionary (S1409) and ends the process.
In the determination process S1406b3 based on lip movement, the user authentication execution unit 1035 may judge not merely the presence or absence of lip movement but also the degree of lip opening corresponding to the character string being voice-input.
For example, as shown in FIG. 23(a), the user authentication execution unit 1035 detects the size of the lips, with X as the horizontal size and Y as the vertical size. When user authentication information (13a5 in FIG. 13) is registered in advance in association with the dictionary, for example with the special character information, a face image is captured while the voice recognition character string is voice-input. From the captured images, the user authentication execution unit 1035 detects the X and Y components of the lip size corresponding to the voice-input character string, as shown in FIG. 23(b), and stores the lip sizes X and Y corresponding to the character string as dictionary registration information together with the face-image-based user authentication information.
In the user authentication process (S1406), the user authentication execution unit 1035 may perform the determination in the lip-movement determination process S1406b3 by detecting the lip sizes X and Y corresponding to the voice-input character string from the images captured between S441 and S444 and comparing them with the lip sizes registered as dictionary information. This enables more accurate user authentication based on the face image.
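As a rough sketch of this lip-size comparison (the normalization, the tolerance value, and all names are illustrative assumptions; the per-frame lip measurements would come from the face recognition stage):

```python
def lips_match(captured, registered, tolerance=0.2):
    """'captured' and 'registered' are per-frame (X, Y) lip sizes along the
    utterance. A still photograph shows no change in mouth opening and is
    rejected; otherwise the scale-normalized trajectory is compared against
    the one stored at dictionary registration."""
    ys = [y for _, y in captured]
    if max(ys) - min(ys) < 1e-6:      # no lip movement -> photograph suspected
        return False
    if len(captured) != len(registered):
        return False                  # assumes frames were resampled to match
    def norm(series):
        # Normalize by the peak dimension so camera distance does not matter.
        peak = max(max(x, y) for x, y in series) or 1.0
        return [(x / peak, y / peak) for x, y in series]
    diffs = [abs(ax - bx) + abs(ay - by)
             for (ax, ay), (bx, by) in zip(norm(captured), norm(registered))]
    return sum(diffs) / len(diffs) <= tolerance
```

Normalizing by the peak dimension makes the comparison insensitive to the user's distance from the camera, while the trajectory shape still has to match the registered utterance.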
<Fifth Embodiment>
In this embodiment, voice input processing is performed using an information terminal device 2 that cooperates with the broadcast receiving device 1.
[Hardware Configuration of the Portable Information Terminal]
FIG. 24 is a block diagram showing an example of the internal configuration of the information terminal device 2. The information terminal device 2 comprises a system bus 200, a main control unit 201, a ROM 202, a RAM 203, a storage unit 204, an expansion I/F unit 205, an operation unit 206, a sensor unit 210, a communication processing unit 220, an image processing unit 230, and an audio processing unit 240.
The main control unit 201 is a microprocessor unit that controls the entire information terminal device 2. The system bus 200 is a data communication path for transmitting and receiving data between the main control unit 201 and each operation block in the information terminal device 2.
The ROM 202 is a memory that stores basic operation programs such as an operating system and other operation programs; a rewritable ROM such as an EEPROM or a flash ROM is used, for example. The RAM 203 serves as a work area when the basic operation program and other operation programs are executed. The ROM 202 and the RAM 203 may be integrated with the main control unit 201. Further, instead of the independent configuration shown in FIG. 24, the ROM 202 may use a partial storage area within the storage unit 204.
The storage unit 204 stores the operation programs and operation setting values of the information terminal device 2, personal information of the user of the information terminal device 2, and the like. It can also store operation programs downloaded from the network and various data created by those programs, as well as content such as moving images, still images, and audio downloaded from the network. All or part of the functions of the ROM 202 may be replaced by a partial area of the storage unit 204. The storage unit 204 must retain the stored information even when no power is supplied to the information terminal device 2 from the outside; therefore, devices such as a flash ROM, SSD, or HDD are used.
Each of the operation programs stored in the ROM 202 or the storage unit 204 can be updated and functionally extended by download processing from a server device on the external network 4.
The expansion I/F unit 205 is a group of interfaces for extending the functions of the information terminal device 2; in this embodiment, it comprises a video/audio I/F, a USB I/F, a memory I/F, and the like. The video/audio I/F performs input of video/audio signals from external video/audio output devices, output of video/audio signals to external video/audio input devices, and the like. The USB I/F connects to a PC or the like to transmit and receive data; a keyboard or other USB devices may also be connected. The memory I/F connects a memory card or other memory medium to transmit and receive data.
The operation unit 206 is an instruction input unit for inputting operation instructions to the information terminal device 2; in this embodiment, it is a touch panel arranged over the display unit 231. For example, it detects gestures such as a swipe, in which a finger touches the touch panel and then moves in a particular direction while remaining in contact, and a tap, in which a finger touches the touch panel and is quickly released, thereby enabling operation input to the information terminal device 2.
The sensor unit 210 is a group of sensors for detecting the state of the information terminal device 2; in this embodiment, it includes a GPS receiving unit 211, a gyro sensor 212, a geomagnetic sensor 213, an acceleration sensor 214, an illuminance sensor 215, and a proximity sensor 216. These sensors make it possible to detect the position, tilt, direction, and movement of the information terminal device 2, the ambient brightness, the proximity of surrounding objects, and so on. The information terminal device 2 may further include other sensors, such as a barometric pressure sensor.
The communication processing unit 220 includes a LAN communication unit 221, a mobile telephone network communication unit 222, and a Bluetooth communication unit 223. The LAN communication unit 221 is connected to the external network 4 via the router device 3 and transmits and receives data to and from server devices on the external network 4; the connection with the router device 3 is made wirelessly, for example by Wi-Fi (registered trademark). The mobile telephone network communication unit 222 performs telephone communication (calls) and data transmission and reception by wireless communication with base stations (not shown) of the mobile telephone network. The LAN communication unit 221, the mobile telephone network communication unit 222, and the Bluetooth communication unit 223 each comprise an encoding circuit, a decoding circuit, an antenna, and the like. The communication processing unit 220 may further include other communication units, such as an NFC communication unit or an infrared communication unit.
The image processing unit 230 includes a display unit 231, an image signal processing unit 232, a first image input unit 233, and a second image input unit 234. The display unit 231 is a display device such as a liquid crystal panel and presents the image data processed by the image signal processing unit 232 to the user of the information terminal device 2. The image signal processing unit 232 includes a video RAM (not shown); image data written to the video RAM is displayed on the display unit 231. The image signal processing unit 232 also performs format conversion and superimposition of menus and other OSD signals as necessary. The first image input unit 233 and the second image input unit 234 are camera units that capture image data of the surroundings and of objects by converting light received through a lens into electrical signals with an electronic device such as a CCD or CMOS sensor.
The audio processing unit 240 includes an audio output unit 241, an audio signal processing unit 242, and an audio input unit 243. The audio output unit 241 is a speaker and presents the audio signal processed by the audio signal processing unit 242 to the user of the information terminal device 2. The audio input unit 243 is a microphone that converts the user's voice and other sounds into audio data.
The information terminal device 2 may be a mobile phone, a smartphone, a tablet terminal, or the like, or it may be a PDA (Personal Digital Assistant) or a notebook PC (Personal Computer).
Note that the configuration example of the information terminal device 2 shown in FIG. 24 includes many components, such as the sensor unit 210, that are not essential to this embodiment; omitting them does not impair the effects of the embodiment. Conversely, functions not shown, such as a digital broadcast reception function or an electronic money payment function, may be added.
[Software configuration of the portable information terminal]
The software configuration of the information terminal device 2 of this embodiment will be described with reference to FIG. 25. FIG. 25 illustrates the internal configuration of the information terminal device 2 according to this embodiment: (a) is a software configuration diagram of the information terminal device 2, and (b) shows the dictionaries stored in the voice conversion information storage area. FIG. 25(a) shows the software configuration in the ROM 202, the RAM 203, and the storage unit 204.
The basic operation program 2021 stored in the ROM 202 is loaded into the RAM 203, and the main control unit 201 executes the loaded basic operation program, thereby constituting the basic operation execution unit 2031.
Likewise, the application program 2041, the voice conversion program 2042, and the linked device management program 2043 stored in the storage unit 204 are each loaded into the RAM 203, and the main control unit 201 executes the loaded programs, thereby constituting the application execution unit 2032, the voice conversion execution unit 2033, and the linked device management execution unit 2034. The RAM 203 also provides a temporary storage area that holds, as needed, data created while each operation program runs.
The storage unit 204 further includes a voice conversion information storage area 204a that stores dictionaries and other data for converting a speech-recognized character string into a predetermined character string, a linked device information storage area 204b that stores authentication information used for cooperative operation with the broadcast receiving device 1 and the like, and a miscellaneous information storage area 204c that stores various other information. As shown in FIG. 25(b), the voice conversion information storage area 204a stores a speech recognition dictionary 204a1 for recognizing input speech and converting it into a character string, together with a normal character conversion dictionary (first dictionary) 204a2 and a special character conversion dictionary (second dictionary) 204a3 for converting the recognized character string into a predetermined character string.
The normal character conversion dictionary (first dictionary) 204a2 and the special character conversion dictionary (second dictionary) 204a3 are the same as those of the first embodiment. When the information terminal device 2 identifies a single user, as a smartphone or mobile phone typically does, they can be configured as the normal and special character conversion dictionaries for that user. When the information terminal device 2 is shared by multiple users, as with a PC or tablet terminal, a normal character conversion dictionary and a special character conversion dictionary may be prepared for each user, and the voice conversion execution unit 2033 (described later) may use the device's login information to select the normal and special character conversion dictionaries corresponding to the logged-in user.
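The per-user dictionary selection described above can be sketched as follows. This is an illustrative example only: the table contents, user names, and function names are assumptions, not part of the patent.

```python
# Hypothetical per-user dictionary store on a shared device (PC/tablet).
# Keys are user IDs obtained from the device's login information.
USER_DICTIONARIES = {
    "alice": {
        "normal": {"weather": "weather forecast"},
        "special": {"open sesame": "X9!pass#31"},
    },
    "bob": {
        "normal": {"news": "latest news"},
        "special": {"open sesame": "qW7$secret"},
    },
}

def select_dictionaries(logged_in_user):
    """Return (normal_dict, special_dict) for the logged-in user."""
    entry = USER_DICTIONARIES.get(logged_in_user)
    if entry is None:
        # Unknown user: fall back to empty dictionaries (no conversion).
        return {}, {}
    return entry["normal"], entry["special"]
```

Because each user has an independent special dictionary, the same spoken phrase ("open sesame") expands to a different registered string per user.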
In the following, to simplify the description, the process in which the main control unit 201 loads the basic operation program 2021 stored in the ROM 202 into the RAM 203 and executes it to control each operation block is described as the basic operation execution unit 2031 controlling each operation block. The other operation programs are described in the same way.
The application execution unit 2032 executes various operation programs downloaded from a server device. Each application is launched when the user, via the operation unit 206, selects the application's launch icon displayed on the display unit 231.
The voice conversion execution unit 2033 recognizes the user's voice captured by the voice input unit 243 as a character string using the speech recognition dictionary 204a1, and either uses it as an operation input such as channel selection on the broadcast receiving device 1, or converts the recognized character string into a predetermined character string according to the dictionaries 204a2 and 204a3 and uses the result for character input.
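The conversion step performed by the voice conversion execution unit can be sketched minimally as a dictionary lookup on the recognized string. The speech-recognition step itself is stubbed out here, and the dictionary contents are illustrative assumptions.

```python
# Illustrative conversion dictionaries (contents are assumptions).
NORMAL_DICT = {"tokyo tower": "Tokyo Tower"}    # first dictionary: common phrases
SPECIAL_DICT = {"open sesame": "aB3$xYz!"}      # second dictionary: user-registered

def convert_recognized_string(recognized, dictionary):
    """Return the converted string for a recognized phrase, or the
    recognized string unchanged when the dictionary has no entry."""
    return dictionary.get(recognized, recognized)
```

A phrase registered in the selected dictionary is replaced by its target string; anything else passes through as the raw recognition result.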
The linked device management execution unit 2034 registers and manages the broadcast receiving device 1 connected to the home local network or to the external network 4 such as the Internet, and executes operations on the registered broadcast receiving device 1 from the information terminal device 2.
The RAM 203 also includes a temporary storage area 2035 that serves as the work area of the main control unit 201.
Each of the above operation programs may be stored in the ROM 202 and/or the storage unit 204 in advance at the time of product shipment, or may be acquired after shipment from a server device or the like on the external network 4 via the LAN communication unit 221 or the mobile telephone network communication unit 222. Each operation program may also be acquired from a memory card, an optical disc, or the like via the expansion I/F unit 205.
Next, the processing performed when the information terminal device 2 handles character input for the broadcast receiving device 1 will be described with reference to FIGS. 26 and 27. FIG. 26 is a sequence diagram showing an example of character input processing by speech recognition according to the fifth embodiment. FIG. 27 shows example screens of the information terminal device 2: (a) an application launch screen, (b) a device authentication screen, and (c) a character input screen.
In FIG. 26, the user first performs an operation input (S2601) to launch the application for operating the broadcast receiving device 1 and the information terminal device 2 in cooperation (S2602).
FIG. 27(a) shows an example of the screen on which an application is selected and launched on the information terminal device 2; buttons (icons) corresponding to the applications are displayed on the display unit 231. When the user taps the "TV link" button 231a1 to launch the application, the application for operating the broadcast receiving device 1 in cooperation with the information terminal device 2 starts.
Next, the main control unit 201 displays an authentication screen for device authentication on the display unit 231 (S2603), and the user selects the linked device through a selection input (S2604).
FIG. 27(b) shows an example of the device authentication screen, on which a device list 231b1, a selection frame 231b2, and an enter button 231b3 are displayed. The linked device management execution unit 2034 displays the device list 231b1. When a device connected to the network is found, the linked device management execution unit 2034 displays its name in the device list 231b1 regardless of whether it is authenticated or unauthenticated, and stores the found device name, together with its authenticated/unauthenticated status, in the linked device information storage area 204b of the storage unit 204. A device that was found in the past but could not be found this time can be displayed in a different color to distinguish it from the other devices. In the example of FIG. 27(b), "TV1 (broadcast receiving device 1)" and "TV2 (broadcast receiving device 2)" are shown in the device list 231b1, and TV1 is indicated as previously authenticated.
When the user taps the "TV1" entry in the device list 231b1, the selection frame 231b2 appears over that entry; tapping the enter button 231b3 then selects TV1 as the broadcast receiving device to link with.
When the user selects a linked device, the procedure for being authenticated by the broadcast receiving device 1 begins. The linked device management execution unit 2034 transmits authentication information, such as a user ID and password stored in advance in the linked device information storage area 204b, to the linked terminal management execution unit 1036 of the broadcast receiving device 1 via the LAN communication unit 221, the router device 3, and the LAN communication unit 117 (S2605).
The linked terminal management execution unit 1036 of the broadcast receiving device 1 performs authentication by comparing the authentication information stored in the linked terminal information storage area 104d with the authentication information transmitted from the linked device management execution unit 2034 of the information terminal device 2 (S2606), and returns the authentication result to the linked device management execution unit 2034 of the information terminal device 2 (S2607). If the authentication information matches, the linked terminal management execution unit 1036 authenticates the connection for the linked device management execution unit 2034. The information terminal device 2 can skip the authentication screen of FIG. 27(b) by storing the last authenticated device in the linked device information storage area 204b; if the authentication information does not match, the authentication screen of FIG. 27(b) may be displayed again.
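The credential comparison in step S2606 can be sketched as below. The storage layout, field names, and example credentials are assumptions; a constant-time comparison (`hmac.compare_digest`) is used as a common good practice, which the patent does not specify.

```python
import hmac

# Hypothetical stand-in for the linked-terminal information storage area:
# user IDs mapped to registered passwords.
STORED_CREDENTIALS = {"user01": "p@ssw0rd"}

def authenticate(user_id, password):
    """Compare transmitted credentials against the stored ones.
    Returns True only when both the user ID and password match."""
    stored = STORED_CREDENTIALS.get(user_id)
    if stored is None:
        return False
    # Constant-time comparison to avoid leaking match length via timing.
    return hmac.compare_digest(stored, password)
```

On a match the receiver would return an "authenticated" result (S2607); on a mismatch it would prompt the authentication screen again.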
Through the above processing, the information terminal device 2 is authenticated as a linked terminal of the broadcast receiving device 1, and the user can operate the broadcast receiving device 1 from the information terminal device 2.
Next, the broadcast receiving device 1 displays a character input screen (S2610) and sends a software keyboard launch request to the linked information terminal device 2 (S2611). The character input screen is displayed, for example, when an account name or password must be entered to log in to a server in order to stream or download content distributed from a server device via the external network 4.
The information terminal device 2 displays a character input screen on the display unit 231 (S2612). FIG. 27(c) shows an example of this screen, with a display frame 231c1 that shows the entered characters, character input keys 231c2, a send key 231c3 that transmits the entered characters to the broadcast receiving device 1, and a voice input key 231c4 for character input by speech recognition.
The main control unit 201 repeats the loop processing (S2613) until the send key 231c3 is selected.
In the loop processing S2613, the user first performs a key input by tapping a key displayed on the display unit 231 (S2621). The main control unit 201 then determines the key input state from the type of key tapped (S2622) and executes branch processing based on the result (S2623).
If the main control unit 201 determines from the key input state that the input came from a character input key 231c2, it displays the entered character in the display frame 231c1 (S2631).
If the main control unit 201 determines from the key input state that the voice input key was tapped once within a predetermined time, the voice input unit 243 captures the user's speech input (S2641, S2642), and the voice conversion execution unit 2033 converts the captured speech into a recognized character string by speech recognition using the speech recognition dictionary 204a1 (S2643).
Next, in the normal dictionary conversion and display processing, the voice conversion execution unit 2033 converts the recognized character string according to the normal character conversion dictionary (first dictionary) 204a2 and displays the result in the display frame 231c1 (S2644). The normal character conversion dictionary holds common words and phrases stored in advance in the ROM 202 and/or the storage unit 204 at the time of product shipment. Alternatively, it may be acquired after shipment from a server device or the like on the Internet 4 via the LAN communication unit 221 or the telephone network communication unit 222, or from a memory card, an optical disc, or the like via the expansion I/F unit 205. It may also be registered by the user in the same way as the dictionary registration processing described earlier with reference to FIGS. 5 and 7.
In the branch processing S2623, if the main control unit 201 determines from the key input state that the voice input key 231c4 was pressed twice within a predetermined time, the voice input unit 243 captures the user's speech input (S2651, S2652), the voice conversion execution unit 2033 converts the captured speech into a recognized character string by speech recognition using the speech recognition dictionary 204a1 (S2653), and special dictionary conversion and display processing is performed (S2654). In the special dictionary conversion and display processing S2654, the voice conversion execution unit 2033 converts the recognized character string according to the special character conversion dictionary (second dictionary) 204a3 registered by the user and displays the result in the display frame 231c1.
In the branch processing S2623, if the main control unit 201 determines from the key input state that the send key 231c3 was pressed, it transmits the character string displayed in the display frame 231c1 to the broadcast receiving device 1 (S2661) and ends the loop processing S2613. The broadcast receiving device 1 processes the character string transmitted from the information terminal device 2 as character input (S2614).
The registration processing for the normal character conversion dictionary and the special character conversion dictionary may be performed as in the first embodiment. For example, on the character input screen shown in FIG. 27(c), the dictionary registration processing may be launched when the user touches the voice input key 231c4 for a predetermined time or longer.
In this embodiment the information terminal device 2 itself stores the normal character conversion dictionary and the special character conversion dictionary. Therefore, even if a character string that a user registered on information terminal device A is spoken into another information terminal device B, it is not converted by the normal or special character conversion dictionary of device A. Accordingly, on an information terminal device that already performs user authentication at the time of use, such as a smartphone, user authentication at the time of voice input may be omitted. Although in this embodiment the broadcast receiving device 1 and the information terminal device 2 cooperate via the LAN communication units 117 and 221 and the router device 3, they may instead cooperate via a communication method such as Bluetooth.
In the above embodiment, the password itself is never spoken aloud when entering it, so the password is not disclosed even if someone overhears the input.
Also, a complex character string can be entered by speaking a simple one.
Furthermore, by switching the character conversion dictionary between a first operation input (for example, pressing the voice input key once within a predetermined time) and a second operation input (for example, pressing it twice within a predetermined time), the system prevents a phrase that happens to match a password trigger, spoken during ordinary keyword input, from being accidentally converted into the password string and thereby revealing the password.
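The dictionary-switching branch described above (single tap selects the normal dictionary, double tap selects the special dictionary) can be sketched as follows. The dictionary contents are illustrative, and the recognized string is assumed to come from the speech recognizer.

```python
# Illustrative dictionaries (contents are assumptions, not from the patent).
NORMAL_DICT = {"weather": "weather forecast"}   # first dictionary
SPECIAL_DICT = {"open sesame": "K9#vQ2!z"}      # second dictionary (user-registered)

def handle_voice_input(recognized, tap_count):
    """Route the recognized string through the dictionary selected by
    the tap count on the voice input key: 1 tap -> normal dictionary,
    2 taps -> special dictionary."""
    if tap_count == 1:
        return NORMAL_DICT.get(recognized, recognized)
    if tap_count == 2:
        return SPECIAL_DICT.get(recognized, recognized)
    raise ValueError("unsupported tap count")
```

Note how the password trigger phrase passes through unchanged under the first operation, so speaking it during ordinary keyword input never exposes the password string.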
Although the above embodiment describes character input by Japanese speech, for character input by, for example, English speech, the first operation input may use the speech recognition result directly as the character input, while the second operation input uses the result of converting the recognized character string with the special character conversion dictionary. In other words, it suffices that a character string recognized under a predetermined operation input (for example, pressing the voice input key twice within a predetermined time) is converted on the basis of a character conversion dictionary.
In the first to fourth embodiments, speech is input through the voice input unit 111 of the broadcast receiving device 1, but this is not limiting: the remote control 120 may be provided with a voice input unit, and speech may be input by transmitting it to the broadcast receiving device 1 via a communication method such as Bluetooth. Alternatively, speech may be input through the voice input unit 243 of the information terminal device 2 and the captured speech data transmitted to the broadcast receiving device 1 via the LAN communication unit 221, the router device 3, and the LAN communication unit 117; or the broadcast receiving device 1 and the information terminal device 2 may each be provided with a communication unit such as Bluetooth, and speech input performed by transmitting speech data from the information terminal device 2 to the broadcast receiving device 1.
Although speech recognition is performed in the broadcast receiving device 1 or the information terminal device 2, the captured speech data may instead be sent to a server device connected to the network, and the speech recognition performed on the server device. In that case the broadcast receiving device 1 or the information terminal device 2 receives the character string information recognized by the server device and converts it using the normal or special character conversion dictionary.
In the above description, a first operation and a second operation for selecting the normal and special character conversion dictionaries are defined in advance, and the dictionary is switched according to which operation the user performs. The dictionary may instead be switched not by a user operation but based on the descriptive terms of the program. For example, when the screen is described in a markup language, the voice conversion execution unit 1034 or the dictionary registration execution unit 1037 can determine from the tag information of the description language the type of a text input field, for example whether it is a user-password field or a search-keyword field. When an input field that should receive a special character string, such as a user-password field, is selected as the destination of character input, the special character conversion dictionary may be selected as the dictionary for converting the recognized character string; when a keyword field is selected, the normal character conversion dictionary may be selected. The user can thus obtain conversion to a normal or special character string merely by selecting the input field (for example, by placing the cursor in it), without performing any dictionary selection operation.
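The field-type-driven variant can be sketched as below. The patent only says the field type is judged from the description language's tag information; mapping it to an HTML-style `type` attribute ("password" vs. "text") is an assumption for illustration, as are the dictionary contents.

```python
# Illustrative dictionaries (contents are assumptions).
NORMAL_DICT = {"weather": "weather forecast"}
SPECIAL_DICT = {"open sesame": "K9#vQ2!z"}

def dictionary_for_field(field_type):
    """Select the conversion dictionary from the input field's type,
    e.g. an HTML-like type attribute: password fields get the special
    dictionary, all other fields the normal one."""
    return SPECIAL_DICT if field_type == "password" else NORMAL_DICT

def convert_for_field(recognized, field_type):
    """Convert a recognized string using the field-selected dictionary."""
    dictionary = dictionary_for_field(field_type)
    return dictionary.get(recognized, recognized)
```

With this design the dictionary switch is implicit in the cursor position: the same spoken phrase expands to the password string only inside a password field.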
Examples of embodiments of the present invention have been described above using the first to fifth embodiments, but it goes without saying that configurations realizing the technology of the present invention are not limited to these embodiments, and various modifications are conceivable. For example, part of the configuration of one embodiment can be replaced with that of another, and the configuration of another embodiment can be added to that of one embodiment. All of these belong to the scope of the present invention. The numerical values, messages, and the like appearing in the text and drawings are also merely examples, and using different ones does not impair the effects of the present invention.
Some or all of the functions and the like of the present invention described above may be realized in hardware, for example by designing them as an integrated circuit. They may also be realized in software, by having a microprocessor unit or the like interpret and execute operation programs that implement the respective functions. Hardware and software may be used together.
The control lines and information lines shown in the drawings are those considered necessary for the explanation, and do not necessarily represent all of the control lines and information lines in a product. In practice, almost all of the components may be considered to be interconnected.
1: broadcast receiving device, 2: information terminal device, 3: router device, 4: external network, 101: main control unit, 102: ROM, 103: RAM, 104: storage unit, 111: audio input unit, 113: audio output unit, 114: imaging unit, 116: LAN communication unit, 120: remote controller, 1034: voice conversion execution unit, 1035: user authentication execution unit, 1036: linked terminal management unit, 201: main control unit, 202: ROM, 203: RAM, 204: storage unit, 243: audio input unit, 2033: voice conversion execution unit, 2034: linked device management execution unit
Claims (8)
- A voice conversion device comprising:
a special character conversion dictionary unit that stores special character conversion dictionary information associating a phonetic character with a special character including at least one of a character and a symbol having a reading different from the reading of the phonetic character; and
a voice conversion execution unit that converts voice information uttered by a user into recognized characters consisting of phonetic characters representing the reading of the voice information, and converts the recognized characters into the special character by referring to the special character conversion dictionary information.
- The voice conversion device according to claim 1, further comprising:
a normal character conversion dictionary unit that stores normal character conversion dictionary information associating the phonetic character with a normal character including at least one of an ideographic character and a symbol having the same reading as the phonetic character; and
a dictionary selection operation unit that accepts an operation for selecting the special character conversion dictionary unit or the normal character conversion dictionary unit,
wherein the voice conversion execution unit converts the recognized characters into the special character by referring to the special character conversion dictionary information when the special character conversion dictionary is selected, and converts the recognized characters into the normal character by referring to the normal character conversion dictionary information when the normal character conversion dictionary is selected.
- The voice conversion device according to claim 1, further comprising a user authentication execution unit that authenticates the user,
wherein the voice conversion execution unit converts the recognized characters into the special character string when the authentication by the user authentication execution unit is valid.
- The voice conversion device according to claim 3, wherein the special character conversion dictionary unit is provided for each user, and
when the authentication by the user authentication execution unit is valid, the voice conversion execution unit converts the recognized characters into the special character by referring to the special character conversion dictionary information associated with the authenticated user.
- The voice conversion device according to claim 3, further comprising a voice input unit that receives input of the voice information uttered by the user,
wherein the voice conversion execution unit outputs an identification sound to be used for user authentication when receiving the input of the voice information, and performs user authentication based on a comparison between the identification sound included in the voice information input from the voice input unit and the identification sound output by the voice conversion execution unit.
- The voice conversion device according to claim 3, further comprising:
an imaging unit that captures a face image of the user; and
a user authentication information storage unit that stores user authentication information associating user identification information uniquely identifying the user with a face image captured when the user uttered a predetermined voice,
wherein the imaging unit captures a face image when the user utters the predetermined voice, and the user authentication execution unit performs user authentication based on a comparison between the face image acquired from the imaging unit and the face image stored in the user authentication information storage unit.
- A voice conversion method comprising:
a step of converting voice information uttered by a user into recognized characters consisting of phonetic characters representing the reading of the voice information; and
a step of converting the recognized characters into a special character by referring to special character conversion dictionary information that stores, in association with each other, the phonetic character and the special character including at least one of a character and a symbol having a reading different from the reading of the phonetic character.
- A voice conversion program causing a computer to execute:
a step of converting voice information uttered by a user into recognized characters consisting of phonetic characters representing the reading of the voice information; and
a step of converting the recognized characters into a special character by referring to special character conversion dictionary information that stores, in association with each other, the phonetic character and the special character including at least one of a character and a symbol having a reading different from the reading of the phonetic character.
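The authentication-gated, per-user conversion of claims 3 and 4 can be sketched as follows. This is a hypothetical illustration under assumed names and data: the dictionary contents and the way authentication state is passed in are inventions for the example, not the claimed implementation.

```python
# Illustrative sketch of claims 3 and 4: conversion to the special character
# string is performed only when user authentication is valid, and each user
# has their own special character conversion dictionary. All data is made up.

PER_USER_SPECIAL_DICT = {
    "user_a": {"ひらけごま": "opn_sesame!7"},
    "user_b": {"ひらけごま": "S3same*open"},
}

def convert_special(recognized: str, user_id: str, authenticated: bool) -> str:
    """Convert a recognized phonetic string using the authenticated user's
    special dictionary; without valid authentication, return the phonetic
    string unchanged so the secret special string is never revealed."""
    if not authenticated:
        return recognized
    dictionary = PER_USER_SPECIAL_DICT.get(user_id, {})
    return dictionary.get(recognized, recognized)
```

Note that the same spoken reading yields a different special string per user, which is the point of providing the dictionary unit for each user.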
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2014/080103 WO2016075794A1 (en) | 2014-11-13 | 2014-11-13 | Voice conversion device, voice conversion method, and voice conversion program |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2016075794A1 true WO2016075794A1 (en) | 2016-05-19 |
Family
ID=55953907
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2014/080103 WO2016075794A1 (en) | 2014-11-13 | 2014-11-13 | Voice conversion device, voice conversion method, and voice conversion program |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2016075794A1 (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS6470865A (en) * | 1987-09-11 | 1989-03-16 | Brother Ind Ltd | Language processor based upon voice input |
JPH05290033A (en) * | 1992-04-07 | 1993-11-05 | Nec Off Syst Ltd | Japanese language input device |
JPH1115823A (en) * | 1997-06-26 | 1999-01-22 | Tokyo Gas Co Ltd | Method for setting simply transmission destination address in communication program of electronic mail or the like |
JP2013105440A (en) * | 2011-11-16 | 2013-05-30 | Ntt Docomo Inc | Character input device and character input program |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11823304B2 (en) | Picture display device, and setting modification method and setting modification program therefor | |
US20150310856A1 (en) | Speech recognition apparatus, speech recognition method, and television set | |
US9928030B2 (en) | Speech retrieval device, speech retrieval method, and display device | |
EP3089157B1 (en) | Voice recognition processing device, voice recognition processing method, and display device | |
KR20140055502A (en) | Broadcast receiving apparatus, server and control method thereof | |
CN103947220A (en) | Display device and method for providing content using the same | |
US20150052169A1 (en) | Method, electronic device, and computer program product | |
CN102566895A (en) | Electronic device and method for providing menu using the same | |
US20150382070A1 (en) | Method, electronic device, and computer program product | |
KR20150034956A (en) | Method for recognizing content, Display apparatus and Content recognition system thereof | |
KR20120083025A (en) | Multimedia device for providing voice recognition service by using at least two of database and the method for controlling the same | |
JP7538273B2 (en) | Video display device | |
KR102511385B1 (en) | Display device | |
WO2016075794A1 (en) | Voice conversion device, voice conversion method, and voice conversion program | |
JP2016029495A (en) | Image display device and image display method | |
KR20130080380A (en) | Electronic apparatus and method for controlling electronic apparatus thereof | |
JP6100328B2 (en) | Video display device | |
KR20130071148A (en) | Method for operating an image display apparatus | |
JP6423470B2 (en) | Video display device | |
CN107948696A (en) | A kind of set-top box text entry method, system and set-top box | |
JP2024160320A (en) | Video display device | |
JP2009037433A (en) | Number voice browser and method for controlling number voice browser | |
JP2013045139A (en) | Network terminal system, terminal device, display device and information retrieval method in network terminal system | |
JP2017060059A (en) | Control program, storage medium, portable communication equipment, program-related information provision device, and program-related information display method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 14905932; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| NENP | Non-entry into the national phase | Ref country code: JP |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 14905932; Country of ref document: EP; Kind code of ref document: A1 |