WO2016075794A1 - Voice conversion device, voice conversion method, and voice conversion program - Google Patents
- Publication number
- WO2016075794A1 (PCT/JP2014/080103)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- conversion
- voice
- character
- dictionary
- unit
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/12—Use of codes for handling textual entities
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
Definitions
- the present invention relates to a voice conversion device, a voice conversion method, and a voice conversion program, and more particularly to a technique for converting voice uttered by a user.
- in recent years, broadcast receiving apparatuses that can use the Internet to search for and display information on a server device, and to view audio and video content distributed from the server device, have become widespread. When using these services, the user must input a character string such as a search keyword or credentials for logging in to a content distribution service. In a conventional broadcast receiving apparatus, the user inputs characters by pressing the operation buttons of a remote controller with a finger, which is cumbersome. For this reason, methods of inputting search character strings by voice recognition have been proposed.
- in one proposed method, keywords are extracted from program information, a speech recognition dictionary in which the keywords are registered together with speech recognition information is generated, and, in response to a user request, the keywords are displayed on the screen based on a partial dictionary of the speech recognition dictionary. The user then inputs a keyword by voice by selecting from the keywords displayed on the screen (see the summary of the cited document).
- character strings such as search keywords are usually everyday words, which are easy to recognize, so voice input is convenient for entering them. However, the account name and password entered when logging in to a server apparatus may be an irregular combination of alphanumeric characters and special characters or symbols such as "@". When inputting such a string by voice recognition, the user must utter the constituent alphanumeric characters one by one, which results in poor usability.
- An object of the present invention is to provide a more convenient voice conversion technology.
- to this end, the present invention converts speech information uttered by a user into a recognized character string composed of phonetic characters representing the reading of the speech information, refers to special character conversion dictionary information that associates each phonetic character string with a special character including at least one character or symbol whose usual reading differs from that phonetic string, and converts the recognized character string into the special character.
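As a rough sketch, the two-stage conversion described above (speech to a recognized phonetic string, then the recognized string to a special character) might look like the following. The recognizer stub, the dictionary entries, and the readings are hypothetical illustrations, not taken from the patent:

```python
# Illustrative sketch of the two-stage conversion. The recognizer stub
# and the dictionary entries are hypothetical, not the patent's data.

def recognize_speech(audio: bytes) -> str:
    """Stage 1: speech recognition yields a phonetic recognized
    character string (stubbed out here with a romanized reading)."""
    return "atto"  # e.g. a Japanese reading used for the "@" symbol

# Stage 2: the special character conversion dictionary associates a
# reading with a character or symbol whose usual reading differs.
SPECIAL_CHAR_DICT = {
    "atto": "@",
    "anpasando": "&",
}

def convert_special(recognized: str) -> str:
    """Look up the recognized string; pass unknown strings through."""
    return SPECIAL_CHAR_DICT.get(recognized, recognized)

print(convert_special(recognize_speech(b"")))  # prints "@"
```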
- A block diagram showing a configuration example of the broadcast receiving apparatus according to the present embodiment.
- A diagram showing the internal configuration of the broadcast receiving apparatus according to the present embodiment, in which (a) is a software block diagram of the broadcast receiving apparatus and (b) shows the dictionaries stored in the voice conversion information storage area.
- A diagram showing an example of the key arrangement of the remote controller 120.
- A sequence diagram showing an example of character input processing by speech recognition according to the first embodiment.
- A sequence diagram showing the flow of dictionary registration processing according to the first embodiment.
- A diagram showing screen display examples in the dictionary registration processing according to the first embodiment, in which (a) shows a special dictionary registration list display screen, (b) shows a new dictionary registration screen, and (c) shows a dictionary registration change screen.
- A sequence diagram showing the flow of the character conversion dictionary registration processing within the dictionary registration processing according to the first embodiment.
- A diagram showing the structure of the dictionary information according to the first embodiment, in which (a) shows the normal character conversion dictionary and (b) shows the special character conversion dictionary.
- A flowchart showing the flow of the dictionary conversion input processing according to the first embodiment.
- A sequence diagram showing the flow of dictionary registration processing including user authentication processing.
- A diagram showing screen display examples in the dictionary registration processing according to the second embodiment, in which (a) shows a special dictionary registration list display screen, (b) shows a new dictionary registration screen, and (c) shows a dictionary registration change screen.
- A sequence diagram showing the flow of the character conversion dictionary registration processing within the dictionary registration processing according to the second embodiment.
- A diagram showing an example of the dictionary registration information stored according to the second embodiment.
- A flowchart showing the flow of the dictionary conversion input processing according to the second embodiment.
- A sequence diagram showing an example of character input processing by speech recognition according to the third embodiment.
- A flowchart showing the flow of the dictionary conversion input processing according to the third embodiment.
- A flowchart showing the flow of the identification sound determination processing.
- A diagram showing the signal waveforms used in the identification sound determination processing, in which (a1) and (b1) are output waveforms of the identification sound and (a2) is an audio waveform.
- A diagram showing examples of identification sound waveforms when two or more timing patterns of the period Ton for outputting a signal of the predetermined frequency F0 and the period Toff for stopping it are provided in the identification sound setting processing S341, in which (a) shows an output pattern in which the period Ton is substantially equal to the period Toff, and (b) shows an output pattern in which the period Ton differs from the period Toff.
- A block diagram showing an example of the internal configuration of the information terminal device 2, in which (a) is a software block diagram of the information terminal device 2 and (b) shows the dictionaries stored therein.
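The identification sound referenced in the figures above, with a period Ton in which a signal of frequency F0 is output and a period Toff in which it is stopped, could be generated along these lines. All concrete values (sample rate, frequency, durations) are assumptions for illustration; the patent only names F0, Ton, and Toff:

```python
import math

def identification_tone(f0_hz: float, t_on: float, t_off: float,
                        repeats: int, sample_rate: int = 8000) -> list[float]:
    """Emit a sine wave at frequency F0 for Ton seconds, stay silent
    for Toff seconds, and repeat. Parameter values are illustrative."""
    samples: list[float] = []
    for _ in range(repeats):
        n_on = round(t_on * sample_rate)
        samples += [math.sin(2 * math.pi * f0_hz * i / sample_rate)
                    for i in range(n_on)]
        samples += [0.0] * round(t_off * sample_rate)
    return samples

# Pattern (a): Ton roughly equal to Toff; pattern (b): Ton != Toff.
pattern_a = identification_tone(1000.0, t_on=0.10, t_off=0.10, repeats=3)
pattern_b = identification_tone(1000.0, t_on=0.05, t_off=0.15, repeats=3)
```

Distinguishing the two patterns by their on/off timing is what allows the determination processing to tell identification sounds apart from ordinary audio.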
- the present invention is not limited to a broadcast receiving device; it can be applied to any device that performs voice input, for example, to user login in an information processing apparatus such as a PC (Personal Computer), a smartphone, or a tablet terminal device, or to an entry/exit management system for a monitored area.
- FIG. 1 is a block diagram showing a configuration example of a broadcast receiving apparatus 1 according to an embodiment of the present invention.
- the broadcast receiving apparatus 1 is installed in a home, for example, and is electrically connected to the router 3 via a wireless or wired LAN (Local Area Network) (not shown).
- an information terminal device 2 such as a smartphone or a tablet terminal is connected to the router 3 via a wireless LAN, and these information terminal devices 2 are connected to the broadcast receiving device 1 via the router 3 and the LAN.
- the router device 3 is also connected to an external public network 4 such as the Internet.
- the broadcast receiving apparatus 1 is wirelessly connected to the remote controller 120 by infrared communication.
- the broadcast receiving apparatus 1 includes an antenna 107 for receiving broadcast waves from the broadcast station 5 and receives public broadcast waves.
- the broadcast receiving apparatus 1 includes a main control unit 101, a ROM (Read Only Memory) 102, a RAM (Random Access Memory) 103, a storage unit 104, an external interface (hereinafter, "I/F") unit 105, an operation unit 106, an antenna 107 for receiving broadcast waves from the broadcasting station 5, a tuner/demodulation unit 108 connected to the antenna 107, a separation unit 109, a decoder unit 110, an audio input unit 111, an audio processing unit 112, an audio output unit 113, an imaging unit 114, an image processing unit 115, a display unit 116, and a LAN communication unit 117; these components are electrically connected to each other via a system bus 100.
- the system bus 100 is a data communication path for performing data transmission / reception between the main control unit 101 and each unit in the broadcast receiving apparatus 1.
- the main control unit 101 includes a CPU (Central Processing Unit) that performs arithmetic and control processing.
- the main control unit 101 develops (loads) various operation programs and data stored in the ROM 102 and/or the storage unit 104 into the RAM 103 and executes them. The operation programs (software) thereby cooperate with the CPU (hardware) to realize the various functions of the broadcast receiving apparatus 1.
- the main control unit 101 controls the entire broadcast receiving apparatus 1.
- the ROM 102 is a memory in which a basic operation program such as an operating system and other operation programs are stored.
- a rewritable ROM such as an EEPROM (Electrically Erasable Programmable ROM) or a flash ROM is used.
- the RAM 103 serves as a work area for executing a basic operation program and other operation programs (applications).
- the ROM 102 and the RAM 103 may be configured integrally with the main control unit 101. Further, instead of the independent configuration shown in FIG. 1, the ROM 102 may use a partial storage area within the storage unit 104.
- the storage unit 104 is configured using a device capable of holding and storing data even when the power is turned off, such as an HDD (Hard Disc Drive), and stores an operation program, an operation setting value, and the like of the broadcast receiving device 1.
- the storage unit 104 can also store a new operation program (application) downloaded from a server device (not shown) via the LAN communication unit 117, the router device 3, and the external network 4, as well as various data created by the operation programs.
- contents such as moving images, still images, and voices acquired from broadcast waves or downloaded from server devices on the network can be stored.
- the external I / F unit 105 is an interface group for extending the function of the broadcast receiving apparatus 1.
- the external I/F unit 105 includes a video input I/F unit 105a, an audio input I/F unit 105b, and a USB (Universal Serial Bus) I/F unit 105c.
- the video input I / F unit 105a and the audio input I / F unit 105b input video signals / audio signals from an external video / audio output device.
- the USB I / F unit 105c connects a USB device such as a keyboard or a memory card.
- an HDD device or the like may be connected to the USB I/F unit 105c. Further, the video input I/F unit 105a and the audio input I/F unit 105b may input video and audio together using HDMI (High-Definition Multimedia Interface: registered trademark).
- the operation unit 106 is configured by using an input device for a user to input an operation instruction to the broadcast receiving apparatus 1.
- the operation unit 106 includes an operation key 106 a in which button switches are arranged and a remote control reception unit 106 b that receives an infrared signal from the remote control 120.
- the broadcast receiving apparatus 1 may be operated using a keyboard or the like connected to the USB I / F unit 105c.
- these information terminal apparatuses 2 and the remote controller 120 function as the operation unit 106.
- the tuner/demodulation unit 108 extracts the signal of the channel selected by the user from the broadcast wave received by the antenna 107 and demodulates it into a TS (Transport Stream) signal.
- the separation unit 109 separates the TS signal into packetized video data, audio data, and accompanying information data, and outputs them to the decoder unit 110.
- the decoder unit 110 includes an audio decoder 110a, a video decoder 110b, and an information decoder 110c.
- the audio data packetized by the separation unit 109 is output to the audio decoder 110a, the video data is output to the video decoder 110b, and the accompanying information data is output to the information decoder 110c.
- the audio decoder 110a decodes the audio data output from the separation unit 109 and outputs the audio data to the audio processing unit 112 as an audio signal.
- the video decoder 110b decodes the video data output from the separation unit 109 and outputs the decoded video data to the image processing unit 115 as a video signal.
- the information decoder 110c processes the accompanying information data output from the separation unit 109 and acquires SI (Service Information) information, which includes program information such as each program's name, genre, and broadcast start/end date and time.
- the audio input unit 111 is a microphone, and takes in external audio and outputs it to the audio processing unit 112.
- the audio processing unit 112 performs A/D (analog-to-digital) conversion on the audio captured by the audio input unit 111, and D/A (digital-to-analog) conversion on the audio signal output to the audio output unit 113.
- the audio output unit 113 is a speaker and outputs the audio signal processed by the audio processing unit 112.
- the imaging unit 114 is a camera that converts light input through a lens into an electrical signal using an electronic device such as a CCD (Charge-Coupled Device) or CMOS (Complementary Metal-Oxide-Semiconductor) sensor, captures an image of the surroundings of the broadcast receiving apparatus 1, and outputs it to the image processing unit 115.
- the image processing unit 115 performs format conversion, menu and other OSD (On Screen Display) signal superimposition processing on the input video signal as necessary.
- the display unit 116 is a display device such as a liquid crystal panel, and displays the video signal processed by the image processing unit 115.
- the LAN communication unit 117 is connected to the router device 3 by wire or wirelessly, and transmits / receives information to / from a device connected to a local network in a home or a server device connected to an external network 4 such as the Internet. Further, the information terminal device 2 is connected to the router device 3, and the user can operate the broadcast receiving device 1 using the information terminal device 2.
- in addition to the LAN communication unit 117, the broadcast receiving apparatus 1 may further include other communication units such as a Bluetooth (registered trademark) communication unit or an NFC (Near Field Communication) communication unit.
- in the present embodiment, the audio input unit 111 and the imaging unit 114 are built into the broadcast receiving apparatus 1, but an external camera or microphone connected via the USB I/F unit 105c may be used instead.
- the communication from the remote control 120 to the remote control receiving unit 106b is configured to be performed by infrared rays, but the present invention is not limited to this, and other communication methods such as Bluetooth may be used.
- the broadcast receiving apparatus 1 may be a broadcast recording / playback apparatus such as a BD (Blu-ray Disc: registered trademark) recorder or HDD recorder, an STB (Set Top Box), or the like.
- a video signal output unit and an audio signal output unit may be provided instead of the display unit 116 and the audio output unit 113.
- FIG. 2 is a diagram illustrating the internal configuration of the broadcast receiving apparatus 1 according to the present embodiment, in which (a) is a software configuration diagram of the broadcast receiving apparatus 1 and (b) shows the dictionaries stored in the voice conversion information storage area.
- the software configuration diagram of FIG. 2A shows a software configuration in the ROM 102, the RAM 103, and the storage unit 104.
- the basic operation program 1021 stored in the ROM 102 is expanded in the RAM 103, and the main control unit 101 executes the expanded basic operation program to constitute the basic operation execution unit 1031.
- the application program 1041, content processing program 1042, voice conversion program 1043, user authentication program 1044, cooperative terminal management program 1045, and dictionary registration program 1046 stored in the storage unit 104 are likewise expanded in the RAM 103, and the main control unit 101 executes these expanded operation programs to constitute the application execution unit 1032, the content processing execution unit 1033, the voice conversion execution unit 1034, the user authentication execution unit 1035, the cooperative terminal management execution unit 1036, and the dictionary registration execution unit 1037.
- the RAM 103 includes a temporary storage area 1038 for temporarily storing data created when each operation program is executed as necessary.
- the storage unit 104 includes a content information storage area 104a for storing video content downloaded from a server device on the network as recorded content and for managing information related to the recorded content, a voice conversion information storage area 104b for storing the conversion dictionaries, a user authentication information storage area 104c for storing user authentication information including voice or image data for user authentication, a cooperative terminal information storage area 104d for storing identification information and the like of the information terminal devices 2 capable of cooperative operation with the broadcast receiving apparatus 1, and various information storage areas 104e for storing other miscellaneous information.
- the voice conversion information storage area 104b stores a speech recognition dictionary 104b1 for recognizing the components of input voice (for example, phonemes and syllables) and converting them into a character string consisting of phonetic characters (for example, hiragana, katakana, and the alphabet in Japanese; hereinafter a "recognized character string"), a normal character conversion dictionary (first dictionary) 104b2 for converting a recognized character string into characters and symbols of different types and shapes having the same sound (reading) as the recognized character string, including ideographic characters (for example, kanji in Japanese), and a special character conversion dictionary (second dictionary) 104b3 for converting a recognized character string into phonograms and ideograms having different sounds.
- here, phonetic characters are characters whose phonemes and character types correspond almost one-to-one, such as hiragana and katakana in Japanese. Some scripts, such as the English alphabet, have multiple letters for the same sound, and the same letter may represent different sounds (for example, the "a" in "apple" and the "a" in "ace").
- the ideographic characters mentioned above include numbers and symbols. For example, when a voice corresponding to "and" is input in English, the recognized character string is converted into "and", and the normal character conversion may present both the three-letter string "and" and the symbol "&" as conversion candidates from which the user can select. Alternatively, the reading "and" may be registered in association with the symbol "&" in the special character conversion dictionary instead of the normal character conversion dictionary.
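The two possible placements of the "&" entry described above can be sketched as follows; the dictionary format and the candidate lists are assumptions, since the patent does not fix a concrete data structure:

```python
# Hypothetical sketch of the two dictionaries. The entry format is an
# assumption; the patent does not specify one.

NORMAL_DICT = {           # first dictionary: same-reading candidates,
    "and": ["and", "&"],  # offering both the word and the symbol
}
SPECIAL_DICT = {          # second dictionary: "&" registered here instead
    "and": ["&"],
}

def candidates(recognized: str, dictionary: dict[str, list[str]]) -> list[str]:
    """Return the conversion candidates for a recognized string,
    falling back to the recognized string itself."""
    return dictionary.get(recognized, [recognized])

print(candidates("and", NORMAL_DICT))   # ['and', '&']
print(candidates("and", SPECIAL_DICT))  # ['&']
```

Keeping "&" only in the second dictionary means the normal conversion path never surprises the user with a symbol, at the cost of requiring the special dictionary to be selected explicitly.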
- in the following, the process in which the main control unit 101 expands the basic operation program 1021 stored in the ROM 102 into the RAM 103 and executes it to control each operation block is described as the basic operation execution unit 1031 performing the control of each operation block. The same convention is used for the other operation programs.
- the application execution unit 1032 executes various operation programs downloaded from a server device or the like. Each application is activated when the operation unit 106 receives an operation from the user and the user selects an application activation icon displayed on the display unit 116.
- the content processing execution unit 1033 accumulates content data from the server device in advance in the content information storage area 104a via the external network 4, reproduces the accumulated content, and displays it on the display unit 116 (download playback). Alternatively, the content processing execution unit 1033 receives content data and content information distributed from the server device via the external network 4, and reproduces the sequentially received video, audio, and the like and displays them on the display unit 116 (streaming playback).
- the voice conversion execution unit 1034 converts the user's voice captured by the voice input unit 111 into a recognized character string based on the speech recognition dictionary 104b1, and uses the result either as an operation input such as channel selection of the broadcast receiving apparatus 1, or as character input after converting the recognized character string into a predetermined character string according to the dictionaries 104b2 and 104b3.
- the user authentication execution unit 1035 authenticates the user based on the user authentication information stored in the user authentication information storage area 104c together with the user's voice captured by the voice input unit 111 or the user's face image captured by the imaging unit 114.
- the cooperative terminal management execution unit 1036 registers and manages the information terminal devices 2 connected via the home local network or an external network 4 such as the Internet, and the broadcast receiving apparatus 1 executes various operations according to operation inputs from a registered information terminal device 2.
- the operation programs may be stored in advance in the ROM 102 and/or the storage unit 104 at the time of product shipment, may be acquired after shipment from a server device on the external network 4 via the LAN communication unit 117, or may be acquired from a memory card, an optical disk, or the like via the USB I/F unit 105c.
- FIG. 3 is a diagram illustrating an example of the key arrangement of the remote controller 120.
- the remote controller 120 includes a power key 120a1, broadcast wave selection keys (terrestrial digital, BS, CS) 120a2, channel/character input keys (1-12) 120a3, volume UP/DOWN keys 120a4, channel UP/DOWN keys 120a5, an input switch key 120a6, a program guide key 120a7, a data key 120a8, a voice input key 120a9, a menu key 120a10, a return key 120a11, cursor keys (up, down, left, right) 120a12, an enter key 120a13, and color keys (blue, red, green, yellow) 120a14. Other operation keys may also be provided.
- the power key 120a1, the broadcast wave selection key 120a2, and the like have the same functions as the operation keys of a known TV remote controller, and will not be described in detail.
- the voice input key 120a9 is an operation key prepared for the voice input function of this embodiment.
- FIG. 4 is a sequence diagram showing an example of character input processing by voice recognition according to the present embodiment.
- the broadcast receiving apparatus 1 displays a character input screen (S400).
- the character input screen is displayed, for example, when the user inputs an account name or a password for logging in to a server device via the external network 4 in order to stream or download content distributed from the server device.
- the main control unit 101 of the broadcast receiving apparatus 1 determines how the voice input key 120a9 has been pressed (the operation input state) based on the key input information (S403).
- the main control unit 101 executes a branch process (S404) based on the operation input state determination result.
- when the main control unit 101 determines in the operation input state determination process that the voice input key 120a9 has been pressed once within a predetermined time, the normal character conversion dictionary (first dictionary) is selected as the dictionary for converting the character string recognized by the voice conversion execution unit 1034 (the recognized character string) into another character string (a normal character string) (S411). Otherwise, the special character conversion dictionary (second dictionary) is selected as the dictionary with which the voice conversion execution unit 1034 converts the recognized character string into another character string (a special character string) (S412).
- when the user utters a voice (S421), the voice is captured by the voice input unit 111 (S422), and the voice conversion execution unit 1034 performs voice recognition processing based on the captured voice and the speech recognition dictionary 104b1 stored in the voice conversion information storage area 104b, converting the voice into a recognized character string (S423).
- the voice conversion execution unit 1034 then converts the recognized character string into a normal character string or a special character string by the dictionary conversion input processing described later, and uses the converted result as the character input (S424).
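A minimal sketch of this flow, combining the dictionary selection branch (S411/S412) with the dictionary conversion input (S424): the single-press rule comes from the text above, while the double-press alternative and the dictionary entries are assumptions for illustration:

```python
# Sketch of the S404 branch followed by the conversion in S424.
# A single press selects the normal dictionary; any other press
# pattern (assumed here to be a double press) selects the special
# dictionary. Dictionary entries are illustrative stand-ins.

NORMAL_DICT = {"haru": "spring-kanji"}  # stand-in for a kanji entry
SPECIAL_DICT = {"atto": "@"}

def select_dictionary(pressed_once: bool) -> dict[str, str]:
    """S411 selects the first dictionary, S412 the second."""
    return NORMAL_DICT if pressed_once else SPECIAL_DICT

def dictionary_conversion_input(recognized: str, pressed_once: bool) -> str:
    """Convert the recognized character string with the selected
    dictionary; unknown strings pass through unchanged."""
    return select_dictionary(pressed_once).get(recognized, recognized)

print(dictionary_conversion_input("haru", pressed_once=True))   # spring-kanji
print(dictionary_conversion_input("atto", pressed_once=False))  # @
```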
- the normal character conversion dictionary stores general words and phrases and is stored in the ROM 102 and/or the storage unit 104 in advance at the time of product shipment. Alternatively, it may be acquired after shipment from a server device on the external network 4 via the LAN communication unit 117, or a normal character conversion dictionary stored in a memory card, an optical disk, or the like may be acquired via the USB I/F unit 105c. The user may also register entries through the dictionary registration processing described later.
- FIG. 5 is a sequence diagram showing the flow of dictionary registration processing.
- FIG. 6 shows screen display examples in the dictionary registration processing according to the embodiment: FIG. 6A shows the special dictionary registration list display screen, FIG. 6B shows the new dictionary registration screen, and FIG. 6C shows the dictionary registration change screen.
- the dictionary registration execution unit 1037 executes the dictionary registration processing in the order of the steps in FIG. 5. The processing is repeated until the user selects the end of dictionary registration (S500). First, a list of the character strings registered in the dictionary is displayed on the display unit 116 (S501).
- the special dictionary registration list display screen shows the registration number 6a1, the voice-recognized character string (recognized character string) 6a2, the special character string corresponding to the recognized character string ("conversion" in the figure) 6a3, the selection frame 6a4, and the functions 6a5 assigned to the color keys 120a14 of the remote controller 120. In the illustrated assignment, "red" is new registration, "blue" is registration change, "yellow" is registration deletion, and "green" switches between the normal character conversion dictionary (first dictionary) and the special character conversion dictionary (second dictionary).
- the user performs a process selection input using the remote controller 120 (S502). For example, by moving the selection frame 6a4 with the up/down cursor keys 120a12 of the remote controller 120 and pressing the "blue" or "yellow" color key 120a14, the user can select the process of changing or deleting the dictionary registration contents displayed in the selection frame.
- When the user selects [New registration], the dictionary registration execution unit 1037 displays a new dictionary registration screen (see FIG. 6B) on the display unit 116 (S511), and executes a character conversion dictionary registration process S512 described later.
- On the new dictionary registration screen, a new registration number 6b1, a character string input frame 6b2 before conversion by speech recognition input (an input frame in which a recognized character string is displayed), a character string input frame 6b3 after conversion, and the functions 6b4 assigned to the color keys 120a14 of the remote controller 120 are displayed.
- the selected character string input frame is displayed as a thick line frame.
- the soft keyboard 6b5 may be displayed when a character string is input to the converted character string input frame 6b3.
- The user may input the converted character string by selecting and confirming characters on the soft keyboard 6b5 with the cursor keys of the remote controller 120, or may input a character string into the converted character string input frame 6b3 by operating the channel/character input keys (1 to 12) 120a3 of the remote controller 120.
- Alternatively, the voice input key 120a9 of the remote controller 120 may be pressed to input the converted character string by voice input.
- Further, by selecting the character string input frame 6b2 again and pressing the determination key 120a13 of the remote controller 120, the user can speak again to input voice, and the voice recognition processing can be performed again.
- When the user selects [Registration change], the dictionary registration execution unit 1037 displays the dictionary registration change screen (FIG. 6C) on the display unit 116 (S513), and executes a character conversion dictionary registration process S514 described later.
- Fig. 6 (c) shows a display example of the dictionary registration change screen.
- On the dictionary registration change screen, the registration number 6c1 of the character string to be changed, a character string input frame 6c2 before conversion by speech recognition input (an input frame in which a recognized character string is displayed), a character string input frame 6c3 after conversion, and the functions 6c4 assigned to the color keys 120a14 of the remote controller 120 are displayed.
- On the dictionary registration change screen, the recognized character string currently registered in the dictionary and the corresponding special character string are displayed first, and either the recognized character string or the special character string, or both, can be changed and re-registered.
- a soft keyboard may be further displayed as in the new dictionary registration screen when a character string is input to the converted character string input frame 6c3.
- When the user selects [Registration deletion], the dictionary registration execution unit 1037 deletes the registration of the character string in the selection frame portion (S515).
- FIG. 7 is a sequence diagram showing the flow of the character conversion dictionary registration process in the dictionary registration process.
- the process is repeated (S700) until the user selects the end of dictionary registration.
- In the repetitive process S700, first, the user performs process selection input using the remote controller 120 (S701).
- In the dictionary registration process S512 at the time of new dictionary registration, the character string input frame 6b2 before conversion or the character string input frame 6b3 after conversion by voice recognition input shown in FIG. 6B is selected by the up and down cursor keys 120a12 of the remote controller 120, and the process is selected by pressing the “red” key of the color keys 120a14.
- In the dictionary registration process S514 at the time of dictionary registration change, the character string input frame 6c2 before conversion or the character string input frame 6c3 after conversion by voice recognition input shown in FIG. 6C is selected by the up and down cursor keys 120a12 of the remote controller 120, and the process is selected by pressing the “red” key of the color keys 120a14.
- the dictionary registration execution unit 1037 executes a branch process S702 according to the selected process.
- When the user selects [Registration], the dictionary registration execution unit 1037 associates the recognized character string displayed in the character string input frame 6b2 or 6c2 with the converted character string displayed in the character string input frame 6b3 or 6c3, and stores the dictionary information in the normal character conversion dictionary 104b2 or the special character conversion dictionary 104b3 of the storage unit 104, according to whether the normal character conversion dictionary (first dictionary) or the special character conversion dictionary (second dictionary) has been selected by the dictionary switching process S516 (S708).
- When the user selects [End], the dictionary registration execution unit 1037 ends the character conversion dictionary registration process by the interruption process (S709), and exits the repetitive process S700.
- FIG. 8 is a diagram showing the configuration of dictionary information, where (a) shows a normal character conversion dictionary and (b) shows a special character conversion dictionary.
- The normal character conversion dictionary stores, as dictionary information, a registration number 8a1, a character string (recognized character string) 8a2 before conversion by speech recognition input, and a character string (converted character string) 8a3 after conversion.
- The special character conversion dictionary stores, as dictionary information, a registration number 8b1, a character string (recognized character string) 8b2 before conversion by voice recognition input, and a character string (converted character string) 8b3 after conversion.
- In the normal character conversion dictionary, for example, the character string “Hirake Sesame” before conversion by voice recognition input is converted into the character string “open sesame”.
- In the special character conversion dictionary, the character string “Hirake Sesame” before conversion by voice recognition input is converted into the character string “6922 # 7MgkRH”.
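The two-dictionary structure of FIG. 8 can be sketched as follows. This is a minimal illustrative model, not the patent's implementation; the data layout and function names are assumptions.

```python
# Minimal sketch of the FIG. 8 dictionary information: each entry maps a
# registration number to a pre-conversion (recognized) string and a
# post-conversion string. Structure and names are illustrative assumptions.

# Normal character conversion dictionary (first dictionary), cf. FIG. 8(a)
normal_dictionary = {
    1: {"recognized": "Hirake Sesame", "converted": "open sesame"},
}

# Special character conversion dictionary (second dictionary), cf. FIG. 8(b)
special_dictionary = {
    1: {"recognized": "Hirake Sesame", "converted": "6922#7MgkRH"},
}

def lookup(dictionary, recognized):
    """Return the converted string for a recognized string, or None."""
    for entry in dictionary.values():
        if entry["recognized"] == recognized:
            return entry["converted"]
    return None
```

The same recognized string can thus yield a readable phrase or an unreadable password-like string, depending only on which dictionary is selected.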
- FIG. 9 is a flowchart showing the flow of dictionary conversion input processing.
- The speech conversion execution unit 1034 confirms whether the recognized character string recognized in S423 is registered in the selected character conversion dictionary (S901). For example, when the normal character conversion dictionary is selected in S411, the speech conversion execution unit 1034 confirms whether the voice-recognized character string is registered in the normal character conversion dictionary, and when the special character conversion dictionary is selected in S412, it confirms whether the voice-recognized character string is registered in the special character conversion dictionary.
- Next, the voice conversion execution unit 1034 performs a branch process (S902) based on the confirmation result. If the recognized character string is not registered in the character conversion dictionary (S902 / No), the speech conversion execution unit 1034 displays an error such as “not registered in the dictionary” (S903), and ends the process. If the voice-recognized character string is registered in the character conversion dictionary (S902 / Yes), the speech conversion execution unit 1034 converts the recognized character string into a normal character string or a special character string according to the information in the character conversion dictionary (S904), uses the converted character string as the input characters, and ends the process.
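The S901 to S904 flow can be sketched as follows; function and field names are assumptions for illustration only.

```python
# Hedged sketch of the S901-S904 dictionary conversion input flow: confirm
# registration (S901), branch on the result (S902), report an error when the
# string is unregistered (S903), otherwise use the converted string as the
# input characters (S904).

special_dictionary = {
    1: {"recognized": "Hirake Sesame", "converted": "6922#7MgkRH"},
}

def dictionary_conversion_input(recognized, selected_dictionary):
    # S901: confirm whether the recognized string is registered
    converted = None
    for entry in selected_dictionary.values():
        if entry["recognized"] == recognized:
            converted = entry["converted"]
            break
    # S902: branch based on the confirmation result
    if converted is None:
        # S903: stands in for the on-screen error display
        return {"error": "not registered in dictionary"}
    # S904: the converted string becomes the input characters
    return {"input": converted}
```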
- As described above, for a complicated character string entered as characters such as an account name or a password when logging in to a server device, once the string is registered in the character conversion dictionary, it can be easily input by speaking a simple character string for voice recognition, and the usability for the user can be improved.
- user authentication using a password or the like may be performed when displaying a special character conversion dictionary used when inputting a character string such as a password.
- Alternatively, the input character string may be hidden by a mask display such as “●”, and a screen as shown in FIG. 6C may be displayed only after performing user authentication using a password or the like.
- The normal character conversion dictionary and the special character conversion dictionary may also be displayed together in one list without switching the dictionary display. When they are displayed simultaneously, the list indicates which dictionary each entry is registered in; for example, “N” is prefixed to the registration numbers of character strings registered in the normal character conversion dictionary, and “S” is prefixed to those registered in the special character conversion dictionary.
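The combined list display with “N”/“S” prefixes can be sketched as follows; the data layout and helper name are illustrative assumptions.

```python
# Illustrative sketch of the combined list display: entries from both
# dictionaries are listed at once, with "N" prefixed to registration numbers
# from the normal character conversion dictionary and "S" to those from the
# special character conversion dictionary.

def combined_list(normal, special):
    rows = []
    for number, entry in sorted(normal.items()):
        rows.append(("N%d" % number, entry["recognized"], entry["converted"]))
    for number, entry in sorted(special.items()):
        rows.append(("S%d" % number, entry["recognized"], entry["converted"]))
    return rows

normal = {1: {"recognized": "Hirake Sesame", "converted": "open sesame"}}
special = {1: {"recognized": "Hirake Sesame", "converted": "6922#7MgkRH"}}
```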
- When the recognized character string is not registered in the character conversion dictionary in the dictionary conversion input process, the recognized character string may be used directly as the input characters without displaying an error.
- FIG. 10 is a sequence diagram showing the flow of dictionary registration processing including user authentication processing.
- FIG. 11 is a diagram showing a screen display example in the dictionary registration processing according to the second embodiment, where (a) shows a special dictionary registration list display screen, (b) shows a new dictionary registration screen, and (c) Indicates a dictionary registration change screen.
- the process is repeatedly performed (S1000) until the user selects the end of dictionary registration.
- the dictionary registration execution unit 1037 displays a list of character strings registered in the dictionary on the display unit 116 (S1001).
- FIG. 11A shows a display example of the dictionary registration character string list display screen.
- the special dictionary registration list shown in FIG. 11A displays a registration number 11a1, a pre-conversion character string 11a2, a post-conversion character string 11a3, a selection frame 11a4, and a function 11a5 assigned to the color key 120a14 of the remote controller 120.
- As the functions assigned to the color keys 120a14, an example is shown in which “red” is assigned to new registration, “blue” to registration change, “yellow” to registration deletion, and “green” to switching between the normal character conversion dictionary and the special character conversion dictionary.
- For character strings that require authentication, such as registration numbers S4 and S5, the converted character strings are masked so that they cannot be read.
- Next, the user performs process selection input using the remote controller 120 (S1002). For example, by moving the selection frame 11a4 with the up and down cursor keys 120a12 of the remote controller 120 and pressing the “blue” or “yellow” key of the color keys 120a14, the user can select change processing or deletion processing of the dictionary registration contents displayed in the selection frame.
- the dictionary registration execution unit 1037 executes the branch process S1010.
- the dictionary registration execution unit 1037 displays a new dictionary registration screen on the display unit 116 (S1011).
- a character conversion dictionary registration process (S1012) is executed.
- Fig. 11B shows a display example of the new dictionary registration screen.
- The new dictionary registration screen shown in FIG. 11B displays a new registration number 11b1, a character string (recognized character string) input frame 11b2 before conversion by voice recognition input, a character string (special character string) input frame 11b3 after conversion, and the functions 11b4 assigned to the color keys 120a14 of the remote controller 120.
- “Red” is assigned to registration as a character string that requires user authentication at the time of conversion, and “blue” is assigned to registration as a character string that does not require user authentication at the time of conversion.
- When the user selects [Registration change] (when the “blue” key of the color keys 120a14 is pressed), the user authentication execution unit 1035 performs branch processing (S1013) depending on whether or not the character string to be changed is a character string that requires user authentication at the time of conversion.
- If the character string does not require user authentication, the dictionary registration execution unit 1037 displays a dictionary registration change screen on the display unit 116 (S1014), and executes a character conversion dictionary registration process S1015 described later.
- Fig. 11 (c) shows a display example of the dictionary registration change screen.
- The dictionary registration change screen displays the registration number 11c1 of the character string to be changed, the character string input frame 11c2 before conversion by voice recognition input, the character string input frame 11c3 after conversion, and the functions 11c4 assigned to the color keys 120a14 of the remote controller 120.
- “Red” is assigned to registration as a character string that requires user authentication at the time of conversion, and “blue” is assigned to registration as a character string that does not require user authentication at the time of conversion.
- the registered character string is displayed first, and either the character string before conversion, the character string after conversion, or both can be changed.
- When the selected character string is a character string that requires user authentication, the user authentication execution unit 1035 performs user authentication processing (S1016), and performs branch processing (S1017) based on the authentication result.
- If the authentication is invalid, the user authentication execution unit 1035 displays that the authentication is invalid (S1018).
- If the authentication is valid, the user authentication execution unit 1035 outputs the authentication result to the dictionary registration execution unit 1037; the dictionary registration execution unit 1037 displays a dictionary registration change screen on the display unit 116 as in S1014 (S1019), and executes a character conversion dictionary registration process S1020 described later.
- When the user selects [Registration deletion], the dictionary registration execution unit 1037 deletes the registration of the character string in the selection frame portion (S1021).
- When the user selects [Dictionary switching] (when the “green” key of the color keys 120a14 is pressed), the dictionary registration execution unit 1037 accepts an operation from the user for switching between registration processing of the normal character conversion dictionary and registration processing of the special character conversion dictionary, and switches the dictionary to be registered according to the operation result (S1022). As a result, the dictionary displayed when the process returns to the dictionary registration character string list display process S1001 by the repetition process is also switched.
- the dictionary registration execution unit 1037 ends the dictionary registration process by the interruption process (S1023), and ends the repeat process S1000.
- When deleting a character string that requires authentication, the user authentication execution unit 1035 may perform user authentication, and the dictionary registration execution unit 1037 may perform the deletion only when the authentication is valid.
- FIG. 12 is a sequence diagram showing the flow of the character conversion dictionary registration process in the dictionary registration process of the second embodiment.
- the repetitive process (S1200) is executed until the user selects the end of dictionary registration.
- the user inputs a process selection using the remote controller 120 (S1201).
- In the character conversion dictionary registration process at the time of new dictionary registration, the character string input frame 11b2 before conversion or the character string input frame 11b3 after conversion by voice recognition input is selected by the up and down cursor keys 120a12 of the remote controller 120, and the process is selected by pressing the “red” or “blue” key of the color keys 120a14.
- In the character conversion dictionary registration process at the time of dictionary registration change, the character string input frame 11c2 before conversion or the character string input frame 11c3 after conversion by voice recognition input is selected by the up and down cursor keys 120a12 of the remote controller 120, and the process is selected by pressing the “red” or “green” key of the color keys 120a14.
- the dictionary registration execution unit 1037 executes the branch process S1202 according to the result of the process selected by the user.
- When the user selects [Voice input], the voice input from the user (S1203) is captured by the voice input unit 111 (S1204); the voice conversion execution unit 1034 converts the captured voice into a recognized character string by voice recognition, and the recognized character string is displayed in the character string input frame 11b2 or 11c2 (S1205).
- When the user selects [Input character string after conversion] (when the character string input frame 11b3 or 11c3 is selected and the determination key 120a13 of the remote controller 120 is pressed), the user inputs the converted character string using the channel/character input keys 120a3 of the remote controller 120 or the like (S1206), and the dictionary registration execution unit 1037 displays it in the character string input frame 11b3 or 11c3 (S1207).
- When the user selects [Registration with user authentication] (when the “red” key of the color keys 120a14 is pressed), the user authentication execution unit 1035 performs an authentication information acquisition process (S1208), and stores the acquired authentication information in the user authentication information storage area 104c in association with the registration number. Further, the dictionary registration execution unit 1037 registers the character string displayed in the character string input frame 11b2 or 11c2 before conversion by voice recognition input and the character string displayed in the character string input frame 11b3 or 11c3 after conversion, in association with each other, in the normal character conversion dictionary or the special character conversion dictionary switched by the dictionary switching process S1022 (S1209). The registered dictionary information is stored in the voice conversion information storage area 104b of the storage unit 104.
- When the user selects [Registration without user authentication] (when the “blue” key of the color keys 120a14 is pressed), the dictionary registration execution unit 1037 registers the character string displayed in the character string input frame 11b2 or 11c2 before conversion by voice recognition input and the character string displayed in the converted character string input frame 11b3 or 11c3, in association with each other, in the normal character conversion dictionary or the special character conversion dictionary switched by the dictionary switching process S1022 (S1210).
- the registered dictionary information is stored in the voice conversion information storage area 104b of the storage unit 104.
- When the user selects [End], the dictionary registration execution unit 1037 ends the character conversion dictionary registration process by the interruption process (S1211), and ends the repetition process S1200.
- the user authentication execution unit 1035 performs user authentication information acquisition processing S1208, and stores the acquired authentication information in the user authentication information storage area 104c in association with the character string to be registered.
- For example, voiceprint data may be acquired as the authentication information using the voice data acquired in the voice input acquisition process S1204.
- an image of the user's face may be acquired by the imaging unit 114, and the face recognition data may be used as authentication information.
- In this way, the user authentication execution unit 1035 acquires authentication information using an authentication method that can identify the user; when a voice-recognized character string is to be converted according to a character conversion dictionary, user authentication is performed using the acquired authentication information, and the conversion using the character conversion dictionary may be performed only when the user authentication is valid.
- Either voiceprint recognition or face recognition may be selected as the user authentication execution method for the broadcast receiving apparatus 1.
- a screen for selecting a user authentication method may be displayed on the display unit 116 so that the user can select using the cursor keys (up, down, left, right) 120a12 and the enter key 120a13 of the remote controller 120.
- For example, when the user is in poor physical condition and the voice differs from normal, the apparatus may be configured to switch to face recognition when it is determined that the captured voice does not match the voiceprint data stored as authentication information.
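The fallback just described can be sketched as follows. The equality checks are stand-ins for real biometric matching, and all names are illustrative assumptions.

```python
# Hedged sketch of the fallback: voiceprint matching is tried first and, when
# the captured voice does not match the stored authentication information,
# face recognition via the imaging unit is tried instead.

def authenticate_user(captured_voiceprint, captured_face, stored):
    # Primary method: voiceprint comparison
    if captured_voiceprint == stored["voiceprint"]:
        return True
    # The voice may differ when the user is unwell: switch to face recognition
    return captured_face == stored["face"]

stored_auth = {"voiceprint": "vp_father", "face": "face_father"}
```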
- registration of a character string for performing user authentication may be performed using both the normal character conversion dictionary and the special character conversion dictionary, or may be performed using only the special character conversion dictionary.
- When registration with user authentication can be used only with the special character conversion dictionary, the function corresponding to “red” of the color keys 120a14 in FIG. 11B and FIG. 11C (registration with authentication) is not displayed on the new dictionary registration screen and the dictionary registration change screen of the normal dictionary.
- registration of user authentication for a single character string may be performed for a plurality of people.
- For example, if user authentication for the same character string is registered for both a father and a mother, the mother can also input the password by voice input of that character string.
- FIG. 13 is a diagram illustrating an example of dictionary registration information stored in the special character conversion dictionary according to the second embodiment.
- As dictionary registration information, a registration number 13b1, a character string (recognized character string) 13b2 before conversion by voice recognition input, a character string 13b3 after conversion, an authentication necessity 13b4, and authentication information 13b5 stored in the user authentication information storage area 104c are stored.
- For entries registered as requiring authentication, such as registration numbers 3 and 4, the user is authenticated at the time of voice input; even if a person other than the registered person inputs the recognized character string by voice, the authentication becomes invalid and conversion to the converted character string is not performed, so that a password or the like can be prevented from being entered without permission.
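One record of the FIG. 13 dictionary registration information can be sketched as follows; the field names and sample values are illustrative assumptions, not the patent's actual data layout.

```python
# Sketch of a FIG. 13 record: registration number 13b1, recognized string
# 13b2, converted string 13b3, authentication necessity 13b4, and
# authentication information 13b5 (held in the user authentication
# information storage area).
from dataclasses import dataclass
from typing import Optional

@dataclass
class DictionaryEntry:
    number: int                       # 13b1
    recognized: str                   # 13b2: string before conversion
    converted: str                    # 13b3: string after conversion
    needs_auth: bool                  # 13b4: authentication necessity
    auth_info: Optional[str] = None   # 13b5: e.g. a voiceprint reference

entries = [
    DictionaryEntry(1, "Hirake Sesame", "6922#7MgkRH", False),
    DictionaryEntry(3, "open my account", "x9!TqLw2", True, "voiceprint_father"),
]
```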
- FIG. 14 is a flowchart showing the flow of dictionary conversion input processing according to the second embodiment.
- The speech conversion execution unit 1034 confirms whether or not the recognized character string generated by speech recognition in step S423 (see FIG. 4) is registered in the character conversion dictionary selected by the user (S1401). For example, when the normal character conversion dictionary is selected in S411, it checks whether the voice-recognized character string is registered in the normal character conversion dictionary, and when the special character conversion dictionary is selected in S412, it checks whether the voice-recognized character string is registered in the special character conversion dictionary.
- the speech conversion execution unit 1034 performs a branching process (S1402) based on the confirmation result.
- If the voice-recognized character string is not registered in the selected character conversion dictionary (S1402 / No), the speech conversion execution unit 1034 displays an error such as “not registered in the dictionary” (S1403), and the process ends. If the voice-recognized character string is registered in the selected character conversion dictionary (S1402 / Yes), the voice conversion execution unit 1034 confirms from the dictionary registration information whether the voice-recognized character string is a character string that requires user authentication (S1404). Next, based on the result of the confirmation, the voice conversion execution unit 1034 performs a branch process (S1405).
- If the character string does not require user authentication (S1405 / No), the speech conversion execution unit 1034 converts the recognized character string according to the information in the selected character conversion dictionary (S1409), and the process ends. If the character string requires user authentication (S1405 / Yes), the user authentication execution unit 1035 performs user authentication using the authentication information stored in the user authentication information storage area 104c corresponding to the dictionary registration number (S1406), and outputs authentication determination information indicating whether the user authentication is invalid or valid to the voice conversion execution unit 1034. The voice conversion execution unit 1034 performs a branch process (S1407) based on the authentication determination information.
- If the user authentication is invalid (S1407 / No), the voice conversion execution unit 1034 displays that the user authentication is invalid (S1408) and ends the process. If the user authentication is valid (S1407 / Yes), the speech conversion execution unit 1034 converts the recognized character string into a character string according to the information in the character conversion dictionary (S1409), and ends the process.
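The S1401 to S1409 flow above can be sketched as follows. The `authenticate` callback stands in for the user authentication execution unit; names and sample entries are assumptions.

```python
# Hedged sketch of the second embodiment's dictionary conversion input flow:
# dictionary check (S1401, S1402), authentication-necessity check (S1404,
# S1405), user authentication (S1406, S1407), then conversion (S1409).

entries = [
    {"recognized": "Hirake Sesame", "converted": "6922#7MgkRH",
     "needs_auth": False, "auth_info": None},
    {"recognized": "open my account", "converted": "pw#123",
     "needs_auth": True, "auth_info": "vp_father"},
]

def convert_with_auth(recognized, dictionary, authenticate):
    entry = next((e for e in dictionary if e["recognized"] == recognized), None)
    if entry is None:                             # S1402/No
        return "error: not registered"            # S1403
    if entry["needs_auth"]:                       # S1405/Yes
        if not authenticate(entry["auth_info"]):  # S1406, S1407/No
            return "error: authentication invalid"  # S1408
    return entry["converted"]                     # S1409
```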
- In the above description, one special character conversion dictionary 104b3 is provided. However, a different special character conversion dictionary 104b3 may be provided for each user, and the special character conversion dictionary 104b3 provided for the user identified as a result of user authentication may be selected and used to convert the character string in step S1409.
- a first special character conversion dictionary is provided for father and mother, and a second special character conversion dictionary is provided for children.
- the first special character conversion dictionary registers the converted character string “Suzuki_parents” for the recognized character string “Suzuki Adores”.
- the second special character conversion dictionary registers the conversion character string “Suzuki_kid” for the recognized character string “Suzuki Adores”.
- The user authentication execution unit 1035 acquires the recognized character string generated by the speech conversion execution unit 1034 in step S1401, and uniquely identifies the user in the user authentication of S1406. If the user is determined to be the father or the mother, the dictionary conversion process is executed in S1409 after selecting the first special character conversion dictionary; if the user is determined to be a child, the dictionary conversion process is executed after selecting the second special character conversion dictionary.
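The per-user dictionary selection can be sketched as follows. The mapping and helper names are illustrative assumptions; the example strings follow the “Suzuki Adores” entries above.

```python
# Sketch of per-user special dictionaries: user authentication identifies the
# speaker, and the special character conversion dictionary provided for that
# user is chosen before the S1409 conversion.

first_special = {"Suzuki Adores": "Suzuki_parents"}   # for father and mother
second_special = {"Suzuki Adores": "Suzuki_kid"}      # for children

user_dictionaries = {
    "father": first_special,
    "mother": first_special,
    "child": second_special,
}

def convert_for_user(user, recognized):
    # Select the dictionary for the authenticated user, then convert
    dictionary = user_dictionaries.get(user, {})
    return dictionary.get(recognized)
```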
- The third embodiment uses voice input by the user in the user authentication process. More specifically, when voice-based user authentication such as a voiceprint is used for a character string that requires user authentication, this embodiment prevents user authentication from being validated by playing back a recording of the voice captured when the character string was input in the past.
- FIG. 15 is a sequence diagram illustrating an example of a character input process by voice recognition according to the third embodiment.
- The voice conversion execution unit 1034 sets the identification sound (S431), and starts outputting the set identification sound from the voice output unit 113 (S432).
- the user utters a voice (S433)
- the voice input unit 111 captures the voice (S434)
- the voice conversion execution unit 1034 finishes outputting the identification sound from the voice output unit 113 (S435).
- Based on the captured voice and the voice recognition dictionary 104b1, the voice conversion execution unit 1034 converts the voice into a recognized character string by voice recognition (S436). In a dictionary conversion input process (S437) described later, the user authentication execution unit 1035 performs user authentication using the input voice; if the user authentication is valid, the recognized character string recognized by the voice conversion execution unit 1034 is converted according to the character conversion dictionary selected by the user, and the converted result is used as the character string input.
- FIG. 16 is a flowchart showing the flow of dictionary conversion input processing according to the third embodiment.
- In FIG. 16, the same processing portions as those in FIG. 14 are denoted by the same step numbers, and their description is omitted.
- In the dictionary conversion input process of the third embodiment, user authentication is performed by the user authentication execution unit 1035 using the voiceprint data stored in the user authentication information storage area 104c corresponding to the dictionary registration number (S1406a1), and branch processing (S1406a2) is performed based on authentication determination information indicating whether the user authentication by voiceprint is invalid or valid. If the voiceprint stored as authentication information at the time of dictionary registration differs from the voiceprint captured in S434 (see FIG. 15) and the user authentication is invalid (S1406a2 / No), the voice conversion execution unit 1034 displays that the user authentication is invalid (S1408), and the process ends.
- If the user authentication by voiceprint is valid, the user authentication execution unit 1035 performs an identification sound determination process (S1406a3) described later, and outputs authentication determination information indicating whether the user authentication is invalid or valid to the voice conversion execution unit 1034.
- the voice conversion execution unit 1034 performs a branch process (S1407) based on the authentication determination information.
- If the identification sound determination indicates that the authentication is invalid (S1407 / No), the voice conversion execution unit 1034 determines that recorded voice has been input, displays that the user authentication is invalid (S1408), and ends the process.
- If the user authentication is valid (S1407 / Yes), the speech conversion execution unit 1034 converts the character string according to the information of the selected character conversion dictionary (S1409), and ends the process.
- FIG. 17 is a flowchart showing the flow of the identification sound determination process.
- FIG. 18 is a diagram illustrating signal waveforms used in the identification sound determination process; FIGS. 18(a1) to (a5) are examples of waveform diagrams in the case where the user inputs voice directly and the user authentication is determined to be valid by the identification sound determination.
- FIGS. 18(b1) to (b5) are examples of waveform diagrams in the case where recorded voice is input and the user authentication is determined to be invalid by the identification sound determination.
- As the identification sound, the user authentication execution unit 1035 outputs a signal having a predetermined frequency F0 at predetermined intervals, repeating output for a period Ton and output stop for a period Toff. The predetermined periods Ton and Toff are desirably short, on the order of milliseconds, so that it is difficult for the user to synchronize the utterance timing with them.
- FIGS. 18(a2) and (b2) show waveforms of the audio signal captured in S434. In FIG. 18(a2), the identification sound output from the audio output unit 113 and the user's voice are captured. In FIG. 18(b2), the identification sound output from the audio output unit 113, the recorded user's voice, and the recorded identification sound are captured.
- In the identification sound determination process, the user authentication execution unit 1035 first performs filter processing (S1701) on the audio signal captured in S434 (FIGS. 18(a2) and (b2)) to detect the signal component corresponding to the frequency F0 of the identification sound. As a result, signals in which the identification sound component is detected (FIGS. 18(a3) and (b3)) are obtained.
- FIG. 18 (a3) the identification sound output from the audio output unit 113 is detected, whereas in FIG. 18 (b3), the identification sound output from the audio output unit 113 and the identification sound of the recorded voice are displayed. Detected. Further, the timing is different between the identification sound output from the audio output unit 113 and the recorded identification sound. For this reason, during the period in which the identification sound output from the audio output unit 113 and the identification sound of the recorded sound are detected simultaneously, the interference between the identification sound output from the audio output unit 113 and the identification sound of the recorded sound And the amplitude of the identification sound captured by the voice input unit 111 changes depending on the way of interference.
- FIG. 18 (b3) shows a case where the amplitude of the identification sound is reduced due to interference.
- next, the user authentication execution unit 1035 detects the amplitude of the filtered signal (S1702) and compares it with a predetermined threshold value, thereby converting it into a binary signal of H level and L level (S1703).
- FIGS. 18(a4) and (b4) are signals obtained by detecting the amplitude of the identification sounds shown in FIGS. 18(a3) and (b3) by the amplitude detection processing S1702.
- in FIG. 18(a4), only the amplitude of the identification sound output from the audio output unit 113 is detected.
- FIG. 18(b4) shows that the detected amplitude differs between the periods in which only the identification sound output from the audio output unit 113 is present and the periods in which it interferes with the recorded identification sound.
- FIGS. 18(a5) and 18(b5) are signals obtained by comparing the signal detected by the amplitude detection process S1702 with the predetermined threshold value Vt in the binarization process S1703 and converting it into a binary value of H level and L level.
- in FIG. 18(a5), the H level period is substantially Ton.
- in FIG. 18(b5), the H level period deviates greatly from Ton due to the influence of the identification sound of the recorded voice.
- the user authentication execution unit 1035 detects the H level period of the binarized signal (S1704) and performs branch processing S1705 based on whether the H level period is within a predetermined range (here, as an example, Ton × 0.9 to Ton × 1.1). If it is not within the predetermined range (S1705/No), the user authentication execution unit 1035 determines that a recorded voice has been input and that the user authentication is invalid (S1706), and ends the process. If it is within the predetermined range (S1705/Yes), the user authentication execution unit 1035 determines that the user authentication is valid (S1707), and ends the process.
- since it is difficult to input a recorded voice while synchronizing the timing of its recorded identification sound with the identification sound output from the audio output unit 113, it can be determined that the user authentication is invalid when a recorded voice is input.
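The filter, amplitude-detection, binarization, and period-check steps (S1701 to S1705) can be sketched as follows. This is a minimal illustration only: the sample rate, identification frequency F0, period Ton, averaging window, and threshold Vt are assumed values the description leaves unspecified, and a phase-insensitive I/Q correlation stands in for the filter processing.

```python
import numpy as np

FS  = 8000    # sample rate in Hz (assumed for illustration)
F0  = 1000    # identification-sound frequency F0 (assumed)
TON = 0.2     # output period Ton in seconds (assumed)

def h_level_period(signal, fs=FS, f0=F0, vt=0.3):
    """S1701-S1704 sketch: detect the F0 component, detect its amplitude,
    binarize against threshold Vt, and measure the longest H-level period."""
    t = np.arange(len(signal)) / fs
    win = int(fs * 0.02)                     # 20 ms averaging window
    kern = np.ones(win) / win
    # S1701: extract the F0 component (phase-insensitive I/Q correlation)
    i = np.convolve(signal * np.cos(2 * np.pi * f0 * t), kern, "same")
    q = np.convolve(signal * np.sin(2 * np.pi * f0 * t), kern, "same")
    amp = 2 * np.hypot(i, q)                 # S1702: amplitude detection
    binary = amp > vt                        # S1703: binarization (H/L)
    # S1704: longest contiguous H-level run, converted to seconds
    runs, run = [0], 0
    for b in binary:
        run = run + 1 if b else 0
        runs.append(run)
    return max(runs) / fs

def user_authentication_valid(signal, ton=TON):
    """S1705 sketch: valid only if the H period is within Ton*0.9 .. Ton*1.1."""
    return ton * 0.9 <= h_level_period(signal) <= ton * 1.1
```

A recorded identification sound that interferes with the live one stretches or shrinks the H-level run, pushing it outside the ±10% window, which is exactly the condition S1705 tests.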
- FIG. 19 is a diagram illustrating an example of the waveform of the identification sound when a plurality of (two or more) timing patterns are provided for the output period Ton of the signal having the predetermined frequency F0 and the stop period Toff in the identification sound setting process S341.
- FIG. 20 is a diagram showing an example of a frequency spectrum of voice input when a plurality (two or more) of identification sound frequencies are provided.
- FIG. 20(a) shows the frequency spectrum when the voice actually uttered by the user is input, and (b) shows the frequency spectrum when the recorded voice is input.
- a plurality of (two or more) timing patterns of the output period Ton of the signal having the predetermined frequency F0 and the stop period Toff are provided.
- by detecting the timing difference between the output identification sound and the period Ton and stop period Toff of the identification sound included in the captured voice, it can be determined whether or not a recorded voice has been input.
- in the identification sound setting process S441, when a plurality of (two or more) identification sound frequencies are set and changed for each voice input, and a voice that has not been recorded is input, the spectrum corresponding to the frequency F0 of the identification signal output from the audio output unit 113 is detected, as shown in FIG. 20(a).
- when a recorded voice is input, the spectrum corresponding to the frequency F0 of the identification signal output from the audio output unit 113 and the spectrum corresponding to the frequency F1 of the recorded identification signal are detected, as shown in FIG. 20(b). Therefore, the frequency spectrum of the input signal is analyzed, and when a spectrum having a frequency different from that of the identification signal output from the audio output unit 113 is detected, it is determined that a recorded voice has been input, and the user authentication may be invalidated.
- the identification sound may be an audible frequency signal or a non-audible frequency signal (ultrasound).
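The frequency-changing variant can be sketched as a simple spectrum check: if a strong component appears at a frequency other than the currently output identification frequency F0 (for example, a stale F1 embedded in a recording), the input is treated as recorded. The thresholds and tolerances are assumptions for illustration; in practice the check would be applied to the isolated identification band rather than raw broadband speech.

```python
import numpy as np

def recorded_voice_detected(signal, fs, f0, tol_hz=50.0, rel_thresh=0.6):
    """Sketch of the FIG. 20 check: return True when a strong spectral
    component exists farther than tol_hz from the current identification
    frequency f0, suggesting a recorded identification sound (F1)."""
    spec = np.abs(np.fft.rfft(signal * np.hanning(len(signal))))
    freqs = np.fft.rfftfreq(len(signal), 1.0 / fs)
    strong = freqs[spec > rel_thresh * spec.max()]   # dominant bins
    return bool(np.any(np.abs(strong - f0) > tol_hz))
```

If this function returns True, the user authentication may be invalidated, as described above.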
- FIG. 21 is a sequence diagram showing an example of character input processing by voice recognition according to the fourth embodiment.
- FIG. 22 is a flowchart showing the flow of dictionary conversion input processing according to the fourth embodiment.
- FIG. 23 shows the contents of the lip movement determination process in face authentication, where (a) shows the movement of the lips and (b) shows the time-series change of the x and y components of the lip movement.
- the processing from S401 to S412 is the same as that in FIG.
- the imaging unit 114 starts capturing an image (S441).
- the voice input unit 111 captures the voice uttered by the user (S442, S443), and the imaging unit 114 ends the capturing of the image (S444).
- the voice conversion execution unit 1034 converts the captured voice into a recognized character string by voice recognition based on the voice recognition dictionary 104b1 (S445). In the dictionary conversion input processing (S446) described later, conversion processing is performed on the recognized character string according to the character conversion dictionary selected by the user, and the converted result is used as the character string input.
- FIG. 22 shows an example of a flowchart of the dictionary conversion input process S446.
- in FIG. 22, processing parts identical to those in the flowchart described above are given the same step numbers, and their description is omitted.
- the user authentication execution unit 1035 performs user authentication by face authentication based on the images captured by the imaging unit 114 between S441 and S444 (S1406b1), and performs a branch process (S1406b2) based on the authentication determination information indicating whether the user authentication is invalid or valid.
- if the user authentication is invalid, the voice conversion execution unit 1034 displays that the user authentication is invalid (S1408), and the process is terminated.
- based on the images captured between S441 and S444, the user authentication execution unit 1035 outputs authentication determination information indicating whether the user authentication is invalid or valid to the voice conversion execution unit 1034.
- the voice conversion execution unit 1034 performs a branch process (S1407) based on the authentication determination information.
- the user authentication execution unit 1035 detects the movement of the lips of the recognized face from the images captured between S441 and S444. When the user authentication execution unit 1035 detects lip movement, it determines that the captured image is not a photograph and that the user authentication is valid (S1407/Yes). If it cannot detect lip movement, it determines that the captured image is a photograph and that the user authentication based on the face image is invalid (S1407/No).
- the voice conversion execution unit 1034 displays that the user authentication is invalid (S1408), and ends the process. If the user authentication is valid based on the authentication determination information (S1407 / Yes), the speech conversion execution unit 1034 converts the character string according to the information in the character conversion dictionary (S1409), and the process ends.
- the user authentication execution unit 1035 may determine based on not only the presence / absence of the movement of the lips but also the degree of opening of the lips corresponding to the character string at the time of voice input.
- as shown in FIG. 23(a), the user authentication execution unit 1035 detects the size of the lips, with the horizontal size as X and the vertical size as Y. When the user authentication information (13a5 in FIG. 13) is registered in advance in association with the dictionary, for example, special character information, a face image is captured when the character string to be recognized is input by voice. From the captured image, as shown in FIG. 23(b), the X and Y components of the lip size corresponding to the voice-input character string are detected, and the lip sizes X and Y corresponding to the character string are stored as dictionary registration information together with the user authentication information based on the face image.
- in the determination process S1406b3 based on the movement of the lips, the user authentication execution unit 1035 may detect the lip sizes X and Y corresponding to the voice-input character string from the images captured between S441 and S444 and compare them with the lip sizes registered as dictionary information. As a result, user authentication using a face image can be performed with higher accuracy.
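The lip-based liveness check described above can be sketched as follows. This is an illustrative sketch only: the per-frame lip sizes are assumed to come from a face-detection stage not shown here, and the thresholds (minimum variation, maximum trajectory distance) are assumptions.

```python
import numpy as np

def lips_moving(y_sizes, min_range=2.0):
    """S1406b3 sketch: a still photograph yields an almost constant lip
    height Y; live speech makes Y vary from frame to frame."""
    y = np.asarray(y_sizes, dtype=float)
    return (y.max() - y.min()) >= min_range

def matches_registered(y_sizes, registered, max_dist=1.5):
    """Optional stricter check: compare the captured Y trajectory with the
    trajectory stored as dictionary registration information (13a5)."""
    a = np.asarray(y_sizes, dtype=float)
    b = np.asarray(registered, dtype=float)
    n = min(len(a), len(b))
    a, b = a[:n], b[:n]
    # normalize out absolute lip size and camera distance
    a = (a - a.mean()) / (a.std() + 1e-9)
    b = (b - b.mean()) / (b.std() + 1e-9)
    return float(np.mean(np.abs(a - b))) <= max_dist
```

A flat trajectory fails `lips_moving` (photograph case, S1407/No); a varying trajectory that also matches the registered one passes the stricter comparison.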
- voice input processing is performed using an information terminal device 2 linked with the broadcast receiving device 1.
- FIG. 24 is a block diagram illustrating an example of an internal configuration of the information terminal device 2.
- the information terminal device 2 includes a system bus 200, a main control unit 201, a ROM 202, a RAM 203, a storage unit 204, an expansion I/F unit 205, an operation unit 206, a sensor unit 210, a communication processing unit 220, an image processing unit 230, and an audio processing unit 240.
- the main control unit 201 is a microprocessor unit that controls the entire information terminal device 2.
- the system bus 200 is a data communication path for transmitting and receiving data between the main control unit 201 and each operation block in the information terminal device 2.
- the ROM 202 is a memory in which a basic operation program such as an operating system and other operation programs are stored. For example, a rewritable ROM such as an EEPROM or a flash ROM is used.
- the RAM 203 serves as a work area when the basic operation program and other operation programs are executed.
- the ROM 202 and RAM 203 may be integrated with the main control unit 201. Further, the ROM 202 may not use an independent configuration as shown in FIG. 24 but may use a partial storage area in the storage unit 204.
- the storage unit 204 stores an operation program and an operation setting value of the information terminal device 2, personal information of the user of the information terminal device 2, and the like. Further, it is possible to store an operation program downloaded from the network, various data created by the operation program, and the like. Also, contents such as moving images, still images, and audio downloaded from the network can be stored. All or some of the functions of the ROM 202 may be replaced by a partial area of the storage unit 204. In addition, the storage unit 204 needs to hold stored information even when power is not supplied to the information terminal device 2 from the outside. Therefore, for example, devices such as a flash ROM, SSD, and HDD are used.
- each operation program stored in the ROM 202 or the storage unit 204 can be updated and expanded by a download process from a server device on the external network 4.
- the expansion I/F unit 205 is a group of interfaces for extending the functions of the information terminal device 2, and in this embodiment includes a video/audio I/F, a USB I/F, a memory I/F, and the like.
- the video / audio I / F performs input of video signals / audio signals from external video / audio output devices, output of video signals / audio signals to external video / audio input devices, and the like.
- the USB I / F transmits and receives data by connecting to a PC or the like. A keyboard or other USB device may be connected.
- the memory I / F connects a memory card or other memory medium to transmit / receive data.
- the operation unit 206 is an instruction input unit that inputs operation instructions to the information terminal device 2, and in this embodiment is configured by a touch panel arranged over the display unit 231. Operation input is possible by detecting, for example, a gesture called a swipe (moving a finger in a specific direction while touching the touch panel) and a gesture called a tap (quickly releasing the finger after touching the touch panel).
- the sensor unit 210 is a sensor group for detecting the state of the information terminal device 2.
- the information terminal device 2 may further include other sensors such as a barometric pressure sensor.
- the communication processing unit 220 includes a LAN communication unit 221, a mobile telephone network communication unit 222, and a Bluetooth communication unit 223.
- the LAN communication unit 221 is connected to the external network 4 via the router device 3 and transmits / receives data to / from a server device on the external network 4. It is assumed that the connection with the router device 3 is performed by a wireless connection such as WiFi (registered trademark).
- the mobile telephone network communication unit 222 performs telephone communication (call) and data transmission / reception by wireless communication with a base station (not shown) of the mobile telephone communication network.
- the LAN communication unit 221, the mobile telephone network communication unit 222, and the Bluetooth communication unit 223 are each provided with a coding circuit, a decoding circuit, an antenna, and the like.
- the communication processing unit 220 may further include another communication unit such as an NFC communication unit or an infrared communication unit.
- the image processing unit 230 includes a display unit 231, an image signal processing unit 232, a first image input unit 233, and a second image input unit 234.
- the display unit 231 is a display device such as a liquid crystal panel, for example, and provides the image data processed by the image signal processing unit 232 to the user of the information terminal device 2.
- the image signal processing unit 232 includes a video RAM (not shown), and image data input to the video RAM is displayed on the display unit 231.
- the image signal processing unit 232 has a function of performing format conversion, menu and other OSD signal superimposition processing as necessary.
- the first image input unit 233 and the second image input unit 234 input image data of surroundings and objects by converting light input from a lens into an electrical signal using an electronic device such as a CCD or a CMOS sensor. It is a camera unit.
- the audio processing unit 240 includes an audio output unit 241, an audio signal processing unit 242, and an audio input unit 243.
- the audio output unit 241 is a speaker, and provides the audio signal processed by the audio signal processing unit 242 to the user of the information terminal device 2.
- the voice input unit 243 is a microphone, which converts a user's voice and the like into voice data and inputs the voice data.
- the information terminal device 2 may be a mobile phone, a smartphone, a tablet terminal, or the like. It may be a PDA (Personal Digital Assistant) or a notebook PC (Personal Computer).
- the configuration example of the information terminal device 2 illustrated in FIG. 24 includes a number of components that are not essential to the present embodiment, such as the sensor unit 210; the effect of this embodiment is not lost even if these components are omitted. Further, configurations not shown in the figure, such as a digital broadcast receiving function and an electronic money settlement function, may be added.
- FIG. 25 is a diagram illustrating an internal configuration of the information terminal device 2 according to the present embodiment, in which (a) is a software configuration diagram of the information terminal device 2 and (b) is stored in the voice conversion information storage area. Indicates the dictionary to be used.
- FIG. 25A shows a software configuration in the ROM 202, the RAM 203, and the storage unit 204.
- the basic operation program 2021 stored in the ROM 202 is expanded in the RAM 203, and the main control unit 201 executes the expanded basic operation program to constitute the basic operation execution unit 2031.
- the application program 2041, the voice conversion program 2042, and the cooperation device management program 2043 stored in the storage unit 204 are expanded in the RAM 203, and the main control unit 201 executes the expanded operation programs, thereby configuring an application execution unit 2032, a voice conversion execution unit 2033, and a linked device management execution unit 2034.
- the RAM 203 includes a temporary storage area that temporarily stores data created when each operation program is executed as necessary.
- the storage unit 204 also includes a voice conversion information storage area 204a for storing a dictionary and the like for converting a voice-recognized character string into a predetermined character string, a cooperative device information storage area 204b for storing authentication information used in cooperative operation with the broadcast receiving apparatus 1 and the like, and a various information storage area 204c for storing various other information.
- the voice conversion information storage area 204a stores a voice recognition dictionary 204a1 that recognizes an input voice and converts it into a character string, as well as a normal character conversion dictionary (first dictionary) 204a2 and a special character conversion dictionary (second dictionary) 204a3 that convert the voice-recognized character string into a predetermined character string.
- the normal character conversion dictionary (first dictionary) 204a2 and the special character conversion dictionary (second dictionary) 204a3 are the same as the normal character conversion dictionary and the special character conversion dictionary of the first embodiment.
- when a user is specified, as with a smartphone or a mobile phone, the dictionaries can be configured as a normal character conversion dictionary and a special character conversion dictionary corresponding to that user.
- when the information terminal device 2 is a device used by a plurality of users, for example, a PC or a tablet terminal, a normal character conversion dictionary and a special character conversion dictionary are prepared for each user, and for a device on which a user logs in, the normal character conversion dictionary and special character conversion dictionary corresponding to the logged-in user may be selected by the voice conversion execution unit 2033 described later.
- in the following description, the main control unit 201 expands the basic operation program 2021 stored in the ROM 202 into the RAM 203 and executes it, and the resulting basic operation execution unit 2031 is described as performing control of each operation block. The same description applies to the other operation programs.
- the application execution unit 2032 executes various operation programs downloaded from the server device. Each application is activated by receiving an operation from the user via the operation unit 206 and selecting an application activation icon displayed on the display unit 231.
- the voice conversion execution unit 2033 recognizes the user's voice captured by the voice input unit 243 as a character string based on the voice recognition dictionary 204a1, and performs operation input such as channel selection of the broadcast receiving apparatus 1, or converts the voice-recognized character string (recognized character string) into a predetermined character string according to the dictionaries 204a2 and 204a3 and inputs the characters.
- the linked device management execution unit 2034 registers and manages the broadcast receiving device 1 connected to the local network in the house or the external network 4 such as the Internet, and enables the information terminal device 2 to operate the registered broadcast receiving device 1.
- the RAM 203 includes a temporary storage area 2035 that serves as a work area for the main control unit 201.
- the operation programs may be stored in the ROM 202 and/or the storage unit 204 in advance at the time of product shipment. After product shipment, they may be acquired from a server device or the like on the external network 4 via the LAN communication unit 221 or the mobile telephone network communication unit 222. Further, the operation programs stored in a memory card, an optical disk, or the like may be acquired via the expansion I/F unit 205 or the like.
- FIG. 26 is a sequence diagram illustrating an example of a character input process by voice recognition according to the fifth embodiment.
- FIG. 27 is a diagram showing a screen display example of the information terminal device 2, where (a) shows an application startup screen, (b) shows a device authentication screen, and (c) shows a character input screen.
- FIG. 27(a) shows an example of the screen for selecting and starting an application on the information terminal device 2, and a button (icon) corresponding to each application is displayed on the display unit 231.
- when the user taps the "TV cooperation" button 231a1 to input activation of the application, the application for operating the broadcast receiving apparatus 1 in cooperation with the information terminal device 2 is activated.
- the main control unit 201 displays an authentication screen for device authentication on the display unit 231 (S2603), and the user performs selection input of a cooperation device (S2604).
- FIG. 27(b) is an example of the device authentication screen, on which a device list 231b1, a selection frame 231b2, and an enter button 231b3 are displayed.
- the linked device management execution unit 2034 displays, in the device list 231b1, the names of the found devices, regardless of whether each device is authenticated or unauthenticated.
- the linked device management execution unit 2034 stores each found device name in the cooperation device information storage area 204b of the storage unit 204 together with information indicating whether it is authenticated or unauthenticated. A device that was found in the past but could not be found this time can be displayed in a different display color to distinguish it from other devices.
- in the device list 231b1, "TV1 (broadcast receiving device 1)" and "TV2 (broadcast receiving device 2)" are displayed, and it is shown that TV1 has been authenticated previously.
- the selection frame 231b2 is displayed on the display portion of "TV1", indicating that TV1 is selected.
- the cooperation device management execution unit 2034 transmits authentication information such as a user ID and a password stored in advance in the cooperation device information storage area 204b to the cooperation terminal management execution unit 1036 of the broadcast receiving device 1 via the LAN communication unit 221, the router device 3, and the LAN communication unit 117 (S2605).
- the cooperation terminal management execution unit 1036 of the broadcast receiving device 1 performs authentication by comparing the authentication information stored in the cooperation terminal information storage area 104d with the authentication information transmitted from the cooperation device management execution unit 2034 of the information terminal device 2 (S2606), and returns the authentication result to the linked device management execution unit 2034 of the information terminal device 2 (S2607). If the authentication information matches, the cooperation terminal management execution unit 1036 authenticates the connection with the cooperation device management execution unit 2034.
- the information terminal device 2 can omit the authentication screen of FIG. 27B by storing the last authenticated device in the linked device information storage area 204b. If the authentication information does not match, the authentication screen shown in FIG. 27B may be displayed again.
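The credential exchange (S2605 to S2607) and the last-authenticated-device shortcut can be sketched as follows. The field names and stored values are purely illustrative stand-ins for the contents of the cooperation device information storage area 204b (terminal side) and the cooperation terminal information storage area 104d (receiver side).

```python
# Hypothetical credential records; names and values are illustrative only.
terminal_side_204b = {"user_id": "user01", "password": "pw1234",
                      "last_authenticated": None}        # 204b contents
receiver_side_104d = {("user01", "pw1234")}              # 104d contents

def authenticate(sent, registered):
    """S2606 sketch: the receiver compares the transmitted authentication
    information with its registered pairs and returns the result (S2607)."""
    return (sent["user_id"], sent["password"]) in registered

def needs_auth_screen(store, device_name):
    """Sketch of the FIG. 27(b) shortcut: the authentication screen can be
    skipped when this device was the last one successfully authenticated."""
    return store["last_authenticated"] != device_name
```

On a successful match the terminal would record the device as last authenticated, so a subsequent session can skip the screen; on a mismatch the screen is shown again, as described above.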
- the information terminal device 2 is authenticated as a cooperation terminal of the broadcast receiving device 1, and the user can operate the broadcast receiving device 1 using the information terminal device 2.
- the broadcast receiving apparatus 1 displays a character input screen (S2610), and transmits a software keyboard activation request to the cooperating information terminal apparatus 2 (S2611).
- the character input screen is displayed, for example, when an account name or password for logging in to a server is input in order to perform streaming viewing or download viewing of content distributed from the server device via the external network 4.
- the information terminal device 2 displays a character input screen on the display unit 231 (S2612).
- FIG. 27(c) shows an example of the character input screen of the information terminal device 2, which includes a display frame 231c1 for displaying the input characters, character input keys 231c2, a transmission key 231c3 for transmitting the input characters to the broadcast receiving device 1, and a character input key 231c4 for input by voice recognition.
- the main control unit 201 repeats the process (S2613) until the transmission key 231c3 is selected.
- the main control unit 201 determines the key input state by the key input state determination based on the type of the key tapped by the user (S2622), and executes a branch process (S2623) based on the determination result.
- when the main control unit 201 determines that the key input is from a character input key 231c2, the main control unit 201 displays the character input by the key in the display frame 231c1 (S2631).
- the voice input unit 243 captures the voice input from the user (S2641, S2642), and the voice conversion execution unit 2033 converts the voice into a recognized character string by voice recognition based on the captured voice and the voice recognition dictionary 204a1 (S2643).
- in the normal dictionary conversion/display process, the speech conversion execution unit 2033 performs conversion processing on the recognized character string according to the normal character conversion dictionary (first dictionary) 204a2 and displays the converted result in the display frame 231c1 (S2644).
- in the normal character conversion dictionary, general words and phrases are stored in advance in the ROM 202 and/or the storage unit 204 at the time of product shipment. Alternatively, they may be acquired from a server device on the external network 4 via the LAN communication unit 221 or the mobile telephone network communication unit 222 after product shipment, or acquired from a memory card, an optical disk, or the like via the expansion I/F unit 205. The user may also register entries in the same manner as the dictionary registration process described above.
- the voice input from the user (S2651) is captured by the voice input unit 243 (S2652), converted into a recognized character string by the voice conversion execution unit 2033 through voice recognition based on the captured voice and the voice recognition dictionary 204a1 (S2653), and the special dictionary conversion/display processing (S2654) is performed.
- in the special dictionary conversion/display processing, the speech conversion execution unit 2033 performs conversion processing on the recognized character string according to the special character conversion dictionary (second dictionary) 204a3 registered by the user, and displays the converted result in the display frame 231c1 (S2654).
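The two conversion paths (normal dictionary S2644, special dictionary S2654) can be sketched as a simple table lookup. The dictionary entries below are hypothetical examples only; the real contents of 204a2 and 204a3 are registered by the user or preloaded at shipment.

```python
# Hypothetical dictionary entries; real contents are user-registered.
normal_dict_204a2 = {"hello": "Hello"}
special_dict_204a3 = {"open sesame": "Xy7#kQ92"}   # e.g. a password string

def dictionary_convert(recognized, use_special_dictionary):
    """S2644 / S2654 sketch: replace the recognized character string
    according to the selected dictionary; unregistered strings pass
    through unchanged."""
    table = special_dict_204a3 if use_special_dictionary else normal_dict_204a2
    return table.get(recognized, recognized)
```

This illustrates the key property noted later: speaking a simple registered phrase yields a complex character string, while the same phrase on a device without that special dictionary passes through unconverted.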
- when the main control unit 201 determines that the transmission key 231c3 has been input based on the key input state determination result, the main control unit 201 transmits the character string displayed in the display frame 231c1 to the broadcast receiving apparatus 1 in the interruption process (S2661), and ends the repetitive process S2613.
- the broadcast receiving apparatus 1 processes the character string transmitted from the information terminal apparatus 2 as a character input (S2614).
- the registration process for the normal character conversion dictionary and the special character conversion dictionary may be performed in the same manner as in the first embodiment.
- the dictionary registration process may be activated when the user touches the voice input key 231c4 for a predetermined time or more.
- the information terminal device 2 stores the normal character conversion dictionary and the special character conversion dictionary. For this reason, even if a user voice-inputs, on another information terminal device B, a character string registered in an information terminal device A, it is not converted by the normal character conversion dictionary or the special character conversion dictionary of the information terminal device A. Therefore, in an information terminal device such as a smartphone that performs user authentication when used, user authentication at the time of voice input may be omitted.
- the broadcast receiving device 1 and the information terminal device 2 are linked by the LAN communication units 117 and 221 and the router device 3, but may be linked by a communication method such as Bluetooth.
- since the password is not directly input by voice when inputting a password, the password is not revealed even if the utterance is overheard by another person.
- complex character strings can be input by inputting simple character strings by voice.
- character conversion is switched between the first operation input operation (for example, pressing the voice input key once within a predetermined time) and the second operation input operation (for example, pressing the voice input key twice within a predetermined time).
- character input by Japanese speech has been described.
- in the first operation input operation, the result of speech recognition may be used directly as the character input, and in the second operation input operation, the result of converting the speech-recognized character string using the special character conversion dictionary may be used as the character input. That is, a character string recognized by voice during a predetermined operation input operation (for example, pressing the voice input key twice within a predetermined time) may be converted based on a character conversion dictionary.
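As a rough sketch of this operation-dependent behavior, the following Python fragment switches between using the recognition result as-is and converting it with a special character conversion dictionary, depending on how many times the voice input key was pressed within the predetermined time. The dictionary entries, function names, and press-count convention are illustrative assumptions, not details taken from the embodiments.

```python
# Illustrative special character conversion dictionary: a simple spoken
# phrase maps to a complex registered string (entries are hypothetical).
SPECIAL_DICT = {"open sesame": "0p3n_5e5@me"}

def handle_voice_input(recognized: str, key_presses: int) -> str:
    """First operation (1 press): use the recognition result as-is.
    Second operation (2 presses): convert via the special dictionary."""
    if key_presses >= 2:
        # Fall back to the raw recognition result if nothing is registered.
        return SPECIAL_DICT.get(recognized, recognized)
    return recognized

print(handle_voice_input("open sesame", 1))  # -> open sesame
print(handle_voice_input("open sesame", 2))  # -> 0p3n_5e5@me
```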
- in the embodiments above, the voice input unit 111 of the broadcast receiving apparatus 1 is used for voice input. Alternatively, the remote control 120 may be provided with a voice input unit, and voice input may be performed by transmitting the captured voice to the broadcast receiving apparatus 1 by a communication method such as Bluetooth.
- voice may also be captured by the voice input unit 243 of the information terminal device 2, and the captured voice data transmitted to the broadcast receiving device 1 via the LAN communication unit 221, the router device 3, and the LAN communication unit 117.
- the broadcast receiving apparatus 1 and the information terminal apparatus 2 may be provided with a communication unit such as Bluetooth, and voice input may be performed by transmitting audio data from the information terminal apparatus 2 to the broadcast receiving apparatus 1.
- the captured speech data may be transmitted to a server device connected to the network, and speech recognition may be performed on the server device.
- the broadcast receiving device 1 or the information terminal device 2 may receive the character string information recognized by the server device and perform character string conversion using the normal character conversion dictionary or special character conversion dictionary.
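A minimal sketch of this division of labor, with recognition on the server and dictionary-based conversion on the device. The recognizer is stubbed out, and all names and the example registration are hypothetical:

```python
# Sketch: speech recognition happens on a networked server; the device
# then converts the returned text with its locally stored dictionary.

def server_recognize(audio: bytes) -> str:
    """Stand-in for the server-side speech recognizer; in the described
    system the captured audio would be sent over the network."""
    return "maru hi pass"  # pretend recognition result

def device_convert(recognized: str, dictionary: dict) -> str:
    # Unregistered strings pass through unchanged.
    return dictionary.get(recognized, recognized)

special_dict = {"maru hi pass": "#M@ru-Hi!"}  # illustrative registration
audio_from_mic = b"...captured samples..."
text = server_recognize(audio_from_mic)
print(device_convert(text, special_dict))  # -> #M@ru-Hi!
```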
- in the embodiments above, the first operation and the second operation for selecting the normal character conversion dictionary and the special character conversion dictionary, respectively, are determined in advance, and the dictionaries are switched based on whether the user has performed the first operation or the second operation. However, the dictionaries may also be switched not by a user operation but based on information described in the program, such as the type of the character input field.
- the voice conversion execution unit 1034 or the dictionary registration execution unit 1037 may determine the type of the character input field, for example whether it is a user password input field or a search keyword input field. When an input field to be filled with a special character string (for example, a user password input field) is selected as the input destination for character input, the special character conversion dictionary may be selected as the dictionary for converting the recognized character string; otherwise, the normal character conversion dictionary may be selected.
- in this way, the user can obtain conversion to a normal character string or to a special character string merely by selecting an input field (for example, by moving the cursor to it), without performing a separate dictionary selection operation.
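This field-driven dictionary selection could be sketched as follows; the field-type labels and dictionary entries are illustrative assumptions, not values from the embodiments:

```python
# Illustrative dictionaries (contents hypothetical).
NORMAL_DICT = {"tokyo": "Tokyo"}
SPECIAL_DICT = {"my pass": "mY_p@55"}

def dictionary_for_field(field_type: str) -> dict:
    # Password-like fields get the special dictionary; search fields and
    # other ordinary fields get the normal dictionary.
    return SPECIAL_DICT if field_type == "password" else NORMAL_DICT

def convert(recognized: str, field_type: str) -> str:
    return dictionary_for_field(field_type).get(recognized, recognized)

print(convert("my pass", "password"))       # special conversion
print(convert("tokyo", "search_keyword"))   # normal conversion
```

The point of the sketch is that the caller never names a dictionary: the selected input field alone decides which table is consulted.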
- the functions and the like of the present invention described above may be realized in hardware by designing some or all of them as, for example, an integrated circuit. They may also be realized in software by having a microprocessor unit or the like interpret and execute operation programs that implement the respective functions. Hardware and software may be used together.
- the control lines and information lines shown in the figures are those considered necessary for the explanation, and not all control lines and information lines in an actual product are necessarily shown. In practice, almost all components may be considered to be connected to one another.
- 1 broadcast receiving device
- 2 information terminal device
- 3 router device
- 4 external network
- 101 main control unit
- 102 ROM
- 103 RAM
- 104 storage unit
- 111 audio input unit
- 113 audio output unit
- 114 imaging unit
- 116 LAN communication unit
- 120 remote controller
- 1034 voice conversion execution unit
- 1035 user authentication execution unit
- 1036 linkage terminal management unit
- 201 main control unit
- 203 RAM
- 204 storage unit
- 243 voice input unit
- 2033 voice conversion execution unit
- 2034 cooperation device management execution unit
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Acoustics & Sound (AREA)
- Human Computer Interaction (AREA)
- Multimedia (AREA)
- Artificial Intelligence (AREA)
- Theoretical Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
The purpose of the present invention is to provide a voice conversion technique with improved usability. To achieve this purpose, the present invention: (S412) refers to special character conversion dictionary information that associates each of a plurality of first characters with a special character including at least one character or symbol that is pronounced differently from the first character; and (S423) converts voice information spoken by a user into one or more recognized characters that are pronounced the same as the voice information, and converts the one or more recognized characters into a special character by reference to the special character conversion dictionary.
Description
The present invention relates to a voice conversion device, a voice conversion method, and a voice conversion program, and more particularly to a technique for converting voice uttered by a user.
In recent years, broadcast receiving apparatuses have become widespread that not only receive broadcast waves for viewing programs but also support services that use the Internet to search for and display information held on server devices, and to view audio and video content distributed from server devices. When using these services, it becomes necessary to input character strings such as search keywords, or characters for logging in to a content distribution service. In conventional broadcast receiving apparatuses, the user performs such character input by pressing the operation buttons of a remote controller with a finger, but this operation is difficult. For this reason, methods of inputting character strings such as search keywords by voice recognition have been proposed.
For example, Patent Document 1 describes a method in which "keywords are extracted from program information, a speech recognition dictionary in which the keywords are registered together with speech recognition information is generated, keywords are displayed on the screen in response to a user request based on a partial dictionary of the speech recognition dictionary, and the user inputs a keyword by voice by selecting from the keywords displayed on the screen" (see the abstract).
Character strings such as search keywords usually consist of words used in everyday life and are easy to recognize by speech, so voice input is convenient for entering them. In contrast, the account name and password entered when logging in to a server device may be a combination of irregular alphanumeric characters and special characters or symbols such as "@". When such a string is input by voice recognition, the user must speak the alphanumeric characters constituting the string one by one, which results in poor usability.
An object of the present invention is to provide a more convenient voice conversion technology.
In order to solve the above problem, the present invention converts voice information uttered by a user into recognized characters consisting of phonetic characters representing the reading of the voice information, refers to special character conversion dictionary information that stores each phonetic character in association with a special character including at least one character or symbol having a reading different from that of the phonetic character, and converts the recognized characters into the special character.
By using the technology of the present invention, a more convenient voice conversion technology can be provided. Problems, configurations, and effects other than those described above will be clarified by the following description of the embodiments.
Hereinafter, embodiments of the present invention will be described with reference to the drawings. The following description shows specific examples of the contents of the present invention; the present invention is not limited to these descriptions, and various changes and modifications by those skilled in the art are possible within the scope of the technical idea disclosed in this specification. In all drawings for explaining the present invention, components having the same function are denoted by the same reference numerals, and repeated description thereof may be omitted.
<First embodiment>
In the following embodiments, an embodiment in which the voice conversion technology according to the present invention is applied to a broadcast receiving apparatus is described as an example. However, the present invention is not limited to broadcast receiving apparatuses, and can be applied to any technology to which voice input is applicable, such as user login in an information processing apparatus like a PC (Personal Computer), smartphone, or tablet terminal, or an entry/exit management system for a monitored area.
[Hardware configuration of broadcast receiver]
FIG. 1 is a block diagram showing a configuration example of a broadcast receiving apparatus 1 according to an embodiment of the present invention. As shown in FIG. 1, the broadcast receiving apparatus 1 is installed, for example, in a home and is electrically connected to a router device 3 via a wireless or wired LAN (Local Area Network), not shown. Information terminal devices 2 such as smartphones and tablet terminals are connected to the router device 3, for example via a wireless LAN, and these information terminal devices 2 are connected to the broadcast receiving apparatus 1 via the router device 3 and the LAN. The router device 3 is also connected to an external public network 4 such as the Internet. The broadcast receiving apparatus 1 is further connected wirelessly to a remote controller 120 by infrared communication. The broadcast receiving apparatus 1 also includes an antenna 107 for receiving broadcast waves from a broadcast station 5, and receives public broadcast waves.
The broadcast receiving apparatus 1 includes a main control unit 101, a ROM (Read Only Memory) 102, a RAM (Random Access Memory) 103, a storage unit 104, an external interface unit (hereinafter, interface is abbreviated as "I/F") 105, an operation unit 106, an antenna 107 for receiving broadcast waves from the broadcast station 5, a tuner/demodulation unit 108 connected to the antenna 107, a separation unit 109, a decoder unit 110, a voice input unit 111, an audio processing unit 112, an audio output unit 113, an imaging unit 114, an image processing unit 115, a display unit 116, and a LAN communication unit 117; these components are electrically connected to one another by a system bus 100. The system bus 100 is a data communication path for transmitting and receiving data between the main control unit 101 and each unit in the broadcast receiving apparatus 1.
The main control unit 101 is composed of a CPU (Central Processing Unit) or the like that performs arithmetic and control processing. The main control unit 101 loads various operation programs and data stored in the ROM 102, the RAM 103, and/or the storage unit 104 into the RAM 103 and executes them. The operation programs (software) thereby cooperate with the CPU (hardware) to realize the various functions of the broadcast receiving apparatus 1, and the main control unit 101 controls the broadcast receiving apparatus 1 as a whole.
The ROM 102 is a memory that stores a basic operation program such as an operating system and other operation programs; a rewritable ROM such as an EEPROM (Electrically Erasable Programmable ROM) or a flash ROM is used, for example.
The RAM 103 serves as a work area when the basic operation program and other operation programs (applications) are executed. The ROM 102 and the RAM 103 may be configured integrally with the main control unit 101. Further, instead of the independent configuration shown in FIG. 1, the ROM 102 may use part of the storage area of the storage unit 104.
The storage unit 104 is configured using a device that can retain stored data even when the power is turned off, for example an HDD (Hard Disc Drive), and stores the operation programs, operation setting values, and the like of the broadcast receiving apparatus 1. It can also connect to a server device (not shown) via the LAN communication unit 117, the router device 3, and the external network 4, and store new operation programs (applications) downloaded from the server device as well as various data created by those operation programs. It can also store content such as moving images, still images, and audio acquired from broadcast waves or downloaded from server devices on the network.
The external I/F unit 105 is a group of interfaces for extending the functions of the broadcast receiving apparatus 1; in the present embodiment, it has a video input I/F unit 105a, an audio input I/F unit 105b, and a USB (Universal Serial Bus) I/F unit 105c. The video input I/F unit 105a and the audio input I/F unit 105b receive video and audio signals from external video/audio output devices. The USB I/F unit 105c connects USB devices such as keyboards, memory cards, and the like. When the broadcast receiving apparatus 1 records a digital broadcast program on an externally connected HDD device or the like, the HDD device may be connected to the USB I/F unit 105c. The video input I/F unit 105a and the audio input I/F unit 105b may also input video and audio together using HDMI (High-Definition Multimedia Interface: registered trademark).
The operation unit 106 is configured using input devices with which the user inputs operation instructions to the broadcast receiving apparatus 1. In the present embodiment, the operation unit 106 has operation keys 106a in which button switches are arranged, and a remote control receiving unit 106b that receives infrared signals from the remote controller 120. The broadcast receiving apparatus 1 may also be operated using a keyboard or the like connected to the USB I/F unit 105c. Further, when the broadcast receiving apparatus 1 is operated using an information terminal device 2 or another PC connected to the home local network via the LAN communication unit 117 and the router device 3, these information terminal devices 2 and the remote controller 120 function as the operation unit 106.
The tuner/demodulation unit 108 extracts the signal of the channel selected by the user from the broadcast waves received by the antenna 107 and demodulates a TS (Transport Stream) signal.
The separation unit 109 separates the TS signal into packetized video data, audio data, and accompanying information data, and outputs them to the decoder unit 110.
The decoder unit 110 includes an audio decoder 110a, a video decoder 110b, and an information decoder 110c. The audio data packetized by the separation unit 109 is output to the audio decoder 110a, the video data to the video decoder 110b, and the accompanying information data to the information decoder 110c.
The audio decoder 110a decodes the audio data output from the separation unit 109 and outputs it to the audio processing unit 112 as an audio signal. The video decoder 110b decodes the video data output from the separation unit 109 and outputs it to the image processing unit 115 as a video signal. The information decoder 110c processes the accompanying information data output from the separation unit 109 and acquires, in particular, SI (Service Information) including program information such as the program name, genre, and broadcast start/end date and time of each program.
The voice input unit 111 is a microphone; it captures external sound and outputs it to the audio processing unit 112.
The audio processing unit 112 performs A/D (Analog/Digital) conversion on the sound captured by the voice input unit 111, and D/A (Digital/Analog) conversion on the audio signal output to the audio output unit 113.
The audio output unit 113 is a speaker, and outputs the audio signal processed by the audio processing unit 112.
The imaging unit 114 is a camera that captures video around the broadcast receiving apparatus 1 by converting light entering through a lens into an electrical signal using an electronic device such as a CCD (Charge Coupled Device) or CMOS (Complementary Metal Oxide Semiconductor) sensor, and outputs the video to the image processing unit 115.
The image processing unit 115 performs format conversion and superimposition of menus and other OSD (On Screen Display) signals on the input video signal as necessary.
The display unit 116 is a display device such as a liquid crystal panel, and displays the video signal processed by the image processing unit 115.
The LAN communication unit 117 is connected to the router device 3 by wire or wirelessly, and transmits and receives information to and from devices connected to the home local network or server devices connected to the external network 4 such as the Internet. It is also connected to the information terminal device 2 via the router device 3, allowing the user to operate the broadcast receiving apparatus 1 using the information terminal device 2. In addition to the LAN communication unit 117, the broadcast receiving apparatus 1 may further include other communication units, such as a Bluetooth (registered trademark) communication unit or an NFC (Near Field Communication) communication unit.
In the above description, the voice input unit 111 and the imaging unit 114 are built into the broadcast receiving apparatus 1, but a camera or microphone provided externally via the USB I/F unit 105c may be used instead. Communication from the remote controller 120 to the remote control receiving unit 106b is performed by infrared, but this is not restrictive, and other communication methods such as Bluetooth may be used.
The broadcast receiving apparatus 1 may also be a broadcast recording/playback apparatus such as a BD (Blu-ray Disc: registered trademark) recorder or an HDD recorder, an STB (Set Top Box), or the like. When the broadcast receiving apparatus 1 is a DVD recorder, HDD recorder, STB, or the like, a video signal output unit and an audio signal output unit may be provided in place of the display unit 116 and the audio output unit 113. By connecting an external monitor and external speaker to the video signal output unit and the audio signal output unit, operation similar to that of the broadcast receiving apparatus 1 of the present embodiment becomes possible.
[Broadcast receiving device software configuration]
FIG. 2 shows the internal configuration of the broadcast receiving apparatus 1 according to the present embodiment: (a) is a software configuration diagram of the broadcast receiving apparatus 1, and (b) shows the dictionaries stored in the voice conversion information storage area. The software configuration diagram of FIG. 2(a) shows the software configuration in the ROM 102, the RAM 103, and the storage unit 104.
The basic operation program 1021 stored in the ROM 102 is loaded into the RAM 103, and the main control unit 101 executes the loaded basic operation program, thereby constituting the basic operation execution unit 1031.
Similarly, the application program 1041, content processing program 1042, voice conversion program 1043, user authentication program 1044, cooperative terminal management program 1045, and dictionary registration program 1046 stored in the storage unit 104 are each loaded into the RAM 103, and the main control unit 101 executes the loaded operation programs, thereby constituting the application execution unit 1032, the content processing execution unit 1033, the voice conversion execution unit 1034, the user authentication execution unit 1035, the cooperative terminal management execution unit 1036, and the dictionary registration execution unit 1037. The RAM 103 also includes a temporary storage area 1038 that temporarily holds data created during execution of each operation program, as necessary.
The storage unit 104 includes: a content information storage area 104a that stores video content downloaded from server devices on the network as recorded content and manages information related to the recorded content; a voice conversion information storage area 104b that stores dictionaries and the like for converting a speech-recognized character string into a predetermined character string; a user authentication information storage area 104c that stores user authentication information consisting of voice or image data for performing user authentication; a cooperative terminal information storage area 104d that stores identification information and the like of information terminal devices 2 capable of cooperative operation with the broadcast receiving apparatus 1; and a various-information storage area 104e that stores other various information.
As shown in FIG. 2(b), the voice conversion information storage area 104b stores: a speech recognition dictionary 104b1 for recognizing input speech, identifying its components (for example, phonemes and syllables), and converting them into a character string composed of phonetic characters (for example, hiragana and katakana in Japanese, or the alphabet in English; hereinafter a "recognized character string"); a normal character conversion dictionary (first dictionary) 104b2 for converting the recognized character string into characters and symbols of different types and shapes that have the same sound (reading), including ideographic characters (for example, kanji in Japanese); and a special character conversion dictionary (second dictionary) 104b3 for converting the recognized character string into phonetic or ideographic characters with a different sound.
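A minimal sketch of the two conversion stages that follow recognition. The dictionary entries below are illustrative only; the actual dictionaries 104b2 and 104b3 would hold system- or user-registered pairs:

```python
# Sketch: after the speech recognition dictionary (104b1) has produced a
# recognized (phonetic) character string, one of two tables converts it.

normal_dict = {"かんじ": "漢字"}             # 104b2: same reading, ideographs
special_dict = {"ひらけごま": "0pen_G0ma!"}  # 104b3: different reading / symbols

def convert_recognized(recognized: str, use_special: bool) -> str:
    """Convert a recognized string with the normal or special dictionary,
    falling back to the recognized string itself when unregistered."""
    table = special_dict if use_special else normal_dict
    return table.get(recognized, recognized)

print(convert_recognized("かんじ", False))     # -> 漢字
print(convert_recognized("ひらけごま", True))  # -> 0pen_G0ma!
```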
The above phonetic characters include characters with an almost completely one-to-one correspondence between phoneme and character type, such as hiragana and katakana in Japanese, as well as cases where multiple characters correspond to the same sound, as with the alphabet in English (the "J" in "Japan" and the "G" in "Germany"), and cases where the same character is used for different sounds (for example, the "a" in "apple" and the "a" in "ace").
The above ideographic characters also include numbers and symbols. For example, when speech corresponding to the English word "and" is input, the recognized character string is converted to "and", and normal character conversion may provide both the three-letter string "and" and the symbol "&" as normal conversion candidates for the user to select. Alternatively, the base character string "and" may be registered in association with the symbol "&" in the special character conversion dictionary rather than the normal character conversion dictionary.
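The "and" example above amounts to one recognized string mapping to several candidate conversions, from which the user picks one. A hypothetical sketch:

```python
# Sketch: a recognized string can map to multiple conversion candidates,
# as in "and" -> {"and", "&"}. The registration below is illustrative.

normal_candidates = {"and": ["and", "&"]}

def candidates_for(recognized: str) -> list:
    # If nothing is registered, the recognition result itself is the
    # only candidate.
    return normal_candidates.get(recognized, [recognized])

print(candidates_for("and"))  # -> ['and', '&']  (user selects one)
print(candidates_for("cat"))  # -> ['cat']
```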
In the following, to simplify the explanation, the process in which the main control unit 101 controls each operation block by loading the basic operation program 1021 stored in the ROM 102 into the RAM 103 and executing it is described as the basic operation execution unit 1031 controlling each operation block. The same notation is used for the other operation programs.
The application execution unit 1032 executes various operation programs downloaded from a server device or the like. Each application is activated when the user, via the operation unit 106, selects the application activation icon displayed on the display unit 116.
The content processing execution unit 1033 accumulates content data from the server device in the content information storage area 104a in advance via the external network 4, then reproduces the accumulated content and displays it on the display unit 116 (download playback). Alternatively, the content processing execution unit 1033 can receive content data and content information distributed from the server device via the external network 4 and reproduce the successively received video, audio, and the like, displaying them on the display unit 116 (streaming playback).
The voice conversion execution unit 1034 converts the user's voice captured by the voice input unit 111 into a recognized character string based on the speech recognition dictionary 104b1, and uses it for operation input such as channel selection of the broadcast receiving apparatus 1, or performs character input by converting the recognized character string into a predetermined character string according to the dictionaries 104b2 and 104b3.
The user authentication execution unit 1035 authenticates the user based on the user authentication information stored in the user authentication information storage area 104c and either the user's voice captured by the voice input unit 111 or the user's face image captured by the imaging unit 114.
The linked terminal management execution unit 1036 registers and manages information terminal devices 2 connected to the home local network or to an external network 4 such as the Internet, and the broadcast receiving apparatus 1 executes various operations according to operation inputs from a registered information terminal device 2.
Each of the operation programs may be stored in the ROM 102 and/or the storage unit 104 in advance at the time of product shipment. They may also be acquired after product shipment from a server device on the Internet 4 via the LAN communication unit 117, or the operation programs stored on a memory card, optical disc, or the like may be acquired via the USB interface unit 105c or similar.
[Remote control key layout]
FIG. 3 is a diagram illustrating an example of the key arrangement of the remote controller 120.
The remote control shown in FIG. 3 includes a power key 120a1, broadcast wave selection keys (terrestrial digital, BS, CS) 120a2, channel/character input keys (1-12) 120a3, volume UP/DOWN keys 120a4, channel UP/DOWN keys 120a5, an input switching key 120a6, a program guide key 120a7, a Data key 120a8, a voice input key 120a9, a menu key 120a10, a return key 120a11, cursor keys (up, down, left, right) 120a12, an enter key 120a13, and color keys (blue, red, green, yellow) 120a14. Other operation keys may also be provided.
The power key 120a1, the broadcast wave selection keys 120a2, and so on have the same functions as the corresponding operation keys of a known TV remote controller, so a detailed description is omitted. The voice input key 120a9 is an operation key provided for the voice input function of this embodiment.
[Voice character input process]
FIG. 4 is a sequence diagram showing an example of character input processing by voice recognition according to the present embodiment. The flow of character input processing by voice recognition is described below, following the order of the steps in FIG. 4. First, the broadcast receiving apparatus 1 displays a character input screen (S400). The character string input screen is displayed, for example, when entering the account name or password for logging in to a server in order to stream or download content distributed from the server device via the external network 4.
Next, when the user presses the voice input key 120a9 of the remote controller 120 (S401), key input information is transmitted from the remote controller 120 to the broadcast receiving apparatus 1 (S402).
The main control unit 101 of the broadcast receiving apparatus 1 determines from the key input information how the voice input key 120a9 was pressed (the operation input state) (S403).
Next, the main control unit 101 executes a branch process (S404) based on the operation input state determination result. When the main control unit 101 determines in the operation input state determination process that the voice input key 120a9 was pressed once within a predetermined time, the voice conversion execution unit 1034 selects the normal character conversion dictionary (first dictionary) as the dictionary for converting a speech-recognized character string (recognized character string) into another character string (normal character string) (S411).
When it is determined from the operation input state determination result that the voice input key 120a9 was pressed twice within the predetermined time, the voice conversion execution unit 1034 selects the special character conversion dictionary (second dictionary) as the dictionary for converting the recognized character string into another character string (special character string) (S412).
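The branch in S404/S411/S412 can be sketched as follows. This is a hedged illustration under assumed timing semantics (the patent does not specify the time window); the function and variable names are assumptions, not part of the disclosure.

```python
# Hypothetical sketch of dictionary selection by key-press count:
# one press of the voice input key within the window selects the normal
# dictionary, two presses select the special dictionary.

NORMAL = "normal character conversion dictionary (first dictionary)"
SPECIAL = "special character conversion dictionary (second dictionary)"

def select_dictionary(press_times, window: float = 0.5) -> str:
    """Count presses falling within `window` seconds of the first press
    and map one press to the normal dictionary, two or more to the
    special dictionary (S411 / S412)."""
    if not press_times:
        raise ValueError("no key press recorded")
    first = press_times[0]
    presses = sum(1 for t in press_times if t - first <= window)
    return SPECIAL if presses >= 2 else NORMAL
```

A single press (or a second press arriving after the window) selects the first dictionary; a double press within the window selects the second.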
Next, the user utters speech (S421), the speech is captured by the voice input unit 111 (S422), and the voice conversion execution unit 1034 performs speech recognition processing based on the captured speech and the speech recognition dictionary 104b1 stored in the voice conversion information storage area 104b, converting the speech into a recognized character string (S423).
Next, the voice conversion execution unit 1034 converts the recognized character string into a normal character string or a special character string by the dictionary conversion input process described later, and uses the converted result as the character input (S424).
The normal character conversion dictionary stores general words and phrases and is stored in the ROM 102 and/or the storage unit 104 in advance at the time of product shipment. Alternatively, it may be acquired after product shipment from a server device on the Internet 4 via the LAN communication unit 117, or a normal character conversion dictionary stored on a memory card, optical disc, or the like may be acquired via the USB interface unit 105c or similar. The user may also register entries through the dictionary registration process described later.
Next, an example of the dictionary registration process is described with reference to the sequence diagram of FIG. 5 and the screen display examples of FIG. 6. FIG. 5 is a sequence diagram showing the flow of the dictionary registration process. FIG. 6 shows screen display examples in the dictionary registration process according to the embodiment: (a) shows the special dictionary registration list display screen, (b) shows the new dictionary registration screen, and (c) shows the dictionary registration change screen.
In the dictionary registration process shown in FIG. 5, the dictionary registration execution unit 1037 is activated, for example, when the user presses the voice input key 120a9 of the remote controller 120 for a predetermined time or longer. The dictionary registration execution unit 1037 then executes the dictionary registration process following the order of the steps in FIG. 5. In the dictionary registration process, a loop (S500) is repeated until the user selects the end of dictionary registration. In the loop S500, first, a list of the character strings registered in the dictionary (see FIG. 6(a)) is displayed on the display unit 116 (S501).
As shown in FIG. 6(a), the special dictionary registration list display screen displays a registration number 6a1, a speech-recognized character string (recognized character string) 6a2, the special character string corresponding to the recognized character string (corresponding to the "converted character string" in the figure) 6a3, a selection frame 6a4, and the functions 6a5 assigned to the color keys 120a14 of the remote controller 120. In this example, the functions assigned to the color keys 120a14 are: "red" for new registration, "blue" for changing a registration, "yellow" for deleting a registration, and "green" for switching between the normal character conversion dictionary (first dictionary) and the special character conversion dictionary (second dictionary).
Next, the user performs a process selection input using the remote controller 120 (S502). For example, by moving the selection frame 6a4 with the up and down cursor keys 120a12 of the remote controller 120 and pressing the "blue" or "yellow" color key 120a14, the user can select the change or deletion process for the dictionary entry shown in the selection frame.
When the user selects [New Registration] (presses the "red" color key 120a14), the dictionary registration execution unit 1037 displays the new dictionary registration screen (see FIG. 6(b)) on the display unit 116 (S511) and executes the character conversion dictionary registration process S512 described later.
FIG. 6(b) shows a display example of the new dictionary registration screen. The new dictionary registration screen displays a new registration number 6b1, a character string input frame 6b2 for the string before conversion by speech recognition input (the input frame in which the recognized character string is displayed), a character string input frame 6b3 for the string after conversion, and the functions 6b4 assigned to the color keys 120a14 of the remote controller 120. When the user selects the pre-conversion character string input frame 6b2 or the post-conversion character string input frame 6b3 with the up and down cursor keys 120a12 of the remote controller 120, the selected input frame is displayed with a thick border. Pressing the enter key 120a13 of the remote controller 120 with an input frame selected enters the pre-conversion character string or the post-conversion character string.
A soft keyboard 6b5 may also be displayed when a character string is entered into the post-conversion character string input frame 6b3. The user may enter the converted character string by selecting and confirming characters on the soft keyboard 6b5 with the cursor operation of the remote controller 120, or by operating the channel/character input keys (1-12) 120a3 of the remote controller 120 to enter a character string into the post-conversion character string input frame 6b3. Furthermore, with the post-conversion character string input frame 6b3 selected, the user may, for example, press the voice input key 120a9 of the remote controller 120 and enter the converted character string by voice input.
If the recognized character string displayed in the character string input frame 6b2 on the new dictionary registration screen differs from the character string the user intended to have recognized, the user can select the character string input frame 6b2 again and press the enter key 120a13 of the remote controller 120 to speak and input the voice again, redoing the speech recognition process. Alternatively, a function for selecting voice input again may be separately assigned to one of the color keys 120a14 (for example, the "blue" key) so that the user can speak and input the voice again and redo the speech recognition process.
When the user selects [Change Registration] (presses the "blue" color key 120a14), the dictionary registration execution unit 1037 displays the dictionary registration change screen (FIG. 6(c)) on the display unit 116 (S513) and executes the character conversion dictionary registration process S514 described later.
FIG. 6(c) shows a display example of the dictionary registration change screen. The dictionary registration change screen displays the registration number 6c1 of the character string to be changed, a character string input frame 6c2 for the string before conversion by speech recognition input (the input frame in which the recognized character string is displayed), a character string input frame 6c3 for the string after conversion, and the functions 6c4 assigned to the color keys 120a14 of the remote controller 120. On the dictionary registration change screen, the recognized character string currently registered in the dictionary and the corresponding special character string are displayed first, and the user can change and register the recognized character string, the special character string, or both. As on the new dictionary registration screen, a soft keyboard may also be displayed when a character string is entered into the post-conversion character string input frame 6c3.
When the user selects [Delete Registration] (presses the "yellow" color key 120a14), the dictionary registration execution unit 1037 deletes the registration of the character string in the selection frame (S515).
When the user selects [Switch Dictionary] (presses the "green" color key 120a14), the dictionary registration execution unit 1037 switches between performing registration processing for the normal character conversion dictionary and for the special character conversion dictionary (S516). This switches the dictionary that is displayed when the loop returns to the dictionary-registered character string list display process S501. When the user selects registration processing for the normal character conversion dictionary, the dictionary registration execution unit 1037 displays a normal dictionary registration list display screen, which has the same configuration as the special dictionary registration list display screen shown in FIG. 6(a) and shows the recognized character strings and normal character strings registered in the normal dictionary; as above, the user can perform new registration, change, and deletion processing.
When [Return] is selected (the return key 120a11 of the remote controller 120 is pressed), the dictionary registration process is terminated by an interruption process (S517), and the loop S500 ends.
Next, the character conversion dictionary registration processes (S512 and S514) are described with reference to the sequence diagram of FIG. 7 and the screen display examples of FIG. 6. FIG. 7 is a sequence diagram showing the flow of the character conversion dictionary registration process within the dictionary registration process.
In FIG. 7, a loop (S700) is repeated until the user selects the end of dictionary registration. In the loop S700, first, the user performs a process selection input using the remote controller 120 (S701). For example, in the dictionary registration process S512 for new dictionary registration, the user selects the pre-conversion character string input frame 6b2 or the post-conversion character string input frame 6b3 shown in FIG. 6(b) with the up and down cursor keys 120a12 of the remote controller 120, and selects a process by pressing the "red" color key 120a14. Likewise, in the dictionary registration process S514 for changing a registration, the user selects the pre-conversion character string input frame 6c2 or the post-conversion character string input frame 6c3 shown in FIG. 6(c) with the up and down cursor keys 120a12 of the remote controller 120, and selects a process by pressing the "red" color key 120a14.
Next, the dictionary registration execution unit 1037 executes a branch process S702 according to the selected process.
When the user selects [Voice-recognized character string input] (selects the character string input frame 6b2 or 6c2 and presses the enter key 120a13 of the remote controller 120), the user's voice input (S703) is captured by the voice input unit 111 (S704), the voice conversion execution unit 1034 converts the captured speech into a recognized character string by speech recognition based on the speech recognition dictionary 104b1, and the recognized character string is displayed in the character string input frame 6b2 or 6c2 (S705).
When the user selects [Post-conversion character string input] (selects the character string input frame 6b3 or 6c3 and presses the enter key 120a13 of the remote controller 120), the user enters the converted character string (normal character string or special character string) using the channel/character input keys 120a3 of the remote controller 120 or the like (S706), and the dictionary registration execution unit 1037 displays the entered character string in the character string input frame 6b3 or 6c3 (S707). As a method for entering the converted character string, a soft keyboard as shown in FIG. 6(b) may be displayed for character string input.
When the user selects [Register] (presses the "red" color key 120a14), the dictionary registration execution unit 1037 associates the character string in the recognized character string input frame (6b2 or 6c2) with the character string displayed in the post-conversion character string input frame 6b3 or 6c3, and stores the pair as dictionary information of the normal character conversion dictionary (first dictionary) or the special character conversion dictionary (second dictionary) selected by the dictionary switching process S516, in the normal character conversion dictionary 104b2 or the special character conversion dictionary 104b3 of the storage unit 104 (S708).
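The registration step S708 amounts to storing an association between the recognized string and its converted form in whichever dictionary is currently selected. A minimal sketch, with all names as illustrative assumptions:

```python
# Hypothetical stand-ins for the stored dictionaries 104b2 (normal) and
# 104b3 (special); S708 writes the recognized->converted association into
# the currently selected one.

dictionaries = {"normal": {}, "special": {}}

def register_entry(selected: str, recognized: str, converted: str) -> None:
    """Associate a recognized string with its converted form (S708)."""
    dictionaries[selected][recognized] = converted
```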
When the user selects [Return] (presses the return key 120a11 of the remote controller 120), the dictionary registration execution unit 1037 ends the character string conversion dictionary registration process by a dictionary interruption process (S709), and the loop S700 ends.
The dictionary registration information is described with reference to FIG. 8. FIG. 8 shows the configuration of the dictionary information: (a) shows the normal character conversion dictionary and (b) shows the special character conversion dictionary.
As shown in FIG. 8(a), the normal character conversion dictionary stores, as dictionary information, a registration number 8a1, the character string before conversion by speech recognition input (recognized character string) 8a2, and the character string after conversion (converted character string) 8a3. Similarly, as shown in FIG. 8(b), the special character conversion dictionary stores, as dictionary information, a registration number 8b1, the character string before conversion by speech recognition input (recognized character string) 8b2, and the character string after conversion (converted character string) 8b3. As a result, for example, the normal character conversion dictionary converts the pre-conversion character string "ひらけごま" ("open sesame") from speech recognition input into the character string "開けゴマ", whereas the special character conversion dictionary converts the same pre-conversion character string "ひらけごま" into the character string "6922#7MgkRH".
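The record layout in FIG. 8 can be sketched as a small data structure. The entries below mirror the "ひらけごま" example from the text; the class and function names are illustrative assumptions, not part of the disclosure.

```python
# Minimal sketch of the dictionary records in FIG. 8: each entry holds a
# registration number, the recognized (pre-conversion) string, and the
# converted string.

from dataclasses import dataclass
from typing import List, Optional

@dataclass
class DictEntry:
    number: int      # registration number (8a1 / 8b1)
    recognized: str  # pre-conversion recognized string (8a2 / 8b2)
    converted: str   # post-conversion string (8a3 / 8b3)

normal_dict = [DictEntry(1, "ひらけごま", "開けゴマ")]
special_dict = [DictEntry(1, "ひらけごま", "6922#7MgkRH")]

def convert(entries: List[DictEntry], recognized: str) -> Optional[str]:
    """Look up a recognized string; return its converted form, or None."""
    for e in entries:
        if e.recognized == recognized:
            return e.converted
    return None
```

The same recognized string thus yields an ordinary phrase from the first dictionary and a password-like string from the second.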
The dictionary conversion input process shown in step S424 of FIG. 4 is described with reference to FIG. 9. FIG. 9 is a flowchart showing the flow of the dictionary conversion input process.
In the dictionary conversion input process, the voice conversion execution unit 1034 first checks whether the recognized character string obtained by speech recognition in S423 is registered in the selected character conversion dictionary (S901). For example, when the normal character conversion dictionary was selected in S411, the voice conversion execution unit 1034 checks whether the speech-recognized character string is registered in the normal character conversion dictionary, and when the special character conversion dictionary was selected in S412, it checks whether the speech-recognized character string is registered in the special character conversion dictionary.
Next, the voice conversion execution unit 1034 performs a branch process (S902) based on the result of the check. If the recognized character string is not registered in the character conversion dictionary (S902/No), the voice conversion execution unit 1034 displays an error such as "not registered in the dictionary" (S903) and ends the process. If the speech-recognized character string is registered in the character conversion dictionary (S902/Yes), the voice conversion execution unit 1034 converts the recognized character string into the normal character string or special character string according to the information in the character conversion dictionary (S904), uses the converted character string as the input characters, and ends the process.
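The S901-S904 flow can be sketched as a single lookup with an error branch. This is a hedged illustration; the function name and the use of an exception for the error display are assumptions.

```python
# Sketch of the dictionary conversion input process in FIG. 9:
# S901 checks registration, S902 branches, S903 reports an error,
# S904 returns the converted string as the input characters.

def dictionary_conversion_input(recognized: str, selected_dict: dict) -> str:
    # S901: is the recognized string registered in the selected dictionary?
    if recognized not in selected_dict:  # S902/No
        # S903: error display (modeled here as an exception)
        raise LookupError("not registered in the dictionary")
    # S902/Yes -> S904: convert according to the dictionary information
    return selected_dict[recognized]
```

The later variant described below, where an unregistered string is passed through unchanged instead of raising an error, would replace the `raise` with `return recognized`.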
In the embodiment described above, for character input such as the account name or password used when logging in to a server device, registering the string in a character conversion dictionary in advance allows a complex character string to be entered easily by speaking a simple character string for voice recognition, improving usability for the user.
In the dictionary-registered character string list display process S501, user authentication by password or the like may be performed when displaying the special character conversion dictionary used for entering character strings such as passwords. Alternatively, in the list display of the special character conversion dictionary, the converted character strings may be masked, for example as "●●●" (a display for hiding the input string), so that they cannot be read, and a screen such as FIG. 6(c) may be displayed only after user authentication by password or the like when a converted character string is to be displayed or changed. This prevents anyone other than the person who registered it in the special character conversion dictionary from learning the registered password string.
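The mask display described above could be sketched as a gate on the list rendering. A minimal sketch, assuming a fixed "●●●" mask as in the text; the function name is hypothetical.

```python
# Hypothetical mask display for the special dictionary list: the converted
# string is shown only after the user passes authentication; otherwise a
# fixed mask is displayed in its place.

def display_converted(converted: str, authenticated: bool) -> str:
    """Return the converted string for an authenticated user, else a mask."""
    return converted if authenticated else "●●●"
```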
In the dictionary registration process, the normal character conversion dictionary and the special character conversion dictionary may also be displayed together in a single list without switching the dictionary display. When they are displayed together, the display indicates in which dictionary each entry is registered; for example, the registration numbers of character strings registered in the normal character conversion dictionary are prefixed with "N", and those registered in the special character conversion dictionary are prefixed with "S".
また、辞書変換入力処理において認識文字列が文字変換辞書に登録されていない場合にエラー表示を行わず、認識文字列をそのまま入力文字とするようにしてもよい。
Further, when the recognized string is not registered in the character conversion dictionary in the dictionary conversion input process, the recognized string may be used as the input text as-is instead of displaying an error.
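This pass-through behavior can be sketched as below — a minimal illustration; the dictionary contents and the `convert` helper are hypothetical, not taken from the patent:

```python
# Hypothetical conversion dictionary: recognized string -> converted string
conversion_dict = {"ひらけごま": "OpenSesame123"}

def convert(recognized, dictionary, fallback=True):
    # Return the registered converted string; if the recognized string has
    # no entry, either pass it through unchanged (fallback) or signal an error.
    if recognized in dictionary:
        return dictionary[recognized]
    if fallback:
        return recognized  # use the recognized string as the input text as-is
    raise KeyError("not registered in the conversion dictionary")
```

With `fallback=False` the function raises instead, corresponding to the error-display variant.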
<第2実施形態>
第2実施形態は、音声入力を行ったユーザの認証を行い、辞書登録したユーザ以外の人が登録した文字列を音声入力したときには文字列変換を行わないようにするものである。以下、図10及び図11を参照して第2実施形態に係る辞書登録処理について説明する。図10は、ユーザ認証処理を含む辞書登録処理の流れを示すシーケンス図である。図11は、第2実施形態に係る辞書登録処理における画面表示例を示す図であり、(a)は特殊辞書登録リスト表示画面を示し、(b)は新規辞書登録画面を示し、(c)は辞書登録変更画面を示す。
The second embodiment authenticates the user who performed the voice input, and does not perform string conversion when a registered string is voice-input by someone other than the user who registered it in the dictionary. The dictionary registration process according to the second embodiment is described below with reference to FIGS. 10 and 11. FIG. 10 is a sequence diagram showing the flow of the dictionary registration process including user authentication. FIG. 11 shows example screen displays in the dictionary registration process of the second embodiment: (a) shows the special dictionary registration list screen, (b) the new dictionary registration screen, and (c) the dictionary registration change screen.
図10において、辞書登録処理ではユーザが辞書登録終了を選択するまで繰り返し処理(S1000)が実行される。繰り返し処理S1000では、まず辞書登録実行部1037が辞書登録された文字列のリストを表示部116に表示する(S1001)。
In FIG. 10, the dictionary registration process repeats a loop (S1000) until the user chooses to end dictionary registration. In the loop S1000, the dictionary registration execution unit 1037 first displays the list of dictionary-registered strings on the display unit 116 (S1001).
図11(a)に辞書登録文字列リスト表示画面の表示例を示す。図11の(a)に示す特殊辞書登録リストは、登録番号11a1、変換前文字列11a2、変換後文字列11a3、選択枠11a4、リモコン120のカラーキー120a14に割り当てた機能11a5が表示される。ここではカラーキー120a14に割り当てた機能として、“赤”を新規登録、“青”を登録変更、“黄”を登録削除、“緑”を通常文字変換辞書と特殊文字変換辞書の切り替えに割り当てている例を示している。また、登録番号S4及びS5のように認証が必要な文字列は変換文字列が分からないようにマスク表示している。
FIG. 11(a) shows a display example of the dictionary registration string list screen. The special dictionary registration list in FIG. 11(a) displays the registration number 11a1, pre-conversion string 11a2, post-conversion string 11a3, selection frame 11a4, and the functions 11a5 assigned to the color keys 120a14 of the remote controller 120. In this example, the color keys 120a14 are assigned as follows: “red” for new registration, “blue” for changing a registration, “yellow” for deleting a registration, and “green” for switching between the normal and special character conversion dictionaries. Strings that require authentication, such as registration numbers S4 and S5, are masked so that the converted strings cannot be read.
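The mask display for entries that require authentication can be sketched roughly as follows; the entry fields mirror the list columns of FIG. 11(a), but the field names and sample strings are illustrative:

```python
MASK = "●●●"  # display that hides the converted string

# Sample entries; "auth" marks strings that require user authentication.
entries = [
    {"no": "S3", "before": "ひらけごま", "after": "OpenSesame123", "auth": False},
    {"no": "S4", "before": "ぱすわーど", "after": "pw1234", "auth": True},
]

def list_row(entry):
    # Entries flagged as requiring authentication are masked in the list
    # so the converted string cannot be read from the screen.
    shown = MASK if entry["auth"] else entry["after"]
    return f'{entry["no"]}  {entry["before"]}  {shown}'
```

The converted string of a protected entry never appears in the rendered row; only the mask does.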
次に、ユーザがリモコン120により処理選択入力を行う(S1002)。例えばリモコン120の上下のカーソルキー120a12によりユーザが選択枠11a4を移動させ、カラーキー120a14の“青”或いは“黄”のキーを押下することで、ユーザは選択枠表示された辞書登録内容の変更処理或いは削除処理を選択することができる。
Next, the user performs a process selection input with the remote controller 120 (S1002). For example, the user moves the selection frame 11a4 with the up and down cursor keys 120a12 of the remote controller 120 and presses the “blue” or “yellow” color key 120a14 to select the change or deletion process for the dictionary entry shown in the selection frame.
次に、ユーザの処理選択入力S1002に従って、辞書登録実行部1037は分岐処理S1010を実行する。
Next, according to the user process selection input S1002, the dictionary registration execution unit 1037 executes the branch process S1010.
ユーザが[新規登録]を選択した場合(カラーキー120a14の“赤”のキーを押下した場合)には、辞書登録実行部1037が新規辞書登録画面を表示部116に表示し(S1011)、後述する文字変換辞書登録処理(S1012)を実行する。
When the user selects [New Registration] (presses the “red” color key 120a14), the dictionary registration execution unit 1037 displays the new dictionary registration screen on the display unit 116 (S1011) and executes the character conversion dictionary registration process (S1012) described later.
図11の(b)に新規辞書登録画面の表示例を示す。図11の(b)に示す新規辞書登録画面は、新規の登録番号11b1、音声認識入力による変換前の文字列(認識文字列)入力枠11b2、変換後の文字列(特殊文字列)入力枠11b3、リモコン120のカラーキー120a14に割り当てた機能11b4が表示される。ここでは“赤”を変換時にユーザの認証を必要とする文字列として登録、“青” を変換時にユーザの認証を必要としない文字列として登録するのに割り当てている。
FIG. 11(b) shows a display example of the new dictionary registration screen. It displays a new registration number 11b1, an input box 11b2 for the pre-conversion string (recognized string) entered by voice recognition, an input box 11b3 for the post-conversion string (special string), and the functions 11b4 assigned to the color keys 120a14 of the remote controller 120. Here, “red” registers the entry as a string that requires user authentication at conversion time, and “blue” registers it as a string that does not.
ユーザが[登録変更]を選択した場合(カラーキー120a14の“青”のキーを押下した場合)には、ユーザ認証実行部1035は変更する文字列が音声認識した文字列を変換する際にユーザの認証を必要とする文字列か否かにより分岐処理(S1013)を行う。
When the user selects [Change Registration] (presses the “blue” color key 120a14), the user authentication execution unit 1035 branches (S1013) depending on whether the string to be changed is one that requires user authentication when a voice-recognized string is converted.
選択した文字列がユーザ認証を必要としない文字列の場合には、辞書登録実行部1037は辞書登録変更画面を表示部116に表示し(S1014)、後述する文字変換辞書登録処理S1015を実行する。
If the selected string does not require user authentication, the dictionary registration execution unit 1037 displays the dictionary registration change screen on the display unit 116 (S1014) and executes the character conversion dictionary registration process S1015 described later.
図11の(c)に辞書登録変更画面の表示例を示す。図11の(c)において、辞書登録変更画面は、変更する文字列の登録番号11c1、音声認識入力による変換前の文字列入力枠11c2、変換後の文字列入力枠11c3、リモコン120のカラーキー120a14に割り当てた機能11b4が表示される。ここでは“赤”を変換時にユーザの認証を必要とする文字列として登録、“青”を変換時にユーザの認証を必要としない文字列として登録するのに割り当てている。また、登録されている文字列が最初に表示され、変換前の文字列、変換後の文字列の何れか、或いは両方を変更することができる。
FIG. 11(c) shows a display example of the dictionary registration change screen. In FIG. 11(c), the screen displays the registration number 11c1 of the string to be changed, an input box 11c2 for the pre-conversion string entered by voice recognition, an input box 11c3 for the post-conversion string, and the functions 11b4 assigned to the color keys 120a14 of the remote controller 120. As before, “red” registers the entry as requiring user authentication at conversion time, and “blue” as not requiring it. The registered strings are shown initially, and the pre-conversion string, the post-conversion string, or both can be changed.
選択した文字列がユーザ認証を必要とする文字列の場合には、ユーザ認証実行部1035がユーザ認証処理(S1016)を行い、認証結果による分岐処理(S1017)を行う。
If the selected string requires user authentication, the user authentication execution unit 1035 performs user authentication (S1016) and branches on the result (S1017).
分岐処理S1017において、ユーザ認証が無効の場合(辞書登録したユーザと音声入力したユーザが異なると判断した場合)には、ユーザ認証実行部1035は認証が無効であることを表示する(S1018)。
In the branch S1017, if the user authentication is invalid (it is determined that the user who registered the entry and the user who performed the voice input differ), the user authentication execution unit 1035 displays that authentication failed (S1018).
分岐処理S1017において、ユーザ認証が有効の場合(辞書登録したユーザと音声入力したユーザが同じと判断した場合)には、ユーザ認証実行部1035は辞書登録実行部1037に対し認証が有効であった結果を出力し、辞書登録実行部1037がS1014と同様に辞書登録変更画面を表示部116に表示し(S1019)、後述する文字変換辞書登録処理S1020を実行する。
In the branch S1017, if the user authentication is valid (it is determined that the user who registered the entry and the user who performed the voice input are the same), the user authentication execution unit 1035 outputs the result to the dictionary registration execution unit 1037, which displays the dictionary registration change screen on the display unit 116 as in S1014 (S1019) and executes the character conversion dictionary registration process S1020 described later.
ユーザが[登録削除]を選択した場合(カラーキー120a14の“黄”のキーを押下した場合)には、辞書登録実行部1037が選択枠部分の文字列の登録を削除する(S1021)。
When the user selects [Delete Registration] (presses the “yellow” color key 120a14), the dictionary registration execution unit 1037 deletes the registration of the string in the selection frame (S1021).
ユーザが[辞書切替]を選択した場合(カラーキー120a14の“緑”のキーを押下した場合)には、辞書登録実行部1037は、ユーザから通常文字変換辞書の登録処理を行うのか、特殊文字変換辞書の登録処理を行うのかを切り替える操作を受け付け、その操作結果に従って登録処理の対象となる辞書を切り替える(S1022)。これにより、繰り返し処理により辞書登録文字列リスト表示処理S1001に戻ったときに表示する辞書が切り替わる。
When the user selects [Switch Dictionary] (presses the “green” color key 120a14), the dictionary registration execution unit 1037 accepts an operation that switches whether registration processing targets the normal character conversion dictionary or the special character conversion dictionary, and switches the target dictionary accordingly (S1022). As a result, the dictionary displayed when the loop returns to the dictionary registration string list display process S1001 is switched.
ユーザが[戻る]を選択した場合(リモコン120の戻るキー120a11を押下した場合)には、辞書登録実行部1037は中断処理により辞書登録処理を終了し(S1023)、繰り返し処理S1000を終了する。
When the user selects [Back] (presses the back key 120a11 of the remote controller 120), the dictionary registration execution unit 1037 aborts and ends the dictionary registration process (S1023), terminating the loop S1000.
なお、上記の実施形態における削除の処理において、認証が必要な文字列を削除する場合にはユーザ認証実行部1035がユーザ認証を行い、認証が有効なときにのみ辞書登録実行部1037が削除を行うようにしてもよい。
In the deletion processing of the above embodiment, when a string that requires authentication is to be deleted, the user authentication execution unit 1035 may perform user authentication, and the dictionary registration execution unit 1037 may delete the entry only when the authentication succeeds.
次に第2実施形態における文字変換辞書登録処理(図10のS1012、S1015及びS1020)について、図11の画面表示例及び図12のシーケンス図を用いて説明する。図12は、第2実施形態の辞書登録処理における文字変換辞書登録処理の流れを示すシーケンス図である。
Next, the character conversion dictionary registration process (S1012, S1015, and S1020 in FIG. 10) in the second embodiment is described with reference to the screen display examples in FIG. 11 and the sequence diagram in FIG. 12. FIG. 12 is a sequence diagram showing the flow of the character conversion dictionary registration process within the dictionary registration process of the second embodiment.
図12において、ユーザが辞書登録終了を選択するまで繰り返し処理(S1200)が実行される。繰り返し処理S1200では、まず、ユーザがリモコン120により処理選択入力を行う(S1201)。例えば新規辞書登録における文字変換辞書登録処理S1012ではリモコン120の上下のカーソルキー120a12により音声認識入力による変換前の文字列入力枠11b2或いは変換後の文字列入力枠11b3を選択、カラーキー120a14の“赤”或いは”青”のキーを押下することにより処理選択を行う。また、辞書登録変更における文字変換辞書登録処理S1015、S1020ではリモコン120の上下のカーソルキー120a12により音声認識入力による変換前の文字列入力枠11c2或いは変換後の文字列入力枠11c3を選択、カラーキー120a14の“赤”或いは”緑”のキーを押下することにより処理選択を行う。
In FIG. 12, a loop (S1200) is executed until the user chooses to end dictionary registration. In the loop S1200, the user first performs a process selection input with the remote controller 120 (S1201). For example, in the character conversion dictionary registration process S1012 for a new registration, the user selects the pre-conversion string input box 11b2 or the post-conversion string input box 11b3 with the up and down cursor keys 120a12 of the remote controller 120, and selects a process by pressing the “red” or “blue” color key 120a14. In the character conversion dictionary registration processes S1015 and S1020 for changing a registration, the user likewise selects the input box 11c2 or 11c3 with the cursor keys 120a12 and selects a process by pressing the “red” or “green” color key 120a14.
次にユーザが選択した処理の結果に従って辞書登録実行部1037が分岐処理S1202を実行する。
Next, the dictionary registration execution unit 1037 executes the branch process S1202 according to the result of the process selected by the user.
ユーザが[音声認識文字列入力]を選択した場合(文字列入力枠11b2、或いは文字列入力枠11c2を選択してリモコン120の決定キー120a13を押下した場合)には、ユーザからの音声入力(S1203)を音声入力部111が取り込み(S1204)、音声変換実行部1034が取り込んだ音声に基づいて音声認識により認識文字列に変換し、認識文字列を文字列入力枠11b2、或いは文字列入力枠11c2に表示する(S1205)。
When the user selects [Voice Recognition String Input] (selects the input box 11b2 or 11c2 and presses the enter key 120a13 of the remote controller 120), the voice input unit 111 captures the user's utterance (S1203, S1204), the voice conversion execution unit 1034 converts the captured voice into a recognized string by voice recognition, and the recognized string is displayed in the input box 11b2 or 11c2 (S1205).
ユーザが[変換後文字列入力]を選択した場合(文字列入力枠11b3、或いは文字列入力枠11c3を選択してリモコン120の決定キー120a13を押下した場合)には、ユーザがリモコン120のチャンネル文字入力キー120a3等を用いて変換後の文字列を入力し(S1206)、辞書登録実行部1037が文字列入力枠11b3、或いは文字列入力枠11c3に表示する(S1207)。
When the user selects [Post-Conversion String Input] (selects the input box 11b3 or 11c3 and presses the enter key 120a13 of the remote controller 120), the user enters the post-conversion string using the channel/character input keys 120a3 of the remote controller 120 or the like (S1206), and the dictionary registration execution unit 1037 displays it in the input box 11b3 or 11c3 (S1207).
ユーザが[ユーザ認証有登録]を選択した場合(カラーキー120a14の“赤”のキーを押下した場合)には、ユーザ認証実行部1035が認証情報取得処理(S1208)を行い、ユーザ認証実行部1035が取得した認証情報を登録番号に対応させてユーザ情報記憶領域104cに記憶する。また、辞書登録実行部1037が音声認識入力による変換前の文字列入力枠11b2或いは11c2、変換後の文字列入力枠11b3或いは11c3に表示されている文字列を辞書切替処理S1022によって切り替えられた通常文字変換辞書或いは特殊文字変換辞書に登録する(S1209)。登録した辞書情報はストレージ部104の音声変換情報記憶領域104bに記憶される。
When the user selects [Register With User Authentication] (presses the “red” color key 120a14), the user authentication execution unit 1035 performs the authentication information acquisition process (S1208) and stores the acquired authentication information in the user information storage area 104c in association with the registration number. The dictionary registration execution unit 1037 then registers the strings displayed in the pre-conversion input box 11b2 or 11c2 and the post-conversion input box 11b3 or 11c3 in the normal or special character conversion dictionary selected by the dictionary switching process S1022 (S1209). The registered dictionary information is stored in the voice conversion information storage area 104b of the storage unit 104.
ユーザが[ユーザ認証無登録]を選択した場合(カラーキー120a14の“青”のキーを押下した場合)には、辞書登録実行部1037が音声認識入力による変換前の文字列入力枠11b2或いは11c2、変換後の文字列入力枠11b3或いは11c3に表示されている文字列を辞書切替処理S1022によって切り替えられた通常文字変換辞書或いは特殊文字変換辞書に登録する(S1210)。登録した辞書情報はストレージ部104の音声変換情報記憶領域104bに記憶される。
When the user selects [Register Without User Authentication] (presses the “blue” color key 120a14), the dictionary registration execution unit 1037 registers the strings displayed in the pre-conversion input box 11b2 or 11c2 and the post-conversion input box 11b3 or 11c3 in the normal or special character conversion dictionary selected by the dictionary switching process S1022 (S1210). The registered dictionary information is stored in the voice conversion information storage area 104b of the storage unit 104.
ユーザが[戻る]を選択した場合(リモコン120の戻るキー120a11を押下した場合)には、辞書登録実行部1037が中断処理により文字変換辞書登録処理を終了し(S1211)、繰り返し処理S1200を終了する。
When the user selects [Back] (presses the back key 120a11 of the remote controller 120), the dictionary registration execution unit 1037 aborts and ends the character conversion dictionary registration process (S1211), terminating the loop S1200.
上記において、ユーザ認証実行部1035がユーザ認証情報取得処理S1208を行い、取得した認証情報を登録する文字列に関連付けてユーザ認証情報記憶領域104cに記憶を行う。また、ユーザ認証情報取得処理S1208において、認証情報として音声入力取り込み処理S1204で取り込んだ音声データを用いて声紋データを取得し、認証情報としてもよい。或いは、撮像部114によりユーザの顔の画像を取得し、顔認識データを認証情報としてもよい。要は、ユーザ認証実行部1035がユーザを特定できる認証方法を用いて認証情報を取得し、音声認識した文字列を文字変換辞書に従って変換する時に、取得した認証情報を用いてユーザ認証を行い、ユーザ認証が有効なときには文字変換辞書による変換を行うようにすればよい。
In the above, the user authentication execution unit 1035 performs the user authentication information acquisition process S1208 and stores the acquired authentication information in the user authentication information storage area 104c in association with the string being registered. In the acquisition process S1208, voiceprint data may be extracted from the voice data captured in the voice input capture process S1204 and used as the authentication information. Alternatively, an image of the user's face may be acquired by the imaging unit 114 and face recognition data used as the authentication information. In short, it suffices that the user authentication execution unit 1035 acquires authentication information by any method that can identify the user, performs user authentication with that information when a voice-recognized string is converted according to the character conversion dictionary, and applies the dictionary conversion only when the user authentication succeeds.
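The association between a registered string and its authentication information might look like the following sketch; `auth_data` is treated as an opaque blob, since voiceprint and face-feature extraction are outside the scope of this illustration, and all names are hypothetical:

```python
special_dict = {}     # stands in for the special character conversion dictionary 104b3
auth_info_store = {}  # stands in for the user authentication information storage area 104c

def register_entry(reg_no, before, after, auth_data=None):
    # auth_data is an opaque blob (voiceprint or face-recognition data in the
    # patent); entries registered with it require authentication at conversion time.
    special_dict[reg_no] = {"before": before, "after": after,
                            "auth_required": auth_data is not None}
    if auth_data is not None:
        auth_info_store[reg_no] = auth_data  # keyed by registration number
```

Conversion-time authentication would look up `auth_info_store` by the entry's registration number and compare it against freshly acquired data.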
また、放送受信装置1にユーザ認証の実行方法として音声認識又は顔認識のどちらかを選択するようにしてもよい。例えば表示部116にユーザ認証方法を選択する画面を表示して、リモコン120のカーソルキー(上、下、左、右)120a12及び決定キー120a13を用いてユーザが選択できるように構成してもよい。例えばユーザが体調不良で声が通常とは変わったときやかすれていて、認証情報として記憶された声紋データと不一致と判定される場合には、顔認識に切り替えるように構成してもよい。
The broadcast receiving apparatus 1 may also allow selection of either voice recognition or face recognition as the user authentication method. For example, a screen for selecting the authentication method may be displayed on the display unit 116 so that the user can choose with the cursor keys (up, down, left, right) 120a12 and the enter key 120a13 of the remote controller 120. For example, when the user's voice has changed or become hoarse due to poor health and is judged not to match the stored voiceprint data, the apparatus may be configured to switch to face recognition.
また、ユーザ認証を行う文字列の登録は通常文字変換辞書及び特殊文字変換辞書の両方で行えるようにしてもよく、特殊文字変換辞書のみで行えるようにしてもよい。特殊文字変換辞書のみで行えるようにした場合、通常辞書の新規辞書登録画面、辞書登録変更画面では図11の(b)、図11の(c)のカラーキー120a14の“赤”に対応した認証有の登録の表示は行わない。
Registration of strings that require user authentication may be allowed in both the normal and special character conversion dictionaries, or in the special character conversion dictionary only. In the latter case, the new dictionary registration screen and dictionary registration change screen for the normal dictionary do not display the authenticated-registration option corresponding to “red” on the color keys 120a14 in FIGS. 11(b) and 11(c).
また、一つの文字列に対してユーザ認証の登録を複数の人について行うようにしてもよい。これにより、例えばユーザ認証に父親と母親を登録することで、父親が留守の場合でも母親が同じ文字列を音声入力することによりパスワードの入力を行うことが可能となる。
User authentication may also be registered for multiple people for a single string. For example, by registering both the father and the mother for user authentication, the mother can enter the password by speaking the same string even when the father is away.
図13を参照して第2実施形態に係る音声変換情報記憶領域104bの特殊文字変換辞書104b3に記憶される辞書登録情報について説明する。図13は、第2実施形態に係る特殊文字変換辞書に記憶される辞書登録情報の例を示す図である。
The dictionary registration information stored in the special character conversion dictionary 104b3 of the voice conversion information storage area 104b according to the second embodiment is described with reference to FIG. 13. FIG. 13 shows an example of the dictionary registration information stored in the special character conversion dictionary according to the second embodiment.
図13において、辞書登録情報は、登録番号13b1、音声認識入力による変換前の文字列(認識文字列)13b2、変換後の文字列13b3、認証要否13b4、及びユーザ認証情報記憶領域104cに記憶された認証情報13b5が記憶されている。これにより、例えば認証要で登録された登録番号3或いは4は音声入力時にユーザの認証を行う必要があり、登録した人以外が認識文字列を音声入力しても認証が無効となるため、文字列への変換ができず、パスワード等が勝手に入力されることを防ぐことが可能となる。
In FIG. 13, the dictionary registration information stores the registration number 13b1, the pre-conversion string (recognized string) 13b2 entered by voice recognition, the post-conversion string 13b3, the authentication requirement 13b4, and the authentication information 13b5 stored in the user authentication information storage area 104c. Thus, for example, registration numbers 3 and 4, which are registered as requiring authentication, require user authentication at voice input time; if anyone other than the registered person speaks the recognized string, the authentication fails and the conversion to the string is not performed, preventing a password or the like from being entered without permission.
図14を参照して辞書変換入力処理S424の流れについて説明する。図14は、第2実施形態に係る辞書変換入力処理の流れを示すフローチャートである。
The flow of the dictionary conversion input process S424 will be described with reference to FIG. FIG. 14 is a flowchart showing the flow of dictionary conversion input processing according to the second embodiment.
図14において、まず音声変換実行部1034は、ステップS423(図4参照)で音声変換実行部1034が音声認識して生成した認識文字列が、ユーザが選択した文字変換辞書に登録されているかを確認する(S1401)。例えば、S411で通常文字変換辞書が選択されている場合には通常文字変換辞書に音声認識された文字列が登録されているかを確認し、S412で特殊文字変換辞書が選択されている場合には特殊文字変換辞書に音声認識された文字列が登録されているかを確認する。
In FIG. 14, the voice conversion execution unit 1034 first checks whether the recognized string it generated by voice recognition in step S423 (see FIG. 4) is registered in the character conversion dictionary selected by the user (S1401). For example, when the normal character conversion dictionary was selected in S411, it checks whether the voice-recognized string is registered in the normal character conversion dictionary, and when the special character conversion dictionary was selected in S412, it checks the special character conversion dictionary.
次に、音声変換実行部1034は、確認の結果に基づいて分岐処理(S1402)を行う。
Next, the voice conversion execution unit 1034 branches (S1402) based on the result of the check.
分岐処理S1402において、音声認識された文字列が選択した文字変換辞書に登録されていない場合(S1402/No)には、音声変換実行部1034は、辞書登録されていない等のエラーの表示を行い(S1403)、処理を終了する。音声認識された文字列が選択した文字変換辞書に登録されている場合(S1402/Yes)には、音声変換実行部1034は、音声認識された文字列がユーザ認証を必要とする文字列か否かを辞書登録情報により確認する(S1404)。次に、確認の結果に基づいて音声変換実行部1034は、分岐処理(S1405)を行う。
In the branch S1402, if the voice-recognized string is not registered in the selected character conversion dictionary (S1402/No), the voice conversion execution unit 1034 displays an error such as “not registered in the dictionary” (S1403) and ends the process. If it is registered (S1402/Yes), the voice conversion execution unit 1034 checks from the dictionary registration information whether the recognized string requires user authentication (S1404), and then branches on the result (S1405).
分岐処理S1405において、ユーザ認証を必要としない文字列の場合(S1405/No)には、音声変換実行部1034は、文字変換処理の音声認識した認識文字列を選択されている文字変換辞書の情報に従った文字列に変換し(S1409)、処理を終了する。ユーザ認証を必要とする文字列の場合(S1405/Yes)には、ユーザ認証実行部1035は、辞書登録番号に対応してユーザ認証情報記憶領域104cに記憶された認証情報を用いてユーザ認証を行い(S1406)、音声変換実行部1034に対してユーザ認証が無効であるか有効であるかを示す認証判定情報を出力する。音声変換実行部1034は認証判定情報に基づいて分岐処理(S1407)を行う。
In the branch S1405, if the string does not require user authentication (S1405/No), the voice conversion execution unit 1034 converts the recognized string according to the selected character conversion dictionary (S1409) and ends the process. If the string requires user authentication (S1405/Yes), the user authentication execution unit 1035 performs user authentication using the authentication information stored in the user authentication information storage area 104c for the dictionary registration number (S1406), and outputs to the voice conversion execution unit 1034 authentication result information indicating whether the authentication succeeded or failed. The voice conversion execution unit 1034 branches (S1407) on this information.
分岐処理S1407において、ユーザ認証が無効の場合(S1407/No)には、音声変換実行部1034はユーザ認証が無効であることなどを表示し(S1408)、処理を終了する。ユーザ認証が有効の場合(S1407/Yes)には、音声変換実行部1034は、文字変換処理の音声認識した認識文字列を、文字変換辞書の情報に従って文字列を変換し(S1409)、処理を終了する。
In the branch S1407, if the user authentication is invalid (S1407/No), the voice conversion execution unit 1034 displays that the user authentication failed (S1408) and ends the process. If the user authentication is valid (S1407/Yes), the voice conversion execution unit 1034 converts the recognized string according to the character conversion dictionary (S1409) and ends the process.
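The flow of S1401 through S1409 can be summarized in the following sketch; `verify_user` stands in for the user authentication of S1406 (voiceprint or face matching, not modeled here), and the sample dictionary entries are hypothetical:

```python
# Hypothetical selected dictionary: recognized string -> entry
dictionary = {
    "てれび": {"after": "テレビ", "auth_required": False},
    "ひらけごま": {"after": "OpenSesame123", "auth_required": True},
}

def dict_convert(recognized, dictionary, verify_user):
    # verify_user(recognized) returns True when user authentication succeeds.
    entry = dictionary.get(recognized)
    if entry is None:                                           # S1402/No
        return None, "error: not registered"                    # S1403
    if entry["auth_required"] and not verify_user(recognized):  # S1404-S1407
        return None, "error: authentication invalid"            # S1408
    return entry["after"], "ok"                                 # S1409
```

An entry without the authentication flag converts regardless of the verifier; a flagged entry converts only when verification succeeds.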
以上の実施形態では、辞書登録するとき、及び辞書による音声認識した文字列を変換するときにユーザ認証を行うことにより、辞書登録したユーザ以外の人が登録した文字列を音声入力したときには文字列の変換を行わないようにすることができる。これにより、例えば辞書登録していない子供がパスワードに対応する文字列を音声入力しても、パスワードに対応する文字列に変換されることがなく、子供が勝手に有料コンテンツの配信サービス等にログインするのを防止することができる。
In the embodiment above, user authentication is performed both when registering dictionary entries and when converting a voice-recognized string with the dictionary, so that string conversion is not performed when someone other than the registering user voice-inputs a registered string. For example, even if a child who is not registered in the dictionary speaks the string corresponding to a password, it is not converted into the password string, preventing the child from logging in to a paid content distribution service or the like without permission.
上記の説明では、特殊文字変換辞書104b3を一つ備えたが、ユーザ毎に異なる特殊文字変換辞書104b3を備えておき、ユーザ認証の結果、特定されたユーザに対して設けられた特殊文字変換辞書104b3を選択し、これを使ってステップS1409において文字列を変換してもよい。例えば、上記例で父及び母に対して第一特殊文字変換辞書を、子供に対して第二の特殊文字変換辞書を設けておく。そして、第一特殊文字変換辞書は、認識文字列「すずきあどれす」に対して変換文字列「Suzuki_parents」を登録する。また第二特殊文字変換辞書は、認識文字列「すずきあどれす」に対して変換文字列「Suzuki_kid」を登録する。そして、ステップS1401で音声変換実行部1034が生成した認識文字列をユーザ認証実行部1035が取得し、S1406のユーザ認証において、ユーザを固有に認証する。その結果、ユーザが父であると判定された場合には、S1409において第一特殊文字変換辞書を選択してから辞書変換処理を実行する。また、ユーザが子供であると判定された場合には、第二特殊文字変換辞書を選択してから辞書変換処理を実行する。これにより、同一の認識文字列「すずきあどれす」が生成された場合にも、ユーザにより異なる特殊文字列に変換を行うことができる。
In the description above, a single special character conversion dictionary 104b3 is provided, but a separate special character conversion dictionary 104b3 may be provided for each user; the special character conversion dictionary of the user identified by user authentication is then selected and used to convert the string in step S1409. For example, a first special character conversion dictionary may be provided for the father and mother and a second one for the child. The first registers the converted string "Suzuki_parents" for the recognized string 「すずきあどれす」, and the second registers the converted string "Suzuki_kid" for the same recognized string. The user authentication execution unit 1035 obtains the recognized string generated by the voice conversion execution unit 1034 in step S1401, and in the user authentication of S1406 identifies the user individually. If the user is determined to be the father, the first special character conversion dictionary is selected before the dictionary conversion in S1409; if the user is determined to be the child, the second is selected. Thus even when the same recognized string 「すずきあどれす」 is generated, it can be converted into a different special string depending on the user.
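The per-user dictionary selection in this example can be sketched as follows; the user identifiers and dictionary layout are illustrative, while the converted strings follow the "Suzuki_parents" / "Suzuki_kid" example above:

```python
# Hypothetical per-user special dictionaries, keyed by authenticated user.
special_dicts = {
    "parent": {"すずきあどれす": "Suzuki_parents"},  # first special character conversion dictionary
    "kid":    {"すずきあどれす": "Suzuki_kid"},      # second special character conversion dictionary
}

def convert_for_user(recognized, authenticated_user):
    # Select the dictionary assigned to the authenticated user, then convert.
    user_dict = special_dicts.get(authenticated_user, {})
    return user_dict.get(recognized)  # None when no entry (or unknown user)
```

The same recognized string thus maps to different converted strings depending on who was authenticated.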
<第3実施形態>
第3実施形態は、ユーザ認証処理においてユーザが入力した音声を用いる実施形態であり、より詳しくは、声紋のように音声によるユーザ認証を用いた場合に、ユーザの認証を必要とする文字列の音声入力において、過去にユーザの認証を必要とする文字列の音声入力を行った際に録音された音声を入力することによりユーザ認証が有効とされるのを防止するものである。以下、図15を参照して、第3実施形態について説明する。図15は、第3実施形態に係る音声認識による文字入力処理の一例を示すシーケンス図である。
The third embodiment uses the voice input by the user in the user authentication process. More specifically, when voice-based user authentication such as a voiceprint is used, it prevents user authentication from succeeding when a recording made during a past voice input of a string requiring user authentication is played back as the input. The third embodiment is described below with reference to FIG. 15. FIG. 15 is a sequence diagram showing an example of the character input process by voice recognition according to the third embodiment.
In FIG. 15, the processing from S400 to S404 (including S411 and S412 within the branch processing of S404) is the same as in FIG. 4, and its description is omitted.
After the branch process S404, the voice conversion execution unit 1034 sets an identification sound (S431) and starts outputting the set identification sound from the voice output unit 113 (S432).
Next, the user utters a voice (S433), the voice input unit 111 captures the voice (S434), and the voice conversion execution unit 1034 stops the output of the identification sound from the voice output unit 113 (S435).
The voice conversion execution unit 1034 converts the captured voice into a recognized character string by voice recognition based on the voice recognition dictionary 104b1 (S436). In the dictionary conversion input process (S437) described later, the user authentication execution unit 1035 performs user authentication using the input voice. If the authentication is valid, the voice conversion execution unit 1034 converts the recognized character string according to the character conversion dictionary selected by the user, and the converted result is used as the character string input.
The dictionary conversion input process of step S437 in FIG. 15 will be described with reference to FIG. 16, a flowchart showing the flow of the dictionary conversion input process according to the third embodiment. In FIG. 16, the same processing steps as in FIG. 14 are given the same numbers, and their description is omitted.
In the user authentication process S1406 of FIG. 16, the user authentication execution unit 1035 first performs user authentication using the voiceprint data stored in the user information storage area 104c in association with the dictionary registration number (S1406a1), and then branches (S1406a2) based on authentication determination information indicating whether the voiceprint-based user authentication is valid or invalid. If the voiceprint stored as authentication information at dictionary registration differs from the voiceprint of the voice captured in S434 (see FIG. 15) and the user authentication is therefore invalid (S1406a2/No), the voice conversion execution unit 1034 displays a message such as that the user authentication is invalid (S1408) and ends the process. If the voiceprint stored at dictionary registration matches the voiceprint of the voice captured in S434 and the user authentication is valid (S1406a2/Yes), the user authentication execution unit 1035 performs the identification sound determination process (S1406a3) described later and outputs authentication determination information indicating whether the user authentication is valid or invalid to the voice conversion execution unit 1034. The voice conversion execution unit 1034 then branches (S1407) based on the authentication determination information.
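As a minimal illustrative sketch (not part of the embodiment; the function and message strings are assumptions of the example), the two-stage decision of S1406a1 through S1407 can be expressed as:

```python
def dictionary_conversion_auth(voiceprint_matches, identification_sound_ok):
    """Two-stage gate of the third embodiment: the voiceprint comparison
    (S1406a1/S1406a2) must pass first, then the identification sound
    determination (S1406a3/S1407) must rule out a replayed recording."""
    if not voiceprint_matches:        # S1406a2/No -> display error (S1408)
        return "invalid: voiceprint mismatch"
    if not identification_sound_ok:   # S1407/No -> recorded voice suspected
        return "invalid: recorded voice suspected"
    return "valid"                    # S1407/Yes -> convert string (S1409)
```

Only when both checks return valid does the dictionary conversion of S1409 proceed.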
In the branch process S1407, if the authentication determination information indicates that the user authentication is invalid (S1407/No), the voice conversion execution unit 1034 determines that a recorded voice was input, displays a message such as that the user authentication is invalid (S1408), and ends the process.
If the authentication determination information indicates that the user authentication is valid (S1407/Yes), the voice conversion execution unit 1034 converts the character string according to the information of the selected character conversion dictionary (S1409) and ends the process.
An example of the identification sound determination process S1406a3 will be described with reference to the flowchart of FIG. 17 and the waveform diagrams of FIG. 18. FIG. 17 is a flowchart showing the flow of the identification sound determination process. FIG. 18 shows signal waveforms used in the identification sound determination process: (a1) to (a5) are example waveforms for the case where the user actually speaks and the identification sound determination validates the user authentication, and (b1) to (b5) are example waveforms for the case where a recorded voice is input and the identification sound determination invalidates the user authentication.
FIGS. 18(a1) and (b1) show part of the waveform of the identification sound output from the voice output unit 113 during the period from the start of identification sound output S432 to the end of identification sound output S435. The user authentication execution unit 1035 outputs a signal of a predetermined frequency F0 at predetermined intervals; here, output for a period Ton and silence for a period Toff are repeated. The predetermined periods Ton and Toff are desirably short enough, for example on the order of milliseconds, that it is difficult for a user to deliberately time an utterance to them.
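The gated identification sound described here (frequency F0 emitted for a period Ton, silent for Toff, repeating) might be generated as in the following Python sketch; the function name, the 16 kHz sample rate, and all parameter values are assumptions chosen for illustration, not part of the embodiment.

```python
import math

def make_identification_tone(f0_hz, t_on_s, t_off_s, total_s, sample_rate=16000):
    """Return float samples in [-1, 1]: a sine at f0_hz gated on for t_on_s
    and off for t_off_s, repeating over total_s seconds."""
    period_s = t_on_s + t_off_s
    samples = []
    for n in range(int(total_s * sample_rate)):
        t = n / sample_rate
        # Position within the current on/off cycle decides whether the tone sounds.
        if t % period_s < t_on_s:
            samples.append(math.sin(2 * math.pi * f0_hz * t))
        else:
            samples.append(0.0)
    return samples

# Example: F0 = 1 kHz with Ton = Toff = 5 ms; millisecond-order periods make
# it difficult for a user to synchronize an utterance with the gating.
tone = make_identification_tone(1000, 0.005, 0.005, 0.1)
```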
FIGS. 18(a2) and (b2) show the waveforms of the audio signal captured in S434. In FIG. 18(a2), the identification sound output from the voice output unit 113 and the user's voice are captured. In FIG. 18(b2), the identification sound output from the voice output unit 113, the recorded user's voice, and the recorded identification sound are captured.
In the identification sound determination process shown in FIG. 17, the user authentication execution unit 1035 first applies a filter process (S1701) to the audio signal captured in S434 (FIGS. 18(a2) and (b2)) to detect the signal component corresponding to the frequency F0 of the identification sound. This yields a signal in which the identification sound component has been detected (FIGS. 18(a3) and (b3)).
In FIG. 18(a3), only the identification sound output from the voice output unit 113 is detected, whereas in FIG. 18(b3), both the identification sound output from the voice output unit 113 and the identification sound contained in the recording are detected. Moreover, the output identification sound and the recorded identification sound are misaligned in timing. Consequently, during the periods in which both are detected simultaneously, they interfere with each other, and the amplitude of the identification sound captured by the voice input unit 111 varies depending on how they interfere. The example of FIG. 18(b3) shows a case where the interference reduces the amplitude of the identification sound.
Next, the user authentication execution unit 1035 detects the amplitude of the detected signal (S1702) and, by comparing it with a predetermined threshold, converts the detected signal into a binary signal of H level and L level (S1703).
FIGS. 18(a4) and (b4) show the signals obtained by detecting the amplitude of the identification sounds of FIGS. 18(a3) and (b3) in the amplitude detection process S1702. In FIG. 18(a4), a nearly constant amplitude corresponding to the identification sound output from the voice output unit 113 is detected, whereas FIG. 18(b4) shows that different amplitudes are detected for the identification sound output from the voice output unit 113, the recorded identification sound, and the portions where the two interfere. FIGS. 18(a5) and (b5) show the signals obtained by comparing the amplitude detected in S1702 with a predetermined threshold Vt in the binarization process S1703 and converting them into the two levels H and L. In FIG. 18(a5), the H-level period is approximately Ton, whereas in FIG. 18(b5), the H-level period deviates greatly from Ton owing to the influence of the identification sound in the recording.
Next, the user authentication execution unit 1035 detects the H-level periods of the binarized signal (S1704) and branches (S1705) according to whether the H-level period is within a predetermined range (here, as an example, from Ton × 0.9 to Ton × 1.1). If it is not within the predetermined range (S1705/No), the user authentication execution unit 1035 determines that a recorded voice was input, judges the user authentication invalid (S1706), and ends the process. If it is within the predetermined range (S1705/Yes), the user authentication execution unit 1035 judges the user authentication valid (S1707) and ends the process.
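The chain S1702 to S1705 might be sketched as follows in Python, taking as input the F0-band amplitude envelope assumed to have been produced by the filter process S1701; all names, the threshold handling, and the ±10% tolerance default are illustrative assumptions, not part of the embodiment.

```python
def identification_sound_valid(envelope, t_on_samples, threshold, tolerance=0.1):
    """Binarize the F0-band amplitude envelope (S1703), measure each H-level
    run (S1704), and accept only if every run stays within
    Ton * (1 +/- tolerance) (S1705). A replayed recording carries a second,
    out-of-phase identification tone whose interference distorts the runs."""
    binary = [1 if a >= threshold else 0 for a in envelope]
    runs, count = [], 0
    for b in binary + [0]:            # trailing 0 flushes the final run
        if b:
            count += 1
        elif count:
            runs.append(count)
            count = 0
    lo = t_on_samples * (1 - tolerance)
    hi = t_on_samples * (1 + tolerance)
    return bool(runs) and all(lo <= r <= hi for r in runs)
```

An envelope whose H-level runs match Ton is judged a live input; runs shortened or stretched by interference with a recorded identification sound cause the authentication to be invalidated.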
According to the above processing, since it is difficult to play back a recording so that its recorded identification sound is aligned in timing with the identification sound currently output from the voice output unit 113, the user authentication can be judged invalid when a recorded voice is input.
In the above embodiment, a signal of a predetermined frequency F0 is output as the identification sound at predetermined intervals, but the invention is not limited to this. Other examples of user authentication using the identification sound will be described with reference to FIGS. 19 and 20. FIG. 19 shows examples of identification sound waveforms when a plurality of (two or more) timing patterns of the output period Ton and stop period Toff of the frequency F0 signal are provided in the identification sound setting process S431: (a) shows an output pattern in which the output period Ton and the stop period Toff are approximately equal, and (b) shows an identification sound output pattern in which the output period Ton and the stop period Toff differ. FIG. 20 shows examples of the frequency spectrum of the voice input when a plurality of (two or more) identification sound frequencies are provided: (a) shows the frequency spectrum when a voice actually uttered by the user is input, and (b) shows the frequency spectrum when a recorded voice is input.
As shown in FIGS. 19(a) and (b), by providing a plurality of (two or more) timing patterns of the output period Ton and stop period Toff of the frequency F0 signal in the identification sound setting process S431 and changing the pattern for each voice input, it is possible to determine whether a recorded voice was input by detecting differences in the timing of the period Ton and the stop period Toff between the output identification sound and the identification sound contained in the captured voice.
Alternatively, a plurality of (two or more) identification sound frequencies may be provided in the identification sound setting process S431 and changed for each voice input. When a voice that is not a recording is input, a spectrum corresponding to the frequency F0 of the identification signal output from the voice output unit 113 is detected, as shown in FIG. 20(a). In contrast, when a recorded voice is input, a spectrum corresponding to the frequency F0 of the identification signal output from the voice output unit 113 and a spectrum corresponding to the frequency F1 of the recorded identification signal are both detected, as shown in FIG. 20(b). Therefore, the frequency spectrum of the input audio signal may be analyzed, and if a spectrum at a frequency different from that of the identification signal output from the voice output unit 113 is detected, it may be determined that a recorded voice was input and the user authentication may be invalidated.
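This spectrum-based variant might be sketched as follows in Python (a single-bin correlation is used here in place of a full spectrum analysis; the function names, the candidate frequency set, and the 0.5 energy-ratio criterion are assumptions of the example):

```python
import math

def tone_energy(signal, freq_hz, sample_rate):
    """Energy of 'signal' at freq_hz via a single-bin DFT correlation."""
    re = sum(s * math.cos(2 * math.pi * freq_hz * n / sample_rate)
             for n, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * freq_hz * n / sample_rate)
             for n, s in enumerate(signal))
    return (re * re + im * im) / len(signal)

def replay_detected(signal, emitted_f0, candidate_freqs, sample_rate, ratio=0.5):
    """If an identification frequency other than the one emitted this session
    (e.g. F1 captured in a recorded session) carries comparable energy, judge
    that a recorded voice was input."""
    ref = tone_energy(signal, emitted_f0, sample_rate)
    return any(f != emitted_f0 and tone_energy(signal, f, sample_rate) > ratio * ref
               for f in candidate_freqs)
```

Because the emitted frequency changes with every voice input, a recording made during an earlier session carries the wrong identification frequency and is flagged.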
In short, it suffices to output an identification sound during voice input and to determine, from the identification sound contained in the captured audio signal, whether a recorded voice was input. The identification sound may be a signal at an audible frequency or at an inaudible frequency (ultrasound).
By the above processing, even when voice-based user authentication such as voiceprint authentication is used, the user authentication is judged invalid when a recorded voice is used as the input, so that the character string conversion is not performed.
<Fourth Embodiment>
The fourth embodiment prevents user authentication based on a face image from being passed by presenting a photograph of the registered user during voice input of a character string that requires user authentication. The processing flow of this embodiment will be described below with reference to FIGS. 21 to 23. FIG. 21 is a sequence diagram showing an example of character input processing by voice recognition according to the fourth embodiment. FIG. 22 is a flowchart showing the flow of the dictionary conversion input process according to the fourth embodiment. FIG. 23 illustrates the determination process based on lip movement in face authentication: (a) shows the movement of the lips, and (b) shows the changes of the x and y components of the lip movement over time.
In FIG. 21, the processing from S401 to S412 is the same as in FIG. 4, and its description is omitted. After the branch process S404, the imaging unit 114 starts capturing images (S441). Next, the user utters a voice (S442), the voice input unit 111 captures it (S443), and the imaging unit 114 ends the image capture (S444).
The voice conversion execution unit 1034 converts the captured voice into a recognized character string by voice recognition based on the voice recognition dictionary 104b1 (S445), and in the dictionary conversion input process (S446) described later, converts the recognized character string according to the character conversion dictionary selected by the user, using the converted result as the character string input.
FIG. 22 shows an example of a flowchart of the dictionary conversion input process S446. In FIG. 22, the same processing steps as in FIG. 14 are given the same numbers, and their description is omitted.
In the user authentication process S1406 of FIG. 22, the user authentication execution unit 1035 first performs user authentication by face authentication based on the images captured by the imaging unit 114 between S441 and S444 (S1406b1), and branches (S1406b2) based on authentication determination information indicating whether the user authentication is valid or invalid. If the face image stored as authentication information at voice dictionary registration differs from the images captured between S441 and S444 and the user authentication is therefore invalid (S1406b2/No), the voice conversion execution unit 1034 displays a message such as that the user authentication is invalid (S1408) and ends the process.
If the face image stored as authentication information at voice dictionary registration matches the images captured between S441 and S444 and the user authentication is valid (S1406b2/Yes), the user authentication execution unit 1035 performs a determination based on the mouth movement of the authenticated face in the images captured between S441 and S444 (S1406b3), and outputs authentication determination information indicating whether the user authentication is valid or invalid to the voice conversion execution unit 1034. The voice conversion execution unit 1034 then branches (S1407) based on the authentication determination information.
In the determination based on lip movement, the user authentication execution unit 1035 detects the movement of the lips of the recognized face from the images captured between S441 and S444. If lip movement is detected, the captured images are judged not to be of a photograph, and the user authentication is judged valid (S1407/Yes). If no lip movement can be detected, the captured images are judged to be of a photograph, and the user authentication based on the face image is judged invalid (S1407/No).
In the branch process S1407, if the authentication determination information indicates that the user authentication is invalid (S1407/No), the voice conversion execution unit 1034 displays a message such as that the user authentication is invalid (S1408) and ends the process. If the user authentication is valid (S1407/Yes), the voice conversion execution unit 1034 converts the character string according to the information of the character conversion dictionary (S1409) and ends the process.
In the determination process S1406b3 based on lip movement, the user authentication execution unit 1035 may judge not merely the presence or absence of lip movement but also the degree of lip opening corresponding to the character string being voice-input.
For example, as shown in FIG. 23(a), the user authentication execution unit 1035 detects the size of the lips, with X as the horizontal size and Y as the vertical size. When user authentication information (13a5 in FIG. 13) is registered in advance in association with the dictionary, for example with the special character information, a face image is captured while the voice recognition character string is voice-input. From the captured images, the user authentication execution unit 1035 detects the X and Y components of the lip size corresponding to the voice-input character string, as shown in FIG. 23(b), and stores the lip sizes X and Y corresponding to the character string as dictionary registration information together with the face-image-based user authentication information.
In the user authentication process (S1406), the user authentication execution unit 1035 may perform the determination in the lip-movement determination process S1406b3 by detecting the lip sizes X and Y corresponding to the voice-input character string from the images captured between S441 and S444 and comparing them with the lip sizes registered as dictionary information. This enables more accurate user authentication based on the face image.
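As a rough sketch of this lip-size comparison (the normalization, the tolerance value, and all names are illustrative assumptions; the per-frame lip measurements would come from the face recognition stage):

```python
def lips_match(captured, registered, tolerance=0.2):
    """'captured' and 'registered' are per-frame (X, Y) lip sizes along the
    utterance. A still photograph shows no change in mouth opening and is
    rejected; otherwise the scale-normalized trajectory is compared against
    the one stored at dictionary registration."""
    ys = [y for _, y in captured]
    if max(ys) - min(ys) < 1e-6:      # no lip movement -> photograph suspected
        return False
    if len(captured) != len(registered):
        return False                  # assumes frames were resampled to match
    def norm(series):
        # Normalize by the peak dimension so camera distance does not matter.
        peak = max(max(x, y) for x, y in series) or 1.0
        return [(x / peak, y / peak) for x, y in series]
    diffs = [abs(ax - bx) + abs(ay - by)
             for (ax, ay), (bx, by) in zip(norm(captured), norm(registered))]
    return sum(diffs) / len(diffs) <= tolerance
```

Normalizing by the peak dimension makes the comparison insensitive to the user's distance from the camera, while the trajectory shape still has to match the registered utterance.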
<Fifth Embodiment>
In this embodiment, voice input processing is performed using an information terminal device 2 that cooperates with the broadcast receiving device 1.
[Hardware Configuration of the Portable Information Terminal]
FIG. 24 is a block diagram showing an example of the internal configuration of the information terminal device 2. The information terminal device 2 comprises a system bus 200, a main control unit 201, a ROM 202, a RAM 203, a storage unit 204, an expansion I/F unit 205, an operation unit 206, a sensor unit 210, a communication processing unit 220, an image processing unit 230, and an audio processing unit 240.
The main control unit 201 is a microprocessor unit that controls the entire information terminal device 2. The system bus 200 is a data communication path for transmitting and receiving data between the main control unit 201 and each operation block in the information terminal device 2.
The ROM 202 is a memory that stores basic operation programs such as an operating system and other operation programs; a rewritable ROM such as an EEPROM or a flash ROM is used, for example. The RAM 203 serves as a work area when the basic operation program and other operation programs are executed. The ROM 202 and the RAM 203 may be integrated with the main control unit 201. Further, instead of the independent configuration shown in FIG. 24, the ROM 202 may use a partial storage area within the storage unit 204.
The storage unit 204 stores the operation programs and operation setting values of the information terminal device 2, personal information of the user of the information terminal device 2, and the like. It can also store operation programs downloaded from the network and various data created by those programs, as well as content such as moving images, still images, and audio downloaded from the network. All or part of the functions of the ROM 202 may be replaced by a partial area of the storage unit 204. The storage unit 204 must retain the stored information even when no power is supplied to the information terminal device 2 from the outside; therefore, devices such as a flash ROM, SSD, or HDD are used.
Each of the operation programs stored in the ROM 202 or the storage unit 204 can be updated and functionally extended by download processing from a server device on the external network 4.
The expansion I/F unit 205 is a group of interfaces for extending the functions of the information terminal device 2; in this embodiment, it comprises a video/audio I/F, a USB I/F, a memory I/F, and the like. The video/audio I/F performs input of video/audio signals from external video/audio output devices, output of video/audio signals to external video/audio input devices, and the like. The USB I/F connects to a PC or the like to transmit and receive data; a keyboard or other USB devices may also be connected. The memory I/F connects a memory card or other memory medium to transmit and receive data.
The operation unit 206 is an instruction input unit for inputting operation instructions to the information terminal device 2; in this embodiment, it is a touch panel arranged over the display unit 231. For example, it detects gestures such as a swipe, in which a finger touches the touch panel and then moves in a particular direction while remaining in contact, and a tap, in which a finger touches the touch panel and is quickly released, thereby enabling operation input to the information terminal device 2.
The sensor unit 210 is a group of sensors for detecting the state of the information terminal device 2; in this embodiment, it includes a GPS receiving unit 211, a gyro sensor 212, a geomagnetic sensor 213, an acceleration sensor 214, an illuminance sensor 215, and a proximity sensor 216. These sensors make it possible to detect the position, tilt, direction, and movement of the information terminal device 2, the ambient brightness, the proximity of surrounding objects, and so on. The information terminal device 2 may further include other sensors, such as a barometric pressure sensor.
The communication processing unit 220 includes a LAN communication unit 221, a mobile telephone network communication unit 222, and a Bluetooth communication unit 223. The LAN communication unit 221 is connected to the external network 4 via the router device 3 and transmits and receives data to and from server devices on the external network 4; the connection with the router device 3 is made wirelessly, for example by Wi-Fi (registered trademark). The mobile telephone network communication unit 222 performs telephone communication (calls) and data transmission and reception by wireless communication with base stations (not shown) of the mobile telephone network. The LAN communication unit 221, the mobile telephone network communication unit 222, and the Bluetooth communication unit 223 each comprise an encoding circuit, a decoding circuit, an antenna, and the like. The communication processing unit 220 may further include other communication units, such as an NFC communication unit or an infrared communication unit.
The image processing unit 230 includes a display unit 231, an image signal processing unit 232, a first image input unit 233, and a second image input unit 234. The display unit 231 is a display device such as a liquid crystal panel and presents the image data processed by the image signal processing unit 232 to the user of the information terminal device 2. The image signal processing unit 232 includes a video RAM (not shown); image data written to the video RAM is displayed on the display unit 231. The image signal processing unit 232 also performs format conversion and superimposition of menus and other OSD signals as necessary. The first image input unit 233 and the second image input unit 234 are camera units that capture image data of the surroundings and of objects by converting light received through a lens into electrical signals with an electronic device such as a CCD or CMOS sensor.
The audio processing unit 240 includes an audio output unit 241, an audio signal processing unit 242, and an audio input unit 243. The audio output unit 241 is a speaker and presents the audio signal processed by the audio signal processing unit 242 to the user of the information terminal device 2. The audio input unit 243 is a microphone that converts the user's voice and other sounds into audio data.
The information terminal device 2 may be a mobile phone, a smartphone, a tablet terminal, or the like, or it may be a PDA (Personal Digital Assistant) or a notebook PC (Personal Computer).
Note that the configuration example of the information terminal device 2 shown in FIG. 24 includes many components, such as the sensor unit 210, that are not essential to this embodiment; omitting them does not impair the effects of the embodiment. Conversely, functions not shown, such as a digital broadcast reception function or an electronic money payment function, may be added.
[Software configuration of the portable information terminal]
The software configuration of the information terminal device 2 of this embodiment will be described with reference to FIG. 25. FIG. 25 illustrates the internal configuration of the information terminal device 2 according to this embodiment: (a) is a software configuration diagram of the information terminal device 2, and (b) shows the dictionaries stored in the voice conversion information storage area. FIG. 25(a) shows the software configuration in the ROM 202, the RAM 203, and the storage unit 204.
The basic operation program 2021 stored in the ROM 202 is loaded into the RAM 203, and the main control unit 201 executes the loaded basic operation program, thereby constituting the basic operation execution unit 2031.
Likewise, the application program 2041, the voice conversion program 2042, and the linked device management program 2043 stored in the storage unit 204 are each loaded into the RAM 203, and the main control unit 201 executes the loaded programs, thereby constituting the application execution unit 2032, the voice conversion execution unit 2033, and the linked device management execution unit 2034. The RAM 203 also provides a temporary storage area that holds, as needed, data created while each operation program runs.
The storage unit 204 further includes a voice conversion information storage area 204a that stores dictionaries and other data for converting a speech-recognized character string into a predetermined character string, a linked device information storage area 204b that stores authentication information used for cooperative operation with the broadcast receiving device 1 and the like, and a miscellaneous information storage area 204c that stores various other information. As shown in FIG. 25(b), the voice conversion information storage area 204a stores a speech recognition dictionary 204a1 for recognizing input speech and converting it into a character string, together with a normal character conversion dictionary (first dictionary) 204a2 and a special character conversion dictionary (second dictionary) 204a3 for converting the recognized character string into a predetermined character string.
The normal character conversion dictionary (first dictionary) 204a2 and the special character conversion dictionary (second dictionary) 204a3 are the same as those of the first embodiment. When the information terminal device 2 identifies a single user, as a smartphone or mobile phone typically does, they can be configured as the normal and special character conversion dictionaries for that user. When the information terminal device 2 is shared by multiple users, as with a PC or tablet terminal, a normal character conversion dictionary and a special character conversion dictionary may be prepared for each user, and the voice conversion execution unit 2033 (described later) may use the device's login information to select the normal and special character conversion dictionaries corresponding to the logged-in user.
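The per-user dictionary selection described above can be sketched as follows. This is an illustrative example only: the table contents, user names, and function names are assumptions, not part of the patent.

```python
# Hypothetical per-user dictionary store on a shared device (PC/tablet).
# Keys are user IDs obtained from the device's login information.
USER_DICTIONARIES = {
    "alice": {
        "normal": {"weather": "weather forecast"},
        "special": {"open sesame": "X9!pass#31"},
    },
    "bob": {
        "normal": {"news": "latest news"},
        "special": {"open sesame": "qW7$secret"},
    },
}

def select_dictionaries(logged_in_user):
    """Return (normal_dict, special_dict) for the logged-in user."""
    entry = USER_DICTIONARIES.get(logged_in_user)
    if entry is None:
        # Unknown user: fall back to empty dictionaries (no conversion).
        return {}, {}
    return entry["normal"], entry["special"]
```

Because each user has an independent special dictionary, the same spoken phrase ("open sesame") expands to a different registered string per user.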
In the following, to simplify the description, the process in which the main control unit 201 loads the basic operation program 2021 stored in the ROM 202 into the RAM 203 and executes it to control each operation block is described as the basic operation execution unit 2031 controlling each operation block. The other operation programs are described in the same way.
The application execution unit 2032 executes various operation programs downloaded from a server device. Each application is launched when the user, via the operation unit 206, selects the application's launch icon displayed on the display unit 231.
The voice conversion execution unit 2033 recognizes the user's voice captured by the voice input unit 243 as a character string using the speech recognition dictionary 204a1, and either uses it as an operation input such as channel selection on the broadcast receiving device 1, or converts the recognized character string into a predetermined character string according to the dictionaries 204a2 and 204a3 and uses the result for character input.
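The conversion step performed by the voice conversion execution unit can be sketched minimally as a dictionary lookup on the recognized string. The speech-recognition step itself is stubbed out here, and the dictionary contents are illustrative assumptions.

```python
# Illustrative conversion dictionaries (contents are assumptions).
NORMAL_DICT = {"tokyo tower": "Tokyo Tower"}    # first dictionary: common phrases
SPECIAL_DICT = {"open sesame": "aB3$xYz!"}      # second dictionary: user-registered

def convert_recognized_string(recognized, dictionary):
    """Return the converted string for a recognized phrase, or the
    recognized string unchanged when the dictionary has no entry."""
    return dictionary.get(recognized, recognized)
```

A phrase registered in the selected dictionary is replaced by its target string; anything else passes through as the raw recognition result.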
The linked device management execution unit 2034 registers and manages the broadcast receiving device 1 connected to the home local network or to the external network 4 such as the Internet, and executes operations on the registered broadcast receiving device 1 from the information terminal device 2.
The RAM 203 also includes a temporary storage area 2035 that serves as the work area of the main control unit 201.
Each of the above operation programs may be stored in the ROM 202 and/or the storage unit 204 in advance at the time of product shipment, or may be acquired after shipment from a server device or the like on the external network 4 via the LAN communication unit 221 or the mobile telephone network communication unit 222. Each operation program may also be acquired from a memory card, an optical disc, or the like via the expansion I/F unit 205.
Next, the processing performed when the information terminal device 2 handles character input for the broadcast receiving device 1 will be described with reference to FIGS. 26 and 27. FIG. 26 is a sequence diagram showing an example of character input processing by speech recognition according to the fifth embodiment. FIG. 27 shows example screens of the information terminal device 2: (a) an application launch screen, (b) a device authentication screen, and (c) a character input screen.
In FIG. 26, the user first performs an operation input (S2601) to launch the application for operating the broadcast receiving device 1 and the information terminal device 2 in cooperation (S2602).
FIG. 27(a) shows an example of the screen on which an application is selected and launched on the information terminal device 2; buttons (icons) corresponding to the applications are displayed on the display unit 231. When the user taps the "TV link" button 231a1 to launch the application, the application for operating the broadcast receiving device 1 in cooperation with the information terminal device 2 starts.
Next, the main control unit 201 displays an authentication screen for device authentication on the display unit 231 (S2603), and the user selects the linked device through a selection input (S2604).
FIG. 27(b) shows an example of the device authentication screen, on which a device list 231b1, a selection frame 231b2, and an enter button 231b3 are displayed. The linked device management execution unit 2034 displays the device list 231b1. When a device connected to the network is found, the linked device management execution unit 2034 displays its name in the device list 231b1 regardless of whether it is authenticated or unauthenticated, and stores the found device name, together with its authenticated/unauthenticated status, in the linked device information storage area 204b of the storage unit 204. A device that was found in the past but could not be found this time can be displayed in a different color to distinguish it from the other devices. In the example of FIG. 27(b), "TV1 (broadcast receiving device 1)" and "TV2 (broadcast receiving device 2)" are shown in the device list 231b1, and TV1 is indicated as previously authenticated.
When the user taps the "TV1" entry in the device list 231b1, the selection frame 231b2 appears over that entry; tapping the enter button 231b3 then selects TV1 as the broadcast receiving device to link with.
When the user selects a linked device, the procedure for being authenticated by the broadcast receiving device 1 begins. The linked device management execution unit 2034 transmits authentication information, such as a user ID and password stored in advance in the linked device information storage area 204b, to the linked terminal management execution unit 1036 of the broadcast receiving device 1 via the LAN communication unit 221, the router device 3, and the LAN communication unit 117 (S2605).
The linked terminal management execution unit 1036 of the broadcast receiving device 1 performs authentication by comparing the authentication information stored in the linked terminal information storage area 104d with the authentication information transmitted from the linked device management execution unit 2034 of the information terminal device 2 (S2606), and returns the authentication result to the linked device management execution unit 2034 of the information terminal device 2 (S2607). If the authentication information matches, the linked terminal management execution unit 1036 authenticates the connection for the linked device management execution unit 2034. The information terminal device 2 can skip the authentication screen of FIG. 27(b) by storing the last authenticated device in the linked device information storage area 204b; if the authentication information does not match, the authentication screen of FIG. 27(b) may be displayed again.
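The credential comparison in step S2606 can be sketched as below. The storage layout, field names, and example credentials are assumptions; a constant-time comparison (`hmac.compare_digest`) is used as a common good practice, which the patent does not specify.

```python
import hmac

# Hypothetical stand-in for the linked-terminal information storage area:
# user IDs mapped to registered passwords.
STORED_CREDENTIALS = {"user01": "p@ssw0rd"}

def authenticate(user_id, password):
    """Compare transmitted credentials against the stored ones.
    Returns True only when both the user ID and password match."""
    stored = STORED_CREDENTIALS.get(user_id)
    if stored is None:
        return False
    # Constant-time comparison to avoid leaking match length via timing.
    return hmac.compare_digest(stored, password)
```

On a match the receiver would return an "authenticated" result (S2607); on a mismatch it would prompt the authentication screen again.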
Through the above processing, the information terminal device 2 is authenticated as a linked terminal of the broadcast receiving device 1, and the user can operate the broadcast receiving device 1 from the information terminal device 2.
Next, the broadcast receiving device 1 displays a character input screen (S2610) and sends a software keyboard launch request to the linked information terminal device 2 (S2611). The character input screen is displayed, for example, when an account name or password must be entered to log in to a server in order to stream or download content distributed from a server device via the external network 4.
The information terminal device 2 displays a character input screen on the display unit 231 (S2612). FIG. 27(c) shows an example of this screen, with a display frame 231c1 that shows the entered characters, character input keys 231c2, a send key 231c3 that transmits the entered characters to the broadcast receiving device 1, and a voice input key 231c4 for character input by speech recognition.
The main control unit 201 repeats the loop processing (S2613) until the send key 231c3 is selected.
In the loop processing S2613, the user first performs a key input by tapping a key displayed on the display unit 231 (S2621). The main control unit 201 then determines the key input state from the type of key tapped (S2622) and executes branch processing based on the result (S2623).
If the main control unit 201 determines from the key input state that the input came from a character input key 231c2, it displays the entered character in the display frame 231c1 (S2631).
If the main control unit 201 determines from the key input state that the voice input key was tapped once within a predetermined time, the voice input unit 243 captures the user's speech input (S2641, S2642), and the voice conversion execution unit 2033 converts the captured speech into a recognized character string by speech recognition using the speech recognition dictionary 204a1 (S2643).
Next, in the normal dictionary conversion and display processing, the voice conversion execution unit 2033 converts the recognized character string according to the normal character conversion dictionary (first dictionary) 204a2 and displays the result in the display frame 231c1 (S2644). The normal character conversion dictionary holds common words and phrases stored in advance in the ROM 202 and/or the storage unit 204 at the time of product shipment. Alternatively, it may be acquired after shipment from a server device or the like on the Internet 4 via the LAN communication unit 221 or the telephone network communication unit 222, or from a memory card, an optical disc, or the like via the expansion I/F unit 205. It may also be registered by the user in the same way as the dictionary registration processing described earlier with reference to FIGS. 5 and 7.
In the branch processing S2623, if the main control unit 201 determines from the key input state that the voice input key 231c4 was pressed twice within a predetermined time, the voice input unit 243 captures the user's speech input (S2651, S2652), the voice conversion execution unit 2033 converts the captured speech into a recognized character string by speech recognition using the speech recognition dictionary 204a1 (S2653), and special dictionary conversion and display processing is performed (S2654). In the special dictionary conversion and display processing S2654, the voice conversion execution unit 2033 converts the recognized character string according to the special character conversion dictionary (second dictionary) 204a3 registered by the user and displays the result in the display frame 231c1.
In the branch processing S2623, if the main control unit 201 determines from the key input state that the send key 231c3 was pressed, it transmits the character string displayed in the display frame 231c1 to the broadcast receiving device 1 (S2661) and ends the loop processing S2613. The broadcast receiving device 1 processes the character string transmitted from the information terminal device 2 as character input (S2614).
The registration processing for the normal character conversion dictionary and the special character conversion dictionary may be performed as in the first embodiment. For example, on the character input screen shown in FIG. 27(c), the dictionary registration processing may be launched when the user touches the voice input key 231c4 for a predetermined time or longer.
In this embodiment the information terminal device 2 itself stores the normal character conversion dictionary and the special character conversion dictionary. Therefore, even if a character string that a user registered on information terminal device A is spoken into another information terminal device B, it is not converted by the normal or special character conversion dictionary of device A. Accordingly, on an information terminal device that already performs user authentication at the time of use, such as a smartphone, user authentication at the time of voice input may be omitted. Although in this embodiment the broadcast receiving device 1 and the information terminal device 2 cooperate via the LAN communication units 117 and 221 and the router device 3, they may instead cooperate via a communication method such as Bluetooth.
In the above embodiment, the password itself is never spoken aloud when entering it, so the password is not disclosed even if someone overhears the input.
Also, a complex character string can be entered by speaking a simple one.
Furthermore, by switching the character conversion dictionary between a first operation input (for example, pressing the voice input key once within a predetermined time) and a second operation input (for example, pressing it twice within a predetermined time), the system prevents a phrase that happens to match a password trigger, spoken during ordinary keyword input, from being accidentally converted into the password string and thereby revealing the password.
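The dictionary-switching branch described above (single tap selects the normal dictionary, double tap selects the special dictionary) can be sketched as follows. The dictionary contents are illustrative, and the recognized string is assumed to come from the speech recognizer.

```python
# Illustrative dictionaries (contents are assumptions, not from the patent).
NORMAL_DICT = {"weather": "weather forecast"}   # first dictionary
SPECIAL_DICT = {"open sesame": "K9#vQ2!z"}      # second dictionary (user-registered)

def handle_voice_input(recognized, tap_count):
    """Route the recognized string through the dictionary selected by
    the tap count on the voice input key: 1 tap -> normal dictionary,
    2 taps -> special dictionary."""
    if tap_count == 1:
        return NORMAL_DICT.get(recognized, recognized)
    if tap_count == 2:
        return SPECIAL_DICT.get(recognized, recognized)
    raise ValueError("unsupported tap count")
```

Note how the password trigger phrase passes through unchanged under the first operation, so speaking it during ordinary keyword input never exposes the password string.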
Although the above embodiment describes character input by Japanese speech, for character input by, for example, English speech, the first operation input may use the speech recognition result directly as the character input, while the second operation input uses the result of converting the recognized character string with the special character conversion dictionary. In other words, it suffices that a character string recognized under a predetermined operation input (for example, pressing the voice input key twice within a predetermined time) is converted on the basis of a character conversion dictionary.
In the first to fourth embodiments, speech is input through the voice input unit 111 of the broadcast receiving device 1, but this is not limiting: the remote control 120 may be provided with a voice input unit, and speech may be input by transmitting it to the broadcast receiving device 1 via a communication method such as Bluetooth. Alternatively, speech may be input through the voice input unit 243 of the information terminal device 2 and the captured speech data transmitted to the broadcast receiving device 1 via the LAN communication unit 221, the router device 3, and the LAN communication unit 117; or the broadcast receiving device 1 and the information terminal device 2 may each be provided with a communication unit such as Bluetooth, and speech input performed by transmitting speech data from the information terminal device 2 to the broadcast receiving device 1.
Although speech recognition is performed in the broadcast receiving device 1 or the information terminal device 2, the captured speech data may instead be sent to a server device connected to the network, and the speech recognition performed on the server device. In that case the broadcast receiving device 1 or the information terminal device 2 receives the character string information recognized by the server device and converts it using the normal or special character conversion dictionary.
In the above description, a first operation and a second operation for selecting the normal and special character conversion dictionaries are defined in advance, and the dictionary is switched according to which operation the user performs. The dictionary may instead be switched not by a user operation but based on the descriptive terms of the program. For example, when the screen is described in a markup language, the voice conversion execution unit 1034 or the dictionary registration execution unit 1037 can determine from the tag information of the description language the type of a text input field, for example whether it is a user-password field or a search-keyword field. When an input field that should receive a special character string, such as a user-password field, is selected as the destination of character input, the special character conversion dictionary may be selected as the dictionary for converting the recognized character string; when a keyword field is selected, the normal character conversion dictionary may be selected. The user can thus obtain conversion to a normal or special character string merely by selecting the input field (for example, by placing the cursor in it), without performing any dictionary selection operation.
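The field-type-driven variant can be sketched as below. The patent only says the field type is judged from the description language's tag information; mapping it to an HTML-style `type` attribute ("password" vs. "text") is an assumption for illustration, as are the dictionary contents.

```python
# Illustrative dictionaries (contents are assumptions).
NORMAL_DICT = {"weather": "weather forecast"}
SPECIAL_DICT = {"open sesame": "K9#vQ2!z"}

def dictionary_for_field(field_type):
    """Select the conversion dictionary from the input field's type,
    e.g. an HTML-like type attribute: password fields get the special
    dictionary, all other fields the normal one."""
    return SPECIAL_DICT if field_type == "password" else NORMAL_DICT

def convert_for_field(recognized, field_type):
    """Convert a recognized string using the field-selected dictionary."""
    dictionary = dictionary_for_field(field_type)
    return dictionary.get(recognized, recognized)
```

With this design the dictionary switch is implicit in the cursor position: the same spoken phrase expands to the password string only inside a password field.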
Examples of embodiments of the present invention have been described above using the first to fifth embodiments, but it goes without saying that configurations realizing the technology of the present invention are not limited to these embodiments, and various modifications are conceivable. For example, part of the configuration of one embodiment can be replaced with that of another, and the configuration of another embodiment can be added to that of one embodiment. All of these belong to the scope of the present invention. The numerical values, messages, and the like appearing in the text and drawings are also merely examples, and using different ones does not impair the effects of the present invention.
Some or all of the functions and the like of the present invention described above may be realized in hardware, for example by designing them as an integrated circuit. They may also be realized in software, by having a microprocessor unit or the like interpret and execute operation programs that implement the respective functions. Hardware and software may be used together.
The control lines and information lines shown in the drawings are those considered necessary for the explanation, and do not necessarily represent all of the control lines and information lines in a product. In practice, almost all of the components may be considered to be interconnected.
1: broadcast receiving device, 2: information terminal device, 3: router device, 4: external network, 101: main control unit, 102: ROM, 103: RAM, 104: storage unit, 111: audio input unit, 113: audio output unit, 114: imaging unit, 116: LAN communication unit, 120: remote controller, 1034: voice conversion execution unit, 1035: user authentication execution unit, 1036: linked terminal management unit, 201: main control unit, 202: ROM, 203: RAM, 204: storage unit, 243: audio input unit, 2033: voice conversion execution unit, 2034: linked device management execution unit
Claims (8)
- A voice conversion device comprising:
a special character conversion dictionary unit that stores special character conversion dictionary information associating a phonetic character with a special character including at least one of a character and a symbol having a reading different from the reading of the phonetic character; and
a voice conversion execution unit that converts voice information uttered by a user into recognized characters consisting of phonetic characters representing the reading of the voice information, and converts the recognized characters into the special character by referring to the special character conversion dictionary information.
- The voice conversion device according to claim 1, further comprising:
a normal character conversion dictionary unit that stores normal character conversion dictionary information associating the phonetic character with a normal character including at least one of an ideographic character and a symbol having the same reading as the phonetic character; and
a dictionary selection operation unit that accepts an operation for selecting the special character conversion dictionary unit or the normal character conversion dictionary unit,
wherein the voice conversion execution unit converts the recognized characters into the special character by referring to the special character conversion dictionary information when the special character conversion dictionary is selected, and converts the recognized characters into the normal character by referring to the normal character conversion dictionary information when the normal character conversion dictionary is selected.
- The voice conversion device according to claim 1, further comprising a user authentication execution unit that authenticates the user,
wherein the voice conversion execution unit converts the recognized characters into the special character string when the authentication by the user authentication execution unit is valid.
- The voice conversion device according to claim 3, wherein the special character conversion dictionary unit is provided for each user, and
when the authentication by the user authentication execution unit is valid, the voice conversion execution unit converts the recognized characters into the special character by referring to the special character conversion dictionary information associated with the authenticated user.
- The voice conversion device according to claim 3, further comprising a voice input unit that receives input of the voice information uttered by the user,
wherein the voice conversion execution unit outputs an identification sound to be used for user authentication when receiving the input of the voice information, and performs user authentication based on a comparison between the identification sound included in the voice information input from the voice input unit and the identification sound output by the voice conversion execution unit.
- The voice conversion device according to claim 3, further comprising:
an imaging unit that captures a face image of the user; and
a user authentication information storage unit that stores user authentication information associating user identification information uniquely identifying the user with a face image captured when the user uttered a predetermined voice,
wherein the imaging unit captures a face image when the user utters the predetermined voice, and the user authentication execution unit performs user authentication based on a comparison between the face image acquired from the imaging unit and the face image stored in the user authentication information storage unit.
- A voice conversion method comprising:
a step of converting voice information uttered by a user into recognized characters consisting of phonetic characters representing the reading of the voice information; and
a step of converting the recognized characters into a special character by referring to special character conversion dictionary information that stores, in association with each other, the phonetic character and the special character including at least one of a character and a symbol having a reading different from the reading of the phonetic character.
- A voice conversion program causing a computer to execute:
a step of converting voice information uttered by a user into recognized characters consisting of phonetic characters representing the reading of the voice information; and
a step of converting the recognized characters into a special character by referring to special character conversion dictionary information that stores, in association with each other, the phonetic character and the special character including at least one of a character and a symbol having a reading different from the reading of the phonetic character.
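The authentication-gated, per-user conversion of claims 3 and 4 can be sketched as follows. This is a hypothetical illustration under assumed names and data: the dictionary contents and the way authentication state is passed in are inventions for the example, not the claimed implementation.

```python
# Illustrative sketch of claims 3 and 4: conversion to the special character
# string is performed only when user authentication is valid, and each user
# has their own special character conversion dictionary. All data is made up.

PER_USER_SPECIAL_DICT = {
    "user_a": {"ひらけごま": "opn_sesame!7"},
    "user_b": {"ひらけごま": "S3same*open"},
}

def convert_special(recognized: str, user_id: str, authenticated: bool) -> str:
    """Convert a recognized phonetic string using the authenticated user's
    special dictionary; without valid authentication, return the phonetic
    string unchanged so the secret special string is never revealed."""
    if not authenticated:
        return recognized
    dictionary = PER_USER_SPECIAL_DICT.get(user_id, {})
    return dictionary.get(recognized, recognized)
```

Note that the same spoken reading yields a different special string per user, which is the point of providing the dictionary unit for each user.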
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2014/080103 WO2016075794A1 (en) | 2014-11-13 | 2014-11-13 | Voice conversion device, voice conversion method, and voice conversion program |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2016075794A1 true WO2016075794A1 (en) | 2016-05-19 |
Family
ID=55953907
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2014/080103 WO2016075794A1 (en) | 2014-11-13 | 2014-11-13 | Voice conversion device, voice conversion method, and voice conversion program |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2016075794A1 (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS6470865A (en) * | 1987-09-11 | 1989-03-16 | Brother Ind Ltd | Language processor based upon voice input |
JPH05290033A (en) * | 1992-04-07 | 1993-11-05 | Nec Off Syst Ltd | Japanese language input device |
JPH1115823A (en) * | 1997-06-26 | 1999-01-22 | Tokyo Gas Co Ltd | Method for setting simply transmission destination address in communication program of electronic mail or the like |
JP2013105440A (en) * | 2011-11-16 | 2013-05-30 | Ntt Docomo Inc | Character input device and character input program |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11823304B2 (en) | Picture display device, and setting modification method and setting modification program therefor | |
US20150310856A1 (en) | Speech recognition apparatus, speech recognition method, and television set | |
US9928030B2 (en) | Speech retrieval device, speech retrieval method, and display device | |
EP3089157B1 (en) | Voice recognition processing device, voice recognition processing method, and display device | |
KR20140055502A (en) | Broadcast receiving apparatus, server and control method thereof | |
CN103947220A (en) | Display device and method for providing content using the same | |
US20150052169A1 (en) | Method, electronic device, and computer program product | |
CN102566895A (en) | Electronic device and method for providing menu using the same | |
US20150382070A1 (en) | Method, electronic device, and computer program product | |
KR20150034956A (en) | Method for recognizing content, Display apparatus and Content recognition system thereof | |
KR20120083025A (en) | Multimedia device for providing voice recognition service by using at least two of database and the method for controlling the same | |
JP7538273B2 (en) | Video display device | |
KR102511385B1 (en) | Display device | |
WO2016075794A1 (en) | Voice conversion device, voice conversion method, and voice conversion program | |
JP2016029495A (en) | Image display device and image display method | |
KR20130080380A (en) | Electronic apparatus and method for controlling electronic apparatus thereof | |
JP6100328B2 (en) | Video display device | |
KR20130071148A (en) | Method for operating an image display apparatus | |
JP6423470B2 (en) | Video display device | |
CN107948696A (en) | A kind of set-top box text entry method, system and set-top box | |
JP2024160320A (en) | Video display device | |
JP2009037433A (en) | Number voice browser and method for controlling number voice browser | |
JP2013045139A (en) | Network terminal system, terminal device, display device and information retrieval method in network terminal system | |
JP2017060059A (en) | Control program, storage medium, portable communication equipment, program-related information provision device, and program-related information display method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 14905932; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| NENP | Non-entry into the national phase | Ref country code: JP |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 14905932; Country of ref document: EP; Kind code of ref document: A1 |