WO2013190963A1 - Voice response device - Google Patents

Voice response device

Info

Publication number
WO2013190963A1
Authority
WO
WIPO (PCT)
Prior art keywords
voice
response
user
information
voice response
Prior art date
Application number
PCT/JP2013/064918
Other languages
French (fr)
Japanese (ja)
Inventor
勉 足立
丈誠 横井
林 茂
健純 近藤
辰美 黒田
大介 毛利
豪生 野澤
謙史 竹中
毅 川西
健司 水野
博司 前川
岩田 誠
Original Assignee
エイディシーテクノロジー株式会社
Priority date
Filing date
Publication date
Application filed by エイディシーテクノロジー株式会社
Priority to JP2014521255A (JP6267636B2)
Publication of WO2013190963A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16Sound input; Sound output
    • G06F3/167Audio in a user interface, e.g. using voice commands for navigating, audio feedback

Definitions

  • This international application claims priority based on Japanese Patent Application No. 2012-137065, Japanese Patent Application No. 2012-137066, and Japanese Patent Application No. 2012-137067, filed with the Japan Patent Office on June 18, 2012.
  • The contents of Japanese Patent Application Nos. 2012-137065, 2012-137066, and 2012-137067 are incorporated into the present international application by reference.
  • The present invention relates to a voice response device that responds by voice to input character information.
  • One object of the present invention is to improve usability for the user in a voice response device that responds by voice to input character information.
  • One aspect of the present invention is a voice response device that responds by voice to input character information, comprising: response acquisition means for acquiring a plurality of different responses to the character information; and voice output means for outputting the plurality of different responses in different voice colors.
  • According to such a voice response device, a plurality of responses can be output in different voice colors, so even when a single answer cannot be determined for a given piece of character information, different answers can be output in an easily distinguishable way using different voice colors. Usability for the user can therefore be improved.
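  • As a minimal sketch of this idea (illustrative, not from the patent itself; the voice colors and phrases are assumed examples), the following Python snippet pairs each candidate response with its own voice color and outputs them in turn, with a print call standing in for a real text-to-speech engine:

```python
from dataclasses import dataclass

@dataclass
class Response:
    text: str         # response content (character information)
    voice_color: str  # identifies a TTS voice, e.g. "woman1", "man1"

def speak(response: Response) -> None:
    # Stand-in for a real TTS call: a production system would pass
    # response.voice_color to the synthesizer to select the voice.
    print(f"[{response.voice_color}] {response.text}")

def output_responses(responses: list[Response]) -> None:
    # Output every candidate answer in its own voice color, so the
    # user can tell the alternative answers apart by ear.
    for r in responses:
        speak(r)

output_responses([
    Response("Today's weather in Tokyo is sunny.", "woman1"),
    Response("However, it will rain tomorrow.", "man1"),
])
```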
  • the voice response device of the present invention may be configured as a terminal device possessed by a user, or may be configured as a server that communicates with the terminal device.
  • The character information may be input using input means such as a keyboard, or may be input by converting voice into character information.
  • The voice response device may comprise voice input means for the user to input voice, and voice transmitting means for transmitting the input voice to an external device that converts the voice into character information, generates a plurality of different responses to the character information, and transmits the responses back to the voice response device. The response acquisition means may then acquire the responses from the external device.
  • According to such a voice response device, voice can be input, so character information can be entered by voice. Moreover, since the responses can be generated by the external device, the configuration of the voice response device itself can be simplified.
  • The operation of converting input voice into character information may be performed by either the voice response device or the external device.
  • The voice response device or the external device may include response recording means in which, for each of a plurality of pieces of character information, a plurality of different responses including a positive response and a negative response are recorded;
  • the response acquisition means may acquire the positive response and the negative response as the plurality of different responses; and
  • the voice output means may reproduce the positive response and the negative response in different voice colors.
  • In this way, responses taking different positions, such as a positive response and a negative response, are reproduced in different voice colors, as if different people were speaking. This makes it less likely that the user listening to the voice feels uncomfortable.
  • The voice color may be changed according to the type of response or the language used in the response. For example, a response with a gentle tone may be reproduced in a calm female voice, and a response with a severe tone in a bold male voice. That is, response content may be associated with a personality, and the voice color set according to that personality.
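  • The association between tone and voice color could be as simple as a lookup table; a hedged sketch (tone names and voices are invented for illustration):

```python
# Hypothetical mapping from response tone/personality to a voice color.
VOICE_BY_TONE = {
    "gentle": "calm_female",
    "severe": "bold_male",
    "neutral": "default",
}

def pick_voice(tone: str) -> str:
    # Fall back to the default voice for tones with no explicit persona.
    return VOICE_BY_TONE.get(tone, VOICE_BY_TONE["neutral"])
```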
  • The voice response device can be configured for use at a workplace or company reception desk, as in the invention of the fourth aspect, or can be configured to convey, on the user's behalf, things that are difficult to say to someone directly.
  • For example, the names and company names of salespeople are recorded in advance in the voice response device or an external device, and when a visitor at the reception desk gives one of these names or company names, a response may be generated that reproduces a polite phrase of refusal.
  • In the latter configuration, the voice response device may speak (reproduce the voice) in the user's place.
  • In this case, the response need not be output immediately; it may be output when a reproduction condition is satisfied, for example after a certain time has elapsed.
  • the external device or the voice response device may acquire information for generating a response to the character information from another voice response device.
  • As in the invention of the sixth aspect, when another voice response device requests information for generating a response to character information, the voice response device may return the information corresponding to the request.
  • In this case, the voice response device may include sensors for detecting position information, temperature, humidity, illuminance, noise level, and the like, as well as databases such as dictionary information, and may extract the necessary information according to the request.
  • Such a voice response device can acquire information for generating a response from another voice response device.
  • information unique to the other voice response device such as the position of the other voice response device can be acquired.
  • information unique to itself can be transmitted to another voice response device.
  • A response (for example, a positive response or a negative response) output by the device itself or by another voice response device may in turn be input as character information, and a response to that response may then be generated.
  • This configuration can be realized using one voice response device or a plurality of voice response devices.
  • When a plurality of devices are used, voice may be exchanged directly as sound, or wireless communication or the like may be used.
  • Another aspect of the present invention is a voice response device that responds by voice to input character information, comprising: personality information acquisition means for acquiring personality information that classifies, according to preset categories, the personality of the user or of a person related to the user; response acquisition means for acquiring response candidates representing a plurality of different responses to the character information; and voice output means for selecting the response to be output from the response candidates according to the personality information, and outputting the selected response.
  • According to such a voice response device, different responses can be made according to the personality of the user or of a person related to the user (a related person), so usability for the user can be improved.
  • The voice response device may comprise first personality information generating means for generating personality information of the user or the related person based on answers to a plurality of preset questions;
  • the personality information acquisition means may then acquire the personality information generated by the first personality information generating means.
  • personality information can be generated in the voice response device.
  • For the questions, a well-known personality analysis technique (Rorschach test, Szondi test, etc.) may be used.
  • Alternatively, aptitude test technology used in corporate employment examinations and the like may be used.
  • The voice response device may also comprise second personality information generating means for generating personality information of the user or the related person based on character strings included in the input character information;
  • the personality information acquisition means may then acquire the personality information generated by the second personality information generating means.
  • The voice response device may further comprise preference information generating means for generating preference information indicating the preference tendencies of the user or the related person based on character strings included in the character information;
  • the voice output means may then select the response to be output from the response candidates based on the preference information, and output the selected response.
  • In this case, a response can be made according to the preferences of the user or the related person. Further, as in the invention of the twelfth aspect, the voice response device may learn (record and analyze) the user's behavior (conversation, places visited, what appears on the camera) and use this to compensate for missing words in the user's conversation.
  • Response candidate acquisition means for acquiring response candidates from a predetermined server or from the Internet may be provided.
  • response candidates can be acquired not only from the device itself or an external device, but also from any device connected via the Internet or a dedicated line.
  • Character information generation means for converting the user's actions into character information may be provided.
  • An action referred to in the present invention corresponds to an action produced by muscle movement, such as conversation, handwriting, or gesture (for example, sign language).
  • the user's action can be converted into character information.
  • The character information generation means may convert the voice of the user's utterances into character information, and accumulate the user's utterance characteristics (such as pronunciation habits) as learning information (capturing and recording these characteristics).
  • the character information can be generated based on the learning information, so that the generation accuracy of the character information can be improved.
  • Transfer means for transferring the learning information to another voice response device may be provided.
  • In this case, learning information recorded by one voice response device can be reused, so the generation accuracy of character information can be improved even when another voice response device is used.
  • At least one of the user's behavior and operations may be detected, and learning information or personality information may be generated based on it.
  • For example, when such a voice response device detects that the user has had to rush to catch the train for several days in a row, it can urge the user to leave the house a few minutes earlier from the next day; or, when it detects from conversation that the user tends to get angry easily, it can output voice or music that calms that mood.
  • a response can be generated based on information recorded in another voice response device.
  • Reproduction condition determination means for determining whether the state of the voice response device matches a reproduction condition, set in advance as a condition for outputting voice when no character information is input, and message reproduction means for outputting a preset message when the reproduction condition is satisfied, may be provided.
  • In this case, voice can be output even when no character information is input (that is, when the user has not spoken). For example, by prompting the user to speak, this can serve as a countermeasure against drowsiness while driving a car. It can also be used for safety confirmation, by checking whether a person living alone responds.
  • The message reproduction means may acquire news information and output a message related to the news in question format, inviting an answer from the user.
  • Since such a voice response device can hold a conversation about the news, the conversation can be prevented from always being the same. For example, if information about a company's stock price can be acquired, the conversation content can be: "Today, the stock price of XX company rose by XX yen. Did you know?"
  • The voice output means or the message reproduction means may output a preset message with separately acquired external information (news, or environmental information such as temperature, weather, and position) added to it.
  • a response in which a predetermined message and the acquired information are combined can be output.
  • A plurality of messages may be acquired, and the message to be reproduced may be selected and output according to each message's reproduction frequency.
  • Such a voice response device can make frequently reproduced messages less likely to be selected, adding randomness to message reproduction, or it can intentionally reproduce a frequently reproduced message again and again to call attention to it and promote memory retention.
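  • One way to realize such frequency-dependent selection, as a sketch (the inverse weighting scheme is an assumption, not specified here), is to weight each message by the inverse of how often it has already been reproduced:

```python
import random

def pick_message(messages: dict[str, int]) -> str:
    """messages maps message text -> times already reproduced.

    Weighting each message by 1/(1+count) makes frequently played
    messages less likely to be chosen, as described above.
    """
    texts = list(messages)
    weights = [1.0 / (1 + messages[t]) for t in texts]
    choice = random.choices(texts, weights=weights, k=1)[0]
    messages[choice] += 1  # update the reproduction frequency
    return choice

history = {"Good morning.": 5, "Did you see the news?": 1}
print(pick_message(history))
```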
  • Non-response transmission means may be provided that, when no reply or response to a message is obtained, transmits to a preset contact address information identifying the user and the fact that no reply was obtained.
  • The message reproduction means may store conversation content and ask questions whose expected answers match what was previously heard (memory confirmation processing).
  • In this way, the user's memory can be checked and consolidated.
  • Utterance accuracy detection means for detecting the accuracy of the pronunciation and accent of the voice input by the user, and accuracy output means for outputting the detected accuracy, may be provided.
  • The accuracy output means may output a voice including the closest matching word when the accuracy is at or below a predetermined value.
  • In this way, the user can check the accuracy of their pronunciation and accent. Furthermore, as in the invention of the twenty-seventh aspect, the message reproduction means may output the same question again when the accuracy falls below a certain value.
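  • A toy illustration of outputting the closest word when accuracy is low (string similarity stands in for acoustic pronunciation scoring, which is left unspecified here; the vocabulary and threshold are assumptions):

```python
import difflib

VOCABULARY = ["weather", "whether", "wither"]  # illustrative word list

def check_utterance(recognized: str, intended: str) -> str:
    # Stand-in accuracy score: string similarity between what the
    # recognizer heard and the intended word (0.0 - 1.0).
    accuracy = difflib.SequenceMatcher(None, recognized, intended).ratio()
    if accuracy <= 0.8:  # predetermined threshold (assumed value)
        nearest = difflib.get_close_matches(recognized, VOCABULARY, n=1)
        hint = nearest[0] if nearest else intended
        return f"Accuracy {accuracy:.2f}. Did you mean '{hint}'?"
    return f"Accuracy {accuracy:.2f}. Well pronounced."

print(check_utterance("wizzer", "weather"))
```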
  • Connection control means may distinguish between sales visits and ordinary visitors, and reproduce a message of refusal when the visit is a sales activity.
  • A keyword included in the input character information may be extracted, and a connection made to the destination corresponding to that keyword.
  • a keyword such as the name of the other party may be associated with the connection destination in advance.
  • Such a voice response device can assist with operations such as telephone transfer and call reception.
  • As in the invention of the 31st aspect, the voice response device may recognize the business the other party states based on keywords, and convey an outline of what the other party said to the user.
  • Emotion determination means may be provided that reads the emotion from the voice color of the voice input by the user and outputs which of a set of emotion categories, including at least one of normal, anger, joy, confusion, sadness, and elation, the emotion falls into.
  • The invention of the 33rd aspect comprises: response generation means for generating a response according to a captured image of the surroundings of the voice response device at the time the character information is input; and voice output means for outputting the response by voice.
  • In this case, a response corresponding to the captured image can be output by voice, so usability can be improved compared with a configuration that generates a response from character information alone.
  • Voice input video acquisition means for acquiring a moving image of the shape of the user's mouth while character information is being input by voice, and character information conversion means for converting the voice into character information and correcting it by estimating unclear parts of the voice from the moving image, may be provided.
  • the utterance content can be estimated from the shape of the mouth, so that an unclear part of the voice can be estimated well.
  • The message reproduction means may detect the user's irritation or agitation from involuntarily uttered sounds, and generate a message for calming that irritation or agitation.
  • When providing guidance to a destination, the voice response device may include route information acquisition means for acquiring route information such as the weather, temperature, humidity, traffic information, and road surface conditions along the way to the destination, and the message reproduction means may output the route information by voice.
  • Gaze detection means for detecting the user's line of sight, and gaze movement request means that outputs a voice requesting the user to move their line of sight to a predetermined position when the line of sight does not move there in response to a call from the message reproduction means, may be provided.
  • Similarly, change request means may be provided that observes the positions of the user's body parts and facial expression, and outputs a voice requesting a change when they change little in response to a call.
  • In this way, the user can be guided to move a body part to a specific position or to assume a specific facial expression.
  • the present invention can be used when driving a vehicle or performing a physical examination.
  • Broadcast program acquisition means for acquiring a broadcast program similar to the one the user is viewing, and broadcast program supplementing means for compensating for an interrupted broadcast program by outputting the acquired program in its place, may be provided.
  • Such a voice response device can compensate when the broadcast program the user is viewing is interrupted.
  • As in the invention of the forty-first aspect, the voice response device may comprise means that, when the user sings a song without the lyrics, compares the singing with the version of the song that has lyrics, and supplements by voice the lyrics in only those parts where the user's own lyrics are missing.
  • Such a voice response device can fill in the portions the user cannot sing (where the lyrics break off) in so-called karaoke. Furthermore, as in the invention of the forty-second aspect, reading output means may be provided that, when a written character appears in the captured image and the user asks how to read it, acquires information on the character from outside and outputs the reading contained in that information by voice.
  • the user can be taught how to read characters.
  • As in the invention of the 43rd aspect, the voice response device may be equipped with behavior/environment detection means for detecting the user's behavior and surrounding environment, and the message generation means may generate a message according to the detected behavior and environment.
  • the health condition of the user can be managed.
  • Such a voice response device can make a report when the user's health state falls to or below a reference value, so an abnormality can be reported to another person sooner. Further, as in the invention of the 46th aspect, information about the user may be output in response to an inquiry from a person other than the user.
  • Such a voice response device can answer questions at a hospital or the like on the user's behalf by detecting, for example, the user's meal content and walking distance. It may also be made to learn about the user's health condition and self-introduction.
  • FIG. 1 is a block diagram showing the schematic configuration of a voice response system to which the present invention is applied. FIG. 2 is a block diagram showing the schematic configuration of the terminal device. FIG. 3 is a flowchart showing the voice response terminal process performed by the MPU of the terminal device. FIG. 4 is a flowchart showing the voice response server process performed by the calculation unit of the server. FIG. 5 is an explanatory diagram showing an example of the response candidate DB. FIG. 6 is a flowchart showing the automatic conversation terminal process performed by the MPU of the terminal device. FIG. 7 is a flowchart showing the automatic conversation server process performed by the calculation unit of the server. FIG. 8 is a flowchart showing the message terminal process performed by the MPU of the terminal device.
  • REFERENCE SYMBOLS: 1 ... terminal device, 10 ... behavior sensor unit, 11 ... three-dimensional acceleration sensor, 13 ... three-axis gyro sensor, 15 ... temperature sensor, 17 ... humidity sensor, 19 ... temperature sensor, 21 ... humidity sensor, 23 ... illuminance sensor, 25 ... wetness sensor, 27 ... GPS receiver, 29 ... wind speed sensor, 33 ... electrocardiographic sensor, 35 ... heart sound sensor, 37 ... microphone, 39 ... memory, 41 ... camera, 50 ... communication unit, 53 ... wireless telephone unit, 55 ... contact memory, 60 ... notification unit, 61 ... display, 63 ... illumination, 65 ... speaker, 70 ... operation unit, 71 ... touch pad, 73 ... confirmation button, 75 ... fingerprint sensor
  • The voice response system 100 to which the present invention is applied is configured so that, for voice input at a terminal device 1, an appropriate response is generated at the server 90 and output by voice at the terminal device 1. Specifically, as shown in FIG. 1, the voice response system 100 is configured such that a plurality of terminal devices 1 and the server 90 can communicate with each other via communication base stations 80 and the Internet network 85.
  • the server 90 has a function as a normal server device.
  • the server 90 includes a calculation unit 101 and various databases (DB).
  • The calculation unit 101 is configured as a well-known computing device including a CPU and memory such as ROM and RAM. Based on programs in the memory, the calculation unit 101 communicates with the terminal devices 1 and the like via the Internet network 85, reads and writes data in the various DBs, and performs various processes such as voice recognition and response generation for conversing with the user of a terminal device 1.
  • As the various DBs, as shown in FIG. 1, there are a speech recognition DB 102, a predictive conversion DB 103, a speech DB 104, a response candidate DB 105, a personality DB 106, a learning DB 107, a preference DB 108, a news DB 109, a weather DB 110, a reproduction condition DB 111, a handwritten character/sign language DB 112, a terminal information DB 113, an emotion determination DB 114, a health determination DB 115, a karaoke DB 116, a report destination DB 117, a sales DB 118, a client DB 119, and the like.
  • The details of these DBs will be described as each process is described.
  • the terminal device 1 includes a behavior sensor unit 10, a communication unit 50, a notification unit 60, and an operation unit 70 provided in a predetermined housing.
  • the behavior sensor unit 10 includes a well-known MPU 31 (microprocessor unit), a memory 39 such as a ROM and a RAM, and various sensors.
  • The MPU 31 controls the sensor elements constituting the various sensors so that their inspection targets (humidity, wind speed, etc.) can be detected satisfactorily; for example, it performs processing such as driving a heater to bring a sensor element to its optimum temperature.
  • The behavior sensor unit 10 includes, as its various sensors, a three-dimensional acceleration sensor 11 (3DG sensor), a three-axis gyro sensor 13, a temperature sensor 15 and a humidity sensor 17 disposed on the back surface of the housing, a temperature sensor 19 and a humidity sensor 21, an illuminance sensor 23, a wetness sensor 25, a GPS receiver 27 that detects the current location of the terminal device 1, and a wind speed sensor 29.
  • the behavior sensor unit 10 also includes an electrocardiogram sensor 33, a heart sound sensor 35, a microphone 37, and a camera 41 as various sensors.
  • the temperature sensors 15 and 19 and the humidity sensors 17 and 21 measure the temperature or humidity of the outside air of the housing as an inspection target.
  • The three-dimensional acceleration sensor 11 detects the accelerations applied to the terminal device 1 in three orthogonal directions (the vertical direction (Z direction), the width direction of the casing (Y direction), and the thickness direction of the casing (X direction)) and outputs the detection results.
  • The three-axis gyro sensor 13 detects the angular velocities applied to the terminal device 1 about the vertical direction (Z direction) and about two directions orthogonal to it (the width direction of the casing (Y direction) and the thickness direction of the casing (X direction)), with counterclockwise rotation about each axis taken as positive, and outputs the detection results.
  • the temperature sensors 15 and 19 include, for example, a thermistor element whose electric resistance changes according to temperature.
  • the temperature sensors 15 and 19 detect the Celsius temperature, and all temperature displays described in the following description are performed at the Celsius temperature.
  • the humidity sensors 17 and 21 are configured as, for example, known polymer film humidity sensors.
  • This polymer film humidity sensor is configured as a capacitor whose dielectric constant changes as the amount of moisture contained in the polymer film changes with relative humidity.
  • the illuminance sensor 23 is configured as a well-known illuminance sensor including a phototransistor, for example.
  • The wind speed sensor 29 is, for example, a well-known wind speed sensor that calculates the wind speed from the electric power (amount of heat dissipation) required to keep a heater at a predetermined temperature.
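  • For reference, heater-power anemometers of this kind are often modeled with King's law, P = (A + B·√v)·(T_heater − T_air); a sketch of the inversion under that assumption (the calibration constants are illustrative, not from this document):

```python
def wind_speed_from_power(power_w: float, t_heater: float, t_air: float,
                          a: float = 0.05, b: float = 0.02) -> float:
    """Invert King's law P = (A + B*sqrt(v)) * (T_heater - T_air).

    power_w: electric power keeping the heater at t_heater (watts)
    a, b: calibration constants (assumed example values)
    Returns the estimated wind speed v in m/s.
    """
    loss = power_w / (t_heater - t_air)  # equals A + B*sqrt(v)
    root = max(0.0, (loss - a) / b)      # clamp below the calibration floor
    return root ** 2

print(wind_speed_from_power(power_w=4.0, t_heater=60.0, t_air=20.0))
```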
  • the heart sound sensor 35 is configured as a vibration sensor that captures vibrations caused by the beat of the heart of the user.
  • The MPU 31 considers the detection result of the heart sound sensor 35 together with the heart sounds input from the microphone 37, and thereby distinguishes heart sounds from other vibrations and noise.
  • the wetness sensor 25 detects water droplets on the surface of the housing, and the electrocardiographic sensor 33 detects the user's heartbeat.
  • the camera 41 is arranged in the casing of the terminal device 1 so that the outside of the terminal device 1 is an imaging range.
  • The communication unit 50 includes a well-known MPU 51, a wireless telephone unit 53, and a contact memory 55, and is configured to acquire detection signals from the various sensors of the behavior sensor unit 10 via an input/output interface (not shown). The MPU 51 of the communication unit 50 performs processing according to the detection results from the behavior sensor unit 10, input signals entered via the operation unit 70, and programs stored in ROM (not shown).
  • The MPU 51 of the communication unit 50 executes a function as an operation detection device that detects specific operations performed by the user, a function as a positional relationship detection device that detects the positional relationship with the user, a function as an exercise load detection device that detects the exercise load imposed on the user, and a function of transmitting the processing results of the MPU 51.
  • The wireless telephone unit 53 is configured to communicate with, for example, mobile phone base stations, and the MPU 51 of the communication unit 50 outputs its processing results to the notification unit 60, or transmits them via the wireless telephone unit 53 to a preset destination.
  • the contact address memory 55 functions as a storage area for storing location information of the user's visit destination.
  • the contact address memory 55 stores information on contact information (such as a telephone number) to be contacted when an abnormality occurs in the user.
  • the notification unit 60 includes, for example, a display 61 configured as an LCD or an organic EL display, an electrical decoration 63 made of LEDs that can emit light in, for example, seven colors, and a speaker 65.
  • Each part of the notification unit 60 operates in accordance with commands from the communication unit 50.
  • the operation unit 70 includes a touch pad 71, a confirmation button 73, a fingerprint sensor 75, and a rescue request lever 77.
  • the touch pad 71 outputs a signal corresponding to the position and pressure touched by the user (user, user's guardian, etc.).
  • The confirmation button 73 is configured so that the contact of a built-in switch closes when pressed by the user, allowing the communication unit 50 to detect that the confirmation button 73 has been pressed.
  • the fingerprint sensor 75 is a well-known fingerprint sensor, and is configured to be able to read a fingerprint using, for example, an optical sensor.
  • In place of the fingerprint sensor 75, any means for recognizing a physical feature of a person, such as a sensor that recognizes the pattern of palm veins (that is, any means capable of biometric authentication and of identifying an individual), can be adopted.
  • The voice response terminal process performed in the terminal device 1 receives voice input by the user, sends the voice to the server 90, and, upon receiving the response to be output from the server 90, reproduces it by voice. This process is started when the user requests voice input via the operation unit 70.
  • the input from the microphone 37 is accepted (ON state) (S2), and imaging (recording) by the camera 41 is started (S4). Then, it is determined whether or not there is a voice input (S6).
  • the timeout indicates that the allowable time for waiting for processing has been exceeded, and here the allowable time is set to about 5 seconds, for example.
  • If the voice input has not been completed (S12: NO), the process returns to S10. If the voice input has been completed (S12: YES), data such as an ID identifying the device itself, the voice, and the captured image are transmitted to the server 90 as packets (S14). Note that this data transmission may instead be performed between S10 and S12.
  • In S16, it is determined whether the data transmission is complete. If transmission has not been completed (S16: NO), the process returns to S14. If transmission has been completed (S16: YES), it is determined whether data (packets) sent by the voice response server process described later have been received (S18). If no data has been received (S18: NO), it is determined whether a timeout has occurred (S20).
  • In S24, it is determined whether reception is complete. If reception has not been completed (S24: NO), it is determined whether a timeout has occurred (S26). If a timeout has occurred (S26: YES), the fact that an error has occurred is output via the notification unit 60, and the voice response terminal process ends. If a timeout has not occurred (S26: NO), the process returns to S22.
  • Subsequently, a response based on the received packets is output by voice from the speaker 65 (S28). When a plurality of responses are received, the responses are reproduced in different voice colors.
  • the voice response server process is a process of receiving voice from the terminal device 1, performing voice recognition for converting the voice into character information, and generating a response to the voice and returning it to the terminal device 1.
  • a plurality of responses may be transmitted in association with different voice colors.
  • the communication partner terminal device 1 is specified (S44). In this process, the terminal device 1 is specified by the ID of the terminal device 1 included in the packet.
  • the voice included in the packet is recognized (S46).
  • In the speech recognition DB 102, a large number of speech waveforms are associated with corresponding characters.
  • In the predictive conversion DB 103, each word is associated with the words likely to follow it.
  • In S46, a known voice recognition process is performed by referring to the speech recognition DB 102 and the predictive conversion DB 103, converting the voice into character information. Subsequently, objects in the captured image are identified by image processing (S48). Then, the user's emotion is determined based on the voice waveform and word endings (S50).
  • In this process, by referring to the emotion determination DB 114, in which speech waveforms (voice colors), word endings, and the like are associated with emotion categories such as normal, anger, joy, confusion, sadness, and elation, it is determined which category the user's emotion falls into, and the determination result is recorded in memory. Subsequently, by referring to the learning DB 107, words often spoken by the user are looked up, and any ambiguous portions of the character information generated by speech recognition are corrected.
  • In the learning DB 107, user characteristics such as words the user often speaks and pronunciation habits are recorded for each user. Data in the learning DB 107 is added to and corrected through conversations with the user.
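  • As an illustration of the emotion lookup described above (the cue-to-emotion table is invented for the example; the text only says the DB associates waveforms and word endings with categories):

```python
# Hypothetical miniature of the emotion determination DB 114:
# acoustic/lexical cues mapped to emotion categories.
EMOTION_DB = {
    "raised_pitch_ending": "anger",
    "falling_slow_ending": "sadness",
    "fast_bright_tone": "joy",
}

def determine_emotion(cues: list[str]) -> str:
    # Return the first category whose cue appears; default to "normal",
    # mirroring the category list: normal, anger, joy, confusion,
    # sadness, elation.
    for cue in cues:
        if cue in EMOTION_DB:
            return EMOTION_DB[cue]
    return "normal"

print(determine_emotion(["fast_bright_tone"]))  # -> "joy"
```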
  • The corrected character information is then fixed as the input character information (S54), and the response candidate DB 105 is searched for input character information similar to it in order to obtain a response (S56).
  • In the response candidate DB 105, as shown in FIG. 5, each piece of input character information is uniquely associated with a first output, a first output voice color, a second output, and a second output voice color.
  • For example, the first output "Today's * weather is *." is associated with the voice color of woman 1.
  • The "*" portions are filled in by accessing the weather DB 110, in which region names are associated with the weather forecast for the coming days in each region.
  • The weather at the time when today's weather is due to change is also acquired from the weather DB 110, and the second output "However, * will be *." is associated with the voice color of man 1.
  • Thus, when "Today's Tokyo weather" is input, the voice of woman 1 outputs "Today's Tokyo weather is sunny.", and the voice of man 1 outputs "However, it will rain tomorrow."
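  • A hedged sketch of this template filling (the DB entries, field names, and query key are invented stand-ins for the contents of FIG. 5 and the weather DB 110):

```python
# Toy versions of the response candidate DB 105 and weather DB 110.
RESPONSE_DB = {
    "today's weather": [
        ("Today's {place} weather is {now}.", "woman1"),
        ("However, {later_day} will be {later}.", "man1"),
    ],
}
WEATHER_DB = {"Tokyo": {"now": "sunny", "later_day": "tomorrow",
                        "later": "rainy"}}

def build_responses(query: str, place: str) -> list[tuple[str, str]]:
    # Fill each output template's '*' slots from the weather DB and
    # keep the voice color assigned to that template.
    slots = {"place": place, **WEATHER_DB[place]}
    return [(tmpl.format(**slots), voice)
            for tmpl, voice in RESPONSE_DB[query]]

for text, voice in build_responses("today's weather", "Tokyo"):
    print(f"[{voice}] {text}")
```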
  • the response contents are associated with the voice color (S60).
  • the voice DB 104 stores an artificial voice database for each voice color, and in this process, the voice color set for each response is associated with the voice color in the database.
  • the response content is converted into voice (S62).
  • a process for outputting response contents (character information) as a voice is performed.
  • the generated response (voice) is packet-transmitted to the communication partner terminal device 1 (S64).
  • the packet may be transmitted while generating the voice of the response content.
  • the conversation content is recorded (S68).
  • the input character information and the output response contents are recorded in the learning DB 107 as conversation contents.
  • In addition, keywords (words recorded in the speech recognition DB 102) contained in the conversation content, pronunciation characteristics, and the like are recorded in the learning DB 107.
  • The voice response system 100 described in detail above responds by voice to input character information; the terminal device 1 (MPU 31) acquires a plurality of different responses to the character information and outputs them in different voice colors.
  • According to such a voice response system 100, a plurality of responses can be output in different voice colors, so even when a single answer cannot be determined for a given piece of character information, different answers can be output in an easy-to-understand manner using different voice colors. Usability for the user can therefore be improved.
  • In the voice response system 100, the user inputs voice to the terminal device 1 via the microphone 37, and the server 90 (calculation unit 101) converts the input voice into character information, generates a plurality of different responses, and transmits them to the terminal device 1. The terminal device 1 then acquires the responses from the server 90.
  • Since the terminal device 1 can input voice, character information can be entered by voice. Moreover, since the responses are generated on the server 90 side, the configuration of the terminal device 1 itself can be kept simple.
  • Further, the server 90 converts the voice of the user's utterances into character information and accumulates the user's utterance characteristics (such as pronunciation habits) as learning information (capturing and recording these features).
  • the character information can be generated based on the learning information, so that the generation accuracy of the character information can be improved.
  • Furthermore, the server 90 reads the emotion from the voice color of the voice input by the user, and outputs which of the emotion categories, including at least one of normal, anger, joy, confusion, sadness, and elation, the emotion falls into.
  • voice recognition is used as a configuration for inputting character information.
  • the present invention is not limited to voice recognition, and may be input using input means (operation unit 70) such as a keyboard or a touch panel.
  • the server 90 includes a response candidate DB 105 in which a plurality of different responses including a positive response and a negative response for each character information are recorded for each of the plurality of character information.
  • The terminal device 1 may acquire a positive response and a negative response as the plurality of different responses, and reproduce them in different voice colors.
  • In this case, responses taking different positions, such as a positive response and a negative response, are reproduced in different voice colors, as if different people were speaking, making it less likely that the user listening to the voice feels uncomfortable.
  • In the voice response system 100, the voice color may be changed according to the type of response or the language used in the response. For example, a response with a gentle tone may be reproduced in a calm female voice, and a response with a severe tone in a bold male voice. That is, response content may be associated with a personality, and the voice color set according to that personality.
  • A response (for example, a positive response or a negative response) output by one terminal device may in turn be input as character information, and a response to that response may be generated.
  • This configuration can be realized using one terminal device 1 or a plurality of terminal devices 1.
  • When a plurality of devices are used, voice may be exchanged directly as sound, or wireless communication or the like may be used.
  • data may be transmitted to other terminal devices 1 in the process of S66.
  • The calculation unit 101 may learn (record and analyze) the user's behavior (conversation, places visited, and what appears on the camera) and use this to compensate for missing words in the user's conversation.
  • the server 90 may acquire response candidates from a predetermined server or the Internet.
  • response candidates can be acquired not only from the server 90 but also from any device connected via the Internet, a dedicated line, or the like.
  • [Second Embodiment] Next, another form of the voice response system will be described.
  • In this embodiment (the second embodiment) and the embodiments that follow, only the parts that differ from the voice response system 100 of the first embodiment are described in detail; for parts identical to the voice response system 100 of the first embodiment, the same reference numerals are used and the description is omitted.
  • the voice response system outputs voice even when the user does not input character information.
  • In the second embodiment, the terminal device 1 performs the automatic conversation terminal process shown in FIG. 6.
  • the automatic conversation terminal process is a process that is started when the terminal device 1 is turned on, for example, and is repeatedly executed thereafter.
  • In the automatic conversation terminal process, first, it is determined whether the setting for automatic conversation is ON (S82). Whether to perform automatic conversation can be set by the user via the operation unit 70 or by voice input.
  • The server 90 executes the automatic conversation server process shown in FIG. 7.
  • the automatic conversation server process is a process that is started when the server 90 is turned on, for example, and then repeatedly executed.
  • In the automatic conversation server process, first, it is determined whether notification that the automatic conversation mode is set has been received from a terminal device 1 (S92). If no such notification has been received (S92: NO), the process proceeds to S98.
  • If it has been received (S92: YES), the communication partner terminal device 1 is identified from the ID in the received packet (S94), and automatic conversation is set for it (S96). Subsequently, for each terminal device 1 set for automatic conversation, it is determined whether its reproduction condition is satisfied (S98).
  • The reproduction condition is, for example, that a certain time has elapsed since the previous conversation (voice input), that it is a certain time of day, that the weather is of a specific kind, or that some sensor shows an abnormal value.
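  • A compact sketch of such a reproduction condition check (the thresholds and field names are assumptions for illustration):

```python
import time

def reproduction_condition_met(state: dict) -> bool:
    """state: snapshot of one terminal's situation (illustrative fields).

    Mirrors the example conditions: silence for a while, a set time of
    day, specific weather, or an abnormal sensor value.
    """
    silent_long = time.time() - state["last_conversation"] > 30 * 60
    scheduled = state["hour"] == 7            # e.g. greet at 7 o'clock
    weather_hit = state["weather"] in {"rain", "snow"}
    sensor_abnormal = state["heart_rate"] > 120
    return silent_long or scheduled or weather_hit or sensor_abnormal

print(reproduction_condition_met({
    "last_conversation": time.time() - 3600,
    "hour": 13, "weather": "sunny", "heart_rate": 70,
}))
```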
  • The message corresponding to the reproduction condition may be a fixed sentence such as "Good morning." or "Hello.", or it may relate to the latest news obtained from the news DB 109, in which the latest news is automatically updated. For example, if information about a certain company's stock price can be acquired, the message can be something like: "Today, the stock price of XX company rose by XX yen. Did you know?"
  • the processes of S42 to S54 described above are performed. Then, when the processing of S54 is completed, it is determined whether or not a predetermined answer has been obtained from the terminal device 1 that is the communication partner (S112).
  • The predetermined answer may be, for example, any voice at all, or a specific answer. For a question ending in "Did you know?", answers such as "I knew" or "I didn't know" qualify; for a question such as "Do you know the weather?", answers containing weather words such as "rainy" or "sunny" qualify.
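  • A sketch of that answer matching (the keyword sets are invented for the example):

```python
# Hypothetical acceptable-answer table, keyed by question type.
EXPECTED = {
    "did_you_know": {"knew", "know", "didn't"},
    "weather": {"rainy", "sunny", "cloudy", "snow"},
}

def is_predetermined_answer(question_type: str, reply: str) -> bool:
    # A reply qualifies if it contains any keyword expected for the
    # question; an empty reply never qualifies.
    words = reply.lower().split()
    return any(w in EXPECTED.get(question_type, set()) for w in words)

print(is_predetermined_answer("weather", "It looks rainy today"))  # True
```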
  • the server 90 determines whether or not the situation of the voice response system 100 matches a playback condition set in advance as a condition for outputting voice. When the reproduction condition is met, a preset message is output.
  • According to such a voice response system 100, voice can be output even when no character information is input (that is, when the user has not spoken). For example, by prompting the user to speak, this can serve as a countermeasure against drowsiness while driving a car. It can also be used for safety confirmation, by checking whether a person living alone responds.
  • In the voice response system 100, the server 90 acquires news information and outputs a message about the news in question format to invite the user's answer. According to such a voice response system 100, conversations about the news become possible, which keeps the conversation from always being the same.
  • Further, in the voice response system 100, the server 90 outputs preset messages with separately acquired external information (news, or environmental information such as temperature, weather, and position) added to them.
  • In this way, a response combining a predetermined message and the acquired information can be output. Further, in the voice response system 100, when no reply to a response or message is obtained, the server 90 transmits to a preset contact address information identifying the user and the fact that no reply was obtained.
  • According to such a voice response system 100, a contact person can be notified when no answer is obtained, so an abnormality affecting, for example, an elderly person living alone can be reported early.
  • the server 90 may acquire a plurality of messages, and select and output a message to be reproduced according to the reproduction frequency of the message.
  • Such a voice response system 100 can make frequently reproduced messages less likely to be selected, adding randomness to message reproduction, or can repeatedly reproduce a frequently reproduced message to call attention to it or promote its retention.
  • Next, a configuration in which the terminal device 1 conveys, on the user's behalf, things that are difficult to say directly. For example, if before a date the user tells the device what they would like it to say, then at an appropriate time (for example, at a preset time, or after the conversation has lapsed for a certain period) the voice response system 100 speaks in the user's place (reproduces the voice).
  • To realize this, the terminal device 1 performs the message terminal process shown in FIG. 8, and the server 90 performs the message server process shown in FIG. 9.
  • the message terminal process is a process that is started when the terminal device 1 is turned on, for example, and then repeatedly executed.
  • In S136, it is determined whether a packet from the server 90 has been received. If no packet has been received (S136: NO), the process of S136 is repeated. If a packet has been received (S136: YES), the processes of S24 to S30 are performed, and the message terminal process ends.
  • the message server process is a process that starts when the server 90 is powered on, for example, and is repeatedly executed thereafter. Specifically, first, it is determined whether or not a packet is received from any one of the terminal devices 1 (S142). If no packet has been received (S142: NO), the process proceeds to S156 described later.
  • If a packet has been received (S142: YES), the communication partner terminal device 1 is identified (S44), and it is determined whether the packet contains a mode flag such as the message mode flag (S144). If there is no mode flag (S144: NO), the process proceeds to S148.
  • If there is a mode flag (S144: YES), the server 90 also sets the corresponding mode by turning ON the flag for the communication partner terminal device 1 (S146). For example, the message mode flag corresponds to the message mode, in which the processes of S46 to S152 described later are performed; the guidance mode flag described later corresponds to the guidance mode, in which S46 to S176 (see FIG. 11) are performed.
  • the message reproduction condition can be set in advance by the user via the operation unit 70 of the terminal device 1, and corresponds to, for example, time and position.
  • the message reproduction condition is transmitted to the server 90 at the time of packet transmission for message terminal processing.
  • the message and voice are associated with each other and recorded in the memory (S152), and the process proceeds to S156. If the message flag is OFF (S148: NO), processing relating to another mode is performed (S154), and it is determined whether or not the playback timing has come (S156).
  • the reproduction timing indicates contents set in the message reproduction condition.
  • the voice input by the user is not played back immediately, but can be played back when a message playback condition is satisfied after a certain time.
  • the content spoken by the user is reproduced.
  • The device may also be configured to speak a word that prompts the difficult-to-say content, for example: "Sorry, was there something you wanted to say to her?"
  • the terminal device 1 performs the guidance terminal process shown in FIG. 10, and the server 90 performs the guidance server process shown in FIG.
  • The guidance terminal process is a process that is started, for example, when the terminal device 1 is powered on, and is repeatedly executed thereafter.
  • In the guidance terminal process, as shown in FIG. 10, it is first determined whether the guidance mode has been set by the user (S162). If the guidance mode is not set (S162: NO), the process of S162 is repeated.
  • In S166, it is determined whether a packet from the server 90 has been received. If no packet has been received (S166: NO), the process of S166 is repeated. If a packet has been received (S166: YES), the processes of S24 to S30 are performed, and the guidance terminal process ends.
  • the guidance server process is a process that is started, for example, when the server 90 is turned on and then repeatedly executed.
  • the processes of S142 to S146 described above are executed.
  • the guidance reproduction condition can be set in advance by the user via the operation unit 70 of the terminal device 1, and corresponds to, for example, time and position.
  • The guidance reproduction condition is transmitted to the server 90 at the time of packet transmission in the guidance terminal process.
  • the guidance content is generated, and the guidance content and voice (voice color) are associated with each other and recorded in the memory (S176).
  • As for the guidance content, for example, words expressing a desire such as "I want to" or "hope" are searched for in the input character information, the keywords preceding those words are extracted, and the words registered as guidance for those keywords are output as the guidance content.
  • the keyword and the word indicating the guidance content are associated with each other in advance and recorded in the response candidate DB 105.
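  • A minimal sketch of the keyword-to-guidance lookup (the table is an invented stand-in for the entries in the response candidate DB 105; note the text extracts the keyword immediately preceding the desire word, reflecting Japanese word order, while this sketch simply scans for any registered keyword):

```python
# Invented stand-in for keyword -> guidance words in the response
# candidate DB 105.
GUIDANCE_DB = {"ramen": "There is a well-reviewed ramen shop nearby.",
               "sleep": "How about dimming the lights and resting?"}
DESIRE_MARKERS = ("want to", "hope")

def guidance_for(text: str) -> str | None:
    lowered = text.lower()
    if not any(m in lowered for m in DESIRE_MARKERS):
        return None  # no desire expression found
    # Output the guidance registered for the first keyword present.
    for word in lowered.split():
        if word in GUIDANCE_DB:
            return GUIDANCE_DB[word]
    return None

print(guidance_for("I want to eat ramen tonight"))
```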
  • In another configuration, the terminal device 1 is installed at a company reception desk or the like. It can also be used for telephone reception on a company's main line or for telephone banking.
  • This is achieved by replacing the process of S56 in the first embodiment with the reception process shown in FIG. 12.
  • In the reception process, if no company name or personal name is included in the character information, a response asking for them is generated (S194), and the reception process ends; for example, a response such as "Please tell us your name and business." is generated.
  • If a company name or personal name is included in the character information (S192: NO), that company name or personal name is looked up in the sales DB 118 and the client DB 119 (S196).
  • In the sales DB 118, the names of companies and contact persons who have made sales visits in the past, and the names of habitual complainers who call only to complain, are recorded.
  • In the client DB 119, a company name, the contact person at that company, the person in charge on the user (in-house) side of the terminal device 1, schedule information such as scheduled visit times, and contact addresses are recorded in association with each person in charge.
  • Next, it is determined from the schedule in the client DB 119 whether the person at the reception desk is scheduled to visit at around this time (for example, within one hour before or after the current time) (S202). If so (S202: YES), the contact address of the person in charge of this visitor is extracted from the client DB 119, and a connection is made to the person in charge so that they and the visitor can talk (S204). In this process, it suffices to connect to the person in charge's extension telephone, mobile phone, or the like.
  • In this case, a reception response for the client is generated (S206); for example, a response such as "Thank you, XXX. Please wait a moment while you are connected to the person in charge." is generated.
  • the acceptance process ends.
  • If the visitor is not scheduled at a close time (S202: NO), a connection is made to a preset reception contact so that that person in charge and the visitor can talk (S208). Then, a normal reception response is generated (S210).
  • As the normal reception response, for example, a response such as "Please wait a moment while you are connected to reception." is generated.
  • the acceptance process ends.
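  • Putting the branches of the reception process together, a hedged sketch of the routing logic (DB contents, the one-hour window, and the phrases are illustrative):

```python
from datetime import datetime, timedelta

SALES_DB = {"Acme Sales"}                       # known sales callers
CLIENT_DB = {"Ms. Sato": {"visit": datetime(2013, 6, 18, 14, 0),
                          "extension": "ext-204"}}

def connect(destination: str) -> None:
    print(f"(dialing {destination})")           # stand-in for telephony

def reception(name: str, now: datetime) -> str:
    if name in SALES_DB:
        return "I'm sorry, we must decline sales visits."  # refusal phrase
    record = CLIENT_DB.get(name)
    if record and abs(record["visit"] - now) <= timedelta(hours=1):
        connect(record["extension"])            # scheduled visitor
        return f"Thank you, {name}. Connecting you to the person in charge."
    connect("reception-desk")                   # everyone else
    return "Please wait a moment while you are connected to reception."

print(reception("Ms. Sato", datetime(2013, 6, 18, 13, 30)))
```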
  • As described above, the voice response system 100 can be configured for use at a workplace or company reception desk. In this configuration, the names and company names of salespeople are recorded in advance in the sales DB 118 of the server 90, and when a visitor gives one of these names, a response reproducing a phrase of refusal is generated.
  • Also, the server 90 identifies the communication partner from the input character information, and connects the partner to the communication destination set in advance for that partner. Such a voice response system 100 can assist reception work and telephone support. Moreover, it makes it possible to turn away people who would interfere with the user's business without the user having to deal with them.
  • the server 90 extracts a keyword included in the input character information (particularly voice) and connects to a connection destination corresponding to the keyword.
  • a keyword such as the name of the other party is associated with the connection destination in advance.
  • In this way, the connection destination is set according to the other party.
  • The connection destination may also be changed according to the requirements (keywords) included in the character information.
  • The server 90 may also recognize the business the other party states based on keywords, and convey an outline of what was said to the user. Such a voice response system 100 can assist services that mediate between the user and customers.
  • the terminal device 1 may provide information requested by the other terminal device 1.
  • In this configuration, the server 90 requests the necessary information from another terminal device 1 in the process of S56, and generates a response after obtaining it. The terminal device 1 that provides the requested information performs the information providing terminal process shown in FIG. 13.
  • the information providing terminal process is a process that is started when there is a request from the server 90, for example.
In this process, first, the information providing destination is extracted (S222). The information providing destination indicates the other terminal device 1 that is requesting the information, and an ID for specifying that terminal device 1 is included in the request from the server 90.
If the partner is one to which the provision of information is permitted (S224: YES), the requested information is acquired from the device's own memory 39 or from its various sensors (S226), and this data is transmitted to the server 90 (S228). If the partner is not one to which the provision of information is permitted (S224: NO), the server 90 is notified that the provision of information is refused (S230).
For example, in response to the question "What is Mr. XX doing?", the server 90 requests position information from Mr. XX's terminal device 1, and that terminal device 1 returns its position information. The server 90 then recognizes Mr. XX's behavior based on the position information. For example, if Mr. XX is moving along a railway track at a speed faster than a human can move, it is judged that he is travelling by train, and a response such as "Mr. XX is on the train" is generated, as sketched below.
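This sketch assumes an illustrative speed threshold; the description fixes neither the threshold nor how matching against a railway track is performed:

```python
# Humans rarely sustain much more than ~12 km/h on foot, so sustained higher
# speed along a railway track is taken to indicate travel by train (assumption).
WALK_RUN_LIMIT_KMH = 12.0

def estimate_activity(dist_km: float, minutes: float, on_rail_track: bool) -> str:
    """Judge behavior from position change per unit time (derived from S226/S228 data)."""
    speed_kmh = dist_km / (minutes / 60.0)
    if on_rail_track and speed_kmh > WALK_RUN_LIMIT_KMH:
        return "Mr. XX is on the train."
    return "Mr. XX seems to be moving on foot."

print(estimate_activity(dist_km=5.0, minutes=10.0, on_rail_track=True))
```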
In this way, in the voice response system 100, the server 90 acquires information recorded in another terminal device 1, different from the requesting terminal device 1, and uses it for the requesting terminal device 1. That is, the server 90 acquires information for generating a response to the character information from the other terminal device 1. According to such a configuration, a response can be generated based on information recorded in another terminal device 1.
Also, when another terminal device 1 requests information for generating a response to character information, the terminal device 1 returns information corresponding to the request. In this case, the terminal device 1 includes sensors for detecting position information, temperature, humidity, illuminance, and noise level, and databases such as dictionary information, and extracts the necessary information as requested. According to such a configuration, information unique to the other terminal device 1, such as its position, can be acquired, and information unique to the device itself can be transmitted to another terminal device 1.
In the voice response system according to this embodiment, a personality DB 106 is prepared, in which personality information associating users, or persons related to the users, with personality classifications according to preset categories is recorded. For example, as shown in FIG. 14, the personality DB 106 records the names of users and related persons in association with the personality classifications of these persons.
To construct the personality DB 106, a personality test is performed on the users and related persons, and the test results are also recorded. For the test, a known personality analysis technique (Rorschach test, Szondi test, etc.) may be used, or aptitude-test techniques of the kind companies use in employment screening may be used.
The personality information generation process is started, for example, when an instruction to generate personality information is input to the terminal device 1 using the operation unit 70 or the like. In this process, first, the microphone 37 is turned on (S242), and one of a set of predetermined four-choice questions is output by voice (S244). The four-choice questions may be acquired from the server 90, or questions recorded in advance in the memory 39 may be asked.
Subsequently, it is determined whether or not there is a voice answer from the target person (the user or a related person) (S246). If there is no answer (S246: NO), the process of S246 is repeated. If there is an answer (S246: YES), conversation parameters such as word-ending strength and conversation speed are extracted (S248), and it is determined whether or not the current question is the final question (S250). If it is not the final question (S250: NO), the next question is selected (S252), and the process returns to S242.
If it is the final question (S250: YES), a personality analysis is performed based on the answers to the four-choice questions (S254), and a personality analysis is also performed using the conversation parameters (S256). As for the conversation parameters, tendencies such as the following can be captured: persons who are self-confident tend to have strong word endings, persons who lack confidence tend to have weak word endings, impatient persons tend to speak quickly, and calm persons tend to speak slowly.
These personality analysis results are then combined comprehensively, for example by a weighted average (S258), and the result is assigned to a personality classification (S260); specifically, the personality of the subject obtained through the test is scored, and each score range is assigned to a personality classification, as sketched below. Subsequently, the target person and the personality classification are associated with each other (S262) and recorded in the personality DB 106 (S264). That is, the relationship between the target person and the personality classification is transmitted to the server 90. At this time, the test results are also transmitted to the server 90, and the server 90 constructs the personality DB 106 as shown in FIG. 14. When such processing ends, the personality information generation process ends.
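This sketch covers the combination steps S254 to S260, assuming normalized scores, illustrative weights, and illustrative classification bands (none of which are fixed by the description):

```python
# Score bands mapped to personality classifications (illustrative assumption).
PERSONALITY_BANDS = [(0.0, "calm"), (0.5, "balanced"), (0.8, "assertive")]

def classify_personality(quiz_score: float, ending_strength: float,
                         speech_rate: float) -> str:
    """Combine the quiz analysis (S254) and the conversation-parameter analysis
    (S256) by weighted average (S258), then map the score to a class (S260)."""
    combined = 0.6 * quiz_score + 0.2 * ending_strength + 0.2 * speech_rate
    label = PERSONALITY_BANDS[0][1]
    for threshold, name in PERSONALITY_BANDS:
        if combined >= threshold:
            label = name
    return label

print(classify_personality(quiz_score=0.7, ending_strength=0.9, speech_rate=0.6))
```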
In addition, a response candidate DB 105 is prepared in which personality classifications are associated with mutually different responses. The server 90 acquires response candidates representing a plurality of different responses to the character information in the process of S56, selects the response to be output from the response candidates according to the personality information, and outputs the selected response in the processes of S60 and S64.
In this way, in the voice response system 100, the terminal device 1 generates personality information of a user or related person based on the answers to a plurality of preset questions, and the generated personality information is acquired. According to such a configuration, the personality information can be generated in the server 90 or in the terminal device 1. Further, in the voice response system 100, the calculation unit 101 generates personality information of the user or related person based on the character strings included in the input character information. According to such a configuration, the personality information can be generated in the course of the user's use of the voice response system 100. As a result, different responses can be given according to the personality of the user or of a person related to the user (a related person). Usability for the user can therefore be improved.
Note that the responses may be narrowed down to one according to the personality before being output, or voices of different voice colors may be associated with the plurality of responses and output. Also, the processing of S248 and S254 to S264 may be performed by the server 90. In this case, the voice and the questions may be exchanged between the terminal device 1 and the server 90 in a manner that allows the server 90 to identify the terminal device 1.
Furthermore, the server 90 may detect the user's behavior and operations and generate learning information or personality information based on them. According to such a voice response system 100, for example, when it is detected that the user has been jumping onto the train at the last moment for several consecutive days, the user can be prompted to leave the house several minutes earlier from the next day; and when it is detected from conversation that the user tends to become angry easily, voice or music for calming the mood can be output.
In the voice response system according to this embodiment, a preference DB 108 is prepared, in which preference information associating the preferences of users and related persons according to preset categories is recorded. The preference DB 108 records the names of users and related persons in association with the preferences of these persons for each type of preference, such as food preference (food), color preference (color), and hobby.
Food preferences are classified into, for example, a sweet taste (sweet), a spicy taste (spicy), and a middle level; color preferences are classified into warm colors (warm), cool colors (cold), and a middle order; and hobbies are classified into indoor hobbies (inside), outdoor hobbies (outside), and both indoor and outdoor hobbies (inside and outside).
To construct the preference DB 108, the preference information generation process shown in FIG. 17 is executed. The preference information generation process is performed, for example, between S48 and S54.
In this process, keywords relating to preference are extracted from the character information (S282), and among the objects identified by image processing, those relating to preference are extracted (S284). Preference-related keywords are associated with the classifications within each type in the preference DB 108, and in these processes, keywords and objects that appear in the preference DB 108 are extracted as preferences.
Subsequently, the counter is incremented for each group of preference-related keywords (S288). For example, for a keyword such as kimchi, whose preference type is "food preference" and whose classification is "spicy", the counter corresponding to "food preference"/"spicy" is incremented. Then, the preference information (preference DB 108) is updated based on the counter values (S290). That is, for each preference type, the classification with the largest counter value is recorded in the preference DB 108 as the best-matching preference feature of the user or related person.
When such processing ends, the preference information generation process ends.
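The sketch below illustrates S282, S288, and S290 with a toy keyword table; the keywords and classifications are examples, and extraction from captured images (S284) is omitted:

```python
from collections import defaultdict

# Hypothetical keyword table: keyword -> (preference type, classification)
PREFERENCE_KEYWORDS = {
    "kimchi": ("food", "spicy"),
    "cake": ("food", "sweet"),
    "hiking": ("hobby", "outside"),
}

counters = defaultdict(int)

def update_preferences(words: list) -> dict:
    # S282: extract preference-related keywords; S288: increment counters
    for w in words:
        if w in PREFERENCE_KEYWORDS:
            counters[PREFERENCE_KEYWORDS[w]] += 1
    # S290: for each type, keep the classification with the largest count
    best = {}
    for (ptype, cls), count in counters.items():
        if ptype not in best or count > best[ptype][1]:
            best[ptype] = (cls, count)
    return {ptype: cls for ptype, (cls, _) in best.items()}

print(update_preferences(["kimchi", "kimchi", "cake", "hiking"]))
# -> {'food': 'spicy', 'hobby': 'outside'}
```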
In addition, the response candidate DB 105 holds responses in which a different response is associated with each preference. The server 90 acquires response candidates representing a plurality of different responses to the character information in the process of S56, selects the response to be output from the response candidates according to the preference information, and outputs the selected response in the processes of S60 and S64.
In this way, in the voice response system 100, the server 90 generates preference information indicating the preference tendencies of a user or related person based on the character strings included in the character information, selects the response to be output from the response candidates based on the preference information, and outputs the selected response. According to such a configuration, responses can be given according to the preferences of the user or the related person. For example, when buying a present for a related person, the user can ask the terminal device 1 "What would Mr. XX like?" and obtain a response according to the preference information.
Note that the response candidate DB 105 may have a table in which personality classifications and preference information are associated with each other. For example, personality classifications and color preferences are associated with each other, and products that a woman could be estimated to be pleased to receive as a present are arranged in a matrix. According to such a configuration, a response can be generated in consideration of both personality and preference.
In the voice response system according to this embodiment, the terminal device 1 captures the user's actions as captured images and transmits them to the server 90, and the server 90 performs, for example, the action character input process shown in FIG. The action character input process is started when a part of the user's body appears in the captured image in the process of S48. In this process, first, a captured image is acquired (S302). Then, it is determined whether the user intends to input characters by handwriting or to input characters by sign language (S304, S308).
The characters input by the action are compared with the characters input by speech, and it is determined whether there is a similar voice input (whether the degree of matching between a reference waveform based on the characters and the pronunciation waveform is equal to or higher than a reference value) (S316). If there is such a voice input (S316: YES), the accent and pronunciation characteristics of the user when inputting these characters are recorded in the learning DB 107 in association with the characters (S318), and the action character input process ends. Note that the actions according to the present embodiment are not limited to handwriting of characters or gestures (for example, sign language), and may be any actions caused by muscle movement.
The contents of the learning DB 107 may also be used in another terminal device 1 when the user uses a terminal device 1 different from the one normally used. In this case, the ID and password of the normally used terminal device 1 are transmitted from the other terminal device 1 to the server 90 together with a usage request. The other-terminal use process is started when a usage request is received.
In this process, if the ID and password are input (S332: YES), it is determined whether or not authentication using the ID and password is complete (S334). If the authentication is complete (S334: YES), the fact that the authentication is complete is transmitted to the other terminal device 1 (S336), and a setting is made so that the other terminal device 1 uses the learning DB 107 of the terminal device 1 corresponding to the ID and password (S338).
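A minimal sketch of S332 to S338, assuming a toy credential store; a real system would not keep plain-text passwords, and all identifiers here are hypothetical:

```python
# Hypothetical credential store: (terminal ID, password) -> learning DB handle
CREDENTIALS = {("terminal-1", "secret"): "learning-db-of-terminal-1"}

def other_terminal_use(terminal_id: str, password: str) -> str:
    """Handle a usage request from another terminal device."""
    if (terminal_id, password) not in CREDENTIALS:   # S334: authentication check
        return "authentication failed"
    # S336: report completion; S338: bind the requester to the learning DB 107
    # of the terminal device matching the ID and password
    return f"authenticated; using {CREDENTIALS[(terminal_id, password)]}"

print(other_terminal_use("terminal-1", "secret"))
```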
In this way, in the voice response system 100, the server 90 transfers the learning information of one terminal device 1 to another terminal device 1. According to such a voice response system 100, even when a user who uses a certain terminal device 1 uses another terminal device 1, the learning information recorded for the first terminal device 1 (the learning information recorded in the server 90) can be used. Therefore, the generation accuracy of character information can be improved even when other terminal devices 1 are used. This is particularly effective when the user has a plurality of terminal devices 1.
In addition, the server 90 may output information about the user in response to an inquiry from a person other than the user. According to such a voice response system 100, if, for example, the contents of the user's meals and the distances of the user's walks are detected, questions at a hospital or the like can be answered on the user's behalf. The system may also be made to learn the user's health condition and self-introduction.
In the voice response system according to this embodiment, the server 90 stores conversation contents and asks questions designed to elicit the same contents that the user previously heard. Specifically, the storage confirmation process shown in FIG. 21 is executed in S100 of the automatic conversation server process shown in FIG. In this process, past conversation contents are extracted from the learning DB 107 (S352), and a question whose answer is a keyword included in one of those conversation contents is generated (S353). When such processing ends, the storage confirmation process ends. According to such a configuration, the user's memory ability can be confirmed and memories can be reinforced. It is also considered effective in slowing the progression of dementia in the elderly.
The voice response system according to the eleventh embodiment is configured so that a user can practice a foreign language using the terminal device 1 and the server 90. In this voice response system, the pronunciation determination process 1 shown in FIG. 22, the pronunciation determination process 2 shown in FIG. 23, and the pronunciation determination process 3 shown in FIG. 24 are executed in order. The server 90 executes one of the pronunciation determination processes 1 to 3 each time the voice response server process (FIG. 2) is performed, and each of the pronunciation determination processes 1 to 3 is executed as the process of S56 described above.
In the pronunciation determination process 1, a response instructing the user to input a predetermined sentence by voice is generated (S362). In this process, a sentence serving as a model in the foreign language is generated, and the user is prompted to imitate the model. When such processing ends, the pronunciation determination process 1 ends.
When the user inputs speech, the pronunciation determination process 2 is performed. In the pronunciation determination process 2, as shown in FIG. 23, the accuracy of the pronunciation and accent is scored (S372). In this process, the speech is treated as a waveform, and the degree of coincidence between that waveform and the waveform of the model sentence is scored. This score is then recorded in the memory (S374), and the pronunciation determination process 2 ends. Subsequently, the pronunciation determination process 3 is performed.
In the pronunciation determination process 3, as shown in FIG. 24, it is first determined whether or not the score is less than a threshold value (S382). If the score is less than the threshold, a response instructing the user to input the same sentence again is generated (S384); that is, a response prompting the user to imitate the model once more is generated. Otherwise, a response indicating that the pronunciation is good and prompting the user to input the next sentence is generated (S386); for example, a response such as "Good pronunciation. Let's move on." is generated. When such processing ends, the pronunciation determination process 3 ends.
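The following sketch approximates S372 and S382 to S386; normalized cross-correlation of waveforms and the 0.8 threshold are assumptions, since the description fixes neither the scoring method nor the threshold value:

```python
import numpy as np

THRESHOLD = 0.8  # assumed passing score

def waveform_score(model: np.ndarray, spoken: np.ndarray) -> float:
    """S372: score pronunciation as the degree of coincidence of two waveforms,
    approximated here by normalized cross-correlation."""
    n = min(len(model), len(spoken))
    a, b = model[:n], spoken[:n]
    denom = float(np.linalg.norm(a) * np.linalg.norm(b))
    return float(np.dot(a, b)) / denom if denom else 0.0

def next_prompt(score: float) -> str:
    # S382-S386: repeat the same sentence below the threshold, otherwise advance
    if score < THRESHOLD:
        return "Listen to the model once more and repeat the same sentence."
    return "Good pronunciation. Let's move on."

t = np.linspace(0.0, 1.0, 16000)
model = np.sin(2 * np.pi * 220 * t)        # toy model waveform
print(next_prompt(waveform_score(model, 0.9 * model)))
```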
In this way, in the voice response system 100, the server 90 detects the accuracy of the pronunciation and accent of the voice input by the user, and outputs the detected accuracy. According to such a configuration, the accuracy of pronunciation and accent can be confirmed, which is effective, for example, when practicing a foreign language. In addition, the server 90 causes the same question to be output again when the accuracy is equal to or less than a predetermined value. Alternatively, when the accuracy is equal to or less than the predetermined value, the server 90 may output, for confirmation, a voice containing the word closest to the pronunciation made by the user. According to such a configuration, the user can confirm the accuracy of his or her pronunciation and accent.
Next, a voice response system according to the twelfth embodiment will be described. In this voice response system, the user's emotion is detected from the voice input by the user, and a response that soothes the user is generated according to the emotion. Specifically, the emotion determination process shown in FIG. 25 and the emotion response generation process shown in FIG. 26 are executed.
The emotion determination process is performed as the details of the process of S50 described above. As shown in FIG. 25, first, the input voice is scored; the emotion is then classified according to the score and recorded in the memory (S394). The emotion response generation process is then executed in the process of S56 described above. Specifically, as shown in FIG. 26, first, the emotion classification set in the emotion determination process is determined (S412). If the emotion classification is normal (S412: normal), an ordinary greeting such as "Hello" is generated as the response (message) (S414). In this way, the server 90 detects the user's irritation or agitation, for example from unexpectedly produced voice, and generates a message for calming that irritation or agitation.
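A minimal sketch of this classification and branching; the voice features, weights, and the calming reply are illustrative assumptions (the description names only the normal branch explicitly):

```python
def classify_emotion(volume: float, pitch_jump: float) -> str:
    """Score the input voice and classify the emotion (recorded in S394)."""
    score = 0.6 * volume + 0.4 * pitch_jump   # assumed scoring of the voice
    return "agitated" if score > 0.7 else "normal"

def emotion_response(category: str) -> str:
    if category == "normal":                  # S412: normal -> greeting (S414)
        return "Hello."
    return "Let's take a slow, deep breath together."  # calming message

print(emotion_response(classify_emotion(volume=0.9, pitch_jump=0.8)))
```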
In the voice response system according to this embodiment, when a voice message such as "Please guide me to the visible tower" is input to the terminal device 1, a guidance process is performed in the process of S56. In the guidance process, first, the position information of the terminal is acquired from the GPS receiver 27 or the like of the terminal device 1 (S432). Subsequently, the target object is identified from among the objects in the captured image, and its position is specified (S434). In this process, the position of the object is specified in map information (which may be acquired from outside or held by the server 90) based on the shape, relative position, and the like of the object. For example, when a tower appears in the captured image, that tower is identified on the map from the position of the terminal device 1 and the shape of the tower.
Then, a response for guiding the user along the route is generated (S440). In this process, a response similar to guidance by a navigation device may be generated. When such processing ends, the guidance process ends. Note that the automatic conversation server process may be used to reproduce a message on the condition that the user has reached the point being guided to.
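The sketch below relates the terminal position (S432) to the object position found in the map information (S434) and produces a spoken response (S440); the haversine formula is standard, while the coordinates and response wording are illustrative assumptions:

```python
import math

def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between the terminal and the identified object."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def guidance_response(terminal: tuple, tower: tuple) -> str:
    # S440: generate a spoken route response (wording is illustrative)
    dist = haversine_km(*terminal, *tower)
    return f"The tower is about {dist:.1f} km ahead. Continue along this road."

print(guidance_response((35.6586, 139.7454), (35.7101, 139.8107)))
```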
In this way, when character information is input, the server 90 generates a response corresponding to a captured image of the surroundings of the voice response system 100, and outputs the response by voice. According to such a configuration, a response corresponding to the captured image can be output by voice, so usability can be improved compared with a configuration that generates responses without regard to the surroundings.
In addition, the server 90 searches the captured image by image processing for an object mentioned in the character information, specifies the position of the found object, and guides the user to that position. According to such a configuration, the user can be guided to an object in the captured image. Furthermore, when performing guidance to a destination, the server 90 acquires route information, such as the weather, temperature, humidity, traffic information, and road surface conditions on the way to the destination, and outputs the route information by voice. According to such a configuration, the situation on the way to the destination (the route information) can be conveyed to the user by voice.
Character information may also be input asking the device to respond with what it recognizes, and what (or who) is recognized from the captured image may be output by voice. Furthermore, instead of the process of S48, the server 90 may acquire a moving image capturing the shape of the user's mouth while the user inputs character information by voice. Then, instead of the process of S52, the voice may be converted into character information, and the character information may be corrected by estimating unclear parts of the voice based on the moving image. According to such a configuration, the utterance content can be estimated from the shape of the mouth, so unclear parts of the voice can be estimated well.
In the voice response system according to the fourteenth embodiment, the user is requested to perform a predetermined action, and it is determined whether the user has performed the action as requested. Specifically, the movement request process 1 shown in FIG. 28 and the movement request process 2 shown in FIG. 29 are carried out in order. When the movement request process 1 is started, a response (message) instructing the user to move the line of sight or the head to a predetermined position is output, as shown in FIG. 28 (S452).
Subsequently, the movement request process 2 is started. In the movement request process 2, as shown in FIG. 29, it is determined whether or not the line of sight or the head has moved to the position as instructed (S462). In this process, the user's movement is detected by performing image processing on images captured by the camera, or by using the detection results of the various sensors of the terminal device 1. For detecting the line of sight, a known gaze recognition technique may be employed. When such processing ends, the movement request process 2 ends.
In this way, in the voice response system 100, the user's line of sight is detected, and if the user's line of sight does not move to the predetermined position in response to the call, a voice requesting the user to move the line of sight to the predetermined position is output. According to such a configuration, the user can be made to look at a specific position, so safety confirmation when driving a vehicle can be performed reliably.
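The following sketch illustrates the request-and-check idea of S452 and S462; the normalized gaze coordinates, tolerance, and messages are assumptions, and in practice the observed gaze would come from a gaze-recognition module:

```python
TOLERANCE = 0.1  # assumed tolerance, in normalized screen units

def request_and_check(target: tuple, observed: tuple) -> str:
    """S462: did the line of sight move to the instructed position?"""
    dx, dy = observed[0] - target[0], observed[1] - target[1]
    if (dx * dx + dy * dy) ** 0.5 <= TOLERANCE:
        return "Thank you. Safety confirmed."
    return "Please look at the right-hand mirror."  # repeat the request (S452)

print(request_and_check(target=(0.9, 0.5), observed=(0.4, 0.5)))
```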
In addition, the server 90 may observe the positions of the user's body parts and the user's facial expression, and output a voice requesting the user to change the position of a body part or the facial expression when there is little change in response to a call. According to such a configuration, the user can be guided to move a body part to a specific position or to make a specific facial expression. This can be used, for example, when driving a vehicle or undergoing a physical examination.
In the voice response system according to this embodiment, the broadcast music complementing process shown in FIG. 30 is performed as the details of S56 described above. In the broadcast music complementing process, it is first determined whether or not the broadcast program or the music (the song, if the user is singing) has been interrupted (S482). If there is an interruption (S482: YES), the broadcast program or music synchronized in the process of S492 described later is set as the response content (S484), and the broadcast music complementing process ends. If there is no interruption (S482: NO), the broadcast program is acquired if a broadcast program is being viewed (S486), and if music is being played, the corresponding music is acquired (S488). In the karaoke DB 116, music and lyrics are recorded in association with each other, and when music is acquired in this process, music with lyrics is acquired. Subsequently, the broadcast program or music being viewed by the user is identified (S490). Then, this broadcast program or music is acquired and prepared so that it can be played back in synchronization with the broadcast program or music being viewed by the user (S492), and the broadcast music complementing process ends.
In this way, the server 90 acquires the same broadcast program as the one being viewed by the user, and when the user's broadcast program is interrupted, complements it by outputting the acquired broadcast program. In addition, the server 90 compares the lyrics of the song with the user's singing, and outputs the lyrics by voice only in the parts where the user's singing is absent. According to such a voice response system 100, the parts that a user of a so-called karaoke apparatus cannot sing (the parts where the lyrics are interrupted) can be filled in, as sketched below.
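This sketch assumes the song is already divided into timed segments and that per-segment detection of the user's voice is available (both assumptions; the description does not specify the timing mechanism):

```python
# Hypothetical segmented lyrics: (segment index, lyric text)
LYRICS = [(0, "twinkle twinkle"), (1, "little star"), (2, "how I wonder")]

def fill_gaps(user_sang: list) -> list:
    """user_sang[i] is True if the user's voice was detected in segment i.
    Only the lyrics of silent segments are output by voice."""
    return [text for (i, text) in LYRICS if not user_sang[i]]

# The user missed the second segment, so only that lyric is voiced.
print(fill_gaps([True, False, True]))  # -> ['little star']
```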
In the voice response system according to the sixteenth embodiment, when a character appears in the captured image and the terminal device 1 receives a question from the user about how to read the character, the reading of the character is acquired from outside and output by voice. Specifically, the character commentary process shown in FIG. 31 is performed as the details of S56 described above. In the character commentary process, as shown in FIG. 31, it is first determined whether or not a reading question such as "How do you read this?" has been received (S502). If a reading question has been received (S502: YES), the reading of the image-recognized character is searched for on another server or the like connected via the Internet 85 (S504), the obtained reading is set as the response (S506), and the character commentary process ends.
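A minimal sketch of S502 to S506, with a local reading table standing in for the search over other servers connected via the Internet 85; the table entries are illustrative:

```python
from typing import Optional

READINGS = {"東京": "Tokyo", "薔薇": "bara (rose)"}  # hypothetical lookup table

def character_commentary(question: str, recognized: str) -> Optional[str]:
    if "how do you read" not in question.lower():  # S502: reading question?
        return None
    reading = READINGS.get(recognized)             # S504: look up the reading
    if reading is None:
        return "I could not find the reading."
    return f"It is read '{reading}'."              # S506: set as the response

print(character_commentary("How do you read this?", "薔薇"))
```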
In the voice response system according to this embodiment, the server 90 detects abnormal behavior or an abnormal state of the user of the terminal device 1, and performs a notification process when there is an abnormality. Specifically, the terminal device 1 performs the behavior response terminal process shown in FIG. 32, and the server 90 performs the behavior response server process. In the behavior response terminal process, as shown in FIG. 32, first, the outputs of the various sensors mounted on the terminal device 1 are acquired (S522), and an image captured by the camera 41 is acquired (S524). Then, the obtained sensor outputs and captured image are transmitted in packets to the server 90 (S526), and the behavior response terminal process ends. In the behavior response server process, the operations of S42 to S44 described above are performed. Subsequently, the user's behavior is identified based on the position information of the terminal device 1 (the detection result of the GPS receiver 27) (S532), and the user's environment is detected based on the detection results of the temperature sensors 15 and 19 and the like (S534). Then, an abnormality is detected (S536). In this process, an abnormality is detected based on changes in the position information and on the environment. For example, when the user does not move in a place where the temperature is very high or very low, or when the user is in a place where the user does not normally go, it is detected that there is an abnormality. Alternatively, the position information and the environment are scored, and if this score falls below a reference value (outside the reference range), it is determined that there is an abnormality, as sketched below.
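This sketch covers the scoring variant of S536; the weights, limits, and reference value are all assumptions made for illustration:

```python
REFERENCE = 0.5  # assumed reference value

def is_abnormal(moved_m_per_min: float, temp_c: float, in_usual_area: bool) -> bool:
    """Score position change and environment; below the reference is abnormal."""
    move_score = min(moved_m_per_min / 10.0, 1.0)       # no movement scores low
    temp_score = 1.0 if 0.0 <= temp_c <= 35.0 else 0.0  # extreme heat/cold scores 0
    area_score = 1.0 if in_usual_area else 0.3          # unusual place lowers the score
    score = 0.4 * move_score + 0.3 * temp_score + 0.3 * area_score
    return score < REFERENCE

# Motionless, in high heat, in an unusual place -> abnormal (True)
print(is_abnormal(moved_m_per_min=0.0, temp_c=39.0, in_usual_area=False))
```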
In this way, the server 90 detects the user's behavior and the user's surrounding environment, and generates a message according to the detected behavior and surrounding environment. According to such a voice response system 100, the user can be warned of a dangerous place or an area where entry is prohibited, and it can be detected that the user is behaving abnormally. In addition, the server 90 determines the user's health condition based on a captured image of the user, and generates a message according to the health condition. According to such a voice response system 100, the health condition of the user can be managed. Furthermore, the server 90 notifies a predetermined contact when the health condition falls below a reference value. According to such a voice response system 100, a report can be made when the user's health condition is at or below the reference value, so an abnormality can be communicated to another person earlier.
Embodiments of the present invention are not limited to those described above, and can take various forms as long as they belong to the technical scope of the present invention.
For example, the voice response system 100 may mediate exchanges between two parties or among multiple parties. Specifically, when it is necessary to give way at an intersection or the like, the terminal devices 1 may negotiate which vehicle enters the intersection first. In this case, each terminal device 1 transmits its direction of movement when approaching the intersection and its approach speed to the server 90, and the server 90 may set a priority order for each terminal device 1 according to the direction of movement and the approach speed, and generate and output a voice such as "Wait" or "You may enter" according to the priority order.
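A minimal sketch of this negotiation; the priority rule used here (faster approach first, with an arbitrary tie-break on direction) is purely an illustrative assumption, since the description does not state how the priority order is derived:

```python
def assign_priority(vehicles: list) -> list:
    """Order vehicles by approach speed and direction, then tell each to wait or enter."""
    ranked = sorted(vehicles,
                    key=lambda v: (-v["speed_kmh"], v["direction"] != "left"))
    messages = []
    for rank, v in enumerate(ranked):
        word = "You may enter." if rank == 0 else "Wait."
        messages.append(f"{v['id']}: {word}")
    return messages

print(assign_priority([
    {"id": "car-A", "speed_kmh": 30, "direction": "straight"},
    {"id": "car-B", "speed_kmh": 20, "direction": "left"},
]))
```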
In addition, when the terminal device 1 accepts an incoming call for communication that requires a real-time response, such as voice communication, the incoming call may be accepted only when it is convenient for the user. Specifically, when the user's face can be imaged by the camera 41, it may be assumed that it is convenient for the user, and the incoming call may be accepted.
The situation of the called user may also be communicated to a caller who is waiting for a response. For example, if the user's schedule is managed in the terminal device 1 and the user does not respond to an incoming call, the schedule can be searched to determine what the user is doing, and the caller can be told the user's schedule or when the user will be able to respond. Furthermore, the location of the user may be communicated to the caller. For example, if the user is connected to the Internet or the like via a smartphone or personal computer, it can be determined which terminal is being operated, and it is conceivable to identify the user's location from this information and convey it to the caller. Whether or not the user can respond to an incoming call may also be determined using position information from GPS or the like. Based on the position information, it can be determined whether the user is in a car, at home, and so on; for example, if the user is moving or in bed, it can be judged that the user is in a highly public place or asleep and therefore cannot answer the call. When the incoming call cannot be answered in this way, it is conceivable to inform the caller of what the user is doing, as described above.
A configuration using security cameras is also conceivable. In recent years, security cameras have been installed in various locations, so the user's position can be recognized using a configuration that identifies the person, such as face authentication, with these security cameras. Furthermore, a situation determination using the security cameras, such as determining what the user is doing (whether or not the telephone can be answered), may be performed. Whether or not an incoming call can be answered can also be determined based on conditions such as whether another fixed-line telephone is in use (an incoming call cannot be answered while the fixed-line telephone is in use).
In addition, when the user of the terminal device 1 wants to have a conversation with someone, the results of the user's personality learning may be used to call out to the terminal device of whichever user, among an unspecified number of users, is estimated to be most compatible. In this case, a topic likely to liven up the conversation (a topic both users are interested in, extracted using the learning results) may be offered to the users. Furthermore, when the voice response device has not been used for a long time (when the user has not spoken for longer than a reference time), the voice response device may say a few words to the user. At this time, the words to be spoken may be selected using position information such as GPS.
Note that the terminal device 1 and the server 90 in the above embodiments correspond to an example of the voice response device of the present invention, and the processes of S22 and S56 correspond to an example of the response acquisition means of the present invention.
The process of S14 corresponds to an example of the voice transmission means of the present invention. The response candidate DB 105 corresponds to an example of the response recording means of the present invention.
The process of S56 corresponds to an example of the character information acquisition means of the present invention. The processes of S28, S60, and S64 correspond to an example of the voice output means of the present invention. The processes of S254, S258, and S260 correspond to an example of the first personality information generation means and the second personality information generation means of the present invention.
The processes of S48 and S56 correspond to an example of the response generation means of the present invention. The process of S48 also corresponds to an example of the voice input moving image acquisition means of the present invention, and the process of S52 corresponds to an example of the character information conversion means of the present invention.
The preference information generation process corresponds to an example of the preference information generation means of the present invention. The process of S56 corresponds to an example of the response candidate acquisition means of the present invention. The action character input process corresponds to an example of the character information generation means of the present invention. The other-terminal use process corresponds to an example of the other-device information acquisition means and the transfer means of the present invention.
The process of S98 corresponds to an example of the reproduction condition determination means of the present invention, and the process of S100 corresponds to an example of the message reproduction means of the present invention. The process of S116 corresponds to an example of the non-response transmission means of the present invention.
The process of S372 corresponds to an example of the speech accuracy detection means of the present invention, and the process of S374 corresponds to an example of the accuracy output means of the present invention. The process of S204 corresponds to an example of the connection control means of the present invention. The process of S50 corresponds to an example of the emotion determination means of the present invention. The process of S438 corresponds to an example of the route information acquisition means of the present invention.
The process of S462 corresponds to an example of the line-of-sight detection means of the present invention, and the process of S464 corresponds to an example of the line-of-sight movement request transmission means and the change request transmission means of the present invention.
The process of S486 corresponds to an example of the broadcast program acquisition means of the present invention, and the process of S484 corresponds to an example of the broadcast program complementing means and the lyrics adding means of the present invention. The processes of S504 and S506 correspond to an example of the reading output means of the present invention.
The processes of S522 and S524 correspond to an example of the behavior environment detection means of the present invention. The process of S538 corresponds to an example of the health condition determination means of the present invention, and the process of S540 corresponds to an example of the health message generation means of the present invention.

Abstract

This voice response device makes voice responses to input text information, and is provided with: a response acquisition means for acquiring a plurality of responses to the text information; and a voice output means for outputting the plurality of responses, doing so in respectively different voices. According to this voice response device, a plurality of responses can be output in different voices, whereby even in instances in which the answer to one item of text information cannot be identified as being a single one, different answers can be output in different voices in a manner easily comprehensible by a user.

Description

Voice response device
Cross-reference of related applications
This international application claims priority based on Japanese Patent Application No. 2012-137065, Japanese Patent Application No. 2012-137066, and Japanese Patent Application No. 2012-137067 filed with the Japan Patent Office on June 18, 2012, the entire contents of which are incorporated into the present international application by reference.
The present invention relates to a voice response device that makes a response to input character information by voice.
As the above voice response device, a device is known that searches a dictionary for an answer to an input question and outputs the found answer by voice (see, for example, Patent Document 1). A technique for generating an answer to a question based on the content of dialogue with a user is also known (see, for example, Patent Document 2).
Patent Document 1: Japanese Patent No. 4832097; Patent Document 2: Japanese Patent No. 4924950
In the above technique, one answer specified by the dictionary is simply given for one question.
One aspect of the present invention is to improve the usability for the user of a voice response device that makes a response to input character information by voice.
In the invention of the first aspect, a voice response device that makes a response to input character information by voice comprises: response acquisition means for acquiring a plurality of different responses to the character information; and voice output means for outputting the plurality of different responses in respectively different voice colors.
According to such a voice response device, a plurality of responses can be output in different voice colors, so even when a single answer cannot be determined for one item of character information, different answers can be output in different voice colors in a way that is easy for the user to follow. Usability for the user can therefore be improved.
Note that the voice response device of the present invention may be configured, for example, as a terminal device carried by the user, or as a server that communicates with such a terminal device. The character information may be input using input means such as a keyboard, or may be input by converting voice into character information.
In the voice response device, as in the invention of the second aspect, there may be provided: voice input means for the user to input voice; and voice transmission means for transmitting the input voice to an external device that converts the input voice into character information, generates a plurality of different responses to the character information, and transmits the responses to the voice response device, wherein the response acquisition means acquires the responses from the external device.
According to such a voice response device, voice can be input to the voice response device, so the character information can be input by voice. Moreover, since the responses can be generated in the external device, the processing load on the voice response device can be reduced.
Note that the operation of converting the input voice into character information may be performed by the voice response device or by the external device.
Furthermore, in the voice response device, as in the invention of the third aspect, the voice response device or the external device may include response recording means in which, for each of a plurality of items of character information, a plurality of different responses including a positive response and a negative response are recorded; the response acquisition means may acquire the positive response and the negative response as the plurality of different responses; and the voice output means may reproduce the positive response and the negative response in different voice colors.
According to such a voice response device, responses taking different positions, such as a positive response and a negative response, can be reproduced in different voice colors, so the voice can be reproduced as if different persons were speaking. The user listening to the voice is therefore less likely to feel a sense of strangeness.
Note that the voice color may be changed according to the type of response or the wording of the response. For example, a response in a gentle tone may be reproduced in a calm female voice, and a response in a forceful tone may be reproduced in a bold male voice. That is, response contents may be associated with personalities, and the voice color may be set according to the personality.
The voice response device may also, as in the invention of the fourth aspect, be configured for use at a workplace or company reception, or configured to say on the user's behalf things the user finds difficult to say to someone directly.
When the voice response device is used at a reception, the names and company names of persons who come for sales are recorded in advance in the voice response device or the external device, and when a visitor gives one of these names or company names, a response is generated so as to reproduce a phrase declining the visit.
When the device is configured to convey things that are difficult to say, for example, the user can tell the device before a date what he or she wants to say that day, and the voice response device will say it (reproduce the voice) on the user's behalf at a suitable timing (for example, at a preset time, or when a certain time has passed since the conversation broke off).
Alternatively, the device may speak words that create an opening for the difficult topic, for example, "Come to think of it, wasn't there something you wanted to tell her?" That is, instead of outputting a response immediately, the response may be output when a reproduction condition is satisfied, such as after a certain time has elapsed.
Furthermore, in the voice response device, as in the invention of the fifth aspect, the external device or the voice response device may acquire information for generating a response to the character information from another voice response device. Also, in the voice response device, as in the invention of the sixth aspect, when information for generating a response to character information is requested by another voice response device, information corresponding to the request may be returned.
In this case, the voice response device may include sensors for detecting position information, temperature, humidity, illuminance, noise level, and the like, and databases such as dictionary information, and may extract the necessary information on request.
According to such a voice response device (external device), information for generating a response can be acquired from another voice response device. In this case, information unique to the other voice response device, such as its position, can be acquired.
In addition, information unique to the device itself can be transmitted to another voice response device.
Furthermore, in the voice response device, as in the invention of the seventh aspect, a response (for example, a positive response or a negative response) output by the device itself or by another voice response device may be input as character information, and a response rebutting that response may be generated. From the user's standpoint, a discussion can then be heard from both the supporting and the opposing positions, and after hearing this discussion the user can make a final judgment.
This configuration can be realized using one or a plurality of voice response devices. In that case, the plurality of voice response devices may exchange voice by direct input and output of sound, or by wireless or other communication.
In the invention of the eighth aspect, a voice response device that makes a response to input character information by voice comprises: personality information acquisition means for acquiring personality information in which the personality of the user, or of a related person who has a relationship with the user, is associated with preset classifications; response acquisition means for acquiring response candidates representing a plurality of different responses to the character information; and voice output means for selecting the response to be output from the response candidates according to the personality information and outputting the selected response.
According to such a voice response device, different responses can be given according to the personality of the user or of a person related to the user (a related person). Usability for the user can therefore be improved.
In the voice response device, as in the invention of the ninth aspect, there may be provided first personality information generation means for generating personality information of the user or the related person based on answers to a plurality of preset questions, and the personality information acquisition means may acquire the personality information generated by the personality information generation means.
According to such a voice response device, the personality information can be generated within the voice response device. When generating the personality information, a known personality analysis technique (Rorschach test, Szondi test, etc.) may be used, or aptitude-test techniques of the kind companies use in employment screening may be used.
Furthermore, in the voice response device, as in the invention of the tenth aspect, there may be provided second personality information generation means for generating personality information of the user or the related person based on character strings included in the input character information, and the personality information acquisition means may acquire the personality information generated by the personality information generation means.
According to such a voice response device, the personality information can be generated in the course of the user's use of the voice response device.
In the voice response device, as in the invention of the eleventh aspect, there may be provided preference information generation means for generating, based on character strings included in the character information, preference information indicating the preference tendencies of the user or the related person, and the voice output means may select the response to be output from the response candidates based on the preference information and output the selected response.
According to such a voice response device, responses can be given according to the preferences of the user or the related person.
Furthermore, in the voice response device, as in the invention of the twelfth aspect, the user's behavior (conversations, places visited, things captured by the camera) may be learned (recorded and analyzed), and the device may supplement what is left unsaid in the user's conversations.
For example, in a conversation where the user answers "I'd rather have curry" to the question "Is hamburger all right for today?", if the device adds "That's because we had hamburger yesterday", the reason the user asked for curry is conveyed.
Such a configuration can also be used during a telephone call, and the device may be configured to join the user's conversations on its own.
Furthermore, in the voice response device, as in the invention of the thirteenth aspect, there may be provided response candidate acquisition means for acquiring response candidates from a predetermined server or from the Internet.
According to such a voice response device, response candidates can be acquired not only from the device itself or the external device but also from any device connected via the Internet, a dedicated line, or the like.
In the voice response device, as in the invention of the fourteenth aspect, there may be provided character information generation means for converting the user's actions into character information.
Here, the actions referred to in the present invention include those caused by muscle movements, such as conversation, handwriting of characters, and gestures (for example, sign language).
According to such a voice response device, the user's actions can be converted into character information.
Furthermore, in the voice response device, as in the invention of the fifteenth aspect, the character information generation means may convert the voice of the user's utterances into character information and accumulate the user's speaking habits (such as habits of pronunciation) as learning information (that is, capture these characteristics and record them).
According to such a voice response device, the character information can be generated based on the learning information, so the generation accuracy of the character information can be improved.
In the voice response device, as in the invention of the sixteenth aspect, there may be provided transfer means for transferring the learning information to another voice response device.
 このような音声応答装置によれば、使用者が他の音声応答装置を利用する場合においても、本音声応答装置で記録された学習情報を利用することができる。よって、他の音声応答装置を利用する場合においても文字情報の生成精度を向上させることができる。 According to such a voice response device, even when the user uses another voice response device, the learning information recorded by the voice response device can be used. Therefore, even when other voice response devices are used, the generation accuracy of character information can be improved.
 さらに、上記音声応答装置においては、第17局面の発明のように、使用者の行動および操作のうちの何れかを検出し、これらに基づいて学習情報または性格情報を生成するようにしてもよい。 Further, in the voice response device, as in the invention of the seventeenth aspect, any one of the user's behavior and operation may be detected, and learning information or personality information may be generated based on these. .
According to such a voice response device, for example, when it is detected that the user has had to jump onto the train several days in a row, the device can urge the user to leave home several minutes earlier from the next day onward; when it is detected from conversation that the user tends to get angry easily, the device can output voice or music that calms the mood.
The above voice response device may also comprise, as in the invention of the eighteenth aspect, other-device information acquisition means for acquiring, from another voice response device, information recorded in that other voice response device.
According to such a voice response device, a response can be generated on the basis of information recorded in another voice response device.
Furthermore, the above voice response device may comprise, as in the invention of the nineteenth aspect: reproduction condition determination means for determining, when the character information is not input, whether the state of the voice response device matches a reproduction condition set in advance as a condition for outputting voice; and message reproduction means for outputting a preset message when the reproduction condition is matched.
According to such a voice response device, voice can be output even when no character information is input (that is, even when the user does not speak to the device). For example, by prompting the user to speak, it can be used as a countermeasure against drowsiness while driving a car. In addition, safety confirmation can be performed by determining whether a person living alone responds.
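A minimal sketch of such a reproduction condition check follows; the condition fields (a time window and a maximum silent interval) and the preset message are illustrative assumptions, not the reproduction condition DB itself.

    import datetime

    # Hypothetical reproduction condition: speak up if the user has been silent
    # for too long within a given time window.
    condition = {"start": datetime.time(8, 0), "end": datetime.time(20, 0),
                 "max_silence_s": 3600}
    preset_message = "Are you doing all right? Please say something."

    def should_speak(now, last_input_time):
        in_window = condition["start"] <= now.time() <= condition["end"]
        silent_too_long = (now - last_input_time).total_seconds() > condition["max_silence_s"]
        return in_window and silent_too_long

    now = datetime.datetime(2012, 6, 18, 10, 0)
    last_input = datetime.datetime(2012, 6, 18, 8, 30)
    if should_speak(now, last_input):
        print(preset_message)  # would be synthesized and played through the speaker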
In the above voice response device, as in the invention of the twentieth aspect, the message reproduction means may acquire news information and output a message about the news in the form of a question that asks for the user's answer.
According to such a voice response device, it is possible to have conversations about the news, which prevents the conversation from always being the same. As the content of the conversation, for example, when information about a certain company's stock price has been acquired, the device could say, "Today the stock price of Company XX rose by XX yen. Did you know?"
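The following sketch turns a news item into such a question-format message; the news record fields are assumptions for illustration.

    # Hypothetical news record acquired from a news database or feed.
    news_item = {"company": "Company XX", "stock_delta_yen": 25}

    def news_question(item):
        # Build a question-format message so the user is prompted to answer.
        direction = "rose" if item["stock_delta_yen"] >= 0 else "fell"
        return ("Today the stock price of {} {} by {} yen. Did you know?"
                .format(item["company"], direction, abs(item["stock_delta_yen"])))

    print(news_question(news_item))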
Furthermore, in the above voice response device, as in the invention of the twenty-first aspect, the voice output means or the message reproduction means may output a preset message with separately acquired external information (news, or environmental information such as temperature, weather, and position information) appended to it.
According to such a voice response device, a response combining a predetermined message and the acquired information can be output.
In the above voice response device, as in the invention of the twenty-second aspect, a plurality of messages may be acquired, and the message to be reproduced may be selected and output according to how frequently each message has been reproduced.
According to such a voice response device, making frequently reproduced messages harder to select introduces randomness into message reproduction, while deliberately repeating a frequently reproduced message can call attention to it or help fix it in the user's memory.
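A minimal sketch of frequency-aware selection, assuming inverse-frequency weighting (the first of the two policies mentioned above); the weighting formula is an illustrative assumption.

    import random

    # Hypothetical message pool with per-message reproduction counts.
    messages = {"Good morning!": 10, "Don't forget your umbrella.": 2,
                "Time to take a break.": 5}

    def pick_message(pool):
        # Weight each message inversely to its reproduction count, so messages
        # reproduced often become harder to select.
        candidates = list(pool)
        weights = [1.0 / (1 + pool[m]) for m in candidates]
        choice = random.choices(candidates, weights=weights, k=1)[0]
        pool[choice] += 1  # update the reproduction count
        return choice

    print(pick_message(messages))

The opposite policy, deliberately repeating frequent messages, would simply invert the weights.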
Furthermore, the above voice response device may comprise, as in the invention of the twenty-third aspect, unanswered-case transmission means for transmitting, to a preset contact, information identifying the user and a notice that no answer was obtained, when no answer to a response or message is obtained.
According to such a voice response device, a contact can be notified when no answer is obtained. Thus, for example, an abnormality concerning an elderly person living alone can be reported at an early stage.
In the above voice response device, as in the invention of the twenty-fourth aspect, the message reproduction means may store the content of conversations and ask questions designed to elicit the same content as what was heard (memory confirmation processing).
According to such a voice response device, the user's memory can be checked and the retention of memories can be promoted.
Furthermore, the above voice response device may comprise, as in the invention of the twenty-fifth aspect: utterance accuracy detection means for detecting the degree of accuracy of the pronunciation and accent of the voice input by the user; and accuracy output means for outputting the detected degree of accuracy.
According to such a voice response device, the accuracy of pronunciation and accent can be checked. This is effective, for example, when practicing a foreign language.
In the above voice response device, as in the invention of the twenty-sixth aspect, the accuracy output means may output a voice containing the closest word when the degree of accuracy is at or below a predetermined value.
According to such a voice response device, the user can check the accuracy of his or her pronunciation and accent.
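A minimal sketch of the accuracy check and closest-word fallback follows, assuming a hypothetical similarity score based on string matching; an actual device would score acoustic features rather than spellings.

    import difflib

    vocabulary = ["weather", "whether", "leather"]

    def accuracy(recognized, target):
        # Crude stand-in for a pronunciation/accent score: string similarity in [0, 1].
        return difflib.SequenceMatcher(None, recognized, target).ratio()

    def check_pronunciation(recognized, target, threshold=0.8):
        score = accuracy(recognized, target)
        if score <= threshold:
            # Output a voice containing the closest word so the user can compare.
            closest = difflib.get_close_matches(recognized, vocabulary, n=1)
            return "Did you mean '{}'? (accuracy {:.2f})".format(
                closest[0] if closest else target, score)
        return "Good pronunciation (accuracy {:.2f})".format(score)

    print(check_pronunciation("wezzer", "weather"))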
Furthermore, in the voice response device, as in the invention of the twenty-seventh aspect,
The message reproduction means may output the same question again when the accuracy is below a certain value.
According to such a voice response device, an accurate answer can be sought by outputting the same question again.
The above voice response device may also comprise, as in the invention of the twenty-eighth aspect, connection control means for identifying a communication partner from the input character information and connecting that communication partner with a communication destination set in advance for each communication partner.
According to such a voice response device, reception work and telephone handling can be assisted.
In particular, in the above voice response device, as in the invention of the twenty-ninth aspect, the connection control means may distinguish between sales activities and ordinary visitors, and reproduce a message declining the visit if it is a sales activity.
According to such a voice response device, persons who might interfere with the user's work can be turned away without the user having to deal with them personally.
Furthermore, in the above voice response device, as in the invention of the thirtieth aspect, a keyword contained in the input character information (particularly voice) may be extracted, and a connection may be made to the connection destination to which that keyword corresponds. For example, keywords such as the names of the parties concerned may be associated with their connection destinations in advance.
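A minimal sketch of such keyword-based connection control follows; the keyword table and the sales-decline message are illustrative assumptions.

    # Hypothetical table associating keywords with connection destinations in advance.
    keyword_to_destination = {"accounting": "ext-201", "Mr. Sato": "ext-105"}
    SALES_KEYWORDS = {"special offer", "free trial", "campaign"}

    def route_call(transcript):
        text = transcript.lower()
        # Decline identified sales activity without involving the user.
        if any(k in text for k in SALES_KEYWORDS):
            return ("decline", "We are not accepting sales calls. Goodbye.")
        # Otherwise connect to the destination whose keyword appears in the speech.
        for keyword, destination in keyword_to_destination.items():
            if keyword.lower() in text:
                return ("connect", destination)
        return ("operator", None)  # fall back to a human when no keyword matches

    print(route_call("Hello, this is a free trial campaign for..."))
    print(route_call("May I speak with Mr. Sato?"))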
According to such a voice response device, work such as transferring telephone calls and paging at a reception desk can be assisted.
In the above voice response device, as in the invention of the thirty-first aspect, the device may recognize the matter the other party is speaking about on the basis of keywords and convey a summary of what the other party said to the user.
According to such a voice response device, intermediary work with customers can be assisted.
Furthermore, the above voice response device may comprise, as in the invention of the thirty-second aspect, emotion determination means for reading the emotion from the voice color of the voice input by the user and outputting which emotion it corresponds to, among emotions including at least one of normal, anger, joy, confusion, sadness, and elation.
According to such a voice response device, a response can be output according to the user's emotion.
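As a rough illustration, the sketch below classifies emotion from simple prosodic features; the features and thresholds are invented for illustration and merely stand in for the emotion determination database used in the embodiments.

    def classify_emotion(pitch_hz, energy, speech_rate):
        # Toy threshold rules standing in for a trained emotion-determination model.
        if energy > 0.8 and speech_rate > 5.0:
            return "anger"
        if pitch_hz > 220 and energy > 0.6:
            return "joy"
        if energy < 0.3 and pitch_hz < 160:
            return "sadness"
        if speech_rate < 2.0:
            return "confusion"
        return "normal"

    print(classify_emotion(pitch_hz=240, energy=0.9, speech_rate=5.5))  # -> anger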
Next, the invention of the thirty-third aspect is characterized by comprising: response generation means for generating, when the character information is input, a response according to a captured image of the surroundings of the voice response device; and voice output means for outputting the response by voice.
According to such a voice response device, a response according to the captured image can be output by voice. Usability can therefore be improved compared with a configuration that generates a response from character information alone.
As a specific configuration of the present invention, for example, character information may be input asking the device to respond with what it recognizes, and the device may output by voice what (or whom) it has recognized from the captured image.
Incidentally, the above voice response device may comprise, as in the invention of the thirty-fourth aspect: position specifying means for searching the captured image by image processing for an object contained in the character information and specifying the position of the found object; and guidance means for guiding the user to the position of the object.
According to such a voice response device, the user can be guided to an object in the captured image.
Furthermore, the above voice response device may comprise, as in the invention of the thirty-fifth aspect: voice input video acquisition means for acquiring a moving image capturing the shape of the user's mouth when character information is input by voice; and character information conversion means for converting the voice into character information and correcting the character information by estimating unclear parts of the voice on the basis of the moving image.
According to such a voice response device, the utterance content can be estimated from the shape of the mouth, so unclear parts of the voice can be estimated well.
In the above voice response device, as in the invention of the thirty-sixth aspect, the message reproduction means may detect the user's irritation or agitation by detecting involuntary utterances, and generate a message for calming that irritation or agitation.
According to such a voice response device, when the user is irritated or agitated, these states can be suppressed. The occurrence of trouble between the user and those around him or her can therefore be reduced.
Furthermore, the above voice response device may comprise, as in the invention of the thirty-seventh aspect, route information acquisition means for acquiring, when providing guidance to a destination, route information such as the weather, temperature, humidity, traffic information, and road surface conditions along the way to the destination, and the message reproduction means may output the route information by voice.
According to such a voice response device, the conditions on the way to the destination (the route information) can be communicated to the user by voice.
In the above voice response device, as in the invention of the thirty-eighth aspect, the device may comprise: line-of-sight detection means for detecting the user's line of sight; and line-of-sight movement request transmission means for outputting a voice requesting that the line of sight be moved to a predetermined position when the user's line of sight does not move to the predetermined position in response to a call by the message reproduction means.
According to such a voice response device, the user can be made to look at a specific position. Safety checks during vehicle driving and the like can therefore be performed reliably.
The above voice response device may also comprise, as in the invention of the thirty-ninth aspect, change request transmission means for observing the positions of the user's body parts and facial expression and, when there is little change in response to the call, outputting a voice requesting that the positions of body parts or the facial expression be changed.
According to such a voice response device, a part of the user's body can be guided to a specific position, or the user can be guided to make a specific facial expression. The present invention can be used when driving a vehicle, during physical examinations, and the like.
Furthermore, the above voice response device may comprise, as in the invention of the fortieth aspect: broadcast program acquisition means for acquiring the same broadcast program as the broadcast program the user is viewing; and broadcast program complementing means for complementing the broadcast program, when it is interrupted, by outputting the broadcast program acquired by the device itself.
According to such a voice response device, the broadcast program the user is viewing can be supplemented so that it is not interrupted.
In the above voice response device, as in the invention of the forty-first aspect, the device may comprise lyric addition means which, when the user sings lyrics over a piece of music without lyrics, compares the version of the music with lyrics against the lyrics sung by the user and outputs the lyrics by voice only in the parts where the user's lyrics are missing.
According to such a voice response device, it is possible to fill in the parts the user cannot sing (the parts where the lyrics break off) in so-called karaoke.
Furthermore, the above voice response device may comprise, as in the invention of the forty-second aspect, reading output means which, when characters are contained in the captured image and the user asks how to read those characters, acquires information on the characters from outside and outputs by voice the reading of the characters contained in that information.
According to such a voice response device, the user can be taught how to read characters.
In the above voice response device, as in the invention of the forty-third aspect, the device may comprise behavior/environment detection means for detecting the user's behavior and the user's surrounding environment, and the message generation means may generate a message according to the detected behavior and surrounding environment.
According to such a voice response device, dangerous places, off-limits areas, and the like can be announced. It is also possible to detect, for example, that the user is behaving abnormally.
Furthermore, the above voice response device may comprise, as in the invention of the forty-fourth aspect: health condition determination means for determining the user's health condition on the basis of a captured image of the user; and health message generation means for generating a message according to the health condition.
According to such a voice response device, the user's health condition can be managed.
The above voice response device may also comprise, as in the invention of the forty-fifth aspect, notification means for notifying a predetermined contact when the health condition falls below a reference value.
According to such a voice response device, a notification can be made when the user's health condition is at or below the reference value. An abnormality can therefore be reported to others at an earlier stage.
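A minimal sketch of the threshold-based notification follows, assuming a hypothetical numeric health score and a contact list like the one held in the contact memory described later.

    # Hypothetical health score in [0, 1] derived from captured images of the user.
    REFERENCE_VALUE = 0.4
    contacts = ["family@example.com"]

    def check_health_and_notify(health_score, user_id):
        if health_score < REFERENCE_VALUE:
            for contact in contacts:
                # Stand-in for sending a report via the wireless telephone unit.
                print("Notify {}: user {} health score {:.2f} below reference."
                      .format(contact, user_id, health_score))

    check_health_and_notify(0.3, "user1")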
Furthermore, in the above voice response device, as in the invention of the forty-sixth aspect, information about the user may be output in response to an inquiry from a person other than the user.
According to such a voice response device, if, for example, the content of the user's meals and the distance of the user's walks are detected in advance, the device can answer questions at a hospital or the like on behalf of the user. The device may also learn things such as the user's health condition and self-introduction.
Note that the invention of each aspect need not presuppose the other inventions, and each can be made an independent invention as far as possible.
FIG. 1 is a block diagram showing a schematic configuration of a voice response system to which the present invention is applied.
FIG. 2 is a block diagram showing a schematic configuration of a terminal device.
FIG. 3 is a flowchart showing voice response terminal processing executed by the MPU of the terminal device.
FIG. 4 is a flowchart showing voice response server processing executed by the calculation unit of the server.
FIG. 5 is an explanatory diagram showing an example of the response candidate DB.
FIG. 6 is a flowchart showing automatic conversation terminal processing executed by the MPU of the terminal device.
FIG. 7 is a flowchart showing automatic conversation server processing executed by the calculation unit of the server.
FIG. 8 is a flowchart showing message terminal processing executed by the MPU of the terminal device.
FIG. 9 is a flowchart showing message server processing executed by the calculation unit of the server.
FIG. 10 is a flowchart showing guidance terminal processing executed by the MPU of the terminal device.
FIG. 11 is a flowchart showing guidance server processing executed by the calculation unit of the server.
FIG. 12 is a flowchart showing reception processing executed by the calculation unit of the server.
FIG. 13 is a flowchart showing information provision terminal processing executed by the MPU of the terminal device.
FIG. 14 is an explanatory diagram showing an example of the personality DB.
FIG. 15 is a flowchart showing personality information generation processing executed by the MPU of the terminal device.
FIG. 16 is an explanatory diagram showing an example of the preference DB.
FIG. 17 is a flowchart showing preference information generation processing executed by the calculation unit of the server.
FIG. 18 is an explanatory diagram showing examples of combinations of personality categories and preferences.
FIG. 19 is a flowchart showing action character input processing executed by the calculation unit of the server.
FIG. 20 is a flowchart showing other-terminal use processing executed by the calculation unit of the server.
FIG. 21 is a flowchart showing memory confirmation processing executed by the calculation unit of the server.
FIG. 22 is a flowchart showing pronunciation determination processing 1 executed by the calculation unit of the server.
FIG. 23 is a flowchart showing pronunciation determination processing 2 executed by the calculation unit of the server.
FIG. 24 is a flowchart showing pronunciation determination processing 3 executed by the calculation unit of the server.
FIG. 25 is a flowchart showing emotion determination processing executed by the calculation unit of the server.
FIG. 26 is a flowchart showing emotion response generation processing executed by the calculation unit of the server.
FIG. 27 is a flowchart showing guidance processing executed by the calculation unit of the server.
FIG. 28 is a flowchart showing movement request processing 1 executed by the calculation unit of the server.
FIG. 29 is a flowchart showing movement request processing 2 executed by the calculation unit of the server.
FIG. 30 is a flowchart showing broadcast music complementation processing executed by the calculation unit of the server.
FIG. 31 is a flowchart showing character explanation processing executed by the calculation unit of the server.
FIG. 32 is a flowchart showing action response terminal processing executed by the calculation unit of the server.
FIG. 33 is a flowchart showing action response server processing executed by the calculation unit of the server.
DESCRIPTION OF REFERENCE SIGNS: 1…terminal device, 10…behavior sensor unit, 11…three-dimensional acceleration sensor, 13…three-axis gyro sensor, 15…temperature sensor, 17…humidity sensor, 19…temperature sensor, 21…humidity sensor, 23…illuminance sensor, 25…wetness sensor, 27…GPS receiver, 29…wind speed sensor, 33…electrocardiographic sensor, 35…heart sound sensor, 37…microphone, 39…memory, 41…camera, 50…communication unit, 53…wireless telephone unit, 55…contact memory, 60…notification unit, 61…display, 63…illumination lights, 65…speaker, 70…operation unit, 71…touch pad, 73…confirmation button, 75…fingerprint sensor, 77…rescue request lever, 80…communication base station, 85…Internet network, 90…server, 100…voice response system, 101…calculation unit, 102…voice recognition DB, 103…predictive conversion DB, 104…voice DB, 105…response candidate DB, 106…personality DB, 107…learning DB, 108…preference DB, 109…news DB, 110…weather DB, 111…reproduction condition DB, 112…handwritten character/sign language DB, 113…terminal information DB, 114…emotion determination DB, 115…health determination DB, 116…karaoke DB, 117…report destination DB, 118…sales DB, 119…client DB.
Embodiments of the present invention will be described below with reference to the drawings.
[First Embodiment]
[Configuration of this embodiment]
The voice response system 100 to which the present invention is applied is configured so that, for voice input at a terminal device 1, an appropriate response is generated at a server 90 and the response is output by voice at the terminal device 1. Specifically, as shown in FIG. 1, the voice response system 100 is configured so that a plurality of terminal devices 1 and the server 90 can communicate with one another via communication base stations 80 and an Internet network 85.
The server 90 has the functions of an ordinary server device. In particular, the server 90 includes a calculation unit 101 and various databases (DBs). The calculation unit 101 is configured as a well-known arithmetic device including a CPU and memory such as ROM and RAM, and, on the basis of programs in the memory, carries out various kinds of processing such as communication with the terminal devices 1 and the like via the Internet network 85, reading and writing of data in the various DBs, and voice recognition and response generation for conversing with users of the terminal devices 1.
As shown in FIG. 1, the various DBs include a voice recognition DB 102, a predictive conversion DB 103, a voice DB 104, a response candidate DB 105, a personality DB 106, a learning DB 107, a preference DB 108, a news DB 109, a weather DB 110, a reproduction condition DB 111, a handwritten character/sign language DB 112, a terminal information DB 113, an emotion determination DB 114, a health determination DB 115, a karaoke DB 116, a report destination DB 117, a sales DB 118, a client DB 119, and so on. The details of these DBs will be described as each process is explained.
Next, as shown in FIG. 2, the terminal device 1 is configured with a behavior sensor unit 10, a communication unit 50, a notification unit 60, and an operation unit 70 provided in a predetermined housing.
The behavior sensor unit 10 includes a well-known MPU 31 (microprocessor unit), memory 39 such as ROM and RAM, and various sensors. The MPU 31 performs processing such as driving a heater for optimizing the temperature of a sensor element so that the sensor elements constituting the various sensors can detect their measurement targets (humidity, wind speed, and so on) properly.
The behavior sensor unit 10 includes, as the various sensors, a three-dimensional acceleration sensor 11 (3DG sensor), a three-axis gyro sensor 13, a temperature sensor 15 arranged on the back of the housing, a humidity sensor 17 arranged on the back of the housing, a temperature sensor 19 arranged on the front of the housing, a humidity sensor 21 arranged on the front of the housing, an illuminance sensor 23 arranged on the front of the housing, a wetness sensor 25 arranged on the back of the housing, a GPS receiver 27 that detects the current location of the terminal device 1, and a wind speed sensor 29.
The behavior sensor unit 10 also includes, as the various sensors, an electrocardiographic sensor 33, a heart sound sensor 35, a microphone 37, and a camera 41. The temperature sensors 15 and 19 and the humidity sensors 17 and 21 measure the temperature or humidity of the air outside the housing.
The three-dimensional acceleration sensor 11 detects the acceleration applied to the terminal device 1 in three mutually orthogonal directions (the vertical direction (Z direction), the width direction of the housing (Y direction), and the thickness direction of the housing (X direction)) and outputs the detection results.
The three-axis gyro sensor 13 detects, as angular velocities applied to the terminal device 1, the angular acceleration about the vertical direction (Z direction) and about two arbitrary directions orthogonal to the vertical direction (the width direction of the housing (Y direction) and the thickness direction of the housing (X direction)), counterclockwise rotation about each direction being taken as positive, and outputs the detection results.
The temperature sensors 15 and 19 each include, for example, a thermistor element whose electrical resistance changes with temperature. In this embodiment, the temperature sensors 15 and 19 detect temperature in degrees Celsius, and all temperatures given in the following description are in degrees Celsius.
The humidity sensors 17 and 21 are configured, for example, as well-known polymer film humidity sensors. A polymer film humidity sensor is configured as a capacitor whose dielectric constant changes as the amount of moisture contained in the polymer film changes with relative humidity.
The illuminance sensor 23 is configured, for example, as a well-known illuminance sensor including a phototransistor.
The wind speed sensor 29 is, for example, a well-known wind speed sensor that calculates the wind speed from the electric power (amount of heat dissipation) required to maintain a heater at a predetermined temperature.
The heart sound sensor 35 is configured as a vibration sensor that picks up vibrations caused by the beating of the user's heart, and the MPU 31 distinguishes vibration and noise caused by the heartbeat from other vibration and noise in view of the detection results of the heart sound sensor 35 and the heart sounds input from the microphone 37.
The wetness sensor 25 detects water droplets on the surface of the housing, and the electrocardiographic sensor 33 detects the user's heartbeat.
The camera 41 is arranged in the housing of the terminal device 1 so that its imaging range covers the outside of the terminal device 1.
The communication unit 50 includes a well-known MPU 51, a wireless telephone unit 53, and a contact memory 55, and is configured to be able to acquire detection signals from the various sensors constituting the behavior sensor unit 10 via an input/output interface (not shown). The MPU 51 of the communication unit 50 executes processing according to the detection results from the behavior sensor unit 10, input signals entered via the operation unit 70, and programs stored in a ROM (not shown).
Specifically, the MPU 51 of the communication unit 50 performs the function of a motion detection device that detects specific motions made by the user, the function of a positional relationship detection device that detects the positional relationship with the user, the function of an exercise load detection device that detects the load of exercise performed by the user, and the function of transmitting the processing results of the MPU 51.
The wireless telephone unit 53 is configured to be able to communicate with, for example, mobile telephone base stations, and the MPU 51 of the communication unit 50 outputs its processing results to the notification unit 60 or transmits them via the wireless telephone unit 53 to preset destinations.
The contact memory 55 functions as a storage area for storing position information on places the user visits. The contact memory 55 also stores information on contacts (telephone numbers and the like) to be notified if something abnormal happens to the user.
The notification unit 60 includes a display 61 configured, for example, as an LCD or organic EL display, illumination lights 63 made up of LEDs capable of emitting light in, for example, seven colors, and a speaker 65. The parts of the notification unit 60 are driven and controlled by the MPU 51 of the communication unit 50.
Next, the operation unit 70 includes a touch pad 71, a confirmation button 73, a fingerprint sensor 75, and a rescue request lever 77.
The touch pad 71 outputs a signal corresponding to the position touched and the pressure applied by the user (the user, the user's guardian, or the like).
The confirmation button 73 is configured so that the contact of a built-in switch closes when it is pressed by the user, so that the communication unit 50 can detect that the confirmation button 73 has been pressed.
The fingerprint sensor 75 is a well-known fingerprint sensor configured to read a fingerprint using, for example, an optical sensor. In place of the fingerprint sensor 75, any means capable of recognizing a physical characteristic of a human being (means capable of biometric authentication, that is, means capable of identifying an individual), such as a sensor that recognizes the shape of the veins of the palm, may be adopted.
The operation unit 70 also includes the rescue request lever 77, which connects to a predetermined contact when operated.
[Processing of this embodiment]
The processing carried out in such a voice response system 100 will be described below.
The voice response terminal processing carried out in the terminal device 1 is processing that accepts voice input from the user, sends the voice to the server 90, and, on receiving from the server 90 the response to be output, reproduces the response by voice. This processing is started when the user indicates via the operation unit 70 that voice input is to be performed.
In detail, as shown in FIG. 3, first, input from the microphone 37 is enabled (ON state) (S2), and imaging (recording) by the camera 41 is started (S4). It is then determined whether there has been voice input (S6).
If there is no voice input (S6: NO), it is determined whether a timeout has occurred (S8). Here, a timeout means that the time allowed for waiting for processing has been exceeded; here the allowed time is set to, for example, about 5 seconds.
If a timeout has occurred (S8: YES), the processing proceeds to S30, described later. If no timeout has occurred (S8: NO), the processing returns to S6.
If there is voice input (S6: YES), the voice is recorded in memory (S10), and it is determined whether the voice input has finished (S12). Here, it is determined that the voice input has finished when the voice has been interrupted for a certain time or longer, or when an instruction to end voice input has been entered via the operation unit 70.
If the voice input has not finished (S12: NO), the processing returns to S10. If the voice input has finished (S12: YES), data such as an ID for identifying the device itself, the voice, and the captured image are transmitted as packets to the server 90 (S14). The data transmission processing may instead be performed between S10 and S12.
It is then determined whether the data transmission has been completed (S16). If the transmission has not been completed (S16: NO), the processing returns to S14.
If the transmission has been completed (S16: YES), it is determined whether data (packets) transmitted by the voice response server processing described later have been received (S18). If no data has been received (S18: NO), it is determined whether a timeout has occurred (S20).
If a timeout has occurred (S20: YES), the processing proceeds to S30, described later. If no timeout has occurred (S20: NO), the processing returns to S18.
If data has been received (S18: YES), the packets are received (S22). In this processing, one or more different responses to the character information, each associated with a different voice color, are acquired.
It is then determined whether the reception has been completed (S24). If the reception has not been completed (S24: NO), it is determined whether a timeout has occurred (S26).
If a timeout has occurred (S26: YES), a notice that an error has occurred is output via the notification unit 60, and the voice response terminal processing ends. If no timeout has occurred (S26: NO), the processing returns to S22.
If the reception has been completed (S24: YES), a response based on the received packets is output by voice from the speaker 65 (S28). In this processing, when a plurality of responses are reproduced, the responses are reproduced with respectively different voice colors. When this processing ends, the voice response terminal processing ends.
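The overall flow of this terminal-side processing (S2 through S28) can be sketched as follows; the functions standing in for the microphone, camera, and packet transport are hypothetical placeholders for the hardware and wireless communication described above.

    TIMEOUT_S = 5  # allowed waiting time, as in S8/S20

    # Hypothetical stand-ins for the microphone, camera, and server transport.
    def wait_for_voice(timeout_s):
        return b"recorded-voice"          # S6-S12: recorded voice data, or None on timeout

    def send_packets(device_id, voice, image):
        pass                              # S14-S16: packet transmission to the server

    def receive_packets(timeout_s):
        # S18-S24: responses paired with voice colors, or None on timeout
        return [("Today's weather in Tokyo is sunny.", "woman 1"),
                ("However, tomorrow it will rain.", "man 1")]

    def voice_response_terminal(device_id):
        voice = wait_for_voice(TIMEOUT_S)     # S2-S12: enable input and record
        if voice is None:
            return                            # S8: timeout
        send_packets(device_id, voice, b"captured-image")
        responses = receive_packets(TIMEOUT_S)
        if responses is None:
            print("error")                    # S26: report the error via the notification unit
            return
        for text, voice_color in responses:   # S28: reproduce each response
            print("[{}] {}".format(voice_color, text))

    voice_response_terminal("terminal-001")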
Next, the voice response server processing carried out at the server 90 (external device) will be described with reference to FIG. 4. The voice response server processing receives voice from the terminal device 1, performs voice recognition to convert the voice into character information, generates a response to the voice, and returns it to the terminal device 1. In particular, in this embodiment, a plurality of responses may be transmitted, each associated with a voice of a different voice color.
In detail, as shown in FIG. 4, the voice response server processing first determines whether packets have been received from any terminal device 1 (S42). If no packets have been received (S42: NO), the processing of S42 is repeated.
If packets have been received (S42: YES), the terminal device 1 that is the communication partner is identified (S44). In this processing, the terminal device 1 is identified by the ID of the terminal device 1 contained in the packets.
Next, the voice contained in the packets is recognized (S46). Here, in the voice recognition DB 102, a large number of voice waveforms are associated with a large number of characters. In the predictive conversion DB 103, each word is associated with the words that tend to follow it.
Accordingly, in this processing, well-known voice recognition processing is carried out by referring to the voice recognition DB 102 and the predictive conversion DB 103, and the voice is converted into character information.
Next, objects in the captured image are identified by image processing (S48). The user's emotion is then determined on the basis of the voice waveform, word endings, and the like (S50).
In this processing, by referring to the emotion determination DB 114, in which voice waveforms (voice colors), word endings, and the like are associated with emotion categories such as normal, anger, joy, confusion, sadness, and elation, it is determined which category the user's emotion falls into, and the determination result is recorded in memory. Next, by referring to the learning DB 107, words this user often speaks are looked up, and parts of the character information generated by voice recognition that were ambiguous are corrected.
In the learning DB 107, characteristics of each user, such as words the user often speaks and habits of pronunciation, are recorded for each user. Data is also added to and corrected in the learning DB 107 through conversations with the user.
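A minimal sketch of this correction step follows, assuming a hypothetical per-user frequent-word list and a placeholder marker ("?") for ambiguous recognition results; whitespace tokenization is a simplifying assumption.

    import difflib

    # Hypothetical learning DB: words each user often speaks.
    learning_db = {"user1": ["Nagoya", "weather", "baseball"]}

    def correct_ambiguous(user_id, tokens):
        # Replace tokens flagged as ambiguous ("?...") with the closest frequent word.
        frequent = learning_db.get(user_id, [])
        fixed = []
        for tok in tokens:
            if tok.startswith("?"):
                match = difflib.get_close_matches(tok[1:], frequent, n=1)
                fixed.append(match[0] if match else tok[1:])
            else:
                fixed.append(tok)
        return fixed

    print(correct_ambiguous("user1", ["weather", "in", "?Nagoye"]))  # -> ['weather', 'in', 'Nagoya']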
Next, the corrected character information is taken as the input character information (S54), and a response is obtained from the response candidate DB 105 by searching the response candidate DB 105 using sentences similar to the character information as input (S56). Here, in the response candidate DB 105, as shown in FIG. 5, each item of input character information is uniquely associated with a first output, the voice color of the first output, a second output, and the voice color of the second output.
For example, as shown in the first row of FIG. 5, when the character information "today's weather in ※" is input, the first output "Today's weather in ※ is ※" is output in association with the voice color of woman 1. The "※" parts are obtained by accessing the weather DB 110, in which region names are associated with the weather forecasts for the next several days in each region.
When the character information "today's weather in ※" is input, the weather at the time today's weather changes is also obtained from the weather DB 110, and the second output "However, ※ is ※" is output in association with the voice color of man 1. If today's weather in Tokyo is sunny and tomorrow's weather is rainy, then when "today's weather in Tokyo" is input, the device outputs "Today's weather in Tokyo is sunny." in the voice color of woman 1 and "However, tomorrow it will rain." in the voice color of man 1.
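A minimal sketch of this lookup follows, assuming a hypothetical in-memory table and a stub weather DB; the placeholder filling mirrors the ※ substitution described above.

    # Hypothetical response candidate table: input pattern -> (output template, voice color).
    response_candidates = {
        "today's weather in {place}": [
            ("Today's weather in {place} is {today}.", "woman 1"),
            ("However, tomorrow it will be {tomorrow}.", "man 1"),
        ],
    }
    weather_db = {"Tokyo": {"today": "sunny", "tomorrow": "rainy"}}  # stub weather DB

    def respond(place):
        forecast = weather_db[place]
        outputs = []
        for template, voice_color in response_candidates["today's weather in {place}"]:
            text = template.format(place=place, **forecast)
            outputs.append((text, voice_color))  # each response keeps its own voice color
        return outputs

    for text, voice in respond("Tokyo"):
        print("[{}] {}".format(voice, text))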
Although this embodiment describes the case where a plurality of responses are output, when there is only one answer to the input there is only one response. For this reason, it is determined whether there is only one response (S58). If there is only one response (S58: YES), the processing proceeds to S62, described later.
If there are a plurality of responses (S58: NO), each response content is associated with a voice color (S60). Here, the voice DB 104 stores a database of artificial voices for each voice color, and in this processing the voice color set for each response is associated with the corresponding voice color in the database.
Next, the response contents are converted into voice (S62). In this processing, on the basis of the database stored in the voice DB 104, processing for outputting the response contents (character information) as voice is performed.
The generated response (voice) is then transmitted as packets to the terminal device 1 that is the communication partner (S64). The packets may instead be transmitted while the voice of the response contents is being generated.
Next, the conversation content is recorded (S68). In this processing, the input character information and the output response contents are recorded in the learning DB 107 as conversation content. At this time, keywords contained in the conversation content (words recorded in the voice recognition DB 102), pronunciation characteristics, and the like are recorded in the learning DB 107.
When this processing ends, the voice response server processing ends.
[Effects of this embodiment]
The voice response system 100 described in detail above is a system that makes responses to input character information by voice, in which the terminal device 1 (MPU 31) acquires a plurality of different responses to the character information and outputs the plurality of different responses in respectively different voice colors.
According to such a voice response system 100, a plurality of responses can be output in different voice colors, so even when a single solution to one item of character information cannot be determined, different solutions can be output in different voice colors in a way that is easy for the user to understand. Usability for the user can therefore be improved.
In the voice response system 100, the terminal device 1 inputs the user's voice via the microphone 37, and the server 90 (calculation unit 101) converts the input voice into character information, generates a plurality of different responses to the character information, and transmits them to the terminal device 1. The terminal device 1 then acquires the responses from the server 90.
According to such a voice response system 100, voice can be input at the terminal device 1, so the system can be configured so that character information is input by voice. Since the configuration can be such that the responses are generated at the server 90, the processing load on the terminal device 1 can be reduced.
Furthermore, in the voice response system 100, the server 90 converts the voice of the user's utterances into character information and accumulates the user's speech habits (such as habits of pronunciation) as learning information (capturing these characteristics and recording them).
According to such a voice response system 100, character information can be generated on the basis of the learning information, so the accuracy of character information generation can be improved.
Furthermore, in the voice response system 100, for the voice input by the user, the server 90 reads the emotion from the voice color and outputs which emotion it corresponds to, among emotions including at least one of normal, anger, joy, confusion, sadness, and elation.
According to such a voice response system 100, a response can be output according to the user's emotion.
[Modifications of the First Embodiment]
In this embodiment, voice recognition is used as the means of inputting character information, but the input is not limited to voice recognition and may be made using input means (the operation unit 70) such as a keyboard or touch panel. Also, the operation of converting the input voice into character information is performed at the server 90, but it may instead be performed at the terminal device 1.
Furthermore, in the voice response system 100, the server 90 may be provided with the response candidate DB 105 in which, for each of a plurality of items of character information, a plurality of different responses including a positive response and a negative response to that character information are recorded, and the terminal device 1 may acquire a positive response and a negative response as the plurality of different responses and reproduce the positive response and the negative response in different voice colors.
For example, as shown in the second row of FIG. 5, when a voice asking whether some item "may be bought" is input, positive information about the item, such as a good reputation, is output in association with a woman's voice. On the other hand, negative information, such as a bad reputation, is output in a voice color different from the woman's voice associated with the positive information (here, a man's voice).
According to such a voice response system 100, responses from different standpoints, such as a positive response and a negative response, can be reproduced in different voice colors, so the voices can be reproduced as if different people were speaking. This makes it less likely that the user listening to the voices will feel a sense of incongruity.
The voice color may also be changed according to the type of response or the wording used in the response. For example, a response in a gentle tone may be reproduced in a calm female voice, and a response in a forceful tone in a spirited male voice. That is, response contents may be associated with personalities, and the voice color may be set according to the personality.
Furthermore, in the voice response system 100, a response (for example, a positive response or a negative response) output by the user's own terminal device 1 or another terminal device 1 may be input as character information, and a response rebutting that response may be generated. From the user's standpoint, this means being able to listen to a debate between the affirmative side and the opposing side. Having heard this debate, the user can then make the final decision.
This configuration can be realized with one terminal device 1 or with a plurality of terminal devices 1. For a plurality of terminal devices 1 to exchange voices with each other, the voices may be input and output directly, or wireless or other communication may be used. When a plurality of terminal devices 1 communicate with the server 90, data may be transmitted to the other terminal devices 1 in the process of S66.
Furthermore, in the voice response system 100, the calculation unit 101 may learn (record and analyze) the user's behavior (conversations, places visited, and objects captured by the camera) and supplement what is left unsaid in the user's conversations.
For example, in a conversation where the user answers "I'd rather have curry" to the question "Is hamburger all right for today?", if the device adds "That's because we had hamburger yesterday," the reason the user said curry would be preferable is conveyed.
Such a configuration can also be used during a telephone call, or the device may be configured to join the user's conversation of its own accord.
Furthermore, in the voice response system 100, the server 90 may acquire response candidates from a predetermined server or from the Internet.
According to such a voice response system 100, response candidates can be acquired not only from the server 90 but also from any device connected via the Internet, a dedicated line, or the like.
[Second Embodiment]
[Process of Second Embodiment]
Next, another form of voice response system will be described. In this embodiment (the second embodiment) and the embodiments that follow, only the parts that differ from the voice response system 100 of the first embodiment are described in detail; parts identical to those of the voice response system 100 of the first embodiment are given the same reference numerals and their description is omitted.
The voice response system of the second embodiment outputs voice even when the user does not input character information. Specifically, the terminal device 1 performs the automatic conversation terminal process shown in FIG. 6. The automatic conversation terminal process is started, for example, when the terminal device 1 is powered on, and is repeatedly executed thereafter.
In the automatic conversation terminal process, it is first determined whether the setting for automatic conversation is ON (S82). Whether to perform automatic conversation can be set by the user via the operation unit 70 or by voice input.
If automatic conversation is OFF (S82: NO), the automatic conversation terminal process ends. If automatic conversation is ON (S82: YES), the fact that the automatic conversation mode has been set is transmitted to the server 90 together with an ID identifying the terminal itself (S84).
Subsequently, it is determined whether a packet has been received from the server 90 (S86). If no packet has been received (S86: NO), the process of S86 is repeated. If a packet has been received (S86: YES), the same processes as S22 to S30 described above are performed, and when these processes are completed, the automatic conversation terminal process ends.
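A minimal sketch of this terminal-side loop (S82 to S86) follows, assuming a hypothetical `ServerLink` transport and treating S22 to S30 as an opaque playback step; none of these names come from the original disclosure.

```python
import time

TERMINAL_ID = "terminal-001"  # ID identifying this terminal (assumed format)

class ServerLink:
    """Stand-in for the terminal-to-server transport (hypothetical)."""
    def __init__(self):
        self.outbox, self.inbox = [], []
    def send(self, packet):
        self.outbox.append(packet)
    def receive(self):
        return self.inbox.pop(0) if self.inbox else None

def play_response(packet):
    # Stands in for S22-S30: decode and reproduce the received voice data.
    print("playing:", packet.get("message"))

def auto_conversation_terminal(settings, server):
    """Terminal-side automatic conversation process (S82-S86), sketched."""
    if not settings.get("auto_conversation"):                      # S82: NO
        return
    server.send({"mode": "auto_conversation", "id": TERMINAL_ID})  # S84
    while True:                                                    # S86
        packet = server.receive()
        if packet is not None:
            play_response(packet)
            return
        time.sleep(0.1)

server = ServerLink()
server.inbox.append({"message": "Good morning."})
auto_conversation_terminal({"auto_conversation": True}, server)
```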
The server 90 executes the automatic conversation server process shown in FIG. 7. The automatic conversation server process is started, for example, when the server 90 is powered on, and is repeatedly executed thereafter.
In the automatic conversation server process, it is first determined whether notification that the automatic conversation mode has been set has been received from a terminal device 1 (S92). If no such notification has been received (S92: NO), the process proceeds to S98.
If notification that the automatic conversation mode has been set has been received (S92: YES), the terminal device 1 to serve as the communication partner is identified based on the ID contained in the received packet (S94), and automatic conversation is set for this communication partner (S96). Subsequently, for each terminal device 1 for which automatic conversation is set, it is determined whether a playback condition is satisfied (S98).
Here, a playback condition is, for example, that a certain time has elapsed since the previous conversation (voice input), that a fixed time of day has been reached, that particular weather is occurring, or that some sensor value indicates an abnormality.
If no playback condition is satisfied (S98: NO), the automatic conversation server process ends. If a playback condition is satisfied (S98: YES), a message corresponding to the playback condition is generated (S100).
Here, the message corresponding to the playback condition may be a fixed phrase such as "Good morning." or "Hello.", or it may concern the latest news obtained from the news DB 109, in which the latest news is automatically updated. When the message concerns the latest news and, for example, information about a certain company's stock price has been obtained, the message could be "Company ○○'s stock price rose by ○○ yen today. Did you know?"
When this process ends, the processes of S42 to S54 described above are performed. When the process of S54 ends, it is determined whether a predetermined answer has been obtained from the terminal device 1 serving as the communication partner (S112). Here, the predetermined answer may be, for example, any voice at all, or a specific answer. A specific answer is, for example, "I knew" or "I didn't know" in response to the question "Did you know?", or an answer containing a word indicating the weather, such as "It's raining" or "It's sunny", in response to the question "How is the weather now?"
If the predetermined answer is obtained (S112: YES), the automatic conversation server process ends. If the predetermined answer is not obtained (S112: NO), the message transmitted in S100 is retransmitted (S114). When the message is retransmitted in this way, the voice color is changed so as to generate a more forceful, sterner-sounding voice.
Subsequently, referring to the report destination DB 117, in which terminal devices 1 are associated with report destinations in advance, notice that no answer was obtained is transmitted to the predetermined report destination (S116). When this is done, the automatic conversation server process ends.
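The server-side flow S98 to S116 can be condensed into the sketch below; the condition thresholds, message templates, and the `ask`/`notify` callbacks are illustrative assumptions, not the patent's literal implementation.

```python
import time

REPORT_DB = {"terminal-001": "family@example.com"}  # report destination DB 117 (assumed)

def playback_condition(state, now=None):
    """S98: return a reason string if some playback condition holds, else None."""
    now = now or time.time()
    if now - state.get("last_conversation", 0) > 3600:  # quiet for an hour
        return "silence"
    if state.get("sensor_abnormal"):                    # abnormal sensor value
        return "sensor"
    return None

def make_message(reason, news=None):
    """S100: generate a message matching the playback condition."""
    if reason == "sensor":
        return "A sensor is reporting an unusual value. Are you all right?"
    if news:  # e.g. a stock-price item from the news DB 109
        return f"{news} Did you know?"
    return "Good morning."

def run_auto_conversation(terminal_id, state, ask, notify):
    reason = playback_condition(state)
    if reason is None:                                  # S98: NO
        return
    message = make_message(reason, state.get("news"))
    if ask(message, stern=False):                       # S100/S42-S54, S112: YES
        return
    if not ask(message, stern=True):                    # S114: re-send, sterner voice
        notify(REPORT_DB[terminal_id], "No answer was obtained.")  # S116
```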
[Effects of Second Embodiment]
In the voice response system 100 described above, when no character information is input, the server 90 determines whether the situation of the voice response system 100 matches a playback condition set in advance as a condition for outputting voice. When a playback condition is matched, a preset message is output.
According to such a voice response system 100, voice can be output even when no character information is input (that is, even when the user does not speak to the device). For example, by compelling the user to speak, the system can be used to counteract drowsiness while driving a car. Also, by determining whether a person living alone responds, a safety check can be performed.
Further, in the voice response system 100, the server 90 acquires news information and outputs a message about the news in the form of a question requesting an answer from the user.
According to such a voice response system 100, conversations about the news become possible, which keeps the conversation from always being the same.
Furthermore, in the voice response system 100, the server 90 outputs a preset message with separately acquired external information (news, or environmental information such as temperature, weather, and position) appended to it.
According to such a voice response system 100, a response combining a predetermined message with acquired information can be output.
Furthermore, in the voice response system 100, when no answer to a response or message is obtained, the server 90 transmits information identifying the user, together with notice that no answer was obtained, to a preset contact.
According to such a voice response system 100, a contact can be notified when no answer is obtained. Thus, for example, an abnormality affecting an elderly person living alone can be reported at an early stage.
[Modification of Second Embodiment]
In the voice response system 100, the server 90 may also acquire a plurality of messages and select and output a message to be played according to how often each message has been played.
According to such a voice response system 100, making frequently played messages less likely to be played adds randomness to message playback, while deliberately repeating a frequently played message can call attention to it and help fix it in memory.
[Third Embodiment]
[Process of Third Embodiment]
Next, the voice response system of the third embodiment is configured so that the terminal device 1 conveys, on the user's behalf, something the user finds difficult to say to someone directly. For example, if the user tells the device before a date what he or she would like to say that day, the voice response system 100 speaks (plays the voice) on the user's behalf at an appropriate moment (for example, at a preset time, or when a certain time has passed after the conversation lapses).
Specifically, the terminal device 1 performs the message terminal process shown in FIG. 8, and the server 90 performs the message server process shown in FIG. 9. The message terminal process is started, for example, when the terminal device 1 is powered on, and is repeatedly executed thereafter.
In the message terminal process, as shown in FIG. 8, it is first determined whether the message mode has been set by the user (S132). If the message mode has not been set (S132: NO), the process of S132 is repeated.
If the message mode has been set (S132: YES), the processes of S2 to S8 are performed, and if an affirmative determination is made in S6, the message mode flag is set to the ON state in the memory of the terminal device 1 (S134). Then the processes of S10 to S16 are performed.
If an affirmative determination is made in S16, it is determined whether a packet has been received from the server 90 (S136). If no packet has been received (S136: NO), the process of S136 is repeated. If a packet has been received (S136: YES), the processes of S24 to S30 are performed and the message terminal process ends.
Next, the message server process is started, for example, when the server 90 is powered on, and is repeatedly executed thereafter. Specifically, it is first determined whether a packet has been received from any terminal device 1 (S142). If no packet has been received (S142: NO), the process proceeds to S156, described later.
If a packet has been received (S142: YES), the terminal device 1 of the communication partner is identified (S44), and it is determined whether the packet contains a mode flag such as the message mode flag (S144). If there is no mode flag (S144: NO), the process proceeds to S148.
If there is a mode flag (S144: YES), the server 90 also performs the mode setting by setting the flag corresponding to the communication partner's terminal device 1 to the ON state (S146). For example, in the message mode to which the message mode flag corresponds, the processes S46 to S152 described later are performed; in the guidance mode to which the guidance mode flag (described later) corresponds, S46 to S176 are performed (see FIG. 11).
Subsequently, it is determined whether the message flag is in the ON state (S148). If the message flag is ON (S148: YES), the processes of S46 to S54 are performed, and then the message playback conditions are extracted (S150).
Here, the message playback conditions can be set in advance by the user via the operation unit 70 of the terminal device 1, and correspond, for example, to a time or a position. The message playback conditions are transmitted to the server 90 when the packet of the message terminal process is sent.
Subsequently, the message and the voice (voice color) are associated with each other and recorded in memory (S152), and the process proceeds to S156. If the message flag is OFF (S148: NO), processing for another mode is performed (S154), and it is then determined whether the playback timing has arrived (S156). Here, the playback timing means the conditions set as the message playback conditions.
If it is not the playback timing (S156: NO), the message server process ends immediately. If it is the playback timing (S156: YES), the processes of S62 to S64 are performed, and the message server process ends.
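A compact sketch of this store-then-play logic (S150 to S156), assuming the playback conditions are a target time and an optional position in planar coordinates; the data shapes are invented for illustration.

```python
import math
import time

stored_messages = []  # message + voice color + playback conditions (S152)

def store_message(text, voice_color, play_at=None, near=None, radius_m=100.0):
    """S150-S152: record a message with its playback conditions."""
    stored_messages.append({"text": text, "voice": voice_color,
                            "play_at": play_at, "near": near, "radius": radius_m})

def due(entry, now, position):
    """S156: has the playback timing (time and/or position) arrived?"""
    if entry["play_at"] is not None and now < entry["play_at"]:
        return False
    if entry["near"] is not None:
        if position is None:
            return False
        dx = position[0] - entry["near"][0]
        dy = position[1] - entry["near"][1]
        if math.hypot(dx, dy) > entry["radius"]:
            return False
    return True

def play_due_messages(position=None):
    """S156, then S62-S64: play every stored message whose conditions are met."""
    now = time.time()
    for entry in [e for e in stored_messages if due(e, now, position)]:
        print(f"[{entry['voice']}] {entry['text']}")  # stands in for S62-S64
        stored_messages.remove(entry)

store_message("Don't forget the flowers.", "female_calm", play_at=0.0)
play_due_messages()
```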
[Effects of Third Embodiment]
According to the voice response system of the third embodiment, the voice input by the user is not played back immediately, but can be played back when the message playback conditions are satisfied some time later.
For example, as shown in the third row of FIG. 5, if the user inputs "Please tell ○○-san that ○○", the message to be conveyed is played after ○○-san's voice has been recognized (heard).
[Modification of Third Embodiment]
The third embodiment is configured to play back what the user has said, but it may instead be configured to speak words that prompt the user to broach a difficult subject, for example, "Come to think of it, didn't you say you had something to tell her?" Specifically, the terminal device 1 performs the guidance terminal process shown in FIG. 10, and the server 90 performs the guidance server process shown in FIG. 11.
The guidance terminal process is started, for example, when the terminal device 1 is powered on, and is repeatedly executed thereafter.
In the guidance terminal process, as shown in FIG. 10, it is first determined whether the guidance mode has been set by the user (S162). If the guidance mode has not been set (S162: NO), the process of S162 is repeated.
If the guidance mode has been set (S162: YES), the processes of S2 to S8 are performed, and if an affirmative determination is made in S6, the guidance mode flag is set to the ON state in the memory of the terminal device 1 (S164). Then the processes of S10 to S16 are performed.
If an affirmative determination is made in S16, it is determined whether a packet has been received from the server 90 (S166). If no packet has been received (S166: NO), the process of S166 is repeated. If a packet has been received (S166: YES), the processes of S24 to S30 are performed and the guidance terminal process ends.
Next, the guidance server process is started, for example, when the server 90 is powered on, and is repeatedly executed thereafter. Specifically, the processes of S142 to S146 described above are executed. It is then determined whether the guidance flag is in the ON state (S172).
If the guidance flag is in the ON state (S172: YES), the processes of S46 to S54 are performed, and then the guidance playback conditions are extracted (S174).
Here, like the message playback conditions, the guidance playback conditions can be set in advance by the user via the operation unit 70 of the terminal device 1, and correspond, for example, to a time or a position. The guidance playback conditions are transmitted to the server 90 when the packet of the message terminal process is sent.
Subsequently, guidance content is generated, and this guidance content and a voice (voice color) are associated with each other and recorded in memory (S176). As for the guidance content, for example, the input character information is searched for words expressing a desire, such as "want to" or "hope"; the keywords accompanying these words are extracted; and words registered as prompts for those keywords are output as the guidance content. Keywords and the words representing the guidance content are associated with each other in advance and recorded in the response candidate DB 105.
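A rough sketch of this keyword-to-prompt lookup (S176) follows, with a tiny hand-made table standing in for the response candidate DB 105; the word lists and prompts are illustrative assumptions.

```python
DESIRE_WORDS = {"want to", "hope to", "would like to"}  # words expressing a desire

# Keyword -> guidance phrase, standing in for entries in the response candidate DB 105.
GUIDANCE_TABLE = {
    "tell her": "Come to think of it, didn't you say you had something to tell her?",
    "apologize": "Isn't there something you wanted to get off your chest?",
}

def generate_guidance(text):
    """S176: find a desire word, take the accompanying keyword, look up a prompt."""
    lowered = text.lower()
    if not any(desire in lowered for desire in DESIRE_WORDS):
        return None
    for keyword, prompt in GUIDANCE_TABLE.items():
        if keyword in lowered:
            return prompt
    return None

print(generate_guidance("I want to tell her how I feel today."))
```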
Subsequently, the processes from S156 onward described above are performed, and the server process ends. If the guidance flag is in the OFF state (S172: NO), processing for another mode is performed (S154), the processes from S156 onward are performed, and the server process ends.
According to this modification of the third embodiment, rather than directly outputting the words the user wants to say, the system can guide the user into saying them.
[Fourth Embodiment]
[Process of Fourth Embodiment]
Next, an example in which the terminal device 1 is used for reception work will be described. In this embodiment, the terminal device 1 is installed at a company's reception desk or the like. It can also be employed for telephone reception, such as a company's main switchboard or telephone banking. This embodiment is realized by replacing the process of S56 in the first embodiment with the reception process shown in FIG. 12.
In the reception process, as shown in FIG. 12, it is first determined whether the character information contains a company name (S192). In this process, it is determined whether a common personal name or company name (one recorded in the voice recognition DB 102) is contained.
If the character information contains no company name or personal name (S192: YES), a response asking for the company name and personal name is generated (S194), and the reception process ends. In this process, a response such as "Please state your name and your business." is generated, for example.
If the character information contains a company name or personal name (S192: NO), the company name or personal name is looked up in the sales DB 118 and the client DB 119 (S196). Here, the sales DB 118 records the companies and representatives who have come selling in the past, as well as the names of habitual complainers. The client DB 119 records, for each contact person, the company name, that company's contact person, the person in charge on the user's (own company's) side, schedules such as scheduled visiting times, and contact details, all in association with one another.
Subsequently, it is determined whether the company name or personal name could be extracted from the sales DB 118, that is, whether the company name or personal name contained in the character information was present in the sales DB 118 (S198). If the company name or personal name could be extracted from the sales DB 118 (S198: YES), a sales refusal response declining the sales call (a response refusing to put the visitor through) is generated (S200), and the reception process ends.
If the company name or personal name could not be extracted from the sales DB 118 (S198: NO), it is determined whether the visitor is, according to the schedule in the client DB 119, someone due to visit at a nearby time (for example, within one hour before or after the current time) (S202). If the visitor is due at a nearby time (S202: YES), the contact details of the person in charge of this visitor are extracted from the client DB 119, and a connection is made to that person so that the person in charge and the visitor can converse (S204). In this process, the connection may be made to the person in charge via an extension telephone, a mobile phone, or the like.
Subsequently, a reception response for the client is generated (S206). Here, the reception response for the client is, for example, a response such as "Thank you as always, Mr. ○○. You are being connected to the person in charge, so please wait a moment." When this is done, the reception process ends.
If the visitor is not due at a nearby time (S202: NO), a connection is made to a preset reception contact so that the receptionist and the visitor can converse (S208). A normal reception response is then generated (S210).
Here, the normal reception response is, for example, a response such as "You are being connected to reception, so please wait a moment." When this is done, the reception process ends.
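The routing logic S192 to S210 can be condensed into a sketch like the one below; the database contents and the `connect` callback are assumptions made for illustration only.

```python
from datetime import datetime, timedelta

SALES_DB = {"Pushy Sales Co."}                      # sales DB 118 (assumed contents)
CLIENT_DB = {                                       # client DB 119 (assumed contents)
    "Mr. Tanaka": {"contact": "ext-1234",
                   "visit_at": datetime(2013, 6, 18, 14, 0)},
}

def reception(name, now, connect):
    """S192-S210: decide how to answer a visitor who gave `name`."""
    if name is None:                                            # S192: YES
        return "Please state your name and your business."      # S194
    if name in SALES_DB:                                        # S198: YES
        return "We must decline sales calls. Thank you."        # S200
    client = CLIENT_DB.get(name)
    if client and abs(client["visit_at"] - now) <= timedelta(hours=1):  # S202
        connect(client["contact"])                              # S204
        return (f"Thank you as always, {name}. You are being connected "
                "to the person in charge, so please wait a moment.")    # S206
    connect("reception-desk")                                   # S208
    return "You are being connected to reception, so please wait a moment."  # S210

print(reception("Mr. Tanaka", datetime(2013, 6, 18, 13, 30), lambda c: None))
```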
[Effects of Fourth Embodiment]
The voice response system 100 described above is configured for use at the reception desk of a workplace or company. In this configuration, the names and company names of sales callers are recorded in advance in the sales DB 118 of the server 90, and when a visitor at reception gives one of these names or company names, a response is generated so that a phrase of refusal is played back.
Further, in the voice response system 100, the server 90 identifies the communication partner from the input character information and connects the communication partner to the communication destination set in advance for that partner.
According to such a voice response system 100, reception work and telephone handling can be assisted. Moreover, according to such a voice response system 100, persons who might disrupt the user's work can be turned away without the user having to deal with them personally.
Furthermore, in the voice response system 100, the server 90 extracts keywords contained in the input character information (particularly voice) and connects to the connection destination corresponding to the keywords. Keywords such as the name of the other party, for example, are associated with their connection destinations in advance.
According to such a voice response system 100, tasks such as transferring calls and summoning people to reception can be assisted.
[Modification of Fourth Embodiment]
In the above embodiment, the connection destination is set according to the caller. Applying this technique, the caller's business (keywords contained in the character information) may instead be recognized, and the connection destination changed according to the business, for example in telephone reception for telephone banking, telephone shopping, and the like.
Further, in the voice response system 100, the server 90 may recognize the caller's business from the keywords and convey a summary of what the caller said to the user.
According to such a voice response system 100, the work of handling inquiries from customers can be assisted.
[Fifth Embodiment]
[Process of Fifth Embodiment]
Next, the terminal device 1 may, upon receiving a request from another terminal device 1, provide the information that the other terminal device 1 seeks.
In this configuration, in the process of S56, the server 90 requests the necessary information from the other terminal device 1 and generates a response after obtaining the necessary information from it. The terminal device 1 that provides the necessary information performs the information providing terminal process shown in FIG. 13. The information providing terminal process is started, for example, when there is a request from the server 90.
In the information providing terminal process, as shown in FIG. 13, the information recipient is first extracted (S222). The information recipient is the other terminal device 1 requesting the information, and an ID identifying this other terminal device 1 is contained in the request from the server 90.
Subsequently, it is determined whether the recipient is a party to whom provision of information is permitted (S224). Here, the terminal information DB 113 records in advance the IDs of parties, such as family members and friends, to whom provision of information is permitted. In this process, the determination is made by referring to the terminal information DB 113.
If the recipient is a party to whom provision of information is permitted (S224: YES), the requested information is acquired from the terminal's own memory 39, its various sensors, and so on (S226), and this data is transmitted to the server 90 (S228). If the recipient is not a party to whom provision of information is permitted (S224: NO), a refusal to provide the information is transmitted to the server 90 (S230). When such processing ends, the information providing terminal process ends.
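A minimal sketch of this permission check (S222 to S230), with a hypothetical whitelist standing in for the terminal information DB 113 and toy sensor readings in place of the memory 39:

```python
PERMITTED_IDS = {"terminal-002", "terminal-003"}  # terminal information DB 113 (assumed)

SENSOR_READINGS = {"position": (35.17, 136.88), "temperature": 24.5}

def provide_information(request):
    """S222-S230: answer an information request relayed by the server."""
    requester = request["requester_id"]          # S222: extract the recipient
    if requester not in PERMITTED_IDS:           # S224: NO
        return {"status": "refused"}             # S230
    item = request["item"]                       # S226: read own sensors/memory
    return {"status": "ok", "item": item,
            "value": SENSOR_READINGS.get(item)}  # S228: send data to the server

print(provide_information({"requester_id": "terminal-002", "item": "position"}))
```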
In this configuration, as shown in the fourth row of FIG. 5, for example, in response to the question "What is Mr. ○○ doing?", the server 90 requests position information from Mr. ○○'s terminal device 1, and that terminal device 1 returns its position information.
The server 90 then recognizes Mr. ○○'s activity from the position information. For example, if he is moving along a railway line at a speed faster than a person can run, it is judged that he is traveling by train, and a response such as "Mr. ○○ is on a train. He seems to be on his way home." is generated.
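This inference can be sketched as a simple speed-and-location heuristic; the "faster than running" threshold, the `on_railway` test, and the placeholder name are illustrative assumptions.

```python
RUNNING_SPEED_MPS = 8.0  # rough upper bound for human running speed (assumption)

def on_railway(position):
    # Placeholder: a real system would match the position against map data.
    return True

def infer_activity(positions, timestamps):
    """Infer an activity from two position fixes (planar coordinates in meters)."""
    (x0, y0), (x1, y1) = positions
    dt = timestamps[1] - timestamps[0]
    speed = ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5 / dt
    if speed > RUNNING_SPEED_MPS and on_railway(positions[1]):
        return "He is on a train. He seems to be on his way home."
    return "He appears to be on foot."

print(infer_activity([(0.0, 0.0), (600.0, 0.0)], [0.0, 30.0]))  # 20 m/s
```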
[Effects of Fifth Embodiment]
In the voice response system 100, the server 90 acquires information recorded in a terminal device 1 other than the requesting terminal device 1 and provides it to the requesting terminal device 1. In other words, in the voice response system 100, the server 90 acquires the information for generating a response to the character information from another terminal device 1.
According to such a voice response system 100, a response can be generated based on information recorded in another terminal device 1.
Also, in the voice response system 100, when another terminal device 1 requests information for generating a response to character information, the terminal device 1 returns information corresponding to the request.
In this configuration, the terminal device 1 is equipped with sensors for detecting position information, temperature, humidity, illuminance, noise level, and so on, and with databases such as dictionary information, and extracts the necessary information on request.
According to such a voice response system 100, information specific to another terminal device 1, such as its position, can be acquired. A terminal can likewise transmit its own specific information to other terminal devices 1.
[Sixth Embodiment]
[Process of Sixth Embodiment]
Next, the voice response system of the sixth embodiment provides a personality DB 106 in which personality information is recorded, associating the personality of the user, or of related persons (persons connected to the user), with preset categories. In the personality DB 106, as shown in FIG. 14 for example, the names of the user and related persons are recorded in association with their personality categories.
In the personality DB 106 shown in FIG. 14, a personality test is administered to the user and related persons, and the test results are also recorded. When generating the personality information, well-known personality analysis techniques (the Rorschach test, the Szondi test, and the like) may be used. Aptitude-test techniques that companies use in hiring examinations may also be used when generating the personality information.
When generating personality information, the personality information generation process shown in FIG. 15, for example, is performed. The personality information generation process is started when, for example, an instruction to generate personality information is entered on the terminal device 1 using the operation unit 70 or the like.
In the personality information generation process, as shown in FIG. 15, the microphone 37 is first turned ON (S242), and one of a set of predetermined four-choice questions is output by voice (S244). The four-choice questions may be acquired from the server 90 or may be questions recorded in advance in the memory 39.
Subsequently, it is determined whether the subject (the user or a related person) has answered by voice (S246). If there is no answer (S246: NO), the process of S246 is repeated.
If there is an answer (S246: YES), conversation parameters such as word endings and speaking speed are extracted (S248), and it is determined whether the current question is the final question (S250). If it is not the final question (S250: NO), the next question is selected (S252) and the process returns to S242.
If it is the final question (S250: YES), a personality analysis based on the answers to the four-choice questions is performed (S254), and a personality analysis based on the conversation parameters is performed (S256). The personality analysis based on conversation parameters can capture tendencies such as confident people ending their words firmly while unconfident people trail off, or impatient people speaking quickly while easygoing people speak slowly.
Subsequently, these personality analysis results are combined, for example by taking a weighted average (S258), and the subject is assigned to a personality category (S260). Specifically, the subject's personality as measured by the test is scored, and subjects are assigned to personality categories according to their scores.
Subsequently, the subject and the personality category are associated with each other (S262) and recorded in the personality DB 106 (S264). That is, the relationship between the subject and the personality category is transmitted to the server 90. The test results are also transmitted to the server 90 at this time, and the server 90 builds the personality DB 106 as shown in FIG. 14. When this is done, the personality information generation process ends.
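The scoring and bucketing steps S254 to S260 might look like the sketch below; the weights, score ranges, and category names are invented for illustration and are not disclosed in the patent.

```python
# Weights for combining the two analyses (S258); the values are assumptions.
W_QUESTIONS, W_CONVERSATION = 0.7, 0.3

def score_questions(answers):
    """S254: score the four-choice answers (each answer worth 0-3 points)."""
    return sum(answers) / (3 * len(answers)) * 100

def score_conversation(params):
    """S256: score conversation parameters, e.g. firm endings, fast speech."""
    return 50 + 25 * params["ending_strength"] + 25 * params["speech_rate"]

def classify(answers, params):
    """S258-S260: weighted average, then bucket into a personality category."""
    total = (W_QUESTIONS * score_questions(answers)
             + W_CONVERSATION * score_conversation(params))
    if total >= 70:
        return "assertive"
    if total >= 40:
        return "balanced"
    return "reserved"

print(classify([3, 2, 3, 1], {"ending_strength": 0.8, "speech_rate": 0.5}))
```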
When the personality DB 106 generated in this way is used, entries associating each personality category with a different response are prepared in the response candidate DB 105. In the process of S56, the server 90 then acquires response candidates representing a plurality of different responses to the character information, selects the response to output from the candidates according to the personality information, and outputs the selected response in the processes of S60 and S64.
[Effects of Sixth Embodiment]
In the voice response system 100, the terminal device 1 generates personality information for the user or a related person based on the answers to a plurality of preset questions, and acquires the generated personality information.
According to such a voice response system 100, personality information can be generated in the server 90 or the terminal device 1.
Furthermore, in the voice response system 100, the calculation unit 101 generates personality information for the user or a related person based on the character strings contained in the input character information.
According to such a voice response system 100, personality information can be generated in the course of the user's using the voice response system 100.
Also, according to such a voice response system 100, responses can be varied according to the personality of the user or of persons connected to the user (related persons). Usability for the user can thus be improved.
[Modification of Sixth Embodiment]
In the sixth embodiment, the response may be narrowed down to one according to the personality before output, or a plurality of responses may each be output in a different voice color.
Of the personality information generation process, the processes S248 and S254 to S264 may be performed in the server 90. In this case, as in the first embodiment and elsewhere, the voice and the questions may be exchanged between the terminal device 1 and the server 90 while the server 90 identifies the terminal device 1.
Furthermore, in the voice response system 100, the server 90 may detect the user's actions or operations and generate learning information or personality information based on them.
According to such a voice response system 100, for example, when it detects that the user has been dashing to catch the train for several days in a row, it can urge the user to leave home a few minutes earlier from the next day; when it detects from conversations that the user tends to get angry easily, it can output calming speech or music.
[Seventh Embodiment]
[Process of Seventh Embodiment]
Next, the voice response system of the seventh embodiment provides a preference DB 108 in which preference information is recorded, associating the preferences of the user and related persons with preset categories. In the preference DB 108, as shown in FIG. 16 for example, the names of the user and related persons are recorded in association with their preferences for each preference type, such as taste in food (food), taste in color (color), and hobbies.
In particular, taste in food is classified as a sweet tooth (sweet), a taste for spicy food (spicy), or in between (average); taste in color as warm colors (warm), cool colors (cool), or in between (average); and hobbies as indoor hobbies (indoor), outdoor hobbies (outdoor), or both indoor and outdoor hobbies (both).
When building such a preference DB 108, the preference information generation process shown in FIG. 17, for example, is executed. The preference information generation process is performed, for example, between S48 and S54.
Specifically, as shown in FIG. 17, keywords relating to preferences are extracted from the character information (S282), and among the objects identified by image processing, those relating to preferences are extracted (S284). In the preference DB 108, preference-related keywords are associated with a preference type and a classification within that type (for food preferences: sweet, average, spicy, and so on); in these processes, an extracted keyword or object is treated as preference-related when it appears in the preference DB 108.
Subsequently, a counter is incremented for each group of preference-related keywords (S288). For example, when something like kimchi is extracted, whose preference type is "taste in food" and whose classification is "spicy", the counter corresponding to "taste in food"/"spicy" is incremented.
Then, based on the counter values, the preference information (preference DB 108) is updated (S290). That is, for each preference type, the classification with the largest counter value is taken to best match the person's preferences and is recorded in the preference DB 108 as a characteristic of the user's or related person's preferences. When this is done, the preference information generation process ends.
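A small sketch of this counting-and-update step (S282 to S290) follows, with a toy keyword table standing in for the preference DB 108; the entries are illustrative assumptions.

```python
from collections import Counter

# Keyword -> (preference type, classification); a toy stand-in for the preference DB 108.
KEYWORD_TABLE = {
    "kimchi": ("food", "spicy"),
    "cake": ("food", "sweet"),
    "camping": ("hobby", "outdoor"),
}

counters = Counter()  # (type, classification) -> occurrence count

def observe(words):
    """S282-S288: count preference-related keywords found in character information."""
    for word in words:
        if word in KEYWORD_TABLE:
            counters[KEYWORD_TABLE[word]] += 1

def update_preferences():
    """S290: per preference type, keep the classification with the largest count."""
    best = {}
    for (ptype, cls), count in counters.items():
        if count > best.get(ptype, (None, -1))[1]:
            best[ptype] = (cls, count)
    return {ptype: cls for ptype, (cls, _) in best.items()}

observe(["kimchi", "kimchi", "cake", "camping"])
print(update_preferences())  # {'food': 'spicy', 'hobby': 'outdoor'}
```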
When the preference DB 108 generated in this way is used, entries associating each preference with a different response are prepared in the response candidate DB 105; in the process of S56, the server 90 acquires response candidates representing a plurality of different responses to the character information, selects the response to output from the candidates according to the preference information, and outputs the selected response in the processes of S60 and S64.
[Effects of Seventh Embodiment]
In the voice response system 100, the server 90 generates preference information indicating the preference tendencies of the user or a related person based on the character strings contained in the character information. It then selects the response to output from the response candidates based on the preference information and outputs the selected response.
According to such a voice response system 100, responses can be made according to the tastes of the user or a related person. For example, when the user is buying a present for a related person and asks the terminal device 1 "What would Mr. ○○ like?", a response reflecting the preference information can be obtained.
[Modification of Seventh Embodiment]
As shown in FIG. 18, the response candidate DB 105 may hold a table associating personality categories with preference information.
For example, in the example shown in FIG. 18, personality categories and color preferences are associated, and products that a woman could be expected to be pleased to receive as a present are arranged in a matrix.
In the process of S56, a response can thus be generated taking both personality and preference into account.
[Eighth Embodiment]
[Process of Eighth Embodiment]
In the above embodiments, voice is converted into character information, but the user's movements may be converted into character information instead.
Specifically, the terminal device 1 captures the user's movements as captured images and transmits them to the server 90, and the server 90 performs, for example, the movement character input process shown in FIG. 19. The movement character input process is started when a part of the user's body appears in the captured image in the process of S48.
In the movement character input process, as shown in FIG. 19, a captured image is first acquired (S302). It is then determined whether the user is trying to input characters by handwriting or by sign language (S304, S308).
In these processes, for example, when the user's upper body appears in the captured image together with the face, it is determined that the user is trying to input characters by sign language; when the user's hand appears in the captured image without the user's face, it is determined that the user is trying to input characters by handwriting.
If the user is trying to input characters by handwriting (S304: YES), the movement of the fingertip or pen tip is recorded (S306), and the movement is converted into character information (S312). Here, the handwritten character/sign language DB 112 associates handwriting movements with characters, and hand movements with the characters they express in sign language. In the process of S312, character information is generated by referring to the handwritten character/sign language DB 112.
If the user is trying to input characters by sign language (S304: NO, S308: YES), the sign language content is recognized by referring to the handwritten character/sign language DB 112, and the process of S312 described above is performed. If the user is trying to input characters neither by handwriting nor by sign language (S308: NO), input processing by another method is performed (S314).
Subsequently, characters input by movement are matched against characters input by voice, and it is determined whether there is a similar voice input (whether the degree of agreement between the reference waveform based on the character and the pronunciation waveform is at or above a reference value) (S316). If there is such a voice input (S316: YES), the accent and pronunciation characteristics of this user when inputting the character are recorded in the learning DB 107 in association with the character (S318), and the movement character input process ends. If there is no such voice input (S316: NO), the movement character input process likewise ends.
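The modality decision and the waveform-agreement test (S304 to S318) might be sketched as below; the visibility flags, the correlation-based similarity measure, and the 0.8 threshold are illustrative assumptions, not the patent's stated method.

```python
LEARNING_DB = {}       # character -> recorded pronunciation features (learning DB 107)
MATCH_THRESHOLD = 0.8  # assumed reference value for waveform agreement

def choose_modality(face_visible, hand_visible):
    """S304/S308: decide the input modality from what the camera sees."""
    if hand_visible and not face_visible:
        return "handwriting"
    if face_visible:
        return "sign_language"
    return "other"

def similarity(reference, observed):
    """Toy agreement score between two equal-length waveforms (0..1)."""
    num = sum(r * o for r, o in zip(reference, observed))
    den = (sum(r * r for r in reference) * sum(o * o for o in observed)) ** 0.5
    return num / den if den else 0.0

def learn_pronunciation(char, reference_wave, spoken_wave):
    """S316-S318: if the spoken waveform agrees, record the user's features."""
    if similarity(reference_wave, spoken_wave) >= MATCH_THRESHOLD:
        LEARNING_DB[char] = {"waveform": spoken_wave}

print(choose_modality(face_visible=False, hand_visible=True))  # handwriting
```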
[Effects of Eighth Embodiment]
In the voice response system 100, the user's movements are converted into character information, so the user can input character information without speaking.
[Modification of Eighth Embodiment]
The movement in this embodiment may be anything arising from muscle movement, not only the handwriting of characters or gestures (for example, sign language).
[Ninth Embodiment]
[Process of Ninth Embodiment]
The contents of the learning DB 107 may be made available on another terminal device 1 when the user uses a terminal device 1 other than the one normally used. In this case, the other terminal device 1 transmits the ID and password of the normally used terminal device 1 to the server 90 together with a usage request.
The server 90 then executes the other-terminal use process shown in FIG. 20. The other-terminal use process is started when a usage request is received.
In the other-terminal use process, as shown in FIG. 20, it is first determined whether an ID and a password have been input (S332). If no ID and password have been input (S332: NO), the process of S332 is repeated.
If an ID and a password have been input (S332: YES), it is determined whether authentication by the ID and password has succeeded (S334). If authentication has succeeded (S334: YES), notice of successful authentication is transmitted to the other terminal device 1 (S336), and the other terminal device 1 is set to use the learning DB 107 of the terminal device 1 corresponding to the ID and password (S338). If authentication does not succeed (S334: NO), an error notice is transmitted to the other terminal device 1 (S340), and the other-terminal use process ends.
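A bare-bones sketch of this authentication flow (S332 to S340); the credential table and grant bookkeeping are invented for the example, and a real system would store salted password hashes rather than plaintext.

```python
CREDENTIALS = {"terminal-001": "secret"}  # ID -> password (toy example only)
GRANTS = {}  # requesting terminal -> learning DB it may use

def other_terminal_use(requesting_id, claimed_id, password):
    """S332-S340: authenticate, then grant access to another terminal's learning DB."""
    if CREDENTIALS.get(claimed_id) != password:              # S334: NO
        return {"to": requesting_id, "result": "error"}      # S340
    GRANTS[requesting_id] = f"learning_db:{claimed_id}"      # S338
    return {"to": requesting_id, "result": "authenticated"}  # S336

print(other_terminal_use("terminal-099", "terminal-001", "secret"))
```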
[Effects of Ninth Embodiment]
In the voice response system 100, the server 90 transfers the learning information of one terminal device 1 to another terminal device 1.
According to such a voice response system 100, even when a user of one terminal device 1 uses another terminal device 1, the learning information recorded for the first terminal device 1 (the learning information recorded in the server 90) can be used. Thus the generation accuracy of character information can be improved even when another terminal device 1 is used. This is particularly effective when the user owns a plurality of terminal devices 1.
Furthermore, in the voice response system 100, the server 90 outputs information about the user in response to inquiries from persons other than the user.
According to such a voice response system 100, if, for example, the contents of the user's meals and the distance of the user's walks are recorded, questions asked at a hospital or the like can be answered on the user's behalf. Information such as the user's health condition and self-introduction may also be learned.
[Modification of the Ninth Embodiment]
As in the ninth embodiment, when a request to end use is received together with an ID and password, use of the learning DB 107 by the terminal device 1 corresponding to that ID and password may be ended (prohibited).
[Tenth Embodiment]
[Process of the Tenth Embodiment]
In the voice response system of the tenth embodiment, the server 90 stores the contents of conversations and asks questions designed to elicit the same contents again. Specifically, the memory confirmation process shown in FIG. 21 is executed in S100 of the automatic conversation server process shown in FIG. 7.
In the memory confirmation process, as shown in FIG. 21, past conversation contents are extracted from the learning DB 107 (S352), and a question whose answer is a keyword contained in one of those conversations is generated (S353). The memory confirmation process then ends.
The questions asked in the memory confirmation process may be, for example, "What was on the menu for dinner yesterday?" or "Where did you go three days ago?"
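As a minimal sketch of this idea, the Python below picks a keyword from a logged conversation and turns it into a recall question; the structure of the log (topic, keyword, days_ago fields) is assumed for illustration only and is not defined in the specification.

```python
import random

# Sketch of the memory confirmation process (S352-S353); the log format is
# an illustrative assumption.
conversation_log = [
    {"days_ago": 1, "topic": "dinner menu", "keyword": "curry"},
    {"days_ago": 3, "topic": "outing destination", "keyword": "the park"},
]


def generate_recall_question(log):
    entry = random.choice(log)         # S352: pick a past conversation
    question = f"What was the {entry['topic']} {entry['days_ago']} day(s) ago?"
    return question, entry["keyword"]  # S353: the keyword is the answer


question, expected_answer = generate_recall_question(conversation_log)
print(question)  # e.g. "What was the dinner menu 1 day(s) ago?"
```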
[Effects of the Tenth Embodiment]
According to such a voice response system 100, the user's memory can be checked and the retention of memories promoted. This is also considered effective in slowing the progression of dementia in elderly users.
[Eleventh Embodiment]
[Process of the Eleventh Embodiment]
The voice response system of the eleventh embodiment is configured so that the user can practice a foreign language using the terminal device 1 and the server 90.
Specifically, the pronunciation determination process 1 shown in FIG. 22, the pronunciation determination process 2 shown in FIG. 23, and the pronunciation determination process 3 shown in FIG. 24 are executed in order. The server 90 executes one of the three processes each time the voice response server process (FIG. 2) runs, and each is executed as the process of S56 described above.
In pronunciation determination process 1, as shown in FIG. 22, a response instructing the user to input a predetermined sentence by voice is generated (S362). For example, a model sentence in the foreign language is generated and the user is prompted to repeat it after the model. Pronunciation determination process 1 then ends.
When voice is input in response, pronunciation determination process 2 is performed. As shown in FIG. 23, the accuracy of the pronunciation and accent is scored (S372): the input speech is treated as a waveform, and the degree to which it matches the waveform of the model sentence is converted into a score.
The score is recorded in memory (S374), and pronunciation determination process 2 ends. Pronunciation determination process 3 follows. As shown in FIG. 24, it is first determined whether the score is below a threshold (S382).
If the score is below the threshold (S382: YES), a response instructing the user to input the same sentence again is generated (S384), for example prompting the user to repeat after the model once more. If the score is at or above the threshold (S382: NO), a response is generated noting that the pronunciation was good and prompting the user to move on to the next sentence (S386), for example, "Good pronunciation. Let's move on."
When this processing ends, pronunciation determination process 3 ends.
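One plausible reading of the waveform comparison in S372 is a normalized cross-correlation between the user's recording and the model sentence, as in the sketch below; the actual scoring method and the threshold value of S382 are not specified in this document.

```python
import numpy as np

# Hedged sketch of the scoring step (S372) and the threshold branch (S382).


def pronunciation_score(user_wave, model_wave):
    """Score 0-100 from the normalized cross-correlation of two waveforms."""
    n = min(len(user_wave), len(model_wave))
    u = user_wave[:n] - user_wave[:n].mean()
    m = model_wave[:n] - model_wave[:n].mean()
    denom = np.linalg.norm(u) * np.linalg.norm(m)
    if denom == 0.0:
        return 0.0
    similarity = float(np.dot(u, m) / denom)  # -1.0 .. 1.0
    return max(similarity, 0.0) * 100.0


THRESHOLD = 70.0  # illustrative value; the document does not give one


def respond_to_score(score):
    if score < THRESHOLD:  # S382: YES -> repeat the sentence (S384)
        return "Please repeat the sentence after the model once more."
    return "Good pronunciation. Let's move on."  # S386


model = np.sin(np.linspace(0.0, 2.0 * np.pi, 100))
user = model + 0.1 * np.random.randn(100)
print(respond_to_score(pronunciation_score(user, model)))
```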
[Effects of the Eleventh Embodiment]
In the voice response system 100, the server 90 detects how accurate the pronunciation and accent of the voice input by the user are, and outputs the detected degree of accuracy.
According to such a voice response system 100, the accuracy of pronunciation and accent can be checked, which is useful, for example, when practicing a foreign language.
Furthermore, in the voice response system 100, the server 90 has the same question output again when the degree of accuracy is at or below a fixed value.
According to such a voice response system 100, an accurate answer can be elicited by outputting the same question again.
[Modification of the Eleventh Embodiment]
In the voice response system 100, when the degree of accuracy is at or below a fixed value, the server 90 may output, for confirmation, speech containing the word closest to the pronunciation the user actually produced.
According to such a voice response system 100, the user can confirm the accuracy of his or her pronunciation and accent.
[Twelfth Embodiment]
[Process of the Twelfth Embodiment]
Next, the voice response system of the twelfth embodiment will be described. This system detects the user's emotion from the input voice and generates a response that soothes the user according to that emotion.
Specifically, the emotion determination process shown in FIG. 25 and the emotion response generation process shown in FIG. 26 are executed. The emotion determination process is performed as the detail of the process of S50 described above. As shown in FIG. 25, the emotion is first scored from the tone of voice, the strength of sentence endings, the length of each sentence, the speed of conversation, involuntary utterances, and the like (S392); the emotion is then classified according to the score and recorded in memory (S394).
When this processing ends, the emotion determination process ends. Subsequently, the emotion response generation process is executed in the process of S56 described above.
Specifically, as shown in FIG. 26, the emotion category set in the emotion determination process is first determined (S412). If the category is normal (S412: normal), an ordinary greeting such as "Hello" is generated as the response (message) (S414).
If the category is anger (S412: anger), a sentence intended to calm the other party, such as "Have I upset you?", is generated as the response (S416). If the category is joy (S412: joy), a greeting with a brighter nuance than an ordinary one, such as "It's a fun day, isn't it?", is generated (S418).
If the category is confusion (S412: confusion), a greeting showing concern for the other party, such as "Is something the matter?", is generated (S420). When this processing ends, the emotion response generation process ends.
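The branch of FIG. 26 reduces to a simple lookup, as in the sketch below; how the raw score of S392 maps to a category is an assumption made for illustration, since the document does not specify it.

```python
# Sketch of the emotion branch (S412-S420) with an assumed score-to-category
# mapping standing in for S392-S394.

RESPONSES = {
    "normal":    "Hello.",                     # S414
    "anger":     "Have I upset you?",          # S416
    "joy":       "It's a fun day, isn't it?",  # S418
    "confusion": "Is something the matter?",   # S420
}


def classify_emotion(score):
    """Hypothetical mapping from the emotion score (S392) to a category."""
    if score < 25:
        return "anger"
    if score < 50:
        return "confusion"
    if score < 75:
        return "normal"
    return "joy"


def emotion_response(score):
    category = classify_emotion(score)  # S412: determine the category
    return RESPONSES[category]


print(emotion_response(80.0))  # -> "It's a fun day, isn't it?"
```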
[Effects of the Twelfth Embodiment]
In the voice response system 100, the server 90 detects the user's irritation or agitation by detecting involuntary utterances, and generates a message to ease that irritation or agitation.
According to such a voice response system 100, a user's irritation or agitation can be calmed when it arises, which in turn helps prevent trouble between the user and those around them.
[Thirteenth Embodiment]
[Process of the Thirteenth Embodiment]
Next, the voice response system of the thirteenth embodiment will be described. This system guides the user to an object appearing in a captured image. The process is performed by the server 90 as the detail of the process of S56 described above.
When the user says to the terminal device 1, for example, "Please guide me to the tower I can see," the guidance process is carried out in S56. In the guidance process, as shown in FIG. 27, terminal position information is first acquired from the GPS receiver 27 or the like of the terminal device 1 (S432).
Then, based on the voice (character information) and image processing, the target object is identified among the objects in the captured image, and its position is determined (S434). In this step, the object's position is located in map information (which may be acquired externally or held by the server 90) from the object's shape, relative position, and so on. For example, if a tower appears in the captured image, that tower is identified on the map from the position of the terminal device 1 and the shape of the tower.
Next, a route to the object is searched for (S436), and route information is acquired (S438). This can be realized with the same processing used in known cloud-based navigation devices.
A response for guiding the user along the route is then generated (S440); here, too, a response similar to the guidance given by a navigation device suffices.
When this processing ends, the guidance process ends. When the guidance process is carried out while the user is moving, the automatic conversation server process may be used to play each message, with the user's arrival at a point requiring guidance serving as the playback condition.
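At a high level, the guidance process chains the steps S432 to S440, as in the sketch below; every helper is a placeholder standing in for GPS, image processing, and a cloud-style route search, none of whose interfaces are defined in the text.

```python
from dataclasses import dataclass

# Hedged, stubbed sketch of the guidance process (S432-S440).


@dataclass
class Target:
    name: str
    map_position: tuple  # (latitude, longitude)


def identify_object(utterance, captured_image, position):
    """S434: stand-in for matching the spoken object to a map location."""
    return Target(name="tower", map_position=(35.0, 137.0))


def search_route(origin, destination):
    """S436/S438: stand-in for a cloud-style route search."""
    return [origin, destination]  # a trivial two-point "route"


def guide_to_visible_object(utterance, captured_image, origin):
    target = identify_object(utterance, captured_image, origin)  # S434
    route = search_route(origin, target.map_position)            # S436/S438
    return f"Guiding you to the {target.name} via {len(route)} waypoint(s)."  # S440


print(guide_to_visible_object("Please guide me to the visible tower",
                              captured_image=None, origin=(35.1, 137.1)))
```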
[Effects of the Thirteenth Embodiment]
In the voice response system 100, when character information is input, the server 90 generates a response that reflects a captured image of the surroundings of the voice response system 100, and has the response output by voice.
According to such a voice response system 100, a response can be output by voice according to the captured image, which improves usability compared with a configuration that generates responses from character information alone.
Also, in the voice response system 100, the server 90 searches the captured image by image processing for an object mentioned in the character information, identifies the position of the object found, and guides the user to that position.
According to such a voice response system 100, the user can be guided to an object in the captured image.
Furthermore, when guiding the user to a destination, the server 90 acquires route information such as the weather, temperature, humidity, traffic information, and road surface conditions along the way, and has this route information output by voice.
According to such a voice response system 100, the conditions along the way to the destination (the route information) can be conveyed to the user by voice.
[Modification of the Thirteenth Embodiment]
In addition to the above configuration, character information may be input asking the system what it has recognized, and the system may then output by voice what (or whom) it recognizes in the captured image.
Furthermore, in the voice response system 100, the server 90 may, instead of the process of S48, acquire a moving image capturing the shape of the user's mouth while character information is being input by voice. In that case, instead of the process of S52, the voice is converted into character information, and unclear parts of the voice are estimated from the moving image so that the character information can be corrected.
According to such a voice response system 100, the content of an utterance can be estimated from the shape of the mouth, so unclear parts of the voice can be estimated well.
[Fourteenth Embodiment]
[Process of the Fourteenth Embodiment]
Next, the voice response system of the fourteenth embodiment will be described. This system requests the user to perform a predetermined action and determines whether the user has acted as requested. In this configuration, within the automatic conversation terminal process shown in FIG. 6 and the automatic conversation server process shown in FIG. 7, the movement request process 1 shown in FIG. 28 and the movement request process 2 shown in FIG. 29 are carried out in order as the detail of the process of S56 described above.
First, when the process of S54 ends, movement request process 1 starts. As shown in FIG. 28, it outputs a response (message) instructing the user to move his or her gaze or head to a predetermined position (S452), after which movement request process 1 ends.
The next time the process of S54 ends, movement request process 2 starts. As shown in FIG. 29, it determines whether the gaze or head has moved to the instructed position (S462). In this step, the user's movement is detected by image-processing the image captured by the camera or by using the detection results of the various sensors of the terminal device 1. When the gaze is detected by image processing, a known gaze recognition technique may be employed.
If the gaze or head has not moved as instructed (S462: NO), the response generated in S452 is output again (S464). If it has moved as instructed (S462: YES), another arbitrary response is generated (S466).
When this processing ends, movement request process 2 ends.
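The repeat-until-compliance logic of S452 to S466 can be pictured as a small loop; in the sketch below the gaze detector is simulated with a random stub, whereas in the document it would come from image processing of the camera image or from the terminal's sensors.

```python
import random

# Hedged sketch of movement request processes 1 and 2 (S452-S466).


def gaze_moved_as_instructed():
    """Stand-in for the gaze/head detection of S462."""
    return random.random() > 0.5


def request_gaze_movement(max_attempts=3):
    prompt = "Please look at the right-hand mirror."  # S452 (example wording)
    for _ in range(max_attempts):
        print(prompt)                    # output the request (S452/S464)
        if gaze_moved_as_instructed():   # S462: YES
            return "Thank you."          # S466: another arbitrary response
        # S462: NO -> fall through and output the same request again (S464)
    return "Gaze did not move as requested."


print(request_gaze_movement())
```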
[Effects of the Fourteenth Embodiment]
In the voice response system 100, the user's gaze is detected, and if the gaze does not move to a predetermined position in response to a prompt, a voice requesting the user to move the gaze to that position is output.
According to such a voice response system 100, the user can be made to look at a specific position, so that, for example, safety checks while driving a vehicle can be performed reliably.
In the voice response system 100, the server 90 also observes the positions of the user's body parts and facial expression, and if these change little in response to a prompt, outputs a voice requesting the user to change the position of a body part or the facial expression.
According to such a voice response system 100, the user can be guided to move a body part to a specific position or to make a specific facial expression. The present invention can be used, for example, when driving a vehicle or during a physical examination.
[Fifteenth Embodiment]
[Process of the Fifteenth Embodiment]
Next, the voice response system of the fifteenth embodiment will be described. When the user inputs a broadcast program or a piece of music as audio, this system performs a process that fills the gap if the broadcast program or music is interrupted.
In this configuration, the broadcast/music complementing process shown in FIG. 30 is performed as the detail of S56 described above. As shown in FIG. 30, it is first determined whether the broadcast program or the music (or the user's singing, if the user is singing) has been interrupted (S482).
If it has been interrupted (S482: YES), the broadcast program or music synchronized in the process of S492 described below is set as the response content (S484), and the broadcast/music complementing process ends. If it has not been interrupted (S482: NO), the broadcast program is acquired if a broadcast program is being viewed (S486), and the corresponding piece of music is acquired if music is being played (S488).
Here, the karaoke DB 116 records music and lyrics in association with each other; when music is acquired in this step, the version with lyrics is acquired.
Next, the broadcast program or music the user is watching or listening to is identified (S490). That program or music is then acquired and prepared so that it can be played in synchronization with what the user is watching or listening to (S492), and the broadcast/music complementing process ends.
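Conceptually, S482 to S492 amount to keeping a synchronized copy of the stream and substituting it whenever the user's copy drops out. The sketch below models stream chunks as strings and assumes the synchronization of S492 has already aligned the two sequences.

```python
# Hedged sketch of the complementing decision (S482-S484).


def complement_stream(user_chunks, server_chunks):
    """Yield the user's stream, filling gaps from the synchronized copy."""
    for user_chunk, server_chunk in zip(user_chunks, server_chunks):
        if user_chunk is None:  # S482: the stream is interrupted
            yield server_chunk  # S484: play the synchronized copy
        else:
            yield user_chunk


# Example: the third chunk of the broadcast was lost on the user's side.
received = ["intro", "verse", None, "chorus"]
synchronized = ["intro", "verse", "bridge", "chorus"]
print(list(complement_stream(received, synchronized)))
# -> ['intro', 'verse', 'bridge', 'chorus']
```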
[Effects of the Fifteenth Embodiment]
In the voice response system 100, the server 90 acquires the same broadcast program as the one the user is viewing, and if the broadcast is interrupted, fills the gap by outputting the copy it has acquired.
According to such a voice response system 100, the broadcast program the user is viewing can be supplemented so that it is not interrupted.
Also, in the voice response system 100, when the user sings lyrics over a piece of music that has none, the server 90 compares the version of the music with lyrics against the lyrics the user sings, and outputs the lyrics by voice only in the parts where the user's lyrics are missing.
According to such a voice response system 100, the parts that a user of a so-called karaoke device cannot sing (where the lyrics break off) can be filled in.
[Sixteenth Embodiment]
[Process of the Sixteenth Embodiment]
Next, the voice response system of the sixteenth embodiment will be described. In this system, when a captured image contains characters and the terminal device 1 receives a question from the user about how to read them, information about the characters is acquired from an external source and the reading contained in that information is output by voice.
In this configuration, the character explanation process shown in FIG. 31 is performed as the detail of S56 described above. As shown in FIG. 31, it is first determined whether a question about the reading, such as "How do you read this?", has been received (S502). If so (S502: YES), the reading of the image-recognized characters is looked up on another server or the like connected via the Internet 85 (S504), the reading obtained is set as the response (S506), and the character explanation process ends.
If the question is not about the reading (S502: NO), it is determined whether a question about the meaning of the words, of the kind a dictionary would answer, has been received (S508). If so, the meaning of the image-recognized characters (words) is looked up on another server or the like connected via the Internet 85 (S510), the meaning obtained is set as the response (S512), and the character explanation process ends.
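The two-way dispatch of S502 to S512 can be sketched as below; the lookup table is a local stand-in for the external server reached via the Internet 85, and the keyword matching on the question text is an illustrative simplification.

```python
# Hedged sketch of the character explanation process (S502-S512).

DICTIONARY = {  # stand-in for a lookup on another server via the Internet 85
    "東京": {"reading": "Tokyo", "meaning": "the capital of Japan"},
}


def explain_characters(recognized_text, question):
    entry = DICTIONARY.get(recognized_text)
    if entry is None:
        return "I could not find that word."
    if "read" in question:       # S502: a question about the reading
        return entry["reading"]  # S504-S506
    if "mean" in question:       # S508: a question about the meaning
        return entry["meaning"]  # S510-S512
    return "Please ask about the reading or the meaning."


print(explain_characters("東京", "How do you read this?"))  # -> "Tokyo"
```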
[Effects of the Sixteenth Embodiment]
According to such a voice response system 100, the reading of image-recognized characters is looked up on another server or the like and the result is set as the response, so the user can be taught how to read the characters, the meaning of the words, and so on.
[Seventeenth Embodiment]
[Process of the Seventeenth Embodiment]
Next, the voice response system of the seventeenth embodiment will be described. Based on the sensor values detected by the terminal device 1, the server 90 detects abnormal behavior or an abnormal health condition of the user of the terminal device 1 and issues a report when an abnormality is found.
Specifically, the terminal device 1 performs the behavior response terminal process shown in FIG. 32, and the server 90 performs the behavior response server process. In the behavior response terminal process, as shown in FIG. 32, the outputs of the various sensors mounted on the terminal device 1 are first acquired (S522), together with an image captured by the camera 41 (S524). The acquired sensor outputs and captured image are then sent as packets to the server 90 (S526), and the behavior response terminal process ends.
Next, in the behavior response server process, as shown in FIG. 33, the processes of S42 to S44 described above are performed first. Then, behavior such as wandering is identified from the position information of the terminal device 1 (the detection result of the GPS receiver 27) (S532), and the user's environment is detected from the detection results of the temperature sensors 15 and 19 and the like (S534). An abnormality is then detected (S536).
In this step, an abnormality is detected from the change in position information and the environment. For example, an abnormality is detected when the user does not move in a very hot or very cold place, or when the user is somewhere he or she does not normally go (S536). Alternatively, the position information and environment may be converted into a score, and an abnormality judged to exist when the score falls below a reference value (outside the reference range).
It is then determined whether an abnormality has been detected (S538). If not (S538: NO), the behavior response server process ends. If an abnormality has been detected (S538: YES), a message to that effect is generated (S540), and a predetermined contact is notified (S542). The processes of S62 to S68 described above (excluding S66) are then performed, and the behavior response server process ends.
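The scoring alternative mentioned above might look like the sketch below; the particular point deductions and the reference value are assumptions, since the document says only that position information and environment are scored and compared with a reference value.

```python
# Hedged sketch of the abnormality check (S532-S542).

FAMILIAR_PLACES = {"home", "park"}
SAFE_TEMP_RANGE = (5.0, 35.0)  # degrees Celsius, illustrative


def abnormality_score(place, temperature, moved_recently):
    score = 100
    if place not in FAMILIAR_PLACES:
        score -= 40  # user is somewhere unusual (S532)
    if not (SAFE_TEMP_RANGE[0] <= temperature <= SAFE_TEMP_RANGE[1]):
        score -= 40  # very hot or very cold environment (S534)
    if not moved_recently:
        score -= 30  # user is not moving
    return score


def check_and_report(place, temperature, moved_recently, reference=50):
    if abnormality_score(place, temperature, moved_recently) < reference:  # S538
        message = "Abnormality detected for the user."                     # S540
        print("Notifying registered contact:", message)                   # S542


check_and_report("riverbank", 2.0, moved_recently=False)
```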
[Effects of the Seventeenth Embodiment]
In the voice response system 100, the server 90 detects the user's behavior and the user's surrounding environment, and generates a message according to what is detected.
According to such a voice response system 100, dangerous places, restricted areas, and the like can be announced, and abnormal behavior by the user can be detected.
Furthermore, in the voice response system 100, the server 90 determines the user's health condition from a captured image of the user and generates a message according to that condition.
According to such a voice response system 100, the user's health can be managed.
Also, in the voice response system 100, the server 90 notifies a predetermined contact when the health condition falls below a reference value.
According to such a voice response system 100, a report can be made when the user's health condition is at or below the reference value, so an abnormality can be reported to others at an earlier stage.
[Other Embodiments]
Embodiments of the present invention are in no way limited to those described above and can take various forms as long as they fall within the technical scope of the present invention.
For example, the voice response system 100 may mediate exchanges between two or more parties. Specifically, when vehicles need to yield to each other at an intersection or the like, the terminal devices 1 may negotiate among themselves which vehicle enters the intersection first. In this case, each terminal device 1 transmits to the server 90 its heading and speed as it approaches the intersection, and the server 90 assigns each terminal device 1 a priority according to that heading and approach speed, generating and outputting voice such as "Stop" or "You may proceed" according to the priority.
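As an illustration only, the sketch below assigns priorities from approach speed (faster vehicle first); the actual priority rule is not specified, the document saying only that the server 90 sets priorities from heading and approach speed.

```python
# Hedged sketch of the intersection negotiation example above.


def assign_priorities(vehicles):
    """vehicles: list of dicts with 'id', 'heading_deg', and 'speed_mps'."""
    ranked = sorted(vehicles, key=lambda v: v["speed_mps"], reverse=True)
    return {v["id"]: ("You may proceed" if rank == 0 else "Stop")
            for rank, v in enumerate(ranked)}


print(assign_priorities([
    {"id": "car_a", "heading_deg": 90, "speed_mps": 8.0},
    {"id": "car_b", "heading_deg": 180, "speed_mps": 5.5},
]))
# -> {'car_a': 'You may proceed', 'car_b': 'Stop'}
```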
Also, when the terminal device 1 accepts an incoming call for communication that requires a real-time response, such as voice communication, the call may be accepted only when it is convenient for the user. Specifically, the call may be accepted, on the assumption that the timing is convenient, when the camera 41 can capture the user's face.
Furthermore, some people become displeased when the other party does not answer a call. To ease such feelings, the other party's situation may be conveyed to a caller waiting for an answer. For example, the terminal device 1 may manage the user's schedule; if the user does not answer an incoming call, it may look up what the user is doing, or search the user's schedule for free time, and tell the caller when the user will be able to respond.
Also, when the user does not answer an incoming call, the user's location may be conveyed to the caller. For example, if the user is connected to the Internet or the like via a smartphone or personal computer, it is possible to tell which terminal is being operated, and the user's location can be identified from this information and conveyed to the caller.
Furthermore, whether the user can answer an incoming call may be judged from position information obtained with GPS or the like. From position information it can be determined whether the user is in a car, at home, and so on; for example, if the user is moving or is in bed, it may be judged that the setting is highly public or that the user is asleep and cannot answer. When a call cannot be answered in this way, what the user is doing may be conveyed to the caller as described above.
A configuration that uses security cameras to obtain position information is also conceivable. In recent years, security cameras have been installed in many places, so these cameras, combined with techniques for identifying individuals such as face recognition, can be used to recognize the user's position. Security cameras may also be used to judge the user's situation, such as what the user is doing (whether or not the user can answer the phone). Whether an incoming call can be answered can also be judged from conditions such as whether another fixed-line telephone is in use (an incoming call cannot be answered while the fixed-line telephone is in use).
Furthermore, when the user of a terminal device 1 wants to converse with someone, the learned personality of the user may be used to call, from among an unspecified number of terminal devices, one whose user is estimated to be highly compatible. In such a case, the system may also raise a topic likely to spark lively conversation (a topic both users are interested in, extracted using the learning results).
Also, when the voice response device has not been used for a long time (when the user has not spoken for a reference time or longer), the voice response device may say something to the user. The words to be spoken at this time may be selected using position information such as GPS.
[Correspondence between the means of the claims (the present invention) and the configurations of the embodiments]
The terminal device 1 and the server 90 in the above embodiments correspond to an example of the voice response device of the present invention. The processes of S22 and S56 in the above embodiments correspond to an example of the response acquisition means of the present invention.
Furthermore, the processes of S28, S60, and S64 in the above embodiments correspond to an example of the voice output means of the present invention, and the processes of S2 and S6 correspond to an example of the voice input means.
The process of S14 corresponds to an example of the voice transmission means, and the response candidate DB 105 corresponds to an example of the response recording means.
The process of S56 corresponds to an example of the personality information acquisition means, and the processes of S254, S258, and S260 correspond to an example of the first and second personality information generation means.
The processes of S48 and S56 correspond to an example of the response generation means.
In the modification described above, the process of S48 corresponds to an example of the voice input moving image acquisition means, and the process of S52 corresponds to an example of the character information conversion means.
The preference information generation process corresponds to an example of the preference information generation means, and the process of S56 corresponds to an example of the response candidate acquisition means.
The operation character input process corresponds to an example of the character information generation means, and the other-terminal use process corresponds to an example of the other-device information acquisition means and the transfer means.
The process of S98 corresponds to an example of the playback condition determination means, and the process of S100 corresponds to an example of the message playback means.
The process of S116 corresponds to an example of the unanswered-call transmission means, and the process of S372 corresponds to an example of the utterance accuracy detection means.
The process of S374 corresponds to an example of the accuracy output means, and the process of S204 corresponds to an example of the connection control means.
The process of S50 corresponds to an example of the emotion determination means, and the process of S438 corresponds to an example of the route information acquisition means.
The process of S462 corresponds to an example of the gaze detection means, and the process of S464 corresponds to an example of the gaze movement request transmission means and the change request transmission means.
The process of S486 corresponds to an example of the broadcast program acquisition means, and the process of S484 corresponds to an example of the broadcast program complementing means and the lyrics addition means.
The processes of S504 and S506 correspond to an example of the reading output means, and the processes of S522 and S524 correspond to an example of the behavior and environment detection means.
The process of S538 corresponds to an example of the health condition determination means, the process of S540 corresponds to an example of the health message generation means, and the process of S542 corresponds to an example of the reporting means.

Claims (3)

1. A voice response device that responds by voice to input character information, comprising:
   response acquisition means for acquiring a plurality of different responses to the character information; and
   voice output means for outputting the plurality of different responses in respectively different voice colors.
2. The voice response device according to claim 1, further comprising:
   voice input means for the user to input voice; and
   voice transmission means for transmitting the input voice to an external device that converts the voice into character information, generates a plurality of different responses to the character information, and transmits them to the voice response device,
   wherein the response acquisition means acquires the responses from the external device.
3. The voice response device according to claim 1 or 2, wherein
   the voice response device or the external device comprises response recording means in which, for each of a plurality of pieces of character information, a plurality of different responses including a positive response and a negative response to that character information are recorded,
   the response acquisition means acquires the positive response and the negative response as the plurality of different responses, and
   the voice output means reproduces the positive response and the negative response in different voice colors.
PCT/JP2013/064918 2012-06-18 2013-05-29 Voice response device WO2013190963A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2014521255A JP6267636B2 (en) 2012-06-18 2013-05-29 Voice response device

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
JP2012137065 2012-06-18
JP2012-137067 2012-06-18
JP2012-137066 2012-06-18
JP2012137066 2012-06-18
JP2012137067 2012-06-18
JP2012-137065 2012-06-18

Publications (1)

Publication Number Publication Date
WO2013190963A1 true WO2013190963A1 (en) 2013-12-27

Family

ID=49768566

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2013/064918 WO2013190963A1 (en) 2012-06-18 2013-05-29 Voice response device

Country Status (2)

Country Link
JP (14) JP6267636B2 (en)
WO (1) WO2013190963A1 (en)

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015182177A1 (en) * 2014-05-28 2015-12-03 シャープ株式会社 Electronic device and message system
WO2016052164A1 (en) * 2014-09-30 2016-04-07 シャープ株式会社 Conversation device
JP2016076117A (en) * 2014-10-07 2016-05-12 株式会社Nttドコモ Information processing device and utterance content output method
JP2017084177A (en) * 2015-10-29 2017-05-18 シャープ株式会社 Electronic apparatus and control method thereof
WO2017130497A1 (en) * 2016-01-28 2017-08-03 ソニー株式会社 Communication system and communication control method
JP6205039B1 (en) * 2016-09-16 2017-09-27 ヤフー株式会社 Information processing apparatus, information processing method, and program
JP2018167339A (en) * 2017-03-29 2018-11-01 富士通株式会社 Utterance control program, information processor, and utterance control method
WO2019187590A1 (en) * 2018-03-29 2019-10-03 ソニー株式会社 Information processing device, information processing method, and program
JP2019535037A (en) * 2016-10-03 2019-12-05 グーグル エルエルシー Synthetic Speech Selection for Agents by Computer
US10853747B2 (en) 2016-10-03 2020-12-01 Google Llc Selection of computational agent for task performance
US10854188B2 (en) 2016-10-03 2020-12-01 Google Llc Synthesized voice selection for computational agents
JP2022017561A (en) * 2017-06-14 2022-01-25 ヤマハ株式会社 Information processing unit, output method for singing voice, and program
US11595331B2 (en) 2016-01-28 2023-02-28 Sony Group Corporation Communication system and communication control method
US11663535B2 (en) 2016-10-03 2023-05-30 Google Llc Multi computational agent performance of tasks

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6267636B2 (en) * 2012-06-18 2018-01-24 エイディシーテクノロジー株式会社 Voice response device
EP3766614A4 (en) 2018-03-16 2021-07-14 Sumitomo Electric Hardmetal Corp. Surface coated cutting tool and manufacturing method for same
JPWO2022038720A1 (en) 2020-08-19 2022-02-24
CN115697858A (en) 2020-08-19 2023-02-03 日本烟草产业株式会社 Packaging material for tobacco products and package for tobacco products
WO2023047487A1 (en) 2021-09-22 2023-03-30 株式会社Fuji Situation awareness system, voice response device, and situation awareness method

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000181475A (en) * 1998-12-21 2000-06-30 Sony Corp Voice answering device
JP2007148039A (en) * 2005-11-28 2007-06-14 Matsushita Electric Ind Co Ltd Speech translation device and speech translation method

Family Cites Families (44)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS58104743U (en) * 1982-01-13 1983-07-16 日本精機株式会社 Vehicle audio notification device
JP2912394B2 (en) * 1989-10-04 1999-06-28 株式会社日立製作所 Car phone position notification device
JP3120995B2 (en) 1990-07-03 2000-12-25 本田技研工業株式会社 Paint temperature control system
JP3674990B2 (en) * 1995-08-21 2005-07-27 セイコーエプソン株式会社 Speech recognition dialogue apparatus and speech recognition dialogue processing method
US5766015A (en) * 1996-07-11 1998-06-16 Digispeech (Israel) Ltd. Apparatus for interactive language training
JP3408425B2 (en) * 1998-08-11 2003-05-19 株式会社日立製作所 Automatic mediation method and medium on which processing program is recorded
WO2001024139A1 (en) * 1999-09-27 2001-04-05 Kojima Co., Ltd. Pronunciation evaluation system
JP2001330450A (en) * 2000-03-13 2001-11-30 Alpine Electronics Inc Automobile navigation system
US6731307B1 (en) * 2000-10-30 2004-05-04 Koninklije Philips Electronics N.V. User interface/entertainment device that simulates personal interaction and responds to user's mental state and/or personality
CN1266625C (en) * 2001-05-04 2006-07-26 微软公司 Server for identifying WEB invocation
JP2002342356A (en) * 2001-05-18 2002-11-29 Nec Software Kyushu Ltd System, method and program for providing information
JP2003023501A (en) * 2001-07-06 2003-01-24 Self Security:Kk Safety confirmation support equipment of single life person
JP2003108362A (en) * 2001-07-23 2003-04-11 Matsushita Electric Works Ltd Communication supporting device and system thereof
JP2003108376A (en) * 2001-10-01 2003-04-11 Denso Corp Response message generation apparatus, and terminal device thereof
JP2003216176A (en) * 2002-01-28 2003-07-30 Matsushita Electric Works Ltd Speech controller
JP4086280B2 (en) * 2002-01-29 2008-05-14 株式会社東芝 Voice input system, voice input method, and voice input program
EP1408303B1 (en) * 2002-03-15 2011-12-21 Mitsubishi Denki Kabushiki Kaisha Vehicular navigation device
JP3777337B2 (en) 2002-03-27 2006-05-24 ドコモ・モバイルメディア関西株式会社 Data server access control method, system thereof, management apparatus, computer program, and recording medium
JP2003329477A (en) * 2002-05-15 2003-11-19 Pioneer Electronic Corp Navigation device and interactive information providing program
JP2004021121A (en) * 2002-06-19 2004-01-22 Nec Corp Voice interaction controller unit
JP2004030313A (en) * 2002-06-26 2004-01-29 Ntt Docomo Tokai Inc Method and system for providing service
JP2004046400A (en) * 2002-07-10 2004-02-12 Mitsubishi Heavy Ind Ltd Speaking method of robot
JP2004301942A (en) * 2003-03-28 2004-10-28 Bandai Co Ltd Speech recognition device, conversation device, and robot toy
JP2004364128A (en) * 2003-06-06 2004-12-24 Sanyo Electric Co Ltd Communication system
JP3895758B2 (en) * 2004-01-27 2007-03-22 松下電器産業株式会社 Speech synthesizer
JP2005301914A (en) * 2004-04-15 2005-10-27 Sharp Corp Portable information appliance
JP2005342862A (en) * 2004-06-04 2005-12-15 Nec Corp Robot
JP2005352895A (en) * 2004-06-11 2005-12-22 Kenwood Corp Vehicle driver awakening system
JP4459238B2 (en) * 2004-12-28 2010-04-28 シャープ株式会社 Mobile terminal, communication terminal, location notification system using these, and location notification method
US8983962B2 (en) * 2005-02-08 2015-03-17 Nec Corporation Question and answer data editing device, question and answer data editing method and question answer data editing program
JP2006227846A (en) * 2005-02-16 2006-08-31 Fujitsu Ten Ltd Information display system
JP4586566B2 (en) * 2005-02-22 2010-11-24 トヨタ自動車株式会社 Spoken dialogue system
JP4631501B2 (en) * 2005-03-28 2011-02-16 パナソニック電工株式会社 Home system
JP2008053989A (en) * 2006-08-24 2008-03-06 Megachips System Solutions Inc Door phone system
JP2008153889A (en) * 2006-12-15 2008-07-03 Promise Co Ltd Answering operation mediation system
JP2008152013A (en) * 2006-12-18 2008-07-03 Canon Inc Voice synthesizing device and method
JP5173221B2 (en) * 2007-03-25 2013-04-03 京セラ株式会社 Mobile terminal, information processing system, and information processing method
JP2009093284A (en) * 2007-10-04 2009-04-30 Toyota Motor Corp Drive support device
JP5305802B2 (en) * 2008-09-17 2013-10-02 オリンパス株式会社 Information presentation system, program, and information storage medium
JP2010079149A (en) * 2008-09-29 2010-04-08 Brother Ind Ltd Visitor reception device, person-in-charge terminal, visitor reception method, and visitor reception program
JP5195405B2 (en) * 2008-12-25 2013-05-08 トヨタ自動車株式会社 Response generating apparatus and program
US9092389B2 (en) * 2009-03-16 2015-07-28 Avaya Inc. Advanced availability detection
JP5563422B2 (en) * 2010-10-15 2014-07-30 京セラ株式会社 Electronic device and control method
JP6267636B2 (en) 2012-06-18 2018-01-24 エイディシーテクノロジー株式会社 Voice response device

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000181475A (en) * 1998-12-21 2000-06-30 Sony Corp Voice answering device
JP2007148039A (en) * 2005-11-28 2007-06-14 Matsushita Electric Ind Co Ltd Speech translation device and speech translation method

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106233372B (en) * 2014-05-28 2019-07-26 夏普株式会社 Electronic equipment and message leaving system
JP2015225258A (en) * 2014-05-28 2015-12-14 シャープ株式会社 Electronic apparatus and message system
CN106233372A (en) * 2014-05-28 2016-12-14 夏普株式会社 Electronic equipment and message leaving system
WO2015182177A1 (en) * 2014-05-28 2015-12-03 シャープ株式会社 Electronic device and message system
WO2016052164A1 (en) * 2014-09-30 2016-04-07 シャープ株式会社 Conversation device
JP2016071247A (en) * 2014-09-30 2016-05-09 シャープ株式会社 Interaction device
JP2016076117A (en) * 2014-10-07 2016-05-12 株式会社Nttドコモ Information processing device and utterance content output method
JP2017084177A (en) * 2015-10-29 2017-05-18 シャープ株式会社 Electronic apparatus and control method thereof
US11595331B2 (en) 2016-01-28 2023-02-28 Sony Group Corporation Communication system and communication control method
JPWO2017130497A1 (en) * 2016-01-28 2018-11-22 ソニー株式会社 Communication system and communication control method
WO2017130497A1 (en) * 2016-01-28 2017-08-03 ソニー株式会社 Communication system and communication control method
US11159462B2 (en) 2016-01-28 2021-10-26 Sony Corporation Communication system and communication control method
JP6205039B1 (en) * 2016-09-16 2017-09-27 ヤフー株式会社 Information processing apparatus, information processing method, and program
JP2018045630A (en) * 2016-09-16 2018-03-22 ヤフー株式会社 Information processing device, information processing method, and program
US10853747B2 (en) 2016-10-03 2020-12-01 Google Llc Selection of computational agent for task performance
JP2019535037A (en) * 2016-10-03 2019-12-05 グーグル エルエルシー Synthetic Speech Selection for Agents by Computer
US10854188B2 (en) 2016-10-03 2020-12-01 Google Llc Synthesized voice selection for computational agents
US11663535B2 (en) 2016-10-03 2023-05-30 Google Llc Multi computational agent performance of tasks
JP2018167339A (en) * 2017-03-29 2018-11-01 富士通株式会社 Utterance control program, information processor, and utterance control method
JP2022017561A (en) * 2017-06-14 2022-01-25 ヤマハ株式会社 Information processing unit, output method for singing voice, and program
JP7424359B2 (en) 2017-06-14 2024-01-30 ヤマハ株式会社 Information processing device, singing voice output method, and program
WO2019187590A1 (en) * 2018-03-29 2019-10-03 ソニー株式会社 Information processing device, information processing method, and program

Also Published As

Publication number Publication date
JP2018136545A (en) 2018-08-30
JP2018136546A (en) 2018-08-30
JP6751865B2 (en) 2020-09-09
JP2021184111A (en) 2021-12-02
JP2022062200A (en) 2022-04-19
JP2018136540A (en) 2018-08-30
JP2017215602A (en) 2017-12-07
JPWO2013190963A1 (en) 2016-05-26
JP2019179243A (en) 2019-10-17
JP7231289B2 (en) 2023-03-01
JP2018136541A (en) 2018-08-30
JP2017215603A (en) 2017-12-07
JP2018049285A (en) 2018-03-29
JP6552123B2 (en) 2019-07-31
JP6669951B2 (en) 2020-03-18
JP2023079225A (en) 2023-06-07
JP2018092179A (en) 2018-06-14
JP6267636B2 (en) 2018-01-24
JP2020038387A (en) 2020-03-12
JP6969811B2 (en) 2021-11-24

Similar Documents

Publication Publication Date Title
JP7231289B2 (en) voice response system
US11241789B2 (en) Data processing method for care-giving robot and apparatus
US11004446B2 (en) Alias resolving intelligent assistant computing device
JP7070544B2 (en) Learning device, learning method, speech synthesizer, speech synthesizer
US11430439B2 (en) System and method for providing assistance in a live conversation
US20160379107A1 (en) Human-computer interactive method based on artificial intelligence and terminal device
JP2019049742A (en) Voice response device
CN109313935B (en) Information processing system, storage medium, and information processing method
JP2018133696A (en) In-vehicle device, content providing system, and content providing method
US11270682B2 (en) Information processing device and information processing method for presentation of word-of-mouth information
JP2022006610A (en) Social capacity generation device, social capacity generation method, and communication robot
JP2020166593A (en) User support device, user support method, and user support program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13806621

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2014521255

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 13806621

Country of ref document: EP

Kind code of ref document: A1