WO2013190963A1 - Voice response device - Google Patents

Voice response device

Info

Publication number
WO2013190963A1
Authority
WO
WIPO (PCT)
Prior art keywords
voice
response
user
information
voice response
Prior art date
Application number
PCT/JP2013/064918
Other languages
French (fr)
Japanese (ja)
Inventor
勉 足立
丈誠 横井
林 茂
健純 近藤
辰美 黒田
大介 毛利
豪生 野澤
謙史 竹中
毅 川西
健司 水野
博司 前川
岩田 誠
Original Assignee
エイディシーテクノロジー株式会社
Priority date
Filing date
Publication date
Application filed by エイディシーテクノロジー株式会社
Priority to JP2014521255A (JP6267636B2)
Publication of WO2013190963A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16Sound input; Sound output
    • G06F3/167Audio in a user interface, e.g. using voice commands for navigating, audio feedback

Definitions

  • This international application claims priority based on Japanese Patent Application No. 2012-137065, Japanese Patent Application No. 2012-137066, and Japanese Patent Application No. 2012-137067, filed with the Japan Patent Office on June 18, 2012.
  • The contents of Japanese Patent Application Nos. 2012-137065, 2012-137066, and 2012-137067 are incorporated into the present international application by reference.
  • The present invention relates to a voice response device that responds by voice to input character information.
  • One object of the present invention is to improve usability for the user in a voice response device that responds by voice to input character information.
  • One aspect of the present invention is a voice response device that responds by voice to input character information, comprising: response acquisition means for acquiring a plurality of different responses to the character information; and voice output means for outputting the plurality of different responses in different voice colors.
  • According to such a voice response device, a plurality of responses can be output in different voice colors, so even when a single answer cannot be determined for a given piece of character information, different answers can be output in an easily distinguishable way using different voice colors. Usability for the user can therefore be improved.
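  • As a minimal sketch of this idea (illustrative, not from the patent itself; the voice colors and phrases are assumed examples), the following Python snippet pairs each candidate response with its own voice color and outputs them in turn, with a print call standing in for a real text-to-speech engine:

```python
from dataclasses import dataclass

@dataclass
class Response:
    text: str         # response content (character information)
    voice_color: str  # identifies a TTS voice, e.g. "woman1", "man1"

def speak(response: Response) -> None:
    # Stand-in for a real TTS call: a production system would pass
    # response.voice_color to the synthesizer to select the voice.
    print(f"[{response.voice_color}] {response.text}")

def output_responses(responses: list[Response]) -> None:
    # Output every candidate answer in its own voice color, so the
    # user can tell the alternative answers apart by ear.
    for r in responses:
        speak(r)

output_responses([
    Response("Today's weather in Tokyo is sunny.", "woman1"),
    Response("However, it will rain tomorrow.", "man1"),
])
```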
  • the voice response device of the present invention may be configured as a terminal device possessed by a user, or may be configured as a server that communicates with the terminal device.
  • The character information may be input using input means such as a keyboard, or may be input by converting voice into character information.
  • The voice response device may comprise voice input means for the user to input voice, and voice transmitting means for transmitting the input voice to an external device that converts the voice into character information, generates a plurality of different responses to the character information, and transmits the responses back to the voice response device. The response acquisition means may then acquire the responses from the external device.
  • According to such a voice response device, voice can be input, so character information can be entered by voice. Moreover, since the responses can be generated by the external device, the configuration of the voice response device itself can be simplified.
  • The operation of converting input voice into character information may be performed by either the voice response device or the external device.
  • The voice response device or the external device may include response recording means in which, for each of a plurality of pieces of character information, a plurality of different responses including a positive response and a negative response are recorded;
  • the response acquisition means may acquire the positive response and the negative response as the plurality of different responses; and
  • the voice output means may reproduce the positive response and the negative response in different voice colors.
  • In this way, responses taking different positions, such as a positive response and a negative response, are reproduced in different voice colors, as if different people were speaking. This makes it less likely that the user listening to the voice feels uncomfortable.
  • The voice color may be changed according to the type of response or the language used in the response. For example, a response with a gentle tone may be reproduced in a calm female voice, and a response with a severe tone in a bold male voice. That is, response content may be associated with a personality, and the voice color set according to that personality.
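  • The association between tone and voice color could be as simple as a lookup table; a hedged sketch (tone names and voices are invented for illustration):

```python
# Hypothetical mapping from response tone/personality to a voice color.
VOICE_BY_TONE = {
    "gentle": "calm_female",
    "severe": "bold_male",
    "neutral": "default",
}

def pick_voice(tone: str) -> str:
    # Fall back to the default voice for tones with no explicit persona.
    return VOICE_BY_TONE.get(tone, VOICE_BY_TONE["neutral"])
```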
  • The voice response device can be configured for use at a workplace or company reception desk, as in the invention of the fourth aspect, or can be configured to convey, on the user's behalf, things that are difficult to say to someone directly.
  • For example, the names and company names of salespeople are recorded in advance in the voice response device or an external device, and when a visitor at the reception desk gives one of these names or company names, a response may be generated that reproduces a polite phrase of refusal.
  • In the latter configuration, the voice response device may speak (reproduce the voice) in the user's place.
  • In this case, the response need not be output immediately; it may be output when a reproduction condition is satisfied, for example after a certain time has elapsed.
  • the external device or the voice response device may acquire information for generating a response to the character information from another voice response device.
  • As in the invention of the sixth aspect, when another voice response device requests information for generating a response to character information, the voice response device may return the information corresponding to the request.
  • In this case, the voice response device may include sensors for detecting position information, temperature, humidity, illuminance, noise level, and the like, as well as databases such as dictionary information, and may extract the necessary information according to the request.
  • Such a voice response device can acquire information for generating a response from another voice response device.
  • information unique to the other voice response device such as the position of the other voice response device can be acquired.
  • information unique to itself can be transmitted to another voice response device.
  • A response (for example, a positive response or a negative response) output by the device itself or by another voice response device may in turn be input as character information, and a response to that response may then be generated.
  • This configuration can be realized using one voice response device or a plurality of voice response devices.
  • When a plurality of devices are used, voice may be exchanged directly as sound, or wireless communication or the like may be used.
  • Another aspect of the present invention is a voice response device that responds by voice to input character information, comprising: personality information acquisition means for acquiring personality information that classifies, according to preset categories, the personality of the user or of a person related to the user; response acquisition means for acquiring response candidates representing a plurality of different responses to the character information; and voice output means for selecting the response to be output from the response candidates according to the personality information, and outputting the selected response.
  • According to such a voice response device, different responses can be made according to the personality of the user or of a person related to the user (a related person), so usability for the user can be improved.
  • The voice response device may comprise first personality information generating means for generating personality information of the user or the related person based on answers to a plurality of preset questions;
  • the personality information acquisition means may then acquire the personality information generated by the first personality information generating means.
  • personality information can be generated in the voice response device.
  • For the questions, a well-known personality analysis technique (Rorschach test, Szondi test, etc.) may be used.
  • Alternatively, aptitude test technology used in corporate employment examinations and the like may be used.
  • The voice response device may also comprise second personality information generating means for generating personality information of the user or the related person based on character strings included in the input character information;
  • the personality information acquisition means may then acquire the personality information generated by the second personality information generating means.
  • The voice response device may further comprise preference information generating means for generating preference information indicating the preference tendencies of the user or the related person based on character strings included in the character information;
  • the voice output means may then select the response to be output from the response candidates based on the preference information, and output the selected response.
  • In this case, a response can be made according to the preferences of the user or the related person. Further, as in the invention of the twelfth aspect, the voice response device may learn (record and analyze) the user's behavior (conversation, places visited, what appears on the camera) and use this to compensate for missing words in the user's conversation.
  • Response candidate acquisition means for acquiring response candidates from a predetermined server or from the Internet may be provided.
  • response candidates can be acquired not only from the device itself or an external device, but also from any device connected via the Internet or a dedicated line.
  • Character information generation means for converting the user's actions into character information may be provided.
  • An action referred to in the present invention corresponds to an action produced by muscle movement, such as conversation, handwriting, or gesture (for example, sign language).
  • the user's action can be converted into character information.
  • The character information generation means may convert the voice of the user's utterances into character information, and accumulate the user's utterance characteristics (such as pronunciation habits) as learning information (capturing and recording these characteristics).
  • the character information can be generated based on the learning information, so that the generation accuracy of the character information can be improved.
  • Transfer means for transferring the learning information to another voice response device may be provided.
  • In this case, learning information recorded by one voice response device can be reused, so the generation accuracy of character information can be improved even when another voice response device is used.
  • At least one of the user's behavior and operations may be detected, and learning information or personality information may be generated based on it.
  • For example, when such a voice response device detects that the user has had to rush to catch the train for several days in a row, it can urge the user to leave the house a few minutes earlier from the next day; or, when it detects from conversation that the user tends to get angry easily, it can output voice or music that calms that mood.
  • a response can be generated based on information recorded in another voice response device.
  • Reproduction condition determination means for determining whether the state of the voice response device matches a reproduction condition, set in advance as a condition for outputting voice when no character information is input, and message reproduction means for outputting a preset message when the reproduction condition is satisfied, may be provided.
  • In this case, voice can be output even when no character information is input (that is, when the user has not spoken). For example, by prompting the user to speak, this can serve as a countermeasure against drowsiness while driving a car. It can also be used for safety confirmation, by checking whether a person living alone responds.
  • The message reproduction means may acquire news information and output a message related to the news in question format, inviting an answer from the user.
  • Since such a voice response device can hold a conversation about the news, the conversation can be prevented from always being the same. For example, if information about a company's stock price can be acquired, the conversation content can be: "Today, the stock price of XX company rose by XX yen. Did you know?"
  • The voice output means or the message reproduction means may output a preset message with separately acquired external information (news, or environmental information such as temperature, weather, and position) added to it.
  • a response in which a predetermined message and the acquired information are combined can be output.
  • A plurality of messages may be acquired, and the message to be reproduced may be selected and output according to each message's reproduction frequency.
  • Such a voice response device can make frequently reproduced messages less likely to be selected, adding randomness to message reproduction, or it can intentionally reproduce a frequently reproduced message again and again to call attention to it and promote memory retention.
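  • One way to realize such frequency-dependent selection, as a sketch (the inverse weighting scheme is an assumption, not specified here), is to weight each message by the inverse of how often it has already been reproduced:

```python
import random

def pick_message(messages: dict[str, int]) -> str:
    """messages maps message text -> times already reproduced.

    Weighting each message by 1/(1+count) makes frequently played
    messages less likely to be chosen, as described above.
    """
    texts = list(messages)
    weights = [1.0 / (1 + messages[t]) for t in texts]
    choice = random.choices(texts, weights=weights, k=1)[0]
    messages[choice] += 1  # update the reproduction frequency
    return choice

history = {"Good morning.": 5, "Did you see the news?": 1}
print(pick_message(history))
```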
  • Non-response transmission means may be provided that, when no reply or response to a message is obtained, transmits to a preset contact address information identifying the user and the fact that no reply was obtained.
  • The message reproduction means may store conversation content and ask questions whose expected answers match what was previously heard (memory confirmation processing).
  • In this way, the user's memory can be checked and consolidated.
  • Utterance accuracy detection means for detecting the accuracy of the pronunciation and accent of the voice input by the user, and accuracy output means for outputting the detected accuracy, may be provided.
  • The accuracy output means may output a voice including the closest matching word when the accuracy is at or below a predetermined value.
  • In this way, the user can check the accuracy of their pronunciation and accent. Furthermore, as in the invention of the twenty-seventh aspect, the message reproduction means may output the same question again when the accuracy falls below a certain value.
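  • A toy illustration of outputting the closest word when accuracy is low (string similarity stands in for acoustic pronunciation scoring, which is left unspecified here; the vocabulary and threshold are assumptions):

```python
import difflib

VOCABULARY = ["weather", "whether", "wither"]  # illustrative word list

def check_utterance(recognized: str, intended: str) -> str:
    # Stand-in accuracy score: string similarity between what the
    # recognizer heard and the intended word (0.0 - 1.0).
    accuracy = difflib.SequenceMatcher(None, recognized, intended).ratio()
    if accuracy <= 0.8:  # predetermined threshold (assumed value)
        nearest = difflib.get_close_matches(recognized, VOCABULARY, n=1)
        hint = nearest[0] if nearest else intended
        return f"Accuracy {accuracy:.2f}. Did you mean '{hint}'?"
    return f"Accuracy {accuracy:.2f}. Well pronounced."

print(check_utterance("wizzer", "weather"))
```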
  • Connection control means may distinguish between sales visits and ordinary visitors, and reproduce a message of refusal when the visit is a sales activity.
  • A keyword included in the input character information may be extracted, and a connection made to the destination corresponding to that keyword.
  • a keyword such as the name of the other party may be associated with the connection destination in advance.
  • Such a voice response device can assist with operations such as telephone transfer and call reception.
  • As in the invention of the 31st aspect, the voice response device may recognize the business the other party states based on keywords, and convey an outline of what the other party said to the user.
  • Emotion determination means may be provided that reads the emotion from the voice color of the voice input by the user and outputs which of a set of emotion categories, including at least one of normal, anger, joy, confusion, sadness, and elation, the emotion falls into.
  • The invention of the 33rd aspect comprises: response generation means for generating a response according to a captured image of the surroundings of the voice response device at the time the character information is input; and voice output means for outputting the response by voice.
  • In this case, a response corresponding to the captured image can be output by voice, so usability can be improved compared with a configuration that generates a response from character information alone.
  • Voice input video acquisition means for acquiring a moving image of the shape of the user's mouth while character information is being input by voice, and character information conversion means for converting the voice into character information and correcting it by estimating unclear parts of the voice from the moving image, may be provided.
  • the utterance content can be estimated from the shape of the mouth, so that an unclear part of the voice can be estimated well.
  • The message reproduction means may detect the user's irritation or agitation from involuntarily uttered sounds, and generate a message for calming that irritation or agitation.
  • When providing guidance to a destination, the voice response device may include route information acquisition means for acquiring route information such as the weather, temperature, humidity, traffic information, and road surface conditions along the way to the destination, and the message reproduction means may output the route information by voice.
  • Gaze detection means for detecting the user's line of sight, and gaze movement request means that outputs a voice requesting the user to move their line of sight to a predetermined position when the line of sight does not move there in response to a call from the message reproduction means, may be provided.
  • Similarly, change request means may be provided that observes the positions of the user's body parts and facial expression, and outputs a voice requesting a change when they change little in response to a call.
  • In this way, the user can be guided to move a body part to a specific position or to assume a specific facial expression.
  • the present invention can be used when driving a vehicle or performing a physical examination.
  • Broadcast program acquisition means for acquiring a broadcast program similar to the one the user is viewing, and broadcast program supplementing means for compensating for an interrupted broadcast program by outputting the acquired program in its place, may be provided.
  • Such a voice response device can compensate when the broadcast program the user is viewing is interrupted.
  • As in the invention of the forty-first aspect, the voice response device may comprise means that, when the user sings a song without the lyrics, compares the singing with the version of the song that has lyrics, and supplements by voice the lyrics in only those parts where the user's own lyrics are missing.
  • Such a voice response device can fill in the portions the user cannot sing (where the lyrics break off) in so-called karaoke. Furthermore, as in the invention of the forty-second aspect, reading output means may be provided that, when a written character appears in the captured image and the user asks how to read it, acquires information on the character from outside and outputs the reading contained in that information by voice.
  • the user can be taught how to read characters.
  • As in the invention of the 43rd aspect, the voice response device may be equipped with behavior/environment detection means for detecting the user's behavior and surrounding environment, and the message generation means may generate a message according to the detected behavior and environment.
  • the health condition of the user can be managed.
  • Such a voice response device can make a report when the user's health state falls to or below a reference value, so an abnormality can be reported to another person sooner. Further, as in the invention of the 46th aspect, information about the user may be output in response to an inquiry from a person other than the user.
  • Such a voice response device can answer questions at a hospital or the like on the user's behalf by detecting, for example, the user's meal content and walking distance. It may also be made to learn about the user's health condition and self-introduction.
  • FIG. 1 is a block diagram showing the schematic configuration of a voice response system to which the present invention is applied. FIG. 2 is a block diagram showing the schematic configuration of the terminal device. FIG. 3 is a flowchart showing the voice response terminal process performed by the MPU of the terminal device. FIG. 4 is a flowchart showing the voice response server process performed by the calculation unit of the server. FIG. 5 is an explanatory diagram showing an example of the response candidate DB. FIG. 6 is a flowchart showing the automatic conversation terminal process performed by the MPU of the terminal device. FIG. 7 is a flowchart showing the automatic conversation server process performed by the calculation unit of the server. FIG. 8 is a flowchart showing the message terminal process performed by the MPU of the terminal device.
  • REFERENCE SYMBOLS: 1 ... terminal device, 10 ... behavior sensor unit, 11 ... three-dimensional acceleration sensor, 13 ... three-axis gyro sensor, 15 ... temperature sensor, 17 ... humidity sensor, 19 ... temperature sensor, 21 ... humidity sensor, 23 ... illuminance sensor, 25 ... wetness sensor, 27 ... GPS receiver, 29 ... wind speed sensor, 33 ... electrocardiographic sensor, 35 ... heart sound sensor, 37 ... microphone, 39 ... memory, 41 ... camera, 50 ... communication unit, 53 ... wireless telephone unit, 55 ... contact memory, 60 ... notification unit, 61 ... display, 63 ... illumination, 65 ... speaker, 70 ... operation unit, 71 ... touch pad, 73 ... confirmation button, 75 ... fingerprint sensor
  • The voice response system 100 to which the present invention is applied is configured so that, for voice input at a terminal device 1, an appropriate response is generated at the server 90 and output by voice at the terminal device 1. Specifically, as shown in FIG. 1, the voice response system 100 is configured such that a plurality of terminal devices 1 and the server 90 can communicate with each other via communication base stations 80 and the Internet network 85.
  • the server 90 has a function as a normal server device.
  • the server 90 includes a calculation unit 101 and various databases (DB).
  • The calculation unit 101 is configured as a well-known computing device including a CPU and memory such as ROM and RAM. Based on programs in the memory, the calculation unit 101 communicates with the terminal devices 1 and the like via the Internet network 85, reads and writes data in the various DBs, and performs various processes such as voice recognition and response generation for conversing with the user of a terminal device 1.
  • As the various DBs, as shown in FIG. 1, there are a speech recognition DB 102, a predictive conversion DB 103, a speech DB 104, a response candidate DB 105, a personality DB 106, a learning DB 107, a preference DB 108, a news DB 109, a weather DB 110, a reproduction condition DB 111, a handwritten character/sign language DB 112, a terminal information DB 113, an emotion determination DB 114, a health determination DB 115, a karaoke DB 116, a report destination DB 117, a sales DB 118, a client DB 119, and the like.
  • The details of these DBs will be described as each process is described.
  • the terminal device 1 includes a behavior sensor unit 10, a communication unit 50, a notification unit 60, and an operation unit 70 provided in a predetermined housing.
  • the behavior sensor unit 10 includes a well-known MPU 31 (microprocessor unit), a memory 39 such as a ROM and a RAM, and various sensors.
  • The MPU 31 controls the sensor elements constituting the various sensors so that their inspection targets (humidity, wind speed, etc.) can be detected satisfactorily; for example, it performs processing such as driving a heater to bring a sensor element to its optimum temperature.
  • The behavior sensor unit 10 includes, as its various sensors, a three-dimensional acceleration sensor 11 (3DG sensor), a three-axis gyro sensor 13, a temperature sensor 15 and a humidity sensor 17 disposed on the back surface of the housing, a temperature sensor 19 and a humidity sensor 21, an illuminance sensor 23, a wetness sensor 25, a GPS receiver 27 that detects the current location of the terminal device 1, and a wind speed sensor 29.
  • the behavior sensor unit 10 also includes an electrocardiogram sensor 33, a heart sound sensor 35, a microphone 37, and a camera 41 as various sensors.
  • the temperature sensors 15 and 19 and the humidity sensors 17 and 21 measure the temperature or humidity of the outside air of the housing as an inspection target.
  • The three-dimensional acceleration sensor 11 detects the accelerations applied to the terminal device 1 in three orthogonal directions (the vertical direction (Z direction), the width direction of the casing (Y direction), and the thickness direction of the casing (X direction)) and outputs the detection results.
  • The three-axis gyro sensor 13 detects the angular velocities applied to the terminal device 1 about the vertical direction (Z direction) and about two directions orthogonal to it (the width direction of the casing (Y direction) and the thickness direction of the casing (X direction)), with counterclockwise rotation about each axis taken as positive, and outputs the detection results.
  • the temperature sensors 15 and 19 include, for example, a thermistor element whose electric resistance changes according to temperature.
  • the temperature sensors 15 and 19 detect the Celsius temperature, and all temperature displays described in the following description are performed at the Celsius temperature.
  • the humidity sensors 17 and 21 are configured as, for example, known polymer film humidity sensors.
  • This polymer film humidity sensor is configured as a capacitor whose dielectric constant changes as the amount of moisture contained in the polymer film changes with relative humidity.
  • the illuminance sensor 23 is configured as a well-known illuminance sensor including a phototransistor, for example.
  • The wind speed sensor 29 is, for example, a well-known wind speed sensor that calculates the wind speed from the electric power (amount of heat dissipation) required to keep a heater at a predetermined temperature.
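  • For reference, heater-power anemometers of this kind are often modeled with King's law, P = (A + B·√v)·(T_heater − T_air); a sketch of the inversion under that assumption (the calibration constants are illustrative, not from this document):

```python
def wind_speed_from_power(power_w: float, t_heater: float, t_air: float,
                          a: float = 0.05, b: float = 0.02) -> float:
    """Invert King's law P = (A + B*sqrt(v)) * (T_heater - T_air).

    power_w: electric power keeping the heater at t_heater (watts)
    a, b: calibration constants (assumed example values)
    Returns the estimated wind speed v in m/s.
    """
    loss = power_w / (t_heater - t_air)  # equals A + B*sqrt(v)
    root = max(0.0, (loss - a) / b)      # clamp below the calibration floor
    return root ** 2

print(wind_speed_from_power(power_w=4.0, t_heater=60.0, t_air=20.0))
```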
  • the heart sound sensor 35 is configured as a vibration sensor that captures vibrations caused by the beat of the heart of the user.
  • The MPU 31 considers the detection result of the heart sound sensor 35 together with the heart sounds input from the microphone 37, and thereby distinguishes heart sounds from other vibrations and noise.
  • the wetness sensor 25 detects water droplets on the surface of the housing, and the electrocardiographic sensor 33 detects the user's heartbeat.
  • the camera 41 is arranged in the casing of the terminal device 1 so that the outside of the terminal device 1 is an imaging range.
  • The communication unit 50 includes a well-known MPU 51, a wireless telephone unit 53, and a contact memory 55, and is configured to acquire detection signals from the various sensors of the behavior sensor unit 10 via an input/output interface (not shown). The MPU 51 of the communication unit 50 performs processing according to the detection results from the behavior sensor unit 10, input signals entered via the operation unit 70, and programs stored in ROM (not shown).
  • The MPU 51 of the communication unit 50 executes a function as an operation detection device that detects specific operations performed by the user, a function as a positional relationship detection device that detects the positional relationship with the user, a function as an exercise load detection device that detects the exercise load imposed on the user, and a function of transmitting the processing results of the MPU 51.
  • The wireless telephone unit 53 is configured to communicate with, for example, mobile phone base stations, and the MPU 51 of the communication unit 50 outputs its processing results to the notification unit 60, or transmits them via the wireless telephone unit 53 to a preset destination.
  • the contact address memory 55 functions as a storage area for storing location information of the user's visit destination.
  • the contact address memory 55 stores information on contact information (such as a telephone number) to be contacted when an abnormality occurs in the user.
  • the notification unit 60 includes, for example, a display 61 configured as an LCD or an organic EL display, an electrical decoration 63 made of LEDs that can emit light in, for example, seven colors, and a speaker 65.
  • Each part of the notification unit 60 operates in accordance with commands from the communication unit 50.
  • the operation unit 70 includes a touch pad 71, a confirmation button 73, a fingerprint sensor 75, and a rescue request lever 77.
  • the touch pad 71 outputs a signal corresponding to the position and pressure touched by the user (user, user's guardian, etc.).
  • The confirmation button 73 is configured so that the contact of a built-in switch closes when pressed by the user, allowing the communication unit 50 to detect that the confirmation button 73 has been pressed.
  • the fingerprint sensor 75 is a well-known fingerprint sensor, and is configured to be able to read a fingerprint using, for example, an optical sensor.
  • In place of the fingerprint sensor 75, any means for recognizing a physical feature of a person, such as a sensor that recognizes the pattern of palm veins (that is, any means capable of biometric authentication and of identifying an individual), can be adopted.
  • The voice response terminal process performed in the terminal device 1 receives voice input by the user, sends the voice to the server 90, and, upon receiving the response to be output from the server 90, reproduces it by voice. This process is started when the user requests voice input via the operation unit 70.
  • the input from the microphone 37 is accepted (ON state) (S2), and imaging (recording) by the camera 41 is started (S4). Then, it is determined whether or not there is a voice input (S6).
  • the timeout indicates that the allowable time for waiting for processing has been exceeded, and here the allowable time is set to about 5 seconds, for example.
  • If the voice input has not been completed (S12: NO), the process returns to S10. If the voice input has been completed (S12: YES), data such as an ID identifying the device itself, the voice, and the captured image are transmitted to the server 90 as packets (S14). Note that this data transmission may instead be performed between S10 and S12.
  • In S16, it is determined whether the data transmission is complete. If transmission has not been completed (S16: NO), the process returns to S14. If transmission has been completed (S16: YES), it is determined whether data (packets) sent by the voice response server process described later have been received (S18). If no data has been received (S18: NO), it is determined whether a timeout has occurred (S20).
  • In S24, it is determined whether reception is complete. If reception has not been completed (S24: NO), it is determined whether a timeout has occurred (S26). If a timeout has occurred (S26: YES), the fact that an error has occurred is output via the notification unit 60, and the voice response terminal process ends. If a timeout has not occurred (S26: NO), the process returns to S22.
  • Subsequently, a response based on the received packets is output by voice from the speaker 65 (S28). When a plurality of responses are received, the responses are reproduced in different voice colors.
  • the voice response server process is a process of receiving voice from the terminal device 1, performing voice recognition for converting the voice into character information, and generating a response to the voice and returning it to the terminal device 1.
  • a plurality of responses may be transmitted in association with different voice colors.
  • the communication partner terminal device 1 is specified (S44). In this process, the terminal device 1 is specified by the ID of the terminal device 1 included in the packet.
  • the voice included in the packet is recognized (S46).
  • In the speech recognition DB 102, a large number of speech waveforms are associated with corresponding characters.
  • In the predictive conversion DB 103, each word is associated with the words likely to follow it.
  • In S46, a known voice recognition process is performed by referring to the speech recognition DB 102 and the predictive conversion DB 103, converting the voice into character information. Subsequently, objects in the captured image are identified by image processing (S48). Then, the user's emotion is determined based on the voice waveform and word endings (S50).
  • In this process, by referring to the emotion determination DB 114, in which speech waveforms (voice colors), word endings, and the like are associated with emotion categories such as normal, anger, joy, confusion, sadness, and elation, it is determined which category the user's emotion falls into, and the determination result is recorded in memory. Subsequently, by referring to the learning DB 107, words often spoken by the user are looked up, and any ambiguous portions of the character information generated by speech recognition are corrected.
  • In the learning DB 107, user characteristics such as words the user often speaks and pronunciation habits are recorded for each user. Data in the learning DB 107 is added to and corrected through conversations with the user.
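  • As an illustration of the emotion lookup described above (the cue-to-emotion table is invented for the example; the text only says the DB associates waveforms and word endings with categories):

```python
# Hypothetical miniature of the emotion determination DB 114:
# acoustic/lexical cues mapped to emotion categories.
EMOTION_DB = {
    "raised_pitch_ending": "anger",
    "falling_slow_ending": "sadness",
    "fast_bright_tone": "joy",
}

def determine_emotion(cues: list[str]) -> str:
    # Return the first category whose cue appears; default to "normal",
    # mirroring the category list: normal, anger, joy, confusion,
    # sadness, elation.
    for cue in cues:
        if cue in EMOTION_DB:
            return EMOTION_DB[cue]
    return "normal"

print(determine_emotion(["fast_bright_tone"]))  # -> "joy"
```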
  • The corrected character information is then fixed as the input character information (S54), and the response candidate DB 105 is searched for input character information similar to it in order to obtain a response (S56).
  • In the response candidate DB 105, as shown in FIG. 5, each piece of input character information is uniquely associated with a first output, a first output voice color, a second output, and a second output voice color.
  • For example, the first output "Today's * weather is *." is associated with the voice color of woman 1.
  • The "*" portions are filled in by accessing the weather DB 110, in which region names are associated with the weather forecast for the coming days in each region.
  • The weather at the time when today's weather is due to change is also acquired from the weather DB 110, and the second output "However, * will be *." is associated with the voice color of man 1.
  • Thus, when "Today's Tokyo weather" is input, the voice of woman 1 outputs "Today's Tokyo weather is sunny.", and the voice of man 1 outputs "However, it will rain tomorrow."
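  • A hedged sketch of this template filling (the DB entries, field names, and query key are invented stand-ins for the contents of FIG. 5 and the weather DB 110):

```python
# Toy versions of the response candidate DB 105 and weather DB 110.
RESPONSE_DB = {
    "today's weather": [
        ("Today's {place} weather is {now}.", "woman1"),
        ("However, {later_day} will be {later}.", "man1"),
    ],
}
WEATHER_DB = {"Tokyo": {"now": "sunny", "later_day": "tomorrow",
                        "later": "rainy"}}

def build_responses(query: str, place: str) -> list[tuple[str, str]]:
    # Fill each output template's '*' slots from the weather DB and
    # keep the voice color assigned to that template.
    slots = {"place": place, **WEATHER_DB[place]}
    return [(tmpl.format(**slots), voice)
            for tmpl, voice in RESPONSE_DB[query]]

for text, voice in build_responses("today's weather", "Tokyo"):
    print(f"[{voice}] {text}")
```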
  • the response contents are associated with the voice color (S60).
  • the voice DB 104 stores an artificial voice database for each voice color, and in this process, the voice color set for each response is associated with the voice color in the database.
  • the response content is converted into voice (S62).
  • a process for outputting response contents (character information) as a voice is performed.
  • the generated response (voice) is packet-transmitted to the communication partner terminal device 1 (S64).
  • the packet may be transmitted while generating the voice of the response content.
  • the conversation content is recorded (S68).
  • the input character information and the output response contents are recorded in the learning DB 107 as conversation contents.
  • In addition, keywords (words recorded in the speech recognition DB 102) contained in the conversation content, pronunciation characteristics, and the like are recorded in the learning DB 107.
  • The voice response system 100 described in detail above responds by voice to input character information; the terminal device 1 (MPU 31) acquires a plurality of different responses to the character information and outputs them in different voice colors.
  • According to such a voice response system 100, a plurality of responses can be output in different voice colors, so even when a single answer cannot be determined for a given piece of character information, different answers can be output in an easy-to-understand manner using different voice colors. Usability for the user can therefore be improved.
  • In the voice response system 100, the user inputs voice to the terminal device 1 via the microphone 37, and the server 90 (calculation unit 101) converts the input voice into character information, generates a plurality of different responses, and transmits them to the terminal device 1. The terminal device 1 then acquires the responses from the server 90.
  • Since the terminal device 1 can input voice, character information can be entered by voice. Moreover, since the responses are generated on the server 90 side, the configuration of the terminal device 1 itself can be kept simple.
  • Further, the server 90 converts the voice of the user's utterances into character information and accumulates the user's utterance characteristics (such as pronunciation habits) as learning information (capturing and recording these features).
  • the character information can be generated based on the learning information, so that the generation accuracy of the character information can be improved.
  • Furthermore, the server 90 reads the emotion from the voice color of the voice input by the user, and outputs which of the emotion categories, including at least one of normal, anger, joy, confusion, sadness, and elation, the emotion falls into.
  • voice recognition is used as a configuration for inputting character information.
  • the present invention is not limited to voice recognition, and may be input using input means (operation unit 70) such as a keyboard or a touch panel.
  • the server 90 includes a response candidate DB 105 in which a plurality of different responses including a positive response and a negative response for each character information are recorded for each of the plurality of character information.
  • The terminal device 1 may acquire a positive response and a negative response as the plurality of different responses, and reproduce them in different voice colors.
  • In this case, responses taking different positions, such as a positive response and a negative response, are reproduced in different voice colors, as if different people were speaking, making it less likely that the user listening to the voice feels uncomfortable.
  • In the voice response system 100, the voice color may be changed according to the type of response or the language used in the response. For example, a response with a gentle tone may be reproduced in a calm female voice, and a response with a severe tone in a bold male voice. That is, response content may be associated with a personality, and the voice color set according to that personality.
  • A response (for example, a positive response or a negative response) output by one terminal device may in turn be input as character information, and a response to that response may be generated.
  • This configuration can be realized using one terminal device 1 or a plurality of terminal devices 1.
  • When a plurality of devices are used, voice may be exchanged directly as sound, or wireless communication or the like may be used.
  • data may be transmitted to other terminal devices 1 in the process of S66.
  • The calculation unit 101 may learn (record and analyze) the user's behavior (conversation, places visited, and what appears on the camera) and use this to compensate for missing words in the user's conversation.
  • the server 90 may acquire response candidates from a predetermined server or the Internet.
  • response candidates can be acquired not only from the server 90 but also from any device connected via the Internet, a dedicated line, or the like.
  • [Second Embodiment] Next, another form of the voice response system will be described.
  • In this embodiment (the second embodiment) and the embodiments that follow, only the parts that differ from the voice response system 100 of the first embodiment are described in detail; for parts identical to the voice response system 100 of the first embodiment, the same reference numerals are used and the description is omitted.
  • the voice response system outputs voice even when the user does not input character information.
  • In the second embodiment, the terminal device 1 performs the automatic conversation terminal process shown in FIG. 6.
  • the automatic conversation terminal process is a process that is started when the terminal device 1 is turned on, for example, and is repeatedly executed thereafter.
  • In the automatic conversation terminal process, first, it is determined whether the setting for automatic conversation is ON (S82). Whether to perform automatic conversation can be set by the user via the operation unit 70 or by voice input.
  • The server 90 executes the automatic conversation server process shown in FIG. 7.
  • the automatic conversation server process is a process that is started when the server 90 is turned on, for example, and then repeatedly executed.
  • In the automatic conversation server process, first, it is determined whether notification that the automatic conversation mode is set has been received from a terminal device 1 (S92). If no such notification has been received (S92: NO), the process proceeds to S98.
  • If it has been received (S92: YES), the communication partner terminal device 1 is identified from the ID in the received packet (S94), and automatic conversation is set for it (S96). Subsequently, for each terminal device 1 set for automatic conversation, it is determined whether its reproduction condition is satisfied (S98).
  • The reproduction condition is, for example, that a certain time has elapsed since the previous conversation (voice input), that it is a certain time of day, that the weather is of a specific kind, or that some sensor shows an abnormal value.
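  • A compact sketch of such a reproduction condition check (the thresholds and field names are assumptions for illustration):

```python
import time

def reproduction_condition_met(state: dict) -> bool:
    """state: snapshot of one terminal's situation (illustrative fields).

    Mirrors the example conditions: silence for a while, a set time of
    day, specific weather, or an abnormal sensor value.
    """
    silent_long = time.time() - state["last_conversation"] > 30 * 60
    scheduled = state["hour"] == 7            # e.g. greet at 7 o'clock
    weather_hit = state["weather"] in {"rain", "snow"}
    sensor_abnormal = state["heart_rate"] > 120
    return silent_long or scheduled or weather_hit or sensor_abnormal

print(reproduction_condition_met({
    "last_conversation": time.time() - 3600,
    "hour": 13, "weather": "sunny", "heart_rate": 70,
}))
```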
  • The message corresponding to the reproduction condition may be a fixed sentence such as "Good morning." or "Hello.", or it may relate to the latest news obtained from the news DB 109, in which the latest news is automatically updated. For example, if information about a certain company's stock price can be acquired, the message can be something like: "Today, the stock price of XX company rose by XX yen. Did you know?"
  • the processes of S42 to S54 described above are performed. Then, when the processing of S54 is completed, it is determined whether or not a predetermined answer has been obtained from the terminal device 1 that is the communication partner (S112).
  • The predetermined answer may be, for example, any voice at all, or a specific answer. For a question ending in "Did you know?", answers such as "I knew" or "I didn't know" qualify; for a question such as "Do you know the weather?", answers containing weather words such as "rainy" or "sunny" qualify.
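  • A sketch of that answer matching (the keyword sets are invented for the example):

```python
# Hypothetical acceptable-answer table, keyed by question type.
EXPECTED = {
    "did_you_know": {"knew", "know", "didn't"},
    "weather": {"rainy", "sunny", "cloudy", "snow"},
}

def is_predetermined_answer(question_type: str, reply: str) -> bool:
    # A reply qualifies if it contains any keyword expected for the
    # question; an empty reply never qualifies.
    words = reply.lower().split()
    return any(w in EXPECTED.get(question_type, set()) for w in words)

print(is_predetermined_answer("weather", "It looks rainy today"))  # True
```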
  • the server 90 determines whether or not the situation of the voice response system 100 matches a playback condition set in advance as a condition for outputting voice. When the reproduction condition is met, a preset message is output.
  • According to such a voice response system 100, voice can be output even when no character information is input (that is, when the user has not spoken). For example, by prompting the user to speak, this can serve as a countermeasure against drowsiness while driving a car. It can also be used for safety confirmation, by checking whether a person living alone responds.
  • In the voice response system 100, the server 90 acquires news information and outputs a message about the news in question format to invite the user's answer. According to such a voice response system 100, conversations about the news become possible, which keeps the conversation from always being the same.
  • Further, in the voice response system 100, the server 90 outputs preset messages with separately acquired external information (news, or environmental information such as temperature, weather, and position) added to them.
  • In this way, a response combining a predetermined message and the acquired information can be output. Further, in the voice response system 100, when no reply to a response or message is obtained, the server 90 transmits to a preset contact address information identifying the user and the fact that no reply was obtained.
  • According to such a voice response system 100, a contact person can be notified when no answer is obtained, so an abnormality affecting, for example, an elderly person living alone can be reported early.
  • the server 90 may acquire a plurality of messages, and select and output a message to be reproduced according to the reproduction frequency of the message.
  • Such a voice response system 100 can make frequently reproduced messages less likely to be selected, adding randomness to message reproduction, or can repeatedly reproduce a frequently reproduced message to call attention to it or promote its retention.
  • Next, a configuration in which the terminal device 1 conveys, on the user's behalf, things that are difficult to say directly. For example, if before a date the user tells the device what they would like it to say, then at an appropriate time (for example, at a preset time, or after the conversation has lapsed for a certain period) the voice response system 100 speaks in the user's place (reproduces the voice).
  • To realize this, the terminal device 1 performs the message terminal process shown in FIG. 8, and the server 90 performs the message server process shown in FIG. 9.
  • the message terminal process is a process that is started when the terminal device 1 is turned on, for example, and then repeatedly executed.
  • In S136, it is determined whether a packet from the server 90 has been received. If no packet has been received (S136: NO), the process of S136 is repeated. If a packet has been received (S136: YES), the processes of S24 to S30 are performed, and the message terminal process ends.
  • the message server process is a process that starts when the server 90 is powered on, for example, and is repeatedly executed thereafter. Specifically, first, it is determined whether or not a packet is received from any one of the terminal devices 1 (S142). If no packet has been received (S142: NO), the process proceeds to S156 described later.
  • If a packet has been received (S142: YES), the communication partner terminal device 1 is identified (S44), and it is determined whether the packet contains a mode flag such as the message mode flag (S144). If there is no mode flag (S144: NO), the process proceeds to S148.
  • If there is a mode flag (S144: YES), the server 90 also sets the corresponding mode by turning ON the flag for the communication partner terminal device 1 (S146). For example, the message mode flag corresponds to the message mode, in which the processes of S46 to S152 described later are performed; the guidance mode flag described later corresponds to the guidance mode, in which S46 to S176 (see FIG. 11) are performed.
  • the message reproduction condition can be set in advance by the user via the operation unit 70 of the terminal device 1, and corresponds to, for example, time and position.
  • the message reproduction condition is transmitted to the server 90 at the time of packet transmission for message terminal processing.
  • the message and voice are associated with each other and recorded in the memory (S152), and the process proceeds to S156. If the message flag is OFF (S148: NO), processing relating to another mode is performed (S154), and it is determined whether or not the playback timing has come (S156).
  • the reproduction timing indicates contents set in the message reproduction condition.
  • the voice input by the user is not played back immediately, but can be played back when a message playback condition is satisfied after a certain time.
  • the content spoken by the user is reproduced.
  • The device may also be configured to speak a word that prompts the difficult-to-say content, for example: "Sorry, was there something you wanted to say to her?"
  • the terminal device 1 performs the guidance terminal process shown in FIG. 10, and the server 90 performs the guidance server process shown in FIG.
  • The guidance terminal process is a process that is started, for example, when the terminal device 1 is powered on, and is repeatedly executed thereafter.
  • In the guidance terminal process, as shown in FIG. 10, it is first determined whether the guidance mode has been set by the user (S162). If the guidance mode is not set (S162: NO), the process of S162 is repeated.
  • In S166, it is determined whether a packet from the server 90 has been received. If no packet has been received (S166: NO), the process of S166 is repeated. If a packet has been received (S166: YES), the processes of S24 to S30 are performed, and the guidance terminal process ends.
  • the guidance server process is a process that is started, for example, when the server 90 is turned on and then repeatedly executed.
  • the processes of S142 to S146 described above are executed.
  • the guidance reproduction condition can be set in advance by the user via the operation unit 70 of the terminal device 1, and corresponds to, for example, time and position.
  • The guidance reproduction condition is transmitted to the server 90 at the time of packet transmission in the guidance terminal process.
  • the guidance content is generated, and the guidance content and voice (voice color) are associated with each other and recorded in the memory (S176).
  • As for the guidance content, for example, words expressing a desire such as "I want to" or "hope" are searched for in the input character information, the keywords preceding those words are extracted, and the words registered as guidance for those keywords are output as the guidance content.
  • the keyword and the word indicating the guidance content are associated with each other in advance and recorded in the response candidate DB 105.
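  • A minimal sketch of the keyword-to-guidance lookup (the table is an invented stand-in for the entries in the response candidate DB 105; note the text extracts the keyword immediately preceding the desire word, reflecting Japanese word order, while this sketch simply scans for any registered keyword):

```python
# Invented stand-in for keyword -> guidance words in the response
# candidate DB 105.
GUIDANCE_DB = {"ramen": "There is a well-reviewed ramen shop nearby.",
               "sleep": "How about dimming the lights and resting?"}
DESIRE_MARKERS = ("want to", "hope")

def guidance_for(text: str) -> str | None:
    lowered = text.lower()
    if not any(m in lowered for m in DESIRE_MARKERS):
        return None  # no desire expression found
    # Output the guidance registered for the first keyword present.
    for word in lowered.split():
        if word in GUIDANCE_DB:
            return GUIDANCE_DB[word]
    return None

print(guidance_for("I want to eat ramen tonight"))
```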
  • In another configuration, the terminal device 1 is installed at a company reception desk or the like. It can also be used for telephone reception on a company's main line or for telephone banking.
  • This is achieved by replacing the process of S56 in the first embodiment with the reception process shown in FIG. 12.
  • In the reception process, if no company name or personal name is included in the character information, a response asking for them is generated (S194), and the reception process ends; for example, a response such as "Please tell us your name and business." is generated.
  • If a company name or personal name is included in the character information (S192: NO), that company name or personal name is looked up in the sales DB 118 and the client DB 119 (S196).
  • In the sales DB 118, the names of companies and contact persons who have made sales visits in the past, and the names of habitual complainers who call only to complain, are recorded.
  • In the client DB 119, a company name, the contact person at that company, the person in charge on the user (in-house) side of the terminal device 1, schedule information such as scheduled visit times, and contact addresses are recorded in association with each person in charge.
  • Next, it is determined from the schedule in the client DB 119 whether the person at the reception desk is scheduled to visit at around this time (for example, within one hour before or after the current time) (S202). If so (S202: YES), the contact address of the person in charge of this visitor is extracted from the client DB 119, and a connection is made to the person in charge so that they and the visitor can talk (S204). In this process, it suffices to connect to the person in charge's extension telephone, mobile phone, or the like.
  • In this case, a reception response for the client is generated (S206); for example, a response such as "Thank you, XXX. Please wait a moment while you are connected to the person in charge." is generated.
  • the acceptance process ends.
  • If the visitor is not scheduled at a close time (S202: NO), a connection is made to a preset reception contact so that that person in charge and the visitor can talk (S208). Then, a normal reception response is generated (S210).
  • As the normal reception response, for example, a response such as "Please wait a moment while you are connected to reception." is generated.
  • the acceptance process ends.
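  • Putting the branches of the reception process together, a hedged sketch of the routing logic (DB contents, the one-hour window, and the phrases are illustrative):

```python
from datetime import datetime, timedelta

SALES_DB = {"Acme Sales"}                       # known sales callers
CLIENT_DB = {"Ms. Sato": {"visit": datetime(2013, 6, 18, 14, 0),
                          "extension": "ext-204"}}

def connect(destination: str) -> None:
    print(f"(dialing {destination})")           # stand-in for telephony

def reception(name: str, now: datetime) -> str:
    if name in SALES_DB:
        return "I'm sorry, we must decline sales visits."  # refusal phrase
    record = CLIENT_DB.get(name)
    if record and abs(record["visit"] - now) <= timedelta(hours=1):
        connect(record["extension"])            # scheduled visitor
        return f"Thank you, {name}. Connecting you to the person in charge."
    connect("reception-desk")                   # everyone else
    return "Please wait a moment while you are connected to reception."

print(reception("Ms. Sato", datetime(2013, 6, 18, 13, 30)))
```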
  • As described above, the voice response system 100 can be configured for use at a workplace or company reception desk. In this configuration, the names and company names of salespeople are recorded in advance in the sales DB 118 of the server 90, and when a visitor gives one of these names, a response reproducing a phrase of refusal is generated.
  • Also, the server 90 identifies the communication partner from the input character information, and connects the partner to the communication destination set in advance for that partner. Such a voice response system 100 can assist reception work and telephone support. Moreover, it makes it possible to turn away people who would interfere with the user's business without the user having to deal with them.
  • the server 90 extracts a keyword included in the input character information (particularly voice) and connects to a connection destination corresponding to the keyword.
  • a keyword such as the name of the other party is associated with the connection destination in advance.
  • In this way, the connection destination is set according to the other party.
  • The connection destination may also be changed according to the requirements (keywords) included in the character information.
  • The server 90 may also recognize the business the other party states based on keywords, and convey an outline of what was said to the user. Such a voice response system 100 can assist services that mediate between the user and customers.
  • the terminal device 1 may provide information requested by the other terminal device 1.
  • In this configuration, the server 90 requests the necessary information from another terminal device 1 in the process of S56, and generates a response after obtaining it. The terminal device 1 that provides the requested information performs the information providing terminal process shown in FIG. 13.
  • the information providing terminal process is a process that is started when there is a request from the server 90, for example.
In this process, first, the information providing destination is extracted (S222). The information providing destination indicates the other terminal device 1 that is requesting the information, and an ID for specifying that terminal device 1 is included in the request from the server 90.
If the partner is one to which the provision of information is permitted (S224: YES), the requested information is acquired from the device's own memory 39 or from its various sensors (S226), and this data is transmitted to the server 90 (S228). If the partner is not one to which the provision of information is permitted (S224: NO), the server 90 is notified that the provision of information is refused (S230).
For example, in response to the question "What is Mr. XX doing?", the server 90 requests position information from Mr. XX's terminal device 1, and that terminal device 1 returns its position information. The server 90 then recognizes Mr. XX's behavior based on the position information. For example, if Mr. XX is moving along a railway track at a speed faster than a human can move, it is judged that he is travelling by train, and a response such as "Mr. XX is on the train" is generated, as sketched below.
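This sketch assumes an illustrative speed threshold; the description fixes neither the threshold nor how matching against a railway track is performed:

```python
# Humans rarely sustain much more than ~12 km/h on foot, so sustained higher
# speed along a railway track is taken to indicate travel by train (assumption).
WALK_RUN_LIMIT_KMH = 12.0

def estimate_activity(dist_km: float, minutes: float, on_rail_track: bool) -> str:
    """Judge behavior from position change per unit time (derived from S226/S228 data)."""
    speed_kmh = dist_km / (minutes / 60.0)
    if on_rail_track and speed_kmh > WALK_RUN_LIMIT_KMH:
        return "Mr. XX is on the train."
    return "Mr. XX seems to be moving on foot."

print(estimate_activity(dist_km=5.0, minutes=10.0, on_rail_track=True))
```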
In this way, in the voice response system 100, the server 90 acquires information recorded in another terminal device 1, different from the requesting terminal device 1, and uses it for the requesting terminal device 1. That is, the server 90 acquires information for generating a response to the character information from the other terminal device 1. According to such a configuration, a response can be generated based on information recorded in another terminal device 1.
Also, when another terminal device 1 requests information for generating a response to character information, the terminal device 1 returns information corresponding to the request. In this case, the terminal device 1 includes sensors for detecting position information, temperature, humidity, illuminance, and noise level, and databases such as dictionary information, and extracts the necessary information as requested. According to such a configuration, information unique to the other terminal device 1, such as its position, can be acquired, and information unique to the device itself can be transmitted to another terminal device 1.
In the voice response system according to this embodiment, a personality DB 106 is prepared, in which personality information associating users, or persons related to the users, with personality classifications according to preset categories is recorded. For example, as shown in FIG. 14, the personality DB 106 records the names of users and related persons in association with the personality classifications of these persons.
To construct the personality DB 106, a personality test is performed on the users and related persons, and the test results are also recorded. For the test, a known personality analysis technique (Rorschach test, Szondi test, etc.) may be used, or aptitude-test techniques of the kind companies use in employment screening may be used.
The personality information generation process is started, for example, when an instruction to generate personality information is input to the terminal device 1 using the operation unit 70 or the like. In this process, first, the microphone 37 is turned on (S242), and one of a set of predetermined four-choice questions is output by voice (S244). The four-choice questions may be acquired from the server 90, or questions recorded in advance in the memory 39 may be asked.
Subsequently, it is determined whether or not there is a voice answer from the target person (the user or a related person) (S246). If there is no answer (S246: NO), the process of S246 is repeated. If there is an answer (S246: YES), conversation parameters such as word-ending strength and conversation speed are extracted (S248), and it is determined whether or not the current question is the final question (S250). If it is not the final question (S250: NO), the next question is selected (S252), and the process returns to S242.
If it is the final question (S250: YES), a personality analysis is performed based on the answers to the four-choice questions (S254), and a personality analysis is also performed using the conversation parameters (S256). As for the conversation parameters, tendencies such as the following can be captured: persons who are self-confident tend to have strong word endings, persons who lack confidence tend to have weak word endings, impatient persons tend to speak quickly, and calm persons tend to speak slowly.
These personality analysis results are then combined comprehensively, for example by a weighted average (S258), and the result is assigned to a personality classification (S260); specifically, the personality of the subject obtained through the test is scored, and each score range is assigned to a personality classification, as sketched below. Subsequently, the target person and the personality classification are associated with each other (S262) and recorded in the personality DB 106 (S264). That is, the relationship between the target person and the personality classification is transmitted to the server 90. At this time, the test results are also transmitted to the server 90, and the server 90 constructs the personality DB 106 as shown in FIG. 14. When such processing ends, the personality information generation process ends.
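This sketch covers the combination steps S254 to S260, assuming normalized scores, illustrative weights, and illustrative classification bands (none of which are fixed by the description):

```python
# Score bands mapped to personality classifications (illustrative assumption).
PERSONALITY_BANDS = [(0.0, "calm"), (0.5, "balanced"), (0.8, "assertive")]

def classify_personality(quiz_score: float, ending_strength: float,
                         speech_rate: float) -> str:
    """Combine the quiz analysis (S254) and the conversation-parameter analysis
    (S256) by weighted average (S258), then map the score to a class (S260)."""
    combined = 0.6 * quiz_score + 0.2 * ending_strength + 0.2 * speech_rate
    label = PERSONALITY_BANDS[0][1]
    for threshold, name in PERSONALITY_BANDS:
        if combined >= threshold:
            label = name
    return label

print(classify_personality(quiz_score=0.7, ending_strength=0.9, speech_rate=0.6))
```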
In addition, a response candidate DB 105 is prepared in which personality classifications are associated with mutually different responses. The server 90 acquires response candidates representing a plurality of different responses to the character information in the process of S56, selects the response to be output from the response candidates according to the personality information, and outputs the selected response in the processes of S60 and S64.
In this way, in the voice response system 100, the terminal device 1 generates personality information of a user or related person based on the answers to a plurality of preset questions, and the generated personality information is acquired. According to such a configuration, the personality information can be generated in the server 90 or in the terminal device 1. Further, in the voice response system 100, the calculation unit 101 generates personality information of the user or related person based on the character strings included in the input character information. According to such a configuration, the personality information can be generated in the course of the user's use of the voice response system 100. As a result, different responses can be given according to the personality of the user or of a person related to the user (a related person). Usability for the user can therefore be improved.
Note that the responses may be narrowed down to one according to the personality before being output, or voices of different voice colors may be associated with the plurality of responses and output. Also, the processing of S248 and S254 to S264 may be performed by the server 90. In this case, the voice and the questions may be exchanged between the terminal device 1 and the server 90 in a manner that allows the server 90 to identify the terminal device 1.
Furthermore, the server 90 may detect the user's behavior and operations and generate learning information or personality information based on them. According to such a voice response system 100, for example, when it is detected that the user has been jumping onto the train at the last moment for several consecutive days, the user can be prompted to leave the house several minutes earlier from the next day; and when it is detected from conversation that the user tends to become angry easily, voice or music for calming the mood can be output.
In the voice response system according to this embodiment, a preference DB 108 is prepared, in which preference information associating the preferences of users and related persons according to preset categories is recorded. The preference DB 108 records the names of users and related persons in association with the preferences of these persons for each type of preference, such as food preference (food), color preference (color), and hobby.
Food preferences are classified into, for example, a sweet taste (sweet), a spicy taste (spicy), and a middle level; color preferences are classified into warm colors (warm), cool colors (cold), and a middle order; and hobbies are classified into indoor hobbies (inside), outdoor hobbies (outside), and both indoor and outdoor hobbies (inside and outside).
To construct the preference DB 108, the preference information generation process shown in FIG. 17 is executed. The preference information generation process is performed, for example, between S48 and S54.
In this process, keywords relating to preference are extracted from the character information (S282), and among the objects identified by image processing, those relating to preference are extracted (S284). Preference-related keywords are associated with the classifications within each type in the preference DB 108, and in these processes, keywords and objects that appear in the preference DB 108 are extracted as preferences.
Subsequently, the counter is incremented for each group of preference-related keywords (S288). For example, for a keyword such as kimchi, whose preference type is "food preference" and whose classification is "spicy", the counter corresponding to "food preference"/"spicy" is incremented. Then, the preference information (preference DB 108) is updated based on the counter values (S290). That is, for each preference type, the classification with the largest counter value is recorded in the preference DB 108 as the best-matching preference feature of the user or related person.
When such processing ends, the preference information generation process ends.
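The sketch below illustrates S282, S288, and S290 with a toy keyword table; the keywords and classifications are examples, and extraction from captured images (S284) is omitted:

```python
from collections import defaultdict

# Hypothetical keyword table: keyword -> (preference type, classification)
PREFERENCE_KEYWORDS = {
    "kimchi": ("food", "spicy"),
    "cake": ("food", "sweet"),
    "hiking": ("hobby", "outside"),
}

counters = defaultdict(int)

def update_preferences(words: list) -> dict:
    # S282: extract preference-related keywords; S288: increment counters
    for w in words:
        if w in PREFERENCE_KEYWORDS:
            counters[PREFERENCE_KEYWORDS[w]] += 1
    # S290: for each type, keep the classification with the largest count
    best = {}
    for (ptype, cls), count in counters.items():
        if ptype not in best or count > best[ptype][1]:
            best[ptype] = (cls, count)
    return {ptype: cls for ptype, (cls, _) in best.items()}

print(update_preferences(["kimchi", "kimchi", "cake", "hiking"]))
# -> {'food': 'spicy', 'hobby': 'outside'}
```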
In addition, the response candidate DB 105 holds responses in which a different response is associated with each preference. The server 90 acquires response candidates representing a plurality of different responses to the character information in the process of S56, selects the response to be output from the response candidates according to the preference information, and outputs the selected response in the processes of S60 and S64.
In this way, in the voice response system 100, the server 90 generates preference information indicating the preference tendencies of a user or related person based on the character strings included in the character information, selects the response to be output from the response candidates based on the preference information, and outputs the selected response. According to such a configuration, responses can be given according to the preferences of the user or the related person. For example, when buying a present for a related person, the user can ask the terminal device 1 "What would Mr. XX like?" and obtain a response according to the preference information.
Note that the response candidate DB 105 may have a table in which personality classifications and preference information are associated with each other. For example, personality classifications and color preferences are associated with each other, and products that a woman could be estimated to be pleased to receive as a present are arranged in a matrix. According to such a configuration, a response can be generated in consideration of both personality and preference.
In the voice response system according to this embodiment, the terminal device 1 captures the user's actions as captured images and transmits them to the server 90, and the server 90 performs, for example, the action character input process shown in FIG. The action character input process is started when a part of the user's body appears in the captured image in the process of S48. In this process, first, a captured image is acquired (S302). Then, it is determined whether the user intends to input characters by handwriting or to input characters by sign language (S304, S308).
The characters input by the action are compared with the characters input by speech, and it is determined whether there is a similar voice input (whether the degree of matching between a reference waveform based on the characters and the pronunciation waveform is equal to or higher than a reference value) (S316). If there is such a voice input (S316: YES), the accent and pronunciation characteristics of the user when inputting these characters are recorded in the learning DB 107 in association with the characters (S318), and the action character input process ends. Note that the actions according to the present embodiment are not limited to handwriting of characters or gestures (for example, sign language), and may be any actions caused by muscle movement.
The contents of the learning DB 107 may also be used in another terminal device 1 when the user uses a terminal device 1 different from the one normally used. In this case, the ID and password of the normally used terminal device 1 are transmitted from the other terminal device 1 to the server 90 together with a usage request. The other-terminal use process is started when a usage request is received.
In this process, if the ID and password are input (S332: YES), it is determined whether or not authentication using the ID and password is complete (S334). If the authentication is complete (S334: YES), the fact that the authentication is complete is transmitted to the other terminal device 1 (S336), and a setting is made so that the other terminal device 1 uses the learning DB 107 of the terminal device 1 corresponding to the ID and password (S338).
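A minimal sketch of S332 to S338, assuming a toy credential store; a real system would not keep plain-text passwords, and all identifiers here are hypothetical:

```python
# Hypothetical credential store: (terminal ID, password) -> learning DB handle
CREDENTIALS = {("terminal-1", "secret"): "learning-db-of-terminal-1"}

def other_terminal_use(terminal_id: str, password: str) -> str:
    """Handle a usage request from another terminal device."""
    if (terminal_id, password) not in CREDENTIALS:   # S334: authentication check
        return "authentication failed"
    # S336: report completion; S338: bind the requester to the learning DB 107
    # of the terminal device matching the ID and password
    return f"authenticated; using {CREDENTIALS[(terminal_id, password)]}"

print(other_terminal_use("terminal-1", "secret"))
```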
In this way, in the voice response system 100, the server 90 transfers the learning information of one terminal device 1 to another terminal device 1. According to such a voice response system 100, even when a user who uses a certain terminal device 1 uses another terminal device 1, the learning information recorded for the first terminal device 1 (the learning information recorded in the server 90) can be used. Therefore, the generation accuracy of character information can be improved even when other terminal devices 1 are used. This is particularly effective when the user has a plurality of terminal devices 1.
In addition, the server 90 may output information about the user in response to an inquiry from a person other than the user. According to such a voice response system 100, if, for example, the contents of the user's meals and the distances of the user's walks are detected, questions at a hospital or the like can be answered on the user's behalf. The system may also be made to learn the user's health condition and self-introduction.
In the voice response system according to this embodiment, the server 90 stores conversation contents and asks questions designed to elicit the same contents that the user previously heard. Specifically, the storage confirmation process shown in FIG. 21 is executed in S100 of the automatic conversation server process shown in FIG. In this process, past conversation contents are extracted from the learning DB 107 (S352), and a question whose answer is a keyword included in one of those conversation contents is generated (S353). When such processing ends, the storage confirmation process ends. According to such a configuration, the user's memory ability can be confirmed and memories can be reinforced. It is also considered effective in slowing the progression of dementia in the elderly.
The voice response system according to the eleventh embodiment is configured so that a user can practice a foreign language using the terminal device 1 and the server 90. In this voice response system, the pronunciation determination process 1 shown in FIG. 22, the pronunciation determination process 2 shown in FIG. 23, and the pronunciation determination process 3 shown in FIG. 24 are executed in order. The server 90 executes one of the pronunciation determination processes 1 to 3 each time the voice response server process (FIG. 2) is performed, and each of the pronunciation determination processes 1 to 3 is executed as the process of S56 described above.
In the pronunciation determination process 1, a response instructing the user to input a predetermined sentence by voice is generated (S362). In this process, a sentence serving as a model in the foreign language is generated, and the user is prompted to imitate the model. When such processing ends, the pronunciation determination process 1 ends.
When the user inputs speech, the pronunciation determination process 2 is performed. In the pronunciation determination process 2, as shown in FIG. 23, the accuracy of the pronunciation and accent is scored (S372). In this process, the speech is treated as a waveform, and the degree of coincidence between that waveform and the waveform of the model sentence is scored. This score is then recorded in the memory (S374), and the pronunciation determination process 2 ends. Subsequently, the pronunciation determination process 3 is performed.
In the pronunciation determination process 3, as shown in FIG. 24, it is first determined whether or not the score is less than a threshold value (S382). If the score is less than the threshold, a response instructing the user to input the same sentence again is generated (S384); that is, a response prompting the user to imitate the model once more is generated. Otherwise, a response indicating that the pronunciation is good and prompting the user to input the next sentence is generated (S386); for example, a response such as "Good pronunciation. Let's move on." is generated. When such processing ends, the pronunciation determination process 3 ends.
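The following sketch approximates S372 and S382 to S386; normalized cross-correlation of waveforms and the 0.8 threshold are assumptions, since the description fixes neither the scoring method nor the threshold value:

```python
import numpy as np

THRESHOLD = 0.8  # assumed passing score

def waveform_score(model: np.ndarray, spoken: np.ndarray) -> float:
    """S372: score pronunciation as the degree of coincidence of two waveforms,
    approximated here by normalized cross-correlation."""
    n = min(len(model), len(spoken))
    a, b = model[:n], spoken[:n]
    denom = float(np.linalg.norm(a) * np.linalg.norm(b))
    return float(np.dot(a, b)) / denom if denom else 0.0

def next_prompt(score: float) -> str:
    # S382-S386: repeat the same sentence below the threshold, otherwise advance
    if score < THRESHOLD:
        return "Listen to the model once more and repeat the same sentence."
    return "Good pronunciation. Let's move on."

t = np.linspace(0.0, 1.0, 16000)
model = np.sin(2 * np.pi * 220 * t)        # toy model waveform
print(next_prompt(waveform_score(model, 0.9 * model)))
```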
In this way, in the voice response system 100, the server 90 detects the accuracy of the pronunciation and accent of the voice input by the user, and outputs the detected accuracy. According to such a configuration, the accuracy of pronunciation and accent can be confirmed, which is effective, for example, when practicing a foreign language. In addition, the server 90 causes the same question to be output again when the accuracy is equal to or less than a predetermined value. Alternatively, when the accuracy is equal to or less than the predetermined value, the server 90 may output, for confirmation, a voice containing the word closest to the pronunciation made by the user. According to such a configuration, the user can confirm the accuracy of his or her pronunciation and accent.
Next, a voice response system according to the twelfth embodiment will be described. In this voice response system, the user's emotion is detected from the voice input by the user, and a response that soothes the user is generated according to the emotion. Specifically, the emotion determination process shown in FIG. 25 and the emotion response generation process shown in FIG. 26 are executed.
The emotion determination process is performed as the details of the process of S50 described above. As shown in FIG. 25, first, the input voice is scored; the emotion is then classified according to the score and recorded in the memory (S394). The emotion response generation process is then executed in the process of S56 described above. Specifically, as shown in FIG. 26, first, the emotion classification set in the emotion determination process is determined (S412). If the emotion classification is normal (S412: normal), an ordinary greeting such as "Hello" is generated as the response (message) (S414). In this way, the server 90 detects the user's irritation or agitation, for example from unexpectedly produced voice, and generates a message for calming that irritation or agitation.
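A minimal sketch of this classification and branching; the voice features, weights, and the calming reply are illustrative assumptions (the description names only the normal branch explicitly):

```python
def classify_emotion(volume: float, pitch_jump: float) -> str:
    """Score the input voice and classify the emotion (recorded in S394)."""
    score = 0.6 * volume + 0.4 * pitch_jump   # assumed scoring of the voice
    return "agitated" if score > 0.7 else "normal"

def emotion_response(category: str) -> str:
    if category == "normal":                  # S412: normal -> greeting (S414)
        return "Hello."
    return "Let's take a slow, deep breath together."  # calming message

print(emotion_response(classify_emotion(volume=0.9, pitch_jump=0.8)))
```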
In the voice response system according to this embodiment, when a voice message such as "Please guide me to the visible tower" is input to the terminal device 1, a guidance process is performed in the process of S56. In the guidance process, first, the position information of the terminal is acquired from the GPS receiver 27 or the like of the terminal device 1 (S432). Subsequently, the target object is identified from among the objects in the captured image, and its position is specified (S434). In this process, the position of the object is specified in map information (which may be acquired from outside or held by the server 90) based on the shape, relative position, and the like of the object. For example, when a tower appears in the captured image, that tower is identified on the map from the position of the terminal device 1 and the shape of the tower.
Then, a response for guiding the user along the route is generated (S440). In this process, a response similar to guidance by a navigation device may be generated. When such processing ends, the guidance process ends. Note that the automatic conversation server process may be used to reproduce a message on the condition that the user has reached the point being guided to.
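The sketch below relates the terminal position (S432) to the object position found in the map information (S434) and produces a spoken response (S440); the haversine formula is standard, while the coordinates and response wording are illustrative assumptions:

```python
import math

def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between the terminal and the identified object."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def guidance_response(terminal: tuple, tower: tuple) -> str:
    # S440: generate a spoken route response (wording is illustrative)
    dist = haversine_km(*terminal, *tower)
    return f"The tower is about {dist:.1f} km ahead. Continue along this road."

print(guidance_response((35.6586, 139.7454), (35.7101, 139.8107)))
```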
In this way, when character information is input, the server 90 generates a response corresponding to a captured image of the surroundings of the voice response system 100, and outputs the response by voice. According to such a configuration, a response corresponding to the captured image can be output by voice, so usability can be improved compared with a configuration that generates responses without regard to the surroundings.
In addition, the server 90 searches the captured image by image processing for an object mentioned in the character information, specifies the position of the found object, and guides the user to that position. According to such a configuration, the user can be guided to an object in the captured image. Furthermore, when performing guidance to a destination, the server 90 acquires route information, such as the weather, temperature, humidity, traffic information, and road surface conditions on the way to the destination, and outputs the route information by voice. According to such a configuration, the situation on the way to the destination (the route information) can be conveyed to the user by voice.
Character information may also be input asking the device to respond with what it recognizes, and what (or who) is recognized from the captured image may be output by voice. Furthermore, instead of the process of S48, the server 90 may acquire a moving image capturing the shape of the user's mouth while the user inputs character information by voice. Then, instead of the process of S52, the voice may be converted into character information, and the character information may be corrected by estimating unclear parts of the voice based on the moving image. According to such a configuration, the utterance content can be estimated from the shape of the mouth, so unclear parts of the voice can be estimated well.
In the voice response system according to the fourteenth embodiment, the user is requested to perform a predetermined action, and it is determined whether the user has performed the action as requested. Specifically, the movement request process 1 shown in FIG. 28 and the movement request process 2 shown in FIG. 29 are carried out in order. When the movement request process 1 is started, a response (message) instructing the user to move the line of sight or the head to a predetermined position is output, as shown in FIG. 28 (S452).
Subsequently, the movement request process 2 is started. In the movement request process 2, as shown in FIG. 29, it is determined whether or not the line of sight or the head has moved to the position as instructed (S462). In this process, the user's movement is detected by performing image processing on images captured by the camera, or by using the detection results of the various sensors of the terminal device 1. For detecting the line of sight, a known gaze recognition technique may be employed. When such processing ends, the movement request process 2 ends.
In this way, in the voice response system 100, the user's line of sight is detected, and if the user's line of sight does not move to the predetermined position in response to the call, a voice requesting the user to move the line of sight to the predetermined position is output. According to such a configuration, the user can be made to look at a specific position, so safety confirmation when driving a vehicle can be performed reliably.
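The following sketch illustrates the request-and-check idea of S452 and S462; the normalized gaze coordinates, tolerance, and messages are assumptions, and in practice the observed gaze would come from a gaze-recognition module:

```python
TOLERANCE = 0.1  # assumed tolerance, in normalized screen units

def request_and_check(target: tuple, observed: tuple) -> str:
    """S462: did the line of sight move to the instructed position?"""
    dx, dy = observed[0] - target[0], observed[1] - target[1]
    if (dx * dx + dy * dy) ** 0.5 <= TOLERANCE:
        return "Thank you. Safety confirmed."
    return "Please look at the right-hand mirror."  # repeat the request (S452)

print(request_and_check(target=(0.9, 0.5), observed=(0.4, 0.5)))
```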
In addition, the server 90 may observe the positions of the user's body parts and the user's facial expression, and output a voice requesting the user to change the position of a body part or the facial expression when there is little change in response to a call. According to such a configuration, the user can be guided to move a body part to a specific position or to make a specific facial expression. This can be used, for example, when driving a vehicle or undergoing a physical examination.
In the voice response system according to this embodiment, the broadcast music complementing process shown in FIG. 30 is performed as the details of S56 described above. In the broadcast music complementing process, it is first determined whether or not the broadcast program or the music (the song, if the user is singing) has been interrupted (S482). If there is an interruption (S482: YES), the broadcast program or music synchronized in the process of S492 described later is set as the response content (S484), and the broadcast music complementing process ends. If there is no interruption (S482: NO), the broadcast program is acquired if a broadcast program is being viewed (S486), and if music is being played, the corresponding music is acquired (S488). In the karaoke DB 116, music and lyrics are recorded in association with each other, and when music is acquired in this process, music with lyrics is acquired. Subsequently, the broadcast program or music being viewed by the user is identified (S490). Then, this broadcast program or music is acquired and prepared so that it can be played back in synchronization with the broadcast program or music being viewed by the user (S492), and the broadcast music complementing process ends.
In this way, the server 90 acquires the same broadcast program as the one being viewed by the user, and when the user's broadcast program is interrupted, complements it by outputting the acquired broadcast program. In addition, the server 90 compares the lyrics of the song with the user's singing, and outputs the lyrics by voice only in the parts where the user's singing is absent. According to such a voice response system 100, the parts that a user of a so-called karaoke apparatus cannot sing (the parts where the lyrics are interrupted) can be filled in, as sketched below.
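This sketch assumes the song is already divided into timed segments and that per-segment detection of the user's voice is available (both assumptions; the description does not specify the timing mechanism):

```python
# Hypothetical segmented lyrics: (segment index, lyric text)
LYRICS = [(0, "twinkle twinkle"), (1, "little star"), (2, "how I wonder")]

def fill_gaps(user_sang: list) -> list:
    """user_sang[i] is True if the user's voice was detected in segment i.
    Only the lyrics of silent segments are output by voice."""
    return [text for (i, text) in LYRICS if not user_sang[i]]

# The user missed the second segment, so only that lyric is voiced.
print(fill_gaps([True, False, True]))  # -> ['little star']
```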
In the voice response system according to the sixteenth embodiment, when a character appears in the captured image and the terminal device 1 receives a question from the user about how to read the character, the reading of the character is acquired from outside and output by voice. Specifically, the character commentary process shown in FIG. 31 is performed as the details of S56 described above. In the character commentary process, as shown in FIG. 31, it is first determined whether or not a reading question such as "How do you read this?" has been received (S502). If a reading question has been received (S502: YES), the reading of the image-recognized character is searched for on another server or the like connected via the Internet 85 (S504), the obtained reading is set as the response (S506), and the character commentary process ends.
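A minimal sketch of S502 to S506, with a local reading table standing in for the search over other servers connected via the Internet 85; the table entries are illustrative:

```python
from typing import Optional

READINGS = {"東京": "Tokyo", "薔薇": "bara (rose)"}  # hypothetical lookup table

def character_commentary(question: str, recognized: str) -> Optional[str]:
    if "how do you read" not in question.lower():  # S502: reading question?
        return None
    reading = READINGS.get(recognized)             # S504: look up the reading
    if reading is None:
        return "I could not find the reading."
    return f"It is read '{reading}'."              # S506: set as the response

print(character_commentary("How do you read this?", "薔薇"))
```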
In the voice response system according to this embodiment, the server 90 detects abnormal behavior or an abnormal state of the user of the terminal device 1, and performs a notification process when there is an abnormality. Specifically, the terminal device 1 performs the behavior response terminal process shown in FIG. 32, and the server 90 performs the behavior response server process. In the behavior response terminal process, as shown in FIG. 32, first, the outputs of the various sensors mounted on the terminal device 1 are acquired (S522), and an image captured by the camera 41 is acquired (S524). Then, the obtained sensor outputs and captured image are transmitted in packets to the server 90 (S526), and the behavior response terminal process ends. In the behavior response server process, the operations of S42 to S44 described above are performed. Subsequently, the user's behavior is identified based on the position information of the terminal device 1 (the detection result of the GPS receiver 27) (S532), and the user's environment is detected based on the detection results of the temperature sensors 15 and 19 and the like (S534). Then, an abnormality is detected (S536). In this process, an abnormality is detected based on changes in the position information and on the environment. For example, when the user does not move in a place where the temperature is very high or very low, or when the user is in a place where the user does not normally go, it is detected that there is an abnormality. Alternatively, the position information and the environment are scored, and if this score falls below a reference value (outside the reference range), it is determined that there is an abnormality, as sketched below.
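This sketch covers the scoring variant of S536; the weights, limits, and reference value are all assumptions made for illustration:

```python
REFERENCE = 0.5  # assumed reference value

def is_abnormal(moved_m_per_min: float, temp_c: float, in_usual_area: bool) -> bool:
    """Score position change and environment; below the reference is abnormal."""
    move_score = min(moved_m_per_min / 10.0, 1.0)       # no movement scores low
    temp_score = 1.0 if 0.0 <= temp_c <= 35.0 else 0.0  # extreme heat/cold scores 0
    area_score = 1.0 if in_usual_area else 0.3          # unusual place lowers the score
    score = 0.4 * move_score + 0.3 * temp_score + 0.3 * area_score
    return score < REFERENCE

# Motionless, in high heat, in an unusual place -> abnormal (True)
print(is_abnormal(moved_m_per_min=0.0, temp_c=39.0, in_usual_area=False))
```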
In this way, the server 90 detects the user's behavior and the user's surrounding environment, and generates a message according to the detected behavior and surrounding environment. According to such a voice response system 100, the user can be warned of a dangerous place or an area where entry is prohibited, and it can be detected that the user is behaving abnormally. In addition, the server 90 determines the user's health condition based on a captured image of the user, and generates a message according to the health condition. According to such a voice response system 100, the health condition of the user can be managed. Furthermore, the server 90 notifies a predetermined contact when the health condition falls below a reference value. According to such a voice response system 100, a report can be made when the user's health condition is at or below the reference value, so an abnormality can be communicated to another person earlier.
Embodiments of the present invention are not limited to those described above, and can take various forms as long as they belong to the technical scope of the present invention.
For example, the voice response system 100 may mediate exchanges between two parties or among multiple parties. Specifically, when it is necessary to give way at an intersection or the like, the terminal devices 1 may negotiate which vehicle enters the intersection first. In this case, each terminal device 1 transmits its direction of movement when approaching the intersection and its approach speed to the server 90, and the server 90 may set a priority order for each terminal device 1 according to the direction of movement and the approach speed, and generate and output a voice such as "Wait" or "You may enter" according to the priority order.
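A minimal sketch of this negotiation; the priority rule used here (faster approach first, with an arbitrary tie-break on direction) is purely an illustrative assumption, since the description does not state how the priority order is derived:

```python
def assign_priority(vehicles: list) -> list:
    """Order vehicles by approach speed and direction, then tell each to wait or enter."""
    ranked = sorted(vehicles,
                    key=lambda v: (-v["speed_kmh"], v["direction"] != "left"))
    messages = []
    for rank, v in enumerate(ranked):
        word = "You may enter." if rank == 0 else "Wait."
        messages.append(f"{v['id']}: {word}")
    return messages

print(assign_priority([
    {"id": "car-A", "speed_kmh": 30, "direction": "straight"},
    {"id": "car-B", "speed_kmh": 20, "direction": "left"},
]))
```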
In addition, when the terminal device 1 accepts an incoming call for communication that requires a real-time response, such as voice communication, the incoming call may be accepted only when it is convenient for the user. Specifically, when the user's face can be imaged by the camera 41, it may be assumed that it is convenient for the user, and the incoming call may be accepted.
The situation of the called user may also be communicated to a caller who is waiting for a response. For example, if the user's schedule is managed in the terminal device 1 and the user does not respond to an incoming call, the schedule can be searched to determine what the user is doing, and the caller can be told the user's schedule or when the user will be able to respond. Furthermore, the location of the user may be communicated to the caller. For example, if the user is connected to the Internet or the like via a smartphone or personal computer, it can be determined which terminal is being operated, and it is conceivable to identify the user's location from this information and convey it to the caller. Whether or not the user can respond to an incoming call may also be determined using position information from GPS or the like. Based on the position information, it can be determined whether the user is in a car, at home, and so on; for example, if the user is moving or in bed, it can be judged that the user is in a highly public place or asleep and therefore cannot answer the call. When the incoming call cannot be answered in this way, it is conceivable to inform the caller of what the user is doing, as described above.
A configuration using security cameras is also conceivable. In recent years, security cameras have been installed in various locations, so the user's position can be recognized using a configuration that identifies the person, such as face authentication, with these security cameras. Furthermore, a situation determination using the security cameras, such as determining what the user is doing (whether or not the telephone can be answered), may be performed. Whether or not an incoming call can be answered can also be determined based on conditions such as whether another fixed-line telephone is in use (an incoming call cannot be answered while the fixed-line telephone is in use).
In addition, when the user of the terminal device 1 wants to have a conversation with someone, the results of the user's personality learning may be used to call out to the terminal device of whichever user, among an unspecified number of users, is estimated to be most compatible. In this case, a topic likely to liven up the conversation (a topic both users are interested in, extracted using the learning results) may be offered to the users. Furthermore, when the voice response device has not been used for a long time (when the user has not spoken for longer than a reference time), the voice response device may say a few words to the user. At this time, the words to be spoken may be selected using position information such as GPS.
Note that the terminal device 1 and the server 90 in the above embodiments correspond to an example of the voice response device of the present invention, and the processes of S22 and S56 correspond to an example of the response acquisition means of the present invention.
The process of S14 corresponds to an example of the voice transmission means of the present invention. The response candidate DB 105 corresponds to an example of the response recording means of the present invention.
The process of S56 corresponds to an example of the character information acquisition means of the present invention. The processes of S28, S60, and S64 correspond to an example of the voice output means of the present invention. The processes of S254, S258, and S260 correspond to an example of the first personality information generation means and the second personality information generation means of the present invention.
The processes of S48 and S56 correspond to an example of the response generation means of the present invention. The process of S48 also corresponds to an example of the voice input moving image acquisition means of the present invention, and the process of S52 corresponds to an example of the character information conversion means of the present invention.
The preference information generation process corresponds to an example of the preference information generation means of the present invention. The process of S56 corresponds to an example of the response candidate acquisition means of the present invention. The action character input process corresponds to an example of the character information generation means of the present invention. The other-terminal use process corresponds to an example of the other-device information acquisition means and the transfer means of the present invention.
The process of S98 corresponds to an example of the reproduction condition determination means of the present invention, and the process of S100 corresponds to an example of the message reproduction means of the present invention. The process of S116 corresponds to an example of the non-response transmission means of the present invention.
The process of S372 corresponds to an example of the speech accuracy detection means of the present invention, and the process of S374 corresponds to an example of the accuracy output means of the present invention. The process of S204 corresponds to an example of the connection control means of the present invention. The process of S50 corresponds to an example of the emotion determination means of the present invention. The process of S438 corresponds to an example of the route information acquisition means of the present invention.
The process of S462 corresponds to an example of the line-of-sight detection means of the present invention, and the process of S464 corresponds to an example of the line-of-sight movement request transmission means and the change request transmission means of the present invention.
The process of S486 corresponds to an example of the broadcast program acquisition means of the present invention, and the process of S484 corresponds to an example of the broadcast program complementing means and the lyrics adding means of the present invention. The processes of S504 and S506 correspond to an example of the reading output means of the present invention.
The processes of S522 and S524 correspond to an example of the behavior environment detection means of the present invention. The process of S538 corresponds to an example of the health condition determination means of the present invention, and the process of S540 corresponds to an example of the health message generation means of the present invention.

Abstract

This voice response device makes voice responses to input text information, and is provided with: a response acquisition means for acquiring a plurality of responses to the text information; and a voice output means for outputting the plurality of responses, doing so in respectively different voices. According to this voice response device, a plurality of responses can be output in different voices, whereby even in instances in which the answer to one item of text information cannot be identified as being a single one, different answers can be output in different voices in a manner easily comprehensible by a user.

Description

Voice response device
Cross-reference of related applications
This international application claims priority based on Japanese Patent Application No. 2012-137065, Japanese Patent Application No. 2012-137066, and Japanese Patent Application No. 2012-137067 filed with the Japan Patent Office on June 18, 2012, the entire contents of which are incorporated into the present international application by reference.
The present invention relates to a voice response device that makes a response to input character information by voice.
As the above voice response device, a device is known that searches a dictionary for an answer to an input question and outputs the found answer by voice (see, for example, Patent Document 1). A technique for generating an answer to a question based on the content of dialogue with a user is also known (see, for example, Patent Document 2).
Patent Document 1: Japanese Patent No. 4832097; Patent Document 2: Japanese Patent No. 4924950
In the above technique, one answer specified by the dictionary is simply given for one question.
One aspect of the present invention is to improve the usability for the user of a voice response device that makes a response to input character information by voice.
In the invention of the first aspect, a voice response device that makes a response to input character information by voice comprises: response acquisition means for acquiring a plurality of different responses to the character information; and voice output means for outputting the plurality of different responses in respectively different voice colors.
According to such a voice response device, a plurality of responses can be output in different voice colors, so even when a single answer cannot be determined for one item of character information, different answers can be output in different voice colors in a way that is easy for the user to follow. Usability for the user can therefore be improved.
Note that the voice response device of the present invention may be configured, for example, as a terminal device carried by the user, or as a server that communicates with such a terminal device. The character information may be input using input means such as a keyboard, or may be input by converting voice into character information.
In the voice response device, as in the invention of the second aspect, there may be provided: voice input means for the user to input voice; and voice transmission means for transmitting the input voice to an external device that converts the input voice into character information, generates a plurality of different responses to the character information, and transmits the responses to the voice response device, wherein the response acquisition means acquires the responses from the external device.
According to such a voice response device, voice can be input to the voice response device, so the character information can be input by voice. Moreover, since the responses can be generated in the external device, the processing load on the voice response device can be reduced.
Note that the operation of converting the input voice into character information may be performed by the voice response device or by the external device.
Furthermore, in the voice response device, as in the invention of the third aspect, the voice response device or the external device may include response recording means in which, for each of a plurality of items of character information, a plurality of different responses including a positive response and a negative response are recorded; the response acquisition means may acquire the positive response and the negative response as the plurality of different responses; and the voice output means may reproduce the positive response and the negative response in different voice colors.
According to such a voice response device, responses taking different positions, such as a positive response and a negative response, can be reproduced in different voice colors, so the voice can be reproduced as if different persons were speaking. The user listening to the voice is therefore less likely to feel a sense of strangeness.
Note that the voice color may be changed according to the type of response or the wording of the response. For example, a response in a gentle tone may be reproduced in a calm female voice, and a response in a forceful tone may be reproduced in a bold male voice. That is, response contents may be associated with personalities, and the voice color may be set according to the personality.
The voice response device may also, as in the invention of the fourth aspect, be configured for use at a workplace or company reception, or configured to say on the user's behalf things the user finds difficult to say to someone directly.
When the voice response device is used at a reception, the names and company names of persons who come for sales are recorded in advance in the voice response device or the external device, and when a visitor gives one of these names or company names, a response is generated so as to reproduce a phrase declining the visit.
When the device is configured to convey things that are difficult to say, for example, the user can tell the device before a date what he or she wants to say that day, and the voice response device will say it (reproduce the voice) on the user's behalf at a suitable timing (for example, at a preset time, or when a certain time has passed since the conversation broke off).
Alternatively, the device may speak words that create an opening for the difficult topic, for example, "Come to think of it, wasn't there something you wanted to tell her?" That is, instead of outputting a response immediately, the response may be output when a reproduction condition is satisfied, such as after a certain time has elapsed.
Furthermore, in the voice response device, as in the invention of the fifth aspect, the external device or the voice response device may acquire information for generating a response to the character information from another voice response device. Also, in the voice response device, as in the invention of the sixth aspect, when information for generating a response to character information is requested by another voice response device, information corresponding to the request may be returned.
In this case, the voice response device may include sensors for detecting position information, temperature, humidity, illuminance, noise level, and the like, and databases such as dictionary information, and may extract the necessary information on request.
According to such a voice response device (external device), information for generating a response can be acquired from another voice response device. In this case, information unique to the other voice response device, such as its position, can be acquired.
In addition, information unique to the device itself can be transmitted to another voice response device.
Furthermore, in the voice response device, as in the invention of the seventh aspect, a response (for example, a positive response or a negative response) output by the device itself or by another voice response device may be input as character information, and a response rebutting that response may be generated. From the user's standpoint, a discussion can then be heard from both the supporting and the opposing positions, and after hearing this discussion the user can make a final judgment.
This configuration can be realized using one or a plurality of voice response devices. In that case, the plurality of voice response devices may exchange voice by direct input and output of sound, or by wireless or other communication.
In the invention of the eighth aspect, a voice response device that makes a response to input character information by voice comprises: personality information acquisition means for acquiring personality information in which the personality of the user, or of a related person who has a relationship with the user, is associated with preset classifications; response acquisition means for acquiring response candidates representing a plurality of different responses to the character information; and voice output means for selecting the response to be output from the response candidates according to the personality information and outputting the selected response.
According to such a voice response device, different responses can be given according to the personality of the user or of a person related to the user (a related person). Usability for the user can therefore be improved.
In the voice response device, as in the invention of the ninth aspect, there may be provided first personality information generation means for generating personality information of the user or the related person based on answers to a plurality of preset questions, and the personality information acquisition means may acquire the personality information generated by the personality information generation means.
According to such a voice response device, the personality information can be generated within the voice response device. When generating the personality information, a known personality analysis technique (Rorschach test, Szondi test, etc.) may be used, or aptitude-test techniques of the kind companies use in employment screening may be used.
Furthermore, in the voice response device, as in the invention of the tenth aspect, there may be provided second personality information generation means for generating personality information of the user or the related person based on character strings included in the input character information, and the personality information acquisition means may acquire the personality information generated by the personality information generation means.
According to such a voice response device, the personality information can be generated in the course of the user's use of the voice response device.
In the voice response device, as in the invention of the eleventh aspect, there may be provided preference information generation means for generating, based on character strings included in the character information, preference information indicating the preference tendencies of the user or the related person, and the voice output means may select the response to be output from the response candidates based on the preference information and output the selected response.
According to such a voice response device, responses can be given according to the preferences of the user or the related person.
Furthermore, in the voice response device, as in the invention of the twelfth aspect, the user's behavior (conversations, places visited, things captured by the camera) may be learned (recorded and analyzed), and the device may supplement what is left unsaid in the user's conversations.
For example, in a conversation where the user answers "I'd rather have curry" to the question "Is hamburger all right for today?", if the device adds "That's because we had hamburger yesterday", the reason the user asked for curry is conveyed.
Such a configuration can also be used during a telephone call, and the device may be configured to join the user's conversations on its own.
Furthermore, in the voice response device, as in the invention of the thirteenth aspect, there may be provided response candidate acquisition means for acquiring response candidates from a predetermined server or from the Internet.
According to such a voice response device, response candidates can be acquired not only from the device itself or the external device but also from any device connected via the Internet, a dedicated line, or the like.
In the voice response device, as in the invention of the fourteenth aspect, there may be provided character information generation means for converting the user's actions into character information.
Here, the actions referred to in the present invention include those caused by muscle movements, such as conversation, handwriting of characters, and gestures (for example, sign language).
According to such a voice response device, the user's actions can be converted into character information.
Furthermore, in the voice response device, as in the invention of the fifteenth aspect, the character information generation means may convert the voice of the user's utterances into character information and accumulate the user's speaking habits (such as habits of pronunciation) as learning information (that is, capture these characteristics and record them).
According to such a voice response device, the character information can be generated based on the learning information, so the generation accuracy of the character information can be improved.
In the voice response device, as in the invention of the sixteenth aspect, there may be provided transfer means for transferring the learning information to another voice response device.
 このような音声応答装置によれば、使用者が他の音声応答装置を利用する場合においても、本音声応答装置で記録された学習情報を利用することができる。よって、他の音声応答装置を利用する場合においても文字情報の生成精度を向上させることができる。 According to such a voice response device, even when the user uses another voice response device, the learning information recorded by the voice response device can be used. Therefore, even when other voice response devices are used, the generation accuracy of character information can be improved.
 さらに、上記音声応答装置においては、第17局面の発明のように、使用者の行動および操作のうちの何れかを検出し、これらに基づいて学習情報または性格情報を生成するようにしてもよい。 Further, in the voice response device, as in the invention of the seventeenth aspect, any one of the user's behavior and operation may be detected, and learning information or personality information may be generated based on these. .
According to such a voice response device, for example, when it is detected that the user has had to jump onto the train several days in a row, the device can urge the user to leave home several minutes earlier from the next day onward; when it is detected from conversation that the user tends to get angry easily, the device can output voice or music that calms the mood.
The above voice response device may also comprise, as in the invention of the eighteenth aspect, other-device information acquisition means for acquiring, from another voice response device, information recorded in that other voice response device.
According to such a voice response device, a response can be generated on the basis of information recorded in another voice response device.
Furthermore, the above voice response device may comprise, as in the invention of the nineteenth aspect: reproduction condition determination means for determining, when the character information is not input, whether the state of the voice response device matches a reproduction condition set in advance as a condition for outputting voice; and message reproduction means for outputting a preset message when the reproduction condition is matched.
According to such a voice response device, voice can be output even when no character information is input (that is, even when the user does not speak to the device). For example, by prompting the user to speak, it can be used as a countermeasure against drowsiness while driving a car. In addition, safety confirmation can be performed by determining whether a person living alone responds.
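A minimal sketch of such a reproduction condition check follows; the condition fields (a time window and a maximum silent interval) and the preset message are illustrative assumptions, not the reproduction condition DB itself.

    import datetime

    # Hypothetical reproduction condition: speak up if the user has been silent
    # for too long within a given time window.
    condition = {"start": datetime.time(8, 0), "end": datetime.time(20, 0),
                 "max_silence_s": 3600}
    preset_message = "Are you doing all right? Please say something."

    def should_speak(now, last_input_time):
        in_window = condition["start"] <= now.time() <= condition["end"]
        silent_too_long = (now - last_input_time).total_seconds() > condition["max_silence_s"]
        return in_window and silent_too_long

    now = datetime.datetime(2012, 6, 18, 10, 0)
    last_input = datetime.datetime(2012, 6, 18, 8, 30)
    if should_speak(now, last_input):
        print(preset_message)  # would be synthesized and played through the speaker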
In the above voice response device, as in the invention of the twentieth aspect, the message reproduction means may acquire news information and output a message about the news in the form of a question that asks for the user's answer.
According to such a voice response device, it is possible to have conversations about the news, which prevents the conversation from always being the same. As the content of the conversation, for example, when information about a certain company's stock price has been acquired, the device could say, "Today the stock price of Company XX rose by XX yen. Did you know?"
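The following sketch turns a news item into such a question-format message; the news record fields are assumptions for illustration.

    # Hypothetical news record acquired from a news database or feed.
    news_item = {"company": "Company XX", "stock_delta_yen": 25}

    def news_question(item):
        # Build a question-format message so the user is prompted to answer.
        direction = "rose" if item["stock_delta_yen"] >= 0 else "fell"
        return ("Today the stock price of {} {} by {} yen. Did you know?"
                .format(item["company"], direction, abs(item["stock_delta_yen"])))

    print(news_question(news_item))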
Furthermore, in the above voice response device, as in the invention of the twenty-first aspect, the voice output means or the message reproduction means may output a preset message with separately acquired external information (news, or environmental information such as temperature, weather, and position information) appended to it.
According to such a voice response device, a response combining a predetermined message and the acquired information can be output.
In the above voice response device, as in the invention of the twenty-second aspect, a plurality of messages may be acquired, and the message to be reproduced may be selected and output according to how frequently each message has been reproduced.
According to such a voice response device, making frequently reproduced messages harder to select introduces randomness into message reproduction, while deliberately repeating a frequently reproduced message can call attention to it or help fix it in the user's memory.
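A minimal sketch of frequency-aware selection, assuming inverse-frequency weighting (the first of the two policies mentioned above); the weighting formula is an illustrative assumption.

    import random

    # Hypothetical message pool with per-message reproduction counts.
    messages = {"Good morning!": 10, "Don't forget your umbrella.": 2,
                "Time to take a break.": 5}

    def pick_message(pool):
        # Weight each message inversely to its reproduction count, so messages
        # reproduced often become harder to select.
        candidates = list(pool)
        weights = [1.0 / (1 + pool[m]) for m in candidates]
        choice = random.choices(candidates, weights=weights, k=1)[0]
        pool[choice] += 1  # update the reproduction count
        return choice

    print(pick_message(messages))

The opposite policy, deliberately repeating frequent messages, would simply invert the weights.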
Furthermore, the above voice response device may comprise, as in the invention of the twenty-third aspect, unanswered-case transmission means for transmitting, to a preset contact, information identifying the user and a notice that no answer was obtained, when no answer to a response or message is obtained.
According to such a voice response device, a contact can be notified when no answer is obtained. Thus, for example, an abnormality concerning an elderly person living alone can be reported at an early stage.
In the above voice response device, as in the invention of the twenty-fourth aspect, the message reproduction means may store the content of conversations and ask questions designed to elicit the same content as what was heard (memory confirmation processing).
According to such a voice response device, the user's memory can be checked and the retention of memories can be promoted.
Furthermore, the above voice response device may comprise, as in the invention of the twenty-fifth aspect: utterance accuracy detection means for detecting the degree of accuracy of the pronunciation and accent of the voice input by the user; and accuracy output means for outputting the detected degree of accuracy.
According to such a voice response device, the accuracy of pronunciation and accent can be checked. This is effective, for example, when practicing a foreign language.
In the above voice response device, as in the invention of the twenty-sixth aspect, the accuracy output means may output a voice containing the closest word when the degree of accuracy is at or below a predetermined value.
According to such a voice response device, the user can check the accuracy of his or her pronunciation and accent.
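A minimal sketch of the accuracy check and closest-word fallback follows, assuming a hypothetical similarity score based on string matching; an actual device would score acoustic features rather than spellings.

    import difflib

    vocabulary = ["weather", "whether", "leather"]

    def accuracy(recognized, target):
        # Crude stand-in for a pronunciation/accent score: string similarity in [0, 1].
        return difflib.SequenceMatcher(None, recognized, target).ratio()

    def check_pronunciation(recognized, target, threshold=0.8):
        score = accuracy(recognized, target)
        if score <= threshold:
            # Output a voice containing the closest word so the user can compare.
            closest = difflib.get_close_matches(recognized, vocabulary, n=1)
            return "Did you mean '{}'? (accuracy {:.2f})".format(
                closest[0] if closest else target, score)
        return "Good pronunciation (accuracy {:.2f})".format(score)

    print(check_pronunciation("wezzer", "weather"))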
Furthermore, in the voice response device, as in the invention of the twenty-seventh aspect,
The message reproduction means may output the same question again when the accuracy is below a certain value.
According to such a voice response device, an accurate answer can be sought by outputting the same question again.
The above voice response device may also comprise, as in the invention of the twenty-eighth aspect, connection control means for identifying a communication partner from the input character information and connecting that communication partner with a communication destination set in advance for each communication partner.
According to such a voice response device, reception work and telephone handling can be assisted.
In particular, in the above voice response device, as in the invention of the twenty-ninth aspect, the connection control means may distinguish between sales activities and ordinary visitors, and reproduce a message declining the visit if it is a sales activity.
According to such a voice response device, persons who might interfere with the user's work can be turned away without the user having to deal with them personally.
Furthermore, in the above voice response device, as in the invention of the thirtieth aspect, a keyword contained in the input character information (particularly voice) may be extracted, and a connection may be made to the connection destination to which that keyword corresponds. For example, keywords such as the names of the parties concerned may be associated with their connection destinations in advance.
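A minimal sketch of such keyword-based connection control follows; the keyword table and the sales-decline message are illustrative assumptions.

    # Hypothetical table associating keywords with connection destinations in advance.
    keyword_to_destination = {"accounting": "ext-201", "Mr. Sato": "ext-105"}
    SALES_KEYWORDS = {"special offer", "free trial", "campaign"}

    def route_call(transcript):
        text = transcript.lower()
        # Decline identified sales activity without involving the user.
        if any(k in text for k in SALES_KEYWORDS):
            return ("decline", "We are not accepting sales calls. Goodbye.")
        # Otherwise connect to the destination whose keyword appears in the speech.
        for keyword, destination in keyword_to_destination.items():
            if keyword.lower() in text:
                return ("connect", destination)
        return ("operator", None)  # fall back to a human when no keyword matches

    print(route_call("Hello, this is a free trial campaign for..."))
    print(route_call("May I speak with Mr. Sato?"))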
According to such a voice response device, work such as transferring telephone calls and paging at a reception desk can be assisted.
In the above voice response device, as in the invention of the thirty-first aspect, the device may recognize the matter the other party is speaking about on the basis of keywords and convey a summary of what the other party said to the user.
According to such a voice response device, intermediary work with customers can be assisted.
Furthermore, the above voice response device may comprise, as in the invention of the thirty-second aspect, emotion determination means for reading the emotion from the voice color of the voice input by the user and outputting which emotion it corresponds to, among emotions including at least one of normal, anger, joy, confusion, sadness, and elation.
According to such a voice response device, a response can be output according to the user's emotion.
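As a rough illustration, the sketch below classifies emotion from simple prosodic features; the features and thresholds are invented for illustration and merely stand in for the emotion determination database used in the embodiments.

    def classify_emotion(pitch_hz, energy, speech_rate):
        # Toy threshold rules standing in for a trained emotion-determination model.
        if energy > 0.8 and speech_rate > 5.0:
            return "anger"
        if pitch_hz > 220 and energy > 0.6:
            return "joy"
        if energy < 0.3 and pitch_hz < 160:
            return "sadness"
        if speech_rate < 2.0:
            return "confusion"
        return "normal"

    print(classify_emotion(pitch_hz=240, energy=0.9, speech_rate=5.5))  # -> anger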
Next, the invention of the thirty-third aspect is characterized by comprising: response generation means for generating, when the character information is input, a response according to a captured image of the surroundings of the voice response device; and voice output means for outputting the response by voice.
According to such a voice response device, a response according to the captured image can be output by voice. Usability can therefore be improved compared with a configuration that generates a response from character information alone.
As a specific configuration of the present invention, for example, character information may be input asking the device to respond with what it recognizes, and the device may output by voice what (or whom) it has recognized from the captured image.
Incidentally, the above voice response device may comprise, as in the invention of the thirty-fourth aspect: position specifying means for searching the captured image by image processing for an object contained in the character information and specifying the position of the found object; and guidance means for guiding the user to the position of the object.
According to such a voice response device, the user can be guided to an object in the captured image.
Furthermore, the above voice response device may comprise, as in the invention of the thirty-fifth aspect: voice input video acquisition means for acquiring a moving image capturing the shape of the user's mouth when character information is input by voice; and character information conversion means for converting the voice into character information and correcting the character information by estimating unclear parts of the voice on the basis of the moving image.
According to such a voice response device, the utterance content can be estimated from the shape of the mouth, so unclear parts of the voice can be estimated well.
In the above voice response device, as in the invention of the thirty-sixth aspect, the message reproduction means may detect the user's irritation or agitation by detecting involuntary utterances, and generate a message for calming that irritation or agitation.
According to such a voice response device, when the user is irritated or agitated, these states can be suppressed. The occurrence of trouble between the user and those around him or her can therefore be reduced.
Furthermore, the above voice response device may comprise, as in the invention of the thirty-seventh aspect, route information acquisition means for acquiring, when providing guidance to a destination, route information such as the weather, temperature, humidity, traffic information, and road surface conditions along the way to the destination, and the message reproduction means may output the route information by voice.
According to such a voice response device, the conditions on the way to the destination (the route information) can be communicated to the user by voice.
In the above voice response device, as in the invention of the thirty-eighth aspect, the device may comprise: line-of-sight detection means for detecting the user's line of sight; and line-of-sight movement request transmission means for outputting a voice requesting that the line of sight be moved to a predetermined position when the user's line of sight does not move to the predetermined position in response to a call by the message reproduction means.
According to such a voice response device, the user can be made to look at a specific position. Safety checks during vehicle driving and the like can therefore be performed reliably.
The above voice response device may also comprise, as in the invention of the thirty-ninth aspect, change request transmission means for observing the positions of the user's body parts and facial expression and, when there is little change in response to the call, outputting a voice requesting that the positions of body parts or the facial expression be changed.
According to such a voice response device, a part of the user's body can be guided to a specific position, or the user can be guided to make a specific facial expression. The present invention can be used when driving a vehicle, during physical examinations, and the like.
Furthermore, the above voice response device may comprise, as in the invention of the fortieth aspect: broadcast program acquisition means for acquiring the same broadcast program as the broadcast program the user is viewing; and broadcast program complementing means for complementing the broadcast program, when it is interrupted, by outputting the broadcast program acquired by the device itself.
According to such a voice response device, the broadcast program the user is viewing can be supplemented so that it is not interrupted.
In the above voice response device, as in the invention of the forty-first aspect, the device may comprise lyric addition means which, when the user sings lyrics over a piece of music without lyrics, compares the version of the music with lyrics against the lyrics sung by the user and outputs the lyrics by voice only in the parts where the user's lyrics are missing.
According to such a voice response device, it is possible to fill in the parts the user cannot sing (the parts where the lyrics break off) in so-called karaoke.
Furthermore, the above voice response device may comprise, as in the invention of the forty-second aspect, reading output means which, when characters are contained in the captured image and the user asks how to read those characters, acquires information on the characters from outside and outputs by voice the reading of the characters contained in that information.
According to such a voice response device, the user can be taught how to read characters.
In the above voice response device, as in the invention of the forty-third aspect, the device may comprise behavior/environment detection means for detecting the user's behavior and the user's surrounding environment, and the message generation means may generate a message according to the detected behavior and surrounding environment.
According to such a voice response device, dangerous places, off-limits areas, and the like can be announced. It is also possible to detect, for example, that the user is behaving abnormally.
Furthermore, the above voice response device may comprise, as in the invention of the forty-fourth aspect: health condition determination means for determining the user's health condition on the basis of a captured image of the user; and health message generation means for generating a message according to the health condition.
According to such a voice response device, the user's health condition can be managed.
The above voice response device may also comprise, as in the invention of the forty-fifth aspect, notification means for notifying a predetermined contact when the health condition falls below a reference value.
According to such a voice response device, a notification can be made when the user's health condition is at or below the reference value. An abnormality can therefore be reported to others at an earlier stage.
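A minimal sketch of the threshold-based notification follows, assuming a hypothetical numeric health score and a contact list like the one held in the contact memory described later.

    # Hypothetical health score in [0, 1] derived from captured images of the user.
    REFERENCE_VALUE = 0.4
    contacts = ["family@example.com"]

    def check_health_and_notify(health_score, user_id):
        if health_score < REFERENCE_VALUE:
            for contact in contacts:
                # Stand-in for sending a report via the wireless telephone unit.
                print("Notify {}: user {} health score {:.2f} below reference."
                      .format(contact, user_id, health_score))

    check_health_and_notify(0.3, "user1")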
Furthermore, in the above voice response device, as in the invention of the forty-sixth aspect, information about the user may be output in response to an inquiry from a person other than the user.
According to such a voice response device, if, for example, the content of the user's meals and the distance of the user's walks are detected in advance, the device can answer questions at a hospital or the like on behalf of the user. The device may also learn things such as the user's health condition and self-introduction.
Note that the invention of each aspect need not presuppose the other inventions, and each can be made an independent invention as far as possible.
FIG. 1 is a block diagram showing a schematic configuration of a voice response system to which the present invention is applied.
FIG. 2 is a block diagram showing a schematic configuration of a terminal device.
FIG. 3 is a flowchart showing voice response terminal processing executed by the MPU of the terminal device.
FIG. 4 is a flowchart showing voice response server processing executed by the calculation unit of the server.
FIG. 5 is an explanatory diagram showing an example of the response candidate DB.
FIG. 6 is a flowchart showing automatic conversation terminal processing executed by the MPU of the terminal device.
FIG. 7 is a flowchart showing automatic conversation server processing executed by the calculation unit of the server.
FIG. 8 is a flowchart showing message terminal processing executed by the MPU of the terminal device.
FIG. 9 is a flowchart showing message server processing executed by the calculation unit of the server.
FIG. 10 is a flowchart showing guidance terminal processing executed by the MPU of the terminal device.
FIG. 11 is a flowchart showing guidance server processing executed by the calculation unit of the server.
FIG. 12 is a flowchart showing reception processing executed by the calculation unit of the server.
FIG. 13 is a flowchart showing information provision terminal processing executed by the MPU of the terminal device.
FIG. 14 is an explanatory diagram showing an example of the personality DB.
FIG. 15 is a flowchart showing personality information generation processing executed by the MPU of the terminal device.
FIG. 16 is an explanatory diagram showing an example of the preference DB.
FIG. 17 is a flowchart showing preference information generation processing executed by the calculation unit of the server.
FIG. 18 is an explanatory diagram showing examples of combinations of personality categories and preferences.
FIG. 19 is a flowchart showing action character input processing executed by the calculation unit of the server.
FIG. 20 is a flowchart showing other-terminal use processing executed by the calculation unit of the server.
FIG. 21 is a flowchart showing memory confirmation processing executed by the calculation unit of the server.
FIG. 22 is a flowchart showing pronunciation determination processing 1 executed by the calculation unit of the server.
FIG. 23 is a flowchart showing pronunciation determination processing 2 executed by the calculation unit of the server.
FIG. 24 is a flowchart showing pronunciation determination processing 3 executed by the calculation unit of the server.
FIG. 25 is a flowchart showing emotion determination processing executed by the calculation unit of the server.
FIG. 26 is a flowchart showing emotion response generation processing executed by the calculation unit of the server.
FIG. 27 is a flowchart showing guidance processing executed by the calculation unit of the server.
FIG. 28 is a flowchart showing movement request processing 1 executed by the calculation unit of the server.
FIG. 29 is a flowchart showing movement request processing 2 executed by the calculation unit of the server.
FIG. 30 is a flowchart showing broadcast music complementation processing executed by the calculation unit of the server.
FIG. 31 is a flowchart showing character explanation processing executed by the calculation unit of the server.
FIG. 32 is a flowchart showing action response terminal processing executed by the calculation unit of the server.
FIG. 33 is a flowchart showing action response server processing executed by the calculation unit of the server.
DESCRIPTION OF REFERENCE SIGNS: 1…terminal device, 10…behavior sensor unit, 11…three-dimensional acceleration sensor, 13…three-axis gyro sensor, 15…temperature sensor, 17…humidity sensor, 19…temperature sensor, 21…humidity sensor, 23…illuminance sensor, 25…wetness sensor, 27…GPS receiver, 29…wind speed sensor, 33…electrocardiographic sensor, 35…heart sound sensor, 37…microphone, 39…memory, 41…camera, 50…communication unit, 53…wireless telephone unit, 55…contact memory, 60…notification unit, 61…display, 63…illumination lights, 65…speaker, 70…operation unit, 71…touch pad, 73…confirmation button, 75…fingerprint sensor, 77…rescue request lever, 80…communication base station, 85…Internet network, 90…server, 100…voice response system, 101…calculation unit, 102…voice recognition DB, 103…predictive conversion DB, 104…voice DB, 105…response candidate DB, 106…personality DB, 107…learning DB, 108…preference DB, 109…news DB, 110…weather DB, 111…reproduction condition DB, 112…handwritten character/sign language DB, 113…terminal information DB, 114…emotion determination DB, 115…health determination DB, 116…karaoke DB, 117…report destination DB, 118…sales DB, 119…client DB.
Embodiments of the present invention will be described below with reference to the drawings.
[First Embodiment]
[Configuration of this embodiment]
The voice response system 100 to which the present invention is applied is configured so that, for voice input at a terminal device 1, an appropriate response is generated at a server 90 and the response is output by voice at the terminal device 1. Specifically, as shown in FIG. 1, the voice response system 100 is configured so that a plurality of terminal devices 1 and the server 90 can communicate with one another via communication base stations 80 and an Internet network 85.
The server 90 has the functions of an ordinary server device. In particular, the server 90 includes a calculation unit 101 and various databases (DBs). The calculation unit 101 is configured as a well-known arithmetic device including a CPU and memory such as ROM and RAM, and, on the basis of programs in the memory, carries out various kinds of processing such as communication with the terminal devices 1 and the like via the Internet network 85, reading and writing of data in the various DBs, and voice recognition and response generation for conversing with users of the terminal devices 1.
As shown in FIG. 1, the various DBs include a voice recognition DB 102, a predictive conversion DB 103, a voice DB 104, a response candidate DB 105, a personality DB 106, a learning DB 107, a preference DB 108, a news DB 109, a weather DB 110, a reproduction condition DB 111, a handwritten character/sign language DB 112, a terminal information DB 113, an emotion determination DB 114, a health determination DB 115, a karaoke DB 116, a report destination DB 117, a sales DB 118, a client DB 119, and so on. The details of these DBs will be described as each process is explained.
Next, as shown in FIG. 2, the terminal device 1 is configured with a behavior sensor unit 10, a communication unit 50, a notification unit 60, and an operation unit 70 provided in a predetermined housing.
The behavior sensor unit 10 includes a well-known MPU 31 (microprocessor unit), memory 39 such as ROM and RAM, and various sensors. The MPU 31 performs processing such as driving a heater for optimizing the temperature of a sensor element so that the sensor elements constituting the various sensors can detect their measurement targets (humidity, wind speed, and so on) properly.
The behavior sensor unit 10 includes, as the various sensors, a three-dimensional acceleration sensor 11 (3DG sensor), a three-axis gyro sensor 13, a temperature sensor 15 arranged on the back of the housing, a humidity sensor 17 arranged on the back of the housing, a temperature sensor 19 arranged on the front of the housing, a humidity sensor 21 arranged on the front of the housing, an illuminance sensor 23 arranged on the front of the housing, a wetness sensor 25 arranged on the back of the housing, a GPS receiver 27 that detects the current location of the terminal device 1, and a wind speed sensor 29.
The behavior sensor unit 10 also includes, as the various sensors, an electrocardiographic sensor 33, a heart sound sensor 35, a microphone 37, and a camera 41. The temperature sensors 15 and 19 and the humidity sensors 17 and 21 measure the temperature or humidity of the air outside the housing.
The three-dimensional acceleration sensor 11 detects the acceleration applied to the terminal device 1 in three mutually orthogonal directions (the vertical direction (Z direction), the width direction of the housing (Y direction), and the thickness direction of the housing (X direction)) and outputs the detection results.
The three-axis gyro sensor 13 detects, as angular velocities applied to the terminal device 1, the angular acceleration about the vertical direction (Z direction) and about two arbitrary directions orthogonal to the vertical direction (the width direction of the housing (Y direction) and the thickness direction of the housing (X direction)), counterclockwise rotation about each direction being taken as positive, and outputs the detection results.
The temperature sensors 15 and 19 each include, for example, a thermistor element whose electrical resistance changes with temperature. In this embodiment, the temperature sensors 15 and 19 detect temperature in degrees Celsius, and all temperatures given in the following description are in degrees Celsius.
The humidity sensors 17 and 21 are configured, for example, as well-known polymer film humidity sensors. A polymer film humidity sensor is configured as a capacitor whose dielectric constant changes as the amount of moisture contained in the polymer film changes with relative humidity.
The illuminance sensor 23 is configured, for example, as a well-known illuminance sensor including a phototransistor.
The wind speed sensor 29 is, for example, a well-known wind speed sensor that calculates the wind speed from the electric power (amount of heat dissipation) required to maintain a heater at a predetermined temperature.
The heart sound sensor 35 is configured as a vibration sensor that picks up vibrations caused by the beating of the user's heart, and the MPU 31 distinguishes vibration and noise caused by the heartbeat from other vibration and noise in view of the detection results of the heart sound sensor 35 and the heart sounds input from the microphone 37.
The wetness sensor 25 detects water droplets on the surface of the housing, and the electrocardiographic sensor 33 detects the user's heartbeat.
The camera 41 is arranged in the housing of the terminal device 1 so that its imaging range covers the outside of the terminal device 1.
The communication unit 50 includes a well-known MPU 51, a wireless telephone unit 53, and a contact memory 55, and is configured to be able to acquire detection signals from the various sensors constituting the behavior sensor unit 10 via an input/output interface (not shown). The MPU 51 of the communication unit 50 executes processing according to the detection results from the behavior sensor unit 10, input signals entered via the operation unit 70, and programs stored in a ROM (not shown).
Specifically, the MPU 51 of the communication unit 50 performs the function of a motion detection device that detects specific motions made by the user, the function of a positional relationship detection device that detects the positional relationship with the user, the function of an exercise load detection device that detects the load of exercise performed by the user, and the function of transmitting the processing results of the MPU 51.
The wireless telephone unit 53 is configured to be able to communicate with, for example, mobile telephone base stations, and the MPU 51 of the communication unit 50 outputs its processing results to the notification unit 60 or transmits them via the wireless telephone unit 53 to preset destinations.
The contact memory 55 functions as a storage area for storing position information on places the user visits. The contact memory 55 also stores information on contacts (telephone numbers and the like) to be notified if something abnormal happens to the user.
The notification unit 60 includes a display 61 configured, for example, as an LCD or organic EL display, illumination lights 63 made up of LEDs capable of emitting light in, for example, seven colors, and a speaker 65. The parts of the notification unit 60 are driven and controlled by the MPU 51 of the communication unit 50.
Next, the operation unit 70 includes a touch pad 71, a confirmation button 73, a fingerprint sensor 75, and a rescue request lever 77.
The touch pad 71 outputs a signal corresponding to the position touched and the pressure applied by the user (the user, the user's guardian, or the like).
The confirmation button 73 is configured so that the contact of a built-in switch closes when it is pressed by the user, so that the communication unit 50 can detect that the confirmation button 73 has been pressed.
The fingerprint sensor 75 is a well-known fingerprint sensor configured to read a fingerprint using, for example, an optical sensor. In place of the fingerprint sensor 75, any means capable of recognizing a physical characteristic of a human being (means capable of biometric authentication, that is, means capable of identifying an individual), such as a sensor that recognizes the shape of the veins of the palm, may be adopted.
The operation unit 70 also includes the rescue request lever 77, which connects to a predetermined contact when operated.
[Processing of this embodiment]
The processing carried out in such a voice response system 100 will be described below.
The voice response terminal processing carried out in the terminal device 1 is processing that accepts voice input from the user, sends the voice to the server 90, and, on receiving from the server 90 the response to be output, reproduces the response by voice. This processing is started when the user indicates via the operation unit 70 that voice input is to be performed.
In detail, as shown in FIG. 3, first, input from the microphone 37 is enabled (ON state) (S2), and imaging (recording) by the camera 41 is started (S4). It is then determined whether there has been voice input (S6).
If there is no voice input (S6: NO), it is determined whether a timeout has occurred (S8). Here, a timeout means that the time allowed for waiting for processing has been exceeded; here the allowed time is set to, for example, about 5 seconds.
If a timeout has occurred (S8: YES), the processing proceeds to S30, described later. If no timeout has occurred (S8: NO), the processing returns to S6.
If there is voice input (S6: YES), the voice is recorded in memory (S10), and it is determined whether the voice input has finished (S12). Here, it is determined that the voice input has finished when the voice has been interrupted for a certain time or longer, or when an instruction to end voice input has been entered via the operation unit 70.
If the voice input has not finished (S12: NO), the processing returns to S10. If the voice input has finished (S12: YES), data such as an ID for identifying the device itself, the voice, and the captured image are transmitted as packets to the server 90 (S14). The data transmission processing may instead be performed between S10 and S12.
It is then determined whether the data transmission has been completed (S16). If the transmission has not been completed (S16: NO), the processing returns to S14.
If the transmission has been completed (S16: YES), it is determined whether data (packets) transmitted by the voice response server processing described later have been received (S18). If no data has been received (S18: NO), it is determined whether a timeout has occurred (S20).
If a timeout has occurred (S20: YES), the processing proceeds to S30, described later. If no timeout has occurred (S20: NO), the processing returns to S18.
If data has been received (S18: YES), the packets are received (S22). In this processing, one or more different responses to the character information, each associated with a different voice color, are acquired.
It is then determined whether the reception has been completed (S24). If the reception has not been completed (S24: NO), it is determined whether a timeout has occurred (S26).
If a timeout has occurred (S26: YES), a notice that an error has occurred is output via the notification unit 60, and the voice response terminal processing ends. If no timeout has occurred (S26: NO), the processing returns to S22.
If the reception has been completed (S24: YES), a response based on the received packets is output by voice from the speaker 65 (S28). In this processing, when a plurality of responses are reproduced, the responses are reproduced with respectively different voice colors. When this processing ends, the voice response terminal processing ends.
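The overall flow of this terminal-side processing (S2 through S28) can be sketched as follows; the functions standing in for the microphone, camera, and packet transport are hypothetical placeholders for the hardware and wireless communication described above.

    TIMEOUT_S = 5  # allowed waiting time, as in S8/S20

    # Hypothetical stand-ins for the microphone, camera, and server transport.
    def wait_for_voice(timeout_s):
        return b"recorded-voice"          # S6-S12: recorded voice data, or None on timeout

    def send_packets(device_id, voice, image):
        pass                              # S14-S16: packet transmission to the server

    def receive_packets(timeout_s):
        # S18-S24: responses paired with voice colors, or None on timeout
        return [("Today's weather in Tokyo is sunny.", "woman 1"),
                ("However, tomorrow it will rain.", "man 1")]

    def voice_response_terminal(device_id):
        voice = wait_for_voice(TIMEOUT_S)     # S2-S12: enable input and record
        if voice is None:
            return                            # S8: timeout
        send_packets(device_id, voice, b"captured-image")
        responses = receive_packets(TIMEOUT_S)
        if responses is None:
            print("error")                    # S26: report the error via the notification unit
            return
        for text, voice_color in responses:   # S28: reproduce each response
            print("[{}] {}".format(voice_color, text))

    voice_response_terminal("terminal-001")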
Next, the voice response server processing carried out at the server 90 (external device) will be described with reference to FIG. 4. The voice response server processing receives voice from the terminal device 1, performs voice recognition to convert the voice into character information, generates a response to the voice, and returns it to the terminal device 1. In particular, in this embodiment, a plurality of responses may be transmitted, each associated with a voice of a different voice color.
In detail, as shown in FIG. 4, the voice response server processing first determines whether packets have been received from any terminal device 1 (S42). If no packets have been received (S42: NO), the processing of S42 is repeated.
If packets have been received (S42: YES), the terminal device 1 that is the communication partner is identified (S44). In this processing, the terminal device 1 is identified by the ID of the terminal device 1 contained in the packets.
Next, the voice contained in the packets is recognized (S46). Here, in the voice recognition DB 102, a large number of voice waveforms are associated with a large number of characters. In the predictive conversion DB 103, each word is associated with the words that tend to follow it.
Accordingly, in this processing, well-known voice recognition processing is carried out by referring to the voice recognition DB 102 and the predictive conversion DB 103, and the voice is converted into character information.
Next, objects in the captured image are identified by image processing (S48). The user's emotion is then determined on the basis of the voice waveform, word endings, and the like (S50).
In this processing, by referring to the emotion determination DB 114, in which voice waveforms (voice colors), word endings, and the like are associated with emotion categories such as normal, anger, joy, confusion, sadness, and elation, it is determined which category the user's emotion falls into, and the determination result is recorded in memory. Next, by referring to the learning DB 107, words this user often speaks are looked up, and parts of the character information generated by voice recognition that were ambiguous are corrected.
In the learning DB 107, characteristics of each user, such as words the user often speaks and habits of pronunciation, are recorded for each user. Data is also added to and corrected in the learning DB 107 through conversations with the user.
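A minimal sketch of this correction step follows, assuming a hypothetical per-user frequent-word list and a placeholder marker ("?") for ambiguous recognition results; whitespace tokenization is a simplifying assumption.

    import difflib

    # Hypothetical learning DB: words each user often speaks.
    learning_db = {"user1": ["Nagoya", "weather", "baseball"]}

    def correct_ambiguous(user_id, tokens):
        # Replace tokens flagged as ambiguous ("?...") with the closest frequent word.
        frequent = learning_db.get(user_id, [])
        fixed = []
        for tok in tokens:
            if tok.startswith("?"):
                match = difflib.get_close_matches(tok[1:], frequent, n=1)
                fixed.append(match[0] if match else tok[1:])
            else:
                fixed.append(tok)
        return fixed

    print(correct_ambiguous("user1", ["weather", "in", "?Nagoye"]))  # -> ['weather', 'in', 'Nagoya']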
Next, the corrected character information is taken as the input character information (S54), and a response is obtained from the response candidate DB 105 by searching the response candidate DB 105 using sentences similar to the character information as input (S56). Here, in the response candidate DB 105, as shown in FIG. 5, each item of input character information is uniquely associated with a first output, the voice color of the first output, a second output, and the voice color of the second output.
For example, as shown in the first row of FIG. 5, when the character information "today's weather in ※" is input, the first output "Today's weather in ※ is ※" is output in association with the voice color of woman 1. The "※" parts are obtained by accessing the weather DB 110, in which region names are associated with the weather forecasts for the next several days in each region.
When the character information "today's weather in ※" is input, the weather at the time today's weather changes is also obtained from the weather DB 110, and the second output "However, ※ is ※" is output in association with the voice color of man 1. If today's weather in Tokyo is sunny and tomorrow's weather is rainy, then when "today's weather in Tokyo" is input, the device outputs "Today's weather in Tokyo is sunny." in the voice color of woman 1 and "However, tomorrow it will rain." in the voice color of man 1.
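A minimal sketch of this lookup follows, assuming a hypothetical in-memory table and a stub weather DB; the placeholder filling mirrors the ※ substitution described above.

    # Hypothetical response candidate table: input pattern -> (output template, voice color).
    response_candidates = {
        "today's weather in {place}": [
            ("Today's weather in {place} is {today}.", "woman 1"),
            ("However, tomorrow it will be {tomorrow}.", "man 1"),
        ],
    }
    weather_db = {"Tokyo": {"today": "sunny", "tomorrow": "rainy"}}  # stub weather DB

    def respond(place):
        forecast = weather_db[place]
        outputs = []
        for template, voice_color in response_candidates["today's weather in {place}"]:
            text = template.format(place=place, **forecast)
            outputs.append((text, voice_color))  # each response keeps its own voice color
        return outputs

    for text, voice in respond("Tokyo"):
        print("[{}] {}".format(voice, text))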
Although this embodiment describes the case where a plurality of responses are output, when there is only one answer to the input there is only one response. For this reason, it is determined whether there is only one response (S58). If there is only one response (S58: YES), the processing proceeds to S62, described later.
If there are a plurality of responses (S58: NO), each response content is associated with a voice color (S60). Here, the voice DB 104 stores a database of artificial voices for each voice color, and in this processing the voice color set for each response is associated with the corresponding voice color in the database.
Next, the response contents are converted into voice (S62). In this processing, on the basis of the database stored in the voice DB 104, processing for outputting the response contents (character information) as voice is performed.
The generated response (voice) is then transmitted as packets to the terminal device 1 that is the communication partner (S64). The packets may instead be transmitted while the voice of the response contents is being generated.
Next, the conversation content is recorded (S68). In this processing, the input character information and the output response contents are recorded in the learning DB 107 as conversation content. At this time, keywords contained in the conversation content (words recorded in the voice recognition DB 102), pronunciation characteristics, and the like are recorded in the learning DB 107.
When this processing ends, the voice response server processing ends.
[Effects of this embodiment]
The voice response system 100 described in detail above is a system that makes responses to input character information by voice, in which the terminal device 1 (MPU 31) acquires a plurality of different responses to the character information and outputs the plurality of different responses in respectively different voice colors.
According to such a voice response system 100, a plurality of responses can be output in different voice colors, so even when a single solution to one item of character information cannot be determined, different solutions can be output in different voice colors in a way that is easy for the user to understand. Usability for the user can therefore be improved.
In the voice response system 100, the terminal device 1 inputs the user's voice via the microphone 37, and the server 90 (calculation unit 101) converts the input voice into character information, generates a plurality of different responses to the character information, and transmits them to the terminal device 1. The terminal device 1 then acquires the responses from the server 90.
According to such a voice response system 100, voice can be input at the terminal device 1, so the system can be configured so that character information is input by voice. Since the configuration can be such that the responses are generated at the server 90, the processing load on the terminal device 1 can be reduced.
Furthermore, in the voice response system 100, the server 90 converts the voice of the user's utterances into character information and accumulates the user's speech habits (such as habits of pronunciation) as learning information (capturing these characteristics and recording them).
According to such a voice response system 100, character information can be generated on the basis of the learning information, so the accuracy of character information generation can be improved.
Furthermore, in the voice response system 100, for the voice input by the user, the server 90 reads the emotion from the voice color and outputs which emotion it corresponds to, among emotions including at least one of normal, anger, joy, confusion, sadness, and elation.
According to such a voice response system 100, a response can be output according to the user's emotion.
[Modifications of the First Embodiment]
In this embodiment, voice recognition is used as the means of inputting character information, but the input is not limited to voice recognition and may be made using input means (the operation unit 70) such as a keyboard or touch panel. Also, the operation of converting the input voice into character information is performed at the server 90, but it may instead be performed at the terminal device 1.
Furthermore, in the voice response system 100, the server 90 may be provided with the response candidate DB 105 in which, for each of a plurality of items of character information, a plurality of different responses including a positive response and a negative response to that character information are recorded, and the terminal device 1 may acquire a positive response and a negative response as the plurality of different responses and reproduce the positive response and the negative response in different voice colors.
For example, as shown in the second row of FIG. 5, when a voice asking whether some item "may be bought" is input, positive information about the item, such as a good reputation, is output in association with a woman's voice. On the other hand, negative information, such as a bad reputation, is output in a voice color different from the woman's voice associated with the positive information (here, a man's voice).
According to such a voice response system 100, responses from different standpoints, such as a positive response and a negative response, can be reproduced in different voice colors, so the voices can be reproduced as if different people were speaking. This makes it less likely that the user listening to the voices will feel a sense of incongruity.
The voice color may also be changed according to the type of response or the wording used in the response. For example, a response in a gentle tone may be reproduced in a calm female voice, and a response in a forceful tone in a spirited male voice. That is, response contents may be associated with personalities, and the voice color may be set according to the personality.
Furthermore, in the voice response system 100, a response (for example, a positive response or a negative response) output by the user's own terminal device 1 or another terminal device 1 may be input as character information, and a response rebutting that response may be generated. From the user's standpoint, this means being able to listen to a debate between the affirmative side and the opposing side. Having heard this debate, the user can then make the final decision.
This configuration can be realized with one terminal device 1 or with a plurality of terminal devices 1. For a plurality of terminal devices 1 to exchange voices with each other, the voices may be input and output directly, or wireless or other communication may be used. When a plurality of terminal devices 1 communicate with the server 90, data may be transmitted to the other terminal devices 1 in the process of S66.
Furthermore, in the voice response system 100, the calculation unit 101 may learn (record and analyze) the user's behavior (conversations, places visited, and objects captured by the camera) and supplement what is left unsaid in the user's conversations.
For example, in a conversation where the user answers "I'd rather have curry" to the question "Is hamburger all right for today?", if the device adds "That's because we had hamburger yesterday," the reason the user said curry would be preferable is conveyed.
Such a configuration can also be used during a telephone call, or the device may be configured to join the user's conversation of its own accord.
Furthermore, in the voice response system 100, the server 90 may acquire response candidates from a predetermined server or from the Internet.
According to such a voice response system 100, response candidates can be acquired not only from the server 90 but also from any device connected via the Internet, a dedicated line, or the like.
[Second Embodiment]
[Process of Second Embodiment]
Next, another form of voice response system will be described. In this embodiment (the second embodiment) and the embodiments that follow, only the parts that differ from the voice response system 100 of the first embodiment are described in detail; parts identical to those of the voice response system 100 of the first embodiment are given the same reference numerals and their description is omitted.
The voice response system of the second embodiment outputs voice even when the user does not input character information. Specifically, the terminal device 1 performs the automatic conversation terminal process shown in FIG. 6. The automatic conversation terminal process is started, for example, when the terminal device 1 is powered on, and is repeatedly executed thereafter.
In the automatic conversation terminal process, it is first determined whether the setting for automatic conversation is ON (S82). Whether to perform automatic conversation can be set by the user via the operation unit 70 or by voice input.
If automatic conversation is OFF (S82: NO), the automatic conversation terminal process ends. If automatic conversation is ON (S82: YES), the fact that the automatic conversation mode has been set is transmitted to the server 90 together with an ID identifying the terminal itself (S84).
Subsequently, it is determined whether a packet has been received from the server 90 (S86). If no packet has been received (S86: NO), the process of S86 is repeated. If a packet has been received (S86: YES), the same processes as S22 to S30 described above are performed, and when these processes are completed, the automatic conversation terminal process ends.
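A minimal sketch of this terminal-side loop (S82 to S86) follows, assuming a hypothetical `ServerLink` transport and treating S22 to S30 as an opaque playback step; none of these names come from the original disclosure.

```python
import time

TERMINAL_ID = "terminal-001"  # ID identifying this terminal (assumed format)

class ServerLink:
    """Stand-in for the terminal-to-server transport (hypothetical)."""
    def __init__(self):
        self.outbox, self.inbox = [], []
    def send(self, packet):
        self.outbox.append(packet)
    def receive(self):
        return self.inbox.pop(0) if self.inbox else None

def play_response(packet):
    # Stands in for S22-S30: decode and reproduce the received voice data.
    print("playing:", packet.get("message"))

def auto_conversation_terminal(settings, server):
    """Terminal-side automatic conversation process (S82-S86), sketched."""
    if not settings.get("auto_conversation"):                      # S82: NO
        return
    server.send({"mode": "auto_conversation", "id": TERMINAL_ID})  # S84
    while True:                                                    # S86
        packet = server.receive()
        if packet is not None:
            play_response(packet)
            return
        time.sleep(0.1)

server = ServerLink()
server.inbox.append({"message": "Good morning."})
auto_conversation_terminal({"auto_conversation": True}, server)
```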
The server 90 executes the automatic conversation server process shown in FIG. 7. The automatic conversation server process is started, for example, when the server 90 is powered on, and is repeatedly executed thereafter.
In the automatic conversation server process, it is first determined whether notification that the automatic conversation mode has been set has been received from a terminal device 1 (S92). If no such notification has been received (S92: NO), the process proceeds to S98.
If notification that the automatic conversation mode has been set has been received (S92: YES), the terminal device 1 to serve as the communication partner is identified based on the ID contained in the received packet (S94), and automatic conversation is set for this communication partner (S96). Subsequently, for each terminal device 1 for which automatic conversation is set, it is determined whether a playback condition is satisfied (S98).
Here, a playback condition is, for example, that a certain time has elapsed since the previous conversation (voice input), that a fixed time of day has been reached, that particular weather is occurring, or that some sensor value indicates an abnormality.
If no playback condition is satisfied (S98: NO), the automatic conversation server process ends. If a playback condition is satisfied (S98: YES), a message corresponding to the playback condition is generated (S100).
Here, the message corresponding to the playback condition may be a fixed phrase such as "Good morning." or "Hello.", or it may concern the latest news obtained from the news DB 109, in which the latest news is automatically updated. When the message concerns the latest news and, for example, information about a certain company's stock price has been obtained, the message could be "Company ○○'s stock price rose by ○○ yen today. Did you know?"
When this process ends, the processes of S42 to S54 described above are performed. When the process of S54 ends, it is determined whether a predetermined answer has been obtained from the terminal device 1 serving as the communication partner (S112). Here, the predetermined answer may be, for example, any voice at all, or a specific answer. A specific answer is, for example, "I knew" or "I didn't know" in response to the question "Did you know?", or an answer containing a word indicating the weather, such as "It's raining" or "It's sunny", in response to the question "How is the weather now?"
If the predetermined answer is obtained (S112: YES), the automatic conversation server process ends. If the predetermined answer is not obtained (S112: NO), the message transmitted in S100 is retransmitted (S114). When the message is retransmitted in this way, the voice color is changed so as to generate a more forceful, sterner-sounding voice.
Subsequently, referring to the report destination DB 117, in which terminal devices 1 are associated with report destinations in advance, notice that no answer was obtained is transmitted to the predetermined report destination (S116). When this is done, the automatic conversation server process ends.
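The server-side flow S98 to S116 can be condensed into the sketch below; the condition thresholds, message templates, and the `ask`/`notify` callbacks are illustrative assumptions, not the patent's literal implementation.

```python
import time

REPORT_DB = {"terminal-001": "family@example.com"}  # report destination DB 117 (assumed)

def playback_condition(state, now=None):
    """S98: return a reason string if some playback condition holds, else None."""
    now = now or time.time()
    if now - state.get("last_conversation", 0) > 3600:  # quiet for an hour
        return "silence"
    if state.get("sensor_abnormal"):                    # abnormal sensor value
        return "sensor"
    return None

def make_message(reason, news=None):
    """S100: generate a message matching the playback condition."""
    if reason == "sensor":
        return "A sensor is reporting an unusual value. Are you all right?"
    if news:  # e.g. a stock-price item from the news DB 109
        return f"{news} Did you know?"
    return "Good morning."

def run_auto_conversation(terminal_id, state, ask, notify):
    reason = playback_condition(state)
    if reason is None:                                  # S98: NO
        return
    message = make_message(reason, state.get("news"))
    if ask(message, stern=False):                       # S100/S42-S54, S112: YES
        return
    if not ask(message, stern=True):                    # S114: re-send, sterner voice
        notify(REPORT_DB[terminal_id], "No answer was obtained.")  # S116
```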
[Effects of Second Embodiment]
In the voice response system 100 described above, when no character information is input, the server 90 determines whether the situation of the voice response system 100 matches a playback condition set in advance as a condition for outputting voice. When a playback condition is matched, a preset message is output.
According to such a voice response system 100, voice can be output even when no character information is input (that is, even when the user does not speak to the device). For example, by compelling the user to speak, the system can be used to counteract drowsiness while driving a car. Also, by determining whether a person living alone responds, a safety check can be performed.
Further, in the voice response system 100, the server 90 acquires news information and outputs a message about the news in the form of a question requesting an answer from the user.
According to such a voice response system 100, conversations about the news become possible, which keeps the conversation from always being the same.
Furthermore, in the voice response system 100, the server 90 outputs a preset message with separately acquired external information (news, or environmental information such as temperature, weather, and position) appended to it.
According to such a voice response system 100, a response combining a predetermined message with acquired information can be output.
Furthermore, in the voice response system 100, when no answer to a response or message is obtained, the server 90 transmits information identifying the user, together with notice that no answer was obtained, to a preset contact.
According to such a voice response system 100, a contact can be notified when no answer is obtained. Thus, for example, an abnormality affecting an elderly person living alone can be reported at an early stage.
[Modification of Second Embodiment]
In the voice response system 100, the server 90 may also acquire a plurality of messages and select and output a message to be played according to how often each message has been played.
According to such a voice response system 100, making frequently played messages less likely to be played adds randomness to message playback, while deliberately repeating a frequently played message can call attention to it and help fix it in memory.
[Third Embodiment]
[Process of Third Embodiment]
Next, the voice response system of the third embodiment is configured so that the terminal device 1 conveys, on the user's behalf, something the user finds difficult to say to someone directly. For example, if the user tells the device before a date what he or she would like to say that day, the voice response system 100 speaks (plays the voice) on the user's behalf at an appropriate moment (for example, at a preset time, or when a certain time has passed after the conversation lapses).
Specifically, the terminal device 1 performs the message terminal process shown in FIG. 8, and the server 90 performs the message server process shown in FIG. 9. The message terminal process is started, for example, when the terminal device 1 is powered on, and is repeatedly executed thereafter.
In the message terminal process, as shown in FIG. 8, it is first determined whether the message mode has been set by the user (S132). If the message mode has not been set (S132: NO), the process of S132 is repeated.
If the message mode has been set (S132: YES), the processes of S2 to S8 are performed, and if an affirmative determination is made in S6, the message mode flag is set to the ON state in the memory of the terminal device 1 (S134). Then the processes of S10 to S16 are performed.
If an affirmative determination is made in S16, it is determined whether a packet has been received from the server 90 (S136). If no packet has been received (S136: NO), the process of S136 is repeated. If a packet has been received (S136: YES), the processes of S24 to S30 are performed and the message terminal process ends.
Next, the message server process is started, for example, when the server 90 is powered on, and is repeatedly executed thereafter. Specifically, it is first determined whether a packet has been received from any terminal device 1 (S142). If no packet has been received (S142: NO), the process proceeds to S156, described later.
If a packet has been received (S142: YES), the terminal device 1 of the communication partner is identified (S44), and it is determined whether the packet contains a mode flag such as the message mode flag (S144). If there is no mode flag (S144: NO), the process proceeds to S148.
If there is a mode flag (S144: YES), the server 90 also performs the mode setting by setting the flag corresponding to the communication partner's terminal device 1 to the ON state (S146). For example, in the message mode to which the message mode flag corresponds, the processes S46 to S152 described later are performed; in the guidance mode to which the guidance mode flag (described later) corresponds, S46 to S176 are performed (see FIG. 11).
Subsequently, it is determined whether the message flag is in the ON state (S148). If the message flag is ON (S148: YES), the processes of S46 to S54 are performed, and then the message playback conditions are extracted (S150).
Here, the message playback conditions can be set in advance by the user via the operation unit 70 of the terminal device 1, and correspond, for example, to a time or a position. The message playback conditions are transmitted to the server 90 when the packet of the message terminal process is sent.
Subsequently, the message and the voice (voice color) are associated with each other and recorded in memory (S152), and the process proceeds to S156. If the message flag is OFF (S148: NO), processing for another mode is performed (S154), and it is then determined whether the playback timing has arrived (S156). Here, the playback timing means the conditions set as the message playback conditions.
If it is not the playback timing (S156: NO), the message server process ends immediately. If it is the playback timing (S156: YES), the processes of S62 to S64 are performed, and the message server process ends.
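A compact sketch of this store-then-play logic (S150 to S156), assuming the playback conditions are a target time and an optional position in planar coordinates; the data shapes are invented for illustration.

```python
import math
import time

stored_messages = []  # message + voice color + playback conditions (S152)

def store_message(text, voice_color, play_at=None, near=None, radius_m=100.0):
    """S150-S152: record a message with its playback conditions."""
    stored_messages.append({"text": text, "voice": voice_color,
                            "play_at": play_at, "near": near, "radius": radius_m})

def due(entry, now, position):
    """S156: has the playback timing (time and/or position) arrived?"""
    if entry["play_at"] is not None and now < entry["play_at"]:
        return False
    if entry["near"] is not None:
        if position is None:
            return False
        dx = position[0] - entry["near"][0]
        dy = position[1] - entry["near"][1]
        if math.hypot(dx, dy) > entry["radius"]:
            return False
    return True

def play_due_messages(position=None):
    """S156, then S62-S64: play every stored message whose conditions are met."""
    now = time.time()
    for entry in [e for e in stored_messages if due(e, now, position)]:
        print(f"[{entry['voice']}] {entry['text']}")  # stands in for S62-S64
        stored_messages.remove(entry)

store_message("Don't forget the flowers.", "female_calm", play_at=0.0)
play_due_messages()
```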
[Effects of Third Embodiment]
According to the voice response system of the third embodiment, the voice input by the user is not played back immediately, but can be played back when the message playback conditions are satisfied some time later.
For example, as shown in the third row of FIG. 5, if the user inputs "Please tell ○○-san that ○○", the message to be conveyed is played after ○○-san's voice has been recognized (heard).
[Modification of Third Embodiment]
The third embodiment is configured to play back what the user has said, but it may instead be configured to speak words that prompt the user to broach a difficult subject, for example, "Come to think of it, didn't you say you had something to tell her?" Specifically, the terminal device 1 performs the guidance terminal process shown in FIG. 10, and the server 90 performs the guidance server process shown in FIG. 11.
The guidance terminal process is started, for example, when the terminal device 1 is powered on, and is repeatedly executed thereafter.
In the guidance terminal process, as shown in FIG. 10, it is first determined whether the guidance mode has been set by the user (S162). If the guidance mode has not been set (S162: NO), the process of S162 is repeated.
If the guidance mode has been set (S162: YES), the processes of S2 to S8 are performed, and if an affirmative determination is made in S6, the guidance mode flag is set to the ON state in the memory of the terminal device 1 (S164). Then the processes of S10 to S16 are performed.
If an affirmative determination is made in S16, it is determined whether a packet has been received from the server 90 (S166). If no packet has been received (S166: NO), the process of S166 is repeated. If a packet has been received (S166: YES), the processes of S24 to S30 are performed and the guidance terminal process ends.
Next, the guidance server process is started, for example, when the server 90 is powered on, and is repeatedly executed thereafter. Specifically, the processes of S142 to S146 described above are executed. It is then determined whether the guidance flag is in the ON state (S172).
If the guidance flag is in the ON state (S172: YES), the processes of S46 to S54 are performed, and then the guidance playback conditions are extracted (S174).
Here, like the message playback conditions, the guidance playback conditions can be set in advance by the user via the operation unit 70 of the terminal device 1, and correspond, for example, to a time or a position. The guidance playback conditions are transmitted to the server 90 when the packet of the message terminal process is sent.
Subsequently, guidance content is generated, and this guidance content and a voice (voice color) are associated with each other and recorded in memory (S176). As for the guidance content, for example, the input character information is searched for words expressing a desire, such as "want to" or "hope"; the keywords accompanying these words are extracted; and words registered as prompts for those keywords are output as the guidance content. Keywords and the words representing the guidance content are associated with each other in advance and recorded in the response candidate DB 105.
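A rough sketch of this keyword-to-prompt lookup (S176) follows, with a tiny hand-made table standing in for the response candidate DB 105; the word lists and prompts are illustrative assumptions.

```python
DESIRE_WORDS = {"want to", "hope to", "would like to"}  # words expressing a desire

# Keyword -> guidance phrase, standing in for entries in the response candidate DB 105.
GUIDANCE_TABLE = {
    "tell her": "Come to think of it, didn't you say you had something to tell her?",
    "apologize": "Isn't there something you wanted to get off your chest?",
}

def generate_guidance(text):
    """S176: find a desire word, take the accompanying keyword, look up a prompt."""
    lowered = text.lower()
    if not any(desire in lowered for desire in DESIRE_WORDS):
        return None
    for keyword, prompt in GUIDANCE_TABLE.items():
        if keyword in lowered:
            return prompt
    return None

print(generate_guidance("I want to tell her how I feel today."))
```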
Subsequently, the processes from S156 onward described above are performed, and the server process ends. If the guidance flag is in the OFF state (S172: NO), processing for another mode is performed (S154), the processes from S156 onward are performed, and the server process ends.
According to this modification of the third embodiment, rather than directly outputting the words the user wants to say, the system can guide the user into saying them.
[Fourth Embodiment]
[Process of Fourth Embodiment]
Next, an example in which the terminal device 1 is used for reception work will be described. In this embodiment, the terminal device 1 is installed at a company's reception desk or the like. It can also be employed for telephone reception, such as a company's main switchboard or telephone banking. This embodiment is realized by replacing the process of S56 in the first embodiment with the reception process shown in FIG. 12.
In the reception process, as shown in FIG. 12, it is first determined whether the character information contains a company name (S192). In this process, it is determined whether a common personal name or company name (one recorded in the voice recognition DB 102) is contained.
If the character information contains no company name or personal name (S192: YES), a response asking for the company name and personal name is generated (S194), and the reception process ends. In this process, a response such as "Please state your name and your business." is generated, for example.
If the character information contains a company name or personal name (S192: NO), the company name or personal name is looked up in the sales DB 118 and the client DB 119 (S196). Here, the sales DB 118 records the companies and representatives who have come selling in the past, as well as the names of habitual complainers. The client DB 119 records, for each contact person, the company name, that company's contact person, the person in charge on the user's (own company's) side, schedules such as scheduled visiting times, and contact details, all in association with one another.
Subsequently, it is determined whether the company name or personal name could be extracted from the sales DB 118, that is, whether the company name or personal name contained in the character information was present in the sales DB 118 (S198). If the company name or personal name could be extracted from the sales DB 118 (S198: YES), a sales refusal response declining the sales call (a response refusing to put the visitor through) is generated (S200), and the reception process ends.
If the company name or personal name could not be extracted from the sales DB 118 (S198: NO), it is determined whether the visitor is, according to the schedule in the client DB 119, someone due to visit at a nearby time (for example, within one hour before or after the current time) (S202). If the visitor is due at a nearby time (S202: YES), the contact details of the person in charge of this visitor are extracted from the client DB 119, and a connection is made to that person so that the person in charge and the visitor can converse (S204). In this process, the connection may be made to the person in charge via an extension telephone, a mobile phone, or the like.
Subsequently, a reception response for the client is generated (S206). Here, the reception response for the client is, for example, a response such as "Thank you as always, Mr. ○○. You are being connected to the person in charge, so please wait a moment." When this is done, the reception process ends.
If the visitor is not due at a nearby time (S202: NO), a connection is made to a preset reception contact so that the receptionist and the visitor can converse (S208). A normal reception response is then generated (S210).
Here, the normal reception response is, for example, a response such as "You are being connected to reception, so please wait a moment." When this is done, the reception process ends.
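The routing logic S192 to S210 can be condensed into a sketch like the one below; the database contents and the `connect` callback are assumptions made for illustration only.

```python
from datetime import datetime, timedelta

SALES_DB = {"Pushy Sales Co."}                      # sales DB 118 (assumed contents)
CLIENT_DB = {                                       # client DB 119 (assumed contents)
    "Mr. Tanaka": {"contact": "ext-1234",
                   "visit_at": datetime(2013, 6, 18, 14, 0)},
}

def reception(name, now, connect):
    """S192-S210: decide how to answer a visitor who gave `name`."""
    if name is None:                                            # S192: YES
        return "Please state your name and your business."      # S194
    if name in SALES_DB:                                        # S198: YES
        return "We must decline sales calls. Thank you."        # S200
    client = CLIENT_DB.get(name)
    if client and abs(client["visit_at"] - now) <= timedelta(hours=1):  # S202
        connect(client["contact"])                              # S204
        return (f"Thank you as always, {name}. You are being connected "
                "to the person in charge, so please wait a moment.")    # S206
    connect("reception-desk")                                   # S208
    return "You are being connected to reception, so please wait a moment."  # S210

print(reception("Mr. Tanaka", datetime(2013, 6, 18, 13, 30), lambda c: None))
```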
[Effects of Fourth Embodiment]
The voice response system 100 described above is configured for use at the reception desk of a workplace or company. In this configuration, the names and company names of sales callers are recorded in advance in the sales DB 118 of the server 90, and when a visitor at reception gives one of these names or company names, a response is generated so that a phrase of refusal is played back.
Further, in the voice response system 100, the server 90 identifies the communication partner from the input character information and connects the communication partner to the communication destination set in advance for that partner.
According to such a voice response system 100, reception work and telephone handling can be assisted. Moreover, according to such a voice response system 100, persons who might disrupt the user's work can be turned away without the user having to deal with them personally.
Furthermore, in the voice response system 100, the server 90 extracts keywords contained in the input character information (particularly voice) and connects to the connection destination corresponding to the keywords. Keywords such as the name of the other party, for example, are associated with their connection destinations in advance.
According to such a voice response system 100, tasks such as transferring calls and summoning people to reception can be assisted.
[Modification of Fourth Embodiment]
In the above embodiment, the connection destination is set according to the caller. Applying this technique, the caller's business (keywords contained in the character information) may instead be recognized, and the connection destination changed according to the business, for example in telephone reception for telephone banking, telephone shopping, and the like.
Further, in the voice response system 100, the server 90 may recognize the caller's business from the keywords and convey a summary of what the caller said to the user.
According to such a voice response system 100, the work of handling inquiries from customers can be assisted.
[Fifth Embodiment]
[Process of Fifth Embodiment]
Next, the terminal device 1 may, upon receiving a request from another terminal device 1, provide the information that the other terminal device 1 seeks.
In this configuration, in the process of S56, the server 90 requests the necessary information from the other terminal device 1 and generates a response after obtaining the necessary information from it. The terminal device 1 that provides the necessary information performs the information providing terminal process shown in FIG. 13. The information providing terminal process is started, for example, when there is a request from the server 90.
In the information providing terminal process, as shown in FIG. 13, the information recipient is first extracted (S222). The information recipient is the other terminal device 1 requesting the information, and an ID identifying this other terminal device 1 is contained in the request from the server 90.
Subsequently, it is determined whether the recipient is a party to whom provision of information is permitted (S224). Here, the terminal information DB 113 records in advance the IDs of parties, such as family members and friends, to whom provision of information is permitted. In this process, the determination is made by referring to the terminal information DB 113.
If the recipient is a party to whom provision of information is permitted (S224: YES), the requested information is acquired from the terminal's own memory 39, its various sensors, and so on (S226), and this data is transmitted to the server 90 (S228). If the recipient is not a party to whom provision of information is permitted (S224: NO), a refusal to provide the information is transmitted to the server 90 (S230). When such processing ends, the information providing terminal process ends.
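A minimal sketch of this permission check (S222 to S230), with a hypothetical whitelist standing in for the terminal information DB 113 and toy sensor readings in place of the memory 39:

```python
PERMITTED_IDS = {"terminal-002", "terminal-003"}  # terminal information DB 113 (assumed)

SENSOR_READINGS = {"position": (35.17, 136.88), "temperature": 24.5}

def provide_information(request):
    """S222-S230: answer an information request relayed by the server."""
    requester = request["requester_id"]          # S222: extract the recipient
    if requester not in PERMITTED_IDS:           # S224: NO
        return {"status": "refused"}             # S230
    item = request["item"]                       # S226: read own sensors/memory
    return {"status": "ok", "item": item,
            "value": SENSOR_READINGS.get(item)}  # S228: send data to the server

print(provide_information({"requester_id": "terminal-002", "item": "position"}))
```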
In this configuration, as shown in the fourth row of FIG. 5, for example, in response to the question "What is Mr. ○○ doing?", the server 90 requests position information from Mr. ○○'s terminal device 1, and that terminal device 1 returns its position information.
The server 90 then recognizes Mr. ○○'s activity from the position information. For example, if he is moving along a railway line at a speed faster than a person can run, it is judged that he is traveling by train, and a response such as "Mr. ○○ is on a train. He seems to be on his way home." is generated.
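This inference can be sketched as a simple speed-and-location heuristic; the "faster than running" threshold, the `on_railway` test, and the placeholder name are illustrative assumptions.

```python
RUNNING_SPEED_MPS = 8.0  # rough upper bound for human running speed (assumption)

def on_railway(position):
    # Placeholder: a real system would match the position against map data.
    return True

def infer_activity(positions, timestamps):
    """Infer an activity from two position fixes (planar coordinates in meters)."""
    (x0, y0), (x1, y1) = positions
    dt = timestamps[1] - timestamps[0]
    speed = ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5 / dt
    if speed > RUNNING_SPEED_MPS and on_railway(positions[1]):
        return "He is on a train. He seems to be on his way home."
    return "He appears to be on foot."

print(infer_activity([(0.0, 0.0), (600.0, 0.0)], [0.0, 30.0]))  # 20 m/s
```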
[Effects of Fifth Embodiment]
In the voice response system 100, the server 90 acquires information recorded in a terminal device 1 other than the requesting terminal device 1 and provides it to the requesting terminal device 1. In other words, in the voice response system 100, the server 90 acquires the information for generating a response to the character information from another terminal device 1.
According to such a voice response system 100, a response can be generated based on information recorded in another terminal device 1.
Also, in the voice response system 100, when another terminal device 1 requests information for generating a response to character information, the terminal device 1 returns information corresponding to the request.
In this configuration, the terminal device 1 is equipped with sensors for detecting position information, temperature, humidity, illuminance, noise level, and so on, and with databases such as dictionary information, and extracts the necessary information on request.
According to such a voice response system 100, information specific to another terminal device 1, such as its position, can be acquired. A terminal can likewise transmit its own specific information to other terminal devices 1.
[Sixth Embodiment]
[Process of Sixth Embodiment]
Next, the voice response system of the sixth embodiment provides a personality DB 106 in which personality information is recorded, associating the personality of the user, or of related persons (persons connected to the user), with preset categories. In the personality DB 106, as shown in FIG. 14 for example, the names of the user and related persons are recorded in association with their personality categories.
In the personality DB 106 shown in FIG. 14, a personality test is administered to the user and related persons, and the test results are also recorded. When generating the personality information, well-known personality analysis techniques (the Rorschach test, the Szondi test, and the like) may be used. Aptitude-test techniques that companies use in hiring examinations may also be used when generating the personality information.
When generating personality information, the personality information generation process shown in FIG. 15, for example, is performed. The personality information generation process is started when, for example, an instruction to generate personality information is entered on the terminal device 1 using the operation unit 70 or the like.
In the personality information generation process, as shown in FIG. 15, the microphone 37 is first turned ON (S242), and one of a set of predetermined four-choice questions is output by voice (S244). The four-choice questions may be acquired from the server 90 or may be questions recorded in advance in the memory 39.
Subsequently, it is determined whether the subject (the user or a related person) has answered by voice (S246). If there is no answer (S246: NO), the process of S246 is repeated.
If there is an answer (S246: YES), conversation parameters such as word endings and speaking speed are extracted (S248), and it is determined whether the current question is the final question (S250). If it is not the final question (S250: NO), the next question is selected (S252) and the process returns to S242.
If it is the final question (S250: YES), a personality analysis based on the answers to the four-choice questions is performed (S254), and a personality analysis based on the conversation parameters is performed (S256). The personality analysis based on conversation parameters can capture tendencies such as confident people ending their words firmly while unconfident people trail off, or impatient people speaking quickly while easygoing people speak slowly.
Subsequently, these personality analysis results are combined, for example by taking a weighted average (S258), and the subject is assigned to a personality category (S260). Specifically, the subject's personality as measured by the test is scored, and subjects are assigned to personality categories according to their scores.
Subsequently, the subject and the personality category are associated with each other (S262) and recorded in the personality DB 106 (S264). That is, the relationship between the subject and the personality category is transmitted to the server 90. The test results are also transmitted to the server 90 at this time, and the server 90 builds the personality DB 106 as shown in FIG. 14. When this is done, the personality information generation process ends.
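The scoring and bucketing steps S254 to S260 might look like the sketch below; the weights, score ranges, and category names are invented for illustration and are not disclosed in the patent.

```python
# Weights for combining the two analyses (S258); the values are assumptions.
W_QUESTIONS, W_CONVERSATION = 0.7, 0.3

def score_questions(answers):
    """S254: score the four-choice answers (each answer worth 0-3 points)."""
    return sum(answers) / (3 * len(answers)) * 100

def score_conversation(params):
    """S256: score conversation parameters, e.g. firm endings, fast speech."""
    return 50 + 25 * params["ending_strength"] + 25 * params["speech_rate"]

def classify(answers, params):
    """S258-S260: weighted average, then bucket into a personality category."""
    total = (W_QUESTIONS * score_questions(answers)
             + W_CONVERSATION * score_conversation(params))
    if total >= 70:
        return "assertive"
    if total >= 40:
        return "balanced"
    return "reserved"

print(classify([3, 2, 3, 1], {"ending_strength": 0.8, "speech_rate": 0.5}))
```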
When the personality DB 106 generated in this way is used, entries associating each personality category with a different response are prepared in the response candidate DB 105. In the process of S56, the server 90 then acquires response candidates representing a plurality of different responses to the character information, selects the response to output from the candidates according to the personality information, and outputs the selected response in the processes of S60 and S64.
[Effects of Sixth Embodiment]
In the voice response system 100, the terminal device 1 generates personality information for the user or a related person based on the answers to a plurality of preset questions, and acquires the generated personality information.
According to such a voice response system 100, personality information can be generated in the server 90 or the terminal device 1.
Furthermore, in the voice response system 100, the calculation unit 101 generates personality information for the user or a related person based on the character strings contained in the input character information.
According to such a voice response system 100, personality information can be generated in the course of the user's using the voice response system 100.
Also, according to such a voice response system 100, responses can be varied according to the personality of the user or of persons connected to the user (related persons). Usability for the user can thus be improved.
[Modification of Sixth Embodiment]
In the sixth embodiment, the response may be narrowed down to one according to the personality before output, or a plurality of responses may each be output in a different voice color.
Of the personality information generation process, the processes S248 and S254 to S264 may be performed in the server 90. In this case, as in the first embodiment and elsewhere, the voice and the questions may be exchanged between the terminal device 1 and the server 90 while the server 90 identifies the terminal device 1.
Furthermore, in the voice response system 100, the server 90 may detect the user's actions or operations and generate learning information or personality information based on them.
According to such a voice response system 100, for example, when it detects that the user has been dashing to catch the train for several days in a row, it can urge the user to leave home a few minutes earlier from the next day; when it detects from conversations that the user tends to get angry easily, it can output calming speech or music.
[Seventh Embodiment]
[Process of Seventh Embodiment]
Next, the voice response system of the seventh embodiment provides a preference DB 108 in which preference information is recorded, associating the preferences of the user and related persons with preset categories. In the preference DB 108, as shown in FIG. 16 for example, the names of the user and related persons are recorded in association with their preferences for each preference type, such as taste in food (food), taste in color (color), and hobbies.
In particular, taste in food is classified as a sweet tooth (sweet), a taste for spicy food (spicy), or in between (average); taste in color as warm colors (warm), cool colors (cool), or in between (average); and hobbies as indoor hobbies (indoor), outdoor hobbies (outdoor), or both indoor and outdoor hobbies (both).
When building such a preference DB 108, the preference information generation process shown in FIG. 17, for example, is executed. The preference information generation process is performed, for example, between S48 and S54.
Specifically, as shown in FIG. 17, keywords relating to preferences are extracted from the character information (S282), and among the objects identified by image processing, those relating to preferences are extracted (S284). In the preference DB 108, preference-related keywords are associated with a preference type and a classification within that type (for food preferences: sweet, average, spicy, and so on); in these processes, an extracted keyword or object is treated as preference-related when it appears in the preference DB 108.
Subsequently, a counter is incremented for each group of preference-related keywords (S288). For example, when something like kimchi is extracted, whose preference type is "taste in food" and whose classification is "spicy", the counter corresponding to "taste in food"/"spicy" is incremented.
Then, based on the counter values, the preference information (preference DB 108) is updated (S290). That is, for each preference type, the classification with the largest counter value is taken to best match the person's preferences and is recorded in the preference DB 108 as a characteristic of the user's or related person's preferences. When this is done, the preference information generation process ends.
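A small sketch of this counting-and-update step (S282 to S290) follows, with a toy keyword table standing in for the preference DB 108; the entries are illustrative assumptions.

```python
from collections import Counter

# Keyword -> (preference type, classification); a toy stand-in for the preference DB 108.
KEYWORD_TABLE = {
    "kimchi": ("food", "spicy"),
    "cake": ("food", "sweet"),
    "camping": ("hobby", "outdoor"),
}

counters = Counter()  # (type, classification) -> occurrence count

def observe(words):
    """S282-S288: count preference-related keywords found in character information."""
    for word in words:
        if word in KEYWORD_TABLE:
            counters[KEYWORD_TABLE[word]] += 1

def update_preferences():
    """S290: per preference type, keep the classification with the largest count."""
    best = {}
    for (ptype, cls), count in counters.items():
        if count > best.get(ptype, (None, -1))[1]:
            best[ptype] = (cls, count)
    return {ptype: cls for ptype, (cls, _) in best.items()}

observe(["kimchi", "kimchi", "cake", "camping"])
print(update_preferences())  # {'food': 'spicy', 'hobby': 'outdoor'}
```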
When the preference DB 108 generated in this way is used, entries associating each preference with a different response are prepared in the response candidate DB 105; in the process of S56, the server 90 acquires response candidates representing a plurality of different responses to the character information, selects the response to output from the candidates according to the preference information, and outputs the selected response in the processes of S60 and S64.
[Effects of Seventh Embodiment]
In the voice response system 100, the server 90 generates preference information indicating the preference tendencies of the user or a related person based on the character strings contained in the character information. It then selects the response to output from the response candidates based on the preference information and outputs the selected response.
According to such a voice response system 100, responses can be made according to the tastes of the user or a related person. For example, when the user is buying a present for a related person and asks the terminal device 1 "What would Mr. ○○ like?", a response reflecting the preference information can be obtained.
[Modification of Seventh Embodiment]
As shown in FIG. 18, the response candidate DB 105 may hold a table associating personality categories with preference information.
For example, in the example shown in FIG. 18, personality categories and color preferences are associated, and products that a woman could be expected to be pleased to receive as a present are arranged in a matrix.
In the process of S56, a response can thus be generated taking both personality and preference into account.
[Eighth Embodiment]
[Process of Eighth Embodiment]
In the above embodiments, voice is converted into character information, but the user's movements may be converted into character information instead.
Specifically, the terminal device 1 captures the user's movements as captured images and transmits them to the server 90, and the server 90 performs, for example, the movement character input process shown in FIG. 19. The movement character input process is started when a part of the user's body appears in the captured image in the process of S48.
In the movement character input process, as shown in FIG. 19, a captured image is first acquired (S302). It is then determined whether the user is trying to input characters by handwriting or by sign language (S304, S308).
In these processes, for example, when the user's upper body appears in the captured image together with the face, it is determined that the user is trying to input characters by sign language; when the user's hand appears in the captured image without the user's face, it is determined that the user is trying to input characters by handwriting.
If the user is trying to input characters by handwriting (S304: YES), the movement of the fingertip or pen tip is recorded (S306), and the movement is converted into character information (S312). Here, the handwritten character/sign language DB 112 associates handwriting movements with characters, and hand movements with the characters they express in sign language. In the process of S312, character information is generated by referring to the handwritten character/sign language DB 112.
If the user is trying to input characters by sign language (S304: NO, S308: YES), the sign language content is recognized by referring to the handwritten character/sign language DB 112, and the process of S312 described above is performed. If the user is trying to input characters neither by handwriting nor by sign language (S308: NO), input processing by another method is performed (S314).
Subsequently, characters input by movement are matched against characters input by voice, and it is determined whether there is a similar voice input (whether the degree of agreement between the reference waveform based on the character and the pronunciation waveform is at or above a reference value) (S316). If there is such a voice input (S316: YES), the accent and pronunciation characteristics of this user when inputting the character are recorded in the learning DB 107 in association with the character (S318), and the movement character input process ends. If there is no such voice input (S316: NO), the movement character input process likewise ends.
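The modality decision and the waveform-agreement test (S304 to S318) might be sketched as below; the visibility flags, the correlation-based similarity measure, and the 0.8 threshold are illustrative assumptions, not the patent's stated method.

```python
LEARNING_DB = {}       # character -> recorded pronunciation features (learning DB 107)
MATCH_THRESHOLD = 0.8  # assumed reference value for waveform agreement

def choose_modality(face_visible, hand_visible):
    """S304/S308: decide the input modality from what the camera sees."""
    if hand_visible and not face_visible:
        return "handwriting"
    if face_visible:
        return "sign_language"
    return "other"

def similarity(reference, observed):
    """Toy agreement score between two equal-length waveforms (0..1)."""
    num = sum(r * o for r, o in zip(reference, observed))
    den = (sum(r * r for r in reference) * sum(o * o for o in observed)) ** 0.5
    return num / den if den else 0.0

def learn_pronunciation(char, reference_wave, spoken_wave):
    """S316-S318: if the spoken waveform agrees, record the user's features."""
    if similarity(reference_wave, spoken_wave) >= MATCH_THRESHOLD:
        LEARNING_DB[char] = {"waveform": spoken_wave}

print(choose_modality(face_visible=False, hand_visible=True))  # handwriting
```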
[Effects of Eighth Embodiment]
In the voice response system 100, the user's movements are converted into character information, so the user can input character information without speaking.
[Modification of Eighth Embodiment]
The movement in this embodiment may be anything arising from muscle movement, not only the handwriting of characters or gestures (for example, sign language).
[Ninth Embodiment]
[Process of Ninth Embodiment]
The contents of the learning DB 107 may be made available on another terminal device 1 when the user uses a terminal device 1 other than the one normally used. In this case, the other terminal device 1 transmits the ID and password of the normally used terminal device 1 to the server 90 together with a usage request.
The server 90 then executes the other-terminal use process shown in FIG. 20. The other-terminal use process is started when a usage request is received.
In the other-terminal use process, as shown in FIG. 20, it is first determined whether an ID and a password have been input (S332). If no ID and password have been input (S332: NO), the process of S332 is repeated.
If an ID and a password have been input (S332: YES), it is determined whether authentication by the ID and password has succeeded (S334). If authentication has succeeded (S334: YES), notice of successful authentication is transmitted to the other terminal device 1 (S336), and the other terminal device 1 is set to use the learning DB 107 of the terminal device 1 corresponding to the ID and password (S338). If authentication does not succeed (S334: NO), an error notice is transmitted to the other terminal device 1 (S340), and the other-terminal use process ends.
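A bare-bones sketch of this authentication flow (S332 to S340); the credential table and grant bookkeeping are invented for the example, and a real system would store salted password hashes rather than plaintext.

```python
CREDENTIALS = {"terminal-001": "secret"}  # ID -> password (toy example only)
GRANTS = {}  # requesting terminal -> learning DB it may use

def other_terminal_use(requesting_id, claimed_id, password):
    """S332-S340: authenticate, then grant access to another terminal's learning DB."""
    if CREDENTIALS.get(claimed_id) != password:              # S334: NO
        return {"to": requesting_id, "result": "error"}      # S340
    GRANTS[requesting_id] = f"learning_db:{claimed_id}"      # S338
    return {"to": requesting_id, "result": "authenticated"}  # S336

print(other_terminal_use("terminal-099", "terminal-001", "secret"))
```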
[Effects of Ninth Embodiment]
In the voice response system 100, the server 90 transfers the learning information of one terminal device 1 to another terminal device 1.
According to such a voice response system 100, even when a user of one terminal device 1 uses another terminal device 1, the learning information recorded for the first terminal device 1 (the learning information recorded in the server 90) can be used. Thus the generation accuracy of character information can be improved even when another terminal device 1 is used. This is particularly effective when the user owns a plurality of terminal devices 1.
Furthermore, in the voice response system 100, the server 90 outputs information about the user in response to inquiries from persons other than the user.
According to such a voice response system 100, if, for example, the contents of the user's meals and the distance of the user's walks are recorded, questions asked at a hospital or the like can be answered on the user's behalf. Information such as the user's health condition and self-introduction may also be learned.
[Modification of the Ninth Embodiment]
As in the ninth embodiment, when a request to end use is received together with an ID and password, use of the learning DB 107 by the terminal device 1 corresponding to that ID and password may be ended (prohibited).
[Tenth Embodiment]
[Process of the Tenth Embodiment]
In the voice response system of the tenth embodiment, the server 90 stores the contents of conversations and asks questions designed to elicit the same contents again. Specifically, the memory confirmation process shown in FIG. 21 is executed in S100 of the automatic conversation server process shown in FIG. 7.
In the memory confirmation process, as shown in FIG. 21, past conversation contents are extracted from the learning DB 107 (S352), and a question whose answer is a keyword contained in one of those conversations is generated (S353). The memory confirmation process then ends.
The questions asked in the memory confirmation process may be, for example, "What was on the menu for dinner yesterday?" or "Where did you go three days ago?"
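As a minimal sketch of this idea, the Python below picks a keyword from a logged conversation and turns it into a recall question; the structure of the log (topic, keyword, days_ago fields) is assumed for illustration only and is not defined in the specification.

```python
import random

# Sketch of the memory confirmation process (S352-S353); the log format is
# an illustrative assumption.
conversation_log = [
    {"days_ago": 1, "topic": "dinner menu", "keyword": "curry"},
    {"days_ago": 3, "topic": "outing destination", "keyword": "the park"},
]


def generate_recall_question(log):
    entry = random.choice(log)         # S352: pick a past conversation
    question = f"What was the {entry['topic']} {entry['days_ago']} day(s) ago?"
    return question, entry["keyword"]  # S353: the keyword is the answer


question, expected_answer = generate_recall_question(conversation_log)
print(question)  # e.g. "What was the dinner menu 1 day(s) ago?"
```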
[Effects of the Tenth Embodiment]
According to such a voice response system 100, the user's memory can be checked and the retention of memories promoted. This is also considered effective in slowing the progression of dementia in elderly users.
[Eleventh Embodiment]
[Process of the Eleventh Embodiment]
The voice response system of the eleventh embodiment is configured so that the user can practice a foreign language using the terminal device 1 and the server 90.
Specifically, the pronunciation determination process 1 shown in FIG. 22, the pronunciation determination process 2 shown in FIG. 23, and the pronunciation determination process 3 shown in FIG. 24 are executed in order. The server 90 executes one of the three processes each time the voice response server process (FIG. 2) runs, and each is executed as the process of S56 described above.
In pronunciation determination process 1, as shown in FIG. 22, a response instructing the user to input a predetermined sentence by voice is generated (S362). For example, a model sentence in the foreign language is generated and the user is prompted to repeat it after the model. Pronunciation determination process 1 then ends.
When voice is input in response, pronunciation determination process 2 is performed. As shown in FIG. 23, the accuracy of the pronunciation and accent is scored (S372): the input speech is treated as a waveform, and the degree to which it matches the waveform of the model sentence is converted into a score.
The score is recorded in memory (S374), and pronunciation determination process 2 ends. Pronunciation determination process 3 follows. As shown in FIG. 24, it is first determined whether the score is below a threshold (S382).
If the score is below the threshold (S382: YES), a response instructing the user to input the same sentence again is generated (S384), for example prompting the user to repeat after the model once more. If the score is at or above the threshold (S382: NO), a response is generated noting that the pronunciation was good and prompting the user to move on to the next sentence (S386), for example, "Good pronunciation. Let's move on."
When this processing ends, pronunciation determination process 3 ends.
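One plausible reading of the waveform comparison in S372 is a normalized cross-correlation between the user's recording and the model sentence, as in the sketch below; the actual scoring method and the threshold value of S382 are not specified in this document.

```python
import numpy as np

# Hedged sketch of the scoring step (S372) and the threshold branch (S382).


def pronunciation_score(user_wave, model_wave):
    """Score 0-100 from the normalized cross-correlation of two waveforms."""
    n = min(len(user_wave), len(model_wave))
    u = user_wave[:n] - user_wave[:n].mean()
    m = model_wave[:n] - model_wave[:n].mean()
    denom = np.linalg.norm(u) * np.linalg.norm(m)
    if denom == 0.0:
        return 0.0
    similarity = float(np.dot(u, m) / denom)  # -1.0 .. 1.0
    return max(similarity, 0.0) * 100.0


THRESHOLD = 70.0  # illustrative value; the document does not give one


def respond_to_score(score):
    if score < THRESHOLD:  # S382: YES -> repeat the sentence (S384)
        return "Please repeat the sentence after the model once more."
    return "Good pronunciation. Let's move on."  # S386


model = np.sin(np.linspace(0.0, 2.0 * np.pi, 100))
user = model + 0.1 * np.random.randn(100)
print(respond_to_score(pronunciation_score(user, model)))
```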
[Effects of the Eleventh Embodiment]
In the voice response system 100, the server 90 detects how accurate the pronunciation and accent of the voice input by the user are, and outputs the detected degree of accuracy.
According to such a voice response system 100, the accuracy of pronunciation and accent can be checked, which is useful, for example, when practicing a foreign language.
Furthermore, in the voice response system 100, the server 90 has the same question output again when the degree of accuracy is at or below a fixed value.
According to such a voice response system 100, an accurate answer can be elicited by outputting the same question again.
[Modification of the Eleventh Embodiment]
In the voice response system 100, when the degree of accuracy is at or below a fixed value, the server 90 may output, for confirmation, speech containing the word closest to the pronunciation the user actually produced.
According to such a voice response system 100, the user can confirm the accuracy of his or her pronunciation and accent.
[Twelfth Embodiment]
[Process of the Twelfth Embodiment]
Next, the voice response system of the twelfth embodiment will be described. This system detects the user's emotion from the input voice and generates a response that soothes the user according to that emotion.
Specifically, the emotion determination process shown in FIG. 25 and the emotion response generation process shown in FIG. 26 are executed. The emotion determination process is performed as the detail of the process of S50 described above. As shown in FIG. 25, the emotion is first scored from the tone of voice, the strength of sentence endings, the length of each sentence, the speed of conversation, involuntary utterances, and the like (S392); the emotion is then classified according to the score and recorded in memory (S394).
When this processing ends, the emotion determination process ends. Subsequently, the emotion response generation process is executed in the process of S56 described above.
Specifically, as shown in FIG. 26, the emotion category set in the emotion determination process is first determined (S412). If the category is normal (S412: normal), an ordinary greeting such as "Hello" is generated as the response (message) (S414).
If the category is anger (S412: anger), a sentence intended to calm the other party, such as "Have I upset you?", is generated as the response (S416). If the category is joy (S412: joy), a greeting with a brighter nuance than an ordinary one, such as "It's a fun day, isn't it?", is generated (S418).
If the category is confusion (S412: confusion), a greeting showing concern for the other party, such as "Is something the matter?", is generated (S420). When this processing ends, the emotion response generation process ends.
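The branch of FIG. 26 reduces to a simple lookup, as in the sketch below; how the raw score of S392 maps to a category is an assumption made for illustration, since the document does not specify it.

```python
# Sketch of the emotion branch (S412-S420) with an assumed score-to-category
# mapping standing in for S392-S394.

RESPONSES = {
    "normal":    "Hello.",                     # S414
    "anger":     "Have I upset you?",          # S416
    "joy":       "It's a fun day, isn't it?",  # S418
    "confusion": "Is something the matter?",   # S420
}


def classify_emotion(score):
    """Hypothetical mapping from the emotion score (S392) to a category."""
    if score < 25:
        return "anger"
    if score < 50:
        return "confusion"
    if score < 75:
        return "normal"
    return "joy"


def emotion_response(score):
    category = classify_emotion(score)  # S412: determine the category
    return RESPONSES[category]


print(emotion_response(80.0))  # -> "It's a fun day, isn't it?"
```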
[Effects of the Twelfth Embodiment]
In the voice response system 100, the server 90 detects the user's irritation or agitation by detecting involuntary utterances, and generates a message to ease that irritation or agitation.
According to such a voice response system 100, a user's irritation or agitation can be calmed when it arises, which in turn helps prevent trouble between the user and those around them.
[Thirteenth Embodiment]
[Process of the Thirteenth Embodiment]
Next, the voice response system of the thirteenth embodiment will be described. This system guides the user to an object appearing in a captured image. The process is performed by the server 90 as the detail of the process of S56 described above.
When the user says to the terminal device 1, for example, "Please guide me to the tower I can see," the guidance process is carried out in S56. In the guidance process, as shown in FIG. 27, terminal position information is first acquired from the GPS receiver 27 or the like of the terminal device 1 (S432).
Then, based on the voice (character information) and image processing, the target object is identified among the objects in the captured image, and its position is determined (S434). In this step, the object's position is located in map information (which may be acquired externally or held by the server 90) from the object's shape, relative position, and so on. For example, if a tower appears in the captured image, that tower is identified on the map from the position of the terminal device 1 and the shape of the tower.
Next, a route to the object is searched for (S436), and route information is acquired (S438). This can be realized with the same processing used in known cloud-based navigation devices.
A response for guiding the user along the route is then generated (S440); here, too, a response similar to the guidance given by a navigation device suffices.
When this processing ends, the guidance process ends. When the guidance process is carried out while the user is moving, the automatic conversation server process may be used to play each message, with the user's arrival at a point requiring guidance serving as the playback condition.
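At a high level, the guidance process chains the steps S432 to S440, as in the sketch below; every helper is a placeholder standing in for GPS, image processing, and a cloud-style route search, none of whose interfaces are defined in the text.

```python
from dataclasses import dataclass

# Hedged, stubbed sketch of the guidance process (S432-S440).


@dataclass
class Target:
    name: str
    map_position: tuple  # (latitude, longitude)


def identify_object(utterance, captured_image, position):
    """S434: stand-in for matching the spoken object to a map location."""
    return Target(name="tower", map_position=(35.0, 137.0))


def search_route(origin, destination):
    """S436/S438: stand-in for a cloud-style route search."""
    return [origin, destination]  # a trivial two-point "route"


def guide_to_visible_object(utterance, captured_image, origin):
    target = identify_object(utterance, captured_image, origin)  # S434
    route = search_route(origin, target.map_position)            # S436/S438
    return f"Guiding you to the {target.name} via {len(route)} waypoint(s)."  # S440


print(guide_to_visible_object("Please guide me to the visible tower",
                              captured_image=None, origin=(35.1, 137.1)))
```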
[Effects of the Thirteenth Embodiment]
In the voice response system 100, when character information is input, the server 90 generates a response that reflects a captured image of the surroundings of the voice response system 100, and has the response output by voice.
According to such a voice response system 100, a response can be output by voice according to the captured image, which improves usability compared with a configuration that generates responses from character information alone.
Also, in the voice response system 100, the server 90 searches the captured image by image processing for an object mentioned in the character information, identifies the position of the object found, and guides the user to that position.
According to such a voice response system 100, the user can be guided to an object in the captured image.
Furthermore, when guiding the user to a destination, the server 90 acquires route information such as the weather, temperature, humidity, traffic information, and road surface conditions along the way, and has this route information output by voice.
According to such a voice response system 100, the conditions along the way to the destination (the route information) can be conveyed to the user by voice.
[Modification of the Thirteenth Embodiment]
In addition to the above configuration, character information may be input asking the system what it has recognized, and the system may then output by voice what (or whom) it recognizes in the captured image.
Furthermore, in the voice response system 100, the server 90 may, instead of the process of S48, acquire a moving image capturing the shape of the user's mouth while character information is being input by voice. In that case, instead of the process of S52, the voice is converted into character information, and unclear parts of the voice are estimated from the moving image so that the character information can be corrected.
According to such a voice response system 100, the content of an utterance can be estimated from the shape of the mouth, so unclear parts of the voice can be estimated well.
[Fourteenth Embodiment]
[Process of the Fourteenth Embodiment]
Next, the voice response system of the fourteenth embodiment will be described. This system requests the user to perform a predetermined action and determines whether the user has acted as requested. In this configuration, within the automatic conversation terminal process shown in FIG. 6 and the automatic conversation server process shown in FIG. 7, the movement request process 1 shown in FIG. 28 and the movement request process 2 shown in FIG. 29 are carried out in order as the detail of the process of S56 described above.
First, when the process of S54 ends, movement request process 1 starts. As shown in FIG. 28, it outputs a response (message) instructing the user to move his or her gaze or head to a predetermined position (S452), after which movement request process 1 ends.
The next time the process of S54 ends, movement request process 2 starts. As shown in FIG. 29, it determines whether the gaze or head has moved to the instructed position (S462). In this step, the user's movement is detected by image-processing the image captured by the camera or by using the detection results of the various sensors of the terminal device 1. When the gaze is detected by image processing, a known gaze recognition technique may be employed.
If the gaze or head has not moved as instructed (S462: NO), the response generated in S452 is output again (S464). If it has moved as instructed (S462: YES), another arbitrary response is generated (S466).
When this processing ends, movement request process 2 ends.
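The repeat-until-compliance logic of S452 to S466 can be pictured as a small loop; in the sketch below the gaze detector is simulated with a random stub, whereas in the document it would come from image processing of the camera image or from the terminal's sensors.

```python
import random

# Hedged sketch of movement request processes 1 and 2 (S452-S466).


def gaze_moved_as_instructed():
    """Stand-in for the gaze/head detection of S462."""
    return random.random() > 0.5


def request_gaze_movement(max_attempts=3):
    prompt = "Please look at the right-hand mirror."  # S452 (example wording)
    for _ in range(max_attempts):
        print(prompt)                    # output the request (S452/S464)
        if gaze_moved_as_instructed():   # S462: YES
            return "Thank you."          # S466: another arbitrary response
        # S462: NO -> fall through and output the same request again (S464)
    return "Gaze did not move as requested."


print(request_gaze_movement())
```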
[Effects of the Fourteenth Embodiment]
In the voice response system 100, the user's gaze is detected, and if the gaze does not move to a predetermined position in response to a prompt, a voice requesting the user to move the gaze to that position is output.
According to such a voice response system 100, the user can be made to look at a specific position, so that, for example, safety checks while driving a vehicle can be performed reliably.
In the voice response system 100, the server 90 also observes the positions of the user's body parts and facial expression, and if these change little in response to a prompt, outputs a voice requesting the user to change the position of a body part or the facial expression.
According to such a voice response system 100, the user can be guided to move a body part to a specific position or to make a specific facial expression. The present invention can be used, for example, when driving a vehicle or during a physical examination.
[Fifteenth Embodiment]
[Process of the Fifteenth Embodiment]
Next, the voice response system of the fifteenth embodiment will be described. When the user inputs a broadcast program or a piece of music as audio, this system performs a process that fills the gap if the broadcast program or music is interrupted.
In this configuration, the broadcast/music complementing process shown in FIG. 30 is performed as the detail of S56 described above. As shown in FIG. 30, it is first determined whether the broadcast program or the music (or the user's singing, if the user is singing) has been interrupted (S482).
If it has been interrupted (S482: YES), the broadcast program or music synchronized in the process of S492 described below is set as the response content (S484), and the broadcast/music complementing process ends. If it has not been interrupted (S482: NO), the broadcast program is acquired if a broadcast program is being viewed (S486), and the corresponding piece of music is acquired if music is being played (S488).
Here, the karaoke DB 116 records music and lyrics in association with each other; when music is acquired in this step, the version with lyrics is acquired.
Next, the broadcast program or music the user is watching or listening to is identified (S490). That program or music is then acquired and prepared so that it can be played in synchronization with what the user is watching or listening to (S492), and the broadcast/music complementing process ends.
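Conceptually, S482 to S492 amount to keeping a synchronized copy of the stream and substituting it whenever the user's copy drops out. The sketch below models stream chunks as strings and assumes the synchronization of S492 has already aligned the two sequences.

```python
# Hedged sketch of the complementing decision (S482-S484).


def complement_stream(user_chunks, server_chunks):
    """Yield the user's stream, filling gaps from the synchronized copy."""
    for user_chunk, server_chunk in zip(user_chunks, server_chunks):
        if user_chunk is None:  # S482: the stream is interrupted
            yield server_chunk  # S484: play the synchronized copy
        else:
            yield user_chunk


# Example: the third chunk of the broadcast was lost on the user's side.
received = ["intro", "verse", None, "chorus"]
synchronized = ["intro", "verse", "bridge", "chorus"]
print(list(complement_stream(received, synchronized)))
# -> ['intro', 'verse', 'bridge', 'chorus']
```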
[Effects of the Fifteenth Embodiment]
In the voice response system 100, the server 90 acquires the same broadcast program as the one the user is viewing, and if the broadcast is interrupted, fills the gap by outputting the copy it has acquired.
According to such a voice response system 100, the broadcast program the user is viewing can be supplemented so that it is not interrupted.
Also, in the voice response system 100, when the user sings lyrics over a piece of music that has none, the server 90 compares the version of the music with lyrics against the lyrics the user sings, and outputs the lyrics by voice only in the parts where the user's lyrics are missing.
According to such a voice response system 100, the parts that a user of a so-called karaoke device cannot sing (where the lyrics break off) can be filled in.
[Sixteenth Embodiment]
[Process of the Sixteenth Embodiment]
Next, the voice response system of the sixteenth embodiment will be described. In this system, when a captured image contains characters and the terminal device 1 receives a question from the user about how to read them, information about the characters is acquired from an external source and the reading contained in that information is output by voice.
In this configuration, the character explanation process shown in FIG. 31 is performed as the detail of S56 described above. As shown in FIG. 31, it is first determined whether a question about the reading, such as "How do you read this?", has been received (S502). If so (S502: YES), the reading of the image-recognized characters is looked up on another server or the like connected via the Internet 85 (S504), the reading obtained is set as the response (S506), and the character explanation process ends.
If the question is not about the reading (S502: NO), it is determined whether a question about the meaning of the words, of the kind a dictionary would answer, has been received (S508). If so, the meaning of the image-recognized characters (words) is looked up on another server or the like connected via the Internet 85 (S510), the meaning obtained is set as the response (S512), and the character explanation process ends.
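The two-way dispatch of S502 to S512 can be sketched as below; the lookup table is a local stand-in for the external server reached via the Internet 85, and the keyword matching on the question text is an illustrative simplification.

```python
# Hedged sketch of the character explanation process (S502-S512).

DICTIONARY = {  # stand-in for a lookup on another server via the Internet 85
    "東京": {"reading": "Tokyo", "meaning": "the capital of Japan"},
}


def explain_characters(recognized_text, question):
    entry = DICTIONARY.get(recognized_text)
    if entry is None:
        return "I could not find that word."
    if "read" in question:       # S502: a question about the reading
        return entry["reading"]  # S504-S506
    if "mean" in question:       # S508: a question about the meaning
        return entry["meaning"]  # S510-S512
    return "Please ask about the reading or the meaning."


print(explain_characters("東京", "How do you read this?"))  # -> "Tokyo"
```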
[Effects of the Sixteenth Embodiment]
According to such a voice response system 100, the reading of image-recognized characters is looked up on another server or the like and the result is set as the response, so the user can be taught how to read the characters, the meaning of the words, and so on.
[Seventeenth Embodiment]
[Process of the Seventeenth Embodiment]
Next, the voice response system of the seventeenth embodiment will be described. Based on the sensor values detected by the terminal device 1, the server 90 detects abnormal behavior or an abnormal health condition of the user of the terminal device 1 and issues a report when an abnormality is found.
Specifically, the terminal device 1 performs the behavior response terminal process shown in FIG. 32, and the server 90 performs the behavior response server process. In the behavior response terminal process, as shown in FIG. 32, the outputs of the various sensors mounted on the terminal device 1 are first acquired (S522), together with an image captured by the camera 41 (S524). The acquired sensor outputs and captured image are then sent as packets to the server 90 (S526), and the behavior response terminal process ends.
Next, in the behavior response server process, as shown in FIG. 33, the processes of S42 to S44 described above are performed first. Then, behavior such as wandering is identified from the position information of the terminal device 1 (the detection result of the GPS receiver 27) (S532), and the user's environment is detected from the detection results of the temperature sensors 15 and 19 and the like (S534). An abnormality is then detected (S536).
In this step, an abnormality is detected from the change in position information and the environment. For example, an abnormality is detected when the user does not move in a very hot or very cold place, or when the user is somewhere he or she does not normally go (S536). Alternatively, the position information and environment may be converted into a score, and an abnormality judged to exist when the score falls below a reference value (outside the reference range).
It is then determined whether an abnormality has been detected (S538). If not (S538: NO), the behavior response server process ends. If an abnormality has been detected (S538: YES), a message to that effect is generated (S540), and a predetermined contact is notified (S542). The processes of S62 to S68 described above (excluding S66) are then performed, and the behavior response server process ends.
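The scoring alternative mentioned above might look like the sketch below; the particular point deductions and the reference value are assumptions, since the document says only that position information and environment are scored and compared with a reference value.

```python
# Hedged sketch of the abnormality check (S532-S542).

FAMILIAR_PLACES = {"home", "park"}
SAFE_TEMP_RANGE = (5.0, 35.0)  # degrees Celsius, illustrative


def abnormality_score(place, temperature, moved_recently):
    score = 100
    if place not in FAMILIAR_PLACES:
        score -= 40  # user is somewhere unusual (S532)
    if not (SAFE_TEMP_RANGE[0] <= temperature <= SAFE_TEMP_RANGE[1]):
        score -= 40  # very hot or very cold environment (S534)
    if not moved_recently:
        score -= 30  # user is not moving
    return score


def check_and_report(place, temperature, moved_recently, reference=50):
    if abnormality_score(place, temperature, moved_recently) < reference:  # S538
        message = "Abnormality detected for the user."                     # S540
        print("Notifying registered contact:", message)                   # S542


check_and_report("riverbank", 2.0, moved_recently=False)
```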
[Effects of the Seventeenth Embodiment]
In the voice response system 100, the server 90 detects the user's behavior and the user's surrounding environment, and generates a message according to what is detected.
According to such a voice response system 100, dangerous places, restricted areas, and the like can be announced, and abnormal behavior by the user can be detected.
Furthermore, in the voice response system 100, the server 90 determines the user's health condition from a captured image of the user and generates a message according to that condition.
According to such a voice response system 100, the user's health can be managed.
Also, in the voice response system 100, the server 90 notifies a predetermined contact when the health condition falls below a reference value.
According to such a voice response system 100, a report can be made when the user's health condition is at or below the reference value, so an abnormality can be reported to others at an earlier stage.
[Other Embodiments]
Embodiments of the present invention are in no way limited to those described above and can take various forms as long as they fall within the technical scope of the present invention.
For example, the voice response system 100 may mediate exchanges between two or more parties. Specifically, when vehicles need to yield to each other at an intersection or the like, the terminal devices 1 may negotiate among themselves which vehicle enters the intersection first. In this case, each terminal device 1 transmits to the server 90 its heading and speed as it approaches the intersection, and the server 90 assigns each terminal device 1 a priority according to that heading and approach speed, generating and outputting voice such as "Stop" or "You may proceed" according to the priority.
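As an illustration only, the sketch below assigns priorities from approach speed (faster vehicle first); the actual priority rule is not specified, the document saying only that the server 90 sets priorities from heading and approach speed.

```python
# Hedged sketch of the intersection negotiation example above.


def assign_priorities(vehicles):
    """vehicles: list of dicts with 'id', 'heading_deg', and 'speed_mps'."""
    ranked = sorted(vehicles, key=lambda v: v["speed_mps"], reverse=True)
    return {v["id"]: ("You may proceed" if rank == 0 else "Stop")
            for rank, v in enumerate(ranked)}


print(assign_priorities([
    {"id": "car_a", "heading_deg": 90, "speed_mps": 8.0},
    {"id": "car_b", "heading_deg": 180, "speed_mps": 5.5},
]))
# -> {'car_a': 'You may proceed', 'car_b': 'Stop'}
```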
Also, when the terminal device 1 accepts an incoming call for communication that requires a real-time response, such as voice communication, the call may be accepted only when it is convenient for the user. Specifically, the call may be accepted, on the assumption that the timing is convenient, when the camera 41 can capture the user's face.
Furthermore, some people become displeased when the other party does not answer a call. To ease such feelings, the other party's situation may be conveyed to a caller waiting for an answer. For example, the terminal device 1 may manage the user's schedule; if the user does not answer an incoming call, it may look up what the user is doing, or search the user's schedule for free time, and tell the caller when the user will be able to respond.
Also, when the user does not answer an incoming call, the user's location may be conveyed to the caller. For example, if the user is connected to the Internet or the like via a smartphone or personal computer, it is possible to tell which terminal is being operated, and the user's location can be identified from this information and conveyed to the caller.
Furthermore, whether the user can answer an incoming call may be judged from position information obtained with GPS or the like. From position information it can be determined whether the user is in a car, at home, and so on; for example, if the user is moving or is in bed, it may be judged that the setting is highly public or that the user is asleep and cannot answer. When a call cannot be answered in this way, what the user is doing may be conveyed to the caller as described above.
A configuration that uses security cameras to obtain position information is also conceivable. In recent years, security cameras have been installed in many places, so these cameras, combined with techniques for identifying individuals such as face recognition, can be used to recognize the user's position. Security cameras may also be used to judge the user's situation, such as what the user is doing (whether or not the user can answer the phone). Whether an incoming call can be answered can also be judged from conditions such as whether another fixed-line telephone is in use (an incoming call cannot be answered while the fixed-line telephone is in use).
Furthermore, when the user of a terminal device 1 wants to converse with someone, the learned personality of the user may be used to call, from among an unspecified number of terminal devices, one whose user is estimated to be highly compatible. In such a case, the system may also raise a topic likely to spark lively conversation (a topic both users are interested in, extracted using the learning results).
Also, when the voice response device has not been used for a long time (when the user has not spoken for a reference time or longer), the voice response device may say something to the user. The words to be spoken at this time may be selected using position information such as GPS.
[Correspondence between the means of the claims (the present invention) and the configurations of the embodiments]
The terminal device 1 and the server 90 in the above embodiments correspond to an example of the voice response device of the present invention. The processes of S22 and S56 in the above embodiments correspond to an example of the response acquisition means of the present invention.
Furthermore, the processes of S28, S60, and S64 in the above embodiments correspond to an example of the voice output means of the present invention, and the processes of S2 and S6 correspond to an example of the voice input means.
The process of S14 corresponds to an example of the voice transmission means, and the response candidate DB 105 corresponds to an example of the response recording means.
The process of S56 corresponds to an example of the personality information acquisition means, and the processes of S254, S258, and S260 correspond to an example of the first and second personality information generation means.
The processes of S48 and S56 correspond to an example of the response generation means.
In the modification described above, the process of S48 corresponds to an example of the voice input moving image acquisition means, and the process of S52 corresponds to an example of the character information conversion means.
The preference information generation process corresponds to an example of the preference information generation means, and the process of S56 corresponds to an example of the response candidate acquisition means.
The operation character input process corresponds to an example of the character information generation means, and the other-terminal use process corresponds to an example of the other-device information acquisition means and the transfer means.
The process of S98 corresponds to an example of the playback condition determination means, and the process of S100 corresponds to an example of the message playback means.
The process of S116 corresponds to an example of the unanswered-call transmission means, and the process of S372 corresponds to an example of the utterance accuracy detection means.
The process of S374 corresponds to an example of the accuracy output means, and the process of S204 corresponds to an example of the connection control means.
The process of S50 corresponds to an example of the emotion determination means, and the process of S438 corresponds to an example of the route information acquisition means.
The process of S462 corresponds to an example of the gaze detection means, and the process of S464 corresponds to an example of the gaze movement request transmission means and the change request transmission means.
The process of S486 corresponds to an example of the broadcast program acquisition means, and the process of S484 corresponds to an example of the broadcast program complementing means and the lyrics addition means.
The processes of S504 and S506 correspond to an example of the reading output means, and the processes of S522 and S524 correspond to an example of the behavior and environment detection means.
The process of S538 corresponds to an example of the health condition determination means, the process of S540 corresponds to an example of the health message generation means, and the process of S542 corresponds to an example of the reporting means.

Claims (3)

1. A voice response device that responds by voice to input character information, comprising:
   response acquisition means for acquiring a plurality of different responses to the character information; and
   voice output means for outputting the plurality of different responses in respectively different voice colors.
2. The voice response device according to claim 1, further comprising:
   voice input means for the user to input voice; and
   voice transmission means for transmitting the input voice to an external device that converts the voice into character information, generates a plurality of different responses to the character information, and transmits them to the voice response device,
   wherein the response acquisition means acquires the responses from the external device.
3. The voice response device according to claim 1 or 2, wherein
   the voice response device or the external device comprises response recording means in which, for each of a plurality of pieces of character information, a plurality of different responses including a positive response and a negative response to that character information are recorded,
   the response acquisition means acquires the positive response and the negative response as the plurality of different responses, and
   the voice output means reproduces the positive response and the negative response in different voice colors.
PCT/JP2013/064918 2012-06-18 2013-05-29 Voice response device WO2013190963A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2014521255A JP6267636B2 (en) 2012-06-18 2013-05-29 Voice response device

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
JP2012137065 2012-06-18
JP2012-137067 2012-06-18
JP2012-137066 2012-06-18
JP2012137066 2012-06-18
JP2012137067 2012-06-18
JP2012-137065 2012-06-18

Publications (1)

Publication Number Publication Date
WO2013190963A1 true WO2013190963A1 (en) 2013-12-27

Family

ID=49768566

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2013/064918 WO2013190963A1 (en) 2012-06-18 2013-05-29 Voice response device

Country Status (2)

Country Link
JP (14) JP6267636B2 (en)
WO (1) WO2013190963A1 (en)

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015182177A1 (en) * 2014-05-28 2015-12-03 シャープ株式会社 Electronic device and message system
WO2016052164A1 (en) * 2014-09-30 2016-04-07 シャープ株式会社 Conversation device
JP2016076117A (en) * 2014-10-07 2016-05-12 株式会社Nttドコモ Information processing device and utterance content output method
JP2017084177A (en) * 2015-10-29 2017-05-18 シャープ株式会社 Electronic apparatus and control method thereof
WO2017130497A1 (en) * 2016-01-28 2017-08-03 ソニー株式会社 Communication system and communication control method
JP6205039B1 (en) * 2016-09-16 2017-09-27 ヤフー株式会社 Information processing apparatus, information processing method, and program
JP2018167339A (en) * 2017-03-29 2018-11-01 富士通株式会社 Utterance control program, information processor, and utterance control method
WO2019187590A1 (en) * 2018-03-29 2019-10-03 ソニー株式会社 Information processing device, information processing method, and program
JP2019535037A (en) * 2016-10-03 2019-12-05 グーグル エルエルシー Synthetic Speech Selection for Agents by Computer
US10853747B2 (en) 2016-10-03 2020-12-01 Google Llc Selection of computational agent for task performance
US10854188B2 (en) 2016-10-03 2020-12-01 Google Llc Synthesized voice selection for computational agents
JP2022017561A (en) * 2017-06-14 2022-01-25 ヤマハ株式会社 Information processing unit, output method for singing voice, and program
US11595331B2 (en) 2016-01-28 2023-02-28 Sony Group Corporation Communication system and communication control method
US11663535B2 (en) 2016-10-03 2023-05-30 Google Llc Multi computational agent performance of tasks

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6267636B2 (en) * 2012-06-18 2018-01-24 エイディシーテクノロジー株式会社 Voice response device
EP3766614A4 (en) 2018-03-16 2021-07-14 Sumitomo Electric Hardmetal Corp. Surface coated cutting tool and manufacturing method for same
JPWO2022038720A1 (en) 2020-08-19 2022-02-24
CN115697858A (en) 2020-08-19 2023-02-03 日本烟草产业株式会社 Packaging material for tobacco products and package for tobacco products
WO2023047487A1 (en) 2021-09-22 2023-03-30 株式会社Fuji Situation awareness system, voice response device, and situation awareness method

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000181475A (en) * 1998-12-21 2000-06-30 Sony Corp Voice answering device
JP2007148039A (en) * 2005-11-28 2007-06-14 Matsushita Electric Ind Co Ltd Speech translation device and speech translation method

Family Cites Families (44)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS58104743U (en) * 1982-01-13 1983-07-16 日本精機株式会社 Vehicle audio notification device
JP2912394B2 (en) * 1989-10-04 1999-06-28 株式会社日立製作所 Car phone position notification device
JP3120995B2 (en) 1990-07-03 2000-12-25 本田技研工業株式会社 Paint temperature control system
JP3674990B2 (en) * 1995-08-21 2005-07-27 セイコーエプソン株式会社 Speech recognition dialogue apparatus and speech recognition dialogue processing method
US5766015A (en) * 1996-07-11 1998-06-16 Digispeech (Israel) Ltd. Apparatus for interactive language training
JP3408425B2 (en) * 1998-08-11 2003-05-19 株式会社日立製作所 Automatic mediation method and medium on which processing program is recorded
WO2001024139A1 (en) * 1999-09-27 2001-04-05 Kojima Co., Ltd. Pronunciation evaluation system
JP2001330450A (en) * 2000-03-13 2001-11-30 Alpine Electronics Inc Automobile navigation system
US6731307B1 (en) * 2000-10-30 2004-05-04 Koninklije Philips Electronics N.V. User interface/entertainment device that simulates personal interaction and responds to user's mental state and/or personality
CN1266625C (en) * 2001-05-04 2006-07-26 微软公司 Server for identifying WEB invocation
JP2002342356A (en) * 2001-05-18 2002-11-29 Nec Software Kyushu Ltd System, method and program for providing information
JP2003023501A (en) * 2001-07-06 2003-01-24 Self Security:Kk Safety confirmation support equipment of single life person
JP2003108362A (en) * 2001-07-23 2003-04-11 Matsushita Electric Works Ltd Communication supporting device and system thereof
JP2003108376A (en) * 2001-10-01 2003-04-11 Denso Corp Response message generation apparatus, and terminal device thereof
JP2003216176A (en) * 2002-01-28 2003-07-30 Matsushita Electric Works Ltd Speech controller
JP4086280B2 (en) * 2002-01-29 2008-05-14 株式会社東芝 Voice input system, voice input method, and voice input program
EP1408303B1 (en) * 2002-03-15 2011-12-21 Mitsubishi Denki Kabushiki Kaisha Vehicular navigation device
JP3777337B2 (en) 2002-03-27 2006-05-24 ドコモ・モバイルメディア関西株式会社 Data server access control method, system thereof, management apparatus, computer program, and recording medium
JP2003329477A (en) * 2002-05-15 2003-11-19 Pioneer Electronic Corp Navigation device and interactive information providing program
JP2004021121A (en) * 2002-06-19 2004-01-22 Nec Corp Voice interaction controller unit
JP2004030313A (en) * 2002-06-26 2004-01-29 Ntt Docomo Tokai Inc Method and system for providing service
JP2004046400A (en) * 2002-07-10 2004-02-12 Mitsubishi Heavy Ind Ltd Speaking method of robot
JP2004301942A (en) * 2003-03-28 2004-10-28 Bandai Co Ltd Speech recognition device, conversation device, and robot toy
JP2004364128A (en) * 2003-06-06 2004-12-24 Sanyo Electric Co Ltd Communication system
JP3895758B2 (en) * 2004-01-27 2007-03-22 松下電器産業株式会社 Speech synthesizer
JP2005301914A (en) * 2004-04-15 2005-10-27 Sharp Corp Portable information appliance
JP2005342862A (en) * 2004-06-04 2005-12-15 Nec Corp Robot
JP2005352895A (en) * 2004-06-11 2005-12-22 Kenwood Corp Vehicle driver awakening system
JP4459238B2 (en) * 2004-12-28 2010-04-28 シャープ株式会社 Mobile terminal, communication terminal, location notification system using these, and location notification method
US8983962B2 (en) * 2005-02-08 2015-03-17 Nec Corporation Question and answer data editing device, question and answer data editing method and question answer data editing program
JP2006227846A (en) * 2005-02-16 2006-08-31 Fujitsu Ten Ltd Information display system
JP4586566B2 (en) * 2005-02-22 2010-11-24 トヨタ自動車株式会社 Spoken dialogue system
JP4631501B2 (en) * 2005-03-28 2011-02-16 パナソニック電工株式会社 Home system
JP2008053989A (en) * 2006-08-24 2008-03-06 Megachips System Solutions Inc Door phone system
JP2008153889A (en) * 2006-12-15 2008-07-03 Promise Co Ltd Answering operation mediation system
JP2008152013A (en) * 2006-12-18 2008-07-03 Canon Inc Voice synthesizing device and method
JP5173221B2 (en) * 2007-03-25 2013-04-03 京セラ株式会社 Mobile terminal, information processing system, and information processing method
JP2009093284A (en) * 2007-10-04 2009-04-30 Toyota Motor Corp Drive support device
JP5305802B2 (en) * 2008-09-17 2013-10-02 オリンパス株式会社 Information presentation system, program, and information storage medium
JP2010079149A (en) * 2008-09-29 2010-04-08 Brother Ind Ltd Visitor reception device, person-in-charge terminal, visitor reception method, and visitor reception program
JP5195405B2 (en) * 2008-12-25 2013-05-08 トヨタ自動車株式会社 Response generating apparatus and program
US9092389B2 (en) * 2009-03-16 2015-07-28 Avaya Inc. Advanced availability detection
JP5563422B2 (en) * 2010-10-15 2014-07-30 京セラ株式会社 Electronic device and control method
JP6267636B2 (en) 2012-06-18 2018-01-24 エイディシーテクノロジー株式会社 Voice response device

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000181475A (en) * 1998-12-21 2000-06-30 Sony Corp Voice answering device
JP2007148039A (en) * 2005-11-28 2007-06-14 Matsushita Electric Ind Co Ltd Speech translation device and speech translation method

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106233372B (en) * 2014-05-28 2019-07-26 夏普株式会社 Electronic equipment and message leaving system
JP2015225258A (en) * 2014-05-28 2015-12-14 シャープ株式会社 Electronic apparatus and message system
CN106233372A (en) * 2014-05-28 2016-12-14 夏普株式会社 Electronic equipment and message leaving system
WO2015182177A1 (en) * 2014-05-28 2015-12-03 シャープ株式会社 Electronic device and message system
WO2016052164A1 (en) * 2014-09-30 2016-04-07 シャープ株式会社 Conversation device
JP2016071247A (en) * 2014-09-30 2016-05-09 シャープ株式会社 Interaction device
JP2016076117A (en) * 2014-10-07 2016-05-12 株式会社Nttドコモ Information processing device and utterance content output method
JP2017084177A (en) * 2015-10-29 2017-05-18 シャープ株式会社 Electronic apparatus and control method thereof
US11595331B2 (en) 2016-01-28 2023-02-28 Sony Group Corporation Communication system and communication control method
JPWO2017130497A1 (en) * 2016-01-28 2018-11-22 ソニー株式会社 Communication system and communication control method
WO2017130497A1 (en) * 2016-01-28 2017-08-03 ソニー株式会社 Communication system and communication control method
US11159462B2 (en) 2016-01-28 2021-10-26 Sony Corporation Communication system and communication control method
JP6205039B1 (en) * 2016-09-16 2017-09-27 ヤフー株式会社 Information processing apparatus, information processing method, and program
JP2018045630A (en) * 2016-09-16 2018-03-22 ヤフー株式会社 Information processing device, information processing method, and program
US10853747B2 (en) 2016-10-03 2020-12-01 Google Llc Selection of computational agent for task performance
JP2019535037A (en) * 2016-10-03 2019-12-05 グーグル エルエルシー Synthetic Speech Selection for Agents by Computer
US10854188B2 (en) 2016-10-03 2020-12-01 Google Llc Synthesized voice selection for computational agents
US11663535B2 (en) 2016-10-03 2023-05-30 Google Llc Multi computational agent performance of tasks
JP2018167339A (en) * 2017-03-29 2018-11-01 富士通株式会社 Utterance control program, information processor, and utterance control method
JP2022017561A (en) * 2017-06-14 2022-01-25 ヤマハ株式会社 Information processing unit, output method for singing voice, and program
JP7424359B2 (en) 2017-06-14 2024-01-30 ヤマハ株式会社 Information processing device, singing voice output method, and program
WO2019187590A1 (en) * 2018-03-29 2019-10-03 ソニー株式会社 Information processing device, information processing method, and program

Also Published As

Publication number Publication date
JP2018136545A (en) 2018-08-30
JP2018136546A (en) 2018-08-30
JP6751865B2 (en) 2020-09-09
JP2021184111A (en) 2021-12-02
JP2022062200A (en) 2022-04-19
JP2018136540A (en) 2018-08-30
JP2017215602A (en) 2017-12-07
JPWO2013190963A1 (en) 2016-05-26
JP2019179243A (en) 2019-10-17
JP7231289B2 (en) 2023-03-01
JP2018136541A (en) 2018-08-30
JP2017215603A (en) 2017-12-07
JP2018049285A (en) 2018-03-29
JP6552123B2 (en) 2019-07-31
JP6669951B2 (en) 2020-03-18
JP2023079225A (en) 2023-06-07
JP2018092179A (en) 2018-06-14
JP6267636B2 (en) 2018-01-24
JP2020038387A (en) 2020-03-12
JP6969811B2 (en) 2021-11-24

Similar Documents

Publication Publication Date Title
JP7231289B2 (en) voice response system
US11241789B2 (en) Data processing method for care-giving robot and apparatus
US11004446B2 (en) Alias resolving intelligent assistant computing device
JP7070544B2 (en) Learning device, learning method, speech synthesizer, speech synthesizer
US11430439B2 (en) System and method for providing assistance in a live conversation
US20160379107A1 (en) Human-computer interactive method based on artificial intelligence and terminal device
JP2019049742A (en) Voice response device
CN109313935B (en) Information processing system, storage medium, and information processing method
JP2018133696A (en) In-vehicle device, content providing system, and content providing method
US11270682B2 (en) Information processing device and information processing method for presentation of word-of-mouth information
JP2022006610A (en) Social capacity generation device, social capacity generation method, and communication robot
JP2020166593A (en) User support device, user support method, and user support program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13806621

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2014521255

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 13806621

Country of ref document: EP

Kind code of ref document: A1