WO2019054009A1 - Information processing device, information processing method, and program - Google Patents

Information processing device, information processing method, and program

Info

Publication number
WO2019054009A1
Authority
WO
WIPO (PCT)
Prior art keywords
output
information processing
user
control unit
sentence
Prior art date
Application number
PCT/JP2018/024544
Other languages
English (en)
Japanese (ja)
Inventor
早紀 横山 (Saki Yokoyama)
Original Assignee
ソニー株式会社 (Sony Corporation)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ソニー株式会社 (Sony Corporation)
Publication of WO2019054009A1

Classifications

    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/16 Sound input; Sound output
    • G10L13/027 Concept to speech synthesisers; Generation of natural phrases from machine-based concepts
    • G10L13/08 Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
    • G10L13/10 Prosody rules derived from text; Stress or intonation

Definitions

  • the present disclosure relates to an information processing device, an information processing method, and a program.
  • Patent Document 1 discloses a technique for selecting a presentation level of information based on a gaze state of a user.
  • However, the technique of Patent Document 1 merely selects a presentation level determined in advance based on a gaze state or the like. In practice, the usefulness of information presentation depends on various factors besides the state of gaze, and various output expressions are required according to those factors.
  • the present disclosure proposes a new and improved information processing apparatus, information processing method, and program that can realize more flexible and effective information presentation.
  • According to the present disclosure, there is provided an information processing apparatus including an output control unit that controls output of an output sentence in information presentation to a user, wherein the output control unit dynamically controls an output expression related to the output sentence based on an output context acquired when the output sentence is output.
  • According to the present disclosure, there is also provided an information processing method including a processor controlling output of an output sentence in information presentation to a user, the controlling further including dynamically controlling an output expression related to the output sentence based on an output context acquired when the output sentence is output.
  • According to the present disclosure, there is further provided a program for causing a computer to function as an information processing apparatus including an output control unit that controls output of an output sentence in information presentation to a user, the output control unit dynamically controlling an output expression related to the output sentence based on an output context acquired when the output sentence is output.
  • various devices for presenting information to users have become widespread. Examples of the above-described device include an agent device that presents information to the user using speech and visual information.
  • The agent device can, for example, output news or messages and respond to user inquiries by means of speech utterances, display of visual information, and the like.
  • the usefulness of the presented information depends on various factors other than the state of gaze.
  • the above factors include, for example, attributes of the user, states such as behavior and emotion, preferences and characteristics, and states of the surrounding environment.
  • An information processing apparatus, an information processing method, and a program according to an embodiment of the present disclosure are conceived based on the above points, and can realize more flexible and effective information presentation.
  • One of the features of the information processing apparatus that realizes the information processing method according to the present embodiment is that it dynamically controls the output expression related to the output sentence based on the output context acquired when outputting the output sentence to the user.
  • the output context refers to various situations when outputting an output sentence.
  • the output context according to the present embodiment includes, for example, a user context indicating user attributes, preferences, characteristics, actions, states, schedules, etc., and an environment context indicating the state of the surrounding environment.
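  • Purely as an illustrative sketch (the present disclosure does not prescribe any concrete data format), the output context described above might be modeled as a pair of structures, one for the user context and one for the environment context; all field names below are hypothetical:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class UserContext:
    # Attributes, preferences, characteristics, actions, states, schedules.
    user_id: str
    attributes: dict = field(default_factory=dict)   # e.g. {"age": 30}
    state: Optional[str] = None                      # e.g. "in_a_hurry"
    action: Optional[str] = None                     # e.g. "jogging"
    schedule: list = field(default_factory=list)     # upcoming events

@dataclass
class EnvironmentContext:
    noise_level_db: Optional[float] = None           # surrounding noise level
    third_party_present: bool = False                # someone besides the user
    location_type: Optional[str] = None              # e.g. "train"

@dataclass
class OutputContext:
    user: UserContext
    environment: EnvironmentContext
```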
  • FIG. 1 is a diagram for describing an outline of output control according to the present embodiment.
  • The upper part of FIG. 1 shows a user U1 who performs a user utterance UO1a related to a schedule inquiry, and the information processing terminal 10, which outputs a response to the user utterance UO1a as a voice utterance SO1a.
  • The upper part of FIG. 1 shows an example in which the user U1 has relatively ample spare time. In this case, the information processing terminal 10 according to the present embodiment can output the voice utterance SO1a, which explains the schedule in detail, based on control by the information processing server 20.
  • On the other hand, the lower part of FIG. 1 shows an example in which the user U1 is in a hurry. In this case, the information processing terminal 10 according to the present embodiment can output the voice utterance SO1b, which explains the schedule only briefly, to the user who performed the user utterance UO1a, based on control by the information processing server 20.
  • the information processing terminal 10 can also transfer detailed schedule information to, for example, a smartphone possessed by the user U1 in order to maintain the integrity of the information.
  • Note that the information processing server 20 may detect that the user U1 is in a hurry based on, for example, image information captured by the information processing terminal 10. The information processing server 20 may also detect that the user U1 is in a hurry by analyzing sound information of the user utterance UO1a collected by the information processing terminal 10.
  • Alternatively, the information processing server 20 may determine that the user U1 is in a hurry based on, for example, schedule information registered by the user, and cause the information processing terminal 10 to output the voice utterance SO1b that briefly explains the schedule.
  • the information processing server 20 may not necessarily output the voice utterance SO1b as a response to the user utterance UO1a.
  • the information processing server 20 can also cause the information processing terminal 10 to output the voice utterance SO1b spontaneously.
  • For example, the information processing server 20 may warn the user U1 by forming the voice utterance SO1b as an output sentence to which words such as "Aren't you forgetting something?" or "Hurry up!" are added. As described above, the information processing server 20 according to the present embodiment can realize more natural and effective information presentation by dynamically controlling the output expression of the output sentence based on the output context.
  • FIG. 2 is a block diagram showing an exemplary configuration of the information processing system according to the present embodiment.
  • the information processing system according to the present embodiment includes an information processing terminal 10, an information processing server 20, and a sensor device 30.
  • the information processing terminal 10 and the information processing server 20, and the information processing server 20 and the sensor device 30 are connected so as to be able to communicate with each other via the network 40.
  • the information processing terminal 10 is an information processing apparatus that presents information to a user using voice and visual information based on control by the information processing server 20.
  • the information processing terminal 10 according to the present embodiment is characterized in that the information presentation described above is performed based on the output sentence and the output expression dynamically determined by the information processing server 20 based on the output context.
  • the information processing terminal 10 according to the present embodiment can be realized as various devices having a function of outputting voice and visual information.
  • the information processing terminal 10 according to the present embodiment may be, for example, a mobile phone, a smartphone, a tablet, a wearable device, a general-purpose computer, or a dedicated device of a stationary type or an autonomous moving type.
  • the information processing terminal 10 has a function of collecting various information related to the user and the surrounding environment.
  • the information processing terminal 10 collects, for example, sound information including an utterance of the user, image information obtained by imaging the user and the surroundings, and various other sensor information, and transmits the collected information to the information processing server 20.
  • the information processing server 20 is an information processing apparatus having a function of controlling output of an output sentence in information presentation to a user. At this time, the information processing server 20 according to the present embodiment is characterized by dynamically controlling an output expression related to the output sentence based on the output context acquired when outputting the output sentence.
  • That is, the information processing server 20 according to the present embodiment can acquire an output context based on the sound information, image information, sensor information, and the like collected by the information processing terminal 10 and the sensor device 30, and can control the output expression of an output sentence based on the acquired output context.
  • the sensor device 30 has a function of collecting sound information, image information, and sensor information used for acquiring an output context by the information processing server 20.
  • The sensor device 30 according to the present embodiment can be realized as various devices having the above-described function.
  • the sensor device 30 may be, for example, a home appliance, a game device, an office device, or the like.
  • The network 40 has a function of connecting the information processing terminal 10 with the information processing server 20, and the information processing server 20 with the sensor device 30.
  • the network 40 may include the Internet, a public line network such as a telephone network, a satellite communication network, various LANs (Local Area Networks) including Ethernet (registered trademark), a WAN (Wide Area Network), and the like.
  • the network 40 may include a dedicated line network such as an Internet Protocol-Virtual Private Network (IP-VPN).
  • IP-VPN Internet Protocol-Virtual Private Network
  • the network 40 may also include a wireless communication network such as Wi-Fi (registered trademark) or Bluetooth (registered trademark).
  • the configuration example of the information processing system according to the present embodiment has been described above.
  • the configuration described above with reference to FIG. 2 is merely an example, and the configuration of the information processing system according to the present embodiment is not limited to such an example.
  • the functions of the information processing terminal 10 and the information processing server 20 according to the present embodiment may be realized by a single device.
  • the information processing system according to the present embodiment may not necessarily include the sensor device 30.
  • The configuration of the information processing system according to the present embodiment can be flexibly modified in accordance with specifications and operation.
  • FIG. 3 is a block diagram showing an example of a functional configuration of the information processing terminal 10 according to the present embodiment.
  • Referring to FIG. 3, the information processing terminal 10 according to the present embodiment includes a display unit 110, an audio output unit 120, a voice input unit 130, an imaging unit 140, a sensor unit 150, a control unit 160, and a server communication unit 170.
  • the display unit 110 has a function of outputting visual information such as an image or text.
  • the display unit 110 according to the present embodiment displays, for example, a text corresponding to an output sentence and an image including the output sentence based on control by the information processing server 20.
  • the display unit 110 includes a display device or the like that presents visual information.
  • Examples of the display device include a liquid crystal display (LCD) device, an organic light emitting diode (OLED) device, and a touch panel.
  • the display unit 110 according to the present embodiment may output visual information by a projection function.
  • The audio output unit 120 according to the present embodiment has a function of outputting various sounds including voice utterances.
  • The audio output unit 120 according to the present embodiment outputs, for example, a voice utterance corresponding to the output sentence based on control by the information processing server 20.
  • the audio output unit 120 according to the present embodiment includes an audio output device such as a speaker or an amplifier.
  • the voice input unit 130 has a function of collecting sound information such as an utterance by a user and an ambient sound generated around the information processing terminal 10.
  • the sound information collected by the voice input unit 130 is used for voice recognition by the information processing server 20, recognition of the surrounding environment, and the like.
  • the voice input unit 130 according to the present embodiment includes a microphone for collecting sound information.
  • The imaging unit 140 according to the present embodiment has a function of capturing an image of the user or of the surrounding environment.
  • the image information captured by the imaging unit 140 is used for action recognition and state recognition of the user by the information processing server 20, and recognition of the surrounding environment.
  • the imaging unit 140 according to the present embodiment includes an imaging device capable of capturing an image. Note that the above image includes moving images as well as still images.
  • The sensor unit 150 according to the present embodiment has a function of collecting various sensor information regarding the surrounding environment and the user's behavior and state. The sensor information collected by the sensor unit 150 is used by the information processing server 20 for recognition of the surrounding environment and for recognition of the user's behavior and state.
  • the sensor unit 150 includes, for example, an optical sensor including an infrared sensor, an acceleration sensor, a gyro sensor, a geomagnetic sensor, a heat sensor, a vibration sensor, a Global Navigation Satellite System (GNSS) signal receiving device, and the like.
  • The control unit 160 according to the present embodiment has a function of controlling each component included in the information processing terminal 10.
  • The control unit 160 controls, for example, the start and stop of each component. The control unit 160 also inputs control signals generated by the information processing server 20 to the display unit 110 and the audio output unit 120. Moreover, the control unit 160 according to the present embodiment may have a function equivalent to that of the output control unit 250 of the information processing server 20 described later.
  • The server communication unit 170 according to the present embodiment has a function of performing information communication with the information processing server 20 via the network 40. Specifically, the server communication unit 170 transmits to the information processing server 20 the sound information collected by the voice input unit 130, the image information captured by the imaging unit 140, and the sensor information collected by the sensor unit 150. The server communication unit 170 also receives from the information processing server 20 control signals and the like related to the output sentence.
  • the example of the functional configuration of the information processing terminal 10 according to the present embodiment has been described above.
  • the above configuration described with reference to FIG. 3 is merely an example, and the functional configuration of the information processing terminal 10 according to the present embodiment is not limited to such an example.
  • the information processing terminal 10 according to the present embodiment may not necessarily include all of the configurations shown in FIG. 3.
  • the information processing terminal 10 may be configured not to include the display unit 110, the sensor unit 150, and the like.
  • the control unit 160 according to the present embodiment may have the same function as the output control unit 250 of the information processing server 20.
  • The functional configuration of the information processing terminal 10 according to the present embodiment can be flexibly modified in accordance with specifications and operation.
  • FIG. 4 is a block diagram showing an example of a functional configuration of the information processing server 20 according to the present embodiment.
  • Referring to FIG. 4, the information processing server 20 according to the present embodiment includes a user recognition unit 210, an environment recognition unit 220, a speech recognition unit 230, a context acquisition unit 240, an output control unit 250, a user information storage unit 260, a parameter storage unit 270, and a communication unit 280.
  • The user recognition unit 210 according to the present embodiment has a function of performing various kinds of recognition related to the user. For example, the user recognition unit 210 can recognize the user by comparing the user's speech or image collected by the information processing terminal 10 or the sensor device 30 with the user's voice features or images stored in advance in the user information storage unit 260.
  • the user recognition unit 210 can recognize the user's action or state based on the sound information, the image information, and the sensor information collected by the information processing terminal 10 and the sensor device 30.
  • the user recognition unit 210 recognizes the movement and behavior of the user based on, for example, the collected image information and sensor information.
  • the user recognition unit 210 can recognize that the user is jogging, based on the acceleration information and the angular velocity information collected by the information processing terminal 10.
  • the user recognition unit 210 may recognize that the user is playing a game based on the operating status transmitted from the sensor device 30 which is a game device.
  • the user recognition unit 210 also recognizes various states relating to the user based on, for example, image information, sound information, and the like.
  • the user recognition unit 210 may recognize, for example, the user's gaze, expression, emotion, and the like based on the collected image information.
  • the environment recognition unit 220 has a function of performing various recognitions related to the surrounding environment based on sound information, image information, and sensor information collected by the information processing terminal 10 and the sensor device 30.
  • For example, the environment recognition unit 220 may recognize the surrounding noise level based on the sound information collected by the information processing terminal 10, or may recognize, based on the image information and the sensor information, that a third party other than the user is present in the surroundings.
  • the environment recognition unit 220 can also estimate the characteristics of the place where the user is located based on the image information and the sensor information. For example, the environment recognition unit 220 may estimate that the user is on a train or is in a busy street with many people.
  • the voice recognition unit 230 has a function of recognizing the user's speech based on the sound information collected by the information processing terminal 10.
  • More specifically, the speech recognition unit 230 according to the present embodiment has a speech section detection function that detects the section in which the user is speaking, a speech recognition function that converts sound information into text, and an intention analysis function that analyzes the utterance intention from the converted text.
  • In addition, the speech recognition unit 230 according to the present embodiment detects the user's utterance style.
  • The above utterance style includes, for example, information such as the length, volume, speed, pitch, and tone of the utterance.
  • The utterance style may also include, for example, information such as the time it took the user to speak after the information processing terminal 10 output a voice utterance, and whether the user interrupted (barged in on) that voice utterance.
  • the context acquisition unit 240 has a function of acquiring an output context based on the results of various recognition performed by the user recognition unit 210, the environment recognition unit 220, and the speech recognition unit 230.
  • the context acquisition unit 240 according to the present embodiment can dynamically acquire the situation related to the output of the output sentence, that is, the output context, based on the user or the surrounding state, the input user's utterance, and the like.
  • the context acquisition unit 240 may acquire an output context based on information acquired from another application.
  • the context acquisition unit 240 can acquire, for example, user's schedule information, traffic jam information on the user's travel route, and the like from each application and comprehensively acquire an output context.
  • Furthermore, the context acquisition unit 240 may acquire the output context in consideration of the past history of the user's states, habits, characteristics, and the like stored in the user information storage unit 260. For example, the same action may carry a different intention or meaning depending on the user. As an example, an action (such as a gesture or facial expression) that one user performs when feeling anxious may be an action that another user performs when feeling angry. For this reason, the context acquisition unit 240 according to the present embodiment can comprehensively acquire the output context in consideration of the past history and habits of each user, thereby estimating the situation with higher accuracy.
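  • As a non-authoritative sketch of the behavior just described, and reusing the hypothetical context structures from the earlier listing, the context acquisition unit 240 might merge current recognition results with a per-user history before emitting an output context:

```python
def acquire_output_context(user_recog: dict, env_recog: dict,
                           user_history: dict) -> OutputContext:
    """Combine recognition results with per-user history (hypothetical logic).

    The same observed action can carry different meanings for different
    users, so the stored per-user history is consulted to reinterpret
    the raw recognition result.
    """
    raw_state = user_recog.get("state")       # e.g. "frowning"
    habits = user_history.get("gesture_meanings", {})
    state = habits.get(raw_state, raw_state)  # e.g. "frowning" -> "anxious"
    return OutputContext(
        user=UserContext(user_id=user_recog["user_id"], state=state,
                         action=user_recog.get("action")),
        environment=EnvironmentContext(
            noise_level_db=env_recog.get("noise_db"),
            third_party_present=env_recog.get("third_party", False),
            location_type=env_recog.get("location"),
        ),
    )
```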
  • The output control unit 250 according to the present embodiment has a function of controlling the output of an output sentence in information presentation to the user. At this time, the output control unit 250 according to the present embodiment dynamically controls the output expression related to the output sentence based on the output context acquired by the context acquisition unit 240 when outputting the output sentence.
  • The above output expression may include, for example, the sentence content of the output sentence. That is, the output control unit 250 according to the present embodiment may dynamically change the sentence content of the output sentence based on the output context acquired by the context acquisition unit 240. According to this function of the output control unit 250, the content itself of the output sentence can be dynamically changed according to the situation, making it possible to present more valuable information to the user.
  • the above-mentioned output expression includes an output mode, an output nuance, an output operation, and the like related to the output sentence. That is, based on the output context acquired by the context acquisition unit 240, the output control unit 250 according to the present embodiment can dynamically change the output mode, the output nuance, and the output operation related to the output sentence.
  • the above output mode refers to an auditory or visual expression relating to the output of an output sentence.
  • In the case of a voice utterance, the output control unit 250 can control, for example, the voice quality, volume, prosody, output timing, and effects of the voice utterance.
  • The above prosody includes, for example, the rhythm, stress, and length of the sound.
  • In the case of visual information, the output control unit 250 can control, for example, the font, size, color, character decoration, layout, and animation of the output sentence. According to this function of the output control unit 250 according to the present embodiment, more effective information presentation can be realized by changing the auditory or visual expression of the output sentence according to the situation.
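  • For instance, and only as a sketch (the disclosure does not specify any particular markup or synthesis API), the control of the output mode for a voice utterance might be expressed as speech-synthesis prosody settings derived from the output context; the rule set below and the reuse of the hypothetical OutputContext from the earlier listing are assumptions:

```python
def ssml_for(sentence: str, context: OutputContext) -> str:
    """Derive prosody settings for a voice utterance from the output context."""
    if context.user.state == "in_a_hurry":
        rate, volume = "fast", "loud"      # brief, attention-getting delivery
    elif context.environment.third_party_present:
        rate, volume = "medium", "soft"    # avoid broadcasting details
    else:
        rate, volume = "medium", "medium"
    return (f'<speak><prosody rate="{rate}" volume="{volume}">'
            f'{sentence}</prosody></speak>')
```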
  • the above-mentioned output nuance refers to various expressions for conveying the intention included in the output sentence to the user.
  • the output control unit 250 can realize information presentation with a higher expressive power by controlling the output nuance of the output sentence based on the output context.
  • the control of the output nuance may be realized along with the control of the output mode and the output operation.
  • the above-mentioned output operation refers to the physical operation of the information processing terminal 10 related to the output of the output sentence.
  • The output operation may include, for example, movement of parts such as limbs, and expressions involving gaze, blinking, and the like.
  • the output operation includes, for example, various physical operations using light and vibration. According to the above-described function of the output control unit 250 according to the present embodiment, it is possible to cause the information processing terminal 10 to perform an appropriate output operation according to the situation. Further, the output control unit 250 may control an output operation of a character or the like to be displayed as visual information.
  • the user information storage unit 260 stores various information related to the user.
  • the user information storage unit 260 may store, for example, basic information such as the age and gender of the user, images and sounds of the user, preferences, characteristics, and the like. Also, the user information storage unit 260 stores the past history of the output context for each user.
  • the parameter storage unit 270 stores the history by associating the output sentence generated by the output control unit 250 and the output expression related to the output sentence with the output context. That is, it can be said that the parameter storage unit 270 according to the present embodiment stores an output rule of an output sentence according to each situation.
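  • Schematically, and again reusing the imports and the hypothetical OutputContext from the earlier sketches, one history entry in the parameter storage unit 270 might associate these three elements as follows:

```python
@dataclass
class OutputRecord:
    """One stored output rule: a sentence, the expression used for it,
    and the output context under which it was output."""
    sentence: str
    expression: dict        # e.g. {"rate": "fast", "volume": "loud"}
    context: OutputContext  # the situation at output time
```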
  • the communication unit 280 has a function of performing information communication with the information processing terminal 10 and the sensor device 30 via the network 40. Specifically, the communication unit 280 receives sound information, image information, and sensor information from the information processing terminal 10 and the sensor device 30. The communication unit 280 also transmits a control signal related to the output of the output sentence to the information processing terminal 10.
  • the functional configuration of the information processing server 20 has been described.
  • The functional configuration described above with reference to FIG. 4 is merely an example, and the functional configuration of the information processing server 20 according to the present embodiment is not limited to this example.
  • the information processing server 20 may not necessarily have all of the configurations shown in FIG. 4.
  • the user recognition unit 210, the environment recognition unit 220, the speech recognition unit 230, the context acquisition unit 240, the user information storage unit 260, and the parameter storage unit 270 can be provided in another device different from the information processing server 20.
  • The functional configuration of the information processing server 20 according to the present embodiment can be flexibly modified in accordance with specifications and operation.
  • FIG. 5 is a diagram showing an example of output control based on an output context including the state of another user according to the present embodiment.
  • FIG. 5 shows a situation in which the user U2 interacts with the information processing terminal 10 in the state where the user U1 is present in the surroundings.
  • The upper part of FIG. 5 shows an example in which the user U1 is in the normal state, and the lower part of FIG. 5 shows an example in which the user U1 is about to go out.
  • At this time, the output control unit 250 according to the present embodiment can perform different output control based on the output context, including the state of the user U1, acquired by the context acquisition unit 240. For example, in the case shown in the upper part of FIG. 5, since the output context indicates that the user U1 is in the normal state, the output control unit 250 outputs the output sentence responding to the inquiry of the user utterance UO5a of the user U2 as the voice utterance SO5a. At this time, the output control unit 250 can change the quality, amount, tone, and the like of the output sentence according to the age and knowledge level of the user U2.
  • On the other hand, in the case shown in the lower part of FIG. 5, since the output context indicates that the user U1 is about to go out, the output control unit 250 causes an output sentence urging preparation for going out to be output as the voice utterance SO5b.
  • As described above, the output control unit 250 according to the present embodiment can dynamically change the output expression based not only on the state of the user U2 who interacts with the information processing terminal 10 but also on the state of the user U1 present around the user U2.
  • the output control unit 250 may not necessarily prioritize the response to the user's inquiry.
  • That is, by causing the information processing terminal 10 to output an output sentence assumed to be more valuable according to the output context, the output control unit 250 can realize effective information presentation that responds to needs the user has not explicitly expressed.
  • In addition to the user's state, the output control unit 250 according to the present embodiment may control the output expression of the output sentence based on an output context including past history information of output sentences and information acquired from other applications.
  • FIG. 6A is a diagram showing an example of output control based on an output context including a past history of an output sentence according to the present embodiment.
  • FIG. 6A shows a user U3 who performs a user utterance UO 6a asking a recommended learning course, and the information processing terminal 10 which outputs a voice utterance SO6a as a response to the user utterance UO 6a.
  • Here, based on the output context indicating that an output sentence recommending a beginner's course was output in the past and that the current learning level of the user U3 has improved since that previous output, the output control unit 250 can cause the information processing terminal 10 to output an output sentence that recommends an intermediate course.
  • FIG. 6B is a diagram showing an example of output control based on an output context including information acquired from another application.
  • FIG. 6B shows a user U3 who performs a user utterance UO 6b related to a weather inquiry and an information processing terminal 10 which outputs a voice utterance SO6b which is a response to the user utterance UO 6b.
  • the context acquisition unit 240 acquires an output context including schedule information acquired from the scheduler application.
  • Here, since the output context indicates that the user U3 has a business trip planned, the output control unit 250 may cause the information processing terminal 10 to output an output sentence that includes the weather information of the business trip destination in addition to the weather of the current location.
  • As described above, according to the present embodiment, the output expression of the output sentence can be flexibly changed according to the various situations indicated by the output context, making it possible to realize more valuable information presentation.
  • the control of the output expression according to the present embodiment is not limited to such an example.
  • the output control unit 250 according to the present embodiment can change the output nuance of the output sentence without changing the sentence content.
  • FIGS. 7A and 7B are diagrams for describing control of output nuance according to the present embodiment.
  • FIG. 7A shows the user U3, who performs the user utterance UO7a inquiring about the mood of the user U1, and the information processing terminal 10, which outputs the voice utterance SO7a as a response to the user utterance UO7a.
  • Here, the output control unit 250 can relay the inquiry of the user utterance UO7a to the user U1 at a remote place, obtain an answer from the user U1, and cause the information processing terminal 10 to output an output sentence corresponding to that answer as the voice utterance SO7a.
  • the output control unit 250 may cause the information processing terminal 10 to output the visual information SV7a related to the state of the user U1 together with the speech utterance SO7a.
  • the output control unit 250 can cause the information processing terminal 10 to output an image obtained by imaging the state of the user U1 or the avatar AU1 of the user U1 as the visual information SV7a.
  • In the example shown in FIG. 7A, since the output context including the image information of the user U1 indicates that the user U1 does not appear to be angry, the output control unit 250 may output the answer "not angry at all" obtained from the user U1 with a positive output expression.
  • Next, FIG. 7B shows an example in which the user U1 gives exactly the same answer as in FIG. 7A but with an angry expression.
  • In this case, since the output context including the image information of the user U1 indicates that the user U1 has an angry expression, the output control unit 250 causes the information processing terminal 10 to output the voice utterance SO7b with an output nuance suggesting that the user U1 is in fact angry.
  • For example, the output control unit 250 can express the above suggestion by changing the intonation and pauses of the voice utterance SO7b while maintaining the same sentence content as the voice utterance SO7a.
  • the output control unit 250 according to the present embodiment can flexibly change the output nuance of the output sentence without changing the sentence content. According to the above-described function of the output control unit 250 according to the present embodiment, it is possible to realize more colorful expression based on the output context.
  • the above description has mainly focused on an example in which the user who makes a request such as an inquiry and the user who receives information presentation are the same.
  • However, control of information presentation according to the present embodiment is not limited to this example.
  • The output control unit 250 according to the present embodiment may control information presentation to a target user different from the requesting user, based on a request by the requesting user.
  • For example, the output control unit 250 according to the present embodiment can dynamically control the output expression of an output sentence based on the output context related to the requesting user.
  • FIG. 8 is a diagram for describing output control based on an output context related to a request user.
  • FIG. 8 shows an example of output control in the case where the user U1, a requesting user at a remote location, requests that a message be relayed to the user U2, the target user.
  • the output control unit 250 can dynamically change the output expression including the sentence content of the output sentence based on the output context of the user U1 who is the requesting user.
  • the output control unit 250 may change the output expression of the output sentence, for example, based on the actual state of the user U1 at the time of outputting the output sentence for the user U2.
  • In the example shown in FIG. 8, since the output context including the state of the user U1 indicates that the user U1 is busy, the output control unit 250 causes the information processing terminal 10 to output an output sentence reflecting this change in the situation of the user U1 as the voice utterance SO8a.
  • In this way, according to the present embodiment, instead of simply conveying the requesting user's message to the target user, an output sentence that takes into consideration the actual state of the requesting user at the time of output can be generated.
  • the output control unit 250 may dynamically control the output expression of the output sentence based on the output context related to the target user.
  • FIG. 9 is a diagram for describing output control based on an output context related to a target user.
  • FIG. 9 shows an example of output control in the case where a user U1 who is a requesting user who is at a remote location requests to relay a message to a user U2 who is a target user.
  • the output control unit 250 can dynamically change the output expression including the sentence content of the output sentence based on the output context of the user U2 who is the target user.
  • In the example shown in FIG. 9, the user U1, who is the requesting user, performs the user utterance UO9a instructing an action of the user U2, who is the target user.
  • Specifically, the user utterance UO9a instructs that the user U2 should not eat a snack until the homework is finished.
  • the output control unit 250 may change the output expression of the output sentence based on, for example, the actual state of the user U2 at the time of outputting the output sentence for the user U2.
  • In the example shown in FIG. 9, since the output context including the state of the user U2 indicates that the user U2 has already finished the homework, the output control unit 250 causes the information processing terminal 10 to output an output sentence reflecting the completion of the action as the voice utterance SO9a. Specifically, the output control unit 250 causes the information processing terminal 10 to perform the voice utterance SO9a, which acknowledges that the homework has been finished and indicates that it is now acceptable to eat a snack.
  • In this way, by generating an output sentence that takes into consideration the actual state of the target user at the time of output, the output control unit 250 can, for example, present information more positively to the target user, providing value to both the requesting user and the target user.
  • the output control unit 250 may dynamically control the output expression of the output sentence based on the change of the output context with the passage of time.
  • For example, the output control unit 250 according to the present embodiment can dynamically control the output expression based on the change in the output context between the time a trigger related to the output of the output sentence occurs and the time the output sentence is output.
  • the above-mentioned trigger refers to an event that triggers the output of an output sentence.
  • the trigger according to the present embodiment may be, for example, a request from a user or the like.
  • For example, triggered by a request from the user, the output control unit 250 can cause the information processing terminal 10 to output an output sentence serving as a response to that request.
  • FIGS. 10 and 11 are diagrams for describing output control based on a change in the output context according to the present embodiment.
  • FIG. 10 shows the user U3, who performs a user utterance UO10a requesting a reminder about bringing documents and souvenirs, and the information processing terminal 10, which outputs an output sentence corresponding to the request as the voice utterance SO10a.
  • Here, the output control unit 250 according to the present embodiment can change the output expression, including the sentence content of the output sentence, based on the change in the output context between the detection of the user utterance UO10a, that is, the occurrence of the trigger, and the output of the voice utterance SO10a.
  • More specifically, the output control unit 250 may change the sentence content of the output sentence based on the change in what the user U3 is carrying between the occurrence of the trigger and the output of the output sentence.
  • In the example shown in FIG. 10, based on the fact that the user U3 is now holding the documents that the user U3 was not holding when the trigger occurred, the output control unit 250 changes the sentence content of the output sentence related to the reminder and causes the information processing terminal 10 to output the changed voice utterance SO10a.
  • FIG. 11 shows a user U3 who performs a user utterance UO11a for requesting an explanation regarding an operation procedure, and an information processing terminal 10 which outputs an output sentence corresponding to the request.
  • In the upper part of FIG. 11, the output control unit 250 causes the information processing terminal 10 to output the voice utterance SO11a, which explains all of the preset operation steps without omission.
  • Here, the operation procedure includes accessing a home page, logging in by entering a user name, and selecting a menu.
  • the lower part of FIG. 11 shows an example in which the user U3 voluntarily completes the login process without waiting for an explanation after accessing the home page.
  • In this case, the output control unit 250 according to the present embodiment causes the information processing terminal 10 to output an output sentence that omits the explanation of the login procedure already completed by the user U3.
  • the output control unit 250 can dynamically change the output expression of the output sentence based on the change of the output context that affects the content of at least a part of the output sentence.
  • More specifically, the output control unit 250 according to the present embodiment may dynamically change the output expression of the output sentence based on the progress of a predetermined action by the user between the occurrence of the trigger and the output.
  • the output control unit 250 can dynamically change the output expression of the output sentence based on the detection of the completion of the predetermined action by the user.
  • the predetermined action may be an action corresponding to at least a part of the output sentence.
  • As described above, according to the present embodiment, the output sentence can be flexibly changed according to the change of the output context over time, making it possible to realize more efficient and higher-value information presentation.
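  • The example of FIG. 11 can be summarized in a short sketch: the output sentence is recomputed when it is actually output, dropping any step the user has already completed. The step names and the function below are hypothetical:

```python
def build_procedure_explanation(steps, completed_at_output):
    """Recompute the explanation at output time, omitting completed steps."""
    remaining = [s for s in steps if s not in completed_at_output]
    if not remaining:
        return "You have already completed the procedure."
    return "Next, " + ", then ".join(remaining) + "."

steps = ["access the home page", "log in with your user name", "select the menu"]
# Trigger time: nothing done yet, so every step is explained.
print(build_procedure_explanation(steps, completed_at_output=set()))
# Output time: the user already logged in, so the explanation is shortened.
print(build_procedure_explanation(
    steps, completed_at_output={"access the home page",
                                "log in with your user name"}))
```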
  • the output control unit 250 may dynamically control the output expression of the output sentence based on the output contexts of a plurality of users.
  • FIGS. 12 and 13 are diagrams for describing output control based on output contexts related to a plurality of users.
  • FIG. 12 shows the users U2 and U3, who perform a user utterance UO12a related to a restaurant inquiry, and the information processing terminal 10, which outputs a response to the inquiry as the voice utterance SO12a.
  • FIG. 12 shows an example in which the user U2 is on a diet and the user U3 ate a steak for lunch.
  • At this time, the output control unit 250 according to the present embodiment may dynamically control the output expression of the output sentence so that the total benefit of the plurality of users, that is, the users U2 and U3, increases.
  • Specifically, based on the output context indicating the above situation, the output control unit 250 outputs, as the voice utterance SO12a, an output sentence recommending a Japanese restaurant that is suitable for the user U2, who is on a diet, and that differs in flavor from the meal the user U3 had for lunch.
  • According to this function of the output control unit 250, it is possible to present information estimated to be valuable to both the users U2 and U3, and to provide information of high benefit to more users even when a plurality of users are present.
  • Next, FIG. 13 shows the user U1, who performs the user utterance UO13a asking how crowded a specific restaurant is, the user U3, and the information processing terminal 10, which outputs an answer to the user utterance UO13a as the voice utterance SO13a.
  • FIG. 13 shows an example in which the above specific restaurant is relatively crowded and the user U3 has a meeting scheduled one hour later.
  • In this case, the output control unit 250 according to the present embodiment may dynamically control the output expression of the output sentence so that the total loss of the plurality of users, that is, the users U1 and U3, is reduced.
  • Specifically, based on the output context indicating the above situation, the output control unit 250 causes the information processing terminal 10 to output, as the voice utterance SO13a, an output sentence recommending a less crowded nearby restaurant.
  • At this time, based on the past history related to information presentation, the output control unit 250 may, for example, recommend a restaurant that the user U1 has used and rated highly before.
  • According to this function of the output control unit 250, information presentation that prevents loss for both the users U1 and U3 can be performed, and information of high benefit can be provided to more users even when a plurality of users are present.
  • Note that the context acquisition unit 240 can include, for example, the above-described diet status, meal content, schedule, and the like in the output context based on information acquired from a scheduler application, a messaging application, an SNS, and the like.
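  • As a toy illustration of the FIG. 12 situation (the utility function and all data below are invented for the sketch), maximizing the total benefit of a plurality of users can be reduced to scoring each candidate against every user and picking the best sum; minimizing total loss is the symmetric case with negative scores:

```python
def choose_recommendation(candidates, users, utility):
    """Pick the candidate that maximizes the benefit summed over all users."""
    return max(candidates, key=lambda c: sum(utility(u, c) for u in users))

def utility(user, restaurant):
    score = 0.0
    if user.get("on_diet") and restaurant["calories"] == "light":
        score += 1.0                            # suits the dieting user U2
    if user.get("lunch") == restaurant["cuisine"]:
        score -= 1.0                            # avoid repeating U3's lunch
    return score

users = [{"on_diet": True}, {"lunch": "steak"}]
candidates = [
    {"name": "steak house", "cuisine": "steak", "calories": "heavy"},
    {"name": "Japanese restaurant", "cuisine": "japanese", "calories": "light"},
]
print(choose_recommendation(candidates, users, utility)["name"])
# -> "Japanese restaurant"
```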
  • FIG. 14 is a flowchart showing the flow of output control by the information processing server 20 according to the present embodiment.
  • Referring to FIG. 14, first, the communication unit 280 of the information processing server 20 receives collected information from the information processing terminal 10, the sensor device 30, and the like (S1101).
  • the collected information includes sound information, image information, and other sensor information.
  • Next, the output control unit 250 detects a trigger related to the output of an output sentence based on the recognition result from the speech recognition unit 230 (S1102).
  • the context acquisition unit 240 acquires an output context at the time of occurrence of a trigger (S1103).
  • the context acquisition unit 240 acquires an output context at the time of output of the output sentence based on the control by the output control unit 250 (S1104).
  • the output control unit 250 executes output control of the output sentence based on the output context acquired by the context acquisition unit 240 in steps S1103 and S1104 (S1105).
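  • The flow of FIG. 14 (S1101 to S1105) can be sketched procedurally as follows; every function body here is a hypothetical stand-in for the recognition and control functions described above:

```python
def receive_collected_information():
    # S1101: sound, image, and sensor information from the terminal and sensors.
    return {"sound": "What are my plans today?", "hurried": False}

def detect_trigger(info):
    # S1102: here, a spoken question is treated as the trigger.
    return info["sound"] if info["sound"].endswith("?") else None

def acquire_output_context(info):
    # S1103 / S1104: the context is (re)acquired each time it is needed.
    return {"in_a_hurry": info["hurried"]}

def execute_output_control(trigger, ctx_at_trigger, ctx_at_output):
    # S1105: the output expression is decided from the context at output time.
    if ctx_at_output["in_a_hurry"]:
        return "Meeting at 10 - hurry up!"
    return "You have a meeting at 10:00 and lunch with the team at noon."

info = receive_collected_information()              # S1101
trigger = detect_trigger(info)                      # S1102
if trigger:
    ctx_at_trigger = acquire_output_context(info)   # S1103
    info["hurried"] = True                          # the situation changes
    ctx_at_output = acquire_output_context(info)    # S1104
    print(execute_output_control(trigger, ctx_at_trigger, ctx_at_output))  # S1105
```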
  • FIG. 15 is a block diagram illustrating an exemplary hardware configuration of the information processing terminal 10 and the information processing server 20 according to an embodiment of the present disclosure.
  • The information processing terminal 10 and the information processing server 20 include, for example, a CPU 871, a ROM 872, a RAM 873, a host bus 874, a bridge 875, an external bus 876, an interface 877, an input device 878, an output device 879, a storage 880, a drive 881, a connection port 882, and a communication device 883.
  • the hardware configuration shown here is an example, and some of the components may be omitted. In addition, components other than the components shown here may be further included.
  • the CPU 871 functions as, for example, an arithmetic processing unit or a control unit, and controls the overall operation or a part of each component based on various programs recorded in the ROM 872, the RAM 873, the storage 880, or the removable recording medium 901.
  • the ROM 872 is a means for storing a program read by the CPU 871, data used for an operation, and the like.
  • the RAM 873 temporarily or permanently stores, for example, a program read by the CPU 871 and various parameters appropriately changed when the program is executed.
  • the CPU 871, the ROM 872, and the RAM 873 are mutually connected via, for example, a host bus 874 capable of high-speed data transmission.
  • host bus 874 is connected to external bus 876, which has a relatively low data transmission speed, via bridge 875, for example.
  • the external bus 876 is connected to various components via an interface 877.
  • For the input device 878, for example, a mouse, a keyboard, a touch panel, buttons, switches, levers, and the like are used. Furthermore, as the input device 878, a remote controller capable of transmitting control signals using infrared rays or other radio waves may be used.
  • the input device 878 also includes a voice input device such as a microphone.
  • The output device 879 is a device that can visually or aurally notify the user of acquired information, such as a display device (e.g., a CRT (Cathode Ray Tube) display, an LCD, or an organic EL display), an audio output device (e.g., a speaker or headphones), a printer, a mobile phone, or a facsimile. The output device 879 according to the present disclosure also includes various vibration devices capable of outputting tactile stimulation.
  • the storage 880 is a device for storing various data.
  • As the storage 880, for example, a magnetic storage device such as a hard disk drive (HDD), a semiconductor storage device, an optical storage device, or a magneto-optical storage device is used.
  • the drive 881 is a device that reads information recorded on a removable recording medium 901 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, or writes information on the removable recording medium 901, for example.
  • the removable recording medium 901 is, for example, DVD media, Blu-ray (registered trademark) media, HD DVD media, various semiconductor storage media, and the like.
  • the removable recording medium 901 may be, for example, an IC card equipped with a non-contact IC chip, an electronic device, or the like.
  • The connection port 882 is, for example, a port for connecting an external device 902, such as a USB (Universal Serial Bus) port, an IEEE 1394 port, a SCSI (Small Computer System Interface) port, an RS-232C port, or an optical audio terminal.
  • the external connection device 902 is, for example, a printer, a portable music player, a digital camera, a digital video camera, an IC recorder, or the like.
  • the communication device 883 is a communication device for connecting to a network.
  • The communication device 883 is, for example, a communication card for wired or wireless LAN, Bluetooth (registered trademark), or WUSB (Wireless USB), a router for optical communication, a router for ADSL (Asymmetric Digital Subscriber Line), or a modem for various types of communication.
  • As described above, the information processing server 20 according to an embodiment of the present disclosure has a function of controlling the output of an output sentence in information presentation to the user. At this time, the information processing server 20 dynamically controls the output expression related to the output sentence based on the output context acquired when outputting the output sentence. According to this configuration, more flexible and effective information presentation can be realized.
  • The steps related to the processing of the information processing server 20 in this specification do not necessarily have to be processed in chronological order following the order described in the flowchart.
  • the steps related to the processing of the information processing server 20 may be processed in an order different from the order described in the flowchart or may be processed in parallel.
  • (1) An information processing device including: an output control unit that controls output of an output sentence in presenting information to a user, wherein the output control unit dynamically controls an output expression related to the output sentence based on an output context acquired when outputting the output sentence.
  • (2) The information processing device according to (1), wherein the output expression includes at least the sentence content of the output sentence, and the output control unit dynamically changes the sentence content of the output sentence based on the output context.
  • (3) The information processing device according to (1) or (2), wherein the output expression includes at least one of an output mode, an output nuance, and an output operation related to the output sentence, and the output control unit dynamically changes at least one of the output mode, the output nuance, and the output operation based on the output context.
  • (4) The information processing device according to any one of (1) to (3), wherein the output control unit dynamically controls the output expression based on a change in the output context over time.
  • (5) The information processing device according to any one of (1) to (4), wherein the output control unit dynamically controls the output expression based on a change in the output context between the time of occurrence of a trigger related to the output of the output sentence and the time of output of the output sentence.
  • (6) The information processing device according to (5), wherein the output control unit dynamically changes the output expression based on a change in the output context that affects the content of at least a part of the output sentence.
  • (7) The output control unit dynamically changes the output expression based on the progress of a predetermined action by the user between the occurrence of the trigger and the output.
  • (8) The predetermined action is an action corresponding to at least a part of the output sentence, and the output control unit dynamically changes the output expression based on detection of completion of the predetermined action.
  • (9) The output control unit controls information presentation to a target user based on a request by a requesting user.
  • (10) The output control unit dynamically controls the output expression based on the output context related to the requesting user.
  • (11) The output control unit dynamically controls the output expression based on the output context of the target user.
  • (12) The information processing device according to any one of (9) to (11), wherein the requesting user and the target user are located at places remote from each other.
  • (13) The output control unit dynamically controls the output expression based on output contexts associated with a plurality of users.
  • (14) The output control unit dynamically controls the output expression such that the sum of the benefits of the plurality of users increases.
  • (15) The output control unit dynamically controls the output expression such that the sum of the losses of the plurality of users decreases.
  • (16) The output control unit dynamically controls the output expression based on a past history related to the information presentation.
  • (17) The information processing device according to any one of (1) to (16), wherein the output context includes information related to the user's state, behavior, schedule, and/or environmental state.
  • (18) The information processing device according to any one of (1) to (17), wherein the output control unit dynamically controls an output expression of a speech utterance related to the output sentence.
  • (19) An information processing method including: controlling, by a processor, output of an output sentence in presenting information to a user, the controlling further including dynamically controlling an output expression related to the output sentence based on an output context acquired when outputting the output sentence.
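To make the trigger-time versus output-time control recited in (5) and (7) to (9) above concrete, here is a hedged sketch, with all function and field names invented for illustration: the output context is sampled when the trigger occurs and again at output time, and parts of the output sentence whose corresponding user actions have already been completed are dropped.

    import time

    def build_sentence(pending_actions):
        # Compose the output sentence from the actions still pending.
        return "Please " + " and ".join(pending_actions) + "."

    def output_with_context(get_context, actions, delay_s=0.5):
        get_context()                 # output context at trigger occurrence
        time.sleep(delay_s)           # time passes before the actual output
        ctx = get_context()           # output context at output time
        pending = [a for a in actions if a not in ctx["completed"]]
        if not pending:               # completion detected: change the expression
            return "Everything is already done."
        return build_sentence(pending)

    # If the user closed the window while the output was being prepared,
    # only the remaining action is mentioned at output time.
    print(output_with_context(lambda: {"completed": {"close the window"}},
                              ["close the window", "lock the door"]))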

Landscapes

  • Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

An object of the present invention is to present information more flexibly and effectively. To this end, the invention relates to an information processing device comprising an output control unit that controls the output of an output sentence in presenting information to a user, the output control unit dynamically controlling output expressions related to the output sentence based on an output context acquired when the output sentence is output. The invention also relates to an information processing method that includes a step in which a processor controls the output of the output sentence in presenting information to the user, the controlling step further including dynamically controlling output expressions related to the output sentence based on an output context acquired when the output sentence is output.
PCT/JP2018/024544 2017-09-15 2018-06-28 Information processing device, information processing method, and program WO2019054009A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2017178208 2017-09-15
JP2017-178208 2017-09-15

Publications (1)

Publication Number Publication Date
WO2019054009A1 true WO2019054009A1 (fr) 2019-03-21

Family

ID=65722685

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2018/024544 WO2019054009A1 (fr) 2017-09-15 2018-06-28 Information processing device, information processing method, and program

Country Status (1)

Country Link
WO (1) WO2019054009A1 (fr)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007285976A * 2006-04-19 2007-11-01 Fujitsu Ltd Voice guidance device
JP2014164523A * 2013-02-25 2014-09-08 Sharp Corp Message notification device, control method, and control program
JP2015005058A * 2013-06-19 2015-01-08 Yahoo Japan Corp Recommendation device, recommendation method, and recommendation program
JP2015064689A * 2013-09-24 2015-04-09 Sharp Corp Notification server, notification system, notification method, program, and recording medium
WO2016136062A1 * 2015-02-27 2016-09-01 Sony Corp Information processing device, information processing method, and program
JP2017080374A * 2015-10-27 2017-05-18 Sharp Corp Control device

Similar Documents

Publication Publication Date Title
KR102100742B1 Far-field extension for digital assistant services
KR102334942B1 Data processing method and device for a care robot
US20220284896A1 Electronic personal interactive device
KR102197869B1 Natural assistant interaction
KR102279647B1 Far-field extension for digital assistant services
JP7418526B2 Dynamic and/or context-specific hot words for invoking an automated assistant
JP7247271B2 Proactive incorporation of unsolicited content into human-to-computer dialogs
EP3766066B1 Response generation in a conversation
JP2019220194A Information processing device, information processing method, and program
CN117033578A Proactive assistance based on dialog communication between devices
JPWO2019087811A1 Information processing device and information processing method
WO2016181670A1 Information processing device, information processing method, and program
KR20210137118A System and method for a context-rich attentional memory network with global and local encoding for dialogue breakdown detection
JP2019091387A Information processing device and program
US20230108256A1 Conversational artificial intelligence system in a virtual reality space
KR20240007261A Using large language model(s) in generating automated assistant response(s)
WO2020116026A1 Response processing device, response processing method, and response processing program
JPWO2017175442A1 Information processing device and information processing method
US20200234187A1 Information processing apparatus, information processing method, and program
JP7230803B2 Information processing device and information processing method
DK180835B1 Spoken notifications
WO2019054009A1 Information processing device, information processing method, and program
JPWO2018116556A1 Information processing device and information processing method
US11935449B2 Information processing apparatus and information processing method
US20220108693A1 Response processing device and response processing method

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

NENP Non-entry into the national phase

Ref country code: JP

122 Ep: pct application non-entry in european phase

Ref document number: 18855474

Country of ref document: EP

Kind code of ref document: A1