WO2009087860A1 - Voice interactive device and computer-readable medium containing voice interactive program - Google Patents


Info

Publication number
WO2009087860A1
Authority
WO
WIPO (PCT)
Prior art keywords
context
voice
attribute
determined
determination
Application number
PCT/JP2008/072703
Other languages
French (fr)
Japanese (ja)
Inventor
Akiko Yamato
Original Assignee
Brother Kogyo Kabushiki Kaisha
Application filed by Brother Kogyo Kabushiki Kaisha
Publication of WO2009087860A1

Links

Images

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 15/08: Speech classification or search
    • G10L 2015/088: Word spotting
    • G10L 2015/226: Procedures used during a speech recognition process, e.g. man-machine dialogue, using non-speech characteristics
    • G10L 2015/228: Procedures used during a speech recognition process using non-speech characteristics of application context

Definitions

  • The present invention relates to a voice interactive device and a computer-readable medium storing a voice interactive program. More specifically, it relates to a voice interactive apparatus capable of changing the tone of voice when the conversation content changes, and a computer-readable medium storing a voice interactive program.
  • When humans have a conversation, the tone and tempo change as the conversation content changes. For example, if the topic changes from work to a hobby, the serious tone used while talking about work changes into a joyful, light tone while talking about the hobby.
  • However, an apparatus such as the user support apparatus described in Patent Document 1 interacts with the user with a fixed voice and at a fixed speed. Therefore, even when the content of the conversation changed, the tone of the dialogue voice did not change accordingly, and the user could find the interaction unnatural.
  • This disclosure aims to provide a voice dialogue apparatus capable of changing the tone of voice when the conversation content changes, and a computer-readable medium storing a voice dialogue program.
  • According to the present disclosure, a voice interactive apparatus is provided comprising: voice input means for inputting voice; conversion means for converting the input voice, which is the voice input by the voice input means, into a character string; context storage means for storing conversation contexts in association with keywords; context determination means for extracting, from the converted character string, a keyword stored in the context storage means and determining the context stored in the context storage means in association with the extracted keyword as the context of the input voice; conversation sentence determination means for determining a conversation sentence according to the input voice; voice output means for outputting voice; attribute storage means for storing attributes of the voice output by the voice output means; output control means for causing the voice output means to output the conversation sentence determined by the conversation sentence determination means with the attributes stored in the attribute storage means; determination means for determining whether the determination context, which is the context determined by the context determination means, has changed from the previous determination context, which is the context previously determined by the context determination means; and attribute changing means for changing the voice attributes stored in the attribute storage means when the determination means determines that the determination context has changed.
  • FIG. 1 is a hardware block diagram of the voice interactive apparatus 100.
  • FIG. 2 is a schematic diagram showing the configuration of the context tree storage area 131.
  • FIG. 3 is a schematic diagram of the tree structure of contexts stored in the context tree storage area 131.
  • FIG. 4 is a schematic diagram showing the configuration of the first attribute information storage area 1321.
  • FIG. 5 is a schematic diagram showing the configuration of the second attribute information storage area 1322.
  • FIG. 6 is a schematic diagram showing the configuration of the third attribute information storage area 1323.
  • FIG. 7 is a flowchart of the main processing of the voice interaction apparatus 100.
  • FIG. 8 is a flowchart of the first process executed within the main processing.
  • FIG. 9 is a flowchart of the second process executed within the main processing.
  • FIG. 10 is a diagram showing an example of a dialogue between a user and the voice dialogue agent.
  • the voice interaction apparatus 100 of this embodiment is a so-called personal computer. As shown in FIG. 1, the voice interaction apparatus 100 is provided with a CPU 10 that controls the voice interaction apparatus 100. Connected to the CPU 10 are a RAM 11 that temporarily stores various data and a ROM 12 that stores BIOS and the like. Further, a hard disk device 13, an output control unit 14, an input control unit 15, an audio output control unit 16, an audio input control unit 17, and a timer 18 are connected to the CPU 10 via a bus. An output device 24 is connected to the output control unit 14, and an input device 25 is connected to the input control unit 15.
  • the output device 24 is, for example, a display
  • the input device 25 is, for example, a mouse or a keyboard.
  • a speaker 26 is connected to the audio output control unit 16, and a microphone 27 is connected to the audio input control unit 17.
  • the timer 18 measures time.
  • the hard disk device 13 is provided with at least a context tree storage area 131, an attribute information storage area 132, an acoustic model storage area 133, a voice interaction program storage area 134, and other information storage areas 135.
  • In the context tree storage area 131, a context tree indicating the relationships among contexts (contents of conversation) is stored.
  • the attribute information storage area 132 stores information related to a voice attribute (hereinafter referred to as “voice attribute information”) designated when a context conversation satisfying a predetermined condition is made.
  • The acoustic model storage area 133 stores a plurality of acoustic models used to synthesize the voice output from the speaker 26.
  • the voice interaction program storage area 134 stores a voice interaction program executed by the CPU 10.
  • In the other information storage area 135, other information used in the voice interactive apparatus 100 is stored.
  • the RAM 11 is provided with a currently determined context storage area 111, a previous determined context storage area 112, and an attribute storage area 113.
  • In the currently determined context storage area 111, the context ID of the current determination context (hereinafter referred to as the “determination context ID”) is stored.
  • In the previous determined context storage area 112, the context ID of the determination context immediately preceding the current one (hereinafter referred to as the “previous determination context ID”) is stored.
  • the attribute storage area 113 stores attributes used when the voice output from the speaker 26 is synthesized.
  • the attribute data items are, for example, speed, pitch, acoustic model, and voice quality after filtering.
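The patent contains no source code; as a minimal sketch, the per-session state held in these RAM areas could be modeled as follows. All names and the initial attribute values are illustrative assumptions, not from the patent.

```python
from dataclasses import dataclass, field

@dataclass
class SessionState:
    current_context_id: str = "0000"    # currently determined context storage area 111
    previous_context_id: str = "0000"   # previous determined context storage area 112
    attributes: dict = field(default_factory=lambda: {
        "speed": 1.0,                   # synthesis speed
        "pitch": 0.5,                   # synthesis pitch
        "acoustic_model": "modelA",     # acoustic model selected from area 133
        "voice_quality": 0.5,           # voice quality after filtering
    })
```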
  • In this embodiment, when the voice dialogue program is executed in the voice dialogue apparatus 100, a voice dialogue agent is activated.
  • An image of the character is displayed on the output device (display) 24 by the voice interaction agent.
  • This character image is a concrete representation of a voice interaction agent.
  • the user interacts with the voice interaction agent as if interacting with the character image.
  • a speech (voice) from the user is input from the microphone 27.
  • the input voice is analyzed as text and used as an input sentence from the user.
  • a response sentence corresponding to the input sentence is determined, voice-converted, and output from the speaker 26 as voice.
  • When voice is output, the character image is also animated as if speaking, giving the user the sense of actually interacting with the character.
  • the dialogue content between the user and the voice dialogue agent is determined by a keyword in the user's input sentence.
  • This dialogue is called “context”.
  • This context is represented by a tree structure (see FIG. 3).
  • the voice interaction agent changes the attribute of the output voice of the voice interaction agent according to the specific context and the moving state of the context, and outputs a sound suitable for the content of the conversation.
  • The context tree storage area 131 provided in the HDD 13 will be described with reference to FIGS. 2 and 3.
  • the context tree storage area 131 is provided with “context ID”, “context name”, and “keyword” as data items.
  • a context name is given for each context ID.
  • a keyword is assigned to the context ID.
  • When a keyword appears in the conversation between the user and the voice interaction agent, the context associated with that keyword is set as the “determined context”, that is, the context of the current conversation. Note that the contexts shown in FIG. 2 are an example.
  • The rules for assigning context IDs are as follows. The context ID “0000” is given to the context at the root of the tree structure. Contexts on the branches are given IDs of four digits plus four digits, such as “0100-0000”. The last four digits (“0000”) are the context ID of the parent (one layer higher); that is, “0100-0000” denotes a child (one layer lower) of the context with ID “0000”. Hereinafter, the last four digits are referred to as the “parent ID”.
  • In the context tree storage area 131 shown in FIG. 2, as shown in FIG. 3, the context named “general” with context ID “0000” is the root.
  • Of the first four digits of a context ID, the first two digits indicate the layer in the tree structure. In the root context, the first two digits are “00”, indicating layer “00”. In the “music” context with context ID “0100-0000”, the first two digits “01” indicate the first layer. In its child contexts with context IDs “0200-0100” and “0201-0100”, the first two digits “02” indicate the second layer.
  • The last two digits of the first four are an identification number within the same layer. In the example shown in FIGS. 2 and 3, identification numbers are assigned in order from “00”, then “01”, “02”. Hereinafter, the first four digits are referred to as the “own ID”, of which the first two digits are the “hierarchy ID” and the last two the “identification number”.
  • That is, a context ID is composed as “(own ID, 4 digits)-(parent ID, 4 digits)”, i.e., “(hierarchy ID, 2 digits)(identification number, 2 digits)-(parent ID, 4 digits)”. Since this assignment rule gives contexts mutually non-overlapping IDs, a context can be identified by its context ID.
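As a rough illustration of this assignment rule, a context ID such as “0201-0100” can be split into hierarchy ID, identification number, and parent ID. Only the ID format itself comes from the patent; the type and function names below are hypothetical.

```python
from typing import NamedTuple

class ContextId(NamedTuple):
    hierarchy: str  # first two digits: layer in the tree ("00" is the root layer)
    number: str     # next two digits: identification number within the layer
    parent: str     # last four digits: own ID of the parent context ("" for the root)

def parse_context_id(context_id: str) -> ContextId:
    """Split a context ID such as '0201-0100' into its three parts."""
    own, _, parent = context_id.partition("-")
    return ContextId(own[:2], own[2:4], parent)

# "0201-0100" is the context with identification number "01" in layer "02",
# whose parent is the context whose own ID is "0100".
print(parse_context_id("0201-0100"))  # ContextId(hierarchy='02', number='01', parent='0100')
```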
  • the attribute information storage area 132 includes a first attribute information storage area 1321, a second attribute information storage area 1322, and a third attribute information storage area 1323.
  • the first attribute information storage area 1321 stores voice attribute information for changing attributes when a context having a special meaning becomes a determined context.
  • the first attribute information storage area 1321 is provided with “meaning”, “context ID”, “first change attribute”, and “second change attribute” as data items.
  • In the “first change attribute” and “second change attribute”, items of “type”, “method”, and “change value” are provided respectively.
  • a context ID is assigned to each meaning, and two types of attributes can be set as change attributes. Examples of attribute types include the speed of output speech, the type of acoustic model used for speech synthesis, the pitch of output speech, and the voice quality of output speech after filtering.
  • the attribute is not limited to this, and an attribute that can be assigned to a speech synthesis program for performing speech synthesis may be used.
  • the context specified by the context ID assigned to the meaning is referred to as a “semantic context”.
  • In the example shown in FIG. 4, the special meanings are “hobby”, “special field”, “disadvantage field”, and “chat”.
  • the context ID assigned to “hobby” is “0101-0000”. Since the first change attribute is “speed” and the change value is “1.2”, the speed of the output voice is changed to 1.2. Since the second change attribute is “pitch” and the method is “high”, the pitch of the output audio is changed by a predetermined amount.
  • the predetermined amount to be changed is determined in advance. For example, if the method is “high”, the pitch is changed by 0.1 higher than the current pitch. If the method is "low”, the pitch is changed 0.1 lower than the current pitch.
  • For the meaning “special field”, “voice type” is designated as the first change attribute, and the change value is “modelC”. This indicates that the acoustic model “modelC” is used among the acoustic models when performing speech synthesis. The acoustic models are stored in the acoustic model storage area 133 of the HDD 13.
  • Note that the example shown in FIG. 4 is merely an example; other meanings may be set, and a plurality of contexts may be assigned to one meaning. The voice attribute information is also not limited to the information shown in FIG. 4.
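To make the “type”/“method”/“change value” mechanism concrete, the following sketch applies one change-attribute entry to the current attributes. The 0.1 pitch step and the FIG. 4 example values come from the text above; the data layout and function name are assumptions.

```python
PITCH_STEP = 0.1  # the predetermined amount for the "high"/"low" methods

def apply_change_attribute(attrs: dict, change: dict) -> None:
    kind = change["type"]                 # e.g. "speed", "pitch", "voice type"
    if "value" in change:                 # a "change value" sets the attribute directly
        attrs[kind] = change["value"]     # e.g. speed -> 1.2, voice type -> "modelC"
    elif change.get("method") == "high":  # a "method" adjusts the current value
        attrs[kind] += PITCH_STEP
    elif change.get("method") == "low":
        attrs[kind] -= PITCH_STEP

# Example: the semantic context "hobby" (ID "0101-0000") from FIG. 4.
attrs = {"speed": 1.0, "pitch": 0.5}
apply_change_attribute(attrs, {"type": "speed", "value": 1.2})
apply_change_attribute(attrs, {"type": "pitch", "method": "high"})
print(attrs)  # {'speed': 1.2, 'pitch': 0.6}
```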
  • the second attribute information storage area 1322 stores audio attribute information for changing attributes when a context of a specific hierarchy in the context tree becomes a determined context.
  • a context belonging to a specific hierarchy is referred to as a “specific hierarchy context”.
  • As shown in FIG. 5, the second attribute information storage area 1322 is provided with “hierarchy” and “first change attribute” as data items.
  • items of “type”, “method”, and “change value” are provided in the “first change attribute”.
  • the first change attribute is assigned to each layer, and one voice attribute can be set as the change attribute.
  • In the example shown in FIG. 5, “highest level”, “second level”, and “lowest level” are designated as the specific levels. If the determined context is at the top of the context tree, that is, if its context ID is “0000”, an instruction is issued to reset all attributes to their initial values. If the determined context is in the second level, that is, if its context ID is “02**-****” (* is an arbitrary digit), an instruction is issued to set the pitch to “0.6”. If the determined context is in the lowest level of the context tree, that is, if its context ID is “04**-****” in the example shown in FIGS. 2 and 3, an instruction is issued to set the voice quality to “0.4”. Note that the change instructions shown in FIG. 5 are an example; change instructions may be set for other levels, and the changes may have other content.
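A sketch of these level-specific rules, keyed on the hierarchy ID taken from the determined context ID. The 0.6/0.4 values mirror FIG. 5; the initial values and the code shape are assumptions.

```python
INITIAL_ATTRS = {"speed": 1.0, "pitch": 0.5, "voice type": "modelA", "voice quality": 0.5}

def apply_layer_rule(attrs: dict, context_id: str) -> None:
    hierarchy = context_id[:2]           # hierarchy ID of the determined context
    if hierarchy == "00":                # highest level ("0000"): reset everything
        attrs.update(INITIAL_ATTRS)
    elif hierarchy == "02":              # second level: fixed pitch
        attrs["pitch"] = 0.6
    elif hierarchy == "04":              # lowest level in FIGS. 2 and 3: fixed voice quality
        attrs["voice quality"] = 0.4
```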
  • the third attribute information storage area 1323 stores voice attribute information for changing the attribute when the movement of the determination context is a specific position change. As shown in FIG. 6, “position change” and “first change attribute” are provided as data items in the third attribute information storage area 1323. In the “first change attribute”, items of “type”, “method”, and “change value” are provided. A first change attribute is assigned to each position change, and one voice attribute can be set as a change attribute.
  • In the example shown in FIG. 6, the position changes “move next (small ID)”, “move next (large ID)”, “move up one level”, “move down one level”, “move up two levels”, and “move down two levels” are provided.
  • “Move next (small ID)” indicates movement to the adjacent context in the same level of the context tree whose identification number is smaller by one: the hierarchy IDs of the determination contexts before and after the movement are equal, and “identification number after movement = identification number before movement - 1” holds. “Move next (large ID)” likewise indicates movement to the adjacent context whose identification number is larger by one.
  • “Move up one level” indicates a move up to the context one level higher in the context tree.
  • the movement in the case where the parent ID of the determination context before the movement and the own ID of the determination context after the movement are equal corresponds to this “move up one level”.
  • “Move down one level” indicates movement to a context one level below in the context tree.
  • the movement in the case where the own ID of the determination context before movement is equal to the parent ID of the determination context after movement corresponds to this “move down one level”.
  • “Move up two layers” indicates a move up to a context two levels higher in the context tree.
  • The movement corresponds to “move up two levels” when the parent ID of the parent context of the determination context before the movement equals the own ID of the determination context after the movement.
  • “Move down two levels” indicates movement to the context two levels below in the context tree.
  • The movement corresponds to “move down two levels” when the parent ID of the parent context of the determination context after the movement equals the own ID of the determination context before the movement. That is, the voice attributes are changed when, viewed from the determination context before the movement, the determination context after the movement is its parent, its parent’s parent, its child, or its child’s child (in this embodiment, these relationships are collectively referred to as a “parent-child relationship”).
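The position-change conditions above compare the own IDs and parent IDs of the determination contexts before and after the movement. A sketch of that comparison follows, covering the same-level and one-level cases only, since the two-level cases additionally need a lookup of the parent's parent in the context tree; the function and its labels are assumptions.

```python
def classify_move(before: str, after: str) -> str | None:
    """Classify a context move per FIG. 6; IDs follow the '(own)-(parent)' format."""
    b_own, b_parent = before[:4], before[5:]
    a_own, a_parent = after[:4], after[5:]
    if b_own[:2] == a_own[:2]:                    # same hierarchy ID
        if int(a_own[2:]) == int(b_own[2:]) - 1:
            return "move next (small ID)"
        if int(a_own[2:]) == int(b_own[2:]) + 1:
            return "move next (large ID)"
    if b_parent == a_own:
        return "move up one level"
    if b_own == a_parent:
        return "move down one level"
    return None  # two-level moves would be checked here via the context tree

print(classify_move("0202-0101", "0101-0000"))  # move up one level
```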
  • the operation of the main process shown in FIG. 7 is executed by the CPU 10 according to the voice interaction program stored in the hard disk device 13.
  • the first determination context and voice attributes are set (S1).
  • the initial decision context and audio attributes are predetermined.
  • the first context ID is stored in the currently determined context storage area 111 of the RAM 11, and the first audio attribute is stored in the attribute storage area 113 of the RAM 11.
  • the context ID “0000” is the first determination context.
  • the value of the counter C that counts the number of times the decision context has changed is initialized to “0” that is an initial value (S2).
  • the timer 18 for measuring the reference time for changing the sound attribute is reset, and the time measurement is started (S3).
  • Next, it is determined whether or not voice has been input from the user via the microphone 27 (S4).
  • When there is no voice input from the user (S4: NO), the input check is repeated (S4), so that the apparatus stands by for input from the user.
  • When voice is input from the user (S4: YES), the input voice is analyzed by a well-known voice analysis technique and converted into a character string (S5). It is then determined whether an instruction to end the voice interaction agent has been issued, based on whether the obtained character string is a word indicating the end of the voice interaction agent (S6).
  • The words indicating the end of the voice interaction agent are registered in advance, for example, “End”, “Bye Bye”, “Goodbye”, “Jaane”, and “Good Night”. If the obtained character string is not an end instruction (S6: NO), a keyword is extracted from the character string (S7).
  • Specifically, the character string is decomposed into parts of speech, and it is determined whether the obtained words include a keyword. If a word registered under “keyword” in the context tree storage area 131 is included, the keyword that appears first in the character string is used as the keyword for context determination. A determination context is then determined based on the extracted keyword (S8). Specifically, the context ID associated with the extracted keyword is set as the context ID of the determined context. The context ID currently stored in the currently determined context storage area 111 is moved to the previous determined context storage area 112, and the context ID associated with the keyword is stored in the currently determined context storage area 111.
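A sketch of this keyword-spotting step (S7-S8): the keyword table is an excerpt of FIG. 2, while the string scan and the behavior when no keyword is found are simplifying assumptions (the patent decomposes the text into parts of speech).

```python
KEYWORD_TO_CONTEXT = {            # excerpt of the context tree storage area 131
    "Hello": "0000",              # context "general"
    "exhibition": "0101-0000",    # context "art"
    "Japanese painting": "0202-0101",
}

def determine_context(text: str, current_id: str) -> tuple[str, str]:
    """Return (previous determination context ID, new determination context ID)."""
    hits = [(text.find(k), cid) for k, cid in KEYWORD_TO_CONTEXT.items() if k in text]
    if hits:
        return current_id, min(hits)[1]   # the keyword appearing first wins
    return current_id, current_id         # no keyword: context assumed unchanged

prev_id, new_id = determine_context("Yes, I went to the exhibition", "0000")
print(prev_id, new_id)  # 0000 0101-0000
```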
  • a response sentence is determined in response to the text converted from the voice input by the user (S20).
  • the response sentence is determined based on a predetermined rule by a well-known dialogue technique.
  • the type of response sentence to be determined is not particularly important and will not be described.
  • The response sentence determined in S20 is synthesized by a well-known voice synthesis technique based on the attributes stored in the attribute storage area 113 of the RAM 11 (S21) and output from the speaker 26 (S22). The process then returns to S4 and waits for input from the user (S4).
  • When the second process shown in FIG. 9 is started, it is first determined whether or not the determined context is a semantic context (S38). If the context ID of the determined context is stored under “context ID” in the first attribute information storage area 1321 (see FIG. 4), it is determined that the determined context is a semantic context (S38: YES), and the attributes of the output voice are changed accordingly (S41). Specifically, the “first change attribute” and “second change attribute” in the first attribute information storage area 1321 are referred to, and in the attribute storage area 113 the attribute designated by “type” is changed based on the designation of “method” or “change value”. For example, if the determined context ID is “0101-0000”, “speed” is set to “1.2” and “0.1” is added to the value of “pitch”. Thereafter, the process returns to the main process.
  • When the determination context is not a semantic context (S38: NO), it is determined whether the determined context ID belongs to a level specified under “hierarchy” in the second attribute information storage area 1322 (see FIG. 5) (S39).
  • Specifically, when the own ID of the determined context ID is “0000” (the highest level), when the hierarchy ID is “02” (the second level), or when the hierarchy ID is “04” (the lowest level), it is determined that the determined context is a specific hierarchy context (S39: YES).
  • In that case, the attribute specified by “type” of the “first change attribute” in the second attribute information storage area 1322 is changed based on the designation of “method” or “change value” (S42). For example, if the hierarchy ID is “02”, the “pitch” is set to “0.6”. Thereafter, the process returns to the main process.
  • When the determined context is not a specific hierarchy context (S39: NO), the determined context ID is compared with the previous determined context ID, and if the movement corresponds to a “position change” designated in the third attribute information storage area 1323 (see FIG. 6), it is determined that a predetermined position change has occurred (S40: YES). For example, in the example shown in FIG. 6, when the parent ID of the determination context before the movement equals the own ID of the determination context after the movement, the position change is determined to be “move up one level”.
  • In that case, the attribute specified by “type” of the “first change attribute” in the third attribute information storage area 1323 is changed based on the designation of “method” or “change value” (S43). Thereafter, the process returns to the main process.
  • A response sentence is then determined in response to the character string converted from the voice input by the user (S20).
  • the response sentence is synthesized by a known speech synthesis technique based on the changed attribute stored in the attribute storage area 113 (S21) and output from the speaker 26 (S22). Then, the process returns to S4, and an input from the user is waited (S4).
  • It is then determined whether the value of the counter C, which counts changes of the determination context, is “5” or more (S15). If the value of the counter C is not “5” or more (S15: NO), the timer 18 is reset, time measurement is started (S16), and the second process is performed (S14).
  • In this way, the dialogue between the user and the voice dialogue agent proceeds.
  • When the context changes, the attributes of the output voice are changed if the new determination context is a semantic context or a specific hierarchy context, or if a predetermined position change has occurred. The attributes are also changed if the determination context has remained unchanged for the predetermined time or longer, or if the determination context has changed more than the predetermined number of times within the predetermined time.
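Putting the pieces together, the attribute-update policy just summarized might look like the following. The five-minute and five-change thresholds appear in the embodiment, but the branch ordering only approximates the flowcharts (S9, S13, S15, S31, S38-S43), and each `pass` stands in for one of the sketches above or an assumed rule.

```python
FIVE_MINUTES = 5 * 60
SEMANTIC_CONTEXTS = {"0101-0000", "0202-0101"}    # excerpt of FIG. 4

def update_attributes(attrs, prev_id, new_id, elapsed_s, change_count):
    changed = new_id != prev_id
    if changed and elapsed_s < FIVE_MINUTES and change_count >= 5:
        pass  # S15: many context switches in the window -> its own attribute change
    elif changed and new_id in SEMANTIC_CONTEXTS:  # S38 -> S41: semantic context
        pass  # apply the matching FIG. 4 row (see apply_change_attribute above)
    elif changed and new_id[:2] in ("00", "02", "04"):
        pass  # S39 -> S42: specific hierarchy context (see apply_layer_rule above)
    elif changed:
        pass  # S40 -> S43: check FIG. 6 position changes (see classify_move above)
    elif elapsed_s >= FIVE_MINUTES:                # S31: same context for 5 minutes
        pass  # attribute change for a long-continuing context
```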
  • the response sentence output by the voice interaction agent is converted into voice based on the changed attribute stored in the attribute storage area 113, and the voice is output from the speaker 26.
  • “Dialogue number” is a number assigned to each pair of an input sentence from the user and a response sentence of the voice interaction agent.
  • the “input sentence from the user” is a sentence obtained by converting the voice input from the microphone 27 into characters.
  • “Keyword” is the keyword extracted from the input sentence.
  • the “context” is a determination context determined by a keyword. In “attribute”, “acoustic model”, “pitch”, “speed”, and “voice quality” are exemplified as voice attributes.
  • the “agent response text” is a response text output from the voice interaction agent in response to the input text. In the following specific example, all dialogues take place within 5 minutes.
  • the determination context ID of the first determination context is set to “0000”.
  • the initial value of the attribute is also stored in the attribute storage area 113 of the RAM 11 (S1).
  • From the input sentence “Hello” of dialogue number 1, “Hello” is extracted as a keyword (S7).
  • Since “Hello” is associated with the context named “general” with context ID “0000” (see FIG. 2), the determined context ID is “0000” (S8). Since the previous determined context is also “0000”, there is no change in the context (S9: NO). In this case, if the measurement time by the timer 18 is less than 5 minutes (S31: NO), the attributes remain at their initial values.
  • The response sentence “Hello, have you been out anywhere recently?” is determined (S20), speech synthesis is performed with the initial attribute values (S21), and the response sentence is output (S22).
  • the user makes the following remark, and the input sentence “Yes, I went to the exhibition” of the dialogue number 2 is input (S4: YES).
  • the input voice is converted into characters (S5), and "exhibition” is extracted as a keyword (S7). Since “Exhibition” is associated with the context of the context name “Art” with the context ID “0101-0000” (see FIG. 2), the determined context ID is “0101-0000” (S8). Since the previous determined context is “0000”, there is a change in the context (S9: YES).
  • The measurement time by the timer 18 is less than 5 minutes (S13: NO), and the determination context is a semantic context (S38: YES). Since the context ID “0101-0000” is the semantic context of the meaning “hobby” (see FIG. 4), the speed is changed to “1.2” and “0.1” is added to the pitch (S41).
  • Next, an input sentence of dialogue number 3, “This time it was an exhibition of pictures”, is input (S4: YES). “Exhibition” is extracted as a keyword (S7). Since “Exhibition” is associated with the context named “Art” with context ID “0101-0000”, the determined context ID is “0101-0000” (S8). Since the previously determined context ID is also “0101-0000”, there is no change in the determined context (S9: NO). A response sentence “What picture?” is determined (S20), speech synthesis is performed with the same attributes as before (S21), and the response sentence is output (S22).
  • From the next input sentence, “Japanese painting” is extracted as a keyword (S7). Since “Japanese painting” is associated with the context named “Japanese painting” with context ID “0202-0101” (see FIG. 2), the determined context ID is “0202-0101” (S8). Since the previous determined context is “0101-0000”, the context has changed (S9: YES). The measurement time by the timer 18 is less than 5 minutes (S13: NO), and the determination context is a semantic context (S38: YES). Since the context ID “0202-0101” is the semantic context of the meaning “special field” (see FIG. 4), the acoustic model is changed to “modelC” (S41).
  • The response sentence “Hey, Japanese painting. Old picture or modern Japanese painting?” is determined (S20), speech synthesis is performed with the changed attributes (S21), and the response sentence is output (S22).
  • the measurement time by the timer 18 is less than 5 minutes (S13: NO), and the determination context is a specific hierarchy context (lowermost layer) (S39: YES). Therefore, the voice quality is changed to “0.4” (S42).
  • A response sentence “Where is the picture?” is determined (S20), speech synthesis is performed with the changed attributes (S21), and the response sentence is output (S22).
  • As described above, the output voice of the voice interaction agent can be changed according to the content (context) of the conversation between the user and the voice interaction agent. Since the output voice of the voice dialogue agent thus matches the context, a natural dialogue can be achieved.
  • Since voice attribute information indicating attributes suited to a context is stored in correspondence with the context, voice suited to the context, that is, to the content of the conversation, can be output. The output voice can thus be switched to a voice suited to the content of the conversation as the context changes, so the user can converse naturally without feeling a mismatch between the content of the conversation and the voice.
  • the user who is having a conversation with the voice interactive apparatus 100 can grasp the hierarchy of the context of the conversation by the output voice. Therefore, the user can talk while grasping the change state of the content of the conversation, and helps to enjoy the conversation. For example, if a specific hierarchy is set as the lowest hierarchy, the user can know that the context does not change to detailed contents any more. Further, if the specific hierarchy is the highest hierarchy, the user can know that the conversation can be shifted to more detailed contents. Further, if a tree structure is devised so as to give some meaning to the context of a predetermined hierarchy, some meaning can be conveyed to the user by a change in voice attributes.
  • the user can understand the situation in which the content of the conversation becomes deeper, shallower, or changes in the context of the same level by the voice output from the voice interactive device 100. Therefore, the user can talk while grasping the change state of the content of the conversation, and helps to enjoy the conversation.
  • The user conversing with the voice interaction apparatus 100 can tell from the output voice that the context has switched many times within the predetermined time. The user can thus converse while sensing how the content of the conversation is changing, which helps make the conversation enjoyable.
  • The user can likewise tell from the output voice that the same context has continued for the predetermined time or longer. Since the attributes of the output voice change even when the context does not, this too helps make the conversation enjoyable.
  • the voice interaction device and the voice interaction system are not limited to the above-described embodiments, and various modifications can be made without departing from the scope of the present disclosure.
  • the voice interaction device having the voice interaction program is a so-called personal computer.
  • the device having the voice interaction program need not be a personal computer.
  • a portable terminal, a mobile phone, or a television may be used as long as a microphone for inputting sound and a speaker for outputting sound are provided.
  • The context tree shown in FIGS. 2 and 3 is an example, and a context tree other than this example may be adopted.
  • If the voice attribute information stored in the attribute information storage area 132 is set in finer detail, voice even better suited to the conversation between the user and the voice interaction agent can be output.
  • Likewise, if the configuration of the context tree is refined, more appropriate voice output becomes possible.
  • the context ID assignment rule is not limited to the above-described embodiment.
  • The voice interaction device may be configured so that the user can add contexts and keywords to the context tree. In this case, a character string may be received via the input device (keyboard or mouse) 25.
  • keywords are extracted from the input sentence, and the context is determined based on the first keyword that appears in the input sentence.
  • the keyword used to determine the context is not limited to the keyword that appears first.
  • the context having the lowest hierarchy among the contexts associated with the respective keywords may be used as the determination context.
  • the determination context may be determined according to the context of the previous dialog in consideration of the flow of the dialog. For example, assume that the keyword “program” is assigned to both the context name “concert” and the context name “computer”. In this case, if the context of the previous dialogue is “music”, the determined context may be “concert”.
  • In the above-described embodiment, the voice attributes are changed when the determination contexts before and after the movement are in the parent-child relationship described above.
  • However, the attribute may be changed only when the determination contexts before and after the movement are parent and child, or the attribute may be changed even when the relationship is more than four generations removed.
  • the voice attributes are changed when the determination contexts before and after the movement belong to the same hierarchy and the identification numbers are different by one.
  • the attribute may be changed regardless of the identification number.
  • the attribute may be changed when the determination contexts before and after the movement belong to the same hierarchy and the parent is the same.

Abstract

When voice is input from the user (S4: YES), the input voice is analyzed and converted into characters (S5). If the input is not an ending instruction (S6: NO), a keyword is extracted from the converted character string (S7), and a determination context is determined on the basis of the extracted keyword (S8). If the determination context has changed (S9: YES) and the time measured by a timer has not yet reached five minutes (S13: NO), it is judged whether or not the determination context is a semantic context. If it is, the “first change attribute” and “second change attribute” of a first attribute information storage area are referenced and the attributes of the output voice are changed (S14). A response sentence is determined (S20), voice-synthesized according to the changed attributes stored in the attribute storage area (S21), and output from a speaker (S22).

Description

Spoken dialogue apparatus and computer-readable medium storing a voice dialogue program

The present invention relates to a voice interactive device and a computer-readable medium storing a voice interactive program. More specifically, the present invention relates to a voice interactive apparatus capable of changing the tone of voice when the conversation content changes, and a computer-readable medium storing a voice interactive program.
Conventionally, when a user uses a computer, information is input with a keyboard or mouse, and information is output by displaying characters and images on a display. User support apparatuses and systems that perform input and output by voice have been proposed so that the user can exchange information in a friendlier environment than with such input and output (see, for example, Patent Document 1). In the user support apparatus described in Patent Document 1, information is input and output through dialogue between the apparatus and the user.

Patent Document 1: JP 2002-163171 A
When humans have a conversation, the tone and tempo change as the conversation content changes. For example, if the topic changes from work to a hobby, the serious tone used while talking about work changes into a joyful, light tone while talking about the hobby. However, an apparatus such as the user support apparatus described in Patent Document 1 interacts with the user with a fixed voice and at a fixed speed. Therefore, even when the content of the conversation changed, the tone of the dialogue voice did not change accordingly, and the user could find the interaction unnatural.
This disclosure aims to provide a voice dialogue apparatus capable of changing the tone of voice when the conversation content changes, and a computer-readable medium storing a voice dialogue program.
According to the present disclosure, there is provided a voice interactive apparatus comprising: voice input means for inputting voice; conversion means for converting the input voice, which is the voice input by the voice input means, into a character string; context storage means for storing conversation contexts in association with keywords; context determination means for extracting, from the converted character string, a keyword stored in the context storage means and determining the context stored in the context storage means in association with the extracted keyword as the context of the input voice; conversation sentence determination means for determining a conversation sentence according to the input voice; voice output means for outputting voice; attribute storage means for storing attributes of the voice output by the voice output means; output control means for causing the voice output means to output the conversation sentence determined by the conversation sentence determination means with the attributes stored in the attribute storage means; determination means for determining whether the determination context, which is the context determined by the context determination means, has changed from the previous determination context, which is the context previously determined by the context determination means; and attribute changing means for changing the voice attributes stored in the attribute storage means when the determination means determines that the determination context has changed.
FIG. 1 is a hardware block diagram of the voice interactive apparatus 100. FIG. 2 is a schematic diagram showing the configuration of the context tree storage area 131. FIG. 3 is a schematic diagram of the tree structure of contexts stored in the context tree storage area 131. FIG. 4 is a schematic diagram showing the configuration of the first attribute information storage area 1321. FIG. 5 is a schematic diagram showing the configuration of the second attribute information storage area 1322. FIG. 6 is a schematic diagram showing the configuration of the third attribute information storage area 1323. FIG. 7 is a flowchart of the main processing of the voice interaction apparatus 100. FIG. 8 is a flowchart of the first process executed within the main processing. FIG. 9 is a flowchart of the second process executed within the main processing. FIG. 10 is a diagram showing an example of a dialogue between a user and the voice dialogue agent.
Hereinafter, embodiments according to the present disclosure will be described with reference to the drawings. The voice interaction apparatus 100 of this embodiment is a so-called personal computer. As shown in FIG. 1, the voice interaction apparatus 100 is provided with a CPU 10 that controls the voice interaction apparatus 100. Connected to the CPU 10 are a RAM 11 that temporarily stores various data and a ROM 12 that stores a BIOS and the like. Further, a hard disk device 13, an output control unit 14, an input control unit 15, an audio output control unit 16, an audio input control unit 17, and a timer 18 are connected to the CPU 10 via a bus. An output device 24 is connected to the output control unit 14, and an input device 25 is connected to the input control unit 15. The output device 24 is, for example, a display, and the input device 25 is, for example, a mouse or a keyboard. A speaker 26 is connected to the audio output control unit 16, and a microphone 27 is connected to the audio input control unit 17. The timer 18 measures time.
The hard disk device 13 is provided with at least a context tree storage area 131, an attribute information storage area 132, an acoustic model storage area 133, a voice interaction program storage area 134, and an other-information storage area 135. In the context tree storage area 131, a context tree indicating the relationships among contexts (contents of conversation) is stored. The attribute information storage area 132 stores information on the voice attributes designated while a conversation in a context satisfying a predetermined condition is in progress (hereinafter, “voice attribute information”). The acoustic model storage area 133 stores a plurality of acoustic models used to synthesize the voice output from the speaker 26. The voice interaction program storage area 134 stores the voice interaction program executed by the CPU 10. In the other-information storage area 135, other information used in the voice interactive apparatus 100 is stored.
The RAM 11 is provided with a currently determined context storage area 111, a previous determined context storage area 112, and an attribute storage area 113. In the currently determined context storage area 111, the context ID of the current determination context (hereinafter, the “determination context ID”) is stored. In the previous determined context storage area 112, the context ID of the determination context immediately preceding the current one (hereinafter, the “previous determination context ID”) is stored. The attribute storage area 113 stores the attributes used when the voice output from the speaker 26 is synthesized. The attribute data items are, for example, speed, pitch, acoustic model, and voice quality after filtering.
In the present embodiment, when the voice dialogue program is executed in the voice dialogue apparatus 100, a voice dialogue agent is activated. The voice dialogue agent displays an image of a character on the output device (display) 24. This character image is a concrete representation of the voice dialogue agent, and the user interacts with the voice dialogue agent as if interacting with the character image. A speech (voice) from the user is input from the microphone 27, analyzed as text, and used as the input sentence from the user. A response sentence corresponding to the input sentence is determined, converted into voice, and output from the speaker 26. When voice is output, the character image is also animated as if speaking, giving the user the sense of actually interacting with the character.
Furthermore, the content of the dialogue between the user and the voice dialogue agent is determined by a keyword in the user's input sentence. This dialogue content is called a “context” and is represented by a tree structure (see FIG. 3). The voice dialogue agent changes the attributes of its output voice according to specific contexts and the movement of the context, and outputs voice suited to the content of the conversation.
The context tree storage area 131 provided in the HDD 13 will be described with reference to FIGS. 2 and 3. As shown in FIG. 2, the context tree storage area 131 is provided with “context ID”, “context name”, and “keyword” as data items. A context name is given for each context ID, and a keyword is assigned to each context ID. When a keyword appears in the conversation between the user and the voice interaction agent, the context associated with that keyword is set as the “determined context”, that is, the context of the current conversation. Note that the contexts shown in FIG. 2 are an example.
The rules for assigning context IDs are as follows. The context ID “0000” is given to the context at the root of the tree structure. Contexts on the branches are given IDs of four digits plus four digits, such as “0100-0000”. The last four digits (“0000”) are the context ID of the parent (one layer higher); that is, “0100-0000” denotes a child (one layer lower) of the context with ID “0000”. Hereinafter, the last four digits are referred to as the “parent ID”. In the context tree storage area 131 shown in FIG. 2, as shown in FIG. 3, the context named “general” with context ID “0000” is the root. As children of that context, the context named “music” with context ID “0100-0000”, the context named “art” with context ID “0101-0000”, and the context named “chat” with context ID “0102-0000” are connected to the root.

Of the first four digits of a context ID, the first two digits indicate the layer in the tree structure. As shown in FIG. 3, in the root context the first two digits are “00”, indicating layer “00”. In the “music” context with context ID “0100-0000”, the first two digits “01” indicate the first layer. In its child contexts with context IDs “0200-0100” and “0201-0100”, the first two digits “02” indicate the second layer. The last two digits of the first four are an identification number within the same layer; in the example shown in FIGS. 2 and 3, identification numbers are assigned in order from “00”, then “01”, “02”. Hereinafter, the first four digits are referred to as the “own ID”, of which the first two digits are the “hierarchy ID” and the last two the “identification number”. That is, a context ID is composed as “(own ID, 4 digits)-(parent ID, 4 digits)”, i.e., “(hierarchy ID, 2 digits)(identification number, 2 digits)-(parent ID, 4 digits)”. Since this assignment rule gives contexts mutually non-overlapping IDs, a context can be identified by its context ID.
Next, the attribute information storage area 132 provided in the HDD 13 will be described with reference to FIGS. 4 to 6. The attribute information storage area 132 includes a first attribute information storage area 1321, a second attribute information storage area 1322, and a third attribute information storage area 1323.

First, the first attribute information storage area 1321 will be described with reference to FIG. 4. The first attribute information storage area 1321 stores voice attribute information for changing attributes when a context having a special meaning becomes the determined context. As shown in FIG. 4, the first attribute information storage area 1321 is provided with “meaning”, “context ID”, “first change attribute”, and “second change attribute” as data items. The “first change attribute” and “second change attribute” each have items “type”, “method”, and “change value”. A context ID is assigned to each meaning, and two kinds of attributes can be set as change attributes. Attribute types include, for example, the speed of the output voice, the kind of acoustic model used in speech synthesis, the pitch of the output voice, and the voice quality of the output voice after filtering. The attributes are not limited to these; any attribute that can be passed to the speech synthesis program may be used. Hereinafter, the context identified by the context ID assigned to a meaning is referred to as a “semantic context”.

In the example shown in FIG. 4, the special meanings are “hobby”, “special field”, “disadvantage field”, and “chat”. The context ID assigned to “hobby” is “0101-0000”. The first change attribute is “speed” with change value “1.2”, so the speed of the output voice is changed to 1.2. The second change attribute is “pitch” with method “high”, so the pitch of the output voice is raised by a predetermined amount. The predetermined amount is fixed in advance; for example, if the method is “high”, the pitch is changed to 0.1 higher than the current pitch, and if the method is “low”, to 0.1 lower. For the meaning “special field”, “voice type” is designated as the first change attribute with change value “modelC”, indicating that the acoustic model “modelC” is used when performing speech synthesis. The acoustic models are stored in the acoustic model storage area 133 of the HDD 13. Note that the example shown in FIG. 4 is merely an example; other meanings may be set, a plurality of contexts may be assigned to one meaning, and the voice attribute information is not limited to the information shown in FIG. 4.
Next, the second attribute information storage area 1322 will be described with reference to FIG. 5. The second attribute information storage area 1322 stores voice attribute information for changing attributes when a context in a specific layer of the context tree becomes the determined context. Hereinafter, a context belonging to a specific layer is referred to as a “specific hierarchy context”. As shown in FIG. 5, the second attribute information storage area 1322 is provided with “hierarchy” and “first change attribute” as data items, and the “first change attribute” has items “type”, “method”, and “change value”. A first change attribute is assigned to each layer, and one voice attribute can be set as the change attribute.

In the example shown in FIG. 5, “highest level”, “second level”, and “lowest level” are designated as the specific layers. If the determined context is at the top of the context tree, that is, if its context ID is “0000”, an instruction is issued to reset all attributes to their initial values. If the determined context is in the second layer, that is, if its context ID is “02**-****” (* is an arbitrary digit), an instruction is issued to set the pitch to “0.6”. If the determined context is in the lowest layer of the context tree, that is, if its context ID is “04**-****” in the example shown in FIGS. 2 and 3, an instruction is issued to set the voice quality to “0.4”. Note that the change instructions shown in FIG. 5 are an example; change instructions may be set for other layers, and the changes may have other content.
Next, the third attribute information storage area 1323 will be described with reference to FIG. 6. As will be described in detail later, when the determined context changes, the voice interactive device 100 determines in what positional relationship the determined context has moved within the context tree. The third attribute information storage area 1323 stores voice attribute information for changing attributes when the movement of the determined context is a specific position change. As shown in FIG. 6, the third attribute information storage area 1323 has "position change" and "first change attribute" as data items, and the "first change attribute" has the items "type", "method", and "change value". A first change attribute is assigned to each position change, so one voice attribute can be set as the change attribute.
In the example shown in FIG. 6, the position changes are "move to neighbor (smaller ID)", "move to neighbor (larger ID)", "move up one level", "move down one level", "move up two levels", and "move down two levels". "Move to neighbor (smaller ID)" indicates a move to the neighboring context at the same level of the context tree whose identification number is smaller by one; that is, a move corresponds to it when the hierarchy IDs of the determined contexts before and after the move are equal and "identification number after move = identification number before move - 1" holds. "Move to neighbor (larger ID)" indicates a move to the neighboring context at the same level whose identification number is larger by one; that is, a move corresponds to it when the hierarchy IDs before and after the move are equal and "identification number after move = identification number before move + 1" holds.
"Move up one level" indicates a move to a context one level higher in the context tree; a move corresponds to it when the parent ID of the determined context before the move equals the own ID of the determined context after the move. "Move down one level" indicates a move to a context one level lower; a move corresponds to it when the own ID of the determined context before the move equals the parent ID of the determined context after the move. "Move up two levels" indicates a move to a context two levels higher; a move corresponds to it when the parent ID of the context identified by the parent ID of the determined context before the move equals the own ID of the determined context after the move. "Move down two levels" indicates a move to a context two levels lower; a move corresponds to it when the parent ID of the context identified by the parent ID of the determined context after the move equals the own ID of the determined context before the move. In other words, the voice attributes are changed when, viewed from the determined context before the move, the determined context after the move is its parent, its parent's parent, its child, or its child's child (these relationships are collectively called the "parent-child relationship" in this embodiment).
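These positional tests can be written directly in terms of the ID structure. The following sketch assumes IDs of the form "ownID-parentID" with own ID = 2-digit hierarchy ID + 2-digit identification number, as in FIGS. 2 and 3; the parent_of map standing in for the stored tree and the single POSITION_ATTRIBUTES entry (taken from the FIG. 10 walk-through) are assumptions.

```python
# Stand-in for part of the third attribute information storage area 1323 (FIG. 6);
# the "move down one level" entry matches the FIG. 10 walk-through (speed +0.1).
POSITION_ATTRIBUTES = {
    "move down one level": [("speed", "raise", 0.1)],
    # the remaining position changes of FIG. 6 would be filled in analogously
}

def split_id(context_id):
    own, _, parent = context_id.partition("-")
    return own, parent                       # own ID, parent's own ID ("" at the top)

def position_change(before, after, parent_of):
    """Classify how the determined context moved; parent_of maps own ID -> parent own ID."""
    b_own, b_parent = split_id(before)
    a_own, a_parent = split_id(after)
    if b_own[:2] == a_own[:2]:               # same hierarchy ID
        if int(a_own[2:]) == int(b_own[2:]) - 1:
            return "move to neighbor (smaller ID)"
        if int(a_own[2:]) == int(b_own[2:]) + 1:
            return "move to neighbor (larger ID)"
    if b_parent == a_own:
        return "move up one level"           # parent ID before == own ID after
    if a_parent == b_own:
        return "move down one level"         # own ID before == parent ID after
    if parent_of.get(b_parent) == a_own:
        return "move up two levels"          # grandparent
    if parent_of.get(a_parent) == b_own:
        return "move down two levels"        # grandchild
    return None
```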
Next, with reference to FIGS. 7 to 9, the operation of the voice interactive device 100 when the voice interaction agent is activated will be described, focusing on changes to the voice attributes. The CPU 10 executes the main process shown in FIG. 7 according to the voice interactive program stored in the hard disk drive 13. First, the initial determined context and voice attributes are set (S1). The initial determined context and voice attributes are predetermined: the initial context ID is stored in the currently determined context storage area 111 of the RAM 11, and the initial voice attributes are stored in the attribute storage area 113 of the RAM 11. In the example shown in FIGS. 2 and 3, for example, the context ID "0000" is taken as the initial determined context.
Next, the value of the counter C, which counts the number of times the determined context has changed, is initialized to its initial value "0" (S2). The timer 18, which measures the reference time for changing the voice attributes, is reset and starts measuring time (S3). When sound is input from the microphone 27, it is determined whether voice input from the user has occurred (S4). While there is no voice input from the user (S4: NO), the check is repeated (S4) and the device waits for input from the user.
When there is voice input from the user (S4: YES), the input voice is analyzed by a well-known voice analysis technique and converted into characters (S5). Whether an instruction to end the voice interaction agent has been issued is determined by whether the obtained character string is a phrase indicating the end of the voice interaction agent (S6). The phrases indicating the end of the voice interaction agent are registered in advance, for example "I'm done", "bye-bye", "goodbye", "see you", "the end", and "good night". If the obtained character string is not an end instruction (S6: NO), a keyword is extracted from the character string (S7). Specifically, the character string is decomposed into parts of speech, and it is determined whether any of the obtained words is a keyword. If the words include a word registered under "keyword" in the context tree storage area 131, the keyword that appears earliest in the character string is taken as the keyword for context determination. Then the determined context is decided based on the extracted keyword (S8). Specifically, the context ID associated with the extracted keyword is taken as the context ID of the determined context: the context ID stored in the currently determined context storage area 111 is moved to the previously determined context storage area 112, and the context ID associated with the keyword is stored in the currently determined context storage area 111.
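A minimal sketch of S7-S8, under the assumption that the keyword column of the context tree (FIGS. 2-3) can be read as a flat keyword-to-context-ID map; only a few entries from the worked example are shown, and keeping the context unchanged when no keyword is found is an assumption the patent does not spell out.

```python
# A few keyword -> context ID entries from the FIGS. 2-3 example.
KEYWORD_TO_CONTEXT = {
    "こんにちは": "0000",        # context "general"
    "展覧会":     "0101-0000",   # context "art"
    "日本画":     "0202-0101",   # context "Japanese painting"
    "狩野派":     "0304-0202",   # context "Kano school"
}

def determine_context(text, current_id):
    """S7-S8: the earliest-appearing registered keyword decides the new context."""
    hits = [(text.find(k), cid) for k, cid in KEYWORD_TO_CONTEXT.items() if k in text]
    if not hits:
        return current_id, current_id        # no keyword: determined context unchanged
    _, new_id = min(hits)                    # smallest position = earliest occurrence
    return current_id, new_id                # (previously determined, currently determined)
```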
Next, it is determined whether the determined context has changed (S9). If the context ID stored in the previously determined context storage area 112 is the same as the context ID stored in the currently determined context storage area 111, it is determined that the determined context has not changed (S9: NO), and the first process is performed (S10).
When the first process shown in FIG. 8 starts, it is determined whether the time measured by the timer 18 is 5 minutes or more (S31). If less than 5 minutes have elapsed (S31: NO), the process returns to the main process. If 5 minutes or more have elapsed (S31: YES), it is determined whether the value of the counter C is "0" (S32). If it is "0", that is, if the determined context has not changed for 5 minutes or more (S32: YES), the value of the "pitch" attribute is multiplied by 0.8 (S33). The timer 18 is then reset, time measurement restarts (S34), and the process returns to the main process.
If the value of the counter C is not "0" (S32: NO), it is determined whether the value of the counter C is "5" or more (S35). If it is not "5" or more (S35: NO), the timer 18 is reset, time measurement restarts (S34), and the process returns to the main process. If the value of the counter C is "5" or more, that is, if the determined context has changed at least five times within 5 minutes (S35: YES), all voice attributes are changed to their initial values (S36) and the value of the counter C is initialized to "0" (S37). The timer 18 is reset, time measurement restarts (S34), and the process returns to the main process.
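The first process reduces to a few comparisons. A sketch, assuming the timer is represented as elapsed seconds and using the embodiment's 5-minute threshold:

```python
def first_process(attrs, counter, elapsed, initial_attrs):
    """FIG. 8: called when the determined context did not change (S9: NO)."""
    if elapsed < 300:                  # S31: less than 5 minutes, nothing happens
        return attrs, counter, False   # False = leave the timer running
    if counter == 0:                   # S32: no context change for 5+ minutes
        attrs["pitch"] *= 0.8          # S33: pitch multiplied by 0.8
    elif counter >= 5:                 # S35: 5+ context changes within the window
        attrs = dict(initial_attrs)    # S36: all attributes back to initial values
        counter = 0                    # S37
    return attrs, counter, True        # True = reset the timer (S34)
```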
When the process returns to the main process shown in FIG. 7, a response sentence responding to the text converted from the voice input by the user is determined (S20). The response sentence is determined by a well-known dialogue technique based on predetermined rules; which response sentence is chosen is not particularly important here, so its description is omitted. The response sentence determined in S20 is synthesized into speech by a well-known speech synthesis technique based on the attributes stored in the attribute storage area 113 of the RAM 11 (S21) and output from the speaker 26 (S22). The process then returns to S4 and waits for input from the user (S4).
If the determined context has changed (S9: YES), "1" is added to the value of the counter C, which counts the number of times the determined context has changed (S12). It is then determined whether the time measured by the timer 18 is 5 minutes or more (S13). If less than 5 minutes have elapsed (S13: NO), the second process is performed (S14).
When the second process shown in FIG. 9 starts, it is first determined whether the determined context is a semantic context (S38). If the context ID of the determined context is stored under "context ID" in the first attribute information storage area 1321 (see FIG. 4), the determined context is judged to be a semantic context (S38: YES), and the attributes of the output voice are changed (S41). Specifically, the "first change attribute" and "second change attribute" of the first attribute information storage area 1321 are referred to, and in the attribute storage area 113 the attribute designated by "type" is changed according to the designation of "method" or "change value". For example, if the determined context ID is "0101-0000", "speed" is set to "1.2" and "0.1" is added to the value of "pitch". The process then returns to the main process.
If the determined context is not a semantic context (S38: NO), it is determined whether the determined context is a specific hierarchy context (S39). If the determined context ID belongs to a hierarchy level designated under "hierarchy" in the second attribute information storage area 1322 (see FIG. 5), the determined context is judged to be a specific hierarchy context (S39: YES). In the example shown in FIG. 5, this is the case when the own ID of the determined context ID is "0000" (top level), when its hierarchy ID is "02" (second level), or when its hierarchy ID is "04" (lowest level). In this case, in the attribute storage area 113, the attribute designated by the "type" of the "first change attribute" in the second attribute information storage area 1322 is changed according to the designation of "method" or "change value" (S42). For example, if the hierarchy ID is "02", the "pitch" is set to "0.6". The process then returns to the main process.
If the determined context is not a specific hierarchy context (S39: NO), it is determined whether the movement of the determined context is a predetermined position change (S40). The determined context ID is compared with the previously determined context ID, and if the movement matches one of the movements designated under "position change" in the third attribute information storage area 1323 (see FIG. 6), it is judged to be a predetermined position change (S40: YES). For example, in the example shown in FIG. 6, when the parent ID of the determined context before the move equals the own ID of the determined context after the move, the move is judged to be the position change "move up one level". In this case, in the attribute storage area 113, the attribute designated by the "type" of the "first change attribute" in the third attribute information storage area 1323 is changed according to the designation of "method" or "change value" (S43). The process then returns to the main process.
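Combining the three lookups gives the whole second process (S38-S43). This sketch reuses the illustrative tables and helpers from the sketches above, so every name in it is an assumption carried over from there.

```python
def second_process(attrs, before_id, after_id, parent_of):
    """FIG. 9: called when the determined context changed (S9: YES)."""
    if after_id in SEMANTIC_ATTRIBUTES:                  # S38: semantic context?
        _, changes = SEMANTIC_ATTRIBUTES[after_id]
        return apply_changes(attrs, changes)             # S41
    level = hierarchy_of(after_id)
    if level in HIERARCHY_ATTRIBUTES:                    # S39: specific hierarchy context?
        return apply_changes(attrs, HIERARCHY_ATTRIBUTES[level])   # S42
    move = position_change(before_id, after_id, parent_of)
    if move in POSITION_ATTRIBUTES:                      # S40: predetermined position change?
        return apply_changes(attrs, POSITION_ATTRIBUTES[move])     # S43
    return attrs                                         # otherwise attributes stay as-is
```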
When the process returns to the main process shown in FIG. 7, a response sentence responding to the text converted from the voice input by the user is determined (S20). The response sentence is synthesized into speech by a well-known speech synthesis technique based on the changed attributes stored in the attribute storage area 113 (S21) and output from the speaker 26 (S22). The process then returns to S4 and waits for input from the user (S4).
If the determined context has changed (S9: YES) and the time measured by the timer 18 is 5 minutes or more (S13: YES), it is determined whether the value of the counter C is "5" or more (S15). If the value of the counter C is not "5" or more (S15: NO), the timer 18 is reset, time measurement restarts (S16), and the second process is performed (S14).
If the value of the counter C is "5" or more (S15: YES), all voice attributes are changed to their initial values (S17), the value of the counter C is initialized to "0" (S18), and the timer 18 is reset and starts measuring time again (S19). A response sentence is determined (S20), synthesized into speech (S21), and output from the speaker 26 (S22). The process then returns to S4 and waits for input from the user (S4).
As the processing of S4 to S22 is repeated, the dialogue between the user and the voice interaction agent proceeds. When the context changes, the attributes of the output voice are changed if the new context is a semantic context or a specific hierarchy context, or if a predetermined position change has occurred. The attributes are also changed if the determined context has not changed for a predetermined time or longer, or if the determined context has changed a predetermined number of times or more within a predetermined time. The response sentences output by the voice interaction agent are synthesized based on the changed attributes stored in the attribute storage area 113 and output as voice from the speaker 26. When the user inputs a phrase instructing termination, this process ends.
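Putting the pieces together, the control flow of S4-S22 can be summarized as below. Speech recognition, response generation, and synthesis are stubbed out as callables, since only the attribute switching is of interest here; the 5-minute window and the end words follow the embodiment, while everything else is the same set of illustrative assumptions as above.

```python
import time

END_WORDS = ("終わるよ", "バイバイ", "さよなら", "じゃあね", "終わり", "おやすみ")

def main_loop(recognize, respond, synthesize, parent_of):
    attrs = dict(INITIAL_ATTRIBUTES)                     # S1
    current = "0000"                                     # S1: initial determined context
    counter, start = 0, time.monotonic()                 # S2-S3
    while True:
        text = recognize()                               # S4-S5: input voice -> string
        if any(w in text for w in END_WORDS):            # S6: end instruction
            return
        before, current = determine_context(text, current)   # S7-S8
        elapsed = time.monotonic() - start
        if before == current:                            # S9: NO
            attrs, counter, reset = first_process(attrs, counter,
                                                  elapsed, INITIAL_ATTRIBUTES)  # S10
            if reset:
                start = time.monotonic()
        else:                                            # S9: YES
            counter += 1                                 # S12
            if elapsed >= 300 and counter >= 5:          # S13, S15
                attrs = dict(INITIAL_ATTRIBUTES)         # S17
                counter, start = 0, time.monotonic()     # S18-S19
            else:
                if elapsed >= 300:
                    start = time.monotonic()             # S16
                attrs = second_process(attrs, before, current, parent_of)  # S14
        synthesize(respond(text), attrs)                 # S20-S22
```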
Hereinafter, with reference to FIG. 10, the dialogue between the user and the voice interaction agent in the example shown in FIGS. 2 to 6 will be described with a specific example. In FIG. 10, the "dialogue number" is a number assigned to each pair of a user input sentence and an agent response sentence. The "input sentence from the user" is a sentence obtained by converting the voice input from the microphone 27 into characters. The "keyword" is the keyword extracted from the input sentence, and the "context" is the determined context decided by that keyword. Under "attributes", "acoustic model", "pitch", "speed", and "voice quality" are exemplified as voice attributes. The "agent response sentence" is the response sentence output by the voice interaction agent in response to the input sentence. In the following specific example, the entire dialogue takes place within 5 minutes.
First, the determined context ID of the initial determined context is set to "0000", and the initial attribute values are stored in the attribute storage area 113 of the RAM 11 (S1). For the input sentence of dialogue number 1, "Hello", the keyword "hello" is extracted (S7). Since "hello" is associated with the context named "general" with context ID "0000" (see FIG. 2), the determined context ID is set to "0000" (S8). The previously determined context is also "0000", so the context has not changed (S9: NO). In this case, if the time measured by the timer 18 is less than 5 minutes (S31: NO), the attributes remain at their initial values. The response sentence "Hello, have you been out anywhere lately?" is determined (S20), speech synthesis is performed according to the initial attributes (S21), and the response sentence is output (S22).
Next, the user speaks again, and the input sentence of dialogue number 2, "Let's see, I went to an exhibition", is input (S4: YES). The input voice is converted into characters (S5), and "exhibition" is extracted as the keyword (S7). Since "exhibition" is associated with the context named "art" with context ID "0101-0000" (see FIG. 2), the determined context ID is set to "0101-0000" (S8). The previously determined context is "0000", so the context has changed (S9: YES). The time measured by the timer 18 is less than 5 minutes (S13: NO), and the determined context is a semantic context (S38: YES). Since "0101-0000" is the context ID of the semantic context with the meaning "hobby" (see FIG. 4), "0.1" is added to the initial pitch of "1.0" to give "1.1", and the speed is changed from the initial value "1.0" to the change value "1.2" (S41). The response sentence "Oh, an exhibition. Do you look at paintings and sculptures?" is determined (S20), speech synthesis is performed with the changed voice attributes (S21), and the response sentence is output (S22).
Next, the input sentence of dialogue number 3, "This time it was a painting exhibition", is input (S4: YES). "Exhibition" is extracted as the keyword (S7). Since "exhibition" is associated with the context named "art" with context ID "0101-0000", the determined context ID is set to "0101-0000" (S8). The previously determined context ID is also "0101-0000", so the determined context has not changed (S9: NO). The response sentence "What kind of paintings?" is determined (S20), speech synthesis is performed with the same attributes as before (S21), and the response sentence is output (S22).
Next, the input sentence of dialogue number 4, "Japanese paintings", is input (S4: YES). "Japanese painting" is extracted as the keyword (S7). Since "Japanese painting" is associated with the context named "Japanese painting" with context ID "0202-0101" (see FIG. 2), the determined context ID is set to "0202-0101" (S8). The previously determined context is "0101-0000", so the context has changed (S9: YES). The time measured by the timer 18 is less than 5 minutes (S13: NO), and the determined context is a semantic context (S38: YES). Since context ID "0202-0101" is the semantic context with the meaning "strong field" (see FIG. 4), the acoustic model is changed to "modelC" (S41). The response sentence "Oh, Japanese painting. Old paintings, or modern Japanese painting?" is determined (S20), speech synthesis is performed with the changed attributes (S21), and the response sentence is output (S22).
Next, the input sentence of dialogue number 5, "Old ones. It was an exhibition of the Kano school", is input (S4: YES). "Kano school" is extracted as the keyword (S7). Since "Kano school" is associated with the context named "Kano school" with context ID "0304-0202" (see FIG. 2), the determined context ID is set to "0304-0202" (S8). The previously determined context is "0202-0101", so the context has changed (S9: YES). The time measured by the timer 18 is less than 5 minutes (S13: NO), and the determined context is neither a semantic context nor a specific hierarchy context (S38: NO, S39: NO), but it has moved one level down from the previously determined context (S40: YES). Therefore "0.1" is added to the stored speed value of "1.2" to give "1.3" (S43). The response sentence "What works by the Kano school were there?" is determined (S20), speech synthesis is performed with the changed attributes (S21), and the response sentence is output (S22).
Next, the input sentence of dialogue number 6, "Works by a painter called Kano Eitoku were the main exhibit", is input (S4: YES). "Kano Eitoku" is extracted as the keyword (S7). Since "Kano Eitoku" is associated with the context named "painter" with context ID "0400-0304" (see FIG. 2), the determined context ID is set to "0400-0304" (S8). The previously determined context is "0304-0202", so the context has changed (S9: YES). The time measured by the timer 18 is less than 5 minutes (S13: NO), and the determined context is a specific hierarchy context (lowest level) (S39: YES). Therefore the voice quality is changed to "0.4" (S42). The response sentence "What is Kano Eitoku's most famous work?" is determined (S20), speech synthesis is performed with the changed attributes (S21), and the response sentence is output (S22).
Next, the input sentence of dialogue number 7, "Maybe the national treasure Rakuchu Rakugai-zu folding screens", is input (S4: YES). "Rakuchu Rakugai-zu folding screens" is extracted as the keyword (S7). Since it is associated with the context named "work" with context ID "0401-0304" (see FIG. 2), the determined context ID is set to "0401-0304" (S8). The previously determined context is "0400-0304", so the context has changed (S9: YES). The time measured by the timer 18 is less than 5 minutes (S13: NO), and the determined context is a specific hierarchy context (lowest level) (S39: YES). Therefore the voice quality is changed to "0.4" (S42). The response sentence "Where is that painting kept?" is determined (S20), speech synthesis is performed with the changed attributes (S21), and the response sentence is output (S22).
Next, the input sentence of dialogue number 8, "Where was it... I forget. Bye-bye.", is input (S4: YES). "Bye-bye" is an end instruction (S6: YES), and the dialogue between the user and the voice interaction agent ends.
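Under the assumptions made so far, the sketches reproduce the attribute trace of this walk-through; the parent_of map below encodes only the path used in FIG. 10 and is, like the rest, illustrative.

```python
parent_of = {"0101": "0000", "0202": "0101", "0304": "0202", "0400": "0304"}

attrs = dict(INITIAL_ATTRIBUTES)
attrs = second_process(attrs, "0000", "0101-0000", parent_of)
# dialogue 2, meaning "hobby":  speed -> 1.2, pitch -> 1.1
attrs = second_process(attrs, "0101-0000", "0202-0101", parent_of)
# dialogue 4, "strong field":   voice_type -> "modelC"
attrs = second_process(attrs, "0202-0101", "0304-0202", parent_of)
# dialogue 5, one level down:   speed -> 1.3
attrs = second_process(attrs, "0304-0202", "0400-0304", parent_of)
# dialogue 6, lowest level:     voice_quality -> 0.4
```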
As described above, the output voice of the voice interaction agent can be changed according to the content (context) of the conversation between the user and the voice interaction agent. Since the output voice of the voice interaction agent thus matches the context, a natural dialogue can be carried out.
If voice attribute information indicating attributes suited to a context is stored in correspondence with that context, voice suited to the context, that is, to the content of the conversation, can be output. The output voice can therefore be switched to voice appropriate to the conversation content as the context changes, and the user can converse naturally without feeling a mismatch between the conversation content and the voice.
A user conversing with the voice interactive device 100 can grasp the hierarchy level of the conversation context from the output voice. The user can thus converse while keeping track of how the conversation content is changing, which helps make the conversation enjoyable. For example, if the specific hierarchy level is the lowest level, the user can tell that the context will not change to any more detailed content; if it is the highest level, the user can tell that the conversation can be shifted to more detailed content. Furthermore, if the tree structure is constructed so that the contexts at a given level carry some meaning, that meaning can be conveyed to the user through changes in the voice attributes.
The user can tell from the voice output by the voice interactive device 100 whether the conversation content is becoming deeper or shallower or is changing among contexts at the same level. The user can thus converse while keeping track of how the conversation content is changing, which helps make the conversation enjoyable.
A user conversing with the voice interactive device 100 can tell from the output voice that the context has switched many times within the predetermined time. The user can thus converse while sensing how the conversation content is changing, which helps make the conversation enjoyable.
A user conversing with the voice interactive device 100 can tell from the output voice that the same context has continued for the predetermined time or longer. Since the attributes of the output voice change even when the context has not changed, this too helps make the conversation enjoyable.
Note that the voice interactive device and voice interactive system according to the present disclosure are not limited to the embodiment described above; needless to say, various modifications can be made without departing from the gist of the present disclosure. In the above embodiment, the voice interactive device carrying the voice interactive program is a so-called personal computer, but the device carrying the voice interactive program need not be a personal computer. For example, it may be a portable terminal, a mobile phone, or a television, as long as it has a microphone for inputting voice and a speaker for outputting voice.
The context trees shown in FIGS. 2 and 3 are examples, and the context tree of this example need not necessarily be adopted. To output voice truly suited to the conversation between the user and the voice interaction agent, it is desirable to create contexts for many more fields and to use a more finely subdivided, deeper context tree. Likewise, the more finely the voice attribute information stored in the attribute information storage area 132 is set, the better the output voice can suit the conversation between the user and the voice interaction agent, and devising the structure of the context tree allows responses with even more appropriate voice. Note that if the number of contexts at the same level is increased to 100 or more, the number of digits of the context ID must be increased; the rule for assigning context IDs is not limited to that of the above embodiment. The voice interactive device may also be configured so that the user can add contexts and keywords to the context tree, in which case a character string may be accepted through the input device (keyboard or mouse) 24.
In the above embodiment, keywords are extracted from the input sentence and the context is determined based on the keyword that appears first in the input sentence. However, the keyword used for context determination is not limited to the first-appearing keyword. For example, when a plurality of keywords are present in the input sentence, the context at the lowest level among the contexts associated with those keywords may be taken as the determined context, as in the sketch below. When the same keyword is associated with a plurality of contexts, the determined context may be decided according to the context of the previous exchange, taking the flow of the dialogue into account. For example, suppose the keyword "program" is assigned to both the context named "concert" and the context named "computer"; in that case, if the context of the previous exchange was "music", the determined context may be set to "concert".
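A sketch of the lowest-level variant, assuming as before that the first two digits of the own ID give the hierarchy ID, so a larger value means a deeper context:

```python
def deepest_context(text, current_id):
    """Among all matching keywords, pick the context at the lowest (deepest) level."""
    hits = [cid for k, cid in KEYWORD_TO_CONTEXT.items() if k in text]
    if not hits:
        return current_id
    return max(hits, key=lambda cid: int(cid[:2]))   # larger hierarchy ID = deeper level
```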
In the above embodiment, the voice attributes are changed when, viewed from the determined context before the move, the determined context after the move is its parent, its parent's parent, its child, or its child's child. However, the attributes may be changed only when the determined contexts before and after the move are parent and child, or also when the relationship spans four or more generations.
In the above embodiment, the voice attributes are changed when the determined contexts before and after the move both belong to the same level and their identification numbers differ by one. However, when they both belong to the same level, the attributes may be changed regardless of the identification numbers, or may be changed only when they belong to the same level and share the same parent.

Claims (7)

  1.  A voice interactive device comprising:
     voice input means for inputting voice;
     conversion means for converting input voice, which is voice input by the voice input means, into a character string;
     context storage means for storing conversation contexts in correspondence with keywords;
     context determination means for extracting a keyword stored in the context storage means from the converted character string, which is the character string converted by the conversion means, and determining the context stored in the context storage means in correspondence with the extracted keyword as the context of the input voice;
     conversation sentence determination means for determining a conversation sentence according to the input voice;
     voice output means for outputting voice;
     attribute storage means for storing attributes of the voice output by the voice output means;
     output control means for causing the voice output means to output, as voice, the conversation sentence determined by the conversation sentence determination means with the attributes stored in the attribute storage means;
     judgment means for judging whether the determined context, which is the context determined by the context determination means, has changed from the previously determined context, which is the context previously determined by the context determination means; and
     attribute change means for changing the voice attributes stored in the attribute storage means when the judgment means judges that the determined context has changed.
  2.  The voice interactive device according to claim 1, further comprising attribute information storage means for storing, in correspondence with contexts, voice attribute information relating to the attributes of the voice output by the voice output means,
     wherein the attribute change means, when the determined context changes to a context for which the voice attribute information is stored in the attribute information storage means, changes the voice attributes stored in the attribute storage means to the voice attributes indicated by the voice attribute information corresponding to the determined context.
  3.  The voice interactive device according to claim 1 or 2, wherein the data structure of the context storage means is a tree structure storing a plurality of contexts such that the conversation content becomes more detailed as the tree structure is descended from higher to lower levels, and
     the attribute change means changes the voice attributes when the determined context and the previously determined context are in a parent-child relationship in the tree structure or belong to the same level.
  4.  The voice interactive device according to claim 1 or 2, wherein the data structure of the context storage means is a tree structure storing a plurality of contexts such that the conversation content becomes more detailed as the tree structure is descended from higher to lower levels, and
     the attribute change means changes the voice attributes when the determined context becomes a context at a predetermined level of the tree structure.
  5.  The voice interactive device according to any one of claims 1 to 4, wherein the attribute change means changes the voice attributes when the determined context has changed a predetermined number of times or more within a first predetermined time.
  6.  The voice interactive device according to any one of claims 1 to 5, wherein the attribute change means changes the voice attributes when the time during which the determined context does not change is a second predetermined time or longer.
  7.  A computer-readable medium storing a voice interactive program that causes a computer to operate as the various processing means of the voice interactive device according to any one of claims 1 to 6.
PCT/JP2008/072703 2008-01-10 2008-12-12 Voice interactive device and computer-readable medium containing voice interactive program WO2009087860A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2008-002851 2008-01-10
JP2008002851 2008-01-10

Publications (1)

Publication Number Publication Date
WO2009087860A1 (en)

Family

ID=40852985

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2008/072703 WO2009087860A1 (en) 2008-01-10 2008-12-12 Voice interactive device and computer-readable medium containing voice interactive program

Country Status (2)

Country Link
JP (1) JP2009186989A (en)
WO (1) WO2009087860A1 (en)

US11638059B2 (en) 2019-01-04 2023-04-25 Apple Inc. Content playback on multiple devices
US11348573B2 (en) 2019-03-18 2022-05-31 Apple Inc. Multimodality in digital assistant systems
US11423908B2 (en) 2019-05-06 2022-08-23 Apple Inc. Interpreting spoken requests
US11307752B2 (en) 2019-05-06 2022-04-19 Apple Inc. User configurable task triggers
DK201970509A1 (en) 2019-05-06 2021-01-15 Apple Inc Spoken notifications
US11475884B2 (en) 2019-05-06 2022-10-18 Apple Inc. Reducing digital assistant latency when a language is incorrectly determined
US11140099B2 (en) 2019-05-21 2021-10-05 Apple Inc. Providing message response suggestions
DK180129B1 (en) 2019-05-31 2020-06-02 Apple Inc. User activity shortcut suggestions
US11496600B2 (en) 2019-05-31 2022-11-08 Apple Inc. Remote execution of machine-learned models
US11289073B2 (en) 2019-05-31 2022-03-29 Apple Inc. Device text to speech
US11360641B2 (en) 2019-06-01 2022-06-14 Apple Inc. Increasing the relevance of new available information
WO2021056255A1 (en) 2019-09-25 2021-04-01 Apple Inc. Text detection using global geometry estimators

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002032370A (en) * 2000-07-18 2002-01-31 Fujitsu Ltd Information processor
JP2005241952A (en) * 2004-02-26 2005-09-08 Gap Kk Device, method, and program for knowledge processing
JP2006010845A (en) * 2004-06-23 2006-01-12 Nippon Hoso Kyokai <Nhk> Synthesized speech uttering device and program thereof, and data set generating device for speech synthesis, and program thereof
JP2007272773A (en) * 2006-03-31 2007-10-18 Xing Inc Interactive interface control system

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110875059A (en) * 2018-08-31 2020-03-10 深圳市优必选科技有限公司 Method and device for judging reception end and storage device
CN110875059B (en) * 2018-08-31 2022-08-05 深圳市优必选科技有限公司 Method and device for judging reception end and storage device

Also Published As

Publication number Publication date
JP2009186989A (en) 2009-08-20

Similar Documents

Publication Publication Date Title
WO2009087860A1 (en) Voice interactive device and computer-readable medium containing voice interactive program
KR101143034B1 (en) Centralized method and system for clarifying voice commands
JP3679350B2 (en) Program, information storage medium and computer system
JP4395687B2 (en) Information processing device
WO2017168870A1 (en) Information processing device and information processing method
CN101622659A (en) Voice tone editing device and voice tone editing method
CN104899240B (en) Voice search device, speech search method
US20090204399A1 (en) Speech data summarizing and reproducing apparatus, speech data summarizing and reproducing method, and speech data summarizing and reproducing program
JP2008083100A (en) Voice interactive device and method therefor
JP2013196661A (en) Input control program, input control device, input control system and input control method
JP2022020659A (en) Method and system for recognizing feeling during conversation, and utilizing recognized feeling
JP2006251042A (en) Information processor, information processing method and program
KR101891495B1 (en) Method and computer device for controlling a display to display conversational response candidates to a user utterance input, and computer readable recording medium
JP2008011272A5 (en)
US20210081164A1 (en) Electronic apparatus and method for providing manual thereof
JP6642367B2 (en) Karaoke device and karaoke program
JP2007010995A (en) Speaker recognition method
JP2016090776A (en) Response generation apparatus, response generation method, and program
JP5488200B2 (en) Dialog apparatus, dialog method, and program
JP7230085B2 (en) Method and device, electronic device, storage medium and computer program for processing sound
JP2013239021A (en) Conference support system and method, computer program, and recording medium
JP2019078924A (en) Utterance contents evaluation system and utterance contents evaluation program
JP2013003430A (en) Karaoke device
JP2006235040A (en) Image forming apparatus, program, and recording medium
JP6747741B1 (en) Content creation support system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
Ref document number: 08870526
Country of ref document: EP
Kind code of ref document: A1
NENP Non-entry into the national phase
Ref country code: DE
122 Ep: pct application non-entry in european phase
Ref document number: 08870526
Country of ref document: EP
Kind code of ref document: A1