WO2024069974A1 - Dialogue device, dialogue method, and dialogue program - Google Patents

Dialogue device, dialogue method, and dialogue program

Info

Publication number
WO2024069974A1
Authority
WO
WIPO (PCT)
Prior art keywords
state
dialogue
domain
utterance
user
Prior art date
Application number
PCT/JP2022/036821
Other languages
French (fr)
Japanese (ja)
Inventor
航 光田
竜一郎 東中
哲也 杵渕
Original Assignee
日本電信電話株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 日本電信電話株式会社
Priority to PCT/JP2022/036821
Publication of WO2024069974A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 - Details of database functions independent of the retrieved data types

Definitions

  • the present invention relates to a dialogue device, a dialogue method, and a dialogue program.
  • Chat dialogue systems, in which computers converse with humans using natural language, are known.
  • Conventional chat dialogue systems are mainly stateless and use a question-and-answer format in which the dialogue system selects and generates utterances based only on information from the user's most recent utterance.
  • This type of chat dialogue system has difficulty sustaining a dialogue that goes beyond a question-and-answer format, so users cannot have casual conversations that feel coherent to them, which results in low user satisfaction.
  • In Non-Patent Document 1, a technology has been proposed that limits the domain of casual conversation to travel and gives the dialogue system a state to realize a more coherent dialogue.
  • Such chat dialogue systems use a state transition diagram that manages the current state and a foundation table that represents the topic of a conversation shared with a user at a certain point in time; they transition to an appropriate state in response to the user's utterances and update the foundation table.
  • However, state transition diagrams and foundation tables are designed specifically for a particular domain to enable coherent dialogue. Therefore, to enable coherent dialogue in other domains, it is necessary to design state transition diagrams and foundation tables appropriate for each domain, which requires advanced knowledge of the dialogue system's utterance understanding and dialogue management and results in high construction costs.
  • the present invention was made in consideration of the above, and aims to make it possible to realize a domain-independent chat dialogue system.
  • the dialogue device is characterized by having a storage unit that stores domain knowledge representing information of a specific domain to which a topic belongs and a state transition diagram representing transitions of specific states independent of the domain to which the topic belongs, an acquisition unit that acquires text representing an utterance by a user, an identification unit that identifies a dialogue state including a current state of a dialogue with the user and a destination state using information contained in the acquired text, the domain knowledge, and the state transition diagram, and a generation unit that generates an utterance according to the identified destination state.
  • the present invention makes it possible to realize a domain-independent chat dialogue system.
  • FIG. 1 is a diagram for explaining an overview of the dialogue device according to the first embodiment.
  • FIG. 2 is a schematic diagram illustrating a schematic configuration of the dialogue device according to the first embodiment.
  • FIG. 3 is a diagram illustrating an example of a data structure of domain knowledge.
  • FIG. 4 is a diagram for explaining the foundation table.
  • FIG. 5 is a diagram for explaining the process of the extraction unit.
  • FIG. 6 is a diagram illustrating a state transition diagram.
  • FIG. 7 is a diagram illustrating a state transition diagram.
  • FIG. 8 is a diagram illustrating an example of a speech template.
  • FIG. 9 is a diagram illustrating an example of the dialogue processing result.
  • FIG. 10 is a flowchart showing a dialogue processing procedure according to the first embodiment.
  • FIG. 11 is a diagram for explaining an overview of the dialogue device according to the second embodiment.
  • FIG. 12 is a schematic diagram illustrating a schematic configuration of the dialogue device according to the second embodiment.
  • FIG. 13 is a diagram for explaining the inter-page link structure.
  • FIG. 14 is a diagram for explaining the noun category dictionary.
  • FIG. 15 is a diagram for explaining the process of the creation unit.
  • FIG. 16 is a diagram illustrating a result of creating domain knowledge.
  • FIG. 17 is a flowchart showing a dialogue processing procedure according to the second embodiment.
  • FIG. 18 is a diagram illustrating an example of a computer that executes a dialogue program.
  • FIG. 1 is a diagram for explaining an overview of a dialogue device according to a first embodiment.
  • the dialogue device according to the first embodiment extends the foundation table 14b and the state transition diagram 14c to be domain-independent, and executes a chat dialogue with a user that is domain-independent.
  • This dialogue device uses resources specialized for the domain: a speech template 14d for generating utterances of the system (hereinafter, system utterances) and domain knowledge 14a used for understanding and presenting topics. For example, by associating a speech template 14d specialized for a desired domain with each state of the state transition diagram 14c, a system utterance of the desired domain is output, and a dialogue that engages with the user is realized.
  • Fig. 2 is a schematic diagram illustrating a schematic configuration of the dialogue device according to the first embodiment.
  • the dialogue device 10 of the present embodiment is realized by a general-purpose computer such as a personal computer, and includes an input unit 11, an output unit 12, a communication control unit 13, a storage unit 14, and a control unit 15.
  • the input unit 11 is realized using input devices such as a keyboard and a mouse, and inputs various instruction information, such as a command to start processing, to the control unit 15 in response to input operations by an operator.
  • the output unit 12 is realized by a display device such as a liquid crystal display, a printer, etc. For example, the output unit 12 displays the results of the dialogue processing described below.
  • the communication control unit 13 is realized by a NIC (Network Interface Card) or the like, and controls communication between the control unit 15 and external devices via telecommunication lines such as a LAN (Local Area Network) or the Internet.
  • the communication control unit 13 controls communication between the control unit 15 and a management device that collects and manages information about user terminals used by users and various domains.
  • the storage unit 14 is realized by a semiconductor memory element such as a RAM (Random Access Memory) or a flash memory, or a storage device such as a hard disk or an optical disk.
  • the storage unit 14 stores in advance the processing programs that operate the dialogue device 10 and data used during the execution of the processing programs, or stores them temporarily each time processing is performed.
  • the storage unit 14 may be configured to communicate with the control unit 15 via the communication control unit 13.
  • the storage unit 14 stores domain knowledge 14a, foundation table 14b, state transition diagram 14c, speech template 14d, etc., which are used in the dialogue processing described below.
  • the domain knowledge 14a, foundation table 14b, state transition diagram 14c, and speech template 14d are acquired in advance from a management device that manages various types of information, for example, by the acquisition unit 15a described below, and stored in the storage unit 14.
  • Domain knowledge 14a represents information on a specific domain to which the topic belongs.
  • Foundation table 14b shows the correspondence between domain knowledge 14a and state transition diagram 14c.
  • State transition diagram 14c represents a specific state transition that is independent of the domain to which the topic belongs.
  • Speech template 14d is a template for system utterances according to a specific domain, and corresponds to a state in state transition diagram 14c.
  • FIG. 3 is a diagram illustrating an example of the data structure of domain knowledge.
  • Domain knowledge 14a is information that represents topics necessary for casual conversation in a specific domain in a tree structure, and includes, for example, information on topics, entities, categories, and scores, as illustrated in FIG. 3.
  • Topics are domain information that can be an introductory topic in the dialogue processing described below.
  • Entities are information that can be the focus of a topic.
  • categories are information that represents the attributes of each entity. In this way, topics, entities, and categories differ in the depth of the topic; whereas entities are the topics themselves, topics are topics used as an introduction without directly mentioning an entity, and categories represent topics about specific attributes of an entity.
  • FIG. 3 illustrates domain knowledge 14a for the cooking domain.
  • Topics in this cooking domain are, for example, Japanese cuisine, Chinese cuisine, Italian cuisine, etc.
  • entities under Japanese cuisine are, for example, mochi (rice cake), onigiri (rice ball), oden, etc.
  • categories for mochi and onigiri include rice, and categories for oden include bonito flakes, etc.
  • the scores shown in FIG. 3 indicate the priority of the topics, with a higher value indicating a higher priority.
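  • The tree structure described above can be sketched roughly as follows. This is a minimal illustration, not the patent's data format: the class names `Topic` and `Entity`, the helper `highest_priority`, and all score values are assumptions made for this example; only the dish names and the rule that a higher score means a higher priority come from the text.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Entity:
    """A dish that can be the focus of a topic, with its attribute categories."""
    name: str
    categories: list[str]
    score: float = 0.0


@dataclass
class Topic:
    """An introductory topic of the domain, holding its entities as children."""
    name: str
    score: float = 0.0
    entities: list[Entity] = field(default_factory=list)


# Hypothetical cooking-domain knowledge. Names follow the text; scores are invented.
COOKING = [
    Topic("Japanese cuisine", 0.9, [
        Entity("mochi", ["rice"], 0.5),
        Entity("onigiri", ["rice"], 0.8),
        Entity("oden", ["bonito flakes"], 0.6),
    ]),
    Topic("Chinese cuisine", 0.7),
    Topic("Italian cuisine", 0.6),
]


def highest_priority(items):
    """Return the item with the largest score (higher score = higher priority)."""
    return max(items, key=lambda item: item.score)
```

  • With this layout, choosing what to present next is a matter of picking the highest-scoring child at the current depth of the tree.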
  • The control unit 15 refers to the domain knowledge 14a to identify which topic the user's utterance mentions and to provide new topics. For example, to hold a conversation that is relevant to the user, the control unit 15 shifts the subject in the order of topic, entity, and category, using topics, entities, and categories that are set to satisfy predetermined conditions.
  • a has-a relationship is one in which one of the two entities in the relationship contains the other as an element.
  • an is-a relationship is one in which one of the two entities in the relationship is an abstract concept of the other. For example, “cooking” and “Japanese cuisine”, and “vehicles” and “automobiles” each have an is-a relationship.
  • entities and categories need to have a many-to-one relationship, not a one-to-one relationship. If the topic changes from topic to entity to category, and then back to topic again, the user will get the impression that the topic has changed suddenly, and will not feel that the conversation was coherent. For this reason, it is necessary to change the topic to a different entity in the same category, and multiple entities need to be associated with one category. Note that an entity may be associated with multiple categories.
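  • The many-to-one requirement can be illustrated with a small lookup. The sketch below assumes a hypothetical mapping `ENTITY_CATEGORIES` and a hypothetical helper `next_entity_in_category`; it only shows why several entities per category are needed in order to move to a different entity without leaving the category.

```python
# Hypothetical mapping: entity -> categories (an entity may have several categories).
ENTITY_CATEGORIES = {
    "onigiri": ["rice"],
    "chimaki": ["rice"],
    "mochi": ["rice"],
    "oden": ["bonito flakes"],
}


def next_entity_in_category(current, category, already_used=()):
    """Return another entity that shares `category`, or None if there is none."""
    for entity, categories in ENTITY_CATEGORIES.items():
        if entity == current or entity in already_used:
            continue
        if category in categories:
            return entity
    return None
```

  • In this toy data the category "bonito flakes" has only one entity, so no same-category shift is available from "oden", which is exactly the situation the many-to-one requirement is meant to avoid.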
  • FIG. 4 is a diagram for explaining the foundation table.
  • Foundation table 14b is information that the system understands about the dialogue with the user, and represents the topic at a certain point in the dialogue between the user and the system. Specifically, in foundation table 14b, the state of the dialogue is defined by the domain-independent slots in each row illustrated in FIG. 4 and their values.
  • The slots in each row hold domain-independent information such as the target speaker, topic type, topic, entity, and category. Topic, entity, and category have the same meaning as in the domain knowledge 14a described above. Furthermore, each value is associated with a value in domain knowledge 14a by the identification unit 15c described later.
  • the target speaker indicates whether the topic is focused on the user or the system, and is set based on the state transition diagram 14c described below.
  • the topic type is information that roughly indicates pre-set topics such as favorite dishes, dishes you want to eat now, and dishes you don't like, and is set according to the order of the state transition diagram.
  • the target speaker and topic type specify which of the set of speech templates 14d (described below) defined for each target speaker and topic type should be applied. This makes it possible to output system utterances flexibly according to the target speaker and topic type.
  • For example, when asking the user about the food they like, the speech template 14d "What food do you like?" is used. When asking the user about the food they want to eat now, the speech template 14d "What food do you feel like eating right now?" is used.
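  • As a rough illustration of how the target speaker and topic type could select a template, the sketch below models the foundation table as a small record and keys the templates by the pair of slot values. The class name `FoundationTable`, the slot value strings such as "favorite" and "want_now", and the function `select_template` are assumptions for this example; the two template sentences are the ones quoted above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FoundationTable:
    """Domain-independent slots; the value strings used here are hypothetical."""
    target_speaker: str = "user"
    topic_type: str = "favorite"
    topic: Optional[str] = None
    entity: Optional[str] = None
    category: Optional[str] = None


# Templates keyed by (target speaker, topic type).
TEMPLATES = {
    ("user", "favorite"): "What food do you like?",
    ("user", "want_now"): "What food do you feel like eating right now?",
}


def select_template(table):
    """Pick the template that matches the table's target speaker and topic type."""
    return TEMPLATES[(table.target_speaker, table.topic_type)]
```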
  • control unit 15 is realized using a CPU (Central Processing Unit) or the like, and executes a processing program stored in memory.
  • the control unit 15 functions as an acquisition unit 15a, an extraction unit 15b, an identification unit 15c, and a generation unit 15d, as exemplified in FIG. 2, to execute dialogue processing.
  • each of these functional units, or some of them, may be implemented in different hardware.
  • the control unit 15 may also include other functional units.
  • the control unit 15 may include a creation unit 15e, which will be described later.
  • the acquisition unit 15a acquires text representing an utterance by a user. For example, the acquisition unit 15a acquires an utterance represented by text of a user having a dialogue via the input unit 11 or via the communication control unit 13 from a user terminal or the like.
  • the extraction unit 15b extracts information for identifying the state of the dialogue from the acquired text.
  • the extraction unit 15b performs morphological analysis using any language analysis tool to extract focus words, which are keywords that represent topics, as well as proper nouns, evaluation expressions, and modalities such as negation expressions. This enables the identification unit 15c, which will be described later, to identify the state to which the transition will be made in the state transition diagram 14c and to identify the information to be set in the foundation table 14b.
  • FIG. 5 is a diagram for explaining the processing of the extraction unit.
  • FIG. 5 illustrates an example of the analysis results using a language analysis tool called Richindexer.
  • In FIG. 5, line represents the input sentence, and forms and poses represent the morphological analysis results. In addition, names represents proper nouns, sems represents modality, evals represents evaluative expressions, cents represents focus words, and da represents the estimated dialogue act.
  • the identification unit 15c uses the information contained in the acquired text, the domain knowledge 14a, and the state transition diagram 14c to identify the state of the dialogue with the user, including the current state and the transition destination state.
  • Specifically, the identification unit 15c identifies the state of the dialogue using the extracted information, domain knowledge 14a, and state transition diagram 14c. That is, the identification unit 15c uses the analysis results of the extraction unit 15b to generate a foundation table 14b that indicates the correspondence between domain knowledge 14a and state transition diagram 14c, and uses the generated foundation table 14b and the domain-independent state transition diagram 14c to grasp the current state of the dialogue and identify the next state to which the dialogue will transition.
  • the generation unit 15d generates an utterance according to the identified transition destination state. Specifically, the generation unit 15d generates an utterance using a speech template 14d for a predetermined domain that is associated with a state in the state transition diagram 14c.
  • Figs. 6 and 7 are diagrams illustrating state transition diagrams.
  • State transition diagram 14c is information that represents the transitions of various states that are independent of the domain.
  • The states are of the following kinds: the speech state, the user utterance acquisition state, the analysis state, the notational state, and the conditional branch state.
  • the speech bubbles in Fig. 6 represent the observed speech and processing, and the bold-framed arrows represent the transitions of the observed states.
  • the speech states shown by diagonal shading from the upper left to the lower right in Figure 6 are states in which system utterances are made, and each is associated with a speech template 14d, described below.
  • the user utterance acquisition state shown by diagonal shading from the upper right to the lower left in Figure 6 is a state in which the system waits until acquisition of the user's utterance is complete.
  • the analysis state is a state in which analysis of the user utterance is made, or a state in which the transition destination changes as a result of searching domain knowledge 14a.
  • the notational states shown by vertical dashed shading in Figure 6 are states shown so that the arrows do not overlap for the convenience of notating the state transition diagram 14c.
  • the conditional branch state is a state in which the transition destination changes based only on the dialogue state.
  • the identified states include a negative judgment state, a positive judgment state, an experience judgment state, an impression judgment state, an element (topic, entity, category) extraction state in the user's utterance, an element (topic, entity, category) search state in the domain knowledge, and a random transition state. This makes it possible to express complex transitions, such as executing a specific process depending on the dialogue state.
  • Speech template 14d is a template for creating system utterances.
  • In speech template 14d, the information for each row of the slots in foundation table 14b is written as a blank.
  • A system utterance is generated by inserting the current values in foundation table 14b into the blanks.
  • System utterances for multiple speech states that are passed through before transitioning to the user utterance acquisition state are output together as one. Also, if multiple utterances are associated with the next state, one utterance is selected and output based on an arbitrary priority, such as dictionary order.
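  • The behavior of passing through several speech states and emitting one combined system utterance can be sketched as a small loop. The state names, the tables `STATES` and `UTTERANCES`, and the function `run_until_user_turn` are hypothetical; only two of the state kinds are modeled, and dictionary order is used as the priority because the text names it as one possible choice.

```python
# Hypothetical fragment of a state transition diagram: state -> (kind, next state).
STATES = {
    "greet": ("speech", "ask_topic"),
    "ask_topic": ("speech", "wait_user"),
    "wait_user": ("user_utterance_acquisition", "analyze"),
}

# Candidate utterances per speech state (invented example sentences).
UTTERANCES = {
    "greet": ["Hi there!", "Hello!"],
    "ask_topic": ["What food do you like?"],
}


def run_until_user_turn(state):
    """Walk speech states until a user utterance acquisition state is reached.

    Returns the combined system utterance and the state where the walk stopped.
    """
    parts = []
    while STATES[state][0] == "speech":
        # Several candidates: pick one by dictionary order, as one possible priority.
        parts.append(min(UTTERANCES[state]))
        state = STATES[state][1]
    return " ".join(parts), state
```

  • Calling `run_until_user_turn("greet")` in this toy setup joins the greeting and the question into a single output and stops at the state that waits for the user.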
  • the state transition diagram 14c is not limited to the example shown in Figure 6, but is designed to meet the requirements listed below. This allows the user to get the feeling that they are having a complex dialogue with the system.
  • Because the state transition diagram 14c is domain-independent, it is not limited to cooking but can also be applied to other domains such as travel, food, and sports, enabling dialogue that provides high user satisfaction.
  • When a value is set during a dialogue for an element of a specific depth, such as a topic, the system outputs utterances that respond appropriately to whatever the user mentions, whether it is another element of the same depth or a deeper element such as an entity or category.
  • The target speaker is first the user and then the system, and this alternation repeats.
  • the topic type will change value in a pre-defined order, and once all topic types have been used, the value of the last topic type will continue to be used.
  • the dialogue state values are carried over as much as possible.
  • For example, the subject is changed from "onigiri" to "chimaki", which is in the same category of "rice", by changing only the entity.
  • If that is not possible, the entity and category are changed to "miso soup" or the like within the same topic of "Japanese cuisine."
  • When changing the subject, if the number of changes between topics, entities, or categories reaches a predetermined maximum number, the target speaker is changed and the dialogue continues. For example, after the user's favorite food has been discussed, the subject changes to the system's predetermined favorite food, and if the number of changes again reaches the predetermined maximum number, the topic type is changed.
  • The system asks the user about a topic, presents an entity, asks about their experience with that entity, and then asks about their impression of the category. If the user speaks further ahead than expected, such as when the system presents an entity and the user immediately gives their impression, the system skips the intermediate states and transitions directly to the corresponding state.
  • In response to the user's experiences and impressions, the system outputs utterances that correspond to the presence or absence of positive/negative expressions, evaluation expressions, focus words, etc.
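  • Two of the requirements above, the alternation of the target speaker and the fixed order of topic types, can be sketched as a single update step. This is one possible reading, not the patent's procedure: the list `TOPIC_TYPES`, its value strings, and the function `advance` are assumptions, and the step is assumed to be triggered when the predetermined maximum number of changes is reached.

```python
# Hypothetical topic types in their predefined order.
TOPIC_TYPES = ["favorite", "want_now", "dislike"]


def advance(target_speaker, topic_type_index):
    """Return the next (target speaker, topic type index).

    user -> system keeps the topic type; system -> user moves to the next
    topic type, and the last topic type keeps being used once all are spent.
    """
    if target_speaker == "user":
        return "system", topic_type_index
    last_index = len(TOPIC_TYPES) - 1
    return "user", min(topic_type_index + 1, last_index)
```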
  • The description format of state transition diagram 14c is not particularly limited; any format capable of expressing state transitions may be used.
  • For example, state transition diagram 14c may be written in SCXML format, as illustrated in FIG. 7.
  • In this case, variables corresponding to foundation table 14b are defined in the datamodel tag, and each state is defined in a state tag.
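  • To make the SCXML layout concrete, the sketch below embeds a tiny SCXML document and reads it with Python's standard XML parser. The document content is invented for illustration and is not the content of FIG. 7: the state ids, the event name "done", and the slot ids are assumptions. Only the use of the datamodel tag for foundation table variables and the state tag for states follows the text.

```python
import xml.etree.ElementTree as ET

# A hypothetical, minimal SCXML document (not the one shown in FIG. 7).
SCXML = """<scxml xmlns="http://www.w3.org/2005/07/scxml" version="1.0" initial="ask_topic">
  <datamodel>
    <data id="target_speaker" expr="'user'"/>
    <data id="topic"/>
  </datamodel>
  <state id="ask_topic">
    <transition event="done" target="wait_user"/>
  </state>
  <state id="wait_user"/>
</scxml>"""

NS = {"sc": "http://www.w3.org/2005/07/scxml"}
root = ET.fromstring(SCXML)

# Variables that correspond to the foundation table slots.
slots = [data.get("id") for data in root.findall("sc:datamodel/sc:data", NS)]

# Each state with its outgoing (event, target) transitions.
transitions = {
    state.get("id"): [
        (t.get("event"), t.get("target")) for t in state.findall("sc:transition", NS)
    ]
    for state in root.findall("sc:state", NS)
}
```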
  • FIG. 8 is a diagram illustrating an example of a speech template.
  • the speech template 14d is a list of utterances that are associated one-to-one with speech states by domain. A unique ID expressed as text is assigned to the speech state, and an ID is also assigned to each utterance. When a state transition occurs and a certain state is reached, an utterance corresponding to that state is output. An utterance can be set by any text, including an empty sentence.
  • FIG. 8 shows an example of a speech template expressed in YAML format.
  • An utterance is set either as a single value or as a list whose items begin with "-".
  • text in [ ] such as the user name (user_name), system name (sys_name), topic (topic), etc. represents a special variable, and the corresponding value in the foundation table 14b is substituted.
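  • The substitution of bracketed variables can be sketched with a regular expression. The function name `fill_template` and the example sentence are assumptions; the variable names `user_name` and `topic` come from the text. Leaving unknown variables untouched is a design choice of this sketch, not something the source specifies.

```python
import re


def fill_template(template, table):
    """Replace each [name] with the value of that slot in `table`.

    Variables that have no value in the table are left as they are.
    """
    def substitute(match):
        name = match.group(1)
        return str(table[name]) if name in table else match.group(0)

    return re.sub(r"\[(\w+)\]", substitute, template)
```

  • For instance, filling "[user_name], do you like [topic]?" from a table that holds a user name and a topic yields a complete system utterance.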
  • FIG. 9 is a diagram illustrating an example of the dialogue processing result.
  • FIG. 9 illustrates the dialogue processing result for the "food" domain. It was confirmed by multiple evaluators that the level of satisfaction achieved was equal to or higher than that achieved by the dialogue system described in Non-Patent Document 1.
  • Fig. 10 is a flowchart showing a dialogue processing procedure according to the first embodiment.
  • the flowchart in Fig. 10 is started, for example, when a user performs an operation input to instruct the start of the dialogue processing.
  • the acquisition unit 15a acquires text representing an utterance by a user (step S1). For example, the acquisition unit 15a acquires an utterance represented by text of a user having a dialogue via the input unit 11 or via the communication control unit 13 from a user terminal or the like.
  • the extraction unit 15b extracts information for identifying the state of the dialogue from the acquired text (step S2). For example, the extraction unit 15b performs morphological analysis using any language analysis tool to extract focus words, which are keywords that indicate topics, as well as proper nouns, evaluative expressions, and modalities such as negative expressions.
  • the identification unit 15c uses the information included in the acquired text, the domain knowledge 14a, and the state transition diagram 14c to identify the state of the dialogue with the user, including the current state and the transition destination state (step S3). Specifically, the identification unit 15c uses the extracted information, the domain knowledge 14a, and the state transition diagram 14c to identify the state of the dialogue.
  • the identification unit 15c uses the analysis results of the extraction unit 15b to generate a foundation table 14b that indicates the correspondence between the domain knowledge 14a and the state transition diagram 14c, and uses the generated foundation table 14b and the domain-independent state transition diagram 14c to grasp the current state of the dialogue and identify the next state to which the dialogue will transition.
  • the generation unit 15d generates an utterance according to the identified transition destination state (step S4). Specifically, the generation unit 15d generates an utterance using a speech template 14d corresponding to a predetermined domain that is associated with a state in the state transition diagram 14c. The generation unit 15d outputs the generated utterance to a user terminal or the like via the output unit 12 or the communication control unit 13, and presents it to the user. This completes the series of dialogue processes.
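  • Steps S1 to S4 can be strung together in a toy pipeline such as the one below. Everything here is a stand-in: substring matching replaces real language analysis, the two state names and their templates are invented, and the function names `extract`, `identify`, `generate`, and `respond` are assumptions chosen to mirror the unit names.

```python
# Hypothetical domain knowledge: entity -> topic it belongs to.
ENTITIES = {"onigiri": "Japanese cuisine", "oden": "Japanese cuisine"}

# Hypothetical speech templates keyed by state name.
TEMPLATES = {
    "ask_experience": "Have you ever made [entity] yourself?",
    "present_entity": "Speaking of food, how about onigiri?",
}


def extract(text):
    """Step S2: toy extraction that finds known entities by substring match."""
    lowered = text.lower()
    return [entity for entity in ENTITIES if entity in lowered]


def identify(found, table):
    """Step S3: update the foundation table and choose the destination state."""
    if found:
        table["entity"] = found[0]
        table["topic"] = ENTITIES[found[0]]
        return "ask_experience"
    return "present_entity"


def generate(state, table):
    """Step S4: fill the template of the destination state from the table."""
    text = TEMPLATES[state]
    for slot, value in table.items():
        text = text.replace(f"[{slot}]", str(value))
    return text


def respond(user_text, table):
    """Step S1 is the caller handing over the user's text."""
    return generate(identify(extract(user_text), table), table)
```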
  • Second Embodiment
  • Domain knowledge 14a, which aggregates information on a specific domain to which a topic belongs, has conventionally been created using regular expressions written manually for a specific domain and linguistic analysis of Wikipedia articles, for example, using the categories that are pre-assigned to Wikipedia articles.
  • the dialogue device of the second embodiment therefore acquires topics by utilizing link relationships between articles and creates domain knowledge 14a. For example, since "onigiri" is considered to be a representative dish in the field of "Japanese cuisine," there is a high possibility that there is a link relationship between an article on "Japanese cuisine" and an article on "onigiri" in which the text of one article contains a link to the other article.
  • the dialogue device then creates domain knowledge 14a in a tree structure in which nouns expressing topics such as "onigiri" and "chimaki" that belong to the same category, such as "food," are represented as nodes, and relationships between topics are represented as edges.
  • domain knowledge 14a represents the relationship that "onigiri" is included in "Japanese cuisine" and is made of "rice."
  • FIG. 11 is a diagram for explaining an overview of the dialogue device according to the second embodiment.
  • the dialogue device treats nouns that the noun category dictionary 14f, a dictionary that maps nouns to specific categories, assigns to the same category as topics belonging to the same category. It treats the inter-page link structure 14e, which is the link structure between article pages that each explain a specific subject, as the associations between topics, and creates tree-structured domain knowledge 14a from the two.
  • Fig. 12 is a schematic diagram illustrating a schematic configuration of a dialogue device according to the second embodiment.
  • the dialogue device 10a shown in Fig. 12 differs from the dialogue device 10 of the first embodiment shown in Fig. 2 in that the storage unit 14 of the dialogue device 10a stores an inter-page link structure 14e and a noun category dictionary 14f instead of the foundation table 14b, the state transition diagram 14c, and the speech template 14d.
  • the control unit 15 also differs from the dialogue device 10 of the first embodiment in that it has a creation unit 15e instead of the extraction unit 15b, the identification unit 15c, and the generation unit 15d. Descriptions of other functional units similar to those of the dialogue device 10 shown in Fig. 2 will be omitted.
  • the storage unit 14 may store the foundation table 14b, the state transition diagram 14c, and the speech template 14d, and the control unit 15 may have an extraction unit 15b, an identification unit 15c, and a generation unit 15d.
  • the storage unit 14 stores domain knowledge 14a created in the dialogue processing described below, and an inter-page link structure 14e and a noun category dictionary 14f used in the dialogue processing.
  • the noun category dictionary 14f is a dictionary that maps nouns to specific categories.
  • the inter-page link structure 14e is information that includes links on the explanation page of each noun to the explanation page of other nouns.
  • the inter-page link structure 14e and the noun category dictionary 14f are acquired in advance by the acquisition unit 15a from, for example, a management device, and stored in the storage unit 14.
  • FIG. 13 is a diagram for explaining the inter-page link structure.
  • the inter-page link structure 14e refers to any resource in which links to related things and events are embedded in individual explanatory pages for various things and events, such as Wikipedia.
  • the "100 Best Cherry Blossom Spots in Japan" page shown in Figure 13(a) lists famous cherry blossom sightseeing spots and includes embedded links to each location.
  • the link leads to a page for "Nagatoro Valley" as shown in Figure 13(b), which in turn leads to a page for "Saitama Prefecture" as shown in Figure 13(c). If the links between such pages can be properly followed, it is possible to acquire the knowledge that "Nagatoro Valley" is in "Saitama Prefecture" and that "cherry blossoms" are famous there. In other words, the path "Saitama Prefecture - Nagatoro Valley - Cherry Blossoms" can be acquired as knowledge.
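  • Following links between pages amounts to enumerating paths in a directed graph. The sketch below assumes a hypothetical in-memory link table `LINKS` and a hypothetical function `follow_links`; a real resource such as Wikipedia would supply far more links per page, which is why filtering is needed in practice.

```python
# Hypothetical link table: page title -> titles of pages it links to.
LINKS = {
    "100 Best Cherry Blossom Spots in Japan": ["Nagatoro Valley"],
    "Nagatoro Valley": ["Saitama Prefecture"],
    "Saitama Prefecture": [],
}


def follow_links(start, hops):
    """Enumerate all link paths of exactly `hops` hops starting from `start`."""
    paths = [[start]]
    for _ in range(hops):
        paths = [
            path + [target]
            for path in paths
            for target in LINKS.get(path[-1], [])
            if target not in path  # do not walk in circles
        ]
    return paths
```

  • Reading a two-hop path from the cherry blossom page in reverse gives the prefecture, then the spot, then the cherry blossom page, which corresponds to the kind of path described above.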
  • the noun category dictionary 14f is a dictionary that maps nouns that represent various things and events to specific categories. For example, in the noun category dictionary 14f, the nouns "Japan" and "Italy" are mapped to the category "country name," and the nouns "onigiri" and "chimaki" are mapped to the category "food name."
  • FIG. 14 is a diagram for explaining the noun category dictionary.
  • FIG. 14 illustrates Shinra, an example of the noun category dictionary 14f.
  • Shinra is a resource that maps Wikipedia titles to an extended named entity dictionary.
  • For example, the noun "Hakone Tozan Railway Ke-1 passenger car", which is a Wikipedia page title, is mapped to the train name category "1.7.17.2".
  • the control unit 15 of this embodiment has an acquisition unit 15a and a creation unit 15e.
  • the acquisition unit 15a acquires a specific root page of the target domain and a specific example of a path that represents a relationship with the root page.
  • the acquisition unit 15a acquires a specific root page of the target domain and at least one specific example of a path that represents a relationship with the root page via the input unit 11, or via the communication control unit 13 from a user terminal or the like.
  • For example, when creating domain knowledge 14a that represents information on the "cooking" domain in a tree structure, the acquisition unit 15a accepts an input that specifies, for example, the "Japanese cuisine" page as the root page of the tree structure.
  • the acquisition unit 15a also accepts an input that specifies "Japanese cuisine" - "onigiri" - "rice" as a specific example of a path to be acquired.
  • the links between pages are used as edges in a tree structure to represent the relationships between topics, so it is necessary that there be links between the pages of each noun in the specific path examples.
  • For example, the "Japanese cuisine" page needs to include a link to the "onigiri" page.
  • the creation unit 15e uses the noun category dictionary 14f and the inter-page link structure 14e to create domain knowledge 14a of the target domain. Specifically, the creation unit 15e refers to the noun category dictionary 14f to identify the category of the noun included in the root page of the specific example, and refers to the inter-page link structure 14e to list the nouns in the same category as the category among the nouns linked to the root page, thereby creating the domain knowledge 14a.
  • Shinra is used as the noun category dictionary 14f
  • "onigiri” rice ball
  • the creation unit 15e extracts links corresponding to "food name” from the root page "Japanese cuisine” to automatically list food names such as “chimaki” (rice dumplings) and "okonomiyaki” (savory pancakes). In doing so, the creation unit 15e can create comprehensive domain knowledge 14a by listing as many food names as possible.
  • the creation unit 15e searches for link destinations in the same category "Food Name_Other” for "Chimaki” and “Okonomiyaki” as well, and automatically lists nouns expressing topics such as “Rice” and "Flour”. In doing so, the creation unit 15e is able to create substantial domain knowledge 14a by listing as many topics as possible.
  • the link destinations for each dish may contain various link destinations, which may be acquired as noise. Therefore, the creation unit 15e suppresses noise by making the following judgments.
  • the opening paragraph at the beginning of each Wikipedia page, which provides an overview of the subject, is likely to contain links to nouns that are highly important to the page and closely related to the subject. Also, opening paragraphs are sometimes written in chronological order, and links that appear earlier tend to point to older information.
  • FIG. 15 is a diagram for explaining the processing of the creation unit.
  • the creation unit 15e uses CirrusSearch (https://www.mediawiki.org/wiki/Help:CirrusSearch/ja).
  • in CirrusSearch, meta-information such as link destinations is recorded as dump data in a format that is easy to analyze automatically. Because analysis based on crawl data is complicated to implement, the page contents and link destinations of Wikipedia are analyzed from this data.
  • FIG. 15 shows an example of an entry that summarizes information about a page about a sports car called “MG Midget.”
  • the creation unit 15e extracts topics using "opening_text,” “outgoing_link,” “incoming_links,” and the like from the data shown in FIG. 15. Note that "incoming_links” is used for ranking to show the validity of the extracted path.
  • FIG. 16 is a diagram illustrating the results of creating domain knowledge.
  • topics such as “mochi” and “onigiri” are automatically acquired for “Japanese cuisine.” Topics are also automatically acquired for "German cuisine” through a process similar to the process described above for "Japanese cuisine.”
  • “o” is added to the beginning of some nouns such as “mochi” (rice cake) and “kome” (rice) to make them more commonly used.
  • the score is a value that indicates the validity of the topic of each line, i.e., the path of "topic” - “entity” - “category”, and is calculated using Wikipedia's "incoming_links".
  • the value of "incoming_links” on the category page is used as the score, but this is not limiting and the score may be calculated using "incoming_links" on the topic or entity page.
  • Fig. 17 is a flowchart showing a dialogue processing procedure according to the second embodiment.
  • the flowchart in Fig. 17 is started, for example, when a user performs an operation input to instruct the start of the dialogue processing.
  • the acquisition unit 15a acquires a specific root page of the target domain and a specific example of a path that represents a relationship with the root page (step S11). For example, the acquisition unit 15a acquires a specific root page of the target domain and at least one specific example of a path that represents a relationship with the root page via the input unit 11 or from a user terminal or the like via the communication control unit 13.
  • the creation unit 15e creates domain knowledge 14a of the target domain using the noun category dictionary 14f and the inter-page link structure 14e (step S12). Specifically, the creation unit 15e refers to the noun category dictionary 14f to identify the category of the nouns included in the specific example, and refers to the inter-page link structure 14e to list, among the nouns linked to the root page, those that belong to the same category, thereby creating the domain knowledge 14a.
  • the creation unit 15e also stores the created domain knowledge 14a in the storage unit 14. This completes the series of dialogue processes.
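Steps S11 and S12 can be illustrated with a minimal Python sketch. It is not the implementation described in this publication: the dictionaries `NOUN_CATEGORY`, `PAGE_LINKS`, and `INCOMING_LINKS`, their contents, and the function `create_domain_knowledge` are hypothetical stand-ins for the noun category dictionary 14f, the inter-page link structure 14e, and the "incoming_links" counts. The sketch also assumes that the category expected at each depth is taken from the corresponding noun of the specific example path.

```python
# Hypothetical toy data standing in for the noun category dictionary 14f.
NOUN_CATEGORY = {
    "Japanese cuisine": "cuisine",
    "onigiri": "food name",
    "chimaki": "food name",
    "okonomiyaki": "food name",
    "rice": "ingredient",
    "flour": "ingredient",
    "Kyoto": "place name",
}

# Hypothetical toy data standing in for the inter-page link structure 14e.
PAGE_LINKS = {
    "Japanese cuisine": ["onigiri", "chimaki", "okonomiyaki", "Kyoto"],
    "onigiri": ["rice", "Kyoto"],
    "chimaki": ["rice"],
    "okonomiyaki": ["flour"],
}

# Hypothetical incoming-link counts used as scores.
INCOMING_LINKS = {"rice": 120, "flour": 80}


def create_domain_knowledge(example_path):
    """List (topic, entity, category, score) rows reachable from the root page.

    example_path is the specific example acquired in step S11,
    e.g. ("Japanese cuisine", "onigiri", "rice").
    """
    root, entity_example, leaf_example = example_path
    entity_cat = NOUN_CATEGORY[entity_example]
    leaf_cat = NOUN_CATEGORY[leaf_example]
    rows = []
    for entity in PAGE_LINKS.get(root, []):
        if NOUN_CATEGORY.get(entity) != entity_cat:
            continue  # links in other categories are treated as noise
        for leaf in PAGE_LINKS.get(entity, []):
            if NOUN_CATEGORY.get(leaf) != leaf_cat:
                continue
            rows.append((root, entity, leaf, INCOMING_LINKS.get(leaf, 0)))
    # Higher scores first; sorted() is stable, so ties keep link order.
    return sorted(rows, key=lambda row: row[3], reverse=True)


rows = create_domain_knowledge(("Japanese cuisine", "onigiri", "rice"))
```

With this toy data the "Kyoto" links are filtered out because their category differs from that of the example path, which mirrors the noise suppression described above in a very simplified form.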
  • the dialogue device 10 of the first embodiment and the dialogue device 10a of the second embodiment may be devices that cooperate with each other.
  • the dialogue device 10 of the first embodiment may use the domain knowledge 14a generated by the dialogue device 10a to have a dialogue with a user.
  • the dialogue device 10 of the first embodiment and the dialogue device 10a of the second embodiment may be implemented in the same hardware.
  • the storage unit 14 stores domain knowledge 14a representing information of a specific domain to which the topic belongs, and a state transition diagram 14c representing transitions of specific states independent of the domain to which the topic belongs.
  • the acquisition unit 15a acquires text representing an utterance by a user.
  • the identification unit 15c identifies a dialogue state including a current state of the dialogue with the user and a transition destination state, using information included in the acquired text, the domain knowledge 14a, and the state transition diagram 14c.
  • the generation unit 15d generates an utterance according to the identified transition destination state.
  • the extraction unit 15b extracts information for identifying the state of the dialogue from the acquired text.
  • the identification unit 15c identifies the state of the dialogue using the extracted information, the domain knowledge 14a, and the state transition diagram 14c.
  • the identification unit 15c uses the extracted information to generate a foundational table 14b that indicates the correspondence between the domain knowledge 14a and the state transition diagram 14c, and identifies the state of the dialogue using the generated foundational table 14b and the state transition diagram 14c.
  • the dialogue device 10 can realize a chat dialogue system using a domain-independent state transition diagram 14c simply by setting domain knowledge 14a for each domain.
  • the generation unit 15d also generates utterances using utterance templates 14d corresponding to a specific domain and associated with a state in the state transition diagram 14c. In this way, the dialogue device 10 can easily realize a domain-independent chat dialogue system simply by setting the utterance templates 14d for each domain.
  • the storage unit 14 further stores a noun category dictionary 14f that maps nouns to specific categories, and an inter-page link structure 14e that includes links on the explanation page of each noun to the explanation page of other nouns, the acquisition unit 15a further acquires a specific root page of the target domain and a specific example of a path that represents a relationship with the root page, and the creation unit 15e creates domain knowledge 14a of the target domain using the noun category dictionary 14f and the inter-page link structure 14e.
  • the creation unit 15e creates domain knowledge 14a by referring to the noun category dictionary 14f to identify the category of the nouns included in the specific example, and referring to the inter-page link structure 14e to list, among the nouns linked to the root page, those that belong to the same category. This makes it easy to customize the specific example to a desired topic and acquire domain knowledge 14a related to a desired domain.
  • a program in which the process executed by the dialogue device 10 according to the above embodiment is written in a language executable by a computer can also be created.
  • the dialogue device 10 can be implemented by installing a dialogue program that executes the above dialogue process as package software or online software on a desired computer.
  • the above dialogue program can be executed by an information processing device, so that the information processing device can function as the dialogue device 10.
  • the information processing device referred to here includes desktop or notebook personal computers.
  • the information processing device also includes mobile communication terminals such as smartphones, mobile phones, and PHS (Personal Handyphone System), as well as slate terminals such as PDA (Personal Digital Assistant).
  • the functions of the dialogue device 10 may also be implemented on a cloud server.
  • FIG. 18 is a diagram showing an example of a computer that executes a dialogue program.
  • the computer 1000 has, for example, a memory 1010, a CPU 1020, a hard disk drive interface 1030, a disk drive interface 1040, a serial port interface 1050, a video adapter 1060, and a network interface 1070. Each of these components is connected by a bus 1080.
  • the memory 1010 includes a ROM (Read Only Memory) 1011 and a RAM 1012.
  • the ROM 1011 stores a boot program such as a BIOS (Basic Input Output System).
  • the hard disk drive interface 1030 is connected to a hard disk drive 1031.
  • the disk drive interface 1040 is connected to a disk drive 1041.
  • a removable storage medium such as a magnetic disk or optical disk is inserted into the disk drive 1041.
  • the serial port interface 1050 is connected to a mouse 1051 and a keyboard 1052, for example.
  • the video adapter 1060 is connected to a display 1061, for example.
  • the hard disk drive 1031 stores, for example, an OS 1091, an application program 1092, a program module 1093, and program data 1094. Each piece of information described in the above embodiment is stored, for example, in the hard disk drive 1031 or memory 1010.
  • the dialogue program is stored in the hard disk drive 1031, for example, as a program module 1093 in which instructions to be executed by the computer 1000 are written.
  • the program module 1093 in which each process executed by the dialogue device 10 described in the above embodiment is written is stored in the hard disk drive 1031.
  • data used for information processing by the dialogue program is stored as program data 1094, for example, in the hard disk drive 1031.
  • the CPU 1020 reads the program module 1093 and program data 1094 stored in the hard disk drive 1031 into the RAM 1012 as necessary, and executes each of the procedures described above.
  • the program module 1093 and program data 1094 related to the dialogue program are not limited to being stored in the hard disk drive 1031, but may be stored in a removable storage medium, for example, and read by the CPU 1020 via the disk drive 1041 or the like.
  • the program module 1093 and program data 1094 related to the dialogue program may be stored in another computer connected via a network, such as a LAN or WAN (Wide Area Network), and read by the CPU 1020 via the network interface 1070.
  • A dialogue device comprising: a memory; and at least one processor coupled to the memory, wherein the memory stores domain knowledge representing information of a predetermined domain to which a topic belongs and a state transition diagram representing transitions of predetermined states independent of the domain to which the topic belongs, and the processor: acquires text representing an utterance by a user; identifies a state of a dialogue with the user, including a current state and a transition destination state, using information included in the acquired text, the domain knowledge, and the state transition diagram; and generates an utterance according to the identified transition destination state.
  • A non-transitory storage medium storing a program executable by a computer to execute a dialogue process, the dialogue process comprising: referring to a memory that stores domain knowledge representing information of a predetermined domain to which a topic belongs and a state transition diagram representing transitions of predetermined states independent of the domain to which the topic belongs; acquiring text representing an utterance by a user; identifying a state of a dialogue with the user, including a current state and a transition destination state, using information included in the acquired text, the domain knowledge, and the state transition diagram; and generating an utterance according to the identified transition destination state.

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

A storage unit (14) stores domain knowledge (14a) representing information on a prescribed domain to which a topic belongs, and a state transition diagram (14c) representing the transition of a prescribed state independent of the domain to which the topic belongs. An acquisition unit (15a) acquires text representing speech made by a user. A specification unit (15c) uses information included in the acquired text, the domain knowledge (14a), and the state transition diagram (14c) so as to specify dialog states that include the current state of the dialog with the user and the state of a transition destination. A generation unit (15d) generates speech in accordance with the specified state of the transition destination.

Description

Dialogue device, dialogue method, and dialogue program
The present invention relates to a dialogue device, a dialogue method, and a dialogue program.
Conventionally, dialogue systems are known in which a computer converses with a human using natural language or the like. Among these, chat dialogue systems are mainly stateless and use a question-and-answer format in which the system selects or generates an utterance based only on information from the user's most recent utterance. Because it is difficult for such a system to carry on a dialogue that goes beyond a single question and answer, the user cannot have a chat that feels coherent, and user satisfaction is low.
In response, a technology has been proposed that limits the domain of chat dialogue to travel and gives the dialogue system a state, thereby realizing coherent dialogue (see Non-Patent Document 1).
However, with conventional technology, it is difficult to realize a domain-independent chat dialogue system. For example, a conventional chat dialogue system uses a state transition diagram that manages the current state and a foundational table that represents the topic of the dialogue shared with the user at a given point in time, and is realized by transitioning to an appropriate state and updating the foundational table in response to the user's utterances.
The state transition diagram and the foundational table are designed specifically for a particular domain to enable coherent dialogue. Therefore, to enable coherent dialogue in another domain, a state transition diagram and a foundational table appropriate for that domain must be designed. This requires advanced knowledge of utterance understanding and dialogue management in dialogue systems, which makes construction costly.
The present invention has been made in view of the above, and aims to make it possible to realize a domain-independent chat dialogue system.
In order to solve the above-mentioned problems and achieve the object, the dialogue device according to the present invention is characterized by having: a storage unit that stores domain knowledge representing information of a specific domain to which a topic belongs and a state transition diagram representing transitions of specific states independent of the domain to which the topic belongs; an acquisition unit that acquires text representing an utterance by a user; an identification unit that identifies a dialogue state, including a current state of the dialogue with the user and a transition destination state, using information contained in the acquired text, the domain knowledge, and the state transition diagram; and a generation unit that generates an utterance according to the identified transition destination state.
The present invention makes it possible to realize a domain-independent chat dialogue system.
FIG. 1 is a diagram for explaining an overview of the dialogue device according to the first embodiment.
FIG. 2 is a schematic diagram illustrating a schematic configuration of the dialogue device according to the first embodiment.
FIG. 3 is a diagram illustrating an example of a data structure of domain knowledge.
FIG. 4 is a diagram for explaining the foundational table.
FIG. 5 is a diagram for explaining the process of the extraction unit.
FIG. 6 is a diagram illustrating a state transition diagram.
FIG. 7 is a diagram illustrating a state transition diagram.
FIG. 8 is a diagram illustrating an example of an utterance template.
FIG. 9 is a diagram illustrating an example of the dialogue processing result.
FIG. 10 is a flowchart showing a dialogue processing procedure according to the first embodiment.
FIG. 11 is a diagram for explaining an overview of the dialogue device according to the second embodiment.
FIG. 12 is a schematic diagram illustrating a schematic configuration of the dialogue device according to the second embodiment.
FIG. 13 is a diagram for explaining the inter-page link structure.
FIG. 14 is a diagram for explaining the noun category dictionary.
FIG. 15 is a diagram for explaining the process of the creation unit.
FIG. 16 is a diagram illustrating a result of creating domain knowledge.
FIG. 17 is a flowchart showing a dialogue processing procedure according to the second embodiment.
FIG. 18 is a diagram illustrating an example of a computer that executes a dialogue program.
Below, one embodiment of the present invention will be described in detail with reference to the drawings. Note that the present invention is not limited to this embodiment. In addition, in the drawings, the same parts are denoted by the same reference numerals.
[First embodiment]
FIG. 1 is a diagram for explaining an overview of the dialogue device according to the first embodiment. The dialogue device according to the first embodiment extends the foundational table 14b and the state transition diagram 14c to be domain-independent, and carries out chat dialogue with a user regardless of domain. The device uses, in a domain-specific form, the utterance templates 14d for generating system utterances and the domain knowledge 14a used for understanding and presenting topics. For example, by associating utterance templates 14d specialized for a desired domain with each state of the state transition diagram 14c, the device outputs utterances for the desired domain (hereinafter, system utterances) and realizes coherent dialogue with the user.
[Configuration of the dialogue device]
FIG. 2 is a schematic diagram illustrating a schematic configuration of the dialogue device according to the first embodiment. As illustrated in FIG. 2, the dialogue device 10 of the present embodiment is realized by a general-purpose computer such as a personal computer, and includes an input unit 11, an output unit 12, a communication control unit 13, a storage unit 14, and a control unit 15.
The input unit 11 is realized using input devices such as a keyboard and a mouse, and inputs various instruction information, such as a command to start processing, to the control unit 15 in response to input operations by an operator. The output unit 12 is realized by a display device such as a liquid crystal display, a printer, or the like. For example, the output unit 12 displays the results of the dialogue processing described below.
The communication control unit 13 is realized by a NIC (Network Interface Card) or the like, and controls communication between the control unit 15 and external devices via telecommunication lines such as a LAN (Local Area Network) or the Internet. For example, the communication control unit 13 controls communication between the control unit 15 and user terminals used by users, management devices that collect and manage information on various domains, and the like.
The storage unit 14 is realized by a semiconductor memory element such as a RAM (Random Access Memory) or a flash memory, or a storage device such as a hard disk or an optical disk. The storage unit 14 stores the processing programs that operate the dialogue device 10 and the data used during execution of the processing programs, either in advance or temporarily each time processing is performed. The storage unit 14 may be configured to communicate with the control unit 15 via the communication control unit 13.
In this embodiment, the storage unit 14 stores domain knowledge 14a, a foundational table 14b, a state transition diagram 14c, utterance templates 14d, and the like, which are used in the dialogue processing described below. The domain knowledge 14a, the foundational table 14b, the state transition diagram 14c, and the utterance templates 14d are acquired in advance from a management device or the like that manages various types of information, for example by the acquisition unit 15a described below, and are stored in the storage unit 14.
The domain knowledge 14a represents information on a specific domain to which the topic belongs. The foundational table 14b shows the correspondence between the domain knowledge 14a and the state transition diagram 14c. The state transition diagram 14c represents transitions of specific states that are independent of the domain to which the topic belongs. The utterance template 14d is a template for system utterances corresponding to a specific domain, and is associated with a state in the state transition diagram 14c.
Here, FIG. 3 is a diagram illustrating an example of the data structure of domain knowledge. The domain knowledge 14a is information that represents, in a tree structure, the topics necessary for chat in a specific domain, and includes, for example, information on topics, entities, categories, and scores, as illustrated in FIG. 3.
A topic is domain information that can serve as an introductory subject in the dialogue processing described below. An entity is information that can become the focus within a topic. A category is information that represents an attribute of each entity. Topics, entities, and categories thus differ in depth: an entity is the subject itself, a topic is a subject used as an introduction without mentioning an entity directly, and a category represents a subject concerning a specific attribute of an entity.
FIG. 3 illustrates domain knowledge 14a for the cooking domain. Topics in this domain are, for example, Japanese cuisine, Chinese cuisine, and Italian cuisine. Entities under Japanese cuisine are, for example, mochi (rice cake), onigiri (rice ball), and oden. As categories, rice is given for mochi and onigiri, and bonito flakes for oden. The score shown in FIG. 3 indicates the priority of a subject, with a larger value indicating a higher priority.
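As a rough illustration of the topic / entity / category / score rows of FIG. 3, the following Python sketch stores each row as a record and lists the entities under a topic in priority order. The class name `KnowledgeRow`, the helper `entities_for_topic`, and the score values are assumptions made for this example and do not appear in the publication.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KnowledgeRow:
    """One row of domain knowledge: topic -> entity -> category, with a score."""
    topic: str
    entity: str
    category: str
    score: float


# The nouns follow the cooking example of FIG. 3; the scores are placeholders.
DOMAIN_KNOWLEDGE = [
    KnowledgeRow("Japanese cuisine", "mochi", "rice", 0.9),
    KnowledgeRow("Japanese cuisine", "onigiri", "rice", 0.8),
    KnowledgeRow("Japanese cuisine", "oden", "bonito flakes", 0.5),
]


def entities_for_topic(rows, topic):
    """Return the entities under a topic, highest score first, without duplicates."""
    matching = sorted(
        (row for row in rows if row.topic == topic),
        key=lambda row: row.score,
        reverse=True,
    )
    # dict.fromkeys removes duplicates while preserving order.
    return list(dict.fromkeys(row.entity for row in matching))


japanese_entities = entities_for_topic(DOMAIN_KNOWLEDGE, "Japanese cuisine")
```

A flat list of rows is used here instead of an explicit tree because each row already encodes one root-to-leaf path; a nested structure would work equally well.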
In the dialogue processing described below, the control unit 15 refers to the domain knowledge 14a to identify which subject is mentioned in a user utterance and to offer new subjects. For example, in order to hold a coherent chat with the user, the control unit 15 shifts the subject in the order of topic, entity, and category, using topics, entities, and categories that are set so as to satisfy predetermined conditions.
Specifically, the relationship between a topic and an entity, and between an entity and a topic, must be a has-a relationship or an is-a relationship. Here, a has-a relationship is one in which one of the two items contains the other as an element. For example, "Japanese cuisine" and "onigiri", and "automobile" and "tire", are each in a has-a relationship. An is-a relationship is one in which one of the two items is an abstraction of the other. For example, "cooking" and "Japanese cuisine", and "vehicle" and "automobile", are each in an is-a relationship.
In addition, entities and categories must be in a many-to-one relationship rather than a one-to-one relationship. If the subject returns to a topic after having moved from topic to entity to category, the user gets the impression that the subject has changed abruptly and is unlikely to feel that the dialogue is coherent. It is therefore necessary to shift to a different entity in the same category, which requires that multiple entities be associated with one category. Note that an entity may be associated with multiple categories.
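The many-to-one condition and the shift to a different entity in the same category can be sketched as follows. This is only an illustration under assumed data: `ROWS` is hypothetical toy data, and the function names are invented for this example.

```python
from collections import defaultdict

# Hypothetical (topic, entity, category) rows in the cooking domain.
ROWS = [
    ("Japanese cuisine", "mochi", "rice"),
    ("Japanese cuisine", "onigiri", "rice"),
    ("Japanese cuisine", "oden", "bonito flakes"),
]


def entities_by_category(rows):
    """Group entities under each category, keeping first-seen order."""
    grouped = defaultdict(list)
    for _topic, entity, category in rows:
        if entity not in grouped[category]:
            grouped[category].append(entity)
    return dict(grouped)


def categories_violating_many_to_one(rows):
    """Return categories that have fewer than two entities."""
    return [
        category
        for category, entities in entities_by_category(rows).items()
        if len(entities) < 2
    ]


def next_entity_in_category(rows, category, current_entity):
    """Pick a different entity in the same category, or None if there is none."""
    for entity in entities_by_category(rows).get(category, []):
        if entity != current_entity:
            return entity
    return None


next_entity = next_entity_in_category(ROWS, "rice", "mochi")
violations = categories_violating_many_to_one(ROWS)
```

In this toy data "bonito flakes" has only one entity, so a check like `categories_violating_many_to_one` would flag it as a category from which the dialogue cannot move on to another entity.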
FIG. 4 is a diagram for explaining the foundational table. The foundational table 14b is information that the system has understood about the dialogue with the user, and represents the subject of the dialogue between the user and the system at a given point in time. Specifically, in the foundational table 14b, the state of the dialogue is defined by the domain-independent slots in each row illustrated in FIG. 4 and their values.
Here, the slots in each row are domain-independent items such as target speaker, topic type, topic, entity, and category. Topic, entity, and category are the same as in the domain knowledge 14a described above. The value of each is associated with a value of the domain knowledge 14a by the identification unit 15c described later.
The target speaker indicates whether the subject concerns the user or the system, and is set based on the state transition diagram 14c described below. The topic type is information that roughly indicates a preset subject such as a favorite dish, a dish the speaker wants to eat now, or a dish the speaker dislikes, and is set according to the order of the state transition diagram. The target speaker and the topic type determine which of the utterance templates 14d described below, which are defined for each target speaker and topic type, is applied. This makes it possible to output system utterances flexibly according to the target speaker and the topic type.
For example, when the subject is a topic and the system asks about the user's favorite cuisine, the utterance template 14d "What food do you like?" is used. When the system asks what the user wants to eat, the utterance template 14d "What food do you feel like eating right now?" is used.
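The selection of an utterance template from the target speaker and topic type slots can be sketched as a simple lookup. This is a minimal illustration, not the data layout of FIG. 4 or FIG. 8: the class `FoundationalTable`, the slot names, the topic type labels, and the `TEMPLATES` dictionary are assumptions; only the two template sentences come from the text above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FoundationalTable:
    """Domain-independent slots; values are filled in from the domain knowledge."""
    target_speaker: str  # "user" or "system"
    topic_type: str      # e.g. "favorite dish"
    topic: Optional[str] = None
    entity: Optional[str] = None
    category: Optional[str] = None


# Domain-specific templates keyed by (target speaker, topic type).
# The keys are hypothetical labels chosen for this sketch.
TEMPLATES = {
    ("user", "favorite dish"): "What food do you like?",
    ("user", "dish to eat now"): "What food do you feel like eating right now?",
}


def select_template(table):
    """Return the template for the table's speaker and topic type, or None."""
    return TEMPLATES.get((table.target_speaker, table.topic_type))


utterance = select_template(FoundationalTable("user", "favorite dish"))
```

Because only the `TEMPLATES` dictionary is domain-specific in this sketch, switching to another domain would mean replacing the template strings while keeping the slot structure unchanged.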
Returning to the description of FIG. 2, the control unit 15 is implemented using a CPU (Central Processing Unit) or the like, and executes a processing program stored in memory. As a result, the control unit 15 functions as an acquisition unit 15a, an extraction unit 15b, an identification unit 15c, and a generation unit 15d, as illustrated in FIG. 2, and executes the dialogue processing. Each of these functional units, or some of them, may be implemented on different hardware. The control unit 15 may also include other functional units. For example, the control unit 15 may include a creation unit 15e, which is described later.
The acquisition unit 15a acquires text representing an utterance by the user. For example, the acquisition unit 15a acquires the text of an utterance by the user taking part in the dialogue, either via the input unit 11 or from a user terminal or the like via the communication control unit 13.
The extraction unit 15b extracts, from the acquired text, information for identifying the state of the dialogue. For example, the extraction unit 15b performs morphological analysis using any language analysis tool and extracts focus words (keywords that represent the topic), proper nouns, evaluative expressions, modalities such as negation, and the like. This enables the identification unit 15c, described later, to identify the destination state in the state transition diagram 14c and the information to be set in the foundation table 14b.
FIG. 5 is a diagram for explaining the processing of the extraction unit. FIG. 5 shows an example of analysis results obtained with a language analysis tool called Richindexer. In the example shown in FIG. 5, line is the input sentence, and forms and poses are the results of morphological analysis. In addition, names are proper nouns, sems are modalities, evals are evaluative expressions, cents are focus words, and da is the estimated dialogue act.
Returning to the description of FIG. 2, the identification unit 15c uses the information contained in the acquired text, the domain knowledge 14a, and the state transition diagram 14c to identify the state of the dialogue with the user, including the current state and the destination state.
Specifically, the identification unit 15c identifies the state of the dialogue using the extracted information, the domain knowledge 14a, and the state transition diagram 14c. That is, the identification unit 15c uses the analysis results of the extraction unit 15b to generate the foundation table 14b, which associates the domain knowledge 14a with the state transition diagram 14c. It then uses the generated foundation table 14b and the state transition diagram 14c, which is defined in terms of domain-independent states, to determine the current state of the dialogue and to identify the next state, that is, the transition destination.
The generation unit 15d generates an utterance according to the identified destination state. Specifically, the generation unit 15d generates the utterance using the utterance template 14d for the predetermined domain that is associated with the state in the state transition diagram 14c.
FIGS. 6 and 7 are diagrams illustrating state transition diagrams. The state transition diagram 14c is information that represents transitions among various domain-independent states. The states in this embodiment are the utterance state, the user utterance acquisition state, the analysis state, the notation state, and the conditional branch state. The speech bubbles in FIG. 6 indicate observed utterances and processing, and the bold-outlined arrows indicate observed state transitions.
The utterance states, shown in FIG. 6 with diagonal hatching running from upper left to lower right, are states in which the system makes an utterance, and each is associated with an utterance template 14d, described later. The user utterance acquisition state, shown in FIG. 6 with diagonal hatching running from upper right to lower left, is a state in which the system waits until acquisition of the user's utterance is complete. The analysis state is a state in which the user utterance is analyzed, or a state whose transition destination changes according to the result of searching the domain knowledge 14a. The notation states, shown in FIG. 6 with vertical dashed hatching, are states drawn only for notational convenience so that the arrows in the state transition diagram 14c do not overlap. The conditional branch state is a state whose transition destination changes based only on the dialogue state.
Classifying and managing the dialogue state as multiple kinds of states in this way makes it easy to express the complex conditional branching required for dialogue management.
In the analysis state described above, further analysis is performed based on the analysis results of the extraction unit 15b, and a more specific state is identified. The states identified include a negation judgment state, an affirmation judgment state, an experience judgment state, an impression judgment state, a state for extracting elements (topic, entity, category) from the user utterance, a state for searching for elements (topic, entity, category) in the domain knowledge, and a random transition state. This makes it possible to express complex transitions, such as executing specific processing depending on the dialogue state.
Each utterance state described above is associated with an utterance template 14d, which is a template for creating a system utterance. In the utterance template 14d, the slots of the foundation table 14b are written as blanks. When an utterance is generated, the system utterance is produced by inserting the current values of the foundation table 14b into the blanks.
The system utterances of the multiple utterance states passed through before the transition to the user utterance acquisition state are combined and output as one. If multiple utterances are associated with the next state, one utterance is selected and output according to an arbitrary priority, such as dictionary order.
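The combining rule described above could be sketched roughly as follows. This is only an illustration: the state names, the `WAIT_USER` marker for the user utterance acquisition state, and the dictionary layout are assumptions, and dictionary order is used simply as one possible priority.

```python
# Hypothetical marker for the user utterance acquisition state.
WAIT_USER = "WAIT_USER"


def collect_system_output(path, utterances):
    """Join the utterances of the utterance states visited before the
    first user utterance acquisition state on the given path.

    path       -- sequence of state IDs in the order they were visited
    utterances -- dict mapping utterance-state IDs to candidate utterances
    """
    parts = []
    for state in path:
        if state == WAIT_USER:
            break  # stop at the point where the system waits for the user
        candidates = utterances.get(state, [])
        if candidates:
            # Dictionary order as one example of an arbitrary priority.
            parts.append(sorted(candidates)[0])
    return " ".join(parts)
```

States without an entry in `utterances` (analysis, notation, or conditional branch states in this sketch) contribute nothing to the output.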
The state transition diagram 14c is not limited to the example shown in FIG. 6, and is designed to satisfy the requirements listed below. This gives the user the sense of having a complex dialogue with the system. In addition, because the state transition diagram 14c is domain-independent, it is not limited to cooking and can also be applied to other domains such as travel, food, and sports, enabling dialogue with high user satisfaction.
(Requirements)
- If the user makes a negative utterance such as "I don't want to answer" at any time, the system outputs an utterance that responds to it, such as "I'm sorry about that."
- If the user asks a question such as "How about you?" at any time, the system outputs an utterance that answers it, such as "I am ...".
- If the topic can be deepened, the transitions deepen it as far as possible.
- When the dialogue state has values set down to a particular depth, such as the topic, the system outputs utterances that respond appropriately whether the user mentions a different item of the same depth, such as another topic, or any deeper item, such as an entity or category.
- The target speaker is first the user and then the system, and this alternation is repeated.
- The topic type changes value in a predetermined order, and once all topic types have been used, the last topic type continues to be used.
- When the topic is changed, as many dialogue state values as possible are carried over. For example, in the dialogue state illustrated in FIG. 4, the topic "onigiri" is changed to "chimaki", which belongs to the same category "rice", by changing only the entity. If no such topic exists, the entity and category are changed, for example to "miso soup" within the same topic "Japanese cuisine".
- When topic changes between topics, entities, or categories reach a predetermined maximum number, the target speaker is changed and the topic is continued. For example, after talking about the user's favorite food, the dialogue moves to the system's predetermined favorite food, and when this change reaches a predetermined maximum number, the topic type is changed.
- The system asks the user about a topic, presents an entity, asks about the user's experience with that entity, and then asks for the user's impressions of its category. If the user's utterance runs ahead of what is expected, for example by giving impressions as soon as the system presents an entity, the system skips ahead and transitions to the corresponding state.
- If the user makes a negative utterance during the above dialogue, the topic is not deepened any further and is instead changed in accordance with the requirements above.
- In response to the user's experiences and impressions, the system outputs utterances according to the presence or absence of affirmative or negative expressions, evaluative expressions, focus words, and the like.
- In addition to presenting information, if a dialogue state value elicited from the user is an unknown topic, transitions are set up to ask questions or impressions about that topic.
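The carry-over requirement for topic changes can be sketched as below. This is a minimal illustration under the assumption that the domain knowledge is available as a list of (topic, entity, category) paths. The function name and the "miso" category in the usage note are hypothetical.

```python
def change_topic(current, knowledge):
    """Return a new (topic, entity, category) path that keeps as many
    values of the current path as possible, or None if there is none.

    current   -- the (topic, entity, category) currently in the table
    knowledge -- list of (topic, entity, category) paths
    """
    topic, entity, category = current
    # First choice: keep topic and category, change only the entity.
    for t, e, c in knowledge:
        if t == topic and c == category and e != entity:
            return (t, e, c)
    # Second choice: keep only the topic, change entity and category.
    for t, e, c in knowledge:
        if t == topic and e != entity:
            return (t, e, c)
    return None
```

With paths for "onigiri" and "chimaki" under "rice", a change from "onigiri" yields "chimaki". If "chimaki" is absent, the same call falls back to another entity of "Japanese cuisine", such as "miso soup" with an assumed category "miso".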
The format of the state transition diagram 14c is not particularly limited, and any format capable of expressing state transitions may be used. For example, the state transition diagram 14c may be written in SCXML format, as illustrated in FIG. 7. In the example shown in FIG. 7, variables corresponding to the foundation table 14b are defined inside the datamodel tag, and each state is defined with a state tag.
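To make the datamodel and state tags concrete, the following sketch parses a very small SCXML document with Python's standard library. The document itself is invented for illustration and is not the content of FIG. 7. The state IDs and variable names are assumptions.

```python
import xml.etree.ElementTree as ET

# A tiny, made-up SCXML document: slots in <datamodel>, states in <state>.
SCXML = """<scxml xmlns="http://www.w3.org/2005/07/scxml" version="1.0" initial="ask_topic">
  <datamodel>
    <data id="target_speaker" expr="'user'"/>
    <data id="topic" expr="''"/>
  </datamodel>
  <state id="ask_topic">
    <transition target="wait_user"/>
  </state>
  <state id="wait_user"/>
</scxml>"""

NS = {"sc": "http://www.w3.org/2005/07/scxml"}
root = ET.fromstring(SCXML)

# Variables corresponding to the slots of the foundation table.
slots = [d.get("id") for d in root.findall("sc:datamodel/sc:data", NS)]

# Transition targets listed per state.
transitions = {
    s.get("id"): [t.get("target") for t in s.findall("sc:transition", NS)]
    for s in root.findall("sc:state", NS)
}
```

Here `slots` lists the datamodel variables and `transitions` maps each state ID to its targets. A full SCXML interpreter would also evaluate `cond` and `event` attributes, which this sketch ignores.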
FIG. 8 is a diagram illustrating utterance templates. The utterance template 14d is a list of utterances that is prepared for each domain and associated one-to-one with the utterance states. Each utterance state is given a unique ID expressed as text, and each utterance is also given an ID. When a transition reaches a given state, the utterance corresponding to that state is output. An utterance can be set to any text, including an empty sentence.
FIG. 8 shows an example of utterance templates written in YAML format. In the example shown in FIG. 8, the utterance for a state is given either as a single entry or as a list whose items begin with "-". When multiple utterances are specified, one is selected according to a predetermined priority or at random. Text enclosed in [ ], such as the user name (user_name), system name (sys_name), and topic (topic), represents a special variable, and the corresponding value from the foundation table 14b is substituted for it.
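A rough sketch of how such templates might be rendered is shown below. The dictionary stands in for data already loaded from the YAML file. The state IDs and sentences are invented, and only the variable names user_name, sys_name, and topic come from the text above.

```python
import random
import re

# Stand-in for templates loaded from YAML: a state maps to one utterance
# or to a list of utterances; the empty string is a valid utterance.
TEMPLATES = {
    "greet": "Hello, [user_name]. I am [sys_name].",
    "ask_topic": ["What kind of [topic] do you like?", "Tell me about [topic]."],
    "silent": "",
}


def render(state_id, table, rng=random):
    """Choose one utterance for the state and substitute [variables]
    with the corresponding values from the table (a plain dict here)."""
    entry = TEMPLATES[state_id]
    candidates = [entry] if isinstance(entry, str) else list(entry)
    chosen = rng.choice(candidates)  # random selection among candidates
    return re.sub(r"\[(\w+)\]", lambda m: str(table[m.group(1)]), chosen)
```

Random choice is used here; a fixed priority could be substituted by replacing `rng.choice` with a deterministic rule.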
FIG. 9 is a diagram illustrating a dialogue processing result, in this case for the "food" domain. Multiple evaluators confirmed that the resulting dialogues achieved satisfaction equal to or higher than that of the dialogue system described in Non-Patent Document 1.
[Dialogue processing]
Next, the dialogue processing performed by the dialogue device 10 according to the first embodiment will be described with reference to FIG. 10. FIG. 10 is a flowchart showing the dialogue processing procedure according to the first embodiment. The flowchart in FIG. 10 starts, for example, when the user performs an operation input instructing the start.
First, the acquisition unit 15a acquires text representing an utterance by the user (step S1). For example, the acquisition unit 15a acquires the text of an utterance by the user taking part in the dialogue, either via the input unit 11 or from a user terminal or the like via the communication control unit 13.
Next, the extraction unit 15b extracts, from the acquired text, information for identifying the state of the dialogue (step S2). For example, the extraction unit 15b performs morphological analysis using any language analysis tool and extracts focus words (keywords that represent the topic), proper nouns, evaluative expressions, modalities such as negation, and the like.
Next, the identification unit 15c uses the information contained in the acquired text, the domain knowledge 14a, and the state transition diagram 14c to identify the state of the dialogue with the user, including the current state and the destination state (step S3). Specifically, the identification unit 15c identifies the state of the dialogue using the extracted information, the domain knowledge 14a, and the state transition diagram 14c.
That is, the identification unit 15c uses the analysis results of the extraction unit 15b to generate the foundation table 14b, which associates the domain knowledge 14a with the state transition diagram 14c. It then uses the generated foundation table 14b and the state transition diagram 14c, which is defined in terms of domain-independent states, to determine the current state of the dialogue and to identify the next state, that is, the transition destination.
Then, the generation unit 15d generates an utterance according to the identified destination state (step S4). Specifically, the generation unit 15d generates the utterance using the utterance template 14d for the predetermined domain that is associated with the state in the state transition diagram 14c. The generation unit 15d outputs the generated utterance via the output unit 12, or to a user terminal or the like via the communication control unit 13, and presents it to the user. This completes the series of dialogue processing steps.
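Steps S1 to S4 can be pictured as a simple pipeline. The sketch below is a toy illustration only: the keyword matching stands in for a real language analysis tool, and the state names, transition keys, and data are all hypothetical.

```python
def acquire(raw):
    """Step S1: obtain the text of the user utterance."""
    return raw.strip()


def extract(text, known_entities):
    """Step S2: toy extractor standing in for a language analysis tool."""
    lowered = text.lower()
    return {
        "focus": [e for e in known_entities if e in lowered],
        "negative": "don't" in lowered,  # crude negation cue
    }


def identify(current, info, transitions):
    """Step S3: choose the destination state from the extracted information."""
    if info["negative"]:
        key = "negative"
    elif info["focus"]:
        key = "entity"
    else:
        key = "default"
    return transitions[current][key]


def generate(state, templates):
    """Step S4: produce the system utterance for the destination state."""
    return templates[state]


def dialogue_turn(raw, current, known_entities, transitions, templates):
    """Run S1 to S4 once and return (next state, system utterance)."""
    info = extract(acquire(raw), known_entities)
    nxt = identify(current, info, transitions)
    return nxt, generate(nxt, templates)
```

In this sketch a negative utterance always wins over a mentioned entity, mirroring the requirement that negative utterances are handled at any time.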
[Second Embodiment]
The domain knowledge 14a, which aggregates information on the predetermined domain to which topics belong, has conventionally been created from manually written regular expressions specialized for a particular domain and from linguistic analysis of Wikipedia articles, for example using the categories pre-assigned in Wikipedia.
However, regular expressions and language analysis programs must be implemented for each domain, so applying the approach to another domain is very costly and requires the creator to have advanced knowledge of language processing. For example, Wikipedia categories are not assigned according to consistent rules, which makes it difficult to obtain topics in a desired domain with sufficient accuracy.
The dialogue device of the second embodiment therefore acquires topics by using the link relationships between articles and creates the domain knowledge 14a from them. For example, since "onigiri" is considered a representative dish of "Japanese cuisine", it is highly likely that the "Japanese cuisine" article and the "onigiri" article are linked, in the sense that the body of one contains a link to the other.
The dialogue device thus creates tree-structured domain knowledge 14a in which nouns representing topics that belong to the same category, for example "onigiri" and "chimaki" in the category "food", are nodes, and the relationships between topics are edges. For example, the domain knowledge 14a represents the relationship that "onigiri" is part of "Japanese cuisine" and is made of "rice".
FIG. 11 is a diagram for explaining an overview of the dialogue device according to the second embodiment. Specifically, the dialogue device creates the tree-structured domain knowledge 14a by using a noun category dictionary 14f, a dictionary that maps nouns to specific categories, to find topics that belong to the same category, and by using an inter-page link structure 14e, the link structure between article pages that each explain a specific subject, as the relationships between topics.
In doing so, a single concrete example of the domain knowledge to be acquired is supplied and customized according to the topic, which makes it possible to acquire the desired knowledge as topics in a variety of domains.
[Configuration of the dialogue device]
FIG. 12 is a schematic diagram illustrating the general configuration of the dialogue device according to the second embodiment. The dialogue device 10a shown in FIG. 12 differs from the dialogue device 10 of the first embodiment shown in FIG. 2 in that its storage unit 14 stores an inter-page link structure 14e and a noun category dictionary 14f instead of the foundation table 14b, the state transition diagram 14c, and the utterance template 14d. It also differs in that its control unit 15 has a creation unit 15e instead of the extraction unit 15b, the identification unit 15c, and the generation unit 15d. Descriptions of the other functional units, which are the same as those of the dialogue device 10 shown in FIG. 2, are omitted.
Note that the storage unit 14 may also store the foundation table 14b, the state transition diagram 14c, and the utterance template 14d, and the control unit 15 may also have the extraction unit 15b, the identification unit 15c, and the generation unit 15d.
The storage unit 14 in this embodiment stores the domain knowledge 14a created in the dialogue processing described later, together with the inter-page link structure 14e and the noun category dictionary 14f used in that processing. The noun category dictionary 14f is a dictionary that maps nouns to predetermined categories. The inter-page link structure 14e is information in which the explanatory page for each noun contains links to the explanatory pages of other nouns. The inter-page link structure 14e and the noun category dictionary 14f are acquired in advance by the acquisition unit 15a, for example from a management device, and stored in the storage unit 14.
FIG. 13 is a diagram for explaining the inter-page link structure. The inter-page link structure 14e is any resource in which the explanatory page for each of various things and matters has links to related things and matters embedded in it; Wikipedia is one example.
The "100 Best Cherry Blossom Spots in Japan" page illustrated in FIG. 13(a) lists famous cherry blossom sightseeing spots, with a link embedded for each location. One of the link destinations is the "Nagatoro Valley" page illustrated in FIG. 13(b), which in turn links to the "Saitama Prefecture" page illustrated in FIG. 13(c). If the links between such pages can be followed appropriately, it becomes possible to acquire the knowledge that "Nagatoro Valley" is in "Saitama Prefecture" and is famous for "cherry blossoms". In other words, the path "Saitama Prefecture - Nagatoro Valley - cherry blossoms" can be obtained as knowledge.
The noun category dictionary 14f is a dictionary that maps nouns representing various things and matters to specific categories. For example, in the noun category dictionary 14f, the nouns "Japan" and "Italy" are mapped to the category "country name", and the nouns "onigiri" and "chimaki" are mapped to the category "food name".
FIG. 14 is a diagram for explaining the noun category dictionary. FIG. 14 shows Shinra as an example of the noun category dictionary 14f. Shinra is a resource that maps Wikipedia titles to an extended named entity dictionary. In the example shown in FIG. 14, the noun "箱根登山鉄道ケ1形客車" (Hakone Tozan Railway Ke-1 passenger car), which is a Wikipedia page title, is mapped to the train name category "1.7.17.2".
Returning to the description of FIG. 12, the control unit 15 of this embodiment has the acquisition unit 15a and the creation unit 15e. The acquisition unit 15a acquires a predetermined root page for the target domain and a concrete example of a path representing relationships starting from that root page. For example, the acquisition unit 15a acquires the root page and at least one concrete example of such a path via the input unit 11, or from a user terminal or the like via the communication control unit 13.
For example, when creating domain knowledge 14a that represents information on the "cooking" domain as a tree structure, the acquisition unit 15a accepts an input designating, for example, the "Japanese cuisine" page as the root page of the tree. The acquisition unit 15a also accepts an input designating "Japanese cuisine" - "onigiri" - "rice" as a concrete example of a path to be acquired.
This path corresponds to the "topic" - "entity" - "category" path of the resulting domain knowledge 14a.
In the processing described later, links between pages serve as the edges of the tree that represent relationships between topics, so links must exist between the pages of the nouns in the concrete path example. For example, the "Japanese cuisine" page must contain a link to the "onigiri" page.
The creation unit 15e creates the domain knowledge 14a for the target domain using the noun category dictionary 14f and the inter-page link structure 14e. Specifically, the creation unit 15e refers to the noun category dictionary 14f to identify the categories of the nouns in the concrete example for the root page, then refers to the inter-page link structure 14e and enumerates, among the nouns linked from the root page, those that belong to the same category, thereby creating the domain knowledge 14a.
For example, if Shinra is used as the noun category dictionary 14f, "onigiri" is mapped to the category "dish name". The creation unit 15e therefore extracts the link destinations of the root page "Japanese cuisine" that fall under "dish name", automatically enumerating dish names such as "chimaki" and "okonomiyaki". By enumerating as many dish names as possible, the creation unit 15e can create rich domain knowledge 14a.
Likewise, in Shinra, "rice", a link destination embedded in the "onigiri" page, is mapped to the category "food name_other". The creation unit 15e therefore searches the "chimaki" and "okonomiyaki" pages for link destinations in the same category "food name_other", automatically enumerating nouns that represent topics such as "rice" and "flour". By enumerating as many topics as possible, the creation unit 15e can create rich domain knowledge 14a.
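The enumeration described above might look roughly like the following sketch. The link structure and the noun category dictionary are represented as plain dictionaries with made-up contents. The function name and data layout are assumptions, not the actual implementation.

```python
def build_paths(root, example, links, categories):
    """Enumerate (topic, entity, category) paths that have the same shape
    as the concrete example path.

    root       -- title of the root page, e.g. "Japanese cuisine"
    example    -- concrete path, e.g. ("Japanese cuisine", "onigiri", "rice")
    links      -- dict: page title -> list of linked page titles
    categories -- dict: noun -> category (stand-in for the dictionary 14f)
    """
    _, example_entity, example_leaf = example
    entity_category = categories[example_entity]  # e.g. "dish name"
    leaf_category = categories[example_leaf]      # e.g. "food name_other"

    paths = []
    for entity in links.get(root, []):
        if categories.get(entity) != entity_category:
            continue  # keep only nouns in the same category as the example
        for leaf in links.get(entity, []):
            if categories.get(leaf) == leaf_category:
                paths.append((root, entity, leaf))
    return paths
```

Given a root page linking to "onigiri", "chimaki", and "Japan", only the two dish names survive the first filter, and each is then expanded to its linked ingredients.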
However, the pages for each dish contain many kinds of link destinations, some of which may be picked up as noise. The creation unit 15e therefore suppresses noise by applying the following rules.
- Template pages and Wikipedia administration pages are excluded based on the page title string.
- If multiple candidate topics are enumerated when only one topic is wanted, the topic whose link appears in the opening paragraph of the page is adopted; if the opening paragraph contains several such topics, the last of them is adopted.
The reason is that the opening paragraph at the top of each Wikipedia page, which gives an overview of the subject, is likely to contain links to nouns that are highly important to the page and closely related to the subject. In addition, opening paragraphs are sometimes written in chronological order, so links that appear earlier tend to point to older information.
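The two noise-suppression rules can be sketched as follows. The excluded title prefixes are assumptions chosen for illustration, and the behaviour when no candidate appears in the opening paragraph is not specified in the text, so the sketch simply returns None in that case.

```python
# Hypothetical title prefixes marking template and administration pages.
EXCLUDED_PREFIXES = ("Template:", "Wikipedia:", "Help:")


def pick_single(candidates, opening_links):
    """Select one topic from the candidate link destinations.

    candidates    -- candidate page titles for the topic
    opening_links -- link destinations in the opening paragraph, in the
                     order in which they appear
    """
    # Rule 1: drop template and administration pages by title.
    kept = [c for c in candidates if not c.startswith(EXCLUDED_PREFIXES)]
    if len(kept) <= 1:
        return kept[0] if kept else None
    # Rule 2: prefer candidates linked from the opening paragraph,
    # and among those take the last one.
    in_opening = [link for link in opening_links if link in kept]
    return in_opening[-1] if in_opening else None  # fallback is an assumption
```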
FIG. 15 is a diagram for explaining the processing of the creation unit. As shown in FIG. 15, the creation unit 15e uses, for example, CirrusSearch (https://www.mediawiki.org/wiki/Help:CirrusSearch/ja). In CirrusSearch, meta-information such as link destinations is recorded as dump data in a format that is easy to analyze automatically. Because processing that analyzes crawl data is cumbersome to implement, the page contents and link destinations of Wikipedia are analyzed using this dump data.
FIG. 15 shows an example entry that summarizes the information on the page for a sports car called "MG Midget." Of the data illustrated in FIG. 15, the creation unit 15e uses "opening_text," "outgoing_link," "incoming_links," and the like to extract topics. Note that "incoming_links" is used for the ranking that indicates the validity of an extracted path.
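As a rough illustration of how such an entry might be read, the sketch below parses one JSON document and keeps only the three fields named above. The `PageEntry` type, the `parse_entry` function, and every value in the sample line are invented for illustration and are not taken from FIG. 15; only the field names come from the text.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class PageEntry:
    """The subset of a dump entry used for topic extraction (assumed shape)."""
    title: str
    opening_text: str
    outgoing_link: List[str]
    incoming_links: int


def parse_entry(line: str) -> PageEntry:
    """Parse one JSON document line, tolerating missing or null fields."""
    doc = json.loads(line)
    return PageEntry(
        title=doc.get("title", ""),
        opening_text=doc.get("opening_text") or "",
        outgoing_link=list(doc.get("outgoing_link") or []),
        incoming_links=int(doc.get("incoming_links") or 0),
    )


# Made-up sample entry; the values are placeholders, not real dump data.
SAMPLE_LINE = (
    '{"title": "MG Midget", '
    '"opening_text": "The MG Midget is a small sports car.", '
    '"outgoing_link": ["Sports_car"], "incoming_links": 120}'
)
entry = parse_entry(SAMPLE_LINE)
```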
FIG. 16 is a diagram illustrating the result of creating domain knowledge. In the example shown in FIG. 16, topics such as "mochi" and "onigiri" are automatically acquired for "Japanese cuisine." Topics are also automatically acquired for "German cuisine" by the same processing as that described above for "Japanese cuisine."
Regarding notation, the honorific prefix "o" is added to the beginning of some nouns, such as "mochi" (rice cake) and "kome" (rice), so that they match common usage. The score is a value representing the validity of the topic in each row, that is, of the path "topic" - "entity" - "category," and is calculated using the "incoming_links" value from Wikipedia. In the example shown in FIG. 16, the "incoming_links" value of the category page is used as the score, but this is not limiting, and the score may instead be calculated using the "incoming_links" value of the topic page or the entity page.
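The scoring described above could be sketched as follows. The `TopicPath` type, the function names, and the `use` parameter are assumptions introduced for illustration; the text states only that the score is the "incoming_links" value of the category page, or alternatively of the topic or entity page.

```python
from typing import Dict, List, NamedTuple


class TopicPath(NamedTuple):
    """One row of domain knowledge: a topic - entity - category path."""
    topic: str
    entity: str
    category: str


def score_path(path: TopicPath, incoming_links: Dict[str, int],
               use: str = "category") -> int:
    """Score a path by the incoming-link count of one of its pages.

    `use` selects which page supplies the count ("topic", "entity" or
    "category"); pages without a known count score 0 (assumed default).
    """
    page = getattr(path, use)
    return incoming_links.get(page, 0)


def rank_paths(paths: List[TopicPath], incoming_links: Dict[str, int],
               use: str = "category") -> List[TopicPath]:
    """Return the paths ordered from highest to lowest score."""
    return sorted(paths, key=lambda p: score_path(p, incoming_links, use),
                  reverse=True)
```

Passing `use="entity"` or `use="topic"` corresponds to the alternative mentioned in the text, where the count of a different page on the path is used as the score.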
[Dialogue processing]
Next, the dialogue processing performed by the dialogue device 10a according to the second embodiment will be described with reference to FIG. 17. FIG. 17 is a flowchart showing the dialogue processing procedure according to the second embodiment. The flowchart in FIG. 17 is started, for example, when the user performs an operation input instructing the start of the processing.
First, the acquisition unit 15a acquires a predetermined root page of the target domain and a specific example of a path representing a relationship with the root page (step S11). For example, the acquisition unit 15a acquires the predetermined root page of the target domain and at least one specific example of a path representing a relationship with the root page via the input unit 11, or from a user terminal or the like via the communication control unit 13.
Next, the creation unit 15e creates the domain knowledge 14a of the target domain using the noun category dictionary 14f and the inter-page link structure 14e (step S12). Specifically, the creation unit 15e refers to the noun category dictionary 14f to identify the category of the noun included in the root page of the specific example, then refers to the inter-page link structure 14e and lists, from among the nouns linked from the root page, those that belong to the same category, thereby creating the domain knowledge 14a.
The creation unit 15e also stores the created domain knowledge 14a in the storage unit 14. This completes the series of dialogue processing steps.
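Steps S11 and S12 can be pictured with the minimal sketch below. It assumes, purely for illustration, that the noun category dictionary is a mapping from noun to category and that the inter-page link structure is a mapping from page title to the nouns it links to; the function name and the toy data are hypothetical.

```python
from typing import Dict, List


def create_domain_knowledge(root_page: str,
                            example_noun: str,
                            noun_categories: Dict[str, str],
                            link_structure: Dict[str, List[str]]) -> List[str]:
    """List nouns linked from the root page that share the example's category.

    root_page and example_noun stand in for the root page and the specific
    example of a path acquired in step S11.
    """
    target_category = noun_categories.get(example_noun)
    if target_category is None:
        return []  # the example noun is not in the dictionary
    linked_nouns = link_structure.get(root_page, [])
    return [n for n in linked_nouns
            if noun_categories.get(n) == target_category]


# Toy data, invented for this sketch.
NOUN_CATEGORIES = {"sushi": "dish", "onigiri": "dish",
                   "mochi": "dish", "Tokyo": "place"}
LINK_STRUCTURE = {"Japanese cuisine": ["sushi", "Tokyo", "onigiri", "mochi"]}

topics = create_domain_knowledge("Japanese cuisine", "sushi",
                                 NOUN_CATEGORIES, LINK_STRUCTURE)
```

Changing only the root page and the example noun, for instance to a "German cuisine" page with one example dish, would yield topics for another domain without touching the procedure itself.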
[Other embodiments]
The dialogue device 10 of the first embodiment and the dialogue device 10a of the second embodiment may be devices that cooperate with each other. For example, the dialogue device 10 of the first embodiment may conduct a dialogue with a user using the domain knowledge 14a generated by the dialogue device 10a of the second embodiment. In that case, the dialogue device 10 of the first embodiment and the dialogue device 10a of the second embodiment may be implemented on the same hardware.
[Effects]
As described above, in the dialogue device 10 of this embodiment, the storage unit 14 stores domain knowledge 14a representing information on a predetermined domain to which a topic belongs, and a state transition diagram 14c representing transitions between predetermined states that do not depend on the domain to which the topic belongs. The acquisition unit 15a acquires text representing an utterance by a user. The identification unit 15c identifies the dialogue state, including the current state of the dialogue with the user and the transition destination state, using information included in the acquired text, the domain knowledge 14a, and the state transition diagram 14c. The generation unit 15d generates an utterance according to the identified transition destination state.
Specifically, the extraction unit 15b extracts, from the acquired text, information for identifying the dialogue state. In this case, the identification unit 15c identifies the dialogue state using the extracted information, the domain knowledge 14a, and the state transition diagram 14c.
That is, the identification unit 15c uses the extracted information to generate a grounding table 14b indicating the correspondence between the domain knowledge 14a and the state transition diagram 14c, and identifies the dialogue state using the generated grounding table 14b and the state transition diagram 14c.
In this way, simply by setting the domain knowledge 14a for each domain, the dialogue device 10 can realize a chat dialogue system using the domain-independent state transition diagram 14c.
The generation unit 15d also generates utterances using utterance templates 14d that correspond to a predetermined domain and are associated with the states of the state transition diagram 14c. In this way, simply by setting the utterance templates 14d for each domain, the dialogue device 10 can easily realize a chat dialogue system that does not depend on the domain.
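The division of labour among extraction, state identification, and template-based generation can be illustrated with the toy sketch below. Everything concrete in it is an assumption made for illustration: the state names, the transition table, the templates, the shape of the grounding table (a plain dictionary), and the substring-based extraction are hypothetical and are not the contents of the state transition diagram 14c or the utterance templates 14d.

```python
from typing import Dict, List, Optional, Tuple

# Domain-independent transitions: (current state, topic found?) -> next state.
STATE_TRANSITIONS: Dict[Tuple[str, bool], str] = {
    ("ask_topic", True): "ask_detail",
    ("ask_topic", False): "ask_topic",
    ("ask_detail", True): "wrap_up",
    ("ask_detail", False): "ask_detail",
}

# Domain-dependent utterance templates, keyed by domain and then by state.
TEMPLATES: Dict[str, Dict[str, str]] = {
    "food": {
        "ask_topic": "What kind of food do you like?",
        "ask_detail": "What do you like about {topic}?",
        "wrap_up": "{topic} sounds great!",
    },
}


def extract_topic(text: str, domain_knowledge: List[str]) -> Optional[str]:
    """Return the first known topic mentioned in the user's text, if any."""
    lowered = text.lower()
    for topic in domain_knowledge:
        if topic.lower() in lowered:
            return topic
    return None


def identify_state(current: str, text: str, domain_knowledge: List[str],
                   grounding: Dict[str, str]) -> Tuple[str, str]:
    """Update the grounding table and return (current state, next state)."""
    topic = extract_topic(text, domain_knowledge)
    if topic is not None:
        grounding["topic"] = topic
    next_state = STATE_TRANSITIONS.get((current, topic is not None), current)
    return current, next_state


def generate_utterance(domain: str, state: str,
                       grounding: Dict[str, str]) -> str:
    """Fill the template for the destination state from the grounding table."""
    template = TEMPLATES[domain][state]
    return template.format(topic=grounding.get("topic", "that"))
```

In this sketch, supporting another domain would mean supplying a different topic list and a different entry in `TEMPLATES`, while `STATE_TRANSITIONS` stays unchanged, which mirrors the separation the text describes.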
In addition, the storage unit 14 further stores a noun category dictionary 14f that maps nouns to predetermined categories, and an inter-page link structure 14e in which the explanation page of each noun contains links to the explanation pages of other nouns. The acquisition unit 15a further acquires a predetermined root page of the target domain and a specific example of a path representing a relationship with the root page, and the creation unit 15e creates the domain knowledge 14a of the target domain using the noun category dictionary 14f and the inter-page link structure 14e.
Specifically, the creation unit 15e refers to the noun category dictionary 14f to identify the category of the noun included in the root page of the specific example, then refers to the inter-page link structure 14e and lists, from among the nouns included in the root page, those that belong to the same category, thereby creating the domain knowledge 14a. This makes it easy to acquire domain knowledge 14a for a desired domain by customizing the specific example to a desired topic.
[Program]
A program in which the processing executed by the dialogue device 10 according to the above embodiment is written in a computer-executable language can also be created. In one embodiment, the dialogue device 10 can be implemented by installing, on a desired computer, a dialogue program that executes the above dialogue processing as packaged software or online software. For example, by causing an information processing device to execute the dialogue program, the information processing device can be made to function as the dialogue device 10. The information processing device referred to here includes desktop and notebook personal computers. The category of information processing devices also includes mobile communication terminals such as smartphones, mobile phones, and PHS (Personal Handyphone System) terminals, as well as slate terminals such as PDAs (Personal Digital Assistants). The functions of the dialogue device 10 may also be implemented on a cloud server.
FIG. 18 is a diagram showing an example of a computer that executes the dialogue program. The computer 1000 has, for example, a memory 1010, a CPU 1020, a hard disk drive interface 1030, a disk drive interface 1040, a serial port interface 1050, a video adapter 1060, and a network interface 1070. These components are connected by a bus 1080.
The memory 1010 includes a ROM (Read Only Memory) 1011 and a RAM 1012. The ROM 1011 stores, for example, a boot program such as a BIOS (Basic Input Output System). The hard disk drive interface 1030 is connected to a hard disk drive 1031. The disk drive interface 1040 is connected to a disk drive 1041. A removable storage medium such as a magnetic disk or an optical disk is inserted into the disk drive 1041. A mouse 1051 and a keyboard 1052, for example, are connected to the serial port interface 1050. A display 1061, for example, is connected to the video adapter 1060.
The hard disk drive 1031 stores, for example, an OS 1091, an application program 1092, a program module 1093, and program data 1094. Each piece of information described in the above embodiment is stored, for example, in the hard disk drive 1031 or the memory 1010.
The dialogue program is stored in the hard disk drive 1031, for example, as the program module 1093 in which instructions to be executed by the computer 1000 are written. Specifically, the program module 1093, in which each process executed by the dialogue device 10 described in the above embodiment is written, is stored in the hard disk drive 1031.
Data used for information processing by the dialogue program is stored as the program data 1094, for example, in the hard disk drive 1031. The CPU 1020 then reads the program module 1093 and the program data 1094 stored in the hard disk drive 1031 into the RAM 1012 as necessary and executes each of the procedures described above.
Note that the program module 1093 and the program data 1094 of the dialogue program are not limited to being stored in the hard disk drive 1031; for example, they may be stored on a removable storage medium and read by the CPU 1020 via the disk drive 1041 or the like. Alternatively, the program module 1093 and the program data 1094 of the dialogue program may be stored on another computer connected via a network such as a LAN or a WAN (Wide Area Network) and read by the CPU 1020 via the network interface 1070.
The following additional notes are further disclosed with respect to the above embodiment.
(Additional Note 1)
A dialogue device comprising:
a memory; and
at least one processor connected to the memory,
wherein the memory stores domain knowledge representing information on a predetermined domain to which a topic belongs, and a state transition diagram representing transitions between predetermined states that do not depend on the domain to which the topic belongs, and
the processor:
acquires text representing an utterance by a user;
identifies a dialogue state, including a current state of the dialogue with the user and a transition destination state, using information included in the acquired text, the domain knowledge, and the state transition diagram; and
generates an utterance according to the identified transition destination state.
(Additional Note 2)
A non-transitory storage medium storing a program executable by a computer to perform dialogue processing,
the dialogue processing comprising:
referring to a memory that stores domain knowledge representing information on a predetermined domain to which a topic belongs, and a state transition diagram representing transitions between predetermined states that do not depend on the domain to which the topic belongs;
acquiring text representing an utterance by a user;
identifying a dialogue state, including a current state of the dialogue with the user and a transition destination state, using information included in the acquired text, the domain knowledge, and the state transition diagram; and
generating an utterance according to the identified transition destination state.
An embodiment applying the invention made by the present inventors has been described above, but the present invention is not limited by the descriptions and drawings that form part of the disclosure of the present invention according to this embodiment. That is, all other embodiments, examples, operational techniques, and the like made by those skilled in the art on the basis of this embodiment are included in the scope of the present invention.
REFERENCE SIGNS LIST
10 Dialogue device
11 Input unit
12 Output unit
13 Communication control unit
14 Storage unit
14a Domain knowledge
14b Grounding table
14c State transition diagram
14d Utterance template
14e Inter-page link structure
14f Noun category dictionary
15 Control unit
15a Acquisition unit
15b Extraction unit
15c Identification unit
15d Generation unit
15e Creation unit

Claims (8)

  1.  A dialogue device comprising:
    a storage unit that stores domain knowledge representing information on a predetermined domain to which a topic belongs, and a state transition diagram representing transitions between predetermined states that do not depend on the domain to which the topic belongs;
    an acquisition unit that acquires text representing an utterance by a user;
    an identification unit that identifies a dialogue state, including a current state of the dialogue with the user and a transition destination state, using information included in the acquired text, the domain knowledge, and the state transition diagram; and
    a generation unit that generates an utterance according to the identified transition destination state.
  2.  The dialogue device according to claim 1, further comprising an extraction unit that extracts, from the acquired text, information for identifying the dialogue state,
    wherein the identification unit identifies the dialogue state using the extracted information, the domain knowledge, and the state transition diagram.
  3.  The dialogue device according to claim 2, wherein the identification unit uses the extracted information to generate a table indicating the correspondence between the domain knowledge and the state transition diagram, and identifies the dialogue state using the generated table and the state transition diagram.
  4.  The dialogue device according to claim 1, wherein the generation unit generates the utterance using an utterance template that corresponds to a predetermined domain and is associated with a state of the state transition diagram.
  5.  The dialogue device according to claim 1, wherein
    the storage unit further stores a dictionary that maps nouns to predetermined categories, and an inter-page link structure in which the explanation page of each noun contains links to the explanation pages of other nouns,
    the acquisition unit further acquires a predetermined root page of a target domain and a specific example of a path representing a relationship with the root page, and
    the dialogue device further comprises a creation unit that creates domain knowledge of the target domain using the dictionary and the inter-page link structure.
  6.  The dialogue device according to claim 5, wherein the creation unit creates the domain knowledge by referring to the dictionary to identify the category of the noun included in the root page of the specific example, referring to the inter-page link structure, and listing, from among the nouns linked from the root page, those that belong to the same category.
  7.  A dialogue method executed by a dialogue device,
    the dialogue device having a storage unit that stores domain knowledge representing information on a predetermined domain to which a topic belongs, and a state transition diagram representing transitions between predetermined states that do not depend on the domain to which the topic belongs,
    the dialogue method comprising:
    an acquisition step of acquiring text representing an utterance by a user;
    an identification step of identifying a dialogue state, including a current state of the dialogue with the user and a transition destination state, using information included in the acquired text, the domain knowledge, and the state transition diagram; and
    a generation step of generating an utterance according to the identified transition destination state.
  8.  A dialogue program in which the dialogue device refers to a storage unit that stores domain knowledge representing information on a predetermined domain to which a topic belongs, and a state transition diagram representing transitions between predetermined states that do not depend on the domain to which the topic belongs,
    the dialogue program causing a computer to execute:
    an acquisition step of acquiring text representing an utterance by a user;
    an identification step of identifying a dialogue state, including a current state of the dialogue with the user and a transition destination state, using information included in the acquired text, the domain knowledge, and the state transition diagram; and
    a generation step of generating an utterance according to the identified transition destination state.
PCT/JP2022/036821 2022-09-30 2022-09-30 Dialogue device, dialogue method, and dialogue program WO2024069974A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2022/036821 WO2024069974A1 (en) 2022-09-30 2022-09-30 Dialogue device, dialogue method, and dialogue program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2022/036821 WO2024069974A1 (en) 2022-09-30 2022-09-30 Dialogue device, dialogue method, and dialogue program

Publications (1)

Publication Number Publication Date
WO2024069974A1 true WO2024069974A1 (en) 2024-04-04

Family

ID=90476719

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2022/036821 WO2024069974A1 (en) 2022-09-30 2022-09-30 Dialogue device, dialogue method, and dialogue program

Country Status (1)

Country Link
WO (1) WO2024069974A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2001034635A (en) * 1999-07-26 2001-02-09 Nec Corp System and method for term relation dictionary production and machine-readable recording medium recording program
JP2018010610A (en) * 2016-07-01 2018-01-18 パナソニックIpマネジメント株式会社 Agent device, dialog system, dialog method and program
JP2020119221A (en) * 2019-01-23 2020-08-06 カシオ計算機株式会社 Interactive device, interactive method, and program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22961033

Country of ref document: EP

Kind code of ref document: A1