WO2024069974A1

WO2024069974A1 - Dialogue device, dialogue method, and dialogue program

Info

Publication number: WO2024069974A1
Application number: PCT/JP2022/036821
Authority: WO
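Inventors: Koh Mitsuda (光田航); Ryuichiro Higashinaka (東中竜一郎); Tetsuya Kinebuchi (杵渕哲也)
Original assignee: Nippon Telegraph and Telephone Corporation (日本電信電話株式会社)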
Priority date: 2022-09-30
Filing date: 2022-09-30
Publication date: 2024-04-04

Abstract

A storage unit (14) stores domain knowledge (14a) representing information on a prescribed domain to which a topic belongs, and a state transition diagram (14c) representing transitions among prescribed states independently of the domain to which the topic belongs. An acquisition unit (15a) acquires text representing an utterance made by a user. A specification unit (15c) uses the information included in the acquired text, the domain knowledge (14a), and the state transition diagram (14c) to specify a dialogue state that includes the current state of the dialogue with the user and the transition-destination state. A generation unit (15d) generates an utterance in accordance with the specified transition-destination state.

Description

Dialogue device, dialogue method, and dialogue program

The present invention relates to a dialogue device, a dialogue method, and a dialogue program.

Conventionally, dialogue systems in which computers converse with humans in natural language are known. Among these, chat dialogue systems are mainly stateless and use a question-and-answer format in which the system selects or generates an utterance based only on the user's most recent utterance. Such chat dialogue systems have difficulty sustaining a dialogue beyond question and answer, so users cannot have casual conversations that feel coherent, and user satisfaction is low.

In response to this, a technology has been proposed that limits the domain of casual conversation to travel and gives the dialogue system a state to realize a more coherent dialogue (see Non-Patent Document 1).

However, with conventional technology it is difficult to realize a domain-independent chat dialogue system. For example, a conventional chat dialogue system uses a state transition diagram that manages the current state and a foundation table that represents the topic of conversation shared with the user at a given point in time; it transitions to an appropriate state in response to the user's utterances and updates the foundation table.

These state transition diagrams and foundation tables are designed specifically for a particular domain to enable coherent dialogue. To enable coherent dialogue in another domain, a state transition diagram and a foundation table suited to that domain must be designed, which requires advanced knowledge of utterance understanding and dialogue management in dialogue systems and therefore incurs high construction costs.

The present invention was made in consideration of the above, and aims to make it possible to realize a domain-independent chat dialogue system.

In order to solve the above-mentioned problems and achieve the object, the dialogue device according to the present invention is characterized by having a storage unit that stores domain knowledge representing information of a specific domain to which a topic belongs and a state transition diagram representing transitions of specific states independent of the domain to which the topic belongs, an acquisition unit that acquires text representing an utterance by a user, an identification unit that identifies a dialogue state including a current state of a dialogue with the user and a destination state using information contained in the acquired text, the domain knowledge, and the state transition diagram, and a generation unit that generates an utterance according to the identified destination state.

The present invention makes it possible to realize a domain-independent chat dialogue system.

FIG. 1 is a diagram for explaining an overview of the dialogue device according to the first embodiment.
FIG. 2 is a schematic diagram illustrating the configuration of the dialogue device according to the first embodiment.
FIG. 3 is a diagram illustrating an example of the data structure of domain knowledge.
FIG. 4 is a diagram for explaining the foundation table.
FIG. 5 is a diagram for explaining the processing of the extraction unit.
FIG. 6 is a diagram illustrating a state transition diagram.
FIG. 7 is a diagram illustrating a state transition diagram.
FIG. 8 is a diagram illustrating an example of a speech template.
FIG. 9 is a diagram illustrating an example of the dialogue processing result.
FIG. 10 is a flowchart showing the dialogue processing procedure according to the first embodiment.
FIG. 11 is a diagram for explaining an overview of the dialogue device according to the second embodiment.
FIG. 12 is a schematic diagram illustrating the configuration of the dialogue device according to the second embodiment.
FIG. 13 is a diagram for explaining the inter-page link structure.
FIG. 14 is a diagram for explaining the noun category dictionary.
FIG. 15 is a diagram for explaining the processing of the creation unit.
FIG. 16 is a diagram illustrating a result of creating domain knowledge.
FIG. 17 is a flowchart showing the dialogue processing procedure according to the second embodiment.
FIG. 18 is a diagram illustrating an example of a computer that executes the dialogue program.

Below, one embodiment of the present invention will be described in detail with reference to the drawings. Note that the present invention is not limited to this embodiment. In addition, in the drawings, the same parts are denoted by the same reference numerals.

[First embodiment]
FIG. 1 is a diagram for explaining an overview of a dialogue device according to a first embodiment. The dialogue device according to the first embodiment makes the foundation table 14b and the state transition diagram 14c domain-independent and conducts a domain-independent chat dialogue with a user, while the speech template 14d, used to generate system utterances, and the domain knowledge 14a, used to understand and present topics, are specialized for the domain. For example, by associating a speech template 14d specialized for a desired domain with each state of the state transition diagram 14c, utterances in the desired domain (hereinafter, system utterances) are output, realizing a dialogue that engages with the user.

[Configuration of the dialogue device]
Fig. 2 is a schematic diagram illustrating a schematic configuration of the dialogue device according to the first embodiment. As illustrated in Fig. 2, the dialogue device 10 of the present embodiment is realized by a general-purpose computer such as a personal computer, and includes an input unit 11, an output unit 12, a communication control unit 13, a storage unit 14, and a control unit 15.

The input unit 11 is realized using input devices such as a keyboard and a mouse, and inputs various instruction information, such as a command to start processing, to the control unit 15 in response to input operations by an operator. The output unit 12 is realized by a display device such as a liquid crystal display, a printer, etc. For example, the output unit 12 displays the results of the dialogue processing described below.

The communication control unit 13 is realized by a NIC (Network Interface Card) or the like, and controls communication between the control unit 15 and external devices via telecommunication lines such as a LAN (Local Area Network) or the Internet. For example, the communication control unit 13 controls communication between the control unit 15 and user terminals used by users, or a management device that collects and manages information on various domains.

The storage unit 14 is realized by a semiconductor memory element such as a RAM (Random Access Memory) or a flash memory, or a storage device such as a hard disk or an optical disk. The storage unit 14 stores in advance the processing programs that operate the dialogue device 10 and data used during the execution of the processing programs, or stores them temporarily each time processing is performed. The storage unit 14 may be configured to communicate with the control unit 15 via the communication control unit 13.

In this embodiment, the storage unit 14 stores domain knowledge 14a, foundation table 14b, state transition diagram 14c, speech template 14d, etc., which are used in the dialogue processing described below. The domain knowledge 14a, foundation table 14b, state transition diagram 14c, and speech template 14d are acquired in advance from a management device that manages various types of information, for example, by the acquisition unit 15a described below, and stored in the storage unit 14.

Domain knowledge 14a represents information on the specific domain to which a topic belongs. Foundation table 14b shows the correspondence between domain knowledge 14a and state transition diagram 14c. State transition diagram 14c represents transitions among specific states independently of the domain to which the topic belongs. Speech template 14d is a template for system utterances in a specific domain and corresponds to a state in state transition diagram 14c.

Here, FIG. 3 is a diagram illustrating an example of the data structure of domain knowledge. Domain knowledge 14a is information that represents topics necessary for casual conversation in a specific domain in a tree structure, and includes, for example, information on topics, entities, categories, and scores, as illustrated in FIG. 3.

Topics are domain information that can serve as an introductory subject in the dialogue processing described below. Entities are information that can be the focus of a topic. Categories are information representing the attributes of each entity. Topics, entities, and categories thus differ in depth: an entity is the subject of conversation itself, a topic serves as an introduction without directly mentioning an entity, and a category is used to talk about a specific attribute of an entity.

FIG. 3 illustrates domain knowledge 14a for the cooking domain. Topics in this domain are, for example, Japanese cuisine, Chinese cuisine, and Italian cuisine. Entities under Japanese cuisine include mochi (rice cake), onigiri (rice ball), and oden. The category of mochi and onigiri is, for example, rice, and a category of oden is bonito flakes. The scores shown in FIG. 3 indicate the priority of each topic; a higher value indicates a higher priority.
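As an illustration only, the three-level tree of FIG. 3 could be held in memory as below. The class names `Topic` and `Entity`, the helper functions, and every score value are hypothetical; the document specifies the levels and the meaning of the score (higher means higher priority), not a concrete data format.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str
    categories: list[str]          # an entity may belong to several categories
    score: float = 0.0


@dataclass
class Topic:
    name: str
    entities: list[Entity] = field(default_factory=list)
    score: float = 0.0             # higher score = higher priority


# Hypothetical cooking-domain knowledge modeled on FIG. 3 (scores are made up).
DOMAIN_KNOWLEDGE = [
    Topic("Japanese cuisine", [
        Entity("mochi", ["rice"], 0.7),
        Entity("onigiri", ["rice"], 0.9),
        Entity("oden", ["bonito flakes"], 0.5),
    ], score=1.0),
    Topic("Chinese cuisine", score=0.8),
    Topic("Italian cuisine", score=0.6),
]


def topics_by_priority(knowledge):
    """Return topic names ordered from highest to lowest score."""
    return [t.name for t in sorted(knowledge, key=lambda t: t.score, reverse=True)]


def find_entity(knowledge, name):
    """Return (topic, entity) for an entity name, or None if it is unknown."""
    for topic in knowledge:
        for entity in topic.entities:
            if entity.name == name:
                return topic, entity
    return None
```

Looking up an entity mentioned by the user (for example "oden") then yields both its parent topic and its categories, which is the information needed to fill the topic, entity, and category slots.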

In the dialogue processing described below, the control unit 15 refers to the domain knowledge 14a to identify which topic the user's utterance mentions and to introduce new topics. For example, to keep the conversation coherent with the user, the control unit 15 moves through topics, entities, and categories in this order, and these are set up to satisfy the following conditions.

Specifically, the relationship between a topic and an entity, and between an entity and a category, must be a has-a relationship or an is-a relationship. In a has-a relationship, one of the two items contains the other as an element; for example, "Japanese cuisine" and "rice balls", and "automobiles" and "tires", each have a has-a relationship. In an is-a relationship, one of the two items is an abstract concept of the other; for example, "cooking" and "Japanese cuisine", and "vehicles" and "automobiles", each have an is-a relationship.

Furthermore, entities and categories must have a many-to-one relationship rather than a one-to-one relationship. If the conversation moves from topic to entity to category and then jumps back to a new topic, the user gets the impression that the topic changed abruptly and does not feel that the conversation was coherent. The conversation should therefore move to a different entity in the same category, which requires multiple entities to be associated with each category. Note that an entity may be associated with multiple categories.
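A quick way to audit a piece of domain knowledge against the many-to-one condition is to invert the entity-to-category mapping and flag categories that only one entity uses. The function name and the dictionary input format are assumptions made for this sketch.

```python
from collections import defaultdict


def under_shared_categories(entity_categories):
    """Return categories linked to fewer than two entities.

    entity_categories maps an entity name to its list of categories.
    An entity may appear under several categories, but every category
    should be shared by at least two entities so the conversation can
    move sideways to another entity of the same category.
    """
    members = defaultdict(set)
    for entity, categories in entity_categories.items():
        for category in categories:
            members[category].add(entity)
    return sorted(c for c, entities in members.items() if len(entities) < 2)


# With the FIG. 3 examples, "rice" is shared but "bonito flakes" is not.
print(under_shared_categories({
    "mochi": ["rice"], "onigiri": ["rice"], "oden": ["bonito flakes"],
}))
```

A category reported here would leave the dialogue with no sibling entity to move to, so it signals that more entities should be added under it.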

FIG. 4 is a diagram for explaining the foundation table. Foundation table 14b represents what the system has understood about the dialogue with the user, namely the topic at a given point in the dialogue between the user and the system. Specifically, in foundation table 14b, the dialogue state is defined by the domain-independent slots shown as rows in FIG. 4 and their values.

Here, the slots are domain-independent items such as the target speaker, topic type, topic, entity, and category. Topic, entity, and category are the same as in the domain knowledge 14a described above, and their values are associated with values in domain knowledge 14a by the identification unit 15c described later.

The target speaker indicates whether the topic focuses on the user or the system, and is set based on the state transition diagram 14c described below. The topic type broadly indicates preset subjects such as favorite dishes, dishes you want to eat now, and dishes you dislike, and is set following the order defined in the state transition diagram 14c. The target speaker and topic type determine which of the speech templates 14d (described below), defined for each combination of target speaker and topic type, is applied. This makes it possible to output system utterances flexibly according to the target speaker and topic type.

For example, when talking about a topic and asking the user about their favorite food, speech template 14d "What food do you like?" is used. Or, when asking the user about the food they want to eat, speech template 14d "What food do you feel like eating right now?" is used.
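The slot layout and the template lookup just described can be sketched as follows, assuming a simple dataclass for the foundation table and a dictionary keyed by (target speaker, topic type). The topic-type labels `"favorite"` and `"want_now"` are hypothetical names; the two question strings are the examples given above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FoundationTable:
    target_speaker: str = "user"     # "user" or "system"
    topic_type: str = "favorite"     # hypothetical topic-type label
    topic: Optional[str] = None
    entity: Optional[str] = None
    category: Optional[str] = None


# Hypothetical template set keyed by (target speaker, topic type).
QUESTION_TEMPLATES = {
    ("user", "favorite"): "What food do you like?",
    ("user", "want_now"): "What food do you feel like eating right now?",
}


def select_template(table: FoundationTable) -> str:
    """Pick the template matching the current target speaker and topic type."""
    return QUESTION_TEMPLATES[(table.target_speaker, table.topic_type)]
```

Because the key contains only domain-independent slots, switching to another domain means swapping the template strings, not the lookup logic.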

Returning to the explanation of FIG. 2, the control unit 15 is realized using a CPU (Central Processing Unit) or the like, and executes a processing program stored in memory. As a result, the control unit 15 functions as an acquisition unit 15a, an extraction unit 15b, an identification unit 15c, and a generation unit 15d, as shown in FIG. 2, to execute the dialogue processing. Note that all or some of these functional units may be implemented in separate hardware. The control unit 15 may also include other functional units; for example, it may include a creation unit 15e, described later.

The acquisition unit 15a acquires text representing an utterance by a user. For example, the acquisition unit 15a acquires, as text, the utterance of a user engaged in the dialogue, either via the input unit 11 or from a user terminal or the like via the communication control unit 13.

The extraction unit 15b extracts information for identifying the dialogue state from the acquired text. For example, the extraction unit 15b performs morphological analysis with an arbitrary language analysis tool and extracts focus words (keywords that represent the topic), proper nouns, evaluative expressions, and modalities such as negation. This enables the identification unit 15c, described later, to identify the state to transition to in the state transition diagram 14c and the information to set in the foundation table 14b.

Here, FIG. 5 is a diagram for explaining the processing of the extraction unit. FIG. 5 shows an example of analysis results obtained with a language analysis tool called Richindexer. In FIG. 5, line is the input sentence, and forms and poses are the morphological analysis results. In addition, names are proper nouns, sems are modalities, evals are evaluative expressions, cents are focus words, and da is the estimated dialogue act.
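The document leaves the analysis tool open, so the sketch below stands in for it with plain keyword matching on English text. It is not Richindexer and does no morphological analysis; the lexicons, the output keys, and the function name are all hypothetical. It only shows the kind of signals (focus words, negation, evaluations) the later steps consume.

```python
import re

# Hypothetical lexicons; a real system would use a morphological analyzer.
DOMAIN_TERMS = {"onigiri", "mochi", "oden", "rice", "japanese cuisine"}
NEGATION_CUES = {"not", "don't", "never", "no"}
EVALUATIVE_WORDS = {"delicious": "positive", "tasty": "positive", "awful": "negative"}


def extract(utterance):
    """Return focus words, a negation flag, and evaluative expressions."""
    text = utterance.lower()
    tokens = re.findall(r"[a-z']+", text)
    # Focus words: domain terms that appear as whole words or phrases.
    focus = sorted(term for term in DOMAIN_TERMS
                   if re.search(r"\b" + re.escape(term) + r"\b", text))
    return {
        "focus_words": focus,
        "negation": any(t in NEGATION_CUES for t in tokens),
        "evaluations": [(t, EVALUATIVE_WORDS[t]) for t in tokens if t in EVALUATIVE_WORDS],
    }
```

For "I don't like oden" this reports the focus word "oden" together with a negation, which is the combination an analysis state would use to choose a negative-judgment branch.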

Returning to the explanation of FIG. 2, the identification unit 15c uses the information contained in the acquired text, the domain knowledge 14a, and the state transition diagram 14c to identify the dialogue state with the user, which includes the current state and the transition-destination state.

Specifically, the identification unit 15c identifies the dialogue state using the extracted information, domain knowledge 14a, and state transition diagram 14c. That is, the identification unit 15c uses the analysis results of the extraction unit 15b to generate foundation table 14b, which indicates the correspondence between domain knowledge 14a and state transition diagram 14c, and then uses the generated foundation table 14b and the domain-independent state transition diagram 14c to determine the current state of the dialogue and identify the next state to transition to.
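The two halves of this step (fill the foundation table from the extraction result, then follow the state transition diagram) might look like the sketch below. The entity index, the state names, the ordered list of guarded transitions, and the choice of an entity's first category are all assumptions made for illustration; the document does not fix any of them.

```python
# Hypothetical domain knowledge: entity -> (topic, categories).
ENTITY_INDEX = {
    "onigiri": ("Japanese cuisine", ["rice"]),
    "oden": ("Japanese cuisine", ["bonito flakes"]),
}

# Hypothetical domain-independent transitions: the first guard that holds wins.
TRANSITIONS = {
    "analyze_favorite": [
        (lambda table, info: info["negation"], "apologize"),
        (lambda table, info: table["entity"] is not None, "ask_experience"),
        (lambda table, info: True, "ask_again"),
    ],
}


def identify(current_state, table, info):
    """Update foundation-table slots from `info`, then return the next state."""
    for word in info["focus_words"]:
        if word in ENTITY_INDEX:
            topic, categories = ENTITY_INDEX[word]
            # Assumption: use the entity's first category.
            table.update(topic=topic, entity=word, category=categories[0])
            break
    for guard, next_state in TRANSITIONS[current_state]:
        if guard(table, info):
            return next_state
    raise ValueError(f"no transition defined from {current_state!r}")
```

Only `ENTITY_INDEX` is domain-specific here; the guards read slot names and extraction flags, which is what keeps the diagram itself domain-independent.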

The generation unit 15d generates an utterance according to the identified transition-destination state. Specifically, the generation unit 15d generates the utterance using the speech template 14d for the predetermined domain that is associated with that state in the state transition diagram 14c.

Here, FIGS. 6 and 7 are diagrams illustrating the state transition diagram. State transition diagram 14c represents transitions among states that are independent of the domain. In this embodiment, the states are of five kinds: speech states, user utterance acquisition states, analysis states, notational states, and conditional branch states. The speech bubbles in FIG. 6 represent the utterances and processing performed, and the bold-framed arrows represent the transitions between states.

Speech states, shown with diagonal hatching from upper left to lower right in FIG. 6, are states in which system utterances are made; each is associated with a speech template 14d, described below. User utterance acquisition states, shown with diagonal hatching from upper right to lower left in FIG. 6, are states in which the system waits until acquisition of the user's utterance is complete. Analysis states are states in which the user's utterance is analyzed, or in which the transition destination depends on the result of searching domain knowledge 14a. Notational states, shown with vertical dashed hatching in FIG. 6, exist only so that arrows do not overlap when drawing state transition diagram 14c. Conditional branch states are states in which the transition destination depends only on the dialogue state.

By classifying and managing the state of a dialogue in this way, it becomes easy to express the complex conditional branching required for dialogue management.
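One way to see why the classification helps is that each kind of state needs a different rule for choosing its successor. The sketch below encodes a small hypothetical fragment of a diagram (all state names and the rule shapes are invented for illustration) and dispatches on the state kind.

```python
from enum import Enum, auto


class Kind(Enum):
    SPEECH = auto()       # emits a system utterance
    USER_INPUT = auto()   # waits for the user's utterance
    ANALYSIS = auto()     # branches on analysis of the utterance / knowledge search
    NOTATIONAL = auto()   # drawing aid only; passes straight through
    BRANCH = auto()       # branches on the dialogue state alone


# Hypothetical diagram fragment: state name -> (kind, successor rule).
DIAGRAM = {
    "ask_favorite": (Kind.SPEECH, "wait_favorite"),
    "wait_favorite": (Kind.USER_INPUT, "analyze_favorite"),
    "analyze_favorite": (Kind.ANALYSIS, {"negative": "apologize", "other": "junction"}),
    "junction": (Kind.NOTATIONAL, "has_entity"),
    "has_entity": (Kind.BRANCH, ("entity", "ask_experience", "present_entity")),
}


def next_state(name, table, analysis_label=None):
    """Return the successor of `name` according to its kind."""
    kind, rule = DIAGRAM[name]
    if kind is Kind.ANALYSIS:
        # Depends on the analysis result of the user's utterance.
        return rule.get(analysis_label, rule["other"])
    if kind is Kind.BRANCH:
        # Depends only on a foundation-table slot being filled or not.
        slot, if_set, if_unset = rule
        return if_set if table.get(slot) else if_unset
    # SPEECH, USER_INPUT and NOTATIONAL states have a single successor.
    return rule
```

Keeping analysis-driven branches separate from state-only branches means the conditions can be read directly off the diagram, and notational states cost nothing at run time.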

In addition, in the analysis states, further analysis is performed on the results of the extraction unit 15b to determine the state. These states include a negative judgment state, a positive judgment state, an experience judgment state, an impression judgment state, a state that extracts elements (topic, entity, category) from the user's utterance, a state that searches for elements (topic, entity, category) in the domain knowledge, and a random transition state. This makes it possible to express complex transitions, such as executing a specific process depending on the dialogue state.

Each speech state is associated with speech template 14d, a template for creating system utterances. In speech template 14d, the slots of foundation table 14b are written as blanks, and a system utterance is generated by filling these blanks with the current values in foundation table 14b.

The system utterances of all speech states passed through before reaching a user utterance acquisition state are combined and output as a single utterance. If multiple utterances are associated with the next state, one of them is selected and output according to an arbitrary priority, such as dictionary order.
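The turn-building behavior above (fill blanks, walk through consecutive speech states, join their outputs, stop at the user utterance acquisition state, break ties by dictionary order) is sketched below. Using Python `{slot}` placeholders for the blanks, the state names, and the wording of the templates are assumptions for this example.

```python
# Hypothetical chain of speech states ending in the user-input state "wait".
NEXT = {"greet": "present_entity", "present_entity": "ask_experience", "ask_experience": "wait"}

# Hypothetical templates; {slot} marks a blank filled from the foundation table.
SPEECH = {
    "greet": ["Let's talk about food."],
    "present_entity": ["Speaking of {topic}, there is {entity}.",
                       "How about {entity}?"],
    "ask_experience": ["Have you ever eaten {entity}?"],
}


def generate_turn(start, table, user_input_state="wait"):
    """Concatenate utterances of speech states until the user-input state."""
    parts = []
    state = start
    while state != user_input_state:
        # Several candidates: take the first in dictionary order.
        template = sorted(SPEECH[state])[0]
        parts.append(template.format(**table))
        state = NEXT[state]
    return " ".join(parts), state
```

Starting from `"greet"` with topic "Japanese cuisine" and entity "onigiri", the three speech states are emitted as one system turn ending with "Have you ever eaten onigiri?", and the function reports that the dialogue is now waiting for the user.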

State transition diagram 14c is not limited to the example shown in FIG. 6, but is designed to satisfy the requirements listed below. This gives the user the feeling of having a coherent dialogue with the system. Moreover, because state transition diagram 14c is domain-independent, it can be applied not only to cooking but also to other domains such as travel, food, and sports, enabling dialogue that achieves high user satisfaction.

(Requirements)
- If the user makes a negative utterance such as "I don't want to answer" at any time, the system outputs an utterance corresponding to it, such as "I'm sorry about that."

- If at any time the user utters a question such as "How about you?", the system will output a response to the question such as "I am..."

- If the topic can be explored more deeply, transition so as to deepen it as far as possible.

- Once a value has been set for an element of a given depth (topic, entity, or category) during the dialogue, the system can respond appropriately to whatever the user mentions next, whether another element of the same depth or a deeper element such as an entity or category.

- The target speaker is first the user, then the system, and this alternation repeats.

- The topic type will change value in a pre-defined order, and once all topic types have been used, the value of the last topic type will continue to be used.

- When changing topics, dialogue state values are carried over as much as possible. For example, in the dialogue state shown in FIG. 4, the conversation moves from "onigiri" to "chimaki," which belongs to the same category "rice," by changing only the entity. If no such entity exists, the entity and category are changed, for example to "miso soup," within the same topic "Japanese cuisine."

- When changing topics, if the number of changes among topics, entities, or categories reaches a predetermined maximum, the target speaker is switched and the dialogue continues. For example, after the user's favorite food has been discussed, the conversation moves to the system's predetermined favorite food, and when the number of changes reaches the predetermined maximum, the topic type is changed.

- The system asks the user about a topic, presents an entity, asks about the user's experience with that entity, and then asks for their impressions of the category. If the user says more than expected, for example giving an opinion right after the system presents an entity, the system skips ahead and transitions directly to the corresponding state.

- If the user makes a negative statement during such a dialogue, the topic is not explored further but is changed in accordance with the requirements above.

- In response to the user's experiences and impressions, the system outputs utterances that correspond to the presence or absence of positive/negative expressions, evaluation expressions, focus words, etc.

- The system does not merely present information: if a dialogue state value elicited from the user is a topic unknown to the system, a transition is set up to ask a question about that topic or to hear the user's opinion.
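The carry-over rule in the requirements above (first change only the entity within the same category, otherwise change the entity and category within the same topic) can be sketched as follows. The nested dictionary layout, the category "miso" for miso soup, and the alphabetical iteration order are hypothetical choices for this illustration.

```python
# Hypothetical domain knowledge: topic -> {entity: [categories]}.
KNOWLEDGE = {
    "Japanese cuisine": {
        "onigiri": ["rice"],
        "chimaki": ["rice"],
        "miso soup": ["miso"],
    },
}


def change_topic(table, used):
    """Return a new foundation table keeping as many slot values as possible.

    `used` is the set of entities already discussed. Returns None when the
    current topic has no unused entity left (a higher-level change is needed).
    """
    entities = KNOWLEDGE[table["topic"]]
    # 1) Keep topic and category; change only the entity.
    for entity, categories in sorted(entities.items()):
        if entity not in used and table["category"] in categories:
            return {**table, "entity": entity}
    # 2) Keep the topic; change both entity and category.
    for entity, categories in sorted(entities.items()):
        if entity not in used:
            return {**table, "entity": entity, "category": categories[0]}
    return None
```

From the FIG. 4 state (onigiri, rice) the first call moves to "chimaki" with only the entity changed; once chimaki is also used, it falls back to "miso soup" with a new category, mirroring the two cases described in the requirement.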

The format of state transition diagram 14c is not particularly limited; any format capable of expressing state transitions may be used. For example, state transition diagram 14c may be written in SCXML format, as illustrated in FIG. 7. In the example shown in FIG. 7, variables corresponding to foundation table 14b are defined in the datamodel tag, and each state is defined in a state tag.
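To show how such a file could be consumed, the sketch below parses a small SCXML document with Python's standard `xml.etree.ElementTree`, collecting the `<data>` ids declared in `<datamodel>` (the foundation-table slots) and each state's transition targets. The XML fragment is a hypothetical stand-in, not the content of FIG. 7; the element and attribute names follow the W3C SCXML specification.

```python
import xml.etree.ElementTree as ET

SCXML_NS = {"sc": "http://www.w3.org/2005/07/scxml"}

# Hypothetical fragment; slot names mirror the foundation table of FIG. 4.
DOC = """<scxml xmlns="http://www.w3.org/2005/07/scxml" version="1.0" initial="ask_favorite">
  <datamodel>
    <data id="target_speaker" expr="'user'"/>
    <data id="topic_type" expr="'favorite'"/>
    <data id="topic"/>
    <data id="entity"/>
    <data id="category"/>
  </datamodel>
  <state id="ask_favorite">
    <transition target="wait_favorite"/>
  </state>
  <state id="wait_favorite">
    <transition event="user_utterance" target="analyze_favorite"/>
  </state>
  <state id="analyze_favorite"/>
</scxml>"""


def load_diagram(text):
    """Return (initial state, slot names, {state id: [transition targets]})."""
    root = ET.fromstring(text)
    slots = [d.get("id") for d in root.iterfind("sc:datamodel/sc:data", SCXML_NS)]
    transitions = {
        s.get("id"): [t.get("target") for t in s.iterfind("sc:transition", SCXML_NS)]
        for s in root.iterfind("sc:state", SCXML_NS)
    }
    return root.get("initial"), slots, transitions
```

Since SCXML is a W3C standard, an existing SCXML interpreter could also execute such a diagram instead of a hand-written loader.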

FIG. 8 is a diagram illustrating an example of a speech template. Speech template 14d is a list of utterances, prepared for each domain, whose entries correspond one-to-one to speech states. Each speech state is assigned a unique ID expressed as text, and each utterance is also assigned an ID. When a state transition reaches a given state, the utterance corresponding to that state is output. An utterance can be any text, including an empty sentence.

FIG. 8 shows an example of a speech template written in YAML format. In the example shown in FIG. 8, an utterance is given either as a single string or as a list whose items begin with "-". When multiple utterances are specified, one is selected according to a predetermined priority or at random. Text enclosed in [ ], such as the user name (user_name), system name (sys_name), and topic (topic), denotes a special variable that is replaced with the corresponding value in foundation table 14b.
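Once a YAML template file has been loaded (for example with PyYAML's `yaml.safe_load`), each state ID maps either to a string or to a list of strings. The sketch below starts from such an already-parsed dictionary, so it needs only the standard library; the state IDs and wording are hypothetical, while the `[name]` variable syntax follows the description above.

```python
import re

# Hypothetical parsed template file: single string or list of alternatives.
TEMPLATES = {
    "greet": "Hello [user_name], I'm [sys_name].",
    "ask_topic": [
        "Do you like [topic]?",
        "What do you think of [topic]?",
    ],
}


def render(state_id, slots, rng=None):
    """Fill [name] variables from `slots` for the template of `state_id`.

    For a list entry, pick randomly when an RNG (e.g. random.Random) is
    given, otherwise take the first, highest-priority item.
    """
    entry = TEMPLATES[state_id]
    if isinstance(entry, list):
        entry = rng.choice(entry) if rng is not None else entry[0]
    # Replace [name] with its slot value; unknown names are left as written.
    return re.sub(r"\[(\w+)\]",
                  lambda m: str(slots.get(m.group(1), m.group(0))),
                  entry)
```

Leaving unknown variables untouched makes a missing slot visible in the output instead of silently producing an empty phrase.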

FIG. 9 is a diagram illustrating an example of the dialogue processing result. FIG. 9 illustrates the dialogue processing result for the "food" domain. It was confirmed by multiple evaluators that the level of satisfaction achieved was equal to or higher than that achieved by the dialogue system described in Non-Patent Document 1.

[Dialogue processing]
Next, the dialogue processing by the dialogue device 10 according to the first embodiment will be described with reference to Fig. 10. Fig. 10 is a flowchart showing a dialogue processing procedure according to the first embodiment. The flowchart in Fig. 10 is started, for example, when a user performs an operation input to instruct the start of the dialogue processing.

First, the acquisition unit 15a acquires text representing an utterance by a user (step S1). For example, the acquisition unit 15a acquires, as text, the utterance of a user engaged in the dialogue, either via the input unit 11 or from a user terminal or the like via the communication control unit 13.

Then, the extraction unit 15b extracts information for identifying the dialogue state from the acquired text (step S2). For example, the extraction unit 15b performs morphological analysis with an arbitrary language analysis tool and extracts focus words (keywords that represent the topic), proper nouns, evaluative expressions, and modalities such as negation.

Then, the identification unit 15c uses the information included in the acquired text, the domain knowledge 14a, and the state transition diagram 14c to identify the state of the dialogue with the user, including the current state and the transition destination state (step S3). Specifically, the identification unit 15c uses the extracted information, the domain knowledge 14a, and the state transition diagram 14c to identify the state of the dialogue.

In other words, the identification unit 15c uses the analysis results of the extraction unit 15b to generate foundation table 14b, which indicates the correspondence between the domain knowledge 14a and the state transition diagram 14c, and then uses the generated foundation table 14b and the domain-independent state transition diagram 14c to determine the current state of the dialogue and identify the next state to transition to.

Then, the generation unit 15d generates an utterance according to the identified transition-destination state (step S4). Specifically, the generation unit 15d generates the utterance using the speech template 14d for the predetermined domain that is associated with that state in the state transition diagram 14c. The generation unit 15d outputs the generated utterance via the output unit 12, or to a user terminal or the like via the communication control unit 13, to present it to the user. This completes the series of dialogue processes.
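Steps S1 to S4 can be summarized as one loop in which the extraction, identification, and generation components are passed in as functions. This is a minimal sketch; the function name `run_dialogue` and its parameters are hypothetical, and any of the earlier illustrative components could be plugged in.

```python
from typing import Callable, Iterable, List


def run_dialogue(user_turns: Iterable[str], start_state: str,
                 extract: Callable, identify: Callable,
                 generate: Callable) -> List[str]:
    """Run steps S1-S4 once per user utterance and collect system replies."""
    state, table, replies = start_state, {}, []
    for text in user_turns:                      # S1: acquire the utterance as text
        info = extract(text)                     # S2: extract dialogue-state cues
        state = identify(state, table, info)     # S3: update table, pick next state
        replies.append(generate(state, table))   # S4: generate the system utterance
    return replies
```

Keeping the components injectable mirrors the split into separate functional units, and allows a domain's templates and knowledge to be swapped without touching the loop.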

[Second embodiment]
Domain knowledge 14a, which aggregates information on the specific domain to which a topic belongs, has conventionally been created, for example, by applying manually written, domain-specific regular expressions and linguistic analysis to Wikipedia articles, or by using the categories pre-assigned to Wikipedia articles.

However, regular expressions and language analysis programs must be implemented for each domain, so applying them to other domains is costly and requires the creator to have advanced knowledge of language processing. Moreover, there are no consistent rules for assigning categories on Wikipedia, which makes it difficult to obtain topics in a desired domain with sufficient accuracy.

The dialogue device of the second embodiment therefore acquires topics by utilizing link relationships between articles and creates domain knowledge 14a. For example, since "onigiri" is considered a representative dish of "Japanese cuisine," it is highly likely that the articles on "Japanese cuisine" and "onigiri" are linked, that is, that the text of one article contains a link to the other.

The dialogue device then creates domain knowledge 14a as a tree structure in which nouns expressing topics that belong to the same category, such as "onigiri" and "chimaki" (both "food"), are represented as nodes, and relationships between topics are represented as edges. For example, domain knowledge 14a represents the relationship that "onigiri" is included in "Japanese cuisine" and is made of "rice."

Here, FIG. 11 is a diagram for explaining an overview of the dialogue device according to the second embodiment. Specifically, the dialogue device uses a noun category dictionary 14f, which maps nouns to specific categories, to determine topics that belong to the same category, and uses an inter-page link structure 14e, formed by the links between article pages that each explain a specific subject, to determine associations between topics; from these it creates tree-structured domain knowledge 14a.

In this case, by providing one specific example of domain knowledge that illustrates the knowledge to be acquired and customizing it according to the topic, it becomes possible to acquire the desired knowledge as a topic in various domains.

[Configuration of the dialogue device]
Fig. 12 is a schematic diagram illustrating a schematic configuration of a dialogue device according to the second embodiment. The dialogue device 10a shown in Fig. 12 differs from the dialogue device 10 of the first embodiment shown in Fig. 2 in that the storage unit 14 of the dialogue device 10a stores an inter-page link structure 14e and a noun category dictionary 14f instead of the foundation table 14b, the state transition diagram 14c, and the speech template 14d. The control unit 15 also differs from the dialogue device 10 of the first embodiment in that it has a creation unit 15e instead of the extraction unit 15b, the identification unit 15c, and the generation unit 15d. Descriptions of other functional units similar to those of the dialogue device 10 shown in Fig. 2 will be omitted.

Note that the storage unit 14 may also store the foundation table 14b, the state transition diagram 14c, and the speech template 14d, and the control unit 15 may also have the extraction unit 15b, the identification unit 15c, and the generation unit 15d.

In this embodiment, the storage unit 14 stores domain knowledge 14a created in the dialogue processing described below, together with an inter-page link structure 14e and a noun category dictionary 14f used in that processing. The noun category dictionary 14f is a dictionary that maps nouns to specific categories. The inter-page link structure 14e is information consisting of the links, embedded in the explanatory page of each noun, to the explanatory pages of other nouns. The inter-page link structure 14e and the noun category dictionary 14f are acquired in advance by the acquisition unit 15a from, for example, a management device, and stored in the storage unit 14.

Here, FIG. 13 is a diagram for explaining the inter-page link structure. The inter-page link structure 14e refers to any resource in which links to related things and events are embedded in individual explanatory pages for various things and events, such as Wikipedia.

The "100 Best Cherry Blossom Spots in Japan" page shown in Figure 13(a) lists famous cherry blossom sightseeing spots and contains embedded links to each location. One of these links leads to the "Nagatoro Valley" page shown in Figure 13(b), which in turn links to the "Saitama Prefecture" page shown in Figure 13(c). If the links between such pages are followed correctly, it is possible to acquire the knowledge that "Nagatoro Valley" is in "Saitama Prefecture" and that it is famous for "cherry blossoms". In other words, the path "Saitama Prefecture - Nagatoro Valley - Cherry Blossoms" can be acquired as knowledge.
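The link-following idea can be illustrated with a short sketch. It assumes, purely for illustration, that the inter-page link structure 14e is held in memory as a dict mapping each page title to the titles it links to, and it uses breadth-first search (a choice made for this sketch, not something the description specifies) to recover the chain of pages. The names `LINKS` and `find_link_path` are hypothetical.

```python
from collections import deque
from typing import Optional

# Hypothetical excerpt of an inter-page link structure (14e):
# page title -> titles of the pages it links to.
LINKS: dict[str, list[str]] = {
    "100 Best Cherry Blossom Spots in Japan": ["Nagatoro Valley"],
    "Nagatoro Valley": ["Saitama Prefecture"],
    "Saitama Prefecture": ["Japan"],
}


def find_link_path(links: dict[str, list[str]], start: str, goal: str) -> Optional[list[str]]:
    """Breadth-first search over page links; returns the shortest chain of titles, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in links.get(path[-1], []):
            if nxt not in visited:  # never enqueue a page twice
                visited.add(nxt)
                queue.append(path + [nxt])
    return None
```

Calling `find_link_path(LINKS, "100 Best Cherry Blossom Spots in Japan", "Saitama Prefecture")` returns the chain through "Nagatoro Valley"; read backwards, and combined with the subject of the list page (cherry blossoms), it gives the path described above.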

The noun category dictionary 14f is a dictionary that maps nouns that represent various things and events to specific categories. For example, in the noun category dictionary 14f, the nouns "Japan" and "Italy" are mapped to the category "country name," and the nouns "onigiri" and "chimaki" are mapped to the category "food name."

Here, FIG. 14 is a diagram for explaining the noun category dictionary. FIG. 14 illustrates Shinra, an example of the noun category dictionary 14f. Shinra is a resource that maps Wikipedia titles to an extended named entity dictionary. In the example shown in FIG. 14, the noun "Hakone Tozan Railway Ke-1 passenger car", which is the Wikipedia page title, is mapped to the train name category "1.7.17.2".
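As a rough sketch, the noun category dictionary 14f can be pictured as a plain mapping from noun to category. Only the first entry below mirrors the Shinra example in FIG. 14; the other entries reuse the category names from the text, and the helper names `category_of` and `nouns_in_category` are hypothetical.

```python
from typing import Optional

# Hypothetical noun category dictionary (14f): noun -> category.
NOUN_CATEGORY: dict[str, str] = {
    "Hakone Tozan Railway Ke-1 passenger car": "1.7.17.2",  # extended named entity ID (train name)
    "Japan": "country name",
    "Italy": "country name",
    "onigiri": "food name",
    "chimaki": "food name",
}


def category_of(noun: str, dictionary: dict[str, str] = NOUN_CATEGORY) -> Optional[str]:
    """Return the category mapped to a noun, or None if the noun is not in the dictionary."""
    return dictionary.get(noun)


def nouns_in_category(category: str, dictionary: dict[str, str] = NOUN_CATEGORY) -> list[str]:
    """List every noun mapped to the given category, in dictionary order."""
    return [noun for noun, cat in dictionary.items() if cat == category]
```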

Returning to the explanation of FIG. 12, the control unit 15 of this embodiment has an acquisition unit 15a and a creation unit 15e. The acquisition unit 15a acquires a specific root page of the target domain and a specific example of a path that represents a relationship with the root page. For example, the acquisition unit 15a acquires a specific root page of the target domain and at least one specific example of a path that represents a relationship with the root page via the input unit 11, or via the communication control unit 13 from a user terminal or the like.

For example, when creating domain knowledge 14a that represents information on the "cooking" domain as a tree structure, the acquisition unit 15a accepts an input specifying the "Japanese cuisine" page as the root page of the tree. The acquisition unit 15a also accepts an input specifying "Japanese cuisine" - "onigiri" (rice balls) - "rice" as a specific example of a path to be acquired.

Note that this path corresponds to the "topic"-"entity"-"category" path of the obtained domain knowledge 14a.

In the processing described below, links between pages are used as the edges of the tree structure that represents relationships between topics, so the page of each noun in the specific example path must link to the page of the next noun. For example, the "Japanese cuisine" page must contain a link to the "onigiri" page.
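The linkage requirement on the example path can be checked mechanically. This is a minimal sketch, assuming the link structure is a dict from page title to linked titles; `LINKS` and `path_is_linked` are hypothetical names, and the description does not say whether or how the device performs such a check.

```python
# Hypothetical link structure for the example in the text: page -> linked pages.
LINKS: dict[str, list[str]] = {
    "Japanese cuisine": ["onigiri", "chimaki", "okonomiyaki"],
    "onigiri": ["rice"],
}


def path_is_linked(path: list[str], links: dict[str, list[str]]) -> bool:
    """True if each noun's page in the example path links to the next noun's page."""
    return all(child in links.get(parent, []) for parent, child in zip(path, path[1:]))
```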

The creation unit 15e uses the noun category dictionary 14f and the inter-page link structure 14e to create domain knowledge 14a of the target domain. Specifically, the creation unit 15e refers to the noun category dictionary 14f to identify the category of the noun included in the root page of the specific example, and refers to the inter-page link structure 14e to list the nouns in the same category as the category among the nouns linked to the root page, thereby creating the domain knowledge 14a.

For example, if Shinra is used as the noun category dictionary 14f, "onigiri" (rice ball) is mapped to the category "food name." The creation unit 15e then extracts links corresponding to "food name" from the root page "Japanese cuisine" to automatically list food names such as "chimaki" (rice dumplings) and "okonomiyaki" (savory pancakes). In doing so, the creation unit 15e can create comprehensive domain knowledge 14a by listing as many food names as possible.
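The first-level listing can be sketched as a filter over the root page's outgoing links. The data below is illustrative only (the link lists and category labels are assumptions, not taken from Shinra or Wikipedia), and `list_same_category` is a hypothetical helper.

```python
# Hypothetical link structure and noun category dictionary for the cooking example.
LINKS: dict[str, list[str]] = {
    "Japanese cuisine": ["onigiri", "chimaki", "okonomiyaki", "Japan"],
}
CATEGORIES: dict[str, str] = {
    "onigiri": "food name",
    "chimaki": "food name",
    "okonomiyaki": "food name",
    "Japan": "country name",
}


def list_same_category(root: str, example_entity: str,
                       links: dict[str, list[str]], categories: dict[str, str]) -> list[str]:
    """List nouns linked from the root page whose category matches the example entity's."""
    target = categories.get(example_entity)
    if target is None:  # unknown example noun: nothing can be matched
        return []
    return [noun for noun in links.get(root, []) if categories.get(noun) == target]
```

With the data above, `list_same_category("Japanese cuisine", "onigiri", LINKS, CATEGORIES)` keeps the three dishes and drops "Japan".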

Furthermore, when Shinra is used, the link destination "Rice" embedded in "Onigiri" is mapped to the category "Food Name_Other". Therefore, the creation unit 15e searches for link destinations in the same category "Food Name_Other" for "Chimaki" and "Okonomiyaki" as well, and automatically lists nouns expressing topics such as "Rice" and "Flour". In doing so, the creation unit 15e is able to create substantial domain knowledge 14a by listing as many topics as possible.
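Putting both levels together, one example path can be expanded into many paths. The sketch below is an assumption-laden illustration: the link lists, the "nori" and "Osaka" entries, and the category labels are made up, and `build_domain_knowledge` is a hypothetical name rather than the creation unit 15e's actual implementation.

```python
Path = tuple[str, str, str]

# Hypothetical link structure and category dictionary (illustrative values only).
LINKS: dict[str, list[str]] = {
    "Japanese cuisine": ["onigiri", "chimaki", "okonomiyaki", "Japan"],
    "onigiri": ["rice", "nori"],
    "chimaki": ["rice"],
    "okonomiyaki": ["flour", "Osaka"],
}
CATEGORIES: dict[str, str] = {
    "onigiri": "Food Name", "chimaki": "Food Name", "okonomiyaki": "Food Name",
    "rice": "Food Name_Other", "nori": "Food Name_Other", "flour": "Food Name_Other",
    "Japan": "Country Name", "Osaka": "City Name",
}


def build_domain_knowledge(example_path: Path, links: dict[str, list[str]],
                           categories: dict[str, str]) -> list[Path]:
    """Expand one example path (root, entity, topic) into all paths with matching categories."""
    root, example_entity, example_topic = example_path
    entity_cat = categories.get(example_entity)
    topic_cat = categories.get(example_topic)
    if entity_cat is None or topic_cat is None:
        return []
    knowledge: list[Path] = []
    for entity in links.get(root, []):       # first level: dishes linked from the root page
        if categories.get(entity) != entity_cat:
            continue
        for topic in links.get(entity, []):  # second level: ingredients linked from each dish
            if categories.get(topic) == topic_cat:
                knowledge.append((root, entity, topic))
    return knowledge
```

Note that "onigiri" yields two topics here ("rice" and "nori"). This is the kind of over-collection that the noise-suppression rules in the following paragraphs are meant to address.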

However, each dish page links to many pages unrelated to the desired topics, and these may be acquired as noise. The creation unit 15e therefore suppresses noise using the following rules.

- Template pages and Wikipedia administration pages are excluded based on the page title string.

- If multiple topic candidates are found when only one topic is desired, the topic linked from the first paragraph of the linked page is used; if the first paragraph contains several topics, the last one in that paragraph is used.

The rationale is that the opening paragraph of each Wikipedia page, which gives an overview of the subject, is likely to contain links to nouns that are important to the page and closely related to its subject. In addition, opening paragraphs are sometimes written in chronological order, so earlier links tend to point to older information and later links to newer information.
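The two noise-suppression rules can be sketched as follows. The title prefixes are an assumption (the description only says that the title string is used), the helper names are hypothetical, and the behavior when no candidate appears in the first paragraph is not specified in the description, so this sketch returns None in that case.

```python
from typing import Optional

# Hypothetical title prefixes of non-article pages (templates, project/administration pages, ...).
EXCLUDED_PREFIXES = ("Template:", "Wikipedia:", "Help:", "Portal:", "Category:", "File:")


def is_content_page(title: str) -> bool:
    """Rule 1: drop template and administration pages by looking only at the title string."""
    return not title.startswith(EXCLUDED_PREFIXES)


def pick_single_topic(candidates: list[str], first_paragraph_links: list[str]) -> Optional[str]:
    """Rule 2: among candidates linked from the first paragraph (in order of appearance), keep the last."""
    allowed = {c for c in candidates if is_content_page(c)}
    in_lead = [t for t in first_paragraph_links if t in allowed]
    # Unspecified case in the description: no candidate in the first paragraph -> None.
    return in_lead[-1] if in_lead else None
```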

Here, FIG. 15 is a diagram for explaining the processing of the creation unit. As shown in FIG. 15, the creation unit 15e uses, for example, CirrusSearch (https://www.mediawiki.org/wiki/Help:CirrusSearch/ja) dump data, in which meta-information such as link destinations is recorded in a format that is easy to analyze automatically. Because analysis based on crawled data would require a complicated implementation, the page contents and link destinations of Wikipedia are analyzed from this dump data instead.

FIG. 15 shows an example entry summarizing information on the page for the sports car "MG Midget." The creation unit 15e extracts topics from the data shown in FIG. 15 using fields such as "opening_text," "outgoing_link," and "incoming_links." Note that "incoming_links" is used for ranking, as an indicator of the validity of an extracted path.
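Reading such entries can be sketched as below. This assumes the dump uses a newline-delimited layout in which each page document is preceded by an `{"index": ...}` action line, and that documents carry the field names shown in FIG. 15; the sample record is synthetic, and `iter_pages` is a hypothetical helper.

```python
import json
from typing import Iterable, Iterator

# Synthetic two-line excerpt: an action line, then the page document (values made up).
SAMPLE_DUMP = [
    '{"index": {"_id": "12345"}}',
    '{"title": "MG Midget", "opening_text": "The MG Midget is a small two-seater sports car.", '
    '"outgoing_link": ["MG Cars", "Sports car", "Roadster"], "incoming_links": 150}',
]


def iter_pages(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the fields used by the creation unit from each page document."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if "index" in record:  # action/metadata line, not a page document
            continue
        yield {
            "title": record.get("title", ""),
            "opening_text": record.get("opening_text") or "",
            "outgoing_link": record.get("outgoing_link", []),
            "incoming_links": record.get("incoming_links", 0),
        }
```

A real dump file is typically gzip-compressed, so in practice the lines would come from something like `gzip.open(path, "rt", encoding="utf-8")` rather than an in-memory list.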

FIG. 16 is a diagram illustrating the results of creating domain knowledge. In the example shown in FIG. 16, for example, topics such as "mochi" and "onigiri" are automatically acquired for "Japanese cuisine." Topics are also automatically acquired for "German cuisine" through a process similar to the process described above for "Japanese cuisine."

Regarding notation, the honorific prefix "o" is added to some nouns, such as "mochi" (rice cake) and "kome" (rice), giving "omochi" and "okome", which are the forms in common use. The score indicates the validity of the topic on each row, that is, of the "topic" - "entity" - "category" path, and is calculated from Wikipedia's "incoming_links". In the example shown in Figure 16, the "incoming_links" value of the category page is used as the score; however, this is not limiting, and the score may instead be calculated from the "incoming_links" value of the topic or entity page.
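The ranking step can be sketched as a sort on an incoming-link count. The counts below are synthetic, `rank_paths` is a hypothetical name, and the `position` parameter models the choice described above: 2 scores by the "category" page (as in Figure 16), while 0 or 1 would score by the topic or entity page.

```python
Path = tuple[str, str, str]


def rank_paths(paths: list[Path], incoming_links: dict[str, int],
               position: int = 2) -> list[tuple[Path, int]]:
    """Score each topic-entity-category path by the incoming_links of one page, best first."""
    scored = [(path, incoming_links.get(path[position], 0)) for path in paths]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Synthetic example data.
PATHS: list[Path] = [
    ("Japanese cuisine", "okonomiyaki", "flour"),
    ("Japanese cuisine", "onigiri", "rice"),
]
INCOMING = {"rice": 5000, "flour": 1200, "onigiri": 300, "okonomiyaki": 800}
```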

[Dialogue processing]
Next, the dialogue processing by the dialogue device 10a according to the second embodiment will be described with reference to Fig. 17. Fig. 17 is a flowchart showing a dialogue processing procedure according to the second embodiment. The flowchart in Fig. 17 is started, for example, when a user performs an operation input to instruct the start of the dialogue processing.

First, the acquisition unit 15a acquires a specific root page of the target domain and a specific example of a path that represents a relationship with the root page (step S11). For example, the acquisition unit 15a acquires a specific root page of the target domain and at least one specific example of a path that represents a relationship with the root page via the input unit 11 or from a user terminal or the like via the communication control unit 13.

Next, the creation unit 15e creates domain knowledge 14a of the target domain using the noun category dictionary 14f and the inter-page link structure 14e (step S12). Specifically, the creation unit 15e refers to the noun category dictionary 14f to identify the category of the noun in the specific example that is linked from the root page, and refers to the inter-page link structure 14e to list, among the nouns linked from the root page, those in the same category, thereby creating the domain knowledge 14a.

The creation unit 15e also stores the created domain knowledge 14a in the storage unit 14. This completes the series of dialogue processes.

[Other embodiments]
The dialogue device 10 of the first embodiment and the dialogue device 10a of the second embodiment may be devices that cooperate with each other. For example, the dialogue device 10 of the first embodiment may use the domain knowledge 14a generated by the dialogue device 10a to have a dialogue with a user. In this case, the dialogue device 10 of the first embodiment and the dialogue device 10a of the second embodiment may be implemented in the same hardware.

[Effects]
As described above, in the dialogue device 10 of this embodiment, the storage unit 14 stores domain knowledge 14a representing information of a specific domain to which the topic belongs, and a state transition diagram 14c representing transitions of specific states independent of the domain to which the topic belongs. The acquisition unit 15a acquires text representing an utterance by a user. The identification unit 15c identifies a dialogue state including a current state of the dialogue with the user and a transition destination state, using information included in the acquired text, the domain knowledge 14a, and the state transition diagram 14c. The generation unit 15d generates an utterance according to the identified transition destination state.

Specifically, the extraction unit 15b extracts information for identifying the state of the dialogue from the acquired text. In this case, the identification unit 15c identifies the state of the dialogue using the extracted information, the domain knowledge 14a, and the state transition diagram 14c.

More specifically, the identification unit 15c uses the extracted information to generate the foundation table 14b, which indicates the correspondence between the domain knowledge 14a and the state transition diagram 14c, and identifies the state of the dialogue using the generated foundation table 14b and the state transition diagram 14c.

In this way, the dialogue device 10 can realize a chat dialogue system using a domain-independent state transition diagram 14c simply by setting domain knowledge 14a for each domain.
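To make the division of labor concrete, here is a heavily simplified sketch in which the state transition diagram is shared across domains and only the domain knowledge is swapped. Every state name, slot name, and rule below is hypothetical; the sketch does not reproduce the first embodiment's actual states or foundation table, only the idea that the same diagram serves different domains.

```python
# Hypothetical domain-independent state transition diagram (14c): state -> next state.
TRANSITIONS = {
    "ask_entity": "discuss_entity",
    "discuss_entity": "discuss_topic",
    "discuss_topic": "ask_entity",
}

# Hypothetical domain knowledge (14a) for two domains; only this part changes per domain.
JAPANESE_CUISINE = {"entity": {"onigiri", "chimaki"}, "topic": {"rice", "flour"}}
GERMAN_CUISINE = {"entity": {"Sauerkraut", "Bratwurst"}, "topic": {"cabbage", "pork"}}


def identify_state(extracted_nouns, knowledge, transitions=TRANSITIONS):
    """Fill a foundation-table-like dict from extracted nouns, then look up the transition."""
    table = {}
    for noun in extracted_nouns:
        for slot, values in knowledge.items():
            if noun in values:
                table[slot] = noun
    if "topic" in table:
        current = "discuss_topic"
    elif "entity" in table:
        current = "discuss_entity"
    else:
        current = "ask_entity"
    return table, current, transitions[current]
```

The same `TRANSITIONS` dict is used whether `JAPANESE_CUISINE` or `GERMAN_CUISINE` is passed in, which mirrors the point that only the domain knowledge needs to be set per domain.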

The generation unit 15d also generates utterances using speech templates 14d that correspond to a specific domain and are associated with states in the state transition diagram 14c. In this way, the dialogue device 10 can easily realize a domain-independent chat dialogue system simply by setting the speech templates 14d for each domain.
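A speech template keyed by domain and state can be sketched with ordinary string formatting. The template texts, state names, and slot names are hypothetical; the description does not specify the template syntax used by the generation unit 15d.

```python
# Hypothetical speech templates (14d): domain -> dialogue state -> template text.
TEMPLATES = {
    "cooking": {
        "ask_entity": "What kind of dish do you like?",
        "discuss_entity": "I like {entity} too. Do you know what it is made from?",
        "discuss_topic": "{entity} is made with {topic}, isn't it?",
    },
}


def generate_utterance(domain: str, state: str, table: dict[str, str], templates=TEMPLATES) -> str:
    """Pick the template for the transition destination state and fill its slots."""
    return templates[domain][state].format(**table)
```

Supporting a new domain then amounts to adding another inner dict of templates for the same state names.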

The storage unit 14 further stores the noun category dictionary 14f, which maps nouns to specific categories, and the inter-page link structure 14e, in which the explanation page of each noun contains links to the explanation pages of other nouns. The acquisition unit 15a further acquires a specific root page of the target domain and a specific example of a path representing a relationship with the root page. The creation unit 15e then creates domain knowledge 14a of the target domain using the noun category dictionary 14f and the inter-page link structure 14e.

Specifically, the creation unit 15e refers to the noun category dictionary 14f to identify the category of the noun in the specific example, and refers to the inter-page link structure 14e to list the nouns linked from the root page that belong to the same category, thereby creating the domain knowledge 14a. This makes it easy to acquire domain knowledge 14a for a desired domain simply by giving a specific example that matches the desired topic.

[Program]
A program describing, in a computer-executable language, the processing executed by the dialogue device 10 according to the above embodiment can also be created. In one embodiment, the dialogue device 10 can be implemented by installing a dialogue program that executes the above dialogue processing, as package software or online software, on a desired computer. For example, by having an information processing device execute the above dialogue program, the information processing device can function as the dialogue device 10. The information processing device referred to here includes desktop and notebook personal computers. It also includes mobile communication terminals such as smartphones, mobile phones, and PHS (Personal Handyphone System) terminals, as well as slate terminals such as PDAs (Personal Digital Assistants). The functions of the dialogue device 10 may also be implemented on a cloud server.

FIG. 18 is a diagram showing an example of a computer that executes the dialogue program. The computer 1000 has, for example, a memory 1010, a CPU 1020, a hard disk drive interface 1030, a disk drive interface 1040, a serial port interface 1050, a video adapter 1060, and a network interface 1070. These components are connected by a bus 1080.

The memory 1010 includes a ROM (Read Only Memory) 1011 and a RAM 1012. The ROM 1011 stores a boot program such as a BIOS (Basic Input Output System). The hard disk drive interface 1030 is connected to a hard disk drive 1031. The disk drive interface 1040 is connected to a disk drive 1041. A removable storage medium such as a magnetic disk or optical disk is inserted into the disk drive 1041. The serial port interface 1050 is connected to a mouse 1051 and a keyboard 1052, for example. The video adapter 1060 is connected to a display 1061, for example.

Here, the hard disk drive 1031 stores, for example, an OS 1091, an application program 1092, a program module 1093, and program data 1094. Each piece of information described in the above embodiment is stored, for example, in the hard disk drive 1031 or memory 1010.

The dialogue program is stored in the hard disk drive 1031, for example, as a program module 1093 in which instructions to be executed by the computer 1000 are written. Specifically, the program module 1093 in which each process executed by the dialogue device 10 described in the above embodiment is written is stored in the hard disk drive 1031.

In addition, data used for information processing by the dialogue program is stored as program data 1094, for example, in the hard disk drive 1031. Then, the CPU 1020 reads the program module 1093 and program data 1094 stored in the hard disk drive 1031 into the RAM 1012 as necessary, and executes each of the procedures described above.

The program module 1093 and program data 1094 related to the dialogue program are not limited to being stored in the hard disk drive 1031; they may be stored, for example, in a removable storage medium and read by the CPU 1020 via the disk drive 1041 or the like. Alternatively, the program module 1093 and program data 1094 related to the dialogue program may be stored in another computer connected via a network such as a LAN (Local Area Network) or WAN (Wide Area Network), and read by the CPU 1020 via the network interface 1070.

The following notes are further provided with respect to the above embodiment.

(Additional Note 1)
A dialogue device comprising:
a memory; and
at least one processor coupled to the memory,
wherein the memory stores domain knowledge representing information of a predetermined domain to which a topic belongs and a state transition diagram representing transitions of a predetermined state independent of the domain to which the topic belongs, and
the processor:
acquires text representing an utterance by a user;
identifies a state of a dialogue with the user, including a current state and a transition destination state, using information included in the acquired text, the domain knowledge, and the state transition diagram; and
generates an utterance in accordance with the identified transition destination state.

(Additional Note 2)
A non-transitory storage medium storing a program executable by a computer to execute dialogue processing,
the dialogue processing including:
referring to a memory that stores domain knowledge representing information of a predetermined domain to which a topic belongs and a state transition diagram representing transitions of a predetermined state independent of the domain to which the topic belongs;
acquiring text representing an utterance by a user;
identifying a state of a dialogue with the user, including a current state and a transition destination state, using information included in the acquired text, the domain knowledge, and the state transition diagram; and
generating an utterance in accordance with the identified transition destination state.

The above describes an embodiment of the invention made by the inventor, but the present invention is not limited to the descriptions and drawings that form part of the disclosure of the present invention according to this embodiment. In other words, other embodiments, examples, operational techniques, etc. made by those skilled in the art based on this embodiment are all included in the scope of the present invention.

REFERENCE SIGNS LIST
10 Dialogue device
11 Input unit
12 Output unit
13 Communication control unit
14 Storage unit
14a Domain knowledge
14b Foundation table
14c State transition diagram
14d Speech template
14e Inter-page link structure
14f Noun category dictionary
15 Control unit
15a Acquisition unit
15b Extraction unit
15c Identification unit
15d Generation unit
15e Creation unit

Claims

1. A dialogue device comprising:
a storage unit that stores domain knowledge representing information of a predetermined domain to which a topic belongs and a state transition diagram representing transitions of a predetermined state independent of the domain to which the topic belongs;
an acquisition unit that acquires text representing an utterance by a user;
an identification unit that identifies a state of a dialogue, including a current state of the dialogue with the user and a transition destination state, using information included in the acquired text, the domain knowledge, and the state transition diagram; and
a generation unit that generates an utterance in accordance with the identified transition destination state.
2. The dialogue device according to claim 1, further comprising an extraction unit that extracts, from the acquired text, information for identifying the state of the dialogue,
wherein the identification unit identifies the state of the dialogue using the extracted information, the domain knowledge, and the state transition diagram.
3. The dialogue device according to claim 2, wherein the identification unit uses the extracted information to generate a table showing the correspondence between the domain knowledge and the state transition diagram, and identifies the state of the dialogue using the generated table and the state transition diagram.
4. The dialogue device according to claim 1, wherein the generation unit generates the utterance using an utterance template that corresponds to a predetermined domain and is associated with a state of the state transition diagram.
5. The dialogue device according to claim 1, wherein
the storage unit further stores a dictionary in which nouns are mapped to predetermined categories, and an inter-page link structure in which the explanation page of each noun includes links to the explanation pages of other nouns,
the acquisition unit further acquires a specific root page of a target domain and a specific example of a path representing a relationship with the root page, and
the dialogue device further comprises a creation unit that creates domain knowledge of the target domain using the dictionary and the inter-page link structure.
6. The dialogue device according to claim 5, wherein the creation unit creates the domain knowledge by referring to the dictionary to identify the category of the noun included in the root page of the specific example, and by referring to the inter-page link structure to list, among the nouns linked from the root page, nouns of the same category as the identified category.
7. A dialogue method executed by a dialogue device, the dialogue device having a storage unit that stores domain knowledge representing information of a predetermined domain to which a topic belongs and a state transition diagram representing transitions of a predetermined state independent of the domain to which the topic belongs, the method comprising:
an acquisition step of acquiring text representing an utterance by a user;
an identification step of identifying a state of a dialogue, including a current state of the dialogue with the user and a transition destination state, using information included in the acquired text, the domain knowledge, and the state transition diagram; and
a generation step of generating an utterance in accordance with the identified transition destination state.
8. A dialogue program for causing a computer to execute:
a step of referring to a storage unit that stores domain knowledge representing information of a predetermined domain to which a topic belongs and a state transition diagram representing transitions of a predetermined state independent of the domain to which the topic belongs;
an acquisition step of acquiring text representing an utterance by a user;
an identification step of identifying a state of a dialogue, including a current state of the dialogue with the user and a transition destination state, using information included in the acquired text, the domain knowledge, and the state transition diagram; and
a generation step of generating an utterance in accordance with the identified transition destination state.