WO2019158014A1 - Computer-implemented method for conversing with a user, and computer system - Google Patents

Computer-implemented method for conversing with a user, and computer system

Info

Publication number
WO2019158014A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
input
natural language
semantic representation
team
Prior art date
Application number
PCT/CN2019/074666
Other languages
English (en)
French (fr)
Inventor
邬学宁
Original Assignee
上海好体信息科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海好体信息科技有限公司
Publication of WO2019158014A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/55 Rule-based translation
    • G06F40/56 Natural language generation

Definitions

  • The present disclosure relates to a computer-implemented method and computer system for conversing with a user, and more particularly to a computer-implemented method and computer system for conversing with a user in a vertical domain.
  • Existing dialogue robots handle simple conversation reasonably well, but they struggle to answer complicated questions appropriately or to engage in in-depth dialogue with the user. For example, when a user's question or statement requires one or more steps of logical reasoning to understand or answer, the dialogue robot often cannot cope. Such problems are more common for robots in a vertical domain than for robots in the open domain.
  • "Open domain" means that when a user talks to a robot, the conversation is not restricted to a specific field, and the user can talk to the robot about any topic.
  • The "vertical domain" is also called the "closed domain".
  • A dialogue robot in a vertical domain is one whose dialogue with the user is limited to a specific field or industry.
  • With chatbots in a vertical domain, because the conversation is restricted to a certain field, the user tends to attempt complicated dialogue with the robot about in-depth topics in that field and expects a more in-depth reply.
  • Because simple responses and database queries cannot produce an appropriate reply in such cases, existing dialogue robots cannot cope with dialogue in a vertical domain.
  • A computer-implemented method of conversing with a user comprises: receiving input in a natural language format from a user; performing natural language understanding on the input to generate a semantic representation; processing the semantic representation using a knowledge map to generate a reply; performing natural language generation based on the reply to obtain output in a natural language format; and providing the output to the user.
  • The method is used in a vertical domain.
  • A computer system comprises: an input/output interface configured to receive input in a natural language format from a user and to provide output in a natural language format to the user; a processor; and a memory configured to be coupled to the processor and to store a computer program.
  • The processor is configured to execute the program to: receive input in a natural language format from a user; perform natural language understanding on the input to generate a semantic representation; process the semantic representation using the knowledge map to generate a reply; perform natural language generation based on the reply to obtain output in a natural language format; and provide the output to the user.
  • The method is used in a vertical domain.
  • One advantage of embodiments in accordance with the present disclosure is that complex and/or in-depth user questions in a vertical domain can be answered.
  • FIG. 1 is a diagram showing a computer system in accordance with an embodiment of the present disclosure.
  • FIG. 2 is a flow diagram of a method of talking to a user implemented by a computer system, in accordance with an embodiment of the present disclosure.
  • FIG. 3 is a schematic diagram of an intent-based semantic representation, in accordance with an embodiment of the present disclosure.
  • FIG. 4 is a schematic diagram of a knowledge map in accordance with the present disclosure.
  • FIG. 5 is a schematic diagram of a grammar-based semantic representation, in accordance with an embodiment of the present disclosure.
  • FIG. 6 is a schematic diagram of text analyzed by dependency grammar, in accordance with an embodiment of the present disclosure.
  • FIG. 7 is a schematic diagram of text analyzed by dependency grammar, in accordance with an embodiment of the present disclosure.
  • FIG. 8 is a schematic diagram of text analyzed by dependency grammar, in accordance with an embodiment of the present disclosure.
  • FIG. 9 is a schematic diagram of text analyzed by dependency grammar, in accordance with an embodiment of the present disclosure.
  • FIG. 10 is a schematic diagram of text analyzed by dependency grammar, in accordance with an embodiment of the present disclosure.
  • FIG. 11 is a schematic diagram of an expression represented by a knowledge map, in accordance with an embodiment of the present disclosure.
  • FIG. 1 shows a computer system 1 for implementing a method of conversing with a user, in accordance with an embodiment of the present disclosure.
  • The computer system 1 may be referred to as a "dialogue robot."
  • The computer system 1 shown in FIG. 1 is an example of a hardware device to which the present disclosure can be applied.
  • The computer system 1 can be any of a variety of computing devices that perform processing and/or computation, including but not limited to workstations, servers, desktop computers, laptops, tablets, personal digital assistants, smartphones, on-board computers, smart speakers, or combinations thereof.
  • The computer system 1 may include various components.
  • For example, the computer system 1 includes a processor 10, a memory 20, and an input/output interface 30.
  • The processor 10 can be any type of processor and can include, but is not limited to, a general-purpose processor and/or a special-purpose processor (such as a dedicated processing chip).
  • The memory 20 can include or be connected to any storage device, such as a non-transitory storage device, that is capable of storing data.
  • Memory 20 includes, but is not limited to, a disk drive, an optical storage device, a solid-state storage device, a floppy disk, a hard disk, a flexible disk, or any other magnetic medium on which a computer can read and record data, instructions, and/or code.
  • Types of memory 20 include, but are not limited to, ROM (Read Only Memory), RAM (Random Access Memory), Flash Cache Memory, other memory chips, and/or other storage media.
  • Memory 20 can be coupled to processor 10 and can store any data, instructions, and/or code.
  • The memory stores a computer program implementing the technical solution of the present disclosure, which can be read and executed by the processor.
  • The input/output interface 30 is configured to receive input in a natural language format from a user and to provide the user with output in a natural language format.
  • The input/output interface 30 can include and/or be connected to any device that can receive input in a natural language format from a user and provide the user with output in a natural language format, including but not limited to a mouse, keyboard, touch screen, microphone, and/or remote control, as well as monitors, speakers, video/audio output ports, vibrators, and/or printers.
  • The various devices shown in FIG. 1 can be connected by, for example, a bus to form a local device. Alternatively, the input/output interface 30 can be located in a device remote from the processor 10, for example, in a user's mobile device. The various devices illustrated in FIG. 1 may also employ a cloud computing configuration in which individual functions are split among and shared by multiple devices connected through a network. For example, processor 10 and memory 20 can be distributed across multiple devices. In some embodiments, a portion of processor 10 may be located in a remote device, such as a user's mobile device, and the mobile device implements part of the functionality of the disclosed solution. For example, the technical solution of the present disclosure may include an app that is executed by a mobile device.
  • Communication between the various devices may use, for example, but is not limited to, wired communication devices and/or wireless communication devices.
  • Wired communication devices include, for example, modems, network cards, and fiber optic communication devices.
  • Wireless communication devices include, for example, infrared communication devices, Bluetooth devices, 802.11 devices, WiFi devices, WiMax devices, cellular communication devices, and the like.
  • FIG. 2 is a flow diagram of a method of conversing with a user implemented by computer system 1, in accordance with an embodiment of the present disclosure.
  • A computer-implemented method of conversing with a user in accordance with an embodiment of the present disclosure begins in step S201, in which processor 10 receives input in a natural language format from a user via input/output interface 30.
  • Natural language refers to the language people use every day to communicate with one another. Simple examples of natural language include Chinese, English, and German.
  • In contrast, a logical language is a language that people use to communicate with machines. Simple examples of logical languages include various computer languages.
  • The user's input can be text, voice, video, etc. in a natural language format. For example, the user's input can be a piece of text entered using an input method.
  • The user's input may be a piece of speech input through a microphone, after which the speech may be converted to text by speech recognition.
  • The user's input may be a video input through a camera and microphone, after which the speech in the video may be converted to text by speech recognition.
  • The user's input can include various types of sentences.
  • The user's input can be a question that the user wishes to have answered.
  • For example, the user's input can be "Which team does Player A play for?", "Who is the coach of Player A?", "What is the relationship between the coach of Team A and the coach of Team B?", "Which team does Player A's brother play for?", or "Which team in the international league is also a member of Team B?".
  • A chatbot in accordance with an embodiment of the present disclosure may provide an answer to the question as a reply.
  • The user's input need not be a question that the user wishes to have answered; it may be, for example, a fact or state asserted by the user, such as "Player A's performance is good", "The tactics of the coaches of Team A and Team B are very similar", or "The performance of Player A's brother is too bad".
  • A chatbot in accordance with an embodiment of the present disclosure may provide an appropriate reply, for example, a reasonable explanation or elaboration based on the user's input.
  • Examples of replies from a chatbot according to an embodiment of the present disclosure are described in detail below in conjunction with the above examples.
  • The user's input is not limited to the above examples and may include various other types of sentences.
  • In step S202, the processor 10 performs natural language understanding on the input to generate a semantic representation.
  • Natural Language Understanding (NLU) refers to expressing the meaning of natural language in a way that computers can understand and process. It is part of Natural Language Processing (NLP).
  • The purpose of natural language understanding is to obtain a semantic representation of natural language that enables a computer to understand what the user means.
  • Semantic representations can take a variety of forms. As examples, embodiments of the present disclosure provide a semantic representation based on intent and a semantic representation based on grammatical structure.
  • In some embodiments, the semantic representation is based on the user's intent, and natural language understanding of the input basically comprises two parts: entity extraction and intent recognition.
  • The text of the user's input (usually a sentence) can be pre-processed first.
  • For example, the sentence can be divided into independent words or phrases by word segmentation, and the part of speech of each word is then determined and labeled by part-of-speech tagging.
  • Next, the grammatical function of each word in the sentence is analyzed to determine the role of each word and the structure of the sentence.
  • Then, entity extraction is performed on the sentence: the nouns in the sentence are extracted as entities to determine the objects that the sentence involves.
  • Finally, inference is performed on the sentence to determine the user's intent.
  • Entity extraction employs methods based on, for example, word vectors; it uses a large corpus for machine learning training, and model performance can be optimized by manually adding entities.
  • The expression obtained by performing entity extraction on a sentence is referred to as a template.
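  • As a minimal sketch of how a template might be derived from a sentence, the Python snippet below replaces known entity names with type placeholders. The gazetteer `ENTITY_TYPES`, the function name `to_template`, the placeholder format, and the dictionary-lookup approach are all assumptions made for illustration; the disclosure itself describes extraction based on, for example, word vectors trained on a large corpus.

```python
# Hypothetical gazetteer mapping known entity names to entity types.
ENTITY_TYPES = {"Player A": "Person", "Coach A": "Person", "Team A": "Team"}

def to_template(sentence: str) -> tuple[str, list[tuple[str, str]]]:
    """Replace each known entity with a <Type> placeholder.

    Returns the template and the (name, type) pairs that were found.
    """
    found = []
    template = sentence
    # Longest names first so that a longer name is not split by a shorter one.
    for name in sorted(ENTITY_TYPES, key=len, reverse=True):
        if name in template:
            found.append((name, ENTITY_TYPES[name]))
            template = template.replace(name, f"<{ENTITY_TYPES[name]}>")
    return template, found

template, entities = to_template("Which team does Player A play for?")
```

  • The returned template abstracts away the concrete player, so sentences about different players map to the same template, while the extracted pairs can later populate the attributes of the semantic representation.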
  • A classifier is trained on a large number of templates with known intents using, for example, a machine learning algorithm.
  • For a template whose intent is unknown, a machine learning algorithm can automatically estimate the probability that the template belongs to each intent and select the most probable intent as the recognized intent.
  • New templates can be added to the training templates periodically to update the intent recognition model.
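  • The probability-based selection described above can be illustrated with a small multinomial naive Bayes classifier. The disclosure does not name a specific algorithm, so naive Bayes, the training templates, and the intent labels below are assumptions chosen only to keep the sketch self-contained.

```python
import math
from collections import Counter, defaultdict

# Hypothetical training templates with known intents.
TRAINING = [
    ("which team does <Person> play for", "query_player_team"),
    ("what team is <Person> on", "query_player_team"),
    ("who is the coach of <Team>", "query_team_coach"),
    ("who coaches <Team>", "query_team_coach"),
]

def train(samples):
    """Count words per intent for a multinomial naive Bayes model."""
    word_counts = defaultdict(Counter)
    intent_counts = Counter()
    for text, intent in samples:
        intent_counts[intent] += 1
        word_counts[intent].update(text.split())
    vocab = {w for c in word_counts.values() for w in c}
    return word_counts, intent_counts, vocab

def intent_probabilities(template, model):
    """Return P(intent | template) for every known intent."""
    word_counts, intent_counts, vocab = model
    total = sum(intent_counts.values())
    log_scores = {}
    for intent, n in intent_counts.items():
        denom = sum(word_counts[intent].values()) + len(vocab)
        score = math.log(n / total)
        for word in template.split():
            # Laplace smoothing keeps unseen words from zeroing the score.
            score += math.log((word_counts[intent][word] + 1) / denom)
        log_scores[intent] = score
    norm = math.log(sum(math.exp(s) for s in log_scores.values()))
    return {i: math.exp(s - norm) for i, s in log_scores.items()}

model = train(TRAINING)
probs = intent_probabilities("which team is <Person> on", model)
best_intent = max(probs, key=probs.get)
```

  • Retraining with an extended `TRAINING` list corresponds to the periodic addition of new templates.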
  • A semantic representation of the user's intent can be generated from the extracted entities and the recognized intent.
  • The semantic representation may be expressed as the user's intent and one or more attributes associated with the intent.
  • The user's intent can be a question that the user wishes to have answered.
  • For example, if the user's intent is "query the player's team", the corresponding attributes include at least "player name."
  • If the user's intent is "query the coach of the team", the corresponding attributes include at least "team name".
  • If the user's intent is "query the relationship between two people", the corresponding attributes include at least "name of person 1" and "name of person 2".
  • The attributes may also include the time period at which the query is directed. For example, when the user's intent is "query the player's team", the attributes may include the period during which the player belonged to the team; when the user's intent is "query the team's coach", the attributes may include the period during which the coach coached the team; and when the user's intent is "query the relationship between two people", the attributes may include the period during which the two people had that relationship.
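  • One way to hold an intent together with its attributes is a small data class, sketched below in Python. The class name, field names, intent labels, and the `REQUIRED` table are hypothetical; they only mirror the examples given in the text.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class SemanticRepresentation:
    """A user's intent plus the attributes associated with it."""
    intent: str
    attributes: dict[str, str] = field(default_factory=dict)
    # Optional time period at which the query is directed.
    period: Optional[str] = None

    def missing(self, required: list[str]) -> list[str]:
        """Return the required attribute names that are not yet populated."""
        return [name for name in required if name not in self.attributes]

# Hypothetical required attributes per intent, following the examples in the text.
REQUIRED = {
    "query_player_team": ["player_name"],
    "query_team_coach": ["team_name"],
    "query_relationship": ["person_1", "person_2"],
}

rep = SemanticRepresentation("query_relationship", {"person_1": "Coach A"})
unfilled = rep.missing(REQUIRED[rep.intent])
```

  • Here `unfilled` lists the attributes that still have to be populated before the knowledge map can be queried.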
  • Alternatively, the user's intent may be to assert a fact or state.
  • For example, the user's intent may be "evaluate a player", "evaluate a team", "evaluate a coach", etc.
  • The corresponding attributes may include "player name", "team name", "coach name", and the like.
  • The attributes may also include the time period at which the evaluation is directed.
  • Both the user's intent and the attributes can be generated from the entities obtained by natural language understanding of the user's input.
  • The attributes can be populated in one or more ways, which are described in detail below.
  • In step S203, the processor 10 processes the semantic representation using the knowledge map to generate a reply.
  • A knowledge map (more commonly called a knowledge graph) is a structured semantic knowledge base that describes concepts in the physical world and their relationships in symbolic form.
  • Its basic components are, for example, "entity-relationship-entity" triples and "entity-parameter-value" triples; entities are interconnected by relationships to form a networked knowledge structure. That is, entities (or concepts, events, etc.) constitute the nodes of the knowledge map, and the various relationships between entities constitute the edges of the network.
  • Knowledge maps are characterized by reasoning ability (that is, the ability to retrieve information through reasoning) and by the ability to display classified, structured knowledge graphically.
  • For example, the entities (nodes) shown in FIG. 4 include "team", "player", "coach", "international league", "national team", etc., and the relationships between entities include "plays for", "coaches", "brothers", "good friends", and so on.
  • The figure also includes parameters of the entities, such as "nationality", "number of goals", and "number of assists", together with the corresponding values.
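  • The two kinds of triples can be sketched as plain Python tuples. The relationship names, the entity "Player B", and the helper functions are assumptions for illustration; the goal and assist values follow the example reply given later in the text.

```python
# Relationship triples: (entity, relationship, entity).
RELATIONS = [
    ("Player A", "plays_for", "Team A"),
    ("Coach A", "coaches", "Team A"),
    ("Player A", "brother_of", "Player B"),
]

# Parameter triples: (entity, parameter, value).
PARAMETERS = [
    ("Player A", "goals", 5),
    ("Player A", "assists", 11),
]

def neighbors(entity: str) -> list[tuple[str, str]]:
    """Return (relationship, other entity) pairs for edges that start at `entity`."""
    return [(rel, obj) for subj, rel, obj in RELATIONS if subj == entity]

def parameters(entity: str) -> dict[str, object]:
    """Return the parameter-value pairs recorded for `entity`."""
    return {name: value for subj, name, value in PARAMETERS if subj == entity}
```

  • For example, `neighbors("Player A")` yields the edges leaving the node for Player A, and `parameters("Player A")` yields that player's recorded values.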
  • The entities, relationships, and parameters shown in FIG. 4 are all schematic; those skilled in the art can contemplate various other entities, relationships, and parameters, all of which fall within the scope of the present disclosure.
  • Only some of the entities, relationships, and parameters are shown in FIG. 4. Those skilled in the art will appreciate that other entities may be added to the figure, and that each entity may have various relationships and various parameters.
  • A knowledge map according to an embodiment of the present disclosure may include more nodes, relationships, and parameters, and the relationships between nodes can be more complicated.
  • Two nodes are not limited to a single relationship but may have a plurality of different relationships.
  • A dimension representing time can also be added to represent the different relationships and parameters that hold between nodes in different time periods.
  • Knowledge maps in accordance with embodiments of the present disclosure can be very large and complex, and can include one-dimensional, two-dimensional, three-dimensional, and even higher-dimensional structures.
  • The construction of the knowledge map also relies on the extraction of "entity-relationship-entity" triples and "entity-parameter-value" triples.
  • Knowledge elements can be extracted from large amounts of raw data (e.g., books, newspapers, magazines, web pages, various types of databases) by automatic means (e.g., deep neural networks) or semi-automatic means (e.g., automatic means with manual intervention), and the triples extracted from them are stored in the knowledge map.
  • Further knowledge fusion is also needed to merge entities that are the same but have different names, through entity disambiguation and entity resolution.
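  • A very small sketch of the fusion step is shown below: surface forms are mapped to one canonical name before triples are stored. The alias table and function names are hypothetical, and a fixed lookup table stands in for whatever disambiguation and resolution methods an actual system would use.

```python
# Hypothetical alias table: every known surface form maps to one canonical name.
ALIASES = {
    "player a": "Player A",
    "p. a": "Player A",
    "team a": "Team A",
    "team a fc": "Team A",
}

def canonical(name: str) -> str:
    """Map a surface form to its canonical entity name (unknown names pass through)."""
    return ALIASES.get(name.strip().lower(), name.strip())

def fuse(triples):
    """Rewrite triples to canonical names and drop duplicates, keeping order."""
    seen, fused = set(), []
    for subj, rel, obj in triples:
        triple = (canonical(subj), rel, canonical(obj))
        if triple not in seen:
            seen.add(triple)
            fused.append(triple)
    return fused

raw = [("P. A", "plays_for", "Team A FC"), ("Player A", "plays_for", "Team A")]
merged = fuse(raw)
```

  • Both raw triples describe the same fact under different names, so only one triple remains after fusion.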
  • Two methods can be adopted for constructing the knowledge map: top-down and bottom-up. For example, a top-down approach is used for important nodes such as players and teams; that is, ontology information is extracted from a high-quality data source such as Wikipedia and added to the knowledge base. For other, relatively less important information, a bottom-up approach is used: data is extracted from public sources such as the Internet, and the information with higher confidence is selected and added to the knowledge map.
  • The constructed knowledge map may be stored using, for example, the Resource Description Framework (RDF) or a property graph.
  • In order to process the semantic representation with the knowledge map to generate a reply, a query statement may be generated according to the semantic representation, and the knowledge map may be queried with the query statement.
  • The query statement may be written in, for example, the Cypher language or the SPARQL language, both commonly used in the field of graph databases.
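  • Generating a query statement from a semantic representation can be sketched as a lookup of a parameterized Cypher template by intent. In the Python snippet below, the intent labels, the node labels (`Player`, `Team`, `Coach`), the `name` property, and the edge directions are assumptions; the relationship names `REL_BELONG_TO_TEAM` and `REL_Coach` are the ones mentioned in the examples of this disclosure.

```python
# Hypothetical query templates keyed by intent. Node labels, the `name`
# property, and edge directions are assumptions made for this sketch.
QUERY_TEMPLATES = {
    "query_player_team": (
        "MATCH (p:Player {name: $player_name})-[:REL_BELONG_TO_TEAM]->(t:Team) "
        "RETURN t.name"
    ),
    "query_team_coach": (
        "MATCH (t:Team {name: $team_name})-[:REL_Coach]->(c:Coach) "
        "RETURN c.name"
    ),
}

def build_query(intent: str, attributes: dict[str, str]) -> tuple[str, dict[str, str]]:
    """Return a Cypher statement and its parameters for a semantic representation."""
    if intent not in QUERY_TEMPLATES:
        raise ValueError(f"no query template for intent {intent!r}")
    return QUERY_TEMPLATES[intent], dict(attributes)

statement, params = build_query("query_player_team", {"player_name": "Player A"})
```

  • Passing attribute values as query parameters rather than splicing them into the statement keeps user-supplied text out of the query string itself.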
  • In step S204, the processor 10 performs natural language generation based on the reply to obtain output in a natural language format.
  • Natural Language Generation (NLG) refers to converting content that a computer can understand and process into natural language. It is also part of Natural Language Processing (NLP).
  • The purpose of natural language generation is to transform the language used by a computer into a natural language used by humans.
  • Those skilled in the art are familiar with the various principles and common means of natural language generation. Natural language generation can be simpler than natural language understanding. For example, processor 10 may simply provide the resulting reply to the user.
  • The output in natural language format may be text composed from the reply, speech generated by speech synthesis, or video generated by animation software or the like.
  • In step S205, the output is provided to the user via the input/output interface 30.
  • For example, text can be displayed to the user on a display device, speech can be played to the user through a speaker, and video can be provided to the user through a display and a speaker.
  • The method shown in FIG. 2 is used in a vertical domain.
  • Applying the method in a vertical domain avoids the situation in which the same noun refers to different entities in different fields, thereby greatly reducing the difficulty of "entity disambiguation" and "coreference resolution" in entity extraction.
  • In addition, the difficulty of constructing the knowledge map and the scale of the constructed knowledge map can be greatly reduced, as can the difficulty of identifying intents and attributes in natural language processing.
  • Compared with a dialogue robot in the open domain, the method of the embodiments of the present disclosure applied in a vertical domain can answer complex questions.
  • Moreover, the reasoning ability of the knowledge map can be used to process the user's input, so that more in-depth questions from the user can be answered and the user can hold vertical, in-depth conversations with the robot on topics in the specific field.
  • In step S201, the processor 10 receives, via the microphone, the natural language input spoken by the user, "Which team does Player A play for?", and converts the input into text by speech recognition.
  • Regarding "Player A": there may be several players of the same name in other sports (for example, rugby or volleyball). Because this embodiment is applied to a vertical domain (soccer), Player A and the corresponding team will not be confused with those in other sports. The embodiments of the present disclosure therefore reduce cases in which the same noun points to different entities, which reduces the complexity of semantic recognition; compared with a dialogue robot in the open domain, the method applied in a vertical domain can answer complex questions.
  • The text is then pre-processed by processor 10.
  • The sentence is first divided into independent words or phrases by word segmentation, and each word is tagged with its part of speech.
  • Text that has been tagged with a part of speech can be represented as follows:
  • NN, prep., r., and v. are the abbreviations for noun, preposition, pronoun, and verb, respectively.
  • the parsed text can be represented as follows:
  • Sub., Obj., and Pred. are the abbreviations for subject, object, and predicate, respectively.
  • After preprocessing the text, the processor 10 performs entity extraction on the text in step S202, extracting the nouns in the text as entities and thereby determining the objects that the sentence involves.
  • the text extracted by the entity can be represented as follows:
  • <Person> and <Team> indicate that the entity preceding the tag is a person and a team, respectively.
  • Then, intent recognition is performed on the text, and the intent is identified as "query the player's team."
  • The resulting semantic representation includes the user's intent "query the player's team" and the attribute "Player A."
  • In step S203, the processor 10 queries the knowledge map with a Cypher statement to obtain the team that "Player A" plays for.
  • the query statement is:
  • In step S204, the processor 10 performs natural language generation based on the reply to obtain output in a natural language format.
  • The resulting natural language output is "Player A plays for Team A."
  • In step S205, the processor 10 provides the output to the user through the display or the speaker.
  • For example, "Player A plays for Team A" is displayed on the screen, and "Player A plays for Team A" is played through the speaker.
  • the query for the knowledge map is:
  • In this way, a dialogue with the user in the vertical domain of soccer is completed, and a reply in natural language format is provided for the user's question.
  • Because the method is applied in a vertical domain, the accuracy of the reply is greatly improved, which improves the user experience.
  • In step S201, the processor 10 receives, via the microphone, the natural language input spoken by the user, "Who is the coach of Player A?", and converts the input into text by speech recognition.
  • For this input, FIG. 4 shows that "Player A" plays for "Team A" and that "Team A" is coached by "Coach A", but there is no direct edge between "Player A" and "Coach A" indicating the relationship between the two. That is, the data stored in the system does not record the relationship between the two. An existing chatbot may therefore be unable to answer such a question correctly for lack of the corresponding information.
  • By utilizing the knowledge map in the manner described below, a correct answer can be obtained and the appropriate output provided to the user.
  • Text that has been tagged with a part of speech can be represented as follows:
  • Player A/NN /u. coach/NN is/v. who/pron.?
  • NN, u., v., and pron. are the abbreviations for noun, auxiliary, verb, and pronoun, respectively.
  • the parsed text can be represented as follows:
  • Adj., Sub., Obj., and Pred. are the abbreviations for adjective, subject, object, and predicate, respectively.
  • After preprocessing the text, the processor 10 performs entity extraction on the text in step S202, extracting the nouns in the text as entities and thereby determining the objects that the sentence involves.
  • the text extracted by the entity can be represented as follows:
  • The tags <Person> and <Name> indicate that the entity preceding them is a person.
  • Then, intent recognition is performed on the text, and the intent is identified as "query the player's coach."
  • The resulting semantic representation includes the user's intent "query the player's coach" and the attribute "Player A".
  • In step S203, the processor 10 queries the knowledge map with a Cypher statement to obtain the name of Player A's coach.
  • the query statement is:
  • The MATCH statement first queries the team that Player A plays for and then queries the coach of that team.
  • The relationship "[:REL_Coach]" indicates that the relationship between the team and the coach is "the team is coached by the coach."
  • This embodiment of the present disclosure thus obtains the final reply by adding a single inference step that uses the knowledge map in the above query statement.
  • "REL_BELONG_TO_TEAM" indicates that the relationship between Player A and Team A is "plays for the team".
  • "REL_Coach" indicates the coach of Team A.
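  • The two-hop reasoning described above (player to team, then team to coach) can be sketched as follows. The Cypher string only illustrates the shape of such a statement: its node labels, the `name` property, and the edge directions are assumptions, and only the relationship names come from the text. The in-memory traversal over a toy edge list stands in for the graph database.

```python
# A Cypher statement of the shape described above. Labels, the `name`
# property, and edge directions are assumptions made for this sketch.
TWO_HOP_QUERY = (
    "MATCH (p:Player {name: 'Player A'})-[:REL_BELONG_TO_TEAM]->(t:Team)"
    "-[:REL_Coach]->(c:Coach) RETURN c.name"
)

# Toy in-memory edges standing in for the graph database.
EDGES = [
    ("Player A", "REL_BELONG_TO_TEAM", "Team A"),
    ("Team A", "REL_Coach", "Coach A"),
]

def follow(start: str, relationship: str) -> list[str]:
    """Return every node reached from `start` over one `relationship` edge."""
    return [dst for src, rel, dst in EDGES if src == start and rel == relationship]

def coaches_of_player(player: str) -> list[str]:
    """Hop 1: player -> team. Hop 2: team -> coach."""
    return [
        coach
        for team in follow(player, "REL_BELONG_TO_TEAM")
        for coach in follow(team, "REL_Coach")
    ]

coaches = coaches_of_player("Player A")
```

  • No edge connects Player A to Coach A directly; the answer is reached only by chaining the two stored relationships.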
  • the result of this query is:
  • In step S204, the processor 10 performs natural language generation based on the reply to obtain output in a natural language format.
  • The resulting natural language output is "The coach of Player A is Coach A."
  • In step S205, the processor 10 provides the output to the user through the display or the speaker.
  • For example, "The coach of Player A is Coach A" is displayed on the screen, and "The coach of Player A is Coach A" is played through the speaker.
  • In this embodiment, the reasoning ability of the knowledge map is further utilized, on the basis of Example 1, in generating the reply, which greatly improves the depth and accuracy of the reply and thereby improves the user experience.
  • In addition, a new triple can be generated from the result, and the newly obtained relationship can be stored in the knowledge map. For example, the following triple can be added to the knowledge map:
  • The Cypher statement that writes the triple can be, for example:
  • The triple may be added, for example, after asking the user "Is the reply useful?" and receiving a positive response from the user.
  • The edges of newly added triples are indicated by dashed lines.
  • In this way, the content of the knowledge map can be continuously supplemented and improved with the help of the user, which benefits the management of the knowledge map.
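  • A minimal sketch of this write-back step is given below. The relationship name `REL_COACHED_BY`, the function name, and the use of a Python set as the store are assumptions for illustration; the original triple and Cypher statement are not reproduced in this text.

```python
def add_inferred_triple(graph: set, triple: tuple[str, str, str], user_confirmed: bool) -> bool:
    """Store `triple` only if the user confirmed the reply and it is not already known."""
    if not user_confirmed or triple in graph:
        return False
    graph.add(triple)
    return True

graph = {
    ("Player A", "REL_BELONG_TO_TEAM", "Team A"),
    ("Team A", "REL_Coach", "Coach A"),
}
# Hypothetical relationship name for the inferred player-coach edge.
inferred = ("Player A", "REL_COACHED_BY", "Coach A")

added = add_inferred_triple(graph, inferred, user_confirmed=True)
added_again = add_inferred_triple(graph, inferred, user_confirmed=True)
```

  • Once the inferred edge is stored, a later question about Player A's coach can be answered with a single hop instead of two.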
  • The user's input need not be a question that the user wishes to have answered; it may be, for example, a fact or state asserted by the user.
  • In step S201, the processor 10 receives, via the microphone, the natural language input spoken by the user, "Player A's performance is good", and converts the input into text by speech recognition.
  • The text is then pre-processed by processor 10.
  • The sentence is first divided into independent words or phrases by word segmentation, and each word is tagged with its part of speech.
  • Text that has been tagged with a part of speech can be represented as follows:
  • NN, u., and adj. are the abbreviations for noun, auxiliary, and adjective, respectively.
  • the parsed text can be represented as follows:
  • Adj., Sub., and Pred. are the abbreviations for adjective, subject, and predicate, respectively.
  • After preprocessing the text, the processor 10 performs entity extraction on the text in step S202, extracting the nouns in the text as entities and thereby determining the objects that the sentence involves.
  • the text extracted by the entity can be represented as follows:
  • <Person> indicates that the entity preceding it is a person.
  • Then, intent recognition is performed on the text, and the intent is identified as "evaluate the player's performance."
  • The resulting semantic representation includes the user's intent "evaluate the player's performance" and the attribute "Player A."
  • In step S203, the processor 10 queries the knowledge map with a Cypher statement for the parameters and values associated with Player A's performance. For example, the player's goals and assists can be queried.
  • the query statement is:
  • In step S204, the processor 10 performs natural language generation based on the reply to obtain an output in natural language format.
  • The resulting output is "Player A has scored 5 goals and made 11 assists."
  • In step S205, the processor 10 provides the output to the user through the display or the speaker.
  • For example, "Player A has scored 5 goals and made 11 assists" is displayed on the screen and played through the speaker.
  • In this way, a dialogue with the user in the vertical domain of soccer is completed, providing a natural-language response to a fact or state stated by the user.
  • Because the response is grounded in the player's recorded statistics, its accuracy improves, and with it the user experience.
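  • The flow of this example (intent and attribute in, statistics looked up, sentence out) can be sketched as below. It is only a sketch under stated assumptions: a dictionary stands in for the "entity-parameter-value" triples of the knowledge graph, the intent name and attribute key are hypothetical, and natural language generation is reduced to a single template.

```python
# Toy "entity-parameter-value" store; the numbers mirror the example in the text.
PARAMETERS = {
    ("Player A", "goals"): 5,
    ("Player A", "assists"): 11,
}

def answer_performance_statement(semantic):
    """Turn an 'evaluate player performance' intent into a natural-language reply."""
    if semantic["intent"] != "evaluate_player_performance":
        return None
    player = semantic["attributes"]["player_name"]
    goals = PARAMETERS.get((player, "goals"))
    assists = PARAMETERS.get((player, "assists"))
    if goals is None or assists is None:
        return None  # nothing known about this player
    # Template-based natural language generation.
    return f"{player} has scored {goals} goals and made {assists} assists."

# Hypothetical semantic representation produced by the understanding step.
semantic = {
    "intent": "evaluate_player_performance",
    "attributes": {"player_name": "Player A"},
}
reply = answer_performance_statement(semantic)
```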
  • The semantic representation may instead be based on grammatical structure, in which case natural language understanding of the input essentially comprises two parts: entity extraction and grammatical structure recognition.
  • The entity extraction part is similar to the entity extraction described above for the intent-based semantic representation and is not repeated here.
  • Dependency parsing identifies the grammatical components of the sentence, such as subject, predicate and object, and attributive, adverbial and complement, and analyzes the relationships between these components.
  • The dependency relations used in dependency parsing include, for example, the subject-verb relation (SBV), the verb-object relation (VOB), the indirect-object relation (IOB), the fronted object (FOB), the pivotal construction (DBL), and the attributive relation (ATT).
  • adverbial structure (ADV)
  • verb-complement structure (CMP)
  • coordinate relation (COO)
  • preposition-object relation (POB)
  • left adjunct relation (LAD)
  • right adjunct relation (RAD)
  • independent structure (IS)
  • punctuation (WP)
  • head relation (HED)
  • The expression of grammatical structure obtained from a sentence through entity extraction is referred to as a template.
  • A classifier is first trained, using a machine learning algorithm, on a large number of templates with known grammatical structures.
  • After training, the machine learning algorithm can automatically estimate the probability that a new template belongs to each grammatical structure, and the structure with the highest probability is selected as the recognized grammatical structure.
  • New templates can be added to the training templates periodically to update the grammatical structure recognition model.
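  • As an illustration of "estimate a probability per class, pick the largest", the sketch below trains a small multinomial naive Bayes classifier on templates. The disclosure does not name a particular algorithm, so naive Bayes is a swapped-in choice; the templates and the two class labels are likewise hypothetical.

```python
import math
from collections import Counter, defaultdict

class TemplateClassifier:
    """Multinomial naive Bayes over template tokens, with add-one smoothing."""

    def __init__(self):
        self.label_counts = Counter()
        self.token_counts = defaultdict(Counter)
        self.vocab = set()

    def train(self, examples):
        for template, label in examples:
            tokens = template.split()
            self.label_counts[label] += 1
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)

    def probabilities(self, template):
        tokens = template.split()
        total = sum(self.label_counts.values())
        log_scores = {}
        for label, count in self.label_counts.items():
            denom = sum(self.token_counts[label].values()) + len(self.vocab)
            score = math.log(count / total)
            for tok in tokens:
                score += math.log((self.token_counts[label][tok] + 1) / denom)
            log_scores[label] = score
        # Normalise the log scores into probabilities that sum to 1.
        top = max(log_scores.values())
        exp = {k: math.exp(v - top) for k, v in log_scores.items()}
        norm = sum(exp.values())
        return {k: v / norm for k, v in exp.items()}

    def predict(self, template):
        probs = self.probabilities(template)
        return max(probs, key=probs.get)

# Hypothetical training templates (entities already replaced by type tags).
clf = TemplateClassifier()
clf.train([
    ("<Person> plays for which <Team>", "person_plays_for_team"),
    ("which <Team> does <Person> play for", "person_plays_for_team"),
    ("who is the coach of <Person>", "person_has_coach"),
    ("who coaches <Person>", "person_has_coach"),
])
predicted = clf.predict("which <Team> is <Person> in")
```

  • Updating the model with new templates then amounts to calling `train` again with the additional examples.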
  • FIG. 5 is a schematic diagram of a semantic representation based on a grammatical structure, in accordance with an embodiment of the present disclosure.
  • The expression in FIG. 5 is represented as a graph.
  • An expression represented as a graph is a small fragment taken from the entire knowledge graph. It includes one or more attributes, which correspond to, for example, entities, relationships between entities, parameters and their corresponding values in the knowledge graph.
  • The individual components of the input's grammatical structure can be mapped or aligned to attributes in the graph expression to provide a semantic representation of the input.
  • One or more attributes are unknown from the user's input and are therefore represented by question marks.
  • At least one of the attributes represented by the question mark can be used as the object to be queried.
  • Those skilled in the art will understand how to generate a query statement from the graph expression in order to query the knowledge graph. Put briefly, the process amounts to finding a fragment of the knowledge graph that matches the relationships among the attributes in the expression, and reading the specific content of the queried object from the fragment found.
  • The grammar-based semantic representation can take any other form, as long as it can represent the identified grammatical structure and can be used to generate a query statement that queries the knowledge graph.
  • Unlike the intent-based semantic representation, the grammar-based semantic representation does not require understanding the user's intent. Even if the user's intent is unclear, hard to represent or hard to understand, or no template for the intent has been obtained in advance, the user's input can still be processed to obtain an appropriate response.
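  • The idea of matching a graph expression with unknown ("question mark") attributes against the knowledge graph can be sketched as a small pattern matcher. This is an assumption-laden illustration: the graph is a list of triples, unknowns are written as strings beginning with "?", and the relation names `plays_for` and `coaches` are invented for the example.

```python
# Hypothetical knowledge graph fragment as (subject, relation, object) triples.
TRIPLES = [
    ("Player A", "plays_for", "Team A"),
    ("Coach A", "coaches", "Team A"),
    ("Player B", "plays_for", "Team B"),
]

def is_unknown(term):
    return term.startswith("?")

def match(patterns, triples, binding=None):
    """Yield every binding of '?' unknowns that satisfies all patterns."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        new = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if is_unknown(term):
                # Bind the unknown, or check it against an earlier binding.
                if new.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(rest, triples, new)

# "Who is the coach of Player A?" written as an expression with two unknowns.
expression = [
    ("Player A", "plays_for", "?team"),
    ("?coach", "coaches", "?team"),
]
results = list(match(expression, TRIPLES))
```

  • The value bound to `?coach` is the queried object; `?team` is an intermediate unknown that only serves to connect the two patterns.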
  • Individual attributes in an expression represented by the knowledge graph may be generated by natural language understanding of the user's input.
  • When an attribute cannot be obtained in this way, it can be populated in one or more other ways, which are described in detail below.
  • In step S201, the processor 10 receives, via the microphone, the user's spoken natural-language input "Which team does Player A play for?" and converts the input into text by speech recognition.
  • In step S202, the processor 10 performs word segmentation, part-of-speech tagging, dependency parsing, entity extraction, and the like.
  • In step S204, the processor 10 performs natural language generation based on the reply to obtain an output in natural language format.
  • The resulting output is "Player A plays for Team A."
  • In step S205, the processor 10 provides the output to the user through the display or the speaker.
  • For example, "Player A plays for Team A" is displayed on the screen and played through the speaker.
  • In step S201, the processor 10 receives, via the microphone, the user's spoken natural-language input "Who is the coach of Player A?" and converts the input into text by speech recognition.
  • In step S202, the processor 10 performs word segmentation, part-of-speech tagging, dependency parsing, entity extraction, and the like.
  • In step S204, the processor 10 performs natural language generation based on the reply to obtain an output in natural language format.
  • The resulting output is "The coach of Player A is Coach A."
  • In step S205, the processor 10 provides the output to the user through the display or the speaker.
  • For example, "The coach of Player A is Coach A" is displayed on the screen and played through the speaker.
  • a certain attribute of the semantic representation may not be directly obtained by natural language understanding of the input.
  • The user's input may not name the entity involved directly, but instead refer to it indirectly through a description.
  • For example, the user's input may contain "the coach of Team A" or "the brother of Player A". In this case, the entities that "the coach of Team A" and "the brother of Player A" refer to cannot be determined directly from the user's input.
  • Existing chatbots cannot cope with such situations and cannot provide users with appropriate answers.
  • The chatbot according to the embodiments of the present application can process the semantic representation using the reasoning ability of the knowledge graph, based on the user's input, and determine the attribute, and can therefore go on to provide an appropriate response.
  • The semantic representation may be processed with the knowledge graph, based on the user's input, to determine the attribute.
  • The attribute can be obtained directly from a triple stored in the knowledge graph, or the reasoning ability of the knowledge graph can be used to derive the attribute from the user's input through several steps of inference. An example of determining the attribute using the knowledge graph is provided below.
  • In step S201, the processor 10 receives, via the microphone, the user's spoken natural-language input "What is the relationship between the coach of Team A and the coach of Team B?" and converts the input into text by speech recognition.
  • For this input, it can be seen from FIG. 4 that the coach of "Team A" is "Coach A" and the coach of "Team B" is "Coach B", but the user's input does not directly ask "What is the relationship between Coach A and Coach B?".
  • For existing chatbots, such a question may not be answered correctly for lack of the corresponding information.
  • Here, by contrast, a correct answer can be obtained and the user provided with an appropriate output.
  • Text that has been tagged with a part of speech can be represented as follows:
  • NN conj., v., and pron. are the abbreviation of nouns, conjunctions, verbs, prepositions, and pronouns, respectively.
  • the parsed text can be represented as follows:
  • Adj., Sub., Obj. and Pred. are the English abbreviations of adjectives, subjects, objects and predicates, respectively.
  • After preprocessing the text, the processor 10 performs entity extraction on the text in step S202, extracting the nouns in the text as entities and thereby determining the objects involved in the sentence.
  • The text after entity extraction can be represented as follows:
  • <Team>, <Person> and <Relation> indicate that the entities preceding them are a team, a person and a relationship, respectively.
  • Intent recognition is then performed on the text, and the intent is identified as "query the relationship between two people".
  • The attributes associated with this intent are determined to be the names of the two people.
  • The user's input, however, does not give the names of the two coaches, so these attributes cannot be obtained by natural language understanding of the input.
  • Therefore, when the semantic representation is processed, the knowledge graph is used to obtain these attributes.
  • The processor 10 queries the knowledge graph with a Cypher statement to obtain the name of the coach of Team A.
  • the query statement is:
  • In step S203, the processor 10 queries the knowledge graph with a Cypher statement to obtain the relationship between "Coach A" and "Coach B".
  • the query statement is:
  • In step S204, the processor 10 performs natural language generation based on the reply to obtain an output in natural language format.
  • The resulting output is "The coach of Team A and the coach of Team B are friends."
  • In step S205, the processor 10 provides the output to the user through the display or the speaker.
  • For example, "The coach of Team A and the coach of Team B are friends" is displayed on the screen and played through the speaker.
  • The natural-language output may also be, for example, "Coach A and Coach B are friends", which leaves out the inference performed with the knowledge graph when filling the attributes and generates the output only from the identified intent and the determined attributes. This helps reduce the load on the system when generating the natural-language output and gives the user a direct result, improving the user experience.
  • the text of the input analyzed by the dependency syntax can be as shown in FIG.
  • the answer can be "friendship".
  • the depth and accuracy of the reply are greatly improved, thereby improving the user experience.
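  • The two-step reasoning in this example (first resolve each "coach of team X" description to a named person, then look up the relationship between the two people) can be sketched as follows. The dictionaries stand in for knowledge-graph queries and the function names are hypothetical; the actual Cypher statements of the disclosure are not reproduced here.

```python
# Stand-ins for the knowledge graph: who coaches which team, and how people relate.
COACH_OF_TEAM = {"Team A": "Coach A", "Team B": "Coach B"}
RELATIONSHIPS = {frozenset({"Coach A", "Coach B"}): "friends"}

def fill_person_attributes(team_1, team_2):
    """Step 1: resolve each 'coach of team X' description to a named entity."""
    return COACH_OF_TEAM[team_1], COACH_OF_TEAM[team_2]

def query_relationship(person_1, person_2):
    """Step 2: look up the relationship between the two resolved people."""
    # frozenset makes the lookup independent of the order of the two names.
    return RELATIONSHIPS.get(frozenset({person_1, person_2}))

def answer(team_1, team_2):
    coach_1, coach_2 = fill_person_attributes(team_1, team_2)
    relation = query_relationship(coach_1, coach_2)
    if relation is None:
        return None
    # Phrase the reply in the user's own terms (teams, not coach names).
    return f"The coach of {team_1} and the coach of {team_2} are {relation}."

reply = answer("Team A", "Team B")
```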
  • The user's input need not be a question that the user wishes to have answered; it may instead be, for example, a fact or state stated by the user.
  • For example, the user's input may be "The tactics of the coaches of Team A and Team B are very similar."
  • Because the user does not directly mention "Coach A" and "Coach B", existing chatbots may not respond correctly to such an input for lack of the corresponding information.
  • Here, by contrast, a correct answer can be obtained and the user provided with an appropriate output.
  • The reasoning ability of the knowledge graph is thus put to further use, which improves the depth and accuracy of the reply and hence the user experience.
  • In step S201, the processor 10 receives, via the microphone, the user's spoken natural-language input "Which team does Player A's brother play for?" and converts the input into text by speech recognition.
  • the text is then pre-processed by processor 10.
  • Text that has been tagged with a part of speech can be represented as follows:
  • NN, u., prep., adv. and v. are abbreviations of noun, auxiliary word, preposition, adverb and verb, respectively.
  • the parsed text can be represented as follows:
  • Adj., Sub., Obj. and Pred. are the English abbreviations of adjectives, subjects, objects and predicates, respectively.
  • After preprocessing the text, the processor 10 performs entity extraction on the text in step S202, extracting the nouns in the text as entities and thereby determining the objects involved in the sentence.
  • The text after entity extraction can be represented as follows:
  • <Team> and <Person> indicate that the entities preceding them are a team and a person, respectively.
  • Intent recognition is then performed on the text, and the intent is identified as "query the player's team".
  • Because the player's name cannot be obtained from the input directly, the knowledge graph is used to obtain this attribute when the semantic representation is processed.
  • The processor 10 queries the knowledge graph with a Cypher statement to obtain the name of the brother of Player A.
  • the query statement is:
  • The result is "Player B", so "Player B" is populated into this attribute. The resulting semantic representation includes the user's intent, "query the player's team", and the attribute "Player B".
  • In step S203, the processor 10 queries the knowledge graph with a Cypher statement to obtain the team that "Player B" plays for.
  • the query statement is:
  • REL_BELONG_TO_TEAM indicates that the relationship between Player B and Team B is "plays for the team".
  • In step S204, the processor 10 performs natural language generation based on the reply to obtain an output in natural language format.
  • The resulting output is "Player B plays for Team B."
  • In step S205, the processor 10 provides the output to the user through the display or the speaker.
  • For example, "Player B plays for Team B" is displayed on the screen and played through the speaker.
  • the text of the input analyzed by the dependency syntax can be as shown in FIG.
  • the depth and accuracy of the reply are greatly improved, thereby improving the user experience.
  • In step S201, the processor 10 receives, via the microphone, the user's spoken natural-language input "Which team in the international league has a goalkeeper who is also a player of the national team, Team B?" and converts the input into text by speech recognition.
  • the processor 10 performs word segmentation, part-of-speech tagging, dependency syntax analysis, entity extraction, and the like in step S202.
  • the text analyzed by the dependent grammar can be as shown in FIG.
  • the depth and accuracy of the reply are greatly improved, thereby improving the user experience.
  • the processor 10 obtains the output of the natural language format by performing natural language generation based on the reply in step S204.
  • The resulting output is "The goalkeeper of Team A is also a player of the national team, Team B" or "Goalkeeper C of Team A is also a player of the national team, Team B", and the like.
  • In step S205, the processor 10 provides the output to the user through the display or the speaker.
  • For example, "The goalkeeper of Team A is also a player of the national team, Team B" is displayed on the screen and played through the speaker.
  • This example can also use a semantic representation based on the user's intent; its description is omitted here.
  • The user's input need not be a question that the user wishes to have answered; it may instead be, for example, a fact or state stated by the user.
  • For example, the user's input may be "Player A's brother's performance is too poor."
  • Because the user does not directly mention "Player B" (the brother of Player A), existing chatbots may not respond correctly to such an input for lack of the corresponding information.
  • Here, by contrast, a correct answer can be obtained and the user provided with an appropriate output.
  • The processor 10 queries the knowledge graph with a Cypher statement to obtain the name of the brother of Player A, that is, "Player B".
  • The processor 10 then queries the knowledge graph with a Cypher statement for the parameters and values associated with Player B's performance. For example, Player B's goals and assists can be queried. The replies received are "2 goals" and "4 assists".
  • The processor 10 performs natural language generation based on the reply to obtain an output in natural language format. For example, the resulting output is "Player B has scored 2 goals and made 4 assists."
  • The reasoning ability of the knowledge graph is thus put to further use, which improves the depth and accuracy of the reply and hence the user experience.
  • a default value may be set for the attribute.
  • For example, the user's input may by default be taken to refer to the current season or this year's matches.
  • Likewise, the user's input may by default be taken to refer to the best known of the candidate players.
  • When a certain attribute of the semantic representation cannot be obtained directly by natural language understanding of the input, the attribute may be determined from an event occurring within a period of time before and/or after the current time point. For example, when the user's input could refer to several players, and an event associated with one of them occurs within a certain period before and/or after the current time point, the attribute is taken to be that player.
  • The period of time may be, for example, one hour, one day, one week, one month, one season or one year, and the associated event may be a match in which the player takes part, another activity in which the player takes part, another news event associated with the player, and so on.
  • For example, when one attribute corresponds to several players and a match involving one of those players is in progress when the user's input is received, the attribute is determined to be that player.
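  • A minimal sketch of this event-based rule is given below. The event list, the dates and the one-day default window are all hypothetical, and the choice to stay undecided when several candidates have recent events is an assumption of the sketch rather than something the text specifies.

```python
from datetime import datetime, timedelta

def pick_by_recent_event(candidates, events, now, window=timedelta(days=1)):
    """Return the single candidate with an event inside [now - window, now + window]."""
    hits = [
        c for c in candidates
        if any(abs(when - now) <= window for who, when in events if who == c)
    ]
    # Decide only when exactly one candidate stands out; otherwise stay undecided.
    return hits[0] if len(hits) == 1 else None

# Hypothetical data: Player A has a match around "now", Player B played weeks ago.
now = datetime(2019, 2, 1, 20, 0)
events = [
    ("Player A", datetime(2019, 2, 1, 19, 30)),
    ("Player B", datetime(2019, 1, 10, 15, 0)),
]
chosen = pick_by_recent_event(["Player A", "Player B"], events, now)
```

  • When the function returns `None`, one of the other strategies described here (context, user profile, or asking the user) would have to settle the attribute.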
  • When a certain attribute of the semantic representation cannot be obtained directly by natural language understanding of the input, the attribute may be determined from the context of the user's input. For example, if the user mentioned or discussed a team earlier in the conversation and an attribute corresponds to several teams or players, the attribute is determined to be the team discussed earlier or a player of that team.
  • When a certain attribute of the semantic representation cannot be obtained directly by natural language understanding of the input, the attribute may be determined from the user's profile. For example, a profile can be created for the user that records various parameters, such as the user's location, the teams and players the user follows, the teams and players the user dislikes, and the codes and/or nicknames the user uses to refer to teams or players.
  • Based on the user profile, when an attribute has several possible options, it can be determined which option the attribute should correspond to.
  • For example, it may be determined that the attribute should be the team of the place where the user is located, a team the user follows, or a player the user follows.
  • the team or player that the user does not like may be excluded from these options.
  • the team or player corresponding to the attribute may be determined according to the code and/or nickname commonly used by the user.
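  • The profile-based narrowing described above can be sketched as a short filter. The profile fields (`nicknames`, `disliked`, `followed`) and the order in which they are applied are assumptions for illustration; the text lists the kinds of profile data but does not fix a procedure.

```python
def choose_with_profile(options, profile):
    """Narrow candidate teams or players using a user profile (fields are hypothetical)."""
    # Expand a nickname the user habitually uses into the canonical name.
    nicknames = profile.get("nicknames", {})
    options = [nicknames.get(o, o) for o in options]
    # Drop anything the user is known to dislike.
    remaining = [o for o in options if o not in profile.get("disliked", set())]
    # Prefer entities the user follows, if any are among the candidates.
    followed = [o for o in remaining if o in profile.get("followed", set())]
    return followed or remaining

profile = {
    "followed": {"Team A"},
    "disliked": {"Team C"},
    "nicknames": {"the Reds": "Team A"},
}
narrowed = choose_with_profile(["the Reds", "Team B", "Team C"], profile)
```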
  • An inquiry about the attribute may be generated, natural language generation performed on the inquiry to obtain an output, the output provided to the user, and the user's input in response to the inquiry received.
  • In other words, the attribute can be determined by asking the user. For example, when a team or player mentioned by the user has several possible matches, the user may be asked "Are you asking about team XX?" or "Are you asking about A of team XX?", and the attribute determined from the user's answer.
  • Besides a yes/no question, the inquiry may be an alternative question; that is, the user may be asked "Are you asking about Team A, Team B or Team C?" or "Are you asking about Player A, Player B or Player C?"
  • The options offered in the question may be ordered by the probability of each option. For example, the better known a team or player is, or the more relevant it is to the question, the higher the option's probability, and higher-probability options are listed first.
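  • Ordering the options by probability and phrasing the alternative question can be sketched as below. The probabilities are made-up inputs; how they would be derived (for example from how well known a team is) is outside this sketch.

```python
def clarification_question(options, probabilities):
    """Build an alternative question, listing the most probable options first."""
    ranked = sorted(options, key=lambda o: probabilities.get(o, 0.0), reverse=True)
    if not ranked:
        return None
    if len(ranked) == 1:
        return f"Are you asking about {ranked[0]}?"
    head = ", ".join(ranked[:-1])
    return f"Are you asking about {head} or {ranked[-1]}?"

# Hypothetical probabilities for three candidate teams.
question = clarification_question(
    ["Team A", "Team B", "Team C"],
    {"Team A": 0.2, "Team B": 0.7, "Team C": 0.1},
)
```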
  • The knowledge graph may be used to generate the inquiry about the attribute. For example, similar to Example 2 above, when the user's input is "Who is the coach of Player A?", the knowledge graph can be used to find that Player A's team is "Team A", after which the user can be asked "Are you asking about Team A's coach?". Similarly to Example 6 above, when the user's input is "What is the relationship between the coach of Team A and the coach of Team B?", the knowledge graph can be used to find that Team A's coach is "Coach A" and Team B's coach is "Coach B", after which the user can be asked "Are you asking about the relationship between Coach A and Coach B?". Using the knowledge graph to generate inquiries makes communication with the user more efficient and improves the user experience.
  • Attributes of the semantic representation can also be determined in various other ways; for example, the "entity disambiguation" and "coreference resolution" techniques from the knowledge graph field can be used to determine attributes.
  • the various ways of determining the attributes of the semantic representation mentioned above may be combined with each other.
  • The knowledge graph can be used to make the final determination of an attribute from the parameters obtained in the various ways mentioned above.
  • attributes that are separately determined by the various modes mentioned above can be combined with each other to determine an attribute.
  • an inquiry can be generated based on an option of an attribute determined by the various manners mentioned above, and the attribute is determined based on the user's input for the inquiry.
  • Although the Cypher language and the SPARQL language are used as examples to describe queries against the knowledge graph, those skilled in the art will understand that any other language from the graph database field can be used to query the knowledge graph in the present disclosure.
  • Although semantic representations based on intent and on grammatical structure are discussed in the embodiments of the present disclosure, those skilled in the art will also understand that the semantic representation can take various other forms, all of which are included in the present disclosure and can be applied to its embodiments. In some embodiments of the present disclosure, these forms of semantic representation may also be used in combination. For example, user input may first be processed with the intent-based semantic representation and, when the user's intent cannot be recognized, processed with the semantic representation based on grammatical structure.
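  • The "intent first, grammatical structure as fallback" combination can be sketched as follows. Both recognizers here are toy stand-ins invented for the example (a keyword check and a whitespace split); only the control flow is the point.

```python
def understand(text, intent_recognizer, grammar_recognizer):
    """Try the intent-based representation first; fall back to the grammar-based one."""
    intent = intent_recognizer(text)
    if intent is not None:
        return {"kind": "intent", "value": intent}
    return {"kind": "grammar", "value": grammar_recognizer(text)}

# Stand-in recognizers, purely for illustration.
def toy_intent_recognizer(text):
    return "query_player_team" if "which team" in text.lower() else None

def toy_grammar_recognizer(text):
    return text.rstrip("?").split()

first = understand(
    "Which team does Player A play for?", toy_intent_recognizer, toy_grammar_recognizer
)
second = understand(
    "Player A brother coach?", toy_intent_recognizer, toy_grammar_recognizer
)
```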
  • Embodiments of the present disclosure may also incorporate various techniques known in the art (e.g., a database, a search engine, etc.) to generate a response to the user's input. These techniques are also included in the present disclosure as part of it and may be applied to its embodiments.
  • the word "exemplary” means “serving as an example, instance, or illustration” rather than as a “model” to be precisely copied. Any implementations exemplarily described herein are not necessarily to be construed as preferred or advantageous over other implementations. Furthermore, the present disclosure is not limited by any of the stated or implied theory presented in the above technical field, the background art, the invention or the specific embodiments.

Abstract

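The present disclosure relates to a computer-implemented method of conversing with a user and to a computer system. The method includes: receiving an input in natural language format from a user; performing natural language understanding on the input to generate a semantic representation; processing the semantic representation using a knowledge graph to generate a reply; performing natural language generation based on the reply to obtain an output in natural language format; and providing the output to the user. The method is used in a vertical domain. The computer system includes: an input/output interface configured to receive an input in natural language format from a user and to provide an output in natural language format to the user; a processor; and a memory configured to be coupled to the processor and to store a computer program. The processor is configured to execute the program to perform the computer-implemented method of conversing with a user of the present disclosure.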

Description

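Computer-implemented method of conversing with a user and computer system
Cross-reference to related applications
This application claims priority to Chinese Patent Application No. 201810147719.6, filed on February 13, 2018, the entire disclosure of which is incorporated herein by reference as part of this application.
Technical field
The present disclosure relates to a computer-implemented method of conversing with a user and to a computer system, and in particular to a computer-implemented method of conversing with a user in a vertical domain and to a corresponding computer system.
Background
In recent years, conversational robots and chatbots have been replacing graphical user interfaces as the new user interface (UI). With the emergence of smart speakers and similar devices, conversational robots are regarded as the next-generation user entry point that will replace mobile apps.
At present, conversational robots have made some progress in human-like interaction. With "talking like a human" as the optimization target, and through training on massive corpora and the application of deep learning algorithms, users can sometimes no longer tell that they are talking to a robot. However, owing to the limitations of the techniques employed, current conversational robots can only hold simple conversations and handle simple questions. When they meet a question they cannot answer or an expression they cannot understand, chatbots generally just call a search engine on the keywords in the user's input and return the resulting web pages directly to the user.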
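Summary
As described above, existing conversational robots can hold simple conversations with people reasonably well, but have difficulty answering complex questions properly or holding in-depth conversations with users. For example, when a user's question or statement requires one or more steps of logical reasoning to understand or answer, the conversational robot is often unable to cope. Compared with the open domain, such problems are more widespread and common for robots in vertical domains. "Open domain" means that when a user converses with the robot, the conversation is not restricted to any particular field, and the user can chat with the robot about any topic. A "vertical domain" is also called a "closed domain"; a vertical-domain conversational robot is one for which the conversation with the user is restricted to a particular field or industry. With open-domain chatbots, users' chat tends to be simple and their expectations of the chatbot are not high. With vertical-domain chatbots, because the conversation is confined to one field, users will try to hold complex conversations with the robot on in-depth topics in that field and expect more in-depth replies. Because suitable replies to such topics and conversations cannot be obtained through simple search and database queries, existing conversational robots cannot handle conversation scenarios in vertical domains either.
Therefore, there is a need for a conversational robot that can reply to users' various questions, particularly in a vertical domain. One object of the present disclosure is to provide a computer-implemented method of conversing with a user and a computer system that solve at least one of the above technical problems.
According to a first aspect of the present disclosure, there is provided a computer-implemented method of conversing with a user, including: receiving an input in natural language format from a user; performing natural language understanding on the input to generate a semantic representation; processing the semantic representation using a knowledge graph to generate a reply; performing natural language generation based on the reply to obtain an output in natural language format; and providing the output to the user. The method is used in a vertical domain.
According to a second aspect of the present disclosure, there is provided a computer system including: an input/output interface configured to receive an input in natural language format from a user and to provide an output in natural language format to the user; a processor; and a memory configured to be coupled to the processor and to store a computer program. The processor is configured to execute the program to perform the following operations: receiving an input in natural language format from a user; performing natural language understanding on the input to generate a semantic representation; processing the semantic representation using a knowledge graph to generate a reply; performing natural language generation based on the reply to obtain an output in natural language format; and providing the output to the user. The method is used in a vertical domain.
One advantage of embodiments according to the present disclosure is the ability to reply to users' complex and/or in-depth questions in a vertical domain.
Other features and advantages of the present disclosure will become clear from the following detailed description of exemplary embodiments of the present disclosure with reference to the drawings.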
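Brief description of the drawings
The accompanying drawings, which form part of the specification, illustrate embodiments of the present disclosure and, together with the description, serve to explain the principles of the present disclosure.
The present disclosure can be understood more clearly from the following detailed description with reference to the drawings, in which:
FIG. 1 is a diagram showing a computer system according to an embodiment of the present disclosure.
FIG. 2 is a flowchart of a method of conversing with a user implemented by a computer system according to an embodiment of the present disclosure.
FIG. 3 is a schematic diagram of an intent-based semantic representation according to an embodiment of the present disclosure.
FIG. 4 is a schematic diagram of a knowledge graph according to the present disclosure.
FIG. 5 is a schematic diagram of a grammar-based semantic representation according to an embodiment of the present disclosure.
FIG. 6 is a schematic diagram of dependency-parsed text according to an embodiment of the present disclosure.
FIG. 7 is a schematic diagram of dependency-parsed text according to an embodiment of the present disclosure.
FIG. 8 is a schematic diagram of dependency-parsed text according to an embodiment of the present disclosure.
FIG. 9 is a schematic diagram of dependency-parsed text according to an embodiment of the present disclosure.
FIG. 10 is a schematic diagram of dependency-parsed text according to an embodiment of the present disclosure.
FIG. 11 is a schematic diagram of an expression represented by a knowledge graph according to an embodiment of the present disclosure.
Note that in the embodiments described below, the same reference numerals are sometimes used in common across different drawings to denote the same parts or parts having the same function, and their repeated description is omitted. In this specification, similar reference numerals and letters denote similar items, so once an item is defined in one drawing, it need not be discussed further in subsequent drawings.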
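Detailed description
Various exemplary embodiments of the present disclosure will now be described in detail with reference to the drawings. It should be noted that, unless specifically stated otherwise, the relative arrangement of components and steps, the numerical expressions and the numerical values set forth in these embodiments do not limit the scope of the present disclosure.
The following description of at least one exemplary embodiment is merely illustrative and is in no way intended to limit the present disclosure, its application or its uses.
Techniques, methods and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail, but where appropriate they should be regarded as part of the granted specification.
In all examples shown and discussed herein, any specific value should be interpreted as merely exemplary and not as limiting. Other examples of the exemplary embodiments may therefore have different values.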
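FIG. 1 is a diagram showing a computer system 1 according to an embodiment of the present disclosure, which is used to implement the method of conversing with a user according to the present disclosure. In some cases, the computer system 1 may be referred to as a "conversational robot".
The computing system 1 shown in FIG. 1 is an example of a hardware device to which the present disclosure can be applied. The computing system 1 may be any of various computing devices that perform processing and/or computation, including but not limited to a workstation, a server, a desktop computer, a laptop computer, a tablet computer, a personal digital assistant, a smartphone, an in-vehicle computer, a smart speaker, or a combination thereof.
The computer system 1 may include various elements. For example, the computer system 1 includes a processor 10, a memory 20 and an input/output interface 30. The processor 10 may be any type of processor, including but not limited to a general-purpose processor and/or a special-purpose processor (such as a specially processed chip). The memory 20 may include or be connected to any storage device, for example a non-transitory storage device, and can store data. The memory 20 includes, but is not limited to, a disk drive, an optical storage device, a solid-state storage device, a floppy disk, a hard disk, a flexible disk or any other magnetic medium from which a computer can read, and to which it can record, data, instructions and/or code. Types of the memory 20 include, for example but without limitation, ROM (read-only memory), RAM (random access memory), cache memory, other memory chips and/or other storage media. The memory 20 may be coupled to the processor 10 and store any data, instructions or code. For example, the memory stores a computer program for the technical solution of the present disclosure, which can be read and executed by the processor to implement the technical solution. The input/output interface 30 is configured to receive an input in natural language format from the user and to provide an output in natural language format to the user. For example, the input/output interface 30 may include and/or be connected to any device that can receive natural-language input from the user and provide natural-language output to the user, including but not limited to a mouse, a keyboard, a touch screen, a microphone and/or a remote control, as well as a display, a speaker, a video/audio output port, a vibrator and/or a printer.
The various devices shown in FIG. 1 may be connected by, for example, a bus and constituted by local devices. Alternatively, the input/output interface 30 may be located in a remote device away from the processor 10, for example in the user's mobile device. The devices shown in FIG. 1 may also adopt a cloud computing configuration in which each function is divided among and shared by multiple devices connected through a network. For example, the processor 10 and the memory 20 may be distributed across multiple devices in a distributed deployment. In some embodiments, part of the processor 10 may be located in a remote device, for example the user's mobile device, with the mobile device carrying part of the features of the technical solution of the present disclosure. For example, the technical solution of the present disclosure includes an app executed by the mobile device. Communication between the devices may use, for example but without limitation, wired communication devices and/or wireless communication devices. Wired communication devices include, for example, modems, network cards and optical fiber communication devices. Wireless communication devices include, for example, infrared communication devices, Bluetooth devices, 802.11 devices, WiFi devices, WiMax devices and cellular communication devices.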
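FIG. 2 is a flowchart of a method of conversing with a user implemented by the computer system 1 according to an embodiment of the present disclosure.
As shown in FIG. 2, the computer-implemented method of conversing with a user according to an embodiment of the present disclosure starts at step S201, in which the processor 10 receives an input in natural language format from the user through the input/output interface 30. Natural language refers to the language people use in daily life; it is the language used for communication between people. Simple examples of natural languages include Chinese, English, German and so on. The counterpart of natural language is logical language, which is the language used for communication between humans and machines. Simple examples of logical languages include the various computer languages. The user's input may be text, speech, video and the like in natural language format. For example, the user's input may be a piece of text entered through an input method. Alternatively, it may be a piece of speech entered through a microphone, which can then be converted to text by speech recognition. Alternatively, it may be a piece of video captured through a camera and microphone, the speech in which can then be converted to text by speech recognition.
The user's input may include various types of sentences. For example, the user's input may be a question that the user wishes to have answered. In the "soccer domain", for instance, the user's input may be "Which team does Player A play for?", "Who is the coach of Player A?", "What is the relationship between the coach of Team A and the coach of Team B?", "Which team does Player A's brother play for?", "Which team in the international league has a goalkeeper who is also a player of Team B?" and so on. For such input, the chatbot according to the embodiments of the present disclosure can provide the answer to the question as its reply. The user's input may also not be a question, but instead a fact or state stated by the user, for example "Player A's performance is good", "The tactics of the coaches of Team A and Team B are very similar" or "Player A's brother's performance is too poor". For such input, the chatbot according to the embodiments of the present disclosure can provide an appropriate reaction as its reply, for example a reasonable explanation or elaboration based on the user's input. Examples of the chatbot's replies are described in detail below in connection with the above examples. Those skilled in the art will understand that the user's input is not limited to the above examples and may include various other types of sentences.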
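After that, in step S202, the processor 10 performs natural language understanding on the input to generate a semantic representation. Natural language understanding (NLU) refers to representing the meaning of natural language in a way that a computer can understand and process; it is part of natural language processing (NLP). Put simply, the purpose of natural language understanding is to obtain a semantic representation of the natural language that lets the computer grasp what the user has in mind.
The semantic representation can take various forms. In the embodiments of the present disclosure, a semantic representation expressed as an intent and a semantic representation expressed as a grammatical structure are provided as examples.
According to one embodiment of the present disclosure, the semantic representation is based on the user's intent, and natural language understanding of the input essentially comprises two parts: entity extraction and intent recognition. Specifically, when the text of the user's input (usually one sentence) is received, the text can first be preprocessed. For example, the sentence can be split into independent words or phrases by word segmentation, after which part-of-speech tagging determines and marks the part of speech of each word. Syntactic analysis is then performed on the basis of the tagged parts of speech to analyze the grammatical function of the words in the sentence, thereby determining the role of each word in the sentence and the structure of the sentence. After the text has been preprocessed, entity extraction is performed on the sentence, extracting the nouns in the sentence as entities and thereby determining the objects involved in the sentence. Intent recognition is then performed on the sentence on the basis of the extracted entities to determine the user's intent.
As is known in the art, entity extraction uses methods based on, for example, word vectors, is trained by machine learning on large corpora, and allows model performance to be optimized by adding entities manually. In the embodiments of the present disclosure, the expression obtained from a sentence through entity extraction is called a template. First, a classifier is trained, using for example a machine learning algorithm, on a large number of templates with known intents. After training, when the user's input is received and a new template is formed from it, the machine learning algorithm can automatically estimate the probability that the template belongs to each intent, and the intent with the highest probability is selected as the recognized intent. New templates can be added to the training templates periodically to update the intent recognition model.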
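After this, a semantic representation expressing the user's intent can be generated from the extracted entities and the recognized intent. FIG. 3 is a schematic diagram of an intent-based semantic representation according to an embodiment of the present disclosure. In one embodiment of the present disclosure, the semantic representation can be expressed as the user's intent together with one or more attributes related to that intent. As discussed above, the user's intent may be a question the user expects to have answered. For example, the user's intent may be "query the team a player belongs to", in which case the corresponding attributes include at least "player name". The user's intent may be "query the coach of a team", in which case the corresponding attributes include at least "team name". The user's intent may be "query the relationship between two people", in which case the corresponding attributes include at least "name of person 1" and "name of person 2". The attributes may also include the time period the query refers to. For example, when the intent is "query the team a player belongs to", the attributes may include "when the player belonged to the team"; when the intent is "query the coach of a team", the attributes may include "when the coach coached the team"; and when the intent is "query the relationship between two people", the attributes may include "the time of the relationship". In some embodiments of the present disclosure, the user's intent may be the statement of a fact or state. For example, the user's intent may be "evaluate a player", "evaluate a team" or "evaluate a coach", with corresponding attributes such as "player name", "team name" and "coach name". Similarly, the attributes may include the time period the evaluation refers to.
Those skilled in the art will understand that the intents and corresponding attributes given above are exemplary; intents and attributes are not limited to these examples and may include various others.
In the embodiments of the present disclosure, both the user's intent and the attributes can be produced from the entities obtained by natural language understanding of the user's input. However, when an attribute cannot be obtained from those entities, it can be filled in one or more other ways, which are described in detail below.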
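One way to picture the "intent plus attributes" representation, and to detect which attributes still need to be filled by other means, is the small data structure below. It is a sketch only: the intent names, attribute keys and the `REQUIRED` table are assumptions chosen to mirror the examples in the text, not identifiers taken from the disclosure.

```python
from dataclasses import dataclass, field

# Attributes each intent requires; intent and attribute names are illustrative.
REQUIRED = {
    "query_player_team": ["player_name"],
    "query_team_coach": ["team_name"],
    "query_relationship": ["person_1_name", "person_2_name"],
}

@dataclass
class SemanticRepresentation:
    intent: str
    attributes: dict = field(default_factory=dict)

    def missing(self):
        """Attributes the intent needs that language understanding did not supply."""
        return [a for a in REQUIRED.get(self.intent, []) if a not in self.attributes]

# Only one of the two names could be extracted from the user's input.
rep = SemanticRepresentation("query_relationship", {"person_1_name": "Coach A"})
still_needed = rep.missing()
```

Each name returned by `missing()` marks an attribute that would have to be filled by the knowledge graph, a default, the context, the user profile, or a question to the user.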
除了以上内容之外,本领域技术人员已知的关于自然语言理解的所有内容都可以被结合在本公开中,被包括在本公开所讨论的范围内。
之后,在步骤S203,由处理器10利用知识图谱对语义表示进行处理,以生成答复。知识图谱(Knowledge graph)的概念对于本领域技术人员来说是已知的。知识图谱是一种结构化的语义知识库,用于以符号形式描述物理世界中的概念及其相互关系,其基本组成单位例如是“实体-关系-实体”三元组以及“实体-参数-值”三元组,实体间通过关系相互连接,构成网状的知识结构。也就是说,实体(或概念、事件等)构成了知识图谱中的节点,而实体间的各种关系构成了网络中的连线。相比于传统的信息检索方式,知识图谱的特点是具有推理能力(即,能够通过推理来实现信息的检索)以及能够以图形化方式展示经过分类整理的结构化知识。
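Figure 4 is a schematic diagram of a knowledge graph according to the present disclosure. As shown in Figure 4, the entities (nodes) include "team", "player", "coach", "international league", "national team" and so on, and the relations between entities include "plays for", "coaches", "brother", "good friend" and so on. The graph also includes parameters of entities, such as "nationality", "goals" and "assists", together with their values. The entities, relations and parameters shown in Figure 4 are illustrative; those skilled in the art can conceive of various others, all of which fall within the scope of the present disclosure. For clarity, Figure 4 shows only some entities, relations and parameters; further entities may be added, any pair of entities may have various relations, and each entity may have various parameters.
The knowledge graph gives an intuitive picture of which players play for each team and which coach directs it, the relations between players, the relations between coaches, and the parameters and values of players and teams. It presents knowledge about the teams graphically, and, by inference, one entity can be determined from another by following the relations that connect the entities.
Those skilled in the art will appreciate that a knowledge graph according to the embodiments of the present disclosure may include more nodes, relations and parameters than Figure 4 shows, and that the relations between nodes may be more complex. Two nodes are not limited to a single relation and may be connected by several different relations. A time dimension may also be added to the knowledge graph of Figure 4 to represent the different relations and parameters of nodes in different time periods. A knowledge graph according to the embodiments may therefore be very large and complex, with a structure of one, two, three or even more dimensions.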
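Construction of the knowledge graph likewise relies on extracting "entity-relation-entity" and "entity-parameter-value" triples. For example, knowledge elements may be extracted from large amounts of raw data (books, newspapers, magazines, web pages, databases of various kinds) by automatic means (for example, deep neural networks) or semi-automatic means (for example, automatic means with human intervention); triples are then extracted and stored in the knowledge graph. In some cases further knowledge fusion is needed, in which entity disambiguation and coreference resolution are used to merge differently named mentions of the same entity.
The knowledge graph may be built with two approaches, top-down and bottom-up. For example, important nodes such as players and teams use the top-down approach: ontology information is extracted from high-quality data sources such as Wikipedia and added to the knowledge base. Other, less important information uses the bottom-up approach: it is extracted from publicly available data sets, for example on the Internet, and the information with higher confidence is selected and added to the knowledge graph. The resulting knowledge graph may be stored, for example, using the Resource Description Framework (RDF) or as a property graph.
For brevity, the various techniques for building knowledge graphs that are known to those skilled in the art are omitted here; all of them are included in the present disclosure and may be applied to its embodiments.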
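In some embodiments of the present disclosure, in order to process the semantic representation with the knowledge graph in step S203, a query statement may be generated from the semantic representation and used to query the knowledge graph to produce the reply. The query statement may be written, for example, in Cypher or SPARQL, both commonly used in the graph database field. By expressing the generated semantic representation as a query statement and querying the knowledge graph with it, the reply is obtained by following nodes and relations in the graph. Processing the semantic representation with a knowledge graph makes it possible to use the graph's reasoning ability to answer complex questions and/or questions of some depth. Specific examples of processing a semantic representation with the knowledge graph are given below.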
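Generating a query statement from a semantic representation can be sketched as a lookup of a per-intent query template whose `$name` placeholders are supplied as query parameters. The labels and relationship types (`PERSON`, `TEAM`, `REL_BELONG_TO_TEAM`, `REL_Coach`) follow the examples in this description; the intent identifiers, the `QUERY_TEMPLATES` table and `build_query` are hypothetical, and passing attributes as parameters (not by string concatenation) is a design choice of this sketch.

```python
import re

# Hypothetical per-intent Cypher templates using $parameters.
QUERY_TEMPLATES = {
    "query_player_team": (
        "MATCH (:PERSON {name: $player_name})-[:REL_BELONG_TO_TEAM]->(team:TEAM) "
        "RETURN team.team_name"
    ),
    "query_player_coach": (
        "MATCH (:PERSON {name: $player_name})-[:REL_BELONG_TO_TEAM]->(:TEAM)"
        "<-[:REL_Coach]-(coach:PERSON) RETURN coach.name"
    ),
}

def build_query(intent, attributes):
    """Return (query_text, parameters) for an intent and its filled attributes."""
    template = QUERY_TEMPLATES[intent]
    needed = set(re.findall(r"\$(\w+)", template))
    missing = needed - set(attributes)
    if missing:
        raise ValueError(f"unfilled attributes: {sorted(missing)}")
    return template, {name: attributes[name] for name in needed}
```

The returned pair can then be handed to whatever graph database client is in use; the client call itself is outside this sketch.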
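Next, in step S204, the processor 10 performs natural language generation on the reply to obtain output in natural language format. Natural language generation (NLG) refers to producing natural language from the form in which a computer represents and processes information; it is also a part of natural language processing (NLP). Put simply, the purpose of NLG is to convert the language used by the computer into the natural language used by humans. Those skilled in the art are familiar with the principles and common techniques of NLG. Compared with natural language understanding, NLG can be simpler; for example, the processor 10 may only need to pass the reply on to the user. A detailed explanation of NLG is therefore omitted here, and everything known to those skilled in the art about NLG may be incorporated into, and falls within the scope of, the present disclosure. The output in natural language format may be text composed from the reply, speech produced by speech synthesis, or video generated by animation software or the like.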
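A very simple form of natural language generation is template filling, sketched below under the assumption that each intent has one response template; when no template exists, the reply is passed through unchanged, matching the remark that the reply may simply be handed to the user. `RESPONSE_TEMPLATES`, `generate` and the intent identifiers are hypothetical names.

```python
# Hypothetical response templates; the wording follows the sample outputs in this description.
RESPONSE_TEMPLATES = {
    "query_player_team": "{player_name}在{answer}踢球",
    "query_player_coach": "{player_name}的教练为{answer}",
}

def generate(intent, attributes, answer):
    """Turn a reply from the knowledge graph into a natural language sentence."""
    template = RESPONSE_TEMPLATES.get(intent)
    if template is None:
        return str(answer)  # fall back to returning the bare reply
    return template.format(answer=answer, **attributes)
```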
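Finally, in step S205, the output is provided to the user through the input/output interface 30. For example, text may be shown on a display device, speech played through a speaker, or video presented through a display and speaker.
According to the embodiments of the present disclosure, the method shown in Figure 2 is used in a vertical domain. Because the dialogue is confined to one vertical domain, the situation in which the same noun refers to different entities in different domains is avoided, which greatly reduces the difficulty of "entity disambiguation" and "coreference resolution" during entity extraction. It also greatly reduces the difficulty of constructing the knowledge graph and the size of the graph, as well as the difficulty of recognizing intents and attributes in natural language processing. Applied in a vertical domain, the method can therefore answer complex questions. Moreover, because a knowledge graph is used to process the semantic representation and generate the reply, the graph's reasoning ability can be brought to bear on the user's input, so that deeper questions can be answered and the user can hold a focused, in-depth conversation with the bot on topics in that domain. These benefits will become clearer from the specific examples below. In the present disclosure, the "football domain", one of the "single-sport domains", is used as the example of a "vertical domain". Those skilled in the art will understand, however, that the technical solution may be applied to other single-sport domains, such as basketball, volleyball, rugby, badminton and table tennis, and also to other vertical domains, such as show business, history and geography.
Next, specific examples of the method of dialogue with a user according to the embodiments of the present disclosure are described with reference to Figure 4.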
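Example 1:
In step S201, the processor 10 receives through a microphone the user's spoken natural language input "球员A在哪个球队效力?" ("Which team does Player A play for?") and converts it to text by speech recognition. Even if other sports domains (for example, rugby or volleyball) also have players named "Player A", this embodiment is applied to a vertical domain (football), so Player A and the corresponding team will not be mistaken for a player and team from another sport. Compared with an open-domain chatbot, the embodiment thus reduces the cases in which the same noun points to different entities, which lowers the complexity of semantic recognition and allows the method to answer complex questions in the vertical domain.
The processor 10 then preprocesses the text. For example, the text is first segmented into individual words or phrases and each word is tagged with its part of speech. The tagged text may be represented as follows:
球员A/NN 在/prep. 哪个/r. 球队/NN 效力/v. ?
Here NN, prep., r. and v. are abbreviations for noun, preposition, pronoun and verb, respectively.
Syntactic analysis is then performed on the tagged parts of speech to determine the role of each word in the sentence and the structure of the sentence. The parsed text may be represented as follows:
球员A/Sub. 在哪个球队/Obj. 效力/Pred. ?
Here Sub., Obj. and Pred. are abbreviations for subject, object and predicate, respectively.
After preprocessing, the processor 10 performs entity extraction on the text in step S202, extracting the nouns in the text as entities and thereby identifying the objects the sentence refers to. The text after entity extraction may be represented as follows:
球员A<Person>在哪个球队<Team>效力?
Here <Person> and <Team> indicate that the preceding entity is a person and a team, respectively.
At the same time, intent recognition is performed on the text, and the recognized intent is "query the team a player belongs to".
The attribute related to this intent is then determined to be "name", and "Player A" is filled into it. The resulting semantic representation comprises the user's intent "query the team a player belongs to" and the attribute "Player A".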
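Next, in step S203, the processor 10 queries the knowledge graph with a Cypher statement to obtain the team Player A plays for. For example, the query statement is:
MATCH (:PERSON {name: "球员A"})-[:REL_BELONG_TO_TEAM]->(team:TEAM)
RETURN team.team_name
Here REL_BELONG_TO_TEAM denotes the relation "plays for this team" between Player A and Team A. The query returns "球队A" (Team A).
The reply obtained is therefore "Team A".
This can also be seen directly from the relevant part of Figure 4:
Figure PCTCN2019074666-appb-000001
Then, in step S204, the processor 10 performs natural language generation on the reply to obtain output in natural language format, for example "球员A在球队A踢球" ("Player A plays for Team A").
Finally, in step S205, the processor 10 provides the output to the user through a display or speaker, for example by showing "Player A plays for Team A" on the screen or playing it through the speaker.
The above shows an example of querying the knowledge graph in Cypher; those skilled in the art will appreciate that the query statement is not limited to Cypher and may also be written, for example, in SPARQL.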
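When SPARQL is used, the statement for querying the knowledge graph is:
PREFIX football: <http://example.com/football/>
SELECT DISTINCT ?x WHERE {
  ?player football:name "球员A" ;
          football:team ?team .
  ?team football:clubName ?x .
}
The result is the same as that of the Cypher query: "Team A".
In the above example, one round of dialogue with the user in the vertical domain of football is completed, and a reply in natural language format is given to the user's question. Compared with directly returning a web page of search results, this greatly improves the accuracy of the reply and the user experience.
Those skilled in the art will appreciate that the above is only a simple example of the knowledge-graph-based vertical-domain chatbot of the present application; its features and advantages will become more apparent from the more complex examples below.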
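Example 2:
In step S201, the processor 10 receives through a microphone the user's spoken natural language input "球员A的教练是谁?" ("Who is Player A's coach?") and converts it to text by speech recognition. As Figure 4 shows, Player A plays for Team A and Team A is coached by Coach A, but no edge directly connects Player A and Coach A. In other words, the data stored in the system do not record a relation between the two. An existing chatbot, lacking this information, may be unable to answer such a question correctly. In the embodiment of the present disclosure, however, the correct reply can be obtained by using the knowledge graph as described below, so that suitable output is provided to the user.
The processor 10 then preprocesses the text. The tagged text may be represented as follows:
球员A/NN 的/u. 教练/NN 是/v. 谁/pron. ?
Here NN, u., v. and pron. are abbreviations for noun, particle, verb and pronoun, respectively.
Syntactic analysis is then performed on the tagged parts of speech to determine the role of each word in the sentence and the structure of the sentence. The parsed text may be represented as follows:
球员A的/Adj. 教练/Sub. 是/Pred. 谁/Obj. ?
Here Adj., Sub., Obj. and Pred. are abbreviations for adjectival modifier, subject, object and predicate, respectively.
After preprocessing, the processor 10 performs entity extraction on the text in step S202, extracting the nouns in the text as entities and thereby identifying the objects the sentence refers to. The text after entity extraction may be represented as follows:
[球员A<Person>的教练]<Person>是谁<Person>?
Here <Person> indicates that the preceding entity is a person.
At the same time, intent recognition is performed on the text, and the recognized intent is "query a player's coach".
The attribute related to this intent is then determined to be "player ID", and "Player A" is filled into it. The resulting semantic representation comprises the user's intent "query a player's coach" and the attribute "Player A".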
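Next, in step S203, the processor 10 queries the knowledge graph with a Cypher statement to obtain the name of Player A's coach. For example, the query statement is:
MATCH (:PERSON {name: "球员A"})-[:REL_BELONG_TO_TEAM]->(team:TEAM)<-[:REL_Coach]-(coach:PERSON)
RETURN coach.name
The MATCH clause first finds the team Player A plays for and then the coach of that team. REL_BELONG_TO_TEAM denotes the relation "plays for this team" between Player A and Team A, and REL_Coach denotes the relation "the team is coached by this coach". As the query and Figure 4 show, the query embeds one inference step over the knowledge graph, which is what allows this embodiment to reach the final reply. The query returns "教练A" (Coach A).
The reply obtained is therefore "Coach A".
This can also be seen directly from the relevant part of Figure 4:
Figure PCTCN2019074666-appb-000002
Then, in step S204, the processor 10 performs natural language generation on the reply to obtain output in natural language format, for example "球员A的教练为教练A" ("Player A's coach is Coach A").
Finally, in step S205, the processor 10 provides the output to the user through a display or speaker, for example by showing "Player A's coach is Coach A" on the screen or playing it through the speaker.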
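When SPARQL is used, the statement for querying the knowledge graph is:
PREFIX football: <http://example.com/football/>
SELECT DISTINCT ?x WHERE {
  ?player football:name "球员A" ;
          football:team ?team .
  ?coach football:coach ?team ;
         football:name ?x .
}
The result is the same as that of the Cypher query: "Coach A".
In the above example, generating the reply draws on the reasoning ability of the knowledge graph beyond what Example 1 required, which greatly improves the depth and accuracy of the reply and thereby the user experience.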
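In addition, after the reply is obtained, a new triple may be created in the knowledge graph so that the newly derived relation is stored. For example, the following triple may be added to the knowledge graph:
Figure PCTCN2019074666-appb-000003
The Cypher statement for writing this triple may be, for example:
MATCH (p:PERSON {name: "球员A"}), (c:PERSON {name: "教练A"})
MERGE (p)<-[:REL_Coach]-(c)
To improve the accuracy of the data in the knowledge graph, the triple may be added, for example, only after the user is asked "Was this reply helpful?" and answers affirmatively. In Figure 4, the edge of the newly added triple is drawn as a dashed line to distinguish it from the original triples.
Adding new triples on the basis of dialogue with users allows the content of the knowledge graph to be continually supplemented, refined and extended with the users' help, which benefits management of the knowledge graph.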
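The two-hop inference of Example 2 and the optional write-back of the derived relation can be sketched over an in-memory set of triples. The set-of-tuples storage, the function names and the `confirmed` flag (standing for the user's affirmative answer) are assumptions of this sketch.

```python
def coach_of_player(player, triples):
    """Infer a player's coach via the team: player -> team <- coach."""
    for s, p, team in triples:
        if s == player and p == "REL_BELONG_TO_TEAM":
            for coach, p2, t in triples:
                if p2 == "REL_Coach" and t == team:
                    return coach
    return None

def remember(player, coach, confirmed, triples):
    """Store the derived coach -> player edge only if the user confirmed the reply."""
    if confirmed and coach is not None:
        triples.add((coach, "REL_Coach", player))
    return triples
```

Because the graph is a set, storing the same derived triple twice has no effect, which mirrors the intent of not duplicating existing knowledge.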
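Example 3:
As mentioned above, the user's input need not be a question the user wants answered; it may be, for example, a fact or state asserted by the user.
For example, in step S201, the processor 10 receives through a microphone the user's spoken natural language input "球员A的表现不错" ("Player A is performing well") and converts it to text by speech recognition. The processor 10 then preprocesses the text. For example, the text is first segmented into individual words or phrases and each word is tagged with its part of speech. The tagged text may be represented as follows:
球员A/NN 的/u. 表现/NN 不错/adj.
Here NN, u. and adj. are abbreviations for noun, particle and adjective, respectively.
Syntactic analysis is then performed on the tagged parts of speech to determine the role of each word in the sentence and the structure of the sentence. The parsed text may be represented as follows:
球员A的/Adj. 表现/Sub. 不错/Pred.
Here Adj., Sub. and Pred. are abbreviations for adjectival modifier, subject and predicate, respectively.
After preprocessing, the processor 10 performs entity extraction on the text in step S202, extracting the nouns in the text as entities and thereby identifying the objects the sentence refers to. The text after entity extraction may be represented as follows:
球员A<Person>的表现不错
Here <Person> indicates that the preceding entity is a person.
At the same time, intent recognition is performed on the text, and the recognized intent is "evaluate a player's performance".
The attribute related to this intent is then determined to be "player ID", and "Player A" is filled into it. The resulting semantic representation comprises the user's intent "evaluate a player's performance" and the attribute "Player A".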
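Next, in step S203, the processor 10 queries the knowledge graph with a Cypher statement for the parameters and values associated with Player A's performance, for example the player's goals and assists. The query statement is:
MATCH (p:PERSON {name: "球员A"})
RETURN p.goal AS goal, p.assist AS assist
Here p.goal and p.assist denote Player A's number of goals and number of assists, whose values the query retrieves. The query returns 5 and 11.
The reply obtained is therefore "5 goals" and "11 assists".
This can also be seen directly from the relevant part of Figure 4:
Figure PCTCN2019074666-appb-000004
Figure PCTCN2019074666-appb-000005
Then, in step S204, the processor 10 performs natural language generation on the reply to obtain output in natural language format, for example "球员A已经取得了5个进球和11个助攻的成绩" ("Player A has scored 5 goals and made 11 assists").
Finally, in step S205, the processor 10 provides the output to the user through a display or speaker, for example by showing "Player A has scored 5 goals and made 11 assists" on the screen or playing it through the speaker.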
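When SPARQL is used, the statement for querying the knowledge graph is:
PREFIX football: <http://example.com/football/>
SELECT DISTINCT ?n0 ?n1 WHERE {
  ?player football:name "球员A" ;
          football:goal ?n0 ;
          football:assist ?n1 .
}
The result is the same as that of the Cypher query: "5 goals" and "11 assists".
In the above example, one round of dialogue with the user in the vertical domain of football is completed, and a reply in natural language format is given to a fact or state asserted by the user. Compared with directly returning a web page of search results, this greatly improves the accuracy of the reply and the user experience.
Moreover, by replying to facts or states asserted by the user, the chatbot can respond not only to the user's questions but also to topics the user raises that are not questions, which improves the user's experience.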
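According to another embodiment of the present disclosure, the semantic representation may be based on grammatical structure, and natural language understanding of the input essentially comprises two parts: entity extraction and grammatical structure recognition. Entity extraction is similar to that described above for the intent-based semantic representation and is not repeated here. After the input has been segmented, tagged with parts of speech, analyzed for sentence constituents and subjected to entity extraction, the grammatical structure of the input is recognized on the basis of the extracted entities, and the semantic representation is generated from the extracted entities and the recognized grammatical structure.
Specifically, after segmentation, part-of-speech tagging, constituent analysis and entity extraction, the constituents of the sentence may be identified as standing in different dependency relations, for example by dependency parsing, which reveals the grammatical structure. Intuitively, dependency parsing identifies grammatical constituents such as subject, predicate and object, and attributive, adverbial and complement, and analyzes the relations among them. The dependency relations include, for example: subject-verb (SBV), verb-object (VOB), indirect object (IOB), fronted object (FOB), pivotal construction (DBL), attribute (ATT), adverbial (ADV), complement (CMP), coordinate (COO), preposition-object (POB), left adjunct (LAD), right adjunct (RAD), independent structure (IS), punctuation (WP) and head (HED). Such dependency relations express the grammatical structure of the sentence.
In the embodiments of the present disclosure, the expression of the grammatical structure obtained by applying entity extraction to a sentence is called a template. First, a classifier is trained with a machine learning algorithm on a large number of templates with known grammatical structures. After training, when a user's input is received and a new template is formed from it, a machine learning algorithm, for example, can automatically estimate the probability that the template belongs to each grammatical structure, and the structure with the highest probability is selected as the recognized grammatical structure. New templates may be added to the training templates periodically to update the grammatical structure recognition model.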
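After that, a semantic representation expressing the user's meaning may be generated from the extracted entities and the recognized grammatical structure. In one embodiment of the present disclosure, the semantic representation may be an expression corresponding to the recognized grammatical structure. Figure 5 is a schematic diagram of a grammatical-structure-based semantic representation according to an embodiment of the present disclosure. For clarity, the expressions in Figure 5 are drawn as graphs. Put simply, an expression drawn as a graph is a small fragment cut from the whole knowledge graph; it comprises one or more attributes, which correspond to items in the knowledge graph such as entities, relations between entities, values and their parameters. Each constituent of the input's grammatical structure can be located at, placed in or aligned to one attribute of the expression, which yields the semantic representation of the input. One or more attributes are unknown from the user's input and are therefore marked with a question mark; at least one of them can serve as the object of the query. Those skilled in the art will understand how to generate a query statement from such an expression in order to query the knowledge graph. Figuratively, the process resembles searching the knowledge graph for a fragment whose relations match the attributes of the expression and reading the queried object from the fragment found. Those skilled in the art will appreciate that a grammatical-structure-based semantic representation may take any other form, provided it can express the recognized grammatical structure and can be used to produce a query statement for the knowledge graph.
Compared with the intent-based semantic representation, the grammatical-structure-based semantic representation does not require the user's intent to be understood. Even when the intent is unclear, hard to express or hard to understand, or no template for the intent has been obtained in advance, the user's input can still be processed to obtain a suitable reply.
In the embodiments of the present disclosure, the attributes of an expression represented as a knowledge graph may be produced by natural language understanding of the user's input. However, when an attribute cannot be obtained that way, it may be filled in one or more other ways, which are described in detail below.
As can be seen, Figure 5 illustrates expressions represented as knowledge graphs using the simplest "entity-relation-entity" and "entity-parameter-value" triples. Those skilled in the art will appreciate that such expressions may take various more complex forms; further forms are given below with specific examples.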
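The idea of "finding a fragment of the knowledge graph that fits an expression with unknown attributes" corresponds to matching triple patterns that contain variables. The sketch below marks unknowns with a leading `?`, as in the figures, and returns every consistent assignment; the tuple storage, relation names and the `match` helper are assumptions for illustration.

```python
def is_var(term):
    return isinstance(term, str) and term.startswith("?")

def match(patterns, triples, binding=None):
    """Yield each variable binding under which all patterns occur in `triples`."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        new = dict(binding)
        for term, value in zip(first, triple):
            if is_var(term):
                term = new.get(term, term)  # substitute an already bound variable
            if is_var(term):
                new[term] = value           # bind a fresh variable
            elif term != value:
                break                       # constant or bound value disagrees
        else:
            yield from match(rest, triples, new)

KG = [
    ("球员A", "效力", "球队A"), ("教练A", "执教", "球队A"),
    ("球员B", "效力", "球队B"), ("教练B", "执教", "球队B"),
]
# Expression for "Who is Player A's coach?": two patterns sharing the unknown ?team.
PATTERN = [("球员A", "效力", "?team"), ("?coach", "执教", "?team")]
```

Here `?coach` is the queried object, while `?team` is an unknown that is resolved along the way.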
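Example 4:
In step S201, the processor 10 receives through a microphone the user's spoken natural language input "球员A在哪个球队效力?" ("Which team does Player A play for?") and converts it to text by speech recognition.
Then, in step S202, the processor 10 performs segmentation, part-of-speech tagging, dependency parsing, entity extraction and so on for the text.
The text after dependency parsing may be as shown in Figure 6.
The corresponding expression represented as a knowledge graph is:
Figure PCTCN2019074666-appb-000006
Then, on the basis of this expression, a query statement is generated in Cypher or SPARQL to query the knowledge graph, and the reply "Team A" is obtained.
Then, in step S204, the processor 10 performs natural language generation on the reply to obtain output in natural language format, for example "球员A在球队A踢球" ("Player A plays for Team A").
Finally, in step S205, the processor 10 provides the output to the user through a display or speaker, for example by showing "Player A plays for Team A" on the screen or playing it through the speaker.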
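Example 5:
In step S201, the processor 10 receives through a microphone the user's spoken natural language input "球员A的教练是谁?" ("Who is Player A's coach?") and converts it to text by speech recognition.
Then, in step S202, the processor 10 performs segmentation, part-of-speech tagging, dependency parsing, entity extraction and so on for the text.
The text after dependency parsing may be as shown in Figure 7.
The corresponding expression represented as a knowledge graph is:
Figure PCTCN2019074666-appb-000007
Then, on the basis of this expression, a query statement is generated in Cypher or SPARQL to query the knowledge graph, and the reply "Coach A" is obtained.
Then, in step S204, the processor 10 performs natural language generation on the reply to obtain output in natural language format, for example "球员A的教练为教练A" ("Player A's coach is Coach A").
Finally, in step S205, the processor 10 provides the output to the user through a display or speaker, for example by showing "Player A's coach is Coach A" on the screen or playing it through the speaker.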
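In the embodiments of the present disclosure, for both the intent-based and the grammatical-structure-based semantic representation, there are cases in which an attribute of the semantic representation cannot be obtained directly by natural language understanding of the input. In some cases the input does not name the entity concerned but introduces it indirectly by description, for example "the coach of Team A" or "Player A's brother". The entity these descriptions refer to cannot be determined from the input alone. In other cases the entity is named in the input, but because players, coaches and teams often share names, the same entity may go by different abbreviations or aliases, and one foreign name may have several Chinese translations, it may still be impossible to determine which entity in the knowledge graph is meant. In yet other cases the sentence received from the user is incomplete or unclear, so that the user's meaning cannot be fully understood and some attributes of the semantic representation cannot be obtained.
Existing chatbots clearly cannot cope with such cases and cannot give the user a suitable reply. The chatbot according to the embodiments of the present application, however, can use the reasoning ability of the knowledge graph to process the semantic representation on the basis of the user's input and determine the attribute, and can then provide an appropriate reply.
Therefore, when an attribute of the semantic representation cannot be obtained directly by natural language understanding of the input, it must be obtained in some other way. The various ways of obtaining attributes according to the embodiments of the present disclosure are described below.
According to some embodiments of the present disclosure, when an attribute of the semantic representation cannot be obtained directly by natural language understanding of the input, the semantic representation may be processed with the knowledge graph, on the basis of the user's input, to determine the attribute. For example, the attribute may be read directly from a triple stored in the knowledge graph, or derived from the user's input through several inference steps using the graph's reasoning ability. Examples of determining an attribute with the knowledge graph follow.
The ways of determining attributes in the present disclosure apply both to the intent-based semantic representation and to the grammatical-structure-based semantic representation.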
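Example 6:
In step S201, the processor 10 receives through a microphone the user's spoken natural language input "球队A的教练和球队B的教练是什么关系?" ("What is the relationship between the coach of Team A and the coach of Team B?") and converts it to text by speech recognition. As Figure 4 shows, the coach of Team A is Coach A and the coach of Team B is Coach B, but the user's input does not directly ask "What is the relationship between Coach A and Coach B?". An existing chatbot, lacking this information, may be unable to answer such a question correctly. In the embodiment of the present disclosure, however, the correct reply can be obtained by using the knowledge graph as described below, so that suitable output is provided to the user.
The processor 10 then preprocesses the text. The tagged text may be represented as follows:
球队A/NN 的/u. 教练/NN 和/conj. 球队B/NN 的/u. 教练/NN 是/v. 什么/pron. 关系/NN ?
Here NN, u., conj., v. and pron. are abbreviations for noun, particle, conjunction, verb and pronoun, respectively.
Syntactic analysis is then performed on the tagged parts of speech to determine the role of each word in the sentence and the structure of the sentence. The parsed text may be represented as follows:
球队A的/Adj. 教练/Sub. 和球队B的/Adj. 教练/Sub. 是/Pred. 什么关系/Obj. ?
Here Adj., Sub., Obj. and Pred. are abbreviations for adjectival modifier, subject, object and predicate, respectively.
After preprocessing, the processor 10 performs entity extraction on the text in step S202, extracting the nouns in the text as entities and thereby identifying the objects the sentence refers to. The text after entity extraction may be represented as follows:
[球队A<Team>的教练]<Person>和[球队B<Team>的教练]<Person>是什么关系<Relation>?
Here <Team>, <Person> and <Relation> indicate that the preceding entity is a team, a person and a relation, respectively.
At the same time, intent recognition is performed on the text, and the recognized intent is "query the relationship between two people".
The attributes related to this intent are then determined to be the "names" of the two people. In this example the user's input gives only "the coach of Team A" and "the coach of Team B", not the names of the two coaches, so the "name" attributes cannot be filled by natural language understanding of the input.
To fill these attributes, in this example the knowledge graph is used to obtain them while the semantic representation is being processed.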
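Specifically, the processor 10 queries the knowledge graph with a Cypher statement to obtain the name of the coach of Team A. For example, the query statement is:
MATCH (:TEAM {name: "球队A"})<-[:REL_Coach]-(person:PERSON)
RETURN person.name
This query looks up the coach of Team A in the knowledge graph, where REL_Coach denotes the coaching relation. The query returns "教练A" (Coach A).
"Coach A" is therefore filled into the "name" attribute of the first coach. Similarly, the coach of Team B is looked up in the knowledge graph, the result "Coach B" is obtained, and it is filled into the "name" attribute of the second coach. The final semantic representation comprises the user's intent "query the relationship between two people" and the attributes "Coach A" and "Coach B".
Next, in step S203, the processor 10 queries the knowledge graph with a Cypher statement to obtain the relationship between Coach A and Coach B. For example, the query statement is:
MATCH (:PERSON {name: "教练A"})-[rel]->(:PERSON {name: "教练B"})
RETURN rel.label
This query uses a MATCH clause and the pattern "-[rel]->", which connects two nodes, to look up directly the relation between the nodes "教练A" and "教练B". The query returns "Good friends".
The reply obtained is therefore "friends".
This can also be seen directly from the relevant part of Figure 4:
Figure PCTCN2019074666-appb-000008
Then, in step S204, the processor 10 performs natural language generation on the reply to obtain output in natural language format, for example "球队A的教练和球队B的教练是好友关系" ("The coach of Team A and the coach of Team B are friends").
Finally, in step S205, the processor 10 provides the output to the user through a display or speaker, for example by showing "The coach of Team A and the coach of Team B are friends" on the screen or playing it through the speaker.
Alternatively, the output in natural language format may be, for example, "Coach A and Coach B are friends", which leaves out the inference over the knowledge graph that was used to fill the attributes and is generated only from the recognized intent and the determined attributes. This reduces the load on the system when generating the output, and giving the user the direct result can improve the user's experience.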
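When SPARQL is used, the statement for querying the knowledge graph is:
PREFIX football: <http://example.com/football/>
SELECT DISTINCT ?x WHERE {
  ?coach0 football:coach/football:clubName "球队A" .
  ?coach1 football:coach/football:clubName "球队B" .
  ?coach0 ?rel ?coach1 .
  ?rel ?label ?x .
}
The result is the same as that of the Cypher query: "friends".
The grammatical-structure-based semantic representation may also be applied to this example.
The text of the input after dependency parsing may be as shown in Figure 8.
The corresponding expression represented as a knowledge graph is:
Figure PCTCN2019074666-appb-000009
Then, on the basis of this expression, a query statement is generated in Cypher or SPARQL to query the knowledge graph, and the reply "friends" is obtained.
In the above example, using the reasoning ability of the knowledge graph to fill attributes greatly improves the depth and accuracy of the reply and thereby the user experience.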
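Example 7:
As mentioned above, the user's input need not be a question the user wants answered; it may be, for example, a fact or state asserted by the user.
For example, the user's input may be "球队A和球队B的教练的战术很相似" ("The coaches of Team A and Team B use very similar tactics"). Because the user has not asked directly about Coach A and Coach B, an existing chatbot, lacking the relevant information, may be unable to respond correctly. In the embodiment of the present disclosure, however, the correct reply can be obtained by using the knowledge graph as described below, so that suitable output is provided to the user.
As in Example 6, natural language understanding yields the user's intent "evaluate the relationship between two people", and the attributes related to this intent are the "names" of the two people. Then, as in Example 6, the processor 10 queries the knowledge graph with a Cypher statement to obtain the names of the coaches of Team A and Team B, namely "Coach A" and "Coach B". Next, the processor 10 queries the knowledge graph with a Cypher statement to obtain the relationship between Coach A and Coach B, namely "friends". Finally, the processor 10 performs natural language generation on the reply to obtain output in natural language format, for example "球队A和球队B的教练之间是好友关系" ("The coaches of Team A and Team B are friends").
In the above example, the reasoning ability of the knowledge graph is used further, which greatly improves the depth and accuracy of the reply and thereby the user experience.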
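Example 8:
In step S201, the processor 10 receives through a microphone the user's spoken natural language input "球员A的兄弟在哪个球队效力?" ("Which team does Player A's brother play for?") and converts it to text by speech recognition. The processor 10 then preprocesses the text. The tagged text may be represented as follows:
球员A/NN 的/u. 兄弟/NN 在/prep. 哪个/r. 球队/NN 效力/v. ?
Here NN, u., prep., r. and v. are abbreviations for noun, particle, preposition, pronoun and verb, respectively.
Syntactic analysis is then performed on the tagged parts of speech to determine the role of each word in the sentence and the structure of the sentence. The parsed text may be represented as follows:
球员A的/Adj. 兄弟/Sub. 在哪个球队/Obj. 效力/Pred.
Here Adj., Sub., Obj. and Pred. are abbreviations for adjectival modifier, subject, object and predicate, respectively.
After preprocessing, the processor 10 performs entity extraction on the text in step S202, extracting the nouns in the text as entities and thereby identifying the objects the sentence refers to. The text after entity extraction may be represented as follows:
[球员A<Person>的兄弟]<Person>在哪个球队<Team>效力?
Here <Team> and <Person> indicate that the preceding entity is a team and a person, respectively.
At the same time, intent recognition is performed on the text, and the recognized intent is "query the team a player belongs to".
The attribute related to this intent is then determined to be "name". Unlike Example 1, the user's input here gives only "Player A" and the description "Player A's brother", not the brother's name, so the attribute cannot be filled by natural language understanding of the input.
To fill this attribute, in this example the knowledge graph is used to obtain it while the semantic representation is being processed.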
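Specifically, the processor 10 queries the knowledge graph with a Cypher statement to obtain the name of Player A's brother. For example, the query statement is:
MATCH (:PERSON {name: "球员A"})-[:REL_BROTHER]->(person:PERSON)
RETURN person.name
This query looks up Player A's brother in the knowledge graph, where REL_BROTHER denotes the brother relation. The query returns "球员B" (Player B).
"Player B" is therefore filled into the attribute. The final semantic representation comprises the user's intent "query the team a player belongs to" and the attribute "Player B".
Next, as in Example 1, the processor 10 queries the knowledge graph with a Cypher statement in step S203 to obtain the team Player B plays for. For example, the query statement is:
MATCH (:PERSON {name: "球员B"})-[:REL_BELONG_TO_TEAM]->(team:TEAM)
RETURN team.team_name
Here REL_BELONG_TO_TEAM denotes the relation "plays for this team" between Player B and Team B. The query returns "球队B" (Team B).
The reply obtained is therefore "Team B".
This can also be seen directly from the relevant part of Figure 4:
Figure PCTCN2019074666-appb-000010
Then, in step S204, the processor 10 performs natural language generation on the reply to obtain output in natural language format, for example "球员B在球队B踢球" ("Player B plays for Team B").
Finally, in step S205, the processor 10 provides the output to the user through a display or speaker, for example by showing "Player B plays for Team B" on the screen or playing it through the speaker.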
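When SPARQL is used, the statement for querying the knowledge graph is:
PREFIX football: <http://example.com/football/>
SELECT DISTINCT ?x WHERE {
  ?player0 football:name "球员A" ;
           football:brother ?player1 .
  ?player1 football:team/football:clubName ?x .
}
The result is the same as that of the Cypher query: "Team B".
The grammatical-structure-based semantic representation may also be applied to this example.
The text of the input after dependency parsing may be as shown in Figure 9.
The corresponding expression represented as a knowledge graph is:
Figure PCTCN2019074666-appb-000011
Then, on the basis of this expression, a query statement is generated in Cypher or SPARQL to query the knowledge graph, and the reply "Team B" is obtained.
In the above example, using the reasoning ability of the knowledge graph to fill attributes greatly improves the depth and accuracy of the reply and thereby the user experience.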
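In Examples 7 and 8 above, two Cypher statements were used for ease of description: the first to look up and fill the attribute, and the second to obtain the reply. Those skilled in the art will appreciate that this was done to make the disclosure easier to follow. In practice, where appropriate, the two Cypher statements may be merged into one. For example, the two Cypher statements of Example 7 may be merged into the following single statement:
MATCH (:TEAM {name: "球队A"})<-[:REL_Coach]-(a:PERSON)-[rel]->(b:PERSON)-[:REL_Coach]->(:TEAM {name: "球队B"})
RETURN rel.label
Similarly, the two Cypher statements of Example 8 may be merged into the following single statement:
MATCH (:PERSON {name: "球员A"})-[:REL_BROTHER]->(person:PERSON)-[:REL_BELONG_TO_TEAM]->(team:TEAM)
RETURN team.team_name
Consolidating the query process and the query statements helps to streamline the query flow, improve the efficiency of the system and improve the user experience.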
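The effect of merging two lookups into one can be mimicked in a few lines by following a path of relations in a single call. This is a sketch over in-memory tuples with hypothetical names (`follow`, the `"out"`/`"in"` direction flags); it is not how a graph database evaluates a merged Cypher pattern internally.

```python
TRIPLES = [
    ("球员A", "REL_BROTHER", "球员B"),
    ("球员B", "REL_BELONG_TO_TEAM", "球队B"),
    ("教练A", "REL_Coach", "球队A"),
]

def follow(start, path, triples=TRIPLES):
    """Follow a list of (relation, direction) hops from `start`; return the end nodes."""
    frontier = {start}
    for relation, direction in path:
        nxt = set()
        for s, p, o in triples:
            if p != relation:
                continue
            if direction == "out" and s in frontier:
                nxt.add(o)   # traverse the edge forwards
            elif direction == "in" and o in frontier:
                nxt.add(s)   # traverse the edge backwards
        frontier = nxt
    return frontier
```

For the question of Example 8, the path is a brother hop followed by a team hop, so the intermediate name never has to be written into the semantic representation.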
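Example 9:
In step S201, the processor 10 receives through a microphone the user's spoken natural language input "国际联赛的哪个球队的门将也是国家队球队B的队员?" ("Which team in the international league has a goalkeeper who is also a member of national team Team B?") and converts it to text by speech recognition.
This sentence is clearly very complex and does not itself supply the information needed to answer the question. An existing chatbot, faced with this complexity and lack of information, may be unable to answer such a question correctly. In the embodiment of the present disclosure, however, the correct reply can be obtained by using the knowledge graph as described below, so that suitable output is provided to the user.
Then, in step S202, the processor 10 performs segmentation, part-of-speech tagging, dependency parsing, entity extraction and so on for the text.
The text after dependency parsing may be as shown in Figure 10.
The corresponding expression represented as a knowledge graph is shown in Figure 11.
Then, on the basis of this expression, a query statement is generated in Cypher or SPARQL to query the knowledge graph, and the reply "Team A" is obtained.
In the above example, using the reasoning ability of the knowledge graph to fill attributes greatly improves the depth and accuracy of the reply and thereby the user experience.
Then, in step S204, the processor 10 performs natural language generation on the reply to obtain output in natural language format, for example "The goalkeeper of Team A is also a member of national team Team B" or "Player C, the goalkeeper of Team A, is also a member of national team Team B".
Finally, in step S205, the processor 10 provides the output to the user through a display or speaker, for example by showing "The goalkeeper of Team A is also a member of national team Team B" on the screen or playing it through the speaker.
The intent-based semantic representation may also be applied to this example; its description is omitted here.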
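Example 10:
As mentioned above, the user's input need not be a question the user wants answered; it may be, for example, a fact or state asserted by the user.
For example, the user's input may be "球员A的兄弟的表现太差了" ("Player A's brother is performing terribly"). Because the user has not asked directly about Player B (Player A's brother), an existing chatbot, lacking the relevant information, may be unable to respond correctly. In the embodiment of the present disclosure, however, the correct reply can be obtained by using the knowledge graph as described below, so that suitable output is provided to the user.
As in Example 3, natural language understanding yields the user's intent "evaluate a player's performance", and the attribute related to this intent is "name". Then, as in Example 8, the processor 10 queries the knowledge graph with a Cypher statement to obtain the name of Player A's brother, namely "Player B". Next, as in Example 3, the processor 10 queries the knowledge graph with a Cypher statement for the parameters and values associated with Player B's performance, for example Player B's goals and assists. The reply obtained is "2 goals" and "4 assists". Finally, the processor 10 performs natural language generation on the reply to obtain output in natural language format, for example "Player B has scored 2 goals and made 4 assists".
In the above example, the reasoning ability of the knowledge graph is used further, which greatly improves the depth and accuracy of the reply and thereby the user experience.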
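According to some embodiments of the present disclosure, besides determining an attribute from the input with the knowledge graph, the following ways of determining an attribute are also available.
According to some embodiments, when an attribute of the semantic representation cannot be obtained directly by natural language understanding of the input, a default value may be set for it. For example, when the input does not specify a time, it may be assumed by default to concern the current season or this year's matches. When a player mentioned in the input could be any of several players with the same name, the input may be assumed by default to concern the best known of them.
According to some embodiments, when an attribute of the semantic representation cannot be obtained directly by natural language understanding of the input, it may be determined from events occurring within a period before and/or after the current time. For example, when the input could concern several players, and an event associated with one of them occurs at the current time, within a period before it and/or within a period after it, the attribute is determined to be that player. The period may be, for example, an hour, a day, a week, a month, a season or a year, and the associated event may be a match the player takes part in, another activity the player takes part in, or another news event associated with the player. For example, where one attribute corresponds to several players, if a match involving one of them is in progress when the user's input is received, the attribute is determined to be that player.
According to some embodiments, when an attribute of the semantic representation cannot be obtained directly by natural language understanding of the input, it may be determined from the context of the user's input. For example, if the user has mentioned or discussed a certain team earlier in the dialogue, then where an attribute corresponds to several teams or several players, the attribute is determined to be the team discussed earlier or a player of that team.
According to some embodiments, when an attribute of the semantic representation cannot be obtained directly by natural language understanding of the input, it may be determined from the user's profile. For example, a profile may be created for the user that records various parameters, such as the user's location, the teams and players the user follows, the teams and players the user dislikes, and the codes and/or nicknames the user uses for teams or players. When an attribute has several possible options, the profile can determine which option it should correspond to. For example, when a team or player mentioned by the user has several possible matches, the attribute may be determined to be the team at the user's location, a team the user follows or a player the user follows. Alternatively, teams or players the user dislikes may be excluded from the options, or the team or player may be determined from the codes and/or nicknames the user commonly uses.
According to some embodiments, when an attribute of the semantic representation cannot be obtained directly by natural language understanding of the input, a query about the attribute may be generated, natural language generation may be performed on the query to obtain output, the output may be provided to the user, and the user's input in response to the query may be received. In other words, the attribute may be determined by asking the user a follow-up question. For example, when a team or player mentioned by the user has several possible matches, the user may be asked "Are you asking about team XX?" or "Are you asking about A of team XX?", and the attribute is determined from the user's response. Besides yes-no questions, the query may be an alternative question, for example "您问的是球队A、球队B还是球队C?" ("Are you asking about Team A, Team B or Team C?") or "Are you asking about Player A, Player B or Player C?". The options in an alternative question may be ordered by probability: the better known a team or player is and the more relevant it is to the question, the higher the probability of that option, and options with higher probability are placed earlier.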
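Ordering the options of a follow-up question by probability can be sketched as follows. The scores are assumed to come from elsewhere (for example from popularity and relevance estimates); the function name and the Chinese question wording are illustrative choices.

```python
def choice_question(options):
    """Build a follow-up question from {candidate: estimated probability}.

    Candidates are listed from most to least probable.
    """
    ranked = sorted(options, key=options.get, reverse=True)
    if not ranked:
        raise ValueError("no candidates to ask about")
    if len(ranked) == 1:
        return f"您问的是{ranked[0]}吗?"           # yes-no question
    return "您问的是" + "、".join(ranked[:-1]) + "还是" + ranked[-1] + "?"
```

With a single candidate the sketch falls back to a yes-no question, and with several it produces an alternative question.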
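In some embodiments of the present disclosure, the knowledge graph may be used to generate the query about the attribute. For example, as in Example 2 above, when the user's input is "Who is Player A's coach?", the knowledge graph may be used to find that Player A's team is Team A, after which the user may be asked "Are you asking about the coach of Team A?". As in Example 6 above, when the user's input is "What is the relationship between the coach of Team A and the coach of Team B?", the knowledge graph may be used to find that the coach of Team A is Coach A and the coach of Team B is Coach B, after which the user may be asked "Are you asking about the relationship between Coach A and Coach B?". Generating queries with the knowledge graph can clearly make communication with the user much more efficient and improve the user's experience.
Those skilled in the art will understand that the ways of obtaining attributes mentioned above are exemplary and that attributes of the semantic representation may be determined in various other ways, for example by using the various "entity disambiguation" and "coreference resolution" techniques of the knowledge graph field.
Those skilled in the art will also understand that the ways of determining attributes mentioned above may be combined. For example, the knowledge graph may be used to make the final determination starting from parameters determined in the ways mentioned above. The attributes determined separately in those ways may be combined to determine the attribute. A query may also be generated from the options obtained in those ways, and the attribute determined from the user's response to the query.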
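Combining several of these strategies can be pictured as a chain that is tried in order until one of them settles the attribute. The order used below (dialogue context, then user profile, then default) and the profile keys `favourites` and `disliked` are assumptions made for the sketch; a `None` result stands for "ask the user".

```python
def resolve(candidates, context=(), profile=None, default=None):
    """Pick one candidate for an ambiguous attribute, or None if the user must be asked."""
    profile = profile or {}
    # 1. Dialogue context: prefer a candidate already mentioned earlier.
    for candidate in candidates:
        if candidate in context:
            return candidate
    # 2. User profile: drop disliked candidates, then prefer followed ones.
    remaining = [c for c in candidates if c not in profile.get("disliked", ())]
    for candidate in remaining:
        if candidate in profile.get("favourites", ()):
            return candidate
    if len(remaining) == 1:
        return remaining[0]
    # 3. Default value, if it is still among the remaining candidates.
    if default in remaining:
        return default
    return None
```

When `resolve` returns `None`, the remaining candidates can be handed to a follow-up question as described above.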
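Cypher and SPARQL have been used above as examples of languages for querying the knowledge graph, but those skilled in the art will appreciate that any other language in the graph database field may be used in the present disclosure to query the knowledge graph.
In addition, although the embodiments of the present disclosure discuss the intent-based and grammatical-structure-based semantic representations, those skilled in the art will understand that the semantic representation may take various other forms, all of which are included in the present disclosure and may be applied to its embodiments. In some embodiments these forms may be used in combination. For example, a user's input may first be processed with the intent-based semantic representation and then, for example when the user's intent cannot be recognized, with the grammatical-structure-based semantic representation.
In addition, although the embodiments of the present disclosure discuss only the case in which the semantic representation is processed with a knowledge graph to generate the reply, the embodiments may also be combined with various techniques known in the art (for example, databases and search engines) to generate replies to the user's input. These techniques are also incorporated into the present disclosure as part of it and may be applied to its embodiments.
Words such as "before" and "after" in the specification and claims, if any, are used for descriptive purposes and not necessarily to describe permanent relative positions. It should be understood that words so used are interchangeable under appropriate circumstances, so that the embodiments of the present disclosure described herein are, for example, capable of operating in orientations other than those illustrated or otherwise described herein.
As used herein, the word "exemplary" means "serving as an example, instance or illustration" and not as a "model" to be reproduced exactly. Any implementation described herein as exemplary is not necessarily to be construed as preferred or advantageous over other implementations. Furthermore, the present disclosure is not bound by any expressed or implied theory presented in the technical field, background, summary or detailed description above.
In addition, certain terminology may be used in the following description for reference only and is therefore not intended to be limiting. For example, unless the context clearly indicates otherwise, the words "first", "second" and other such numerical terms referring to structures or elements do not imply a sequence or order.
It should also be understood that the term "comprising/including", as used herein, specifies the presence of the stated features, integers, steps, operations, units and/or components, but does not preclude the presence or addition of one or more other features, integers, steps, operations, units and/or components and/or combinations thereof.
Those of ordinary skill in the relevant art will recognize that the boundaries between the operations/steps described above are merely illustrative. Multiple operations/steps may be combined into a single operation/step, a single operation/step may be distributed over additional operations/steps, and operations/steps may be executed at least partially overlapping in time. Alternative embodiments may include multiple instances of a particular operation/step, and the order of operations/steps may be changed in various other embodiments. Other modifications, variations and substitutions are likewise possible. The specification and drawings are therefore to be regarded as illustrative and not restrictive.
Although some specific embodiments of the present disclosure have been described in detail by way of example, those of ordinary skill in the relevant art will understand that the above examples are for illustration only and are not intended to limit the scope of the present disclosure. The embodiments disclosed herein may be combined in any way without departing from the spirit and scope of the present disclosure. Those of ordinary skill in the relevant art will also understand that various modifications may be made to the embodiments without departing from the scope and spirit of the present disclosure. The scope of the present disclosure is defined by the appended claims.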

Claims (24)

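  1. A computer-implemented method of dialogue with a user, comprising:
    receiving input in natural language format from a user;
    performing natural language understanding on the input and generating a semantic representation;
    processing the semantic representation using a knowledge graph to generate a reply;
    performing natural language generation on the reply to obtain output in natural language format; and
    providing the output to the user,
    wherein the method is used in a vertical domain.
  2. The method according to claim 1, wherein the input comprises a question the user wants answered and a fact or state asserted by the user.
  3. The method according to claim 1, wherein the semantic representation is based on the user's intent, and the step of performing natural language understanding on the input and generating a semantic representation comprises extracting entities from the input, recognizing the user's intent, and generating the semantic representation from the extracted entities and the recognized intent.
  4. The method according to claim 3, wherein the semantic representation comprises the user's intent and one or more attributes.
  5. The method according to claim 1, wherein the semantic representation is based on grammatical structure, and the step of performing natural language understanding on the input and generating a semantic representation comprises performing entity extraction on the input, recognizing the grammatical structure of the input, and generating the semantic representation from the extracted entities and the recognized grammatical structure.
  6. The method according to claim 5, wherein the semantic representation comprises an expression corresponding to the recognized grammatical structure, and the expression comprises one or more attributes.
  7. The method according to claim 4 or 6, wherein, in the step of performing natural language understanding on the input and generating a semantic representation, when an attribute cannot be obtained directly by natural language understanding of the input, the attribute is obtained in one or more of the following ways:
    setting a default value for the attribute;
    determining the attribute from the input using the knowledge graph;
    determining the attribute from events occurring at the current time, within a period before the current time and/or within a period after the current time;
    determining the attribute from the context of the input;
    determining the attribute from the user's profile; and
    generating a query about the attribute, performing natural language generation on the query to obtain output, providing the output to the user, and receiving from the user input in response to the query.
  8. The method according to claim 7, wherein the query about the attribute is generated using the knowledge graph.
  9. The method according to claim 1, wherein the step of processing the semantic representation using a knowledge graph to generate a reply comprises generating a query statement from the semantic representation and querying the knowledge graph with the query statement to generate the reply.
  10. The method according to claim 1, wherein the input and the output are each at least one of speech, video and text in natural language format.
  11. The method according to claim 1, wherein the vertical domain comprises a single-sport domain.
  12. The method according to claim 11, wherein the single-sport domain comprises one or more of the football, basketball, volleyball, rugby, badminton and table tennis domains.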
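  13. A computer system, comprising:
    an input/output interface configured to receive input in natural language format from a user and to provide output in natural language format to the user;
    a processor; and
    a memory configured to be coupled to the processor and to store a computer program, wherein the processor is configured to execute the program to perform the following operations:
    receiving input in natural language format from a user;
    performing natural language understanding on the input and generating a semantic representation;
    processing the semantic representation using a knowledge graph to generate a reply;
    performing natural language generation on the reply to obtain output in natural language format; and
    providing the output to the user,
    wherein the method is used in a vertical domain.
  14. The computer system according to claim 13, wherein the input comprises a question the user wants answered and a fact or state asserted by the user.
  15. The computer system according to claim 13, wherein the semantic representation is based on the user's intent, and, in the operation of performing natural language understanding on the input and generating a semantic representation, the processor is further configured to extract entities from the input, recognize the user's intent, and generate the semantic representation from the extracted entities and the recognized intent.
  16. The computer system according to claim 15, wherein the semantic representation comprises the user's intent and one or more attributes related to the intent.
  17. The computer system according to claim 13, wherein the semantic representation is based on grammatical structure, and, in the operation of performing natural language understanding on the input and generating a semantic representation, the processor is further configured to perform entity extraction on the input, recognize the grammatical structure of the input, and generate the semantic representation from the extracted entities and the recognized grammatical structure.
  18. The computer system according to claim 17, wherein the semantic representation comprises an expression corresponding to the recognized grammatical structure, and the expression comprises one or more attributes.
  19. The computer system according to claim 16 or 18, wherein, in the operation of performing natural language understanding on the input and generating a semantic representation, the processor is configured, when an attribute of the semantic representation cannot be obtained by natural language understanding of the input, to obtain the attribute in one or more of the following ways:
    setting a default value for the attribute;
    determining the attribute from the input using the knowledge graph;
    determining the attribute from events occurring at the current time, within a period before the current time and/or within a period after the current time;
    determining the attribute from the context of the input;
    determining the attribute from the user's profile; and
    generating a query about the attribute, performing natural language generation on the query to obtain output, providing the output to the user, and receiving from the user input in response to the query.
  20. The computer system according to claim 19, wherein the query about the attribute is generated using the knowledge graph.
  21. The computer system according to claim 13, wherein processing the semantic representation using a knowledge graph to generate a reply comprises generating a query statement from the semantic representation and querying the knowledge graph with the query statement to generate the reply.
  22. The computer system according to claim 13, wherein the input and the output are each at least one of speech, video and text in natural language format.
  23. The computer system according to claim 13, wherein the vertical domain comprises a single-sport domain.
  24. The computer system according to claim 23, wherein the single-sport domain comprises one or more of the football, basketball, volleyball, rugby, badminton and table tennis domains.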
PCT/CN2019/074666 2018-02-13 2019-02-03 Computer-implemented method of dialogue with a user, and computer system WO2019158014A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810147719.6A CN108491443B (zh) 2018-02-13 2018-02-13 Computer-implemented method of dialogue with a user, and computer system
CN201810147719.6 2018-02-13

Publications (1)

Publication Number Publication Date
WO2019158014A1 true WO2019158014A1 (zh) 2019-08-22

Family

ID=63340439

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/074666 WO2019158014A1 (zh) 2018-02-13 2019-02-03 Computer-implemented method of dialogue with a user, and computer system

Country Status (2)

Country Link
CN (1) CN108491443B (zh)
WO (1) WO2019158014A1 (zh)


Also Published As

Publication number Publication date
CN108491443B (zh) 2021-05-25
CN108491443A (zh) 2018-09-04


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19754406

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19754406

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 310821)
