WO2019119916A1 - Human-computer interaction method, system and electronic device thereof (人机交互方法、系统及其电子设备) - Google Patents


Info

Publication number
WO2019119916A1
Authority
WO
WIPO (PCT)
Prior art keywords
intent
tag
node
human
information
Application number
PCT/CN2018/107891
Other languages
English (en)
French (fr)
Inventor
许建伟
秦昌博
Original Assignee
科沃斯商用机器人有限公司 (Ecovacs Commercial Robot Co., Ltd.)
Application filed by 科沃斯商用机器人有限公司
Publication of WO2019119916A1 (patent/WO2019119916A1/zh)


Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
            • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
              • G06F16/33 Querying
                • G06F16/332 Query formulation
                  • G06F16/3329 Natural language query formulation or dialogue systems
              • G06F16/35 Clustering; Classification
              • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
                • G06F16/367 Ontology
          • G06F40/00 Handling natural language data
            • G06F40/20 Natural language analysis
              • G06F40/279 Recognition of textual entities
                • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
            • G06F40/40 Processing or translation of natural language
              • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
      • G10 MUSICAL INSTRUMENTS; ACOUSTICS
        • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
          • G10L15/00 Speech recognition
            • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
            • G10L15/26 Speech to text systems

Definitions

  • The present application relates to a human-computer interaction method, a system, and an electronic device thereof, and belongs to the technical field of automatic response systems.
  • Human-machine dialogue in a vertical domain is usually completed as follows: first, the received speech is converted into text, which is then vectorized; next, the answer corresponding to the input is searched for in a pre-established knowledge base or database; finally, the answer is delivered to the user by voice, completing one human-machine dialogue.
  • Existing human-machine dialogue systems emphasize satisfying the user's needs and do not also consider the merchant's perspective. The open question is how such a system can meet the merchant's need to recommend products when the user does not know all of the product information, and thereby serve marketing recommendation goals.
  • Existing human-machine dialogue systems can only handle relatively simple dialogue content. When the user's intention changes, or when intentions contradict each other or their content, the interaction may not continue normally, which seriously degrades the user experience and the robot's productivity.
  • The technical problem addressed by the present application is to overcome these deficiencies of the prior art by providing a human-computer interaction method, system, and electronic device that use a machine translation multi-tag intent classifier to find, in the shortest time, a sequence of intent tags under the control of a logic tree.
  • A human-computer interaction method includes:
  • Step 100: Recognize the user's voice input as text information.
  • Step 200: Pass the text information through a machine translation multi-tag intent classifier to output intent tags, dye the intent tags onto the logic tree to form intent nodes, and find the sequence of intent tags corresponding to those intent nodes under the control of the logic tree.
  • Step 300: If there is only one intent tag sequence, proceed directly to step 400. Otherwise, determine whether the intent node path corresponding to the intent tag sequence is unique; if so, proceed to step 400; if not, output a query to the user and, after the user replies, return to step 100.
  • Step 400: Search the database according to the intent tag sequence and output product recommendation information.
  • Step 500: End the human-computer interaction and wait for the user's next voice input.
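The decision logic of steps 300 to 500 can be sketched as a small Python function. This is a minimal sketch, not the applicant's implementation: it assumes steps 100 and 200 have already produced the candidate root-to-node tag paths, models "one tag sequence / unique path" as "exactly one distinct candidate path", and uses a hypothetical `search_db` callback in place of the merchant database of step 400.

```python
from typing import Callable, Sequence, Tuple

Path = Tuple[str, ...]  # intent tags from the first slot level down to a node


def decide(candidate_paths: Sequence[Path],
           search_db: Callable[[Path], str]) -> Tuple[str, str]:
    """Return (action, message) for one dialogue turn, following steps 300-500."""
    distinct = sorted(set(candidate_paths))
    if not distinct:
        # Nothing could be dyed onto the logic tree: end and wait (step 500).
        return ("end", "")
    if len(distinct) == 1:
        # One tag sequence, i.e. a unique node path: search the database (step 400).
        return ("recommend", search_db(distinct[0]))
    # Several candidate paths: ask the user; the reply re-enters step 100.
    options = " or ".join(" > ".join(p) for p in distinct)
    return ("ask", f"Do you mean {options}?")


# Usage with a hypothetical database callback.
def fake_db(path: Path) -> str:
    return f"Recommended items for '{path[-1]}'"


print(decide([("shopping", "digital", "mobile phone")], fake_db))
```

Returning an action label instead of looping keeps the sketch testable; a real robot would wrap it in the speech-recognition loop of step 100.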
  • The dyeing in step 200 consists of filling intent tags into the intent slots of the logic tree to form intent nodes.
  • Step 200 further includes:
  • the machine translation multi-tag intent classifier maps the word vectors to a plurality of intent tags, and each intent tag is filled into an intent slot of the logic tree to form an intent node.
  • Step 200 further includes: after the classifier outputs a plurality of intent tags from the word vectors, determining whether the intent tags can be filled into the intent slots of the logic tree; if they cannot, proceeding directly to step 500; otherwise, continuing to fill.
  • Step 200 further includes: after the classifier outputs a plurality of intent tags from the word vectors, determining whether the intent tags filled into the intent slots of the logic tree form mutually exclusive intent nodes in the same slot level; if so, proceeding directly to step 500; otherwise, continuing to fill.
  • Step 500 further includes: outputting promotion information before the human-computer interaction ends.
  • In step 300, determining whether the intent node path corresponding to the intent tag sequence is unique means determining whether there is exactly one path leading from the intent node formed in the filled intent slot back to the root node along the growth path of the logic tree.
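The slot-filling checks of step 200 and the path-uniqueness test of step 300 can be illustrated on a toy logic tree. The tree contents, the `TREE` and `EXCLUSIVE` tables, and the function names are hypothetical; the sketch assumes tags arrive ordered from the first slot level downward and treats mutual exclusion as fixed groups of same-level tags.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical logic tree: node id -> (intent tag, parent id); None = ROOT.
# The same tag ("Apple") may occur under different parents.
TREE: Dict[str, Tuple[str, Optional[str]]] = {
    "n1": ("shopping", None),
    "n2": ("digital", "n1"),
    "n3": ("mobile phone", "n2"),
    "n4": ("tablet", "n2"),
    "n5": ("Apple", "n3"),
    "n6": ("Apple", "n4"),
    "n7": ("dining", None),
}
# Hypothetical groups of same-type, same-level tags that exclude each other.
EXCLUSIVE: List[Set[str]] = [{"shopping", "dining"}, {"mobile phone", "tablet"}]


def paths_for(tag: str) -> List[Tuple[str, ...]]:
    """All paths from the first slot level down to a node carrying `tag`."""
    paths = []
    for node_id, (label, _) in TREE.items():
        if label != tag:
            continue
        path: List[str] = []
        cur: Optional[str] = node_id
        while cur is not None:  # walk back up towards ROOT
            cur_label, parent = TREE[cur]
            path.append(cur_label)
            cur = parent
        paths.append(tuple(reversed(path)))
    return paths


def fill(tags: List[str]) -> Optional[List[Tuple[str, ...]]]:
    """Dye `tags` onto the tree; None means 'cannot fill', i.e. go to step 500."""
    if not tags or any(not paths_for(t) for t in tags):
        return None  # some tag has no intent slot on the tree
    if any(len(group & set(tags)) > 1 for group in EXCLUSIVE):
        return None  # mutually exclusive nodes in the same slot level
    # Candidate node paths for the deepest tag that agree with every tag.
    return [p for p in paths_for(tags[-1]) if set(tags) <= set(p)]
```

With only "shopping" and "Apple", `fill` returns two paths (via "mobile phone" and via "tablet"), so the path is not unique and the system would ask the user which device is meant; adding "mobile phone" narrows it to one.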
  • To decide from which intent node product recommendation information is output, step 400 further includes:
  • Step 401: Determine whether the intent node formed in the filled intent slot is an API intent node. If so, search the database and output the product recommendation information; otherwise, output a query to the user and, after the user replies, return to step 100.
  • An API intent node is an intent node that contains product category information.
  • Step 400 further includes:
  • determining whether the intent node formed in the filled intent slot is an optional node; if not, proceeding to step 401; otherwise, outputting a query to the user and, after the user replies, returning to step 100.
  • After step 100 and before step 200, the method further includes:
  • determining whether the conversation scene is a business logic conversation; if so, proceeding to step 200; otherwise, entering a question-and-answer session, in which the conversation scene is determined again for each voice input of the user.
  • The machine translation multi-tag intent classifier is a recurrent neural network (RNN) model.
  • The application also provides a human-computer interaction system, including:
  • a speech recognition module, configured to recognize the user's voice input as user text information;
  • a business logic dialog module, configured to output a plurality of intent tags from the user's text information through the machine translation multi-tag intent classifier, fill the intent tags into the intent slots of the logic tree to form intent nodes, find the sequence of intent tags corresponding to the intent nodes under the control of the logic tree, and determine the product to be recommended;
  • a search and dialog generation module, configured to search the corresponding database for the product to be recommended and output the corresponding product recommendation information.
  • The human-computer interaction system further includes:
  • a scene segmentation module, configured to pre-classify each user input sentence and deliver it, according to the result, to the corresponding module, which then gives the corresponding answer.
  • The present application also provides a shopping guide robot system, including a shopping guide robot and a background service terminal. The shopping guide robot includes an interactive screen, a speech recognition unit, a communication unit, a sensor unit, and a walking unit, where the interactive screen and the speech recognition unit are used to recognize the user's voice input as text information;
  • the communication unit is used for signal and command communication between the shopping guide robot and the background service terminal;
  • the background service terminal includes a processing unit, a control unit, and a storage unit, where
  • the storage unit stores a human-computer interaction program which, when read and executed by the processing unit, performs the following operations: passing the text information through the machine translation multi-tag intent classifier to output intent tags, dyeing the intent tags onto the logic tree to form intent nodes, and finding the sequence of intent tags corresponding to the intent nodes under the control of the logic tree; and searching the database according to the intent tag sequence and outputting product recommendation information;
  • the control unit outputs signal commands that control the shopping guide robot to perform the corresponding actions.
  • The interactive screen further includes a microphone, a loudspeaker, and a touch-screen Chinese-English input system coupled to the speech recognition unit.
  • The application also provides an electronic device, including:
  • a memory storing a human-computer interaction program which, when read and executed by a processing unit, performs the following operations: passing the text information through the machine translation multi-tag intent classifier to output intent tags, dyeing the intent tags onto the logic tree to form intent nodes, and finding the intent tag sequence corresponding to the intent nodes under the control of the logic tree; and searching the database according to the intent tag sequence and outputting product recommendation information.
  • In summary, the present application provides a human-computer interaction method, system, and electronic device.
  • The machine translation multi-tag intent classifier finds an intent tag sequence under the control of a logic tree, which clarifies the user's requirements and provides recommendation information in the shortest time while allowing free topic switching. Besides saving the cost of human staff, it helps merchants achieve marketing goals and enables efficient human-machine communication.
  • FIG. 1 is a structural diagram of the machine translation multi-tag intent classifier.
  • FIG. 2 is a schematic diagram of the logic tree model.
  • FIG. 3 is a schematic diagram of module communication in the human-computer interaction system of the present application.
  • FIG. 4 is a schematic diagram of a pre-customized logic tree model according to an embodiment of the present application.
  • FIG. 5 is a schematic diagram of a search tree according to Embodiment 2 of the present application.
  • FIG. 6 is a schematic diagram of a search tree according to Embodiment 3 of the present application.
  • FIG. 7 is a schematic diagram of a search tree according to Embodiment 4 of the present application.
  • FIG. 8 is a schematic diagram of a search tree according to Embodiment 5 of the present application.
  • FIG. 1 is a block diagram of the machine translation multi-tag intent classifier.
  • The logic tree consists of a ROOT node 100 and the intent nodes 200 of a first slot level, a second slot level, a third slot level, or even more slot levels derived from the root node 100, which together form a tree structure.
  • Each intent node 200 corresponds to an intent tag 400; the process of filling an intent tag 400 into an intent slot to form an intent node is called coloring (dyeing).
  • The working principle of the present application is as follows. Because the user usually communicates with the shopping guide or service robot by voice, the user's voice input is first recognized as text information 300, i.e., converted into a text sentence. The sentence is then segmented into words, and the segmented sentence is vectorized; this is part of the encoder's work. The vectors are input into the machine translation multi-tag intent classifier, which outputs intent tags 400; the intent tags 400 are dyed onto the corresponding intent nodes 200 of the logic tree, and the intent tag sequence corresponding to those intent nodes 200 under the control of the logic tree is found.
  • The tag sequence forms a search tree within the logic tree; that is, the search tree is a subset of the logic tree.
  • One or more search trees may be formed at this point.
  • If there is only one intent tag sequence, the database is searched according to that sequence and product recommendation information is output.
  • If there are multiple intent tags, it is determined whether the path of the intent nodes corresponding to the intent tag sequence is unique.
  • If the path is unique, the database is searched according to the intent tag sequence and product recommendation information is output. Otherwise, a query is output to the user; after the user replies, the reply is recognized as text again and the judgment continues until the requirement is met and product recommendation information can be output, or until the interaction ends because no recommendation can be output, after which the system waits for the user's next voice input.
  • The core idea is the combination of the machine translation multi-tag intent classifier with the logic tree: the search tree formed by the intent tag sequence clarifies the user's requirements in the shortest time and provides recommendation information, achieving efficient human-machine communication.
  • Step S1: Recognize the user's voice input as text information.
  • Step S2: Segment the text information into words. Any existing word segmentation tool can be used in this step. For example, the text "I want to buy a computer" is segmented as: I / want / buy / one / computer /
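Since any segmentation tool may be used, the sketch below takes forward maximum matching, a classic dictionary-based method, purely as a stand-in for whatever tool a deployment chooses. The Chinese sentence 我想买一台电脑 is an assumed original of the "I want to buy a computer" example, and the five-word vocabulary is hypothetical.

```python
from typing import List, Set


def fmm_segment(text: str, vocab: Set[str], max_len: int = 4) -> List[str]:
    """Forward maximum matching: greedily take the longest dictionary word."""
    tokens: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest window first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical vocabulary: I / want / buy / one (unit) / computer
VOCAB = {"我", "想", "买", "一台", "电脑"}
print(" / ".join(fmm_segment("我想买一台电脑", VOCAB)))
```

Greedy longest-match is fast but cannot resolve genuinely ambiguous splits; statistical segmenters are usually preferred in production.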
  • Step S3: Vectorize the segmented text, for example by looking up each word's vector in the corpus, thereby converting the text into a combination of high-dimensional vectors.
  • The converted vectors can be expressed as [V1, V2, V3, V4, V5], where V1 to V5 are the word vectors of the respective words in the example sentence.
  • Word vectorization techniques include word2vec and fastText.
  • fastText can handle out-of-vocabulary (OOV) words, so it is used in this embodiment.
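fastText can vectorize OOV words because it represents a word through its character n-grams. The sketch below imitates that idea: the word is wrapped in `<` and `>`, split into n-grams of length 3 to 6 (fastText's defaults for unsupervised word vectors), each n-gram is hashed into a bucket, and the bucket vectors are averaged. CRC-32 hashing and random bucket vectors are stand-ins for fastText's own hash and trained embeddings, so only the mechanism is meaningful, not the numbers.

```python
import random
import zlib
from typing import Dict, List

DIM = 8               # toy dimension; real models use e.g. 100-300
BUCKETS = 2_000_000   # number of hash buckets for n-grams

_cache: Dict[int, List[float]] = {}


def char_ngrams(word: str, minn: int = 3, maxn: int = 6) -> List[str]:
    """Character n-grams of '<word>', in the style of fastText's subword model."""
    w = f"<{word}>"
    return [w[i:i + n] for n in range(minn, maxn + 1) for i in range(len(w) - n + 1)]


def bucket_vector(gram: str) -> List[float]:
    """Deterministic stand-in for one trained n-gram embedding row."""
    b = zlib.crc32(gram.encode("utf-8")) % BUCKETS
    if b not in _cache:
        rng = random.Random(b)
        _cache[b] = [rng.uniform(-1.0, 1.0) for _ in range(DIM)]
    return _cache[b]


def oov_vector(word: str) -> List[float]:
    """Average the n-gram vectors, so even an unseen word gets a vector."""
    vecs = [bucket_vector(g) for g in char_ngrams(word)]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


print(char_ngrams("电脑"))  # n-grams of the word "computer"
```

Words that share n-grams (for example 电脑 and a compound containing it) share bucket rows, which is what lets unseen words land near related known ones.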
  • Step S4: Use the word vectors as the input of the intent classifier to obtain the intent tags.
  • The machine translation multi-tag intent classifier is an intent classification model: its input is the text vectors, and its output is the intent tags corresponding to the intent nodes under the control of the logic tree. The output tags determine which intents are present and their order. The model is obtained by training on a training corpus.
  • The model can be obtained using a machine translation encoder-decoder process based on recurrent neural networks (RNNs).
  • The encoder is the left part of the machine translation model and encodes the input information in order;
  • the decoder is the right part of the machine translation model and decodes the encoded information into an intent tag sequence.
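The encoder-decoder idea can be illustrated with an untrained Elman RNN in plain Python: the encoder folds the word vectors into a hidden state with h_t = tanh(W x_t + U h_{t-1}), and the decoder greedily emits one intent tag per step until an end token. The class name, dimensions, random weights, and the `<eos>`/begin-of-sequence handling are assumptions for illustration; a real model would be trained on the labeled corpus (and typically uses GRU or LSTM cells), so the decoded tags here are arbitrary.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def matvec(m: Matrix, v: List[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class TinySeq2Seq:
    """Untrained Elman-RNN encoder-decoder; all weights are random placeholders."""

    def __init__(self, emb_dim: int, hid: int, tags: List[str], seed: int = 0):
        rng = random.Random(seed)
        self.tags = tags + ["<eos>"]
        self.W_enc = rand_matrix(hid, emb_dim, rng)   # input -> hidden
        self.U_enc = rand_matrix(hid, hid, rng)       # hidden -> hidden
        # One embedding row per output tag, plus a final begin-of-sequence row.
        self.E_dec = rand_matrix(len(self.tags) + 1, hid, rng)
        self.W_dec = rand_matrix(hid, hid, rng)
        self.U_dec = rand_matrix(hid, hid, rng)
        self.V_out = rand_matrix(len(self.tags), hid, rng)  # hidden -> tag scores

    def encode(self, word_vecs: List[List[float]]) -> List[float]:
        """Read the word vectors left to right: h_t = tanh(W x_t + U h_{t-1})."""
        h = [0.0] * len(self.U_enc)
        for x in word_vecs:
            h = [math.tanh(a + b) for a, b in zip(matvec(self.W_enc, x), matvec(self.U_enc, h))]
        return h

    def decode(self, h: List[float], max_steps: int = 5) -> List[str]:
        """Greedy decoding: emit the arg-max tag until <eos> or max_steps."""
        out: List[str] = []
        prev = len(self.tags)  # index of the begin-of-sequence row
        for _ in range(max_steps):
            e = self.E_dec[prev]
            h = [math.tanh(a + b) for a, b in zip(matvec(self.W_dec, e), matvec(self.U_dec, h))]
            logits = matvec(self.V_out, h)
            best = max(range(len(logits)), key=logits.__getitem__)
            if self.tags[best] == "<eos>":
                break
            out.append(self.tags[best])
            prev = best
        return out


model = TinySeq2Seq(emb_dim=4, hid=6, tags=["shopping", "digital", "mobile phone", "Apple"])
sentence = [[0.1, 0.2, 0.0, -0.1], [0.3, -0.2, 0.5, 0.0]]  # stand-in word vectors
print(model.decode(model.encode(sentence)))
```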
  • FIG. 2 is a schematic diagram of the logic tree model.
  • The logic tree is customized in advance from information provided by the merchant; its complexity and granularity depend on how much information is provided during customization.
  • The logic tree is constructed according to the merchant's requirements, the data the merchant provides, and the business architecture information.
  • The logic tree is preset and static for a given scenario, whereas the search tree grows out of the logic tree during the human-machine dialogue and is therefore dynamic.
  • The search tree grows according to a breadth-first traversal of the logic tree, so the search tree is always a subtree of the logic tree.
  • The growth of the search tree depends strictly on the dyeing process on the logic tree.
  • In this way, the intent tag sequence of the intent nodes under the control of the logic tree, i.e., the search tree, is obtained.
  • The logic tree includes the root node 100 and the intent nodes 200 on the first slot level L1 below the root node 100, whose intent tags are A, B, and C. A and B represent entities, i.e., specific intentions such as shopping or eating; C represents a soft node, i.e., an optional node.
  • An "optional node" represents hidden features and user-specific attributes, such as the profile of particular users (age, gender, purchase preferences, and so on), which are preset in the logic tree according to consumption habits.
  • Intent nodes of the same type are mutually exclusive, while nodes of different types are not. As shown in FIG. 2, A and B are of the same type and therefore mutually exclusive, which is indicated by the dashed box enclosing them; the optional node C is of a different type from A and B, so it is not mutually exclusive with them.
  • Mutual exclusion between intent nodes of the same type means that when the topic switches from A to B in the first slot level, the search tree is converted into the intent tag sequence under the control of B in the logic tree, and the conversation then revolves around the search tree of B.
  • Each intent node 200 of the first slot level L1 can be further divided into intent nodes of the second slot level L2. For example, A can be divided into D and E at the next level, which are mutually exclusive; similarly, B can be divided into F and G, which are mutually exclusive. Each pair is enclosed by a dashed box and forms the second slot level L2, and so on.
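A logic tree shaped like FIG. 2 can be represented with a small node type. The sketch below, whose class and field names are assumptions, marks soft nodes, encodes mutual exclusion as a shared group name among same-type siblings, and shows the breadth-first order in which the search tree grows.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class IntentNode:
    tag: str
    soft: bool = False            # optional ("soft") node, e.g. C in FIG. 2
    group: Optional[str] = None   # siblings sharing a group are mutually exclusive
    children: List["IntentNode"] = field(default_factory=list)


def bfs_order(root: IntentNode) -> List[str]:
    """Breadth-first traversal: every node of one slot level before the next."""
    order: List[str] = []
    queue = deque(root.children)
    while queue:
        node = queue.popleft()
        order.append(node.tag)
        queue.extend(node.children)
    return order


def is_topic_switch(current: IntentNode, new: IntentNode) -> bool:
    """Same-group nodes exclude each other: dyeing B replaces A's search tree."""
    return current.group is not None and current.group == new.group


# FIG. 2 as described: A/B exclusive, C soft; D/E under A, F/G under B.
A = IntentNode("A", group="L1", children=[IntentNode("D", group="A"), IntentNode("E", group="A")])
B = IntentNode("B", group="L1", children=[IntentNode("F", group="B"), IntentNode("G", group="B")])
C = IntentNode("C", soft=True)
ROOT = IntentNode("ROOT", children=[A, B, C])

print(bfs_order(ROOT))
```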
  • At a further slot level, the intent node H is an API intent node 210, and so on; at the next slot level below H, the nodes J and K can represent "Apple" and "Samsung" respectively, which are specific attributes of the product, such as brands.
  • For an API intent node, the corresponding intent tag is generally a product category.
  • Product categories are defined according to the needs of the merchant and of ordinary consumers.
  • The position and level of API intent nodes are not fixed; they depend on how much information, and at what granularity, the merchant provides when customizing the logic tree.
  • API intent nodes matter because they are the nodes at which the database is searched for product recommendation information: only when the intent node formed in the filled intent slot is an API intent node can the database be searched and product recommendation information be output.
  • The database is a large structured database that can be provided by the merchant. It may include product-specific information such as model and specification, location in the store or warehouse, and inventory quantity.
  • To output the recommended product information, an output information comparison table in the database is consulted, and the product recommendation information is then rendered into natural language so that the final result is output as natural-language speech.
  • The output information in the comparison table is preferably text. Depending on the required output format, for example on non-robot platforms with a display interface, either text or voice can be output; for voice output, the text is converted into speech before output, e.g., by TTS, and then played.
  • The output information comparison table and the product recommendation information may correspond one-to-one or one-to-many; that is, one recommended product may correspond to several pieces of output information, in which case one of them may be selected at random.
  • The output information comparison table is continuously updated from the merchant's database.
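Step 401 and the output information comparison table can be sketched together: the database is queried only once the tag sequence reaches an API intent node, and one of possibly several phrasings for the matched product is chosen at random. The API node set, database rows, and templates below are hypothetical placeholders for merchant data.

```python
import random
from typing import Dict, List, Optional, Set

API_NODES: Set[str] = {"mobile phone", "air conditioning"}  # hypothetical category nodes

# Hypothetical merchant database rows.
DATABASE: List[Dict[str, object]] = [
    {"category": "mobile phone", "brand": "Apple", "model": "X-128G", "location": "2F, aisle 3", "stock": 5},
    {"category": "mobile phone", "brand": "Samsung", "model": "S-64G", "location": "2F, aisle 4", "stock": 0},
]

# Output information comparison table: one product may map to several phrasings.
OUTPUT_TABLE: Dict[str, List[str]] = {
    "X-128G": ["The {model} is in stock at {location}.",
               "You can find the {model} at {location}."],
}


def recommend(tag_sequence: List[str], rng: random.Random) -> Optional[str]:
    """Search the database only if the sequence reaches an API intent node."""
    category = next((t for t in tag_sequence if t in API_NODES), None)
    if category is None:
        return None  # not an API node yet: ask the user instead
    # Tags below the API node (e.g. a brand) act as attribute filters.
    extra = set(tag_sequence[tag_sequence.index(category) + 1:])
    for row in DATABASE:
        values = {str(v) for v in row.values()}
        if row["category"] == category and row["stock"] > 0 and extra <= values:
            templates = OUTPUT_TABLE.get(str(row["model"]), ["{model}"])
            return rng.choice(templates).format(**row)  # one-to-many: pick one
    return None


print(recommend(["shopping", "digital", "mobile phone", "Apple"], random.Random(0)))
```

The returned text would then go through natural-language rendering and, on the robot, TTS.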
  • The human-computer interaction system also includes a training unit and a prediction unit.
  • During training, the model is trained on manually labeled results; during prediction, the pre-trained model is used to make predictions.
  • Training can be understood as the modeling process.
  • The collected human-computer dialogues are divided according to a certain percentage for use.
  • Training is completed automatically on computing hardware.
  • Prediction means using the trained model in live conversations between the user and the robot.
  • The corresponding training data are obtained through the following steps:
  • Step Sx: Collect the input questions and the corresponding output answers.
  • Step Sy: Label the intent of each input question.
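Steps Sx and Sy plus the percentage split described above might look like the following. The 0.8 default ratio, the fixed seed, and reusing the question/tag pairs of Table 2 as a toy corpus are assumptions; the text does not state the actual percentage.

```python
import random
from typing import List, Tuple

Example = Tuple[str, List[str]]  # (input question, labeled intent tag sequence)


def split_corpus(examples: List[Example], train_ratio: float = 0.8,
                 seed: int = 42) -> Tuple[List[Example], List[Example]]:
    """Shuffle the labeled dialogues and split them by a fixed percentage."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


# Step Sx: collected questions; Step Sy: their labeled intents (pairs from Table 2).
corpus: List[Example] = [
    ("I want to buy something", ["shopping"]),
    ("Mobile phone", ["shopping", "digital products", "mobile phones"]),
    ("Apple's mobile phone", ["shopping", "digital products", "mobile phones", "Apple"]),
    ("128G", ["128G"]),
]
train, held_out = split_corpus(corpus, train_ratio=0.75)
print(len(train), len(held_out))
```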
  • The intent tag sequence is obtained through the following steps:
  • the machine translation multi-tag intent classifier yields one or more intent tags; the intent tags are colored onto the logic tree to form intent nodes, and the intent tag sequence of those nodes under the control of the logic tree is found. The correspondence is shown in Table 2:
  • Table 2
      Input question          | Intent tag sequence
      I want to buy something | Shopping
      Mobile phone            | Shopping, digital products, mobile phones (API node)
      Apple's mobile phone    | Shopping, digital products, mobile phones (API node), Apple
      128G                    | 128G
  • The corpus can be continuously expanded to provide sufficient and rich content for the human-computer interaction of the application.
  • The corpus is a sub-library of the database, i.e., part of the database.
  • The output of the decoder is a path in the logic tree, excluding the root node, so that the machine translation model learns both the classification results and the logical hierarchy. Paths are ordered breadth-first: the search visits the neighboring nodes of the tree first and continues until the whole tree has been traversed.
  • The tags given by the machine translation decoder are sorted according to the logic tree. Especially when some tags are missing, the logic tree strictly filters and sorts the tags produced by machine translation, so the output sequence is always breadth-first, and the search tree is colored according to this output sequence. The search tree is therefore a subset of the logic tree.
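The filtering and breadth-first sorting of the decoder output can be sketched as: compute each logic-tree node's breadth-first rank, drop labels that are not on the tree, remove duplicates, and sort by rank. The tree contents (modeled loosely on FIG. 4) and function names are hypothetical, and filling in missing ancestor tags is not attempted here.

```python
from collections import deque
from typing import Dict, List

# Hypothetical logic tree as parent -> children lists.
TREE: Dict[str, List[str]] = {
    "ROOT": ["dining", "shopping", "price range"],
    "shopping": ["home appliances", "digital", "food", "imported"],
    "digital": ["mobile phone", "tablet", "computer"],
    "mobile phone": ["Apple", "Samsung", "Huawei"],
}


def bfs_rank(tree: Dict[str, List[str]]) -> Dict[str, int]:
    """Position of every node in breadth-first order (ROOT excluded)."""
    rank: Dict[str, int] = {}
    queue = deque(tree["ROOT"])
    while queue:
        node = queue.popleft()
        rank[node] = len(rank)
        queue.extend(tree.get(node, []))
    return rank


def normalize(decoded: List[str], tree: Dict[str, List[str]]) -> List[str]:
    """Drop labels not on the logic tree, deduplicate, and sort breadth-first."""
    rank = bfs_rank(tree)
    kept = {label for label in decoded if label in rank}
    return sorted(kept, key=rank.__getitem__)


print(normalize(["Apple", "shopping", "unknown", "digital", "Apple"], TREE))
```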
  • The human-computer interaction system also provides an interactive labeling system, which can greatly reduce the cost of manual labeling.
  • Based on the previously trained model, it can propose a set of predicted tags in advance and correct human labeling errors.
  • Manual labeling results can also be forcibly corrected.
  • Manual labeling means that a person assigns tags to each sentence.
  • Forced correction means that, during manual labeling, if a tag does not conform to the logic of the logic tree, i.e., the labeler has made a labeling error, the tag is recognized and either forcibly corrected or the labeler is reminded.
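One possible form of the forced-correction check: every label must be a node of the logic tree and must be preceded by its parent. This rule and the tree below are assumptions chosen for illustration; the application does not specify the exact checks the labeling system performs.

```python
from typing import Dict, List, Optional

# Hypothetical logic tree: child -> parent (None = directly under ROOT).
PARENT: Dict[str, Optional[str]] = {
    "shopping": None,
    "dining": None,
    "digital": "shopping",
    "mobile phone": "digital",
    "Apple": "mobile phone",
}


def check_labels(labels: List[str]) -> List[str]:
    """Return the problems found in an annotator's label sequence (empty = valid)."""
    problems: List[str] = []
    for i, label in enumerate(labels):
        if label not in PARENT:
            problems.append(f"'{label}' is not a node of the logic tree")
            continue
        parent = PARENT[label]
        if parent is not None and parent not in labels[:i]:
            problems.append(f"'{label}' must follow its parent '{parent}'")
    return problems


print(check_labels(["shopping", "Apple"]))
```

The labeling tool could either auto-insert the missing parent or show the message as a reminder.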
  • FIG. 3 is a schematic diagram of module communication in the human-computer interaction system of the present application.
  • The human-computer interaction system provided by the present application includes at least the following modules: a business logic dialog module, a question-answer module, and an open domain dialog module; in addition, it includes a scene segmentation module, a machine state control module, a storage module, and a dialog retrieval and generation module. Each module is described in detail below.
  • Scene segmentation module: This module contains a GBDT (gradient boosted decision tree) classifier, which pre-classifies the user's input sentence and sends it to the corresponding module to improve the accuracy of the dialogue system's responses. For example, a question submitted by the user is sent to the question-answer module of the dialogue system, which gives the corresponding answer.
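Routing by the scene segmentation module can be sketched as a dispatch table keyed by the predicted scene. The source uses a GBDT classifier; here a keyword function stands in for it so the example runs without a trained model (a trained model, for instance scikit-learn's `GradientBoostingClassifier` over text features, could be plugged in as `classify`). Scene names, rules, and handlers are hypothetical.

```python
from typing import Callable, Dict

Handler = Callable[[str], str]


def route(utterance: str, classify: Callable[[str], str],
          handlers: Dict[str, Handler], fallback: str = "open_domain") -> str:
    """Send the utterance to the module chosen by the scene classifier."""
    scene = classify(utterance)
    return handlers.get(scene, handlers[fallback])(utterance)


def keyword_scene(text: str) -> str:
    """Keyword stand-in for the GBDT scene classifier (hypothetical rules)."""
    lowered = text.lower()
    if any(w in lowered for w in ("buy", "recommend")):
        return "business_logic"
    if lowered.endswith("?"):
        return "qa"
    return "open_domain"


handlers: Dict[str, Handler] = {
    "business_logic": lambda t: "business logic dialog module: " + t,
    "qa": lambda t: "question-answer module: " + t,
    "open_domain": lambda t: "open domain dialog module: " + t,
}

print(route("I want to buy a phone", keyword_scene, handlers))
```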
  • When the scene segmentation module feeds the user's input into the business logic dialog module, the machine translation multi-tag intent classifier predicts the user's intent.
  • Business logic dialog module: This module contains a logic controller and the machine translation multi-tag intent classifier, and guides the user based on the context of the conversation. It infers the user's preferences from the conversation history and then recommends related products; it can also provide information about the merchant's public services. This module mainly generates revenue for the merchant and can save the merchant substantial labor costs.
  • The machine translation multi-tag intent classifier performs both encoding and decoding. During training, the model learns the structure of each tag and the structure of the logic tree; during prediction, it uses the trained model to output the intent tags and the sequence relationships between them.
  • Open domain dialog module: To make the dialogue system lively and interesting, this module handles the user's wide-ranging questions, such as asking the dialogue system's name or asking about the weather.
  • Answers from the open domain dialog module are lively and interesting; if the merchant requires it, the corresponding content can also be integrated into the business logic dialog module.
  • The open domain dialog module mainly uses related machine translation methods and template-based replies.
  • Question-answer module: This module mainly answers user questions within the scope of the knowledge base, for example about a certain place or about the merchant's current offers. It can give the user the required information directly and accurately.
  • The question-answer module also contains a question-type classifier that divides questions into location inquiries, existence inquiries, availability, entity definition inquiries, enumeration, and other questions.
  • Storage module: Stores the conversation corpus and the knowledge repository in a Lucene or Solr system.
  • Dialog retrieval and generation module: Performs natural language rendering. Once the entity and the user intent are identified, the answer to the question is retrieved and a response is generated by a template-based generation method, which reduces the size of the required input corpus while keeping responses diverse.
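Template-based response generation can be sketched with Python's `string.Template`: retrieved entity values are substituted into one of several templates for the recognized intent, which keeps replies varied without a large corpus. The intent name, slot names, and templates are hypothetical.

```python
import random
from string import Template
from typing import Dict, List

# Hypothetical reply templates keyed by intent; several per intent for variety.
TEMPLATES: Dict[str, List[Template]] = {
    "product_location": [
        Template("The $brand $product is at $location."),
        Template("You'll find $brand ${product}s at $location."),
    ],
}


def render(intent: str, slots: Dict[str, str], rng: random.Random) -> str:
    """Fill a randomly chosen template with the retrieved entity values."""
    return rng.choice(TEMPLATES[intent]).substitute(slots)


print(render("product_location",
             {"brand": "Daikin", "product": "air conditioner", "location": "3F"},
             random.Random(1)))
```

`substitute` raises `KeyError` if a slot is missing, which surfaces retrieval gaps instead of emitting a half-filled sentence.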
  • FIG. 4 is a schematic diagram of a pre-customized logic tree model according to an embodiment of the present application.
  • The logic tree is customized according to information provided by the merchant, and only part of that information is illustrated. In actual operation, the customized logic tree contains far more information than FIG. 4 shows, but the logical hierarchy shown in FIG. 4 stays the same. Specifically, the first slot level of the logic tree in FIG. 4 is divided into two intents, "dining" and "shopping", which are mutually exclusive. The same level also includes "price range", which is a soft node as described above and is not mutually exclusive with "dining" or "shopping".
  • Under the "shopping" node are the nodes "home appliances", "digital", and "food". The same level also includes "imported", which is also a soft node.
  • "Home appliances" can be divided into "air conditioner" and "television"; "air conditioner" is further divided into brands such as "Daikin" and "Midea".
  • The next level of "digital" can be divided into "mobile phone", "tablet", and "computer"; by brand, "mobile phone" can include "Apple", "Samsung", and "Huawei".
  • The next level of "food" can include "fruit", "snacks", and so on. By analogy, further levels can be added based on more granular information provided by the merchant.
  • For example, the user's voice input is: "I want to buy a Daikin air conditioner, the kind that hangs on the wall." The machine-translation multi-label intent classifier converts the speech into a text sentence and performs word segmentation: 我/要/买/大/金/的/空调/,/挂/在/墙/上/的/那/种;
  • it then looks up word vectors in the corpus, converting the text into a combination of high-dimensional vectors, and uses fastText processing to output the intent tags shopping, air conditioner, Daikin, and wall-mounted, which are filled into the logic tree.
  • The resulting intent tag sequence is shopping-home appliances-air conditioner-Daikin-wall-mounted. Its path back to the root is unique, and "air conditioner" is an API intent node. The database can therefore be searched according to this sequence and product recommendation information output, including the product model, the shelf location, and the like. This completes the human-computer interaction process.
  • FIG. 5 is a schematic diagram of a search tree according to Embodiment 2 of the present application.
  • A conventional human-computer interaction process may include the following content:
  • Product recommendation information output: "We recommend the Apple iPhone TP for you."
  • This embodiment differs from Embodiment 1 in how the purchase intention is obtained. In Embodiment 1, the user states the purchase intention directly. In this embodiment, the purchase intention is confirmed through a step-by-step human-machine dialogue.
  • The specific judgment and confirmation process is as follows. Person (1) is a greeting, so the scene segmentation module judges its content directly, without word segmentation, and issues machine (2) as an automatic reply. Word segmentation of person (3) yields two intent tags, "shopping" and "expensive". After the corresponding nodes in the logic tree are colored, there are two paths back to the root node, so the user intent cannot yet be determined and machine (4) is issued for further judgment.
  • Segmenting person (5) yields the intent tag "mobile phone", which is colored in the logic tree. Its intent tag sequence is "shopping-digital-mobile phone", and the path from "mobile phone" back to the root is unique. If "mobile phone" had been preset as an API intent node in the customized logic tree, the database could be searched directly for product recommendation information. In this embodiment's customized logic tree, however, the "mobile phone" slot level is not yet an API intent node, so machine (6) continues to ask for a more granular user intent.
  • FIG. 6 is a schematic diagram of a search tree according to Embodiment 3 of the present application.
  • This is the human-computer interaction process with intent switching. When the user's product demand switches from one situation to another, the main human-machine dialogue may include the following content:
  • Product recommendation information output: "We recommend the Midea 1.5P air conditioner for you."
  • The dialogue from person (1) to machine (4) is the same as in Embodiment 2 and follows the conventional human-machine dialogue mode.
  • Person (3) has completed the coloring of "shopping-digital-mobile phone". The purchase intention then changes in person (5), after which "shopping-digital-tablet" is colored. "Mobile phone" and "tablet" are two mutually exclusive intents at the same slot level. Coloring therefore continues down the "tablet" search tree to the next slot level, and the "mobile phone" search tree is deleted.
  • FIG. 7 is a schematic diagram of a search tree according to Embodiment 4 of the present application.
  • This is the human-computer interaction process with contradictory coexisting intents. It applies when the user raises two or more requirements at the same time, or requests a product that does not exist in the mall. The main human-machine dialogue may include the following content:
  • Promotional output: "The mobile phone store is on the third floor, and the air conditioner store is on the fourth floor. Welcome."
  • FIG. 8 is a schematic diagram of a search tree according to Embodiment 5 of the present application.
  • This is the human-computer interaction process in which an ambiguous node is resolved from bottom to top. The main human-machine dialogue may include the following content:
  • Product recommendation information output: "We recommend the iPhone 8, 128G, for you."
  • At least three intent tag sequences can be obtained: "shopping-digital-mobile phone-Apple", "shopping-digital-tablet-Apple", and "shopping-food-fruit-Apple". The path from "Apple" back to the root is therefore not unique.
  • The machine asks the counter-question of machine (4), so that the currently selected intent node "Apple" has a unique path to the root node.
  • The database is searched according to the intent tag sequence on that unique path, and the product recommendation information is output to complete the human-computer interaction process.
  • Table 3 takes only the purchase of an Apple mobile phone as an example. It lists several kinds of output information that may appear under different conditions.
  • The general principle of the present application is as follows. Based on the information input by the user, intent tags are output through the machine-translation multi-label intent classifier and colored on the logic tree to form intent nodes. The intent tag sequence corresponding to those intent nodes under the control of the logic tree is then found. If the path from the corresponding intent node to the root node is unique, the database can be searched according to the intent tag sequence and product recommendation information output. When the database contains no corresponding product recommendation information, the interaction can also be ended by outputting promotional information.
  • The present application also provides an electronic device, including a processor and a memory. The memory is configured to store a human-computer interaction program which, when read and executed by the processing unit, performs the following operations: outputting intent tags from the text information through the machine-translation multi-label intent classifier, coloring the intent tags on the logic tree to form intent nodes, and finding the intent tag sequence corresponding to the intent nodes under the control of the logic tree; and searching the database according to the intent tag sequence and outputting product recommendation information.
  • Although the above embodiments of the present application provide a shopping guide robot, practical applications are not limited to it.
  • Any electronic device can recommend information through the series of processes described above, provided its processor implements human-computer interaction by calling the human-computer interaction program in the memory.
  • The recommended information is not limited to product information and can be determined according to actual needs.
  • In summary, the present application provides a human-computer interaction method, system, and electronic device thereof.
  • A machine-translation multi-label intent classifier finds an intent tag sequence under the control of a logic tree. This clarifies user needs in the shortest time, provides recommendation information, and lets topics switch freely. While saving the cost of manual communication, it also helps merchants achieve marketing goals and achieves efficient human-machine communication.
  • Those skilled in the art should understand that embodiments of the present invention can be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
  • The computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to work in a particular manner. The instructions stored in the computer-readable memory then produce an article of manufacture including an instruction apparatus.
  • The instruction apparatus implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing.
  • The instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
  • the memory may include non-persistent memory, random access memory (RAM), and/or non-volatile memory in a computer readable medium, such as read only memory (ROM) or flash memory.
  • Memory is an example of a computer readable medium.
  • Computer readable media includes both permanent and non-persistent, removable and non-removable media.
  • Information storage can be implemented by any method or technology.
  • the information can be computer readable instructions, data structures, modules of programs, or other data.
  • Examples of computer storage media include, but are not limited to:
    - phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), and other types of random access memory (RAM);
    - read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), and flash memory or other memory technologies;
    - compact disc read-only memory (CD-ROM), digital versatile disc (DVD), or other optical storage;
    - magnetic cassettes, magnetic tape storage, or other magnetic storage devices;
    - any other non-transmission medium that can be used to store information accessible by a computing device.
  • As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
  • embodiments of the present application can be provided as a method, system, or computer program product.
  • the present application can take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment in combination of software and hardware.
  • the application can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) including computer usable program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

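Embodiments of the present application provide a human-computer interaction method, a system, and an electronic device thereof. The method includes:
- recognizing a user's voice input as text information;
- outputting intent tags from the text information through a machine-translation multi-label intent classifier, coloring the intent tags on a logic tree to form intent nodes, and finding the intent tag sequence corresponding to the intent nodes under the control of the logic tree;
- searching a database according to the intent tag sequence and outputting product recommendation information;
- ending the current human-computer interaction and waiting for the user's next voice input.

Because the intent tag sequence is found under the control of the logic tree, the embodiments quickly clarify user needs, provide recommendation information, switch topics freely, and achieve efficient human-machine communication.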

Description

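Human-Computer Interaction Method, System, and Electronic Device Thereof
Cross-Reference
This application references Chinese Patent Application No. 201711401906.4, entitled "Human-Computer Interaction Method, System, and Electronic Device Thereof" and filed on December 22, 2017, which is incorporated herein by reference in its entirety.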
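Technical Field
The present application relates to a human-computer interaction method, system, and electronic device thereof, and belongs to the technical field of automatic response systems.
Background
In the prior art, a human-machine dialogue in a vertical domain is usually completed in three steps. First, the received speech is converted into text and vectorized. Then, a pre-built knowledge base or database is searched for the corresponding answer. Finally, the answer is delivered to the user as speech, completing the dialogue.

Existing human-machine dialogue systems emphasize meeting the user's needs but do not also consider the merchant's perspective. In particular, they do not address how a dialogue system can meet the merchant's need to recommend products, and thereby serve marketing goals, when the user does not know enough about the full range of products.

In addition, the dialogue content that existing systems can handle is rather limited. When the user switches intent, states contradictory intents, or expresses an ambiguous intent, the interaction may be unable to proceed normally. This seriously affects the user experience and the robot's working efficiency.
Summary
In view of the shortcomings of the prior art, the technical problem to be solved by the present application is to provide a human-computer interaction method, system, and electronic device thereof. A machine-translation multi-label intent classifier finds an intent tag sequence under the control of a logic tree. This clarifies user needs in the shortest time, provides recommendation information, and lets topics switch freely. It saves the cost of manual communication while helping merchants achieve marketing goals and achieving efficient human-machine communication.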
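The technical problem is solved through the following technical solutions:
A human-computer interaction method, including:
Step 100: recognizing the user's voice input as text information; and further including:
Step 200: outputting intent tags from the text information through a machine-translation multi-label intent classifier, coloring the intent tags on a logic tree to form intent nodes, and finding the intent tag sequence corresponding to the intent nodes under the control of the logic tree;
Step 300: when there is one intent tag sequence, going directly to Step 400;
when there are multiple intent tags, determining whether the intent node path corresponding to the intent tag sequence is unique; if so, going to Step 400; otherwise, outputting an inquiry to the user and, after the user replies, going to Step 100;
Step 400: searching a database according to the intent tag sequence and outputting product recommendation information;
Step 500: ending the current human-computer interaction and waiting for the user's next voice input.
Specifically, the coloring in Step 200 includes: filling multiple intent tags into intent slots of the logic tree to form intent nodes.
To accurately recognize the user's voice input, Step 200 further includes:
performing word segmentation and text vectorization on the text information to obtain the corresponding word vectors;
outputting multiple intent tags from the word vectors through the machine-translation multi-label intent classifier, and filling each intent tag into an intent slot of the logic tree to form an intent node.
When the product the user wants does not exist in the database provided by the merchant, Step 200 further includes the following, so that the interaction ends promptly: after the classifier outputs multiple intent tags from the word vectors, determining whether the intent tags can be filled into intent slots of the logic tree. If they cannot, going directly to Step 500; otherwise, continuing to fill.
When the user raises multiple intents at the same time, Step 200 further includes the following, so that the interaction ends promptly: after the classifier outputs multiple intent tags from the word vectors, determining whether the intent slots into which the tags are filled are mutually exclusive intent nodes at the same slot level. If so, going directly to Step 500; otherwise, continuing to fill.
To provide better service to the user, Step 500 further includes: outputting promotional information before the interaction ends.
In Step 300, determining whether the intent node path is unique means determining whether the path back to the root node is unique. The path is traced from the intent node formed by the filled intent slot, along the growth path of the logic tree.
To find the intent node at which the database is searched and product recommendation information is output, Step 400 further includes:
Step 401: determining whether the intent node formed in the filled intent slot is an API intent node; if so, searching the database and outputting product recommendation information; otherwise, outputting an inquiry to the user and, after the user replies, going to Step 100.
An API intent node is an intent node that contains product category information.
To provide more conditions, Step 400 further includes:
determining whether the intent node formed in the filled intent slot is an optional node; if not, going to Step 401; otherwise, outputting an inquiry to the user and, after the user replies, going to Step 100.
To provide dialogue to the user more effectively, the method further includes Step 110, performed after Step 100 and before Step 200:
determining whether the dialogue scene is a business-logic dialogue; if so, going to Step 200; otherwise, entering a question-and-answer dialogue and performing the scene judgment on every user voice input in that dialogue.
Typically, the machine-translation multi-label intent classifier is a recurrent neural network model.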
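The present application also provides a human-computer interaction system, including:
a voice recognition module, configured to recognize the user's voice input as user text information;
a business-logic dialogue module, configured to:
- output multiple intent tags from the user's text information through the machine-translation multi-label intent classifier;
- fill the intent tags into intent slots of the logic tree to form intent nodes;
- find the intent tag sequence corresponding to the intent nodes under the control of the logic tree;
- determine the product to recommend;
a retrieval and dialogue generation module, configured to search the corresponding database according to the product to recommend and output the corresponding product recommendation information.
The human-computer interaction system further includes:
a scene segmentation module, configured to pre-judge the user's input sentence, send it to the corresponding module according to the result, and have that module give the answer.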
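The present application also provides a shopping guide robot system, including a shopping guide robot and a background service terminal. The shopping guide robot includes an interactive screen, a voice recognition unit, a communication unit, a sensor unit, and a walking unit, where the interactive screen and the voice recognition unit are used to recognize the user's voice input as text information;
the communication unit is used for signal and command communication between the shopping guide robot and the background service terminal;
the background service terminal includes a processing unit, a control unit, and a storage unit, where
the storage unit is configured to store a human-computer interaction program which, when read and executed by the processing unit, performs the following operations: outputting intent tags from the text information through the machine-translation multi-label intent classifier, coloring the intent tags on the logic tree to form intent nodes, and finding the intent tag sequence corresponding to the intent nodes under the control of the logic tree; and searching the database according to the intent tag sequence and outputting product recommendation information;
the control unit is configured to output signal commands that control the shopping guide robot to perform corresponding actions.
The interactive screen further includes a microphone, a loudspeaker, and a touch-screen Chinese/English input system connected to the voice recognition unit.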
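The present application also provides an electronic device, including:
a processor; and
a memory configured to store a human-computer interaction program which, when read and executed by the processing unit, performs the following operations: outputting intent tags from the text information through the machine-translation multi-label intent classifier, coloring the intent tags on the logic tree to form intent nodes, and finding the intent tag sequence corresponding to the intent nodes under the control of the logic tree; and searching the database according to the intent tag sequence and outputting product recommendation information.
In summary, the present application provides a human-computer interaction method, system, and electronic device thereof. A machine-translation multi-label intent classifier finds an intent tag sequence under the control of a logic tree. This clarifies user needs in the shortest time, provides recommendation information, and lets topics switch freely. It saves the cost of manual communication while helping merchants achieve marketing goals and achieving efficient human-machine communication.
The technical solutions of the present application are described in detail below with reference to the accompanying drawings and specific embodiments.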
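Brief Description of the Drawings
FIG. 1 is a structural diagram of the machine-translation multi-label intent classifier;
FIG. 2 is a schematic diagram of the logic tree model;
FIG. 3 is a schematic diagram of module communication in the human-computer interaction system of the present application;
FIG. 4 is a schematic diagram of the pre-customized logic tree model of Embodiment 1 of the present application;
FIG. 5 is a schematic diagram of the search tree of Embodiment 2 of the present application;
FIG. 6 is a schematic diagram of the search tree of Embodiment 3 of the present application;
FIG. 7 is a schematic diagram of the search tree of Embodiment 4 of the present application;
FIG. 8 is a schematic diagram of the search tree of Embodiment 5 of the present application.
Detailed Description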
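The present application provides a human-computer interaction method, system, and electronic device thereof. In a specific embodiment, it may be a shopping guide robot or service robot based on machine translation. FIG. 1 is a structural diagram of the machine-translation multi-label intent classifier. As shown in FIG. 1, the classifier includes a logic tree. The logic tree is a tree structure made up of a ROOT node 100 and the intent nodes 200 derived from it at the first, second, third, and further slot levels. Each intent node 200 corresponds to an intent tag 400. Filling an intent tag 400 into an intent slot to form an intent node is called coloring.

Specifically, the working principle of the present application is as follows. Users usually communicate with a shopping guide or service robot by voice, so the user's voice input is first recognized as text information 300, that is, converted into a text sentence. The sentence is then segmented into words, and the segmented sentence is vectorized; this is the work of the encoder part.

The vectors are input into the machine-translation multi-label intent classifier, which outputs intent tags 400. The intent tags 400 are colored on the logic tree as the corresponding intent nodes 200, and the intent tag sequence corresponding to those intent nodes under the control of the logic tree is found. This intent tag sequence forms a search tree within the logic tree; in other words, the search tree is a subset of the logic tree. Because the classifier may output one or more intent tags, one or more search trees may be formed at this point.

When there is one intent tag sequence, the database is searched directly according to it and product recommendation information is output. When there are multiple intent tags, the system determines whether the intent node path corresponding to the intent tag sequence is unique. If it is, the database can likewise be searched according to the sequence and product recommendation information output. Otherwise, an inquiry is output to the user, and the user's reply is again recognized as text and judged. This repeats until one of two things happens: the requirement is met and product recommendation information can be output, or no recommendation can be output, so the interaction ends and the system waits for the user's next voice input.

The above is only the working principle of the present application. Its core is the combination of the machine-translation multi-label intent classifier with the logic tree. Through the search tree formed by the intent tag sequence, it clarifies user needs in the shortest time, provides recommendation information, and achieves efficient human-machine communication.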
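The working principle above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the applicant's implementation:
- The `IntentNode` class, the `resolve` function, and the English tag names are hypothetical.
- The classifier output is given directly as a list of tags.
- "Coloring" is modeled as finding every node that carries a predicted tag.
- The "expensive" child of "price range" is an assumed example.

The colored set resolves to a single intent tag sequence only when its deepest colored nodes share one path back to the root. If there is more than one such path, the system asks the user. If no tag fits anywhere on the tree, the interaction ends.

```python
# Hypothetical sketch of coloring + unique-path check; names and tags are illustrative.
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(eq=False)
class IntentNode:
    """One intent node of the logic tree (hypothetical structure)."""
    tag: str
    parent: Optional["IntentNode"] = field(default=None, repr=False)
    children: List["IntentNode"] = field(default_factory=list, repr=False)

    def add(self, tag: str) -> "IntentNode":
        child = IntentNode(tag, parent=self)
        self.children.append(child)
        return child


def find_nodes(root: IntentNode, tag: str) -> List[IntentNode]:
    """Every node carrying `tag`; the same tag may appear in several branches."""
    found, stack = [], [root]
    while stack:
        node = stack.pop()
        if node.tag == tag:
            found.append(node)
        stack.extend(node.children)
    return found


def path_to_root(node: IntentNode) -> List[str]:
    """Intent tag sequence from the first slot level down to `node` (root excluded)."""
    tags = []
    while node.parent is not None:
        tags.append(node.tag)
        node = node.parent
    return tags[::-1]


def resolve(root: IntentNode, tags: List[str]) -> Tuple[str, List[List[str]]]:
    """Color the predicted tags and decide: 'search', 'ask', or 'end'."""
    colored = {id(n): n for t in tags for n in find_nodes(root, t)}
    if not colored:
        return "end", []  # no tag fits any intent slot
    # Keep only the deepest colored nodes (not an ancestor of another colored node).
    ancestors = set()
    for node in colored.values():
        p = node.parent
        while p is not None:
            ancestors.add(id(p))
            p = p.parent
    leaves = [n for key, n in colored.items() if key not in ancestors]
    paths = [path_to_root(n) for n in leaves]
    if len(paths) == 1:
        return "search", paths  # unique path back to the root: query the database
    return "ask", paths  # several paths: ask the user a follow-up question


# Example logic tree (a small slice of FIG. 4, English tags for readability).
root = IntentNode("ROOT")
shopping = root.add("shopping")
root.add("dining")
root.add("price range").add("expensive")
shopping.add("home appliances").add("air conditioner").add("Daikin")
shopping.add("digital").add("mobile phone").add("Apple")
shopping.add("food").add("fruit").add("Apple")
```

With this tree:
- The tags shopping, air conditioner, and Daikin resolve to one searchable path.
- The tags shopping and expensive give two paths, which triggers a follow-up question, as in Embodiment 2.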
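To quickly determine the user's intent from the voice input and output product recommendation information, the specific steps are as follows:
Step S1: recognizing the user's voice input as text information;
Step S2: performing word segmentation on the text information. In this step, any existing word segmentation tool can be used. For example, the tool segments the text "我想买个电脑" ("I want to buy a computer") as 我/想/买/个/电脑/;
Step S3: vectorizing the segmented text, for example, by looking up word vectors in a corpus to convert the text into a combination of multiple high-dimensional vectors. For the example sentence above, the converted vectors can be expressed as [V1, V2, V3, V4, V5], where V1 to V5 are the word vectors of the respective tokens.
Word vectorization techniques include word2vec and fastText. fastText can handle out-of-vocabulary (OOV) words, so it is used in this embodiment.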
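Steps S2 and S3 can be illustrated with a small sketch that makes two substitutions:
- **Segmentation.** The application allows any existing segmentation tool. For illustration, this sketch swaps in forward maximum matching against a toy dictionary.
- **Vectors.** A toy lookup table stands in for trained word2vec or fastText vectors.

`fmm_segment` and `vectorize` are hypothetical names. Real fastText builds vectors for out-of-vocabulary words from character n-grams; the zero-vector fallback here is only a placeholder for that behavior.

```python
# Hypothetical sketch of Steps S2-S3; dictionary and vectors are toy data.
from typing import Dict, List, Sequence, Set


def fmm_segment(text: str, vocab: Set[str], max_len: int = 4) -> List[str]:
    """Forward maximum matching: at each position take the longest dictionary word,
    falling back to a single character when nothing longer matches."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


def vectorize(tokens: Sequence[str], table: Dict[str, List[float]], dim: int) -> List[List[float]]:
    """Map each token to its vector ([V1, V2, ...]); unknown tokens get zeros here,
    standing in for fastText's subword-based OOV vectors."""
    return [table.get(t, [0.0] * dim) for t in tokens]
```

For the example sentence, `fmm_segment("我想买个电脑", {"我", "想", "买", "个", "电脑"})` gives the five tokens 我/想/买/个/电脑. Each token then maps to one vector, matching [V1, ..., V5].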
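Step S4: using the word vectors as input to the intent classifier to obtain intent tags.
The machine-translation multi-label intent classifier is an intent classification model obtained by training on a training corpus. It takes text vectors as input and outputs intent tags, which are controlled by the intent nodes of the logic tree. The output tags show which intents are present and their sequential relationship. A recurrent neural network (RNN) using the machine-translation encoder-decoder process can be used to obtain the corresponding model. The encoder is the left part of the machine-translation model and encodes the input information in order. The decoder is the right part and decodes the encoded information; this decoding forms the intent tag sequence.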
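FIG. 2 is a schematic diagram of the logic tree model. As shown in FIG. 2 together with FIG. 1, the logic tree is customized in advance according to the merchant's requirements and the information it provides. The tree's complexity and granularity depend on how much information is provided.

The logic tree is built from the merchant's requirements, data, and business-structure information. It is preset for a specific scenario and is static within that scenario. The search tree, by contrast, grows from the logic tree during the human-machine dialogue and changes dynamically. The search tree is generated by coloring the logic tree in breadth-first traversal order, so it is always a subtree of the logic tree. In other words, its growth depends strictly on the coloring process: coloring an intent node of the logic tree yields that node's intent tag sequence under the control of the logic tree, which is the search tree.

FIG. 2 shows a customized logic tree model. Its structure includes a root node 100 and intent nodes 200 at the first slot level L1 under root node 100, whose intent tags are A, B, and C. A and B represent entities, that is, concrete intents such as shopping or dining. C represents a soft node, that is, an optional node. An optional node characterizes hidden features and inherent user attributes, such as a particular user's profile (age, gender, purchase preferences, and so on). These are preset in the logic tree according to consumption habits.

Among intent nodes at the same slot level, nodes of the same type are mutually exclusive and nodes of different types are not. As shown in FIG. 2, A and B are of the same type and mutually exclusive, so they are enclosed together in a dashed box. Optional node C is of a different type, so it is not mutually exclusive with A or B.

Making same-type nodes at the same slot level mutually exclusive is what enables intent switching. Suppose the topic switches from A to B in the first slot level. The search tree becomes the intent tag sequence under the control of B's logic tree, and the topic unfolds around B's search tree. Because B and A are mutually exclusive, A and its subtree are deleted from the search tree, completing one intent switch. Meanwhile, the importance of soft node C keeps decreasing over several rounds of dialogue, so the user's intent becomes increasingly clear.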
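The mutual-exclusion rule can be sketched as follows. The sketch assumes three things:
- Node names are unique.
- The logic tree is stored as a child-to-parent map.
- Each node carries an optional exclusivity group, and soft nodes such as "price range" carry none.

`SearchTree` and its fields are hypothetical. Coloring a node first removes any colored sibling in the same group, together with that sibling's subtree. This is how a switch from A to B deletes A's branch.

```python
# Hypothetical sketch of intent switching via mutually exclusive siblings.
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class SearchTree:
    parent: Dict[str, str]              # child -> parent in the logic tree
    group: Dict[str, Optional[str]]     # exclusivity group; absent/None for soft nodes
    colored: Set[str] = field(default_factory=set)

    def _subtree(self, node: str) -> Set[str]:
        """`node` and all of its descendants in the logic tree."""
        out, stack = set(), [node]
        while stack:
            n = stack.pop()
            out.add(n)
            stack.extend(c for c, p in self.parent.items() if p == n)
        return out

    def _excludes(self, a: str, b: str) -> bool:
        """Same parent and same non-None group means mutually exclusive."""
        g = self.group.get(a)
        return (a != b and g is not None and g == self.group.get(b)
                and self.parent.get(a) == self.parent.get(b))

    def color(self, node: str) -> None:
        """Color `node` and its ancestors, pruning mutually exclusive siblings."""
        path = []
        while node in self.parent:          # stop once the root is reached
            path.append(node)
            node = self.parent[node]
        for n in reversed(path):            # top-down, from the first slot level
            for other in list(self.colored):
                if self._excludes(n, other):
                    self.colored -= self._subtree(other)
            self.colored.add(n)


PARENT = {"dining": "ROOT", "shopping": "ROOT", "price range": "ROOT",
          "expensive": "price range", "home appliances": "shopping",
          "digital": "shopping", "mobile phone": "digital", "tablet": "digital",
          "air conditioner": "home appliances"}
GROUP = {"dining": "intent", "shopping": "intent", "home appliances": "category",
         "digital": "category", "mobile phone": "product", "tablet": "product"}
```

The sketch replays the mobile phone, tablet, and air conditioner switches of Embodiment 3. Coloring "expensive" under the soft node leaves the shopping branch untouched.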
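In addition, each intent node 200 at the first slot level L1 can branch into intent nodes at the second slot level L2, and so on. For example, A can be divided into next-level intent nodes D and E, which are mutually exclusive. Likewise, B can be divided into next-level intent nodes F and G, which are mutually exclusive. Each pair is enclosed in a dashed box and forms the second slot level L2.

Suppose A represents the concrete intent "shopping". D and E at the next slot level can then represent "home appliances" and "digital", and H and I at the slot level below E represent "mobile phone" and "computer". H and I already refer to specific product categories, so the intent nodes at this slot level are API intent nodes 210. Similarly, J and K at the slot level below H can represent "Apple" and "Samsung", which are specific product attributes such as brand.

In the logic tree, an API intent node is an intent node whose intent tag is generally a product category. Product categories are defined according to the merchant's requirements and ordinary consumer usage, for example, air conditioners, televisions, and mobile phones. The position and level of API intent nodes are not fixed; they depend on the amount and granularity of information the merchant provides when the logic tree is customized. An API intent node matters because it is where the database is searched for product recommendation information. Only when the intent node formed in the filled intent slot is an API intent node can the database be searched and product recommendation information output.

The database is a large structured database that can be provided by the merchant. It includes specific product-related information such as model specifications, placement in the mall or warehouse, and stock quantity.

To make it convenient for the user to obtain product recommendation information, the system proceeds as follows once the product to recommend is determined. It first outputs the recommended product information by consulting an output information lookup table in the database. It then applies natural-language rendering so that the final result is output as speech in natural language. The output information in the lookup table is preferably text. The output form depends on the platform's requirements. For example, on non-robot platforms with a display, text can be output. Alternatively, speech can be output by converting the text to speech before output, for example via TTS, and then playing it.

The lookup table and the product recommendation information can be in a one-to-one or one-to-many relationship. In the one-to-many case, one recommended product corresponds to multiple output messages, and one of them can be selected at random. To provide more convenient and accurate service, the output information lookup table is continuously updated based on the merchant's database.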
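The API-node rule and the one-to-many output lookup table can be sketched as below. The sketch rests on these assumptions:
- `recommend`, the product records, and the lookup-table contents are hypothetical stand-ins for the merchant's structured database.
- A product matches when its tags contain every tag on the resolved path.
- `random.Random.choice` picks among several output messages for one product.
- If no product matches, the sketch returns a promotional fallback, following the general principle given with Table 3.

```python
# Hypothetical sketch: API-node gate, database match, one-to-many output table.
import random
from typing import Dict, List, Optional, Set, Tuple


def recommend(path: List[str],
              api_nodes: Set[str],
              products: List[dict],
              output_table: Dict[str, List[str]],
              rng: random.Random) -> Tuple[str, Optional[str]]:
    """Return ('ask' | 'recommend' | 'promote', message)."""
    if not any(tag in api_nodes for tag in path):
        # No product category on the path yet: keep asking for a finer intent.
        return "ask", None
    wanted = set(path)
    for product in products:
        if wanted <= product["tags"]:
            # One product may map to several output messages; pick one at random.
            return "recommend", rng.choice(output_table[product["sku"]])
    return "promote", None  # nothing matches: end with promotional information


# Toy data (hypothetical); API nodes follow the FIG. 2 example.
API_NODES = {"air conditioner", "television", "mobile phone", "computer"}
PRODUCTS = [
    {"sku": "AC-001",
     "tags": {"shopping", "home appliances", "air conditioner", "Daikin", "wall-mounted"}},
]
OUTPUT_TABLE = {
    "AC-001": ["Daikin wall-mounted AC-001, aisle 3, shelf B.",
               "We recommend the Daikin AC-001 (aisle 3, shelf B)."],
}
```

The returned text would then go through natural-language rendering and, on a robot, text-to-speech.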
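Like most machine learning systems, the human-computer interaction system provided by the present application is divided into a training unit and a prediction unit. Training performs machine learning on manually annotated results, and prediction uses the pre-trained model. Training can be understood as the modeling process: it uses a certain percentage of all human-machine dialogues and is completed automatically on computing hardware. Prediction puts the trained model into actual use in interactions between users and the robot.
During training, the corresponding data is obtained through the following steps:
Step Sx: collecting the correspondence between input questions and output answers;
Step Sy: annotating the intent of each input question pair.
After Steps Sx and Sy, the data shown in Table 1 is obtained: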
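Table 1
Input question | Intent
我要买东西 ("I want to buy something") | shopping
手机 ("mobile phone") | shopping, digital products, mobile phone (API node)
苹果的手机 ("Apple's mobile phone") | shopping, digital products, mobile phone (API node), Apple
128G | 128G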
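During prediction, the intent tag sequence is obtained as follows:
The word vectors of the text information are used as input to the machine-translation multi-label intent classifier, which outputs one or more intent tags. The intent tags are colored on the logic tree to form intent nodes, and the intent tag sequence corresponding to the intent nodes under the control of the logic tree is found. The correspondence is shown in Table 2: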
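Table 2
Input question | Intent tag sequence
我要买东西 ("I want to buy something") | shopping
手机 ("mobile phone") | shopping, digital products, mobile phone (API node)
苹果的手机 ("Apple's mobile phone") | shopping, digital products, mobile phone (API node), Apple
128G | 128G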
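This corpus training method allows the corpus to be expanded continuously, providing sufficient and rich corpus content for human-computer interaction in the present application. The corpus is in fact a sub-library of the database, that is, part of the database.

During training, the decoder output consists of one path of the logic tree, excluding the root node, so that the machine-translation model learns both the classification result and the logical hierarchy. The path sequence follows breadth-first order: the search visits the adjacent nodes of the tree first and recursively traverses the whole tree until the search is complete.

During prediction, the tags given by the machine-translation decoder are sorted according to the logic tree. In particular, when some tags are missing, the logic tree strictly filters and sorts the tags, so the output sequence is always a breadth-first search tree. The search tree is colored and generated from this output sequence, so it is a subset of the logic tree.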
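In addition to the above units, the human-computer interaction system also includes a manual annotation assistance system, which can greatly reduce annotation cost. Annotation assistance works in two ways:
- **Pre-labeling.** The machine learning method uses the old model to give a set of predicted tag values in advance, and annotators correct the wrong results.
- **Forced correction.** Because of the logic tree, manually annotated results can also be forcibly corrected.

Manual annotation means that an annotator assigns tags to each sentence in light of its preceding and following context. Forced correction means that an annotation inconsistent with the logic of the logic tree is detected during manual annotation. It is then either forcibly corrected or flagged to the annotator as an error.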
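A minimal sketch of the forced-correction check follows. It assumes unique node names and a child-to-parent map; `check_annotation` and `suggest_fix` are hypothetical. An annotated sequence is accepted only if two conditions hold:
- every tag is on the logic tree;
- each tag is a child of the one before it.

Otherwise the problems are reported. A corrected sequence, the tree path ending at the last tag, can then be proposed to the annotator.

```python
# Hypothetical sketch of annotation checking against the logic tree.
from typing import Dict, List


def check_annotation(parent: Dict[str, str], labels: List[str]) -> List[str]:
    """Return human-readable problems; an empty list means the annotation is consistent."""
    problems = [f"{t!r} is not on the logic tree" for t in labels if t not in parent]
    for prev, cur in zip(labels, labels[1:]):
        if cur in parent and parent[cur] != prev:
            problems.append(f"{cur!r} is not a child of {prev!r}")
    return problems


def suggest_fix(parent: Dict[str, str], label: str, root: str = "ROOT") -> List[str]:
    """Propose the tree path that ends at `label` as the corrected annotation."""
    path = []
    while label != root:
        path.append(label)
        label = parent[label]
    return path[::-1]


PARENT = {"shopping": "ROOT", "digital": "shopping",
          "mobile phone": "digital", "Apple": "mobile phone"}
```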
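FIG. 3 is a schematic diagram of module communication in the human-computer interaction system of the present application. As shown in FIG. 3, to implement the interaction flow smoothly, the system includes at least a business-logic dialogue module, a question-and-answer module, and an open-domain dialogue module. It also includes a scene segmentation module, a machine state control module, a storage module, and a dialogue retrieval and generation module. The functions of these modules are described in detail below.
Scene segmentation module: this module contains a GBDT classifier that pre-judges the user's input sentence and sends it to the corresponding module, improving the accuracy of the dialogue system's answers. For example, a question the user asks is sent to the question-and-answer module, which gives the corresponding answer. When the user input is sent to the business-logic dialogue module instead, the machine-translation multi-label intent classifier predicts the user intent.
Business-logic dialogue module: this module contains a logic controller and the machine-translation multi-label intent classifier. It guides the user according to the context of the dialogue, infers the user's preferences from the dialogue history, and recommends related products. It can also provide information about some of the merchant's public services. This is the main module through which the merchant generates revenue, and it can save the merchant considerable labor costs. The classifier includes encoding and decoding functions. During training, the machine-translation model learns each intent tag and the structure of the logic tree. During prediction, it uses the trained model to output intent tags and the sequential relationship among them.
Open-domain dialogue module: this module keeps the dialogue system lively rather than rigid by handling broad user questions, such as asking the system's name or the weather. Its answers are lively and interesting, and the merchant can have the corresponding content integrated into the business-logic dialogue module if needed. The module mainly uses context-dependent machine translation methods and template reply methods.
Question-and-answer module: this module mainly handles users' inquiries within the scope of the knowledge base, for example, asking about a particular location or a merchant's current promotions. It can directly and accurately provide the information the user needs. It contains an elementary question-type classifier that divides questions into location inquiries, existence inquiries, feasibility inquiries, entity-definition inquiries, enumeration, and other questions.
Storage module: stores the dialogue corpus and the knowledge base in a Lucene or Solr system.
Retrieval and dialogue generation module: performs natural-language rendering. Once the entity and the user intent are identified, it retrieves the answer to the question and generates a reply with a template-based method. This diversifies replies while reducing the amount of corpus that must be entered.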
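Template-based reply generation can be sketched with Python's standard `string.Template`. The slot names and template strings below are hypothetical. The point is that a few templates per answer type, filled with retrieved entity values, vary the wording without a separate corpus sentence for every product.

```python
# Hypothetical sketch of template-based reply generation; templates and slots are illustrative.
import random
from string import Template
from typing import Dict, List


def render_reply(slots: Dict[str, str], templates: List[str], rng: random.Random) -> str:
    """Pick one template at random and fill it with the retrieved slot values.
    Template.substitute raises KeyError if a required slot is missing."""
    return Template(rng.choice(templates)).substitute(slots)


RECOMMEND_TEMPLATES = [
    "We recommend the $brand $product for you.",
    "How about the $brand $product? It is on the $floor floor.",
]
```

Using `substitute` rather than `safe_substitute` is a deliberate choice in this sketch. A missing slot then fails loudly instead of leaking a raw `$placeholder` to the user.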
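The modules listed above are interconnected and interdependent. However, the absence of any single module does not affect the stability of the whole dialogue system: when some modules are missing, answer quality degrades but the system does not go down. Those skilled in the art can therefore select and combine the above modules according to actual needs and apply them in a human-computer interaction system to implement the corresponding functions.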
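The scene segmentation module and this graceful degradation can be sketched as a dispatcher. Assumptions:
- The classifier is passed in as a plain function.
- The keyword rule in the example is only a stand-in for the trained GBDT classifier.
- `route`, the scene names, and the handlers are hypothetical.

If the handler for a predicted scene is missing, the utterance falls back to another module instead of failing.

```python
# Hypothetical sketch of scene routing with fallback; the classifier is a stand-in for GBDT.
from typing import Callable, Dict

Handler = Callable[[str], str]


def route(utterance: str,
          classify: Callable[[str], str],
          handlers: Dict[str, Handler],
          fallback: str = "open_domain") -> str:
    """Send the utterance to the module for its predicted scene; degrade to `fallback` if absent."""
    scene = classify(utterance)
    handler = handlers.get(scene) or handlers[fallback]
    return handler(utterance)


def keyword_classifier(text: str) -> str:
    """Illustrative stand-in for the GBDT scene classifier."""
    if "buy" in text:
        return "business_logic"
    if "where" in text:
        return "qa"
    return "open_domain"


HANDLERS: Dict[str, Handler] = {
    "business_logic": lambda t: "BUSINESS: " + t,
    "qa": lambda t: "QA: " + t,
    "open_domain": lambda t: "CHAT: " + t,
}
```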
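The present application also provides a shopping guide robot system, including a shopping guide robot and a background service terminal. The shopping guide robot includes an interactive screen, a voice recognition unit, a communication unit, a sensor unit, and a walking unit, where the interactive screen and the voice recognition unit are used to recognize the user's voice input as text information;
the communication unit is used for signal and command communication between the shopping guide robot and the background service terminal;
the background service terminal includes a processing unit, a control unit, and a storage unit, where
the storage unit is configured to store a human-computer interaction program which, when read and executed by the processing unit, performs the following operations: outputting intent tags from the text information through the machine-translation multi-label intent classifier, coloring the intent tags on the logic tree to form intent nodes, and finding the intent tag sequence corresponding to the intent nodes under the control of the logic tree; and searching the database according to the intent tag sequence and outputting product recommendation information;
the control unit is configured to output signal commands that control the shopping guide robot to perform corresponding actions.
The interactive screen further includes a microphone, a loudspeaker, and a touch-screen Chinese/English input system connected to the voice recognition unit.
The human-computer interaction process of the present application is described in detail below through specific embodiments.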
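Embodiment 1
FIG. 4 is a schematic diagram of the pre-customized logic tree model of Embodiment 1 of the present application. As shown in FIG. 4, the logic tree is customized according to information provided by the merchant, and only part of that information is illustrated. In actual operation, the customized logic tree contains far more information than FIG. 4 shows, but the logical hierarchy shown in FIG. 4 stays the same.

The first slot level under the root node is divided into two intents, "dining" and "shopping", which are mutually exclusive. The same level also includes "price range", which is a soft node as described above and is not mutually exclusive with "dining" or "shopping". Under the "shopping" node are the nodes "home appliances", "digital", and "food"; the same level also includes "imported", which is likewise a soft node.

The lower levels are as follows:
- "Home appliances" can be divided into "air conditioner" and "television", and "air conditioner" is further divided into brands such as "Daikin" and "Midea".
- The next level of "digital" can be divided into "mobile phone", "tablet", and "computer"; by brand, "mobile phone" can include "Apple", "Samsung", "Huawei", and so on.
- The next level of "food" can include "fruit", "snacks", and so on.

By analogy, further levels can be added according to more granular information provided by the merchant.

After the logic tree shown in FIG. 4 is prepared, suppose the user's voice input is "我要买大金的空调,挂在墙上的那种" ("I want to buy a Daikin air conditioner, the kind that hangs on the wall"). The machine-translation multi-label intent classifier converts the speech into a text sentence and segments it: 我/要/买/大/金/的/空调/,/挂/在/墙/上/的/那/种. It then looks up word vectors in the corpus to convert the text into a combination of high-dimensional vectors. Through fastText processing it obtains the output intent tags shopping, air conditioner, Daikin, and wall-mounted.

These intent tags are filled into intent nodes of the logic tree, and the corresponding intent tag sequence is found: shopping-home appliances-air conditioner-Daikin-wall-mounted. From the intent node "wall-mounted", the path back to the root of the logic tree runs along "Daikin-air conditioner-home appliances-shopping" and is unique. Because "air conditioner" is already an API intent node, the database can be searched according to this intent tag sequence and product recommendation information output. The result is a specific recommendation that includes the product model, the shelf location, and so on, completing the human-computer interaction process.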
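Embodiment 2
FIG. 5 is a schematic diagram of the search tree of Embodiment 2 of the present application. As shown in FIG. 5, for a conventional human-computer interaction flow, the main human-machine dialogue may include the following content:
Person (1): Hello!
Machine (2): Hello! Would you like to shop or eat?
Person (3): Shopping, something pricier.
Machine (4): What would you like to buy?
Person (5): A mobile phone.
Machine (6): Which brand would you like?
Person (7): An Apple phone.
Machine (8): How much memory would you like?
Person (9): 4G.
Output product recommendation information: "We recommend the Apple iPhone TP for you."
This embodiment differs from Embodiment 1 in how the purchase intention is obtained. In Embodiment 1, the user states the purchase intention directly. In this embodiment, the purchase intention is confirmed through a step-by-step human-machine dialogue. The process runs as follows:
- Person (1) is a greeting, so the scene segmentation module judges its content directly, without word segmentation, and issues machine (2) as an automatic reply.
- Word segmentation of person (3) yields two intent tags, "shopping" and "expensive". After the corresponding nodes in the logic tree are colored, there are two paths back to the root node. The user intent cannot yet be determined, so machine (4) is issued for further judgment.
- Segmenting person (5) yields the intent tag "mobile phone", which is colored in the logic tree. Its intent tag sequence is "shopping-digital-mobile phone", and the path from "mobile phone" back to the root is unique. If "mobile phone" had been preset as an API intent node, the database could be searched directly at this point. In this embodiment's customized logic tree, however, the intent node at the "mobile phone" slot level is not yet an API intent node, so machine (6) continues to ask for a more granular user intent.
- Based on the answers in person (7) and person (9), coloring continues from the previously colored "mobile phone" node down through the next levels, yielding the intent tag sequence "shopping-digital-mobile phone-Apple-4G". The database is searched according to this sequence and product recommendation information is output, completing the human-computer interaction process.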
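Embodiment 3
FIG. 6 is a schematic diagram of the search tree of Embodiment 3 of the present application. FIG. 6 shows the interaction flow with intent switching. When the user's product demand switches from one situation to another, the main human-machine dialogue may include the following content:
Person (1): Hello!
Machine (2): Hello! Would you like to shop or eat?
Person (3): I want to buy a mobile phone, an expensive one.
Machine (4): OK, which brand would you like?
Person (5): Never mind, I'll look at tablets instead.
Machine (6): Which brand would you like?
Person (7): Let me look at Samsung tablets.
Machine (8): How much memory would you like?
Person (9): Actually, I want to buy an air conditioner.
Machine (10): What horsepower rating (P) would you like?
Person (11): 1.5P, Midea.
Output product recommendation information: "We recommend the Midea 1.5P air conditioner for you."
In this embodiment, the dialogue from person (1) to machine (4) is the same as in Embodiment 2 and follows the conventional mode. The intent then switches twice:
- Person (3) has already colored "shopping-digital-mobile phone". The purchase intention changes in person (5), after which "shopping-digital-tablet" is colored as well. "Mobile phone" and "tablet" are two mutually exclusive intents at the same slot level. Coloring therefore continues down the "tablet" search tree to the next slot level, and the "mobile phone" search tree is deleted.
- Another intent switch occurs in person (9), and the intent sequence becomes "shopping-home appliances-air conditioner-Midea-1.5P". The database is searched according to this intent tag sequence and product recommendation information is output, completing the human-computer interaction process.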
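Embodiment 4
FIG. 7 is a schematic diagram of the search tree of Embodiment 4 of the present application. FIG. 7 shows the interaction flow with contradictory coexisting intents. This covers two cases: the user raises two or more requirements at the same time, or requests a product that does not exist in the mall. The main human-machine dialogue may include the following content:
Person (1): Hello!
Machine (2): Hello! Would you like to shop or eat?
Person (3): I want to buy an Apple phone and an air conditioner.
After person (3), two parallel intent tag sequences are found: "shopping-digital-mobile phone-Apple" and "shopping-home appliances-air conditioner". In these, "digital" and "home appliances" are mutually exclusive intents at the same slot level. The user's input intents therefore cannot form a search tree under a single path, so a generic response is used and no further logical guidance is performed.
Although no clear intent is reached in this interaction, the system can still serve the user further. Once it determines that logical guidance is impossible, it can proactively output promotional information, for example: "The mobile phone store is on the third floor and the air conditioner store is on the fourth floor. Welcome."
If the human-machine dialogue is:
Person (1): Hello, I want to buy a bomb.
(Switch to the chit-chat module)
Machine (2): There is a big promotional sale on the second floor of the mall; we recommend taking a look.
When person (1) is segmented, the customized logic tree obviously contains no intent node corresponding to "bomb". In this case, the system can switch to the chit-chat module or proactively output promotional information.
If the human-machine dialogue is:
Person (1): Hello, I want to buy an Apple phone running Android.
Machine (2): No such product exists.
Similarly, segmenting person (1) can only yield the intent tag sequences "shopping-digital-mobile phone-Apple-iOS" and "shopping-digital-mobile phone-Samsung-Android" in the customized logic tree. "Apple" and "Samsung" are mutually exclusive intent nodes at the same slot level, so logical guidance is impossible. The only reply to the inquiry is "No such product exists."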
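The contradiction check in this embodiment can be sketched as a comparison of two root-excluded intent tag sequences. They conflict when, at the first slot level where they differ, both nodes belong to the same exclusivity group. The group labels and the `conflicting` function are hypothetical. The two sequences share every tag above the point where they diverge, so the differing nodes are siblings.

```python
# Hypothetical sketch of detecting contradictory coexisting intents.
from typing import Dict, List, Optional


def conflicting(seq_a: List[str], seq_b: List[str],
                group: Dict[str, Optional[str]]) -> bool:
    """True if the two paths split at mutually exclusive sibling nodes."""
    for a, b in zip(seq_a, seq_b):
        if a != b:
            g = group.get(a)
            return g is not None and g == group.get(b)
    return False  # one path is a prefix of the other: no contradiction


# Soft nodes such as "price range" are deliberately left out (no group).
GROUP = {
    "dining": "intent", "shopping": "intent",
    "home appliances": "category", "digital": "category",
    "Apple": "brand", "Samsung": "brand",
}
```

Both cases in this embodiment are reported as conflicts: the phone plus air conditioner request, and the Apple phone running Android. A shopping request combined with a price preference is not.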
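Embodiment 5
FIG. 8 is a schematic diagram of the search tree of Embodiment 5 of the present application. FIG. 8 shows the interaction flow in which an ambiguous node is resolved from bottom to top. The main human-machine dialogue may include the following content:
Person (1): Hello!
Machine (2): Hello! Would you like to shop or eat?
Person (3): I want to buy "苹果" (Apple / apple).
Machine (4): Do you want a mobile phone, a tablet, or fruit?
Person (5): A mobile phone.
Machine (6): OK, how much memory would you like?
Person (7): 128G.
Output product recommendation information: "We recommend the iPhone 8, 128G, for you."
After person (3), at least three intent tag sequences can be obtained: "shopping-digital-mobile phone-Apple", "shopping-digital-tablet-Apple", and "shopping-food-fruit-Apple". The path from "Apple" back to the root is therefore not unique. When the system cannot decide, it issues the counter-question of machine (4) so that the currently selected intent node "Apple" has a unique path to the root node. The database is then searched according to the intent tag sequence on that unique path, and product recommendation information is output, completing the human-computer interaction process.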
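The bottom-up disambiguation can be sketched with the logic tree stored as nested dictionaries. This layout is an assumption; it lets the same tag ("Apple") appear under different parents. `paths_to`, `clarify`, and `narrow` are hypothetical names. The counter-question lists the parent of each candidate reading, and the user's reply keeps only the matching path.

```python
# Hypothetical sketch of resolving an ambiguous intent node by asking the user.
from typing import Dict, List, Optional, Tuple

Tree = Dict[str, "Tree"]


def paths_to(tree: Tree, tag: str, prefix: Tuple[str, ...] = ()) -> List[List[str]]:
    """All root-excluded paths in a nested-dict logic tree that end at `tag`."""
    found = []
    for name, children in tree.items():
        path = prefix + (name,)
        if name == tag:
            found.append(list(path))
        found.extend(paths_to(children, tag, path))
    return found


def clarify(paths: List[List[str]]) -> Optional[str]:
    """None if the path is unique; otherwise a counter-question naming each reading's parent."""
    if len(paths) <= 1:
        return None
    options = [p[-2] if len(p) > 1 else p[0] for p in paths]
    return "Do you want " + ", ".join(options[:-1]) + " or " + options[-1] + "?"


def narrow(paths: List[List[str]], answer: str) -> List[List[str]]:
    """Keep only the readings that contain the tag from the user's reply."""
    return [p for p in paths if answer in p]


LOGIC_TREE: Tree = {
    "shopping": {
        "digital": {"mobile phone": {"Apple": {}}, "tablet": {"Apple": {}}},
        "food": {"fruit": {"Apple": {}}},
    },
}
```

"Apple" yields three paths and the question "Do you want mobile phone, tablet or fruit?". The reply "mobile phone" leaves a single path that can be used to search the database.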
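Embodiment 6
Table 3 below takes only the purchase of an Apple mobile phone as an example and lists several kinds of output information that may appear under different conditions:
Table 3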
Figure PCTCN2018107891-appb-000001
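As the information in Table 3 shows, the general principle of the present application is as follows. Based on the information input by the user, intent tags are output through the machine-translation multi-label intent classifier and colored on the logic tree to form intent nodes. The intent tag sequence corresponding to those intent nodes under the control of the logic tree is then found. If the path from the corresponding intent node to the root node is unique, the database can be searched according to the intent tag sequence and product recommendation information output. When the database contains no corresponding product recommendation information, the interaction can also be ended by outputting promotional information.

The present application also provides an electronic device, including a processor and a memory. The memory is configured to store a human-computer interaction program which, when read and executed by the processing unit, performs the following operations: outputting intent tags from the text information through the machine-translation multi-label intent classifier, coloring the intent tags on the logic tree to form intent nodes, and finding the intent tag sequence corresponding to the intent nodes under the control of the logic tree; and searching the database according to the intent tag sequence and outputting product recommendation information.

Although the above embodiments of the present application provide a shopping guide robot, practical applications are not limited to it. Any electronic device can recommend information through the series of processes described above, provided its processor implements human-computer interaction by calling the human-computer interaction program in the memory. Obviously, the recommended information is not limited to product information and can be determined according to actual needs.

In summary, the present application provides a human-computer interaction method, system, and electronic device thereof. A machine-translation multi-label intent classifier finds an intent tag sequence under the control of a logic tree. This clarifies user needs in the shortest time, provides recommendation information, and lets topics switch freely. It saves the cost of manual communication while helping merchants achieve marketing goals and achieving efficient human-machine communication.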
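Those skilled in the art should understand that embodiments of the present invention may be provided as a method, a system, or a computer program product. Therefore, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to embodiments of the present invention. Each flow and/or block in the flowcharts and/or block diagrams, and any combination of them, can be implemented by computer program instructions. These instructions can be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine. The instructions executed by that processor then produce an apparatus for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to work in a particular manner. The instructions stored in the computer-readable memory then produce an article of manufacture including an instruction apparatus, which implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions can also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on it to produce computer-implemented processing. The instructions executed on the computer or other programmable device then provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include non-persistent memory in a computer-readable medium, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and information storage can be implemented by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to:
- phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), and other types of random access memory (RAM);
- read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), and flash memory or other memory technologies;
- compact disc read-only memory (CD-ROM), digital versatile disc (DVD), or other optical storage;
- magnetic cassettes, magnetic tape or magnetic disk storage, or other magnetic storage devices;
- any other non-transmission medium that can be used to store information accessible by a computing device.

As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
It should also be noted that the terms "include", "comprise", and any other variants thereof are intended to cover non-exclusive inclusion. A process, method, product, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, product, or device. Without further limitation, an element introduced by the phrase "including a ..." does not exclude additional identical elements in the process, method, product, or device that includes it.
Those skilled in the art should understand that embodiments of the present application may be provided as a method, a system, or a computer program product. Therefore, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The above are merely embodiments of the present application and are not intended to limit it. For those skilled in the art, the present application may have various modifications and variations. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present application shall fall within the scope of the claims of the present application.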

Claims (15)

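  1. A human-computer interaction method, characterized by comprising:
    recognizing a user's voice input as text information;
    outputting intent tags from the text information through a machine-translation multi-label intent classifier, coloring the intent tags on a logic tree to form intent nodes, and finding the intent tag sequence corresponding to the intent nodes under the control of the logic tree;
    searching a database according to the intent tag sequence and outputting product recommendation information.
  2. The human-computer interaction method according to claim 1, characterized in that:
    when there is one intent tag sequence, the database is searched directly according to the intent tag sequence and product recommendation information is output;
    when there are multiple intent tags, it is determined whether the intent node path corresponding to the intent tag sequence is unique. If so, the database is searched according to the intent tag sequence and product recommendation information is output. Otherwise, an inquiry is output to the user and, after the user replies, processing returns to recognizing the user's voice input as text information.
  3. The human-computer interaction method according to claim 1, characterized in that the coloring comprises: filling multiple intent tags into intent slots of the logic tree to form intent nodes.
  4. The human-computer interaction method according to claim 1, characterized in that outputting intent tags from the text information through the machine-translation multi-label intent classifier, coloring the intent tags on the logic tree to form intent nodes, and finding the intent tag sequence corresponding to the intent nodes under the control of the logic tree further comprises:
    performing word segmentation and text vectorization on the text information to obtain corresponding word vectors;
    outputting multiple intent tags from the word vectors through the machine-translation multi-label intent classifier, and filling each intent tag into an intent slot of the logic tree to form an intent node.
  5. The human-computer interaction method according to claim 4, characterized in that outputting intent tags from the text information through the machine-translation multi-label intent classifier, coloring the intent tags on the logic tree to form intent nodes, and finding the intent tag sequence corresponding to the intent nodes under the control of the logic tree further comprises: after the machine-translation multi-label intent classifier outputs multiple intent tags from the word vectors, determining whether the intent tags can be filled into intent slots of the logic tree; if they cannot, directly ending the current human-computer interaction; otherwise, continuing to fill.
  6. The human-computer interaction method according to claim 4, characterized in that outputting intent tags from the text information through the machine-translation multi-label intent classifier, coloring the intent tags on the logic tree to form intent nodes, and finding the intent tag sequence corresponding to the intent nodes under the control of the logic tree further comprises:
    after the machine-translation multi-label intent classifier outputs multiple intent tags from the word vectors, determining whether the intent slots of the logic tree into which the intent tags are filled are mutually exclusive intent nodes at the same slot level; if so, directly ending the current human-computer interaction; otherwise, continuing to fill.
  7. The human-computer interaction method according to claim 5 or 6, characterized in that promotional information is output before the current human-computer interaction ends.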
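  8. The human-computer interaction method according to claim 1, characterized in that determining whether the intent node path corresponding to the intent tag sequence is unique specifically comprises: determining whether the path back to the root node is unique, where the path is traced from the intent node formed by the filled intent slot along the growth path of the logic tree.
  9. The human-computer interaction method according to claim 1, characterized in that searching the database according to the intent tag sequence and outputting product recommendation information further comprises:
    determining whether the intent node formed in the filled intent slot is an API intent node; if so, searching the database and outputting product recommendation information; otherwise, outputting an inquiry to the user and, after the user replies, returning to the processing of recognizing the user's voice input as text information.
  10. The human-computer interaction method according to claim 8, characterized in that searching the database according to the intent tag sequence and outputting product recommendation information further comprises:
    determining whether the intent node formed in the filled intent slot is an optional node, and then:
    - if it is not, determining whether that intent node is an API intent node. If so, searching the database and outputting product recommendation information; otherwise, outputting an inquiry to the user and, after the user replies, returning to the processing of recognizing the user's voice input as text information;
    - if it is an optional node, outputting an inquiry to the user and, after the user replies, returning to the processing of recognizing the user's voice input as text information.
  11. The human-computer interaction method according to claim 1, characterized in that,
    after recognizing the user's voice input as text information and
    before outputting intent tags from the text information through the machine-translation multi-label intent classifier, coloring the intent tags on the logic tree to form intent nodes, and finding the intent tag sequence corresponding to the intent nodes under the control of the logic tree,
    the method further comprises:
    determining whether the dialogue scene is a business-logic dialogue; if so, outputting intent tags from the text information through the machine-translation multi-label intent classifier, coloring the intent tags on the logic tree to form intent nodes, and finding the intent tag sequence corresponding to the intent nodes under the control of the logic tree; otherwise, entering a question-and-answer dialogue and performing dialogue-scene judgment on every user voice input in the question-and-answer dialogue.
  12. The human-computer interaction method according to claim 1, characterized in that the machine-translation multi-label intent classifier is a recurrent neural network model.
  13. A human-computer interaction system, comprising:
    a voice recognition module, configured to recognize the user's voice input as user text information;
    a business-logic dialogue module, configured to: output multiple intent tags from the user's text information through the machine-translation multi-label intent classifier; fill the intent tags into intent slots of the logic tree to form intent nodes; find the intent tag sequence corresponding to the intent nodes under the control of the logic tree; and determine the product to recommend;
    a retrieval and dialogue generation module, configured to search the corresponding database according to the product to recommend and output the corresponding product recommendation information.
  14. The human-computer interaction system according to claim 13, characterized in that the human-computer interaction system further comprises:
    a scene segmentation module, configured to pre-judge the user's input sentence, send it to the corresponding module according to the result, and have that module give the answer.
  15. An electronic device, characterized by comprising:
    a processor; and
    a memory configured to store a human-computer interaction program which, when read and executed by the processing unit, performs the following operations: outputting intent tags from the text information through the machine-translation multi-label intent classifier, coloring the intent tags on the logic tree to form intent nodes, and finding the intent tag sequence corresponding to the intent nodes under the control of the logic tree; and searching the database according to the intent tag sequence and outputting product recommendation information.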
PCT/CN2018/107891 2017-12-22 2018-09-27 Human-computer interaction method, system, and electronic device thereof WO2019119916A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201711401906.4A CN110019725A (zh) 2017-12-22 2017-12-22 Human-computer interaction method, system, and electronic device thereof
CN201711401906.4 2017-12-22

Publications (1)

Publication Number Publication Date
WO2019119916A1 true WO2019119916A1 (zh) 2019-06-27

Family

ID=66993035

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/107891 WO2019119916A1 (zh) 2017-12-22 2018-09-27 Human-computer interaction method, system, and electronic device thereof

Country Status (2)

Country Link
CN (1) CN110019725A (zh)
WO (1) WO2019119916A1 (zh)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
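CN110399500A (zh) * 2019-07-23 2019-11-01 广州市要啥网信息技术有限公司 Voice-intelligence-based commercial traffic method and system
CN111191016A (zh) * 2019-12-27 2020-05-22 车智互联(北京)科技有限公司 Multi-turn dialogue processing method, apparatus, and computing device
CN111753068A (zh) * 2020-05-27 2020-10-09 江汉大学 Method, system, and storage medium for automatically generating open-domain dialogue replies
CN112115276A (zh) * 2020-09-18 2020-12-22 平安科技(深圳)有限公司 Knowledge-graph-based intelligent customer service method, apparatus, device, and storage medium
CN112560403A (zh) * 2019-09-26 2021-03-26 北京国双科技有限公司 Text processing method and apparatus, and electronic device
WO2021118462A1 (en) * 2019-12-09 2021-06-17 Active Intelligence Pte Ltd Context detection
CN113076470A (zh) * 2020-01-06 2021-07-06 北京沃东天骏信息技术有限公司 Item recommendation method and apparatus
CN113609851A (zh) * 2021-07-09 2021-11-05 浙江连信科技有限公司 Method, apparatus, and electronic device for identifying psychological cognitive biases in thoughts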

Families Citing this family (14)

Publication number Priority date Publication date Assignee Title
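CN110516051A (zh) * 2019-07-26 2019-11-29 北京搜狗科技发展有限公司 Data processing method and apparatus, and electronic device
CN110459210A (zh) * 2019-07-30 2019-11-15 平安科技(深圳)有限公司 Speech-analysis-based question answering method, apparatus, device and storage medium
CN110598766B (zh) * 2019-08-28 2022-05-10 第四范式(北京)技术有限公司 Training method and apparatus for a product recommendation model, and electronic device
CN110704592B (zh) * 2019-09-27 2021-06-04 北京百度网讯科技有限公司 Sentence analysis and processing method, apparatus, computer device and storage medium
CN111091832B (zh) * 2019-11-28 2022-12-30 秒针信息技术有限公司 Speech-recognition-based intention evaluation method and system
CN110909146B (zh) * 2019-11-29 2022-09-09 支付宝(杭州)信息技术有限公司 Method, apparatus and device for training a tag-pushing model for pushing counter-question tags
CN111694939B (zh) * 2020-04-28 2023-09-19 平安科技(深圳)有限公司 Method, apparatus, device and storage medium for intelligently invoking a robot
CN111768767B (zh) * 2020-05-22 2023-08-15 深圳追一科技有限公司 User tag extraction method and apparatus, server, and computer-readable storage medium
CN111627433B (zh) * 2020-06-16 2023-11-28 北京云迹科技股份有限公司 Method and apparatus for robot voice order processing
CN112417110A (zh) * 2020-10-27 2021-02-26 联想(北京)有限公司 Information processing method and apparatus
CN112328849B (zh) * 2020-11-02 2024-05-07 腾讯科技(深圳)有限公司 User profile construction method, user-profile-based dialogue method, and apparatus
CN112380875A (zh) * 2020-11-18 2021-02-19 杭州大搜车汽车服务有限公司 Dialogue tag tracking method, apparatus, electronic apparatus and storage medium
CN113095049A (zh) * 2021-03-19 2021-07-09 广州文远知行科技有限公司 Method, apparatus, computer device and storage medium for annotating behavior events
CN114333808A (zh) * 2021-12-31 2022-04-12 深圳市巨鼎医疗股份有限公司 Interaction method for a self-service terminal, intelligent terminal, and storage medium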

Citations (5)

Publication number Priority date Publication date Assignee Title
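CN105224586A (zh) * 2014-06-10 2016-01-06 谷歌公司 Retrieving context from previous sessions
CN106471496A (zh) * 2014-06-26 2017-03-01 微软技术许可有限责任公司 Identifying intent in searches from query reformulations
CN106709040A (zh) * 2016-12-29 2017-05-24 北京奇虎科技有限公司 Application search method and server
CN106777013A (zh) * 2016-12-07 2017-05-31 科大讯飞股份有限公司 Dialogue management method and apparatus
CN107193865A (zh) * 2017-04-06 2017-09-22 上海奔影网络科技有限公司 Method and apparatus for natural-language intent understanding in human-machine interaction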

Family Cites Families (7)

Publication number Priority date Publication date Assignee Title
CN102096717B (zh) * 2011-02-15 2013-01-16 百度在线网络技术(北京)有限公司 Search method and search engine
CN102800006B (zh) * 2012-07-23 2016-09-14 姚明东 Real-time product recommendation method based on mining customer shopping intent
CN105159977B (zh) * 2015-08-27 2019-01-25 百度在线网络技术(北京)有限公司 Information interaction processing method and apparatus
US20170185673A1 (en) * 2015-12-25 2017-06-29 Le Holdings (Beijing) Co., Ltd. Method and Electronic Device for QUERY RECOMMENDATION
JP2017152948A (ja) * 2016-02-25 2017-08-31 株式会社三菱東京Ufj銀行 Information providing method, information providing program, and information providing system
CN107256267B (zh) * 2017-06-19 2020-07-24 北京百度网讯科技有限公司 Query method and apparatus
CN107346340A (zh) * 2017-07-04 2017-11-14 北京奇艺世纪科技有限公司 User intent recognition method and system

Cited By (11)

Publication number Priority date Publication date Assignee Title
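CN110399500A (zh) * 2019-07-23 2019-11-01 广州市要啥网信息技术有限公司 Voice-intelligence-based commercial traffic method and system
CN112560403A (zh) * 2019-09-26 2021-03-26 北京国双科技有限公司 Text processing method and apparatus, and electronic device
WO2021118462A1 (en) * 2019-12-09 2021-06-17 Active Intelligence Pte Ltd Context detection
CN111191016A (zh) * 2019-12-27 2020-05-22 车智互联(北京)科技有限公司 Multi-turn dialogue processing method and apparatus, and computing device
CN111191016B (zh) * 2019-12-27 2023-06-02 车智互联(北京)科技有限公司 Multi-turn dialogue processing method and apparatus, and computing device
CN113076470A (zh) * 2020-01-06 2021-07-06 北京沃东天骏信息技术有限公司 Item recommendation method and apparatus
CN111753068A (zh) * 2020-05-27 2020-10-09 江汉大学 Method and system for automatically generating open-domain dialogue replies, and storage medium
CN111753068B (zh) * 2020-05-27 2024-03-26 江汉大学 Method and system for automatically generating open-domain dialogue replies, and storage medium
CN112115276A (zh) * 2020-09-18 2020-12-22 平安科技(深圳)有限公司 Knowledge-graph-based intelligent customer service method, apparatus, device and storage medium
CN112115276B (zh) * 2020-09-18 2024-05-24 平安科技(深圳)有限公司 Knowledge-graph-based intelligent customer service method, apparatus, device and storage medium
CN113609851A (zh) * 2021-07-09 2021-11-05 浙江连信科技有限公司 Method and apparatus for identifying cognitive biases in thoughts in psychology, and electronic device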

Also Published As

Publication number Publication date
CN110019725A (zh) 2019-07-16

Similar Documents

Publication Publication Date Title
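WO2019119916A1 (zh) Human-machine interaction method and system, and electronic device thereof
JP7483608B2 (ja) Machine learning tools for navigating dialog flows
JP7108122B2 (ja) Selection of synthetic speech for computer-based agents
US11106983B2 (en) Intelligent interaction method and intelligent interaction system
CN110121706B (zh) Providing responses in a session
WO2018196684A1 (zh) Dialogue robot generation method and apparatus
US20160379106A1 (en) Human-computer intelligence chatting method and device based on artificial intelligence
CN109716334A (zh) Selecting a next user prompt type
JP2020537777A (ja) Method and apparatus for identifying the user intent of an utterance
TW201915790A (zh) Generation of focus-point copywriting
US20210193108A1 (en) Voice synthesis method, device and apparatus, as well as non-volatile storage medium
CN111708869B (zh) Processing method and apparatus for human-machine dialogue
CN108829847B (zh) Translation-based multimodal modeling method and its application in product retrieval
CN104933204A (zh) Method and apparatus for intelligent network response
CN114546326A (zh) Virtual human sign language generation method and system
CN113836932A (zh) Interaction method, apparatus and system, and intelligent device
CN114201589A (zh) Dialogue method, apparatus, device and storage medium
KR102224931B1 (ko) Service providing apparatus and method for refining fashion-product-related information using a neural network
US11854544B1 (en) Entity resolution of product search filters
US11762871B2 (en) Methods and apparatus for refining a search
US20220414171A1 (en) System and method for generating a user query based on a target context aware token
CN115601109A (zh) Item recommendation model training method for a virtual space, item recommendation method and apparatus
CN117933387A (zh) Dialogue data generation method and system, model training method, and dialogue processing method
CN117784948A (zh) Input recommendation method, apparatus, device and readable storage medium
CN115482490A (zh) Item classification model training method, item classification method, apparatus and medium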

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

32PN EP: public notification in the EP bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 30.11.2020)

122 EP: PCT application non-entry in European phase

Ref document number: 18892706

Country of ref document: EP

Kind code of ref document: A1