WO2020151652A1 - Natural expression processing method and response method, device and system based on natural intelligence, method for training a robot, human-computer interaction system, method for training a human-computer interaction system based on natural intelligence, and end-to-end control method and control system - Google Patents

Natural expression processing method and response method, device and system based on natural intelligence, method for training a robot, human-computer interaction system, method for training a human-computer interaction system based on natural intelligence, and end-to-end control method and control system

Info

Publication number
WO2020151652A1
Authority
WO
WIPO (PCT)
Prior art keywords
expression
natural
language information
standard
data
Prior art date
Application number
PCT/CN2020/073180
Other languages
English (en)
French (fr)
Inventor
刘贝
余自立
陈浩然
朱显中
张志峰
Original Assignee
艾肯特公司
刘贝
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN201910065177.2A external-priority patent/CN110059167A/zh
Priority claimed from CN201910064406.9A external-priority patent/CN110059166A/zh
Priority claimed from CN201910065178.7A external-priority patent/CN110046232A/zh
Priority claimed from CN201910065098.1A external-priority patent/CN110008317A/zh
Priority claimed from CN201910065179.1A external-priority patent/CN110059168A/zh
Priority claimed from CN201910303688.3A external-priority patent/CN110019688A/zh
Application filed by 艾肯特公司, 刘贝
Publication of WO2020151652A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation

Definitions

  • the present invention relates to a method for processing natural expressions, and in particular to a natural expression processing method, a processing and response method, a device and a system based on natural intelligence, a method for training a robot, a human-computer interaction system, a method for training a human-computer interaction system based on natural intelligence, and an end-to-end control method and control system.
  • MI Machine Intelligence
  • AI Artificial Intelligence
  • NLP Natural Language Processing
  • AI-NLP: natural language processing based on artificial intelligence
  • even when a speech recognizer achieves 90% accuracy, if an error occurs in a keyword, existing AI-NLP technology cannot achieve correct semantic understanding.
  • when the accuracy of the speech recognizer decreases, it becomes even more difficult to understand semantics accurately using AI-NLP technology.
  • because AI-NLP requires manually constructing a large number of grammatical models and semantic models, it incurs huge labor costs.
  • the world's major companies engaged in the research, development and application of AI-NLP technology have thousands or more employees engaged in manual voice annotation and model building.
  • a natural expression processing method based on natural intelligence includes: receiving an input natural expression and obtaining first language information with a first information granularity; converting the first language information into second language information with a second information granularity, where the magnitude of the second information granularity lies between the magnitude of the first information granularity and the magnitude of the information granularity of text; and converting the second language information into third language information, the third language information being the result of understanding the natural expression.
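
  As a rough Python illustration of the three-level conversion described above, the sketch below assumes that the first language information is a frame-level feature sequence, that the second language information is a shorter sequence of discrete units (the claim only fixes its granularity between that of the input and that of text), and that the third language information is a standard expression label; every name in it is hypothetical.

    # Minimal sketch of the three-level pipeline from the claim above.
    # Assumptions (not specified in the source): first language information is a
    # frame-level feature sequence, second language information is a shorter
    # sequence of discrete units, third language information is a standard
    # expression label chosen from a small inventory.

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class FirstLanguageInfo:          # finest granularity (e.g. audio frames)
        frames: List[List[float]]

    @dataclass
    class SecondLanguageInfo:         # intermediate granularity
        units: List[int]

    @dataclass
    class ThirdLanguageInfo:          # result of understanding
        standard_expression: str

    def to_second(first: FirstLanguageInfo, reduction: int = 8) -> SecondLanguageInfo:
        """Compress the frame sequence into coarser discrete units (placeholder logic)."""
        units = [hash(tuple(round(x, 1) for x in f)) % 1000
                 for f in first.frames[::reduction]]
        return SecondLanguageInfo(units)

    def to_third(second: SecondLanguageInfo, database: dict) -> ThirdLanguageInfo:
        """Look the unit sequence up in a database of paired data (placeholder logic)."""
        key = tuple(second.units)
        return ThirdLanguageInfo(database.get(key, "<unknown>"))

    if __name__ == "__main__":
        first = FirstLanguageInfo(frames=[[0.1, 0.2]] * 32)
        second = to_second(first)
        db = {tuple(second.units): "QUERY_BALANCE"}
        print(to_third(second, db))   # ThirdLanguageInfo(standard_expression='QUERY_BALANCE')
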
  • the second language information and the third language information corresponding to the second language information are stored in the database as pairing data.
  • various permutations and combinations of the elements of the second language information are cyclically iterated against the third language information, or against various permutations and combinations of the elements of the third language information, to establish correspondences between them; in this way more pairing data of second language information and third language information is obtained and stored in the database.
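
  One possible reading of the loop iteration described above is to enumerate combinations and short permutations of the elements of a stored second-language sequence and to associate each of them with the paired third language information, so that partial or re-ordered inputs are also covered. The enumeration strategy and storage layout in the following sketch are assumptions, not a procedure prescribed by the source.

    # Sketch of loop iteration over permutations and combinations of the elements
    # of paired data, producing additional (second language info -> third language
    # info) pairs. The use of element subsets and small permutations is an
    # assumption; the claim does not fix the enumeration strategy.

    from itertools import combinations, permutations
    from typing import Dict, Tuple

    def augment_pairs(units: Tuple[int, ...], standard_expr: str,
                      max_perm_len: int = 3) -> Dict[Tuple[int, ...], str]:
        pairs: Dict[Tuple[int, ...], str] = {units: standard_expr}
        # combinations of elements (order preserved, some elements dropped)
        for r in range(1, len(units)):
            for combo in combinations(units, r):
                pairs[combo] = standard_expr
        # short permutations, to tolerate re-ordered elements
        for r in range(2, min(max_perm_len, len(units)) + 1):
            for perm in permutations(units, r):
                pairs[perm] = standard_expr
        return pairs

    if __name__ == "__main__":
        db = augment_pairs((17, 42, 99), "QUERY_BALANCE")
        print(len(db), db[(42, 17)])   # many derived pairs, all mapping to QUERY_BALANCE
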
  • in the natural expression processing method based on natural intelligence, after the second language information is obtained from the input first language information, it is compared with the second language information already in the database, and the third language information corresponding to the second language information is determined from the comparison result, or the probability that the second language information correctly corresponds to a certain third language information is calculated; if the machine's understanding ability is not mature enough, is insufficient, or cannot determine the conversion of the second language information into a certain third language information, artificial assisted understanding is performed.
  • the third language information corresponding to the meaning of the natural expression is obtained, the second language information obtained from the first language information is associated with that third language information, or the first language information is associated with that third language information, and the new pairing data is stored in the database.
  • for new pairing data of second language information and third language information, or new pairing data of first language information and third language information, the second language information (or the second language information converted from the first language information) and the various permutations and combinations of its elements are cyclically iterated against the third language information or against the various permutations and combinations of its elements, correspondences between them are established, and the additional pairing data of second language information and third language information obtained in this way is stored in the database.
  • the wrong correspondence between the second language information and the third language information in the database is corrected by artificial assisted understanding.
  • the machine's understanding ability is measured by a confidence level, and the confidence level is calculated based on the correspondence between the second language information and the third language information.
  • in the natural expression processing method based on natural intelligence of the embodiment of the present invention, after the second language information is obtained from the first language information, one or more of a deep neural network, a finite state transducer, and an encoder-decoder is used to generate a log probability or similar score for the third language information, and a normalized exponential function is then used to calculate the confidence in the third language information.
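
  The confidence calculation described above (a log probability or similar score per candidate, followed by a normalized exponential function) amounts to a softmax over per-candidate scores. In the sketch below the scores are invented numbers standing in for the output of a deep neural network, finite state transducer, or encoder-decoder.

    # Sketch of the confidence calculation: given log-probability-like scores for
    # candidate third language information (invented numbers standing in for the
    # output of a neural network, finite state transducer, or encoder-decoder),
    # a normalized exponential function (softmax) turns them into confidences.

    import math
    from typing import Dict

    def softmax_confidence(log_scores: Dict[str, float]) -> Dict[str, float]:
        m = max(log_scores.values())                       # for numerical stability
        exps = {k: math.exp(v - m) for k, v in log_scores.items()}
        total = sum(exps.values())
        return {k: v / total for k, v in exps.items()}

    if __name__ == "__main__":
        scores = {"QUERY_BALANCE": -1.2, "TRANSFER_MONEY": -3.5, "OTHER": -4.0}
        conf = softmax_confidence(scores)
        best = max(conf, key=conf.get)
        print(best, round(conf[best], 3))   # QUERY_BALANCE 0.861

  With these example scores, the softmax assigns about 0.86 of the probability mass to the top candidate, which would then be compared against a confidence threshold.
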
  • the information granularity of the second language information is 1/10 to 1/1000 of the information granularity of the text.
  • the conversion model from the second language information to the third language information is also cyclically optimized.
  • the second language information obtained by loop iteration is used to test the machine's conversion of second language information into third language information; the second language information that cannot be converted correctly and the third language information that should correctly correspond to it are written into a comparison table.
  • for subsequently input natural expressions, the second language information converted from the natural expression is first compared with the second language information stored in the comparison table.
  • a natural expression processing and response method based on natural intelligence includes: obtaining third language information according to the aforementioned natural expression processing method; calling or generating a standard response that matches the third language information; and outputting the standard response in a manner corresponding to the first language information.
  • the standard response is fixed data stored in advance in the response database, or is generated based on variable parameters and on basic data of the standard response stored in advance in the response database.
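
  The two response modes above (a fixed standard response stored in advance, or one generated from basic data plus variable parameters) can be pictured as a lookup with optional template filling. The response-database layout and field names in this sketch are assumptions.

    # Sketch of calling or generating a standard response. A response is either
    # fixed data stored in advance, or is built from basic data (a template) plus
    # variable parameters. The database layout is an assumption.

    from typing import Dict, Optional

    RESPONSE_DB = {
        "GREETING":      {"fixed": "Hello, how can I help you?"},
        "QUERY_BALANCE": {"template": "Your current balance is {balance} yuan."},
    }

    def standard_response(standard_expr: str,
                          params: Optional[Dict[str, str]] = None) -> str:
        entry = RESPONSE_DB[standard_expr]
        if "fixed" in entry:                       # pre-stored fixed data
            return entry["fixed"]
        return entry["template"].format(**(params or {}))   # basic data + variable parameters

    if __name__ == "__main__":
        print(standard_response("GREETING"))
        print(standard_response("QUERY_BALANCE", {"balance": "1024.50"}))
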
  • a natural expression processing and response device based on natural intelligence
  • the dialogue gateway receives the natural expression from the user, sends it to the central controller for subsequent processing, and sends the response to the natural expression to the user
  • the central controller receives the natural expression from the dialogue gateway and works with the robot and the MAU workstation to convert the natural expression into a standard expression representing the meaning of the natural expression, and instructs the response generator, according to the standard expression, to generate a standard response corresponding to the standard expression
  • the robot converts the natural expression into secondary language information according to the instructions of the central controller, where the magnitude of the information granularity of the secondary language information lies between the magnitude of the information granularity of the natural expression and the magnitude of the information granularity of text, and converts the secondary language information into a standard expression;
  • the MAU workstation presents the natural expression to the MAU artificial agent; the MAU artificial agent inputs or selects the standard expression through the MAU workstation, and the MAU workstation then sends the standard expression to the central controller.
  • a human-computer interaction system based on natural intelligence which includes: natural expression processing and response equipment and calling equipment, wherein the user communicates with the natural expression processing and response equipment through the calling equipment, MAU Artificial agents perform manual operations on natural expression processing and response equipment.
  • the natural expression processing and response equipment includes: a dialogue gateway, a central controller, a MAU workstation, a robot, an expression database, a response database and a response generator, where the dialogue gateway receives the natural expression from the user, sends it to the central controller for subsequent processing, and sends the response to the natural expression to the user; the central controller receives the natural expression from the dialogue gateway, works with the robot and the MAU workstation to convert the natural expression into a standard expression representing the meaning of the natural expression, and instructs the response generator, according to the standard expression, to generate a standard response corresponding to the standard expression; the robot converts the natural expression into secondary language information according to the instructions of the central controller, where the magnitude of the information granularity of the secondary language information lies between the magnitude of the information granularity of the natural expression and that of text, and converts the secondary language information into a standard expression;
  • the MAU workstation presents the natural expression to the MAU artificial agent; the MAU artificial agent inputs or selects the standard expression through the MAU workstation, and the MAU workstation then sends the standard expression to the central controller;
  • the training database is used to store paired data of secondary language information and standard expressions;
  • the response database stores response-related data, including standard response data for calling and/or data used to generate a response;
  • the response generator receives instructions from the central controller, and generates a response to the user’s natural expression by calling and/or running the data in the response database.
  • the device further includes a trainer for training the robot to convert natural expressions into standard expressions, where the trainer makes the robot train on the pairs of secondary language information and standard expressions that already exist in the training database: the various permutations and combinations of the elements of the secondary language information are cyclically iterated against the standard expression or against the various permutations and combinations of the elements of the standard expression, correspondences between them are established, and the additional paired data of secondary language information and standard expressions obtained in this way is stored in the training database.
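
  One way to picture the equipment described above is as a central controller that first asks the robot for an automatic conversion, falls back to the MAU workstation when the robot cannot produce a standard expression, and then instructs the response generator. The sketch below mirrors only that message flow; all interfaces and class names are assumptions.

    # Sketch of the message flow among dialogue gateway, central controller, robot,
    # MAU workstation and response generator. All interfaces are assumptions; the
    # source only fixes which component talks to which.

    from typing import Optional

    class Robot:
        def __init__(self, training_db): self.training_db = training_db
        def convert(self, natural_expression: str) -> Optional[str]:
            # stand-in for: natural expression -> secondary language info -> standard expression
            return self.training_db.get(natural_expression)   # None if not understood

    class MAUWorkstation:
        def ask_agent(self, natural_expression: str) -> str:
            # present the expression to a human MAU agent, who inputs/selects a standard expression
            return input(f"[MAU agent] standard expression for {natural_expression!r}: ")

    class ResponseGenerator:
        def __init__(self, response_db): self.response_db = response_db
        def generate(self, standard_expression: str) -> str:
            return self.response_db.get(standard_expression, "Sorry, please say that again.")

    class CentralController:
        def __init__(self, robot, mau, generator):
            self.robot, self.mau, self.generator = robot, mau, generator
        def handle(self, natural_expression: str) -> str:
            std = self.robot.convert(natural_expression) or self.mau.ask_agent(natural_expression)
            return self.generator.generate(std)

    class DialogueGateway:
        def __init__(self, controller): self.controller = controller
        def on_user_input(self, natural_expression: str) -> str:
            return self.controller.handle(natural_expression)   # response goes back to the user

    if __name__ == "__main__":
        controller = CentralController(
            Robot({"what is my balance": "QUERY_BALANCE"}),
            MAUWorkstation(),
            ResponseGenerator({"QUERY_BALANCE": "Your balance is 1024.50 yuan."}),
        )
        print(DialogueGateway(controller).on_user_input("what is my balance"))
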
  • a natural expression processing method based on natural intelligence includes: receiving a first natural expression, converting the first natural expression into secondary language information, and calculating the confidence with which the secondary language information converted from the first natural expression is converted into a standard expression in the database.
  • when the calculated confidence for a certain standard expression is not lower than the first confidence threshold, that standard expression is output as the result of understanding the first natural expression.
  • in the natural expression processing method based on natural intelligence, when the calculated confidence is lower than the second confidence threshold, the user is prompted to input a second natural expression that has the same meaning as the first natural expression.
  • the second natural expression is converted into secondary language information, and the confidence with which the secondary language information converted from the second natural expression is converted into a standard expression in the database is calculated; when the calculated confidence for a certain standard expression is not lower than the first confidence threshold, that standard expression is output as the result of understanding the first natural expression.
  • when the calculated confidence for a certain standard expression is lower than the first confidence threshold but not lower than the second confidence threshold, the user is prompted to enter a third natural expression to confirm whether that standard expression corresponds to the meaning of the first natural expression.
  • the third natural expression is converted into secondary language information, and the confidence with which the secondary language information converted from the third natural expression is converted into a second standard expression whose meaning is "confirmation" is calculated; if that confidence is not lower than the first confidence threshold, the first standard expression is output as the result of understanding the first natural expression.
  • the secondary language information converted from the first natural expression and the first standard expression are stored in a database as paired data.
  • the first natural expression is subjected to artificial assisted understanding or other manual processing.
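
  The two confidence thresholds in the method above split the outcome into three cases: accept the standard expression, ask the user to confirm it, or ask for a re-statement (or hand the expression to manual processing). A minimal sketch of that decision logic, with arbitrary threshold values, is:

    # Sketch of the two-threshold decision described in the method above.
    # Threshold values and return conventions are assumptions.

    FIRST_THRESHOLD = 0.90    # accept the standard expression outright
    SECOND_THRESHOLD = 0.60   # below this, ask the user to restate

    def decide(confidence: float, standard_expression: str) -> str:
        if confidence >= FIRST_THRESHOLD:
            return f"ACCEPT:{standard_expression}"
        if confidence >= SECOND_THRESHOLD:
            # prompt a third natural expression to confirm the candidate meaning
            return f"CONFIRM:{standard_expression}"
        # prompt a second natural expression with the same meaning,
        # or hand the first natural expression to artificial assisted understanding
        return "RESTATE_OR_MANUAL"

    if __name__ == "__main__":
        for c in (0.95, 0.75, 0.30):
            print(c, decide(c, "QUERY_BALANCE"))
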
  • the confidence level is calculated based on the correspondence between the secondary language information and the standard expression: one or more of a deep neural network, a finite state transducer, and an encoder-decoder is used to generate a log probability or similar score for a single standard expression or for multiple standard expressions, and a normalized exponential function is then used to calculate the confidence in that standard expression or those standard expressions.
  • the magnitude of the information granularity of the secondary language information is smaller than the magnitude of the information granularity of the text.
  • the information granularity of the secondary language information is 1/10 to 1/1000 of the information granularity of the text.
  • in the natural expression processing method based on natural intelligence, for the paired secondary language information and standard expressions already in the database, the various permutations and combinations of the elements of the secondary language information are cyclically iterated against the standard expression or against the various permutations and combinations of the elements of the standard expression, to establish correspondences between them; the additional paired data of secondary language information and standard expressions obtained in this way is stored in the database.
  • the secondary language information obtained by loop iteration is used to test the machine's conversion of secondary language information into standard expressions; the secondary language information that cannot be converted correctly and the standard expression that should correctly correspond to it are written into a comparison table, and for subsequently input natural expressions, the secondary language information converted from the natural expression is first compared with the secondary language information stored in the comparison table.
  • the conversion model from the secondary language information to the standard expression is also cyclically optimized.
  • a natural expression processing and response method based on natural intelligence includes: obtaining a first standard expression by the aforementioned natural expression processing method; invoking or generating a standard response matching the standard expression; and outputting the standard response in a manner corresponding to the first natural expression.
  • a natural expression processing and response device based on natural intelligence
  • the dialogue gateway receives the natural expression from the user, sends it to the central controller for subsequent processing, and sends the response to the natural expression to the user
  • the central controller receives the natural expression from the dialogue gateway and works with the robot and the MAU workstation to convert the natural expression into a standard expression representing the meaning of the natural expression, and instructs the response generator, according to the standard expression, to generate a standard response corresponding to the standard expression
  • the robot converts the natural expression into secondary language information according to the instructions of the central controller and calculates the confidence with which the secondary language information converted from the natural expression is converted into a standard expression in the training database
  • when the calculated confidence for a certain standard expression is not lower than the first confidence threshold, the secondary language information is converted into that standard expression; the MAU workstation presents the natural expression to the external MAU artificial agent, the MAU artificial agent inputs or selects a standard expression through the MAU workstation, and the MAU workstation then sends the standard expression to the central controller; the training database is used to store paired data of secondary language information and standard expressions; the response database stores response-related data, including standard response data for calling and/or data used to generate responses; the response generator receives instructions from the central controller and generates a response to the user's natural expression by calling and/or running the data in the response database.
  • a human-computer interaction system based on natural intelligence which includes: natural expression processing and response equipment and calling equipment, wherein the user communicates with the natural expression processing and response equipment through the calling equipment, MAU Artificial agents perform manual operations on natural expression processing and response equipment.
  • the natural expression processing and response equipment includes: a dialogue gateway, a central controller, a MAU workstation, a robot, a training database, a response database and a response generator, where the dialogue gateway receives the natural expression from the user, sends it to the central controller for subsequent processing, and sends the response to the natural expression to the user; the central controller receives the natural expression from the dialogue gateway, works with the robot and the MAU workstation to convert the natural expression into a standard expression representing the meaning of the natural expression, and instructs the response generator, according to the standard expression, to generate a standard response corresponding to the standard expression; the robot converts the natural expression into secondary language information according to the instructions of the central controller and calculates the confidence with which the secondary language information converted from the natural expression is converted into a standard expression in the training database.
  • when the calculated confidence for a certain standard expression is not lower than the first confidence threshold, the secondary language information is converted into that standard expression; the MAU workstation presents the natural expression to the MAU artificial agent.
  • the MAU artificial agent inputs or selects a standard expression through the MAU workstation, and then the MAU workstation sends the standard expression to the central controller;
  • the training database is used to store the secondary language information and the paired data of the standard expression;
  • the response database stores response-related data, including standard response data for calling and/or data used to generate the response;
  • the response generator receives instructions from the central controller, and generates a response to the user's natural expression by calling and/or running the data in the response database.
  • a natural expression processing method based on natural intelligence includes: setting, in a database, multiple standard expressions corresponding to multiple intentions; receiving a natural expression and converting the natural expression into secondary language information; obtaining from the secondary language information the parts corresponding to the multiple intentions; and converting the obtained parts of the secondary language information corresponding to the multiple intentions into standard expressions, where the magnitude of the information granularity of the secondary language information is smaller than the magnitude of the information granularity of text.
  • the secondary language information converted from the natural expression and the multiple standard expressions corresponding to the multiple intentions converted from that secondary language information are stored in the database as paired data; the various permutations and combinations of the elements of the secondary language information are cyclically iterated against the combination of multiple standard expressions or against the various permutations and combinations of the elements of the multiple standard expressions, correspondences between them are established, and the additional paired data of secondary language information and standard expression combinations obtained in this way is stored in the database.
  • the secondary language information obtained by loop iteration is used to test the machine's conversion of secondary language information into standard expressions; the secondary language information that cannot be converted correctly and the standard expression that should correctly correspond to it are written into a comparison table, and for subsequently input natural expressions, the secondary language information converted from the natural expression is first compared with the secondary language information stored in the comparison table.
  • the conversion model from the secondary language information to the standard expression is also cyclically optimized.
  • the secondary language information is compared with the secondary language information already in the database, and the standard expression or standard expression combination corresponding to the secondary language information is determined from the comparison result, and/or the probability that the secondary language information correctly corresponds to a certain standard expression is calculated; if the machine's understanding ability is not mature enough, is insufficient, or cannot determine the conversion of the secondary language information into a certain standard expression, artificial assisted understanding is performed: the input natural expression is understood manually, and the standard expression or standard expression combination corresponding to a certain intention or certain intentions is obtained.
  • the secondary language information obtained from the natural expression is associated with the standard expression or the standard expression combination, or the natural expression itself is associated with the standard expression or the standard expression combination, and the new paired data is stored in the database.
  • for the new paired data of secondary language information and standard expressions or standard expression combinations, or of natural expressions and standard expressions or standard expression combinations, the secondary language information (or the secondary language information converted from the natural expression) and the various permutations and combinations of its elements are cyclically iterated against the standard expression or standard expression combination itself or against the various permutations and combinations of its elements, correspondences between them are established, and the additional paired data of secondary language information and standard expressions or standard expression combinations obtained in this way is stored in the database.
  • the wrong correspondence between the secondary language information in the database and the standard expression or combination of standard expressions is corrected by artificially assisted understanding.
  • the machine's understanding ability is measured by a confidence level, and the confidence level is calculated based on the correspondence between the secondary language information and the standard expression.
  • one or more of a deep neural network, a finite state transducer, and an encoder-decoder is used to generate a log probability or similar score for a single standard expression or for multiple standard expressions, and a normalized exponential function is then used to calculate the confidence in that standard expression or those standard expressions.
  • the information granularity of the secondary language information is 1/10 to 1/1000 of the information granularity of the text.
  • parts corresponding to multiple intentions are obtained from secondary language information through multiple understandings or multiple rounds of conversation.
  • a plurality of upper intents are set in a database, and a plurality of lower intents are set under each upper intent; in one intention acquisition operation, the parts of the secondary language information corresponding to the respective lower intents of different upper intents are obtained, and these parts are converted into standard expressions.
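
  The upper/lower intent structure above can be pictured as a two-level dictionary, with one acquisition operation pulling out of the secondary language information the parts that match lower intents under different upper intents. The keyword matching in this sketch is only a placeholder for that acquisition step, and the intent inventory is invented.

    # Sketch of upper intents containing lower intents, and of extracting the parts
    # of one utterance that correspond to several intents. Keyword matching is a
    # placeholder for the real acquisition step; the intent inventory is invented.

    INTENT_DB = {                      # upper intent -> {lower intent: keywords}
        "ACCOUNT":  {"QUERY_BALANCE": ["balance"], "QUERY_BILL": ["bill"]},
        "TRANSFER": {"TRANSFER_MONEY": ["transfer"], "SET_PAYEE": ["payee"]},
    }

    def acquire_intents(utterance: str) -> dict:
        """Return {upper intent: [standard expressions]} found in one utterance."""
        found = {}
        for upper, lowers in INTENT_DB.items():
            hits = [lower for lower, kws in lowers.items()
                    if any(kw in utterance for kw in kws)]
            if hits:
                found[upper] = hits
        return found

    if __name__ == "__main__":
        # one natural expression touching lower intents of two different upper intents
        print(acquire_intents("check my balance and transfer money to mom"))
        # {'ACCOUNT': ['QUERY_BALANCE'], 'TRANSFER': ['TRANSFER_MONEY']}
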
  • the standard expression corresponding to one of the multiple intents, or the combination of standard expressions corresponding to a part of the multiple intents, is stored in the database in advance; the standard expression and the natural expression or secondary language information corresponding to it are stored as paired training data, or the standard expression combination and the natural expression or secondary language information corresponding to it are stored as paired training data, and these paired training data are used for training.
  • a natural expression processing and response method based on natural intelligence includes: obtaining a standard expression or a combination of standard expressions according to the aforementioned natural expression processing method; invoking or generating a standard response that matches the standard expression or the combination of standard expressions; and outputting the standard response in a manner corresponding to the natural expression.
  • the standard response is fixed data pre-stored in the response database, or the standard response is generated based on variable parameters and basic data of the standard response pre-stored in the response database.
  • a natural expression processing and response device based on natural intelligence
  • the dialogue gateway receives the natural expression from the user, sends it to the central controller for subsequent processing, and sends the response to the natural expression to the user
  • the central controller receives the natural expression from the dialogue gateway and works with the robot and the MAU workstation to convert the natural expression into multiple standard expressions corresponding to the set intentions, and instructs the response generator, according to the standard expressions, to generate a standard response corresponding to the standard expressions
  • the robot converts the natural expression into secondary language information according to the instructions of the central controller, where the magnitude of the information granularity of the secondary language information is smaller than the magnitude of the information granularity of text;
  • the MAU workstation presents the natural expression to the external MAU artificial agent; the MAU artificial agent inputs or selects the standard expression through the MAU workstation, and the MAU workstation then sends the standard expression to the central controller;
  • the training database stores paired data of secondary language information and standard expressions or standard expression combinations;
  • the response database stores response-related data, including standard response data for calling and/or data used to generate responses;
  • the response generator receives instructions from the central controller and generates a response to the user's natural expression by calling and/or running the data in the response database; the trainer is used to train the robot to convert natural expressions into a standard expression or a combination of standard expressions.
  • a human-computer interaction system based on natural intelligence which includes: natural expression processing and response equipment and calling equipment, wherein the user communicates with the natural expression processing and response equipment through the calling equipment, MAU Artificial agents perform manual operations on natural expression processing and response equipment.
  • the natural expression processing and response equipment includes: a dialogue gateway, a central controller, a MAU workstation, a robot, an expression database, a response database and a response generator, where the dialogue gateway receives the natural expression from the user, sends it to the central controller for subsequent processing, and sends the response to the natural expression to the user; the central controller receives the natural expression from the dialogue gateway, works with the robot and the MAU workstation to convert the natural expression into multiple standard expressions corresponding to the multiple intentions that have been set, and instructs the response generator, according to the standard expressions, to generate a standard response corresponding to the standard expressions; the robot converts the natural expression into secondary language information according to the instructions of the central controller, obtains from the secondary language information the parts corresponding to the multiple intentions, and converts the obtained parts of the secondary language information corresponding to the multiple intentions into standard expressions, where the magnitude of the information granularity of the secondary language information is smaller than that of the information granularity of text
  • MAU workstation presents natural expressions to external MAU artificial agents, MAU artificial agents input or select standard expressions through MAU workstations, and then MAU workstations send the standard expressions to the central controller
  • the training database stores paired data of secondary language information and standard expressions or standard expression combinations
  • the response database stores response-related data, including standard response data for calling and/or data used to generate responses
  • the response generator receives instructions from the central controller, and calls and/or Run the data in the response database to generate a response to the user's natural expression.
  • the trainer is used to train the robot to convert the natural expression into a standard expression or a combination of standard expressions.
  • a natural expression processing method based on natural intelligence includes: receiving and storing a natural expression, converting the natural expression into secondary language information, and calculating the confidence with which the secondary language information converted from the natural expression is converted into a standard expression in the database; when the calculated confidence for a first standard expression is not lower than the first confidence threshold, the first standard expression is output as the result of understanding the natural expression; when the calculated confidence is lower than the first confidence threshold, a silent agent understands the stored natural expression.
  • when the silent agent can understand the natural expression, the silent agent inputs the second standard expression obtained by its understanding; when the silent agent cannot understand the natural expression, the silent agent prompts the user to re-enter a natural expression with the same meaning or transfers the interaction to a senior agent, who understands the stored natural expression and responds.
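
  The escalation path above (robot first, then the silent agent, then either a re-prompt or the senior agent) can be sketched as a small dispatch function; the agent interfaces and the example values are assumptions.

    # Sketch of the escalation path: robot first, then the silent agent, then either
    # a re-prompt or a senior agent. Agent interfaces and labels are assumptions.

    from typing import Optional, Callable

    def handle(natural_expression: str,
               robot_confidence: float,
               robot_standard_expr: str,
               first_threshold: float,
               silent_agent: Callable[[str], Optional[str]],
               senior_agent: Callable[[str], str]) -> str:
        if robot_confidence >= first_threshold:
            return robot_standard_expr                    # machine understanding accepted
        second = silent_agent(natural_expression)         # silent agent tries to understand
        if second is not None:
            return second                                 # second standard expression
        # silent agent failed: re-prompt the user or transfer to the senior agent
        return senior_agent(natural_expression)

    if __name__ == "__main__":
        result = handle("uh my card thing isn't uh working",
                        robot_confidence=0.4, robot_standard_expr="QUERY_BALANCE",
                        first_threshold=0.9,
                        silent_agent=lambda s: None,             # cannot understand
                        senior_agent=lambda s: "REPORT_CARD_FAULT")
        print(result)   # REPORT_CARD_FAULT
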
  • a knowledge base designer constructs the background knowledge of the dialogue based on the senior agent's response to the natural expression that the silent agent could not understand.
  • the secondary language information converted from the natural expression and the second standard expression are stored in a database as paired data.
  • the confidence level is calculated based on the correspondence between the secondary language information and the standard expression: one or more of a deep neural network, a finite state transducer, and an encoder-decoder is used to generate a log probability or similar score for a single standard expression or for multiple standard expressions, and a normalized exponential function is then used to calculate the confidence in that standard expression or those standard expressions.
  • the magnitude of the information granularity of the secondary language information is smaller than the magnitude of the information granularity of the text.
  • the information granularity of the secondary language information is 1/10 to 1/1000 of the information granularity of the text.
  • in the natural expression processing method based on natural intelligence, for the paired secondary language information and standard expressions already in the database, the various permutations and combinations of the elements of the secondary language information are cyclically iterated against the standard expression or against the various permutations and combinations of the elements of the standard expression, to establish correspondences between them; the additional paired data of secondary language information and standard expressions obtained in this way is stored in the database.
  • the secondary language information obtained by loop iteration is used to test the machine's conversion of secondary language information into standard expressions; the secondary language information that cannot be converted correctly and the standard expression that should correctly correspond to it are written into a comparison table, and for subsequently input natural expressions, the secondary language information converted from the natural expression is first compared with the secondary language information stored in the comparison table.
  • the conversion model from the secondary language information to the standard expression is also cyclically optimized.
  • a natural expression processing and response method based on natural intelligence includes: obtaining a first standard expression or a second standard expression through the aforementioned natural expression processing method; invoking or generating a standard response matching the first standard expression or the second standard expression; and outputting the standard response in a manner corresponding to the natural expression.
  • a natural expression processing and response device based on natural intelligence
  • the dialogue gateway receives the natural expression from the user, sends it to the central controller for subsequent processing, and sends the response to the natural expression to the user;
  • the central controller receives the natural expression from the dialogue gateway and works with the robot and the MAU workstation to convert the natural expression into a standard expression representing the meaning of the natural expression, and instructs the response generator, according to the standard expression, to generate a standard response corresponding to the standard expression; the robot converts the natural expression into secondary language information according to the instructions of the central controller.
  • when the calculated confidence for the first standard expression is not lower than the first confidence threshold, the secondary language information is converted into the first standard expression; the MAU workstation presents the natural expression to the external MAU artificial agents, where the MAU artificial agents include silent agents and senior agents; the silent agent inputs or selects a standard expression through the MAU workstation, and the MAU workstation then sends the standard expression to the central controller; when the calculated confidence is lower than the first confidence threshold, the silent agent understands the stored natural expression, and when the silent agent can understand the natural expression, the silent agent inputs the second standard expression obtained by its understanding.
  • when the silent agent cannot understand the natural expression, the silent agent prompts the user to re-enter a natural expression with the same meaning or transfers the interaction to a senior agent, who understands the stored natural expression and responds;
  • the training database is used to store secondary language information and standard expression pairing data;
  • the response database stores response-related data, including standard response data for calling and/or data used to generate responses;
  • the response generator receives instructions from the central controller and generates a response to the user's natural expression by calling and/or running the data in the response database.
  • a human-computer interaction system based on natural intelligence which includes: natural expression processing and response equipment and calling equipment, wherein the user communicates with the natural expression processing and response equipment through the calling equipment, MAU Artificial agents perform manual operations on natural expression processing and response equipment.
  • the natural expression processing and response equipment includes: a dialogue gateway, a central controller, a MAU workstation, a robot, a training database, a response database and a response generator, where the dialogue gateway receives the natural expression from the user, sends it to the central controller for subsequent processing, and sends the response to the natural expression to the user; the central controller receives the natural expression from the dialogue gateway, works with the robot and the MAU workstation to convert the natural expression into a standard expression representing the meaning of the natural expression, and instructs the response generator, according to the standard expression, to generate a standard response corresponding to the standard expression; the robot converts the natural expression into secondary language information according to the instructions of the central controller and calculates the confidence with which the secondary language information converted from the natural expression is converted into a standard expression in the training database.
  • when the calculated confidence for a first standard expression is not lower than the first confidence threshold, the secondary language information is converted into the first standard expression;
  • the MAU workstation presents the natural expression to the MAU artificial agents, where the MAU artificial agents include silent agents and senior agents;
  • the silent agent inputs or selects a standard expression through the MAU workstation, and the MAU workstation then sends the standard expression to the central controller;
  • when the calculated confidence is lower than the first confidence threshold, the silent agent understands the stored natural expression;
  • when the silent agent can understand the natural expression, the silent agent inputs the second standard expression obtained by its understanding;
  • when the silent agent cannot understand the natural expression, the silent agent prompts the user to re-enter a natural expression with the same meaning or transfers the interaction to a senior agent, who understands the stored natural expression and responds;
  • the training database is used to store secondary language information and standard expression pairing data;
  • the response database stores response-related data, including standard response data for calling and/or data for generating a response;
  • the response generator receives instructions from the central controller, and generates a response to the user's natural expression by calling and/or running the data in the response database.
  • a method for training a human-computer interaction system based on natural intelligence includes: generating a text script corresponding to a standard expression; obtaining the voice corresponding to the text script through a text-to-speech conversion tool; converting each piece of speech into secondary language information, where the magnitude of the information granularity of the secondary language information is smaller than the magnitude of the information granularity of text; and storing the secondary language information and its corresponding standard expression in the database as paired data.
  • the various permutations and combinations of the elements of the secondary language information are cyclically iterated against the standard expression or against the various permutations and combinations of the elements of the standard expression, correspondences between them are established, and the additional paired data of secondary language information and standard expressions obtained in this way is stored in the database.
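
  The training procedure above (generate a text script for a standard expression, synthesize speech for it, convert the speech to secondary language information, store the pair) is sketched below. Both the text-to-speech call and the speech-to-secondary-language conversion are stand-ins, since no specific tools are named in the source.

    # Sketch of building paired training data from text scripts: text script ->
    # synthesized speech -> secondary language information -> (units, standard
    # expression) pairs. synthesize() and to_secondary() are stand-ins; no specific
    # text-to-speech tool is named in the source.

    from typing import Dict, List, Tuple

    def synthesize(text: str) -> bytes:
        """Placeholder for a text-to-speech tool returning audio bytes."""
        return text.encode("utf-8")

    def to_secondary(audio: bytes) -> Tuple[int, ...]:
        """Placeholder conversion of speech into coarse discrete units."""
        return tuple(b % 97 for b in audio[::4])

    def build_training_pairs(scripts: Dict[str, List[str]]) -> Dict[Tuple[int, ...], str]:
        """scripts maps a standard expression to several text scripts that mean it."""
        pairs = {}
        for standard_expr, texts in scripts.items():
            for text in texts:
                pairs[to_secondary(synthesize(text))] = standard_expr
        return pairs

    if __name__ == "__main__":
        db = build_training_pairs({
            "QUERY_BALANCE": ["what is my balance", "how much money do I have"],
        })
        print(len(db), set(db.values()))   # 2 {'QUERY_BALANCE'}
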
  • a voice is input and converted into secondary language information; the secondary language information obtained from the input voice is compared with the secondary language information already in the database, the standard expression corresponding to the secondary language information is determined from the comparison result, and/or the probability that the secondary language information correctly corresponds to a standard expression is calculated; if the machine's understanding ability is not mature enough or cannot determine the conversion of the secondary language information into a certain standard expression, artificial assisted understanding is performed: the input voice is understood manually, the standard expression is obtained, the secondary language information obtained from the voice is associated with the standard expression, and the new paired data is stored in the database.
  • in the method for training a human-computer interaction system based on natural intelligence, for the new paired data of secondary language information and standard expressions or standard expression combinations, or of natural expressions and standard expressions or standard expression combinations, the secondary language information (or the secondary language information converted from the natural expression) and the various permutations and combinations of its elements are cyclically iterated against the standard expression or standard expression combination itself or against the various permutations and combinations of its elements, correspondences between them are established, and the additional paired data of secondary language information and standard expressions or standard expression combinations obtained in this way is stored in the database.
  • the wrong correspondence between the secondary language information in the database and the standard expression or the combination of standard expressions is corrected by artificially assisted understanding.
  • a confidence level is used to measure the machine's understanding ability, and the confidence level is calculated based on the correspondence between the secondary language information and the standard expression.
  • in the method for training a human-computer interaction system based on natural intelligence, after the secondary language information is obtained from the natural expression, one or more of a deep neural network, a finite state transducer, and an encoder-decoder is used to generate a log probability or similar score for a single standard expression or for multiple standard expressions, and a normalized exponential function is then used to calculate the confidence in that standard expression or those standard expressions.
  • the information granularity of the secondary language information is 1/10 to 1/1000 of the information granularity of the text.
  • the secondary language information obtained by loop iteration is used to test the machine's conversion of secondary language information into standard expressions; the secondary language information that cannot be converted correctly and the standard expression that should correctly correspond to it are written into a comparison table.
  • for subsequently input natural expressions, the secondary language information converted from the natural expression is first compared with the secondary language information stored in the comparison table.
  • the conversion model from the secondary language information to the standard expression is also cyclically optimized.
  • a voice processing and response device based on natural intelligence
  • the voice processing and response device based on natural intelligence includes: a dialogue gateway, a central controller, a MAU workstation, a robot, a training database, a response database, a response generator and a text-to-speech converter
  • the dialogue gateway receives the voice from the user, sends it to the central controller for subsequent processing, and sends the response to the voice to the user;
  • the central controller receives the voice from the dialogue gateway and works with the robot and the MAU workstation to convert the voice into a standard expression representing the meaning of the voice, and instructs the response generator, according to the standard expression, to generate a standard response corresponding to the standard expression;
  • the robot converts the voice into secondary language information according to the instructions of the central controller, where the magnitude of the information granularity of the secondary language information is smaller than the magnitude of the information granularity of text, and converts the secondary language information into a standard expression;
  • the MAU workstation presents the voice to the external MAU artificial agent; the MAU artificial agent inputs or selects the standard expression through the MAU workstation, and the MAU workstation then sends the standard expression to the central controller;
  • the text-to-speech converter generates the voice corresponding to a text script based on the text script corresponding to the standard expression.
  • the robot converts the voice obtained by the text-to-speech converter into secondary language information, and the secondary language information and the standard expression corresponding to the text are stored in the training database.
  • the device further includes a trainer for training the robot to convert speech into standard expressions, where the robot cyclically iterates the various permutations and combinations of the elements of the secondary language information against the corresponding standard expression or against the various permutations and combinations of the elements of the standard expression, establishes correspondences between them, and stores the paired data of secondary language information and standard expressions obtained in this way in the training database.
  • a human-computer interaction system based on natural intelligence which includes: natural expression processing and response equipment and calling equipment, wherein the user communicates with the natural expression processing and response equipment through the calling equipment, MAU Artificial agents perform manual operations on natural expression processing and response equipment.
  • Natural expression processing and response equipment includes: dialogue gateway, central controller, MAU workstation, robot, training database, response database, response generator, text-to-speech converter, among them,
  • the dialogue gateway receives the voice from the user, sends it to the central controller for subsequent processing, and sends the response to the voice to the user;
  • the central controller receives the voice from the dialogue gateway and works with the robot and the MAU workstation to convert the voice into a standard expression that represents the meaning of the voice, and instructs the response generator, according to the standard expression, to generate a standard response corresponding to the standard expression;
  • the robot converts the voice into secondary language information according to the instructions of the central controller, where the magnitude of the information granularity of the secondary language information is smaller than the magnitude of the information granularity of text, and converts the secondary language information into a standard expression;
  • the MAU workstation presents the voice to the external MAU artificial agent; the MAU artificial agent inputs or selects the standard expression through the MAU workstation, and the MAU workstation then sends the standard expression to the central controller;
  • the text-to-speech converter generates a response to the user’s voice based on the text script corresponding to the standard expression.
  • the robot converts the voice obtained by the text-to-speech converter into secondary language information, and stores the secondary language information and the standard expression corresponding to the corresponding text in the training database.
  • the device further includes a trainer, which is used to train the robot to convert speech into standard expressions, where the robot cyclically iterates the various permutations and combinations of the elements of the secondary language information against the corresponding standard expression or against the various permutations and combinations of the elements of the standard expression, establishes correspondences between them, and stores the paired data of secondary language information and standard expressions obtained in this way in the training database.
  • a trainer, which is used to train a robot to convert speech into standard expressions, where the robot cyclically iterates the various permutations and combinations of the elements of the secondary language information against the corresponding standard expression or against the various permutations and combinations of the elements of the standard expression.
  • a method for training a robot includes: training the robot with correctly paired expression data and intention data in the training database; having the robot understand the expression data and compare its understanding result with the correctly paired intention data to find the wrongly understood expression data; and writing the wrongly understood expression data and the corresponding intention data into a comparison table independent of the training database, where, before understanding, the robot first compares the expression data to be understood with the expression data in the comparison table; if the expression data is found in the comparison table, the corresponding understanding result is obtained directly from the comparison table, and if the expression data is not found in the comparison table, it is then compared in the training database.
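
  The lookup order described above (consult the comparison table of previously mis-understood expression data first, then fall back to the training database) is sketched below; the data structures and the toy robot are assumptions.

    # Sketch of the comparison-table-first lookup: expression data found in the
    # comparison table (built from previously mis-understood inputs) is resolved
    # there directly; otherwise the training database is consulted. Data structures
    # are assumptions.

    from typing import Dict, Optional, Tuple

    ExpressionData = Tuple[int, ...]

    def build_comparison_table(training_db: Dict[ExpressionData, str],
                               understand) -> Dict[ExpressionData, str]:
        """Re-run the robot's understanding on the training pairs and record the
        expression data it gets wrong, together with the correct intention data."""
        return {expr: intent for expr, intent in training_db.items()
                if understand(expr) != intent}

    def lookup(expr: ExpressionData,
               comparison_table: Dict[ExpressionData, str],
               training_db: Dict[ExpressionData, str]) -> Optional[str]:
        if expr in comparison_table:          # checked first
            return comparison_table[expr]
        return training_db.get(expr)          # then the training database

    if __name__ == "__main__":
        training_db = {(1, 2, 3): "QUERY_BALANCE", (4, 5): "TRANSFER_MONEY"}
        flaky = lambda e: "QUERY_BALANCE"                 # a toy robot that always answers QUERY_BALANCE
        table = build_comparison_table(training_db, flaky)
        print(table)                                      # {(4, 5): 'TRANSFER_MONEY'}
        print(lookup((4, 5), table, training_db))         # TRANSFER_MONEY
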
  • the expression data is secondary language information converted from natural expression.
  • the information granularity of the secondary language information is 1/10 to 1/1000 of the information granularity of the text.
  • in the method for training a robot, for the paired expression data and intention data already in the training database, the various permutations and combinations of the elements of the expression data are cyclically iterated against the intention data or against the various permutations and combinations of the elements of the intention data, correspondences between them are established, and the additional paired data of expression data and intention data obtained in this way is stored in the training database.
  • the conversion model from the expression data to the intention data is also cyclically optimized.
  • the comparison table is also used to store expression data with a higher occurrence probability and intent data corresponding thereto.
  • the wrong correspondence between the expression data and the intention data in the training database is corrected by artificially assisted understanding.
  • a script corresponding to the intention data is generated, the natural expression corresponding to the script is obtained through a conversion tool, and the expression data is converted from that natural expression, thereby obtaining correct pairings of expression data and intention data.
  • the script is a text script
  • the natural expression is speech.
• One or more parameters of the generated speech, such as speaking rate, volume, pitch, and intonation, are varied through the text-to-speech conversion tool (an illustrative sketch follows).
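The following sketch illustrates one possible way to generate speech variants of a text script with different speaking rates and volumes; pyttsx3 is only an illustrative off-the-shelf engine standing in for the text-to-speech conversion tool, and pitch or intonation control depends on the engine used:

```python
# Illustrative only: generating speech variants of a text script with different
# speaking rates and volumes, using the off-the-shelf pyttsx3 engine as a
# stand-in for the text-to-speech conversion tool mentioned above.
import pyttsx3

def synthesize_variants(script_text, out_prefix="variant"):
    engine = pyttsx3.init()
    for i, (rate, volume) in enumerate([(120, 0.7), (160, 0.9), (200, 1.0)]):
        engine.setProperty("rate", rate)      # speaking rate in words per minute
        engine.setProperty("volume", volume)  # 0.0 .. 1.0
        engine.save_to_file(script_text, f"{out_prefix}_{i}.wav")
    engine.runAndWait()  # render all queued utterances to audio files

synthesize_variants("I want to increase my credit card limit")
```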
• a natural expression processing and response device, which includes: a dialogue gateway, a central controller, a MAU workstation, a robot, a training database, a response database, and a response generator, wherein the dialogue gateway receives the natural expression of the user, sends it to the central controller for subsequent processing, and sends the response to the natural expression to the user; the central controller receives the natural expression from the dialogue gateway, works with the robot and the MAU workstation to convert the natural expression into intention data expressing the meaning of the natural expression, and, according to the intention data, has the response generator generate a standard response corresponding to the intention data; the robot converts the natural expression into expression data according to the instructions of the central controller, and obtains the intention data corresponding to the expression data.
  • the MAU workstation presents the natural expression to the external MAU artificial agent, the MAU artificial agent inputs or selects the intent data through the MAU workstation, and then the MAU workstation sends the intent data to the central controller;
• the training database is used to store the pairing data of expression data and intention data;
  • the response database stores response related data, including standard response data for calling and/or data used to generate responses;
• the response generator receives instructions from the central controller and generates the response by calling and/or running the data in the response database.
  • the device further includes a trainer for training the robot to obtain intent data from the natural expression, wherein the trainer uses the aforementioned method to train the robot.
• a human-computer interaction system, which includes a natural expression processing and response device and a calling device, wherein the user communicates with the natural expression processing and response device through the calling device, and the MAU artificial agent performs manually assisted operation on the natural expression processing and response device.
  • the natural expression processing and response equipment includes: dialogue gateway, central controller, MAU workstation, robot, training database, response database, and response generator.
• the dialogue gateway receives natural expressions from users, sends them to the central controller for subsequent processing, and sends the response to the natural expression to the user;
• the central controller receives the natural expression from the dialogue gateway, works with the robot and the MAU workstation to convert the natural expression into intention data expressing the meaning of the natural expression, and, according to the intention data, has the response generator generate a standard response corresponding to the intention data;
• the robot converts the natural expression into the expression data according to the instructions of the central controller, and obtains the intent data corresponding to the expression data;
  • MAU workstation presents the natural expression to the external MAU artificial agent, the MAU artificial agent inputs or selects intent data through the MAU workstation, and then the MAU workstation sends the intent data to the central controller;
• the training database is used to store the pairing data of expression data and intent data;
  • the response database stores response related data, including standard response data for calling and/or data used to generate the response;
• the response generator receives instructions from the central controller and generates the response to the natural expression by calling and/or running the data in the response database.
• an end-to-end control method, which includes: when an operator controls the device, collecting sensor data from the external environment of the controlled device and/or from the controlled device itself through sensors, and recording in real time the control data generated by the operator's manipulation; inputting the time-correlated sensor data and control data to the robot as pairing data; and training the robot with the pairing data, wherein the trained robot makes judgments based on the sensor data.
• Controlling the device includes: inputting the sensor data collected by the sensor into the robot; the robot determines whether it can determine the control data corresponding to the sensor data; if the control data can be determined, the robot controls the device according to the control data corresponding to the sensor data; if the robot cannot determine the control data, the operator controls the device.
• the sensor collects sensor data from the external environment of the controlled device and/or from the controlled device itself, the control data generated by the operator's manipulation is recorded in real time, and the sensor data and control data are used as paired data to train the robot.
  • the model that corresponds the sensor data to the control data is automatically optimized.
• the robot judges whether the control data corresponding to the sensor data can be determined based on a degree of confidence, wherein the degree of confidence is calculated from the correspondence between the sensor data and the control data: one or more of a deep neural network, a finite state transducer, and an autoencoder-decoder generates a log probability or similar score for the candidate control data, and a normalized exponential function (softmax) is then used to calculate the confidence of the control data (a minimal sketch follows).
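A minimal sketch of this confidence calculation, assuming a model head that outputs one score per candidate control action and an illustrative threshold not specified by the embodiment:

```python
# Minimal sketch: candidate control actions get log-probability-like scores from
# a model head; softmax turns them into confidences, and a threshold decides
# whether the robot acts or hands control back to the human operator.
import numpy as np

def softmax_confidence(scores):
    z = np.asarray(scores, dtype=float)
    z -= z.max()                          # numerical stability
    e = np.exp(z)
    return e / e.sum()

scores = [2.1, 0.3, -1.0]                 # e.g. outputs of a deep neural network
conf = softmax_confidence(scores)
best = int(np.argmax(conf))
CONFIDENCE_THRESHOLD = 0.8                # illustrative, not from the embodiment
if conf[best] >= CONFIDENCE_THRESHOLD:
    print("robot applies control action", best, "with confidence", round(conf[best], 3))
else:
    print("confidence too low; the operator controls the device")
```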
  • the sensor data includes an image
• the robot converts the image into secondary image information before training; the information granularity of the secondary image information is coarser than pixels but finer than the information granularity used for object recognition (an illustrative block-averaging sketch follows).
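A rough illustration of one way such secondary image information could be produced, using simple block averaging; the block size is an assumption for illustration only:

```python
# Rough illustration: convert a pixel-level frame into coarser "secondary image
# information" by averaging over fixed-size blocks. The block size (16) is an
# illustrative assumption.
import numpy as np

def to_secondary_image(image, block=16):
    """image: H x W grayscale array; returns (H//block) x (W//block) block means."""
    h = image.shape[0] // block * block
    w = image.shape[1] // block * block
    img = image[:h, :w].astype(float)
    return img.reshape(h // block, block, w // block, block).mean(axis=(1, 3))

frame = np.random.randint(0, 256, (480, 640))   # stand-in for a camera frame
secondary = to_secondary_image(frame)
print(secondary.shape)                          # (30, 40): coarser than pixels
```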
  • the sensor data includes voice
• the robot converts the voice into secondary voice information before training; the information granularity of the secondary voice information is orders of magnitude finer than that of text.
  • the sensor data and the control data are correlated within a preset time interval.
• the sensor data comprises the changes over time, within the preset time interval, of one or more of the image, sound and distance collected by the sensor from the external environment of the controlled device and/or from the controlled device itself (a pairing sketch follows).
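A minimal sketch of pairing time-stamped sensor records with control records inside a preset time interval; the field names and window length are illustrative:

```python
# Minimal sketch: pair each time-stamped sensor record with the nearest control
# record, keeping only pairs whose timestamps differ by at most a preset window.
PAIRING_WINDOW = 0.2  # seconds; illustrative preset time interval

def pair_records(sensor_log, control_log, window=PAIRING_WINDOW):
    """sensor_log / control_log: lists of (timestamp, payload) tuples."""
    pairs = []
    for t_s, sensor in sensor_log:
        t_c, control = min(control_log, key=lambda rec: abs(rec[0] - t_s))
        if abs(t_c - t_s) <= window:
            pairs.append((sensor, control))
    return pairs

sensor_log = [(0.00, "frame_0"), (0.10, "frame_1"), (0.25, "frame_2")]
control_log = [(0.05, {"steer": -0.1}), (0.24, {"steer": 0.0, "brake": 0.3})]
print(pair_records(sensor_log, control_log))
```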
  • the device is a vehicle, a drone or a digital processing terminal.
  • an end-to-end control system which includes: sensors, robots, and controllers.
• the sensor collects sensor data from the external environment of the controlled device and/or from the controlled device itself, and the control data generated at the controller by the operator's manipulation is recorded in real time; the sensor data and control data that are associated in time are input to the robot as pairing data; the robot is trained with the pairing data; and the trained robot makes a judgment based on sensor data and controls the device, including: inputting the sensor data collected by the sensor into the robot; the robot judges whether it can determine the corresponding control data based on the sensor data; if the control data can be determined, the robot produces a control signal according to the control data corresponding to the sensor data, and the controller controls the device according to the control signal; if the robot cannot determine the control data, the operator controls the device through the controller.
  • Figure 1 outlines the layer-by-layer conversion process from the collected sound waves (A language information) to Y language information;
  • Figure 2 shows an example of conversion from collected sound waves (A language information) to Y language information
  • Figure 3 shows an example of recognizing voice information
  • Figure 4 is a schematic diagram of the principle of multi-layer perception
  • Figure 5 shows an example of using Gaussian mixture model to convert collected sound waves into X language information
  • Fig. 6 schematically shows a flow of a natural expression processing method according to an embodiment of the present invention
  • FIG. 7 schematically shows the flow of a natural expression processing and response method according to an embodiment of the present invention.
  • Fig. 8 schematically shows a process of information extraction and slot filling of a human-computer interaction system based on natural intelligence according to an embodiment of the present invention
  • Fig. 9 further exemplarily shows the filling processing flow of the natural expression slot under the inquiry item of "book a ticket";
  • Fig. 10 schematically shows an intelligent human-computer interaction system according to an embodiment of the present invention
  • Figure 11 further shows part of the structure of the intelligent answering device in the system of Figure 10;
  • 12A to 12P schematically show the operation interface of the intention acquisition and slot filling system according to an embodiment of the present invention
  • FIG. 13 schematically shows a natural expression processing process combining robot understanding and manual assisted understanding (MAU) according to an embodiment of the present invention
  • Fig. 14 schematically shows an example of the operation interface presented by the MAU workstation to the MAU artificial agent 9;
  • Figure 15 shows an example of intelligent human-computer interaction
  • FIG. 16A and 16B schematically show the marking of road objects in the image obtained by the sensor
  • FIG. 16C schematically shows the marking of the traffic road indicator in the image obtained by the sensor
  • Figure 17 schematically shows an end-to-end control system based on natural intelligence according to an embodiment of the present invention
  • Fig. 18 schematically shows a process in which a trained robot makes a judgment based on sensor data and controls the controller.
  • AI Artificial Intelligence
  • NI Natural Intelligence
• In the field of natural language processing (NLP), artificial intelligence-based natural language processing technology (AI-NLP technology) needs to first transcribe natural-language speech into text and then perform natural language understanding (NLU, Natural Language Understanding). As described below, this method has great disadvantages, although there are historical reasons for it.
• artificial intelligence-based speech recognition technology also uses the same methodology and similar methods: paired corpora of speech and text are generated through collection and labeling, grammatical models and semantic models are used for learning, and the accuracy of speech recognition is still taken as the value orientation.
  • this method of recognizing speech as text and then understanding the recognized text itself makes the accuracy of language understanding have a low theoretical limit.
• the process of recognizing speech as text itself loses a great deal of information (for example, the uncompressed data volume of 5 minutes of dual-channel, 16-bit, 44.1 kHz audio is about 50 MB, whereas at a speaking speed of 200 Chinese characters per minute, the 1000 Chinese characters spoken in five minutes correspond to only about 2 KB, a difference of roughly 25,000 times), and this lost information is likely to contain key information required for language understanding.
  • the information granularity of the recognized text is very coarse compared with the original speech.
• Natural intelligence is the imitation of human intelligent behavior, which is based on gray-scale logic. Specifically, in the process of interacting with the outside world, the human brain does not first convert externally perceived expressions (sound, image, touch, taste, etc.) into text and then understand them, but directly analyzes and understands the information obtained through the sense organs. This understanding draws on existing knowledge and experience (experience can also be understood probabilistically) to obtain the meaning of the external expression from the perceived information, so it may in fact be biased. For example, when an observer sees a person shaking his head, the observer usually concludes that the observed person expressed a negative attitude.
• As for artificial intelligence, it can recognize the external expression, through image recognition or video recognition, as text describing the action, such as "shaking his head", "someone shaking his head gently", or a longer sentence, but it cannot check, from the recognized text content itself, whether the result of applying general cognitive rules is correct. This is because artificial intelligence collects only the information used to determine the action (possibly including the subject of the action) for recognition and filters out other information, yet the filtered-out information happens to include the key information for determining the true intention behind the action. The loss of this key information is irreversible, and it cannot be recovered from the text describing the recognized action.
  • the so-called road objects include surrounding street scene data, such as intersections, viaducts, tunnels, urban roads, etc., as well as pedestrians, vehicles, traffic lights, indicator signs, prohibition signs, etc.
  • unmanned vehicles need to mark traffic road signs during route planning, for example, go straight, turn left, turn right, no traffic, no driving, etc., as shown in Figure 16C.
  • Artificial intelligence-based autonomous driving schemes will have various problems under complex road conditions, which are caused by the inherent defects of the above-mentioned "label + identification + rule” methodology.
• the aforementioned artificial intelligence-based speech processing method first converts speech into text, then recognizes the text, and then uses artificially established grammatical and semantic rules to analyze the meaning of the textual sentences; similarly, this type of artificial intelligence-based automatic driving or automatic control method first marks the road objects, then uses a model to identify and track the marked parts, and then performs control based on control rules according to the results of the identification and tracking.
• this kind of automatic driving or automatic control method, because it recognizes objects, also tends to ignore or be insensitive to other information about the surrounding environment, and thus to misjudge the surroundings or road conditions due to the lack of information; on the other hand, the control rules in this method are explicit and mechanical, without the flexible, experience-based discretion exercised by human drivers or controllers.
• For example, when a dog quickly runs toward the corner of an intersection, the artificial intelligence-based autonomous driving scheme will identify and track the dog, find that it is running at a certain speed toward the front of the car, and instruct the vehicle to slow down or brake in advance to avoid it; a human driver, however, will notice that the dog's owner is a short distance behind the dog and is holding the dog's leash, and can judge from experience that the dog will not, or most probably will not, rush in front of the vehicle, and therefore keeps driving or decelerates only slightly.
• the above-mentioned problems of automatic driving and automatic control based on the artificial intelligence methodology can basically be avoided, because the "label + recognition + rule" architecture is not adopted. Specifically, first, when processing the image information obtained by the sensor, there is no need to use an object detection model to identify objects, determine their positions, and track them; it is only necessary to sample the image information, for example to obtain, from pixel-level image information, secondary image information whose information granularity is coarser than pixels but finer than that used for object recognition. Then, instead of identifying objects from this secondary image information, the secondary image information is correlated with the control parameters at the same time point to form paired data, which is stored in the database. The image information obtained by the sensor over a certain time interval (a video composed of multiple frames) is analyzed to detect changes of the secondary image information over time, and these changes are made to correspond to changes of the human control behavior or control parameters, again forming paired data that is stored in the database. The robot is trained on the paired data of secondary image information and control parameters, and on the paired data of changes of the secondary image information and changes of the control behavior or control parameters. The trained robot can then determine the corresponding control behavior or control-parameter change from the secondary image information and its changes, and the control behavior and control-parameter changes output by the robot are used to control the controlled quantity. The control behavior or control-parameter change can be obtained by recording the behavior or result of human on-site or remote control (which leads to the change of the control parameter), and can be correlated in real time with the image data collected by the sensor or the secondary image information obtained by conversion, so as to form the paired data.
• the control method or control system that adopts the natural intelligence methodology does not need to identify objects through annotations and models, nor does it need to establish a large number of rules for control; it only needs to imitate human perception and the corresponding control behaviors to automatically train the control model and use the trained model to realize automatic control. This saves the massive labor cost required for labeling and rule establishment, and avoids potential safety risks similar to the labeling errors or incomplete rules mentioned above.
  • sensors can also introduce sound information, such as warning sounds of special vehicles, whistling of surrounding vehicles, human voices, animal sounds, thunder and rain and other natural sounds.
  • the sensor can even introduce odor information, such as the smell of gasoline.
• information obtained by other sensing devices that detect the surrounding environment, such as radar and ultrasound, and information obtained by monitoring the parameters of the controlled device can all be used as raw data for training the robot. Converting these raw data into secondary data before pairing and training can reduce the amount of data and computation.
• it is also feasible to process the various types of data obtained by the sensor directly, without performing the coarse-grained conversion of images, sound waves, and so on.
• Sound waves are the physical-layer data collected by sound wave acquisition devices (such as microphones).
  • B language is a language formed by various permutations and combinations of B elements.
  • B elements can be phonemes, and certain permutations and combinations of B elements form syllables.
• "Phoneme" and "syllable" are used here with the same meanings as in linguistics.
  • Fig. 2 shows examples of B elements, which are Chinese (Chinese) phonemes.
  • C language is a language formed by various permutations and combinations of C elements. All or part of the permutation and combination of B elements form C elements, so it can also be understood that B language is converted to C elements, and C elements constitute C language. Therefore, the conversion relationship from B language to C language is a "many-to-many" relationship. If the linguistic system of phonemes and syllables is used, the C element corresponds to the "words" in natural language.
  • Figure 2 shows examples of C elements, which are Chinese characters.
  • D language is a language formed by various permutations and combinations of D elements. All or part of the permutation and combination of C elements form the D element, so it can also be understood as the conversion of C language to D element, and D element constitutes D language. Therefore, the conversion relationship from C language to D language is also a "many-to-many" relationship. If the linguistic system of phonemes, syllables, and characters is used, the D element corresponds to the "word” or "phrase” in natural language.
  • Figure 2 shows examples of D elements, which are Chinese words.
  • the conversion from A language information (sound waves) to B language information (phonemes) can generally be completed automatically by a robot relatively accurately.
  • the conversion from B language information (phoneme) to C language information (words) may have a higher error rate.
• For example, the original language information entered by the customer means "the ping-pong paddles are sold out"; because of the customer's pronunciation or accent, the syllables for "ping-pong" and "paddle" may be recognized as homophonous characters, with the result that this sound wave is finally converted into the seven characters meaning "I am afraid the tablet is sold out".
  • the recognition result of the robot needs to be corrected, usually by means of artificial assisted recognition.
  • the artificial assisted recognition at this stage is called Transcription.
• The so-called transcription means that the transcriber uses a specific transcribing tool to accurately cut the sound wave (A language information), and then converts the cut segments into the corresponding "words" (C language information); that is, it defines for the robot the conversion/translation relationship A language (sound wave) → C language (word).
• Whether the cutting is accurate depends on whether the transcriber is careful enough and familiar with the transcribing tools; whether a segment can be accurately converted into the corresponding "word" depends mainly on whether the transcriber has accurately understood the sound wave and its context (the other sound waves before and after it). Chinese in particular has many homophones, which makes accurate transcription even more difficult.
  • D language information (words, phrases) is obtained from C language information (words).
  • the conversion from word to word will also cause ambiguity.
• Even if the recognition from sound wave to word is accurate and the character sequence corresponding to "ping-pong paddles sold out / ping-pong auction finished" is obtained, it can still be segmented in at least two ways, "ping-pong paddle + sold + out" and "table tennis + auction + finished", whose meanings are obviously different.
  • manual identification can be used to correct.
  • the artificial assisted recognition at this stage is called Keyword Spotting, or “word cutting” for short.
• the word cutter combines the transcribed "characters" (C language information) to form "words" (keywords), i.e., D language information.
  • the accuracy of word segmentation often depends on the level of knowledge of the word segmentation personnel. For different fields, people who are familiar with the business content and terms of the field are required to perform word segmentation operations, and the cost will be higher than that of transcription.
• For example, a customer says "my credit card is missing" and the robot cannot recognize its meaning, so the technician puts "my", "credit card", and "missing" as new keywords into the grammar table of the database; another customer says "my swipe card is lost" and the robot again cannot recognize its meaning, so the technician puts "my", "swipe card" (meaning "credit card"), and "lost" as new keywords into the grammar table of the database. In this way, through manual assistance, the meanings or needs of customers are understood and summarized into the database.
• This kind of artificial auxiliary recognition is called Keyword Pile-up, or "Pile-up" for short: permutations and combinations of "words" are accumulated and put into the database according to their meanings.
  • the workload of this work is also huge, and it also requires the professional knowledge of training personnel to assist in understanding.
  • Multi-Layer Perception MLP
  • the principle shown in Figure 4 has its drawbacks. The point is: every conversion will cause the original information to be distorted to a certain extent, and at the same time will add more processing load to the system, causing further performance loss. The greater the number of conversions, the greater the distortion of the original information, and the slower the processing speed of the system. In the same way, since the robot training in the aforementioned processing process requires the intervention of human-assisted recognition, on the one hand, it will produce a high workload and cost, on the other hand, multiple human interventions will also increase the probability of error.
  • X language is logical layer data obtained after speech signal processing (SSP, Speech Signal Processing) is performed on A language data, which is called "X language" in the embodiment of the present invention.
  • X language is a language formed by various permutations and combinations of X elements.
• The X elements are obtained by the system through a certain modeling tool, such as a Gaussian Mixture Model (GMM), which automatically cuts the sound wave into several column-shaped elements of different heights.
  • Fig. 5 shows an example of using a Gaussian mixture model to convert the collected sound waves (represented by a histogram) into X elements (represented by a vector quantization histogram).
  • the number of X elements can be controlled within a certain range (for example, less than 200).
• a combination of two ASCII characters can be defined as the ID of an X element, as shown in FIG. 2.
  • the cut sound wave unit corresponds to the X element one-to-one.
• Since the A language information can be regarded as a combination of sound wave units and the X language information as a combination of X elements, the conversion relationship from A language to X language is a "many-to-many" relationship.
  • Fig. 3 also shows an example of the X element represented by ASCII characters.
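As a rough illustration of how X elements might be produced with a Gaussian mixture model, the following sketch frames a waveform, fits a GMM to simple per-frame features, and assigns each frame a two-character ASCII ID; the frame length, features, and number of components are illustrative assumptions rather than parameters of the embodiment:

```python
# Rough illustration: frame a waveform, fit a Gaussian mixture model to simple
# per-frame features, and label each frame with a two-character ASCII ID for its
# mixture component. Frame length, features and component count are illustrative.
import numpy as np
import string
from itertools import product
from sklearn.mixture import GaussianMixture

def frame_features(waveform, frame_len=400):
    n = len(waveform) // frame_len
    frames = waveform[: n * frame_len].reshape(n, frame_len)
    energy = np.log(np.sum(frames ** 2, axis=1) + 1e-9)                    # log energy
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)    # zero-crossing rate
    return np.column_stack([energy, zcr])

waveform = np.random.randn(160000)                # stand-in for ~10 s of audio
feats = frame_features(waveform)
gmm = GaussianMixture(n_components=64, random_state=0).fit(feats)
labels = gmm.predict(feats)                       # one component index per frame

ids = ["".join(p) for p in product(string.ascii_uppercase, repeat=2)]  # 'AA', 'AB', ...
x_language = [ids[k] for k in labels]             # sequence of X-element IDs
print(x_language[:10])
```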
  • the process of natural language processing or natural language understanding based on the aforementioned AI principles does not involve X language information.
• The reason the X language (X element) layer is identified in Figure 1, Figure 2 and Figure 3 is, on the one hand, to explain that in terms of information granularity the X element lies between sound waves and phonemes; on the other hand, to show that natural language processing or natural language understanding can also follow the paths A→X→B→C (or D) and A→X→B→C→D, that is, the X element can also be used as intermediate data for the conversion between A language (sound waves) and B language (phonemes).
• Y language refers to language information that reflects the "meaning" or "intent" obtained after understanding the original natural language information A.
  • the "standard expression” defined in the embodiment of the present invention is a form of "Y language”.
• Natural language processing or natural language understanding based on natural intelligence first takes the irregular natural expression information expressed by the user in the form of physical data, such as sound waves (i.e. "A language information"), performs basic automatic recognition or conversion through some modeling tool to obtain language information in the form of permutations and combinations of several basic elements ("X elements"), i.e. "X language information", and then converts the X language information recognized or converted from the A language information into some form of standard expression ("Y language information"). That is to say, the processing path A→X→Y is adopted, without conversion into "characters" and "words" ("C language information" and "D language information"), nor into phonemes ("B language information"); as mentioned earlier, this is an important difference between natural intelligence and artificial intelligence in natural expression processing.
• this difference is a difference in processing path and a methodological difference. Therefore, the multi-layer "many-to-many" relationship conversion B→C→D→Y can be omitted, the accuracy and efficiency of the expression information conversion can be improved, and the workload and error rate of manual auxiliary recognition can also be reduced.
  • the natural expression processing based on natural intelligence does not need to convert the expression into text when processing the expression of non-text information, but into X language information, this kind of X language information It has much finer information granularity than text, so it has higher key information recognition accuracy as mentioned above.
  • the information granularity of X language information and the information granularity of text are orders of magnitude difference.
• If the information granularity of text is taken as 1, then the information granularity of X language information is, for example, 1/10, 1/100, or 1/1000; on the other hand, because the X language information is obtained by sampling and converting the A language information (sound waves, images, etc.), the X language information is coarser-grained than the A language information. On the same scale, with the information granularity of text as 1, the information granularity of the sound wave is, for example, 1/10000, 1/100000, or 1/1000000.
  • the aforementioned B language information (constituted by phonemes), C language information (constituted by syllables), and D language information (constituted by words or phrases) are basically in the same order of magnitude as the information granularity of the text, so they are in the same order as the X language information When comparing the level of information granularity, it is similar to text.
• the correspondence from A language information to Y language information can be applied to dialects and mixed languages or mixed voices, for example Chinese mixed with English, Cantonese mixed with Mandarin, Shanghainese mixed with English and Mandarin, and more subdivided languages, dialects and their mixtures; it can even be applied to mixtures of multiple ways of speaking, and the understanding accuracy will not be affected.
• For the NLP technology of artificial intelligence, even if a massive grammar model is built at great cost, high understanding accuracy cannot be obtained.
  • conditions such as mixed languages, mixed dialects, and mixed speaking styles will cause the exponential growth of grammar models, which is simply impossible to achieve.
  • the above uses natural language processing and natural language understanding as examples to illustrate the similarities and differences between natural intelligence and artificial intelligence.
  • the natural expression methods of human beings are diverse.
  • the natural expression from customers that is, "A language information” can be divided into the following four categories: text information, voice information, image information, and animation information.
  • the text information expression can be: the customer enters text through the keyboard to express himself, for example, the customer types "how much money in my savings account?" on the user interface of a bank's Internet channel call center;
  • the image information expression can be: Customers express themselves through images.
• Voice information expression can be: the customer expresses himself through speech, for example, during a conversation with a customer service specialist of a bank's service hotline (telephone channel call center), the customer says on the phone: "What do you mean? I don't quite understand"; animation (or "video") information expression can be: the customer expresses disagreement by shaking his head in front of the camera (similar to the general situation described above).
  • the natural expression of the customer is automatically recognized or converted to obtain information expressed in a certain language.
  • the A language information is voice information
  • the sound wave waveform information can be collected by a modeling tool and automatically recognized or converted into a certain X language (corresponding to the voice information) by the system (intelligent robot);
• the A language information is graphic information,
  • the graphics pixel information can be collected by the modeling tool and automatically recognized or converted into the X language (corresponding to the image information) by the system (intelligent robot);
• If the A language information is animation information, for example, the graphic pixel information and image change speed information can be collected by the modeling tool and automatically recognized or converted into X language (corresponding to animation information) by the system (intelligent robot);
• If the A language information is text information, the text information is converted into X language with characters as the basic unit (basic element), or is not converted.
  • the X language information obtained from the automatic conversion of the A language information or the text information that does not need to be converted is further processed to obtain a regular standard expression (Y language information) that can be "understood" by the computer or other processing equipment.
  • Y language information can be automatically processed by computer business systems.
  • the regularized standard expression (Y language information) can be realized by regularized coding.
• Regularized coding can, for example, adopt the following number + English letter coding method, which includes an industry code, an industry business code, an organization code, an organization business code, and an expression information code.
  • Dialect code (3 digits 1-999)
  • the industry code represents the industry where the service provider is pointed to by the random natural expression (A language information) from the customer.
• The industry code can be represented, for example, by two English letters, which can cover 676 industries; adding a three-letter sub-industry code can add 17,576 sub-industries under each industry.
  • the code can basically cover all common industries; the industry business code represents the service demand pointed to by the A language information from the customer, and it can also be represented by multiple Arabic numerals.
  • the organization code indicates the service provider pointed to by the A language information from the customer, for example, it can identify the country and city where the agency is located; the agency business code indicates the internal personalized business division of the service provider, which is convenient The organization conducts personalized internal management; the expression information code represents the identifying information of the client's A language information itself, which can include the type of information, the type of language, etc., which are represented by numbers and letters.
• BNK Bank (sub-industry)
• 2710000000 First-level industry business category—2 (credit card) Second-level industry business category—7 (adjustment of credit limit) Third-level industry business category—1 (increased credit limit) 0000000 (no further sub-categories)
• the organization code is:
• the institutional business code is:
• 00000 Institutional business category (In this Y language information, there is no institutional business category defined by "ICBC Headquarters", which means that the Y language information belongs to the industry business category and is universal for the banking industry.)
• the code for expressing information is:
• 02 Voice (The type of A language information provided by the customer is "voice")
  • the A language information corresponding to the Y language information can be, such as, "My credit card limit is too small”, “I want to increase my credit card limit”, “I want to reduce my credit card limit”, Voice messages such as "I need to adjust the credit card limit”.
  • the above-mentioned industry code, organization code, and organization business code can all be preset as system default values. In other words, it is enough to obtain the business code and expression information code from the A language information provided by the customer.
• In this case, the Y language information can be expressed as "271000000002zh-CN003"; or, if for a specific application three digits are enough to represent the industry business code, it can be further shortened to "27102zh-CN003"; if it is only for voice services, it can be expressed as "271zh-CN003"; and if only the expression of the customer's need is of concern, regardless of the expression type information, even just "271" can be used.
  • Example 2 TVTKT11200000000014047730305000000000001240003fr-CH000
• TKT Ticketing (sub-industry)
• 1120000000 First-level industry business category—1 (air ticket) Second-level industry business category—1 (ticket change) Third-level industry business category—2 (postponed) 0000000 (no further sub-categories)
  • 001404773030500000000000 Country code 001 (United States) 404 (Georgia, Atlanta) 773030500000000000 (U.S. Delta Airlines)
• 12400 First-level organization business scope—1 (discount ticket) Second-level organization business scope—2 (low season) Third-level organization business scope—4 (Asia-Pacific region) 00 (no further sub-categories)
• 03 Image (The type of A language information provided by the customer is "image". For example, when a customer encounters a system error when changing a ticket on the Delta official website, the customer takes a screenshot as a natural expression of seeking help from Delta customer service.)
  • the A language information corresponding to the Y language information is obtained through image recognition.
  • the above-mentioned industry code and organization code can be preset as system default values.
• If three digits are enough to represent the industry business code and three digits are enough to represent the organization business code, the expression can be represented simply as "112124" (an illustrative decomposition of the full code in Example 2 follows).
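The following sketch composes and decomposes a coded standard expression using the field widths suggested by the annotations of Example 2 (2-letter industry, 3-letter sub-industry, 10-digit industry business code, 24-digit organization code, 5-digit organization business code, 2-digit expression-type code, language tag, 3-digit trailing code); this layout is inferred from the example and is not a normative definition of the coding scheme:

```python
# Illustrative composition/decomposition of a coded standard expression, using
# the field widths suggested by Example 2 above. The layout is inferred from the
# example and is not a normative definition.
FIELDS = [("industry", 2), ("sub_industry", 3), ("industry_business", 10),
          ("organization", 24), ("org_business", 5), ("expression_type", 2)]

def compose(values, language_tag, trailing):
    return "".join(values[name] for name, _ in FIELDS) + language_tag + trailing

def decompose(code):
    out, pos = {}, 0
    for name, width in FIELDS:
        out[name] = code[pos:pos + width]
        pos += width
    out["language_tag"], out["trailing"] = code[pos:-3], code[-3:]
    return out

values = {"industry": "TV", "sub_industry": "TKT",            # ticketing
          "industry_business": "1120000000",                  # air ticket / change / postpone
          "organization": "001404773030500000000000",         # country / city / carrier
          "org_business": "12400", "expression_type": "03"}   # 03 = image
code = compose(values, "fr-CH", "000")
print(code)
print(decompose(code))
```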
  • the natural expression from the customer often reflects the specific needs of the customer.
• The A language information is automatically converted into X language information, or is left unconverted (when the A language information is text information), and then the X language information or text information is converted into a standard expression in coded form (Y language information).
  • Y language information can include industry code, industry business code, organization code, organization business code, and expression information code.
• the A language information can also include specific parameters that reflect customer needs (which can be called "demand parameters"), such as "transfer 5000 yuan to Zhang San" (Example 1), "I want to see a movie called "Chinese Partners"" (Example 2), and so on.
  • the specific demand code set (for example, including one or more of the aforementioned industry code, industry business code, organization code, organization business code, and expression information code) corresponds to a specific parameter set.
  • the demand code for "watching a movie" is 123
  • its corresponding parameter set can include the parameter: movie name.
• the Y language information corresponding to this A language information is "123<Chinese Partners>".
  • 123 is the requirement code
• the five Chinese characters in "< >" are the requirement parameters.
• There are many ways to distinguish demand codes from demand parameters in Y language information, such as using symbols like "< >", spaces, or a specific order (a minimal parsing sketch follows).
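A minimal sketch of separating a demand code from a demand parameter in Y language information of the form "123&lt;Chinese Partners&gt;"; the regular expression is illustrative, not a specification of the format:

```python
# Minimal sketch: split Y language information of the form "123<Chinese Partners>"
# into a demand code and an optional demand parameter. The pattern is illustrative.
import re

Y_PATTERN = re.compile(r"^(?P<demand_code>\d+)\s*(?:<(?P<param>[^>]*)>)?$")

def parse_y(y_info):
    m = Y_PATTERN.match(y_info)
    if not m:
        raise ValueError(f"unrecognized Y language information: {y_info!r}")
    return m.group("demand_code"), m.group("param")

print(parse_y("123<Chinese Partners>"))  # ('123', 'Chinese Partners')
print(parse_y("271"))                    # ('271', None)
```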
  • IVR Interactive Voice Response
• Customers can dial a designated phone number to enter the system and, following the system's instructions, key in the appropriate options or personal information to listen to pre-recorded information, or have the computer system combine data according to a preset procedure (Call Flow) and read out specific information by voice (such as account balance or payable amount); customers can also enter transaction instructions through the system to perform preset transactions (such as transfers, changing passwords, or changing contact phone numbers).
• Organizations can construct special coding rules for standard expressions and dialogue scripts based on standard expressions, so as to standardize the internal customer-service dialogue data system and protect data (even if a third party obtains the organization's Y language information, it cannot understand the dialogue data that the codes correspond to).
• The service provider of the intelligent expression processing engine provides the conversion service from A language information to Y language information; even though it sees the A language information and the corresponding Y language information (for example, a string of numbers or a number + letter code), it does not know the meaning that the Y language information corresponds to in the customer-service dialogue database of the user organization, so that data security and intelligent expression processing services can both be ensured.
  • the aforementioned process of converting A language information into X language information can be implemented by voice signal processing technology, voice recognition technology, image recognition technology and video processing technology, and these technologies can also be existing technologies.
  • the coding standard expression idea according to the embodiment of the present invention can also be applied to the recognition process of natural expression, and X language information is expressed through regular coding.
• In the natural expression processing method of the embodiment of the present invention, the natural expression of the customer (A language information) is first automatically converted to obtain X language information, or C language information is obtained directly without conversion (when the A language information is text information); the X language information or C language information is then converted into Y language information.
• The irregular natural expression information such as text, voice, graphics, and video is converted into X language information; then, with X language as the left language and Y language as the right language, machine translation (MT, Machine Translation) technology is used to realize the conversion from X language information to Y language information.
  • MT Machine Translation
  • the "Speech Signal Processing” technology to automatically convert/translate language A into language X (based on the current "speech signal processing" "Technology, the conversion accuracy rate of A ⁇ X can be as high as 95% or more, and the improved "voice signal processing” technology does a better job in noise reduction, which can increase the conversion accuracy rate of A ⁇ X to more than 99%); Then use machine translation technology to realize the automatic machine translation of X ⁇ Y, without the need to go through the multi-layer conversion of X ⁇ B ⁇ C ⁇ D ⁇ Y.
  • a machine translation algorithm similar to that based on statistical analysis of example samples can be used to convert the converted irregular natural expression (X language information) into a regularized standard expression (Y language information).
  • This kind of machine translation algorithm requires that the amount of corresponding data between X language and Y language is large enough and accurate enough.
  • the natural intelligence-based solution provides a new artificial agent working mode, MAU (Mortal Aided Understanding).
• MAU Mortal Aided Understanding
  • the corresponding data accumulation between language A and language Y is realized.
  • the demand code "271” can be used to indicate the meaning of adjusting the credit card limit.
• "21" can also be used to indicate the meaning of credit card loss reporting; in this way, "21" can be used to correspond to naturally expressed information such as the aforementioned "my credit card is missing" or "my swipe card is lost".
• The MAU mode can use existing simple code input methods to convert the traditional "speaking agent" into a "non-speaking agent", i.e. a silent agent, which makes the agent's work more comfortable and greatly improves work efficiency.
• It makes full use of human understanding, which has the highest value, to collect massive amounts of corresponding A/X language and Y language data accurately and at high speed, and provides them to the MT engine for loop iteration and self-learning of the A/X→Y conversion/translation rules, forming and optimizing the A/X→Y translation model.
  • Machine translation is an artificial intelligence technology used to automatically translate two languages.
  • the "language” referred to here is not a narrow national language (for example: Chinese, English%), but a broad way of expressing information.
  • language can be divided into four categories: text, voice, image, animation (or "video").
  • Language is information formed by various permutations and combinations of elements in an element set.
  • English text is a language formed by 128 ASCII characters (elements) in the ASCII character set (element set) through various one-dimensional (serial) permutations and combinations
• the Chinese text language is formed by the thousands of Chinese characters defined by the national standard code, plus punctuation marks (the basic elements that constitute Chinese text information), through unlimited permutations and combinations
• an RGB plane image is another language, formed by pixels composed of red, green and blue sub-pixels through various two-dimensional (height × width) permutations and combinations.
• The MT robot iterates in a loop over the permutations and combinations of the elements that make up the languages.
  • the English "May I have your" 15 ASCII character elements (3 English letters "May” + 1 space + 1 English
  • the permutation and combination of the letter "I” + 1 space + 4 English letters "have” + 1 space + 4 English letters "your") corresponds to the permutation and combination of the Chinese characters "Excuse me” in the 3 national standard codes ;
  • the permutation and combination of the three ASCII character elements in English "age” corresponds to the permutation and combination of the two Chinese characters in the Chinese "year” code.
• If the robot can accurately translate the English "May I have your age?" in the inspection data sheet into the corresponding Chinese sentence, it proves that the robot has learned the Chinese-English translation of this sentence. If it cannot, it proves that the robot has not learned it yet; the robot then needs to modify its own learning method (for example, find another path and try to learn again) and digest the training data table again, which is another iteration. This "iterative correction" is repeated continuously, so that the robot's translation accuracy keeps climbing. When it climbs to a certain level (for example, a translation accuracy of 70%), the accuracy may stay there and be difficult to improve further, which means the bottleneck of "machine self-learning" has been reached; at that point more MT training data table data needs to be added for the robot.
  • the data of the MT training data table can be imported from an external database, or can be generated and added through "manual assisted understanding".
• New natural expression examples, such as the above natural expression "My credit card can be overdrawn too little" and its corresponding standard expression "271", are added to the existing MT training data table, thereby increasing and updating the MT training data table data. Therefore, through "artificial assisted understanding", on the one hand the natural expression of the target can be converted accurately and stably (converted into the standard expression, i.e. Y language information); on the other hand, MT training data table data can be added and updated efficiently, making the data in the system's MT training data table richer and more accurate, and possibly also improving the robot's translation (conversion) accuracy efficiently.
• The MT robot needs to exhaustively list all the permutations and combinations of the 20 ASCII character elements of the left value of #3, "May I have your time", and all the permutations and combinations of the 10 national-standard-coded characters of the corresponding Chinese right value of #3. That is, the MT robot needs to exhaustively list all permutations and combinations of the left and right element sets of each pair of data in the training data table.
• The MT robot must be able to find many repeated permutations and combinations (such as "your", "May I have your", "age", "time", "you", "may you", "Age", etc.), so as to find certain correspondences between these repeated permutations and combinations of left-language elements and those of right-language elements; these correspondences constitute the translation model between the two languages.
• The machine translation between X language and Y language in the present invention follows the same machine translation principle as Chinese-English translation, except that English is replaced by X language and Chinese by Y language, and the element sets of the left and right languages are different.
  • machine translation technology can be used to automatically translate one language into another.
  • the technical principle is to perform basic element-level analysis on the collected pairing information of the two languages (left language and right language), and through iterative comparison of various permutations and combinations of the basic elements of a large number of language information pairs to find Develop the rules of conversion/translation between the two languages and form a translation model for the two languages.
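A toy illustration of this element-level analysis: enumerate n-grams of consecutive elements on both sides of each training pair and count how often left n-grams co-occur with right n-grams across the training data table; frequent co-occurrences become candidate conversion rules. This is only a simplified stand-in for the translation model described above:

```python
# Toy illustration: count how often left-language n-grams co-occur with
# right-language n-grams across paired training data; frequent co-occurrences
# are candidate conversion rules. A simplified stand-in, not the actual model.
from collections import Counter

def ngrams(seq, max_n=4):
    return {tuple(seq[i:i + n]) for n in range(1, max_n + 1)
            for i in range(len(seq) - n + 1)}

def cooccurrence_counts(pairs, max_n=4):
    counts = Counter()
    for left, right in pairs:
        for lg in ngrams(left, max_n):
            for rg in ngrams(right, max_n):
                counts[(lg, rg)] += 1
    return counts

# toy training data table: left = X-element ID sequences, right = Y codes
pairs = [(["AB", "Qf", "x3"], ["271"]),
         (["AB", "Qf", "k9"], ["271"]),
         (["Zz", "Qf", "x3"], ["21"])]
for (lg, rg), c in cooccurrence_counts(pairs).most_common(5):
    print(lg, "->", rg, "seen", c, "times")
```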
• The technology of the present invention extends the application range of machine translation technology from automatic translation between different national languages to the automatic conversion of all irregular multimedia natural expression information (text, voice, image, video, i.e. A language information) into regularized standard information (Y language information), so that the business systems of all industries can process it, thereby realizing true and practical natural expression processing.
  • the natural expression processing according to the embodiment of the present invention can be restricted to the specific business of a specific industry organization.
• For example, when restricted to the above-mentioned credit card business, the scale of the MT training data table required by the processing system can be greatly reduced, which makes it easier for the robot's understanding to reach the maturity threshold, reduces the cost of constructing and maintaining the MT training data table, and also effectively shortens the maturation period of the A/X→Y translation model.
  • the natural expression processing system realizes the conversion from natural expression to coded standard expression.
• the basis of this conversion lies in the MT training data table (i.e., training database) storing the paired data of A/X language information and Y language information, and the A/X→Y translation model obtained on the basis of the MT training data table. Therefore, it is necessary to collect a certain amount of accurate A/X language data and Y language data to generate the MT training data table, and form an A/X→Y translation model through the self-learning (self-training) of the robot (information processing system).
  • the formation of the MT training data table can be carried out by artificially assisted understanding.
  • the above method of converting A language information into Y language information is also applicable to the case where the A language information is text information.
  • words such as Chinese words, English words, etc.
  • characters such as English letters and characters, German letters and characters, etc.
• The text information may be converted into X language information with characters as the X elements, and the X→Y translation model training is performed according to the above method, so as to realize the A→Y translation (conversion).
• In the A→Y conversion, there is also no need for character recognition or grammatical analysis, no need for the support of word segmentation and grammar tables, and no restriction on language or language mixing.
  • the problem of machine understanding of natural expression is equivalent to the process of converting A language information into Y language information.
• Coarser-grained X language information is first obtained from the A language information, and the X language information is then matched with Y language information to obtain the Y language information corresponding to the A language information.
  • the X language information can be words, characters, etc., or it can be information that has much finer information granularity than the text.
• Corresponding the X language information to the Y language information by an algorithm similar to machine translation (also called conversion or recognition) is not restricted by the grammar rules of word processing, and there is no need to build grammar models, rule bases and the like. Because no manual model construction or rule-base maintenance is needed, this machine-translation-like algorithm can also achieve 100% machine self-learning.
  • This kind of loop iteration can include multiple iterative training on the training data, that is, the data after one training (the data used for training plus the new data obtained by training) is used as the training data for the next training to be trained again. After multiple cycles, new training data is continuously obtained, and all the data is stored in the training database; if there is a new paired data input, all data (new data and existing data) are looped and iterated.
  • the database of X language information and Y language information can be automatically expanded, including not only the input paired data, but also the training data expanded through the permutation and combination of elements and training iterations. This is why we call the database a training database.
• The machine can thus expand, by self-training, the database that corresponds X language information to Y language information. While obtaining new training data through loop iterations, it also iterates the machine's understanding model (including the conversion model from X language information to Y language information), thereby optimizing the model and improving its accuracy (a self-training sketch follows).
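A minimal sketch of such a self-training loop, with placeholder functions for the X→Y conversion model and its confidence estimate (none of these names are defined by the embodiment):

```python
# Minimal sketch of a self-training loop: train on the current training database,
# label new inputs with the model, keep only confident conversions, add them back,
# and retrain. `train` and `predict_with_confidence` are placeholders.

def self_training_loop(training_db, unlabeled_x, train, predict_with_confidence,
                       rounds=3, threshold=0.9):
    """training_db: list of (x, y) pairs; unlabeled_x: list of x without y."""
    model = train(training_db)
    for _ in range(rounds):
        newly_paired = []
        for x in unlabeled_x:
            y, confidence = predict_with_confidence(model, x)
            if confidence >= threshold:           # keep only confident conversions
                newly_paired.append((x, y))
        if not newly_paired:
            break
        training_db = training_db + newly_paired  # expand the training database
        model = train(training_db)                # iterate the understanding model
    return model, training_db
```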
  • the optimization of the model is also done automatically by the machine, and the work of the model engineer is not required, which can greatly reduce the cost of machine learning.
  • the X language information is obtained from the newly input A language information
  • the X language information is input into the conversion model of the X language information to the Y language information
• the Y language information corresponding to the X language information is determined through the calculation of the conversion model, or the correct rate of mapping the X language information to a certain piece of Y language information is calculated. If the comprehension ability of the machine is not mature enough, or it is uncertain whether the X language information should be converted into a certain piece of Y language information, then artificially assisted understanding is usually required.
  • the erroneous correspondence between X language information and Y language information in the training database can be corrected by artificially assisted understanding.
• through artificially assisted understanding, a certain piece of Y language information is designated as corresponding to a certain piece of A language information (natural expression), replacing the piece of Y language information previously corresponding to that A language information; or the robot is informed through artificially assisted understanding that the correct rate of that A language information corresponding to a certain piece of Y language information is higher than the correct rate of its correspondence to the previous piece of Y language information, so that the correspondence between the Y language information and the X language information converted from that A language information is corrected or optimized.
• the aforementioned machine learning that expands and trains the paired data can be implemented using one or more of statistics, deep learning, probability calculation, and fast optimization path searching models. However, the model itself may introduce small errors, which can be called inherent errors. For the expansion and training of large amounts of data, the errors caused by these inherent errors become visible. For example, if the expanded data amounts to 5 million pieces and a test using these paired data yields an error rate of 0.2%, then 10,000 pieces of the expanded paired data are wrong. To compensate for this inherent error, the X language information data known to be incorrectly identified, together with the Y language information it should correctly correspond to, is written into a comparison table.
• the comparison table can also be expanded correspondingly by the above method to improve the accuracy of conversion and recognition.
  • the comparison table can also be used to store expression data with higher occurrence probability and its pairing data. In this way, the overall search speed can be increased by first looking up the comparison table in subsequent searches, thereby increasing the speed of machine understanding.
  • the probability threshold can be set based on the statistical results to filter the data stored in the comparison table.
  • the use of this comparison table is also applicable to machine learning systems or human-computer interaction systems based on natural intelligence and other machine intelligence methodology.
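• The following Python sketch (hypothetical data and function names) illustrates the comparison-table idea: known hard cases and high-frequency expressions are looked up before the slower model-based conversion, and frequent pairs above a probability threshold are cached in the table.

    COMPARISON_TABLE = {
        "zhuan zhang shi bai": "FAQ:TRANSFER_ERROR",   # known hard case (assumed entry)
        "cha yu e": "FAQ:CHECK_BALANCE",               # high-frequency expression
    }

    def model_convert(x_info):
        """Placeholder for the trained X -> Y conversion model."""
        return "FAQ:BOOK_FLIGHT"

    def understand(x_info, occurrence_prob=0.0, prob_threshold=0.01):
        # 1. Look up the comparison table first: compensates for inherent model
        #    errors and speeds up frequent queries.
        if x_info in COMPARISON_TABLE:
            return COMPARISON_TABLE[x_info]
        # 2. Fall back to the trained conversion model.
        y_info = model_convert(x_info)
        # 3. Cache pairs whose occurrence probability passes the threshold.
        if occurrence_prob >= prob_threshold:
            COMPARISON_TABLE[x_info] = y_info
        return y_info

    if __name__ == "__main__":
        print(understand("cha yu e"))                            # comparison table hit
        print(understand("mai ji piao", occurrence_prob=0.05))   # model path, then cached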
  • Fig. 6 schematically shows the flow of a natural expression processing method according to an embodiment of the present invention.
  • step S20 the system receives natural expression information (A language information).
  • the natural expression information may be text information, voice information, image information, video information, and the like.
  • step S21 it is judged whether the understanding ability of the robot is mature.
• the judgment of whether the robot's understanding is mature can be based on comparing, within a certain time interval (set according to the specific application requirements), the result of the robot converting A language information into X language information and then into Y language information with the result of manually converting the same A language information into Y language information; the number of identical results divided by the total number of comparisons gives the robot understanding accuracy rate. It is also possible to let the robot itself judge whether its understanding ability is mature, that is, the robot estimates the probability or accuracy of correctly converting one or several pieces of A language information into certain Y language information; we also call this the robot's "confidence" or "confidence value".
  • the robot's confidence in the conversion of specific Y language information will continue to increase.
• the calculation of the robot's confidence or confidence value is based on the correspondence between X language information and Y language information. Specifically, after the X language information is obtained through conversion or extraction from the A language information, one or more recognizers/classifiers such as deep neural networks, finite state transducers, and autoencoders/decoders are used to generate a logarithmic probability or similar score for the Y language information, and a normalized exponential function is then used to calculate the robot confidence.
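• A minimal sketch of this calculation, assuming made-up scores: log-probability-like scores for candidate Y language information are turned into confidence values with a normalized exponential function (softmax).

    import math

    def softmax_confidence(log_scores):
        m = max(log_scores.values())
        exps = {y: math.exp(s - m) for y, s in log_scores.items()}  # shift for numerical stability
        total = sum(exps.values())
        return {y: v / total for y, v in exps.items()}

    if __name__ == "__main__":
        scores = {"standard expression 1": -0.5,
                  "standard expression 2": -2.0,
                  "standard expression 3": -3.5}
        for y, c in softmax_confidence(scores).items():
            print(f"{y}: {c:.2%}")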
  • the confidence level can be calculated corresponding to the designated Y language information (standard expression), where the designated standard expression can be a single or multiple (more than one).
• the converted standard expression can be "standard expression 1", "standard expression 2" or "standard expression 3"; in other words, it is necessary to identify from a certain natural expression the intention corresponding to "standard expression 1", "standard expression 2" or "standard expression 3".
• the confidence level is calculated independently for each single standard expression, that is, the confidence that the X language information (secondary language information) obtained by converting the natural expression corresponds to "standard expression 1", "standard expression 2" or "standard expression 3" is calculated separately; the result obtained is, for example: the confidence of conversion to "standard expression 1" is 80%, the confidence of conversion to "standard expression 2" is 40%, and the confidence of conversion to "standard expression 3" is 10%.
• if the confidence threshold is set to 80%, "standard expression 1" meets the threshold requirement; if the confidence threshold is set to 90%, none of the three standard expressions meets the threshold requirement; if the confidence threshold is set to 40%, then two standard expressions meet the threshold requirement and the one with the higher confidence can be output as the understanding result, but such a low confidence threshold is usually not set.
• the confidence for each standard expression is independent, so the cumulative sum of the confidences is not necessarily 100%.
• for example, the confidence of conversion to "standard expression 1" may be calculated as 70%, the confidence of "standard expression 2" as 20%, and the confidence of "standard expression 3" as 10%, which makes it easier to distinguish the understanding result of the natural expression through the confidence threshold.
• relative confidence can also be used, that is, after the comprehension probabilities for the different standard expressions are calculated, the probabilities are compared with each other numerically to further calculate the confidence. For example, if the probability of understanding (recognizing) "standard expression 1" is 65% and the probability of understanding (recognizing) "standard expression 2" is 35%, the calculated confidence can be 80% for "standard expression 1" and 20% for "standard expression 2".
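• The description does not spell out the exact comparison rule that turns the 65%/35% probabilities into 80%/20% relative confidence; the small sketch below uses plain renormalization as one possible (assumed) choice, so its output differs from that example.

    def relative_confidence(probs):
        total = sum(probs.values())
        return {y: p / total for y, p in probs.items()}

    if __name__ == "__main__":
        print(relative_confidence({"standard expression 1": 0.65,
                                   "standard expression 2": 0.35}))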
  • the aforementioned robot understanding accuracy and robot confidence are both indicators to measure the maturity of robot understanding.
• the calculation of robot understanding accuracy usually requires a certain amount of data accumulation, because more data can more widely represent the diversity of expression and thus more accurately reflect the actual application situation.
• the robot understanding accuracy can be a statistical result over a larger amount of training data, which measures the robot's understanding maturity more accurately than a small amount of training data.
  • the robot confidence is an evaluation of the understanding ability of the robot to understand a certain piece of A language information or some pieces of A-language information.
  • the understanding ability of the robot can be more accurately evaluated in the case of less training data.
  • the confidence value is used to measure the reliability of the robot's own answers.
  • the accuracy of robot understanding is an assessment of the maturity of a specific application, and the confidence level reflects the uncertainty of the robot's own answer.
• the robot understanding accuracy rate or confidence level set according to the application needs is called the "robot understanding maturity threshold". If the robot understanding accuracy or confidence is lower than the robot understanding maturity threshold, the system considers that the robot's understanding is not mature and will not use the robot's conversion result, but will continue to use the manual conversion result Y2 to ensure the accuracy and stability of the system's understanding of the A language information. At the same time, the system adds the X language information automatically converted by the machine from the A language information (left-side language) together with the manual conversion result Y2 (right-side language) into the MT training data table for the MT robot's self-training, as sketched below.
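• A hypothetical sketch of this decision (variable and table names are assumptions): if the robot's confidence is below the maturity threshold, the manual result Y2 is used and the (X, Y2) pair is appended to the MT training data table for self-training.

    MT_TRAINING_TABLE = []   # pairs of (X language information, Y language information)

    def choose_result(x_info, y_robot, y_manual, robot_confidence, maturity_threshold=0.8):
        if robot_confidence >= maturity_threshold:
            return y_robot                               # robot understanding is mature
        # Not mature: keep using the manual result and store it for self-training.
        MT_TRAINING_TABLE.append((x_info, y_manual))
        return y_manual

    if __name__ == "__main__":
        print(choose_result("gai mi ma", "FAQ:CHANGE_PASSWORD", "FAQ:CHANGE_PASSWORD", 0.92))
        print(choose_result("zhuan zhang", "FAQ:CHECK_BALANCE", "FAQ:TRANSFER", 0.55))
        print(MT_TRAINING_TABLE)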
  • step S22 If the robot's understanding is mature, the robot will automatically convert the natural expression A into the standard expression Y in step S22; if the robot's understanding is not mature, the robot will try to convert the natural expression A into the standard expression Y1 in step S23. At the same time, in step S24, the MAU agent converts the natural expression A into the standard expression Y2.
  • step S26 if it is determined in step S21 that the understanding ability of the robot is mature, the result Y of automatic conversion by the robot is output; otherwise, the result Y2 of manual conversion of MAU seats is output.
• step S25 the following subsequent processing is performed on the natural expression A, the result Y1 of the robot's attempted conversion, and the result Y2 of the MAU agent's manual conversion: the X language information automatically converted from A (left-side language) together with Y2 (right-side language) is put into the MT training data table as a pair of new paired data; Y1 and Y2 are compared, and the comparison is used as statistical data for "judging whether the robot is mature" (a bookkeeping sketch follows below).
• the original data A is kept so that the language data on the left side of the MT training data table can be updated when the A→X conversion technology is further developed in the future (i.e., when the conversion accuracy is higher).
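• The bookkeeping of step S25 can be sketched as follows (counter names are assumptions): the robot's attempted conversion Y1 is compared with the manual conversion Y2, and the running match ratio feeds the "is the robot mature?" judgment.

    stats = {"comparisons": 0, "matches": 0}

    def record_attempt(y1_robot_attempt, y2_manual):
        stats["comparisons"] += 1
        if y1_robot_attempt == y2_manual:
            stats["matches"] += 1
        # robot understanding accuracy = identical results / total comparisons
        return stats["matches"] / stats["comparisons"]

    if __name__ == "__main__":
        for y1, y2 in [("FAQ:FLIGHT", "FAQ:FLIGHT"), ("FAQ:FLIGHT", "FAQ:PICKUP")]:
            print(f"accuracy so far: {record_attempt(y1, y2):.0%}")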
  • FIG. 7 schematically shows the flow of the natural expression processing and response method according to an embodiment of the present invention.
  • step S30 the natural expression A is first received in step S30.
  • step S31 it is determined whether the natural expression A can be converted into the standard expression Y through machine conversion.
  • This step is equivalent to step S21 in FIG. 6. Similar to the processing in FIG. 6, when it is determined in step S31 that the required standard expression cannot be obtained by machine conversion, manual conversion processing is performed in step S32.
• a response prompting the customer to re-enter is made in step S33, and then the process returns to step S30 to receive the natural expression information A re-entered by the customer.
  • "Prompt the customer to re-enter the response” can be, for example, the voice prompt "Excuse me, please tell me your needs again", "Please speak slowly”; the text prompt "Excuse me, please write more specifically”; Or image prompts, etc.
  • step S34 the standard expression of machine conversion or manual conversion is output.
  • step S35 a standard response matching the standard expression is searched.
  • the standard response can be fixed data pre-stored in the database, or it can be the basic data of the standard response stored in the database in advance, and then through the system operation, the basic data and the case variable parameters are synthesized to generate the standard response.
• the standard response ID is set as the primary key of the response data, and a relationship table between the demand code of the standard expression (Y language information) and the standard response ID is set up in the database, so that the demand code of the standard expression (Y language information) is associated with the response data.
  • Tables 1 to 3 schematically show examples of the expression data table, the expression response relationship table, and the response data table, respectively.
  • the standard expression and the standard response ID may have a many-to-one relationship, as shown in Table 4.
  • the demand code of the standard expression (Y language information) can also be directly used as the primary key of the response data.
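• A hypothetical sketch of the three tables (column names are assumptions) using an in-memory SQLite database: the standard response ID is the primary key of the response data, and the relationship table maps demand codes of standard expressions to response IDs, allowing a many-to-one relationship.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE expression_data (
        demand_code TEXT PRIMARY KEY,   -- standard expression (Y language information)
        description TEXT
    );
    CREATE TABLE response_data (
        response_id TEXT PRIMARY KEY,   -- standard response ID
        response_type TEXT,             -- text / voice / image / video / program
        content TEXT
    );
    CREATE TABLE expression_response (
        demand_code TEXT REFERENCES expression_data(demand_code),
        response_id TEXT REFERENCES response_data(response_id)
    );
    """)
    conn.execute("INSERT INTO expression_data VALUES ('Y001', 'received / acknowledged')")
    conn.execute("INSERT INTO response_data VALUES ('R100', 'voice', 'OK, got it, thank you!')")
    conn.execute("INSERT INTO expression_response VALUES ('Y001', 'R100')")

    print(conn.execute("""
        SELECT r.response_type, r.content
        FROM expression_response er
        JOIN response_data r ON er.response_id = r.response_id
        WHERE er.demand_code = 'Y001'
    """).fetchone())   # ('voice', 'OK, got it, thank you!')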
  • standard expressions can include information related to natural expressions, such as expression type, language type, dialect type, and so on.
• for example, if the natural expression from the customer is the voice "received", the standard response found by querying with the converted standard expression is the voice "OK, got it, thank you!"; if the natural expression from the customer is an image, a screenshot of a "transfer failed" page, the standard response obtained by querying with the converted standard expression is the video "Easy tutorial for Transfer Error Correction".
  • the corresponding response can be manually matched in step S36.
  • Manual matching can associate the standard expression with the standard response ID by entering or selecting the standard response ID, or directly associate the standard expression with the response data, and can also create new response data. The reason why the standard response cannot be found may be that the standard expression was newly added manually, or it may be that the standard response of the same type is not matched.
  • the response of machine matching or manual matching is output in step S37.
• the content of the response can be called or generated according to different information types. For example, for voice responses, a real-person recording can be played back or voice can be output through speech synthesis (Text To Speech, TTS); for digital user operations, for example the phone key sequence combination "2-5-1000", a program is run to complete the operation "repay 1,000 yuan to the credit card", as sketched below.
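• A small, assumed sketch of dispatching a response by information type: voice responses are played back or synthesized, while a "digital operation" response runs a program driven by a key sequence such as "2-5-1000".

    def render_response(response_type, payload):
        if response_type == "voice_recording":
            return f"[play recording] {payload}"
        if response_type == "voice_tts":
            return f"[TTS synthesis] {payload}"
        if response_type == "key_sequence":
            menu, action, amount = payload.split("-")       # e.g. "2-5-1000"
            return f"[run program] menu={menu}, action={action}, amount={amount}"
        return f"[unsupported type] {payload}"

    if __name__ == "__main__":
        print(render_response("voice_tts", "OK, got it, thank you!"))
        print(render_response("key_sequence", "2-5-1000"))  # credit card repayment of 1,000 yuan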
  • standard expressions can be used to quickly point to responses, so that customers no longer need to spend a long time traversing complex conventional function menus to find the self-service they need.
• manual operations are mainly limited to background "decision-making" work, including determining the demand codes of standard expressions (Y language information), selecting responses (or response IDs), or generating response operations, without the need to communicate directly with customers through calls or text input at the front desk (except for inputting the demand parameters of standard expressions (Y language information)); this is the aforementioned silent seat mode.
  • This can save a lot of human resources and greatly improve work efficiency.
• compared with the traditional free-style responses provided by human agents directly to customers, the standardized response provided by the system to customers is not affected by the emotions, tone of voice, accent, business proficiency and other factors of human agents, and can better guarantee the stability of the customer experience.
  • a converted natural expression (X language information)-standard expression-standard response database can be established, and the system will gradually realize automatic understanding and response.
  • the X language information data in the database can also have the advantages of fine information granularity, narrow business scope, and high data fidelity, thereby reducing the difficulty of robot training and shortening the maturity period of robot intelligence.
  • the process of human-computer interaction can be controlled by setting the robot confidence threshold.
• a first confidence threshold is set as the criterion for judging whether the robot's understanding is mature. Human-computer interaction can also be intelligently controlled by setting further confidence thresholds. For example, a second confidence threshold is set: when the robot's confidence is lower than the first confidence threshold but not lower than the second confidence threshold, the robot asks the user to confirm whether the input natural expression corresponds to a certain standard expression. For another example, a third confidence threshold is set: when the robot's confidence is lower than the second confidence threshold but not lower than the third confidence threshold, the robot asks the user to repeat the natural expression input; when the robot's confidence is lower than the third confidence threshold, the robot automatically switches to artificially assisted understanding, as sketched below.
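• A hypothetical sketch of this three-threshold control logic (the threshold values follow the examples given later in the text):

    def next_action(confidence, t1=0.8, t2=0.6, t3=0.4):
        if confidence >= t1:
            return "output the understanding result"
        if confidence >= t2:
            return "ask the user to confirm the guessed standard expression"
        if confidence >= t3:
            return "ask the user to repeat the natural expression"
        return "switch to artificially assisted understanding"

    if __name__ == "__main__":
        for c in (0.92, 0.70, 0.50, 0.20):
            print(c, "->", next_action(c))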
  • Figure 15 shows an example of smart human-computer interaction for identity authentication.
  • the robot asks the user the question “Is it Mr. Yu himself?"
• the user input is a natural expression in the form of speech, and the meaning expressed can be one of four meanings: "yes", "no", "not heard clearly", or "not interested or unwilling to answer".
  • the first confidence threshold is set to 80%
  • the second confidence threshold is set to 60%.
• if the standard expressions (corresponding to the meanings or intentions of natural expressions) are well designed to be distinguishable from one another, it will not happen that the confidence in more than one standard expression exceeds 50%; however, due to training data deviation and other reasons, a situation of high confidence deviation may still occur. In that case, if the confidence of more than one standard expression exceeds the threshold, the standard expression with the highest confidence can be automatically selected as the understanding result. It is also possible to calculate the confidences of more than one standard expression together and make the sum of the confidences of all standard expressions 100%, so that the confidence of more than one standard expression cannot be higher than 50%.
• when the robot's confidence in understanding the meaning of the user's first voice response is not higher than 80% but not lower than 60%, that is, for example, the robot understands that the meaning of the user's answer may be "no" but is not very sure (60% ≤ CL ≤ 80%, where CL represents confidence), the robot asks the user to confirm whether the answer means "no"; the user then enters a voice response again, and the robot understands this second voice response. If the robot's confidence in understanding the meaning of the user's second voice response is not less than 80%, the robot obtains the user's confirmation of the meaning of the first voice response ("yes") and therefore understands the meaning of the user's voice as one of "yes", "no", "not heard clearly" or "not interested or unwilling to answer"; if the robot's confidence in understanding the meaning of the user's second voice response does not reach 80%, or the user's confirmation result is "no", the robot resorts to manual assistance or a manual response.
• when the robot's confidence in understanding the meaning of the user's first voice response is less than 60%, or the robot cannot understand the meaning of the user's answer, the robot asks the user to answer again; the user enters a voice answer again, and the robot then understands this voice response. If the robot's confidence in understanding the meaning of the user's second voice response is not less than 80%, the robot interprets the meaning of the user's voice as one of "yes", "no", "not heard clearly" or "not interested or unwilling to answer"; if the robot's confidence in understanding the meaning of the user's (second) voice response is still less than 80%, the robot asks for manual understanding or a manual response.
  • the above description of the example shown in FIG. 15 only sets that the user answers through two rounds.
• the number of user answers can be increased. For example, in the second round, if the robot's confidence in understanding the user's second voice response is less than 80% but not less than 60%, the user can be asked to make a third voice response to confirm whether the robot understood the meaning of the second voice response correctly.
  • a third confidence threshold can also be set, for example, 40%.
• if the robot's confidence in understanding the meaning of the user's first voice response is lower than 40%, the process automatically switches to artificially assisted understanding, which can reduce the number of interaction rounds and improve the user experience.
• other threshold values can also be set, for example a first confidence threshold of 90%, a second confidence threshold of 50%, and so on.
• the above multi-round interaction method based on understanding confidence can be regarded as feedback control of the robot through the user's expression input; that is to say, a logical output is generated according to the robot's confidence in understanding the expression input by the user, and the interaction process is logically controlled through that logical output.
  • the direct effect of this scheme is that it can greatly reduce the workload of manual understanding or manual response.
  • the probability that the robot’s confidence is not less than 80% is 60%.
• the clarity of meaning and/or pronunciation of the user's second voice input is usually improved, so the robot's correct rate of understanding the user's second expression of the same meaning also increases. In this way, by automatically prompting the user to repeat the meaning, the robot's understanding accuracy can be increased. Furthermore, the user can be automatically prompted to confirm the meaning.
• when the robot confirms the meaning with the user, the correct rate of understanding or the confidence of understanding is usually relatively high, for example between 90% and 100%.
• the X language information converted from the expression first input by the user, together with the preset meaning (the standard expression expressed as Y language information) corresponding to that expression, can be stored as paired data in the training database, and the aforementioned method is used to train on the paired data. In this way, the paired data of the training database can be expanded through the user's expression input alone, without manual assistance to understand or confirm the meaning of the user's expression; in other words, intelligent data accumulation and automatic learning by the robot (engine) on the server can be realized, as sketched below.
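• A minimal sketch of this automatic data accumulation (helper names are assumptions): once the user confirms the meaning, the X language information converted from the first expression is paired with the confirmed standard expression and written to the training database without any manual assistance.

    training_db = []   # pairs of (X language information, standard expression)

    def on_user_confirmation(first_x_info, confirmed_standard_expr, confirmed):
        if confirmed:
            training_db.append((first_x_info, confirmed_standard_expr))

    if __name__ == "__main__":
        on_user_confirmation("bu shi wo ben ren", "ANSWER:NO", confirmed=True)
        print(training_db)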
  • the user expression involved in the human-computer interaction solution shown in the example in FIG. 15 may be voice or other expressions. If it is a human-machine voice interaction solution, it can be controlled through IVR.
  • the "precise information extraction method”, in layman's terms, is to obtain multiple intentions from a natural expression.
  • natural expression is not limited to natural language, but can also be static images, dynamic images, and so on.
  • the Y language information corresponding to multiple intentions is obtained from a natural expression.
  • the robot first converts A language information into X language information, then analyzes and calculates the parts corresponding to the preset intention from the X language, and then converts these parts into Y language information respectively. That is to say, compared with the aforementioned natural expression processing process, the key information is screened and extracted in the X language information layer, using precise conversion or partial conversion instead of overall conversion.
  • This method can improve the accuracy of robot understanding, especially for natural expressions that contain multiple key information reflecting intent, the accuracy of accurate conversion is higher than that of overall conversion.
• each individual code means: 1 (air ticket), 1 (ticket change), 2 (postponed), 1 (discount ticket), 2 (off season), 4 (Asia-Pacific); the first three codes correspond to the operation, and the last three codes correspond to the object.
• the complete restored intent is to postpone the change of an off-season discount air ticket for the Asia-Pacific region. Assuming the key information of the demand is classified into slot 1 (operation) and slot 2 (object), then when filling the slots, slot 1 can be filled with "postponed ticket change" and slot 2 with "off-season discount ticket for the Asia-Pacific region".
• slot filling is a process of extracting intents according to requirements and storing them by classification. As before, if codes are used to indicate the subdivided intention, "12" can be filled in slot 1, "12" in slot 2, and "4" in slot 3 (see the sketch below).
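• A small sketch of representing the subdivided intention with codes and filling the slots, following the example above ("12" in slot 1, "12" in slot 2, "4" in slot 3); the slot names are paraphrased from the text.

    slots = {"slot1_operation": None, "slot2_object": None, "slot3_region": None}

    def fill(slot_name, code):
        slots[slot_name] = code

    if __name__ == "__main__":
        fill("slot1_operation", "12")   # postponed ticket change
        fill("slot2_object", "12")      # off-season discount ticket
        fill("slot3_region", "4")       # Asia-Pacific
        print(slots)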
  • the ticket destination corresponding to slot 3 may include hundreds of international airports and thousands of domestic airports. If this is the case, letter combinations can also be used.
  • the three-character airport code formulated by the International Air Transport Association (IATA) or a code formed by a combination of letters and numbers to indicate a specific airport name.
  • these codes are sometimes not conducive to the manual assistance personnel to remember and input.
• the manual assistance personnel can directly fill the specific destination (airport name) into the slot, and the system can automatically replace it with the code corresponding to the expression; for example, the agent directly fills in the city name or city code, such as any of the various written forms of "Shanghai" (see the sketch below).
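• A hypothetical sketch of letting the assisting agent type a readable place name while the system fills the slot with the corresponding code (e.g., an IATA airport code); the mapping entries are assumptions for illustration.

    NAME_TO_CODE = {
        "shanghai": "PVG",
        "shanghai pudong": "PVG",
        "pudong": "PVG",
        "beijing": "PEK",
    }

    def fill_destination(slot, typed_text):
        code = NAME_TO_CODE.get(typed_text.strip().lower(), typed_text)
        slot["destination"] = code   # fall back to the raw text if no code is known

    if __name__ == "__main__":
        slot = {}
        fill_destination(slot, "Shanghai Pudong")
        print(slot)   # {'destination': 'PVG'}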
  • slot 1 corresponds to "departure place”
  • slot 2 corresponds to "destination”
  • slot 3 Corresponding to "date”
  • slot 2 with "Beijing” slot 3 with "tomorrow” or the specific date corresponding to "tomorrow” automatically determined by the system, and other information, such as booking Personal information, pick-up destination information, etc. will not be used in this information extraction and slot filling operation.
• in terms of the conversion process, the robot first converts the A language information into X language information, then extracts from the X language information the part corresponding to the Y language information to be filled into the slot, converts it into Y language information, and fills the slot. That is to say, compared with the aforementioned natural expression processing process, the key information is screened and extracted at the X language information layer, using precise conversion instead of overall conversion. This method can improve the accuracy of the robot's understanding; especially for natural expressions that contain multiple pieces of key information reflecting intent, the accuracy of precise conversion is higher than that of overall conversion.
  • Fig. 8 schematically shows an information extraction and slot filling process of a human-computer interaction system based on natural intelligence according to an embodiment of the present invention.
  • step S40 the system receives natural expression information (A language information).
  • the natural expression information may be text information, voice information, image information, video information, and the like.
  • step S41 it is judged whether the robot's precise information extraction capability (or simply called the "intention acquisition capability") is mature.
• the judgment of whether the robot's precise information extraction ability is mature can be based on the following: within a certain time interval (set according to the specific application requirements), the robot converts the A language information into X language information, extracts from the X language information the part corresponding to the Y language information to be filled into the slot, and converts it into Y language information, which is then compared with the Y language information to be filled into the slot obtained directly from the A language information; the number of times the two are the same, divided by the total number of comparisons, gives the accuracy of the robot's precise information extraction, or the intention acquisition accuracy.
  • the robot can also be used to judge whether the understanding ability is mature, that is, the robot estimates the probability that it can obtain the correct information for a certain intention based on a certain piece or certain pieces of A language information.
• we call this the "information extraction confidence" or "intention acquisition confidence" (also colloquially referred to as the "slot filling confidence").
• the confidence level is calculated based on the correspondence between the X language information (secondary language information) and the Y language information (standard expression): one or more of a deep neural network, a finite state transducer, and an autoencoder/decoder are used to generate a logarithmic probability or similar score for the Y language information, and a normalized exponential function is then used to calculate the confidence level.
• the robot intention acquisition accuracy rate or intention acquisition confidence level set according to the application needs is called the "robot intention acquisition maturity threshold". If the robot intention acquisition accuracy or confidence is lower than this threshold, the system considers that the robot's intention acquisition capability is not mature and will not use the robot's intention acquisition result YF, but will continue to use the manual intention acquisition result YF2, to ensure the accuracy and stability of the system's intention acquisition from the A language information. At the same time, the system adds the X language information automatically converted by the machine from the A language information (left-side language) together with the manual intention acquisition result YF2 (right-side language) into the MT training data table (i.e., the training database) for the MT robot's self-training.
• step S42 if the robot's precise information extraction ability is mature, in step S42 the robot automatically performs the intention acquisition and slot filling operations: it converts the A language information into X language information, extracts from the X language information the part corresponding to the Y language information to be filled into the slot, converts it into Y language information, and fills the slot. If the robot's precise information extraction ability is not yet mature, in step S43 the robot tries to convert the natural expression A into the standard expression YF1 to be extracted and fills it into the slot, and at the same time, in step S44, the MAU agent directly obtains from the A language information the Y language information YF2 that needs to be filled into the slot and fills the slot.
• step S45 the following follow-up processing is performed on the natural expression A, the result YF1 of the robot's attempted extraction and conversion from the natural expression A, and the result YF2 of the MAU agent's manual extraction and conversion: the X language information automatically converted from A (left-side language) together with YF2 (right-side language) is put into the MT training data table as a pair of new paired data; YF1 and YF2 are compared and used as statistical data for "judging whether the robot's precise information extraction ability is mature".
• the original data A is kept so that the language data on the left side of the MT training data table can be updated when the A→X conversion technology is further developed in the future (i.e., when the conversion accuracy is higher).
• in step S43 it is not actually necessary to fill the slot with YF1; it is also possible to use the filling data of YF1 and YF2 only as training data or statistical data.
• if the A language information is text, then, as described above, the text itself or its characters are obtained as X elements or converted into X language information for the subsequent operations.
  • Fig. 9 further exemplarily shows the processing flow of natural expression intention acquisition and slot filling under the inquiry item of "book a ticket”.
  • the natural expression "I will fly to Beijing from Shanghai tomorrow night and go home” is received in step ES11.
  • the expression can be in the form of voice, text, etc.
• step ES12 it is judged whether the intention embodied in the expression is the inquiry item "booking air tickets"; if it is judged that it is not, the user is prompted that the current inquiry item is "booking air tickets" or the user is asked to confirm that the current requirement is "booking air tickets", and the user is then asked to re-enter the expression.
• step ES12 may also be performed before the user enters the expression at the beginning of the process, that is, the user is first prompted with the current inquiry item. Then, in step ES13, it is further judged whether the user is booking a ticket for himself or for someone else. The user can input "my parents", "my wife", "chairman", etc.; if the robot can recognize the specific person corresponding to such an expression and has information about that person, the information of the person for whom the ticket is booked can be automatically filled into the corresponding slot.
• the robot further extracts the information related to the "departure place", the information related to the "destination", and the information related to the "date" in step ES15; this extraction is consistent in principle and basic method with the aforementioned conversion of X language information into Y language information, except that only the information related to "departure place", "destination" and "date" is precisely extracted and converted into Y language information.
• the robot can ask, or autonomously determine, whether the user has other intentions after obtaining the intention and filling the slots in step ES15. For example, in this example, the user's expression also includes "go home".
• after the robot finds the expression "go home", subsequent processing can be carried out, for example asking, or autonomously prompting, whether the user needs a pick-up service, and the user's home address (if the robot's knowledge base includes the user's home address data) can be filled into the "pick-up destination" slot. After the required information has been filled into the slots, the robot can perform the corresponding response operations, for example displaying, or announcing by voice, flight information that may meet the customer's needs, and so on.
• each of the steps in FIG. 9 may include the aforementioned process (for example, as shown in FIG. 7 or FIG. 8) of converting a natural expression into a standard expression or of performing intention acquisition and slot filling based on a natural expression: for example, judging the inquiry item from the natural expression (ES12), confirming the ticketing person (ES13, ES14), confirming the "departure place", "destination" and "date" (ES15), confirming other intentions (ES16), and the follow-up slot filling processing (ES17).
• the extraction of the corresponding information for each slot and the filling of each slot can be realized through multiple understandings. For example, understanding and confirming for the first time that the "inquiry item" (that is, the frequently asked question, "FAQ", described later) is "booking air tickets" can restrict the A/X→Y database to the range corresponding to the inquiry item "booking air tickets", which greatly reduces the amount of data and calculation required for robot understanding and training and greatly accelerates the convergence of the iterative calculations.
  • the "inquiry item” can also be determined by the default method or the method selected by the user. In the same way, it can be determined by re-understanding or defaulting or user selection to confirm that the user booked the ticket for himself.
  • the robot first asks the user for the service required by text, language, or image-"book a ticket", then asks the user who booked the ticket, and then asks the user "departure”, “destination”, and “date” , "Preferred Time”, “Price” and other information, and ask for other needs (such as pick-up, etc.).
• VPA (Virtual Personal Assistant)
  • Natural expression processing methods, human-computer interaction methods and precise information extraction methods based on natural intelligence can be particularly applied to customer service systems such as the aforementioned interactive voice response IVR or Internet call center system ICCS or other remote customer contact systems (such as telephone Sales system, network sales system, VTM intelligent remote terminal).
  • the requirement for machine translation is not the exact meaning verbatim, but the need to convert the natural expression of the customer into information that the system can understand, so as to provide the customer with a response corresponding to the expression. That is to say, the machine translation here focuses on the understanding of the substantive meaning behind human language, so as to express the actual intentions or needs of customers "understood” from natural expressions in a form that is easier to process by computer programs.
  • Fig. 10 schematically shows an intelligent human-computer interaction system according to an embodiment of the present invention.
• the intelligent human-computer interaction system includes an intelligent answering device 1 (equivalent to the server side) and a calling device 2 (equivalent to the client).
• the customer 8 communicates with the intelligent answering device 1 through the calling device 2, and the MAU artificial seat 9 (system service personnel) manually operates the intelligent answering device 1.
  • the intelligent answering device 1 includes a dialogue gateway 11, a central controller 12, a MAU workstation 13, and a robot 14.
  • the smart answering device 1 further includes a trainer 15.
  • Customer 8 refers to the organization's remote sales and remote service objects.
  • Distance selling usually refers to an organization actively contacting customers in the form of "outgoing" through its own dedicated telephone or Internet channels, trying to promote its products and services.
  • Remote service usually refers to the organization's customers actively contacting the organization in the form of "call-in” through the organization's exclusive telephone or Internet channel to inquire about or use the organization's products and services.
  • the calling device 2 may be a dedicated telephone channel or Internet channel established by an organization for remote sales (outgoing call service) to customers 8 and remote service (inbound call service) to customers.
• telephone channel call systems, such as Automatic Call Distribution (ACD) systems, provide a dialogue channel for interacting with the customer 8 in the form of voice, through the organization's back-end automated business systems (for example, traditional IVR systems based on telephone key technology, or new Voice Portal (VP) systems based on intelligent voice technology) and human agents.
• Internet channel call systems, such as Internet Call Center (ICC) systems based on Instant Messaging (IM) technology, provide interactive dialogue channels with the customer 8 in the form of text, voice, image and video, through the organization's back-end automated business systems (for example, systems based on Natural Language Processing (NLP) technology) and human agents.
  • the intelligent answering device 1 allows the organization to control its back-end automatic business system and manual seats, as well as the dialogue with the customer 8 in the form of text, voice, image, video and other multimedia, thereby realizing the standardized and automated interaction between the organization and the customer dialogue.
• the dialogue gateway 11 plays the role of the "front portal" in the intelligent answering device 1, and its main functions include: receiving irregular natural expressions (in the form of text, voice, image, or video) and regularized non-natural expressions (for example, in the form of telephone key presses) from the customer 8 via the calling device 2 and sending them to the central controller 12 for subsequent processing; and receiving instructions from the central controller 12 to realize the response to the customer 8's expression (in the form of text, voice, image, video, program, etc.).
  • the dialogue gateway 11 includes an expression receiver 111, an identity authenticator 112, a response database 113, and a response generator 114.
  • the expression receiver 111 receives the expression from the client 8 through the calling device 2.
  • the expression can be the aforementioned various irregular natural expressions and regularized unnatural expressions.
  • an identity authenticator 112 is provided before the expression receiver 111.
  • the identity authenticator 112 can identify and verify the identity of the client 8 in the initial stage of the conversation.
  • Traditional "password input” technology such as: phone key input password, keyboard input website login password, etc.
  • Pass-phrase + voice-print recognition” technology can also be used; The above two technologies can be mixed at the same time.
  • the response database 113 stores response data used to respond to customers. Similar to the example shown in the above table, the data can include the following types:
• Text: pre-programmed text, for example, the text answers in the online banking FAQ (Frequently Asked Questions).
• Image: pre-made images, for example, a Beijing subway network map; this also includes non-video animations, such as GIF or FLASH files with which the bank shows customers how to perform international remittance operations in the online banking system.
• Video: pre-made videos, for example, an electric iron supplier showing customers how to use its new product.
• Templates: text, voice, image, and program templates that can be filled with variables.
  • the response generator 114 receives an instruction from the central controller 12, and generates a response to the customer 8 by calling and/or running the data in the response database 113.
• according to the instruction, response data can be queried from the response database 113 and the text or image displayed, the voice or video played, or the program executed; a template in the response database 113 can also be called according to the instruction and filled with the variable parameters transmitted in the instruction, so as to play real-time TTS speech synthesis (for example, "You have successfully repaid the credit card 5000 yuan.", where "5000 yuan" is the variable in the instruction), display a paragraph of text, display a picture or animation generated in real time, or execute a program, as sketched below.
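• A small sketch of the template branch of the response generator: a template from the response database is filled with the variable parameters carried in the instruction and then rendered (for example via TTS). The template text follows the example above; the identifiers are assumptions.

    RESPONSE_TEMPLATES = {
        "R_REPAY_OK": "You have successfully repaid the credit card {amount} yuan.",
    }

    def generate_response(template_id, **variables):
        return RESPONSE_TEMPLATES[template_id].format(**variables)

    if __name__ == "__main__":
        text = generate_response("R_REPAY_OK", amount="5000")
        print(text)   # the text would then be played back through TTS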
  • the central controller 12 may maintain and update the data in the response database 113, including response data, standard response ID, and so on.
• the central controller 12 receives the customer demand expression information (including irregular natural expressions and regularized non-natural expressions) from the expression receiver 111 and, in cooperation with the robot 14 and via the MAU workstation 13 and the MAU artificial seat 9, converts the customer's irregular natural expression information into a standard expression, or extracts and converts it into the required standard expression and fills the corresponding slot; it determines the corresponding standard response ID according to the standard expression conversion result or the intention acquisition result, and then sends the standard response ID to the response generator 114.
  • the central controller 12 may update the data in the MT training data table.
  • the robot 14 is an application robot that implements the aforementioned machine intelligence technology.
  • the robot 14 can convert natural expressions (A language information) such as text information, voice information, image information, and video information to obtain standard expressions (Y language information) and/or the aforementioned intention acquisition and slot filling operations.
• when the robot 14's understanding ability or precise information extraction ability reaches a certain level, for example when its understanding or precise information extraction ability in a certain category is mature, it can independently perform the A→X→Y conversion or slot filling operations without the assistance of manual seats.
  • the MT training data table can be set in the robot 14, or can be an external database, and the standard expression data or slot filling result data (right language) stored in the demand code can be associated with the standard response ID.
  • This database can be updated by the central controller 12.
• the database used for text translation, speech recognition, image recognition, video processing, etc. may also be an external database.
  • the MAU workstation 13 is the interface between the intelligent answering device 1 and the MAU artificial seat 9.
  • the MAU workstation 13 presents the identified natural expression or the original expression of the customer to the MAU artificial seat 9.
  • the MAU human agent 9 inputs or selects the standard expression, or enters or selects the filling content through the MAU workstation 13, and then the MAU workstation 13 sends the standard expression or filling content to the central controller 12.
  • the MAU human agent 9 inputs or selects the response (or the standard response ID) through the MAU workstation 13.
  • the smart response device 1 may also include a trainer 15.
  • the trainer 15 is used to train the robot 14 to convert natural expressions into standard expressions and/or obtain intentions from natural expressions.
• the trainer 15 uses the judgment results of the MAU artificial seat 9 to train the robot 14, so as to continuously improve the robot understanding accuracy rate or intention acquisition accuracy rate of the robot 14 in each category (for example, the aforementioned business categories and secondary business categories, etc.).
• the trainer 15 compares the standard expression conversion result of the MAU artificial seat 9 with the standard expression conversion result of the robot 14; if the results are the same, the category's "number of accurate robot judgments" and "number of robot judgments" are increased accordingly.
• the trainer 15 compares the intention acquisition result of the MAU artificial seat 9 with the intention acquisition result of the robot 14; if the results are the same, the category's "number of accurate robot intention acquisitions" and "number of robot intention acquisitions" are increased accordingly; otherwise, the manual conversion result or intention acquisition result (which can also be represented by the manual slot filling result) is added to the MT training data table as new robot training data.
  • the trainer 15 can also instruct the robot 14 to perform the aforementioned "self-learning" and training.
  • the trainer 15 can also be used to train the robot 14 in machine intelligence technologies such as text translation, speech recognition, image recognition, and video processing.
  • the trainer 15 can also maintain and update MT training data tables, databases used for text translation, speech recognition, image recognition, and video processing.
  • the trainer 15 can also be integrated with the central controller 12.
  • the response generator 114 and the response database 113 may be independent of the dialogue gateway 11 or integrated in the central controller 12.
  • the intelligent response device 1 can implement the aforementioned natural expression processing and response methods.
• the dialogue gateway 11 receives the natural expression information from the customer 8 via the calling device 2 through the expression receiver 111 and sends it to the central controller 12; the central controller 12 instructs the robot 14 to recognize the natural expression information as some form of language information that a computer can process (such as X language information) together with the related expression information, and then instructs the robot 14 to convert that language information and related expression information into a standard expression; if the understanding of the robot 14 is not mature enough or no corpus match is achieved, the central controller 12 instructs the MAU workstation 13 to prompt the MAU artificial seat 9 to perform the manual conversion into the standard expression, and the MAU artificial seat 9 converts the language information and related expression information recognized by the robot 14 into the standard expression.
• optionally, the MAU artificial seat 9 can directly convert the unrecognized irregular natural expression information into a standard expression; the central controller 12 queries the expression-response database and searches for the standard response ID matching the standard expression, and if there is no matching result, the MAU workstation 13 prompts the MAU artificial seat 9 to select a standard response and enter the corresponding standard response ID; the MAU artificial seat 9 can also directly associate the standard expression with the response data, or create new response data; the central controller 12 instructs the response generator 114 to call and/or run the data in the response database 113 to generate a response to the customer 8's expression; the dialogue gateway 11 then feeds the response back to the customer 8 through the calling device 2; optionally, the central controller 12 maintains and updates the MT training data table or the response database according to the standard expression or standard response determined or added by the MAU artificial seat 9, and maintains and updates the expression-response database accordingly.
  • the smart answering device 1 can also implement the aforementioned intention acquisition and slot filling methods.
• the dialogue gateway 11 receives the natural expression information from the customer 8 via the calling device 2 through the expression receiver 111 and sends it to the central controller 12; the central controller 12 instructs the robot 14 to recognize the natural expression information as some form of language information that a computer can process (such as X language information), and then instructs the robot 14 to extract from that language information the part corresponding to the required standard expression, convert it into the standard expression, and fill the slot; if the robot 14's precise information extraction ability is not mature enough or no corpus match is achieved, so that the slot filling cannot be completed, the central controller 12 instructs the MAU workstation 13 to prompt the MAU artificial seat 9 to fill the slot manually; the MAU artificial seat 9 directly understands the natural expression and performs the slot filling operation according to the understanding result or the standard expression obtained from that understanding, which is input through the MAU workstation 13 and sent to the central controller 12; the central controller 12 queries the expression-response database; optionally, the MAU artificial seat 9 can also directly associate the standard expression (including the slot filling result) with the response data, or create new response data; the central controller 12 instructs the response generator 114 to call and/or run the data in the response database 113 to generate a response to the customer 8; the dialogue gateway 11 then feeds the response back to the customer 8 through the calling device 2; optionally, the central controller 12 maintains and updates the MT training data table or the response database according to the standard expression (including the slot filling result) or standard response determined or added by the MAU artificial seat 9, and maintains and updates the expression-response database accordingly.
• Figs. 12A to 12P schematically show the operation interfaces of the intention acquisition and slot filling system according to an embodiment of the present invention.
  • Fig. 12A shows an interface for setting "FAQ".
  • the so-called “FAQ” can refer to common questions in human-computer interaction (also the aforementioned "question item”).
  • “Change Password” is to change the password
  • “Check Credit Balance” is to check the credit card balance.
  • Customer Service means after-sales service, etc.
  • "Id” is a unique identifier assigned to the FAQ, which is used to conveniently query, input or select the FAQ.
  • the interface of FIG. 12A can be used to display and set FAQ. For example, when a new FAQ needs to be added, for example, to order a flight ticket, the content description "Flight” and "Id” of the FAQ can be manually input.
  • the interface of FIG. 12A can also be used to set a higher level of application scenarios. FAQs can be classified by application scenarios. For example, “Check Credit Balance” is included in the credit card service scenario, and "Flight” is included in the travel service scenario.
  • multiple rounds of human-machine dialogue can be used to achieve intent acquisition and slot filling operations for multiple FAQs in one application scenario or multiple FAQs in multiple application scenarios, or through a single intention acquisition and slot filling operation At the same time, it realizes intent acquisition and multi-slot filling operations across FAQs and even across application scenarios.
  • the dialog and response display interface of FIG. 12B shows the initial data form of the FAQ.
  • "Customer support” represents the robot's words
  • "test” represents the user's expression
  • "Engine Response” part showing the system (engine) Understanding FAQID and FAQ.
  • ResponseID is the identification of the system response corresponding to "Question”. Different "Question” can have different “ResponseID” or correspond to the same “ResponseID”.
  • the input box and "Go" in the upper left corner of the interface in Figure 12C are used to select the page of the data table under it, and you can jump by entering the page number.
  • the radio box "Include training data” is used to select whether the search result includes existing training data; the radio box "Mismatch FAQID” is used to select retrieval Does the result include training data with different FAQID and EXPECTED FAQID, so that you can view the unmatched data before manual correction; the reset control key “Reset” is used to reset the search conditions of "Question” at a time; search control The key “Search” is used to search "Question” and its related data according to the set search conditions; the training engine control key “Train Engine” is used to start the search engine (it can also be considered as the aforementioned robot or part of the robot) For training, manually assign the corresponding FAQID to the user expression (“Question”), which is equivalent to assigning the corresponding FAQ, click the control key "Train Engine” (training engine) to train the robot.
  • the interface shown in FIG. 12D is used to generate slots corresponding to the intent for the FAQ.
  • The left side of the interface in Figure 12D is the system menu bar. Under the "FAQ" item there are "Tree Editor", which is used to edit the dialogue script of human-machine interaction, that is, the dialogue logic based on the understanding of the user's expression, and "Import/Export", which is used to input or batch-upload FAQ data, or to export FAQ data.
  • the "Chat” (dialogue) item is used to display, select, and edit the human-computer interactive dialogue.
  • When the selected FAQ is "Flight", a "Slot ID" is filled in the "ID" input box, for example "FROM" pointing to the departure point, "TO" pointing to the destination, and so on.
  • The "Sort" column is used to enter the shortcut key (hot key) corresponding to the slot, used for quick input when a silent agent manually assigns a value to the slot; for example, "1" corresponds to "FROM" and "2" corresponds to "TO", so that a slot can be quickly specified by entering its "Sort" value or code during manually assisted input or when querying slot values.
  • the "Description” input box is used to input the description of the content filled in the slot, for example, "FROM” is used to describe the starting point, and "TO” is used to describe the destination.
  • "Valid Values" are the values that can validly fill the slot. A valid slot value can be regarded as a standard expression converted and extracted from the user's natural expression. For example, as shown in Figures 12E-1 and 12E-2, "PEK", "PVG", "HKG", etc. entered in the edit box under "Valid Values" are all unique airport codes.
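  • A minimal sketch of how such slot definitions and valid values might be organised is given below; the dictionary layout and function name are illustrative assumptions rather than the system's actual data format.

```python
# Hypothetical slot definitions for the "Flight" FAQ.
flight_slots = {
    "FROM": {"sort": "1", "description": "departure point",
             "valid_values": {"PEK", "PVG", "HKG"}},
    "TO":   {"sort": "2", "description": "destination",
             "valid_values": {"PEK", "PVG", "HKG"}},
}

def is_valid_fill(slot_id: str, value: str) -> bool:
    """Return True if `value` is a valid value for the given slot."""
    return value in flight_slots[slot_id]["valid_values"]
```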
  • The same valid slot value can be used to fill different slots, and a valid slot value can carry the same meaning across different FAQs or application scenarios. For example, the valid slot value "PEK", corresponding to Beijing Capital International Airport, can also be used in "dining" or "shopping" application scenarios, or in another FAQ "Pick Up" under the "travel" application scenario. Conversely, the same valid slot value can also be used to represent different meanings.
  • The valid value filled into a slot corresponds to a standard expression; each standard expression can correspond to multiple pieces of X language information, and the A language information (natural expressions) from which that X language information is converted is diverse. For example, SH, Shanghai Pu Dong, Shanghai, Shanghai Pudong, Pudong, Pu Dong, Shanghai Pudong International Airport and Pudong International Airport can all correspond to PVG; that is, when any of these expressions appears in the user's natural expression, it may be regarded as corresponding to the valid slot value PVG, converted to PVG, and filled into the corresponding slot.
  • Alternatively, a silent agent can understand from the natural expression that Shanghai Pudong Airport is meant and that it is the departure place, and then input PVG into the slot corresponding to the departure place.
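  • The sketch below illustrates the kind of expression-to-valid-value lookup described above; the mapping dictionary and function are illustrative assumptions built from the PVG/HKG/PEK examples in the text, not an actual component of the system.

```python
EXPRESSION_TO_CODE = {
    "shanghai": "PVG", "pudong": "PVG", "shanghai pudong": "PVG",
    "shanghai pudong international airport": "PVG",
    "hong kong": "HKG", "hk": "HKG", "hongkong": "HKG",
    "beijing": "PEK", "beijing capital international airport": "PEK",
}

def to_valid_value(fragment: str) -> str | None:
    """Map a recognised expression fragment to its valid slot value, if any."""
    return EXPRESSION_TO_CODE.get(fragment.strip().lower())

print(to_valid_value("Hong Kong"))  # -> "HKG"
```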
  • The correct X language information and the corresponding slot-filling results are saved as paired data in the database (i.e., the aforementioned MT training data table) for the robot to learn from. The robot learns from this paired data to improve its understanding accuracy and confidence, so training can also be accelerated by importing training data from outside. Training can likewise be performed with locally prepared paired data.
  • As shown in Figure 12F, clicking the corresponding control opens the pop-up window shown in Figure 12G; there, clicking the control "Choose File" uploads the slot data file to the slot "FROM".
  • the slot data file includes, for example, such data: PVG, SH; PVG, Shanghai Pu Dong; PVG, Shanghai; PVG, Shanghai Pudong; PVG, Pudong; PVG, Pu Dong; PVG, Shanghai Pudong International Airport; PVG, Pudong International Airport; PVG, Shanghai; PVG, Shanghai Pudong; HKG, Hong Kong International Airport; HKG, Hong Kong; HKG, HK; HKG, Hongkong; HKG, Hong Kong Chek Lap Kok International Airport; HKG, Hong Kong International Airport; HKG, Hong Kong Airport; HKG, Hong Kong; PEK, BJ; PEK, Beijing; PEK, Beijing Capital International Airport; PEK, Beijing Shou Du Ji Chang; PEK, Beijing Shou Du Guo Ji Ji Chang; PEK, Beijing Capital Airport; PEK, Capital Airport; PEK, Beijing Capital International Airport; PEK, Beijing, etc.
  • These slot data include multiple expressions corresponding to PVG, HKG, and PEK, on which the robot can be trained. The pairing data formed in this way is called partial pairing data. Such training does not by itself fully realize the robot's ability to fill slots automatically, but it can effectively improve the robot's understanding accuracy and confidence, thereby improving its ability to acquire intents and fill slots.
  • this kind of training can be carried out in advance before the artificial auxiliary training, which improves the convergence speed of the iterative operation, thereby reducing the workload of artificial auxiliary training. Therefore, this kind of training based on local pairing data can be regarded as pre-training completely performed by the robot itself.
  • The data actually used in training is still paired data, formed by the valid slot value and the X language information obtained by converting the expressions associated with that valid slot value.
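  • A minimal sketch of turning such a slot data file into training pairs follows; the CSV-like "code, expression" format matches the listing above, while the function name and the final conversion step are assumptions (here the expression is kept as raw text rather than converted to X language information, purely for brevity).

```python
import csv
from io import StringIO

SLOT_DATA = """PVG, Shanghai
PVG, Pudong
HKG, Hong Kong International Airport
PEK, Beijing Capital Airport
"""

def load_pairs(text: str) -> list[tuple[str, str]]:
    """Parse 'code, expression' lines into (expression, code) pairs."""
    pairs = []
    for row in csv.reader(StringIO(text)):
        if len(row) >= 2:
            code, expression = row[0].strip(), row[1].strip()
            pairs.append((expression, code))
    return pairs

# In the described system the expression side would first be converted into
# X language information before pairing and training.
training_pairs = load_pairs(SLOT_DATA)
```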
  • the method of training with partial paired data shown in FIGS. 12F to 12H can also be used in the aforementioned standard expression understanding conversion as an optional training method.
  • Figure 12I shows the main guide interface for manually assisted slot filling. The upper part of the interface contains multiple input boxes for data filtering: "Update Date From...To" filters by update date, "Create Date From...To" filters by creation date, and "Confidence Min:...Max" filters by confidence level.
  • “QID”, “Question”, “Faqid”, “Expected Faqid”, and “ResponseId” have the same meanings as mentioned above, and they can also be used as data selection conditions.
  • the control key “Search” is used to search according to the set search conditions, and the control key “Reset” is used to reset the search conditions all at once.
  • the radio button “Include training data” (including training data) is used to select whether the search result includes the existing training data; the radio box “Mismatch FAQID” is used to select whether the search result includes the unmatched FAQID.
  • The part "hong kong" highlighted in blue in Figure 12J-1 is selected, and the select control (or the shortcut key corresponding to the slot, "1" or "2") is used to choose the slot that this part should fill, for example slot 1 "FROM". "Hongkong" is then displayed in the text box in the middle of the row corresponding to "FROM", and the silent agent uses the drop-down menu on the right side of the same row to select the corresponding valid slot value; for example, the valid slot value corresponding to the selected expression "hong kong" is "HKG".
  • Figure 12M shows an example of accessing the engine to test the training effect.
  • As shown in Figure 12M, the robot can correctly recognize that the FAQ is "Flight"; in the "Engine Response" part on the right it can be seen that the robot records the FAQ ID as "Flight" and automatically obtains the correct fill-in content "From": "HKG" and "To": "PEK". When the expression "I want to buy a ticket from Beijing to Shanghai" is input, the robot again correctly recognizes that the FAQ is "Flight", and the "Engine Response" section on the right shows that the robot records the FAQ ID as "Flight" and automatically obtains the correct fill-in content "From": "PEK" and "To": "PVG".
  • Figure 12N shows another example of accessing the engine to test the training effect.
  • the engine responds with missing slot values or wrong slot filling values.
  • The robot can correctly recognize that the FAQ is "Flight"; in the "Engine Response" part on the right it can be seen that the robot records the FAQ ID as "Flight" and automatically obtains the correct fill-in content "From": "HKG", but the fill-in content "To": "PEK" is missing.
  • The X language information obtained by converting the text expression, together with the slot-filling result, can form paired data for training; alternatively, the text expression and the slot-filling result can first be stored as paired data, with the text expression converted into X language information later.
  • FIG. 13 schematically shows a process of natural expression processing combined with robot understanding and manual assisted understanding (MAU) according to an embodiment of the present invention. As shown in Figure 13, there are four levels of processing from top to bottom.
  • the first level of processing is automatically completed by the robot.
  • the robot's understanding maturity threshold can be set as a condition for the robot to automatically perform processing.
  • The robot's understanding maturity threshold can be its understanding accuracy threshold or its confidence threshold. For example, if the threshold is set to 90, then, as shown in Figure 13, natural expressions for which the robot's understanding accuracy or confidence is below 90 are not processed automatically by the robot but are transferred to silent agents for standardized processing.
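  • The threshold-based routing just described might be sketched as follows; `robot.understand` returning a (standard_expression, confidence) pair is a hypothetical interface assumed here for illustration.

```python
MATURITY_THRESHOLD = 90  # the example threshold from the text

def route(natural_expression, robot):
    """Route an expression per Fig. 13: robot if mature enough, else silent agent."""
    standard_expression, confidence = robot.understand(natural_expression)
    if confidence >= MATURITY_THRESHOLD:
        return "robot", standard_expression
    return "silent_agent", None  # handed over for standardized processing
```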
  • the second level of processing is completed by silent agents.
  • A silent agent uses the customer service staff's ability to understand natural expressions to provide standardized understanding results, thereby assisting the robot in responding and forming pairing data for training the robot. The robot transfers the natural expression to be understood to the silent agent for processing; the silent agent receives the natural expression through his or her own senses, by watching, listening, and so on, understands it based on his or her own comprehension ability, and outputs the understanding result as a standard expression, according to which the robot then responds automatically.
  • What silent agents need is only the understanding ability of ordinary customer service personnel; because they do not respond to customer expressions directly, there are no requirements on their voice, accent, or response proficiency. This lowers the professional requirements on customer service personnel and is also conducive to promoting employment.
  • the robot automatically receives expressions and responds.
  • the silent agent is only responsible for understanding and does not need to respond, which can save a lot of human resources.
  • The silent agent can understand and handle multiple conversations at the same time, further improving work efficiency. On the other hand, since the silent agent outputs a standard expression for each natural expression, the natural expression and the corresponding standard expression form paired data that can be added to the aforementioned MT training data table to train the robot and improve its understanding ability.
  • As the robot's understanding ability improves, and with the understanding maturity threshold unchanged, a smaller and smaller proportion of customer expressions is transferred to silent agents, so the number of human agents and the labor cost can be further reduced, realizing closed-loop positive feedback in the system.
  • Because the robot responds automatically based on the silent agent's understanding, the response is not affected by the customer service staff's emotions, voice, accent, or business proficiency. For specific categories (or specific vertical applications), if the number of standard responses is not too large, pre-recorded voice, video, and the like can be used as the response, which can bring a better user experience than voice or animation synthesized through TTS technology.
  • Fig. 14 schematically shows an example of an operation interface presented by the MAU workstation to the MAU artificial agent 9, where the MAU artificial agent 9 is the silent agent.
  • the operation interface of the MAU workstation 13 includes: a client expression display area 131, a dialogue status display area 132, a navigation area 133, a category selection area 134, and a shortcut area 135.
  • The customer expression display area 131 displays the customer's (i.e., the user's) natural expression, for example by presenting text, images, or text converted from voice, by displaying an image itself as the natural expression, or by providing links, etc.; the MAU artificial agent 9 can choose to click and listen to the voice expression.
  • the dialog status display area 132 displays real-time status information of the dialog between the customer 8 and the MAU artificial agent 9 or the robot 14, such as the number of round trips, the total duration of the dialog, customer information, and so on.
  • This display area is optional and may be omitted.
  • the navigation area 133 displays the categories that the MAU manual agent 9 has selected to reach.
  • The left end of the area displays the text version of the current category path (as shown in the figure: Bank > Credit Card), and the right end displays the corresponding category code (as shown in the figure: "12", where "1" represents the "Bank" category and "2" represents "Credit Card", the next level under "Bank"). Here the digit "1", rather than a letter code such as "BNK", is used to represent the "Bank" category; the two kinds of identifier are functionally identical.
  • the category selection area 134 is for the MAU human agent 9 to select the next level category.
  • In the figure, the MAU artificial agent 9 has entered "Credit Card", the next-level category under "Bank"; the "Credit Card" category has seven sub-categories, including "Activate New Card", "Apply for New Card and Application Progress Inquiry", and "Repayment". For example, if the expression of customer 8 is "The overdraft limit on my credit card is too low", the MAU artificial agent 9, after receiving and understanding it, can simply type "127" on the keyboard to reach the target category "Bank > Credit Card > Adjust Credit Limit". In this way, customer 8 does not need to spend a long time traversing a complicated function menu tree to find the needed self-service; the customer only needs to state the need directly, and the MAU artificial agent 9 can quickly start the "adjust credit limit" process for the customer. The user experience therefore becomes easy and convenient, and the utilization rate of the self-service processes of traditional IVR systems is greatly improved. A sketch of this digit-code navigation is given below.
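  • The following sketch resolves a keyed-in code such as "127" against a hypothetical category tree built from the Bank > Credit Card example; the tree contents and the rule that the leading digit selects the root category are assumptions for illustration.

```python
CATEGORY_TREE = {
    "name": "Bank",
    "children": {
        "2": {
            "name": "Credit Card",
            "children": {"7": {"name": "Adjust Credit Limit", "children": {}}},
        },
    },
}

def resolve(code: str) -> str | None:
    """Resolve a keyed-in code such as '127' to a readable category path."""
    node, path = CATEGORY_TREE, [CATEGORY_TREE["name"]]
    for digit in code[1:]:  # the leading digit selects the root category "Bank"
        node = node["children"].get(digit)
        if node is None:
            return None
        path.append(node["name"])
    return " > ".join(path)

print(resolve("127"))  # -> "Bank > Credit Card > Adjust Credit Limit"
```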
  • The shortcut area 135 provides common shortcut keys for the MAU artificial agent 9: for example, "-" returns to the upper-level category, "0" transfers to a human agent, and "+" returns to the top-level category (in this example, the root category "Bank").
  • the shortcut area 135 can also provide other shortcut keys for the MAU artificial agent 9.
  • the shortcut area 135 can increase the processing speed of the MAU artificial seat 9.
  • the shortcut area 135 is also an optional setting area.
  • the third layer is handled by senior agents.
  • When a silent agent encounters a non-standard situation, that is, when he or she is not sure whether the understanding of the customer's expression is correct, finds that there is no standard expression in the system that can express it, or finds that there is no accurate response in the system, the silent agent can transfer the processing to a senior agent, who communicates directly with the customer by voice or text.
  • This means that senior agents are usually responsible for handling non-standard situations (including emerging situations).
  • Alternatively, the silent agent can first report that the customer's expression was not heard clearly or not understood and ask the customer to express it again or in another way, and transfer the conversation to a senior agent only if it still cannot be handled.
  • the senior agents here are somewhat similar to the agent supervisors of traditional customer service, dealing with difficult problems.
  • Senior agents can also provide positive feedback to the system. Specifically, the senior agent will form a Q&A (question and answer) for customer problems (specific expressions) and solutions (responses) encountered, and provide them to the back-end knowledge base designer.
  • The knowledge base designer then carries out back-end construction of dialogue scripts, such as designing a tree-like dialogue scheme for a specific category or its sub-categories. As shown in Figure 13, the knowledge base designer designs a new FAQ "FAQ-12" under the subcategory "Branch-11" of the business category "Branch-1" based on the Q&A provided by the senior agent.
  • the FAQ may include standard expressions and slot filling results corresponding to customer expressions, as well as standard responses corresponding to standard expressions and slot filling results.
  • The aforementioned MAU artificial agent 9 may include the above-mentioned silent agents, and may also include the senior agents and the knowledge base designers.
  • The X language information (i.e., secondary language information) obtained by converting the natural expression and the Y language information (i.e., the standard expression) corresponding to the meaning (intention) of that natural expression constitute paired data, and self-learning (training) is carried out through iterative comparison of the permutations and combinations of their elements. In other words, the basis of machine self-learning (training) is paired data of natural expressions and the standard expressions corresponding to their meanings.
  • Such pairing data can be obtained by means of manually assisted understanding, for example through silent agents, or by asking the user to input a natural expression for verification; it can also be obtained automatically by a machine.
  • To obtain pairing data automatically, a text script corresponding to a standard expression can be generated first. For example, if the standard expression carries the meaning "yes", multiple text scripts corresponding to that meaning can be written, such as "Yes" (English), "right", "ah", and so on; these scripts can be written manually or called from a database. The corresponding voice is then obtained through a text-to-speech (TTS) tool, yielding standard expression-voice pairing data. Since the standard expression can be designed in advance and TTS conversion of text to speech is relatively accurate, accurate pairing data can be obtained; the voice can then be converted into secondary language information whose information granularity is finer than that of text, forming paired data of secondary language information and standard expressions for machine self-learning. This method can also be called pre-training of the natural intelligence robot.
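  • A minimal sketch of this pre-training pipeline follows; `tts` is a placeholder callable standing in for any text-to-speech tool, not a specific library API, and the final conversion of the audio into secondary language information is only noted in a comment.

```python
def pretraining_pairs(standard_expression, scripts, tts):
    """Build (speech, standard_expression) pairing data with a TTS tool."""
    pairs = []
    for script in scripts:  # e.g. "Yes", "right", "ah" for the meaning "yes"
        audio = tts(script)
        # In the described method the audio would additionally be converted to
        # secondary language information (finer-grained than text) before being
        # stored together with the standard expression.
        pairs.append((audio, standard_expression))
    return pairs

# Usage sketch with a dummy TTS stand-in:
pairs = pretraining_pairs("YES", ["Yes", "right", "ah"],
                          tts=lambda s: b"audio:" + s.encode())
```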
  • The TTS tool can also be used to enrich and expand the speech corresponding to the standard expression and increase the matching corpus, for example by adjusting one or more of the speed, volume, tone, and intonation of the voice. For instance, speech can be generated at 1.1 times and 0.9 times the normal speed and at 1.1 times and 0.9 times the normal volume, and the sound waves can be fine-tuned by random variables, where the choice of random variables and their ranges can be determined from a big-data statistical model of human voices. TTS tools with voice models of different genders, different languages or dialects, and different speaking habits and styles can also be used to generate speech for training.
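  • The sketch below enumerates such variation settings; the 0.9x/1.1x factors come from the example above, while the perturbation range is an assumed placeholder for a value that would, per the text, be derived from a statistical model of human voices.

```python
import itertools
import random

def augmentation_settings(seed: int = 0):
    """Enumerate illustrative TTS variation settings (speed, volume, perturbation)."""
    rng = random.Random(seed)
    settings = []
    for speed, volume in itertools.product((0.9, 1.0, 1.1), (0.9, 1.0, 1.1)):
        settings.append({
            "speed": speed,
            "volume": volume,
            "perturbation": rng.uniform(0.0, 0.02),  # assumed range
        })
    return settings
```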
  • Such pre-training pairing data and the pairing data generated based on these data and stored in the training database can also be copied to training databases in other vertical fields or categories as needed, or these pairing data can be removed from the current training database.
  • the above-mentioned pre-training method is also applicable to the situation where the natural expression is the aforementioned static image, dynamic image, video, etc.
  • the response generator 114 can be used as the aforementioned TTS tool to generate speech corresponding to the standard expression.
  • Fig. 17 schematically shows an end-to-end control system based on natural intelligence according to an embodiment of the present invention.
  • the control system includes sensors, robots and controllers.
  • the sensor is one end and the controller is the other end.
  • The control method is as follows: images and other information are obtained through the sensor, the robot processes them on the basis of known rules and data to obtain a control signal, and control is performed by outputting the control signal to the controller.
  • the controller provides the robot with control data as part of the training data.
  • During training, a human driver drives the vehicle in person or controls it remotely; the driving or manipulation includes controlling the acceleration, deceleration, and direction of the vehicle through the accelerator, brake, steering wheel, and gear position, as well as operating the turn signals, fog lights, width (clearance) lights, wipers, and so on.
  • the controller obtains the control parameters generated by the human driver.
  • The control parameters can be the specific operating actions on the accelerator, brake, steering wheel, gear position, turn signals, fog lights, width lights, wipers, and so on, or the control quantities produced by these actions, such as acceleration and deceleration, steering angle and steering angular speed, light switch states, and so on.
  • These control parameters and/or control quantities are provided to the robot by the controller as control data.
  • sensors obtain sensor data such as images, sounds, and radar ranging by sensing the driving environment, which are also provided to the robot.
  • The manipulation behavior of the human driver is a response to the driving environment; that is, the sensor data and the control data have a causal relationship in the time dimension. Therefore, sensor data and control data over multiple time intervals are used as paired data, from which the robot can acquire the ability to derive the corresponding control data from sensor data, iteratively updating the model that maps sensor data to control data during training. This is similar to the aforementioned machine learning/training from X elements converted from natural expressions to standard Y elements.
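  • The sketch below pairs time-stamped sensor frames with the control events that follow them, reflecting the causal order (perceive, then act); the 0.3-second pairing horizon and the exact pairing rule are illustrative assumptions, not values from the source.

```python
def pair_sensor_with_control(sensor_frames, control_events, horizon=0.3):
    """Pair each sensor frame with the first control event within `horizon` seconds.

    Both inputs are lists of (timestamp, data) tuples sorted by time.
    """
    pairs, j = [], 0
    for t, frame in sensor_frames:
        while j < len(control_events) and control_events[j][0] <= t:
            j += 1
        if j < len(control_events) and control_events[j][0] - t <= horizon:
            pairs.append((frame, control_events[j][1]))
    return pairs
```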
  • the model can also be a neural network model.
  • The sensor data used for robot learning/training can also be the changes of images, sounds, and radar ranging data with respect to time. In particular, the change over time of a partial image relative to the overall image can be used as sensor data, since such change data often reflects moving objects in the road environment.
  • The image, sound, and radar ranging data among the sensor data are close to human perception data; they can be regarded as simulating what a human driver perceives while driving. Therefore, when recording the driving or manipulation behavior of a human driver, in the car or remotely, changes in the driver's field of view can be taken into account, the image data within the field of view can be weighted, and different weights can be applied to areas of the field of view with different sensitivity.
  • sensor data may also include detection data of the vehicle's own state, such as vehicle speed, steering, fuel consumption, tire pressure, wind speed, wind resistance, and so on.
  • The sensor data on the sensor side and the control data on the controller side are input to the robot as time-correlated pairing data for learning/training, so that the robot can generalize from them and, from sensor data, output the control signals used to drive the controller.
  • the time interval corresponding to the data may be a small preset interval (for example, the image frame interval or a multiple thereof), or it may be from the start to the end of the manipulation behavior (for example, stop steering the steering wheel and stop manipulating the accelerator and brake).
  • the end-to-end control method and control system according to the embodiments of the present invention can also be used for drones.
  • The UAV can be controlled remotely, and tools such as virtual reality (VR) helmets or glasses can be used.
  • the operator reacts by watching the images collected by the camera carried by the drone and listening to the sound collected by the microphone carried by the drone.
  • the drone is controlled by the remote control console or the control handle.
  • Under such control, the drone can fly horizontally and vertically, hover and flip, and perform operations such as security monitoring, environmental monitoring, pesticide spraying, emergency rescue, and buzzer warning.
  • the robot can have the ability to automatically react to the flight environment.
  • Compared with vehicles, the flying environment of drones is relatively simple; because a drone has low mass and carries no people, the safety requirements are lower. A further advantage is that during automatic cruising the drone can hover and be adjusted at any time, that is, when the robot cannot automatically determine the next behavior there is more time for manual assistance, similar to the manual assistance by silent agents in the foregoing embodiments.
  • Computer automation is an even simpler and safer application. Here, the computer's responses to human operations, as displayed on the computer screen, are recorded through a camera as sensor data; at the same time, the human operations performed on the computer through the mouse, keyboard, sound, and so on, or the control quantities those operations generate on the computer, are recorded correspondingly.
  • the robot can replace the human to operate the computer according to the operating habits of the trained personnel.
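  • A minimal sketch of recording such time-aligned (screen, action) samples follows; `capture_screen` and `current_action` are placeholder callables standing in for real screen-capture and input-hook APIs, which the text does not name.

```python
import time
from dataclasses import dataclass

@dataclass
class Sample:
    timestamp: float
    screen: bytes   # e.g. an encoded screenshot (sensor data)
    action: object  # e.g. {"type": "click", "x": 10, "y": 20} (control data)

def record(capture_screen, current_action, duration_s=5.0, period_s=0.1):
    """Collect time-aligned (screen, action) samples for later training."""
    samples, t0 = [], time.monotonic()
    while time.monotonic() - t0 < duration_s:
        samples.append(Sample(time.monotonic(), capture_screen(), current_action()))
        time.sleep(period_s)
    return samples
```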
  • For example, a repetitive action sequence in a video game performed this way is close to the player's actual operating habits, so it is less likely to be treated as a bot substitute; another example is repeating download and save operations on data websites that are configured to block crawling robots.
  • the robot can be realized by a stand-alone computer or by the controlled computer itself.
  • the computers here can be extended to digital processing terminals such as mobile phones and tablets.
  • Fig. 18 schematically shows a process in which a trained robot makes a judgment based on sensor data and controls the controller.
  • As shown in Figure 18, the sensor data collected by the sensors is input to the robot. In step S51, the robot determines whether the corresponding control data can be determined from the sensor data; if so, then in step S52 the robot generates a control signal from the control data corresponding to the sensor data and sends the control signal to the controller.
  • The controller then controls the device according to the control signal in step S54. If the control data cannot be determined in step S51, the device is controlled manually in step S53: the operator controls the device through the controller in step S54, and the control data generated by this manipulation is recorded; the control data generated under manual control can be paired with the corresponding sensor data and saved to further train the robot.
  • the robot can also use the aforementioned confidence level or similar indicators to determine whether the corresponding control data can be determined based on the sensor data.
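  • One pass of this Fig. 18 flow might look like the sketch below; `robot.infer` returning a (control_data, confidence) pair, the `controller.apply` interface, and the 0.9 threshold are assumptions for illustration.

```python
def control_step(sensor_data, robot, controller, manual_control, pair_log,
                 threshold=0.9):
    """One pass of the Fig. 18 flow: act automatically when confident, else manually."""
    control_data, confidence = robot.infer(sensor_data)
    if confidence >= threshold:                       # S51 -> S52
        controller.apply(control_data)                # S54
    else:                                             # S51 -> S53: manual control
        control_data = manual_control()
        controller.apply(control_data)                # S54
        pair_log.append((sensor_data, control_data))  # saved for further training
    return control_data
```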
  • The human-computer interaction system based on natural intelligence may include one or more computers, mobile terminals, or other data processing devices, which may be used to perform automatic conversion from natural expressions to standard expressions or precise information extraction based on natural expressions.
  • the system can also realize closed-loop feedback and pre-training.
  • Standard expressions can be used to point quickly to responses, so that customers do not need to spend a long time traversing complex function menus to find the self-service they need.
  • Through the robot's automatic learning and training and through manually assisted understanding, a database chain of converted natural expression information (X language information) - standard expressions (including intent acquisition information) - standard responses can be established, gradually realizing automatic understanding and response by the system.
  • the converted natural expression information data stored in the database can also have the advantages of narrow business scope and high fidelity, thereby reducing the difficulty of robot training and shortening the mature period of robot intelligence.
  • Silent agents mainly perform background "decision-making" work, including determining standard expressions (Y language information) or intentions, selecting responses (or response IDs), or generating response operations; they do not need to communicate with customers directly through front-desk calls, text input, or other means. This saves a lot of human resources and improves work efficiency.
  • Compared with the free-style responses that traditional human agents provide directly to customers, the standardized responses provided by the system are not affected by the agents' emotions, voice, accent, business proficiency, or many other factors, and better guarantee a stable customer experience.
  • robots can be realized in units of specific application scenarios (business categories), so as to realize the intelligence of the overall system point by point.
  • the mechanism of "robot understanding maturity point by point" is easier to be recognized and accepted by institutions, because the risk is relatively low, the cost of upgrading the old system is not high, and it will not have a negative impact on daily operations.
  • A control method or control system following the natural intelligence methodology does not need to identify objects through annotation and models, nor does it need to establish a large number of control rules; it only needs to imitate human perception and the corresponding control behavior to train the control model automatically and use the trained model to achieve automatic control. This saves the massive labor cost required for labeling and rule building, and avoids potential safety risks such as the above-mentioned labeling errors or incomplete rules.

Abstract

一种基于自然智能的自然表达处理方法,包括:接收自然表达的输入,得到具有第一信息颗粒度的第一语言信息,转换为具有第二信息颗粒度的第二语言信息,第二信息颗粒度的数量级介于第一信息颗粒度的数量级与文字的信息颗粒度的数量级之间,将第二语言信息转换为第三语言信息,作为对自然表达进行理解的结果,第二语言信息和与该第二语言信息对应的第三语言信息作为配对数据被存储在数据库,将该第二语言信息的元素的各种排列组合与该第三语言信息或者该第三语言信息的元素的各种排列组合进行循环迭代,建立第二语言信息的元素的各种排列组合与第三语言信息或第三语言信息的元素的各种排列组合之间的对应关系,获得更多的配对数据。

Description

基于自然智能的自然表达处理方法、回应方法、设备及系统,对机器人进行训练的方法,人机交互系统,对基于自然智能的人机交互系统进行训练的方法,端到端控制方法和控制系统
本申请要求于2019年1月23日递交的第201910064406.9号、第201910064402.0号、第201910065098.1号、第201910065178.7号、第201910065177.2号和第201910065179.1号中国专利申请以及于2019年4月16日提交的第201910303688.3号中国专利申请以及于2019年5月20日提交的第201910420711.7号中国专利申请的优先权,在此全文引用上述中国专利申请公开的内容以作为本申请的一部分。
技术领域
本发明涉及一种对自然表达的处理方法,具体而言,涉及一种基于自然智能的自然表达处理方法、处理及回应方法、设备及系统,对机器人进行训练的方法,人机交互系统,对基于自然智能的人机交互系统进行训练的方法,端到端控制方法和控制系统。
背景技术
目前常用的机器智能(MI,Machine Intelligence)技术主要包括人工智能(AI,Artificial Intelligence)技术。其中比较常见的有基于人工智能的自然语言处理(NLP,Natural Language Processing)技术、无人驾驶技术等。
对基于人工智能的自然语言处理(也可简称为AI-NLP)而言,对于所处理的语音,需要从语音先转文本,再通过已建立的语法模型和语义模型来实现语义的理解。不过,这种方法受制于语音识别器的准确率。例如,一个句子有10个字,语音识别器能够实现90%的准确度,但如果错误发生在关键词(字),那么现有的AI-NLP技术便无法实现正确的语义理解。特别地,在噪音环境下,由于语音识别器的准确率会下降,所以要用AI-NLP技术准确地进行语义理解的难度也更高了。另一方面,由于AI-NLP需要人工构建海量的语法模型和语义模型,因而会产生极大的人工成本。事实上,目前世 界上从事AI-NLP技术研发和应用的主要企业均有数千甚至更多的员工从事语音的人工标注和模型构建。
对基于人工智能的无人驾驶技术而言,对于用诸如摄像头或雷达的传感器收集的图像数据,仍然需要先从图像数据进行物体检测,识别出预定物体并对其进行定位,然后再根据检测到的物体的性质和位置来进行驾驶决策。例如,当机器从摄像头获得的图像中检测出行进道路前方有一个人或者一只猫,则控制车辆减速或刹车。因此,在实现自动驾驶系统时,首先要建立物体检测模型,而在建立和训练物体检测模型时,需要大量的经过人工标注的图像数据,使得物体检测模型能够学习对物体的识别和定位。这种人工标注工作会带来很高的成本,也容易发生错误。
发明内容
根据本发明的一个方面,提供了一种基于自然智能的自然表达处理方法,其中,包括:接收自然表达的输入,得到具有第一信息颗粒度的第一语言信息,将第一语言信息转换为具有第二信息颗粒度的第二语言信息,其中,第二信息颗粒度的数量级介于第一信息颗粒度的数量级与文字的信息颗粒度的数量级之间,将第二语言信息转换为第三语言信息,第三语言信息作为对自然表达进行理解的结果,其中,第二语言信息和与该第二语言信息对应的第三语言信息作为配对数据被存储在数据库,对于数据库中已有的成对的第二语言信息和第三语言信息,将该第二语言信息的元素的各种排列组合与该第三语言信息或者该第三语言信息的元素的各种排列组合进行循环迭代,建立第二语言信息的元素的各种排列组合与第三语言信息或第三语言信息的元素的各种排列组合之间的对应关系,获得更多的第二语言信息与第三语言信息的配对数据,并存储在数据库中。
根据本发明实施例的基于自然智能的自然表达处理方法,其中,当从输入的第一语言信息获得第二语言信息后,将该第二语言信息与数据库中已有的第二语言信息进行比较,然后根据比较结果来确定与该第二语言信息对应的第三语言信息,或者计算将该第二语言信息对应到某第三语言信息的正确率,如果机器理解能力不够成熟,不足以或者不确定将该第二语言信息转换到某条第三语言信息,那么进行人工辅助理解,通过人工对输入的第一语言 信息进行理解,得到与自然表达的含义所对应的第三语言信息,并且将从该第一语言信息得到的第二语言信息与第三语言信息对应起来或者将第一语言信息与第三语言信息对应起来,得到新的配对数据存入数据库。
根据本发明实施例的基于自然智能的自然表达处理方法,其中,对于新的第二语言信息与第三语言信息的配对数据或者新的第一语言信息与第三语言信息的配对数据,将其中的第二语言信息或者由第一语言信息转换得到的第二语言信息的元素的各种排列组合与其中的第三语言信息或者该第三语言信息的元素的各种排列组合进行循环迭代,建立第二语言信息的元素的各种排列组合与第三语言信息或第三语言信息的元素的各种排列组合之间的对应关系,获得更多的第二语言信息与第三语言信息的配对数据,并存储在数据库中。
根据本发明实施例的基于自然智能的自然表达处理方法,其中,通过人工辅助理解纠正数据库中第二语言信息与第三语言信息之间错误的对应关系。
根据本发明实施例的基于自然智能的自然表达处理方法,其中,通过自信度来衡量机器理解能力,其中,基于第二语言信息与第三语言信息的对应关系来计算自信度。
根据本发明实施例的基于自然智能的自然表达处理方法,其中,从第一语言信息得到第二语言信息之后,通过深度神经网络、有穷状态转换器、自动编码器解码器中的一个或多个来产生对第三语言信息的对数概率或相类似分数,再利用归一化的指数函数来计算出对第三语言信息的自信度。
根据本发明实施例的基于自然智能的自然表达处理方法,其中,第二语言信息的信息颗粒度是文字的信息颗粒度的1/10~1/1000。
根据本发明实施例的基于自然智能的自然表达处理方法,其中,在对第二语言信息与第三语言信息的配对数据进行循环迭代时,也对第二语言信息到第三语言信息的转换模型进行循环优化。
根据本发明实施例的基于自然智能的自然表达处理方法,其中,用循环迭代得到的第二语言信息测试机器对于第二语言信息到第三语言信息的转换,并将不能被正确转换的第二语言信息及其应正确对应的第三语言信息写入对照表,对于后续输入的自然表达,由自然表达转换的第二语言信息先与 对照表中存储的第二语言信息进行对比。
根据本发明的一方面,提供了一种基于自然智能的自然表达处理及回应方法,其中包括:通过根据前述的自然表达处理方法获得第三语言信息;调用或生成与第三语言信息相匹配的标准回应;以与第一语言信息对应的方式输出标准回应。
根据本发明实施例的基于自然智能的自然表达处理及回应方法,其中,标准回应是预先存储在回应数据库中的固定数据,或者基于变量参数和预先在回应数据库中存储的标准回应的基础数据来生成标准回应。
根据本发明的一方面,提供了一种基于自然智能的自然表达处理及回应设备,其中,包括:对话网关,中央控制器,MAU工作站,机器人,表达数据库,回应数据库和回应生成器,其中,对话网关接收来自用户的自然表达,发送给中央控制器进行后续处理,并且将对自然表达的回应发送给用户;中央控制器接收来自对话网关的自然表达,并与机器人以及MAU工作站协同工作,将该自然表达转换为表示该自然表达的含义的标准表达,并根据标准表达指示回应生成器生成与该标准表达对应的标准回应;机器人根据中央控制器的指示,将自然表达转换为次级语言信息,其中,次级语言信息的信息颗粒度的数量级介于自然表达的信息颗粒度的数量级与文字的信息颗粒度的数量级之间,并将次级语言信息转换为标准表达;MAU工作站将自然表达呈现给外部的MAU人工座席,MAU人工座席通过MAU工作站输入或者选择标准表达,然后MAU工作站将该标准表达发送给中央控制器;训练数据库用于存储次级语言信息和标准表达的配对数据;回应数据库存储回应相关数据,包括供调用的标准回应数据和/或用于生成回应的数据;回应生成器接收中央控制器的指令,通过调用和/或运行回应数据库中的数据来生成对用户的自然表达的回应,其中,设备进一步包括训练器,该训练器用于训练机器人将自然表达转换为标准表达,其中,训练器使得机器人对于训练数据库中已有的成对的次级语言信息和标准表达,将该次级语言信息的元素的各种排列组合与该标准表达或者该标准表达的元素的各种排列组合进行循环迭代比较,建立次级语言信息的元素的各种排列组合与标准表达或标准表达的元素的各种排列组合之间的对应关系,获得更多的次级语言信息与标准表达的配对数据,并存储在训练数据库中。
根据本发明的一方面,提供了一种基于自然智能的人机交互系统,其中,包括:自然表达处理及回应设备和呼叫设备,其中,用户通过呼叫设备与自然表达处理及回应设备通信,MAU人工座席对自然表达处理及回应设备进行人工操作,自然表达处理及回应设备包括:对话网关,中央控制器,MAU工作站,机器人,表达数据库,回应数据库和回应生成器,其中,对话网关接收来自用户的自然表达,发送给中央控制器进行后续处理,并且将对自然表达的回应发送给用户;中央控制器接收来自对话网关的自然表达,并与机器人以及MAU工作站协同工作,将该自然表达转换为表示该自然表达的含义的标准表达,并根据标准表达指示回应生成器生成与该标准表达对应的标准回应;机器人根据中央控制器的指示,将自然表达转换为次级语言信息,其中,次级语言信息的信息颗粒度的数量级介于自然表达的信息颗粒度的数量级与文字的信息颗粒度的数量级之间,并将次级语言信息转换为标准表达;MAU工作站将自然表达呈现给MAU人工座席,MAU人工座席通过MAU工作站输入或者选择标准表达,然后MAU工作站将该标准表达发送给中央控制器;训练数据库用于存储次级语言信息和标准表达的配对数据;回应数据库存储回应相关数据,包括供调用的标准回应数据和/或用于生成回应的数据;回应生成器接收中央控制器的指令,通过调用和/或运行回应数据库中的数据来生成对用户的自然表达的回应,其中,设备进一步包括训练器,该训练器用于训练机器人将自然表达转换为标准表达,其中,训练器使得机器人对于训练数据库中已有的成对的次级语言信息和标准表达,将该次级语言信息的元素的各种排列组合与该标准表达或者该标准表达的元素的各种排列组合进行循环迭代,建立次级语言信息的元素的各种排列组合与标准表达或标准表达的元素的各种排列组合之间的对应关系,获得更多的次级语言信息与标准表达的配对数据,并存储在训练数据库中。
根据本发明的一方面,提供了一种基于自然智能的自然表达处理方法,其中包括:接收第一自然表达,将第一自然表达转换为次级语言信息,计算将由第一自然表达转换的次级语言信息转换为数据库中的标准表达的自信度,当计算得到对于某标准表达的自信度不低于第一自信度阈值,输出该标准表达作为对第一自然表达进行理解的结果。
根据本发明实施例的基于自然智能的自然表达处理方法,其中,当计算 的自信度均低于第二自信度阈值,提示输入与第一自然表达具有相同含义的第二自然表达。
根据本发明实施例的基于自然智能的自然表达处理方法,其中,将第二自然表达转换为次级语言信息,计算将由第二自然表达转换的次级语言信息转换为数据库中的标准表达的自信度,当计算得到对于某标准表达的自信度不低于第一自信度阈值,输出该标准表达作为对第一自然表达进行理解的结果。
根据本发明实施例的基于自然智能的自然表达处理方法,其中,当计算得到的对某标准表达的自信度低于第一自信度阈值但不低于第二自信度阈值,提示输入第三自然表达以确认该标准表达是否对应于第一自然表达的含义。
根据本发明实施例的基于自然智能的自然表达处理方法,其中,将第三自然表达转换为次级语言信息,计算将由第三自然表达转换的次级语言信息转换为表示“确认”含义的第二标准表达的自信度,如果该自信度不低于第一自信度阈值,输出第一标准表达作为对第一自然表达进行理解的结果。
根据本发明实施例的基于自然智能的自然表达处理方法,其中,将由第一自然表达转换的次级语言信息与第一标准表达作为配对数据存储在数据库。
根据本发明实施例的基于自然智能的自然表达处理方法,其中,如果计算的自信度低于第一自信度阈值或者其它自信度阈值,对第一自然表达进行人工辅助理解或者其它人工处理。
根据本发明实施例的基于自然智能的自然表达处理方法,其中,基于次级语言信息与标准表达的对应关系来计算自信度,通过深度神经网络、有穷状态转换器、自动编码器解码器中的一个或多个来产生对单条或多条标准表达的对数概率或相类似分数,再利用归一化的指数函数来计算出对该条或该多条标准表达的自信度。
根据本发明实施例的基于自然智能的自然表达处理方法,其中,次级语言信息的信息颗粒度的数量级小于文字的信息颗粒度的数量级。
根据本发明实施例的基于自然智能的自然表达处理方法,其中,次级语言信息的信息颗粒度是文字的信息颗粒度的1/10~1/1000。
根据本发明实施例的基于自然智能的自然表达处理方法,其中,对于数据库中已有的成对的次级语言信息和标准表达,将该次级语言信息的元素的各种排列组合与该标准表达或者该标准表达的元素的各种排列组合进行循环迭代,建立次级语言信息的元素的各种排列组合与标准表达或标准表达的元素的各种排列组合之间的对应关系,获得更多的次级语言信息与标准表达的配对数据,并存储在数据库中。
根据本发明实施例的基于自然智能的自然表达处理方法,其中,用循环迭代得到的次级语言信息测试机器对于次级语言信息到标准表达的转换,并将不能被正确转换的次级语言信息及其应正确对应的标准表达写入对照表,对于后续输入的自然表达,由自然表达转换的次级语言信息先与对照表中存储的次级语言信息进行对比。
根据本发明实施例的基于自然智能的自然表达处理方法,其中,在对次级语言信息与标准表达的配对数据进行循环迭代时,也对次级语言信息到标准表达的转换模型进行循环优化。
根据本发明的一方面,提供了一种基于自然智能的自然表达处理及回应方法,其中,包括:前述的自然表达处理方法获得第一标准表达;调用或生成与标准表达相匹配的标准回应;以与第一自然表达对应的方式输出标准回应。
根据本发明的一方面,提供了一种基于自然智能的自然表达处理及回应设备,其中,包括:对话网关,中央控制器,MAU工作站,机器人,训练数据库,回应数据库和回应生成器,其中,对话网关接收来自用户的自然表达,发送给中央控制器进行后续处理,并且将对自然表达的回应发送给用户;中央控制器接收来自对话网关的自然表达,并与机器人以及MAU工作站协同工作,将该自然表达转换为表示该自然表达的含义的标准表达,并根据标准表达指示回应生成器生成与该标准表达对应的标准回应;机器人根据中央控制器的指示,将自然表达转换为次级语言信息,计算将由自然表达转换的次级语言信息转换为训练数据库中的标准表达的自信度,当计算得到对于某标准表达的自信度不低于第一自信度阈值,将次级语言信息转换为该标准表达;MAU工作站将自然表达呈现给外部的MAU人工座席,MAU人工座席通过MAU工作站输入或者选择标准表达,然后MAU工作站将该标准表达 发送给中央控制器;训练数据库用于存储次级语言信息和标准表达的配对数据;回应数据库存储回应相关数据,包括供调用的标准回应数据和/或用于生成回应的数据;回应生成器接收中央控制器的指令,通过调用和/或运行回应数据库中的数据来生成对用户的自然表达的回应。
根据本发明的一方面,提供了一种基于自然智能的人机交互系统,其中,包括:自然表达处理及回应设备和呼叫设备,其中,用户通过呼叫设备与自然表达处理及回应设备通信,MAU人工座席对自然表达处理及回应设备进行人工操作,自然表达处理及回应设备包括:对话网关,中央控制器,MAU工作站,机器人,训练数据库,回应数据库和回应生成器,其中,对话网关接收来自用户的自然表达,发送给中央控制器进行后续处理,并且将对自然表达的回应发送给用户;中央控制器接收来自对话网关的自然表达,并与机器人以及MAU工作站协同工作,将该自然表达转换为表示该自然表达的含义的标准表达,并根据标准表达指示回应生成器生成与该标准表达对应的标准回应;机器人根据中央控制器的指示,将自然表达转换为次级语言信息,计算将由自然表达转换的次级语言信息转换为训练数据库中的标准表达的自信度,当计算得到对于某标准表达的自信度不低于第一自信度阈值,将次级语言信息转换为该标准表达;MAU工作站将自然表达呈现给MAU人工座席,MAU人工座席通过MAU工作站输入或者选择标准表达,然后MAU工作站将该标准表达发送给中央控制器;训练数据库用于存储次级语言信息和标准表达的配对数据;回应数据库存储回应相关数据,包括供调用的标准回应数据和/或用于生成回应的数据;回应生成器接收中央控制器的指令,通过调用和/或运行回应数据库中的数据来生成对用户的自然表达的回应。
根据本发明的一个方面,提供了一种基于自然智能的自然表达处理方法,其中包括:在数据库中设置分别与多个意图对应的多个标准表达,接收自然表达,将自然表达转换为次级语言信息,从次级语言信息获取与多个意图对应的部分,将获取的与多个意图对应的次级语言信息的部分分别转换为标准表达,其中,次级语言信息的信息颗粒度的数量级小于文字的信息颗粒度的数量级。
根据本发明实施例的基于自然智能的自然表达处理方法,其中,由自然表达转换得到的次级语言信息和从该次级语言信息转换得到的分别与多个 意图对应的多个标准表达作为配对数据被存储在数据库,将该次级语言信息的元素的各种排列组合与多个标准表达的组合或者该多个标准表达的组合的元素的各种排列组合进行循环迭代,建立次级语言信息的元素的各种排列组合与多个标准表达的组合或者该多个标准表达的组合的元素的各种排列组合之间的对应关系,获得更多的次级语言信息与标准表达组合的配对数据,并存储在数据库中。
根据本发明实施例的基于自然智能的自然表达处理方法,其中,用循环迭代得到的次级语言信息测试机器对于次级语言信息到标准表达的转换,并将不能被正确转换的次级语言信息及其应正确对应的标准表达写入对照表,对于后续输入的自然表达,由自然表达转换的次级语言信息先与对照表中存储的次级语言信息进行对比。
根据本发明实施例的基于自然智能的自然表达处理方法,其中,在对次级语言信息与标准表达的配对数据进行循环迭代时,也对次级语言信息到标准表达的转换模型进行循环优化。
根据本发明实施例的基于自然智能的自然表达处理方法,其中,当从输入的自然表达获得次级语言信息后,将该次级语言信息与数据库中已有的次级语言信息进行比较,然后根据比较结果来确定与该次级语言信息对应的标准表达或标准表达组合,和/或计算将该次级语言信息正确对应到某标准表达的概率,如果机器理解能力不够成熟,不足以或者不确定将该次级语言信息转换到某标准表达,那么进行人工辅助理解,通过人工对输入的自然表达进行理解,得到与某个或某些意图所对应的标准表达或标准表达组合,并且将从该自然表达得到的次级语言信息与标准表达或标准表达组合对应起来或者将自然表达与标准表达或标准表达组合对应起来,得到新的配对数据存入数据库。
根据本发明实施例的基于自然智能的自然表达处理方法,其中,对于新的次级语言信息与标准表达或标准表达组合的配对数据或者新的自然表达与标准表达或标准表达组合的配对数据,将其中的次级语言信息或者由自然表达转换得到的次级语言信息的元素的各种排列组合与其中的标准表达或标准表达组合本身或者该标准表达或标准表达组合的元素的各种排列组合进行循环迭代,建立次级语言信息的元素的各种排列组合与标准表达或标准 表达组合本身或者该标准表达或标准表达组合的元素的各种排列组合之间的对应关系,获得更多的次级语言信息与标准表达或标准表达组合的配对数据,并存储在数据库中。
根据本发明实施例的基于自然智能的自然表达处理方法,其中,通过人工辅助理解纠正数据库中次级语言信息与标准表达或标准表达组合之间错误的对应关系。
根据本发明实施例的基于自然智能的自然表达处理方法,其中,通过自信度来衡量机器理解能力,其中,基于次级语言信息与标准表达的对应关系来计算自信度。
根据本发明实施例的基于自然智能的自然表达处理方法,其中,从自然表达得到次级语言信息之后,通过深度神经网络、有穷状态转换器、自动编码器解码器中的一个或多个来产生对单条或多条标准表达的对数概率或相类似分数,再利用归一化的指数函数来计算出对于该条或该多条标准表达的自信度。
根据本发明实施例的基于自然智能的自然表达处理方法,次级语言信息的信息颗粒度是文字的信息颗粒度的1/10~1/1000。
根据本发明实施例的基于自然智能的自然表达处理方法,其中,通过多次理解或者多轮会话来从次级语言信息获取与多个意图对应的部分。
根据本发明实施例的基于自然智能的自然表达处理方法,其中,在数据库中设置多个上位意图,在每个上位意图设置多个下位意图,在一次意图获取操作中,从次级语言信息获取与不同上位意图的各自下位意图对应的部分,并将这些部分转换为标准表达。
根据本发明实施例的基于自然智能的自然表达处理方法,其中,对于与多个意图中的一个意图对应的标准表达,或者与多个意图中的一部分意图对应的标准表达的组合,预先在数据库中存储该标准表达和与该标准表达对应的自然表达或次级语言信息作为配对训练数据,或存储标准表达组合和与该标准表达组合对应的自然表达或次级语言信息作为配对训练数据,并利用这些配对训练数据进行训练。
根据本发明的一方面,提供了一种基于自然智能的自然表达处理及回应方法,其中包括:通过根据前述的自然表达处理方法获得标准表达或标准表 达的组合;调用或生成与标准表达或标准表达的组合相匹配的标准回应;以与自然表达对应的方式输出标准回应。
根据本发明实施例的自然表达处理及回应方法,其中,标准回应是预先存储在回应数据库中的固定数据,或者基于变量参数和预先在回应数据库中存储的标准回应的基础数据来生成标准回应。
根据本发明的一方面,提供了一种基于自然智能的自然表达处理及回应设备,其中,包括:对话网关,中央控制器,MAU工作站,机器人,表达数据库,回应数据库和回应生成器,其中,对话网关接收来自用户的自然表达,发送给中央控制器进行后续处理,并且将对自然表达的回应发送给用户;中央控制器接收来自对话网关的自然表达,并与机器人以及MAU工作站协同工作,将该自然表达转换为与设置的多个意图对应的多个标准表达,并根据标准表达指示回应生成器生成与该标准表达对应的标准回应;机器人根据中央控制器的指示,将自然表达转换为次级语言信息,从次级语言信息获取与多个意图对应的部分,将获取的与多个意图对应的次级语言信息的部分分别转换为标准表达,其中,次级语言信息的信息颗粒度的数量级小于文字的信息颗粒度的数量级;MAU工作站将自然表达呈现给外部的MAU人工座席,MAU人工座席通过MAU工作站输入或者选择标准表达,然后MAU工作站将该标准表达发送给中央控制器;训练数据库存储次级语言信息和标准表达或标准表达组合的配对数据;回应数据库存储回应相关数据,包括供调用的标准回应数据和/或用于生成回应的数据;回应生成器接收中央控制器的指令,通过调用和/或运行回应数据库中的数据来生成对用户的自然表达的回应,训练器,该训练器用于训练机器人将自然表达转换为标准表达或标准表达组合。
根据本发明的一方面,提供了一种基于自然智能的人机交互系统,其中,包括:自然表达处理及回应设备和呼叫设备,其中,用户通过呼叫设备与自然表达处理及回应设备通信,MAU人工座席对自然表达处理及回应设备进行人工操作,自然表达处理及回应设备包括:对话网关,中央控制器,MAU工作站,机器人,表达数据库,回应数据库和回应生成器,其中,对话网关接收来自用户的自然表达,发送给中央控制器进行后续处理,并且将对自然表达的回应发送给用户;中央控制器接收来自对话网关的自然表达,并与机 器人以及MAU工作站协同工作,将该自然表达转换为与设置的多个意图对应的多个标准表达,并根据标准表达指示回应生成器生成与该标准表达对应的标准回应;机器人根据中央控制器的指示,将自然表达转换为次级语言信息,从次级语言信息获取与多个意图对应的部分,将获取的与多个意图对应的次级语言信息的部分分别转换为标准表达,其中,次级语言信息的信息颗粒度的数量级小于文字的信息颗粒度的数量级;MAU工作站将自然表达呈现给外部的MAU人工座席,MAU人工座席通过MAU工作站输入或者选择标准表达,然后MAU工作站将该标准表达发送给中央控制器;训练数据库存储次级语言信息和标准表达或标准表达组合的配对数据;回应数据库存储回应相关数据,包括供调用的标准回应数据和/或用于生成回应的数据;回应生成器接收中央控制器的指令,通过调用和/或运行回应数据库中的数据来生成对用户的自然表达的回应,训练器,该训练器用于训练机器人将自然表达转换为标准表达或标准表达组合。
根据本发明的一方面,提供了一种基于自然智能的自然表达处理方法,其中,包括:接收并存储自然表达,将自然表达转换为次级语言信息,计算将由自然表达转换的次级语言信息转换为数据库中的标准表达的自信度,当对于第一标准表达所计算的自信度不低于第一自信度阈值,输出第一标准表达作为对第一自然表达进行理解的结果;当自信度低于第一自信度阈值,静默座席对存储的自然表达进行理解,当静默座席能够理解自然表达,则由静默座席输入理解得到的第二标准表达;当静默座席不能理解自然表达,则静默座席提示再次输入具有相同含义的自然表达或者转由高级座席理解存储的自然表达并进行应答。
根据本发明实施例的基于自然智能的自然表达处理方法,其中,知识库设计师根据高级座席对静默座席不能理解的自然表达的应答进行话术的后台构建。
根据本发明实施例的基于自然智能的自然表达处理方法,其中,将由自然表达转换的次级语言信息与第二标准表达作为配对数据存储在数据库。
根据本发明实施例的基于自然智能的自然表达处理方法,其中,基于次级语言信息与标准表达的对应关系来计算自信度,通过深度神经网络、有穷状态转换器、自动编码器解码器中的一个或多个来产生对单条或多条标准表 达的对数概率或相类似分数,再利用归一化的指数函数来计算出对于该条或该多条标准表达的自信度。
根据本发明实施例的基于自然智能的自然表达处理方法,其中,次级语言信息的信息颗粒度的数量级小于文字的信息颗粒度的数量级。
根据本发明实施例的基于自然智能的自然表达处理方法,其中,次级语言信息的信息颗粒度是文字的信息颗粒度的1/10~1/1000。
根据本发明实施例的基于自然智能的自然表达处理方法,其中,对于数据库中已有的成对的次级语言信息和标准表达,将该次级语言信息的元素的各种排列组合与该标准表达或者该标准表达的元素的各种排列组合进行循环迭代,建立次级语言信息的元素的各种排列组合与标准表达或标准表达的元素的各种排列组合之间的对应关系,获得更多的次级语言信息与标准表达的配对数据,并存储在数据库中。
根据本发明实施例的基于自然智能的自然表达处理方法,其中,用循环迭代得到的次级语言信息测试机器对于次级语言信息到标准表达的转换,并将不能被正确转换的次级语言信息及其应正确对应的标准表达写入对照表,对于后续输入的自然表达,由自然表达转换的次级语言信息先与对照表中存储的次级语言信息进行对比。
根据本发明实施例的基于自然智能的自然表达处理方法,其中,在对次级语言信息与标准表达的配对数据进行循环迭代时,也对次级语言信息到标准表达的转换模型进行循环优化。
根据本发明的一方面,提供了一种基于自然智能的自然表达处理及回应方法,其中,包括:通过前述的自然表达处理方法获得第一标准表达或第二标准表达;调用或生成与第一标准表达或第二标准表达相匹配的标准回应;以与自然表达对应的方式输出标准回应。
根据本发明的一方面,提供了种基于自然智能的自然表达处理及回应设备,其中,包括:对话网关,中央控制器,MAU工作站,机器人,训练数据库,回应数据库和回应生成器,其中,对话网关接收来自用户的自然表达,发送给中央控制器进行后续处理,并且将对自然表达的回应发送给用户;中央控制器接收来自对话网关的自然表达,并与机器人以及MAU工作站协同工作,将该自然表达转换为表示该自然表达的含义的标准表达,并根据标准 表达指示回应生成器生成与该标准表达对应的标准回应;机器人根据中央控制器的指示,将自然表达转换为次级语言信息,计算将由自然表达转换的次级语言信息转换为训练数据库中的标准表达的自信度,当对于第一标准表达所计算的自信度不低于第一自信度阈值,将次级语言信息转换为第一标准表达;MAU工作站将自然表达呈现给外部的MAU人工座席,其中,MAU人工座席包括静默座席和高级座席,静默座席通过MAU工作站输入或者选择标准表达,然后MAU工作站将该标准表达发送给中央控制器,当计算得到的自信度低于第一自信度阈值,静默座席对存储的自然表达进行理解,当静默座席能够理解自然表达,则由静默座席输入理解得到的第二标准表达,当静默座席不能理解自然表达,则静默座席提示用户再次输入具有相同含义的自然表达或者转由高级座席理解存储的自然表达并进行应答;训练数据库用于存储次级语言信息和标准表达的配对数据;回应数据库存储回应相关数据,包括供调用的标准回应数据和/或用于生成回应的数据;回应生成器接收中央控制器的指令,通过调用和/或运行回应数据库中的数据来生成对用户的自然表达的回应。
根据本发明的一方面,提供了一种基于自然智能的人机交互系统,其中,包括:自然表达处理及回应设备和呼叫设备,其中,用户通过呼叫设备与自然表达处理及回应设备通信,MAU人工座席对自然表达处理及回应设备进行人工操作,自然表达处理及回应设备包括:对话网关,中央控制器,MAU工作站,机器人,训练数据库,回应数据库和回应生成器,其中,对话网关接收来自用户的自然表达,发送给中央控制器进行后续处理,并且将对自然表达的回应发送给用户;中央控制器接收来自对话网关的自然表达,并与机器人以及MAU工作站协同工作,将该自然表达转换为表示该自然表达的含义的标准表达,并根据标准表达指示回应生成器生成与该标准表达对应的标准回应;机器人根据中央控制器的指示,将自然表达转换为次级语言信息,计算将由自然表达转换的次级语言信息转换为训练数据库中的标准表达的自信度,当对于第一标准表达所计算的自信度不低于第一自信度阈值,将次级语言信息转换为第一标准表达;MAU工作站将自然表达呈现给MAU人工座席,其中,MAU人工座席包括静默座席和高级座席,静默座席通过MAU工作站输入或者选择标准表达,然后MAU工作站将该标准表达发送给中央 控制器,当计算得到的自信度低于第一自信度阈值,静默座席对存储的自然表达进行理解,当静默座席能够理解自然表达,则由静默座席输入理解得到的第二标准表达,当静默座席不能理解自然表达,则静默座席提示用户再次输入具有相同含义的自然表达或者转由高级座席理解存储的自然表达并进行应答;训练数据库用于存储次级语言信息和标准表达的配对数据;回应数据库存储回应相关数据,包括供调用的标准回应数据和/或用于生成回应的数据;回应生成器接收中央控制器的指令,通过调用和/或运行回应数据库中的数据来生成对用户的自然表达的回应。
根据本发明的一方面,提供了一种对基于自然智能的人机交互系统进行训练的方法,其中包括:生成与标准表达对应的文字脚本,通过文本语音转换工具得到与文字脚本对应的语音,将各条语音分别转换为次级语言信息,其中,次级语言信息的信息颗粒度的数量级小于文字的信息颗粒度的数量级,次级语言信息和与其对应标准表达作为配对数据被存储在数据库,对于数据库中已有的成对的次级语言信息和标准表达,将该次级语言信息的元素的各种排列组合与该标准表达或者该标准表达的元素的各种排列组合进行循环迭代,建立次级语言信息的元素的各种排列组合与标准表达或或标准表达的元素的各种排列组合之间的对应关系,获得更多的次级语言信息与标准表达的配对数据,并存储在数据库中。
根据本发明实施例的对基于自然智能的人机交互系统进行训练的方法,其中,输入语音,将输入的语音转换为次级语言信息,将从输入的语音转换得到的次级语言信息与数据库中已有的次级语言信息进行比较,然后根据比较结果来确定与该次级语言信息对应的标准表达,和/或计算将该次级语言信息正确对应到某标准表达的概率,如果机器理解能力不够成熟,不足以或者不确定将该次级语言信息转换到某标准表达,那么进行人工辅助理解,通过人工对输入的语音进行理解,得到标准表达,并且将从该语音得到的次级语言信息与该标准表达对应起来,得到新的配对数据存入数据库。
根据本发明实施例的对基于自然智能的人机交互系统进行训练的方法,其中,对于新的次级语言信息与标准表达或标准表达组合的配对数据或者新的自然表达与标准表达或标准表达组合的配对数据,将其中的次级语言信息或者由自然表达转换得到的次级语言信息的元素的各种排列组合与其中的 标准表达或标准表达组合本身或者该标准表达或标准表达组合的元素的各种排列组合进行循环迭代,建立次级语言信息的元素的各种排列组合与标准表达或标准表达组合本身或者该标准表达或标准表达组合的元素的各种排列组合之间的对应关系,获得更多的次级语言信息与标准表达或标准表达组合的配对数据,并存储在数据库中。
根据本发明实施例的对基于自然智能的人机交互系统进行训练的方法,其中,通过人工辅助理解纠正数据库中次级语言信息与标准表达或标准表达组合之间错误的对应关系。
根据本发明实施例的对基于自然智能的人机交互系统进行训练的方法,其中,通过自信度来衡量机器理解能力,其中,基于次级语言信息与标准表达的对应关系来计算自信度。
根据本发明实施例的对基于自然智能的人机交互系统进行训练的方法,其中,从自然表达得到次级语言信息之后,通过深度神经网络、有穷状态转换器、自动编码器解码器中的一个或多个来产生对单条或多条标准表达的对数概率或相类似分数,再利用归一化的指数函数来计算出对于该条或该多条标准表达的自信度。
根据本发明实施例的对基于自然智能的人机交互系统进行训练的方法,次级语言信息的信息颗粒度是文字的信息颗粒度的1/10~1/1000。
根据本发明实施例的对基于自然智能的人机交互系统进行训练的方法,其中,用循环迭代得到的次级语言信息测试机器对于次级语言信息到标准表达的转换,并将不能被正确转换的次级语言信息及其应正确对应的标准表达写入对照表,对于后续输入的自然表达,由自然表达转换的次级语言信息先与对照表中存储的次级语言信息进行对比。
根据本发明实施例的对基于自然智能的人机交互系统进行训练的方法,其中,在对次级语言信息与标准表达的配对数据进行循环迭代时,也对次级语言信息到标准表达的转换模型进行循环优化。
根据本发明的一方面,提供了一种基于自然智能的语音处理及回应设备,其中,包括:对话网关,中央控制器,MAU工作站,机器人,训练数据库,回应数据库,回应生成器,文本语音转换器,其中,对话网关接收来自用户的语音,发送给中央控制器进行后续处理,并且将对语音的回应发送 给用户;中央控制器接收来自对话网关的语音,并与机器人以及MAU工作站协同工作,将该语音转换为表示该语音的含义的标准表达,并根据标准表达指示回应生成器生成与该标准表达对应的标准回应;机器人根据中央控制器的指示,将语音转换为次级语言信息,其中,次级语言信息的信息颗粒度的数量级小于文字的信息颗粒度的数量级,并将次级语言信息转换为标准表达;MAU工作站将语音呈现给外部的MAU人工座席,MAU人工座席通过MAU工作站输入或者选择标准表达,然后MAU工作站将该标准表达发送给中央控制器;训练数据库用于存储次级语言信息和标准表达的配对数据;回应数据库存储回应相关数据,包括供调用的标准回应数据和/或用于生成回应的数据;回应生成器接收中央控制器的指令,通过调用和/或运行回应数据库中的数据来生成对用户的语音的回应,文本语音转换器,基于与标准表达对应的文字脚本生成与该文字脚本对应的语音,机器人将文本语音转换器得到的语音转换为次级语言信息,并将该次级语言信息与相应文本所对应的标准表达构成配对数据存储在训练数据库,其中,设备进一步包括训练器,该训练器用于训练机器人将语音转换为标准表达,其中,机器人将次级语言信息的元素的各种排列组合与对应的标准表达或者该标准表达的元素的各种排列组合进行循环迭代,建立次级语言信息的元素的各种排列组合与标准表达或标准表达的元素的各种排列组合之间的对应关系,获得的次级语言信息与标准表达的配对数据,存储在训练数据库中。
根据本发明的一方面,提供了一种基于自然智能的人机交互系统,其中,包括:自然表达处理及回应设备和呼叫设备,其中,用户通过呼叫设备与自然表达处理及回应设备通信,MAU人工座席对自然表达处理及回应设备进行人工操作,自然表达处理及回应设备包括:对话网关,中央控制器,MAU工作站,机器人,训练数据库,回应数据库,回应生成器,文本语音转换器,其中,对话网关接收来自用户的语音,发送给中央控制器进行后续处理,并且将对语音的回应发送给用户;中央控制器接收来自对话网关的语音,并与机器人以及MAU工作站协同工作,将该语音转换为表示该语音的含义的标准表达,并根据标准表达指示回应生成器生成与该标准表达对应的标准回应;机器人根据中央控制器的指示,将语音转换为次级语言信息,其中,次级语言信息的信息颗粒度的数量级小于文字的信息颗粒度的数量级,并将次 级语言信息转换为标准表达;MAU工作站将语音呈现给外部的MAU人工座席,MAU人工座席通过MAU工作站输入或者选择标准表达,然后MAU工作站将该标准表达发送给中央控制器;训练数据库用于存储次级语言信息和标准表达的配对数据;回应数据库存储回应相关数据,包括供调用的标准回应数据和/或用于生成回应的数据;回应生成器接收中央控制器的指令,通过调用和/或运行回应数据库中的数据来生成对用户的语音的回应,文本语音转换器,基于与标准表达对应的文字脚本生成与该文字脚本对应的语音,机器人将文本语音转换器得到的语音转换为次级语言信息,并将该次级语言信息与相应文本所对应的标准表达构成配对数据存储在训练数据库,其中,设备进一步包括训练器,该训练器用于训练机器人将语音转换为标准表达,其中,机器人将次级语言信息的元素的各种排列组合与对应的标准表达或者该标准表达的元素的各种排列组合进行循环迭代,建立次级语言信息的元素的各种排列组合与标准表达或标准表达的元素的各种排列组合之间的对应关系,获得的次级语言信息与标准表达的配对数据,存储在训练数据库中。
根据本发明的一方面,提供了一种对机器人进行训练的方法,其中,包括:用训练数据库中的表达数据与意图数据的正确配对数据来对机器人进行训练;机器人对这些表达数据进行理解,将理解结果与正确配对的意图数据进行对比,找到理解错误的表达数据;将理解错误的表达数据及与其对应的意图数据写入独立于训练数据库的对照表,其中,机器人在以后进行理解时先将所要理解的表达数据与对照表中的表达数据进行比对,如果发现该表达数据在对照表中,则直接通过对照表找到对应的理解结果,如果在对照表中没有找到该表达数据,那么再在训练数据库中进行比对。
根据本发明实施例的对机器人进行训练的方法,其中,表达数据是从自然表达转换得到的次级语言信息。
根据本发明实施例的对机器人进行训练的方法,其中,次级语言信息的信息颗粒度是文字的信息颗粒度的1/10~1/1000。
根据本发明实施例的对机器人进行训练的方法,其中,对于训练数据库中已有的成对的表达数据和意图数据,将该表达数据的元素的各种排列组合与该意图数据或者该意图数据的元素的各种排列组合进行循环迭代,建立表达数据的元素的各种排列组合与意图数据或意图数据的元素的各种排列组 合之间的对应关系,获得更多的表达数据与意图数据的配对数据,并存储在训练数据库中。
根据本发明实施例的对机器人进行训练的方法,其中,在对表达数据与意图数据的配对数据进行循环迭代时,也对表达数据到意图数据的转换模型进行循环优化。
根据本发明实施例的对机器人进行训练的方法,其中,对照表还用来存储出现概率较高的表达数据及与其对应的意图数据。
根据本发明实施例的对机器人进行训练的方法,其中,通过人工辅助理解纠正训练数据库中表达数据与意图数据之间错误的对应关系。
根据本发明实施例的对机器人进行训练的方法,其中,生成与意图数据对应的脚本,通过转换工具得到与该脚本对应的自然表达,从该自然表达转换得到表达数据,从而获得表达数据与意图数据的正确配对数据。
根据本发明实施例的对机器人进行训练的方法,其中,脚本是文字脚本,自然表达是语音,通过文本语音转换工具调整变化语音的语速、音量、语气、语调中的一个或多个参数。
根据本发明的一方面,提供了一种自然表达处理及回应设备,其中,包括:对话网关,中央控制器,MAU工作站,机器人,训练数据库,回应数据库,回应生成器,其中,对话网关接收来自用户的自然表达,发送给中央控制器进行后续处理,并且将对自然表达的回应发送给用户;中央控制器接收来自对话网关的自然表达,并与机器人以及MAU工作站协同工作,将该自然表达转换为表示该自然表达的含义的意图数据,并根据意图数据指示回应生成器生成与该意图数据对应的标准回应;机器人根据中央控制器的指示,将自然表达转换为表达数据,并得到与该表达数据对应的意图数据;MAU工作站将自然表达呈现给外部的MAU人工座席,MAU人工座席通过MAU工作站输入或者选择意图数据,然后MAU工作站将该意图数据发送给中央控制器;训练数据库用于存储表达数据和意图数据的配对数据;回应数据库存储回应相关数据,包括供调用的标准回应数据和/或用于生成回应的数据;回应生成器接收中央控制器的指令,通过调用和/或运行回应数据库中的数据来生成对用户的自然表达的回应,其中,设备进一步包括训练器,该训练器用于训练机器人从自然表达获得意图数据,其中,训练器用前述的方 法来对机器人进行训练。
根据本发明的一方面,提供了一种人机交互系统,其中,包括:自然表达处理及回应设备和呼叫设备,其中,用户通过呼叫设备与自然表达处理及回应设备通信,MAU人工座席对自然表达处理及回应设备进行人工操作,自然表达处理及回应设备包括:对话网关,中央控制器,MAU工作站,机器人,训练数据库,回应数据库,回应生成器,其中,对话网关接收来自用户的自然表达,发送给中央控制器进行后续处理,并且将对自然表达的回应发送给用户;中央控制器接收来自对话网关的自然表达,并与机器人以及MAU工作站协同工作,将该自然表达转换为表示该自然表达的含义的意图数据,并根据意图数据指示回应生成器生成与该意图数据对应的标准回应;机器人根据中央控制器的指示,将自然表达转换为表达数据,并得到与该表达数据对应的意图数据;MAU工作站将自然表达呈现给外部的MAU人工座席,MAU人工座席通过MAU工作站输入或者选择意图数据,然后MAU工作站将该意图数据发送给中央控制器;训练数据库用于存储表达数据和意图数据的配对数据;回应数据库存储回应相关数据,包括供调用的标准回应数据和/或用于生成回应的数据;回应生成器接收中央控制器的指令,通过调用和/或运行回应数据库中的数据来生成对用户的自然表达的回应,其中,设备进一步包括训练器,该训练器用于训练机器人从自然表达获得意图数据,其中,训练器用前述的方法来对机器人进行训练。
根据本发明的一方面,提供了一种端到端控制方法,其中,包括:在操作者操控设备时,通过传感器从被控设备的外部环境和/或被控设备本身收集传感器数据,并实时记录由操控者的操控所产生的控制数据;将在时间上关联的传感器数据和控制数据作为配对数据输入给机器人;机器人用配对数据进行训练;其中,经过训练的机器人根据传感器数据做出判断并控制所述设备,包括:将通过传感器收集到的传感器数据输入机器人;机器人判断是否能够根据传感器数据确定与之对应的控制数据,如果能确定控制数据,则机器人根据与传感器数据对应的控制数据来控制所述设备;如果机器人不能确定控制数据,则由操作者对设备进行操控。
根据本发明实施例所述的端到端控制方法,其中,机器人不能确定控制数据而由操作者对设备进行操控时,通过传感器从被控设备的外部环境和/ 或被控设备本身收集传感器数据,并实时记录由操控者的操控所产生的控制数据,将该传感器数据和控制数据作为配对数据来训练机器人。
根据本发明实施例的端到端控制方法,其中,在对机器人进行训练时,自动优化将传感器数据对应到控制数据的模型。
根据本发明实施例的端到端控制方法,其中,机器人基于自信度来判断是否能够确定与传感器数据对应的控制数据,其中,基于传感器数据与控制数据的对应关系来计算所述自信度,通过深度神经网络、有穷状态转换器、自动编码器解码器中的一个或多个来产生对控制数据的对数概率或相类似分数,再利用归一化的指数函数来计算出对该控制数据的自信度。
根据本发明实施例的端到端控制方法,其中,传感器数据包括图像,机器人在进行训练之前将图像转换为次级图像信息,该次级图像信息的信息颗粒度比像素粗但比物体识别所用的信息颗粒度细。
根据本发明实施例的端到端控制方法,其中,传感器数据包括语音,机器人在进行训练之前将语音转换为次级语音信息,该次级语音信息的信息颗粒度的数量级小于文字的信息颗粒度的数量级。
根据本发明实施例的端到端控制方法,其中,在预设的时间间隔内关联传感器数据和控制数据。
根据本发明实施例的端到端控制方法,其中,传感器数据是在所述预设时间间隔内由传感器从被控设备的外部环境和/或被控设备本身收集的图像、声音和距离中的一种或多种相对于时间的变化。
根据本发明实施例的端到端控制方法,其中,所述设备是交通工具、无人机或者数字处理终端。
根据本发明的一方面,提供了一种端到端控制系统,其中,包括:传感器,机器人和控制器,其中,在操作者通过控制器操控设备时,通过传感器从被控设备的外部环境和/或被控设备本身收集传感器数据,并实时记录由操控者的操控使得控制器所产生的控制数据;将在时间上关联的传感器数据和控制数据作为配对数据输入给机器人;机器人用配对数据进行训练;其中,经过训练的机器人根据传感器数据做出判断并控制所述设备,包括:将通过传感器收集到的传感器数据输入机器人;机器人判断是否能够根据传感器数据确定与之对应的控制数据,如果能确定控制数据,则机器人根据与传感器 数据对应的控制数据来生产控制信号,控制器根据该控制信号控制所述设备;如果机器人不能确定控制数据,则由操作者通过控制器对设备进行操控。
附图说明
为了更清楚地说明本发明实施例的技术方案,下面将对实施例的附图作简单地介绍,显而易见地,下面描述中的附图仅仅涉及本发明的一些实施例,而非对本发明的限制。
图1概括示出了从采集的声波(A语言信息)到Y语言信息的逐层转换过程;
图2示出了从采集的声波(A语言信息)到Y语言信息的转换的一个例子;
图3示出了对语音信息进行识别的一个例子;
图4是多层感知的原理示意图;
图5示出了一个利用高斯混合模型将采集的声波转换为X语言信息的例子;
图6示意性地示出了根据本发明一个实施例的自然表达处理方法的流程;
图7示意性地示出了根据本发明一个实施例的自然表达处理及回应方法的流程;
图8示意性地示出了根据本发明一个实施例的基于自然智能的人机交互系统的信息萃取和槽填充流程;
图9进一步示例性地示出了“订机票”问询项下的自然表达槽填充处理流程;
图10示意性示出了根据本发明实施例的智能人机交互系统;
图11进一步示出了图10系统中的智能应答设备的部分结构;
图12A~图12P示意性地示出了根据本发明实施例的意图获取和槽填充系统的操作界面;
图13示意性地示出了根据本发明实施例的机器人理解与人工辅助理解(MAU)相结合的自然表达处理过程;
图14示意性地示出了一个由MAU工作站呈现给MAU人工座席9的操 作界面的例子;
图15示出了智能人机交互的一个例子;
图16A和图16B示意性地示出了在通过传感器获取的图像中对道路物体的标注,图16C示意性地示出了在通过传感器获取的图像中对交通道路指示标志的标注;
图17示意性地示出了根据本发明实施例的基于自然智能的端到端控制系统;
图18示意性地示出了经过训练的机器人根据传感器数据做出判断并控制所述控制器的过程。
具体实施方式
为使本发明实施例的目的、技术方案和优点更加清楚,下面将结合本发明实施例的附图,对本发明实施例的技术方案进行清楚、完整地描述。显然,所描述的实施例是本发明的一部分实施例,而不是全部的实施例。基于所描述的本发明的实施例,本领域普通技术人员在无需创造性劳动的前提下所获得的所有其它实施例,都属于本发明保护的范围。
除非另作定义,此处使用的技术术语或者科学术语应当为本发明所属领域内具有一般技能的人士所理解的通常意义。本发明专利申请说明书以及权利要求书中使用的“第一”、“第二”以及类似的词语并不表示任何顺序、数量或者重要性,而只是用来区分不同的组成部分。同样,“一个”或者“一”等类似词语也不表示数量限制,而是表示存在至少一个。
人工智能(AI)是基于计算机技术和传统IT技术建立的,简单讲,就是通过人为建立计算机可以执行的规则,来模仿人类智力活动的结果,其背后的核心逻辑是黑白逻辑(Distinct Logic)。而自然智能(NI,Natural Intelligence)虽然也是通过计算机技术实现,但却是模仿人类智力活动本身,由计算机自己构建规则,其背后的核心逻辑是灰度逻辑(Fuzzy Logic)。
在自然语言处理(NLP)领域,基于人工智能的自然语言处理技术(AI-NLP技术)需要先将自然语言的语音转写为文本,然后再进行自然语言理解(NLU,Natural Language Understanding)。采用这种方法如下所述有很大的弊端,但这其中有其历史的原因。
一方面,人工智能技术在自然语言处理领域的发展起源于机器翻译,即,将一段某种语言的文字翻译为另一种语言的文字,以两种语言文字的准确对应为价值导向。并且采用概率和统计的方法基于语法规则对自然语言进行处理。但是,对于随意性较强的非书面语或者文义容易高度模糊的长串句子,当套用实际语法进行分析时可能会产生出成千上万种可能性。尽管可以采用诸如语料库及马可夫模型(Markov models)等方法减少歧义,但是仍然要求建立含有庞大数据量的配对语料的语料库供计算机学习和使用。而无论生成配对语料(往往通过人为标注)所需的数据收集、语法模型和语义模型的建立、还是检索和反馈所需的计算,都需要非常大的计算资源和成本。
基于人工智能的语音识别技术作为上述机器翻译技术的延伸,也是采用相同的方法论和类似的方法,通过收集和标注而生成语音和文本之间的配对语料、利用语法模型、语义模型等进行学习,仍然以语音识别的准确度为价值取向。
但是,由于自然语言理解或意图识别并不要求对每个字都进行准确严谨的翻译或识别,因此上述思路和方法实际上带来了巨大的资源浪费。
另一方面,这种先将语音识别为文本再对所识别的文本进行理解的方法本身使得语言理解的准确度存在一个并不高的理论极限。这是因为,将语音识别为文本的过程本身损失了大量的信息(例如,5分钟双声道、16位采样位数、44.1kHz采样频率声音的不压缩数据量约50MB,而如果按照每分钟200汉字的语速,五分钟1000汉字,对应的数据量为2KB,相差25000倍),而这部分损失的信息很可能包含了语言理解所需的关键信息。换句话说,识别出的文本与原始语音相比,信息颗粒度是非常粗的。例如,类似前面所述的例子,对于具有10个字的一句话,如果其中的关键字是3个且识别错1个关键字就会造成理解错误,那么即使能够保证90%的识别准确率,理解正确的概率也只有70%;即使能够通过文字之间的关联信息一定程度上减小因识别错1个或者多个关键字而造成理解错误的概率,整体理解正确率也不会比70%高多少。例如,业界称80%准确率瓶颈为“AI商用的世纪魔咒”。
自然智能是对于人类智能行为的模仿,人类智能行为本身就是基于灰度逻辑的。具体在与外界交互过程中,人脑并非先将从外部感知的表达(声音、图像、接触、味道等)转换为文本,再进行理解,而是直接对通过感觉器官 感知获得的信息进行分析理解。该理解基于已有知识和经验(经验也可以理解是概然性)从感知的信息获得外部表达的含义,因此实际上是可能有偏差的。例如,观察者看到一个人在摇头,通常会认为被观察者表达了否定的态度。不过如果观察者意识到自己身处印度,被观察者是印度人,并且了解印度人轻微摇头(晃头)多表示肯定态度的话,那么观察者对于该表达(摇头)的理解就会是正确的,尽管这与一般认知相反;而如果观察者不具有这样的知识和经验,那么他/她就可能做出错误的判断,但是观察者仍然可以通过被观察者面部表情、肢体动作或手势等表达来纠正自己的判断(这个问题实际中更加复杂,如果被观察的印度人是面无表情或者微笑但两眉较为靠近眉心地左右上下摇晃头,那么他更可能表达的是“不置可否”的态度)。
而对于人工智能来说,它可以通过图像识别或视频识别来将所述外部表达识别为描述动作的文本——“摇头”或者“某某人轻轻摇头”或者更长的语句,但是其不能根据所识别出的文本内容本身对基于一般认知规则在此的适用结果是否正确进行校验。这是因为,人工智能只采集了用于判断动作(可以例如包括动作主体)本身的信息来进行识别,而滤除了其它的信息,但这些被滤除的信息中恰巧包括了用于判断该动作所表达的真实意图的关键信息。并且这些关键信息的损失是不可逆的,并不能从描述所识别的动作的文本中重新获得。
再举一个自然语言理解的例子。听到一句话“tian shang hui hui i”,可能识别为“天上灰灰地”或者“天上灰灰鸡”,但是如果具有对咿呀学语的孩童的表达或者对于口齿不清的人都表达具有理解能力,甚至有对演变而来的网络语言的理解能力,则会理解到该语音正确的含义是“天上飞飞机”。
从上面这个关于自然语言理解的例子我们也可以想到,不掌握文字的孩童或者原始文明,可以清晰地通过语言来表达和沟通自己的意图,这也是自然智能相比人工智能更接近人类智能原因所在——自然智能能够处理文本但不依赖文本,但人工智能依赖文本作为识别和理解的媒介。
类似地,对于基于人工智能的无人驾驶技术,如前所述,需要先利用物体检测模型对传感器获取的图像信息等进行分析,识别出对象物并确定该对象物的位置。而在建立和训练物体检测模型时,需要大量的经过人工标注的图像数据。这种人工标注工作会带来很高的成本,也容易发生错误。并且, 由于要实现对象物的识别,目前的物体检测过程所花费的时间可能多于驾驶员实际反应时间(人的反应速度一般在0.1秒~0.3秒),可能不能满足实时控制的要求。
图16A和图16B示意性地示出了在通过传感器获取的图像中对道路物体的标注。所谓的道路物体包括周边的街景数据,比如十字路口、高架桥、隧道、城市道路等,还有行人、车辆、红绿灯、指示标志、禁止标志等。此外,无人驾驶车辆在路线规划时还需要标注交通道路指示标志,例如,直行、向左转弯、向右转弯、禁止通行、禁止驶车等,如图16C所示。对车辆行驶时收集到的多帧视频数据,还可能需要对于已识别的车辆、行人等运动物体进行轨迹跟踪标注。这些标注往往通过人工进行,或者在自动标注后由人工检查校验。由于道路中及道路附近的需标注物体是动态变化或随时间更新的,因此这种标注的工作量是巨大的,即使通过计算机对于标注后的海量数据进行自学习,出于驾驶安全性的角度,仍然需要很大量的人工检验。除了人工标注本身的误差率因素,由于人工标注往往是非实景非实时的,因此因为标注员对于所标注的实际道路的不熟悉,可能会因道路本身的标识错误(例如交通标志错误、道路指示标志错误)而带来标注错误。
除了人工标注所可能带来的问题之外,基于现有人工智能的自动驾驶方案还需要在车辆上设置很多雷达或超声传感器来对车辆的各个方向进行实时监测,以防止或者预防车辆在行进或移动过程中与周围静止或者相对于车辆运动的物体发生碰撞或刮蹭。这种设计在增加了车辆成本的同时也带来了新的误差或者错误因素。
另一方面,当基于对道路物体等车辆周围路况因素进行观测或监测的结果来对车辆进行控制时,需要根据不同的实时路况做出相应的反应。由于路况可以是非常复杂且又实时变化的,因此需要建立大量的规则来与之相适应,这同样会带来大量的成本和误差。
基于人工智能的自动驾驶方案在复杂路况下会出现各种的问题,是上述“标注+识别+规则”的方法论的固有缺陷造成的。前述的基于人工智能的语音处理方法,先将语音转换为文字,再对文字进行识别,然后利用人为建立的语法和语义等规则来分析文字化的语句的含义;类似地,这种基于人工智能的自动驾驶或自动控制方法,先对道路物体进行标注,然后利用模型对标 注部分进行识别和跟踪,再基于控制规则根据识别和跟踪的结果进行控制。因而这种自动驾驶或自动控制方法同样会产生因对物体进行识别而对其它周围环境信息忽略或者不敏感的问题,从而因信息缺失而造成对周围环境或路况的判断误差;另一方面,这类方法中的控制规则本身是明确而机械的,并没有人类驾驶员或控制人员基于自身经验的灵活裁量。
例如,当车辆在路口右转(右行国家)时,一条狗快速跑向该路口转角,基于人工智能的自动驾驶方案会识别出该条狗并对其进行跟踪,发现其以一定速度跑向车头前方方向,则指示车辆提前减速或刹车以避让;但人类驾驶员会注意到该条狗后面一定距离有狗的主人,并且手里牵着狗的缰绳,于是可以根据经验判断狗不会或者很大概率不会冲到车头前方,于是驾驶车辆继续行进或者略减速行进。再例如,当车辆后方跟随行驶有由人类驾驶的大型车辆时,人类驾驶员很可能为了自身安全考虑会变更车道或者加大与该大型车辆的距离并且注意避免急刹车,尽管如果发生追尾责任全在大型车辆的驾驶员;而如果将这样的驾驶习惯也形成为需要自动驾驶车辆遵守的规则,那么无疑会大大增加规则的复杂度。
而且,不同品牌不同型号的车辆或设备之间的性能差异也会导致使得控制规则和规则下的控制参数更加复杂。
如果应用前述自然智能的方法论,由于并非采用“标注+识别+规则”的架构,因此基本上可以避免基于人工智能方法论的自动驾驶和自动控制的上述问题。具体地,首先,在对传感器获取的图像信息进行处理时,并不需要用物体检测模型识别出对象物、确定对象物的位置以及对于对象物进行追踪,只需要对图像信息进行采样,例如从像素级的图像信息获得信息颗粒度比像素粗但是要比物体识别所用的信息颗粒度细的次级图像信息;然后并不从这些次级图像信息识别物体,而是将其与同时间点的控制参数相关联,形成配对数据,存入数据库;对于传感器以一定的时间间隔获取的图像信息(多帧图像构成的视频)进行分析,检测次级图像信息随时间的变化;将次级图像信息随时间的变化与人的控制行为或者控制参数的变化量相对应,形成配对数据,并存储于数据库;机器人基于次级图像信息与控制参数的配对数据以及次级图像信息的变化与控制行为或者控制参数的变化量的配对数据进行训练;经过训练的机器人可以根据次级图像信息及次级图像信息的变化来 确定对应的控制行为或控制参数的变化量;通过机器人输出的控制行为和控制参数的变化量来进行控制。
控制行为或者控制参数的变化量可以通过对人类现场控制或者远程控制的行为或结果(导致控制参数变化)进行记录来获得,并同时可以实时地与传感器采集的图像数据或转换得到的次级图像数据想对应而形成配对数据。
采用自然智能方法论的控制方法或控制系统,并不需要通过标注和模型来识别物体,也不需要建立大量的规则来进行控制,只需模仿人类的感知及与之对应的控制行为,即可自动训练控制模型,并用训练后的模型实现自动控制。从而节省标注和规则建立所需要的海量人力成本,并且能够避免类似于上述的因为标注误差或规则不完备而带来的潜在安全风险。
进一步,除了图像信息之外,传感器还可以引入声音信息,比如特殊车辆的警示声、周边车辆的鸣笛、人的说话声、动物叫声、打雷下雨等自然界的声音等。传感器甚至还可以引入气味信息,比如汽油的味道等。这些不同模态的信息可以与图像信息一起构成类人感知信息(类似人类感知得到的信息),并采用上述的基于自然智能的方法训练机器人和实现自动控制。
除了类人感知信息之外,雷达、超声等对周围环境进行检测的其它传感设备获得的信息、对被控设备自身参数进行监测获得的信息等等,都可以作为用于训练机器人的原始数据。将这些原始数据转换为次级数据后再进行数据配对和训练,可以减少数据量和计算量,当然在数据存储能力和计算能力允许的情况下,直接对传感器获得的各类数据进行处理(而不对图像、声波等进行粗颗粒度转换)也是可行的。
以下我们先以自然语言处理为例,进一步阐述自然智能与人工智能的异同。
如图1所示,为了便于说明,我们将不同的信息用字母A-D和X-Y表示。
A语言,即声波,是由声波采集设备(如:麦克风)收集的物理层数据。
“B语言”,是由B元素的各种排列组合形成的语言。B元素可以是音素,而B元素的某些排列组合构成音节。这里的“音素”和“音节”与其在语言学范畴下的含义相同。图2中示出了B元素的例子,这些例子是中文(汉语)的音素。
“C语言”是由C元素的各种排列组合形成的语言。B元素的全部或部分排列组合形成C元素,因此也可以理解为B语言转换为C元素,而C元素构成了C语言。于是,从B语言到C语言的转换关系是“多对多”的关系。如果沿用音素、音节的语言学体系,C元素对应于自然语言中的“字”。图2中示出了C元素的例子,这些例子是中文的字。
“D语言”是由D元素的各种排列组合形成的语言。C元素的全部或部分排列组合形成D元素,因此也可以理解为C语言转换为D元素,而D元素构成了D语言。于是,从C语言到D语言的转换关系也是“多对多”的关系。如果沿用音素、音节、字的语言学体系,D元素对应于自然语言中的“词”或“短语”。图2中示出了D元素的例子,这些例子是中文的词。
图2中的“C语言”例子和“D语言”的例子看上去内容相同,均由“我”、“的”、“信”、“用”、“卡”、“丢”、“了”顺序组成,但是,熟悉中文的人可以知道,仅从C语言来进行理解,会产生很大的多义性,而转换为“D语言”后,该表达的含义就比较确定了。对于其它语种而言,从字→词或短语的转换对于语义理解也是十分重要的,特别是由智能系统(语音机器人)实现语音识别的情况下。根据不同的自然语言,“字”和“词”,也就是C语言信息和D语言信息,也可能归为一个语言信息层级。
如果基于前述AI的原理进行自然语言处理或者自然语言理解(NLU),那么参考图3,是沿着A→B→C或D的路径,也可以是A→B→C→D。也就是前述的将采集到的语音(声波)转换为文本的过程。
其中,从A语言信息(声波)到B语言信息(音素)的转换,一般可以由机器人比较准确地自动完成。但是,从B语言信息(音素)到C语言信息(字)的转换,可能会发生较高的错误率。例如,以中文为例,如图3例子所示,客户输入的原始语言信息为“乒乓球拍卖完了”,可能因为客户发音或口音的问题,“乒乓球”可能被分别识别为“平板就”,“拍”可能被识别为“怕”,结果这段声波最终被转换成“平板就怕卖完了”七个字。为了提高机器人的识别准确度,特别是针对诸如上述发音或口音的问题,需要对机器人的识别结果进行纠正,通常采用人工辅助识别的方式。此阶段的人工辅助识别称为转写(Transcription)。所谓转写,就是转写人员通过使用特 定的转写工具,将“声波”(A语言信息)进行精准切割,然后将切割出来的波段各自转成相应的“字”(C语言信息),也就是为机器人定义A语言(声波)→C语言(字)的转换/翻译关系。切割是否精准,关键取决于转写人员是否足够细心,对转写工具掌握的熟悉程度;而能否准确转成相应的“字”,关键取决于转写人员对这段声波所处的语境,以及上下文(位于这段声波前后的其他声波),是否已经准确理解。特别是汉字,同音字很多,也加大了转写人员精准工作的难度。
接下来,从C语言信息(字)获得D语言信息(词、短语)。从字到词的转换同样会发生歧义,如前例,即使从声波到字的识别是准确的,得到了“乒乓球拍卖完了”七个字的顺序排列结果,但是仍然会转换为至少“乒乓球拍+卖+完了”和“乒乓球+拍卖+完了”两种结果,其含义显然是不同的。同样,可以采取人工辅助识别来进行纠正。此阶段的人工辅助识别称为关键字切割(Keyword Spotting),也可以简称为“切词”,就是切词人员将转写出来的“字”(C语言信息)进行组合,形成“词(关键字)”(D语言信息),也就是为机器人定义C语言(字)→D语言(词)的转换/翻译关系。切词是否准确,往往取决于切词人员对业务知识的掌握程度。针对不同的领域,需要熟悉该领域业务内容和用语的人员进行切词操作,其成本也会比转写有所提高。
最后,从D语言信息理解意思(即Y语言信息)。仅仅获得了一定顺序排列的词语,往往还不能准确了解客户的真实含义。例如,客户说“我的信用卡不见了”,机器人识别不出其含义,技术员就将“我的”、“信用卡”、“不见了”作为新的关键字放入数据库的语法表中;另一个客户说:“俺的刷刷卡丢了”,机器人又识别不出其含义,技术员就将“俺的”、“刷刷卡”(就是“信用卡”的意思)、“丢了”作为新的关键字放入数据库的语法表中。这样,通过人工辅助的方式,将客户的含义或者需求加以理解,并归纳加入数据库。这种人工辅助识别称为关键字堆砌(Keyword Pile-up),简称为“堆词”,就是积累“词”的排列组合,并根据其的含义予以归纳入数据库。这项工作的工作量也是巨大的,并且也需要训练人员的专业知识来辅助理解。
如果依照A→B→C→D→Y的多层“多对多”关系转换,在学术上被称为多层感知(MLP,Multi-Layer Perception),如图4所示的原理,其弊端在 于:每做一次转换,都会造成原始信息在某种程度上的失真,同时也会给系统增加更多的处理负荷,造成进一步性能损失。转换的次数越多,原始信息的失真越厉害,而系统的处理速度也越慢。同理,由于在前述处理过程中的机器人训练均需要人工辅助识别的介入,一方面会产生很高的工作量和成本,另一方面多次人为介入也会提高出错的概率。
X语言是对A语言数据进行语音信号处理(SSP,Speech Signal Processing)后所得到的逻辑层数据,本发明实施例中称之为“X语言”。X语言是由X元素的各种排列组合形成的语言。X元素是系统通过某种建模工具,如:高斯混合模型(GMM,Gaussian Mixture Model),将声波自动切割成的高低不同的若干柱状元素。图5示出了一个利用高斯混合模型将采集的声波(以直方图表示)转换为X元素(以矢量量化直方图表示)的例子。
根据不同的建模工具,应用于不同的自然语音集,X元素的数量可以控制在一定的范围内(例如,200以下)。根据本发明的实施例,可以将2位ASCII字符的组合定义为X元素的ID,如图2所示。也就是说,X元素的数量最高可达16,384(128 x 128=16,384),可以满足未来因声波建模技术的进一步发展而需增加X元素数量的需求。切割后的声波单元与X元素是一一对应的,由于A语言信息可以认为是声波单元的组合,X语言信息是X元素的组合,因而从A语言到X语言的转换关系是“多对多”的关系。在图3中也示出了用ASCII字符表示的X元素的例子。
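作为对上述过程的一个示意,下面的Python草图用高斯混合模型对分帧后的声波做自动聚类/量化,并为每个X元素分配两位可打印ASCII字符的ID。其中帧长、X元素数量、直接以原始采样作为特征等都是为演示而作的简化假设;实际系统中GMM通常需要在大规模语料上预先训练。

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def wave_to_x(wave: np.ndarray, frame_len: int = 160, n_x: int = 16) -> str:
    """把一段声波切割成帧并量化为 X 元素 ID 串(X 语言信息)。"""
    n = len(wave) // frame_len
    frames = wave[: n * frame_len].reshape(n, frame_len)        # 自动切割声波
    gmm = GaussianMixture(n_components=n_x, covariance_type="diag",
                          random_state=0).fit(frames)           # 演示:实际应在大语料上预训练
    labels = gmm.predict(frames)                                # 每帧对应一个 X 元素
    # 两位可打印 ASCII 字符作为 X 元素 ID,最多可表示 94×94 种(原理同上文 128×128 的设计)
    return "".join(chr(33 + k // 94) + chr(33 + k % 94) for k in labels)

x_info = wave_to_x(np.random.randn(16000))   # 以 1 秒 16kHz 的随机信号代替真实声波
print(len(x_info), x_info[:20])
```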
基于前述AI的原理进行自然语言处理或者自然语言理解的过程中,并不涉及X语言信息,之所以在图1、图2和图3中标识出了X语言(X元素)层,一方面是说明从信息颗粒度的角度X元素是位于声波和音素之间的;另一方面说明进行自然语言处理或者自然语言理解也是可以采用A→X→B→C或D以及A→X→B→C→D的路径,也就是说,将X元素作为A语言(声波)与B语言(音素)之间转换的中间数据也是可以的。
“Y语言”,如图1和图2所示,是指对原始自然语言信息A进行理解后获得的体现“意思”或者“含义”的语言信息。本发明实施例所定义的“标准表达”即为“Y语言”的一种形式。根据本发明的实施例,例如:银行业可以用业务编码“21”代表“信用卡挂失”的含义;可以用业务编码“252”代表“信用卡部分还款”的含义,而“252-5000”(需求代码=252,需求参 数=5000)则代表“信用卡还款5000元”的含义;娱乐业可以用编码“24”代表“观看电影”的含义,而“24-中国合伙人”(需求代码=24,需求参数=“中国合伙人”)则代表“观看电影《中国合伙人》”的含义。
简单而言,基于自然智能的自然语言处理或者自然语言理解,首先将来自用户的以物理数据形式表现的无规则的自然表达信息,例如,声波(即“A语言信息”),通过某种建模工具,进行基本的自动识别或转换,得到以若干基本元素(“X元素”)排列组合的形式表现的语言信息(“X语言信息”),然后将从A语言信息识别或转换得到的X语言信息再转换成某种形式的标准表达(“Y语言信息”)。也就是说,采用A→X→Y的处理路径,而无需转换为“字”和“词”(“C语言信息”和“D语言信息”),也无需转换成音素(“B语言信息”),如前所述,这是自然智能与人工智能在自然表达处理中很重要的不同点。可以看出,这种不同是处理路径的不同,也是方法论上的不同。于是可以省去B→C→D→Y的多层“多对多”关系转换,提高表达信息转换的正确率和效率,也可以降低人工辅助识别的工作量和出错率。
与前述的基于人工智能进行自然表达处理相比,基于自然智能的自然表达处理在处理非文字信息的表达时,无需将该表达转换为文本,而是转换为X语言信息,这种X语言信息具有比文字细得多的信息颗粒度,因此如前所述具有更高的关键信息识别准确度。例如,X语言信息的信息颗粒度与文字的信息颗粒度是在数量级上的差异,如果文字的信息颗粒度是1,那么X语言信息的信息颗粒度是1/10、1/100、1/1000等等的数量级;另一方面,由于X语言信息是对A语言信息(声波、图像等)进行采样和转换得到的,因此X语言信息比A语言信息的信息颗粒度粗,例如沿用之前的比例关系,如果文字的信息颗粒度是1,那么声波的信息颗粒度是诸如1/10000,1/100000,1/1000000等等。前述的B语言信息(由音素构成)、C语言信息(由音节构成)、D语言信息(由词或短语构成),基本上与文字的信息颗粒度处于同一数量级,因此它们在与X语言信息进行信息颗粒度层面的对比时,与文字是相似的。
关于信息颗粒度与表达理解准确度的关系,例如,对于语言表达“I lost my credit cart”,而正确的表达应该是“I lost my credit card”,如果以单词为信息颗粒单位,那么最后一个词“card”错误,理解的错误率会是1/5=20%,则理解正确率为80%;如果以字符(包括空格)为信息颗粒单位,那么最后一个字符“d”错误,理解的错误率会是1/21≈4.76%,则理解正确率约为95.24%。
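下面用一段极简的Python片段复核上述两种信息颗粒度下的错误率计算(仅为示意性验算,并非本发明实现的一部分):

```python
ref, hyp = "I lost my credit card", "I lost my credit cart"
word_err = sum(a != b for a, b in zip(ref.split(), hyp.split())) / len(ref.split())
char_err = sum(a != b for a, b in zip(ref, hyp)) / len(ref)
print(f"按词:错误率 {word_err:.2%},正确率 {1 - word_err:.2%}")    # 20.00% / 80.00%
print(f"按字符:错误率 {char_err:.2%},正确率 {1 - char_err:.2%}")  # 约 4.76% / 95.24%
```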
基于同样的原理,如果采用更细的信息颗粒度,那么理论上有更高的正确率。根据我们上线产品的实测数据,采用更细信息颗粒度的X基元,可以获得95%以上甚至更高的理解准确率,轻松突破前述的“AI商用的世纪魔咒”。
另一方面,在处理文字信息的表达时(即A语言信息是文字信息的情况),因为Y语言信息是基于前述的灰度逻辑与A语言信息对应的,也就是说,A语言信息到Y语言信息的对应可以容忍较大的模糊度。更有利的是,由于不需要将A语言信息转换为某种人类语言的文本,因而不会受到同音字和语法的限制,也不会受到说话方式的限制。因此,基于自然智能的方法,从A语言信息到Y语言信息的对应可以适用于方言和混合语言或混合语音,例如,汉语中夹杂英语,粤语中混合普通话,上海话中混合英语和普通话,以及更多的细分语言、方言及其混合,甚至还可以适用于多种说话方式的混合,而且理解准确率不会受到影响。对于人工智能的NLP技术,即使做出海量语法模型并付出极大成本,也无法获得高理解准确率。实际上,对于AI而言,混合语种、混合方言、混合说话方式等情况会造成语法模型的指数增长,是根本无法实现的。
以上用自然语言处理和自然语言理解作为例子说明自然智能与人工智能的异同。人类的自然表达方法是多种多样的,例如,可将来自客户的自然表达,即“A语言信息”分为以下四大类:文字信息、语音信息、图像信息、动画信息。其中,文字信息表达可以是:客户通过键盘输入文本表达自己,例如,客户在一家银行的互联网通道呼叫中心用户界面上键入“我的储蓄账户里还有多少钱?”;图像信息表达可以是:客户通过图像表达自己,例如,客户通过电脑桌面截屏工具,将使用某种软件的出错信息,以图像的方式表达自己所遇到的问题;语音信息表达可以是:客户通过说话表达自己,例如,客户与一家银行的服务热线(电话通道呼叫中心)客服专员对话,期间在电话上说:“你说的到底是什么意思?我不是太明白”;动画(或称“视频”)信息表达可以是:客户通过在镜头前摇头以表达自己不同意(这类似于前述的一般情况)。
在基于自然智能进行自然表达处理时,首先,将客户的自然表达(A语言信息)进行自动识别或转换,得到以某种语言形式表示的信息。如果A语言信息是语音信息,那么例如可以通过建模工具采集声波波形信息并通过系统(智能机器人)自动识别或转换为某种(对应于语音信息)的X语言;如果A语言信息是图形信息,那么例如可以通过建模工具采集图形像素信息并通过系统(智能机器人)自动识别或转换为(对应于图像信息的)X语言;如果A语言信息是动画信息,那么例如可以通过建模工具采集图形像素信息和图像变化速度信息并通过系统(智能机器人)自动识别或转换为(对应于动画信息的)X语言;如果A语言信息是文字信息,则将文字信息转换为以字符为单位(基元)的X语言或者无需转换。
然后,对以上从A语言信息自动转换得到的X语言信息或无需转换的文字信息进一步处理,得到计算机或其它处理设备能够“理解”的规则化标准表达(Y语言信息)。Y语言信息可被计算机业务系统进行自动处理。
根据本发明的实施例,可以用规则化的编码来实现所述规则化标准表达(Y语言信息)。例如,采用如下的数字+英文字母的编码方式,其包括行业代码,行业业务代码,机构代码,机构业务代码和表达信息代码。
(1)行业代码
主行业(2位英文字母,最多26×26=676个主行业)
子行业(3位英文字母,每个主行业最多有26×26×26=17,576个子行业)
(2)行业业务代码
一级行业业务范畴(1位数字0-9)
二级行业业务范畴(1位数字0-9)
三级行业业务范畴(1位数字0-9)
四级行业业务范畴(1位数字0-9)
五级行业业务范畴(1位数字0-9)
六级行业业务范畴(1位数字0-9)
七级行业业务范畴(1位数字0-9)
八级行业业务范畴(1位数字0-9)
九级行业业务范畴(1位数字0-9)
十级行业业务范畴(1位数字0-9)
(3)机构代码(UID)(24位数字=国家号3位+城市号3位+机构号18位)
(4)机构业务代码
一级机构业务范畴(0-9)
二级机构业务范畴(0-9)
三级机构业务范畴(0-9)
四级机构业务范畴(0-9)
五级机构业务范畴(0-9)
(5)表达信息代码
信息类型代码(2位数字1-99)
语言代码(使用RFC3066标准:http://tools.ietf.org/html/rfc3066,如zh-CN代表“简体中文”)
方言代码(3位数字1-999)
其中,行业代码表示来自客户的无规则自然表达(A语言信息)所指向的提供服务的主体所在的行业,例如,可以用2位英文字母表示,则可以涵盖676个行业,可选地,增加3位英文字母的子行业代码,可增加涵盖每个行业的17576个子行业。这样,该编码基本上可以涵盖所有常见的行业;行业业务代码表示来自客户的A语言信息所指向的服务需求,同样可以用多位阿拉伯数字表示,例如,采用10位数字进行编码,可以涵盖更多的行业业务范畴;机构代码表示来自客户的A语言信息所指向的提供服务的主体,例如,可以标识该机构所在国家和城市;机构业务代码表示提供服务的主体的内部个性化业务划分,便于机构进行个性化内部管理;表达信息代码表示客户的A语言信息本身的标识性信息,可以包括信息的类型、语言的类型等等,用数字和字母表示。
以下是根据以上编码方式的规则化标准表达(Y语言信息)的两个例子:
例一:FSBNK27100000000860109558800000000000000000002zh-CN003
其中,
行业代码为,
·FS=Financial Service金融服务(主行业)
·BNK=Bank银行(子行业)
行业业务代码为,
·2710000000=一级行业业务范畴—2(信用卡) 二级行业业务范畴—7(调整信用额度) 三级行业业务范畴—1(增加信用额度) 0000000(再无更细分范畴)
机构代码为,
·086010955880000000000000=国家号086(中国) 010(北京)955880000000000000(中国工商银行总行)
机构业务代码为,
·00000=无机构业务范畴(在这个Y语言信息中,没有“中国工商银行总行”这个机构自己定义的机构业务范畴,即表示:该Y语言信息完全属于行业业务范畴,为银行业通用。)
表达信息代码为,
·02=语音(客户提供的A语言信息类型为“语音”)
·zh-CN=大陆中文
·003=广东话方言
在此例子中,该Y语言信息所对应的A语言信息可以是,诸如,“我的信用卡额度太少了”,“我想增加我的信用卡额度”,“我要减低我的信用卡额度”,“我需要调整信用卡额度”等等语音信息。
在一些特定的应用情形,特别是提供服务的主体确定的情况,上述的行业代码、机构代码和机构业务代码都可以作为系统缺省值预设。也就是说,仅从客户提供的A语言信息中获得业务代码和表达信息代码即可,在这种情况下,可以将Y语言信息表示为“271000000002zh-CN003”;或者,如果针对特定应用3位数字表示行业业务代码就够了,则可以进一步表示为“27102zh-CN003”;再者,如果仅针对语音服务,则可以表示为“271zh-CN003”;如果只考虑客户的需求表达,而不关心表达自身的类型信息,甚至仅用“271”表示即可。
例二:TVTKT11200000000014047730305000000000001240003fr-CH000
·TV=Traveling Service旅游服务(主行业)
·TKT=Ticketing票务(子行业)
·1120000000=一级行业业务范畴—1(飞机票) 二级行业业务范畴—1(机票改签) 三级行业业务范畴—2(延后) 0000000(再无更细分范畴)
·001404773030500000000000=国家号001(美国) 404(乔治亚州、亚特兰大市) 773030500000000000(美国Delta航空公司)
·12400=一级机构业务范畴—1(折扣票) 二级机构业务范畴—2(淡季) 三级机构业务范畴—4(亚太区) 00(再无更细分范畴)
·03=图像(客户提供的A语言信息类型为“图像”,如:客户在Delta官方网站上进行机票改签操作时,遇到系统报错,客户将屏幕截图,作为向Delta客服求助的自然表达。)
·fr-CH=瑞士法文
·000=无方言
在此例子中,Y语言信息所对应的A语言信息是通过图像识别得到的。同理,在提供服务的主体确定的情况,上述的行业代码、机构代码可以作为系统缺省值预设。在这种情况下,可以将Y语言信息表示为“11200000001240003fr-CH000”;如果只考虑客户的需求表达,而不关心表达自身的类型信息,仅用“112000000012400”表示即可;如果针对特定应用3位数字表示行业业务代码,3位数字表示机构业务代码,仅用“112124”表示即可。
以上只是根据本发明实施例的规则化标准表达(Y语言信息)的例子,可以采用不同的代码位数和代码排列顺序,也可以采用不同的代码表示或编码方式。采用这种代码化的方式来表示所理解的表达意图,可以减少人工辅助理解所需要的人工输入工作量,更便捷地和实时地实现人工辅助理解。
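作为补充,下面给出一个按上文字段顺序拼装规则化标准表达(Y语言信息)的极简Python草图;字段取值沿用例一,类名与字段名为说明用的假设:

```python
from dataclasses import dataclass

@dataclass
class StandardExpression:
    industry: str       # 行业代码:主行业 2 位 + 子行业 3 位英文字母,如 "FSBNK"
    business: str       # 行业业务代码:10 位数字
    org: str            # 机构代码:24 位数字(国家号 3 位 + 城市号 3 位 + 机构号 18 位)
    org_business: str   # 机构业务代码:5 位数字
    info_type: str      # 信息类型代码:2 位数字
    language: str       # 语言代码:RFC3066,如 "zh-CN"
    dialect: str        # 方言代码:3 位数字

    def encode(self) -> str:
        """按上文描述的顺序拼装 Y 语言信息编码。"""
        return (self.industry + self.business + self.org +
                self.org_business + self.info_type + self.language + self.dialect)

y = StandardExpression("FSBNK", "2710000000",
                       "086010955880000000000000", "00000", "02", "zh-CN", "003")
print(y.encode())   # 输出与上文例一的编码一致
```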
来自客户的自然表达(A语言信息)往往体现了该客户的具体需求,如前所述,首先将客户的A语言信息自动转换为X语言信息或无需转换的语言信息(当A语言信息是文字信息的时候),然后将X语言信息或文字语言信息转换为编码形式的标准表达(Y语言信息)。在前面的例子中,Y语言信息可以包括行业代码,行业业务代码,机构代码,机构业务代码和表达信息代码。可选地,A语言信息也可以包括体现客户需求范畴下的具体参数(可称之为“需求参数”),如:“转5000块给张三”(例一)、“我想看一部电影, 叫《中国合伙人》”(例二)等等。特定的需求代码集(例如包括前述的行业代码,行业业务代码,机构代码,机构业务代码和表达信息代码中的一种或多种)对应特定的参数集。如上例二,若“看电影”的需求代码是123,其对应的参数集可以包括参数:电影名称。那么。这个A语言信息对应的Y语言信息是“123<中国合伙人>”。123是需求代码,<>里的五个中文字是需求参数。在Y语言信息中将需求代码与需求参数区分的方式有多种,可以是利用诸如“<>”的符号,也可以是用空格,还可以用特定顺序排列等方式。
上述的用代码来表示规则化的标准表达的例子,特别可以应用于交互式语音应答系统(IVRS,Interactive Voice Response System)。交互式语音应答(IVR,Interactive Voice Response)是一种基于电话的语音增值业务的统称。很多机构(如银行,信用卡中心,电信运营商等)都通过交互式语音应答系统(IVRS)向客户提供各式各样的自助服务,客户可拨打指定的电话号码,进入系统,根据系统之指示,键入适当的选项或个人资料,以听取预录之信息,或经计算机系统根据预设的程序(Call Flow)组合数据,以语音方式读出特定的资料(如户口结余、应付金额等),还可通过系统输入交易指示,以进行预设的交易(如转账、更改密码、更改联系电话号码等)。机构可以构建专用的标准表达的代码化规则以及基于标准表达的对话术,从而规范内部的客服对话数据体系,并进行数据的保护(第三方即使获知该机构的Y语言信息,也无法了解代码对应的对话数据)。另一方面,智能表达处理引擎的服务商在提供A语言信息到Y语言信息的转换服务中,即使了解了A语言信息的数据和对应转换得到的Y语言信息(例如,一组数字或数字+字母的代码),也不会知道该Y语言信息在用户机构的客服对话数据库中所对应的含义,从而能够提供数据安全的智能表达处理服务。
前述的将A语言信息转换为X语言信息的过程,可以通过语音信号处理技术、语音识别技术、图像识别技术和视频处理技术来实现,这些技术也可以是已有的技术。实际上,根据本发明实施例的编码化标准表达思想也可以被应用到自然表达的识别处理中,通过规则化的编码来表示X语言信息。
根据本发明实施例的自然表达处理方法,首先对客户的自然表达(A语言信息)进行自动转换得到X语言信息,或无需转换直接得到C语言信息(当A语言信息是文字信息的时候);然后将X语言信息或C语言信息转换 为Y语言信息。从而省去A→B→C→D→Y的多层“多对多”关系转换,可以提高表达信息转换的正确率和效率,也可以降低人工辅助识别的工作量和出错率。
具体地,根据本发明实施例的技术,首先通过建模工具,将文本、语音、图形、视频这些非规则化的自然表达信息转换成X语言信息;然后将X语言作为左侧语言,Y语言作为右侧语言,通过使用机器翻译(MT,Machine Translation)技术,实现X语言信息到Y语言信息的转换。
具体而言,以处理语音这种非规则化自然表达信息为例,首先利用“语音信号处理(Speech Signal Processing)”技术自动将A语言自动转换/翻译成X语言(基于目前的“语音信号处理”技术,A→X的转换准确率可高达95%以上,而改进的“语音信号处理”技术在降噪方面做得更好,可将A→X的转换准确率提升至99%以上);然后再利用机器翻译技术实现X→Y的自动机器翻译,而无需再通过X→B→C→D→Y的多层转换。
可以利用类似于基于对实例样本进行统计分析的机器翻译算法来将转换得到的无规则自然表达(X语言信息)转换为规则化标准表达(Y语言信息)。这种机器翻译算法要求X语言与Y语言之间对应数据的量足够大,而且足够准确。
进一步,考虑到已可以实现A→X的精确机器自动转换,为了积累X语言与Y语言之间的对应数据,可以积累A语言与Y语言之间的对应数据。于是,基于自然智能的方案提供了MAU(Mortal Aided Understanding,人工辅助理解)这一新的人工座席工作模式,通过人工理解结合代码输入,实现A语言与Y语言之间的对应数据积累。如前例,可以用“271”这个需求代码来表示调整信用卡额度的含义,同理,也可以用“21”来表示信用卡挂失的含义,这样就可以用“21”来对应于前述的“我的信用卡不见了”或“俺的刷刷卡丢了”的自然表述信息。
所述的MAU可以通过已有的这种简洁的代码输入方式,将传统“说话的座席”转为“不用说话的座席”——静默座席,令座席的工作变得更舒适,工作效率得以大幅提升之余,更充分利用了人类最高价值的理解能力,准确而高速地收集海量的A/X语言与Y语言的对应数据,提供给MT引擎进行循环迭代,自学习A/X→Y的转换/翻译规律,形成和优化A/X→Y的翻译模 型。
以下介绍根据本发明的机器翻译技术及机器翻译机器人训练技术的工作原理。
机器翻译是用来对两种语言进行自动翻译的一种人工智能技术。这里所指的“语言”不是狭义的国家语言(例如:中文、英文……),而是广义的信息表达方式。如前所述,以表达方式分,语言可分为四大类:文字、语音、图像、动画(或称“视频”)。
语言是由元素集里的元素,通过各种排列组合而形成的信息。例如:英文文字是由ASCII字符集(元素集)里的128个ASCII字符(元素),通过各种一维(串行)排列组合而形成的一种语言;中文这种语言,就是由国标码里的几千个中文字再加上标点符号(构成中文信息的基本元素)的无限排列组合;又例如:RGB平面图像是由红、绿、蓝三种子像素,通过各种二维(长与宽)排列组合而形成的另一种语言。
任何两种语言之间存在着某种转换/翻译规律,都可以通过分析两种语言元素排列组合的对应关系,找出两种语言之间的自动转换/翻译规律。首先需要人工收集两种语言的对应数据(或称“翻译样本”),然后通过对两种语言元素排列组合的迭代循环,自动找出两种语言之间的自动转换/翻译规律,形成两种语言的翻译模型。
做机器翻译需要两张数据表:“训练数据表(Training Dataset)”和“检验数据表(Testing Dataset)”。
这两张表的数据结构是类似的:存储的是一对对的数据,左值是“左语言”(或称“源语言”),右值是“右语言”(或称“目标语言”)。我们可以形象地做这么一个比喻:“训练数据表”是人类给MT机器人自学的课本,而“检验数据表”则是人类给MT机器人出的考题,用以评估机器人的自学效果。
下面是英文→中文的MT“训练数据表”和“检验数据表”的例子:
训练数据表
(原文此处为图像形式的英文-中文训练数据表,其中包含下文提及的#2、#3、#4、#5等英文与中文的配对数据。)
检验数据表
  英文 中文
  May I have your age? 请问您年纪?
  …… ……
MT机器人是以组成语言的元素为单位进行排列组合的迭代循环的。如上例中,通过训练数据表中的#3和#4两组数据对,发现英文“May I have your”这15个ASCII字符元素(3个英文字母“May”+1个空格+1个英文字母“I”+1个空格+4个英文字母“have”+1个空格+4个英文字母“your”)的排列组合对应着中文的“请问您”这3个国标码中文字的排列组合;通过训练数据表中的#2和#5两组数据对,发现英文“age”这3个ASCII字符元素的排列组合对应着中文的“年纪”这2个国标码中文字的排列组合。
因此,如果机器人能将检验数据表中的英文“May I have your age?”准确翻译成中文“请问您年纪?”,那就证明机器人学会了这一句的中英文翻译。如果不能,那就证明机器人还没学会。那么机器人就需要修正一下自己的学习方法(例如,寻找另一条路径去尝试再学习),对训练数据表重新消化一次,这又是一次迭代;……如此不断重复着这种“迭代修正”,可使得机器人的翻译准确率不断地爬升。当爬升到一定程度(例如,翻译准确率为70%)后,机器人的翻译准确率可能会一直徘徊在这个水平,再也很难上去了,也就是说遇到了“机器自学习”的瓶颈,那么就需要为机器人增加MT训练数据表数据。MT训练数据表的数据可以从外部数据库导入,也可以通过“人工辅助理解”来进行生成和添加。
例如,沿用之前信用卡业务的例子,假设所得到的无规则自然表达为“我的信用卡能透支的太少了”,而机器人理解力不够成熟的时候,“人工辅助理解”可以介入,通过人工将该表达理解为“我想增加我的信用卡额度”,并输入与之对应的Y语言信息。可选地,“人工辅助理解”处理无需记录对于自然表达的理解过程和理解结果,只需要记录作为最终处理结果的对应标准表达(Y语言信息)。这样可以简化人工操作,节省资源。例如,操作员可能只需要输入“271”作为对应标准表达即完成了对于无规则自然表达“我的信用卡能透支的太少了”的处理。例如,将新的自然表达实例,例如上述的自然表达“我的信用卡能透支的太少了”,及其对应的标准表达“271”添加进现有MT训练数据表,从而增加和更新MT训练数据表数据。于是,通过“人工辅助理解”,一方面可以实现对于目标自然表达的准确而稳定的转换(将其转换为标准表达–Y语言信息),另一方面可以实现MT训练数据表数据的高效添加与更新,从而使得系统MT训练数据表中的数据更加丰富、准确,也可能使得机器人的翻译(转换)准确率高效得到提升。
理论上,MT机器人需要对#3的左值“May I have your time”这20个ASCII字符元素的所有排列组合进行穷尽罗列,也需要对#3的右值“请问您现在什么时间了”这10个国标码中文字的所有排列组合进行穷尽罗列。即,MT机器人需要对训练数据表中的每一对数据的左右两组元素的所有排列组合都进行穷尽罗列。通过这种元素级的穷尽罗列,MT机器人一定能发现很多重复出现的排列组合(如“your”、“May I have your”、“age”、“time”、“您”、“请问您”、“年纪”……),从而能找出这些重复出现的左语言元素排列组合和右语言元素排列组合之间的某种对应关系,也就是两种语言之间的翻译模型。也就是说,训练数据表里左右语言数据对的数量越大,MT机器人所能发现的重复出现的左右两种语言元素的排列组合也就越多,而左右两边重复出现的元素排列组合的对应关系也就越多,那么MT机器人所掌握的左右两种语言的转换/翻译规律也就越多,翻译模型也就越成熟。因此,采用根据本发明技术思想的“规则化标准表达”和“人工辅助理解”,可以更高效地积累MT训练数据表数据,帮助实现机器人自学习和自动机器翻译。
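为说明“对左右语言元素排列组合做穷尽罗列并统计重复出现的对应关系”这一原理,下面给出一个极小规模的Python草图。真实的MT机器人会采用更高效的迭代与统计模型,这里的训练样例与筛选条件仅为演示假设:

```python
from collections import Counter
from itertools import combinations

train = [("May I have your time", "请问您现在什么时间了"),
         ("May I have your name", "请问您贵姓"),
         ("your age", "您的年纪"),
         ("my age", "我的年纪")]

def substrings(s: str, max_len: int = 15) -> set:
    """罗列一条语言信息的元素(字符)排列组合,这里简化为全部连续子串。"""
    return {s[i:j] for i, j in combinations(range(len(s) + 1), 2) if j - i <= max_len}

cooc, left_cnt, right_cnt = Counter(), Counter(), Counter()
for left, right in train:
    ls, rs = substrings(left), substrings(right)
    left_cnt.update(ls)
    right_cnt.update(rs)
    cooc.update((l, r) for l in ls for r in rs)

# 仅保留在多对数据中“总是同时出现”的左右片段对,作为候选的转换/翻译规律
candidates = [(l, r) for (l, r), c in cooc.items()
              if c >= 2 and c == left_cnt[l] == right_cnt[r]
              and len(l) >= 3 and len(r) >= 2]
print(sorted(candidates, key=lambda p: -len(p[0]))[:5])
```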
本发明中的X语言→Y语言之间的机器翻译,与中英文的机器翻译原理是一样的,只不过我们把英文改成了X语言,把中文改成了Y语言,而左 右两种语言的元素集不同而已。
如前所述,机器翻译技术可用于将一种语言自动翻译成另一种语言。其技术原理就是对收集到的两种语言的配对信息(左侧语言和右侧语言)进行基本元素级分析,通过对大量的语言信息对的基本元素各种排列组合进行循环迭代比较,从而找出两种语言之间的转换/翻译规律,形成两种语言的翻译模型。
本发明的技术将机器翻译技术的应用范围从对不同国家语言之间进行自动翻译,延展到将所有非规则化多媒体自然表达信息(文字、语音、图像、视频,即A语言信息)自动转换成所述的规则化标准信息(Y语言信息),以便各行各业的业务系统可以对它们进行处理,从而实现真正意义上的、实用的自然表达处理。
对于自然语言处理而言,由于不需要进行传统的机器翻译所需的多层语言学分析,而采用对实例进行基本元素级分析的方式,可以增加翻译的准确度和快捷度,同时,也很容易通过添加自然表达实例和标准表达来进行更新和扩充。
对于本发明实施例的自然表达处理而言,因为只需要进行自然表达(A语言信息)到标准表达(Y语言信息)的转换,换句话说,只需建立A/X→Y的翻译模型,并非得到文本形式的语言翻译结果,因此无需对翻译结果进行修改处理。
此外,根据本发明实施例的自然表达处理,可以被限制用于具体行业机构的具体业务,例如,上述的信用卡业务,则处理系统所需的MT训练数据表规模可以大大缩小,由此在提高机器人理解成熟阈值的同时,降低MT训练数据表构建和维护的成本,同时也可以有效缩短A/X→Y翻译模型的成熟周期。
如前所述,根据本发明实施例的自然表达处理系统,实现了从自然表达到编码化的标准表达的转换。该转换的基础在于存储A/X语言信息与Y语言信息配对数据的MT训练数据表(即训练数据库),以及在MT训练数据表基础上得到的A/X→Y的翻译模型。因此,需要采集一定量的准确的A/X语言数据和Y语言数据来生成MT训练数据表,并通过机器人(信息处理系统)的自学习(自训练)来形成A/X→Y的翻译模型。而形成MT训练数据 表是可以通过人工辅助理解来进行的。
上述的将A语言信息转换为Y语言信息的方法同样适用于A语言信息是文字信息的情况。在这种情况下,可以将A语言信息中的字(例如中文单字、英文单字等)或字符(例如英文字母及字符、德文字母及字符等)作为X元素,从而将A语言信息直接作为X语言信息或者转换为以字符为X元素的X语言信息,并根据上述方法进行X→Y的翻译模型的训练,从而实现A→Y的翻译(转换)。并且,同样不需要在A→Y的转换中进行文字识别和语法分析,不需要分词库和语法表的支撑,也不受语种和语种混合的限制。
在此概括一下根据本发明实施例的自然智能方法论。在自然智能的方法论下,将对自然表达的机器理解问题,等价于将A语言信息转换为Y语言信息的过程,在该过程中,先从A语言信息获得比A语言信息的信息颗粒度更粗的X语言信息,然后通过将X语言信息与Y语言信息相对应来获得与A语言信息对应的Y语言信息。
更具体地,X语言信息可以是字、字符等,也可以是比文字具有细很多的信息颗粒度的信息,利用类似机器翻译的算法将X语言信息与Y语言信息对应(也可以称为转换或识别)并不会受到文字处理的语法规则的限制,无需建立语法模型等模型以及规则库。因为无需人工进行模型构建及规则库的维护,因此这种类似机器翻译的算法还可以实现百分之百的机器自学习。构建X语言信息与Y语言信息的训练数据库,输入X语言信息与Y语言信息的正确配对数据,并让机器(即机器人或处理引擎)对于数据库中已有X语言信息的元素的各种排列组合与已有Y语言信息或者已有Y语言信息的元素的各种排列组合(这些排列组合也包括该Y语言信息本身)进行循环迭代,进一步建立X语言信息的元素的各种排列组合与Y语言信息或Y语言信息的元素的各种排列组合之间的对应关系,从而获得更多的X语言信息与Y语言信息的配对数据并存储在训练数据库中。这种循环迭代可以包括对于训练数据进行多次迭代训练,也就是说,将一次训练后的数据(训练所用的数据加上训练得到的新的数据)作为下一次训练的训练数据再次进行训练,经过多次循环不断得到新的训练数据,将所有的数据存储在训练数据库中;如果有新的配对数据输入,再对所有数据(新的数据和已有的数据)进行循环迭代。
于是,在NI方法论下,X语言信息与Y语言信息的数据库是可以自动扩展的,既包括输入的配对数据,也包括通过元素的排列组合以及训练迭代而扩展得到的训练数据。这也是我们称所述数据库为训练数据库的原因。在AI的方法论下,需要人工对规则库添加规则,而规则库本身不能自我扩展,而在NI的方法论下,机器可以通过自我训练来扩展用于实现X语言信息到Y语言信息的对应转换的数据库。在通过循环迭代获得新的训练数据的同时,也对机器的理解模型(包括X语言信息到Y语言信息的转换模型)进行迭代,从而实现对模型的优化,增强模型的准确度。在自然智能的方法论下,模型的优化也是机器自动完成的,并不需要模型工程师的工作,从而能够极大地降低机器学习的成本。
当从新输入的A语言信息获得X语言信息后,将该X语言信息输入到X语言信息到Y语言信息的转换模型,通过该转换模型的计算来确定与该X语言信息对应的Y语言信息,或者计算将该X语言信息对应到某Y语言信息的正确率。如果机器的理解能力不够成熟,不足以或者不确定将该X语言信息转换到某Y语言信息,那么通常需要进行人工辅助理解。通过人工对所述新输入的A语言信息进行理解,也就是说,用人的理解能力对原始的自然表达进行理解而非对转换后的X语言信息进行理解,得到与自然表达的含义所对应的Y语言信息,并且可以将从该A语言信息得到的X语言信息与Y语言信息对应起来,得到新的配对数据存入训练数据库,并进行上述的数据扩展和训练,从而在训练数据库中增加新的数据并对模型进行优化。由于前述的数据扩展,使得按照这种方式进行人工辅助理解可以不仅仅增加一条配对数据,而增加一组或多组配对数据,从而快速增加数据库的配对数据量,提升机器的理解能力。
另外,通过人工辅助理解还可以纠正训练数据库中X语言信息与Y语言信息之间错误的对应关系。例如,通过人工辅助理解指定某条A语言信息(自然表达)对应某条Y语言信息,取代之前与该条A语言信息对应的Y语言信息,或者通过人工辅助理解告知机器人某条A语言信息对应某条Y语言信息的正确率高于该条A语言信息与之前的一条Y语言信息相对应的正确率,从而由该条A语言信息转换的X语言信息与Y语言信息的对应关系被予以纠正或优化。
在自然智能的方法论中,上述对配对数据进行扩展和训练的机器学习,可以采用统计学、深度学习、概率计算、快速最优化路径搜寻等模型中的一种或者多种来实现。但模型本身可能带来微小的误差,可以称之为固有误差。对于大数据量的扩展和训练而言,这种系统固有误差所导致的错误是可见的。例如,如果扩展后的数据为500万条,利用这些配对数据进行测试得到错误率为0.2%,也就是说,扩展后的配对数据中有1万条是错误的。为了对这种固有误差进行弥补,将这些已知的不能被正确识别出的X语言信息数据及其应正确对应的Y语言信息写入对照表。每次进行自然表达的转换时,可以先将由自然表达转换的X语言信息与对照表中存储的X语言信息进行对比,如果新的X语言信息在对照表中,则可以通过对照表得到其正确对应的Y语言信息;如果新的X语言信息不在对照表中,则在训练数据库中检索与之相同或相近的X语言信息,并通过例如自信度判断等方式确定与该新的X语言信息对应的Y语言信息。由于设置了用于纠错的对照表,而数据库中对照表之外数据的准确度是经过筛选来保证的,于是可得出对已看见数据的零错误率效果。在向训练数据库增加新的配对数据(为了节省计算量和时间,通常是批量的),并进行数据扩展训练后,还可以用上述的方法对所述对照表进行相应的扩展,提高转换和识别的准确率。
机器学习模型本身带来的固有误差同样会存在于基于其它机器智能方法论的机器学习系统。同样地,采用上述的发现错误数据、将错误数据写入对照表、并且在后续的检索中先查找对照表的方式,可以解决基于其它机器智能方法论的机器学习系统或人机交互系统的所述固有误差问题,从而提高机器理解的准确率。具体地,用一定量的表达数据与意图数据的正确配对数据来对机器人进行训练;然后让机器人对这些表达数据进行理解,将理解结果与正确配对的意图数据进行对比,找到理解错误的表达数据;将所发现的这些不能被正确理解出的表达数据及与其对应的意图数据写入对照表;机器人在以后进行理解时先将所要理解的表达数据与对照表中的表达数据进行比对,如果发现该数据在对照表中,则直接通过对照表找到对应的理解结果(意图数据),如果在对照表中没有找到该表达数据,那么再在训练数据库中进行比对。这样可以弥补前述的固有误差,提高理解的准确度。对照表作为快速查找表,还可以用来存储出现概率较高的表达数据及其配对数据,这 样通过在后续的检索中先查找对照表的方式,提高整体搜索速度,从而提高机器理解的速度。可以基于统计结果设置概率阈值来对存储进入对照表的数据进行筛选。这种对照表的使用方案同样适用于基于自然智能和基于其它机器智能方法论的机器学习系统或人机交互系统。
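下面以Python草图示意上述“对照表”机制:先用已知正确配对数据找出模型不能正确转换的条目并写入对照表,检索时先查对照表、再走模型。其中代表已训练X→Y转换模型的model函数与示例数据均为假设:

```python
correction_table = {}   # 对照表:{次级语言信息(X): 正确的标准表达(Y)}

def build_correction_table(pairs, model):
    """用已知的正确配对数据测试模型,把不能被正确转换的 X 及其应正确对应的 Y 写入对照表。"""
    for x, y in pairs:
        if model(x) != y:
            correction_table[x] = y

def understand(x, model):
    """理解新表达:先查对照表,命中则直接返回;否则再交给模型/训练数据库检索。"""
    if x in correction_table:
        return correction_table[x]
    return model(x)

# 用法示例:model 仅为演示用的假设函数
model = lambda x: {"ab12cd": "271"}.get(x, "unknown")
build_correction_table([("ab12cd", "271"), ("ef34gh", "21")], model)
print(understand("ef34gh", model))   # 命中对照表,纠正为 "21"
print(understand("ab12cd", model))   # 模型本身即可正确转换,输出 "271"
```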
图6示意性地示出了根据本发明一个实施例的自然表达处理方法的流程。
在步骤S20,系统接收自然表达信息(A语言信息),如前所述,该自然表达信息可以是文本信息、语音信息、图像信息、视频信息等。
在步骤S21,判断机器人的理解能力是否成熟。其中,对于机器人理解是否成熟的判断,可以基于在一定时间区间内(根据具体应用要求设定),机器人将A语言信息转换成X语言信息,然后将X语言信息转换成Y语言信息的结果,与人工将A语言信息直接转换成Y语言信息的结果进行比较,两者相同的次数,除以比较的总次数,得到的百分比,就是机器人理解准确率。还可以采用机器人自我判断理解能力是否成熟的方式,即机器人估计其将某条或某些条A语言信息正确地转换为确定的Y语言信息的概率或准确率,我们也称之为机器人的“自信度”或“自信值”。随着人工辅助训练及机器人的自学习,机器人对于特定Y语言信息的转换自信度会不断提高。对机器人自信度或自信值的计算,是基于X语言信息与Y语音信息的对应关系来进行的。具体而言,从A语言信息通过转换或者提取得到X语言信息之后,通过诸如深度神经网络、有穷状态转换器、自动编码器解码器等一个或多个识别器/分类器来产生对Y语言信息的对数概率或相类似分数,再利用归一化的指数函数来计算出机器人自信度。
该自信度可以是对应于指定的Y语言信息(标准表达)来进行计算,其中指定的标准表达可以是单条,也可以是多条(多于一条)。例如,对于某条自然表达,其转换得到的标准表达可以是“标准表达1”、“标准表达2”或“标准表达3”,或者说,需要从某条自然表达中识别出“标准表达1”、“标准表达2”、“标准表达3”中的一个所对应的意图。
如果针对单条标准表达独立计算自信度,即,分别计算将所述自然表达转换得到的X语言信息(次级语言信息)对应到“标准表达1”、“标准表达2”或“标准表达3”的自信度,得到的结果例如为:转换为“标准表达1” 的自信度为80%、转换为“标准表达2”的自信度为40%,转换为“标准表达3”的自信度为10%。如果此时设置自信度阈值为80%,则“标准表达1”满足阈值要求;如果设置自信度阈值为90%,则三个标准表达均不满足阈值要求;如果设置自信度阈值为40%,那么有两个标准表达的自信度满足阈值要求,可以输出自信度更高的那个标准表达作为理解结果,但通常不会设定这么低的自信度阈值。在这种方案下,对于各条标准表达的自信度是独立的,因此并非自信度累加求和为100%的情况。另一方面,对于同一条自然表达还可以计算相对于其它标准表达的理解自信度,只不过对于设定话术而言,通常并不需要计算对于话术之外的标准表达的理解自信度。
如果一次对于多条标准表达计算自信度,还可以利用类似softmax函数来使得多条标准表达的自信度之和为100%,例如计算得到转换为“标准表达1”的自信度为70%、转换为“标准表达2”的自信度为20%,转换为“标准表达3”的自信度为10%,那么更有利于通过自信度阈值对自然表达的理解结果进行区分。
此外,还可以采用相对自信度,即对不同标准表达计算理解概率后,用各个概率之间的相互数值比较关系来进一步计算自信度。例如,对于“标准表达1”的理解(识别)概率为65%,对于“标准表达2”的理解(识别)概率为35%,计算得到的自信度可以是“标准表达1”为80%,“标准表达2”为20%。
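下面的Python草图演示由对数概率经归一化指数函数(softmax)计算多条标准表达的自信度,并按自信度阈值决定是否直接输出理解结果;其中logits数值与阈值均为示例假设:

```python
import numpy as np

def softmax_confidence(logits: np.ndarray) -> np.ndarray:
    """用归一化的指数函数把对数概率(或类似分数)变为总和为 100% 的自信度。"""
    z = logits - logits.max()
    p = np.exp(z)
    return p / p.sum()

logits = np.array([2.1, 0.8, 0.1])      # 识别器对“标准表达1/2/3”输出的对数概率(示例数值)
conf = softmax_confidence(logits)
print([f"{c:.1%}" for c in conf])        # 三条标准表达的自信度之和为 100%

threshold = 0.80                         # 第一自信度阈值(示例)
best = int(conf.argmax())
if conf[best] >= threshold:
    print(f"输出标准表达{best + 1},自信度 {conf[best]:.1%}")
else:
    print("自信度低于阈值:请用户确认或重述,或转人工辅助理解")
```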
前述的机器人理解准确率和机器人自信度都是衡量机器人理解成熟程度的指标。其中,对于机器人理解准确率的计算通常需要一定时间的数据积累,这是因为较多数据能更广泛地代表表达的多样性,从而更准确地反映实际应用的情况。换句话说,机器人理解准确率可以是针对较大训练数据量的统计结果,其相比较少训练数据的情况更准确地衡量机器人理解的成熟程度。而机器人自信度是针对机器人对于某条或某些条A语言信息进行理解的理解能力评估,对于较少训练数据的情况也可以比较准确地评估机器人的理解能力;自信值用来衡量机器人自身答案的可靠性。一般来说,当从用户表达获得的用于语义理解的语言象征比较模糊时,相比于语言象征更清晰的情况,机器人的自信度相对较低,这反映了用户自然表达的语义模糊或表达接近多个语义的情况。从另一个角度区分,机器人理解准确率是对特定应用的成熟度评估,而自信度则反映出机器人对自身答案的不确定性。
根据应用需要设定的机器人理解准确率或者自信度,我们称之为“机器人理解成熟阈值”。如果机器人理解准确率或者自信度低于机器人理解成熟阈值,系统则认为机器人理解尚未成熟,不会采用机器人转换结果,而仍继续采用人工转换结果Y2,以保证系统对A语言信息理解的准确与稳定。同时,系统将A语言信息通过机器自动转换的X语言信息(左侧语言),以及人工转换结果Y2(右侧语言)加入MT训练数据表中,供MT机器人自训练使用。
如果机器人理解成熟了,则在步骤S22让机器人自动将该自然表达A直接转换为标准表达Y;如果机器人理解还未成熟,则在步骤S23由机器人尝试将该自然表达A转换为标准表达Y1,同时在步骤S24由MAU座席将该自然表达A转换为标准表达Y2。
在步骤S26,若步骤S21判断机器人理解能力已经成熟,则输出由机器人自动转换的结果Y;否则,输出有MAU座席人工转换的结果Y2。
可选地,在步骤S25,对自然表达A、机器人尝试转换的结果Y1、MAU座席人工转换的结果Y2进行如下的后续处理:将A自动转换成X语言信息(左侧语言)连同Y2(右侧语言),作为一对新的配对数据放入MT训练数据表中;将Y1和Y2进行比较,用作“判断机器人理解是否成熟”的统计数据。可选地,将原始数据A保留,当未来A→X转换技术进一步发展成熟(转换准确率更高)时,更新MT训练数据表的左侧语言数据。
图7示意性地示出了根据本发明一个实施例的自然表达处理及回应方法的流程。
在图7所示的处理中,首先在步骤S30接收自然表达A。
然后在步骤S31判断是否能够通过机器转换将自然表达A转换为标准表达Y。该步骤等同于图6中步骤S21。类似于图6的处理,当在步骤S31判断不能通过机器转换得到所需的标准表达时,在步骤S32进行人工转换处理。
在实际应用中,可能存在即使通过人工处理仍不能理解所识别的自然表达或者理解客户所表达的需求,这时,在步骤S33作出提示客户重新输入的回应,然后处理回到步骤S30,接收客户再次输入的自然表达信息A。“提示客户重新输入的回应”可以是,例如,语音提示“不好意思,请您再讲一遍您的需求”,“请您讲慢一些”;文字提示“不好意思,请您写具体些”;或者图像提示等。
在步骤S34输出机器转换或人工转换的标准表达。在步骤S35查询与该标准表达匹配的标准回应。标准回应可以是预先存储在数据库中的固定数据,也可以是预先在数据库中存储标准回应的基础数据,然后经系统运行,将基础数据与个案变量参数合成而生成标准回应。在一个实施例中,设置标准回应ID来作为回应数据的主键,并在数据库中设置标准表达(Y语言信息)的需求代码与标准回应ID的对应关系表,从而将标准表达(Y语言信息)的需求代码与回应数据相关联。以下的表1~表3分别示意性示出了表达数据表、表达回应关系表和回应数据表的例子。可选地,标准表达与标准回应ID可以是多对一的关系,如表4所示。此外,在其它实施例中,由于标准表达(Y语言信息)的需求代码本身是编码化的,也可以直接用标准表达(Y语言信息)的需求代码作为回应数据的主键。
表1:表达数据表(原文为图像,内容从略)
表2:表达回应关系表(原文为图像,内容从略)
表3:回应数据表(原文为图像,内容从略)
表4:标准表达与标准回应ID的多对一关系表(原文为图像,内容从略)
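结合上述表1~表4所描述的关联关系,下面给出一个用字典模拟“标准表达需求代码→标准回应ID→回应数据”两级查询的Python草图,其中代码与回应内容均为示例假设:

```python
expression_to_response_id = {"21": "R001",    # 信用卡挂失
                             "271": "R002",   # 增加信用额度
                             "252": "R003"}   # 信用卡部分还款
response_data = {"R001": {"type": "voice",   "content": "已为您办理信用卡挂失。"},
                 "R002": {"type": "voice",   "content": "请问您希望将额度调整为多少?"},
                 "R003": {"type": "program", "content": "run_repayment(amount)"}}

def respond(standard_expression: str):
    """标准表达(需求代码)先映射到标准回应ID,再取回应数据;查不到时返回 None 交人工匹配。"""
    rid = expression_to_response_id.get(standard_expression)
    return response_data.get(rid) if rid else None

print(respond("271"))
print(respond("999"))   # None,转人工匹配回应
```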
如前所述,标准表达可以包括与自然表达相关的信息,例如,表达类型,语言类型,方言类型,等等。例如,来自客户的自然表达为语音“收到了”,通过转换后的标准表达查询得到标准回应为语音“好,知道了,谢谢!”;还例如,来自客户的自然表达为图像“转账失败页面截屏”,通过转换后的标准 表达查询得到标准回应为视频“转账纠错简易教程”。
如果数据库中没有与所述标准表达匹配的标准回应,则可以在步骤S36通过人工匹配与之相应的回应。人工匹配可以通过输入或选择标准回应ID来将标准表达与该标准回应ID相关联,也可以直接将标准表达与回应数据相关联,还可以建立新的回应数据。找不到标准回应的原因可能是该标准表达是通过人工新添加的,也可能是因为没有匹配到相同类型的标准回应。
然后,在步骤S37输出机器匹配或者人工匹配的回应。可以根据不同的信息类型来调用或者生成回应的内容。例如,对于语音回应,可以回放真人录音或者输出通过语音合成(Text To Speech,TTS)的语音;对于用户数字化操作,例如,电话按键顺序组合“2-5-1000”,通过程序运行完成“信用卡还款1000元”的操作。
而对于例如,“转5000块给我妈”的文字信息,需要通过运行程序进行“转账5000元给X女士”的操作,但是系统可能并不预先掌握“X女士”的账户信息,一方面可能需要人工添加该账户信息以实现标准表达的转换,另一方面,即使实现了标准表达的转换,也可能查询不到对应的标准回应,而需要人工作出回应处理。这时,会生成新的回应数据(如操作程序),也会手动或者自动为该回应数据分配一个新的标准回应ID,并将该标准回应ID与上述转换的标准表达相关联。于是,在实现对于客户的自然表达回应的同时,可以实现人工辅助理解和训练,更新表达—回应数据库。
根据本发明实施例的自然表达处理和应答方法,可以利用标准表达快速指向回应,从而使得客户无需再花长时间遍历复杂的常规功能菜单来寻找自己所需的自助服务。
另一方面,与传统的应答方式不同,人工操作主要限于后台的“决策”工作,包括确定标准表达(Y语言信息)需求代码,选择回应(或回应ID)或者生成回应操作等,但不需要在前台通过通话或者文字输入(输入标准表达(Y语言信息)需求参数除外)等方式来与客户直接进行交流,即前述的静默座席模式。从而可以大量节省人力资源,大幅提升工作效率。此外,系统对客户提供的标准化回应,相对于人工座席直接对客户提供的传统的自由式回应,不受人工座席的情绪、声腺、口音、业务熟练度等诸多因素影响,更能保证客户体验的稳定性。
再者,通过系统(机器人)的自动学习、训练及人工辅助理解,可以建立经转换的自然表达(X语言信息)—标准表达—标准回应数据库,逐渐实现系统自动理解和回应。并且该数据库中的X语言信息数据还可以具有信息颗粒度细、业务范畴窄、数据保真度高等优点,从而降低机器人训练难度,缩短机器人智能的成熟周期。
根据本发明的实施例,可以通过设置机器人自信度阈值来对人机交互的流程进行控制。例如,设置第一自信度阈值作为判断机器人理解力是否成熟的标准。还可以通过设定自信度的其它阈值来对人机交互进行智能控制。例如,设置第二自信度阈值,当机器人自信度低于第一自信度阈值但不低于第二自信度阈值,则机器人请用户确认其输入的自然表达的含义是否为某条确定的标准表达。又例如,再设置第三自信度阈值,当机器人自信度低于第二自信度阈值但不低于第三自信度阈值,则机器人请用户重复自然表达的输入,当机器人自信度低于第三自信度阈值,则机器人自动切换到人工辅助理解。
例如,图15示出了用于身份认证的智能人机交互的例子。在图15所示的例子中,机器人询问用户的问题是“请问是余先生本人吗?”,作为回复,用户输入的自然表达为一段语音,其所表达的含义可以是“是”、“不是”、“没听清”或者“没兴趣或不愿接听”四种含义中的一种。
在图15所示的例子中,设置第一自信度阈值为80%,第二自信度阈值为60%。当机器人对于用户所述一段语音回复(第一次语音回答)的含义理解的自信度不低于80%时,也就是说,机器人将该段语音回复理解为“是”、“不是”、“没听清”或者“没兴趣或不愿接听”中的一种的自信度不低于80%。机器人将用户语音对应的含义理解为“是”、“不是”、“没听清”或者“没兴趣或不愿接听”中自信度不低于80%的一种。通常在(对应于自然表达的含义或者意图的)标准表达的区分设计较好的情况下,不会发生对多于一个标准表达的自信度都超过50%的情况,但由于训练数据偏差等原因也会产生高自信度偏差的情况,这时如果出现多于一个标准表达的自信度超过阈值,可以自动选择自信度最高的那个标准表达作为理解结果。也可以采用一起计算对多于一个的标准表达的自信度,并使得对各个标准表达的自信度之和为100%,这样就不会出现超过一个的标准表达的自信度高于50%的情况。
当机器人对于用户第一次语音回答的含义理解的自信度不高于80%但不低于60%时,也就是说,例如机器人理解到用户回答的含义可能是“不是”,但是又不是很确定(60%≤CL<80%,CL表示自信度),于是机器人请用户确认其回答的含义是否为“不是”,用户随后再次输入语音回答,机器人遂对于用户的第二次语音回答进行理解,如果机器人对于用户第二次语音回答的含义理解的自信度不低于80%时,机器人获得用户对其第一次语音回答的含义的确认(“是”),因而根据用户的确认结果将用户语音对应的含义理解为“是”、“不是”、“没听清”或者“没兴趣或不愿接听”中的一种,而如果机器人对于用户第二次语音回答的含义理解的自信度达不到80%或者用户的确认结果为“不是”,那么机器人将求助于人工辅助理解或人工应答。
当机器人对于用户第一次语音回答的含义理解的自信度达不到60%时,或者说,机器人不能理解用户回答的含义,于是机器人请用户再次进行回答,用户随后再次输入语音回答,机器人遂对于用户的这一次的语音回答进行理解,如果机器人对于用户第二次语音回答的含义理解的自信度不低于80%时,机器人将用户语音对应的含义理解为“是”、“不是”、“没听清”或者“没兴趣或不愿接听”中的一种,而如果机器人对于用户这一次(第二次)语音回答的含义理解的自信度仍达不到80%,那么机器人将求助于人工辅助理解或人工应答。
考虑到交互次数对于用户体验的影响,上面对于图15所示例子的描述中只设定用户通过两个轮次进行回答。可选地,也可以增加用户回答的轮次,例如,在第二轮次,如果对于用户的第二次语音回答,机器人的理解自信度达不到80%但不低于60%,那么仍然可以请用户进行第三次语音回答,以确认机器人对于第二次语音回答的含义的理解是否正确。
另外,还可以设置第三自信度阈值,例如,40%,当机器人对于用户第一次语音回答的含义的理解自信度低于40%时,自动转人工辅助理解处理,这样可以减少交互次数,改善用户体验。
当然,上述的第一自信度阈值、第二自信度阈值和第三自信度阈值等等,都是可以根据需要设定的,例如第一自信度阈值是90%,第二自信度阈值是50%等等。
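下面的Python草图示意按第一/第二/第三自信度阈值对多轮交互流程进行逻辑控制的一种实现方式;understand、confirm、ask_repeat、to_human等接口均为说明用的假设:

```python
T1, T2, T3 = 0.80, 0.60, 0.40   # 第一/第二/第三自信度阈值(沿用上文示例数值)

def dialog_round(expr, understand, confirm, ask_repeat, to_human):
    """根据理解自信度产生逻辑输出,对交互流程做逻辑控制。"""
    y, cl = understand(expr)
    if cl >= T1:
        return y                                      # 自信度足够,直接采纳理解结果
    if cl >= T2:
        return y if confirm(y) else to_human(expr)    # 请用户确认含义
    if cl >= T3:
        expr2 = ask_repeat()                          # 请用户重述一遍
        y2, cl2 = understand(expr2)
        return y2 if cl2 >= T1 else to_human(expr2)
    return to_human(expr)                             # 自信度过低,直接转人工辅助理解

# 用法示例:各接口用简单的假设函数代替
result = dialog_round("用户第一次语音",
                      understand=lambda e: ("是", 0.72),
                      confirm=lambda y: True,
                      ask_repeat=lambda: "用户第二次语音",
                      to_human=lambda e: "转人工处理")
print(result)   # 自信度 72% 落在 60%~80% 区间,经用户确认后输出“是”
```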
上述基于理解自信度进行多轮交互的方式可以被视为通过用户表达输 入对机器人进行反馈控制,也就是说,根据机器人对于用户所输入的表达的理解自信度来产生逻辑输出,并通过该逻辑输出对交互过程进行逻辑控制。这种方案直接的效果是可以大大减少人工辅助理解或者人工应答的工作量。例如,对于用户对某含义的表达的理解,机器人的实际正确率是60%,那么用户就同一含义表达两次以后,机器人理解正确一次的概率理论上为100%-(100%-60%)×(100%-60%)=84%。或者,对于用户对某含义的表达的理解,机器人的自信度不低于80%的概率是60%,那么用户就同一含义表达两次以后,机器人理解正确一次的概率理论上为100%-(100%-60%)×(100%-60%)=84%。实际上,通常在提示用户“不好意思,没听清,麻烦您再说一下?”后,用户的第二次语音输入的含义清晰度和/或发音清晰度都会有提高,因而机器人通常对于就用户对相同含义的第二次表达的理解正确率也会提高。这样通过自动提示用户重复其含义,可以增加机器人的理解准确率。进一步,再自动提示用户确认其含义,由于对于用户确认所用的表达通常是“是”、“对”、“不是”、“不对”等等简单的表达,因而机器人对于用户确认其含义的表达的理解正确率或者理解自信度通常比较高,例如在90%-100%,那么通过多轮互动将用户重复表达其含义和用户确认其表达的含义相结合,可以基本上实现机器人的自动应答,从而大大减少转入人工处理(包括进行人工辅助理解的静默座席或者人工客服)的用户表达的量。
另一方面,对于用户通过表达输入而确认含义的情况,可以将从用户第一次输入的表达转换得到的X语言信息与该表达对应的预设含义(以Y语言信息表示的标准表达)作为配对数据存入训练数据库,并利用前述的方式针对该配对数据进行训练。通过这种方式,仅通过用户的表达输入,就可以产生扩展训练数据库的配对数据,而不需要通过人工辅助对于用户表达的含义进行理解或确认。也就是说,可以实现在服务端的智能数据积累和机器人(引擎)自动学习。
图15例子所示的人机交互方案所涉及的用户表达可以是语音也可以是其它表达方式。如果是人机语音交互方案,可以通过IVR来控制实现。
上面阐述了基于自然智能的自然表达处理原理和方法及交互方法,接下来描述基于自然智能的人机交互的精准信息萃取。需要强调的是,根据本发明实施例的人机交互中的精准信息萃取可以完全涵盖上述基于自然智能的 自然表达处理方法,也可以说,在上述方法的基础上,通过增加或者调整部分步骤来实现精准信息萃取。
“精确信息萃取方法”,通俗来讲,就是从一条自然表达中获取多个意图。如前所述,自然表达并不限于自然语言,还可以是静态图像、动态图像等等。结合前面的自然智能体系,也可以这样理解,就是从一条自然表达中分别得到与多个意图对应的Y语言信息。具体来讲,机器人先将A语言信息转换成X语言信息,然后从X语言信息分析和计算与预设意图对应的部分,之后将这些部分分别转换为Y语言信息。也就是说,与前述的自然表达处理过程相比,在X语言信息层进行了关键信息的甄别和提取,采用精确转换或局部转换而非整体转换。这种方式可以提高机器人理解的准确度,特别对于含有多个体现意图的关键信息的自然表达而言,精准转换比整体转换的准确度更高。利用“精确信息萃取方法”获取自然表达中的意图之后,填入与意图分类相对应的槽(slot)中,实现填槽(slot filling)处理。
例如,前述的代码“112124”(Y语言信息)中,各位代码分别表示:1(飞机票),1(机票改签),2(延后),1(折扣票),2(淡季),4(亚太区),前面三位代码对应的是操作,后面三位代码对应的是对象。还原成完整的意图,就是延后改签亚太区淡季折扣(机)票。假设对需求的关键信息进行分类,包括槽1——操作,槽2——对象,那么如果填入槽,则可以对应在槽1填入“延后改签机票”,在槽2填入“亚太区淡季折扣票”。对于槽的设置,类似于前面的对Y语言信息的编码,可以根据需要增减或调整。例如,将槽填充限定为涉及飞机票的票务操作,那么在槽1中填入“延后改签”即可,又例如,对票务操作的对象进行细化,槽2对应机票折扣类型,槽3对应目的地,则在槽2中填入“淡季折扣票”,在槽3中填入“亚太区”。可见,槽填充是根据需求细化分类提取意图,并按照分类进行存放的过程。如前面类似,如果用代码表示细分后的意图,可以在槽1中填入“12”,在槽2中填入“12”,在槽3中填入“4”。
进一步,由于具体意图可能是多样的,比如,前述槽3对应的机票目的地,可能包括成百上千个国际机场以及成千上万个国内机场,如果是这种情况,还可以用字母组合(如国际航空运输协会(IATA,International Air Transport Association)制定的机场三字代码)或者字母与数字的组合所形成 的代码来指示具体的机场名称。不过这种情况下,这些代码有时并不利于人工辅助人员记忆并输入,人工辅助人员可以直接将具体目的地(机场名称)以一种系统能自动将之与代码对应的表述填入槽中。例如,直接填城市名或者城市代码,例如,上海、Shanghai、沪,等等。
在基于AI的自然表达处理方法论之下,通过意图获取或信息获取来进行“槽填充”还要运用语法模型、语义模型来从自然表达转化得到的文本得到所需要提取的关键信息。在基于NI的自然表达处理方法论之下,自然表达到标准表达的转换的实质上是对自然表达的理解,因此仍然不需要通过语法模型和语义模型,而是基于前述的机器翻译原理将自然表达与标准表达中的所要提取的意图部分对应起来。换句话说,在信息获取(萃取)的过程中,仅对与所需提取的意图有关的信息进行标准表达(Y语言信息)的转换。例如,对于自然表达“我明晚从上海飞北京,回家”,如果对航班预订进行信息萃取和填槽操作,且假设槽1对应“出发地”、槽2对应“目的地”、槽3对应“日期”,则槽1填入“上海”、槽2填入“北京”、槽3填入“明天”或者系统自动确定的与“明天”对应的具体日期,而其它信息,诸如订票人信息、接机目的地信息等,不会在此次信息萃取和填槽操作中使用。
从转换过程上,机器人先将A语言信息转换成X语言信息,然后从X语言信息提取与所要填入槽的Y语言信息对应的部分,之后转换为Y语言信息,并填入槽。也就是说,与前述的自然表达处理过程相比,在X语言信息层进行了关键信息的甄别和提取,采用精确转换而非整体转换。这种方式可以提高机器人理解的准确度,特别对于含有多个体现意图的关键信息的自然表达而言,精确转换比整体转换的准确度更高。
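下面用一个极简的Python草图示意“仅对与所需意图有关的部分做转换并填槽”的结果形态;为便于演示,这里用关键词映射近似代替作用于X语言信息的萃取模型,槽名与槽有效值(PVG、PEK等)沿用上文示例,均为说明性假设:

```python
slots_schema = {"FROM": None, "TO": None, "DATE": None}   # “订机票”FAQ 的槽

def extract(expression: str) -> dict:
    """演示用的近似萃取:真实系统作用于由自然表达转换的次级语言信息(X语言),
    并只对与各槽对应的部分做标准表达(Y语言)转换。"""
    airports = {"上海": "PVG", "北京": "PEK", "香港": "HKG"}   # 槽有效值沿用上文示例
    filled = dict(slots_schema)
    for city, code in airports.items():
        if f"从{city}" in expression:
            filled["FROM"] = code          # “从某地”视为出发地
        elif city in expression:
            filled["TO"] = code            # 其余出现的城市视为目的地
    if "明晚" in expression or "明天" in expression:
        filled["DATE"] = "明天"
    return filled

print(extract("我明晚从上海飞北京,回家"))
# {'FROM': 'PVG', 'TO': 'PEK', 'DATE': '明天'}
```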
图8示意性地示出了根据本发明一个实施例的基于自然智能的人机交互系统的信息萃取和槽填充流程。
在步骤S40,系统接收自然表达信息(A语言信息),如前所述,该自然表达信息可以是文本信息、语音信息、图像信息、视频信息等。
在步骤S41,判断机器人的精确信息萃取能力(或者简单称之为“意图获取能力”)是否成熟。其中,对于机器人精确信息萃取能力是否成熟的判断,是基于在一定时间区间内(根据具体应用要求设定),机器人从A语言信息转换成X语言信息,然后从X语言信息提取与所要填入槽的Y语言信 息对应的部分,转换为Y语言信息,与人工从A语言信息直接获得的需要填入槽的Y语言信息进行比较,两者相同的次数,除以比较的总次数,得到的百分比,就是机器人精确信息萃取准确率或意图获取准确率。
类似前述,也可以采用机器人自我判断理解能力是否成熟的方式,即机器人估计其基于某条或某些条A语言信息对某个意图进行正确地信息获取的概率,我们称之为机器人的“精确信息萃取自信度”或“意图获取自信度”(也可以通俗地称之为“填槽自信度”)。随着人工辅助训练及机器人的自学习,机器人对于特定意图的意图获取自信度会不断提高。机器人对于意图获取的正确概率的估计,可以基于其当前处理的A语言信息与MT训练数据表中已有的A语言信息之间在X元素层面的比较。具体地,基于X语言信息(次级语言信息)与Y语言信息(标准表达)的对应关系来计算所述自信度,通过深度神经网络、有穷状态转换器、自动编码器解码器中的一个或多个来产生对Y语言信息的对数概率或相类似分数,再利用归一化的指数函数来计算出所述自信度。
根据应用需要设定的机器人意图获取准确率或者意图获取自信度,称之为“机器人意图获取成熟阈值”。如果机器人意图获取准确率或者意图获取自信度低于机器人意图获取成熟阈值,系统则认为机器人意图获取能力尚未成熟,不会采用机器人意图获取结果YF,而仍继续采用人工意图获取结果YF2,以保证系统对A语言信息的意图获取的准确与稳定。同时,系统将A语言信息通过机器自动转换的X语言信息(左侧语言),以及人工意图获取结果YF2(右侧语言)加入MT训练数据表(即训练数据库)中,供MT机器人自训练使用。
如果机器人精确信息萃取能力成熟了,则在步骤S42让机器人自动进行意图获取和填槽操作,从A语言信息转换成X语言信息,然后从X语言信息提取与所要填入槽的Y语言信息对应的部分,转换为Y语言信息,并填入槽;如果机器人精确信息萃取能力还未成熟,则在步骤S43由机器人尝试从自然表达A转换所要提取的标准表达YF1并填入槽,同时在步骤S44由MAU座席从A语言信息直接获得需要填入槽的Y语言信息YF2并填入槽。
可选地,在步骤S45,对自然表达A、机器人尝试从自然表达A转换所要提取的标准表达的提取转换结果YF1、MAU座席人工提取转换的结果YF2 进行如下的后续处理:将A自动转换成X语言信息(左侧语言)连同YF2(右侧语言),作为一对新的配对数据放入MT训练数据表中;将YF1和YF2进行比较,用作“判断机器人精确信息萃取能力是否成熟”的统计数据。可选地,将原始数据A保留,当未来A→X转换技术进一步发展成熟(转换准确率更高)时,更新MT训练数据表的左侧语言数据。如果对提取转换得到的YF1和YF2直接进行上述后续处理,那么在步骤S43实际上并不需要将YF1填入槽。也可以对将YF1和YF2分别填槽后的填槽数据作为训练数据或者统计数据。
如果A语言信息是文字,则如前所述,将文字本身或者字符等作为X元素而得到或者转换为X语言信息,进行后续操作。
图9进一步示例性地示出了“订机票”问询项下的自然表达意图获取和槽填充处理流程。如图9所示,处理开始后,在步骤ES11接收自然表达“我明晚从上海飞北京,回家”,如前所述,该表达可以是语音、文字等形式。在步骤ES12判断所述表达所体现的意图是否为“订机票”问询项,如果判断不是“订机票”问询项,则对用户提示当前为“订机票”问询项或者请求用户确定当前需求为“订机票”,然后请用户重新输入表达。步骤ES12也可以在处理开始时用户输入表达之前进行,即先提示用户当前的问询项。然后在步骤ES13进一步判断用户是否在给自己订机票还是给别人定机票,用户可以输入“我爸妈”、“我太太”、“董事长”等等,如果机器人能够识别出这些表达所对应的具体的人并具有这些人的信息,那么可以自动将订票人信息填入对应槽内。如果ES13判断用户是给自己订机票,那么机器人进一步在步骤ES15提取与“出发地”有关的信息、与“目的地”有关的信息以及与“日期”有关的信息,该信息提取与前述的从X语言信息到Y语言信息的转换在原理和基本方法上是一致的,只不过是仅将与“出发地”有关的信息、与“目的地”有关的信息以及与“日期”有关的信息精确地提取出来转换成Y语言信息。机器人可以在步骤ES15的意图获取和填槽操作后,询问或者自主判断用户有无其它意图,例如,在本例中,用户的表达中还包括了“回家”,因此,机器人发现该“回家”的表达之后还可以进行后续的处理,例如,询问或者自主提示用户是否需要接机服务,并可以将用户的家庭住址(如果机器人的知识库中包括用户家庭住址的数据)填入“接机目的地”槽中。 所需信息填入槽中后,机器人可进行对应的应答操作,例如,显示或者语音告知客户可能符合客户需求的航班信息,等等。
值得说明的是,图9中的多个步骤可以包括前述的(例如图7或图8所示的)将自然表达转换为标准表达或者基于自然表达进行意图获取和填槽的过程。例如,从自然表达判断问询项(ES12),确定订票人(ES13、ES14),确定“出发地”、“目的地”和“日期”(ES15),确定其它意图(ES16),进行后续填槽处理(ES17)。
这些信息的精确提取不必通过一次自然表达到标准表达的转换完成。如果是一次提取,实际上包括7个槽,即,“问询项”槽——填入“订机票”,“订票人”槽——填入“本人”,“出发地”槽——填入“上海”,“目的地”槽——填入“北京”,“日期”槽——填入“明天”或对应的日期,“其它意图”槽——填入“接机”,“接机目的地”槽——填入“家”或具体家庭地址。这样会大大增加机器人理解所需的运算量和数据流,也会增加人工辅助理解的操作复杂度。因此,可以通过多次理解来实现对于各个槽对应信息的提取和对各个槽的填充。例如,第一次理解确认“问询项”(即后文所述的常见问题“FAQ”)是“订机票”,这样可以将A/X→Y的数据库缩小到“问询项”为“订机票”所对应的范围内,于是大大降低机器人理解和训练所需的数据量和运算量,大大加快迭代运算的收敛速度。也可以采用默认的方式或者用户选择的方式来确定“问询项”。同样地,可以通过再次理解或者默认或者用户选择的方式来确定用户给自己订机票。也可以通过多轮会话的方式来逐步获得各个槽对应的信息。例如,机器人先通过文字、语言或图像等方式询问用户所需的服务——“订机票”,然后询问用户为谁订票,之后再询问用户“出发地”、“目的地”、“日期”、“优选时段”、“价位”等信息,以及询问有无其它需求(诸如接机等)。
还可以设置“问询项”的上位槽——“应用场景”槽,将“问询项”进行归类。如果问询项“订机票”所归入的“应用场景”可以是“出行”。上位槽对应的可以是上一层次的意图。
不仅可以在一次填槽操作中实现针对一个“问询项”的多槽填入,还可以在一次填槽操作中实现跨多个“问询项”的多槽操作。例如,对于表达“我想明晚7点半从上海回京在大董请两个外国朋友吃饭”,就可以包括两个应用场景——“出行”和“餐饮”的“问询项”,分别是“订机票”和“订堂餐”,于是,在一次填槽操作中,除了实现“订机票”下的“出发地”、“目的地”、“日期”、“时间”的填槽操作,还同时实现“订堂餐”下的“餐厅名”、“就餐人数”的填槽操作。这种一次理解填入多个槽的表达处理方式可以降低对话次数,节省用户时间,极大地提升用户体验。具有这种填槽功能的智能人机交互系统可以作为虚拟个人助理(Virtual Personal Assistant,VPA)实现跨平台的同步对接,例如,对于“出行”应用场景,与携程网对接,对于“餐饮”应用场景,与大众点评(美团)对接。
如果一次理解所需要填入的槽数过多,那么运算所需数据量和运算复杂度会有很大增加,因此,具体应用中可以在用户体验和处理效率/资源耗费之间寻求预设平衡或者动态平衡。
基于自然智能的自然表达处理方法、人机交互方法和精确信息萃取方法尤其可以应用于诸如前述的交互式语音应答IVR或互联网呼叫中心系统ICCS的客户服务系统或其他远程客户联络系统(如:电话销售系统、网络销售系统、VTM智能远程终端机……)。如前所述,在这类应用中,对机器翻译的要求并非逐字的确切含义,而是需要将客户的自然表达转换为系统能够理解的信息,从而为客户提供与其表达对应的应答。也就是说,这里的机器翻译侧重于对人类语言背后的实质涵义的理解,从而以计算机程序更易于处理的形式表示从自然表达中所“理解”到的客户实际意图或需求。
图10示意性示出了根据本发明实施例的一种智能人机交互系统。如图10所示,该智能人机交互系统包括智能应答设备1(相当于服务器端),以及呼叫设备2(相当于客户端),客户8通过呼叫设备2与智能应答设备1通信,MAU人工座席9(系统服务人员)对智能应答设备1进行人工操作。其中,智能应答设备1包括对话网关11,中央控制器12,MAU工作站13和机器人14。可选地,智能应答设备1还包括训练器15。
客户8指的是机构远程销售和远程服务的对象。远程销售通常指的是机构通过自己专属的电话或互联网通道,以“呼出”的形式主动联系客户,试图对其推销自己的产品与服务。远程服务通常指的是机构的客户通过机构专属的电话或互联网通道,以“呼入”的形式主动联系机构,询问或使用机构的产品与服务。
呼叫设备2可以是机构为了对客户8进行远程销售(呼出业务)和向客户提供远程服务(呼入业务)而设立的专属电话信道或互联网信道。电话通道呼叫系统例如自动呼叫分配系统(Automatic Call Distribution,ACD),是机构通过后台的自动业务系统(例如,基于电话按键技术的传统IVR系统,或者基于智能语音技术的新型VP(Voice Portal)语音门户系统)和人工座席,与客户8以语音形式进行交互的对话通道。
互联网通道呼叫系统例如基于即时通讯(Instant Messaging,IM)技术的互联网呼叫中心系统(Internet Call Center,ICC),是机构通过后台的客户自助系统(例如,自然语言处理系统(Natural Language Processing,NLP))和人工座席,与客户8以文字、语音、图像、视频等形式,进行交互的对话通道。
智能应答设备1使得机构可以管控其后台的自动业务系统和人工座席,以及与客户8之间以文字、语音、图像、视频等多媒体形式进行的对话,从而实现机构与客户间的标准化和自动化交互对话。
对话网关11在智能应答设备1中担当“前置门户”的角色,主要职能包括:经由呼叫设备2接收来自客户8的无规则自然表达(以文字、语音、图像、视频)和规则化非自然表达(如以电话键盘按键等形式),发送给中央控制器12进行后续处理;接收来自中央控制器12的指令,实现对客户8表达的回应(以文字、语音、图像、视频、程序等形式)。
如图11所示,对话网关11包括表达接收器111,身份认证器112,回应数据库113和回应生成器114。
表达接收器111通过呼叫设备2接收来自客户8的表达。该表达可以是前述的各种无规则自然表达和规则化非自然表达。
可选地,在表达接收器111之前设置身份认证器112。该身份认证器112可以在对话的初始阶段识别和验证客户8的身份。可采用传统的“密码输入”技术(如:电话按键输入密码、键盘输入网站登录密码,等等);也可采用“密语(Pass-phrase)+声纹(Voice-print)识别”技术;也可同时混合采用以上两种技术。
设置身份认证器112,并采用“密语+声纹识别”的客户身份识别和验证方法,可以提升客户体验,使得客户无需再记忆多个不同密码;降低在“密 码输入”传统方法中密码被盗的安全风险;此外,将“密语+声纹识别”方法和“密码输入”传统方法混合使用,既能被市场广泛接受,更能提升客户身份识别和验证的安全性。
回应数据库113存储用以回应客户的回应数据。类似于以上表格中举例示出的,该数据可以包括以下多种类型:
文字:预编的文字,例如,网银FAQ(常见问答)中的文字答案。
语音:预录的真人录音,或没有变量的TTS语音合成录音,例如:“您好!这里是未来银行。请问有什么我可以帮到您的?”。
图像:预制的图像,例如,北京地铁网络图。也包括非视频动画,例如:银行给客户介绍如何在网银系统进行国际汇款操作的GIF文件、FLASH文件,等等。
视频:预制的视频,例如,电熨斗供应商给客户演示如何使用它们的新产品。
程序:预编的一系列指令,例如,在客户以说话表达“我想看中国合伙人”,云端智能电视机将按照客户的要求进行操作回应客户:首先自动打开电视机,然后从云服务器端自动下载并缓存《中国合伙人》这部电影,最后开始播放。
模板:可填变量的文字、语音、图像、程序模板。
回应生成器114接收中央控制器12指令,通过调用和/或运行回应数据库113中的数据来生成对客户8表达的回应。具体而言,可以按照指令中的标准回应ID,从回应数据库113中查询调用回应数据,或显示文字、图像,或播放语音、视频,或执行程序;也可以依指令回应数据库113中调用模板,并将指令中传送的变量参数予以填充,或播放实时产生的TTS语音合成(例如,“您已成功还款信用卡5000元。”其中,“5000元”为指令中的变量),或显示一段文字,或显示一幅实时产生的图片或动画,或执行一段程序。
可选地,中央控制器12可以对回应数据库113中的数据进行维护和更新,包括回应数据、标准回应ID等。
中央控制器12接收来自表达接收器111的客户需求表达信息(包括:无规则自然表达和规则化非自然表达),并与机器人14以及经由MAU工作站13与MAU人工座席9协同工作,从而将客户的无规则自然表达信息依 前述的方法转换为标准表达或者提取转换为所需要的标准表达并填入对应槽,并根据标准表达的转换结果或者意图获取结果来确定与之对应的标准回应ID,然后将该标准回应ID发送给回应生成器114。可选地,中央控制器12可以更新MT训练数据表中的数据。
机器人14是实施上述机器智能技术的应用机器人。机器人14可以实施对文字信息、语音信息、图像信息、视频信息等自然表达(A语言信息)的转换,得到标准表达(Y语言信息)和/或前述的意图获取和填槽操作。如前所述,当机器人14的理解能力或精确信息萃取能力达到一定水平时,例如,在某个特定范畴的判断理解能力或精确信息萃取能力成熟时,其可以独立进行A→X→Y的转换或填槽操作,而无需人工座席的辅助。MT训练数据表可以设置在机器人14内,也可以是外置数据库,在其中存储的标准表达数据或填槽结果数据(右侧语言)的需求代码可以与标准回应ID相关联。该数据库可以由中央控制器12更新。另外,用于文字翻译、语音识别、图像识别、视频处理等的数据库可以是外置数据库,也可以设置在机器人14内。
MAU工作站13是智能应答设备1与MAU人工座席9的接口。MAU工作站13将经识别的自然表达或者客户原始表达呈现给MAU人工座席9。MAU人工座席9通过MAU工作站13输入或者选择标准表达、或者输入或者选择填槽内容,然后MAU工作站13将该标准表达或填槽内容发送给中央控制器12。可选地,如果需要人工辅助确定回应,则MAU人工座席9通过MAU工作站13输入或者选择回应(或者标准回应ID)。
可选地,在智能应答设备1中还可以包括训练器15。该训练器15用于训练机器人14将自然表达转换为标准表达的能力和/或从自然表达获取意图的能力。例如,训练器15利用MAU人工座席9的判断结果去训练机器人14,不断提升机器人14在各个范畴(例如,前述的业务范畴和次级业务范畴等)的机器人理解正确率或意图理解正确率。针对每个范畴,在机器人理解正确率达不到“机器人理解成熟阈值”的情况下,训练器15将MAU人工座席9的标准表达转换结果与机器人14的标准表达转换结果进行比较处理,如结果相同,相应增加该范畴“机器人判断准确次数”和“机器人判断次数”各一次,类似地,在机器人意图获取正确率达不到“机器人意图获取成熟阈值”的情况下,训练器15将MAU人工座席9的意图获取结果与机器人14的意图获取结果进行比较处理,如结果相同,相应增加该范畴“机器人意图获取准确次数”和“机器人意图获取次数”各一次;否则,将人工转换结果或意图获取结果(意图获取结果也可以用人工填槽结果表示)添加进MT训练数据表,作为新的机器人训练数据。训练器15也可以指示机器人14进行前述的“自学习”和训练。
此外,训练器15也可以用于对机器人14进行诸如文字翻译、语音识别、图像识别、视频处理等机器智能技术的训练。训练器15也可以对于MT训练数据表、用于文字翻译、语音识别、图像识别、视频处理的数据库进行维护和更新。
可选地,训练器15也可以与中央控制器12集成在一起。
可选地,回应生成器114和回应数据库113可以独立于对话网关11,也可以集成在中央控制器12中。
智能应答设备1可以实现前述的自然表达处理和应答方法。例如,对话网关11通过表达接收器111从呼叫设备2接收来自客户8的自然表达信息,并将其发送到中央控制器12;中央控制器12指示机器人11将该自然表达信息识别为计算机可处理的某种形式的语言信息(例如X语言信息)及相关的表达信息,然后指示机器人11将该语言信息及相关的表达信息转换为标准表达;如果机器人11的理解力不够成熟或者未实现语料匹配,而不能完成标准表达的转换,则中央控制器12指示MAU工作站13提示MAU人工座席9进行标准表达的人工转换;MAU人工座席9将机器人11识别的语言信息及相关表达信息转换为标准表达,并通过MAU工作站13输入并发送到中央控制器12,可选地,MAU人工座席9可以直接将未经识别的无规则自然表达信息转换为标准表达;中央控制器12查询表达—回应数据库,检索出与标准表达匹配的标准应答ID,如果无匹配结果,则再通过MAU工作站13提示MAU人工座席9进行标准回应的选择和输入相应的标准回应ID,可选地,MAU人工座席9也可以直接将标准表达与回应数据相关联,或者建立新的回应数据;中央控制器12指示回应生成器114调用和/或运行回应数据库113中的数据来生成对客户8表达的回应;然后,对话网关11将回应通过呼叫设备2反馈给客户8;可选地,中央控制器12根据MAU人工座席9确定或添加的标准表达或标准回应分别维护和更新MT训练数据表或回 应数据库,并且相应维护和更新表达—回应数据库。
智能应答设备1也可以实现前述的意图获取和填槽方法。例如,对话网关11通过表达接收器111从呼叫设备2接收来自客户8的自然表达信息,并将其发送到中央控制器12;中央控制器12指示机器人11将该自然表达信息识别为计算机可处理的某种形式的语言信息(例如X语言信息),然后指示机器人11从该语言信息中提取与所需标准表达对应的部分,转换为标准表达,并填入槽;如果机器人11的精确信息萃取能力不够成熟或者未实现语料匹配,而不能完成填槽,则中央控制器12指示MAU工作站13提示MAU人工座席9进行人工填槽;MAU人工座席9直接对自然表达进行理解,并根据理解结果或者通过理解得到的标准表达进行填槽操作,通过MAU工作站13输入并发送到中央控制器12;中央控制器12查询表达—回应数据库,检索出与与填槽结果对应的标准表达相匹配的标准应答ID,如果无匹配结果,则再通过MAU工作站13提示MAU人工座席9进行标准回应的选择和输入相应的标准回应ID,可选地,MAU人工座席9也可以直接将标准表达(包括填槽结果)与回应数据相关联,或者建立新的回应数据;中央控制器12指示回应生成器114调用和/或运行回应数据库113中的数据来生成对客户8表达的回应;然后,对话网关11将回应通过呼叫设备2反馈给客户8;可选地,中央控制器12根据MAU人工座席9确定或添加的标准表达(包括填槽结果)或标准回应分别维护和更新MT训练数据表或回应数据库,并且相应维护和更新表达—回应数据库。
图12A~图12P示意性地示出了根据本发明实施例的意图获取和槽填充系统的操作界面。
图12A示出了设置“FAQ”的界面。所谓“FAQ”,可以是指人机交互话术中的常见问题(也即前述的“询问项”),例如,图中,“Change Password”是修改密码,“Check Credit Balance”是查询信用卡余额,“Customer Service”是售后服务,等等。“Id”是为FAQ分配的唯一标识,用于方便地查询、输入或者选择FAQ。图12A的界面可以用来显示和设置FAQ,例如,当需要新增一个FAQ时,例如,订购机票,可以手动输入FAQ的内容描述“Flight”以及“Id”。也可以用批量上传的方式,新增和修改数据库中的FAQ数据表。图12A的界面也可以用来设置更上位范围的应用场景。可以通过应 用场景将FAQ归类。例如,将“Check Credit Balance”归入信用卡服务场景,将“Flight”归入出行服务场景。如前所述,可以通过多轮人机对话实现对一个应用场景下的多个FAQ或者多个应用场景下的多个FAQ的意图获取和填槽操作,也可以通过一次意图获取和填槽操作同时实现跨FAQ甚至跨应用场景的意图获取和多槽填充操作。
图12B的对话及响应显示界面示出了关于FAQ的初始数据形态。在图12B左侧——聊天“Chat”部分,“Customer support”表示机器人的话语,“test”表示用户的表达;在图12B右侧——引擎响应“Engine Response”部分,显示系统(引擎)理解的FAQID及FAQ。可以通过在(具有“Type a message here”标记)的输入框输入或者批量上传的方式来输入用户表达。如果之前数据库中并无用户表达数据,则用这样的方式来构建最初的用户表达数据。在如图12B所示的训练初始状态,由于没有经过训练,对应于聊天“Chat”部分的各条用户表达,如,“有没有从上海去北京的票”,“Check Credit Balance”,“Change password”,机器人的回答都是“Sorry,I don’t know”,而右侧的引擎响应“Engine Response”部分所显示的响应结果也都是缺省的“ID:Change Password”“Change Password”。图12B中所示界面下部的下拉菜单所示的“AUTO”状态表示当前系统处理采用自动方式,而未使用人工辅助。另外,图12B中所示界面左下角的“Confidence”后面的输入框中的数字“80”,表示当前设置的机器人自信度阈值为80%,也可以设置其它数字作为阈值。成熟阈值越高,对于机器人自动进行理解的准确率要求越高,通常也就需要更多的人工辅助帮助对机器人进行训练。图12B右侧下方的控件“发送”(“Send”)用于发送在其左侧的输入框中输入的消息,控键“新会话”(“New Session”)用于开启一个新的会话,在屏幕上清除左侧的数据。
在图12C所示的界面中,通过人工辅助为用户表达(“Question”)赋予对应的FAQID/FAQ。如图12C所示,由于机器人没有经过训练或者对于FAQ/FAQID的理解成熟度没有达到阈值,无法理解获得正确的FAQID,因而FAQID(界面中“Faqid”)的缺省值为“-1”。这种情况下,可以通过人工辅助填写与“Question”对应的FAQID(界面中“Expected Faqid”)。例如,在图12C中,将“ID”为322-326的“Question”所对应的“Expected Faqid”都输入或 者选择为“Flight”,即前述的订机票;将“ID”为327-329的“Question”所对应的“Expected Faqid”分别输入或者选择为“Check Credit Balance”,“Change password”和“Customer Service”。这里的“ID”是输入项的标识,“QID”是“Question”的标识,在这个界面的例子中,用户表达“question”是唯一的输入项类型,因而用“ID”作为标识,将“QID”缺省设置为“default”;也可以同时用“QID”标识“Question”,在通过批量上传的方式输入“Question”时,“QID”是很好的索引。“Timestamp”是时间戳,表示各条Question数据输入完成的时间,利用该时间可以对Question数据进行检索,以选择特定时间窗口内的数据进行操作。“CL%”是机器人对Question的理解自信度,由于还没有对机器人进行训练,因此缺省的“CL%”是“0.00”。“ResponseID”是与“Question”对应的系统应答的标识,对于不同的“Question”可以有不同的“ResponseID”,也可以对应相同的“ResponseID”。图12C中界面的左上角的输入框和“Go”用于选择其下数据表格的页,可以通过输入页码进行跳转。图12C中界面的右上部分:单选框“Include training data”(包括训练数据)用于选择检索结果是否包括已有的训练数据;单选框“Mismatch FAQID”(不匹配FAQID)用于选择检索结果是否包括FAQID与EXPECTED FAQID不相同的训练数据,这样可以查看未经人工纠正之前的不匹配数据;重置控键“Reset”用于对“Question”的检索条件进行一次性重置;检索控键“Search”用于根据设定的检索条件对“Question”及其相关数据进行检索;训练引擎控键“Train Engine”用于启动对引擎(也可以认为是前述的机器人或者机器人的一部分)的训练,人工为用户表达(“Question”)赋予对应的FAQID,也就相当于赋予对应的FAQ后,点击控键“Train Engine”(训练引擎)对机器人进行训练。
图12D所示的界面用于为FAQ生成与意图对应的槽。图12D的界面左侧是系统菜单栏,其中,“FAQ”项目下有:“Tree Editor”(树编辑器),用于编辑人工交互的话术,即基于对用户表达的理解进行应答的对话逻辑;“Import/Export”(输入/输出)用于输入或者批量上传FAQ数据或者输出FAQ数据。“Chat”(对话)项目用于对人机交互对话进行显示、选择、编辑等操作。“Response”(响应)项目下有:“Report”(报告),用于生成关于引擎响应的报告;“Import/Export”(“输入/输出”)用于输入或输出引擎响应。“Slot Filling”(槽填充)项目下有:“Report”(报告),用于生成关于槽填充 的报告;“Slots editor”(槽编辑器)用于对槽进行新建、修改、删除等编辑工作。“User”(用户)项目用来编辑用户数据。“Engine Config”(引擎配置)项目用来配置引擎。
当对于FAQ生成槽时,首先点击“Slots editor”(槽编辑器)标记,然后在(标注有“Please Select”的)“FAQ”下拉菜单选择需要生成(增加)槽的FAQ,例如“Flight”,然后点击该下拉菜单右侧的“Add”(增加)控键。接下来在弹出窗口(如图12E-1和12E-2所示)填写所生成(增加)的槽的信息。
如图12E-1和图12E-2所示,所选择的FAQ是“Flight”,在“ID”输入框填入“槽标识”(Slot ID),例如,指向“出发地”的“FROM”,指向目的地的“TO”,等等。“Sort”栏用于输入与槽对应的快捷键(即热键,Hot Key),用于静默座席以人工方式赋予槽对应值时的快速输入,例如用“1”对应“From”、用“2”对应“TO”,这样在之后通过人工辅助输入或查询槽值(slot value)时可以通过输入“Sort”数值或代码来快捷地实现对槽的指定。“Description”(描述)输入框用于输入对于槽所填入的内容的描述,例如用“FROM”描述出发地,用“TO”描述目的地。槽的有效值“Valid Values”是能够有效填入槽中的值。可以将槽的有效值看作是从用户自然表达中转换并提取的标准表达。例如图12E-1和图12E-2所示,在“Valid Values”所指向的编辑框内输入的“PEK”、“PVG”、“HKG”等都是机场的唯一代码。因为飞机出发地和目的地可以根据行程变化,但是机场的代码一般是不变的,因此,同一个槽有效值可以适应填入不同的槽。并且在不同的FAQ中或者甚至在不同的应用场景下,均可以使用某个槽有效值,并且该槽有效值在不同的FAQ或者应用场景中的含义是相同的。例如,与北京国际机场对应的槽有效值“PEK”也可以用于“餐饮”或“购物”应用场景,还可以用在“出行”应用场景下的另一个FAQ“Pick Up”(接送站)。对于采用独立数据库和引擎的应用场景产品,也可以使用相同的槽有效值来表示不同的含义。在各个输入框完成填写后,单击“Add”(“添加”)按键来将与槽标识所对应的槽的内容添加进入后台数据库,从而完成槽的添加。
填入的槽有效值与标准表达相对应,而每个标准表达可以对应多个X语言信息,通过转换得到这些X语言信息的A语言信息(自然表达)又是多 种多样的。例如,在图12E-1和图12E-2所示的例子中,从各个A语言信息,例如,SH,Shanghai Pu Dong,Shanghai,Shanghai Pudong,Pudong,Pu Dong,上海浦东国际机场,浦东国际机场,上海,上海浦东,都可以与PVG对应,也就是说,当用户自然表达中出现这些表达中的任何一个,都可能被认为与槽有效值PVG相对应,而被转换为PVG填入对应的槽中。另一方面,在通过人工辅助来训练机器人的精确信息萃取能力时,可以通过静默座席从自然表达理解出上海浦东机场,且其为出发地,然后将PVG输入与出发地对应的槽中。正确的X语言信息与填槽结果配对数据将被保存到数据库(即前述的MT训练数据表)中,供机器人学习。
机器人利用正确的X语言信息与填槽结果配对数据进行学习以提高理解正确率和自信度,因而也可以通过从外部导入训练数据的方式加快机器人的训练。
另外,还可以通过局部配对数据进行训练。如图12F所示,通过点击相应的控键(原文为图标图像),在
如图12G所示的弹出窗口,再点击控键“Choose File”(选择文件)可以对槽“FROM”上传槽数据文件。所述槽数据文件例如包括这样的数据:PVG,SH;PVG,Shanghai Pu Dong;PVG,Shanghai;PVG,Shanghai Pudong;PVG,Pudong;PVG,Pu Dong;PVG,上海浦东国际机场;PVG,浦东国际机场;PVG,上海;PVG,上海浦东;HKG,Hong Kong International Airport;HKG,Hong Kong Airport;HKG,Hong Kong;HKG,HK;HKG,Hongkong;HKG,香港赤腊角国际机场;HKG,香港国际机场;HKG,香港机场;HKG,香港;PEK,BJ;PEK,Beijing;PEK,Beijing Capital International Airport;PEK,Beijing Shou Du Ji Chang;PEK,Beijing Shou Du Guo Ji Ji Chang;PEK,北京首都机场;PEK,首都机场;PEK,首都国际机场;PEK,北京首都国际机场;PEK,北京,等等。这些槽数据中包括了分别与PVG、HKG、PEK对应的多个表达方式,经过训练后,机器人可以更准确或者自信地从自然表达转换得到的X语言信息中识别出与PVG、HKG、PEK对应的部分。
对于填槽而言,还需要考虑对表达的整体理解,例如,即使理解出了PVG,还需要从整体表达中获知其是应作为出发地而填入“FROM”还是作为目的地而填入“TO”,而这需要将涵盖与PVG关联的出发地和目的地信息以及对应PVG的信息的自然表达或整体自然表达转换成X语言信息,并 将该X语言信息与填槽结果“FROM”“PVG”组成配对数据进行训练。尽管仅依靠与槽有效值对应的槽数据(因为不包括与槽本身对应的信息(例如“FROM”或者“TO”),于是可以将这样的与槽有效值对应的槽数据连同槽有效值所形成的配对数据称为局部配对数据)进行训练并不能完全实现机器人自动进行填槽的能力,但是这种训练可以有效提升机器人的理解准确率和自信度,从而提高机器人意图获取和填槽的能力。并且这种训练可以在人工辅助训练之前预先进行,提高迭代运算的收敛速度,从而减少人工辅助训练的工作量。因而这种基于局部配对数据的训练可以被视为完全由机器人自己进行的预训练。根据前述的自学习原理,在训练时所采用的实际数据也仍然是将与槽有效值对应的槽数据转换后得到的X语言信息与槽有效值所形成的配对数据。
在图12G中,当上传了上述槽数据文件后,点击控键“Update”(更新)进行更新。然后可以点击控键“Train Slot Values”(训练槽值),如图12H所示,机器人利用已有的槽数据与槽有效值的配对数据进行训练。
图12F~图12H所示的用局部配对数据进行训练的方式也可以用在前述的标准表达理解转换中,作为训练的可选手段。
在前面的步骤中,例如,在图12A所示步骤新增FAQ“Flight”及其对应的Id,在图12C所示步骤对于每条输入的表达赋予“Expected Faqid”并进行引擎训练,在图12D、图12E-1和图12E-2所示步骤添加槽、输入槽的有效值并进行训练,在图12F、图12G和图12H所示的可选步骤上传槽数据并训练槽值(用槽数据和槽有效值的配对数据训练引擎),都可以是为了下面的人工辅助填槽过程进行准备。
图12I示出了人工辅助填槽的主引导界面。在该界面上部设置了多个用于数据筛选的输入框。例如,“Update Date From…To…”是用更新日期来作为数据筛选条件,“Create Date From…To…”是用创建日期来作为数据筛选条件,“Confidence Min:…Max…”是用自信度来作为数据筛选条件,“QID”、“Question”、“Faqid”、“Expected Faqid”、“ResponseId”的含义如前所述,也可以作为数据删选条件。控键“Search”(检索)用来根据设定的检索条件进行检索,而控键“Reset”(重置)用来对检索条件进行一次性全部重置。单选框“Include training data”(包括训练数据)用于选择检索结果是 否包括已有的训练数据;单选框“Mismatch FAQID”(不匹配FAQID)用于选择检索结果是否包括不匹配的FAQID。
图12I下部的表格部分与图12C类似,但是在最右侧显示出了用来指定槽的控键(原文为图标图像)。当通过前述操作对应于“Expected Faqid”添加了槽之后,就会在具有这样的“Expected Faqid”的用户表达记录后面显示该控键。点击该控键,
然后在弹出的窗口中会显示“ResponseId”、“Faq”、“Question”等表项。可以进行人工操作,用鼠标或者键盘等输入工具选中用所要填入的槽所对应的填入部分,例如,图12J-1中因被选中而蓝色高亮的部分“hong kong”,并通过选中控键
或者输入与槽对应的快捷键(“1”或“2”)来选中与要填入部分对应的槽,例如,槽1“FROM”,之后在“FROM”对应行中间的文字框中会显示“hong kong”,静默座席工作人员再利用同一行右边的下拉菜单来选择对应的槽有效值,例如,与被选中表达“hong kong”对应的槽有效值是“HKG”。类似地,如图12J-2所示,通过控键
选中槽2“TO”,并在“Question”项的部分选中“shanghai”,以及通过下拉菜单选中对应的槽有效值“PVG”。如图12K所示选中单选框“Template”的目的是将该条用户表达“I want to buy a ticket from hong kong to shanghai”作为训练所有槽的模板。完成图12J-1、图12J-2和图12K所示的操作后,点击控键“Update”上传数据并可同时关闭此窗口。之后,可以对图12I中所示的其它具有控键
的表达项进行类似的处理。
然后,如图12L所示,点击控键“Train Engine”对引擎进行训练。此时的训练不再是用局部配对数据进行训练,而是用完整用户表达所转换的X语言信息与填槽结果的配对数据进行训练,获得或者提升关于配对数据所指向的FAQ的精确信息萃取能力,在图12L所示例子中,训练针对的是对FAQ“Flight”(订机票)的精确信息萃取能力。
图12M示出了访问引擎来检验训练效果的例子。如“Chat”对话框中所示,当输入表达“我要买一张从香港去北京的机票”,机器人可以正确地识别出FAQ是“Flight”;并且在右侧的“Engine Response”部分可以看到,机器人正确记录FAQ ID是“Flight”,并且能够自动得到正确的填槽内容“From”,“HKG”,“To”,“PEK”;当输入表达“I want to buy a ticket from Beijing to shanghai”,机器人可以正确地识别出FAQ是“Flight”;并且在右 侧的“Engine Response”部分可以看到,机器人正确记录FAQ ID是“Flight”,并且能够自动得到正确的填槽内容“From”,“PEK”,“To”,“PVG”。
图12N示出了访问引擎来检验训练效果的另一个例子。在该例中,引擎响应结果出现了丢失槽值或者槽填充值错误的情况。如图12N所示,当输入的表达为“我要买一张去北京的机票,从香港飞”,机器人可以正确地识别出FAQ是“Flight”;在右侧的“Engine Response”部分可以看到,机器人正确记录FAQ ID是“Flight”,并且能够自动得到正确的填槽内容“From”,“HKG”,但是却缺少填槽内容“To”,“PEK”。这种情况下,可以通过点击图12D所示界面左侧导航栏中Response项下的“Report”,进入图12I所示的界面,如前述操作,为该条表达进行人工填槽,并训练引擎。在这种出差错后人工纠正而得到的表达与填槽结果的配对数据,可以提供有价值的训练数据,因而最好选中“Template”使该数据成为模板数据,用于以后同时对所有槽进行训练。此处的例子是对于文字表达进行处理,由于文字的信息颗粒度比X语言信息粗,因此信息量比X语言信息还小,因此将文字表达与填槽结果作为配对数据进行存储也是可以的。也就是说,可以将从文字表达转换得到的X语言信息与填槽结果构成配对数据进行训练,也可以将文字表达与填槽结果构成配对数据进行存储,在训练时再将该文字表达转换为X语言信息。
也有不能获得正确填槽内容的其它情况。如图12O所示。在图12O中,对于左侧示出的表达“我要去上海从北京走”,在右侧的“Engine Response”部分可以看到,机器人正确记录FAQ ID是“Flight”,但却不能自动得到正确的填槽内容。这种情况发生的原因是机器人自信度低于设定的阈值,如图12O所示的自信度阈值“Confidence”为80,而图12P中所示针对表达“我要去上海从北京走”,机器人的当前自信度“CL%”为69.12,低于80。对于这种情况,可以如前所述通过点击图12D所示界面左侧导航栏中Response项下的“Report”进入窗口再次训练,或者也可以调低自信度阈值,允许机器人在低于80的自信度并且在当前自信度69.12的情况下自动将理解的内容填入对应槽中。
图13示意性地示出了根据本发明实施例的机器人理解与人工辅助理解(MAU)相结合的自然表达处理的过程。如图13所示,在从上向下包括了 四层处理。
第一层处理由机器人自动完成。如前所述,可以通过设定机器人理解成熟阈值来作为机器人自动进行处理的条件,该机器人理解成熟阈值可以是机器人理解准确率阈值或者机器人自信度阈值。例如,将机器人理解成熟阈值设定为90,那么如图13所示,机器人理解准确率或者机器人自信度低于90的自然表达将不被机器人进行自动处理,而转到静默座席进行标准化的处理。
第二层处理由静默座席完成。如前所述,静默座席是一种人工座席工作模式,即利用客服人员对自然表达的理解能力提供标准化的理解结果,从而辅助机器人进行回答,并形成用于训练机器人的配对数据。具体而言,当机器人理解成熟度达不到阈值时,机器人会将待理解的自然表达转给静默座席处理。静默座席经系统提示后通过观看、接听等方式用自身感官来接收该自然表达,并基于自己的理解能力对自然表达进行理解,然后以标准表达的形式输出理解结果,之后机器人根据该理解结果进行自动应答。静默座席的理解能力就是普通客服人员的理解能力,并且由于不需要直接回应客户表达,因此不需要对静默座席人员有发声、口音、应答熟练度等要求,可以说降低了对客服人员的从业能力要求,也有利于促进社会就业。
从智能客服的角度,一方面,机器人自动接收表达并进行应答,静默座席只负责理解不需要应答,都可以大量节省人力资源,并且这种模式下静默座席可以同时对多个会话进行理解操作,从而进一步提升工作效率;另一方面,静默座席根据自然表达输出的理解结果为标准表达,因此将该自然表达与对应的标准表达形成配对数据,加入前述的MT训练数据表,可以用于对机器人进行训练,提升机器人的理解能力,随着机器人理解能力的提升,在理解成熟阈值不变的情况下,越来越少比例的客户表达被转到静默座席,因此可以进一步减少人工座席的数量,降低用工成本,从而实现系统的闭环正反馈。
机器人根据静默座席的理解结果自动应答,也能够保证回应不受客服人员的情绪、声腺、口音、业务熟练度等诸多因素影响。对于具体范畴(或者说是具体的垂直应用)而言,如果标准应答的量不太多,可以通过预先录制的语音、视频等作为应答,相比通过TTS技术等合成的语音或者合成的动画, 能够带来更好的用户体验。
关于静默座席的工作方式和工作界面,也可以参考前述图12A~图12P及其对应描述。对于静默座席关于人工辅助进行意图获取和填槽的常规工作而言,可以仅进行类似图12J-1和图12J-2的填槽操作,而不必进行FAQ设定、槽设定以及机器人预训练、训练等工作。
图14示意性地示出了一个由MAU工作站呈现给MAU人工座席9的操作界面的例子,此处的MAU人工座席9即为静默座席。如图14所示,MAU工作站13的操作界面包括:客户表达显示区131,对话状态显示区132,导航区133,范畴选择区134和快捷区135。
客户表达显示区131显示客户(即用户)的自然表达,例如,呈现从文字、图像、语音转换而成的文本等形式,或者显示作为自然表达的图像本身,也可以提示链接等,由MAU人工座席9选择点击收听语音表达。
对话状态显示区132显示客户8与MAU人工座席9或机器人14之间的对话实时状态信息,如:对话来回次数、对话总时长、客户信息等等。该显示区域也可以不设置。
导航区133显示MAU人工座席9目前已选择到达的范畴。该区左端显示目前范畴路径的文字版本(如图中所示:银行→信用卡),右端显示该范畴对代码(如图中所示:“12”,“1”代表“银行”范畴,“2”代表在“银行”范畴的下一级范畴“信用卡”。与前述的例子不同,在该应用中,用“1”代表“银行”范畴,而未用“BNK”,二者的标识作用是相同的)。
范畴选择区134供MAU人工座席9选择下一级范畴。如图中所示:MAU人工座席9已进入到“银行”范畴的下一级范畴“信用卡”,而“信用卡”这一级范畴下辖7个子范畴:“激活新卡”、“申请新卡及申请进度查询”、“还款”……。如客户8的表达是“我的信用卡能透支太少了。”,MAU人工座席9就在当前范畴“银行→信用卡”中选择“7”,导航区将更新显示“银行→信用卡→调整信用额度……127”,进入再下一级范畴。MAU人工座席9也可以在收到并理解客户8的表达后,在键盘上直接输入“127”,到达目标范畴“银行→信用卡→调整信用额度”。这样,客户8无需再花长时间遍历复杂的功能菜单树寻找自己所需的自助服务,只需直接说出自己的需求,MAU人工座席9便能快捷地帮助客户直接启动“调整信用卡额度”处理,从而,用户体验变得容 易便捷,而目前传统IVR系统的自助服务流程利用率将得到大幅提升。
快捷区135为MAU人工座席9提供了常用快捷键,例如,“-”返回上层范畴、“0”转接人工座席、“+”返回顶层范畴(在这个例子中,就是根范畴“银行”)。快捷区135也可以为MAU人工座席9提供了其它快捷键。快捷区135可以提高MAU人工座席9的处理速度。快捷区135也是可选设置区域。
这里只给出了MAU工作站13的操作界面的一个例子,其可用于MAU人工座席9对于标准表达的转换处理。也可以通过类似的操作界面来进行对于回应的人工处理。
第三层处理由高级座席进行。当静默座席遇到非标准情况,也就是在他/她不确定自己对于客户表达的理解是否正确,或者发现系统中没有可以用来对应该表达的标准表达,亦或者发现系统中没有可以准确回应该客户表达的标准应答时,静默座席可以将处理转交高级座席,由高级座席以语音或者文字的方式与客户进行直接沟通。也就是说高级座席通常负责处理非标准的情况(包括新出现的情况)。当然,静默座席也可以反馈客户自己没有听清或者没能理解客户的表达,请客户再次表达或者换一种方式表达,如果还是认为自己处理不了,再转给高级座席。这里的高级座席有些类似于传统客服的座席主管,处理疑难问题。
高级座席也可以为系统提供正反馈。具体地,高级座席将遇到的客户问题(具体表达)和解决方案(应答)形成Q&A(问题及回答),提供给后台的知识库设计师。知识库设计师进行话术的后台构建,例如针对某一具体范畴或其下的子范畴设计树状的对话方案。如图13所示,知识库设计师根据高级座席提供的Q&A,设计业务范畴“Branch-1”的子范畴“Branch-11”下的新的常见问题“FAQ-12”。该FAQ可以包括与客户表达对应的标准表达、填槽结果,还包括与标准表达、填槽结果等对应的标准应答。
前述的MAU人工座席9可以包括上述的静默座席,也可以包括上述的高级座席,还可以包括知识库设计师。
根据本发明实施例的基于自然智能的自然表达处理方法、设备和人机交互系统,通过由自然表达转换获得的X语言信息(即次级语言信息)与对应于该自然表达的含义(意图)的Y语言信息(即标准表达)构成配对数据,再通过元素排列组合的迭代比较进行自学习(训练)。也就是说,机器自学习(训练)的基础是自然表达与对应该自然表达含义的标准表达的配对数据。如前所述,可以通过静默座席等进行人工辅助理解的方式来获得这样的配对数据,也可以通过用户输入自然表达进行验证而获得。还可以通过机器来自动获得这样的配对数据。
以自然表达为语音(声波)为例,具体而言,可以先生成与标准表达对应的文字脚本,例如,标准表达是“是”的含义,那么文字脚本可以写多条来对应这一含义,例如,“Yes”(英语),“对”,“啊”,等等,这些文字脚本可以由人工编写,也可以从数据库中调用;然后通过文本语音转换工具(TTS)转换得到对应的语音,于是得到了标准表达——语音配对数据。由于该标准表达可以预先设计,TTS工具对于文本到语音的转换又是比较准确的,因此可以得到准确的配对数据,进一步获得将标准表达转换为信息颗粒度比文字小的次级语言信息与标准表达的配对数据,从而形成供机器自学习的数据。我们也可以称这种方式是自然智能机器人的预训练。
并且可以通过TTS工具来丰富和扩展与该条标准表达对应的语音,增加配对语料。例如,可以通过TTS工具调整变化语音的语速、音量、语气、语调中的一个或多个参数。例如,1.1倍、0.9倍的语速,1.1倍、0.9倍的音量,以及通过随机变量对语音声波进行微调,该随机变量的选择以及变化范围可以基于对人语音的大数据统计模型来确定。还可以采用具有不同性别声音模型的TTS工具,采用具有不同语种或不同方言的声音模型的TTS工具,以及采用具有不同说话习惯、说话方式等声音模型的TTS工具,来生成供训练用的语音。
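下面给出一个用TTS工具批量生成“文字脚本→语音”配对语料并变化语速、音量的Python草图;此处以pyttsx3库为例(rate、volume为该库的实际属性),脚本内容、文件命名以及中文语音是否可用取决于系统安装的语音引擎,均为示例假设:

```python
import pyttsx3

scripts = {"Y=是": ["是", "对", "嗯,是的"],      # 与标准表达“是”对应的文字脚本(示例)
           "Y=不是": ["不是", "不对"]}

engine = pyttsx3.init()
base_rate = engine.getProperty("rate")

pairs = []   # [(语音文件名, 标准表达), ...],后续再转换为次级语言信息用于训练
for label, texts in scripts.items():
    for i, text in enumerate(texts):
        for j, (rate_scale, volume) in enumerate([(0.9, 1.0), (1.0, 1.0), (1.1, 0.9)]):
            engine.setProperty("rate", int(base_rate * rate_scale))   # 变化语速
            engine.setProperty("volume", volume)                      # 变化音量
            fname = f"{label}_{i}_{j}.wav"
            engine.save_to_file(text, fname)                          # 合成并保存语音
            pairs.append((fname, label))
engine.runAndWait()
print(pairs[:3])
```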
这样的预训练配对数据以及基于这些数据生成并存储在训练数据库中的配对数据也可以根据需要复制到其他垂直领域或者范畴的训练数据库,也可以从当前训练数据库中移除这些配对数据。
上述预训练方法同样适用于自然表达是前述的静态图像、动态图像、视频等的情况。
在图10和图11所示的人机交互系统中,可以用回应生成器114来作为上述的TTS工具生成与标准表达对应的语音。
图17示意性地示出了根据本发明实施例的基于自然智能的端到端控制系统。如图17所示,该控制系统包括传感器、机器人和控制器。在现有的 基于人工智能的端到端控制系统中,传感器为一端、控制器为另一端,其控制方式为:通过传感器获取图像等信息,由机器人基于已知规则和数据进行处理而获得控制信号,然后通过输出控制信号给控制器来进行控制。而基于自然智能的端到端控制系统中,如图17所示,控制器为机器人提供控制数据作为训练数据的一部分。
例如,在自动驾驶的场景里,人类驾驶员在车辆内驾驶或者远程操控车辆,其驾驶或者操控包括通过油门、刹车、方向盘和档位来控制车辆的加减速和方向以及对转向灯、雾灯、示宽灯、雨刷器等进行的操作。控制器获得由人类驾驶员操控而产生的控制参数,该控制参数可以是对油门、刹车、方向盘、档位、转向灯、雾灯、示宽灯、雨刷器等的具体操作行为,也可以是这些操作行为带来的控制量,例如,加减速、加减速的加速度、转向角和转向角速度,灯开关等。这些控制参数和/或控制量作为控制数据被控制器提供给机器人。
另一方面,传感器通过对行驶环境进行感知来获得诸如图像、声音、雷达测距等传感器数据,也提供给机器人。人类驾驶员的操控行为是其对于行驶环境做出的反应,也就是说,传感器数据与控制数据在时间维度上存在因果关系,因此,将多个时间间隔内的传感器数据与控制数据作为配对数据供机器人进行训练,可以使得机器人获得从传感器数据得到对应的控制数据的能力,并且在训练的同时迭代更新用于将传感器数据对应到控制数据的模型。这类似于前述的自然表达转换得到的X元素到标准表达的Y元素的机器学习/训练,该模型也可以是神经网络模型。
用于机器人学习/训练的传感器数据也可以是图像、声音、雷达测距数据等相对于时间的变化。对于运动中采集到的图像数据,可以进一步采用局部图像相对于整体图像的随时间的变化数据作为传感器数据。例如,在自动驾驶场景,这种局部图像相对于整体图像的随时间的变化数据往往反应了道路环境中的移动物体。
传感器数据中的图像、声音、雷达测距数据等接近于人的感知数据,也可以说这些数据是模拟人类驾驶员在驾驶时所感知的数据。因此,在记录人类驾驶员在车内或者远程的驾驶或操控行为时,可以考虑驾驶员的视野变化,对视野内的图像数据提高加权,还可以对视野内敏感度不同的区域施以 不同的权重。
除了上述的类人感知数据,传感器数据还可以包括对车辆自身状态的检测数据、例如车速、转向、油耗、胎压、风速、风阻等等。
如前所述,传感器端的传感器数据与控制器端的控制数据作为在时间上相关的配对数据被输入给机器人进行学习/训练,使得机器人能够举一反三,具有对传感器数据输出用于控制控制器的控制信号的能力。与数据对应的时间间隔可以是预设的较小间隔(例如图像帧间隔或其倍数),也可以是从操控行为开始到结束(例如停止操控方向盘转向并且停止操控油门和刹车)。
车辆自动驾驶是比较复杂的应用,一方面道路环境比较复杂,另一方面安全性要求非常高。根据本发明实施例的端到端控制方法和控制系统,还可以用于无人机。在无人机应用场景,对于无人机的操控可以远程进行,并可以借助虚拟现实(VR)头盔或眼镜等工具。操作人员通过观看无人机所载摄像头收集到的图像并可以聆听无人机所载麦克风收集到的声音来作出反应,通过遥控台或者控制手柄对于无人机进行操控,无人机在操控下在水平和垂直各个方向上飞行,可以悬停和翻转,并可以进行安保监控、环境监控、药物喷洒、紧急救助、鸣笛示警等操作。
与前述的车辆自动驾驶应用类似,通过将各个时间段内的无人机传感器数据和控制数据的配对数据输入给机器人进行学习,可以使得机器人具有对于飞行环境进行自动反应的能力。
相比车辆自动驾驶,无人机的飞行环境相对简单,由于其本身质量较小且不载人,因此对于安全性的要求要低一些;更有优势的是,无人机在自动巡航过程中可以随时悬停调整,也就是说,当机器人无法自动判断下一步的行为时,可以有更充分的时间进行人工辅助,这类似于前述实施例中通过静默座席进行人工辅助。
此外,计算机自动操控是一种更简单和安全的应用。简言之,就是通过摄像头记录计算机屏幕所显示的计算机对人操作的反应,即传感器数据;同时对应地记录人通过鼠标、键盘、声音等对计算机的操作或者这些操作对计算机产生的控制量。这样,通过训练机器人,可以使得机器人能够代替人来对计算机进行根据训练人员操作习惯所进行的操作。例如,在电子游戏中的重复动作集合,这种动作集合与玩家实际游戏操作习惯接近,因此不会被认 为是代打机器人;又例如,重复下载和保存操作,一些数据网站为了防止爬虫机器人,设置了一些需要手工完成的步骤,而通过根据本发明实施例的机器人,就可以代替人来完成这些手工步骤。在这些计算机自动操控应用下,经过训练的机器人可以节省操作者大量的时间。该机器人可以通过独立的计算机实现,也可以通过被操控的计算机本身来实现。在此的计算机可以扩展到手机、平板电脑等数字处理终端。
图18示意性地示出了经过训练的机器人根据传感器数据做出判断并控制所述控制器的过程。具体而言,在步骤S50将通过传感器收集到的传感器数据输入机器人;在步骤S51机器人判断是否能够根据传感器数据确定与之对应的控制数据;如果能确定控制数据,则在步骤S52机器人根据与传感器数据对应的控制数据生成控制信号,并将该控制信号发送给控制器,控制器在步骤S54根据所述控制信号来控制设备;如果在步骤S51不能确定控制数据,则在步骤S53由人工对控制器进行操控,从而通过控制器在步骤S54控制设备,并记录操控产生的控制数据;这些人工操控控制器所产生的控制数据可以被与对应的传感器数据配对保存,以便进一步训练机器人。
机器人也可以用前述的自信度或者类似的指标来判断能否根据传感器数据确定与之对应的控制数据。
根据本发明实施例的基于自然智能的人机交互系统可以包括一台或多台计算机、移动终端或其它数据处理设备,其中,可以使用这样的数据处理设备来进行自然表达到标准表达的自动转换处理或者基于自然表达的精准信息萃取。该系统还可以实现闭环反馈以及预训练。
根据本发明实施例的自然表达处理和应答方法、设备及系统以及多意图获取方法和系统,可以利用标准表达(包括意图获取结果)快速指向回应,从而使得客户无需再花长时间遍历复杂的常规功能菜单来寻找自己所需的自助服务。
根据本发明实施例的基于自然智能的人机交互系统,通过机器人的自动学习、训练及人工辅助理解,可以建立经转换的自然表达信息(X语言信息)—标准表达(包括意图获取信息)—标准回应数据库,逐渐实现系统自动理解和回应。数据库中存储的经过转换的自然表达信息数据还可以具有业务范畴窄、保真度高等优点,从而降低机器人训练难度,缩短机器人智能的 成熟周期。
与常规的应答方式不同,静默座席主要进行后台的“决策”工作,包括确定标准表达(Y语言信息)或意图,选择回应(或回应ID)或者生成回应操作等,但不需要在前台通过通话或者文字输入等方式来与客户直接进行交流。从而可以大量节省人力资源,提升工作效率。此外,系统对客户提供的标准化回应,相对于传统人工座席直接对客户提供的传统的自由式回应,不受人工座席的情绪、声腺、口音、业务熟练度等诸多因素影响,更能保证客户体验的稳定性。
此外,可以以具体的应用场景(业务范畴)为单位实现机器人的自动学习、训练和成熟度及自信度评价,从而逐点实现整体系统的智能化。在实际应用中,该“机器人理解逐点成熟”机制更容易得到机构的认可与接受,因为风险相对来说极低,旧系统改造成本不高,且对日常运营不会造成负面影响。
根据本发明实施例的自然智能方法论的控制方法或控制系统,并不需要通过标注和模型来识别物体,也不需要建立大量的规则来进行控制,只需模仿人类的感知及与之对应的控制行为,即可自动训练控制模型,并用训练后的模型实现自动控制。从而节省标注和规则建立所需要的海量人力成本,并且能够避免类似于上述的因为标注误差或规则不完备而带来的潜在安全风险。
以上所述仅是本发明的示范性实施方式,而非用于限制本发明的保护范围,本发明的保护范围由所附的权利要求确定。

Claims (91)

  1. 一种基于自然智能的自然表达处理方法,其中,
    包括:
    接收自然表达的输入,得到具有第一信息颗粒度的第一语言信息,
    将所述第一语言信息转换为具有第二信息颗粒度的第二语言信息,其中,所述第二信息颗粒度的数量级介于所述第一信息颗粒度的数量级与文字的信息颗粒度的数量级之间,
    将所述第二语言信息转换为第三语言信息,所述第三语言信息作为对所述自然表达进行理解的结果,
    其中,
    所述第二语言信息和与该所述第二语言信息对应的第三语言信息作为配对数据被存储在数据库,
    对于所述数据库中已有的成对的第二语言信息和第三语言信息,将该第二语言信息的元素的各种排列组合与该第三语言信息或者该第三语言信息的元素的各种排列组合进行循环迭代,建立所述第二语言信息的元素的各种排列组合与所述第三语言信息或第三语言信息的元素的各种排列组合之间的对应关系,获得更多的第二语言信息与第三语言信息的配对数据,并存储在所述数据库中。
  2. 根据权利要求1所述的基于自然智能的自然表达处理方法,其中,
    当从输入的第一语言信息获得第二语言信息后,将该第二语言信息与所述数据库中已有的第二语言信息进行比较,然后根据比较结果来确定与该第二语言信息对应的第三语言信息,或者计算将该第二语言信息对应到某第三语言信息的正确率,
    如果机器理解能力不够成熟,不足以或者不确定将该第二语言信息转换到某条第三语言信息,那么进行人工辅助理解,
    通过人工对所述输入的第一语言信息进行理解,得到与自然表达的含义所对应的第三语言信息,并且将从该第一语言信息得到的第二语言信息与所述第三语言信息对应起来或者将所述第一语言信息与所述第三语言信息对 应起来,得到新的配对数据存入所述数据库。
  3. 根据权利要求2所述的基于自然智能的自然表达处理方法,其中,对于所述新的第二语言信息与第三语言信息的配对数据或者新的第一语言信息与第三语言信息的配对数据,将其中的第二语言信息或者由第一语言信息转换得到的第二语言信息的元素的各种排列组合与其中的第三语言信息或者该第三语言信息的元素的各种排列组合进行循环迭代,建立第二语言信息的元素的各种排列组合与第三语言信息或第三语言信息的元素的各种排列组合之间的对应关系,获得更多的第二语言信息与第三语言信息的配对数据,并存储在所述数据库中。
  4. 根据权利要求2所述的基于自然智能的自然表达处理方法,其中,通过人工辅助理解纠正所述数据库中第二语言信息与第三语言信息之间错误的对应关系。
  5. 根据权利要求2所述的基于自然智能的自然表达处理方法,其中,
    通过自信度来衡量机器理解能力,
    其中,基于第二语言信息与第三语言信息的对应关系来计算所述自信度。
  6. 根据权利要求5所述的基于自然智能的自然表达处理方法,其中,从第一语言信息得到第二语言信息之后,通过深度神经网络、有穷状态转换器、自动编码器解码器中的一个或多个来产生对第三语言信息的对数概率或相类似分数,再利用归一化的指数函数来计算出对第三语言信息的自信度。
  7. 根据权利要求1所述的基于自然智能的自然表达处理方法,第二语言信息的信息颗粒度是文字的信息颗粒度的1/10~1/1000。
  8. 根据权利要求1所述的基于自然智能的自然表达处理方法,其中,在对第二语言信息与第三语言信息的配对数据进行循环迭代时,也对第二语言 信息到第三语言信息的转换模型进行循环优化。
  9. 根据权利要求1所述的基于自然智能的自然表达处理方法,其中,用循环迭代得到的第二语言信息测试机器对于第二语言信息到第三语言信息的转换,并将不能被正确转换的第二语言信息及其应正确对应的第三语言信息写入对照表,对于后续输入的自然表达,由自然表达转换的第二语言信息先与对照表中存储的第二语言信息进行对比。
  10. 一种基于自然智能的自然表达处理及回应方法,其中,包括:
    通过根据权利要求1-9中任何一项的自然表达处理方法获得所述第三语言信息;
    调用或生成与所述第三语言信息相匹配的标准回应;
    以与所述第一语言信息对应的方式输出所述标准回应。
  11. 根据权利要求10所述的自然表达处理及回应方法,其中,所述标准回应是预先存储在回应数据库中的固定数据,或者基于变量参数和预先在回应数据库中存储的标准回应的基础数据来生成所述标准回应。
  12. 一种基于自然智能的自然表达处理及回应设备(1),其中,包括:对话网关(11),中央控制器(12),MAU工作站(13),机器人(14),表达数据库,回应数据库(113)和回应生成器(114),其中,
    对话网关(11)接收来自用户(8)的自然表达,发送给中央控制器(12)进行后续处理,并且将对自然表达的回应发送给用户(8);
    中央控制器(12)接收来自对话网关(11)的自然表达,并与机器人(14)以及MAU工作站(13)协同工作,将该自然表达转换为表示该自然表达的含义的标准表达,并根据标准表达指示回应生成器(114)生成与该标准表达对应的标准回应;
    机器人(14)根据中央控制器(12)的指示,将自然表达转换为次级语言信息,其中,次级语言信息的信息颗粒度的数量级介于自然表达的信息颗粒度的数量级与文字的信息颗粒度的数量级之间,并将次级语言信息转换为 标准表达;
    MAU工作站(13)将自然表达呈现给外部的MAU人工座席(9),MAU人工座席(9)通过MAU工作站(13)输入或者选择标准表达,然后MAU工作站(13)将该标准表达发送给中央控制器(12);
    训练数据库用于存储次级语言信息和标准表达的配对数据;
    回应数据库(113)存储回应相关数据,包括供调用的标准回应数据和/或用于生成回应的数据;
    回应生成器(114)接收中央控制器(12)的指令,通过调用和/或运行回应数据库(113)中的数据来生成对用户(8)的自然表达的回应,
    其中,
    设备(1)进一步包括训练器(15),该训练器(15)用于训练机器人(14)将自然表达转换为标准表达,
    其中,训练器(15)使得机器人对于训练数据库中已有的成对的次级语言信息和标准表达,将该次级语言信息的元素的各种排列组合与该标准表达或者该标准表达的元素的各种排列组合进行循环迭代比较,建立次级语言信息的元素的各种排列组合与标准表达或标准表达的元素的各种排列组合之间的对应关系,获得更多的次级语言信息与标准表达的配对数据,并存储在训练数据库中。
  13. 一种基于自然智能的人机交互系统,其中,
    包括:自然表达处理及回应设备(1)和呼叫设备(2),其中,用户(8)通过呼叫设备(2)与自然表达处理及回应设备(1)通信,MAU人工座席(9)对自然表达处理及回应设备(1)进行人工操作,
    自然表达处理及回应设备(1)包括:对话网关(11),中央控制器(12),MAU工作站(13),机器人(14),表达数据库,回应数据库(113)和回应生成器(114),其中,
    对话网关(11)接收来自用户(8)的自然表达,发送给中央控制器(12)进行后续处理,并且将对自然表达的回应发送给用户(8);
    中央控制器(12)接收来自对话网关(11)的自然表达,并与机器人(14)以及MAU工作站(13)协同工作,将该自然表达转换为表示该自然表达的 含义的标准表达,并根据标准表达指示回应生成器(114)生成与该标准表达对应的标准回应;
    机器人(14)根据中央控制器(12)的指示,将自然表达转换为次级语言信息,其中,次级语言信息的信息颗粒度的数量级介于自然表达的信息颗粒度的数量级与文字的信息颗粒度的数量级之间,并将次级语言信息转换为标准表达;
    MAU工作站(13)将自然表达呈现给MAU人工座席(9),MAU人工座席(9)通过MAU工作站(13)输入或者选择标准表达,然后MAU工作站(13)将该标准表达发送给中央控制器(12);
    训练数据库用于存储次级语言信息和标准表达的配对数据;
    回应数据库(113)存储回应相关数据,包括供调用的标准回应数据和/或用于生成回应的数据;
    回应生成器(114)接收中央控制器(12)的指令,通过调用和/或运行回应数据库(113)中的数据来生成对用户(8)的自然表达的回应,
    其中,
    设备(1)进一步包括训练器(15),该训练器(15)用于训练机器人(14)将自然表达转换为标准表达,
    其中,训练器(15)使得机器人对于训练数据库中已有的成对的次级语言信息和标准表达,将该次级语言信息的元素的各种排列组合与该标准表达或者该标准表达的元素的各种排列组合进行循环迭代,建立次级语言信息的元素的各种排列组合与标准表达或标准表达的元素的各种排列组合之间的对应关系,获得更多的次级语言信息与标准表达的配对数据,并存储在训练数据库中。
  14. 一种基于自然智能的自然表达处理方法,其中,
    包括:
    接收第一自然表达,
    将所述第一自然表达转换为次级语言信息,
    计算将所述由第一自然表达转换的次级语言信息转换为数据库中的标准表达的自信度,
    当计算得到对于某标准表达的自信度不低于第一自信度阈值,输出该标准表达作为对所述第一自然表达进行理解的结果。
  15. 根据权利要求14所述的基于自然智能的自然表达处理方法,其中,当所述计算的自信度均低于第二自信度阈值,提示输入与所述第一自然表达具有相同含义的第二自然表达。
  16. 根据权利要求15所述的基于自然智能的自然表达处理方法,其中,将所述第二自然表达转换为次级语言信息,计算将所述由第二自然表达转换的次级语言信息转换为数据库中的标准表达的自信度,当计算得到对于某标准表达的自信度不低于所述第一自信度阈值,输出该标准表达作为对所述第一自然表达进行理解的结果。
  17. 根据权利要求14所述的基于自然智能的自然表达处理方法,其中,当计算得到的对某标准表达的自信度低于所述第一自信度阈值但不低于第二自信度阈值,提示输入第三自然表达以确认该标准表达是否对应于所述第一自然表达的含义。
  18. 根据权利要求17所述的基于自然智能的自然表达处理方法,其中,将所述第三自然表达转换为次级语言信息,计算将所述由第三自然表达转换的次级语言信息转换为表示“确认”含义的第二标准表达的自信度,如果该自信度不低于所述第一自信度阈值,输出所述第一标准表达作为对所述第一自然表达进行理解的结果。
  19. 根据权利要求18所述的基于自然智能的自然表达处理方法,其中,将所述由第一自然表达转换的次级语言信息与所述第一标准表达作为配对数据存储在所述数据库。
  20. 根据权利要求14所述的基于自然智能的自然表达处理方法,其中,如果所述计算的自信度低于所述第一自信度阈值或者其它自信度阈值,对所 述第一自然表达进行人工辅助理解或者其它人工处理。
  21. 根据权利要求14所述的基于自然智能的自然表达处理方法,其中,基于所述次级语言信息与标准表达的对应关系来计算所述自信度,通过深度神经网络、有穷状态转换器、自动编码器解码器中的一个或多个来产生对单条或多条标准表达的对数概率或相类似分数,再利用归一化的指数函数来计算出对该条或该多条标准表达的自信度。
  22. 根据权利要求14所述的基于自然智能的自然表达处理方法,其中,所述次级语言信息的信息颗粒度的数量级小于文字的信息颗粒度的数量级。
  23. 根据权利要求22所述的基于自然智能的自然表达处理方法,其中,所述次级语言信息的信息颗粒度是文字的信息颗粒度的1/10~1/1000。
  24. 根据权利要求14所述的基于自然智能的自然表达处理方法,其中,对于所述数据库中已有的成对的次级语言信息和标准表达,将该次级语言信息的元素的各种排列组合与该标准表达或者该标准表达的元素的各种排列组合进行循环迭代,建立所述次级语言信息的元素的各种排列组合与所述标准表达或所述标准表达的元素的各种排列组合之间的对应关系,获得更多的次级语言信息与标准表达的配对数据,并存储在所述数据库中。
  25. 根据权利要求24所述的基于自然智能的自然表达处理方法,其中,用循环迭代得到的次级语言信息测试机器对于次级语言信息到标准表达的转换,并将不能被正确转换的次级语言信息及其应正确对应的标准表达写入对照表,对于后续输入的自然表达,由自然表达转换的次级语言信息先与对照表中存储的次级语言信息进行对比。
  26. 根据权利要求24所述的基于自然智能的自然表达处理方法,其中,在对次级语言信息与标准表达的配对数据进行循环迭代时,也对次级语言信息到标准表达的转换模型进行循环优化。
  27. 一种基于自然智能的自然表达处理及回应方法,其中,包括:
    通过根据权利要求14-26中任何一项的自然表达处理方法获得所述第一标准表达;
    调用或生成与所述标准表达相匹配的标准回应;
    以与所述第一自然表达对应的方式输出所述标准回应。
  28. 一种基于自然智能的自然表达处理及回应设备(1),其中,包括:对话网关(11),中央控制器(12),MAU工作站(13),机器人(14),训练数据库,回应数据库(113)和回应生成器(114),其中,
    对话网关(11)接收来自用户(8)的自然表达,发送给中央控制器(12)进行后续处理,并且将对所述自然表达的回应发送给用户(8);
    中央控制器(12)接收来自所述对话网关(11)的自然表达,并与机器人(14)以及MAU工作站(13)协同工作,将该自然表达转换为表示该自然表达的含义的标准表达,并根据所述标准表达指示回应生成器(114)生成与该标准表达对应的标准回应;
    机器人(14)根据所述中央控制器(12)的指示,将所述自然表达转换为次级语言信息,计算将由所述自然表达转换的次级语言信息转换为训练数据库中的标准表达的自信度,当计算得到对于某标准表达的自信度不低于第一自信度阈值,将所述次级语言信息转换为该标准表达;
    MAU工作站(13)将所述自然表达呈现给外部的MAU人工座席(9),MAU人工座席(9)通过MAU工作站(13)输入或者选择标准表达,然后MAU工作站(13)将该标准表达发送给中央控制器(12);
    训练数据库用于存储所述次级语言信息和所述标准表达的配对数据;
    回应数据库(113)存储回应相关数据,包括供调用的标准回应数据和/或用于生成回应的数据;
    回应生成器(114)接收中央控制器(12)的指令,通过调用和/或运行回应数据库(113)中的数据来生成对所述用户(8)的自然表达的回应。
  29. 一种基于自然智能的人机交互系统,其中,
    包括:自然表达处理及回应设备(1)和呼叫设备(2),其中,用户(8)通过呼叫设备(2)与自然表达处理及回应设备(1)通信,MAU人工座席(9)对自然表达处理及回应设备(1)进行人工操作,
    所述自然表达处理及回应设备(1)包括:对话网关(11),中央控制器(12),MAU工作站(13),机器人(14),训练数据库,回应数据库(113)和回应生成器(114),其中,
    对话网关(11)接收来自用户(8)的自然表达,发送给中央控制器(12)进行后续处理,并且将对所述自然表达的回应发送给用户(8);
    中央控制器(12)接收来自所述对话网关(11)的自然表达,并与机器人(14)以及MAU工作站(13)协同工作,将该自然表达转换为表示该自然表达的含义的标准表达,并根据所述标准表达指示回应生成器(114)生成与该标准表达对应的标准回应;
    机器人(14)根据所述中央控制器(12)的指示,将所述自然表达转换为次级语言信息,计算将由所述自然表达转换的次级语言信息转换为训练数据库中的标准表达的自信度,当计算得到对于某标准表达的自信度不低于第一自信度阈值,将所述次级语言信息转换为该标准表达;
    MAU工作站(13)将所述自然表达呈现给MAU人工座席(9),MAU人工座席(9)通过MAU工作站(13)输入或者选择标准表达,然后MAU工作站(13)将该标准表达发送给中央控制器(12);
    训练数据库用于存储所述次级语言信息和所述标准表达的配对数据;
    回应数据库(113)存储回应相关数据,包括供调用的标准回应数据和/或用于生成回应的数据;
    回应生成器(114)接收中央控制器(12)的指令,通过调用和/或运行回应数据库(113)中的数据来生成对所述用户(8)的自然表达的回应。
  30. 一种基于自然智能的自然表达处理方法,其中,
    包括:
    在数据库中设置分别与多个意图对应的多个标准表达,
    接收自然表达,
    将所述自然表达转换为次级语言信息,
    从所述次级语言信息获取与多个所述意图对应的部分,
    将所述获取的与多个所述意图对应的次级语言信息的部分分别转换为所述标准表达,
    其中,次级语言信息的信息颗粒度的数量级小于文字的信息颗粒度的数量级。
  31. The natural expression processing method based on natural intelligence according to claim 30, wherein the secondary language information converted from the natural expression and the plurality of standard expressions, respectively corresponding to a plurality of intents, converted from that secondary language information are stored in the database as paired data; the various permutations and combinations of the elements of that secondary language information are iterated cyclically against the combination of the plurality of standard expressions or against the various permutations and combinations of the elements of that combination, correspondences are established between the various permutations and combinations of the elements of the secondary language information and the combination of the plurality of standard expressions or the various permutations and combinations of the elements of that combination, and additional paired data of secondary language information and standard expression combinations are thereby obtained and stored in the database.
  32. The natural expression processing method based on natural intelligence according to claim 31, wherein the secondary language information obtained by the cyclic iteration is used to test the machine's conversion from secondary language information to standard expressions, and any secondary language information that cannot be converted correctly is written, together with the standard expression to which it should correctly correspond, into a comparison table; for a subsequently input natural expression, the secondary language information converted from that natural expression is first compared with the secondary language information stored in the comparison table.
  33. The natural expression processing method based on natural intelligence according to claim 31, wherein, while the paired data of secondary language information and standard expressions are iterated cyclically, the conversion model from secondary language information to standard expressions is also optimized cyclically.
  34. The natural expression processing method based on natural intelligence according to claim 30, wherein, after secondary language information is obtained from an input natural expression, that secondary language information is compared with the secondary language information already present in the database, and the standard expression or combination of standard expressions corresponding to that secondary language information is determined according to the comparison result, and/or the probability of correctly mapping that secondary language information to a certain standard expression is computed,
    if the machine's understanding capability is not yet mature enough, being insufficient or too uncertain to convert that secondary language information into a certain standard expression, manually assisted understanding is performed,
    the input natural expression is understood manually to obtain the standard expression or combination of standard expressions corresponding to one or more intents, and the secondary language information obtained from that natural expression is associated with the standard expression or combination of standard expressions, or the natural expression is associated with the standard expression or combination of standard expressions, so that new paired data are obtained and stored in the database.
  35. The natural expression processing method based on natural intelligence according to claim 34, wherein, for the new paired data of secondary language information and a standard expression or combination of standard expressions, or the new paired data of a natural expression and a standard expression or combination of standard expressions, the various permutations and combinations of the elements of the secondary language information therein, or of the secondary language information converted from the natural expression, are iterated cyclically against the standard expression or combination of standard expressions itself or against the various permutations and combinations of its elements, correspondences are established between the various permutations and combinations of the elements of the secondary language information and the standard expression or combination of standard expressions itself or the various permutations and combinations of its elements, and additional paired data of secondary language information and standard expressions or combinations of standard expressions are thereby obtained and stored in the database.
  36. The natural expression processing method based on natural intelligence according to claim 34, wherein erroneous correspondences between secondary language information and standard expressions or combinations of standard expressions in the database are corrected through manually assisted understanding.
  37. The natural expression processing method based on natural intelligence according to claim 34, wherein the machine's understanding capability is measured by a confidence, the confidence being computed on the basis of the correspondence between secondary language information and standard expressions.
  38. The natural expression processing method based on natural intelligence according to claim 37, wherein, after the secondary language information is obtained from the natural expression, one or more of a deep neural network, a finite-state transducer and an auto-encoder-decoder is used to produce log-probabilities or similar scores for one or more standard expressions, and a normalized exponential function is then used to compute the confidence for that standard expression or those standard expressions.
  39. The natural expression processing method based on natural intelligence according to claim 30, wherein the information granularity of the secondary language information is 1/10 to 1/1000 of the information granularity of text.
  40. The natural expression processing method based on natural intelligence according to claim 30, wherein the parts corresponding to a plurality of the intents are obtained from the secondary language information through multiple passes of understanding or multiple rounds of conversation.
  41. The natural expression processing method based on natural intelligence according to claim 30, wherein a plurality of superordinate intents are set in the database and a plurality of subordinate intents are set under each superordinate intent, and, in a single intent-acquisition operation, the parts corresponding to the respective subordinate intents of different superordinate intents are obtained from the secondary language information and converted into standard expressions.
  42. The natural expression processing method based on natural intelligence according to claim 30, wherein, for the standard expression corresponding to one of the plurality of intents, or for the combination of standard expressions corresponding to a subset of the plurality of intents, that standard expression and the natural expression or secondary language information corresponding to it are stored in the database in advance as paired training data, or that combination of standard expressions and the natural expression or secondary language information corresponding to the combination are stored as paired training data, and training is carried out with these paired training data.
  43. A natural expression processing and response method based on natural intelligence, comprising:
    obtaining a standard expression or a combination of standard expressions by the natural expression processing method according to any one of claims 30-42;
    retrieving or generating a standard response matching the standard expression or the combination of standard expressions;
    outputting the standard response in a manner corresponding to the natural expression.
  44. The natural expression processing and response method according to claim 43, wherein the standard response is fixed data stored in advance in a response database, or the standard response is generated on the basis of variable parameters and base data of standard responses stored in advance in the response database.
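Claim 44 distinguishes fixed standard responses from responses generated out of stored base data plus variable parameters. A minimal sketch of such a response store follows; the keys and template text are illustrative assumptions.

```python
RESPONSE_DB = {
    # Fixed standard response, stored ahead of time.
    "GREETING": "Hello, how can I help you?",
    # Base data for a response that is generated with variable parameters.
    "BALANCE_RESULT": "Your current balance is {balance} yuan.",
}

def generate_response(standard_expression, **params):
    """Retrieve a fixed response, or fill base data with variable parameters."""
    base = RESPONSE_DB[standard_expression]
    return base.format(**params) if params else base

print(generate_response("GREETING"))
print(generate_response("BALANCE_RESULT", balance="1,024.50"))
```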
  45. A natural expression processing and response device (1) based on natural intelligence, comprising a dialogue gateway (11), a central controller (12), an MAU workstation (13), a robot (14), a training database (expression database), a response database (113) and a response generator (114), wherein
    the dialogue gateway (11) receives a natural expression from a user (8), sends it to the central controller (12) for further processing, and sends the response to the natural expression to the user (8);
    the central controller (12) receives the natural expression from the dialogue gateway (11), works together with the robot (14) and the MAU workstation (13) to convert the natural expression into a plurality of standard expressions corresponding to a plurality of configured intents, and, according to the standard expressions, instructs the response generator (114) to generate the standard responses corresponding to the standard expressions;
    the robot (14), as instructed by the central controller (12), converts the natural expression into secondary language information, obtains from the secondary language information the parts corresponding to a plurality of the intents, and converts the obtained parts of the secondary language information corresponding to a plurality of the intents respectively into the standard expressions, wherein the order of magnitude of the information granularity of the secondary language information is smaller than the order of magnitude of the information granularity of text;
    the MAU workstation (13) presents the natural expression to an external MAU human agent (9), the MAU human agent (9) inputs or selects a standard expression through the MAU workstation (13), and the MAU workstation (13) then sends that standard expression to the central controller (12);
    the training database stores paired data of secondary language information and standard expressions or combinations of standard expressions;
    the response database (113) stores response-related data, including standard response data to be retrieved and/or data used to generate responses;
    the response generator (114) receives instructions from the central controller (12) and generates the response to the natural expression of the user (8) by retrieving and/or running data in the response database (113); and
    the device further comprises a trainer (15), the trainer (15) being used to train the robot (14) to convert the natural expression into the standard expressions or combination of standard expressions.
  46. A human-computer interaction system based on natural intelligence, comprising a natural expression processing and response device (1) and a calling device (2), wherein a user (8) communicates with the natural expression processing and response device (1) through the calling device (2), and an MAU human agent (9) operates the natural expression processing and response device (1) manually,
    the natural expression processing and response device (1) comprising a dialogue gateway (11), a central controller (12), an MAU workstation (13), a robot (14), a training database (expression database), a response database (113) and a response generator (114), wherein
    the dialogue gateway (11) receives a natural expression from the user (8), sends it to the central controller (12) for further processing, and sends the response to the natural expression to the user (8);
    the central controller (12) receives the natural expression from the dialogue gateway (11), works together with the robot (14) and the MAU workstation (13) to convert the natural expression into a plurality of standard expressions corresponding to a plurality of configured intents, and, according to the standard expressions, instructs the response generator (114) to generate the standard responses corresponding to the standard expressions;
    the robot (14), as instructed by the central controller (12), converts the natural expression into secondary language information, obtains from the secondary language information the parts corresponding to a plurality of the intents, and converts the obtained parts of the secondary language information corresponding to a plurality of the intents respectively into the standard expressions, wherein the order of magnitude of the information granularity of the secondary language information is smaller than the order of magnitude of the information granularity of text;
    the MAU workstation (13) presents the natural expression to an external MAU human agent (9), the MAU human agent (9) inputs or selects a standard expression through the MAU workstation (13), and the MAU workstation (13) then sends that standard expression to the central controller (12);
    the training database stores paired data of secondary language information and standard expressions or combinations of standard expressions;
    the response database (113) stores response-related data, including standard response data to be retrieved and/or data used to generate responses;
    the response generator (114) receives instructions from the central controller (12) and generates the response to the natural expression of the user (8) by retrieving and/or running data in the response database (113); and
    the device further comprises a trainer (15), the trainer (15) being used to train the robot (14) to convert the natural expression into the standard expressions or combination of standard expressions.
  47. A natural expression processing method based on natural intelligence, comprising:
    receiving and storing a natural expression,
    converting the natural expression into secondary language information,
    computing the confidence of converting the secondary language information converted from the natural expression into a standard expression in a database,
    when the confidence computed for a first standard expression is not lower than a first confidence threshold, outputting the first standard expression as the result of understanding the natural expression;
    when the confidence is lower than the first confidence threshold, a silent agent understands the stored natural expression,
    when the silent agent is able to understand the natural expression, the silent agent inputs a second standard expression obtained from that understanding;
    when the silent agent is unable to understand the natural expression, the silent agent prompts for a natural expression having the same meaning to be input again, or the stored natural expression is passed to a senior agent to be understood and answered.
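The routing recited in claim 47 (machine understanding first, then the silent agent, then the senior agent) can be sketched as follows; `robot`, `silent_agent` and `senior_agent` are placeholders for interfaces the surrounding system would supply, and the threshold is illustrative.

```python
def route(natural_expression, robot, silent_agent, senior_agent, first_threshold=0.8):
    """One pass over a stored natural expression: machine understanding first,
    then the silent agent, then the senior agent."""
    standard_expression, confidence = robot(natural_expression)
    if confidence >= first_threshold:
        return standard_expression                 # accepted machine result
    understood = silent_agent(natural_expression)  # silent agent tries to understand
    if understood is not None:
        return understood                          # the second standard expression
    # Silent agent failed: either prompt for a rephrased input or escalate.
    return senior_agent(natural_expression)

# Illustrative stand-ins for the three parties.
print(route("dummy audio",
            robot=lambda x: ("CHECK_BALANCE", 0.42),
            silent_agent=lambda x: None,
            senior_agent=lambda x: "ANSWERED_BY_SENIOR_AGENT"))
```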
  48. The natural expression processing method based on natural intelligence according to claim 47, wherein a knowledge-base designer carries out back-end construction of dialogue scripts on the basis of the senior agent's answers to the natural expressions that the silent agent was unable to understand.
  49. The natural expression processing method based on natural intelligence according to claim 47, wherein the secondary language information converted from the natural expression and the second standard expression are stored in the database as paired data.
  50. The natural expression processing method based on natural intelligence according to claim 47, wherein the confidence is computed on the basis of the correspondence between the secondary language information and the standard expressions: one or more of a deep neural network, a finite-state transducer and an auto-encoder-decoder is used to produce log-probabilities or similar scores for one or more standard expressions, and a normalized exponential function is then used to compute the confidence for that standard expression or those standard expressions.
  51. The natural expression processing method based on natural intelligence according to claim 47, wherein the order of magnitude of the information granularity of the secondary language information is smaller than the order of magnitude of the information granularity of text.
  52. The natural expression processing method based on natural intelligence according to claim 51, wherein the information granularity of the secondary language information is 1/10 to 1/1000 of the information granularity of text.
  53. The natural expression processing method based on natural intelligence according to claim 47, wherein, for a pair of secondary language information and a standard expression already stored in the database, the various permutations and combinations of the elements of the secondary language information are iterated cyclically against the standard expression or against the various permutations and combinations of the elements of the standard expression, correspondences are established between the various permutations and combinations of the elements of the secondary language information and the standard expression or the various permutations and combinations of its elements, and additional paired data of secondary language information and standard expressions are thereby obtained and stored in the database.
  54. The natural expression processing method based on natural intelligence according to claim 53, wherein the secondary language information obtained by the cyclic iteration is used to test the machine's conversion from secondary language information to standard expressions, and any secondary language information that cannot be converted correctly is written, together with the standard expression to which it should correctly correspond, into a comparison table; for a subsequently input natural expression, the secondary language information converted from that natural expression is first compared with the secondary language information stored in the comparison table.
  55. The natural expression processing method based on natural intelligence according to claim 53, wherein, while the paired data of secondary language information and standard expressions are iterated cyclically, the conversion model from secondary language information to standard expressions is also optimized cyclically.
  56. A natural expression processing and response method based on natural intelligence, comprising:
    obtaining the first standard expression or the second standard expression by the natural expression processing method according to any one of claims 47-55;
    retrieving or generating a standard response matching the first standard expression or the second standard expression;
    outputting the standard response in a manner corresponding to the natural expression.
  57. A natural expression processing and response device (1) based on natural intelligence, comprising a dialogue gateway (11), a central controller (12), an MAU workstation (13), a robot (14), a training database, a response database (113) and a response generator (114), wherein
    the dialogue gateway (11) receives a natural expression from a user (8), sends it to the central controller (12) for further processing, and sends the response to the natural expression to the user (8);
    the central controller (12) receives the natural expression from the dialogue gateway (11), works together with the robot (14) and the MAU workstation (13) to convert the natural expression into a standard expression representing the meaning of the natural expression, and, according to the standard expression, instructs the response generator (114) to generate a standard response corresponding to the standard expression;
    the robot (14), as instructed by the central controller (12), converts the natural expression into secondary language information, computes the confidence of converting the secondary language information converted from the natural expression into a standard expression in the training database, and, when the confidence computed for a first standard expression is not lower than a first confidence threshold, converts the secondary language information into the first standard expression;
    the MAU workstation (13) presents the natural expression to an external MAU human agent (9), the MAU human agent (9) comprising a silent agent and a senior agent, the silent agent inputs or selects a standard expression through the MAU workstation (13), and the MAU workstation (13) then sends that standard expression to the central controller (12); when the computed confidence is lower than the first confidence threshold, the silent agent understands the stored natural expression; when the silent agent is able to understand the natural expression, the silent agent inputs a second standard expression obtained from that understanding; when the silent agent is unable to understand the natural expression, the silent agent prompts the user (8) to input again a natural expression having the same meaning, or the stored natural expression is passed to the senior agent to be understood and answered;
    the training database stores paired data of the secondary language information and the standard expressions;
    the response database (113) stores response-related data, including standard response data to be retrieved and/or data used to generate responses;
    the response generator (114) receives instructions from the central controller (12) and generates the response to the natural expression of the user (8) by retrieving and/or running data in the response database (113).
  58. A human-computer interaction system based on natural intelligence, comprising a natural expression processing and response device (1) and a calling device (2), wherein a user (8) communicates with the natural expression processing and response device (1) through the calling device (2), and an MAU human agent (9) operates the natural expression processing and response device (1) manually,
    the natural expression processing and response device (1) comprising a dialogue gateway (11), a central controller (12), an MAU workstation (13), a robot (14), a training database, a response database (113) and a response generator (114), wherein
    the dialogue gateway (11) receives a natural expression from the user (8), sends it to the central controller (12) for further processing, and sends the response to the natural expression to the user (8);
    the central controller (12) receives the natural expression from the dialogue gateway (11), works together with the robot (14) and the MAU workstation (13) to convert the natural expression into a standard expression representing the meaning of the natural expression, and, according to the standard expression, instructs the response generator (114) to generate a standard response corresponding to the standard expression;
    the robot (14), as instructed by the central controller (12), converts the natural expression into secondary language information, computes the confidence of converting the secondary language information converted from the natural expression into a standard expression in the training database, and, when the confidence computed for a first standard expression is not lower than a first confidence threshold, converts the secondary language information into the first standard expression;
    the MAU workstation (13) presents the natural expression to the MAU human agent (9), the MAU human agent (9) comprising a silent agent and a senior agent, the silent agent inputs or selects a standard expression through the MAU workstation (13), and the MAU workstation (13) then sends that standard expression to the central controller (12); when the computed confidence is lower than the first confidence threshold, the silent agent understands the stored natural expression; when the silent agent is able to understand the natural expression, the silent agent inputs a second standard expression obtained from that understanding; when the silent agent is unable to understand the natural expression, the silent agent prompts the user (8) to input again a natural expression having the same meaning, or the stored natural expression is passed to the senior agent to be understood and answered;
    the training database stores paired data of the secondary language information and the standard expressions;
    the response database (113) stores response-related data, including standard response data to be retrieved and/or data used to generate responses;
    the response generator (114) receives instructions from the central controller (12) and generates the response to the natural expression of the user (8) by retrieving and/or running data in the response database (113).
  59. A method for training a human-computer interaction system based on natural intelligence, comprising:
    generating text scripts corresponding to standard expressions,
    obtaining, through a text-to-speech tool, speech corresponding to the text scripts,
    converting each piece of speech into secondary language information,
    wherein the order of magnitude of the information granularity of the secondary language information is smaller than the order of magnitude of the information granularity of text,
    the secondary language information and the standard expression corresponding to it are stored in a database as paired data; for a pair of secondary language information and a standard expression already stored in the database, the various permutations and combinations of the elements of the secondary language information are iterated cyclically against the standard expression or against the various permutations and combinations of the elements of the standard expression, correspondences are established between the various permutations and combinations of the elements of the secondary language information and the standard expression or the various permutations and combinations of its elements, and additional paired data of secondary language information and standard expressions are thereby obtained and stored in the database.
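A sketch of the training-data generation of claim 59 follows; `scripts_for`, `tts` and `to_secondary` are placeholders for the script generator, the text-to-speech tool and the secondary-language converter, and the rate values only illustrate the kind of parameter variation recited later in claim 68.

```python
def build_training_pairs(standard_expressions, scripts_for, tts, to_secondary,
                         rates=(0.9, 1.0, 1.1)):
    """Generate (secondary language information, standard expression) pairs
    from text scripts via a text-to-speech tool."""
    pairs = []
    for expression in standard_expressions:
        for script in scripts_for(expression):        # text scripts for this expression
            for rate in rates:                         # vary speech rate (cf. claim 68)
                audio = tts(script, rate=rate)
                pairs.append((to_secondary(audio), expression))
    return pairs

# Illustrative stand-ins for the three tools.
pairs = build_training_pairs(
    ["CHECK_BALANCE"],
    scripts_for=lambda e: ["check my balance", "what is my balance"],
    tts=lambda text, rate: f"<audio:{text}@{rate}>",
    to_secondary=lambda audio: ("s1", "s2", audio),
)
print(len(pairs), pairs[0])
```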
  60. The method for training a human-computer interaction system based on natural intelligence according to claim 59, wherein
    speech is input,
    the input speech is converted into secondary language information,
    the secondary language information converted from the input speech is compared with the secondary language information already present in the database, and the standard expression corresponding to that secondary language information is determined according to the comparison result, and/or the probability of correctly mapping that secondary language information to a certain standard expression is computed,
    if the machine's understanding capability is not yet mature enough, being insufficient or too uncertain to convert that secondary language information into a certain standard expression, manually assisted understanding is performed,
    the input speech is understood manually to obtain a standard expression, and the secondary language information obtained from that speech is associated with the standard expression, so that new paired data are obtained and stored in the database.
  61. The method for training a human-computer interaction system based on natural intelligence according to claim 60, wherein, for the new paired data of secondary language information and a standard expression or combination of standard expressions, or the new paired data of a natural expression and a standard expression or combination of standard expressions, the various permutations and combinations of the elements of the secondary language information therein, or of the secondary language information converted from the natural expression, are iterated cyclically against the standard expression or combination of standard expressions itself or against the various permutations and combinations of its elements, correspondences are established between the various permutations and combinations of the elements of the secondary language information and the standard expression or combination of standard expressions itself or the various permutations and combinations of its elements, and additional paired data of secondary language information and standard expressions or combinations of standard expressions are thereby obtained and stored in the database.
  62. The method for training a human-computer interaction system based on natural intelligence according to claim 60, wherein erroneous correspondences between secondary language information and standard expressions or combinations of standard expressions in the database are corrected through manually assisted understanding.
  63. The method for training a human-computer interaction system based on natural intelligence according to claim 60, wherein the machine's understanding capability is measured by a confidence, the confidence being computed on the basis of the correspondence between secondary language information and standard expressions.
  64. The method for training a human-computer interaction system based on natural intelligence according to claim 63, wherein, after the secondary language information is obtained from the natural expression, one or more of a deep neural network, a finite-state transducer and an auto-encoder-decoder is used to produce log-probabilities or similar scores for one or more standard expressions, and a normalized exponential function is then used to compute the confidence for that standard expression or those standard expressions.
  65. The method for training a human-computer interaction system based on natural intelligence according to claim 59, wherein the information granularity of the secondary language information is 1/10 to 1/1000 of the information granularity of text.
  66. The method for training a human-computer interaction system based on natural intelligence according to claim 59, wherein the secondary language information obtained by the cyclic iteration is used to test the machine's conversion from secondary language information to standard expressions, and any secondary language information that cannot be converted correctly is written, together with the standard expression to which it should correctly correspond, into a comparison table; for a subsequently input natural expression, the secondary language information converted from that natural expression is first compared with the secondary language information stored in the comparison table.
  67. The method for training a human-computer interaction system based on natural intelligence according to claim 59, wherein, while the paired data of secondary language information and standard expressions are iterated cyclically, the conversion model from secondary language information to standard expressions is also optimized cyclically.
  68. The method for training a human-computer interaction system based on natural intelligence according to claim 59, wherein one or more of the speech rate, volume, tone and intonation of the speech are adjusted and varied through the text-to-speech tool.
  69. A speech processing and response device (1) based on natural intelligence, comprising a dialogue gateway (11), a central controller (12), an MAU workstation (13), a robot (14), a training database, a response database (113), a response generator (114) and a text-to-speech converter, wherein the dialogue gateway (11) receives speech from a user (8), sends it to the central controller (12) for further processing, and sends the response to the speech to the user (8);
    the central controller (12) receives the speech from the dialogue gateway (11), works together with the robot (14) and the MAU workstation (13) to convert the speech into a standard expression representing the meaning of the speech, and, according to the standard expression, instructs the response generator (114) to generate a standard response corresponding to the standard expression;
    the robot (14), as instructed by the central controller (12), converts the speech into secondary language information, the order of magnitude of the information granularity of the secondary language information being smaller than the order of magnitude of the information granularity of text, and converts the secondary language information into the standard expression;
    the MAU workstation (13) presents the speech to an external MAU human agent (9), the MAU human agent (9) inputs or selects a standard expression through the MAU workstation (13), and the MAU workstation (13) then sends that standard expression to the central controller (12);
    the training database stores paired data of the secondary language information and the standard expressions;
    the response database (113) stores response-related data, including standard response data to be retrieved and/or data used to generate responses;
    the response generator (114) receives instructions from the central controller (12) and generates the response to the speech of the user (8) by retrieving and/or running data in the response database (113);
    the text-to-speech converter generates, on the basis of a text script corresponding to a standard expression, speech corresponding to that text script, and the robot (14) converts the speech obtained by the text-to-speech converter into secondary language information and stores that secondary language information and the standard expression corresponding to the respective text as paired data in the training database,
    wherein
    the device (1) further comprises a trainer (15), the trainer (15) being used to train the robot (14) to convert the speech into the standard expression,
    and wherein the robot (14) iterates cyclically the various permutations and combinations of the elements of the secondary language information against the corresponding standard expression or against the various permutations and combinations of the elements of that standard expression, establishes correspondences between the various permutations and combinations of the elements of the secondary language information and the standard expression or the various permutations and combinations of its elements, and stores the paired data of secondary language information and standard expressions thus obtained in the training database.
  70. A human-computer interaction system based on natural intelligence, comprising a natural expression processing and response device (1) and a calling device (2), wherein a user (8) communicates with the natural expression processing and response device (1) through the calling device (2), and an MAU human agent (9) operates the natural expression processing and response device (1) manually,
    the natural expression processing and response device (1) comprising a dialogue gateway (11), a central controller (12), an MAU workstation (13), a robot (14), a training database, a response database (113), a response generator (114) and a text-to-speech converter, wherein
    the dialogue gateway (11) receives speech from the user (8), sends it to the central controller (12) for further processing, and sends the response to the speech to the user (8);
    the central controller (12) receives the speech from the dialogue gateway (11), works together with the robot (14) and the MAU workstation (13) to convert the speech into a standard expression representing the meaning of the speech, and, according to the standard expression, instructs the response generator (114) to generate a standard response corresponding to the standard expression;
    the robot (14), as instructed by the central controller (12), converts the speech into secondary language information, the order of magnitude of the information granularity of the secondary language information being smaller than the order of magnitude of the information granularity of text, and converts the secondary language information into the standard expression;
    the MAU workstation (13) presents the speech to an external MAU human agent (9), the MAU human agent (9) inputs or selects a standard expression through the MAU workstation (13), and the MAU workstation (13) then sends that standard expression to the central controller (12);
    the training database stores paired data of the secondary language information and the standard expressions;
    the response database (113) stores response-related data, including standard response data to be retrieved and/or data used to generate responses;
    the response generator (114) receives instructions from the central controller (12) and generates the response to the speech of the user (8) by retrieving and/or running data in the response database (113);
    the text-to-speech converter generates, on the basis of a text script corresponding to a standard expression, speech corresponding to that text script, and the robot (14) converts the speech obtained by the text-to-speech converter into secondary language information and stores that secondary language information and the standard expression corresponding to the respective text as paired data in the training database,
    wherein
    the device (1) further comprises a trainer (15), the trainer (15) being used to train the robot (14) to convert the speech into the standard expression,
    and wherein the robot (14) iterates cyclically the various permutations and combinations of the elements of the secondary language information against the corresponding standard expression or against the various permutations and combinations of the elements of that standard expression, establishes correspondences between the various permutations and combinations of the elements of the secondary language information and the standard expression or the various permutations and combinations of its elements, and stores the paired data of secondary language information and standard expressions thus obtained in the training database.
  71. A method for training a robot, comprising:
    training the robot with correctly paired data of expression data and intent data in a training database;
    having the robot understand these expression data and comparing the understanding results with the correctly paired intent data to find the expression data that were understood incorrectly;
    writing the incorrectly understood expression data, together with the intent data corresponding to them, into a comparison table independent of the training database,
    wherein, in subsequent understanding, the robot first compares the expression data to be understood with the expression data in the comparison table; if the expression data are found in the comparison table, the corresponding understanding result is obtained directly through the comparison table, and if the expression data are not found in the comparison table, the comparison is then carried out in the training database.
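The training flow of claim 71 amounts to: run the trained robot over correctly paired data, collect its mistakes in a table kept outside the training database, and consult that table first at understanding time. A minimal sketch follows, with `robot_understand` standing in for the trained model and all data values illustrative.

```python
def build_comparison_table(robot_understand, training_pairs):
    """Run the trained robot over correctly paired data and collect the
    expression data it still gets wrong, with the correct intent data."""
    table = {}
    for expression_data, intent_data in training_pairs:
        if robot_understand(expression_data) != intent_data:
            table[expression_data] = intent_data
    return table

def understand(expression_data, comparison_table, robot_understand):
    # Comparison table first, trained model (training-database match) second.
    if expression_data in comparison_table:
        return comparison_table[expression_data]
    return robot_understand(expression_data)

# Illustrative model that always answers "OTHER".
pairs = [(("s1", "s2"), "CHECK_BALANCE"), (("s3",), "OTHER")]
model = lambda x: "OTHER"
table = build_comparison_table(model, pairs)
print(understand(("s1", "s2"), table, model))   # -> CHECK_BALANCE
```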
  72. The method for training a robot according to claim 71, wherein the expression data are secondary language information converted from natural expressions.
  73. The method for training a robot according to claim 72, wherein the information granularity of the secondary language information is 1/10 to 1/1000 of the information granularity of text.
  74. The method for training a robot according to claim 71, wherein, for a pair of expression data and intent data already stored in the training database, the various permutations and combinations of the elements of the expression data are iterated cyclically against the intent data or against the various permutations and combinations of the elements of the intent data, correspondences are established between the various permutations and combinations of the elements of the expression data and the intent data or the various permutations and combinations of the elements of the intent data, and additional paired data of expression data and intent data are thereby obtained and stored in the training database.
  75. The method for training a robot according to claim 74, wherein, while the paired data of expression data and intent data are iterated cyclically, the conversion model from expression data to intent data is also optimized cyclically.
  76. The method for training a robot according to claim 71, wherein the comparison table is also used to store expression data that occur with relatively high probability, together with the intent data corresponding to them.
  77. The method for training a robot according to claim 71, wherein erroneous correspondences between expression data and intent data in the training database are corrected through manually assisted understanding.
  78. The method for training a robot according to claim 71, wherein a script corresponding to the intent data is generated, a natural expression corresponding to that script is obtained through a conversion tool, and expression data are converted from that natural expression, thereby obtaining correctly paired data of expression data and intent data.
  79. The method for training a robot according to claim 78, wherein the script is a text script, the natural expression is speech, and one or more of the speech rate, volume, tone and intonation of the speech are adjusted and varied through a text-to-speech tool.
  80. A natural expression processing and response device (1), comprising a dialogue gateway (11), a central controller (12), an MAU workstation (13), a robot (14), a training database, a response database (113) and a response generator (114), wherein
    the dialogue gateway (11) receives a natural expression from a user (8), sends it to the central controller (12) for further processing, and sends the response to the natural expression to the user (8);
    the central controller (12) receives the natural expression from the dialogue gateway (11), works together with the robot (14) and the MAU workstation (13) to convert the natural expression into intent data representing the meaning of the natural expression, and, according to the intent data, instructs the response generator (114) to generate a standard response corresponding to the intent data;
    the robot (14), as instructed by the central controller (12), converts the natural expression into expression data and obtains the intent data corresponding to the expression data;
    the MAU workstation (13) presents the natural expression to an external MAU human agent (9), the MAU human agent (9) inputs or selects intent data through the MAU workstation (13), and the MAU workstation (13) then sends those intent data to the central controller (12);
    the training database stores paired data of expression data and intent data;
    the response database (113) stores response-related data, including standard response data to be retrieved and/or data used to generate responses;
    the response generator (114) receives instructions from the central controller (12) and generates the response to the natural expression of the user (8) by retrieving and/or running data in the response database (113),
    wherein
    the device (1) further comprises a trainer (15), the trainer (15) being used to train the robot (14) to obtain intent data from the natural expression, and the trainer (15) trains the robot (14) by the method according to any one of claims 71-79.
  81. A human-computer interaction system, comprising a natural expression processing and response device (1) and a calling device (2), wherein a user (8) communicates with the natural expression processing and response device (1) through the calling device (2), and an MAU human agent (9) operates the natural expression processing and response device (1) manually,
    the natural expression processing and response device (1) comprising a dialogue gateway (11), a central controller (12), an MAU workstation (13), a robot (14), a training database, a response database (113) and a response generator (114), wherein
    the dialogue gateway (11) receives a natural expression from the user (8), sends it to the central controller (12) for further processing, and sends the response to the natural expression to the user (8);
    the central controller (12) receives the natural expression from the dialogue gateway (11), works together with the robot (14) and the MAU workstation (13) to convert the natural expression into intent data representing the meaning of the natural expression, and, according to the intent data, instructs the response generator (114) to generate a standard response corresponding to the intent data;
    the robot (14), as instructed by the central controller (12), converts the natural expression into expression data and obtains the intent data corresponding to the expression data;
    the MAU workstation (13) presents the natural expression to an external MAU human agent (9), the MAU human agent (9) inputs or selects intent data through the MAU workstation (13), and the MAU workstation (13) then sends those intent data to the central controller (12);
    the training database stores paired data of expression data and intent data;
    the response database (113) stores response-related data, including standard response data to be retrieved and/or data used to generate responses;
    the response generator (114) receives instructions from the central controller (12) and generates the response to the natural expression of the user (8) by retrieving and/or running data in the response database (113),
    wherein
    the device (1) further comprises a trainer (15), the trainer (15) being used to train the robot (14) to obtain intent data from the natural expression, and the trainer (15) trains the robot (14) by the method according to any one of claims 71-79.
  82. An end-to-end control method, comprising:
    while an operator operates a device, collecting sensor data through sensors from the external environment of the controlled device and/or from the controlled device itself, and recording in real time the control data produced by the operator's operations;
    inputting the temporally associated sensor data and control data to a robot as paired data;
    training the robot with the paired data;
    wherein the trained robot makes judgments according to sensor data and controls the device, including:
    inputting sensor data collected through the sensors to the robot;
    the robot judging whether the control data corresponding to the sensor data can be determined from them,
    and, if the control data can be determined, the robot controlling the device according to the control data corresponding to the sensor data; if the robot cannot determine the control data, the operator operating the device.
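The record-then-assist loop of claims 82 and 83 can be sketched as follows; `read_sensors`, `read_controls`, `robot`, `operator_control` and `apply_control` are placeholders for device-specific interfaces, and the threshold and data values are illustrative.

```python
import time

def record_paired_data(read_sensors, read_controls, interval=0.1, steps=5):
    """While the operator drives, record time-aligned (sensor, control) pairs."""
    pairs = []
    for _ in range(steps):
        pairs.append((read_sensors(), read_controls()))
        time.sleep(interval)
    return pairs

def control_step(sensor_data, robot, operator_control, apply_control, threshold=0.8):
    """One control cycle: the robot acts only when confident enough; otherwise
    control stays with the human operator, whose action can be recorded as a
    new training pair (cf. claim 83)."""
    control, confidence = robot(sensor_data)
    if confidence >= threshold:
        apply_control(control)
        return control, "robot"
    manual = operator_control()
    apply_control(manual)
    return manual, "operator"

# Illustrative stand-ins for the device-specific interfaces.
result = control_step(sensor_data={"distance_m": 3.2},
                      robot=lambda s: ("BRAKE", 0.65),
                      operator_control=lambda: "BRAKE_HARD",
                      apply_control=lambda c: None)
print(result)   # -> ('BRAKE_HARD', 'operator')
```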
  83. The end-to-end control method according to claim 82, wherein, when the robot cannot determine the control data and the operator operates the device, sensor data are collected through the sensors from the external environment of the controlled device and/or from the controlled device itself, the control data produced by the operator's operations are recorded in real time, and those sensor data and control data are used as paired data to train the robot.
  84. The end-to-end control method according to claim 82, wherein, while the robot is being trained, the model that maps sensor data to control data is optimized automatically.
  85. The end-to-end control method according to claim 82, wherein the robot judges, on the basis of a confidence, whether the control data corresponding to the sensor data can be determined, wherein the confidence is computed on the basis of the correspondence between sensor data and control data: one or more of a deep neural network, a finite-state transducer and an auto-encoder-decoder is used to produce log-probabilities or similar scores for the control data, and a normalized exponential function is then used to compute the confidence for those control data.
  86. The end-to-end control method according to claim 82, wherein the sensor data include images, and the robot converts the images into secondary image information before being trained, the information granularity of the secondary image information being coarser than a pixel but finer than the information granularity used for object recognition.
  87. The end-to-end control method according to claim 82, wherein the sensor data include speech, and the robot converts the speech into secondary speech information before being trained, the order of magnitude of the information granularity of the secondary speech information being smaller than the order of magnitude of the information granularity of text.
  88. The end-to-end control method according to claim 82, wherein the sensor data and the control data are associated within a preset time interval.
  89. The end-to-end control method according to claim 88, wherein the sensor data are the change over time of one or more of the images, sounds and distances collected by the sensors within the preset time interval from the external environment of the controlled device and/or from the controlled device itself.
  90. The end-to-end control method according to claim 88, wherein the device is a vehicle, a drone or a digital processing terminal.
  91. An end-to-end control system, comprising sensors, a robot and a controller,
    wherein,
    while an operator operates a device through the controller, sensor data are collected through the sensors from the external environment of the controlled device and/or from the controlled device itself, and the control data produced by the controller as a result of the operator's operations are recorded in real time;
    the temporally associated sensor data and control data are input to the robot as paired data;
    the robot is trained with the paired data;
    wherein the trained robot makes judgments according to sensor data and controls the device, including:
    inputting sensor data collected through the sensors to the robot;
    the robot judging whether the control data corresponding to the sensor data can be determined from them,
    and, if the control data can be determined, the robot generating a control signal according to the control data corresponding to the sensor data, and the controller controlling the device according to that control signal; if the robot cannot determine the control data, the operator operating the device through the controller.
PCT/CN2020/073180 2019-01-23 2020-01-20 Natural expression processing method based on natural intelligence, response method, device and system, method for training a robot, human-computer interaction system, method for training a human-computer interaction system based on natural intelligence, end-to-end control method and control system WO2020151652A1 (zh)

Applications Claiming Priority (16)

Application Number Priority Date Filing Date Title
CN201910064402.0 2019-01-23
CN201910064406.9 2019-01-23
CN201910065177.2 2019-01-23
CN201910065177.2A CN110059167A (zh) 2019-01-23 2019-01-23 Natural expression processing method, response method, device and system of natural intelligence
CN201910065098.1 2019-01-23
CN201910064406.9A CN110059166A (zh) 2019-01-23 2019-01-23 Natural expression processing method, response method, device and system of natural intelligence
CN201910064402 2019-01-23
CN201910065178.7A CN110046232A (zh) 2019-01-23 2019-01-23 Natural expression processing method, response method, device and system of natural intelligence
CN201910065098.1A CN110008317A (zh) 2019-01-23 2019-01-23 Natural expression processing method, response method, device and system of natural intelligence
CN201910065179.1 2019-01-23
CN201910065179.1A CN110059168A (zh) 2019-01-23 2019-01-23 Method for training a human-computer interaction system based on natural intelligence
CN201910065178.7 2019-01-23
CN201910303688.3A CN110019688A (zh) 2019-01-23 2019-04-16 Method for training a robot
CN201910303688.3 2019-04-16
CN201910420711.7 2019-05-20
CN201910420711.7A CN110083110A (zh) 2019-01-23 2019-05-20 End-to-end control method and control system based on natural intelligence

Publications (1)

Publication Number Publication Date
WO2020151652A1 (zh) 2020-07-30

Family

ID=71736764

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/073180 WO2020151652A1 (zh) 2019-01-23 2020-01-20 Natural expression processing method based on natural intelligence, response method, device and system, method for training a robot, human-computer interaction system, method for training a human-computer interaction system based on natural intelligence, end-to-end control method and control system

Country Status (1)

Country Link
WO (1) WO2020151652A1 (zh)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1489086A (zh) * 2002-10-10 2004-04-14 莎 刘 Full-text translation system and method based on semantic conventions
US20070299824A1 (en) * 2006-06-27 2007-12-27 International Business Machines Corporation Hybrid approach for query recommendation in conversation systems
CN103593340A (zh) * 2013-10-28 2014-02-19 茵鲁维夫有限公司 Natural expression information processing method, processing and response method, device and system
CN110008317A (zh) * 2019-01-23 2019-07-12 艾肯特公司 Natural expression processing method, response method, device and system of natural intelligence
CN110019688A (zh) * 2019-01-23 2019-07-16 艾肯特公司 Method for training a robot
CN110046232A (zh) * 2019-01-23 2019-07-23 艾肯特公司 Natural expression processing method, response method, device and system of natural intelligence
CN110059168A (zh) * 2019-01-23 2019-07-26 艾肯特公司 Method for training a human-computer interaction system based on natural intelligence
CN110059166A (zh) * 2019-01-23 2019-07-26 艾肯特公司 Natural expression processing method, response method, device and system of natural intelligence
CN110059167A (zh) * 2019-01-23 2019-07-26 艾肯特公司 Natural expression processing method, response method, device and system of natural intelligence
CN110083110A (zh) * 2019-01-23 2019-08-02 艾肯特公司 End-to-end control method and control system based on natural intelligence

Similar Documents

Publication Publication Date Title
US10977452B2 (en) Multi-lingual virtual personal assistant
US9753914B2 (en) Natural expression processing method, processing and response method, device, and system
US20210233521A1 (en) Method for speech recognition based on language adaptivity and related apparatus
JP7022062B2 (ja) 統合化された物体認識および顔表情認識を伴うvpa
CN110083110A (zh) 基于自然智能的端到端控制方法和控制系统
CN110838288A (zh) 一种语音交互方法及其系统和对话设备
US9361589B2 (en) System and a method for providing a dialog with a user
CN115329779B (zh) 一种多人对话情感识别方法
CN110059166A (zh) 自然智能的自然表达处理方法、回应方法、设备及系统
CN110046232A (zh) 自然智能的自然表达处理方法、回应方法、设备及系统
EP2879062A2 (en) A system and a method for providing a dialog with a user
CN114722839B (zh) 人机协同对话交互系统及方法
CN110059168A (zh) 对基于自然智能的人机交互系统进行训练的方法
US20210165974A1 (en) Artificial intelligence apparatus for learning natural language understanding models
CN110059167A (zh) 自然智能的自然表达处理方法、回应方法、设备及系统
CN110008317A (zh) 自然智能的自然表达处理方法、回应方法、设备及系统
CN116450799B (zh) 一种应用于交通管理服务的智能对话方法及设备
CN117216212A (zh) 对话处理方法、对话模型训练方法、装置、设备及介质
WO2020151652A1 (zh) 基于自然智能的自然表达处理方法、回应方法、设备及系统,对机器人进行训练的方法,人机交互系统,对基于自然智能的人机交互系统进行训练的方法,端到端控制方法和控制系统
CN115221306B (zh) 自动应答评价方法及装置
Araki et al. Spoken dialogue system for learning Braille
Schuller et al. Speech communication and multimodal interfaces
Ge et al. Dialogue management based on sentence clustering
CN117809616A (zh) 一种服务器、显示设备及语音交互方法
CN117809681A (zh) 一种服务器、显示设备及数字人交互方法

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 20744767; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 20744767; Country of ref document: EP; Kind code of ref document: A1)