CN110298372B - Method and system for automatically training virtual assistant - Google Patents

Publication number
CN110298372B
Authority
CN
China
Prior art keywords
corpus
training
query
query data
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810244565.2A
Other languages
Chinese (zh)
Other versions
CN110298372A (en
Inventor
周忠信
吴兆麟
许旭正
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Digiwin Software Co Ltd
Original Assignee
Digiwin Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Digiwin Software Co Ltd filed Critical Digiwin Software Co Ltd
Priority to CN201810244565.2A priority Critical patent/CN110298372B/en
Publication of CN110298372A publication Critical patent/CN110298372A/en
Application granted granted Critical
Publication of CN110298372B publication Critical patent/CN110298372B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/29Graphical models, e.g. Bayesian networks
    • G06F18/295Markov models or related models, e.g. semi-Markov models; Markov random fields; Networks embedding Markov models
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

A method and system for automatically training a virtual assistant. The method for automatically training the virtual assistant comprises the following steps: analyzing the data structure of the enterprise database to form a domain knowledge database and analyzing the workflow of the enterprise resource system to form an application knowledge database; establishing a query data operation corpus generator by using a domain knowledge database, and establishing an execution instruction operation corpus generator by using an application knowledge database; generating a plurality of query data operation training corpuses by using a query data operation corpus generator, and generating a plurality of execution instruction operation training corpuses by using an execution instruction operation corpus generator to form a training corpus set; forming a plurality of system domain vocabularies and a plurality of service application parameters into a key entity set; and generating a common vocabulary model and a common semantic model by utilizing the key entity set and the training corpus set. Therefore, the effects of quickly training and updating the virtual assistant are achieved.

Description

Method and system for automatically training virtual assistant
Technical Field
The present disclosure relates to a method and system for training a virtual assistant, and more particularly, to a method and system for automatically training a virtual assistant.
Background
An enterprise resource planning (ERP) system is a management platform, built on information technology, that supports decision-making by an enterprise's management. It manages the enterprise's flows of personnel, goods, information and funds in a unified way so that the enterprise's resources are used to the fullest extent. Because an ERP system covers three major functions, namely production control, logistics management and financial management, it is very large in scale.
In modern life, a virtual assistant (or intelligent assistant) lets a user communicate with an electronic product directly in natural language, spoken or written, which gives the user a more convenient and faster way to interact. To apply a virtual assistant to an ERP system, the virtual assistant must be trained on the vocabulary and functions commonly used in the ERP system. Training a virtual assistant, however, requires natural-language training corpuses in addition to a database, which means that someone must keep supplying training corpuses by conversing with the virtual assistant before it becomes able to interact with people. How to train a virtual assistant quickly, so that it has both the relevant knowledge of the ERP system and the ability to interact with people, is therefore a problem to be solved in this field.
Disclosure of Invention
The main object of the invention is to provide a method and a system for automatically training a virtual assistant that automatically generate natural-language training corpuses, so that the virtual assistant can be trained with those corpuses and can thereby be trained and updated quickly.
To achieve the above object, in a first aspect, a method for automatically training a virtual assistant is provided, the method comprising the steps of: analyzing the data structure of the enterprise database to form a domain knowledge database and analyzing the workflow of the enterprise resource system to form an application knowledge database; establishing a query data operation corpus generator by using a domain knowledge database, and establishing an execution instruction operation corpus generator by using an application knowledge database; generating a plurality of query data operation training corpuses by using a query data operation corpus generator, and generating a plurality of execution instruction operation training corpuses by using an execution instruction operation corpus generator to form a training corpus set; forming a plurality of system domain vocabularies and a plurality of service application parameters into a key entity set; and generating a common vocabulary model and a common semantic model by utilizing the key entity set and the training corpus set.
According to an embodiment of the present application, generating the common vocabulary model and the common semantic model by using the key entity set and the training corpus set further includes: differentiating the intents of the plurality of query data operation training corpuses according to categories in the enterprise database to form a plurality of query data operation intents, and differentiating the intents of the plurality of execution instruction operation training corpuses according to service behaviors provided by the enterprise resource system to form a plurality of execution instruction operation intents; establishing templates of the plurality of query data operation intents and templates of the plurality of execution instruction operation intents; establishing an overall database according to the key entity set, the templates of the plurality of query data operation intents and the templates of the plurality of execution instruction operation intents; identifying a plurality of first probabilities with which the plurality of system domain vocabularies in the key entity set occur in the training corpus set, analyzing, through the identified system domain vocabularies, a plurality of sentence pattern structures of the plurality of query data operation training corpuses and a plurality of correlations among the plurality of system domain vocabularies, and establishing the common vocabulary model according to the plurality of first probabilities and the plurality of correlations; and analyzing a plurality of second probabilities of the plurality of system domain vocabularies in the plurality of query data operation intents and the plurality of execution instruction operation intents, and establishing the common semantic model according to the plurality of sentence pattern structures and the plurality of second probabilities.
According to an embodiment of the present disclosure, the query data operation corpus generator is further used for: analyzing a plurality of query corpus data of the enterprise database and summarizing a query rule from the plurality of query corpus data; and automatically generating the plurality of query data operation training corpuses according to the query rule.
According to an embodiment of the present disclosure, the execution instruction operation corpus generator is further used for: analyzing a plurality of execution corpus data from interactions with the enterprise resource system and summarizing an execution rule from the plurality of execution corpus data; and automatically generating the plurality of execution instruction operation training corpuses according to the execution rule.
According to one embodiment, the common vocabulary model and the common semantic model are trained using the automatically generated plurality of query data operation training corpuses and plurality of execution instruction operation training corpuses, and a virtual assistant executes corresponding operations according to the common vocabulary model and the common semantic model.
A second aspect of the present invention provides a system for automatically training a virtual assistant, which is connected to an enterprise database and an enterprise resource system, respectively, and which comprises a processor and a storage device. The storage device is electrically connected to the processor and stores an overall database, an application knowledge database and a domain knowledge database. The processor comprises an analysis module, a generator building module, a training corpus generating module and a semantic and vocabulary model building module. The analysis module analyzes the data structure of the enterprise database to form the domain knowledge database and analyzes the workflow of the enterprise resource system to form the application knowledge database. The generator building module is electrically connected with the analysis module and establishes a query data operation corpus generator by using the domain knowledge database and an execution instruction operation corpus generator by using the application knowledge database. The training corpus generating module is electrically connected with the generator building module; it generates a plurality of query data operation training corpuses by using the query data operation corpus generator and a plurality of execution instruction operation training corpuses by using the execution instruction operation corpus generator to form a training corpus set, and forms a key entity set from a plurality of system domain vocabularies and a plurality of service application parameters. The semantic and vocabulary model building module is electrically connected with the training corpus generating module and generates a common vocabulary model and a common semantic model by using the key entity set and the training corpus set.
According to an embodiment of the present disclosure, the semantic and vocabulary model building module further includes: a template building module, electrically connected to the training corpus generating module, for differentiating the intents of the plurality of query data operation training corpuses according to the categories in the enterprise database to form a plurality of query data operation intents, differentiating the intents of the plurality of execution instruction operation training corpuses according to the service behaviors provided by the enterprise resource system to form a plurality of execution instruction operation intents, establishing templates of the plurality of query data operation intents and templates of the plurality of execution instruction operation intents, and then establishing the overall database according to the key entity set, the templates of the plurality of query data operation intents and the templates of the plurality of execution instruction operation intents; a vocabulary model building module, electrically connected with the template building module, for identifying a plurality of first probabilities with which the plurality of system domain vocabularies in the key entity set occur in the training corpus set, analyzing, through the identified system domain vocabularies, a plurality of sentence pattern structures of the plurality of query data operation training corpuses and a plurality of correlations among the plurality of system domain vocabularies, and building the common vocabulary model according to the plurality of first probabilities and the plurality of correlations; and a semantic model building module, electrically connected with the template building module, for analyzing a plurality of second probabilities of the plurality of system domain vocabularies in the plurality of query data operation intents and the plurality of execution instruction operation intents, and building the common semantic model according to the plurality of sentence pattern structures and the plurality of second probabilities.
According to an embodiment of the present disclosure, the query data operation corpus generator is configured to analyze a plurality of query corpus data of the enterprise database, to summarize a query rule from the plurality of query corpus data, and to automatically generate the plurality of query data operation training corpuses according to the query rule.
According to an embodiment of the present disclosure, the execution instruction operation corpus generator is configured to analyze a plurality of execution corpus data from interactions with the enterprise resource system, to summarize an execution rule from the plurality of execution corpus data, and to automatically generate the plurality of execution instruction operation training corpuses according to the execution rule.
According to one embodiment, the common vocabulary model and the common semantic model are trained using the automatically generated plurality of query data operation training corpuses and plurality of execution instruction operation training corpuses, and a virtual assistant executes corresponding operations according to the common vocabulary model and the common semantic model.
The method and the system for automatically training the virtual assistant can automatically generate the training corpus of the natural language to generate the semantic and vocabulary models, so that the virtual assistant can interact with a user according to the semantic and vocabulary models, and can continuously generate new training results through the automatically generated training corpus to achieve the effects of quickly training and updating the virtual assistant.
Drawings
The foregoing and other objects, features, advantages and embodiments of the invention will be apparent from the following description taken in conjunction with the accompanying drawings in which:
FIG. 1 is a schematic diagram of a system for automatically training a virtual assistant according to some embodiments of the present disclosure;
FIG. 2 is a schematic diagram of a processor according to some embodiments of the present disclosure;
FIG. 3 is a schematic diagram of a semantic and vocabulary model building module according to some embodiments of the present disclosure;
FIG. 4 is a flow chart of a method of automatically training a virtual assistant according to some embodiments of the present disclosure; and
FIG. 5 is a flowchart of step S450 according to some embodiments of the present disclosure.
Detailed Description
The following disclosure provides many different embodiments, or examples, for implementing different features of the invention. Elements and configurations in specific examples are used in the following discussion to simplify the present disclosure. Any exemplifications set out herein are for illustrative purposes only, and are not intended to limit the scope and meaning of the invention or its exemplifications in any manner. Moreover, the present disclosure may repeat reference numerals and/or letters in the various examples, which are for the purpose of simplicity and illustration, and does not in itself dictate a relationship between the various embodiments and/or configurations discussed below.
Unless otherwise indicated, the terms used throughout the specification and claims generally have the ordinary meaning that each term has in this field, in the context of this disclosure, and in the specific context in which it is used. Certain terms used to describe the disclosure are discussed below, or elsewhere in this specification, to provide additional guidance to those skilled in the art in describing the disclosure.
As used herein, "coupled" or "connected" may mean that two or more elements are in direct or indirect physical or electrical contact with each other, and may also mean that two or more elements cooperate or interact with each other.
The terms first, second, third, etc. are used herein to describe various elements, components, regions, layers and/or blocks. These elements, components, regions, layers and/or blocks should not be limited by these terms. These terms are only used to distinguish one element, component, region, layer or section from another. Accordingly, a first element, component, region, layer and/or section discussed below could be termed a second element, component, region, layer and/or section without departing from the spirit of the present invention. As used herein, the term "and/or" includes any combination of one or more of the listed associated items. Reference in the present document to "and/or" means any, all, or any combination of at least one of the listed elements.
Please refer to FIG. 1, which is a schematic diagram of a system 100 for automatically training a virtual assistant according to some embodiments of the present disclosure. As depicted in FIG. 1, the system 100 is coupled to an enterprise database 101 and an enterprise resource system 102, and includes a processor 110 and a storage device 130. The storage device 130 stores the overall database 131, the application knowledge database 132 and the domain knowledge database 133, and is electrically connected to the processor 110.
In various embodiments of the invention, the processor 110 may be implemented as an integrated circuit such as a microcontroller, a microprocessor, a digital signal processor, an application specific integrated circuit (ASIC), a logic circuit, or another similar element, or as a combination of these elements. The storage device 130 may be implemented as a memory, a hard disk, a portable disk, a memory card, or the like.
Referring to FIG. 2 and FIG. 3 together, FIG. 2 is a schematic diagram of the processor 110 according to some embodiments of the present disclosure, and FIG. 3 is a schematic diagram of the semantic and vocabulary model building module 114 according to some embodiments of the present disclosure. The processor 110 includes an analysis module 111, a generator building module 112, a training corpus generating module 113, and a semantic and vocabulary model building module 114. The generator building module 112 is electrically connected with the analysis module 111, the training corpus generating module 113 is electrically connected with the generator building module 112, and the semantic and vocabulary model building module 114 is electrically connected with the training corpus generating module 113. The semantic and vocabulary model building module 114 includes a template building module 1141, a vocabulary model building module 1142, and a semantic model building module 1143. The vocabulary model building module 1142 and the semantic model building module 1143 are electrically connected to the template building module 1141.
Please refer to fig. 1-4 together. Fig. 4 is a flow chart of a method 400 of automatically training a virtual assistant according to some embodiments of the present disclosure. As shown in fig. 4, a method 400 of automatically training a virtual assistant includes the steps of:
step S410: analyzing the data structure of the enterprise database to form a domain knowledge database and analyzing the workflow of the enterprise resource system to form an application knowledge database;
step S420: establishing a query data operation corpus generator by using a domain knowledge database, and establishing an execution instruction operation corpus generator by using an application knowledge database;
step S430: generating a plurality of query data operation training corpuses by using a query data operation corpus generator, and generating a plurality of execution instruction operation training corpuses by using an execution instruction operation corpus generator to form a training corpus set;
step S440: forming a plurality of system domain vocabularies and a plurality of service application parameters into a key entity set; and
step S450: generating a common vocabulary model and a common semantic model by utilizing the key entity set and the training corpus set.
In step S410, the data structure of the enterprise database 101 is analyzed to form the domain knowledge database 133, and the workflow of the enterprise resource system 102 is analyzed to form the application knowledge database 132. In one embodiment, establishing the application knowledge database 132 requires, in addition to analyzing the workflows and operating procedures of the enterprise resource system 102, collecting how enterprise personnel interact with the enterprise resource system 102: for example, which operating procedure of the enterprise resource system 102 is used when an employee uses its leave service, and which parameters, such as the leave name, the leave time and the deputy, must be provided when the leave service is used. Similarly, establishing the domain knowledge database 133 requires, in addition to analyzing the data structure of the enterprise database 101 to find the specialized vocabulary of the enterprise's domain, analyzing the relevance between specialized terms. For example, bill of lading, customer name and commodity name are related terms, because a bill of lading records which customer the goods are shipped to and which commodities are shipped.
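The relevance analysis described for the domain knowledge database can be illustrated with a minimal sketch. It assumes, purely for illustration, that two terms are related when they appear in the same table; the schema, the table and field names, and the helper `related_terms` are hypothetical and are not taken from the patent.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical schema: table name -> field names (illustrative only).
SCHEMA = {
    "bill_of_lading": ["customer_name", "commodity_name", "ship_date"],
    "quotation": ["customer_name", "commodity_name", "unit_price"],
}


def related_terms(schema):
    """Map each term to the set of terms it shares a table with."""
    related = defaultdict(set)
    for table, fields in schema.items():
        terms = [table, *fields]
        # Every pair of terms inside one table is treated as related.
        for a, b in combinations(terms, 2):
            related[a].add(b)
            related[b].add(a)
    return dict(related)


DOMAIN_KNOWLEDGE = related_terms(SCHEMA)
```

Under this assumption, `bill_of_lading` is related to `customer_name` and `commodity_name`, while `ship_date` and `unit_price` are unrelated because they never share a table. A real analysis would likely also weigh foreign keys and usage data.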
Next, in step S420 and step S430, a query data operation corpus generator is built by using the domain knowledge database 133, an execution instruction operation corpus generator is built by using the application knowledge database 132, then a plurality of query data operation training corpuses are generated by using the query data operation corpus generator, and a plurality of execution instruction operation training corpuses are generated by the execution instruction operation corpus generator, so as to form a training corpus set.
In one embodiment, the query data operation corpus generator analyzes the natural language that enterprise personnel use when querying the enterprise database 101 and induces a query rule from it, so that the generator can automatically generate training corpuses for query data operations. The query rule may be [preamble] + [enterprise data condition] + [connective] + [enterprise domain term to query] + [suffix]. For example, suppose the natural language used by enterprise personnel is "I want to find the orders of company A from last month, do you know?". In this example, "I want to find" is the [preamble]; "company A" and "last month" are [enterprise data conditions] (there may be several, two in this example); the word linking the conditions to the queried term (rendered here as "of") is the [connective]; "orders" is the [enterprise domain term to query]; and "do you know?" is the [suffix].
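As a rough sketch of how such a query rule could drive automatic corpus generation, the snippet below enumerates every combination of slot fillers in the order given by the rule. All slot fillers and the function name `generate_query_corpus` are hypothetical placeholders. The generated text follows the slot order of the rule, so it is not grammatical English.

```python
from itertools import product

# All slot fillers below are hypothetical placeholders.
PREAMBLES = ["I want to find", "please look up"]
CONDITION_SETS = [("company A",), ("company A", "last month")]
CONNECTIVES = ["of"]
DOMAIN_TERMS = ["orders", "quotations"]
SUFFIXES = ["", "do you know?"]  # the empty string means "no suffix"


def generate_query_corpus():
    """Enumerate every slot combination allowed by the query rule."""
    corpus = []
    for pre, conds, conn, term, suf in product(
        PREAMBLES, CONDITION_SETS, CONNECTIVES, DOMAIN_TERMS, SUFFIXES
    ):
        slots = [("preamble", pre)]
        slots += [("condition", c) for c in conds]
        slots += [("connective", conn), ("domain_term", term)]
        if suf:
            slots.append(("suffix", suf))
        # Join the slot values in rule order to form one training sentence.
        text = " ".join(value for _, value in slots)
        corpus.append({"text": text, "slots": slots})
    return corpus


QUERY_CORPUS = generate_query_corpus()
```

Keeping the slot labels next to each generated sentence means the same data can later serve as annotated training material, which is one plausible reason to generate from a rule rather than collect free text.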
Similarly, the execution instruction operation corpus generator analyzes the natural language that enterprise personnel use when interacting with the enterprise resource system 102 and induces an execution rule from it, so that the generator can automatically generate training corpuses for execution instruction operations. The execution rule may be [preamble] + [enterprise system service parameter] + [connective] + [enterprise system service to be used] + [suffix]. For example, suppose the natural language used by enterprise personnel is "Please help me request sick leave for 1/15-1/16". In this example, "please help me request" is the [preamble]; "1/15-1/16" is the [enterprise system service parameter] (there may be several, only one in this example); the word linking the parameter to the service (rendered here as "for") is the [connective]; "sick leave" is the [enterprise system service to be used]; and there is no [suffix]. In this way, once the query rule of the query data operation corpus generator and the execution rule of the execution instruction operation corpus generator have been established, a large number of training corpuses can be generated to form the training corpus set.
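The induction of a rule from an utterance can be sketched as slot labelling against a small lexicon. This is only one possible reading of "inducing" a rule; the lexicon entries, the regular expressions, and the function `induce_rule` are assumptions made for illustration, and the sample utterance is arranged in rule order rather than in natural English.

```python
import re

# Hypothetical lexicon: slot name -> regular expression for its fillers.
LEXICON = [
    ("preamble", r"help me request"),
    ("parameter", r"\d{1,2}/\d{1,2}(?:-\d{1,2}/\d{1,2})?"),
    ("connective", r"\bof\b"),
    ("service", r"sick leave|annual leave"),
]
PATTERN = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in LEXICON))


def induce_rule(utterance):
    """Return [(slot, text), ...] for the utterance, or None if any part is unknown."""
    text = utterance.strip()
    pos, slots = 0, []
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        match = PATTERN.match(text, pos)
        if match is None:
            return None  # some span is not covered by the lexicon
        slots.append((match.lastgroup, match.group()))
        pos = match.end()
    return slots


SLOTS = induce_rule("help me request 1/15-1/16 of sick leave")
RULE = [slot for slot, _ in SLOTS]
```

Here `RULE` is the slot sequence preamble, parameter, connective, service, which matches the execution rule without a suffix. Collecting the distinct slot sequences over many utterances would give a set of candidate rules.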
In step S440, a plurality of system domain vocabularies and a plurality of service application parameters are formed into a key entity set. For example, the key entity set includes the enterprise domain vocabulary and the service application parameters of the enterprise system. The enterprise domain vocabulary is the vocabulary that an enterprise in a given field may need; the vocabulary used in the medical industry and that used in the transportation industry necessarily differ, so the enterprise domain vocabulary varies with each enterprise that uses the ERP system. The service application parameters are the parameters that correspond to the services provided by the enterprise system. For example, the leave request function of the enterprise system may need the leave time and other leave information, and the system domain vocabulary in the key entity set may then need to include terms such as leave request, annual leave, sick leave and business leave.
In detail, the key entity set further includes the data field names that can be used when accessing data, the names of the services that the enterprise system provides to users, the parameter values of the limiting conditions that users set when querying, the parameter values of service applications, and the operation functions of the enterprise system, such as leave application, overtime application, business trip application and report. Each of these items may have aliases, which need to be entered together with it when the database is trained; for example, for a manufacturer in a particular field, a shipment may go by different names such as shipment details or sales bill.
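One way to picture a key entity set with aliases is a list of entities plus a lookup table from every surface form to its canonical name. The class `KeyEntity`, the `kind` labels, and the helper `build_alias_index` are hypothetical; only the example aliases come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyEntity:
    name: str            # canonical name
    kind: str            # hypothetical label, e.g. "domain_term" or "service_name"
    aliases: tuple = ()  # other surface forms for the same entity


def build_alias_index(entities):
    """Map every lower-cased name or alias to the canonical entity name."""
    index = {}
    for entity in entities:
        for surface in (entity.name, *entity.aliases):
            index[surface.lower()] = entity.name
    return index


KEY_ENTITIES = [
    KeyEntity("shipment", "domain_term", ("shipment details", "sales bill")),
    KeyEntity("leave application", "service_name"),
    KeyEntity("sick leave", "service_parameter"),
]
ALIAS_INDEX = build_alias_index(KEY_ENTITIES)
```

With this index, both "shipment details" and "sales bill" resolve to "shipment", so corpuses that use either alias can be counted toward the same entity.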
In step S450, a common vocabulary model and a common semantic model are generated by using the key entity set and the training corpus set. Referring to fig. 5 for details of step S450, fig. 5 is a flowchart of step S450 according to some embodiments of the present disclosure. As shown in FIG. 5, the stage of generating the vocabulary and semantic model includes the following steps:
step S451: differentiating the intentions of the query data operation training corpus according to the categories in the enterprise database to form a plurality of query data operation intentions, and differentiating the intentions of the execution instruction operation training corpus according to the service behaviors provided by the enterprise resource system to form a plurality of execution instruction operation intentions;
step S452: establishing a template for each of the query data operation intents and each of the execution instruction operation intents;
step S453: establishing an overall database according to the key entity set, the templates of the query data operation intents and the templates of the execution instruction operation intents;
step S454: identifying a plurality of first probabilities with which the system domain vocabularies in the key entity set occur in the training corpus set, analyzing, through the identified system domain vocabularies, a plurality of sentence pattern structures of the query data operation training corpuses and a plurality of correlations among the system domain vocabularies, and establishing a common vocabulary model according to the first probabilities and the correlations; and
step S455: analyzing a plurality of second probabilities of the system domain vocabularies in the query data operation intents and the execution instruction operation intents, and establishing a common semantic model according to the sentence pattern structures and the second probabilities.
In step S451, the intents of the query data operation training corpuses are differentiated according to the categories in the enterprise database 101 to form a plurality of query data operation intents, and the intents of the execution instruction operation training corpuses are differentiated according to the service behaviors provided by the enterprise resource system 102 to form a plurality of execution instruction operation intents. In one embodiment, the intents for querying data are first differentiated according to the enterprise database 101 of each field. The data fields stored in an enterprise database of the medical industry are necessarily different from those of the transportation industry, so the needs of their users are not necessarily the same. For example, a user in the medical industry may have distinct intents such as querying medical record data or querying ward availability, whereas a user in the transportation industry may have distinct intents such as querying shipment records or querying package delivery status. Likewise, the services provided by an enterprise resource system of the medical industry differ from those of the transportation industry. Because the query data operations and service behavior operations of enterprises in different fields are not necessarily interchangeable, the services provided by enterprises in different fields also need to be distinguished. For example, a user in the medical industry may have distinct intents such as using a registration service or a service for ordering health meals during hospitalization, whereas a user in the transportation industry may have distinct intents such as using a service that automatically sorts goods or one that arranges the order of shipments.
In step S452 and step S453, templates of the query data operation intents and templates of the execution instruction operation intents are established, and the overall database 131 is established according to the key entity set, the templates of the query data operation intents and the templates of the execution instruction operation intents. For example, after the query data operation intents and the execution instruction operation intents of the virtual assistant of an enterprise in a given domain have been distinguished, a corresponding template can be generated for each intent. Following the example above, the medical industry has four enterprise resource system instruction operation templates, corresponding to querying medical record data, querying ward rooms, providing the registration service and providing the inpatient healthy meal ordering service; the transportation industry likewise has four enterprise resource system instruction operation templates, corresponding to querying shipment records, querying package delivery status, providing the service of automatically classifying goods and providing the service of arranging the shipping order of goods. The overall database 131 is then established according to these templates and the key entity set.
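The relationship between intents, templates and the overall database described for steps S451 to S453 can be pictured with a small data structure. The following Python sketch is only an illustration under stated assumptions: the class names `IntentTemplate` and `OverallDatabase`, the field names, and the entity and intent labels are all hypothetical and do not come from the patent, which does not specify a storage layout.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class IntentTemplate:
    """One template per distinguished intent (hypothetical layout)."""
    name: str                 # label of the intent, e.g. "query_medical_record"
    kind: str                 # "query" (enterprise database) or "execute" (enterprise resource system)
    required_entities: tuple  # key entities the template needs to be filled in


@dataclass
class OverallDatabase:
    """Key entity set plus the templates of both kinds of intent."""
    key_entities: set
    templates: dict = field(default_factory=dict)

    def add(self, template: IntentTemplate) -> None:
        self.templates[template.name] = template


def build_overall_database(key_entities, templates) -> OverallDatabase:
    """Combine the key entity set and the intent templates into one store."""
    db = OverallDatabase(key_entities=set(key_entities))
    for template in templates:
        db.add(template)
    return db


# The four medical-industry intents from the example, with assumed labels.
medical = build_overall_database(
    key_entities={"patient", "ward", "department", "meal"},
    templates=[
        IntentTemplate("query_medical_record", "query", ("patient",)),
        IntentTemplate("query_ward_room", "query", ("ward",)),
        IntentTemplate("register_appointment", "execute", ("patient", "department")),
        IntentTemplate("order_inpatient_meal", "execute", ("patient", "meal")),
    ],
)
```

A transportation-industry database would be built the same way with its own four templates, which is the point of distinguishing intents per domain before any model is trained.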
In step S454, a plurality of first probabilities with which the system domain vocabulary in the key entity set appears in the training corpus set are identified; a plurality of sentence pattern structures of the query data operation training corpora, and a plurality of correlations among the system domain vocabulary, are analyzed by means of the identified system domain vocabulary; and a common vocabulary model is established according to the first probabilities and the correlations. In one embodiment, the probability of each system domain vocabulary appearing in the training corpus is calculated using two algorithms, n-gram and context-free grammar (CFG), and the sentence pattern structures of the training corpus and the correlations among the system domain vocabulary are analyzed by means of the system domain vocabulary to build the common vocabulary model. For example, suppose the training corpus contains "I want to query the quotation of company A" and "I want to query the delivery of company A", and "company A", "quotation" and "delivery" are all system domain vocabulary. Because "company A" may appear evenly across the query data operation intents, its probability is almost the same in each of them. In contrast, "quotation" and "delivery" appear in large numbers only in the training corpora of the intents that query those specific data and not in the training corpora of intents that query other data, so the probabilities of "quotation" and "delivery" are particularly high in their corresponding intents and lower in the others.
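The "company A" versus "quotation" observation can be reproduced with a very small counting exercise. The sketch below is a deliberate simplification: it only measures, per intent, the fraction of training sentences that contain each domain term, and does not implement the n-gram and context-free grammar analysis named in the embodiment. The function name, the intent labels and the extra sample sentences are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def vocabulary_probabilities(corpus, domain_vocabulary):
    """Return {intent: {term: fraction of that intent's sentences containing term}}.

    corpus is an iterable of (intent, sentence) pairs.
    """
    counts = defaultdict(Counter)   # intent -> term -> number of sentences containing it
    totals = Counter()              # intent -> number of sentences
    for intent, sentence in corpus:
        totals[intent] += 1
        lowered = sentence.lower()
        for term in domain_vocabulary:
            if term in lowered:
                counts[intent][term] += 1
    return {
        intent: {term: counts[intent][term] / totals[intent] for term in domain_vocabulary}
        for intent in totals
    }


corpus = [
    ("query_quotation", "I want to query the quotation of company A"),
    ("query_quotation", "Show me the quotation for company A"),
    ("query_delivery", "I want to query the delivery of company A"),
    ("query_delivery", "Check the delivery for company A"),
]
vocabulary = ["company a", "quotation", "delivery"]
probabilities = vocabulary_probabilities(corpus, vocabulary)
```

In this toy corpus "company a" has the same value under both intents, while "quotation" and "delivery" are each concentrated in a single intent, which is the property the vocabulary model relies on.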
In step S455, a plurality of second probabilities with which the system domain vocabulary appears in the query data operation intents and the execution instruction operation intents are analyzed, and a common semantic model is established according to the sentence pattern structures and the second probabilities. In one embodiment, the probability of system domain vocabulary appearing together in each intent (including the query data operation intents and the execution instruction operation intents) is calculated using a hidden Markov model (HMM) algorithm to build the common semantic model. For example, a large number of training corpora are input during the model training stage, and the algorithm must calculate the probability that system domain vocabulary appears together under the different intents. Continuing the example above, if the training corpus contains "I want to query the delivery form of company A", "company A" and "delivery form" can be identified as system domain vocabulary by the n-gram and context-free grammar analysis. From all of the system domain vocabulary recognized in each corpus sentence under the different intents, the hidden Markov model algorithm can then calculate the probability that the recognized system domain vocabulary (i.e., "company A" and "delivery form") appears together in a specific intent (e.g., a query data operation intent for querying delivery-related data, or an execution instruction operation intent for applying delivery difference), and this serves as the semantic model for recognizing user intent. According to the common semantic model established by the hidden Markov model algorithm, the virtual assistant can determine that the simultaneous occurrence of "company A" and "delivery form" is highly correlated with the intent of querying delivery data. The virtual assistant can then automatically query the delivery-related data of company A in the enterprise database by combining the enterprise resource system instruction operation template for querying delivery data with the system domain vocabulary "company A" as the query condition.
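The idea that terms appearing together point to one intent can be illustrated with plain co-occurrence counts. Note the substitution: the sketch below is not a hidden Markov model; it is a much simpler frequency estimate of how often a set of domain terms was seen together under each intent, used here only to make the "company A" plus "delivery form" example concrete. All function names, intent labels and sample sentences are hypothetical.

```python
from collections import Counter


def cooccurrence_model(corpus, domain_vocabulary):
    """Count, per intent, how often each set of domain terms appears in one sentence."""
    model = Counter()
    for intent, sentence in corpus:
        lowered = sentence.lower()
        terms = frozenset(t for t in domain_vocabulary if t in lowered)
        if terms:
            model[(terms, intent)] += 1
    return model


def intent_probabilities(model, terms):
    """Estimate P(intent | all given terms occur together) from the counts."""
    wanted = frozenset(terms)
    matching = Counter()
    for (seen, intent), count in model.items():
        if wanted <= seen:          # every wanted term occurred in that sentence
            matching[intent] += count
    total = sum(matching.values())
    return {intent: count / total for intent, count in matching.items()} if total else {}


corpus = [
    ("query_delivery", "I want to query the delivery form of company A"),
    ("query_delivery", "Show the delivery form for company A"),
    ("query_delivery", "Delivery form of company B please"),
    ("query_quotation", "I want to query the quotation of company A"),
]
vocabulary = ["company a", "company b", "delivery form", "quotation"]
model = cooccurrence_model(corpus, vocabulary)

together = intent_probabilities(model, {"company a", "delivery form"})
alone = intent_probabilities(model, {"company a"})
```

Here `together` concentrates entirely on the delivery intent, whereas `alone` stays split across intents, mirroring the statement that "company A" by itself is uninformative but the pair is highly correlated with querying delivery data.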
After the common vocabulary model and the common semantic model are established, the virtual assistant can execute corresponding operations according to them. For example, when there is a voice input, the virtual assistant performs speech recognition to convert the natural language into corpus data, then finds the keywords in the corpus data and determines the user's intent according to the established common vocabulary model and common semantic model (at this point the user's needs are understood). The virtual assistant can then perform the corresponding operation (e.g., searching for data in a database or performing an enterprise service operation) according to the user's needs and the identified keywords.
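The run-time flow just described (keywords, then intent, then operation) can be sketched as three small functions. This is an assumed outline, not the patented implementation: speech recognition is taken as already done, the two trained models are replaced by a hand-written table of term weights per intent, and the handlers merely return a description of the operation instead of touching a database or an enterprise resource system.

```python
def find_key_terms(text, domain_vocabulary):
    """Return the domain terms found in the recognized text, in vocabulary order."""
    lowered = text.lower()
    return [term for term in domain_vocabulary if term in lowered]


def pick_intent(key_terms, intent_scores):
    """Choose the intent whose (assumed) term weights sum highest; None if nothing matches."""
    totals = {
        intent: sum(weights.get(term, 0.0) for term in key_terms)
        for intent, weights in intent_scores.items()
    }
    if not totals:
        return None
    best = max(totals, key=totals.get)
    return best if totals[best] > 0 else None


def handle_utterance(text, domain_vocabulary, intent_scores, handlers):
    """Keywords -> intent -> operation, as in the paragraph above."""
    key_terms = find_key_terms(text, domain_vocabulary)
    intent = pick_intent(key_terms, intent_scores)
    if intent is None:
        return None
    return handlers[intent](key_terms)


vocabulary = ["company a", "delivery form", "quotation"]
scores = {  # stand-in for the trained models; values are made up
    "query_delivery": {"delivery form": 0.9, "company a": 0.3},
    "query_quotation": {"quotation": 0.9, "company a": 0.3},
}
handlers = {  # placeholders for database queries or enterprise service calls
    "query_delivery": lambda terms: ("query_delivery", terms),
    "query_quotation": lambda terms: ("query_quotation", terms),
}

result = handle_utterance(
    "I want to query the delivery form of company A", vocabulary, scores, handlers
)
```

Keeping the handlers in a dictionary keyed by intent corresponds loosely to having one instruction operation template per intent, so adding a new intent means adding one entry rather than changing the dispatch logic.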
The embodiments of the present invention mainly improve the virtual assistant's ability to interact with people by supplying, when the virtual assistant is trained, a training corpus of commonly used dialogue between people and the virtual assistant. The semantic model and the vocabulary model can therefore be trained on an automatically generated natural-language training corpus, so that the virtual assistant can interact with a user according to these models. The virtual assistant can also keep producing new training results from the automatically generated training corpus, so that it can be trained and updated quickly.
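Automatic corpus generation of this kind can be pictured as filling the slots of a rule and enumerating every combination. The sketch below assumes the five-slot query rule recited in claim 1 (lead word, enterprise data condition, connective word, specialized vocabulary to be queried, suffix word); the function name, the dictionary keys and the English slot values are illustrative assumptions, and the wording is chosen only so that the slot order yields readable English.

```python
from itertools import product


def generate_query_corpus(rule):
    """Expand a slot-based query rule into every sentence it can produce."""
    slot_order = ("lead_words", "conditions", "connectives", "query_terms", "suffixes")
    sentences = []
    for parts in product(*(rule[slot] for slot in slot_order)):
        # Empty strings model optional slots and are skipped when joining.
        sentences.append(" ".join(part for part in parts if part))
    return sentences


query_rule = {
    "lead_words": ["For", "Regarding"],
    "conditions": ["company A", "company B"],
    "connectives": ["show", "look up"],
    "query_terms": ["the quotation", "the delivery form"],
    "suffixes": ["please", ""],
}
training_sentences = generate_query_corpus(query_rule)
```

With two alternatives per slot this small rule already yields 32 sentences, which suggests why rule-based generation can refresh the training corpus much faster than collecting dialogue by hand. An execution rule with service parameters in place of data conditions could be expanded the same way.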
Additionally, the above illustration presents exemplary steps in a sequence, but the steps need not be performed in the order shown. Performing these steps in a different order is within the contemplation of the present disclosure, and steps may be added, substituted, reordered and/or omitted within the spirit and scope of the embodiments of the disclosure.
While the present invention has been described with reference to the embodiments, it should be understood that the invention is not limited thereto. Those skilled in the art may make various changes and modifications without departing from the spirit and scope of the present invention, and the scope of the present invention is accordingly defined by the appended claims.

Claims (10)

1. A method of automatically training a virtual assistant, comprising:
analyzing a data structure of an enterprise database to form a domain knowledge database and analyzing a workflow of an enterprise resource system to form an application knowledge database;
utilizing the domain knowledge database to summarize a query rule to establish a query data operation corpus generator and utilizing the application knowledge database to summarize an execution rule to establish an execution instruction operation corpus generator, wherein the query rule comprises at least one of a lead word, an enterprise data condition, a connective word, an enterprise domain specialized vocabulary which is required to be queried and a suffix word, and the execution rule comprises at least one of a lead word, an enterprise system service parameter, a connective word, an enterprise system service which is required to be used and a suffix word;
generating a plurality of query data operation training corpuses by using the query data operation corpus generator, and generating a plurality of execution instruction operation training corpuses by using the execution instruction operation corpus generator to form a training corpus set;
forming a plurality of system domain vocabularies and a plurality of service application parameters into a key entity set; and
generating a common vocabulary model and a common semantic model by using the key entity set and the training corpus set.
2. The method of automatically training a virtual assistant of claim 1, wherein generating the common vocabulary model and the common semantic model using the set of key entities and the set of training corpora further comprises:
differentiating the intentions of the plurality of query data operation training corpuses according to categories in the enterprise database to form a plurality of query data operation intentions, and differentiating the intentions of the plurality of execution instruction operation training corpuses according to service behaviors provided by the enterprise resource system to form a plurality of execution instruction operation intentions;
establishing templates of the plurality of query data operation intentions and templates of the plurality of execution instruction operation intentions;
establishing a general database according to the key entity set, the templates of the plurality of query data operation intentions and the templates of the plurality of execution instruction operation intentions;
identifying a plurality of first probabilities of the plurality of system domain vocabularies in the key entity set appearing in the training corpus set, analyzing, through the identified plurality of system domain vocabularies, a plurality of sentence pattern structures of the plurality of query data operation training corpuses and a plurality of correlations among the plurality of system domain vocabularies, and establishing the common vocabulary model according to the plurality of first probabilities, the plurality of sentence pattern structures and the plurality of correlations; and
analyzing a plurality of second probabilities of the plurality of system domain vocabularies in the plurality of query data operation intentions and the plurality of execution instruction operation intentions, and establishing a common semantic model according to the plurality of sentence pattern structures and the plurality of second probabilities.
3. The method of automatically training a virtual assistant of claim 1, wherein the query data operation corpus generator further comprises:
analyzing a plurality of query corpus data of the enterprise database, and summarizing a query rule of the plurality of query corpus data; and
automatically generating the plurality of query data operation training corpuses according to the query rule.
4. The method of automatically training a virtual assistant of claim 1, wherein the execution instruction operation corpus generator further comprises:
analyzing a plurality of pieces of execution corpus data interacted with the enterprise resource system, and summarizing an execution rule of the plurality of pieces of execution corpus data; and
automatically generating the plurality of execution instruction operation training corpuses according to the execution rule.
5. The method of claim 2, wherein the common vocabulary model and the common semantic model are trained with the automatically generated plurality of query data operation training corpuses and the plurality of execution instruction operation training corpuses, and wherein a virtual assistant performs corresponding operations according to the common vocabulary model and the common semantic model.
6. A system for automatically training a virtual assistant, connected to an enterprise database and to an enterprise resource system respectively, comprising:
a processor;
a storage device, electrically connected to the processor, for storing a general database, an application knowledge database and a domain knowledge database;
wherein the processor comprises:
an analysis module for analyzing the data structure of the enterprise database to form the domain knowledge database and analyzing the operation flow of the enterprise resource system to form the application knowledge database;
a generator establishing module, electrically connected to the training module, for establishing a query data operation corpus generator by utilizing the domain knowledge database and establishing an execution instruction operation corpus generator by utilizing the application knowledge database;
a training corpus generating module, electrically connected to the generator establishing module, for generating a plurality of query data operation training corpuses by utilizing the query data operation corpus generator, generating a plurality of execution instruction operation training corpuses by utilizing the execution instruction operation corpus generator, forming a training corpus set, and forming a key entity set according to a plurality of system domain vocabularies, a plurality of service application parameters and the training corpus set; and
a semantic and vocabulary model building module, electrically connected to the training corpus generating module, for generating a common vocabulary model and a common semantic model by utilizing the key entity set.
7. The system for automatically training a virtual assistant of claim 6, wherein the semantic and vocabulary model building module further comprises:
a template establishing module, electrically connected to the training corpus generating module, for differentiating the intentions of the plurality of query data operation training corpuses according to the categories in the enterprise database to form a plurality of query data operation intentions, differentiating the intentions of the plurality of execution instruction operation training corpuses according to the service behaviors provided by the enterprise resource system to form a plurality of execution instruction operation intentions, establishing templates of the plurality of query data operation intentions and templates of the plurality of execution instruction operation intentions, and then establishing the general database according to the key entity set, the templates of the plurality of query data operation intentions and the templates of the plurality of execution instruction operation intentions;
a vocabulary model building module, electrically connected to the template establishing module, for identifying a plurality of first probabilities of the plurality of system domain vocabularies in the key entity set appearing in the training corpus set, analyzing, through the identified plurality of system domain vocabularies, a plurality of sentence pattern structures of the plurality of query data operation training corpuses and a plurality of correlations among the plurality of system domain vocabularies, and building the common vocabulary model according to the plurality of first probabilities and the plurality of correlations; and
a semantic model building module, electrically connected to the template establishing module, for analyzing a plurality of second probabilities of the plurality of system domain vocabularies in the plurality of query data operation intentions and the plurality of execution instruction operation intentions, and building the common semantic model according to the plurality of sentence pattern structures and the plurality of second probabilities.
8. The system of claim 6, wherein the query data operation corpus generator is configured to analyze a plurality of query corpus data of the enterprise database, to generalize a query rule of the plurality of query corpus data, and to automatically generate the plurality of query data operation training corpuses according to the query rule.
9. The system of claim 6, wherein the execution instruction operation corpus generator is configured to analyze a plurality of execution corpus data that interact with the enterprise resource system, to generalize an execution rule of the plurality of execution corpus data, and to automatically generate the plurality of execution instruction operation training corpuses according to the execution rule.
10. The system of claim 7, wherein the common vocabulary model and the common semantic model are trained with the automatically generated plurality of query data operation training corpuses and the plurality of execution instruction operation training corpuses, and wherein a virtual assistant performs corresponding operations based on the common vocabulary model and the common semantic model.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810244565.2A CN110298372B (en) 2018-03-23 2018-03-23 Method and system for automatically training virtual assistant

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810244565.2A CN110298372B (en) 2018-03-23 2018-03-23 Method and system for automatically training virtual assistant

Publications (2)

Publication Number Publication Date
CN110298372A CN110298372A (en) 2019-10-01
CN110298372B true CN110298372B (en) 2023-06-09

Family

ID=68025894

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810244565.2A Active CN110298372B (en) 2018-03-23 2018-03-23 Method and system for automatically training virtual assistant

Country Status (1)

Country Link
CN (1) CN110298372B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8577131B1 (en) * 2011-07-12 2013-11-05 Google Inc. Systems and methods for visual object matching
CN104346406A (en) * 2013-08-08 2015-02-11 北大方正集团有限公司 Training corpus expanding device and training corpus expanding method
CN107688583A (en) * 2016-08-05 2018-02-13 株式会社Ntt都科摩 The method and apparatus for creating the training data for natural language processing device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10217059B2 (en) * 2014-02-04 2019-02-26 Maluuba Inc. Method and system for generating natural language training data

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
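Word-class expansion method for the training corpus of domain-specific language models; Huang Yunzhu, Wei Wei, Luo Yangyu, Li Chengrong; Computer Systems & Applications (《计算机系统应用》); 20111115; 55-58 *
Automatic corpus generation algorithm for statistical language modeling of spoken language; Si Yujing, Xiao Yeming, Xu Ji, Pan Jielin, Yan Yonghong; Acta Automatica Sinica (《自动化学报》); 20141231; Vol. 40 (No. 12); 2808-2814 *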

Also Published As

Publication number Publication date
CN110298372A (en) 2019-10-01

Similar Documents

Publication Publication Date Title
US8312041B2 (en) Resource description framework network construction device and method using an ontology schema having class dictionary and mining rule
US9575936B2 (en) Word cloud display
US9087236B2 (en) Automated recognition of process modeling semantics in flow diagrams
US10282468B2 (en) Document-based requirement identification and extraction
US9672490B2 (en) Procurement system
Van der Aa et al. Detecting inconsistencies between process models and textual descriptions
US11954140B2 (en) Labeling/names of themes
US11487577B2 (en) Robotic task planning for complex task instructions in natural language
JP7042693B2 (en) Interactive business support system
US20090106023A1 (en) Speech recognition word dictionary/language model making system, method, and program, and speech recognition system
US20220398598A1 (en) Facilitating an automated, interactive, conversational troubleshooting dialog regarding a product support issue via a chatbot and associating product support cases with a newly identified issue category
WO2020077350A1 (en) Adaptable systems and methods for discovering intent from enterprise data
US11404058B2 (en) System and method for handling multi-turn conversations and context management for voice enabled ecommerce transactions
WO2020248366A1 (en) Text intention intelligent classification method and device, and computer-readable storage medium
CN107112009A (en) Corrected using the transcription of multiple labeling structure
US9208194B2 (en) Expanding high level queries
US11226832B2 (en) Dynamic generation of user interfaces based on dialogue
WO2020139865A1 (en) Systems and methods for improved automated conversations
CN110489517B (en) Automatic learning method and system of virtual assistant
Bashir et al. Requirement or not, that is the question: A case from the railway industry
TWI674530B (en) Method and system for operating a virtual assistant
CN110298372B (en) Method and system for automatically training virtual assistant
CN110209776B (en) Method and system for operating virtual assistant
CN113779231B (en) Knowledge graph-based big data visual analysis method, device and equipment
TWI652587B (en) Method and system for automatically training virtual assistant

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant