CN110298372A - Method and system for automatically training a virtual assistant - Google Patents

Method and system for automatically training a virtual assistant

Info

Publication number
CN110298372A
Authority
CN
China
Prior art keywords
corpus
model
training
data manipulation
instruction operations
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810244565.2A
Other languages
Chinese (zh)
Other versions
CN110298372B (en
Inventor
周忠信
吴兆麟
许旭正
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Digiwin Software Co Ltd
Original Assignee
Digiwin Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Digiwin Software Co Ltd
Priority to CN201810244565.2A
Publication of CN110298372A
Application granted
Publication of CN110298372B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/29 Graphical models, e.g. Bayesian networks
    • G06F18/295 Markov models or related models, e.g. semi-Markov models; Markov random fields; Networks embedding Markov models
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Human Computer Interaction (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A method and system for automatically training a virtual assistant. The method includes: analyzing the data structure of an enterprise database to form a domain knowledge database, and analyzing the workflows of an enterprise resource system to form an application knowledge database; establishing a data query operation corpus generator using the domain knowledge database, and establishing an instruction execution operation corpus generator using the application knowledge database; generating multiple data query operation training corpora with the data query operation corpus generator and multiple instruction execution operation training corpora with the instruction execution operation corpus generator, which together form a training corpus set; forming a key entity set from multiple system domain terms and multiple service application parameters; and generating a common vocabulary model and a common semantic model using the key entity set and the training corpus set. The virtual assistant can thereby be trained and updated quickly.

Description

Method and system for automatically training a virtual assistant
Technical field
The present disclosure relates to a method and system for training a virtual assistant, and in particular to a method and system for automatically training a virtual assistant.
Background
An enterprise resource planning (ERP) system is a management platform, built on information technology, that supports decision-making by an enterprise's management. It manages the enterprise's flows of people, goods, information, and cash in a unified way so as to make the fullest use of the enterprise's resources. Because an ERP system covers three broad areas, namely production control, logistics management, and financial management, it is very large in scale.
In modern life, a virtual assistant (or intelligent assistant) lets users communicate with electronic products directly in spoken or written natural language, which gives users a more convenient and faster way to interact. To apply a virtual assistant to an ERP system, the assistant must first be trained on the terms and functions commonly used in that system; only then can it be combined with the ERP system. Training an intelligent assistant requires not only a database but also natural-language training corpora, which means that someone has to converse with the virtual assistant continually to supply those corpora before the assistant is able to interact with people. How to train a virtual assistant quickly, so that it has both the relevant knowledge of the ERP system and the ability to interact with people, is therefore a problem to be solved in this field.
Summary of the invention
The main object of the present invention is to provide a method and system for automatically training a virtual assistant. Natural-language training corpora are generated automatically and the virtual assistant is trained on them, so that the virtual assistant can be trained and updated quickly.
To achieve this object, a first aspect of the present disclosure provides a method for automatically training a virtual assistant, comprising the following steps: analyzing the data structure of an enterprise database to form a domain knowledge database, and analyzing the workflows of an enterprise resource system to form an application knowledge database; establishing a data query operation corpus generator using the domain knowledge database, and establishing an instruction execution operation corpus generator using the application knowledge database; generating multiple data query operation training corpora with the data query operation corpus generator and multiple instruction execution operation training corpora with the instruction execution operation corpus generator, to form a training corpus set; forming a key entity set from multiple system domain terms and multiple service application parameters; and generating a common vocabulary model and a common semantic model using the key entity set and the training corpus set.
According to one embodiment of the present disclosure, generating the common vocabulary model and the common semantic model using the key entity set and the training corpus set further comprises: distinguishing the intents of the data query operation training corpora according to the categories in the enterprise database, to form multiple data query operation intents, and distinguishing the intents of the instruction execution operation training corpora according to the service behaviors provided by the enterprise resource system, to form multiple instruction execution operation intents; establishing models of the data query operation intents and models of the instruction execution operation intents; establishing a global database according to the key entity set, the models of the data query operation intents, and the models of the instruction execution operation intents; identifying multiple first probabilities with which the system domain terms in the key entity set occur in the training corpus set, using the identified system domain terms to analyze multiple sentence structures of the data query operation training corpora and multiple associations among the system domain terms, and establishing the common vocabulary model according to the first probabilities and the associations; and analyzing multiple second probabilities with which the system domain terms occur in the data query operation intents and the instruction execution operation intents, and establishing the common semantic model according to the sentence structures and the second probabilities.
According to one embodiment of the present disclosure, the data query operation corpus generator further: analyzes multiple pieces of query corpus data used to query the enterprise database and summarizes a query rule from them; and automatically generates the data query operation training corpora according to the query rule.
According to one embodiment of the present disclosure, the instruction execution operation corpus generator further: analyzes multiple pieces of execution corpus data from interactions with the enterprise resource system and summarizes an execution rule from them; and automatically generates the instruction execution operation training corpora according to the execution rule.
According to one embodiment of the present disclosure, the automatically generated data query operation training corpora and instruction execution operation training corpora are used to train the common vocabulary model and the common semantic model, and a virtual assistant can perform the corresponding operation according to the common vocabulary model and the common semantic model.
A second aspect of the present disclosure provides a system for automatically training a virtual assistant, connected to an enterprise database and to an enterprise resource system, and comprising a processor and a storage device. The storage device is electrically connected to the processor and stores a global database, an application knowledge database, and a domain knowledge database. The processor comprises an analysis module, a generator establishment module, a training corpus generation module, and a semantic and vocabulary model establishment module. The analysis module analyzes the data structure of the enterprise database to form the domain knowledge database and analyzes the workflows of the enterprise resource system to form the application knowledge database. The generator establishment module is electrically connected to the analysis module; it establishes a data query operation corpus generator using the domain knowledge database and an instruction execution operation corpus generator using the application knowledge database. The training corpus generation module is electrically connected to the generator establishment module; it generates multiple data query operation training corpora with the data query operation corpus generator and multiple instruction execution operation training corpora with the instruction execution operation corpus generator to form a training corpus set, and forms a key entity set from multiple system domain terms and multiple service application parameters. The semantic and vocabulary model establishment module is electrically connected to the training corpus generation module; it generates a common vocabulary model and a common semantic model using the key entity set and the training corpus set.
According to one embodiment of the present disclosure, the semantic and vocabulary model establishment module further comprises a model establishment module, a vocabulary model establishment module, and a semantic model establishment module. The model establishment module is electrically connected to the training corpus generation module. It distinguishes the intents of the data query operation training corpora according to the categories in the enterprise database to form multiple data query operation intents, distinguishes the intents of the instruction execution operation training corpora according to the service behaviors provided by the enterprise resource system to form multiple instruction execution operation intents, establishes models of the data query operation intents and models of the instruction execution operation intents, and then establishes a global database according to the key entity set and these models. The vocabulary model establishment module is electrically connected to the model establishment module. It identifies multiple first probabilities with which the system domain terms in the key entity set occur in the training corpus set, uses the identified system domain terms to analyze multiple sentence structures of the data query operation training corpora and multiple associations among the system domain terms, and establishes a common vocabulary model according to the first probabilities and the associations. The semantic model establishment module is electrically connected to the model establishment module. It analyzes multiple second probabilities with which the system domain terms occur in the data query operation intents and the instruction execution operation intents, and establishes a common semantic model according to the sentence structures and the second probabilities.
According to one embodiment of the present disclosure, the data query operation corpus generator analyzes multiple pieces of query corpus data used to query the enterprise database, summarizes a query rule from them, and automatically generates the data query operation training corpora according to the query rule.
According to one embodiment of the present disclosure, the instruction execution operation corpus generator analyzes multiple pieces of execution corpus data from interactions with the enterprise resource system, summarizes an execution rule from them, and automatically generates the instruction execution operation training corpora according to the execution rule.
According to one embodiment of the present disclosure, the automatically generated data query operation training corpora and instruction execution operation training corpora are used to train the common vocabulary model and the common semantic model, and a virtual assistant can perform the corresponding operation according to the common vocabulary model and the common semantic model.
The method and system for automatically training a virtual assistant of the present invention automatically generate natural-language training corpora and use them to produce semantic and vocabulary models, according to which the virtual assistant can interact with users. New training results can also be produced continually from the automatically generated training corpora, so that the virtual assistant can be trained and updated quickly.
Brief description of the drawings
To make the above and other objects, features, advantages, and embodiments of the invention easier to understand, the accompanying drawings are described as follows:
Fig. 1 is a schematic diagram of a system for automatically training a virtual assistant according to some embodiments of the present disclosure;
Fig. 2 is a schematic diagram of the processor according to some embodiments of the present disclosure;
Fig. 3 is a schematic diagram of the semantic and vocabulary model establishment module according to some embodiments of the present disclosure;
Fig. 4 is a flowchart of a method for automatically training a virtual assistant according to some embodiments of the present disclosure; and
Fig. 5 is a flowchart of step S450 according to some embodiments of the present disclosure.
Detailed description
The following disclosure provides many different embodiments or examples for implementing different features of the invention. Elements and configurations in specific examples are used in the following discussion to simplify the disclosure. Any example discussed is for illustration only and does not in any way limit the scope or meaning of the invention or of its examples. In addition, the disclosure may repeat reference numerals and/or letters in different examples; this repetition is for simplicity and clarity and does not in itself specify a relationship between the different embodiments and/or configurations discussed below.
Unless otherwise noted, the terms used throughout the specification and claims have the ordinary meaning each term has in this field, in the content disclosed here, and in the specific context. Certain terms used to describe the disclosure are discussed below or elsewhere in this specification to give those skilled in the art additional guidance on the description of the disclosure.
As used herein, "coupled" or "connected" may mean that two or more elements are in direct physical or electrical contact with each other, or in indirect physical or electrical contact with each other, and may also mean that two or more elements operate or act on each other.
It will be understood that terms such as first, second, and third are used herein to describe various elements, components, regions, layers, and/or blocks. These elements, components, regions, layers, and/or blocks should not be limited by these terms, which serve only to distinguish one element, component, region, layer, and/or block from another. A first element, component, region, layer, and/or block below could therefore also be termed a second element, component, region, layer, and/or block without departing from the intent of the invention. As used herein, "and/or" includes any combination of one or more of the associated listed items; that is, any one, all, or any combination of at least one of the listed elements.
Please refer to Fig. 1. Fig. 1 is a schematic diagram of a system 100 for automatically training a virtual assistant according to some embodiments of the present disclosure. As shown in Fig. 1, the system 100 is connected to an enterprise database 101 and an enterprise resource system 102, and comprises a processor 110 and a storage device 130. The storage device 130 stores a global database 131, an application knowledge database 132, and a domain knowledge database 133, and is electrically connected to the processor 110.
In various embodiments of the invention, the processor 110 may be implemented as an integrated circuit such as a microcontroller, a microprocessor, a digital signal processor, an application-specific integrated circuit (ASIC), a logic circuit, another similar element, or a combination of these elements. The storage device 130 may be implemented as a memory, a hard disk, a flash drive, a memory card, or the like.
Please refer to Fig. 2 and Fig. 3 together. Fig. 2 is a schematic diagram of the processor 110 according to some embodiments of the present disclosure, and Fig. 3 is a schematic diagram of the semantic and vocabulary model establishment module 114 according to some embodiments of the present disclosure. The processor 110 comprises an analysis module 111, a generator establishment module 112, a training corpus generation module 113, and the semantic and vocabulary model establishment module 114. The generator establishment module 112 is electrically connected to the analysis module 111, the training corpus generation module 113 is electrically connected to the generator establishment module 112, and the semantic and vocabulary model establishment module 114 is electrically connected to the training corpus generation module 113. The semantic and vocabulary model establishment module 114 comprises a model establishment module 1141, a vocabulary model establishment module 1142, and a semantic model establishment module 1143. The vocabulary model establishment module 1142 and the semantic model establishment module 1143 are both electrically connected to the model establishment module 1141.
Please refer to Fig. 1 to Fig. 4 together. Fig. 4 is a flowchart of a method 400 for automatically training a virtual assistant according to some embodiments of the present disclosure. As shown in Fig. 4, the method 400 comprises the following steps:
Step S410: analyze the data structure of the enterprise database to form a domain knowledge database, and analyze the workflows of the enterprise resource system to form an application knowledge database;
Step S420: establish a data query operation corpus generator using the domain knowledge database, and establish an instruction execution operation corpus generator using the application knowledge database;
Step S430: generate multiple data query operation training corpora with the data query operation corpus generator and multiple instruction execution operation training corpora with the instruction execution operation corpus generator, to form a training corpus set;
Step S440: form a key entity set from multiple system domain terms and multiple service application parameters; and
Step S450: generate a common vocabulary model and a common semantic model using the key entity set and the training corpus set.
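The five steps above can be sketched as a small pipeline. This is only an illustrative outline under strong simplifying assumptions: the names `TrainingArtifacts` and `run_pipeline`, the sentence templates, and the use of plain term counts as a stand-in for the models of step S450 are all hypothetical and do not come from the patent.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class TrainingArtifacts:
    """Hypothetical container for what each step produces."""
    domain_knowledge: dict = field(default_factory=dict)
    application_knowledge: dict = field(default_factory=dict)
    training_corpus: list = field(default_factory=list)
    key_entities: set = field(default_factory=set)
    entity_counts: Counter = field(default_factory=Counter)


def run_pipeline(db_schema, workflows):
    art = TrainingArtifacts()

    # S410: derive the two knowledge bases from the schema and the workflows.
    art.domain_knowledge = {table: list(cols) for table, cols in db_schema.items()}
    art.application_knowledge = {svc: list(params) for svc, params in workflows.items()}

    # S420: build one generator per knowledge base (toy templates, assumed).
    def query_generator():
        for table in art.domain_knowledge:
            yield f"I want to find the {table}"

    def command_generator():
        for service in art.application_knowledge:
            yield f"Help me request {service}"

    # S430: run both generators to form the training corpus set.
    art.training_corpus = [*query_generator(), *command_generator()]

    # S440: key entity set = domain terms plus service parameters.
    art.key_entities = set(art.domain_knowledge) | {
        p for params in art.application_knowledge.values() for p in params
    }

    # S450 (stand-in only): count how often each key entity occurs in the corpus.
    for sentence in art.training_corpus:
        for entity in art.key_entities:
            if entity in sentence:
                art.entity_counts[entity] += 1
    return art


artifacts = run_pipeline(
    db_schema={"orders": ["customer", "date"]},
    workflows={"sick leave": ["start date", "end date"]},
)
```

A real system would replace the toy templates with the rule-based generators described below and the counts with actual vocabulary and semantic models.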
In step S410, the data structure of the enterprise database 101 is analyzed to form the domain knowledge database 133, and the workflows of the enterprise resource system 102 are analyzed to form the application knowledge database 132. In one embodiment, establishing the application knowledge database 132 and the domain knowledge database 133 requires, in addition to first analyzing the workflows and operating procedures of the enterprise resource system 102, collecting how enterprise personnel interact with the enterprise resource system 102. For example, when an employee uses the leave service provided by the enterprise resource system 102, it must be known which operating procedure of the system is used and which setting parameters, such as the name of the person requesting leave, the leave period, and the deputy, must be provided. Similarly, in addition to analyzing the data structure of the enterprise database 101 to find the terms specific to the enterprise's field, the associations between those terms must also be analyzed. For example, packing slip, customer name, and product name are associated terms, because a packing slip records which customer the shipment goes to and which goods are in that shipment.
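One simple way to picture the association analysis is to treat terms that appear in the same table as associated. The sketch below assumes the database schema is available as a mapping from table name to column names; that representation and the function name `build_term_associations` are illustrative assumptions, not the patent's method.

```python
from collections import defaultdict
from itertools import combinations


def build_term_associations(schema):
    """Associate every pair of terms that share a table (table name included)."""
    associations = defaultdict(set)
    for table, columns in schema.items():
        terms = [table, *columns]
        for a, b in combinations(terms, 2):
            associations[a].add(b)
            associations[b].add(a)
    return dict(associations)


# Toy schema mirroring the packing slip example in the text.
schema = {"packing slip": ["customer name", "product name"]}
associations = build_term_associations(schema)
```

With this toy schema, `associations["packing slip"]` contains both `"customer name"` and `"product name"`, matching the example of associated terms given above.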
Then, in steps S420 and S430, the data query operation corpus generator is established using the domain knowledge database 133 and the instruction execution operation corpus generator is established using the application knowledge database 132. The data query operation corpus generator then generates multiple data query operation training corpora and the instruction execution operation corpus generator generates multiple instruction execution operation training corpora, which together form the training corpus set.
In one embodiment, the data query operation corpus generator analyzes the natural language that enterprise personnel use when querying the enterprise database 101 and summarizes a query rule from it, so that the generator can automatically produce corpus data for data query operations. The query rule for a data query operation can be [lead-in] + [enterprise data condition] * n + [conjunction] + [enterprise domain term to be queried] + [closing phrase]. For example, suppose the natural language used by an employee is "I want to find company A last month's orders, do you know?". In this example, "I want to find" is the [lead-in]; "company A" and "last month" are both [enterprise data condition], of which there may be several (two in this example); the possessive connective "'s" is the [conjunction]; "orders" is the [enterprise domain term to be queried]; and "do you know?" is the [closing phrase].
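The query rule lends itself to slot filling: every combination of slot values yields one training sentence. The following is a minimal sketch, assuming each slot is given as a list of candidate strings and that each of the n condition slots draws from its own list. The function name and the slot values are hypothetical, and the slot order mirrors the rule as stated, so the English output is somewhat stilted.

```python
from itertools import product


def generate_query_corpus(lead_ins, conditions, connectives, terms, closings):
    """Fill [lead-in] + [condition]*n + [connective] + [term] + [closing phrase].

    `conditions` is a list of n lists, one list of candidates per condition slot.
    """
    sentences = []
    for lead, conds, conn, term, closing in product(
        lead_ins, product(*conditions), connectives, terms, closings
    ):
        sentences.append(f"{lead} {' '.join(conds)}{conn} {term}{closing}")
    return sentences


corpus = generate_query_corpus(
    lead_ins=["I want to find", "Please look up"],
    conditions=[["company A", "company B"], ["last month", "this year"]],
    connectives=["'s"],
    terms=["orders", "packing slips"],
    closings=[", do you know?", ""],
)
```

Here two lead-ins, four condition combinations, one connective, two terms, and two closing phrases give 32 sentences, which illustrates how a handful of slot values can produce a large corpus.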
Likewise, the instruction execution operation corpus generator analyzes the natural language that enterprise personnel use when interacting with the enterprise resource system 102 and summarizes an execution rule from it, so that the generator can automatically produce corpus data for instruction execution operations. The execution rule for an instruction execution operation can be [lead-in] + [enterprise system service parameter] * n + [conjunction] + [enterprise system service to be used] + [closing phrase]. For example, suppose the natural language used by an employee is "Help me request 1/15~1/16's sick leave". In this example, "Help me request" is the [lead-in]; "1/15~1/16" is the [enterprise system service parameter], of which there may be several (only one in this example); the possessive connective "'s" is the [conjunction]; "sick leave" is the [enterprise system service to be used]; and there is no [closing phrase]. Once the query rule of the data query operation corpus generator and the execution rule of the instruction execution operation corpus generator have been established in this way, training corpora can be generated in large quantities to form the training corpus set.
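Because each generated sentence is built from known slot values, the generator can also record which service and which parameters the sentence contains, which is useful as training labels. The sketch below illustrates that idea for the execution rule with a single parameter slot and no closing phrase; the function name and the dictionary layout of each sample are assumptions made for illustration.

```python
from itertools import product


def generate_command_corpus(lead_ins, parameter_values, connectives, services):
    """Fill [lead-in] + [service parameter] + [connective] + [service] and keep the labels."""
    samples = []
    for lead, param, conn, service in product(
        lead_ins, parameter_values, connectives, services
    ):
        samples.append(
            {
                "text": f"{lead} {param}{conn} {service}",
                "service": service,       # which system service the sentence asks for
                "parameters": [param],    # the slot values that filled the rule
            }
        )
    return samples


samples = generate_command_corpus(
    lead_ins=["Help me request"],
    parameter_values=["1/15~1/16", "2/3"],
    connectives=["'s"],
    services=["sick leave", "annual leave"],
)
```

The first sample reproduces the sentence from the example above, already tagged with its service and its parameter.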
In step S440, a key entity set is formed from multiple system domain terms and multiple service application parameters. For example, the key entity set includes information such as the enterprise's domain vocabulary and the service application parameters of the enterprise system. The enterprise domain vocabulary refers to the terms that enterprises in each field need to use. The vocabulary used in the medical industry, for example, is not necessarily the same as that used in the transport industry, so the domain vocabulary varies with each enterprise that uses the ERP system. The service application parameters of the enterprise system are the parameters that correspond to each service the system provides. For example, the leave function of the enterprise system may require information such as the leave period and the leave type, so the system domain terms in the key entity set need to include terms such as leave of absence, annual leave, sick leave, and business-trip leave.
Specifically, the key entity set also includes the data field names used when accessing data, the names of the services the enterprise system offers to users, the parameter values of the restrictive conditions a user sets in a query, the parameter values of service applications, and the operating functions of the enterprise system, such as requesting leave, applying for overtime, applying for a business trip, and submitting expense claims. Each of these items may also have aliases, which must be entered into the training database as well. For example, a manufacturer in a particular field may call a packing slip a shipment detail list or a sales slip.
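Alias handling can be sketched as an index that maps every name, canonical or alias, back to one canonical entity. The structure below is a hypothetical illustration; the patent states only that aliases are entered into the training database, not how they are stored.

```python
def build_alias_index(entities):
    """Map each canonical name and each of its aliases to the canonical name."""
    index = {}
    for canonical, aliases in entities.items():
        for name in (canonical, *aliases):
            index[name.lower()] = canonical
    return index


# Toy key entity set using the examples from the text.
key_entities = {
    "packing slip": ["shipment detail list", "sales slip"],
    "sick leave": [],
}
alias_index = build_alias_index(key_entities)
```

Looking up `alias_index["sales slip"]` then resolves to `"packing slip"`, so sentences using either name can be treated as referring to the same entity.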
In step S450, the common vocabulary model and the common semantic model are generated using the key entity set and the training corpus set. For the detailed steps of step S450, please refer to Fig. 5, which is a flowchart of step S450 according to some embodiments of the present disclosure. As shown in Fig. 5, the stage of generating the vocabulary and semantic models comprises the following steps:
Step S451: distinguish the intents of the data query operation training corpora according to the categories in the enterprise database to form multiple data query operation intents, and distinguish the intents of the instruction execution operation training corpora according to the service behaviors provided by the enterprise resource system to form multiple instruction execution operation intents;
Step S452: establish models of the data query operation intents and models of the instruction execution operation intents;
Step S452: establish a global database according to the key entity set, the models of the data query operation intents, and the models of the instruction execution operation intents;
Step S453: identify multiple first probabilities with which the system domain terms in the key entity set occur in the training corpus set, use the identified system domain terms to analyze multiple sentence structures of the data query operation training corpora and multiple associations among the system domain terms, and establish the common vocabulary model according to the first probabilities and the associations; and
Step S454: analyze multiple second probabilities with which the system domain terms occur in the data query operation intents and the instruction execution operation intents, and establish the common semantic model according to the sentence structures and the second probabilities.
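The text does not define how the first and second probabilities are computed. One plausible reading, shown here purely as an assumption, is that the first probability of a term is the fraction of all training sentences that contain it, and the second probability is the same fraction computed separately within each intent. The function names and the sample data are hypothetical.

```python
from collections import defaultdict


def term_probabilities(sentences, terms):
    """Fraction of sentences containing each term (assumed 'first probability')."""
    total = len(sentences)
    if total == 0:
        return {term: 0.0 for term in terms}
    return {
        term: sum(term in s.lower() for s in sentences) / total
        for term in terms
    }


def term_probabilities_by_intent(labelled, terms):
    """Same fraction computed within each intent (assumed 'second probability')."""
    by_intent = defaultdict(list)
    for sentence, intent in labelled:
        by_intent[intent].append(sentence)
    return {
        intent: term_probabilities(sents, terms)
        for intent, sents in by_intent.items()
    }


labelled = [
    ("I want to find company A's orders", "query_orders"),
    ("Show me last month's orders", "query_orders"),
    ("Help me request sick leave", "request_leave"),
    ("Find the packing slip for company A", "query_shipments"),
]
terms = ["orders", "sick leave"]

first = term_probabilities([s for s, _ in labelled], terms)
second = term_probabilities_by_intent(labelled, terms)
```

Under this reading, a term that is moderately frequent overall but concentrated in one intent, as "sick leave" is here, becomes a strong cue for that intent.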
In step S451, the intents of the data query operation training corpora are distinguished according to the categories in the enterprise database 101 to form a plurality of data query operation intents, and the intents of the instruction execution operation training corpora are distinguished according to the service behaviors provided by the enterprise resource system 102 to form a plurality of instruction execution operation intents. In one embodiment, the data query operation intents can first be distinguished according to the enterprise database 101 of each field. For example, the fields stored in the enterprise database of the healthcare industry are certainly not the same as those in the enterprise database of the transport industry, so the needs of their users are not necessarily the same either. Users in the healthcare industry may want to query medical records or ward vacancies, each of which is a different data query operation intent; users in the transport industry may want to query shipment records or the shipping status of a package, each of which is likewise a different data query operation intent. The instruction execution operation intents can similarly be distinguished according to the service behaviors provided by the enterprise resource system of each field. The services provided by the enterprise resource system of the healthcare industry naturally differ from those of the transport industry, and the data query operations or service behavior operations offered by enterprises in different fields are not necessarily interchangeable, so the intents must also be distinguished for the services provided by the enterprises of each field. For example, users in the healthcare industry may be offered a registration service or a service for ordering healthy meals during a hospital stay, each of which is a different service behavior operation intent; users in the transport industry may be offered a service for automatically sorting cargo or a service for scheduling the order of cargo shipments, each of which is also a different service behavior operation intent.
In step S452 and step S453, models of the data query operation intents and models of the instruction execution operation intents are established, and the global database 131 is established according to the key entity set, the models of the data query operation intents, and the models of the instruction execution operation intents. For example, once the data query operation intents and instruction execution operation intents that users may have when operating the virtual assistant of an enterprise in a given field have all been distinguished, a corresponding model can be generated for each intent. Following the example above, the healthcare industry has four enterprise resource system instruction operation models, corresponding to querying medical records, querying ward vacancies, providing the registration service, and providing the service for ordering healthy meals during a hospital stay. The transport industry likewise has four enterprise resource system instruction operation models, corresponding to querying shipment records, querying the shipping status of a package, providing the automatic cargo sorting service, and providing the service for scheduling the order of cargo shipments. The global database 131 can then be established from these models and the key entity set.
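As a non-limiting illustration, the grouping of training corpora by intent and their storage next to the entity set can be sketched as follows. The data layout, the intent identifiers, and the sample sentences are hypothetical; the disclosure does not specify how the global database is organized.

```python
from collections import defaultdict

# Hypothetical labeled training corpus for a healthcare enterprise:
# (sentence, operation type, intent identifier).
TRAINING_CORPUS = [
    ("I want to look up the medical record of patient A", "query", "query_medical_records"),
    ("Are there any free beds in ward 3", "query", "query_ward_vacancies"),
    ("Please register me for an appointment", "execute", "register_appointment"),
    ("Order a healthy meal for room 12", "execute", "order_inpatient_meal"),
]


def build_global_database(corpus, key_entities):
    """Group sentences by intent and keep them together with the entity set."""
    intents = defaultdict(lambda: {"type": None, "sentences": []})
    for sentence, op_type, intent in corpus:
        intents[intent]["type"] = op_type
        intents[intent]["sentences"].append(sentence)
    return {"intents": dict(intents), "key_entities": set(key_entities)}


GLOBAL_DB = build_global_database(TRAINING_CORPUS, ["medical record", "ward"])
```

Each entry under "intents" stands in for one per-intent model; an actual embodiment would store trained model parameters rather than raw sentences.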
In step S454, a plurality of first probabilities with which the system domain vocabulary in the key entity set appears in the training corpus set are identified, a plurality of sentence pattern structures of the data query operation training corpora and a plurality of associations among the system domain vocabulary are analyzed by means of the identified system domain vocabulary, and the common vocabulary model is established according to the first probabilities and the associations. In one embodiment, two algorithms, the n-gram model (N-gram) and context-free grammar (CFG), are used to calculate the probability with which each system domain vocabulary item appears in the training corpora, and the common vocabulary model is established by using the system domain vocabulary to analyze the sentence pattern structures of the training corpora and the associations among the system domain vocabulary. For example, suppose the training corpora contain "I want to query the price list of Company A" and "I want to query the packing slip of Company A", and "Company A", "price list", and "packing slip" are all system domain vocabulary. Because "Company A" may appear evenly across the intents of every data query operation, its probability is almost the same in each of those intents. By contrast, "price list" and "packing slip" appear in large numbers only in the training corpora for the intents of querying certain specific data and not in the training corpora for the intents of querying other data, so their probabilities are especially high in the corresponding intents and lower in the others.
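As a non-limiting illustration, the behavior of the first probabilities in this example can be reproduced with simple relative-frequency counting. This sketch is a deliberate simplification: it counts whether a term occurs in a sentence and does not implement the n-gram or context-free grammar analysis named above. The intent identifiers and the extra sample sentences are hypothetical.

```python
from collections import Counter, defaultdict

# System domain vocabulary from the example, lower-cased for matching.
DOMAIN_TERMS = ["company a", "price list", "packing slip"]

# Hypothetical training corpus: (sentence, intent identifier).
CORPUS = [
    ("I want to query the price list of Company A", "query_pricing"),
    ("Show me the price list for Company A", "query_pricing"),
    ("I want to query the packing slip of Company A", "query_shipments"),
    ("Find the packing slip issued to Company A", "query_shipments"),
]


def term_probabilities(corpus, terms):
    """Estimate P(term appears in a sentence | intent) by relative frequency."""
    sentence_counts = Counter()
    term_counts = defaultdict(Counter)
    for sentence, intent in corpus:
        sentence_counts[intent] += 1
        text = sentence.lower()
        for term in terms:
            if term in text:  # naive substring match, for illustration only
                term_counts[intent][term] += 1
    return {
        intent: {term: term_counts[intent][term] / total for term in terms}
        for intent, total in sentence_counts.items()
    }


PROBS = term_probabilities(CORPUS, DOMAIN_TERMS)
```

On this toy corpus "company a" receives the same estimate under both intents, while "price list" and "packing slip" are each concentrated in a single intent, which matches the qualitative behavior described above.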
In step S455, a plurality of second probabilities with which the system domain vocabulary appears in the data query operation intents and the instruction execution operation intents are analyzed, and the common semantic model is established according to the sentence pattern structures and the second probabilities. In one embodiment, a hidden Markov model (HMM) algorithm is used to calculate the probability with which system domain vocabulary items appear together in each intent (including the data query operation intents and the instruction execution operation intents), in order to establish the common semantic model. For example, many training corpora are input during the model training stage, and the hidden Markov model algorithm calculates the probability with which the system domain vocabulary items appear together under different intents. Continuing the example above, if the training corpora contain "I want to query the packing slip of Company A", the n-gram model and context-free grammar can identify both "Company A" and "packing slip" as system domain vocabulary. From all the system domain vocabulary identified in each corpus under the different intents, the hidden Markov model algorithm then calculates the probability with which the identified items (that is, "Company A" and "packing slip") appear together in a specific intent (for example, the data query operation intent of querying shipment-related data, or the instruction execution operation intent of applying for a business trip), and this serves as the semantic model for identifying the user's intent. With the common semantic model established by the hidden Markov model algorithm, the virtual assistant can determine that the simultaneous appearance of "Company A" and "packing slip" is highly associated with the intent of querying shipment data. By combining the enterprise resource system instruction operation model for querying shipment data with the system domain vocabulary item "Company A" as the query condition, the virtual assistant can automatically help the user query the shipment-related data of Company A in the enterprise database.
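As a non-limiting illustration, the idea of scoring each intent by how often the identified vocabulary items appear together can be sketched with plain co-occurrence counting. This is not a hidden Markov model; it is a simpler stand-in chosen only to show how the second probabilities separate intents. The corpus, the intent identifiers, and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical training corpus: (sentence, intent identifier).
CORPUS = [
    ("I want to query the packing slip of Company A", "query_shipments"),
    ("Show the packing slip sent to Company A", "query_shipments"),
    ("I want to query the price list of Company A", "query_pricing"),
    ("I want to apply for a business trip to Company A", "apply_business_trip"),
]


def cooccurrence_probabilities(corpus, terms):
    """Estimate P(all given terms appear together in a sentence | intent)."""
    totals, hits = Counter(), Counter()
    for sentence, intent in corpus:
        totals[intent] += 1
        text = sentence.lower()
        if all(term in text for term in terms):
            hits[intent] += 1
    return {intent: hits[intent] / total for intent, total in totals.items()}


def most_likely_intent(corpus, terms):
    """Pick the intent under which the terms co-occur most often."""
    probs = cooccurrence_probabilities(corpus, terms)
    return max(probs, key=probs.get)


BEST = most_likely_intent(CORPUS, ["company a", "packing slip"])
```

On this toy corpus the pair "company a" and "packing slip" co-occurs only under the shipment query intent, so that intent is selected.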
After the common vocabulary model and the common semantic model have been established, the virtual assistant can perform corresponding operations according to them. For example, when there is a voice input, the virtual assistant first performs speech recognition to convert the natural language into corpus data. It then uses the established common vocabulary model and common semantic model to find the key vocabulary in the corpus data and to determine the user's intent, at which point the user's need is understood. The virtual assistant can then carry out the corresponding operation according to the user's need and the identified key vocabulary, for example searching for data in the database or executing an enterprise service operation.
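As a non-limiting illustration, the runtime flow described above (recognized text, key vocabulary, intent, operation) can be sketched with two lookup tables standing in for the trained models. The tables, the intent identifiers, and the function names are hypothetical, and speech recognition is assumed to have already produced the text.

```python
# Hypothetical stand-ins for the trained vocabulary and semantic models.
DOMAIN_TERMS = {
    "company a": "customer",
    "packing slip": "document",
    "price list": "document",
}
INTENT_BY_TERM = {
    "packing slip": "query_shipments",
    "price list": "query_pricing",
}


def extract_terms(utterance):
    """Find the known domain terms that occur in the recognized text."""
    text = utterance.lower()
    return [term for term in DOMAIN_TERMS if term in text]


def handle_utterance(utterance):
    """Return (intent, query conditions), or (None, []) if no intent is found."""
    terms = extract_terms(utterance)
    intent = next((INTENT_BY_TERM[t] for t in terms if t in INTENT_BY_TERM), None)
    if intent is None:
        return None, []
    # Terms tagged as customers are used as query conditions.
    conditions = [t for t in terms if DOMAIN_TERMS[t] == "customer"]
    return intent, conditions
```

The returned intent would select the enterprise resource system instruction operation model to run, and the conditions would be passed to it as query parameters.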
As can be seen from the above embodiments, the present disclosure mainly improves on the past situation in which training a virtual assistant required a person to converse with it continually and supply training corpora before it could interact with people. By automatically generating natural-language training corpora to train the semantic model and the vocabulary model, the virtual assistant can interact with users according to those models, and new training results can be produced continually from the automatically generated training corpora, so that the virtual assistant is trained and updated quickly.
In addition, although the above examples present the steps in sequence, the steps need not be performed in the order shown. Performing the steps in a different order is within the scope contemplated by this disclosure. Within the spirit and scope of the embodiments of this disclosure, steps may be added, replaced, reordered, and/or omitted as appropriate.
Although the present disclosure has been described above by way of embodiments, they are not intended to limit it. Anyone skilled in the art may make various modifications and variations without departing from the spirit and scope of the present disclosure, so the scope of protection shall be as defined by the appended claims.

Claims (10)

1. A method for automatically training a virtual assistant, characterized by comprising:
analyzing the data structure of an enterprise database to form a domain knowledge database, and analyzing the workflow of an enterprise resource system to form a work knowledge database;
establishing a data query operation corpus generator using the domain knowledge database, and establishing an instruction execution operation corpus generator using the work knowledge database;
generating a plurality of data query operation training corpora using the data query operation corpus generator and a plurality of instruction execution operation training corpora using the instruction execution operation corpus generator, to form a training corpus set;
forming a plurality of system domain vocabulary items and a plurality of service application parameters into a key entity set; and
generating a common vocabulary model and a common semantic model using the key entity set and the training corpus set.
2. the method for automatic training virtual assistant according to claim 1, which is characterized in that utilize the critical entities set The common lexicon model and the common semanteme model are generated with the training corpus set, also includes:
Multiple look into is formed according to the intention of the multiple inquiry data manipulation training corpus of the class discrimination in the enterprise database It askes data manipulation to be intended to, and distinguishes the multiple executing instruction operations according to the service behavior that the enterprise resource system provides and instruct The intention for practicing corpus forms multiple executing instruction operations and is intended to;
Establish the model that the multiple inquiry data manipulation is intended to and the model that the multiple executing instruction operations are intended to;
The model being intended to according to the critical entities set, the multiple inquiry data manipulation and the multiple executing instruction operations The model of intention establishes a global database;
Recognize the multiple system regions vocabulary in the critical entities set occurs in the training corpus set multiple One probability, and the multiple inquiry data manipulation training corpus of the multiple system regions lexical analysis by picking out is more Multiple relevances between a sentence pattern structure and the multiple system regions vocabulary, and according to the multiple first machine Rate and the multiple relevance establish a common lexicon model;And
It analyzes in the multiple inquiry data manipulation intention and the multiple executing instruction operations intention and the multiple system occurs It unites multiple second probability of Field Words, and it is common according to the multiple sentence pattern structure and the multiple second probability to establish one Semanteme model.
3. the method for automatic training virtual assistant according to claim 1, which is characterized in that the inquiry data manipulation corpus Generator also includes:
The multiple queries corpus data of the enterprise database is inquired in analysis, and is summarized the one of the multiple inquiry corpus data and looked into Ask rule;And
The multiple inquiry data manipulation training corpus is automatically generated according to the rule searching.
4. the method for automatic training virtual assistant according to claim 1, which is characterized in that the executing instruction operations corpus Generator also includes:
The multiple execution corpus datas interacted with the enterprise resource system are analyzed, and summarize the multiple execution corpus data One executing rule;And
The multiple executing instruction operations training corpus is automatically generated according to the executing rule.
5. the method for automatic training virtual assistant according to claim 2, which is characterized in that described in automatically generating Multiple queries data manipulation training corpus and the multiple executing instruction operations training corpus, training the common lexicon model with And the common semanteme model, a virtual assistant can execute corresponding according to the common lexicon model and the common semanteme model Operation.
6. A system for automatically training a virtual assistant, connected to an enterprise database and to an enterprise resource system respectively, characterized by comprising:
a processor; and
a storage device, electrically connected to the processor, for storing a global database, a work knowledge database, and a domain knowledge database;
wherein the processor comprises:
an analysis module, for analyzing the data structure of an enterprise database to form a domain knowledge database and analyzing the workflow of an enterprise resource system to form a work knowledge database;
a generator establishing module, electrically connected to the training module, for establishing a data query operation corpus generator using the domain knowledge database and establishing an instruction execution operation corpus generator using the work knowledge database;
a training corpus generation module, electrically connected to the generator establishing module, for generating a plurality of data query operation training corpora using the data query operation corpus generator and a plurality of instruction execution operation training corpora using the instruction execution operation corpus generator to form a training corpus set, and for forming a key entity set according to a plurality of system domain vocabulary items, a plurality of service application parameters, and the training corpus set; and
a semantic and vocabulary model establishing module, electrically connected to the training corpus generation module, for generating a common vocabulary model and a common semantic model using the key entity set.
7. The system for automatically training a virtual assistant according to claim 6, characterized in that the semantic and vocabulary model establishing module further comprises:
a model establishing module, electrically connected to the training corpus generation module, which distinguishes the intents of the plurality of data query operation training corpora according to the categories in the enterprise database to form a plurality of data query operation intents, distinguishes the intents of the plurality of instruction execution operation training corpora according to the service behaviors provided by the enterprise resource system to form a plurality of instruction execution operation intents, establishes models of the plurality of data query operation intents and models of the plurality of instruction execution operation intents, and then establishes a global database according to the key entity set, the models of the plurality of data query operation intents, and the models of the plurality of instruction execution operation intents;
a vocabulary model establishing module, electrically connected to the model establishing module, which identifies a plurality of first probabilities with which the plurality of system domain vocabulary items in the key entity set appear in the training corpus set, analyzes, by means of the identified system domain vocabulary items, a plurality of sentence pattern structures of the plurality of data query operation training corpora and a plurality of associations among the plurality of system domain vocabulary items, and establishes a common vocabulary model according to the plurality of first probabilities and the plurality of associations; and
a semantic model establishing module, electrically connected to the model establishing module, which analyzes a plurality of second probabilities with which the plurality of system domain vocabulary items appear in the plurality of data query operation intents and the plurality of instruction execution operation intents, and establishes a common semantic model according to the plurality of sentence pattern structures and the plurality of second probabilities.
8. The system for automatically training a virtual assistant according to claim 6, characterized in that the data query operation corpus generator is configured to analyze a plurality of query corpus data used to query the enterprise database, summarize a query rule of the plurality of query corpus data, and automatically generate the plurality of data query operation training corpora according to the query rule.
9. The system for automatically training a virtual assistant according to claim 6, characterized in that the instruction execution operation corpus generator is configured to analyze a plurality of execution corpus data used to interact with the enterprise resource system, summarize an execution rule of the plurality of execution corpus data, and automatically generate the plurality of instruction execution operation training corpora according to the execution rule.
10. The system for automatically training a virtual assistant according to claim 7, characterized in that the common vocabulary model and the common semantic model are trained using the automatically generated plurality of data query operation training corpora and plurality of instruction execution operation training corpora, and a virtual assistant can perform a corresponding operation according to the common vocabulary model and the common semantic model.
CN201810244565.2A 2018-03-23 2018-03-23 Method and system for automatically training virtual assistant Active CN110298372B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810244565.2A CN110298372B (en) 2018-03-23 2018-03-23 Method and system for automatically training virtual assistant

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810244565.2A CN110298372B (en) 2018-03-23 2018-03-23 Method and system for automatically training virtual assistant

Publications (2)

Publication Number Publication Date
CN110298372A true CN110298372A (en) 2019-10-01
CN110298372B CN110298372B (en) 2023-06-09

Family

ID=68025894

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810244565.2A Active CN110298372B (en) 2018-03-23 2018-03-23 Method and system for automatically training virtual assistant

Country Status (1)

Country Link
CN (1) CN110298372B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8577131B1 (en) * 2011-07-12 2013-11-05 Google Inc. Systems and methods for visual object matching
CN104346406A (en) * 2013-08-08 2015-02-11 北大方正集团有限公司 Training corpus expanding device and training corpus expanding method
US20150220511A1 (en) * 2014-02-04 2015-08-06 Maluuba Inc. Method and system for generating natural language training data
CN107688583A (en) * 2016-08-05 2018-02-13 株式会社Ntt都科摩 The method and apparatus for creating the training data for natural language processing device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
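司玉景,肖业鸣,徐及,潘接林,颜永红: "Automatic corpus generation algorithm for statistical language modeling of spoken language", Acta Automatica Sinica (《自动化学报》) *
黄韵竹,韦玮,罗杨宇,李成荣: "Word-class expansion method for training corpora of restricted-domain language models", Computer Systems & Applications (《计算机系统应用》) *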

Also Published As

Publication number Publication date
CN110298372B (en) 2023-06-09

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant