CN110489517A

CN110489517A - Automatic learning method and system for a virtual assistant

Info

Publication number: CN110489517A
Application number: CN201810436639.2A
Authority: CN
Inventors: 周忠信; 吴兆麟; 许旭正
Original assignee: Digiwin Software Co Ltd
Current assignee: Digiwin Software Co Ltd
Priority date: 2018-05-09
Filing date: 2018-05-09
Publication date: 2019-11-22
Anticipated expiration: 2038-05-09
Also published as: CN110489517B

Abstract

An automatic learning method and system for a virtual assistant. The automatic learning method includes: receiving an audio input and recognizing the audio to form corpus data; analyzing the corpus data with a natural language processing model to generate language feature information corresponding to the corpus data; performing functional situation analysis on the language feature information according to functional situation information to determine an operation corresponding to one of a plurality of intents; if the functional situation analysis cannot determine such an operation, performing word segmentation on the corpus data; determining, from the result of the word segmentation, whether there is a new term or new corpus data; and, if there is a new term, updating the natural language processing model according to the meaning of the new term, or, if there is new corpus data, updating the functional situation analysis according to the intent of the new corpus data. In this way, users can operate an ERP system more quickly and conveniently.

Description

Automatic learning method and system for a virtual assistant

Technical field

The present disclosure relates to an automatic learning method and system, and in particular to an automatic learning method and system for a virtual assistant.

Background

An enterprise resource planning (ERP) system is a management platform, built on information technology, that supports decision-making by an enterprise's management. It manages the enterprise's flows of personnel, logistics, information and cash in a unified way so as to make the fullest use of the enterprise's resources. Because an ERP system covers three broad areas of functionality, namely production control, logistics management and financial management, ERP systems are very large.

Applying a virtual assistant to an ERP system helps users interact with such a large system more quickly and saves the time they spend using it. However, because each user has different habits when using the ERP system, the virtual assistant sometimes fails to understand a user's question, which can instead make the ERP system harder to use.

Summary of the invention

The main object of the present invention is to provide an automatic learning method and system for a virtual assistant that give the virtual assistant an automatic learning capability, so that while interacting with users it automatically learns their speaking habits and the jargon of their industry, allowing users to operate the ERP system more quickly and conveniently.

To achieve the above object, a first aspect of the present disclosure provides an automatic learning method of a virtual assistant, including the following steps: receiving an audio input and recognizing the audio to form corpus data; analyzing the corpus data with a natural language processing model to generate language feature information corresponding to the corpus data, where the language feature information includes a plurality of intents, probabilities corresponding to the intents, and a plurality of terms; performing functional situation analysis on the language feature information according to functional situation information to determine an operation corresponding to one of the intents; if the functional situation analysis cannot determine the operation corresponding to any of the intents, performing word segmentation on the corpus data; determining, from the result of the word segmentation, whether there is a new term or new corpus data; and, if there is a new term, updating the natural language processing model according to the meaning of the new term, or, if there is new corpus data, updating the functional situation analysis according to the intent of the new corpus data; where the operation is one of a data query operation and an instruction execution operation.

According to one embodiment of the present disclosure, the method further includes: generating a system domain vocabulary set according to a working knowledge database and a domain knowledge database; combining the system domain vocabulary set with a plurality of service application parameters to form a key entity set that includes a plurality of system domain terms; classifying each of a plurality of training corpora as either the data query operation or the instruction execution operation; forming a plurality of data query intents by distinguishing, according to categories in the enterprise database, the intents of the training corpora corresponding to the data query operation, and forming a plurality of instruction execution intents by distinguishing, according to the services provided by the enterprise resource system, the intents of the training corpora corresponding to the instruction execution operation; establishing models of the data query intents and models of the instruction execution intents; establishing the global database according to the key entity set, the data query intent models and the instruction execution intent models; identifying a plurality of first probabilities with which the system domain terms in the key entity set occur in the training corpora, analyzing, from the identified system domain terms, a plurality of sentence pattern structures of the training corpora and a plurality of correlations among the system domain terms, and establishing a common lexicon model according to the first probabilities and the correlations; and analyzing a plurality of second probabilities with which the system domain terms occur in the data query intents and the instruction execution intents, and establishing a common semantic model according to the sentence pattern structures and the second probabilities.
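The first-probability and correlation statistics behind the common lexicon model can be illustrated with a minimal Python sketch. It assumes that a term "occurs" in a training sentence when it appears as a substring, and it estimates the correlation between two terms as the conditional co-occurrence probability P(b | a). Both choices are illustrative assumptions rather than the method specified here, and the sentence-pattern analysis is omitted.

```python
from collections import Counter
from itertools import combinations


def build_common_lexicon_model(domain_terms, training_corpus):
    """Sketch: first probabilities and pairwise correlations of system domain terms.

    Assumptions (not from the patent): substring matching decides whether a term
    occurs in a sentence, and correlation(a, b) is estimated as P(b | a).
    """
    n = len(training_corpus)
    occurrences = Counter()   # number of sentences containing each term
    pair_counts = Counter()   # number of sentences containing both terms of a pair
    for sentence in training_corpus:
        present = sorted(t for t in domain_terms if t in sentence)
        occurrences.update(present)
        pair_counts.update(combinations(present, 2))

    # First probability: share of training sentences in which the term occurs.
    first_prob = {t: occurrences[t] / n for t in domain_terms}

    # Correlation: how often term b appears given that term a appears.
    correlation = {}
    for (a, b), count in pair_counts.items():
        correlation[(a, b)] = count / occurrences[a]
        correlation[(b, a)] = count / occurrences[b]
    return first_prob, correlation


if __name__ == "__main__":
    terms = {"packing slip", "customer", "leave"}
    corpus = [
        "find the packing slip for customer X",
        "show the packing slip",
        "request annual leave",
    ]
    first_prob, correlation = build_common_lexicon_model(terms, corpus)
    print(correlation[("packing slip", "customer")])  # 0.5
```

Storing both directions of each pair keeps the correlation asymmetric, which matters when one term (here "packing slip") is far more common than the other.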

According to one embodiment of the present disclosure, the method further includes: classifying the data in a historical database by relationship strength with a classifier to generate a functional situation model; and segmenting and analyzing the training corpora, and generating a functional lexicon model according to the data in the historical database.
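The relationship-strength classification that yields the functional situation model can be sketched as follows. The classifier is not specified in this embodiment, so the sketch swaps in a simple frequency threshold: historical records are assumed to be (context, intent) pairs, and a context-intent relationship is labelled "strong" when that intent accounts for at least a hypothetical `strong_threshold` share of the context's records. The context and intent names are illustrative only.

```python
from collections import Counter, defaultdict


def build_situation_model(history, strong_threshold=0.6):
    """Sketch of a functional situation model built from historical records.

    `history` is a list of (context, intent) pairs from the historical database.
    The "classifier" is a frequency threshold used as an illustrative stand-in.
    """
    context_totals = Counter(context for context, _ in history)
    pair_counts = Counter(history)
    model = defaultdict(dict)
    for (context, intent), count in pair_counts.items():
        # Strength: share of this context's records that carried this intent.
        strength = count / context_totals[context]
        label = "strong" if strength >= strong_threshold else "weak"
        model[context][intent] = (label, strength)
    return dict(model)


if __name__ == "__main__":
    history = [
        ("leave_form", "apply_leave"),
        ("leave_form", "apply_leave"),
        ("leave_form", "query_leave_balance"),
        ("shipping_page", "query_shipment"),
    ]
    print(build_situation_model(history)["shipping_page"])
```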

According to one embodiment of the present disclosure, the functional situation analysis further includes: comparing the corpus data and the functional situation information with the functional situation model to generate a functional situation recognition result; and determining, according to the functional situation recognition result, whether one of the intents corresponds to the data query operation or the instruction execution operation.
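One way to picture the comparison against the functional situation model is the sketch below. It assumes that each candidate intent's probability from the natural language processing model is multiplied by that intent's strength in the current context, and that a hypothetical `min_score` cut-off decides when no operation can be determined. Returning `None` corresponds to the case in which word segmentation is performed next.

```python
def recognize_operation(intents, context, situation_model, intent_to_operation, min_score=0.2):
    """Sketch of functional situation recognition.

    intents:             {intent: probability} from the corpus analysis step
    context:             functional situation information, e.g. the ERP screen in use
    situation_model:     {context: {intent: strength}} learned from history
    intent_to_operation: {intent: "query" | "execute"}
    Returns (intent, operation), or None when no intent is supported strongly enough.
    """
    known = situation_model.get(context, {})
    best_intent, best_score = None, 0.0
    for intent, prob in intents.items():
        # Combine the language-model probability with the situational strength.
        score = prob * known.get(intent, 0.0)
        if score > best_score:
            best_intent, best_score = intent, score
    if best_intent is None or best_score < min_score:
        return None
    return best_intent, intent_to_operation[best_intent]
```

An unseen context contributes zero strength to every intent, so the utterance falls through to the learning path instead of being forced onto a weak guess.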

According to one embodiment of the present disclosure, the word segmentation further includes: segmenting the corpus data according to the functional lexicon model to generate a plurality of segments; and calculating the frequency of each segment.
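The segmentation algorithm is not named here, so the following sketch substitutes forward maximum matching, a common dictionary-based method for Chinese text. It assumes the functional lexicon model can be represented as a dictionary mapping each word to its frequency in the historical database. Characters that match no lexicon word are kept together as one segment, so that a possible new term is not split apart. The sample words (查询 "query", 出货单 "packing slip", 客户 "customer", 欠料单 "material shortage list") and their frequencies are hypothetical.

```python
def segment_fmm(text, lexicon, max_len=4):
    """Forward maximum matching against a functional lexicon.

    At each position the longest lexicon word (up to max_len characters) is taken.
    Characters that match no word are buffered and emitted as one segment.
    """
    segments, unknown, i = [], "", 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in lexicon:
                if unknown:
                    segments.append(unknown)
                    unknown = ""
                segments.append(piece)
                i += size
                break
        else:  # no lexicon word starts at position i
            unknown += text[i]
            i += 1
    if unknown:
        segments.append(unknown)
    return segments


def segment_frequencies(text, lexicon, max_len=4):
    """Pair each segment with its lexicon frequency (0 if the segment is unseen)."""
    return [(seg, lexicon.get(seg, 0)) for seg in segment_fmm(text, lexicon, max_len)]


if __name__ == "__main__":
    lexicon = {"查询": 120, "出货单": 45, "客户": 80}
    print(segment_frequencies("查询出货单", lexicon))  # [('查询', 120), ('出货单', 45)]
    print(segment_frequencies("查询欠料单", lexicon))  # [('查询', 120), ('欠料单', 0)]
```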

According to one embodiment of the present disclosure, the method further includes: determining whether the frequency of each segment calculated during word segmentation is below a threshold; if one of the segments is below the threshold, treating that segment as the new term and receiving a definition of the new term to update the common lexicon model and the common semantic model; and if all of the segments are above the threshold, treating the corpus data as the new corpus data and receiving the intent of the new corpus data to update the functional situation model.
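The threshold decision between a new term and new corpus data can be sketched as below. `ask_user` is a hypothetical callback standing in for however the definition or intent is actually received from the user. The threshold value is also an assumption, and a frequency exactly equal to the threshold is treated here as familiar.

```python
def learn_from_unresolved(segment_freqs, corpus_text, context, ask_user, threshold=5):
    """Sketch of the new-term / new-corpus decision for an unresolved utterance.

    segment_freqs: [(segment, frequency)] from the word segmentation step
    ask_user:      hypothetical callback that asks a question and returns the answer
    """
    rare = [seg for seg, freq in segment_freqs if freq < threshold]
    if rare:
        # Some segment is rare: it is a new term. Its definition would be used to
        # update the common lexicon model and the common semantic model.
        return {"new_terms": {term: ask_user(f"What does '{term}' mean?") for term in rare}}
    # Every segment is familiar, so the sentence as a whole is new corpus data.
    # Its intent would be used to update the functional situation model.
    intent = ask_user(f"What should '{corpus_text}' do?")
    return {"new_corpus": (context, corpus_text, intent)}
```

Returning the pending updates, instead of writing to the models directly, keeps the decision logic testable separately from the model stores.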

According to one embodiment of the present disclosure, the method further includes: determining whether the new corpus data is a common corpus and, if so, updating the system domain vocabulary set according to the new corpus data; and updating the system domain vocabulary set according to the new term.

According to one embodiment of the present disclosure, analyzing the corpus data with the natural language processing model further includes: using the common lexicon model to identify whether the corpus data contains any of the system domain terms in the key entity set, setting the identified terms as the plurality of terms, and analyzing the probabilities with which these terms occur; analyzing the sentence pattern structure of the corpus data according to the terms; and using the common semantic model to identify, from the occurrence probabilities of the terms and the sentence pattern structure of the corpus data, the intents of the corpus data and the probabilities corresponding to the intents.
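A simplified picture of this analysis is sketched below. It assumes the common semantic model maps each intent to P(term | intent) and scores intents naive-Bayes style: a uniform prior, a product of term probabilities, and a small smoothing value for unseen terms, normalized to sum to 1. The order in which terms appear is used only as a crude stand-in for sentence pattern structure, and occurrence-probability weighting is left out. All of these are assumptions, not the claimed models.

```python
def analyze_corpus(text, domain_terms, semantic_model, smoothing=0.01):
    """Sketch of corpus analysis with a lexicon match and a semantic model.

    semantic_model maps intent -> {term: P(term | intent)}.
    Returns the recognised terms and a normalised {intent: probability} map.
    """
    # 1. Lexicon step: which system domain terms appear in the utterance.
    terms = [t for t in domain_terms if t in text]
    # 2. Crude stand-in for sentence structure: the order in which terms appear.
    terms.sort(key=text.index)
    # 3. Semantic step: score every intent from the recognised terms.
    scores = {}
    for intent, term_probs in semantic_model.items():
        score = 1.0
        for t in terms:
            score *= term_probs.get(t, smoothing)
        scores[intent] = score
    total = sum(scores.values())
    intents = {i: s / total for i, s in scores.items()} if total else {}
    return {"terms": terms, "intents": intents}


if __name__ == "__main__":
    model = {
        "query_packing_slip": {"packing slip": 0.9, "customer": 0.5},
        "apply_leave": {"leave": 0.9},
    }
    print(analyze_corpus("show the packing slip for customer ACME",
                         ["customer", "packing slip", "leave"], model))
```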

A second aspect of the present disclosure provides an automatic learning system of a virtual assistant, connected to an enterprise database and an enterprise resource system, which includes a processor, a storage device and an input/output device. The storage device is electrically connected to the processor and stores a global database, a working knowledge database, a domain knowledge database and a historical database. The input/output device is electrically connected to the processor and provides an interface for inputting audio. The processor includes a voice recognition module, a corpus analysis module, a situation recognition module, an unknown corpus judgment module and an information update module. The voice recognition module recognizes the audio to form corpus data. The corpus analysis module is electrically connected to the voice recognition module and analyzes the corpus data with a natural language processing model to generate language feature information corresponding to the corpus data, where the language feature information includes a plurality of intents, probabilities corresponding to the intents, and a plurality of terms. The situation recognition module is electrically connected to the corpus analysis module and performs functional situation analysis on the language feature information according to functional situation information to determine an operation corresponding to one of the intents. The unknown corpus judgment module is electrically connected to the situation recognition module; when the situation recognition module cannot determine the operation corresponding to any of the intents, it performs word segmentation on the corpus data and determines, from the result of the word segmentation, whether there is a new term or new corpus data. The information update module is electrically connected to the unknown corpus judgment module; when a new term is produced, it updates the natural language processing model according to the meaning of the new term, and when new corpus data is produced, it updates the functional situation analysis according to the intent of the new corpus data. The operation is one of a data query operation and an instruction execution operation.

According to one embodiment of the present disclosure, the processor further includes: a training module, electrically connected to the corpus analysis module, which generates a system domain vocabulary set according to the working knowledge database and the domain knowledge database, combines the system domain vocabulary set with a plurality of service application parameters to form a key entity set that includes a plurality of system domain terms, classifies each of a plurality of training corpora as either the data query operation or the instruction execution operation, forms a plurality of data query intents by distinguishing, according to categories in the enterprise database, the intents of the training corpora corresponding to the data query operation, and forms a plurality of instruction execution intents by distinguishing, according to the services provided by the enterprise resource system, the intents of the training corpora corresponding to the instruction execution operation; a model building module, electrically connected to the training module, which establishes models of the data query intents and models of the instruction execution intents, and establishes the global database according to the key entity set and these models; a lexicon model building module, electrically connected to the model building module, which identifies a plurality of first probabilities with which the system domain terms in the key entity set occur in the training corpora, analyzes, from the identified system domain terms, a plurality of sentence pattern structures of the training corpora and a plurality of correlations among the system domain terms, and establishes a common lexicon model according to the first probabilities and the correlations; and a semantic model building module, electrically connected to the model building module, which analyzes a plurality of second probabilities with which the system domain terms occur in the data query intents and the instruction execution intents, and establishes a common semantic model according to the sentence pattern structures and the second probabilities.

According to one embodiment of the present disclosure, the processor further includes: a situation training module, electrically connected to the situation recognition module, which classifies the data in the historical database by relationship strength with a classifier to generate a functional situation model; and a vocabulary training module, electrically connected to the unknown corpus judgment module, which segments and analyzes the training corpora and generates a functional lexicon model according to the data in the historical database.

According to one embodiment of the present disclosure, the situation recognition module further compares the corpus data and the functional situation information with the functional situation model to generate a functional situation recognition result, and determines, according to that result, whether one of the intents corresponds to the data query operation or the instruction execution operation.

According to one embodiment of the present disclosure, the unknown corpus judgment module further segments the corpus data according to the functional lexicon model to generate a plurality of segments, and calculates the frequency of each segment.

According to one embodiment of the present disclosure, the information update module further determines whether the frequency of each segment calculated during word segmentation is below a threshold. If one of the segments is below the threshold, that segment is the new term, and a definition of the new term is received to update the common lexicon model and the common semantic model; if all of the segments are above the threshold, the corpus data is the new corpus data, and the intent of the new corpus data is received to update the functional situation model.

According to one embodiment of the present disclosure, the information update module further determines whether the new corpus data is a common corpus and, if so, updates the system domain vocabulary set according to the new corpus data; it also updates the system domain vocabulary set according to the new term.

According to one embodiment of the present disclosure, the corpus analysis module further uses the common lexicon model to identify whether the corpus data contains any of the system domain terms in the key entity set, sets the identified terms as the plurality of terms, analyzes the probabilities with which these terms occur, analyzes the sentence pattern structure of the corpus data according to the terms, and uses the common semantic model to identify, from the occurrence probabilities of the terms and the sentence pattern structure, the intents of the corpus data and the probabilities corresponding to the intents.

The automatic learning method and system of a virtual assistant of the present invention give the virtual assistant an automatic learning capability, so that while interacting with users it automatically learns their speaking habits and industry jargon, allowing users to operate the ERP system more quickly and conveniently.

Description of the drawings

To make the above and other objects, features, advantages and embodiments of the present invention more readily understandable, the accompanying drawings are described as follows:

Fig. 1 is a schematic diagram of an automatic learning system of a virtual assistant according to some embodiments of the present disclosure;

Fig. 2 is a schematic diagram of a processor according to some embodiments of the present disclosure;

Fig. 3 is a flowchart of an automatic learning method of a virtual assistant according to some embodiments of the present disclosure;

Fig. 4 is a flowchart of training the data models according to some embodiments of the present disclosure;

Fig. 5 is a flowchart of step S320 according to some embodiments of the present disclosure;

Fig. 6 is a flowchart of step S330 according to some embodiments of the present disclosure;

Fig. 7 is a flowchart of step S340 according to some embodiments of the present disclosure; and

Fig. 8 is a flowchart of step S360 according to some embodiments of the present disclosure.

Detailed description of embodiments

The following disclosure provides many different embodiments or examples for implementing different features of the invention. Specific examples of elements and configurations are used in the following discussion to simplify this disclosure. Any example discussed is for illustrative purposes only and does not limit the scope or meaning of the invention or its examples in any way. In addition, this disclosure may repeat reference numerals and/or letters in different examples; this repetition is for simplicity and clarity and does not in itself dictate a relationship between the various embodiments and/or configurations discussed.

Unless otherwise specified, the terms used throughout the specification and claims have their ordinary meaning in this field, in the context of this disclosure, and in the specific context in which each term is used. Certain terms used to describe this disclosure are discussed below or elsewhere in this specification to give those skilled in the art additional guidance regarding the description of this disclosure.

As used herein, "coupled" or "connected" may mean that two or more elements are in direct physical or electrical contact with each other, or in indirect physical or electrical contact with each other; "coupled" or "connected" may also mean that two or more elements cooperate or interact with each other.

It will be understood that, although the terms first, second, third, etc. may be used herein to describe various elements, components, regions, layers and/or blocks, these elements, components, regions, layers and/or blocks should not be limited by these terms. These terms are used only to distinguish one element, component, region, layer or block from another. Thus, a first element, component, region, layer and/or block discussed below could be termed a second element, component, region, layer and/or block without departing from the spirit of the invention. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items; in this document, "and/or" refers to any one, all, or any combination of at least one of the listed elements.

Please refer to Fig. 1. Fig. 1 is a schematic diagram of an automatic learning system 100 of a virtual assistant according to some embodiments of the present disclosure. As shown in Fig. 1, the automatic learning system 100 includes a processor 110, a storage device 130 and an input/output device 150. The storage device 130 stores a global database 131, a working knowledge database 132, a domain knowledge database 133 and a historical database 134, and is electrically connected to the processor 110. The input/output device 150 is electrically connected to the processor 110 and provides an interface for inputting audio. In one embodiment, the input/output device 150 may be a keyboard, a touch screen, a microphone, a speaker or another suitable input/output device. The user inputs audio through the interface provided by the input/output device 150.

In various embodiments of the present invention, the processor 110 may be implemented as an integrated circuit such as a microcontroller, a microprocessor, a digital signal processor, an application-specific integrated circuit (ASIC), a logic circuit, other similar elements, or a combination of the above. The storage device 130 may be implemented as memory, a hard disk, a flash drive, a memory card, etc.

Referring to Fig. 2, Fig. 2 is a schematic diagram of the processor 110 according to some embodiments of the present disclosure. The processor 110 includes a voice recognition module 111, a corpus analysis module 112, a situation recognition module 113, an unknown corpus judgment module 114, an information update module 115, a training module 121, a model building module 122, a semantic model building module 123, a lexicon model building module 124, a situation training module 125 and a vocabulary training module 126. The corpus analysis module 112 is electrically connected to the voice recognition module 111; the situation recognition module 113 is electrically connected to the corpus analysis module 112; the unknown corpus judgment module 114 is electrically connected to the situation recognition module 113; and the information update module 115 is electrically connected to the unknown corpus judgment module 114. The training module 121 is electrically connected to the corpus analysis module 112; the model building module 122 is electrically connected to the training module 121; the semantic model building module 123 and the lexicon model building module 124 are each electrically connected to the model building module 122; the situation training module 125 is electrically connected to the situation recognition module 113; and the unknown corpus judgment module 114 is electrically connected to the vocabulary training module 126.

Please refer to Figs. 1 to 3 together. Fig. 3 is a flowchart of an automatic learning method 300 of a virtual assistant according to some embodiments of the present disclosure. As shown in Fig. 3, the automatic learning method 300 includes the following steps:

Step S310: receive an audio input and recognize the audio to form corpus data;

Step S320: analyze the corpus data with a natural language processing model to generate language feature information corresponding to the corpus data;

Step S330: perform functional situation analysis on the language feature information according to functional situation information to determine the operation corresponding to one of the intents;

Step S340: if the functional situation analysis cannot determine the operation corresponding to any of the intents, perform word segmentation on the corpus data;

Step S350: determine, from the result of the word segmentation, whether there is a new term or new corpus data; and

Step S360: if there is a new term, update the natural language processing model according to the meaning of the new term; if there is new corpus data, update the functional situation analysis according to the intent of the new corpus data.
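The control flow of steps S310 to S360 can be summarized in a short Python sketch. Each stage is injected as a callable, and the parameter names (`recognize`, `analyze`, `resolve`, `segment`, `learn`) are hypothetical placeholders rather than components defined in this disclosure. The point is the branching: learning happens only when situation analysis fails to resolve an operation.

```python
def handle_utterance(audio, recognize, analyze, resolve, segment, learn):
    """Sketch of the control flow of steps S310-S360 with injected stages."""
    corpus = recognize(audio)            # S310: speech -> corpus data
    features = analyze(corpus)           # S320: intents, probabilities, terms
    operation = resolve(features)        # S330: functional situation analysis
    if operation is not None:
        return ("operation", operation)  # query data or execute an instruction
    segments = segment(corpus)           # S340: word segmentation
    update = learn(corpus, segments)     # S350/S360: new term or new corpus data
    return ("learned", update)


if __name__ == "__main__":
    result = handle_utterance(
        b"<audio bytes>",
        recognize=lambda audio: "apply for leave on Friday",
        analyze=lambda text: {"apply_leave": 0.9},
        resolve=lambda features: ("apply_leave", "execute"),
        segment=lambda text: text.split(),
        learn=lambda text, segments: None,
    )
    print(result)  # ('operation', ('apply_leave', 'execute'))
```

Injecting the stages keeps the skeleton independent of whether speech recognition runs locally or in a cloud service, matching the two embodiments of step S310.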

In step S310, an audio input is received and recognized to form corpus data. In one embodiment, audio received through the input/output device 150 undergoes speech recognition by the voice recognition module 111 of the processor 110, which converts the user's natural language into corpus data. In another embodiment, the audio may instead be sent over the Internet to a cloud speech recognition system, and the recognition result returned by that system is used as the corpus data; for example, the cloud speech recognition system may be Google's speech recognition service.

Before step S320 is executed, the common lexicon model and the common semantic model must first be established. Please refer to Fig. 4, which is a flowchart of training the data models according to some embodiments of the present disclosure. As shown in Fig. 4, the model training stage includes the following steps:

Step S410: generate a system domain vocabulary set according to the working knowledge database and the domain knowledge database;

Step S420: combine the system domain vocabulary set with a plurality of service application parameters to form a key entity set;

Step S430: classify each of a plurality of training corpora as either a data query operation or an instruction execution operation;

Step S440: form a plurality of data query intents by distinguishing, according to the categories in the enterprise database, the intents of the training corpora corresponding to data query operations, and form a plurality of instruction execution intents by distinguishing, according to the services provided by the enterprise resource system, the intents of the training corpora corresponding to instruction execution operations;

Step S450: establish a model for each data query intent and a model for each instruction execution intent;

Step S460: establish the global database according to the key entity set, the data query intent models and the instruction execution intent models;

Step S470: identify a plurality of first probabilities with which the system domain terms in the key entity set occur in the training corpora, analyze, from the identified system domain terms, a plurality of sentence pattern structures of the training corpora and a plurality of correlations among the system domain terms, and establish the common lexicon model according to the first probabilities and the correlations; and

Step S480: analyze a plurality of second probabilities with which the system domain terms occur in the data query intents and the instruction execution intents, and establish the common semantic model according to the sentence pattern structures and the second probabilities.
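The second probabilities of step S480 can be estimated roughly as in the sketch below, which assumes that labelled training corpora are available as a mapping from each data query or instruction execution intent to its example sentences, and that a term "occurs" in a sentence when it appears as a substring. The sentence pattern structures that also feed the common semantic model are not modelled here.

```python
def build_common_semantic_model(labelled_corpus, domain_terms):
    """Sketch: second probabilities P(term | intent) from labelled training corpora.

    labelled_corpus maps each intent to its training sentences. Substring matching
    is an assumption. Terms that never occur under an intent are left out.
    """
    model = {}
    for intent, sentences in labelled_corpus.items():
        probs = {}
        for term in domain_terms:
            hits = sum(term in s for s in sentences)
            if hits:
                probs[term] = hits / len(sentences)
        model[intent] = probs
    return model


if __name__ == "__main__":
    corpus = {
        "query_packing_slip": ["find the packing slip for ACME", "list packing slip details"],
        "apply_leave": ["request annual leave on Friday"],
    }
    print(build_common_semantic_model(corpus, ["packing slip", "annual leave", "customer"]))
```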

In steps S410 and S420, a system domain vocabulary set is generated from the working knowledge database 132 and the domain knowledge database 133, and the system domain vocabulary set is then combined with a plurality of service application parameters to form a key entity set that contains a plurality of system domain terms. For example, the key entity set includes enterprise domain vocabulary and the service application parameters of the enterprise system. Enterprise domain vocabulary refers to the terms that enterprises in a given field are likely to use; for instance, the terms used in the healthcare industry are not necessarily the same as those used in the transportation industry, so the enterprise domain vocabulary varies with each enterprise that uses the ERP system. The service application parameters are the parameters required by the individual services the enterprise system provides. For example, the leave-request function of an enterprise system may need information such as the leave time and the leave type, so the system domain terms in the key entity set must include terms such as personal leave, annual leave, sick leave and business-trip leave.

Specifically, the key entity set also includes the names of the data fields involved when data are accessed, the names of the services the enterprise system provides to users, the restrictive conditions users set in queries, the parameter values of service applications, the operation functions of the enterprise system, and so on. The operation functions of the enterprise system may include leave requests, overtime applications, business-trip applications, expense reimbursement and similar functions. Because any of these items may have aliases, the aliases must also be entered into the training database; for example, manufacturers in a particular field may call a packing slip a shipment detail list or a sales slip.
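A key entity set that records aliases might be organized as in this sketch, which assumes plain Python sets and dictionaries and a simple longest-alias-first text replacement. The service name `leave_request` and its parameter names are hypothetical; the alias example follows the packing slip case in the paragraph above.

```python
def build_key_entity_set(domain_terms, service_params, aliases):
    """Sketch: merge domain vocabulary, service parameters and aliases.

    service_params maps a service (e.g. a leave request) to its parameter names;
    aliases maps a canonical term to the other names users may use for it.
    Returns the entity set and an alias -> canonical lookup table.
    """
    entities = set(domain_terms)
    for params in service_params.values():
        entities.update(params)
    alias_map = {}
    for canonical, alternatives in aliases.items():
        entities.add(canonical)
        for alt in alternatives:
            alias_map[alt] = canonical
    return entities, alias_map


def normalize_aliases(text, alias_map):
    """Replace aliases with canonical names, longest alias first."""
    for alt in sorted(alias_map, key=len, reverse=True):
        text = text.replace(alt, alias_map[alt])
    return text


if __name__ == "__main__":
    entities, alias_map = build_key_entity_set(
        {"packing slip", "customer"},
        {"leave_request": ["leave type", "leave time"]},
        {"packing slip": ["shipment detail list", "sales slip"]},
    )
    print(normalize_aliases("print the sales slip for ACME", alias_map))
```

Replacing longer aliases first avoids a short alias clobbering part of a longer one that contains it.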

In step S430, each of a plurality of training corpora is classified as either a data query operation or an instruction execution operation. A training corpus may be natural-language data such as an instruction a user might give or a question a user might ask. Once the key entity set is established, the training corpora can be classified by intent. In one embodiment, user intents are divided into data query operations and instruction execution operations, but user intents may also be classified more finely; the invention is not limited in this respect. For example, if a user says to the virtual assistant "Please find the packing slip of company XX", the utterance is classified as a data query operation under the intent categories of the invention, and the virtual assistant queries the enterprise database for the packing slip of company XX on the user's behalf. If the user says "Help me apply for business-trip leave on January 30", the utterance is classified as an instruction execution operation, and the virtual assistant submits the leave request in the enterprise resource system on the user's behalf.
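The two-way split between data query and instruction execution operations can be illustrated with a deliberately simple rule-based sketch. The cue-word lists below are hypothetical and exist only to reproduce the two examples above; this disclosure does not say how the classification is performed, and a real system would more likely use a trained classifier. Sentences with no clear majority of cues are returned as `None` for manual labelling.

```python
# Hypothetical cue words; not taken from the patent.
QUERY_CUES = ("find", "look up", "show", "list", "how many")
EXECUTE_CUES = ("apply", "request", "submit", "create", "book")


def classify_operation(sentence):
    """Label a training sentence as "query" or "execute" by counting cue words.

    Returns None when the cues tie (including when none are present),
    signalling that the sentence needs manual labelling.
    """
    s = sentence.lower()
    query_hits = sum(cue in s for cue in QUERY_CUES)
    execute_hits = sum(cue in s for cue in EXECUTE_CUES)
    if query_hits == execute_hits:
        return None
    return "query" if query_hits > execute_hits else "execute"


if __name__ == "__main__":
    print(classify_operation("Please find the packing slip of company XX"))          # query
    print(classify_operation("Help me apply for business-trip leave on January 30"))  # execute
```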

In step S440, a plurality of data query intents are formed by distinguishing, according to the categories in the enterprise database, the intents of the training corpora corresponding to data query operations, and a plurality of instruction execution intents are formed by distinguishing, according to the services provided by the enterprise resource system, the intents of the training corpora corresponding to instruction execution operations. In one embodiment, data query intents may first be distinguished according to the enterprise database of each field. For example, the data fields stored in the enterprise database of a healthcare enterprise are not necessarily the same as those stored in the enterprise database of a transportation enterprise, so the needs of their users also differ. A healthcare user may need to query medical record data or query vacant wards, each of which is a different data query intent, whereas a transportation user may need to query shipment records or query package delivery status, which are likewise different data query intents. Instruction execution intents can similarly be distinguished according to the services provided by the enterprise resource system of each field; as noted above, the services provided by the healthcare and transportation enterprise resource systems differ. Because the data query operations and service operations offered by enterprises in different fields are not necessarily interchangeable, intents must also be distinguished for the services of each field. For example, a healthcare user may be offered a registration service or a service for ordering healthy meals for inpatients, each a different service intent, while a transportation user may be offered a service for automatically sorting goods or a service for scheduling the shipping order of goods, which are likewise different service intents.

In steps S450 and S460, models of the data query operation intents and models of the instruction execution operation intents are established. The global database 131 is then built from the critical entity set, the data query operation intent models, and the instruction execution operation intent models. For example, once all the data query operation intents and instruction execution operation intents that users have when operating a given field's enterprise virtual assistant have been distinguished, a corresponding model can be generated for each intent. Following the example above, the healthcare industry would have four models: querying medical records, querying ward vacancies, providing a registration service, and providing a healthy-meal ordering service for inpatients. The transport industry would have four models: querying shipment records, querying package delivery status, providing automatic cargo sorting, and arranging the cargo shipment sequence. The global database 131 can then be built from these models together with the critical entity set.
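The global database 131 described above joins the critical entity set with one model per intent. The sketch below shows one possible in-memory shape for it, assuming each intent model is tagged as either a data query or an instruction execution operation. The class names, field names, and transport-industry entity strings are hypothetical and are not taken from the patent.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class IntentModel:
    """Hypothetical per-intent model: a name, an operation type, and example utterances."""
    name: str
    operation: str  # "query" (data query) or "execute" (instruction execution)
    examples: list[str] = field(default_factory=list)


@dataclass
class GlobalDatabase:
    """Hypothetical container joining the critical entity set with the intent models."""
    critical_entities: set[str]
    intents: dict[str, IntentModel] = field(default_factory=dict)

    def add_intent(self, model: IntentModel) -> None:
        if model.operation not in ("query", "execute"):
            raise ValueError(f"unknown operation type: {model.operation}")
        self.intents[model.name] = model

    def intents_by_operation(self, operation: str) -> list[str]:
        """Names of all intents of the given operation type, sorted."""
        return sorted(n for n, m in self.intents.items() if m.operation == operation)


def build_transport_db() -> GlobalDatabase:
    """The four transport-industry intents from the example above (entity strings are made up)."""
    db = GlobalDatabase(critical_entities={"shipment record", "package", "cargo"})
    for name, op in [
        ("query_shipment_records", "query"),
        ("query_package_status", "query"),
        ("auto_sort_cargo", "execute"),
        ("arrange_shipment_sequence", "execute"),
    ]:
        db.add_intent(IntentModel(name, op))
    return db
```

Keeping the operation type on each model lets later steps ask "which query intents exist" without a separate index.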

In step S470, the system identifies multiple first probabilities, namely the probabilities with which the system domain terms in the critical entity set appear in the training corpora. The identified system domain terms are used to analyze the sentence structures of the training corpora and the relevances among the terms themselves, and a common lexicon model is built from the first probabilities and the relevances. In one embodiment, two algorithms, n-gram and context-free grammar (CFG), compute the probability with which each system domain term appears in the training corpora, and the system domain terms are used to analyze the sentence structures of the training corpora and the relevances among the terms, to build the common lexicon model. For example, suppose the training corpora include "I want to query the quotation of XX company" and "I want to query the packing slip of XX company," where "XX company," "quotation," and "packing slip" are all system domain terms. Because "XX company" may appear about equally often in every data query intent, its probability is nearly the same across those intents. By contrast, "quotation" and "packing slip" appear mostly in the training corpora of intents that query those specific documents and rarely in those of intents that query other data. Their probabilities are therefore particularly high for the corresponding intents and lower for the others.
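To illustrate the contrast drawn in this example (a term such as "XX company" spread evenly across intents versus a term such as "packing slip" concentrated in one), the sketch below counts how each system domain term's occurrences are distributed over intents, and adds pairwise co-occurrence counts as a crude stand-in for relevance. It is a simplified illustration under stated assumptions: it uses plain substring matching rather than the n-gram and CFG computations named above, and the corpus, intent labels, and function name are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical labeled training corpus: (utterance, intent).
CORPUS = [
    ("I want to query the quotation of XX company", "query_quotation"),
    ("show me the quotation of XX company", "query_quotation"),
    ("I want to query the packing slip of XX company", "query_packing_slip"),
    ("show me the packing slip of XX company", "query_packing_slip"),
]
DOMAIN_TERMS = {"XX company", "quotation", "packing slip"}


def term_stats(corpus, domain_terms):
    """Return (P(intent | term) per term, co-occurrence counts of term pairs)."""
    per_term = defaultdict(Counter)
    cooccurrence = Counter()
    for text, intent in corpus:
        found = sorted(t for t in domain_terms if t in text)  # naive substring match
        for term in found:
            per_term[term][intent] += 1
        for a, b in combinations(found, 2):
            cooccurrence[(a, b)] += 1
    probs = {}
    for term, counts in per_term.items():
        total = sum(counts.values())
        probs[term] = {intent: n / total for intent, n in counts.items()}
    return probs, cooccurrence
```

On this corpus "XX company" splits 0.5/0.5 between the two intents, while "packing slip" falls entirely in one, mirroring the behavior the paragraph describes.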

In step S480, the system analyzes multiple second probabilities, namely the probabilities with which the system domain terms appear in the data query operation intents and the instruction execution operation intents, and builds a common semantic model from the sentence structures and the second probabilities. In one embodiment, a hidden Markov model (HMM) algorithm computes the probability with which the system domain terms appear in the data query operation intents and instruction execution operation intents, to build the common semantic model. Many training corpora are input during the model training stage, and the HMM algorithm must compute the probability with which each system domain term appears under each intent. Continuing the example above, if the training corpora include "I want to query the packing slip of XX company," the n-gram and context-free grammar analysis identifies "XX company" and "packing slip" as system domain terms. The HMM algorithm then uses the probabilities of all identified system domain terms within the data query and instruction execution intents, together with the relationships among those terms, to determine that "packing slip" is associated with the intent of querying shipment data. Combined with the system domain term "XX company," this allows the assistant to automatically query the enterprise database for XX company's shipment data on the user's behalf.

After the common lexicon model and the common semantic model have been established, step S320 is performed: the corpus data is analyzed with the natural language processing model to generate language feature information corresponding to the corpus data. The language feature information includes multiple intents, the probabilities corresponding to those intents, and multiple terms. The detailed flow of step S320 is shown in FIG. 5, which is a flowchart of step S320 according to some embodiments of the present disclosure. As shown in FIG. 5, step S320 comprises the following steps:

Step S321: use the common lexicon model to identify whether the corpus data contains system domain terms in the critical entity set, set the identification result as the terms in the language feature information, and analyze the probability with which those terms appear;

Step S322: analyze the sentence structure of the corpus data according to the terms in the language feature information; and

Step S323: use the common semantic model to identify the intents of the corpus data and their corresponding probabilities, based on the probabilities of the terms in the language feature information and the sentence structure of the corpus data.

In steps S321 and S322, the common lexicon model identifies whether the corpus data contains system domain terms in the critical entity set. The identification result is set as the terms in the language feature information, the probability with which those terms appear is analyzed, and the sentence structure of the corpus data is then analyzed according to those terms. In other words, the common lexicon model picks out the terms in the user's input that are system domain terms, and the sentence structure of the corpus data is then determined. For example, if the user says to the virtual assistant, "I want to look up XX company's packing slips from last month," the common lexicon model can pick out "XX company," "last month," and "packing slip" as terms that match system domain terms.

In step S323, the common semantic model identifies the intents of the corpus data and their corresponding probabilities, based on the probabilities of the terms in the language feature information and the sentence structure of the corpus data. Continuing the example above, after terms such as "XX company," "last month," and "packing slip" have been picked out, the probability of these terms under every intent can be determined. Here, "every intent" covers all data query operation intents and all instruction execution operation intents.
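The source attributes step S323 to an HMM-based common semantic model. As a simpler stand-in that shows how term probabilities can be turned into per-intent probabilities, the sketch below uses a naive Bayes combination (the prior times the per-term likelihoods, normalized); it is not an HMM. The probability tables, the smoothing constant, and the intent names are hypothetical.

```python
import math

# Hypothetical P(term | intent) tables and intent priors.
TERM_GIVEN_INTENT = {
    "query_packing_slip": {"XX company": 0.4, "last month": 0.2, "packing slip": 0.4},
    "query_quotation": {"XX company": 0.5, "quotation": 0.5},
}
PRIORS = {"query_packing_slip": 0.5, "query_quotation": 0.5}


def score_intents(terms, term_given_intent, priors, smoothing=1e-3):
    """Naive Bayes: P(intent | terms) is proportional to P(intent) * prod P(term | intent)."""
    log_scores = {}
    for intent, prior in priors.items():
        score = math.log(prior)
        for term in terms:
            # Terms never seen with this intent get a small smoothing probability instead of zero.
            score += math.log(term_given_intent[intent].get(term, smoothing))
        log_scores[intent] = score
    # Normalize in log space to avoid floating-point underflow.
    top = max(log_scores.values())
    weights = {i: math.exp(s - top) for i, s in log_scores.items()}
    total = sum(weights.values())
    return {i: w / total for i, w in weights.items()}
```

For the terms "XX company," "last month," and "packing slip," nearly all of the probability mass lands on the packing-slip intent, because the quotation intent has never seen the last two terms.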

In step S330, functional scenario analysis is performed on the language feature information according to functional context information, to determine the operation corresponding to one of the intents. A functional situational model and a functional lexicon model must be established before functional scenario analysis can be carried out. To build the functional situational model, the data in the history database 134 is first converted into feature vectors and classified into a variety of contexts. A machine learning algorithm then computes the strength of the relationship between the feature vectors and each context, producing the functional situational model. Machine learning algorithms suitable for building this model include the support vector machine (SVM) commonly used in conventional machine learning, as well as deep-learning algorithms such as convolutional neural networks (CNN), recurrent neural networks (RNN), and long short-term memory (LSTM) models.

Continuing from the above, the functional lexicon model is built by analyzing a large volume of input training corpora with the hidden Markov model algorithm and then segmenting them into words. The frequency of occurrence of each segment is counted to produce a segment frequency table, from which the functional lexicon model is established. The detailed flow of step S330 is shown in FIG. 6, which is a flowchart of step S330 according to some embodiments of the present disclosure. As shown in FIG. 6, step S330 comprises the following steps:
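The segment frequency table that backs the functional lexicon model can be sketched as relative frequencies over an already-segmented corpus. This sketch assumes segmentation has already been done (the HMM-based analysis mentioned above is not reproduced), and the function name is hypothetical.

```python
from collections import Counter


def build_frequency_table(segmented_corpus):
    """Map each segment to its relative frequency across an already-segmented corpus."""
    counts = Counter(seg for sentence in segmented_corpus for seg in sentence)
    total = sum(counts.values())
    # An empty corpus yields an empty table, so no division by zero occurs.
    return {seg: n / total for seg, n in counts.items()}
```

Relative rather than raw counts keep the threshold used later in step S361 independent of corpus size.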

Step S331: compare the corpus data and the functional context information with the functional situational model to generate a functional context identification result; and

Step S332: according to the functional context identification result, determine whether one of the intents corresponds to a data query operation or an instruction execution operation.

In step S331, the corpus data and the functional context information are compared with the functional situational model to generate a functional context identification result. The functional context information includes the user's identity, the user's location, the user's department, the time, and the place. Part of the functional context information can be sensed through the input/output device 150; for example, the user's current state can be detected (such as whether the user has just returned from a business trip). The intents, their probabilities, and the terms obtained from analyzing the user's corpus data as described above are combined with the functional context information to estimate how similar the user's corpus data is to the data in the training data model, and this similarity is used as the probability of the corresponding intent.
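The patent names SVM, CNN, RNN, and LSTM as candidate algorithms for the functional situational model. The dependency-free sketch below substitutes a much simpler nearest-centroid classifier with cosine similarity, only to show how context fields can be turned into feature vectors and compared against situations learned from history records. The field names, situation labels, and sample records are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical context fields; the patent lists identity, location, department, time, and place.
FIELDS = ("role", "department", "location")


def encode(context):
    """Turn a context dict into a sparse one-hot feature vector."""
    return {f"{k}={context[k]}": 1.0 for k in FIELDS if k in context}


def cosine(a, b):
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def train_situation_model(history):
    """history: list of (context, situation). Returns situation -> mean feature vector."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for context, situation in history:
        counts[situation] += 1
        for key, value in encode(context).items():
            sums[situation][key] += value
    return {s: {k: v / counts[s] for k, v in vec.items()} for s, vec in sums.items()}


def rank_situations(model, context):
    """Return (similarity, situation) pairs, most similar first."""
    vec = encode(context)
    return sorted(((cosine(vec, c), s) for s, c in model.items()), reverse=True)


# Hypothetical history records echoing the procurement vs. finance example.
HISTORY = [
    ({"role": "buyer", "department": "procurement"}, "supplier_stocking"),
    ({"role": "buyer", "department": "procurement", "location": "HQ"}, "supplier_stocking"),
    ({"role": "accountant", "department": "finance"}, "supplier_payment"),
]
MODEL = train_situation_model(HISTORY)
```

A production system would replace the centroid step with one of the trained classifiers the paragraph lists; the encode-then-compare flow stays the same.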

In step S332, the functional context identification result is used to determine whether one of the intents corresponds to a data query operation or an instruction execution operation. The training data model contains multiple data query operation intents and multiple instruction execution operation intents, and the common semantic model described above produces a probability for each intent. Intents with low probability values can be filtered out with a threshold, leaving the most likely intent, whose operation is then confirmed. In the example above, after terms such as "XX company," "last month," and "packing slip" have been picked out, they are combined with the functional context information to find the best-matching data query or instruction execution intent. Through these operations, the system determines that a user who says "I want to look up XX company's packing slips from last month" most likely wants to look up XX company's packing slips, so the corresponding operation is a data query operation. Functional context is needed because users have different needs depending on their position, department, working hours, workplace, and other information. For example, both procurement staff and financial staff can view the [Monthly Supplier Report], but each cares about different statistics: one tracks purchases from the supplier, while the other tracks the company's payments to the supplier. When talking to the virtual assistant, however, a user will not necessarily state clearly which [Monthly Supplier Report] is needed and may simply say, "I need last month's monthly supplier report." Such a short sentence makes it all the more necessary to draw on the user's functional context information for an accurate determination.
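The threshold filtering in step S332 can be sketched as follows, assuming the intent probabilities have already been computed and each intent is mapped to an operation type. The 0.5 default threshold and the intent names are hypothetical; returning None models the case where no operation can be determined and processing falls through to word segmentation in step S340.

```python
def resolve_operation(intent_probs, intent_ops, threshold=0.5):
    """Drop intents below the threshold and return (intent, operation) for the best one.

    Returns None when no intent survives, i.e. the operation cannot be determined.
    """
    candidates = {i: p for i, p in intent_probs.items() if p >= threshold}
    if not candidates:
        return None
    best = max(candidates, key=candidates.get)
    return best, intent_ops[best]
```

For instance, with probabilities of 0.8 for a packing-slip query and 0.1 for a leave request, the function returns the packing-slip intent with operation type "query".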

In step S340, if functional scenario analysis cannot determine the operation corresponding to one of the intents, word segmentation is performed on the corpus data. The detailed flow of step S340 is shown in FIG. 7, which is a flowchart of step S340 according to some embodiments of the present disclosure. As shown in FIG. 7, step S340 comprises the following steps:

Step S341: segment the corpus data according to the functional lexicon model to generate multiple segments; and

Step S342: calculate the frequencies of the segments.

In steps S341 and S342, the corpus data is segmented according to the functional lexicon model to generate multiple segments, and the frequencies of these segments are then calculated. Word segmentation is required whenever the functional scenario analysis in step S330 cannot determine the operation corresponding to the input corpus data. The corpus data is first segmented using the vocabulary stored in the previously built functional lexicon model, and the frequencies of the resulting segments are then calculated.
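Steps S341 and S342 can be sketched with greedy forward maximum matching, a common dictionary-based segmentation technique, followed by a lookup in the segment frequency table. This is an assumption, since the patent does not specify the segmentation algorithm. The sketch splits on whitespace so that it works on English examples, whereas Chinese text would be matched character by character. The lexicon, `max_words`, and function names are hypothetical.

```python
def segment(text, lexicon, max_words=3):
    """Greedy forward maximum matching over whitespace-separated words.

    At each position, take the longest run of up to max_words words found in the
    lexicon; fall back to a single word when nothing longer matches.
    """
    words = text.split()
    segments, i = [], 0
    while i < len(words):
        for n in range(min(max_words, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            if n == 1 or candidate in lexicon:
                segments.append(candidate)
                i += n
                break
    return segments


def segment_frequencies(segments, freq_table):
    """Look up each segment's frequency; segments absent from the table get 0.0."""
    return {seg: freq_table.get(seg, 0.0) for seg in segments}


# Hypothetical functional lexicon containing multi-word entries.
LEXICON = {"I", "want to find", "the", "contact person", "of", "XX company"}
```

On "I want to find the contact person of XX company," this yields the segments "I," "want to find," "the," "contact person," "of," and "XX company."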

In steps S350 and S360, the word segmentation result is used to determine whether there is a new term or new corpus data. If there is a new term, the natural language processing model is updated according to the meaning of the new term. If there is new corpus data, the functional scenario analysis is updated according to the intent of the new corpus data. The detailed flow of step S360 is shown in FIG. 8, which is a flowchart of step S360 according to some embodiments of the present disclosure. As shown in FIG. 8, step S360 comprises the following steps:

Step S361: determine whether the frequencies of the segments calculated by the word segmentation are lower than a threshold;

Step S362: if one of the segments is below the threshold, that segment is a new term, and the definition of the new term is received to update the common lexicon model and the common semantic model; and

Step S363: if all the segments are above the threshold, the corpus data is new corpus data, and the intent of the new corpus data is received to update the functional situational model.

In steps S361 and S362, the system determines whether the frequencies of the segments calculated by the word segmentation are lower than a threshold. If one of the segments is below the threshold, that segment is a new term, and its definition is received to update the common lexicon model and the common semantic model. In one embodiment, after the segment frequencies have been calculated, any segment below the threshold is set as a new term. The virtual assistant asks the user for the definition of the new term and stores the new term together with its definition in the common lexicon model and the common semantic model. For example, suppose the user inputs "I want to find the contact person of XX company" and the virtual assistant cannot determine what it means. Word segmentation separates the input into terms such as "I," "want to find," "XX company," "of," and "contact person." If "XX company" is below the threshold, the virtual assistant asks the user what "XX company" means, then stores the user's answer together with "XX company" in the common lexicon model and the common semantic model. The new term is also stored in the system domain term set so that it is shared with all users.
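The decision in steps S361 to S363 reduces to a threshold test on the segment frequencies. The sketch below assumes the frequencies come from a lookup in the functional lexicon's frequency table, and uses a plain dict and set as hypothetical stand-ins for the common lexicon model and the shared system domain term set. Prompting the user for a definition is left outside the code.

```python
def classify_unknown(segment_freqs, threshold):
    """Steps S361-S363: rare segments are new terms; otherwise the whole utterance is new corpus."""
    rare = [seg for seg, freq in segment_freqs.items() if freq < threshold]
    if rare:
        return "new_term", rare
    return "new_corpus", []


def learn_term(term, definition, lexicon_model, shared_domain_terms):
    """Store the user-supplied definition and share the term with all users."""
    lexicon_model[term] = definition
    shared_domain_terms.add(term)
```

In the "XX company" example, a near-zero frequency for "XX company" makes it the single new term, and the assistant would ask the user what it means before calling `learn_term`.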

In step S363, if all the segments are above the threshold, the corpus data is new corpus data, and its intent is received to update the functional situational model. Continuing the example "I want to find the contact person of XX company": word segmentation separates it into terms such as "I," "want to find," "XX company," "of," and "contact person." If none of these terms is below the threshold, then what the virtual assistant failed to understand is the intent of the utterance as a whole. For instance, the training corpora used to train the assistant may all have been phrased as "Help me look up the contact person of XX company," so the virtual assistant cannot understand the intent of "I want to find the contact person of XX company." The virtual assistant must then ask the user what "I want to find the contact person of XX company" means, and stores the user's answer together with the new corpus in the functional situational model. Before storing it in the functional model, the system must also determine whether the new corpus is a common corpus. If it is, other people are likely to use the same phrasing with the virtual assistant, so the new corpus must be stored in the system domain term set and shared with everyone. If it is not, the new corpus reflects the user's own speaking habits or distinctive phrasing, so only the functional situational model needs to be updated, and the system domain term set is left unchanged.
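The source does not say how the system decides whether a new corpus is common. Purely as an assumption, the sketch below treats an utterance as common once at least `min_users` distinct users have taught the same utterance. Every utterance is recorded in a per-user store (standing in for the functional situational model update), and common ones are also copied into a shared store (standing in for the system domain term set). The class and attribute names are hypothetical.

```python
from collections import defaultdict


class CorpusRouter:
    """Hypothetical router for learned utterances.

    'Common' is assumed to mean: taught by at least min_users distinct users.
    """

    def __init__(self, min_users=3):
        self.min_users = min_users
        self.reporters = defaultdict(set)    # utterance -> users who taught it
        self.personal = defaultdict(dict)    # user -> {utterance: intent}
        self.shared = {}                     # utterance -> intent, visible to everyone

    def learn(self, user, utterance, intent):
        """Always record per user; promote to the shared store once the utterance is common."""
        self.personal[user][utterance] = intent
        self.reporters[utterance].add(user)
        if len(self.reporters[utterance]) >= self.min_users:
            self.shared[utterance] = intent
            return "shared"
        return "personal"
```

Counting distinct users rather than raw repetitions keeps a single user's habitual phrasing from being promoted to the shared set, which matches the personal-habit case described above.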

As the above embodiments show, the present disclosure mainly gives the virtual assistant the ability to learn automatically. Whenever the virtual assistant encounters a term it does not understand during a conversation, it can ask the user and then update its database. The assistant thereby learns the user's speaking habits and industry-specific jargon, making the ERP system faster and more convenient to use.

In addition, the above illustration includes exemplary steps in sequence, but the steps need not be performed in the order shown. Performing these steps in a different order is within the contemplated scope of this disclosure. Within the spirit and scope of the embodiments of this disclosure, steps may be added, replaced, reordered, and/or omitted as appropriate.

Although the present disclosure has been described above by way of embodiments, they are not intended to limit it. Anyone skilled in the art may make various modifications and variations without departing from the spirit and scope of the present disclosure. The scope of protection is therefore defined by the appended claims.

Claims

1. An automatic learning method for a virtual assistant, comprising:

receiving an audio input and recognizing the audio to form corpus data;

analyzing the corpus data using a natural language processing model to generate language feature information corresponding to the corpus data, wherein the language feature information includes multiple intents, probabilities corresponding to the intents, and multiple terms;

performing a functional scenario analysis on the language feature information according to functional context information, to determine an operation corresponding to one of the intents;

if the functional scenario analysis cannot determine the operation corresponding to one of the intents, performing word segmentation on the corpus data;

determining, according to a result of the word segmentation, whether there is a new term or new corpus data; and

if there is the new term, updating the natural language processing model according to a meaning of the new term, and if there is the new corpus data, updating the functional scenario analysis according to an intent of the new corpus data;

wherein the operation includes one of a data query operation and an instruction execution operation.

2. The automatic learning method for a virtual assistant according to claim 1, further comprising:

generating a system domain term set according to a working knowledge database and a domain knowledge database;

forming the system domain term set and multiple application service parameters into a critical entity set, the critical entity set including multiple system domain terms;

classifying multiple training corpora as one of the data query operation and the instruction execution operation;

distinguishing, according to categories in the enterprise database, intents of the training corpora corresponding to the data query operation to form multiple data query operation intents, and distinguishing, according to service behaviors provided by the enterprise resource system, intents of the training corpora corresponding to the instruction execution operation to form multiple instruction execution operation intents;

establishing models of the data query operation intents and models of the instruction execution operation intents;

establishing the global database according to the critical entity set, the models of the data query operation intents, and the models of the instruction execution operation intents;

identifying multiple first probabilities with which the system domain terms in the critical entity set appear in the training corpora, analyzing, using the identified system domain terms, multiple sentence structures of the training corpora and multiple relevances among the system domain terms, and establishing a common lexicon model according to the first probabilities and the relevances; and

analyzing multiple second probabilities with which the system domain terms appear in the data query operation intents and the instruction execution operation intents, and establishing a common semantic model according to the sentence structures and the second probabilities.

3. The automatic learning method for a virtual assistant according to claim 2, further comprising:

classifying data in a history database by relationship strength using a classifier to generate a functional situational model; and

segmenting and analyzing the training corpora, and generating a functional lexicon model according to the data in the history database.

4. The automatic learning method for a virtual assistant according to claim 3, wherein the functional scenario analysis further comprises:

comparing the corpus data and the functional context information with the functional situational model to generate a functional context identification result; and

determining, according to the functional context identification result, that one of the intents corresponds to one of the data query operation and the instruction execution operation.

5. The automatic learning method for a virtual assistant according to claim 4, wherein the word segmentation further comprises:

segmenting the corpus data according to the functional lexicon model to generate multiple segments; and

calculating frequencies of the segments.

6. The automatic learning method for a virtual assistant according to claim 5, further comprising:

determining whether the frequencies of the segments calculated by the word segmentation are lower than a threshold;

if one of the segments is lower than the threshold, that segment is the new term, and receiving a definition of the new term to update the common lexicon model and the common semantic model; and

if all of the segments are above the threshold, the corpus data is the new corpus data, and receiving the intent of the new corpus data to update the functional situational model.

7. The automatic learning method for a virtual assistant according to claim 6, further comprising:

determining whether the new corpus data is a common corpus, and if so, updating the system domain term set according to the new corpus data; and

updating the system domain term set according to the new term.

8. The automatic learning method for a virtual assistant according to claim 2, wherein analyzing the corpus data using the natural language processing model further comprises:

identifying, using the common lexicon model, whether the corpus data contains any of the system domain terms in the critical entity set, setting an identification result as the terms, and analyzing probabilities with which the terms appear;

analyzing a sentence structure of the corpus data according to the terms; and

identifying, using the common semantic model, the intents of the corpus data and the probabilities corresponding to the intents, according to the probabilities with which the terms appear and the sentence structure of the corpus data.

9. An automatic learning system for a virtual assistant, connected respectively to an enterprise database and an enterprise resource system, comprising:

a processor;

a storage device electrically connected to the processor and configured to store a global database, a working knowledge database, a domain knowledge database, and a history database;

an input/output device electrically connected to the processor and configured to provide an interface for inputting audio;

wherein the processor includes:

a speech recognition module configured to recognize the audio to form corpus data;

a corpus analysis module electrically connected to the speech recognition module and configured to analyze the corpus data using a natural language processing model to generate language feature information corresponding to the corpus data, wherein the language feature information includes multiple intents, probabilities corresponding to the intents, and multiple terms;

a context recognition module electrically connected to the corpus analysis module and configured to perform a functional scenario analysis on the language feature information according to functional context information, to determine an operation corresponding to one of the intents;

an unknown corpus judgment module electrically connected to the context recognition module and configured, when the context recognition module cannot determine the operation corresponding to one of the intents, to perform word segmentation on the corpus data and to determine, according to a result of the word segmentation, whether there is a new term or new corpus data; and

an update information module electrically connected to the unknown corpus judgment module and configured, when the new term is generated, to update the natural language processing model according to a meaning of the new term, and, when the new corpus data is generated, to update the functional scenario analysis according to an intent of the new corpus data.

10. The automatic learning system for a virtual assistant according to claim 9, wherein the processor further includes:

a training module electrically connected to the corpus analysis module and configured to generate a system domain term set according to the working knowledge database and the domain knowledge database; form the system domain term set and multiple application service parameters into a critical entity set including multiple system domain terms; classify multiple training corpora as one of the data query operation and the instruction execution operation; distinguish, according to categories in the enterprise database, intents of the training corpora corresponding to the data query operation to form multiple data query operation intents; and distinguish, according to service behaviors provided by the enterprise resource system, intents of the training corpora corresponding to the instruction execution operation to form multiple instruction execution operation intents;

a model building module electrically connected to the training module and configured to establish models of the data query operation intents and models of the instruction execution operation intents, and to establish the global database according to the critical entity set, the models of the data query operation intents, and the models of the instruction execution operation intents;

a lexicon model building module electrically connected to the model building module and configured to identify multiple first probabilities with which the system domain terms in the critical entity set appear in the training corpora, to analyze, using the identified system domain terms, multiple sentence structures of the training corpora and multiple relevances among the system domain terms, and to establish a common lexicon model according to the first probabilities and the relevances; and

a semantic model building module electrically connected to the model building module and configured to analyze multiple second probabilities with which the system domain terms appear in the data query operation intents and the instruction execution operation intents, and to establish a common semantic model according to the sentence structures and the second probabilities.

11. the automatic learning system of virtual assistant according to claim 10, which is characterized in that the processor also includes:

A context training module, electrically connected to the scenario analysis module, configured to use a classifier to classify the data in the historical database by relationship strength, thereby generating a functional context model; and

A vocabulary training module, electrically connected to the unknown corpus judgment module, configured to perform word segmentation and analysis on the training corpora and to generate a functional lexicon model according to the data in the historical database.
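The two training steps of claim 11 might be sketched as follows. The claim does not specify the classifier or the schema of the historical database, so this sketch assumes each history record is a (function context, intent) pair and substitutes a simple frequency-threshold rule for the classifier: an intent whose conditional share within a context reaches a hypothetical `strong_threshold` is labeled a strong relationship, otherwise weak. The functional lexicon model is reduced to word frequencies over previously segmented utterances. All names are illustrative.

```python
from collections import Counter, defaultdict


def train_context_model(history, strong_threshold=0.6):
    """history: iterable of (function_context, intent) pairs (assumed schema).

    Returns {context: {intent: (strength_label, conditional_share)}}.
    A threshold rule stands in for the unspecified classifier.
    """
    counts = defaultdict(Counter)
    for context, intent in history:
        counts[context][intent] += 1
    model = {}
    for context, intents in counts.items():
        total = sum(intents.values())
        model[context] = {
            intent: ("strong" if n / total >= strong_threshold else "weak", n / total)
            for intent, n in intents.items()
        }
    return model


def train_lexicon_model(segmented_history):
    """segmented_history: iterable of word lists; returns word -> frequency."""
    freq = Counter()
    for tokens in segmented_history:
        freq.update(tokens)
    return freq
```

Any trained classifier (logistic regression, decision tree, and so on) could replace the threshold rule without changing the shape of the resulting model.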

12. The automatic learning system of a virtual assistant according to claim 11, wherein the scenario analysis module is further configured to compare the corpus data and the functional contextual information against the functional context model to generate a functional context identification result, and to determine, according to the functional context identification result, which of the query data operation and the instruction execution operation corresponds to one of the plurality of intents.
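A minimal sketch of the comparison in claim 12, under stated assumptions: the functional context model is taken to be a mapping from the current function context (for example, the ERP screen in use) to intent probabilities, and each intent is assumed to have a hypothetical set of cue words and a known operation type. An intent's score is its context probability multiplied by the number of cue words found in the utterance. When no intent scores above zero, the sketch returns `None`, corresponding to the case in which functional scenario analysis cannot determine an operation and word segmentation is performed instead.

```python
def analyze_scenario(tokens, context, context_model, intent_keywords, intent_operation):
    """Return (intent, operation) or None when no intent can be determined.

    context_model:    {context: {intent: probability}}   (assumed shape)
    intent_keywords:  {intent: set of cue words}         (hypothetical)
    intent_operation: {intent: "query" or "execute"}     (hypothetical)
    """
    priors = context_model.get(context, {})
    words = set(tokens)
    best, best_score = None, 0.0
    for intent, prior in priors.items():
        # Score = context prior x number of cue words present in the utterance.
        overlap = len(words & intent_keywords.get(intent, set()))
        score = prior * overlap
        if score > best_score:
            best, best_score = intent, score
    if best is None:
        return None  # undecidable: fall back to word segmentation
    return best, intent_operation[best]
```

For example, on an order screen, "show order status" would resolve to a query intent, while an utterance with no cue words returns `None`.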

13. The automatic learning system of a virtual assistant according to claim 12, wherein the unknown corpus judgment module is further configured to perform word segmentation on the corpus data according to the functional lexicon model to generate a plurality of segmented words, and to calculate the frequency of each segmented word.
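The claim does not name a segmentation algorithm. As one plausible choice for Chinese text, the sketch below uses greedy forward maximum matching against the functional lexicon model, here assumed to be a dictionary from word to its frequency in the historical data; characters that match no lexicon entry become single-character segments. The "frequency" of each segment is then taken, as an assumption, to be its count in that lexicon (0 if absent).

```python
def forward_max_match(text, lexicon, max_len=4):
    """Greedy forward maximum matching; unmatched characters become 1-char segments."""
    segments = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a lexicon hit or a single char.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                segments.append(piece)
                i += size
                break
    return segments


def segment_frequencies(segments, lexicon):
    """Look up each segment's historical frequency (0 for words never seen)."""
    return {s: lexicon.get(s, 0) for s in segments}


lexicon = {"查询": 120, "库存": 80, "数量": 45}  # hypothetical functional lexicon model
print(forward_max_match("查询库存数量", lexicon))  # ['查询', '库存', '数量']
```

The `max_len` window of 4 characters is an arbitrary illustrative bound; it should be at least the length of the longest lexicon entry.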

14. The automatic learning system of a virtual assistant according to claim 13, wherein the update information module is further configured to determine whether the frequency of each segmented word calculated in the word segmentation processing is lower than a threshold; if one of the segmented words is lower than the threshold, that segmented word is the new term, and a definition of the new term is received to update the common lexicon model and the common semantic model; and if all of the segmented words are above the threshold, the corpus data is the new corpus data, and the intent of the new corpus data is received to update the functional context model.
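The decision in claim 14 can be sketched as a single function. The models are reduced to hypothetical in-memory dictionaries, and `ask_user` is a placeholder callback standing in for however the system receives a definition or intent (for example, from the user or an administrator). The sketch treats every low-frequency segment as a new term; updating the common semantic model alongside the lexicon is omitted for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Models:
    lexicon: dict = field(default_factory=dict)   # term -> definition (hypothetical)
    context: dict = field(default_factory=dict)   # context -> {intent: count} (hypothetical)


def handle_unrecognized(segment_freq, threshold, models, context, ask_user):
    """Classify an unrecognized utterance as containing new terms or as a new corpus."""
    rare = [s for s, f in segment_freq.items() if f < threshold]
    if rare:
        # Some segment is rare: treat it as a new term and record its definition.
        for term in rare:
            models.lexicon[term] = ask_user(f"define:{term}")
        return "new_term"
    # Every segment is familiar: the utterance itself is a new corpus; learn its intent.
    intent = ask_user("intent")
    bucket = models.context.setdefault(context, {})
    bucket[intent] = bucket.get(intent, 0) + 1
    return "new_corpus"
```

The two branches mirror the claim: an unfamiliar word calls for a definition, while a sentence composed only of familiar words calls for an intent label.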

15. The automatic learning system of a virtual assistant according to claim 14, wherein the update information module is further configured to determine whether the new corpus data is a common corpus and, if so, to update the system domain vocabulary set according to the new corpus data; and to update the system domain vocabulary set according to the new term.

16. The automatic learning system of a virtual assistant according to claim 10, wherein the Concordance module is further configured to: use the common lexicon model to identify whether the corpus data contains any of the system domain vocabulary items in the critical entity set, set the identification result as the plurality of vocabulary items, analyze the probabilities with which those vocabulary items occur, and analyze the sentence structure of the corpus data according to those vocabulary items; and use the common semantic model to identify, according to the probabilities of the vocabulary items and the sentence structure of the corpus data, the plurality of intents of the corpus data and the probability corresponding to each intent.
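One way to turn the second probabilities of claim 10 into per-intent probabilities, as claim 16 describes, is naive-Bayes-style scoring; this is a swapped-in technique for illustration, not the claimed model. The sketch finds the critical-entity terms present in the utterance, sums log P(term | intent) with a small hypothetical `smoothing` constant to avoid log(0), assumes uniform intent priors, and normalizes the scores into probabilities. Sentence-structure analysis is not modeled.

```python
import math


def identify_intents(tokens, critical_entities, second_prob, smoothing=1e-3):
    """Return (matched_terms, {intent: probability}).

    second_prob: {intent: {term: P(term | intent)}}  (assumed shape)
    """
    matched = [t for t in tokens if t in critical_entities]
    if not matched or not second_prob:
        return matched, {}
    # Log-space scores avoid underflow when many terms are matched.
    log_scores = {
        intent: sum(math.log(probs.get(t, 0.0) + smoothing) for t in matched)
        for intent, probs in second_prob.items()
    }
    # Softmax-style normalization (subtract the max for numerical stability).
    peak = max(log_scores.values())
    weights = {i: math.exp(s - peak) for i, s in log_scores.items()}
    total = sum(weights.values())
    return matched, {i: w / total for i, w in weights.items()}
```

Returning the matched terms alongside the distribution lets a caller see which domain vocabulary drove the decision, and an empty distribution signals that no known vocabulary was found.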