CN110287283A - Intent model training method, intention recognition method, apparatus, device, and medium - Google Patents

Intent model training method, intention recognition method, apparatus, device, and medium

Info

Publication number
CN110287283A
CN110287283A (application CN201910430534.0A)
Authority
CN
China
Prior art keywords
target
entity
text
model
training
Prior art date
Legal status
Granted
Application number
CN201910430534.0A
Other languages
Chinese (zh)
Other versions
CN110287283B (en)
Inventor
顾宝宝 (Gu Baobao)
Current Assignee
Ping An Property and Casualty Insurance Company of China Ltd
Original Assignee
Ping An Property and Casualty Insurance Company of China Ltd
Priority date
Filing date
Publication date
Application filed by Ping An Property and Casualty Insurance Company of China Ltd
Priority to CN201910430534.0A
Publication of CN110287283A
Application granted
Publication of CN110287283B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33: Querying
    • G06F16/332: Query formulation
    • G06F16/3329: Natural language query formulation or dialogue systems
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33: Querying
    • G06F16/3331: Query processing
    • G06F16/334: Query execution
    • G06F16/3344: Query execution using natural language analysis

Abstract

The invention discloses an intent model training method, an intention recognition method, an apparatus, a device, and a medium. The method includes: inputting training text word vectors and intention labels into an intent training model to obtain an original intent recognition model; inputting test text word vectors and intention labels into the original intent recognition model to obtain the corresponding output result and, if the output result is greater than a preset accuracy rate, determining the original intent recognition model to be the target intent recognition model; and inputting the target text word vectors corresponding to the target text, together with the corresponding entity labels, into an entity training model to obtain an entity recognition model. A target intention is obtained through the target intent recognition model and a target entity is obtained through the entity recognition model, so that an effective script corresponding to the target intention and the target entity can be randomly selected from a script template for the conversation with the customer, which improves the flexibility of the dialogue between the robot and the customer.

Description

Intent model training method, intention recognition method, apparatus, device, and medium
Technical field
The present invention relates to the field of intelligent decision-making, and in particular to an intent model training method and apparatus, a computer device, and a storage medium.
Background art
In existing intelligent training systems, the question-and-answer flow and the dialogue templates between the robot and the customer are all preset. No matter what information the customer provides, the robot asks questions according to the preset questions, which lacks flexibility and cannot conduct an intelligent dialogue adapted to the actual situation. If the robot simply converses with the customer and recommends products using preset dialogue templates and question-and-answer flows, it cannot adjust the dialogue template according to the intention the customer actually expresses, which harms the quality of the conversation with the customer and reduces the product recommendation success rate.
Summary of the invention
The embodiments of the present invention provide an intent model training method and apparatus, a computer device, and a storage medium, to solve the problem that the dialogue between the agent and the robot is inflexible.
An intent model training method, comprising:
obtaining standard speech and annotating the standard speech, the standard speech carrying corresponding intention labels;
performing text preprocessing on the standard speech to obtain target text;
converting the target text into target text word vectors, and dividing the target text word vectors into training text word vectors and test text word vectors;
inputting the training text word vectors and the intention labels into an intent training model for training to obtain an original intent recognition model, the intent training model being a Seq2Seq model with an attention mechanism added;
inputting the test text word vectors and the intention labels into the original intent recognition model to obtain the output result corresponding to the original intent recognition model and, if the output result is greater than a preset accuracy rate, determining the original intent recognition model to be the target intent recognition model;
performing named entity annotation on the target text so that the target text carries entity labels;
inputting the target text word vectors corresponding to the target text and the corresponding entity labels into an entity training model for training to obtain an entity recognition model.
An intent model training apparatus, comprising:
a standard speech obtaining module, configured to obtain standard speech and annotate the standard speech, the standard speech carrying corresponding intention labels;
a standard speech processing module, configured to perform text preprocessing on the standard speech to obtain target text;
a target text processing module, configured to convert the target text into target text word vectors and divide the target text word vectors into training text word vectors and test text word vectors;
an original intent recognition model training module, configured to input the training text word vectors and the intention labels into an intent training model for training to obtain an original intent recognition model, the intent training model being a Seq2Seq model with an attention mechanism added;
an original intent recognition model test module, configured to input the test text word vectors and the intention labels into the original intent recognition model to obtain the output result corresponding to the original intent recognition model and, if the output result is greater than a preset accuracy rate, determine the original intent recognition model to be the target intent recognition model;
a named entity annotation module, configured to perform named entity annotation on the target text so that the target text carries entity labels;
an entity recognition model obtaining module, configured to input the target text word vectors corresponding to the target text and the corresponding entity labels into an entity training model for training to obtain an entity recognition model.
A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the above intent model training method when executing the computer program.
A computer-readable storage medium storing a computer program, wherein the computer program implements the above intent model training method when executed by a processor.
An intention recognition method, comprising:
obtaining customer speech collected by a robot, and performing speech preprocessing on the customer speech to obtain speech to be recognized;
performing text preprocessing on the speech to be recognized to obtain word vectors to be recognized;
recognizing the word vectors to be recognized using the target intent recognition model obtained by the above intent model training method to obtain a target intention;
recognizing the word vectors to be recognized using the entity recognition model obtained by the above intent model training method to obtain a target entity;
selecting, according to the target intention and the target entity, the script template corresponding to the target intention and the target entity, randomly selecting an effective script from the script template, converting the effective script into target speech through text-to-speech technology, and controlling the robot to play the target speech.
An intention recognition apparatus, comprising:
a customer speech processing module, configured to obtain customer speech collected by a robot and perform speech preprocessing on the customer speech to obtain speech to be recognized;
a to-be-recognized speech processing module, configured to perform text preprocessing on the speech to be recognized to obtain word vectors to be recognized;
a target intention obtaining module, configured to recognize the word vectors to be recognized using the target intent recognition model obtained by the above intent model training method to obtain a target intention;
a target entity obtaining module, configured to recognize the word vectors to be recognized using the entity recognition model obtained by the above intent model training method to obtain a target entity;
a target speech processing module, configured to select, according to the target intention and the target entity, the script template corresponding to the target intention and the target entity, randomly select an effective script from the script template, convert the effective script into target speech through text-to-speech technology, and control the robot to play the target speech.
A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the above intention recognition method when executing the computer program.
A computer-readable storage medium storing a computer program, wherein the computer program implements the above intention recognition method when executed by a processor.
In the above intent model training method, apparatus, computer device, and storage medium, standard speech is obtained and annotated so that the standard speech carries corresponding intention labels, which provides the data source for constructing the loss function when the original intent recognition model is subsequently trained. To facilitate training the original intent recognition model, the standard speech also needs to be converted into target text, and the target text is converted into target text word vectors by a word-vector conversion tool. After the target text word vectors are obtained, to prevent overfitting, the target text word vectors are further divided into training text word vectors and test text word vectors. The training text word vectors and the intention labels are input into the intent training model for training, the parameters of the intent training model are updated, and the original intent recognition model is obtained; the test text word vectors and the intention labels are then input into the original intent recognition model to obtain the corresponding output result. If the output result is greater than the preset accuracy rate, the trained original intent recognition model meets the requirement and can be used as the target intent recognition model to recognize the intention that a segment of speech is meant to express. To determine the meaning of a segment of speech more precisely, after the intention expressed by a segment of customer speech has been recognized, the intent model training method provided by the present invention also provides an entity recognition model for recognizing the named entities in the segment of customer speech, thereby improving the recognition accuracy of the customer speech.
In the above intention recognition method, apparatus, computer device, and storage medium, customer speech is collected by a robot, and speech preprocessing is performed on the collected customer speech to remove the interfering sounds in the customer speech and retain only the speech portions with obvious continuous voiceprint variation, i.e., the speech to be recognized, which improves the efficiency and accuracy of text preprocessing. Text preprocessing is then performed on the speech to be recognized to obtain word vectors to be recognized, and the word vectors to be recognized are recognized by the target intent recognition model and the entity recognition model respectively to obtain the target intention and target entity corresponding to each model, so that an effective script corresponding to the target intention and the target entity can be randomly selected from the script template for the conversation with the customer, which improves the flexibility of the dialogue between the robot and the customer.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the following briefly introduces the accompanying drawings required for describing the embodiments. Apparently, the accompanying drawings in the following description show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from these accompanying drawings without creative effort.
Fig. 1 is an application scenario diagram of the intent model training method in an embodiment of the present invention;
Fig. 2 is a flowchart of the intent model training method in an embodiment of the present invention;
Fig. 3 is a detailed flowchart of step S20 in Fig. 2;
Fig. 4 is a detailed flowchart of step S40 in Fig. 2;
Fig. 5 is a detailed flowchart of step S42 in Fig. 4;
Fig. 6 is a detailed flowchart of step S70 in Fig. 2;
Fig. 7 is a schematic diagram of the intent model training apparatus in an embodiment of the present invention;
Fig. 8 is a flowchart of the intention recognition method in an embodiment of the present invention;
Fig. 9 is a schematic diagram of the intention recognition apparatus in an embodiment of the present invention;
Fig. 10 is a schematic diagram of the computer device in an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention will be described below clearly and completely with reference to the accompanying drawings in the embodiments of the present invention. Apparently, the described embodiments are merely some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
The intent model training method provided by the present application can be applied in the application environment shown in Fig. 1, in which a client communicates with a server through a network. The terminal device includes, but is not limited to, various personal computers, laptops, smartphones, tablet computers, and portable wearable devices. The server can be implemented as an independent server or as a server cluster composed of multiple servers.
In one embodiment, as shown in Fig. 2, an intent model training method is provided. The method is described by taking its application to the server in Fig. 1 as an example, and includes the following steps:
S10: obtain standard speech and annotate the standard speech, the standard speech carrying corresponding intention labels.
Here, standard speech refers to customer speech that, after processing, contains only portions with obvious continuous voiceprint variation. Customer speech refers to the customer's voice collected by a sound collection device while the customer is speaking. It can be understood that standard speech is the customer speech retained after removing noise such as pauses caused by thinking or breathing while the customer speaks, and sounds such as the opening and closing of doors and windows or the collision of objects.
Specifically, to facilitate the model training in subsequent steps, the standard speech needs to be labeled with intention labels in advance, so that each piece of standard speech carries a corresponding intention label. An intention label is a label prepared by the developer according to the intention expressed by the standard speech.
S20: perform text preprocessing on the standard speech to obtain target text.
Here, text preprocessing refers to the processing that converts data in speech form into text form. Target text refers to the text formed after the standard speech is converted into written form. Specifically, after the standard speech is obtained, text preprocessing needs to be performed on it to convert the standard speech, which exists in speech form, into target text in text form, so as to facilitate the subsequent steps.
S30: convert the target text into target text word vectors, and divide the target text word vectors into training text word vectors and test text word vectors.
Here, target text word vectors refer to the data formed after the words in the target text are converted into corresponding word vectors.
Specifically, after the target text is obtained, it also needs to be converted into the corresponding target text word vectors by a text word-vector conversion tool, to facilitate the subsequent steps. The text word-vector conversion tool used in this embodiment is word2vec (word to vector), a tool that converts words into vectors; it maps each word to a corresponding vector.
To train the models involved in the subsequent steps and to verify the accuracy of the trained models, after the target text word vectors are obtained, they are divided into training text word vectors and test text word vectors. Generally, the ratio of training text word vectors to test text word vectors is 9:1. The training text word vectors are used to adjust the parameters in the model; the test text word vectors are used to test the recognition accuracy of the trained model.
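A minimal sketch of this step, assuming the gensim implementation of word2vec and already-segmented target texts; the corpus, integer-encoded intention labels, vector size, and variable names are illustrative placeholders rather than values taken from the patent:

```python
from gensim.models import Word2Vec

# Each target text is assumed to be a list of already-segmented words.
target_texts = [["我", "想", "买", "车险"], ["帮", "我", "查", "保单"]]  # illustrative
labels = [0, 1]                                # integer-encoded intention labels (illustrative)

# Train word2vec so every word can be mapped to a fixed-length vector.
w2v = Word2Vec(sentences=target_texts, vector_size=100, window=5, min_count=1)

# Convert each target text into a sequence of word vectors.
text_vectors = [[w2v.wv[w] for w in text] for text in target_texts]

# Split into training and test sets at the 9:1 ratio described above.
split = int(len(text_vectors) * 0.9)
train_vectors, test_vectors = text_vectors[:split], text_vectors[split:]
train_labels, test_labels = labels[:split], labels[split:]
```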
S40: input the training text word vectors and the intention labels into the intent training model for training to obtain the original intent recognition model, the intent training model being a Seq2Seq model with an attention mechanism added.
The intent training model in this embodiment is a model composed of Seq2Seq plus attention. The Seq2Seq model is an encoder-decoder model: encoding converts the input sequence into a vector of fixed length, and decoding converts the previously generated fixed-length vector into the output sequence. Further, the Seq2Seq model includes two RNNs (recurrent neural networks); it is essentially a variant of a bidirectional RNN in which the input and output sequence lengths are not equal. The two RNNs in the Seq2Seq model each correspond to one hidden layer and share one input layer and one output layer. For convenience of the following description, the two RNNs are referred to as the forward RNN and the backward RNN; the hidden layer corresponding to the forward RNN is the forward hidden layer, and the hidden layer corresponding to the backward RNN is the backward hidden layer.
Specifically, inputting the training text word vectors and the intention labels into the intent training model for training to obtain the original intent recognition model includes the following steps: (1) after the input layer of the Seq2Seq+attention model obtains the training text word vectors, the training text word vectors are input into the forward hidden layer for calculation, and the attention mechanism is used to distribute attention over the outputs of the forward hidden layer, where the attention mechanism assigns different weights to data according to their importance: larger weights to more important data, and smaller weights to less important data; (2) the outputs carrying the different attention weights are encoded by the encoder to obtain a semantic vector C, where encoding refers to the process of converting the input sequence into a vector of fixed length; (3) the semantic vector C is input into the backward hidden layer, the attention mechanism is used to distribute attention over the outputs of the backward hidden layer, and the decoder then decodes the outputs carrying the different attention weights to obtain the outputs of the backward hidden layer, where decoding refers to the process of converting the generated fixed-length vector back into the output sequence; (4) the outputs of the backward hidden layer are input into the output layer of the Seq2Seq model, and the model output is obtained through the calculation of the output layer, where the model output is the output obtained from the training text word vectors by the Seq2Seq+attention model; (5) a loss function is constructed from the model output and the intention labels, and the weights of the Seq2Seq model are then adjusted according to the loss function using the back-propagation algorithm to obtain the original intent recognition model, where the back-propagation algorithm adjusts, in the reverse order of the time-step states, the weights and biases between the hidden layers and the output layer of the Seq2Seq model as well as the weights and biases between the input layer and the hidden layers.
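A minimal PyTorch sketch in the spirit of the Seq2Seq-with-attention intent classifier described above (an RNN encoder, an attention step over its hidden states, a decoder, and an output layer over intention labels); the class name, layer choices, and sizes are assumptions for illustration, not the patent's implementation:

```python
import torch
import torch.nn as nn

class Seq2SeqIntentModel(nn.Module):
    def __init__(self, input_dim=100, hidden_dim=128, num_intents=10):
        super().__init__()
        self.encoder = nn.GRU(input_dim, hidden_dim, batch_first=True)   # forward RNN
        self.decoder = nn.GRU(hidden_dim, hidden_dim, batch_first=True)  # backward RNN
        self.attn = nn.Linear(hidden_dim, 1)                             # scores each encoder state
        self.out = nn.Linear(hidden_dim, num_intents)                    # output layer

    def forward(self, x):                       # x: (batch, seq_len, input_dim) word vectors
        enc_states, _ = self.encoder(x)         # forward hidden-layer outputs
        scores = self.attn(enc_states)          # (batch, seq_len, 1) unnormalised attention
        alpha = torch.softmax(scores, dim=1)    # attention distribution over time steps
        context = (alpha * enc_states).sum(1, keepdim=True)  # semantic vector C
        dec_states, _ = self.decoder(context)   # backward hidden-layer output
        return self.out(dec_states[:, -1])      # logits over intention labels
```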
S50: input the test text word vectors and the intention labels into the original intent recognition model to obtain the output result corresponding to the original intent recognition model and, if the output result is greater than the preset accuracy rate, determine the original intent recognition model to be the target intent recognition model.
Specifically, after the original intent model is obtained, to prevent overfitting and to further verify the accuracy of the original intent recognition model, the test text word vectors and the intention labels also need to be input into the original intent recognition model to obtain the corresponding output result. If the output result is greater than the preset accuracy rate, the trained original intent recognition model meets the requirement and can be used as the target intent recognition model to recognize the intention that a segment of customer speech is meant to express.
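A short sketch of the test step, assuming the model class from the sketch above, an already-trained original_intent_model, test word-vector tensors, and integer intention labels; the threshold value and helper names are illustrative assumptions:

```python
import torch

PRESET_ACCURACY = 0.9   # assumed preset accuracy rate

def evaluate_accuracy(model, test_vectors, test_labels):
    """Fraction of test samples whose predicted intent matches the intention label."""
    model.eval()
    correct = 0
    with torch.no_grad():
        for x, y in zip(test_vectors, test_labels):          # x: (seq_len, input_dim) tensor
            pred = model(x.unsqueeze(0)).argmax(dim=-1).item()
            correct += int(pred == y)
    return correct / len(test_labels)

accuracy = evaluate_accuracy(original_intent_model, test_vectors, test_labels)
if accuracy > PRESET_ACCURACY:
    target_intent_model = original_intent_model              # meets the requirement
```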
S60: perform named entity annotation on the target text so that the target text carries entity labels.
Here, named entity annotation refers to the process of identifying the names or symbols of specific types of things in a document collection. A named entity refers to a person name, organization name, place name, or any other entity identified by a name, such as an insurance category (an XX insurance product) or a customer action (the customer intends to place an order).
Named entity annotation is performed on the target text so that the target text carries entity labels, which provides the data source for obtaining the entity recognition model in step S70. An entity label is a label, obtained from the content of the target text, that is used to mark the named entities in the target text.
S70: input the target text word vectors corresponding to the target text and the corresponding entity labels into the entity training model for training to obtain the entity recognition model.
The entity training model in this embodiment is BLSTM+CRF. A BLSTM (bidirectional long short-term memory) network is a kind of recurrent neural network over time. A CRF (conditional random field) is an algorithm for labeling sequences, used for tasks such as word segmentation, part-of-speech tagging, and named entity recognition.
Specifically, the target text word vectors are input into the BLSTM to obtain the BLSTM's output for the target text word vectors. To remove illegal named entities, the corresponding BLSTM output is input into the CRF to calculate the optimal label sequence corresponding to the target text word vectors, and the sequence with the highest probability is taken as the training named entities corresponding to the target text word vectors. After the training named entities are obtained, the named-entity error between the training named entities and the original named entities is calculated; if the named-entity error is within the preset entity error range, the corresponding entity training model is used as the entity recognition model for recognizing the named entities in a segment of customer speech.
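A minimal sketch of a BLSTM+CRF tagger of the kind described above, assuming PyTorch together with the third-party pytorch-crf package for the CRF layer; the class name, tag count, and dimensions are illustrative assumptions:

```python
import torch
import torch.nn as nn
from torchcrf import CRF   # third-party pytorch-crf package (assumed available)

class BLSTMCRFTagger(nn.Module):
    def __init__(self, input_dim=100, hidden_dim=128, num_tags=5):
        super().__init__()
        self.blstm = nn.LSTM(input_dim, hidden_dim // 2, bidirectional=True, batch_first=True)
        self.emit = nn.Linear(hidden_dim, num_tags)   # per-token tag scores (emissions)
        self.crf = CRF(num_tags, batch_first=True)    # scores whole tag sequences, ruling out illegal ones

    def loss(self, x, tags):                 # x: (batch, seq_len, input_dim), tags: (batch, seq_len)
        emissions = self.emit(self.blstm(x)[0])
        return -self.crf(emissions, tags)    # negative log-likelihood of the labelled sequence

    def decode(self, x):
        emissions = self.emit(self.blstm(x)[0])
        return self.crf.decode(emissions)    # most probable tag sequence per sentence
```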
Through steps S10 to S70, standard speech is obtained and annotated so that it carries corresponding intention labels, which provides the data source for constructing the loss function when the original intent recognition model is subsequently trained. To facilitate training the original intent recognition model, the standard speech also needs to be converted into target text, and the target text is converted into target text word vectors by a word-vector conversion tool. After the target text word vectors are obtained, to prevent overfitting, they are further divided into training text word vectors and test text word vectors. The training text word vectors and the intention labels are input into the intent training model for training, the parameters of the intent training model are updated, and the original intent recognition model is obtained; the test text word vectors and the intention labels are then input into the original intent recognition model to obtain the corresponding output result. If the output result is greater than the preset accuracy rate, the trained original intent recognition model meets the requirement and can be used as the target intent recognition model to recognize the intention that a segment of speech is meant to express. To determine the meaning expressed by a segment of speech more precisely, after the intention of a segment of customer speech has been recognized, the intent model training method provided by the present invention also provides an entity recognition model for recognizing the named entities in the segment of customer speech, thereby improving the recognition accuracy of the customer speech.
In one embodiment, as shown in Fig. 3, step S20 of performing text preprocessing on the standard speech to obtain the target text specifically includes the following steps:
S21: convert the standard speech into original text using speech-to-text technology.
The speech-to-text technology used in this embodiment is ASR (automatic speech recognition), a technology that converts human speech into text. Original text refers to the text in written form generated by converting the standard speech through ASR.
S22: perform a first preprocessing on the original text using regular expressions, and cut the original text after the first preprocessing into effective text according to a preset cutting length.
A regular expression (often abbreviated in code as regex, regexp, or RE) is a logical formula used to filter the original text. The regular expression in this embodiment expresses the filter logic for removing numeric data and special characters from the original text. The preset cutting length is a value, set in advance according to actual needs, used to cut the original text into segments of a specific length.
Specifically, after the original text is obtained, a first preprocessing is performed on it using a pre-written regular expression to remove the numeric data and special characters in the original text. The numeric data in this embodiment refer to the digits that appear after the speech is converted into the original text; special characters refer to unrecognizable characters that appear after the speech is converted into the original text, such as $, *, &, #, +, and ?.
After the first preprocessing of the original text, the server cuts the original text after the first preprocessing into effective text of a specific length according to the preset cutting length. Effective text refers to the text obtained by cutting the original text to a specific length according to the preset cutting length.
S23: perform a second preprocessing on the effective text using a word segmentation tool to obtain the target text.
Specifically, after the effective text is obtained, the server segments the effective text using a word segmentation tool and removes stop words (particles, prepositions, pronouns, etc.) to obtain the target text. The target text is the text formed after the stop words in the effective text are removed. The word segmentation tool in this embodiment includes, but is not limited to, the jieba segmentation tool. Stop words are certain words or characters that are filtered out automatically before or after processing natural-language data (or text) in information retrieval, in order to save storage space and improve search efficiency; the stop-word list can be defined by the developer with reference to the Baidu stop-word list or the Harbin Institute of Technology stop-word dictionary.
Through steps S21 to S23, the standard speech is converted into original text by speech-to-text technology; the numeric data and special characters in the original text are then removed using regular expressions, and the cleaned original text is cut into effective text; finally, the stop words in the effective text are removed using the word segmentation tool to obtain the target text, which reduces the amount of data processing when the subsequent steps convert the target text into the corresponding target text word vectors.
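A small sketch of the first and second preprocessing steps under stated assumptions: the regular expression, the cutting length, and the stop-word set are illustrative placeholders, and jieba is used as the segmentation tool mentioned above.

```python
import re
import jieba

PRESET_CUT_LENGTH = 50                      # assumed preset cutting length
STOP_WORDS = {"的", "了", "我", "你", "在"}  # illustrative stop words

def text_preprocess(original_text: str) -> list[str]:
    # First preprocessing: strip digits and special characters with a regular expression.
    filtered = re.sub(r"[0-9$*&#+?]", "", original_text)
    # Cut the filtered text into effective text of the preset length.
    effective = filtered[:PRESET_CUT_LENGTH]
    # Second preprocessing: segment with jieba and drop stop words to get the target text.
    return [w for w in jieba.lcut(effective) if w not in STOP_WORDS]
```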
In one embodiment, as shown in Fig. 4, step S40 of inputting the training text word vectors and the intention labels into the intent training model for training to obtain the original intent recognition model specifically includes the following steps:
S41: initialize the weights and biases in the intent training model.
In this embodiment, the weights and biases are initialized with preset values, which are set in advance by the developer according to experience. Initializing the weights and biases of the intent training model with preset values can shorten the training time of the model and improve its recognition accuracy when the intent training model is subsequently trained on the training text word vectors. If the initialization of the weights and biases is not appropriate, the model will adjust poorly in the initial stage, which affects the subsequent effect of the intent training model on text word vectors.
S42: input the training text word vectors and the intention labels into the intent training model for training, and update the weights and biases in the intent training model to obtain the original intent recognition model.
Specifically, the training text word vectors and the intention labels are input into the intent training model for training to obtain the model output; a loss function is then constructed from the model output and the intention labels, partial derivatives of the loss function are taken, and the weights and biases in the intent training model are updated using the back-propagation algorithm to obtain the original intent recognition model.
Through steps S41 and S42, initializing the weights and biases in the intent training model shortens the training time of the model and improves its recognition accuracy; the training text word vectors and the intention labels are then input into the intent training model for training, and the weights and biases in the intent training model are updated to obtain the original intent recognition model, so that the original intent recognition model can be used to recognize the intention that customer speech is meant to express.
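A brief sketch of steps S41 and S42 under the same PyTorch assumptions as the earlier model sketch: weights and biases are initialized with preset values, and the loss built from the model output and the intention labels is back-propagated to update them. The initialization range, optimizer, and hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn as nn

model = Seq2SeqIntentModel()                       # from the earlier sketch (assumed)

# S41: initialise weights and biases with preset values.
for p in model.parameters():
    nn.init.uniform_(p, -0.1, 0.1)                 # preset range chosen for illustration

# S42: train and update weights/biases by back-propagation.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()                  # loss built from model output and intention labels
for x, y in zip(train_vectors, train_labels):      # assumed (seq_len, input_dim) tensors and int labels
    optimizer.zero_grad()
    loss = criterion(model(x.unsqueeze(0)), torch.tensor([y]))
    loss.backward()                                # partial derivatives of the loss
    optimizer.step()                               # weight and bias update
original_intent_model = model
```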
In one embodiment, as shown in Fig. 5, step S42 of inputting the training text word vectors and the intention labels into the intent training model for training and updating the weights and biases in the intent training model to obtain the original intent recognition model specifically includes the following steps:
S421: input the training text word vectors and the intention labels into the input layer of the intent training model; the input layer inputs the obtained training text word vectors into the forward hidden layer to obtain the forward output corresponding to the forward hidden layer.
Specifically, the training text word vectors are input into the input layer of the Seq2Seq+attention model, and the input layer inputs the obtained training text word vectors into the forward hidden layer of the forward RNN. In the forward hidden layer, the forward output, i.e., the output of the forward hidden layer, is calculated by the formula h_t = σ(U·x_t + W·h_{t-1} + b), where σ is the activation function of the forward RNN hidden layer, U is the weight between the input layer of the Seq2Seq+attention model and the forward RNN hidden layer, W is the weight between the hidden layers of the forward RNN, b is the bias between the input layer of the Seq2Seq+attention model and the forward RNN, x_t is the training text word vector input to the input layer of the Seq2Seq+attention model at time t, h_t is the output of the forward RNN hidden layer for the training text word vector at time t, and h_{t-1} is the output of the forward RNN hidden layer for the training text word vector at time t-1.
S422: distribute attention over the forward output using the attention mechanism, and encode the forward output using the encoding mechanism to obtain the semantic vector.
Specifically, after the forward output is obtained, the attention mechanism in the Seq2Seq+attention model calculates the attention of the training text word vectors according to the formula c_t = Σ_j α_tj·h_j, where c_t is the attention (i.e., the importance value) of the semantic vector at time t, α_tj is the correlation between the output at time t in the decoder stage and the j-th training text word vector input in the encoder stage, and h_j is the encoder output for the j-th input. Further, the normalization is α_tj = exp(e_tj) / Σ_k exp(e_tk), where k indexes the k-th input, and e_tj = V^T·tanh(U·h_j + W·S_{t-1} + b), where e_tj scores the conditional probability of the output, V is the weight between the hidden layer and the output layer, V^T is the transpose of the weight V, and S_{t-1} is the output of the decoder at time t-1.
After the attention distribution over the forward output is completed, the forward output is encoded using the encoding mechanism in the Seq2Seq+attention model to obtain the semantic vector.
S423: decode the semantic vector using the decoding mechanism, and distribute attention over the decoded semantic vector using the attention mechanism to obtain the backward output of the backward hidden layer.
Specifically, the backward output refers to the output corresponding to the backward hidden layer. After the semantic vector is obtained, the decoding mechanism in the Seq2Seq+attention model decodes the semantic vector, and the attention mechanism distributes attention over the decoded semantic vector to obtain the backward output of the backward hidden layer.
S424: input the backward output of the backward hidden layer into the output layer to obtain the model output.
Specifically, after the backward output is obtained, it is input into the output layer, and the output layer computes the model output by the formula S_t = f(S_{t-1}, y_{t-1}, c_t), where S_t is the output of the decoder at time t, S_{t-1} is the output of the decoder at time t-1, y_{t-1} is the intention label carried by the training text word vector input at time t-1, and f is usually the softmax function. Obtaining the model output facilitates the construction of the loss function in the subsequent step, so that the weights and biases of the Seq2Seq+attention model can be adjusted.
S425: construct a loss function based on the model output and the intention labels, perform error back-propagation on the forward recurrent neural network and the backward recurrent neural network based on the loss function, and adjust the weights and biases of the forward recurrent neural network and the backward recurrent neural network to obtain the original intent recognition model.
Specifically, after the model output is obtained, a loss function is constructed from the model output and the intention labels (for example, the negative log-likelihood L(θ) = -Σ_t log P(y_t | x; θ)), where θ denotes the set of weights and biases (U, V, W, b, c), and y_t is the intention label carried by the training text word vector input at time t. Partial derivatives of the loss function are then taken, error back-propagation is performed on the forward recurrent neural network and the backward recurrent neural network, and the weights and biases of the forward recurrent neural network and the backward recurrent neural network are adjusted to obtain the original intent recognition model.
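A numerical sketch of the attention formulas in steps S421 to S425, written with NumPy; the dimensions and the random encoder states are illustrative, and the variable names follow the symbols above.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 6, 8                                 # sequence length and hidden size (illustrative)
h = rng.normal(size=(T, d))                 # encoder hidden states h_j
s_prev = rng.normal(size=d)                 # decoder output S_{t-1}
U, W = rng.normal(size=(d, d)), rng.normal(size=(d, d))
V, b = rng.normal(size=d), rng.normal(size=d)

# e_tj = V^T tanh(U h_j + W S_{t-1} + b): alignment score for each encoder position j
e = np.array([V @ np.tanh(U @ h[j] + W @ s_prev + b) for j in range(T)])

# alpha_tj = exp(e_tj) / sum_k exp(e_tk): softmax normalisation of the scores
alpha = np.exp(e) / np.exp(e).sum()

# c_t = sum_j alpha_tj h_j: the attention-weighted semantic vector
c_t = (alpha[:, None] * h).sum(axis=0)
```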
In one embodiment, as shown in Fig. 6, step S70 of inputting the target text word vectors corresponding to the target text and the corresponding entity labels into the entity training model for training to obtain the entity recognition model specifically includes the following steps:
S71: input the target text word vectors and the entity labels into the entity training model to obtain training entities.
Specifically, after the target text word vectors are obtained, they are input into the entity training model to obtain the training entities. Training entities refer to the output obtained by the entity training model through its recognition of the training text word vectors.
S72: calculate the named-entity error between the training entities and the entity labels; if the named-entity error is within the preset entity error range, take the entity training model as the entity recognition model.
Here, the named-entity error refers to the error between the training entities and the entity labels; the preset entity error range is an error range, set in advance by the developer, used to determine whether the named-entity error meets the requirement.
Specifically, after the training entities are obtained, the named-entity error between the training entities and the entity labels is calculated. If the named-entity error is within the preset entity error range, the accuracy of the entity training model has met the requirement, and the entity training model can be used as the entity recognition model for recognizing the entities in the target text word vectors.
Through steps S71 and S72, the target text word vectors and the entity labels are input into the entity training model to train it, and the named-entity error between the training entities and the entity labels is then calculated. If the named-entity error is within the preset entity error range, the accuracy of the entity training model has met the requirement, and it can be used as the entity recognition model for recognizing the entities in the target text word vectors.
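A small sketch of step S72 under stated assumptions: the named-entity error is measured here as the token-level mismatch rate between the tag sequences predicted by the BLSTM+CRF sketch above and the annotated entity labels, and the preset error range is an illustrative threshold; tagger, target_text_vectors, and entity_label_sequences are assumed inputs.

```python
PRESET_ENTITY_ERROR = 0.05   # assumed preset entity error range (5% token mismatch)

def named_entity_error(predicted_tags, entity_labels):
    """Fraction of tokens whose predicted entity tag differs from the annotated entity label."""
    total = sum(len(seq) for seq in entity_labels)
    wrong = sum(p != g for pred, gold in zip(predicted_tags, entity_labels)
                for p, g in zip(pred, gold))
    return wrong / total

error = named_entity_error(tagger.decode(target_text_vectors), entity_label_sequences)
if error <= PRESET_ENTITY_ERROR:
    entity_recognition_model = tagger   # accuracy meets the requirement
```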
In the intent model training method provided by the present invention, standard speech is obtained and annotated so that it carries corresponding intention labels, which provides the data source for constructing the loss function when the original intent recognition model is subsequently trained. To facilitate training the original intent recognition model, the standard speech is converted into target text, and the target text is converted into target text word vectors by the word-vector conversion tool. After the target text word vectors are obtained, to prevent overfitting, they are divided into training text word vectors and test text word vectors. The training text word vectors and the intention labels are input into the intent training model for training, the parameters of the intent training model are updated, and the original intent recognition model is obtained; the test text word vectors and the intention labels are then input into the original intent recognition model to obtain the corresponding output result. If the output result is greater than the preset accuracy rate, the trained original intent recognition model meets the requirement and can be used as the target intent recognition model to recognize the intention that a segment of speech is meant to express. To determine the meaning expressed by a segment of speech more precisely, after the intention of a segment of customer speech has been recognized, the intent model training method provided by the present invention also provides the entity recognition model for recognizing the named entities in the segment of customer speech, thereby improving the recognition accuracy of the customer speech.
It should be understood that the sequence numbers of the steps in the above embodiments do not imply an execution order; the execution order of each process should be determined by its function and internal logic, and does not constitute any limitation on the implementation of the embodiments of the present invention.
In one embodiment, an intent model training apparatus is provided, and the intent model training apparatus corresponds to the intent model training method in the above embodiments. As shown in Fig. 7, the intent model training apparatus includes a standard speech obtaining module 10, a standard speech processing module 20, a target text processing module 30, an original intent recognition model training module 40, an original intent recognition model test module 50, a named entity annotation module 60, and an entity recognition model obtaining module 70. The functional modules are described in detail as follows:
The standard speech obtaining module 10 is configured to obtain standard speech and annotate the standard speech, the standard speech carrying corresponding intention labels.
The standard speech processing module 20 is configured to perform text preprocessing on the standard speech to obtain the target text.
The target text processing module 30 is configured to convert the target text into target text word vectors and divide the target text word vectors into training text word vectors and test text word vectors.
The original intent recognition model training module 40 is configured to input the training text word vectors and the intention labels into the intent training model for training to obtain the original intent recognition model, the intent training model being a Seq2Seq model with an attention mechanism added.
The original intent recognition model test module 50 is configured to input the test text word vectors and the intention labels into the original intent recognition model to obtain the output result corresponding to the original intent recognition model and, if the output result is greater than the preset accuracy rate, determine the original intent recognition model to be the target intent recognition model.
The named entity annotation module 60 is configured to perform named entity annotation on the target text so that the target text carries entity labels.
The entity recognition model obtaining module 70 is configured to input the target text word vectors corresponding to the target text and the corresponding entity labels into the entity training model for training to obtain the entity recognition model.
Further, the standard speech processing module 20 includes a speech-to-text unit, an effective text obtaining unit, and a target text obtaining unit.
The speech-to-text unit is configured to convert the standard speech into original text using speech-to-text technology.
The effective text obtaining unit is configured to perform the first preprocessing on the original text using regular expressions and cut the original text after the first preprocessing into effective text according to the preset cutting length.
The target text obtaining unit is configured to perform the second preprocessing on the effective text using a word segmentation tool to obtain the target text.
Further, the original intent recognition model training module 40 includes a parameter initialization unit and an original intent recognition model obtaining unit.
The parameter initialization unit is configured to initialize the weights and biases in the intent training model.
The original intent recognition model obtaining unit is configured to input the training text word vectors and the intention labels into the intent training model for training and update the weights and biases in the intent training model to obtain the original intent recognition model.
Further, the original intent recognition model obtaining unit includes a forward output obtaining unit, a semantic vector obtaining unit, a backward output obtaining unit, a model output obtaining unit, and a parameter update processing unit.
The forward output obtaining unit is configured to input the training text word vectors and the intention labels into the input layer of the intent training model; the input layer inputs the obtained training text word vectors into the forward hidden layer to obtain the forward output corresponding to the forward hidden layer.
The semantic vector obtaining unit is configured to distribute attention over the forward output using the attention mechanism and encode the forward output using the encoding mechanism to obtain the semantic vector.
The backward output obtaining unit is configured to decode the semantic vector using the decoding mechanism and distribute attention over the decoded semantic vector using the attention mechanism to obtain the backward output of the backward hidden layer.
The model output obtaining unit is configured to input the backward output of the backward hidden layer into the output layer to obtain the model output.
The parameter update processing unit is configured to construct the loss function based on the model output and the intention labels, perform error back-propagation on the forward recurrent neural network and the backward recurrent neural network based on the loss function, and adjust the weights and biases of the forward recurrent neural network and the backward recurrent neural network to obtain the original intent recognition model.
Further, the entity recognition model obtaining module 70 includes a training entity obtaining unit and an entity recognition model obtaining unit.
The training entity obtaining unit is configured to input the target text word vectors and the entity labels into the entity training model to obtain the training entities.
The entity recognition model obtaining unit is configured to calculate the named-entity error between the training entities and the entity labels and, if the named-entity error is within the preset entity error range, take the entity training model as the entity recognition model.
For specific limitations of the intent model training apparatus, refer to the limitations of the intent model training method above; details are not repeated here. Each module in the above intent model training apparatus can be implemented wholly or partly by software, hardware, or a combination thereof. Each module can be embedded in or independent of the processor in the computer device in the form of hardware, or stored in the memory of the computer device in the form of software, so that the processor can invoke and execute the operations corresponding to the above modules.
In one embodiment, as shown in Fig. 8, an intention recognition method is provided, which specifically includes the following steps:
S81: obtain the customer speech collected by the robot, and perform speech preprocessing on the customer speech to obtain the speech to be recognized.
Here, speech preprocessing refers to preprocessing performed on the customer speech collected by the robot.
Specifically, when the customer converses with the robot, the data collection device in the robot collects the customer's speech. The directly collected customer speech contains interfering sounds (such as pauses caused by thinking or breathing while the customer speaks, or sounds such as the opening and closing of doors and windows or the collision of objects), which would affect the accuracy of the later target intent recognition model and entity recognition model. Therefore, after obtaining the customer speech collected by the robot, the server also needs to perform speech preprocessing such as pre-emphasis, framing, windowing, and endpoint detection on the customer speech to remove the interfering sounds and retain only the speech portions with obvious continuous voiceprint variation, i.e., the speech to be recognized. Performing speech preprocessing on the customer speech facilitates the text preprocessing of the speech to be recognized in the subsequent steps and improves the processing accuracy.
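A brief NumPy sketch of two of the speech-preprocessing operations named above (pre-emphasis and framing with a Hamming window); the coefficient, frame size, and hop size are common illustrative values, not ones specified in the patent.

```python
import numpy as np

def preemphasis(signal: np.ndarray, coeff: float = 0.97) -> np.ndarray:
    """Boost high frequencies: y[n] = x[n] - coeff * x[n-1]."""
    return np.append(signal[0], signal[1:] - coeff * signal[:-1])

def frame_and_window(signal: np.ndarray, frame_len: int = 400, hop: int = 160) -> np.ndarray:
    """Split the signal into overlapping frames and apply a Hamming window to each.

    Assumes len(signal) >= frame_len.
    """
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] for i in range(n_frames)])
    return frames * np.hamming(frame_len)
```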
S82: perform text preprocessing on the speech to be recognized to obtain the word vectors to be recognized.
Specifically, after the speech to be recognized is obtained, text preprocessing is performed on it: first, the speech to be recognized is converted into the corresponding text using speech-to-text technology; then the numeric data and special characters in the text are removed using regular expressions, and the text is cut according to the preset cutting length; finally, stop words are removed from the text using the word segmentation tool, and the text with stop words removed is converted into the word vectors to be recognized using the text word-vector conversion tool.
S83: recognize the word vectors to be recognized using the target intent recognition model obtained by the above intent model training method to obtain the target intention.
Here, the target intention refers to the customer intention obtained after the target intent recognition model recognizes the word vectors to be recognized. Specifically, after the word vectors to be recognized are obtained, they are recognized using the target intent recognition model obtained by the above intent model training method to obtain the target intention.
S84: recognize the word vectors to be recognized using the entity recognition model obtained by the above intent model training method to obtain the target entity.
Here, the target entity refers to the named entity obtained after the entity recognition model recognizes the word vectors to be recognized. Specifically, after the word vectors to be recognized are obtained, they are recognized using the entity recognition model obtained by the above intent model training method to obtain the target entity.
S85: according to the target intention and the target entity, select the script template corresponding to the target intention and the target entity, randomly select an effective script from the script template, convert the effective script into target speech through text-to-speech technology, and control the robot to play the target speech.
Specifically, after the target intention and the target entity are obtained, the corresponding script template is selected according to the target intention and the target entity. To meet customer needs more fully, multiple script templates are provided for each target intention and target entity. The server randomly selects one script from the script template as the effective script and sends it to the corresponding robot. The robot converts the effective script into target speech through TTS technology and plays it to the customer in the conversation, so that the dialogue between the robot and the customer proceeds according to the customer's intention and the target entity and comes closer to what the customer has in mind. This makes the dialogue between the customer and the robot more flexible and improves the quality of the conversation with the customer.
Here, TTS technology refers to the technology that converts text information generated by a computer or input from outside into intelligible spoken output. The target speech refers to the speech, converted from the effective script by TTS technology, that is used to converse with the customer.
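A minimal sketch of step S85 under stated assumptions: the script templates are kept in a dictionary keyed by (target intention, target entity), random.choice picks the effective script, and synthesize_speech is a hypothetical placeholder for whatever TTS engine is used, not an API named in the patent.

```python
import random

# Illustrative script templates keyed by (target intention, target entity).
SCRIPT_TEMPLATES = {
    ("buy_insurance", "car_insurance"): [
        "May I introduce our car insurance plans to you?",
        "Which coverage level of car insurance are you considering?",
    ],
}

def pick_effective_script(target_intention: str, target_entity: str) -> str:
    """Randomly select one effective script from the matching script template."""
    return random.choice(SCRIPT_TEMPLATES[(target_intention, target_entity)])

def synthesize_speech(text: str) -> bytes:        # hypothetical TTS hook
    raise NotImplementedError("plug in the TTS engine here")

script = pick_effective_script("buy_insurance", "car_insurance")
# target_speech = synthesize_speech(script)      # the robot would then play this audio
```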
In the intention recognition method provided by the present invention, customer speech is collected by the robot, and speech preprocessing is performed on the collected customer speech to remove the interfering sounds in the customer speech and retain only the speech portions with obvious continuous voiceprint variation, i.e., the speech to be recognized, which improves the efficiency and accuracy of text preprocessing. Text preprocessing is then performed on the speech to be recognized to obtain the word vectors to be recognized, and the word vectors to be recognized are recognized by the target intent recognition model and the entity recognition model respectively to obtain the corresponding target intention and target entity, so that an effective script corresponding to the target intention and the target entity can be randomly selected from the script template for the conversation with the customer, which improves the flexibility of the dialogue between the robot and the customer.
It should be understood that the sequence numbers of the steps in the above embodiments do not imply an execution order; the execution order of each process should be determined by its function and internal logic, and does not constitute any limitation on the implementation of the embodiments of the present invention.
In one embodiment, a kind of intention assessment device is provided, is intended to know in the intention assessment device and above-described embodiment Other method corresponds.As shown in figure 9, the intention assessment device includes customer voice processing module 81, speech processes to be identified Module 82, target intention obtain module 83, target entity obtains module 84 and target voice processing module 85.Each functional module is detailed Carefully it is described as follows:
It is pre- to carry out voice to customer voice for obtaining the customer voice of robot acquisition for customer voice processing module 81 Processing, obtains voice to be identified.
Speech processing module 82 to be identified obtains term vector to be identified for carrying out Text Pretreatment to voice to be identified.
Target intention obtains module 83, the target intention identification model for using above-mentioned intent model training method to obtain It identifies term vector to be identified, obtains target intention.
Target entity obtains module 84, and the entity recognition model for being obtained using above-mentioned intent model training method is treated Identification term vector is identified, target entity is obtained.
Target voice processing module 85, for by target intention and target entity, selection and target intention and target to be real Art template if body is corresponding randomly selects effectively words art from words art template, and will effectively talk about art by text-to-speech technology It is converted into target voice, control robot plays target voice.
For the specific limitations of the intention recognition device, reference may be made to the limitations of the intention recognition method above, which are not repeated here. Each module in the above intention recognition device may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in or independent of the processor of the computer device in the form of hardware, or stored in the memory of the computer device in the form of software, so that the processor can call them and execute the operations corresponding to the above modules.
In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in Fig. 10. The computer device includes a processor, a memory, a network interface and a database connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program and a database. The internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The database of the computer device stores the data involved in the intent model training method, or alternatively, the data involved in the intention recognition method. The network interface of the computer device is used to communicate with an external terminal through a network connection. When the computer program is executed by the processor, an intent model training method is implemented; alternatively, when the computer program is executed by the processor, an intention recognition method is implemented.
In one embodiment, a computer device is provided, including a memory, a processor and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, the intent model training method of the above embodiment is implemented, for example steps S10 to S70 shown in Fig. 2 or the steps shown in Fig. 3 to Fig. 6, which are not repeated here. Alternatively, when the processor executes the computer program, the functions of the modules/units in the above embodiment of the intent model training device are implemented, for example, as shown in Fig. 7, the functions of the standard speech acquisition module 10, the standard speech processing module 20, the target text processing module 30, the original intention identification model training module 40, the original intention identification model test module 50, the named entity labeling module 60 and the entity recognition model acquisition module 70, which are not repeated here.
In one embodiment, a computer device is provided, including a memory, a processor and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, the intention recognition method of the above embodiment is implemented, for example steps S81 to S85 shown in Fig. 8, which are not repeated here. Alternatively, when the processor executes the computer program, the functions of the modules/units in the above embodiment of the intention recognition device are implemented, for example, as shown in Fig. 9, the functions of the customer voice processing module 81, the to-be-recognized speech processing module 82, the target intention acquisition module 83, the target entity acquisition module 84 and the target voice processing module 85, which are not repeated here.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored. When the computer program is executed by a processor, the intent model training method of the above embodiment is implemented, for example steps S10 to S70 shown in Fig. 2 or the steps shown in Fig. 3 to Fig. 6, which are not repeated here. Alternatively, when the computer program is executed by a processor, the functions of the modules/units in the above embodiment of the intent model training device are implemented, for example, as shown in Fig. 7, the functions of the standard speech acquisition module 10, the standard speech processing module 20, the target text processing module 30, the original intention identification model training module 40, the original intention identification model test module 50, the named entity labeling module 60 and the entity recognition model acquisition module 70, which are not repeated here.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored. When the computer program is executed by a processor, the intention recognition method of the above embodiment is implemented, for example steps S81 to S85 shown in Fig. 8, which are not repeated here. Alternatively, when the computer program is executed by a processor, the functions of the modules/units in the above embodiment of the intention recognition device are implemented, for example, as shown in Fig. 9, the functions of the customer voice processing module 81, the to-be-recognized speech processing module 82, the target intention acquisition module 83, the target entity acquisition module 84 and the target voice processing module 85, which are not repeated here.
Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. Any reference to memory, storage, a database or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM) and Rambus dynamic RAM (RDRAM).
It is apparent to those skilled in the art that, for convenience and brevity of description, only the division of the above functional units and modules is taken as an example. In practical applications, the above functions may be assigned to different functional units and modules as needed; that is, the internal structure of the device may be divided into different functional units or modules to complete all or part of the functions described above.
The above embodiments are only used to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that the technical solutions recorded in the foregoing embodiments can still be modified, or some of their technical features can be replaced with equivalents; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and shall all fall within the protection scope of the present invention.
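The "text preprocessing" referred to throughout the description (and spelled out in claim 2 below) can be pictured with the following sketch. It is only an illustration under stated assumptions: the regular expression, the preset cutting length and the use of the jieba segmentation tool are choices made here for the example, not requirements of the patent.

import re
import jieba  # a common Chinese word-segmentation tool; an assumption, not mandated by the patent

def preprocess_text(raw_text: str, cut_length: int = 50) -> list:
    # First preprocessing: keep only Chinese characters, letters and digits.
    cleaned = re.sub(r"[^\u4e00-\u9fa5A-Za-z0-9]", "", raw_text)
    # Cut the cleaned text into valid text segments of a preset length.
    segments = [cleaned[i:i + cut_length] for i in range(0, len(cleaned), cut_length)]
    # Second preprocessing: segment each piece into words with the segmentation tool.
    return [jieba.lcut(seg) for seg in segments]

print(preprocess_text("您好，我想咨询一下车险续保的问题。"))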

Claims (10)

1. An intent model training method, characterized by comprising:
obtaining a standard speech and labeling the standard speech, the standard speech carrying a corresponding intention label;
performing text preprocessing on the standard speech to obtain a target text;
converting the target text into target text word vectors, and dividing the target text word vectors into training text word vectors and test text word vectors;
inputting the training text word vectors and the intention labels into an intention training model for training to obtain an original intention identification model, the intention training model being a Seq2Seq model to which an attention mechanism has been added;
inputting the test text word vectors and the intention labels into the original intention identification model to obtain the output result corresponding to the original intention identification model, and, if the output result is greater than a preset accuracy rate, determining the original intention identification model as a target intention identification model;
performing named entity labeling on the target text so that the target text carries entity labels;
inputting the target text word vectors corresponding to the target text and the corresponding entity labels into an entity training model for training to obtain an entity recognition model.
2. The intent model training method according to claim 1, wherein the performing text preprocessing on the standard speech to obtain a target text comprises:
converting the standard speech into an original text using speech-to-text technology;
performing a first preprocessing on the original text using a regular expression, and cutting the original text after the first preprocessing into valid text according to a preset cutting length;
performing a second preprocessing on the valid text using a word segmentation tool to obtain the target text.
3. The intent model training method according to claim 1, wherein the inputting the training text word vectors and the intention labels into an intention training model for training to obtain an original intention identification model comprises:
initializing the weights and biases in the intention training model;
inputting the training text word vectors and the intention labels into the intention training model for training, and updating the weights and biases in the intention training model to obtain the original intention identification model.
4. The intent model training method according to claim 3, wherein the inputting the training text word vectors and the intention labels into the intention training model for training, and updating the weights and biases in the intention training model to obtain the original intention identification model, comprises:
inputting the training text word vectors and the intention labels into the input layer of the intention training model, the input layer feeding the received training text word vectors into the forward hidden layer, and obtaining the forward output corresponding to the forward hidden layer;
performing attention allocation on the forward output using the attention mechanism, and encoding the forward output using an encoding mechanism to obtain a semantic vector;
decoding the semantic vector using a decoding mechanism, performing attention allocation on the decoded semantic vector using the attention mechanism, and obtaining the backward output of the backward hidden layer;
inputting the backward output of the backward hidden layer into the output layer to obtain the model output;
constructing a loss function based on the model output and the intention labels, performing error back propagation on the forward recurrent neural network and the backward recurrent neural network based on the loss function, and adjusting the weights and biases of the forward recurrent neural network and the backward recurrent neural network to obtain the original intention identification model.
5. The intent model training method according to claim 1, wherein the inputting the target text word vectors corresponding to the target text and the corresponding entity labels into an entity training model for training to obtain an entity recognition model comprises:
inputting the target text word vectors and the entity labels into the entity training model to obtain training entities;
calculating the named entity error between the training entities and the entity labels, and, when the named entity error falls within a preset entity error range, using the entity training model as the entity recognition model.
6. An intention recognition method, characterized by comprising:
obtaining the customer voice collected by a robot, and performing voice preprocessing on the customer voice to obtain the voice to be recognized;
performing text preprocessing on the voice to be recognized to obtain the word vectors to be recognized;
recognizing the word vectors to be recognized using a target intention identification model obtained by the intent model training method according to any one of claims 1 to 5, to obtain a target intention;
recognizing the word vectors to be recognized using an entity recognition model obtained by the intent model training method according to any one of claims 1 to 5, to obtain a target entity;
selecting, according to the target intention and the target entity, the script template corresponding to the target intention and the target entity, randomly selecting an effective script from the script template, converting the effective script into a target voice through text-to-speech technology, and controlling the robot to play the target voice.
7. An intent model training device, characterized by comprising:
a standard speech acquisition module, configured to obtain a standard speech and label the standard speech, the standard speech carrying a corresponding intention label;
a standard speech processing module, configured to perform text preprocessing on the standard speech to obtain a target text;
a target text processing module, configured to convert the target text into target text word vectors and divide the target text word vectors into training text word vectors and test text word vectors;
an original intention identification model training module, configured to input the training text word vectors and the intention labels into an intention training model for training to obtain an original intention identification model, the intention training model being a Seq2Seq model to which an attention mechanism has been added;
an original intention identification model test module, configured to input the test text word vectors and the intention labels into the original intention identification model to obtain the output result corresponding to the original intention identification model, and, if the output result is greater than a preset accuracy rate, determine the original intention identification model as a target intention identification model;
a named entity labeling module, configured to perform named entity labeling on the target text so that the target text carries entity labels;
an entity recognition model acquisition module, configured to input the target text word vectors corresponding to the target text and the corresponding entity labels into an entity training model for training to obtain an entity recognition model.
8. An intention recognition device, characterized by comprising:
a customer voice processing module, configured to obtain the customer voice collected by a robot and perform voice preprocessing on the customer voice to obtain the voice to be recognized;
a to-be-recognized speech processing module, configured to perform text preprocessing on the voice to be recognized to obtain the word vectors to be recognized;
a target intention acquisition module, configured to recognize the word vectors to be recognized using a target intention identification model obtained by the intent model training method according to any one of claims 1 to 5, to obtain a target intention;
a target entity acquisition module, configured to recognize the word vectors to be recognized using an entity recognition model obtained by the intent model training method according to any one of claims 1 to 5, to obtain a target entity;
a target voice processing module, configured to select, according to the target intention and the target entity, the script template corresponding to the target intention and the target entity, randomly select an effective script from the script template, convert the effective script into a target voice through text-to-speech technology, and control the robot to play the target voice.
9. A computer device, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the intent model training method according to any one of claims 1 to 5, or the processor, when executing the computer program, implements the intention recognition method according to claim 6.
10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the intent model training method according to any one of claims 1 to 5, or the computer program, when executed by a processor, implements the intention recognition method according to claim 6.
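For readers who want a concrete, if simplified, picture of the intention training model recited in claims 3 and 4 above, the following PyTorch sketch trains a bidirectional LSTM with an attention-pooling head. It is a minimal stand-in under stated assumptions, not the patented Seq2Seq architecture: the decoder side is collapsed into a single attention-weighted "semantic vector" followed by a classification layer, and all hyper-parameters, sizes and data are illustrative.

import torch
import torch.nn as nn

class AttentionIntentModel(nn.Module):
    # Minimal sketch: BiLSTM encoder + attention pooling + intention classifier.
    def __init__(self, vocab_size: int, embed_dim: int = 128,
                 hidden_dim: int = 128, num_intents: int = 10):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Forward and backward recurrent layers, echoing claim 4.
        self.encoder = nn.LSTM(embed_dim, hidden_dim,
                               batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden_dim, 1)   # one attention score per time step
        self.out = nn.Linear(2 * hidden_dim, num_intents)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        h, _ = self.encoder(self.embed(token_ids))      # (batch, steps, 2*hidden)
        weights = torch.softmax(self.attn(h), dim=1)    # attention distribution over steps
        semantic = (weights * h).sum(dim=1)             # attention-weighted "semantic vector"
        return self.out(semantic)                       # intention logits

def train_step(model, token_ids, labels, optimizer, loss_fn):
    # Build the loss from the model output and the intention labels,
    # back-propagate the error, and adjust weights and biases (cf. claim 4).
    optimizer.zero_grad()
    loss = loss_fn(model(token_ids), labels)
    loss.backward()
    optimizer.step()
    return loss.item()

# Illustrative run on random data standing in for the training text word vectors.
model = AttentionIntentModel(vocab_size=5000)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
token_ids = torch.randint(0, 5000, (8, 20))   # batch of 8 utterances, 20 tokens each
labels = torch.randint(0, 10, (8,))           # intention labels
print(train_step(model, token_ids, labels, optimizer, loss_fn))

An entity recognition model along the lines of claim 5 would instead keep a per-token output layer and continue training until the named entity error falls within the preset range; that variant is not shown here.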
CN201910430534.0A 2019-05-22 2019-05-22 Intention model training method, intention recognition method, device, equipment and medium Active CN110287283B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910430534.0A CN110287283B (en) 2019-05-22 2019-05-22 Intention model training method, intention recognition method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910430534.0A CN110287283B (en) 2019-05-22 2019-05-22 Intention model training method, intention recognition method, device, equipment and medium

Publications (2)

Publication Number Publication Date
CN110287283A true CN110287283A (en) 2019-09-27
CN110287283B CN110287283B (en) 2023-08-01

Family

ID=68002682

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910430534.0A Active CN110287283B (en) 2019-05-22 2019-05-22 Intention model training method, intention recognition method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN110287283B (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107943860A (en) * 2017-11-08 2018-04-20 北京奇艺世纪科技有限公司 The recognition methods and device that the training method of model, text are intended to
CN108763284A (en) * 2018-04-13 2018-11-06 华南理工大学 A kind of question answering system implementation method based on deep learning and topic model
CN109065027A (en) * 2018-06-04 2018-12-21 平安科技(深圳)有限公司 Speech differentiation model training method, device, computer equipment and storage medium
CN108920622A (en) * 2018-06-29 2018-11-30 北京奇艺世纪科技有限公司 A kind of training method of intention assessment, training device and identification device
CN109146610A (en) * 2018-07-16 2019-01-04 众安在线财产保险股份有限公司 It is a kind of intelligently to insure recommended method, device and intelligence insurance robot device
CN109284406A (en) * 2018-09-03 2019-01-29 四川长虹电器股份有限公司 Intension recognizing method based on difference Recognition with Recurrent Neural Network
CN109299458A (en) * 2018-09-12 2019-02-01 广州多益网络股份有限公司 Entity recognition method, device, equipment and storage medium
CN109522393A (en) * 2018-10-11 2019-03-26 平安科技(深圳)有限公司 Intelligent answer method, apparatus, computer equipment and storage medium
CN109783623A (en) * 2018-12-25 2019-05-21 华东师范大学 The data analysing method of user and customer service dialogue under a kind of real scene

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
RONG Guanghui et al., "Question-Answer Matching Method Based on Deep Learning" (基于深度学习的问答匹配方法), Journal of Computer Applications (《计算机应用》) *

Cited By (63)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111401071A (en) * 2019-01-02 2020-07-10 百度在线网络技术(北京)有限公司 Model training method and device, computer equipment and readable storage medium
CN112668333A (en) * 2019-10-15 2021-04-16 华为技术有限公司 Named entity recognition method and device, and computer-readable storage medium
WO2021073119A1 (en) * 2019-10-15 2021-04-22 平安科技(深圳)有限公司 Method and apparatus for entity disambiguation based on intention recognition model, and computer device
CN110852799A (en) * 2019-11-07 2020-02-28 北京集奥聚合科技有限公司 User screening method and device based on intention label, electronic equipment and medium
CN112818689A (en) * 2019-11-15 2021-05-18 马上消费金融股份有限公司 Entity identification method, model training method and device
CN111159501A (en) * 2019-11-22 2020-05-15 杭州蛋壳商务信息技术有限公司 Method for establishing passenger judging model based on multilayer neural network and passenger judging method
CN111159501B (en) * 2019-11-22 2023-09-22 杭州蛋壳商务信息技术有限公司 Method for establishing passenger judgment model based on multilayer neural network and passenger judgment method
WO2021103775A1 (en) * 2019-11-27 2021-06-03 深圳追一科技有限公司 Voice intent recognition method and device, computer device and storage medium
CN110930989B (en) * 2019-11-27 2021-04-06 深圳追一科技有限公司 Speech intention recognition method and device, computer equipment and storage medium
CN110930989A (en) * 2019-11-27 2020-03-27 深圳追一科技有限公司 Speech intention recognition method and device, computer equipment and storage medium
CN112906370B (en) * 2019-12-04 2022-12-20 马上消费金融股份有限公司 Intention recognition model training method, intention recognition method and related device
CN112906370A (en) * 2019-12-04 2021-06-04 马上消费金融股份有限公司 Intention recognition model training method, intention recognition method and related device
CN111222343A (en) * 2019-12-06 2020-06-02 深圳市优必选科技股份有限公司 Intention identification method and intention identification device
CN111222343B (en) * 2019-12-06 2023-12-29 深圳市优必选科技股份有限公司 Intention recognition method and intention recognition device
CN111177358B (en) * 2019-12-31 2023-05-12 华为技术有限公司 Intention recognition method, server and storage medium
CN111177358A (en) * 2019-12-31 2020-05-19 华为技术有限公司 Intention recognition method, server, and storage medium
CN111161740A (en) * 2019-12-31 2020-05-15 中国建设银行股份有限公司 Intention recognition model training method, intention recognition method and related device
CN111274824B (en) * 2020-01-20 2023-05-05 文思海辉智科科技有限公司 Natural language processing method, device, computer equipment and storage medium
CN111274824A (en) * 2020-01-20 2020-06-12 文思海辉智科科技有限公司 Natural language processing method, device, computer equipment and storage medium
US11386885B2 (en) 2020-02-17 2022-07-12 Wipro Limited Method and system for detecting intent as an ordered sequence from a user query
WO2021169745A1 (en) * 2020-02-25 2021-09-02 升智信息科技(南京)有限公司 User intention recognition method and apparatus based on statement context relationship prediction
CN111339278A (en) * 2020-02-28 2020-06-26 支付宝(杭州)信息技术有限公司 Method and device for generating training speech generating model and method and device for generating answer speech
WO2021180062A1 (en) * 2020-03-09 2021-09-16 华为技术有限公司 Intention identification method and electronic device
CN111833849A (en) * 2020-03-10 2020-10-27 北京嘀嘀无限科技发展有限公司 Method for speech recognition and speech model training, storage medium and electronic device
CN111522943A (en) * 2020-03-25 2020-08-11 平安普惠企业管理有限公司 Automatic test method, device, equipment and storage medium for logic node
CN111462752A (en) * 2020-04-01 2020-07-28 北京思特奇信息技术股份有限公司 Client intention identification method based on attention mechanism, feature embedding and BI-LSTM
CN111462752B (en) * 2020-04-01 2023-10-13 北京思特奇信息技术股份有限公司 Attention mechanism, feature embedding and BI-LSTM (business-to-business) based customer intention recognition method
CN111460123B (en) * 2020-04-07 2020-10-20 中国搜索信息科技股份有限公司 Conversation intention identification method and device for teenager chat robot
CN111460123A (en) * 2020-04-07 2020-07-28 中国搜索信息科技股份有限公司 Conversation intention identification method and device for teenager chat robot
CN111554277A (en) * 2020-05-15 2020-08-18 深圳前海微众银行股份有限公司 Voice data recognition method, device, equipment and medium
CN111554277B (en) * 2020-05-15 2023-11-03 深圳前海微众银行股份有限公司 Voice data recognition method, device, equipment and medium
CN111858966B (en) * 2020-08-05 2021-12-31 龙马智芯(珠海横琴)科技有限公司 Knowledge graph updating method and device, terminal equipment and readable storage medium
WO2022042125A1 (en) * 2020-08-26 2022-03-03 湖北亿咖通科技有限公司 Named entity recognition method
CN111967264A (en) * 2020-08-26 2020-11-20 湖北亿咖通科技有限公司 Named entity identification method
CN111967264B (en) * 2020-08-26 2021-09-24 湖北亿咖通科技有限公司 Named entity identification method
CN112036550B (en) * 2020-09-04 2022-05-17 平安科技(深圳)有限公司 Client intention identification method and device based on artificial intelligence and computer equipment
CN112036550A (en) * 2020-09-04 2020-12-04 平安科技(深圳)有限公司 Client intention identification method and device based on artificial intelligence and computer equipment
WO2022048173A1 (en) * 2020-09-04 2022-03-10 平安科技(深圳)有限公司 Artificial intelligence-based customer intent identification method and apparatus, device, and medium
GB2614208A (en) * 2020-09-15 2023-06-28 Ibm End-to-end spoken language understanding without full transcripts
US11929062B2 (en) 2020-09-15 2024-03-12 International Business Machines Corporation End-to-end spoken language understanding without full transcripts
WO2022057452A1 (en) * 2020-09-15 2022-03-24 International Business Machines Corporation End-to-end spoken language understanding without full transcripts
CN112364662A (en) * 2020-11-13 2021-02-12 中国科学院软件研究所 Intention identification method based on neural network and electronic device
CN112417886A (en) * 2020-11-20 2021-02-26 平安普惠企业管理有限公司 Intention entity information extraction method and device, computer equipment and storage medium
CN112434131A (en) * 2020-11-24 2021-03-02 平安科技(深圳)有限公司 Text error detection method and device based on artificial intelligence, and computer equipment
CN112434131B (en) * 2020-11-24 2023-09-29 平安科技(深圳)有限公司 Text error detection method and device based on artificial intelligence and computer equipment
CN112786041B (en) * 2020-12-23 2023-11-24 光禹莱特数字科技(上海)有限公司 Voice processing method and related equipment
CN112786041A (en) * 2020-12-23 2021-05-11 平安普惠企业管理有限公司 Voice processing method and related equipment
WO2022142006A1 (en) * 2020-12-30 2022-07-07 平安科技(深圳)有限公司 Semantic recognition-based verbal skill recommendation method and apparatus, device, and storage medium
CN112732911B (en) * 2020-12-30 2023-10-10 平安科技(深圳)有限公司 Semantic recognition-based speaking recommendation method, device, equipment and storage medium
CN112732911A (en) * 2020-12-30 2021-04-30 平安科技(深圳)有限公司 Semantic recognition-based conversational recommendation method, device, equipment and storage medium
CN112765959A (en) * 2020-12-31 2021-05-07 康佳集团股份有限公司 Intention recognition method, device, equipment and computer readable storage medium
CN112749761A (en) * 2021-01-22 2021-05-04 上海机电工程研究所 Enemy combat intention identification method and system based on attention mechanism and recurrent neural network
CN113010653A (en) * 2021-03-16 2021-06-22 支付宝(杭州)信息技术有限公司 Method and system for training and conversing conversation strategy model
CN112989003A (en) * 2021-04-01 2021-06-18 网易(杭州)网络有限公司 Intention recognition method, device, processing equipment and medium
CN113064984A (en) * 2021-04-25 2021-07-02 深圳壹账通智能科技有限公司 Intention recognition method and device, electronic equipment and readable storage medium
CN115249017A (en) * 2021-06-23 2022-10-28 马上消费金融股份有限公司 Text labeling method, intention recognition model training method and related equipment
CN115249017B (en) * 2021-06-23 2023-12-19 马上消费金融股份有限公司 Text labeling method, training method of intention recognition model and related equipment
CN113407698A (en) * 2021-06-30 2021-09-17 北京百度网讯科技有限公司 Method and device for training and recognizing intention of intention recognition model
CN113627197A (en) * 2021-08-11 2021-11-09 未鲲(上海)科技服务有限公司 Text intention recognition method, device, equipment and storage medium
CN113627197B (en) * 2021-08-11 2024-04-30 中科软科技股份有限公司 Text intention recognition method, device, equipment and storage medium
CN114237182A (en) * 2021-12-17 2022-03-25 中国电信股份有限公司 Robot scheduling method and system
CN115114407A (en) * 2022-07-12 2022-09-27 平安科技(深圳)有限公司 Intention recognition method and device, computer equipment and storage medium
CN115114407B (en) * 2022-07-12 2024-04-19 平安科技(深圳)有限公司 Intention recognition method, device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN110287283B (en) 2023-08-01

Similar Documents

Publication Publication Date Title
CN110287283A (en) Intent model training method, intension recognizing method, device, equipment and medium
CN110162610A (en) Intelligent robot answer method, device, computer equipment and storage medium
Zhou et al. Converting anyone's emotion: Towards speaker-independent emotional voice conversion
CN110276259A (en) Lip reading recognition methods, device, computer equipment and storage medium
Lal et al. Cross-lingual automatic speech recognition using tandem features
CN107871496B (en) Speech recognition method and device
Michelsanti et al. Vocoder-based speech synthesis from silent videos
CN110767213A (en) Rhythm prediction method and device
US20220059083A1 (en) Neural modulation codes for multilingual and style dependent speech and language processing
Sterpu et al. How to teach DNNs to pay attention to the visual modality in speech recognition
Qu et al. Lipsound2: Self-supervised pre-training for lip-to-speech reconstruction and lip reading
Ho et al. Cross-lingual voice conversion with controllable speaker individuality using variational autoencoder and star generative adversarial network
CN112765333B (en) Automatic dialogue generation method and system based on emotion and prompt word combination
CN115836300A (en) Self-training WaveNet for text-to-speech
JP2023539397A (en) Controllable natural paralanguage for text-to-speech synthesis
CN117012177A (en) Speech synthesis method, electronic device, and storage medium
CN115762471A (en) Voice synthesis method, device, equipment and storage medium
CN115359780A (en) Speech synthesis method, apparatus, computer device and storage medium
CN115240713A (en) Voice emotion recognition method and device based on multi-modal features and contrast learning
CN114743539A (en) Speech synthesis method, apparatus, device and storage medium
Ijima et al. Prosody Aware Word-Level Encoder Based on BLSTM-RNNs for DNN-Based Speech Synthesis.
CN114492382A (en) Character extraction method, text reading method, dialog text generation method, device, equipment and storage medium
KR102363955B1 (en) Method and system for evaluating the quality of recordingas
CN117275458B (en) Speech generation method, device and equipment for intelligent customer service and storage medium
Setlur et al. Towards Using Heterogeneous Relation Graphs for End-to-End TTS

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant