Summary of the invention
To effectively solve the problems of the prior art, embodiments of the present invention provide a rule-fusion-based semantic analysis method, a device, and a readable storage medium.
One aspect of the present invention provides a rule-fusion-based semantic analysis method, the method comprising: obtaining text data; pre-training the text data on a data set to obtain multiple character vectors and/or word vectors; matching, by a system rule engine, a rule vector corresponding to each character vector and/or word vector; combining each character vector and/or word vector with its corresponding rule vector to obtain a corresponding combined vector; and feeding all of the combined vectors in sequence into a recurrent neural network to obtain intent data characterizing the text data.
Preferably, pre-training the text data on a data set to obtain multiple character vectors and/or word vectors comprises: performing word segmentation on the text data to obtain a word segmentation result; and pre-training the word segmentation result on the data set to generate multiple character vectors and/or word vectors.
Preferably, feeding all of the combined vectors in sequence into a recurrent neural network to obtain intent data characterizing the text data comprises: inputting the combined vectors in sequence into a recurrent neural network layer for encoding to obtain a first coding result; combining all of the matched rule vectors and feature-encoding them to obtain a second coding result; fusing the first coding result with the second coding result to obtain a fused coding result; and feeding the fused coding result into a Softmax layer for intent recognition to obtain the intent data characterizing the text data.
Preferably, all of the matched rule vectors are combined and then feature-encoded after a pooling operation.
Preferably, while the combined vectors are being input in sequence into the recurrent neural network layer for encoding, the method further comprises: using the first coding results obtained in sequence as the input of a conditional random field (CRF) to obtain sequence labels corresponding to the character vectors and/or word vectors.
Another aspect of the present invention provides a rule-fusion-based semantic analysis device, the device comprising: a data acquisition module for obtaining text data; a character and/or word vector generation module for pre-training the text data on a data set to obtain multiple character vectors and/or word vectors; a rule vector generation module for matching, by a system rule engine, a rule vector corresponding to each character vector and/or word vector; a combination module for combining each character vector and/or word vector with its corresponding rule vector to obtain a combined vector; and an intent recognition module for feeding all of the combined vectors in sequence into a recurrent neural network to obtain intent data characterizing the text data.
Preferably, the character and/or word vector generation module is specifically configured to: perform word segmentation on the text data to obtain a word segmentation result; and pre-train the word segmentation result on the data set to obtain multiple character vectors and/or word vectors.
Preferably, the intent recognition module is specifically configured to: input the combined vectors in sequence into a recurrent neural network layer for encoding to obtain a first coding result; combine all of the matched rule vectors and feature-encode them after a pooling operation to obtain a second coding result; fuse the first coding result with the second coding result to obtain a fused coding result; and feed the fused coding result into a Softmax layer for intent recognition to obtain the intent data characterizing the text data.
Preferably, the device further comprises a sequence recognition module, which uses the first coding results obtained in sequence as the input of a conditional random field (CRF) to obtain sequence labels corresponding to the character vectors and/or word vectors.
Another aspect of the present invention provides a computer-readable storage medium, the storage medium comprising a set of computer-executable instructions which, when executed, perform the rule-fusion-based semantic analysis method.
In the rule-fusion-based semantic analysis method, device and readable storage medium of the embodiments of the present invention, the text data is first pre-trained on a data set to obtain multiple character vectors and/or word vectors; the rule engine then matches the rule vector corresponding to each character vector and/or word vector; each character vector and/or word vector is then combined with its corresponding rule vector to form a combined vector; and the combined vectors are used as the input of a recurrent neural network to obtain intent data characterizing the text data. Compared with the prior art, combining a rule engine with a deep model can make the output data more accurate.
It should be understood that the teachings of the present invention need not achieve all of the beneficial effects described above. A specific technical solution may achieve a specific technical effect, and other embodiments of the present invention may also achieve beneficial effects not mentioned above.
Detailed description of the embodiments
To make the purpose, features and advantages of the present invention clearer and easier to understand, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art on the basis of these embodiments without creative effort fall within the protection scope of the present invention.
As shown in Figure 1, an embodiment of the present invention provides a rule-fusion-based semantic analysis method, the method comprising:
Step 101, data acquisition: obtain text data.
Step 102, text data preprocessing: pre-train the text data on a data set to obtain multiple character vectors and/or word vectors.
Step 103, rule engine loading: match, by the system rule engine, a rule vector corresponding to each character vector and/or word vector.
Step 104, vector preprocessing: combine each character vector and/or word vector with its corresponding rule vector to obtain a corresponding combined vector.
Step 105, intent recognition: feed all of the combined vectors in sequence into a recurrent neural network to obtain intent data characterizing the text data. Specifically: input the combined vectors in sequence into a recurrent neural network layer for encoding to obtain a first coding result; combine all of the matched rule vectors and feature-encode them to obtain a second coding result; fuse the first coding result with the second coding result to obtain a fused coding result; and feed the fused coding result into a Softmax layer for intent recognition to obtain the intent data characterizing the text data.
In this embodiment, text data is first obtained in step 101. The text data is text data of a specific domain, where a specific domain refers to data or resources of the same type together with the services provided around them, for example "restaurant", "hotel", "air ticket", "train ticket" or "yellow pages". The text data may be a third-party corpus such as Wiki, or data crawled from the network.
In step 102, the obtained text data is passed through a segmenter such as jieba, or through a statistics-based machine learning algorithm, to obtain at least one character and/or word. Each character and/or word is then looked up in a data set used for pre-training vectors to obtain the corresponding character vector and/or word vector. The data set is preferably an encyclopedia data set.
In step 103, the rule engine is loaded and used to match the rule that applies to each character and/or word in the text. Every matched rule is represented as a one-hot code.
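As a minimal sketch of this matching, assume the rule engine is nothing more than an ordered list of regular expressions. The rule names, the patterns, and the convention that index 0 means "no rule matched" are assumptions made for illustration and are not taken from the embodiment.

```python
import re

# Hypothetical rule set. Index 0 is reserved for "no rule matched".
RULES = [
    ("NONE", None),
    ("TICKET", re.compile(r"票$")),
    ("CITY", re.compile(r"^(北京|上海)$")),
]

def match_rule(token):
    """Return the index of the first rule whose pattern matches the token."""
    for idx, (_, pattern) in enumerate(RULES):
        if pattern is not None and pattern.search(token):
            return idx
    return 0

def one_hot(index, size):
    """Encode a rule index as a one-hot list of the given size."""
    vec = [0] * size
    vec[index] = 1
    return vec

tokens = ["订", "火车票", "北京"]
rule_one_hots = [one_hot(match_rule(t), len(RULES)) for t in tokens]
```

Each token thus receives exactly one one-hot code, so the rule codes stay aligned position by position with the character and/or word vectors.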
Further, a rule vector matrix is initialized, and the one-hot vector is multiplied by the initialized rule vector matrix to obtain the rule vector of the specified rule.
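The multiplication can be illustrated with a short sketch. Multiplying a one-hot vector by the matrix simply selects the matrix row of the matched rule. The random initialization range, the seed, and the function names below are assumptions; in practice the matrix would presumably be learned together with the network.

```python
import random

def init_rule_matrix(num_rules, dim, seed=0):
    """Randomly initialize a (num_rules x dim) rule vector matrix."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)]
            for _ in range(num_rules)]

def vec_mat(vec, mat):
    """Row vector times matrix: result[j] = sum_i vec[i] * mat[i][j]."""
    return [sum(v * row[j] for v, row in zip(vec, mat))
            for j in range(len(mat[0]))]

matrix = init_rule_matrix(num_rules=3, dim=2)
rule_vector = vec_mat([0, 1, 0], matrix)  # selects row 1 of the matrix
```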
In step 104, each obtained character vector and/or word vector is combined with its corresponding rule vector to obtain a combined vector. The combination is performed by concatenation. For example, if the character or word vector w is [a1, a2] and the corresponding rule vector r is [b1, b2], the concatenated combined vector X is [w, r], that is, [a1, a2, b1, b2].
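Applied to a whole sequence, the concatenation might look like the following sketch, in which the function name combine and the numeric values are illustrative assumptions.

```python
def combine(token_vectors, rule_vectors):
    """Concatenate each token vector w with its rule vector r into X = [w, r]."""
    if len(token_vectors) != len(rule_vectors):
        raise ValueError("each token vector needs exactly one rule vector")
    return [w + r for w, r in zip(token_vectors, rule_vectors)]

combined = combine([[1.0, 2.0], [3.0, 4.0]], [[0.5, 0.6], [0.7, 0.8]])
```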
In step 105, with reference to Figure 2, $X_{t-1}$, $X_t$ and $X_{t+1}$ denote the combined-vector input values at times $t-1$, $t$ and $t+1$; $h_{t-1}$, $h_t$ and $h_{t+1}$ denote the hidden-layer state vectors at times $t-1$, $t$ and $t+1$; $y_{t-1}$, $y_t$ and $y_{t+1}$ denote the output values at times $t-1$, $t$ and $t+1$; and $w_1$ and $w_2$ are weights. The value of the hidden state vector $h$ at time $t$ is: $h_t = f(w_1 X_t + w_2 h_{t-1})$.
The obtained combined vectors X are multiplied by the weight $w_1$, and the products are fed in sequence into the recurrent neural network. In this embodiment the recurrent neural network is preferably a bidirectional recurrent neural network. The first coding result is the hidden-layer state vector at the last time step, $h_n$.
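The recurrence h_t = f(w1 X_t + w2 h_{t-1}) can be sketched in a few lines. The sketch below is unidirectional for brevity, whereas the embodiment prefers a bidirectional network, and it assumes f is tanh, that the initial hidden state is zero, and that w1 and w2 are matrices. The weight values are arbitrary.

```python
import math

def mat_vec(mat, vec):
    """Matrix times column vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in mat]

def rnn_encode(inputs, w1, w2):
    """Run h_t = tanh(w1 X_t + w2 h_{t-1}) over the sequence.
    Returns every hidden state; the last one plays the role of h_n."""
    hidden = [0.0] * len(w2)  # assumed zero initial state
    states = []
    for x in inputs:
        from_input = mat_vec(w1, x)
        from_state = mat_vec(w2, hidden)
        hidden = [math.tanh(p + q) for p, q in zip(from_input, from_state)]
        states.append(hidden)
    return states

w1 = [[0.5, 0.0], [0.0, 0.5]]  # input-to-hidden weights (arbitrary)
w2 = [[0.1, 0.0], [0.0, 0.1]]  # hidden-to-hidden weights (arbitrary)
states = rnn_encode([[1.0, 0.0], [0.0, 1.0]], w1, w2)
h_n = states[-1]  # first coding result
```

Keeping every intermediate state, rather than only h_n, is convenient because the per-step states are what a later sequence labelling stage would consume.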
All of the matched rule vectors, such as r1, r2, ..., are combined by concatenation. For example, combining r1, r2 and r3 gives the vector R = [r1, r2, r3]. The concatenated vector R is passed through average pooling or max pooling to obtain the feature code R', which is the second coding result. The first coding result, namely the hidden-layer state vector $h_n$ at the last time step, is then fused with the second coding result. The fusion is also performed by concatenation, in the same manner as described above, and yields the fused coding result F. Finally, F is input into the Softmax layer for intent recognition to obtain the intent data characterizing the text data.
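The pooling, fusion and Softmax stages might be sketched as follows. Two interpretations here are assumptions rather than statements of the embodiment: the concatenated vector R is treated as a stack of rule vectors that is pooled position-wise, and the Softmax layer is given a linear projection (weights, bias) in front of the normalization. All numeric values are arbitrary.

```python
import math

def pool(rule_vectors, mode="average"):
    """Pool a stack of equal-length rule vectors into one feature code R'."""
    columns = list(zip(*rule_vectors))
    if mode == "max":
        return [max(col) for col in columns]
    return [sum(col) / len(col) for col in columns]

def softmax(scores):
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(h_n, rule_vectors, weights, bias):
    """Fuse h_n with R' by concatenation, project, and normalize."""
    fused = h_n + pool(rule_vectors)  # fused coding result F
    logits = [sum(w * f for w, f in zip(row, fused)) + b
              for row, b in zip(weights, bias)]
    return softmax(logits)

rules = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = [[1.0, 0.0, 0.5, 0.0], [0.0, 1.0, 0.0, 0.5]]  # 2 intents x 4 features
probs = classify([0.2, -0.1], rules, weights, [0.0, 0.0])
intent = probs.index(max(probs))
```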
Further, after the recurrent neural network, the method also includes step 106, sequence labelling: the obtained first coding results are passed in sequence through a conditional random field (CRF) to obtain sequence labels for the corresponding character vectors and/or word vectors.
In step 106, following step 105, the first coding result generated at each time step is input into the conditional random field (CRF) to obtain a sequence label for each character vector and/or word vector.
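The embodiment does not describe how the CRF is decoded. A common choice for a linear-chain CRF is Viterbi decoding, sketched below on the assumption that each per-step coding result has already been projected to one emission score per label. The score values and the two-label scheme are hypothetical.

```python
def viterbi(emissions, transitions):
    """Return the highest-scoring label path.
    emissions[t][k]: score of label k at step t.
    transitions[i][j]: score of moving from label i to label j."""
    num_labels = len(transitions)
    scores = list(emissions[0])
    back = []
    for emit in emissions[1:]:
        prev_best, new_scores = [], []
        for j in range(num_labels):
            best_i = max(range(num_labels),
                         key=lambda i: scores[i] + transitions[i][j])
            prev_best.append(best_i)
            new_scores.append(scores[best_i] + transitions[best_i][j] + emit[j])
        back.append(prev_best)
        scores = new_scores
    # Trace the best path backwards from the best final label.
    label = max(range(num_labels), key=lambda k: scores[k])
    path = [label]
    for prev_best in reversed(back):
        label = prev_best[label]
        path.append(label)
    return path[::-1]

emissions = [[2.0, 0.5], [0.2, 1.5], [1.0, 0.8]]  # 3 steps, 2 labels
transitions = [[0.5, 0.0], [0.0, -1.0]]           # penalize label 1 -> 1
labels = viterbi(emissions, transitions)
```

The transition scores are what distinguish a CRF from labelling each step independently: they let the decoder reject label sequences that are individually plausible but jointly unlikely.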
Through the above steps, a rule engine is combined with an existing deep model, which can make the output intent recognition and sequence labelling more accurate.
On the basis of the rule-fusion-based semantic analysis method described above, the present invention further provides a rule-fusion-based semantic analysis device.
As shown in Figure 3, the device includes:
A data acquisition module 301, for obtaining text data.
A character and/or word vector generation module 302, for pre-training the text data on a data set to generate multiple character vectors and/or word vectors.
A rule vector generation module 303, for matching, by the system rule engine, a rule vector corresponding to each character vector and/or word vector.
A combination module 304, for combining each character vector and/or word vector with its corresponding rule vector to obtain a combined vector.
An intent recognition module 305, for feeding all of the combined vectors in sequence into a recurrent neural network to obtain intent data characterizing the text data. Specifically: the combined vectors are input in sequence into a recurrent neural network layer for encoding to obtain a first coding result; all of the matched rule vectors are combined and feature-encoded after a pooling operation to obtain a second coding result; the first coding result and the second coding result are fused to obtain a fused coding result; and the fused coding result is fed into a Softmax layer for intent recognition to obtain the intent data characterizing the text data.
In this embodiment, the data acquisition module 301 first obtains text data of a specific domain.
Through the character and/or word vector generation module 302, the obtained text data is passed through a segmenter such as jieba, or through a statistics-based machine learning algorithm, to obtain at least one character and/or word. Each character and/or word is then looked up in a data set used for pre-training vectors to obtain the corresponding character vector and/or word vector. The data set is preferably an encyclopedia data set.
Through the rule vector generation module 303, the rule engine is loaded and used to match the rule that applies to each character and/or word in the text. Every matched rule is represented as a one-hot code.
Further, a rule vector matrix is initialized, and the one-hot vector is multiplied by the initialized rule vector matrix to obtain the rule vector of the specified rule.
Through the combination module 304, the obtained character vectors and/or word vectors are combined with their corresponding rule vectors by concatenation to obtain combined vectors.
Through the intent recognition module 305, the obtained combined vectors are fed in sequence into the recurrent neural network. In this embodiment the recurrent neural network is a bidirectional recurrent neural network. The first coding result is the hidden-layer state vector at the last time step.
All of the matched rule vectors, such as r1, r2, ..., are combined by concatenation. For example, combining r1, r2 and r3 gives the vector R = [r1, r2, r3]. The concatenated vector R is passed through average pooling or max pooling to obtain the feature code R', which is the second coding result. The first coding result at the last time step is then fused with the second coding result. The fusion is also performed by concatenation, in the same manner as described above, and yields the fused coding result F. Finally, F is input into the Softmax layer for intent recognition to obtain the intent data characterizing the text data.
Further, the device also includes a sequence recognition module 306, which inputs the obtained first coding results in sequence into a conditional random field (CRF) to obtain sequence labels corresponding to the character vectors and/or word vectors.
Through the sequence recognition module 306, the first coding result obtained at each time step is input into the conditional random field (CRF) to obtain a sequence label for each character vector and/or word vector.
Through the above modules, a rule engine is combined with an existing deep model, which can make the output intent recognition and sequence labelling more accurate.
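One possible way to wire the modules together in software is sketched below. The class name SemanticAnalysisDevice, the callable signatures, and the stub behaviours are hypothetical and serve only to show the data flow between modules 301 to 305. The sequence recognition module 306 is omitted for brevity.

```python
from dataclasses import dataclass
from typing import Callable, List

Vector = List[float]

@dataclass
class SemanticAnalysisDevice:
    acquire: Callable[[], str]                                      # module 301
    vectorize: Callable[[str], List[Vector]]                        # module 302
    rule_vectors: Callable[[str], List[Vector]]                     # module 303
    recognize_intent: Callable[[List[Vector], List[Vector]], int]   # module 305

    def run(self) -> int:
        text = self.acquire()
        token_vecs = self.vectorize(text)
        rule_vecs = self.rule_vectors(text)
        # Module 304: concatenate each token vector with its rule vector.
        combined = [w + r for w, r in zip(token_vecs, rule_vecs)]
        return self.recognize_intent(combined, rule_vecs)

# Stub modules that only demonstrate the data flow.
device = SemanticAnalysisDevice(
    acquire=lambda: "ab",
    vectorize=lambda text: [[float(ord(c))] for c in text],
    rule_vectors=lambda text: [[1.0] for _ in text],
    recognize_intent=lambda combined, rules: len(combined[0]),
)
result = device.run()
```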
On the basis of the rule-fusion-based semantic analysis method and device described above, the present invention further provides a computer-readable storage medium. The storage medium includes a set of computer-executable instructions for performing any one of the rule-fusion-based semantic analysis methods described above.
In the rule-fusion-based semantic analysis method, device and readable storage medium of the embodiments of the present invention, the text data is first pre-trained on a data set to obtain multiple character vectors and/or word vectors; the rule engine then matches the rule vector corresponding to each character vector and/or word vector; each character vector and/or word vector is then combined with its corresponding rule vector to form a combined vector; and the combined vectors are used as the input of a recurrent neural network to obtain intent data characterizing the text data. Compared with the prior art, combining a rule engine with a deep model can make the output data more accurate.
In this specification, a description that refers to the terms "one embodiment", "some embodiments", "example", "specific example" or "some examples" means that a specific feature, structure, material or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. Moreover, the specific features, structures, materials or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, provided they do not conflict with each other, those skilled in the art may combine the different embodiments or examples described in this specification and the features of those embodiments or examples.
In addition, the terms "first" and "second" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance or implicitly indicating the number of technical features referred to. A feature defined as "first" or "second" may therefore explicitly or implicitly include at least one such feature. In the description of the present invention, "plurality" means two or more, unless otherwise specifically defined.
The above is merely a specific embodiment of the present invention, but the protection scope of the present invention is not limited to it. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. The protection scope of the present invention shall therefore be determined by the protection scope of the claims.