CN110532566A

CN110532566A - A method for calculating question similarity in a vertical domain

Info

Publication number: CN110532566A
Application number: CN201910825709.8A
Authority: CN
Inventors: 彭云龙; 翟超
Original assignee: Shandong Inspur Genersoft Information Technology Co Ltd
Current assignee: Inspur General Software Co Ltd
Priority date: 2019-09-03
Filing date: 2019-09-03
Publication date: 2019-12-03
Anticipated expiration: 2039-09-03
Also published as: CN110532566B

Abstract

The present invention discloses a method for calculating question similarity in a vertical domain, and relates to the fields of natural language processing and information retrieval. A word vector model of vertical-domain questions is obtained by word2vec training. At the same time, dependency parsing is performed on the vertical-domain questions to identify the subject, predicate and object, and a question structure similarity model is established from the similarities of the subject, predicate and object. Question similarity is then analyzed by combining the word vector model with the question structure similarity model. By combining word2vec with dependency syntax, the method improves the accuracy of question similarity calculation. By parsing the syntactic structure of the sentence, it strengthens the semantic understanding of the question, supports deeper semantic similarity calculation, and improves the accuracy of question matching.

Description

A method for calculating question similarity in a vertical domain

Technical field

The present invention discloses a method for calculating question similarity in a vertical domain, and relates to the field of natural language processing and retrieval.

Background art

The rapid development of the internet and big data has placed higher demands on the accuracy of information retrieval. Traditional string matching can no longer meet users' needs, and using technologies such as natural language processing to mine the semantic information of sentences and achieve semantic matching between sentences is increasingly important. Similarity calculation measures how similar two sentences are. However, computing sentence similarity using only lexical-level features such as TF-IDF depends heavily on the training text, and lexical-level features cannot represent the semantics of a sentence, so the resulting similarity is not accurate enough. Although simple deep learning methods can capture semantic information between words, for questions the structural information of the sentence, such as the subject and object, is more important than the modifiers, so such methods still cannot determine the similarity between sentences accurately.

The present invention provides a method for calculating question similarity in a vertical domain that combines word2vec with dependency syntax to improve the accuracy of question similarity calculation. By applying dependency parsing to the question and parsing the syntactic structure of the sentence, it strengthens the semantic understanding of the question, performs deeper semantic similarity calculation, and improves the accuracy of question matching.

Summary of the invention

In view of the problems of the prior art, the present invention provides a method for calculating question similarity in a vertical domain. It combines deep learning with syntactic analysis and, according to the characteristics of domain questions, makes full use of syntactic structure information and lexical semantic information to improve the accuracy of question matching.

The specific scheme proposed by the present invention is:

A method for calculating question similarity in a vertical domain: a word vector model of vertical-domain questions is obtained by word2vec training; at the same time, dependency parsing is performed on the vertical-domain questions to identify the subject, predicate and object; a question structure similarity model is established using the word vectors of the subject, predicate and object; and question similarity is analyzed by combining the word vector model with the question structure similarity model.

In the method, existing vertical-domain words and user questions in the vertical domain are collected, the user questions form a question set, and the question set is trained with word2vec to obtain the word vector model.

In the method, existing vertical-domain words and user questions in the vertical domain are collected, the user questions form a question set, dependency parsing is performed on the question set to identify the subject, predicate and object, and the question structure similarity model is established using the word vectors of the subject, predicate and object.

In the method, the user questions in the vertical domain are preprocessed by word segmentation and stop-word removal, and the preprocessed vertical-domain question data is then used to establish the word vector model and the question structure similarity model.
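The preprocessing step can be sketched as follows. This is a minimal illustration rather than the patented implementation: whitespace splitting stands in for a real segmenter such as NLPIR, Jieba or LTP, and the stop-word list and the function names `remove_stop_words` and `preprocess` are hypothetical.

```python
# Hypothetical stop-word list; a real system would load a domain-specific list.
STOP_WORDS = {"the", "a", "an", "is", "of", "to", "please"}


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop stop words from an already segmented question, preserving order."""
    return [t for t in tokens if t.lower() not in stop_words]


def preprocess(question, segment=str.split):
    """Segment a question and remove stop words.

    `segment` is a placeholder for a real segmenter; whitespace splitting is
    used here only so that the sketch runs without external tools.
    """
    return remove_stop_words(segment(question))


if __name__ == "__main__":
    print(preprocess("please play the latest movie of Jackie Chan"))
```

Passing a different callable as `segment` would let the same pipeline run on Chinese text once a segmenter is available.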

In the method, the word vector model calculates question similarity using the following formula:

S₁ = (w₁, w₂, …, wₙ) indicates that sentence 1 has n words, where wᵢ is the word vector of the i-th word; avg(S₁) = (1/n)·(w₁ + w₂ + … + wₙ) is the average word vector of sentence 1; S₂ = (w₁, w₂, …, wₘ) indicates that sentence 2 has m words, and avg(S₂) = (1/m)·(w₁ + w₂ + … + wₘ) is the average word vector of sentence 2.
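The averaging described above can be sketched in a few lines. The text does not state which measure compares the two average vectors, so cosine similarity is used here purely as an assumption; the function names and the toy two-dimensional vectors are also invented for illustration.

```python
import math


def average_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length word vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def word_vector_similarity(sentence1_vectors, sentence2_vectors):
    """Similarity of the two average word vectors (cosine is an assumption)."""
    return cosine(average_vector(sentence1_vectors),
                  average_vector(sentence2_vectors))


if __name__ == "__main__":
    s1 = [[1.0, 0.0], [0.0, 1.0]]                # sentence 1, n = 2 words
    s2 = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]    # sentence 2, m = 3 words
    print(word_vector_similarity(s1, s2))
```

Because each sentence is reduced to a single average vector, sentences of different lengths n and m can be compared directly.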

In the method, the word vector model and the question structure similarity model are combined to calculate question similarity using the following formula:

sim(S₁,S₂)=μ₁β₁+μ₂β₂

μ₁ and μ₂ are adjustment parameters, tuned according to the vertical-domain questions of the application, where:

β₂=α₁·sim₁(w₁,w₂)+α₂·sim₂(w₁,w₂)+α₃·sim₃(w₁,w₂)

In the similarity β₂, sim₁(w₁,w₂), sim₂(w₁,w₂) and sim₃(w₁,w₂) are the subject similarity, predicate similarity and object similarity respectively, and α₁, α₂ and α₃ are the coefficients of the subject similarity, predicate similarity and object similarity respectively.
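The two formulas above translate directly into code. The sketch below assumes that β₁ is the similarity produced by the word vector model; the function names and all numeric values are illustrative and do not come from the patent.

```python
def structure_similarity(subject_sim, predicate_sim, object_sim, alphas):
    """beta_2 = alpha_1*sim_1 + alpha_2*sim_2 + alpha_3*sim_3."""
    a1, a2, a3 = alphas
    return a1 * subject_sim + a2 * predicate_sim + a3 * object_sim


def combined_similarity(beta1, beta2, mu1, mu2):
    """sim(S1, S2) = mu_1*beta_1 + mu_2*beta_2."""
    return mu1 * beta1 + mu2 * beta2


if __name__ == "__main__":
    # Illustrative values only; real coefficients are tuned per domain.
    beta1 = 0.80
    beta2 = structure_similarity(0.9, 0.5, 0.7, alphas=(0.4, 0.2, 0.4))
    print(combined_similarity(beta1, beta2, mu1=0.6, mu2=0.4))
```

Keeping β₂ and the final score in separate functions makes it possible to tune the α coefficients and the μ parameters independently.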

In the method, the user questions in the vertical domain are preprocessed using any one of the tools NLPIR, Jieba and LTP.

A device for calculating question similarity in a vertical domain comprises a word vector model establishing unit, a question structure similarity model establishing unit and an analysis unit.

The word vector model establishing unit obtains the word vector model of vertical-domain questions by word2vec training. At the same time, the question structure similarity model establishing unit performs dependency parsing on the vertical-domain questions, identifies the subject, predicate and object, and establishes the question structure similarity model using the word vectors of the subject, predicate and object. The analysis unit analyzes question similarity by combining the word vector model with the question structure similarity model.

The beneficial effects of the present invention are:

The present invention provides a method for calculating question similarity in a vertical domain: a word vector model of vertical-domain questions is obtained by word2vec training; at the same time, dependency parsing is performed on the vertical-domain questions to identify the subject, predicate and object; a question structure similarity model is established using the word vectors of the subject, predicate and object; and question similarity is analyzed by combining the word vector model with the question structure similarity model.

The method of the present invention considers both the word vector similarity and the similarity of the sentence structure, that is, of the subject, predicate and object. This is more reasonable than considering only the core words, guards against cases in which the subject, predicate or object is not identified or is identified incorrectly, strengthens the semantic understanding of the question, and improves the accuracy of question similarity calculation.

Brief description of the drawings

Fig. 1 is a schematic flow diagram of obtaining the word vector model by word2vec training in the method of the present invention.

Fig. 2 is a schematic flow diagram of question similarity calculation using the method of the present invention.

Detailed description of the embodiments

The present invention provides a method for calculating question similarity in a vertical domain: a word vector model of vertical-domain questions is obtained by word2vec training; at the same time, dependency parsing is performed on the vertical-domain questions to identify the subject, predicate and object; a question structure similarity model is established using the word vectors of the subject, predicate and object; and question similarity is analyzed by combining the word vector model with the question structure similarity model.

A device for calculating question similarity in a vertical domain, corresponding to the above method, is also provided. It comprises a word vector model establishing unit, a question structure similarity model establishing unit and an analysis unit.

The present invention is further described below with reference to the accompanying drawings and specific embodiments, so that those skilled in the art can better understand and practice it; the embodiments given do not limit the invention.

To compare the similarity of information retrieval questions in a vertical domain, for example in movie navigation or music navigation, the method of the present invention is used to establish a word vector model and a question structure similarity model.

The word vector model is obtained by training word2vec on questions in the vertical domain. The collected existing words and user questions of the corresponding vertical domain are normalized and then preprocessed, where preprocessing includes word segmentation, stop-word removal and similar processing using the NLPIR tool of the Chinese Academy of Sciences.

For the preprocessed data, the model parameters are determined and the data is trained to obtain the word vector model and the question library of the corresponding vertical domain.

The question structure similarity model is established as follows. The collected existing words and user questions of the corresponding vertical domain are preprocessed, where preprocessing includes word segmentation, stop-word removal and similar processing using the NLPIR tool of the Chinese Academy of Sciences or another preprocessing tool such as Jieba or LTP. Dependency parsing is then performed on the processed user question data, for which the Stanford University parser can be used, to identify the subject, predicate and object. The question structure similarity model and the question library of the corresponding vertical domain are established using the word vectors of the subject, predicate and object.

According to the actual vertical domain, suitable adjustment parameters are selected to determine the weights assigned to the word vector model and the question structure similarity model. The preprocessed question that the user wants to query is then analyzed by combining the two models, and the similarity between the user's input question and the questions in the question library of the corresponding vertical domain is calculated.

In the above process, when the word vector model is obtained by training word2vec on questions in the vertical domain, the word vector model can calculate similarity using the following formula:

S₁ = (w₁, w₂, …, wₙ) indicates that sentence 1 has n words, where wᵢ is the word vector of the i-th word; avg(S₁) = (1/n)·(w₁ + w₂ + … + wₙ) is the average word vector of sentence 1; S₂ = (w₁, w₂, …, wₘ) indicates that sentence 2 has m words, and avg(S₂) = (1/m)·(w₁ + w₂ + … + wₘ) is the average word vector of sentence 2. The word vector mean of each question in the user questions is calculated separately.

Dependency parsing is performed on each of the user questions using LTP to identify the subject, predicate and object, and the word vectors of the subject, predicate and object of each question are obtained.
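Reading the subject, predicate and object from a dependency parse can be sketched as below. The sketch assumes that the parser output has already been converted to a list of (word, head, relation) tuples with LTP-style relation labels (`HED` for the root predicate, `SBV` for the subject, `VOB` for the object); the tuple layout, the function name `extract_spo` and the hand-written example parse are assumptions, not the output of any particular tool.

```python
def extract_spo(arcs):
    """Return (subject, predicate, object) from a dependency parse.

    `arcs` is a list of (word, head, relation) tuples, where `head` is the
    1-based index of the governing word and 0 marks the root. Roles that are
    not found are returned as None.
    """
    predicate_idx = next(
        (i for i, (_, _, rel) in enumerate(arcs, start=1) if rel == "HED"),
        None,
    )
    if predicate_idx is None:
        return None, None, None

    subject = obj = None
    for word, head, rel in arcs:
        if head != predicate_idx:
            continue  # only direct dependents of the root predicate
        if rel == "SBV" and subject is None:
            subject = word
        elif rel == "VOB" and obj is None:
            obj = word
    return subject, arcs[predicate_idx - 1][0], obj


if __name__ == "__main__":
    # Hand-written parse of "Zhang_Yimou directed this movie".
    parse = [
        ("Zhang_Yimou", 2, "SBV"),
        ("directed", 0, "HED"),
        ("this", 4, "ATT"),
        ("movie", 2, "VOB"),
    ]
    print(extract_spo(parse))
```

Returning None for a missing role lets the caller decide how to score a question whose subject or object could not be identified.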

The word vector model and the question structure similarity model can be combined to calculate similarity using the following formula:

sim(S₁,S₂)=μ₁β₁+μ₂β₂

S₁ and S₂ denote question 1 and question 2. μ₁ and μ₂ are adjustment parameters, that is, the weight ratio between the word vector model and the question structure similarity model. They are tuned according to the vertical domain of the application and determined empirically from the vertical-domain corpus.

β₂=α₁·sim₁(w₁,w₂)+α₂·sim₂(w₁,w₂)+α₃·sim₃(w₁,w₂)

In β₂, w₁ and w₂ refer to the subject, predicate or object of the two questions. In the similarity β₂, sim₁(w₁,w₂), sim₂(w₁,w₂) and sim₃(w₁,w₂) are the subject similarity, predicate similarity and object similarity respectively, and α₁, α₂ and α₃ are the coefficients of the subject similarity, predicate similarity and object similarity respectively. α₁, α₂ and α₃ are determined empirically from the vertical-domain corpus.
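One possible way to determine the adjustment parameters empirically is a small grid search over labelled question pairs. The text does not prescribe a procedure, so everything in this sketch is an assumption: the constraint μ₂ = 1 − μ₁, the decision threshold of 0.5, the function names and the four toy pairs.

```python
def accuracy(pairs, mu1, mu2, threshold=0.5):
    """Fraction of pairs whose thresholded combined score matches the label."""
    correct = 0
    for beta1, beta2, is_match in pairs:
        predicted = (mu1 * beta1 + mu2 * beta2) >= threshold
        correct += predicted == is_match
    return correct / len(pairs)


def tune_mu(pairs, steps=10):
    """Try mu1 = 0, 1/steps, ..., 1 with mu2 = 1 - mu1.

    Returns (mu1, mu2, accuracy) for the first setting with the best accuracy.
    """
    best = None
    for k in range(steps + 1):
        mu1 = k / steps
        mu2 = 1.0 - mu1
        acc = accuracy(pairs, mu1, mu2)
        if best is None or acc > best[2]:
            best = (mu1, mu2, acc)
    return best


if __name__ == "__main__":
    # Toy data: (beta1, beta2, questions_actually_match).
    labelled = [
        (0.9, 0.2, False),
        (0.4, 0.9, True),
        (0.8, 0.8, True),
        (0.3, 0.1, False),
    ]
    print(tune_mu(labelled))
```

The same loop could be extended to the coefficients α₁, α₂ and α₃ by searching over triples instead of a single value.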

After the user inputs the question to be queried, the question is preprocessed, the word vector mean of the preprocessed question is calculated, and its similarity to the questions in the question library is calculated using the model formulas from the above process.

The embodiments described above are only preferred embodiments given to fully illustrate the present invention, and the protection scope of the present invention is not limited to them. Equivalent substitutions or transformations made by those skilled in the art on the basis of the present invention fall within the protection scope of the present invention. The protection scope of the present invention is defined by the claims.
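The query step amounts to scoring the input question against every entry in the question library and keeping the best matches. In the sketch below the similarity function is passed in as a parameter; `token_overlap` is only a stand-in so that the example runs on its own, and it is not the formula of the invention. The function names and the sample library are hypothetical.

```python
def rank_library(query, library, similarity, top_k=3):
    """Return the top_k (score, question) pairs, highest score first."""
    scored = [(similarity(query, candidate), candidate) for candidate in library]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


def token_overlap(q1, q2):
    """Stand-in similarity: Jaccard overlap of whitespace tokens."""
    a, b = set(q1.split()), set(q2.split())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


if __name__ == "__main__":
    library = [
        "who directed the movie",
        "play some music",
        "who sings this song",
    ]
    for score, question in rank_library("who directed this movie", library, token_overlap):
        print(round(score, 3), question)
```

In a full system, `similarity` would be replaced by a function that computes μ₁β₁ + μ₂β₂ for the preprocessed query and each library question.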

Claims

1. A method for calculating question similarity in a vertical domain, characterized in that a word vector model of vertical-domain questions is obtained by word2vec training; at the same time, dependency parsing is performed on the vertical-domain questions to identify the subject, predicate and object; a question structure similarity model is established using the word vectors of the subject, predicate and object; and question similarity is analyzed by combining the word vector model with the question structure similarity model.

2. The method according to claim 1, characterized in that existing vertical-domain words and user questions in the vertical domain are collected, the user questions form a question set, and the question set is trained with word2vec to obtain the word vector model.

3. The method according to claim 1 or 2, characterized in that existing vertical-domain words and user questions in the vertical domain are collected, the user questions form a question set, dependency parsing is performed on the question set to identify the subject, predicate and object, and the question structure similarity model is established using the word vectors of the subject, predicate and object.

4. The method according to claim 3, characterized in that the user questions in the vertical domain are preprocessed by word segmentation and stop-word removal, and the preprocessed vertical-domain question data is then used to establish the word vector model and the question structure similarity model.

5. The method according to claim 1 or 4, characterized in that the word vector model calculates question similarity using the following formula:

S₁ = (w₁, w₂, …, wₙ) indicates that sentence 1 has n words, where wᵢ is the word vector of the i-th word; avg(S₁) = (1/n)·(w₁ + w₂ + … + wₙ) is the average word vector of sentence 1; S₂ = (w₁, w₂, …, wₘ) indicates that sentence 2 has m words, and avg(S₂) = (1/m)·(w₁ + w₂ + … + wₘ) is the average word vector of sentence 2.

6. The method according to claim 5, characterized in that the word vector model and the question structure similarity model are combined to calculate question similarity using the following formula:

sim(S₁,S₂)=μ₁β₁+μ₂β₂

μ₁ and μ₂ are adjustment parameters, tuned according to the vertical domain of the application, where:

β₂=α₁·sim₁(w₁,w₂)+α₂·sim₂(w₁,w₂)+α₃·sim₃(w₁,w₂)

In the similarity β₂, sim₁(w₁,w₂), sim₂(w₁,w₂) and sim₃(w₁,w₂) are the subject similarity, predicate similarity and object similarity respectively, and α₁, α₂ and α₃ are the coefficients of the subject similarity, predicate similarity and object similarity respectively.

7. The method according to claim 6, characterized in that the user questions in the vertical domain are preprocessed using any one of the tools NLPIR, Jieba and LTP.

8. A device for calculating question similarity in a vertical domain, characterized by comprising a word vector model establishing unit, a question structure similarity model establishing unit and an analysis unit, wherein

the word vector model establishing unit obtains the word vector model of vertical-domain questions by word2vec training; at the same time, the question structure similarity model establishing unit performs dependency parsing on the vertical-domain questions, identifies the subject, predicate and object, and establishes the question structure similarity model using the word vectors of the subject, predicate and object; and the analysis unit analyzes question similarity by combining the word vector model with the question structure similarity model.