CN110532566A - Method for realizing similarity calculation of questions in vertical field - Google Patents
- Publication number
- CN110532566A (application number CN201910825709.8A)
- Authority
- CN
- China
- Prior art keywords
- question sentence
- vertical field
- model
- sentence
- term vector
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention discloses a method for calculating question similarity in a vertical domain, and relates to the fields of natural language processing and information retrieval. A word vector model of vertical domain questions is obtained by training with word2vec; at the same time, dependency parsing is performed on the vertical domain questions to identify the subject, predicate and object, and a question structure similarity model is established from the similarities of the subject, predicate and object. Question similarity is then analyzed by combining the word vector model with the question structure similarity model. The method combines word2vec with dependency syntax: the dependency parse of a question supplies its syntactic structure information, which strengthens semantic understanding of the question, enables deeper semantic similarity calculation, and improves the accuracy of question similarity calculation and question matching.
Description
Technical field
The present invention discloses a method for calculating question similarity in a vertical domain, and relates to the fields of natural language processing and information retrieval.
Background art
With the rapid development of the internet and big data, higher demands are placed on the accuracy of information retrieval. Traditional string matching can no longer meet these demands, and using technologies such as natural language processing to mine the semantic information of sentences and achieve semantic matching between sentences is increasingly important. Similarity calculation measures the degree of similarity between two sentences. However, sentence similarity computed only from lexical surface features such as TF-IDF depends heavily on the training text, and lexical surface features cannot represent the semantics of a sentence, so the resulting similarity is not accurate enough. Simple deep learning methods can capture semantic information between words, but for a question, structural information such as the subject and object matters more than modifiers, so these methods still cannot determine the similarity between sentences accurately.
The present invention provides a method for calculating question similarity in a vertical domain that combines word2vec with dependency syntax to improve the accuracy of question similarity calculation. Dependency parsing of the question yields the syntactic structure information of the sentence, which strengthens semantic understanding of the question, supports deeper semantic similarity calculation, and improves the accuracy of question matching.
Summary of the invention
In view of the problems of the prior art, the present invention provides a method for calculating question similarity in a vertical domain that combines deep learning with syntactic analysis and, according to the characteristics of domain questions, makes full use of syntactic structure information and lexical semantic information to improve the accuracy of question matching.
The specific scheme proposed by the present invention is as follows.
A method for calculating question similarity in a vertical domain: a word vector model of vertical domain questions is obtained by training with word2vec; at the same time, dependency parsing is performed on the vertical domain questions to identify the subject, predicate and object; a question structure similarity model is established from the word vectors of the subject, predicate and object; and question similarity is analyzed by combining the word vector model with the question structure similarity model.
In the implementation method, existing vertical domain vocabulary and user questions in the vertical domain are collected, the user questions form a question set, and the question set is trained with word2vec to obtain the word vector model.
In the implementation method, existing vertical domain vocabulary and user questions in the vertical domain are collected, the user questions form a question set, dependency parsing is performed on the question set to identify the subject, predicate and object, and the question structure similarity model is established from the word vectors of the subject, predicate and object.
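The identification of subject, predicate and object from a dependency parse can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the parser output has already been converted to a list of arcs using LTP-style relation labels (`HED` for the root predicate, `SBV` for subject, `VOB` for object), and the `Arc` type and `extract_spo` function are hypothetical names introduced here.

```python
from typing import List, NamedTuple, Optional, Tuple


class Arc(NamedTuple):
    """One token of a dependency parse (hypothetical container type)."""
    word: str      # token text
    head: int      # 1-based index of the head token, 0 for the root
    relation: str  # dependency label, assumed LTP-style ("HED", "SBV", "VOB", ...)


def extract_spo(arcs: List[Arc]) -> Tuple[Optional[str], Optional[str], Optional[str]]:
    """Return (subject, predicate, object); a slot is None when it is not found."""
    # The predicate is taken to be the root of the parse.
    predicate_idx = next(
        (i for i, arc in enumerate(arcs, start=1) if arc.relation == "HED"), None
    )
    if predicate_idx is None:
        return None, None, None

    subject = obj = None
    for arc in arcs:
        if arc.head != predicate_idx:
            continue  # only direct dependents of the predicate are considered
        if arc.relation == "SBV" and subject is None:
            subject = arc.word
        elif arc.relation == "VOB" and obj is None:
            obj = arc.word
    return subject, arcs[predicate_idx - 1].word, obj


# Hand-written parse of "周杰伦 唱 了 哪些 歌" (illustrative, not real parser output).
arcs = [
    Arc("周杰伦", 2, "SBV"),
    Arc("唱", 0, "HED"),
    Arc("了", 2, "RAD"),
    Arc("哪些", 5, "ATT"),
    Arc("歌", 2, "VOB"),
]
spo = extract_spo(arcs)
```

Returning `None` for a missing slot keeps the failure visible, so that a later scoring step can decide how to treat questions whose subject or object was not identified.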
In the implementation method, the user questions in the vertical domain are preprocessed by word segmentation and stop-word removal, and the preprocessed vertical domain question data are then used to establish the word vector model and the question structure similarity model.
In the implementation method, the word vector model calculates question similarity from the average word vectors of the two questions:
$$\bar{v}_1 = \frac{1}{n}\sum_{i=1}^{n} w_i, \qquad \bar{v}_2 = \frac{1}{m}\sum_{j=1}^{m} w_j$$
where sentence 1 has $n$ words, $w_i$ is the word vector of the $i$-th word, $\bar{v}_1$ is the average word vector of sentence 1, sentence 2 has $m$ words, and $\bar{v}_2$ is the average word vector of sentence 2.
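The average-word-vector comparison can be sketched in a few lines. The text specifies averaging the word vectors of each sentence; the use of cosine similarity to compare the two averages is an assumption made here, as are the function names.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def mean_vector(vectors: List[Vector]) -> List[float]:
    """Component-wise average of the word vectors of one sentence."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[k] for v in vectors) / n for k in range(dim)]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def word_vector_similarity(s1_vectors: List[Vector], s2_vectors: List[Vector]) -> float:
    """Similarity of two sentences from their average word vectors (cosine is assumed)."""
    return cosine(mean_vector(s1_vectors), mean_vector(s2_vectors))


# Toy 2-dimensional vectors; real word2vec vectors typically have 100 or more dimensions.
same_direction = word_vector_similarity([[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0]])
orthogonal = word_vector_similarity([[1.0, 0.0]], [[0.0, 1.0]])
```

Averaging discards word order, which is one reason the method adds a separate structure-based score.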
In the implementation method, the word vector model and the question structure similarity model are combined to calculate question similarity using the following formula:
$$\mathrm{sim}(S_1, S_2) = \mu_1\beta_1 + \mu_2\beta_2$$
where $\mu_1$ and $\mu_2$ are adjustment parameters, tuned according to the vertical domain questions of the application, and:
$$\beta_2 = \alpha_1\cdot\mathrm{sim}_1(w_1, w_2) + \alpha_2\cdot\mathrm{sim}_2(w_1, w_2) + \alpha_3\cdot\mathrm{sim}_3(w_1, w_2)$$
In $\beta_2$, $\mathrm{sim}_1(w_1, w_2)$, $\mathrm{sim}_2(w_1, w_2)$ and $\mathrm{sim}_3(w_1, w_2)$ are respectively the subject similarity, predicate similarity and object similarity, and $\alpha_1$, $\alpha_2$ and $\alpha_3$ are respectively the coefficients of the subject similarity, predicate similarity and object similarity.
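The two weighted sums can be written directly as code. In this sketch β1 is read as the score of the word vector model, and the default values for the μ and α weights are placeholders chosen for illustration; the source leaves them to be tuned for each vertical domain.

```python
from typing import Tuple


def structure_similarity(
    sim_subject: float,
    sim_predicate: float,
    sim_object: float,
    alphas: Tuple[float, float, float] = (0.4, 0.3, 0.3),  # placeholder coefficients
) -> float:
    """beta2: weighted sum of subject, predicate and object similarities."""
    a1, a2, a3 = alphas
    return a1 * sim_subject + a2 * sim_predicate + a3 * sim_object


def question_similarity(
    beta1: float,
    beta2: float,
    mus: Tuple[float, float] = (0.5, 0.5),  # placeholder model weights
) -> float:
    """sim(S1, S2) = mu1 * beta1 + mu2 * beta2."""
    mu1, mu2 = mus
    return mu1 * beta1 + mu2 * beta2


beta2 = structure_similarity(1.0, 0.5, 0.0)
score = question_similarity(0.8, beta2)
```

Choosing weights that sum to 1 in each pair keeps the final score in the same range as its inputs, although the source does not state that constraint.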
In the implementation method, the user questions in the vertical domain are preprocessed using any of the tools NLPIR, Jieba or LTP.
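The stop-word removal half of preprocessing can be illustrated without committing to a particular segmenter. The sketch below assumes the question has already been split into tokens by one of the tools named above; the stop-word list and the `preprocess` function are hypothetical.

```python
from typing import Iterable, List, Set

# Hypothetical stop-word list; a real deployment would load a domain-specific file.
STOP_WORDS: Set[str] = {"的", "了", "吗", "呢", "请问"}


def preprocess(tokens: Iterable[str], stop_words: Set[str] = STOP_WORDS) -> List[str]:
    """Drop empty tokens and stop words from an already segmented question."""
    cleaned = (token.strip() for token in tokens)
    return [token for token in cleaned if token and token not in stop_words]


tokens = preprocess(["请问", "周杰伦", "的", "歌", " "])
```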
A device for calculating question similarity in a vertical domain comprises a word vector model establishing unit, a question structure similarity model establishing unit and an analysis unit.
The word vector model establishing unit obtains a word vector model of vertical domain questions by training with word2vec; at the same time, the question structure similarity model establishing unit performs dependency parsing on the vertical domain questions, identifies the subject, predicate and object, and establishes the question structure similarity model from the word vectors of the subject, predicate and object; the analysis unit combines the word vector model and the question structure similarity model to analyze question similarity.
The beneficial effects of the present invention are as follows.
The present invention provides a method for calculating question similarity in a vertical domain: a word vector model of vertical domain questions is obtained by training with word2vec; at the same time, dependency parsing is performed on the vertical domain questions to identify the subject, predicate and object; a question structure similarity model is established from the word vectors of the subject, predicate and object; and question similarity is analyzed by combining the word vector model with the question structure similarity model.
The method of the present invention considers the word vector similarity together with the sentence structure similarity of the subject, predicate and object. This is more reasonable than considering only the core words, and it mitigates cases in which the subject, predicate or object is not identified or is identified incorrectly. It strengthens semantic understanding of the question and improves the accuracy of question similarity calculation.
Brief description of the drawings
Fig. 1 is a flow diagram of obtaining the word vector model by word2vec training in the method of the present invention;
Fig. 2 is a flow diagram of question similarity calculation using the method of the present invention.
Detailed description of the embodiments
The present invention provides a method for calculating question similarity in a vertical domain: a word vector model of vertical domain questions is obtained by training with word2vec; at the same time, dependency parsing is performed on the vertical domain questions to identify the subject, predicate and object; a question structure similarity model is established from the word vectors of the subject, predicate and object; and question similarity is analyzed by combining the word vector model with the question structure similarity model.
A device for calculating question similarity in a vertical domain, corresponding to the above method, is also provided. It comprises a word vector model establishing unit, a question structure similarity model establishing unit and an analysis unit.
The word vector model establishing unit obtains a word vector model of vertical domain questions by training with word2vec; at the same time, the question structure similarity model establishing unit performs dependency parsing on the vertical domain questions, identifies the subject, predicate and object, and establishes the question structure similarity model from the word vectors of the subject, predicate and object; the analysis unit combines the word vector model and the question structure similarity model to analyze question similarity.
The present invention is further explained below with reference to the drawings and specific examples, so that those skilled in the art can better understand and practice it; the illustrated embodiments do not limit the present invention.
To compare the similarity of information retrieval questions in a vertical domain, for example movie navigation or music navigation, the method of the present invention is used to establish a word vector model and a question structure similarity model.
The word vector model is obtained by training on the vertical domain questions with word2vec: the collected existing vocabulary and user questions of the corresponding vertical domain are normalized and then preprocessed, where preprocessing includes word segmentation, stop-word removal and similar processing using the NLPIR tool of the Chinese Academy of Sciences. For the preprocessed data, the model parameters are determined and the data are used for training, yielding the word vector model and the question library of the corresponding vertical domain.
The question structure similarity model is established as follows: the collected existing vocabulary and user question data of the corresponding vertical domain are preprocessed, where preprocessing includes word segmentation, stop-word removal and similar processing using the NLPIR tool of the Chinese Academy of Sciences or another preprocessing tool such as Jieba or LTP. Dependency parsing, for which the Stanford University parser can be used, is performed on the processed user question data to identify the subject, predicate and object, and the question structure similarity model and the question library of the corresponding vertical domain are established from the word vectors of the subject, predicate and object.
According to the actual vertical domain, suitable adjustment parameters are selected to determine the weights assigned to the word vector model and the question structure similarity model. The two models are then combined to analyze the preprocessed question that the user wants to query, and the similarity between the user's input question and the questions in the question library of the corresponding vertical domain is calculated.
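Matching a user question against the question library amounts to scoring every library entry and keeping the best ones. The sketch below shows only that ranking step; the `rank_library` helper is hypothetical, and a simple token-overlap (Jaccard) score stands in for the combined similarity so that the example stays self-contained.

```python
from typing import Callable, List, Tuple


def rank_library(
    query: str,
    library: List[str],
    similarity: Callable[[str, str], float],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Score each library question against the query and return the top_k matches."""
    scored = [(question, similarity(query, question)) for question in library]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def jaccard(a: str, b: str) -> float:
    """Stand-in similarity: overlap of whitespace-separated tokens."""
    tokens_a, tokens_b = set(a.split()), set(b.split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0


library = [
    "play songs by Jay Chou",
    "navigate to the nearest cinema",
    "show movies by Ang Lee",
]
best = rank_library("play a song by Jay Chou", library, jaccard, top_k=1)
```

In a real system the `similarity` argument would be the weighted combination of the word vector score and the structure score, and the library vectors would normally be precomputed rather than recalculated per query.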
In the above process, when the word vector model is obtained by training on the vertical domain questions with word2vec, the word vector model can calculate similarity from the average word vectors of the two questions:
$$\bar{v}_1 = \frac{1}{n}\sum_{i=1}^{n} w_i, \qquad \bar{v}_2 = \frac{1}{m}\sum_{j=1}^{m} w_j$$
where sentence 1 has $n$ words, $w_i$ is the word vector of the $i$-th word, $\bar{v}_1$ is the average word vector of sentence 1, sentence 2 has $m$ words, and $\bar{v}_2$ is the average word vector of sentence 2. The mean word vector of every question in the user questions is calculated separately.
Dependency parsing is performed on every question in the user questions using LTP to identify the subject, predicate and object and to obtain the word vectors of the subject, predicate and object of every question.
The word vector model and the question structure similarity model can then be combined to calculate similarity using the following formula:
$$\mathrm{sim}(S_1, S_2) = \mu_1\beta_1 + \mu_2\beta_2$$
where $S_1$ and $S_2$ denote question 1 and question 2, and $\mu_1$ and $\mu_2$ are adjustment parameters, that is, the weights of the word vector model and the question structure similarity model. They are adjusted according to the vertical domain of the application and determined empirically from the vertical domain corpus.
$$\beta_2 = \alpha_1\cdot\mathrm{sim}_1(w_1, w_2) + \alpha_2\cdot\mathrm{sim}_2(w_1, w_2) + \alpha_3\cdot\mathrm{sim}_3(w_1, w_2)$$
In $\beta_2$, $w_1$ and $w_2$ refer to the subject, predicate or object in the two questions; $\mathrm{sim}_1(w_1, w_2)$, $\mathrm{sim}_2(w_1, w_2)$ and $\mathrm{sim}_3(w_1, w_2)$ are respectively the subject similarity, predicate similarity and object similarity; and $\alpha_1$, $\alpha_2$ and $\alpha_3$ are respectively the coefficients of the subject similarity, predicate similarity and object similarity, determined empirically from the vertical domain corpus.
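One way to obtain the three slot similarities from word vectors is sketched below. Several choices here are assumptions rather than statements from the source: cosine similarity is used to compare slot words, a slot that is missing or has no vector contributes 0, the α values are placeholders, and the toy vectors are invented for the example.

```python
import math
from typing import Dict, Optional, Sequence, Tuple

Vector = Sequence[float]
Triple = Tuple[Optional[str], Optional[str], Optional[str]]  # (subject, predicate, object)


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def slot_similarity(word1: Optional[str], word2: Optional[str],
                    vectors: Dict[str, Vector]) -> float:
    """Similarity of one slot; 0.0 when a word is missing or has no vector (assumption)."""
    if word1 is None or word2 is None:
        return 0.0
    v1, v2 = vectors.get(word1), vectors.get(word2)
    if v1 is None or v2 is None:
        return 0.0
    return cosine(v1, v2)


def beta2(spo1: Triple, spo2: Triple, vectors: Dict[str, Vector],
          alphas: Tuple[float, float, float] = (0.4, 0.3, 0.3)) -> float:
    """Weighted sum of subject, predicate and object similarities."""
    sims = [slot_similarity(a, b, vectors) for a, b in zip(spo1, spo2)]
    return sum(alpha * sim for alpha, sim in zip(alphas, sims))


# Toy 2-dimensional vectors, invented for illustration.
vectors = {
    "周杰伦": [1.0, 0.0], "林俊杰": [0.6, 0.8],
    "唱": [0.0, 1.0], "歌": [1.0, 1.0], "歌曲": [1.0, 1.0],
}
q1 = ("周杰伦", "唱", "歌")
q2 = ("林俊杰", "唱", "歌曲")
structure_score = beta2(q1, q2, vectors)
```

With this convention a question whose subject could not be parsed still receives credit for a matching predicate and object, rather than being rejected outright.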
After the user inputs the question to be queried and the data are preprocessed, the mean word vector of the preprocessed question is calculated, and the similarity to the questions in the question library is calculated using the model formulas described above.
The embodiments described above are merely preferred embodiments given to fully illustrate the present invention, and the protection scope of the present invention is not limited thereto. Equivalent substitutions or modifications made by those skilled in the art on the basis of the present invention fall within the protection scope of the present invention. The protection scope of the present invention is defined by the claims.
Claims (8)
1. A method for calculating question similarity in a vertical domain, characterized in that a word vector model of vertical domain questions is obtained by training with word2vec; at the same time, dependency parsing is performed on the vertical domain questions to identify the subject, predicate and object; a question structure similarity model is established from the word vectors of the subject, predicate and object; and question similarity is analyzed by combining the word vector model with the question structure similarity model.
2. The implementation method according to claim 1, characterized in that existing vertical domain vocabulary and user questions in the vertical domain are collected, the user questions form a question set, and the question set is trained with word2vec to obtain the word vector model.
3. The implementation method according to claim 1 or 2, characterized in that existing vertical domain vocabulary and user questions in the vertical domain are collected, the user questions form a question set, dependency parsing is performed on the question set to identify the subject, predicate and object, and the question structure similarity model is established from the word vectors of the subject, predicate and object.
4. The implementation method according to claim 3, characterized in that the user questions in the vertical domain are preprocessed by word segmentation and stop-word removal, and the preprocessed vertical domain question data are then used to establish the word vector model and the question structure similarity model.
5. The implementation method according to claim 1 or 4, characterized in that the word vector model calculates question similarity from the average word vectors of the two questions:
$$\bar{v}_1 = \frac{1}{n}\sum_{i=1}^{n} w_i, \qquad \bar{v}_2 = \frac{1}{m}\sum_{j=1}^{m} w_j$$
where sentence 1 has $n$ words, $w_i$ is the word vector of the $i$-th word, $\bar{v}_1$ is the average word vector of sentence 1, sentence 2 has $m$ words, and $\bar{v}_2$ is the average word vector of sentence 2.
6. The implementation method according to claim 5, characterized in that the word vector model and the question structure similarity model are combined to calculate question similarity using the following formula:
$$\mathrm{sim}(S_1, S_2) = \mu_1\beta_1 + \mu_2\beta_2$$
where $\mu_1$ and $\mu_2$ are adjustment parameters, tuned according to the vertical domain of the application, and:
$$\beta_2 = \alpha_1\cdot\mathrm{sim}_1(w_1, w_2) + \alpha_2\cdot\mathrm{sim}_2(w_1, w_2) + \alpha_3\cdot\mathrm{sim}_3(w_1, w_2)$$
In $\beta_2$, $\mathrm{sim}_1(w_1, w_2)$, $\mathrm{sim}_2(w_1, w_2)$ and $\mathrm{sim}_3(w_1, w_2)$ are respectively the subject similarity, predicate similarity and object similarity, and $\alpha_1$, $\alpha_2$ and $\alpha_3$ are respectively the coefficients of the subject similarity, predicate similarity and object similarity.
7. The implementation method according to claim 6, characterized in that the user questions in the vertical domain are preprocessed using any of the tools NLPIR, Jieba or LTP.
8. A device for calculating question similarity in a vertical domain, characterized by comprising a word vector model establishing unit, a question structure similarity model establishing unit and an analysis unit, wherein
the word vector model establishing unit obtains a word vector model of vertical domain questions by training with word2vec; at the same time, the question structure similarity model establishing unit performs dependency parsing on the vertical domain questions, identifies the subject, predicate and object, and establishes the question structure similarity model from the word vectors of the subject, predicate and object; and the analysis unit combines the word vector model and the question structure similarity model to analyze question similarity.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910825709.8A CN110532566B (en) | 2019-09-03 | 2019-09-03 | Method for realizing similarity calculation of questions in vertical field |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910825709.8A CN110532566B (en) | 2019-09-03 | 2019-09-03 | Method for realizing similarity calculation of questions in vertical field |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110532566A (en) | 2019-12-03 |
CN110532566B (en) | 2023-05-02 |
Family
ID=68666416
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910825709.8A Active CN110532566B (en) | 2019-09-03 | 2019-09-03 | Method for realizing similarity calculation of questions in vertical field |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110532566B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112560481A (en) * | 2020-12-25 | 2021-03-26 | 北京百度网讯科技有限公司 | Statement processing method, device and storage medium |
CN112699663A (en) * | 2021-01-07 | 2021-04-23 | 中通天鸿(北京)通信科技股份有限公司 | Semantic understanding system based on combination of multiple algorithms |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2004287679A (en) * | 2003-03-20 | 2004-10-14 | Fuji Xerox Co Ltd | Natural language processing system and natural language processing method and computer program |
CN105843897A (en) * | 2016-03-23 | 2016-08-10 | 青岛海尔软件有限公司 | Vertical domain-oriented intelligent question and answer system |
CN109062892A (en) * | 2018-07-10 | 2018-12-21 | 东北大学 | Chinese sentence similarity calculation method based on Word2Vec |
CN109271626A (en) * | 2018-08-31 | 2019-01-25 | 北京工业大学 | Text semantic analysis method |
Non-Patent Citations (1)
Title |
---|
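Cao Lili et al.: "Research on a multi-feature question similarity calculation method fusing word vectors", Modern Computer (Professional Edition) * |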
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112560481A (en) * | 2020-12-25 | 2021-03-26 | 北京百度网讯科技有限公司 | Statement processing method, device and storage medium |
CN112560481B (en) * | 2020-12-25 | 2024-05-31 | 北京百度网讯科技有限公司 | Statement processing method, device and storage medium |
CN112699663A (en) * | 2021-01-07 | 2021-04-23 | 中通天鸿(北京)通信科技股份有限公司 | Semantic understanding system based on combination of multiple algorithms |
Also Published As
Publication number | Publication date |
---|---|
CN110532566B (en) | 2023-05-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110134772A (en) | Medical text relation extraction method based on a pre-trained model and fine-tuning | |
CN106383817B (en) | Article title generation method using distributed semantic information | |
CN107330011B (en) | Multi-strategy fusion named entity recognition method and device | |
CN110704621B (en) | Text processing method and device, storage medium and electronic equipment | |
CN109062892A (en) | Chinese sentence similarity calculation method based on Word2Vec | |
CN109408642A (en) | Domain entity attribute relation extraction method based on distant supervision | |
CN108549634A (en) | Chinese patent text similarity calculation method | |
CN103853834B (en) | Text structure analysis-based Web document abstract generation method | |
CN103116578A (en) | Translation method integrating syntactic tree and statistical machine translation technology and translation device | |
CN110362678A (en) | Method and apparatus for automatically extracting Chinese text keywords | |
CN110532566A (en) | Method for realizing similarity calculation of questions in vertical field | |
He et al. | Question answering over linked data using first-order logic | |
CN110008323A (en) | Question equivalence discrimination method combining semi-supervised learning and ensemble learning | |
CN105975475A (en) | Chinese phrase string-based fine-grained thematic information extraction method | |
CN115238029A (en) | Construction method and device of power failure knowledge graph | |
CN110532568A (en) | Chinese word sense disambiguation method based on tree feature selection and transfer learning | |
CN105389303B (en) | Automatic fusion method for heterogeneous-source corpora | |
CN106339371A (en) | English and Chinese word meaning mapping method and device based on word vectors | |
CN111597349A (en) | Rail transit standard entity relation automatic completion method based on artificial intelligence | |
CN101876985A (en) | Web text sentiment theme recognition method based on mixed model | |
CN112801217A (en) | Text similarity judgment method and device, electronic equipment and readable storage medium | |
CN107622047A (en) | Extraction and representation method for design decision knowledge | |
CN106095733B (en) | Improved deep-learning-based method for accurate extraction of natural language features | |
CN108733658A (en) | Institution term Chinese-English translation method | |
CN109101591A (en) | Knowledge-base-based speech semantic retrieval method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
TA01 | Transfer of patent application right | Effective date of registration: 20230411. Address after: No. 1036, Shandong high tech Zone wave road, Ji'nan, Shandong. Applicant after: Inspur Genersoft Co.,Ltd. Address before: 250100 No. 2877 Kehang Road, Sun Village Town, Jinan High-tech District, Shandong Province. Applicant before: SHANDONG INSPUR GENESOFT INFORMATION TECHNOLOGY Co.,Ltd. |
GR01 | Patent grant | ||