CN110532566A - Method for realizing question similarity calculation in a vertical domain - Google Patents

Method for realizing question similarity calculation in a vertical domain

Info

Publication number
CN110532566A
Authority
CN
China
Prior art keywords
question
vertical domain
model
sentence
word vector
Legal status
Granted
Application number
CN201910825709.8A
Other languages
Chinese (zh)
Other versions
CN110532566B (en)
Inventor
彭云龙
翟超
Current Assignee
Inspur General Software Co Ltd
Original Assignee
Shandong Inspur Genersoft Information Technology Co Ltd
Priority date
2019-09-03
Filing date
2019-09-03
Publication date
2019-12-03
Application filed by Shandong Inspur Genersoft Information Technology Co Ltd
Priority to CN201910825709.8A
Publication of CN110532566A
Application granted
Publication of CN110532566B
Legal status: Active

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention discloses a method for realizing question similarity calculation in a vertical domain, and relates to the fields of natural language processing and information retrieval. A word-vector model of vertical-domain questions is obtained by word2vec training; at the same time, dependency parsing is performed on the vertical-domain questions to identify the subject, predicate and object, and a question-structure similarity model is established from the similarity of the subject, predicate and object. The word-vector model and the question-structure similarity model are combined to analyze question similarity. By combining word2vec with dependency syntax, the method improves the accuracy of question-similarity calculation. Dependency parsing exposes the syntactic structure of each question, which strengthens semantic understanding of the question, enables deeper semantic similarity calculation, and improves the accuracy of question matching.

Description

Method for realizing question similarity calculation in a vertical domain
Technical field
The present invention discloses a method for realizing question similarity calculation in a vertical domain, and relates to the fields of natural language processing and information retrieval.
Background art
The rapid development of the Internet and big data places ever higher demands on the accuracy of information retrieval. Traditional string matching can no longer meet users' needs, so using natural language processing and related techniques to mine the semantic information of sentences in depth and achieve semantic matching between sentences is becoming increasingly important. Similarity calculation measures how similar two sentences are. Sentence similarity computed only from lexical features such as TF-IDF depends heavily on the training text, and lexical features cannot represent the semantics of a sentence, so the resulting similarity between sentences is not accurate enough. Plain deep-learning methods can capture semantic information between words, but for questions the structural information of the sentence, such as the subject and object, matters more than modifiers, so these methods still cannot measure the similarity between sentences accurately.
The present invention provides a method for realizing question similarity calculation in a vertical domain that combines word2vec with dependency syntax to improve the accuracy of question-similarity calculation. It uses dependency parsing of the questions to extract their syntactic structure, which strengthens semantic understanding of the questions, enables deeper semantic similarity calculation, and improves the accuracy of question matching.
Summary of the invention
Addressing the problems of the prior art, the present invention provides a method for realizing question similarity calculation in a vertical domain. It combines deep learning with syntactic analysis and, according to the characteristics of domain questions, makes full use of syntactic-structure information and lexical-semantic information to improve the accuracy of question matching.
The specific solution proposed by the present invention is as follows:
A method for realizing question similarity calculation in a vertical domain: a word-vector model of vertical-domain questions is obtained by word2vec training; at the same time, dependency parsing is performed on the vertical-domain questions to identify the subject, predicate and object, and a question-structure similarity model is established from the word vectors of the subject, predicate and object; question similarity is then analyzed by combining the word-vector model with the question-structure similarity model.
In the method, existing vertical-domain vocabulary and user questions from the vertical domain are collected, the user questions form a question set, and word2vec is trained on the question set to obtain the word-vector model.
In the method, existing vertical-domain vocabulary and user questions from the vertical domain are collected, the user questions form a question set, dependency parsing is performed on the question set to identify the subject, predicate and object, and the question-structure similarity model is established from the word vectors of the subject, predicate and object.
In the method, the user questions in the vertical domain are preprocessed by word segmentation and stop-word removal, and the preprocessed vertical-domain question data are then used to establish the word-vector model and the question-structure similarity model.
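The preprocessing step can be sketched as follows, assuming the segmenter is passed in as a function (for example `jieba.lcut` from Jieba, one of the tools named in this document) and that the stop-word list is a plain set. The demo uses `str.split` and a tiny hypothetical stop-word set so that it runs without extra dependencies; it is an illustration, not the patented implementation.

```python
from typing import Callable, Iterable, List, Set

Segmenter = Callable[[str], List[str]]


def preprocess(question: str, segment: Segmenter, stopwords: Set[str]) -> List[str]:
    """Segment one question and drop stop words and whitespace-only tokens."""
    tokens = segment(question.strip())
    # Segmenters such as Jieba can emit whitespace as separate tokens.
    return [t for t in tokens if t.strip() and t not in stopwords]


def preprocess_all(questions: Iterable[str], segment: Segmenter,
                   stopwords: Set[str]) -> List[List[str]]:
    """Preprocess a whole question set; the token lists feed both models."""
    return [preprocess(q, segment, stopwords) for q in questions]


if __name__ == "__main__":
    # Hypothetical stop-word list; str.split stands in for a Chinese segmenter.
    demo_stopwords = {"the", "of", "please"}
    print(preprocess("please play the theme song of Titanic", str.split, demo_stopwords))
```

With Jieba installed, `jieba.lcut` can be passed as `segment` to handle Chinese questions directly.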
In the method, the word-vector model calculates question similarity using the following formulas:
$$S_1=\{w_1,w_2,\dots,w_n\},\qquad \bar{v}_1=\frac{1}{n}\sum_{i=1}^{n}w_i$$
$$S_2=\{w'_1,w'_2,\dots,w'_m\},\qquad \bar{v}_2=\frac{1}{m}\sum_{j=1}^{m}w'_j$$
where $S_1$ indicates that sentence 1 has $n$ words, $w_i$ is the word vector of the $i$-th word, $\bar{v}_1$ is the average word vector of sentence 1, $S_2$ indicates that sentence 2 has $m$ words, $w'_j$ is the word vector of its $j$-th word, and $\bar{v}_2$ is the average word vector of sentence 2.
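The averaging step described above can be sketched in plain Python. The recoverable text defines only the average word vectors; comparing them by cosine similarity to obtain the word-vector score β1 is an assumption here (a common choice, not stated in the text), as is skipping out-of-vocabulary words. The toy 2-dimensional vectors are hypothetical stand-ins for a trained word2vec lookup.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def average_vector(words: Sequence[str], lookup: Dict[str, Vector]) -> Vector:
    """Mean word vector of a sentence; out-of-vocabulary words are skipped (assumption)."""
    vectors = [lookup[w] for w in words if w in lookup]
    if not vectors:
        raise ValueError("none of the words has a word vector")
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; 0.0 when either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def beta1(s1: Sequence[str], s2: Sequence[str], lookup: Dict[str, Vector]) -> float:
    """Assumed word-vector score: cosine of the two sentences' average vectors."""
    return cosine(average_vector(s1, lookup), average_vector(s2, lookup))


if __name__ == "__main__":
    # Toy 2-D vectors standing in for a trained word2vec model.
    toy = {"play": [1.0, 0.0], "song": [0.0, 1.0], "music": [0.0, 1.0], "stop": [-1.0, 0.0]}
    print(beta1(["play", "song"], ["play", "music"], toy))
    print(beta1(["play", "song"], ["stop", "music"], toy))
```

Averaging makes the score independent of sentence length, but it also discards word order and roles, which is the gap the structure model is meant to cover.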
In the method, the word-vector model and the question-structure similarity model are combined to calculate question similarity using the following formula:
$$\mathrm{sim}(S_1,S_2)=\mu_1\beta_1+\mu_2\beta_2$$
where $\beta_1$ is the similarity given by the word-vector model, $\beta_2$ is the question-structure similarity, and $\mu_1$, $\mu_2$ are adjustment parameters tuned for the vertical-domain questions of the application, with:
$$\beta_2=\alpha_1\,\mathrm{sim}_1(w_1,w_2)+\alpha_2\,\mathrm{sim}_2(w_1,w_2)+\alpha_3\,\mathrm{sim}_3(w_1,w_2)$$
In $\beta_2$, $\mathrm{sim}_1(w_1,w_2)$, $\mathrm{sim}_2(w_1,w_2)$ and $\mathrm{sim}_3(w_1,w_2)$ are the subject similarity, predicate similarity and object similarity respectively, and $\alpha_1$, $\alpha_2$, $\alpha_3$ are the coefficients of the subject, predicate and object similarities.
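The two formulas above can be sketched as plain functions. The weight values are hypothetical placeholders; the text says only that μ1, μ2 and α1, α2, α3 are tuned for the target vertical domain. The component scores sim1, sim2, sim3 are taken as inputs, so how they are computed is left open.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Weights:
    """Tunable coefficients; the default values are hypothetical placeholders."""
    mu1: float = 0.5     # weight of the word-vector score beta1
    mu2: float = 0.5     # weight of the structure score beta2
    alpha1: float = 0.4  # coefficient of the subject similarity sim1
    alpha2: float = 0.3  # coefficient of the predicate similarity sim2
    alpha3: float = 0.3  # coefficient of the object similarity sim3


def beta2(sim_subject: float, sim_predicate: float, sim_object: float, w: Weights) -> float:
    """beta2 = alpha1*sim1 + alpha2*sim2 + alpha3*sim3."""
    return w.alpha1 * sim_subject + w.alpha2 * sim_predicate + w.alpha3 * sim_object


def question_similarity(beta1: float, beta2_value: float, w: Weights) -> float:
    """sim(S1, S2) = mu1*beta1 + mu2*beta2."""
    return w.mu1 * beta1 + w.mu2 * beta2_value


if __name__ == "__main__":
    w = Weights()
    # Made-up component scores: same subject, related predicate, unrelated object.
    b2 = beta2(1.0, 0.5, 0.0, w)
    print(question_similarity(0.8, b2, w))
```

One option for sim1, sim2, sim3 is the cosine similarity of the word vectors of the two subjects, predicates and objects.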
In the method, any one of NLPIR, Jieba and LTP is used to preprocess the user questions in the vertical domain.
A device for realizing question similarity calculation in a vertical domain comprises a word-vector model building unit, a question-structure similarity model building unit and an analysis unit.
The word-vector model building unit obtains the word-vector model of vertical-domain questions by word2vec training. At the same time, the question-structure similarity model building unit performs dependency parsing on the vertical-domain questions to identify the subject, predicate and object, and establishes the question-structure similarity model from the word vectors of the subject, predicate and object. The analysis unit combines the word-vector model and the question-structure similarity model to analyze question similarity.
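The three units of the device can be sketched as small Python classes. This is an illustrative structure only: the class names are hypothetical, the word-vector unit holds a plain dictionary in place of a trained word2vec model, the structure unit receives subject/predicate/object triples that a dependency parser would produce, and cosine similarity is assumed for both component scores.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Vector = List[float]
Triple = Tuple[Optional[str], Optional[str], Optional[str]]  # (subject, predicate, object)


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


class WordVectorUnit:
    """Word-vector model unit; a dict stands in for a trained word2vec model."""

    def __init__(self, vectors: Dict[str, Vector]) -> None:
        self.vectors = vectors

    def mean(self, words: Sequence[str]) -> Optional[Vector]:
        found = [self.vectors[w] for w in words if w in self.vectors]
        return [sum(c) / len(found) for c in zip(*found)] if found else None


class StructureUnit:
    """Question-structure unit: weighted role-by-role similarity (beta2)."""

    def __init__(self, wv: WordVectorUnit,
                 alphas: Tuple[float, float, float] = (0.4, 0.3, 0.3)) -> None:
        self.wv = wv
        self.alphas = alphas

    def beta2(self, t1: Triple, t2: Triple) -> float:
        total = 0.0
        for alpha, a, b in zip(self.alphas, t1, t2):
            va = self.wv.vectors.get(a) if a else None
            vb = self.wv.vectors.get(b) if b else None
            if va is not None and vb is not None:  # an unidentified role adds 0
                total += alpha * cosine(va, vb)
        return total


class AnalysisUnit:
    """Analysis unit: sim = mu1 * beta1 + mu2 * beta2."""

    def __init__(self, wv: WordVectorUnit, structure: StructureUnit,
                 mu1: float = 0.5, mu2: float = 0.5) -> None:
        self.wv, self.structure, self.mu1, self.mu2 = wv, structure, mu1, mu2

    def similarity(self, words1: Sequence[str], triple1: Triple,
                   words2: Sequence[str], triple2: Triple) -> float:
        m1, m2 = self.wv.mean(words1), self.wv.mean(words2)
        beta1 = cosine(m1, m2) if m1 is not None and m2 is not None else 0.0
        return self.mu1 * beta1 + self.mu2 * self.structure.beta2(triple1, triple2)
```

Sharing one `WordVectorUnit` between the other two units mirrors the text, where the structure model is built from the word vectors of the subject, predicate and object.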
The beneficial effects of the present invention are as follows:
The present invention provides a method for realizing question similarity calculation in a vertical domain: a word-vector model of vertical-domain questions is obtained by word2vec training; at the same time, dependency parsing is performed on the vertical-domain questions to identify the subject, predicate and object; a question-structure similarity model is established from the word vectors of the subject, predicate and object; and question similarity is analyzed by combining the word-vector model and the question-structure similarity model.
The method of the present invention combines word-vector similarity with the structural similarity of the subject, predicate and object in the sentence. Taking this sentence structure into account is more reasonable than considering only the core words; it avoids the problems caused when the subject, predicate or object is not identified or is identified incorrectly, strengthens semantic understanding of the questions, and improves the accuracy of question-similarity calculation.
Brief description of the drawings
Fig. 1 is a schematic flowchart of obtaining the word-vector model by word2vec training in the method of the present invention;
Fig. 2 is a schematic flowchart of question-similarity calculation using the method of the present invention.
Detailed description of the embodiments
The present invention provides a method for realizing question similarity calculation in a vertical domain: a word-vector model of vertical-domain questions is obtained by word2vec training; at the same time, dependency parsing is performed on the vertical-domain questions to identify the subject, predicate and object; a question-structure similarity model is established from the word vectors of the subject, predicate and object; and question similarity is analyzed by combining the word-vector model and the question-structure similarity model.
A corresponding device for realizing question similarity calculation in a vertical domain is also provided, comprising a word-vector model building unit, a question-structure similarity model building unit and an analysis unit.
The word-vector model building unit obtains the word-vector model of vertical-domain questions by word2vec training. At the same time, the question-structure similarity model building unit performs dependency parsing on the vertical-domain questions to identify the subject, predicate and object, and establishes the question-structure similarity model from the word vectors of the subject, predicate and object. The analysis unit combines the word-vector model and the question-structure similarity model to analyze question similarity.
The present invention is further explained below with reference to the drawings and specific examples, so that those skilled in the art can better understand and practice it; the illustrated embodiments do not limit the invention.
To compare the similarity of information-retrieval questions in a vertical domain, such as movie navigation or music navigation, the method of the present invention establishes a word-vector model and a question-structure similarity model.
The word-vector model is obtained by training word2vec on questions from the vertical domain: the collected vocabulary and user questions of the corresponding vertical domain are first normalized and then preprocessed, where preprocessing includes word segmentation and stop-word removal using NLPIR from the Chinese Academy of Sciences.
For the preprocessed data, the model parameters are determined and the model is trained to obtain the word-vector model and the question library of the corresponding vertical domain.
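A minimal sketch of the normalization and training steps, assuming gensim 4.x as the word2vec implementation (the document does not name a library), NFKC folding plus lowercasing as the normalization, and skip-gram (`sg=1`) with hypothetical parameter values. The gensim import is deferred so that `normalize` works without gensim installed.

```python
import unicodedata
from typing import List, Sequence


def normalize(text: str) -> str:
    """Assumed normalization: NFKC (full-width to half-width), lowercase, single spaces."""
    return " ".join(unicodedata.normalize("NFKC", text).lower().split())


def train_word2vec(corpus: Sequence[List[str]], vector_size: int = 100,
                   window: int = 5, min_count: int = 1, epochs: int = 10):
    """Train skip-gram word2vec on segmented questions (hypothetical defaults)."""
    # Deferred import: only this function needs gensim (>= 4.0 for vector_size/epochs).
    from gensim.models import Word2Vec

    return Word2Vec(sentences=[list(q) for q in corpus if q], vector_size=vector_size,
                    window=window, min_count=min_count, sg=1, epochs=epochs)
```

If gensim is available, `model = train_word2vec(corpus)` followed by `model.wv[word]` returns the learned vector of a word in the corpus vocabulary, and `model.save(path)` / `Word2Vec.load(path)` persist the model. The "model parameters" mentioned in the text correspond to arguments such as `vector_size`, `window` and `min_count` here.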
The question-structure similarity model is established as follows: the collected vocabulary and user-question data of the corresponding vertical domain are preprocessed, where preprocessing includes word segmentation and stop-word removal using NLPIR from the Chinese Academy of Sciences, or using another preprocessing tool such as Jieba or LTP. Dependency parsing is then performed on the processed user-question data, for example with the Stanford Parser, to identify the subject, predicate and object, and the question-structure similarity model and the question library of the corresponding vertical domain are established from the word vectors of the subject, predicate and object.
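Once a dependency parser has run, picking out the subject, predicate and object reduces to reading labelled arcs. The sketch below assumes LTP-style relation labels (HED for the root, SBV for subject-verb, VOB for verb-object) and a parse given as (word, head index, relation) tuples with 1-based heads and 0 for the root. The Stanford Parser uses different labels (such as nsubj and dobj), so the label set would need adjusting for it. The example parse is hypothetical.

```python
from typing import Optional, Sequence, Tuple

Arc = Tuple[str, int, str]  # (word, 1-based head index or 0 for the root, relation label)
SPO = Tuple[Optional[str], Optional[str], Optional[str]]


def extract_spo(arcs: Sequence[Arc]) -> SPO:
    """Read (subject, predicate, object) from an LTP-style dependency parse.

    The predicate is the word attached to the root with HED; the subject and
    object are the first SBV and VOB dependents of that predicate. Roles the
    parse does not contain come back as None.
    """
    root = next((i for i, (_, head, rel) in enumerate(arcs, start=1)
                 if head == 0 and rel == "HED"), None)
    if root is None:
        return None, None, None
    subject: Optional[str] = None
    obj: Optional[str] = None
    for word, head, rel in arcs:
        if head != root:
            continue
        if rel == "SBV" and subject is None:
            subject = word
        elif rel == "VOB" and obj is None:
            obj = word
    return subject, arcs[root - 1][0], obj


if __name__ == "__main__":
    # Hypothetical parse of "谁 演唱 晴天" ("who sang 晴天").
    parse = [("谁", 2, "SBV"), ("演唱", 0, "HED"), ("晴天", 2, "VOB")]
    print(extract_spo(parse))
```

The head indices and labels would come from the parser's dependency output; the exact call differs between LTP versions, so it is not shown here. Returning None for a missing role lets the structure score treat it as contributing nothing.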
Suitable adjustment parameters are selected according to the actual vertical domain to determine the weights assigned to the word-vector model and the question-structure similarity model. The two models are then combined to analyze the preprocessed user query, computing the similarity between the question input by the user and the questions in the question library of the corresponding vertical domain.
In the above process, when the word-vector model is obtained by training word2vec on vertical-domain questions, the word-vector model can calculate similarity using the following formulas:
$$S_1=\{w_1,w_2,\dots,w_n\},\qquad \bar{v}_1=\frac{1}{n}\sum_{i=1}^{n}w_i$$
$$S_2=\{w'_1,w'_2,\dots,w'_m\},\qquad \bar{v}_2=\frac{1}{m}\sum_{j=1}^{m}w'_j$$
where $S_1$ indicates that sentence 1 has $n$ words, $w_i$ is the word vector of the $i$-th word, $\bar{v}_1$ is the average word vector of sentence 1, $S_2$ indicates that sentence 2 has $m$ words, $w'_j$ is the word vector of its $j$-th word, and $\bar{v}_2$ is the average word vector of sentence 2. The mean word vector of every user question is calculated in this way.
Dependency parsing is performed on every user question using LTP to identify its subject, predicate and object and obtain their word vectors.
The combination of the word-vector model and the question-structure similarity model can calculate similarity using the following formula:
$$\mathrm{sim}(S_1,S_2)=\mu_1\beta_1+\mu_2\beta_2$$
where $S_1$ and $S_2$ denote question 1 and question 2, and $\mu_1$, $\mu_2$ are adjustment parameters, namely the weight ratio between the word-vector model and the question-structure similarity model; they are adjusted for the vertical domain of the application and determined empirically from the vertical-domain corpus.
$$\beta_2=\alpha_1\,\mathrm{sim}_1(w_1,w_2)+\alpha_2\,\mathrm{sim}_2(w_1,w_2)+\alpha_3\,\mathrm{sim}_3(w_1,w_2)$$
In $\beta_2$, $w_1$ and $w_2$ denote the subjects, predicates or objects of the two questions; $\mathrm{sim}_1(w_1,w_2)$, $\mathrm{sim}_2(w_1,w_2)$ and $\mathrm{sim}_3(w_1,w_2)$ are the subject similarity, predicate similarity and object similarity respectively; and $\alpha_1$, $\alpha_2$, $\alpha_3$ are the coefficients of the subject, predicate and object similarities, determined empirically from the vertical-domain corpus.
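Since μ1, μ2 (and likewise α1, α2, α3) are determined empirically, one way to do that is a grid search over question pairs labelled as matching or not. The sketch assumes μ2 = 1 − μ1, a fixed decision threshold of 0.5 and precomputed β1, β2 scores for each pair; none of these choices comes from the text, and the labelled pairs are made up.

```python
from typing import List, Tuple

Pair = Tuple[float, float, bool]  # (beta1, beta2, labelled as the same question?)


def accuracy(pairs: List[Pair], mu1: float, threshold: float = 0.5) -> float:
    """Share of pairs whose thresholded score mu1*beta1 + (1 - mu1)*beta2 matches the label."""
    if not pairs:
        raise ValueError("need at least one labelled pair")
    mu2 = 1.0 - mu1  # assumption: the two weights sum to 1
    correct = sum((mu1 * b1 + mu2 * b2 >= threshold) == label for b1, b2, label in pairs)
    return correct / len(pairs)


def tune_mu1(pairs: List[Pair], threshold: float = 0.5, steps: int = 10) -> Tuple[float, float]:
    """Grid-search mu1 over 0, 1/steps, ..., 1 and return (best mu1, its accuracy)."""
    best_mu1, best_acc = 0.0, -1.0
    for i in range(steps + 1):
        mu1 = i / steps
        acc = accuracy(pairs, mu1, threshold)
        if acc > best_acc:  # ties keep the smaller mu1
            best_mu1, best_acc = mu1, acc
    return best_mu1, best_acc


if __name__ == "__main__":
    # Made-up pairs where the structure score beta2 is the more reliable signal.
    labelled = [(0.9, 0.1, False), (0.2, 0.9, True), (0.8, 0.2, False), (0.3, 0.8, True)]
    print(tune_mu1(labelled))
```

The same loop extends to α1, α2, α3 by searching over their combinations, at the cost of a larger grid.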
After the user inputs a question to be queried, the question is preprocessed, the mean of its word vectors is computed, and its similarity to the questions in the question library is calculated using the model formulas above.
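The query step can be sketched as ranking a preprocessed question library. This is a sketch under assumptions: library features are computed once in advance, `score` would be the combined sim(S1, S2), and the Jaccard overlap in the demo is only a dependency-free placeholder, not part of the method. Returning the top k rather than a single best match is also a design choice of this sketch.

```python
import heapq
from typing import Callable, List, Sequence, Tuple, TypeVar

F = TypeVar("F")  # preprocessed features of one question (e.g. tokens plus SPO triple)


def top_k_matches(query: F, library: Sequence[Tuple[str, F]],
                  score: Callable[[F, F], float], k: int = 3) -> List[Tuple[float, str]]:
    """Return the k library questions most similar to the query as (score, text), best first."""
    scored = ((score(query, features), text) for text, features in library)
    return heapq.nlargest(k, scored, key=lambda pair: pair[0])


def jaccard(a: frozenset, b: frozenset) -> float:
    """Placeholder score for the demo only; the method would use sim(S1, S2)."""
    return len(a & b) / len(a | b) if a | b else 0.0


if __name__ == "__main__":
    library = [
        ("play a song", frozenset({"play", "song"})),
        ("stop the movie", frozenset({"stop", "movie"})),
        ("play a movie", frozenset({"play", "movie"})),
    ]
    print(top_k_matches(frozenset({"play", "movie"}), library, jaccard, k=2))
```

`heapq.nlargest` avoids sorting the whole library when only a few candidates are needed.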
The embodiments described above are merely preferred embodiments given to fully illustrate the present invention, and the protection scope of the present invention is not limited to them. Equivalent substitutions or transformations made by those skilled in the art on the basis of the present invention fall within the protection scope of the present invention. The protection scope of the present invention is defined by the claims.

Claims (8)

1. A method for realizing question similarity calculation in a vertical domain, characterized in that a word-vector model of vertical-domain questions is obtained by word2vec training; at the same time, dependency parsing is performed on the vertical-domain questions to identify the subject, predicate and object; a question-structure similarity model is established from the word vectors of the subject, predicate and object; and question similarity is analyzed by combining the word-vector model and the question-structure similarity model.
2. The method according to claim 1, characterized in that existing vertical-domain vocabulary and user questions of the vertical domain are collected, the user questions form a question set, and word2vec is trained on the question set to obtain the word-vector model.
3. The method according to claim 1 or 2, characterized in that existing vertical-domain vocabulary and user questions of the vertical domain are collected, the user questions form a question set, dependency parsing is performed on the question set to identify the subject, predicate and object, and the question-structure similarity model is established from the word vectors of the subject, predicate and object.
4. The method according to claim 3, characterized in that the user questions in the vertical domain are preprocessed by word segmentation and stop-word removal, and the preprocessed vertical-domain question data are then used to establish the word-vector model and the question-structure similarity model.
5. The method according to claim 1 or 4, characterized in that the word-vector model calculates question similarity using the following formulas:
$$S_1=\{w_1,w_2,\dots,w_n\},\qquad \bar{v}_1=\frac{1}{n}\sum_{i=1}^{n}w_i$$
$$S_2=\{w'_1,w'_2,\dots,w'_m\},\qquad \bar{v}_2=\frac{1}{m}\sum_{j=1}^{m}w'_j$$
where $S_1$ indicates that sentence 1 has $n$ words, $w_i$ is the word vector of the $i$-th word, $\bar{v}_1$ is the average word vector of sentence 1, $S_2$ indicates that sentence 2 has $m$ words, $w'_j$ is the word vector of its $j$-th word, and $\bar{v}_2$ is the average word vector of sentence 2.
6. The method according to claim 5, characterized in that the word-vector model and the question-structure similarity model are combined to calculate question similarity using the following formula:
$$\mathrm{sim}(S_1,S_2)=\mu_1\beta_1+\mu_2\beta_2$$
where $\mu_1$, $\mu_2$ are adjustment parameters, adjusted according to the vertical domain of the application, and:
$$\beta_2=\alpha_1\,\mathrm{sim}_1(w_1,w_2)+\alpha_2\,\mathrm{sim}_2(w_1,w_2)+\alpha_3\,\mathrm{sim}_3(w_1,w_2)$$
In $\beta_2$, $\mathrm{sim}_1(w_1,w_2)$, $\mathrm{sim}_2(w_1,w_2)$ and $\mathrm{sim}_3(w_1,w_2)$ are the subject similarity, predicate similarity and object similarity respectively, and $\alpha_1$, $\alpha_2$, $\alpha_3$ are the coefficients of the subject, predicate and object similarities.
7. The method according to claim 6, characterized in that any one of NLPIR, Jieba and LTP is used to preprocess the user questions in the vertical domain.
8. A device for realizing question similarity calculation in a vertical domain, characterized by comprising a word-vector model building unit, a question-structure similarity model building unit and an analysis unit, wherein
the word-vector model building unit obtains the word-vector model of vertical-domain questions by word2vec training; at the same time, the question-structure similarity model building unit performs dependency parsing on the vertical-domain questions to identify the subject, predicate and object, and establishes the question-structure similarity model from the word vectors of the subject, predicate and object; and the analysis unit combines the word-vector model and the question-structure similarity model to analyze question similarity.
CN201910825709.8A 2019-09-03 2019-09-03 Method for realizing similarity calculation of questions in vertical field Active CN110532566B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910825709.8A CN110532566B (en) 2019-09-03 2019-09-03 Method for realizing similarity calculation of questions in vertical field

Publications (2)

Publication Number Publication Date
CN110532566A (en) 2019-12-03
CN110532566B (en) 2023-05-02

Family

ID=68666416

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910825709.8A Active CN110532566B (en) 2019-09-03 2019-09-03 Method for realizing similarity calculation of questions in vertical field

Country Status (1)

Country Link
CN (1) CN110532566B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004287679A (en) * 2003-03-20 2004-10-14 Fuji Xerox Co Ltd Natural language processing system and natural language processing method and computer program
CN105843897A (en) * 2016-03-23 2016-08-10 青岛海尔软件有限公司 Vertical domain-oriented intelligent question and answer system
CN109062892A (en) * 2018-07-10 2018-12-21 东北大学 Chinese sentence similarity calculation method based on Word2Vec
CN109271626A (en) * 2018-08-31 2019-01-25 北京工业大学 Text semantic analysis method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
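曹莉丽 et al.: "融合词向量的多特征问句相似度计算方法研究" (Research on a multi-feature question similarity calculation method incorporating word vectors), 《现代计算机(专业版)》 (Modern Computer, Professional Edition) *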

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112560481A (en) * 2020-12-25 2021-03-26 北京百度网讯科技有限公司 Statement processing method, device and storage medium
CN112560481B (en) * 2020-12-25 2024-05-31 北京百度网讯科技有限公司 Statement processing method, device and storage medium
CN112699663A (en) * 2021-01-07 2021-04-23 中通天鸿(北京)通信科技股份有限公司 Semantic understanding system based on combination of multiple algorithms

Also Published As

Publication number Publication date
CN110532566B (en) 2023-05-02

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20230411

Address after: No. 1036 Langchao Road, High-tech Zone, Jinan, Shandong

Applicant after: Inspur Genersoft Co.,Ltd.

Address before: 250100 No. 2877 Kehang Road, Sun Village Town, Jinan High-tech District, Shandong Province

Applicant before: SHANDONG INSPUR GENESOFT INFORMATION TECHNOLOGY Co.,Ltd.

GR01 Patent grant