CN109614618A - Multi-semantic-based extraset word processing method and device - Google Patents

Multi-semantic-based extraset word processing method and device Download PDF

Info

Publication number
CN109614618A
CN109614618A (application CN201811498210.2A; granted as CN109614618B)
Authority
CN
China
Prior art keywords
word
sense
semantic
sememe
word vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811498210.2A
Other languages
Chinese (zh)
Other versions
CN109614618B (en)
Inventor
杨凯程
李健铨
蒋宏飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Taiyue Xiangsheng Software Co ltd
Original Assignee
Anhui Taiyue Xiangsheng Software Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui Taiyue Xiangsheng Software Co ltd filed Critical Anhui Taiyue Xiangsheng Software Co ltd
Publication of CN109614618A publication Critical patent/CN109614618A/en
Application granted granted Critical
Publication of CN109614618B publication Critical patent/CN109614618B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/20: Natural language analysis
    • G06F 40/279: Recognition of textual entities
    • G06F 40/284: Lexical analysis, e.g. tokenisation or collocates
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/30: Semantic analysis
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

Embodiments of the present application provide a multi-semantic-based method and device for processing out-of-vocabulary words. The method comprises the following steps: obtaining the weight of each sense of the out-of-vocabulary word according to the word's context words in the sentence; generating a sense vector for each sense according to the word vectors of the sememes in that sense; and weighting and summing the sense vectors according to the sense weights to generate a simulated word vector. The simulated word vector generated by the technical solution of this application matches the sentence meaning while still taking the word's other senses into account, so the semantics it expresses are richer and fuller and suit a wider range of semantic environments. When the simulated word vector is used in an intelligent interactive system, the responses correlate closely with the question, response accuracy improves, the question-answering system adapts to richer dialogue environments and behaves more intelligently, user friendliness is greatly improved, and the out-of-vocabulary word problem of the prior art is solved.

Description

Multi-semantic-based out-of-vocabulary word processing method and device
This application claims priority to Chinese patent application No. 201810556386.2, filed with the Patent Office of the People's Republic of China on June 1, 2018 and entitled "Multi-semantic-based out-of-vocabulary word processing method, intelligent question-answering method and device", the entire contents of which are incorporated herein by reference.
Technical field
This application relates to the field of natural language processing technology, and in particular to a multi-semantic-based out-of-vocabulary word processing method and device.
Background technique
With the development of natural language processing technology, dialogue systems built on it have been widely applied. A common dialogue system such as a chat robot can automatically generate a corresponding reply according to the chat content entered by a user.
In the prior art, dialogue systems can be divided by answering method into retrieval-based systems built on a knowledge base and generative systems based on deep learning models. A dialogue system based on a deep learning model builds a dialogue model based on an RNN (Recurrent Neural Network) and trains it on a large corpus, so that the model learns latent answer patterns from question-answer pairs and can respond to unseen dialogue; its answers are therefore not limited to the knowledge already present in the training corpus.
When training on a corpus and generating responses, a dialogue system based on a deep learning model operates on word vectors, which are mathematical representations of the tokens in the corpus. The contribution of word vectors to deep learning is this: the distance between two tokens can be obtained by computing the cosine angle or Euclidean distance between their vectors, and the smaller the distance, the higher the similarity of the two tokens. During training, a word-vector space containing the vectors of all known tokens is generated from the training corpus; when answering, the system uses the distances between the vectors of the question's tokens and the known vectors, combined with a machine-learning algorithm, to generate the response.
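The distance computations described above can be sketched as follows; this is an illustrative snippet, not code from the patent, and the vectors used are toy values.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two word vectors; higher means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    """Straight-line distance between two word vectors; lower means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

Either measure can serve as the "distance between two tokens" the text describes; the patent later works mainly with cosine distances.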
However, a word-vector space obtained by corpus training covers domain-specific business terms, dialect words, foreign words and blend words poorly. In an open-domain dialogue system whose question content is unrestricted, the system therefore frequently encounters out-of-vocabulary words (OOV: out-of-vocabulary), that is, tokens not contained in the word-vector space. When the dialogue system meets a question containing an OOV word, the accuracy of its response declines; this situation is known as the out-of-vocabulary (OOV) problem. At present, the prior art lacks an effective solution to the OOV problem.
Summary of the invention
Embodiments of the present application provide a multi-semantic-based out-of-vocabulary word processing method and device to solve the out-of-vocabulary problem of the prior art.
In a first aspect, an embodiment of the present application provides a multi-semantic-based out-of-vocabulary word processing method, comprising:
obtaining the weight of each sense of an out-of-vocabulary word according to the word's context words in a sentence, the context words comprising at least one preceding token and at least one following token of the out-of-vocabulary word in the sentence;
generating a sense vector for each sense according to the word vectors of the sememes in that sense;
weighting and summing the sense vectors of the senses according to their weights to generate a simulated word vector.
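Once the per-sense weights and sense vectors are available, the final step of the first aspect reduces to a weighted sum. A minimal sketch follows; the function and parameter names are placeholders, not from the patent.

```python
def build_simulated_vector(weights, sense_vectors):
    """Weighted sum of per-sense vectors, yielding the simulated word vector
    for the out-of-vocabulary word.

    weights: one weight per sense (normally summing to 1).
    sense_vectors: one equal-length vector per sense.
    """
    dim = len(sense_vectors[0])
    simulated = [0.0] * dim
    for w, vec in zip(weights, sense_vectors):
        for i, v in enumerate(vec):
            simulated[i] += w * v
    return simulated
```

The embodiments below detail how the weights and sense vectors themselves are computed.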
In a second aspect, an embodiment of the present application provides an intelligent question-answering method applied to the multi-semantic-based out-of-vocabulary word processing method provided by the embodiments of this application, comprising:
obtaining an out-of-vocabulary word from the tokenisation result of an unknown question;
generating a simulated word vector for the out-of-vocabulary word based on its multiple senses;
matching an answer to the question from a trained question-answering model according to the simulated word vector and the word vectors of the remaining tokens in the question.
In a third aspect, an embodiment of the present application provides a multi-semantic-based out-of-vocabulary word processing device, comprising:
a sense-weight acquiring unit, configured to obtain the weight of each sense of an out-of-vocabulary word according to the word's context words in a sentence, the context words comprising at least one preceding token and at least one following token of the out-of-vocabulary word in the sentence;
a sense-vector generation unit, configured to generate a sense vector for each sense according to the word vectors of the sememes in that sense;
a simulated-word-vector generation unit, configured to weight and sum the sense vectors of the senses according to their weights to generate a simulated word vector.
It can be seen from the above technical solutions that the embodiments of the present application provide a multi-semantic-based out-of-vocabulary word processing method, an intelligent question-answering method and a device, comprising: obtaining the weight of each sense of an out-of-vocabulary word according to the word's context words in a sentence, the context words comprising at least one preceding token and at least one following token of the out-of-vocabulary word in the sentence; generating a sense vector for each sense according to the word vectors of the sememes in that sense; and weighting and summing the sense vectors according to the sense weights to generate a simulated word vector. The simulated word vector generated by the technical solution of this application is based on the multiple senses of the out-of-vocabulary word and fuses them with different weights according to the semantic relevance between the out-of-vocabulary word and its context words. It can therefore match the sentence meaning while taking the word's other senses into account, making the semantics expressed by the simulated word vector richer and fuller and suited to richer semantic environments. Consequently, when the simulated word vector generated by the embodiments of this application is used in an intelligent interactive system, responses correlate closely with the question, response accuracy improves, the system adapts to richer dialogue environments and behaves more intelligently, user friendliness is greatly improved, and the out-of-vocabulary word problem of the prior art is solved.
Detailed description of the invention
In order to explain the technical solutions of the application more clearly, the drawings needed in the embodiments are briefly introduced below. Obviously, a person of ordinary skill in the art could obtain other drawings from these drawings without any creative labour.
Fig. 1 is a flow chart of a multi-semantic-based out-of-vocabulary word processing method according to an embodiment of the present application;
Fig. 2 is a flow chart of step S110 of the multi-semantic-based out-of-vocabulary word processing method according to an embodiment of the present application;
Fig. 3 is a flow chart of step S111 of the multi-semantic-based out-of-vocabulary word processing method according to an embodiment of the present application;
Fig. 4 is a flow chart of step S112 of the multi-semantic-based out-of-vocabulary word processing method according to an embodiment of the present application;
Fig. 5 is a flow chart of step S120 of the multi-semantic-based out-of-vocabulary word processing method according to an embodiment of the present application;
Fig. 6 is a flow chart of an intelligent interaction method according to an embodiment of the present application;
Fig. 7 is a block diagram of a multi-semantic-based out-of-vocabulary word processing device according to an embodiment of the present application;
Fig. 8 is a block diagram of an intelligent interaction device according to an embodiment of the present application.
Specific embodiment
To help those skilled in the art better understand the technical solutions in the application, the technical solutions in the embodiments of the application are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only a part, not all, of the embodiments of the present application. Based on the embodiments in this application, all other embodiments obtained by a person of ordinary skill in the art without creative work shall fall within the scope of protection of this application.
When training on a corpus and generating responses, a dialogue system based on a deep learning model operates on word vectors, which are mathematical representations of the tokens in the corpus. Their contribution to deep learning is this: the distance between two tokens can be obtained by computing the cosine angle or Euclidean distance between their word vectors, and the smaller the distance, the higher the similarity of the two tokens.
In the field of natural language processing, one kind of word vector is the One-Hot Representation. Its dimensionality is determined by the number of known tokens in the tokenisation dictionary, with each dimension representing one token, so in a one-hot word vector exactly one dimension has the value 1 and the remaining dimensions are 0. Because the number of known tokens in a dictionary is usually very large, one-hot word vectors are very high-dimensional. When applied to deep learning, high-dimensional word vectors suffer from the curse of dimensionality; moreover, since every token occupies its own independent dimension, it is difficult for such vectors to reflect the similarity between two words, making them unsuitable for deep learning models.
Therefore, dialogue systems based on deep learning models usually use another kind of word vector, the Distributed Representation, which maps each token through corpus training to a fixed-length, low-dimensional real vector. Put together, all the distributed word vectors form a word-vector space in which each vector corresponds to a point, for example [0.792, -0.177, -0.107, 0.109, ...]. In this space the distance between two points represents the similarity between the corresponding tokens, and it can be expressed by the cosine angle or the Euclidean distance between the two vectors. Based on these characteristics, word vectors of the Distributed Representation type are preferred in this application.
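The contrast between the two representations can be illustrated with toy data; the vocabulary and embedding values below are invented for illustration only.

```python
def one_hot(word, vocabulary):
    """One-hot representation: dimension = vocabulary size, a single 1."""
    vec = [0] * len(vocabulary)
    vec[vocabulary.index(word)] = 1
    return vec

# A distributed representation is a short dense real vector learned from a
# corpus; here a 4-dimensional toy embedding (real systems use 50-300 dims).
toy_embedding = {
    "apple": [0.792, -0.177, -0.107, 0.109],
    "computer": [0.781, -0.160, -0.095, 0.120],
}
```

In the one-hot scheme "apple" and "computer" are orthogonal and equally distant from every other word; in the distributed scheme their vectors can lie close together, reflecting semantic similarity.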
In the prior art, limited by the quantity and richness of the corpus, the word-vector space covers domain-specific business terms, dialect words, foreign words and blend words poorly. In an open-domain dialogue system with unrestricted question content, the system therefore frequently encounters out-of-vocabulary words (OOV: out-of-vocabulary). Because an out-of-vocabulary word does not exist in the word-vector space, when the dialogue system meets a question containing one it cannot use the word-vector space for answer matching, and therefore cannot provide a response to a question containing an out-of-vocabulary word.
One prior-art scheme for solving the out-of-vocabulary problem works as follows: when a user's question contains an out-of-vocabulary word, a random word vector is generated for it, which maps the word to a point in the word-vector space; this random vector is then used as the out-of-vocabulary word's vector for word-vector matching, so that a response can be given to the question. This scheme solves the problem that prior-art deep-learning dialogue systems cannot respond to out-of-vocabulary words at all. However, because the out-of-vocabulary word's vector in this scheme is randomly generated and therefore uncertain, although a response can be produced, its content cannot be guaranteed and its accuracy is unreliable; the out-of-vocabulary problem is still not thoroughly solved.
Embodiment one
To solve the out-of-vocabulary problem of the prior art, the embodiments of the present application provide a multi-semantic-based out-of-vocabulary word processing method. Referring to Fig. 1, a flow chart of the method according to an embodiment of the present application, the method comprises the following steps:
Step S110: obtain the weight of each sense of the out-of-vocabulary word according to its context words in the sentence.
An out-of-vocabulary word usually has multiple senses. In this application, the senses of an out-of-vocabulary word can be obtained from HowNet, a common-sense knowledge base that takes the concepts represented by Chinese and English words as its objects of description and whose basic content is the relationships between concepts and between the attributes of concepts. In HowNet, a sememe is the most basic unit of meaning, one not easily divided further; a word can have multiple senses, and each sense may contain multiple sememes. For example, the senses of a word and its sememes can be expressed in the following form:
Each row lists the senses of one word and the sememes of each sense. In each row, the first column is the word itself and the second column is the number of senses of the word; after the second column, the sememe count and sememe content of each sense are given in the form "count + sememes". For example, one word has 6 senses in total: the 1st sense has 2 sememes (function word, progress); the 2nd sense has 1 sememe (function word); the 3rd sense has 1 sememe (living); and so on.
In a sentence, the sense of each token is a component of the sentence meaning, so in sentences expressing different meanings the sense of a token differs. Two example sentences are listed below:
Sentence 1: I wish my birthday gift were an Apple computer.
Sentence 2: I like eating apples.
In the two example sentences above, the sense of "apple" is obviously different. Within a sentence, the tokens adjacent to a target token are semantically related to it, and together they express the local meaning of the sentence.
For example, in sentence 1 the tokens adjacent to "apple" are "one" and "computer": "Apple computer" denotes a computer of the Apple brand, and "one" is the measure word of "Apple computer", so "one", "apple" and "computer" are semantically related. Similarly, in sentence 2 the verb "eat" is semantically related to "apple".
Based on the facts that a token's sense differs between sentences and that adjacent tokens within a sentence are semantically related, in step S110 this application obtains the weight of each sense of the out-of-vocabulary word according to its context words in the sentence, so that the weights reflect how much each sense of the out-of-vocabulary word contributes to the meaning of the specific sentence.
This application defines the concept of context words: the context words comprise at least one preceding token and at least one following token of the out-of-vocabulary word in the sentence. Specifically, taking the out-of-vocabulary word as the centre, at least one token is looked up successively towards the beginning of the sentence, and at least one token towards the end of the sentence.
Fig. 2 be the embodiment of the present application shown in a kind of collection based on multi-semantic meaning outside word treatment method step S110 process Figure.
In an optional embodiment, as shown in Fig. 2, step S110 comprises the following steps:
Step S111: obtain the context words of the out-of-vocabulary word in the sentence.
In this application, the context words may be one preceding and one following token of the out-of-vocabulary word in the sentence, two preceding and two following tokens, or multiple preceding and multiple following tokens. Fig. 3 is a flow chart of step S111 of the multi-semantic-based out-of-vocabulary word processing method according to an embodiment of the present application.
So that the context words can be obtained from the sentence quantitatively and by a fixed rule, in an optional embodiment, as shown in Fig. 3, step S111 may comprise the following steps:
Step S1111: set a word-window value C to constrain the number of context words, C being an integer greater than or equal to 1.
The embodiment of this application defines the window value C, which constrains the number of context words: when the numbers of tokens before and after the out-of-vocabulary word in the sentence are both greater than C, the number of context words is 2C.
Step S1112: according to the word-window value C, obtain the context words of the out-of-vocabulary word from the tokens of the sentence containing it.
The context words comprise the C tokens preceding the out-of-vocabulary word in the sentence and the C tokens following it.
Illustratively, set the word-window value C=1. The sentence containing the out-of-vocabulary word is: I want to buy an Apple computer. The out-of-vocabulary word is: apple.
First obtain all tokens in the sentence, namely: I / want-to-buy / one / apple / computer.
Since the word-window value C=1, the context words are the token immediately before and the token immediately after the out-of-vocabulary word in the sentence, namely: one, computer.
Illustratively, set the word-window value C=2 for the same sentence: I want to buy an Apple computer; the out-of-vocabulary word is: apple.
First obtain all tokens in the sentence, namely: I / want-to-buy / one / apple / computer.
Since the word-window value C=2, the context words are the two tokens before and the two tokens after the out-of-vocabulary word. In the sentence, however, only one token follows the out-of-vocabulary word. In such a case, when obtaining the context words, this application stops as soon as it reaches the beginning or the end of the sentence. Therefore, with the word-window value C=2, the context words of "apple" obtained from the sentence are: want-to-buy, one, computer.
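Steps S1111 and S1112 amount to slicing a window of C tokens on each side of the out-of-vocabulary token and truncating at sentence boundaries. A minimal sketch, assuming the sentence has already been tokenised (the function name is a placeholder):

```python
def context_words(tokens, oov_index, window):
    """Return up to `window` tokens before and after the token at oov_index,
    truncating at the start or end of the sentence (step S1112)."""
    before = tokens[max(0, oov_index - window):oov_index]
    after = tokens[oov_index + 1:oov_index + 1 + window]
    return before + after
```

With the example sentence, C=1 yields ["one", "computer"], and C=2 yields ["want-to-buy", "one", "computer"] because only one token follows "apple".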
Step S112: obtain the first-class distance between the context words and each sense.
In the sentence, the context words are semantically related to the out-of-vocabulary word. To obtain the degree of correlation between the context words and each sense of the out-of-vocabulary word, and thereby determine the weight of each sense reasonably, in step S112 this application obtains the first-class distance between the context words and each sense, where the first-class distance can be the cosine distance, the Euclidean distance, etc. between the context words and the sense.
Fig. 4 be the embodiment of the present application shown in a kind of collection based on multi-semantic meaning outside word treatment method step S112 process Figure.
In an optional embodiment, as shown in Fig. 4, step S112 comprises the following steps:
Step S1121: obtain the cosine distance between each token of the context words and each sememe of each sense.
Illustratively, the senses and sememes of "apple" are:
apple 3: [sense 1] 5 sememes: carry, pattern value, specific brand, computer, able; [sense 2] 1 sememe: fruit; [sense 3] 3 sememes: tree, fruit, reproduce
With the window value C=1, the context words of "apple" comprise the tokens: one, computer.
Obtain the cosine distance between the context word "one" and each sememe of the first sense, denoted COS(context word, sememe):
COS(one, carry), COS(one, pattern value), COS(one, specific brand), COS(one, computer), COS(one, able)
Obtain the cosine distance between the context word "computer" and each sememe of the first sense:
COS(computer, carry), COS(computer, pattern value), COS(computer, specific brand), COS(computer, computer), COS(computer, able)
Obtain the cosine distance between the context word "one" and each sememe of the second sense:
COS(one, fruit)
Obtain the cosine distance between the context word "computer" and each sememe of the second sense:
COS(computer, fruit)
Obtain the cosine distance between the context word "one" and each sememe of the third sense:
COS(one, tree), COS(one, fruit), COS(one, reproduce)
Obtain the cosine distance between the context word "computer" and each sememe of the third sense:
COS(computer, tree), COS(computer, fruit), COS(computer, reproduce)
Step S1122: according to the cosine distances, obtain the average distance between each token of the context words and all the sememes of each sense.
Illustratively, denoting the average distance of step S1122 by Da: the number of context words of "apple" is 2 and the number of senses of "apple" is 3, so 6 (2 × 3) distances Da are obtained in total:
Da(one, sense 1) = [COS(one, carry) + COS(one, pattern value) + COS(one, specific brand) + COS(one, computer) + COS(one, able)] ÷ 5
Da(computer, sense 1) = [COS(computer, carry) + COS(computer, pattern value) + COS(computer, specific brand) + COS(computer, computer) + COS(computer, able)] ÷ 5
Da(one, sense 2) = COS(one, fruit)
Da(computer, sense 2) = COS(computer, fruit)
Da(one, sense 3) = [COS(one, tree) + COS(one, fruit) + COS(one, reproduce)] ÷ 3
Da(computer, sense 3) = [COS(computer, tree) + COS(computer, fruit) + COS(computer, reproduce)] ÷ 3
Step S1123: obtain the first-class distance between the context words and each sense according to the average distances.
In the embodiment of this application, the context words comprise multiple tokens, and the first-class distance between the context words and a sense is the average of those tokens' distances Da to that sense.
Illustratively:
First-class distance between the context words and the first sense: D1 = [Da(one, sense 1) + Da(computer, sense 1)] ÷ 2
First-class distance between the context words and the second sense: D2 = [Da(one, sense 2) + Da(computer, sense 2)] ÷ 2
First-class distance between the context words and the third sense: D3 = [Da(one, sense 3) + Da(computer, sense 3)] ÷ 2
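Steps S1121 to S1123 can be sketched as nested averages of sememe-level cosine similarities; the function names and toy vectors below are assumptions for illustration, not the patent's code.

```python
import math

def cos_sim(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def first_class_distance(context_vecs, sense_sememe_vecs):
    """Average over context words of the average cosine similarity between
    the context word and each sememe of the sense (steps S1121 to S1123)."""
    per_context = []
    for cv in context_vecs:
        # Da: average over this sense's sememes for one context word (S1122)
        da = sum(cos_sim(cv, sv) for sv in sense_sememe_vecs) / len(sense_sememe_vecs)
        per_context.append(da)
    # D: average over all context words (S1123)
    return sum(per_context) / len(per_context)
```

Calling this once per sense yields the list D1, D2, ..., Dn used in step S113.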
Step S113: calculate the weight of each sense according to the first-class distances.
In this application the first-class distance is computed from cosine distances: the higher its value, the more closely the context words relate to the sense, and the higher the weight should correspondingly be. The value of the first-class distance is therefore positively correlated with the value of the sense weight.
Based on this positive correlation, in an optional embodiment the weight of each sense of the out-of-vocabulary word is calculated with the following formula:
Wm = Dm ÷ (D1 + D2 + ... + Dn)
where n is the number of senses of the out-of-vocabulary word, Wm is the weight of the m-th sense of the out-of-vocabulary word, Dm is the first-class distance between the context words and the m-th sense, and the denominator is the sum of the first-class distances of all senses of the out-of-vocabulary word.
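The normalisation of step S113 can be sketched as follows (a hypothetical helper, not the patent's code):

```python
def sense_weights(distances):
    """Wm = Dm / sum(Di): each sense's weight is its first-class distance
    normalised by the sum over all senses (step S113)."""
    total = sum(distances)
    return [d / total for d in distances]
```

By construction the weights sum to 1, so the later weighted sum of sense vectors is a convex combination.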
Step S120: generate the sense vector of each sense according to the word vectors of the sememes in the sense.
Fig. 5 is a flow chart of step S120 of the multi-semantic-based out-of-vocabulary word processing method according to an embodiment of the present application.
In an optional embodiment, as shown in Fig. 5, step S120 comprises the following steps:
Step S121: obtain the sememe word vector of each sememe in each sense of the out-of-vocabulary word.
Illustratively, the out-of-vocabulary word "apple" has 3 senses. In step S121 the sememe word vectors of each of these 3 senses need to be obtained, namely: the sememe word vectors T11 to T15 of sense 1, the sememe word vector T21 of sense 2, and the sememe word vectors T31 to T33 of sense 3.
Step S122: set a sememe weight for each sememe in each sense according to the number of sememes in the sense;

In the embodiments of the present application, the sememe weight is determined by the number of sememes in the sense: the more sememes a sense contains, the smaller the weight shared by each sememe, so that the weight reflects each sememe's contribution to the sense.
In one optional embodiment, within each sense, the sememe weights of all sememes may be identical and obtained using the following formula:

Wp = 1/x

where Wp is the sememe weight and x is the number of sememes in the sense.
Illustratively, the sememe weight of the sememe word vectors T11~T15 is W1 = 1/5;

the sememe weight of the sememe word vector T21 is W2 = 1;

the sememe weight of the sememe word vectors T31~T33 is W3 = 1/3.
Step S123: perform a weighted summation of the sememe word vectors in each sense according to the sememe weights, to generate the semantic vector of each sense.

Step S123 obtains the semantic vector of each sense using the following formula:

Ti = Ti1 × Wi1 + Ti2 × Wi2 + … + Tin × Win

where Ti is the semantic vector of the i-th sense, n is the number of sememes in the i-th sense, Tij is the sememe word vector of the j-th sememe in the i-th sense, and Wij is the sememe weight of the j-th sememe in the i-th sense.
Illustratively, according to the sememe word vectors T11~T15 of the first sense of the out-of-vocabulary word "apple" obtained in steps S121 and S122, together with the sememe weights W11~W15, the semantic vector of the first sense of "apple" is calculated as:

T1 = T11 × W11 + T12 × W12 + T13 × W13 + T14 × W14 + T15 × W15

In the present application, Tij may be a low-dimensional vector of the distributed representation type, for example with dimension m = 50 or m = 100.
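Steps S121 to S123 can be sketched as follows; since every sememe in a sense shares the same weight Wp = 1/x, the weighted summation reduces to the mean of the sense's sememe word vectors (the function name `semantic_vector` is an illustrative assumption):

```python
import numpy as np

def semantic_vector(sememe_vecs):
    # Ti = sum_j Tij * Wij with Wij = 1/x for every sememe j,
    # i.e. the mean of the sense's sememe word vectors.
    w = 1.0 / len(sememe_vecs)  # Wp = 1/x
    return sum(w * np.asarray(v, dtype=float) for v in sememe_vecs)
```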
Step S130: perform a weighted summation of the semantic vectors of the senses according to the sense weights, to generate a simulated word vector.

In the embodiments of the present application, step S110 and step S120 respectively obtain the weight of each sense of the out-of-vocabulary word and the semantic vector of each sense. In step S130, by weighting the semantic vectors with the sense weights and summing them, a single simulated word vector that fuses the multiple senses of the out-of-vocabulary word can be generated.
Illustratively, from the semantic vectors T1~T3 generated in steps S110 and S120 for the out-of-vocabulary word "apple", together with the sense weights W1~W3, the simulated word vector Tout is generated by weighted summation:

Tout = T1 × W1 + T2 × W2 + T3 × W3
As can be seen from the above formula, the simulated word vector Tout is based on the multiple senses of the out-of-vocabulary word, fused with different weights according to the semantic correlation between the out-of-vocabulary word and its context words. The simulated word vector generated by the embodiments of the present application therefore matches the meaning of the sentence while also taking the other senses of the out-of-vocabulary word into account, so that the meaning expressed by the simulated word vector is richer and fuller and adapts to a wider range of semantic environments. Accordingly, when the simulated word vector generated by the embodiments of the present application is used in an intelligent question-answering system, responses are more strongly associated with the question, answer accuracy is improved, the system can adapt to richer conversation contexts and appear more intelligent, user satisfaction is greatly improved, and the out-of-vocabulary word problem of the prior art is solved.
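The final fusion of step S130 can be sketched in the same style; the function name is an illustrative assumption, and the semantic vectors and sense weights would come from the preceding steps:

```python
import numpy as np

def simulated_word_vector(semantic_vectors, sense_weights):
    # Tout = T1*W1 + T2*W2 + ... + Tn*Wn: weighted sum of the
    # per-sense semantic vectors using the sense weights.
    return sum(w * np.asarray(t, dtype=float)
               for t, w in zip(semantic_vectors, sense_weights))
```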
In summary, the embodiments of the present application provide a multi-sense based out-of-vocabulary word processing method, comprising: obtaining the weight of each sense of an out-of-vocabulary word according to the context words of the out-of-vocabulary word in a sentence, wherein the context words comprise at least one preceding segmented word and at least one following segmented word of the out-of-vocabulary word in the sentence; generating a semantic vector for each sense according to the sememe word vectors of the sense; and performing a weighted summation of the semantic vectors of the senses according to the sense weights, to generate a simulated word vector. The simulated word vector generated by the embodiments of the present application is based on the multiple senses of the out-of-vocabulary word and is fused with different weights according to the semantic correlation between the out-of-vocabulary word and the context words; it matches the meaning of the sentence while taking the other senses of the out-of-vocabulary word into account, so that the meaning it expresses is richer and fuller and adapts to more varied semantic environments. Therefore, when the simulated word vector generated by the embodiments of the present application is used in an intelligent question-answering system, responses are more relevant to the question, answer accuracy is improved, richer conversation contexts are supported, the system appears more intelligent, user satisfaction is greatly improved, and the out-of-vocabulary word problem of the prior art is solved.
Embodiment Two
The embodiments of the present application provide an intelligent question-answering method that applies the multi-sense based out-of-vocabulary word processing method provided in Embodiment One. Fig. 6 is a flowchart of the intelligent question-answering method according to an embodiment of the present application. As shown in Fig. 6, the method comprises the following steps:
Step S210: obtain an out-of-vocabulary word from the word segmentation result of an unknown question;

An intelligent question-answering system acquires its answering capability through training on a training corpus. During training, the system generates, from the known segmented words, a word-vector space used to express the word vectors of those known words. When a user poses a question to the trained intelligent question-answering system, the system segments the unknown question according to preset word segmentation rules, and from the segmentation result identifies out-of-vocabulary words that are not present in the word-vector space.

Because an out-of-vocabulary word does not exist in the word-vector space of the intelligent question-answering system, it cannot be matched to a corresponding word vector; as a result, when the system encounters an out-of-vocabulary word, it cannot match an accurate response.
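Identifying the out-of-vocabulary words of step S210 amounts to a membership test against the trained word-vector space. A minimal sketch, assuming (as an illustration, not part of the patent) that the vector space is exposed as a dict-like vocabulary:

```python
def find_oov_words(segmented_question, word_vector_space):
    # Any segmented word with no entry in the trained word-vector
    # space is an out-of-vocabulary word (step S210).
    return [w for w in segmented_question if w not in word_vector_space]
```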
Step S220: generate the simulated word vector of the out-of-vocabulary word based on the multiple senses of the out-of-vocabulary word;

In step S220, the multi-sense based out-of-vocabulary word processing method provided in Embodiment One of the present application is used to generate a simulated word vector for the out-of-vocabulary word obtained in step S210;

Step S230: match a question answer from a trained question-answering model according to the simulated word vector and the word vectors of the remaining segmented words of the question.
In summary, the embodiments of the present application provide an intelligent question-answering method, comprising: obtaining an out-of-vocabulary word from the word segmentation result of an unknown question; generating the simulated word vector of the out-of-vocabulary word based on its multiple senses; and matching a question answer from a trained question-answering model according to the simulated word vector and the word vectors of the remaining segmented words of the question. When the intelligent question-answering method provided by the embodiments of the present application encounters an out-of-vocabulary word in an unknown question, it generates a simulated word vector for that word based on its multiple senses, applying the multi-sense based out-of-vocabulary word processing method provided by the present application. When the intelligent question-answering system generates a response, the response is therefore more relevant to the question, answer accuracy is improved, richer conversation contexts are supported, the system appears more intelligent, user satisfaction is greatly improved, and the out-of-vocabulary word problem of the prior art is solved.
Embodiment Three
The embodiments of the present application provide a multi-sense based out-of-vocabulary word processing apparatus. Fig. 7 is a block diagram of a multi-sense based out-of-vocabulary word processing apparatus according to an embodiment of the present application. As shown in Fig. 7, the apparatus comprises:

a sense weight obtaining unit 310, configured to obtain the weight of each sense of an out-of-vocabulary word according to the context words of the out-of-vocabulary word in a sentence;

a semantic vector generation unit 320, configured to generate the semantic vector of each sense according to the sememe word vectors of the sememes in the sense;

a simulated word vector generation unit 330, configured to perform a weighted summation of the semantic vectors of the senses according to the sense weights, to generate a simulated word vector.
In summary, the embodiments of the present application provide a multi-sense based out-of-vocabulary word processing apparatus, which obtains the weight of each sense of an out-of-vocabulary word according to the context words of the out-of-vocabulary word in a sentence, the context words comprising at least one preceding segmented word and at least one following segmented word of the out-of-vocabulary word in the sentence; generates a semantic vector for each sense according to the sememe word vectors of the sense; and performs a weighted summation of the semantic vectors of the senses according to the sense weights to generate a simulated word vector. The simulated word vector is based on the multiple senses of the out-of-vocabulary word and is fused with different weights according to the semantic correlation between the out-of-vocabulary word and the context words; it matches the meaning of the sentence while taking the other senses of the out-of-vocabulary word into account, so that the meaning it expresses is richer and fuller and adapts to more varied semantic environments. Therefore, when the simulated word vector is used in an intelligent question-answering system, responses are more relevant to the question, answer accuracy is improved, richer conversation contexts are supported, the system appears more intelligent, user satisfaction is greatly improved, and the out-of-vocabulary word problem of the prior art is solved.
Embodiment Four
The embodiments of the present application provide an intelligent question-answering apparatus that applies the multi-sense based out-of-vocabulary word processing method provided in Embodiment One. Fig. 8 is a block diagram of an intelligent question-answering apparatus according to an embodiment of the present application. As shown in Fig. 8, the apparatus comprises:

an out-of-vocabulary word obtaining unit 410, configured to obtain an out-of-vocabulary word from the word segmentation result of an unknown question;

an out-of-vocabulary word processing unit 420, configured to generate the simulated word vector of the out-of-vocabulary word based on the multiple senses of the out-of-vocabulary word;

an answering unit 430, configured to match a question answer from a trained question-answering model according to the simulated word vector and the word vectors of the remaining segmented words of the question.
In summary, the embodiments of the present application provide an intelligent question-answering apparatus, which obtains an out-of-vocabulary word from the word segmentation result of an unknown question; generates the simulated word vector of the out-of-vocabulary word based on its multiple senses; and matches a question answer from a trained question-answering model according to the simulated word vector and the word vectors of the remaining segmented words of the question. When the apparatus encounters an out-of-vocabulary word in an unknown question, it generates a simulated word vector for that word based on its multiple senses, applying the multi-sense based out-of-vocabulary word processing method provided by the present application. When the intelligent question-answering system generates a response, the response is therefore more relevant to the question, answer accuracy is improved, richer conversation contexts are supported, the system appears more intelligent, user satisfaction is greatly improved, and the out-of-vocabulary word problem of the prior art is solved.
The present application can be used in numerous general-purpose or special-purpose computing system environments or configurations, for example: personal computers, server computers, handheld or portable devices, laptop devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics devices, network PCs, minicomputers, mainframe computers, and distributed computing environments including any of the above systems or devices.

The present application may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. The present application may also be practiced in distributed computing environments, where tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.

It should be noted that, in this document, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device.
Other embodiments of the application will readily occur to those skilled in the art after considering the specification and practicing the application disclosed herein. This application is intended to cover any variations, uses, or adaptations of the application that follow its general principles and include common knowledge or conventional techniques in the art not disclosed herein. The specification and examples are to be regarded as illustrative only, with the true scope and spirit of the application being indicated by the following claims.

It should be understood that the application is not limited to the precise structure described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope. The scope of the application is limited only by the appended claims.

Claims (10)

1. A multi-sense based out-of-vocabulary word processing method, characterized by comprising:

obtaining the weight of each sense of an out-of-vocabulary word according to the context words of the out-of-vocabulary word in a sentence, wherein the context words comprise at least one preceding segmented word and at least one following segmented word of the out-of-vocabulary word in the sentence;

generating a semantic vector of each sense according to the sememe word vectors of the sememes in the sense; and

performing a weighted summation of the semantic vectors of the senses according to the weights of the senses, to generate a simulated word vector.
2. The method according to claim 1, characterized in that the step of obtaining the weight of each sense of the out-of-vocabulary word according to the context words of the out-of-vocabulary word in the sentence comprises:

obtaining the context words of the out-of-vocabulary word in the sentence;

obtaining a first-class distance between the context words and each sense; and

calculating the weight of each sense according to the first-class distances.
3. The method according to claim 2, characterized in that the step of obtaining the first-class distance between the context words and each sense comprises:

obtaining the cosine distance between each segmented word of the context words and each sememe in each sense;

obtaining, according to the cosine distances, the average distance between each segmented word of the context words and all sememes in each sense; and

obtaining the first-class distance between the context words and each sense according to the average distances.
4. The method according to claim 2, characterized in that the step of calculating the weight of each sense according to the first-class distance uses the following formula:

Wm = Dm ÷ (D1 + D2 + … + Dn)

where n is the number of senses of the out-of-vocabulary word, Wm is the weight of the m-th sense of the out-of-vocabulary word, Dm is the first-class distance between the context words and the m-th sense of the out-of-vocabulary word, and D1 + D2 + … + Dn is the sum of the first-class distances of all senses of the out-of-vocabulary word.
5. The method according to claim 1, characterized in that the step of generating the semantic vector of each sense according to the sememe word vectors comprises:

obtaining the sememe word vector of each sememe in each sense of the out-of-vocabulary word;

setting a sememe weight for each sememe in each sense according to the number of sememes in the sense; and

performing a weighted summation of the sememe word vectors in each sense according to the sememe weights, to generate the semantic vector of each sense.
6. The method according to claim 2, characterized in that the step of obtaining the context words of the out-of-vocabulary word in the sentence comprises:

setting a word window value C for constraining the number of context words, C being an integer greater than or equal to 1;

obtaining the context words from the segmented words of the sentence containing the out-of-vocabulary word according to the word window value C;

wherein the context words comprise the C segmented words preceding the out-of-vocabulary word and the C segmented words following the out-of-vocabulary word in the sentence.
7. The method according to claim 5, characterized in that the step of setting a sememe weight for each sememe in each sense according to the number of sememes in the sense uses the following formula:

Wp = 1/x

where Wp is the sememe weight and x is the number of sememes in the sense.
8. The method according to any one of claims 1-7, characterized by further comprising:

obtaining an out-of-vocabulary word from the word segmentation result of an unknown question;

generating the simulated word vector of the out-of-vocabulary word based on the multiple senses of the out-of-vocabulary word; and

matching a question answer from a trained question-answering model according to the simulated word vector and the word vectors of the remaining segmented words of the question.
9. A multi-sense based out-of-vocabulary word processing apparatus, characterized by comprising:

a sense weight obtaining unit, configured to obtain the weight of each sense of an out-of-vocabulary word according to the context words of the out-of-vocabulary word in a sentence, wherein the context words comprise at least one preceding segmented word and at least one following segmented word of the out-of-vocabulary word in the sentence;

a semantic vector generation unit, configured to generate the semantic vector of each sense according to the sememe word vectors of the sememes in the sense; and

a simulated word vector generation unit, configured to perform a weighted summation of the semantic vectors of the senses according to the sense weights, to generate a simulated word vector.
10. The apparatus according to claim 9, characterized by further comprising:

an out-of-vocabulary word obtaining unit, configured to obtain an out-of-vocabulary word from the word segmentation result of an unknown question;

an out-of-vocabulary word processing unit, configured to generate the simulated word vector of the out-of-vocabulary word based on the multiple senses of the out-of-vocabulary word; and

an answering unit, configured to match a question answer from a trained question-answering model according to the simulated word vector and the word vectors of the remaining segmented words of the question.
CN201811498210.2A 2018-06-01 2018-12-07 Method and device for processing foreign words in set based on multiple semantics Active CN109614618B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810556386.2A CN108763217A (en) 2018-06-01 2018-06-01 Multi-sense based out-of-vocabulary word processing method, intelligent question-answering method and device
CN2018105563862 2018-06-01

Publications (2)

Publication Number Publication Date
CN109614618A true CN109614618A (en) 2019-04-12
CN109614618B CN109614618B (en) 2023-07-14

Family

ID=64001970

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201810556386.2A Pending CN108763217A (en) 2018-06-01 2018-06-01 Multi-sense based out-of-vocabulary word processing method, intelligent question-answering method and device
CN201811498210.2A Active CN109614618B (en) 2018-06-01 2018-12-07 Method and device for processing foreign words in set based on multiple semantics

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN201810556386.2A Pending CN108763217A (en) 2018-06-01 2018-06-01 Multi-sense based out-of-vocabulary word processing method, intelligent question-answering method and device

Country Status (1)

Country Link
CN (2) CN108763217A (en)


Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110727769B (en) 2018-06-29 2024-04-19 阿里巴巴(中国)有限公司 Corpus generation method and device and man-machine interaction processing method and device
CN109740162B (en) * 2019-01-09 2023-07-11 安徽省泰岳祥升软件有限公司 Text representation method, device and medium
CN109740163A (en) * 2019-01-09 2019-05-10 安徽省泰岳祥升软件有限公司 Semantic representation resource generation method and device applied to deep learning model
CN110147446A (en) * 2019-04-19 2019-08-20 中国地质大学(武汉) Word embedding method based on a two-layer attention mechanism, device and storage device
CN111125333B (en) * 2019-06-06 2022-05-27 北京理工大学 Generation type knowledge question-answering method based on expression learning and multi-layer covering mechanism


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017177901A1 (en) * 2016-04-12 2017-10-19 芋头科技(杭州)有限公司 Semantic matching method and smart device
CN107798140A (en) * 2017-11-23 2018-03-13 北京神州泰岳软件股份有限公司 Dialogue system construction method, semantically controlled answering method and device
CN108038105A (en) * 2017-12-22 2018-05-15 中科鼎富(北京)科技发展有限公司 Method and device for generating simulated word vectors for unregistered words

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
TANG GONGBO et al.: "Unsupervised word sense disambiguation method based on HowNet sememe word vector representations", Journal of Chinese Information Processing *
WANG JINGZHONG et al.: "Long-phrase text similarity computation based on multi-predicate semantic frames", Computer Engineering and Design *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112036163A (en) * 2020-08-28 2020-12-04 南京航空航天大学 Method for processing out-of-set words in electric power plan text sequence labeling
CN113486142A (en) * 2021-04-16 2021-10-08 华为技术有限公司 Semantic-based word semantic prediction method and computer equipment
CN113254616A (en) * 2021-06-07 2021-08-13 佰聆数据股份有限公司 Intelligent question-answering system-oriented sentence vector generation method and system
CN113254616B (en) * 2021-06-07 2021-10-19 佰聆数据股份有限公司 Intelligent question-answering system-oriented sentence vector generation method and system
CN113468308A (en) * 2021-06-30 2021-10-01 竹间智能科技(上海)有限公司 Conversation behavior classification method and device and electronic equipment
CN113468308B (en) * 2021-06-30 2023-02-10 竹间智能科技(上海)有限公司 Conversation behavior classification method and device and electronic equipment

Also Published As

Publication number Publication date
CN108763217A (en) 2018-11-06
CN109614618B (en) 2023-07-14

Similar Documents

Publication Publication Date Title
CN109614618A (en) Multi-semantic-based extraset word processing method and device
US11775760B2 (en) Man-machine conversation method, electronic device, and computer-readable medium
CN110427463B (en) Search statement response method and device, server and storage medium
CN110489755A (en) Document creation method and device
CN111291549B (en) Text processing method and device, storage medium and electronic equipment
CN107329995A (en) A kind of controlled answer generation method of semanteme, apparatus and system
Park et al. Systematic review on chatbot techniques and applications
CN110245253B (en) Semantic interaction method and system based on environmental information
CN109635294A (en) Single-semantic-based unknown word processing method, intelligent question answering method and device
CN112749556B (en) Multi-language model training method and device, storage medium and electronic equipment
CN116541493A (en) Interactive response method, device, equipment and storage medium based on intention recognition
CN110895656A (en) Text similarity calculation method and device, electronic equipment and storage medium
Kowsher et al. Knowledge-base optimization to reduce the response time of bangla chatbot
Liang et al. Intelligent chat robot in digital campus based on deep learning
CN113342924A (en) Answer retrieval method and device, storage medium and electronic equipment
CN115408500A (en) Question-answer consistency evaluation method and device, electronic equipment and medium
Sonia et al. Automatic question-answer generation from video lecture using neural machine translation
JP2018010481A (en) Deep case analyzer, deep case learning device, deep case estimation device, method, and program
Senthilnayaki et al. Crop Yield Management System Using Machine Learning Techniques
Huang Opinion Mining Algorithm Based on the Evaluation of Online Mathematics Course with Python
Shawar et al. Chatbots: Can they serve as natural language interfaces to QA corpus?
WO2023166747A1 (en) Training data generation device, training data generation method, and program
WO2023166746A1 (en) Summary generation device, summary model learning device, summary generation method, summary model learning method, and program
Dang English-Speaking Learning Strategies in University Based on Artificial Intelligence
Ong et al. Chatbot application training using natural language processing techniques: Case of small-scale agriculture

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant