CN110489538A

CN110489538A - AI-based sentence response method and device, and electronic equipment

Info

Publication number: CN110489538A
Application number: CN201910797093.8A
Authority: CN
Inventors: Zhang Qianwen (张倩汶); Rao Mengliang (饶孟良); Yan Zhao (闫昭); Cao Yunbo (曹云波)
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2019-08-27
Filing date: 2019-08-27
Publication date: 2019-11-22
Anticipated expiration: 2039-08-27
Also published as: CN110489538B

Abstract

The present invention provides an AI-based sentence response method and device, electronic equipment, and a storage medium. The method includes: obtaining a user question and identifying the entity words in the user question; determining a target corpus entry in a corpus according to the user question, and taking the semantics of the target corpus entry as the sentence-granularity semantics of the user question; determining the semantic attributes corresponding to the entity words, and arranging the entity words according to the progressive relationship corresponding to the semantic attributes, to obtain the word-granularity semantics of the user question; determining the output semantics of the user question according to the sentence-granularity semantics and the word-granularity semantics; querying a preset knowledge graph according to the output semantics to obtain a semantic result; and generating a response sentence according to the semantic result. The dual-granularity mechanism of the invention improves semantic generalization, broadens applicability to different user questions, and enables correct responses.

Description

AI-based sentence response method and device, and electronic equipment

Technical field

The present invention relates to artificial intelligence technology, and in particular to an AI-based sentence response method and device, electronic equipment, and a storage medium.

Background

Artificial intelligence (AI) is the theory, method, technology, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain optimal results. Natural language processing (NLP) is an important direction of artificial intelligence. It studies the theories and methods that enable effective communication between humans and computers in natural language, and is a science that integrates linguistics, computer science, and mathematics.

Sentence response is an important branch of natural language processing. It refers to the process of converting a user question into a logical form that a computer can understand and then responding according to stored information. In related-art solutions, a classification model is usually trained on a large amount of high-quality corpus data; the classification model then performs semantic parsing on the user question, and a response is given according to the parsing result. However, for user questions that contain noise or are more complex, the classification model struggles to extract the core semantics, so the system may fail to reply, or may give a response sentence that does not match the user question. In short, related-art solutions adapt poorly to different user questions, and their response accuracy is low.

Summary of the invention

Embodiments of the present invention provide an AI-based sentence response method and device, electronic equipment, and a storage medium, which improve applicability to different user questions and the accuracy of responses.

The technical solutions of the embodiments of the present invention are implemented as follows:

An embodiment of the present invention provides an AI-based sentence response method, including:

Obtaining a user question, and identifying the entity words in the user question;

Determining a target corpus entry in a corpus according to the user question, and taking the semantics of the target corpus entry as the sentence-granularity semantics of the user question;

Determining the semantic attributes corresponding to the entity words, and arranging the entity words according to the progressive relationship corresponding to the semantic attributes, to obtain the word-granularity semantics of the user question;

Determining the output semantics of the user question according to the sentence-granularity semantics and the word-granularity semantics;

Querying a preset knowledge graph according to the output semantics to obtain a semantic result;

Generating a response sentence according to the semantic result.

In the above solution, identifying the entity words in the user question includes:

Performing entity recognition on the user question with a named entity recognition model to obtain a first recognition result;

Performing string matching on the user question according to a preset dictionary to obtain a second recognition result;

Merging and deduplicating the first recognition result and the second recognition result to obtain the entity words.

In the above solution, querying the preset knowledge graph according to the output semantics to obtain the semantic result includes:

When the output semantics corresponds to at least two semantic attributes, querying the knowledge graph for each semantic attribute in turn, in the order in which the semantic attributes appear in the output semantics, to obtain an attribute result for each semantic attribute;

Combining the attribute results into the semantic result.
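The sequential query for a complex question can be sketched in Python as below. This is a minimal sketch under assumptions the source does not state: the knowledge graph is a toy in-memory dictionary of hypothetical "entity-attribute-attribute value" triples, and the value found for one attribute is used as the subject of the next query, which is one plausible reading of querying the attributes "in turn". The function name and data are illustrative only.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy knowledge graph: (entity, attribute) -> attribute value.
KnowledgeGraph = Dict[Tuple[str, str], str]


def query_in_order(
    kg: KnowledgeGraph, subject: str, attributes: List[str]
) -> Optional[List[Tuple[str, str]]]:
    """Query each semantic attribute in the order it appears in the output semantics.

    The value found for one attribute becomes the subject of the next query,
    so ["wife", "age"] resolves "the age of Zhang San's wife".
    Returns the combined semantic result as (attribute, attribute result) pairs,
    or None if any hop has no matching knowledge.
    """
    semantic_result: List[Tuple[str, str]] = []
    current = subject
    for attribute in attributes:
        value = kg.get((current, attribute))
        if value is None:
            return None  # the chain is broken: no knowledge for this hop
        semantic_result.append((attribute, value))
        current = value
    return semantic_result


if __name__ == "__main__":
    # Hypothetical data for illustration only.
    toy_kg: KnowledgeGraph = {
        ("Zhang San", "wife"): "Wang Wu",
        ("Wang Wu", "age"): "30",
    }
    print(query_in_order(toy_kg, "Zhang San", ["wife", "age"]))
```

Keeping every intermediate (attribute, result) pair, rather than only the final value, mirrors the step of combining the per-attribute results into one semantic result that the response sentence can later draw on.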

In the above solution, obtaining the user question includes:

Obtaining user speech;

Performing speech recognition on the user speech to obtain the user question.

In the above solution, the method further includes:

Identifying preset symbols in each corpus entry of the corpus, and deleting the preset symbols;

Converting all letters in the corpus entries to uppercase, or all to lowercase;

Converting all Chinese characters in the corpus entries to traditional Chinese, or all to simplified Chinese.

An embodiment of the present invention provides an AI-based sentence response device, including:

An identification module, configured to obtain a user question and identify the entity words in the user question;

A sentence-granularity processing module, configured to determine a target corpus entry in a corpus according to the user question, and take the semantics of the target corpus entry as the sentence-granularity semantics of the user question;

A word-granularity processing module, configured to determine the semantic attributes corresponding to the entity words, and arrange the entity words according to the progressive relationship corresponding to the semantic attributes, to obtain the word-granularity semantics of the user question;

A semantic output module, configured to determine the output semantics of the user question according to the sentence-granularity semantics and the word-granularity semantics;

A result query module, configured to query a preset knowledge graph according to the output semantics to obtain a semantic result;

A sentence generation module, configured to generate a response sentence according to the semantic result.

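In the above solution, the identification module is further configured to:

Perform entity recognition on the user question with a named entity recognition model to obtain a first recognition result; perform string matching on the user question according to a preset dictionary to obtain a second recognition result; and merge and deduplicate the first and second recognition results to obtain the entity words.

In the above solution, the result query module is further configured to:

When the output semantics corresponds to at least two semantic attributes, query the knowledge graph for each semantic attribute in turn, in the order in which the semantic attributes appear in the output semantics, to obtain an attribute result for each semantic attribute;

Combine the attribute results into the semantic result.

In the above solution, the identification module is further configured to:

Obtain user speech;

Perform speech recognition on the user speech to obtain the user question.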

In the above solution, the AI-based sentence response device further includes:

A removal module, configured to identify preset symbols in each corpus entry of the corpus and delete the preset symbols;

A first conversion module, configured to convert all letters in the corpus entries to uppercase, or all to lowercase;

A second conversion module, configured to convert all Chinese characters in the corpus entries to traditional Chinese, or all to simplified Chinese.

An embodiment of the present invention provides electronic equipment, including:

A memory, configured to store executable instructions;

A processor, configured to implement the AI-based sentence response method provided by the embodiments of the present invention when executing the executable instructions stored in the memory.

An embodiment of the present invention provides a storage medium storing executable instructions which, when executed by a processor, cause the processor to implement the AI-based sentence response method provided by the embodiments of the present invention.

The embodiments of the present invention have the following beneficial effects:

For an obtained user question, the embodiments determine a target corpus entry in the corpus according to the user question, and thereby the sentence-granularity semantics; they also obtain the word-granularity semantics from the semantic attributes of the entity words in the user question. The two are combined to determine the final output semantics, and the response is given according to the output semantics. This dual-granularity mechanism improves applicability to different user questions, including both simple and complex questions, and improves response accuracy.

Brief description of the drawings

Fig. 1 is an optional architecture diagram of the AI-based sentence response system provided by an embodiment of the present invention;

Fig. 2 is an optional structural diagram of the server provided by an embodiment of the present invention;

Fig. 3 is an optional structural diagram of the AI-based sentence response device provided by an embodiment of the present invention;

Fig. 4A is an optional flowchart of the AI-based sentence response method provided by an embodiment of the present invention;

Fig. 4B is another optional flowchart of the AI-based sentence response method provided by an embodiment of the present invention;

Fig. 5 is an optional structural diagram of the BERT model provided by an embodiment of the present invention;

Fig. 6 is an optional flowchart of sentence response provided by an embodiment of the present invention;

Fig. 7 is a flowchart of a related-art sentence question-answering solution, as described in an embodiment of the present invention;

Fig. 8 is another flowchart of the AI-based sentence response method provided by an embodiment of the present invention;

Fig. 9 is a comparison diagram of response scenarios provided by an embodiment of the present invention;

Fig. 10 is another comparison diagram of response scenarios provided by an embodiment of the present invention;

Fig. 11 is a diagram of a response scenario for a complex question provided by an embodiment of the present invention.

Detailed description of embodiments

To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings. The described embodiments should not be construed as limiting the present invention; all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the protection scope of the present invention.

In the following description, "some embodiments" describes subsets of all possible embodiments. It should be understood that "some embodiments" may refer to the same subset or to different subsets of all possible embodiments, and these may be combined with each other where no conflict arises.

In the following description, the terms "first/second/third" are used only to distinguish similar objects and do not denote a particular ordering of those objects. Where permitted, "first/second/third" may be interchanged in specific order or sequence, so that the embodiments of the present invention described herein can be implemented in an order other than that illustrated or described herein.

Unless otherwise defined, all technical and scientific terms used herein have the same meanings as commonly understood by a person skilled in the technical field of the present invention. The terms used herein are only for describing the embodiments of the present invention and are not intended to limit the present invention.

Before the embodiments of the present invention are described in further detail, the nouns and terms involved in the embodiments are explained; they are used with the following meanings.

1) Entity word: also called a named entity; an entity with a specific meaning in a sentence, such as a person name, place name, organization name, or proper noun. The entity words in a sentence are usually determined by named entity recognition (NER).

2) Semantic parsing: the process of converting natural language into a logical form that a machine can understand.

3) ES (Elasticsearch): a Lucene-based search server that provides a distributed, real-time analytic search engine; herein it is used to determine the candidate corpus entries relevant to a user question. Lucene is a toolkit for full-text search and search engines.

4) Simple question: a user question containing one subject and one attribute, such as "the age of Zhang San".

5) Complex question: a user question containing at least two subjects and one attribute, such as "the age of Zhang San's wife".

6) Cold start: the stage at which a product or new feature has just launched and faces the difficulties of validating market demand, missing data, and missing users.

7) Grammatical attribute: includes at least one of the following: the named-entity attribute of the entity word (such as person name, organization name, or university name), the part of speech of the entity word (such as noun, verb, or preposition), and the syntactic role of the entity word in the user question (such as subject, predicate, or object).

8) Semantic attribute: a defined text with a particular meaning, such as "hometown" and "wife".

9) Knowledge graph: describes the entities and concepts that exist in the real world and the relationships between them, and is usually built from structures such as "entity-relation-entity" and "entity-attribute-attribute value". Each "entity-relation-entity" triple is one piece of knowledge in the knowledge graph, and likewise each "entity-attribute-attribute value" triple.

10) Corpus entry: a user question used as linguistic data.

While implementing the embodiments of the present invention, the inventors found that in related-art sentence response solutions, a classification model is usually trained on corpus entries mined online or written manually; the trained model then performs semantic parsing on the user question, and a response is given according to the parsing result. This approach has the following drawbacks: (1) during the cold-start phase of a product or new feature, the number of corpus entries that can be mined online is very limited, while writing corpus entries manually is labor-intensive and time-consuming; (2) the semantic generalization of the trained model is limited, so it struggles to locate the core semantics of noisy user questions; for example, if a question asking for Zhang San's hometown contains the noise word "people" (人), the model tends to parse "people" as a meaningful word; (3) the trained model handles complex questions poorly; for a question such as "Which university did Li Si's wife graduate from?", the model often fails to produce the correct semantics.

Embodiments of the present invention provide an AI-based sentence response method and device, electronic equipment, and a storage medium that improve applicability to different user questions and the accuracy of responses. Exemplary applications of the electronic equipment provided by the embodiments are described below.

Referring to Fig. 1, Fig. 1 is an optional architecture diagram of the AI-based sentence response system 100 provided by an embodiment of the present invention. To support an AI-based sentence response application, terminal devices 400 (terminal devices 400-1 and 400-2 are shown as examples) connect to a server 200 through a network 300, which may be a wide area network, a local area network, or a combination of the two. Fig. 1 also shows a corpus 500 communicatively connected to the server 200.

The terminal device 400 displays the AI-based sentence response application (response application for short) in a graphical interface 410 (graphical interfaces 410-1 and 410-2 are shown as examples), obtains a user question according to the user's operations in the response application, and sends the user question to the server 200. The server 200 obtains the user question and identifies its entity words; obtains corpus entries from the corpus 500, determines a target corpus entry according to the user question, and takes its semantics as the sentence-granularity semantics of the user question; determines the semantic attributes corresponding to the entity words and arranges the entity words according to the corresponding progressive relationship to obtain the word-granularity semantics of the user question; determines the output semantics of the user question from the sentence-granularity and word-granularity semantics; queries a preset knowledge graph according to the output semantics to obtain a semantic result; and generates a response sentence according to the semantic result and sends it to the terminal device 400. The terminal device 400 then displays the response sentence in the response application in the graphical interface 410.

Exemplary applications of the electronic equipment provided by the embodiments continue below. The electronic equipment may be implemented as various types of terminal device, such as a laptop, tablet, desktop computer, set-top box, or mobile device (for example, a mobile phone, portable music player, personal digital assistant, dedicated messaging device, or portable gaming device), or as a server. The following description takes a server as the example.

Referring to Fig. 2, Fig. 2 is an architecture diagram of a server 200 provided by an embodiment of the present invention (for example, the server 200 shown in Fig. 1). The server 200 shown in Fig. 2 includes at least one processor 210, a memory 250, at least one network interface 220, and a user interface 230. The components of the server 200 are coupled through a bus system 240, which implements connection and communication between them. In addition to a data bus, the bus system 240 includes a power bus, a control bus, and a status signal bus; for clarity, however, all buses are labeled as bus system 240 in Fig. 2.

The processor 210 may be an integrated circuit chip with signal processing capability, such as a general-purpose processor, a digital signal processor (DSP), another programmable logic device, discrete gate or transistor logic, or discrete hardware components; the general-purpose processor may be a microprocessor or any conventional processor.

The user interface 230 includes one or more output devices 231 that enable presentation of media content, including one or more speakers and/or one or more visual displays. The user interface 230 also includes one or more input devices 232, including user interface components that facilitate user input, such as a keyboard, mouse, microphone, touchscreen display, camera, and other input buttons and controls.

The memory 250 may be removable, non-removable, or a combination thereof. Exemplary hardware devices include solid-state memory, hard disk drives, and optical disc drives. The memory 250 optionally includes one or more storage devices physically remote from the processor 210.

The memory 250 includes volatile memory, nonvolatile memory, or both. The nonvolatile memory may be read-only memory (ROM), and the volatile memory may be random access memory (RAM). The memory 250 described in the embodiments of the present invention is intended to include any suitable type of memory.

In some embodiments, the memory 250 can store data to support various operations. Examples of such data include programs, modules, and data structures, or subsets or supersets thereof, as illustrated below.

An operating system 251, including system programs for handling various basic system services and performing hardware-related tasks, such as a framework layer, a core library layer, and a driver layer, used to implement various basic services and process hardware-based tasks;

A network communication module 252, for reaching other computing devices via one or more (wired or wireless) network interfaces 220; exemplary network interfaces 220 include Bluetooth, Wi-Fi, and Universal Serial Bus (USB);

A presentation module 253, for enabling presentation of information (for example, a user interface for operating peripheral devices and displaying content and information) via one or more output devices 231 (for example, a display or speaker) associated with the user interface 230;

An input processing module 254, for detecting one or more user inputs or interactions from one of the one or more input devices 232 and translating the detected inputs or interactions.

In some embodiments, the AI-based sentence response device provided by the embodiments of the present invention may be implemented in software. Fig. 2 shows an AI-based sentence response device 255 stored in the memory 250, which may be software in the form of a program, plug-in, or the like, and includes the following software modules: an identification module 2551, a sentence-granularity processing module 2552, a word-granularity processing module 2553, a semantic output module 2554, a result query module 2555, and a sentence generation module 2556. These modules are logical, so they may be combined or further split arbitrarily according to the functions implemented.

The functions of the modules are described below.

In other embodiments, the AI-based sentence response device provided by the embodiments of the present invention may be implemented in hardware. As an example, it may be a processor in the form of a hardware decoding processor, programmed to perform the AI-based sentence response method provided by the embodiments; for example, such a processor may use one or more application-specific integrated circuits (ASIC), DSPs, programmable logic devices (PLD), complex programmable logic devices (CPLD), field-programmable gate arrays (FPGA), or other electronic components.

The AI-based sentence response method provided by the embodiments of the present invention may be executed by the above server, by a terminal device (for example, the terminal device 400-1 or 400-2 shown in Fig. 1), or jointly by the server and the terminal device.

The following describes, with reference to the exemplary applications and structure of the electronic equipment described above, how the electronic equipment implements the AI-based sentence response method through the embedded AI-based sentence response device.

Referring to Fig. 3 and Fig. 4A: Fig. 3 is a structural diagram of the AI-based sentence response device 255 provided by an embodiment of the present invention, showing the series of processing flows by which the modules implement semantic parsing of, and response to, a user question; Fig. 4A is a flowchart of the AI-based sentence response method provided by an embodiment of the present invention. The steps shown in Fig. 4A are described below with reference to Fig. 3.

In step 101, a user question is obtained, and the entity words in the user question are identified.

Here, a user question in text form is obtained, and the named entities with practical meaning in the user question are identified as entity words.

In some embodiments, obtaining the user question may be implemented as follows: obtaining user speech, and performing speech recognition on the user speech to obtain the user question.

As an example, referring to Fig. 3, in addition to obtaining a user question entered as text, the identification module 2551 may obtain user speech and perform automatic speech recognition (ASR) on it to obtain a user question in text form. This makes obtaining user questions more flexible and suits application scenarios such as communication software.

In some embodiments, identifying the entity words in the user question may be implemented as follows: performing entity recognition on the user question with a named entity recognition model to obtain a first recognition result; performing string matching on the user question according to a preset dictionary to obtain a second recognition result; and merging and deduplicating the first and second recognition results to obtain the entity words.

As an example, referring to Fig. 3, the identification module 2551 processes the user question in two branches. In one branch, a named entity recognition model, which is a machine learning model, performs entity recognition on the user question to obtain the first recognition result. To improve recognition, the model can be trained on the existing corpus entries in the corpus and the entity words already determined in them, and the trained model is then used for entity recognition on the user question. The embodiments of the present invention do not limit the specific type of named entity recognition model; for example, it may be a conditional random field (CRF) model or a hidden Markov model (HMM). In the other branch, string matching is performed on the user question according to a preset dictionary to obtain the second recognition result. The preset dictionary contains multiple defined entity words and can be built according to the actual application scenario. The string matching may be multi-pattern matching, which searches the user question simultaneously for the multiple pattern strings (existing words) defined in the dictionary. The embodiments do not limit how multi-pattern matching is performed; for example, it may use a trie (dictionary tree) or an Aho-Corasick automaton.
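Trie-based multi-pattern matching, one of the two options named above, can be sketched as follows. This is a simplified illustration, not the patent's implementation: the function names and the toy dictionary are hypothetical. Unlike an Aho-Corasick automaton, which finds all matches in a single pass, this version restarts the trie walk at every position of the question, so it costs O(n·L) for a question of length n and a longest dictionary word of length L.

```python
from typing import Iterable, List

END = "$end"  # sentinel key marking the end of a dictionary word; never a single character


def build_trie(words: Iterable[str]) -> dict:
    """Build a character trie from the entity words of the preset dictionary."""
    root: dict = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = word
    return root


def match_dictionary_words(question: str, trie: dict) -> List[str]:
    """Return every dictionary word occurring in the question, ordered by start position.

    Overlapping and nested matches are all reported, e.g. both "张三" and "张三丰".
    """
    found: List[str] = []
    for start in range(len(question)):
        node = trie
        for ch in question[start:]:
            node = node.get(ch)
            if node is None:
                break  # no dictionary word continues with this character
            if END in node:
                found.append(node[END])
    return found


if __name__ == "__main__":
    # "Which university did Zhang San graduate from?" with a two-word toy dictionary.
    print(match_dictionary_words("张三毕业于哪所大学", build_trie(["张三", "大学"])))
```

For large dictionaries, an Aho-Corasick automaton (the other option the text names) avoids the per-position restart by adding failure links to this same trie.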

After both branches have processed the user question, their results are merged; that is, the first and second recognition results are merged and deduplicated to obtain the entity words. For example, if the first recognition result contains word₁ and word₂ and the second recognition result contains word₂ and word₃, then after merging and deduplication the entity words are word₁, word₂, and word₃. Entity recognition with the named entity recognition model can identify new words that do not appear in the preset dictionary, while string matching against the preset dictionary finds the existing words it contains; combining the two improves the accuracy and validity of the entity words determined.
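The merge-and-deduplicate step can be sketched in a few lines. This minimal version assumes each recognition result is a list of entity-word strings and keeps the first occurrence of each word, so the output order is deterministic; the source does not specify an ordering, and the function name is hypothetical.

```python
from itertools import chain
from typing import Iterable, List


def merge_recognition_results(first: Iterable[str], second: Iterable[str]) -> List[str]:
    """Merge NER output with dictionary matches, dropping duplicates but keeping first-seen order.

    dict.fromkeys preserves insertion order (guaranteed since Python 3.7),
    so this is an order-preserving set union.
    """
    return list(dict.fromkeys(chain(first, second)))


if __name__ == "__main__":
    # Mirrors the example in the text: {word1, word2} merged with {word2, word3}.
    print(merge_recognition_results(["word1", "word2"], ["word2", "word3"]))
```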

In step 102, a target corpus entry is determined in the corpus according to the user question, and the semantics of the target corpus entry is taken as the sentence-granularity semantics of the user question.

In the embodiments of the present invention, semantic parsing uses a dual-granularity mechanism: sentence granularity and word granularity. Specifically, at sentence granularity, a target corpus entry is determined in the corpus according to the user question, and its semantics is taken as the semantics of the user question; for ease of distinction, the semantics determined here is called the sentence-granularity semantics. A corpus entry is a question whose semantics has already been determined; corpus entries can be obtained by crawling online data or by writing them manually, and are then added to the corpus.
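This section does not give the scoring used to pick the target corpus entry (the terminology section says Elasticsearch is used to retrieve candidate entries). The sketch below therefore swaps in a plain string-similarity score from Python's standard `difflib.SequenceMatcher`, purely to illustrate the idea of borrowing a matched entry's semantics. The toy corpus, the semantic labels, the function name, and the 0.6 threshold are all hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple

# Hypothetical corpus: each entry pairs an example question with its already-determined semantics.
Corpus = List[Tuple[str, str]]


def sentence_granularity_semantics(
    question: str, corpus: Corpus, threshold: float = 0.6
) -> Optional[str]:
    """Pick the most similar corpus entry and return its semantics.

    Returns None when no entry reaches the (assumed) similarity threshold,
    i.e. no target corpus entry is found.
    """
    best_semantics: Optional[str] = None
    best_score = 0.0
    for example, semantics in corpus:
        score = SequenceMatcher(None, question, example).ratio()
        if score > best_score:
            best_semantics, best_score = semantics, score
    return best_semantics if best_score >= threshold else None


if __name__ == "__main__":
    toy_corpus: Corpus = [
        ("张三的妻子是谁", "wife"),                 # "Who is Zhang San's wife?"
        ("张三毕业于哪所大学", "graduated school"),  # "Which university did Zhang San graduate from?"
    ]
    print(sentence_granularity_semantics("张三毕业于哪个大学", toy_corpus))
```

In a deployed system, the linear scan would be replaced by an Elasticsearch query returning a short candidate list, with a stronger scorer applied only to those candidates.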

In some embodiments, the AI-based sentence response method further includes, between any two steps: identifying preset symbols in each corpus entry of the corpus and deleting the preset symbols; converting all letters in the corpus entries to uppercase, or all to lowercase; and converting all Chinese characters in the corpus entries to traditional Chinese, or all to simplified Chinese.

In other words, the embodiments may preprocess the corpus. Specifically, the preset symbols in each corpus entry are identified and deleted. Preset symbols are meaningless symbols unrelated to the content of the corpus entry and can be configured according to the actual application scenario, for example "！", "#" and "*". In addition, all letters in the corpus entries are converted to one case and all Chinese characters to one script (traditional or simplified), which ensures the uniformity of the corpus.

In step 103, the semantic attributes corresponding to the entity words are determined, and the entity words are arranged according to the progressive relationships corresponding to those semantic attributes, yielding the word-granularity semantics of the user question.

Notably, when the entity words are arranged according to the progressive relationships of their semantic attributes, each entity word that corresponds to a semantic attribute is replaced by that semantic attribute, which facilitates subsequent parsing.

As an example, referring to Fig. 3, in the N-Gram matching of the word-granularity processing module 2553, the progressive relationship of each semantic attribute is constrained by its left and right entities. For example, for the user question "Which university did Zhang San graduate from", the identified entity words include "Zhang San" and "university". The semantic attribute corresponding to "university" is "graduated school", and its progressive relationship is: left entity [person] / right entity [university]. The entity words are therefore arranged in their left-to-right order in the user question, and "university" is replaced by "graduated school", giving the word-granularity semantics "Zhang San graduated school". Here, [person] and [university] are grammatical attributes.

For some complex questions, the user question may correspond to at least two semantic attributes. For example, for the user question "Which university did Zhang San's wife graduate from", the identified entity words include "Zhang San", "wife" and "university". The semantic attribute of "wife" is "wife", with the progressive relationship left entity [person] / right entity [person]; the semantic attribute of "university" is "graduated school", with the progressive relationship left entity [person] / right entity [university]. Arranging the entity words in their left-to-right order in the user question and replacing each with its semantic attribute yields the word-granularity semantics "Zhang San wife graduated school". If at least two entity words whose grammatical attribute matches the left entity of a semantic attribute precede the entity word of that attribute, these entity words are merged in their left-to-right order in the user question. In the example above, "Zhang San" and "wife" both match [person], so they are merged and together serve as the left entity of the "graduated school" attribute.
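The left-to-right arrangement with left/right-entity constraints can be sketched as follows. This is one possible reading of the mechanism, not the patented implementation: the attribute table is hypothetical, and the sketch assumes that after an attribute is applied, the merged chain takes on the attribute's right-entity type (so "Zhang San wife" counts as [person]).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SemanticAttribute:
    name: str   # semantic attribute that replaces the entity word
    left: str   # grammatical attribute required of the left entity
    right: str  # grammatical attribute of the resulting right entity


# Hypothetical attribute table covering the two examples in the text.
ATTRIBUTES = {
    "wife": SemanticAttribute("wife", left="person", right="person"),
    "university": SemanticAttribute("graduated school", left="person", right="university"),
}


def word_granularity_semantics(entities) -> Optional[str]:
    """entities: (entity word, grammatical attribute) pairs in left-to-right order.

    Returns the word-granularity semantics, or None when a progressive
    relationship is violated. For attribute words, the type of the merged
    chain is taken from the attribute's `right` field.
    """
    tokens, current_type = [], None
    for word, grammatical_attribute in entities:
        attr = ATTRIBUTES.get(word)
        if attr is None:
            # Assumption: only a single plain entity word (the subject) opens the chain.
            if tokens:
                return None
            tokens, current_type = [word], grammatical_attribute
        elif attr.left == current_type:
            # Everything merged so far acts as the left entity of this attribute.
            tokens.append(attr.name)
            current_type = attr.right
        else:
            return None
    return " ".join(tokens) or None


print(word_granularity_semantics(
    [("Zhang San", "person"), ("wife", "person"), ("university", "university")]))
# Zhang San wife graduated school
```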

In some embodiments, determining the semantic attributes of the entity words may be implemented as follows: each entity word is matched against at least two preset semantic dictionaries, each of which corresponds to one semantic attribute; when an entity word matches a semantic dictionary, the semantic attribute of that dictionary is determined as the semantic attribute of the entity word.

Because language is variable, that is, several words may express the same meaning, the embodiments of the present invention define multiple semantic attributes and set one semantic dictionary for each. For example, the semantic dictionary for the semantic attribute "hometown" includes words such as "home", "old home" and "native place".

As an example, referring to Fig. 3, the word-granularity processing module 2553 matches each entity word in the user question against all semantic dictionaries. An entity word matches a semantic dictionary when the dictionary contains it, and the semantic attribute of that dictionary is then determined as the semantic attribute of the entity word. This approach increases the success rate of determining semantic attributes.
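Dictionary matching reduces to a membership test per semantic attribute. The sketch below uses hypothetical dictionaries and returns `None` for a word that matches no dictionary, which corresponds to an unmatched word in the next step.

```python
from typing import Optional

# Hypothetical semantic dictionaries: one per semantic attribute.
SEMANTIC_DICTIONARIES = {
    "hometown": {"home", "old home", "native place"},
    "spouse": {"spouse", "wife"},
}


def semantic_attribute_of(entity_word: str) -> Optional[str]:
    """Return the attribute whose dictionary contains the word, else None."""
    for attribute, words in SEMANTIC_DICTIONARIES.items():
        if entity_word in words:
            return attribute
    return None  # no dictionary matched: the word is an unmatched word


print(semantic_attribute_of("wife"))    # spouse
print(semantic_attribute_of("people"))  # None
```

If dictionaries overlap, this sketch returns the first matching attribute in insertion order; the text does not specify how ties are resolved.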

In some embodiments, after the semantic attributes of the entity words are determined, the method further includes: determining the entity words in the user question that do not correspond to any semantic attribute as unmatched words, and determining the word weights of the unmatched words and of the remaining entity words; determining the penalty value of each unmatched word according to its word weight and the sentence length of the user question; determining the sentence score of the user question according to the penalty values and the word weights of the entity words other than the unmatched words; and, when the sentence score is below a sentence-score threshold, determining the word-granularity semantics to be empty.

Because a user question may contain noise, and noise may be recognized as entity words, the embodiments of the present invention analyze the entity words. Specifically, entity words in the user question that do not correspond to any semantic attribute are determined as unmatched words, and the word weights of the unmatched words and of the other entity words are determined. Word weights can be preset; for example, the word weight of "Zhang San" is set to 0.6 and that of "Li Si" to 0.5. The word weight of each unmatched word is then divided by the sentence length of the user question to obtain its penalty value, where the sentence length is the total number of characters in the user question. The word weights of the entity words other than the unmatched words are summed, and the penalty values are subtracted from the sum to obtain the sentence score of the user question. When the sentence score is below the preset sentence-score threshold, the identified entity words are deemed unreliable and the word-granularity semantics are set to empty. When the sentence score exceeds the threshold, the entity words other than the unmatched words are arranged according to the progressive relationships of their semantic attributes to obtain the word-granularity semantics. The sentence-score threshold can be configured according to the actual application scenario. In this way, word-granularity semantics are not computed for user questions with too much noise or too many errors, which saves processing resources.

In some embodiments, determining the penalty value of an unmatched word according to its word weight and the sentence length of the user question may be implemented as follows: the grammatical attribute of the unmatched word is determined; when the grammatical attribute is subject, the penalty value of the unmatched word is empty (the word is not penalized); when the grammatical attribute is not subject, the penalty value is determined according to the word weight of the unmatched word and the sentence length of the user question.

In this way, the embodiments can recognize unmatched words that carry real meaning and leave them unpenalized. For example, suppose unmatched words whose grammatical attribute is person-name subject are not penalized, and the user question is "Zhang San people hometown", in which "people" is noise. The identified entity words include "Zhang San", "people" and "hometown", and only "hometown" corresponds to a semantic attribute, so the unmatched words are "Zhang San" and "people". After determining that the grammatical attribute of "Zhang San" is person-name subject and that of "people" is not, the penalty value of "Zhang San" is set to empty, while the penalty value of "people" is determined from its word weight and the sentence length of the user question. Assuming word weights of 0.8 for "hometown", 0.6 for "Zhang San" and 0.5 for "people", the sentence score is 0.8 - 0 × 0.6 ("Zhang San" is not penalized) - 0.5/5 (penalty value of "people") = 0.7. This approach effectively separates meaningful unmatched words from meaningless ones and improves the accuracy of the computed sentence score.
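The scoring rule can be checked with a short Python sketch that reproduces the worked example (score 0.7). The class and function names are hypothetical, and the threshold value 0.5 is an illustrative assumption; the text leaves it to the application scenario.

```python
from dataclasses import dataclass


@dataclass
class UnmatchedWord:
    text: str
    weight: float
    is_subject: bool  # e.g. a person-name subject found by POS tagging


def penalty_value(word: UnmatchedWord, sentence_length: int) -> float:
    # Subject words carry real meaning and are not penalized.
    if word.is_subject:
        return 0.0
    return word.weight / sentence_length


def sentence_score(matched_weights, unmatched_words, sentence_length: int) -> float:
    """Sum of matched entity-word weights minus the unmatched-word penalties."""
    penalties = sum(penalty_value(w, sentence_length) for w in unmatched_words)
    return sum(matched_weights) - penalties


score = sentence_score(
    matched_weights=[0.8],  # "hometown"
    unmatched_words=[UnmatchedWord("Zhang San", 0.6, True),
                     UnmatchedWord("people", 0.5, False)],
    sentence_length=5,      # characters in the original Chinese question
)
SCORE_THRESHOLD = 0.5  # hypothetical sentence-score threshold
word_semantics_is_empty = score < SCORE_THRESHOLD
print(round(score, 2), word_semantics_is_empty)  # 0.7 False
```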

In step 104, the output semantics of the user question are determined according to the sentence-granularity semantics and the word-granularity semantics.

After the sentence-granularity semantics and the word-granularity semantics are obtained, one of them is selected as the output semantics of the user question. The embodiments of the present invention do not limit how the selection is made. For example, because word-granularity parsing usually works better for complex questions, the word-granularity semantics may be chosen as the output semantics when the sentence length of the user question exceeds a sentence-length threshold, and the sentence-granularity semantics when the sentence length is below that threshold.

In some embodiments, determining the output semantics of the user question according to the sentence-granularity semantics and the word-granularity semantics may be implemented as follows: when the sentence-granularity semantics are not empty, they are determined as the output semantics; when the sentence-granularity semantics are empty and the word-granularity semantics are not, the word-granularity semantics are determined as the output semantics; when both are empty, a prompt indicating that answering failed is output.

As an example, referring to Fig. 3, the semantic output module 2554 determines the output semantics with a sentence-granularity-first mechanism. When the sentence-granularity semantics are not empty, they are used directly as the output semantics. When the sentence-granularity semantics are empty and the word-granularity semantics are not, as happens for example with complex questions, the word-granularity semantics are used as the output semantics. When both are empty, the semantics cannot be parsed and an answer-failed prompt is output. The cases in which the sentence-granularity semantics and the word-granularity semantics are empty are described in detail later. With this granularity priority, simple questions use the sentence-granularity semantics and complex questions use the word-granularity semantics as the output semantics, which improves the accuracy of the output semantics.
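The sentence-granularity-first rule is a short chain of fallbacks. In the sketch below, empty semantics are represented by `None` or an empty string, and the prompt wording is a hypothetical placeholder.

```python
from typing import Optional

ANSWER_FAILED_PROMPT = "Sorry, the question could not be answered."  # hypothetical wording


def select_output_semantics(sentence_semantics: Optional[str],
                            word_semantics: Optional[str]) -> Optional[str]:
    """Sentence-granularity-first selection; None means both are empty."""
    if sentence_semantics:  # None or "" counts as empty
        return sentence_semantics
    if word_semantics:
        return word_semantics
    return None


def respond(sentence_semantics, word_semantics) -> str:
    output = select_output_semantics(sentence_semantics, word_semantics)
    return ANSWER_FAILED_PROMPT if output is None else output


print(respond(None, "Zhang San wife graduated school"))  # Zhang San wife graduated school
```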

In step 105, the preset knowledge graph is queried according to the output semantics to obtain the semantic result.

As an example, referring to Fig. 3, the result query module 2555 queries the knowledge graph for knowledge that matches the output semantics to obtain the semantic result. For example, if one piece of knowledge in the knowledge graph is "Zhang San - hometown - Beijing" and the output semantics are "Zhang San hometown", querying the knowledge graph with the output semantics finds this piece of knowledge, and "Beijing" in it is determined as the semantic result.

In some embodiments, querying the preset knowledge graph according to the output semantics to obtain the semantic result may also be implemented as follows: when the output semantics correspond to at least two semantic attributes, the knowledge graph is queried successively in the order in which the semantic attributes appear in the output semantics, obtaining one attribute result for each semantic attribute; the attribute results are then combined into the semantic result.

When the determined output semantics correspond to at least two semantic attributes, the attribute result of each semantic attribute is queried successively in the knowledge graph. For example, for the output semantics "Zhang San wife graduated school", the knowledge graph is first queried, according to the semantic attribute "wife", for who Zhang San's wife is, which determines the first attribute result, assumed here to be "Miss Zhu". Then, starting from the first attribute result and according to the semantic attribute "graduated school", the knowledge graph is queried for Miss Zhu's graduated school, which determines the second attribute result. The two attribute results are combined to obtain the final semantic result. Notably, sentence-granularity semantics may also contain semantic attributes, which are set in the semantics of the corpus entries. This approach makes the semantic results more fine-grained and accommodates different ways of phrasing a question.
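The successive query can be sketched with the knowledge graph modeled as a dictionary keyed by (subject, attribute). This storage model is an assumption made for illustration, and "University X" is a placeholder because the text does not name Miss Zhu's school.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical knowledge graph: (subject, attribute) -> object.
KNOWLEDGE_GRAPH: Dict[Tuple[str, str], str] = {
    ("Zhang San", "hometown"): "Beijing",
    ("Zhang San", "wife"): "Miss Zhu",
    ("Miss Zhu", "graduated school"): "University X",  # placeholder value
}


def query_attribute_chain(subject: str, attributes: List[str]) -> Optional[List[str]]:
    """Query the attributes in order, feeding each result into the next query."""
    results = []
    current = subject
    for attribute in attributes:
        current = KNOWLEDGE_GRAPH.get((current, attribute))
        if current is None:
            return None  # one link of the chain is missing from the graph
        results.append(current)
    return results  # one attribute result per semantic attribute


print(query_attribute_chain("Zhang San", ["wife", "graduated school"]))
# ['Miss Zhu', 'University X']
```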

In step 106, an answer sentence is generated according to the semantic result.

As an example, referring to Fig. 3, the sentence generation module 2556 can combine the output semantics and the semantic result using a preset sentence template to obtain the answer sentence. For example, with the template "The xx of xx is xx", output semantics "Zhang San hometown" and semantic result "Beijing", the answer sentence "The hometown of Zhang San is Beijing" is obtained, which improves the user experience. The semantic result may also be used directly as the answer sentence; this is not limited in the embodiments of the present invention.
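For a single-attribute output, template filling is plain string formatting. The sketch below renders the template in English. The function name is hypothetical, and it assumes the subject and attribute have already been split out of the output semantics.

```python
def fill_template(subject: str, attribute: str, result: str) -> str:
    # English rendering of the "The xx of xx is xx" template described above.
    return f"The {attribute} of {subject} is {result}."


print(fill_template("Zhang San", "hometown", "Beijing"))
# The hometown of Zhang San is Beijing.
```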

Implemented by above-mentioned example of the inventive embodiments for Fig. 4 A it is found that the machine that the embodiment of the present invention passes through double grains degree System, improves to different user question sentence, the applicability including simple question sentence and complicated question improves the accuracy of response.

In some embodiments, B, Fig. 4 B are that the sentence provided in an embodiment of the present invention based on artificial intelligence is answered referring to fig. 4 The step of answering another optional flow diagram of method, showing in conjunction with Fig. 3 to Fig. 4 B is illustrated.

In figure 4b, step 102 shown in Fig. 4 A can be realized by step 1021 to step 1024, specifically:

In step 1021, candidate corpus entries are determined in the corpus according to the user question.

As an example, referring to Fig. 3, the sentence-granularity processing module 2552 determines the corpus entries related to the user question in the corpus and takes them as candidate corpus entries.

In some embodiments, determining candidate corpus entries in the corpus according to the user question may be implemented as follows: the grammatical attributes of the entity words are determined, and the entity word in the user question whose grammatical attribute is subject is replaced with a reference word; corpus entries related to the replaced user question are queried in the corpus, and the relevance between each such entry and the replaced user question is determined; corpus entries whose relevance to the replaced user question satisfies a relevance condition are determined as candidate corpus entries.

While the entity words in the user question are identified, their grammatical attributes in the user question are determined using part-of-speech tagging (POS tagging) and related techniques. Because corpus entries in the corpus usually contain no subject, the entity word whose grammatical attribute is subject is replaced with a reference word, such as the letter A, to prevent it from adversely affecting the determination of candidate corpus entries.

Then, the replaced user question is processed by ES (Elasticsearch). Specifically, the replaced user question can be segmented into words, the corpus entries containing any word of the segmentation result are queried in the corpus, and the relevance between each entry and the replaced user question is determined. Here, the TF-IDF values of the segmentation-result words contained in the entry can be used as features to obtain the relevance. TF-IDF is the product of TF and IDF. The term frequency (TF) indicates how often the word appears in the corpus entry. The inverse document frequency (IDF) measures the general importance of the word and is obtained by dividing the total number of corpus entries in the corpus by the number of entries that contain the word and taking the base-10 logarithm of the quotient. Both TF and IDF are positively correlated with relevance. More features can also be introduced when determining relevance, which is not detailed here. Because the embodiments use ES to support corpus queries, massive data can still be managed and queried quickly when the number of corpus entries grows substantially, for example to hundreds of millions, and complex searches can be kept within a short response time (usually within 1 second). This is a clear improvement over traditional relational databases such as MySQL, which tend to time out on such queries.

A relevance condition is set, for example that the relevance is among the n highest values, where n is an integer greater than 0 and less than 25. Corpus entries whose relevance to the replaced user question satisfies the relevance condition are determined as candidate corpus entries. For example, if the replaced user question input to ES is "Who is A's wife", the candidate corpus entries output by ES may include "Who is A's wife", "What is the name of A's son's wife" and "Do you know who A's wife is". Determining candidate corpus entries in this way preliminarily screens the corpus entries related to the user question and provides a data basis for subsequently determining the target corpus entry.
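The relevance computation and the top-n condition can be sketched in pure Python. This is not how Elasticsearch scores documents (its default similarity is BM25); it is a literal TF-IDF reading of the description, assuming pre-segmented token lists and assuming that the per-word TF-IDF features are combined by summation.

```python
import math
from collections import Counter
from typing import List, Sequence


def tfidf_relevance(query: Sequence[str], entry: Sequence[str],
                    corpus: Sequence[Sequence[str]]) -> float:
    """Sum of TF x IDF over the query words that appear in the entry (assumed combination)."""
    counts = Counter(entry)
    score = 0.0
    for word in set(query):
        if counts[word] == 0:
            continue
        tf = counts[word] / len(entry)                # frequency within this entry
        df = sum(1 for doc in corpus if word in doc)  # entries containing the word
        idf = math.log10(len(corpus) / df)            # base-10 logarithm, as in the text
        score += tf * idf
    return score


def candidate_entries(query, corpus, n: int = 3) -> List[Sequence[str]]:
    """Keep entries sharing a query word, rank them by relevance, return the top n."""
    if not 0 < n < 25:
        raise ValueError("n must be an integer with 0 < n < 25")
    scored = [(tfidf_relevance(query, entry, corpus), idx)
              for idx, entry in enumerate(corpus)
              if any(word in entry for word in query)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [corpus[idx] for _, idx in scored[:n]]


corpus = [["who", "is", "A", "wife"],
          ["what", "is", "A", "son", "wife", "name"],
          ["weather", "today"]]
print(candidate_entries(["who", "is", "A", "wife"], corpus, n=2))
# [['who', 'is', 'A', 'wife'], ['what', 'is', 'A', 'son', 'wife', 'name']]
```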

In some embodiments, determining the semantics of the target corpus entry as the sentence-granularity semantics of the user question may be implemented as follows: the reference word in the semantics of the target corpus entry is replaced with the entity word whose grammatical attribute is subject, yielding the sentence-granularity semantics of the user question.

Because the corpus entries in the corpus contain no subject, for the determined target corpus entry the reference word in its semantics is replaced with the entity word whose grammatical attribute is subject, yielding the sentence-granularity semantics of the user question. For example, if the target corpus entry is "Who is A's wife", its semantics are "A wife", and the entity word whose grammatical attribute is subject is "Zhang San", then the sentence-granularity semantics of the user question are "Zhang San wife". This ensures that the correct semantic result can subsequently be found in the knowledge graph.
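Masking the subject before retrieval and restoring it afterwards can be sketched at the token level. Working on tokens instead of raw strings avoids accidentally replacing an "A" inside another word. The token lists are assumed to come from a word segmenter, and the function names are hypothetical.

```python
REFERENCE_WORD = "A"  # reference word used in the text


def mask_subject(tokens, subject):
    """Replace the subject entity word with the reference word before retrieval."""
    return [REFERENCE_WORD if tok == subject else tok for tok in tokens]


def restore_subject(semantics_tokens, subject):
    """Replace the reference word in the target entry's semantics with the subject."""
    return [subject if tok == REFERENCE_WORD else tok for tok in semantics_tokens]


print(mask_subject(["who", "is", "Zhang San", "wife"], "Zhang San"))  # ['who', 'is', 'A', 'wife']
print(restore_subject(["A", "wife"], "Zhang San"))                    # ['Zhang San', 'wife']
```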

In step 1022, the statement similarity between the user question and each candidate corpus entry is determined.

In some embodiments, determining the statement similarity between the user question and a candidate corpus entry may also be implemented as follows: a first similarity between the user question and the candidate corpus entry is determined by a neural network model; a second similarity between them is determined by an extreme gradient boosting model; and the statement similarity is determined according to the first similarity and the second similarity.

As an example, referring to Fig. 3, in the sentence-granularity processing module 2552, the user question and the candidate corpus entry are input into a neural network model. In the embodiments, the neural network model can be a BERT (Bidirectional Encoder Representations from Transformers) model, which performs a classification task to predict the similarity between the user question and the candidate corpus entry. The BERT model includes an embedding layer and a fully connected layer: the embedding layer generates the word vectors of the user question and of the candidate corpus entry, and the fully connected layer processes the word vectors to produce the similarity between them. For ease of distinction, the similarity produced by the neural network model is named the first similarity. For example, if the user question is "Who is A's wife" and the candidate corpus entry is "Do you know who A's wife is", the output 0.866 of the BERT model after inputting them is taken as the first similarity. Notably, when the BERT model is trained in advance, the weight parameters of each of its layers are adjusted using a training corpus set. The training corpus set is distinct from the corpus described above and contains pairs of corpus entries together with the similarity corresponding to each pair.

Meanwhile, in the sentence-granularity processing module 2552 of Fig. 3, the user question and the candidate corpus entry are also input into an XGBoost (eXtreme Gradient Boosting) model. On input, the user question and the candidate corpus entry are fused into input features using at least one of fastText, TF-IDF and one-hot encoding, and the input features are fed into the XGBoost model. The XGBoost model performs a classification task on the input features through its classification and regression trees (CART) and finally generates the corresponding similarity; for ease of distinction, the similarity generated by the XGBoost model is named the second similarity. For example, for the user question "Who is A's wife" and the candidate corpus entry "Do you know who A's wife is", input features are generated and fed into the XGBoost model, and its output 0.85 is taken as the second similarity. The embodiments of the present invention do not limit how the input features are generated. For example, when the user question and the candidate corpus entry contain only characters, the input features can be set to {total character length of the user question, total character length of the candidate corpus entry, difference between the two total character lengths, number of words in the user question, number of words in the candidate corpus entry}. In addition, when the XGBoost model is trained in advance, ten-fold cross-validation can be used to determine its accuracy, and training is deemed complete when the accuracy exceeds an accuracy threshold. Ten-fold cross-validation divides the training corpus set into 10 parts; in turn, 9 parts are used as training data and 1 part as test data, and the average accuracy over the 10 tests is taken as the final accuracy.
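The length-based input features and the ten-fold split can be sketched without any ML library. Several details are assumptions rather than statements from the text: splitting words on whitespace, using the absolute length difference, the interleaved fold assignment, and the hypothetical `evaluate` callback that would train and test an XGBoost model on each split.

```python
from typing import Callable, Iterator, List, Tuple


def handcrafted_features(question: str, candidate: str) -> List[int]:
    """Five length features; words are split on whitespace here (assumption)."""
    return [
        len(question),                        # total character length of the question
        len(candidate),                       # total character length of the candidate
        abs(len(question) - len(candidate)),  # length difference (absolute value assumed)
        len(question.split()),                # word count of the question
        len(candidate.split()),               # word count of the candidate
    ]


def k_fold_splits(n_samples: int, k: int = 10) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices); each fold serves as the test set exactly once."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]


def cross_validated_accuracy(n_samples: int,
                             evaluate: Callable[[List[int], List[int]], float],
                             k: int = 10) -> float:
    """Average the accuracy returned by `evaluate` over the k train/test splits."""
    scores = [evaluate(train, test) for train, test in k_fold_splits(n_samples, k)]
    return sum(scores) / k


print(handcrafted_features("ab cd", "ab"))  # [5, 2, 3, 2, 1]
```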

The first similarity and the second similarity are summed with weights to obtain the statement similarity. The weights of the first and second similarities can be determined according to the actual application scenario; the larger a weight, the more important the corresponding similarity. For example, with a weight of 0.6 for the first similarity and 0.4 for the second, the example above gives a statement similarity of 0.6 × 0.866 + 0.4 × 0.85 ≈ 0.86. Combining the neural network model with the extreme gradient boosting model in this way improves the accuracy of the determined statement similarity.
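The weighted combination is a one-line function. The default weights below mirror the example in the text, and the function name is hypothetical.

```python
def statement_similarity(first: float, second: float,
                         w_first: float = 0.6, w_second: float = 0.4) -> float:
    """Weighted sum of the BERT (first) and XGBoost (second) similarities."""
    return w_first * first + w_second * second


print(round(statement_similarity(0.866, 0.85), 4))  # 0.8596, i.e. about 0.86
```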

In step 1023, when the statement similarity exceeds a first similarity threshold, the candidate corpus entry is determined as the target corpus entry, and the semantics of the target corpus entry are determined as the sentence-granularity semantics of the user question.

In the embodiments of the present invention, the target corpus entry is determined using the preset first similarity threshold. When only one statement similarity exceeds the first similarity threshold, the corresponding candidate corpus entry is determined as the target corpus entry; when at least two statement similarities exceed it, the candidate corpus entry with the highest statement similarity is determined as the target corpus entry. The semantics of the target corpus entry are then determined as the sentence-granularity semantics of the user question.

In step 1024, when the statement similarities of all candidate corpus entries are below the first similarity threshold, the target corpus entry and the sentence-granularity semantics are determined to be empty.

In the other case, the statement similarities of all candidate corpus entries are below the first similarity threshold, and both the target corpus entry and the sentence-granularity semantics are set to empty.
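Steps 1023 and 1024 together amount to "take the best candidate above the threshold, otherwise nothing". In the sketch below, `None` stands for an empty target corpus entry. The similarity values and thresholds in the usage line are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def select_target_entry(scored: Sequence[Tuple[str, float]],
                        first_threshold: float) -> Optional[str]:
    """scored: (candidate entry, statement similarity) pairs.

    Returns the highest-scoring entry above the threshold, or None when no
    candidate exceeds it (the sentence-granularity semantics are then empty).
    """
    above = [(similarity, entry) for entry, similarity in scored
             if similarity > first_threshold]
    if not above:
        return None
    return max(above, key=lambda pair: pair[0])[1]


candidates = [("Who is A's wife", 0.86),
              ("Do you know who A's wife is", 0.93),
              ("What is the name of A's son's wife", 0.41)]
print(select_target_entry(candidates, 0.8))  # Do you know who A's wife is
```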

In some embodiments, after step 1022, the method further includes: when the statement similarity between the user question and a candidate corpus entry is below the first similarity threshold but above a second similarity threshold, submitting the user question to a preset reviewer and obtaining the reviewer's review result for the user question; and when the review result indicates that the question is a correct sentence, adding the user question to the corpus. The first similarity threshold is greater than the second similarity threshold.

Because the models are limited by their training corpus set, they may perform poorly on question types that were not trained or were trained only a few times: even when the user question is similar to a candidate corpus entry, the similarity produced by the models may not be high. To handle this situation, the embodiments of the present invention set, in addition to the first similarity threshold, a second similarity threshold that is smaller than the first. When the statement similarity between the user question and a candidate corpus entry is below the first similarity threshold and above the second, the user question is submitted to the preset reviewer, such as a manual reviewer, and the reviewer's review result for the user question is obtained. When the review result indicates a correct sentence, the user question is added to the corpus, so the corpus grows dynamically. In addition, the user question and the candidate corpus entry can be added as a pair to the training corpus set to train the neural network model and the extreme gradient boosting model, improving both models' ability to handle different types of user questions.
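The two-threshold routing can be sketched as below. The route labels and function names are hypothetical. The sketch only appends approved questions to the corpus and leaves out the training-pair update, because the text does not say what similarity label such a pair would receive.

```python
def route_question(similarity: float, first_threshold: float, second_threshold: float) -> str:
    """Decide what to do with a user question given its statement similarity."""
    if first_threshold <= second_threshold:
        raise ValueError("the first threshold must exceed the second threshold")
    if similarity > first_threshold:
        return "target"   # candidate can serve as the target corpus entry
    if similarity > second_threshold:
        return "review"   # submit the question to the reviewer
    return "discard"


def apply_review(question: str, approved: bool, corpus: list) -> None:
    # A question the reviewer marks as correct grows the corpus dynamically.
    if approved:
        corpus.append(question)


print(route_question(0.7, first_threshold=0.8, second_threshold=0.6))  # review
```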

In Fig. 4B, before step 103, the grammatical attributes of the entity words can also be determined in step 107.

While the entity words are identified in the user question, their grammatical attributes in the user question are determined by part-of-speech tagging.

In step 108, the grammatical attributes of the entity words are verified against a preset grammar template.

As an example, referring to Fig. 3, a grammar template is set in the word-granularity processing module 2553. For example, the grammar template specifies a subject-noun correspondence. Verifying the grammatical attributes of the entity words against the template means judging whether each grammatical attribute conforms to the subject-noun correspondence.

In step 109, when a grammatical attribute does not conform to the grammar template, the entity word corresponding to that grammatical attribute is deleted.

For example, when the grammatical attribute of an entity word is subject-preposition and the grammar template contains the subject-noun correspondence, the grammatical attribute is determined not to conform to the grammar template, and the corresponding entity word is removed and not processed further.
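Template verification can be sketched as set membership. This sketch assumes each grammatical attribute is represented as a (syntactic role, part of speech) pair and that the template is a set of allowed pairs. Both representations are hypothetical.

```python
# Hypothetical grammar template: allowed (syntactic role, part of speech) pairs.
GRAMMAR_TEMPLATE = {("subject", "noun")}


def filter_by_grammar(entities):
    """entities: (word, (role, pos)) pairs; keep those whose attribute fits the template."""
    return [(word, attr) for word, attr in entities if attr in GRAMMAR_TEMPLATE]


entities = [("Zhang San", ("subject", "noun")), ("at", ("subject", "preposition"))]
print(filter_by_grammar(entities))  # [('Zhang San', ('subject', 'noun'))]
```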

Implemented by above-mentioned example of the inventive embodiments for Fig. 4 B it is found that the embodiment of the present invention passes through first in corpus In filter out candidate corpus, then target corpus is filtered out from candidate corpus, so that the semanteme of target corpus is determined as user The section sentence granularity of question sentence is semantic, improves the accuracy for the section sentence granularity semanteme determined；In addition, according to grammar templates to entity Word is filtered, and improves the accuracy of the subsequent semantic attribute determined.

For ease of understanding, the embodiments of the present invention provide the structural diagram of the BERT model shown in Fig. 5. When the first similarity at sentence granularity is determined, text preprocessing is first performed on the user question and the candidate corpus entry, including deleting preset symbols and unifying both the letter case and the traditional/simplified form of Chinese characters. The user question and the candidate corpus entry are then segmented separately; the embodiments do not limit the segmentation method. Segmenting the user question yields Tok1 ... TokN, and segmenting the candidate corpus entry yields Tok'1 ... Tok'M, where N and M are integers greater than 0. Notably, two special symbols are added when the segmentation results are input into the BERT model: [CLS] is the special symbol for classification output, marking the position at which the similarity between the user question and the candidate corpus entry is finally output, and [SEP] is the special symbol for separating non-consecutive token sequences, here the segmentation result of the user question and that of the candidate corpus entry. In the BERT model, the embedding layer converts the segmentation results and the special symbols into word vectors, the fully connected layer processes the word vectors, and the first similarity between the user question and the candidate corpus entry is finally output, shown as C in Fig. 5.
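Building the token sequence for such a sentence pair can be sketched as follows. The layout follows the standard BERT sentence-pair convention of [CLS], the first sequence, [SEP], the second sequence, and a closing [SEP], together with segment ids 0/1. Whether Fig. 5 also shows the trailing [SEP] is an assumption here, and the function name is hypothetical.

```python
from typing import List, Tuple


def build_pair_input(question_tokens: List[str],
                     candidate_tokens: List[str]) -> Tuple[List[str], List[int]]:
    """Return the BERT token sequence and the matching segment ids."""
    tokens = ["[CLS]"] + question_tokens + ["[SEP]"] + candidate_tokens + ["[SEP]"]
    # Segment 0 covers [CLS], the question and the first [SEP]; segment 1 the rest.
    segment_ids = [0] * (len(question_tokens) + 2) + [1] * (len(candidate_tokens) + 1)
    return tokens, segment_ids


tokens, segment_ids = build_pair_input(["Tok1", "Tok2"], ["Tok'1"])
print(tokens)       # ['[CLS]', 'Tok1', 'Tok2', '[SEP]', "Tok'1", '[SEP]']
print(segment_ids)  # [0, 0, 0, 0, 1, 1]
```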

The following describes exemplary applications of the embodiments of the present invention in some practical application scenarios.

Referring to Fig. 6, which is a schematic flow diagram of sentence answering provided by an embodiment of the present invention. In Fig. 6, the user question is "Who is Xiao Zhang's wife?".

1. Named entity recognition and part-of-speech tagging yield the entity words "Xiao Zhang" and "wife". The grammatical attribute of "Xiao Zhang" is determined to be a person-name subject.
2. Assume the following:
   - The semantic dictionary of the semantic attribute "Zhang San" contains the two words "Zhang San" and "Xiao Zhang".
   - The semantic dictionary of the semantic attribute "spouse" contains the two words "spouse" and "wife".
   - The semantic attribute "Zhang San" has no progressive relationship.
3. The entity words are arranged according to the progressive relationship of "wife" and replaced with their semantic attributes. This gives the output semantics "Zhang San spouse".
4. A Subject-Predicate-Object (SPO) triple <S: Zhang San, P: spouse, O: > is built from the output semantics.
5. The triple is queried in the knowledge graph, which gives the semantic result "Miss Zhu".
6. The answer statement "Zhang San's spouse is Miss Zhu." is generated from a set sentence template. This completes the whole sentence-answering flow.
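The single-hop SPO lookup in this example can be sketched in a few lines of Python. The knowledge graph is assumed to be a plain in-memory dictionary keyed by (subject, predicate), which stands in for a real graph store. The template text and the name `answer_spo` are hypothetical illustrations, not the embodiment's actual components.

```python
from typing import Dict, Optional, Tuple

# Hypothetical in-memory knowledge graph: (subject, predicate) -> object.
KnowledgeGraph = Dict[Tuple[str, str], str]


def answer_spo(kg: KnowledgeGraph, subject: str, predicate: str) -> Optional[str]:
    """Fill the empty O slot of <S: subject, P: predicate, O: > and render a fixed template."""
    obj = kg.get((subject, predicate))
    if obj is None:
        return None  # no matching triple: the caller can output an answer-failure prompt
    return f"{subject}'s {predicate} is {obj}."


kg: KnowledgeGraph = {("Zhang San", "spouse"): "Miss Zhu"}
print(answer_spo(kg, "Zhang San", "spouse"))  # Zhang San's spouse is Miss Zhu.
```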

Referring to Fig. 7, which is a schematic flow diagram of a sentence question-answering scheme provided by the related art. The scheme works as follows:

1. Corpora are generated by crawling online data and by manual writing. They are added to the corpus set and then preprocessed.
2. The user speech is obtained and converted into a user question by automatic speech recognition (ASR).
3. Named entity recognition is performed on the user question to obtain its entity words.
4. The entity words are input into a trained classification model.
5. Based on the model's output, the target corpus closest to the user question is determined in the corpus set. The semantics of that target corpus are used as the output semantics of the user question.
6. A graph query is performed according to the output semantics to obtain semantic results.
7. An answer statement is generated from the semantic results to complete the reply.

Because this scheme relies on a classification model to choose the target corpus, its semantic generalization is limited. It has difficulty parsing the core semantic information of a user question. Complex questions in particular may be parsed incorrectly or not parsed at all.

Referring to Fig. 8, which is another schematic flow diagram of the artificial-intelligence-based sentence answering method provided by an embodiment of the present invention.

Corpus construction:
- In Fig. 8, corpora are likewise generated by crawling online data and by manual writing, and are added to the corpus set.
- The related art typically requires tens of thousands of corpora. In contrast, the embodiment can answer sentences with a certain accuracy using only about a thousand corpora.
- After the corpus set is built, its corpora are preprocessed. The preprocessing includes, but is not limited to:
  - deleting set symbols from the corpora;
  - converting all letters to upper case or lower case;
  - converting all Chinese characters to traditional or simplified Chinese.

Question input:
- The user speech is obtained and converted into a user question by ASR.
- Named entity recognition is performed on the user question to obtain its entity words.

Named entity recognition combines two methods:
1. A named entity recognition model, such as a trained conditional random field (CRF) model, recognizes entity words in the user question.
2. Multi-pattern matching is performed on the user question against a set dictionary. The successfully matched words are taken as entity words.

The results of the two methods are merged and deduplicated to obtain the final entity words.
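The dictionary branch and the merge-and-deduplicate step can be sketched as follows. The sketch makes three assumptions. First, the matcher is a brute-force substring scan standing in for a real multi-pattern algorithm such as Aho-Corasick. Second, the model branch is represented only by its output list, so no CRF is shown. Third, first-seen order is kept during deduplication. The function names are hypothetical.

```python
from typing import Iterable, List


def dictionary_match(question: str, dictionary: Iterable[str]) -> List[str]:
    """Return every dictionary word found in the question, ordered by first position.

    Brute-force stand-in for a multi-pattern matcher such as Aho-Corasick.
    """
    hits = [(question.find(word), word) for word in dictionary if word and word in question]
    return [word for _, word in sorted(hits)]


def merge_entities(model_entities: List[str], dict_entities: List[str]) -> List[str]:
    """Merge both recognition results and drop duplicates, keeping first-seen order."""
    return list(dict.fromkeys(model_entities + dict_entities))


question = "who is Xiao Zhang's wife"
dict_hits = dictionary_match(question, ["wife", "Xiao Zhang", "spouse"])
print(merge_entities(["Xiao Zhang"], dict_hits))  # ['Xiao Zhang', 'wife']
```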

The embodiment of the present invention provides a dual-granularity semantic parsing mechanism. At the sentence granularity it works as follows:

1. In the user question, entity words whose grammatical attribute is subject are replaced with a reference word.
2. The replaced user question is input into Elasticsearch (ES).
3. The ES query function retrieves multiple corpora from the corpus set that are related to the replaced user question.
4. The corpora whose relevance to the replaced user question meets a relevance condition are determined as candidate corpora. This completes the preliminary screening.
5. The user question and the candidate corpora are input into a trained BERT model and a trained XGBoost model.
   - The BERT model gives the first similarity between the user question and a candidate corpus.
   - The XGBoost model gives the second similarity between them.
6. The first and second similarities are combined with balancing coefficients, that is, by a weighted sum. The result is the statement similarity.
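The weighted sum can be written as a one-line function. The text does not state the balancing coefficient or how the weights are normalised. This sketch therefore assumes a convex combination with a hypothetical coefficient `alpha`, applied to scores that are already on a common scale.

```python
def statement_similarity(first_sim: float, second_sim: float, alpha: float = 0.5) -> float:
    """Weighted sum of the BERT (first) and XGBoost (second) similarities.

    alpha is a hypothetical balancing coefficient; the source does not give its value.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * first_sim + (1.0 - alpha) * second_sim


print(statement_similarity(0.9, 0.7, alpha=0.75))
```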

The statement similarity is measured against a set first similarity threshold and a set second similarity threshold. The first threshold is greater than the second. In Fig. 8, level1 is the first similarity threshold, level2 is the second similarity threshold, and score is the statement similarity.

The thresholds are applied as follows:
- If a statement similarity exceeds the first similarity threshold:
  - The corresponding candidate corpus is determined as the target corpus.
  - The semantics of the target corpus are determined as the sentence-granularity semantics of the user question.
  - If at least two candidate corpora exceed the first threshold, the one with the highest statement similarity is chosen as the target corpus.
- If the statement similarities of all candidate corpora are below the first threshold, the target corpus and the sentence-granularity semantics are both empty.
- If a statement similarity is below the first threshold but above the second:
  - The user question is submitted to a manual reviewer, and the reviewer's result is obtained.
  - If the result is that the sentence is correct, the user question is added to the corpus set.
  - If the result is that the sentence is wrong, the user question is discarded.

When the semantics of the target corpus are taken as the sentence-granularity semantics, the reference word in them is replaced back with the entity word whose grammatical attribute is subject. The BERT and XGBoost models thus give a fine screening of the candidate corpora. This makes it possible to find the target corpus closest to the user question.
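The two-threshold rule can be sketched over a list of (candidate corpus, statement similarity) pairs. The text says "exceeds" and "below" without defining equality. The strict `>` for level1 and the handling of ties are therefore assumptions. The function name `apply_thresholds` is hypothetical.

```python
from typing import List, Optional, Tuple


def apply_thresholds(
    scored: List[Tuple[str, float]], level1: float, level2: float
) -> Tuple[Optional[str], bool]:
    """Return (target corpus or None, whether the user question goes to manual review)."""
    if level1 <= level2:
        raise ValueError("level1 must be greater than level2")
    above = [(corpus, score) for corpus, score in scored if score > level1]
    # With several candidates above level1, the highest statement similarity wins.
    target = max(above, key=lambda pair: pair[1])[0] if above else None
    # Any score between the two thresholds triggers manual review of the question.
    needs_review = any(level2 < score <= level1 for _, score in scored)
    return target, needs_review


print(apply_thresholds([("A", 0.95), ("B", 0.91), ("C", 0.6)], level1=0.9, level2=0.5))
```

A `None` target corresponds to empty sentence-granularity semantics, in which case the word-granularity branch is consulted.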

At the word granularity, the steps are:

1. The grammatical attributes of the entity words are determined.
2. These attributes are verified against set N-gram grammar templates. Entity words whose grammatical attributes do not conform to the templates are removed.
3. N-gram matching is performed. The entity words are matched against the semantic dictionaries, and the semantic attributes of the successfully matched dictionaries are determined.
4. The entity words are arranged according to the progressive relationships of their semantic attributes and replaced with those attributes. The result is the word-granularity semantics of the user question.
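The sketch below covers the dictionary matching and arrangement steps. The text does not define how a progressive relationship is encoded, so the sketch assumes a hypothetical set `progressive` of chained attributes. Attributes outside that set, such as a person entity, are placed first. Chained attributes follow in the order they appear in the question. Words that match no dictionary are dropped.

```python
from typing import Dict, List, Set


def word_granularity_semantics(
    entity_words: List[str],
    semantic_dicts: Dict[str, Set[str]],
    progressive: Set[str],
) -> List[str]:
    """Map entity words to semantic attributes and arrange them (assumed ordering rule)."""
    heads: List[str] = []  # attributes with no progressive relationship, e.g. a person
    chain: List[str] = []  # progressive attributes, kept in question order
    for word in entity_words:
        for attribute, words in semantic_dicts.items():
            if word in words:
                (chain if attribute in progressive else heads).append(attribute)
                break
    return heads + chain


dicts = {"Zhang San": {"Zhang San", "Xiao Zhang"}, "spouse": {"spouse", "wife"}}
print(word_granularity_semantics(["wife", "Xiao Zhang"], dicts, {"spouse"}))  # ['Zhang San', 'spouse']
```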

The output semantics are chosen from the sentence-granularity and word-granularity semantics, with the sentence-granularity semantics taking priority:
- If the sentence-granularity semantics are not empty, they become the output semantics.
- If the sentence-granularity semantics are empty and the word-granularity semantics are not, the word-granularity semantics become the output semantics.
- If both are empty, an answer-failure prompt is output.

The knowledge graph is then queried with the output semantics to obtain semantic results. An answer statement is generated from those results to reply, which completes the sentence-answering flow. Compared with the related-art scheme, the dual-granularity mechanism applies better to different user questions, both simple and complex, and gives more accurate answers.
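The priority rule fits in a small function. Empty semantics are modelled here as `None` or an empty string, which is an assumption. A `None` return value signals that the caller should output the answer-failure prompt.

```python
from typing import Optional


def choose_output_semantics(sentence_sem: Optional[str], word_sem: Optional[str]) -> Optional[str]:
    """Sentence granularity first, then word granularity; None means answer failure."""
    if sentence_sem:
        return sentence_sem
    if word_sem:
        return word_sem
    return None


print(choose_output_semantics(None, "Zhang San spouse"))  # Zhang San spouse
```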

Referring to Fig. 9, which compares answering scenarios provided by an embodiment of the present invention. The left part of Fig. 9 shows a chat application that deploys the related-art sentence answering scheme.

- The user enters the text question "What is your name?". The chat application parses it, obtains the correct output semantics, and replies correctly.
- In question-answering process 91, the user enters "How tall is Li Xiaoming?". Relying on its classification model, the chat application fails to recognize "tall" (height) as the core semantics. Parsing fails, and it returns the answer-failure prompt "I should study harder; I don't understand what you are saying."
- For the question "Do you know how tall Li Xiaoming is?", the chat application likewise fails to parse the question and returns the answer-failure prompt.

The right part of Fig. 9 shows the answering scenario of a chat application that deploys the artificial-intelligence-based sentence answering method of the embodiment. In question-answering process 91 the semantics could not be parsed; process 92 handles the same questions correctly.

- The user enters "How tall is Li Xiaoming?". The chat application parses it, identifies "tall" as the core semantics, and obtains the output semantics "Li Xiaoming height". It then generates the answer statement "Li Xiaoming's height is 187 cm."
- The user enters "Do you know how tall Li Xiaoming is?". The chat application again parses it correctly, obtains the output semantics "Li Xiaoming height", and generates the correct answer statement.

Referring to Fig. 10, which is another comparison of answering scenarios provided by an embodiment of the present invention. The left part of Fig. 10 shows a chat application that deploys the related-art sentence answering scheme.

- The user enters the text question "Zhang San hometown". The chat application parses it, obtains the correct output semantics, and outputs the correct answer statement "Zhang San's birthplace is Chengdu."
- In question-answering process 101, the user enters the noisy question "Zhang San other township". Relying on its classification model, the chat application parses the semantics incorrectly and replies with a music link.

The right part of Fig. 10 shows the answering scenario of a chat application that deploys the artificial-intelligence-based sentence answering method of the embodiment.

- The user enters "How old is Zhang San?". The chat application parses it, identifies "age" as the core semantics, and obtains the output semantics "Zhang San age". It generates the answer statement "Zhang San's age is 36."
- The user enters "Zhang San hometown". The chat application parses it correctly, obtains the output semantics "Zhang San birthplace", and generates the answer statement "Zhang San's birthplace is Chengdu."
- In question-answering process 101, the noisy question "Zhang San other township" was parsed incorrectly. In question-answering process 102, the user enters the same question. This time the chat application filters out the noise word "people", obtains the output semantics "Zhang San birthplace", and generates the answer statement "Zhang San's birthplace is Chengdu."

Referring to Fig. 11, which shows answering scenarios for complex questions provided by an embodiment of the present invention. Both the left and right parts of Fig. 11 show a chat application that deploys the artificial-intelligence-based sentence answering method of the embodiment. The right part is the screen that follows the left part. In Fig. 11, the chat application receives the user's speech through voice input and performs speech recognition on it to obtain the user question.

In question-answering process 111 in the left part, the user question is "Miss Zhang's husband's company":
1. The chat application parses the question and obtains the output semantics "Miss Zhang husband company".
2. It queries the knowledge graph following the order of the semantic attributes.
3. It first queries the semantic attribute "husband" and obtains the attribute result "Liu".
4. Starting from that result, it queries the semantic attribute "company" and obtains the attribute result "xx store".
5. The attribute results are combined into the semantic results.
6. The answer statement "Miss Zhang's husband is Liu, and Liu's company is xx store." is generated.
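The ordered multi-hop query can be sketched with the same assumed in-memory graph format, where keys are (subject, predicate) and values are objects. Each attribute result becomes the subject of the next hop, and each hop yields one attribute result. The answer rendering is a hypothetical template. `chained_query` returns `None` if any hop is missing.

```python
from typing import Dict, List, Optional, Tuple

KnowledgeGraph = Dict[Tuple[str, str], str]  # hypothetical in-memory graph
Hop = Tuple[str, str, str]                   # (subject, semantic attribute, attribute result)


def chained_query(kg: KnowledgeGraph, subject: str, attributes: List[str]) -> Optional[List[Hop]]:
    """Query attributes in order, feeding each attribute result in as the next subject."""
    hops: List[Hop] = []
    current = subject
    for attribute in attributes:
        value = kg.get((current, attribute))
        if value is None:
            return None
        hops.append((current, attribute, value))
        current = value
    return hops


def render(hops: List[Hop]) -> str:
    """Combine the attribute results into one answer statement."""
    return ", and ".join(f"{s}'s {a} is {v}" for s, a, v in hops) + "."


kg: KnowledgeGraph = {("Miss Zhang", "husband"): "Liu", ("Liu", "company"): "xx store"}
hops = chained_query(kg, "Miss Zhang", ["husband", "company"])
print(render(hops))  # Miss Zhang's husband is Liu, and Liu's company is xx store.
```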

When the user question is "Which university did Liu's wife graduate from?", the chat application proceeds as follows:
1. It parses the question and obtains the output semantics "Liu wife graduated school".
2. It queries the knowledge graph following the order of the semantic attributes, first "wife" and then "graduated school".
3. The attribute result for "wife" is "Miss Zhang".
4. The attribute result for "graduated school" is "a certain primary school, a certain middle school, a certain university".
5. The attribute results are combined into the semantic results.
6. It generates the answer statement "Liu's wife is Miss Zhang, and Miss Zhang's graduated schools are a certain primary school, a certain middle school, and a certain university."

In question-answering process 112 in the right part of Fig. 11, the user question is "How old is the mayor of Nanjing?":
1. The chat application parses the question and obtains the output semantics "Nanjing mayor age".
2. It queries the knowledge graph following the order of the semantic attributes, first "mayor" and then "age".
3. The attribute result for "mayor" is "Lan so-and-so".
4. The attribute result for "age" is "55".
5. The attribute results are combined into the semantic results.
6. It generates the answer statement "The mayor of Nanjing is Lan so-and-so, and Lan so-and-so's age is 55."

When the user question is "Is Zhang San's wife Miss Zhu?", the chat application proceeds as follows:
1. It parses the question.
2. It keeps "Zhang San", whose grammatical attribute is person-name subject.
3. It filters out the non-matching word "Miss Zhu", whose grammatical attribute is not subject.
4. This gives the output semantics "Zhang San wife".
5. It queries the knowledge graph with the output semantics and obtains the semantic result "Miss Zhu".
6. It generates the answer statement "Zhang San's wife is Miss Zhu."

When the user question is "Has Yang turned 18?", the chat application proceeds as follows:
1. It parses the question and identifies the semantic attribute "age".
2. This gives the output semantics "Yang age".
3. It queries the knowledge graph with the output semantics and obtains the semantic result "32".
4. It generates the answer statement "Yang's age is 32."

The following describes an exemplary structure in which the artificial-intelligence-based sentence answering device 255 provided by an embodiment of the present invention is implemented as software modules. In some embodiments, as shown in Fig. 2, the software modules of the device 255 stored in memory 250 may include:
- an identification module 2551, configured to obtain the user question and identify the entity words in it;
- a sentence-granularity processing module 2552, configured to determine a target corpus in the corpus set according to the user question, and to determine the semantics of the target corpus as the sentence-granularity semantics of the user question;
- a word-granularity processing module 2553, configured to determine the semantic attributes corresponding to the entity words, and to arrange the entity words according to the progressive relationships of those attributes, which gives the word-granularity semantics of the user question;
- a semantic output module 2554, configured to determine the output semantics of the user question from the sentence-granularity semantics and the word-granularity semantics;
- a result query module 2555, configured to query a set knowledge graph according to the output semantics to obtain semantic results;
- a sentence generation module 2556, configured to generate an answer statement according to the semantic results.

In some embodiments, the sentence-granularity processing module 2552 is further configured to:
- determine candidate corpora in the corpus set according to the user question;
- determine the statement similarity between the user question and each candidate corpus;
- when a statement similarity exceeds the first similarity threshold, determine that candidate corpus as the target corpus, and determine the semantics of the target corpus as the sentence-granularity semantics of the user question;
- when the statement similarities of all candidate corpora are below the first similarity threshold, determine the target corpus and the sentence-granularity semantics to be empty.

In some embodiments, determining candidate corpora in the corpus set according to the user question includes:
- determining the grammatical attributes of the entity words, and replacing each entity word whose grammatical attribute in the user question is subject with a reference word;
- querying the corpus set for corpora related to the replaced user question, and determining the relevance between each such corpus and the replaced user question;
- determining the corpora whose relevance to the replaced user question meets a relevance condition as candidate corpora.

In some embodiments, the sentence-granularity processing module 2552 is further configured to replace the reference word in the semantics of the target corpus with the entity word whose grammatical attribute is subject. This gives the sentence-granularity semantics of the user question.
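The following sketch covers the subject-substitution retrieval and the later restoration of the subject. Several pieces are assumptions. The placeholder `<SUBJ>` is hypothetical, since the text does not say what the reference word looks like. A whitespace-token Jaccard score replaces the Elasticsearch relevance score. The relevance condition is modelled as a minimum score `min_rel`. Corpus entries are assumed to already contain the placeholder.

```python
from typing import List

REF = "<SUBJ>"  # hypothetical reference word standing in for the subject entity


def replace_subject(question: str, subject: str) -> str:
    return question.replace(subject, REF)


def jaccard(a: str, b: str) -> float:
    """Token-overlap relevance; a simple stand-in for an Elasticsearch score."""
    sa, sb = set(a.split()), set(b.split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def candidate_corpora(question: str, subject: str, corpus_set: List[str], min_rel: float) -> List[str]:
    """Rank corpora by relevance to the replaced question and keep those meeting min_rel."""
    replaced = replace_subject(question, subject)
    scored = sorted(((c, jaccard(replaced, c)) for c in corpus_set), key=lambda pair: -pair[1])
    return [c for c, rel in scored if rel >= min_rel]


def restore_subject(target_semantics: str, subject: str) -> str:
    """Put the subject entity word back in place of the reference word."""
    return target_semantics.replace(REF, subject)


corpus_set = ["who is the wife of <SUBJ>", "how old is <SUBJ>", "what is the weather"]
print(candidate_corpora("who is the wife of Xiao Zhang", "Xiao Zhang", corpus_set, 0.5))
print(restore_subject("<SUBJ> spouse", "Xiao Zhang"))  # Xiao Zhang spouse
```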

In some embodiments, determining the statement similarity between the user question and the candidate corpus includes:
- determining a first similarity between the user question and the candidate corpus with a neural network model;
- determining a second similarity between the user question and the candidate corpus with an extreme gradient boosting (XGBoost) model;
- determining the statement similarity between the user question and the candidate corpus from the first similarity and the second similarity.

In some embodiments, the artificial-intelligence-based sentence answering device 255 further includes:
- an auditing module, configured to submit the user question to a set reviewer when the statement similarity between the user question and the candidate corpus is below the first similarity threshold and above the second similarity threshold, and to obtain the reviewer's auditing result for the user question;
- an adding module, configured to add the user question to the corpus set when the auditing result is that the sentence is correct.

The first similarity threshold is greater than the second similarity threshold.

In some embodiments, the identification module 2551 is further configured to:
- perform entity recognition on the user question with a named entity recognition model to obtain a first recognition result;
- perform string matching on the user question against a set dictionary to obtain a second recognition result;
- merge and deduplicate the first and second recognition results to obtain the entity words.

In some embodiments, the artificial-intelligence-based sentence answering device 255 further includes:
- a grammar determining module, configured to determine the grammatical attributes of the entity words;
- a verification module, configured to verify the grammatical attributes of the entity words against set grammar templates;
- an entity word deletion module, configured to delete the corresponding entity word when a grammatical attribute does not conform to the grammar templates.
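The text does not define the format of an N-gram grammar template. This is a deliberately simplified sketch that treats each template as the set of grammatical attributes it admits. It keeps only the entity words whose attribute appears in at least one template. The attribute tags and the function name are hypothetical.

```python
from typing import List, Sequence, Tuple


def filter_by_templates(
    tagged: List[Tuple[str, str]], templates: Sequence[Tuple[str, ...]]
) -> List[Tuple[str, str]]:
    """Keep (entity word, grammatical attribute) pairs whose attribute occurs in some template.

    Simplification: positional N-gram order inside a template is ignored.
    """
    allowed = {attr for template in templates for attr in template}
    return [(word, attr) for word, attr in tagged if attr in allowed]


tagged = [("Xiao Zhang", "person_subject"), ("wife", "relation"), ("hi", "interjection")]
print(filter_by_templates(tagged, [("person_subject", "relation")]))
```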

In some embodiments, the artificial-intelligence-based sentence answering device 255 further includes:
- a word weight determining module, configured to treat entity words in the user question that correspond to no semantic attribute as non-matching words, and to determine the word weights of the non-matching words and of the other entity words;
- a penalty value determining module, configured to determine the penalty value of each non-matching word from its word weight and the sentence length of the user question;
- a score determining module, configured to determine the sentence score of the user question from the penalty values and the word weights of the other entity words;
- an empty-semantics determining module, configured to set the word-granularity semantics to empty when the sentence score is below a sentence score threshold.

In some embodiments, determining the penalty value of a non-matching word from its word weight and the sentence length of the user question includes:
- determining the grammatical attribute of the non-matching word;
- when that attribute is subject, setting the word's penalty value to empty;
- when that attribute is not subject, determining the penalty value from the word's weight and the sentence length of the user question.
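The text says only which quantities the penalty and the sentence score depend on. It does not give the formulas. This sketch therefore uses hypothetical ones:
- the penalty of a non-subject, non-matching word is its weight divided by the sentence length;
- the score is the matched weight as a fraction of matched weight plus penalties.

An "empty" penalty for a subject is modelled as 0. The resulting score would then be compared with a sentence score threshold to decide whether the word-granularity semantics are empty.

```python
from typing import List, Tuple


def sentence_score(
    matched_weights: List[float],
    non_matching: List[Tuple[float, bool]],
    sentence_length: int,
) -> float:
    """Score a question from matched-word weights and non-matching-word penalties.

    non_matching holds (word weight, is_subject) pairs; a subject's penalty is empty (0).
    Both formulas below are hypothetical illustrations.
    """
    if sentence_length <= 0:
        raise ValueError("sentence_length must be positive")
    penalties = [0.0 if is_subject else weight / sentence_length for weight, is_subject in non_matching]
    matched = sum(matched_weights)
    total = matched + sum(penalties)
    return matched / total if total > 0 else 0.0


print(sentence_score([1.0, 1.0], [(0.5, False)], sentence_length=5))
```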

In some embodiments, the word-granularity processing module 2553 is further configured to:
- match the entity words against at least two set semantic dictionaries, where each semantic dictionary corresponds to one semantic attribute;
- when an entity word matches a semantic dictionary, determine that dictionary's semantic attribute as the semantic attribute of the entity word.

In some embodiments, the semantic output module 2554 is further configured to:
- when the sentence-granularity semantics are not empty, determine them as the output semantics;
- when the sentence-granularity semantics are empty and the word-granularity semantics are not, determine the word-granularity semantics as the output semantics;
- when both are empty, output an answer-failure prompt.

In some embodiments, the result query module 2555 is further configured to:
- when the output semantics correspond to at least two semantic attributes, query the knowledge graph in turn, following the order of the semantic attributes in the output semantics, to obtain one attribute result per semantic attribute;
- combine the attribute results into the semantic results.

In some embodiments, the identification module 2551 is further configured to obtain the user speech and perform speech recognition on it to obtain the user question.

In some embodiments, the artificial-intelligence-based sentence answering device 255 further includes:
- a deletion module, configured to identify the set symbols in each corpus of the corpus set and delete them;
- a first conversion module, configured to convert all letters in the corpus to upper case or lower case;
- a second conversion module, configured to convert all Chinese characters in the corpus to traditional or simplified Chinese.
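The three preprocessing steps can be chained in one small function. The sketch makes three assumptions. The symbol set `SET_SYMBOLS` is hypothetical, since the text leaves it unspecified. Lower case and simplified Chinese are picked from the two allowed options. The four-entry traditional-to-simplified table only illustrates the idea; a real system would use a complete mapping.

```python
import re
from typing import Dict

# Hypothetical set of symbols to delete (ASCII and full-width punctuation).
SET_SYMBOLS = re.compile(r"[!?.,;:()\[\]，。！？；：、（）]")

# Tiny illustrative traditional-to-simplified table; a real system needs a full mapping.
TRAD_TO_SIMP: Dict[str, str] = {"張": "张", "們": "们", "嗎": "吗", "誰": "谁"}


def preprocess(text: str) -> str:
    text = SET_SYMBOLS.sub("", text)  # delete set symbols
    text = text.lower()               # unify letters to lower case
    return "".join(TRAD_TO_SIMP.get(ch, ch) for ch in text)  # unify to simplified Chinese


print(preprocess("張三是誰？"))  # 张三是谁
```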

An embodiment of the present invention provides a storage medium storing executable instructions. When a processor executes these instructions, they cause the processor to perform the method provided by the embodiments of the present invention, for example, the artificial-intelligence-based sentence answering method shown in Fig. 4A or Fig. 4B.

In some embodiments, the storage medium may be a memory such as FRAM, ROM, PROM, EPROM, EEPROM, flash memory, magnetic surface memory, optical disc, or CD-ROM. It may also be any device that includes one of these memories or any combination of them.

In some embodiments, the executable instructions may take the form of a program, software, software module, script, or code. They may be written in any form of programming language, including compiled or interpreted languages and declarative or procedural languages. They may be deployed in any form, including as a standalone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.

As an example, the executable instructions may, but need not, correspond to a file in a file system. They may be stored as part of a file that holds other programs or data, for example:
- in one or more scripts in a HyperText Markup Language (HTML) document;
- in a single file dedicated to the program in question;
- in multiple coordinated files, for example, files that store one or more modules, subprograms, or code sections.

As an example, the executable instructions may be deployed to execute on one computing device, on multiple computing devices at one site, or on multiple computing devices distributed across multiple sites and interconnected by a communication network.

In summary, the embodiments of the present invention use a dual-granularity mechanism to improve the semantic generalization of sentence answering. More accurate answer statements can be obtained for both simple and complex questions, which raises the answer accuracy. The embodiments also place modest demands on corpus size in the cold-start stage, whereas the related art usually requires tens of thousands of corpora or more. In the inventors' experiments, sentence answering reaches a certain accuracy once the corpus set holds about a thousand corpora. Corpus self-propagation, in which reviewed user questions are added to the corpus set, then continuously improves the reliability of the corpus set.

The above are merely embodiments of the present invention and are not intended to limit its protection scope. Any modification, equivalent replacement, or improvement made within the spirit and scope of the present invention shall fall within the protection scope of the present invention.

Claims

1. An artificial-intelligence-based sentence answering method, characterized by comprising:

determining a target corpus in a corpus set according to a user question, and determining the semantics of the target corpus as the sentence-granularity semantics of the user question;

determining the semantic attributes corresponding to the entity words, and arranging the entity words according to the progressive relationships corresponding to the semantic attributes to obtain the word-granularity semantics of the user question;

generating an answer statement according to the semantic results.

2. The sentence answering method according to claim 1, characterized in that determining a target corpus in a corpus set according to the user question, and determining the semantics of the target corpus as the sentence-granularity semantics of the user question, comprises:

determining candidate corpora in the corpus set according to the user question;

determining the statement similarity between the user question and each candidate corpus;

when the statement similarity exceeds a first similarity threshold, determining the candidate corpus as the target corpus, and determining the semantics of the target corpus as the sentence-granularity semantics of the user question;

when the statement similarities of all candidate corpora are below the first similarity threshold, determining the target corpus and the sentence-granularity semantics to be empty.

3. The sentence answering method according to claim 2, characterized in that determining candidate corpus entries in the corpus according to the user question comprises:

determining the grammatical attribute of each entity word, and replacing each entity word in the user question whose grammatical attribute is subject with a referring word (pronoun);

performing an integrated query of the corpus for corpus entries relevant to the rewritten user question, and determining the degree of relevance between each such corpus entry and the rewritten user question;

determining the corpus entries whose degree of relevance to the rewritten user question satisfies a relevance condition as the candidate corpus entries.
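
Claim 3 generalizes the question before retrieval by replacing subject entity words with a referring word, so that corpus entries phrased with a pronoun can still match. The sketch assumes tokens already carry grammatical roles, uses the hypothetical referring word "it", and approximates the unspecified relevance measure and relevance condition with a fraction-of-query-tokens-covered score and a minimum cutoff. A production system would more likely run this query against an inverted index or search engine.

```python
from typing import List, Sequence, Tuple

REFERRING_WORD = "it"  # hypothetical referring word (pronoun)


def replace_subjects(tokens: Sequence[Tuple[str, str]]) -> List[str]:
    """tokens are (word, grammatical_role) pairs; subject words become REFERRING_WORD."""
    return [REFERRING_WORD if role == "subject" else word for word, role in tokens]


def relevance(query_tokens: Sequence[str], corpus_text: str) -> float:
    """Fraction of query tokens that also occur in the corpus entry (a toy measure)."""
    query = {t.lower() for t in query_tokens}
    entry = set(corpus_text.lower().split())
    return len(query & entry) / len(query) if query else 0.0


def candidate_corpora(
    tokens: Sequence[Tuple[str, str]],
    corpus: Sequence[str],
    min_relevance: float = 0.6,  # placeholder relevance condition
) -> List[str]:
    """Keep corpus entries whose relevance to the rewritten question meets the cutoff."""
    rewritten = replace_subjects(tokens)
    return [text for text in corpus if relevance(rewritten, text) >= min_relevance]
```
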

4. The sentence answering method according to claim 2, characterized in that determining the sentence similarity between the user question and the candidate corpus entry comprises:

determining a first similarity between the user question and the candidate corpus entry by means of a neural network model;

determining a second similarity between the user question and the candidate corpus entry by means of an extreme gradient boosting (XGBoost) model;

determining the sentence similarity between the user question and the candidate corpus entry according to the first similarity and the second similarity.
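
Claim 4 combines a neural-network similarity with an extreme gradient boosting (XGBoost) similarity but does not say how. The sketch below assumes a simple weighted average with a tunable weight and treats both models as opaque callables returning scores in [0, 1]; any scorers passed in are stand-ins, not trained models.

```python
from typing import Callable

Scorer = Callable[[str, str], float]


def fused_similarity(
    question: str,
    candidate: str,
    nn_model: Scorer,   # first similarity (neural network), assumed in [0, 1]
    xgb_model: Scorer,  # second similarity (XGBoost), assumed in [0, 1]
    nn_weight: float = 0.5,  # assumed fusion weight
) -> float:
    """Weighted average of the two similarities (one possible fusion rule)."""
    if not 0.0 <= nn_weight <= 1.0:
        raise ValueError("nn_weight must lie in [0, 1]")
    first = nn_model(question, candidate)
    second = xgb_model(question, candidate)
    return nn_weight * first + (1.0 - nn_weight) * second
```

With stand-in scorers returning 0.9 and 0.7 and the default weight, the fused score is 0.8 up to floating-point rounding. A weighted average keeps the result in [0, 1], and the weight can be tuned on held-out data; stacking the two scores into a second-stage model would be an alternative.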

5. The sentence answering method according to claim 1, characterized in that, before determining the semantic attribute corresponding to the entity word, the method further comprises:

determining the grammatical attribute of the entity word;

verifying the grammatical attribute of the entity word against a preset grammar template;

when a grammatical attribute does not conform to the grammar template, deleting the entity word corresponding to that grammatical attribute.
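
Claim 5 drops entity words whose grammatical attribute fails a preset grammar template. The sketch below assumes the simplest possible template, a set of grammatical attributes that an entity word is allowed to carry; the attribute names are hypothetical, and a real template might instead constrain attribute sequences or dependency patterns.

```python
from typing import FrozenSet, List, Sequence, Tuple

# Hypothetical grammar template: grammatical attributes an entity word may carry.
GRAMMAR_TEMPLATE: FrozenSet[str] = frozenset({"subject", "object", "attributive"})


def verify_entity_words(
    entity_words: Sequence[Tuple[str, str]],
    template: FrozenSet[str] = GRAMMAR_TEMPLATE,
) -> List[Tuple[str, str]]:
    """Keep (word, grammatical_attribute) pairs that satisfy the template; delete the rest."""
    return [(word, attr) for word, attr in entity_words if attr in template]
```
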

6. The sentence answering method according to claim 1, characterized by further comprising:

determining an entity word in the user question that does not correspond to any semantic attribute as a non-matching word, and determining the word weights of the non-matching word and of the entity words other than the non-matching word;

determining a penalty value of the non-matching word according to the word weight of the non-matching word and the sentence length of the user question;

determining a sentence score of the user question according to the penalty value and the word weights of the entity words other than the non-matching word;

when the sentence score is below a sentence score threshold, determining that the word-granularity semantics is empty.
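
Claim 6 names the inputs of the penalty value and the sentence score but not the formulas. The following sketch assumes one simple choice: the penalty is the summed weight of the non-matching words divided by the sentence length (measured in tokens here, also an assumption), so an unmatched word hurts a short question more than a long one; the sentence score is the summed weight of the matched entity words minus that penalty. The weights, the threshold of 0.5, and the formulas are all placeholders.

```python
from typing import Sequence


def penalty_value(unmatched_weights: Sequence[float], sentence_length: int) -> float:
    """Assumed penalty: total weight of non-matching words, scaled by sentence length."""
    if sentence_length <= 0:
        raise ValueError("sentence_length must be positive")
    return sum(unmatched_weights) / sentence_length


def sentence_score(
    matched_weights: Sequence[float],
    unmatched_weights: Sequence[float],
    sentence_length: int,
) -> float:
    """Assumed score: weight of matched entity words minus the penalty."""
    return sum(matched_weights) - penalty_value(unmatched_weights, sentence_length)


def word_granularity_is_empty(
    matched_weights: Sequence[float],
    unmatched_weights: Sequence[float],
    sentence_length: int,
    score_threshold: float = 0.5,  # placeholder threshold
) -> bool:
    """True when the score is below the threshold, i.e. word-granularity semantics is empty."""
    score = sentence_score(matched_weights, unmatched_weights, sentence_length)
    return score < score_threshold
```
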

7. The sentence answering method according to claim 1, characterized in that determining the semantic attribute corresponding to the entity word comprises:

matching the entity word against at least two preset semantic dictionaries, each semantic dictionary corresponding to one semantic attribute;

when the entity word successfully matches a semantic dictionary, determining the semantic attribute corresponding to that semantic dictionary as the semantic attribute of the entity word.
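
Claim 7 assigns a semantic attribute by matching the entity word against at least two preset semantic dictionaries, one per attribute. A minimal Python sketch, with hypothetical dictionary contents and attribute names, and with the assumption that the first matching dictionary wins when a word appears in more than one (the claim does not address such conflicts):

```python
from typing import Dict, Optional, Set

# Hypothetical semantic dictionaries; each one corresponds to a single semantic attribute.
SEMANTIC_DICTIONARIES: Dict[str, Set[str]] = {
    "person": {"Yao Ming", "Lin Dan"},
    "relation": {"wife", "coach"},
    "property": {"height", "age", "weight"},
}


def semantic_attribute(
    entity_word: str,
    dictionaries: Dict[str, Set[str]] = SEMANTIC_DICTIONARIES,
) -> Optional[str]:
    """Return the attribute of the first dictionary containing the word, else None."""
    for attribute, vocabulary in dictionaries.items():  # dicts keep insertion order
        if entity_word in vocabulary:
            return attribute
    return None  # unmatched: such a word would be a "non-matching word" under claim 6
```
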

8. The sentence answering method according to any one of claims 1 to 7, characterized in that determining the output semantics of the user question according to the paragraph/sentence-granularity semantics and the word-granularity semantics comprises:

when the paragraph/sentence-granularity semantics is not empty, determining the paragraph/sentence-granularity semantics as the output semantics;

when the paragraph/sentence-granularity semantics is empty and the word-granularity semantics is not empty, determining the word-granularity semantics as the output semantics;

when both the paragraph/sentence-granularity semantics and the word-granularity semantics are empty, outputting an answer-failure prompt.
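
Claim 8 is a priority rule: the paragraph/sentence-granularity semantics wins whenever it is non-empty, the word-granularity semantics is the fallback, and an answer-failure prompt is emitted when both are empty. A direct Python transcription, assuming None or an empty string represents "empty" and using a hypothetical prompt text:

```python
from typing import Optional, Tuple

ANSWER_FAILED_PROMPT = "Sorry, this question could not be answered."  # hypothetical text


def output_semantics(
    sentence_semantics: Optional[str],
    word_semantics: Optional[str],
) -> Tuple[Optional[str], Optional[str]]:
    """Return (output_semantics, failure_prompt); exactly one of them is None."""
    if sentence_semantics:  # None or "" counts as empty
        return sentence_semantics, None
    if word_semantics:
        return word_semantics, None
    return None, ANSWER_FAILED_PROMPT
```
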

9. A sentence answering device based on artificial intelligence, characterized by comprising:

a paragraph/sentence-granularity processing module, configured to determine a target corpus entry in a corpus according to the user question, and to determine the semantics of the target corpus entry as the paragraph/sentence-granularity semantics of the user question;

a word-granularity processing module, configured to determine the semantic attribute corresponding to each entity word, and to arrange the entity words according to the progressive relationship corresponding to their semantic attributes, to obtain the word-granularity semantics of the user question;

a semantic output module, configured to determine the output semantics of the user question according to the paragraph/sentence-granularity semantics and the word-granularity semantics;

a result query module, configured to query a preset knowledge graph according to the output semantics to obtain a semantic result;

10. An electronic device, characterized by comprising:

a memory, configured to store executable instructions;

a processor, configured to implement the sentence answering method according to any one of claims 1 to 8 when executing the executable instructions stored in the memory.