CN110245219A - Question-answering method and device based on automatically expanding a Q&A database - Google Patents
Question-answering method and device based on automatically expanding a Q&A database — Download PDF
- Publication number
- CN110245219A CN110245219A CN201910337469.7A CN201910337469A CN110245219A CN 110245219 A CN110245219 A CN 110245219A CN 201910337469 A CN201910337469 A CN 201910337469A CN 110245219 A CN110245219 A CN 110245219A
- Authority
- CN
- China
- Prior art keywords
- sentence
- transcription
- synonymous
- question
- database
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation or dialogue systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Mathematical Physics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Human Computer Interaction (AREA)
- Machine Translation (AREA)
Abstract
The purpose of the present application is to provide a question-answering method and device based on automatically expanding a Q&A database. The application obtains synonymous-paraphrase (transcription) training samples and uses them to optimize a neural-network paraphrase model; determines the question-answer pairs to be added to the database, wherein each pair comprises a question sentence and its corresponding answer sentence; and expands the question sentences of the pairs in the database with the optimized neural-network paraphrase model, obtaining multiple expanded question-answer pairs. By expanding each question in the Q&A database into several synonymous or near-synonymous sentences, the coverage of user requests by the Q&A data is improved, thereby optimizing the effectiveness of the question-answering system.
Description
Technical field
This application relates to the field of computers, and in particular to a question-answering method and device based on automatically expanding a Q&A database.
Background art
A commonly adopted scheme for implementing automatic question answering in the industry at present is query-and-match against a massive Q&A database. After a user issues a text request, the "question" field of every entry in the database is queried with the text content, and the "answer" field of the matched entry is returned. This scheme requires the Q&A database to contain as many question-answer pairs as possible, so as to cover as many of the text requests a user may issue as possible, before the automatic question-answering system can achieve a reasonably good result. When the Q&A data are insufficient, it frequently happens that nothing in the database matches the user's current request.
Summary of the invention
The purpose of the present application is to provide a question-answering method and device based on automatically expanding a Q&A database, solving the prior-art problems that the probability of the database matching a user request is low and the question-answering system is ineffective.
According to one aspect of the application, a question-answering method based on automatically expanding a Q&A database is provided, the method comprising:
obtaining synonymous-paraphrase training samples, and optimizing a neural-network paraphrase model with the samples;
determining the question-answer pairs to be added to the database, wherein each question-answer pair comprises a question sentence and its corresponding answer sentence;
expanding, by the optimized neural-network paraphrase model, the question sentences of the question-answer pairs in the database to obtain multiple expanded question-answer pairs.
Further, obtaining synonymous-paraphrase training samples comprises:
using a semantic matching system to match, from a text database, sentence pairs whose degree of semantic identity reaches a preset threshold, wherein a sentence pair comprises multiple sentences with the same meaning;
putting the sentence pairs whose degree of semantic identity reaches the preset threshold into the synonymous-paraphrase training samples.
Further, using a semantic matching system to match from a text database the sentence pairs whose degree of semantic identity reaches the preset threshold comprises:
obtaining sentences to be matched, and filtering out candidate sentences for each sentence to be matched from a sentence library by means of string comparison;
scoring each sentence to be matched against its candidate sentences with the semantic matching system to obtain matching-degree results;
determining, from the matching-degree results and the preset threshold, the sentence pairs whose degree of semantic identity reaches the preset threshold.
Further, the string comparison comprises:
segmenting the sentence to be matched into words using full-text retrieval to obtain a word-segmentation result;
querying a search index with the word-segmentation result, and returning query results ordered by the words in the segmentation result.
Further, obtaining synonymous-paraphrase training samples comprises:
translating synonymous or near-synonymous sentence pairs in a first language into a second language using a machine translation system, and putting the translated sentence pairs into the synonymous-paraphrase training samples.
Further, obtaining synonymous-paraphrase training samples comprises:
generating, by back-translation on unlabeled text, synonymous-paraphrase model training samples for training.
Further, generating by back-translation on unlabeled text the synonymous-paraphrase model training samples for training comprises:
training a preliminary paraphrase model on the synonymous-paraphrase training samples obtained;
sampling paraphrases of the unlabeled text with the preliminary paraphrase model to obtain sentence pairs, each composed of an input sentence of the preliminary paraphrase model and the corresponding output sentence of the model;
feeding the output sentence of each sentence pair into the encoder of the preliminary paraphrase model, and the input sentence into the decoder of the preliminary paraphrase model.
According to another aspect of the application, a computer-readable medium is further provided, on which computer-readable instructions are stored, the instructions being executable by a processor to implement the aforementioned question-answering method based on automatically expanding a Q&A database.
According to yet another aspect of the application, a question-answering device based on automatically expanding a Q&A database is further provided, wherein the device comprises:
one or more processors; and
a memory storing computer-readable instructions which, when executed, cause the processors to perform the operations of the aforementioned method.
Compared with the prior art, the present application obtains synonymous-paraphrase training samples and optimizes a neural-network paraphrase model with them; determines the question-answer pairs to be added to the database, wherein each pair comprises a question sentence and its corresponding answer sentence; and expands the question sentences of the pairs in the database with the optimized model, obtaining multiple expanded question-answer pairs. By expanding each question in the Q&A database into several synonymous or near-synonymous sentences, the coverage of user requests by the Q&A data is improved, thereby optimizing the effectiveness of the question-answering system.
Brief description of the drawings
Other features, objects, and advantages of the application will become more apparent by reading the following detailed description of non-restrictive embodiments with reference to the accompanying drawings:
Fig. 1 shows a flow diagram of a question-answering method based on automatically expanding a Q&A database provided according to one aspect of the application.
The same or similar reference numerals in the drawings denote the same or similar components.
Detailed description of the embodiments
The application is described in further detail below with reference to the accompanying drawings.
In a typical configuration of this application, the terminal, the device of the service network, and the trusted party each include one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include non-volatile storage in a computer-readable medium, random access memory (RAM), and/or non-volatile memory such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
Fig. 1 shows a flow diagram of a question-answering method based on automatically expanding a Q&A database provided according to one aspect of the application; the method comprises steps S11 to S13.
In step S11, synonymous-paraphrase training samples are obtained, and a neural-network paraphrase model is optimized with them. Here, the synonymous-paraphrase training samples comprise groups of sentences with identical or near-identical meaning, and the neural-network paraphrase model may be any paraphrase model trained by neural-network methods; the detailed structure and hyperparameters of the model are not limited. In one example of the application, a sequence-to-sequence model with attention (a Seq2seq model) is used, wherein the Seq2seq model consists of an encoder and a decoder, and the decoder obtains the information it needs from the encoder through the attention mechanism when generating the target sentence. When the obtained synonymous-paraphrase training samples are used to optimize the neural-network paraphrase model, each sample consists of a group of sentences with identical meaning: one sentence of each pair is fed to the encoder, and the other serves as the decoder's input and output target.
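The pairing of synonymous sentences into encoder inputs and decoder targets can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name and the choice to use each pair in both directions are assumptions of the sketch.

```python
def make_training_examples(synonym_pairs):
    """Turn pairs of synonymous sentences into (encoder_input, decoder_target)
    examples for a Seq2seq paraphrase model. Each pair is used in both
    directions (an assumption of this sketch) so the model can learn to
    paraphrase either way."""
    examples = []
    for a, b in synonym_pairs:
        examples.append((a, b))  # a -> encoder, b -> decoder input/output target
        examples.append((b, a))  # reverse direction
    return examples

pairs = [("what fruit do you like", "which fruit is your favorite")]
examples = make_training_examples(pairs)
```

A real training loop would tokenize both sides and feed them to the encoder and decoder of the attention Seq2seq model described above.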
In step S12, the question-answer pairs to be added to the database are determined, wherein each question-answer pair comprises a question sentence and its corresponding answer sentence. Here, the question-answer pairs are obtained and determined by manual editing or by processing content from the internet; for example, the question sentence of a pair may be "What fruit do you like to eat?" and the answer sentence "I like to eat apples." The question-answer pairs are put into the database, so that, in step S13, the question sentences of the pairs in the database are expanded by the optimized neural-network paraphrase model to obtain multiple expanded question-answer pairs. Here, the question sentence of a certain pair in the database, for example "What fruit do you like to eat?", is expanded with the optimized model into sentences of identical or similar meaning such as "What fruit do you most like to eat?", "Which fruit do you like?", or "What is your favorite fruit?", each of which forms a question-answer pair with the original answer sentence "I like to eat apples." Thus, by expanding one question-answer pair into several pairs whose question sentences differ but whose answer sentence is the same, when the user makes a request the matching search in the database has a greater probability of finding a match among the question-answer pairs, improving the effectiveness of the question-answering system.
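The expansion of one question-answer pair into several pairs that share the same answer can be sketched as below. The helper names and the toy paraphraser are hypothetical stand-ins; in practice the paraphrase function would be the trained Seq2seq model.

```python
def expand_qa_pair(question, answer, paraphrase):
    """Expand one question-answer pair into several pairs that all share
    the same answer, using a paraphrase generator for the question."""
    variants = paraphrase(question)
    return [(q, answer) for q in [question] + variants]

def toy_paraphrase(q):
    # Stand-in for the trained neural paraphrase model.
    return ["What fruit do you most like to eat?",
            "Which fruit is your favorite?"]

expanded = expand_qa_pair("What fruit do you like to eat?",
                          "I like to eat apples.",
                          toy_paraphrase)
```

Each returned pair would then be inserted into the Q&A database alongside the original one.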
In step S11, sentence pairs whose degree of semantic identity reaches a preset threshold are matched from a text database using a semantic matching system, wherein a sentence pair comprises multiple sentences with the same meaning, and the pairs that reach the threshold are put into the synonymous-paraphrase training samples. Here, the semantic matching system is used to find sentence pairs with identical meaning from massive text, which are then used as training samples. The text is unlabeled Chinese text, that is, plain text with no additional information. By contrast, if, say, 10,000 sentences form 5,000 sentence pairs and each pair carries a label 0 or 1 indicating whether the two sentences are semantically identical, then those 5,000 pairs constitute a labeled text corpus. When searching the text database, the search is over unlabeled Chinese text for sentence pairs whose degree of semantic identity reaches the preset threshold; for example, if a question sentence is a, and a sentence whose degree of semantic identity with a reaches the preset threshold is a1, then a and a1 form one sentence pair, which is used as a synonymous-paraphrase training sample.
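The threshold-based selection of sentence pairs can be sketched as follows. Word-overlap (Jaccard) scoring is only a toy stand-in for the semantic matching system, and all function names are assumptions of the sketch.

```python
def jaccard_score(a, b):
    """Toy stand-in for the semantic matching system: word-overlap score."""
    wa, wb = set(a.split()), set(b.split())
    return len(wa & wb) / len(wa | wb)

def select_pairs(queries, candidates, score, threshold):
    """Keep (query, candidate) pairs whose matching score reaches the threshold."""
    kept = []
    for q in queries:
        for c in candidates:
            if c != q and score(q, c) >= threshold:
                kept.append((q, c))
    return kept

pairs = select_pairs(["what fruit do you like"],
                     ["which fruit do you like", "how old are you"],
                     jaccard_score, 0.5)
```

The kept pairs play the role of the sentence pairs a/a1 described above and go into the training samples.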
Specifically, matching from the text database the sentence pairs whose degree of semantic identity reaches the preset threshold can be done as follows: obtain the sentences to be matched, and filter out candidate sentences for each of them from a sentence library by means of string comparison; score each sentence to be matched against its candidate sentences with the semantic matching system to obtain matching-degree results; and determine, from the matching-degree results and the preset threshold, the sentence pairs whose degree of semantic identity reaches the threshold. Here, a certain number of sentences to be matched and a massive sentence library are prepared in advance, and pairings for the sentences to be matched are found in the library. For example, the sentences may be crawled from the internet and screened with a crawler, extracted from the request logs generated while users use the product, or chosen from the "questions" of an existing Q&A database. String comparison between each sentence to be matched and the library preliminarily screens out a certain number of candidate sentences; the semantic matching system then scores each sentence to be matched pairwise against its candidates, a high score indicating a high degree of identity of meaning for the pair, and the pairs reaching the preset threshold are determined from the scoring results; that is, the pairs whose scores exceed a certain threshold are kept and put into the training samples. The string comparison comprises: segmenting the sentence to be matched using full-text retrieval to obtain a word-segmentation result; querying a search index with the word-segmentation result, and returning query results ordered by the words in the result. For example, for the sentence to be matched "What is your name?", segment it and query the index; the candidate sentences retrieved may be "What is your child's name?", "What is your name?", "Tell us what our kitten is called", and "What is your little pet called", and the most suitable matching sentences are then screened from these candidates.
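The string-comparison prefilter can be sketched with a small inverted index. Whitespace splitting stands in for a real Chinese word segmenter, and ranking candidates by the number of shared words is an assumption of the sketch rather than the patent's exact scoring.

```python
from collections import defaultdict

def build_index(library):
    """Inverted index mapping each word to the library sentences containing it."""
    index = defaultdict(set)
    for i, sent in enumerate(library):
        for word in sent.split():
            index[word].add(i)
    return index

def candidates(query, library, index, top_k=3):
    """Rank library sentences by how many words they share with the query."""
    counts = defaultdict(int)
    for word in query.split():
        for i in index.get(word, ()):
            counts[i] += 1
    ranked = sorted(counts, key=lambda i: -counts[i])
    return [library[i] for i in ranked[:top_k]]

library = ["what is your name", "what is your child called",
           "how old are you", "tell us what the kitten is called"]
index = build_index(library)
cands = candidates("what is your name", library, index)
```

The semantic matching system would then score only these few candidates instead of the whole library, which is the point of the prefilter.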
In one embodiment of the application, the synonymous-paraphrase training samples may also be obtained as follows: synonymous or near-synonymous sentence pairs in a first language are translated into a second language using a machine translation system, and the translated pairs are put into the synonymous-paraphrase training samples. Here, the first language is, for example, English and the second language Chinese; ready-made English synonymous or near-synonymous sentence pairs are translated into Chinese with a machine translation system, since compared with Chinese, the English-language field possesses a larger amount of synonymous-paraphrase training data. The existing English sentence pairs are thus converted into Chinese by machine translation and added to the training samples.
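The translation pivot can be sketched as below. The toy dictionary-backed function is only a placeholder for a real machine translation system; the function names are assumptions of the sketch.

```python
def pivot_pairs(source_pairs, translate):
    """Carry synonymous sentence pairs from one language into another:
    translating both sides of a synonymous pair should preserve synonymy."""
    return [(translate(a), translate(b)) for a, b in source_pairs]

# Toy lookup table standing in for a real machine translation system.
TOY_MT = {"hello there": "\u4f60\u597d", "hi": "\u563f"}

def toy_translate(sentence):
    return TOY_MT[sentence]

zh_pairs = pivot_pairs([("hello there", "hi")], toy_translate)
```

The resulting second-language pairs are appended to the synonymous-paraphrase training samples alongside those found by semantic matching.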
In one embodiment of the application, the synonymous-paraphrase training samples may also be generated for training by back-translation on unlabeled text. Specifically, a preliminary paraphrase model is obtained by training on the synonymous-paraphrase training samples already acquired; paraphrases of the unlabeled text are sampled with the preliminary paraphrase model, yielding sentence pairs each composed of an input sentence of the preliminary model and the corresponding output sentence of the model; then the output sentence of each pair is fed into the encoder of the preliminary paraphrase model, and the input sentence into its decoder. Here, the "back-translation" technique used when training machine translation models can be borrowed to perform unsupervised training on massive unlabeled text and obtain massive training samples. Specifically, a rudimentary model is first trained on the training samples obtained by semantic matching or by machine translation; this paraphrase model then samples paraphrases of the unlabeled text, producing sentence pairs each consisting of the sentence fed to the model and the paraphrase the model generated. The generated pairs are used for further training of the paraphrase model: during training, the paraphrase sentence generated by the earlier model is fed into the encoder of the preliminary paraphrase model (e.g. the encoder of the Seq2seq model), and the original sentence (the input sentence of the preliminary model) is fed to the decoder as the training target.
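The back-translation pair construction can be sketched as follows. The stub model is a placeholder for the preliminary paraphrase model; the function names are assumptions of the sketch.

```python
def backtranslation_pairs(unlabeled_sentences, preliminary_model):
    """Sample paraphrases of unlabeled text with a preliminary model and
    build training pairs in the REVERSE direction: the model's output becomes
    the encoder input and the original sentence the decoder target, so the
    noisy generated text sits on the source side, as in back-translation."""
    pairs = []
    for sent in unlabeled_sentences:
        generated = preliminary_model(sent)   # sampled paraphrase
        pairs.append((generated, sent))       # (encoder_input, decoder_target)
    return pairs

def toy_model(s):
    # Stand-in for the preliminary paraphrase model's sampled output.
    return s + " ?"

bt_pairs = backtranslation_pairs(["you like what fruit"], toy_model)
```

Reversing the direction keeps the clean original sentence as the target, which is the design choice back-translation relies on.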
Continuing the above embodiments, after the training of the paraphrase model is completed and question-answer pairs are added to the database, the paraphrase model is used to expand the questions. Specifically, a question is fed into the paraphrase model, which generates several synonymous or near-synonymous sentences above a certain threshold; each of them forms a question-answer pair with the original answer sentence, and the pairs are added to the database together, thereby improving the coverage of user requests. It should be noted that the trained paraphrase model is a generative probabilistic model: while generating sentences, each generated sentence carries a corresponding probability value, and a higher probability represents a higher score for that sentence. When the model is used, beam search can be employed to generate sentences; one may simply choose the top n highest-scoring sentences, or set a score threshold and choose the sentences whose scores exceed it.
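The top-n or threshold selection over beam-search candidates can be sketched as follows. The scores here are illustrative probabilities and the function name is an assumption; the candidates themselves would come from beam search over the trained model.

```python
def select_paraphrases(scored, top_n=None, min_score=None):
    """Pick paraphrases from (sentence, score) candidates produced by beam
    search: either the top_n highest-scoring ones, all above min_score,
    or both criteria combined."""
    ranked = sorted(scored, key=lambda p: -p[1])
    if min_score is not None:
        ranked = [p for p in ranked if p[1] >= min_score]
    if top_n is not None:
        ranked = ranked[:top_n]
    return [s for s, _ in ranked]

cands = [("Which fruit is your favorite?", 0.9),
         ("What fruit do you most like?", 0.7),
         ("Do you like fruit?", 0.2)]
best = select_paraphrases(cands, top_n=2)
above = select_paraphrases(cands, min_score=0.5)
```

Either selection rule realizes the "generate several synonymous sentences above a certain threshold" step described above.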
With the question-answering method based on automatically expanding a Q&A database described herein, a series of deep-learning and machine-learning methods are used to obtain massive synonymous or near-synonymous sentence pairs with which to train the paraphrase model, for example: finding sentence pairs of similar meaning from massive text with a semantic matching system; translating existing English synonymous or near-synonymous sentence pairs into Chinese with a machine translation system; and performing unsupervised training on massive unlabeled text by back-translation. In turn, by expanding the questions in the Q&A database into several synonymous or near-synonymous sentences, the coverage of user requests by the Q&A data is improved, thereby optimizing the effectiveness of the question-answering system.
In addition, an embodiment of the present application further provides a computer-readable medium on which computer-readable instructions are stored, the instructions being executable by a processor to implement the aforementioned question-answering method based on automatically expanding a Q&A database.
According to another aspect of the application, a question-answering device based on automatically expanding a Q&A database is further provided, wherein the device comprises:
one or more processors; and
a memory storing computer-readable instructions which, when executed, cause the processors to perform the operations of the aforementioned method.
For example, the computer-readable instructions, when executed, cause the one or more processors to:
obtain synonymous-paraphrase training samples, and optimize a neural-network paraphrase model with them;
determine the question-answer pairs to be added to the database, wherein each pair comprises a question sentence and its corresponding answer sentence;
expand, by the optimized neural-network paraphrase model, the question sentences of the pairs in the database to obtain multiple expanded question-answer pairs.
Obviously, those skilled in the art can make various modifications and variations to the application without departing from its spirit and scope. If these modifications and variations fall within the scope of the claims of the application and their technical equivalents, the application is intended to include them as well.
It should be noted that the application may be implemented in software and/or a combination of software and hardware; for example, it may be implemented using an application-specific integrated circuit (ASIC), a general-purpose computer, or any other similar hardware device. In one embodiment, the software program of the application may be executed by a processor to implement the steps or functions described above. Likewise, the software program of the application (including related data structures) may be stored in a computer-readable recording medium, for example RAM, a magnetic or optical drive, a floppy disk, or similar devices. In addition, some steps or functions of the application may be implemented in hardware, for example as a circuit that cooperates with the processor to perform each step or function.
In addition, part of the application may be embodied as a computer program product, for example computer program instructions which, when executed by a computer, may invoke or provide the method and/or technical solution according to the application through the operation of that computer. The program instructions that invoke the method of the application may be stored in fixed or removable recording media, transmitted as a data stream via broadcast or other signal-bearing media, and/or stored in the working memory of a computer device running according to the program instructions. Here, an embodiment according to the application comprises a device that includes a memory for storing computer program instructions and a processor for executing the program instructions, wherein, when the computer program instructions are executed by the processor, the device is triggered to run the method and/or technical solution based on the multiple embodiments described above.
It is obvious to those skilled in the art that the application is not limited to the details of the above exemplary embodiments and that the application can be realized in other specific forms without departing from its spirit or essential characteristics. Therefore, from whichever point of view, the present embodiments are to be considered illustrative and not restrictive, and the scope of the application is limited by the appended claims rather than by the above description; it is therefore intended that all variations falling within the meaning and scope of equivalent elements of the claims be included in the application. No reference sign in the claims shall be construed as limiting the claim concerned. Furthermore, the word "comprising" obviously does not exclude other units or steps, and the singular does not exclude the plural. Multiple units or devices stated in a device claim may also be implemented by a single unit or device through software or hardware. Words such as "first" and "second" are used to denote names and do not denote any particular order.
Claims (9)
1. a kind of answering method based on automatic extension Q & A database, wherein the described method includes:
Synonymous transcription training sample is obtained, the synonymous transcription training sample optimization neural network transcription model is passed through;
Determine the question and answer pair being added in database, wherein the question and answer are to including problem sentence and problem sentence is corresponding answers sentence;
By the neural network transcription model after the optimization to the question and answer in the database to the problems in sentence be extended,
Obtain multiple extension question and answer pair.
2. according to the method described in claim 1, wherein, obtaining synonymous transcription training sample includes:
The sentence pair that meaning same degree reaches preset threshold is matched from text database using semantic matches system, wherein institute
Stating sentence pair includes the identical multiple sentences of meaning;
The sentence pair that the meaning same degree reaches preset threshold is put into the synonymous transcription training sample.
3. according to the method described in claim 2, wherein, it is identical that meaning is matched from text database using semantic matches system
Degree reaches the sentence pair of preset threshold, comprising:
Sentence to be matched is obtained, the corresponding time of each sentence to be matched is filtered out from statement library according to the mode of character string comparison
Select sentence;
It is given a mark according to the semantic matches system to the sentence to be matched and its corresponding candidate sentence, obtains matching degree
As a result;
Determine that meaning same degree reaches the sentence pair of preset threshold according to the matching degree result and preset threshold.
4. according to the method described in claim 3, wherein, the mode of the character string comparison includes:
Sentence to be matched is segmented using full-text search, obtains word segmentation result;
Query result is returned by word segmentation result described in search index, and according to the sequence respectively segmented in the word segmentation result.
5. according to the method described in claim 1, wherein, obtaining synonymous transcription training sample includes:
The synonymous or close adopted sentence pair of first language is translated into second language using machine translation system, the sentence pair after translation is put
Enter in the synonymous transcription training sample.
6. according to the method described in claim 1, wherein, obtaining synonymous transcription training sample includes:
The synonymous transcription model training sample for training is generated on without mark text by retroversion mode.
7. The method according to claim 6, wherein generating, by back-translation on unlabeled text, synonymous transcription model training samples for training comprises:
training a preliminary transcription model on the acquired synonymous transcription training samples;
performing transcription sampling on the unlabeled text with the preliminary transcription model to obtain sentence pairs, each consisting of an input sentence to the preliminary transcription model and the corresponding output sentence of the preliminary transcription model;
feeding the output sentence of each sentence pair into the encoder of the preliminary transcription model, and the input sentence into the decoder of the preliminary transcription model.
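The back-translation loop of claims 6 and 7 can be sketched as follows. This is a structural illustration only: `PreliminaryModel.sample` is a stub (a real preliminary transcription model would be a trained seq2seq network), and the key point shown is the direction swap — the sampled output becomes the encoder-side input while the original input becomes the decoder-side target.

```python
class PreliminaryModel:
    """Stub preliminary transcription model; `sample` fakes paraphrase
    sampling by uppercasing (a real model would generate a paraphrase)."""
    def sample(self, sentence: str) -> str:
        return sentence.upper()

def make_back_translation_pairs(model, unlabeled_sentences):
    """For each unlabeled sentence, sample a transcription and emit a
    training pair with the directions swapped: the sampled output feeds
    the encoder, and the original sentence is the decoder target."""
    pairs = []
    for sentence in unlabeled_sentences:
        output = model.sample(sentence)
        pairs.append({"encoder_input": output, "decoder_target": sentence})
    return pairs

pairs = make_back_translation_pairs(PreliminaryModel(), ["reset my password"])
```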
8. A device for question answering based on automatically extending a Q&A database, wherein the device comprises:
one or more processors; and
a memory storing computer-readable instructions which, when executed, cause the processors to perform the operations of the method of any one of claims 1 to 7.
9. A computer-readable medium having computer-readable instructions stored thereon, the computer-readable instructions being executable by a processor to implement the method of any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910337469.7A CN110245219A (en) | 2019-04-25 | 2019-04-25 | A kind of answering method and equipment based on automatic extension Q & A database |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910337469.7A CN110245219A (en) | 2019-04-25 | 2019-04-25 | A kind of answering method and equipment based on automatic extension Q & A database |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110245219A true CN110245219A (en) | 2019-09-17 |
Family
ID=67883261
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910337469.7A Pending CN110245219A (en) | 2019-04-25 | 2019-04-25 | A kind of answering method and equipment based on automatic extension Q & A database |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110245219A (en) |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101076184A (en) * | 2006-07-31 | 2007-11-21 | 腾讯科技(深圳)有限公司 | Method and system for realizing automatic reply |
CN101286161A (en) * | 2008-05-28 | 2008-10-15 | 华中科技大学 | Intelligent Chinese request-answering system based on concept |
CN101630312A (en) * | 2009-08-19 | 2010-01-20 | 腾讯科技(深圳)有限公司 | Clustering method for question sentences in question-and-answer platform and system thereof |
CN104133812A (en) * | 2014-07-17 | 2014-11-05 | 北京信息科技大学 | User-query-intention-oriented Chinese sentence similarity hierarchical calculation method and user-query-intention-oriented Chinese sentence similarity hierarchical calculation device |
CN104657346A (en) * | 2015-01-15 | 2015-05-27 | 深圳市前海安测信息技术有限公司 | Question matching system and question matching system in intelligent interaction system |
CN106484664A (en) * | 2016-10-21 | 2017-03-08 | 竹间智能科技(上海)有限公司 | Similarity calculating method between a kind of short text |
CN107797985A (en) * | 2017-09-27 | 2018-03-13 | 百度在线网络技术(北京)有限公司 | Establish synonymous discriminating model and differentiate the method, apparatus of synonymous text |
CN108090169A (en) * | 2017-12-14 | 2018-05-29 | 上海智臻智能网络科技股份有限公司 | Question sentence extended method and device, storage medium, terminal |
CN108287822A (en) * | 2018-01-23 | 2018-07-17 | 北京容联易通信息技术有限公司 | A kind of Chinese Similar Problems generation System and method for |
CN108846126A (en) * | 2018-06-29 | 2018-11-20 | 北京百度网讯科技有限公司 | Generation, question and answer mode polymerization, device and the equipment of related question polymerization model |
- 2019-04-25: Application filed by applicant; CN CN201910337469.7A patent/CN110245219A/en, status Pending
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110852043A (en) * | 2019-11-19 | 2020-02-28 | 北京字节跳动网络技术有限公司 | Text transcription method, device, equipment and storage medium |
CN110852043B (en) * | 2019-11-19 | 2023-05-23 | 北京字节跳动网络技术有限公司 | Text transcription method, device, equipment and storage medium |
CN111124898A (en) * | 2019-12-10 | 2020-05-08 | 平安国际智慧城市科技股份有限公司 | Question-answering system testing method and device, computer equipment and storage medium |
CN111695356A (en) * | 2020-05-28 | 2020-09-22 | 平安科技(深圳)有限公司 | Synonym corpus generation method, synonym corpus generation device, computer system and readable storage medium |
CN112487167A (en) * | 2020-12-02 | 2021-03-12 | 广州虎牙科技有限公司 | Training method of automatic question-answering model, and related device and equipment |
CN112487167B (en) * | 2020-12-02 | 2023-11-14 | 广州虎牙科技有限公司 | Training method of automatic question-answering model, and related device and equipment |
CN113807074A (en) * | 2021-03-12 | 2021-12-17 | 京东科技控股股份有限公司 | Similar statement generation method and device based on pre-training language model |
WO2022188584A1 (en) * | 2021-03-12 | 2022-09-15 | 京东科技控股股份有限公司 | Similar sentence generation method and apparatus based on pre-trained language model |
CN113434650A (en) * | 2021-06-29 | 2021-09-24 | 平安科技(深圳)有限公司 | Question and answer pair expansion method and device, electronic equipment and readable storage medium |
CN113434650B (en) * | 2021-06-29 | 2023-11-14 | 平安科技(深圳)有限公司 | Question-answer pair expansion method and device, electronic equipment and readable storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110245219A (en) | A kind of answering method and equipment based on automatic extension Q & A database | |
US20200301954A1 (en) | Reply information obtaining method and apparatus | |
Mairesse et al. | Stochastic language generation in dialogue using factored language models | |
US9678992B2 (en) | Text to image translation | |
US20160078047A1 (en) | Method for obtaining search suggestions from fuzzy score matching and population frequencies | |
CN108319668A (en) | Generate the method and apparatus of text snippet | |
CN112800170A (en) | Question matching method and device and question reply method and device | |
JP5710581B2 (en) | Question answering apparatus, method, and program | |
CN109241243B (en) | Candidate document sorting method and device | |
US11556573B2 (en) | Semantic cluster formation in deep learning intelligent assistants | |
JP2020166839A (en) | Sentence recommendation method and apparatus based on associated points of interest | |
KR20230075052A (en) | Method, computer device, and computer program for providing domain-specific conversation using language model | |
US20220358122A1 (en) | Method and system for interactive keyword optimization for opaque search engines | |
CN107679124B (en) | Knowledge graph Chinese question-answer retrieval method based on dynamic programming algorithm | |
KR102401333B1 (en) | System and Method for Robust and Scalable Dialogue | |
EP3635575A1 (en) | Sibling search queries | |
US20210406291A1 (en) | Dialog driven search system and method | |
CN113392305A (en) | Keyword extraction method and device, electronic equipment and computer storage medium | |
Gupta et al. | Keyword extraction: a review | |
Suneera et al. | A bert-based question representation for improved question retrieval in community question answering systems | |
US9223833B2 (en) | Method for in-loop human validation of disambiguated features | |
Vicente et al. | Statistical language modelling for automatic story generation | |
US20230153534A1 (en) | Generating commonsense context for text using knowledge graphs | |
JP2015018372A (en) | Expression extraction model learning device, expression extraction model learning method and computer program | |
KR100621737B1 (en) | Method for auto-classifying Web Sites |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20190917 |