CN107526725A - Method and apparatus for generating text based on artificial intelligence - Google Patents

Method and apparatus for generating text based on artificial intelligence

Info

Publication number
CN107526725A
CN107526725A (application CN201710787262.0A)
Authority
CN
China
Prior art keywords
identification information
text
sequence
word
information sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710787262.0A
Other languages
Chinese (zh)
Other versions
CN107526725B (en)
Inventor
刘毅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201710787262.0A
Publication of CN107526725A
Application granted
Publication of CN107526725B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the present application disclose a method and apparatus for generating text based on artificial intelligence. One embodiment of the method includes: acquiring a text to be expanded; segmenting the text to be expanded to obtain a word sequence of the text to be expanded; determining, according to a pre-stored correspondence between words and identification information, an identification information sequence corresponding to the word sequence; inputting the determined identification information sequence into a pre-trained text expansion model to generate an identification information sequence of the expanded text; and generating the expanded text according to the generated identification information sequence and the correspondence between words and identification information. This embodiment improves the diversity of text generation.

Description

Method and apparatus for generating text based on artificial intelligence
Technical field
The present application relates to the field of computer technology, in particular to the field of Internet technology, and more particularly to a method and apparatus for generating text based on artificial intelligence.
Background technology
Artificial intelligence (AI) is a new technical science that studies and develops theories, methods, techniques, and application systems for simulating, extending, and expanding human intelligence. As a branch of computer science, it attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can respond in a manner similar to human intelligence. Research in this field includes robotics, speech recognition, image recognition, natural language processing, and expert systems.
At present, text expansion is mainly implemented on the basis of a pre-built offline database: words in the text to be expanded are replaced with semantically similar words from the offline database to generate the expanded text.
However, with this approach the maintenance cost of the offline database is high and its data are limited, so the text generation results are rather limited, which impairs the diversity of text generation.
The content of the invention
The purpose of the embodiments of the present application is to propose an improved method and apparatus for generating text based on artificial intelligence, so as to solve the technical problem mentioned in the background section above.
In a first aspect, the present application provides a method for generating text based on artificial intelligence, the method including: acquiring a text to be expanded; segmenting the text to be expanded to obtain a word sequence of the text to be expanded; determining, according to a pre-stored correspondence between words and identification information, an identification information sequence corresponding to the word sequence; inputting the determined identification information sequence into a pre-trained text expansion model to generate an identification information sequence of the expanded text, where the text expansion model is used to characterize the correspondence between the identification information sequence of the text to be expanded and the identification information sequence of the expanded text; and generating the expanded text according to the generated identification information sequence and the correspondence between words and identification information.
In some embodiments, the text expansion model includes an encoding model and a decoding model. The encoding model is used to characterize the correspondence between identification information sequences and encoded information sequences, and the decoding model is used to characterize the correspondence between the identification information of a preset start word together with an encoded information sequence, on the one hand, and an identification information sequence, on the other. Inputting the determined identification information sequence into the pre-trained text expansion model to generate the identification information sequence of the expanded text then includes: inputting the determined identification information sequence into the encoding model to generate an encoded information sequence of the text to be expanded; and inputting the generated encoded information sequence and the identification information of the start word into the decoding model to generate the identification information sequence of the expanded text.
In some embodiments, inputting the determined identification information sequence into the encoding model to generate the encoded information sequence of the text to be expanded includes: inputting each piece of identification information in the determined identification information sequence, in forward order, into a forward-propagating recurrent neural network for encoding to generate a first reference encoded information sequence; inputting each piece of identification information in the determined identification information sequence, in reverse order, into a backward-propagating recurrent neural network for encoding to generate a second reference encoded information sequence; and generating the encoded information sequence of the text to be expanded according to the first reference encoded information sequence and the second reference encoded information sequence.
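The bidirectional encoding described above can be sketched in NumPy as follows. This is a toy illustration under assumed dimensions, not the patent's implementation: a simple tanh RNN consumes the sequence in forward order, a second one consumes it in reverse order, and the two hidden-state sequences are concatenated position by position.

```python
import numpy as np

rng = np.random.default_rng(0)
emb_dim, hid_dim, seq_len = 4, 3, 5

# Toy embedded identification-information sequence (seq_len x emb_dim).
xs = rng.normal(size=(seq_len, emb_dim))

def run_rnn(xs, W_xh, W_hh, b_h):
    """Simple tanh RNN; returns the hidden state at every step."""
    h = np.zeros(W_hh.shape[0])
    states = []
    for x in xs:
        h = np.tanh(x @ W_xh + h @ W_hh + b_h)
        states.append(h)
    return np.stack(states)

# Separate parameters for the forward and backward passes.
params_f = (rng.normal(size=(emb_dim, hid_dim)),
            rng.normal(size=(hid_dim, hid_dim)), np.zeros(hid_dim))
params_b = (rng.normal(size=(emb_dim, hid_dim)),
            rng.normal(size=(hid_dim, hid_dim)), np.zeros(hid_dim))

h_forward = run_rnn(xs, *params_f)               # first reference encoded sequence
h_backward = run_rnn(xs[::-1], *params_b)[::-1]  # second, re-reversed to align positions

# Per-position concatenation yields the encoded information sequence.
encoded = np.concatenate([h_forward, h_backward], axis=1)
print(encoded.shape)  # → (5, 6)
```

Each row of `encoded` thus sees context from both directions of the input, which is what motivates the two reference sequences.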
In some embodiments, inputting the generated encoded information sequence and the identification information of the start word into the decoding model to generate the identification information sequence of the expanded text includes: predicting, based on a recurrent neural network for decoding and the generated encoded information sequence, candidate identification information sequences of the word sequences following the start word; calculating the probability of each predicted identification information sequence from the occurrence probabilities of the pieces of identification information it contains; and selecting, from the predicted identification information sequences, a predetermined number of identification information sequences in descending order of probability as the identification information sequences of the expanded text.
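Selecting a predetermined number of sequences in descending order of probability amounts to a beam search over the decoder's stepwise distributions. In the sketch below, a hypothetical `step_probs` table stands in for the decoding recurrent neural network, and a sequence's probability is the product of the occurrence probabilities of its identification information, as described above.

```python
def beam_search(step_probs, start_id, end_id, beam_width, max_len):
    """Keep the `beam_width` most probable identification-information
    sequences, extending each unfinished one by every possible next token."""
    beams = [([start_id], 1.0)]
    for _ in range(max_len):
        candidates = []
        for seq, p in beams:
            if seq[-1] == end_id:          # finished sequences pass through
                candidates.append((seq, p))
                continue
            for tok, tp in step_probs(seq).items():
                candidates.append((seq + [tok], p * tp))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams

# Hypothetical stand-in for the decoder: fixed next-token distributions
# conditioned on the prefix generated so far (9 is the end token).
TABLE = {
    (0,): {1: 0.6, 2: 0.4},
    (0, 1): {3: 0.9, 9: 0.1},
    (0, 2): {9: 1.0},
    (0, 1, 3): {9: 1.0},
}
step_probs = lambda seq: TABLE[tuple(seq)]

best = beam_search(step_probs, start_id=0, end_id=9, beam_width=2, max_len=3)
print(best[0][0])  # → [0, 1, 3, 9]
```

With `beam_width` set to the predetermined number from the text, the surviving beams are exactly the top sequences by descending probability.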
In some embodiments, predicting the candidate identification information sequences of the word sequences following the start word, based on the recurrent neural network for decoding and the generated encoded information sequence, includes: determining, at each prediction step, weights for the generated encoded information sequence according to an attention model; weighting the generated encoded information sequence according to the weights; and predicting the candidate identification information sequences of the word sequences following the start word based on the recurrent neural network for decoding and the weighted encoded information sequence.
In some embodiments, the text expansion model is trained through the following steps: forming sample groups from pairs of query statements in the click logs of a search engine that correspond to the same clicked link; segmenting the query statements included in each sample group to obtain individual words; selecting a preset number of words from the segmented words in descending order of occurrence count; allocating identification information to each selected word and storing the correspondence between words and identification information; determining, according to the correspondence between words and identification information, the identification information sequences corresponding to the query statements included in each sample group; and training the text expansion model using the identification information sequences corresponding to the two query statements included in each sample group as input and output, respectively.
In some embodiments, the text to be expanded is generated from query information input by a terminal; and after the expanded text is generated according to the generated identification information sequence and the correspondence between words and identification information, the method further includes: performing a search operation based on the generated text to obtain search result information; and pushing the search result information to the terminal.
In a second aspect, the present application provides an apparatus for generating text based on artificial intelligence, the apparatus including: an acquiring unit for acquiring a text to be expanded; a segmentation unit for segmenting the text to be expanded to obtain a word sequence of the text to be expanded; a determining unit for determining, according to a pre-stored correspondence between words and identification information, an identification information sequence corresponding to the word sequence; a first generation unit for inputting the determined identification information sequence into a pre-trained text expansion model to generate an identification information sequence of the expanded text, where the text expansion model is used to characterize the correspondence between the identification information sequence of the text to be expanded and that of the expanded text; and a second generation unit for generating the expanded text according to the generated identification information sequence and the correspondence between words and identification information.
In some embodiments, the text expansion model includes an encoding model and a decoding model, the encoding model being used to characterize the correspondence between identification information sequences and encoded information sequences, and the decoding model being used to characterize the correspondence between the identification information of a preset start word together with an encoded information sequence, on the one hand, and an identification information sequence, on the other. The first generation unit includes: an encoding subunit for inputting the determined identification information sequence into the encoding model to generate an encoded information sequence of the text to be expanded; and a decoding subunit for inputting the generated encoded information sequence and the identification information of the start word into the decoding model to generate the identification information sequence of the expanded text.
In some embodiments, the encoding subunit is further configured to: input each piece of identification information in the determined identification information sequence, in forward order, into a forward-propagating recurrent neural network for encoding to generate a first reference encoded information sequence; input each piece of identification information in the determined identification information sequence, in reverse order, into a backward-propagating recurrent neural network for encoding to generate a second reference encoded information sequence; and generate the encoded information sequence of the text to be expanded according to the first and second reference encoded information sequences.
In some embodiments, the decoding subunit is further configured to: predict, based on a recurrent neural network for decoding and the generated encoded information sequence, candidate identification information sequences of the word sequences following the start word; calculate the probability of each predicted identification information sequence from the occurrence probabilities of the pieces of identification information it contains; and select, from the predicted identification information sequences, a predetermined number of identification information sequences in descending order of probability as the identification information sequences of the expanded text.
In some embodiments, the decoding subunit is further configured to: determine, at each prediction step, weights for the generated encoded information sequence according to an attention model; weight the generated encoded information sequence according to the weights; and predict the candidate identification information sequences of the word sequences following the start word based on the recurrent neural network for decoding and the weighted encoded information sequence.
In some embodiments, the apparatus further includes a training unit configured to: form sample groups from pairs of query statements in the click logs of a search engine that correspond to the same clicked link; segment the query statements included in each sample group to obtain individual words; select a preset number of words from the segmented words in descending order of occurrence count; allocate identification information to each selected word and store the correspondence between words and identification information; determine, according to the correspondence between words and identification information, the identification information sequences corresponding to the query statements included in each sample group; and train the text expansion model using the identification information sequences corresponding to the two query statements included in each sample group as input and output, respectively.
In some embodiments, the text to be expanded is generated from query information input by a terminal, and the apparatus further includes a push unit configured to: perform a search operation based on the generated text to obtain search result information; and push the search result information to the terminal.
In a third aspect, the present application provides a device including: one or more processors; and a storage apparatus for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method described in the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium on which a computer program is stored, where the program, when executed by a processor, implements the method described in the first aspect.
The method and apparatus for generating text based on artificial intelligence provided by the embodiments of the present application acquire a text to be expanded and segment it to obtain its word sequence, then input the identification information sequence corresponding to the word sequence into a pre-trained text expansion model to generate the identification information sequence of the expanded text, and finally generate the expanded text according to the generated identification information sequence and the correspondence between words and identification information, thereby improving the diversity of text generation.
Brief description of the drawings
Other features, objects, and advantages of the present application will become more apparent by reading the following detailed description of non-limiting embodiments made with reference to the accompanying drawings:
Fig. 1 is a diagram of an exemplary system architecture to which the present application may be applied;
Fig. 2 is a schematic flowchart of one embodiment of the method for generating text based on artificial intelligence according to the present application;
Fig. 3 is a schematic diagram of an application scenario of the method for generating text based on artificial intelligence according to the present application;
Fig. 4 is a schematic flowchart of another embodiment of the method for generating text based on artificial intelligence according to the present application;
Fig. 5 is an exemplary structural diagram of one embodiment of the apparatus for generating text based on artificial intelligence according to the present application;
Fig. 6 is a schematic structural diagram of a computer system suitable for implementing the server of the embodiments of the present application.
Embodiment
The present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are used only to explain the related invention, not to limit it. It should also be noted that, for ease of description, only the parts related to the invention are shown in the drawings.
It should be noted that, in the case of no conflict, the embodiments in the present application and the features in the embodiments may be combined with each other. The present application will be described in detail below with reference to the drawings and in conjunction with the embodiments.
Fig. 1 shows an exemplary system architecture 100 to which embodiments of the method or apparatus for generating text based on artificial intelligence of the present application may be applied.
As shown in Fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and servers 105, 106. The network 104 serves as a medium providing communication links between the terminal devices 101, 102, 103 and the servers 105, 106. The network 104 may include various connection types, such as wired or wireless communication links or fiber optic cables.
A user 110 may use the terminal devices 101, 102, 103 to interact with the servers 105, 106 via the network 104 to receive or send data. Various applications may be installed on the terminal devices 101, 102, 103, such as web browser applications, search engine applications, map applications, payment applications, social applications, shopping applications, instant messaging tools, and mobile assistant applications.
The terminal devices 101, 102, 103 may be various electronic devices that have a display screen and support a search function, including but not limited to smartphones, tablet computers, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III), MP4 players (Moving Picture Experts Group Audio Layer IV), laptop portable computers, desktop computers, and the like.
The servers 105, 106 may be servers providing various services, for example background servers providing support for the terminal devices 101, 102, 103. A background server may analyze and process received data such as requests and feed the processing results back to the terminal devices; for example, it may generate the expanded text from the text to be expanded sent by a terminal.
It should be noted that the method for generating text based on artificial intelligence provided by the embodiments of the present application may be performed by the servers 105, 106, and correspondingly, the apparatus for generating text based on artificial intelligence may be disposed in the servers 105, 106.
It should be understood that the numbers of terminal devices, networks, and servers in Fig. 1 are merely illustrative. There may be any number of terminal devices, networks, and servers according to implementation needs.
With continued reference to Fig. 2, a flow 200 of one embodiment of the method for generating text based on artificial intelligence according to the present application is shown. The method for generating text based on artificial intelligence includes the following steps:
Step 201: acquire a text to be expanded.
In this embodiment, the electronic device on which the method for generating text based on artificial intelligence runs (for example, the server shown in Fig. 1) may acquire the text to be expanded locally or from other electronic devices. The text to be expanded may be any text of expansion value that the electronic device can obtain, for example query information (a query) previously input by a user and stored locally, query information included in an inquiry request sent by a user through a terminal in real time, or text input by other users.
Step 202: segment the text to be expanded to obtain a word sequence of the text to be expanded.
In this embodiment, the electronic device may segment the text to be expanded obtained in step 201 to obtain the word sequence of the text to be expanded. Segmenting the text to be expanded is a word-cutting/word-segmentation operation, which may be handled by methods such as full segmentation, dividing the text to be expanded into words to obtain its word sequence. For example, "what seafood should a pregnant woman eat to supplement zinc" may be segmented into "pregnant woman / eat / what / seafood / supplement / zinc".
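As a concrete illustration of word segmentation, the sketch below uses forward maximum matching against a toy dictionary. This is a deliberately simplified alternative to the full-segmentation method mentioned above; the dictionary contents are illustrative, and the Chinese string is the presumed original form of the example query.

```python
WORD_DICT = {"孕妇", "吃", "什么", "海", "产品", "海产品", "补", "锌"}
MAX_WORD_LEN = max(len(w) for w in WORD_DICT)

def fmm_segment(text, word_dict=WORD_DICT):
    """Forward maximum matching: at each position, take the longest
    dictionary word; fall back to a single character for unknown text."""
    words, i = [], 0
    while i < len(text):
        for length in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in word_dict:
                words.append(candidate)
                i += length
                break
    return words

print(fmm_segment("孕妇吃什么海产品补锌"))
# → ['孕妇', '吃', '什么', '海产品', '补', '锌']
```

Note how the greedy longest match prefers "海产品" (seafood) over the shorter dictionary entries "海" and "产品" at the same position.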
Step 203: determine, according to a pre-stored correspondence between words and identification information, an identification information sequence corresponding to the word sequence.
In this embodiment, the electronic device may determine, according to the pre-stored correspondence between words and identification information, the identification information sequence corresponding to the word sequence obtained in step 202. Identification information is another representation of a word and may consist of letters and/or digits; for example, a word's identification information may be its sequence number in a preset dictionary, while words absent from the dictionary may be represented by a unified piece of identification information such as "UNKNOWN". The preset dictionary may be obtained by performing word segmentation on a corpus, counting the occurrence frequencies of the resulting words, and storing the high-frequency words.
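A minimal sketch of how such a word-to-identification-information correspondence could be built: word frequencies are counted over a segmented corpus, the most frequent words receive sequence numbers, and out-of-dictionary words share the unified "UNKNOWN" identifier. The corpus and vocabulary size here are illustrative.

```python
from collections import Counter

segmented_corpus = [
    ["pregnant woman", "eat", "what", "seafood", "supplement", "zinc"],
    ["pregnant woman", "eat", "what", "food", "supplement", "zinc"],
    ["infant", "eat", "what"],
]

VOCAB_SIZE = 6  # keep only the most frequent words
counts = Counter(w for sentence in segmented_corpus for w in sentence)
word_to_id = {w: i for i, (w, _) in enumerate(counts.most_common(VOCAB_SIZE))}

def to_id_sequence(words, unk="UNKNOWN"):
    """Map a word sequence to its identification-information sequence."""
    return [word_to_id.get(w, unk) for w in words]

ids = to_id_sequence(["pregnant woman", "eat", "caviar"])
print(ids)  # → [2, 0, 'UNKNOWN']
```

The inverse dictionary `{i: w for w, i in word_to_id.items()}` is what step 205 would later use to turn generated identification information back into words.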
Step 204: input the determined identification information sequence into a pre-trained text expansion model to generate an identification information sequence of the expanded text.
In this embodiment, the electronic device may input the identification information sequence determined in step 203 into the pre-trained text expansion model to generate the identification information sequence of the expanded text. The text expansion model may be used to characterize the correspondence between the identification information sequence of the text to be expanded and the identification information sequence of the expanded text.
As an example, the text expansion model may include one or more neural network models. A neural network model may adopt a recurrent neural network (RNN), in whose network structure the connections between hidden nodes form a cycle, so that the network can learn not only the information of the current moment but also depend on earlier sequence information; its special network structure solves the problem of preserving information over time. RNNs therefore have unique advantages in processing time series and language text sequences. Further, the text expansion model may also be composed of one or more RNN variants such as long short-term memory networks (LSTM) and gated recurrent units (GRU). The text expansion model may also be one or more operational formulas, preset by a technician based on statistics over massive data and stored in the electronic device, that operate on the identification information of the text to be expanded to obtain the identification information sequence of the expanded text.
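As an illustration of the gated recurrent unit mentioned above, one GRU step can be written in a few lines of NumPy. The dimensions and random weights are toy values; this sketches the cell's standard update equations, not the patent's model.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, W_z, U_z, W_r, U_r, W_h, U_h):
    """One GRU step: update gate z, reset gate r, candidate state h_tilde."""
    z = sigmoid(x @ W_z + h @ U_z)              # how much of the new candidate to take
    r = sigmoid(x @ W_r + h @ U_r)              # how much of the old state feeds h_tilde
    h_tilde = np.tanh(x @ W_h + (r * h) @ U_h)
    return (1 - z) * h + z * h_tilde

rng = np.random.default_rng(1)
d_in, d_hid = 4, 3
weights = [rng.normal(scale=0.1, size=shape)
           for shape in [(d_in, d_hid), (d_hid, d_hid)] * 3]

h = np.zeros(d_hid)
for x in rng.normal(size=(5, d_in)):   # run the cell over a toy sequence
    h = gru_step(x, h, *weights)
print(h.shape)  # → (3,)
```

The gating is what lets the hidden state carry information across many steps, which is the property the text credits RNN variants with for sequence problems.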
In some optional implementations of this embodiment, the text expansion model includes an encoding model and a decoding model. The encoding model is used to characterize the correspondence between identification information sequences and encoded information sequences, and the decoding model is used to characterize the correspondence between the identification information of a preset start word together with an encoded information sequence, on the one hand, and an identification information sequence, on the other. Inputting the determined identification information sequence into the pre-trained text expansion model to generate the identification information sequence of the expanded text then includes: inputting the determined identification information sequence into the encoding model to generate an encoded information sequence of the text to be expanded; and inputting the generated encoded information sequence and the identification information of the start word into the decoding model to generate the identification information sequence of the expanded text.
In this implementation, encoding may convert the input sequence into a vector of fixed length, and decoding may convert the previously generated fixed-length vector into an output sequence. The encode-store-decode process imitates the brain's read-memorize-output process. Besides the "encode-decode" mechanism, an attention model may also be used to complete the mapping between the identification information sequence of the text to be expanded and that of the expanded text. An attention model does not require the encoder to compress all input information into a single fixed-length vector, so that when each output is produced, the information carried by the input sequence can be fully used. The start word may be configured as actually needed; for example, it may be "START".
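The attention mechanism of this implementation can be sketched as a softmax over alignment scores between the current decoder state and each encoder position, followed by a weighted sum. Dot-product scoring is chosen here for brevity; the application does not fix a particular scoring function.

```python
import numpy as np

def attend(decoder_state, encoder_states):
    """Weight the encoded information sequence for one decoding step."""
    scores = encoder_states @ decoder_state          # one score per input position
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                         # softmax over positions
    context = weights @ encoder_states               # weighted encoded information
    return context, weights

encoder_states = np.array([[1.0, 0.0],
                           [0.0, 1.0],
                           [1.0, 1.0]])
decoder_state = np.array([2.0, 0.0])

context, weights = attend(decoder_state, encoder_states)
print(weights.round(3))  # → [0.468 0.063 0.468]
```

Because the weights are recomputed from the decoder state at every step, each output can draw on different positions of the input sequence instead of a single fixed-length summary vector.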
In some optional implementations of this embodiment, the text expansion model is trained through the following steps: forming sample groups from pairs of query statements in the click logs of a search engine that correspond to the same clicked link; segmenting the query statements included in each sample group to obtain individual words; selecting a preset number of words from the segmented words in descending order of occurrence count; allocating identification information to each selected word and storing the correspondence between words and identification information; determining, according to the correspondence between words and identification information, the identification information sequences corresponding to the query statements included in each sample group; and training the text expansion model using the identification information sequences corresponding to the two query statements included in each sample group as input and output, respectively.
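The first training step, pairing query statements that led to the same clicked link, could look like the following sketch. The log format is hypothetical; ordered pairs are formed in both directions, since either query of a pair may serve as the input or the output.

```python
from collections import defaultdict
from itertools import permutations

# Hypothetical click-log records: (query statement, clicked link).
click_log = [
    ("what seafood supplements zinc", "https://example.com/zinc"),
    ("zinc rich foods for pregnant women", "https://example.com/zinc"),
    ("best prenatal diet", "https://example.com/diet"),
    ("foods high in zinc", "https://example.com/zinc"),
]

queries_by_link = defaultdict(list)
for query, link in click_log:
    queries_by_link[link].append(query)

# Every ordered pair of queries sharing a clicked link is a training sample.
sample_groups = [pair
                 for queries in queries_by_link.values()
                 for pair in permutations(queries, 2)]

print(len(sample_groups))  # → 6  (3 zinc queries give 3*2 ordered pairs)
```

A link clicked by only one query contributes no pairs, which matches the intuition that the pairing mines relatedness from shared click behavior rather than from any single query.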
In this implementation, the loss function of model training may be determined according to the probabilities of word occurrence. The text expansion model may first be randomly initialized and then trained on the training data by mini-batch stochastic gradient descent so as to minimize the empirical risk. Because the click logs of a search engine are rich in content, and the relation between query statements corresponding to the same clicked link goes beyond mere semantic similarity, the generated expanded text is richer and the model output is closer to real query statements; if a search is subsequently performed according to the expanded text, the search effect is better. Besides query statements corresponding to the same clicked link, the corpus for training the text expansion model may also be other associated texts submitted by users or generated by machines.
Step 205: generate the expanded text according to the generated identification information sequence and the correspondence between words and identification information.
In this embodiment, the electronic device may determine, according to the correspondence between words and identification information, the word corresponding to each piece of identification information in the identification information sequence generated in step 204, thereby obtaining the expanded text.
In some optional implementations of this embodiment, the text to be expanded is generated from query information input by a terminal; and after the expanded text is generated according to the generated identification information sequence and the correspondence between words and identification information, the method further includes: performing a search operation based on the generated text to obtain search result information; and pushing the search result information to the terminal.
In this implementation, a user may input query information in the form of voice, picture, or text through a terminal; the above electronic device may convert it into text and take the converted text as the text to be expanded. The search result information pushed to the terminal may serve as a supplement to the search results of the original text to be expanded, further saving the time a user spends obtaining information.
The method provided by the above embodiment of the application obtains a text to be expanded, segments it to obtain the word sequence of the text to be expanded, inputs the identification information sequence corresponding to the word sequence into a pre-trained text expansion model to generate the identification information sequence of the expanded text, and finally generates the expanded text according to the generated identification information sequence and the correspondence between words and identification information, thereby improving the diversity of text generation.
With continued reference to Fig. 3, it illustrates a schematic diagram of an application scenario of the method for generating text based on artificial intelligence according to the application. In the application scenario of Fig. 3, the server 301 first obtains the text to be expanded 303, "what seafood can a pregnant woman eat to supplement zinc", uploaded by a user through the terminal 302; the server then performs processing such as word segmentation on it and inputs the processed information into the pre-trained text expansion model, finally obtaining the expanded text 304, which includes "what trace elements can a pregnant woman supplement by eating seafood" and "what food can a pregnant woman eat to supplement zinc".
Referring to Fig. 4, Fig. 4 is a schematic flowchart of another embodiment of the method for generating text based on artificial intelligence.

In Fig. 4, the method 400 for generating text based on artificial intelligence comprises the following steps:
Step 401: obtaining a text to be expanded.

In the present embodiment, the electronic device on which the method for generating text based on artificial intelligence runs (for example, the server shown in Fig. 1) may obtain the text to be expanded locally or from other electronic devices.
Step 402: segmenting the text to be expanded to obtain the word sequence of the text to be expanded.

In the present embodiment, the above electronic device may segment the text to be expanded obtained in step 401 to obtain the word sequence of the text to be expanded.
Step 403: determining, according to the pre-stored correspondence between words and identification information, the identification information sequence corresponding to the word sequence.

In the present embodiment, the above electronic device may determine, according to the pre-stored correspondence between words and identification information, the identification information sequence corresponding to the word sequence obtained in step 402.
Step 404: inputting each piece of identification information in the determined identification information sequence, in forward order, into a forward-propagation recurrent neural network for encoding, to generate a first reference encoded information sequence.

In the present embodiment, the above electronic device may input each piece of identification information in the identification information sequence determined in step 403, in forward order, into the forward-propagation recurrent neural network for encoding, to generate the first reference encoded information sequence. Taking an LSTM recurrent neural network as an example, the LSTM includes an input gate, a forget gate, and an output gate, and the first reference encoded information sequence may be calculated by the following equations:
i_{enc,t} = σ(W_{enc,i} x_{enc} + U_{enc,i} h_{t-1} + b_{enc,i})    (1)

f_{enc,t} = σ(W_{enc,f} x_{enc} + U_{enc,f} h_{t-1} + b_{enc,f})    (2)

o_{enc,t} = σ(W_{enc,o} x_{enc} + U_{enc,o} h_{t-1} + b_{enc,o})    (3)

c̃_{enc,t} = tanh(W_{enc,c} x_{enc} + U_{enc,c} h_{t-1} + b_{enc,c})    (4)

c_{enc,t} = f_{enc,t} ⊙ c_{enc,t-1} + i_{enc,t} ⊙ c̃_{enc,t}    (5)

h_t = o_{enc,t} ⊙ tanh(c_{enc,t})    (6)
After the information of the current word x_t and the information of the preceding word sequence h_{t-1} are jointly modeled, the information h_t of the current word sequence is generated.

Here, the symbol ⊙ denotes element-wise multiplication, tanh(·) denotes the hyperbolic tangent function, and σ denotes the sigmoid function. t denotes the current time, and x_{enc} denotes the input of the encoding neural network at the current time; the inputs at all times together constitute the identification information sequence determined in step 403. W_{enc,i}, W_{enc,f}, W_{enc,o}, W_{enc,c}, U_{enc,i}, U_{enc,f}, U_{enc,o}, U_{enc,c} denote the weight matrices of the encoding neural network, and b_{enc,i}, b_{enc,f}, b_{enc,o}, b_{enc,c} denote the bias terms of the encoding neural network. i_{enc,t} denotes the value of the input gate of the encoding neural network at the current time, f_{enc,t} the value of its forget gate, and o_{enc,t} the value of its output gate. h_t denotes the output state of the encoding neural network at the current time, and h_{t-1} the output state of the encoding neural network at the previous time. c_{enc,t} denotes the state information of the encoding neural network at the current time, c_{enc,t-1} the state information of the encoding neural network at the previous time, and c̃_{enc,t} the newly generated candidate state information in the encoding neural network.
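The encoder update described above can be sketched in pure Python. This is a toy illustration using scalar weights in place of the weight matrices, with a hypothetical parameter dictionary `p`; real models operate on vectors and matrices.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, p):
    """One encoder LSTM step in scalar form; p holds the toy weights."""
    i = sigmoid(p["W_i"] * x + p["U_i"] * h_prev + p["b_i"])          # input gate
    f = sigmoid(p["W_f"] * x + p["U_f"] * h_prev + p["b_f"])          # forget gate
    o = sigmoid(p["W_o"] * x + p["U_o"] * h_prev + p["b_o"])          # output gate
    c_tilde = math.tanh(p["W_c"] * x + p["U_c"] * h_prev + p["b_c"])  # candidate state
    c = f * c_prev + i * c_tilde                                      # new cell state
    h = o * math.tanh(c)                                              # output state
    return h, c

def encode(id_sequence, p):
    """Run the forward-propagation encoder over an identification
    information sequence, collecting the output state h_t at each step."""
    h, c, outputs = 0.0, 0.0, []
    for x in id_sequence:
        h, c = lstm_step(float(x), h, c, p)
        outputs.append(h)
    return outputs
```

Since h_t is an output gate value in (0, 1) times a tanh in (-1, 1), every collected output stays strictly inside (-1, 1).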
Step 405: inputting each piece of identification information in the determined identification information sequence, in reverse order, into a back-propagation recurrent neural network for encoding, to generate a second reference encoded information sequence.

In the present embodiment, the above electronic device may input each piece of identification information in the identification information sequence determined in step 403, in reverse order, into the back-propagation recurrent neural network for encoding, to generate the second reference encoded information sequence. The back-propagation recurrent neural network may obtain suitable parameters through multiple rounds of iteration by gradient descent.
Step 406: generating the encoded information sequence of the text sequence to be expanded according to the first reference encoded information sequence and the second reference encoded information sequence.

In the present embodiment, the above electronic device may generate the encoded information sequence of the text sequence to be expanded according to the first reference encoded information sequence generated in step 404 and the second reference encoded information sequence generated in step 405. The above electronic device may set a weight matrix and weight the first reference encoded information sequence and the second reference encoded information sequence according to the weight matrix, to generate the encoded information sequence of the text sequence to be expanded. The weight matrix may be preset, or may be determined by machine learning methods.
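One simple reading of the weighting step is a per-position weighted sum of the two reference sequences. The sketch below assumes scalar weights in place of the weight matrix, and assumes the backward sequence must be reversed so that both sequences index the same word positions.

```python
def combine_encodings(forward_seq, backward_seq, w_f=0.5, w_b=0.5):
    """Weighted combination of the first (forward) and second (backward)
    reference encoded information sequences, position by position."""
    # Reverse the backward encoder's outputs so that index i of both
    # sequences refers to word i of the text to be expanded.
    backward_aligned = list(reversed(backward_seq))
    return [w_f * f + w_b * b for f, b in zip(forward_seq, backward_aligned)]
```

Concatenating the two states per position is another common choice in bidirectional encoders; the patent text only requires some weighted combination.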
Step 407: predicting the identification information sequences of candidate word sequences following the start word, based on a recurrent neural network for decoding and the generated encoded information sequence.

In the present embodiment, the above electronic device may predict the identification information sequences of the candidate word sequences following the start word, based on the recurrent neural network for decoding and the encoded information sequence generated in step 406. Unlike in encoding, during decoding the hidden-layer result output at each step of the sequence traversal needs to predict the corresponding target word, and that target word serves as the input of the next iteration. In addition, the above electronic device may also determine, according to an attention model, the weight of the generated encoded information sequence at each prediction; weight the generated encoded information sequence according to the weight; and predict the identification information sequences of the candidate word sequences following the start word based on the recurrent neural network for decoding and the weighted encoded information sequence. The encoder-side results are weighted and summed by the attention model to generate the context information.
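The feed-back loop described above (each predicted target word becomes the next input) can be sketched as a greedy decoder. The `step` function is an assumed stand-in for one decoder RNN step returning next-word probabilities; the token IDs are hypothetical.

```python
def greedy_decode(step, start_id, end_id, max_len):
    """Decoding loop: at each step the hidden-layer result predicts a
    target word, which then becomes the input of the next iteration.

    step(token, state) -> (probs, state), where probs maps each candidate
    next token to its probability."""
    token, state, output = start_id, None, []
    for _ in range(max_len):
        probs, state = step(token, state)
        token = max(probs, key=probs.get)   # predicted target word
        if token == end_id:
            break
        output.append(token)                # feed back as the next input
    return output
```

Greedy decoding keeps only the single best word at each step; the beam search variant described later keeps several.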
Taking an LSTM recurrent neural network as an example, the identification information sequences of the candidate word sequences following the start word may be predicted according to the following equations:
i_{dec,t} = σ(W_{dec,i} x_{dec} + U_{dec,i} s_{t-1} + A_i a_t + b_{dec,i})    (7)

f_{dec,t} = σ(W_{dec,f} x_{dec} + U_{dec,f} s_{t-1} + A_f a_t + b_{dec,f})    (8)

o_{dec,t} = σ(W_{dec,o} x_{dec} + U_{dec,o} s_{t-1} + A_o a_t + b_{dec,o})    (9)

c̃_{dec,t} = tanh(W_{dec,c} x_{dec} + U_{dec,c} s_{t-1} + A_c a_t + b_{dec,c})    (10)

c_{dec,t} = f_{dec,t} ⊙ c_{dec,t-1} + i_{dec,t} ⊙ c̃_{dec,t}    (11)

s_t = o_{dec,t} ⊙ tanh(c_{dec,t})    (12)
Here, t denotes the current time, and x_{dec} denotes the input of the decoding neural network at the current time. W_{dec,i}, W_{dec,f}, W_{dec,o}, W_{dec,c}, U_{dec,i}, U_{dec,f}, U_{dec,o}, U_{dec,c}, A_i, A_f, A_o, A_c denote the weight matrices of the decoding neural network, and b_{dec,i}, b_{dec,f}, b_{dec,o}, b_{dec,c} denote the bias terms of the decoding neural network. i_{dec,t} denotes the value of the input gate of the decoding neural network at the current time, f_{dec,t} the value of its forget gate, and o_{dec,t} the value of its output gate. c_{dec,t} denotes the state information of the decoding neural network at the current time, c_{dec,t-1} the state information of the decoding neural network at the previous time, and c̃_{dec,t} the newly generated candidate state information in the decoding neural network. s_t denotes the output state of the decoding neural network at the current time, and s_{t-1} the output state of the decoding neural network at the previous time. a_t denotes the attention-weighted context value.
a_t may be calculated as follows:
v_{it} = V_a tanh(W_a ch_i + U_a s_{t-1})    (13)

w_{it} = exp(v_{it}) / Σ_j exp(v_{jt})    (14)

a_t = Σ_i w_{it} ch_i    (15)
Here, i = 1, 2, 3, ... and j = 1, 2, 3, ... denote the times corresponding to the pieces of encoded information in the encoded information sequence of the text sequence to be expanded. exp(·) denotes the exponential function with the natural constant e as base. V_a, W_a, U_a denote weight matrices. v_{it} and v_{jt} are intermediate values used to determine how the output of the recurrent neural network for decoding at the current time should be aligned with the input. w_{it} denotes the weight, at the current time, of the encoded information at time i in the encoded information sequence of the text sequence to be expanded. ch_i denotes the encoded information at time i in the encoded information sequence of the text sequence to be expanded.
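The attention computation above can be sketched as follows, with scalar values standing in for the weight matrices V_a, W_a, U_a and the encoded information vectors ch_i; a real model uses matrix-vector products throughout.

```python
import math

def attention_context(ch, s_prev, V_a, W_a, U_a):
    """Compute the context a_t over the encoded information sequence ch,
    given the previous decoder state s_{t-1} (scalar toy version)."""
    # Alignment scores v_it between s_{t-1} and each piece of encoded info.
    v = [V_a * math.tanh(W_a * ch_i + U_a * s_prev) for ch_i in ch]
    # Normalize with exp() so the weights w_it sum to 1 (a softmax).
    exp_v = [math.exp(x) for x in v]
    total = sum(exp_v)
    w = [x / total for x in exp_v]
    # The context a_t is the weighted sum of the encoded information.
    a_t = sum(w_i * ch_i for w_i, ch_i in zip(w, ch))
    return a_t, w
```

Because the weights come out of a softmax, they always sum to 1, and a_t is a convex combination of the encoder outputs.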
Step 408: calculating, according to the probabilities of occurrence of the pieces of identification information contained in each predicted identification information sequence, the probability of occurrence of that identification information sequence.

In the present embodiment, the above electronic device may calculate the probability of occurrence of each identification information sequence predicted in step 407 according to the probabilities of occurrence of the pieces of identification information it contains. s_t may be projected to a vocabulary-sized space by a linear transformation, and the probability of the next word may then be predicted by a softmax operation.
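The projection-plus-softmax step can be sketched as below. The `proj` list of `(weight, bias)` pairs is an assumed stand-in for the linear transformation mapping the scalar state s_t to one logit per vocabulary entry.

```python
import math

def next_word_probs(s_t, proj):
    """Project the decoder output state s_t to the vocabulary-sized space
    and apply softmax to obtain next-word probabilities."""
    logits = [w * s_t + b for w, b in proj]
    m = max(logits)                      # subtract the max for stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

The probability of a whole identification information sequence is then the product of these per-step probabilities (in practice a sum of log-probabilities, to avoid underflow).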
Step 409: selecting a predetermined number of identification information sequences from the predicted identification information sequences in descending order of probability of occurrence, as the identification information sequences of the expanded text.

In the present embodiment, the above electronic device may select, from the predicted identification information sequences, a predetermined number of identification information sequences in descending order of the probabilities of occurrence calculated in step 408, as the identification information sequences of the expanded text. The predetermined number may be set according to actual needs.
Optionally, the above electronic device may also use a beam search algorithm to generate the predetermined number of identification information sequences with the highest probabilities. As an example, the given sequence start token START may first be taken as the input at time 0, and the probability distribution over the next word is then generated by the decoder computation. The predetermined number of words with the highest probabilities are selected from this distribution; each of these words is then taken as the next word of a decoding sequence and used as the input at time 1. Then, from the predetermined number of distributions produced by the predetermined number of branches, the predetermined number of words whose probability products with the preceding sequences are the highest are selected as candidates for the input at time 2, and the above operations are repeated. If a beam search output sequence ends with the token "END", the beam width is reduced by one and the search continues, until the beam width becomes 0 or the maximum sequence generation length is reached. In this way, the predetermined number of identification information sequences can be obtained as the identification information sequences of the expanded text.
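The beam search variant described above, in which the beam width shrinks each time a hypothesis emits END, can be sketched as follows. The `next_dist` function is an assumed stand-in for the decoder's softmax output; log-probabilities are summed instead of multiplying probabilities, which is equivalent and numerically safer.

```python
import math

def beam_search(next_dist, start, end, width, max_len):
    """Beam search whose width shrinks by one whenever a hypothesis ends
    with the END token; stops when the width reaches 0 or sequences hit
    max_len. next_dist(sequence) -> {token: probability}."""
    beams = [([start], 0.0)]              # (sequence, log-probability)
    finished = []
    while width > 0 and beams:
        # Expand every live hypothesis by every possible next token.
        candidates = []
        for seq, logp in beams:
            for tok, p in next_dist(seq).items():
                candidates.append((seq + [tok], logp + math.log(p)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, logp in candidates[:width]:
            if seq[-1] == end or len(seq) >= max_len:
                finished.append((seq, logp))
                width -= 1                # a hypothesis ended: shrink beam
            else:
                beams.append((seq, logp))
    return [seq for seq, _ in sorted(finished, key=lambda c: c[1], reverse=True)]
```

With width 1 this reduces to greedy decoding; larger widths trade computation for a better chance of finding high-probability expansions.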
Step 410: generating the expanded text according to the generated identification information sequences and the correspondence between words and identification information.

In the present embodiment, the above electronic device may determine, according to the correspondence between words and identification information, the word corresponding to each piece of identification information in the identification information sequences obtained in step 409, thereby obtaining the expanded text.
For the implementation details and technical effects of step 401, step 402, step 403, and step 410, reference may be made to the descriptions of step 201, step 202, step 203, and step 205, which will not be repeated here.
As can be seen from Fig. 4, compared with the embodiment corresponding to Fig. 2, the method provided by the above embodiment of the application performs encoding based on the outputs of the forward-propagation recurrent neural network and the back-propagation recurrent neural network, and then performs decoding through a recurrent neural network, so that the text expansion model composed of recurrent neural networks has a richer and more accurate semantic representation.
With further reference to Fig. 5, as an implementation of the above method, the application provides an embodiment of an apparatus for generating text based on artificial intelligence. The apparatus embodiment corresponds to the method embodiment shown in Fig. 2, and the apparatus may specifically be applied to various electronic devices.
As shown in Fig. 5, the apparatus 500 for generating text based on artificial intelligence of the present embodiment includes: an acquiring unit 510, a segmenting unit 520, a determining unit 530, a first generating unit 540, and a second generating unit 550, wherein the acquiring unit 510 is configured to obtain a text to be expanded; the segmenting unit 520 is configured to segment the text to be expanded to obtain the word sequence of the text to be expanded; the determining unit 530 is configured to determine, according to the pre-stored correspondence between words and identification information, the identification information sequence corresponding to the word sequence; the first generating unit 540 is configured to input the determined identification information sequence into a pre-trained text expansion model to generate the identification information sequence of the expanded text, wherein the text expansion model is used to characterize the correspondence between the identification information sequence of the text to be expanded and the identification information sequence of the expanded text; and the second generating unit 550 is configured to generate the expanded text according to the generated identification information sequence and the correspondence between words and identification information.
In the present embodiment, for the specific processing of the acquiring unit 510, the segmenting unit 520, the determining unit 530, the first generating unit 540, and the second generating unit 550, reference may be made to the detailed descriptions of step 201, step 202, step 203, step 204, and step 205 in the embodiment corresponding to Fig. 2, which will not be repeated here.
In some optional implementations of the present embodiment, the text expansion model includes an encoding model and a decoding model; the encoding model is used to characterize the correspondence between identification information sequences and encoded information sequences, and the decoding model is used to characterize the correspondence between, on the one hand, the identification information of a preset start word together with an encoded information sequence and, on the other hand, an identification information sequence. The first generating unit 540 includes: an encoding subunit 541, configured to input the determined identification information sequence into the encoding model to generate the encoded information sequence of the text to be expanded; and a decoding subunit 542, configured to input the generated encoded information sequence and the identification information of the start word into the decoding model to generate the identification information sequence of the expanded text.
In some optional implementations of the present embodiment, the encoding subunit 541 is further configured to: input each piece of identification information in the determined identification information sequence, in forward order, into a forward-propagation recurrent neural network for encoding, to generate a first reference encoded information sequence; input each piece of identification information in the determined identification information sequence, in reverse order, into a back-propagation recurrent neural network for encoding, to generate a second reference encoded information sequence; and generate the encoded information sequence of the text sequence to be expanded according to the first reference encoded information sequence and the second reference encoded information sequence.
In some optional implementations of the present embodiment, the decoding subunit 542 is further configured to: predict the identification information sequences of candidate word sequences following the start word based on a recurrent neural network for decoding and the generated encoded information sequence; calculate the probability of occurrence of each predicted identification information sequence according to the probabilities of occurrence of the pieces of identification information it contains; and select a predetermined number of identification information sequences from the predicted identification information sequences in descending order of probability of occurrence, as the identification information sequences of the expanded text.
In some optional implementations of the present embodiment, the decoding subunit 542 is further configured to: determine, according to an attention model, the weight of the generated encoded information sequence at each prediction; weight the generated encoded information sequence according to the weight; and predict the identification information sequences of the candidate word sequences following the start word based on the recurrent neural network for decoding and the weighted encoded information sequence.
In some optional implementations of the present embodiment, the apparatus further includes a training unit 560, and the training unit 560 is configured to: in the click logs of a search engine, pair query statements corresponding to the same clicked link to form sample groups; segment the query statements contained in each sample group to obtain individual words; select a preset number of words from the segmented words in descending order of occurrence count; allocate identification information to each selected word, and store the correspondence between words and identification information; determine, according to the correspondence between words and identification information, the identification information sequences corresponding to the query statements contained in each sample group; and train the text expansion model using the identification information sequences corresponding to the two query statements contained in each sample group as input and output, respectively.
In some optional implementations of the present embodiment, the text to be expanded is generated according to query information input by a terminal; and the apparatus further includes a push unit 570, the push unit 570 being configured to: perform a search operation based on the generated text to obtain search result information; and push the search result information to the terminal.
As can be seen from Fig. 5, in the present embodiment, the apparatus 500 for generating text based on artificial intelligence obtains a text to be expanded, segments it to obtain its word sequence, determines the corresponding identification information sequence, inputs the identification information sequence into a pre-trained text expansion model, and generates the expanded text according to the generated identification information sequence and the correspondence between words and identification information, thereby improving the diversity of text generation. Those skilled in the art will understand that the above first generating unit and second generating unit merely denote two different units: the first generating unit is configured to input the determined identification information sequence into the pre-trained text expansion model to generate the identification information sequence of the expanded text, and the second generating unit is configured to generate the expanded text according to the generated identification information sequence and the correspondence between words and identification information; the terms "first" and "second" do not constitute any particular limitation on the generating units.
Referring now to Fig. 6, it illustrates a schematic structural diagram of a computer system 600 suitable for implementing a server of the embodiments of the application. The server shown in Fig. 6 is only an example and should not impose any limitation on the functions and the scope of use of the embodiments of the application.
As shown in Fig. 6, the computer system 600 includes a central processing unit (CPU) 601, which may perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 602 or a program loaded into a random access memory (RAM) 603 from a storage portion 608. In the RAM 603, various programs and data required for the operation of the system 600 are also stored. The CPU 601, the ROM 602, and the RAM 603 are connected to one another through a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
The following components are connected to the I/O interface 605: an input portion 606 including a keyboard, a mouse, and the like; an output portion 607 including a cathode-ray tube (CRT), a liquid crystal display (LCD), a speaker, and the like; a storage portion 608 including a hard disk and the like; and a communication portion 609 including a network interface card such as a LAN card or a modem. The communication portion 609 performs communication processing via a network such as the Internet. A driver 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the driver 610 as needed, so that a computer program read therefrom is installed into the storage portion 608 as needed.
In particular, according to embodiments of the present disclosure, the process described above with reference to the flowchart may be implemented as a computer software program. For example, an embodiment of the disclosure includes a computer program product, which includes a computer program carried on a computer-readable medium, the computer program containing program code for executing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication portion 609, and/or installed from the removable medium 611. When the computer program is executed by the central processing unit (CPU) 601, the above functions defined in the method of the application are performed.
It should be noted that the computer-readable medium described herein may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electric, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the application, a computer-readable storage medium may be any tangible medium containing or storing a program, and the program may be used by an instruction execution system, apparatus, or device, or in combination therewith. In the application, a computer-readable signal medium may include a data signal propagated in a baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take various forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium; the computer-readable medium can send, propagate, or transmit a program for use by an instruction execution system, apparatus, or device, or in combination therewith. The program code contained on the computer-readable medium may be transmitted by any appropriate medium, including but not limited to: wireless, electric wire, optical cable, RF, and the like, or any suitable combination of the above.
The flowcharts and block diagrams in the accompanying drawings illustrate the possible architectures, functions, and operations of the systems, methods, and computer program products according to various embodiments of the application. In this regard, each block in a flowchart or block diagram may represent a unit, a program segment, or a part of code, and the unit, program segment, or part of code contains one or more executable instructions for implementing the specified logical functions. It should also be noted that, in some alternative implementations, the functions marked in the blocks may occur in an order different from that marked in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the application may be implemented by software or by hardware. The described units may also be provided in a processor; for example, a processor may be described as including an acquiring unit, a segmenting unit, a determining unit, a first generating unit, and a second generating unit. The names of these units do not, in some cases, constitute a limitation on the units themselves; for example, the acquiring unit may also be described as "a unit for obtaining a text to be expanded".
As another aspect, the application also provides a computer-readable medium, which may be included in the apparatus described in the above embodiments, or may exist separately without being assembled into the apparatus. The above computer-readable medium carries one or more programs, and when the one or more programs are executed by the apparatus, the apparatus: obtains a text to be expanded; segments the text to be expanded to obtain the word sequence of the text to be expanded; determines, according to the pre-stored correspondence between words and identification information, the identification information sequence corresponding to the word sequence; inputs the determined identification information sequence into a pre-trained text expansion model to generate the identification information sequence of the expanded text, wherein the text expansion model is used to characterize the correspondence between the identification information sequence of the text to be expanded and the identification information sequence of the expanded text; and generates the expanded text according to the generated identification information sequence and the correspondence between words and identification information.
The above description is only a preferred embodiment of the application and an explanation of the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in the application is not limited to technical solutions formed by particular combinations of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalent features without departing from the above inventive concept, for example, technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the application.

Claims (16)

  1. A method for generating text based on artificial intelligence, characterized in that the method comprises:
    obtaining a text to be expanded;
    segmenting the text to be expanded to obtain a word sequence of the text to be expanded;
    determining, according to a pre-stored correspondence between words and identification information, an identification information sequence corresponding to the word sequence;
    inputting the determined identification information sequence into a pre-trained text expansion model to generate an identification information sequence of an expanded text, wherein the text expansion model is used to characterize the correspondence between the identification information sequence of the text to be expanded and the identification information sequence of the expanded text;
    generating the expanded text according to the generated identification information sequence and the correspondence between the words and the identification information.
  2. The method according to claim 1, characterized in that the text expansion model comprises an encoding model and a decoding model, the encoding model is used to characterize the correspondence between identification information sequences and encoded information sequences, and the decoding model is used to characterize the correspondence between the pair formed by the identification information of a preset start word and an encoded information sequence, and an identification information sequence; and
    the inputting the determined identification information sequence into the pre-trained text expansion model to generate the identification information sequence of the expanded text comprises:
    inputting the determined identification information sequence into the encoding model to generate an encoded information sequence of the text to be expanded;
    inputting the generated encoded information sequence and the identification information of the start word into the decoding model to generate the identification information sequence of the expanded text.
  3. The method according to claim 2, characterized in that the inputting the determined identification information sequence into the encoding model to generate the encoded information sequence of the text to be expanded comprises:
    inputting each piece of identification information in the determined identification information sequence, in forward order, into a forward-propagating recurrent neural network for encoding, to generate a first reference encoded information sequence;
    inputting each piece of identification information in the determined identification information sequence, in reverse order, into a backward-propagating recurrent neural network for encoding, to generate a second reference encoded information sequence;
    generating the encoded information sequence of the text to be expanded according to the first reference encoded information sequence and the second reference encoded information sequence.
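The bidirectional encoding described above (a forward pass, a reversed pass, and a combination of the two) can be sketched as follows. This is an illustrative toy implementation under assumed details: random embeddings stand in for identifier embeddings, a plain tanh RNN stands in for the recurrent network, and concatenation stands in for the unspecified combination step.

```python
import numpy as np

def rnn_pass(embeddings, W_h, W_x, reverse=False):
    """Run a simple tanh RNN over a sequence of identifier embeddings.
    reverse=True feeds the sequence in inverted order (the backward pass)
    and re-aligns the resulting states with the original positions."""
    seq = embeddings[::-1] if reverse else embeddings
    h = np.zeros(W_h.shape[0])
    states = []
    for x in seq:
        h = np.tanh(W_h @ h + W_x @ x)
        states.append(h)
    return states[::-1] if reverse else states

rng = np.random.default_rng(0)
emb = [rng.normal(size=4) for _ in range(3)]        # 3 identifiers, dim 4
W_h, W_x = rng.normal(size=(4, 4)), rng.normal(size=(4, 4))
fwd = rnn_pass(emb, W_h, W_x)                       # first reference encoding
bwd = rnn_pass(emb, W_h, W_x, reverse=True)         # second reference encoding
# Combine the two reference encodings per position (here: concatenation):
encoded = [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]
```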
  4. The method according to claim 2, characterized in that the inputting the generated encoded information sequence and the identification information of the start word into the decoding model to generate the identification information sequence of the expanded text comprises:
    predicting identification information sequences of candidate successor word sequences of the start word based on a recurrent neural network for decoding and the generated encoded information sequence;
    calculating, for each predicted identification information sequence, the probability of the sequence occurring according to the probabilities of occurrence of the pieces of identification information it comprises;
    selecting a predetermined number of identification information sequences from the predicted identification information sequences in descending order of probability of occurrence, as the identification information sequences of the expanded text.
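The selection step above (score each candidate sequence by the product of its per-token probabilities, keep the top few) can be sketched as a beam-style search. As a simplifying assumption not stated in the claim, the per-step probability tables here are fixed rather than conditioned on the decoder state.

```python
import math

def beam_search(step_probs, beam_width):
    """Keep the `beam_width` identifier sequences whose product of
    per-step probabilities (sum of log-probabilities) is highest."""
    beams = [([], 0.0)]                       # (sequence, log-probability)
    for probs in step_probs:                  # probs: {token_id: probability}
        candidates = [
            (seq + [tok], lp + math.log(p))
            for seq, lp in beams
            for tok, p in probs.items()
        ]
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]       # descending order, keep top N
    return [seq for seq, _ in beams]

# Two decoding steps over a 3-token vocabulary (toy distributions):
steps = [{1: 0.6, 2: 0.3, 3: 0.1}, {1: 0.2, 2: 0.5, 3: 0.3}]
top = beam_search(steps, beam_width=2)
```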
  5. The method according to claim 4, characterized in that the predicting identification information sequences of candidate successor word sequences of the start word based on the recurrent neural network for decoding and the generated encoded information sequence comprises:
    determining, at each prediction step, weights for the generated encoded information sequence according to an attention model;
    weighting the generated encoded information sequence according to the weights;
    predicting the identification information sequences of the candidate successor word sequences of the start word based on the recurrent neural network for decoding and the weighted encoded information sequence.
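The attention weighting above (determine per-position weights, then form a weighted combination of the encoded information) can be illustrated with a dot-product softmax attention; the scoring function is an assumption, since the claim does not specify the attention model's form.

```python
import numpy as np

def attend(decoder_state, encoder_states):
    """Weight the encoder outputs by softmax-normalized dot-product
    similarity to the current decoder state, and return the weighted
    sum (the context used for this prediction step)."""
    scores = np.array([decoder_state @ h for h in encoder_states])
    weights = np.exp(scores - scores.max())   # numerically stable softmax
    weights /= weights.sum()
    context = sum(w * h for w, h in zip(weights, encoder_states))
    return context, weights

rng = np.random.default_rng(1)
enc = [rng.normal(size=4) for _ in range(3)]  # toy encoded information sequence
context, weights = attend(rng.normal(size=4), enc)
```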
  6. The method according to claim 1, characterized in that the text expansion model is trained via the following steps:
    forming sample groups by pairing, in the click logs of a search engine, query statements corresponding to the same clicked link;
    segmenting the query statements comprised in each sample group to obtain the segmented words;
    selecting a preset number of words from the segmented words in descending order of occurrence count;
    allocating identification information to each selected word, and storing the correspondence between the words and the identification information;
    determining, according to the correspondence between the words and the identification information, the identification information sequences corresponding to the query statements comprised in each sample group;
    training the text expansion model with the identification information sequences corresponding to the two query statements comprised in each sample group as input and output, respectively.
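The training-data construction above can be sketched end to end. This is a non-authoritative toy under stated assumptions: whitespace splitting stands in for the word segmentation step, and the click log is a hypothetical mapping from clicked link to the queries that led to it.

```python
from collections import Counter
from itertools import permutations

def build_training_pairs(click_log, vocab_size):
    """Pair queries that share a clicked link, build an identifier
    vocabulary from the most frequent words, and emit
    (input_ids, output_ids) training examples."""
    # 1. Pair up query statements corresponding to the same clicked link
    pairs = []
    for queries in click_log.values():
        pairs.extend(permutations(queries, 2))
    # 2. Count words over all queries; keep the top `vocab_size` by count
    counts = Counter(
        w for qs in click_log.values() for q in qs for w in q.split()
    )
    word_to_id = {w: i for i, (w, _) in enumerate(counts.most_common(vocab_size))}
    # 3. Map each query in a pair to its identifier sequence
    def ids(q):
        return [word_to_id[w] for w in q.split() if w in word_to_id]
    return [(ids(a), ids(b)) for a, b in pairs], word_to_id

# Hypothetical click log: two queries led to the same link.
log = {"link1": ["cheap flights", "cheap air tickets"]}
examples, vocab = build_training_pairs(log, vocab_size=10)
```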
  7. The method according to any one of claims 1-6, characterized in that the text to be expanded is generated according to query information input by a terminal; and
    after the generating the expanded text according to the generated identification information sequence and the correspondence between the words and the identification information, the method further comprises:
    performing a search operation based on the generated text to obtain search result information;
    pushing the search result information to the terminal.
  8. An apparatus for generating text based on artificial intelligence, characterized in that the apparatus comprises:
    an acquisition unit, configured to acquire a text to be expanded;
    a segmentation unit, configured to segment the text to be expanded to obtain a word sequence of the text to be expanded;
    a determination unit, configured to determine an identification information sequence corresponding to the word sequence according to a pre-stored correspondence between words and identification information;
    a first generation unit, configured to input the determined identification information sequence into a pre-trained text expansion model to generate an identification information sequence of an expanded text, wherein the text expansion model is used to characterize the correspondence between the identification information sequence of the text to be expanded and the identification information sequence of the expanded text;
    a second generation unit, configured to generate the expanded text according to the generated identification information sequence and the correspondence between the words and the identification information.
  9. The apparatus according to claim 8, characterized in that the text expansion model comprises an encoding model and a decoding model, the encoding model is used to characterize the correspondence between identification information sequences and encoded information sequences, and the decoding model is used to characterize the correspondence between the pair formed by the identification information of a preset start word and an encoded information sequence, and an identification information sequence; and
    the first generation unit comprises:
    an encoding subunit, configured to input the determined identification information sequence into the encoding model to generate an encoded information sequence of the text to be expanded;
    a decoding subunit, configured to input the generated encoded information sequence and the identification information of the start word into the decoding model to generate the identification information sequence of the expanded text.
  10. The apparatus according to claim 9, characterized in that the encoding subunit is further configured to:
    input each piece of identification information in the determined identification information sequence, in forward order, into a forward-propagating recurrent neural network for encoding, to generate a first reference encoded information sequence;
    input each piece of identification information in the determined identification information sequence, in reverse order, into a backward-propagating recurrent neural network for encoding, to generate a second reference encoded information sequence;
    generate the encoded information sequence of the text to be expanded according to the first reference encoded information sequence and the second reference encoded information sequence.
  11. The apparatus according to claim 9, characterized in that the decoding subunit is further configured to:
    predict identification information sequences of candidate successor word sequences of the start word based on a recurrent neural network for decoding and the generated encoded information sequence;
    calculate, for each predicted identification information sequence, the probability of the sequence occurring according to the probabilities of occurrence of the pieces of identification information it comprises;
    select a predetermined number of identification information sequences from the predicted identification information sequences in descending order of probability of occurrence, as the identification information sequences of the expanded text.
  12. The apparatus according to claim 11, characterized in that the decoding subunit is further configured to:
    determine, at each prediction step, weights for the generated encoded information sequence according to an attention model;
    weight the generated encoded information sequence according to the weights;
    predict the identification information sequences of the candidate successor word sequences of the start word based on the recurrent neural network for decoding and the weighted encoded information sequence.
  13. The apparatus according to claim 8, characterized in that the apparatus further comprises a training unit, the training unit being configured to:
    form sample groups by pairing, in the click logs of a search engine, query statements corresponding to the same clicked link;
    segment the query statements comprised in each sample group to obtain the segmented words;
    select a preset number of words from the segmented words in descending order of occurrence count;
    allocate identification information to each selected word, and store the correspondence between the words and the identification information;
    determine, according to the correspondence between the words and the identification information, the identification information sequences corresponding to the query statements comprised in each sample group;
    train the text expansion model with the identification information sequences corresponding to the two query statements comprised in each sample group as input and output, respectively.
  14. The apparatus according to any one of claims 8-13, characterized in that the text to be expanded is generated according to query information input by a terminal; and
    the apparatus further comprises a push unit, the push unit being configured to:
    perform a search operation based on the generated text to obtain search result information;
    push the search result information to the terminal.
  15. A device, characterized by comprising:
    one or more processors;
    a storage apparatus, for storing one or more programs,
    wherein, when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method according to any one of claims 1-7.
  16. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1-7.
CN201710787262.0A 2017-09-04 2017-09-04 Method and device for generating text based on artificial intelligence Active CN107526725B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710787262.0A CN107526725B (en) 2017-09-04 2017-09-04 Method and device for generating text based on artificial intelligence


Publications (2)

Publication Number Publication Date
CN107526725A true CN107526725A (en) 2017-12-29
CN107526725B CN107526725B (en) 2021-08-24

Family

ID=60683533

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710787262.0A Active CN107526725B (en) 2017-09-04 2017-09-04 Method and device for generating text based on artificial intelligence

Country Status (1)

Country Link
CN (1) CN107526725B (en)



Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090292693A1 (en) * 2008-05-26 2009-11-26 International Business Machines Corporation Text searching method and device and text processor
CN101593179A (en) * 2008-05-26 2009-12-02 国际商业机器公司 Document search method and device and document processor
CN106407381A (en) * 2016-09-13 2017-02-15 北京百度网讯科技有限公司 Method and device for pushing information based on artificial intelligence
CN106503255A (en) * 2016-11-15 2017-03-15 科大讯飞股份有限公司 Method and system for automatically generating an article based on description text
CN106919702A (en) * 2017-02-14 2017-07-04 北京时间股份有限公司 Keyword method for pushing and device based on document
CN106980683A (en) * 2017-03-30 2017-07-25 中国科学技术大学苏州研究院 Blog text snippet generation method based on deep learning

Cited By (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2020520492A (en) * 2018-03-08 2020-07-09 平安科技(深▲せん▼)有限公司Ping An Technology (Shenzhen) Co.,Ltd. Document abstract automatic extraction method, device, computer device and storage medium
CN108509413A (en) * 2018-03-08 2018-09-07 平安科技(深圳)有限公司 Digest extraction method, device, computer equipment and storage medium
CN110309407A (en) * 2018-03-13 2019-10-08 优酷网络技术(北京)有限公司 Viewpoint extracting method and device
CN110555104A (en) * 2018-03-26 2019-12-10 优酷网络技术(北京)有限公司 text analysis method and device
CN110362810A (en) * 2018-03-26 2019-10-22 优酷网络技术(北京)有限公司 Text analyzing method and device
CN110362809A (en) * 2018-03-26 2019-10-22 优酷网络技术(北京)有限公司 Text analyzing method and device
CN110362808A (en) * 2018-03-26 2019-10-22 优酷网络技术(北京)有限公司 Text analyzing method and device
CN108932326A (en) * 2018-06-29 2018-12-04 北京百度网讯科技有限公司 Instance extension method, device, equipment and medium
CN108932326B (en) * 2018-06-29 2021-02-19 北京百度网讯科技有限公司 Instance extension method, device, equipment and medium
CN110852093B (en) * 2018-07-26 2023-05-16 腾讯科技(深圳)有限公司 Poem generation method, device, computer equipment and storage medium
CN110852093A (en) * 2018-07-26 2020-02-28 腾讯科技(深圳)有限公司 Text information generation method and device, computer equipment and storage medium
CN110874771A (en) * 2018-08-29 2020-03-10 北京京东尚科信息技术有限公司 Method and device for matching commodities
CN111209725A (en) * 2018-11-19 2020-05-29 阿里巴巴集团控股有限公司 Text information generation method and device and computing equipment
CN111209725B (en) * 2018-11-19 2023-04-25 阿里巴巴集团控股有限公司 Text information generation method and device and computing equipment
CN109284367A (en) * 2018-11-30 2019-01-29 北京字节跳动网络技术有限公司 Method and apparatus for handling text
CN109800421A (en) * 2018-12-19 2019-05-24 武汉西山艺创文化有限公司 Game scenario generation method and device, equipment, and storage medium
CN109858004B (en) * 2019-02-12 2023-08-01 四川无声信息技术有限公司 Text rewriting method and device and electronic equipment
CN109858004A (en) * 2019-02-12 2019-06-07 四川无声信息技术有限公司 Text rewriting method and device, and electronic equipment
US11069346B2 (en) 2019-04-22 2021-07-20 International Business Machines Corporation Intent recognition model creation from randomized intent vector proximities
US11521602B2 (en) 2019-04-22 2022-12-06 International Business Machines Corporation Intent recognition model creation from randomized intent vector proximities
CN110162751A (en) * 2019-05-13 2019-08-23 百度在线网络技术(北京)有限公司 Text generator training method and text generator training system
CN110188204B (en) * 2019-06-11 2022-10-04 腾讯科技(深圳)有限公司 Extended corpus mining method and device, server and storage medium
CN110188204A (en) * 2019-06-11 2019-08-30 腾讯科技(深圳)有限公司 Extended corpus mining method and device, server and storage medium
CN110851673A (en) * 2019-11-12 2020-02-28 西南科技大学 Improved beam search strategy and question answering system
CN111783422B (en) * 2020-06-24 2022-03-04 北京字节跳动网络技术有限公司 Text sequence generation method, device, equipment and medium
CN111783422A (en) * 2020-06-24 2020-10-16 北京字节跳动网络技术有限公司 Text sequence generation method, device, equipment and medium
US11669679B2 (en) 2020-06-24 2023-06-06 Beijing Byledance Network Technology Co., Ltd. Text sequence generating method and apparatus, device and medium
CN111859888A (en) * 2020-07-22 2020-10-30 北京致医健康信息技术有限公司 Diagnosis assisting method and device, electronic equipment and storage medium
CN111859888B (en) * 2020-07-22 2024-04-02 北京致医健康信息技术有限公司 Diagnosis assisting method, diagnosis assisting device, electronic equipment and storage medium
CN113392639A (en) * 2020-09-30 2021-09-14 腾讯科技(深圳)有限公司 Title generation method and device based on artificial intelligence and server
CN113392639B (en) * 2020-09-30 2023-09-26 腾讯科技(深圳)有限公司 Title generation method, device and server based on artificial intelligence

Also Published As

Publication number Publication date
CN107526725B (en) 2021-08-24

Similar Documents

Publication Publication Date Title
CN107526725A (en) Method and apparatus for generating text based on artificial intelligence
CN110796190B (en) Exponential modeling with deep learning features
CN110083705B (en) Multi-hop attention depth model, method, storage medium and terminal for target emotion classification
CN107705784B (en) Text regularization model training method and device, and text regularization method and device
CN107273503A (en) Method and apparatus for generating parallel text in the same language
CN109906460A (en) Dynamic cooperation attention network for question and answer
CN107577737A (en) Method and apparatus for pushing information
CN109885756B (en) CNN and RNN-based serialization recommendation method
CN110766142A (en) Model generation method and device
CN107680580A (en) Text transformation model training method and device, text conversion method and device
CN116415654A (en) Data processing method and related equipment
CN110348535A (en) Visual question answering model training method and device
CN114358203B (en) Training method and device for image description sentence generation module and electronic equipment
CN110162766B (en) Word vector updating method and device
CN106682387A (en) Method and device for outputting information
US11423307B2 (en) Taxonomy construction via graph-based cross-domain knowledge transfer
CN107943895A (en) Information-pushing method and device
CN107832300A (en) Text summary generation method and device for the minimally invasive medical field
CN109710953A (en) Translation method and device, computing device, storage medium and chip
CN106407381A (en) Method and device for pushing information based on artificial intelligence
CN109710760A (en) Short text clustering method, device, medium and electronic equipment
CN113377914A (en) Recommended text generation method and device, electronic equipment and computer readable medium
CN111046757A (en) Training method and device for face portrait generation model and related equipment
CN113850012B (en) Data processing model generation method, device, medium and electronic equipment
Xu et al. CNN-based skip-gram method for improving classification accuracy of chinese text

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant