CN109145190A - Local citation recommendation method and system based on neural machine translation - Google Patents
- Publication number
- CN109145190A CN201810994562.0A
- Authority
- CN
- China
- Prior art keywords
- quotation
- word
- article
- context
- decoder
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/58—Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
Abstract
The invention discloses a local citation recommendation method and system based on neural machine translation. Data cleaning operations, including citation extraction, lemmatization, and word-frequency statistics, are performed on a raw data set to obtain a parallel corpus of citation contexts and cited article titles and to build an initial list library of candidate articles to be cited. The words occurring in the citation contexts and cited article titles are embedded into a low-dimensional semantic space by the skip-gram word vector model combined with negative sampling, yielding word vectors. An encoder-decoder framework is built, consisting of a bidirectional gated recurrent unit (GRU) encoder with an attention mechanism and a GRU decoder; the model is trained with the citation contexts of the parallel corpus, converted to word vectors by the word vector model, as input and the cited article titles as output. The seed title output by the encoder-decoder framework is compared by cosine similarity, one by one, with all article titles in the candidate article list. Articles that meet the requirements are then selected as the recommendation list according to publication time.
Description
Technical field
The present invention relates to the technical field of information retrieval, and in particular to a local citation recommendation method and system based on neural machine translation.
Background art
With the rapid development of Internet technology, a large number of new scientific articles are published every year, and quickly finding the documents one needs in this mass of literature has become a major difficulty. Given a passage of context, local citation recommendation quickly builds an intelligent model of its semantics and content, helps the researcher find citable documents relevant to his or her research field in the mass of literature, and recommends those documents directly, saving a great deal of the time otherwise spent searching for related literature. Local citation recommendation therefore plays an important role in research work.
In recent years, many researchers have studied this problem. The work falls mainly into two classes: global citation recommendation, which recommends citations for a standalone article, and local citation recommendation, which recommends citations for a passage of context within an article. The research methods used generally include methods based on text similarity, methods based on topic models, methods based on translation models, methods based on collaborative filtering, methods based on deep learning, and some other methods.
Neural machine translation is an encoder-decoder framework proposed by Google in 2014 that has achieved considerable progress on the machine translation problem.
Summary of the invention
The technical problem to be solved by the present invention is to provide a local citation recommendation method based on neural machine translation, so as to improve the accuracy of translation between citation contexts and cited article titles.
A local citation recommendation method based on neural machine translation, comprising the following steps:
S1, performing data cleaning operations, including citation extraction, lemmatization, and word-frequency statistics, on a raw data set to obtain a parallel corpus of citation contexts and cited article titles, and building an initial list library of candidate articles to be cited;
S2, embedding the words occurring in the citation contexts and cited article titles into a low-dimensional semantic space by the skip-gram word vector model combined with negative sampling to obtain word vectors, so that semantically similar words lie closer together in the embedding space;
S3, based on neural machine translation, building an encoder-decoder framework consisting of a bidirectional gated recurrent unit (GRU) encoder with an attention mechanism and a GRU decoder, and training the model with the citation contexts of the parallel corpus, converted to word vectors by the word vector model, as input and the cited article titles as output;
S4, computing the cosine similarity, one by one, between the seed title output by the encoder-decoder framework and all article titles in the candidate article list;
S5, according to article publication time, removing articles published after the article containing the citation context, and selecting the articles whose similarity meets the requirements as the recommendation list.
As a preferred embodiment of the above technical solution, step S1 specifically comprises:
extracting all English citation contexts and removing invalid symbols; retaining the citation contexts whose word count is within a set range and lemmatizing them; counting word frequencies, retaining the words ranked within a set top rank, and replacing all other words with <UNK>; padding contexts shorter than the set range with <PAD>; and extracting the cited article titles according to the citation contexts and applying a similar cleaning operation.
As a preferred embodiment of the above technical solution, step S2 specifically comprises:
S21, splitting each sentence into multiple (input word, output word) pairs according to the word window size;
S22, converting every word into a one-hot (0-1) vector whose length equals the vocabulary size;
S23, building a neural network comprising an input layer, a hidden layer, and an output layer;
S24, adding negative sampling to the skip-gram model and back-propagating the error; the weight matrix at the word embedding matrix is the final word vector representation.
As a preferred embodiment of the above technical solution, step S3 specifically comprises:
building the encoder-decoder framework consisting of a bidirectional GRU encoder with an attention mechanism and a GRU decoder to learn a semantic representation of the citation context, and, on the basis of the understood semantics, selecting words from the candidate vocabulary to decode a seed title, thereby forming a seed-title construction model linked by semantic content.
The encoder-decoder framework consisting of a bidirectional GRU encoder with an attention mechanism and a GRU decoder is built as follows:
the encoder consists of a bidirectional GRU network; at each time step $t$ it receives the vector representation of the $t$-th word of the input sequence and produces a hidden state $h^{\langle t\rangle}$; the attention mechanism, together with the hidden state of the output layer, determines a translation weight for each input word, from which the final context vector is obtained and fed into the decoder to decode a word;
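The formulas of the encoder GRU unit are as follows:
$$G_u = \mathrm{sigmoid}\left(W_u\left[h^{\langle t-1\rangle}, x^{\langle t\rangle}\right] + b_u\right)$$
$$G_r = \mathrm{sigmoid}\left(W_r\left[h^{\langle t-1\rangle}, x^{\langle t\rangle}\right] + b_r\right)$$
$$\tilde{C}^{\langle t\rangle} = \tanh\left(W_c\left[G_r \odot h^{\langle t-1\rangle}, x^{\langle t\rangle}\right] + b_c\right)$$
$$C^{\langle t\rangle} = G_u \odot \tilde{C}^{\langle t\rangle} + \left(1 - G_u\right) \odot h^{\langle t-1\rangle}$$
where $G_u$ is the update gate, $G_r$ is the reset gate, $\tilde{C}^{\langle t\rangle}$ is the updated (candidate) hidden layer variable, $C^{\langle t\rangle}$ is the hidden layer variable passed to the next time step, $h^{\langle t\rangle}$ denotes the hidden layer variable at time $t$, $x^{\langle t\rangle}$ denotes the input at time $t$, $b_u$, $b_r$, $b_c$ denote biases, sigmoid and tanh are activation functions, and $W_u$, $W_r$, $W_c$ are weight parameters.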
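The decoding part of the attention mechanism proceeds as follows:
when the decoder decodes the $t$-th word, the decoder hidden state $s^{\langle t\rangle}$ at time $t$ is computed from the word $y^{\langle t-1\rangle}$ decoded at time $t-1$ and the context vector $c^{\langle t\rangle}$ passed in from the encoder at time $t$; the decoder hidden state $s^{\langle t\rangle}$ is obtained by the following formula:
$$s^{\langle t\rangle} = g\left(y^{\langle t-1\rangle}, s^{\langle t-1\rangle}, c^{\langle t\rangle}\right)$$
where the context vector $c^{\langle t\rangle}$ passed in at time $t$ is determined by the encoder hidden layer variables $h^{\langle t'\rangle}$ and the translation attention between each encoded word and the word currently being decoded:
$$c^{\langle t\rangle} = \sum_{t'} \alpha^{\langle t, t'\rangle}\, h^{\langle t'\rangle}$$
where $\alpha^{\langle t, t'\rangle}$ is the vector-type attention, representing the translation attention of the $t'$-th encoder word with respect to all decoder words, and is obtained by the following formula:
$$\alpha^{\langle t, t'\rangle} = \frac{\exp\left(e^{\langle t, t'\rangle}\right)}{\sum_{k} \exp\left(e^{\langle t, k\rangle}\right)}$$
where $e^{\langle t, t'\rangle}$ is the scalar-type attention, representing the translation attention of the $t'$-th encoder word to the $t$-th decoder word, and is obtained by the following formula:
$$e^{\langle t, t'\rangle} = v^{T} \tanh\left(W_s\, s^{\langle t-1\rangle} + W_h\, h^{\langle t'\rangle}\right)$$
where $v^{T}$, $W_s$, and $W_h$ are weight parameters;
the above procedure is repeated until all words have been decoded; the decoded sequence is the seed title.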
As a preferred embodiment of the above technical solution, step S4 specifically comprises:
S41, computing the pairwise similarity of all words in the decoder candidate vocabulary, and building a word-similarity lookup dictionary;
S42, segmenting the seed title and the article titles in the candidate article list library into words, and computing word by word, according to the similarities in the word-similarity lookup dictionary, the similarity between the seed title and each candidate article title;
S43, accumulating the results of step S42 as the similarity between the seed title and the article;
S44, sorting the similarity results obtained in step S43 to form the literature recommendation list.
The present invention also provides a local citation recommendation system based on neural machine translation, applied to the above method, comprising:
a citation cleaning module, for processing the input citation context into the standard input corpus format required by the encoder-decoder framework;
an article expansion module, for dynamically expanding the candidate article list library on the basis of the existing article list library, using web crawler technology to crawl the newest published articles from relevant literature search platforms, so that the candidate article list library becomes more complete and comprehensive over time;
a candidate word update module, for recalculating word frequencies after the candidate article list library is updated and dynamically updating the candidate word list used by the decoder when decoding the seed title;
a citation recommendation module, for computing the recommended article list for a given citation context.
As a preferred embodiment of the above technical solution, the citation cleaning module is specifically used for:
removing invalid symbols from the citation context, replacing words of the citation context that do not appear in the vocabulary with <UNK>, padding contexts shorter than the set range with <PAD>, truncating contexts longer than the set range, lemmatizing all words, and then converting all words to word vectors with the pre-trained word vector model.
As a preferred embodiment of the above technical solution, the article expansion module uses web crawler technology to crawl the newest published articles from relevant search platforms, performs data cleaning operations such as citation extraction, lemmatization, and word-frequency statistics on the raw data set, obtains the parallel corpus of citation contexts and cited article titles, builds the initial candidate article list library, and dynamically expands and maintains the candidate article list library.
As a preferred embodiment of the above technical solution, after the candidate article list library is updated, the candidate word update module segments the titles of the entire updated candidate article list library into words and recalculates word frequencies, and then dynamically updates the candidate word list used by the decoder when decoding the seed title, so that the candidate article list and the decoder candidate vocabulary remain synchronized.
As a preferred embodiment of the above technical solution, the citation recommendation module combines local citation recommendation with neural machine translation, expressing local citation recommendation as a machine translation problem from a source language to a target language, and computes the recommended article list for a given citation context.
The invention proposes a novel local citation recommendation method. Compared with traditional citation recommendation methods, it combines citation recommendation with neural machine translation by building a parallel corpus of citation contexts and cited article titles, and treats local citation recommendation as a machine translation problem from the citation context (source language) to the cited article title (target language), which gives stronger semantic consistency between citation contexts and article titles. Finally, cosine similarity is computed between the translated seed title and the articles in the candidate article list library to obtain the list of articles to be cited.
By embedding the words of the citation contexts and cited article titles into a low-dimensional semantic space, the invention places semantically similar words closer together in that space. It builds an encoder-decoder framework consisting of a bidirectional GRU encoder with an attention mechanism and a GRU decoder; by computing influence weights (attention) between each encoded word and each decoded word, the translation accuracy between citation contexts and cited article titles is greatly improved. Further, cosine similarity is computed article by article between the decoded cited article title and the article titles in the candidate article list library, and the articles that meet the requirements are selected as the recommended article list. This greatly reduces the coupling between context-to-title translation and article recommendation, so that the two tasks can be carried out independently.
Brief description of the drawings
Fig. 1 is a schematic diagram of the steps of a local citation recommendation method based on neural machine translation;
Fig. 2 is a functional schematic diagram of a local citation recommendation system based on neural machine translation;
Fig. 3 is a logic diagram of step S2 of the local citation recommendation method based on neural machine translation;
Fig. 4 is a logic diagram of step S3 of the local citation recommendation method based on neural machine translation.
Detailed description of embodiments
Fig. 1 and Fig. 2 show the local citation recommendation method and system based on neural machine translation proposed by the present invention.
Referring to Fig. 1, the local citation recommendation method based on neural machine translation proposed by the present invention comprises the following steps:
S1, performing data cleaning operations such as citation extraction, lemmatization, and word-frequency statistics on the raw data set to obtain a parallel corpus of citation contexts and cited article titles, and building an initial candidate article list library;
In this embodiment, all English citation contexts are extracted and invalid symbols are removed; citation contexts containing between 10 and 28 words are retained and lemmatized; word frequencies are counted and the 10000 most frequent words are retained, with all other words replaced by "<UNK>"; contexts shorter than 28 words are padded with "<PAD>"; and the cited article titles are extracted according to the citation contexts and cleaned in a similar way.
In actual operation, step S1 specifically comprises the following steps:
S11, using a dictionary matching algorithm, extracting from the raw data, according to the citation position within each citation context, the original article title data corresponding to that citation context, and using a context-title knowledge base join algorithm to build the correspondence between original citation contexts and cited article titles;
S12, using a pre-built stop-symbol knowledge base, retaining all English citation contexts and removing all invalid characters, including escape characters, punctuation marks, formula symbols, and special characters, to form a set of citation context representations in which words are the linking units;
S13, segmenting the citation contexts into words using the jieba word segmentation library, counting word frequencies, and taking the 10000 most frequent words to build the encoding vocabulary library;
S14, traversing all citation contexts, replacing words not in the encoding vocabulary library with "<UNK>", truncating citation contexts longer than 28 words, padding citation contexts shorter than 28 words with "<PAD>", and generating citation contexts in the standard format;
S15, applying operations similar to S12, S13, and S14 to the cited article titles, and generating the parallel corpus of citation contexts and cited article titles according to the correspondence extracted in S11;
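The cleaning in S12 to S14 can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names (`tokenize`, `build_vocab`, `standardize`) are hypothetical, a simple regular expression stands in for the stop-symbol knowledge base and the word segmentation library, and lemmatization is omitted.

```python
import re
from collections import Counter

MAX_LEN, VOCAB_SIZE = 28, 10000      # values stated in this embodiment
PAD, UNK = "<PAD>", "<UNK>"

def tokenize(text):
    # Assumption: keeping lower-cased alphabetic runs approximates the
    # removal of punctuation, formula symbols and other invalid characters.
    return re.findall(r"[a-z]+", text.lower())

def build_vocab(token_lists, size=VOCAB_SIZE):
    # Count word frequencies over all contexts and keep the top `size` words.
    counts = Counter(tok for toks in token_lists for tok in toks)
    return {word for word, _ in counts.most_common(size)}

def standardize(tokens, vocab, max_len=MAX_LEN):
    # Truncate long contexts, map out-of-vocabulary words to <UNK>,
    # then pad short contexts with <PAD> up to max_len.
    tokens = [t if t in vocab else UNK for t in tokens[:max_len]]
    return tokens + [PAD] * (max_len - len(tokens))
```

The same three functions could be applied to the cited article titles (S15), with a separate vocabulary for the decoder side.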
S2, embedding the words occurring in the citation contexts and cited article titles into a low-dimensional semantic space by the skip-gram word vector model combined with negative sampling to obtain word vectors, so that semantically similar words lie closer together in the embedding space;
In actual operation, step S2 specifically comprises the following steps:
S21, splitting each sentence into multiple (input word, output word) pairs according to the word window size;
S22, converting every word into a one-hot (0-1) vector whose length equals the vocabulary size (10000 words);
S23, building a neural network comprising an input layer (receiving the 10000-dimensional one-hot vector), a hidden layer of 100 neurons (the word vector dimension), and an output layer of 10000 neurons, with the structure shown in Fig. 3;
S24, adding negative sampling to the skip-gram model and back-propagating the error; the 10000 × 300 weight matrix at the word embedding matrix is the final word vector representation.
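As an illustration of S21 and of the negative sampling mentioned in S24, the following Python sketch generates (input word, output word) pairs from a window and draws negative words for one positive pair. The function names are hypothetical, and drawing negatives uniformly from the vocabulary is a simplifying assumption; the patent does not state the sampling distribution.

```python
import random

def skipgram_pairs(tokens, window=2):
    # S21: pair every center word with each word inside its window.
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        pairs.extend((center, tokens[j]) for j in range(lo, hi) if j != i)
    return pairs

def negative_samples(vocab, positive, k, rng):
    # Assumption: negatives are drawn uniformly, excluding the true output word.
    candidates = [w for w in vocab if w != positive]
    return rng.sample(candidates, k)
```

During training, each positive pair would be scored against its `k` sampled negatives instead of against all 10000 output neurons, which is what makes negative sampling cheaper than a full softmax.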
S3, based on neural machine translation, building an encoder-decoder framework consisting of a bidirectional gated recurrent unit (GRU) encoder with an attention mechanism and a GRU decoder, and training the model with the citation contexts of the parallel corpus, converted to word vectors by the word vector model, as input and the cited article titles as output;
In this embodiment, neural machine translation is combined with local citation recommendation, and local citation recommendation is expressed as a machine translation problem from a source language (the citation context) to a target language (the cited article title). The encoder-decoder framework learns a semantic representation of the citation context and, on the basis of the understood semantics, selects words from the candidate vocabulary to decode a seed title, forming a seed-title construction model linked by semantic content.
In step S3, a citation context that cites multiple articles is handled as multiple parallel corpus entries.
In actual operation, step S3 specifically comprises:
building the encoder-decoder framework consisting of a bidirectional GRU encoder with an attention mechanism and a GRU decoder, as shown in Fig. 4:
the encoder consists of a bidirectional GRU network; at each time step $t$ it receives the vector representation of the $t$-th word of the input sequence and produces a hidden state $h^{\langle t\rangle}$; the attention mechanism, together with the hidden state of the output layer, determines a translation weight for each input word, from which the final context vector is obtained and fed into the decoder to decode a word.
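The formulas of the encoder GRU unit are as follows:
$$G_u = \mathrm{sigmoid}\left(W_u\left[h^{\langle t-1\rangle}, x^{\langle t\rangle}\right] + b_u\right) \quad \text{(update gate)}$$
$$G_r = \mathrm{sigmoid}\left(W_r\left[h^{\langle t-1\rangle}, x^{\langle t\rangle}\right] + b_r\right) \quad \text{(reset gate)}$$
$$\tilde{C}^{\langle t\rangle} = \tanh\left(W_c\left[G_r \odot h^{\langle t-1\rangle}, x^{\langle t\rangle}\right] + b_c\right) \quad \text{(updated hidden layer variable)}$$
$$C^{\langle t\rangle} = G_u \odot \tilde{C}^{\langle t\rangle} + \left(1 - G_u\right) \odot h^{\langle t-1\rangle} \quad \text{(hidden layer variable passed to the next time step)}$$
where $h^{\langle t\rangle}$ denotes the hidden layer variable at time $t$, $x^{\langle t\rangle}$ denotes the input at time $t$, $b_u$, $b_r$, $b_c$ denote biases, sigmoid and tanh are activation functions, and $W_u$, $W_r$, $W_c$ are weight parameters.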
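A single GRU step can be sketched in plain Python as below. The update and reset gates follow the two gate formulas given above; the candidate state and the final interpolation are assumed to take the standard GRU form, since the source does not spell them out here. The helper names (`affine`, `gru_step`) are hypothetical, and vectors are plain lists to keep the sketch dependency-free.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, v, b):
    # W is a list of rows; returns W v + b as a list.
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]

def gru_step(h_prev, x, W_u, W_r, W_c, b_u, b_r, b_c):
    concat = h_prev + x                                   # [h<t-1>, x<t>]
    g_u = [sigmoid(z) for z in affine(W_u, concat, b_u)]  # update gate
    g_r = [sigmoid(z) for z in affine(W_r, concat, b_r)]  # reset gate
    # Assumed standard form: the reset gate scales h<t-1> before the tanh layer.
    gated = [r * h for r, h in zip(g_r, h_prev)] + x
    c_tilde = [math.tanh(z) for z in affine(W_c, gated, b_c)]
    # Assumed standard form: interpolate between candidate and previous state.
    return [u * c + (1.0 - u) * h for u, c, h in zip(g_u, c_tilde, h_prev)]
```

A bidirectional encoder would run one such recurrence left to right and another right to left over the word vectors, then concatenate the two hidden states at each position.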
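The decoding part of the attention mechanism proceeds as follows:
when the decoder decodes the $t$-th word, the decoder hidden state $s^{\langle t\rangle}$ at time $t$ is computed from the word $y^{\langle t-1\rangle}$ decoded at time $t-1$ and the context vector $c^{\langle t\rangle}$ passed in from the encoder at time $t$; the decoder hidden state $s^{\langle t\rangle}$ is obtained by the following formula:
$$s^{\langle t\rangle} = g\left(y^{\langle t-1\rangle}, s^{\langle t-1\rangle}, c^{\langle t\rangle}\right)$$
where the context vector $c^{\langle t\rangle}$ passed in at time $t$ is determined by the encoder hidden layer variables $h^{\langle t'\rangle}$ and the translation attention between each encoded word and the word currently being decoded:
$$c^{\langle t\rangle} = \sum_{t'} \alpha^{\langle t, t'\rangle}\, h^{\langle t'\rangle}$$
where $\alpha^{\langle t, t'\rangle}$ is the vector-type attention, representing the translation attention of the $t'$-th encoder word with respect to all decoder words, and is obtained by the following formula:
$$\alpha^{\langle t, t'\rangle} = \frac{\exp\left(e^{\langle t, t'\rangle}\right)}{\sum_{k} \exp\left(e^{\langle t, k\rangle}\right)}$$
where $e^{\langle t, t'\rangle}$ is the scalar-type attention, representing the translation attention of the $t'$-th encoder word to the $t$-th decoder word, and is obtained by the following formula:
$$e^{\langle t, t'\rangle} = v^{T} \tanh\left(W_s\, s^{\langle t-1\rangle} + W_h\, h^{\langle t'\rangle}\right)$$
where $v^{T}$, $W_s$, and $W_h$ are weight parameters.
The above procedure is repeated until all words have been decoded; the decoded sequence is the seed title.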
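The attention computation for one decoding step can be sketched as follows. It assumes an additive scoring function built from the parameters $v$, $W_s$, $W_h$ named above, followed by a softmax over encoder positions; the function name `attention_context` is hypothetical, and vectors are plain Python lists.

```python
import math

def attention_context(s_prev, enc_states, W_s, W_h, v):
    def matvec(W, x):
        return [sum(w * xi for w, xi in zip(row, x)) for row in W]

    proj_s = matvec(W_s, s_prev)
    # Scalar score for every encoder position (assumed additive form).
    scores = []
    for h in enc_states:
        proj_h = matvec(W_h, h)
        scores.append(sum(vi * math.tanh(a + b)
                          for vi, a, b in zip(v, proj_s, proj_h)))
    # Softmax over encoder positions, shifted by the max for stability.
    top = max(scores)
    exps = [math.exp(e - top) for e in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: attention-weighted sum of encoder hidden states.
    dim = len(enc_states[0])
    context = [sum(w * h[d] for w, h in zip(weights, enc_states))
               for d in range(dim)]
    return weights, context
```

The returned context vector would be passed, together with the previously decoded word and the previous decoder state, into the decoder GRU to produce the next word of the seed title.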
S4, computing the cosine similarity, one by one, between the seed title output by the encoder-decoder framework and all article titles in the candidate article list;
In this embodiment, the seed title is a sequence decoded from a set of candidate words and serves as the basis for similarity comparison with all candidate article titles.
In actual operation, step S4 specifically comprises the following steps:
S41, computing the pairwise similarity of all words in the decoder candidate vocabulary, and building a word-similarity lookup dictionary;
S42, segmenting the seed title and the article titles in the candidate article list library into words, and computing word by word, according to the similarities in the word-similarity lookup dictionary, the similarity between the seed title and each candidate article title;
S43, accumulating the results of step S42 as the similarity between the seed title and the article;
S44, sorting the similarity results obtained in step S43 to form the literature recommendation list;
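Steps S41 to S44 can be sketched as below. Two points are assumptions rather than statements of the patent: word similarity is taken to be the cosine similarity of the word vectors, and the word-by-word comparison in S42 is taken to mean that each seed-title word contributes its best match in the candidate title. All function names are hypothetical.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def build_similarity_table(vectors):
    # S41: precompute pairwise similarity for every word in the candidate vocabulary.
    words = list(vectors)
    return {(a, b): cosine(vectors[a], vectors[b]) for a in words for b in words}

def title_score(seed_words, title_words, table):
    # S42 + S43: look up each seed word's best match in the title, then accumulate.
    return sum(max((table.get((s, w), 0.0) for w in title_words), default=0.0)
               for s in seed_words)

def rank_titles(seed_words, titles, table):
    # S44: sort candidate titles by descending accumulated similarity.
    return sorted(titles,
                  key=lambda t: title_score(seed_words, t.split(), table),
                  reverse=True)
```

Precomputing the table once means that scoring a title needs only dictionary lookups, which matters when the seed title is compared against every title in the candidate library.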
S5, according to article publication time, removing articles published after the article containing the citation context, and selecting the 20 articles with the highest similarity as the recommendation list.
In this embodiment, the top-20 recommendation list is a baseline value; the number of recommended articles can be adjusted for the specific scenario.
In actual operation, step S5 specifically comprises the following steps:
for articles or citation contexts without year information, the first 20 articles of the literature recommendation list obtained in step S4 are recommended directly; for articles and citation contexts with year information, the articles published after the article containing the citation context are first removed from the literature recommendation list obtained in step S4, and the first 20 articles of the remaining list are recommended, completing the local citation recommendation.
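The filtering rule of step S5 can be sketched as a short Python function. The function name `recommend` and the representation of the ranked list as `(title, year)` tuples, with `None` marking missing year information, are assumptions made for illustration.

```python
def recommend(ranked, context_year=None, top_k=20):
    # `ranked` is a list of (title, year) already sorted by descending similarity.
    if context_year is not None:
        # Drop articles published after the article containing the citation
        # context; articles without year information are kept.
        ranked = [(title, year) for title, year in ranked
                  if year is None or year <= context_year]
    return [title for title, _ in ranked[:top_k]]
```

For example, with a ranked list `[("A", 2019), ("B", 2015), ("C", None), ("D", 2017)]`, a citation context from 2017 and `top_k=2`, the function returns `["B", "C"]`, because "A" was published after the citing article.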
Referring to Fig. 2, the local citation recommendation system based on neural machine translation proposed by the present invention comprises:
a citation cleaning module, for processing the input citation context into the standard input corpus format required by the encoder-decoder framework;
In this embodiment, the citation cleaning module is specifically used for:
removing invalid symbols from the citation context, replacing words of the citation context that do not appear in the vocabulary with "<UNK>", truncating contexts longer than 28 words, padding contexts shorter than 28 words with "<PAD>", lemmatizing all words, and then converting all words to word vectors with the pre-trained word vector model;
an article expansion module, for dynamically expanding the candidate article list library on the basis of the existing article list library, using web crawler technology to crawl the newest published articles from relevant literature search platforms, so that the candidate article list library becomes more complete and comprehensive over time;
In this embodiment, the article expansion module is specifically used for:
crawling the newest published articles from relevant search platforms using web crawler technology, cleaning them in a way similar to step S1, persisting them into the candidate article list library, and dynamically expanding and maintaining the candidate article list library;
a candidate word update module, for recalculating word frequencies after the candidate article list library is updated and dynamically updating the candidate word list used by the decoder when decoding the seed title;
In this embodiment, the candidate word update module is specifically used for:
after the candidate article list library is updated, segmenting the titles of the entire updated candidate article list library into words and recalculating word frequencies, and then dynamically updating the candidate word list used by the decoder when decoding the seed title, so that the candidate article list and the decoder candidate vocabulary remain synchronized;
a citation recommendation module, for computing the recommended article list for a given citation context using the core algorithms described in S2, S3, and S4;
In this embodiment, the citation recommendation module is specifically used for:
combining local citation recommendation with neural machine translation, expressing local citation recommendation as a machine translation problem from a source language (the citation context) to a target language (the cited article title), and computing the recommended article list for a given citation context using the core algorithms described in S2, S3, and S4.
The invention proposes a novel local citation recommendation method. Compared with traditional citation recommendation methods, it combines citation recommendation with neural machine translation by building a parallel corpus of citation contexts and cited article titles, and treats local citation recommendation as a machine translation problem from the citation context (source language) to the cited article title (target language), which gives stronger semantic consistency between citation contexts and article titles. By embedding the words of the citation contexts and cited article titles into a low-dimensional semantic space, the invention places semantically similar words closer together in that space. It builds an encoder-decoder framework consisting of a bidirectional GRU encoder with an attention mechanism and a GRU decoder; by computing influence weights (attention) between each encoded word and each decoded word, the translation accuracy between citation contexts and cited article titles is greatly improved. Further, cosine similarity is computed article by article between the decoded cited article title and the article titles in the candidate article list library, and the top 20 articles are selected as the recommended article list. This greatly reduces the coupling between context-to-title translation and article recommendation, so that the two tasks can be carried out independently.
The above is only a preferred specific embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any equivalent substitution or change made by a person skilled in the art, within the technical scope disclosed by the present invention and according to its technical solution and inventive concept, shall fall within the scope of protection of the present invention.
Claims (10)
1. A local citation recommendation method based on neural machine translation, characterized by comprising the following steps:
S1, performing data cleaning operations, including citation extraction, lemmatization, and word-frequency statistics, on a raw data set to obtain a parallel corpus of citation contexts and cited article titles, and building an initial list library of candidate articles to be cited;
S2, embedding the words occurring in the citation contexts and cited article titles into a low-dimensional semantic space by the skip-gram word vector model combined with negative sampling to obtain word vectors, so that semantically similar words lie closer together in the embedding space;
S3, based on neural machine translation, building an encoder-decoder framework consisting of a bidirectional gated recurrent unit (GRU) encoder with an attention mechanism and a GRU decoder, and training the model with the citation contexts of the parallel corpus, converted to word vectors by the word vector model, as input and the cited article titles as output;
S4, computing the cosine similarity, one by one, between the seed title output by the encoder-decoder framework and all article titles in the candidate article list;
S5, according to article publication time, removing articles published after the article containing the citation context, and selecting the articles whose similarity meets the requirements as the recommendation list.
2. The local citation recommendation method based on neural machine translation according to claim 1, characterized in that step S1 specifically comprises:
extracting all English citation contexts and removing invalid symbols; retaining the citation contexts whose word count is within a set range and lemmatizing them; counting word frequencies, retaining the words ranked within a set top rank, and replacing all other words with <UNK>; padding contexts shorter than the set range with <PAD>; and extracting the cited article titles according to the citation contexts and applying a similar cleaning operation.
3. The local citation recommendation method based on neural machine translation technology according to claim 1, characterized in that step S2 specifically comprises:
S21. Splitting each sentence into multiple (input word, output word) pairs according to the word window size;
S22. Converting every word into a one-hot (0-1) vector whose length equals the vocabulary size;
S23. Building a neural network comprising an input layer, a hidden layer, and an output layer;
S24. Adding negative sampling to the skip-gram model and back-propagating the error; the weight matrix at the word embedding matrix is the final word-vector representation.
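Steps S21 to S24 can be sketched in NumPy as below. This is a minimal illustration, not the patented implementation: the one-hot input of S22 is replaced by the equivalent row lookup into the embedding matrix `W_in`, the negative words are chosen deterministically rather than sampled from a word-frequency distribution, and the function names, embedding size, and learning rate are assumptions.

```python
import numpy as np

def skipgram_pairs(tokens, window):
    """S21: build (input word, output word) pairs inside a sliding window."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        pairs.extend((center, tokens[j]) for j in range(lo, hi) if j != i)
    return pairs

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_step(W_in, W_out, center, context, negatives, lr=0.1):
    """One gradient step of skip-gram with negative sampling; returns the loss before the update."""
    v = W_in[center].copy()
    targets = np.array([context] + list(negatives))
    labels = np.zeros(len(targets))
    labels[0] = 1.0                       # the true context word is the only positive
    u = W_out[targets]                    # (k+1, d) copy of the output vectors
    scores = sigmoid(u @ v)
    loss = -np.log(scores[0]) - np.sum(np.log(1.0 - scores[1:]))
    grad = scores - labels                # derivative of the loss w.r.t. each dot product
    W_in[center] -= lr * (grad @ u)
    np.subtract.at(W_out, targets, lr * np.outer(grad, v))
    return float(loss)

tokens = ["citation", "recommendation", "with", "translation"]
vocab = {w: i for i, w in enumerate(sorted(set(tokens)))}
pairs = skipgram_pairs(tokens, window=1)
rng = np.random.default_rng(0)
W_in = rng.normal(scale=0.1, size=(len(vocab), 8))   # rows become the word vectors (S24)
W_out = rng.normal(scale=0.1, size=(len(vocab), 8))
c, o = vocab[pairs[0][0]], vocab[pairs[0][1]]
neg = [i for i in range(len(vocab)) if i not in (c, o)]
losses = [sgns_step(W_in, W_out, c, o, neg) for _ in range(50)]
```

After training, each row of `W_in` plays the role of the word vector that S24 reads off the embedding matrix.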
4. The local citation recommendation method based on neural machine translation technology according to claim 1, characterized in that step S3 specifically comprises:
Building the encoder-decoder framework, consisting of a bidirectional GRU encoder with an attention mechanism and a GRU decoder, to learn a semantic representation of the citation context and, on the basis of the understood semantics, to select words from the candidate vocabulary and decode a seed title, forming a seed-title construction model that is tied to the semantic content;
The encoder-decoder framework consisting of the bidirectional GRU encoder with an attention mechanism and the GRU decoder is specifically as follows:
The encoder consists of a bidirectional GRU network. At each time step t it receives the vector representation of the t-th word of the input sequence and produces the hidden-layer state h<t>. The attention mechanism acts on the hidden-layer states and the output layer to obtain a translation weight for each input word, from which the final context vector is obtained and fed to the decoder to decode a word;
The GRU unit of the encoder is expressed by the following formulas:
$$G_u = \mathrm{sigmoid}(W_u[h^{<t-1>}, x^{<t>}] + b_u)$$
$$G_r = \mathrm{sigmoid}(W_r[h^{<t-1>}, x^{<t>}] + b_r)$$
where $G_u$ is the update gate, $G_r$ is the reset gate, $\tilde{C}^{<t>}$ is the updated (candidate) hidden-layer variable, $C^{<t>}$ is the hidden-layer variable passed on to the next time step, $h^{<t>}$ is the hidden-layer variable at time t, $x^{<t>}$ is the input at time t, $b_u$, $b_r$, $b_c$ are biases, sigmoid and tanh are activation functions, and $W_u$, $W_r$, $W_c$ are weight parameters.
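The gate formulas can be sketched in NumPy as follows. The update and reset gates follow the two formulas given in the claim; the candidate state and the final blend are not reproduced in the text, so the code assumes the standard GRU form for them. All function names, parameter shapes, and the concatenation of forward and backward states are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(h_prev, x, p):
    """One GRU step; p holds W_u, W_r, W_c and b_u, b_r, b_c."""
    hx = np.concatenate([h_prev, x])
    g_u = sigmoid(p["W_u"] @ hx + p["b_u"])   # update gate, as in the claim
    g_r = sigmoid(p["W_r"] @ hx + p["b_r"])   # reset gate, as in the claim
    # Assumption: standard GRU candidate state and interpolation.
    c_tilde = np.tanh(p["W_c"] @ np.concatenate([g_r * h_prev, x]) + p["b_c"])
    return g_u * c_tilde + (1.0 - g_u) * h_prev

def init_params(rng, hidden, embed):
    p = {k: rng.normal(scale=0.1, size=(hidden, hidden + embed)) for k in ("W_u", "W_r", "W_c")}
    p.update({k: np.zeros(hidden) for k in ("b_u", "b_r", "b_c")})
    return p

def bidirectional_encode(xs, p_fwd, p_bwd, hidden):
    """Run one GRU left-to-right and one right-to-left, then concatenate per position."""
    h, fwd = np.zeros(hidden), []
    for x in xs:
        h = gru_step(h, x, p_fwd)
        fwd.append(h)
    h, bwd = np.zeros(hidden), []
    for x in reversed(xs):
        h = gru_step(h, x, p_bwd)
        bwd.append(h)
    bwd.reverse()
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]

rng = np.random.default_rng(0)
hidden, embed = 4, 3
xs = [rng.normal(size=embed) for _ in range(5)]   # five word vectors of a citation context
states = bidirectional_encode(xs, init_params(rng, hidden, embed), init_params(rng, hidden, embed), hidden)
```

Each element of `states` corresponds to one input word and has twice the hidden size, because the forward and backward states are concatenated.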
The decoding part of the attention mechanism proceeds as follows:
When the decoder decodes the t-th word, the decoder hidden-layer state $s^{<t>}$ at time t must be computed from the word $y^{<t-1>}$ decoded at time t-1 and the context vector $c^{<t>}$ passed in from the encoder at time t. The decoder hidden-layer state $s^{<t>}$ is obtained by:
$$s^{<t>} = g(y^{<t-1>}, s^{<t-1>}, c^{<t>})$$
The context variable $c^{<t>}$ passed in at time t is determined by the encoder hidden-layer variables $h^{<t>}$ and the translation attention between each encoded word and the word being decoded, according to a formula that is not reproduced in this text.
That formula uses two attention quantities, whose symbols and defining formulas are likewise not reproduced here: a vector-type attention, representing the translation attention of a given encoder word to all decoder words, and a scalar-type attention, representing the translation attention of a given encoder word to the t-th decoder word.
Here $v^T$, $W_s$, and $W_h$ are weight parameters;
The above process is repeated until all words have been decoded, which yields the seed title.
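Because the attention formulas are missing from the text, the following NumPy sketch assumes the widely used additive attention form, in which a score $v^T \tanh(W_s s + W_h h)$ is computed for every encoder state, normalized with a softmax, and used to weight the encoder states. Only the parameter names $v$, $W_s$, and $W_h$ come from the claim; the rest, including the use of the previous decoder state and all shapes, is an assumption.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def attention_context(s_prev, H, W_s, W_h, v):
    """H: (T, d_h) encoder states, s_prev: (d_s,) decoder state.

    Returns the context vector (d_h,) and the attention weights (T,).
    """
    scores = np.tanh(H @ W_h.T + W_s @ s_prev) @ v   # one scalar score per encoder word
    alpha = softmax(scores)                          # weights sum to 1
    return alpha @ H, alpha

rng = np.random.default_rng(0)
T, d_h, d_s, d_a = 5, 8, 4, 6
H = rng.normal(size=(T, d_h))
s_prev = rng.normal(size=d_s)
W_s = rng.normal(size=(d_a, d_s))
W_h = rng.normal(size=(d_a, d_h))
v = rng.normal(size=d_a)
context, alpha = attention_context(s_prev, H, W_s, W_h, v)

# With all-zero weights every encoder word gets the same attention,
# so the context vector reduces to the mean of the encoder states.
uniform_context, uniform_alpha = attention_context(
    s_prev, H, np.zeros_like(W_s), np.zeros_like(W_h), v)
```

The context vector produced this way would be fed, together with the previously decoded word and the previous decoder state, into the function g of the decoder formula.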
5. The local citation recommendation method based on neural machine translation technology according to claim 1, characterized in that step S4 specifically comprises:
S41. Computing the pairwise similarity of all words in the decoding candidate vocabulary and building a word-similarity lookup dictionary;
S42. Tokenizing the seed title and the article titles in the candidate cited article list library, and using the similarities stored in the lookup dictionary to compute, word by word, the similarity between the seed title and each candidate cited article title;
S43. Accumulating the results of step S42 as the similarity between the seed title and that article;
S44. Sorting the similarity results obtained in step S43 to form the literature recommendation list.
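Steps S41 to S44 can be sketched as below. The claim does not say how the word-by-word similarities are combined, so this sketch assumes that each seed-title word is matched to its most similar word in the candidate title and that these best matches are summed. The function names, whitespace tokenization, and the toy two-dimensional word vectors are all hypothetical.

```python
import numpy as np
from itertools import product

def build_similarity_dict(vectors):
    """S41: cosine similarity for every ordered pair of vocabulary words."""
    sim = {}
    for a, b in product(vectors, repeat=2):
        va, vb = vectors[a], vectors[b]
        sim[a, b] = float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb)))
    return sim

def title_similarity(seed_tokens, title_tokens, sim):
    """S42 and S43: best match per seed word, accumulated over the seed title."""
    return sum(
        max((sim.get((s, t), 0.0) for t in title_tokens), default=0.0)
        for s in seed_tokens
    )

def rank_titles(seed_title, titles, sim):
    """S44: sort candidate titles by accumulated similarity, highest first."""
    seed_tokens = seed_title.lower().split()
    scored = [(t, title_similarity(seed_tokens, t.lower().split(), sim)) for t in titles]
    return sorted(scored, key=lambda x: x[1], reverse=True)

vectors = {
    "neural": np.array([1.0, 0.0]),
    "translation": np.array([0.8, 0.6]),
    "parking": np.array([0.0, 1.0]),
}
sim = build_similarity_dict(vectors)
ranked = rank_titles("neural translation", ["Parking", "Neural translation"], sim)
```

Precomputing the dictionary once means that scoring a title only requires lookups, which matters when the candidate library is large.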
6. A local citation recommendation system based on neural machine translation technology, characterized in that it applies the method of any one of claims 1 to 5 and comprises:
A citation cleaning module, for processing the input citation context into the standard input corpus format required by the encoder-decoder framework;
An article expansion module, for dynamically expanding the candidate cited article list library on the basis of the existing article list library, using web crawler technology to promptly crawl the latest published articles from relevant literature retrieval platforms, so that the candidate cited article list library available for citation contexts is more complete and comprehensive;
A candidate word update module, for recalculating word frequencies after the candidate cited article list library is updated and dynamically updating the candidate word list that the decoder uses when decoding seed titles;
A citation recommendation module, for computing the recommended article list for a given citation context.
7. The local citation recommendation system based on neural machine translation technology according to claim 6, characterized in that the citation cleaning module is specifically configured to:
Remove irrelevant symbols from the citation context and replace words in the citation context that do not appear in the vocabulary with <UNK>; pad contexts shorter than the set range with <PAD> and truncate contexts longer than the set range; lemmatize all words; and then convert all words into word vectors with the pre-trained word-vector model.
8. The local citation recommendation system based on neural machine translation technology according to claim 6, characterized in that, in the article expansion module, web crawler technology is used to crawl the latest published articles from relevant retrieval platforms; data cleaning operations such as citation extraction, lemmatization, and word frequency statistics are performed on the raw data set to obtain the parallel corpus of citation contexts and cited article titles and to build the initial candidate cited article list library; and the candidate cited article list library is dynamically expanded and maintained.
9. The local citation recommendation system based on neural machine translation technology according to claim 6, characterized in that, in the candidate word update module, after the candidate cited article list library is updated, the titles in the library are re-tokenized globally and the word frequencies are recalculated; the candidate word list used by the decoder when decoding seed titles is then dynamically updated, so that the candidate cited article list and the decoder's candidate vocabulary remain synchronized.
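The synchronization described in claim 9 can be sketched as a small class that keeps running word counts over all titles in the library. Adding the counts of newly crawled titles to the existing counts gives the same result as re-tokenizing every title from scratch. The class name `CandidateVocabulary`, whitespace tokenization, and the `top_k` cutoff are assumptions made for illustration.

```python
from collections import Counter

class CandidateVocabulary:
    """Keeps the decoder's candidate word list in step with the article library."""

    def __init__(self, top_k):
        self.top_k = top_k
        self.counts = Counter()   # global word frequencies over all known titles
        self.words = []           # current candidate word list for the decoder

    def update(self, new_titles):
        """Add newly crawled titles, recompute frequencies, refresh the word list."""
        for title in new_titles:
            self.counts.update(title.lower().split())
        self.words = [w for w, _ in self.counts.most_common(self.top_k)]
        return self.words

vocab = CandidateVocabulary(top_k=2)
first = list(vocab.update(["Graph neural network", "Neural translation"]))
second = list(vocab.update(["Machine translation", "Translation model"]))
```

After the second update, `translation` overtakes `neural` as the most frequent word, so the decoder's candidate list changes together with the library.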
10. The local citation recommendation system based on neural machine translation technology according to claim 6, characterized in that, in the citation recommendation module, local citation recommendation is combined with neural machine translation: local citation recommendation is formulated as a machine translation problem from a source language to a target language, and the citation recommendation module computes the recommended article list for a given citation context.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810994562.0A CN109145190B (en) | 2018-08-27 | 2018-08-27 | Local citation recommendation method and system based on neural machine translation technology |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810994562.0A CN109145190B (en) | 2018-08-27 | 2018-08-27 | Local citation recommendation method and system based on neural machine translation technology |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109145190A CN109145190A (en) | 2019-01-04 |
CN109145190B CN109145190B (en) | 2021-07-30 |
Family
ID=64828908
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810994562.0A Active CN109145190B (en) | 2018-08-27 | 2018-08-27 | Local citation recommendation method and system based on neural machine translation technology |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109145190B (en) |
Patent Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160321316A1 (en) * | 2011-06-03 | 2016-11-03 | Gdial Inc. | Systems and methods for atomizing and individuating data as data quanta |
US20140006424A1 (en) * | 2012-06-29 | 2014-01-02 | Khalid Al-Kofahi | Systems, methods, and software for processing, presenting, and recommending citations |
CN106844368A (en) * | 2015-12-03 | 2017-06-13 | 华为技术有限公司 | For interactive method, nerve network system and user equipment |
CN105589948A (en) * | 2015-12-18 | 2016-05-18 | 重庆邮电大学 | Document citation network visualization and document recommendation method and system |
US9607058B1 (en) * | 2016-05-20 | 2017-03-28 | BlackBox IP Corporation | Systems and methods for managing documents associated with one or more patent applications |
US20180018831A1 (en) * | 2016-07-15 | 2018-01-18 | Charlena L. Thorpe | Licensing and ticketing system for traffic violation |
US20180060790A1 (en) * | 2016-08-26 | 2018-03-01 | Conduent Business Services, Llc | System And Method For Coordinating Parking Enforcement Officer Patrol In Real Time With The Aid Of A Digital Computer |
GB2556664A (en) * | 2016-11-07 | 2018-06-06 | Google Llc | Third party application configuration for issuing notifications |
CN106682172A (en) * | 2016-12-28 | 2017-05-17 | 江苏大学 | Keyword-based document research hotspot recommending method |
CN107341199A (en) * | 2017-06-21 | 2017-11-10 | 北京林业大学 | A kind of recommendation method based on documentation & info general model |
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109740164A (en) * | 2019-01-09 | 2019-05-10 | 国网浙江省电力有限公司舟山供电公司 | Based on the matched electric power defect rank recognition methods of deep semantic |
CN109740168A (en) * | 2019-01-09 | 2019-05-10 | 北京邮电大学 | A kind of classic of TCM ancient Chinese prose interpretation method based on knowledge of TCM map and attention mechanism |
CN109740164B (en) * | 2019-01-09 | 2023-08-15 | 国网浙江省电力有限公司舟山供电公司 | Electric power defect grade identification method based on depth semantic matching |
CN109753567A (en) * | 2019-01-31 | 2019-05-14 | 安徽大学 | A kind of file classification method of combination title and text attention mechanism |
CN110276082A (en) * | 2019-06-06 | 2019-09-24 | 百度在线网络技术(北京)有限公司 | Translation processing method and device based on dynamic window |
CN110472727A (en) * | 2019-07-25 | 2019-11-19 | 昆明理工大学 | Based on the neural machine translation method read again with feedback mechanism |
CN110472727B (en) * | 2019-07-25 | 2021-05-11 | 昆明理工大学 | Neural machine translation method based on re-reading and feedback mechanism |
CN111061935A (en) * | 2019-12-16 | 2020-04-24 | 北京理工大学 | Science and technology writing recommendation method based on self-attention mechanism |
CN111061935B (en) * | 2019-12-16 | 2022-04-12 | 北京理工大学 | Science and technology writing recommendation method based on self-attention mechanism |
CN111581401A (en) * | 2020-05-06 | 2020-08-25 | 西安交通大学 | Local citation recommendation system and method based on depth correlation matching |
CN111581401B (en) * | 2020-05-06 | 2023-04-07 | 西安交通大学 | Local citation recommendation system and method based on depth correlation matching |
CN112035607A (en) * | 2020-08-19 | 2020-12-04 | 中南大学 | MG-LSTM-based citation difference matching method, device and storage medium |
CN112035607B (en) * | 2020-08-19 | 2022-05-20 | 中南大学 | Method, device and storage medium for matching citation difference based on MG-LSTM |
CN112395892A (en) * | 2020-12-03 | 2021-02-23 | 内蒙古工业大学 | Mongolian Chinese machine translation method for realizing placeholder disambiguation based on pointer generation network |
CN112765342B (en) * | 2021-03-22 | 2022-10-14 | 中国电子科技集团公司第二十八研究所 | Article recommendation method based on time and semantics |
CN112765342A (en) * | 2021-03-22 | 2021-05-07 | 中国电子科技集团公司第二十八研究所 | Article recommendation method based on time and semantics |
CN113268951A (en) * | 2021-04-30 | 2021-08-17 | 南京邮电大学 | Citation recommendation method based on deep learning |
CN113268951B (en) * | 2021-04-30 | 2023-05-30 | 南京邮电大学 | Deep learning-based quotation recommendation method |
CN113239181A (en) * | 2021-05-14 | 2021-08-10 | 廖伟智 | Scientific and technological literature citation recommendation method based on deep learning |
Also Published As
Publication number | Publication date |
---|---|
CN109145190B (en) | 2021-07-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109145190A (en) | A kind of local quotation recommended method and system based on neural machine translation mothod | |
CN111159223B (en) | Interactive code searching method and device based on structured embedding | |
CN107133213B (en) | Method and system for automatically extracting text abstract based on algorithm | |
CN112559556B (en) | Language model pre-training method and system for table mode analysis and sequence mask | |
WO2021114745A1 (en) | Named entity recognition method employing affix perception for use in social media | |
CN109800411A (en) | Clinical treatment entity and its attribute extraction method | |
CN107273355A (en) | A kind of Chinese word vector generation method based on words joint training | |
CN109918510A (en) | Cross-cutting keyword extracting method | |
CN110020189A (en) | A kind of article recommended method based on Chinese Similarity measures | |
CN107480132A (en) | A kind of classic poetry generation method of image content-based | |
Huang et al. | Neural temporality adaptation for document classification: Diachronic word embeddings and domain adaptation models | |
CN104991890A (en) | Method for constructing Vietnamese dependency tree bank on basis of Chinese-Vietnamese vocabulary alignment corpora | |
CN109977250A (en) | Merge the depth hashing image search method of semantic information and multistage similitude | |
Zhang et al. | Effective subword segmentation for text comprehension | |
CN110516145B (en) | Information searching method based on sentence vector coding | |
CN103744956A (en) | Diversified expansion method of keyword | |
CN111291188A (en) | Intelligent information extraction method and system | |
CN111967267B (en) | XLNET-based news text region extraction method and system | |
CN107092605A (en) | A kind of entity link method and device | |
CN110888991A (en) | Sectional semantic annotation method in weak annotation environment | |
CN112364132A (en) | Similarity calculation model and system based on dependency syntax and method for building system | |
CN112036178A (en) | Distribution network entity related semantic search method | |
CN109086255A (en) | A kind of bibliography automatic marking method and system based on deep learning | |
CN113268606A (en) | Knowledge graph construction method and device | |
CN110222338A (en) | A kind of mechanism name entity recognition method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||