CN109145190A - Local citation recommendation method and system based on neural machine translation - Google Patents

Local citation recommendation method and system based on neural machine translation

Info

Publication number
CN109145190A
Authority
CN
China
Prior art keywords
citation
word
article
context
decoder
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810994562.0A
Other languages
Chinese (zh)
Other versions
CN109145190B (en
Inventor
赵姝
王鑫
刘洋
陈洁
段震
张燕平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui University
Original Assignee
Anhui University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui University
Priority to CN201810994562.0A
Publication of CN109145190A
Application granted
Publication of CN109145190B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods

Abstract

The invention discloses a local citation recommendation method and system based on neural machine translation. Data cleaning operations (citation extraction, lemmatization, and word frequency statistics) are applied to the raw data set to obtain a parallel corpus of citation contexts and cited article titles and to build the initial candidate article list library. The words occurring in citation contexts and cited article titles are embedded into a low-dimensional semantic space using the skip-gram word vector model combined with negative sampling, yielding word vectors. An encoder-decoder framework is built, consisting of a bidirectional gated recurrent unit (GRU) encoder with an attention mechanism and a GRU decoder; the citation contexts in the parallel corpus, converted into word vectors by the word vector model, serve as the model input, and the cited article titles serve as the output for training. The seed title output by the encoder-decoder framework is compared one by one, by cosine similarity, with all article titles in the candidate article list. Finally, articles that meet the requirements are selected according to publication time to form the recommendation list.

Description

Local citation recommendation method and system based on neural machine translation
Technical field
The present invention relates to the technical field of information retrieval, and in particular to a local citation recommendation method and system based on neural machine translation.
Background art
With the rapid development of Internet technology, a large number of new scientific articles are published every year, and quickly finding the documents one needs in this mass of literature has become a major difficulty. Given a passage of context, local citation recommendation can quickly build an intelligent model of its semantics and content and retrieve, from a massive body of literature, reference documents relevant to the researcher's field, recommending them directly. This saves researchers a great deal of time otherwise spent searching for related literature, so local citation recommendation plays an important role in research work.
In recent years many researchers have studied this problem. The work falls into two main classes: global citation recommendation, which recommends citations for an article as a whole, and local citation recommendation, which recommends citations for a passage of context text within an article. Commonly used approaches include methods based on text similarity, topic models, translation models, collaborative filtering, and deep learning, along with some other methods.
Neural machine translation is an encoder-decoder framework proposed by Google in 2014 that has achieved considerable progress on machine translation problems.
Summary of the invention
The technical problem to be solved by the present invention is to provide a local citation recommendation method based on neural machine translation that improves the accuracy of translating citation contexts into cited article titles.
A local citation recommendation method based on neural machine translation, comprising the following steps:
S1. Perform data cleaning operations on the raw data set, including citation extraction, lemmatization, and word frequency statistics, to obtain a parallel corpus of citation contexts and cited article titles and to build the initial candidate article list library;
S2. Using the skip-gram word vector model combined with negative sampling, embed the words occurring in citation contexts and cited article titles into a low-dimensional semantic space to obtain word vectors, so that semantically similar words lie closer together in the embedding space;
S3. Based on neural machine translation, build an encoder-decoder framework consisting of a bidirectional gated recurrent unit (GRU) encoder with an attention mechanism and a GRU decoder; convert the citation contexts in the parallel corpus into word vectors with the word vector model and use them as the model input, with the cited article titles as the output, to train the model;
S4. Compute the cosine similarity between the seed title output by the encoder-decoder framework and every article title in the candidate article list, one by one;
S5. According to article publication time, remove the articles published after the article containing the citation context, and select the articles whose similarity meets the requirement as the recommendation list.
As a preferred embodiment of the above technical solution, step S1 specifically includes:
Extract all English citation contexts and remove invalid symbols; retain the citation contexts whose word count lies within a set range and lemmatize them; count word frequencies, retain the words ranked within a set top range, and replace all other words with <UNK>; pad contexts shorter than the set range with <PAD>; and extract the cited article titles according to the citation contexts and apply a similar cleaning operation.
As a preferred embodiment of the above technical solution, step S2 specifically includes:
S21. Split each sentence into multiple (input word)-(output word) pairs according to the word window size;
S22. Convert every word into a 0-1 (one-hot) vector whose length equals the vocabulary size;
S23. Build a neural network comprising an input layer, a hidden layer, and an output layer;
S24. Add negative sampling to the skip-gram model and back-propagate the error; the weight matrix at the word embedding matrix is the final word vector representation.
As a preferred embodiment of the above technical solution, step S3 is specifically:
Build an encoder-decoder framework consisting of a bidirectional GRU encoder with an attention mechanism and a GRU decoder, learn a semantic representation of the citation context, and, on the basis of that semantic understanding, select and decode a seed title from the candidate vocabulary, forming a seed title construction model linked by semantic content;
The encoder-decoder framework, consisting of a bidirectional GRU encoder with an attention mechanism and a GRU decoder, is built as follows:
The encoder consists of a bidirectional GRU network. At each time step t it receives the vector representation of the t-th word of the input sequence and obtains the hidden state h<t>. The attention mechanism combines the encoder hidden states with the decoder (output layer) state to obtain a translation weight for each input word, which yields the final context vector that is fed to the decoder to decode a word;
The formulas of the encoder GRU unit are as follows (the candidate and state updates are written in the standard GRU form implied by the symbols defined below):
$$G_u = \mathrm{sigmoid}(W_u[h^{<t-1>}, x^{<t>}] + b_u)$$
$$G_r = \mathrm{sigmoid}(W_r[h^{<t-1>}, x^{<t>}] + b_r)$$
$$\tilde{C}^{<t>} = \tanh(W_c[G_r \odot h^{<t-1>}, x^{<t>}] + b_c)$$
$$C^{<t>} = G_u \odot \tilde{C}^{<t>} + (1 - G_u) \odot h^{<t-1>}$$
where $G_u$ is the update gate, $G_r$ is the reset gate, $\tilde{C}^{<t>}$ is the updated (candidate) hidden-layer variable, $C^{<t>}$ is the hidden-layer variable that flows to the next time step, $h^{<t>}$ is the hidden-layer variable at time t, $x^{<t>}$ is the input at time t, $b_u$, $b_r$, $b_c$ are biases, sigmoid and tanh are activation functions, and $W_{[u,r,c]}$ are weight parameters.
The decoding process with the attention mechanism is as follows (formulas written in the standard additive-attention form matching the parameters defined below):
When the decoder decodes the t-th word, it needs the decoder hidden state $s^{<t>}$ at time t, the word $y^{<t-1>}$ decoded at time t-1, and the context vector $c^{<t>}$ passed in from the encoder at time t. The decoder hidden state $s^{<t>}$ at time t is obtained by:
$$s^{<t>} = g(y^{<t-1>}, s^{<t-1>}, c^{<t>})$$
The context vector $c^{<t>}$ passed in at time t is determined by the encoder hidden-layer variables $h^{<t'>}$ and the translation attention between each encoded word and the word being decoded:
$$c^{<t>} = \sum_{t'} \alpha^{<t,t'>} h^{<t'>}$$
where $\alpha^{<t,t'>}$ is the vector-type attention, representing the translation attention of the t'-th encoder word over the decoder words; it is obtained by:
$$\alpha^{<t,t'>} = \frac{\exp(e^{<t,t'>})}{\sum_{t''} \exp(e^{<t,t''>})}$$
where $e^{<t,t'>}$ is the scalar-type attention, representing the translation attention of the t'-th encoder word to the t-th decoder word; it is obtained by:
$$e^{<t,t'>} = v^{T} \tanh(W_s s^{<t-1>} + W_h h^{<t'>})$$
where $v^{T}$ and $W_{[s,h]}$ are weight parameters;
The above process is repeated until all words have been decoded; the result is the seed title.
As a preferred embodiment of the above technical solution, step S4 is specifically:
S41. Compute the pairwise similarity of all words in the decoding candidate vocabulary and build a word-similarity lookup dictionary;
S42. Tokenize the seed title and the article titles in the candidate article list library, and, using the similarities in the lookup dictionary, compute word by word the similarity between the seed title and each candidate article title;
S43. Accumulate the results of step S42 as the similarity between the seed title and that article;
S44. Sort the similarity results obtained in step S43 to form the literature recommendation list.
The present invention also provides a local citation recommendation system based on neural machine translation, applied to the above method, comprising:
Citation cleaning module, for processing the input citation context into the standard input corpus format required by the encoder-decoder framework;
Article expansion module, for dynamically expanding the candidate article list library on the basis of the existing article list library: web crawler technology is used to crawl the latest published articles from relevant literature retrieval platforms in a timely manner, so that the candidate article list library available for citation contexts is more complete and comprehensive;
Candidate word update module, for recalculating word frequencies after the candidate article list library is updated and dynamically updating the candidate word list used by the decoder when decoding the seed title;
Citation recommendation module, for computing the recommended article list given a citation context.
As a preferred embodiment of the above technical solution, the citation cleaning module is specifically used to:
Remove the invalid symbols in the citation context and replace words that do not appear in the vocabulary with <UNK>; pad contexts shorter than the set range with <PAD> and truncate contexts longer than the set range; lemmatize all words; and then convert all words into word vectors with the pre-trained word vector model.
As a preferred embodiment of the above technical solution, the article expansion module uses web crawler technology to crawl the latest published articles from relevant retrieval platforms, performs data cleaning operations such as citation extraction, lemmatization, and word frequency statistics on the raw data set to obtain a parallel corpus of citation contexts and cited article titles, builds the initial candidate article list library, and dynamically expands and maintains that library.
As a preferred embodiment of the above technical solution, in the candidate word update module, after the candidate article list library is updated, the titles in the latest full library are tokenized and word frequencies are recalculated; the candidate word list used by the decoder when decoding the seed title is then updated dynamically, so that the candidate article list and the decoder candidate vocabulary stay synchronized.
As a preferred embodiment of the above technical solution, in the citation recommendation module, local citation recommendation is combined with neural machine translation: local citation recommendation is expressed as a machine translation problem from a source language to a target language, and the module computes the recommended article list given a citation context.
The invention proposes a novel local citation recommendation method. Compared with traditional citation recommendation methods, it builds a parallel corpus of citation contexts and cited article titles and combines citation recommendation with neural machine translation, treating local citation recommendation as a machine translation problem from the citation context (source language) to the cited article title (target language). This gives stronger semantic consistency between citation contexts and article titles. Finally, the cosine similarity between the translated seed title and the articles in the candidate article list library is computed to obtain the list of recommended articles.
The present invention embeds the words in citation contexts and cited article titles into a low-dimensional semantic space, so that semantically similar words lie closer together in that space. It builds an encoder-decoder framework consisting of a bidirectional GRU encoder with an attention mechanism and a GRU decoder and computes influence weights (attention) between each encoded word and each decoded word, which greatly improves the accuracy of translating citation contexts into cited article titles. Further, the decoded title is compared by cosine similarity, one article at a time, with the article titles in the candidate article list library, and the articles that meet the requirements are selected as the recommended article list. This greatly reduces the coupling between citation-to-title translation and article recommendation, so that the two tasks can be carried out independently.
Brief description of the drawings
Fig. 1 is a schematic diagram of the steps of a local citation recommendation method based on neural machine translation;
Fig. 2 is a functional schematic diagram of a local citation recommendation system based on neural machine translation;
Fig. 3 is a logic diagram of step S2 of the local citation recommendation method based on neural machine translation;
Fig. 4 is a logic diagram of step S3 of the local citation recommendation method based on neural machine translation.
Detailed description of the embodiments
Fig. 1 and Fig. 2 show the local citation recommendation method and system based on neural machine translation proposed by the present invention.
Referring to Fig. 1, the local citation recommendation method based on neural machine translation proposed by the present invention comprises the following steps:
S1. Perform data cleaning operations on the raw data set, such as citation extraction, lemmatization, and word frequency statistics, to obtain a parallel corpus of citation contexts and cited article titles and to build the initial candidate article list library;
In this embodiment, all English citation contexts are extracted and invalid symbols are removed; citation contexts of 10 to 28 words are retained and lemmatized; word frequencies are counted, the top 10000 words are retained, and all other words are replaced with "<UNK>"; contexts shorter than 28 words are padded with "<PAD>"; and the cited article titles are extracted according to the citation contexts and cleaned in a similar way.
In actual operation, step S1 specifically comprises the following steps:
S11. Using a dictionary matching algorithm, extract from the raw data, according to the citation position within the citation context, the original article title data corresponding to that citation context; then use a context-title knowledge base join algorithm to build the correspondence between original citation contexts and cited article titles;
S12. Using a pre-built stop-symbol knowledge base, retain all English citation contexts and remove all invalid characters, including escape characters, punctuation marks, formula symbols, and special characters, forming a set of citation context representations in which words are the linking unit;
S13. Tokenize the citation contexts with the Jieba word segmentation library, count word frequencies, take the 10000 most frequent words, and build the encoding word list library;
S14. Traverse all citation contexts: replace words that are not in the encoding word list library with "<UNK>", truncate citation contexts longer than 28 words, and pad citation contexts shorter than 28 words with "<PAD>", producing citation contexts in the standard format;
S15. Apply operations similar to S12, S13, and S14 to the cited article titles, and generate the parallel corpus of citation contexts and cited article titles according to the correspondence extracted in S11;
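The vocabulary and padding rules of steps S13 and S14 can be illustrated with a minimal Python sketch. The function names `build_vocab` and `normalize` are hypothetical, whitespace splitting stands in for the tokenizer, and the small `top_k` and `max_len` values are chosen only to keep the example readable; the embodiment itself uses 10000 words and 28 tokens.

```python
from collections import Counter

PAD, UNK = "<PAD>", "<UNK>"

def build_vocab(token_lists, top_k=10000):
    """Keep the top_k most frequent words (step S13)."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    return {word for word, _ in counts.most_common(top_k)}

def normalize(tokens, vocab, max_len=28):
    """Replace out-of-vocabulary words, then truncate or pad (step S14)."""
    mapped = [tok if tok in vocab else UNK for tok in tokens]
    mapped = mapped[:max_len]
    return mapped + [PAD] * (max_len - len(mapped))

# Toy data: whitespace splitting is an assumption standing in for real tokenization.
contexts = [
    "neural machine translation improves citation recommendation".split(),
    "attention improves neural translation".split(),
]
vocab = build_vocab(contexts, top_k=3)
example = normalize(contexts[1], vocab, max_len=6)
print(example)
```

The same two functions would be applied to cited article titles in step S15, so that both sides of the parallel corpus share one format.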
S2. Using the skip-gram word vector model combined with negative sampling, embed the words occurring in citation contexts and cited article titles into a low-dimensional semantic space to obtain word vectors, so that semantically similar words lie closer together in the embedding space;
In actual operation, step S2 specifically comprises the following steps:
S21. Split each sentence into multiple (input word)-(output word) pairs according to the word window size;
S22. Convert every word into a 0-1 (one-hot) vector whose length equals the vocabulary size (10000 words);
S23. Build a neural network comprising an input layer (receiving the 10000-dimensional 0-1 vectors), a hidden layer of 100 neurons (the word vector dimension), and an output layer of 10000 neurons; the structure is shown in Fig. 3;
S24. Add negative sampling to the skip-gram model and back-propagate the error; the 10000 × 300 weight matrix at the word embedding matrix is the final word vector representation.
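Steps S21 and S22, together with the sampling part of S24, can be sketched as follows. This is only an illustration of how training examples for a skip-gram model are formed: the helper names are hypothetical, and negative words are drawn uniformly for simplicity, whereas practical implementations usually sample from a smoothed word frequency distribution.

```python
import random

def skipgram_pairs(tokens, window=2):
    """Return (input word, output word) pairs within the window (step S21)."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        pairs.extend((center, tokens[j]) for j in range(lo, hi) if j != i)
    return pairs

def one_hot(word, index):
    """0-1 vector whose length equals the vocabulary size (step S22)."""
    vec = [0] * len(index)
    vec[index[word]] = 1
    return vec

def negative_samples(positive, vocab, k, rng):
    """Draw k words other than the true output word (uniform sampling is an assumption)."""
    pool = [w for w in vocab if w != positive]
    return [rng.choice(pool) for _ in range(k)]

tokens = ["citation", "context", "cited", "title"]
index = {w: i for i, w in enumerate(tokens)}
pairs = skipgram_pairs(tokens, window=1)
negs = negative_samples("context", tokens, k=3, rng=random.Random(0))
print(pairs[:2], one_hot("cited", index))
```

During training, each positive pair is scored against its negative samples, and the error is back-propagated into the embedding matrix that becomes the word vectors.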
S3. Based on neural machine translation, build an encoder-decoder framework consisting of a bidirectional gated recurrent unit (GRU) encoder with an attention mechanism and a GRU decoder; convert the citation contexts in the parallel corpus into word vectors with the word vector model and use them as the model input, with the cited article titles as the output, to train the model;
In this embodiment, neural machine translation is combined with local citation recommendation: local citation recommendation is expressed as a machine translation problem from a source language (the citation context) to a target language (the cited article title). The encoder-decoder framework, consisting of a bidirectional GRU encoder with an attention mechanism and a GRU decoder, learns a semantic representation of the citation context and, on the basis of that semantic understanding, selects and decodes a seed title from the candidate vocabulary, forming a seed title construction model linked by semantic content.
In step S3, a citation context that cites multiple articles is handled as multiple parallel corpus entries.
In actual operation, step S3 specifically includes:
Build the encoder-decoder framework consisting of a bidirectional GRU encoder with an attention mechanism and a GRU decoder, as shown in Fig. 4:
The encoder consists of a bidirectional GRU network. At each time step t it receives the vector representation of the t-th word of the input sequence and obtains the hidden state h<t>. The attention mechanism combines the encoder hidden states with the decoder (output layer) state to obtain a translation weight for each input word, which yields the final context vector that is fed to the decoder to decode a word.
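The formulas of the encoder GRU unit are as follows (the candidate and state updates are written in the standard GRU form implied by the symbols defined below):
Update gate:
$$G_u = \mathrm{sigmoid}(W_u[h^{<t-1>}, x^{<t>}] + b_u)$$
Reset gate:
$$G_r = \mathrm{sigmoid}(W_r[h^{<t-1>}, x^{<t>}] + b_r)$$
Updated (candidate) hidden-layer variable:
$$\tilde{C}^{<t>} = \tanh(W_c[G_r \odot h^{<t-1>}, x^{<t>}] + b_c)$$
Hidden-layer variable that flows to the next time step:
$$C^{<t>} = G_u \odot \tilde{C}^{<t>} + (1 - G_u) \odot h^{<t-1>}$$
where $h^{<t>}$ is the hidden-layer variable at time t, $x^{<t>}$ is the input at time t, $b_u$, $b_r$, $b_c$ are biases, sigmoid and tanh are activation functions, and $W_{[u,r,c]}$ are weight parameters.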
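To make the gate computations concrete, the following pure-Python sketch runs a GRU cell forward and backward over a short sequence and concatenates the two hidden states, as a bidirectional encoder does. It assumes the standard GRU update, uses constant toy weights instead of trained parameters, and all function names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, v, b):
    """Return W v + b for a list-of-rows matrix W."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]

def gru_step(h_prev, x, params):
    """One GRU step: update gate, reset gate, candidate, new hidden state."""
    W_u, b_u, W_r, b_r, W_c, b_c = params
    concat = h_prev + x                                    # [h<t-1>, x<t>]
    g_u = [sigmoid(z) for z in affine(W_u, concat, b_u)]   # update gate
    g_r = [sigmoid(z) for z in affine(W_r, concat, b_r)]   # reset gate
    gated = [r * h for r, h in zip(g_r, h_prev)] + x       # [G_r * h<t-1>, x<t>]
    c_tilde = [math.tanh(z) for z in affine(W_c, gated, b_c)]
    return [u * c + (1.0 - u) * h for u, c, h in zip(g_u, c_tilde, h_prev)]

def run(xs, params, hidden):
    """Unroll the cell over a sequence, starting from a zero state."""
    h, states = [0.0] * hidden, []
    for x in xs:
        h = gru_step(h, x, params)
        states.append(h)
    return states

def bidirectional(xs, fwd, bwd, hidden):
    """Concatenate forward and backward hidden states per position."""
    forward = run(xs, fwd, hidden)
    backward = run(xs[::-1], bwd, hidden)[::-1]
    return [f + b for f, b in zip(forward, backward)]

def const_params(hidden, inputs, value):
    """Toy parameters: every weight equals `value`, every bias is zero."""
    W = [[value] * (hidden + inputs) for _ in range(hidden)]
    b = [0.0] * hidden
    return (W, b, W, b, W, b)

xs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]                    # two 3-dimensional word vectors
params = const_params(hidden=2, inputs=3, value=0.1)
states = bidirectional(xs, params, params, hidden=2)
print(len(states), len(states[0]))
```

Each encoder position ends up with a state of twice the hidden size, which is what the attention step consumes.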
The decoding process with the attention mechanism is as follows (formulas written in the standard additive-attention form matching the parameters defined below):
When the decoder decodes the t-th word, it needs the decoder hidden state $s^{<t>}$ at time t, the word $y^{<t-1>}$ decoded at time t-1, and the context vector $c^{<t>}$ passed in from the encoder at time t. The decoder hidden state $s^{<t>}$ at time t is obtained by:
$$s^{<t>} = g(y^{<t-1>}, s^{<t-1>}, c^{<t>})$$
The context vector $c^{<t>}$ passed in at time t is determined by the encoder hidden-layer variables $h^{<t'>}$ and the translation attention between each encoded word and the word being decoded:
$$c^{<t>} = \sum_{t'} \alpha^{<t,t'>} h^{<t'>}$$
where $\alpha^{<t,t'>}$ is the vector-type attention, representing the translation attention of the t'-th encoder word over the decoder words; it is obtained by:
$$\alpha^{<t,t'>} = \frac{\exp(e^{<t,t'>})}{\sum_{t''} \exp(e^{<t,t''>})}$$
where $e^{<t,t'>}$ is the scalar-type attention, representing the translation attention of the t'-th encoder word to the t-th decoder word; it is obtained by:
$$e^{<t,t'>} = v^{T} \tanh(W_s s^{<t-1>} + W_h h^{<t'>})$$
where $v^{T}$ and $W_{[s,h]}$ are weight parameters.
The above process is repeated until all words have been decoded; the result is the seed title.
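The attention computation can be sketched in a few lines of Python. The sketch assumes the standard additive attention form (a tanh scoring layer followed by a softmax over encoder positions); the function name `attention_context` and the toy weights are hypothetical.

```python
import math

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def attention_context(s_prev, enc_states, W_s, W_h, v):
    """Return (attention weights, context vector) for one decoding step."""
    proj_s = matvec(W_s, s_prev)
    scores = []
    for h in enc_states:
        hidden = [math.tanh(a + b) for a, b in zip(proj_s, matvec(W_h, h))]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))   # scalar score e
    peak = max(scores)                                             # stabilizes the softmax
    exps = [math.exp(e - peak) for e in scores]
    total = sum(exps)
    weights = [x / total for x in exps]                            # normalized attention
    size = len(enc_states[0])
    context = [sum(w * h[k] for w, h in zip(weights, enc_states)) for k in range(size)]
    return weights, context

identity = [[1.0, 0.0], [0.0, 1.0]]
enc_states = [[1.0, 0.0], [0.0, 1.0]]          # toy encoder hidden states
weights, context = attention_context([0.5, -0.5], enc_states, identity, identity, [1.0, 0.0])
print(weights, context)
```

The decoder would call this once per output word, feeding the returned context vector into its own GRU step together with the previously decoded word.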
S4. Compute the cosine similarity between the seed title output by the encoder-decoder framework and every article title in the candidate article list, one by one;
In this embodiment, the seed title is a sequence decoded from a set of candidate words and is used for similarity comparison against all candidate article titles.
In actual operation, step S4 specifically comprises the following steps:
S41. Compute the pairwise similarity of all words in the decoding candidate vocabulary and build a word-similarity lookup dictionary;
S42. Tokenize the seed title and the article titles in the candidate article list library, and, using the similarities in the lookup dictionary, compute word by word the similarity between the seed title and each candidate article title;
S43. Accumulate the results of step S42 as the similarity between the seed title and that article;
S44. Sort the similarity results obtained in step S43 to form the literature recommendation list;
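Steps S41 to S44 can be sketched as below. The text does not spell out how the word-by-word similarities are combined, so this sketch assumes that each seed word contributes its best match in the candidate title and that these contributions are summed; the function names and the two-dimensional toy word vectors are hypothetical.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def build_lookup(vectors):
    """Pairwise word similarities, precomputed once (step S41)."""
    return {(u, w): cosine(vectors[u], vectors[w]) for u in vectors for w in vectors}

def title_score(seed, title, lookup):
    """Sum, over seed words, of the best match in the title (steps S42 and S43; assumed rule)."""
    return sum(max(lookup.get((s, t), 0.0) for t in title) for s in seed)

def rank(seed, titles, lookup):
    """Sort candidate articles by descending similarity (step S44)."""
    return sorted(titles, key=lambda name: title_score(seed, titles[name], lookup), reverse=True)

vectors = {
    "neural": [1.0, 0.0],
    "translation": [0.0, 1.0],
    "network": [0.8, 0.6],
    "protein": [-1.0, 0.0],
}
lookup = build_lookup(vectors)
seed = ["neural", "translation"]
titles = {"A": ["neural", "network"], "B": ["protein"]}
ranked = rank(seed, titles, lookup)
print(ranked)
```

Precomputing the lookup table means ranking a new seed title only requires dictionary reads, which matters when the candidate library is large.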
S5. According to article publication time, remove the articles published after the article containing the citation context, and select the top 20 by similarity as the recommendation list.
In this embodiment, the top-20 recommendation list is a default value; the number of recommended articles can be adjusted for the specific scenario.
In actual operation, step S5 specifically comprises the following steps:
For articles or citation contexts without year information, the top 20 articles in the literature recommendation list obtained in step S4 are recommended directly. For articles and citation contexts with year information, the articles published after the article containing the citation context are first removed from the literature recommendation list obtained in step S4, and the top 20 articles in the remaining list are recommended, completing the local citation recommendation.
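The time filter of step S5 amounts to a short function. In this sketch the name `recommend` and the `(title, year)` tuple layout are hypothetical, and candidate articles without a year are kept, which is one reading of the rule above.

```python
def recommend(ranked, context_year=None, top_n=20):
    """ranked: list of (title, year or None), already sorted by similarity."""
    if context_year is None:
        kept = ranked                       # no year information: keep the ranking as is
    else:
        # Drop articles published after the article containing the citation context.
        kept = [(t, y) for t, y in ranked if y is None or y <= context_year]
    return [title for title, _ in kept[:top_n]]

ranked = [("a", 2019), ("b", 2015), ("c", None), ("d", 2017)]
print(recommend(ranked, context_year=2017, top_n=2))
print(recommend(ranked, top_n=2))
```

Because `top_n` is a parameter, the default of 20 can be adjusted per scenario as the embodiment allows.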
Referring to Fig. 2, the local citation recommendation system based on neural machine translation proposed by the present invention comprises:
Citation cleaning module, for processing the input citation context into the standard input corpus format required by the encoder-decoder framework;
In this embodiment, the citation cleaning module is specifically used to:
Remove the invalid symbols in the citation context and replace words that do not appear in the vocabulary with "<UNK>"; pad contexts shorter than 28 words with "<PAD>" and truncate contexts longer than 28 words; lemmatize all words; and then convert all words into word vectors with the pre-trained word vector model;
Article expansion module, for dynamically expanding the candidate article list library on the basis of the existing article list library: web crawler technology is used to crawl the latest published articles from relevant literature retrieval platforms in a timely manner, so that the candidate article list library available for citation contexts is more complete and comprehensive;
In this embodiment, the article expansion module is specifically used to:
Crawl the latest published articles from relevant retrieval platforms using web crawler technology, clean them as in step S1, and continuously merge them into the candidate article list library, dynamically expanding and maintaining that library;
Candidate word update module, for recalculating word frequencies after the candidate article list library is updated and dynamically updating the candidate word list used by the decoder when decoding the seed title;
In this embodiment, the candidate word update module is specifically used to:
After the candidate article list library is updated, tokenize the titles in the latest full library and recalculate word frequencies, then dynamically update the candidate word list used by the decoder when decoding the seed title, so that the candidate article list and the decoder candidate vocabulary stay synchronized;
Citation recommendation module, for computing, with the core algorithm described in S2, S3, and S4, the recommended article list given a citation context;
In this embodiment, the citation recommendation module is specifically used to:
Combine local citation recommendation with neural machine translation, expressing local citation recommendation as a machine translation problem from a source language (the citation context) to a target language (the cited article title), and compute, with the core algorithm described in S2, S3, and S4, the recommended article list given a citation context.
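The candidate word update module described above essentially recomputes word frequencies over the current library titles. A minimal sketch follows, in which the function name `refresh_candidates` is hypothetical and lowercased whitespace splitting stands in for the tokenizer:

```python
from collections import Counter

def refresh_candidates(titles, top_k=10000):
    """Recompute the decoder candidate word list from the current library titles."""
    counts = Counter(word for title in titles for word in title.lower().split())
    return [word for word, _ in counts.most_common(top_k)]

library = ["Neural machine translation", "Neural citation recommendation"]
before = refresh_candidates(library, top_k=1)

# Newly crawled titles are merged in, then the candidate list is rebuilt.
library += ["Citation context analysis", "Citation graphs"]
after = refresh_candidates(library, top_k=1)
print(before, after)
```

Rebuilding the list from the full library after each expansion keeps the decoder vocabulary aligned with the titles it may need to produce.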
The invention proposes a novel local citation recommendation method. Compared with traditional citation recommendation methods, it builds a parallel corpus of citation contexts and cited article titles and combines citation recommendation with neural machine translation, treating local citation recommendation as a machine translation problem from the citation context (source language) to the cited article title (target language). This gives stronger semantic consistency between citation contexts and article titles.
The present invention embeds the words in citation contexts and cited article titles into a low-dimensional semantic space, so that semantically similar words lie closer together in that space. It builds an encoder-decoder framework consisting of a bidirectional GRU encoder with an attention mechanism and a GRU decoder and computes influence weights (attention) between each encoded word and each decoded word, which greatly improves the accuracy of translating citation contexts into cited article titles. Further, the decoded title is compared by cosine similarity, one article at a time, with the article titles in the candidate article list library, and the top 20 articles are selected as the recommended article list. This greatly reduces the coupling between citation-to-title translation and article recommendation, so that the two tasks can be carried out independently.
The above is only a preferred specific embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any equivalent substitution or change made by a person skilled in the art, within the technical scope disclosed by the present invention and according to its technical solution and inventive concept, shall fall within the scope of protection of the present invention.

Claims (10)

1. A local citation recommendation method based on neural machine translation technology, characterized by comprising the following steps:
S1, performing data cleaning operations, including citation extraction, lemmatization and word frequency statistics, on the original data set, obtaining a parallel corpus of citation contexts and cited article titles, and building an initial candidate article list library;
S2, embedding the words appearing in the citation contexts and cited article titles into a low-dimensional semantic space by means of the skip-gram model of the word vector model combined with negative sampling, obtaining word vectors such that semantically similar words lie closer together in the embedding space;
S3, based on neural machine translation technology, building an encoder-decoder framework consisting of a bidirectional gated recurrent unit encoder with an attention mechanism and a gated recurrent unit decoder, converting the citation contexts in the parallel corpus into word vectors with the word vector model and using them as the model input, and using the cited article titles as the output to train the model;
S4, computing, one by one, the cosine similarity between the seed title output by the encoder-decoder framework and every article title in the candidate article list;
S5, according to article publication time, removing the articles published after the article containing the citation context, and selecting the articles whose similarity meets the requirement as the recommendation list.
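Steps S4 and S5 can be illustrated with a minimal sketch. It assumes that the seed title and every candidate title have already been turned into fixed-length vectors (for example by averaging word vectors, which the claim does not specify), and it uses the top-20 cut-off mentioned in the description as the default. The names `cosine`, `recommend` and the tuple layout of `candidates` are hypothetical.

```python
import math
from datetime import date


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(seed_vec, context_date, candidates, top_k=20):
    """Rank candidate articles against the decoded seed title.

    candidates: list of (title, publication_date, title_vector) tuples
    (an assumed layout). Articles published after the citing article
    are removed first (S5), the rest are scored by cosine similarity (S4).
    """
    scored = [
        (title, cosine(seed_vec, vec))
        for title, published, vec in candidates
        if published <= context_date
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    library = [
        ("A", date(2015, 1, 1), [1.0, 0.0]),
        ("B", date(2019, 1, 1), [1.0, 0.0]),  # published too late, dropped
        ("C", date(2016, 1, 1), [0.0, 1.0]),
    ]
    print(recommend([1.0, 0.0], date(2018, 1, 1), library, top_k=2))
```

Filtering by date before scoring keeps the similarity computation limited to articles that could actually have been cited.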
2. The local citation recommendation method based on neural machine translation technology according to claim 1, characterized in that step S1 specifically comprises:
extracting all English citation contexts and removing extraneous symbols, retaining the citation contexts whose word count is within the set range, and lemmatizing them; counting word frequencies, retaining the words ranked within the set top rank and replacing all other words with <UNK>; padding contexts that have fewer words than the set range with <PAD>; and extracting the cited article titles according to the citation contexts and applying similar cleaning operations to them.
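The cleaning in step S1 can be sketched as follows. This is only an illustration under stated assumptions: the contexts are already tokenized, lemmatization is left out because it needs an external tool, and the function name `prepare_contexts` and its parameters `min_len`, `max_len` and `vocab_size` are hypothetical stand-ins for the "set range" and "set rank" of the claim.

```python
from collections import Counter

PAD, UNK = "<PAD>", "<UNK>"


def prepare_contexts(contexts, min_len, max_len, vocab_size):
    """Filter, map and pad tokenized citation contexts.

    contexts: list of token lists. Contexts whose length falls outside
    [min_len, max_len] are dropped; only the vocab_size most frequent
    words are kept, all others become <UNK>; short contexts are padded
    with <PAD> up to max_len.
    """
    kept = [toks for toks in contexts if min_len <= len(toks) <= max_len]
    counts = Counter(tok for toks in kept for tok in toks)
    vocab = {word for word, _ in counts.most_common(vocab_size)}
    cleaned = []
    for toks in kept:
        mapped = [tok if tok in vocab else UNK for tok in toks]
        cleaned.append(mapped + [PAD] * (max_len - len(mapped)))
    return cleaned, vocab


if __name__ == "__main__":
    raw = [["a", "b", "a"], ["a", "c"], ["a"] * 10]
    print(prepare_contexts(raw, min_len=2, max_len=4, vocab_size=1))
```

The same routine could be applied to the cited article titles, since the claim calls for similar cleaning on both sides of the parallel corpus.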
3. The local citation recommendation method based on neural machine translation technology according to claim 1, characterized in that step S2 specifically comprises:
S21, splitting each sentence into multiple (input word, output word) pairs according to the word window size;
S22, converting every word into a 0-1 (one-hot) vector whose length equals the vocabulary size;
S23, building a neural network comprising an input layer, a hidden layer and an output layer;
S24, adding negative sampling to the skip-gram model and back-propagating the error, the weight matrix at the word embedding matrix being the word vector representation finally obtained.
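Steps S21, S22 and S24 can be sketched in a few lines. The code below is an assumption-laden illustration, not the trained model of the claim: `skipgram_pairs` builds the (input word, output word) pairs, `one_hot` builds the 0-1 vectors, and `sgns_loss` evaluates the usual skip-gram negative-sampling loss for one pair, given embedding vectors that would normally come from the network's weight matrices. All three names are hypothetical.

```python
import math


def skipgram_pairs(tokens, window):
    """S21: (input word, output word) pairs within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        pairs.extend((center, tokens[j]) for j in range(lo, hi) if j != i)
    return pairs


def one_hot(index, vocab_size):
    """S22: 0-1 vector of length vocab_size with a single 1 at `index`."""
    vec = [0] * vocab_size
    vec[index] = 1
    return vec


def _sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sgns_loss(center_vec, context_vec, negative_vecs):
    """S24: negative-sampling loss for one (center, context) pair.

    The true context word is pushed towards the center word, the
    sampled negative words are pushed away from it.
    """
    loss = -math.log(_sigmoid(_dot(center_vec, context_vec)))
    for neg in negative_vecs:
        loss -= math.log(_sigmoid(-_dot(center_vec, neg)))
    return loss


if __name__ == "__main__":
    print(skipgram_pairs(["deep", "citation", "model"], window=1))
    print(one_hot(1, 3))
```

In a full implementation the gradient of this loss would update the embedding rows of the center, context and negative words, and the input-side matrix would be kept as the word vectors.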
4. The local citation recommendation method based on neural machine translation technology according to claim 1, characterized in that step S3 specifically is:
building the encoder-decoder framework, consisting of a bidirectional gated recurrent unit encoder with an attention mechanism and a gated recurrent unit decoder, to learn a semantic representation of the citation context and, on the basis of understanding the semantics, selecting words from the candidate vocabulary to decode a seed title, thereby forming a seed title construction model tied to the semantic content;
the encoder-decoder framework of the bidirectional gated recurrent unit encoder with an attention mechanism and the gated recurrent unit decoder is specifically as follows:
the encoder consists of a bidirectional gated recurrent unit network; at each time step t it receives the word vector of the t-th word of the input sequence and obtains the hidden-layer state $h^{<t>}$; the attention mechanism acts on the hidden-layer states and the output layer to give a translation weight for each input word, from which the final context vector is obtained and fed to the decoder to decode words;
the formulas of the encoder GRU unit are expressed as follows:
$G_u = \mathrm{sigmoid}(W_u[h^{<t-1>}, x^{<t>}] + b_u)$
$G_r = \mathrm{sigmoid}(W_r[h^{<t-1>}, x^{<t>}] + b_r)$
where $G_u$ is the update gate, $G_r$ is the reset gate, $\tilde{C}^{<t>}$ is the updated (candidate) hidden-layer variable, $C^{<t>}$ is the hidden-layer variable passed on to the next time step, $h^{<t>}$ is the hidden-layer variable at time t, $x^{<t>}$ is the input at time t, $b_u$, $b_r$ and $b_c$ are biases, sigmoid and tanh are activation functions, and $W_u$, $W_r$ and $W_c$ are weight parameters.
The decoding process with the attention mechanism is as follows:
when the decoder decodes the t-th word, it needs the decoder hidden-layer state $s^{<t>}$ at time t, the word $y^{<t-1>}$ decoded at time t-1, and the context vector $c^{<t>}$ passed in from the encoder at time t, where the decoder hidden-layer state $s^{<t>}$ at time t is obtained by:
$s^{<t>} = g(y^{<t-1>}, s^{<t-1>}, c^{<t>})$
where the context vector $c^{<t>}$ passed in at time t is determined by the encoder hidden-layer variables $h^{<t>}$ and the translation attention between each encoded word and the word being decoded; the vector-type attention expresses the translation attention of a given encoder word to all decoder words and is obtained from the scalar-type attention, which expresses the translation attention of that encoder word to the t-th decoder word and is computed with the parameter weights $v^T$, $W_s$ and $W_h$;
the above process is repeated until all words have been decoded, which gives the seed title.
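The encoder step and the attention computation of step S3 can be sketched as below. The text only gives the two gate equations; the candidate state, the state update and the attention formulas in this sketch are the standard GRU and additive-attention forms, assumed here because they match the listed parameters ($W_c$, $b_c$, tanh, $v^T$, $W_s$, $W_h$), and they are not taken from the text. Concretely, the sketch assumes candidate = tanh(W_c[G_r * h_prev, x] + b_c), h = G_u * candidate + (1 - G_u) * h_prev, score = v^T tanh(W_s s_prev + W_h h), attention weights = softmax over the scores, and context = weighted sum of encoder states. The function names are hypothetical.

```python
import math


def matvec(matrix, vec):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(m * x for m, x in zip(row, vec)) for row in matrix]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gru_step(h_prev, x, W_u, b_u, W_r, b_r, W_c, b_c):
    """One GRU step; every W has shape (hidden, hidden + input)."""
    hx = h_prev + x  # list concatenation, i.e. [h<t-1>, x<t>]
    g_u = [sigmoid(a + b) for a, b in zip(matvec(W_u, hx), b_u)]  # update gate
    g_r = [sigmoid(a + b) for a, b in zip(matvec(W_r, hx), b_r)]  # reset gate
    gated = [r * h for r, h in zip(g_r, h_prev)] + x
    cand = [math.tanh(a + b) for a, b in zip(matvec(W_c, gated), b_c)]
    # Assumed standard update: interpolate between candidate and old state.
    return [u * c + (1.0 - u) * h for u, c, h in zip(g_u, cand, h_prev)]


def attention_context(s_prev, enc_states, W_s, W_h, v):
    """Additive attention: scalar scores, softmax weights, weighted sum."""
    ws = matvec(W_s, s_prev)
    scores = []
    for h in enc_states:
        wh = matvec(W_h, h)
        scores.append(sum(vi * math.tanh(a + b) for vi, a, b in zip(v, ws, wh)))
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(e - top) for e in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(enc_states[0])
    context = [sum(w * h[i] for w, h in zip(weights, enc_states)) for i in range(dim)]
    return weights, context
```

For a bidirectional encoder, each entry of `enc_states` would be the concatenation of the forward and backward hidden states for that word; the decoder would call `attention_context` once per decoded word and feed the returned context into its own GRU step.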
5. The local citation recommendation method based on neural machine translation technology according to claim 1, characterized in that step S4 specifically is:
S41, computing the pairwise similarity of all words in the decoding candidate vocabulary and building a word-similarity lookup dictionary;
S42, tokenizing the seed title and the article titles in the candidate article list library and, using the similarities in the lookup dictionary, computing word by word the similarity between the seed title and each candidate article title;
S43, accumulating the results of step S42 as the similarity between the seed title and that article;
S44, sorting the similarity results obtained in step S43 to form the literature recommendation list.
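Steps S41 to S44 can be sketched as below. The claim does not say how the word-by-word similarities are combined before accumulation; this sketch assumes that each seed-title word contributes its best match in the candidate title, and that tokenization is a simple lowercase split. The names `build_similarity_lookup`, `title_similarity` and `rank_titles` are hypothetical.

```python
import math
from itertools import combinations_with_replacement


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_similarity_lookup(word_vectors):
    """S41: precompute the similarity of every word pair in the vocabulary."""
    lookup = {}
    for a, b in combinations_with_replacement(sorted(word_vectors), 2):
        lookup[(a, b)] = cosine(word_vectors[a], word_vectors[b])
    return lookup


def word_similarity(lookup, a, b):
    """Look up a pair in either order; unknown words score 0.0."""
    key = (a, b) if a <= b else (b, a)
    return lookup.get(key, 0.0)


def title_similarity(lookup, seed_tokens, title_tokens):
    """S42 and S43: best match per seed word, accumulated over the seed title."""
    if not title_tokens:
        return 0.0
    return sum(
        max(word_similarity(lookup, s, t) for t in title_tokens)
        for s in seed_tokens
    )


def rank_titles(lookup, seed_tokens, titles):
    """S44: sort candidate titles by accumulated similarity, best first."""
    scored = [
        (title, title_similarity(lookup, seed_tokens, title.lower().split()))
        for title in titles
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Precomputing the lookup once means that ranking a new seed title needs only dictionary reads, which is one plausible reason for building the similarity dictionary before scoring.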
6. A local citation recommendation system based on neural machine translation technology, characterized in that it applies the method of any one of claims 1 to 5 and comprises:
a citation cleaning module, for processing the input citation context into the standard input corpus form required by the encoder-decoder framework;
an article expansion module, for dynamically expanding the candidate article list library on the basis of the existing article list library, using web crawler technology to crawl the newest published articles from relevant literature retrieval platforms in a timely manner, so that the candidate article list library for a citation context is more complete and comprehensive;
a candidate word update module, for recalculating word frequencies after the candidate article list library has been updated and dynamically updating the candidate word list used by the decoder when decoding the seed title;
a citation recommendation module, for computing the recommended article list for a given citation context.
7. The local citation recommendation system based on neural machine translation technology according to claim 6, characterized in that the citation cleaning module is specifically used for:
removing extraneous symbols from the citation context and replacing the words in the citation context that do not appear in the vocabulary with <UNK>; padding contexts shorter than the set range with <PAD> and truncating contexts longer than the set range; lemmatizing all words; and then converting all words into word vectors with the pre-trained word vector model.
8. The local citation recommendation system based on neural machine translation technology according to claim 6, characterized in that, in the article expansion module, the newest published articles of relevant retrieval platforms are crawled using web crawler technology; data cleaning operations such as citation extraction, lemmatization and word frequency statistics are performed on the original data set to obtain the parallel corpus of citation contexts and cited article titles and to build the initial candidate article list library; and the candidate article list library is dynamically expanded and maintained.
9. The local citation recommendation system based on neural machine translation technology according to claim 6, characterized in that, in the candidate word update module, after the candidate article list library has been updated, the titles in the library are tokenized anew and the global word frequencies are recalculated, and the candidate word list used by the decoder when decoding the seed title is then dynamically updated, so that the candidate article list and the decoder's candidate vocabulary remain synchronized.
10. The local citation recommendation system based on neural machine translation technology according to claim 6, characterized in that, in the citation recommendation module, local citation recommendation is combined with neural machine translation and expressed as a machine translation problem from a source language to a target language, and the recommended article list for a given citation context is computed by the citation recommendation module.
CN201810994562.0A 2018-08-27 2018-08-27 Local citation recommendation method and system based on neural machine translation technology Active CN109145190B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810994562.0A CN109145190B (en) 2018-08-27 2018-08-27 Local citation recommendation method and system based on neural machine translation technology

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810994562.0A CN109145190B (en) 2018-08-27 2018-08-27 Local citation recommendation method and system based on neural machine translation technology

Publications (2)

Publication Number Publication Date
CN109145190A CN109145190A (en) 2019-01-04
CN109145190B CN109145190B (en) 2021-07-30

Family

ID=64828908

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810994562.0A Active CN109145190B (en) 2018-08-27 2018-08-27 Local citation recommendation method and system based on neural machine translation technology

Country Status (1)

Country Link
CN (1) CN109145190B (en)

Also Published As

Publication number Publication date
CN109145190B (en) 2021-07-30

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant