CN102999486B - Phrase rule extraction method based on composition - Google Patents


Info

Publication number
CN102999486B
Authority
CN
China
Prior art keywords
phrase
rule
phrase rule
combination
minimum
Prior art date
2012-11-16
Legal status
Active
Application number
CN201210464597.6A
Other languages
Chinese (zh)
Other versions
CN102999486A (en)
Inventor
朱靖波
李强
肖桐
张�浩
Current Assignee
Shenyang Yayi Network Technology Co ltd
Original Assignee
SHENYANG YAYI NETWORK TECHNOLOGY Co Ltd
Priority date
2012-11-16
Filing date
2012-11-16
Publication date
2016-12-21
2012-11-16 Application filed by SHENYANG YAYI NETWORK TECHNOLOGY Co Ltd
2012-11-16 Priority to CN201210464597.6A
2013-03-27 Publication of CN102999486A
Application granted
2016-12-21 Publication of CN102999486B
Status: Active


Landscapes

  • Machine Translation (AREA)

Abstract

The present invention relates to a phrase rule extraction method based on composition, comprising the following steps: constructing "minimal phrase rules" from a bilingual corpus; constructing a composed phrase rule set by composition; generating the minimal phrase rule set from a given bilingual parallel corpus and storing it in a hash structure; constructing composed phrase rules and using the minimal phrase rule set to determine how many minimal phrase rules each composed phrase rule consists of; if a composed phrase rule consists of at most n minimal phrase rules from the minimal phrase rule set, storing it in a new hash structure; and outputting the phrase rules in the minimal phrase rule set and in the composed phrase rule set, which completes one pass of composition-based phrase rule extraction. The invention efficiently generates a high-quality phrase rule set that carries more contextual information; without any loss in translation performance, the phrase rule set extracted by the inventive method is 56.5% smaller than the one extracted by the baseline method.

Description

Phrase rule extraction method based on composition
Technical field
The present invention relates to phrase processing in phrase-based statistical machine translation systems, and in particular to a phrase rule extraction method based on composition.
Background art
Phrase-based statistical machine translation systems have proven highly competitive in the field of machine translation. A major reason for the success of the phrase-based approach is that it relies on a high-quality phrase rule set. In the phrase rule set, each source-language phrase is mapped to one or more different target-language phrases. In a phrase-based system, a phrase is simply a sequence of consecutive words and carries no linguistic meaning. Researchers in machine translation have proposed a number of effective phrase rule extraction methods, among which the heuristic method is the most widely used. Using the word alignment of each sentence pair in the bilingual corpus, this method extracts all phrase rules that are consistent with the word alignment. Because the extraction algorithm is simple, easy to implement and performs very well, it is widely used in current phrase-based statistical machine translation systems. However, the number of phrase rules it extracts grows quadratically with the number of words in the training sentences. To obtain a phrase rule set of manageable size, the common practice is to limit the length of the extracted source and target phrases. Most high-performing machine translation systems by default cap the number of words in extracted source and target phrases at 7 to 10; Moses, for example, limits both the source side and the target side of extracted phrases to 7 words. It has also been shown that most redundant rules can be removed from a phrase rule set without hurting the performance of the translation system.
The most common way to reduce the size of the phrase rule set is to filter the phrase rules extracted by the existing heuristic extraction method, i.e. the baseline phrase rule extraction method. The baseline method is widely used in high-performing phrase-based statistical machine translation systems such as Moses and NiuTrans. In the phrase model proposed by Koehn et al., every phrase rule must satisfy the consistency definition, which reads:
A phrase pair $(\bar{f}, \bar{e})$ is consistent with the word alignment $A$ if and only if every word in $\bar{f}$ is aligned in $A$ only to words within $\bar{e}$, every word in $\bar{e}$ is aligned in $A$ only to words within $\bar{f}$, and at least one word in $\bar{f}$ is aligned to a word in $\bar{e}$:

$$(\bar{f}, \bar{e}) \text{ consistent with } A \iff \big[\forall (f_j, e_i) \in A:\ f_j \in \bar{f} \Leftrightarrow e_i \in \bar{e}\big] \ \wedge\ \big[\exists (f_j, e_i) \in A:\ f_j \in \bar{f} \wedge e_i \in \bar{e}\big]$$

Here $\bar{f}$ denotes the source-language phrase and $\bar{e}$ the target-language phrase. Intuitively: given a source phrase and a target phrase, at least one word in either phrase is aligned to a word in the other phrase, and no word in either phrase is aligned to a word outside the other phrase. Under the model of Koehn et al., every phrase rule must satisfy this definition. Phrase rules consistent with the word alignment can therefore be extracted directly from a parallel corpus: for each sentence pair, loop over all source-side and target-side phrases and output the phrase pairs that are consistent with the word alignment. When a phrase rule set is built this way, a maximum number of words per extracted phrase must be set during extraction; otherwise the size of the resulting phrase rule set cannot be controlled. The Baseline list on the right of Fig. 2 shows the phrase rules that the baseline method extracts from an example sentence pair with word alignment; all of them are consistent with the word alignment.
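The consistency check and the heuristic baseline extraction described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the Moses or NiuTrans code: phrase pairs are inclusive index spans, the alignment is a set of hypothetical (source index, target index) tuples, and `max_len` plays the role of the phrase-length limit (7 in the Moses default setting). Following the usual practice in Koehn et al.'s extraction, target spans are also extended over unaligned boundary words.

```python
from typing import List, Set, Tuple

Link = Tuple[int, int]            # (source index, target index)
Span = Tuple[int, int, int, int]  # (src_start, src_end, tgt_start, tgt_end), inclusive


def is_consistent(s1: int, s2: int, t1: int, t2: int, align: Set[Link]) -> bool:
    """True iff no link leaves the span pair and at least one link lies inside it."""
    has_link = False
    for i, j in align:
        in_src, in_tgt = s1 <= i <= s2, t1 <= j <= t2
        if in_src != in_tgt:
            return False  # a link connects a word inside the pair to a word outside it
        has_link = has_link or in_src
    return has_link


def extract_baseline(src_len: int, tgt_len: int, align: Set[Link], max_len: int = 7) -> List[Span]:
    """Baseline heuristic extraction: every consistent pair with at most max_len words per side."""
    aligned_tgt = {j for _, j in align}
    pairs: List[Span] = []
    for s1 in range(src_len):
        for s2 in range(s1, min(s1 + max_len, src_len)):
            tgt = [j for i, j in align if s1 <= i <= s2]
            if not tgt:
                continue  # an unaligned source span cannot form a phrase pair
            t1, t2 = min(tgt), max(tgt)
            if not is_consistent(s1, s2, t1, t2, align):
                continue
            # also emit variants whose target side absorbs unaligned boundary words
            lo = t1
            while lo >= 0 and (lo == t1 or lo not in aligned_tgt):
                hi = t2
                while hi < tgt_len and (hi == t2 or hi not in aligned_tgt):
                    if hi - lo + 1 <= max_len:
                        pairs.append((s1, s2, lo, hi))
                    hi += 1
                lo -= 1
    return pairs
```

For a made-up pair 辽宁 进出口 / liaoning 's import and export with links 0-0, 1-2 and 1-4, the sketch returns five spans; for example (0, 0, 0, 1) corresponds to the rule (辽宁, liaoning 's), and with `max_len=1` only (辽宁, liaoning) remains.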
However, the baseline phrase rule extraction method has an inherent problem: during extraction, the phrase length limit has to be tuned by hand to obtain the best phrase rule set. The extracted phrase table is very large, occupies a lot of disk space, and contains a large amount of noisy data.
Summary of the invention
To address the shortcomings of the heuristic rule extraction method in the prior art, namely a very large extracted phrase table that occupies a lot of disk space and contains much noisy data, the technical problem to be solved by the present invention is to provide a phrase rule extraction method based on composition that generates a compact phrase rule set containing more contextual information.
To solve the above technical problem, the present invention adopts the following technical solution:
The phrase rule extraction method based on composition of the present invention comprises the following steps: constructing "minimal phrase rules" from a bilingual corpus;
constructing, by composing minimal phrase rules, a phrase rule set that contains more contextual information, i.e. the "composed phrase rule set"; for composition-based phrase rules, generating the minimal phrase rule set from a given bilingual parallel corpus with word alignment, and storing it in a hash structure;
setting the value of the composition count n, constructing composed phrase rules, and using the minimal phrase rule set to determine how many minimal phrase rules each composed phrase rule consists of;
if a composed phrase rule consists of at most n minimal phrase rules from the minimal phrase rule set, storing it in a new hash structure;
outputting the phrase rules in the minimal phrase rule set and in the new composed phrase rule set, which completes one pass of composition-based phrase rule extraction.
A minimal phrase rule is a rule that is consistent with the word alignment and cannot be further decomposed into two or more rules.
A composed phrase rule is a phrase rule that is consistent with the word alignment and is formed by merging n or fewer minimal phrase rules from the same training sentence pair.
If a composed phrase rule consists of more than n minimal phrase rules from the minimal phrase rule set, it is not processed, and this pass of composition-based phrase rule extraction ends.
The size of the composed phrase rule set is adjusted through the value of the composition count n: the larger n is, the larger the resulting composed phrase rule set.
The present invention has the following beneficial effects and advantages:
1. The present invention efficiently generates a high-quality, compact phrase rule set that also contains more contextual information. Without any loss in translation performance, the phrase rule set extracted by the inventive method is 56.5% smaller than the one extracted by the baseline method.
2. Analysis of the experimental results shows that on some data sets the composition-based phrase extraction method improves the BLEU score; in addition, extensive experiments have verified the effectiveness of the composition-based phrase rule extraction method.
Brief description of the drawings
Fig. 1 is a flow chart of the inventive method;
Fig. 2 shows a word-aligned sentence pair (left) and the phrase rules extracted from it (right);
Fig. 3 illustrates how phrase tables of different sizes affect the BLEU score in the inventive method;
Fig. 4 shows how often composed rules are used in the 30-best translation results when the present invention is applied.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the drawings.
The phrase rule extraction method based on composition of the present invention comprises the following steps:
constructing a "minimal phrase rule set" from a bilingual corpus;
constructing, by composing rules from the minimal phrase rule set, a high-quality phrase rule set that contains more contextual information, forming the "composed phrase rule set" n-composed;
for composition-based phrase rules, generating the minimal phrase rule set minimal from a given bilingual parallel corpus with word alignment, and storing it in a hash structure named minimal;
setting the value of the composition count n and constructing the composed phrase rule set n-composed: every candidate phrase rule is checked against the minimal phrase rule set minimal, i.e. it is determined how many minimal phrase rules the composed phrase rule consists of;
if the composed phrase rule consists of at most n minimal phrase rules from the minimal phrase rule set minimal, storing it in a new hash structure named composed;
outputting the phrase rules in minimal and composed, which completes one pass of composition-based phrase rule extraction.
If the composed phrase rule consists of more than n minimal phrase rules from the minimal phrase rule set minimal, it is not processed.
To obtain a phrase rule set of reasonable and controllable size, the present invention proposes the composition-based phrase rule extraction method.
As shown in Fig. 1, before the inventive method is carried out, bilingual parallel data and its word alignment are prepared, and the composition count n is set in advance;
read one line of data, comprising the source sentence, the target sentence and the word alignment;
construct the minimal phrase rule set and store it in hash data structure 1;
construct composed rules and check whether each composed rule meets the requirement on the composition count n; if it does, i.e. the composed phrase rule consists of at most n minimal phrase rules from the minimal phrase rule set minimal, store it in hash structure 2;
determine whether there are other possible composed rules; if there are none, output and save the contents of hash structures 1 and 2, which completes one pass of composition-based phrase rule extraction;
determine whether there is still unprocessed data; if there is none, the whole control process ends.
If there is still unprocessed data, return to the step of reading one line of data (source sentence, target sentence and word alignment).
If there are other possible composed rules, return to the step of constructing composed rules and continue checking whether they meet the requirement on the composition count n.
If a composed rule does not meet the requirement on the composition count n, i.e. the composed phrase rule consists of more than n minimal phrase rules from the minimal phrase rule set minimal, go to the step of determining whether there are other possible composed rules.
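The Fig. 1 control flow can be sketched in Python as follows. This is a minimal sketch under assumptions that go beyond the text: each corpus line uses a hypothetical Moses-style format `source ||| target ||| i-j ...`; the two hash structures are `Counter`s that accumulate rule counts over the whole corpus instead of being flushed per sentence; candidate pairs are enumerated without a length limit, since n controls the size; and "cannot be decomposed" is operationalized by growing each alignment link into the smallest consistent span pair and merging overlapping ones, taking the number of resulting blocks as the number of minimal rules in a pair. The helper names are illustrative, not from the patent.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

Link = Tuple[int, int]            # (source index, target index)
Span = Tuple[int, int, int, int]  # (src_start, src_end, tgt_start, tgt_end), inclusive


def consistent_pairs(src_len: int, tgt_len: int, align: Set[Link]) -> List[Span]:
    """All phrase pairs consistent with the alignment, with no phrase-length limit."""
    aligned_tgt = {j for _, j in align}
    pairs = []
    for s1 in range(src_len):
        for s2 in range(s1, src_len):
            tgt = [j for i, j in align if s1 <= i <= s2]
            if not tgt:
                continue
            t1, t2 = min(tgt), max(tgt)
            # reject if a target word in [t1, t2] is linked to a source word outside [s1, s2]
            if any(t1 <= j <= t2 and not s1 <= i <= s2 for i, j in align):
                continue
            lo = t1  # absorb unaligned target words at either edge
            while lo >= 0 and (lo == t1 or lo not in aligned_tgt):
                hi = t2
                while hi < tgt_len and (hi == t2 or hi not in aligned_tgt):
                    pairs.append((s1, s2, lo, hi))
                    hi += 1
                lo -= 1
    return pairs


def closure(span: Span, align: Set[Link]) -> Span:
    """Grow a span pair until no alignment link crosses its border."""
    s1, s2, t1, t2 = span
    while True:
        before = (s1, s2, t1, t2)
        for i, j in align:
            if s1 <= i <= s2 or t1 <= j <= t2:
                s1, s2, t1, t2 = min(s1, i), max(s2, i), min(t1, j), max(t2, j)
        if (s1, s2, t1, t2) == before:
            return before


def count_minimal(pair: Span, align: Set[Link]) -> int:
    """How many minimal rules a consistent pair decomposes into (1 means it is minimal)."""
    blocks = [closure((i, i, j, j), align) for i, j in align if pair[0] <= i <= pair[1]]
    merged = True
    while merged:  # blocks overlapping on either side belong to the same minimal rule
        merged = False
        for a in range(len(blocks)):
            for b in range(a + 1, len(blocks)):
                x, y = blocks[a], blocks[b]
                if (x[0] <= y[1] and y[0] <= x[1]) or (x[2] <= y[3] and y[2] <= x[3]):
                    blocks[a] = closure((min(x[0], y[0]), max(x[1], y[1]),
                                         min(x[2], y[2]), max(x[3], y[3])), align)
                    del blocks[b]
                    merged = True
                    break
            if merged:
                break
    return len(blocks)


def extract_composed(corpus: Iterable[str], n: int = 2) -> Tuple[Counter, Counter]:
    """Hash structure 1 = minimal rules; hash structure 2 = rules built from 2..n minimal rules."""
    minimal, composed = Counter(), Counter()
    for line in corpus:  # hypothetical format: "source ||| target ||| 0-0 1-2 ..."
        src, tgt, links = (field.split() for field in line.split("|||"))
        align = {(int(i), int(j)) for i, j in (link.split("-") for link in links)}
        for pair in consistent_pairs(len(src), len(tgt), align):
            rule = (" ".join(src[pair[0]:pair[1] + 1]), " ".join(tgt[pair[2]:pair[3] + 1]))
            k = count_minimal(pair, align)
            if k == 1:
                minimal[rule] += 1
            elif k <= n:
                composed[rule] += 1
            # rules made of more than n minimal rules are not processed
    return minimal, composed
```

With the made-up line `辽宁 进出口 ||| liaoning 's import and export ||| 0-0 1-2 1-4`, `extract_composed(corpus, n=2)` places (辽宁, liaoning), (辽宁, liaoning 's), (进出口, import and export) and (进出口, 's import and export) in hash structure 1 and (辽宁 进出口, liaoning 's import and export) in hash structure 2; with n = 1, hash structure 2 stays empty.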
As shown in Fig. 2, the basic idea of this rule extraction algorithm is to first construct "minimal phrase rules" minimal from a bilingual corpus (a large-scale collection of parallel sentence pairs), where a minimal rule is the most basic rule with the smallest granularity (a specific definition of a phrase rule), and then to construct, by composing minimal phrase rules, a high-quality phrase rule set that contains more contextual information, namely the composed phrase rule set n-composed. In the present invention, an n-composed phrase rule set means that each of its rules consists of 1 to n minimal phrase rules; hence the (n-1)-composed phrase rule set is contained in, i.e. is a subset of, the n-composed rule set. In the inventive method, the size of the rule set is adjusted through the value of n in the composed rules: the larger n is, the larger the resulting rule set. This differs from rule extraction algorithms that limit the maximum number of words in the source and target phrases.
In the composition-based phrase rule extraction method proposed by the present invention, the first question is which rules count as minimal phrase rules.
A minimal phrase rule is a rule that is consistent with the word alignment and cannot be further decomposed into two or more rules. Minimal rules are the smallest units of translation and contain the essential information needed for translation.
The minimal rule set constitutes a very compact translation model. The Minimal list on the right of Fig. 2 shows the minimal phrase rules that the proposed method extracts from the example word-aligned sentence pair. Among the phrase rules shown in Fig. 2, the first five satisfy the present invention's definition of a minimal rule. For example, (辽宁, liaoning), where 辽宁 means "Liaoning", cannot be decomposed into two or more phrase rules, so it is a minimal phrase rule.
Minimal rules are not restricted to phrase rules whose source and target phrases each contain a single word. When the word alignment is one-to-many or many-to-one, the extracted phrase rules that are consistent with the word alignment can also satisfy the definition of a minimal rule. For example, in the rule (进出口, import and export), the source word 进出口 ("import and export") is aligned to the target words "import" and "export"; the rule is consistent with the word alignment and is therefore a valid phrase rule, and it also satisfies the definition of a minimal phrase rule, so it is added to the minimal phrase rule set when that set is constructed. Furthermore, if a word adjacent to the source or target side of a minimal phrase rule is unaligned, the minimal rule can be extended to include that unaligned word, and the resulting phrase rule still satisfies the definition of a minimal phrase rule. For example, in the rule (辽宁, liaoning 's), the target word 's lies at the edge of the target phrase and is unaligned in the word alignment; the rule still consists of only one minimal phrase rule, (辽宁, liaoning), so it is itself a minimal phrase rule.
The definition of a minimal phrase rule matches intuition: in translation one would like to use rules that are as short and small as possible while still achieving high translation quality. However, precisely because minimal phrase rules contain only the most basic words used in translation, the resulting minimal phrase rule set loses a great deal of contextual information, and contextual information is one of the key factors behind the strong performance of phrase-based statistical machine translation systems. In the extreme case, when the source and target sides of every extracted minimal phrase rule contain only one word, the translation system degenerates into a word-based system. To improve the quality of the phrase rules and let them carry more contextual information, the present invention obtains phrase rules that contain more words and more context by composing minimal phrase rules.
If a phrase rule is consistent with the word alignment and is formed by composing n or fewer minimal phrase rules from the same training sentence pair, it is called an n-composed phrase rule, i.e. a composed phrase rule.
It follows that the (n-1)-composed phrase rule set is contained in the n-composed phrase rule set. The 2-Composed list on the right of Fig. 2 shows the composed phrase rules, each formed from two or fewer minimal phrase rules, that the proposed composition-based method extracts from the word-aligned sentence pair in Fig. 2. For example, (辽宁 进出口, liaoning 's import and export) is composed of the minimal rules (辽宁, liaoning 's) and (进出口, import and export), so it is a 2-composed phrase rule. For generality, a minimal phrase rule is defined as a 1-composed phrase rule.
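Composing two minimal rules can be sketched as merging their span pairs. The helper below is hypothetical, not code from the patent; it assumes rules are inclusive index spans over one sentence pair and that a reordered (crossing) composition is allowed whenever the merged pair is consistent with the word alignment, which the text does not state explicitly.

```python
from typing import Optional, Set, Tuple

Link = Tuple[int, int]            # (source index, target index)
Span = Tuple[int, int, int, int]  # (src_start, src_end, tgt_start, tgt_end), inclusive


def _overlap(lo1: int, hi1: int, lo2: int, hi2: int) -> bool:
    return lo1 <= hi2 and lo2 <= hi1


def compose(a: Span, b: Span, align: Set[Link]) -> Optional[Span]:
    """Merge two disjoint rules of one sentence pair into the smallest span pair covering both.

    Returns None when the rules share words, when the merged pair is not consistent
    with the alignment, or when it would also swallow aligned words of a third rule.
    """
    if _overlap(a[0], a[1], b[0], b[1]) or _overlap(a[2], a[3], b[2], b[3]):
        return None
    s1, s2 = min(a[0], b[0]), max(a[1], b[1])
    t1, t2 = min(a[2], b[2]), max(a[3], b[3])
    for i, j in align:
        in_src, in_tgt = s1 <= i <= s2, t1 <= j <= t2
        if in_src != in_tgt:
            return None  # a link crosses the border: not consistent
        if in_src and not (a[0] <= i <= a[1] or b[0] <= i <= b[1]):
            return None  # an aligned word in the gap belongs to a third rule
    return (s1, s2, t1, t2)
```

With the made-up alignment 0-0, 1-2, 1-4 for 辽宁 进出口 / liaoning 's import and export, composing (0, 0, 0, 1) and (1, 1, 2, 4) yields (0, 1, 0, 4), i.e. (辽宁 进出口, liaoning 's import and export). Repeatedly composing an (n-1)-composed rule with one more minimal rule yields n-composed rules, which mirrors the nesting of the rule sets. Pairwise merging cannot reach every composed rule, however: with an "inside-out" alignment such as 0-1, 1-3, 2-0, 3-2, no three of the four minimal rules form a consistent pair, so a complete extractor would need to count the minimal rules inside each consistent pair directly.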
Clearly, if the number of minimal phrase rules contained in a composed phrase rule is not limited, the proposed method can extract phrase rules of arbitrary length. In most cases, however, setting the limit on the number of minimal phrase rules per composed rule too high does not benefit the quality of the resulting phrase rule set.
With a simple modification of the baseline phrase rule extraction algorithm, the proposed composition-based phrase rule extraction method is easy to implement: given a bilingual parallel corpus with word alignment, it suffices to set the parameter n in n-composed appropriately.
In this embodiment, the proposed composition-based phrase rule extraction method is applied to the phrase-based translation system of the NiuTrans open-source toolkit, and its effect on translation performance is evaluated on the NIST (National Institute of Standards and Technology) Chinese-English translation task by comparison with the baseline phrase extraction method.
The baseline translation system is phrase-based and uses all the standard features of the open-source system Moses. In addition, two reordering models are integrated into the system: a maximum-entropy-based lexicalized reordering model and a hierarchical phrase reordering model. The baseline decoder uses beam pruning and cube pruning to speed up decoding, and feature weights are optimized with minimum error rate training. The default maximum reordering distance is 8, and the source and target sides of each phrase rule are limited to 7 words (the same as the Moses default). In the phrase rule set, only the top 30 translation candidates of each source phrase, ranked by phrase translation probability, are retained.
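The top-30 pruning of each source phrase's translation candidates can be sketched as below. The text does not say how the phrase translation probability is computed; this sketch assumes the common relative-frequency estimate count(f, e) / count(f), normalized over all candidates before pruning, and ranks by that single score (Moses' own table limit may instead rank by a weighted combination of features).

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def prune_table(rule_counts: Counter, k: int = 30) -> Dict[str, List[Tuple[str, float]]]:
    """Keep the k most probable translations of each source phrase.

    p(e|f) is estimated here by relative frequency, count(f, e) / count(f).
    """
    by_source: Dict[str, Counter] = defaultdict(Counter)
    for (f, e), c in rule_counts.items():
        by_source[f][e] += c
    table: Dict[str, List[Tuple[str, float]]] = {}
    for f, targets in by_source.items():
        total = sum(targets.values())  # normalize before pruning
        table[f] = [(e, c / total) for e, c in targets.most_common(k)]
    return table
```

For example, with made-up counts 6, 3 and 1 for three translations of 辽宁 and k = 2, only the two most frequent translations survive, with probabilities 0.6 and 0.3.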
The training data used in this embodiment comprises 1.9 million Chinese-English bilingual sentence pairs, drawn from part of the large-scale bilingual corpus provided for the NIST MT 2008 evaluation. First, bidirectional word alignment was run on the training data with GIZA++, and the two alignment results were then symmetrized with the "grow-diag-final-and" heuristic. In addition, a 5-gram language model was trained on the Xinhua portion of the English Gigaword corpus together with the target side of the bilingual data. For development and testing, the NIST MT 2003 test set (919 sentences) was used as the development set for weight tuning, and the NIST MT 2004 and NIST MT 2005 test sets (1,788 and 1,082 sentences, respectively) were used as test sets to evaluate translation quality. Translation quality was measured with the case-insensitive IBM version of the BLEU metric.
Table 1. Comparison of the experimental results of the baseline system and the composition-based method on the development set (NIST MT 2003) and the test sets (NIST MT 2004 and NIST MT 2005); each result is the average of 5 experimental runs.
Table 1 shows the experimental results, measured in BLEU, of the baseline extraction method and of the proposed composed-rule extraction method under different settings of n. The "Minimal rules" row of Table 1 shows that when only minimal rules are extracted, the inventive method yields a very small phrase rule set; however, because the minimal rule set loses a great deal of contextual information during extraction, its average translation performance on the development and test sets is 1.37 BLEU points lower than that of the baseline system. When composed rules are extracted, the resulting phrase rule set contains more contextual information, and the BLEU score keeps increasing as the number of rules grows. For example, comparing the "Baseline method" and "2-Composed" rows of Table 1 shows that the 2-composed phrase rule set yields translation performance comparable to that of the baseline method, while the phrase rule set obtained with 2-Composed is 44.3% smaller than that of the baseline method. Further experiments show that when 3-Composed and 4-Composed phrase rules are extracted, the average BLEU score on the development and test sets improves over both the baseline system and the 2-Composed method. Considering translation performance and phrase rule set size together, the translation performance of the 2-Composed phrase rules is comparable to the best performance in the Table 1 experiments while the phrase rule set is considerably smaller; that is, the 2-Composed phrase rules are essentially optimal. The results in Table 1 show that the proposed method efficiently generates a high-quality, compact phrase rule set that also contains more contextual information.
In the baseline phrase rule extraction method, the size of the phrase rule set can be adjusted effectively by setting the maximum number of words in the source and target phrases to different values. Fig. 3 compares the BLEU scores of the baseline method and the composition-based method under different settings. The horizontal axis is the size of the phrase table (in millions), and the vertical axis is the BLEU score. The solid line in Fig. 3 shows the baseline rule extraction algorithm with the phrase length set to different values; each solid square on that line is a specific experimental setting, e.g. "length=3" means that in the baseline system the maximum length of both the source and target phrases of a phrase rule is set to 3, and likewise for the other points. The dashed line in Fig. 3 shows the composition-based phrase extraction method with n set to different values. Fig. 3 shows that with the proposed n-composed phrase rule extraction method, translation performance comparable to that of the baseline extraction method is obtained when n ≥ 2; it also shows that the proposed method reaches a balance between rule set size and translation performance more quickly. The figure further shows that when only the minimal rule set is used, translation performance is markedly lower than with the n-composed method for n ≥ 2, which indirectly confirms the effectiveness of the proposed composition-based phrase extraction method and shows that phrase rules containing more contextual information have a large influence on the performance of the translation system.
The present invention also counted how often the decoder uses minimal phrase rules and composed rules, on the 30-best translation results for the development and test sets. Fig. 4 shows these statistics for the development and test sets, where n-composed* denotes composed rules formed from exactly n minimal rules. Fig. 4 shows that when translating with phrase rules, the decoder in most cases prefers shorter rules (e.g. minimal and 2-composed*), whereas composed rules made up of more minimal phrase rules (e.g. 4-composed*) are rarely used. The results in Fig. 4 also explain why the 2-Composed rules achieve relatively high performance in Table 1.
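The usage statistics behind Fig. 4 amount to a simple tally. The sketch below assumes, hypothetically, that each phrase-table entry stores how many minimal rules it consists of and that the decoder reports, for each of the 30-best derivations, the rules it applied; neither mechanism is described in the text.

```python
from collections import Counter
from typing import Dict, Iterable, List


def usage_ratios(derivations: Iterable[List[int]]) -> Dict[str, float]:
    """Share of rule applications per category over a set of n-best derivations.

    Each derivation lists, for every rule the decoder applied, how many minimal
    rules that rule consists of (1 = minimal, k = k-composed*).
    """
    counts: Counter = Counter()
    for derivation in derivations:
        for k in derivation:
            counts["minimal" if k == 1 else f"{k}-composed*"] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: c / total for name, c in sorted(counts.items())}
```

For two derivations [1, 1, 2] and [1, 2, 4], half of the rule applications are minimal rules, a third are 2-composed*, and a sixth are 4-composed*.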
The proposed phrase rule extraction method yields a high-quality, compact phrase rule set for phrase-based statistical machine translation systems. Compared with the widely used, high-performing heuristic phrase extraction method, the phrase rule set extracted by the proposed method is 56.5% smaller than the one extracted by the baseline method, without any loss in translation performance. Analysis of the experimental results shows that on some data sets the composition-based phrase extraction method improves the BLEU score, and extensive experiments have verified the effectiveness of the composition-based phrase rule extraction method.
Verification with the phrase-based statistical machine translation system of the NiuTrans open-source toolkit shows that, compared with the rule extraction algorithm under the Moses default settings, the proposed composition-based rule extraction method obtains a more compact phrase rule set without any loss in translation performance. When 2-composed phrase rules are extracted, the quality of the translation rules obtained by the proposed method is comparable to that under the Moses default settings, while the size of the phrase rule set is 56.5% of the rule set under the Moses default settings. The experimental results also show that as the number of composed minimal phrase rules increases, translation performance does not improve markedly over that of the 2-composed phrase rules. Considering both translation performance and phrase rule set size, the 2-composed phrase rules are essentially optimal.

Claims (3)

1. a phrase rule abstracting method based on combination, it is characterised in that comprise the following steps:
One " minimum phrase rule " is constructed in bilingual corpora;
Construct a phrase rule collection containing more contextual informations by the minimum phrase rule of combination, form " combination Phrase rule collection ";Phrase rule collection based on combination, generates from the given bilingual parallel corpora containing word alignment information Little phrase rule collection, and leave in hash structure;
The value of combination frequency n, the phrase rule of tectonic association are set, are judged that by minimum phrase rule collection the phrase of this combination is advised Then it is made up of several minimum phrase rules;
If phrase rule is made up of the minimum phrase rule concentrated less than or equal to n bar minimum phrase rule, put it into one In individual new hash structure;
The phrase rule that the phrase rule of the minimum phrase rule collection of output and all combinations is concentrated, once phrase rule based on combination Then extraction process terminates;
The phrase rule of described combination is: a phrase rule keeps consistent with word alignment information, and this phrase rule is by same simultaneously Individual or less than n the minimum phrase rule merging of the n of one training sentence centering forms, and such phrase rule is that the phrase of combination is advised Then.
2. the phrase rule abstracting method based on combination as described in claim 1, it is characterised in that: described minimum phrase rule For: in the case of consistent with the holding of word alignment information, it is impossible to be broken down into the rule of two or more again.
3. The combination-based phrase rule extraction method according to claim 1, characterized in that the size of the composed phrase rule set is adjusted through the value of the combination count n: the larger n is, the larger the resulting composed phrase rule set.
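The extraction steps of claims 1 and 2 can be sketched for a single sentence pair as follows. This is a minimal illustration, not the patented implementation: it assumes the standard phrase-pair consistency criterion (a phrase pair must contain at least one alignment link and no link may leave it), represents the word alignment as a set of (source index, target index) pairs, and uses Python `Counter` objects to stand in for the hash structures. The function names, the `max_len` limit, and the choice of the tightest target span (unaligned target words at the edges are not added) are assumptions made for this sketch. The number of minimal rules in a phrase pair is taken as the largest number of consistent boxes that tile it exactly, so a count of 1 means the pair cannot be decomposed (claim 2).

```python
from collections import Counter


def consistent(links, i1, i2, j1, j2):
    """True if box [i1,i2] x [j1,j2] holds at least one link and no link crosses its border."""
    has_link = False
    for i, j in links:
        in_src, in_tgt = i1 <= i <= i2, j1 <= j <= j2
        if in_src != in_tgt:
            return False
        has_link = has_link or in_src
    return has_link


def count_minimal_pieces(links, i1, i2, j1, j2):
    """Largest number of consistent boxes that tile the box exactly.

    A result of 1 means the box cannot be split, i.e. it is a minimal rule.
    Exhaustive search (assumption: phrases are short), not an efficient algorithm.
    """
    all_tgt = frozenset(range(j1, j2 + 1))
    best = 0

    def search(start, used, count):
        nonlocal best
        if start > i2:  # source side fully covered
            if used == all_tgt:  # target side covered exactly once
                best = max(best, count)
            return
        for k in range(start, i2 + 1):  # next source piece is [start, k]
            for a in range(j1, j2 + 1):
                for b in range(a, j2 + 1):  # its target piece is [a, b]
                    if b in used:
                        break  # [a, b] would overlap an earlier piece
                    if consistent(links, start, k, a, b):
                        search(k + 1, used | frozenset(range(a, b + 1)), count + 1)

    search(i1, frozenset(), 0)
    return best


def extract_rules(src, tgt, links, n=2, max_len=4):
    """Return (minimal, composed) rule tables keyed by (source phrase, target phrase)."""
    minimal, composed = Counter(), Counter()
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            tgt_pos = [j for i, j in links if i1 <= i <= i2]
            if not tgt_pos:
                continue  # no alignment link: not a phrase pair
            j1, j2 = min(tgt_pos), max(tgt_pos)  # tightest target span
            if j2 - j1 >= max_len or not consistent(links, i1, i2, j1, j2):
                continue
            rule = (" ".join(src[i1:i2 + 1]), " ".join(tgt[j1:j2 + 1]))
            k = count_minimal_pieces(links, i1, i2, j1, j2)
            if k == 1:
                minimal[rule] += 1  # minimal phrase rule
            elif k <= n:
                composed[rule] += 1  # composed of 2..n minimal rules
    return minimal, composed
```

For example, with `src = "a b c".split()`, `tgt = "x y z".split()` and the diagonal alignment `{(0, 0), (1, 1), (2, 2)}`, `n=2` yields the minimal rules a→x, b→y, c→z and the composed rules "a b"→"x y" and "b c"→"y z"; `n=3` additionally keeps "a b c"→"x y z", so a larger n can only add rules. Corpus-level tables can be built by summing the per-sentence Counters. Counting general tilings rather than binary splits matters for crossing alignments: the 4-word permutation 2413 admits no two-piece split yet is made up of four minimal rules.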
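The size relation in claim 3 follows directly from the definition in claim 1. Writing C for the set of consistent phrase pairs extracted from the training data and k(r) for the number of minimal phrase rules that a pair r is merged from (both symbols are introduced here for illustration and do not appear in the patent), the composed phrase rule set for combination count n is

```latex
R_n = \{\, r \in C : 2 \le k(r) \le n \,\}, \qquad
R_{n-1} \subseteq R_n \;\Longrightarrow\; |R_{n-1}| \le |R_n| \quad (n \ge 2),
\qquad R_n \setminus R_{n-1} = \{\, r \in C : k(r) = n \,\}
```

so raising n never shrinks the composed rule set, and it grows exactly when the training data contain phrase pairs made up of n minimal phrase rules.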

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210464597.6A CN102999486B (en) 2012-11-16 2012-11-16 Phrase rule abstracting method based on combination

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210464597.6A CN102999486B (en) 2012-11-16 2012-11-16 Phrase rule abstracting method based on combination

Publications (2)

Publication Number Publication Date
CN102999486A (en) 2013-03-27
CN102999486B (en) 2016-12-21

Family

ID=47928068

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210464597.6A Active CN102999486B (en) 2012-11-16 2012-11-16 Phrase rule abstracting method based on combination

Country Status (1)

Country Link
CN (1) CN102999486B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104391885B (en) * 2014-11-07 2017-07-28 哈尔滨工业大学 A method for extracting parallel phrase pairs from document-level comparable corpora based on parallel-corpus training
CN107463548B (en) * 2016-06-02 2021-04-27 阿里巴巴集团控股有限公司 Phrase mining method and device
CN108241609B (en) * 2016-12-23 2022-02-01 科大讯飞股份有限公司 Ranking sentence identification method and system
CN107943852B (en) * 2017-11-06 2020-10-30 首都师范大学 Chinese comparison sentence recognition method and system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6275792B1 (en) * 1999-05-05 2001-08-14 International Business Machines Corp. Method and system for generating a minimal set of test phrases for testing a natural commands grammar
CN1465018A (en) * 2000-05-11 2003-12-31 南加利福尼亚大学 Machine translation method
CN1489086A (en) * 2002-10-10 2004-04-14 莎 刘 Semantic-stipulated text translation system and method
CN101989287A (en) * 2009-07-31 2011-03-23 富士通株式会社 Method and equipment for generating rule for statistics-based machine translation

Also Published As

Publication number Publication date
CN102999486A (en) 2013-03-27

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20220207

Address after: 110004 1001 - (1103), block C, No. 78, Sanhao Street, Heping District, Shenyang City, Liaoning Province

Patentee after: Calf Yazhi (Shenyang) Technology Co.,Ltd.

Address before: Room 1517, No. 55, Sanhao Street, Heping District, Shenyang, Liaoning 110003

Patentee before: SHENYANG YAYI NETWORK TECHNOLOGY CO.,LTD.

TR01 Transfer of patent right

Effective date of registration: 20220713

Address after: 110004 11 / F, block C, Neusoft computer city, 78 Sanhao Street, Heping District, Shenyang City, Liaoning Province

Patentee after: SHENYANG YAYI NETWORK TECHNOLOGY CO.,LTD.

Address before: 110004 1001 - (1103), block C, No. 78, Sanhao Street, Heping District, Shenyang City, Liaoning Province

Patentee before: Calf Yazhi (Shenyang) Technology Co.,Ltd.