CN102999486A - Phrase rule extracting method based on combination - Google Patents

Publication number
CN102999486A
Authority
CN
China
Prior art keywords
phrase
rule
phrase rule
combination
minimum
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2012104645976A
Other languages
Chinese (zh)
Other versions
CN102999486B (en
Inventor
朱靖波
李强
肖桐
张�浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenyang Yayi Network Technology Co ltd
Original Assignee
SHENYANG YAYI NETWORK TECHNOLOGY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SHENYANG YAYI NETWORK TECHNOLOGY Co Ltd filed Critical SHENYANG YAYI NETWORK TECHNOLOGY Co Ltd
Priority to CN201210464597.6A priority Critical patent/CN102999486B/en
Publication of CN102999486A publication Critical patent/CN102999486A/en
Application granted granted Critical
Publication of CN102999486B publication Critical patent/CN102999486B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The invention relates to a combination-based phrase rule extraction method. The method comprises the following steps: constructing "minimal phrase rules" from a bilingual corpus; constructing a composed phrase rule set by combining minimal phrase rules; generating the minimal phrase rule set from a given bilingual parallel corpus and storing it in a hash data structure; constructing each composed phrase rule and using the minimal phrase rule set to determine how many minimal phrase rules it consists of; if a composed phrase rule consists of n or fewer minimal phrase rules from the minimal phrase rule set, putting it into a new hash data structure; and outputting the phrase rules of the minimal phrase rule set and of the composed phrase rule set, which completes one pass of the combination-based phrase rule extraction process. The method effectively generates a high-quality phrase rule set that is rich in contextual information; without reducing translation performance, the phrase rule set extracted by the method is 56.5% smaller than the set extracted by the standard method.

Description

Phrase rule extraction method based on combination
Technical field
The present invention relates to phrase processing in phrase-based statistical machine translation systems, and specifically to a combination-based phrase rule extraction method.
Background
Phrase-based statistical machine translation systems are highly competitive in the machine translation field. A major reason phrase-based methods are effective is that they rely on a high-quality phrase rule set. In a phrase rule set, each source-language phrase is mapped to one or more target-language phrases. In a phrase-based system, a phrase is simply a sequence of consecutive words and need not be a linguistically meaningful unit. Machine translation researchers have proposed a number of effective phrase rule extraction methods, among which the heuristic method is widely used. Using the word alignment of each sentence pair in the bilingual corpus, this method extracts all phrase rules that are consistent with the word alignment. Because the algorithm is simple, easy to implement, and performs very well, it is widely used in current phrase-based statistical machine translation systems. With this method, the number of phrase rules finally extracted grows quadratically with the number of words in the training data. To keep the size of the phrase rule set under control, the usual practice is to limit the length of the extracted source-language and target-language phrases. In most high-performing machine translation systems, the default upper limit on the number of words in an extracted source or target phrase is 7 to 10. For example, Moses limits both the source side and the target side of an extracted phrase to 7 words. It has been shown that removing most of the rules from the phrase rule set does not affect the performance of the translation system.
To reduce the size of the phrase rule set, the most common approach at present is to filter the phrase rules extracted by the existing heuristic method, that is, the baseline phrase rule extraction method. The baseline method is widely used in high-performing phrase-based statistical machine translation systems such as Moses and NiuTrans. In the phrase rule model proposed by Koehn et al., a phrase rule must satisfy the consistency definition, which is as follows:
A phrase pair $(\bar{s}, \bar{t})$ is consistent with the word alignment $A$ if and only if every word in $\bar{s}$ that is aligned in $A$ is aligned only to words within $\bar{t}$, every word in $\bar{t}$ that is aligned in $A$ is aligned only to words within $\bar{s}$, and at least one word in $\bar{s}$ is aligned in $A$ to a word in $\bar{t}$.
Here $\bar{s}$ denotes the source-language phrase and $\bar{t}$ denotes the target-language phrase. Intuitively, given a source phrase and a target phrase, at least one word in the phrase on either side must correspond to a word in the phrase on the other side, and no word in either phrase may correspond to a word outside the other phrase. Under this definition, all phrase rules in the model of Koehn et al. must be consistent. Phrase rules consistent with the word alignment can be extracted directly from the parallel corpus: for each sentence pair, all phrases on the source and target sides are enumerated, and the phrase pairs that are consistent with the word alignment are output. When a phrase rule set is built this way, a maximum number of words per extracted phrase must be set during extraction; otherwise the resulting rule set grows to an unmanageable size. The phrase rules that the baseline method extracts from the example sentence pair with word alignment are listed under Baseline on the right of Fig. 2. All of these rules are consistent with the word alignment.
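The consistency test and the baseline enumeration described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used by Moses or NiuTrans: the names `is_consistent` and `extract_baseline`, the representation of phrases as inclusive index spans, and the toy alignment in the usage note are all assumptions made for the example.

```python
from typing import List, Set, Tuple

Alignment = Set[Tuple[int, int]]  # links (source index, target index)


def is_consistent(A: Alignment, s_lo: int, s_hi: int, t_lo: int, t_hi: int) -> bool:
    """Check a phrase pair, given as inclusive spans, against the alignment A."""
    has_link = False
    for i, j in A:
        in_src = s_lo <= i <= s_hi
        in_tgt = t_lo <= j <= t_hi
        if in_src != in_tgt:
            # A link connects a word inside the pair to a word outside it.
            return False
        has_link = has_link or in_src
    # At least one link must lie inside the phrase pair.
    return has_link


def extract_baseline(src_len: int, tgt_len: int, A: Alignment,
                     max_len: int = 7) -> List[Tuple[int, int, int, int]]:
    """Enumerate all consistent phrase pairs with at most max_len words per side."""
    rules = []
    for s_lo in range(src_len):
        for s_hi in range(s_lo, min(src_len, s_lo + max_len)):
            for t_lo in range(tgt_len):
                for t_hi in range(t_lo, min(tgt_len, t_lo + max_len)):
                    if is_consistent(A, s_lo, s_hi, t_lo, t_hi):
                        rules.append((s_lo, s_hi, t_lo, t_hi))
    return rules
```

For a hypothetical two-word source sentence aligned to "liaoning 's import and export" with links `{(0, 0), (1, 2), (1, 4)}`, `extract_baseline(2, 5, A)` returns five spans, including the pair that covers the unaligned word "'s". The `max_len` parameter plays the role of the phrase length limit discussed above.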
However, the baseline phrase rule extraction method has unavoidable problems: the phrase length limit must be tuned by trial and error to obtain the best phrase rule set, and the extracted phrase table is very large, takes up a lot of disk space, and contains considerable noise.
Summary of the invention
To address the shortcomings of the heuristic extraction method in the prior art, namely that the extracted phrase table is very large, takes up a lot of disk space, and contains considerable noise, the technical problem to be solved by the invention is to provide a combination-based phrase rule extraction method that generates a compact phrase rule set containing more contextual information.
To solve the above technical problem, the invention adopts the following technical solution:
The combination-based phrase rule extraction method of the invention comprises the following steps: constructing "minimal phrase rules" from a bilingual corpus;
constructing, by combining minimal phrase rules, a phrase rule set that contains more contextual information, namely the "composed phrase rule set"; to build the composed phrase rule set, generating the minimal phrase rule set from a given bilingual parallel corpus with word alignment and storing it in a hash data structure;
setting the value of the combination count n, constructing composed phrase rules, and using the minimal phrase rule set to determine how many minimal phrase rules each composed phrase rule consists of;
if a composed phrase rule consists of n or fewer minimal phrase rules from the minimal phrase rule set, putting it into a new hash data structure;
outputting the phrase rules of the minimal phrase rule set and of the composed phrase rule set, which completes one pass of the combination-based phrase rule extraction process.
A minimal phrase rule is a rule that, while remaining consistent with the word alignment, cannot be decomposed into two or more rules.
A composed phrase rule is a phrase rule that is consistent with the word alignment and is formed by merging n or fewer minimal phrase rules from the same training sentence pair.
If a composed phrase rule consists of more than n minimal phrase rules from the minimal phrase rule set, it is not processed, and this combination-based extraction step ends.
The size of the composed phrase rule set is adjusted through the combination count n: the larger n is, the larger the resulting composed phrase rule set.
The invention has the following beneficial effects and advantages:
1. The invention effectively generates a high-quality, compact phrase rule set that contains more contextual information; without reducing translation performance, the phrase rule set extracted by the method is 56.5% smaller than the set extracted by the baseline method.
2. Analysis of the experimental results shows that on some data sets the combination-based phrase extraction method improves the BLEU score; a large number of experiments also validate the effectiveness of the combination-based phrase rule extraction method.
Brief description of the drawings
Fig. 1 is a flowchart of the method of the invention;
Fig. 2 shows word-aligned data (left) and the phrase rules extracted from it (right);
Fig. 3 shows the effect of phrase table size on the BLEU score for the method of the invention;
Fig. 4 shows the proportion of composed rules used in the 30-best translation results of the invention.
Detailed description of the embodiments
The invention is further described below with reference to the accompanying drawings.
The combination-based phrase rule extraction method of the invention comprises the following steps:
constructing a "minimal phrase rule set" from a bilingual corpus;
constructing, by combining rules from the minimal phrase rule set, a high-quality phrase rule set that contains more contextual information, namely the "composed phrase rule set" n-composed;
to build the composed phrase rules, generating the minimal phrase rule set minimal from a given bilingual parallel corpus with word alignment and storing it in a hash data structure named minimal;
setting the value of the combination count n and constructing the composed phrase rule set n-composed: all possible phrase rules are checked against the minimal phrase rule set minimal to determine how many minimal phrase rules each composed phrase rule consists of;
if a composed phrase rule consists of n or fewer minimal phrase rules from minimal, putting it into a new hash data structure composed;
outputting the phrase rules in minimal and composed, which completes one pass of the combination-based phrase rule extraction process.
If a composed phrase rule consists of more than n minimal phrase rules from minimal, it is not processed.
To obtain a workable phrase rule set of reasonable size, the invention proposes the combination-based phrase rule extraction method.
As shown in Fig. 1, before the method is carried out, the bilingual parallel data and word alignments are prepared and the combination count n is set;
a line of data is read, comprising the source sentence, the target sentence, and the word alignment;
the minimal phrase rule set is constructed and put into hash structure 1;
a composed rule is constructed and checked against the combination count n; if it meets the requirement, that is, it consists of n or fewer minimal phrase rules from minimal, it is put into hash structure 2;
it is determined whether other possible composed rules remain; if none remain, the contents of hash structures 1 and 2 are output and saved, which completes one pass of the combination-based phrase rule extraction process;
it is determined whether unprocessed data remain; if not, the whole procedure ends.
If unprocessed data remain, the procedure returns to the step of reading a line of data comprising the source sentence, the target sentence, and the word alignment.
If other possible composed rules remain, the procedure returns to the step of constructing a composed rule and again checks whether the rule meets the requirement of the combination count n.
If a composed rule does not meet the requirement of the combination count n, that is, it consists of more than n minimal phrase rules from minimal, the procedure goes to the step of determining whether other possible composed rules remain.
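The control flow of Fig. 1 can be sketched as a short driver loop. This is only an illustration under stated assumptions: the input line format `source ||| target ||| i-j ...`, the function names `parse_line` and `run_extraction`, and the two callables `minimal_rules` and `candidates` (which stand in for the per-sentence construction of minimal rules and of candidate composed rules with their unit counts) are hypothetical and not specified in the original text.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Set, Tuple

Alignment = Set[Tuple[int, int]]
Rule = Tuple[str, str]  # (source phrase, target phrase)


def parse_line(line: str) -> Tuple[List[str], List[str], Alignment]:
    """Parse 'source ||| target ||| i-j i-j ...' (an assumed input format)."""
    src, tgt, links = (field.strip() for field in line.split("|||"))
    A: Alignment = set()
    for link in links.split():
        i, j = link.split("-")
        A.add((int(i), int(j)))
    return src.split(), tgt.split(), A


def run_extraction(
    lines: Iterable[str],
    n: int,
    minimal_rules: Callable[[List[str], List[str], Alignment], Iterable[Rule]],
    candidates: Callable[[List[str], List[str], Alignment], Iterable[Tuple[Rule, int]]],
) -> Iterator[Tuple[Dict[Rule, int], Dict[Rule, int]]]:
    """Yield (hash structure 1, hash structure 2) for each line of data."""
    for line in lines:                               # read a line of data
        src, tgt, A = parse_line(line)
        hash1 = {rule: 1 for rule in minimal_rules(src, tgt, A)}
        hash2: Dict[Rule, int] = {}
        for rule, units in candidates(src, tgt, A):  # other composed rules?
            if units <= n and rule not in hash1:     # meets the requirement of n
                hash2[rule] = units
            # Rules made of more than n minimal rules are skipped.
        yield hash1, hash2                           # output and save
```

Python dictionaries are hash tables, so they serve here as the two hash structures of the flowchart. Yielding one pair of dictionaries per line mirrors the "output and save, then check for unprocessed data" order in Fig. 1; accumulating rule counts over the whole corpus instead would be an equally plausible design.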
As shown in Fig. 2, the basic idea of the extraction algorithm is first to construct, from the bilingual corpus (large-scale parallel sentence pairs), the "minimal phrase rules" minimal (the most basic rules, with the smallest unit granularity), and then, by combining minimal phrase rules, to construct a high-quality phrase rule set containing more contextual information, namely the composed phrase rule set n-composed. In the invention, the n-composed phrase rule set contains the rules that can be formed from 1 to n minimal phrase rules; the (n-1)-composed phrase rule set is therefore contained in the n-composed rule set and is a subset of it. In the method of the invention, the size of the rule set is adjusted through the value of n in the composed rules: the larger n is, the larger the resulting rule set. This differs from earlier extraction algorithms, which limit the maximum number of words in the source and target phrases.
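The nesting of the rule sets can be written compactly. The notation is introduced here for illustration and does not appear in the original text: $u(r)$ is the number of minimal phrase rules that make up a rule $r$, and $R_n$ is the n-composed rule set extracted under word alignment $A$.

```latex
R_n = \{\, r : r \text{ is consistent with } A,\ 1 \le u(r) \le n \,\}, \qquad
R_1 \subseteq R_2 \subseteq \dots \subseteq R_{n-1} \subseteq R_n, \qquad
|R_{n-1}| \le |R_n| .
```

Since $R_1$ is the minimal rule set, raising $n$ can only add rules, which is why $n$ acts as the size control in place of a phrase length limit.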
In the combination-based phrase rule extraction method proposed by the invention, the first question is what kind of rule counts as a minimal phrase rule.
A minimal phrase rule is a rule that, while remaining consistent with the word alignment, cannot be decomposed into two or more rules. The minimal rules are the smallest units of translation and contain the basic information needed for translation.
The minimal rule set constitutes the most compact translation model. The minimal phrase rules that the proposed method extracts from the example sentence pair with word alignment are listed under Minimal on the right of Fig. 2. Among the phrase rules shown in Fig. 2, the first five satisfy the invention's definition of a minimal rule. For example, (Liaoning, liaoning) cannot be decomposed into two or more phrase rules, so it is a minimal phrase rule.
A minimal rule is not necessarily a phrase rule whose source and target phrases each contain only one word. When the word alignment is one-to-many or many-to-one, the extracted phrase rule that is consistent with the alignment also satisfies the definition of a minimal rule. For example, in the rule (import and export, import and export), the target words aligned with the source phrase "import and export" are "import" and "export"; the rule is consistent with the word alignment, is a valid phrase rule, and satisfies the definition of a minimal phrase rule, so it is added to the minimal phrase rule set when that set is constructed. In addition, if a word adjacent to the source or target side of a minimal phrase rule is unaligned, the rule can be extended over that unaligned word, and the resulting phrase rule still satisfies the definition of a minimal phrase rule. For example, in the rule (Liaoning, liaoning 's), the target word 's lies at the edge of the target phrase and is unaligned; the rule still consists of only one minimal phrase rule, (Liaoning, liaoning), so it is a minimal phrase rule.
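One way to make the minimality test concrete is sketched below. It rests on an assumption that goes beyond the text: "cannot be decomposed" is read as "the source span cannot be cut into two or more contiguous pieces that are each consistent phrase pairs on their own". The function names and the toy alignment are hypothetical.

```python
from typing import List, Optional, Set, Tuple

Alignment = Set[Tuple[int, int]]  # links (source index, target index)


def tight_target_span(A: Alignment, s_lo: int, s_hi: int) -> Optional[Tuple[int, int]]:
    """Smallest target span covering all links of source span [s_lo, s_hi]."""
    js = [j for i, j in A if s_lo <= i <= s_hi]
    return (min(js), max(js)) if js else None


def is_consistent_source_span(A: Alignment, s_lo: int, s_hi: int) -> bool:
    span = tight_target_span(A, s_lo, s_hi)
    if span is None:
        return False
    t_lo, t_hi = span
    # No word inside the target span may link outside the source span.
    return all(s_lo <= i <= s_hi for i, j in A if t_lo <= j <= t_hi)


def count_units(A: Alignment, s_lo: int, s_hi: int) -> int:
    """Largest number of consistent pieces the span splits into (0 if inconsistent)."""
    if not is_consistent_source_span(A, s_lo, s_hi):
        return 0
    n = s_hi - s_lo + 1
    best = [-1] * (n + 1)  # best[k]: most pieces tiling the first k words
    best[0] = 0
    for k in range(1, n + 1):
        for a in range(k):
            if best[a] >= 0 and is_consistent_source_span(A, s_lo + a, s_lo + k - 1):
                best[k] = max(best[k], best[a] + 1)
    return best[n]


def is_minimal(A: Alignment, s_lo: int, s_hi: int) -> bool:
    return count_units(A, s_lo, s_hi) == 1


def target_variants(A: Alignment, s_lo: int, s_hi: int, tgt_len: int) -> List[Tuple[int, int]]:
    """Tight target span plus its extensions over adjacent unaligned target words."""
    span = tight_target_span(A, s_lo, s_hi)
    if span is None:
        return []
    aligned = {j for _, j in A}
    lows, highs = [span[0]], [span[1]]
    while lows[-1] > 0 and lows[-1] - 1 not in aligned:
        lows.append(lows[-1] - 1)
    while highs[-1] < tgt_len - 1 and highs[-1] + 1 not in aligned:
        highs.append(highs[-1] + 1)
    return [(lo, hi) for lo in lows for hi in highs]
```

With the hypothetical alignment `{(0, 0), (1, 2), (1, 4)}` over the target "liaoning 's import and export", source word 0 is minimal and `target_variants` gives both `(0, 0)` and `(0, 1)`, which corresponds to (Liaoning, liaoning) and its extension (Liaoning, liaoning 's). Source word 1 is a one-to-many minimal rule, and the two-word source span counts as two units.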
The definition of a minimal phrase rule matches intuition: when translating, one wants the translation rules used to be as short as possible while the translation quality is as high as possible. However, precisely because minimal phrase rules contain only the most basic words used in translation, the resulting minimal phrase rule set loses a great deal of contextual information, and contextual information is one of the key factors behind the strong performance of phrase-based statistical machine translation systems. In the extreme case where the source and target sides of every extracted minimal phrase rule contain only one word, the translation system degenerates into a word-based system. To improve the quality of the phrase rules so that they carry more contextual information, the invention proposes extracting phrase rules that contain more words and more context by combining minimal phrase rules.
A phrase rule that is consistent with the word alignment and is formed by combining n or fewer minimal phrase rules from the same training sentence pair is called an n-composed phrase rule, that is, a composed phrase rule.
It follows that the (n-1)-composed phrase rule set is contained in the n-composed phrase rule set. The 2-Composed list on the right of Fig. 2 shows the composed phrase rules, each formed from two or fewer minimal phrase rules, that the method of the invention extracts from the word-aligned sentence pair in Fig. 2. For example, (Liaoning import and export, liaoning 's import and export) is formed by combining the minimal rules (Liaoning, liaoning 's) and (import and export, import and export), so it is a 2-composed phrase rule. For generality, minimal phrase rules are defined as 1-composed phrase rules.
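The composition step can be sketched as follows. The sketch makes a simplifying assumption that the original text does not state: the minimal rules of a sentence pair are given as non-overlapping spans, one per unit, that do not nest inside each other. The name `compose` and the span tuples are hypothetical.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int, int, int]  # (s_lo, s_hi, t_lo, t_hi), inclusive indices


def compose(units: List[Span], n: int) -> Dict[Span, int]:
    """Return every rule built from 1..n source-adjacent minimal units.

    The value stored for each rule is the number of units it is made of.
    """
    units = sorted(units)  # order the units by source position
    rules: Dict[Span, int] = {}
    for start in range(len(units)):
        for k in range(1, n + 1):
            run = units[start:start + k]
            if len(run) < k:
                break
            t_lo = min(u[2] for u in run)
            t_hi = max(u[3] for u in run)
            outside = units[:start] + units[start + k:]
            # The merged target span must not swallow a unit outside the run,
            # otherwise the merged pair is not consistent with the alignment.
            if any(t_lo <= u[2] <= t_hi for u in outside):
                continue
            rules[(run[0][0], run[-1][1], t_lo, t_hi)] = k
    return rules
```

For the two hypothetical units `(0, 0, 0, 1)` and `(1, 1, 2, 4)`, standing for (Liaoning, liaoning 's) and (import and export, import and export), `compose(units, 1)` returns the two minimal rules and `compose(units, 2)` adds the merged span `(0, 1, 0, 4)`. Every key of the first result is also a key of the second, in line with the containment stated above.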
Clearly, if the number of minimal phrase rules in a composed phrase rule is not limited, the proposed method can extract phrase rules of arbitrary length. In most cases, however, allowing a composed phrase rule to contain too many minimal phrase rules does not noticeably improve the quality of the resulting phrase rule set.
The proposed combination-based phrase rule extraction method is easy to implement through a simple modification of the baseline phrase rule extraction algorithm. Given a bilingual parallel corpus with word alignment, extraction is controlled by setting the parameter n in n-composed appropriately.
In this embodiment, the combination-based phrase rule extraction method is applied to the phrase-based translation system of the open-source NiuTrans system. On the NIST (National Institute of Standards and Technology) Chinese-English translation task, the method is compared with the baseline phrase extraction method to evaluate its effect on translation system performance.
The baseline translation system uses all the standard features of the phrase-based translation framework of the open-source system Moses. In addition, two reordering models are integrated into the translation system: a maximum-entropy lexicalized reordering model and a hierarchical phrase reordering model. The baseline decoder speeds up decoding with beam pruning and cube pruning, and the feature weights are optimized with minimum error rate training. By default the maximum reordering distance is set to 8, and the number of words on the source and target sides of a phrase rule is limited to 7 (the same as the Moses default). For the phrase rule set, each source phrase keeps only its top 30 translation candidates ranked by phrase translation probability.
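The last setting, keeping the top 30 translation candidates per source phrase, can be illustrated with a small sketch. The text does not say how the phrase translation probability is estimated, so the relative-frequency estimate below (count of the pair divided by the count of the source phrase) is an assumption, as are the function name `prune_phrase_table` and the toy counts.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def prune_phrase_table(
    counts: Dict[Tuple[str, str], int], top_k: int = 30
) -> Dict[str, List[Tuple[str, float]]]:
    """Keep the top_k target candidates per source phrase, ranked by p(target | source)."""
    source_totals: Dict[str, int] = defaultdict(int)
    for (src, _), c in counts.items():
        source_totals[src] += c

    table: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for (src, tgt), c in counts.items():
        # Relative-frequency estimate of the phrase translation probability.
        table[src].append((tgt, c / source_totals[src]))

    # Sort by descending probability; ties are broken alphabetically.
    return {
        src: sorted(cands, key=lambda x: (-x[1], x[0]))[:top_k]
        for src, cands in table.items()
    }
```

A call such as `prune_phrase_table(counts)` with the default `top_k=30` corresponds to the setting used in this embodiment; smaller values make the effect easy to see on toy data.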
The training data used in this embodiment comprise 1.9 million Chinese-English sentence pairs, taken from the NIST portion of the large-scale bilingual corpus provided for NIST MT 2008. First, the GIZA++ tool is used to perform bidirectional word alignment on the training data, and the "grow-diag-final-and" heuristic is then used to symmetrize the two alignments. In addition, a 5-gram language model is trained on the Xinhua portion of the English Gigaword corpus and the target side of the bilingual data. For the development and test sets, this embodiment uses the NIST MT 2003 test set (919 sentences) as the development set for weight tuning, and the NIST MT 2004 and NIST MT 2005 test sets (1788 and 1082 sentences respectively) as the test sets for evaluating translation quality. Translation quality is evaluated with the case-insensitive IBM version of the BLEU metric.
Table 1. Comparison of the baseline system and the combination method on the development set (NIST MT 2003) and the test sets (NIST MT 2004 and NIST MT 2005); each result is the average of 5 runs.
Table 1 gives the experimental results of the baseline extraction method and of the composed-rule extraction method proposed by the invention under different values of n; results are reported as BLEU scores. The "minimal rules" row of Table 1 shows that when only minimal rules are extracted, the method yields a very small phrase rule set, but because the minimal rule set loses a great deal of contextual information during extraction, average translation performance on the development and test sets is 1.37 BLEU points below the baseline system. When composed rules are extracted, the phrase rule set contains more contextual information, and the BLEU score rises steadily as the number of rules increases. For example, comparing the "baseline" and "2-Composed" rows of Table 1 shows that the 2-composed phrase rule set gives translation performance comparable to the baseline method while being 44.3% smaller. The experiments further show that with 3-Composed and 4-Composed phrase rules, the average BLEU score on the development and test sets improves over both the baseline system and the 2-Composed method. Considering translation performance and phrase rule set size together, the 2-Composed phrase rules achieve translation performance comparable to the best result in Table 1 while greatly reducing the rule set size; that is, the 2-Composed phrase rules are essentially optimal. The results in Table 1 show that the proposed method effectively generates a high-quality, compact phrase rule set that contains more contextual information.
In the baseline phrase rule extraction method, the size of the phrase rule set can be adjusted by setting the maximum number of words in the source and target phrases to different values. Fig. 3 compares the BLEU scores of the baseline method and the combination method under different settings. The horizontal axis is the size of the phrase table (in millions) and the vertical axis is the BLEU score. The solid line in Fig. 3 shows the baseline extraction algorithm with different phrase length limits; each solid square on the line marks a specific experimental setting. For example, "length=3" means that the maximum length of both the source and target phrases of a phrase rule in the baseline system is set to 3, and similarly for the others. The dashed line in Fig. 3 shows the combination-based phrase extraction method with different values of n. As Fig. 3 shows, with the proposed n-composed extraction method, translation performance comparable to the baseline extraction method is obtained when n ≥ 2; the proposed method also reaches a balance between rule set size and translation performance more quickly. The figure also shows that when only the minimal rule set is used, translation performance is markedly lower than with the (≥2)-composed method, which indirectly confirms the effectiveness of the proposed combination-based method and shows that phrase rules containing more contextual information have a large effect on translation system performance.
The invention also collects statistics on how often the decoder uses minimal phrase rules and composed rules; the statistics are computed over the 30-best translation results on the development and test sets. Fig. 4 shows these statistics, where n-composed* denotes composed rules formed from exactly n minimal rules. As Fig. 4 shows, when translating with phrase rules the decoder in most cases tends to choose shorter rules (such as minimal and 2-composed*). Composed rules made up of more minimal phrase rules (such as 4-composed*) are seldom used in translation. The results in Fig. 4 also explain why the 2-Composed rules achieve strong performance in Table 1.
With the phrase rule extraction method proposed by the invention, a high-quality, compact phrase rule set can be obtained for phrase-based statistical machine translation systems. Compared with the most widely used and best-performing heuristic phrase extraction method, the phrase rule set extracted by the proposed method is 56.5% smaller than that extracted by the baseline method, without reducing translation performance. Analysis of the experimental results shows that on some data sets the combination-based phrase extraction method improves the BLEU score. A large number of experiments also validate the effectiveness of the combination-based phrase rule extraction method.
Validation on the phrase-based statistical machine translation system in the open-source NiuTrans system shows that, compared with the default extraction algorithm in Moses, the proposed composed-rule extraction method obtains a more compact phrase rule set without reducing translation performance. When 2-composed phrase rules are extracted, the quality of the translation rules obtained by the method is comparable to that of the default Moses rule set, while the size of the phrase rule set is 56.5% of that of the Moses default rule set. The experimental results also show that as the number of combined minimal phrase rules increases, translation performance does not improve markedly over that of the 2-composed phrase rules. Considering both translation performance and phrase rule set size, the 2-composed phrase rules are essentially optimal.

Claims (5)

  1. A phrase rule extracting method based on combination, characterized by comprising the following steps:
    defining a "minimum phrase rule" in a bilingual corpus;
    constructing, by combining minimum phrase rules, a phrase rule set that contains more contextual information, referred to as the "combined phrase rule set"; on the basis of the combined phrase rule set, generating a minimum phrase rule set from a given bilingual parallel corpus that contains word alignment information, and storing it in a hash data structure;
    setting the value of the combination count n, constructing a combined phrase rule, and determining from the minimum phrase rule set how many minimum phrase rules the combined phrase rule is composed of;
    if the combined phrase rule is composed of n or fewer minimum phrase rules from the minimum phrase rule set, placing it in a new hash data structure;
    outputting the new minimum phrase rule set and the phrase rules in the combined phrase rule set, whereupon one pass of the combination-based phrase rule extraction process ends.
  2. The phrase rule extracting method based on combination according to claim 1, characterized in that: the minimum phrase rule is a rule that, while remaining consistent with the word alignment information, cannot be further decomposed into two or more rules.
  3. The phrase rule extracting method based on combination according to claim 1, characterized in that: the combined phrase rule is a phrase rule that is consistent with the word alignment information and is formed by merging n or fewer minimum phrase rules from the same training sentence pair; a rule of this kind is a combined phrase rule.
  4. The phrase rule extracting method based on combination according to claim 1, characterized in that: if the combined phrase rule is composed of more than n minimum phrase rules from the minimum phrase rule set, it is not processed, and this pass of the combination-based phrase rule extraction process ends.
  5. The phrase rule extracting method based on combination according to claim 1, characterized in that: the size of the combined phrase rule set is adjusted through the value of the combination count n of the combined phrase rules; that is, the larger the value of n, the larger the resulting combined phrase rule set.
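The steps recited in the claims can be illustrated with a minimal Python sketch for a single word-aligned sentence pair. This is an illustration under stated assumptions, not the implementation of the invention: the function name `extract_rules`, the helper `parts`, and the tuple-based rule representation are hypothetical; phrase spans are assumed to be tight (boundary source words must be aligned); unaligned words between two sub-rules are simply skipped; and Python `Counter` objects stand in for the hash data structures.

```python
from collections import Counter
from functools import lru_cache
from typing import Iterable, List, Tuple

Link = Tuple[int, int]  # (source index, target index), 0-based


def extract_rules(src: List[str], tgt: List[str],
                  alignment: Iterable[Link], n: int):
    """Return (minimum_rules, combined_rules) for one aligned sentence pair."""
    links = set(alignment)
    aligned_src = {s for s, _ in links}

    # Step 1: collect alignment-consistent phrase pairs, keyed by source span.
    pairs = {}
    for i in range(len(src)):
        for j in range(i, len(src)):
            # Assumption: both boundary source words must be aligned.
            if i not in aligned_src or j not in aligned_src:
                continue
            tgt_idx = [t for s, t in links if i <= s <= j]
            lo, hi = min(tgt_idx), max(tgt_idx)
            # Consistent: no target word in [lo, hi] links outside [i, j].
            if all(i <= s <= j for s, t in links if lo <= t <= hi):
                pairs[(i, j)] = (lo, hi)

    # Step 2: count the minimum rules in the finest split of span [i, j].
    @lru_cache(maxsize=None)
    def parts(i: int, j: int) -> int:
        best = 1  # a span that cannot be split is itself a minimum rule
        for k in range(i, j):
            k2 = k + 1
            while k2 not in aligned_src:  # skip unaligned words in the gap
                k2 += 1
            if (i, k) in pairs and (k2, j) in pairs:
                best = max(best, parts(i, k) + parts(k2, j))
        return best

    # Step 3: store rules in hash tables; keep combinations of at most n parts.
    minimum_rules, combined_rules = Counter(), Counter()
    for (i, j), (lo, hi) in pairs.items():
        rule = (" ".join(src[i:j + 1]), " ".join(tgt[lo:hi + 1]))
        k = parts(i, j)
        if k == 1:
            minimum_rules[rule] += 1
        elif k <= n:
            combined_rules[rule] += 1
    return minimum_rules, combined_rules
```

For a monotone three-word pair such as `extract_rules(["a", "b", "c"], ["x", "y", "z"], [(0, 0), (1, 1), (2, 2)], n=2)`, the sketch yields three minimum rules and two combined rules; raising n to 3 additionally admits the full-sentence rule, which mirrors the relationship between n and rule set size stated in claim 5. Counts from several sentence pairs could be accumulated by adding the returned `Counter` objects.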
CN201210464597.6A 2012-11-16 2012-11-16 Phrase rule extracting method based on combination Active CN102999486B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210464597.6A CN102999486B (en) 2012-11-16 2012-11-16 Phrase rule abstracting method based on combination

Publications (2)

Publication Number Publication Date
CN102999486A true CN102999486A (en) 2013-03-27
CN102999486B CN102999486B (en) 2016-12-21

Family

ID=47928068

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210464597.6A Active CN102999486B (en) 2012-11-16 2012-11-16 Phrase rule extracting method based on combination

Country Status (1)

Country Link
CN (1) CN102999486B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6275792B1 (en) * 1999-05-05 2001-08-14 International Business Machines Corp. Method and system for generating a minimal set of test phrases for testing a natural commands grammar
CN1465018A (en) * 2000-05-11 2003-12-31 南加利福尼亚大学 Machine translation method
CN1489086A (en) * 2002-10-10 2004-04-14 莎 刘 Semantic-stipulated text translation system and method
CN101989287A (en) * 2009-07-31 2011-03-23 富士通株式会社 Method and equipment for generating rule for statistics-based machine translation

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104391885A (en) * 2014-11-07 2015-03-04 哈尔滨工业大学 Method for extracting chapter-level parallel phrase pair of comparable corpus based on parallel corpus training
CN104391885B (en) * 2014-11-07 2017-07-28 哈尔滨工业大学 A kind of abstracting method of the chapter level than the parallel phrase pair of language material trained based on parallel corpora
CN107463548A (en) * 2016-06-02 2017-12-12 阿里巴巴集团控股有限公司 Short phrase picking method and device
CN108241609A (en) * 2016-12-23 2018-07-03 科大讯飞股份有限公司 The recognition methods of parallelism sentence and system
CN108241609B (en) * 2016-12-23 2022-02-01 科大讯飞股份有限公司 Ranking sentence identification method and system
CN107943852A (en) * 2017-11-06 2018-04-20 首都师范大学 Chinese parallelism sentence recognition methods and system
CN107943852B (en) * 2017-11-06 2020-10-30 首都师范大学 Chinese comparison sentence recognition method and system

Also Published As

Publication number Publication date
CN102999486B (en) 2016-12-21

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20220207

Address after: 110004 1001 - (1103), block C, No. 78, Sanhao Street, Heping District, Shenyang City, Liaoning Province

Patentee after: Calf Yazhi (Shenyang) Technology Co.,Ltd.

Address before: Room 1517, No. 55, Sanhao Street, Heping District, Shenyang, Liaoning 110003

Patentee before: SHENYANG YAYI NETWORK TECHNOLOGY CO.,LTD.

TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20220713

Address after: 110004 11 / F, block C, Neusoft computer city, 78 Sanhao Street, Heping District, Shenyang City, Liaoning Province

Patentee after: SHENYANG YAYI NETWORK TECHNOLOGY CO.,LTD.

Address before: 110004 1001 - (1103), block C, No. 78, Sanhao Street, Heping District, Shenyang City, Liaoning Province

Patentee before: Calf Yazhi (Shenyang) Technology Co.,Ltd.

TR01 Transfer of patent right