CN104794110A - Machine translation method and device - Google Patents


Info

Publication number
CN104794110A
Authority
CN
China
Prior art keywords
item
triggered
language vocabulary
text
mutual information
Legal status
Granted
Application number
CN201410026026.3A
Other languages
Chinese (zh)
Other versions
CN104794110B (en)
Inventor
贲国生
Current Assignee
Tencent Technology Shenzhen Co Ltd
Tencent Cloud Computing Beijing Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201410026026.3A
Publication of CN104794110A
Application granted
Publication of CN104794110B
Current legal status: Active


Landscapes

  • Machine Translation (AREA)

Abstract

The invention discloses a machine translation method and device in the field of text processing. The method comprises: obtaining a first source-language word to be translated in a current text and determining one or more candidate target-language words for it; determining, from a corpus, a first pointwise mutual information (PMI) between each candidate target-language word and the target-language words corresponding to each already-translated second source-language word of the current text, and a second PMI between each candidate target-language word and each second source-language word; and determining the translation of the first source-language word from the first and second PMI of each candidate. Because the source-language word is translated using both the PMI between target-side words and the PMI between source-side and target-side words, the quality of translating the source language into the target language is improved.

Description

Machine translation method and device
Technical field
The present invention relates to the field of text processing, and in particular to a machine translation method and device.
Background
With the development of science and technology and the growth of information exchange between countries, language barriers have become increasingly serious, and traditional human translation is far from meeting demand. Machine translation uses a computer to convert one natural language into another. Thanks to the processing speed of computers it can translate quickly, and it can also take the context of the whole document into account; it has therefore gradually become the mainstream approach to translation.
The related art provides two machine translation methods. In the first, a first source-language word to be translated in the current text is obtained and at least one candidate target-language word for it is determined; the pointwise mutual information (PMI) between each candidate and the target-language words corresponding to each already-translated second source-language word in the current text is determined from a corpus; and the translation of the first source-language word is determined from the PMI of each candidate. Lexical cohesion falls mainly into repetition and collocation: repetition is the recurrence of a lexical item in a text, while collocation concerns lexical items linked by identical, similar, or otherwise related semantic relations, such as collocations of hypernymy, hyponymy, identity, near-synonymy, antonymy, and complementarity.
In the second method, the first source-language word and its candidate target-language words are obtained in the same way; the PMI between each candidate and the second source-language words is determined from the corpus, and the translation of the first source-language word is determined from the PMI of each candidate.
In the course of making the present invention, the inventor found that the related art has at least the following problem:
Both the PMI between target-side words and the PMI from source-side words to target-side words are informative for machine translation, and either one can improve translation quality. However, each of the two related-art methods uses only one of them, so the quality of translating the source language into the target language is low.
Summary of the invention
To solve the problem of the prior art, embodiments of the present invention provide a machine translation method and device. The technical solutions are as follows:
In one aspect, a machine translation method is provided, the method comprising:
Obtaining a first source-language word to be translated in a current text, and determining at least one candidate target-language word corresponding to the first source-language word;
Determining, according to a corpus, a first pointwise mutual information (PMI) between each candidate target-language word and the target-language words corresponding to each already-translated second source-language word in the current text, and determining, according to the corpus, a second PMI between each candidate target-language word and the second source-language words;
Determining the translation result of the first source-language word according to the first PMI and the second PMI of each candidate target-language word.
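The three steps above can be sketched in Python as follows. This is a minimal sketch, not the patented implementation: the function name `translate_word`, the dictionary format, and especially the linear mixing of the two scores through a hypothetical `weight` parameter are assumptions, because the text does not state how the first and second PMI are combined.

```python
from typing import Callable, Dict, List


def translate_word(
    source_word: str,
    dictionary: Dict[str, List[str]],
    first_pmi: Callable[[str], float],
    second_pmi: Callable[[str], float],
    weight: float = 0.5,
) -> str:
    """Return the candidate with the highest combined score.

    Assumption: the first (target-side) and second (source-to-target) PMI are
    mixed linearly with `weight`; the patent text does not give the rule.
    """
    candidates = dictionary.get(source_word, [])
    if not candidates:
        raise KeyError(f"no candidate translations for {source_word!r}")

    def score(candidate: str) -> float:
        return weight * first_pmi(candidate) + (1.0 - weight) * second_pmi(candidate)

    return max(candidates, key=score)


# Hypothetical candidates and PMI scores, for illustration only.
dictionary = {"vehicles": ["vehicle", "transportation"]}
pmi1 = {"vehicle": 2.1, "transportation": 0.4}
pmi2 = {"vehicle": 0.9, "transportation": 1.2}
print(translate_word("vehicles", dictionary, pmi1.__getitem__, pmi2.__getitem__))  # vehicle
```

Setting `weight` to 1.0 or 0.0 reduces the sketch to the first or second related-art method respectively, which is the contrast the summary draws.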
In another aspect, a machine translation apparatus is provided, the apparatus comprising:
An acquisition module, configured to obtain a first source-language word to be translated in a current text;
A first determination module, configured to determine at least one candidate target-language word corresponding to the first source-language word;
A second determination module, configured to determine, according to a corpus, a first pointwise mutual information (PMI) between each candidate target-language word and the target-language words corresponding to each already-translated second source-language word in the current text;
A third determination module, configured to determine, according to the corpus, a second PMI between each candidate target-language word and the second source-language words;
A fourth determination module, configured to determine the translation result of the first source-language word according to the first PMI and the second PMI of each candidate target-language word.
The technical solutions provided by the embodiments of the present invention have the following beneficial effect:
At least one candidate target-language word is determined for the first source-language word to be translated in the current text. A corpus is then used to determine, for each candidate, a first pointwise mutual information (PMI) with the target-language words corresponding to each already-translated second source-language word, and a second PMI with the second source-language words. The translation result of the first source-language word is determined from the first and second PMI of each candidate. Because the PMI between target-side words and the PMI from source-side to target-side words are used together, the quality of translating the source language into the target language is higher.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a machine translation method provided by Embodiment one of the present invention;
Fig. 2 is a flowchart of a machine translation method provided by Embodiment two of the present invention;
Fig. 3 is a schematic structural diagram of a machine translation apparatus provided by Embodiment three of the present invention;
Fig. 4 is a schematic structural diagram of a first determination module provided by Embodiment three of the present invention;
Fig. 5 is a schematic structural diagram of a second determination module provided by Embodiment three of the present invention;
Fig. 6 is a schematic structural diagram of a computing unit provided by Embodiment three of the present invention;
Fig. 7 is a schematic structural diagram of another computing unit provided by Embodiment three of the present invention;
Fig. 8 is a schematic structural diagram of a terminal provided by Embodiment four of the present invention.
Detailed description of embodiments
To make the objectives, technical solutions, and advantages of the present invention clearer, the embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
Embodiment one
In machine translation, both the pointwise mutual information (PMI) between target-side words and the PMI from source-side words to target-side words are informative, and either one can improve translation quality. If the translation result is determined from only one of them, the quality of translating the source language into the target language may be low.
To improve the quality of translating the source language into the target language, an embodiment of the present invention provides a machine translation method. The method can be applied to a terminal, including but not limited to a mobile phone, a computer, or a tablet; this embodiment does not limit the specific form of the terminal. Referring to Fig. 1, the method provided by this embodiment comprises:
101: Obtain the first source-language word to be translated in the current text, and determine at least one candidate target-language word corresponding to it;
102: According to a corpus, determine the first PMI between each candidate target-language word and the target-language words corresponding to each already-translated second source-language word in the current text, and determine the second PMI between each candidate target-language word and the second source-language words;
Determining the first pointwise mutual information (PMI) according to the corpus includes, but is not limited to:
Taking each candidate target-language word as a triggered item, and determining, among the target-language words corresponding to the second source-language words, at least one first trigger item for each triggered item, where a first trigger item is a target-language word related to the triggered item under a preset lexical cohesion relation;
Calculating, according to the corpus, a sub-PMI between each triggered item and each of its first trigger items, and determining from these sub-PMIs the first PMI between each triggered item and the target-language words corresponding to the second source-language words.
Determining the second PMI according to the corpus includes, but is not limited to:
Taking each candidate target-language word as a triggered item, and determining, among the second source-language words, at least one second trigger item for each triggered item, where a second trigger item is a second source-language word related to the triggered item under a preset lexical cohesion relation;
Calculating, according to the corpus, a sub-PMI between each triggered item and each of its second trigger items, and determining from these sub-PMIs the second PMI between each triggered item and the second source-language words.
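Both determinations follow the same pattern, so a single Python sketch can cover them. It assumes, hypothetically, that related words are looked up in a table keyed by `(triggered item, relation)` and that the sub-PMIs of all trigger items are summed; the text says only that the first and second PMI are "determined according to" the sub-PMIs. For the first PMI the context is the target words already produced; for the second PMI it is the already-translated source words (shown in pinyin, with the relation table mapping into the source language). All names and scores are illustrative.

```python
from typing import Callable, Dict, Iterable, Set, Tuple

# Hypothetical relation table: (triggered item, relation) -> related words.
RelationDB = Dict[Tuple[str, str], Set[str]]


def cohesion_pmi(
    candidate: str,
    context_words: Iterable[str],
    relation_db: RelationDB,
    relations: Iterable[str],
    sub_pmi: Callable[[str, str, str], float],
) -> float:
    """Sum the sub-PMIs between a triggered item and its trigger items.

    Summation is an assumption; the source text does not fix the rule.
    """
    context = set(context_words)
    total = 0.0
    for relation in relations:
        # Trigger items: related words that also occur in the current text.
        triggers = relation_db.get((candidate, relation), set()) & context
        for trigger in sorted(triggers):
            total += sub_pmi(candidate, trigger, relation)
    return total


# First PMI: target-side context (target words of already-translated source words).
target_db: RelationDB = {("vehicle", "hyponym"): {"bus", "car", "plane"}}
target_sub = {("vehicle", "car", "hyponym"): 1.5, ("vehicle", "bus", "hyponym"): 0.75}
first = cohesion_pmi("vehicle", ["car", "orange", "bus"], target_db, ["hyponym"],
                     lambda x, y, r: target_sub[(x, y, r)])

# Second PMI: source-side context (already-translated source words, in pinyin).
source_db: RelationDB = {("vehicle", "hyponym"): {"qiche", "feiji"}}  # car, plane
second = cohesion_pmi("vehicle", ["qiche", "juzi"], source_db, ["hyponym"],  # car, orange
                      lambda x, y, r: 0.5)
print(first, second)  # 2.25 0.5
```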
Calculating, according to the corpus, the sub-pointwise mutual information (sub-PMI) between each triggered item and each corresponding first trigger item includes, but is not limited to:
Calculating, according to the corpus, a first joint probability of each triggered item and each corresponding first trigger item under the corresponding preset lexical cohesion relation;
Calculating, according to the corpus, a first marginal probability of each triggered item under the corresponding preset lexical cohesion relation, and a second marginal probability of the first trigger item corresponding to each triggered item under that relation;
Calculating the sub-PMI between each triggered item and each corresponding first trigger item from the first joint probability, the first marginal probability, and the second marginal probability.
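Written as a formula, the sub-PMI described above is presumably the standard pointwise mutual information. Here $x$ is the triggered item, $y$ a first trigger item and $r$ the preset lexical cohesion relation; these symbols are introduced for illustration, and the logarithm base is not stated in the text.

```latex
\mathrm{PMI}_r(x, y) = \log \frac{P_r(x, y)}{P_r(x)\, P_r(y)}
```

$P_r(x, y)$ is the first joint probability, $P_r(x)$ the first marginal probability and $P_r(y)$ the second marginal probability. The value is positive when $x$ and $y$ co-occur under $r$ more often than their marginals alone would predict.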
Calculating the first joint probability according to the corpus includes, but is not limited to:
Counting a first quantity: the number of texts in the corpus in which the triggered item and the corresponding first trigger item both occur and satisfy the corresponding preset lexical cohesion relation;
Counting a second quantity: the number of texts in the corpus that contain the preset lexical cohesion relation corresponding to the triggered item and the first trigger item;
Calculating, from the first quantity and the second quantity, the first joint probability of each triggered item and each corresponding first trigger item under the corresponding preset lexical cohesion relation.
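A minimal Python sketch of the counting step, assuming the joint probability is the first quantity divided by the second. The corpus representation is hypothetical: each text is modeled as its word set plus the `(relation, triggered item, trigger item)` cohesion pairs detected in it. The second quantity is interpreted here as "texts containing at least one pair under the relation", which is one reading of the original wording.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Document:
    """Hypothetical corpus text: its words and the cohesion pairs detected in it."""
    words: FrozenSet[str]
    # (relation, triggered item, trigger item) triples found in this text
    cohesion_pairs: FrozenSet[Tuple[str, str, str]] = frozenset()


def joint_probability(corpus: List[Document], relation: str, x: str, y: str) -> float:
    # First quantity: texts where x and y co-occur and satisfy the relation.
    n_pair = sum(1 for d in corpus if (relation, x, y) in d.cohesion_pairs)
    # Second quantity (interpreted): texts containing any pair under the relation.
    n_rel = sum(1 for d in corpus if any(p[0] == relation for p in d.cohesion_pairs))
    return n_pair / n_rel if n_rel else 0.0


corpus = [
    Document(frozenset({"vehicle", "car"}), frozenset({("hyponym", "vehicle", "car")})),
    Document(frozenset({"vehicle", "bus"}), frozenset({("hyponym", "vehicle", "bus")})),
    Document(frozenset({"fruit", "orange"}), frozenset({("hyponym", "fruit", "orange")})),
    Document(frozenset({"car", "road"})),
]
print(joint_probability(corpus, "hyponym", "vehicle", "car"))  # 0.3333333333333333
```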
Calculating the first and second marginal probabilities according to the corpus includes, but is not limited to:
Counting a third quantity: the number of texts in the corpus in which the triggered item occurs;
Counting a fourth quantity: the number of texts in the corpus in which the first trigger item corresponding to the triggered item occurs;
Counting the second quantity: the number of texts in the corpus that contain the preset lexical cohesion relation corresponding to the triggered item and the first trigger item;
Calculating the first marginal probability of each triggered item under the corresponding preset lexical cohesion relation from the third quantity and the second quantity, and the second marginal probability of the corresponding first trigger item from the fourth quantity and the second quantity.
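Putting the four counts together, a sketch of the sub-PMI computation might look like this. It assumes each probability is the relevant count divided by the second quantity, as the steps above describe, and uses a base-2 logarithm, which is an assumption. The function name and example counts are hypothetical.

```python
import math


def sub_pmi_from_counts(n_pair: int, n_rel: int, n_x: int, n_y: int) -> float:
    """Sub-PMI from the four document counts described in the text.

    n_pair: texts where x and y co-occur under the relation (first quantity)
    n_rel:  texts containing the relation (second quantity)
    n_x:    texts containing the triggered item x (third quantity)
    n_y:    texts containing the trigger item y (fourth quantity)
    The base-2 logarithm is an assumption.
    """
    if min(n_pair, n_rel, n_x, n_y) <= 0:
        raise ValueError("all counts must be positive to take the logarithm")
    p_xy = n_pair / n_rel  # first joint probability
    p_x = n_x / n_rel      # first marginal probability
    p_y = n_y / n_rel      # second marginal probability
    return math.log2(p_xy / (p_x * p_y))


# Hypothetical counts: joint 0.06, marginals 0.2 and 0.15, so the ratio is 2.
print(round(sub_pmi_from_counts(12, 200, 40, 30), 6))  # 1.0
```

Because every count is divided by the same second quantity, the "marginals" are ratios rather than proper probabilities when a word occurs in more texts than the relation does; the sketch follows the text as written rather than renormalizing.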
Calculating, according to the corpus, the sub-pointwise mutual information (sub-PMI) between each triggered item and each corresponding second trigger item includes, but is not limited to:
Calculating, according to the corpus, a second joint probability of each triggered item and each corresponding second trigger item under the corresponding preset lexical cohesion relation;
Calculating, according to the corpus, a third marginal probability of each triggered item under the corresponding preset lexical cohesion relation, and a fourth marginal probability of the second trigger item corresponding to each triggered item under that relation;
Calculating the sub-PMI between each triggered item and each corresponding second trigger item from the second joint probability, the third marginal probability, and the fourth marginal probability.
Calculating the second joint probability according to the corpus includes, but is not limited to:
Counting a fifth quantity: the number of texts in the corpus in which the triggered item and the corresponding second trigger item both occur and satisfy the corresponding preset lexical cohesion relation;
Counting a sixth quantity: the number of texts in the corpus that contain the preset lexical cohesion relation corresponding to the triggered item and the second trigger item;
Calculating, from the fifth quantity and the sixth quantity, the second joint probability of each triggered item and each corresponding second trigger item under the corresponding preset lexical cohesion relation.
Calculating the third and fourth marginal probabilities according to the corpus includes, but is not limited to:
Counting a seventh quantity: the number of texts in the corpus in which the triggered item occurs;
Counting an eighth quantity: the number of texts in the corpus in which the second trigger item corresponding to the triggered item occurs;
Counting the sixth quantity: the number of texts in the corpus that contain the preset lexical cohesion relation corresponding to the triggered item and the second trigger item;
Calculating the third marginal probability of each triggered item under the corresponding preset lexical cohesion relation from the seventh quantity and the sixth quantity, and the fourth marginal probability of the corresponding second trigger item from the eighth quantity and the sixth quantity.
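Since all three probabilities are normalized by the same count (the sixth quantity), the source-to-target sub-PMI reduces to a ratio of raw counts. In the derivation below, $x$ is the triggered item, $s$ a second trigger item, $r$ the relation and $n_5, \dots, n_8$ the fifth to eighth quantities; the symbols, the example counts, and the base-2 logarithm in the last line are illustrative assumptions.

```latex
\begin{aligned}
P_r(x, s) &= \frac{n_5}{n_6}, \quad P_r(x) = \frac{n_7}{n_6}, \quad P_r(s) = \frac{n_8}{n_6} \\
\mathrm{PMI}_r(x, s) &= \log \frac{n_5 / n_6}{(n_7 / n_6)\,(n_8 / n_6)} = \log \frac{n_5\, n_6}{n_7\, n_8} \\
\text{e.g. } n_5 = 9,\ n_6 = 120,\ n_7 = 30,\ n_8 = 12: \quad \mathrm{PMI}_r(x, s) &= \log_2 \frac{9 \cdot 120}{30 \cdot 12} = \log_2 3 \approx 1.585
\end{aligned}
```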
103: Determine the translation result of the first source-language word according to the first pointwise mutual information (PMI) and the second PMI of each candidate target-language word.
In the method provided by this embodiment, at least one candidate target-language word is determined for the first source-language word to be translated in the current text. The corpus is used to determine, for each candidate, the first PMI with the target-language words corresponding to each already-translated second source-language word and the second PMI with the second source-language words, and the translation result of the first source-language word is then determined from the first and second PMI of each candidate. Because the PMI between target-side words and the PMI from source-side to target-side words are used together, the quality of translating the source language into the target language is higher.
Embodiment two
An embodiment of the present invention provides a machine translation method. With reference to Embodiment one and referring to Fig. 2, the method provided by this embodiment comprises:
201: Obtain the first source-language word to be translated in the current text, and determine at least one candidate target-language word corresponding to it;
Before the first source-language word is obtained, a text input box may be provided and the text to be translated that the user enters in it is obtained; this embodiment does not specifically limit this. A source-language word in the current text that is still to be translated is taken as the first source-language word and obtained. The number of first source-language words obtained at a time may be one or a preset number; this embodiment does not specifically limit it.
This embodiment does not specifically limit how the candidate target-language words for the first source-language word are determined. One way includes, but is not limited to, searching a database with the first source-language word obtained from the current text and determining the candidates from the search results. The source language and the target language are any two different natural languages; for example, the source language is Chinese and the target language is English. The database used for retrieval can be chosen as needed; this embodiment does not limit it. The database may store data of one natural language, such as its words, phrases, and morphemes, or data of several natural languages; this embodiment limits neither the natural languages stored nor the content stored for each of them.
For example, when a Chinese text is translated into English and the Chinese word currently to be translated is the one rendered here as “vehicles”, the database can be searched with this word to determine at least one candidate target-language word, for example vehicle and transportation. The Chinese word “vehicles” may of course correspond to other English words as well; this embodiment does not limit this.
Note that some of these English words have several meanings, but each of them includes the meaning of “vehicles”.
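The candidate lookup in step 201 amounts to a dictionary retrieval. In the sketch below a plain Python dict stands in for the retrieval database; the key `"vehicles"` stands in for the Chinese source word, and the names `BILINGUAL_DICT` and `candidate_translations` are hypothetical.

```python
from typing import Dict, List

# Hypothetical bilingual dictionary standing in for the retrieval database.
BILINGUAL_DICT: Dict[str, List[str]] = {
    "vehicles": ["vehicle", "transportation"],
}


def candidate_translations(source_word: str,
                           db: Dict[str, List[str]] = BILINGUAL_DICT) -> List[str]:
    """Return the candidate target-language words for a source word (empty if unknown)."""
    # Copy the list so callers cannot mutate the stored entry.
    return list(db.get(source_word, []))


print(candidate_translations("vehicles"))  # ['vehicle', 'transportation']
```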
202: determine the first pointwise mutual information between the target language vocabulary that each target language vocabulary to be selected is corresponding with each second source language vocabulary translated in current text according to corpus;
The present embodiment does not do concrete restriction to the determination mode of the first pointwise mutual information between the target language end Lexical Cohesion determining each item that is triggered according to the target language vocabulary translated in corpus and current text, include but not limited to: using each target language vocabulary to be selected as the item that is triggered, and in the target language vocabulary that the second source language vocabulary is corresponding, determine at least one first triggering item of each item correspondence that is triggered, the first triggering item is target language vocabulary corresponding with the item that is triggered under default Lexical Cohesion relation; Trigger the sub-pointwise mutual information between item according to each item and corresponding each first that is triggered of corpus calculating, and determine the first pointwise mutual information between the target language vocabulary that each item that is triggered is corresponding with the second source language vocabulary according to each item and corresponding each first sub-pointwise mutual information triggered between item that is triggered.
Wherein, because the possibility of each target language vocabulary to be selected as translation result needs to determine according to content fixed in current text, therefore, using each target language vocabulary to be selected as the item that is triggered.Such as, first source language vocabulary to be translated is " vehicles ", corresponding target language vocabulary to be selected is vehicle, transportation etc., now, can respectively using vehicle as the item 1 that is triggered, using transportation as the item 2 that is triggered, the present embodiment does not do concrete restriction to this.Lexical Cohesion is mainly divided into repetition and collocation, repeat to refer to the repetition of vocabulary item in text, collocation is mainly concerned with identical, the vocabulary item of similar or relevant semantic relation, as the collocation of superordination, the collocation of the next relation, the collocation of identical relation, the collocation of closeness relation, the collocation of inverse relationship, the collocation etc. of complementary relationship, therefore, presetting Lexical Cohesion relation can be superordination, the next relation, synonymy, antonymy etc., the present embodiment does not do concrete restriction to the kind of default Lexical Cohesion relation, also concrete restriction is not done to the content of default Lexical Cohesion relation.
When determining at least one first triggering item of each item correspondence that is triggered in the target language vocabulary that the second source language vocabulary is corresponding, include but not limited to: search each target language vocabulary of item under default Lexical Cohesion relation that be triggered in a database, if what find is eachly triggered in the target language vocabulary of item under the default Lexical Cohesion relation target language vocabulary that also translated each second source language vocabulary is corresponding in current text simultaneously, then using find in a database and the target language vocabulary simultaneously existed in the target language vocabulary that translated each second source language vocabulary is corresponding in current text as the target language vocabulary satisfied condition, and the target language vocabulary satisfied condition is defined as at least one first triggering item of each item correspondence that is triggered.For any one be triggered item and any one default Lexical Cohesion relation, item and the Lexical Cohesion relation of this being triggered is designated as be triggered item 1 and Lexical Cohesion relation 1 respectively, search the target language vocabulary of item 1 under Lexical Cohesion relation 1 that be triggered in a database, if what find is triggered in the target language vocabulary 1 of item 1 under Lexical Cohesion relation 1 target language vocabulary that also translated each second source language vocabulary is corresponding in current text simultaneously, then using target language vocabulary 1 as the target language vocabulary satisfied condition, and the target language vocabulary 1 satisfied condition is triggered item as the item 1 that is triggered in corresponding at least one of Lexical Cohesion relation 1 time first.
For the ease of explaining explanation, to determine at least one first triggering item of one of them item correspondence that is triggered, the item that this is triggered is designated as the item 1 that is triggered, preset Lexical Cohesion and close to be one and to close for bottom to be example, search all D-goal language vocabularies of item 1 under the next relation that are triggered in a database, comprise target language vocabulary 1, target language vocabulary 3 and target language vocabulary 5 for all D-goal language vocabularies found in a database.If include target language vocabulary 3 in the target language vocabulary that in current text, translated each second source language vocabulary is corresponding, then using target language vocabulary 3 at least one first triggering item as item 1 correspondence that is triggered.Certainly, alternate manner can also be adopted to determine at least one first triggering item of each item correspondence that is triggered according to actual conditions, the present embodiment does not do concrete restriction to this.Wherein, the database of employing can be selected as required, and the present embodiment does not do concrete restriction to the database used.
For example, suppose the target-language words corresponding to the second source-language words already translated in the current text are "... car ... orange ... bus ...", where the omitted parts are the target-language words corresponding to other translated source-language words of the current text. Take the triggered item "vehicle" and the hyponymy relation as the preset lexical cohesion relation. Searching the database for the hyponyms of "vehicle" returns, for instance, bus, car and plane; this embodiment imposes no specific limitation on the result. Two of these hyponyms, car and bus, appear among the target-language words of the already-translated second source-language words. Therefore, under the hyponymy relation, the triggered item "vehicle" has two first trigger items among these target-language words: car and bus.
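The lookup above is essentially a set intersection. A minimal Python sketch follows, assuming the lexical cohesion database can be modeled as an in-memory dictionary keyed by (word, relation); `COHESION_DB` and `first_trigger_items` are hypothetical names for illustration only, not an interface defined by this embodiment.

```python
# Hypothetical stand-in for the lexical cohesion database:
# (triggered item, relation) -> target-language words related to it.
COHESION_DB = {
    ("vehicle", "hyponymy"): {"bus", "car", "plane"},
}


def first_trigger_items(triggered, relation, translated_targets, db=COHESION_DB):
    """Return the first trigger items of `triggered` under `relation`.

    A word qualifies if the database relates it to `triggered` under
    `relation` and it also occurs among the target-language words already
    produced for the current text.
    """
    related = db.get((triggered, relation), set())
    return sorted(related & set(translated_targets))


# Target-language words of the already-translated second source words.
translated = ["car", "orange", "bus"]
print(first_trigger_items("vehicle", "hyponymy", translated))  # ['bus', 'car']
```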
Besides the above method, the first trigger items of each triggered item may also be determined as follows, among other ways. Pair the target-language word corresponding to each already-translated second source-language word in the current text with each triggered item, forming target-language word pairs. Search the database for the word pairs of each triggered item under the preset lexical cohesion relation. Any pair found in the database that is also among the pairs formed from the current text satisfies the condition, and the translated target-language word in that pair is determined to be a first trigger item of the triggered item. Concretely, take any triggered item and any preset lexical cohesion relation, denoted triggered item 1 and lexical cohesion relation 1. The target-language word corresponding to each already-translated second source-language word of the current text is paired with triggered item 1, and the database is searched for all word pairs of triggered item 1 under lexical cohesion relation 1. If a pair found in the database is also among the pairs formed from the current text with triggered item 1, that pair satisfies the condition, and its target-language word (the one other than triggered item 1) is determined to be a first trigger item of triggered item 1 under lexical cohesion relation 1.
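A minimal sketch of the pair-based variant, assuming a hypothetical in-memory database `PAIR_DB` that stores, for each relation, a set of (triggered item, related word) pairs; the function name is illustrative only. With the "vehicle" example it yields the same first trigger items, car and bus.

```python
# Hypothetical pair database: relation -> set of (triggered item, related word).
PAIR_DB = {
    "hyponymy": {("vehicle", "bus"), ("vehicle", "car"), ("vehicle", "plane")},
}


def first_trigger_items_by_pairs(triggered, relation, translated_targets, db=PAIR_DB):
    """Pair-based variant: match pairs built from the current text against the database."""
    # Pair the triggered item with every already-translated target-language word.
    text_pairs = {(triggered, word) for word in translated_targets}
    # Pairs present both in the current text and in the database satisfy the condition.
    matched = text_pairs & db.get(relation, set())
    # The translated word in each matching pair is a first trigger item.
    return sorted(word for _, word in matched)


print(first_trigger_items_by_pairs("vehicle", "hyponymy", ["car", "orange", "bus"]))
# ['bus', 'car']
```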
Note that, under a given lexical cohesion relation, a triggered item may have no first trigger item among the target-language words corresponding to the second source-language words. In that case, the method above is applied under each of the other lexical cohesion relations to find the first trigger items of this triggered item. If the triggered item has no first trigger item under any lexical cohesion relation, the method moves on to determine, under each lexical cohesion relation, the first trigger items of the other triggered items.
This embodiment does not specifically limit how the sub-pointwise mutual information between each triggered item and each of its first trigger items is computed from the corpus. One possible way, among others, is: compute from the corpus the first joint probability of each triggered item and each of its first trigger items under the corresponding preset lexical cohesion relation; compute from the corpus the first marginal probability of each triggered item and the second marginal probability of each of its first trigger items under the corresponding preset lexical cohesion relation; and compute the sub-pointwise mutual information between each triggered item and each of its first trigger items from the first joint probability, the first marginal probability and the second marginal probability.
The first joint probability of each triggered item and each of its first trigger items under the corresponding preset lexical cohesion relation may be computed from the corpus as follows: count the texts in the corpus in which the triggered item and the first trigger item both appear and stand in the corresponding preset lexical cohesion relation (the first count); count the texts in the corpus that exhibit the preset lexical cohesion relation corresponding to the triggered item and the first trigger item (the second count); and compute the first joint probability from the first count and the second count.
This embodiment does not specifically limit how the first joint probability is computed from the first count and the second count. One possible way is to divide the first count by the second count and take the quotient as the first joint probability of the triggered item and the first trigger item under the corresponding preset lexical cohesion relation.
For ease of understanding, the computation of the first joint probability is illustrated below with the triggered item "vehicle", the trigger item "car" and the hyponymy relation as the preset lexical cohesion relation:
Suppose the second count, i.e. the number of corpus texts exhibiting the hyponymy relation, is 5. In the first text, "vehicle" and "car" both appear and stand in the hyponymy relation, i.e. "car" is a hyponym of "vehicle"; the second text contains only "vehicle"; the third text contains only "car"; in the fourth text, "vehicle" and "car" both appear and stand in the hyponymy relation; and the fifth text contains only "vehicle". The first count, i.e. the number of corpus texts in which "vehicle" and "car" both appear and stand in the hyponymy relation, is therefore 2.
From the first count and the second count, the first joint probability of "vehicle" and "car" under the hyponymy relation is 2/5. Other methods may of course be used to compute the first joint probability from the first count and the second count, or more generally from the corpus; this embodiment imposes no specific limitation on this.
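The count-based estimate of the first joint probability can be sketched as follows. The sketch assumes, as in the example, that every listed text counts toward the second count, and represents each text by a hypothetical record holding the example words it contains and a flag saying whether they stand in the hyponymy relation there; how that relation is detected in real texts is outside the sketch. `Fraction` keeps the result exact.

```python
from fractions import Fraction

# The five corpus texts of the worked example. Each entry records which of the
# two words appear and whether they stand in the hyponymy relation in that text.
TEXTS = [
    {"words": {"vehicle", "car"}, "related": True},
    {"words": {"vehicle"}, "related": False},
    {"words": {"car"}, "related": False},
    {"words": {"vehicle", "car"}, "related": True},
    {"words": {"vehicle"}, "related": False},
]


def joint_probability(texts, triggered, trigger):
    """First joint probability: first count divided by second count."""
    # First count: texts where both words appear and satisfy the relation.
    first_count = sum(
        1 for t in texts if {triggered, trigger} <= t["words"] and t["related"]
    )
    # Second count: all texts exhibiting the relation (every text listed here).
    second_count = len(texts)
    return Fraction(first_count, second_count)


print(joint_probability(TEXTS, "vehicle", "car"))  # 2/5
```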
The first marginal probability of each triggered item and the second marginal probability of each of its first trigger items under the corresponding preset lexical cohesion relation may be computed from the corpus as follows, among other ways: count the texts in the corpus in which the triggered item appears (the third count); count the texts in the corpus in which the first trigger item of the triggered item appears (the fourth count); count the texts in the corpus that exhibit the preset lexical cohesion relation corresponding to the triggered item and the first trigger item (the second count); then compute the first marginal probability from the third count and the second count, and the second marginal probability from the fourth count and the second count.
This embodiment does not specifically limit how the first marginal probability is computed from the third and second counts, or how the second marginal probability is computed from the fourth and second counts. One possible way is to take the third count divided by the second count as the first marginal probability of the triggered item, and the fourth count divided by the second count as the second marginal probability of its first trigger item, both under the corresponding preset lexical cohesion relation.
For ease of understanding, the computation of the first and second marginal probabilities is again illustrated with the triggered item "vehicle", the trigger item "car" and the hyponymy relation as the preset lexical cohesion relation:
Suppose again that the second count, i.e. the number of corpus texts exhibiting the hyponymy relation, is 5. In the first text, "vehicle" and "car" both appear and stand in the hyponymy relation, i.e. "car" is a hyponym of "vehicle"; the second text contains only "vehicle"; the third text contains only "car"; in the fourth text, "vehicle" and "car" both appear and stand in the hyponymy relation; and the fifth text contains only "vehicle".
The third count, i.e. the number of corpus texts in which "vehicle" appears, is therefore 4. Likewise, the fourth count, i.e. the number of corpus texts in which the trigger item "car" of "vehicle" appears, is 3.
From the third count and the second count, the first marginal probability of "vehicle" under the hyponymy relation is 4/5. From the fourth count and the second count, the second marginal probability of its trigger item "car" under the hyponymy relation is 3/5. Other methods may of course be used, according to actual conditions, to compute the first marginal probability from the third and second counts and the second marginal probability from the fourth and second counts; this embodiment imposes no specific limitation on this.
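A minimal sketch of the marginal probabilities, assuming each corpus text is reduced to the set of example words it contains (a hypothetical representation) and that all listed texts make up the second count, as in the example. `marginal_probability` is an illustrative name.

```python
from fractions import Fraction

# Words appearing in each of the five corpus texts of the worked example.
TEXT_WORDS = [
    {"vehicle", "car"},
    {"vehicle"},
    {"car"},
    {"vehicle", "car"},
    {"vehicle"},
]


def marginal_probability(text_words, word):
    """Texts containing `word` (third or fourth count) over the second count."""
    count = sum(1 for words in text_words if word in words)
    second_count = len(text_words)  # all listed texts exhibit the relation
    return Fraction(count, second_count)


print(marginal_probability(TEXT_WORDS, "vehicle"))  # 4/5  (first marginal)
print(marginal_probability(TEXT_WORDS, "car"))      # 3/5  (second marginal)
```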
This embodiment does not specifically limit how the sub-pointwise mutual information between each triggered item and each of its first trigger items is computed from the first joint probability, the first marginal probability and the second marginal probability. One possible way is to multiply the first marginal probability by the second marginal probability, divide the first joint probability by this product, and take the logarithm of the quotient; the result is the sub-pointwise mutual information between the triggered item and the first trigger item. This can be expressed as:
$$\mathrm{PMI}(xRy) = \log \frac{p(x, y, R)}{p(x, R)\, p(y, R)}$$
where p(x, y, R) is the first joint probability, x is the triggered item, y is the trigger item, R is the preset lexical cohesion relation, p(x, R) is the first marginal probability (that of the triggered item x), p(y, R) is the second marginal probability (that of the trigger item y), and PMI(xRy) is the sub-pointwise mutual information between the triggered item x and the trigger item y.
For example, with the results of the example above, p(x, y, R) = 2/5, p(x, R) = 4/5 and p(y, R) = 3/5, so PMI(xRy) = log((2/5) / ((4/5)(3/5))) = log(5/6). The base of the logarithm may be 2 or any other value as needed; this embodiment imposes no specific limitation on this.
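The formula and the worked example can be checked with a short sketch. `sub_pmi` is a hypothetical helper; it uses base 2 by default, which the text allows but does not require. The negative result means that, in this small example, "vehicle" and "car" co-occur under the relation less often than their marginal probabilities alone would predict.

```python
import math
from fractions import Fraction


def sub_pmi(joint, marginal_triggered, marginal_trigger, base=2):
    """Sub-PMI: log of joint / (marginal_triggered * marginal_trigger)."""
    # Keep the ratio exact, then convert to float only for the logarithm.
    ratio = joint / (marginal_triggered * marginal_trigger)
    return math.log(float(ratio), base)


joint, p_x, p_y = Fraction(2, 5), Fraction(4, 5), Fraction(3, 5)
print(joint / (p_x * p_y))                 # 5/6
print(round(sub_pmi(joint, p_x, p_y), 4))  # -0.263
```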
After the sub-pointwise mutual information between each triggered item and each of its first trigger items has been computed, the first pointwise mutual information between each triggered item and the target-language words corresponding to the second source-language words can be determined from these values. This embodiment does not specifically limit how. One possible way is: from the sub-pointwise mutual information values, determine, under each lexical cohesion relation, the pointwise mutual information between each triggered item and the target-language words corresponding to the second source-language words; then determine the first pointwise mutual information of each triggered item from its per-relation pointwise mutual information.
The per-relation pointwise mutual information may be determined from the sub-pointwise mutual information values as follows, among other ways: count the first trigger items of each triggered item under each lexical cohesion relation; multiply together the sub-pointwise mutual information values between the triggered item and each of its first trigger items under that relation; and take a root of the product. The result is the pointwise mutual information between the triggered item and the target-language words corresponding to the second source-language words under that relation. The degree of the root may be the counted number of first trigger items of the triggered item under that relation; this embodiment imposes no specific limitation on this. Concretely, take any triggered item and any lexical cohesion relation, denoted triggered item 1 and lexical cohesion relation 1. Count the first trigger items of triggered item 1 under lexical cohesion relation 1, multiply the sub-pointwise mutual information values between triggered item 1 and each of these first trigger items, and take a root of the product; the result is the pointwise mutual information between triggered item 1 and the target-language words corresponding to the second source-language words under lexical cohesion relation 1. The degree of the root may be the counted number of first trigger items of triggered item 1 under lexical cohesion relation 1; this embodiment imposes no specific limitation on this.
For ease of explanation, consider determining the pointwise mutual information between one triggered item, denoted triggered item 1, and the target-language words corresponding to the second source-language words under one lexical cohesion relation, assumed to be the hyponymy relation. If triggered item 1 has n trigger items in total under the hyponymy relation, the sub-pointwise mutual information values between triggered item 1 and each of these n trigger items are multiplied together and the n-th root of the product is taken. The result is the pointwise mutual information between triggered item 1 and the target-language words corresponding to the second source-language words under the hyponymy relation.
For example, let the preset lexical cohesion relation be the hyponymy relation, and let the trigger items of the triggered item "vehicle" under the hyponymy relation, among the target-language words corresponding to the second source-language words, be car and bus. Suppose the sub-pointwise mutual information is 0.2 between "vehicle" and "car" and 0.8 between "vehicle" and "bus". The pointwise mutual information between "vehicle" and the target-language words corresponding to the second source-language words under the hyponymy relation is then (0.8 × 0.2)^0.5 = 0.4; that is, the two sub-pointwise mutual information values are multiplied and the square root of the product is taken. In general, when "vehicle" has n trigger items under the hyponymy relation, the sub-pointwise mutual information values between "vehicle" and each trigger item are multiplied and the n-th root of the product is taken, giving the pointwise mutual information between "vehicle" and the target-language words corresponding to the second source-language words under the hyponymy relation.
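The per-relation aggregation is the geometric mean of the sub-pointwise mutual information values. A minimal sketch reproducing the 0.4 of the example; `relation_pmi` is a hypothetical helper name, and it assumes at least one non-negative input value, as in the example.

```python
import math


def relation_pmi(sub_pmis):
    """PMI of a triggered item under one relation: the n-th root of the product
    of its n sub-PMI values (their geometric mean).

    Expects at least one value; items without trigger items are skipped
    beforehand. Assumes non-negative values, as in the worked example: for a
    negative product, Python's fractional power returns a complex number.
    """
    n = len(sub_pmis)
    return math.prod(sub_pmis) ** (1 / n)


# "vehicle" under hyponymy: sub-PMI 0.2 with car, 0.8 with bus.
print(round(relation_pmi([0.2, 0.8]), 10))  # 0.4
```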
This embodiment does not specifically limit how the first pointwise mutual information of each triggered item is determined from its per-relation pointwise mutual information. One possible way is to sum, over all lexical cohesion relations, the pointwise mutual information between the triggered item and the target-language words corresponding to the second source-language words, and take the sum as the first pointwise mutual information between the triggered item and those target-language words. For example, for any triggered item, denoted triggered item 1, its per-relation pointwise mutual information values are summed, and the sum is taken as the first pointwise mutual information between triggered item 1 and the target-language words corresponding to the second source-language words.
For example, suppose there are two preset lexical cohesion relations, the hyponymy relation and the synonymy relation, and the pointwise mutual information between "vehicle" and the target-language words corresponding to the second source-language words is 0.4 under the hyponymy relation and 0.6 under the synonymy relation. The first pointwise mutual information of "vehicle" is then 0.4 + 0.6 = 1. In addition, a weight may be set in advance for each lexical cohesion relation, and the per-relation pointwise mutual information values of each triggered item are multiplied by their weights before being summed. Other combination methods may of course be used according to actual conditions; this embodiment imposes no specific limitation on this.
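A minimal sketch of the combination step, assuming per-relation values are held in a dictionary keyed by relation name. `first_pmi` and the weight values in the second usage line are hypothetical; the embodiment leaves the weights open.

```python
def first_pmi(per_relation_pmi, weights=None):
    """Combine a triggered item's per-relation PMI values into its first PMI.

    Plain sum by default; if `weights` is given, each relation's value is
    multiplied by its weight first (relations missing from `weights` use 1.0).
    """
    weights = weights or {}
    return sum(weights.get(rel, 1.0) * value for rel, value in per_relation_pmi.items())


vehicle = {"hyponymy": 0.4, "synonymy": 0.6}
print(first_pmi(vehicle))  # 1.0
# Illustrative weights only: halve the contribution of the hyponymy relation.
print(first_pmi(vehicle, {"hyponymy": 0.5, "synonymy": 1.0}))
```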
203: Determine, from the corpus, the second pointwise mutual information between each candidate target-language word and the second source-language words.
This embodiment does not specifically limit how the second pointwise mutual information between each candidate target-language word and the second source-language words is determined from the corpus. One possible way is: take each candidate target-language word as a triggered item and determine, among the second source-language words, at least one second trigger item for each triggered item, a second trigger item being a second source-language word that stands in a preset lexical cohesion relation with the triggered item; compute from the corpus the sub-pointwise mutual information between each triggered item and each of its second trigger items; and determine from these values the second pointwise mutual information between each triggered item and the second source-language words. The definitions of triggered item and lexical cohesion relation are as described in step 202 and are not repeated here.
At least one second trigger item of each triggered item may be determined among the second source-language words as follows, among other ways. Search a database for the source-language words related to each triggered item under the preset lexical cohesion relation. Any source-language word found that is also among the second source-language words already translated in the current text satisfies the condition and is determined to be a second trigger item of the triggered item. Concretely, take any triggered item and any preset lexical cohesion relation, denoted triggered item 1 and lexical cohesion relation 1. The database is searched for the source-language words related to triggered item 1 under lexical cohesion relation 1. If a word found, say source-language word 1, is also among the second source-language words already translated in the current text, source-language word 1 satisfies the condition and is taken as one of the second trigger items of triggered item 1 under lexical cohesion relation 1.
For ease of explanation, consider determining the second trigger items of one triggered item, denoted triggered item 1, with a single preset lexical cohesion relation: the hyponymy relation. The database is searched for all source-language hyponyms of triggered item 1; suppose they are source-language words 5, 9 and 11. If the second source-language words already translated in the current text include source-language words 5 and 11, these two words are taken as the two second trigger items of triggered item 1. Other methods may of course be used to determine the second trigger items of each triggered item according to actual conditions; this embodiment imposes no specific limitation on this.
For example, suppose the second source-language words already translated in the current text are "... car ... orange ... passenger vehicle ...", where the omitted parts are other translated source-language content of the current text. Take the triggered item "vehicle" and the hyponymy relation as the preset lexical cohesion relation. Searching the database for the source-language hyponyms of "vehicle" returns, for instance, "passenger vehicle", "car" and "aircraft"; this embodiment imposes no specific limitation on the result. Two of these hyponyms, "car" and "passenger vehicle", appear among the already-translated second source-language words. Therefore, under the hyponymy relation, the triggered item "vehicle" has two second trigger items among these words: "car" and "passenger vehicle".
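On the source side the lookup crosses languages: the triggered item is a candidate target-language word, while the related words returned by the database are source-language words. A minimal sketch, assuming a hypothetical dictionary-backed database `SOURCE_DB` and using English glosses for the source-language words of the example; `second_trigger_items` is an illustrative name.

```python
# Hypothetical cross-lingual cohesion database: a candidate target-language word
# (the triggered item) -> source-language words related to it under a relation.
# English glosses stand in for the source-language words of the example.
SOURCE_DB = {
    ("vehicle", "hyponymy"): {"passenger vehicle", "car", "aircraft"},
}


def second_trigger_items(triggered, relation, translated_sources, db=SOURCE_DB):
    """Source-language words related to `triggered` that were already translated."""
    related = db.get((triggered, relation), set())
    return sorted(related & set(translated_sources))


already_translated = ["car", "orange", "passenger vehicle"]
print(second_trigger_items("vehicle", "hyponymy", already_translated))
# ['car', 'passenger vehicle']
```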
Besides the above method, determining, in the second source language words, the at least one second trigger item corresponding to each triggered item may also include but is not limited to the following: each translated second source language word in the current text is paired with each triggered item to form source language word pairs; the word pairs of each triggered item under the preset lexical cohesion relation are looked up in a database; if a word pair found in the database for a triggered item under the preset lexical cohesion relation is also one of the pairs formed from that triggered item and the translated second source language words of the current text, the pair is taken as a qualifying word pair, and the source language word in each qualifying pair is determined as a second trigger item of that triggered item. For any triggered item and any preset lexical cohesion relation, denote them as triggered item 1 and lexical cohesion relation 1. Each translated second source language word in the current text is paired with triggered item 1; suppose the resulting pairs are (triggered item 1, source language word 6), (triggered item 1, source language word 7) and (triggered item 1, source language word 8). The word pairs of triggered item 1 under lexical cohesion relation 1 are then looked up in the database. If the pair found is (triggered item 1, source language word 8), then, because this pair is also one of the pairs formed from triggered item 1 and the translated second source language words of the current text, it is taken as a qualifying word pair, and source language word 8 is determined as the second trigger item corresponding to triggered item 1 under lexical cohesion relation 1.
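The pair-matching alternative reduces to intersecting two sets of (triggered item, source word) pairs. A minimal sketch, assuming a hypothetical dictionary `pair_db` that maps each relation name to the pairs stored in the database; the placeholder words mirror the example above and the function name is illustrative only.

```python
def second_trigger_items_by_pairs(triggered_item, relation, translated_words, pair_db):
    """Return the source words of pairs found both in the text and in the database.

    pair_db maps a relation name to a set of (triggered item, source word) pairs;
    this layout is an assumption made for illustration.
    """
    # Pairs formed from the current text: each translated word with the triggered item.
    text_pairs = {(triggered_item, word) for word in translated_words}
    # Pairs stored in the database for this triggered item under the given relation.
    db_pairs = {pair for pair in pair_db.get(relation, set()) if pair[0] == triggered_item}
    # Qualifying pairs lie in both sets; their source words are the second trigger items.
    return sorted(word for _, word in text_pairs & db_pairs)


pair_db = {"relation 1": {("triggered item 1", "source word 8"),
                          ("triggered item 1", "source word 9")}}
words = ["source word 6", "source word 7", "source word 8"]
print(second_trigger_items_by_pairs("triggered item 1", "relation 1", words, pair_db))
# ['source word 8']
```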
It should be noted that the following situation may occur: under a certain lexical cohesion relation, a triggered item has no corresponding second trigger item in the second source language words. In this case, the second trigger items of that triggered item under the other lexical cohesion relations can still be determined by the above method. If the triggered item has no corresponding second trigger item in the second source language words under any lexical cohesion relation, the method proceeds, in the same way, to determine the second trigger items of the other triggered items under each lexical cohesion relation.
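The fall-through behaviour in this note can be sketched as a nested loop that skips relations, and ultimately triggered items, for which nothing is found. The nested dictionary `relation_db` and the function name are hypothetical illustrations, not structures defined by the description.

```python
def collect_second_trigger_items(triggered_items, relations, translated_words, relation_db):
    """Map (triggered item, relation) -> second trigger items, skipping empty cases.

    relation_db[relation][triggered_item] is the set of source words related to
    the triggered item under that relation (hypothetical layout).
    """
    unique_words = list(dict.fromkeys(translated_words))  # dedupe, keep text order
    result = {}
    for item in triggered_items:
        for relation in relations:
            related = relation_db.get(relation, {}).get(item, set())
            triggers = [w for w in unique_words if w in related]
            if not triggers:
                continue  # none under this relation: try the next relation
            result[(item, relation)] = triggers
    # An item with no trigger item under any relation never appears in result,
    # and the loop simply moves on to the next triggered item.
    return result


relation_db = {"hyponymy": {"vehicle": {"passenger vehicle", "car", "aircraft"}}}
found = collect_second_trigger_items(
    ["vehicle", "transportation"], ["hyponymy", "synonymy"],
    ["car", "orange", "passenger vehicle"], relation_db)
print(found)  # {('vehicle', 'hyponymy'): ['car', 'passenger vehicle']}
```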
The present embodiment does not specifically limit how the sub-pointwise mutual information between each triggered item and each corresponding second trigger item is calculated from the corpus. Methods include but are not limited to: calculating, from the corpus, a second joint probability of each triggered item and each corresponding second trigger item under the corresponding preset lexical cohesion relation; calculating, from the corpus, a third marginal probability of each triggered item under the corresponding preset lexical cohesion relation and a fourth marginal probability of the second trigger item corresponding to each triggered item under the corresponding preset lexical cohesion relation; and calculating the sub-pointwise mutual information between each triggered item and each corresponding second trigger item from the second joint probability, the third marginal probability and the fourth marginal probability.
Calculating the second joint probability from the corpus may include but is not limited to: counting a fifth quantity, namely the number of texts in the corpus in which the triggered item and the corresponding second trigger item both appear and satisfy the corresponding preset lexical cohesion relation; counting a sixth quantity, namely the number of texts in the corpus that contain the preset lexical cohesion relation corresponding to the triggered item and the second trigger item; and calculating the second joint probability of the triggered item and the second trigger item under that relation from the fifth quantity and the sixth quantity. Of course, other calculation methods may also be adopted according to actual conditions; the present embodiment does not specifically limit this.
The present embodiment does not specifically limit how the second joint probability is calculated from the fifth quantity and the sixth quantity; this includes but is not limited to taking the quotient of the fifth quantity divided by the sixth quantity as the second joint probability of the triggered item and the corresponding second trigger item under the corresponding preset lexical cohesion relation.
For ease of understanding, the process of calculating the second joint probability is now explained with the triggered item "vehicle", the trigger item "car" and the preset lexical cohesion relation of hyponymy as an example:
Suppose the corpus stores bilingual Chinese-English texts, and the sixth quantity, the number of texts in the corpus that contain the hyponymy relation, is 5. In the first text, the triggered item "vehicle" and the trigger item "car" both appear and satisfy the hyponymy relation, that is, the trigger item "car" is a hyponym of the triggered item "vehicle"; only the triggered item "vehicle" appears in the second text; only the trigger item "car" appears in the third text; in the fourth text, "vehicle" and "car" both appear and satisfy the hyponymy relation; and only "vehicle" appears in the fifth text. The fifth quantity, the number of texts in the corpus in which "vehicle" and "car" both appear and satisfy the hyponymy relation, is therefore 2.
The second joint probability of the triggered item "vehicle" and the trigger item "car" under the hyponymy relation is calculated from the fifth and sixth quantities, giving 2/5. Of course, other methods may also be adopted to calculate the second joint probability from the fifth and sixth quantities, or to calculate it from the corpus in general; the present embodiment does not specifically limit this.
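The two counts and the resulting second joint probability can be reproduced for the five-text example as follows. This sketch assumes each corpus text has already been reduced to a hypothetical (words, relations, links) tuple; how texts are annotated with lexical cohesion relations is outside its scope, and the function name is illustrative. `fractions.Fraction` keeps the quotient exact.

```python
from fractions import Fraction


def second_joint_probability(texts, triggered, trigger, relation):
    """Fifth quantity divided by sixth quantity.

    Each text is a (words, relations, links) tuple: the words that occur in it,
    the relation types present in it, and the (triggered, trigger, relation)
    triples that hold in it. This representation is an assumption.
    """
    # Sixth quantity: texts containing the preset lexical cohesion relation.
    sixth = sum(1 for _, relations, _ in texts if relation in relations)
    # Fifth quantity: texts where both words occur and are linked by the relation.
    fifth = sum(
        1
        for words, _, links in texts
        if triggered in words and trigger in words
        and (triggered, trigger, relation) in links
    )
    if sixth == 0:
        raise ValueError("no text in the corpus contains the relation")
    return Fraction(fifth, sixth)


H = "hyponymy"
corpus = [
    ({"vehicle", "car"}, {H}, {("vehicle", "car", H)}),  # text 1: both, linked
    ({"vehicle"}, {H}, set()),                           # text 2: only "vehicle"
    ({"car"}, {H}, set()),                               # text 3: only "car"
    ({"vehicle", "car"}, {H}, {("vehicle", "car", H)}),  # text 4: both, linked
    ({"vehicle"}, {H}, set()),                           # text 5: only "vehicle"
]
print(second_joint_probability(corpus, "vehicle", "car", H))  # 2/5
```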
Calculating, from the corpus, the third marginal probability of each triggered item and the fourth marginal probability of the corresponding second trigger item under the corresponding preset lexical cohesion relation may include but is not limited to: counting a seventh quantity, namely the number of texts in the corpus in which the triggered item appears; counting an eighth quantity, namely the number of texts in the corpus in which the second trigger item corresponding to the triggered item appears; counting the sixth quantity, namely the number of texts in the corpus that contain the preset lexical cohesion relation corresponding to the triggered item and the second trigger item; calculating the third marginal probability of the triggered item under the corresponding preset lexical cohesion relation from the seventh and sixth quantities; and calculating the fourth marginal probability of the corresponding second trigger item under that relation from the eighth and sixth quantities. Of course, other calculation methods may also be adopted according to actual conditions; the present embodiment does not specifically limit this.
The present embodiment does not specifically limit how the third marginal probability is calculated from the seventh and sixth quantities, nor how the fourth marginal probability is calculated from the eighth and sixth quantities. This includes but is not limited to: taking the quotient of the seventh quantity divided by the sixth quantity as the third marginal probability of each triggered item under the corresponding preset lexical cohesion relation, and taking the quotient of the eighth quantity divided by the sixth quantity as the fourth marginal probability of the second trigger item corresponding to each triggered item under that relation.
For ease of understanding, the process of calculating the third and fourth marginal probabilities is again explained with the triggered item "vehicle", the trigger item "car" and the preset lexical cohesion relation of hyponymy as an example:
Suppose the sixth quantity, the number of texts in the corpus that contain the hyponymy relation, is 5. In the first text, the triggered item "vehicle" and the trigger item "car" both appear and satisfy the hyponymy relation, that is, the trigger item "car" is a hyponym of the triggered item "vehicle"; only the triggered item "vehicle" appears in the second text; only the trigger item "car" appears in the third text; in the fourth text, "vehicle" and "car" both appear and satisfy the hyponymy relation; and only "vehicle" appears in the fifth text.
The seventh quantity, the number of texts in the corpus in which the triggered item "vehicle" appears, is then 4. Likewise, the eighth quantity, the number of texts in the corpus in which the trigger item "car" corresponding to "vehicle" appears, is 3.
The third marginal probability of the triggered item "vehicle" under the hyponymy relation, calculated from the seventh and sixth quantities, is 4/5. The fourth marginal probability of the trigger item "car" under the hyponymy relation, calculated from the eighth and sixth quantities, is 3/5. Of course, other methods may be adopted according to actual conditions to calculate the third marginal probability from the seventh and sixth quantities, or the fourth marginal probability from the eighth and sixth quantities; the present embodiment does not specifically limit this.
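A matching sketch for the seventh, eighth and sixth quantities and the two marginal probabilities, again on the five-text example. It assumes a hypothetical (words, relations) pair per text and, following the wording above, counts the seventh and eighth quantities over all texts of the corpus; the function name is illustrative.

```python
from fractions import Fraction


def marginal_probabilities(texts, triggered, trigger, relation):
    """Return (third, fourth) marginal probabilities as seventh/sixth and eighth/sixth.

    Each text is a (words, relations) pair of sets; this representation is an
    assumption. The seventh and eighth quantities are counted over all texts.
    """
    sixth = sum(1 for _, relations in texts if relation in relations)
    seventh = sum(1 for words, _ in texts if triggered in words)
    eighth = sum(1 for words, _ in texts if trigger in words)
    if sixth == 0:
        raise ValueError("no text in the corpus contains the relation")
    return Fraction(seventh, sixth), Fraction(eighth, sixth)


H = "hyponymy"
corpus = [
    ({"vehicle", "car"}, {H}),  # text 1
    ({"vehicle"}, {H}),         # text 2
    ({"car"}, {H}),             # text 3
    ({"vehicle", "car"}, {H}),  # text 4
    ({"vehicle"}, {H}),         # text 5
]
third, fourth = marginal_probabilities(corpus, "vehicle", "car", H)
print(third, fourth)  # 4/5 3/5
```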
The present embodiment does not specifically limit how the sub-pointwise mutual information between each triggered item and each corresponding second trigger item is calculated from the second joint probability, the third marginal probability and the fourth marginal probability. This includes but is not limited to: multiplying the third marginal probability by the fourth marginal probability, dividing the second joint probability by this product, and taking the logarithm of the quotient as the final result, which is used as the sub-pointwise mutual information between the triggered item and the corresponding second trigger item. This can be expressed by the following formula:
$$\mathrm{PMI}(xRy) = \log \frac{p(x, y, R)}{p(x, R)\, p(y, R)}$$
where p(x, y, R) is the second joint probability, x is the triggered item, y is the trigger item, R is the preset lexical cohesion relation, p(x, R) is the third marginal probability, p(y, R) is the fourth marginal probability, and PMI(xRy) is the sub-pointwise mutual information between the triggered item x and the trigger item y.
For example, using the results of the example above, p(x, y, R) = 2/5, p(x, R) = 4/5 and p(y, R) = 3/5, so PMI(xRy) = log((2/5) / ((4/5) × (3/5))) = log(5/6). It should be noted that the base of the logarithm may be 2 or any other value as required; the present embodiment does not specifically limit this.
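The formula and the worked example can be checked with a few lines of Python. This is a sketch with base 2 as the default, one of the options the description allows; the function name `sub_pmi` is hypothetical. With these numbers the value is negative (log2(5/6) ≈ -0.263), because the observed joint probability 2/5 is smaller than the product 12/25 of the two marginal probabilities.

```python
import math
from fractions import Fraction


def sub_pmi(joint, marginal_triggered, marginal_trigger, base=2):
    """PMI(xRy) = log(p(x, y, R) / (p(x, R) * p(y, R))).

    joint = p(x, y, R), marginal_triggered = p(x, R), marginal_trigger = p(y, R).
    Base 2 is the default here; other bases are equally permitted.
    """
    ratio = joint / (marginal_triggered * marginal_trigger)
    return math.log(ratio, base)


value = sub_pmi(Fraction(2, 5), Fraction(4, 5), Fraction(3, 5))
print(round(value, 3))  # -0.263
```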
After the sub-pointwise mutual information between each triggered item and each corresponding second trigger item has been calculated, the second pointwise mutual information between each triggered item and the second source language words can be determined from these sub-pointwise mutual information values. The present embodiment does not specifically limit how this is done; methods include but are not limited to: first determining, from the sub-pointwise mutual information values, the pointwise mutual information between each triggered item and the second source language words under each lexical cohesion relation, and then determining the second pointwise mutual information between each triggered item and the second source language words from these per-relation values.
Determining the pointwise mutual information between each triggered item and the second source language words under each lexical cohesion relation from the sub-pointwise mutual information values may include but is not limited to: counting the number of second trigger items corresponding to each triggered item under each lexical cohesion relation; multiplying together the sub-pointwise mutual information values between the triggered item and each of its second trigger items under that relation; taking a root of the product; and using the result as the pointwise mutual information between the triggered item and the second source language words under that relation. The order of the root may be the counted number of second trigger items of the triggered item under that relation; the present embodiment does not specifically limit this. For any triggered item and any lexical cohesion relation, denote them as triggered item 1 and lexical cohesion relation 1: count the number of second trigger items corresponding to triggered item 1 under lexical cohesion relation 1, multiply the sub-pointwise mutual information values between triggered item 1 and each of these second trigger items, take a root of the product, and use the result as the pointwise mutual information between triggered item 1 and the second source language words under lexical cohesion relation 1. The order of the root may be the counted number of second trigger items of triggered item 1 under lexical cohesion relation 1; the present embodiment does not specifically limit this.
To explain further, consider determining the pointwise mutual information between one triggered item and the second source language words under one lexical cohesion relation. Denote the triggered item as triggered item 1 and suppose the lexical cohesion relation is hyponymy. If triggered item 1 has n trigger items in total under the hyponymy relation, the sub-pointwise mutual information values between triggered item 1 and all of these trigger items are multiplied together, the n-th root of the product is taken, and the result is used as the pointwise mutual information between triggered item 1 and the second source language words under the hyponymy relation.
For example, suppose the preset lexical cohesion relation is hyponymy and, under this relation, the trigger items of the triggered item "vehicle" among the second source language words are "car" and "passenger vehicle", where the sub-pointwise mutual information between "vehicle" and "car" is 0.2 and that between "vehicle" and "passenger vehicle" is 0.8. The pointwise mutual information between "vehicle" and the second source language words under the hyponymy relation is then (0.8 × 0.2)^0.5 = 0.4; that is, the two sub-pointwise mutual information values are multiplied and the square root of the product is taken. Accordingly, when "vehicle" has n trigger items under the hyponymy relation, the sub-pointwise mutual information values between "vehicle" and each trigger item are multiplied and the n-th root of the product is taken to obtain the pointwise mutual information between "vehicle" and the second source language words under the hyponymy relation, where n is a positive integer.
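The per-relation combination is the geometric mean of the sub-pointwise mutual information values. A minimal sketch follows; the function name is hypothetical, and because Python's `**` returns a complex number for a negative base with a fractional exponent, the sketch accepts only a non-negative product, as in the example.

```python
import math


def relation_pmi(sub_pmis):
    """n-th root of the product of the n sub-PMI values under one relation."""
    n = len(sub_pmis)
    if n == 0:
        raise ValueError("the triggered item has no trigger item under this relation")
    product = math.prod(sub_pmis)
    if product < 0:
        # A negative base with a fractional exponent would yield a complex number.
        raise ValueError("sub-PMI product is negative")
    return product ** (1.0 / n)


print(round(relation_pmi([0.2, 0.8]), 6))  # 0.4
```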
The present embodiment does not specifically limit how the second pointwise mutual information between each triggered item and the second source language words is determined from the per-relation pointwise mutual information values. This includes but is not limited to: summing the pointwise mutual information between the triggered item and the second source language words over all lexical cohesion relations, and using the sum as the second pointwise mutual information between the triggered item and the second source language words. For any triggered item, denoted triggered item 1, the pointwise mutual information values between triggered item 1 and the second source language words under each lexical cohesion relation are summed, and the sum is used as the second pointwise mutual information between triggered item 1 and the second source language words.
For example, suppose there are two preset lexical cohesion relations, hyponymy and synonymy. The pointwise mutual information between the triggered item "vehicle" and the second source language words is 0.4 under the hyponymy relation and 0.6 under the synonymy relation, so the second pointwise mutual information between "vehicle" and the second source language words is 0.4 + 0.6 = 1. In addition, a weight may be set in advance for each lexical cohesion relation, and the per-relation pointwise mutual information values of each triggered item may be multiplied by the corresponding weights before being summed. Of course, other ways of combining the values may also be adopted according to actual conditions; the present embodiment does not specifically limit this.
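Summing over relations, with optional per-relation weights, can be sketched as below. Relations without an explicit weight default to 1.0, which reproduces the plain sum of the example; the function name and any weight values are hypothetical.

```python
def second_pmi(per_relation, weights=None):
    """Sum the per-relation PMI values, optionally weighting each relation.

    per_relation maps relation name -> PMI; weights maps relation name -> weight.
    Relations missing from weights get weight 1.0 (plain sum, as in the example).
    """
    weights = weights or {}
    return sum(weights.get(rel, 1.0) * value for rel, value in per_relation.items())


values = {"hyponymy": 0.4, "synonymy": 0.6}
print(second_pmi(values))  # 1.0
```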
204: Determine the translation result of the first source language word according to the first pointwise mutual information and the second pointwise mutual information corresponding to each candidate target language word.
The present embodiment does not specifically limit how the translation result of the first source language word is determined from the first and second pointwise mutual information of each candidate target language word. This includes but is not limited to: setting one weight for the first pointwise mutual information and another for the second pointwise mutual information; multiplying the first and second pointwise mutual information of each candidate target language word by their respective weights and summing the results; using the sum as the score of that candidate; comparing the scores calculated in this way for all candidates; and taking the candidate target language word with the highest score as the translation result of the first source language word.
For example, suppose the first source language word to be translated is "vehicles", the corresponding candidate target language words are "vehicle" and "transportation", the weight of the first pointwise mutual information is 0.4 and the weight of the second pointwise mutual information is 0.6. The candidate "vehicle" has a first pointwise mutual information of 1 and a second pointwise mutual information of 2, so its score is 1 × 0.4 + 2 × 0.6 = 1.6. Likewise, the candidate "transportation" has a first pointwise mutual information of 0.8 and a second pointwise mutual information of 3, so its score is 0.8 × 0.4 + 3 × 0.6 = 2.12. Comparing the two scores, "transportation" scores higher, so "transportation" can be taken as the translation result of the first source language word "vehicles".
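The weighted scoring and selection step can be sketched as follows, using the example's weights 0.4 and 0.6 as defaults. The function name and the (first PMI, second PMI) tuple layout are assumptions made for illustration.

```python
def choose_translation(candidates, w_first=0.4, w_second=0.6):
    """Score each candidate as w_first * first PMI + w_second * second PMI.

    candidates maps a candidate target word -> (first PMI, second PMI).
    Returns the highest-scoring candidate and the scores of all candidates.
    """
    scores = {
        word: w_first * first + w_second * second
        for word, (first, second) in candidates.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


best, scores = choose_translation({"vehicle": (1, 2), "transportation": (0.8, 3)})
print(best)  # transportation
```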
In the method provided by the present embodiment, at least one candidate target language word corresponding to the first source language word to be translated in the current text is determined; the first pointwise mutual information between each candidate and the target language words corresponding to each translated second source language word in the current text, and the second pointwise mutual information between each candidate and the second source language words, are determined from the corpus; and the translation result of the first source language word is then determined from the first and second pointwise mutual information of each candidate. Because the source language word is translated using both the pointwise mutual information within the target language side and the pointwise mutual information between the source language side and the target language side, translation from the source language into the target language achieves higher quality.
Embodiment three
An embodiment of the present invention provides a machine translation apparatus configured to perform the functions performed by the terminal in the method provided by Embodiment one or Embodiment two above. Referring to Fig. 3, the apparatus comprises:
Acquisition module 301, configured to obtain the first source language word to be translated in the current text;
First determination module 302, configured to determine at least one candidate target language word corresponding to the first source language word;
Second determination module 303, configured to determine, according to the corpus, the first pointwise mutual information between each candidate target language word and the target language words corresponding to each translated second source language word in the current text;
Third determination module 304, configured to determine, according to the corpus, the second pointwise mutual information between each candidate target language word and the second source language words;
Fourth determination module 305, configured to determine the translation result of the first source language word according to the first pointwise mutual information and the second pointwise mutual information corresponding to each candidate target language word.
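One way to picture how modules 301 to 305 fit together is a small Python class whose fields stand in for the modules. This is a hypothetical wiring sketch, not the claimed apparatus: each module is a placeholder callable, and the scoring in `translate` follows the weighted-sum example of step 204.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class MachineTranslationApparatus:
    """Hypothetical wiring of modules 301 to 305; each callable is a placeholder."""
    acquire: Callable[[str], str]                  # acquisition module 301
    candidates: Callable[[str], List[str]]         # first determination module 302
    first_pmi: Callable[[str, List[str]], float]   # second determination module 303
    second_pmi: Callable[[str, List[str]], float]  # third determination module 304
    w_first: float = 0.4
    w_second: float = 0.6

    def translate(self, text: str, translated_source: List[str],
                  translated_target: List[str]) -> str:
        word = self.acquire(text)
        scores = {
            cand: self.w_first * self.first_pmi(cand, translated_target)
            + self.w_second * self.second_pmi(cand, translated_source)
            for cand in self.candidates(word)
        }
        # Fourth determination module 305: keep the highest-scoring candidate.
        return max(scores, key=scores.get)


apparatus = MachineTranslationApparatus(
    acquire=lambda text: "vehicles",
    candidates=lambda word: ["vehicle", "transportation"],
    first_pmi=lambda cand, targets: {"vehicle": 1.0, "transportation": 0.8}[cand],
    second_pmi=lambda cand, sources: {"vehicle": 2.0, "transportation": 3.0}[cand],
)
print(apparatus.translate("...", [], []))  # transportation
```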
In a preferred embodiment, referring to Fig. 4, the second determination module 303 comprises:
First determining unit 3031, configured to take each candidate target language word as a triggered item and to determine, among the target language words corresponding to the second source language words, at least one first trigger item corresponding to each triggered item, where a first trigger item is a target language word that corresponds to the triggered item under a preset lexical cohesion relation;
First computing unit 3032, configured to calculate, according to the corpus, the sub-pointwise mutual information between each triggered item and each corresponding first trigger item;
Second determining unit 3033, configured to determine the first pointwise mutual information between each triggered item and the target language words corresponding to the second source language words according to the sub-pointwise mutual information between each triggered item and each corresponding first trigger item.
In a preferred embodiment, referring to Fig. 5, the third determination module 304 comprises:
Third determining unit 3041, configured to take each candidate target language word as a triggered item and to determine, among the second source language words, at least one second trigger item corresponding to each triggered item, where a second trigger item is a second source language word that corresponds to the triggered item under a preset lexical cohesion relation;
Second computing unit 3042, configured to calculate, according to the corpus, the sub-pointwise mutual information between each triggered item and each corresponding second trigger item;
Fourth determining unit 3043, configured to determine the second pointwise mutual information between each triggered item and the second source language words according to the sub-pointwise mutual information between each triggered item and each corresponding second trigger item.
In a preferred embodiment, referring to Fig. 6, the first computing unit 3032 comprises:
First computing subunit 30321, configured to calculate, according to the corpus, a first joint probability of each triggered item and each corresponding first trigger item under the corresponding preset lexical cohesion relation;
Second computing subunit 30322, configured to calculate, according to the corpus, a first marginal probability of each triggered item under the corresponding preset lexical cohesion relation and a second marginal probability of the first trigger item corresponding to each triggered item under that relation;
Third computing subunit 30323, configured to calculate the sub-pointwise mutual information between each triggered item and each corresponding first trigger item according to the first joint probability, the first marginal probability and the second marginal probability.
In a preferred embodiment, the first computing subunit 30321 is configured to: count a first quantity, namely the number of texts in the corpus in which the triggered item and the corresponding first trigger item both appear and satisfy the corresponding preset lexical cohesion relation; count a second quantity, namely the number of texts in the corpus that contain the preset lexical cohesion relation corresponding to the triggered item and the first trigger item; and calculate the first joint probability of the triggered item and the first trigger item under that relation from the first quantity and the second quantity.
In a preferred embodiment, the second computing subunit 30322 is configured to: count a third quantity, namely the number of texts in the corpus in which the triggered item appears; count a fourth quantity, namely the number of texts in the corpus in which the first trigger item corresponding to the triggered item appears; count the second quantity as defined above; calculate the first marginal probability of the triggered item under the corresponding preset lexical cohesion relation from the third quantity and the second quantity; and calculate the second marginal probability of the corresponding first trigger item under that relation from the fourth quantity and the second quantity.
As a kind of preferred embodiment, the second computing unit 3042, see Fig. 7, comprising:
4th computation subunit 30421, triggers second joint probability of item under corresponding default Lexical Cohesion relation for calculating each item and corresponding each second that is triggered according to corpus;
5th computation subunit 30422, for calculating each three marginal probability of item under the default Lexical Cohesion relation of correspondence and four marginal probability of the second triggering item under the default Lexical Cohesion relation of correspondence of each item correspondence that is triggered of being triggered according to corpus;
6th computation subunit 30423, calculates each item and corresponding each second that is triggered trigger sub-pointwise mutual information between item for triggering second joint probability of item under corresponding default Lexical Cohesion relation, the 3rd marginal probability and the 4th marginal probability according to each item and corresponding each second that is triggered.
As a preferred embodiment, the fourth computation subunit 30421 is configured to: count, in the texts of the corpus, a fifth quantity of texts in which each triggered item and each corresponding second trigger item occur together and satisfy the corresponding preset lexical cohesion relation; count, in the texts of the corpus, a sixth quantity of texts that exhibit the preset lexical cohesion relation corresponding to each triggered item and each corresponding second trigger item; and calculate, from the fifth quantity and the sixth quantity, the second joint probability of each triggered item and each corresponding second trigger item under the corresponding preset lexical cohesion relation.
As a preferred embodiment, the fifth computation subunit 30422 is configured to: count, in the texts of the corpus, a seventh quantity of texts in which each triggered item occurs; count, in the texts of the corpus, an eighth quantity of texts in which the second trigger item corresponding to each triggered item occurs; count, in the texts of the corpus, the sixth quantity of texts that exhibit the preset lexical cohesion relation corresponding to each triggered item and each corresponding second trigger item; calculate, from the seventh quantity and the sixth quantity, the third marginal probability of each triggered item under the corresponding preset lexical cohesion relation; and calculate, from the eighth quantity and the sixth quantity, the fourth marginal probability of the second trigger item corresponding to each triggered item under the corresponding preset lexical cohesion relation.
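Computing the sub-pointwise mutual information from the four document counts can be sketched as below. The function name `sub_pmi` is hypothetical, and the sketch assumes the standard pointwise mutual information form log(P(x, y) / (P(x) P(y))), which the text implies but does not write out. Because the text describes no smoothing, the sketch returns `None` when any count is zero instead of inventing one.

```python
import math
from typing import Optional


def sub_pmi(
    n_joint: int, n_relation: int, n_triggered: int, n_trigger: int
) -> Optional[float]:
    """Sub-PMI of a triggered item and a trigger item from document counts.

    For first trigger items pass the first, second, third and fourth
    quantities; for second trigger items pass the fifth, sixth, seventh
    and eighth quantities, in that order.
    """
    # PMI is undefined when any probability is zero.
    if min(n_joint, n_relation, n_triggered, n_trigger) <= 0:
        return None
    p_joint = n_joint / n_relation          # joint probability
    p_triggered = n_triggered / n_relation  # marginal of the triggered item
    p_trigger = n_trigger / n_relation      # marginal of the trigger item
    return math.log(p_joint / (p_triggered * p_trigger))
```

For example, counts (3, 6, 2, 3) give log(3 * 6 / (2 * 3)) = log 3, while counts (2, 10, 4, 5) give log 1 = 0, meaning the pair co-occurs exactly as often as independence would predict.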
The device provided in this embodiment first determines at least one candidate target-language word corresponding to the first source-language word to be translated in the current text. It then determines, according to the corpus, the first pointwise mutual information between each candidate target-language word and the target-language words corresponding to each translated second source-language word in the current text, as well as the second pointwise mutual information between each candidate target-language word and the second source-language words. Finally, it determines the translation result of the first source-language word from the first and second pointwise mutual information corresponding to each candidate. Because the source-language word to be translated is translated using both the pointwise mutual information within the target-language side and the pointwise mutual information from the source-language side to the target-language side, the quality of translating the source language into the target language is higher.
Embodiment four
This embodiment provides a terminal that can be used to perform the machine translation method provided in the above embodiments. Referring to Fig. 8, the terminal 800 comprises:
The terminal 800 may include components such as an RF (Radio Frequency) circuit 110, a memory 120 including one or more computer-readable storage media, an input unit 130, a display unit 140, a sensor 150, an audio circuit 160, a WiFi (Wireless Fidelity) module 170, a processor 180 including one or more processing cores, and a power supply 190. Those skilled in the art will understand that the terminal structure shown in Fig. 8 does not limit the terminal; the terminal may include more or fewer components than illustrated, combine certain components, or use a different arrangement of components. Specifically:
The RF circuit 110 can be used to receive and send signals while information is being sent and received or during a call. In particular, after receiving downlink information from a base station, it passes the information to one or more processors 180 for processing; it also sends uplink data to the base station. Generally, the RF circuit 110 includes, but is not limited to, an antenna, at least one amplifier, a tuner, one or more oscillators, a subscriber identity module (SIM) card, a transceiver, a coupler, an LNA (Low Noise Amplifier), a duplexer, and the like. In addition, the RF circuit 110 can communicate with networks and other devices through wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to GSM (Global System for Mobile Communications), GPRS (General Packet Radio Service), CDMA (Code Division Multiple Access), WCDMA (Wideband Code Division Multiple Access), LTE (Long Term Evolution), e-mail, and SMS (Short Messaging Service).
The memory 120 can be used to store software programs and modules; the processor 180 performs various functional applications and data processing by running the software programs and modules stored in the memory 120. The memory 120 may mainly include a program storage area and a data storage area: the program storage area may store an operating system and the application programs required by at least one function (such as a sound playback function or an image playback function), and the data storage area may store data created through use of the terminal 800 (such as audio data or a phone book). In addition, the memory 120 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or other volatile solid-state storage devices. Correspondingly, the memory 120 may further include a memory controller to provide the processor 180 and the input unit 130 with access to the memory 120.
The input unit 130 can be used to receive input numeric or character information and to generate keyboard, mouse, joystick, optical, or trackball signal inputs related to user settings and function control. Specifically, the input unit 130 may include a touch-sensitive surface 131 and other input devices 132. The touch-sensitive surface 131, also called a touch display screen or touchpad, can collect touch operations by the user on or near it (for example, operations performed on or near the touch-sensitive surface 131 with a finger, a stylus, or any other suitable object or accessory) and drive the corresponding connection apparatus according to a preset program. Optionally, the touch-sensitive surface 131 may include two parts: a touch detection apparatus and a touch controller. The touch detection apparatus detects the position of the user's touch, detects the signal produced by the touch operation, and passes the signal to the touch controller; the touch controller receives the touch information from the touch detection apparatus, converts it into contact coordinates, sends them to the processor 180, and can receive and execute commands sent by the processor 180. In addition, the touch-sensitive surface 131 can be implemented in multiple types, such as resistive, capacitive, infrared, and surface acoustic wave. Besides the touch-sensitive surface 131, the input unit 130 may also include other input devices 132, which may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys and on/off keys), a trackball, a mouse, a joystick, and the like.
The display unit 140 can be used to display information input by the user or provided to the user, as well as the various graphical user interfaces of the terminal 800; these graphical user interfaces may consist of graphics, text, icons, video, and any combination thereof. The display unit 140 may include a display panel 141, which may optionally be configured as an LCD (Liquid Crystal Display), an OLED (Organic Light-Emitting Diode) display, or the like. Further, the touch-sensitive surface 131 may cover the display panel 141; after detecting a touch operation on or near it, the touch-sensitive surface 131 passes the operation to the processor 180 to determine the type of touch event, and the processor 180 then provides a corresponding visual output on the display panel 141 according to the type of touch event. Although in Fig. 8 the touch-sensitive surface 131 and the display panel 141 implement the input and output functions as two independent components, in some embodiments the touch-sensitive surface 131 and the display panel 141 may be integrated to implement the input and output functions.
The terminal 800 may also include at least one sensor 150, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor and a proximity sensor: the ambient light sensor can adjust the brightness of the display panel 141 according to the ambient light, and the proximity sensor can turn off the display panel 141 and/or the backlight when the terminal 800 is moved to the ear. As one kind of motion sensor, a gravity acceleration sensor can detect the magnitude of acceleration in each direction (generally three axes) and, when stationary, the magnitude and direction of gravity; it can be used in applications that recognize the phone's orientation (such as landscape/portrait switching, related games, and magnetometer attitude calibration) and in vibration-recognition functions (such as a pedometer or tap detection). Other sensors that may also be configured in the terminal 800, such as a gyroscope, barometer, hygrometer, thermometer, and infrared sensor, are not described here.
The audio circuit 160, a speaker 161, and a microphone 162 can provide an audio interface between the user and the terminal 800. The audio circuit 160 can convert received audio data into an electrical signal and transmit it to the speaker 161, which converts it into a sound signal for output. Conversely, the microphone 162 converts a collected sound signal into an electrical signal, which the audio circuit 160 receives and converts into audio data; the audio data is then output to the processor 180 for processing and afterwards either sent through the RF circuit 110 to, for example, another terminal, or output to the memory 120 for further processing. The audio circuit 160 may also include an earphone jack to provide communication between a peripheral earphone and the terminal 800.
WiFi is a short-range wireless transmission technology. Through the WiFi module 170, the terminal 800 can help the user send and receive e-mail, browse web pages, access streaming video, and so on, providing the user with wireless broadband Internet access. Although Fig. 8 shows the WiFi module 170, it is understandable that the module is not an essential component of the terminal 800 and may be omitted as needed without changing the essence of the invention.
The processor 180 is the control center of the terminal 800. It connects all parts of the entire phone through various interfaces and lines, and performs the various functions of the terminal 800 and processes data by running or executing the software programs and/or modules stored in the memory 120 and calling the data stored in the memory 120, thereby monitoring the phone as a whole. Optionally, the processor 180 may include one or more processing cores. Preferably, the processor 180 may integrate an application processor, which mainly handles the operating system, user interface, application programs, and the like, and a modem processor, which mainly handles wireless communication. It is understandable that the modem processor may also not be integrated into the processor 180.
The terminal 800 further includes a power supply 190 (such as a battery) that powers all components. Preferably, the power supply may be logically connected to the processor 180 through a power management system, so that functions such as charging, discharging, and power consumption management are implemented by the power management system. The power supply 190 may further include any components such as one or more DC or AC power sources, a recharging system, a power failure detection circuit, a power converter or inverter, and a power status indicator.
Although not shown, the terminal 800 may further include a camera, a Bluetooth module, and the like, which are not described here. Specifically, in this embodiment, the display unit of the terminal is a touch-screen display, and the terminal further includes a memory and one or more programs, where the one or more programs are stored in the memory and configured to be executed by one or more processors. The one or more programs contain instructions for performing the following operations:
Obtaining the first source-language word to be translated in the current text, and determining at least one candidate target-language word corresponding to the first source-language word;
Determining, according to the corpus, the first pointwise mutual information between each candidate target-language word and the target-language words corresponding to each translated second source-language word in the current text, and determining, according to the corpus, the second pointwise mutual information between each candidate target-language word and the second source-language words;
Determining the translation result of the first source-language word according to the first pointwise mutual information and the second pointwise mutual information corresponding to each candidate target-language word.
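The final operation can be illustrated with a short sketch. The text does not say how the first and second pointwise mutual information are combined, so the function `choose_translation` and its `weight` parameter are hypothetical: the sketch assumes a linear interpolation of the two scores and picks the highest-scoring candidate target-language word.

```python
def choose_translation(
    candidates: dict[str, tuple[float, float]], weight: float = 0.5
) -> str:
    """Pick the candidate target-language word with the highest combined score.

    candidates maps each candidate word to (first_pmi, second_pmi).
    The linear interpolation controlled by `weight` is an assumption.
    """
    if not candidates:
        raise ValueError("no candidate target-language words")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")

    def score(word: str) -> float:
        first_pmi, second_pmi = candidates[word]
        # Target-side evidence vs. source-to-target evidence.
        return weight * first_pmi + (1.0 - weight) * second_pmi

    return max(candidates, key=score)
```

For instance, when translating the English word "bank" into Chinese with hypothetical scores {"银行": (0.2, 1.1), "河岸": (1.5, 0.4)}, the default weight gives combined scores of 0.65 and 0.95, so "河岸" (riverbank) is chosen; with `weight=0.0` only the second pointwise mutual information counts and "银行" wins.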
Assuming that the above is a first possible implementation, then in a second possible implementation provided on the basis of the first possible implementation, the memory of the terminal further contains instructions for performing the following operations:
Taking each candidate target-language word as a triggered item, and determining, among the target-language words corresponding to the second source-language words, at least one first trigger item corresponding to each triggered item, where a first trigger item is a target-language word that corresponds to the triggered item under a preset lexical cohesion relation;
Calculating, according to the corpus, the sub-pointwise mutual information between each triggered item and each corresponding first trigger item, and determining, from these sub-pointwise mutual information values, the first pointwise mutual information between each triggered item and the target-language words corresponding to the second source-language words.
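This step and its source-side counterpart share one pattern: select the trigger items that stand in the preset lexical cohesion relation with a triggered item, then fold their sub-pointwise mutual information into a single value. The sketch below assumes an arithmetic mean for that fold, since the text leaves the aggregation open, and takes the relation test and the sub-PMI lookup as caller-supplied functions; `aggregate_pmi`, `is_cohesive` and `sub_pmi` are hypothetical names. Passing the target-language words of the translated second source-language words yields the first pointwise mutual information; passing the second source-language words themselves yields the second.

```python
from typing import Callable, Iterable, Optional


def aggregate_pmi(
    triggered: str,
    context_words: Iterable[str],
    is_cohesive: Callable[[str, str], bool],
    sub_pmi: Callable[[str, str], Optional[float]],
) -> float:
    """Combine the sub-PMI values of a triggered item with its trigger items."""
    # Trigger items: context words in the preset lexical cohesion relation
    # with the triggered item.
    triggers = [w for w in context_words if is_cohesive(triggered, w)]
    # Drop pairs whose sub-PMI is undefined (e.g. zero counts).
    scores = [s for w in triggers if (s := sub_pmi(triggered, w)) is not None]
    # Assumption: arithmetic mean; 0.0 when no trigger item is found.
    return sum(scores) / len(scores) if scores else 0.0
```

With a hypothetical table {("河岸", "河流"): 1.2, ("河岸", "水"): 0.6} and translated target words ["河流", "钱", "水"], the trigger items of "河岸" are "河流" and "水", and the aggregated value is their mean, 0.9.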
In a third possible implementation provided on the basis of the first possible implementation, the memory of the terminal further contains instructions for performing the following operations:
Taking each candidate target-language word as a triggered item, and determining, among the second source-language words, at least one second trigger item corresponding to each triggered item, where a second trigger item is a second source-language word that corresponds to the triggered item under a preset lexical cohesion relation;
Calculating, according to the corpus, the sub-pointwise mutual information between each triggered item and each corresponding second trigger item, and determining, from these sub-pointwise mutual information values, the second pointwise mutual information between each triggered item and the second source-language words.
In a fourth possible implementation provided on the basis of the second possible implementation, the memory of the terminal further contains instructions for performing the following operations:
Calculating, according to the corpus, the first joint probability of each triggered item and each corresponding first trigger item under the corresponding preset lexical cohesion relation;
Calculating, according to the corpus, the first marginal probability of each triggered item under the corresponding preset lexical cohesion relation and the second marginal probability of the first trigger item corresponding to each triggered item under the corresponding preset lexical cohesion relation;
Calculating the sub-pointwise mutual information between each triggered item and each corresponding first trigger item from their first joint probability under the corresponding preset lexical cohesion relation, the first marginal probability, and the second marginal probability.
In a fifth possible implementation provided on the basis of the fourth possible implementation, the memory of the terminal further contains instructions for performing the following operations:
Counting, in the texts of the corpus, the first quantity of texts in which each triggered item and each corresponding first trigger item occur together and satisfy the corresponding preset lexical cohesion relation;
Counting, in the texts of the corpus, the second quantity of texts that exhibit the preset lexical cohesion relation corresponding to each triggered item and each corresponding first trigger item;
Calculating, from the first quantity and the second quantity, the first joint probability of each triggered item and each corresponding first trigger item under the corresponding preset lexical cohesion relation.
In a sixth possible implementation provided on the basis of the fourth possible implementation, the memory of the terminal further contains instructions for performing the following operations:
Counting, in the texts of the corpus, the third quantity of texts in which each triggered item occurs;
Counting, in the texts of the corpus, the fourth quantity of texts in which the first trigger item corresponding to each triggered item occurs;
Counting, in the texts of the corpus, the second quantity of texts that exhibit the preset lexical cohesion relation corresponding to each triggered item and each corresponding first trigger item;
Calculating, from the third quantity and the second quantity, the first marginal probability of each triggered item under the corresponding preset lexical cohesion relation, and calculating, from the fourth quantity and the second quantity, the second marginal probability of the first trigger item corresponding to each triggered item under the corresponding preset lexical cohesion relation.
In a seventh possible implementation provided on the basis of the third possible implementation, the memory of the terminal further contains instructions for performing the following operations:
Calculating, according to the corpus, the second joint probability of each triggered item and each corresponding second trigger item under the corresponding preset lexical cohesion relation;
Calculating, according to the corpus, the third marginal probability of each triggered item under the corresponding preset lexical cohesion relation and the fourth marginal probability of the second trigger item corresponding to each triggered item under the corresponding preset lexical cohesion relation;
Calculating the sub-pointwise mutual information between each triggered item and each corresponding second trigger item from their second joint probability under the corresponding preset lexical cohesion relation, the third marginal probability, and the fourth marginal probability.
In an eighth possible implementation provided on the basis of the seventh possible implementation, the memory of the terminal further contains instructions for performing the following operations:
Counting, in the texts of the corpus, the fifth quantity of texts in which each triggered item and each corresponding second trigger item occur together and satisfy the corresponding preset lexical cohesion relation;
Counting, in the texts of the corpus, the sixth quantity of texts that exhibit the preset lexical cohesion relation corresponding to each triggered item and each corresponding second trigger item;
Calculating, from the fifth quantity and the sixth quantity, the second joint probability of each triggered item and each corresponding second trigger item under the corresponding preset lexical cohesion relation.
In a ninth possible implementation provided on the basis of the seventh possible implementation, the memory of the terminal further contains instructions for performing the following operations:
Counting, in the texts of the corpus, the seventh quantity of texts in which each triggered item occurs;
Counting, in the texts of the corpus, the eighth quantity of texts in which the second trigger item corresponding to each triggered item occurs;
Counting, in the texts of the corpus, the sixth quantity of texts that exhibit the preset lexical cohesion relation corresponding to each triggered item and each corresponding second trigger item;
Calculating, from the seventh quantity and the sixth quantity, the third marginal probability of each triggered item under the corresponding preset lexical cohesion relation, and calculating, from the eighth quantity and the sixth quantity, the fourth marginal probability of the second trigger item corresponding to each triggered item under the corresponding preset lexical cohesion relation.
The terminal provided by the present invention first determines at least one candidate target-language word corresponding to the first source-language word to be translated in the current text. It then determines, according to the corpus, the first pointwise mutual information between each candidate target-language word and the target-language words corresponding to each translated second source-language word in the current text, as well as the second pointwise mutual information between each candidate target-language word and the second source-language words. Finally, it determines the translation result of the first source-language word from the first and second pointwise mutual information corresponding to each candidate. Because the source-language word to be translated is translated using both the pointwise mutual information within the target-language side and the pointwise mutual information from the source-language side to the target-language side, the quality of translating the source language into the target language is higher.
Embodiment eight
An embodiment of the present invention further provides a computer-readable storage medium, which may be the computer-readable storage medium included in the memory of the above embodiment, or may exist independently without being assembled into a terminal. The computer-readable storage medium stores one or more programs, and the one or more programs are executed by one or more processors to perform the machine translation method, the method comprising:
Obtaining the first source-language word to be translated in the current text, and determining at least one candidate target-language word corresponding to the first source-language word;
Determining, according to the corpus, the first pointwise mutual information between each candidate target-language word and the target-language words corresponding to each translated second source-language word in the current text, and determining, according to the corpus, the second pointwise mutual information between each candidate target-language word and the second source-language words;
Determining the translation result of the first source-language word according to the first pointwise mutual information and the second pointwise mutual information corresponding to each candidate target-language word.
Assuming that the above is a first possible implementation, then in a second possible implementation provided on the basis of the first possible implementation, the determining, according to the corpus, of the first pointwise mutual information between each candidate target-language word and the target-language words corresponding to each translated second source-language word in the current text comprises:
Taking each candidate target-language word as a triggered item, and determining, among the target-language words corresponding to the second source-language words, at least one first trigger item corresponding to each triggered item, where a first trigger item is a target-language word that corresponds to the triggered item under a preset lexical cohesion relation;
Calculating, according to the corpus, the sub-pointwise mutual information between each triggered item and each corresponding first trigger item, and determining, from these sub-pointwise mutual information values, the first pointwise mutual information between each triggered item and the target-language words corresponding to the second source-language words.
In a third possible implementation provided on the basis of the first possible implementation, the determining, according to the corpus, of the second pointwise mutual information between each candidate target-language word and the second source-language words comprises:
Taking each candidate target-language word as a triggered item, and determining, among the second source-language words, at least one second trigger item corresponding to each triggered item, where a second trigger item is a second source-language word that corresponds to the triggered item under a preset lexical cohesion relation;
Calculating, according to the corpus, the sub-pointwise mutual information between each triggered item and each corresponding second trigger item, and determining, from these sub-pointwise mutual information values, the second pointwise mutual information between each triggered item and the second source-language words.
In a fourth possible implementation provided on the basis of the second possible implementation, the calculating, according to the corpus, of the sub-pointwise mutual information between each triggered item and each corresponding first trigger item comprises:
Calculating, according to the corpus, the first joint probability of each triggered item and each corresponding first trigger item under the corresponding preset lexical cohesion relation;
Calculating, according to the corpus, the first marginal probability of each triggered item under the corresponding preset lexical cohesion relation and the second marginal probability of the first trigger item corresponding to each triggered item under the corresponding preset lexical cohesion relation;
Calculating the sub-pointwise mutual information between each triggered item and each corresponding first trigger item from their first joint probability under the corresponding preset lexical cohesion relation, the first marginal probability, and the second marginal probability.
In a fifth possible implementation provided on the basis of the fourth possible implementation, the calculating, according to the corpus, of the first joint probability of each triggered item and each corresponding first trigger item under the corresponding preset lexical cohesion relation comprises:
Counting, in the texts of the corpus, the first quantity of texts in which each triggered item and each corresponding first trigger item occur together and satisfy the corresponding preset lexical cohesion relation;
Counting, in the texts of the corpus, the second quantity of texts that exhibit the preset lexical cohesion relation corresponding to each triggered item and each corresponding first trigger item;
Calculating, from the first quantity and the second quantity, the first joint probability of each triggered item and each corresponding first trigger item under the corresponding preset lexical cohesion relation.
In a sixth possible implementation provided on the basis of the fourth possible implementation, the calculating, according to the corpus, of the first marginal probability of each triggered item under the corresponding preset lexical cohesion relation and the second marginal probability of the first trigger item corresponding to each triggered item under the corresponding preset lexical cohesion relation comprises:
Counting, in the texts of the corpus, the third quantity of texts in which each triggered item occurs;
Counting, in the texts of the corpus, the fourth quantity of texts in which the first trigger item corresponding to each triggered item occurs;
Counting, in the texts of the corpus, the second quantity of texts that exhibit the preset lexical cohesion relation corresponding to each triggered item and each corresponding first trigger item;
Calculating, from the third quantity and the second quantity, the first marginal probability of each triggered item under the corresponding preset lexical cohesion relation, and calculating, from the fourth quantity and the second quantity, the second marginal probability of the first trigger item corresponding to each triggered item under the corresponding preset lexical cohesion relation.
In a seventh possible implementation provided on the basis of the third possible implementation, the calculating, according to the corpus, of the sub-pointwise mutual information between each triggered item and each corresponding second trigger item comprises:
Calculating, according to the corpus, the second joint probability of each triggered item and each corresponding second trigger item under the corresponding preset lexical cohesion relation;
Calculating, according to the corpus, the third marginal probability of each triggered item under the corresponding preset lexical cohesion relation and the fourth marginal probability of the second trigger item corresponding to each triggered item under the corresponding preset lexical cohesion relation;
Calculating the sub-pointwise mutual information between each triggered item and each corresponding second trigger item from their second joint probability under the corresponding preset lexical cohesion relation, the third marginal probability, and the fourth marginal probability.
In an eighth possible implementation provided on the basis of the seventh possible implementation, the calculating, according to the corpus, of the second joint probability of each triggered item and each corresponding second trigger item under the corresponding preset lexical cohesion relation comprises:
Counting, in the texts of the corpus, the fifth quantity of texts in which each triggered item and each corresponding second trigger item occur together and satisfy the corresponding preset lexical cohesion relation;
Counting, in the texts of the corpus, the sixth quantity of texts that exhibit the preset lexical cohesion relation corresponding to each triggered item and each corresponding second trigger item;
Calculating, from the fifth quantity and the sixth quantity, the second joint probability of each triggered item and each corresponding second trigger item under the corresponding preset lexical cohesion relation.
In a ninth possible embodiment, based on the seventh possible embodiment, calculating, according to the corpus, the third marginal probability of each triggered item under the corresponding preset lexical cohesion relation and the fourth marginal probability of the second trigger item corresponding to each triggered item under that relation comprises:
Counting a seventh quantity: the number of texts in the corpus in which the triggered item occurs;
Counting an eighth quantity: the number of texts in the corpus in which the second trigger item corresponding to the triggered item occurs;
Counting the sixth quantity: the number of texts in the corpus that contain the preset lexical cohesion relation corresponding to the triggered item and the second trigger item;
Calculating the third marginal probability of each triggered item under the corresponding preset lexical cohesion relation from the seventh quantity and the sixth quantity, and calculating the fourth marginal probability of the corresponding second trigger item under that relation from the eighth quantity and the sixth quantity.
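A minimal sketch of the marginal-probability step, using the same hypothetical `(words, pairs)` corpus format as an assumption and assuming each marginal is the plain ratio of its count to the sixth quantity. Read literally, dividing by the sixth quantity does not guarantee values of at most 1; the sketch keeps the normalization as stated rather than guessing at another.

```python
def third_and_fourth_marginals(texts, relation, triggered, trigger):
    """Return (third, fourth) marginal probabilities under `relation`.

    texts: list of (words, pairs) tuples (hypothetical corpus format: the set of
    words in a text and the set of (relation, item, item) triples found in it).
    """
    # Sixth quantity (assumed reading): texts containing the relation at all.
    n6 = sum(1 for _, pairs in texts if any(r == relation for r, _, _ in pairs))
    if n6 == 0:
        return 0.0, 0.0
    # Seventh quantity: texts in which the triggered item occurs.
    n7 = sum(1 for words, _ in texts if triggered in words)
    # Eighth quantity: texts in which the second trigger item occurs.
    n8 = sum(1 for words, _ in texts if trigger in words)
    return n7 / n6, n8 / n6
```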
With the computer-readable storage medium provided by this embodiment of the present invention, at least one candidate target-language word corresponding to the first source-language word to be translated in the current text is determined; the first pointwise mutual information between each candidate and the target-language words corresponding to each already-translated second source-language word in the current text, and the second pointwise mutual information between each candidate and the second source-language words, are determined according to the corpus; and the translation result of the first source-language word is then determined from the first and second pointwise mutual information of each candidate. Because the word to be translated is translated using both the pointwise mutual information within the target-language side and the pointwise mutual information from the source-language side to the target-language side, the quality of translation from the source language into the target language is higher.
Embodiment nine
This embodiment of the present invention provides a graphical user interface for use on a terminal, the terminal comprising a touch-screen display, a memory, and one or more processors for executing one or more programs. The graphical user interface comprises:
Obtaining the first source-language word to be translated in the current text, and determining at least one candidate target-language word corresponding to the first source-language word;
Determining, according to a corpus, the first pointwise mutual information between each candidate target-language word and the target-language words corresponding to each already-translated second source-language word in the current text, and determining, according to the corpus, the second pointwise mutual information between each candidate target-language word and the second source-language words;
Determining the translation result of the first source-language word according to the first and second pointwise mutual information corresponding to each candidate target-language word.
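The final step states only that the translation result is determined from both pointwise mutual information values. A minimal sketch, assuming (hypothetically) that the two values are simply summed and the highest-scoring candidate is chosen; the function name and the summing rule are illustrative, not taken from the source:

```python
def choose_translation(candidates, first_pmi, second_pmi):
    """Pick a translation for the first source-language word.

    candidates: candidate target-language words.
    first_pmi / second_pmi: dicts mapping each candidate to its first
    (target-to-target) and second (source-to-target) pointwise mutual information.
    Summing the two scores is an assumption; ties go to the earlier candidate.
    """
    return max(candidates, key=lambda c: first_pmi[c] + second_pmi[c])
```

For the English word "bank" with candidates "银行" (financial institution) and "河岸" (riverbank), first-PMI scores of 0.2 and 1.1 and second-PMI scores of 0.5 and 0.3 give totals of 0.7 and 1.4, so "河岸" is chosen.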
With the graphical user interface provided by this embodiment of the present invention, at least one candidate target-language word corresponding to the first source-language word to be translated in the current text is determined; the first pointwise mutual information between each candidate and the target-language words corresponding to each already-translated second source-language word in the current text, and the second pointwise mutual information between each candidate and the second source-language words, are determined according to the corpus; and the translation result of the first source-language word is then determined from the first and second pointwise mutual information of each candidate. Because the word to be translated is translated using both the pointwise mutual information within the target-language side and the pointwise mutual information from the source-language side to the target-language side, the quality of translation from the source language into the target language is higher.
It should be noted that, when the machine translation apparatus provided in the above embodiments translates a source language into a target language, the division into the functional modules described above is only an example. In practical applications, the above functions may be allocated to different functional modules as required; that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above. In addition, the machine translation apparatus provided in the above embodiments and the machine translation method embodiments belong to the same concept; for the specific implementation process, refer to the method embodiments, which is not repeated here.
The serial numbers of the above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.
A person of ordinary skill in the art will understand that all or some of the steps of the above embodiments may be implemented by hardware, or by a program instructing relevant hardware. The program may be stored in a computer-readable storage medium, such as a read-only memory (ROM), a magnetic disk, or an optical disc.
The foregoing are merely preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (18)

1. A machine translation method, characterized in that the method comprises:
Obtaining a first source-language word to be translated in a current text, and determining at least one candidate target-language word corresponding to the first source-language word;
Determining, according to a corpus, first pointwise mutual information between each candidate target-language word and the target-language words corresponding to each already-translated second source-language word in the current text, and determining, according to the corpus, second pointwise mutual information between each candidate target-language word and the second source-language words;
Determining a translation result of the first source-language word according to the first pointwise mutual information and the second pointwise mutual information corresponding to each candidate target-language word.
2. The method according to claim 1, characterized in that determining, according to the corpus, the first pointwise mutual information between each candidate target-language word and the target-language words corresponding to each already-translated second source-language word in the current text comprises:
Taking each candidate target-language word as a triggered item, and determining, among the target-language words corresponding to the second source-language words, at least one first trigger item corresponding to each triggered item, a first trigger item being a target-language word that stands in a preset lexical cohesion relation with the triggered item;
Calculating, according to the corpus, the sub-pointwise mutual information between each triggered item and each of its corresponding first trigger items, and determining, from these sub-pointwise mutual information values, the first pointwise mutual information between each triggered item and the target-language words corresponding to the second source-language words.
3. The method according to claim 1, characterized in that determining, according to the corpus, the second pointwise mutual information between each candidate target-language word and the second source-language words comprises:
Taking each candidate target-language word as a triggered item, and determining, among the second source-language words, at least one second trigger item corresponding to each triggered item, a second trigger item being a second source-language word that stands in a preset lexical cohesion relation with the triggered item;
Calculating, according to the corpus, the sub-pointwise mutual information between each triggered item and each of its corresponding second trigger items, and determining, from these sub-pointwise mutual information values, the second pointwise mutual information between each triggered item and the second source-language words.
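Claims 2 and 3 share one structure: select the context words that stand in a preset lexical cohesion relation with the triggered item, then combine their sub-pointwise mutual information. The claims do not say how the sub-values are combined, so this sketch assumes a plain sum; the predicate `related` and the callable `sub_pmi` are hypothetical stand-ins for the relation lookup and the corpus-based computation.

```python
def aggregate_pmi(triggered, context_words, related, sub_pmi):
    """First (claim 2) or second (claim 3) PMI for one triggered item.

    context_words: for claim 2, the target-language words of the already
      translated second source-language words; for claim 3, the second
      source-language words themselves.
    related(a, b): hypothetical predicate, True if b stands in a preset lexical
      cohesion relation with a (b is then a trigger item of a).
    sub_pmi(a, b): sub-pointwise mutual information of the pair (a, b).
    Summing the sub-values is an assumption; with no trigger items the result is 0.
    """
    triggers = [w for w in context_words if related(triggered, w)]
    return sum(sub_pmi(triggered, w) for w in triggers)
```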
4. The method according to claim 2, characterized in that calculating, according to the corpus, the sub-pointwise mutual information between each triggered item and each corresponding first trigger item comprises:
Calculating, according to the corpus, a first joint probability of each triggered item and each corresponding first trigger item under the corresponding preset lexical cohesion relation;
Calculating, according to the corpus, a first marginal probability of each triggered item under the corresponding preset lexical cohesion relation, and a second marginal probability of the first trigger item corresponding to each triggered item under the same relation;
Calculating the sub-pointwise mutual information between each triggered item and each corresponding first trigger item from the first joint probability, the first marginal probability, and the second marginal probability.
5. The method according to claim 4, characterized in that calculating, according to the corpus, the first joint probability of each triggered item and each corresponding first trigger item under the corresponding preset lexical cohesion relation comprises:
Counting a first quantity: the number of texts in the corpus in which the triggered item and the corresponding first trigger item both occur and satisfy the corresponding preset lexical cohesion relation;
Counting a second quantity: the number of texts in the corpus that contain the preset lexical cohesion relation corresponding to the triggered item and the first trigger item;
Calculating the first joint probability of each triggered item and each corresponding first trigger item under the corresponding preset lexical cohesion relation from the first quantity and the second quantity.
6. The method according to claim 4, characterized in that calculating, according to the corpus, the first marginal probability of each triggered item under the corresponding preset lexical cohesion relation and the second marginal probability of the first trigger item corresponding to each triggered item under that relation comprises:
Counting a third quantity: the number of texts in the corpus in which the triggered item occurs;
Counting a fourth quantity: the number of texts in the corpus in which the first trigger item corresponding to the triggered item occurs;
Counting the second quantity: the number of texts in the corpus that contain the preset lexical cohesion relation corresponding to the triggered item and the first trigger item;
Calculating the first marginal probability of each triggered item under the corresponding preset lexical cohesion relation from the third quantity and the second quantity, and calculating the second marginal probability of the corresponding first trigger item under that relation from the fourth quantity and the second quantity.
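Claims 4 to 6 leave the exact arithmetic open. Assuming each probability is the plain ratio of the named quantities and the standard pointwise mutual information definition, with x the triggered item, y the first trigger item, r the preset lexical cohesion relation, and N1 to N4 the first to fourth quantities, the sub-pointwise mutual information reduces to a ratio of counts:

```latex
\begin{aligned}
P_r(x, y) &= \frac{N_1}{N_2}, \qquad P_r(x) = \frac{N_3}{N_2}, \qquad P_r(y) = \frac{N_4}{N_2},\\
\mathrm{PMI}_r(x, y) &= \log \frac{P_r(x, y)}{P_r(x)\,P_r(y)}
  = \log \frac{N_1 / N_2}{(N_3 / N_2)(N_4 / N_2)}
  = \log \frac{N_1 N_2}{N_3 N_4}.
\end{aligned}
```

The second quantity cancels only partially, so it still affects the score. Under the same assumptions, claims 7 to 9 give log(N5 N6 / (N7 N8)) with the fifth to eighth quantities.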
7. The method according to claim 3, characterized in that calculating, according to the corpus, the sub-pointwise mutual information between each triggered item and each corresponding second trigger item comprises:
Calculating, according to the corpus, a second joint probability of each triggered item and each corresponding second trigger item under the corresponding preset lexical cohesion relation;
Calculating, according to the corpus, a third marginal probability of each triggered item under the corresponding preset lexical cohesion relation, and a fourth marginal probability of the second trigger item corresponding to each triggered item under the same relation;
Calculating the sub-pointwise mutual information between each triggered item and each corresponding second trigger item from the second joint probability, the third marginal probability, and the fourth marginal probability.
8. The method according to claim 7, characterized in that calculating, according to the corpus, the second joint probability of each triggered item and each corresponding second trigger item under the corresponding preset lexical cohesion relation comprises:
Counting a fifth quantity: the number of texts in the corpus in which the triggered item and the corresponding second trigger item both occur and satisfy the corresponding preset lexical cohesion relation;
Counting a sixth quantity: the number of texts in the corpus that contain the preset lexical cohesion relation corresponding to the triggered item and the second trigger item;
Calculating the second joint probability of each triggered item and each corresponding second trigger item under the corresponding preset lexical cohesion relation from the fifth quantity and the sixth quantity.
9. The method according to claim 7, characterized in that calculating, according to the corpus, the third marginal probability of each triggered item under the corresponding preset lexical cohesion relation and the fourth marginal probability of the second trigger item corresponding to each triggered item under that relation comprises:
Counting a seventh quantity: the number of texts in the corpus in which the triggered item occurs;
Counting an eighth quantity: the number of texts in the corpus in which the second trigger item corresponding to the triggered item occurs;
Counting the sixth quantity: the number of texts in the corpus that contain the preset lexical cohesion relation corresponding to the triggered item and the second trigger item;
Calculating the third marginal probability of each triggered item under the corresponding preset lexical cohesion relation from the seventh quantity and the sixth quantity, and calculating the fourth marginal probability of the corresponding second trigger item under that relation from the eighth quantity and the sixth quantity.
10. A machine translation apparatus, characterized in that the apparatus comprises:
An acquisition module, configured to obtain a first source-language word to be translated in a current text;
A first determination module, configured to determine at least one candidate target-language word corresponding to the first source-language word;
A second determination module, configured to determine, according to a corpus, first pointwise mutual information between each candidate target-language word and the target-language words corresponding to each already-translated second source-language word in the current text;
A third determination module, configured to determine, according to the corpus, second pointwise mutual information between each candidate target-language word and the second source-language words;
A fourth determination module, configured to determine a translation result of the first source-language word according to the first pointwise mutual information and the second pointwise mutual information corresponding to each candidate target-language word.
11. The apparatus according to claim 10, characterized in that the second determination module comprises:
A first determining unit, configured to take each candidate target-language word as a triggered item and to determine, among the target-language words corresponding to the second source-language words, at least one first trigger item corresponding to each triggered item, a first trigger item being a target-language word that stands in a preset lexical cohesion relation with the triggered item;
A first calculating unit, configured to calculate, according to the corpus, the sub-pointwise mutual information between each triggered item and each corresponding first trigger item;
A second determining unit, configured to determine, from the sub-pointwise mutual information between each triggered item and each corresponding first trigger item, the first pointwise mutual information between each triggered item and the target-language words corresponding to the second source-language words.
12. The apparatus according to claim 10, characterized in that the third determination module comprises:
A third determining unit, configured to take each candidate target-language word as a triggered item and to determine, among the second source-language words, at least one second trigger item corresponding to each triggered item, a second trigger item being a second source-language word that stands in a preset lexical cohesion relation with the triggered item;
A second calculating unit, configured to calculate, according to the corpus, the sub-pointwise mutual information between each triggered item and each corresponding second trigger item;
A fourth determining unit, configured to determine, from the sub-pointwise mutual information between each triggered item and each corresponding second trigger item, the second pointwise mutual information between each triggered item and the second source-language words.
13. The apparatus according to claim 11, characterized in that the first calculating unit comprises:
A first calculation subunit, configured to calculate, according to the corpus, a first joint probability of each triggered item and each corresponding first trigger item under the corresponding preset lexical cohesion relation;
A second calculation subunit, configured to calculate, according to the corpus, a first marginal probability of each triggered item under the corresponding preset lexical cohesion relation and a second marginal probability of the first trigger item corresponding to each triggered item under that relation;
A third calculation subunit, configured to calculate the sub-pointwise mutual information between each triggered item and each corresponding first trigger item from the first joint probability, the first marginal probability, and the second marginal probability.
14. The apparatus according to claim 13, characterized in that the first calculation subunit is configured to: count a first quantity, being the number of texts in the corpus in which the triggered item and the corresponding first trigger item both occur and satisfy the corresponding preset lexical cohesion relation; count a second quantity, being the number of texts in the corpus that contain the preset lexical cohesion relation corresponding to the triggered item and the first trigger item; and calculate the first joint probability of each triggered item and each corresponding first trigger item under the corresponding preset lexical cohesion relation from the first quantity and the second quantity.
15. The apparatus according to claim 13, characterized in that the second calculation subunit is configured to: count a third quantity, being the number of texts in the corpus in which the triggered item occurs; count a fourth quantity, being the number of texts in the corpus in which the first trigger item corresponding to the triggered item occurs; count the second quantity, being the number of texts in the corpus that contain the preset lexical cohesion relation corresponding to the triggered item and the first trigger item; and calculate the first marginal probability of each triggered item under the corresponding preset lexical cohesion relation from the third quantity and the second quantity, and the second marginal probability of the corresponding first trigger item under that relation from the fourth quantity and the second quantity.
16. The apparatus according to claim 12, characterized in that the second calculating unit comprises:
A fourth calculation subunit, configured to calculate, according to the corpus, a second joint probability of each triggered item and each corresponding second trigger item under the corresponding preset lexical cohesion relation;
A fifth calculation subunit, configured to calculate, according to the corpus, a third marginal probability of each triggered item under the corresponding preset lexical cohesion relation and a fourth marginal probability of the second trigger item corresponding to each triggered item under that relation;
A sixth calculation subunit, configured to calculate the sub-pointwise mutual information between each triggered item and each corresponding second trigger item from the second joint probability, the third marginal probability, and the fourth marginal probability.
17. The apparatus according to claim 16, characterized in that the fourth calculation subunit is configured to: count a fifth quantity, being the number of texts in the corpus in which the triggered item and the corresponding second trigger item both occur and satisfy the corresponding preset lexical cohesion relation; count a sixth quantity, being the number of texts in the corpus that contain the preset lexical cohesion relation corresponding to the triggered item and the second trigger item; and calculate the second joint probability of each triggered item and each corresponding second trigger item under the corresponding preset lexical cohesion relation from the fifth quantity and the sixth quantity.
18. The apparatus according to claim 16, characterized in that the fifth calculation subunit is configured to: count a seventh quantity, being the number of texts in the corpus in which the triggered item occurs; count an eighth quantity, being the number of texts in the corpus in which the second trigger item corresponding to the triggered item occurs; count the sixth quantity, being the number of texts in the corpus that contain the preset lexical cohesion relation corresponding to the triggered item and the second trigger item; and calculate the third marginal probability of each triggered item under the corresponding preset lexical cohesion relation from the seventh quantity and the sixth quantity, and the fourth marginal probability of the corresponding second trigger item under that relation from the eighth quantity and the sixth quantity.
CN201410026026.3A 2014-01-20 2014-01-20 Machine translation method and device Active CN104794110B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410026026.3A CN104794110B (en) 2014-01-20 2014-01-20 Machine translation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410026026.3A CN104794110B (en) 2014-01-20 2014-01-20 Machine translation method and device

Publications (2)

Publication Number Publication Date
CN104794110A true CN104794110A (en) 2015-07-22
CN104794110B CN104794110B (en) 2018-11-23

Family

ID=53558908

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410026026.3A Active CN104794110B (en) 2014-01-20 2014-01-20 Machine translation method and device

Country Status (1)

Country Link
CN (1) CN104794110B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110781662A (en) * 2019-10-21 2020-02-11 Tencent Technology (Shenzhen) Co., Ltd. Method for determining point-to-point mutual information and related equipment

Citations (7)

Publication number Priority date Publication date Assignee Title
WO2002075586A1 (en) * 2001-03-16 2002-09-26 Eli Abir Content conversion method and apparatus
CN1471029A (en) * 2002-06-28 2004-01-28 System and method for auto-detecting collocation mistakes of file
CN1503161A (en) * 2002-11-20 2004-06-09 Statistical method and apparatus for learning translation relationship among phrases
CN1567297A (en) * 2003-07-03 2005-01-19 Institute of Acoustics, Chinese Academy of Sciences Method for extracting multi-word translation equivalent cells from bilingual corpus automatically
CN101763402A (en) * 2009-12-30 2010-06-30 Harbin Institute of Technology Integrated retrieval method for multi-language information retrieval
CN102375809A (en) * 2010-08-04 2012-03-14 Inventec Corporation System of instantly outputting second language by using input first language and method thereof
CN102486770A (en) * 2010-12-02 2012-06-06 Institute for Information Industry Character conversion method and system

Non-Patent Citations (2)

Title
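Lin Xiaoqing et al.: "Research on Translation Selection Based on Improved Mutual Information", 技术与方法 (Technology and Methods) *
Ge Yundong et al.: "Research on Web-Based Mining of OOV Translations in Cross-Language Information Retrieval", 微电子学与计算机 (Microelectronics & Computer) *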

Also Published As

Publication number Publication date
CN104794110B (en) 2018-11-23

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20190730

Address after: Room 403, Building 2 East, SEG Science Park, Zhenxing Road, Futian District, Shenzhen, Guangdong Province 518000

Co-patentee after: Tencent Cloud Computing (Beijing) Co., Ltd.

Patentee after: Tencent Technology (Shenzhen) Co., Ltd.

Address before: Room 403, Building 2 East, SEG Science Park, Zhenxing Road, Futian District, Shenzhen, Guangdong Province 518000

Patentee before: Tencent Technology (Shenzhen) Co., Ltd.