CN112395889A

CN112395889A - Machine-synchronized translation

Info

Publication number: CN112395889A
Application number: CN201910706922.7A
Authority: CN
Inventors: 林超伦 (Lin Chaolun)
Original assignee: Individual
Current assignee: Individual
Priority date: 2019-08-01
Filing date: 2019-08-01
Publication date: 2021-02-23

Abstract

The present disclosure relates to machine-synchronized translation. A method is proposed for converting a first expression sequence in a first language into a corresponding second expression sequence in a second language, the first expression sequence comprising at least one word. The method comprises, for each word of the first expression sequence as it is input in order, starting from the beginning: identifying attribute identification information of the word; and translating the identified word into its expression in the second language based on the identified attribute identification information of the word and on the logical relationship between the identified word and a certain number of adjacent words, so that the words are converted into second-language expressions one after another, starting from the beginning of the first expression sequence.

Description

Machine-synchronized translation

Technical Field

The present disclosure relates to the field of machine translation. In particular, the present disclosure relates to methods and apparatus for machine-synchronized translation.

Background

Machine simultaneous interpretation (hereinafter, machine interpretation) is an ultimate goal of artificial intelligence with respect to language: replacing human simultaneous interpreters in conversations and meetings. Once human simultaneous interpretation can be replaced, foreign-language learning can also be replaced, and a future in which people converse across languages using their native tongues can be realized.

Simultaneous interpretation means that the translation output is essentially synchronous with the original speech input. However, the technologies developed by companies at home and abroad, from the early sentence-pattern rule methods, through the later statistical methods, to today's neural-network and big-data methods, attend only to the accuracy of the translated text when applied to interpretation and do not consider synchronization. As a result, the translation can be finalized only after the entire source sentence has been heard. This does not match the way people listen and understand at all, so even an accurate translation cannot be used for simultaneous interpretation.

In view of the above disadvantages, the present disclosure provides ideas and technical solutions for implementing synchronous translation.

Unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section. Likewise, the problems identified with respect to one or more methods should not be assumed to be recognized in any prior art based on this section unless otherwise indicated.

Disclosure of Invention

It is an object of the present disclosure to propose an improved solution for machine-synchronized translation.

In one aspect, the present disclosure presents a translation method for converting a first expression sequence in a first language into a corresponding second expression sequence in a second language, the first expression sequence comprising at least one word. The method comprises, for each word of the first expression sequence as it is input in order, starting from the beginning: identifying attribute identification information of the word; and translating the identified word into its expression in the second language based on the identified attribute identification information of the word and on the logical relationship between the identified word and a certain number of adjacent words, so that the words are converted into second-language expressions one after another, starting from the beginning of the first expression sequence.

In another aspect, the present disclosure proposes a translation apparatus for converting a first expression sequence in a first language into a corresponding second expression sequence in a second language, the first expression sequence comprising at least one word. The apparatus comprises: a recognition unit configured to recognize, for each word of the first expression sequence as it is input in order starting from the beginning, attribute identification information of the word; and a translation unit configured to translate the recognized word into its expression in the second language based on the recognized attribute identification information of the word and on the logical relationship between the recognized word and a certain number of adjacent words, so that the words are converted into second-language expressions one after another, starting from the beginning of the first expression sequence.

In yet another aspect, there is provided a device comprising at least one processor and at least one storage device having instructions stored thereon, which when executed by the at least one processor, may cause the at least one processor to perform a method as described herein.

In yet another aspect, a storage medium is provided having stored thereon instructions that, when executed by a processor, may cause performance of a method as described herein.

In yet another aspect, an apparatus is presented that includes means for performing a method as described herein.

The technical solution of the present disclosure achieves accurate translation essentially by converting between language codes: the conversion is performed word by word, without having to grasp the meaning of the original sentence at the sentence level. This makes accurate and synchronous machine simultaneous interpretation possible.

The translation technique of the present disclosure applies logical analysis to the words of the source language and thereby addresses the central difficulty of machine simultaneous interpretation: the output must be both accurate and synchronous. The words of the source language can be preprocessed in advance according to the objective regularities of that language, and during translation each word (and, where needed, its relationship with adjacent words) is analyzed logically, so that the correct rendering of each target word can be obtained accurately and conveniently and good simultaneous translation can be achieved.

Drawings

A more complete appreciation of the invention and many of the attendant advantages thereof will be readily obtained as the same becomes better understood by reference to the following detailed description when considered in connection with the accompanying drawings.

The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate embodiments of the invention and, together with the description, serve to explain the principles of the invention. Like reference numerals denote like items throughout the drawings, in which:

FIG. 1 is a conceptual diagram of an existing translation method;

FIG. 2 is a conceptual diagram of the translation method of the present disclosure;

FIG. 3 is a flow chart of a translation method of the present disclosure;

FIG. 4 is a block diagram of a translation device of the present disclosure;

FIG. 5 is a flow diagram of an exemplary translation process of the present disclosure; and

FIG. 6 shows a basic environment for implementing the technical solution of the present disclosure.

Detailed Description

Exemplary possible embodiments related to machine simultaneous translation are described herein. In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It may be evident, however, that the present invention may be practiced without these specific details. In other instances, well-known structures and devices are not described in detail to avoid unnecessarily obscuring the present invention.

Embodiments of the present invention will be described in detail below with reference to the accompanying drawings. It should be noted that like reference numbers and letters in the figures refer to like items and, thus, once an item is defined in a figure, it need not be discussed again with respect to subsequent figures.

In this disclosure, the terms "first", "second", and the like are used merely to distinguish elements or steps, and are not intended to indicate temporal order, preference, or importance.

The present disclosure relates generally to automatic simultaneous translation techniques, which may be used, among other things, to translate some or all of the words in an input sentence. The technique is particularly well suited to speech translation applications, such as conferences, in which the content spoken by a speaker is translated and conveyed to listeners substantially in synchronization with the speech. To make the technical solution of the present disclosure easier to understand, several concepts used in the present disclosure are first described in detail below.

Translation synchronization

Synchronization in machine simultaneous interpretation means that the translation result is output essentially at the same time as the input is received. Specifically, the translation output lags the input by only a certain time, so that the output appears essentially simultaneous with the input. This time may be less than or equal to the tolerable lag of the output, which may depend, for example, on the listener's tolerance for speech delay; it is typically short but may vary with the language, the listener's habits, and so on.

For example, with speech input, the translation is delayed by only a certain time relative to the speaker's speech, so that the listener hears the corresponding translation within that time after the speaker speaks, and the translation finishes almost as soon as the original speech ends. This is equivalent to understanding while listening in a same-language conversation. Several examples of synchronization follow:

The translation lags the speaker's speech by only a certain time (e.g., 1 second), so the listener begins to hear the translation within about 1 second of the speaker starting to speak, just as a same-language listener understands within about 1 second. While the speaker talks continuously, the translation stays behind in time; this mirrors the fact that a same-language listener does not always understand continuous speech within 1 second and sometimes needs slightly longer. This situation is clearly equivalent in substance to a synchronous real-life conversation;

When an answer follows a question immediately in the source language, listeners of the translation also hear the answer almost immediately, just as one answers right after hearing a question in a same-language conversation;

When listeners of the source language hear a joke and smile, listeners of the translation start smiling almost immediately as well, just as people react to humor in a same-language conversation;

In a bilingual meeting where participants take turns speaking, everyone knows who is saying what, just as in a same-language exchange, with hardly any gap between one speaker and the next.

However, neither current understanding in the machine interpretation industry nor its technological development recognizes the importance of translation synchronization, and in particular neither recognizes the key point that the output sounds synchronous only if the lag is kept within 1 second.
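To make the 1-second criterion concrete, the following Python sketch checks whether a translation stream would sound synchronous. It assumes a one-to-one alignment between source words and their translations and uses made-up timestamps; the function names `max_lag` and `sounds_synchronous` are hypothetical and are not part of the disclosure.

```python
from typing import Sequence


def max_lag(source_times: Sequence[float], output_times: Sequence[float]) -> float:
    """Largest delay, in seconds, between a source word and the start of its translation."""
    # Assumption: output_times[i] is when the translation of source word i starts.
    if len(source_times) != len(output_times):
        raise ValueError("need one output time per source word")
    return max((o - s for s, o in zip(source_times, output_times)), default=0.0)


def sounds_synchronous(
    source_times: Sequence[float], output_times: Sequence[float], tolerance_s: float = 1.0
) -> bool:
    """True if no word's translation lags its source word by more than the tolerance."""
    return max_lag(source_times, output_times) <= tolerance_s


spoken = [0.0, 0.4, 0.9, 1.5]          # when each source word is heard (made-up)
word_by_word = [0.6, 1.1, 1.7, 2.2]    # each word rendered shortly after it is heard
sentence_level = [2.8, 2.9, 3.0, 3.1]  # rendering starts only after the sentence ends
print(sounds_synchronous(spoken, word_by_word))    # True
print(sounds_synchronous(spoken, sentence_level))  # False
```

The sentence-level stream fails because its first word already lags by 2.8 s, which is the whole-sentence delay described in this section.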

Transcoding

Current translation theory, both in China and abroad, holds that language is merely a code for meaning and that translation is about understanding meaning. Therefore, in translation it is usually necessary first to understand the entire sentence to be translated and then to express the understood meaning in another language. For example, when translating English into Chinese, a translator must first learn/decode the English sentence and understand its meaning, and then express that meaning in Chinese so that the Chinese conveys the same meaning as the English original. Fig. 1 shows an overview of current translation theory.

In the following example, the English phrase "in the city center" appears at the end of the sentence, but conventional translation theory requires the word order to be adjusted in the Chinese translation so that the phrase is moved into the middle of the sentence:

English original: They bought a house in the city center.

Common translation (Chinese word order, glossed): they in the city center bought a house.

Clearly, under existing translation theory, because the order of "in the city center" and "bought a house" must be reversed, translation can begin only after the whole sentence has been heard. The translation therefore inevitably lags the original speech by a large margin, at least the duration of one full sentence, which cannot be called simultaneous interpretation in the usual sense.

Current machine translation technology thus rests mainly on the idea of expressing the exact meaning of a whole sentence: translation proceeds by understanding the meaning of the entire sentence, and because word order differs between languages, words often have to be reordered and sentences restructured during translation. This makes good synchronization impossible and poses great difficulty for machine simultaneous interpretation. Moreover, owing to a misconception of machine interpretation and a difference in emphasis, existing translation theory and technology conflate interpretation with written translation and apply techniques designed for written translation to simultaneous interpretation, so the translation always lags a whole sentence behind the original speech. However much accuracy improves, the ultimate goal of replacing spoken foreign languages with machines is never achieved.

The present disclosure proposes a technical concept based on transcoding: translation is handled entirely as a direct conversion between codes. In particular, transcoding is performed in units of the words that make up sentences. The logical analysis proposed by the present disclosure determines how each word is processed, so that translation proceeds word by word in the order of the input sentence, and an accurate translation result is obtained essentially at the moment the input words end, without the sentence-level translation required by the prior art. Fig. 2 illustrates the basic concept of the present disclosure.

It should be noted that the term "vocabulary" above refers not only to the individual words of a particular language but also to the customary phrases of that language consisting of a specific number of words. The machine translation technique of the present application can therefore be regarded as translating words or phrases; for brevity, both are referred to below simply as words.

Taking the same sentence as an example, the technical solution of the present disclosure converts the input words one at a time to obtain the translation result, as follows:

English original: They bought a house in the city center.

Common translation (Chinese word order, glossed): they in the city center bought a house.

The present disclosure (glossed): they bought a house, in the city center.

To give another example:

English original: We have more market share than any other company.

Common translation: our market share is greater than that of any other company.

The present disclosure: we have more market share, exceeding any other company.

Specifically, the technical solution of the present disclosure performs code conversion on words: it translates word by word in the order of the input sentence and replaces the language of the input sentence in place. Translation starts from the first word of the input sentence and proceeds while input continues, each word being rendered correctly the first time, and the sentence emerges once the words have been translated one by one. In other words, the sentence is the result of successful translation, not a prerequisite for it. Although the structure of the result differs from that of the common version, its meaning is accurate, and it follows the order in which a listener understands speech: understanding while listening, rather than only after the sentence ends.

It should be noted that the translation produced by the technical solution of the present disclosure is colloquial rather than in the written style produced under current translation theory. However, the application scenarios of machine simultaneous interpretation and of traditional interpretation are inherently spoken, so a colloquial rendering is not a problem. Moreover, the technical solution of the present disclosure achieves substantially synchronous translation well.

Logical analysis

To achieve both translation synchronization and translation accuracy, the techniques of this disclosure employ logical analysis. That is, in deciding how to translate a word, the technique analyzes the current target word and, if necessary, the logical relationship between the current target word and the preceding or following words, and produces an accurate and appropriate translation based on the result of this analysis. The analysis result includes, but is not limited to, attribute information of the target word and of its adjacent words, the logical relationship between the target word and its adjacent words, and the like. In particular, a logical relationship may refer to the part of speech, word sense, role, morphology, etc. of adjacent words relative to the target word.

Therefore, the logic-based translation method proposed by the present disclosure translates according to the attributes of the word itself and, where necessary, the logical relationship between the word and its adjacent words, rather than relying, as in the prior art, on human judgment based only on the dictionary definitions of the word to be translated.

According to an embodiment, in the logical analysis, the attributes of the current target word itself may be analyzed to select an appropriate rendering for that word.

According to an embodiment, in the logical analysis, the logical relationship between the current target word and its adjacent words (e.g., preceding or following words) may alternatively or additionally be analyzed to select an appropriate translation version for the word. When logical relationships are analyzed, appropriate logical rules/formulas may be selected based on the analysis results to determine the rendering of the word more accurately and conveniently.

According to an embodiment, the logical analysis in most cases examines the relationship between the preceding words and the current target word. This analysis may also be referred to as review analysis. Because only words that have already been input are analyzed, there is no need to wait for subsequent words, and translation can keep pace with the source language. According to the embodiment, the number of preceding words examined in review analysis is not particularly limited; about 3 is preferred.
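Review analysis only needs a short memory of the words already heard. As a minimal sketch, assuming a window of 3 units (the preferred value above) and input already grouped into words and phrases, a bounded `collections.deque` can hold the preceding units; the `ReviewWindow` class is a hypothetical name used only for illustration.

```python
from collections import deque
from typing import List


class ReviewWindow:
    """Holds the most recent units already input, for review analysis."""

    def __init__(self, size: int = 3) -> None:
        # With maxlen set, appending a new unit silently drops the oldest one.
        self._words: deque = deque(maxlen=size)

    def push(self, word: str) -> None:
        self._words.append(word)

    def look_back(self, n: int) -> List[str]:
        """Return up to n preceding units, nearest first."""
        return list(reversed(self._words))[:n]


window = ReviewWindow()
for w in ["We", "have", "more", "market share"]:
    window.push(w)
print(window.look_back(3))  # ['market share', 'more', 'have']
print(window.look_back(1))  # ['market share']
```

Memory stays constant no matter how long the speaker talks, which fits the goal of never waiting for the end of the sentence.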

For example, in English, the (simplified) logical formula applied when the word "to" is encountered is as follows:

Search 1 word to the left. If a word expressing mood is found, such as any of honour/pleasure/privilege/delight/joy, then:

1. Ignore the target word "to".

2. Insert "can":

Example: It was a privilege to attend the meeting.

Translation (glossed word by word): it was a privilege, can attend the meeting.
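The simplified formula for "to" can be expressed as a small function. This is a sketch under stated assumptions: the mood-word set is abbreviated to honour, privilege and joy, the Chinese 能 is used as the rendering of "can", and returning `None` to mean "formula does not apply" is a convention chosen here, not one taken from the disclosure.

```python
from typing import Optional, Sequence

# Abbreviated set of the mood words named in the simplified formula.
MOOD_WORDS = {"honour", "privilege", "joy"}


def to_formula(preceding: Sequence[str]) -> Optional[str]:
    """Simplified formula for the target word "to".

    Searches 1 word to the left. If that word expresses mood, "to" itself is
    ignored and "can" (Chinese 能) is inserted in its place. Returns None when
    the formula does not apply.
    """
    if preceding and preceding[-1].lower().strip(".,") in MOOD_WORDS:
        return "能"  # "can"
    return None


words = "It was a privilege to attend the meeting.".split()
print(to_formula(words[: words.index("to")]))  # 能
```

Only the single word to the left is inspected, so the formula can fire the moment "to" is heard.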

Because review analysis is used, accurate translation can begin using only a few preceding words of the input sentence; translation proceeds as input arrives, and the translation is completed from beginning to end in a single pass without revision. This amounts to translating the speaker's words and conveying them to the listener substantially in synchronization with the speech, just as people understand while listening when they share a language. A further example:

English original: We have more market share than any other company.

Common translation: our market share is greater than that of any other company.

The present disclosure: we have more market share, exceeding any other company.

Here, when "than" is heard, review analysis is performed according to the logical formula: a preceding adjective (here, "more") is found, and the logical formula then specifies that "than" be translated as "exceed".
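Likewise, the review analysis for "than" can be sketched as a backward search over at most three preceding units. The part-of-speech table below is a hypothetical stand-in for the attribute tags of the preprocessed vocabulary, and 超过 is used as the Chinese rendering of "exceed".

```python
from typing import Optional, Sequence

# Hypothetical part-of-speech tags standing in for the preprocessed vocabulary.
POS = {"we": "PRON", "have": "VERB", "more": "ADJ", "market share": "NOUN"}


def than_formula(preceding: Sequence[str], window: int = 3) -> Optional[str]:
    """Review analysis for "than": search back through at most `window` units for an adjective."""
    for unit in reversed(preceding[-window:]):
        if POS.get(unit.lower()) == "ADJ":
            return "超过"  # "exceed"
    return None  # formula does not apply; other rules would decide


print(than_formula(["We", "have", "more", "market share"]))  # 超过
```

An adjective further back than the window is deliberately ignored, keeping the search cost and the lag bounded.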

In addition, the logical analysis of the present disclosure may, for a given target word, defer its translation briefly and first analyze the words or other cues that follow it, in order to obtain a better translation. It should be noted that the techniques of this disclosure require such forward analysis only in a few specific cases, and even then examine at most three following words, to ensure that the lag does not exceed 1 second.
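Deferred translation with a bounded look-ahead can be sketched as a small buffer that releases each word only once it has enough following context. In this sketch, which words need look-ahead (`needs_lookahead`) and how they are rendered (`render`) are hypothetical callbacks, since the disclosure does not list the specific cases; the bound of three following words comes from the text above.

```python
from typing import Callable, Iterable, Iterator, List, Sequence

MAX_AHEAD = 3  # at most three following words are examined


def translate_with_lookahead(
    words: Iterable[str],
    needs_lookahead: Callable[[str], bool],
    render: Callable[[str, Sequence[str]], str],
    max_ahead: int = MAX_AHEAD,
) -> Iterator[str]:
    """Yield renderings in input order, delaying a word only while it waits for context."""
    buffer: List[str] = []  # received but not yet emitted; never longer than max_ahead + 1

    def wanted(word: str) -> int:
        return max_ahead if needs_lookahead(word) else 0

    for word in words:
        buffer.append(word)
        # Emit from the front while the front word has enough following words.
        while buffer and len(buffer) - 1 >= wanted(buffer[0]):
            head = buffer.pop(0)
            yield render(head, buffer[: wanted(head)])
    # Input ended: render what is left with whatever context remains.
    while buffer:
        head = buffer.pop(0)
        yield render(head, buffer[: wanted(head)])


def show(word: str, following: Sequence[str]) -> str:
    # Toy renderer that just displays the context a deferred word received.
    return f"{word}<{' '.join(following)}>" if following else word


out = list(translate_with_lookahead(["a", "X", "b", "c", "d", "e"], lambda w: w == "X", show))
print(out)  # ['a', 'X<b c d>', 'b', 'c', 'd', 'e']
```

Because output order is preserved, words after a deferred word wait with it, so the worst-case delay is the three-word bound rather than a whole sentence.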

In short, unlike existing translation theory, which starts from sentences, the present invention turns translation entirely into the analysis and processing of each word's own attributes and its relationships with other words. Logical analysis (preferably review analysis) begins with the first input word; following the input order, the relationship between the current word and the preceding words is analyzed continuously, and the corresponding logical formula is applied according to the recognition and analysis results to select an appropriate translation for each word. Because sentence structure need not be considered, word order need not be reversed, and the whole sentence need not be input first, translation remains accurate and well synchronized.

Applicability of the technique

It should be noted that, at its current stage, the technology of the present disclosure translates text that has already been transcribed from speech and does not itself involve speech transcription. The disclosed technology can be fully combined with technologies such as speech recognition (accurately recognizing speech), speech transcription (rendering the recognized speech as text), and speech synthesis (converting the translated text into speech in the corresponding language for listeners) to replace human simultaneous interpretation. In addition, the technique of the present disclosure can be applied to written translation that has no particular stylistic requirements, or used to produce a first draft of a written translation.

Advantageous technical effects

On the one hand, the technology of the present disclosure is not constrained by the whole sentence but processes the words of a sentence as units, so each word can be translated in turn while the sentence is being input, achieving translation synchronization together with accurate translation. In contrast, all existing techniques have two major drawbacks. First, they target sentences, which is undesirable because sentences vary endlessly in use. This is true of the early logic-, grammar- and sentence-pattern-based approaches, of the later statistical approaches, and of today's dominant neural-network and big-data approaches. Second, they wrongly treat interpretation as written translation, so their products cannot replace simultaneous interpretation no matter how accurate they are.

For example, with the products of even leading companies, domestic or international, one immediately notices that when a long English sentence is spoken, the English transcription keeps being rewritten, and the Chinese translation is also revised continually until the sentence ends.

The output is rewritten and corrected continually precisely because current machine translation technology targets sentences, so the translation can be fixed only after the whole sentence has been heard. This is not a problem in written translation, where only the final version is read, but it makes the output unusable for simultaneous interpretation: interpretation is heard as it is produced, and one cannot keep revising what has just been said while also saying what needs to be said now. Moreover, because of the continual revision, the translation can be finalized and converted to speech only after a whole sentence has been spoken, so it always lags a full sentence behind the original speech and can never replace a foreign language in conversation. Most importantly, because it does not take into account the synchronicity that machine interpretation requires, this technology cannot be used for simultaneous interpretation.

On the other hand, the techniques of this disclosure apply logical analysis to words during translation. Specifically, the disclosed technique performs logical analysis on the words to be translated on the basis of the preprocessed vocabulary, so that translation succeeds in most cases using only the preceding context, without the following context; sentences need not be considered at all, and neither neural networks nor big data are required. Translation follows listening with a lag within 1 second, and the translated output is always right the first time without revision, which amounts to removing the language barrier in conversations and meetings. The technology is therefore entirely different from traditional logic-based and probabilistic methods that infer sentence patterns and structures from grammar, and from today's popular neural-network, data-driven methods.

The implementation of the technical solution of the present disclosure will be described in detail below.

According to an embodiment of the present disclosure, a method is presented for converting a first expression sequence in a first language into a corresponding second expression sequence in a second language, the first expression sequence comprising a plurality of words. The method comprises, for each word of the first expression sequence as it is input in order, starting from the beginning: identifying attribute identification information of the word; and translating the identified word into its expression in the second language based on the identified attribute identification information of the word and on the logical relationship between the identified word and a certain number of adjacent words, so that the words are converted into second-language expressions one after another, starting from the beginning of the first expression sequence. That is, as the first expression sequence is input, each input word is converted in turn into an expression in the second language, so that the conversion of the first expression sequence into the second expression sequence is essentially complete as soon as the first expression sequence ends.

Fig. 3 shows a flow chart of a method according to the present disclosure. For each word in the input expression sequence to be translated, in step S301 the attribute identification information of the word is recognized, and in step S302 the translation output of the word is obtained based on the recognized attribute identification information and the logical relationship between the recognized word and a certain number of adjacent words. It is then determined whether any words in the expression sequence remain to be translated: if so, the process returns to step S301; if not, the translation output of the whole expression sequence is obtained. Translation in the present disclosure is thus performed word by word: input words are translated in turn as the expression sequence is input, and the translation output is essentially complete when input of the expression sequence ends, so that near-synchronization is achieved together with accurate translation.
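The loop of Fig. 3 can be sketched as a Python generator in which step S301 and step S302 are supplied as callables, so that each word's translation is produced as soon as that word arrives rather than after the sentence ends. The toy lexicon, its tags, the Chinese renderings and the pre-segmented input are assumptions made for illustration; a real system would use the preprocessed vocabulary and logical formulas described in this disclosure.

```python
from collections import deque
from typing import Callable, Iterable, Iterator, List


def translate_sequence(
    words: Iterable[str],
    recognize: Callable[[str], str],
    translate: Callable[[str, str, List[str]], str],
    window: int = 3,
) -> Iterator[str]:
    """Fig. 3 loop: S301 then S302 for each word, in input order."""
    preceding: deque = deque(maxlen=window)  # words available for review analysis
    for word in words:  # the loop ends when no words remain to be translated
        attrs = recognize(word)  # S301: attribute identification information
        yield translate(word, attrs, list(preceding))  # S302: emitted immediately
        preceding.append(word)


# Hypothetical lexicon: word -> (tag, rendering).
LEXICON = {
    "they": ("PRON", "他们"),
    "bought": ("VERB", "买了"),
    "a house": ("NOUN", "一套房子"),
    "in the city center": ("PLACE", "在市中心"),
}


def recognize(word: str) -> str:
    return LEXICON[word.lower()][0]


def translate(word: str, tag: str, preceding: List[str]) -> str:
    # A real S302 would pick a logical formula from the tag and preceding words;
    # this toy version just looks up a single rendering.
    return LEXICON[word.lower()][1]


stream = ["They", "bought", "a house", "in the city center"]
print(list(translate_sequence(stream, recognize, translate)))
# ['他们', '买了', '一套房子', '在市中心']
```

Because the generator yields before it requests the next word, the first rendering is available even while the rest of the sentence has not yet been spoken, and the output keeps the source order.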

Preferably, according to an embodiment, the attribute identification information of a word is set based on the attributes of the word. By way of example, the attributes of a word may correspond to its characteristics, functions, roles, usages, morphology, etc., which may generally be determined through statistical analysis of the word's usage history or through empirical judgment by a translator. According to an embodiment, words may be classified according to the determined attributes, and attribute identification information is assigned to each class of words.

According to an embodiment, the attribute identification information of a word may be information regarding at least one of: a tag identifying the word, whether the word has a corresponding vocabulary formula, the translation version of the word itself, and the like.
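A minimal sketch of how attribute identification information might be stored and looked up in step S301 follows. The field names, the tags (`MOOD`, `FUNC`, `NOUN`) and the lexicon entries are hypothetical; the disclosure only states that the information may include a tag, whether a vocabulary formula exists, and the word's own translation version.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class AttributeInfo:
    """Attribute identification information for one lexicon entry."""

    tag: str  # class label assigned from the word's attributes
    has_vocabulary_formula: bool = False  # whether a word-specific formula exists
    versions: Tuple[str, ...] = ()  # the word's own translation versions, version 1 first


# Hypothetical preprocessed lexicon; tags and entries are illustrative only.
LEXICON: Dict[str, AttributeInfo] = {
    "privilege": AttributeInfo(tag="MOOD", versions=("荣幸",)),
    "to": AttributeInfo(tag="FUNC", has_vocabulary_formula=True),
    "meeting": AttributeInfo(tag="NOUN", versions=("会议",)),
}


def recognize(word: str) -> Optional[AttributeInfo]:
    """Step S301 sketch: look up the attribute identification information of a word."""
    return LEXICON.get(word.lower().strip(".,"))


print(recognize("meeting.").versions[0])  # 会议
```

Freezing the dataclass reflects that this information is prepared during preprocessing and only read during translation.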

Preferably, according to an embodiment, version numbers for the translations of a word may be set based on the word's translation history. For example, by analyzing the word's common renderings in historical translation records and, if necessary, its usage in combination with other words, each rendering of the word can be assigned a corresponding version number. According to an embodiment, version numbers may also be assigned based on all the different definitions of the word in an existing dictionary.
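One way to derive version numbers from translation history is to count how often each rendering of a word occurs and number the renderings from most to least frequent. The frequency ordering is an assumption made for this sketch (the disclosure only says the common renderings are analyzed), and the history below is made up.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def assign_version_numbers(
    history: Iterable[Tuple[str, str]],
) -> Dict[str, Dict[int, str]]:
    """Number each word's renderings 1, 2, ... from most to least frequent in the history."""
    counts: Dict[str, Counter] = {}
    for word, rendering in history:
        counts.setdefault(word, Counter())[rendering] += 1
    # most_common() sorts by count; ties keep first-seen order.
    return {
        word: {n: rendering for n, (rendering, _) in enumerate(c.most_common(), start=1)}
        for word, c in counts.items()
    }


# Made-up history of (source word, rendering) pairs.
history = [("than", "比"), ("than", "超过"), ("than", "超过"), ("house", "房子")]
print(assign_version_numbers(history))
# {'than': {1: '超过', 2: '比'}, 'house': {1: '房子'}}
```

Combination usages with other words, which the text also mentions, would need the history to record context as well; that is not modeled here.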

Preferably, according to an embodiment, the logical relationship may comprise a logical relationship between the word and a predetermined number of words preceding it and/or a predetermined number of words following it. Preferably, according to an embodiment, the logical relationship may refer to an attribute selected from the group consisting of the part of speech, word sense, role, and morphology of a certain number of adjacent words relative to the recognized word, and the translation timing and/or rendering of the recognized word may be determined according to these attributes of the adjacent words.

Preferably, according to an embodiment, the method may further comprise selecting a corresponding logical formula to translate the vocabulary into the expression of the second language of the vocabulary based on the identified attribute identification information of the vocabulary and the logical relationship between the identified vocabulary and a certain number of adjacent vocabularies. Preferably, according to an embodiment, the logic rule/formula may include at least one of a tag formula and a vocabulary formula. In particular, according to embodiments, the logical rules/formulas may be stored in a database for invocation, for example, may be stored in correspondence with a vocabulary tag, such that upon recognition of the attribute identifying information of the vocabulary, the logical rules/formulas corresponding to the attribute tag may be invoked.

Preferably, according to an embodiment, the attribute identification information is a tag of a vocabulary, and the logic formula is a tag formula. Also, the translation timing and/or translation definitions of the recognized vocabulary may be determined from the logical relationship between the recognized vocabulary and the adjacent vocabulary based on the tag formulas corresponding to the tags of the recognized vocabulary.

Preferably, according to an embodiment, the logical formula is a lexical formula, and the respective lexical formula may be invoked to determine a translation paraphrase for the recognized vocabulary according to a logical relationship between the recognized vocabulary and an adjacent vocabulary.

Preferably, according to an embodiment, the vocabulary in the method of the present disclosure may comprise any of the following:

word elements consisting of a single word;

a phrase element consisting of a specific number of adjacent words; and

a particular combination element consisting of a particular number of adjacent words that follow the same lexical formula.

That is, the vocabulary may be a single word, a phrase, or even a combination of words that satisfy a particular relationship.

Preferably, according to an embodiment, the recognition in the method of the present disclosure may include recognizing words one by one from the beginning of the sequence to determine whether each word is a component of a phrase. If it is, the word is buffered without being translated, and recognition of the following words continues until the entire phrase has been recognized; an appropriate paraphrase is then selected for the phrase. If the word is not part of a phrase, the word itself is translated. According to embodiments, it may first be determined whether the word conforms to a particular logical rule/formula: for example, the word's tag is identified and the corresponding tag formula is applied; if no tag is set for the word, its vocabulary formula is identified and applied; or an appropriate translated version is selected based on the logical relationship between the word and its adjacent words. If no applicable logical rule/formula exists, the common definition of the word is selected directly. If neither of the above applies, the word is translated according to its default definition (e.g., one obtained by statistical analysis of translation history, or a dictionary definition). This process is executed in turn for each word as the sentence is input, until the last word has been translated, at which point the translation of the whole sentence is complete.
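The word-by-word procedure above can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the phrase table `PHRASES`, the per-word `rules` mapping, and the `defaults` mapping are hypothetical stand-ins for the dynamic lexicon, and the chain of tag formula, vocabulary formula, and common definition is collapsed into a single optional rule followed by a default.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical phrase library: each phrase (a tuple of words) has one paraphrase.
PHRASES: Dict[Tuple[str, ...], str] = {("in", "spite", "of"): "[despite]"}

# A rule sees the words heard so far and the target index; it returns a
# paraphrase, or None when its conditions are not met.
Rule = Callable[[List[str], int], Optional[str]]


def is_phrase_prefix(candidate: Tuple[str, ...]) -> bool:
    """True if some stored phrase starts with (or equals) the candidate."""
    return any(p[:len(candidate)] == candidate for p in PHRASES)


def translate_word(seen: List[str], i: int, rules: Dict[str, Rule],
                   defaults: Dict[str, str]) -> str:
    """Apply the word's rule if it fires; otherwise use the default paraphrase."""
    rule = rules.get(seen[i])
    if rule is not None:
        result = rule(seen, i)
        if result is not None:
            return result
    return defaults.get(seen[i], seen[i])  # unknown words pass through unchanged


def translate_stream(words: List[str], rules: Dict[str, Rule],
                     defaults: Dict[str, str]) -> List[str]:
    """Translate words in arrival order, holding back only possible phrase starts."""
    output: List[str] = []
    start = 0  # index of the first buffered, not yet translated word
    for end in range(len(words)):
        seen = words[:end + 1]
        # Release words from the front of the buffer until what remains could
        # still grow into a stored phrase (or the buffer is empty).
        while start <= end and not is_phrase_prefix(tuple(words[start:end + 1])):
            output.append(translate_word(seen, start, rules, defaults))
            start += 1
        if start <= end and tuple(words[start:end + 1]) in PHRASES:
            output.append(PHRASES[tuple(words[start:end + 1])])  # whole phrase heard
            start = end + 1
    # Sentence ended with a partial phrase still buffered: translate word by word.
    for i in range(start, len(words)):
        output.append(translate_word(words, i, rules, defaults))
    return output
```

Words that might begin a phrase are held back and released one at a time as soon as the buffer can no longer complete any stored phrase, so the delay is bounded by the length of the longest stored phrase.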

Fig. 4 illustrates a block diagram of a translation device according to the present disclosure. The device 400 converts a first expression sequence in a first language, comprising at least one word, into a corresponding second expression sequence in a second language. For each word of the first expression sequence, input in order starting from the beginning, the device 400 comprises: a recognition unit 401 configured to recognize the attribute tag of the word; and a translation unit 402 configured to translate the word into its second-language expression based on the recognized attribute tag, so that the words are converted into the second language one after another from the beginning of the first expression sequence.

The device 400 may further include a storage device 403 for storing the translation output of each word during translation, for storing words that have already been recognized for later use in logical analysis, or for temporarily buffering a word awaiting translation while it is determined whether the word belongs to a phrase or to a combination that satisfies certain vocabulary rules.

The device 400 may also include a storage device for storing tag formulas and/or lexical formulas. For example, the storage device stores the tag formulas and/or lexical formulas of each word, together with the corresponding paraphrases or the relationships between the word and its paraphrases, so that during translation the various formulas can be retrieved from it to obtain the corresponding translated paraphrases. Note that this storage device may be the storage device 403 itself or a different device. For example, the tag formulas, the vocabulary formulas, and the correspondences/paraphrases may be stored in separate storage devices or in the same storage device.

It should be noted that the above-mentioned storage devices are not necessarily included in the device 400, but may be located outside the device 400.

It should be noted that the above units/logic modules are only divided according to the specific functions implemented by the units/logic modules, and are not used for limiting the specific implementation manner, and may be implemented in software, hardware or a combination of software and hardware, for example. In actual implementation, the above units/modules/sub-units may be implemented as separate physical entities, or may also be implemented by a single entity (e.g., a processor (CPU or DSP, etc.), an integrated circuit, etc.). Furthermore, the various elements described above are shown in dashed lines in the figures to indicate that these elements may not actually be present, but that the operations/functions that they implement may be implemented by the processing circuitry itself.

It should be noted that various operations of the translation method according to the embodiment of the present disclosure may be performed by the translation apparatus of the embodiment of the present disclosure, for example, by corresponding units in the translation apparatus, and may also be performed by other processing devices.

An exemplary implementation of the translation operation according to the present disclosure is described in further detail below. FIG. 5 schematically illustrates a flow diagram of the translation operation, in which a series of operations is performed for each word of an input sentence. Specifically, in step S501, it is determined whether the vocabulary item is a single word or a phrase. If it is a word, the flow proceeds to step S502, where the word's tag is identified; in step S503 the corresponding logical formula is applied, and in step S504 an accurate translation of the word is obtained. If it is a phrase, its translation is obtained directly. After the item has been processed, its translation is output and it is determined whether any words remain untranslated; if so, the operations are repeated for the remaining words until all words are translated, at which point the translation of the entire input sentence has also been output.

The words of the language to be translated may have been preprocessed (with the results stored, e.g., in a database) before it is determined whether an item is a word or a phrase; in that case, the determination and recognition operations are performed based on the preprocessed words.

It should be noted that in the case where the logic formulas include a tag formula and a lexical formula, the tag formula may be executed first and then the lexical formula, but it should be understood that this order of execution of the tag formula and the lexical formula is merely exemplary and not limiting. According to embodiments, the tag formula and the lexical formula may also be performed in parallel, or even the lexical formula may be performed first and then the tag formula.

Implementations of various details in the translation operations of the present disclosure are described in further detail below.

Preprocessing of words

To make the conversion more accurate and convenient, the present disclosure proposes specific preprocessing of the vocabulary of the language to be translated. The preprocessing may be completed before vocabulary recognition. It may be performed by the device that performs the translation or by another device, and its results may be stored in any accessible device or database for retrieval during translation.

According to an embodiment, the preprocessing of the vocabulary in the present disclosure may include performing specific classification and identification on the vocabulary according to the attributes of the vocabulary, so that the vocabulary translation can be performed more timely and accurately according to the identification information of the recognized vocabulary during the translation process.

Words in a language have their own characteristics, serve specific functions and roles in sentences, and have different usages. These characteristics, functions, roles, usages, and morphologies all directly affect how a word is translated within a sentence, and therefore need to be taken into account to achieve more accurate translation. In the present disclosure, they are collectively regarded as the attributes of a word.

According to an embodiment, the attributes of the vocabulary may be determined by analyzing the characteristics, functions, roles, usage, morphology, etc. of the vocabulary in the language to be translated. According to an embodiment, the attributes may be set manually by an operator or according to the historical translation records of the vocabulary, and are preferably set by analyzing and counting the characteristics, functions, roles, usages, forms, and the like of the vocabulary in those records. In this respect, vocabulary attributes are statistics derived from the vocabulary's history; they are objective and follow the objective rules of language translation.

In this way, during machine translation each word can be logically analyzed and translated appropriately based on its identification information, so that its translated paraphrase is determined accurately and essentially synchronously. In contrast, existing translation approaches rely only on the correspondences between one language and another given in a dictionary: a word is translated according to those correspondences, and a translator must judge manually how it should actually be translated. Translation therefore cannot be performed directly and synchronously.

According to an embodiment, the identification information of the vocabulary may include at least one of a tag of the vocabulary, a version number of the vocabulary, and the like.

According to an embodiment, preprocessing of the vocabulary in the present disclosure may include assigning specific tags to the vocabulary based on its attributes. Tags are set based on statistical analysis of the vocabulary's attributes, so that they reflect accurately and comprehensively how the vocabulary should be translated, improving translation quality. According to an embodiment, a tag may be represented with a certain number of bits. Specifically, the tag of the word to be translated is identified, together with, where necessary, the tags of its adjacent words and their relationships to it, and a particular logical analysis (e.g., the logical analysis rules/formulas described below) is then followed to select an appropriate paraphrase for the word.

According to an embodiment, preprocessing the vocabulary may include setting corresponding translation version numbers for the vocabulary, so that an appropriate version number may be selected according to logic rules during translation, thereby providing translation quickly and accurately. Depending on the embodiment, the version number may be indicated with a certain number of bits.

From the perspective of artificial intelligence, the preprocessing corresponds to a data-training process, and it can significantly improve translation efficiency and accuracy.

Vocabulary tags

Vocabulary tag settings and applications in accordance with embodiments of the present disclosure will be exemplarily described below.

In the present disclosure, tags are set for the vocabulary of the language to be translated by considering its attributes, which may indicate any of the vocabulary's characteristics (e.g., word sense, part of speech), functions, roles, usages, morphology, and so on. Analyzing the attributes of words corresponds in a sense to classifying them: words are classified according to their attributes, and a tag is set for each class.

According to embodiments, the attributes of the vocabulary may indicate certain inherent characteristics of the vocabulary itself, in which case the tags may be set to indicate such inherent characteristics of the class of words, such as part of speech or grammatical number. Note, however, that unlike a dictionary, the tags need not merely list these properties; they may further generalize and summarize them, making the tags more concise and convenient for logical analysis.

For example, tags may relate to characteristics of the vocabulary such as part of speech, grammatical number, and morphology. As examples, the tag for a singular noun referring to a human or animal is NNSO, and the tag for a plural noun referring to humans or animals is NNPSO. For fixed parts of speech, PREN indicates articles and similar words, VBA indicates auxiliary verbs and similar words, NN indicates nouns and similar words, and so on. For morphology, PAST may indicate that the word is in the past tense.

According to embodiments, tags may relate to the function, effect, usage, etc. of a vocabulary. Thus, a label may be set for a class of words having the same or similar function, effect, usage, etc.

Example one: TM stands for all time-related words, phrases and expressions; MD represents the modification relationship:

Many people joined in ^TMduring a week ^MDof intense debates.

Example two: MDNUL stands for words that introduce clauses but are not translated:

There are many books ^MDNULthat their children liked.

Translation, glossed back into English: There are many books their children liked (the word "that" itself is not rendered).

Example three: in English grammar, a verb + -ing form may be part of a verb tense (the progressive) or may function as a participle or gerund. In the present disclosure, however, a single tag PRES is used: although grammar classifies these forms differently and treats them separately, the present disclosure regards them as serving only one function, so they receive the same tag. Compare the tags of verb + -ing in the following two example sentences:

They have been ^PRESworking hard.

There is reward for ^PRESworking hard.
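The examples mark tags inline as a caret, then the tag, then the word (e.g. ^TMduring). A small parser for that notation might look like the sketch below. It assumes that tags are upper-case letters optionally followed by digits (as in DYN5) and that the tagged word does not itself begin with a capital letter, which holds for the examples in this section; a word starting with a capital would be split at the wrong place.

```python
import re
from typing import List, Optional, Tuple

# Assumed notation: "^" + upper-case tag (optionally ending in digits)
# immediately followed by the word, e.g. "^TMduring", "^DYN5plan".
TAGGED = re.compile(r"\^([A-Z]+[0-9]*)(\S+)")


def parse_tagged(sentence: str) -> List[Tuple[str, Optional[str]]]:
    """Split an annotated sentence into (word, tag) pairs; untagged words get None."""
    pairs: List[Tuple[str, Optional[str]]] = []
    for token in sentence.split():
        match = TAGGED.fullmatch(token)
        if match:
            pairs.append((match.group(2), match.group(1)))
        else:
            pairs.append((token, None))
    return pairs
```

For instance, parsing "There are many books ^MDNULthat their children liked." yields the pair ("that", "MDNUL") at the fifth position, with every other word untagged.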

According to an embodiment, the tags of the vocabulary also include dynamic tags and static tags, which mainly indicate whether the attributes of the vocabulary, including characteristics, functions, roles, usage, etc., are fixed or variable.

Static tags generally indicate that the attributes of the vocabulary, including characteristics, functions, roles, usage, etc., are fixed during the translation process. As an example, the tag for the English word man is always NNSO, while the tag for the word scope is always NNPSO. For words with static tags, the corresponding paraphrase may be translated directly, or it may be determined using tag formulas or lexical formulas, as described below.

Many English words have attributes (characteristics, functions, roles, usage, etc.) that vary. For example, the part of speech may vary: the word clean can be an adjective or a verb, and the words plan, design, and work may be nouns or verbs. Therefore, in the present disclosure, a dynamic tag, denoted DYN, may be set for such dynamic vocabulary. DYN is an abbreviation of the English word dynamic: a word with this tag can serve different functions in different sentences or word collocations.

In the translation process, once the part-of-speech tag of the target word has been recognized, a specific tag formula can be applied to determine the word's part of speech and translation accurately. For example, since the tag of the word clean is variable, the associated tag formula is as follows (simplified; the actual formula is more complex than shown here):

1. Search to the left of the target tag.

2. If a PREN tag (i.e., an article) is found, translate as the adjective "clean".

Example sentence: They have a clean room.

3. Otherwise, translate as the verb "clean".

Example sentence: The way to clean the room.
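The simplified clean formula can be expressed as a function of the words heard so far. This sketch reads the two branches as the adjective and verb senses, as the example sentences indicate, and searches exactly one word to the left. The small article set standing in for the PREN tag is an assumption; a real system would look the tag up in the dynamic lexicon.

```python
from typing import List

# Assumed stand-in for the PREN (article) tag.
PREN_WORDS = {"a", "an", "the"}


def clean_formula(words: List[str], i: int) -> str:
    """Simplified formula for 'clean': article on the left -> adjective, else verb."""
    left = words[i - 1].lower() if i > 0 else ""
    return "clean (adjective)" if left in PREN_WORDS else "clean (verb)"
```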

According to an embodiment, a word with a dynamic tag may be marked by DYN plus a number, which indicates the translation rule/formula that applies to the word. Different dynamic tag numbers may be assigned according to the word's dynamic properties; for example, dynamic part of speech, dynamic role, and dynamic usage may each correspond to a different number. Thus a class of words that can take different parts of speech may be assigned the tag DYN plus one number, and a class of words that can take different roles or functions may be assigned DYN plus another number.

As an example, in the dynamic lexicon the tag of will is DYN4, and the tag of plan, design, and work is DYN5. Therefore, according to the present disclosure, when a DYN4 or DYN5 tag is encountered during translation, the logic rules of the corresponding formula (formula DYN4 or formula DYN5) are used to compute what function and part of speech the word has, when it should be translated, and into what Chinese. Each is illustrated below in simplified form; the actual formulas are more complex:

DYN4 formula

1. Search 1 word to the left.

2. If PREN (article) is found, translate as a noun:

They found the lost will.

3. If not, translate as an auxiliary verb: They will find it.
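The simplified DYN4 formula can be sketched over a sequence of tags. Taken literally, a one-word search does not reach the article in "They found the lost will", because the adjective lost sits in between; this sketch therefore steps over adjective tags before checking for PREN. That skip, and the adjective tag name JJ, are assumptions for illustration and may differ from how the fuller formula handles the case.

```python
from typing import FrozenSet, Sequence

# Hypothetical tag for adjectives; the text does not name one.
ADJECTIVE_TAGS: FrozenSet[str] = frozenset({"JJ"})


def nearest_left_tag(tags: Sequence[str], i: int,
                     skip: FrozenSet[str] = ADJECTIVE_TAGS) -> str:
    """Return the first tag left of position i that is not in `skip` ('' if none)."""
    j = i - 1
    while j >= 0 and tags[j] in skip:
        j -= 1
    return tags[j] if j >= 0 else ""


def dyn4_formula(tags: Sequence[str], i: int) -> str:
    """Simplified DYN4 (e.g. 'will'): article on the left -> noun, else auxiliary."""
    return "noun" if nearest_left_tag(tags, i) == "PREN" else "auxiliary verb"
```

With hypothetical tags, `dyn4_formula(["PRON", "VERB", "PREN", "JJ", "DYN4"], 4)` returns `"noun"`, while `dyn4_formula(["PRON", "DYN4", "VERB", "PRON"], 1)` returns `"auxiliary verb"`.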

DYN5 formula

1. Search 1 word to the left.

2. If PREN is found, e.g. "They like ^PRENthe plan.",

translate as a noun: they like this plan.

3. If VBA (auxiliary verb) is found, e.g. "They ^VBAshould design it now.",

translate as a verb: they should design it now.

4. If a noun is found, e.g. "The ^NNconstruction design started.",

proceed to the next step.

5. Search 1 more word to the left.

6. If PREN is found, translate as a noun: the architectural design began.
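The DYN5 steps translate into a short decision function over tags. Returning `None` when no step decides (for example, a noun on the left that is not preceded by an article) is an assumption: the text does not say what happens in that case, and falling back to the default version is one plausible choice. Tag names other than PREN, VBA, NN, and DYN5 in the tests are hypothetical.

```python
from typing import Optional, Sequence


def dyn5_formula(tags: Sequence[str], i: int) -> Optional[str]:
    """Simplified DYN5 (e.g. plan/design/work); None means the formula is undecided."""
    left = tags[i - 1] if i >= 1 else None       # step 1: one word to the left
    if left == "PREN":
        return "noun"                            # step 2: article -> noun
    if left == "VBA":
        return "verb"                            # step 3: auxiliary -> verb
    if left == "NN":
        further = tags[i - 2] if i >= 2 else None  # step 5: one more word left
        if further == "PREN":
            return "noun"                        # step 6: article + noun -> noun
    return None
```

For instance, `dyn5_formula(["PREN", "NN", "DYN5"], 2)` returns `"noun"`, matching the compound-noun case in steps 4 to 6.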

It should be noted that the above exemplified tags are only exemplary, and the corresponding tags may be set according to the properties of words, including parts of speech, roles, usages, and the like.

It should be noted that the word tags proposed in the present disclosure are primarily intended to indicate the attributes of a class of words accurately, so that during translation the way to translate and interpret that class of words can be determined automatically, quickly, and accurately from its tags. For example, if the tag indicates that a word does not need to be translated, the word may be skipped directly. If the tag indicates that the word requires translation, it is translated, and its paraphrase is determined accurately from its tag using a logical formula, as described above, without the manual judgment and selection required in the prior art.

Version number

A version number identifies one of the different translated versions of a word; the numbering runs as high as needed to cover the translated versions of all words in the lexicon, for example from 1 to 15. According to an embodiment, version numbers are set for each word to indicate accurately the various translation definitions that the word itself may have. For a given word, one version number corresponds to one paraphrase. Many logic formulas contain rules for determining the translated version number of a word, so that the algorithm can decide which translated version to use while executing the formula. This largely solves the problem that a polysemous English word, or even the same English word, requires different Chinese translations in different contexts. According to an embodiment, the version numbers of a word and the corresponding rules for determining them can be stored together in the database.

According to an embodiment, a word's translated versions are not simply those given in a dictionary; they are set for that specific word by analyzing and counting how it has been translated in translation records. This, too, is statistical information obtained according to objective rules. During translation, when a word is determined to have multiple translated versions, the logical relationship between it and its adjacent words is analyzed to determine the appropriate version, and the word's translation is obtained accurately from that version.

For example, the English word for must be translated into different Chinese words depending on its use. Therefore, in the present disclosure, each version of the English word for is analyzed and counted, and a corresponding version number is set. Thus, during translation, a more appropriate translation may be selected by identifying and selecting the appropriate tag (i.e., version number) for the English word for.

The following two simplified algorithm examples illustrate how an appropriate paraphrase is selected after recognizing that the target word for has version numbers:

Example 1

1. Search 1 word to the left of the target word.

2. If a noun such as demand/approve/need/requirement is found, match version 5, "对" ("toward, regarding"):

Example: Demand for exports continues to rise.

The demand for exports continues to rise.

3. If a verb such as is/was/were/are/be/stand/stood is found, match version 6, "代表" ("represents").

Example 2

The 'F' is for father.

F represents father.

It's S for sugar.

S represents sugar.
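The version selection for for can be sketched as a lookup on the word immediately to the left. The cue sets below are taken from the word lists in Example 1; matching on literal words rather than on tags is a simplification for illustration. Note that a strict one-word window sees "S", not the contracted "It's", in the second sentence of Example 2, so that case needs more than this sketch covers.

```python
from typing import Optional, Sequence

# Word lists from Example 1; a real system would test tags instead of strings.
NOUN_CUES = {"demand", "approve", "need", "requirement"}
VERB_CUES = {"is", "was", "were", "are", "be", "stand", "stood"}


def for_version(words: Sequence[str], i: int) -> Optional[int]:
    """Pick a version number for 'for' at position i; None means use the default."""
    left = words[i - 1].lower() if i > 0 else ""
    if left in NOUN_CUES:
        return 5
    if left in VERB_CUES:
        return 6
    return None
```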

In the present disclosure, one special case of a version number is the default version, which is the translated version used without reference to any context (i.e., without consulting the surrounding context in the translation). In a sense, the default version may also be regarded as a version number with a particular value, such as version 0.

According to embodiments of the present disclosure, the default version is not the most common version used in dictionaries or in the prior art, but the version used when no lexical formula applies. Every English word in the dynamic lexicon of the present application has its own default version. These default versions have been repeatedly verified and accumulated by the inventor over more than ten years of interpretation practice, have the widest range of application, and were also identified by the inventor through analysis and statistics of a large amount of historical data.

For example, the English word policy may be translated into Chinese either as a policy in the general sense or as an insurance policy. In the lexicon, the default version of policy is set to the general sense. During translation, the preceding context (the previous words) of policy in the input sentence is analyzed to determine automatically whether it should be translated with a paraphrase other than the default, such as insurance policy.
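A minimal sketch of the default-version fallback for policy is shown below. The version table and the insurance cue words are hypothetical, since the text does not list the actual rule; the three-word window follows the preferred left-search limit given in the logical-analysis section. Version 0 plays the role of the default version.

```python
from typing import Sequence

DEFAULT_VERSION = 0
# Hypothetical version table and cue words for "policy".
POLICY_VERSIONS = {0: "policy (general sense)", 1: "insurance policy"}
INSURANCE_CUES = {"insurance", "insurer", "premium"}


def policy_version(words: Sequence[str], i: int, window: int = 3) -> int:
    """Check up to `window` preceding words; otherwise keep the default version."""
    preceding = {w.lower() for w in words[max(0, i - window):i]}
    return 1 if preceding & INSURANCE_CUES else DEFAULT_VERSION
```

Only when a cue is heard does the translator leave the default; otherwise no lookup beyond the window is needed.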

According to an embodiment, each word's version information may be retrieved by default, which can moderately reduce the amount of information that must be stored.

According to an embodiment, specific information may be used to indicate whether a word has multiple versions, for example a binary bit, where 1 means multiple versions and 0 means none. The version status of the word can then be determined by checking this version identification information during translation, which somewhat improves translation efficiency.
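Since tags, version numbers, and the multi-version flag are each described as being represented with a certain number of bits, they can be packed into a single integer per word. The field widths here are assumptions: 4 bits for the version number covers the example range 1 to 15 plus 0 for the default version, and 8 bits for the tag allows up to 256 tag classes.

```python
from typing import Tuple

TAG_BITS = 8      # assumed width: up to 256 tag classes
VERSION_BITS = 4  # assumed width: versions 0 (default) to 15


def pack(tag_id: int, version: int, multi_version: bool) -> int:
    """Layout (high to low bits): tag id | version number | multi-version flag."""
    if not 0 <= tag_id < (1 << TAG_BITS):
        raise ValueError("tag_id out of range")
    if not 0 <= version < (1 << VERSION_BITS):
        raise ValueError("version out of range")
    return (tag_id << (VERSION_BITS + 1)) | (version << 1) | int(multi_version)


def unpack(code: int) -> Tuple[int, int, bool]:
    """Inverse of pack: recover (tag id, version number, multi-version flag)."""
    multi_version = bool(code & 1)
    version = (code >> 1) & ((1 << VERSION_BITS) - 1)
    tag_id = code >> (VERSION_BITS + 1)
    return tag_id, version, multi_version
```

Checking `code & 1` alone tells the translator whether a version search is needed, without decoding the other fields.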

According to embodiments, version information for a vocabulary may be considered to be encompassed in a vocabulary tag such that translation according to the tag will automatically identify the version information for the vocabulary. For example, for the DYN tag, version information can be automatically determined for the vocabulary identified by the tag, so that the part of speech of the vocabulary can be determined and the paraphrase of the vocabulary can be automatically determined at the same time, and thus the translation of the vocabulary can be determined quickly and accurately.

According to an embodiment, version information for the vocabulary may be used in conjunction with the aforementioned vocabulary tags. In particular, the tags may be determined first during translation, and then version information of the vocabulary may be determined during translation according to the tags, thereby further determining an appropriate version number of the vocabulary. For example, as in the DYN tag translation, after the part of speech and the like of the vocabulary corresponding to the DYN tag are appropriately determined, the most appropriate translation version number of the vocabulary is determined by logical analysis, and the most appropriate paraphrase is determined.

Preferably, according to an embodiment, the version number of the vocabulary may be incorporated into a vocabulary formula described later, i.e., a logical formula corresponding to the vocabulary formula may include determining the version number of the vocabulary by determining the relationship with adjacent vocabulary, thereby determining the paraphrasing of the vocabulary. The lexical formula will be described in detail below.

Dynamic lexicon

The vocabulary tags set according to the vocabulary attributes may be stored in a dynamic lexicon, so that they can be retrieved from it during translation for recognition and logical analysis. In general, the dynamic lexicon is a database with two modules: a word library and a phrase library. As the names imply, the former contains only words and the latter only phrases; together they are referred to hereinafter as the dynamic word stock or lexicon. As an example, all vocabulary stored in the dynamic lexicon, in particular words, is preprocessed and provided with tags and corresponding paraphrases.

According to an embodiment, the tag formulas and/or lexical formulas of the vocabulary may be stored in the dynamic lexicon. According to further embodiments, the dynamic lexicon may store only the correspondence between each word and its tag formulas and/or vocabulary formulas, while the formulas themselves are stored in a further database; during operation, the word's tag or similar information is then mapped to that further database to select the corresponding formula.
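The two-module lexicon, with formulas kept in a separate store keyed by tag, might be modeled as below. This is a sketch under assumptions: the class and field names, the "UNK" tag for unknown words, and the convention that a formula returns a version number (or None to keep the default, version 0) are all hypothetical choices for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Sequence, Tuple

# A formula inspects the tags of the words heard so far and returns a version
# number, or None when it does not decide.
Formula = Callable[[Sequence[str], int], Optional[int]]


@dataclass
class Entry:
    tag: str
    versions: Dict[int, str]  # version number -> paraphrase; 0 is the default


@dataclass
class DynamicLexicon:
    words: Dict[str, Entry] = field(default_factory=dict)               # word library
    phrases: Dict[Tuple[str, ...], Entry] = field(default_factory=dict)  # phrase library
    formulas: Dict[str, Formula] = field(default_factory=dict)          # separate store keyed by tag

    def tag_of(self, word: str) -> str:
        entry = self.words.get(word.lower())
        return entry.tag if entry else "UNK"  # hypothetical tag for unknown words

    def phrase(self, words: Sequence[str]) -> Optional[Entry]:
        return self.phrases.get(tuple(w.lower() for w in words))

    def paraphrase(self, words: Sequence[str], i: int) -> str:
        """Map the word's tag to its formula, run it, and fall back to version 0."""
        entry = self.words[words[i].lower()]
        formula = self.formulas.get(entry.tag)
        version = formula([self.tag_of(w) for w in words], i) if formula else None
        if version not in entry.versions:
            version = 0
        return entry.versions[version]
```

Keeping formulas in their own dictionary mirrors the embodiment in which only the word-to-formula correspondence lives in the lexicon, so a formula shared by a whole tag class is stored once.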

A lexicon in the conventional sense (hereinafter, a conventional lexicon) is an English-Chinese word list with fixed entries, fixed definitions, and fixed translated versions. For words with multiple parts of speech it merely lists the parts of speech, and the human brain must determine which part of speech a word has in a specific sentence, which definition applies, and which translated version is appropriate. Moreover, because words and phrases are listed separately, the human brain must also determine when a word stands alone and when it is part of a phrase. A conventional lexicon can neither determine part of speech or phrase membership automatically nor supply a translated version automatically. That is, it is static: it records only the correspondences between words and their translated versions, offering various options and relying entirely on the human brain to judge which one applies.

In contrast, the tag setting of the present disclosure considers the attributes expressed by the vocabularies in the translation process, including, for example, part of speech, word meaning, action, common expressions, etc., so that more comprehensive and more diverse information can be stored for each vocabulary in the dynamic lexicon, and the retrieval from the dynamic lexicon is directly facilitated in the translation process, so that a suitable paraphrase can be automatically and accurately provided without human judgment.

According to an embodiment, the dynamic lexicon according to the present disclosure may further contain the translated versions of the vocabulary (with corresponding version numbers), including in particular the default version of each word (which may also be given a version number with a particular value).

Logical analysis

According to an embodiment, through logical analysis, the attributes of each word, starting from the first word of the input sentence, can be identified (e.g., by identifying its tag), and based on them it is decided whether to translate the word directly or to further analyze its logical relationship with adjacent words. Where the logical relationship needs to be analyzed, a specified number of words is searched to the left of the target word (i.e., words already heard) or to the right (i.e., words heard next). The corresponding logical formula/rule is then selected according to the attributes of the target word and the conditions satisfied by the search result (for example, whether a searched word has a specific part of speech, a specific type of word sense, or a specific type of role, which can be indicated by a tag or other information).

This accurately determines the target word's position in the translated sentence and its paraphrase, which is then output; the next word is translated in the same way, so that translation finishes at almost the same time as the sentence input. The number of words searched to the left is not particularly limited but is preferably no more than 3; the search to the right does not exceed 3 words, which is key to keeping the lag within 1 second.
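The bounded search windows can be illustrated with a streaming loop that emits each word's translation as soon as its right-context window has been heard. This is a sketch under assumptions: the `decide` callback is a hypothetical stand-in for formula selection, and lag is measured in words rather than seconds. The yielded index shows how far the input has advanced when each translation is produced.

```python
from collections import deque
from typing import Callable, Deque, Iterable, Iterator, List, Tuple

MAX_LEFT = 3   # preferred upper bound on left context, from the text
MAX_RIGHT = 3  # upper bound on right context, from the text

# decide(left_context, target_word, right_context) -> translation
Decide = Callable[[List[str], str, List[str]], str]


def synchronized(stream: Iterable[str], decide: Decide,
                 lookahead: int = 1) -> Iterator[Tuple[int, str]]:
    """Yield (index of the latest word heard, translation) for every input word."""
    if not 0 <= lookahead <= MAX_RIGHT:
        raise ValueError("lookahead must be between 0 and MAX_RIGHT")
    heard: List[str] = []
    pending: Deque[int] = deque()  # indices still waiting for right context
    for word in stream:
        heard.append(word)
        pending.append(len(heard) - 1)
        # Emit every pending word whose right-context window is now complete.
        while pending and (len(heard) - 1) - pending[0] >= lookahead:
            i = pending.popleft()
            left = heard[max(0, i - MAX_LEFT):i]
            yield len(heard) - 1, decide(left, heard[i], heard[i + 1:i + 1 + lookahead])
    # End of sentence: emit the rest with whatever right context exists.
    while pending:
        i = pending.popleft()
        yield len(heard) - 1, decide(heard[max(0, i - MAX_LEFT):i], heard[i], heard[i + 1:])
```

With `lookahead=0` every word is emitted the moment it arrives; each extra word of lookahead delays output by exactly one word, which is why capping the right-hand search keeps the lag small.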

This translation method, based on lexical logic analysis, can be regarded as an algorithmic improvement in artificial intelligence: it applies predefined data attributes to translate input sentences appropriately.

According to embodiments, a logical rule/formula specifies how translation is performed based on the attributes of a word and/or the logical relationships between the word and adjacent words. In particular, it may specify when a word is translated and/or which paraphrase is used for a given attribute and/or logical relationship.

According to an embodiment, a logical rule/formula may select a word's paraphrase based on the properties of the word itself. In particular, the paraphrase may be selected directly from the word's tags, such as the tags for part of speech and role mentioned above. As an example, a specific version or the default version of a word may be selected directly from its tags, such as the static tags described above, i.e., tags indicating a static part of speech, role, or function.

According to an embodiment, a logical rule/formula may determine the appropriate way to translate a word, and select a suitable paraphrase, from the logical relationship between that word and its adjacent words. Preferably, this logical relationship refers to attributes of the adjacent words, such as part of speech, word sense, or role, considered relative to the word to be translated. The logical rule/formula then corresponds to the detected attributes of the adjacent words, so that an appropriate paraphrase can be determined from them.

In the present disclosure, the logic rules/formulas were established by the inventor by analyzing and compiling statistics over a large number of translation records. They are the result of objective analysis and follow the objective regularities of language translation.

According to embodiments, the logic rules/formulas of the present disclosure may include tag formulas, lexical (vocabulary) formulas, and the like. All of these are exemplary implementations of the logical analysis of the present disclosure.

The logical analysis is not limited to these. Any rule that analyzes the logical relationship between the target word and adjacent words in order to obtain the target word's paraphrase accurately falls within the logical analysis defined in the present disclosure. The logic rules/formulas may be stored in a corresponding database and invoked directly from it during translation.

A tag formula is a set of logical inference rules that indicate how the translation operation is performed when a word with a particular tag is encountered. Its main purpose is to determine the timing of translation more accurately. This effectively solves the word-order problem and determines each word's paraphrase accurately while largely preserving synchronicity.

For a particular tag, the tag formula may specify how to translate when certain conditions are met. For example, the word may stand in a certain relationship to a certain number of adjacent words, or one of those adjacent words may be of a particular type. During translation, when a word with a specific tag is encountered, the word and/or its relationship to adjacent words is analyzed according to the corresponding tag formula, and translation proceeds accordingly once the condition is satisfied.

According to embodiments, when a particular tag is encountered, the tag formula may indicate that the relationship between the word and a particular number of adjacent words (e.g., preceding or following words) is to be analyzed. This preferably covers the part of speech, word sense, function, and role of the adjacent words, as indicated by their tags or other information. The formula may also specify how the word is translated based on the result: for example, whether to translate it now or later, and if later, at which position.

In particular, for a particular tag, the tag formula may specify an action for the case where the target word and a particular number of adjacent words satisfy a particular condition. Such a condition may be that an adjacent word has a particular attribute such as part of speech, word sense, or function, or carries a particular tag or other information. The action is one of the following:

- hold the word untranslated until a particular word, detected to the left or right, triggers its translation, and then place the paraphrase in the proper position;
- call the corresponding lexical formula;
- omit translation;
- translate the word into the appropriate paraphrase.
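A tag formula of this kind can be represented as an ordered list of clauses. Each clause pairs a condition on neighbouring words with one of the four outcomes listed above. The sketch below is a hypothetical encoding, not the disclosure's storage format. The `Clause` fields, the first-match-wins policy, and the example formula `DYN_EXAMPLE` are all assumptions for illustration.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import FrozenSet, List, Optional, Sequence, Tuple


class Action(Enum):
    DEFER = auto()         # hold the word until a trigger word is detected
    CALL_LEXICAL = auto()  # hand the word over to its lexical formula
    OMIT = auto()          # leave the word untranslated
    TRANSLATE = auto()     # output a paraphrase now


@dataclass(frozen=True)
class Clause:
    direction: str          # "left" (already heard) or "right" (heard next)
    span: int               # number of adjacent words to inspect
    wanted: FrozenSet[str]  # condition: some inspected word has one of these tags
    action: Action
    paraphrase: Optional[str] = None


def apply_tag_formula(clauses: Sequence[Clause],
                      left: List[FrozenSet[str]],
                      right: List[FrozenSet[str]]) -> Tuple[Action, Optional[str]]:
    """Return the action of the first clause whose condition holds.

    `left` and `right` hold the tag sets of neighbouring words in sentence
    order, so the nearest neighbours are left[-1] and right[0].
    """
    for c in clauses:
        if c.direction == "left":
            window = left[max(0, len(left) - c.span):]
        else:
            window = right[:c.span]
        if any(tags & c.wanted for tags in window):
            return c.action, c.paraphrase
    return Action.TRANSLATE, None  # no clause fired: use the default version


# Hypothetical formula for an ambiguous (DYN-tagged) word.
DYN_EXAMPLE = [
    Clause("right", 1, frozenset({"NN"}), Action.TRANSLATE, "determiner reading"),
    Clause("right", 3, frozenset({"VB", "PAST"}), Action.DEFER),
]
```

For instance, `apply_tag_formula(DYN_EXAMPLE, [], [frozenset({"NN"})])` selects the determiner reading at once. If a verb appears within three words to the right instead, the word is deferred.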

In addition, tag formulas can resolve word ambiguity to a certain extent, because the paraphrase of the word to be translated can be determined from the attributes of adjacent words. For example, for a word tagged DYN as described above, the corresponding tag formula can determine the word's part of speech from the attributes of neighbouring words, and thereby its paraphrase. Even words whose translation varies with context are therefore translated accurately.

Some exemplary tag formulas for different tags are shown above and in the following tables. These are merely examples, and the tag formulas of the present disclosure are not limited to them.

A lexical (vocabulary) formula consists of logical inference rules for a specific word, indicating how the translation operation is performed when that word is encountered. It mainly specifies which paraphrase to select when the word itself and/or its relationships to adjacent words satisfy given conditions.

During translation, when a word covered by a lexical formula is encountered, its meaning and/or relationship to adjacent words is analyzed according to that formula, and translation proceeds accordingly once the conditions are met. The version of the word to use in context can therefore be decided from the result of the logical analysis, which resolves word ambiguity.

In particular, when a particular word is encountered, the lexical formula may indicate that the relationship between that word and a particular number of adjacent words is analyzed and the most appropriate paraphrase is selected. The relationship may concern the word senses, parts of speech, functions, or roles of the adjacent words.

For example, the lexical formula for a certain word may specify that a certain number of words or symbols before or after it are analyzed. A particular translation version is then selected when a condition is satisfied, such as one of those words having a certain word sense, part of speech, function, or role, or a symbol being of a certain type. In live simultaneous interpretation, a symbol such as a period corresponds to a pause of a certain length.

For example, the English word "back" must be rendered as "come back" in the first sentence below and as "return to" in the second:

They were back.

They came back.

They were back in Beijing.

They returned to Beijing.

By way of example, the lexical formula for the English word "back" is as follows (a simplified version; the actual formula is more complex):

1. Search to the right of the target word "back".

2. If a period is found (corresponding to a particular pause in speech), translate it as "come back".

3. If a preposition is found, translate it as "return to".
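The simplified formula for "back" amounts to a short rightward scan. In this Python sketch, the preposition list, the set of pause marks, the three-word limit, and the English glosses "come back" and "return to" are assumptions. They stand in for the lexicon data and Chinese renderings of the original.

```python
from typing import Optional, Sequence

PREPOSITIONS = {"in", "to", "at", "on", "into", "from"}  # assumed, not exhaustive
PAUSE_MARKS = {".", "?", "!"}  # stand-ins for a pause of a given length in speech


def lexical_formula_back(right: Sequence[str], limit: int = 3) -> Optional[str]:
    """Simplified lexical formula for "back"; returns an English gloss of the choice."""
    for token in right[:limit]:          # step 1: search rightwards, nearest first
        if token in PAUSE_MARKS:
            return "come back"           # step 2: a pause follows "back"
        if token.lower() in PREPOSITIONS:
            return "return to"           # step 3: a preposition follows
    return None  # undecided: leave the choice to other rules or the default version


print(lexical_formula_back(["."]))                   # come back
print(lexical_formula_back(["in", "Beijing", "."]))  # return to
```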

As an example, the version numbers set for words, described above, can be regarded as a special implementation of a lexical formula. For a word with version information, the relationship between the word and its adjacent words is first analyzed to determine the version number, and the word is then translated according to that version.

As another example, version numbers may be applied together with lexical formulas. The version number of the word and its associated paraphrase are determined first. If the word has no specific version number, or the logical analysis of the word and its adjacent words does not match the logic associated with any version number, the lexical formula is used to determine the paraphrase.

Lexical formulas and tag formulas serve to determine accurately how each English word in a sentence is translated. They do this through strict logical analysis and calculation, e.g., of the attributes of the word itself and its relationships with adjacent words. In the example above, no grammatical or lexical rule specifies how "back" should be translated; a human translator has to resolve it by reasoning.

Some exemplary logic rules/formulas for different tags have been shown above. The logic rules/formulas of the present disclosure are not limited to them and may include many others.

The terms used in this application are merely labels. What they are intended to convey is the substance of the concepts, procedures, functions, logic, effects, and so on.

The translation process according to embodiments of the present disclosure will be described in further detail below.

Exemplary translation flow

Because input speech arrives word by word in order, translation is likewise performed word by word, starting from the first word input. For example, when the input is a sentence, translation proceeds step by step from the first (leftmost) word while the sentence is still being input.

First, for each word, the system determines whether the word is part of a phrase. If not, the word is passed on for translation processing. If so, recognition moves on to the next word, and so on until the whole phrase has been identified; the phrase is then passed on for translation processing. Recognized words and phrases are both referred to below as words.

According to the present disclosure, phrase identification determines whether a word is used on its own or as a component of a phrase. Whenever a word that may begin a phrase is encountered (e.g., "a"), the next word to the right is automatically checked against the lexicon to see whether it continues a phrase. If it does (e.g., "a lot"), the following word is checked in turn (e.g., "a lot of"). This continues until the next word no longer extends the phrase; "a lot of" is then identified as a phrase and the following word is treated as a separate word.

This procedure ensures that phrase recognition takes priority over word recognition. For example, in the lexicon "a" by itself is simply an indefinite article. Here, however, it is an element of the phrase "a lot of", so it is recognized as an integral part of "a lot of" rather than as a standalone article.
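Giving phrases priority over single words in this way amounts to a greedy longest match over the phrase entries of the lexicon. The Python sketch below assumes the lexicon's phrases are available as plain strings (`LEXICON_PHRASES` is hypothetical). For simplicity it works on a list of already-heard words. Like the procedure in the text, it looks further right only while the words so far can still begin a phrase.

```python
from typing import Iterable, List, Set, Tuple


def build_prefixes(phrases: Set[Tuple[str, ...]]) -> Set[Tuple[str, ...]]:
    """All leading parts of every phrase, e.g. ("a",) and ("a", "lot") for "a lot of"."""
    return {p[:k] for p in phrases for k in range(1, len(p) + 1)}


def segment(words: List[str], phrase_list: Iterable[str]) -> List[str]:
    """Split words into units, preferring the longest phrase that starts at each word."""
    phrases = {tuple(p.split()) for p in phrase_list}
    prefixes = build_prefixes(phrases)
    units, i = [], 0
    while i < len(words):
        best = 1  # fall back to a single word
        j = i + 1
        # keep looking right while the words so far could still begin a phrase
        while j <= len(words) and tuple(words[i:j]) in prefixes:
            if tuple(words[i:j]) in phrases:
                best = j - i
            j += 1
        units.append(" ".join(words[i:i + best]))
        i += best
    return units


LEXICON_PHRASES = {"a lot", "a lot of"}  # hypothetical phrase entries
print(segment("they have a lot of friends".split(), LEXICON_PHRASES))
# ['they', 'have', 'a lot of', 'friends']
```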

Translation processing is then performed on each recognized word: its tags are recognized, and operations are performed according to the result.

According to embodiments of the present disclosure, a series of operations can be performed during translation to select a translation more accurately, for example:

1. Search: look for a given tag or word to the left or right.

2. Match: look up the word in the lexicon and convert the source text into the translation; a version number can be specified when matching.

3. Memorize: temporarily store a word and later insert it, converted into its translation, at a specified position; or insert at a specified position the translation of a word that does not appear in the source.

4. Ignore: skip the current word without matching it.

5. Execute a logic formula: execute the specified formula; and so on.

These operations are merely examples and are not exhaustive. Moreover, to meet the synchronicity requirement, some operations, such as memorizing and inserting, span only a small number of consecutive words. This prevents too many words from lagging behind in translation.
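The memorize-and-insert mechanism lets the Chinese output follow Chinese word order while the English is still arriving. The Python sketch below is hypothetical. `OutputBuilder`, its slot names, and the two-slot memory limit are assumptions, and the three-entry lexicon is a toy. It replays the sequence that formulas VB (step 4.6) and TM (step 1.2), given later in this section, prescribe for "They disappeared last week."

```python
from typing import Dict, List


class OutputBuilder:
    """Hypothetical helper applying the match / memorize / insert / ignore operations."""

    def __init__(self, lexicon: Dict[str, List[str]], max_memory: int = 2):
        self.lexicon = lexicon        # word -> list of versions (index 0 = default)
        self.out: List[str] = []
        self.memory: Dict[str, str] = {}
        self.max_memory = max_memory  # keep held words few, to bound the lag

    def match(self, word: str, version: int = 0) -> None:
        self.out.append(self.lexicon[word][version])

    def memorize(self, slot: str, word: str) -> None:
        if len(self.memory) >= self.max_memory and slot not in self.memory:
            raise RuntimeError("too many held words; translation would lag")
        self.memory[slot] = word

    def insert(self, slot: str, version: int = 0) -> None:
        self.match(self.memory.pop(slot), version)  # translate the held word here

    def ignore(self, word: str) -> None:
        pass  # skip the word without producing output

    def text(self) -> str:
        return "".join(self.out)


lex = {"they": ["他们"], "disappeared": ["消失了"], "last week": ["上周"]}
b = OutputBuilder(lex)
b.match("they")                    # 他们 (they)
b.memorize("verb", "disappeared")  # formula VB 4.6: hold the verb before a TM word
b.match("last week")               # formula TM 1.2.1: match the time expression
b.insert("verb")                   # formula TM 1.2.2: insert the memory verb
print(b.text())                    # 他们上周消失了
```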

According to an embodiment, the tags of the word to be translated are recognized. Where necessary, the logical relationship between the word and its adjacent words is analyzed to determine which Chinese rendering to use, i.e., to select the translation.

If the word is covered by a tag formula, its translation is derived logically from that formula, which determines the proper translation timing and/or a more suitable rendering. If not, the system may check whether the word satisfies other rules/formulas, such as a lexical formula or a version number. If no such rule/formula applies, the default rendering of the word or phrase may be selected directly.

According to an embodiment, once it has been determined whether a tag formula applies, the system immediately determines whether a lexical formula applies, according to the predetermined logic rules of the lexical formulas. If the word or phrase is covered by a lexical formula, it is translated according to that formula, as in the exemplary operations described above.

If the word or phrase is not covered by any lexical formula, its default version can be used directly as the translation.
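The order of checks described above can be sketched as a single function. In this hypothetical Python version, a formula is modelled as a function of the right-hand context that either returns a paraphrase or declines with `None`. A lexical formula that fires overrides a tag formula's choice, and the default version is used only when neither decides. The names, the override policy, and the toy rule for "back" are assumptions. Version-number selection is left out.

```python
from typing import Callable, Dict, Optional, Sequence

# A formula inspects the right-hand context and returns a paraphrase, or None.
Formula = Callable[[Sequence[str]], Optional[str]]


def choose_paraphrase(word: str,
                      tags: Sequence[str],
                      right: Sequence[str],
                      tag_formulas: Dict[str, Formula],
                      lexical_formulas: Dict[str, Formula],
                      default_versions: Dict[str, str]) -> str:
    """Tag formula first, then lexical formula, then the default version."""
    result: Optional[str] = None
    for tag in tags:                      # 1. any tag formula for the word's tags
        formula = tag_formulas.get(tag)
        if formula is not None:
            result = formula(right) or result
    formula = lexical_formulas.get(word)  # 2. lexical formula, applied afterwards
    if formula is not None:
        result = formula(right) or result
    if result is None:                    # 3. neither decided: default version
        result = default_versions[word]
    return result


def back_rule(right: Sequence[str]) -> Optional[str]:
    """Toy lexical formula: "back" followed by "in" means "return to"."""
    return "return to" if right and right[0] == "in" else None


LEXICAL = {"back": back_rule}
DEFAULTS = {"back": "come back", "house": "house"}
print(choose_paraphrase("back", ["ADV"], ["in", "Beijing"], {}, LEXICAL, DEFAULTS))  # return to
print(choose_paraphrase("back", ["ADV"], ["."], {}, LEXICAL, DEFAULTS))              # come back
```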

Thus, according to the embodiment, each word to be translated is checked to determine whether a tag formula or a lexical formula applies. If a tag formula applies, it is executed; if a lexical formula applies, it is executed; if both apply, the tag formula is executed first and the lexical formula immediately afterwards. If neither applies, the word's version number may be considered, or, if the word has no multiple versions, the word is translated directly into its default version.

After each word is translated, processing moves automatically to the next word, rolling forward in the order of the input sentence until translation is complete. The translation process is illustrated by the following sentence examples:

Example 1:

They bought a house in London.

They bought a house, in London.

Example 2:

Take a longer sentence as an example:

It is a pleasure to be back in Beijing so soon after my successful visit last November.

This is a happy event that can go back to 11 months before Beijing was so much before my successful visit. The steps are shown schematically in the following table:

Of course, the above table is a summary: it lists only the results of executing the formulas and does not show every calculation step in them.

The above are translation examples according to the present disclosure. Although many English sentences are more complex than these examples, the logic formulas and calculation flow they require are the same.

Formula example

The following illustrates implementations of some of the logical formulas of the present disclosure. These are merely examples. Any logical rule/formula that translates words based on their attribute identification information and their relationships to neighbouring words (e.g., the attributes of those neighbouring words) falls within the same inventive concept.

Appendix 1 - Formula TM

The tag "TM" stands for time.

1. Search: one word to the left of the target word

1.1. If NN, execute step 2

1.2. If VB/PAST/PRES:

1.2.1. Match the target word

1.2.2. Insert the memory verb

They disappeared last week.

They disappeared in the last week.

1.3. If not, match the target word.

(A VB on the left will already have been handled by formula VB.)

2. Search: six words to the left

2.1. If PAST:

2.1.1. Insert a comma

2.1.2. Match the target word

2.1.3. Insert the memory verb

2.1.4. Insert "of"

They bought a car last November.

They bought one car, in November of last year.

2.2. If VB/PRES:

2.2.1. Insert a comma

2.2.2. Match the target word

2.2.3. Insert the memory verb

They buy a car every summer.

They buy one car, each in summer.

They will be buying a car every summer.

They will buy a car every summer.

2.3. If DYN8/that:

2.3.1. Insert a comma

2.3.2. Insert "that is"

2.3.3. Match the target word

…after my excellent visit last November.

Previously my excellent visit, that was last November.

They bought the place last November.

They purchased that place, which was last November.

2.4. If not:

2.4.1. Match the target word

2.4.2. Insert "of"

2.4.3. Insert F1(NN)

The group discussion last week was about the system.

Panel discussion the last week of discussion was about the system.

The discussion group meeting last week ended early.

The conference in the last week of the discussion group conference ends very early.
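The branch structure of formula TM can be written as a function that returns the list of operations to perform. This Python sketch makes several assumptions the text does not state:

- Tokens are (word, tags) pairs.
- The six-word leftward search is scanned nearest-first, and the first word matching any of steps 2.1 to 2.3 decides.
- Step 2.4 is the fallback when none of steps 2.1 to 2.3 applies.

Operation names are descriptive strings, not a real API.

```python
from typing import FrozenSet, List, Tuple

Token = Tuple[str, FrozenSet[str]]  # (word, tags)


def formula_tm(left: List[Token]) -> List[str]:
    """Operations formula TM prescribes for a time expression (the target word).

    `left` holds the words already heard, in sentence order.
    """
    nearest = left[-1][1] if left else frozenset()
    # Step 1: inspect the single word to the left.
    if "NN" not in nearest:
        if nearest & {"VB", "PAST", "PRES"}:
            return ["match target", "insert memory verb"]  # 1.2
        return ["match target"]  # 1.3
    # Step 2: a noun precedes the time expression; scan up to six words, nearest first.
    for word, tags in reversed(left[-6:]):
        if "PAST" in tags:  # 2.1
            return ["insert comma", "match target", "insert memory verb", 'insert "of"']
        if tags & {"VB", "PRES"}:  # 2.2
            return ["insert comma", "match target", "insert memory verb"]
        if "DYN8" in tags or word.lower() == "that":  # 2.3
            return ["insert comma", 'insert "that is"', "match target"]
    return ["match target", 'insert "of"', "insert F1(NN)"]  # 2.4 (assumed fallback)


T = frozenset
print(formula_tm([("They", T({"PRP"})), ("disappeared", T({"PAST"}))]))
# ['match target', 'insert memory verb']
print(formula_tm([("They", T({"PRP"})), ("bought", T({"PAST"})),
                  ("a", T({"DT"})), ("car", T({"NN"}))]))
# ['insert comma', 'match target', 'insert memory verb', 'insert "of"']
```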

Appendix 2 - Formula VB

Target words: VB/PAST/DYN9 + PAST, i.e., verbs and their forms

3. Search: six words to the left (to determine whether the sentence is a question or a statement; if one of the following is found, it is treated as a question)

3.1. If how + ADV:

3.1.1. Match how, version 3 (how much)

3.1.2. Match the ADV

3.1.3. Match the target word

This shows just how quickly China is developing its industry.

This just shows (how fast China is developing).

3.2. If how + ADJ:

3.2.1. Match each word to the right of F1 up to the target word (i.e., the NN subject after how + ADJ)

3.2.2. Match the target word (VB)

3.2.3. Match F1-1, version 3 (how much)

3.2.4. Match F1-2 (ADJ)

This shows just how important China is.

This just shows (how important China has been).

3.3. If how:

3.3.1. Match each word to the right of F1 up to the target word (i.e., the NN subject after how)

3.3.2. Match F1 (how)

3.3.3. Match the target word (VB)

They told us how the team managed to win.

They tell us how the team tries to win.

3.4. If DYN9/DYN4:

3.4.1. Match each word to the right of F1 up to the target word (i.e., the NN subject after DYN9)

3.4.2. Match F1 (DYN4/DYN9)

3.4.3. Match the target word (VB)

Has the team won?

Has the team won?

Will the team win?

Will the team win?

3.5. If not (statement), execute step 4

4. Search: one word to the right of the target word

4.1. If and:

4.1.1. Match the target word

4.1.2. Match F1-1, version 2 (and)

4.1.3. Memorize the word as the memory verb

They nod and leave quickly and quietly at the end.

They nod and (go quickly and quietly, go at end.)

4.2. If ADV:

4.2.1. Memorize the target word as the memory verb

4.2.2. Execute formula ADV (which looks left for the predicate)

They nod and leave quickly and quietly at the end.

They nod and leave quickly and silently end up.

4.3. If ADJ, execute formula ADJ (look left for a VB, then match):

4.3.1. Match the target word

4.3.2. Match F1

The show ended unfinished.

The performance is over.

4.4. If DYN3:

4.4.1. Match the target word

4.4.2. Execute formula DYN3

They studied to enter a university.

They learn (to enter university.)

4.5. If MD:

4.5.1. Memorize the target word as the memory verb

4.5.2. Execute formula MD (look left for a verb and insert it if found)

They ate at a local restaurant.

They eat at the local restaurant.

4.6. If TM:

4.6.1. Memorize the target word as the memory verb

4.6.2. Execute formula TM (look left for a verb and insert it if found)

They disappear over time.

They disappear over a period of time.

4.7. If PREN/NN/NNP/NNSO/NNPSO/DYN13:

4.7.1. Memorize the target word as the memory verb

4.7.2. Match the target word

4.7.3. Match F2(PREN/NN/NNP/NNSO/NNPSO/DYN13)

(If an ADV follows, formula ADV applies or the memory verb is inserted; if an NN follows, it is handled by formula NN.)

They told us the team won.

They tell us team (won.)

4.8. If not:

4.8.1. Match the target word

They nod.

They nod their heads.

They told us that the team won.

They tell us (team won.)
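Step 4 of formula VB is a first-match dispatch on the word immediately to the right of the verb. The Python sketch below assumes the checks are tried in the listed order, 4.1 to 4.8, and returns the step number with descriptive operation names. It does not implement the sub-formulas it hands off to (ADV, ADJ, DYN3, MD, TM), and it omits the step 3 question check.

```python
from typing import FrozenSet, List, Tuple

NOUN_LIKE = frozenset({"PREN", "NN", "NNP", "NNSO", "NNPSO", "DYN13"})


def formula_vb_step4(next_word: str, next_tags: FrozenSet[str]) -> Tuple[str, List[str]]:
    """Step 4 of formula VB: choose operations from the word right after the verb."""
    if next_word.lower() == "and":
        return "4.1", ["match target", "match F1-1 version 2", "memorize word as memory verb"]
    if "ADV" in next_tags:
        return "4.2", ["memorize target as memory verb", "execute formula ADV"]
    if "ADJ" in next_tags:
        return "4.3", ["match target", "match F1"]
    if "DYN3" in next_tags:
        return "4.4", ["match target", "execute formula DYN3"]
    if "MD" in next_tags:
        return "4.5", ["memorize target as memory verb", "execute formula MD"]
    if "TM" in next_tags:
        return "4.6", ["memorize target as memory verb", "execute formula TM"]
    if next_tags & NOUN_LIKE:
        return "4.7", ["memorize target as memory verb", "match target", "match F2"]
    return "4.8", ["match target"]  # nothing special follows the verb
```

For "They told us the team won.", the word after "told" is "us" (tagged PREN here by assumption), so step 4.7 applies.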

Fig. 6 is a block diagram illustrating an exemplary hardware configuration of a computer system 1000 in which embodiments of the invention may be implemented.

As shown in fig. 6, the computer system includes a computer 1110. The computer 1110 includes a processing unit 1120, a system memory 1130, a non-removable, non-volatile memory interface 1140, a removable, non-volatile memory interface 1150, a user input interface 1160, a network interface 1170, a video interface 1190, and an output peripheral interface 1195, which are connected by a system bus 1121.

The system memory 1130 includes a ROM (read only memory) 1131 and a RAM (random access memory) 1132. A BIOS (basic input output system) 1133 resides in ROM 1131. Operating system 1134, application programs 1135, other program modules 1136, and some program data 1137 reside in RAM 1132.

Non-removable non-volatile memory 1141 (such as a hard disk) is connected to non-removable non-volatile memory interface 1140. Non-removable non-volatile memory 1141 may store, for example, an operating system 1144, application programs 1145, other program modules 1146, and some program data 1147.

Removable nonvolatile memory, such as a floppy disk drive 1151 and a CD-ROM drive 1155, is connected to the removable nonvolatile memory interface 1150. For example, a floppy disk 1152 may be inserted into the floppy disk drive 1151, and a CD (compact disk) 1156 may be inserted into the CD-ROM drive 1155.

Input devices such as a mouse 1161 and keyboard 1162 are connected to the user input interface 1160.

The computer 1110 may be connected to a remote computer 1180 through a network interface 1170. For example, the network interface 1170 may be connected to a remote computer 1180 via a local network 1171. Alternatively, the network interface 1170 may be connected to a modem (modulator-demodulator) 1172, and the modem 1172 is connected to the remote computer 1180 via the wide area network 1173.

Remote computer 1180 may include a memory 1181, such as a hard disk, that stores remote application programs 1185.

The video interface 1190 is connected to a monitor 1191.

An output peripheral interface 1195 is connected to a printer 1196 and speakers 1197.

The computer system shown in FIG. 6 is illustrative only and is not intended to limit the invention, its application, or uses in any way.

For any of the embodiments, the computer system shown in Fig. 6 may be implemented as a standalone computer or as a processing system within a device. In either case, one or more unnecessary components may be removed, or one or more additional components added.

It should be noted that the methods and apparatus described herein may be implemented as software, firmware, hardware, or any combination thereof. Some components may be implemented as software running on a digital signal processor or microprocessor, for example. Other components may be implemented as hardware and/or application specific integrated circuits, for example.

In addition, the methods and systems of the present invention may be implemented in a variety of ways. For example, the methods and systems of the present invention may be implemented in software, hardware, firmware, or any combination thereof. The order of the steps of the method described above is merely illustrative and, unless specifically stated otherwise, the steps of the method of the present invention are not limited to the order specifically described above. Furthermore, in some embodiments, the present invention may also be embodied as a program recorded in a recording medium, including machine-readable instructions for implementing a method according to the present invention. The invention therefore also covers a recording medium storing a program for implementing the method according to the invention.

Those skilled in the art will appreciate that the boundaries between the operations described above are merely illustrative. Multiple operations may be combined into a single operation, a single operation may be distributed among additional operations, and operations may be performed at least partially overlapping in time. Moreover, alternative embodiments may include multiple instances of a particular operation, and the order of operations may be altered in various other embodiments. Other modifications, variations, and alternatives are also possible. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.

While the present invention has been described with reference to the exemplary embodiments, it is to be understood that the invention is not limited to the disclosed exemplary embodiments. The scope of the following claims is to be accorded the broadest interpretation so as to encompass all such modifications and equivalent structures and functions.

Although some specific embodiments of the present disclosure have been described in detail by way of example, it should be understood by those skilled in the art that the foregoing examples are for purposes of illustration only and are not intended to limit the scope of the present disclosure. The various embodiments disclosed herein may be combined in any combination without departing from the spirit and scope of the present disclosure. It will also be appreciated by those skilled in the art that various modifications may be made to the embodiments without departing from the scope and spirit of the disclosure.

Claims

1. An apparatus for converting a first sequence of expressions in a first language into a corresponding second sequence of expressions in a second language, the first sequence of expressions comprising at least one word, the apparatus comprising, for each word of the first sequence of expressions input in sequence, starting from the beginning:

a recognition unit that recognizes attribute identification information of the vocabulary; and

a translation unit that translates the vocabulary into an expression of the vocabulary in the second language based on the recognized attribute identification information of the vocabulary and a logical relationship between the recognized vocabulary and a certain number of adjacent vocabularies,

so that words can be converted in turn, from the beginning of the first sequence of expressions, into expressions of the second language.

2. The apparatus of claim 1, wherein the attribute identification information of the vocabulary is set based on an attribute of the vocabulary, and wherein the attribute of the vocabulary is determined by analyzing at least one of a part of speech, a word sense, a function, a role, and a morphology of the vocabulary.

3. The apparatus of claim 2, wherein the vocabulary is classified according to the determined attribute, and attribute identification information is specified for each class of vocabulary.

4. The apparatus of claim 1, wherein the vocabulary comprises any of:

word elements consisting of a single word;

a phrase element consisting of a specific number of adjacent words; and

a particular element consisting of a particular number of adjacent words that obey the same lexical criteria.

5. The apparatus of claim 4, wherein the recognition unit is configured to, for each word in the first sequence of expressions:

identify whether the word belongs to a phrase element; and

if the word does not belong to a phrase element, identify whether the word belongs to a word element or a particular element,

wherein the translation of the vocabulary is performed based on the recognition result.

6. The apparatus of claim 1, wherein the logical relationship comprises a logical relationship between the vocabulary and a predetermined number of vocabularies preceding the vocabulary and/or a predetermined number of vocabularies following the vocabulary.

7. The apparatus of claim 1, wherein the logical relationship refers to an attribute, selected from the group consisting of part of speech, word sense, role, and morphology, of a certain number of vocabularies adjacent to the recognized vocabulary, and

wherein a translation timing and/or a translation paraphrase of the recognized vocabulary is determined according to the attributes of the certain number of adjacent vocabularies.

8. The apparatus of claim 1, further configured to apply a corresponding logical formula to translate the vocabulary into an expression of the vocabulary in the second language based on the recognized attribute identification information of the vocabulary and a logical relationship between the recognized vocabulary and a certain number of adjacent vocabularies.

9. The apparatus of claim 8, wherein the attribute identification information is a tag of the vocabulary and the logical formula is a tag formula, and

wherein the translation timing and/or translation paraphrase of the recognized vocabulary is determined according to the logical relationship between the recognized vocabulary and the adjacent vocabularies, based on the tag formula corresponding to the tag of the recognized vocabulary.

10. The apparatus of claim 8, wherein the logical formula is a vocabulary formula, and

wherein a corresponding vocabulary formula is invoked according to the logical relationship between the recognized vocabulary and the adjacent vocabularies to determine the translation paraphrase of the recognized vocabulary.

11. A method for converting a first sequence of expressions in a first language into a corresponding second sequence of expressions in a second language, the first sequence of expressions comprising at least one vocabulary, the method comprising, for each word of the first sequence of expressions input in sequence, starting from the beginning:

identifying attribute identification information of the vocabulary; and

translating the vocabulary into an expression of the vocabulary in the second language based on the identified attribute identification information of the vocabulary and the logical relationship between the identified vocabulary and a certain number of adjacent vocabularies,

so that the vocabularies can be converted in turn, from the beginning of the first sequence of expressions, into expressions of the second language.

12. A device comprising at least one processor and at least one storage device having instructions stored thereon, which when executed by the at least one processor, cause the at least one processor to perform the method of any one of claims 1-10.

13. A storage medium having stored thereon instructions which, when executed by a processor, may cause performance of the method according to any one of claims 1-10.

14. An apparatus comprising means to perform the method of any one of claims 1-10.