CN107783965B

CN107783965B - Sentence pattern structure-based translation method and device

Info

Publication number: CN107783965B
Application number: CN201610797794.8A
Authority: CN
Inventors: Zhang Binglin (张炳林)
Original assignee: Shenzhen Qingfeng Quanneng Education Training Co ltd
Current assignee: Shenzhen Qingfeng Quanneng Education Training Co ltd
Priority date: 2016-08-31
Filing date: 2016-08-31
Publication date: 2021-07-02
Anticipated expiration: 2036-08-31
Also published as: CN107783965A

Abstract

The invention discloses a sentence pattern structure-based translation method and device. The method comprises: acquiring a source language text and determining the predicate in it; performing semantic analysis on the source language text, and determining one or more source language units of the text according to the analysis result and the position of the predicate; performing semantic analysis on each source language unit to determine the source language subunits it contains; rearranging the order of the source language units and the predicate according to the sentence pattern structure of the target language, and rearranging the order of the source language subunits within each source language unit; and translating the source language text into a target language text in the rearranged order. Because the method first determines the predicate and then arranges the components hierarchically, the order of the source language units and subunits matches the sentence pattern structure of the target language, so the translated target language text is more standard.

Description

Sentence pattern structure-based translation method and device

Technical Field

The invention relates to the technical field of translation methods, in particular to a sentence pattern structure-based translation method and device.

Background

With ever-deepening international communication and cooperation, the languages of different countries are used in daily life and work, so texts in various source languages often need to be translated into a target language to keep communication and collaboration between countries smooth. Because languages differ, each has its own grammar and sentence structure. For example, the English sentence "I study English at your school", translated word for word into Chinese, keeps the English order (literally "I study English at your school"), whereas Chinese grammar requires the place adverbial before the verb (literally "I at your school study English").

In the process of implementing the invention, the inventor finds that at least the following problems exist in the prior art:

when an existing translation method translates a source language into a target language, it can adjust the word order according to the grammar rules of the target language, but only for simple sentences; complex sentences are still translated in the word order of the source language. For example, with Chinese as the source language and English as the target language, consider a Chinese source sentence meaning "Forty students happily sang an English song ten times by the small river in the park at about eight o'clock last night". The existing method keeps the Chinese order, placing the time and place adverbials before the verb, whereas standard English places them after the object. Because complex sentences are translated in the source language order, the word order of the translated target language text is disordered and may even be ungrammatical. For instance, the output of one translation software for the Chinese sentence above contains two predicate verbs, "is" and "sang".

The information disclosed in this background section is only for enhancement of understanding of the general background of the invention and should not be taken as an acknowledgement or any form of suggestion that this information forms the prior art already known to a person skilled in the art.

Disclosure of Invention

The invention aims to provide a sentence pattern structure-based translation method and device that overcome the inability of conventional translation methods to translate complex sentences effectively.

The translation method based on the sentence pattern structure provided by the embodiment of the invention comprises the following steps:

acquiring a source language text and determining a predicate in the source language text;

performing semantic analysis on the source language text, and determining one or more source language units of the source language text according to the semantic analysis result and the position of the predicate in the source language text;

performing semantic analysis on the source language units respectively, and determining source language subunits in each source language unit respectively;

rearranging the sequence of the source language units and the predicates according to the sentence pattern structure of the target language, and rearranging the sequence of the source language subunits in each source language unit;

and translating the source language text into the target language text according to the arranged sequence.

In one possible implementation, the source language unit is a subject, an object, a manner adverbial, a place adverbial or a time adverbial; the source language subunit includes one or more of a sub-subject, a sub-predicate, a sub-object, a sub-manner adverbial, a sub-place adverbial and a sub-time adverbial.

In one possible implementation, the subject includes one or more of a first sub-subject, a first sub-predicate, a first sub-object, a first sub-manner adverbial, a first sub-place adverbial and a first sub-time adverbial; the object includes one or more of a second sub-subject, a second sub-predicate, a second sub-object, a second sub-manner adverbial, a second sub-place adverbial and a second sub-time adverbial; the manner adverbial includes one or more of a third sub-subject, a third sub-predicate, a third sub-object, a third sub-manner adverbial, a third sub-place adverbial and a third sub-time adverbial; the place adverbial includes one or more of a fourth sub-subject, a fourth sub-predicate, a fourth sub-object, a fourth sub-manner adverbial, a fourth sub-place adverbial and a fourth sub-time adverbial; the time adverbial includes one or more of a fifth sub-subject, a fifth sub-predicate, a fifth sub-object, a fifth sub-manner adverbial, a fifth sub-place adverbial and a fifth sub-time adverbial;

the sentence pattern structure of the target language is: (first sub-subject + first sub-predicate + first sub-object + first sub-manner adverbial + first sub-place adverbial + first sub-time adverbial) + predicate + (second sub-subject + second sub-predicate + second sub-object + second sub-manner adverbial + second sub-place adverbial + second sub-time adverbial) + (third sub-subject + third sub-predicate + third sub-object + third sub-manner adverbial + third sub-place adverbial + third sub-time adverbial) + (fourth sub-subject + fourth sub-predicate + fourth sub-object + fourth sub-manner adverbial + fourth sub-place adverbial + fourth sub-time adverbial) + (fifth sub-subject + fifth sub-predicate + fifth sub-object + fifth sub-manner adverbial + fifth sub-place adverbial + fifth sub-time adverbial).
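As a minimal sketch (not part of the patent), the combined structure can be generated mechanically from the two orderings the description gives for English: subject + predicate + object + manner + place + time for the units, and the analogous order for the subunits inside each unit. The ordinal names and list constants below are illustrative assumptions.

```python
# Units in English order, numbered as in the description:
# first = subject, second = object, third = manner, fourth = place, fifth = time.
UNITS = ["first", "second", "third", "fourth", "fifth"]

# Subunit order inside every unit (assumed to mirror the unit-level order).
SUBUNITS = ["sub-subject", "sub-predicate", "sub-object",
            "sub-manner adverbial", "sub-place adverbial", "sub-time adverbial"]

def target_pattern():
    """Return the flat slot sequence of the combined two-level sentence structure."""
    slots = []
    for i, ordinal in enumerate(UNITS):
        slots.extend(f"{ordinal} {sub}" for sub in SUBUNITS)
        if i == 0:
            # The predicate is not subdivided and follows the subject unit.
            slots.append("predicate")
    return slots

pattern = target_pattern()
print(len(pattern))  # 31 slots: 5 units x 6 subunits + 1 predicate
print(pattern[:7])
```

The flat list has 31 slots, which is also why a 6 × 6 matrix (36 elements) is large enough for the matrix-based selection described next.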

In one possible implementation, after determining the source language sub-units in each source language unit separately, the method further includes:

establishing an initial matrix of order n, in which each source language subunit corresponds to one element and the predicate also corresponds to one element; the determinant of the initial matrix is nonzero and n ≥ 6;

determining a feature matrix once the source language subunits have been determined, the feature matrix being obtained by adding corresponding preset difference values to the elements that correspond to the source language subunits and the predicate;

when several feature matrices can be determined, taking the feature matrix with the largest determinant as the target feature matrix, and taking the source language subunits corresponding to the target feature matrix as the finally determined source language subunits.

In one possible implementation, the elements on the diagonal of the initial matrix take a first initial value and all other elements take a second initial value; each source language unit corresponds to a preset region of the initial matrix, and the source language subunits of that unit correspond in sequence to the elements of that region; the preset region is a row, a column or a nine-square grid (3 × 3 block) of the initial matrix.
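The matrix-based selection described above can be sketched in Python as follows. This is an illustrative reading under stated assumptions, not the patented implementation: n = 6, the initial matrix is the unit matrix, each unit occupies one column (the "column" variant of the preset region) with its subunits in rows, the predicate gets the hypothetical cell PREDICATE_CELL, and a hypothetical uniform preset difference DIFF = 1 is used. With this layout the first sub-object of the subject lands on a₃₁, consistent with the example given later for step 202. Determinants are computed exactly with `fractions.Fraction`.

```python
from fractions import Fraction

N = 6
UNITS = ["subject", "object", "manner", "place", "time"]    # columns 0..4 (assumed)
SUBUNITS = ["sub-subject", "sub-predicate", "sub-object",
            "sub-manner", "sub-place", "sub-time"]         # rows 0..5 (assumed)
PREDICATE_CELL = (5, 5)  # hypothetical cell reserved for the predicate
DIFF = 1                 # hypothetical uniform preset difference

def det(matrix):
    """Exact determinant by Gaussian elimination over rationals."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n = len(a)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result

def initial_matrix(first=1, second=0):
    """Unit matrix by default: `first` on the diagonal, `second` elsewhere."""
    return [[first if i == j else second for j in range(N)] for i in range(N)]

def feature_matrix(division):
    """Add DIFF to the cell of each detected (unit, subunit) pair and to the predicate cell."""
    m = initial_matrix()
    for unit, sub in division:
        m[SUBUNITS.index(sub)][UNITS.index(unit)] += DIFF
    row, col = PREDICATE_CELL
    m[row][col] += DIFF
    return m

def select_division(candidates):
    """Name of the candidate division whose feature matrix has the largest determinant."""
    return max(candidates, key=lambda name: det(feature_matrix(candidates[name])))

candidates = {
    "A": [("subject", "sub-subject"), ("object", "sub-object")],
    "B": [("subject", "sub-subject"), ("object", "sub-subject"),
          ("object", "sub-predicate"), ("object", "sub-object")],
}
for name, division in candidates.items():
    print(name, det(feature_matrix(division)))  # A 4, then B 8
print(select_division(candidates))              # B
```

For the two hypothetical candidate divisions, the feature matrix of A has determinant 4 and that of B has determinant 8, so B would be taken as the final division.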

Based on the same inventive concept, an embodiment of the present invention further provides a translation apparatus based on a sentence structure, including:

the obtaining module is used for obtaining a source language text and determining a predicate in the source language text;

the first analysis module is used for performing semantic analysis on the source language text and determining one or more source language units of the source language text according to the semantic analysis result and the position of the predicate in the source language text;

the second analysis module is used for performing semantic analysis on the source language units respectively and determining source language subunits in each source language unit respectively;

the sequencing module is used for rearranging the sequence of the source language units and the predicates according to the sentence pattern structure of the target language and rearranging the sequence of the source language subunits in each source language unit;

and the translation module is used for translating the source language text into the target language text according to the arranged sequence.

In one possible implementation, the source language unit is a subject, an object, a manner adverbial, a place adverbial or a time adverbial;

the source language subunit includes one or more of a sub-subject, a sub-predicate, a sub-object, a sub-manner adverbial, a sub-place adverbial and a sub-time adverbial.

In one possible implementation, the subject includes one or more of a first sub-subject, a first sub-predicate, a first sub-object, a first sub-manner adverbial, a first sub-place adverbial and a first sub-time adverbial;

the object includes one or more of a second sub-subject, a second sub-predicate, a second sub-object, a second sub-manner adverbial, a second sub-place adverbial and a second sub-time adverbial;

the manner adverbial includes one or more of a third sub-subject, a third sub-predicate, a third sub-object, a third sub-manner adverbial, a third sub-place adverbial and a third sub-time adverbial;

the place adverbial includes one or more of a fourth sub-subject, a fourth sub-predicate, a fourth sub-object, a fourth sub-manner adverbial, a fourth sub-place adverbial and a fourth sub-time adverbial;

the time adverbial includes one or more of a fifth sub-subject, a fifth sub-predicate, a fifth sub-object, a fifth sub-manner adverbial, a fifth sub-place adverbial and a fifth sub-time adverbial;

In one possible implementation, the apparatus further includes an establishing module, a determining module and a processing module;

after the second analysis module has determined the source language subunits in each source language unit, the establishing module is used for establishing an initial matrix of order n, in which each source language subunit corresponds to one element and the predicate also corresponds to one element; the determinant of the initial matrix is nonzero and n ≥ 6;

the determining module is used for determining a feature matrix after the source language subunits are determined, the feature matrix being obtained by adding corresponding preset difference values to the elements that correspond to the source language subunits and the predicate;

and the processing module is used for, when several feature matrices can be determined, taking the feature matrix with the largest determinant as the target feature matrix and taking the source language subunits corresponding to the target feature matrix as the finally determined source language subunits.

In one possible implementation, the elements on the diagonal of the initial matrix take a first initial value and all other elements take a second initial value;

each source language unit corresponds to a preset region of the initial matrix, and the source language subunits of that unit correspond in sequence to the elements of that region; the preset region is a row, a column or a nine-square grid (3 × 3 block) of the initial matrix.

According to the sentence pattern structure-based translation method and device provided by the embodiments of the invention, the predicate of the source language text is determined first, after which each source language unit and the source language subunits within it can be determined accurately. The source language text is then rearranged according to the sentence pattern structure of the target language; the hierarchical arrangement ensures that the order of the source language units and subunits is consistent with that structure, so the translated target language text is more standard. When the target language is English, the sentence pattern structure subject + predicate + object + manner adverbial + place adverbial + time adverbial is used uniformly and the sentence components are arranged hierarchically, so a large number of target language texts can be covered by a single sentence pattern structure instead of choosing among several; this simplifies processing and improves efficiency. Meanwhile, the optimal division scheme is determined from the determinant of the feature matrix, so the source language units and subunits of the divided text can be easily distinguished, and the target language text better follows grammar rules and is more standard.

Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:

FIG. 1 is a flowchart of a sentence structure-based translation method according to an embodiment of the present invention;

FIG. 2 is a flow chart of a method for selecting an optimal partitioning scheme according to an embodiment of the present invention;

FIG. 3 is a first structural diagram of a sentence pattern structure-based translation apparatus according to an embodiment of the present invention;

FIG. 4 is a second block diagram of a sentence structure-based translation apparatus according to an embodiment of the present invention.

Detailed Description

The following detailed description of the present invention is provided in conjunction with the accompanying drawings, but it should be understood that the scope of the present invention is not limited to the specific embodiments.

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention. Throughout the specification and claims, unless explicitly stated otherwise, the word "comprise", or variations such as "comprises" or "comprising", will be understood to imply the inclusion of a stated element or component but not the exclusion of any other element or component.

The word "exemplary" is used exclusively herein to mean "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.

According to an embodiment of the present invention, a sentence structure-based translation method is provided, and fig. 1 is a flowchart of the method, which specifically includes steps 101-105:

Step 101: Acquire a source language text and determine the predicate in the source language text.

In the embodiment of the invention, the source language text is the text to be translated, and the target language text is the translated text. Generally, every sentence contains a predicate; in English in particular, the predicate is an indispensable component of a sentence. In Chinese, the predicate states something about the subject, such as what it does, what it is, or what nature or state it has; verbs, verb phrases, adjectives, adjective phrases, nouns, noun phrases and subject-predicate phrases can all serve as predicates. For example, "hot" in the Chinese sentence meaning "the weather is very hot" is an adjectival predicate. In English, the predicate is a simple verb or a verb phrase, and its form changes with tense. For example, "is hot" in "The weather is very hot" is a compound predicate consisting of a linking verb plus a predicative. In languages such as English, a compound predicate is a predicate formed by a (broadly defined) modal verb, simple or compound, together with a non-modal verb; English compound predicates include, among others, a modal or auxiliary verb plus a bare infinitive (without "to"), and a linking verb plus a predicative.

Since the predicate is indispensable and each sentence has only one predicate, the embodiment of the present invention determines the predicate of the source language text first. Specifically, semantic analysis based on a pre-trained source language corpus can be performed on the received source language text to obtain its predicate; corpus-based semantic analysis is a mature technique and is not described in detail here.

For example, the source language is Chinese and the source language text means "forty students happily sang an English song ten times by the small river in the park at about eight o'clock last night"; semantic analysis shows that "sang" is the predicate. If "happily" were taken as the predicate: semantic analysis shows that "happily" modifies "sang", i.e., it is an adverbial, and in Chinese an adverbial cannot serve as the predicate. Moreover, if "happy" were taken as a predicate, the role of "sang" in the sentence would be unclear, which leads to the error in the background art where both "happy" and "sang" are treated as predicates.

Step 102: Perform semantic analysis on the source language text, and determine one or more source language units of the source language text according to the semantic analysis result and the position of the predicate in the source language text.

In the embodiment of the present invention, because the source language generally has a specific sentence structure, the other components can usually be determined once the position of the predicate is known. For example, Chinese generally uses a subject-predicate-object structure: the part before the predicate is the subject, and the part after it is the object. Combined with the semantic analysis result, one or more source language units of the source language text can then be determined. In the embodiment, a source language unit is a part of the source language text; for languages such as Chinese or English, it may be a subject, an object, a manner adverbial, a place adverbial or a time adverbial, i.e., a source language unit is a sentence component of the source language text. Determining one or more source language units in step 102 therefore means determining one or more of the subject, object, manner adverbial, place adverbial and time adverbial of the source language text. A manner adverbial is generally an adverb or a prepositional phrase consisting of a preposition and a noun; it describes how an action is performed and mainly modifies the predicate verb.

In particular, each source language unit may be determined with the help of word segmentation. For example, the source language text means "forty students happily sang an English song ten times by the small river in the park at about eight o'clock last night". After word segmentation, the text reads (in gloss) "forty students / last night / at about eight o'clock / in the park / small river / happily / sang / one English song". Semantic analysis shows that "sang" is the predicate, "forty students" is the subject, "at about eight o'clock last night" is the time adverbial, "by the small river in the park" is the place adverbial, "ten times happily" is the manner adverbial, and "an English song" is the object.
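The unit-level reordering for this example can be sketched as below. The English glosses, the label names and the order of the input list (which roughly follows the Chinese sentence, with time and place before the verb) are assumptions for illustration; a real system would obtain the labels from the semantic analysis rather than hard-code them.

```python
# Hypothetical English glosses of the labelled units, in (roughly) Chinese source order.
segmented = [
    ("subject", "forty students"),
    ("time", "at about eight o'clock last night"),
    ("place", "by the small river in the park"),
    ("manner", "ten times happily"),
    ("predicate", "sang"),
    ("object", "an English song"),
]

# English unit order: subject + predicate + object + manner + place + time.
ENGLISH_ORDER = ["subject", "predicate", "object", "manner", "place", "time"]

def reorder_units(units, order):
    """Sort (label, text) pairs by the position of their label in `order`."""
    rank = {label: i for i, label in enumerate(order)}
    return [text for label, text in sorted(units, key=lambda u: rank[u[0]])]

result = " ".join(reorder_units(segmented, ENGLISH_ORDER))
print(result)
# forty students sang an English song ten times happily by the small river in the park at about eight o'clock last night
```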

Step 103: and performing semantic analysis on the source language units respectively, and determining source language subunits in each source language unit respectively.

In the embodiment of the invention, after the complex source language text is divided into several source language units, each unit may itself still have a complex sentence structure, so semantic analysis is performed on each source language unit to determine its source language subunits. The source language subunits include one or more of a sub-subject, a sub-predicate, a sub-object, a sub-manner adverbial, a sub-place adverbial and a sub-time adverbial.

Specifically, as noted above, the predicate is generally simple in form, so it needs no further semantic analysis; that is, the predicate is not treated as a source language unit. When the source language units are the subject, object, manner adverbial, place adverbial and time adverbial, semantic analysis is performed on each of them in turn to determine the sub-subject, sub-predicate, sub-object and so on in each unit. If a source language unit contains only one sentence component, the determined source language subunit coincides with the unit itself. For example, if the determined object is a simple noun, the source language subunit determined from it in step 103 is a sub-object. In the example above, "forty students" is the subject and "an English song" is the object, so when the source language subunits are determined, "forty students" is the sub-subject of the subject and "an English song" is the sub-object of the object.

Step 104: the order of the source language units and predicates is rearranged according to the sentence structure of the target language, and the order of the source language sub-units is rearranged in each source language unit.

In the embodiment of the invention, the sentence pattern structure of the target language is determined according to the grammar rules of the target language. After the source language units and the predicate are determined, their order is rearranged according to the sentence pattern structure of the target language so that it is consistent with that structure; the source language subunits within each source language unit are then arranged according to the same structure. This hierarchical arrangement ensures that the order of the source language units and subunits is consistent with the sentence pattern structure of the target language.

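Specifically, when the target language is English, the source language units include the subject, object, manner adverbial, place adverbial and time adverbial, and the source language subunits include one or more of a sub-subject, a sub-predicate, a sub-object, a sub-manner adverbial, a sub-place adverbial and a sub-time adverbial. In this case:

the subject includes one or more of a first sub-subject, a first sub-predicate, a first sub-object, a first sub-manner adverbial, a first sub-place adverbial and a first sub-time adverbial;

the object includes one or more of a second sub-subject, a second sub-predicate, a second sub-object, a second sub-manner adverbial, a second sub-place adverbial and a second sub-time adverbial;

the manner adverbial includes one or more of a third sub-subject, a third sub-predicate, a third sub-object, a third sub-manner adverbial, a third sub-place adverbial and a third sub-time adverbial;

the place adverbial includes one or more of a fourth sub-subject, a fourth sub-predicate, a fourth sub-object, a fourth sub-manner adverbial, a fourth sub-place adverbial and a fourth sub-time adverbial;

the time adverbial includes one or more of a fifth sub-subject, a fifth sub-predicate, a fifth sub-object, a fifth sub-manner adverbial, a fifth sub-place adverbial and a fifth sub-time adverbial.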

Because the grammar rules of the target language may admit several sentence structures, a translation method can be unsure which one to apply. In the embodiment of the present invention, when the target language is English, the following sentence structure is adopted: (first sub-subject + first sub-predicate + first sub-object + first sub-manner adverbial + first sub-place adverbial + first sub-time adverbial) + predicate + (second sub-subject + second sub-predicate + second sub-object + second sub-manner adverbial + second sub-place adverbial + second sub-time adverbial) + (third sub-subject + third sub-predicate + third sub-object + third sub-manner adverbial + third sub-place adverbial + third sub-time adverbial) + (fourth sub-subject + fourth sub-predicate + fourth sub-object + fourth sub-manner adverbial + fourth sub-place adverbial + fourth sub-time adverbial) + (fifth sub-subject + fifth sub-predicate + fifth sub-object + fifth sub-manner adverbial + fifth sub-place adverbial + fifth sub-time adverbial). Each pair of parentheses in the structure encloses one source language unit; the parentheses are added only to distinguish the units and may be omitted, and in the embodiments of the invention the structure has the same meaning with or without them.

That is, the source language units and the predicate are ordered by the structure "subject + predicate + object + manner adverbial + place adverbial + time adverbial", and the source language subunits within each unit are ordered by the analogous structure "sub-subject + sub-predicate + sub-object + sub-manner adverbial + sub-place adverbial + sub-time adverbial". Combining the two orderings yields the sentence structure above.
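A minimal sketch of the two-level arrangement, assuming each unit has already been labelled with a role and its subunits with sub-roles. The example is hypothetical: a sentence meaning "I know he read books at home yesterday" whose object clause arrives in Chinese order (time and place before the verb). The `Unit` class and role names are illustrative, and the predicate is stored as a one-item unit purely for uniformity.

```python
from dataclasses import dataclass, field

UNIT_ORDER = ["subject", "predicate", "object", "manner", "place", "time"]
SUB_ORDER = ["sub-subject", "sub-predicate", "sub-object",
             "sub-manner", "sub-place", "sub-time"]

@dataclass
class Unit:
    role: str                                     # one of UNIT_ORDER
    subunits: list = field(default_factory=list)  # (sub_role, text) pairs

def arrange(units):
    """Order units by UNIT_ORDER, then order the subunits inside each unit by SUB_ORDER."""
    unit_rank = {r: i for i, r in enumerate(UNIT_ORDER)}
    sub_rank = {r: i for i, r in enumerate(SUB_ORDER)}
    words = []
    for unit in sorted(units, key=lambda u: unit_rank[u.role]):
        for _, text in sorted(unit.subunits, key=lambda s: sub_rank[s[0]]):
            words.append(text)
    return words

units = [
    Unit("subject", [("sub-subject", "I")]),
    Unit("predicate", [("sub-predicate", "know")]),
    # Object clause with its subunits in Chinese order: he / yesterday / at home / read / books
    Unit("object", [("sub-subject", "he"), ("sub-time", "yesterday"),
                    ("sub-place", "at home"), ("sub-predicate", "read"),
                    ("sub-object", "books")]),
]
sentence = " ".join(arrange(units))
print(sentence)  # I know he read books at home yesterday
```

Sorting at both levels with the same rank tables is what keeps the rule set down to one structure, as the description argues.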

Step 105: Translate the source language text into the target language text in the arranged order.

Because the order rearranged according to the sentence pattern structure of the target language conforms to the word order of the target language, the translated target language text conforms to that sentence pattern structure and is more standard. Specifically, "translating the source language text into the target language text in the arranged order" in step 105 may be done in either of two ways: each source language subunit is first translated into a target language subunit, and the translated subunits are then arranged according to the sentence pattern structure of the target language and combined into the target language text; or the source language subunits are first arranged according to the sentence pattern structure of the target language, then translated one by one into target language subunits, which are combined into the target language text. That is, translation may come either before or after arrangement. During translation, the translated subunits may also be adjusted according to the grammar rules of the target language; for example, when the target language is English, an action noun may be used as the subject.
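Both orders of step 105 give the same result whenever each subunit is translated independently and keeps its role label, as this sketch checks. `GLOSSARY` is a hypothetical stand-in for a real subunit translator; with context-dependent translation the two orders need not coincide.

```python
# Hypothetical glossary standing in for a real per-subunit translator.
GLOSSARY = {"他": "he", "昨天": "yesterday", "在家": "at home", "看": "read", "书": "books"}

SUB_ORDER = ["sub-subject", "sub-predicate", "sub-object",
             "sub-manner", "sub-place", "sub-time"]
RANK = {role: i for i, role in enumerate(SUB_ORDER)}

# Subunits of a Chinese clause in source order (time and place before the verb).
source = [("sub-subject", "他"), ("sub-time", "昨天"), ("sub-place", "在家"),
          ("sub-predicate", "看"), ("sub-object", "书")]

def translate(subunits):
    """Translate each subunit independently, keeping its role label."""
    return [(role, GLOSSARY[text]) for role, text in subunits]

def arrange(subunits):
    """Sort subunits into the target-language order."""
    return sorted(subunits, key=lambda s: RANK[s[0]])

translate_first = " ".join(text for _, text in arrange(translate(source)))
arrange_first = " ".join(text for _, text in translate(arrange(source)))
print(translate_first)  # he read books at home yesterday
print(arrange_first)    # he read books at home yesterday
```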

According to the sentence structure-based translation method provided by the embodiment of the invention, the predicate of the source language text is determined firstly, and then each source language unit and the source language subunits in the source language unit can be determined accurately; and then rearranging the source language text according to the sentence pattern structure of the target language, ensuring that the arrangement sequence of the source language units and the source language subunits is consistent with the sentence pattern structure of the target language by adopting a layered arrangement mode, ensuring that the translated target language text is consistent with the sentence pattern structure of the target language, and ensuring that the translated target language text is more standard.

In one possible implementation, since there may be several ways of dividing the source language text in steps 102 and 103, i.e., the divided source language units and subunits may take several forms, an optimal division scheme needs to be selected. Specifically, when there are several division results (a division result being the result of dividing the text into source language units and subunits), after the source language subunits in each source language unit are determined, the method further includes steps 201 to 203, as shown in FIG. 2:

Step 201: Establish an initial matrix of order n, in which each source language subunit corresponds to one element and the predicate also corresponds to one element; the determinant of the initial matrix is nonzero and n ≥ 6.

In the embodiment of the present invention, the initial matrix is of order at least 6, i.e., it has at least 36 elements. The predicate of the source language text and each source language subunit each correspond to a distinct element of the initial matrix. Because the initial matrix has at least 36 elements, some elements may correspond to neither the predicate nor any source language subunit.

Optionally, the elements on the diagonal of the initial matrix take a first initial value and all other elements take a second initial value, the two values being different; preferably, the first initial value is non-zero. The diagonal elements thus differ from all other elements; in particular, when the second initial value is 0, the initial matrix is diagonal with non-zero diagonal entries, so its determinant is guaranteed to be non-zero. Specifically, the initial matrix may be an identity matrix, i.e., the first initial value is 1 and the second initial value is 0. The 6th-order identity matrix A₀ is:

$$
A_0=\begin{pmatrix}
1&0&0&0&0&0\\
0&1&0&0&0&0\\
0&0&1&0&0&0\\
0&0&0&1&0&0\\
0&0&0&0&1&0\\
0&0&0&0&0&1
\end{pmatrix}
$$
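A minimal sketch of step 201 under the options just described, using exact rational arithmetic from Python's standard `fractions` module. The function names `initial_matrix` and `determinant` are illustrative, not part of the described method; the determinant is computed by Gaussian elimination so that the non-zero requirement can be checked explicitly.

```python
from fractions import Fraction


def determinant(matrix):
    """Exact determinant of a square matrix by Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n = len(a)
    det = Fraction(1)
    for col in range(n):
        # Choose the first row at or below `col` with a non-zero entry in this column.
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def initial_matrix(n=6, first=1, second=0):
    """n-th order matrix with `first` on the diagonal and `second` elsewhere (step 201)."""
    if n < 6:
        raise ValueError("the initial matrix must have order n >= 6")
    if first == second:
        raise ValueError("the first and second initial values must differ")
    m = [[first if i == j else second for j in range(n)] for i in range(n)]
    if determinant(m) == 0:
        raise ValueError("the chosen values give a zero determinant")
    return m


A0 = initial_matrix()   # the 6th-order identity matrix
print(determinant(A0))  # 1
```

The explicit determinant check matters when the second initial value is non-zero: for example, `initial_matrix(6, first=5, second=-1)` is rejected, because that matrix equals 6I - J (J the all-ones matrix) and is singular.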

Step 202: determining a feature matrix after the source language subunits are determined, wherein the feature matrix is obtained by adding a corresponding preset difference to each of the elements corresponding to the source language subunits and the predicate.

In the embodiment of the invention, the feature matrix is a matrix converted from the initial matrix. Specifically, after the source language subunits and the predicate are determined, a preset difference is added to each of their corresponding elements in the initial matrix; the initial matrix with these preset differences added is the feature matrix. For example, if the target language is English and the first sub-object of the subject corresponds to a₃₁ of the initial matrix, then when the source language text is found to contain a first sub-object, the value of a₃₁ is increased by the preset difference. Once the preset difference has been added to all elements corresponding to the source language subunits and the predicate of the source language text, the initial matrix has been converted into the feature matrix. The preset difference may be a single uniform constant, or different values may be chosen for different source language subunits and for the predicate, depending on actual conditions.

In addition, each source language unit corresponds to a preset region of the initial matrix, and the source language subunits within that unit correspond in sequence to the elements of the region; the preset region is a row, a column, or a nine-square grid (3×3 block) of the initial matrix.
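Step 202 and the region mapping can be sketched as follows. The sketch assumes the column region used in the embodiment (each source language unit owns one column, and its subunits fill that column from top to bottom) and 1-based positions (i, j) matching the aᵢⱼ notation; `column_region` and `feature_matrix` are hypothetical helper names. Because the preset difference may be a single constant or a per-element value, the sketch takes a mapping from position to difference.

```python
def column_region(column, n=6):
    """1-based positions of a preset region that is one whole column of the matrix."""
    return [(row, column) for row in range(1, n + 1)]


def feature_matrix(initial, differences):
    """Copy `initial` and add the preset difference at each 1-based (i, j) position."""
    m = [row[:] for row in initial]
    for (i, j), delta in differences.items():
        m[i - 1][j - 1] += delta
    return m


identity = [[1 if i == j else 0 for j in range(6)] for i in range(6)]

# The subject owns column 1; suppose its sub-predicate (2nd slot) and sub-object (3rd slot) are found.
subject_region = column_region(1)
found = [subject_region[1], subject_region[2]]  # positions (2, 1) and (3, 1), i.e. a21 and a31
differences = {pos: 1 for pos in found}         # a uniform preset difference of 1
differences[(2, 2)] = 1                         # the predicate's element a22

F = feature_matrix(identity, differences)
print(F[2][0], F[1][1])  # 1 2
```

Here a₃₁ rises from 0 to 1 and the diagonal element a₂₂ from 1 to 2, while the initial matrix itself is left unchanged.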

Preferably, since a matrix with two identical rows or two identical columns has a zero determinant, the embodiment of the invention requires, in order to avoid identical rows or columns, that the sum of the second initial value and the preset difference not be equal to the first initial value. Under this condition, an element whose initial value is the second initial value cannot become equal to the first initial value after the preset difference is added, so the off-diagonal elements remain distinguishable from the diagonal elements, which start at the first initial value, and it is unlikely that any two rows or columns of the matrix become identical. Together with the first initial value differing from the second initial value, this is intended to keep the determinant of the feature matrix non-zero.

Step 203: when a plurality of feature matrices can be determined, taking the feature matrix with the largest determinant as the target feature matrix, and taking the source language subunits corresponding to the target feature matrix as the finally determined source language subunits.

In the embodiment of the invention, there may be several schemes for dividing the source language text, i.e., the resulting source language units and source language subunits may take several forms, and each form corresponds to one feature matrix. In this case, the determinant of each feature matrix is calculated, the feature matrix with the largest determinant is taken as the target feature matrix, and its source language subunits are taken as the finally determined source language subunits. Since the determinant of a feature matrix may be negative, it is preferable to compare the absolute values of the determinants and to take the feature matrix whose determinant has the largest absolute value as the target feature matrix.
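A sketch of the step 203 selection rule, comparing the absolute values of the determinants as recommended above. A small exact Gaussian-elimination determinant helper is included so the block runs on its own. The candidate labels and 2×2 matrices are toy data chosen only to show why the absolute value matters; the embodiment itself uses 6th-order matrices.

```python
from fractions import Fraction


def determinant(matrix):
    """Exact determinant of a square matrix by Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n = len(a)
    det = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def select_target(candidates):
    """Return the label of the candidate feature matrix with the largest |det|."""
    label, _ = max(candidates, key=lambda item: abs(determinant(item[1])))
    return label


candidates = [
    ("division A", [[1, 0], [0, 2]]),   # det = 2
    ("division B", [[-3, 0], [0, 1]]),  # det = -3, |det| = 3
]
print(select_target(candidates))  # division B
```

Comparing signed determinants instead would pick division A, which is why the absolute value is preferred.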

In general, the larger the determinant of the feature matrix, the more widely the source language subunits are distributed, and in particular the more source language subunits correspond to diagonal elements. Each source language unit and source language subunit of the divided text can then be easily distinguished, and the target language text better conforms to the grammar rules and is more standard.

For example, the source language text is "On that stormy night 10 years ago, suddenly leaving David in Beijing without any sign was actually as if stabbing his heart with a sharp knife, hard, again and again", and it is to be translated into English. "On that stormy night 10 years ago, suddenly leaving David in Beijing without any sign" is the subject, in which "leave" is the first sub-predicate (it becomes a gerund when translated into English), "David" is the first sub-object, "suddenly, without any sign" is the first sub-manner adverbial, "in Beijing" is the first sub-place adverbial, and "on that stormy night 10 years ago" is the first sub-time adverbial. The second half of the sentence, however, can be divided in several ways:

Method 1: "actually as if" is the predicate, and "stabbing his heart with a sharp knife, hard, again and again" is the object, in which "a sharp knife" is the second sub-subject, "stabbing" is the second sub-predicate, "his heart" is the second sub-object, and "hard, again and again" is the second sub-manner adverbial.

Method 2: "actually as if" is the predicate, and "stabbing his heart" is the object, in which "stabbing" is the second sub-predicate and "his heart" is the second sub-object; "with a sharp knife, hard, again and again" is the manner adverbial.

Method 3: "stabbing" is the predicate, "his heart" is the object, and "actually as if with a sharp knife, hard, again and again" is the manner adverbial.

In the embodiment of the invention, the initial matrix is the 6th-order identity matrix. Each source language unit corresponds to one column of the initial matrix, and the predicate corresponds to a₂₂. The source language subunits within a source language unit correspond in sequence to the elements of its column, in the order of the sentence pattern structure: the source language subunits of the subject correspond to matrix elements in the order sub-subject, sub-predicate, sub-object, sub-manner adverbial, sub-place adverbial, sub-time adverbial, and the same order applies to the object, the manner adverbial and the other units. Thus a₁₁ corresponds to the first sub-subject, a₂₁ to the first sub-predicate, a₃₁ to the first sub-object, a₄₁ to the first sub-manner adverbial, a₅₁ to the first sub-place adverbial, and a₆₁ to the first sub-time adverbial; a₂₂ corresponds to the predicate; a₁₃ corresponds to the second sub-subject, …, and a₆₆ corresponds to the fifth sub-time adverbial. The position of each source language subunit is shown in Table 1 below:

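TABLE 1 Position of each component in the 6th-order initial matrix (entry in row i, column j is the component assigned to aᵢⱼ; "-" means unused)

Row \ Column | 1: subject | 2: predicate | 3: object | 4: manner adverbial | 5: place adverbial | 6: time adverbial
1: sub-subject | first sub-subject | - | second sub-subject | third sub-subject | fourth sub-subject | fifth sub-subject
2: sub-predicate | first sub-predicate | predicate | second sub-predicate | third sub-predicate | fourth sub-predicate | fifth sub-predicate
3: sub-object | first sub-object | - | second sub-object | third sub-object | fourth sub-object | fifth sub-object
4: sub-manner adverbial | first sub-manner adverbial | - | second sub-manner adverbial | third sub-manner adverbial | fourth sub-manner adverbial | fifth sub-manner adverbial
5: sub-place adverbial | first sub-place adverbial | - | second sub-place adverbial | third sub-place adverbial | fourth sub-place adverbial | fifth sub-place adverbial
6: sub-time adverbial | first sub-time adverbial | - | second sub-time adverbial | third sub-time adverbial | fourth sub-time adverbial | fifth sub-time adverbial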
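The position rule for the 6th-order initial matrix can be written as a small lookup. This sketch assumes the column assignment implied by the description (subject 1, predicate 2, object 3, manner adverbial 4, place adverbial 5, time adverbial 6) and the row order sub-subject through sub-time adverbial; `matrix_position` and the dictionary names are hypothetical.

```python
# Column owned by each source language unit (the predicate is handled separately).
UNIT_COLUMNS = {"subject": 1, "object": 3, "manner": 4, "place": 5, "time": 6}

# Row used by each kind of subunit inside a unit's column.
SUBUNIT_ROWS = {"sub-subject": 1, "sub-predicate": 2, "sub-object": 3,
                "sub-manner": 4, "sub-place": 5, "sub-time": 6}

PREDICATE_POSITION = (2, 2)  # a22


def matrix_position(unit, subunit=None):
    """1-based (row, column) of a component in the 6th-order initial matrix."""
    if unit == "predicate":
        return PREDICATE_POSITION
    return (SUBUNIT_ROWS[subunit], UNIT_COLUMNS[unit])


print(matrix_position("subject", "sub-object"))  # (3, 1)
print(matrix_position("time", "sub-time"))       # (6, 6)
```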

If the preset difference is uniformly set to 1, then for Method 1 the preset difference 1 is added to elements a₂₁, a₃₁, a₄₁, a₅₁, a₆₁, a₂₂, a₁₃, a₂₃, a₃₃ and a₄₃ of the initial matrix, and the resulting feature matrix A₁ is:

$$
A_1=\begin{pmatrix}
1&0&1&0&0&0\\
1&2&1&0&0&0\\
1&0&2&0&0&0\\
1&0&1&1&0&0\\
1&0&0&0&1&0\\
1&0&0&0&0&1
\end{pmatrix}
$$

For Method 2, the preset difference 1 is added to elements a₂₁, a₃₁, a₄₁, a₅₁, a₆₁, a₂₂, a₂₃, a₃₃ and a₄₄ of the initial matrix, and the resulting feature matrix A₂ is:

$$
A_2=\begin{pmatrix}
1&0&0&0&0&0\\
1&2&1&0&0&0\\
1&0&2&0&0&0\\
1&0&0&2&0&0\\
1&0&0&0&1&0\\
1&0&0&0&0&1
\end{pmatrix}
$$

For Method 3, the preset difference 1 is added to elements a₂₁, a₃₁, a₄₁, a₅₁, a₆₁, a₂₂, a₃₃ and a₄₄ of the initial matrix, and the resulting feature matrix A₃ is:

$$
A_3=\begin{pmatrix}
1&0&0&0&0&0\\
1&2&0&0&0&0\\
1&0&2&0&0&0\\
1&0&0&2&0&0\\
1&0&0&0&1&0\\
1&0&0&0&0&1
\end{pmatrix}
$$

The determinants of the feature matrices are |A₁| = 2, |A₂| = 8 and |A₃| = 8. According to step 203, the feature matrix of either Method 2 or Method 3 could be selected as the target feature matrix. In the embodiment of the invention, when several feature matrices have the same determinant, the one with the largest number of source language subunits is selected as the target feature matrix. Method 2 yields 8 source language subunits (a₂₁, a₃₁, a₄₁, a₅₁, a₆₁, a₂₃, a₃₃, a₄₄), whereas Method 3 yields 7 (a₂₁, a₃₁, a₄₁, a₅₁, a₆₁, a₃₃, a₄₄); the predicate is not counted because every division contains it. Therefore, the feature matrix of Method 2 is the target feature matrix in the embodiment of the invention. That is, "On that stormy night 10 years ago, suddenly leaving David in Beijing without any sign" is the subject, with "leave" as the first sub-predicate, "David" as the first sub-object, "suddenly, without any sign" as the first sub-manner adverbial, "in Beijing" as the first sub-place adverbial, and "on that stormy night 10 years ago" as the first sub-time adverbial; "actually as if" is the predicate; "stabbing his heart" is the object, with "stabbing" as the second sub-predicate and "his heart" as the second sub-object; and "with a sharp knife, hard, again and again" is the manner adverbial.
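The worked example can be checked numerically. Assuming a uniform preset difference of 1 on the 6th-order identity matrix, the sketch below rebuilds A₁, A₂ and A₃ from the element lists given above, computes their determinants exactly, and applies the tie-break described in this paragraph: largest |det| first, then the most source language subunits, not counting the predicate at a₂₂. The labels and helper names are illustrative.

```python
from fractions import Fraction


def determinant(matrix):
    """Exact determinant of a square matrix by Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n = len(a)
    det = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


PREDICATE = (2, 2)

# Elements (1-based row, column) raised by the preset difference for each division method.
PARTITIONS = {
    "method 1": [(2, 1), (3, 1), (4, 1), (5, 1), (6, 1), (2, 2), (1, 3), (2, 3), (3, 3), (4, 3)],
    "method 2": [(2, 1), (3, 1), (4, 1), (5, 1), (6, 1), (2, 2), (2, 3), (3, 3), (4, 4)],
    "method 3": [(2, 1), (3, 1), (4, 1), (5, 1), (6, 1), (2, 2), (3, 3), (4, 4)],
}


def feature_matrix(positions, n=6, delta=1):
    """Identity matrix of order n with `delta` added at each listed position."""
    m = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for i, j in positions:
        m[i - 1][j - 1] += delta
    return m


def choose_partition(partitions):
    """Largest |det| wins; ties go to the division with more source language subunits."""
    def score(item):
        _, positions = item
        det = abs(determinant(feature_matrix(positions)))
        subunits = sum(1 for p in positions if p != PREDICATE)
        return (det, subunits)
    return max(partitions.items(), key=score)[0]


dets = {label: int(determinant(feature_matrix(p))) for label, p in PARTITIONS.items()}
print(dets)                          # {'method 1': 2, 'method 2': 8, 'method 3': 8}
print(choose_partition(PARTITIONS))  # method 2
```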

When the target language is English, correspondingly, the first sub-predicate is "Leaving" (the verb is converted to a gerund so that it can serve in the subject, according to English grammar), the first sub-object is "David", the first sub-manner adverbial is "suddenly without any warning", the first sub-place adverbial is "in Beijing", and the first sub-time adverbial is "at that stormy night 3 years ago"; "actually like" is the predicate (according to English grammar, the linking verb "is" needs to be added); "hurt his heart" is the object, in which "hurt" is the second sub-predicate and "his heart" is the second sub-object; and "with a sharp knife so hard again and again" is the manner adverbial. After the source language text is translated into the target language text in step 105, the resulting target language text is "Leaving David suddenly without any warning in Beijing at that stormy night 3 years ago is actually like hurting his heart with a sharp knife so hard again and again".
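The layered arrangement used to assemble this sentence can be sketched as two nested orderings: the units in the order of the target sentence pattern, and the subunits within each unit in the order sub-subject, sub-predicate, sub-object, sub-manner, sub-place, sub-time adverbial. The component strings are the English components of this example, already adjusted for grammar (gerund, linking verb) by hand. The order lists and the `arrange` function are illustrative assumptions, and a unit given as a plain string is treated as having no further subdivision.

```python
# Unit order for English as stated in this description; subunit order as used for the matrix rows.
UNIT_ORDER = ["subject", "predicate", "object", "manner", "time", "place"]
SUBUNIT_ORDER = ["sub-subject", "sub-predicate", "sub-object",
                 "sub-manner", "sub-place", "sub-time"]


def arrange(units):
    """Layered arrangement: order the units, then order the subunits inside each unit."""
    parts = []
    for unit in UNIT_ORDER:
        value = units.get(unit)
        if value is None:
            continue
        if isinstance(value, str):  # unit without subunits
            parts.append(value)
        else:                       # unit split into subunits
            parts.extend(value[s] for s in SUBUNIT_ORDER if s in value)
    return " ".join(parts)


# Translated components, deliberately listed out of target order.
units = {
    "manner": "with a sharp knife so hard again and again",
    "subject": {
        "sub-time": "at that stormy night 3 years ago",
        "sub-manner": "suddenly without any warning",
        "sub-predicate": "Leaving",
        "sub-place": "in Beijing",
        "sub-object": "David",
    },
    "predicate": "is actually like",
    "object": {"sub-object": "his heart", "sub-predicate": "hurting"},
}

print(arrange(units))
```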

In some source language texts, a manner, time, or place adverbial itself contains multiple source language subunits. The following example contains a complex time adverbial.

For example, the source language text is "After she happily got 200 yuan from the bank in the morning, she happily went to the newly opened Starbucks next to her home and bought 10 cups of the best coffee", in which "after she happily got 200 yuan from the bank in the morning" is the time adverbial. Within this time adverbial, "she" is the fifth sub-subject, "got" is the fifth sub-predicate, "200 yuan" is the fifth sub-object, "happily" is the fifth sub-manner adverbial, "from the bank" is the fifth sub-place adverbial, and "in the morning" is the fifth sub-time adverbial. The remainder, "she happily went to the newly opened Starbucks next to her home and bought 10 cups of the best coffee", contains the subject, predicate, object, manner adverbial and place adverbial. Still taking the identity matrix as the initial matrix, the feature matrix in this case is:

Because only one feature matrix exists, it can be used directly as the target feature matrix. The translated target language text is "She bought 10 cups of the best coffee happily at the newly opened Starbucks near her home after she got 200 yuan happily from the bank in the morning."

It should be noted that the sentence structure-based translation method provided by the embodiment of the present invention applies to declarative sentences. When the source language text is an interrogative sentence, steps A1-A2 are also performed after step 105:

Step A1: when a source language unit is a question unit, determining the question word corresponding to that question unit;

Step A2: deleting that source language unit and placing the question word at the beginning of the sentence.

For example, the source language text is "Where did you buy the basketball today?", the target language is English, and the sentence structure is as described above. Here "today" is the time adverbial, "you" is the subject, "where" is the place question unit, "buy" is the predicate, and "basketball" is the object. According to steps 101-… above and the sentence structure subject + predicate + object + manner adverbial + time adverbial + place adverbial, …?". In general, the question word corresponding to the subject or object is "who" or "what", the one corresponding to the predicate is "do", the one corresponding to the manner adverbial is "how", the one corresponding to the place adverbial is "where", and the one corresponding to the time adverbial is "when". Once the question word is placed at the beginning of the sentence, the declarative sentence is converted into an interrogative sentence.
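Steps A1-A2 can be sketched as follows, assuming the declarative sentence has already been arranged as a list of (unit, text) pairs in the order stated above. The question-word table follows the correspondence given in this paragraph. Choosing between "who" and "what" through a hypothetical `refers_to_person` flag, and leaving out further grammatical adjustments such as inserting an auxiliary verb ("did"), are simplifications of this sketch.

```python
def question_word(unit, refers_to_person=False):
    """Step A1: question word for the unit that carries the question."""
    if unit in ("subject", "object"):
        return "who" if refers_to_person else "what"
    return {"predicate": "do", "manner": "how", "place": "where", "time": "when"}[unit]


def to_question(arranged, question_index, refers_to_person=False):
    """Step A2: delete the question unit and put its question word at the start of the sentence."""
    unit, _ = arranged[question_index]
    word = question_word(unit, refers_to_person)
    rest = [text for k, (_, text) in enumerate(arranged) if k != question_index]
    sentence = " ".join([word] + rest)
    return sentence[0].upper() + sentence[1:] + "?"


# Arranged as subject + predicate + object + time adverbial + place adverbial.
arranged = [("subject", "you"), ("predicate", "buy"), ("object", "basketball"),
            ("time", "today"), ("place", "where")]
print(to_question(arranged, question_index=4))  # Where you buy basketball today?
```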

In the sentence structure-based translation method provided by the embodiment of the invention, the predicate of the source language text is determined first, which allows each source language unit and the source language subunits within it to be determined accurately. The source language text is then rearranged according to the sentence pattern structure of the target language, and the layered arrangement keeps the order of the source language units and source language subunits consistent with that structure, so the translated target language text follows the sentence pattern of the target language and is more standard. When the target language is English, the sentence pattern structure subject + predicate + object + manner adverbial + time adverbial + place adverbial is used uniformly and the sentence components are arranged in layers, so a large body of target language text can be covered by a single sentence pattern structure without having to choose among several patterns; this simplifies the processing flow and improves efficiency. In addition, the optimal division scheme is determined from the determinants of the feature matrices, so each source language unit and source language subunit of the divided text can be easily distinguished, and the target language text better conforms to grammatical rules and is more standard.

The above describes a sentence structure-based translation process in detail, and the method can also be implemented by a corresponding apparatus, and the structure and function of the apparatus are described in detail below.

The translation device based on the sentence structure provided by the embodiment of the invention is shown in fig. 3, and comprises:

the obtaining module 31 is configured to obtain a source language text and determine a predicate in the source language text;

the first analysis module 32 is used for performing semantic analysis on the source language text and determining one or more source language units of the source language text according to the semantic analysis result and the position of the predicate in the source language text;

the second analysis module 33 is configured to perform semantic analysis on the source language units respectively, and determine source language subunits in each source language unit respectively;

a sorting module 34 for rearranging the order of the source language units and the predicates according to the sentence structure of the target language, and rearranging the order of the source language sub-units in each source language unit;

and the translation module 35 is configured to translate the source language text into the target language text according to the arranged sequence.

In one possible implementation, referring to fig. 4, the apparatus further includes: an establishing module 36, a determining module 37 and a processing module 38;

after the second analysis module 33 determines the source language subunits in each source language unit, the establishing module 36 is configured to establish an n-th order initial matrix, where each source language subunit corresponds to one element of the initial matrix and the predicate also corresponds to one element of the initial matrix; the determinant of the initial matrix is non-zero and n ≥ 6;

the determining module 37 is configured to determine a feature matrix after determining the source language subunit, where the feature matrix is determined after adding corresponding preset difference values to elements corresponding to the source language subunit and the predicate, respectively;

the processing module 38 is configured to, when a plurality of feature matrices can be determined, take the feature matrix with the largest determinant as a target feature matrix, and take the source language sub-units corresponding to the target feature matrix as finally determined source language sub-units.
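The module structure of figs. 3 and 4 can be pictured as a pipeline of pluggable stages. The sketch below is a hypothetical wiring, not the apparatus itself: each of modules 31-35 is injected as a callable, modules 36-38 are folded into an optional `choose_division` stage that picks one of several candidate divisions (for example by the determinant rule of step 203), the subunit level is omitted for brevity, and the toy stages in the usage example are stubs with a made-up glossary.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Division = Dict[str, str]  # unit name -> source text of that unit


@dataclass
class SentencePatternTranslator:
    find_predicate: Callable[[str], str]                   # obtaining module 31
    divide: Callable[[str, str], List[Division]]           # analysis modules 32-33 (candidate divisions)
    choose_division: Callable[[List[Division]], Division]  # modules 36-38 (e.g. largest determinant)
    reorder: Callable[[Division], List[str]]               # sorting module 34
    translate_parts: Callable[[List[str]], str]            # translation module 35

    def translate(self, source_text: str) -> str:
        predicate = self.find_predicate(source_text)
        candidates = self.divide(source_text, predicate)
        # Only run the selection stage when the text admits more than one division.
        division = candidates[0] if len(candidates) == 1 else self.choose_division(candidates)
        return self.translate_parts(self.reorder(division))


ORDER = ["subject", "predicate", "object", "manner", "time", "place"]
GLOSSARY = {"ta": "she", "mai": "buys", "kafei": "coffee", "zai jia pangbian": "near her home"}

translator = SentencePatternTranslator(
    find_predicate=lambda text: "mai",
    divide=lambda text, pred: [{"subject": "ta", "place": "zai jia pangbian",
                                "predicate": pred, "object": "kafei"}],
    choose_division=lambda candidates: candidates[0],
    reorder=lambda division: [division[u] for u in ORDER if u in division],
    translate_parts=lambda parts: " ".join(GLOSSARY[p] for p in parts),
)
print(translator.translate("ta zai jia pangbian mai kafei"))  # she buys coffee near her home
```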

The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.

Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.

The foregoing descriptions of specific exemplary embodiments of the present invention have been presented for purposes of illustration and description. It is not intended to limit the invention to the precise form disclosed, and obviously many modifications and variations are possible in light of the above teaching. The exemplary embodiments were chosen and described in order to explain certain principles of the invention and its practical application to enable one skilled in the art to make and use various exemplary embodiments of the invention and various alternatives and modifications as are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the claims and their equivalents.

Claims

1. A sentence structure-based translation method is characterized by comprising the following steps:

obtaining a source language text and determining a predicate in the source language text;

establishing an n-order initial matrix, wherein each source language subunit corresponds to one element in the initial matrix, the predicate also corresponds to one element in the initial matrix, and the determinant of the initial matrix is not zero;

determining a feature matrix after determining the source language subunits, wherein the feature matrix is obtained by adding corresponding preset differences to the elements corresponding to the source language subunits and the predicate respectively;

when a plurality of feature matrices can be determined, taking the feature matrix with the largest determinant as the target feature matrix, and taking the source language subunits corresponding to the target feature matrix as the finally determined source language subunits;

and translating the source language text into a target language text according to the arranged sequence.

2. The method of claim 1,

the source language unit includes: subject, object, manner adverbial, place adverbial, or time adverbial;

3. The method of claim 2,

the subject includes one or more of a first sub-subject, a first sub-predicate, a first sub-object, a first sub-manner adverbial, a first sub-place adverbial, and a first sub-time adverbial;

the place adverbial includes one or more of a fourth sub-subject, a fourth sub-predicate, a fourth sub-object, a fourth sub-manner adverbial, a fourth sub-place adverbial, and a fourth sub-time adverbial;

the sentence structure of the target language is: (first sub-subject + first sub-predicate + first sub-object + first sub-manner adverbial + first sub-place adverbial + first sub-time adverbial) + predicate + (second sub-subject + second sub-predicate + second sub-object + second sub-manner adverbial + second sub-place adverbial + second sub-time adverbial) + (third sub-subject + third sub-predicate + third sub-object + third sub-manner adverbial + third sub-place adverbial + third sub-time adverbial) + (fourth sub-subject + fourth sub-predicate + fourth sub-object + fourth sub-manner adverbial + fourth sub-place adverbial + fourth sub-time adverbial) + (fifth sub-subject + fifth sub-predicate + fifth sub-object + fifth sub-manner adverbial + fifth sub-place adverbial + fifth sub-time adverbial).

4. The method according to claim 1, wherein the element values of the diagonal elements of the initial matrix are a first initial value, and the other element values are a second initial value;

each source language unit corresponds to a preset region of the initial matrix, and the source language subunits within the source language unit correspond in sequence to the elements of the preset region; the preset region is one row, one column, or one nine-square grid (3×3 block) of the initial matrix.

5. A sentence structure-based translation apparatus, comprising:

the establishing module is used for establishing an n-order initial matrix, each source language subunit corresponds to one element in the initial matrix, and the predicate also corresponds to one element in the initial matrix; the determinant of the initial matrix is not zero;

the determining module is used for determining a feature matrix after the source language subunits are determined, wherein the feature matrix is obtained by adding corresponding preset differences to the elements corresponding to the source language subunits and the predicate respectively;

the processing module is used for, when a plurality of feature matrices can be determined, taking the feature matrix with the largest determinant as the target feature matrix and taking the source language subunits corresponding to the target feature matrix as the finally determined source language subunits; the sorting module is used for rearranging the order of the source language units and the predicate according to the sentence pattern structure of the target language, and rearranging the order of the source language subunits within each source language unit;

6. The apparatus of claim 5,

the source language unit includes: subject, object, manner adverbial, place adverbial, or time adverbial;

7. The apparatus of claim 6,

8. The apparatus according to claim 5, wherein the element values of the diagonal elements of the initial matrix are a first initial value, and the other element values are a second initial value;