CN103744840B

CN103744840B - Method for analyzing document translation difficulty

Info

Publication number: CN103744840B
Application number: CN201310713175.2A
Authority: CN
Inventors: 江潮; 张芃
Original assignee: Language Network (wuhan) Information Technology Co Ltd
Current assignee: Language Network (wuhan) Information Technology Co Ltd
Priority date: 2013-12-23
Filing date: 2013-12-23
Publication date: 2016-12-07
Anticipated expiration: 2033-12-23
Also published as: CN103744840A

Abstract

The invention discloses a method for analyzing document translation difficulty, comprising: scanning a document to be translated and determining all words and all sentences in it; calculating complexity separately from the determined words and sentences to obtain the vocabulary complexity and the sentence complexity of the document; calculating a translation difficulty value of the document from its vocabulary complexity and sentence complexity; and matching the translation difficulty value against a difficulty level table to determine the translation difficulty level of the document. By providing a method for computing the translation difficulty of a document, the invention calculates the translation difficulty of the document to be translated accurately and improves the accuracy of translation difficulty analysis.

Description

Method for analyzing document translation difficulty

Technical field

The present invention relates to the field of translation technology, and in particular to a method for analyzing document translation difficulty.

Background art

Translation difficulty can be judged either manually or by machine. In manual judgment, linguists or translation experts annotate and assess the document to be translated. Because human reading and comprehension are limited, this approach is slow and incurs high labor costs. Moreover, because assessors differ in ability and each understands document difficulty differently, their judgments vary widely, so the results cannot follow a unified standard and are poorly objective. In machine judgment, a computer applies a single fixed method to assess translation difficulty. The most common current method judges difficulty from statistics on rare words in the document. Such a single-dimension factor is weak evidence and is strongly one-sided, so the result often differs greatly from the actual situation and its accuracy cannot be guaranteed. At present there is no method for judging document translation difficulty that is both efficient and reasonably accurate.

Summary of the invention

The present invention aims to provide a method for analyzing document translation difficulty, so as to solve the problem of how to assign each document to a suitable translator.

The invention discloses a method for analyzing document translation difficulty, comprising:

scanning a document to be translated, and determining all words and all sentences in the document;

calculating complexity separately from the determined words and sentences, to obtain the vocabulary complexity and the sentence complexity of the document;

calculating a translation difficulty value of the document from the vocabulary complexity and the sentence complexity of the document;

matching the translation difficulty value of the document against a difficulty level table, to determine the translation difficulty level of the document.

Preferably, calculating the vocabulary complexity of the document comprises:

calculating the vocabulary grade, the type-token ratio, and the notional-word sense density of the document;

calculating the vocabulary complexity of the document according to the vocabulary complexity formula, which is as follows:

diff_word = K₁₁·grade_word + K₁₂·STTR + K₁₃·density_notional;

where diff_word is the vocabulary complexity of the document, grade_word is the vocabulary grade of the document, STTR is the type-token ratio of the document, density_notional is the notional-word sense density of the document, and K₁₁, K₁₂ and K₁₃ are vocabulary complexity adjustment coefficients calculated from samples.

Preferably, before calculating the vocabulary grade of the document, the method further comprises:

performing word segmentation on the document to obtain all words, and counting them to obtain the total word count;

matching each obtained word against a word level table to obtain the level of each word, the level being level 1, level 2, level 3 or level 4;

counting, separately for each level, the number of words at level 2 or above.

Calculating the vocabulary grade of the document comprises:

calculating the vocabulary grade of the document according to the vocabulary grade formula, which is as follows:

\mathrm{grade\_word} = K_{111} \cdot \frac{\mathrm{word}_{2}}{\mathrm{word}} + K_{112} \cdot \frac{\mathrm{word}_{3}}{\mathrm{word}} + K_{113} \cdot \frac{\mathrm{word}_{4}}{\mathrm{word}};

where word_x is the number of words at level x, K₁₁₁, K₁₁₂ and K₁₁₃ are vocabulary grade adjustment coefficients calculated from samples, and word is the total word count.
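The vocabulary grade formula is a weighted sum of the shares of level-2, level-3 and level-4 words. The following Python sketch shows one way to evaluate it, assuming the level of every word has already been looked up. The function name `vocabulary_grade` and the coefficient values are illustrative placeholders, not values given by the invention.

```python
from collections import Counter

def vocabulary_grade(levels, k111, k112, k113):
    """Weighted share of level-2, level-3 and level-4 words.

    levels: one integer level (1..4) per word occurrence in the document.
    k111, k112, k113: adjustment coefficients (placeholder values here;
    the text obtains them by regression on expert-rated samples).
    """
    if not levels:
        raise ValueError("document has no words")
    counts = Counter(levels)      # word_1 .. word_4
    total = len(levels)           # `word`, the total word count
    return (k111 * counts[2] / total
            + k112 * counts[3] / total
            + k113 * counts[4] / total)

# Ten words: six at level 1, two at level 2, one at level 3, one at level 4.
example_levels = [1, 1, 1, 1, 1, 1, 2, 2, 3, 4]
example_grade = vocabulary_grade(example_levels, k111=1.0, k112=2.0, k113=3.0)
```

Level-1 words enter only through the total word count, so a document made up entirely of common words gets a grade of zero whatever the coefficients are.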

Preferably, calculating the type-token ratio of the document comprises:

counting the types and the tokens among all the obtained words, and calculating the ratio of the number of types to the number of tokens to obtain the type-token ratio of the document; or

dividing all the obtained words into multiple subdocuments of a standard number of words each, plus one subdocument containing fewer than the standard number of words, and calculating the type-token ratio of the document according to the type-token ratio formula, which is as follows:

\mathrm{STTR} = \begin{cases} \frac{1}{(n + 1) \cdot \mathrm{ST} \cdot \mathrm{token}} \cdot \left( \mathrm{type} \cdot \mathrm{ST} + \mathrm{token} \cdot \sum_{i = 1}^{n} \mathrm{type}_{i} \right), & n \geq 1 \\ \frac{\mathrm{type}}{\mathrm{token}}, & n = 0 \end{cases}

where token is the number of tokens in the subdocument with fewer than the standard number of words, type is the number of types in that subdocument, type_i is the number of types in the i-th subdocument containing the standard number of words, n is the number of subdocuments containing the standard number of words, and ST is the standard number of words per subdocument.

Preferably, before calculating the notional-word sense density of the document, the method further comprises:

performing part-of-speech tagging on all the obtained words to identify the notional words among them;

arranging all the obtained notional words in a fixed order;

obtaining the number of senses meanings_i of each notional word from a synonym ontology tool, where i is the index of the notional word, and summing the numbers of senses of the notional words;

calculating the notional-word sense density of the document according to the notional-word sense density formula, which is as follows:

\mathrm{density\_notional} = \frac{\sum_{i = 1}^{\mathrm{count\_notional}} \mathrm{meanings}_{i}}{\sum_{i = 1}^{\mathrm{count\_notional}} \mathrm{meanings}_{i} + (\mathrm{word} - \mathrm{count\_notional})};

where meanings_i is the number of senses of the i-th notional word, and count_notional is the number of notional words.

Preferably, the notional words include at least one of the following parts of speech: noun, synonym, verb, adjective, adverb and interjection.

Preferably, before calculating the sentence complexity of the document, the method further comprises:

calculating the average length of full sentences from the number of full sentences determined in the document;

calculating the average length of first-type clauses from the number of first-type clauses determined in all the full sentences of the document;

calculating the average length of long sentences from the number of long sentences determined in the document and the length of each long sentence;

calculating the average length of second-type clauses from the number of second-type clauses determined in all the long sentences of the document.

Calculating the sentence complexity of the document comprises:

calculating the sentence complexity of the document according to the sentence complexity formula, which is as follows:

diff_sentence = K₂₁·MLS + K₂₂·MLC + K₂₃·MLL + K₂₄·MLCL;

where MLS is the average length of the full sentences, MLC is the average length of the first-type clauses, MLL is the average length of the long sentences, MLCL is the average length of the second-type clauses, and K₂₁, K₂₂, K₂₃ and K₂₄ are sentence complexity adjustment coefficients calculated from samples.

Preferably, calculating the average lengths of the full sentences and of the first-type clauses comprises:

dividing the total word count by the number of full sentences to obtain the average length of the full sentences;

dividing the total word count by the number of first-type clauses to obtain the average length of the first-type clauses.

Preferably, calculating the average lengths of the long sentences and of the second-type clauses comprises:

counting the length word_long_i of each long sentence, 1 ≤ i ≤ count_long, where i is the index of the long sentence;

calculating the average length of the long sentences according to the following formula:

\mathrm{MLL} = \frac{1}{\mathrm{count\_long}} \cdot \sum_{i = 1}^{\mathrm{count\_long}} \mathrm{word\_long}_{i};

where count_long is the number of long sentences;

calculating the average length of the second-type clauses according to the following formula:

\mathrm{MLCL} = \frac{1}{\mathrm{count\_clause\_long}} \cdot \sum_{i = 1}^{\mathrm{count\_long}} \mathrm{word\_long}_{i};

where count_clause_long is the number of second-type clauses.

Preferably, calculating the translation difficulty value of the document comprises:

calculating the translation difficulty value of the document according to the translation difficulty formula, which is as follows:

diff_doc = K₁·diff_word + K₂·diff_sentence;

where K₁ and K₂ are translation difficulty adjustment coefficients calculated from samples, and diff_doc is the translation difficulty value.

The method for analyzing document translation difficulty of the present invention has the following advantages:

1. It calculates the translation difficulty of a document in a unified and objective way, improving the accuracy of the calculated translation difficulty;

2. It can be used to assign translation tasks to translators, enabling a reasonable allocation of resources.

Brief description of the drawings

The accompanying drawings described here provide a further understanding of the present invention and form part of this application. The illustrative embodiments of the invention and their description serve to explain the invention and do not unduly limit it. In the drawings:

Fig. 1 shows a flow chart of the embodiment.

Detailed description of the invention

The present invention is described in detail below with reference to the accompanying drawing and in conjunction with the embodiments.

This technical solution analyzes the translation difficulty of the document to be translated from two aspects, vocabulary complexity and sentence complexity, and determines the translation difficulty of the document from these two values. Specifically, it comprises:

S11: scanning the document to be translated, and determining all words and all sentences in the document;

S12: calculating complexity separately from the determined words and sentences, to obtain the vocabulary complexity and the sentence complexity of the document;

S13: calculating the translation difficulty value of the document from the vocabulary complexity and the sentence complexity of the document;

S14: matching the translation difficulty value of the document against a difficulty level table, to determine the translation difficulty level of the document.

Based on the above method, a preferred embodiment is given below:

Determine the document to be translated, referred to below as the document;

1. Calculate the vocabulary complexity of the document, as follows:

Perform word segmentation on the document to obtain all the words in it. The term "word" should not be understood as English words only: it also covers words in scripts built from characters, such as Chinese, Japanese and Korean, and/or words in alphabetic scripts, such as French and Russian. "All words" is understood to include repeated words;

1) Calculate the vocabulary grade of the document:

Match each obtained word against the word level table to obtain its level, which is level 1, level 2, level 3 or level 4. Levels 1, 2 and 3 are obtained by table lookup, and any word that fails to match in the word level table is treated as level 4;

In each language, words can be graded according to how frequently they occur in actual use. This technical solution builds a word level table for each language from that language's authoritative word grading rules, dividing the words of each language into three levels by how common they are. For Chinese, for example, the "Table of General Standard Chinese Characters" and the "Chinese Coded Character Set for Information Interchange, Basic Set" serve as the grading reference for Chinese characters, with common, less common and rare characters corresponding to level 1, level 2 and level 3 respectively.
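The table lookup described above, including the rule that unmatched words fall into level 4, can be sketched in Python as follows. The table contents and the names `vocabulary_level` and `toy_level_table` are hypothetical. A real table would be built from the authoritative grading lists of the language in question.

```python
def vocabulary_level(word, level_table):
    """Return level 1, 2 or 3 from the table; unmatched words are level 4."""
    return level_table.get(word, 4)

# Hypothetical toy table mapping words to levels 1..3.
toy_level_table = {"the": 1, "translate": 2, "ontology": 3}

tokens = ["the", "ontology", "zyzzyva", "translate", "the"]
token_levels = [vocabulary_level(t, toy_level_table) for t in tokens]
```

A dictionary keeps each lookup constant-time, which matters because every word occurrence in the document, repeats included, is matched against the table.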

Count the number of level-1 words as word₁, the number of level-2 words as word₂, the number of level-3 words as word₃, and the number of level-4 words as word₄;

Count all the words in the document to obtain the total word count word;

Calculate the proportion of the document made up by words at level 2 and above, as follows:

The proportion of level-2 words is word₂/word, the proportion of level-3 words is word₃/word, and the proportion of level-4 words is word₄/word.

Calculate the vocabulary grade of the document according to the vocabulary grade formula, which is as follows:

\mathrm{grade\_word} = K_{111} \cdot \frac{\mathrm{word}_{2}}{\mathrm{word}} + K_{112} \cdot \frac{\mathrm{word}_{3}}{\mathrm{word}} + K_{113} \cdot \frac{\mathrm{word}_{4}}{\mathrm{word}};

where grade_word is the vocabulary grade, and K₁₁₁, K₁₁₂ and K₁₁₃ are vocabulary grade adjustment coefficients calculated from given samples. They are third-level adjustment coefficients. Each is a multiple linear regression coefficient and can be obtained by the least squares method, as follows:

Let:

Y = \mathrm{grade\_word}, \quad X_{1} = \frac{\mathrm{word}_{2}}{\mathrm{word}}, \quad X_{2} = \frac{\mathrm{word}_{3}}{\mathrm{word}}, \quad X_{3} = \frac{\mathrm{word}_{4}}{\mathrm{word}};

For the n groups of sample data collected:

{X₁₁, X₁₂, X₁₃};

{X₂₁, X₂₂, X₂₃};

...;

{X_n1, X_n2, X_n3};

the corresponding vocabulary grades assessed by experts are:

\{ Y_{1}, Y_{2}, \ldots, Y_{n} \};

This gives the following system of linear equations:

Y₁ = K₁₁₁·X₁₁ + K₁₁₂·X₁₂ + K₁₁₃·X₁₃;

Y₂ = K₁₁₁·X₂₁ + K₁₁₂·X₂₂ + K₁₁₃·X₂₃;

...;

Y_n = K₁₁₁·X_n1 + K₁₁₂·X_n2 + K₁₁₃·X_n3;

Solving gives:

\begin{bmatrix} K_{111} \\ K_{112} \\ K_{113} \end{bmatrix} = (X' X)^{-1} X' Y;

where

X = \begin{bmatrix} X_{11} & X_{12} & X_{13} \\ X_{21} & X_{22} & X_{23} \\ \vdots & \vdots & \vdots \\ X_{n1} & X_{n2} & X_{n3} \end{bmatrix}, \quad Y = \begin{bmatrix} Y_{1} \\ Y_{2} \\ \vdots \\ Y_{n} \end{bmatrix},

and X' is the transpose of X.
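As an illustration of the least squares step, the Python sketch below forms X'X and X'Y and solves the resulting normal equations by Gaussian elimination. Like the system of equations in the text, it uses no intercept term. The sample data are made up: they are generated from known coefficients (1, 2, 3) so that the fit can be checked. The function names `solve_linear` and `fit_coefficients` are illustrative.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular; the samples are not varied enough")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_coefficients(x_rows, y):
    """Least squares solution K = (X'X)^-1 X'Y, without an intercept."""
    cols = len(x_rows[0])
    xtx = [[sum(r[i] * r[j] for r in x_rows) for j in range(cols)]
           for i in range(cols)]
    xty = [sum(r[i] * yi for r, yi in zip(x_rows, y)) for i in range(cols)]
    return solve_linear(xtx, xty)

# Made-up samples: each row is (word_2/word, word_3/word, word_4/word).
x_rows = [[0.2, 0.1, 0.1], [0.3, 0.0, 0.2], [0.1, 0.4, 0.0], [0.0, 0.2, 0.3]]
# Stand-in "expert" grades generated from K = (1, 2, 3).
y = [1 * a + 2 * b + 3 * c for a, b, c in x_rows]
k111, k112, k113 = fit_coefficients(x_rows, y)
```

Solving the normal equations directly is fine for three coefficients. With many features or nearly collinear samples, a QR-based or SVD-based least squares routine from a numerical library would be the more robust choice.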

2) Calculate the standard type-token ratio of the document:

Count the tokens in the document, that is, the total number of words occurring in the document;

Count the types in the document, that is, the number of distinct words occurring in the document;

The type-token ratio (TTR) expresses how much the vocabulary varies, that is, how rich the vocabulary of the document is. The higher the TTR, the more different words the text uses, and the harder it is to read. Because the number of words in any language is finite, the larger the document, the smaller its type-token ratio becomes, which distorts the measured value. In practice, therefore, the TTR can be calculated over units of a standard number ST of words (for example ST = 1000), and the mean of all the TTRs is taken as the final value, namely the standard type-token ratio (STTR, Standard TTR). For a document with fewer than the standard number of words, the TTR is calculated directly. Specifically:

Divide all the words of the document into n first subdocuments of ST words each, where the number of types in the i-th first subdocument is type_i and i is the index of the first subdocument;

There is also one second subdocument containing fewer than ST words, in which the number of types is type and the number of tokens is token;

Calculate the standard type-token ratio of the document according to the following formula:

\mathrm{STTR} = \begin{cases} \frac{1}{(n + 1) \cdot \mathrm{ST} \cdot \mathrm{token}} \cdot \left( \mathrm{type} \cdot \mathrm{ST} + \mathrm{token} \cdot \sum_{i = 1}^{n} \mathrm{type}_{i} \right), & n \geq 1 \\ \frac{\mathrm{type}}{\mathrm{token}}, & n = 0 \end{cases}

where token is the number of tokens in the subdocument with fewer than the standard number of words, type is the number of types in that subdocument, type_i is the number of types in the i-th subdocument containing the standard number of words, n is the number of subdocuments containing the standard number of words, and ST is the standard number of words per subdocument.
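For n ≥ 1 the formula is the mean of n + 1 ratios: type_i / ST for each full subdocument and type / token for the shorter one. A minimal Python sketch follows, assuming the document is already a list of words. The function name `sttr` is illustrative. One case is handled by assumption rather than by the text: when the word count is an exact multiple of ST, no shorter subdocument exists and the formula as written would divide by token = 0, so the sketch averages over the n full subdocuments only.

```python
def sttr(tokens, st=1000):
    """Standard type-token ratio over subdocuments of `st` words."""
    if not tokens:
        raise ValueError("document has no words")
    n, remainder = divmod(len(tokens), st)
    # TTR of each full subdocument: type_i / ST
    ratios = [len(set(tokens[i * st:(i + 1) * st])) / st for i in range(n)]
    if remainder:
        tail = tokens[n * st:]
        ratios.append(len(set(tail)) / len(tail))  # type / token
    # Assumption: with no remainder, average over the n full subdocuments.
    return sum(ratios) / len(ratios)

# Seven words with ST = 3: two full subdocuments and a one-word remainder.
words = ["a", "b", "a", "c", "d", "d", "e"]
example_sttr = sttr(words, st=3)
```

In the example the three ratios are 2/3, 2/3 and 1, so the result matches the closed-form expression with n = 2, ST = 3, type = 1 and token = 1.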

3) Calculate the notional-word sense density of the document:

Lexical density is the proportion of notional words among the total number of words in a text. In general, the higher the lexical density, the larger the share of notional words in the text and the more information it carries, so reading and translation difficulty increase accordingly.

Count the number count_notional of notional words in the document, that is, the number of nouns, synonyms, verbs, adjectives, adverbs, interjections and so on;

Using the synonym ontology tool, count the number of senses meanings_i of each notional word (1 ≤ i ≤ count_notional), where i is the index of the notional word;

Add the numbers of senses of all the notional words to obtain the total number of senses of all notional words.

Calculate the notional-word sense density of the document according to the notional-word sense density formula, which is as follows:

\mathrm{density\_notional} = \frac{\sum_{i = 1}^{\mathrm{count\_notional}} \mathrm{meanings}_{i}}{\sum_{i = 1}^{\mathrm{count\_notional}} \mathrm{meanings}_{i} + (\mathrm{word} - \mathrm{count\_notional})}

where density_notional is the notional-word sense density, and \sum_{i = 1}^{\mathrm{count\_notional}} \mathrm{meanings}_{i} is the total number of senses of the notional words;
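The density formula weights each notional word by its number of senses and each remaining word by one. The Python sketch below assumes the sense counts have already been obtained from some synonym ontology tool. The lookup itself is outside the sketch, and the function name and example numbers are made up for illustration.

```python
def notional_sense_density(sense_counts, total_words):
    """Notional-word sense density of a document.

    sense_counts: number of senses of each notional word occurrence
                  (meanings_i), as returned by some ontology lookup.
    total_words:  `word`, the total word count of the document.
    """
    count_notional = len(sense_counts)
    if count_notional > total_words:
        raise ValueError("more notional words than words in the document")
    total_senses = sum(sense_counts)
    denominator = total_senses + (total_words - count_notional)
    if denominator == 0:
        raise ValueError("empty document")
    return total_senses / denominator

# Five words, three of them notional with 3, 1 and 4 senses.
example_density = notional_sense_density([3, 1, 4], total_words=5)
```

Here the total number of senses is 8 and there are 2 non-notional words, giving 8 / (8 + 2).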

The steps of calculating the vocabulary grade, the standard type-token ratio, and the notional-word sense density of the document have no required order; they can be performed separately or simultaneously.

4) Calculate the vocabulary complexity of the document from its vocabulary grade, standard type-token ratio, and notional-word sense density:

Calculate the vocabulary complexity of the document according to the vocabulary complexity formula, which is as follows:

diff_word = K₁₁·grade_word + K₁₂·STTR + K₁₃·density_notional;

where diff_word is the vocabulary complexity, grade_word is the vocabulary grade, STTR is the standard type-token ratio, and density_notional is the notional-word sense density; K₁₁, K₁₂ and K₁₃ are vocabulary complexity adjustment coefficients calculated from given samples. They are second-level adjustment coefficients. Each is a multiple linear regression coefficient and can be obtained by the least squares method, in the same way as the vocabulary grade adjustment coefficients.

2. Calculate the sentence complexity of the document, as follows:

The term "full sentence" should be understood as a set of words that expresses a complete meaning, for example the set of words from the first character of the document to a terminator, or the set of words from the first character after one terminator to the next terminator. A terminator is one of the full stop, exclamation mark, question mark, and ellipsis;

The term "clause" should be understood as a part of a full sentence: a word or set of words delimited by punctuation marks such as the comma, enumeration comma (、), and semicolon;

The term "long sentence" should be understood as a full sentence whose word count exceeds a predetermined threshold;

"First-type" and "second-type" are used here only to distinguish the two kinds of clause.

The scheme is as follows:

Scan the document, determine all the full sentences in it, and count them; the total is denoted count_sentence;

Treat each full sentence whose word count exceeds the predetermined threshold as a long sentence, and count the long sentences, the total being denoted count_long, as well as the number of words in each long sentence, denoted word_long_i, 1 ≤ i ≤ count_long, where i is the index of the long sentence;

The clauses in the full sentences are first-type clauses; count them, the total being denoted count_clause;

The clauses in the long sentences are second-type clauses; count them, the total being denoted count_clause_long;
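The counting steps above can be sketched in Python as follows. The sketch makes several simplifying assumptions that the text does not: sentences and clauses are split with plain regular expressions, words are counted by whitespace splitting as a stand-in for real word segmentation, and the long-sentence threshold of 20 words is an arbitrary default. The function names are illustrative.

```python
import re

SENTENCE_END = r"[.!?。！？…]+"   # full stop, exclamation mark, question mark, ellipsis
CLAUSE_SEP = r"[,;，、；]"        # comma, enumeration comma, semicolon

def split_sentences(text):
    """Full sentences: text between terminators."""
    return [s.strip() for s in re.split(SENTENCE_END, text) if s.strip()]

def split_clauses(sentence):
    """Clauses: parts of a full sentence between separators."""
    return [c.strip() for c in re.split(CLAUSE_SEP, sentence) if c.strip()]

def sentence_counts(text, long_threshold=20):
    sentences = split_sentences(text)
    # Whitespace splitting is a naive stand-in for word segmentation.
    lengths = [len(s.split()) for s in sentences]
    long_sentences = [s for s, n in zip(sentences, lengths) if n > long_threshold]
    return {
        "count_sentence": len(sentences),
        "count_clause": sum(len(split_clauses(s)) for s in sentences),
        "count_long": len(long_sentences),
        "word_long": [len(s.split()) for s in long_sentences],
        "count_clause_long": sum(len(split_clauses(s)) for s in long_sentences),
    }

text = "Short one. This sentence, which has a clause, is the long one here! Done?"
counts = sentence_counts(text, long_threshold=5)
```

A production version would need to handle periods inside numbers and abbreviations, and would use a proper segmenter for languages written without spaces.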

Calculate the average length of the full sentences, the average length of the long sentences, the average length of the first-type clauses, and the average length of the second-type clauses, as follows:

The average length of the full sentences (MLS, mean length of sentence) is calculated as: MLS = word / count_sentence;

The average length of the first-type clauses (MLC, mean length of clause) is calculated as: MLC = word / count_clause;

The average length of the long sentences (MLL, mean length of long sentence) is calculated as:

\mathrm{MLL} = \frac{1}{\mathrm{count\_long}} \cdot \sum_{i = 1}^{\mathrm{count\_long}} \mathrm{word\_long}_{i};

The average length of the second-type clauses (MLCL, mean length of clause of long sentence) is calculated as:

\mathrm{MLCL} = \frac{1}{\mathrm{count\_clause\_long}} \cdot \sum_{i = 1}^{\mathrm{count\_long}} \mathrm{word\_long}_{i};
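The four averages can be computed directly from the counts. In the Python sketch below, the handling of a document without long sentences (MLL and MLCL set to 0) is an assumption, since the text does not say what to do in that case. The function name and example numbers are illustrative.

```python
def sentence_means(word, count_sentence, count_clause, word_long, count_clause_long):
    """Return (MLS, MLC, MLL, MLCL).

    word:              total word count of the document
    count_sentence:    number of full sentences
    count_clause:      number of first-type clauses
    word_long:         list with the word count of each long sentence
    count_clause_long: number of second-type clauses
    """
    mls = word / count_sentence
    mlc = word / count_clause
    count_long = len(word_long)
    # Assumption: with no long sentences, MLL and MLCL contribute 0.
    mll = sum(word_long) / count_long if count_long else 0.0
    mlcl = sum(word_long) / count_clause_long if count_clause_long else 0.0
    return mls, mlc, mll, mlcl

# 14 words, 3 full sentences, 5 clauses, one long sentence of 11 words with 3 clauses.
mls, mlc, mll, mlcl = sentence_means(14, 3, 5, [11], 3)
```

MLL and MLCL share the same numerator, the total number of words in long sentences, and differ only in whether it is divided by the number of long sentences or by the number of clauses in them.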

Calculate the sentence complexity according to the sentence complexity formula, which is as follows:

diff_sentence = K₂₁·MLS + K₂₂·MLC + K₂₃·MLL + K₂₄·MLCL;

K₂₁, K₂₂, K₂₃ and K₂₄ are sentence complexity adjustment coefficients calculated from collected samples. They are second-level adjustment coefficients. Each is a multiple linear regression coefficient and can be obtained by the least squares method, in the same way as the vocabulary grade adjustment coefficients.

3. Calculate the translation difficulty value of the document;

From the vocabulary complexity and the sentence complexity obtained for the document, calculate the translation difficulty value of the document according to the translation difficulty formula, which is as follows:

diff_doc = K₁·diff_word + K₂·diff_sentence;

K₁ and K₂ are translation difficulty adjustment coefficients calculated from collected samples. They are first-level adjustment coefficients. Each is a multiple linear regression coefficient and can be obtained by the least squares method, in the same way as the vocabulary grade adjustment coefficients.
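The second-level and first-level formulas are nested weighted sums: second-level coefficients combine the individual measures into diff_word and diff_sentence, and first-level coefficients combine those two into diff_doc. A Python sketch follows, with made-up feature values and coefficients. In practice each set of coefficients would come from regression on rated samples, as the text describes.

```python
def difficulty_value(features, k):
    """Return (diff_word, diff_sentence, diff_doc) from features and coefficients."""
    diff_word = (k["K11"] * features["grade_word"]
                 + k["K12"] * features["STTR"]
                 + k["K13"] * features["density_notional"])
    diff_sentence = (k["K21"] * features["MLS"]
                     + k["K22"] * features["MLC"]
                     + k["K23"] * features["MLL"]
                     + k["K24"] * features["MLCL"])
    diff_doc = k["K1"] * diff_word + k["K2"] * diff_sentence
    return diff_word, diff_sentence, diff_doc

# Made-up feature values and placeholder coefficients.
features = {"grade_word": 0.7, "STTR": 0.5, "density_notional": 0.8,
            "MLS": 10.0, "MLC": 5.0, "MLL": 20.0, "MLCL": 8.0}
coefficients = {"K11": 1.0, "K12": 1.0, "K13": 1.0,
                "K21": 0.1, "K22": 0.1, "K23": 0.1, "K24": 0.1,
                "K1": 0.5, "K2": 0.5}
diff_word, diff_sentence, diff_doc = difficulty_value(features, coefficients)
```

Because the sentence-length measures are on a much larger numeric scale than the vocabulary ratios, the fitted coefficients also act as scale factors between the two groups.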

4. Determine the translation difficulty level of the document;

Match the translation difficulty value of the document against the difficulty level table to obtain the difficulty level corresponding to that value;

The difficulty level table takes a dictionary-like form, containing several difficulty levels and the range of translation difficulty values corresponding to each level;

The ranges of translation difficulty values in the difficulty level table are obtained by learning or training.
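Matching a difficulty value against the table amounts to a range lookup. In the Python sketch below, the three levels and their numeric ranges are hypothetical placeholders. The text obtains the actual ranges by learning or training and does not specify them.

```python
# Hypothetical table: (lower bound inclusive, upper bound exclusive, level).
DIFFICULTY_TABLE = [
    (0.0, 2.0, "easy"),
    (2.0, 4.0, "medium"),
    (4.0, float("inf"), "hard"),
]

def difficulty_level(diff_doc, table=DIFFICULTY_TABLE):
    """Return the level whose value range contains diff_doc."""
    for low, high, level in table:
        if low <= diff_doc < high:
            return level
    raise ValueError(f"no difficulty level covers the value {diff_doc}")

example_level = difficulty_level(3.15)
```

Half-open ranges guarantee that a value on a boundary matches exactly one level.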

The foregoing is only the preferred embodiment of the present invention and is not intended to limit it. For those skilled in the art, the invention may have various modifications and variations. Any modification, equivalent replacement, improvement or the like made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims

1. A method for analyzing document translation difficulty, characterized by comprising:

calculating complexity separately from the determined words and sentences, to obtain the vocabulary complexity and the sentence complexity of the document;

calculating a translation difficulty value of the document from the vocabulary complexity and the sentence complexity of the document;

matching the translation difficulty value of the document against a difficulty level table, to determine the translation difficulty level of the document;

wherein calculating the vocabulary complexity of the document comprises:

calculating the vocabulary complexity of the document according to the vocabulary complexity formula, which is as follows:

diff_word = K₁₁·grade_word + K₁₂·STTR + K₁₃·density_notional;

where diff_word is the vocabulary complexity of the document, grade_word is the vocabulary grade of the document, STTR is the type-token ratio of the document, density_notional is the notional-word sense density of the document, and K₁₁, K₁₂ and K₁₃ are vocabulary complexity adjustment coefficients calculated from samples.

2. The method according to claim 1, characterized in that, before calculating the vocabulary grade of the document, the method further comprises:

matching each obtained word against a word level table to obtain the level of each word, the level being level 1, level 2, level 3 or level 4;

and calculating the vocabulary grade of the document comprises:

calculating the vocabulary grade of the document according to the vocabulary grade formula, which is as follows:

\mathrm{grade\_word} = K_{111} \cdot \frac{\mathrm{word}_{2}}{\mathrm{word}} + K_{112} \cdot \frac{\mathrm{word}_{3}}{\mathrm{word}} + K_{113} \cdot \frac{\mathrm{word}_{4}}{\mathrm{word}};

where word_x is the number of words at level x, K₁₁₁, K₁₁₂ and K₁₁₃ are vocabulary grade adjustment coefficients calculated from samples, and word is the total word count.

3. The method according to claim 2, characterized in that calculating the type-token ratio of the document comprises:

counting the types and the tokens among all the obtained words, and calculating the ratio of the number of types to the number of tokens to obtain the type-token ratio of the document; or

dividing all the obtained words into multiple subdocuments of a standard number of words each, plus one subdocument containing fewer than the standard number of words, and calculating the type-token ratio of the document according to the type-token ratio formula, which is as follows:

\mathrm{STTR} = \begin{cases} \frac{1}{(n + 1) \cdot \mathrm{ST} \cdot \mathrm{token}} \cdot \left( \mathrm{type} \cdot \mathrm{ST} + \mathrm{token} \cdot \sum_{i = 1}^{n} \mathrm{type}_{i} \right), & n \geq 1 \\ \frac{\mathrm{type}}{\mathrm{token}}, & n = 0 \end{cases};

where token is the number of tokens in the subdocument with fewer than the standard number of words, type is the number of types in that subdocument, type_i is the number of types in the i-th subdocument containing the standard number of words, n is the number of subdocuments containing the standard number of words, and ST is the standard number of words per subdocument.

4. The method according to claim 2, characterized in that, before calculating the notional-word sense density of the document, the method further comprises:

obtaining the number of senses meanings_i of each notional word using a synonym ontology tool, where i is the index of the notional word, and counting the total number of senses of the notional words;

calculating the notional-word sense density of the document according to a notional-word sense density formula, the notional-word sense density formula being as follows:

density\_notional = \frac{\sum_{i=1}^{count\_notional} meanings_{i}}{\sum_{i=1}^{count\_notional} meanings_{i} + (word - count\_notional)};

wherein meanings_i is the number of senses of the i-th notional word, and count_notional is the number of notional words.
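The density_notional formula can be sketched as follows, for illustration only. The function name `notional_density` and the input format (a list of sense counts, one per notional word, already looked up in whatever synonym ontology tool is used) are assumptions. The sketch does not perform the lookup itself.

```python
def notional_density(meanings, word):
    """Compute density_notional from per-word sense counts.

    meanings: list of sense counts, one entry per notional word (meanings_i).
    word: total number of words in the document.
    """
    count_notional = len(meanings)
    if count_notional > word:
        raise ValueError("more notional words than total words")
    total_senses = sum(meanings)
    # Each non-notional word adds 1 to the denominator, as in the formula.
    denominator = total_senses + (word - count_notional)
    if denominator == 0:
        raise ValueError("document contains no words")
    return total_senses / denominator


# Example: three notional words with 3, 1 and 4 senses in a 10-word document.
example_density = notional_density([3, 1, 4], 10)
```

In the example, the numerator is 8 and the denominator is 8 + (10 - 3) = 15, so the density rises as the notional words become more polysemous.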

5. The method according to claim 4, characterized in that the notional words include words of at least one of the following parts of speech: noun, synonym, verb, adjective, adverb and interjection.

6. The method according to claim 2, characterized in that, before calculating the sentence complexity of the document, the method further comprises:

determining the number of first-type clauses in all the whole sentences of the document, and calculating the average length of the first-type clauses in the whole sentences;

determining the number of second-type clauses in all the long sentences of the document, and calculating the average length of the second-type clauses in the long sentences;

calculating the sentence complexity of the document according to a sentence complexity formula, the sentence complexity formula being as follows:

diff_sentence = K₂₁·MLS + K₂₂·MLC + K₂₃·MLL + K₂₄·MLCL;

wherein MLS is the average length of the whole sentences, MLC is the average length of the first-type clauses, MLL is the average length of the long sentences, MLCL is the average length of the second-type clauses, and K₂₁, K₂₂, K₂₃ and K₂₄ are sentence complexity adjustment coefficients calculated from samples.
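The weighted sum over MLS, MLC, MLL and MLCL can be sketched as below, for illustration only. The function name `sentence_complexity` and the equal coefficient values in the example are assumptions; the source specifies only that the coefficients are calculated from samples.

```python
def sentence_complexity(mls, mlc, mll, mlcl, k21, k22, k23, k24):
    """Weighted sum of the four average lengths.

    mls, mlc, mll, mlcl: average lengths of whole sentences, first-type clauses,
        long sentences and second-type clauses, respectively.
    k21..k24: adjustment coefficients (values must come from sample fitting).
    """
    return k21 * mls + k22 * mlc + k23 * mll + k24 * mlcl


# Example with made-up, equal coefficients of 0.25.
example_complexity = sentence_complexity(20.0, 10.0, 40.0, 12.0, 0.25, 0.25, 0.25, 0.25)
```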

7. The method according to claim 6, characterized in that the process of calculating the average length of the whole sentences and of the first-type clauses comprises:

dividing the total number of words by the number of whole sentences to obtain the average length MLS of the whole sentences;

dividing the total number of words by the number of first-type clauses to obtain the average length MLC of the first-type clauses.
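Both averages are simple quotients of the total word count. The Python sketch below is illustrative only; the function name `mean_lengths` and its parameter names are assumptions, and it does not address how whole sentences and first-type clauses are segmented.

```python
def mean_lengths(word, count_sentence, count_clause):
    """Return (MLS, MLC) computed from document-level counts.

    word: total number of words in the document.
    count_sentence: number of whole sentences.
    count_clause: number of first-type clauses.
    """
    if count_sentence <= 0 or count_clause <= 0:
        raise ValueError("sentence and clause counts must be positive")
    mls = word / count_sentence  # average length of the whole sentences
    mlc = word / count_clause    # average length of the first-type clauses
    return mls, mlc


# Example: 120 words, 6 whole sentences, 15 first-type clauses.
example_mls, example_mlc = mean_lengths(120, 6, 15)
```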

8. The method according to claim 6, characterized in that the process of calculating the average length of the long sentences and of the second-type clauses comprises:

calculating the average length of the long sentences according to a formula for the average length of long sentences, the formula being as follows:

MLL = \frac{1}{count\_long} \cdot \sum_{i=1}^{count\_long} word\_long_{i};

wherein count_long is the number of long sentences;

calculating the average length of the second-type clauses according to a formula for the average length of second-type clauses, the formula being as follows:

MLCL = \frac{1}{count\_clause\_long} \cdot \sum_{i=1}^{count\_long} word\_long_{i};
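The MLL and MLCL formulas share the same numerator, the summed length of the long sentences, and differ only in the divisor. The following Python sketch is illustrative only. It assumes that word_long_i denotes the word count of the i-th long sentence and that count_clause_long is the number of second-type clauses in the long sentences; neither symbol is defined in the text above. The function name `long_sentence_means` is likewise hypothetical.

```python
def long_sentence_means(long_sentence_lengths, count_clause_long):
    """Return (MLL, MLCL) from the word counts of the long sentences.

    long_sentence_lengths: word count of each long sentence (assumed meaning
        of word_long_i).
    count_clause_long: number of second-type clauses in the long sentences
        (assumed meaning).
    """
    count_long = len(long_sentence_lengths)
    if count_long == 0 or count_clause_long <= 0:
        raise ValueError("need at least one long sentence and one clause")
    total_words = sum(long_sentence_lengths)
    mll = total_words / count_long          # average length of the long sentences
    mlcl = total_words / count_clause_long  # average length of the second-type clauses
    return mll, mlcl


# Example: three long sentences of 30, 42 and 36 words containing 9 clauses.
example_mll, example_mlcl = long_sentence_means([30, 42, 36], 9)
```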

9. The method according to claim 1, characterized in that the process of calculating the translation difficulty value of the document comprises:

calculating the translation difficulty value of the document according to a translation difficulty formula, the translation difficulty formula being as follows: diff_doc = K₁·diff_word + K₂·diff_sentence;