CN104503960B - A kind of text data processing method for English Translation - Google Patents
- Publication number: CN104503960B (application number CN201510006001.1A)
- Authority: CN (China)
- Prior art keywords: translation, text, text message, database
- Legal status: Expired - Fee Related (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Landscapes
- Machine Translation (AREA)
Abstract
The invention provides a text data processing method for English translation. A user logs in to a client with an account, accesses a common platform, and uploads text to be translated. A document processing system splits the text into sentences, and a search system looks up translation results in a database. Sentences for which no result is found are published by the platform's task distribution system, where other translator users can claim and translate them. The results of human translation are stored in the database, which thereby grows. Users can also score translation results; the scores are stored in the database together with the translations, and low-scoring translations are replaced by high-scoring ones, optimizing the database. The invention combines machine translation with human translation to exploit the strengths of each, and as the database keeps growing and improving, the amount of human translation required gradually decreases, yielding a fast, accurate, and economical translation method.
Description
Technical field
The invention belongs to the field of English translation, and in particular relates to a text data processing method for English translation.
Background art
As international exchange deepens, the demand for translating English documents keeps growing, which has prompted the appearance of a large number of English translation tools. These tools are generally divided into online and local versions; both kinds look up translations in a database. Such tools largely meet users' translation needs, improve translation efficiency, and contribute to social progress.
However, because English grammar and rules are numerous, the database of a translation tool does not necessarily contain an exact match for the sentence to be translated. In practice such tools mostly translate the sentence word by word, so tense and word order are often wrong and the output is stiff, falling short of the commonly cited standard of fidelity, fluency, and elegance. Users with a good command of English must still proofread sentence by sentence, put the word order right, adjust tenses, and reorganize the language according to their own knowledge of grammar, while users with a weak command of English are left helpless.
Hiring a professional translator is another way to translate information. At present, however, professional translation requires paying an expensive fee and waiting a long time, translators' skill levels vary, and the translator's subjective judgment can also affect the result. There is therefore an urgent need for an English translation method that is economical, fast, and accurate.
Summary of the invention
To solve the above problems, the invention provides a text data processing method for English translation that improves both the accuracy and the efficiency of translation.
The technical solution adopted by the invention to solve its technical problem is a text data processing method for English translation, comprising the following steps:
Step 1: perform text recognition on the file uploaded by a first user to obtain a first text message;
Step 2: split the first text message by recognizing its punctuation and using full stops as split positions, obtaining a second text message in units of sentences;
Step 3: search the database using the second text message to determine whether a corresponding or similar target sentence exists; if so, output the target sentence as a third text message, otherwise go to Step 4;
Step 4: the system classifies each sentence of the second text message by translation difficulty; a second user selects sentences in a field they are good at and of suitable difficulty and translates the second text message manually; the translated text is output as a fourth text message, and the fourth text message is stored in the database as a target sentence.
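Steps 2 to 4 can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the patented system: Step 1 (text recognition) is assumed to have already produced `first_text`, the database is a hypothetical in-memory dictionary mapping a source sentence to a target sentence, the lookup is exact-match only (the "similar sentence" search is not modeled here), and the names `split_sentences`, `process`, and `store_manual_translation` are illustrative.

```python
import re
from dataclasses import dataclass, field

# Hypothetical stand-in for the translation database: source sentence -> target sentence.
TranslationDB = dict[str, str]


@dataclass
class PipelineResult:
    third: dict[str, str] = field(default_factory=dict)  # Step 3: sentences found in the database
    pending: list[str] = field(default_factory=list)     # Step 4: sentences left for a second user


def split_sentences(first_text: str) -> list[str]:
    """Step 2: split after each full stop that is followed by whitespace."""
    return [s for s in re.split(r"(?<=\.)\s+", first_text.strip()) if s]


def process(first_text: str, db: TranslationDB) -> PipelineResult:
    result = PipelineResult()
    for sentence in split_sentences(first_text):
        if sentence in db:              # Step 3 (exact match only in this sketch)
            result.third[sentence] = db[sentence]
        else:                           # Step 4: queue for manual translation
            result.pending.append(sentence)
    return result


def store_manual_translation(db: TranslationDB, sentence: str, fourth_text: str) -> None:
    """Step 4: the second user's translation is stored as a new target sentence."""
    db[sentence] = fourth_text
```

For example, with `db = {"Hello world.": "你好，世界。"}`, `process("Hello world. How are you.", db)` returns the first sentence in `third` and leaves `"How are you."` in `pending`. A plain full-stop split also breaks after abbreviations such as "e.g.", which a real splitter would need to handle.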
Preferably, in Step 4 the second user may modify the third text message and output the result as a fifth text message, which is stored in the database as a target sentence;
Preferably, the first user assigns a confidence score, sentence by sentence, to the target sentences in the third, fourth, and fifth text messages, and the confidence score information is stored in the database together with the target sentences;
Preferably, a certain number of points is transferred from the first user's account to the second user's account as a reward for the second user translating the second text message or modifying the third text message, and the first user adjusts the number of reward points according to the difficulty of the sentence.
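Preferably, the first user's confidence score for the third text message is combined by weighting with that text's existing confidence score, and the result is stored in the database as the new confidence score information. The weighted calculation uses the following formula:

$$X_i = \frac{X_{i-1}\,(i-1) + a_i}{i},\qquad a_i \in \left[X_{i-1} - \sigma_i A,\; X_{i-1} + \sigma_i A\right],\qquad \sigma_i = \left(\frac{c}{e}\right)^{i}$$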
where $i$ is the number of ratings, $X_i$ is the confidence score after $i$ ratings, $a_i$ is the first user's rating, $A$ is the full-marks (maximum) confidence score, $\sigma_i$ is the bias degree, $c$ is a constant, and $e$ is Euler's number.
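Multiplying the update rule $X_i = \frac{X_{i-1}(i-1) + a_i}{i}$ by $i$ shows that the weighting is a running mean of all ratings; the starting score $X_0$ only constrains the allowed range of the first rating, since its weight $(i-1)$ is zero at $i = 1$. The worked numbers below assume, for illustration only, $A = 10$, $c = 1$ (so $\sigma_i = e^{-i}$), and a first rating $a_1 = 8$.

$$iX_i = (i-1)X_{i-1} + a_i \;\Longrightarrow\; S_i = S_{i-1} + a_i \ \text{ with } S_i := iX_i,\ S_0 = 0 \;\Longrightarrow\; X_i = \frac{1}{i}\sum_{k=1}^{i} a_k$$

$$X_1 = a_1 = 8$$

$$\sigma_2 = e^{-2} \approx 0.1353,\quad a_2 \in [8 - 1.353,\ 8 + 1.353] = [6.647,\ 9.353];\quad a_2 = 9 \;\Rightarrow\; X_2 = \frac{8 \cdot 1 + 9}{2} = 8.5$$

$$\sigma_3 = e^{-3} \approx 0.0498,\quad a_3 \in [8.5 - 0.498,\ 8.5 + 0.498] = [8.002,\ 8.998];\quad a_3 = 8.9 \;\Rightarrow\; X_3 = \frac{8.5 \cdot 2 + 8.9}{3} \approx 8.633$$

The result agrees with the running mean $(8 + 9 + 8.9)/3 \approx 8.633$, and the permitted band around the current score narrows from $\pm 1.353$ to $\pm 0.498$ between the second and third ratings.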
Preferably, when the database stores multiple target translations of the same sentence, they are ranked by confidence score information, and target translations with high confidence scores replace those with low confidence scores.
The beneficial effects of the invention are as follows. The method splits the file uploaded by the first user into sentences, which reduces the difficulty of the subsequent search and translation. When a corresponding target text is found in the database it can be output directly, producing the translation quickly and efficiently and saving translation cost and time. When no corresponding target text is found, the second user translates manually, ensuring that the translation is accurate and grammatically correct. The second user's translations are stored in the database as target sentences, so the database keeps growing, which raises the match rate of later searches and reduces the amount of manual translation required of second users. The first user assigns confidence scores to the output target text, and the confidence score information is stored in the database together with the target text; multiple target texts for the same sentence can then be ranked and replaced according to their confidence scores, which optimizes the database and improves translation accuracy. The invention fully exploits the respective strengths of machine translation and human translation, and as the database keeps growing and improving, the amount of human translation gradually decreases, yielding a fast, accurate, and economical translation method.
Brief description of the drawings
Fig. 1 is a flow chart of the text data processing method for English translation according to the invention.
Embodiment
The technical solution of the invention is described in detail below with reference to the accompanying drawing and a specific embodiment.
As shown in Fig. 1, the invention provides a text data processing method for English translation that is divided into the following four steps:
1. The first user uploads the file to be translated through the client. The system recognizes the file to obtain text information that a computer can process: by recognizing the letters, spaces, punctuation, and other information in the file, it obtains the first text message;
2. The system splits the first text message obtained in step 1: it recognizes the punctuation in the first text message and, using full stops as split positions, divides it into individual sentences, which form the second text message;
3. The system stores a database containing a large number of target translations. The database is searched using the second text message for a corresponding or similar target sentence. For a target sentence with very high similarity, the words that differ can be substituted: the meaning of the differing word in the target sentence is replaced with the common meaning of the corresponding word in the second text message, and the target sentence is output as the third text message. If no target sentence is found in the database, the method proceeds to step 4;
4. The second user translates the second text message manually, optionally referring to the context of the first text message and to the third text message output by the database. The resulting target translation is output as the fourth text message, which is also stored in the database as a target sentence; this supplements the database and makes later database searches more productive.
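The near-match substitution described in step 3 can be sketched as follows. The patent does not specify a similarity measure, so this sketch assumes Python's standard `difflib.SequenceMatcher` ratio over word and punctuation tokens with a hypothetical threshold of 0.8, and a hypothetical `glossary` dictionary standing in for each word's common meaning; only one-word-for-one-word differences are substituted. Returning `None` corresponds to falling through to step 4.

```python
import difflib
import re
from typing import Optional


def tokenize(text: str) -> list[str]:
    # Words and punctuation marks become separate tokens.
    return re.findall(r"\w+|[^\w\s]", text)


def find_third_text(sentence: str, db: dict[str, str], glossary: dict[str, str],
                    threshold: float = 0.8) -> Optional[str]:
    """Return a stored (possibly adapted) translation, or None to fall through to step 4."""
    query = tokenize(sentence)
    best = None  # (similarity ratio, stored source tokens, stored target)
    for source, target in db.items():
        source_tokens = tokenize(source)
        ratio = difflib.SequenceMatcher(None, query, source_tokens).ratio()
        if ratio >= threshold and (best is None or ratio > best[0]):
            best = (ratio, source_tokens, target)
    if best is None:
        return None
    _, source_tokens, target = best
    # Where exactly one stored word differs from one query word, swap in the query word's meaning.
    opcodes = difflib.SequenceMatcher(None, query, source_tokens).get_opcodes()
    for tag, i1, i2, j1, j2 in opcodes:
        if tag == "replace" and i2 - i1 == 1 and j2 - j1 == 1:
            old_word, new_word = source_tokens[j1], query[i1]
            if old_word in glossary and new_word in glossary:
                target = target.replace(glossary[old_word], glossary[new_word])
    return target
```

With `db = {"I like to eat apples.": "我喜欢吃苹果。"}` and `glossary = {"apples": "苹果", "pears": "梨"}`, the query "I like to eat pears." scores a ratio of about 0.83 and is adapted to "我喜欢吃梨。". Replacing text inside the stored target with `str.replace` is deliberately crude and only keeps the sketch short.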
In addition, in step 4 the second user may, at their own discretion, modify the third text message retrieved from the database to make the translation more accurate. The modified result is output as the fifth text message, which is stored in the database as a target sentence, further supplementing the database.
After the third text message has been output by the database search, the fourth text message by the second user's translation, and the fifth text message by the second user's modification of the third, the first user assigns confidence scores, sentence by sentence, to the target sentences in the third, fourth, and fifth text messages. The first user objectively evaluates the quality of each target sentence against criteria such as translation accuracy, politeness of language, grammatical correctness, and truthfulness, and this confidence score information is stored in the database together with the target sentence.
The third text message output by the database search already carries original confidence score information. After the first user scores it again, the first user's score is combined by weighting with the original confidence score of the third text message to obtain its new confidence score information. The weighted calculation uses the following formula:

$$X_i = \frac{X_{i-1}\,(i-1) + a_i}{i},\qquad a_i \in \left[X_{i-1} - \sigma_i A,\; X_{i-1} + \sigma_i A\right],\qquad \sigma_i = \left(\frac{c}{e}\right)^{i}$$
where $i$ is the number of ratings, $X_i$ is the confidence score after $i$ ratings, $a_i$ is the first user's rating, $A$ is the full-marks confidence score, $\sigma_i$ is the bias degree, $c$ is a constant, and $e$ is Euler's number. In other words, the confidence score is obtained by weighting every user's rating, and each new rating must fall within a range around the previous confidence score. Initially the permitted range is wide; as the number of ratings grows, the bias degree becomes smaller and smaller (which, since $\sigma_i = (c/e)^i$, requires $0 < c < e$), so users may only score within a narrower band on either side of the previous weighted confidence score.
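A minimal Python sketch of the confidence-score update follows. It implements $X_i = \frac{X_{i-1}(i-1) + a_i}{i}$ with the allowed range $[X_{i-1} - \sigma_i A,\ X_{i-1} + \sigma_i A]$ and $\sigma_i = (c/e)^i$. Rejecting an out-of-range rating with `ValueError` is an assumption (the text only says users are allowed to score within the range), and the function names are hypothetical.

```python
import math


def sigma(i: int, c: float) -> float:
    """Bias degree sigma_i = (c / e) ** i; it shrinks as i grows only when 0 < c < e."""
    return (c / math.e) ** i


def allowed_range(prev: float, i: int, full_marks: float, c: float) -> tuple[float, float]:
    """Interval [X_{i-1} - sigma_i * A, X_{i-1} + sigma_i * A] for the i-th rating."""
    half_width = sigma(i, c) * full_marks
    return prev - half_width, prev + half_width


def update_confidence(prev: float, i: int, rating: float, full_marks: float, c: float) -> float:
    """Return X_i = (X_{i-1} * (i - 1) + a_i) / i for the i-th rating a_i."""
    low, high = allowed_range(prev, i, full_marks, c)
    if not low <= rating <= high:
        # Assumed policy: out-of-range ratings are refused rather than clipped.
        raise ValueError(f"rating {rating} outside allowed range [{low:.3f}, {high:.3f}]")
    return (prev * (i - 1) + rating) / i
```

With full marks `A = 10`, `c = 1`, and a score of 8.0 after one rating, `update_confidence(8.0, 2, 9.0, 10.0, 1.0)` returns 8.5; a third rating must then lie within roughly [8.002, 8.998].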
The new confidence score information is stored in the database together with the third text message, completing the update of its confidence score. After many first users have scored it and their scores have been weighted, the confidence score increasingly converges on an objective value.
In addition, the third text message output by the database search and the fifth text message obtained when the second user modifies it are different target translations of the same sentence. These translations can be stored in the database together with their confidence score information and ranked by confidence score. During a database search, the translation with the highest confidence score is output by default, and the first and second users can view the other translations as needed. When too many different translations of the same sentence are stored, the database deletes the lowest-scoring translation according to the confidence score information. In this way the database is continually updated and optimized: its content follows a survival-of-the-fittest process that keeps the database from becoming bloated while retaining high-quality information.
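The ranking and pruning behaviour can be sketched with a small in-memory store. The `Candidate` type and the cap `max_per_sentence` (defaulting to 3 here) are hypothetical, since the text only says the lowest-scoring translation is deleted when a sentence has too many.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    text: str          # one target translation of the sentence
    confidence: float  # its current confidence score


def add_candidate(store: dict[str, list[Candidate]], sentence: str,
                  candidate: Candidate, max_per_sentence: int = 3) -> None:
    """Add a translation, keep the list sorted best-first, and drop the lowest-scoring extras."""
    candidates = store.setdefault(sentence, [])
    candidates.append(candidate)
    candidates.sort(key=lambda c: c.confidence, reverse=True)
    del candidates[max_per_sentence:]


def default_output(store: dict[str, list[Candidate]], sentence: str) -> Optional[str]:
    """Return the highest-scoring translation, or None if the sentence is not stored."""
    candidates = store.get(sentence)
    return candidates[0].text if candidates else None
```

Because Python's sort is stable, candidates with equal scores keep their insertion order, so an older translation is not displaced by a newer one with the same score.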
In addition, each user's client is provided with a points account. The first user must set aside a certain number of points from their own account as payment for the second user's manual translation. Earning points motivates second users to translate, and to do the work conscientiously for the first user, and a second user can in turn spend the points earned as payment when uploading their own files for translation by other users. Users can add points to their accounts by topping up or similar means, and the system can also periodically award points to active users and major contributors to increase user loyalty. First users who score maliciously and second users who translate maliciously are penalized by having points deducted, and may even have their client accounts cancelled.
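The points flow can be modelled as a simple ledger. Everything here is illustrative: the class and method names are hypothetical, the reward is assumed to be a base amount multiplied by an integer difficulty level (the text only says the first user adjusts the reward to the sentence's difficulty), and penalties are assumed not to push a balance below zero.

```python
class PointsLedger:
    """Hypothetical per-user points accounts."""

    def __init__(self) -> None:
        self._balances: dict[str, int] = {}

    def balance(self, user: str) -> int:
        return self._balances.get(user, 0)

    def top_up(self, user: str, amount: int) -> None:
        # Points added by recharging, or awarded to active users by the system.
        self._balances[user] = self.balance(user) + amount

    def pay_for_sentence(self, requester: str, translator: str,
                         base_reward: int, difficulty: int) -> int:
        """Move a difficulty-scaled reward from the first user to the second user."""
        reward = base_reward * difficulty
        if self.balance(requester) < reward:
            raise ValueError(f"{requester} has too few points for a reward of {reward}")
        self._balances[requester] = self.balance(requester) - reward
        self._balances[translator] = self.balance(translator) + reward
        return reward

    def penalize(self, user: str, amount: int) -> None:
        # Deduction for malicious scoring or translation, floored at zero.
        self._balances[user] = max(0, self.balance(user) - amount)
```

For instance, after `top_up("alice", 100)`, paying `base_reward=5` at `difficulty=3` moves 15 points from `alice` to the translator.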
Through the above embodiment, the purpose of the invention is well achieved. The invention provides a healthy ecosystem in which users can both upload files for translation and translate files uploaded by others, benefiting both sides. Translations and their confidence score information continually accumulate in the database, and the database optimizes its own content according to the confidence scores, so that it grows in capacity and improves in quality, develops rapidly, and serves users conveniently. Once the database has grown sufficiently, almost every file uploaded by a first user can be found in the database with high accuracy, minimizing manual translation and making the translation process more convenient and faster.
Although the embodiments of the invention are disclosed above, they are not limited to the applications listed in the description and embodiments; the invention can be applied to any field suited to it, and further modifications can readily be made by those skilled in the art. Therefore, without departing from the general concept defined by the claims and their equivalents, the invention is not limited to the specific details or to the illustrations shown and described herein.
Claims (3)
1. A text data processing method for English translation, characterized by comprising the following steps:
Step 1: a first user logs in to a client, uploads a file to be translated, and adds information on the field the file relates to; the system performs text recognition on the file uploaded by the first user to obtain a first text message;
Step 2: split the first text message by recognizing its punctuation and using full stops as split positions, obtaining a second text message in units of sentences;
Step 3: search the database using the second text message to determine whether a corresponding or similar target sentence exists; if so, output the target sentence as a third text message, otherwise go to Step 4;
Step 4: the system classifies each sentence of the second text message by translation difficulty; a second user selects sentences in a field they are good at and of suitable difficulty and translates the second text message manually; the translated text is output as a fourth text message, and the fourth text message is stored in the database as a target sentence;
wherein the second user in Step 4 can modify the third text message and output it as a fifth text message, which is stored in the database as a target sentence; the first user assigns confidence scores, sentence by sentence, to the target sentences in the third, fourth, and fifth text messages, and the confidence score information is stored in the database together with the target sentences;
the first user's confidence score for the third text is combined by weighted calculation with that text's original confidence score, and the result is stored in the database as new confidence score information, the weighted calculation using the following formula:
$$X_i = \frac{X_{i-1}\cdot(i-1) + a_i}{i},\qquad a_i \in \left[X_{i-1} - \sigma_i A,\; X_{i-1} + \sigma_i A\right]$$

$$\sigma_i = \left(\frac{c}{e}\right)^{i}$$
where $i$ is the number of ratings, $X_i$ is the confidence score after $i$ ratings, $a_i$ is the current rating by the first user, $A$ is the full-marks confidence score, $\sigma_i$ is the bias degree, $c$ is a constant, and $e$ is Euler's number.
2. The text data processing method for English translation according to claim 1, characterized in that a certain number of points in the first user's account is transferred to the second user's account as a reward for the second user translating the second text or modifying the third text, the first user adjusting the number of reward points according to the difficulty of the sentence.
3. The text data processing method for English translation according to claim 2, characterized in that multiple target translations of the same sentence stored in the database are ranked according to confidence score information, and target translations with high confidence scores replace those with low confidence scores.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510006001.1A CN104503960B (en) | 2015-01-07 | 2015-01-07 | A kind of text data processing method for English Translation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510006001.1A CN104503960B (en) | 2015-01-07 | 2015-01-07 | A kind of text data processing method for English Translation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104503960A CN104503960A (en) | 2015-04-08 |
CN104503960B true CN104503960B (en) | 2017-09-19 |
Family ID: 52945358
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510006001.1A Expired - Fee Related CN104503960B (en) | 2015-01-07 | 2015-01-07 | A kind of text data processing method for English Translation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104503960B (en) |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106933813A (en) * | 2017-02-16 | 2017-07-07 | 牡丹江师范学院 | A kind of text data processing method for English Translation |
CN107193809A (en) * | 2017-05-18 | 2017-09-22 | 广东小天才科技有限公司 | A kind of teaching material scenario generation method and device, user equipment |
CN107402918A (en) * | 2017-06-19 | 2017-11-28 | 上海青橙实业有限公司 | The character string translation processing method and device of electric terminal |
CN108647731A (en) * | 2018-05-14 | 2018-10-12 | 宁波江丰生物信息技术有限公司 | Cervical carcinoma identification model training method based on Active Learning |
CN109858745A (en) * | 2018-12-26 | 2019-06-07 | 语联网(武汉)信息技术有限公司 | Transcription platform matching process and device |
CN109582983A (en) * | 2018-12-27 | 2019-04-05 | 王婧锦 | A kind of translation science commonly uses data processing method |
CN114341867B (en) * | 2019-10-15 | 2023-06-09 | 深圳市欢太科技有限公司 | Translation method, translation device, translation client, translation server and translation storage medium |
CN112328790A (en) * | 2020-11-06 | 2021-02-05 | 渤海大学 | Fast text classification method of corpus |
CN112446213B (en) * | 2020-11-26 | 2022-10-14 | 电子科技大学 | Text corpus expansion method |
CN112597779A (en) * | 2020-12-24 | 2021-04-02 | 语联网(武汉)信息技术有限公司 | Document translation method and device |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH0660117A (en) * | 1992-08-11 | 1994-03-04 | Toshiba Corp | Translated word selection device in machine translation system |
CN102662934A (en) * | 2012-04-01 | 2012-09-12 | 百度在线网络技术(北京)有限公司 | Method and device for proofing translated texts in inter-lingual communication |
CN102708097A (en) * | 2012-04-27 | 2012-10-03 | 曾立人 | Online computer translation method and online computer translation system |
CN103810159B (en) * | 2012-11-14 | 2017-03-01 | 阿里巴巴集团控股有限公司 | Machine translation data processing method, system and terminal |
CN104090870B (en) * | 2014-06-26 | 2018-04-20 | 语联网(武汉)信息技术有限公司 | A kind of method for pushing of translation on line engine |
- 2015-01-07: application CN201510006001.1A filed in CN (patent CN104503960B; not active, Expired - Fee Related)
Also Published As
Publication number | Publication date |
---|---|
CN104503960A (en) | 2015-04-08 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN104503960B (en) | A kind of text data processing method for English Translation | |
Zhang et al. | Attentive interactive neural networks for answer selection in community question answering | |
US20170185581A1 (en) | Systems and methods for suggesting emoji | |
CN104102630B (en) | A kind of method for normalizing for Chinese and English mixing text in Chinese social networks | |
CN104731774B (en) | Towards the personalized interpretation method and device of general machine translation engine | |
Zhang et al. | Building earth mover's distance on bilingual word embeddings for machine translation | |
Yagcioglu et al. | A distributed representation based query expansion approach for image captioning | |
CN102663129A (en) | Medical field deep question and answer method and medical retrieval system | |
KR20050005523A (en) | Word association method and apparatus | |
US20080120092A1 (en) | Phrase pair extraction for statistical machine translation | |
CN103646088A (en) | Product comment fine-grained emotional element extraction method based on CRFs and SVM | |
CN100535907C (en) | Method for extracting entity address message in text context | |
Kit et al. | Comparative evaluation of online machine translation systems with legal texts | |
Štajner et al. | Shared task on quality assessment for text simplification | |
CN108717459B (en) | A kind of mobile application defect positioning method of user oriented comment information | |
CN108984711B (en) | Personalized APP recommendation method based on hierarchical embedding | |
CN111785387A (en) | Method and system for disease standardized mapping classification by using Bert | |
Oravecz et al. | etranslation’s submissions to the wmt 2020 news translation task | |
CN110297897B (en) | Question-answer processing method and related product | |
CN110334362B (en) | Method for solving and generating untranslated words based on medical neural machine translation | |
CN108491399A (en) | Chinese to English machine translation method based on context iterative analysis | |
CN103488629A (en) | Method for extracting translation unit table in machine translation | |
CN106649289A (en) | Realization method and realization system for simultaneously identifying bilingual terms and word alignment | |
Gasperin et al. | Natural language processing for social inclusion: a text simplification architecture for different literacy levels | |
WO2021000400A1 (en) | Hospital guide similar problem pair generation method and system, and computer device |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20170919; termination date: 20190107 |