CN104503960B - A text data processing method for English translation - Google Patents

A text data processing method for English translation

Info

Publication number
CN104503960B
Authority
CN
China
Prior art keywords
translation
text
text message
database
Prior art date
Legal status
Expired - Fee Related
Application number
CN201510006001.1A
Other languages
Chinese (zh)
Other versions
CN104503960A (en)
Inventor
姜华
程迎新
单畅
丛岩
李飞
李一飞
胡帅
项睿
李峰华
Current Assignee
Bohai University
Original Assignee
Bohai University
Priority date
2015-01-07
Filing date
2015-01-07
Publication date
2017-09-19
Application filed by Bohai University
Priority to CN201510006001.1A
Publication of CN104503960A
Application granted
Publication of CN104503960B
Expired - Fee Related (current legal status)
Anticipated expiration


Landscapes

  • Machine Translation (AREA)

Abstract

The invention provides a text data processing method for English translation. A user logs in to a client with an account, accesses a common platform, and uploads the text to be translated. A document processing system splits the text into sentences, and a search system looks up a translation for each sentence in a database. Sentences for which no result is found are distributed by the platform's task delivery system, where other translating users can claim and translate them. The results of human translation are stored in the database, so the database keeps expanding. Users can also score translation results, and the scores are stored in the database as well; low-scoring translations are replaced by high-scoring ones, which optimizes the database. The invention combines machine translation with human translation and exploits the respective strengths of each. As the database is continuously expanded and optimized, the amount of human translation needed gradually decreases, yielding a fast, accurate, and economical translation method.

Description

A text data processing method for English translation
Technical field
The invention belongs to the field of English translation, and in particular relates to a text data processing method for English translation.
Background art
As international exchange continues to deepen, demand for the translation of English documents keeps growing, and this has spurred the appearance of a large number of English translation tools. These tools are generally divided into online and local (stand-alone) versions. Both kinds translate by searching for translations in a database. Their appearance has largely met users' translation needs, improved translation efficiency, and contributed to social progress.
However, because English grammar and usage rules are numerous, the database of a translation tool does not necessarily contain an exact match for every sentence to be translated. Such tools essentially translate the sentence word for word, so tense and word order are often wrong and the result reads stiffly, falling short of the commonly cited standards of fidelity, fluency, and elegance. At present, users with a good foundation in English still have to proofread sentence by sentence, correct the word order, adjust tenses, and reorganize the wording according to their own knowledge of grammar, while people with a weak foundation in English are simply helpless.
Engaging professional translators for manual translation is another way to translate information. However, hiring professional translators currently means paying expensive fees and waiting a long time; translators' skill levels vary, and their subjective judgment can also affect the result. An English translation method that is economical, fast, and able to guarantee accuracy is therefore urgently needed.
Summary of the invention
To solve the above problems, the invention provides a text data processing method for English translation that improves both translation accuracy and translation efficiency.
The technical solution adopted by the invention to solve its technical problem is a text data processing method for English translation comprising the following steps:
Step 1: perform text recognition on the file uploaded by a first user to obtain first text information;
Step 2: split the first text information by recognizing its punctuation, using full stops as split positions, to obtain second text information in units of sentences;
Step 3: search a database using the second text information to determine whether a corresponding or similar target translation sentence exists; if so, output the target sentence as third text information; otherwise, go to Step 4;
Step 4: the system grades each sentence of the second text information by translation difficulty; a second user selects a field in which he or she is skilled and a suitable difficulty, and manually translates the second text information; the translated text is output as fourth text information, and the fourth text information is stored in the database as a target sentence.
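Steps 2 to 4 amount to a split, look up, and fall back to a human queue loop. The Python sketch below shows that control flow under stated assumptions: the database is a plain in-memory dict keyed by the exact English sentence, only exact matches are handled (the similar-sentence matching of Step 3 is left out), and the word-count difficulty grade is a hypothetical stand-in for the unspecified grading in Step 4. The names `split_sentences`, `grade_difficulty`, `process`, and `store_human_translation` are illustrative, not taken from the patent.

```python
import re
from dataclasses import dataclass, field


def split_sentences(text: str) -> list[str]:
    """Step 2: cut the text at full stops, keeping each stop with its sentence."""
    # A run of non-'.' characters plus an optional full stop; a trailing
    # fragment with no full stop still becomes its own sentence.
    return [s.strip() for s in re.findall(r"[^.]+\.?", text) if s.strip()]


def grade_difficulty(sentence: str) -> str:
    """Hypothetical difficulty grade for Step 4, based only on word count."""
    words = len(sentence.split())
    if words <= 8:
        return "easy"
    if words <= 20:
        return "medium"
    return "hard"


@dataclass
class ProcessResult:
    # Third text information: sentence index -> target sentence from the database
    found: dict[int, str] = field(default_factory=dict)
    # Sentences with no database hit, queued for second users: (index, sentence, grade)
    pending: list[tuple[int, str, str]] = field(default_factory=list)


def process(text: str, db: dict[str, str]) -> ProcessResult:
    """Steps 2-3: split into sentences and look each one up in the database."""
    result = ProcessResult()
    for idx, sentence in enumerate(split_sentences(text)):
        target = db.get(sentence)
        if target is not None:
            result.found[idx] = target
        else:
            result.pending.append((idx, sentence, grade_difficulty(sentence)))
    return result


def store_human_translation(db: dict[str, str], sentence: str, translation: str) -> None:
    """Step 4: save the fourth text information so later searches can hit it."""
    db[sentence] = translation


db = {"I like apples.": "我喜欢苹果。"}
first = process("I like apples. The weather is nice today.", db)
print(first.found)    # {0: '我喜欢苹果。'}
print(first.pending)  # [(1, 'The weather is nice today.', 'easy')]
store_human_translation(db, "The weather is nice today.", "今天天气很好。")
print(process("The weather is nice today.", db).found)  # {0: '今天天气很好。'}
```

Each human translation stored this way turns a future miss into a hit, which is the mechanism the patent relies on to reduce manual work over time.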
Preferably, in Step 4 the second user can modify the third text information and output it as fifth text information, and the fifth text information is stored in the database as a target sentence;
Preferably, the first user assigns a confidence score, sentence by sentence, to the target sentences in the third, fourth, and fifth text information, and the confidence score information is stored in the database together with the target sentences;
Preferably, a certain number of points is transferred from the first user's account to the second user's account as the reward for the second user's translation of the second text and the third text, and the first user adjusts the number of reward points according to the difficulty of the sentence.
Preferably, the first user's confidence score for the third text is weighted together with the text's original confidence score, and the result is stored in the database as new confidence score information. The weighted calculation follows the formula below:
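$$X_i = \frac{X_{i-1}\cdot(i-1) + a_i}{i}, \qquad a_i \in \left[X_{i-1} - \sigma_i A,\; X_{i-1} + \sigma_i A\right]$$
$$\sigma_i = \left(\frac{c}{e}\right)^{i}$$
where i is the number of scorings, X_i is the confidence score after i scorings, a_i is the first user's score, A is the full-mark (maximum) confidence score, σ_i is the bias degree, c is a constant, and e is Euler's number.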
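Unrolling the update rule X_i = (X_{i-1}·(i-1) + a_i)/i clarifies what the stored score represents. The derivation below is a reading of the formula, not an additional step of the patent; the numerical values (A = 100, c = e/2, X_0 = 60) are illustrative assumptions.

$$i X_i = (i-1)X_{i-1} + a_i \;\Longrightarrow\; S_i := iX_i = S_{i-1} + a_i,\quad S_0 = 0\cdot X_0 = 0 \;\Longrightarrow\; X_i = \frac{1}{i}\sum_{k=1}^{i} a_k$$

So X_i is the arithmetic mean of the i ratings received, and the original score X_0 enters only through the admissible interval for a_1. The half-width σ_i A = (c/e)^i A shrinks geometrically with each rating when 0 < c < e, which matches the description's statement that the bias degree becomes smaller and smaller. With A = 100, c = e/2 (so σ_i = 0.5^i) and X_0 = 60: a_1 ∈ [10, 110], and a rating a_1 = 80 gives X_1 = 80; then a_2 ∈ [55, 105], and a_2 = 70 gives X_2 = (80·1 + 70)/2 = 75.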
Preferably, multiple target translations of the same sentence stored in the database are ranked by their confidence score information, and target translations with high confidence scores replace those with low confidence scores.
The beneficial effects of the invention are as follows. The method splits the file uploaded by the first user into sentences, which reduces the difficulty of the subsequent search and translation; a target text found in the database can be output directly as the translation result, so the text is translated quickly and efficiently and translation cost and time are saved. Sentences for which no target text is found are translated manually by a second user, ensuring translation accuracy and grammatical correctness. The second user's translations are stored in the database as target sentences, so the database keeps expanding, which raises the match rate of subsequent searches and reduces the amount of manual translation required of second users. The first user assigns confidence scores to the output target texts, and the score information is stored in the database together with the target texts; multiple target texts for the same sentence can be ranked and replaced according to their confidence scores, which optimizes the database and improves translation accuracy. The invention takes full advantage of the respective strengths of machine translation and human translation, and as the database is continuously expanded and optimized, the amount of human translation can be gradually reduced, achieving a fast, accurate, and economical translation method.
Brief description of the drawings
Fig. 1 is a flow chart of the text data processing method for English translation according to the invention.
Embodiment
The technical solution of the invention is described in detail below with reference to the accompanying drawing and a specific embodiment.
As shown in Fig. 1, the text data processing method for English translation provided by the invention comprises the following four steps:
First, the first user uploads the file to be translated through a client. The system recognizes the file to obtain text information that a computer can process: by recognizing information such as letters, spaces, and punctuation in the file, it obtains the first text information;
Second, the system splits the first text information obtained in the first step: it recognizes the punctuation in the first text information and, using full stops as split positions, divides it into individual sentences, which form the second text information;
Third, the system stores a database containing a large number of target translation texts. The database is searched using the second text information for corresponding or similar target translation sentences. For a target sentence with very high similarity, the differing words can be substituted: the meaning of each differing word in the target sentence is replaced with the common meaning of the corresponding word in the second text information, and the resulting target sentence is output as the third text information. If no target sentence is found in the database, the method proceeds to the fourth step;
Fourth, the second user manually translates the second text information, and may refer to the context of the first text and to any third text information output by the database. The translated text is output as the fourth text information and stored in the database as a target sentence, supplementing the database and making future database searches more effective.
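The "similar sentence plus word substitution" of the third step can be sketched as below. The patent does not say how similarity is measured or how a differing word is located in the stored translation, so this sketch makes two assumptions: similarity is the token-level `difflib.SequenceMatcher.ratio()` from Python's standard library with a hypothetical threshold of 0.7, and a small bilingual lexicon (hypothetical data) supplies the "common meaning" of each word. Only one-for-one word swaps are adapted; any insertion or deletion sends the sentence on to human translation.

```python
import difflib
import re


def tokenize(sentence: str) -> list[str]:
    """Split into words and individual punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def find_similar(query, db, threshold=0.7):
    """Return (source, target, ratio) for the most similar stored sentence, or None."""
    q = tokenize(query)
    best = None
    for source, target in db.items():
        ratio = difflib.SequenceMatcher(None, tokenize(source), q).ratio()
        if ratio >= threshold and (best is None or ratio > best[2]):
            best = (source, target, ratio)
    return best


def adapt(query, source, target, lexicon):
    """Replace the meaning of each differing word in the stored target sentence."""
    s, q = tokenize(source), tokenize(query)
    for tag, i1, i2, j1, j2 in difflib.SequenceMatcher(None, s, q).get_opcodes():
        if tag == "equal":
            continue
        if tag != "replace" or (i2 - i1) != (j2 - j1):
            return None  # words added or removed: not a simple substitution
        for old, new in zip(s[i1:i2], q[j1:j2]):
            old_meaning = lexicon.get(old.lower())
            new_meaning = lexicon.get(new.lower())
            if old_meaning is None or new_meaning is None or old_meaning not in target:
                return None  # cannot locate the word's meaning in the translation
            target = target.replace(old_meaning, new_meaning)
    return target


db = {"I like apples.": "我喜欢苹果。"}
lexicon = {"apples": "苹果", "bananas": "香蕉"}  # hypothetical bilingual lexicon
match = find_similar("I like bananas.", db)
print(match)  # ('I like apples.', '我喜欢苹果。', 0.75)
print(adapt("I like bananas.", match[0], match[1], lexicon))  # 我喜欢香蕉。
```

`str.replace` swaps every occurrence, so a word whose meaning appears twice in the translation is changed in both places; a real system would need word alignment between the stored source and target sentences instead.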
In addition, in the fourth step the second user can, at his or her discretion, modify the third text information found and output by the database search to make the translation more accurate. This translation result is output as the fifth text information, which is stored in the database as a target sentence, supplementing the database.
After the third text information has been output by the database search, the fourth text information has been output from the second user's translation, and the fifth text information has been obtained from the second user's modification of the third text information, the first user assigns confidence scores, sentence by sentence, to the target sentences in the third, fourth, and fifth text information. The first user objectively evaluates the translation quality of each target sentence according to criteria such as accuracy, politeness of language, grammatical correctness, and faithfulness, and this confidence score information is stored in the database together with the target sentence.
The third text information output by the database search already carries original confidence score information. After the first user scores it again, the first user's confidence score is weighted together with the original confidence score of the third text information to obtain new confidence score information for it. The weighted calculation follows the formula below:
$$X_i = \frac{X_{i-1}\cdot(i-1) + a_i}{i}, \qquad a_i \in \left[X_{i-1} - \sigma_i A,\; X_{i-1} + \sigma_i A\right]$$
$$\sigma_i = \left(\frac{c}{e}\right)^{i}$$
where i is the number of scorings, X_i is the confidence score after i scorings, a_i is the first user's score, A is the full-mark (maximum) confidence score, σ_i is the bias degree, c is a constant, and e is Euler's number. That is, the confidence score is obtained by weighting each user's score, and each new score must fall within a certain range. Initially, users may score within a wide range; as the number of scorings increases, the bias degree becomes smaller and smaller, so users may only score within an increasingly narrow range on either side of the previous weighted confidence score.
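The weighted update of claim 1 (X_i = (X_{i-1}·(i-1) + a_i)/i, with a_i restricted to X_{i-1} ± σ_i·A and σ_i = (c/e)^i) can be sketched as a small Python class. Assumptions not stated in the patent: a 0 to 100 scale (A = 100), a default constant c = 1.0 (any 0 < c < e makes σ_i shrink), clipping the admissible range to [0, A], and rejecting out-of-range ratings with an exception rather than clamping them.

```python
import math
from dataclasses import dataclass


@dataclass
class ConfidenceScore:
    value: float               # X_{i-1}: current weighted confidence score
    count: int = 0             # i - 1: number of ratings applied so far
    full_marks: float = 100.0  # A: maximum score (assumed scale)
    c: float = 1.0             # constant c; 0 < c < e makes sigma_i shrink

    def sigma(self, i: int) -> float:
        """Bias degree sigma_i = (c / e) ** i."""
        return (self.c / math.e) ** i

    def allowed_range(self) -> tuple[float, float]:
        """Interval [X_{i-1} - sigma_i*A, X_{i-1} + sigma_i*A] for the next rating a_i."""
        half_width = self.sigma(self.count + 1) * self.full_marks
        # Clipping to [0, A] is an assumption, not part of the patent's formula.
        return (max(0.0, self.value - half_width),
                min(self.full_marks, self.value + half_width))

    def add_rating(self, a: float) -> float:
        """Apply X_i = (X_{i-1} * (i - 1) + a_i) / i and return the new X_i."""
        low, high = self.allowed_range()
        if not low <= a <= high:
            raise ValueError(f"rating {a} outside [{low:.2f}, {high:.2f}]")
        i = self.count + 1
        self.value = (self.value * (i - 1) + a) / i
        self.count = i
        return self.value


score = ConfidenceScore(value=60.0, c=math.e / 2)  # sigma_i = 0.5 ** i
print(score.allowed_range())  # (10.0, 100.0)
print(score.add_rating(80))   # 80.0
print(score.allowed_range())  # (55.0, 100.0)
print(score.add_rating(70))   # 75.0
```

Rejecting out-of-range ratings makes the narrowing window visible to the caller; silently clamping them would be an equally plausible reading of the patent.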
The new confidence score information is stored in the database together with the third text information, completing the update of the third text information's confidence score. After repeated scoring by many first users and weighting, the confidence score increasingly tends toward an objective reflection of quality.
In addition, the third text information output by the database search and the fifth text information obtained by the second user's modification of it are different target translations of the same sentence. These different target translations can be stored in the database together with their confidence score information and ranked by confidence score. When a database search is performed, the target translation with the highest confidence score is output by default, and the first and second users can view the other target translations as needed. When too many different target translations of the same sentence are stored, the database deletes the lowest-scoring one according to the confidence score information. This updates and optimizes the database through survival of the fittest, keeping the database from becoming bloated while preserving high-quality information.
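Ranking and pruning the stored translations of one sentence can be sketched as below. The cap of three translations per sentence (`MAX_VARIANTS`) is a hypothetical value; the patent says only that the lowest-scoring translation is deleted when there are too many.

```python
from dataclasses import dataclass, field
from typing import Optional

MAX_VARIANTS = 3  # hypothetical cap on stored translations per sentence


@dataclass
class Variant:
    text: str
    score: float  # confidence score X_i of this translation


@dataclass
class SentenceEntry:
    # Kept sorted from highest to lowest confidence score
    variants: list[Variant] = field(default_factory=list)

    def add(self, text: str, score: float) -> None:
        self.variants.append(Variant(text, score))
        # sort() is stable even with reverse=True, so ties keep insertion order
        self.variants.sort(key=lambda v: v.score, reverse=True)
        while len(self.variants) > MAX_VARIANTS:
            self.variants.pop()  # delete the lowest-scoring translation

    def best(self) -> Optional[str]:
        """Translation output by default for a database search."""
        return self.variants[0].text if self.variants else None

    def alternatives(self) -> list[str]:
        """Other stored translations that users can view on request."""
        return [v.text for v in self.variants[1:]]


entry = SentenceEntry()
for text, score in [("translation A", 70), ("translation B", 90),
                    ("translation C", 50), ("translation D", 80)]:
    entry.add(text, score)
print(entry.best())          # translation B
print(entry.alternatives())  # ['translation D', 'translation A']
```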
In addition, each user's client is provided with a points account. The first user must take a certain number of points from his or her own points account as the reward for the second user's manual translation. The reward points motivate the second user to translate conscientiously for the first user, and the second user can use the points earned to pay for the translation of files he or she uploads, with another user then acting as the second user. Users can add points to their accounts by recharging and similar means, and the system can also periodically reward active users and large contributors with points to increase user loyalty. For a first user who assigns confidence scores maliciously, or a second user who translates maliciously, the system can impose a penalty by deducting points, or even cancel that user's client information.
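The points mechanism can be sketched as a small ledger. The per-difficulty point schedule, the option for the first user to override it, and flooring penalized balances at zero are hypothetical choices; the patent specifies only that points move from the first user to the second user, that the amount depends on difficulty, and that malicious users may have points deducted.

```python
from dataclasses import dataclass, field
from typing import Optional

SUGGESTED_REWARD = {"easy": 5, "medium": 10, "hard": 20}  # hypothetical schedule


class InsufficientPoints(Exception):
    """Raised when a first user cannot cover the reward."""


@dataclass
class PointsLedger:
    balances: dict[str, int] = field(default_factory=dict)

    def top_up(self, user: str, points: int) -> None:
        """Recharge, or a system bonus for active users."""
        if points <= 0:
            raise ValueError("points must be positive")
        self.balances[user] = self.balances.get(user, 0) + points

    def pay_reward(self, first_user: str, second_user: str,
                   difficulty: str, points: Optional[int] = None) -> int:
        """Move the translation reward from the first user to the second user."""
        amount = SUGGESTED_REWARD[difficulty] if points is None else points
        if amount <= 0:
            raise ValueError("reward must be positive")
        if self.balances.get(first_user, 0) < amount:
            raise InsufficientPoints(first_user)
        self.balances[first_user] -= amount
        self.balances[second_user] = self.balances.get(second_user, 0) + amount
        return amount

    def penalize(self, user: str, points: int) -> None:
        """Deduct points for malicious scoring or translation (floored at zero)."""
        self.balances[user] = max(0, self.balances.get(user, 0) - points)


ledger = PointsLedger()
ledger.top_up("alice", 30)
print(ledger.pay_reward("alice", "bob", "medium"))  # 10
print(ledger.balances)  # {'alice': 20, 'bob': 10}
```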
The above embodiment achieves the object of the invention. The invention provides a sound ecosystem: a user can upload files for translation and can also translate files uploaded by other users, so both sides benefit. Translations and confidence score information are continuously added to the database, and the database can also optimize its own content according to the confidence scores, so the information in the database grows in volume and improves in quality, allowing the system to develop rapidly and serve users conveniently. Once the database has expanded sufficiently, nearly every file uploaded by a first user can be found in the database with high accuracy, minimizing human translation and making the translation process more convenient and faster.
Although embodiments of the invention have been disclosed above, the invention is not limited to the applications listed in the specification and embodiments; it can be applied to any field suitable for it, and further modifications can readily be made by those skilled in the art. Therefore, without departing from the general concept defined by the claims and their equivalents, the invention is not limited to the specific details or to the illustrations shown and described herein.

Claims (3)

1. A text data processing method for English translation, characterized by comprising the following steps:
Step 1: a first user logs in to a client, uploads a file to be translated, and adds information on the field to which the file relates; the system performs text recognition on the file uploaded by the first user to obtain first text information;
Step 2: split the first text information by recognizing its punctuation, using full stops as split positions, to obtain second text information in units of sentences;
Step 3: search a database using the second text information to determine whether a corresponding or similar target translation sentence exists; if so, output the target sentence as third text information; otherwise, go to Step 4;
Step 4: the system grades each sentence of the second text information by translation difficulty; a second user selects a field in which he or she is skilled and a suitable difficulty, and manually translates the second text information; the translated text is output as fourth text information, and the fourth text information is stored in the database as a target sentence;
wherein the second user in Step 4 can modify the third text information and output it as fifth text information, and the fifth text information is stored in the database as a target sentence; the first user assigns confidence scores, sentence by sentence, to the target sentences in the third, fourth, and fifth text information, and the confidence score information is stored in the database together with the target sentences;
the first user's confidence score for the third text is weighted together with the text's original confidence score, and the result is stored in the database as new confidence score information, the weighted calculation following the formula:
$$X_i = \frac{X_{i-1}\cdot(i-1) + a_i}{i}, \qquad a_i \in \left[X_{i-1} - \sigma_i A,\; X_{i-1} + \sigma_i A\right]$$
$$\sigma_i = \left(\frac{c}{e}\right)^{i}$$
where i is the number of scorings, X_i is the confidence score after i scorings, a_i is the first user's current score, A is the full-mark (maximum) confidence score, σ_i is the bias degree, c is a constant, and e is Euler's number.
2. The text data processing method for English translation according to claim 1, characterized in that a certain number of points in the first user's account is transferred to the second user's account as the reward for the second user's translation of the second text and the third text, and the first user adjusts the number of reward points according to the difficulty of the sentence.
3. The text data processing method for English translation according to claim 2, characterized in that multiple target translations of the same sentence stored in the database are ranked according to confidence score information, and target translations with high confidence scores replace target translations with low confidence scores.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510006001.1A CN104503960B (en) 2015-01-07 2015-01-07 A text data processing method for English translation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510006001.1A CN104503960B (en) 2015-01-07 2015-01-07 A text data processing method for English translation

Publications (2)

Publication Number Publication Date
CN104503960A CN104503960A (en) 2015-04-08
CN104503960B CN104503960B (en) 2017-09-19

Family

ID=52945358

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510006001.1A Expired - Fee Related CN104503960B (en) 2015-01-07 2015-01-07 A text data processing method for English translation

Country Status (1)

Country Link
CN (1) CN104503960B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106933813A (en) * 2017-02-16 2017-07-07 牡丹江师范学院 A text data processing method for English translation
CN107193809A (en) * 2017-05-18 2017-09-22 广东小天才科技有限公司 A teaching material scenario generation method and device, and user equipment
CN107402918A (en) * 2017-06-19 2017-11-28 上海青橙实业有限公司 Character string translation processing method and device for an electronic terminal
CN108647731A (en) * 2018-05-14 2018-10-12 宁波江丰生物信息技术有限公司 Cervical carcinoma identification model training method based on Active Learning
CN109858745A (en) * 2018-12-26 2019-06-07 语联网(武汉)信息技术有限公司 Transcription platform matching process and device
CN109582983A (en) * 2018-12-27 2019-04-05 王婧锦 A data processing method commonly used in translation studies
CN114341867B (en) * 2019-10-15 2023-06-09 深圳市欢太科技有限公司 Translation method, translation device, translation client, translation server and translation storage medium
CN112328790A (en) * 2020-11-06 2021-02-05 渤海大学 Fast text classification method of corpus
CN112446213B (en) * 2020-11-26 2022-10-14 电子科技大学 Text corpus expansion method
CN112597779A (en) * 2020-12-24 2021-04-02 语联网(武汉)信息技术有限公司 Document translation method and device

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
JPH0660117A (en) * 1992-08-11 1994-03-04 Toshiba Corp Translated word selection device in machine translation system
CN102662934A (en) * 2012-04-01 2012-09-12 百度在线网络技术(北京)有限公司 Method and device for proofing translated texts in inter-lingual communication
CN102708097A (en) * 2012-04-27 2012-10-03 曾立人 Online computer translation method and online computer translation system
CN103810159B (en) * 2012-11-14 2017-03-01 阿里巴巴集团控股有限公司 Machine translation data processing method, system and terminal
CN104090870B (en) * 2014-06-26 2018-04-20 语联网(武汉)信息技术有限公司 A push method for an online translation engine

Also Published As

Publication number Publication date
CN104503960A (en) 2015-04-08

Similar Documents

Publication Publication Date Title
CN104503960B (en) A text data processing method for English translation
Zhang et al. Attentive interactive neural networks for answer selection in community question answering
US20170185581A1 (en) Systems and methods for suggesting emoji
CN104102630B (en) A method for normalizing mixed Chinese-English text in Chinese social networks
CN104731774B (en) Personalized translation method and device for a general-purpose machine translation engine
Zhang et al. Building earth mover's distance on bilingual word embeddings for machine translation
Yagcioglu et al. A distributed representation based query expansion approach for image captioning
CN102663129A (en) Medical field deep question and answer method and medical retrieval system
KR20050005523A (en) Word association method and apparatus
US20080120092A1 (en) Phrase pair extraction for statistical machine translation
CN103646088A (en) Product comment fine-grained emotional element extraction method based on CRFs and SVM
CN100535907C (en) Method for extracting entity address message in text context
Kit et al. Comparative evaluation of online machine translation systems with legal texts
Štajner et al. Shared task on quality assessment for text simplification
CN108717459B (en) A mobile application defect localization method based on user review information
CN108984711B (en) Personalized APP recommendation method based on hierarchical embedding
CN111785387A (en) Method and system for disease standardized mapping classification by using Bert
Oravecz et al. etranslation’s submissions to the wmt 2020 news translation task
CN110297897B (en) Question-answer processing method and related product
CN110334362B (en) Method for solving and generating untranslated words based on medical neural machine translation
CN108491399A (en) Chinese to English machine translation method based on context iterative analysis
CN103488629A (en) Method for extracting translation unit table in machine translation
CN106649289A (en) Realization method and realization system for simultaneously identifying bilingual terms and word alignment
Gasperin et al. Natural language processing for social inclusion: a text simplification architecture for different literacy levels
WO2021000400A1 (en) Hospital guide similar problem pair generation method and system, and computer device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20170919

Termination date: 20190107