CN108537246A - Method and system for grading parallel corpora by translation quality - Google Patents

Method and system for grading parallel corpora by translation quality

Info

Publication number
CN108537246A
Authority
CN
China
Prior art keywords
parallel corpora
scoring
classified
algorithms
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810166481.1A
Other languages
Chinese (zh)
Inventor
席斌
张马成
李明
彭成超
王兴强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Excellent Translation Information Technology Ltd By Share Ltd
Original Assignee
Chengdu Excellent Translation Information Technology Ltd By Share Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2018-02-28
Filing date
2018-02-28
Publication date
2018-09-14
Application filed by Chengdu Excellent Translation Information Technology Ltd By Share Ltd
Priority to CN201810166481.1A
Publication of CN108537246A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0463 Neocognitrons

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a method for grading parallel corpora by translation quality, comprising the following steps. S1: establishing a mapping between parallel-corpus scores and parallel-corpus grades. S2: scoring the parallel corpus according to the translation results. S3: classifying the scores of the translation results with a feedforward neural network model according to the mapping between parallel-corpus scores and parallel-corpus grades. The invention also discloses a corresponding system. With the disclosed method and system, the scores can be classified by the feedforward neural network whether they come from the BLEU algorithm, an improved BLEU algorithm, or the METEOR algorithm. This broadens the generality and applicability of the invention.

Description

Method and system for grading parallel corpora by translation quality
Technical field
The present invention relates to the field of translation, and in particular to a method and system for grading parallel corpora by translation quality.
Background
With the rapid development of economic globalization and the Internet, translation between natural languages plays an increasingly important role in promoting political, economic, and cultural exchange. In the past, translating spoken and written language for international exchange required human translators, which was time-consuming and laborious. With the rapid development of computer hardware, machine translation and computer-aided translation have become more and more widely used.
A parallel corpus is a bilingual or multilingual corpus made up of source texts and their corresponding, aligned translations. To improve translation quality, existing parallel corpora are generally scored with the BLEU algorithm. However, BLEU results are often affected by the grammar of the corpus, sentence length, synonyms, and similar factors, so the scores are not very accurate.
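For reference, the standard BLEU definition (Papineni et al., 2002) is not given in the patent. It is shown here for a single candidate sentence C with references R_1, ..., R_K, where G_n(C) is the set of distinct n-grams in C. The formula shows why sentence length and synonyms affect the result. The modified n-gram precisions p_n credit only exact surface matches against a reference, so a synonym earns nothing. The brevity penalty BP depends only on the candidate length c and the effective reference length r. Typically N = 4 and w_n = 1/N.

$$
p_n = \frac{\sum_{g \in G_n(C)} \min\bigl(\mathrm{Count}_C(g),\ \max_k \mathrm{Count}_{R_k}(g)\bigr)}{\sum_{g \in G_n(C)} \mathrm{Count}_C(g)},
\qquad
\mathrm{BP} = \begin{cases} 1, & c > r,\\ e^{\,1 - r/c}, & c \le r, \end{cases}
\qquad
\mathrm{BLEU} = \mathrm{BP}\cdot\exp\Bigl(\sum_{n=1}^{N} w_n \log p_n\Bigr)
$$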
Summary of the invention
The technical problem addressed by the present invention is the following. Existing parallel corpora are generally scored with the BLEU algorithm, whose results are often affected by corpus grammar, sentence length, synonyms, and similar factors, so the scores are inaccurate. The object of the present invention is to provide a method and system for grading parallel corpora by translation quality that solves this problem.
The present invention is achieved through the following technical solutions:
A method for grading parallel corpora by translation quality comprises the following steps. S1: establishing a mapping between parallel-corpus scores and parallel-corpus grades. S2: scoring the parallel corpus according to the translation results. S3: classifying the scores of the translation results with a feedforward neural network model according to the mapping between parallel-corpus scores and parallel-corpus grades.
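Step S1 can be illustrated with a score-interval table. This sketch assumes scores normalized to [0, 1] and four hypothetical grades; the patent gives neither the actual boundaries nor the grade names. Each interval covers many scores, which is the one-to-many relation between a grade and its scores. In the described method such grade labels would serve as targets for the classifier of step S3, not replace it.

```python
from bisect import bisect_right

# Hypothetical boundaries on a 0-1 score scale (not taken from the patent):
# [0.0, 0.2) -> "D", [0.2, 0.4) -> "C", [0.4, 0.6) -> "B", [0.6, 1.0] -> "A"
BOUNDARIES = [0.2, 0.4, 0.6]
GRADES = ["D", "C", "B", "A"]


def score_to_grade(score: float) -> str:
    """Return the grade whose interval contains `score` (many scores -> one grade)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    # bisect_right counts how many boundaries are <= score, i.e. the interval index.
    return GRADES[bisect_right(BOUNDARIES, score)]
```

For example, `score_to_grade(0.21)` and `score_to_grade(0.35)` both return `"C"`.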
In the prior art, existing parallel corpora are generally scored with the BLEU algorithm to improve translation quality. However, BLEU results are often affected by corpus grammar, sentence length, synonyms, and similar factors, so the scores are inaccurate. When the present invention is applied, a mapping between parallel-corpus scores and parallel-corpus grades is established first. This mapping is generally one-to-many: one grade corresponds to many scores. The parallel corpus is then scored according to the translation results. The scoring may use an existing algorithm such as BLEU, an improved BLEU algorithm, or the METEOR algorithm. Finally, the scores of the translation results are classified with a feedforward neural network model according to the mapping between scores and grades. Because the classification is performed by a BP (backpropagation) neural network model, corpora with similar scores are assigned to the same class, which effectively reduces scoring errors. Moreover, the scores can be classified by the feedforward neural network whether they come from BLEU, an improved BLEU algorithm, or METEOR, which broadens the generality and applicability of the invention.
Further, the scoring in step S2 uses the BLEU algorithm.
When the invention is applied, the BLEU algorithm is chosen because it is a mature algorithm whose scoring results are generally relatively stable.
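BLEU is the preferred metric here, so the following is a minimal sentence-level BLEU sketch for tokenized text. It assumes uniform weights over 1- to 4-grams, plus add-one smoothing of the 2- to 4-gram precisions (in the style of BLEU+1) so that short sentences do not collapse to zero. The patent does not specify tokenization, smoothing, or whether scores are computed per sentence or per corpus. In practice an established toolkit such as sacreBLEU is preferable to a hand-rolled scorer.

```python
import math
from collections import Counter
from typing import Sequence


def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    """Count the n-grams (as tuples) in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate: Sequence[str], references: Sequence[Sequence[str]], max_n: int = 4) -> float:
    """Sentence-level BLEU with uniform weights and add-one smoothing for n >= 2."""
    if not candidate or not references:
        return 0.0
    log_sum = 0.0
    for n in range(1, max_n + 1):
        cand = ngram_counts(candidate, n)
        # Clip each candidate n-gram by its highest count in any single reference.
        max_ref: Counter = Counter()
        for ref in references:
            for gram, cnt in ngram_counts(ref, n).items():
                max_ref[gram] = max(max_ref[gram], cnt)
        clipped = sum(min(cnt, max_ref[gram]) for gram, cnt in cand.items())
        total = sum(cand.values())
        if n == 1:
            if clipped == 0:
                return 0.0  # no word overlap at all
            p_n = clipped / total
        else:
            p_n = (clipped + 1) / (total + 1)  # add-one smoothing (BLEU+1 style)
        log_sum += math.log(p_n) / max_n
    c = len(candidate)
    # Effective reference length: closest to c, preferring the shorter on ties.
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_sum)
```

A candidate identical to its reference scores 1.0. A candidate sharing no words with any reference scores 0.0, and partial matches fall in between.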
Further, the feedforward neural network model is a perceptron model.
When the invention is applied, the perceptron model serves as the basic form of feedforward neural network. A perceptron computes with an added bias term and passes the result through its activation function. Because of this mode of operation it can converge faster and requires less computation, which makes it well suited to analysing and organizing large volumes of parallel corpora.
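A minimal sketch of the perceptron described above follows. It assumes a binary grade (for example 1 = acceptable, 0 = reject), a Heaviside step activation, and the classic Rosenblatt update rule. The patent specifies none of these details, and the class and parameter names are illustrative. Each input vector can hold one or more metric scores for a corpus entry.

```python
from typing import List, Sequence


def step(z: float) -> int:
    """Heaviside step activation: 1 if z >= 0, else 0."""
    return 1 if z >= 0.0 else 0


class Perceptron:
    """Single-layer perceptron with a bias term and step activation."""

    def __init__(self, n_features: int, lr: float = 0.1) -> None:
        self.w: List[float] = [0.0] * n_features
        self.b = 0.0  # bias shifts the decision threshold away from zero
        self.lr = lr

    def predict(self, x: Sequence[float]) -> int:
        z = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        return step(z)

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[int], max_epochs: int = 1000) -> int:
        """Rosenblatt updates; returns the first epoch with no errors, or max_epochs."""
        for epoch in range(1, max_epochs + 1):
            errors = 0
            for xi, target in zip(X, y):
                delta = target - self.predict(xi)
                if delta != 0:
                    errors += 1
                    self.w = [wj + self.lr * delta * xj for wj, xj in zip(self.w, xi)]
                    self.b += self.lr * delta
            if errors == 0:
                return epoch
        return max_epochs
```

Consider hypothetical scores `[[0.1], [0.2], [0.3], [0.7], [0.8], [0.9]]` labelled `[0, 0, 0, 1, 1, 1]`. The two groups are linearly separable, so `fit` stops updating after a few epochs. The bias is what lets the learned threshold sit between the groups rather than at a score of zero.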
A system for grading parallel corpora by translation quality comprises three units. A mapping unit establishes the mapping between parallel-corpus scores and parallel-corpus grades. A scoring unit scores the parallel corpus according to the translation results. A feedforward neural network unit classifies the scores of the translation results with a feedforward neural network model according to the mapping between parallel-corpus scores and parallel-corpus grades.
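The three units of the system can be sketched as small composable components. This is an illustrative structure only. The class names, the interval boundaries, and the `unigram_overlap` metric (a placeholder, not BLEU) are assumptions. `MappingBackedModel` is a hypothetical stand-in for a trained feedforward network, so the example runs without training.

```python
from dataclasses import dataclass
from typing import Callable, List, Protocol, Sequence, Tuple


class Classifier(Protocol):
    """Anything with a predict() method, e.g. a trained perceptron."""
    def predict(self, x: Sequence[float]) -> str: ...


@dataclass
class MappingUnit:
    """Score-to-grade mapping (step S1); boundaries must be ascending."""
    boundaries: Sequence[float]  # len(grades) - 1 entries
    grades: Sequence[str]

    def grade_of(self, score: float) -> str:
        for bound, grade in zip(self.boundaries, self.grades):
            if score < bound:
                return grade
        return self.grades[-1]


@dataclass
class ScoringUnit:
    """Scores (candidate, reference) pairs with a pluggable metric (step S2)."""
    metric: Callable[[str, str], float]

    def score(self, pairs: Sequence[Tuple[str, str]]) -> List[float]:
        return [self.metric(cand, ref) for cand, ref in pairs]


@dataclass
class ClassificationUnit:
    """Assigns a grade to each score with a classifier (step S3)."""
    model: Classifier

    def classify(self, scores: Sequence[float]) -> List[str]:
        return [self.model.predict([s]) for s in scores]


def unigram_overlap(cand: str, ref: str) -> float:
    """Placeholder metric, NOT BLEU: share of candidate tokens found in the reference."""
    tokens, ref_vocab = cand.split(), set(ref.split())
    return sum(tok in ref_vocab for tok in tokens) / len(tokens) if tokens else 0.0


class MappingBackedModel:
    """Hypothetical stand-in for a trained network: predicts via the mapping."""

    def __init__(self, mapping: MappingUnit) -> None:
        self.mapping = mapping

    def predict(self, x: Sequence[float]) -> str:
        return self.mapping.grade_of(x[0])


def grade_pairs(pairs: Sequence[Tuple[str, str]], scoring: ScoringUnit,
                classification: ClassificationUnit) -> List[str]:
    """Run scoring, then classification, over a batch of sentence pairs."""
    return classification.classify(scoring.score(pairs))
```

Keeping the metric as an injected callable mirrors the claim that the classifier does not care which scoring algorithm produced its input.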
In the prior art, existing parallel corpora are generally scored with the BLEU algorithm to improve translation quality. However, BLEU results are often affected by corpus grammar, sentence length, synonyms, and similar factors, so the scores are inaccurate. When the present invention is applied, a mapping between parallel-corpus scores and parallel-corpus grades is established first. This mapping is generally one-to-many: one grade corresponds to many scores. The parallel corpus is then scored according to the translation results. The scoring may use an existing algorithm such as BLEU, an improved BLEU algorithm, or the METEOR algorithm. Finally, the scores of the translation results are classified with a feedforward neural network model according to the mapping between scores and grades. Because the classification is performed by a BP (backpropagation) neural network model, corpora with similar scores are assigned to the same class, which effectively reduces scoring errors. Moreover, the scores can be classified by the feedforward neural network whether they come from BLEU, an improved BLEU algorithm, or METEOR, which broadens the generality and applicability of the invention.
Further, the scoring unit uses the BLEU algorithm.
Further, the feedforward neural network unit uses a perceptron model.
Compared with the prior art, the present invention has the following advantages and beneficial effects:
1. With the method of the present invention, scores can be classified by the feedforward neural network whether they come from BLEU, an improved BLEU algorithm, or METEOR, which broadens the generality and applicability of the invention;
2. The method of the present invention converges faster and requires less computation, which makes it well suited to analysing and organizing large volumes of parallel corpora;
3. With the system of the present invention, scores can be classified by the feedforward neural network whether they come from BLEU, an improved BLEU algorithm, or METEOR, which broadens the generality and applicability of the invention.
Description of the drawings
The drawings described here provide a further understanding of the embodiments of the present invention and form part of this application. They do not limit the embodiments of the present invention. In the drawings:
Fig. 1 is a schematic diagram of the steps of the method of the present invention;
Fig. 2 is a schematic structural diagram of the system of the present invention.
Detailed description of embodiments
To make the objectives, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the embodiments and drawings. The exemplary embodiments and their descriptions serve only to explain the invention and do not limit it.
Embodiment 1
As shown in Fig. 1, the method of the present invention for grading parallel corpora by translation quality comprises the following steps. S1: establishing a mapping between parallel-corpus scores and parallel-corpus grades. S2: scoring the parallel corpus according to the translation results. S3: classifying the scores of the translation results with a feedforward neural network model according to the mapping between parallel-corpus scores and parallel-corpus grades.
In this embodiment, a mapping between parallel-corpus scores and parallel-corpus grades is established first. This mapping is generally one-to-many: one grade corresponds to many scores. The parallel corpus is then scored according to the translation results. The scoring may use an existing algorithm such as BLEU, an improved BLEU algorithm, or the METEOR algorithm. Finally, the scores of the translation results are classified with a feedforward neural network model according to the mapping between scores and grades. Because the classification is performed by a feedforward neural network model, corpora with similar scores are assigned to the same class, which effectively reduces scoring errors. Moreover, the scores can be classified by the feedforward neural network whether they come from BLEU, an improved BLEU algorithm, or METEOR, which broadens the generality and applicability of the invention.
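Embodiment 1 states that the classifier works whether the scores come from BLEU, an improved BLEU, or METEOR. As a non-BLEU score source, here is a heavily simplified METEOR-style scorer, which is an assumption-laden sketch rather than the reference METEOR implementation. It uses exact unigram matches only and a greedy left-to-right alignment instead of METEOR's crossing-minimizing alignment. It has no stemming or synonym modules, and the function name is hypothetical. It does keep METEOR's recall-weighted F-mean and fragmentation penalty.

```python
from typing import List, Sequence, Tuple


def meteor_exact(candidate: Sequence[str], reference: Sequence[str]) -> float:
    """Simplified METEOR: exact unigram matches only, greedy left-to-right alignment."""
    used = [False] * len(reference)
    alignment: List[Tuple[int, int]] = []  # (candidate index, reference index)
    for i, tok in enumerate(candidate):
        for j, ref_tok in enumerate(reference):
            if not used[j] and ref_tok == tok:
                used[j] = True
                alignment.append((i, j))
                break
    matches = len(alignment)
    if matches == 0:
        return 0.0
    precision = matches / len(candidate)
    recall = matches / len(reference)
    # Recall-weighted harmonic mean used by the original METEOR.
    f_mean = 10 * precision * recall / (recall + 9 * precision)
    # A chunk is a maximal run of matches adjacent in both candidate and reference.
    chunks = 1
    for (i1, j1), (i2, j2) in zip(alignment, alignment[1:]):
        if i2 != i1 + 1 or j2 != j1 + 1:
            chunks += 1
    penalty = 0.5 * (chunks / matches) ** 3
    return f_mean * (1 - penalty)
```

A candidate identical to a six-token reference scores about 0.998, because one chunk leaves only a small fragmentation penalty. A candidate sharing no tokens scores 0.0. Either number could be passed unchanged to the classifier of step S3.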
Embodiment 2
Building on Embodiment 1, this embodiment uses the BLEU algorithm for the scoring in step S2.
In this embodiment, the BLEU algorithm is chosen because it is a mature algorithm whose scoring results are generally relatively stable.
Embodiment 3
Building on Embodiment 1, this embodiment uses a perceptron model as the feedforward neural network model.
In this embodiment, the perceptron model serves as the basic form of feedforward neural network. A perceptron computes with an added bias term and passes the result through its activation function. Because of this mode of operation it can converge faster and requires less computation, which makes it well suited to analysing and organizing large volumes of parallel corpora.
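The embodiment names the bias and the activation function without giving formulas. Suppose it means the classic Rosenblatt perceptron; the patent does not say which variant is meant. Then, for a vector $\mathbf{x}$ of one or more scores for a corpus entry, weights $\mathbf{w}$, bias $b$, step activation $f$, target grade $y \in \{0, 1\}$, and learning rate $\eta$, prediction and training are:

$$
\hat{y} = f\bigl(\mathbf{w}^{\top}\mathbf{x} + b\bigr),
\qquad
f(z) = \begin{cases} 1, & z \ge 0,\\ 0, & z < 0, \end{cases}
$$

$$
\mathbf{w} \leftarrow \mathbf{w} + \eta\,(y - \hat{y})\,\mathbf{x},
\qquad
b \leftarrow b + \eta\,(y - \hat{y})
$$

A single perceptron of this kind separates two grades. More grades would need, for example, one perceptron per grade or a multiclass variant. The update rule is guaranteed to stop changing the weights only when the training data are linearly separable.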
Embodiment 4
As shown in Fig. 2, the system of the present invention for grading parallel corpora by translation quality comprises three units. A mapping unit establishes the mapping between parallel-corpus scores and parallel-corpus grades. A scoring unit scores the parallel corpus according to the translation results. A feedforward neural network unit classifies the scores of the translation results with a feedforward neural network model according to that mapping. The scoring unit uses the BLEU algorithm, and the feedforward neural network unit uses a perceptron model.
In this embodiment, a mapping between parallel-corpus scores and parallel-corpus grades is established first. This mapping is generally one-to-many: one grade corresponds to many scores. The parallel corpus is then scored according to the translation results. The scoring may use an existing algorithm such as BLEU, an improved BLEU algorithm, or the METEOR algorithm. Finally, the scores of the translation results are classified with a feedforward neural network model according to the mapping between scores and grades. Because the classification is performed by a feedforward neural network model, corpora with similar scores are assigned to the same class, which effectively reduces scoring errors. Moreover, the scores can be classified by the feedforward neural network whether they come from BLEU, an improved BLEU algorithm, or METEOR, which broadens the generality and applicability of the invention.
The specific embodiments described above explain the objectives, technical solutions, and beneficial effects of the present invention in further detail. It should be understood that they are merely specific embodiments of the present invention and are not intended to limit its scope of protection. Any modification, equivalent substitution, improvement, or the like made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (6)

1. A method for grading parallel corpora by translation quality, characterized by comprising the following steps:
S1: establishing a mapping between parallel-corpus scores and parallel-corpus grades;
S2: scoring the parallel corpus according to the translation results;
S3: classifying the scores of the translation results with a feedforward neural network model according to the mapping between parallel-corpus scores and parallel-corpus grades.
2. The method for grading parallel corpora by translation quality according to claim 1, characterized in that the scoring in step S2 uses the BLEU algorithm.
3. The method for grading parallel corpora by translation quality according to claim 1, characterized in that the feedforward neural network model is a perceptron model.
4. A system using the method of any one of claims 1 to 3, characterized by comprising:
a mapping unit for establishing the mapping between parallel-corpus scores and parallel-corpus grades;
a scoring unit for scoring the parallel corpus according to the translation results;
a feedforward neural network unit for classifying the scores of the translation results with a feedforward neural network model according to the mapping between parallel-corpus scores and parallel-corpus grades.
5. The system for grading parallel corpora by translation quality according to claim 4, characterized in that the scoring unit uses the BLEU algorithm.
6. The system for grading parallel corpora by translation quality according to claim 4, characterized in that the feedforward neural network unit uses a perceptron model.
CN201810166481.1A 2018-02-28 2018-02-28 Method and system for grading parallel corpora by translation quality Pending CN108537246A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810166481.1A CN108537246A (en) 2018-02-28 2018-02-28 Method and system for grading parallel corpora by translation quality

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810166481.1A CN108537246A (en) 2018-02-28 2018-02-28 Method and system for grading parallel corpora by translation quality

Publications (1)

Publication Number Publication Date
CN108537246A (en) 2018-09-14

Family

Family ID: 63486291

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810166481.1A Pending CN108537246A (en) 2018-02-28 2018-02-28 Method and system for grading parallel corpora by translation quality

Country Status (1)

Country Link
CN (1) CN108537246A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109684648A (en) * 2019-01-14 2019-04-26 浙江大学 (Zhejiang University) Multi-feature fusion method for automatic translation between ancient and modern Chinese

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101739867A (en) * 2008-11-19 2010-06-16 中国科学院自动化研究所 (Institute of Automation, Chinese Academy of Sciences) Method for computer-based scoring of interpretation quality
CN102945232A (en) * 2012-11-16 2013-02-27 沈阳雅译网络技术有限公司 Training-corpus quality evaluation and selection method for statistical machine translation
CN105335357A (en) * 2015-11-18 2016-02-17 成都优译信息技术有限公司 Corpus recommendation method in a translation system
CN106598957A (en) * 2016-12-21 2017-04-26 语联网(武汉)信息技术有限公司 Data analysis method and system for translated sentences
CN107704806A (en) * 2017-09-01 2018-02-16 深圳市唯特视科技有限公司 Method for predicting face image quality based on deep convolutional neural networks

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
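赵中堂 (Zhao Zhongtang): 《基于智能移动终端的行为识别方法研究》 (Research on Behavior Recognition Methods Based on Intelligent Mobile Terminals), 30 April 2015, 电子科技大学出版社 (University of Electronic Science and Technology of China Press) *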

Similar Documents

Publication Publication Date Title
CN104750687B (en) Method and device for improving a bilingual corpus, and machine translation method and device
CN104933027B (en) Open Chinese entity relation extraction method using dependency analysis
Dowling et al. SMT versus NMT: Preliminary comparisons for Irish
US20180089178A1 (en) Mining multi-lingual data
JP2018037095A (en) Phrase-based dictionary extraction and translation quality evaluation
CN107885737A (en) Human-computer interactive translation method and system
CN107729316A (en) Method and device for identifying and correcting erroneous words in Chinese interactive question-and-answer text
CN108519979A (en) Method and system for combining a translation memory and MT in a CAT system
CN112784589B (en) Training sample generation method and device and electronic equipment
CN106547743B (en) Translation method and system
CN111178098B (en) Text translation method, device, equipment and computer readable storage medium
CN106951416A (en) Multilingual instant translation system based on big data processing and manual intervention
CN108537246A (en) Method and system for grading parallel corpora by translation quality
Wang et al. Breaking the representation bottleneck of Chinese characters: Neural machine translation with stroke sequence modeling
CN109657244A (en) Automatic segmentation method and system for long English sentences
CN106874262A (en) Statistical machine translation method achieving domain adaptation
Zhang Recognition and Segmentation of English Long and Short Sentences Based on Machine Translation.
CN109697287A (en) Sentence-level bilingual alignment method and system
JP2023002730A (en) Text error correction and text error correction model generating method, device, equipment, and medium
Che et al. A word segmentation method of ancient Chinese based on word alignment
CN102955842A (en) Multi-feature fusion method for recognizing Chinese organization names
CN113962225A (en) Road name translation method and device, electronic equipment and storage medium
Salunkhe et al. A research work on English to Marathi hybrid translation system
Leng et al. Analysis and research on lexical errors in machine translation in Chinese and Korean translation
CN105354188A (en) Batch scoring method for translation teaching system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180914