CN108537246A - Method and system for classifying parallel corpora by translation quality - Google Patents
Method and system for classifying parallel corpora by translation quality
- Publication number
- CN108537246A (application number CN201810166481.1A)
- Authority
- CN
- China
- Prior art keywords
- parallel corpora
- scoring
- classified
- algorithms
- neural network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/58—Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0463—Neocognitrons
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computational Linguistics (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Molecular Biology (AREA)
- Biophysics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Biomedical Technology (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Machine Translation (AREA)
Abstract
The invention discloses a method for classifying parallel corpora by translation quality, comprising the following steps: S1: establish a mapping between parallel-corpus scores and parallel-corpus grades; S2: score the parallel corpus according to its translation results; S3: classify the scores of the translation results with a feedforward neural network model, according to the mapping between scores and grades. The invention also discloses a system that implements this method. Whether the scores come from the BLEU algorithm, an improved BLEU algorithm, or the METEOR algorithm, they can be classified by the feedforward neural network, which broadens the versatility and scope of application of the invention.
Description
Technical field
The present invention relates to the field of translation, and in particular to a method and system for classifying parallel corpora by translation quality.
Background art
With the rapid development of economic globalization and the Internet, the translation of natural language plays an increasingly important role in promoting political, economic, and cultural exchange. In the past, translating spoken and written language for international exchange required human translators, which was time-consuming and laborious; with the rapid development of computer hardware, machine translation and computer-aided translation have come into increasingly wide use.
A parallel corpus is a bilingual or multilingual corpus made up of source texts and their aligned translations. To improve translation quality, existing parallel corpora are generally scored with the BLEU algorithm; however, BLEU results are often affected by factors such as the grammar of the corpus, sentence length, and synonyms, so the scores are not very accurate.
Summary of the invention
The technical problem to be solved by the present invention is that existing parallel corpora are generally scored with the BLEU algorithm, whose results are often affected by factors such as corpus grammar, sentence length, and synonyms, so the scores are not very accurate. The object of the invention is to provide a method and system for classifying parallel corpora by translation quality that solves this problem.
The present invention is achieved through the following technical solutions:
A method for classifying parallel corpora by translation quality, comprising the following steps: S1: establish a mapping between parallel-corpus scores and parallel-corpus grades; S2: score the parallel corpus according to its translation results; S3: classify the scores of the translation results with a feedforward neural network model, according to the mapping between scores and grades.
In the prior art, to improve translation quality, existing parallel corpora are generally scored with the BLEU algorithm; however, BLEU results are often affected by factors such as corpus grammar, sentence length, and synonyms, so the scores are not very accurate. When the invention is applied, a mapping is first established between parallel-corpus scores and parallel-corpus grades. This mapping is generally one-to-many: each grade (the "one") corresponds to many scores (the "many"). The parallel corpus is then scored according to its translation results; any existing scoring algorithm may be used, such as the BLEU algorithm, an improved BLEU algorithm, or the METEOR algorithm. Finally, the scores of the translation results are classified with a feedforward neural network model according to the mapping between scores and grades. Because the classification is performed by the feedforward neural network model, corpora with similar scores are assigned to the same class, which effectively reduces scoring error. Moreover, whether the BLEU algorithm, an improved BLEU algorithm, or the METEOR algorithm is used, the scores can be classified by the feedforward neural network, which broadens the versatility and scope of application of the invention.
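The one-to-many mapping of step S1 can be pictured as an interval lookup: each grade covers a whole range of scores. The following is a minimal sketch only; the boundary values, the grade labels, and the names `BOUNDARIES`, `GRADES`, and `grade_for_score` are assumptions chosen for illustration and do not appear in the patent.

```python
from bisect import bisect_right

# Hypothetical grade boundaries on a 0..1 score scale (not from the patent).
BOUNDARIES = [0.3, 0.6]      # scores below 0.3 -> "C", below 0.6 -> "B"
GRADES = ["C", "B", "A"]     # one grade corresponds to many score values


def grade_for_score(score: float) -> str:
    """Map a score in [0, 1] to its grade label by interval lookup."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    # bisect_right counts how many boundaries are <= score.
    return GRADES[bisect_right(BOUNDARIES, score)]


# Many different scores land on the same grade.
labelled = [(s, grade_for_score(s)) for s in (0.10, 0.29, 0.30, 0.75)]
```

Pairs such as those in `labelled` are the kind of (score, grade) examples from which a classifier in step S3 could be trained.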
Further, the scoring in step S2 uses the BLEU algorithm.
When the invention is applied, BLEU is chosen because it is a mature algorithm whose scores are, in general, relatively stable.
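For orientation, a simplified sentence-level BLEU can be sketched as follows. This is an illustrative reduction, not the scoring code of the patent: it assumes a single reference, whitespace tokenization, n-grams up to length 2, and no smoothing, and the function names are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, reference, n):
    """Clipped n-gram precision of the candidate against one reference."""
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    clipped = sum(min(count, ref_counts[gram])
                  for gram, count in cand_counts.items())
    return clipped / total


def bleu(candidate, reference, max_n=2):
    """Geometric mean of clipped precisions times the brevity penalty."""
    precisions = [modified_precision(candidate, reference, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0                      # no smoothing in this sketch
    log_mean = sum(math.log(p) for p in precisions) / max_n
    c, r = len(candidate), len(reference)
    brevity = 1.0 if c > r else math.exp(1.0 - r / c)
    return brevity * math.exp(log_mean)


reference = "the cat sat on the mat".split()
same_length = bleu("the cat sat on the rug".split(), reference)
too_short = bleu("the cat sat".split(), reference)
```

Here `too_short` matches every n-gram it contains yet is penalized for brevity, while `same_length` loses credit for a single substituted word; both effects illustrate the sensitivity to sentence length and word choice mentioned in the background.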
Further, the feedforward neural network model is a perceptron model.
When the invention is applied, the perceptron is a basic form of feedforward neural network. Owing to its computational characteristics, namely introducing a bias term and computing through the perceptron's activation function, it converges quickly and requires little computation, which makes it well suited to analyzing and organizing large volumes of parallel corpora.
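The role of the bias term and the activation function can be made concrete with a single perceptron that separates low scores from high scores. This is a sketch under assumptions: the patent does not specify the activation, the learning rate, or the training data, so the step activation, the rate of 0.1, and the sample scores below are all illustrative.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class Perceptron:
    weight: float = 0.0
    bias: float = 0.0     # bias term: shifts the decision threshold
    rate: float = 0.1     # hypothetical learning rate

    @staticmethod
    def activate(z: float) -> int:
        """Step activation: output 1 only for a positive net input."""
        return 1 if z > 0 else 0

    def predict(self, score: float) -> int:
        return self.activate(self.weight * score + self.bias)

    def fit(self, samples: Sequence[Tuple[float, int]],
            max_epochs: int = 1000) -> int:
        """Error-driven training; returns the number of epochs used."""
        for epoch in range(1, max_epochs + 1):
            mistakes = 0
            for score, target in samples:
                error = target - self.predict(score)
                if error:
                    self.weight += self.rate * error * score
                    self.bias += self.rate * error
                    mistakes += 1
            if mistakes == 0:       # every sample classified correctly
                return epoch
        return max_epochs


# Illustrative scores labelled 0 (low grade) or 1 (high grade).
samples = [(0.10, 0), (0.20, 0), (0.35, 0), (0.70, 1), (0.80, 1), (0.90, 1)]
model = Perceptron()
model.fit(samples)
```

Because these samples are linearly separable, the classic perceptron rule is guaranteed to stop after a finite number of updates, which is one way to read the convergence claim above.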
A system for classifying parallel corpora by translation quality, comprising: a mapping unit for establishing the mapping between parallel-corpus scores and parallel-corpus grades; a scoring unit for scoring the parallel corpus according to its translation results; and a feedforward neural network unit for classifying the scores of the translation results with a feedforward neural network model, according to the mapping between scores and grades.
In the prior art, to improve translation quality, existing parallel corpora are generally scored with the BLEU algorithm; however, BLEU results are often affected by factors such as corpus grammar, sentence length, and synonyms, so the scores are not very accurate. When the system is applied, a mapping is first established between parallel-corpus scores and parallel-corpus grades. This mapping is generally one-to-many: each grade (the "one") corresponds to many scores (the "many"). The parallel corpus is then scored according to its translation results; any existing scoring algorithm may be used, such as the BLEU algorithm, an improved BLEU algorithm, or the METEOR algorithm. Finally, the scores of the translation results are classified with a feedforward neural network model according to the mapping between scores and grades. Because the classification is performed by the feedforward neural network model, corpora with similar scores are assigned to the same class, which effectively reduces scoring error. Moreover, whether the BLEU algorithm, an improved BLEU algorithm, or the METEOR algorithm is used, the scores can be classified by the feedforward neural network, which broadens the versatility and scope of application of the invention.
Further, the scoring unit uses the BLEU algorithm for scoring.
Further, the feedforward neural network unit uses a perceptron model.
Compared with the prior art, the present invention has the following advantages and beneficial effects:
1. With the method of the invention for classifying parallel corpora by translation quality, the scores can be classified by the feedforward neural network whether they come from the BLEU algorithm, an improved BLEU algorithm, or the METEOR algorithm, which broadens the versatility and scope of application of the invention;
2. The method of the invention for classifying parallel corpora by translation quality converges quickly and requires little computation, making it well suited to analyzing and organizing large volumes of parallel corpora;
3. With the system of the invention for classifying parallel corpora by translation quality, the scores can be classified by the feedforward neural network whether they come from the BLEU algorithm, an improved BLEU algorithm, or the METEOR algorithm, which broadens the versatility and scope of application of the invention.
Description of the drawings
The drawings described here provide a further understanding of the embodiments of the invention and form part of this application; they do not limit the embodiments of the invention. In the drawings:
Fig. 1 is a schematic diagram of the steps of the invention;
Fig. 2 is a schematic diagram of the system structure of the invention.
Detailed description of the embodiments
To make the objects, technical solutions, and advantages of the invention clearer, the invention is described in further detail below with reference to the embodiments and drawings. The exemplary embodiments and their descriptions serve only to explain the invention and do not limit it.
Embodiment 1
As shown in Fig. 1, the method of the invention for classifying parallel corpora by translation quality comprises the following steps: S1: establish a mapping between parallel-corpus scores and parallel-corpus grades; S2: score the parallel corpus according to its translation results; S3: classify the scores of the translation results with a feedforward neural network model, according to the mapping between scores and grades.
When this embodiment is implemented, a mapping is first established between parallel-corpus scores and parallel-corpus grades. This mapping is generally one-to-many: each grade (the "one") corresponds to many scores (the "many"). The parallel corpus is then scored according to its translation results; any existing scoring algorithm may be used, such as the BLEU algorithm, an improved BLEU algorithm, or the METEOR algorithm. Finally, the scores of the translation results are classified with a feedforward neural network model according to the mapping between scores and grades. Because the classification is performed by the feedforward neural network model, corpora with similar scores are assigned to the same class, which effectively reduces scoring error. Moreover, whether the BLEU algorithm, an improved BLEU algorithm, or the METEOR algorithm is used, the scores can be classified by the feedforward neural network, which broadens the versatility and scope of application of the invention.
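Steps S1 to S3 can be tied together in a short end-to-end sketch. Everything specific here is an assumption made for illustration: the boundaries and grade labels, the pre-computed scores standing in for step S2, and the choice of a single layer with one perceptron unit per grade boundary. The patent itself only states that a feedforward neural network (optionally a perceptron) classifies the scores.

```python
from typing import List, Sequence, Tuple

# S1 (hypothetical values): each grade covers an interval of scores.
BOUNDARIES = (0.3, 0.6)
GRADES = ("C", "B", "A")


def grade_index(score: float) -> int:
    """S1 mapping: count the boundaries at or below the score."""
    return sum(score >= edge for edge in BOUNDARIES)


def train_unit(samples: Sequence[Tuple[float, int]],
               epochs: int = 5000) -> Tuple[float, float]:
    """Train one perceptron unit and return its (weight, bias)."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        errors = 0
        for x, target in samples:
            pred = 1 if w * x + b > 0 else 0
            if pred != target:
                w += (target - pred) * x
                b += (target - pred)
                errors += 1
        if errors == 0:          # all training samples are correct
            break
    return w, b


def train_layer(scores: Sequence[float]) -> List[Tuple[float, float]]:
    """S3 training: one unit per boundary, targets taken from the S1 mapping."""
    layer = []
    for k in range(len(BOUNDARIES)):
        samples = [(s, int(grade_index(s) > k)) for s in scores]
        layer.append(train_unit(samples))
    return layer


def classify(layer: Sequence[Tuple[float, float]], score: float) -> str:
    """S3 inference: the number of firing units selects the grade."""
    fired = sum(1 for w, b in layer if w * score + b > 0)
    return GRADES[fired]


# S2 stand-in: scores assumed to come from BLEU, improved BLEU, or METEOR.
train_scores = [0.05, 0.15, 0.25, 0.40, 0.50, 0.55, 0.70, 0.85, 0.95]
layer = train_layer(train_scores)
predicted = [classify(layer, s) for s in train_scores]
```

Since the classifier only sees a numeric score, the same code applies regardless of which scoring algorithm produced it, which mirrors the versatility argument in this embodiment.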
Embodiment 2
In this embodiment, on the basis of Embodiment 1, the scoring in step S2 uses the BLEU algorithm.
When this embodiment is implemented, BLEU is chosen because it is a mature algorithm whose scores are, in general, relatively stable.
Embodiment 3
In this embodiment, on the basis of Embodiment 1, the feedforward neural network model is a perceptron model.
When this embodiment is implemented, the perceptron is a basic form of feedforward neural network. Owing to its computational characteristics, namely introducing a bias term and computing through the perceptron's activation function, it converges quickly and requires little computation, which makes it well suited to analyzing and organizing large volumes of parallel corpora.
Embodiment 4
As shown in Fig. 2, the system of the invention for classifying parallel corpora by translation quality comprises: a mapping unit for establishing the mapping between parallel-corpus scores and parallel-corpus grades; a scoring unit for scoring the parallel corpus according to its translation results; and a feedforward neural network unit for classifying the scores of the translation results with a feedforward neural network model, according to the mapping between scores and grades. The scoring unit uses the BLEU algorithm for scoring. The feedforward neural network unit uses a perceptron model.
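The three units of the system could be wired together roughly as follows. This is a structural sketch only: the class names, the two-grade setup, the hand-set perceptron weights, and the `unigram_overlap` scorer (a deliberately trivial stand-in, not BLEU) are all assumptions introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Tokens = Sequence[str]
Scorer = Callable[[Tokens, Tokens], float]


@dataclass
class MappingUnit:
    """Relates score ranges to grades (hypothetical two-grade setup)."""
    boundary: float
    grades: Tuple[str, str]

    def target(self, score: float) -> int:
        """Training target for a score: 0 below the boundary, else 1."""
        return int(score >= self.boundary)

    def name(self, class_index: int) -> str:
        return self.grades[class_index]


@dataclass
class ScoringUnit:
    """Wraps any scoring function (BLEU, improved BLEU, METEOR, ...)."""
    scorer: Scorer

    def score(self, translation: Tokens, reference: Tokens) -> float:
        return self.scorer(translation, reference)


@dataclass
class FeedforwardUnit:
    """A single perceptron; the weights here are set by hand, not trained."""
    weight: float
    bias: float

    def classify(self, score: float) -> int:
        return 1 if self.weight * score + self.bias > 0 else 0


@dataclass
class GradingSystem:
    mapping: MappingUnit
    scoring: ScoringUnit
    network: FeedforwardUnit

    def grade_pairs(self, pairs: Sequence[Tuple[Tokens, Tokens]]) -> List[str]:
        """Score each (translation, reference) pair, then classify the score."""
        return [self.mapping.name(self.network.classify(self.scoring.score(t, r)))
                for t, r in pairs]


def unigram_overlap(translation: Tokens, reference: Tokens) -> float:
    """Stand-in scorer: share of translation tokens found in the reference."""
    if not translation:
        return 0.0
    ref = set(reference)
    return sum(tok in ref for tok in translation) / len(translation)


mapping = MappingUnit(boundary=0.5, grades=("low", "high"))
system = GradingSystem(mapping, ScoringUnit(unigram_overlap),
                       FeedforwardUnit(weight=1.0, bias=-mapping.boundary))
grades = system.grade_pairs([("a b c d".split(), "a b c d".split()),
                             ("x y z a".split(), "a b c d".split())])
```

Keeping the scorer behind a plain callable means the scoring unit can be swapped without touching the mapping unit or the network unit.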
When this embodiment is implemented, a mapping is first established between parallel-corpus scores and parallel-corpus grades. This mapping is generally one-to-many: each grade (the "one") corresponds to many scores (the "many"). The parallel corpus is then scored according to its translation results; any existing scoring algorithm may be used, such as the BLEU algorithm, an improved BLEU algorithm, or the METEOR algorithm. Finally, the scores of the translation results are classified with a feedforward neural network model according to the mapping between scores and grades. Because the classification is performed by the feedforward neural network model, corpora with similar scores are assigned to the same class, which effectively reduces scoring error. Moreover, whether the BLEU algorithm, an improved BLEU algorithm, or the METEOR algorithm is used, the scores can be classified by the feedforward neural network, which broadens the versatility and scope of application of the invention.
The specific embodiments described above further explain the objects, technical solutions, and beneficial effects of the invention in detail. It should be understood that the foregoing are merely specific embodiments of the invention and do not limit its scope of protection; any modification, equivalent substitution, or improvement made within the spirit and principles of the invention shall fall within its scope of protection.
Claims (6)
1. A method for classifying parallel corpora by translation quality, characterized by comprising the following steps:
S1: establishing a mapping between parallel-corpus scores and parallel-corpus grades;
S2: scoring the parallel corpus according to its translation results;
S3: classifying the scores of the translation results with a feedforward neural network model, according to the mapping between parallel-corpus scores and parallel-corpus grades.
2. The method for classifying parallel corpora by translation quality according to claim 1, characterized in that the scoring in step S2 uses the BLEU algorithm.
3. The method for classifying parallel corpora by translation quality according to claim 1, characterized in that the feedforward neural network model is a perceptron model.
4. A system using the method of any one of claims 1 to 3, characterized by comprising:
a mapping unit for establishing the mapping between parallel-corpus scores and parallel-corpus grades;
a scoring unit for scoring the parallel corpus according to its translation results;
a feedforward neural network unit for classifying the scores of the translation results with a feedforward neural network model, according to the mapping between parallel-corpus scores and parallel-corpus grades.
5. The system for classifying parallel corpora by translation quality according to claim 4, characterized in that the scoring unit uses the BLEU algorithm for scoring.
6. The system for classifying parallel corpora by translation quality according to claim 4, characterized in that the feedforward neural network unit uses a perceptron model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810166481.1A CN108537246A (en) | 2018-02-28 | 2018-02-28 | A kind of method and system that parallel corpora is classified by translation quality |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810166481.1A CN108537246A (en) | 2018-02-28 | 2018-02-28 | A kind of method and system that parallel corpora is classified by translation quality |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108537246A (en) | 2018-09-14 |
Family
ID=63486291
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810166481.1A Pending CN108537246A (en) | 2018-02-28 | 2018-02-28 | A kind of method and system that parallel corpora is classified by translation quality |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108537246A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109684648A (en) * | 2019-01-14 | 2019-04-26 | 浙江大学 | A kind of Chinese automatic translating method at all times of multiple features fusion |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101739867A (en) * | 2008-11-19 | 2010-06-16 | 中国科学院自动化研究所 | Method for scoring interpretation quality by using computer |
CN102945232A (en) * | 2012-11-16 | 2013-02-27 | 沈阳雅译网络技术有限公司 | Training-corpus quality evaluation and selection method orienting to statistical-machine translation |
CN105335357A (en) * | 2015-11-18 | 2016-02-17 | 成都优译信息技术有限公司 | Linguistic data recommending method in translation system |
CN106598957A (en) * | 2016-12-21 | 2017-04-26 | 语联网(武汉)信息技术有限公司 | Data analysis method and system of translated sentence |
CN107704806A (en) * | 2017-09-01 | 2018-02-16 | 深圳市唯特视科技有限公司 | A kind of method that quality of human face image prediction is carried out based on depth convolutional neural networks |
Non-Patent Citations (1)
Title |
---|
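Zhao Zhongtang: "Research on Behavior Recognition Methods Based on Intelligent Mobile Terminals" (《基于智能移动终端的行为识别方法研究》), 30 April 2015, University of Electronic Science and Technology of China Press (电子科技大学出版社) * |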
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104750687B (en) | Improve method and device, machine translation method and the device of bilingualism corpora | |
CN104933027B (en) | A kind of open Chinese entity relation extraction method of utilization dependency analysis | |
Dowling et al. | SMT versus NMT: Preliminary comparisons for Irish | |
US20180089178A1 (en) | Mining multi-lingual data | |
JP2018037095A (en) | Phrase-based dictionary extraction and translation quality evaluation | |
CN107885737A (en) | A kind of human-computer interaction interpretation method and system | |
CN107729316A (en) | The identification of wrong word and the method and device of error correction in the interactive question and answer text of Chinese | |
CN108519979A (en) | The method and system that translation memory library and MT are combined in a kind of CAT systems | |
CN112784589B (en) | Training sample generation method and device and electronic equipment | |
CN106547743B (en) | Translation method and system | |
CN111178098B (en) | Text translation method, device, equipment and computer readable storage medium | |
CN106951416A (en) | Multilingual instant translation system based on big data processing and manual intervention | |
CN108537246A (en) | A kind of method and system that parallel corpora is classified by translation quality | |
Wang et al. | Breaking the representation bottleneck of Chinese characters: Neural machine translation with stroke sequence modeling | |
CN109657244A (en) | A kind of English long sentence automatic segmentation method and system | |
CN106874262A (en) | A kind of statistical machine translation method for realizing domain-adaptive | |
Zhang | Recognition and Segmentation of English Long and Short Sentences Based on Machine Translation. | |
CN109697287A (en) | Sentence-level bilingual alignment method and system | |
JP2023002730A (en) | Text error correction and text error correction model generating method, device, equipment, and medium | |
Che et al. | A word segmentation method of ancient Chinese based on word alignment | |
CN102955842A (en) | Multi-feature-fused controlling method for recognizing Chinese organization name | |
CN113962225A (en) | Road name translation method and device, electronic equipment and storage medium | |
Salunkhe et al. | A research work on English to Marathi hybrid translation system | |
Leng et al. | Analysis and research on lexical errors in machine translation in Chinese and Korean translation | |
CN105354188A (en) | Batch scoring method for translation teaching system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180914 |