CN108628847A - A kind of simultaneous interpretation case for translating mandarin and English using BIRCH clustering algorithms - Google Patents
- Publication number: CN108628847A
- Application number: CN201710173403.XA
- Authority: CN (China)
- Prior art keywords: english, simultaneous interpretation, birch, translation, mandarin
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/58—Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/55—Rule-based translation
- G06F40/56—Natural language generation
Abstract
The invention discloses a simultaneous interpretation case for translating Mandarin and English using the BIRCH clustering algorithm. It comprises seven components: 1) the main body of the simultaneous interpretation case; 2) the accessories of the case; 3) a large collected database of Mandarin audio; 4) a large collected database of English audio; 5) a large database of English words formed by permutations and combinations of the 26 basic letters, together with their definitions and grammar rules; 6) a large database of Chinese character structures built from radicals, together with word-formation grammar; 7) a translation model based on the BIRCH algorithm (Balanced Iterative Reducing and Clustering using Hierarchies). With these components, the invention can take the place of professional senior simultaneous interpreters and translate for the user. The benefits are: long translation sessions do not produce fatigue-induced errors; the cost of hiring simultaneous interpreters is greatly reduced; and because nobody needs to sit inside the case, it is very small and saves space at international conference venues, so more seats can be placed and more participants can attend.
Description
Technical field
The present invention relates to the field of applying the BIRCH clustering algorithm to translation, and in particular to a simultaneous interpretation case that translates Mandarin and English using the BIRCH clustering algorithm.
Background
As internationalization accelerates, demand for English-Chinese simultaneous interpretation keeps growing. Existing simultaneous interpretation is performed by people. The work is highly demanding for professional interpreters, and translation accuracy is easily affected by the interpreter's physical condition. At an international conference that runs long, the interpreter's stamina and energy are steadily drained, and fatigue lowers the accuracy of the translation.
Summary of the invention
The main technical problem solved by the invention is to provide a simultaneous interpretation case that translates Mandarin and English using the BIRCH clustering algorithm. It can replace highly paid senior interpreters and gives the user translation free of the fatigue-induced errors that come with long sessions.
To solve this technical problem, the invention adopts the following technical solution: a simultaneous interpretation case for translating Mandarin and English using the BIRCH clustering algorithm, characterized in that it comprises seven components: 1) an audio input device for Mandarin; 2) an audio output device for the English translation; 3) a large collected database of Mandarin audio; 4) a large collected database of English audio; 5) a large database of English words formed by permutations and combinations of the 26 basic letters, together with their definitions and grammar rules; 6) a large database of Chinese character structures built from radicals, together with word-formation grammar; 7) a translation model based on the BIRCH algorithm (Balanced Iterative Reducing and Clustering using Hierarchies). With these seven components, the invention can replace highly paid senior English-Chinese simultaneous interpreters and provide the user with inexpensive, fatigue-free, long-duration, high-quality translation from Mandarin into English or from English into Mandarin.
When the simultaneous interpretation case builds the BIRCH clustering tree, it uses the Euclidean distance function and the Manhattan distance function. The specific formulas are as follows:
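The formulas themselves are not reproduced in this text. For reference, the standard definitions of the two distances between two d-dimensional points are given below. The symbols x, y, and d are assumptions introduced here, and the text does not say whether the patent applies the distances to raw data points or to cluster centroids.

```latex
d_{\text{Euclidean}}(x, y) = \sqrt{\sum_{k=1}^{d} (x_k - y_k)^2},
\qquad
d_{\text{Manhattan}}(x, y) = \sum_{k=1}^{d} \lvert x_k - y_k \rvert
```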
The structure of the CF tree is similar to that of a B-tree. It has three parameters: the internal node balance factor B, the leaf node balance factor L, and the cluster radius threshold T. Each node in the tree contains at most B child nodes, written (CFi, CHILDi) with 1 <= i <= B, where CFi is the i-th clustering feature in the node and CHILDi points to the node's i-th child, which corresponds to that i-th clustering feature. One point needs particular attention: when building the CF tree, the cluster radius threshold T is an important parameter because it determines the size of the tree, which lets the CF tree fit into the memory that the cloud computing center currently allocates to the BIRCH model. If T is too small, the number of clusters becomes very large and the number of tree nodes grows with it, so memory may run out before all data points have been scanned. The translation accuracy is also proportional to the value of T and to the size of the allocated memory; here the memory must be no less than 100 TB.
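The effect of the threshold T on the number of clusters can be illustrated with a small sketch. It assumes the standard BIRCH clustering feature CF = (N, LS, SS), which this text does not spell out, and it flattens the tree into a single list of leaf entries, so the balance factors B and L are ignored. The names `ClusteringFeature` and `build_leaf_entries` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ClusteringFeature:
    """CF = (N, LS, SS): point count, per-dimension linear sum, sum of squared norms."""
    n: int
    ls: List[float]
    ss: float

    @classmethod
    def from_point(cls, p: Sequence[float]) -> "ClusteringFeature":
        return cls(1, list(p), sum(x * x for x in p))

    def merged(self, other: "ClusteringFeature") -> "ClusteringFeature":
        # CFs are additive, so two sub-clusters combine without revisiting the points.
        return ClusteringFeature(
            self.n + other.n,
            [a + b for a, b in zip(self.ls, other.ls)],
            self.ss + other.ss,
        )

    def centroid(self) -> List[float]:
        return [x / self.n for x in self.ls]

    def radius(self) -> float:
        # Root-mean-square distance of the members from the centroid.
        c2 = sum(x * x for x in self.centroid())
        return math.sqrt(max(self.ss / self.n - c2, 0.0))


def build_leaf_entries(points: Sequence[Sequence[float]],
                       threshold: float) -> List[ClusteringFeature]:
    """Single-scan simplification: absorb each point into the closest entry
    if the merged radius stays within T, otherwise open a new entry."""
    entries: List[ClusteringFeature] = []
    for p in points:
        cf = ClusteringFeature.from_point(p)
        if entries:
            # Closest existing entry by Euclidean distance to its centroid.
            best = min(range(len(entries)),
                       key=lambda i: math.dist(entries[i].centroid(), p))
            candidate = entries[best].merged(cf)
            if candidate.radius() <= threshold:
                entries[best] = candidate
                continue
        entries.append(cf)
    return entries
```

With the four points (0, 0), (0.1, 0), (5, 5), and (5.1, 5), a threshold of 0.5 produces two entries, while a threshold of 0.01 produces four. This is the growth in cluster count, and therefore in memory use, that the paragraph above warns about when T is too small.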
Detailed description
In one embodiment, user A, who speaks standard Mandarin, says a sentence in Mandarin into the translator's audio input device. The speech is transmitted over the network to the BIRCH clustering algorithm model at the cloud computing center, where it is compared against the big data that has undergone deep learning. The resulting English audio is transmitted synchronously to the translator's audio output device, through which user B hears the simultaneous English interpretation of what user A said.
In another embodiment, English-speaking user B says a sentence in English into the translator's audio input device. The speech is transmitted over the network to the BIRCH clustering algorithm model at the cloud computing center, where it is compared against the big data that has undergone deep learning. The resulting Mandarin audio is transmitted synchronously to the translator's audio output device, through which user A hears the simultaneous Mandarin interpretation of what user B said.
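The data flow in these two embodiments can be sketched as follows. This is a minimal illustration only: `AudioClip`, `interpret`, and `echo_cloud_model` are hypothetical names, the language labels "zh" and "en" are assumptions, and the stand-in model merely relabels the audio rather than translating it.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class AudioClip:
    language: str  # "zh" for Mandarin, "en" for English (labels are assumptions)
    samples: bytes


# A cloud model takes a clip and a target language and returns translated audio.
CloudModel = Callable[[AudioClip, str], AudioClip]

# Mandarin input is interpreted into English, and English into Mandarin.
TARGET_LANGUAGE = {"zh": "en", "en": "zh"}


def interpret(clip: AudioClip, cloud_model: CloudModel) -> AudioClip:
    """Pass a clip from the audio input device to the cloud-side model and
    return the audio intended for the audio output device."""
    target = TARGET_LANGUAGE[clip.language]
    return cloud_model(clip, target)


def echo_cloud_model(clip: AudioClip, target: str) -> AudioClip:
    """Hypothetical placeholder: relabels the clip instead of translating it."""
    return AudioClip(language=target, samples=clip.samples)


if __name__ == "__main__":
    # User A speaks Mandarin; user B receives audio labelled as English.
    heard_by_b = interpret(AudioClip("zh", b"\x00\x01"), echo_cloud_model)
    print(heard_by_b.language)
```

Keeping the direction lookup on the device side and the model behind a single callable mirrors the split between the input and output devices and the cloud computing center described above.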
Claims (5)
1. A simultaneous interpretation case for translating Mandarin and English using the BIRCH clustering algorithm, characterized in that it comprises seven components: 1) the main body of the simultaneous interpretation case; 2) the accessories of the case; 3) a large collected database of Mandarin audio; 4) a large collected database of English audio; 5) a large database of English words formed by permutations and combinations of the 26 basic letters, together with their definitions and grammar rules; 6) a large database of Chinese character structures built from radicals, together with word-formation grammar; 7) a translation model based on the BIRCH algorithm (Balanced Iterative Reducing and Clustering using Hierarchies).
2. Component 1) according to claim 1, characterized in that nobody needs to sit inside the simultaneous interpretation case of the invention, so it is far smaller than a traditional simultaneous interpretation case and saves space in the conference venue.
3. The simultaneous interpretation case for translating Mandarin and English using the BIRCH clustering algorithm according to claim 1, characterized in that the components are divided into user-side physical components and server-side cloud computing components. The user-side components are components 1) and 2) of claim 1; the server-side cloud computing components are components 3), 4), 5), 6), and 7) of claim 1. Component 7) must first perform BIRCH clustering analysis and deep learning on the big data of components 3), 4), 5), and 6); only then can it translate the voice data input through component 1) and transmit the translated voice to component 1).
4. The simultaneous interpretation case for translating Mandarin and English using the BIRCH clustering algorithm according to claim 1, characterized in that it comprises the following steps:
Step 1: collect big data on English letters and grammar and on Chinese characters and grammar;
Step 2: collect English speech big data and Chinese speech big data;
Step 3: scan all data in the large databases and build the initial clustering tree, i.e. the CF tree; dense data is divided into clusters, and sparse data is treated as isolated points;
Step 4: the global or semi-global clustering algorithm in BIRCH places requirements on its input range; refine the CF tree accordingly by building several smaller CF trees;
Step 5: to remedy the splits caused by input order and page size, cluster all leaf nodes with the global or semi-global algorithm;
Step 6: using the center points from Step 5 as seeds, reassign the data points to their nearest seeds, which ensures that duplicate data falls into the same cluster; cluster labels are added at the same time to make the translation more accurate;
Step 7: through Steps 3 to 6, the BIRCH clustering model completes deep learning for translating Mandarin audio and English audio data, at which point the BIRCH translation model is built. No fewer than 10,000 English audio clips are input through component 1) of claim 1, translated by the BIRCH translation model, and output as audio from component 2) of claim 1, and the translation accuracy is measured. No fewer than 10,000 Mandarin audio clips are then input through component 1) of claim 1, translated by the BIRCH translation model, and output as audio from component 2) of claim 1, and the translation accuracy is measured. If in both tests the consecutive interpretation accuracy is above 95% and the simultaneous interpretation accuracy is above 70%, the BIRCH clustering model has been trained successfully and can be put into use. If the accuracy falls short, Steps 3 to 6 are repeated and the deep learning time of the BIRCH clustering model is extended until the translation accuracy meets the standard.
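Steps 6 and 7 lend themselves to a short sketch. The code below assumes Euclidean distance for the reassignment in Step 6 and reads "higher than" as a strict inequality in Step 7. The function and constant names are hypothetical, and how the accuracies are measured is outside the sketch.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def assign_to_nearest_seed(points: Sequence[Point], seeds: Sequence[Point]) -> List[int]:
    """Step 6: label each data point with the index of its nearest seed.
    Identical points always receive the same label."""
    return [
        min(range(len(seeds)), key=lambda k: math.dist(p, seeds[k]))
        for p in points
    ]


# Thresholds taken from Step 7 of the claim.
MIN_TEST_CLIPS = 10_000
CONSECUTIVE_MIN = 0.95
SIMULTANEOUS_MIN = 0.70


def model_ready(n_english: int, n_mandarin: int,
                consecutive_acc: float, simultaneous_acc: float) -> bool:
    """Step 7: the model passes only if both test sets are large enough and
    both accuracy figures exceed their thresholds."""
    enough_clips = n_english >= MIN_TEST_CLIPS and n_mandarin >= MIN_TEST_CLIPS
    return (enough_clips
            and consecutive_acc > CONSECUTIVE_MIN
            and simultaneous_acc > SIMULTANEOUS_MIN)
```

When `model_ready` returns `False`, the claim calls for repeating Steps 3 to 6 with a longer training time and testing again.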
5. The simultaneous interpretation case for translating Mandarin and English using the BIRCH clustering algorithm according to claim 1, characterized in that the Euclidean distance function and the Manhattan distance function are used when building the BIRCH clustering tree, with the specific formulas as follows:
The structure of the CF tree is similar to that of a B-tree. It has three parameters: the internal node balance factor B, the leaf node balance factor L, and the cluster radius threshold T;
Each node in the tree contains at most B child nodes, written (CFi, CHILDi) with 1 <= i <= B, where CFi is the i-th clustering feature in the node and CHILDi points to the node's i-th child, which corresponds to that i-th clustering feature;
One point needs particular attention: when building the CF tree, the cluster radius threshold T is an important parameter because it determines the size of the tree, which lets the CF tree fit into the memory that the cloud computing center currently allocates to the BIRCH model;
If T is too small, the number of clusters becomes very large and the number of tree nodes grows with it, so memory may run out before all data points have been scanned. The translation accuracy is also proportional to the value of T and to the size of the allocated memory; here the memory must be no less than 100 TB.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710173403.XA CN108628847A (en) | 2017-03-22 | 2017-03-22 | A kind of simultaneous interpretation case for translating mandarin and English using BIRCH clustering algorithms |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710173403.XA CN108628847A (en) | 2017-03-22 | 2017-03-22 | A kind of simultaneous interpretation case for translating mandarin and English using BIRCH clustering algorithms |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108628847A (en) | 2018-10-09 |
Family
ID=63707322
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710173403.XA Pending CN108628847A (en) | 2017-03-22 | 2017-03-22 | A kind of simultaneous interpretation case for translating mandarin and English using BIRCH clustering algorithms |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108628847A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113077818A (en) * | 2021-04-08 | 2021-07-06 | 焦作大学 | Voice comparison system for English translation |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101079028A (en) * | 2007-05-29 | 2007-11-28 | 中国科学院计算技术研究所 | On-line translation model selection method of statistic machine translation |
CN204305145U (en) * | 2014-12-25 | 2015-04-29 | 小蝌蚪旅游俱乐部(湖南)有限公司 | The multilingual simultaneous interpretation mobile phone of a kind of mobile interchange intelligence |
Non-Patent Citations (1)
Title |
---|
熊赟 等: "《大数据挖掘》", 30 April 2016, 上海:上海科学技术出版社 * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20181009 |