CN108628847A - Simultaneous interpretation booth that translates between Mandarin and English using the BIRCH clustering algorithm - Google Patents

Info

Publication number
CN108628847A
Authority
CN
China
Prior art keywords
english
simultaneous interpretation
birch
translation
mandarin
Prior art date
Legal status
Pending
Application number
CN201710173403.XA
Other languages
Chinese (zh)
Inventor
Qiu Nian (邱念)
Current Assignee
Hunan Original Culture Development Co Ltd
Original Assignee
Hunan Original Culture Development Co Ltd
Priority date
2017-03-22
Filing date
2017-03-22
Publication date
2018-10-09
Application filed by Hunan Original Culture Development Co Ltd
Priority to CN201710173403.XA
Publication of CN108628847A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/55 Rule-based translation
    • G06F40/56 Natural language generation

Abstract

The invention discloses a simultaneous interpretation booth that translates between Mandarin and English using the BIRCH clustering algorithm. It has seven components: 1) the booth main body; 2) booth accessories; 3) a large database of collected Mandarin audio; 4) a large database of collected English audio; 5) a large database of English words formed from combinations of the 26 basic letters, together with their definitions and grammatical rules; 6) a large database of Chinese character structures and word-formation grammar built from radicals; and 7) a translation model based on BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies). With these components, the invention can replace professional senior simultaneous interpreters and translate for users. It offers three benefits. First, long translation sessions do not produce fatigue-induced errors. Second, the cost of hiring simultaneous interpreters is greatly reduced. Third, no one sits inside the booth, so it is very small. This saves space at international conference venues, which allows more participant seats and more attendees.

Description

Simultaneous interpretation booth that translates between Mandarin and English using the BIRCH clustering algorithm
Technical field
The present invention relates to the field of translation using the BIRCH clustering algorithm. More particularly, it relates to a simultaneous interpretation booth that translates between Mandarin and English using the BIRCH clustering algorithm.
Background
As internationalization accelerates, demand for English-Chinese simultaneous interpretation keeps growing. Existing simultaneous interpretation is performed by people. The work is physically demanding for professional interpreters, and translation accuracy is easily affected by their physical condition. At international conferences, long meetings steadily drain an interpreter's stamina and concentration, and the resulting fatigue lowers translation accuracy.
Summary of the invention
The main technical problem solved by the invention is to provide a simultaneous interpretation booth that translates between Mandarin and English using the BIRCH clustering algorithm. The booth can replace highly paid senior interpreters and gives users translation free of the errors that fatigue causes during long sessions.
To solve this technical problem, one technical solution adopted by the invention is a simultaneous interpretation booth that translates between Mandarin and English using the BIRCH clustering algorithm. It is characterized by seven components: 1) an audio input device for Mandarin; 2) an audio output device for the English translation; 3) a large database of collected Mandarin audio; 4) a large database of collected English audio; 5) a large database of English words formed from combinations of the 26 basic letters, together with their definitions and grammatical rules; 6) a large database of Chinese character structures and word-formation grammar built from radicals; and 7) a translation model based on BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies). With these seven components, the invention can replace highly paid senior English-Chinese simultaneous interpreters. It gives users inexpensive, high-quality translation over long periods without concern about fatigue, from Mandarin into English or from English into Mandarin.
In this simultaneous interpretation booth, the BIRCH clustering tree is built using the Euclidean distance function and the Manhattan distance function, given by the following formulas:
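One possible reading of these two distance functions comes from the original BIRCH paper (Zhang, Ramakrishnan and Livny, 1996). That paper summarizes each cluster as a cluster feature CF = (N, LS, SS), where N is the point count, LS is the linear sum and SS is the sum of squared norms. It defines the centroid Euclidean distance D0 and the centroid Manhattan distance D1 between two clusters. The Python sketch below computes both from CF entries. Treating these as the booth's intended formulas is an assumption, and the names `CF`, `d0_euclidean` and `d1_manhattan` are illustrative only.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class CF:
    """Cluster feature (N, LS, SS): point count, per-dimension linear sum,
    and scalar sum of squared norms (assumed standard BIRCH definition)."""
    n: int
    ls: List[float]
    ss: float

    @classmethod
    def from_point(cls, p):
        return cls(1, list(p), sum(x * x for x in p))

    def merge(self, other):
        # CF additivity: the CF of the union is the component-wise sum.
        return CF(self.n + other.n,
                  [a + b for a, b in zip(self.ls, other.ls)],
                  self.ss + other.ss)

    def centroid(self):
        return [x / self.n for x in self.ls]

    def radius(self):
        # R = sqrt(SS/N - ||LS/N||^2); clamp tiny negatives from rounding.
        c = self.centroid()
        return math.sqrt(max(self.ss / self.n - sum(x * x for x in c), 0.0))


def d0_euclidean(a, b):
    """D0: Euclidean distance between the centroids of two CF entries."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a.centroid(), b.centroid())))


def d1_manhattan(a, b):
    """D1: Manhattan distance between the centroids of two CF entries."""
    return sum(abs(x - y) for x, y in zip(a.centroid(), b.centroid()))


a = CF.from_point([0.0, 0.0]).merge(CF.from_point([2.0, 0.0]))
b = CF.from_point([4.0, 3.0])
print(a.centroid(), a.radius())                 # [1.0, 0.0] 1.0
print(d0_euclidean(a, b), d1_manhattan(a, b))   # about 4.2426 and 6.0
```

Storing (N, LS, SS) rather than raw points is the reason BIRCH can work in bounded memory. Merging two clusters only adds their features, and both distances need nothing more than the two centroids.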
The structure of the CF tree is similar to a B-tree, and it has three parameters: the internal-node balance factor B, the leaf-node balance factor L, and the cluster radius threshold T. Each node in the tree contains at most B child entries. Each entry is denoted (CFi, CHILDi), 1 ≤ i ≤ B, where CFi is the i-th cluster feature in the node and CHILDi points to the node's i-th child, which corresponds to that cluster feature. The cluster radius threshold T matters most when building the CF tree. It determines the size of the tree, and so it decides whether the tree fits in the memory that the cloud computing center allocates to the BIRCH model. If T is too small, the number of clusters becomes very large, which in turn increases the number of tree nodes. Memory may then run out before all data points have been scanned. Translation accuracy is also proportional to the value of T and to the size of the allocated memory, and here the memory must be no less than 100 TB.
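The sketch below shows why T controls the size of the tree. It implements only the leaf-level rule of absorbing a point or creating a new entry. A point joins its nearest cluster feature if the merged radius stays within T. Otherwise it opens a new entry. This is a deliberate simplification: it assumes a single flat leaf level with no B/L node splits. `build_leaf_entries` is a hypothetical helper and does not come from the patent.

```python
import math


def radius(n, ls, ss):
    """Radius of a cluster feature (N, LS, SS): sqrt(SS/N - ||LS/N||^2)."""
    c = [x / n for x in ls]
    return math.sqrt(max(ss / n - sum(x * x for x in c), 0.0))


def build_leaf_entries(points, T):
    """Simplified BIRCH insertion into one flat leaf level (no node splits).
    Each entry is a mutable list [N, LS, SS]."""
    entries = []
    for p in points:
        sq = sum(x * x for x in p)
        # Find the entry whose centroid is closest to the new point.
        best, best_d = None, math.inf
        for e in entries:
            d = math.dist([x / e[0] for x in e[1]], p)
            if d < best_d:
                best, best_d = e, d
        if best is not None:
            n = best[0] + 1
            ls = [a + b for a, b in zip(best[1], p)]
            ss = best[2] + sq
            # Absorb only if the merged cluster stays within threshold T.
            if radius(n, ls, ss) <= T:
                best[0], best[1], best[2] = n, ls, ss
                continue
        entries.append([1, list(p), sq])
    return entries


pts = [(0, 0), (0.2, 0), (5, 5), (5.2, 5), (10, 0)]
for T in (0.05, 0.5, 100):
    print(T, len(build_leaf_entries(pts, T)))
# prints: 0.05 5, then 0.5 3, then 100 1
```

On these sample points, lowering T from 0.5 to 0.05 raises the entry count from 3 to 5. This is the growth in clusters, and therefore in nodes and memory, that the text warns about. A very large T collapses everything into one entry, which loses all cluster structure.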
Detailed description of embodiments
In one embodiment, user A, who speaks standard Mandarin, says a sentence in Mandarin into the translator's audio input device. The speech is transmitted over the network to the BIRCH clustering model at the cloud computing center and compared against the big data obtained after deep learning. The audio translated into English is then transmitted synchronously to the translator's audio output device. User B uses the device to hear the simultaneous English interpretation of what user A said.
In another embodiment, English-speaking user B says a sentence in English into the translator's audio input device. The speech is transmitted over the network to the BIRCH clustering model at the cloud computing center and compared against the big data obtained after deep learning. The audio translated into Mandarin is then transmitted synchronously to the translator's audio output device. User A uses the device to hear the simultaneous Mandarin interpretation of what user B said.

Claims (5)

1. A simultaneous interpretation booth that translates between Mandarin and English using the BIRCH clustering algorithm, characterized in that it comprises seven components: 1) the booth main body; 2) booth accessories; 3) a large database of collected Mandarin audio; 4) a large database of collected English audio; 5) a large database of English words formed from combinations of the 26 basic letters, together with their definitions and grammatical rules; 6) a large database of Chinese character structures and word-formation grammar built from radicals; and 7) a translation model based on BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies).
2. Component 1) according to claim 1, characterized in that no person sits inside the booth of the invention, so its volume is far smaller than that of a traditional simultaneous interpretation booth, saving space in the conference room.
3. The simultaneous interpretation booth that translates between Mandarin and English using the BIRCH clustering algorithm according to claim 1, characterized in that its components are divided into client-side physical components and server-side cloud computing components. The client-side components are 1) and 2) of claim 1. The server-side cloud computing components are 3), 4), 5), 6) and 7) of claim 1. Component 7) must first perform BIRCH clustering and deep learning on the big data in components 3), 4), 5) and 6). Only then can it translate the voice data input from component 1), after which it transmits the translated speech back to component 1).
4. The simultaneous interpretation booth that translates between Mandarin and English using the BIRCH clustering algorithm according to claim 1, characterized in that it comprises the following steps:
Step 1: collect big data on English letters and grammar, and on Chinese characters and grammar;
Step 2: collect big data on English speech and on Chinese speech;
Step 3: scan all the data in the large databases and build an initial clustering tree, i.e. the CF tree; dense data are grouped into clusters, and sparse data are treated as isolated points (outliers);
Step 4: because the global or semi-global clustering algorithm in BIRCH places requirements on the size of its input, refine the CF tree accordingly by building smaller CF trees;
Step 5: to correct splits caused by the input order and the page size, apply a global or semi-global algorithm to cluster all the leaf nodes;
Step 6: using the centroids from Step 5 as seeds, reassign each data point to its nearest seed, which ensures that duplicate data are assigned to the same cluster; cluster labels are added at the same time to make the translation more accurate;
Step 7: Steps 3 to 6 complete the deep learning by which the BIRCH clustering model translates Mandarin audio and English speech audio, and the BIRCH translation model is then built. Next, input no fewer than 10,000 English audio clips through component 1) of claim 1, translate them with the BIRCH translation model, output the result from component 2) of claim 1, and measure the translation accuracy. Then input no fewer than 10,000 Mandarin clips through component 1) of claim 1, translate them with the BIRCH translation model, output the result from component 2) of claim 1, and measure the translation accuracy. If both tests show consecutive interpretation accuracy above 95% and simultaneous interpretation accuracy above 70%, the BIRCH clustering model has been trained successfully and can be put into use. If accuracy is lower, repeat Steps 3 to 6 and extend the deep-learning time of the BIRCH clustering model, stopping once the translation accuracy meets the standard.
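Steps 3, 5 and 6 mirror the phases of the standard BIRCH procedure. First the data are summarized into leaf cluster features. Then those leaf entries are clustered globally. Finally the raw points are redistributed to the nearest resulting centroid. The sketch below chains these phases together on small numeric vectors. It is a simplified illustration under stated assumptions and is not the patent's model:

- the tree is flattened to a single leaf level;
- Step 4 is approximated by doubling T and rebuilding from the raw points;
- centroid-linkage agglomerative clustering stands in for the unspecified global algorithm;
- entries with fewer than `min_n` points are treated as the sparse "isolated points".

All function names are hypothetical. The sketch does not model how audio would be converted into such vectors.

```python
import math


def leaf_entries(points, T):
    """Step 3 (simplified): a point joins the nearest entry if the merged
    radius stays <= T; otherwise it opens a new entry [N, LS, SS]."""
    entries = []
    for p in points:
        sq = sum(x * x for x in p)
        best = min(entries, default=None,
                   key=lambda e: math.dist([x / e[0] for x in e[1]], p))
        if best is not None:
            n, ls, ss = best[0] + 1, [a + b for a, b in zip(best[1], p)], best[2] + sq
            # Compare squared radius SS/N - ||LS/N||^2 against T^2.
            if ss / n - sum((x / n) ** 2 for x in ls) <= T * T:
                best[:] = [n, ls, ss]
                continue
        entries.append([1, list(p), sq])
    return entries


def centroid(e):
    return [x / e[0] for x in e[1]]


def global_cluster(entries, k):
    """Step 5 (stand-in): centroid-linkage agglomerative clustering that
    merges the two closest cluster features until k remain."""
    clusters = [list(e) for e in entries]
    while len(clusters) > k:
        _, i, j = min((math.dist(centroid(clusters[i]), centroid(clusters[j])), i, j)
                      for i in range(len(clusters))
                      for j in range(i + 1, len(clusters)))
        a, b = clusters[i], clusters.pop(j)  # j > i, so index i stays valid
        clusters[i] = [a[0] + b[0], [x + y for x, y in zip(a[1], b[1])], a[2] + b[2]]
    return [centroid(c) for c in clusters]


def redistribute(points, seeds):
    """Step 6: label each point with its nearest seed, so identical
    points always land in the same cluster."""
    return [min(range(len(seeds)), key=lambda s: math.dist(p, seeds[s]))
            for p in points]


def birch_pipeline(points, T, k, max_entries=100, min_n=2):
    entries = leaf_entries(points, T)
    # Step 4 (approximation): too many entries -> coarser threshold, rebuild.
    while len(entries) > max_entries:
        T *= 2
        entries = leaf_entries(points, T)
    # Sparse entries are set aside as isolated points before global clustering.
    dense = [e for e in entries if e[0] >= min_n]
    if len(dense) < k:
        dense = entries
    seeds = global_cluster(dense, k)
    return seeds, redistribute(points, seeds)


pts = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (5, 5.1), (20, 20)]
seeds, labels = birch_pipeline(pts, T=0.5, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1, 1]
```

Here (20, 20) forms a one-point leaf entry, so it is excluded when the seeds are chosen. In the redistribution step it still receives a label from the nearest seed, which is how the final pass can correct for input order.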
5. The simultaneous interpretation booth that translates between Mandarin and English using the BIRCH clustering algorithm according to claim 1, wherein the BIRCH clustering tree is built using the Euclidean distance function and the Manhattan distance function, given by the following formulas:
The structure of the CF tree is similar to a B-tree, and it has three parameters: the internal-node balance factor B, the leaf-node balance factor L, and the cluster radius threshold T;
Each node in the tree contains at most B child entries, denoted (CFi, CHILDi), 1 ≤ i ≤ B, where CFi is the i-th cluster feature in the node and CHILDi points to the node's i-th child, which corresponds to that cluster feature. The cluster radius threshold T matters most when building the CF tree, because it determines the size of the tree and therefore whether the tree fits in the memory that the cloud computing center allocates to the BIRCH model;
If T is too small, the number of clusters becomes very large, which in turn increases the number of tree nodes, so memory may run out before all data points have been scanned. Translation accuracy is also proportional to the value of T and to the size of the allocated memory, and here the memory must be no less than 100 TB.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710173403.XA CN108628847A (en) 2017-03-22 2017-03-22 Simultaneous interpretation booth that translates between Mandarin and English using the BIRCH clustering algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710173403.XA CN108628847A (en) 2017-03-22 2017-03-22 Simultaneous interpretation booth that translates between Mandarin and English using the BIRCH clustering algorithm

Publications (1)

Publication Number Publication Date
CN108628847A 2018-10-09

Family

ID=63707322

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710173403.XA Simultaneous interpretation booth that translates between Mandarin and English using the BIRCH clustering algorithm 2017-03-22 2017-03-22 Pending CN108628847A (en)

Country Status (1)

Country Link
CN (1) CN108628847A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113077818A (en) * 2021-04-08 2021-07-06 Jiaozuo University (焦作大学) Voice comparison system for English translation

Citations (2)

Publication number Priority date Publication date Assignee Title
CN101079028A (en) * 2007-05-29 2007-11-28 Institute of Computing Technology, Chinese Academy of Sciences (中国科学院计算技术研究所) Online translation model selection method for statistical machine translation
CN204305145U (en) * 2014-12-25 2015-04-29 Xiaokedou Travel Club (Hunan) Co., Ltd. (小蝌蚪旅游俱乐部(湖南)有限公司) Intelligent mobile-internet multilingual simultaneous interpretation mobile phone

Non-Patent Citations (1)

Title
Xiong Yun (熊赟) et al.: 《大数据挖掘》 (Big Data Mining), 30 April 2016, Shanghai: Shanghai Scientific and Technical Publishers (上海科学技术出版社) *

Similar Documents

Publication Publication Date Title
CN101826325B (en) Method and device for identifying Chinese and English speech signal
US9472190B2 (en) Method and system for automatic speech recognition
WO2022095345A1 (en) Multi-modal model training method, apparatus, device, and storage medium
CN108231062B (en) Voice translation method and device
Wheatley et al. An evaluation of cross-language adaptation for rapid HMM development in a new language
CN108647214A (en) Coding/decoding method based on deep-neural-network translation model
CN110060690A (en) Many-to-many voice conversion method based on STARGAN and ResNet
CN108170749A (en) Dialogue method, device and computer-readable medium based on artificial intelligence
CN110335608A (en) Voiceprint verification method, apparatus, device and storage medium
CN104575497A (en) Method for building acoustic model and speech decoding method based on acoustic model
CN110060657A (en) Many-to-many voice conversion method based on SN
CN108628836A (en) Robot that performs speech translation using an artificial intelligence BP neural network algorithm
CN104239579A (en) Method for constructing multi-language phonetic symbol database, multi-language phonetic notation method and device
Le et al. G2G: TTS-driven pronunciation learning for graphemic hybrid ASR
Costa-Jussà et al. Evaluating gender bias in speech translation
CN108628847A (en) Simultaneous interpretation booth that translates between Mandarin and English using the BIRCH clustering algorithm
TW201937479A (en) Multilingual mixed speech recognition method
CN111898342A (en) Chinese pronunciation verification method based on edit distance
Wu et al. Feature based adaptation for speaking style synthesis
CN108628841A (en) App that translates between Cantonese-accented speech and English based on the BIRCH clustering algorithm
CN108717854A (en) Speaker recognition method based on optimized GFCC feature parameters
CN108628848A (en) Method for translating between Sichuan-accented speech and English with the BIRCH clustering algorithm
Ranzato Dubbing teenage speech into Italian: creative translation in Skins.
CN108197122B (en) Tibetan-Chinese personal name transliteration method based on syllable insertion
CN104599670B (en) Speech recognition method for a talking pen

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20181009