CN108628841A - App for translating between Cantonese-accented speech and English based on the BIRCH clustering algorithm - Google Patents
App for translating between Cantonese-accented speech and English based on the BIRCH clustering algorithm
- Publication number
- CN108628841A (application CN201710172504.5A)
- Authority
- CN
- China
- Prior art keywords
- translation
- english
- birch
- tree
- translated
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/58—Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M1/00—Substation equipment, e.g. for use by subscribers
- H04M1/72—Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
- H04M1/724—User interfaces specially adapted for cordless or mobile telephones
- H04M1/72403—User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality
- H04M1/72406—User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality by software upgrading or downloading
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Human Computer Interaction (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Machine Translation (AREA)
Abstract
The invention discloses an app, based on the BIRCH clustering algorithm, for translating between Cantonese-accented speech and English. It consists of three components: 1) a mobile app client; 2) a large Cantonese-English translation database hosted at a cloud computing center; and 3) a BIRCH clustering algorithm module running at the cloud computing center. With these components, the invention can take the place of a professional senior simultaneous interpreter and translate for the user. The benefits are: long translation sessions do not lead to fatigue-induced errors; the cost of hiring a simultaneous interpreter is greatly reduced; and, because the invention is an app on the user's smartphone, it is easy to carry and no accompanying interpreter is needed.
Description
Technical field
The present invention relates to the combination of the BIRCH clustering algorithm with mobile app technology, and more particularly to an app that translates between Cantonese-accented speech and English based on the BIRCH clustering algorithm.
Background
As internationalization accelerates, the demand for translation keeps growing. Existing simultaneous interpretation is performed by people. The work of a professional simultaneous interpreter is labor-intensive, and translation accuracy is easily affected by the interpreter's physical condition: at an international conference that runs long, the interpreter's strength and energy are steadily depleted, and fatigue lowers the accuracy of the translation. For individuals traveling abroad, professional simultaneous interpreters command high fees, so most ordinary people find it hard to afford traveling with one. Finally, for speakers with a heavy Cantonese accent whose Mandarin is non-standard, an interpreter from abroad who does not understand the Cantonese accent can easily make mistakes that cause losses.
Summary of the invention
The main technical problem solved by the invention is to provide an app that translates between Cantonese-accented speech and English based on the BIRCH clustering algorithm. The app can take the place of a highly paid senior interpreter, gives the user translation free of the errors that fatigue causes in long sessions, and recognizes the user's Cantonese accent, avoiding the awkward situation in which the user does not speak standard Mandarin and the interpreter does not understand the Cantonese accent.
To solve the above technical problem, the invention adopts the following technical solution: an app for translating between Cantonese-accented speech and English based on the BIRCH clustering algorithm, characterized in that it comprises three components: 1) a mobile app client; 2) a large Cantonese-English translation database at a cloud computing center; and 3) a BIRCH clustering algorithm module running at the cloud computing center. With these three components, the invention can take the place of a highly paid senior English-Chinese simultaneous interpreter and give the user a low-cost translation service that is not subject to fatigue, can sustain high-quality translation over long periods, and translates Cantonese speech into English or English into Cantonese speech. Because the user-facing side of the invention is a mobile app, it is also very convenient for the user to carry.
When building the BIRCH clustering tree, the app for translating between Cantonese-accented speech and English based on the BIRCH clustering algorithm uses the Euclidean distance function and the Manhattan distance function as its distance measures; the specific formulas are as follows:
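Assuming the standard definitions of the two distance functions, for two d-dimensional feature vectors (for example, two cluster centroids) x = (x_1, ..., x_d) and y = (y_1, ..., y_d), they can be written as below. The symbols x, y, d, and k are notation introduced here for illustration.

```latex
d_{\text{Euclid}}(x, y) = \sqrt{\sum_{k=1}^{d} (x_k - y_k)^2},
\qquad
d_{\text{Manhattan}}(x, y) = \sum_{k=1}^{d} \lvert x_k - y_k \rvert
```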
The structure of a CF tree is similar to that of a B-tree, and it has three parameters: the internal-node balance factor (branching factor) B, the leaf-node balance factor (branching factor) L, and the cluster radius threshold T. Each internal node in the tree contains at most B child entries, written (CF_i, CHILD_i), 1 <= i <= B, where CF_i is the i-th clustering feature in the node and CHILD_i points to the node's i-th child, which corresponds to that clustering feature. Note in particular that the cluster radius threshold T is an important parameter when building the CF tree, because it determines the size of the tree and thereby lets the tree fit into the memory that the cloud computing center currently allocates to the BIRCH model. If T is too small, the number of clusters becomes very large and the number of tree nodes grows with it, so memory may run out before all data points have been scanned. The translation accuracy is also stated to be proportional to the value of T and to the size of the allocated memory; the memory here must not be less than 100 TB.
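As a minimal illustration of how the threshold T controls the number of clusters, the following Python sketch keeps clustering features as (N, LS, SS) triples and absorbs each point into the nearest entry only if the merged radius stays within T. It is a flat simplification under stated assumptions, not the model described in this application: the (N, LS, SS) representation follows the standard BIRCH formulation, the balance factors B and L and node splitting are omitted, and the names `CF`, `build_leaf_entries`, and the sample points are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x: Sequence[float], y: Sequence[float]) -> float:
    """Manhattan (city-block) distance between two equal-length vectors."""
    return sum(abs(a - b) for a, b in zip(x, y))


@dataclass
class CF:
    """Clustering feature: point count N, linear sum LS, sum of squared norms SS."""
    n: int
    ls: List[float]
    ss: float

    @classmethod
    def from_point(cls, p: Sequence[float]) -> "CF":
        return cls(1, list(p), sum(v * v for v in p))

    def merged(self, other: "CF") -> "CF":
        # CF additivity: merging two clusters just adds their components.
        return CF(self.n + other.n,
                  [a + b for a, b in zip(self.ls, other.ls)],
                  self.ss + other.ss)

    def centroid(self) -> List[float]:
        return [v / self.n for v in self.ls]

    def radius(self) -> float:
        # Root-mean-square distance of members to the centroid,
        # clamped at zero to guard against rounding error.
        c2 = sum(v * v for v in self.centroid())
        return math.sqrt(max(self.ss / self.n - c2, 0.0))


def build_leaf_entries(points, threshold: float) -> List[CF]:
    """Assumed flat simplification of CF-tree insertion (no B/L limits, no splits)."""
    entries: List[CF] = []
    for p in points:
        new = CF.from_point(p)
        if entries:
            i = min(range(len(entries)),
                    key=lambda j: euclidean(entries[j].centroid(), p))
            candidate = entries[i].merged(new)
            if candidate.radius() <= threshold:
                entries[i] = candidate  # absorb the point into the nearest entry
                continue
        entries.append(new)  # otherwise start a new entry
    return entries


# Hypothetical sample data: two tight groups of three points each.
points = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2),
          (5.0, 5.0), (5.2, 5.0), (5.0, 5.2)]
coarse = build_leaf_entries(points, threshold=1.0)
fine = build_leaf_entries(points, threshold=0.05)
print(len(coarse), len(fine))
```

With these six sample points, T = 1.0 yields two entries while T = 0.05 yields six, which is consistent with the observation that a smaller T produces more clusters and therefore more tree nodes.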
Detailed description
In one embodiment, user A, who speaks with a Cantonese accent, says a sentence in Cantonese into the translator's audio input device. The speech is sent over the network to the BIRCH clustering algorithm model at the cloud computing center, which compares it against the big data it has learned through deep learning. The resulting English audio is then transmitted synchronously to the translator's audio output device, through which user B hears the English rendering, a simultaneous interpretation of what user A said.
In another embodiment, English-speaking user B says a sentence in English into the translator's audio input device. The speech is sent over the network to the BIRCH clustering algorithm model at the cloud computing center, which compares it against the big data it has learned through deep learning. The resulting Cantonese-accented audio is then transmitted synchronously to the translator's audio output device, through which user A hears the Cantonese rendering, a simultaneous interpretation of what user B said.
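The two embodiments are symmetric and can be sketched as a single client-side round trip. The sketch below is an illustration under assumptions only: `Lang`, `AudioClip`, `relay`, and `fake_cloud` are hypothetical names, and `fake_cloud` merely stands in for the BIRCH translation model at the cloud computing center, which is not implemented here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Lang(Enum):
    CANTONESE = "yue"
    ENGLISH = "en"


@dataclass
class AudioClip:
    lang: Lang
    payload: bytes  # raw audio bytes; the encoding is left unspecified


# Hypothetical interface of the cloud-side model: (input clip, target language) -> output clip.
CloudTranslator = Callable[[AudioClip, Lang], AudioClip]


def target_language(source: Lang) -> Lang:
    """Cantonese input is rendered in English, and English input in Cantonese."""
    return Lang.ENGLISH if source is Lang.CANTONESE else Lang.CANTONESE


def relay(clip: AudioClip, cloud: CloudTranslator) -> AudioClip:
    """Send the captured clip to the cloud model and return the audio to play back."""
    return cloud(clip, target_language(clip.lang))


def fake_cloud(clip: AudioClip, target: Lang) -> AudioClip:
    # Placeholder for the cloud model: tags the payload instead of translating it.
    return AudioClip(target, b"translated:" + clip.payload)


heard_by_b = relay(AudioClip(Lang.CANTONESE, b"user-a-speech"), fake_cloud)
heard_by_a = relay(AudioClip(Lang.ENGLISH, b"user-b-speech"), fake_cloud)
print(heard_by_b.lang, heard_by_a.lang)
```

Passing the cloud model in as a callable keeps the client logic independent of how the server side is realized.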
Claims (7)
1. An app for translating between Cantonese-accented speech and English based on the BIRCH clustering algorithm, characterized in that it comprises three components: 1) a mobile app client; 2) a large Cantonese-English translation database at a cloud computing center; and 3) a BIRCH clustering algorithm module running at the cloud computing center.
2. Component 1) according to claim 1, characterized in that: the mobile app client must be installed on the user's smartphone; the phone must be connected to the internet during use; and the user must wear earphones connected to the phone to listen to the translated audio.
3. Component 2) according to claim 1, specifically comprising: a large database of Cantonese-accented audio; a large database of English audio; a large database of English words formed from combinations of the 26 basic letters, with their definitions and grammar rules; a large database of written Chinese character structures built from radicals and components, with word-formation grammar; and a large database of technical terms from no fewer than 10 industries.
4. Component 3) according to claim 1, being a translation model running at the cloud computing center that uses balanced iterative reducing and clustering using hierarchies, i.e. the BIRCH clustering algorithm; the model must take as input the various kinds of translation big data in component 2) and complete deep learning on the big data of Cantonese-accented and English pronunciation before it can translate.
5. The app for translating between Cantonese-accented speech and English based on the BIRCH clustering algorithm according to claim 1, wherein the translation process is: the app captures the Cantonese-accented or English audio spoken by the user and transmits it to the cloud computing center; the BIRCH clustering algorithm translation model, having completed deep learning on the translation big data, translates it; the translated audio is then transmitted synchronously back to the user's app; and the user listens to the translated audio through earphones connected to the phone.
6. The app for translating between Cantonese-accented speech and English based on the BIRCH clustering algorithm according to claim 1, characterized in that it comprises the following steps:
Step 1: collect big data on English letters and grammar and on written Chinese and its grammar;
Step 2: collect big data on English speech and on Chinese speech;
Step 3: scan all data in the large database and build the initial clustering tree, i.e. the CF tree, in which dense data are grouped into clusters and sparse data are treated as outliers;
Step 4: because the global or semi-global clustering algorithm in BIRCH places requirements on its input range, refine the CF tree accordingly by building several smaller CF trees;
Step 5: to remedy splits caused by input order and page size, cluster all leaf nodes with the global/semi-global algorithm;
Step 6: using the center points from Step 5 as seeds, reassign the data points to their nearest seeds, which ensures that duplicate data are assigned to the same cluster, and add cluster labels to improve translation accuracy;
Step 7: through Steps 3 to 6 the BIRCH clustering model completes deep learning on the translation data for Cantonese-accented audio and English audio, at which point the BIRCH translation model is built. Input no fewer than 10,000 English audio clips into component 1) according to claim 1, have the BIRCH translation model translate them, output the audio from component 2) according to claim 1, and measure the translation accuracy. Then input no fewer than 10,000 Cantonese utterances into component 1) according to claim 1, have the BIRCH translation model translate them, output the audio from component 2) according to claim 1, and measure the translation accuracy. If in both tests the consecutive-interpretation accuracy is higher than 95% and the simultaneous-interpretation accuracy is higher than 70%, the BIRCH clustering model has been trained successfully and can be put into use; if the accuracy is lower, repeat Steps 3 to 6 and extend the deep-learning time of the BIRCH clustering model until the translation accuracy meets the standard.
7. The app for translating between Cantonese-accented speech and English based on the BIRCH clustering algorithm according to claim 1, wherein the distance measures used when building the BIRCH clustering tree are the Euclidean distance function and the Manhattan distance function, the specific formulas being as follows:
The structure of a CF tree is similar to that of a B-tree, and it has three parameters: the internal-node balance factor (branching factor) B, the leaf-node balance factor (branching factor) L, and the cluster radius threshold T. Each internal node in the tree contains at most B child entries, written (CF_i, CHILD_i), 1 <= i <= B, where CF_i is the i-th clustering feature in the node and CHILD_i points to the node's i-th child, which corresponds to that clustering feature. Note in particular that the cluster radius threshold T is an important parameter when building the CF tree, because it determines the size of the tree and thereby lets the tree fit into the memory that the cloud computing center currently allocates to the BIRCH model. If T is too small, the number of clusters becomes very large and the number of tree nodes grows with it, so memory may run out before all data points have been scanned. The translation accuracy is also stated to be proportional to the value of T and to the size of the allocated memory; the memory here must not be less than 100 TB.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710172504.5A CN108628841A (en) | 2017-03-22 | 2017-03-22 | App for translating between Cantonese-accented speech and English based on the BIRCH clustering algorithm |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108628841A true CN108628841A (en) | 2018-10-09 |
Family
ID=63706913
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710172504.5A Pending CN108628841A (en) | 2017-03-22 | 2017-03-22 | App for translating between Cantonese-accented speech and English based on the BIRCH clustering algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108628841A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109961794A (en) * | 2019-01-14 | 2019-07-02 | 湘潭大学 | A kind of layering method for distinguishing speek person of model-based clustering |
CN113466578A (en) * | 2021-05-27 | 2021-10-01 | 中能瑞通(北京)科技有限公司 | Rural power grid distribution area box table topological relation identification method and user electricity utilization monitoring method |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101079028A (en) * | 2007-05-29 | 2007-11-28 | 中国科学院计算技术研究所 | On-line translation model selection method of statistic machine translation |
CN204305145U (en) * | 2014-12-25 | 2015-04-29 | 小蝌蚪旅游俱乐部(湖南)有限公司 | The multilingual simultaneous interpretation mobile phone of a kind of mobile interchange intelligence |
Non-Patent Citations (1)
Title |
---|
熊赟 等: "《大数据挖掘》", 30 April 2016, 上海:上海科学技术出版社 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Godard et al. | A very low resource language speech corpus for computational language documentation experiments | |
KR102260216B1 (en) | Intelligent voice recognizing method, voice recognizing apparatus, intelligent computing device and server | |
US8571849B2 (en) | System and method for enriching spoken language translation with prosodic information | |
Wang et al. | Word embedding for recurrent neural network based TTS synthesis | |
WO2018153213A1 (en) | Multi-language hybrid speech recognition method | |
CN108231062B (en) | Voice translation method and device | |
CN109741732A (en) | Name entity recognition method, name entity recognition device, equipment and medium | |
CN110852075B (en) | Voice transcription method and device capable of automatically adding punctuation marks and readable storage medium | |
CN103971686A (en) | Method and system for automatically recognizing voice | |
CN114416934B (en) | Multi-modal dialog generation model training method and device and electronic equipment | |
CN109448699A (en) | Voice converting text method, apparatus, computer equipment and storage medium | |
CN111433847A (en) | Speech conversion method and training method, intelligent device and storage medium | |
CN110264992A (en) | Speech synthesis processing method, device, equipment and storage medium | |
CN104575497A (en) | Method for building acoustic model and speech decoding method based on acoustic model | |
CN109829173A (en) | A kind of English place name interpretation method and device | |
Kumar et al. | Translations of the CALLHOME Egyptian Arabic corpus for conversational speech translation | |
US20230127787A1 (en) | Method and apparatus for converting voice timbre, method and apparatus for training model, device and medium | |
CN110335608A (en) | Voice print verification method, apparatus, equipment and storage medium | |
CN110010136A (en) | The training and text analyzing method, apparatus, medium and equipment of prosody prediction model | |
CN108628836A (en) | The robot of voiced translation is carried out using artificial intelligence BP neural network algorithm | |
Costa-Jussà et al. | Evaluating gender bias in speech translation | |
CN113836945B (en) | Intention recognition method, device, electronic equipment and storage medium | |
CN108628841A (en) | The APP of Guangdong language accent and English is translated based on BIRCH clustering algorithms | |
CN104217039A (en) | Method and system for recording telephone conversations in real time and converting telephone conversations into declarative sentences | |
TW201937479A (en) | Multilingual mixed speech recognition method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20181009 |