CN108628841A - APP for translating Cantonese-accented speech and English based on the BIRCH clustering algorithm - Google Patents

APP for translating Cantonese-accented speech and English based on the BIRCH clustering algorithm

Info

Publication number
CN108628841A
Authority
CN
China
Prior art keywords
translation
english
birch
tree
translated
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710172504.5A
Other languages
Chinese (zh)
Inventor
邱念
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan Original Culture Development Co Ltd
Original Assignee
Hunan Original Culture Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan Original Culture Development Co Ltd
Priority to CN201710172504.5A
Publication of CN108628841A
Current legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/40: Processing or translation of natural language
    • G06F40/58: Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04M: TELEPHONIC COMMUNICATION
    • H04M1/00: Substation equipment, e.g. for use by subscribers
    • H04M1/72: Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M1/724: User interfaces specially adapted for cordless or mobile telephones
    • H04M1/72403: User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality
    • H04M1/72406: User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality by software upgrading or downloading

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Human Computer Interaction (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses an APP, based on the BIRCH clustering algorithm, for translating between Cantonese-accented speech and English. It consists of three components: 1) a mobile phone APP client; 2) a large Cantonese-English translation database at a cloud computing center; and 3) a BIRCH clustering algorithm module hosted at the cloud computing center. With these components, the invention can stand in for a professional senior simultaneous interpreter and translate for the user. The benefits are: long translation sessions do not produce fatigue-induced errors; the cost of hiring a simultaneous interpreter is greatly reduced; and, because the invention is an app on the user's smartphone, it is easy to carry and no accompanying interpreter is needed.

Description

APP for translating Cantonese-accented speech and English based on the BIRCH clustering algorithm
Technical field
The present invention relates to the field that combines the BIRCH clustering algorithm with mobile phone APP technology, and more particularly to an APP that translates between Cantonese-accented speech and English based on the BIRCH clustering algorithm.
Background art
As internationalization accelerates, demand for translation keeps growing. Simultaneous interpretation is currently performed by people. Professional simultaneous interpreters work under heavy strain, and their accuracy is easily affected by their physical condition: at an international conference that runs long, the interpreter's strength and energy are steadily drained, and fatigue lowers translation accuracy. When individuals travel abroad, professional simultaneous interpreters command high fees, so ordinary people can rarely afford to bring one along. For speakers with a heavy Cantonese accent and non-standard Mandarin, an interpreter who is a foreigner unfamiliar with the Cantonese accent can easily mistranslate their speech and cause losses.
Summary of the invention
The main technical problem solved by the invention is to provide an APP, based on the BIRCH clustering algorithm, for translating between Cantonese-accented speech and English. It can replace highly paid senior interpreters, gives the user translation free of the errors that fatigue causes over long sessions, and can recognize the user's Cantonese accent, avoiding the awkward situation in which the user does not speak standard Mandarin and the interpreter does not understand the Cantonese accent.
To solve the above technical problem, the invention adopts the following technical solution: an APP for translating between Cantonese-accented speech and English based on the BIRCH clustering algorithm is provided, characterized in that it consists of three components: 1) a mobile phone APP client; 2) a large Cantonese-English translation database at a cloud computing center; 3) a BIRCH clustering algorithm module hosted at the cloud computing center. With these three components, the invention can replace highly paid senior English-Chinese simultaneous interpreters and give the user inexpensive, fatigue-free, sustained high-quality translation from Cantonese speech into English or from English into Cantonese speech. Because the user end of the invention is a mobile phone APP, it is also very convenient to carry.
In the APP for translating between Cantonese-accented speech and English based on the BIRCH clustering algorithm, the distance measures used when building the BIRCH clustering tree are the Euclidean distance function and the Manhattan distance function; the specific formulas are as follows:
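For reference, the standard definitions of these two distances for points x and y in n dimensions are given below. The symbols x, y, and n are illustrative assumptions, and the patent may use different notation.

```latex
d_{\text{Euclidean}}(x, y) = \sqrt{\sum_{k=1}^{n} (x_k - y_k)^2},
\qquad
d_{\text{Manhattan}}(x, y) = \sum_{k=1}^{n} \lvert x_k - y_k \rvert
```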
The structure of a CF tree is similar to that of a B-tree, and it has three parameters: the internal-node balance factor B, the leaf-node balance factor L, and the cluster radius threshold T. Each node in the tree contains at most B child nodes, written (CFi, CHILDi), 1<=i<=B, where CFi is the i-th cluster feature in the node and CHILDi points to the node's i-th child node, which corresponds to that i-th cluster feature. Particular attention is needed to the following: when building the CF tree, the cluster radius threshold T is an important parameter because it determines the scale of the CF tree, allowing the tree to fit the memory that the cloud computing center currently allocates to the BIRCH model. If T is too small, the number of clusters becomes very large and the number of tree nodes grows with it, so memory may be exhausted before all data points have been scanned. The translation accuracy is also stated to be proportional to the value of T and to the allocated memory size; the memory here must not be less than 100 TB.
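The effect of the threshold T on the number of clusters can be illustrated with a minimal sketch. It assumes the standard BIRCH clustering feature triple (N, linear sum, squared sum), which the text above does not spell out, and it models a single leaf only; the names `ClusterFeature`, `insert_into_leaf`, and `count_clusters` are hypothetical and are not taken from the patent.

```python
import math
from dataclasses import dataclass, field


@dataclass
class ClusterFeature:
    """Assumed standard BIRCH clustering feature: (N, LS, SS)."""
    n: int = 0
    ls: list = field(default_factory=list)  # per-dimension linear sum
    ss: float = 0.0                         # sum of squared norms

    def add_point(self, point):
        if not self.ls:
            self.ls = [0.0] * len(point)
        self.n += 1
        self.ls = [a + b for a, b in zip(self.ls, point)]
        self.ss += sum(v * v for v in point)

    def centroid(self):
        return [v / self.n for v in self.ls]

    def radius(self):
        # R^2 = SS/N - ||LS/N||^2; clamp small negatives caused by rounding.
        c2 = sum(v * v for v in self.centroid())
        return math.sqrt(max(self.ss / self.n - c2, 0.0))

    def radius_if_added(self, point):
        trial = ClusterFeature(self.n, list(self.ls), self.ss)
        trial.add_point(point)
        return trial.radius()


def insert_into_leaf(leaf, point, T, L):
    """Absorb the point into the closest CF if its radius stays <= T,
    otherwise open a new CF. Returns False when the leaf already holds
    L entries, i.e. when a real CF tree would have to split the leaf."""
    if leaf:
        closest = min(leaf, key=lambda cf: math.dist(cf.centroid(), point))
        if closest.radius_if_added(point) <= T:
            closest.add_point(point)
            return True
    if len(leaf) >= L:
        return False
    cf = ClusterFeature()
    cf.add_point(point)
    leaf.append(cf)
    return True


def count_clusters(points, T, L=10):
    """Number of CF entries produced for the given threshold T."""
    leaf = []
    for p in points:
        insert_into_leaf(leaf, p, T, L)
    return len(leaf)


points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
print(count_clusters(points, T=0.5), count_clusters(points, T=0.01))
```

With the larger threshold the two nearby pairs are merged into two entries, while the smaller threshold keeps every point as its own entry, which is the growth in node count that the paragraph warns about.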
Detailed description of the embodiments
In one embodiment, user A, who speaks with a Cantonese accent, says a sentence in Cantonese into the translator's audio input device. The speech is transmitted over the network to the BIRCH clustering algorithm model at the cloud computing center and compared against the big data that has undergone deep learning; the audio translated into English is then transmitted synchronously to the translator's audio output device, through which user B hears the simultaneously interpreted English rendering of what user A said.
In another embodiment, English-speaking user B says a sentence in English into the translator's audio input device. The speech is transmitted over the network to the BIRCH clustering algorithm model at the cloud computing center and compared against the big data that has undergone deep learning; the audio translated into Cantonese-accented speech is then transmitted synchronously to the translator's audio output device, through which user A hears the simultaneously interpreted Cantonese-accented rendering of what user B said.

Claims (7)

1. An APP for translating between Cantonese-accented speech and English based on the BIRCH clustering algorithm, characterized in that it consists of three components: 1) a mobile phone APP client; 2) a large Cantonese-English translation database at a cloud computing center; 3) a BIRCH clustering algorithm module hosted at the cloud computing center.
2. Component 1) according to claim 1, characterized in that: the mobile phone APP client must be installed on the user's smartphone; the phone must be connected to the internet during use; and the user must wear an earphone connected to the phone to listen to the translated audio.
3. Component 2) according to claim 1, specifically comprising: a large database of Cantonese-accented audio; a large database of English audio; a large database of English words formed by permutations and combinations of the 26 basic letters, together with their definitions and grammar rules; a large database of Chinese written-language structure and word-formation grammar built from radicals and character components; and a large database of industry-specific terminology covering no fewer than 10 industries.
4. Component 3) according to claim 1, being a translation model hosted at the cloud computing center that uses Balanced Iterative Reducing and Clustering Using Hierarchies, i.e. the BIRCH clustering algorithm; the model must be fed the various translation big data of component 2) and must complete big-data deep learning of Cantonese-accented speech and English pronunciation before it can translate.
5. The APP for translating between Cantonese-accented speech and English based on the BIRCH clustering algorithm according to claim 1, wherein the translation process is: the APP captures the Cantonese-accented or English audio uttered by the user and transmits it to the cloud computing center, where it is translated by the BIRCH clustering algorithm translation model that has undergone deep learning on the translation big data; the translated audio is then transmitted synchronously back to the user's APP, and the user listens to the translated audio through an earphone connected to the phone.
6. The APP for translating between Cantonese-accented speech and English based on the BIRCH clustering algorithm according to claim 1, characterized by comprising the following steps:
Step 1: Collect English letter and grammar big data and Chinese written-language and grammar big data;
Step 2: Collect English speech big data and Chinese speech big data;
Step 3: Scan all data in the large database and build the initial clustering tree, i.e. the CF tree, dividing dense data into clusters and treating sparse data as isolated points;
Step 4: The global or semi-global clustering algorithm in BIRCH has requirements on its input range; refine the CF tree accordingly by building several smaller CF trees;
Step 5: To remedy the splits caused by input order and page size, cluster all leaf nodes with a global/semi-global algorithm;
Step 6: Using the center points from step 5 as seeds, reassign the data points to their nearest seeds, ensuring that duplicate data are assigned to the same cluster, and add cluster labels to improve translation accuracy;
Step 7: Through steps 3 to 6, the BIRCH clustering model completes deep learning for translation on the Cantonese-accented audio and the English-pronunciation audio, and construction of the BIRCH translation model is then complete. No fewer than 10,000 English audio clips are input at component 1) of claim 1, translated by the BIRCH translation model, and output as audio from component 2) of claim 1, and the translation accuracy is measured. No fewer than 10,000 Cantonese dialect audio clips are then input at component 1) of claim 1, translated by the BIRCH translation model, and output as audio from component 2) of claim 1, and the translation accuracy is measured. If in both tests the consecutive-interpretation accuracy is above 95% and the simultaneous-interpretation accuracy is above 70%, the BIRCH clustering model has been trained successfully and can be put into use; if the accuracy is lower, steps 3 to 6 are repeated and the deep-learning time of the BIRCH clustering model is extended until the translation accuracy reaches the standard.
7. The APP for translating between Cantonese-accented speech and English based on the BIRCH clustering algorithm according to claim 1, wherein the distance measures used when building the BIRCH clustering tree are the Euclidean distance function and the Manhattan distance function; the specific formulas are as follows:
The structure of a CF tree is similar to that of a B-tree, and it has three parameters: the internal-node balance factor B, the leaf-node balance factor L, and the cluster radius threshold T. Each node in the tree contains at most B child nodes, written (CFi, CHILDi), 1<=i<=B, where CFi is the i-th cluster feature in the node and CHILDi points to the node's i-th child node, which corresponds to that i-th cluster feature. Particular attention is needed to the following: when building the CF tree, the cluster radius threshold T is an important parameter because it determines the scale of the CF tree, allowing the tree to fit the memory that the cloud computing center currently allocates to the BIRCH model. If T is too small, the number of clusters becomes very large and the number of tree nodes grows with it, so memory may be exhausted before all data points have been scanned. The translation accuracy is also stated to be proportional to the value of T and to the allocated memory size; the memory here must not be less than 100 TB.
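Steps 6 and 7 of claim 6 can be sketched roughly as follows. This is an illustrative sketch only: it assumes Euclidean distance for the nearest-seed assignment and exact-match scoring for accuracy, neither of which the claims specify, and the function names `reassign_to_seeds`, `accuracy`, and `model_ready` are hypothetical.

```python
import math


def reassign_to_seeds(points, seeds):
    """Step 6 sketch: label each point with the index of its nearest seed.

    Identical points always receive the same label, because the choice
    depends only on the point's coordinates (ties go to the lowest index).
    """
    return [
        min(range(len(seeds)), key=lambda i: math.dist(p, seeds[i]))
        for p in points
    ]


def accuracy(outputs, references):
    """Fraction of outputs equal to their reference.

    Exact-match scoring is an assumption made for illustration.
    """
    if not references:
        raise ValueError("references must not be empty")
    correct = sum(o == r for o, r in zip(outputs, references))
    return correct / len(references)


def model_ready(consecutive_acc, simultaneous_acc,
                consecutive_min=0.95, simultaneous_min=0.70):
    """Step 7 sketch: both test runs (English input and Cantonese input)
    must exceed the consecutive and simultaneous accuracy thresholds."""
    return (all(a > consecutive_min for a in consecutive_acc)
            and all(a > simultaneous_min for a in simultaneous_acc))


labels = reassign_to_seeds([(0, 0), (0, 0), (9, 9)], [(1, 1), (8, 8)])
print(labels)
print(model_ready([0.96, 0.97], [0.71, 0.80]))
```

If `model_ready` returns False, the claim calls for repeating steps 3 to 6 with a longer training time before testing again.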
CN201710172504.5A 2017-03-22 2017-03-22 APP for translating Cantonese-accented speech and English based on the BIRCH clustering algorithm Pending CN108628841A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710172504.5A CN108628841A (en) 2017-03-22 2017-03-22 APP for translating Cantonese-accented speech and English based on the BIRCH clustering algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710172504.5A CN108628841A (en) 2017-03-22 2017-03-22 APP for translating Cantonese-accented speech and English based on the BIRCH clustering algorithm

Publications (1)

Publication Number Publication Date
CN108628841A true CN108628841A (en) 2018-10-09

Family

ID=63706913

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710172504.5A Pending CN108628841A (en) 2017-03-22 2017-03-22 APP for translating Cantonese-accented speech and English based on the BIRCH clustering algorithm

Country Status (1)

Country Link
CN (1) CN108628841A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109961794A (en) * 2019-01-14 2019-07-02 湘潭大学 A hierarchical speaker recognition method based on model clustering
CN113466578A (en) * 2021-05-27 2021-10-01 中能瑞通(北京)科技有限公司 Rural power grid distribution area box table topological relation identification method and user electricity utilization monitoring method

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101079028A (en) * 2007-05-29 2007-11-28 中国科学院计算技术研究所 On-line translation model selection method for statistical machine translation
CN204305145U (en) * 2014-12-25 2015-04-29 小蝌蚪旅游俱乐部(湖南)有限公司 A mobile-internet intelligent multilingual simultaneous interpretation mobile phone

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
熊赟 等: "《大数据挖掘》", 30 April 2016, 上海:上海科学技术出版社 *

Similar Documents

Publication Publication Date Title
Godard et al. A very low resource language speech corpus for computational language documentation experiments
KR102260216B1 (en) Intelligent voice recognizing method, voice recognizing apparatus, intelligent computing device and server
US8571849B2 (en) System and method for enriching spoken language translation with prosodic information
Wang et al. Word embedding for recurrent neural network based TTS synthesis
WO2018153213A1 (en) Multi-language hybrid speech recognition method
CN108231062B (en) Voice translation method and device
CN109741732A (en) Named entity recognition method, named entity recognition device, equipment and medium
CN110852075B (en) Voice transcription method and device capable of automatically adding punctuation marks and readable storage medium
CN103971686A (en) Method and system for automatically recognizing voice
CN114416934B (en) Multi-modal dialog generation model training method and device and electronic equipment
CN109448699A (en) Speech-to-text conversion method, apparatus, computer equipment and storage medium
CN111433847A (en) Speech conversion method and training method, intelligent device and storage medium
CN110264992A (en) Speech synthesis processing method, device, equipment and storage medium
CN104575497A (en) Method for building acoustic model and speech decoding method based on acoustic model
CN109829173A (en) An English place-name translation method and device
Kumar et al. Translations of the CALLHOME Egyptian Arabic corpus for conversational speech translation
US20230127787A1 (en) Method and apparatus for converting voice timbre, method and apparatus for training model, device and medium
CN110335608A (en) Voice print verification method, apparatus, equipment and storage medium
CN110010136A (en) Training and text analysis method, apparatus, medium and equipment for a prosody prediction model
CN108628836A (en) Robot for speech translation using the artificial-intelligence BP neural network algorithm
Costa-Jussà et al. Evaluating gender bias in speech translation
CN113836945B (en) Intention recognition method, device, electronic equipment and storage medium
CN108628841A (en) APP for translating Cantonese-accented speech and English based on the BIRCH clustering algorithm
CN104217039A (en) Method and system for recording telephone conversations in real time and converting telephone conversations into declarative sentences
TW201937479A (en) Multilingual mixed speech recognition method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20181009