CN115796194A - English translation system based on machine learning - Google Patents
- Publication number
- CN115796194A (application number CN202211439914.9)
- Authority
- CN
- China
- Prior art keywords
- language
- data
- standard
- database
- english
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Machine Translation (AREA)
Abstract
The invention relates to the field of translation and discloses an English translation system based on machine learning. The system comprises: a database for storing a translation set, the translation set comprising an English-standard language-common language mapping set; a voice receiving module for receiving the user's voice and performing noise reduction and impurity removal on it; a translation module for translating between English and the standard language; a selection module for selecting, from the database, the common language corresponding to the standard Chinese language; an output module for outputting the common language; and a learning module that receives the user's language, learns the user's common language habits, establishes their correspondence with the standard language, and stores that correspondence in the database.
Description
Technical Field
The invention relates to the field of translation, in particular to an English translation system based on machine learning.
Background
Machine translation, also known as automatic translation, is the process of using a computer to convert one natural language (the source language) into another natural language (the target language). It is a branch of computational linguistics and one of the ultimate goals of artificial intelligence, and it has significant scientific research value.
Machine translation also has significant practical value. With the rapid development of economic globalization and the Internet, machine translation technology plays an increasingly important role in promoting political, economic, and cultural exchange.
Many types of machine translation systems exist, and English-to-Chinese systems are especially numerous. Their output, however, reads stiffly, and they cannot translate according to a user's habitual way of speaking, so they fail to meet practical translation needs.
Disclosure of Invention
The invention provides an English translation system based on machine learning, comprising:
a database, used for storing a translation set, the translation set comprising an English-standard language-common language mapping set;
a voice receiving module, used for receiving the user's voice and performing noise reduction and impurity removal on it;
a translation module, used for translating between English and the standard language;
a selection module, used for selecting, from the database, the common language corresponding to the standard Chinese language;
an output module, used for outputting the common language;
a learning module, which receives the user's language, learns the user's common language habits, establishes their correspondence with the standard language, and stores that correspondence in the database.
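As a rough illustration of the English-standard language-common language mapping set and of the lookup performed by the selection module, the following minimal Python sketch may help. The class name `TranslationDatabase`, its fields, the fallback to the standard phrasing, and the sample sentences are all assumptions made for this example; the source does not specify a storage schema.

```python
from dataclasses import dataclass, field


@dataclass
class TranslationDatabase:
    """Toy in-memory stand-in for the database of English/standard/common mappings."""

    # standard-language sentence -> English sentence
    standard_to_english: dict = field(default_factory=dict)
    # standard-language sentence -> the user's common-language phrasing
    standard_to_common: dict = field(default_factory=dict)

    def add_mapping(self, english: str, standard: str, common: str) -> None:
        """Store one English-standard-common triple."""
        self.standard_to_english[standard] = english
        self.standard_to_common[standard] = common

    def select_common(self, standard: str) -> str:
        """Selection-module lookup.

        Assumption: when no common-language variant is stored, the standard
        sentence itself is returned; the source does not describe this case.
        """
        return self.standard_to_common.get(standard, standard)


db = TranslationDatabase()
# Illustrative data only.
db.add_mapping("Have you eaten?", "你吃饭了吗？", "吃了没？")
print(db.select_common("你吃饭了吗？"))
print(db.select_common("谢谢"))
```

Keying both dictionaries on the standard-language sentence reflects the role the standard language plays as the pivot between English and the user's common language.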
Further, the method by which the translation module translates Chinese into English comprises the following steps:
S1: acquiring voice data and segmenting it to obtain a segmented data information table;
S2: taking the data information table as input and retrieving the corresponding data set from the database according to keywords;
S3: extracting the corresponding data from the data set and combining it according to rules to obtain standard Chinese sentences;
S4: selecting, from the obtained standard Chinese sentences, the sentence with the highest probability;
S5: outputting the selected standard Chinese sentence to the selection module.
Further, in step S1, after the voice data is acquired, it is converted into text, and the text is then segmented into words as follows:
S11: analyzing the language types present in the text to obtain a language type analysis result, the result including at least one pre-stored standard language type;
S12: splitting the text into characters and/or words according to the language types identified, each language type yielding one group of characters and/or words;
S13: compiling each group of characters and/or words into a word information table.
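Steps S11 to S13 can be pictured with a short Python sketch. It assumes two hypothetical pre-stored language types that are told apart purely by script (CJK ideographs versus Latin letters), and it splits text into runs of one script rather than into true words; real Chinese word segmentation would need a dictionary or a statistical model, which the source does not describe.

```python
import re

# Assumed pre-stored language types, recognized by script only.
PATTERNS = {
    "chinese": re.compile(r"[\u4e00-\u9fff]+"),
    "english": re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?"),
}


def analyze_language_types(text: str) -> list:
    """S11: report which of the pre-stored language types occur in the text."""
    return [name for name, pattern in PATTERNS.items() if pattern.search(text)]


def build_word_table(text: str) -> dict:
    """S12-S13: split the text per language type and compile a word information table."""
    table = {}
    for name in analyze_language_types(text):
        # Each language type yields its own group of character runs.
        table[name] = PATTERNS[name].findall(text)
    return table


table = build_word_table("我想买一个 iPhone 手机")
print(table)
```

The result is one group per detected language type, which corresponds to the "word information table" named in step S13.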
Further, in step S2, keywords are extracted from each word information table, and the corresponding data set is retrieved from the database according to those keywords. The entries of the table are weighted: the more frequently a keyword occurs in the word information table, the higher its weight.
Further, in step S2, the data set consists of corresponding English and Chinese words and phrases.
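The frequency-based weighting in step S2 can be sketched as follows. Using relative frequency as the weight, and retrieving data sets in descending weight order, are assumptions for illustration; the source states only that a more frequent keyword receives a higher weight. The function names and sample data are hypothetical.

```python
from collections import Counter


def keyword_weights(word_table: list) -> dict:
    """Weight each keyword by its relative frequency in the word information table."""
    counts = Counter(word_table)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def retrieve(weights: dict, database: dict) -> list:
    """Look up the data set of every known keyword, highest weight first."""
    ranked = sorted(weights, key=weights.get, reverse=True)
    return [(word, database[word]) for word in ranked if word in database]


# Illustrative data only: Chinese keywords mapped to English words and phrases.
words = ["天气", "今天", "天气", "好"]
database = {"天气": ["weather"], "今天": ["today"]}

weights = keyword_weights(words)
print(weights)
print(retrieve(weights, database))
```

Keywords that are missing from the database are simply skipped here; how the system should treat them is not specified in the source.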
Further, the learning module comprises a selection unit and a learning unit. The selection unit is used for selecting the region to which the user's speech habits belong and then downloading that region's corpus from the master server into the database, the corpus comprising an English-standard language-common language mapping set;
the learning unit is used for learning the user's common language habits, establishing their correspondence with the standard language, and storing that correspondence in the database and updating the database.
Further, the learning method of the learning unit comprises the following steps:
S101: receiving language data input by the user, analyzing and processing it, and searching the database for identical data; if such data exists, executing S102, and if not, executing S104;
S102: extracting from the database the data set containing the language data, and selecting and extracting the corresponding standard language data from that data set;
S103: inputting the standard language data into the translation module;
S104: segmenting the language data into words according to the language rules of the region to which the user belongs to obtain an analysis result comprising a plurality of word segmentation data tables; then finding the corresponding standard Chinese word segments from the obtained tables and recombining them into sentences according to the correspondence between the regional language rules and the standard Chinese language rules;
S105: selecting the standard Chinese sentence with the highest probability, establishing a correspondence between the language data and that sentence, and storing both in the database;
S106: inputting the standard Chinese sentence data into the translation module.
Further, in step S101, data is retrieved from the database by keyword search.
Further, in step S105, the standard Chinese sentence with the highest probability is selected according to the semantics and intonation of the language data input by the user.
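The control flow of the learning method (look up first, otherwise segment, recombine, select, and store) can be sketched in Python as below. Everything concrete here is an assumption for illustration: whitespace stands in for regional segmentation rules, `regional_lexicon` is a hypothetical table of standard-language equivalents, and `score` is a hypothetical probability function, since the source does not say how the probability is computed.

```python
import itertools


def learn(utterance: str, database: dict, regional_lexicon: dict, score) -> str:
    """Return the standard sentence for an utterance, learning it if it is new."""
    # S101-S103: identical data already stored, so reuse its standard sentence.
    if utterance in database:
        return database[utterance]

    # S104: segment by regional rules (assumed: whitespace) and map each
    # segment to its possible standard-language equivalents.
    segments = utterance.split()
    options = [regional_lexicon.get(seg, [seg]) for seg in segments]
    candidates = ["".join(combo) for combo in itertools.product(*options)]

    # S105: keep the most probable candidate and store the correspondence.
    best = max(candidates, key=score)
    database[utterance] = best

    # S106: the caller passes the result on to the translation module.
    return best


# Illustrative data only.
lexicon = {"俺": ["我", "我们"], "木有": ["没有"]}
probabilities = {"我没有钱": 0.9, "我们没有钱": 0.1}
db = {}

first = learn("俺 木有 钱", db, lexicon, lambda s: probabilities.get(s, 0.0))
second = learn("俺 木有 钱", db, lexicon, lambda s: 0.0)
print(first, second)
```

The second call returns the stored sentence without rescoring, which mirrors the branch from S101 to S102 for data that already exists in the database.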
Further, the output module comprises a voice playing module, which is used for playing the translated speech.
The beneficial effect of the invention is that the machine-learning-based English translation system learns from the user's speech habits, so that the translated language is semantically accurate and vividly expressed.
Drawings
FIG. 1 is a block diagram of the English translation system based on machine learning according to the present invention;
FIG. 2 is a schematic flowchart of the method by which the translation module translates Chinese into English in the English translation system based on machine learning according to the present invention;
FIG. 3 is a schematic flowchart of the learning method of the learning unit in the English translation system based on machine learning according to the present invention.
Detailed Description
The subject matter described herein will now be discussed with reference to example embodiments. It is to be understood that these embodiments are discussed to enable those skilled in the art to better understand and thereby implement the subject matter described herein. Changes may be made in the function and arrangement of elements discussed without departing from the scope of the disclosure. Various examples may omit, substitute, or add various procedures or components as necessary. In addition, features described with respect to some examples may also be combined in other examples.
Example 1
Referring to FIG. 1, this embodiment provides an English translation system based on machine learning, comprising:
a database, used for storing a translation set, the translation set comprising an English-standard language-common language mapping set;
a voice receiving module, used for receiving the user's voice and performing noise reduction and impurity removal on it;
a translation module, used for translating between English and the standard language;
a selection module, used for selecting, from the database, the common language corresponding to the standard Chinese language;
an output module, used for outputting the common language;
a learning module, which receives the user's language, learns the user's common language habits, establishes their correspondence with the standard language, and stores that correspondence in the database.
Example 2
Referring to FIG. 2, in this embodiment, the method by which the translation module translates Chinese into English comprises the following steps:
S1: acquiring voice data and segmenting it to obtain a segmented data information table;
S2: taking the data information table as input and retrieving the corresponding data set from the database according to keywords;
S3: extracting the corresponding data from the data set and combining it according to rules to obtain standard Chinese sentences;
S4: selecting, from the obtained standard Chinese sentences, the sentence with the highest probability;
S5: outputting the selected standard Chinese sentence to the selection module.
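Step S4 does not state how the probability of a candidate sentence is computed. One common choice, shown here purely as an assumed stand-in and not as the method of the invention, is a bigram language model with add-one smoothing trained on pre-segmented standard Chinese sentences. The tiny corpus and the candidate sentences are illustrative only.

```python
import math
from collections import Counter


def train_bigram(corpus: list):
    """Count context words and bigrams over sentences padded with <s> and </s>."""
    contexts, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence + ["</s>"]
        contexts.update(tokens[:-1])
        bigrams.update(zip(tokens, tokens[1:]))
    vocab = {tok for sentence in corpus for tok in sentence} | {"<s>", "</s>"}
    return contexts, bigrams, len(vocab)


def log_prob(sentence: list, model) -> float:
    """Log probability of a segmented sentence under add-one smoothing."""
    contexts, bigrams, vocab_size = model
    tokens = ["<s>"] + sentence + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (contexts[a] + vocab_size))
        for a, b in zip(tokens, tokens[1:])
    )


corpus = [["我", "喜欢", "苹果"], ["我", "喜欢", "你"], ["你", "喜欢", "苹果"]]
model = train_bigram(corpus)

# S3 output (assumed): candidate word orders for the same set of words.
candidates = [["苹果", "喜欢", "我"], ["我", "喜欢", "苹果"]]
best = max(candidates, key=lambda s: log_prob(s, model))
print("".join(best))
```

Working in log space avoids numerical underflow when sentences are long; any other scoring model could be substituted behind the same `max` call.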
In step S1, after the voice data is acquired, it is converted into text, and the text is then segmented into words as follows:
S11: analyzing the language types present in the text to obtain a language type analysis result, the result including at least one pre-stored standard language type;
S12: splitting the text into characters and/or words according to the language types identified, each language type yielding one group of characters and/or words;
S13: compiling each group of characters and/or words into a word information table.
In step S2, keywords are extracted from each word information table, and the corresponding data set is retrieved from the database according to those keywords. The entries of the table are weighted: the more frequently a keyword occurs in the word information table, the higher its weight.
In step S2, the data set consists of corresponding English and Chinese words and phrases.
Example 3
Referring to FIG. 3, in this embodiment, the learning module comprises a selection unit and a learning unit. The selection unit is used for selecting the region to which the user's speech habits belong and then downloading that region's corpus from the master server into the database, the corpus comprising an English-standard language-common language mapping set;
the learning unit is used for learning the user's common language habits, establishing their correspondence with the standard language, and storing that correspondence in the database and updating the database.
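The selection unit's role of fetching a regional corpus and loading it into the local database might look like the sketch below. The master server is simulated by an in-memory dictionary, the region names and sample phrases are illustrative, and the rule that existing local entries are never overwritten is an assumption; the source describes neither the transfer protocol nor a merge policy.

```python
# Hypothetical master-server contents:
# region -> list of (English, standard, common) triples.
MASTER_SERVER = {
    "sichuan": [("What are you doing?", "你在做什么？", "你在做啥子？")],
    "northeast": [("What are you doing?", "你在做什么？", "你干啥呢？")],
}


def download_corpus(region: str) -> list:
    """Stand-in for a network download; raises KeyError for an unknown region."""
    return MASTER_SERVER[region]


def install_corpus(region: str, database: dict) -> int:
    """Merge a regional corpus into the local database.

    Assumption: entries already present (for example, ones learned from the
    user) are kept as they are. Returns the number of entries added.
    """
    added = 0
    for english, standard, common in download_corpus(region):
        if standard not in database:
            database[standard] = {"english": english, "common": common}
            added += 1
    return added


local_db = {}
added_first = install_corpus("sichuan", local_db)
added_second = install_corpus("northeast", local_db)
print(added_first, added_second, local_db)
```

Keeping existing entries lets the learning unit's later updates take precedence over the downloaded defaults, which is one plausible way to combine the two units.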
The learning method of the learning unit comprises the following steps:
S101: receiving language data input by the user, analyzing and processing it, and searching the database for identical data; if such data exists, executing S102, and if not, executing S104;
S102: extracting from the database the data set containing the language data, and selecting and extracting the corresponding standard language data from that data set;
S103: inputting the standard language data into the translation module;
S104: segmenting the language data into words according to the language rules of the region to which the user belongs to obtain an analysis result comprising a plurality of word segmentation data tables; then finding the corresponding standard Chinese word segments from the obtained tables and recombining them into sentences according to the correspondence between the regional language rules and the standard Chinese language rules;
S105: selecting the standard Chinese sentence with the highest probability, establishing a correspondence between the language data and that sentence, and storing both in the database;
S106: inputting the standard Chinese sentence data into the translation module.
In step S101, data is retrieved from the database by keyword search.
In step S105, the standard Chinese sentence with the highest probability is selected according to the semantics and intonation of the language data input by the user.
The output module comprises a voice playing module, which is used for playing the translated speech.
The English translation system based on machine learning provided by the invention learns from the user's speech habits, so that the translated language is semantically accurate and vividly expressed.
The embodiments of the present invention have been described above with reference to the drawings, but the invention is not limited to these specific embodiments, which are illustrative rather than restrictive. Those skilled in the art may devise many variations without departing from the spirit of the invention and the scope of protection of the claims.
Claims (10)
1. An English translation system based on machine learning, comprising:
a database, used for storing a translation set, the translation set comprising an English-standard language-common language mapping set;
a voice receiving module, used for receiving the user's voice and performing noise reduction and impurity removal on it;
a translation module, used for translating between English and the standard language;
a selection module, used for selecting, from the database, the common language corresponding to the standard Chinese language;
an output module, used for outputting the common language;
a learning module, which receives the user's language, learns the user's common language habits, establishes their correspondence with the standard language, and stores that correspondence in the database.
2. The English translation system based on machine learning according to claim 1, wherein the method by which the translation module translates Chinese into English comprises the following steps:
S1: acquiring voice data and segmenting it to obtain a segmented data information table;
S2: taking the data information table as input and retrieving the corresponding data set from the database according to keywords;
S3: extracting the corresponding data from the data set and combining it according to rules to obtain standard Chinese sentences;
S4: selecting, from the obtained standard Chinese sentences, the sentence with the highest probability;
S5: outputting the selected standard Chinese sentence to the selection module.
3. The English translation system based on machine learning according to claim 2, wherein in step S1, after the voice data is acquired, it is converted into text, and the text is then segmented into words as follows:
S11: analyzing the language forms present in the text to obtain a language form analysis result, the result including at least one pre-stored standard language form;
S12: splitting the text into characters and/or words according to the language forms identified, each language form yielding one group of characters and/or words;
S13: compiling each group of characters and/or words into a word information table.
4. The English translation system based on machine learning according to claim 3, wherein in step S2, keywords are extracted from each word information table, and the corresponding data set is retrieved from the database according to those keywords, wherein the entries of the table are weighted, and the more frequently a keyword occurs in the word information table, the higher its weight.
5. The English translation system based on machine learning according to claim 4, wherein in step S2, the data set consists of corresponding English and Chinese words and phrases.
6. The English translation system based on machine learning according to claim 5, wherein the learning module comprises a selection unit and a learning unit, the selection unit being used for selecting the region to which the user's speech habits belong and then downloading that region's corpus from the master server into the database, the corpus comprising an English-standard language-common language mapping set;
the learning unit is used for learning the user's common language habits, establishing their correspondence with the standard language, and storing that correspondence in the database and updating the database.
7. The English translation system based on machine learning according to claim 6, wherein the learning method of the learning unit comprises the following steps:
S101: receiving language data input by the user, analyzing and processing it, and searching the database for identical data; if such data exists, executing S102, and if not, executing S104;
S102: extracting from the database the data set containing the language data, and selecting and extracting the corresponding standard language data from that data set;
S103: inputting the standard language data into the translation module;
S104: segmenting the language data into words according to the language rules of the region to which the user belongs to obtain an analysis result comprising a plurality of word segmentation data tables; then finding the corresponding standard Chinese word segments from the obtained tables and recombining them into sentences according to the correspondence between the regional language rules and the standard Chinese language rules;
S105: selecting the standard Chinese sentence with the highest probability, establishing a correspondence between the language data and that sentence, and storing both in the database;
S106: inputting the standard Chinese sentence data into the translation module.
8. The English translation system based on machine learning according to claim 7, wherein in step S101, data is retrieved from the database by keyword search.
9. The English translation system based on machine learning according to claim 8, wherein in step S105, the standard Chinese sentence with the highest probability is selected according to the semantics and intonation of the language data input by the user.
10. The English translation system based on machine learning according to claim 9, wherein the output module comprises a voice playing module, which is used for playing the translated speech.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211439914.9A CN115796194A (en) | 2022-11-17 | 2022-11-17 | English translation system based on machine learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211439914.9A CN115796194A (en) | 2022-11-17 | 2022-11-17 | English translation system based on machine learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115796194A (en) | 2023-03-14 |
Family
ID=85438479
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211439914.9A Pending CN115796194A (en) | 2022-11-17 | 2022-11-17 | English translation system based on machine learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115796194A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101494621A (en) * | 2009-03-16 | 2009-07-29 | 西安六度科技有限公司 | Translation system and translation method for multi-language instant communication terminal |
CN107608978A (en) * | 2017-10-30 | 2018-01-19 | 华北水利水电大学 | A kind of inter-translation method of English and Russian |
CN113591497A (en) * | 2021-07-29 | 2021-11-02 | 内蒙古工业大学 | Mongolian Chinese machine translation method based on morpheme media |
CN114169344A (en) * | 2021-12-07 | 2022-03-11 | 山东建筑大学 | Intelligent translation method based on corpus big data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20230314 |