CN115796194A - English translation system based on machine learning - Google Patents

English translation system based on machine learning

Info

Publication number
CN115796194A
Authority
CN
China
Prior art keywords
language
data
standard
database
english
Prior art date
Legal status
Pending
Application number
CN202211439914.9A
Other languages
Chinese (zh)
Inventor
张芳舟 (Zhang Fangzhou)
Current Assignee
Jilin Agricultural Science and Technology College
Original Assignee
Jilin Agricultural Science and Technology College
Priority date
2022-11-17
Filing date
2022-11-17
Publication date
2023-03-14
2022-11-17: Application filed by Jilin Agricultural Science and Technology College
2022-11-17: Priority to CN202211439914.9A
2023-03-14: Publication of CN115796194A
Current legal status: Pending

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Machine Translation (AREA)

Abstract

The invention relates to the field of translation and discloses an English translation system based on machine learning, comprising: a database for storing a translation set, the translation set comprising a mapping set that links English, the standard language, and the common language; a voice receiving module for receiving the user's speech to be processed and performing noise reduction and impurity removal on it; a translation module for translating between English and the standard language; a selection module for selecting, from the database, the common language corresponding to the standard Chinese; an output module for outputting the common language; and a learning module that receives the user's language input, learns the user's common language habits, establishes their correspondence with the standard language, and stores that correspondence in the database.

Description

English translation system based on machine learning
Technical Field
The invention relates to the field of translation, in particular to an English translation system based on machine learning.
Background
Machine translation, also known as automatic translation, is the process of using a computer to convert one natural language (the source language) into another (the target language). It is a branch of computational linguistics, one of the ultimate goals of artificial intelligence, and of considerable scientific research value.
Machine translation also has important practical value. With the rapid globalization of the economy and the growth of the internet, machine translation plays an increasingly important role in promoting political, economic, and cultural exchange.
Existing machine translation systems are of many kinds, and English-to-Chinese systems are especially numerous. However, their output is stiff: they cannot translate according to the user's habitual speech patterns and expressions, and therefore fail to meet practical translation needs.
Disclosure of Invention
The invention provides an English translation system based on machine learning, comprising:
a database for storing a translation set, the translation set comprising a mapping set that links English, the standard language, and the common language;
a voice receiving module for receiving the user's speech to be processed and performing noise reduction and impurity removal on it;
a translation module for translating between English and the standard language;
a selection module for selecting, from the database, the common language corresponding to the standard Chinese;
an output module for outputting the common language; and
a learning module that receives the user's language input, learns the user's common language habits, establishes their correspondence with the standard language, and stores that correspondence in the database.
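The modules above share one data structure: the mapping set linking English, the standard language, and the common language (read here as the user's colloquial wording). The following is a minimal sketch, assuming an in-memory dictionary keyed by the standard Chinese form; the names `TranslationEntry`, `TranslationDatabase`, `standard_for`, `common_for` and `learn` are hypothetical, since the patent does not specify a storage format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TranslationEntry:
    """One row of the hypothetical English / standard / common mapping set."""
    standard: str                                    # standard Chinese form
    english: Optional[str] = None                    # English equivalent, if known
    common: List[str] = field(default_factory=list)  # learned colloquial variants


class TranslationDatabase:
    """Hypothetical in-memory stand-in for the database module."""

    def __init__(self) -> None:
        self._by_standard: Dict[str, TranslationEntry] = {}
        self._by_english: Dict[str, TranslationEntry] = {}

    def add(self, entry: TranslationEntry) -> None:
        self._by_standard[entry.standard] = entry
        if entry.english is not None:
            self._by_english[entry.english.lower()] = entry

    def standard_for(self, english: str) -> Optional[str]:
        """Translation-module lookup: English -> standard language."""
        entry = self._by_english.get(english.lower())
        return entry.standard if entry else None

    def common_for(self, standard: str) -> str:
        """Selection module: prefer the most recently learned common form."""
        entry = self._by_standard.get(standard)
        if entry is not None and entry.common:
            return entry.common[-1]
        return standard  # nothing learned yet: fall back to the standard form

    def learn(self, standard: str, common: str) -> None:
        """Learning module: record a common-language variant for a standard form."""
        entry = self._by_standard.setdefault(standard, TranslationEntry(standard))
        if common in entry.common:
            entry.common.remove(common)  # re-append so it becomes the most recent
        entry.common.append(common)


if __name__ == "__main__":
    db = TranslationDatabase()
    db.add(TranslationEntry("你好", english="hello"))
    db.learn("你好", "哈喽")
    print(db.common_for(db.standard_for("Hello")))
```

Keying on the standard form mirrors the patent's pivot design: English is first mapped to the standard language, and only then is the user's preferred colloquial wording selected.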
Further, the method by which the translation module translates Chinese into English comprises the following steps:
S1: acquiring the voice data and segmenting it to obtain a segmented data information table;
S2: taking the data information table as input and retrieving the corresponding data set from the database according to keywords;
S3: extracting the corresponding data from the data set and combining it according to rules to obtain standard Chinese sentences;
S4: evaluating the standard Chinese sentences obtained and selecting the one with the highest probability;
S5: outputting the selected standard Chinese sentence to the selection module.
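Steps S1 to S5 form a linear pipeline. The sketch below shows that control flow only, assuming each stage can be injected as a function; `translate_to_standard` and the toy stages (`TOY_DATA`, `toy_retrieve`, `toy_combine`) are hypothetical, and the probabilities are invented placeholders rather than output of any model described in the patent.

```python
from typing import Callable, Dict, List, Tuple

Candidate = Tuple[str, float]  # (standard Chinese sentence, probability)


def translate_to_standard(
    text: str,
    segment: Callable[[str], List[str]],
    retrieve: Callable[[List[str]], Dict[str, str]],
    combine: Callable[[List[str], Dict[str, str]], List[Candidate]],
) -> str:
    """Skeleton of steps S1-S5; each stage is an injected, hypothetical callable."""
    words = segment(text)                   # S1: segmented data information table
    data_set = retrieve(words)              # S2: keyword-based retrieval
    candidates = combine(words, data_set)   # S3: rule-based combination
    if not candidates:
        raise ValueError("no candidate sentence could be built")
    best, _ = max(candidates, key=lambda c: c[1])  # S4: highest probability
    return best                             # S5: hand over to the selection module


# Toy stages for illustration only; the scores are invented, not model output.
TOY_DATA = {"good": "好", "morning": "早上"}


def toy_retrieve(words: List[str]) -> Dict[str, str]:
    return {w: TOY_DATA[w] for w in words if w in TOY_DATA}


def toy_combine(words: List[str], data: Dict[str, str]) -> List[Candidate]:
    parts = [data[w] for w in words if w in data]
    return [("".join(parts), 0.3), ("".join(reversed(parts)), 0.7)]


if __name__ == "__main__":
    print(translate_to_standard("good morning", str.split, toy_retrieve, toy_combine))
```

Running the file prints 早上好: the toy combiner emits both word orders, and S4 keeps the higher-scored one.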
Further, in step S1, the acquired voice data is converted into text information, and the text is then segmented into words as follows:
S11: analyzing the language type of the text to obtain a language type analysis result, which includes at least a prestored standard language type;
S12: dividing the text into characters and/or words according to the language types identified, each language type yielding one group of characters and/or words;
S13: compiling each group of characters and/or words into a word information table.
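Steps S11 to S13 can be sketched as follows, under two stated assumptions: "language type" is approximated by script (runs of Han characters versus runs of Latin letters), and Chinese runs are split with forward maximum matching against a small vocabulary. Both are swapped-in techniques chosen for illustration; the patent does not name a segmentation algorithm, and `build_word_table` is a hypothetical name.

```python
import re
from typing import Dict, List, Set

# Assumed "language type" detection: Han runs count as Chinese, Latin runs as English.
RUN = re.compile(r"[A-Za-z']+|[\u4e00-\u9fff]+")
HAN = re.compile(r"[\u4e00-\u9fff]+")


def forward_max_match(text: str, vocab: Set[str], max_len: int = 4) -> List[str]:
    """Forward maximum matching: take the longest vocabulary word at each position."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:  # a single character always matches
                words.append(piece)
                i += size
                break
    return words


def build_word_table(text: str, vocab: Set[str]) -> Dict[str, List[str]]:
    """S11-S13: detect the language type of each run, segment it, group the words."""
    table: Dict[str, List[str]] = {"chinese": [], "english": []}
    for match in RUN.finditer(text):
        run = match.group()
        if HAN.fullmatch(run):                                       # S11
            table["chinese"].extend(forward_max_match(run, vocab))   # S12
        else:
            table["english"].append(run.lower())                     # S12
    return table                                                     # S13


if __name__ == "__main__":
    print(build_word_table("今天天气很好 nice day", {"今天", "天气", "很好"}))
```

Each key of the returned table corresponds to one language type, matching S12's "one group of characters and/or words per language type".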
Further, in step S2, keywords are extracted from each piece of word information, and the corresponding data set is retrieved from the database according to those keywords. The word list is weighted: the more frequently a keyword occurs in the word information table, the higher its weight.
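The frequency-based weighting in step S2 can be illustrated with relative term frequency. This is a minimal sketch, assuming keyword extraction is simple stopword filtering and that the database exposes a keyword-to-data-set index; `keyword_weights`, `rank_data_sets` and the index layout are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List


def keyword_weights(words: Iterable[str], stopwords: frozenset = frozenset()) -> Dict[str, float]:
    """Weight each keyword by its relative frequency in the word information table."""
    counts = Counter(w for w in words if w not in stopwords)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def rank_data_sets(weights: Dict[str, float], index: Dict[str, List[str]]) -> List[str]:
    """Score each data set by the summed weight of the keywords that point to it."""
    scores: Dict[str, float] = {}
    for keyword, weight in weights.items():
        for data_set in index.get(keyword, []):
            scores[data_set] = scores.get(data_set, 0.0) + weight
    # Highest score first; ties are broken alphabetically for a stable order.
    return sorted(scores, key=lambda ds: (-scores[ds], ds))


if __name__ == "__main__":
    w = keyword_weights(["weather", "good", "weather", "the"], frozenset({"the"}))
    print(rank_data_sets(w, {"weather": ["ds_weather"], "good": ["ds_greeting", "ds_weather"]}))
```

Relative frequency satisfies the stated rule (more occurrences give a higher weight) while keeping weights comparable across tables of different lengths.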
Further, in step S2, the data set consists of corresponding English and Chinese words and phrases.
Further, the learning module comprises a selection unit and a learning unit. The selection unit selects the region to which the user's speech habits belong and then downloads that region's corpus from the master server to the database; the corpus comprises a mapping set linking English, the standard language, and the common language.
The learning unit learns the user's common language habits, establishes their correspondence with the standard language, and stores and updates them in the database.
Further, the learning method of the learning unit comprises the following steps:
S101: receiving language data input by the user, analyzing and processing it, and searching the database for identical data; if such data exists, executing S102, otherwise executing S104;
S102: extracting the data set in the database that contains the language data, and selecting and extracting the corresponding standard language data from that data set;
S103: inputting the standard language data into the translation module;
S104: segmenting the language data according to the language rules of the user's region to obtain an analysis result comprising a plurality of segmentation data tables;
S104 (continued): finding the standard Chinese segments corresponding to the segmentation data tables, and recombining them into sentences according to the correspondence between the regional language rules and the standard Chinese language rules;
S105: selecting the standard Chinese sentence with the highest probability, establishing a correspondence between the language data and that sentence, and storing both in the database;
S106: inputting the standard Chinese sentence data into the translation module.
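The branching in steps S101 to S106 (reuse a stored mapping if one exists, otherwise segment, map to standard Chinese, pick the best sentence and remember it) can be sketched as below. All names are hypothetical: `greedy_segment` is a stand-in regional segmenter, `recombine` and `translate` are injected stubs, and the single vocabulary entry 嘎哈 → 干什么 (a Northeastern Mandarin colloquialism for "what are you doing") is an illustrative assumption, not data from the patent.

```python
from typing import Callable, Dict, List, Tuple


def greedy_segment(text: str, vocab: set, max_len: int = 4) -> List[str]:
    """Hypothetical regional segmenter: longest known word first, else one character."""
    out, i = [], 0
    while i < len(text):
        size = next(
            (n for n in range(min(max_len, len(text) - i), 1, -1) if text[i:i + n] in vocab),
            1,
        )
        out.append(text[i:i + size])
        i += size
    return out


def learn_utterance(
    utterance: str,
    db: Dict[str, str],                  # learned mapping: user's wording -> standard Chinese
    regional_vocab: Dict[str, str],      # regional word -> standard Chinese word
    recombine: Callable[[List[str]], List[Tuple[str, float]]],
    translate: Callable[[str], str],
) -> str:
    if utterance in db:                                          # S101: identical data found
        return translate(db[utterance])                          # S102-S103
    words = greedy_segment(utterance, set(regional_vocab))       # S104: regional segmentation
    standard_words = [regional_vocab.get(w, w) for w in words]   # S104 (continued)
    best, _ = max(recombine(standard_words), key=lambda c: c[1]) # S105: best sentence
    db[utterance] = best                                         # S105: store correspondence
    return translate(best)                                       # S106


if __name__ == "__main__":
    memory: Dict[str, str] = {}
    print(learn_utterance("你嘎哈呢", memory, {"嘎哈": "干什么"},
                          lambda ws: [("".join(ws), 1.0)], lambda s: f"EN<{s}>"))
    print(memory)
```

On a second call with the same utterance, the S101 branch finds the stored mapping and skips segmentation entirely, which is how the learned habit is reused.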
Further, in step S101, data is retrieved from the database by keyword search.
Further, in step S105, the standard Chinese sentence with the highest probability is selected according to the semantics and intonation of the language data input by the user.
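One way to read "selected according to the semantics and intonation" is a weighted score over candidate sentences. The sketch below assumes a semantic score per candidate is supplied by some upstream model, that the speech front end reports whether the intonation was rising, and that sentence-final particles mark questions; `pick_sentence`, `intonation_score`, `QUESTION_ENDINGS` and the weight `alpha` are all hypothetical.

```python
from typing import Dict

# Hypothetical cue: sentence-final particles or marks that signal a question.
QUESTION_ENDINGS = ("吗", "呢", "？", "?")


def intonation_score(sentence: str, rising: bool) -> float:
    """1.0 if the sentence's question form matches the detected intonation, else 0.0."""
    is_question = sentence.endswith(QUESTION_ENDINGS)
    return 1.0 if is_question == rising else 0.0


def pick_sentence(semantic: Dict[str, float], rising: bool, alpha: float = 0.7) -> str:
    """Blend an assumed semantic score with the intonation cue; return the best sentence."""
    return max(
        semantic,
        key=lambda s: alpha * semantic[s] + (1 - alpha) * intonation_score(s, rising),
    )


if __name__ == "__main__":
    candidates = {"你去吗": 0.6, "你去": 0.7}
    print(pick_sentence(candidates, rising=True), pick_sentence(candidates, rising=False))
```

With a rising intonation the question form wins despite its lower semantic score; with a falling one the statement wins. The blend weight is a tuning choice, not a value from the source.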
Further, the output module comprises a voice playing module for playing the translated speech.
Beneficial effects of the invention: the English translation system based on machine learning learns from the user's speech habits, so that the translation is accurate in meaning and vivid in expression.
Drawings
FIG. 1 is a block diagram of the English translation system based on machine learning according to the present invention;
FIG. 2 is a schematic flowchart of a method for translating Chinese into English by a translation module in an English translation system based on machine learning according to the present invention;
FIG. 3 is a schematic flowchart of the learning method of the learning unit in the English translation system based on machine learning according to the present invention.
Detailed Description
The subject matter described herein will now be discussed with reference to example embodiments. It is to be understood that these embodiments are discussed to enable those skilled in the art to better understand and thereby implement the subject matter described herein. Changes may be made in the function and arrangement of elements discussed without departing from the scope of the disclosure. Various examples may omit, substitute, or add various procedures or components as necessary. In addition, features described with respect to some examples may also be combined in other examples.
Example 1
Referring to FIG. 1, this embodiment provides an English translation system based on machine learning, comprising:
a database for storing a translation set, the translation set comprising a mapping set that links English, the standard language, and the common language;
a voice receiving module for receiving the user's speech to be processed and performing noise reduction and impurity removal on it;
a translation module for translating between English and the standard language;
a selection module for selecting, from the database, the common language corresponding to the standard Chinese;
an output module for outputting the common language; and
a learning module that receives the user's language input, learns the user's common language habits, establishes their correspondence with the standard language, and stores that correspondence in the database.
Example 2
Referring to FIG. 2, in this embodiment, the method by which the translation module translates Chinese into English comprises the following steps:
S1: acquiring the voice data and segmenting it to obtain a segmented data information table;
S2: taking the data information table as input and retrieving the corresponding data set from the database according to keywords;
S3: extracting the corresponding data from the data set and combining it according to rules to obtain standard Chinese sentences;
S4: evaluating the standard Chinese sentences obtained and selecting the one with the highest probability;
S5: outputting the selected standard Chinese sentence to the selection module.
In step S1, the acquired voice data is converted into text information, and the text is then segmented into words as follows:
S11: analyzing the language type of the text to obtain a language type analysis result, which includes at least a prestored standard language type;
S12: dividing the text into characters and/or words according to the language types identified, each language type yielding one group of characters and/or words;
S13: compiling each group of characters and/or words into a word information table.
In step S2, keywords are extracted from each piece of word information, and the corresponding data set is retrieved from the database according to those keywords. The word list is weighted: the more frequently a keyword occurs in the word information table, the higher its weight.
In step S2, the data set consists of corresponding English and Chinese words and phrases.
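Because the data set pairs English words and phrases with Chinese ones, step S3's "combining according to rules" can be illustrated with one simple rule: greedy longest-phrase matching over English tokens. This is a swapped-in technique for illustration only; the patent does not state its combination rules, and `PHRASES` and `combine` are hypothetical toy data and code.

```python
from typing import Dict, List, Tuple

# Toy English -> Chinese data set of words and phrases (illustrative entries only).
PHRASES: Dict[Tuple[str, ...], str] = {
    ("good", "morning"): "早上好",
    ("good",): "好",
    ("i",): "我",
    ("love",): "爱",
    ("china",): "中国",
}


def combine(tokens: List[str], phrases: Dict[Tuple[str, ...], str]) -> str:
    """Greedy longest-phrase match; tokens with no entry are kept as-is."""
    longest = max((len(k) for k in phrases), default=1)
    out, i = [], 0
    while i < len(tokens):
        for n in range(min(longest, len(tokens) - i), 0, -1):
            key = tuple(t.lower() for t in tokens[i:i + n])
            if key in phrases:
                out.append(phrases[key])
                i += n
                break
        else:  # no phrase in the data set starts at this token
            out.append(tokens[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(combine("good morning".split(), PHRASES))
    print(combine("I love China".split(), PHRASES))
```

Preferring the longest match lets a phrase entry such as "good morning" override its word-by-word rendering; real systems would also need reordering rules, which this sketch omits.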
Example 3
Referring to FIG. 3, in this embodiment, the learning module comprises a selection unit and a learning unit. The selection unit selects the region to which the user's speech habits belong and then downloads that region's corpus from the general server to the database; the corpus comprises a mapping set linking English, the standard language, and the common language.
The learning unit learns the user's common language habits, establishes their correspondence with the standard language, and stores and updates them in the database.
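The selection unit's download-and-store step raises one design question: what happens when the regional corpus disagrees with something the user has already taught the system. The sketch below assumes user-learned entries take priority and simplifies the mapping set to common wording → standard Chinese (the English column is omitted); `install_regional_corpus` and `fetch` are hypothetical, and `fetch` merely stands in for the server download.

```python
from typing import Callable, Dict

Mapping = Dict[str, str]  # common-language wording -> standard Chinese


def install_regional_corpus(
    region: str,
    fetch: Callable[[str], Mapping],
    local_db: Mapping,
) -> int:
    """Merge a region's corpus into the local database without overwriting learned entries.

    `fetch` is a hypothetical stand-in for downloading from the general server.
    Returns the number of new entries added.
    """
    added = 0
    for common, standard in fetch(region).items():
        if common not in local_db:  # entries learned from this user take priority
            local_db[common] = standard
            added += 1
    return added


if __name__ == "__main__":
    server = {"northeast": {"嘎哈": "干什么", "咋整": "怎么办"}}
    local = {"嘎哈": "干啥"}
    print(install_regional_corpus("northeast", server.__getitem__, local), local)
```

Keeping the user's own entries means the downloaded corpus only fills gaps, so the learning unit's later updates are never silently reverted by a re-download.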
The learning method of the learning unit comprises the following steps:
S101: receiving language data input by the user, analyzing and processing it, and searching the database for identical data; if such data exists, executing S102, otherwise executing S104;
S102: extracting the data set in the database that contains the language data, and selecting and extracting the corresponding standard language data from that data set;
S103: inputting the standard language data into the translation module;
S104: segmenting the language data according to the language rules of the user's region to obtain an analysis result comprising a plurality of segmentation data tables;
S104 (continued): finding the standard Chinese segments corresponding to the segmentation data tables, and recombining them into sentences according to the correspondence between the regional language rules and the standard Chinese language rules;
S105: selecting the standard Chinese sentence with the highest probability, establishing a correspondence between the language data and that sentence, and storing both in the database;
S106: inputting the standard Chinese sentence data into the translation module.
In step S101, data is retrieved from the database by keyword search.
In step S105, the standard Chinese sentence with the highest probability is selected according to the semantics and intonation of the language data input by the user.
The output module comprises a voice playing module for playing the translated speech.
The English translation system based on machine learning provided by the invention learns from the user's speech habits, so that the translation is accurate in meaning and vivid in expression.
The embodiments of the present invention have been described above with reference to the drawings, but the invention is not limited to these specific embodiments, which are illustrative rather than restrictive. Those skilled in the art may devise many other forms without departing from the spirit of the invention and the scope of protection of the claims.

Claims (10)

1. An English translation system based on machine learning, comprising:
a database for storing a translation set, the translation set comprising a mapping set that links English, the standard language, and the common language;
a voice receiving module for receiving the user's speech to be processed and performing noise reduction and impurity removal on it;
a translation module for translating between English and the standard language;
a selection module for selecting, from the database, the common language corresponding to the standard Chinese;
an output module for outputting the common language; and
a learning module that receives the user's language input, learns the user's common language habits, establishes their correspondence with the standard language, and stores that correspondence in the database.
2. The English translation system based on machine learning according to claim 1, wherein the method by which the translation module translates Chinese into English comprises the following steps:
S1: acquiring the voice data and segmenting it to obtain a segmented data information table;
S2: taking the data information table as input and retrieving the corresponding data set from the database according to keywords;
S3: extracting the corresponding data from the data set and combining it according to rules to obtain standard Chinese sentences;
S4: evaluating the standard Chinese sentences obtained and selecting the one with the highest probability;
S5: outputting the selected standard Chinese sentence to the selection module.
3. The English translation system based on machine learning according to claim 2, wherein in step S1 the acquired speech data is converted into text information and the text is then segmented into words as follows:
S11: analyzing the language form of the text to obtain a language form analysis result, which includes at least a prestored standard language form;
S12: dividing the text into characters and/or words according to the language forms identified, each language form yielding one group of characters and/or words;
S13: compiling each group of characters and/or words into a word information table.
4. The English translation system based on machine learning according to claim 3, wherein in step S2 keywords are extracted from each piece of word information and the database is searched for the corresponding data set according to those keywords, the word list being weighted such that the more frequently a keyword occurs in the word information table, the higher its weight.
5. The English translation system based on machine learning according to claim 4, wherein in step S2 the data set consists of corresponding English and Chinese words and phrases.
6. The English translation system based on machine learning according to claim 5, wherein the learning module comprises a selection unit and a learning unit, the selection unit being configured to select the region to which the user's speech habits belong and then download that region's corpus from the general server into the database, the corpus comprising a mapping set linking English, the standard language, and the common language;
and the learning unit being configured to learn the user's common language habits, establish their correspondence with the standard language, and store and update them in the database.
7. The English translation system based on machine learning according to claim 6, wherein the learning method of the learning unit comprises the following steps:
S101: receiving language data input by the user, analyzing and processing it, and searching the database for identical data; if such data exists, executing S102, otherwise executing S104;
S102: extracting the data set in the database that contains the language data, and selecting and extracting the corresponding standard language data from that data set;
S103: inputting the standard language data into the translation module;
S104: performing word segmentation on the language data according to the language rules of the user's region to obtain an analysis result comprising a plurality of word segmentation data tables;
S104 (continued): finding the standard Chinese segments corresponding to the word segmentation data tables, and recombining them into sentences according to the correspondence between the regional language rules and the standard Chinese language rules;
S105: selecting the standard Chinese sentence with the highest probability, establishing a correspondence between the language data and that sentence, and storing both in the database;
S106: inputting the standard Chinese sentence data into the translation module.
8. The English translation system based on machine learning according to claim 7, wherein in step S101 data is retrieved from the database by keyword search.
9. The English translation system based on machine learning according to claim 8, wherein in step S105 the standard Chinese sentence with the highest probability is selected according to the semantics and intonation of the language data input by the user.
10. The English translation system based on machine learning according to claim 9, wherein the output module comprises a speech playing module configured to play the translated speech.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211439914.9A CN115796194A (en) 2022-11-17 2022-11-17 English translation system based on machine learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211439914.9A CN115796194A (en) 2022-11-17 2022-11-17 English translation system based on machine learning

Publications (1)

Publication Number Publication Date
CN115796194A true CN115796194A (en) 2023-03-14

Family

ID=85438479

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211439914.9A Pending CN115796194A (en) 2022-11-17 2022-11-17 English translation system based on machine learning

Country Status (1)

Country Link
CN (1) CN115796194A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101494621A (en) * 2009-03-16 2009-07-29 Xi'an Liudu Technology Co., Ltd. Translation system and translation method for multi-language instant communication terminal
CN107608978A (en) * 2017-10-30 2018-01-19 North China University of Water Resources and Electric Power Inter-translation method for English and Russian
CN113591497A (en) * 2021-07-29 2021-11-02 Inner Mongolia University of Technology Mongolian-Chinese machine translation method based on morpheme media
CN114169344A (en) * 2021-12-07 2022-03-11 Shandong Jianzhu University Intelligent translation method based on corpus big data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20230314