CN114756795A - Webpage translation method and device, computer equipment and storage medium - Google Patents

Publication number
CN114756795A
Authority
CN
China
Prior art keywords
webpage
translation
translated
data
web page
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210364711.1A
Other languages
Chinese (zh)
Inventor
司健
张伯超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Asset Management Co Ltd
Original Assignee
Ping An Asset Management Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Asset Management Co Ltd
Priority to CN202210364711.1A
Publication of CN114756795A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/906 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/126 Character encoding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention discloses a webpage translation method, a webpage translation device, computer equipment and a storage medium, and belongs to the field of machine translation. The webpage translation method is compatible with translating websites with fixed content as well as websites with variable content. It extracts the webpage data in the webpage to be translated and the webpage features corresponding to the webpage data, classifies the webpage data according to the webpage features to determine the translation category to which the webpage data belongs, and stores the webpage data in a first category to-be-translated table or a second category to-be-translated table, so that the webpage data can be translated in a targeted manner and translation efficiency is improved. A corresponding translation mode is then selected for each type of webpage data: the webpage data in the first category to-be-translated table is translated into a first translation result in the target language, and the webpage data in the second category to-be-translated table is translated into a second translation result in the target language by a translation model. A translation webpage corresponding to the webpage to be translated is generated from the first translation result and the second translation result, so that diverse webpages can be translated effectively and quickly.

Description

Webpage translation method and device, computer equipment and storage medium
Technical Field
The present invention relates to the field of machine translation, and in particular, to a web page translation method and apparatus, a computer device, and a storage medium.
Background
Current webpage translation mainly relies on I18N (internationalization) technology, which allows web pages to be adapted to different languages and locales without major changes. The technology has a limitation, however: non-fixed content must be manually configured by back-office staff into separate configuration files before it can be translated. I18N therefore suits websites whose content is relatively fixed and uncomplicated (simple layout), but not websites whose content is variable, such as e-commerce websites and news websites, where it leads to low translation efficiency and a poor user experience.
Disclosure of Invention
To address the limitation of existing translation methods when translating websites with variable content, a webpage translation method, a webpage translation device, a computer device and a storage medium are provided that are compatible with translating websites with both fixed content and variable content.
The invention provides a webpage translation method, which comprises the following steps:
acquiring at least one webpage to be translated and a translation request associated with the webpage to be translated;
extracting webpage data of each webpage to be translated and webpage characteristics corresponding to the webpage data;
classifying the webpage data according to the webpage characteristics of the webpage data, and storing the webpage data in a first category table to be translated or a second category table to be translated according to the category of the webpage data;
according to the translation request, translating the webpage data in the first category to-be-translated table of the webpage to be translated into a first translation result of a target language;
according to the translation request, translating the webpage data in the second category of tables to be translated into a second translation result of the target language by adopting a translation model;
and generating a translation webpage corresponding to the webpage to be translated according to the first translation result and the second translation result.
Optionally, the translating, according to the translation request, the webpage data in the first category to-be-translated table of the webpage to be translated into a first translation result in a target language, including:
and translating the webpage data in the first category to-be-translated table of the webpage to be translated into a first translation result of a target language by adopting an I18N translation mode, an L10N translation mode or a G11N translation mode according to the translation request.
Optionally, the web page features are used to identify a state to which the web page data belongs, and the web page features include dynamic features and static features;
the classifying the webpage data according to the webpage features of the webpage data, and storing the webpage data in a first category to-be-translated table or a second category to-be-translated table according to the category of the webpage data includes:
identifying a state of the web page feature;
when the webpage features of the webpage data are static features, storing the webpage data in a first category table to be translated;
and when the webpage features of the webpage data are dynamic features, acquiring a storage path of the webpage data, and storing the storage path in a second category to-be-translated table.
Optionally, the translating, according to the translation request, the webpage data in the second category to-be-translated table into a second translation result of the target language by adopting a translation model includes:
extracting the webpage data according to the storage path of the webpage data in the second category table to be translated, and matching the webpage data with translation data in a translation result table one by one;
if the webpage data is matched in the translation result table, taking the data matched with the webpage data in the translation result table as the second translation result;
and if not, translating the webpage data into a second translation result of the target language by adopting the translation model according to the translation request.
Optionally, the translation model comprises an encoder and a decoder;
the encoder acquires a word vector of each data in the webpage data, and encodes the word vector to obtain a hidden state representation corresponding to each processed word vector;
and the decoder generates the second translation result according to the hidden state representation corresponding to the word vector.
Optionally, the generating a translation webpage corresponding to the webpage to be translated according to the first translation result and the second translation result includes:
extracting the webpage attribute of the webpage to be translated;
constructing a webpage template corresponding to the webpage to be translated based on the webpage attributes;
and adding the data in the first translation result and the data in the second translation result to the webpage template to obtain a translation webpage corresponding to the webpage to be translated.
The invention also provides a web page translation device, comprising:
the acquisition unit is used for acquiring at least one webpage to be translated and a translation request associated with the webpage to be translated;
the extraction unit is used for extracting the webpage data of each webpage to be translated and the webpage features corresponding to the webpage data;
the classification unit is used for classifying the webpage data according to the webpage characteristics of the webpage data and storing the webpage data in a first category table to be translated or a second category table to be translated according to the category of the webpage data;
the first translation unit is used for translating the webpage data in the first category to-be-translated table of the webpage to be translated into a first translation result of a target language according to the translation request;
the second translation unit is used for translating the webpage data in the second category to-be-translated table into a second translation result of the target language by adopting a translation model according to the translation request;
and the generating unit is used for generating a translation webpage corresponding to the webpage to be translated according to the first translation result and the second translation result.
Optionally, the first translation unit is configured to translate, according to the translation request, web page data in the first category to-be-translated table of the to-be-translated web page into a first translation result in a target language by using an I18N translation manner, an L10N translation manner, or a G11N translation manner.
The invention also provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method when executing the computer program.
The invention also provides a computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method.
The invention provides a webpage translation method, a webpage translation device, computer equipment and a storage medium, which are compatible with website translation with fixed content and variable content, can extract webpage data in a webpage to be translated and webpage characteristics corresponding to the webpage data, classify the webpage data according to the webpage characteristics to further determine the translation category of the webpage data, and store the webpage data in a first category table to be translated or a second category table to be translated so as to translate the webpage data in a targeted manner and improve the translation efficiency; and selecting corresponding translation modes according to different types of webpage data to translate the data, and translating the webpage data in the second type of table to be translated into a second translation result of the target language through the translation model, so that a translation webpage corresponding to the webpage to be translated is generated according to the first translation result and the second translation result, and the purpose of effectively and quickly translating diversified webpages is achieved.
Drawings
FIG. 1 is a flowchart of a method of one embodiment of a web page translation method of the present invention;
FIG. 2 is a flowchart of the present invention for generating a translated web page based on a first translation result and a second translation result;
FIG. 3 is a block diagram of an embodiment of a web page translation apparatus according to the present invention;
FIG. 4 is a schematic diagram of a hardware architecture of an embodiment of a computer device according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that the embodiments and features of the embodiments may be combined with each other without conflict.
The invention provides a webpage translation method, a webpage translation device, computer equipment and a storage medium, which are suitable for the business fields of finance, medical treatment, banking, leasing, insurance and the like. The invention can be compatible with website translation with fixed content and variable content, can extract webpage data in a webpage to be translated and webpage characteristics corresponding to the webpage data, classifies the webpage data according to the webpage characteristics to further determine the translation category of the webpage data, and stores the webpage data in a first category table to be translated or a second category table to be translated so as to translate the webpage data in a targeted manner and improve the translation efficiency; and selecting corresponding translation modes according to different types of webpage data to translate the data, and translating the webpage data in the second type of table to be translated into a second translation result of the target language through the translation model, so that a translation webpage corresponding to the webpage to be translated is generated according to the first translation result and the second translation result, and the purpose of effectively and quickly translating diversified webpages is achieved.
Example one
Referring to FIG. 1, the web page translation method of the present embodiment includes the following steps:
S1, acquiring at least one webpage to be translated and a translation request associated with the webpage to be translated.
In this embodiment, the web page to be translated may be a credit evaluation web page, a fixed income web page, a financial summary web page, a business collection web page, and the like, and the category of the web page to be translated is not particularly limited. When a user accesses a webpage to be translated through a browser of a client, a translation button can be triggered to send a translation request to a server.
S2, extracting the webpage data of each webpage to be translated and the webpage features corresponding to the webpage data.
The webpage features are used for identifying the state of the webpage data, and the webpage features comprise dynamic features and static features.
In this embodiment, after a translation request is received, the webpage data of the current webpage to be translated is extracted. The webpage data includes two types, structured data and unstructured data. The structured data includes static data such as general data, regional data, and industrial data; the unstructured data includes dynamic data such as announcement data, bond data, and public opinion data. The webpage features corresponding to the structured data are static features, and the webpage features corresponding to the unstructured data are dynamic features. The unstructured data may be stored in a cloud server, in which case the webpage data extracted in this embodiment is the storage path of the unstructured data.
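As an illustrative, non-limiting sketch of step S2, the following Python code extracts webpage data together with its webpage feature. The description does not specify how the features are marked in the page, so the `data-kind` and `data-path` attributes, the `WebpageItem` type and the `extract_items` function are all hypothetical names introduced only for this example.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import List, Optional


@dataclass
class WebpageItem:
    """One piece of extracted webpage data plus its webpage feature."""
    feature: str                        # "static" or "dynamic" (hypothetical labels)
    text: str = ""                      # inline text, relevant for static data
    storage_path: Optional[str] = None  # where dynamic data is stored, if given


class ItemExtractor(HTMLParser):
    """Collects elements that carry the hypothetical data-kind attribute."""

    def __init__(self) -> None:
        super().__init__()
        self.items: List[WebpageItem] = []
        self._current: Optional[WebpageItem] = None

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        kind = attr_map.get("data-kind")
        if kind in ("static", "dynamic"):
            self._current = WebpageItem(
                feature=kind, storage_path=attr_map.get("data-path")
            )
            self.items.append(self._current)

    def handle_data(self, data):
        # Text only counts while a marked element is open.
        if self._current is not None:
            self._current.text += data.strip()

    def handle_endtag(self, tag):
        # Simplification: any closing tag ends the current item (no nesting).
        self._current = None


def extract_items(html: str) -> List[WebpageItem]:
    parser = ItemExtractor()
    parser.feed(html)
    parser.close()
    return parser.items
```

The sketch deliberately ignores nested markup; a production extractor would need to track element depth.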
S3, classifying the webpage data according to the webpage features of the webpage data, and storing the webpage data in a first category to-be-translated table or a second category to-be-translated table according to the category of the webpage data.
Further, step S3 may include:
S31, identifying the state of the webpage features.
S32, when the webpage features of the webpage data are static features, storing the webpage data in the first category to-be-translated table.
S33, when the webpage features of the webpage data are dynamic features, acquiring a storage path of the webpage data, and storing the storage path in the second category to-be-translated table.
In this embodiment, the first category to-be-translated table stores structured data, and the second category to-be-translated table stores the storage paths of unstructured data. The webpage features may include features such as data labels and label types. Whether a piece of webpage data is structured data (static feature) or unstructured data (dynamic feature) is determined from its label type: if it is structured data, the webpage data itself is stored in the first category to-be-translated table; if it is unstructured data, the storage path of the webpage data is stored in the second category to-be-translated table.
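Steps S31 to S33 can be sketched as a simple routing function. This is a minimal example under stated assumptions: the two tables are modelled as in-memory dictionaries rather than database tables, and the names `Item`, `PendingTables` and `classify` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Item:
    item_id: str
    feature: str                        # "static" or "dynamic"
    text: Optional[str] = None          # content of structured (static) data
    storage_path: Optional[str] = None  # location of unstructured (dynamic) data


@dataclass
class PendingTables:
    # First category table: item id -> webpage data itself.
    first_category: Dict[str, str] = field(default_factory=dict)
    # Second category table: item id -> storage path of the webpage data.
    second_category: Dict[str, str] = field(default_factory=dict)


def classify(items: List[Item]) -> PendingTables:
    """Route each item to a to-be-translated table based on its feature."""
    tables = PendingTables()
    for item in items:
        if item.feature == "static":
            tables.first_category[item.item_id] = item.text or ""
        elif item.feature == "dynamic":
            if item.storage_path is None:
                raise ValueError(f"dynamic item {item.item_id!r} has no storage path")
            tables.second_category[item.item_id] = item.storage_path
        else:
            raise ValueError(f"unknown webpage feature: {item.feature!r}")
    return tables
```

Storing only the path for dynamic data keeps the second table small and defers loading the content until translation time.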
S4, translating, according to the translation request, the webpage data in the first category to-be-translated table of the webpage to be translated into a first translation result in the target language.
Further, step S4 may include: translating, according to the translation request, the webpage data in the first category to-be-translated table of the webpage to be translated into the first translation result in the target language by adopting an I18N translation mode, an L10N translation mode, a G11N translation mode or an M17N translation mode.
In this embodiment, the first category to-be-translated table is used for storing structured data. For the translation of the structured data, an I18N translation mode, an L10N translation mode, or a G11N translation mode may be used. I18N abbreviates "internationalization": I and N are the first and last letters of the word, and 18 is the number of letters between them. The I18N translation mode supports multiple languages, but at any one time it supports only English plus one selected language, such as English + Chinese, English + German, or English + Korean. The I18N translation mode separates out the language files and then translates the separated language files through the gettext software package. The L10N translation mode supports two languages, English and one other language (e.g., Chinese). The G11N translation mode can be understood as G11N = I18N + L10N. The M17N translation mode can support multiple languages at the same time, for example showing English, Chinese, German, and Korean together on one page.
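The catalog-lookup style of translation used for structured data can be illustrated with Python's standard `gettext` module. This is only a sketch: a real deployment would normally load compiled catalog files with `gettext.translation`, whereas here an in-memory catalog is assumed so that the example runs on its own. `DictTranslations`, `ZH_CN` and `translate_static` are hypothetical names, and the catalog entries are illustrative.

```python
import gettext
from typing import Dict


class DictTranslations(gettext.NullTranslations):
    """In-memory stand-in for a compiled message catalog (illustrative only)."""

    def __init__(self, catalog: Dict[str, str]) -> None:
        super().__init__()
        self._messages = dict(catalog)

    def gettext(self, message: str) -> str:
        # Fall back to the source string when no translation is configured.
        return self._messages.get(message, message)


# Hypothetical fixed strings for a Chinese (zh_CN) target language.
ZH_CN = DictTranslations({"Fixed income": "固定收益", "Region": "地区"})


def translate_static(
    table: Dict[str, str], translations: gettext.NullTranslations
) -> Dict[str, str]:
    """Translate every entry of the first category table by catalog lookup."""
    return {item_id: translations.gettext(text) for item_id, text in table.items()}
```

Entries missing from the catalog are returned unchanged, which mirrors the limitation noted in the Background: content that nobody has configured in advance is not translated by this mode.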
S5, translating, according to the translation request, the webpage data in the second category to-be-translated table into a second translation result in the target language by adopting a translation model.
Further, step S5 may include:
S51, extracting the webpage data according to the storage path of the webpage data in the second category to-be-translated table, and matching the webpage data one by one against the translation data in a translation result table.
S52, if the webpage data is matched in the translation result table, taking the data in the translation result table that matches the webpage data as the second translation result.
S53, if not, translating the webpage data into the second translation result in the target language by adopting the translation model according to the translation request.
In this embodiment, a translation result table is stored in the database and is used to store historical translation result data. When the content of the webpage data matches data in the translation result table, the matched translation result can be used directly as the second translation result, and the translation model does not need to be run on that webpage data. Only the webpage data that does not yet exist in the translation result table is screened out for translation, in an incremental deduplication manner, which can effectively improve translation efficiency.
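The lookup-before-translate behaviour of steps S51 to S53 can be sketched as follows. The sketch assumes a single target language, models the translation result table as a dictionary keyed by source text, and writes new model output back into the table; the write-back is an assumption consistent with the table holding historical results, not something the description states explicitly. `translate_dynamic` and `model_translate` are hypothetical names.

```python
from typing import Callable, Dict, List


def translate_dynamic(
    texts: List[str],
    result_table: Dict[str, str],
    model_translate: Callable[[str], str],
) -> List[str]:
    """Return translations, calling the model only for unseen texts."""
    results = []
    for text in texts:
        cached = result_table.get(text)
        if cached is None:
            # No historical result: fall back to the translation model.
            cached = model_translate(text)
            # Assumed write-back so that later requests hit the table.
            result_table[text] = cached
        results.append(cached)
    return results
```

With this arrangement a text that appears on many pages, or repeatedly on one page, costs at most one model call.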
Specifically, the translation model comprises an encoder and a decoder.
The encoder acquires a word vector for each piece of data in the webpage data and encodes the word vectors to obtain a hidden state representation corresponding to each processed word vector.
The decoder generates the second translation result according to the hidden state representations corresponding to the word vectors.
In this embodiment, the encoder may employ an LSTM model and the decoder may employ an RNN model. The webpage data is converted into word embeddings, and the word embeddings are input into a bidirectional LSTM encoder to extract text features. The text features are input into a Bernoulli variant layer, and the outputs of the variant layers are combined and input into a batch normalization layer to obtain a latent semantic distribution, which is sampled to obtain a latent semantic code. At the same time, an attention mechanism is added on the bidirectional RNN encoder, and the latent semantic code and the attention vector are input into the RNN decoder to obtain the probability distribution over target-language translations.
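The description does not state which scoring function the attention mechanism uses. Purely as an illustration, the following sketch computes an attention vector with dot-product scoring, which is one common choice and an assumption here; `softmax` and `attention_context` are hypothetical helper names, and plain Python lists stand in for the hidden state tensors of a real model.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_context(
    decoder_state: Sequence[float],
    encoder_states: Sequence[Sequence[float]],
) -> List[float]:
    """Weighted sum of encoder hidden states (dot-product scoring assumed)."""
    # Score each encoder hidden state against the current decoder state.
    scores = [
        sum(d * h for d, h in zip(decoder_state, state))
        for state in encoder_states
    ]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    # The attention vector is the weight-averaged encoder hidden state.
    return [
        sum(w * state[i] for w, state in zip(weights, encoder_states))
        for i in range(dim)
    ]
```

In the arrangement described above, a vector of this kind would be passed to the decoder together with the latent semantic code at each decoding step.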
S6, generating a translation webpage corresponding to the webpage to be translated according to the first translation result and the second translation result.
Further, referring to FIG. 2, step S6 may include:
S61, extracting the webpage attributes of the webpage to be translated.
The webpage attributes comprise information such as character codes of the webpage, language statements of the webpage, height of a visible region of the webpage, width of the visible region of the webpage, height of full text of the webpage, width of the full text of the webpage, and vertical distance and horizontal distance of the current page relative to the upper left corner of a window display region.
S62, constructing a webpage template corresponding to the webpage to be translated based on the webpage attributes.
S63, adding the data in the first translation result and the data in the second translation result to the webpage template to obtain the translation webpage corresponding to the webpage to be translated.
In this embodiment, a webpage template that reproduces the visual effect of the webpage to be translated is constructed from its webpage attributes, and each type of translation result is added at the corresponding position of the webpage template to obtain the translated webpage. Because the template is built on the original interface of the webpage to be translated, the translated webpage stays consistent with, and closely conforms to, the original layout. This improves the visual experience of users of the translated webpage, preserves the correspondence between the translated content and the original content, and makes the webpage easier to read.
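Steps S61 to S63 can be sketched with Python's standard `string.Template`. This is a minimal example under stated assumptions: only two of the listed webpage attributes (character encoding and language declaration) are used, each translated item is given its own placeholder slot, and `build_template` and `render` are hypothetical names. Slot identifiers must be valid `string.Template` identifiers (ASCII letters, digits and underscores).

```python
import html
from string import Template
from typing import Dict, List


def build_template(charset: str, lang: str, slot_ids: List[str]) -> Template:
    """Build a page skeleton with one ${slot} placeholder per item."""
    body = "\n".join(f'<div id="{slot}">${{{slot}}}</div>' for slot in slot_ids)
    return Template(
        f'<html lang="{lang}"><head><meta charset="{charset}"></head>'
        f"<body>\n{body}\n</body></html>"
    )


def render(
    template: Template,
    first_result: Dict[str, str],
    second_result: Dict[str, str],
) -> str:
    """Fill the template with both translation results, escaping the text."""
    merged = {**first_result, **second_result}
    escaped = {slot: html.escape(text) for slot, text in merged.items()}
    # substitute() raises KeyError if a slot has no translation.
    return template.substitute(escaped)
```

Layout attributes such as the visible region size and scroll offsets are omitted here; a fuller template would carry them over as well.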
In this embodiment, the webpage translation method is compatible with translating websites with fixed content as well as websites with variable content. It extracts the webpage data in the webpage to be translated and the webpage features corresponding to the webpage data, classifies the webpage data according to the webpage features to determine the translation category to which the webpage data belongs, and stores the webpage data in a first category to-be-translated table or a second category to-be-translated table, so that the webpage data can be translated in a targeted manner and translation efficiency is improved. A corresponding translation mode is then selected for each type of webpage data, and the webpage data in the second category to-be-translated table is translated into a second translation result in the target language by the translation model, so that a translation webpage corresponding to the webpage to be translated is generated from the first translation result and the second translation result and diverse webpages can be translated effectively and quickly. The webpage translation method of this embodiment is not limited by the I18N technology: the translation model broadens the types of webpage to which the method applies and increases the flexibility of webpage translation.
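The overall flow of steps S3 to S6 can be summarised in a short, non-limiting sketch. The three callables passed in stand for the catalog-based translation, the loading of unstructured data from its storage path, and the translation model; these callables, the `Item` tuple layout and the `translate_page` function are hypothetical and introduced only for illustration.

```python
from typing import Callable, Dict, List, Tuple

# (item_id, feature, payload): payload is the text for static data,
# or the storage path for dynamic data.
Item = Tuple[str, str, str]


def translate_page(
    items: List[Item],
    translate_static: Callable[[str], str],
    load_from_path: Callable[[str], str],
    translate_with_model: Callable[[str], str],
) -> Dict[str, str]:
    """Return item id -> translated text, ready to place into a template."""
    first_table: Dict[str, str] = {}
    second_table: Dict[str, str] = {}

    # S3: route each item by its webpage feature.
    for item_id, feature, payload in items:
        if feature == "static":
            first_table[item_id] = payload
        else:
            second_table[item_id] = payload

    # S4: first translation result from the fixed-content translation mode.
    first_result = {i: translate_static(t) for i, t in first_table.items()}

    # S5: second translation result from the translation model.
    second_result = {
        i: translate_with_model(load_from_path(p)) for i, p in second_table.items()
    }

    # S6 (partial): merge both results for insertion into the webpage template.
    return {**first_result, **second_result}
```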
Example two
Referring to FIG. 3, a web page translation apparatus 1 of the present embodiment includes: an acquisition unit 11, an extraction unit 12, a classification unit 13, a first translation unit 14, a second translation unit 15 and a generation unit 16.
The acquisition unit 11 is configured to acquire at least one to-be-translated web page and a translation request associated with the to-be-translated web page.
In this embodiment, the web page to be translated may be a credit evaluation web page, a fixed income web page, a financial summary web page, a business collection web page, and the like, and the category of the web page to be translated is not particularly limited. When a user accesses a webpage to be translated through a browser of a client, a translation button can be triggered to send a translation request to a server.
The extracting unit 12 is configured to extract web page data of each to-be-translated web page and web page features corresponding to the web page data.
The webpage features are used for identifying the state of the webpage data, and the webpage features comprise dynamic features and static features.
In this embodiment, after a translation request is received, the webpage data of the current webpage to be translated is extracted. The webpage data includes two types, structured data and unstructured data. The structured data includes static data such as general data, regional data, and industrial data; the unstructured data includes dynamic data such as announcement data, bond data, and public opinion data. The webpage features corresponding to the structured data are static features, and the webpage features corresponding to the unstructured data are dynamic features. The unstructured data may be stored in a cloud server, in which case the webpage data extracted in this embodiment is the storage path of the unstructured data.
The classification unit 13 is configured to classify the web page data according to the web page features of the web page data, and store the web page data in a first category to-be-translated table or a second category to-be-translated table according to a category of the web page data.
Further, the classification unit 13 is configured to identify a state of the web page feature. When the webpage features of the webpage data are static features, storing the webpage data in a first category table to be translated. And when the webpage features of the webpage data are dynamic features, acquiring a storage path of the webpage data, and storing the storage path in a second category to-be-translated table.
In this embodiment, the first category to-be-translated table stores structured data, and the second category to-be-translated table stores the storage paths of unstructured data. The webpage features may include features such as data labels and label types. Whether a piece of webpage data is structured data (static feature) or unstructured data (dynamic feature) is determined from its label type: if it is structured data, the webpage data itself is stored in the first category to-be-translated table; if it is unstructured data, the storage path of the webpage data is stored in the second category to-be-translated table.
The first translation unit 14 is configured to translate, according to the translation request, the web page data in the first category to-be-translated table of the web page to be translated into a first translation result in the target language.
Further, the first translation unit 14 is configured to translate the web page data in the first category to-be-translated table of the web page to be translated into the first translation result of the target language according to the translation request by adopting an I18N translation manner, an L10N translation manner, a G11N translation manner or an M17N translation manner.
In this embodiment, the first category to-be-translated table is used for storing structured data, and the structured data may be translated in an I18N translation manner, an L10N translation manner, or a G11N translation manner. I18N is an abbreviation of "internationalization": I and N are the first and last letters of the word, and 18 is the number of letters between them. The I18N translation manner supports multiple languages, but only English and one selected language at the same time, such as English + Chinese, English + German, or English + Korean. The I18N translation manner separates out the language files and then translates the separated language files through the gettext software package. The L10N translation manner supports two languages, English and one other language (e.g., Chinese). The G11N translation manner can be understood as G11N = I18N + L10N. The M17N translation manner can support multiple languages at the same time, for example showing English, Chinese, German, and Korean together on one page.
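As an illustration of the naming scheme and of catalog-based translation of structured data, consider the sketch below. It is an assumption-laden example: the `CATALOGS` dictionary is a hypothetical stand-in for the separated language files, and a plain dictionary lookup is used in place of the gettext software package so that the example runs without any catalog files.

```python
def numeronym(word: str) -> str:
    """First letter + number of letters in between + last letter, upper-cased."""
    if len(word) < 3:
        return word.upper()
    return f"{word[0]}{len(word) - 2}{word[-1]}".upper()


# Hypothetical per-language message catalogs (stand-ins for language files).
CATALOGS = {
    "zh": {"Industry": "行业"},
    "de": {"Industry": "Branche"},
}


def translate_static(message: str, target_language: str) -> str:
    """Look up a structured-data string; fall back to the source text."""
    return CATALOGS.get(target_language, {}).get(message, message)
```

For example, `numeronym("internationalization")` yields `I18N`, and a message that is missing from the catalog is returned unchanged, which mirrors the usual fallback behavior of message-catalog systems.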
The second translation unit 15 is configured to translate, according to the translation request, the web page data in the second category to-be-translated table into a second translation result in the target language by using a translation model.
Further, the second translation unit 15 is configured to extract the web page data according to the storage path of the web page data in the second category to-be-translated table, and to match the web page data against the translation data in the translation result table one by one. If a match is found, the data in the translation result table that matches the web page data is taken as the second translation result. If no match is found, the web page data is translated into a second translation result in the target language by using the translation model according to the translation request.
In this embodiment, a translation result table is stored in the database and is used to store historical translation result data. When the content of the web page data matches data in the translation result table, the matched translation result can be used directly as the second translation result, and the translation model does not need to be run on that web page data. This incremental de-duplication screens out only the web page data that does not yet exist in the translation result table for translation, which can effectively improve translation efficiency.
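The incremental de-duplication described above amounts to a lookup in the translation result table before the model is called. A minimal sketch follows, assuming the table can be modeled as a dictionary keyed by source text and that `model_translate` is a hypothetical callable wrapping the translation model; writing new results back to the table is also an assumption of this sketch.

```python
from typing import Callable, Dict, List


def translate_incremental(
    texts: List[str],
    result_table: Dict[str, str],
    model_translate: Callable[[str], str],
) -> List[str]:
    """Reuse historical translations; call the model only for unseen text."""
    results = []
    for text in texts:
        if text not in result_table:
            # Not in the translation result table: translate and record it.
            result_table[text] = model_translate(text)
        results.append(result_table[text])
    return results
```

If several target languages share one table, the key would need to include the target language as well; the embodiment does not say how the table is keyed.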
In particular, the translation model comprises an encoder and a decoder.
The encoder acquires a word vector for each data item in the web page data and encodes the word vectors to obtain a processed hidden state representation corresponding to each word vector.
The decoder generates the second translation result according to the hidden state representations corresponding to the word vectors.
In this embodiment, the encoder may employ an LSTM model and the decoder may employ an RNN model. The web page data is converted into word embeddings, which are input into a bidirectional LSTM encoder to extract text features. The text features are input into Bernoulli variational layers, the outputs of the variational layers are combined and input into a batch normalization layer to obtain a latent semantic distribution, and the distribution is sampled to obtain a latent semantic code. At the same time, an attention mechanism is added on the bidirectional encoder, and the latent semantic code and the attention vector are input into the RNN decoder to obtain the probability distribution of the target-language translation.
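To make the role of the attention vector concrete, the sketch below computes attention weights and a context vector over a list of encoder hidden states. It covers only the attention step, not the LSTM encoder, the variational layers, or the decoder. The dot-product scoring function is an assumption, since this embodiment does not state which scoring function is used.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_context(
    decoder_state: List[float],
    encoder_states: List[List[float]],
) -> Tuple[List[float], List[float]]:
    """Dot-product attention: weights over encoder states and their weighted sum."""
    scores = [
        sum(d * h for d, h in zip(decoder_state, state))
        for state in encoder_states
    ]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [
        sum(w * state[i] for w, state in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context
```

The weights sum to one, so the context vector is a convex combination of the encoder hidden states, weighted toward the states most similar to the current decoder state.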
The generating unit 16 is configured to generate a translated web page corresponding to the web page to be translated according to the first translation result and the second translation result.
Further, the generating unit 16 is configured to extract the web page attributes of the web page to be translated, construct a web page template corresponding to the web page to be translated based on the web page attributes, and add the data in the first translation result and the data in the second translation result to the web page template to obtain the translated web page corresponding to the web page to be translated.
The web page attributes include information such as the character encoding of the web page, the language declaration of the web page, the height and width of the visible region of the web page, the height and width of the full text of the web page, and the vertical and horizontal distances of the current page relative to the upper left corner of the window display region.
In this embodiment, a web page template that reproduces the visual effect of the original page is constructed according to the web page attributes of the web page to be translated, and the translation results of each type are added to the corresponding positions of the web page template to obtain the translated web page. Because the template is built on the original interface of the web page to be translated, the translated page stays consistent with and closely matches the original, which improves the visual experience of the user, preserves the correspondence between the translated content and the original content, and makes the page easier to read.
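The template step can be sketched as below. This is a simplified illustration under stated assumptions: the attribute keys (`lang`, `charset`, `width`), the HTML skeleton, and the rule of emitting one paragraph per translated item in order are all hypothetical, whereas the embodiment places each result at its corresponding position in the original layout.

```python
from html import escape
from string import Template
from typing import Dict, List

# Hypothetical page skeleton driven by a few of the extracted attributes.
PAGE_TEMPLATE = Template(
    "<!DOCTYPE html>\n"
    '<html lang="$lang">\n'
    '<head><meta charset="$charset"></head>\n'
    '<body style="width:${width}px">\n'
    "$body\n"
    "</body>\n"
    "</html>"
)


def build_translated_page(
    attributes: Dict[str, str],
    first_result: List[str],
    second_result: List[str],
) -> str:
    """Fill the template with both translation results, escaping the text."""
    paragraphs = [f"<p>{escape(text)}</p>" for text in first_result + second_result]
    return PAGE_TEMPLATE.substitute(
        lang=attributes["lang"],
        charset=attributes["charset"],
        width=attributes["width"],
        body="\n".join(paragraphs),
    )
```

Escaping the translated text before insertion keeps markup characters in a translation from breaking the generated page.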
In this embodiment, the web page translation apparatus 1 is compatible with translating websites that have both fixed content and variable content. It extracts the web page data in a web page to be translated and the web page features corresponding to the web page data, classifies the web page data according to the web page features to determine the translation category to which the web page data belongs, and stores the web page data in the first category to-be-translated table or the second category to-be-translated table, so that the web page data can be translated in a targeted manner and translation efficiency is improved. A corresponding translation manner is selected for each type of web page data, and the web page data in the second category to-be-translated table is translated into a second translation result in the target language through the translation model, so that a translated web page corresponding to the web page to be translated is generated according to the first translation result and the second translation result, achieving effective and fast translation of diverse web pages. The web page translation apparatus 1 of this embodiment is not limited by the I18N technology: the translation model expands the types of web pages that can be handled, thereby increasing the flexibility of page translation.
EXAMPLE III
In order to achieve the above object, the present invention further provides a computer device 2. There may be a plurality of computer devices 2, and the components of the web page translation apparatus 1 according to the second embodiment may be distributed across different computer devices 2. The computer device 2 may be a smart phone, a tablet computer, a notebook computer, a desktop computer, a rack server, a blade server, a tower server, or a cabinet server (including an independent server or a server cluster formed by a plurality of servers) that executes a program, or the like. The computer device 2 of this embodiment includes at least, but is not limited to: a memory 21, a processor 23, a network interface 22, and the web page translation apparatus 1 (refer to fig. 4), which are communicatively connected to each other through a system bus.
In this embodiment, the memory 21 includes at least one type of computer-readable storage medium, such as a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, and the like. In some embodiments, the memory 21 may be an internal storage unit of the computer device 2, such as a hard disk or a memory of the computer device 2. In other embodiments, the memory 21 may also be an external storage device of the computer device 2, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash memory card (Flash Card) provided on the computer device 2. Of course, the memory 21 may also comprise both an internal storage unit of the computer device 2 and an external storage device thereof. In this embodiment, the memory 21 is generally used for storing the operating system and the various application software installed in the computer device 2, such as the program code of the web page translation method in the first embodiment. Further, the memory 21 may also be used to temporarily store various types of data that have been output or are to be output.
In some embodiments, the processor 23 may be a Central Processing Unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 23 is typically used to control the overall operation of the computer device 2, for example to perform control and processing related to data interaction or communication with the computer device 2. In this embodiment, the processor 23 is configured to run the program code stored in the memory 21 or to process data, for example to run the web page translation apparatus 1.
The network interface 22 may comprise a wireless network interface or a wired network interface, and is typically used to establish a communication connection between the computer device 2 and other computer devices 2. For example, the network interface 22 is used to connect the computer device 2 to an external terminal through a network and to establish a data transmission channel and a communication connection between the computer device 2 and the external terminal. The network may be an intranet, the Internet, a Global System for Mobile Communications (GSM) network, a Wideband Code Division Multiple Access (WCDMA) network, a 4G network, a 5G network, Bluetooth, Wi-Fi, or another wireless or wired network.
It is noted that fig. 4 only shows the computer device 2 with components 21-23, but it is to be understood that not all of the shown components are required to be implemented, and that more or fewer components may be implemented instead.
In this embodiment, the web page translation apparatus 1 stored in the memory 21 can be further divided into one or more program modules, and the one or more program modules are stored in the memory 21 and executed by one or more processors (in this embodiment, the processor 23) to complete the present invention.
EXAMPLE IV
To achieve the above object, the present invention also provides a computer-readable storage medium, which may be any of a plurality of storage media such as a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, a server, an App application store, etc., on which a computer program is stored that implements the corresponding functions when executed by the processor 23. The computer-readable storage medium of this embodiment is used for storing the web page translation apparatus 1, which, when executed by the processor 23, implements the web page translation method of the first embodiment.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A method for web page translation, comprising:
acquiring at least one webpage to be translated and a translation request associated with the webpage to be translated;
extracting webpage data of each webpage to be translated and webpage features corresponding to the webpage data;
classifying the webpage data according to the webpage features of the webpage data, and storing the webpage data in a first category to-be-translated table or a second category to-be-translated table according to the category of the webpage data;
according to the translation request, translating the webpage data in the first category to-be-translated table of the webpage to be translated into a first translation result of a target language;
according to the translation request, translating the webpage data in the second category to-be-translated table into a second translation result of the target language by adopting a translation model;
and generating a translation webpage corresponding to the webpage to be translated according to the first translation result and the second translation result.
2. The web page translation method according to claim 1, wherein the translating, according to the translation request, the web page data in the first category to-be-translated table of the web page to be translated into the first translation result of the target language comprises:
and translating the webpage data in the first category to-be-translated table of the webpage to be translated into a first translation result of a target language by adopting an I18N translation mode, an L10N translation mode or a G11N translation mode according to the translation request.
3. The web page translation method according to claim 1, wherein the web page features are used for identifying a state to which the web page data belongs, and the web page features include dynamic features and static features;
the classifying the webpage data according to the webpage features of the webpage data, and storing the webpage data in a first category to-be-translated table or a second category to-be-translated table according to the category of the webpage data includes:
identifying a state of the web page feature;
when the webpage features of the webpage data are static features, storing the webpage data in the first category to-be-translated table;
and when the webpage features of the webpage data are dynamic features, acquiring a storage path of the webpage data, and storing the storage path in a second category to-be-translated table.
4. The web page translation method according to claim 3, wherein the translating, according to the translation request, the web page data in the second category table to be translated into the second translation result of the target language by using a translation model, comprises:
extracting the webpage data according to the storage path of the webpage data in the second category to-be-translated table, and matching the webpage data with translation data in a translation result table one by one;
if a match is found, taking the data in the translation result table that matches the webpage data as the second translation result;
and if not, translating the webpage data into a second translation result of the target language by adopting the translation model according to the translation request.
5. The web page translation method of claim 4, wherein the translation model comprises an encoder and a decoder;
the encoder acquires a word vector of each data item in the webpage data, and encodes the word vector to obtain a processed hidden state representation corresponding to each word vector;
and the decoder generates the second translation result according to the hidden state representation corresponding to the word vector.
6. The web page translation method according to claim 1, wherein the generating a translated web page corresponding to the web page to be translated according to the first translation result and the second translation result comprises:
extracting the webpage attribute of the webpage to be translated;
constructing a webpage template corresponding to the webpage to be translated based on the webpage attributes;
and adding the data in the first translation result and the data in the second translation result to the webpage template to obtain a translation webpage corresponding to the webpage to be translated.
7. A web page translation apparatus, comprising:
the acquisition unit is used for acquiring at least one webpage to be translated and a translation request associated with the webpage to be translated;
the extraction unit is used for extracting the webpage data of each webpage to be translated and the webpage features corresponding to the webpage data;
the classification unit is used for classifying the webpage data according to the webpage features of the webpage data and storing the webpage data in a first category to-be-translated table or a second category to-be-translated table according to the category of the webpage data;
the first translation unit is used for translating the webpage data in the first category to-be-translated table of the webpage to be translated into a first translation result of a target language according to the translation request;
the second translation unit is used for translating the webpage data in the second category to-be-translated table into a second translation result of the target language by adopting a translation model according to the translation request;
and the generating unit is used for generating a translation webpage corresponding to the webpage to be translated according to the first translation result and the second translation result.
8. The web page translation apparatus according to claim 7, wherein the first translation unit is configured to translate web page data in the first category to-be-translated table of the web page to be translated into the first translation result of the target language according to the translation request by using an I18N translation manner, an L10N translation manner, or a G11N translation manner.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method of any one of claims 1 to 6 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 6.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210364711.1A CN114756795A (en) 2022-04-07 2022-04-07 Webpage translation method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210364711.1A CN114756795A (en) 2022-04-07 2022-04-07 Webpage translation method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114756795A true CN114756795A (en) 2022-07-15

Family

ID=82328806

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210364711.1A Pending CN114756795A (en) 2022-04-07 2022-04-07 Webpage translation method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114756795A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination