WO2022134577A1

WO2022134577A1 - Translation error identification method and apparatus, and computer device and readable storage medium

Info

Publication number: WO2022134577A1
Application number: PCT/CN2021/109257
Authority: WO
Inventors: 刘丽珍
Original assignee: 深圳壹账通智能科技有限公司
Priority date: 2020-12-22
Filing date: 2021-07-29
Publication date: 2022-06-30
Also published as: CN112667208A

Abstract

The present application relates to the field of computer development. Disclosed are a translation error identification method and apparatus, and a computer device and a readable storage medium. The translation error identification method comprises: acquiring second text of a first page, wherein the second text is obtained by translating first text; identifying a grammatical error object including a grammatical error in the second text and a content error object including a content error in the second text; identifying, in a DOM tree of the first page, a specified label corresponding to the grammatical error object and the content error object, and setting the specified label as a target label; and replacing the target label with a grammatical content marking label, such that the first page displays the second text according to the grammatical content marking label. In the present application, words and phrases including grammatical errors and content translation errors in second text are accurately identified, so as to automatically identify a part including an error in the second text, thereby improving error identification speed and identification comprehensiveness, reducing the manpower input, and avoiding the omission of errors.

Description

Translation error identification method, apparatus, computer device and readable storage medium

This application claims the priority of the Chinese patent application with the application number CN 202011528714.1 and the title of "Translation Error Recognition Method, Device, Computer Equipment and Readable Storage Medium" filed on December 22, 2020, the entire contents of which are incorporated by reference in this application.

Technical field

The present application relates to the technical field of computer development, and in particular, to a translation error recognition method, device, computer equipment and readable storage medium, which are applied in the technical field of natural language processing of artificial intelligence.

Background

When designing web pages, websites usually provide text in multiple languages (for example, first text in Chinese and second text in English). However, when testers test the pages carrying the second text (English), the page content is very complex, so it is often difficult to identify content errors in the translation of each word and grammatical errors in each paragraph.

The inventors found that even if the administrator devotes a lot of manpower to identifying the errors on the pages of such a website one by one, the process is not only time-consuming and inefficient, but also prone to oversights that leave errors undetected.

Summary of the invention

The purpose of this application is to provide a translation error identification method, device, computer equipment and readable storage medium, to solve the problem in the prior art that using a large amount of manpower to identify page errors one by one is time-consuming and inefficient, and is also prone to oversights that cause erroneous parts to be missed.

In order to achieve the above object, the application provides a translation error identification method for checking the errors that occur in the second text obtained by the translation of the first text, including:

acquiring the second text of the first page, wherein the second text is obtained by translating the first text;

identifying grammatical error objects with grammatical errors in the second text, and content error objects with content errors in the second text;

identifying the specified tags corresponding to the grammatical error objects and the content error objects in the DOM tree of the first page, and setting the specified tags as target tags; and replacing the target tags with grammatical content annotation tags, so that the first page displays the second text according to the grammatical content annotation tags.

In order to achieve the above purpose, the application also provides a translation error identification device, comprising:

a text input module, configured to obtain the second text of the first page, wherein the second text is obtained by translating the first text;

an error recognition module, configured to identify a grammatical error object with a grammatical error in the second text, and a content error object with a content error in the second text;

an error labeling module, configured to identify the specified tag corresponding to the grammatical error object and the content error object in the DOM tree of the first page, and set the specified tag as a target tag; and to replace the target tag with a grammatical content annotation tag, so that the first page displays the second text according to the grammatical content annotation tag.

In order to achieve the above object, the present application also provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the steps of the above translation error identification method are implemented when the processor of the computer device executes the computer program.

In order to achieve the above purpose, the present application also provides a computer-readable storage medium on which a computer program is stored, wherein the steps of the above translation error identification method are implemented when the computer program stored in the readable storage medium is executed by a processor.

The translation error identification method, device, computer equipment and readable storage medium provided by the present application improve the speed and comprehensiveness of error identification, reduce manpower input, and avoid missed errors.

Description of drawings

FIG. 1 is a flowchart of Embodiment 1 of the translation error identification method of the present application;

FIG. 2 is a schematic diagram of the application environment of the translation error identification method in Embodiment 2 of the present application;

FIG. 3 is a flowchart of the specific method of the translation error identification method in Embodiment 2 of the present application;

FIG. 4 is a schematic diagram of the program modules of Embodiment 3, the translation error identification device of the present application;

FIG. 5 is a schematic diagram of the hardware structure of the computer device in Embodiment 4 of the present application.

Detailed description

In order to make the purpose, technical solutions and advantages of the present application more clearly understood, the present application will be described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the present application, but not to limit the present application. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative efforts shall fall within the protection scope of the present application.

The translation error identification method, device, computer equipment and readable storage medium provided by this application are suitable for the technical field of UI design in computer development, and provide a translation error identification method based on a text input module, an error identification module, an error labeling module, a trigger module, a configuration input module, a spelling format module, a spelling format recognition module, and a preloading module. In the present application, a grammatical error object with a grammatical error and a content error object with a content error are identified in the second text; the specified tag corresponding to the grammatical error object and/or the content error object is identified in the DOM tree of the first page and set as the target tag; and the target tag is replaced with a grammatical content annotation tag, so that the first page displays the second text according to the grammatical content annotation tag.

Embodiment 1:

Referring to FIG. 1, a translation error identification method of the present embodiment is used to check the errors occurring in the second text obtained by the translation of the first text, including:

S102: Obtain the second text of the first page, wherein the second text is obtained by translating the first text;

S105: Identifying a grammatical error object with a grammatical error in the second text, and a content error object with a content error in the second text;

S107: Identify the specified tags corresponding to the grammatical error objects and the content error objects in the DOM tree of the first page, and set the specified tags as target tags; replace the target tags with grammatical content annotation tags, causing the first page to display the second text according to the grammatical content annotation tags.

In an exemplary embodiment, the DOM tree is extracted from the first page, the text information corresponding to the preset specified tags is obtained from the DOM tree, and this text information is aggregated to form the second text.

Grammatical error objects with grammatical errors in the second text are identified according to the part of speech of each word in the second text, and content error objects with content errors are identified by matching each word in the second text against the words in the first text. Words with grammatical errors and content translation errors in the second text are thereby identified accurately, so that the erroneous parts of the second text are identified automatically. This improves the speed and comprehensiveness of error identification, reduces manpower input, and avoids missed errors, which in turn helps to avoid the situation where users whose mother tongue is the language of the second text cannot accurately grasp the information conveyed by the second text.

The grammatical error objects and content error objects are annotated by identifying their corresponding specified tags in the DOM tree of the first page and replacing those tags with grammatical content annotation tags. This not only locates the erroneous positions of the second text in the first page accurately, but also avoids the pollution of the DOM tree, and the resulting interference with the positioning of Elements in the DOM, that would occur if the grammatical error objects and content error objects were marked directly in the first page by highlight replacement.

Embodiment 2:

This embodiment is a specific application scenario of the above-mentioned Embodiment 1. Through this embodiment, the method provided in this application can be described more clearly and specifically.

Next, the method provided in this embodiment is described in detail, taking as an example a server that runs the translation error identification method, identifies the grammatical error objects with grammatical errors and the content error objects with content errors in the second text, and annotates the grammatical error objects and/or the content error objects. It should be noted that this embodiment is only exemplary, and does not limit the protection scope of the embodiments of this application.

FIG. 2 schematically shows the application environment of the translation error identification method according to the second embodiment of the present application.

In an exemplary embodiment, the server 2 on which the translation error identification method runs is connected to the client 4 through a network 3. The server 2 may provide services through one or more networks 3. The network 3 may include various network devices, such as routers, switches, multiplexers, hubs, modems, bridges, repeaters, firewalls, proxy devices and/or the like. The network 3 may include physical links, such as coaxial cable links, twisted pair cable links, fiber optic links, combinations thereof, and/or the like. The network 3 may include wireless links, such as cellular links, satellite links, Wi-Fi links and/or the like. The client 4 may be a computer device such as a smartphone, tablet, laptop or desktop computer.

FIG. 3 is a flowchart of a specific method of a translation error identification method provided by an embodiment of the present application, and the method specifically includes steps S201 to S208.

S201: Respond to the browser completing the loading of the first page carrying the second text.

Running the method while the page is loading would take up part of the browser's computing power and slow down the loading of the first page. To avoid this, the translation error identification method is triggered only after the browser has finished translating the first text into the second text and has finished loading the first page carrying the second text. This avoids interfering with the browser's translation of the first text and its loading of the second text, and thereby preserves the translation and loading speed of the first page.

In addition, when the administrator has corrected the errors in the second text found by the translation error identification method, and causes the browser to load the first page carrying the second text again, the translation error identification method is triggered again. This ensures that the administrator can check the modified second text for errors again, thereby improving the efficiency of identifying and correcting errors in the second text.

The browser runs on the client, and is used to display the first page, so that the developer can observe and use it.
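The trigger described in S201 can be sketched as follows. This is a minimal illustration, not the patented implementation: the helper name runAfterLoad and the callback are hypothetical, and the sketch assumes the page signals completion through the standard "load" event.

```javascript
// Hypothetical helper: run `check` once, after `target` fires its "load" event.
// In a browser, `target` would be `window`; any EventTarget works.
function runAfterLoad(target, check) {
  // { once: true } removes the listener after the first call, so one page load
  // triggers one check; a reload registers the listener again.
  target.addEventListener("load", () => check(), { once: true });
}
```

In a browser this would be called as runAfterLoad(window, startCheck), where startCheck is whatever function begins the error identification, so that the check never competes with page loading.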

S202: Obtain the second text of the first page, where the second text is obtained by translating the first text.

In this step, the DOM tree is extracted from the first page, the text information corresponding to the preset specified tags is obtained from the DOM tree, and this text information is aggregated to form the second text.

In an exemplary embodiment, the DOM tree is extracted from the first page through the document.body.innerHTML code, and the BODY part of the DOM tree is obtained.

The specified tags are HTML tags set as required. An HTML tag is a markup tag of the HyperText Markup Language (HTML); tags are the most basic unit and the most important component of HTML (an application of the Standard Generalized Markup Language), and are used to mark up the text, pictures, input boxes, etc. on the first page. For example, the specified tags include <div>, <span>, <a>, <button>, <label>, <ul>, <li> and <input>.

The specified text corresponding to each specified tag in the DOM tree (such as the BODY part of the DOM tree in the above example) is obtained by calling getElementsByTagName("tagName") and reading the innerHTML property of each returned element, and the specified texts are aggregated to obtain the second text.

Further, the data attribute value of the specified tag can be obtained by means of obj.dataset.xx, and the value attribute value of the specified tag can be obtained by means of obj.value. The data attribute value and/or the value attribute value are associated with the specified text to form function text, which reflects the prescribed attributes of the specified text in the second text, so that management personnel can identify and manage the definitions and specifications of the specified text.
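The extraction in S202 can be sketched as below. The function name collectSecondText and the tag list are illustrative assumptions. The sketch reads textContent rather than innerHTML so that only text, not nested markup, is aggregated, and it reads value for input elements as described above.

```javascript
// Illustrative list of specified tags; in practice it is set as required.
const SPECIFIED_TAGS = ["div", "span", "a", "button", "label", "ul", "li", "input"];

// Collect the text of every specified tag under `root` and join it into one string.
// `root` is anything that exposes getElementsByTagName, e.g. `document.body`.
function collectSecondText(root, tags = SPECIFIED_TAGS) {
  const parts = [];
  for (const tag of tags) {
    for (const el of Array.from(root.getElementsByTagName(tag))) {
      // Inputs carry their text in `value`; other tags in `textContent`.
      const text = tag === "input" ? el.value : el.textContent;
      if (text && text.trim() !== "") parts.push(text.trim());
    }
  }
  return parts.join("\n");
}
```

Because a div may contain a span, nested tags can contribute the same text more than once; a fuller implementation would deduplicate or visit only leaf elements.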

S203: Extract the text information in the configuration file of the browser, and load it into the second text;

The text displayed to the user is not limited to the page itself, but also includes the pop-up boxes generated by operating on the page (such as clicking a button on the page). A translation error in the text of a pop-up box could prevent the user from accurately obtaining the information that the pop-up box is meant to convey. Therefore, in this step, the text information in the configuration file of the browser is extracted and loaded into the second text, so that the text displayed in pop-up boxes is also included in the second text. This avoids the problem of the second text being checked incompletely because the pop-up text was left out.

S204: Identify misspelled words and incorrectly formatted characters in the second text, so as to obtain a misspelling object and a format error object, respectively.

In order to identify the erroneous parts of the second text automatically, improve the speed of error identification, and reduce manpower input, and thereby avoid the situation where users whose mother tongue is the language of the second text cannot accurately grasp the information conveyed by the second text, this step identifies misspelled words and incorrectly formatted characters in the second text to obtain misspelling objects and format error objects respectively. This improves the speed and comprehensiveness of error identification, reduces manpower input, and avoids missed errors.

In a preferred embodiment, the step of identifying misspelled words and incorrectly formatted characters in the second text to obtain misspelling objects and format error objects respectively includes:

S41: Create a list of spelling errors and a list of formatting errors.

In this step, a spelling error list "SpellingErrorList" and a format error list "FormatErrorList" are created to manage the spelling errors and format errors occurring in the second text in a standardized way. This makes it possible to annotate spelling errors and format errors with different annotation styles later on, which is convenient for managers to view and distinguish.

In this embodiment, a storage space can be opened in the memory module of the computer device as a storage space for the spelling error list and the format error list.

Preferably, the spelling errors in the spelling error list are deduplicated to reduce the memory occupied by the spelling error list, and the format errors in the format error list are deduplicated to reduce the memory occupied by the format error list.

S42: Perform spelling proofreading on the words in the second text, take words with spelling errors as spelling error objects and save them in the spelling error list.

In this step, the words in the second text are used in turn as the variable of a preset regular expression to obtain a word expression, and the word expression is used to search the dictionary for the word, thereby proofreading the spelling of the words in the second text. If the word does not exist in the dictionary, it is determined that the word has a spelling error; if the word exists in the dictionary, it is determined that the word has no spelling error.

Wherein, a vocabulary list corpus used for natural language processing is used as the dictionary.
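Steps S41 and S42 can be sketched as follows. The function names, the tokenizing pattern and the tiny dictionary used in the usage note are assumptions for illustration; a real system would load a full vocabulary corpus. A Set stands in for the deduplicated spelling error list.

```javascript
// Escape regular-expression metacharacters so a word can be embedded in a pattern.
function escapeRegExp(s) {
  return s.replace(/[-\/\\^$*+?.()|[\]{}]/g, "\\$&");
}

// Return the deduplicated list of words that match no dictionary entry.
// `dictionary` is an array of known words (a stand-in for a vocabulary corpus).
function findSpellingErrors(secondText, dictionary) {
  const spellingErrorList = new Set(); // a Set drops repeated errors automatically
  const words = secondText.match(/[A-Za-z]+(?:'[A-Za-z]+)?/g) || [];
  for (const word of words) {
    // The word is the variable of a preset pattern: whole entry, case-insensitive.
    const wordExpression = new RegExp("^" + escapeRegExp(word) + "$", "i");
    const known = dictionary.some((entry) => wordExpression.test(entry));
    if (!known) spellingErrorList.add(word);
  }
  return Array.from(spellingErrorList);
}
```

For example, findSpellingErrors("Please sumbit the form, then sumbit again", ["please", "submit", "the", "form", "then", "again"]) reports the misspelling "sumbit" once, even though it occurs twice.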

S43: Determine whether the format of the second text complies with a preset format rule, take the words and/or symbols with format errors as format error objects, and save the format error objects in the format error list.

In this step, the format rules include case rules and punctuation rules. For example, the case rule is that the first letter of a word following a carriage return (\r), a line feed (\n) or a tab (\t) is capitalized, and the punctuation rule is that all punctuation marks are punctuation marks of the language of the second text.

The capitalization rules and punctuation rules can be set as required, and the set capitalization rules and punctuation rules are saved to a preset configuration file, so as to facilitate invoking to determine whether the second text has a format error.
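The format check of S43 can be sketched as below, assuming the second text is English. The rule object stands in for the preset configuration file; its name, its fields and the particular set of full-width punctuation marks are hypothetical.

```javascript
// Hypothetical rule set, as it might be read from a configuration file.
const formatRules = {
  // Case rule: a word directly after \r, \n or \t must start with a capital letter.
  capitalizeAfter: /[\r\n\t]([a-z][A-Za-z]*)/g,
  // Punctuation rule for English text: full-width CJK punctuation is not allowed.
  foreignPunctuation: /[，。；：？！、（）《》【】]/g,
};

// Return the deduplicated words and symbols that violate the format rules.
function findFormatErrors(secondText, rules = formatRules) {
  const formatErrorList = new Set();
  for (const m of secondText.matchAll(rules.capitalizeAfter)) {
    formatErrorList.add(m[1]); // the word that should have been capitalized
  }
  for (const m of secondText.matchAll(rules.foreignPunctuation)) {
    formatErrorList.add(m[0]); // the offending punctuation mark
  }
  return Array.from(formatErrorList);
}
```

Keeping the rules in a plain object makes it straightforward to swap in a different rule set for another target language.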

S205: Identify a grammatical error object with a grammatical error in the second text, and a content error object with a content error in the second text.

In order to identify the erroneous parts of the second text automatically, improve the speed of error identification, and reduce manpower input, and thereby avoid the situation where users whose mother tongue is the language of the second text cannot accurately grasp the information conveyed by the second text, this step identifies the grammatical error objects with grammatical errors in the second text according to the part of speech of each word in the second text, and matches each word in the second text against the words in the first text to identify the content error objects with content errors in the second text. Words with grammatical errors and content translation errors in the second text are thereby identified accurately, which improves the speed and comprehensiveness of error identification, reduces manpower input, and avoids missed errors.

In a preferred embodiment, the step of identifying a grammatical error object with a grammatical error in the second text, and identifying a content error object with a content error in the second text, includes:

S51: Perform word segmentation on the second text to obtain text words, and mark the parts of speech of the text words to obtain a sequence of parts of speech arranged according to the order of the text words.

In this step, the paragraphs and sentences of the second text are segmented using a preset word segmentation library, which splits the entire text into word segments that can be tagged, thereby obtaining the text words. The parts of speech of the text words are then marked using a preset dictionary, and the tagged text words are arranged in the order in which they appear in the second text to obtain the part-of-speech sequence.

It should be noted that the third-party lexicon ICTCLAS is used as the word segmentation library to perform word segmentation on the second text. ICTCLAS is the Institute of Computing Technology, Chinese Lexical Analysis System; its main functions include Chinese word segmentation, part-of-speech tagging, named entity recognition, new word recognition, and support for user dictionaries.

The dictionary refers to a dictionary with part-of-speech tags corresponding to the second text, such as The Natural Language Processing Dictionary, The Prolog Dictionary, The Artificial Intelligence Dictionary, The Machine Learning Dictionary, etc.

S52: Determine, through a preset logic rule, whether there is an erroneous logical sequence in the part-of-speech sequence; if so, set the text words corresponding to the erroneous logical sequence as a grammatical error object; if not, determine that the second text has no grammatical errors.

In this step, a preset error grammar template is used as the logic rule, wherein the error grammar template is a part-of-speech error sequence constructed by at least one part-of-speech, for example: verb, verb, verb.

It is determined whether the part-of-speech error sequence occurs in the part-of-speech sequence. If so (for example, three consecutive verbs appear in some segment of the part-of-speech sequence), the text words corresponding to the part-of-speech error sequence are set as a grammatical error object; if not, it is determined that the second text has no grammatical errors.

In this embodiment, the logic rules are stored by constructing a grammar library, in which they are kept in XML format. The error grammar templates in the grammar library are defined according to the pattern.xsd and/or rules.xsd templates and used as the logic rules, so that each logic rule acts as a regular expression for finding, in the part-of-speech sequence, the text words with grammatical errors, which are then set as grammatical error objects. The logic rules can also be defined in other formats.
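Steps S51 and S52 can be sketched as follows. The part-of-speech dictionary, the whitespace-and-punctuation tokenizer and the function names are toy assumptions standing in for the segmentation library and tagged dictionary, and the error grammar template is held as a plain array rather than in XML.

```javascript
// Hypothetical part-of-speech dictionary; a real system would use a tagged lexicon.
const posDictionary = new Map([
  ["we", "pronoun"], ["want", "verb"], ["go", "verb"],
  ["eat", "verb"], ["to", "particle"], ["home", "noun"],
]);

// Split the text into words and tag each one; unknown words get "unknown".
function tagPartsOfSpeech(secondText, dictionary = posDictionary) {
  const words = secondText.match(/[A-Za-z']+/g) || [];
  return words.map((word) => ({ word, pos: dictionary.get(word.toLowerCase()) || "unknown" }));
}

// Return every run of text words whose tags match an error grammar template.
function findGrammarErrors(tagged, errorTemplates) {
  const errors = [];
  for (const template of errorTemplates) {
    for (let i = 0; i + template.length <= tagged.length; i++) {
      const hit = template.every((pos, k) => tagged[i + k].pos === pos);
      if (hit) errors.push(tagged.slice(i, i + template.length).map((t) => t.word));
    }
  }
  return errors;
}
```

With the template ["verb", "verb", "verb"], the sentence "We want go eat home" yields the single grammatical error object ["want", "go", "eat"], while "We want to go home" yields none.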

S53: Translate the text words to obtain back-translated words, and determine whether the back-translated words appear in the first text; if so, set the text words corresponding to the back-translated words as content error objects; if not, determine that the second text has no grammatical errors.

In this step, a multi-thread processing method is used to translate the text words to obtain back-translated words, and each back-translated word is used as the variable of a preset regular expression to obtain a retrieval expression. The first text is searched with the retrieval expression to determine whether the back-translated word appears in the first text; if so, the text word corresponding to the back-translated word is set as a content error object; if not, it is determined that the second text has no grammatical errors.

It should be noted that when text whose words are related to one another is translated, the related words need to be translated by the same thread, so multi-thread processing cannot be used to translate text that has not been segmented. The text words in this step are obtained by segmenting the second text, so there is no such relationship among them; the text words can therefore be translated directly through multi-thread processing, which improves the efficiency of translating and checking the text words.
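The independent, per-word back-translation of S53 can be sketched as below. Two assumptions are made: concurrent promises (Promise.all) stand in for the multi-thread processing, and translateBack is a hypothetical asynchronous translator supplied by the caller. The sketch only reports, for each text word, whether its back-translation occurs in the first text; how that result is turned into content error objects is left to the caller.

```javascript
// Escape regular-expression metacharacters so a word can be embedded in a pattern.
function escapeRegExp(s) {
  return s.replace(/[-\/\\^$*+?.()|[\]{}]/g, "\\$&");
}

// Back-translate every text word concurrently and report whether the result
// occurs in the first text. `translateBack` is a hypothetical async translator.
async function checkBackTranslations(textWords, firstText, translateBack) {
  return Promise.all(
    textWords.map(async (word) => {
      const backTranslated = await translateBack(word);
      // The back-translated word is the variable of the retrieval expression.
      const retrieval = new RegExp(escapeRegExp(backTranslated));
      const foundInFirstText = backTranslated !== "" && retrieval.test(firstText);
      return { word, backTranslated, foundInFirstText };
    })
  );
}
```

Because each word is handled by its own promise, no word waits on another, which mirrors the observation that segmented text words can be processed independently.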

After identifying the grammatical error object with grammatical errors in the second text, and identifying the content error object with content errors in the second text, the method further includes:

Uploading the grammatical error object and/or the content error object to the blockchain.

It should be noted that the corresponding summary information is obtained based on the grammatical error object and/or the content error object. Specifically, the summary information is obtained by hashing the grammatical error object and/or the content error object, for example by using the SHA-256 algorithm. Uploading the summary information to the blockchain ensures its security and its fairness and transparency to users. The user equipment can download the summary information from the blockchain in order to verify whether the grammatical error object and/or the content error object has been tampered with. The blockchain referred to in this example is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks linked to one another by cryptographic methods. Each data block contains a batch of network transaction information, which is used to verify the validity of the information (anti-counterfeiting) and to generate the next block. The blockchain can include the underlying blockchain platform, the platform product service layer, and the application service layer.
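The summary information can be sketched as a SHA-256 digest computed with Node.js's built-in crypto module. The function names and the choice to serialize the error objects as a sorted JSON array of strings are assumptions; the upload to a blockchain is not shown because it depends on the platform used.

```javascript
const { createHash } = require("crypto");

// Produce a SHA-256 digest (hex) of the error objects as summary information.
// Sorting makes the digest independent of the order in which errors were found.
function summarizeErrors(errorObjects) {
  const canonical = JSON.stringify([...errorObjects].sort());
  return createHash("sha256").update(canonical, "utf8").digest("hex");
}

// Verify that a list of error objects still matches a previously stored digest.
function isUntampered(errorObjects, storedDigest) {
  return summarizeErrors(errorObjects) === storedDigest;
}
```

A client that downloads the stored digest can call isUntampered on its local copy of the error objects; any changed, added or removed entry produces a different digest.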

S206: Identify the specified tags corresponding to the misspelling objects and the format error objects in the DOM tree of the first page, and set the specified tags as target tags; replace the target tags with spelling format annotation tags, causing the first page to display the second text according to the spelling format annotation tags.

In order to mark the misspelling objects and the format error objects on the first page so that the administrator can identify them easily, this step identifies the specified tags corresponding to the misspelling objects and the format error objects in the DOM tree of the first page and replaces them with spelling format annotation tags, thereby annotating the misspelling objects and the format error objects. This not only locates the erroneous positions of the second text on the first page accurately, but also avoids the pollution of the DOM tree, and the resulting interference with the positioning of Elements in the DOM tree, that would occur if the misspelling objects and format error objects were marked directly in the first page by highlight replacement.

In this embodiment, the content of the spelling format annotation tag can be set as required, for example as a highlight tag or a comment tag, so that the text corresponding to the spelling format annotation tag is displayed on the first page with highlights, comments, etc.

Exemplarily, first, element.childNodes is used to clean up the tags with interfering characteristics in the DOM tree of the first page, so as to reduce interference;

Then, all child nodes in the DOM tree are obtained via parent.childNodes, and the specified tag corresponding to the misspelling object and/or format error object is identified among the child nodes and set as the target tag;

Finally, by means of innerText.replace(keyword, result), the target tag in the child node is replaced with the spelling format annotation tag to obtain the first page with the annotation effect corresponding to the spelling format annotation tag, for example: <span id="child"><b>keyword</b> 2</span> (recursive processing: the replacement is performed when the child node has no child nodes of its own). The regular replacement expression used is as follows:

keyword.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&'), which prefixes each of the characters - / \ ^ $ * + ? . ( ) | [ ] { } occurring in the keyword with a backslash, so that the keyword is matched literally.

Here, the square brackets [] and their contents form a character class, which matches any single one of the characters listed inside; these are the characters that may need escaping in a misspelling object or format error object.
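The escape-then-replace technique can be sketched on a simplified node tree. Plain objects of the form { tag, children }, with strings as leaf text, stand in for DOM nodes here; the function names and the default b tag are illustrative assumptions.

```javascript
// Escape regular-expression metacharacters so the keyword is matched literally.
function escapeRegExp(s) {
  return s.replace(/[-\/\\^$*+?.()|[\]{}]/g, "\\$&");
}

// Wrap every literal occurrence of `keyword` in `text` with an annotation tag.
function markKeyword(text, keyword, tag = "b") {
  if (!keyword) return text; // an empty keyword would match everywhere
  const pattern = new RegExp(escapeRegExp(keyword), "g");
  // A replacer function avoids "$" in the keyword being read as a replacement pattern.
  return text.replace(pattern, (hit) => `<${tag}>${hit}</${tag}>`);
}

// Recursively annotate a simplified node tree: a node is either a string (leaf text)
// or { tag, children }. Only leaves, which have no child nodes, are rewritten.
function markTree(node, keyword) {
  if (typeof node === "string") return markKeyword(node, keyword);
  return { tag: node.tag, children: node.children.map((c) => markTree(c, keyword)) };
}
```

For instance, markKeyword("price (USD) 2", "(USD)") returns "price <b>(USD)</b> 2"; without the escaping step, the parentheses would be interpreted as a capture group and the literal text would not be matched.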

S207: Identify the specified tags corresponding to the grammatical error objects and the content error objects in the DOM tree of the first page, and set the specified tags as target tags; replace the target tags with grammatical content annotation tags, causing the first page to display the second text according to the grammatical content annotation tags.

In order to mark the grammatical error objects and the content error objects on the first page so that the administrator can identify them easily, this step identifies the specified tags corresponding to the grammatical error objects and the content error objects in the DOM tree of the first page and replaces them with grammatical content annotation tags, thereby annotating the grammatical error objects and the content error objects. This not only locates the erroneous positions of the second text on the first page accurately, but also avoids the pollution of the DOM tree, and the resulting interference with the positioning of Elements in the DOM tree, that would occur if the grammatical error objects and content error objects were marked directly in the first page by highlight replacement.

In this embodiment, the content of the grammatical content annotation tag can be set as required, for example as a highlight tag or a comment tag, so that the text corresponding to the grammatical content annotation tag is displayed on the first page with highlights, comments, etc.

Then, via parent.childNodes Get all child nodes Obtain all child nodes (ie child nodes) in the DOM tree, then identify the specified label corresponding to the syntax error object and/or content error object in the child node, and set it as the target Label;

Finally, the target tag in the child node is replaced with the grammatical content annotation tag by means of innerText.replace(keyword, result), yielding a first page with the annotation effect of the grammatical content annotation tag, for example: <span id="child"><b>keyword</b> 2</span> (processed recursively: the replacement is performed only when the child node has no child nodes of its own). The regular replacement expression used is as follows:

Here, the square brackets [] and their contents represent the grammatical error objects and the content error objects; a bracketed set matches any single character listed inside it.
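The recursive annotation described in S207 can be illustrated with a minimal Python sketch. It is not the implementation of this application: it assumes the page fragment is well-formed XML, uses the standard-library xml.dom.minidom in place of a browser DOM, and splits text nodes instead of calling innerText.replace, so that existing elements are left untouched. The function names annotate_keywords and _annotate and the default tag b are hypothetical.

```python
import re
from xml.dom import minidom


def annotate_keywords(root, keywords, tag="b"):
    """Wrap every keyword found in leaf text under `root` in a <tag> element."""
    # Ignore empty keywords; try longer keywords first when several overlap.
    ordered = sorted((k for k in keywords if k), key=len, reverse=True)
    if not ordered:
        return
    pattern = re.compile("|".join(re.escape(k) for k in ordered))
    _annotate(root, pattern, tag)


def _annotate(node, pattern, tag):
    doc = node.ownerDocument
    # Copy the child list because the loop mutates node.childNodes.
    for child in list(node.childNodes):
        if child.nodeType == child.ELEMENT_NODE:
            _annotate(child, pattern, tag)  # recurse until text is reached
            continue
        if child.nodeType != child.TEXT_NODE:
            continue
        text, pos, pieces = child.data, 0, []
        for match in pattern.finditer(text):
            if match.start() > pos:
                pieces.append(doc.createTextNode(text[pos:match.start()]))
            wrapper = doc.createElement(tag)  # the annotation tag
            wrapper.appendChild(doc.createTextNode(match.group(0)))
            pieces.append(wrapper)
            pos = match.end()
        if not pieces:
            continue  # no keyword in this text node
        if pos < len(text):
            pieces.append(doc.createTextNode(text[pos:]))
        # Replace the original text node with the text/wrapper pieces.
        for piece in pieces:
            node.insertBefore(piece, child)
        node.removeChild(child)


doc = minidom.parseString('<span id="child">keyword 2</span>')
annotate_keywords(doc.documentElement, ["keyword"])
marked = doc.documentElement.toxml()
```

With the input shown, marked holds the same markup as the example in the text above, with the keyword wrapped in a b element inside the span.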

S208: Acquire a first URL of the first page, identify a second URL according to the first URL, and send the second URL to the browser.

A website often contains multiple pages, and having the administrator load them one by one would make inspection laborious and inefficient. This step therefore acquires the first URL of the first page, identifies the second URL according to the first URL, and sends the second URL to the browser, so that the browser can preload the second page, which carries second text, at the second URL, or load the second page directly once S207 has finished. This makes it more efficient for the administrator to go on checking the second text of the second page for errors.

In this embodiment, the first URL of the first page is obtained, the server hosting the first URL is accessed to obtain the second URL that immediately follows the first URL, and the second URL is sent to the browser.

For example, if the first URL is https://zhidao.baidu.com/question/3591601.html, the server hosting the first URL is accessed and the second URL immediately following it is obtained: https://zhidao.baidu.com/question/3591602.html.
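The relationship between the two URLs in this example can be sketched in Python as follows. The sketch rests on an assumption that the application does not state: that the "next" page differs from the current one only by a trailing numeric identifier increased by one. The application itself obtains the second URL by accessing the server, which this sketch does not do; the function name next_url is hypothetical.

```python
import re
from typing import Optional


def next_url(first_url: str) -> Optional[str]:
    """Guess the following page by incrementing the trailing numeric id.

    Returns None when the URL does not end in digits (optionally followed
    by a file extension), since no neighbour can then be inferred.
    """
    match = re.search(r"(\d+)(\.[A-Za-z0-9]+)?$", first_url)
    if match is None:
        return None
    digits = match.group(1)
    # Keep any zero padding, e.g. 0099 -> 0100.
    bumped = str(int(digits) + 1).zfill(len(digits))
    return first_url[:match.start(1)] + bumped + first_url[match.end(1):]


second_url = next_url("https://zhidao.baidu.com/question/3591601.html")
```

A real deployment would more likely ask the server or a sitemap for the neighbouring page, because identifiers are not guaranteed to be consecutive.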

It should be noted that a URL is a Uniform Resource Locator, that is, the address of a standard resource on the Internet. Every file on the Internet has a unique URL, which contains information indicating the location of the file.

Embodiment 3:

Referring to FIG. 4, a translation error identification device 1 of this embodiment includes:

a text input module 12, configured to obtain the second text of the first page, wherein the second text is obtained by translating the first text;

an error identification module 15, configured to identify a grammatical error object with a grammatical error in the second text, and a content error object with a content error in the second text;

an error labeling module 17, configured to identify, in the DOM tree of the first page, the specified tag corresponding to the grammatical error object and the content error object, and set the specified tag as a target tag; and to replace the target tag with a grammatical content annotation tag, so that the first page displays the second text according to the grammatical content annotation tag.

Optionally, the translation error identification device 1 further includes:

a triggering module 11, configured to trigger the translation error identification device 1 in response to the browser finishing loading the first page with the second text.

Optionally, the translation error identification device 1 further includes:

a configuration input module 13, configured to extract the text information in the configuration file of the browser and load it into the second text.

Optionally, the translation error identification device 1 further includes:

a spelling format module 14, configured to identify misspelled words and incorrectly formatted characters in the second text, so as to obtain misspelling objects and format error objects respectively.

Optionally, the translation error identification device 1 further includes:

a spelling format recognition module 16, configured to identify, in the DOM tree of the first page, the specified tag corresponding to the misspelling object and the format error object, and set the specified tag as a target tag; and to replace the target tag with a spelling format annotation tag, so that the first page displays the second text according to the spelling format annotation tag.

Optionally, the translation error identification device 1 further includes:

a preloading module 18, configured to acquire the first URL of the first page, identify the second URL according to the first URL, and send the second URL to the browser.

This technical solution is applied in the field of UI design in computer development: grammatical error objects with grammatical errors and content error objects with content errors are identified in the second text, and the specified tags corresponding to the grammatical error objects and/or the content error objects are replaced with grammatical content annotation tags, so that the first page becomes an H5 page that displays the second text according to the grammatical content annotation tags.
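How the three mandatory modules of this embodiment hand data to one another can be sketched in Python as below. This is only an illustration of the composition: the class name TranslationErrorDevice and the three stand-in functions are hypothetical, and the stand-ins use a fixed word set and plain string replacement rather than the identification and DOM techniques described in this application.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TranslationErrorDevice:
    read_text: Callable[[str], str]               # role of text input module 12
    find_errors: Callable[[str], List[str]]       # role of error identification module 15
    mark_errors: Callable[[str, List[str]], str]  # role of error labeling module 17

    def run(self, page: str) -> str:
        second_text = self.read_text(page)
        error_objects = self.find_errors(second_text)
        return self.mark_errors(page, error_objects)


# Hypothetical stand-ins, used only to make the sketch executable.
def read_text(page: str) -> str:
    return page  # assume the page content is already plain text


def find_errors(text: str) -> List[str]:
    known_bad = {"goed"}  # assumed toy list of erroneous words
    return [word for word in text.split() if word in known_bad]


def mark_errors(page: str, errors: List[str]) -> str:
    for error in errors:
        page = page.replace(error, f"<b>{error}</b>")
    return page


device = TranslationErrorDevice(read_text, find_errors, mark_errors)
marked_page = device.run("He goed home")
```

The optional modules 11, 13, 14, 16 and 18 could be added as further callables in the same way.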

Embodiment 4:

In order to achieve the above purpose, the present application also provides a computer device 5. The components of the translation error identification device 1 of the third embodiment can be distributed across different computer devices, and the computer device 5 can be a smartphone, tablet computer, notebook computer, desktop computer, rack server, blade server, tower server or cabinet server (including an independent server, or a server cluster composed of multiple application servers), etc. The computer device in this embodiment includes at least, but is not limited to, a memory 51 and a processor 52 that can be communicatively connected to each other through a system bus, as shown in FIG. 5. It should be pointed out that FIG. 5 shows only a computer device with a limited set of components, but it should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead.

In this embodiment, the memory 51 (i.e., a readable storage medium) includes a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory, etc.), random access memory (RAM), static random access memory (SRAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), programmable read only memory (PROM), magnetic memory, magnetic disk, optical disc, etc. In some embodiments, the memory 51 may be an internal storage unit of the computer device, such as a hard disk or a memory of the computer device. In other embodiments, the memory 51 may also be an external storage device of the computer device, such as a plug-in hard disk equipped on the computer device, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, etc. Of course, the memory 51 may also include both the internal storage unit of the computer device and its external storage device. In this embodiment, the memory 51 is generally used to store the operating system and various application software installed on the computer device, such as the program code of the translation error identification device of the third embodiment. In addition, the memory 51 can also be used to temporarily store various types of data that have been output or will be output.

In some embodiments, the processor 52 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 52 is typically used to control the overall operation of the computer device. In this embodiment, the processor 52 is configured to run the program code or process the data stored in the memory 51, for example, to run the translation error identification device, so as to implement the translation error identification methods of the first and second embodiments.

Embodiment 5:

In order to achieve the above purpose, the present application also provides a computer-readable storage medium, which can be non-volatile or volatile, such as flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory, etc.), random access memory (RAM), static random access memory (SRAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), programmable read only memory (PROM), magnetic storage, magnetic disk, optical disc, server, app store, etc., on which computer programs are stored, and when the programs are executed by the processor 52, corresponding functions are realized. The computer-readable storage medium of this embodiment is used to store the translation error identification device, and when executed by the processor 52, implements the translation error identification methods of the first embodiment and the second embodiment.

The above-mentioned serial numbers of the embodiments of the present application are only for description, and do not represent the advantages or disadvantages of the embodiments.

From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus a necessary general hardware platform, and of course can also be implemented in hardware, but in many cases the former is the better implementation.

The above are only the preferred embodiments of the present application and are not intended to limit the patent scope of the present application. Any equivalent structure or equivalent process transformation made by using the contents of the description and drawings of the present application, whether applied directly or indirectly in other related technical fields, is likewise included within the scope of patent protection of this application.

Claims

A translation error identification method for checking errors occurring in a second text obtained by translating a first text, comprising:

acquiring the second text of the first page, wherein the second text is obtained by translating the first text;

identifying grammatical error objects with grammatical errors in the second text, and content error objects with content errors in the second text;

identifying, in the DOM tree of the first page, the specified tags corresponding to the grammatical error objects and the content error objects, and setting the specified tags as target tags; replacing the target tags with grammatical content annotation tags, so that the first page displays the second text according to the grammatical content annotation tags.
The method for identifying translation errors according to claim 1, wherein before the acquiring the second text of the first page, the method further comprises:

triggering the translation error identification method in response to the browser finishing loading the first page with the second text.
The method for identifying translation errors according to claim 1, wherein after acquiring the second text of the first page, the method further comprises:

identifying misspelled words and characters with format errors in the second text to obtain misspelling objects and format error objects, respectively.
The method for identifying translation errors according to claim 3, wherein the step of identifying misspelled words and characters with format errors in the second text to obtain misspelling objects and format error objects, respectively, comprises:

creating a spelling error list and a format error list;

performing spelling proofreading on the words in the second text, taking each misspelled word as a misspelling object, and saving it in the spelling error list;

determining whether the format of the second text complies with a preset format rule, taking the words and/or symbols with format errors as format error objects, and saving the format error objects in the format error list.
The method for identifying translation errors according to claim 1, wherein the step of identifying a grammatical error object with a grammatical error in the second text and a content error object with a content error in the second text comprises:

performing word segmentation on the second text to obtain text words, and tagging the parts of speech of the text words to obtain a part-of-speech sequence arranged in the order of the text words;

determining, through preset logic rules, whether there is an erroneous logical sequence in the part-of-speech sequence; if so, setting the text word corresponding to the erroneous logical sequence as a grammatical error object; if not, determining that the second text has no grammatical errors;

translating the text word to obtain a back-translated word, and determining whether the back-translated word appears in the first text; if so, setting the text word corresponding to the back-translated word as a content error object; if not, determining that the second text has no grammatical errors;

after identifying the grammatical error object with grammatical errors in the second text, and identifying the content error object with content errors in the second text, the method further comprises:

uploading the grammatical error object and/or the content error object to the blockchain.
The method for identifying translation errors according to claim 3, wherein after identifying misspelled words and characters with format errors in the second text to obtain misspelling objects and format error objects, respectively, the method further comprises:

identifying, in the DOM tree of the first page, the specified tag corresponding to the misspelling object and the format error object, and setting the specified tag as a target tag; replacing the target tag with a spelling format annotation tag, so that the first page displays the second text according to the spelling format annotation tag.
The method for identifying translation errors according to claim 1, wherein after replacing the target tag with a grammatical content annotation tag, so that the first page displays the second text according to the grammatical content annotation tag, the method further comprises:

acquiring the first URL of the first page, identifying the second URL according to the first URL, and sending the second URL to the browser.
A translation error identification device, comprising:

a text input module, configured to obtain the second text of the first page, wherein the second text is obtained by translating the first text;

an error identification module, configured to identify a grammatical error object with a grammatical error in the second text, and a content error object with a content error in the second text;

an error labeling module, configured to identify, in the DOM tree of the first page, the specified tag corresponding to the grammatical error object and the content error object, and set the specified tag as a target tag; and to replace the target tag with a grammatical content annotation tag, so that the first page displays the second text according to the grammatical content annotation tag.
A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein a translation error identification method is implemented when the processor of the computer device executes the computer program, the steps of the translation error identification method comprising:

acquiring the second text of the first page, wherein the second text is obtained by translating the first text;

identifying grammatical error objects with grammatical errors in the second text, and content error objects with content errors in the second text;

identifying, in the DOM tree of the first page, the specified tags corresponding to the grammatical error objects and the content error objects, and setting the specified tags as target tags; replacing the target tags with grammatical content annotation tags, so that the first page displays the second text according to the grammatical content annotation tags.
The computer device according to claim 9, wherein before the acquiring the second text of the first page, the method further comprises:

triggering the translation error identification method in response to the browser finishing loading the first page with the second text.
The computer device according to claim 9, wherein after the acquiring the second text of the first page, the method further comprises:

identifying misspelled words and characters with format errors in the second text to obtain misspelling objects and format error objects, respectively;

wherein the step of identifying misspelled words and characters with format errors in the second text to obtain misspelling objects and format error objects, respectively, comprises:

creating a spelling error list and a format error list;

performing spelling proofreading on the words in the second text, taking each misspelled word as a misspelling object, and saving it in the spelling error list;

determining whether the format of the second text complies with a preset format rule, taking the words and/or symbols with format errors as format error objects, and saving the format error objects in the format error list.
The computer device according to claim 9, wherein the step of identifying a grammatical error object with a grammatical error in the second text and a content error object with a content error in the second text comprises:

performing word segmentation on the second text to obtain text words, and tagging the parts of speech of the text words to obtain a part-of-speech sequence arranged in the order of the text words;

determining, through preset logic rules, whether there is an erroneous logical sequence in the part-of-speech sequence; if so, setting the text word corresponding to the erroneous logical sequence as a grammatical error object; if not, determining that the second text has no grammatical errors;

translating the text word to obtain a back-translated word, and determining whether the back-translated word appears in the first text; if so, setting the text word corresponding to the back-translated word as a content error object; if not, determining that the second text has no grammatical errors;

after identifying the grammatical error object with grammatical errors in the second text, and identifying the content error object with content errors in the second text, the method further comprises:

uploading the grammatical error object and/or the content error object to the blockchain.
The computer device according to claim 11, wherein after identifying misspelled words and characters with format errors in the second text to obtain misspelling objects and format error objects, respectively, the method further comprises:

identifying, in the DOM tree of the first page, the specified tag corresponding to the misspelling object and the format error object, and setting the specified tag as a target tag; replacing the target tag with a spelling format annotation tag, so that the first page displays the second text according to the spelling format annotation tag.
The computer device according to claim 9, wherein after replacing the target tag with a grammatical content annotation tag, so that the first page displays the second text according to the grammatical content annotation tag, the method further comprises:

acquiring the first URL of the first page, identifying the second URL according to the first URL, and sending the second URL to the browser.
A computer-readable storage medium on which a computer program is stored, wherein, when the computer program stored in the readable storage medium is executed by a processor, a translation error identification method is implemented, the steps of the translation error identification method comprising:

acquiring the second text of the first page, wherein the second text is obtained by translating the first text;

identifying grammatical error objects with grammatical errors in the second text, and content error objects with content errors in the second text;

identifying, in the DOM tree of the first page, the specified tags corresponding to the grammatical error objects and the content error objects, and setting the specified tags as target tags; replacing the target tags with grammatical content annotation tags, so that the first page displays the second text according to the grammatical content annotation tags.
The readable storage medium of claim 15, wherein before the acquiring the second text of the first page, the method further comprises:

triggering the translation error identification method in response to the browser finishing loading the first page with the second text.
The readable storage medium of claim 15, wherein after the acquiring the second text of the first page, the method further comprises:

identifying misspelled words and characters with format errors in the second text to obtain misspelling objects and format error objects, respectively;

wherein the step of identifying misspelled words and characters with format errors in the second text to obtain misspelling objects and format error objects, respectively, comprises:

creating a spelling error list and a format error list;

performing spelling proofreading on the words in the second text, taking each misspelled word as a misspelling object, and saving it in the spelling error list;

determining whether the format of the second text complies with a preset format rule, taking the words and/or symbols with format errors as format error objects, and saving the format error objects in the format error list.
The readable storage medium according to claim 15, wherein the step of identifying a grammatical error object with a grammatical error in the second text and a content error object with a content error in the second text comprises:

performing word segmentation on the second text to obtain text words, and tagging the parts of speech of the text words to obtain a part-of-speech sequence arranged in the order of the text words;

determining, through preset logic rules, whether there is an erroneous logical sequence in the part-of-speech sequence; if so, setting the text word corresponding to the erroneous logical sequence as a grammatical error object; if not, determining that the second text has no grammatical errors;

translating the text word to obtain a back-translated word, and determining whether the back-translated word appears in the first text; if so, setting the text word corresponding to the back-translated word as a content error object; if not, determining that the second text has no grammatical errors;

after identifying the grammatical error object with grammatical errors in the second text, and identifying the content error object with content errors in the second text, the method further comprises:

uploading the grammatical error object and/or the content error object to the blockchain.
The readable storage medium according to claim 17, wherein after identifying misspelled words and characters with format errors in the second text to obtain misspelling objects and format error objects, respectively, the method further comprises:

identifying, in the DOM tree of the first page, the specified tag corresponding to the misspelling object and the format error object, and setting the specified tag as a target tag; replacing the target tag with a spelling format annotation tag, so that the first page displays the second text according to the spelling format annotation tag.
The readable storage medium according to claim 15, wherein after replacing the target tag with a grammatical content annotation tag, so that the first page displays the second text according to the grammatical content annotation tag, the method further comprises:

acquiring the first URL of the first page, identifying the second URL according to the first URL, and sending the second URL to the browser.