CN112667208A - Translation error recognition method and device, computer equipment and readable storage medium - Google Patents

Publication number
CN112667208A
Authority
CN
China
Prior art keywords
text
error
grammar
content
page
Prior art date
Legal status
Pending
Application number
CN202011528714.1A
Other languages
Chinese (zh)
Inventor
刘丽珍
Current Assignee
OneConnect Smart Technology Co Ltd
OneConnect Financial Technology Co Ltd Shanghai
Original Assignee
OneConnect Financial Technology Co Ltd Shanghai
Priority date
Filing date
Publication date
Application filed by OneConnect Financial Technology Co Ltd Shanghai
Priority to CN202011528714.1A
Publication of CN112667208A
Priority to PCT/CN2021/109257 (WO2022134577A1)
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/14 Tree-structured documents
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/232 Orthographic correction, e.g. spell checking or vowelisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/253 Grammatical analysis; Style critique
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/55 Rule-based translation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/20 Software design
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/30 Creation or generation of source code
    • G06F8/38 Creation or generation of source code for implementing user interfaces

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Human Computer Interaction (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to the field of computer development and discloses a translation error recognition method and device, computer equipment and a readable storage medium. The translation error recognition method comprises the following steps: acquiring a second text of the first page; identifying a grammar error object in which a grammar error occurs in the second text and a content error object in which a content error occurs in the second text; identifying a designated tag corresponding to the grammar error object and the content error object in a DOM tree of the first page, and setting the designated tag as a target tag; and replacing the target tag with a grammar content labeling label, so that the first page displays the second text according to the grammar content labeling label. The invention accurately recognizes the words in the second text that have grammar errors or content translation errors, so that the erroneous parts of the second text are identified automatically. This improves the speed and coverage of error recognition, reduces manual effort and avoids missed errors.

Description

Translation error recognition method and device, computer equipment and readable storage medium
Technical Field
The present invention relates to the field of computer development technologies, and in particular, to a translation error recognition method and apparatus, a computer device, and a readable storage medium.
Background
When designing a web page, a website usually provides text in multiple languages (for example, a first text in Chinese and a second text in English). However, when a page containing the second text (English) is tested, the page content is so extensive and complex that errors in the translation of individual words and in the grammar of each passage are usually difficult to spot.
The inventor has found that even when a manager assigns a large amount of manpower to check the pages of a website for errors one by one, the inspection is time-consuming and inefficient, and erroneous parts are still easily missed through oversight.
Disclosure of Invention
The invention aims to provide a translation error recognition method and device, computer equipment and a readable storage medium, to solve the problems in the prior art that checking pages for errors one by one with a large amount of manpower is time-consuming and inefficient, and that erroneous parts are easily missed through oversight.
To achieve the above object, the present invention provides a translation error recognition method for checking an error occurring in a second text obtained by translation of a first text, comprising:
acquiring a second text of the first page, wherein the second text is obtained by translating the first text;
identifying a grammar error object with grammar error in the second text and a content error object with content error in the second text;
identifying a designated tag corresponding to the grammar error object and the content error object in a DOM tree of the first page, and setting the designated tag as a target tag; replacing the target tag with a grammar content labeling label, and enabling the first page to display the second text according to the grammar content labeling label.
In the foregoing solution, before the obtaining the second text of the first page, the method further includes:
triggering the translation error recognition method in response to the browser finishing loading the first page with the second text.
In the foregoing solution, after the second text of the first page is obtained, the method further includes:
identifying misspelled words and incorrectly formatted characters in the second text to obtain spelling error objects and format error objects, respectively.
In the foregoing solution, the step of identifying misspelled words and incorrectly formatted characters in the second text to obtain spelling error objects and format error objects, respectively, includes:
creating a spelling error list and a format error list;
spell-checking the words in the second text, taking each misspelled word as a spelling error object and storing it in the spelling error list;
and judging whether the format of the second text conforms to a preset format rule, taking each word and/or symbol with a format error as a format error object, and storing it in the format error list.
In the foregoing solution, the step of identifying the grammar error object in which a grammar error occurs in the second text and the content error object in which a content error occurs in the second text includes:
segmenting the second text to obtain text words, and tagging the part of speech of each text word to obtain a part-of-speech sequence arranged in the order of the text words;
judging, according to a preset logic rule, whether an erroneous logic sequence occurs in the part-of-speech sequence; if so, setting the text words corresponding to the erroneous logic sequence as grammar error objects; if not, judging that the second text has no grammar error;
translating the text words to obtain translated words, and judging whether the translated words appear in the first text; if so, setting the text word corresponding to the translated word as a content error object; if not, judging that the second text has no content error;
after identifying the grammar error object in which a grammar error occurs in the second text and the content error object in which a content error occurs in the second text, the method further comprises:
uploading the grammar error object and/or the content error object to a blockchain.
In the foregoing solution, after identifying misspelled words and incorrectly formatted characters in the second text to obtain spelling error objects and format error objects, respectively, the method further includes:
identifying a designated tag corresponding to the spelling error object and the format error object in a DOM tree of the first page, and setting the designated tag as a target tag; and replacing the target tag with a spelling format marking label, and enabling the first page to display the second text according to the spelling format marking label.
In the foregoing solution, after replacing the target tag with a grammar content labeling label and displaying the second text on the first page according to the grammar content labeling label, the method further includes:
acquiring a first URL of the first page, identifying a second URL according to the first URL, and sending the second URL to the browser.
To achieve the above object, the present invention further provides a translation error recognition device, including:
the text input module is used for acquiring a second text of the first page, wherein the second text is obtained by translating the first text;
the error identification module is used for identifying a grammar error object with grammar error in the second text and a content error object with content error in the second text;
the error labeling module is used for identifying the designated tag corresponding to the grammar error object and the content error object in the DOM tree of the first page and setting the designated tag as a target tag; replacing the target tag with a grammar content labeling label, and enabling the first page to display the second text according to the grammar content labeling label.
To achieve the above object, the present invention further provides a computer device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor of the computer device implements the steps of the above translation error recognition method when executing the computer program.
To achieve the above object, the present invention further provides a computer-readable storage medium, which stores a computer program, and the computer program, when executed by a processor, implements the steps of the translation error recognition method.
According to the translation error recognition method and device, computer equipment and readable storage medium provided by the invention, a grammar error object in which a grammar error occurs in the second text is identified from the part of speech of each word in the second text, and a content error object in which a content error occurs in the second text is identified by matching each word in the second text against the words in the first text. Words with grammar errors and content translation errors are thereby recognized accurately, and the erroneous parts of the second text are identified automatically. This improves the speed and coverage of error recognition, reduces manual effort, avoids missed errors, and prevents the situation in which a user whose native language is that of the second text cannot accurately grasp the information it conveys.
By identifying the designated tags corresponding to the grammar error object and the content error object in the DOM tree of the first page and replacing them with grammar content labeling labels, the grammar error object and the content error object are marked. This not only locates the second text accurately within the first page, but also avoids polluting the DOM tree and disturbing the positioning of Elements in the DOM tree, which would happen if the grammar error object and the content error object in the first page were marked directly by highlight replacement.
Drawings
FIG. 1 is a flowchart of a translation error recognition method according to a first embodiment of the present invention;
FIG. 2 is a schematic diagram of the application environment of the translation error recognition method according to a second embodiment of the present invention;
FIG. 3 is a detailed flowchart of the translation error recognition method according to the second embodiment of the present invention;
FIG. 4 is a block diagram of a translation error recognition device according to a third embodiment of the present invention;
FIG. 5 is a schematic diagram of a hardware structure of a computer device according to a fourth embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention provides a translation error recognition method and device, computer equipment and a readable storage medium, which are applicable to the technical field of UI design in computer development. The translation error recognition method is based on a text input module, an error recognition module, an error labeling module, a triggering module, a configuration input module, a spelling format recognition module and a preloading module. The method identifies a grammar error object in which a grammar error occurs in a second text and a content error object in which a content error occurs in the second text; identifies the designated tag corresponding to the grammar error object and/or the content error object in the DOM tree of the first page, and sets the designated tag as a target tag; and replaces the target tag with a grammar content labeling label, so that the first page displays the second text according to the grammar content labeling label.
The first embodiment is as follows:
Referring to FIG. 1, the translation error recognition method of this embodiment is used for checking errors occurring in a second text obtained by translating a first text, and includes:
S102: acquiring a second text of the first page, wherein the second text is obtained by translating the first text;
S105: identifying a grammar error object in which a grammar error occurs in the second text and a content error object in which a content error occurs in the second text;
S107: identifying a designated tag corresponding to the grammar error object and the content error object in a DOM tree of the first page, and setting the designated tag as a target tag; replacing the target tag with a grammar content labeling label, and enabling the first page to display the second text according to the grammar content labeling label.
In an exemplary embodiment, a DOM tree is extracted from the first page, and the text information corresponding to the preset designated tags is obtained from the DOM tree and aggregated to form the second text.
A grammar error object in which a grammar error occurs in the second text is identified from the part of speech of each word in the second text, and a content error object in which a content error occurs in the second text is identified by matching each word in the second text against the words in the first text. Words with grammar errors and content translation errors are thereby recognized accurately, and the erroneous parts of the second text are identified automatically. This improves the speed and coverage of error recognition, reduces manual effort, avoids missed errors, and prevents the situation in which a user whose native language is that of the second text cannot accurately grasp the information it conveys.
By identifying the designated tags corresponding to the grammar error object and the content error object in the DOM tree of the first page and replacing them with grammar content labeling labels, the grammar error object and the content error object are marked. This not only locates the second text accurately within the first page, but also avoids polluting the DOM tree and disturbing the positioning of Elements in the DOM tree, which would happen if the grammar error object and the content error object in the first page were marked directly by highlight replacement.
The second embodiment is as follows:
This embodiment is a specific application scenario of the first embodiment, through which the method provided by the present invention can be explained more clearly and specifically.
The method provided in this embodiment is described below by taking as an example a server that runs the translation error recognition method, identifies a grammar error object in which a grammar error occurs in the second text and a content error object in which a content error occurs in the second text, and labels the grammar error object and/or the content error object. It should be noted that the present embodiment is only exemplary, and does not limit the protection scope of the embodiments of the present invention.
FIG. 2 schematically shows the application environment of the translation error recognition method according to the second embodiment of the present application.
In an exemplary embodiment, the server 2 running the translation error recognition method is connected to the clients 4 through the network 3. The server 2 may provide services through one or more networks 3, which may include various network devices, such as routers, switches, multiplexers, hubs, modems, bridges, repeaters, firewalls, proxy devices, and/or the like. The network 3 may include physical links, such as coaxial cable links, twisted pair cable links, fiber optic links, combinations thereof, and/or the like, and wireless links, such as cellular links, satellite links, Wi-Fi links, and/or the like. The client 4 may be a computer device such as a smartphone, a tablet computer, a notebook computer, or a desktop computer.
FIG. 3 is a detailed flowchart of the translation error recognition method according to an embodiment of the present invention, where the method specifically includes steps S201 to S208.
S201: In response to the browser finishing loading the first page with the second text, triggering the translation error recognition method.
To avoid slowing the loading of the first page by competing for the browser's computing power in real time, the translation error recognition method is triggered only after the browser has finished translating the first text into the second text and has finished loading the first page with the second text. This keeps the method from interfering with the browser's translation of the first text and its loading of the second text, and so preserves the translation and loading speed of the first page.
When the manager has corrected the errors that the translation error recognition method found in the second text and loads the first page with the second text in the browser again, the method is triggered again so that the corrected second text is checked for errors once more, which improves the efficiency of recognizing and correcting errors in the second text.
The browser runs on the client and displays the first page so that developers can inspect and use it.
S202: and acquiring a second text of the first page, wherein the second text is obtained by translating the first text.
In this step, a DOM tree is extracted from the first page, and the text information corresponding to the preset designated tags is obtained from the DOM tree and aggregated to form the second text.
In an exemplary embodiment, the DOM tree is extracted from the first page by the document.body.innerHTML code, which yields the BODY part of the DOM tree.
The designated tag is an HTML tag set according to requirements. An HTML tag is a markup tag of the Hypertext Markup Language (HTML, an application of the Standard Generalized Markup Language); it is the most basic unit and the most important component of HTML, and is used to mark up text, pictures, input boxes and the like in the first page. For example, the designated tags include <div>, <span>, <a>, <button>, <label>, <ul>, <li> and <input>.
The designated text corresponding to each designated tag of the DOM tree (for example, within the BODY part of the DOM tree) is obtained through getElementsByTagName(tagName) and innerHTML, and the designated texts are aggregated to obtain the second text.
Further, the data attribute value of a designated tag can be obtained in the form obj.dataset.xx, and its value attribute value in the form obj.value. The data attribute value and/or the value attribute value is associated with the designated text to form a function text, which reflects the designated attributes of each designated text in the second text, so that a manager can conveniently identify and manage the definition and specification of each designated text.
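As a rough illustration of step S202, the sketch below collects the text found inside a set of designated tags and aggregates it into a list. It is written in Python with the standard-library html.parser module rather than the browser DOM calls named above, so the class DesignatedTextCollector, the function extract_second_text and the sample markup are assumptions made for illustration, not part of the described method.

```python
from html.parser import HTMLParser

# Tag names taken from the example list in the text. <input> is a void
# element and carries no inner text, so it is skipped when tracking depth.
DESIGNATED_TAGS = {"div", "span", "a", "button", "label", "ul", "li"}


class DesignatedTextCollector(HTMLParser):
    """Collects text that appears inside any designated tag."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # number of designated tags currently open
        self.texts = []  # collected text fragments, in document order

    def handle_starttag(self, tag, attrs):
        if tag in DESIGNATED_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in DESIGNATED_TAGS and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self.depth > 0 and text:
            self.texts.append(text)


def extract_second_text(body_html):
    """Return the text fragments found inside designated tags."""
    collector = DesignatedTextCollector()
    collector.feed(body_html)
    collector.close()
    return collector.texts


if __name__ == "__main__":
    sample = "<div><span>Welcome back</span><button>Sign in</button></div><p>Footer</p>"
    print(extract_second_text(sample))
```

In this sketch the text of the <p> element is not collected because <p> is not in the assumed tag set and is not nested inside a designated tag.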
S203: Extracting text information from the configuration file of the browser and loading the text information into the second text.
The text displayed on a page is not limited to the page itself; it also includes the pop-up boxes generated by operating on the page (such as clicking a button on the page). If the text of a pop-up box is translated incorrectly, the user cannot accurately obtain the information the pop-up box is meant to convey. Therefore, the text information displayed in pop-up boxes is also included in the second text by extracting the text information from the configuration file of the browser and loading it into the second text, which avoids an incomplete check caused by omitting pop-up text when the second text is checked.
S204: Identifying misspelled words and incorrectly formatted characters in the second text to obtain spelling error objects and format error objects, respectively.
To identify the erroneous parts of the second text automatically, and to prevent the situation in which a user whose native language is that of the second text cannot accurately grasp the information it conveys, this step identifies misspelled words and incorrectly formatted characters in the second text to obtain spelling error objects and format error objects, respectively. This improves the speed and coverage of error recognition, reduces manual effort and avoids missed errors.
In a preferred embodiment, the step of identifying misspelled words and incorrectly formatted characters in the second text to obtain spelling error objects and format error objects, respectively, comprises:
S41: Creating a spelling error list and a format error list.
In this step, the spelling error list ("spellingerror list") and the format error list are created so that the spelling errors and format errors occurring in the second text are managed in a standardized way. The two kinds of error can then be labeled in different labeling modes, which makes them easier for the manager to check and recognize.
In this embodiment, a memory space can be allocated in the memory module of the computer device to serve as the storage space for the spelling error list and the format error list.
Preferably, the spelling errors in the spelling error list and the format errors in the format error list are deduplicated to reduce the memory occupied by each list.
S42: Spell-checking the words in the second text, taking each misspelled word as a spelling error object and storing it in the spelling error list.
In this step, each word in the second text is used in turn as the variable of a preset regular expression to obtain a word expression, and the dictionary is searched with the word expression to determine whether it contains the word, thereby spell-checking the words in the second text. If the dictionary does not contain the word, the word is determined to be misspelled; if the dictionary contains the word, the word is determined not to be misspelled.
A word-list corpus for natural language processing is used as the dictionary.
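The dictionary lookup in step S42 might look roughly like the following Python sketch. The tiny DICTIONARY_TEXT word list, the function name find_spelling_errors and the exact form of the regular expression are assumptions for illustration; the source only states that each word is substituted into a preset regular expression that is then used to search the dictionary.

```python
import re

# Hypothetical stand-in for the word-list corpus: one dictionary word per line.
DICTIONARY_TEXT = "account\nbalance\nlogin\npassword\ntransfer\nwelcome"


def find_spelling_errors(second_text):
    """Return the words of second_text that are not found in the dictionary."""
    spelling_error_list = []
    for word in re.findall(r"[A-Za-z]+", second_text):
        # Substitute the word into a whole-line, case-insensitive pattern
        # (the "word expression") and search the dictionary with it.
        word_expression = re.compile(
            r"^" + re.escape(word) + r"$", re.IGNORECASE | re.MULTILINE
        )
        missing = word_expression.search(DICTIONARY_TEXT) is None
        # Skip duplicates so the list stays small, as the text recommends.
        if missing and word not in spelling_error_list:
            spelling_error_list.append(word)
    return spelling_error_list


if __name__ == "__main__":
    print(find_spelling_errors("Welcome, pasword login pasword"))
```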
S43: Judging whether the format of the second text conforms to a preset format rule, taking each word and/or symbol with a format error as a format error object, and storing it in the format error list.
In this step, the format rules include case rules and punctuation rules. For example, the case rule is that the first letter of a word following a carriage return (\r), a line feed (\n) or a tab character is capitalized, and the punctuation rule is that all punctuation marks are those of the language of the second text.
The case rules and punctuation rules can be set as needed and are stored in a preset configuration file, from which they are retrieved to judge whether the second text has a format error.
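A minimal Python sketch of the two example rules in step S43 follows. It assumes an English second text, so full-width punctuation counts as a format error; the FULL_WIDTH_PUNCTUATION set and the function name find_format_errors are illustrative assumptions, and a real rule set would be read from the configuration file mentioned above.

```python
import re

# Assumed set of full-width punctuation marks that should not appear in an
# English second text. This list is illustrative, not exhaustive.
FULL_WIDTH_PUNCTUATION = "，。！？；：（）、"


def find_format_errors(second_text):
    """Return words and symbols that break the case or punctuation rule."""
    format_error_list = []
    # Case rule: a word that follows \r, \n or a tab must start with a capital.
    for match in re.finditer(r"[\r\n\t]+([A-Za-z]+)", second_text):
        word = match.group(1)
        if not word[0].isupper():
            format_error_list.append(word)
    # Punctuation rule: every punctuation mark must belong to the language
    # of the second text.
    for char in second_text:
        if char in FULL_WIDTH_PUNCTUATION:
            format_error_list.append(char)
    return format_error_list


if __name__ == "__main__":
    print(find_format_errors("Hello\nworld， ok\tFine"))
```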
S205: Identifying a grammar error object in which a grammar error occurs in the second text and a content error object in which a content error occurs in the second text.
To identify the erroneous parts of the second text automatically, and to prevent the situation in which a user whose native language is that of the second text cannot accurately grasp the information it conveys, this step identifies grammar error objects from the part of speech of each word in the second text, and identifies content error objects by matching each word in the second text against the words in the first text, so that words with grammar errors and content translation errors are recognized accurately. This improves the speed and coverage of error recognition, reduces manual effort and avoids missed errors.
In a preferred embodiment, the step of identifying a grammar error object in which a grammar error occurs in the second text and a content error object in which a content error occurs in the second text includes:
S51: Segmenting the second text to obtain text words, and tagging the part of speech of each text word to obtain a part-of-speech sequence arranged in the order of the text words.
In this step, the paragraphs and sentences of the second text are segmented by a preset word segmentation library, which cuts the whole text into word segments that can be tagged, thereby yielding the text words. The part of speech of each text word is tagged using a preset dictionary, and the tagged text words are arranged in the order in which they occur in the second text to obtain the part-of-speech sequence.
It should be noted that the third-party word segmentation library ICTCLAS is used as the word segmentation library to segment the second text. ICTCLAS stands for Institute of Computing Technology, Chinese Lexical Analysis System; its main functions include Chinese word segmentation, part-of-speech tagging, named entity recognition, new word recognition, and user dictionary support.
The dictionary is a dictionary with part-of-speech tags for the language of the second text, such as The Natural Language Processing Dictionary, The Prolog Dictionary, The Artificial Intelligence Dictionary, The Machine Learning Dictionary, and the like.
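To make the shape of the part-of-speech sequence in step S51 concrete, here is a small Python sketch. It does not use ICTCLAS; it splits English text with a regular expression and looks each word up in a hypothetical miniature dictionary (POS_DICTIONARY), so the tag names and the function tag_parts_of_speech are assumptions for illustration only.

```python
import re

# Hypothetical miniature part-of-speech dictionary. A real system would use
# a full tagged dictionary for the language of the second text.
POS_DICTIONARY = {
    "the": "DET",
    "user": "NOUN",
    "form": "NOUN",
    "submit": "VERB",
    "click": "VERB",
    "open": "VERB",
}


def tag_parts_of_speech(second_text):
    """Split the text into words and pair each word with its part of speech.

    Words missing from the dictionary receive the placeholder tag "UNK".
    The result keeps the order in which the words occur in the text.
    """
    words = re.findall(r"[A-Za-z']+", second_text)
    return [(word, POS_DICTIONARY.get(word.lower(), "UNK")) for word in words]


if __name__ == "__main__":
    print(tag_parts_of_speech("The user submit form"))
```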
S52: judging, according to a preset logic rule, whether an erroneous logic sequence occurs in the part-of-speech sequence; if so, setting the text words corresponding to the erroneous logic sequence as grammar error objects; if not, determining that the second text has no grammar error.
In this step, a preset error grammar template is used as the logic rule, where the error grammar template is a part-of-speech error sequence composed of at least one part of speech, for example: verb, verb, verb.
It is judged whether the part-of-speech error sequence appears in the part-of-speech sequence; if so (for example, a segment of the part-of-speech sequence contains three consecutive verbs), the text words corresponding to the part-of-speech error sequence are set as grammar error objects; if not, it is determined that the second text has no grammar error.
In this embodiment, the logic rules are stored in a purpose-built grammar library in XML format. Each error grammar template in the grammar library is defined according to the pattern.xsd and/or rule.xsd templates and serves as a logic rule, so that the logic rule takes the form of a regular expression, which makes it convenient to search the part-of-speech sequence for text words with grammar errors and set them as grammar error objects. The logic rules may also be defined in other formats.
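The check in S52 can be sketched as a sliding-window comparison between the part-of-speech sequence and each error grammar template. This is only an illustration under stated assumptions: the templates are plain tuples of tags rather than XML rules, and the function name find_grammar_error_objects is hypothetical.

```python
def find_grammar_error_objects(pos_sequence, error_templates):
    """Return the runs of text words whose tags match any error template.

    pos_sequence: list of (word, tag) pairs in text order.
    error_templates: list of tag tuples, each describing an erroneous run.
    """
    tags = [tag for _, tag in pos_sequence]
    errors = []
    for template in error_templates:
        width = len(template)
        # Slide a window of the template's width over the tag sequence.
        for start in range(len(tags) - width + 1):
            if tuple(tags[start:start + width]) == tuple(template):
                errors.append([word for word, _ in pos_sequence[start:start + width]])
    return errors


pos_sequence = [
    ("the", "article"), ("user", "noun"), ("click", "verb"),
    ("submit", "verb"), ("open", "verb"), ("button", "noun"),
]
# Assumed template: three consecutive verbs, as in the example above.
grammar_errors = find_grammar_error_objects(pos_sequence, [("verb", "verb", "verb")])
```

An empty result corresponds to the branch in which the second text is determined to have no grammar error.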
S53: back-translating the text words to obtain back-translated words, and judging whether the back-translated words appear in the first text; if so, setting the text word corresponding to the back-translated word as a content error object; if not, determining that the second text has no grammar error.
In this step, the text words are back-translated in a multithreaded manner to obtain back-translated words, and each back-translated word is used as the variable of a preset regular expression to obtain a retrieval expression. The first text is searched with the retrieval expression to judge whether the back-translated word appears in the first text; if so, the text word corresponding to the back-translated word is set as a content error object; if not, it is determined that the second text has no grammar error.
It should be noted that when a text whose parts are associated with one another is translated, the associated words must be translated by the same thread, so text that has not been segmented cannot be translated in a multithreaded manner. The text words in this step are obtained by segmenting the second text and therefore have no association with one another, so they can be back-translated directly in a multithreaded manner, which improves the efficiency of back-translating and checking the text words.
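A minimal sketch of the multithreaded back-translation and retrieval in S53 follows. The lookup table BACK_TRANSLATIONS is a hypothetical stand-in for a real translation service, and all function names are assumptions. The sketch reports, for each text word, only whether its back-translation is found in the first text; mapping that result to a content error object is left to the rule stated in S53.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Hypothetical back-translation table standing in for a translation service.
BACK_TRANSLATIONS = {"soumettre": "submit", "annuler": "cancel", "pomme": "apple"}


def back_translate(word):
    """Back-translate one text word; returns None when no translation is known."""
    return BACK_TRANSLATIONS.get(word)


def appears_in_first_text(back_translated, first_text):
    """Build a retrieval expression from the word and search the first text."""
    if back_translated is None:
        return False
    # re.escape keeps regex metacharacters in the word from being interpreted.
    return re.search(re.escape(back_translated), first_text) is not None


def check_words(text_words, first_text, max_workers=4):
    """Back-translate independent words in parallel, then search the first text."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        back_translated = list(pool.map(back_translate, text_words))
    return {
        word: appears_in_first_text(bt, first_text)
        for word, bt in zip(text_words, back_translated)
    }


result = check_words(["soumettre", "pomme"], "Please click the submit button")
```

Because the segmented words are independent, `pool.map` can hand them to any worker thread while still returning results in input order.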
After identifying the grammar error object in which a grammar error occurs in the second text and the content error object in which a content error occurs in the second text, the method further comprises:
uploading the grammar error object and/or the content error object to a blockchain.
It should be noted that corresponding digest information is obtained from the grammar error object and/or the content error object; specifically, the digest information is obtained by hashing the grammar error object and/or the content error object, for example with the SHA-256 algorithm. Uploading the digest information to the blockchain ensures its security as well as fairness and transparency for the user. The user equipment may download the digest information from the blockchain to verify whether the grammar error object and/or the content error object has been tampered with. The blockchain referred to in this example is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods, in which each data block contains the information of a batch of network transactions and is used to verify the validity of that information (anti-counterfeiting) and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
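The digest step can be illustrated with Python's standard hashlib module. This is a sketch under assumptions: the error objects are serialized as sorted JSON before hashing (the serialization is not specified in the text), the function names are hypothetical, and the upload to a blockchain is omitted entirely.

```python
import hashlib
import json


def digest_error_objects(grammar_errors, content_errors):
    """Return the SHA-256 hex digest of the error objects.

    The JSON serialization with sorted lists is an assumption made so that
    the same set of error objects always yields the same digest.
    """
    payload = json.dumps(
        {"grammar": sorted(grammar_errors), "content": sorted(content_errors)},
        ensure_ascii=False,
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def is_untampered(grammar_errors, content_errors, stored_digest):
    """Recompute the digest and compare it with the one previously stored."""
    return digest_error_objects(grammar_errors, content_errors) == stored_digest


stored = digest_error_objects(["click submit open"], ["pomme"])
```

A device that later downloads the stored digest can call `is_untampered` with its local copy of the error objects to detect modification.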
S206: identifying, in the DOM tree of the first page, the designated tags corresponding to the spelling error object and the format error object, and setting the designated tags as target tags; replacing the target tags with spelling format marking tags, so that the first page displays the second text according to the spelling format marking tags.
To mark the spelling error objects and format error objects on the first page so that a manager can easily recognize them, this step identifies the designated tags corresponding to the spelling error objects and format error objects in the DOM tree of the first page and replaces them with spelling format marking tags. This accurately locates the erroneous parts of the second text in the first page, and also avoids polluting the DOM tree and disturbing the positioning of elements in the DOM tree, which would happen if the spelling error objects and format error objects in the first page were marked directly by highlight replacement.
In this embodiment, the content of the spelling format marking tag can be set as required, for example as a highlighting tag or an annotation tag, so that the text corresponding to the spelling format marking tag is displayed in the first page highlighted, annotated, and so on.
Illustratively, first, in the DOM tree of the first page, tags with interfering characteristics are cleaned in an element.
Next, all child nodes in the DOM tree are obtained through their parent nodes, the designated tags corresponding to the spelling error objects and/or format error objects are identified among the child nodes, and the designated tags are set as target tags.
Finally, the target tags in the child nodes are replaced with spelling format marking tags in a manner of lnnertext. < span id ═ child' > < b keyword 2</span > (recursive processing: the replace operation is performed only when a child node contains no child nodes of its own). The regular replacement expression used is as follows:
replace (/ [ - \\\ \ \ \ $? () | [ \\{ } "is highlighted by cycling.
The content inside the square brackets [ ] is the spelling error objects and format error objects; together with the square brackets, the expression matches any single character among those inside.
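The escaping and replacement idea above can be sketched in Python as follows. This is not the expression used in the original: it assumes that the error objects are plain strings, that regex metacharacters in them are neutralized with re.escape, and that the marking tag is a span with a hypothetical class name.

```python
import html
import re


def mark_errors(node_text, error_objects, css_class="spelling-format-error"):
    """Wrap each error object found in a text node with a marking <span>."""
    if not error_objects:
        return html.escape(node_text)
    # Longest first, so a longer error object wins over its own prefix.
    ordered = sorted(set(error_objects), key=len, reverse=True)
    # re.escape neutralizes characters such as ( ) [ ] { } ? $ | in the objects.
    pattern = re.compile("|".join(re.escape(obj) for obj in ordered))
    parts = []
    last = 0
    for match in pattern.finditer(node_text):
        parts.append(html.escape(node_text[last:match.start()]))
        parts.append(
            '<span class="%s">%s</span>' % (css_class, html.escape(match.group()))
        )
        last = match.end()
    parts.append(html.escape(node_text[last:]))
    return "".join(parts)


marked = mark_errors("Pleese click (OK) .", ["Pleese", "(OK) ."])
```

Escaping the unmatched text with `html.escape` keeps characters such as `<` in the page text from being interpreted as markup after the replacement.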
S207: identifying, in the DOM tree of the first page, the designated tags corresponding to the grammar error object and the content error object, and setting the designated tags as target tags; replacing the target tags with grammar content marking tags, so that the first page displays the second text according to the grammar content marking tags.
To mark the grammar error objects and content error objects on the first page so that a manager can easily recognize them, this step identifies the designated tags corresponding to the grammar error objects and content error objects in the DOM tree of the first page and replaces them with grammar content marking tags. This accurately locates the erroneous parts of the second text in the first page, and also avoids polluting the DOM tree and disturbing the positioning of elements in the DOM tree, which would happen if the grammar error objects and content error objects in the first page were marked directly by highlight replacement.
In this embodiment, the content of the grammar content marking tag can be set as required, for example as a highlighting tag or an annotation tag, so that the text corresponding to the grammar content marking tag is displayed in the first page highlighted, annotated, and so on.
Illustratively, first, in the DOM tree of the first page, tags with interfering characteristics are cleaned in an element.
Next, all child nodes in the DOM tree are obtained through their parent nodes, the designated tags corresponding to the grammar error objects and/or content error objects are identified among the child nodes, and the designated tags are set as target tags.
Finally, the target tags in the child nodes are replaced with grammar content marking tags in a manner of lnnertext. < span id ═ child' > < b keyword 2</span > (recursive processing: the replace operation is performed only when a child node contains no child nodes of its own). The regular replacement expression used is as follows:
replace (/ [ - \\\ \ \ \ $? () | [ \\{ } "is highlighted by cycling.
The content inside the square brackets [ ] is the grammar error objects and content error objects; together with the square brackets, the expression matches any single character among those inside.
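The recursive rule (operate only on nodes that have no child nodes) can be sketched with Python's xml.etree.ElementTree standing in for a browser DOM. As a simplification, and an assumption of this sketch, a leaf element is marked by setting a class attribute rather than by replacing its tag; the function name and class name are hypothetical.

```python
import xml.etree.ElementTree as ET


def mark_leaf_nodes(element, error_objects, marker_class="grammar-content-error"):
    """Recursively visit the tree and mark leaf elements containing an error object.

    Returns the number of leaf elements that were marked.
    """
    children = list(element)
    if not children:
        # Leaf node: this is the only place where the page is modified.
        text = element.text or ""
        if any(obj in text for obj in error_objects):
            element.set("class", marker_class)
            return 1
        return 0
    # Non-leaf node: recurse into the children and leave the node itself alone.
    return sum(mark_leaf_nodes(child, error_objects, marker_class) for child in children)


page = ET.fromstring(
    "<div><p>He go swim run daily.</p><ul><li>All good here.</li></ul></div>"
)
marked_count = mark_leaf_nodes(page, ["go swim run"])
```

Restricting changes to leaf nodes leaves the structure of the surrounding tree untouched, which is the concern the step raises about polluting the DOM tree.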
S208: acquiring a first URL of the first page, identifying a second URL according to the first URL, and sending the second URL to the browser.
A website often has many pages, and if a manager loads them one by one, checking involves too many operations and is inefficient. To improve the efficiency of checking the text in the pages, this step acquires the first URL of the first page, identifies a second URL according to the first URL, and sends the second URL to the browser, so that the browser can preload a second page having a second text, or load the second page directly once S207 has completed. This improves the efficiency with which the manager goes on to check the errors of the second text in the second page.
In this embodiment, the first URL of the first page is acquired, the server hosting the first URL is accessed, the second URL located next after the first URL is obtained, and the second URL is sent to the browser.
For example, the first URL is: html; the server hosting the first URL is accessed, and the second URL located next after the first URL is obtained: https://zhidao.baidu.com/query/3591602.html.
It should be noted that a URL is a uniform resource locator, the address of a standard resource on the Internet. Each file on the Internet has a unique URL, which contains information indicating the location of the file.
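How the second URL is derived from the first is not detailed above. One possible reading, shown here purely as an assumption, is that the server exposes an ordered list of page URLs and the second URL is the entry following the first. The function name next_url and the example.com addresses are hypothetical.

```python
def next_url(first_url, ordered_urls):
    """Return the URL that follows first_url in ordered_urls, or None.

    ordered_urls is assumed to be the server's page list in checking order.
    """
    try:
        index = ordered_urls.index(first_url)
    except ValueError:
        # The first URL is not known to the server's list.
        return None
    if index + 1 < len(ordered_urls):
        return ordered_urls[index + 1]
    # The first URL is the last page, so there is nothing to preload.
    return None


pages = [
    "https://example.com/page/1.html",
    "https://example.com/page/2.html",
    "https://example.com/page/3.html",
]
second_url = next_url("https://example.com/page/1.html", pages)
```

Returning None for the last page or an unknown URL lets the caller skip preloading instead of sending an invalid address to the browser.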
Example three:
Referring to FIG. 4, a translation error recognition apparatus 1 of this embodiment includes:
a text input module 12, configured to acquire a second text of the first page, where the second text is obtained by translating the first text;
an error recognition module 15, configured to recognize a grammar error object in which a grammar error occurs in the second text and a content error object in which a content error occurs in the second text;
an error labeling module 17, configured to identify, in the DOM tree of the first page, the designated tag corresponding to the grammar error object and the content error object, and set the designated tag as a target tag; and to replace the target tag with a grammar content marking tag, so that the first page displays the second text according to the grammar content marking tag.
Optionally, the translation error recognition apparatus 1 further includes:
a triggering module 11, configured to trigger the translation error recognition apparatus 1 in response to the browser finishing loading the first page having the second text.
Optionally, the translation error recognition apparatus 1 further includes:
a configuration input module 13, configured to extract text information from the configuration file of the browser and load the text information into the second text.
Optionally, the translation error recognition apparatus 1 further includes:
a spelling format module 14, configured to identify misspelled words and misformatted characters in the second text to obtain spelling error objects and format error objects, respectively.
Optionally, the translation error recognition apparatus 1 further includes:
a spelling format recognition module 16, configured to identify, in the DOM tree of the first page, the designated tag corresponding to the spelling error object and the format error object, and set the designated tag as a target tag; and to replace the target tag with a spelling format marking tag, so that the first page displays the second text according to the spelling format marking tag.
Optionally, the translation error recognition apparatus 1 further includes:
a preloading module 18, configured to acquire a first URL of the first page, identify a second URL according to the first URL, and send the second URL to the browser.
This technical solution is applied to the field of UI design in computer development: by identifying the grammar error objects in which grammar errors occur in the second text, identifying the content error objects in which content errors occur in the second text, and replacing the designated tags corresponding to the grammar error objects and/or content error objects with grammar content marking tags, the first page is turned into an H5 page that displays the second text according to the grammar content marking tags.
Example four:
To achieve the above object, the present invention further provides a computer device 5. The components of the translation error recognition apparatus 1 of the third embodiment may be distributed across different computer devices, and the computer device 5 may be a smartphone, a tablet computer, a notebook computer, a desktop computer, a rack server, a blade server, a tower server, or a cabinet server (including an independent server or a server cluster formed by multiple application servers) that executes programs. The computer device of this embodiment includes at least, but is not limited to, a memory 51 and a processor 52, which may be communicatively coupled to each other via a system bus, as shown in FIG. 5. It should be noted that FIG. 5 only shows a computer device with certain components; it should be understood that not all of the shown components are required, and more or fewer components may be implemented instead.
In this embodiment, the memory 51 (i.e., a readable storage medium) includes a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, and the like. In some embodiments, the memory 51 may be an internal storage unit of the computer device, such as a hard disk or a memory of the computer device. In other embodiments, the memory 51 may be an external storage device of a computer device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), or the like, provided on the computer device. Of course, the memory 51 may also include both internal and external storage devices of the computer device. In this embodiment, the memory 51 is generally used for storing an operating system and various application software installed in the computer device, such as the program code of the translation error recognition apparatus in the third embodiment. Further, the memory 51 may also be used to temporarily store various types of data that have been output or are to be output.
Processor 52 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data Processing chip in some embodiments. The processor 52 is typically used to control the overall operation of the computer device. In this embodiment, the processor 52 is configured to run the program code stored in the memory 51 or process data, for example, run the translation error recognition apparatus, so as to implement the translation error recognition methods of the first and second embodiments.
Example five:
To achieve the above objects, the present invention also provides a computer-readable storage medium, such as a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, a server, an App application store, etc., on which a computer program is stored which, when executed by the processor 52, implements the corresponding functions. The computer-readable storage medium of this embodiment stores the translation error recognition apparatus and, when executed by the processor 52, implements the translation error recognition method of the first and second embodiments.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A translation error recognition method for checking an error occurring in a second text translated from a first text, comprising:
acquiring a second text of the first page, wherein the second text is obtained by translating the first text;
identifying a grammar error object with grammar error in the second text and a content error object with content error in the second text;
identifying, in a DOM tree of the first page, a designated tag corresponding to the grammar error object and the content error object, and setting the designated tag as a target tag; replacing the target tag with a grammar content marking tag, so that the first page displays the second text according to the grammar content marking tag.
2. The method of claim 1, wherein prior to obtaining the second text of the first page, the method further comprises:
triggering the translation error recognition method in response to the browser finishing loading the first page having the second text.
3. The method of claim 1, wherein after obtaining the second text of the first page, the method further comprises:
identifying misspelled words and misformatted characters in the second text to obtain spelling error objects and format error objects, respectively.
4. The translation error recognition method according to claim 3, wherein the step of recognizing the misspelled vocabulary and the misformatted characters in the second text to obtain the misspelled object and the misformatted object, respectively, comprises:
creating a spelling error list and a format error list;
performing spelling correction on the words in the second text, taking the misspelled words as spelling error objects, and storing the spelling error objects in the spelling error list;
judging whether the format of the second text conforms to a preset format rule, taking the words and/or symbols with format errors as format error objects, and storing the format error objects in the format error list.
5. The translation error recognition method according to claim 1, wherein the step of recognizing the grammar error object in which the grammar error occurs in the second text and the content error object in which the content error occurs in the second text includes:
segmenting the second text to obtain text words, and labeling the part of speech of each text word to obtain a part-of-speech sequence arranged in the order of the text words;
judging, according to a preset logic rule, whether an erroneous logic sequence occurs in the part-of-speech sequence; if so, setting the text words corresponding to the erroneous logic sequence as grammar error objects; if not, determining that the second text has no grammar error;
back-translating the text words to obtain back-translated words, and judging whether the back-translated words appear in the first text; if so, setting the text word corresponding to the back-translated word as a content error object; if not, determining that the second text has no grammar error;
after identifying the grammar error object in which a grammar error occurs in the second text and the content error object in which a content error occurs in the second text, the method further comprises:
uploading the grammar error object and/or the content error object to a blockchain.
6. The translation error recognition method of claim 3, wherein after recognizing misspelled words and misformatted characters in the second text to obtain misspelled objects and misformatted objects, respectively, the method further comprises:
identifying, in a DOM tree of the first page, a designated tag corresponding to the spelling error object and the format error object, and setting the designated tag as a target tag; replacing the target tag with a spelling format marking tag, so that the first page displays the second text according to the spelling format marking tag.
7. The method of claim 1, wherein after replacing the target tag with a grammar content marking tag so that the first page displays the second text according to the grammar content marking tag, the method further comprises:
acquiring a first URL of the first page, identifying a second URL according to the first URL, and sending the second URL to the browser.
8. A translation error recognition apparatus, comprising:
a text input module, configured to acquire a second text of the first page, wherein the second text is obtained by translating the first text;
an error recognition module, configured to recognize a grammar error object in which a grammar error occurs in the second text and a content error object in which a content error occurs in the second text;
an error labeling module, configured to identify, in a DOM tree of the first page, a designated tag corresponding to the grammar error object and the content error object, and set the designated tag as a target tag; and to replace the target tag with a grammar content marking tag, so that the first page displays the second text according to the grammar content marking tag.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the translation error recognition method according to any of claims 1 to 7 are implemented by the processor of the computer device when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, wherein the computer program stored in the computer-readable storage medium, when being executed by a processor, implements the steps of the translation error recognition method according to any one of claims 1 to 7.

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202011528714.1A CN112667208A (en) 2020-12-22 2020-12-22 Translation error recognition method and device, computer equipment and readable storage medium
PCT/CN2021/109257 WO2022134577A1 (en) 2020-12-22 2021-07-29 Translation error identification method and apparatus, and computer device and readable storage medium

Publications (1)

Publication Number Publication Date
CN112667208A true CN112667208A (en) 2021-04-16

Family

ID=75407584

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011528714.1A Pending CN112667208A (en) 2020-12-22 2020-12-22 Translation error recognition method and device, computer equipment and readable storage medium

Country Status (2)

Country Link
CN (1) CN112667208A (en)
WO (1) WO2022134577A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022134577A1 (en) * 2020-12-22 2022-06-30 深圳壹账通智能科技有限公司 Translation error identification method and apparatus, and computer device and readable storage medium

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114881011B (en) * 2022-07-12 2022-09-23 中国人民解放军国防科技大学 Multichannel Chinese text correction method, device, computer equipment and storage medium

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102662932A (en) * 2012-03-15 2012-09-12 中国科学院自动化研究所 Method for establishing tree structure and tree-structure-based machine translation system
US8296124B1 (en) * 2008-11-21 2012-10-23 Google Inc. Method and apparatus for detecting incorrectly translated text in a document
CN103365838A (en) * 2013-07-24 2013-10-23 桂林电子科技大学 Method for automatically correcting syntax errors in English composition based on multivariate features
CN107329958A (en) * 2017-06-08 2017-11-07 努比亚技术有限公司 Language transfer method and device based on webpage
CN108519974A (en) * 2018-03-31 2018-09-11 华南理工大学 English composition automatic detection of syntax error and analysis method
CN108804428A (en) * 2018-06-12 2018-11-13 苏州大学 Correcting method, system and the relevant apparatus of term mistranslation in a kind of translation
CN111695343A (en) * 2020-06-23 2020-09-22 深圳壹账通智能科技有限公司 Wrong word correcting method, device, equipment and storage medium
CN111767709A (en) * 2019-03-27 2020-10-13 武汉慧人信息科技有限公司 Logic method for carrying out error correction and syntactic analysis on English text
CN112015430A (en) * 2020-09-07 2020-12-01 平安国际智慧城市科技股份有限公司 JavaScript code translation method and device, computer equipment and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110083845B (en) * 2019-04-25 2023-06-16 四川语言桥信息技术有限公司 Webpage translation method and system
CN112100063B (en) * 2020-08-31 2022-03-01 腾讯科技(深圳)有限公司 Interface language display test method and device, computer equipment and storage medium
CN112667208A (en) * 2020-12-22 2021-04-16 深圳壹账通智能科技有限公司 Translation error recognition method and device, computer equipment and readable storage medium

Also Published As

Publication number Publication date
WO2022134577A1 (en) 2022-06-30

Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40045418)
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication (Application publication date: 20210416)